path: root/arch
2018-10-09  MIPS: Correct `mmiowb' barrier for `wbflush' platforms  (Maciej W. Rozycki, 1 file, -8/+3)

Redefine `mmiowb' in terms of `iobarrier_w' so that it works correctly for MIPS I platforms, which have no SYNC machine instruction and use a call to `wbflush' instead. This doesn't change the semantics for CONFIG_CPU_CAVIUM_OCTEON, because `iobarrier_w' expands to `wmb', which is ultimately the same as the current arrangement.

For MIPS I platforms this not only makes any code that would happen to use `mmiowb' build and run, but it actually enforces the ordering required as well, as `iobarrier_w' has it already covered with the use of `wmb'.

Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20863/
Cc: Ralf Baechle <[email protected]>

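A minimal sketch of the redefinition the changelog describes; the exact placement in arch/mips/include/asm/io.h and the surrounding definitions are assumptions:

    /*
     * Sketch only: `iobarrier_w' expands to `wmb', which is backed by SYNC
     * where available and by a `wbflush' call on SYNC-less MIPS I parts.
     */
    #define mmiowb()	iobarrier_w()
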
2018-10-09  MIPS: Define MMIO ordering barriers  (Maciej W. Rozycki, 1 file, -0/+13)

Define MMIO ordering barriers as separate operations so as to allow making places where such a barrier is required distinct from places where a memory or a DMA barrier is needed.

Architecturally MIPS does not specify ordering requirements for uncached bus accesses such as MMIO operations normally use, and therefore explicit barriers have to be inserted between MMIO accesses where unspecified ordering of operations would cause unpredictable results.

MIPS MMIO ordering barriers are implemented using the same underlying mechanism that memory or DMA ordering barriers use, that is either a suitable SYNC instruction or a platform-specific `wbflush' call. However platforms may implement different ordering rules for different kinds of bus activity, so having a separate API makes it possible to remove unnecessary barriers and avoid the performance hit they may cause due to unrelated bus activity, by making their implementation expand to nil while keeping the necessary ones. Also, having distinct barriers for each kind of use makes it easier for the reader to understand what the code is intended to do.

Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20862/
Cc: Ralf Baechle <[email protected]>

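One plausible shape for such definitions, mapping each MMIO ordering barrier onto the existing memory-barrier primitives; the iobarrier_* names and their exact expansions are an assumption here, not a quote of the patch:

    /* Sketch: MMIO ordering barriers expressed via the existing primitives. */
    #define iobarrier_rw()	mb()	/* order MMIO reads and writes */
    #define iobarrier_r()	rmb()	/* order MMIO reads            */
    #define iobarrier_w()	wmb()	/* order MMIO writes           */
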
2018-10-09  MIPS: mscc: add PCB120 to the ocelot fitImage  (Quentin Schulz, 3 files, -4/+21)

PCB120 and PCB123 are both development boards based on Microsemi Ocelot so let's use the same fitImage for both.

Reviewed-by: Alexandre Belloni <[email protected]>
Signed-off-by: Quentin Schulz <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20871/
Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]

2018-10-09  MIPS: mscc: add DT for Ocelot PCB120  (Quentin Schulz, 2 files, -1/+108)

The Ocelot PCB120 evaluation board is different from the PCB123 in that it has 4 external VSC8584 (or VSC8574) PHYs. It uses the SoC's second MDIO bus for external PHYs which have a reversed address on the bus (i.e. PHY4 is on address 3, PHY5 is on address 2, PHY6 on 1 and PHY7 on 0).

Here is how the PHYs are connected to the switch ports:

    port 0: phy0 (internal)
    port 1: phy1 (internal)
    port 2: phy2 (internal)
    port 3: phy3 (internal)
    port 4: phy7
    port 5: phy4
    port 6: phy6
    port 9: phy5

Reviewed-by: Alexandre Belloni <[email protected]>
Signed-off-by: Quentin Schulz <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20869/
Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]

2018-10-09  MIPS: memset: Limit excessive `noreorder' assembly mode use  (Maciej W. Rozycki, 1 file, -24/+24)

Rewrite to use the `reorder' assembly mode and remove manually scheduled delay slots except where GAS cannot schedule a delay-slot instruction due to a data dependency or a section switch (as is the case with the EX macro). No change in machine code produced.

Signed-off-by: Maciej W. Rozycki <[email protected]>
[[email protected]: Fix conflict with commit 932afdeec18b ("MIPS: Add Kconfig variable for CPUs with unaligned load/store instructions")]
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20834/
Cc: Ralf Baechle <[email protected]>

2018-10-09  MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression  (Maciej W. Rozycki, 1 file, -1/+3)

Fix a commit 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for small_memset") regression and remove assembly warnings:

    arch/mips/lib/memset.S: Assembler messages:
    arch/mips/lib/memset.S:243: Warning: Macro instruction expanded into multiple instructions in a branch delay slot

triggering with the CPU_DADDI_WORKAROUNDS option set and this code:

    PTR_SUBU	a2, t1, a0
    jr		ra
     PTR_ADDIU	a2, 1

This is because with that option in place the DADDIU instruction, which the PTR_ADDIU CPP macro expands to, becomes a GAS macro, which in turn expands to an LI/DADDU (or actually ADDIU/DADDU) sequence:

    13c:	01a4302f	dsubu	a2,t1,a0
    140:	03e00008	jr	ra
    144:	24010001	li	at,1
    148:	00c1302d	daddu	a2,a2,at
    ...

Correct this by switching off the `noreorder' assembly mode and letting GAS schedule this jump's delay slot, as there is nothing special about it that would require manual scheduling. With this change in place correct code is produced:

    13c:	01a4302f	dsubu	a2,t1,a0
    140:	24010001	li	at,1
    144:	03e00008	jr	ra
    148:	00c1302d	daddu	a2,a2,at
    ...

Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Fixes: 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for small_memset")
Patchwork: https://patchwork.linux-mips.org/patch/20833/
Cc: Ralf Baechle <[email protected]>
Cc: [email protected] # 4.17+

2018-10-09  Merge tag 'kvmarm-fixes-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master  (Paolo Bonzini, 1 file, -4/+4)

KVM/arm fixes for 4.19, take #2

- Correctly order GICv3 SGI registers in the cp15 array

2018-10-09  KVM: x86: support CONFIG_KVM_AMD=y with CONFIG_CRYPTO_DEV_CCP_DD=m  (Paolo Bonzini, 1 file, -1/+5)

SEV requires access to the AMD cryptographic device APIs, and this does not work when KVM is builtin and the crypto driver is a module. Actually the Kconfig conditions for CONFIG_KVM_AMD_SEV try to disable SEV in that case, but it does not work because the actual crypto calls are not culled, only sev_hardware_setup() is.

This patch adds two CONFIG_KVM_AMD_SEV checks that gate all the remaining SEV code; it fixes this particular configuration, and drops 5 KiB of code when CONFIG_KVM_AMD_SEV=n.

Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

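A hedged sketch of the gating pattern the changelog describes; the svm_sev_enabled() helper name and the max_sev_asid variable are assumptions about the surrounding svm.c code, not quotes from the patch:

    /*
     * Sketch: with CONFIG_KVM_AMD_SEV=n this folds to a compile-time
     * constant false, so the compiler can cull the SEV paths that would
     * otherwise call into the CCP crypto driver module.
     */
    static inline bool svm_sev_enabled(void)
    {
    	return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
    }
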
2018-10-09  arm64: mm: Drop the unused cpu parameter  (Shaokun Zhang, 1 file, -4/+4)

The cpu parameter is never used in flush_context(); remove it.

Acked-by: Will Deacon <[email protected]>
Signed-off-by: Shaokun Zhang <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

2018-10-09  x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error  (Bjorn Helgaas, 1 file, -1/+1)

The only use of KEXEC_BACKUP_SRC_END is as an argument to walk_system_ram_res():

    int crash_load_segments(struct kimage *image)
    {
    	...
    	walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
    			    image, determine_backup_region);

walk_system_ram_res() expects "start, end" arguments that are inclusive, i.e., the range to be walked includes both the start and end addresses.

KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the first address *past* the desired 0-640KB range.

Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC region is [0-0x9ffff], not [0-0xa0000].

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Brijesh Singh <[email protected]>
CC: Greg Kroah-Hartman <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Lianbo Jiang <[email protected]>
CC: Takashi Iwai <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Tom Lendacky <[email protected]>
CC: Vivek Goyal <[email protected]>
CC: [email protected] CC: [email protected] CC: [email protected] CC: [email protected] CC: [email protected]
Link: http://lkml.kernel.org/r/153805811578.1157.6948388946904655969.stgit@bhelgaas-glaptop.roam.corp.google.com

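The resulting definitions, as described in the changelog (the KEXEC_BACKUP_SRC_START value is inferred from the stated [0-0x9ffff] range, not quoted from the patch):

    #define KEXEC_BACKUP_SRC_START	(0UL)
    #define KEXEC_BACKUP_SRC_END	(640 * 1024UL - 1)	/* 640K, inclusive */
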
2018-10-09  x86/mm: Remove spurious fault pkey check  (Dave Hansen, 1 file, -6/+7)

Spurious faults only ever occur in the kernel's address space. They are also constrained specifically to faults with one of these error codes:

    X86_PF_WRITE | X86_PF_PROT
    X86_PF_INSTR | X86_PF_PROT

So, it's never even possible to reach spurious_kernel_fault_check() with X86_PF_PK set.

In addition, the kernel's address space never has pages with user-mode protections. Protection Keys are only enforced on pages with user-mode protection.

This gives us lots of reasons to not check for protection keys in our spurious kernel fault handling. But, let's also add some warnings to ensure that these assumptions about protection keys hold true.

Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

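A sketch of the kind of warning the changelog mentions, as it might appear in the kernel-address fault path; where exactly it sits is an assumption:

    /*
     * Kernel-address faults never involve protection keys: pkeys are only
     * enforced on pages with user-mode protections.
     */
    WARN_ON_ONCE(hw_error_code & X86_PF_PK);
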
2018-10-09  x86/mm/vsyscall: Consider vsyscall page part of user address space  (Dave Hansen, 1 file, -13/+25)

The vsyscall page is weird. It is in what is traditionally part of the kernel address space. But, it has user permissions and we handle faults on it like we would on a user page: interrupts on.

Right now, we handle vsyscall emulation in the "bad_area" code, which is used for both user-address-space and kernel-address-space faults. Move the handling to the user-address-space code *only* and ensure we get there by "excluding" the vsyscall page from the kernel address space via a check in fault_in_kernel_space().

Since the fault_in_kernel_space() check is used on 32-bit, also add a 64-bit check to make it clear we only use this path on 64-bit. Also move the unlikely() to be in is_vsyscall_vaddr() itself.

This helps clean up the kernel fault handling path by removing a case that can happen in normal[1] operation. (Yeah, yeah, we can argue about the vsyscall page being "normal" or not.) This also makes sanity checks easier, like the "we never take pkey faults in the kernel address space" check in the next patch.

Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

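A sketch of what the fault_in_kernel_space() exclusion could look like, assuming the is_vsyscall_vaddr() helper from the patch below; treat it as an approximation rather than the literal hunk:

    static int fault_in_kernel_space(unsigned long address)
    {
    	/*
    	 * The vsyscall page has user permissions even though it sits at a
    	 * kernel address, so route its faults down the user path (64-bit
    	 * only; 32-bit has no vsyscall page).
    	 */
    	if (IS_ENABLED(CONFIG_X86_64) && is_vsyscall_vaddr(address))
    		return false;

    	return address >= TASK_SIZE_MAX;
    }
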
2018-10-09  x86/mm: Add vsyscall address helper  (Dave Hansen, 1 file, -1/+10)

We will shortly be using this check in two locations. Put it in a helper before we do so.

Let's also insert PAGE_MASK instead of the open-coded ~0xfff. It is easier to read and also more obviously correct considering the implicit type conversion that has to happen when it is not an implicit 'unsigned long'.

Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

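A sketch of such a helper; the name follows the is_vsyscall_vaddr() mentioned in the neighbouring changelogs, and VSYSCALL_ADDR is assumed to be the existing constant for the fixed vsyscall mapping:

    static bool is_vsyscall_vaddr(unsigned long vaddr)
    {
    	/* PAGE_MASK instead of the open-coded ~0xfff. */
    	return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
    }
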
2018-10-09  x86/mm: Fix exception table comments  (Dave Hansen, 1 file, -13/+15)

The comments here are wrong. They are too absolute about where faults can occur when running in the kernel. The comments are also a bit hard to match up with the code.

Trim down the comments, and make them more precise. Also add a comment explaining why we are doing the bad_area_nosemaphore() path here.

Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

2018-10-09  x86/mm: Add clarifying comments for user addr space  (Dave Hansen, 1 file, -0/+8)

The SMAP and Reserved checking do not have nice comments. Add some to clarify and make it match everything else.

Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

2018-10-09  x86/mm: Break out user address space handling  (Dave Hansen, 1 file, -19/+28)

The last patch broke out kernel address space handling into its own helper. Now, do the same for user address space handling.

Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

2018-10-09  x86/mm: Break out kernel address space handling  (Dave Hansen, 1 file, -39/+62)

The page fault handler (__do_page_fault()) basically has two sections: one for handling faults in the kernel portion of the address space and another for faults in the user portion of the address space. But, these two parts don't stick out that well. Let's make that more clear from code separation and naming.

Pull kernel fault handling into its own helper, and reflect that naming by renaming spurious_fault() -> spurious_kernel_fault().

Also, rewrite the vmalloc() handling comment a bit. It was a bit stale and also glossed over the reserved bit handling.

Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

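A sketch of the resulting top-level split. The helper names do_kern_addr_fault()/do_user_addr_fault() and the hw_error_code spelling are assumptions drawn from the neighbouring changelogs, and the usual prologue of the real handler is omitted:

    static noinline void
    __do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
    		unsigned long address)
    {
    	/* Kernel-address faults: vmalloc faults, spurious TLB entries, ... */
    	if (unlikely(fault_in_kernel_space(address))) {
    		do_kern_addr_fault(regs, hw_error_code, address);
    		return;
    	}

    	/* Everything else, including the vsyscall page, is a user fault. */
    	do_user_addr_fault(regs, hw_error_code, address);
    }
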
2018-10-09  x86/mm: Clarify hardware vs. software "error_code"  (Dave Hansen, 1 file, -25/+52)

We pass around a variable called "error_code" all around the page fault code. Sounds simple enough, especially since "error_code" looks like it exactly matches the values that the hardware gives us on the stack to report the page fault error code (PFEC in SDM parlance).

But, that's not how it works. For part of the page fault handler, "error_code" does exactly match PFEC. But, during later parts, it diverges and starts to mean something a bit different.

Give it two names for its two jobs.

The place it diverges is also really screwy. It's only in a spot where the hardware tells us we have kernel-mode access that occurred while we were in usermode accessing user-controlled address space. Add a warning in there.

Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

2018-10-09  x86/mm/tlb: Make lazy TLB mode lazier  (Rik van Riel, 1 file, -9/+58)

Lazy TLB mode can result in an idle CPU being woken up by a TLB flush, when all it really needs to do is reload %CR3 at the next context switch, assuming no page table pages got freed.

Memory ordering is used to prevent race conditions between switch_mm_irqs_off, which checks whether .tlb_gen changed, and the TLB invalidation code, which increments .tlb_gen whenever page table entries get invalidated.

The atomic increment in inc_mm_tlb_gen is its own barrier; the context switch code adds an explicit barrier between reading tlbstate.is_lazy and next->context.tlb_gen.

CPUs in lazy TLB mode remain part of the mm_cpumask(mm), both because that allows TLB flush IPIs to be sent at page table freeing time, and because the cache line bouncing on the mm_cpumask(mm) was responsible for about half the CPU use in switch_mm_irqs_off().

We can change native_flush_tlb_others() without touching other (paravirt) implementations of flush_tlb_others() because we'll be flushing less. The existing implementations flush more and are therefore still correct.

Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]
Tested-by: Song Liu <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

2018-10-09  x86/mm/tlb: Add freed_tables element to flush_tlb_info  (Rik van Riel, 2 files, -0/+2)

Pass the information on to native_flush_tlb_others.

No functional changes.

Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

2018-10-09  x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range  (Rik van Riel, 5 files, -8/+11)

Add an argument to flush_tlb_mm_range to indicate whether page tables are about to be freed after this TLB flush. This allows for an optimization of flush_tlb_mm_range to skip CPUs in lazy TLB mode.

No functional changes.

Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

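A sketch of the resulting prototype, combining this argument with the stride_shift parameter introduced by the "Page size aware flush_tlb_mm_range()" change further down; treat it as an approximation rather than the exact upstream declaration:

    void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
    			unsigned long end, unsigned int stride_shift,
    			bool freed_tables);
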
2018-10-09  x86/mm/tlb: Restructure switch_mm_irqs_off()  (Rik van Riel, 1 file, -33/+33)

Move some code that will be needed for the lazy -> !lazy state transition when a lazy TLB CPU has gotten out of date.

No functional changes, since the if (real_prev == next) branch always returns.

(cherry picked from commit 61d0beb5796ab11f7f3bf38cb2eccc6579aaa70b)

Cc: [email protected] Cc: [email protected] Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected] Cc: [email protected] Cc: [email protected]
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

2018-10-09  x86/mm/tlb: Always use lazy TLB mode  (Rik van Riel, 2 files, -30/+1)

On most workloads, the number of context switches far exceeds the number of TLB flushes sent. Optimizing the context switches, by always using lazy TLB mode, speeds up those workloads.

This patch results in about a 1% reduction in CPU use on a two socket Broadwell system running a memcache like workload.

Cc: [email protected] Cc: [email protected] Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected] Cc: [email protected] Cc: [email protected]
Tested-by: Song Liu <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
(cherry picked from commit 95b0e6357d3e4e05349668940d7ff8f3b7e7e11e)
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

2018-10-09  x86/mm: Page size aware flush_tlb_mm_range()  (Peter Zijlstra, 5 files, -22/+32)

Use the new tlb_get_unmap_shift() to determine the stride of the INVLPG loop.

Cc: Nick Piggin <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>

2018-10-09  x86/hyperv: Enable PV qspinlock for Hyper-V  (Yi Sun, 4 files, -0/+108)

Implement the required wait and kick callbacks to support PV spinlocks in Hyper-V guests.

[ tglx: Document the requirement for disabling interrupts in the wait() callback. Remove goto and unnecessary includes. Add prototype for hv_vcpu_is_preempted(). Adapted to pending paravirt changes. ]

Signed-off-by: Yi Sun <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Michael Kelley (EOSG) <[email protected]>
Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

2018-10-09  x86/hyperv: Add GUEST_IDLE_MSR support  (Yi Sun, 1 file, -0/+5)

Hyper-V may expose a HV_X64_MSR_GUEST_IDLE MSR via HYPERV_CPUID_FEATURES. Reading this MSR triggers the host to transition the guest vCPU into an idle state. This state can be exited via an IPI even if the read in the guest happened from an interrupt disabled section.

Signed-off-by: Yi Sun <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

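A hedged sketch of how such an idle entry could be wired up; only the MSR name comes from the changelog, while the feature-bit name, the ms_hyperv field used and the helper name are assumptions:

    /* Sketch: the MSR read itself is what requests the idle transition. */
    static inline void hv_enter_guest_idle(void)
    {
    	u64 dummy;

    	if (ms_hyperv.misc_features & HV_X64_GUEST_IDLE_AVAILABLE)
    		rdmsrl(HV_X64_MSR_GUEST_IDLE, dummy);
    }
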
2018-10-09  ARM: KVM: Correctly order SGI register entries in the cp15 array  (Marc Zyngier, 1 file, -4/+4)

The ICC_ASGI1R and ICC_SGI0R register entries in the cp15 array are not correctly ordered, leading to a BUG() at boot time. Move them to their natural location.

Fixes: 3e8a8a50c7ef ("KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses")
Reported-by: Florian Fainelli <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>

2018-10-09  s390/pkey: Introduce new API for random protected key generation  (Ingo Franzki, 2 files, -0/+18)

This patch introduces a new ioctl API and in-kernel API to generate a random protected key. The protected key is generated in a way that the effective clear key is never exposed in clear.

Both APIs are described in detail in the header files arch/s390/include/asm/pkey.h and arch/s390/include/uapi/asm/pkey.h.

Signed-off-by: Ingo Franzki <[email protected]>
Reviewed-by: Harald Freudenberger <[email protected]>
Reviewed-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/zcrypt: zcrypt device driver cleanup  (Harald Freudenberger, 1 file, -2/+2)

Some cleanup in the s390 zcrypt device driver:

- Removed fragments of pcixx crypto card code. This code can't be reached anymore because the hardware detection function does not recognize crypto cards < CEX2 since commit f56545430736 ("s390/zcrypt: Introduce QACT support for AP bus devices.")
- Renamed some files and driver names which were still reflecting pcixx support to cex2a/cex2c.
- Removed all the zcrypt version strings in the file headers. There is only one place left - the zcrypt.h header file is now the only place for zcrypt device driver version info.
- Zcrypt version bump from 2.2.0 to 2.2.1.

Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: add support for mem= kernel parameter  (Vasily Gorbik, 1 file, -0/+3)

Handle the mem= kernel parameter in kasan to limit physical memory.

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: optimize kasan vmemmap allocation  (Vasily Gorbik, 1 file, -1/+2)

The kasan implementation now supports memory hotplug operations. For that reason regions of initially standby memory are now skipped from shadow mapping and are mapped/unmapped dynamically upon bringing memory online/offline.

Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: avoid kasan crash with standby memory defined  (Vasily Gorbik, 1 file, -1/+18)

The kasan early memory allocator simply chops off memory blocks from the end of the physical memory. Reuse mem_detect info to identify the actual online memory end rather than using max_physmem_end. This allows running the kernel with kasan enabled and standby memory defined.

Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/head: avoid doubling early boot stack size under KASAN  (Vasily Gorbik, 2 files, -1/+2)

The early boot stack uses 4 predefined pages of memory at 0x8000-0xC000. This stack is used to run the non-instrumented decompressor/facilities verification C code. It doesn't make sense to double its size when the kernel is built with KASAN support. BOOT_STACK_ORDER is introduced to avoid that.

Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/mm: improve debugfs ptdump markers walking  (Vasily Gorbik, 1 file, -1/+1)

This allows printing multiple markers when they happen to have the same value:

    ...
    0x001bfffff0100000-0x001c000000000000       255M PMD I
    ---[ Kasan Shadow End ]---
    ---[ vmemmap Area ]---
    0x001c000000000000-0x001c000002000000        32M PMD RW X
    ...

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/mm: optimize debugfs ptdump kasan zero page walking  (Vasily Gorbik, 1 file, -1/+34)

Kasan zero p4d/pud/pmd/pte entries are always filled in with the corresponding kasan zero entries. Walking a kasan zero page backed area is time consuming and unnecessary. When a kasan zero p4d/pud/pmd is encountered, it eventually points to the kasan zero page, always with the same attributes and nothing but it, therefore zero p4d/pud/pmd entries can be jumped over.

Also add a space between the address range and the number of pages to separate them from each other when the number of pages is huge:

    0x0018000000000000-0x0018000010000000       256M PMD RW X
    0x0018000010000000-0x001bfffff0000000 1073741312M PTE RO X
    0x001bfffff0000000-0x001bfffff0001000         4K PTE RW X

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: add option for 4-level paging support  (Vasily Gorbik, 4 files, -7/+26)

By default 3-level paging is used when the kernel is compiled with kasan support. Add a 4-level paging option to support systems with more than 3TB of physical memory and to cover 4-level paging specific code with kasan as well.

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: free early identity mapping structures  (Vasily Gorbik, 3 files, -2/+13)

The kasan initialization code is changed to populate the persistent shadow first, save the allocator position into pgalloc_freeable and proceed with early identity mapping creation. This way the early identity mapping paging structures can be freed at once after switching to swapper_pg_dir, when the early identity mapping is not needed anymore.

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: enable stack and global variables access checks  (Vasily Gorbik, 2 files, -1/+6)

Defining KASAN_SHADOW_OFFSET in Kconfig enables instrumentation of stack and global variable memory accesses. gcc version 4.9.2 or newer is also required.

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/dumpstack: disable __dump_trace kasan instrumentation  (Vasily Gorbik, 1 file, -1/+1)

Walking async_stack produces false positives. Disable __dump_trace function instrumentation for now.

Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: reipl and kexec support  (Vasily Gorbik, 4 files, -3/+8)

Some functions from both arch/s390/kernel/ipl.c and arch/s390/kernel/machine_kexec.c are called without DAT enabled (or on code paths that run both with and without DAT enabled). There is no easy way to partially disable kasan for those files without a substantial rework. Disable kasan for both files for now.

To avoid disabling kasan for arch/s390/kernel/diag.c, the DAT flag is enabled in the diag308 call. pcpu_delegate, which disables DAT, is marked with __no_sanitize_address to disable instrumentation for that one function.

Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/smp: kasan stack instrumentation support  (Vasily Gorbik, 2 files, -4/+4)

The smp_start_secondary function is called without DAT enabled. To avoid disabling kasan instrumentation for the entire arch/s390/kernel/smp.c, smp_start_secondary has been split into two parts. smp_start_secondary has instrumentation disabled; it does minimal setup and enables DAT. Then the instrumented __smp_start_secondary is called to do the rest.

__load_psw_mask function instrumentation has been disabled as well to be able to call it from smp_start_secondary.

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: use noexec and large pages  (Vasily Gorbik, 6 files, -5/+118)

To lower memory footprint and speed up kasan initialisation, detect EDAT availability and use large pages if possible. As we know how much memory is needed for initialisation, another simplistic large page allocator is introduced to avoid memory fragmentation.

Since the facilities list is retrieved anyhow, detect noexec support and adjust page attributes. Handle the noexec kernel option to avoid inconsistent kasan shadow memory page flags.

Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: dynamic shadow mem allocation for modules  (Vasily Gorbik, 2 files, -12/+14)

Move from preallocating the entire shadow memory for the modules area to dynamic allocation on each module load.

This behavior has been introduced for x86 with bebf56a1b:

    "This patch also forces module_alloc() to return 8*PAGE_SIZE aligned address making shadow memory handling ( kasan_module_alloc()/kasan_module_free() ) more simple. Such alignment guarantees that each shadow page backing modules address space correspond to only one module_alloc() allocation"

Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/mm: add kasan shadow to the debugfs pgtable dump  (Vasily Gorbik, 1 file, -6/+15)

This change adds address space markers for kasan shadow memory.

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: avoid user access code instrumentation  (Vasily Gorbik, 1 file, -0/+4)

Kasan instrumentation adds a "store" check for variables marked as modified by inline assembly. With user pointers containing addresses from another address space this produces false positives:

    static inline unsigned long clear_user_xc(void __user *to, ...)
    {
    	asm volatile(
    	...
    	: "+a" (to) ...

User space access functions are wrapped by manually instrumented functions in kasan common code, which should be sufficient to catch errors. So, we just disable uaccess.o instrumentation altogether.

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390/kasan: double the stack size  (Vasily Gorbik, 2 files, -1/+5)

Kasan stack instrumentation pads stack variables with redzones, which increases stack frame sizes significantly. Stack sizes are increased from 16k to 32k in the code, as well as for the kernel stack overflow detection option (CHECK_STACK).

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

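A hedged sketch of the idea (a KASAN-dependent stack order); the STACK_ORDER/STACK_SIZE names are illustrative rather than the literal s390 definitions:

    #ifdef CONFIG_KASAN
    #define STACK_ORDER	3		/* 32K stacks: redzones inflate frames */
    #else
    #define STACK_ORDER	2		/* 16K stacks */
    #endif
    #define STACK_SIZE	(PAGE_SIZE << STACK_ORDER)
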
2018-10-09  s390/kasan: add initialization code and enable it  (Vasily Gorbik, 8 files, -7/+343)

Kasan needs 1/8 of kernel virtual address space to be reserved as the shadow area. And eventually it requires the shadow memory offset to be known at compile time (passed to the compiler when full instrumentation is enabled). Any value picked as the shadow area offset for 3-level paging would eat up identity mapping on 4-level paging (with 1PB shadow area size). So, the kernel sticks to 3-level paging when kasan is enabled. The 3TB border is picked as the shadow offset.

The memory layout is adjusted so that the physical memory border does not exceed KASAN_SHADOW_START and vmemmap does not go below KASAN_SHADOW_END.

Due to the fact that on s390 paging is set up very late and to cover more code with kasan instrumentation, temporary identity mapping and final shadow memory are set up early. The shadow memory mapping is later carried over to init_mm.pgd during paging_init.

For the needs of paging structures allocation and shadow memory population a primitive allocator is used, which simply chops off memory blocks from the end of the physical memory.

Kasan currently doesn't track the vmemmap and vmalloc areas.

Current memory layout (for 3-level paging, 2GB physical memory):

    ---[ Identity Mapping ]---
    0x0000000000000000-0x0000000000100000
    ---[ Kernel Image Start ]---
    0x0000000000100000-0x0000000002b00000
    ---[ Kernel Image End ]---
    0x0000000002b00000-0x0000000080000000         2G  <- physical memory border
    0x0000000080000000-0x0000030000000000      3070G  PUD I
    ---[ Kasan Shadow Start ]---
    0x0000030000000000-0x0000030010000000       256M  PMD RW X  <- shadow for 2G memory
    0x0000030010000000-0x0000037ff0000000    523776M  PTE RO NX <- kasan zero ro page
    0x0000037ff0000000-0x0000038000000000       256M  PMD RW X  <- shadow for 2G modules
    ---[ Kasan Shadow End ]---
    0x0000038000000000-0x000003d100000000       324G  PUD I
    ---[ vmemmap Area ]---
    0x000003d100000000-0x000003e080000000
    ---[ vmalloc Area ]---
    0x000003e080000000-0x000003ff80000000
    ---[ Modules Area ]---
    0x000003ff80000000-0x0000040000000000         2G

Acked-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390: add pgd_page primitive  (Vasily Gorbik, 1 file, -1/+10)

Add the pgd_page primitive, which is required by kasan common code. Also fix a typo in the p4d_page definition.

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

2018-10-09  s390: introduce MAX_PTRS_PER_P4D  (Vasily Gorbik, 1 file, -0/+2)

Kasan common code requires the MAX_PTRS_PER_P4D definition, which in the case of s390 is always PTRS_PER_P4D.

Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

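Given the changelog, the added definition is presumably just:

    #define MAX_PTRS_PER_P4D	PTRS_PER_P4D
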
2018-10-09  s390/kasan: replace some memory functions  (Vasily Gorbik, 2 files, -3/+30)

Follow the common kasan approach:

    "KASan replaces memory functions with manually instrumented variants. Original functions declared as weak symbols so strong definitions in mm/kasan/kasan.c could replace them. Original functions have aliases with '__' prefix in name, so we could call non-instrumented variant if needed."

Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

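A C sketch of the pattern quoted above; on s390 the memory functions are assembler implementations, so the real change applies this at the assembly level, and the memset body here is purely illustrative:

    #include <stddef.h>

    /* Non-instrumented variant, always reachable via the '__' prefix. */
    void *__memset(void *s, int c, size_t n)
    {
    	unsigned char *p = s;

    	while (n--)
    		*p++ = (unsigned char)c;
    	return s;
    }

    /*
     * The plain symbol is a weak alias, so a strong, KASAN-instrumented
     * memset() (e.g. from mm/kasan) overrides it when present.
     */
    void *memset(void *s, int c, size_t n)
    	__attribute__((weak, alias("__memset")));
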