|
Redefine `mmiowb' in terms of `iobarrier_w' so that it works correctly
for MIPS I platforms, which have no SYNC machine instruction and use a
call to `wbflush' instead.
This doesn't change the semantics for CONFIG_CPU_CAVIUM_OCTEON, because
`iobarrier_w' expands to `wmb', which is ultimately the same as the
current arrangement. For MIPS I platforms this not only makes any code
that would happen to use `mmiowb' build and run, but it actually
enforces the ordering required as well, as `iobarrier_w' has it already
covered with the use of `wmb'.
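For illustration, the redefinition amounts to something like the
following sketch (the exact form taken in asm/io.h is an assumption):
#define mmiowb()	iobarrier_w()
On CONFIG_CPU_CAVIUM_OCTEON `iobarrier_w' expands to `wmb', while on
MIPS I platforms it falls back to the `wbflush' mechanism.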
Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20863/
Cc: Ralf Baechle <[email protected]>
|
|
Define MMIO ordering barriers as separate operations so as to allow
making places where such a barrier is required distinct from places
where a memory or a DMA barrier is needed.
Architecturally MIPS does not specify ordering requirements for uncached
bus accesses such as MMIO operations normally use and therefore explicit
barriers have to be inserted between MMIO accesses where unspecified
ordering of operations would cause unpredictable results.
MIPS MMIO ordering barriers are implemented using the same underlying
mechanism that memory and DMA ordering barriers use, that is either a
suitable SYNC instruction or a platform-specific `wbflush' call.
However platforms may implement different ordering rules for different
kinds of bus activity, so having a separate API makes it possible to
expand unnecessary barriers to nil, keeping only the necessary ones
and avoiding the performance hit they would otherwise cause due to
unrelated bus activity.
Also having distinct barriers for each kind of use makes it easier for
the reader to understand what the code is intended to do.
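A hedged sketch of what such a separate API might look like (the
macro names and mappings here are illustrative assumptions, not the
literal definitions):
/* MMIO ordering barriers; may expand to nil where not needed. */
#define iobarrier_rw()	mb()	/* order MMIO reads and writes */
#define iobarrier_r()	rmb()	/* order MMIO reads */
#define iobarrier_w()	wmb()	/* order MMIO writes */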
Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20862/
Cc: Ralf Baechle <[email protected]>
|
|
PCB120 and PCB123 are both development boards based on Microsemi
Ocelot, so let's use the same fitImage for both.
Reviewed-by: Alexandre Belloni <[email protected]>
Signed-off-by: Quentin Schulz <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20871/
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
|
|
The Ocelot PCB120 evaluation board is different from the PCB123 in that
it has 4 external VSC8584 (or VSC8574) PHYs.
It uses the SoC's second MDIO bus for external PHYs which have a
reversed address on the bus (i.e. PHY4 is on address 3, PHY5 is on
address 2, PHY6 on 1 and PHY7 on 0).
Here is how the PHYs are connected to the switch ports:
port 0: phy0 (internal)
port 1: phy1 (internal)
port 2: phy2 (internal)
port 3: phy3 (internal)
port 4: phy7
port 5: phy4
port 6: phy6
port 9: phy5
Reviewed-by: Alexandre Belloni <[email protected]>
Signed-off-by: Quentin Schulz <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20869/
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
|
|
Rewrite to use the `reorder' assembly mode and remove manually scheduled
delay slots except where GAS cannot schedule a delay-slot instruction
due to a data dependency or a section switch (as is the case with the EX
macro). No change in machine code produced.
Signed-off-by: Maciej W. Rozycki <[email protected]>
[[email protected]:
Fix conflict with commit 932afdeec18b ("MIPS: Add Kconfig variable for
CPUs with unaligned load/store instructions")]
Signed-off-by: Paul Burton <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/20834/
Cc: Ralf Baechle <[email protected]>
|
|
Fix a commit 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for
small_memset") regression and remove assembly warnings:
arch/mips/lib/memset.S: Assembler messages:
arch/mips/lib/memset.S:243: Warning: Macro instruction expanded into multiple instructions in a branch delay slot
triggering with the CPU_DADDI_WORKAROUNDS option set and this code:
PTR_SUBU a2, t1, a0
jr ra
PTR_ADDIU a2, 1
This is because with that option in place the DADDIU instruction, which
the PTR_ADDIU CPP macro expands to, becomes a GAS macro, which in turn
expands to an LI/DADDU (or actually ADDIU/DADDU) sequence:
13c: 01a4302f dsubu a2,t1,a0
140: 03e00008 jr ra
144: 24010001 li at,1
148: 00c1302d daddu a2,a2,at
...
Correct this by switching off the `noreorder' assembly mode and letting
GAS schedule this jump's delay slot, as there is nothing special about
it that would require manual scheduling. With this change in place
correct code is produced:
13c: 01a4302f dsubu a2,t1,a0
140: 24010001 li at,1
144: 03e00008 jr ra
148: 00c1302d daddu a2,a2,at
...
Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Fixes: 8a8158c85e1e ("MIPS: memset.S: EVA & fault support for small_memset")
Patchwork: https://patchwork.linux-mips.org/patch/20833/
Cc: Ralf Baechle <[email protected]>
Cc: [email protected] # 4.17+
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm fixes for 4.19, take #2
- Correctly order GICv3 SGI registers in the cp15 array
|
|
SEV requires access to the AMD cryptographic device APIs, and this
does not work when KVM is built in and the crypto driver is a module.
The Kconfig conditions for CONFIG_KVM_AMD_SEV do try to disable SEV in
that case, but they do not work because the actual crypto calls are
not culled; only sev_hardware_setup() is.
This patch adds two CONFIG_KVM_AMD_SEV checks that gate all the remaining
SEV code; it fixes this particular configuration, and drops 5 KiB of
code when CONFIG_KVM_AMD_SEV=n.
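As a rough sketch, the gating has this shape (the helper names and
fields are approximated from the description, not quoted from the
patch):
static inline bool sev_guest(struct kvm *kvm)
{
#ifdef CONFIG_KVM_AMD_SEV
	return to_kvm_svm(kvm)->sev_info.active;
#else
	/* SEV code is culled entirely when the option is off. */
	return false;
#endif
}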
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The `cpu' parameter is never used in flush_context(); remove it.
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Shaokun Zhang <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|
|
The only use of KEXEC_BACKUP_SRC_END is as an argument to
walk_system_ram_res():
int crash_load_segments(struct kimage *image)
{
...
walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
image, determine_backup_region);
walk_system_ram_res() expects "start, end" arguments that are inclusive,
i.e., the range to be walked includes both the start and end addresses.
KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the
first address *past* the desired 0-640KB range.
Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC
region is [0-0x9ffff], not [0-0xa0000].
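A sketch of the corrected definition, using the values given above
(the start macro is shown for context):
#define KEXEC_BACKUP_SRC_START	(0UL)
#define KEXEC_BACKUP_SRC_END	(640 * 1024UL - 1)	/* inclusive: 0x9ffff */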
Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Brijesh Singh <[email protected]>
CC: Greg Kroah-Hartman <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Lianbo Jiang <[email protected]>
CC: Takashi Iwai <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Tom Lendacky <[email protected]>
CC: Vivek Goyal <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
Link: http://lkml.kernel.org/r/153805811578.1157.6948388946904655969.stgit@bhelgaas-glaptop.roam.corp.google.com
|
|
Spurious faults only ever occur in the kernel's address space. They
are also constrained specifically to faults with one of these error codes:
X86_PF_WRITE | X86_PF_PROT
X86_PF_INSTR | X86_PF_PROT
So, it's never even possible to reach spurious_kernel_fault_check() with
X86_PF_PK set.
In addition, the kernel's address space never has pages with user-mode
protections. Protection Keys are only enforced on pages with user-mode
protection.
This gives us plenty of reasons not to check for protection keys in
our spurious kernel fault handling.
But, let's also add some warnings to ensure that these assumptions about
protection keys hold true.
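A minimal sketch of the kind of warning added (the surrounding
spurious-fault handler is omitted):
	/*
	 * Protection keys are only enforced on user-mode pages, which
	 * never exist in the kernel's address space.
	 */
	WARN_ON_ONCE(error_code & X86_PF_PK);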
Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The vsyscall page is weird. It is in what is traditionally part of
the kernel address space. But, it has user permissions and we handle
faults on it like we would on a user page: interrupts on.
Right now, we handle vsyscall emulation in the "bad_area" code, which
is used for both user-address-space and kernel-address-space faults.
Move the handling to the user-address-space code *only* and ensure we
get there by "excluding" the vsyscall page from the kernel address
space via a check in fault_in_kernel_space().
Since the fault_in_kernel_space() check is used on 32-bit, also add a
64-bit check to make it clear we only use this path on 64-bit. Also
move the unlikely() to be in is_vsyscall_vaddr() itself.
This helps clean up the kernel fault handling path by removing a case
that can happen in normal[1] operation. (Yeah, yeah, we can argue
about the vsyscall page being "normal" or not.) This also makes
sanity checks easier, like the "we never take pkey faults in the
kernel address space" check in the next patch.
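A hedged sketch of the exclusion (close to the shape described above,
with the 64-bit check folded in):
static int fault_in_kernel_space(unsigned long address)
{
	/*
	 * The vsyscall page sits above TASK_SIZE_MAX but is handled
	 * as a user-address-space fault.
	 */
	if (IS_ENABLED(CONFIG_X86_64) && is_vsyscall_vaddr(address))
		return false;
	return address >= TASK_SIZE_MAX;
}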
Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
We will shortly be using this check in two locations. Put it in
a helper before we do so.
Let's also use PAGE_MASK instead of the open-coded ~0xfff. It is
easier to read and also more obviously correct, given the implicit
type conversion that has to happen because the open-coded constant is
not an 'unsigned long'.
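The helper then reduces to something like this sketch (assuming the
existing VSYSCALL_ADDR constant):
static bool is_vsyscall_vaddr(unsigned long vaddr)
{
	return (vaddr & PAGE_MASK) == VSYSCALL_ADDR;
}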
Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The comments here are wrong. They are too absolute about where
faults can occur when running in the kernel. The comments are
also a bit hard to match up with the code.
Trim down the comments, and make them more precise.
Also add a comment explaining why we are doing the
bad_area_nosemaphore() path here.
Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The SMAP and Reserved checks do not have nice comments. Add some to
clarify and to match everything else.
Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The last patch broke out kernel address space handling into its own
helper. Now, do the same for user address space handling.
Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The page fault handler (__do_page_fault()) basically has two sections:
one for handling faults in the kernel portion of the address space
and another for faults in the user portion of the address space.
But, these two parts don't stick out that well. Let's make that more
clear from code separation and naming. Pull kernel fault
handling into its own helper, and reflect that naming by renaming
spurious_fault() -> spurious_kernel_fault().
Also, rewrite the vmalloc() handling comment a bit. It was a bit
stale and also glossed over the reserved bit handling.
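A hedged sketch of the resulting shape (the kernel-side helper name
is an assumption; spurious_kernel_fault() is the rename from the
text):
static noinline void
__do_page_fault(struct pt_regs *regs, unsigned long error_code,
		unsigned long address)
{
	if (unlikely(fault_in_kernel_space(address))) {
		/* May end up in spurious_kernel_fault(). */
		do_kern_addr_fault(regs, error_code, address);
		return;
	}
	/* ... user address space handling follows inline ... */
}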
Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
We pass around a variable called "error_code" all around the page
fault code. Sounds simple enough, especially since "error_code" looks
like it exactly matches the values that the hardware gives us on the
stack to report the page fault error code (PFEC in SDM parlance).
But, that's not how it works.
For part of the page fault handler, "error_code" does exactly match
PFEC. But, during later parts, it diverges and starts to mean
something a bit different.
Give it two names for its two jobs.
The place it diverges is also really screwy. It's only in a spot
where the hardware tells us we have kernel-mode access that occurred
while we were in usermode accessing user-controlled address space.
Add a warning in there.
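A hedged sketch of the divergence point and the warning (the variable
names are illustrative assumptions):
	/* hw_error_code is the PFEC straight off the stack. */
	unsigned long sw_error_code = hw_error_code;

	if (user_mode(regs)) {
		/*
		 * Hardware reported a kernel-mode access from
		 * usermode: the screwy spot described above.
		 */
		WARN_ON_ONCE(!(hw_error_code & X86_PF_USER));
		sw_error_code |= X86_PF_USER;
	}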
Cc: [email protected]
Cc: Jann Horn <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Lazy TLB mode can result in an idle CPU being woken up by a TLB flush,
when all it really needs to do is reload %CR3 at the next context switch,
assuming no page table pages got freed.
Memory ordering is used to prevent race conditions between switch_mm_irqs_off,
which checks whether .tlb_gen changed, and the TLB invalidation code, which
increments .tlb_gen whenever page table entries get invalidated.
The atomic increment in inc_mm_tlb_gen is its own barrier; the context
switch code adds an explicit barrier between reading tlbstate.is_lazy and
next->context.tlb_gen.
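A hedged sketch of the pairing described above (simplified; the exact
code in switch_mm_irqs_off() and inc_mm_tlb_gen() may differ):
	/* Context switch side: */
	this_cpu_write(cpu_tlbstate.is_lazy, false);
	smp_mb();	/* pairs with the atomic increment below */
	next_tlb_gen = atomic64_read(&next->context.tlb_gen);

	/* Invalidation side, in inc_mm_tlb_gen(): a full barrier. */
	atomic64_inc_return(&mm->context.tlb_gen);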
CPUs in lazy TLB mode remain part of the mm_cpumask(mm), both because
that allows TLB flush IPIs to be sent at page table freeing time, and
because the cache line bouncing on the mm_cpumask(mm) was responsible
for about half the CPU use in switch_mm_irqs_off().
We can change native_flush_tlb_others() without touching other
(paravirt) implementations of flush_tlb_others() because we'll be
flushing less. The existing implementations flush more and are
therefore still correct.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Tested-by: Song Liu <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Pass the information on to native_flush_tlb_others.
No functional changes.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Add an argument to flush_tlb_mm_range to indicate whether page tables
are about to be freed after this TLB flush. This allows for an
optimization of flush_tlb_mm_range to skip CPUs in lazy TLB mode.
No functional changes.
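One plausible shape of the resulting prototype (a sketch; the other
parameters are approximations):
void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
			unsigned long end, unsigned int stride_shift,
			bool freed_tables);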
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Move some code that will be needed for the lazy -> !lazy state
transition when a lazy TLB CPU has gotten out of date.
No functional changes, since the if (real_prev == next) branch
always returns.
(cherry picked from commit 61d0beb5796ab11f7f3bf38cb2eccc6579aaa70b)
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
On most workloads, the number of context switches far exceeds the
number of TLB flushes sent. Optimizing the context switches, by always
using lazy TLB mode, speeds up those workloads.
This patch results in about a 1% reduction in CPU use on a two socket
Broadwell system running a memcache like workload.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Tested-by: Song Liu <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
(cherry picked from commit 95b0e6357d3e4e05349668940d7ff8f3b7e7e11e)
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Use the new tlb_get_unmap_shift() to determine the stride of the
INVLPG loop.
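A hedged sketch of the idea (the loop body is simplified and the
flush primitive is an assumption):
	unsigned int stride_shift = tlb_get_unmap_shift(tlb);
	unsigned long addr;

	for (addr = start; addr < end; addr += 1UL << stride_shift)
		__flush_tlb_one_user(addr);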
Cc: Nick Piggin <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
|
|
Implement the required wait and kick callbacks to support PV spinlocks in
Hyper-V guests.
[ tglx: Document the requirement for disabling interrupts in the wait()
callback. Remove goto and unnecessary includes. Add prototype
for hv_vcpu_is_preempted(). Adapted to pending paravirt changes. ]
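A hedged sketch of the wait() side (the function name and exact flow
are assumptions based on the description; note that interrupts are
disabled around the idle-MSR read, as the note above requires):
static void hv_qlock_wait(u8 *byte, u8 val)
{
	unsigned long flags, msr_val;

	if (READ_ONCE(*byte) != val)
		return;

	local_irq_save(flags);
	/* Reading HV_X64_MSR_GUEST_IDLE parks the vCPU until kicked. */
	if (READ_ONCE(*byte) == val)
		rdmsrl(HV_X64_MSR_GUEST_IDLE, msr_val);
	local_irq_restore(flags);
}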
Signed-off-by: Yi Sun <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Michael Kelley (EOSG) <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Hyper-V may expose a HV_X64_MSR_GUEST_IDLE MSR via HYPERV_CPUID_FEATURES.
Reading this MSR triggers the host to transition the guest vCPU into an
idle state. This state can be exited via an IPI even if the read in the
guest happened from an interrupt disabled section.
Signed-off-by: Yi Sun <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The ICC_ASGI1R and ICC_SGI0R register entries in the cp15 array
are not correctly ordered, leading to a BUG() at boot time.
Move them to their natural location.
Fixes: 3e8a8a50c7ef ("KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses")
Reported-by: Florian Fainelli <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
This patch introduces a new ioctl API and an in-kernel API to
generate a random protected key. The protected key is generated in a
way that the effective clear key is never exposed in the clear.
Both APIs are described in detail in the header files
arch/s390/include/asm/pkey.h and arch/s390/include/uapi/asm/pkey.h.
Signed-off-by: Ingo Franzki <[email protected]>
Reviewed-by: Harald Freudenberger <[email protected]>
Reviewed-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Some cleanup in the s390 zcrypt device driver:
- Removed fragments of pcixx crypto card code. This code
can't be reached anymore because the hardware detection
function does not recognize crypto cards < CEX2 since
commit f56545430736 ("s390/zcrypt: Introduce QACT support
for AP bus devices.")
- Renamed some files and driver names which were still
reflecting pcixx support to cex2a/cex2c.
- Removed all the zcrypt version strings in the file headers.
There is only one place left - the zcrypt.h header file is
now the only place for zcrypt device driver version info.
- Bumped the zcrypt version from 2.2.0 to 2.2.1.
Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Handle the mem= kernel parameter in kasan to limit physical memory.
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The kasan implementation now supports memory hotplug operations. For
that reason regions of initially standby memory are now excluded from
shadow mapping and are mapped/unmapped dynamically upon bringing
memory online/offline.
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The kasan early memory allocator simply chops off memory blocks from
the end of the physical memory. Reuse mem_detect info to identify the
actual online memory end rather than using max_physmem_end. This
allows running the kernel with kasan enabled and standby memory
defined.
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The early boot stack uses 4 predefined pages of memory at
0x8000-0xC000. This stack is used to run the uninstrumented
decompressor/facilities verification C code. It doesn't make sense to
double its size when the kernel is built with KASAN support, so
BOOT_STACK_ORDER is introduced to avoid that.
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
This allows printing multiple markers when they happen to have the
same value.
...
0x001bfffff0100000-0x001c000000000000 255M PMD I
---[ Kasan Shadow End ]---
---[ vmemmap Area ]---
0x001c000000000000-0x001c000002000000 32M PMD RW X
...
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Kasan zero p4d/pud/pmd/pte entries are always filled in with the
corresponding kasan zero entries. Walking an area backed by the kasan
zero page is time-consuming and unnecessary. When a kasan zero
p4d/pud/pmd is encountered, it always ultimately points to the kasan
zero page with the same attributes and to nothing else, so such zero
p4d/pud/pmd entries can simply be skipped.
Also add a space between the address range and the number of pages to
separate them from each other when the number of pages is huge.
0x0018000000000000-0x0018000010000000 256M PMD RW X
0x0018000010000000-0x001bfffff0000000 1073741312M PTE RO X
0x001bfffff0000000-0x001bfffff0001000 4K PTE RW X
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
By default 3-level paging is used when the kernel is compiled with
kasan support. Add a 4-level paging option to support systems with
more than 3TB of physical memory and to cover 4-level paging specific
code with kasan as well.
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The kasan initialization code is changed to populate the persistent
shadow first, save the allocator position into pgalloc_freeable and
proceed with early identity mapping creation. This way the paging
structures of the early identity mapping can be freed at once after
switching to swapper_pg_dir, when the early identity mapping is not
needed anymore.
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Defining KASAN_SHADOW_OFFSET in Kconfig enables memory access check
instrumentation for stack and global variables. gcc version 4.9.2 or
newer is also required.
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Walking async_stack produces false positives. Disable instrumentation
of the __dump_trace function for now.
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Some functions from both arch/s390/kernel/ipl.c and
arch/s390/kernel/machine_kexec.c are called without DAT enabled (or
via code paths that run both with and without DAT enabled). There is
no easy way to partially disable kasan for those files without a
substantial rework, so disable kasan for both files for now.
To avoid disabling kasan for arch/s390/kernel/diag.c, the DAT flag is
enabled in the diag308 call. pcpu_delegate, which disables DAT, is
marked with __no_sanitize_address to disable instrumentation for that
one function.
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
The smp_start_secondary function is called without DAT enabled. To
avoid disabling kasan instrumentation for the entire
arch/s390/kernel/smp.c, smp_start_secondary has been split into two
parts. smp_start_secondary has instrumentation disabled; it does
minimal setup and enables DAT. Then the instrumented
__smp_start_secondary is called to do the rest.
Instrumentation of the __load_psw_mask function has been disabled as
well so that it can be called from smp_start_secondary.
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
To lower the memory footprint and speed up kasan initialisation,
detect EDAT availability and use large pages if possible. As we know
how much memory is needed for initialisation, another simplistic
large page allocator is introduced to avoid memory fragmentation.
Since the facilities list is retrieved anyhow, detect noexec support
and adjust page attributes accordingly. Handle the noexec kernel
option to avoid inconsistent kasan shadow memory page flags.
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Move from preallocating shadow memory for the entire modules area to
dynamic allocation on each module load.
This behaviour was introduced for x86 with commit bebf56a1b: "This patch
also forces module_alloc() to return 8*PAGE_SIZE aligned address making
shadow memory handling ( kasan_module_alloc()/kasan_module_free() )
more simple. Such alignment guarantees that each shadow page backing
modules address space correspond to only one module_alloc() allocation"
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
This change adds address space markers for kasan shadow memory.
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Kasan instrumentation adds a "store" check for variables marked as
modified by inline assembly. With user pointers containing addresses
from another address space this produces false positives.
static inline unsigned long clear_user_xc(void __user *to, ...)
{
asm volatile(
...
: "+a" (to) ...
User space access functions are wrapped by manually instrumented
functions in kasan common code, which should be sufficient to catch
errors. So, we just disable uaccess.o instrumentation altogether.
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Kasan stack instrumentation pads stack variables with redzones, which
increases stack frames size significantly. Stack sizes are increased
from 16k to 32k in the code, as well as for the kernel stack overflow
detection option (CHECK_STACK).
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Kasan needs 1/8 of the kernel virtual address space to be reserved as
the shadow area. And eventually it requires the shadow memory offset
to be known at compile time (passed to the compiler when full
instrumentation is enabled). Any value picked as the shadow area
offset for 3-level paging would eat up the identity mapping on
4-level paging (with a 1PB shadow area size), so the kernel sticks to
3-level paging when kasan is enabled. The 3TB border is picked as the
shadow offset. The memory layout is adjusted so that the physical
memory border does not exceed KASAN_SHADOW_START and vmemmap does not
go below KASAN_SHADOW_END.
Due to the fact that on s390 paging is set up very late, and to cover
more code with kasan instrumentation, the temporary identity mapping
and the final shadow memory are set up early. The shadow memory
mapping is later carried over to init_mm.pgd during paging_init.
For the needs of paging structure allocation and shadow memory
population a primitive allocator is used, which simply chops off
memory blocks from the end of the physical memory.
Kasan currently doesn't track the vmemmap and vmalloc areas.
Current memory layout (for 3-level paging, 2GB physical memory):
---[ Identity Mapping ]---
0x0000000000000000-0x0000000000100000
---[ Kernel Image Start ]---
0x0000000000100000-0x0000000002b00000
---[ Kernel Image End ]---
0x0000000002b00000-0x0000000080000000 2G <- physical memory border
0x0000000080000000-0x0000030000000000 3070G PUD I
---[ Kasan Shadow Start ]---
0x0000030000000000-0x0000030010000000 256M PMD RW X <- shadow for 2G memory
0x0000030010000000-0x0000037ff0000000 523776M PTE RO NX <- kasan zero ro page
0x0000037ff0000000-0x0000038000000000 256M PMD RW X <- shadow for 2G modules
---[ Kasan Shadow End ]---
0x0000038000000000-0x000003d100000000 324G PUD I
---[ vmemmap Area ]---
0x000003d100000000-0x000003e080000000
---[ vmalloc Area ]---
0x000003e080000000-0x000003ff80000000
---[ Modules Area ]---
0x000003ff80000000-0x0000040000000000 2G
Acked-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Add the pgd_page primitive which is required by kasan common code.
Also fix a typo in the p4d_page definition.
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Kasan common code requires the MAX_PTRS_PER_P4D definition, which in
the case of s390 is always PTRS_PER_P4D.
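In other words, a one-line sketch of the s390 definition:
#define MAX_PTRS_PER_P4D	PTRS_PER_P4D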
Reviewed-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
Follow the common kasan approach:
"KASan replaces memory functions with manually instrumented
variants. Original functions declared as weak symbols so strong
definitions in mm/kasan/kasan.c could replace them. Original
functions have aliases with '__' prefix in name, so we could call
non-instrumented variant if needed."
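A hedged sketch of that pattern in C (the alias attribute is one
common way to express it; the actual implementation may differ):
void *__memset(void *s, int c, size_t n);
/* Weak, so the instrumented variant in mm/kasan/ can override it. */
void *memset(void *s, int c, size_t n)
	__attribute__((weak, alias("__memset")));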
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|