When doing a top-down search the low_limit is not PAGE_SIZE but rather
max(PAGE_SIZE, mmap_min_addr). This handles cases in which mmap_min_addr >
PAGE_SIZE.
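A minimal sketch of the shape of the fix, assuming the generic
vm_unmapped_area() interface and its struct vm_unmapped_area_info (an
illustrative fragment, not the exact powerpc hunk):

  info.flags = VM_UNMAPPED_AREA_TOPDOWN;
  info.length = len;
  /* was: info.low_limit = PAGE_SIZE; */
  info.low_limit = max(PAGE_SIZE, mmap_min_addr);
  info.high_limit = mm->mmap_base;
  addr = vm_unmapped_area(&info);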
Fixes: fba2369e6ceb ("mm: use vm_unmapped_area() on powerpc architecture")
Reviewed-by: Laurent Dufour <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
After we ALIGN up the address we need to make sure we didn't overflow
and end up with a zero address. In that case, we need to make sure that
the returned address is greater than mmap_min_addr.
This fixes the selftest va_128TBswitch --run-hugetlb reporting failures when
run as a non-root user for:
mmap(-1, MAP_HUGETLB)
The bug is that a non-root user requesting address -1 will be given address 0,
which will then fail, whereas they should have been given something else that
would have succeeded.
With this change we also avoid the first mmap(-1, MAP_HUGETLB) returning a
NULL address. We think this is not a security issue, because it only affects
whether we choose an address below mmap_min_addr, not whether we
actually allow that address to be mapped, i.e. there are existing capability
checks to prevent a user mapping below mmap_min_addr and those will still be
honoured even without this fix.
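The overflow can be reproduced in a standalone userspace sketch (the
hugepage size and mmap_min_addr value are stand-ins, and ALIGN mirrors
the kernel macro):

  #include <stdio.h>

  #define PAGE_SIZE   4096UL
  #define ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

  int main(void)
  {
          unsigned long mmap_min_addr = 65536;   /* typical sysctl default */
          unsigned long addr = -1UL;             /* the mmap(-1, ...) request */

          addr = ALIGN(addr, 16UL << 20);        /* 16M hugepage: wraps to 0 */
          printf("aligned addr = %#lx\n", addr);

          /* The fix: treat the wrapped (zero) result as invalid by also
           * checking against mmap_min_addr, instead of handing out 0. */
          if (addr < PAGE_SIZE || addr < mmap_min_addr)
                  printf("reject and keep searching\n");
          return 0;
  }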
Fixes: 484837601d4d ("powerpc/mm: Add radix support for hugetlb")
Reviewed-by: Laurent Dufour <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
In commit
c7d606f560e4 ("x86/mce: Improve error message when kernel cannot recover")
a case was added for a machine check caused by a DATA access to poison
memory from the kernel. A case should also have been added for an
uncorrectable error during an instruction fetch in the kernel.
Add that extra case so the error message now reads:
mce: [Hardware Error]: Machine check: Instruction fetch error in kernel
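The new entry in the severity table mirrors the existing data-load case,
with MCACOD_INSTR in place of MCACOD_DATA; approximately (reconstructed
from memory, not the exact hunk):

  MCESEV(
          PANIC, "Instruction fetch error in kernel",
          SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD,
                    MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
          KERNEL
  ),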
Fixes: c7d606f560e4 ("x86/mce: Improve error message when kernel cannot recover")
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
The separate GPHY firmware loader driver is not used any more; the GPHY
firmware is now loaded by the GSWIP switch driver, which also makes use
of the GPHY.
Remove the old unused GPHY firmware loader driver.
The GPHY firmware is useless without an Ethernet and switch driver, so it
should do no harm if loading it does not work on systems using an old
device tree.
I am not aware of any vendor separating the device tree from the kernel
binary, so it should be OK to remove this.
The code and the functionality from this separate GPHY firmware loader
were added to the gswip driver in commit 14fceff4771e ("net: dsa: Add
Lantiq / Intel DSA driver for vrx200").
Signed-off-by: Hauke Mehrtens <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
This was caught while staring at the whole {set,get}_fs() machinery.
Its last user, the 32-bit version of strnlen_user(), went away with
5723aa993d83 ("x86: use the new generic strnlen_user() function"),
so drop it.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
Cc: "Tobin C. Harding" <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
The switch to the generic dma ops made dma masks mandatory, breaking
devices that don't have them set. In the case of bcm63xx, it broke ethernet
with the following warning when trying to up the device:
[ 2.633123] ------------[ cut here ]------------
[ 2.637949] WARNING: CPU: 0 PID: 325 at ./include/linux/dma-mapping.h:516 bcm_enetsw_open+0x160/0xbbc
[ 2.647423] Modules linked in: gpio_button_hotplug
[ 2.652361] CPU: 0 PID: 325 Comm: ip Not tainted 4.19.16 #0
[ 2.658080] Stack : 80520000 804cd3ec 00000000 00000000 804ccc00 87085bdc 87d3f9d4 804f9a17
[ 2.666707] 8049cf18 00000145 80a942a0 00000204 80ac0000 10008400 87085b90 eb3d5ab7
[ 2.675325] 00000000 00000000 80ac0000 000022b0 00000000 00000000 00000007 00000000
[ 2.683954] 0000007a 80500000 0013b381 00000000 80000000 00000000 804a1664 80289878
[ 2.692572] 00000009 00000204 80ac0000 00000200 00000002 00000000 00000000 80a90000
[ 2.701191] ...
[ 2.703701] Call Trace:
[ 2.706244] [<8001f3c8>] show_stack+0x58/0x100
[ 2.710840] [<800336e4>] __warn+0xe4/0x118
[ 2.715049] [<800337d4>] warn_slowpath_null+0x48/0x64
[ 2.720237] [<80289878>] bcm_enetsw_open+0x160/0xbbc
[ 2.725347] [<802d1d4c>] __dev_open+0xf8/0x16c
[ 2.729913] [<802d20cc>] __dev_change_flags+0x100/0x1c4
[ 2.735290] [<802d21b8>] dev_change_flags+0x28/0x70
[ 2.740326] [<803539e0>] devinet_ioctl+0x310/0x7b0
[ 2.745250] [<80355fd8>] inet_ioctl+0x1f8/0x224
[ 2.749939] [<802af290>] sock_ioctl+0x30c/0x488
[ 2.754632] [<80112b34>] do_vfs_ioctl+0x740/0x7dc
[ 2.759459] [<80112c20>] ksys_ioctl+0x50/0x94
[ 2.763955] [<800240b8>] syscall_common+0x34/0x58
[ 2.768782] ---[ end trace fb1a6b14d74e28b6 ]---
[ 2.773544] bcm63xx_enetsw bcm63xx_enetsw.0: cannot allocate rx ring 512
Fix this by adding appropriate DMA masks for the platform devices.
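The usual shape of such a fix is sketched below (the device and mask
names are illustrative, not the exact bcm63xx hunk):

  static u64 enetsw_dmamask = DMA_BIT_MASK(32);

  static struct platform_device bcm63xx_enetsw_device = {
          .name = "bcm63xx_enetsw",
          .dev  = {
                  /* The generic DMA ops refuse to map without these. */
                  .dma_mask          = &enetsw_dmamask,
                  .coherent_dma_mask = DMA_BIT_MASK(32),
          },
  };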
Fixes: f8c55dc6e828 ("MIPS: use generic dma noncoherent ops for simple noncoherent platforms")
Signed-off-by: Jonas Gorski <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected] # v4.19+
---
We don't yet have an upstream glibc port for riscv, so there is no user
space for the existing ABI, and we can remove the definitions for 32-bit
time_t, off_t and struct resource, and the system calls based on them,
including the vdso.
Reviewed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
---
When calling __put_user(foo(), ptr), the __put_user() macro would call
foo() in between __uaccess_begin() and __uaccess_end(). If that code
were buggy, then those bugs would be run without SMAP protection.
Fortunately, there seem to be few instances of the problem in the
kernel. Nevertheless, __put_user() should be fixed to avoid doing this.
Therefore, evaluate __put_user()'s argument before setting AC.
This issue was noticed when an objtool hack by Peter Zijlstra complained
about genregs_get() and I compared the assembly output to the C source.
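Schematically, the fix moves the evaluation of x out of the AC=1 window
(a simplified sketch; the real x86 macro performs the store in asm):

  #define __put_user(x, ptr)                                               \
  ({                                                                       \
          __typeof__(*(ptr)) __pu_val = (x); /* foo() runs here, AC clear */ \
          __uaccess_begin();     /* STAC: open the user-access window */   \
          *(ptr) = __pu_val;     /* only the store runs with AC set */     \
          __uaccess_end();       /* CLAC */                                \
          0;                                                               \
  })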
[ bp: Massage commit message and fixed up whitespace. ]
Fixes: 11f1a4b9755f ("x86: reorganize SMAP handling in user space accesses")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
---
This reverts commit 9da3f2b74054406f87dff7101a569217ffceb29b.
It was well-intentioned, but wrong. Overriding the exception tables for
instructions for random reasons is just wrong, and that is what the new
code did.
It caused problems for tracing, and it caused problems for strncpy_from_user(),
because the new checks made perfectly valid use cases break, rather than
catch things that did bad things.
Unchecked user space accesses are a problem, but that's not a reason to
add invalid checks that then people have to work around with silly flags
(in this case, that 'kernel_uaccess_faults_ok' flag, which is just an
odd way to say "this commit was wrong" and was sprinkled into random
places to hide the wrongness).
The real fix to unchecked user space accesses is to get rid of the
special "let's not check __get_user() and __put_user() at all" logic.
Make __{get|put}_user() be just aliases to the regular {get|put}_user()
functions, and make it impossible to access user space without having
the proper checks in places.
The raison d'être of the special double-underscore versions used to be
that the range check was expensive, and if you did multiple user
accesses, you'd do the range check up front (like the signal frame
handling code, for example). But SMAP (on x86) and PAN (on ARM) have
made that optimization pointless, because the _real_ expense is the "set
CPU flag to allow user space access".
Do let's not break the valid cases to catch invalid cases that shouldn't
even exist.
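In its simplest form that direction amounts to something like the
following sketch (not a specific upstream hunk):

  /* No more unchecked variants: the double-underscore names simply
   * become the fully checked accessors. */
  #define __get_user(x, ptr)      get_user(x, ptr)
  #define __put_user(x, ptr)      put_user(x, ptr)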
Cc: Thomas Gleixner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tobin C. Harding <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
This adds emulation support for the following integer instructions:
* Modulo Signed Doubleword (modsd)
* Modulo Unsigned Doubleword (modud)
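In the instruction emulation code such cases reduce to plain C
arithmetic on the saved GPRs; roughly (a fragment in the style of
analyse_instr(), with extended-opcode values taken from ISA 3.0):

  case 777:       /* modsd */
          op->val = (long int) regs->gpr[ra] %
                    (long int) regs->gpr[rb];
          goto compute_done;

  case 265:       /* modud */
          op->val = regs->gpr[ra] % regs->gpr[rb];
          goto compute_done;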
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
This adds emulation support for the following integer instructions:
* Modulo Signed Word (modsw)
* Modulo Unsigned Word (moduw)
Signed-off-by: PrasannaKumar Muralidharan <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
This adds emulation support for the following integer instructions:
* Extend-Sign Word and Shift Left Immediate (extswsli[.])
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
This adds emulation support for the following integer instructions:
* Count Trailing Zeros Word (cnttzw[.])
* Count Trailing Zeros Doubleword (cnttzd[.])
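Roughly, the emulation can lean on the compiler builtins (a sketch in
the style of analyse_instr(); extended-opcode values from ISA 3.0):

  case 538: {     /* cnttzw */
          unsigned int val = (unsigned int) regs->gpr[rd];

          op->val = val ? __builtin_ctz(val) : 32;
          goto logical_done;
  }
  case 570:       /* cnttzd */
          op->val = regs->gpr[rd] ? __builtin_ctzl(regs->gpr[rd]) : 64;
          goto logical_done;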
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
This adds emulation support for the following integer instructions:
* Deliver A Random Number (darn)
As suggested by Michael, this uses a raw .long for specifying the
instruction word when using inline assembly to retain compatibility
with older binutils.
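Such a raw encoding might look like the sketch below (the opcode word is
computed from ISA 3.0: primary opcode 31, XO 755; treat the exact macro
names as assumptions):

  #define PPC_INST_DARN   0x7c0005e6
  #define PPC_DARN(t, l)  stringify_in_c(.long PPC_INST_DARN |   \
                                         ___PPC_RT(t) |          \
                                         (((l) & 0x3) << 16))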
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
This adds emulation support for the following integer instructions:
* Multiply-Add High Doubleword (maddhd)
* Multiply-Add High Doubleword Unsigned (maddhdu)
* Multiply-Add Low Doubleword (maddld)
As suggested by Michael, this uses a raw .long for specifying the
instruction word when using inline assembly to retain compatibility
with older binutils.
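With such a wrapper macro, the emulation can simply execute the
instruction itself via inline assembly; roughly (a sketch assuming a
PPC_MADDHD() macro that emits the raw .long):

  case 48:        /* maddhd */
          asm volatile(PPC_MADDHD(%0, %1, %2, %3) : "=r" (op->val) :
                       "r" (regs->gpr[ra]), "r" (regs->gpr[rb]),
                       "r" (regs->gpr[rc]));
          goto compute_done;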
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
Commit 137cd7100ec6fa36d610e106df00acb4d8af99df
("ARM: dts: Enable Gemini flash access") contained a bug:
it disabled the display controller, while the whole
idea of the patch was to enable flash access AND
the display controller simultaneously. Fix it up.
Fixes: 137cd7100ec6 ("ARM: dts: Enable Gemini flash access")
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
---
This patch adds support for the ColdFire eDMA platform driver.
Signed-off-by: Angelo Dureghello <[email protected]>
Signed-off-by: Greg Ungerer <[email protected]>
---
Three conflicts, one of which, for marvell10g.c, is non-trivial and
requires some follow-up from Heiner or someone else.
The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.
However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.
Signed-off-by: David S. Miller <[email protected]>
---
Pull KVM fixes from Paolo Bonzini:
"Bug fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: record maximum physical address width in kvm_mmu_extended_role
kvm: x86: Return LA57 feature based on hardware capability
x86/kvm/mmu: fix switch between root and guest MMUs
s390: vsie: Use effective CRYCBD.31 to check CRYCBD validity
---
Merge tag 'powerpc-5.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"One fix for an oops when using SRIOV, introduced by the recent changes
to support compound IOMMU groups.
Thanks to Alexey Kardashevskiy"
* tag 'powerpc-5.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv/sriov: Register IOMMU groups for VFs
---
Some stack pointers used to also be thread_info pointers
and were called tp. Now that they are only stack pointers,
rename them sp.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
Now that current_thread_info is located at the beginning of the 'current'
task struct, the CURRENT_THREAD_INFO macro is not really needed any more.
This patch replaces it with loads of the value at PACA_THREAD_INFO(r13).
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Add PACA_THREAD_INFO rather than using PACACURRENT]
Signed-off-by: Michael Ellerman <[email protected]>
---
Now that thread_info is similar to task_struct, its address is in r2,
so the CURRENT_THREAD_INFO() macro is useless. This patch removes it.
This patch also moves the 'tovirt(r2, r2)' down just before the
reactivation of MMU translation, so that we keep the physical address
of 'current' in r2 until then. It avoids a few calls to tophys().
At the same time, as the 'cpu' field is no longer in thread_info,
TI_CPU is renamed TASK_CPU by this patch.
It also allows us to get rid of a couple of
'#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE' as ACCOUNT_CPU_USER_ENTRY()
and ACCOUNT_CPU_USER_EXIT() are empty when
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not defined.
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Fix a missed conversion of TI_CPU idle_6xx.S]
Signed-off-by: Michael Ellerman <[email protected]>
---
The table of pointers 'current_set' has been used for retrieving
the stack and current. They used to be thread_info pointers as
they were pointing to the stack, and current was taken from the
'task' field of the thread_info.
Now, the pointers of the 'current_set' table are both pointers
to task_struct and pointers to thread_info.
As they are used to get current, and the stack pointer is
retrieved from current's stack field, this patch changes
their type to task_struct, and renames secondary_ti to
secondary_current.
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
thread_info is no longer in the stack, so the entire stack
can now be used.
There is also no longer any risk of corrupting task_cpu(p) with a
stack overflow, so the patch removes the test.
When doing this, an explicit test for a NULL stack pointer is
needed in validate_sp() as it is no longer implicitly covered
by the sizeof(thread_info) gap.
In the meantime, with the previous patch all pointers to the stacks
are no longer pointers to thread_info, so this patch changes them
to void *.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.
Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are leaked,
making a number of attacks more difficult.
This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when
CONFIG_SMP is active.
- thread_info no longer has the 'task' field.
This patch:
- Removes all recopy of thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current without
including linux/sched.h to avoid circular inclusion and without
including asm/asm-offsets.h to avoid symbol names duplication
between ASM constants and C constants.
- Modifies klp_init_thread_info() to take a task_struct pointer
argument.
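For instance, the raw_smp_processor_id() change described above can read
the 'cpu' field through a kbuild-generated byte offset; roughly:

  /* _TASK_CPU comes from asm-offsets, so neither linux/sched.h nor
   * asm/asm-offsets.h needs to be included here. */
  #define raw_smp_processor_id() \
          (*(unsigned int *)((void *)current + _TASK_CPU))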
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Add task_stack.h to livepatch.h to fix build fails]
Signed-off-by: Michael Ellerman <[email protected]>
---
Make sure CURRENT_THREAD_INFO() is used with r1, which is the virtual
address of the stack, in order to ease the switch to r2 when we enable
THREAD_INFO_IN_TASK, as we have no register holding the phys address of
current.
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>
---
Change current_pt_regs() to use task_stack_page() rather than
current_thread_info() so that it keeps working once we enable
THREAD_INFO_IN_TASK.
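Roughly, the change looks like this (a sketch, not the exact hunk):

  /* before */
  #define current_pt_regs() \
          ((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1)

  /* after */
  #define current_pt_regs() \
          ((struct pt_regs *)((unsigned long)task_stack_page(current) + THREAD_SIZE) - 1)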
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Split out of large patch]
Signed-off-by: Michael Ellerman <[email protected]>
---
When we enable THREAD_INFO_IN_TASK we will remove our definition of
current_thread_info(). Instead it will come from linux/thread_info.h
So switch processor.h to include the latter, so that it can continue
to find current_thread_info().
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>
---
Currently INIT_SP_LIMIT uses sizeof(init_thread_info), but that symbol
won't exist when we enable THREAD_INFO_IN_TASK. So just use the size of
the type, which is the same value but will continue to work.
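Roughly (a sketch, the prior definition is an assumption from memory):

  /* before */
  #define INIT_SP_LIMIT   (sizeof(init_thread_info) + (unsigned long)&init_stack)

  /* after: same value, but survives the removal of init_thread_info */
  #define INIT_SP_LIMIT   (sizeof(struct thread_info) + (unsigned long)&init_stack)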
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>
---
Rather than using the thread_info, use task_stack_page() to initialise
paca->kstack; that way it will work with THREAD_INFO_IN_TASK.
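An illustrative sketch of the idea (variable names are assumptions):

  /* Derive the initial kernel stack pointer from the task's stack page
   * rather than from its thread_info. */
  paca->kstack = (unsigned long)task_stack_page(tsk) +
                 THREAD_SIZE - STACK_FRAME_OVERHEAD;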
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>
---
Update a few comments that talk about current_thread_info() in
preparation for THREAD_INFO_IN_TASK.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>
---
We have a few places that use current_thread_info()->task to access
current. This won't work with THREAD_INFO_IN_TASK so fix them now.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>
---
A few places use CURRENT_THREAD_INFO, or the C version, to find the
stack. This will no longer work with THREAD_INFO_IN_TASK so change
them to find the stack in other ways.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>
---
The purpose of the pointer given to call_do_softirq() and
call_do_irq() is to point to the new stack. Currently that's the same
thing as the thread_info, but won't be with THREAD_INFO_IN_TASK.
So change the parameter to void * and rename it 'sp'.
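The prototype change is then simply (a sketch):

  /* before: the argument doubled as a thread_info pointer */
  extern void call_do_softirq(struct thread_info *tp);

  /* after: it is just the new stack pointer */
  extern void call_do_softirq(void *sp);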
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>
---
This patch renames THREAD_INFO to TASK_STACK, because it is in fact
the offset of the pointer to the stack in task_struct, so this offset
will not be impacted by the move of thread_info.
Also make it available on 64-bit, as we'll need it there when we
activate THREAD_INFO_IN_TASK.
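In asm-offsets terms the renamed constant is just the offset of
task_struct's stack field; roughly:

  OFFSET(TASK_STACK, task_struct, stack);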
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Make available on 64-bit]
Signed-off-by: Michael Ellerman <[email protected]>
---
[text copied from commit 9bbd4c56b0b6
("arm64: prep stack walkers for THREAD_INFO_IN_TASK")]
When CONFIG_THREAD_INFO_IN_TASK is selected, task stacks may be freed
before a task is destroyed. To account for this, the stacks are
refcounted, and when manipulating the stack of another task, it is
necessary to get/put the stack to ensure it isn't freed and/or re-used
while we do so.
This patch reworks the powerpc stack walking code to account for this.
When CONFIG_THREAD_INFO_IN_TASK is not selected, these functions perform
no refcounting, so this should only be a structural change that does not
affect behaviour.
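The resulting pattern in a stack walker looks like this (an illustrative
function, not a specific upstream hunk):

  static void example_walk(struct task_struct *tsk)
  {
          unsigned long *stack;

          /* Pin the stack so it cannot be freed or re-used under us; a
           * no-op that returns the stack when THREAD_INFO_IN_TASK=n. */
          stack = (unsigned long *)try_get_task_stack(tsk);
          if (!stack)
                  return;         /* task is exiting, stack already gone */

          /* ... walk the stack frames ... */

          put_task_stack(tsk);    /* drop the reference */
  }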
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Move try_get_task_stack() below tsk == NULL check in show_stack()]
Signed-off-by: Michael Ellerman <[email protected]>
---
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field
gets moved into task_struct and only defined when CONFIG_SMP is set.
This patch ensures that TI_CPU is only used when CONFIG_SMP is set and
that the task_struct 'cpu' field is not used directly outside SMP code.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes
asm/current.h. This generates a circular dependency. To avoid that,
asm/processor.h shall not be included in mmu-hash.h.
In order to do that, this patch moves all the TASK_SIZE-related
constants into new headers, asm/task_size_32.h and asm/task_size_64.h,
which can then be included in mmu-hash.h directly.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out all the TASK_SIZE constants not just 64-bit ones]
Signed-off-by: Michael Ellerman <[email protected]>
---
Since only the virtual address of allocated blocks is used,
let's use functions that directly return the virtual address.
Those functions also have the advantage of zeroing the block.
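The shape of the conversion (illustrative; exact call sites vary):

  /* before: physical address back, convert and zero by hand */
  ptr = __va(memblock_phys_alloc(size, align));
  memset(ptr, 0, size);

  /* after: zeroed memory and a virtual address, directly */
  ptr = memblock_alloc(size, align);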
Suggested-by: Mike Rapoport <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
In __secondary_start() we load the thread_info of the idle task of the
secondary CPU from current_set[cpu], and then convert it into a stack
pointer before storing that back to paca->kstack.
As pointed out in commit f761622e5943 ("powerpc: Initialise
paca->kstack before early_setup_secondary") it's important that we
initialise paca->kstack before calling the MMU setup code, in
particular slb_initialize(), because it will bolt the SLB entry for
the kstack into the SLB.
However we have already set up paca->kstack in cpu_idle_thread_init(),
since commit 3b5750644b2f ("[POWERPC] Bolt in SLB entry for kernel
stack on secondary cpus") (May 2008).
It's also in cpu_idle_thread_init() that we initialise current_set[cpu]
with the thread_info pointer, so there is no issue of the timing being
different between the two.
Therefore the initialisation of paca->kstack in __setup_secondary() is
completely redundant, so remove it.
This has the added benefit of removing code that runs in real mode,
and is therefore restricted by the RMO, and so opens the way for us to
enable THREAD_INFO_IN_TASK.
Signed-off-by: Michael Ellerman <[email protected]>
---
Currently in system_call_exit() we have an optimisation where we
disable MSR_RI (recoverable interrupt) and MSR_EE (external interrupt
enable) in a single mtmsrd instruction.
Unfortunately this will no longer work with THREAD_INFO_IN_TASK,
because then the load of TI_FLAGS might fault and faulting with MSR_RI
clear is treated as an unrecoverable exception which leads to a
panic().
So change the code to only clear MSR_EE prior to loading TI_FLAGS,
leaving the clear of MSR_RI until later. We have some latitude in
where we do the clear of MSR_RI. A bit of experimentation has shown that
this location gives the least slow down.
This still causes a noticeable slow down in our null_syscall
performance. On a Power9 DD2.2:
Before      After       Delta  Delta %
955 cycles  999 cycles  -44    -4.6%
On the plus side this does simplify the code somewhat, because we
don't have to reenable MSR_RI on the restore_math() or
syscall_exit_work() paths which was necessitated previously by the
optimisation.
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
kcov provides kernel coverage data that's useful for fuzzing tools like
syzkaller.
Wire up kcov support on powerpc. Disable kcov instrumentation on the same
files where we currently disable gcov and UBSan instrumentation, plus some
additional exclusions which appear necessary to boot on book3e machines.
Signed-off-by: Andrew Donnellan <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Tested-by: Daniel Axtens <[email protected]> # e6500
Signed-off-by: Michael Ellerman <[email protected]>
---
On 8xx, large pages (512kb or 8M) are used to map kernel linear
memory. Aligning to 8M reduces TLB misses as only 8M pages are used
in that case. We make 8M the default for data.
This patch allows the user to select it via Kconfig.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
This patch implements handling of STRICT_KERNEL_RWX with
large TLBs directly in the TLB miss handlers.
To do so, etext and sinittext are aligned on 512kB boundaries
and the miss handlers use 512kB pages instead of 8Mb pages for
addresses close to the boundaries.
It sets RO PP flags for addresses under sinittext.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
Depending on the number of available BATs for mapping the different
kernel areas, it might be necessary to increase the alignment of _etext
and/or of data areas.
This patch allows the user to do so via Kconfig.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
Today, STRICT_KERNEL_RWX is based on the use of regular pages
to map kernel pages.
On Book3s 32, it has three consequences:
- Using pages instead of BAT for mapping kernel linear memory severely
impacts performance.
- Exec protection is not effective because no-execute cannot be set at
page level (except on 603 which doesn't have hash tables)
- Write protection is not effective because PP bits do not provide RO
mode for kernel-only pages (except on 603 which handles it in software
via PAGE_DIRTY)
On the 603+, we have:
- Independent IBATs and DBATs, allowing limitation of exec parts.
- NX bit can be set in segment registers to forbid execution on memory
mapped by pages.
- RO mode on DBATs even for kernel-only blocks.
On the 601, there is nothing much we can do other than warn the user
about it, because:
- BATs are common to instructions and data.
- BATs do not provide RO mode for kernel-only blocks.
- Segment registers don't have the NX bit.
In order to use IBATs for exec protection, this patch:
- Aligns _etext to BAT block sizes (128kb)
- Sets the NX bit in the kernel segment register (except on the vmalloc
area when CONFIG_MODULES is selected)
- Maps kernel text with IBATs.
In order to use DBATs for write protection, this patch:
- Aligns RW DATA to BAT block sizes (4M)
- Maps the kernel RO area with write-prohibited DBATs
- Maps remaining memory with remaining DBATs
Here is what we get with this patch on a 832x when activating
STRICT_KERNEL_RWX:
Symbols:
c0000000 T _stext
c0680000 R __start_rodata
c0680000 R _etext
c0800000 T __init_begin
c0800000 T _sinittext
~# cat /sys/kernel/debug/block_address_translation
---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent
1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent
2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent
3: -
4: -
5: -
6: -
7: -
---[ Data Block Address Translation ]---
0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent
2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent
3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent
4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent
5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent
6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
7: -
~# cat /sys/kernel/debug/segment_registers
---[ User Segments ]---
0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0
0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1
0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2
0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903
0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14
0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25
0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36
0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47
0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58
0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69
0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a
0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b
---[ Kernel Segments ]---
0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc
0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd
0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee
0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff
Aligning _etext to 128kb allows mapping up to 32Mb of text with 8 IBATs:
16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb
(A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block.)
Aligning data to 4M allows mapping up to 512Mb of data with 8 DBATs:
16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb
Because some processors only have 4 BATs and because some targets need
DBATs for mapping other areas, the following patch will allow the user
to modify the _etext and data alignment.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
setibat() and clearibat() allow manipulating IBATs independently
of DBATs.
update_bats() allows updating BATs after init. This is done
with the MMU off.
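A hypothetical signature sketch for the IBAT half:

  /* Program IBAT 'index' to map 'size' bytes of 'virt' -> 'phys' with
   * protection 'prot', leaving the DBATs untouched. */
  static void setibat(int index, unsigned long virt, phys_addr_t phys,
                      unsigned int size, pgprot_t prot);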
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
---
CONFIG_STRICT_KERNEL_RWX requires a special alignment
for DATA for some subarches. Today it is just defined
as an #ifdef in vmlinux.lds.S.
In order to get more flexibility, this patch moves the
definition of this alignment into Kconfig.
On some subarches, CONFIG_STRICT_KERNEL_RWX will
require a special alignment of _etext.
This patch also adds a configuration item for it in Kconfig.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>