The ISA v3.0B copy-paste facility only requires cpabort when switching
to a process that has foreign real addresses mapped (direct access to
accelerators), to clear a potential copy buffer filled by a previous
thread. There is no accelerator driver implemented yet, so cpabort can
be removed. It can be re-added when a driver is implemented.
POWER9 DD1 requires the copy buffer to always be cleared on context
switch, but if accelerators are not in use, then an unpaired copy from
a dummy region is sufficient to clear data out of the copy buffer.
This increases context switch performance by about 5% on POWER9.
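For illustration, a minimal sketch of the DD1 workaround described above,
assuming the PPC_COPY() opcode macro and a kernel-private dummy_copy_buffer
cacheline; the real context switch code may differ:

/* Sketch only: clear the copy buffer on POWER9 DD1 by issuing an
 * unpaired copy from a harmless dummy cacheline, instead of the more
 * expensive cp_abort. PPC_COPY and dummy_copy_buffer are assumptions. */
static char dummy_copy_buffer[128] __attribute__((__aligned__(128)));

static inline void clear_copy_buffer(void)
{
        asm volatile(PPC_COPY(%0, %1) : : "r"(dummy_copy_buffer), "r"(0));
}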
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The sync (aka. hwsync, aka. heavyweight sync) in the context switch
code, which prevents MMIO accesses from being reordered from the point of
view of a single process if it gets migrated to a different CPU, is not
required, because an hwsync is already performed earlier in the context
switch path.
Comment this so it's clear enough if anything changes on the scheduler
or the powerpc sides. Remove the hwsync from _switch.
This improves context switch performance by 2-3% on POWER8.
Signed-off-by: Nicholas Piggin <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
There is no need to explicitly break the reservation in _switch,
because we are guaranteed that the context switch path will include a
larx/stcx.
Comment the guarantee and remove the reservation clear from _switch.
This is worth 1-2% in context switch performance.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug")
hard disabled interrupts over the low level context switch, because
the SLB management can't cope with a PMU interrupt accessing the stack
in that window.
A radix-based kernel mapping does not use the SLB, so it does not require
interrupts to be hard disabled here.
This is worth 1-2% in context switch performance on POWER9.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The syscall exit code that branches to restore_math is quite heavy on
Book3S, consisting of 2 mtmsr instructions. Threads that don't use both
FP and vector can get caught here if the kernel ever uses FP or vector.
Lazy-FP/vec context switching also trips this case.
So check for lazy FP and vector before switching RI for restore_math.
Move most of this case out of line.
For threads that do want to restore math registers, the MSR switches are
still suboptimal. Future direction may be to use a soft-RI bit to avoid
MSR switches in kernel (similar to soft-EE), but for now at least the
common no-restore case avoids them.
POWER9 context switch rate increases by about 5% due to sched_yield(2)
return performance. I haven't constructed a test to measure the syscall
cost.
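For illustration, a hedged C sketch of the fast-path test; the real check
is done in assembly on the syscall exit path, and the field names are
taken from related commits in this log:

/* Sketch: skip the MSR[RI] switching and the restore_math() call
 * entirely when the thread has no FP or VEC state pending restore. */
if (!current->thread.load_fp && !current->thread.load_vec)
        return;                 /* common case: nothing to restore */

restore_math(regs);             /* out-of-line slow path */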
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
guest to host"), a getppid() system call goes from 307 cycles to 358
cycles (+17%) on POWER8. This is due in significant part to the scratch
SPR used by the hypercall check.
It turns out there are some volatile registers common to both system
call and hypercall (in particular, r12, cr0, ctr), which can be used to
avoid the SPR and some other overheads. This brings getppid to 320 cycles
(+4%).
Testing hcall entry performance by running "sc 1" in guest userspace:
before this patch it takes 854 cycles, afterwards 826. Also a small win
there.
POWER9 syscall is improved by about the same amount, hcall not tested.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently we map the whole linear mapping with PAGE_KERNEL_X. Instead we
should check if the page overlaps the kernel text and only then add
PAGE_KERNEL_X.
Note that we still use 1G pages if they're available, so this will
typically still result in a 1G executable page at KERNELBASE. This fix is
therefore primarily useful for catching stray branches to high linear
mapping addresses.
Without this patch, we can execute at 1G in xmon using:
0:mon> m c000000040000000
c000000040000000 00 l
c000000040000000 00000000 01006038
c000000040000004 00000000 2000804e
c000000040000008 00000000 x
0:mon> di c000000040000000
c000000040000000 38600001 li r3,1
c000000040000004 4e800020 blr
0:mon> p c000000040000000
return value is 0x1
After the patch we get a 400 exception as expected:
0:mon> p c000000040000000
*** 400 exception occurred
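For illustration, a sketch of the intended check when creating the linear
mapping; overlaps_kernel_text() is assumed to be the existing helper for
this, and vaddr/mapping_size stand in for the loop variables:

/* Only ranges that overlap the kernel text keep execute permission;
 * everything else in the linear map is mapped PAGE_KERNEL. */
if (overlaps_kernel_text(vaddr, vaddr + mapping_size))
        prot = PAGE_KERNEL_X;
else
        prot = PAGE_KERNEL;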
Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines")
Cc: [email protected] # v4.7+
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Balbir Singh <[email protected]>
|
|
This reverts commit 45cb08f4791ce6a15c54598b4cb73db4b4b8294f.
For some reason this is causing IRQ problems on Freescale Book3E
machines, eg on my p5020ds:
irq 25: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc3-gcc-6.3.1-00037-g45cb08f4791c #624
Call Trace:
[c0000000fffdbb10] [c00000000049962c] .dump_stack+0xa8/0xe8 (unreliable)
[c0000000fffdbba0] [c0000000000babf4] .__report_bad_irq+0x54/0x140
[c0000000fffdbc40] [c0000000000bb11c] .note_interrupt+0x324/0x380
[c0000000fffdbd00] [c0000000000b7110] .handle_irq_event_percpu+0x68/0x88
[c0000000fffdbd90] [c0000000000b718c] .handle_irq_event+0x5c/0xa8
[c0000000fffdbe10] [c0000000000bc01c] .handle_fasteoi_irq+0xe4/0x298
[c0000000fffdbe90] [c0000000000b59c4] .generic_handle_irq+0x50/0x74
[c0000000fffdbf10] [c0000000000075d8] .__do_irq+0x74/0x1f0
[c0000000fffdbf90] [c0000000000189f8] .call_do_irq+0x14/0x24
[c0000000f7173060] [c0000000000077e4] .do_IRQ+0x90/0x120
[c0000000f7173100] [c00000000001d93c] exc_0x500_common+0xfc/0x100
--- interrupt: 501 at .prepare_to_wait_event+0xc/0x14c
LR = .fsl_elbc_run_command+0xc8/0x23c
[c0000000f71734d0] [c00000000065f418] .nand_reset+0xb8/0x168
[c0000000f7173560] [c00000000065fec4] .nand_scan_ident+0x2b0/0x1638
[c0000000f7173650] [c000000000666cd8] .fsl_elbc_nand_probe+0x34c/0x5f0
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[c0000000f7173750] [c0000000005a3c60] .platform_drv_probe+0x64/0xb0
[c0000000f71737d0] [c0000000005a12e0] .really_probe+0x290/0x334
[c0000000f7173870] [c0000000005a14a0] .__driver_attach+0x11c/0x120
[c0000000f7173900] [c00000000059e6a0] .bus_for_each_dev+0x98/0xfc
[c0000000f71739a0] [c0000000005a0b3c] .driver_attach+0x34/0x4c
[c0000000f7173a20] [c0000000005a04b0] .bus_add_driver+0x1ac/0x2e0
[c0000000f7173ac0] [c0000000005a2170] .driver_register+0x94/0x160
[c0000000f7173b40] [c0000000005a3be0] .__platform_driver_register+0x60/0x7c
[c0000000f7173bc0] [c000000000d6aab4] .fsl_elbc_nand_driver_init+0x24/0x38
[c0000000f7173c30] [c000000000001934] .do_one_initcall+0x68/0x1b8
[c0000000f7173d00] [c000000000d210f8] .kernel_init_freeable+0x260/0x338
[c0000000f7173db0] [c0000000000021b0] .kernel_init+0x20/0xe70
[c0000000f7173e30] [c0000000000009bc] .ret_from_kernel_thread+0x58/0x9c
handlers:
[<c000000000ed85c8>] .fsl_lbc_ctrl_irq
Disabling IRQ #25
Ben also had concerns with the implementation being potentially slow on
some PICs, so revert it for now.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit 4c3b89effc28 ("powerpc/powernv: Add sanity checks to
pnv_pci_get_{gpu|npu}_dev") introduced explicit warnings in
pnv_pci_get_npu_dev() when a PCIe device has no associated device-tree
node. However, not all PCIe devices have an of_node, and
pnv_pci_get_npu_dev() gets indirectly called at least once for every
PCIe device in the system. This results in spurious WARN_ON()s, so
remove it.
The same situation should not exist for pnv_pci_get_gpu_dev() as any
NPU based PCIe device requires a device-tree node.
Fixes: 4c3b89effc28 ("powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev")
Reported-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The PPC_DT_CPU_FTRS option is a bit misplaced in menuconfig; it shows up
with other general kernel options. It's really more at home in the
"Platform Support" section, so move it there.
Also enable it by default, for Book3S 64. It does mostly nothing unless
the device tree properties are found, and we will want it enabled
eventually in distro kernels, so turn it on to start getting more
testing.
Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Supporting 512TB requires us to do an order-3 allocation for the level 1
page table (pgd). This results in page allocation failures with certain
workloads.
For now, limit the 4K Linux page size config to 64TB.
Fixes: f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB")
Reported-by: Hugh Dickins <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Fix the error path taken when we can't copy the user structure in the
CXL_IOCTL_START_WORK ioctl. We shouldn't unlock the context status mutex,
as it was not locked (yet).
Fixes: 0712dc7e73e5 ("cxl: Fix issues when unmapping contexts")
Cc: [email protected] # v3.19+
Signed-off-by: Frederic Barrat <[email protected]>
Reviewed-by: Vaibhav Jain <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit 8d911904f3ce4 ('powerpc/perf: Add restrictions to PMC5 in power9 DD1')
was added to restrict the use of PMC5 in Power9 DD1. The intention was to
disable the use of PMC5 via raw event codes. But instead of updating the
power9_isa207_pmu structure (used on DD1), the commit incorrectly updated the
power9_pmu structure. Fix it.
Fixes: 8d911904f3ce ("powerpc/perf: Add restrictions to PMC5 in power9 DD1")
Reported-by: Shriya <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Tested-by: Shriya <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.
Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.
This is visible in the dmesg output, as all pcpu allocs being in group 0:
pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:
pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
We can also check the data_offset in the paca of various CPUs, with the fix we
see:
CPU 0: data_offset = 0x0ffe8b0000
CPU 24: data_offset = 0x1ffe5b0000
And we can see from dmesg that CPU 24 has an allocation on node 1:
node 0: [mem 0x0000000000000000-0x0000000fffffffff]
node 1: [mem 0x0000001000000000-0x0000001fffffffff]
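For illustration, a sketch of the kind of helper described; the exact
indexing of numa_cpu_lookup_table (logical vs. hard cpu id) and the
fallback handling are glossed over here:

/* Usable before the percpu areas exist: consult the firmware-derived
 * lookup table directly instead of the percpu-backed cpu_to_node(). */
static __init int early_cpu_to_node(int cpu)
{
        int nid = numa_cpu_lookup_table[cpu];

        /* Fall back to a sane node if the table has no entry yet. */
        if (nid < 0 || !node_possible(nid))
                nid = first_online_node;

        return nid;
}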
Cc: [email protected] # v3.16+
Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The i-side 0111b machine check, which is "Instruction Fetch to foreign
address space", was missed by 7b9f71f974 ("powerpc/64s: POWER9 machine
check handler").
The POWER9 processor core considers host real addresses with a
nonzero value in RA(8:12) as foreign address space, accessible only
by the copy and paste instructions. The copy and paste instruction
pair can be used to invoke the Nest accelerators via the Virtual
Accelerator Switchboard (VAS).
It is an error for any regular load/store or ifetch to go to a foreign
address. When relocation is on, this causes an MMU exception; when
relocation is off, a machine check exception. It is possible to trigger
this machine check by branching to a foreign address with MSR[IR]=0.
Fixes: 7b9f71f974a1 ("powerpc/64s: POWER9 machine check handler")
Reported-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We should unlock if get_cxl_adapter() fails.
Fixes: 594ff7d067ca ("cxl: Support to flash a new image on the adapter from a guest")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
During EEH recovery, a call to cxl_remove can result in a double free_irq
of the psl and slice interrupts. This can happen if
perst_reloads_same_image == 1 and the call to cxl_configure_adapter()
fails during the slot_reset callback. In such a case we see a kernel oops
with the following back-trace:
Oops: Kernel access of bad area, sig: 11 [#1]
Call Trace:
free_irq+0x88/0xd0 (unreliable)
cxl_unmap_irq+0x20/0x40 [cxl]
cxl_native_release_psl_irq+0x78/0xd8 [cxl]
pci_deconfigure_afu+0xac/0x110 [cxl]
cxl_remove+0x104/0x210 [cxl]
pci_device_remove+0x6c/0x110
device_release_driver_internal+0x204/0x2e0
pci_stop_bus_device+0xa0/0xd0
pci_stop_and_remove_bus_device+0x28/0x40
pci_hp_remove_devices+0xb0/0x150
pci_hp_remove_devices+0x68/0x150
eeh_handle_normal_event+0x140/0x580
eeh_handle_event+0x174/0x360
eeh_event_handler+0x1e8/0x1f0
This patch fixes the double free_irq by checking that the variables
holding the virqs (err_hwirq, serr_hwirq, psl_virq) are not '0' before
unmapping them, and by resetting these variables to '0' once they are
unmapped.
Cc: [email protected]
Signed-off-by: Vaibhav Jain <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently tsk->thread.load_tm is not initialized at task creation and can
contain garbage on a new task.
This is undesired behaviour, since it affects the timing of enabling and
disabling the transactional memory laziness (disabling and enabling the
MSR TM bit, which affects TM reclaim and recheckpoint in the scheduling
process).
Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Cc: [email protected] # v4.9+
Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
These two functions implement the same semantics, so unify their naming so we
can share code that calls them. The longer name is more descriptive so use it.
Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add __GFP_ACCOUNT to __hugepte_alloc()
Signed-off-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add support for accounting in pte_alloc_one() and pgd_alloc() by passing
__GFP_ACCOUNT in the flags.
Signed-off-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Introduce a helper pgtable_gfp_flags() which just returns the current gfp
flags plus __GFP_ACCOUNT, to account for page table allocation. The
generic helper is added to include/asm/pgalloc.h and has two variants
(warning: ugly bits ahead):
1. If the header is included from a module, no check for mm == &init_mm
is done, since init_mm is not exported.
2. For kernel includes, the check is done and required, see
(3e79ec7 arch: x86: charge page tables to kmemcg).
The fundamental assumption is that no module should be doing pgd/pud/pmd
and pte allocs on behalf of init_mm directly.
NOTE: This adds an overhead to pmd/pud/pgd allocations
similar to x86. The other alternative was to implement
pmd_alloc_kernel/pud_alloc_kernel and pgd_alloc_kernel
with their offset variants.
For 4k page size, pte_alloc_one no longer calls
pte_alloc_one_kernel.
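For illustration, a sketch of the helper with both variants as described
above; treat it as illustrative rather than the literal upstream code:

#ifdef MODULE
/* Module builds: init_mm is not exported, so no init_mm check is done. */
static inline gfp_t pgtable_gfp_flags(struct mm_struct *mm, gfp_t gfp)
{
        return gfp | __GFP_ACCOUNT;
}
#else
/* Kernel builds: never charge init_mm page tables to kmemcg. */
static inline gfp_t pgtable_gfp_flags(struct mm_struct *mm, gfp_t gfp)
{
        if (unlikely(mm == &init_mm))
                return gfp;
        return gfp | __GFP_ACCOUNT;
}
#endif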
Signed-off-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently in hpte_need_flush() if there is no batch pending we always do a
global TLB flush, which is inefficient if the mm has never run on another
thread.
Instead do the same check that __flush_tlb_pending() does and check if a local
flush is sufficient when batch->active is false. Instead of open-coding it we
use mm_is_thread_local().
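For illustration, a sketch of the no-batch path with the new check; the
flush_hash_page() argument order shown here is an assumption based on its
existing usage:

if (!batch->active) {
        /* No batch pending: flush only this CPU if the mm has only ever
         * run here, otherwise fall back to the global flush as before. */
        flush_hash_page(vpn, rpte, psize, ssize, mm_is_thread_local(mm));
        put_cpu_var(ppc64_tlb_batch);
        return;
}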
Signed-off-by: Balbir Singh <[email protected]>
[mpe: Don't use a local, just inline mm_is_thread_local()]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Signed-off-by: Li Yang <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add myself as the maintainer for drivers/soc/fsl/ and fix the scope for
device tree bindings.
Signed-off-by: Li Yang <[email protected]>
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This reduces overhead of mutex locking and increases context switch
rate significantly (which helps to measure and profile the context
switch path).
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Collation of some spelling fixes from Colin.
Attemping -> Attempting
intialized -> initialized
missmanaged -> mismanaged
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently tsk->thread.load_vec and load_fp are not initialized during
task creation, which can lead to garbage values in these variables
(non-zero values).
These variables will be checked later in restore_math() to validate if
the FP and vector registers are being utilized. Since these values might
be non-zero, restore_math() will continue to restore the FP and vector
registers even if they were never utilized by the userspace
application. load_fp and load_vec
counters will then overflow (they wrap at 255) and the FP and Altivec will be
finally disabled, but before that condition is reached (counter overflow)
several context switches will have restored FP and vector registers without
need, causing a performance degradation.
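For illustration, a sketch of the kind of fix implied, zeroing the
lazy-restore counters when the new task's thread state is set up; the
exact location in the task-creation path is an assumption:

/* Start with no FP/VEC state to restore, so restore_math() doesn't act
 * on leftover garbage in the new task. */
p->thread.load_fp = 0;
#ifdef CONFIG_ALTIVEC
p->thread.load_vec = 0;
#endif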
Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used")
Cc: [email protected] # v4.6+
Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: Gustavo Romero <[email protected]>
Acked-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The xor_vmx.c file is used for the RAID5 xor operations. In these functions
altivec is enabled to run the operation and then disabled.
The code uses enable_kernel_altivec() around the core of the algorithm, however
the whole file is built with -maltivec, so the compiler is within its rights to
generate altivec code anywhere. This has been seen at least once in the wild:
0:mon> di $xor_altivec_2
c0000000000b97d0 3c4c01d9 addis r2,r12,473
c0000000000b97d4 3842db30 addi r2,r2,-9424
c0000000000b97d8 7c0802a6 mflr r0
c0000000000b97dc f8010010 std r0,16(r1)
c0000000000b97e0 60000000 nop
c0000000000b97e4 7c0802a6 mflr r0
c0000000000b97e8 faa1ffa8 std r21,-88(r1)
...
c0000000000b981c f821ff41 stdu r1,-192(r1)
c0000000000b9820 7f8101ce stvx v28,r1,r0 <-- POP
c0000000000b9824 38000030 li r0,48
c0000000000b9828 7fa101ce stvx v29,r1,r0
...
c0000000000b984c 4bf6a06d bl c0000000000238b8 # enable_kernel_altivec
This patch splits the non-altivec code into xor_vmx_glue.c which calls the
altivec functions in xor_vmx.c. By compiling xor_vmx_glue.c without
-maltivec we can guarantee that altivec instructions will not be executed
outside of the enable/disable block.
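For illustration, a sketch of one of the glue wrappers; since
xor_vmx_glue.c is built without -maltivec, no VMX instructions can be
emitted outside the enable/disable window. The __-prefixed name for the
VMX worker is an assumption:

/* Wrapper built without -maltivec; the VMX work lives in
 * __xor_altivec_2() in xor_vmx.c, which is built with -maltivec. */
void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
                   unsigned long *v2_in)
{
        preempt_disable();
        enable_kernel_altivec();
        __xor_altivec_2(bytes, v1_in, v2_in);
        disable_kernel_altivec();
        preempt_enable();
}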
Signed-off-by: Matt Brown <[email protected]>
[mpe: Rework change log and include disassembly]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
By default, 5% of system RAM is reserved for preserving boot memory.
Alternatively, a user can specify the amount of memory to reserve.
See Documentation/powerpc/firmware-assisted-dump.txt for details. In
addition to the memory reserved for preserving boot memory, some more
memory is reserved, to save HPTE region, CPU state data and ELF core
headers.
Memory reservation during the first kernel looks like this:

Low memory                                        Top of memory
0      boot memory size                                       |
|           |                       |<--Reserved dump area -->|
V           V                       |  Permanent Reservation  V
+-----------+----------/ /----------+---+----+-----------+----+
|           |                       |CPU|HPTE|   DUMP    |ELF |
+-----------+----------/ /----------+---+----+-----------+----+
      |                                            ^
      |                                            |
       \                                          /
        -------------------------------------------
         Boot memory content gets transferred to
         reserved area by firmware at the time of
         crash
This implicitly means that the sum of the sizes of boot memory, CPU
state data, HPTE region, DUMP preserving area and ELF core headers
can't be greater than the total memory size. But currently, a user is
allowed to specify any value as boot memory size. So, the above rule
is violated when a boot memory size around 50% of the total available
memory is specified. As the kernel is not handling this currently, it
may lead to undefined behavior. Fix it by setting an upper limit for
boot memory size to 25% of the total available memory. Also, instead
of using memblock_end_of_DRAM(), which doesn't take the holes, if any,
in the memory layout into account, use memblock_phys_mem_size() to
calculate the percentage of total available memory.
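For illustration, a sketch of the bound described above, using a
hypothetical helper name:

/* Cap the user-supplied boot memory size at 25% of the total memory.
 * memblock_phys_mem_size() is used so memory holes are not counted. */
static unsigned long fadump_max_boot_memory_size(void)
{
        return memblock_phys_mem_size() / 4;
}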
Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
With commit f6e6bedb7731 ("powerpc/fadump: Reserve memory at an offset
closer to bottom of RAM"), memory for fadump is no longer reserved at
the top of RAM. But there are still a few places which say so. Change
them appropriately.
Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
With commit 11550dc0a00b ("powerpc/fadump: reuse crashkernel parameter
for fadump memory reservation"), 'fadump_reserve_mem=' parameter is
deprecated in favor of 'crashkernel=' parameter. Add a warning if
'fadump_reserve_mem=' is still used.
Fixes: 11550dc0a00b ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation")
Suggested-by: Prarit Bhargava <[email protected]>
Signed-off-by: Hari Bathini <[email protected]>
[mpe: Unsplit long printk strings]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
- log an error message when registration fails and no error code listed
in the switch is returned
- translate the hv error code to posix error code and return it from
fw_register
- return the posix error code from fw_register to the process writing
to sysfs
- return EEXIST on re-registration
- return success on deregistration when fadump is not registered
- return ENODEV when no memory is reserved for fadump
Signed-off-by: Michal Suchanek <[email protected]>
Tested-by: Hari Bathini <[email protected]>
[mpe: Use pr_err() to shrink the error print]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
With the __ilog2() function as defined in
arch/powerpc/include/asm/bitops.h, GCC will not optimise the code in the
case of a constant parameter.
The generic ilog2() function in include/linux/log2.h is written to handle
the case of a constant parameter.
This patch discards the three __ilog2() functions and defines __ilog2()
as ilog2().
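The change described boils down to something like the following sketch,
not necessarily the literal patch:

/* Let the generic, constant-folding implementation do the work. */
#define __ilog2(n)      ilog2(n)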
For non-constant calls, the generated code is the same:
int test__ilog2(unsigned long x)
{
return __ilog2(x);
}
int test__ilog2_u32(u32 n)
{
return __ilog2_u32(n);
}
int test__ilog2_u64(u64 n)
{
return __ilog2_u64(n);
}
On PPC32 before the patch:
00000000 <test__ilog2>:
0: 7c 63 00 34 cntlzw r3,r3
4: 20 63 00 1f subfic r3,r3,31
8: 4e 80 00 20 blr
0000000c <test__ilog2_u32>:
c: 7c 63 00 34 cntlzw r3,r3
10: 20 63 00 1f subfic r3,r3,31
14: 4e 80 00 20 blr
On PPC32 after the patch:
00000000 <test__ilog2>:
0: 7c 63 00 34 cntlzw r3,r3
4: 20 63 00 1f subfic r3,r3,31
8: 4e 80 00 20 blr
0000000c <test__ilog2_u32>:
c: 7c 63 00 34 cntlzw r3,r3
10: 20 63 00 1f subfic r3,r3,31
14: 4e 80 00 20 blr
On PPC64 before the patch:
0000000000000000 <.test__ilog2>:
0: 7c 63 00 74 cntlzd r3,r3
4: 20 63 00 3f subfic r3,r3,63
8: 7c 63 07 b4 extsw r3,r3
c: 4e 80 00 20 blr
0000000000000010 <.test__ilog2_u32>:
10: 7c 63 00 34 cntlzw r3,r3
14: 20 63 00 1f subfic r3,r3,31
18: 7c 63 07 b4 extsw r3,r3
1c: 4e 80 00 20 blr
0000000000000020 <.test__ilog2_u64>:
20: 7c 63 00 74 cntlzd r3,r3
24: 20 63 00 3f subfic r3,r3,63
28: 7c 63 07 b4 extsw r3,r3
2c: 4e 80 00 20 blr
On PPC64 after the patch:
0000000000000000 <.test__ilog2>:
0: 7c 63 00 74 cntlzd r3,r3
4: 20 63 00 3f subfic r3,r3,63
8: 7c 63 07 b4 extsw r3,r3
c: 4e 80 00 20 blr
0000000000000010 <.test__ilog2_u32>:
10: 7c 63 00 34 cntlzw r3,r3
14: 20 63 00 1f subfic r3,r3,31
18: 7c 63 07 b4 extsw r3,r3
1c: 4e 80 00 20 blr
0000000000000020 <.test__ilog2_u64>:
20: 7c 63 00 74 cntlzd r3,r3
24: 20 63 00 3f subfic r3,r3,63
28: 7c 63 07 b4 extsw r3,r3
2c: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
With the ffz() function as defined in arch/powerpc/include/asm/bitops.h,
GCC will not optimise the code in the case of a constant parameter.
This patch replaces ffz() by the generic function.
The generic ffz(x) expects to never be called with ~x == 0, as noted in
the comment in include/asm-generic/bitops/ffz.h.
The only user of ffz() within arch/powerpc/ is
platforms/512x/mpc5121_ads_cpld.c, which checks that x is not 0xff.
For non-constant calls, the generated code is the same:
unsigned long testffz(unsigned long x)
{
return ffz(x);
}
On PPC32, before the patch:
00000018 <testffz>:
18: 7c 63 18 f9 not. r3,r3
1c: 40 82 00 0c bne 28 <testffz+0x10>
20: 38 60 00 20 li r3,32
24: 4e 80 00 20 blr
28: 7d 23 00 d0 neg r9,r3
2c: 7d 23 18 38 and r3,r9,r3
30: 7c 63 00 34 cntlzw r3,r3
34: 20 63 00 1f subfic r3,r3,31
38: 4e 80 00 20 blr
On PPC32, after the patch:
00000018 <testffz>:
18: 39 23 00 01 addi r9,r3,1
1c: 7d 23 18 78 andc r3,r9,r3
20: 7c 63 00 34 cntlzw r3,r3
24: 20 63 00 1f subfic r3,r3,31
28: 4e 80 00 20 blr
On PPC64, before the patch:
0000000000000030 <.testffz>:
30: 7c 60 18 f9 not. r0,r3
34: 38 60 00 40 li r3,64
38: 4d 82 00 20 beqlr
3c: 7c 60 00 d0 neg r3,r0
40: 7c 63 00 38 and r3,r3,r0
44: 7c 63 00 74 cntlzd r3,r3
48: 20 63 00 3f subfic r3,r3,63
4c: 7c 63 07 b4 extsw r3,r3
50: 4e 80 00 20 blr
On PPC64, after the patch:
0000000000000030 <.testffz>:
30: 38 03 00 01 addi r0,r3,1
34: 7c 03 18 78 andc r3,r0,r3
38: 7c 63 00 74 cntlzd r3,r3
3c: 20 63 00 3f subfic r3,r3,63
40: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
With the fls() functions as defined in arch/powerpc/include/asm/bitops.h,
GCC will not optimise the code in the case of a constant parameter.
This patch replaces __fls() by the builtin function, and modifies fls()
and fls64() to use builtins instead of inline assembly.
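For illustration, a hedged sketch of the builtin-based forms; the zero
guards are added here for safety, whereas the real patch may instead rely
on cntlzw/cntlzd returning the operand width for 0:

static inline int fls(unsigned int x)
{
        return x ? 32 - __builtin_clz(x) : 0;
}

/* Undefined for x == 0, like the generic __fls(). */
static inline unsigned long __fls(unsigned long x)
{
        return BITS_PER_LONG - 1 - __builtin_clzl(x);
}

static inline int fls64(__u64 x)
{
        return x ? 64 - __builtin_clzll(x) : 0;
}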
For non-constant calls, the generated code is the same:
int testfls(unsigned int x)
{
return fls(x);
}
unsigned long test__fls(unsigned long x)
{
return __fls(x);
}
int testfls64(__u64 x)
{
return fls64(x);
}
On PPC32, before the patch:
00000064 <testfls>:
64: 7c 63 00 34 cntlzw r3,r3
68: 20 63 00 20 subfic r3,r3,32
6c: 4e 80 00 20 blr
00000070 <test__fls>:
70: 7c 63 00 34 cntlzw r3,r3
74: 20 63 00 1f subfic r3,r3,31
78: 4e 80 00 20 blr
0000007c <testfls64>:
7c: 2c 03 00 00 cmpwi r3,0
80: 40 82 00 10 bne 90 <testfls64+0x14>
84: 7c 83 00 34 cntlzw r3,r4
88: 20 63 00 20 subfic r3,r3,32
8c: 4e 80 00 20 blr
90: 7c 63 00 34 cntlzw r3,r3
94: 20 63 00 40 subfic r3,r3,64
98: 4e 80 00 20 blr
On PPC32, after the patch:
00000054 <testfls>:
54: 7c 63 00 34 cntlzw r3,r3
58: 20 63 00 20 subfic r3,r3,32
5c: 4e 80 00 20 blr
00000060 <test__fls>:
60: 7c 63 00 34 cntlzw r3,r3
64: 20 63 00 1f subfic r3,r3,31
68: 4e 80 00 20 blr
0000006c <testfls64>:
6c: 2c 03 00 00 cmpwi r3,0
70: 41 82 00 10 beq 80 <testfls64+0x14>
74: 7c 63 00 34 cntlzw r3,r3
78: 20 63 00 40 subfic r3,r3,64
7c: 4e 80 00 20 blr
80: 7c 83 00 34 cntlzw r3,r4
84: 20 63 00 40 subfic r3,r3,32
88: 4e 80 00 20 blr
On PPC64, before the patch:
00000000000000a0 <.testfls>:
a0: 7c 63 00 34 cntlzw r3,r3
a4: 20 63 00 20 subfic r3,r3,32
a8: 7c 63 07 b4 extsw r3,r3
ac: 4e 80 00 20 blr
00000000000000b0 <.test__fls>:
b0: 7c 63 00 74 cntlzd r3,r3
b4: 20 63 00 3f subfic r3,r3,63
b8: 7c 63 07 b4 extsw r3,r3
bc: 4e 80 00 20 blr
00000000000000c0 <.testfls64>:
c0: 7c 63 00 74 cntlzd r3,r3
c4: 20 63 00 40 subfic r3,r3,64
c8: 7c 63 07 b4 extsw r3,r3
cc: 4e 80 00 20 blr
On PPC64, after the patch:
0000000000000090 <.testfls>:
90: 7c 63 00 34 cntlzw r3,r3
94: 20 63 00 20 subfic r3,r3,32
98: 7c 63 07 b4 extsw r3,r3
9c: 4e 80 00 20 blr
00000000000000a0 <.test__fls>:
a0: 7c 63 00 74 cntlzd r3,r3
a4: 20 63 00 3f subfic r3,r3,63
a8: 4e 80 00 20 blr
ac: 60 00 00 00 nop
00000000000000b0 <.testfls64>:
b0: 7c 63 00 74 cntlzd r3,r3
b4: 20 63 00 40 subfic r3,r3,64
b8: 7c 63 07 b4 extsw r3,r3
bc: 4e 80 00 20 blr
Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
With the ffs() function as defined in arch/powerpc/include/asm/bitops.h,
GCC will not optimise the code in the case of a constant parameter, as
shown by the small example below.
int ffs_test(void)
{
return 4 << ffs(31);
}
c0012334 <ffs_test>:
c0012334: 39 20 00 01 li r9,1
c0012338: 38 60 00 04 li r3,4
c001233c: 7d 29 00 34 cntlzw r9,r9
c0012340: 21 29 00 20 subfic r9,r9,32
c0012344: 7c 63 48 30 slw r3,r3,r9
c0012348: 4e 80 00 20 blr
With this patch, the same function will compile as follows:
c0012334 <ffs_test>:
c0012334: 38 60 00 08 li r3,8
c0012338: 4e 80 00 20 blr
The same happens with __ffs()
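For illustration, a hedged sketch of the builtin-based versions:

static inline int ffs(int x)
{
        return __builtin_ffs(x);
}

/* Undefined for x == 0, like the generic __ffs(). */
static inline unsigned long __ffs(unsigned long x)
{
        return __builtin_ctzl(x);
}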
For non-constant calls, the generated code is the same, although it is
slightly different on 64 bits for ffs():
unsigned long test__ffs(unsigned long x)
{
return __ffs(x);
}
int testffs(int x)
{
return ffs(x);
}
On PPC32, before the patch:
0000003c <test__ffs>:
3c: 7d 23 00 d0 neg r9,r3
40: 7d 23 18 38 and r3,r9,r3
44: 7c 63 00 34 cntlzw r3,r3
48: 20 63 00 1f subfic r3,r3,31
4c: 4e 80 00 20 blr
00000050 <testffs>:
50: 7d 23 00 d0 neg r9,r3
54: 7d 23 18 38 and r3,r9,r3
58: 7c 63 00 34 cntlzw r3,r3
5c: 20 63 00 20 subfic r3,r3,32
60: 4e 80 00 20 blr
On PPC32, after the patch:
0000002c <test__ffs>:
2c: 7d 23 00 d0 neg r9,r3
30: 7d 23 18 38 and r3,r9,r3
34: 7c 63 00 34 cntlzw r3,r3
38: 20 63 00 1f subfic r3,r3,31
3c: 4e 80 00 20 blr
00000040 <testffs>:
40: 7d 23 00 d0 neg r9,r3
44: 7d 23 18 38 and r3,r9,r3
48: 7c 63 00 34 cntlzw r3,r3
4c: 20 63 00 20 subfic r3,r3,32
50: 4e 80 00 20 blr
On PPC64, before the patch:
0000000000000060 <.test__ffs>:
60: 7c 03 00 d0 neg r0,r3
64: 7c 03 18 38 and r3,r0,r3
68: 7c 63 00 74 cntlzd r3,r3
6c: 20 63 00 3f subfic r3,r3,63
70: 7c 63 07 b4 extsw r3,r3
74: 4e 80 00 20 blr
0000000000000080 <.testffs>:
80: 7c 03 00 d0 neg r0,r3
84: 7c 03 18 38 and r3,r0,r3
88: 7c 63 00 74 cntlzd r3,r3
8c: 20 63 00 40 subfic r3,r3,64
90: 7c 63 07 b4 extsw r3,r3
94: 4e 80 00 20 blr
On PPC64, after the patch:
0000000000000050 <.test__ffs>:
50: 7c 03 00 d0 neg r0,r3
54: 7c 03 18 38 and r3,r0,r3
58: 7c 63 00 74 cntlzd r3,r3
5c: 20 63 00 3f subfic r3,r3,63
60: 4e 80 00 20 blr
0000000000000070 <.testffs>:
70: 7c 03 00 d0 neg r0,r3
74: 7c 03 18 38 and r3,r0,r3
78: 7c 63 00 34 cntlzw r3,r3
7c: 20 63 00 20 subfic r3,r3,32
80: 7c 63 07 b4 extsw r3,r3
84: 4e 80 00 20 blr
(ffs() operates on an int so cntlzw is equivalent to cntlzd)
In addition, when reading the generated vmlinux, we can observe
that with the builtin functions, GCC sometimes efficiently spreads
the instructions within the generated functions, while the inline
assembly forces them to remain grouped together.
__builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h
Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
It often happens that several interrupts arrive simultaneously, for
instance when there is a double Ethernet attachment. With the current
implementation, we pay the cost of a kernel entry/exit for each
interrupt.
This patch introduces a loop in __do_irq() to handle all pending
interrupts at once before returning.
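For illustration, a sketch of the shape of the loop; the surrounding
__do_irq() bookkeeping is omitted and the exact structure is an
assumption:

unsigned int irq = ppc_md.get_irq();

while (irq) {
        generic_handle_irq(irq);
        /* Pick up any interrupt that arrived in the meantime without
         * paying another kernel entry/exit. */
        irq = ppc_md.get_irq();
}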
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
IRQ 0 is a valid HW interrupt, so get_irq() should return 0 when there is
no irq, instead of returning irq_linear_revmap(... ,0).
Fixes: f2a0bd3753dad ("[POWERPC] 8xx: powerpc port of core CPM PIC")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The 8xx has a dedicated exception for breakpoints, which directly calls
do_break().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Analysis of the assembly code shows that when using user_mode(regs), at
least the 'andi.' is redone every time, and the 'lwz ,132(r31)' most of
the time. With the new form, 'is_user' is mapped to cr4, and all further
uses of is_user result in just things like
'beq cr4,218 <do_page_fault+0x218>'.
Without the patch:
50: 81 1e 00 84 lwz r8,132(r30)
54: 71 09 40 00 andi. r9,r8,16384
58: 40 82 00 0c bne 64 <do_page_fault+0x64>
84: 81 3e 00 84 lwz r9,132(r30)
8c: 71 2a 40 00 andi. r10,r9,16384
90: 41 a2 01 64 beq 1f4 <do_page_fault+0x1f4>
d4: 81 3e 00 84 lwz r9,132(r30)
dc: 71 28 40 00 andi. r8,r9,16384
e0: 41 82 02 08 beq 2e8 <do_page_fault+0x2e8>
108: 81 3e 00 84 lwz r9,132(r30)
110: 71 28 40 00 andi. r8,r9,16384
118: 41 82 02 28 beq 340 <do_page_fault+0x340>
1e4: 81 3e 00 84 lwz r9,132(r30)
1e8: 71 2a 40 00 andi. r10,r9,16384
1ec: 40 82 01 68 bne 354 <do_page_fault+0x354>
228: 81 3e 00 84 lwz r9,132(r30)
22c: 71 28 40 00 andi. r8,r9,16384
230: 41 82 ff c4 beq 1f4 <do_page_fault+0x1f4>
288: 71 2a 40 00 andi. r10,r9,16384
294: 41 a2 fe 60 beq f4 <do_page_fault+0xf4>
50c: 81 3e 00 84 lwz r9,132(r30)
514: 71 2a 40 00 andi. r10,r9,16384
518: 40 a2 fc e0 bne 1f8 <do_page_fault+0x1f8>
534: 81 3e 00 84 lwz r9,132(r30)
53c: 71 2a 40 00 andi. r10,r9,16384
540: 41 82 fc b8 beq 1f8 <do_page_fault+0x1f8>
This patch creates a local variable called 'is_user' which contains the
result of user_mode(regs).
With the patch:
20: 81 03 00 84 lwz r8,132(r3)
48: 55 09 97 fe rlwinm r9,r8,18,31,31
58: 2e 09 00 00 cmpwi cr4,r9,0
5c: 40 92 00 0c bne cr4,68 <do_page_fault+0x68>
88: 41 b2 01 90 beq cr4,218 <do_page_fault+0x218>
d4: 40 92 01 d0 bne cr4,2a4 <do_page_fault+0x2a4>
120: 41 b2 00 f8 beq cr4,218 <do_page_fault+0x218>
138: 41 b2 ff a0 beq cr4,d8 <do_page_fault+0xd8>
1d4: 40 92 00 e0 bne cr4,2b4 <do_page_fault+0x2b4>
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The result of (trap == 0x400) is already in is_exec.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The function store_updates_sp() checks whether the faulting instruction
is a store updating r1. Therefore we can limit its calls
to store exceptions.
This patch is an improvement of commit a7a9dcd882a67 ("powerpc: Avoid
taking a data miss on every userspace instruction miss")
With the same microbenchmark app, run with 500 as argument, on an
MPC885 we get:
Before this patch: 152000 DTLB misses
After this patch: 147000 DTLB misses
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This function has not been used since commit 9494a1e8428ea
("powerpc: use generic fixmap.h").
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The check in hpte_find() should be < and not <= for PAGE_OFFSET
Signed-off-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We are running low on CPU feature bits, so we only want to use them when
it's really necessary.
CPU_FTR_SUBCORE is only used in one place, and only in C, so we don't
need it in order to make asm patching work. It can only be set on
"Power8" CPUs, which in practice means POWER8, POWER8E and POWER8NVL.
There are no plans to implement it on future CPUs, but if there ever
were we could retrofit it then.
Although KVM uses subcores, it never looks at the CPU feature; it either
looks at the ISA level or the threads_per_subcore value.
So drop the CPU feature and do a PVR check instead. Drop the device tree
"subcore" feature as we no longer support doing anything with it, and we
will drop it from skiboot too.
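For illustration, a sketch of the PVR-based test that replaces the
feature bit; the PVR_POWER8* constants are existing defines, while the
helper name is hypothetical:

static bool subcore_pvr_supported(void)
{
        unsigned int pvr_ver = PVR_VER(mfspr(SPRN_PVR));

        /* Subcores exist on POWER8, POWER8E and POWER8NVL only. */
        return pvr_ver == PVR_POWER8 || pvr_ver == PVR_POWER8E ||
               pvr_ver == PVR_POWER8NVL;
}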
Signed-off-by: Michael Ellerman <[email protected]>
|
|
When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property. Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
using the default node in the code inappropriately.
Fixes: 5f97b2a0d176 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Cc: [email protected] # v4.1+
Signed-off-by: Michael Bringmann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
of_mm_gpiochip_add_data() generates an oops due to a NULL pointer
dereference: it calls mm_gc->save_regs() before setting the data,
therefore ->save_regs() cannot use gpiochip_get_data().
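For illustration, a sketch of the kind of fix that follows: derive the
chip from the mm_gc pointer itself rather than via gpiochip_get_data().
The struct and field names used here are assumptions:

static void u8_gpio_save_regs(struct of_mm_gpio_chip *mm_gc)
{
        /* The gpiochip data is not set yet when ->save_regs() runs,
         * so recover our chip with container_of() instead. */
        struct u8_gpio_chip *u8_gc =
                container_of(mm_gc, struct u8_gpio_chip, mm_gc);

        u8_gc->data = in_8(mm_gc->regs);
}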
Fixes: 937daafca774 ("powerpc: simple-gpio: use gpiochip data pointer")
Cc: [email protected] # v4.7+
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
If a process dumps core while it has SPU contexts active then we have
code to also dump information about the SPU contexts.
Unfortunately it's been broken for 3 1/2 years, and we didn't notice. In
commit 7b1f4020d0d1 ("spufs: get rid of dump_emit() wrappers") the nread
variable was removed and rc used instead. That means when the loop exits
successfully, rc has the number of bytes read, but it's then used as the
return value for the function, which should return 0 on success.
So fix it by setting rc = 0 before returning in the success case.
Fixes: 7b1f4020d0d1 ("spufs: get rid of dump_emit() wrappers")
Signed-off-by: Michael Ellerman <[email protected]>
Acked-by: Jeremy Kerr <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|