aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2013-06-30powerpc/tm: Clear MSR RI in non-recoverable TM codeMichael Neuling1-2/+16
When we treclaim and trecheckpoint there's an unavoidable period when r1 will not be a valid kernel stack pointer. This patch clears the MSR recoverable interrupt (RI) bit over these regions to indicate we have an invalid kernel stack pointer. For treclaim, the region over which we clear MSR RI is larger than required to avoid the need for an extra costly mtmsrd. Thanks to Paulus for suggesting this change. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-30powerpc: Fix string instr. emulation for 32-bit processes on ppc64James Yang1-0/+4
String instruction emulation would erroneously result in a segfault if the upper bits of the EA are set and is so high that it fails access check. Truncate the EA to 32 bits if the process is 32-bit. Signed-off-by: James Yang <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-30powerpc/pci: Improve device hotplug initializationGuenter Roeck1-5/+12
Commit 37f02195b (powerpc/pci: fix PCI-e devices rescan issue on powerpc platform) fixes a problem with interrupt and DMA initialization on hot plugged devices. With this commit, interrupt and DMA initialization for hot plugged devices is handled in the pci device enable function. This approach has a couple of drawbacks. First, it creates two code paths for device initialization, one for hot plugged devices and another for devices known during the initial PCI scan. Second, the initialization code for hot plugged devices is only called when the device is enabled, ie typically in the probe function. Also, the platform specific setup code is called each time pci_enable_device() is called, not only once during device discovery, meaning it is actually called multiple times, once for devices discovered during the initial scan and again each time a driver is re-loaded. The visible result is that interrupt pins are only assigned to hot plugged devices when the device driver is loaded. Effectively this changes the PCI probe API, since pci_dev->irq and the device's dma configuration will now only be valid after pci_enable() was called at least once. A more subtle change is that platform specific PCI device setup is moved from device discovery into the driver's probe function, more specifically into the pci_enable_device() call. To fix the inconsistencies, add new function pcibios_add_device. Call pcibios_setup_device from pcibios_setup_bus_devices if device setup is not complete, and from pcibios_add_device if bus setup is complete. With this change, device setup code is moved back into device initialization, and called exactly once for both static and hot plugged devices. [ This also fixes a regression introduced by the above patch which causes dev->irq to be overwritten under some cirumstances after MSIs have been enabled for the device which leads to crashes due to the MSI core "hijacking" dev->irq to store the base MSI number and not the LSI. --BenH ] Cc: Yuanquan Chen <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Hiroo Matsumoto <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-29proc_powerpc: switch to fixed_size_llseek()Al Viro1-18/+2
Signed-off-by: Al Viro <[email protected]>
2013-06-25powerpc/eeh: Use interruptible sleep in keehdGavin Shan1-1/+2
To replace down() with down_interrutible() to avoid following warning: [c00000007ba7b710] [c000000000014410] .__switch_to+0x1b0/0x380 [c00000007ba7b7c0] [c0000000007b408c] .__schedule+0x3ec/0x970 [c00000007ba7ba50] [c0000000007b1f24] .schedule_timeout+0x1a4/0x2b0 [c00000007ba7bb30] [c0000000007b34a4] .__down+0xa4/0x104 [c00000007ba7bbf0] [c0000000000b9230] .down+0x60/0x70 [c00000007ba7bc80] [c0000000000336d0] .eeh_event_handler+0x70/0x190 [c00000007ba7bd30] [c0000000000b1a58] .kthread+0xe8/0xf0 [c00000007ba7be30] [c00000000000a05c] .ret_from_kernel_thread+0x5c/0x8 This also avoids keeping the load average up while doing nothing. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25powerpc/eeh: Remove eeh_mutexGavin Shan2-32/+1
Originally, eeh_mutex was introduced to protect the PE hierarchy tree and the attached EEH devices because EEH core was possiblly running with multiple threads to access the PE hierarchy tree. However, we now have only one kthread in EEH core. So we needn't the eeh_mutex and just remove it. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25powerpc/hw_brk: Fix clearing of extraneous IRQMichael Neuling1-0/+1
In 9422de3 "powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers" we changed the way we mark extraneous irqs with this: - info->extraneous_interrupt = !((bp->attr.bp_addr <= dar) && - (dar - bp->attr.bp_addr < bp->attr.bp_len)); + if (!((bp->attr.bp_addr <= dar) && + (dar - bp->attr.bp_addr < bp->attr.bp_len))) + info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; Unfortunately this is bogus as it never clears extraneous IRQ if it's already set. This correctly clears extraneous IRQ before possibly setting it. Signed-off-by: Michael Neuling <[email protected]> Reported-by: Edjunior Barbosa Machado <[email protected]> Cc: [email protected] Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25powerpc/hw_brk: Fix setting of length for exact mode breakpointsMichael Neuling1-1/+3
The smallest match region for both the DABR and DAWR is 8 bytes, so the kernel needs to filter matches when users want to look at regions smaller than this. Currently we set the length of PPC_BREAKPOINT_MODE_EXACT breakpoints to 8. This is wrong as in exact mode we should only match on 1 address, hence the length should be 1. This ensures that the kernel will filter out any exact mode hardware breakpoint matches on any addresses other than the requested one. Signed-off-by: Michael Neuling <[email protected]> Reported-by: Edjunior Barbosa Machado <[email protected]> Cc: [email protected] Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: Replace find_linux_pte with find_linux_pte_or_hugepteAneesh Kumar K.V2-3/+15
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly document why we don't need to handle transparent hugepages at callsites. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Allow to check fenced PHB proactivelyGavin Shan1-0/+60
It's meaningless to handle frozen PE if we already had fenced PHB. The patch intends to check the PHB state before checking PE. If the PHB has been put into fenced state, we need take care of that firstly. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: EEH core to handle special eventGavin Shan1-18/+110
On PowerNV platform, the EEH event caused by interrupt won't have binding PE. The patch enables EEH core to handle the special event. To avoid the current logic we have, The eeh_handle_event() is renamed to eeh_handle_normal_event(), and the eeh_handle_special_event() is introduced. The function eeh_handle_event() dispatches to above two functions according to the input parameter. Besides, new backend "next_error" added to eeh_ops and it's expected to have following return values: 4 - Dead IOC 3 - Dead PHB 2 - Fenced PHB 1 - Frozen PE 0 - No error found Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Export confirm_error_lockGavin Shan1-6/+4
An EEH event is created and queued to the event queue for each ingress EEH error. When there're mutiple EEH errors, we need serialize the process to keep consistent PE state (flags). The spinlock "confirm_error_lock" was introduced for the purpose. We'll inject EEH event upon error reporting interrupts on PowerNV platform. So we export the spinlock for that to use for consistent PE state. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Allow to purge EEH eventsGavin Shan1-0/+37
On PowerNV platform, we might run into the situation where subsequent events are duplicated events of former one, which is being processed. For the case, we need the function implemented by the patch to purge EEH events accordingly. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Trace time on first error for PEGavin Shan2-0/+32
We're not expecting that one specific PE got frozen for over 5 times in last hour. Otherwise, the PE will be removed from the system upon newly coming EEH errors. The patch introduces time stamp to trace the first error on specific PE in last hour and function to update that accordingly. Besides, the time stamp is recovered during PE hotplug path as we did for frozen count. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Single kthread to handle eventsGavin Shan2-47/+54
We possiblly have multiple kthreads running for multiple EEH errors (events) and use one spinlock to make the process of handling those EEH events serialized. That's unnecessary and the patch creates only one kthread, which is started during EEH core initialization time in eeh_init(). A new semaphore introduced to count the number of existing EEH events in the queue and the kthread waiting on the semaphore. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Delay EEH probe during hotplugGavin Shan1-1/+15
While doing EEH recovery, the PCI devices of the problematic PE should be removed and then added to the system again. During the so-called hotplug event, the PCI devices of the problematic PE will be probed through early/late phase. We would delay EEH probe on late point for PowerNV platform since the PCI device isn't available in early phase. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Refactor eeh_reset_pe_once()Gavin Shan1-1/+2
We shouldn't check that the returned PE status is exactly equal to (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE) but instead only check that they are both set. [benh: changelog] Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: EEH post initialization operationGavin Shan1-0/+11
The patch adds new EEH operation post_init. It's used to notify the platform that EEH core has completed the EEH probe. By that, PowerNV platform starts to use the services supplied by EEH functionality. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Make eeh_init() publicGavin Shan1-2/+20
For EEH on PowerNV platform, we will do EEH probe based on the real PCI devices. The PCI devices are available after PCI probe. So we have to call eeh_init() explicitly on PowerNV platform after PCI probe. The patch also does EEH probe for PowerNV platform in eeh_init(). Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Trace PCI bus from PEGavin Shan1-0/+17
There're several types of PEs can be supported for now: PHB, Bus and Device dependent PE. For PCI bus dependent PE, tracing the corresponding PCI bus from PE (struct eeh_pe) would make the code more efficient. The patch also enables the retrieval of PCI bus based on the PCI bus dependent PE. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Make eeh_pe_get() publicGavin Shan1-2/+2
While processing EEH event interrupt from P7IOC, we need function to retrieve the PE according to the indicated EEH device. The patch makes function eeh_pe_get() public so that other source files can call it for that purpose. Also, the patch fixes referring to wrong BDF (Bus/Device/Function) address while searching PE in function __eeh_pe_get(). Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Make eeh_phb_pe_get() publicGavin Shan1-1/+1
One of the possible cases indicated by P7IOC interrupt is fenced PHB. For that case, we need fetch the PE corresponding to the PHB and disable the PHB and all subordinate PCI buses/devices, recover from the fenced state and eventually enable the whole PHB. We need one function to fetch the PHB PE outside eeh_pe.c and the patch is going to make eeh_phb_pe_get() public for that purpose. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Move common part to kernel directoryGavin Shan9-1/+2906
The patch moves the common part of EEH core into arch/powerpc/kernel directory so that we needn't PPC_PSERIES while compiling POWERNV platform: * Move the EEH common part into arch/powerpc/kernel * Move the functions for PCI hotplug from pSeries platform to arch/powerpc/kernel/pci-hotplug.c * Move CONFIG_EEH from arch/powerpc/platforms/pseries/Kconfig to arch/powerpc/platforms/Kconfig * Adjust makefile accordingly Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/tm: Fix return of active 64bit signalsMichael Neuling1-3/+5
Currently we only restore signals which are transactionally suspended but it's possible that the transaction can be restored even when it's active. Most likely this will result in a transactional rollback by the hardware as the transaction will have been doomed by an earlier treclaim. The current code is a legacy of earlier kernel implementations which did software rollback of active transactions in the kernel. That code has now gone but we didn't correctly fix up this part of the signals code which still makes assumptions based on having software rollback. This changes the signal return code to always restore both contexts on 64 bit signal return. It also ensures that the MSR TM bits are properly restored from the signal context which they are not currently. Signed-off-by: Michael Neuling <[email protected]> cc: [email protected] (v3.9+) Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/tm: Fix return of 32bit rt signals to active transactionsMichael Neuling1-1/+1
Currently we only restore signals which are transactionally suspended but it's possible that the transaction can be restored even when it's active. Most likely this will result in a transactional rollback by the hardware as the transaction will have been doomed by an earlier treclaim. The current code is a legacy of earlier kernel implementations which did software rollback of active transactions in the kernel. That code has now gone but we didn't correctly fix up this part of the signals code which still makes assumptions based on having software rollback. This changes the signal return code to always restore both contexts on 32 bit rt signal return. Signed-off-by: Michael Neuling <[email protected]> cc: [email protected] (v3.9+) Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/tm: Fix restoration of MSR on 32bit signal returnMichael Neuling1-3/+6
Currently we clear out the MSR TM bits on signal return assuming that the signal should never return to an active transaction. This is bogus as the user may do this. It's most likely the transaction will be doomed due to a treclaim but that's a problem for the HW not the kernel. The current code is a legacy of earlier kernel implementations which did software rollback of active transactions in the kernel. That code has now gone but we didn't correctly fix up this part of the signals code which still makes the assumption that it must be returning to a suspended transaction. This pulls out both MSR TM bits from the user supplied context rather than just setting TM suspend. We pull out only the bits needed to ensure the user can't do anything dangerous to the MSR. Signed-off-by: Michael Neuling <[email protected]> cc: [email protected] (v3.9+) Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/tm: Fix 32 bit non-rt signalsMichael Neuling1-5/+25
Currently sys_sigreturn() is TM unaware. Therefore, if we take a 32 bit signal without SIGINFO (non RT) inside a transaction, on signal return we don't restore the signal frame correctly. This checks if the signal frame being restoring is an active transaction, and if so, it copies the additional state to ptregs so it can be restored. Signed-off-by: Michael Neuling <[email protected]> cc: [email protected] (v3.9+) Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/tm: Fix writing top half of MSR on 32 bit signalsMichael Neuling1-8/+21
The MSR TM controls are in the top 32 bits of the MSR hence on 32 bit signals, we stick the top half of the MSR in the checkpointed signal context so that the user can access it. Unfortunately, we don't currently write anything to the checkpointed signal context when coming in a from a non transactional process and hence the top MSR bits can contain junk. This updates the 32 bit signal handling code to always write something to the top MSR bits so that users know if the process is transactional or not and the kernel can use it on signal return. Signed-off-by: Michael Neuling <[email protected]> cc: [email protected] (v3.9+) Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/8xx: Remove 8xx specific "minimal FPU emulation"Benjamin Herrenschmidt1-21/+1
This is duplicated code from math-emu and implements such a small subset of the FPU (load/stores/fmr) that it's essentially pointless nowdays. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/math-emu: Allow math-emu to be used for HW FPUBenjamin Herrenschmidt1-1/+11
(Including 64-bit ones) This allow SW emulation by the kernel of optional instructions such as fsqrt which aren't implemented on some processors, and thus fixes some Fedora 19 issues such as Anaconda since the compiler is set to generate those by default on 64-bit. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc: Restore dbcr0 on user space exitBharat Bhushan2-7/+29
On BookE (Branch taken + Single Step) is as same as Branch Taken on BookS and in Linux we simulate BookS behavior for BookE as well. When doing so, in Branch taken handling we want to set DBCR0_IC but we update the current->thread->dbcr0 and not DBCR0. Now on 64bit the current->thread.dbcr0 (and other debug registers) is synchronized ONLY on context switch flow. But after handling Branch taken in debug exception if we return back to user space without context switch then single stepping change (DBCR0_ICMP) does not get written in h/w DBCR0 and Instruction Complete exception does not happen. This fixes using ptrace reliably on BookE-PowerPC lmbench latency test (lat_syscall) Results are (they varies a little on each run) 1) ./lat_syscall <action> /dev/shm/uImage action: Open read write stat fstat null Before: 3.8618 0.2017 0.2851 1.6789 0.2256 0.0856 After: 3.8580 0.2017 0.2851 1.6955 0.2255 0.0856 1) ./lat_syscall -P 2 -N 10 <action> /dev/shm/uImage action: Open read write stat fstat null Before: 4.1388 0.2238 0.3066 1.7106 0.2256 0.0856 After: 4.1413 0.2236 0.3062 1.7107 0.2256 0.0856 [ Slightly modified to avoid extra branch in the fast path on Book3S and fix build on all non-BookE 64-bit -- BenH ] Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/vfio: Enable on PowerNV platformAlexey Kardashevskiy1-0/+323
This initializes IOMMU groups based on the IOMMU configuration discovered during the PCI scan on POWERNV (POWER non virtualized) platform. The IOMMU groups are to be used later by the VFIO driver, which is used for PCI pass through. It also implements an API for mapping/unmapping pages for guest PCI drivers and providing DMA window properties. This API is going to be used later by QEMU-VFIO to handle h_put_tce hypercalls from the KVM guest. The iommu_put_tce_user_mode() does only a single page mapping as an API for adding many mappings at once is going to be added later. Although this driver has been tested only on the POWERNV platform, it should work on any platform which supports TCE tables. As h_put_tce hypercall is received by the host kernel and processed by the QEMU (what involves calling the host kernel again), performance is not the best - circa 220MB/s on 10Gb ethernet network. To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config option and configure VFIO as required. Cc: David Gibson <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc: Add a configuration option for early BootX/OpenFirmware debugAlistair Popple1-1/+1
Signed-off-by: Alistair Popple <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/prom: Scan reserved-ranges node for memory reservationsJeremy Kerr1-0/+35
Based on benh's proposal at https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-September/101237.html, this change provides support for reserving memory from the reserved-ranges node at the root of the device tree. We just call memblock_reserve on these ranges for now. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc: Remove the unneeded trigger of decrementer interrupt in ↵Kevin Hao1-2/+0
decrementer_check_overflow Previously in order to handle the edge sensitive decrementers, we choose to set the decrementer to 1 to trigger a decrementer interrupt when re-enabling interrupts. But with the rework of the lazy EE, we would replay the decrementer interrupt when re-enabling interrupts if a decrementer interrupt occurs with irq soft-disabled. So there is no need to trigger a decrementer interrupt in this case any more. Signed-off-by: Kevin Hao <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc: Move the single step enable code to a generic pathSuzuki K. Poulose1-19/+1
This patch moves the single step enable code used by kprobe to a generic routine header so that, it can be re-used by other code, in this case, uprobes. No functional changes. Signed-off-by: Suzuki K. Poulose <[email protected]> Cc: Ananth N Mavinakaynahalli <[email protected]> Cc: Kumar Gala <[email protected]> Cc: [email protected] Acked-by: Ananth N Mavinakayanahalli <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/kprobes: Do not disable External interrupts during single stepSuzuki K. Poulose1-5/+5
External/Decrement exceptions have lower priority than the Debug Exception. So, we don't have to disable the External interrupts before a single step. However, on BookE, Critical Input Exception(CE) has higher priority than a Debug Exception. Hence we mask them. Signed-off-by: Suzuki K. Poulose <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Ananth N Mavinakaynahalli <[email protected]> Cc: Kumar Gala <[email protected]> Cc: [email protected] Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-15powerpc: Fix missing/delayed calls to irq_workBenjamin Herrenschmidt1-1/+1
When replaying interrupts (as a result of the interrupt occurring while soft-disabled), in the case of the decrementer, we are exclusively testing for a pending timer target. However we also use decrementer interrupts to trigger the new "irq_work", which in this case would be missed. This change the logic to force a replay in both cases of a timer boundary reached and a decrementer interrupt having actually occurred while disabled. The former test is still useful to catch cases where a CPU having been hard-disabled for a long time completely misses the interrupt due to a decrementer rollover. CC: <[email protected]> [v3.4+] Signed-off-by: Benjamin Herrenschmidt <[email protected]> Tested-by: Steven Rostedt <[email protected]>
2013-06-15powerpc: Fix emulation of illegal instructions on PowerNV platformPaul Mackerras2-1/+11
Normally, the kernel emulates a few instructions that are unimplemented on some processors (e.g. the old dcba instruction), or privileged (e.g. mfpvr). The emulation of unimplemented instructions is currently not working on the PowerNV platform. The reason is that on these machines, unimplemented and illegal instructions cause a hypervisor emulation assist interrupt, rather than a program interrupt as on older CPUs. Our vector for the emulation assist interrupt just calls program_check_exception() directly, without setting the bit in SRR1 that indicates an illegal instruction interrupt. This fixes it by making the emulation assist interrupt set that bit before calling program_check_interrupt(). With this, old programs that use no-longer implemented instructions such as dcba now work again. CC: <[email protected]> Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-15powerpc: Fix stack overflow crash in resume_kernel when ftracingMichael Ellerman1-2/+2
It's possible for us to crash when running with ftrace enabled, eg: Bad kernel stack pointer bffffd12 at c00000000000a454 cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40] pc: c00000000000a454: resume_kernel+0x34/0x60 lr: c00000000000335c: performance_monitor_common+0x15c/0x180 sp: bffffd12 msr: 8000000000001032 dar: bffffd12 dsisr: 42000000 If we look at current's stack (paca->__current->stack) we see it is equal to c0000002ecab0000. Our stack is 16K, and comparing to paca->kstack (c0000002ecab3e30) we can see that we have overflowed our kernel stack. This leads to us writing over our struct thread_info, and in this case we have corrupted thread_info->flags and set _TIF_EMULATE_STACK_STORE. Dumping the stack we see: 3:mon> t c0000002ecab0000 [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70 [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180 --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30 [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable) [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90 [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34 [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300 [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable) [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280 [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40 [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 ... and so on __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry path. At that point the irq state is not consistent, ie. interrupts are hard disabled (by the exception entry), but the paca soft-enabled flag may be out of sync. This leads to the local_irq_restore() in trace_graph_entry() actually enabling interrupts, which we do not want. Because we have not yet reprogrammed the decrementer we immediately take another decrementer exception, and recurse. The fix is twofold. Firstly make sure we call DISABLE_INTS before calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles the irq state in the paca with the hardware, making it safe again to call local_irq_save/restore(). Although that should be sufficient to fix the bug, we also mark the runlatch routines as notrace. They are called very early in the exception entry and we are asking for trouble tracing them. They are also fairly uninteresting and tracing them just adds unnecessary overhead. [ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873 "powerpc: Rework runlatch code" by myself --BenH ] CC: <[email protected]> [v3.4+] Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-14Merge branch 'pci/jiang-bus-lock-v3' into nextBjorn Helgaas1-2/+1
* pci/jiang-bus-lock-v3: PCI: Return early on allocation failures to unindent mainline code PCI: Simplify IOV implementation and fix reference count races PCI: Drop redundant setting of bus->is_added in virtfn_add_bus() unicore32/PCI: Remove redundant call of pci_bus_add_devices() m68k/PCI: Remove redundant call of pci_bus_add_devices() PCI: Rename pci_release_bus_bridge_dev() to pci_release_host_bridge_dev() PCI: Fix refcount issue in pci_create_root_bus() error recovery path ia64/PCI: Clean up pci_scan_root_bus() usage PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) PCI: Introduce pci_alloc_dev(struct pci_bus*) to replace alloc_pci_dev() PCI: Introduce pci_bus_{get|put}() to manage PCI bus reference count Conflicts: drivers/pci/probe.c
2013-06-12ibmebus: convert of_platform_driver to platform_driverRob Herring1-12/+10
ibmebus is the last remaining user of of_platform_driver and the conversion to a regular platform driver is trivial. Signed-off-by: Rob Herring <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Tested-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Grant Likely <[email protected]>
2013-06-10powerpc: Partial revert of "Context switch more PMU related SPRs"Michael Ellerman1-28/+0
In commit 59affcd I added context switching of more PMU SPRs, because they are potentially exposed to userspace on Power8. However despite me being a smart arse in the commit message it's actually not correct. In particular it interacts badly with a global perf record. We will have to do something more complicated, but that will have to wait for 3.11. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-10powerpc/hw_breakpoints: Add DABRX cpu feature to fix 32-bit regressionMichael Neuling1-1/+2
When introducing support for DABRX in 4474ef0, we broke older 32-bit CPUs that don't have that register. Some CPUs have a DABR but not DABRX. Configuration are: - No 32bit CPUs have DABRX but some have DABR. - POWER4+ and below have the DABR but no DABRX. - 970 and POWER5 and above have DABR and DABRX. - POWER8 has DAWR, hence no DABRX. This introduces CPU_FTR_DABRX and sets it on appropriate CPUs. We use the top 64 bits for CPU FTR bits since only 64 bit CPUs have this. Processors that don't have the DABRX will still work as they will fall back to software filtering these breakpoints via perf_exclude_event(). Signed-off-by: Michael Neuling <[email protected]> Reported-by: "Gorelik, Jacob (335F)" <[email protected]> cc: [email protected] (v3.9 only) Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-10powerpc/power8: Update denormalization handlerMichael Neuling1-0/+10
POWER8 can take a denormalisation exception on any VSX registers. This does the extra 32 VSX registers we don't currently handle. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-10powerpc/pseries: Simplify denormalization handlerMichael Neuling1-64/+16
The following simplifies the denorm code by using macros to generate the long stream of almost identical instructions. This patch results in no changes to the output binary, but removes a lot of lines of code. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-10powerpc/power8: Fix oprofile and perfMichael Neuling1-4/+4
In 2ac6f42 powerpc/cputable: Fix oprofile_cpu_type on power8 we broke all power8 hw events. This reverts this change and uses oprofile_type instead. Perf now works on POWER8 again and oprofile will revert to using timers on POWER8. Kudos to mpe this fix. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-10powerpc/pci: Check the bus address instead of resource address in ↵Kevin Hao1-1/+3
pcibios_fixup_resources If a BAR has the value of 0, we would assume that it is unset yet and then mark the resource as unset and would reassign it later. But after commit 6c5705fe (powerpc/PCI: get rid of device resource fixups) the pcibios_fixup_resources is invoked after the bus address was translated to linux resource. So the value of res->start is resource address. And since the resource and bus address may be different, we should translate it to the bus address before doing the check. Signed-off-by: Kevin Hao <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-05PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus)Gu Zheng1-2/+1
Use the new pci_alloc_dev(bus) to replace the existing using of alloc_pci_dev(void). [bhelgaas: drop pci_bus ref later in pci_release_dev()] Signed-off-by: Gu Zheng <[email protected]> Signed-off-by: Jiang Liu <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: David Airlie <[email protected]> Cc: Neela Syam Kolli <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Andrew Morton <[email protected]>
2013-06-01powerpc/cputable: Fix typo on P7+ cputable entryWill Schmidt1-1/+1
Fix a typo in setting COMMON_USER2_POWER7 bits to .cpu_user_features2 cpu specs table. Signed-off-by: Will Schmidt <[email protected]> Acked-by: Michael Neuling <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>