aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2015-10-12powerpc/mm: Disable hugepd for 64K page size.Aneesh Kumar K.V2-0/+34
After commit e2b3d202d1dba8f3546ed28224ce485bc50010be ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format"), we don't need to support is_hugepd() for 64K page size. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-09powerpc/powernv: Panic on unhandled Machine CheckDaniel Axtens1-2/+5
All unrecovered machine check errors on PowerNV should cause an immediate panic. There are 2 reasons that this is the right policy: it's not safe to continue, and we're already trying to reboot. Firstly, if we go through the recovery process and do not successfully recover, we can't be sure about the state of the machine, and it is not safe to recover and proceed. Linux knows about the following sources of Machine Check Errors: - Uncorrectable Errors (UE) - Effective - Real Address Translation (ERAT) - Segment Lookaside Buffer (SLB) - Translation Lookaside Buffer (TLB) - Unknown/Unrecognised In the SLB, TLB and ERAT cases, we can further categorise these as parity errors, multihit errors or unknown/unrecognised. We can handle SLB errors by flushing and reloading the SLB. We can handle TLB and ERAT multihit errors by flushing the TLB. (It appears we may not handle TLB and ERAT parity errors: I will investigate further and send a followup patch if appropriate.) This leaves us with uncorrectable errors. Uncorrectable errors are usually the result of ECC memory detecting an error that it cannot correct, but they also crop up in the context of PCI cards failing during DMA writes, and during CAPI error events. There are several types of UE, and there are 3 places a UE can occur: Skiboot, the kernel, and userspace. For Skiboot errors, we have the facility to make some recoverable. For userspace, we can simply kill (SIGBUS) the affected process. We have no meaningful way to deal with UEs in kernel space or in unrecoverable sections of Skiboot. Currently, these unrecovered UEs fall through to machine_check_expection() in traps.c, which calls die(), which OOPSes and sends SIGBUS to the process. This sometimes allows us to stumble onwards. For example we've seen UEs kill the kernel eehd and khugepaged. However, the process killed could have held a lock, or it could have been a more important process, etc: we can no longer make any assertions about the state of the machine. Similarly if we see a UE in skiboot (and again we've seen this happen), we're not in a position where we can make any assertions about the state of the machine. Likewise, for unknown or unrecognised errors, we're not able to say anything about the state of the machine. Therefore, if we have an unrecovered MCE, the most appropriate thing to do is to panic. The second reason is that since e784b6499d9c ("powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine check errors."), we attempt a special OPAL reboot on an unhandled MCE. This is so the hardware can record error data for later debugging. The comments in that commit assert that we are heading down the panic path anyway. At the moment this is not always true. With UEs in kernel space, for instance, they are marked as recoverable by the hardware, so if the attempt to reboot failed (e.g. old Skiboot), we wouldn't panic() but would simply die() and OOPS. It doesn't make sense to be staggering on if we've just tried to reboot: we should panic(). Explicitly panic() on unrecovered MCEs on PowerNV. Update the comments appropriately. This fixes some hangs following EEH events on cxlflash setups. Signed-off-by: Daniel Axtens <[email protected]> Reviewed-by: Andrew Donnellan <[email protected]> Reviewed-by: Ian Munsie <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-09powerpc/pseries/hvcserver: don't memset pi_buff if it is nullColin Ian King1-1/+1
pi_buff is being memset before it is sanity checked. Move the memset after the null pi_buff sanity check to avoid an oops. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-09powerpc: Fix _ALIGN_* errors due to type difference.Aneesh Kumar K.V2-4/+5
This avoid errors like unsigned int usize = 1 << 30; int size = 1 << 30; unsigned long addr = 64UL << 30 ; value = _ALIGN_DOWN(addr, usize); -> 0 value = _ALIGN_DOWN(addr, size); -> 0x1000000000 Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-09powerpc: Fix checkstop in native_hpte_clear() with lockdepCyril Bur2-14/+18
native_hpte_clear() is called in real mode from two places: - Early in boot during htab initialisation if firmware assisted dump is active. - Late in the kexec path. In both contexts there is no need to disable interrupts are they are already disabled. Furthermore, locking around the tlbie() is only required for pre POWER5 hardware. On POWER5 or newer hardware concurrent tlbie()s work as expected and on pre POWER5 hardware concurrent tlbie()s could result in deadlock. This code would only be executed at crashdump time, during which all bets are off, concurrent tlbie()s are unlikely and taking locks is unsafe therefore the best course of action is to simply do nothing. Concurrent tlbie()s are not possible in the first case as secondary CPUs have not come up yet. Signed-off-by: Cyril Bur <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-08arch/powerpc: provide zero_bytemask() for big-endianChris Metcalf1-0/+5
For some reason, only the little-endian flavor of powerpc provided the zero_bytemask() implementation. Reported-by: Michal Sojka <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Chris Metcalf <[email protected]>
2015-10-08Merge branch 'perf/urgent' into perf/core, before pulling new changesIngo Molnar1-0/+1
Signed-off-by: Ingo Molnar <[email protected]>
2015-10-07powerpc: dts: p1022si: Add fsl,wake-on-filer for eTSECClaudiu Manoil1-0/+2
Enable the "wake-on-filer" (aka. wake on user defined packet) wake on lan capability for the eTSEC ethernet nodes. Cc: Li Yang <[email protected]> Cc: Zhao Chenhui <[email protected]> Signed-off-by: Claudiu Manoil <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-06word-at-a-time.h: fix some Kbuild filesChris Metcalf1-1/+0
arch/tile added word-at-a-time.h after the patch that added generic-y entries; the generic-y entry is now stale. arch/h8300 is newer than the generic-y patch for word-at-a-time.h, and needs a generic-y entry. arch/powerpc seems to have gotten a generic-y entry by mistake in the first patch; this change removes it. Signed-off-by: Chris Metcalf <[email protected]>
2015-10-06powerpc/kexec: Wait 1s for secondaries to enter OPALSamuel Mendoza-Jonas1-8/+13
Always include a timeout when waiting for secondary cpus to enter OPAL in the kexec path, rather than only when crashing. Signed-off-by: Samuel Mendoza-Jonas <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-06powerpc/8xx: Shorten irq_chip name for the SIUChristophe Leroy1-1/+1
show_interrupts() expects the irq_chip name to be max 8 characters otherwise everything get misaligned # cat /proc/interrupts CPU0 17: 0 CPM PIC 0 Level error 19: 0 MPC8XX SIU 15 Level tbint 20: 90 CPM PIC 4 Level cpm_uart 38: 29746 MPC8XX SIU 5 Level fs_enet-mac 39: 0 MPC8XX SIU 7 Level fs_enet-mac 47: 401 CPM PIC 5 Level fsl_spi 68: 1 MPC8XX SIU 2 Level phy_interrupt, phy_interrupt, phy_interrupt LOC: 7225485 Local timer interrupts for timer event device LOC: 9 Local timer interrupts for others SPU: 0 Spurious interrupts PMI: 0 Performance monitoring interrupts MCE: 0 Machine check exceptions Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-05powerpc/msi: Free the bitmap if it was slab allocatedDenis Kirjanov2-4/+13
During the MSI bitmap test on boot kmemleak spews the following trace: unreferenced object 0xc00000016e86c900 (size 64): comm "swapper/0", pid 1, jiffies 4294893173 (age 518.024s) hex dump (first 32 bytes): 00 00 01 ff 7f ff 7f 37 00 00 00 00 00 00 00 00 .......7........ ff ff ff ff ff ff ff ff 01 ff ff ff ff ff ff ff ................ backtrace: [<c00000000003eebc>] .zalloc_maybe_bootmem+0x3c/0x380 [<c000000000042d6c>] .msi_bitmap_alloc+0x3c/0xb0 [<c000000000a9aff8>] .msi_bitmap_selftest+0x30/0x2b4 [<c0000000000090f4>] .do_one_initcall+0xd4/0x270 [<c000000000a8e250>] .kernel_init_freeable+0x1a0/0x280 [<c000000000009b5c>] .kernel_init+0x1c/0x120 [<c000000000007fbc>] .ret_from_kernel_thread+0x58/0x9c Add a flag to msi_bitmap for tracking allocations from slab and memblock so we can properly free/handle memory in msi_bitmap_free(). Signed-off-by: Denis Kirjanov <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> [mpe: Reword changelog & use bitmap_from_slab in the if] Signed-off-by: Michael Ellerman <[email protected]>
2015-10-05powerpc/pseries: re-use code from of_helpers moduleAndy Shevchenko1-26/+5
The derive_parent() has similar semantics to what we have in newly introduced of_helpers module. The replacement reduces code base and propagates the actual error code to the caller. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-05powerpc/pseries: handle nodes without '/'Andy Shevchenko1-3/+3
In case we have node without '/' strrchr() returns NULL which might lead to crash. Replace strrchr() by kbasename() and modify condition to avoid such behaviour. Suggested-by: Segher Boessenkool <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-05powerpc/pseries: replace kmalloc + strlcpyAndy Shevchenko1-2/+1
The helper kstrndup() will do the same in one line. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-05powerpc/pseries: fix a potential memory leakAndy Shevchenko1-4/+2
In case we have a full node name like /foo/bar and /foo is not found the parent_path left unfreed. So, free a memory before return to a caller. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-05powerpc/pseries: extract of_helpers moduleAndy Shevchenko4-32/+49
Extract a new module to share the code between other modules. There is no functional change. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-04Merge branch 'strscpy' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull strscpy string copy function implementation from Chris Metcalf. Chris sent this during the merge window, but I waffled back and forth on the pull request, which is why it's going in only now. The new "strscpy()" function is definitely easier to use and more secure than either strncpy() or strlcpy(), both of which are horrible nasty interfaces that have serious and irredeemable problems. strncpy() has a useless return value, and doesn't NUL-terminate an overlong result. To make matters worse, it pads a short result with zeroes, which is a performance disaster if you have big buffers. strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking the insane NUL padding, but having a differently broken return value which returns the original length of the source string. Which means that it will read characters past the count from the source buffer, and you have to trust the source to be properly terminated. It also makes error handling fragile, since the test for overflow is unnecessarily subtle. strscpy() avoids both these problems, guaranteeing the NUL termination (but not excessive padding) if the destination size wasn't zero, and making the overflow condition very obvious by returning -E2BIG. It also doesn't read past the size of the source, and can thus be used for untrusted source data too. So why did I waffle about this for so long? Every time we introduce a new-and-improved interface, people start doing these interminable series of trivial conversion patches. And every time that happens, somebody does some silly mistake, and the conversion patch to the improved interface actually makes things worse. Because the patch is mindnumbing and trivial, nobody has the attention span to look at it carefully, and it's usually done over large swatches of source code which means that not every conversion gets tested. So I'm pulling the strscpy() support because it *is* a better interface. But I will refuse to pull mindless conversion patches. Use this in places where it makes sense, but don't do trivial patches to fix things that aren't actually known to be broken. * 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tile: use global strscpy() rather than private copy string: provide strscpy() Make asm/word-at-a-time.h available on all architectures
2015-10-03ebpf: migrate bpf_prog's flags to bitfieldDaniel Borkmann1-1/+1
As we need to add further flags to the bpf_prog structure, lets migrate both bools to a bitfield representation. The size of the base structure (excluding insns) remains unchanged at 40 bytes. Add also tags for the kmemchecker, so that it doesn't throw false positives. Even in case gcc would generate suboptimal code, it's not being accessed in performance critical paths. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-02powerpc/nvram: Fix function name in some errors messages.Christophe Jaillet1-7/+7
'nvram_create_os_partition' should be 'nvram_create_partition'. Use __func__ to have it right, as done elsewhere in this file. Signed-off-by: Christophe Jaillet <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-02powerpc/nvram: Add missing kfree in error pathChristophe Jaillet1-0/+1
If 'nvram_write_header' fails, then 'new_part' should be freed, otherwise, there is a memory leak. Signed-off-by: Christophe Jaillet <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-01powerpc: Add ppc64le_defconfigMichael Ellerman1-0/+4
Based directly on ppc64_defconfig using merge_config. Signed-off-by: Michael Ellerman <[email protected]>
2015-10-01powerpc/mm: Add virt_to_pfn and use this instead of opencodingAneesh Kumar K.V1-2/+3
This add helper virt_to_pfn and remove the opencoded usage of the same. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-01powerpc/vdso: Avoid link stack corruption in __get_datapage()Michael Neuling2-10/+14
powerpc has a link register (lr) used for calling functions. We "bl <func>" to call a function, and "blr" to return back to the call site. The lr is only a single register, so if we call another function from inside this function (ie. nested calls), software must save away the lr on the software stack before calling the new function. Before returning (ie. before the "blr"), the lr is restored by software from the software stack. This makes branch prediction quite difficult for the processor as it will only know the branch target just before the "blr". To help with this, modern powerpc processors keep a (non-architected) hardware stack of lr called a "link stack". When a "bl <func>" is run, the lr is pushed onto this stack. When a "blr" is called, the branch predictor pops the lr value from the top of the link stack, and uses it to predict the branch target. Hence the processor pipeline knows a lot earlier the branch target. This works great but there are some cases where you call "bl" but without a matching "blr". Once such case is when trying to determine the program counter (which can't be read directly). Here you "bl+4; mflr" to get the program counter. If you do this, the link stack will get out of sync with reality, causing the branch predictor to mis-predict subsequent function returns. To avoid this, modern micro-architectures have a special case of bl. Using the form "bcl 20,31,+4", ensures the processor doesn't push to the link stack. The 32 and 64 bit variants of __get_datapage() use a "bl; mflr" to determine the loaded address of the VDSO. The current versions of these attempt to use this special bl variant. Unfortunately they use +8 rather than the required +4. Hence the current code results in the link stack getting out of sync with reality and hence the resulting performance degradation. This patch moves it to bcl+4 by moving __kernel_datapage_offset out of __get_datapage(). With this patch, running a gettimeofday() (which uses __get_datapage()) microbenchmark we get a decent bump in performance on POWER7/8. For the benchmark in tools/testing/selftests/powerpc/benchmarks/gettimeofday.c POWER8: 64bit gets ~4% improvement 32bit gets ~9% improvement POWER7: 64bit gets ~7% improvement Signed-off-by: Michael Neuling <[email protected]> Reported-by: Aaron Sawdey <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-01powerpc/slb: Use a local to avoid multiple calls to get_slb_shadow()Michael Ellerman1-5/+5
For no reason other than it looks ugly. Signed-off-by: Michael Ellerman <[email protected]>
2015-10-01powerpc/slb: Define an enum for the bolted indexesAnshuman Khandual1-21/+26
This patch defines macros for the three bolted SLB indexes we use. Switch the functions that take the indexes as an argument to use the enum. Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-10-01powerpc/vdso: Emit GNU & SysV hashesMichael Ellerman2-2/+2
Andy Lutomirski says: Some dynamic loaders may be slightly faster if a GNU hash is available. This is unlikely to have any measurable effect on the time it takes to resolve vdso symbols (since there are so few of them). In some contexts, it can be a win for a different reason: if every DSO has a GNU hash section, then libc can avoid calculating SysV hashes at all. Both musl and glibc appear to have this optimization. Signed-off-by: Michael Ellerman <[email protected]>
2015-09-30powerpc/ps3: Remove unused os_area_db_id_video_modeMichael Ellerman1-5/+0
This struct is unused, which is now a build error with gcc 6: error: 'os_area_db_id_video_mode' defined but not used There doesn't seem to be any good reason to keep it around so remove it, it's in the history if anyone needs it. Signed-off-by: Michael Ellerman <[email protected]>
2015-09-29powerpc/ps3: Refresh ps3_defconfigGeoff Levand1-5/+0
Refresh and remove obsolete CONFIG_EXT3_FS. Signed-off-by: Geoff Levand <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-09-29powerpc: Kconfig: remove BE-only platforms from LE kernel buildBoqun Feng5-6/+6
Currently, little endian is only supported on powernv and pseries, however, Kconfigs still allow us to include other platforms in a LE kernel, this may result in space wasting or even build error if some BE-only platforms always assume they are built for a BE kernel. So just modify the Kconfigs of BE-only platforms to remove them from being built for a LE kernel. For 32bit only platforms, nothing needs to be done, because CPU_LITTLE_ENDIAN depends on PPC64. For 64bit supported platforms, add CPU_BIG_ENDIAN to dependencies explicitly, so that these platforms will be disabled for LE [Suggested-by: Cédric Le Goater <[email protected]>]. Signed-off-by: Boqun Feng <[email protected]> Acked-by: Geoff Levand <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-09-29powerpc/configs: Re-enable CONFIG_SCSI_DHMichael Ellerman2-2/+2
Commit 086b91d052eb ("scsi_dh: integrate into the core SCSI code") changed CONFIG_SCSI_DH from tristate to bool. Our defconfigs have CONFIG_SCSI_DH=m, which the kconfig machinery warns us is invalid, but instead of converting it to =y it leaves it unset. This means we loose the CONFIG_SCSI_DH code and everything that depends on it. So convert the values in the defconfigs to =y. Fixes: 086b91d052eb ("scsi_dh: integrate into the core SCSI code") Signed-off-by: Michael Ellerman <[email protected]>
2015-09-28Merge branch 'linus' into perf/core, to pick up fixes before applying new ↵Ingo Molnar53-70/+131
changes Signed-off-by: Ingo Molnar <[email protected]>
2015-09-25Merge tag 'pci-v4.3-fixes-1' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "These are fixes for things we merged for v4.3 (VPD, MSI, and bridge window management), and a new Renesas R8A7794 SoC device ID. Details: Resource management: - Revert pci_read_bridge_bases() unification (Bjorn Helgaas) - Clear IORESOURCE_UNSET when clipping a bridge window (Bjorn Helgaas) MSI: - Fix MSI IRQ domains for VFs on virtual buses (Alex Williamson) Renesas R-Car host bridge driver: - Add R8A7794 support (Sergei Shtylyov) Miscellaneous: - Fix devfn for VPD access through function 0 (Alex Williamson) - Use function 0 VPD only for identical functions (Alex Williamson)" * tag 'pci-v4.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: rcar: Add R8A7794 support PCI: Use function 0 VPD for identical functions, regular VPD for others PCI: Fix devfn for VPD access through function 0 PCI/MSI: Fix MSI IRQ domains for VFs on virtual buses PCI: Clear IORESOURCE_UNSET when clipping a bridge window PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code"
2015-09-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-1/+13
Pull KVM fixes from Paolo Bonzini: "AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC bug fixes too" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: disable halt_poll_ns as default for s390x KVM: x86: fix off-by-one in reserved bits check KVM: x86: use correct page table format to check nested page table reserved bits KVM: svm: do not call kvm_set_cr0 from init_vmcb KVM: x86: trap AMD MSRs for the TSeg base and mask KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store() KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs kvm: svm: reset mmu on VCPU reset
2015-09-25KVM: disable halt_poll_ns as default for s390xDavid Hildenbrand1-0/+1
We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2015-09-21powerpc: Wire up sys_membarrier()Michael Ellerman3-1/+3
The selftest passes on 64-bit LE & BE, and 32-bit. Signed-off-by: Michael Ellerman <[email protected]>
2015-09-21KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()Thomas Huth1-0/+6
Access to the kvm->buses (like with the kvm_io_bus_read() and -write() functions) has to be protected via the kvm->srcu lock. The kvmppc_h_logical_ci_load() and -store() functions are missing this lock so far, so let's add it there, too. This fixes the problem that the kernel reports "suspicious RCU usage" when lock debugging is enabled. Cc: [email protected] # v4.1+ Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1 Signed-off-by: Thomas Huth <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2015-09-21KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exitGautham R. Shenoy1-0/+1
In guest_exit_cont we call kvmhv_commence_exit which expects the trap number as the argument. However r3 doesn't contain the trap number at this point and as a result we would be calling the function with a spurious trap number. Fix this by copying r12 into r3 before calling kvmhv_commence_exit as r12 contains the trap number. Cc: [email protected] # v4.1+ Fixes: eddb60fb1443 Signed-off-by: Gautham R. Shenoy <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2015-09-21KVM: PPC: Book3S HV: Fix handling of interrupted VCPUsPaul Mackerras1-1/+5
This fixes a bug which results in stale vcore pointers being left in the per-cpu preempted vcore lists when a VM is destroyed. The result of the stale vcore pointers is usually either a crash or a lockup inside collect_piggybacks() when another VM is run. A typical lockup message looks like: [ 472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039] [ 472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv] [ 472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49 [ 472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000 [ 472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130 [ 472.162337] REGS: c000001e41bff520 TRAP: 0901 Not tainted (4.2.0-kvm+) [ 472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24848844 XER: 00000000 [ 472.162588] CFAR: c00000000096b0ac SOFTE: 1 GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001 GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821 GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740 GPR12: c000000000111130 c00000000fdae400 [ 472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130 [ 472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130 [ 472.163179] Call Trace: [ 472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable) [ 472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90 [ 472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv] [ 472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv] [ 472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm] [ 472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm] [ 472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm] [ 472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0 [ 472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0 [ 472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0 [ 472.164060] Instruction dump: [ 472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2 [ 472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070 The bug is that kvmppc_run_vcpu does not correctly handle the case where a vcpu task receives a signal while its guest vcpu is executing in the guest as a result of being piggy-backed onto the execution of another vcore. In that case we need to wait for the vcpu to finish executing inside the guest, and then remove this vcore from the preempted vcores list. That way, we avoid leaving this vcpu's vcore on the preempted vcores list when the vcpu gets interrupted. Fixes: ec2571650826 Reported-by: Thomas Huth <[email protected]> Tested-by: Thomas Huth <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2015-09-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-0/+3
Pull KVM fixes from Paolo Bonzini: "Mostly stable material, a lot of ARM fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) sched: access local runqueue directly in single_task_running arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' arm64: KVM: Remove all traces of the ThumbEE registers arm: KVM: Disable virtual timer even if the guest is not using it arm64: KVM: Disable virtual timer even if the guest is not using it arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources KVM: s390: Replace incorrect atomic_or with atomic_andnot arm: KVM: Fix incorrect device to IPA mapping arm64: KVM: Fix user access for debug registers KVM: vmx: fix VPID is 0000H in non-root operation KVM: add halt_attempted_poll to VCPU stats kvm: fix zero length mmio searching kvm: fix double free for fast mmio eventfd kvm: factor out core eventfd assign/deassign logic kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd KVM: make the declaration of functions within 80 characters KVM: arm64: add workaround for Cortex-A57 erratum #852523 KVM: fix polling for guest halt continued even if disable it arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores arm64: KVM: set {v,}TCR_EL2 RES1 bits ...
2015-09-18Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds33-54/+48
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "This is a rather large update post rc1 due to the final steps of cleanups and API changes which had to wait for the preparatory patches to hit your tree. - Regression fixes for ARM GIC irqchips - Regression fixes and lockdep anotations for renesas irq chips - The leftovers of the cleanup and preparatory patches which have been ignored by maintainers - Final conversions of the newly merged users of obsolete APIs - Final removal of obsolete APIs - Final removal of ARM artifacts which had been introduced during the conversion of ARM to the generic interrupt code. - Final split of the irq_data into chip specific and common data to reflect the needs of hierarchical irq domains. - Treewide removal of the first argument of interrupt flow handlers, i.e. the irq number, which is not used by the majority of handlers and simple to retrieve from the other argument the irq descriptor. - A few comment updates and build warning fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) arm64: Remove ununsed set_irq_flags ARM: Remove ununsed set_irq_flags sh: Kill off set_irq_flags usage irqchip: Kill off set_irq_flags usage gpu/drm: Kill off set_irq_flags usage genirq: Remove irq argument from irq flow handlers genirq: Move field 'msi_desc' from irq_data into irq_common_data genirq: Move field 'affinity' from irq_data into irq_common_data genirq: Move field 'handler_data' from irq_data into irq_common_data genirq: Move field 'node' from irq_data into irq_common_data irqchip/gic-v3: Use IRQD_FORWARDED_TO_VCPU flag irqchip/gic: Use IRQD_FORWARDED_TO_VCPU flag genirq: Provide IRQD_FORWARDED_TO_VCPU status flag genirq: Simplify irq_data_to_desc() genirq: Remove __irq_set_handler_locked() pinctrl/pistachio: Use irq_set_handler_locked gpio: vf610: Use irq_set_handler_locked powerpc/mpc8xx: Use irq_set_handler_locked() powerpc/ipic: Use irq_set_handler_locked() powerpc/cpm2: Use irq_set_handler_locked() ...
2015-09-18Merge tag 'powerpc-4.3-2' of ↵Linus Torvalds14-14/+58
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix 32-bit TCE table init in kdump kernel from Nish - Fix kdump with non-power-of-2 crashkernel= from Nish - Abort cxl_pci_enable_device_hook() if PCI channel is offline from Andrew - Fix to release DRC when configure_connector() fails from Bharata - Wire up sys_userfaultfd() - Fix race condition in tearing down MSI interrupts from Paul - Fix unbalanced pci_dev_get() in cxl_probe() from Daniel - Fix cxl build failure due to -Wunused-variable gcc behaviour change from Ian - Tell the toolchain to use ABI v2 when building an LE boot wrapper from Benh - Fix THP to recompute hash value after a failed update from Aneesh - 32-bit memcpy/memset: only use dcbz once cache is enabled from Christophe * tag 'powerpc-4.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc32: memset: only use dcbz once cache is enabled powerpc32: memcpy: only use dcbz once cache is enabled powerpc/mm: Recompute hash value after a failed update powerpc/boot: Specify ABI v2 when building an LE boot wrapper cxl: Fix build failure due to -Wunused-variable behaviour change cxl: Fix unbalanced pci_dev_get in cxl_probe powerpc/MSI: Fix race condition in tearing down MSI interrupts powerpc: Wire up sys_userfaultfd() powerpc/pseries: Release DRC when configure_connector fails cxl: abort cxl_pci_enable_device_hook() if PCI channel is offline powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel= powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel
2015-09-17powerpc/PCI: Fix lookup of linux,pci-probe-only propertyMarc Zyngier1-12/+2
When find_and_init_phbs() looks for the probe-only property, it seems to trust the firmware to be correctly written, and assumes that there is a parameter to the property. It is conceivable that the firmware could not be that perfect, and it could expose this property naked (at least one arm64 platform seems to exhibit this exact behaviour). The setup code the ends up making a decision based on whatever the property pointer points to, which is likely to be junk. Instead, switch to the common of_pci.c implementation that doesn't suffer from this problem and ignore the property if the firmware couldn't make up its mind. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Rob Herring <[email protected]> Acked-by: Michael Ellerman <[email protected]>
2015-09-17powerpc32: memset: only use dcbz once cache is enabledLEROY Christophe2-0/+9
memset() uses instruction dcbz to speed up clearing by not wasting time loading cache line with data that will be overwritten. Some platform like mpc52xx do no have cache active at startup and can therefore not use memset(). Allthough no part of the code explicitly uses memset(), GCC may make calls to it. This patch modifies memset() such that at startup, memset() unconditionally skip the optimised bloc that uses dcbz instruction. Once the initial MMU is set up, in machine_init() we patch memset() by replacing this inconditional jump by a NOP Tested-by: Thomas Gleixner <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-09-17powerpc32: memcpy: only use dcbz once cache is enabledLEROY Christophe2-0/+8
memcpy() uses instruction dcbz to speed up copy by not wasting time loading cache line with data that will be overwritten. Some platform like mpc52xx do no have cache active at startup and can therefore not use memcpy(). Allthough no part of the code explicitly uses memcpy(), GCC makes calls to it. This patch modifies memcpy() such that at startup, memcpy() unconditionally jumps to generic_memcpy() which doesn't use the dcbz instruction. Once the initial MMU is set up, in machine_init() we patch memcpy() by replacing this inconditional jump by a NOP Reported-by: Michal Sojka <[email protected]> Tested-by: Thomas Gleixner <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-09-16genirq: Remove irq argument from irq flow handlersThomas Gleixner26-46/+35
Most interrupt flow handlers do not use the irq argument. Those few which use it can retrieve the irq number from the irq descriptor. Remove the argument. Search and replace was done with coccinelle and some extra helper scripts around it. Thanks to Julia for her help! Signed-off-by: Thomas Gleixner <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Jiang Liu <[email protected]>
2015-09-16powerpc/mpc8xx: Use irq_set_handler_locked()Thomas Gleixner1-1/+1
Use irq_set_handler_locked() as it avoids a redundant lookup of the irq descriptor. Search and replacement was done with coccinelle: @@ struct irq_data *d; expression E1; @@ -__irq_set_handler_locked(d->irq, E1); +irq_set_handler_locked(d, E1); Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: [email protected]
2015-09-16powerpc/ipic: Use irq_set_handler_locked()Thomas Gleixner1-2/+2
Use irq_set_handler_locked() as it avoids a redundant lookup of the irq descriptor. Search and replacement was done with coccinelle: @@ struct irq_data *d; expression E1; @@ -__irq_set_handler_locked(d->irq, E1); +irq_set_handler_locked(d, E1); Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: [email protected]
2015-09-16powerpc/cpm2: Use irq_set_handler_locked()Thomas Gleixner1-2/+2
Use irq_set_handler_locked() as it avoids a redundant lookup of the irq descriptor. Search and replacement was done with coccinelle: @@ struct irq_data *d; expression E1; @@ -__irq_set_handler_locked(d->irq, E1); +irq_set_handler_locked(d, E1); Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: [email protected]
2015-09-16powerpc/mpc52xx: Use irq_set_handler_locked()Thomas Gleixner1-1/+1
Use irq_set_handler_locked() as it avoids a redundant lookup of the irq descriptor. Search and replacement was done with coccinelle: @@ struct irq_data *d; expression E1; @@ -__irq_set_handler_locked(d->irq, E1); +irq_set_handler_locked(d, E1); Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Anatolij Gustschin <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: [email protected]