aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2018-06-04Merge branch 'siginfo-linus' of ↵Linus Torvalds2-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "This set of changes close the known issues with setting si_code to an invalid value, and with not fully initializing struct siginfo. There remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64 and x86 to get the code that generates siginfo into a simpler and more maintainable state. Most of that work involves refactoring the signal handling code and thus careful code review. Also not included is the work to shrink the in kernel version of struct siginfo. That depends on getting the number of places that directly manipulate struct siginfo under control, as it requires the introduction of struct kernel_siginfo for the in kernel things. Overall this set of changes looks like it is making good progress, and with a little luck I will be wrapping up the siginfo work next development cycle" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) signal/sh: Stop gcc warning about an impossible case in do_divide_error signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions signal/um: More carefully relay signals in relay_signal. signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR} signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code signal/signalfd: Add support for SIGSYS signal/signalfd: Remove __put_user from signalfd_copyinfo signal/xtensa: Use force_sig_fault where appropriate signal/xtensa: Consistenly use SIGBUS in do_unaligned_user signal/um: Use force_sig_fault where appropriate signal/sparc: Use force_sig_fault where appropriate signal/sparc: Use send_sig_fault where appropriate signal/sh: Use force_sig_fault where appropriate signal/s390: Use force_sig_fault where appropriate signal/riscv: Replace do_trap_siginfo with force_sig_fault signal/riscv: Use force_sig_fault where appropriate signal/parisc: Use force_sig_fault where appropriate signal/parisc: Use force_sig_mceerr where appropriate signal/openrisc: Use force_sig_fault where appropriate signal/nios2: Use force_sig_fault where appropriate ...
2018-06-04Merge tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-3/+0
Pull dma-mapping updates from Christoph Hellwig: - replace the force_dma flag with a dma_configure bus method. (Nipun Gupta, although one patch is іncorrectly attributed to me due to a git rebase bug) - use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai) - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the right thing for bounce buffering. - move dma-debug initialization to common code, and apply a few cleanups to the dma-debug code. - cleanup the Kconfig mess around swiotlb selection - swiotlb comment fixup (Yisheng Xie) - a trivial swiotlb fix. (Dan Carpenter) - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt) - add a new generic dma-noncoherent dma_map_ops implementation and use it for arc, c6x and nds32. - improve scatterlist validity checking in dma-debug. (Robin Murphy) - add a struct device quirk to limit the dma-mask to 32-bit due to bridge/system issues, and switch x86 to use it instead of a local hack for VIA bridges. - handle devices without a dma_mask more gracefully in the dma-direct code. * tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits) dma-direct: don't crash on device without dma_mask nds32: use generic dma_noncoherent_ops nds32: implement the unmap_sg DMA operation nds32: consolidate DMA cache maintainance routines x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag x86/pci-dma: remove the explicit nodac and allowdac option x86/pci-dma: remove the experimental forcesac boot option Documentation/x86: remove a stray reference to pci-nommu.c core, dma-direct: add a flag 32-bit dma limits dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs dma-debug: check scatterlist segments c6x: use generic dma_noncoherent_ops arc: use generic dma_noncoherent_ops arc: fix arc_dma_{map,unmap}_page arc: fix arc_dma_sync_sg_for_{cpu,device} arc: simplify arc_dma_sync_single_for_{cpu,device} dma-mapping: provide a generic dma-noncoherent implementation dma-mapping: simplify Kconfig dependencies riscv: add swiotlb support riscv: only enable ZONE_DMA32 for 64-bit ...
2018-06-04Merge branch 'hch.procfs' of ↵Linus Torvalds2-41/+5
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull procfs updates from Al Viro: "Christoph's proc_create_... cleanups series" * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits) xfs, proc: hide unused xfs procfs helpers isdn/gigaset: add back gigaset_procinfo assignment proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields tty: replace ->proc_fops with ->proc_show ide: replace ->proc_fops with ->proc_show ide: remove ide_driver_proc_write isdn: replace ->proc_fops with ->proc_show atm: switch to proc_create_seq_private atm: simplify procfs code bluetooth: switch to proc_create_seq_data netfilter/x_tables: switch to proc_create_seq_private netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data neigh: switch to proc_create_seq_data hostap: switch to proc_create_{seq,single}_data bonding: switch to proc_create_seq_data rtc/proc: switch to proc_create_single_data drbd: switch to proc_create_single resource: switch to proc_create_seq_data staging/rtl8192u: simplify procfs code jfs: simplify procfs code ...
2018-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull KVM fixes from Radim Krčmář: "PPC: - Close a hole which could possibly lead to the host timebase getting out of sync. - Three fixes relating to PTEs and TLB entries for radix guests. - Fix a bug which could lead to an interrupt never getting delivered to the guest, if it is pending for a guest vCPU when the vCPU gets offlined. s390: - Fix false negatives in VSIE validity check (Cc stable) x86: - Fix time drift of VMX preemption timer when a guest uses LAPIC timer in periodic mode (Cc stable) - Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow migration from hosts that don't need retpoline mitigation (Cc stable) - Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and CPUID.OSXSAVE (Cc stable) - Report correct RIP after Hyper-V hypercall #UD (introduced in -rc6)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix #UD address of failed Hyper-V hypercalls kvm: x86: IA32_ARCH_CAPABILITIES is always supported KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed x86/kvm: fix LAPIC timer drift when guest uses periodic mode KVM: s390: vsie: fix < 8k check for the itdba KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change KVM: PPC: Book3S HV: Make radix clear pte when unmapping KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
2018-05-25Merge tag 'powerpc-4.17-7' of ↵Linus Torvalds2-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Just one fix, to make sure the PCR (Processor Compatibility Register) is reset on boot. Otherwise if we're running in compat mode in a guest (eg. pretending a Power9 is a Power8) and the host kernel oopses and kdumps then the kdump kernel's userspace will be running in Power8 mode, and will SIGILL if it uses Power9-only instructions. Thanks to Michael Neuling" * tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Clear PCR on boot
2018-05-21powerpc/64s: Add support for a store forwarding barrier at kernel entry/exitNicholas Piggin3-2/+180
On some CPUs we can prevent a vulnerability related to store-to-load forwarding by preventing store forwarding between privilege domains, by inserting a barrier in kernel entry and exit paths. This is known to be the case on at least Power7, Power8 and Power9 powerpc CPUs. Barriers must be inserted generally before the first load after moving to a higher privilege, and after the last store before moving to a lower privilege, HV and PR privilege transitions must be protected. Barriers are added as patch sections, with all kernel/hypervisor entry points patched, and the exit points to lower privilge levels patched similarly to the RFI flush patching. Firmware advertisement is not implemented yet, so CPU flush types are hard coded. Thanks to Michal Suchánek for bug fixes and review. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18powerpc/64s: Clear PCR on bootMichael Neuling2-0/+7
Clear the PCR (Processor Compatibility Register) on boot to ensure we are not running in a compatibility mode. We've seen this cause problems when a crash (and kdump) occurs while running compat mode guests. The kdump kernel then runs with the PCR set and causes problems. The symptom in the kdump kernel (also seen in petitboot after fast-reboot) is early userspace programs taking sigills on newer instructions (seen in libc). Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-17KVM: PPC: Book3S HV: Snapshot timebase offset on guest entryPaul Mackerras1-0/+1
Currently, the HV KVM guest entry/exit code adds the timebase offset from the vcore struct to the timebase on guest entry, and subtracts it on guest exit. Which is fine, except that it is possible for userspace to change the offset using the SET_ONE_REG interface while the vcore is running, as there is only one timebase offset per vcore but potentially multiple VCPUs in the vcore. If that were to happen, KVM would subtract a different offset on guest exit from that which it had added on guest entry, leading to the timebase being out of sync between cores in the host, which then leads to bad things happening such as hangs and spurious watchdog timeouts. To fix this, we add a new field 'tb_offset_applied' to the vcore struct which stores the offset that is currently applied to the timebase. This value is set from the vcore tb_offset field on guest entry, and is what is subtracted from the timebase on guest exit. Since it is zero when the timebase offset is not applied, we can simplify the logic in kvmhv_start_timing and kvmhv_accumulate_time. In addition, we had secondary threads reading the timebase while running concurrently with code on the primary thread which would eventually add or subtract the timebase offset from the timebase. This occurred while saving or restoring the DEC register value on the secondary threads. Although no specific incorrect behaviour has been observed, this is a race which should be fixed. To fix it, we move the DEC saving code to just before we call kvmhv_commence_exit, and the DEC restoring code to after the point where we have waited for the primary thread to switch the MMU context and add the timebase offset. That way we are sure that the timebase contains the guest timebase value in both cases. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-16proc: introduce proc_create_single{,_data}Christoph Hellwig2-41/+5
Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-08dma-debug: move initialization to common codeChristoph Hellwig1-3/+0
Most mainstream architectures are using 65536 entries, so lets stick to that. If someone is really desperate to override it that can still be done through <asm/dma-mapping.h>, but I'd rather see a really good rationale for that. dma_debug_init is now called as a core_initcall, which for many architectures means much earlier, and provides dma-debug functionality earlier in the boot process. This should be safe as it only relies on the memory allocator already being available. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-04-27powerpc: Fix deadlock with multiple calls to smp_send_stopNicholas Piggin1-16/+39
smp_send_stop can lock up the IPI path for any subsequent calls, because the receiving CPUs spin in their handler function. This started becoming a problem with the addition of an smp_send_stop call in the reboot path, because panics can reboot after doing their own smp_send_stop. The NMI IPI variant was fixed with ac61c11566 ("powerpc: Fix smp_send_stop NMI IPI handling"), which leaves the smp_call_function variant. This is fixed by having smp_send_stop only ever do the smp_call_function once. This is a bit less robust than the NMI IPI fix, because any other call to smp_call_function after smp_send_stop could deadlock, but that has always been the case, and it was not been a problem before. Fixes: f2748bdfe1573 ("powerpc/powernv: Always stop secondaries before reboot/shutdown") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-25signal/powerpc: Replace TRAP_FIXME with TRAP_UNKEric W. Biederman1-2/+2
Using an si_code of 0 that aliases with SI_USER is clearly the wrong thing todo, and causes problems in interesting ways. For use in unknown_exception the recently defined TRAP_UNK semantically is a perfect fit. For use in RunModeException it looks like something more specific than TRAP_UNK could be used. No one has bothered to find a better fit than the broken si_code of 0 in all of these years and I don't see an obvious better fit so TRAP_UNK is switching RunModeException to return TRAP_UNK is clearly an improvement. Recent history suggests no actually cares about crazy corner cases of the kernel behavior like this so I don't expect any regressions from changing this. However if something does happen this change is easy to revert. Though I wonder if SIGKILL might not be a better fit. Cc: Paul Mackerras <paulus@samba.org> Cc: Kumar Gala <kumar.gala@freescale.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx") Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-25signal/powerpc: Replace FPE_FIXME with FPE_FLTUNKEric W. Biederman1-3/+3
Using an si_code of 0 that aliases with SI_USER is clearly the wrong thing todo, and causes problems in interesting ways. The newly defined FPE_FLTUNK semantically appears to fit the bill so use it instead. Cc: Paul Mackerras <paulus@samba.org> Cc: Kumar Gala <kumar.gala@freescale.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx") Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-25signal: Ensure every siginfo we send has all bits initializedEric W. Biederman2-2/+2
Call clear_siginfo to ensure every stack allocated siginfo is properly initialized before being passed to the signal sending functions. Note: It is not safe to depend on C initializers to initialize struct siginfo on the stack because C is allowed to skip holes when initializing a structure. The initialization of struct siginfo in tracehook_report_syscall_exit was moved from the helper user_single_step_siginfo into tracehook_report_syscall_exit itself, to make it clear that the local variable siginfo gets fully initialized. In a few cases the scope of struct siginfo has been reduced to make it clear that siginfo siginfo is not used on other paths in the function in which it is declared. Instances of using memset to initialize siginfo have been replaced with calls clear_siginfo for clarity. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-25powerpc: Fix smp_send_stop NMI IPI handlingNicholas Piggin1-5/+17
The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count over the handler function call, which causes later smp_send_nmi_ipi() callers to spin until the call is finished. The stop_this_cpu() function never returns, so the busy count is never decremeted, which can cause the system to hang in some cases. For example panic() will call smp_send_stop() early on which calls stop_this_cpu() on other CPUs, then later in the reboot path, pnv_restart() will call smp_send_stop() again, which hangs. Fix this by adding a special case to the stop_this_cpu() handler to decrement the busy count, because it will never return. Now that the NMI/non-NMI versions of stop_this_cpu() are different, split them out into separate functions rather than doing #ifdef tricks to share the body between the two functions. Fixes: 6bed3237624e3 ("powerpc: use NMI IPI for smp_send_stop") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out the functions, tweak change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/mce: Fix a bug where mce loops on memory UE.Mahesh Salgaonkar1-5/+2
The current code extracts the physical address for UE errors and then hooks it up into memory failure infrastructure. On successful extraction of physical address it wrongly sets "handled = 1" which means this UE error has been recovered. Since MCE handler gets return value as handled = 1, it assumes that error has been recovered and goes back to same NIP. This causes MCE interrupt again and again in a loop leading to hard lockup. Also, initialize phys_addr to ULONG_MAX so that we don't end up queuing undesired page to hwpoison. Without this patch we see: Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000 ... Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000 Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000 Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned ... Watchdog CPU:38 Hard LOCKUP After this patch we see: Severe Machine check interrupt [Not recovered] NIP: [00007fffaae585f4] PID: 7168 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffaafe28ac Physical address: 00002017c0bd0000 find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4 Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered Fixes: 01eaac2b0591 ("powerpc/mce: Hookup ierror (instruction) UE errors") Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19powerpc/kvm: Fix lockups when running KVM guests on Power8Michael Ellerman1-2/+2
When running KVM guests on Power8 we can see a lockup where one CPU stops responding. This often leads to a message such as: watchdog: CPU 136 detected hard LOCKUP on other CPUs 72 Task dump for CPU 72: qemu-system-ppc R running task 10560 20917 20908 0x00040004 And then backtraces on other CPUs, such as: Task dump for CPU 48: ksmd R running task 10032 1519 2 0x00000804 Call Trace: ... --- interrupt: 901 at smp_call_function_many+0x3c8/0x460 LR = smp_call_function_many+0x37c/0x460 pmdp_invalidate+0x100/0x1b0 __split_huge_pmd+0x52c/0xdb0 try_to_unmap_one+0x764/0x8b0 rmap_walk_anon+0x15c/0x370 try_to_unmap+0xb4/0x170 split_huge_page_to_list+0x148/0xa30 try_to_merge_one_page+0xc8/0x990 try_to_merge_with_ksm_page+0x74/0xf0 ksm_scan_thread+0x10ec/0x1ac0 kthread+0x160/0x1a0 ret_from_kernel_thread+0x5c/0x78 This is caused by commit 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle"), which added a check in pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and test of kvm_hstate.hwthread_req. The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when entering the guest, so it can then come out to cede with KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after setting hwthread_req to 1, but because hwthread_state is still KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we wake up from idle and won't go to kvm_start_guest. From there the thread will return somewhere garbage and crash. Fix it by skipping the store of hwthread_state, but not the test of hwthread_req, when coming out of idle. It's OK to skip the sync in that case because hwthread_req will have been set on the same thread, so there is no synchronisation required. Fixes: 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19powerpc/eeh: Fix enabling bridge MMIO windowsMichael Neuling1-1/+2
On boot we save the configuration space of PCIe bridges. We do this so when we get an EEH event and everything gets reset that we can restore them. Unfortunately we save this state before we've enabled the MMIO space on the bridges. Hence if we have to reset the bridge when we come back MMIO is not enabled and we end up taking an PE freeze when the driver starts accessing again. This patch forces the memory/MMIO and bus mastering on when restoring bridges on EEH. Ideally we'd do this correctly by saving the configuration space writes later, but that will have to come later in a larger EEH rewrite. For now we have this simple fix. The original bug can be triggered on a boston machine by doing: echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound On boston, this PHB has a PCIe switch on it. Without this patch, you'll see two EEH events, 1 expected and 1 the failure we are fixing here. The second EEH event causes the anything under the PHB to disappear (i.e. the i40e eth). With this patch, only 1 EEH event occurs and devices properly recover. Fixes: 652defed4875 ("powerpc/eeh: Check PCIe link after reset") Cc: stable@vger.kernel.org # v3.11+ Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-17powerpc/64s: Default l1d_size to 64K in RFI fallback flushMadhavan Srinivasan1-0/+11
If there is no d-cache-size property in the device tree, l1d_size could be zero. We don't actually expect that to happen, it's only been seen on mambo (simulator) in some configurations. A zero-size l1d_size leads to the loop in the asm wrapping around to 2^64-1, and then walking off the end of the fallback area and eventually causing a page fault which is fatal. Just default to 64K which is correct on some CPUs, and sane enough to not cause a crash on others. Fixes: aa8a5e0062ac9 ('powerpc/64s: Add support for RFI flush of L1-D cache') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Rewrite comment and change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-15Merge tag 'powerpc-4.17-2' of ↵Linus Torvalds3-29/+19
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix crashes when loading modules built with a different CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic. - Fix busy loops in the OPAL NVRAM driver if we get certain error conditions from firmware. - Remove tlbie trace points from KVM code that's called in real mode, because it causes crashes. - Fix checkstops caused by invalid tlbiel on Power9 Radix. - Ensure the set of CPU features we "know" are always enabled is actually the minimal set when we build with support for firmware supplied CPU features. Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin. * tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features powerpc/mm/radix: Fix checkstops caused by invalid tlbiel KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode powerpc/8xx: Fix build with hugetlbfs enabled powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops powerpc/fscr: Enable interrupts earlier before calling get_user() powerpc/64s: Fix section mismatch warnings from setup_rfi_flush() powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
2018-04-13kernel/kexec_file.c: allow archs to set purgatory load addressPhilipp Rudo1-4/+5
For s390 new kernels are loaded to fixed addresses in memory before they are booted. With the current code this is a problem as it assumes the kernel will be loaded to an 'arbitrary' address. In particular, kexec_locate_mem_hole searches for a large enough memory region and sets the load address (kexec_bufer->mem) to it. Luckily there is a simple workaround for this problem. By returning 1 in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This allows the architecture to set kbuf->mem by hand. While the trick works fine for the kernel it does not for the purgatory as here the architectures don't have access to its kexec_buffer. Give architectures access to the purgatories kexec_buffer by changing kexec_load_purgatory to take a pointer to it. With this change architectures have access to the buffer and can edit it as they need. A nice side effect of this change is that we can get rid of the purgatory_info->purgatory_load_address field. As now the information stored there can directly be accessed from kbuf->mem. Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kexec_file,x86,powerpc: factor out kexec_file_ops functionsAKASHI Takahiro2-37/+4
As arch_kexec_kernel_image_{probe,load}(), arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig() are almost duplicated among architectures, they can be commonalized with an architecture-defined kexec_file_ops array. So let's factor them out. Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU featuresMichael Ellerman1-13/+1
The cpu_has_feature() mechanism has an optimisation where at build time we construct a mask of the CPU feature bits that will always be true for the given .config, based on the platform/bitness/etc. that we are building for. That is incompatible with DT CPU features, where the set of CPU features is dependent on feature flags that are given to us by firmware. The result is that some feature bits can not be *disabled* by DT CPU features. Or more accurately, they can be disabled but they will still appear in the ALWAYS mask, meaning cpu_has_feature() will always return true for them. In the past this hasn't really been a problem because on Book3S 64 (where we support DT CPU features), the set of ALWAYS bits has been very small. That was because we always built for POWER4 and later, meaning the set of common bits was small. The only bit that could be cleared by DT CPU features that was also in the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in the alignment handler to create a fake DSISR. That code was itself deleted in 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults") (Sep 2017). However the set of ALWAYS features changed with the recent commit db5ae1c155af ("powerpc/64s: Refine feature sets for little endian builds") which restricted the set of feature flags when building little endian to Power7 or later. That caused the ALWAYS mask to become much larger for little endian builds. The result is that the following feature bits can currently not be *disabled* by DT CPU features: CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT, CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY, CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR. To fix it we need to mask the set of ALWAYS features with the base set of DT CPU features, ie. the features that are always enabled by DT CPU features. That way there are no bits in the ALWAYS mask that are not also always set by DT CPU features. Fixes: db5ae1c155af ("powerpc/64s: Refine feature sets for little endian builds") Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-10powerpc/fscr: Enable interrupts earlier before calling get_user()Anshuman Khandual1-15/+17
The function get_user() can sleep while trying to fetch instruction from user address space and causes the following warning from the scheduler. BUG: sleeping function called from invalid context Though interrupts get enabled back but it happens bit later after get_user() is called. This change moves enabling these interrupts earlier covering the function get_user(). While at this, lets check for kernel mode and crash as this interrupt should not have been triggered from the kernel context. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-10powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()Michael Ellerman1-1/+1
The recent LPM changes to setup_rfi_flush() are causing some section mismatch warnings because we removed the __init annotation on setup_rfi_flush(): The function setup_rfi_flush() references the function __init ppc64_bolted_size(). the function __init memblock_alloc_base(). The references are actually in init_fallback_flush(), but that is inlined into setup_rfi_flush(). These references are safe because: - only pseries calls setup_rfi_flush() at runtime - pseries always passes L1D_FLUSH_FALLBACK at boot - so the fallback flush area will always be allocated - so the check in init_fallback_flush() will always return early: /* Only allocate the fallback flush area once (at boot time). */ if (l1d_flush_fallback_area) return; - and therefore we won't actually call the freed init routines. We should rework the code to make it safer by default rather than relying on the above, but for now as a quick-fix just add a __ref annotation to squash the warning. Fixes: abf110f3e1ce ("powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-07Merge tag 'powerpc-4.17-1' of ↵Linus Torvalds40-521/+828
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Support for 4PB user address space on 64-bit, opt-in via mmap(). - Removal of POWER4 support, which was accidentally broken in 2016 and no one noticed, and blocked use of some modern instructions. - Workarounds so that the hypervisor can enable Transactional Memory on Power9. - A series to disable the DAWR (Data Address Watchpoint Register) on Power9. - More information displayed in the meltdown/spectre_v1/v2 sysfs files. - A vpermxor (Power8 Altivec) implementation for the raid6 Q Syndrome. - A big series to make the allocation of our pacas (per cpu area), kernel page tables, and per-cpu stacks NUMA aware when using the Radix MMU on Power9. And as usual many fixes, reworks and cleanups. Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook, Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe, Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring, Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun" * tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits) powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep powerpc/64s: Fix POWER9 DD2.2 and above in cputable features powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead" powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo} powerpc: io.h: move iomap.h include so that it can use readq/writeq defs cxl: Fix possible deadlock when processing page faults from cxllib powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it powerpc/mm/radix: Update command line parsing for disable_radix powerpc/mm/radix: Parse disable_radix commandline correctly. powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix powerpc/mm/keys: Update documentation and remove unnecessary check powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop() powerpc/powernv: Always stop secondaries before reboot/shutdown powerpc: hard disable irqs in smp_send_stop loop powerpc: use NMI IPI for smp_send_stop powerpc/powernv: Fix SMT4 forcing idle code ...
2018-04-06Merge tag 'pci-v4.17-changes' of ↵Linus Torvalds1-96/+10
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - move pci_uevent_ers() out of pci.h (Michael Ellerman) - skip ASPM common clock warning if BIOS already configured it (Sinan Kaya) - fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva) - remove last user of pci_get_bus_and_slot() and the function itself (Sinan Kaya) - add decoding for 16 GT/s link speed (Jay Fang) - add interfaces to get max link speed and width (Tal Gilboa) - add pcie_bandwidth_capable() to compute max supported link bandwidth (Tal Gilboa) - add pcie_bandwidth_available() to compute bandwidth available to device (Tal Gilboa) - add pcie_print_link_status() to log link speed and whether it's limited (Tal Gilboa) - use PCI core interfaces to report when device performance may be limited by its slot instead of doing it in each driver (Tal Gilboa) - fix possible cpqphp NULL pointer dereference (Shawn Lin) - rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI hotplug (Mika Westerberg) - add support for PCI I/O port space that's neither directly accessible via CPU in/out instructions nor directly mapped into CPU physical memory space. This is fairly intrusive and includes minor changes to interfaces used for I/O space on most platforms (Zhichang Yuan, John Garry) - add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan, John Garry) - use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas) - remove possible NULL pointer dereference in of_pci_bus_find_domain_nr() (Shawn Lin) - report quirk timings with dev_info (Bjorn Helgaas) - report quirks that take longer than 10ms (Bjorn Helgaas) - add and use Altera Vendor ID (Johannes Thumshirn) - tidy Makefiles and comments (Bjorn Helgaas) - don't set up INTx if MSI or MSI-X is enabled to align cris, frv, ia64, and mn10300 with x86 (Bjorn Helgaas) - move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick Lawler) - merge pcieport_if.h into portdrv.h (Bjorn Helgaas) - move workaround for BIOS PME issue from portdrv to PCI core (Bjorn Helgaas) - completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas) - remove portdrv link order dependency (Bjorn Helgaas) - remove support for unused VC portdrv service (Bjorn Helgaas) - simplify portdrv feature permission checking (Bjorn Helgaas) - remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn Helgaas) - remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas) - use cached AER capability offset (Frederick Lawler) - don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg) - rename pcie-dpc.c to dpc.c (Bjorn Helgaas) - use generic pci_mmap_resource_range() instead of powerpc and xtensa arch-specific versions (David Woodhouse) - support arbitrary PCI host bridge offsets on sparc (Yinghai Lu) - remove System and Video ROM reservations on sparc (Bjorn Helgaas) - probe for device reset support during enumeration instead of runtime (Bjorn Helgaas) - add ACS quirk for Ampere (née APM) root ports (Feng Kan) - add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas Vincent-Cross) - protect device restore with device lock (Sinan Kaya) - handle failure of FLR gracefully (Sinan Kaya) - handle CRS (config retry status) after device resets (Sinan Kaya) - skip various config reads for SR-IOV VFs as an optimization (KarimAllah Ahmed) - consolidate VPD code in vpd.c (Bjorn Helgaas) - add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann) - add DT support for R-Car r8a7743 (Biju Das) - fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host bridge driver that causes a general protection fault (Dexuan Cui) - fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV (Dexuan Cui) - fix Hyper-V host bridge hang when ejecting a VF before setting up MSI (Dexuan Cui) - make several structures static (Fengguang Wu) - increase number of MSI IRQs supported by Synopsys DesignWare bridges from 32 to 256 (Gustavo Pimentel) - implemented multiplexed IRQ domain API and remove obsolete MSI IRQ API from DesignWare drivers (Gustavo Pimentel) - add Tegra power management support (Manikanta Maddireddy) - add Tegra loadable module support (Manikanta Maddireddy) - handle 64-bit BARs correctly in endpoint support (Niklas Cassel) - support optional regulator for HiSilicon STB (Shawn Guo) - use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla) - support power supplies for Qualcomm msm8996 (Srinivas Kandagatla) * tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits) MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver HISI LPC: Add ACPI support ACPI / scan: Do not enumerate Indirect IO host children ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings of: Add missing I/O range exception for indirect-IO devices PCI: Apply the new generic I/O management on PCI IO hosts PCI: Add fwnode handler as input param of pci_register_io_range() PCI: Remove __weak tag from pci_register_io_range() MAINTAINERS: Add missing /drivers/pci/cadence directory entry fm10k: Report PCIe link properties with pcie_print_link_status() net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth net/mlx5: Report PCIe link properties with pcie_print_link_status() net/mlx4_core: Report PCIe link properties with pcie_print_link_status() PCI: Add pcie_print_link_status() to log link speed and whether it's limited PCI: Add pcie_bandwidth_available() to compute bandwidth available to device misc: pci_endpoint_test: Handle 64-bit BARs properly PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar ...
2018-04-05powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleepNicholas Piggin1-0/+2
POWER8 restores AMOR when waking from deep sleep, but POWER9 does not, because it does not go through the subcore restore. Have POWER9 restore it in core restore. Fixes: ee97b6b99f42 ("powerpc/mm/radix: Setup AMOR in HV mode to allow key 0") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bitNicholas Piggin1-0/+7
The pkey code added a CPU_FTR_PKEY bit, but did not add it to the dt_cpu_ftrs feature set. Although capability is supported by all processors in the base dt_cpu_ftrs set for 64s, it's a significant and sufficiently well defined feature to make it optional. So add it as a quirk for now, which can be versioned out then controlled by the firmware (once dt_cpu_ftrs gains versioning support). Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bitsNicholas Piggin1-3/+9
Presently the dt_cpu_ftrs restore_cpu will only add bits to the LPCR for secondaries, but some bits must be removed (e.g., UPRT for HPT). Not clearing these bits on secondaries causes checkstops when booting with disable_radix. restore_cpu can not just set LPCR, because it is also called by the idle wakeup code which relies on opal_slw_set_reg to restore the value of LPCR, at least on P8 which does not save LPCR to stack in the idle code. Fix this by including a mask of bits to clear from LPCR as well, which is used by restore_cpu. This is a little messy now, but it's a minimal fix that can be backported. Longer term, the idle SPR save/restore code can be reworked to completely avoid calls to restore_cpu, then restore_cpu would be able to unconditionally set LPCR to match boot processor environment. Fixes: 5a61ef74f269f ("powerpc/64s: Support new device tree binding for discovering CPU features") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"Michael Ellerman1-16/+29
As described in that commit: When stop is executed with EC=ESL=0, it appears to execute like a normal instruction (resuming from NIP when woken by interrupt). So all the save/restore handling can be avoided completely. This is true, except in the case of an NMI interrupt (sreset or machine check) interrupting the instruction. In that case, the NMI gets an "interrupt occurred while the processor was in power-saving mode" indication. The power-save wakeup code uses that bit to decide whether to restore some registers (e.g., LR). Because these are no longer saved, this causes random register corruption. It may be possible to restore this optimisation by detecting the case of no register loss on the wakeup side, and avoid restoring in that case, but that's not a minor fix because the wakeup code itself uses some registers that would be live (e.g., LR). Fixes: b9ee31e100e7 ("powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}Logan Gunthorpe1-0/+40
These functions will be introduced into the generic iomap.c so they can deal with PIO accesses in hi-lo/lo-hi variants. Thus, the powerpc version of iomap.c will need to provide the same functions even though, in this arch, they are identical to the regular io{read|write}64 functions. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Tested-by: Horia Geantă <horia.geanta@nxp.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/radix: Update command line parsing for disable_radixAneesh Kumar K.V2-4/+14
kernel parameter disable_radix takes different options disable_radix=yes|no|1|0 or just disable_radix. prom_init parsing is not supporting these options. Fixes: 1fd6c0220710 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overheadNicholas Piggin1-29/+16
When stop is executed with EC=ESL=0, it appears to execute like a normal instruction (resuming from NIP when woken by interrupt). So all the save/restore handling can be avoided completely. In particular NV GPRs do not have to be saved, and MSR does not have to be switched back to kernel MSR. So move the test for EC=ESL=0 sleep states out to power9_idle_stop, and return directly to the caller after stop in that case. This improves performance for ping-pong benchmark with the stop0_lite idle state by 2.54% for 2 threads in the same core, and 2.57% for different cores. Performance increase with HV_POSSIBLE defined will be improved further by avoiding the hwsync. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()Michael Ellerman1-12/+11
Commit 3d4fbffdd703 ("powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug") that added power9_offline_stop() was written before commit 7672691a08c8 ("powerpc/powernv: Provide a way to force a core into SMT4 mode"). When merging the former I failed to notice that it caused us to skip the force-SMT4 logic for offline CPUs. The result is that offlined CPUs will not correctly participate in the force-SMT4 logic, which presumably will result in badness (not tested). Reconcile the two commits by making power9_offline_stop() a pre-cursor to power9_idle_stop(), so that they share the force-SMT4 logic. This is based on an original commit from Nick, all breakage is my own. Fixes: 3d4fbffdd703 ("powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2018-04-03Merge tag 'kbuild-v4.17' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - add a shell script to get Clang version - improve portability of build scripts - drop always-enabled CONFIG_THIN_ARCHIVE and remove unused code - rename built-in.o which is now thin archive to built-in.a - process clean/build targets one by one to get along with -j option - simplify ld-option - improve building with CONFIG_TRIM_UNUSED_KSYMS - define KBUILD_MODNAME even for objects shared among multiple modules - avoid linking multiple instances of same objects from composite objects - move <linux/compiler_types.h> to c_flags to include it only for C files - clean-up various Makefiles * tag 'kbuild-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits) kbuild: get <linux/compiler_types.h> out of <linux/kconfig.h> kbuild: clean up link rule of composite modules kbuild: clean up archive rule of built-in.a kbuild: remove partial section mismatch detection for built-in.a net: liquidio: clean up Makefile for simpler composite object handling lib: zstd: clean up Makefile for simpler composite object handling kbuild: link $(real-obj-y) instead of $(obj-y) into built-in.a kbuild: rename real-objs-y/m to real-obj-y/m kbuild: move modname and modname-multi close to modname_flags kbuild: simplify modname calculation kbuild: fix modname for composite modules kbuild: define KBUILD_MODNAME even if multiple modules share objects kbuild: remove unnecessary $(subst $(obj)/, , ...) in modname-multi kbuild: Use ls(1) instead of stat(1) to obtain file size kbuild: link vmlinux only once for CONFIG_TRIM_UNUSED_KSYMS kbuild: move include/config/ksym/* to include/ksym/* kbuild: move CONFIG_TRIM_UNUSED_KSYMS code unneeded for external module kbuild: restore autoksyms.h touch to the top Makefile kbuild: move 'scripts' target below kbuild: remove wrong 'touch' in adjust_autoksyms.sh ...
2018-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-nextLinus Torvalds1-1/+1
Pull sparc updates from David Miller: 1) Add support for ADI (Application Data Integrity) found in more recent sparc64 cpus. Essentially this is keyed based access to virtual memory, and if the key encoded in the virual address is wrong you get a trap. The mm changes were reviewed by Andrew Morton and others. Work by Khalid Aziz. 2) Validate DAX completion index range properly, from Rob Gardner. 3) Add proper Kconfig deps for DAX driver. From Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: sparc64: Make atomic_xchg() an inline function rather than a macro. sparc64: Properly range check DAX completion index sparc: Make auxiliary vectors for ADI available on 32-bit as well sparc64: Oracle DAX driver depends on SPARC64 sparc64: Update signal delivery to use new helper functions sparc64: Add support for ADI (Application Data Integrity) mm: Allow arch code to override copy_highpage() mm: Clear arch specific VM flags on protection change mm: Add address parameter to arch_validate_prot() sparc64: Add auxiliary vectors to report platform ADI properties sparc64: Add handler for "Memory Corruption Detected" trap sparc64: Add HV fault type handlers for ADI related faults sparc64: Add support for ADI register fields, ASIs and traps mm, swap: Add infrastructure for saving page metadata on swap signals, sparc: Add signal codes for ADI violations
2018-04-03powerpc: hard disable irqs in smp_send_stop loopNicholas Piggin1-2/+3
The hard lockup watchdog can fire under local_irq_disable on platforms with irq soft masking. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc: use NMI IPI for smp_send_stopNicholas Piggin1-0/+8
Use the NMI IPI rather than smp_call_function for smp_send_stop. Have stopped CPUs hard disable interrupts rather than just soft disable. This function is used in crash/panic/shutdown paths to bring other CPUs down as quickly and reliably as possible, and minimizing their potential to cause trouble. Avoiding the Linux smp_call_function infrastructure and (if supported) using true NMI IPIs makes this more robust. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc/powernv: Fix SMT4 forcing idle codeNicholas Piggin1-4/+5
The PSSCR value is not stored to PACA_REQ_PSSCR if the CPU does not have the XER[SO] bug. Fix this by storing up-front, outside the workaround code. The initial test is not required because it is a slow path. The workaround is made to depend on CONFIG_KVM_BOOK3S_HV_POSSIBLE, to match pnv_power9_force_smt4_catch() where it is used. Drop the comment on pnv_power9_force_smt4_catch() as it's no longer true. Fixes: 7672691a08c8 ("powerpc/powernv: Provide a way to force a core into SMT4 mode") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc: Move default security feature flagsMauricio Faria de Oliveira1-6/+1
This moves the definition of the default security feature flags (i.e., enabled by default) closer to the security feature flags. This can be used to restore current flags to the default flags. Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc: Don't write to DABR on >= Power8 if DAWR is disabledNicholas Piggin1-2/+8
flush_thread() calls __set_breakpoint() via set_debug_reg_defaults() without checking ppc_breakpoint_available(). On Power8 or later CPUs which have the DAWR feature disabled that will cause a write to the DABR which is incorrect as those CPUs don't have a DABR. Fix it two ways, by checking ppc_breakpoint_available() in set_debug_reg_defaults(), and also by reworking __set_breakpoint() to only write to DABR on Power7 or earlier. Fixes: 9654153158d3 ("powerpc: Disable DAWR in the base POWER9 CPU features") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Rework the logic in __set_breakpoint()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc: Fix oops due to bad access of lppaca on bare metalAneesh Kumar K.V1-0/+3
Commit 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") removed allocation of lppaca on bare metal platforms. But with CONFIG_PPC_SPLPAR enabled, we still access the lppaca on bare metal in some code paths. Fix this but adding runtime checks for SPLPAR (shared processor LPAR). Fixes: 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-02Merge branch 'syscalls-next' of ↵Linus Torvalds2-12/+12
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull removal of in-kernel calls to syscalls from Dominik Brodowski: "System calls are interaction points between userspace and the kernel. Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy() should only be called from userspace via the syscall table, but not from elsewhere in the kernel. At least on 64-bit x86, it will likely be a hard requirement from v4.17 onwards to not call system call functions in the kernel: It is better to use use a different calling convention for system calls there, where struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands processing over to the actual syscall function. This means that only those parameters which are actually needed for a specific syscall are passed on during syscall entry, instead of filling in six CPU registers with random user space content all the time (which may cause serious trouble down the call chain). Those x86-specific patches will be pushed through the x86 tree in the near future. Moreover, rules on how data may be accessed may differ between kernel data and user data. This is another reason why calling sys_xyzzy() is generally a bad idea, and -- at most -- acceptable in arch-specific code. This patchset removes all in-kernel calls to syscall functions in the kernel with the exception of arch/. On top of this, it cleans up the three places where many syscalls are referenced or prototyped, namely kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h" * 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits) bpf: whitelist all syscalls for error injection kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions kernel/sys_ni: sort cond_syscall() entries syscalls/x86: auto-create compat_sys_*() prototypes syscalls: sort syscall prototypes in include/linux/compat.h net: remove compat_sys_*() prototypes from net/compat.h syscalls: sort syscall prototypes in include/linux/syscalls.h kexec: move sys_kexec_load() prototype to syscalls.h x86/sigreturn: use SYSCALL_DEFINE0 x86: fix sys_sigreturn() return type to be long, not unsigned long x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate() fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate() fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid() kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() ...
2018-04-02mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()Dominik Brodowski1-1/+1
Using this helper allows us to avoid the in-kernel calls to the sys_readahead() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_readahead(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()Dominik Brodowski1-1/+1
Using this helper allows us to avoid the in-kernel calls to the sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_mmap_pgoff(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()Dominik Brodowski2-4/+4
Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel calls to the sys_fadvise64_64() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as ksys_fadvise64_64(). Some compat stubs called sys_fadvise64(), which then just passed through the arguments to sys_fadvise64_64(). Get rid of this indirection, and call ksys_fadvise64_64() directly. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()Dominik Brodowski1-1/+1
Using the ksys_fallocate() wrapper allows us to get rid of in-kernel calls to the sys_fallocate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_fallocate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscallsDominik Brodowski1-2/+2
Using the ksys_p{read,write}64() wrappers allows us to get rid of in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_p{read,write}64(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()Dominik Brodowski1-1/+1
Using the ksys_truncate() wrapper allows us to get rid of in-kernel calls to the sys_truncate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_truncate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>