aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2015-06-04of: clean-up unnecessary libfdt include pathsRob Herring1-1/+0
With the libfdt include fixups to use "" instead of <> in the latest dtc import in commit 4760597 (scripts/dtc: Update to upstream version 9d3649bd3be245c9), it is no longer necessary to add explicit include paths to use libfdt. Remove these across the kernel. Signed-off-by: Rob Herring <[email protected]> Acked-by: Ralf Baechle <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Acked-by: Michael Ellerman <[email protected]> Acked-by: Grant Likely <[email protected]> Cc: [email protected] Cc: [email protected]
2015-06-03powerpc/pci: Add pcibios_disable_device() hookMichael Neuling1-0/+8
This adds a hook into the powerpc pci code for pci_disable_device() calls. The generic code already provides a weak pcibios_disable_device() symbol, so we just need to provide our own in powerpc and it'll get picked up. This is passed directly to the phb controller ops, provided one exists. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-06-03powerpc/pci: Add release_device() hook to phb opsMichael Neuling1-0/+5
Add release_device() hook to phb ops so we can clean up for specific phbs. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-06-03powerpc/pci: Export symbols for CXLDaniel Axtens1-0/+3
Export pcibios_claim_one_bus, pcibios_scan_phb and pcibios_alloc_controller. These will be used by the CXL driver. Signed-off-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-06-02powerpc/8xx: Implementation of PAGE_EXECLEROY Christophe1-3/+9
This patch implements PAGE_EXEC capability on the 8xx. All pages PP exec bits are set to 000, which means Execute for Supervisor and no Execute for User. Then we use the APG to say whether accesses are according to Page rules, "all Supervisor" rules (Exec for all) and "all User" rules (Exec for noone) Therefore, we define 4 APG groups. msb is _PAGE_EXEC, lsb is _PAGE_USER. MI_AP is initialised as follows: GP0 (00) => Not User, no exec => 11 (all accesses performed as user) GP1 (01) => User but no exec => 11 (all accesses performed as user) GP2 (10) => Not User, exec => 01 (rights according to page definition) GP3 (11) => User, exec => 00 (all accesses performed as supervisor) Signed-off-by: Christophe Leroy <[email protected]> [scottwood: comments: s/exec/data/ on data side, and s/pages/pages'/] Signed-off-by: Scott Wood <[email protected]>
2015-06-02powerpc/8xx: Handle PAGE_USER via APG bitsLEROY Christophe1-9/+12
Use of APG for handling PAGE_USER. All pages PP exec bits are set to either 000 or 011, which means respectively RW for Supervisor and no access for User, or RO for Supervisor and no access for user. Then we use the APG to say whether accesses are according to Page rules or "all Supervisor" rules (Access to all) Therefore, we define 2 APG groups corresponding to _PAGE_USER. Mx_AP are initialised as follows: GP0 => No user => 01 (all accesses performed according to page definition) GP1 => User => 00 (all accesses performed as supervisor according to page definition) This removes the special 8xx handling in pte_update() Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2015-06-02powerpc/8xx: Add support for TASK_SIZE greater than 0x80000000LEROY Christophe1-6/+19
By default, TASK_SIZE is set to 0x80000000 for PPC_8xx, which is most likely sufficient for most cases. However, kernel configuration allows to set TASK_SIZE to another value, so the 8xx shall handle it. This patch also takes into account the case of PAGE_OFFSET lower than 0x80000000, allthought most of the time it is equal to 0xC0000000 Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2015-06-02powerpc/8xx: Use SPRG2 instead of DAR for saving r3LEROY Christophe1-5/+4
We now have SPRG2 available as in it not used anymore for saving CR, so we don't need to crash DAR anymore for saving r3 for CPU6 ERRATA handling. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2015-06-02powerpc/8xx: dont save CR in SCRATCH registersLEROY Christophe1-14/+15
CR only needs to be preserved when checking if we are handling a kernel address. So we can preserve CR in a register: - In ITLBMiss, check is done only when CONFIG_MODULES is defined. Otherwise we don't need to do anything at all with CR. - We use r10, then we reload SRR0/MD_EPN into r10 when CR is restored Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2015-06-02powerpc/8xx: Handle CR out of exception PROLOG/EPILOGLEROY Christophe1-3/+7
In order to be able to reduce scope during which CR is saved, we take CR saving/restoring out of exception PROLOG and EPILOG Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2015-06-02powerpc/8xx: macro for handling CPU15 errataLEROY Christophe1-6/+12
Having a macro will help keep clear code. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2015-06-02powerpc/e500mc: Remove dead L2 flushing code in idle_e500.SScott Wood1-9/+0
This code can never be executed as it is only built when CONFIG_PPC_E500MC is unset, but the only CPUs that have CPU_FTR_L2CSR require CONFIG_PPC_E500MC and do not have the MSR/HID0-based nap mechanism that this file uses. Signed-off-by: Scott Wood <[email protected]>
2015-06-02of/fdt: split off FDT self reservation from memreserve processingArd Biesheuvel1-0/+1
This splits off the reservation of the memory occupied by the FDT binary itself from the processing of the memory reservations it contains. This is necessary because the physical address of the FDT, which is needed to perform the reservation, may not be known to the FDT driver core, i.e., it may be mapped outside the linear direct mapping, in which case __pa() returns a bogus value. Cc: Russell King <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Acked-by: Rob Herring <[email protected]> Acked-by: Mark Rutland <[email protected]> Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2015-06-02powerpc: Non relocatable system call doesn't need a trampolineAnton Blanchard1-1/+1
We need to use a trampoline when using LOAD_HANDLER(), because the destination needs to be in the first 64kB. An absolute branch has no such limitations, so just jump there. Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-06-02powerpc: Relocatable system call no longer uses the LRAnton Blanchard1-12/+4
We had some code to restore the LR in the relocatable system call path back when we used the LR to do an indirect branch. Commit 6a404806dfce ("powerpc: Avoid link stack corruption in MMU on syscall entry path") changed this to use the CTR which is volatile across system calls so does not need restoring. Remove the stale comment and the restore of the LR. Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-06-02powerpc/pci: add dma_set_mask to pci_controller_opsDaniel Axtens1-0/+8
Some systems only need to deal with DMA masks for PCI devices. For these systems, we can avoid the need for a platform hook and instead use a pci controller based hook. Signed-off-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-06-02powerpc: Remove MSI-related PCI controller ops from ppc_mdDaniel Axtens1-11/+4
Remove unneeded ppc_md functions. Patch callsites to use pci_controller_ops functions exclusively. Signed-off-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-22powerpc: Add MSI operations to pci_controller_ops structDaniel Axtens1-3/+15
Add MSI setup and teardown functions to pci_controller_ops. Patch the callsites (arch_{setup,teardown}_msi_irqs) to prefer the controller ops version if it's available. Signed-off-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-14powerpc: Align TOC to 256 bytesAnton Blanchard1-0/+1
Recent toolchains force the TOC to be 256 byte aligned. We need to enforce this alignment in our linker script, otherwise pointers to our TOC variables (__toc_start, __prom_init_toc_start) could be incorrect. If they are bad, we die a few hundred instructions into boot. Cc: [email protected] Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-13powerpc/eeh: remove unused macro IS_BRIDGEWei Yang1-2/+0
Currently, the macro IS_BRIDGE is not used any where. This patch just removes it. Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-13powerpc/eeh: fix comment for wait_state()Wei Yang1-1/+1
To retrieve the PCI slot state, EEH driver would set a timeout for that. While current comment is not aligned to what the code does. This patch fixes those comments according to the code. Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-13powerpc/eeh: fix start/end/flags type in struct pci_io_addr_range{}Wei Yang1-8/+8
struct pci_io_addr_range{} stores the information of pci resources. It would be better to keep these related fields have the same type as in struct resource{}. This patch fixes the start/end/flags type in struct pci_io_addr_range{} to have the same type as in struct resource{}. Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-12powerpc/eeh: Introduce eeh_pe_inject_err()Gavin Shan1-0/+35
The patch defines PCI error types and functions in uapi/asm/eeh.h and exports function eeh_pe_inject_err(), which will be called by VFIO driver to inject the specified PCI error to the indicated PE for testing purpose. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-12powerpc/mce: fix off by one errors in mce event handlingDaniel Axtens1-2/+2
Before 69111bac42f5 ("powerpc: Replace __get_cpu_var uses"), in save_mce_event, index got the value of mce_nest_count, and mce_nest_count was incremented *after* index was set. However, that patch changed the behaviour so that mce_nest count was incremented *before* setting index. This causes an off-by-one error, as get_mce_event sets index as mce_nest_count - 1 before reading mce_event. Thus get_mce_event reads bogus data, causing warnings like "Machine Check Exception, Unknown event version 0 !" and breaking MCEs handling. Restore the old behaviour and unbreak MCE handling by subtracting one from the newly incremented value. The same broken change occured in machine_check_queue_event (which set a queue read by machine_check_process_queued_event). Fix that too, unbreaking printing of MCE information. Fixes: 69111bac42f5 ("powerpc: Replace __get_cpu_var uses") CC: [email protected] CC: Mahesh Salgaonkar <[email protected]> CC: Christoph Lameter <[email protected]> Signed-off-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-11powerpc/vdso: Disable building the 32-bit VDSO on little endianMichael Ellerman2-3/+36
The only little endian configuration we support is ppc64le. As such if we're building little endian we don't need a 32-bit VDSO, because there is no 32-bit userspace. This patch is a fairly ugly mess of #ifdefs, but is the minimal logic required to disable the 32-bit VDSO. We can hopefully clean up the result in future with some further refactoring. Signed-off-by: Michael Ellerman <[email protected]>
2015-05-11powerpc/vdso: Combine start/size variablesMichael Ellerman1-29/+26
In vdso_fixup_features() we have start64/start32 and size64/size32, but they have the same types, ie. void * and unsigned long. They're only used to save the return value from find_sectionXX() for the subsequent call to do_feature_fixups(), so there's no overlap in their usage either. So we can just consolidate them into start/size and avoid the duplication. Signed-off-by: Michael Ellerman <[email protected]>
2015-05-11powerpc/vdso: Remove unused debug codeMichael Ellerman1-44/+0
It's in the git history if we ever need it back. Signed-off-by: Michael Ellerman <[email protected]>
2015-05-11powerpc: Show utsname->machine in boot-up bannerMichael Ellerman1-1/+2
Currently we print "Starting Linux PPC64" at boot. But we don't mention anywhere whether the kernel is big or little endian. If we print the utsname->machine value instead we get either "ppc64" or "ppc64le" which is much more informative, eg: Starting Linux ppc64le #1 SMP Wed Apr 15 12:12:20 AEST 2015 Signed-off-by: Michael Ellerman <[email protected]>
2015-05-11powerpc: export of_get_ibm_chip_id functionDan Streetman1-0/+1
Export the of_get_ibm_chip_id() function. This will be used by the PowerNV NX-842 driver. Signed-off-by: Dan Streetman <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2015-05-01powerpc/powernv: Restore non-volatile CRs after napSam Bobroff1-0/+2
Patches 7cba160ad "powernv/cpuidle: Redesign idle states management" and 77b54e9f2 "powernv/powerpc: Add winkle support for offline cpus" use non-volatile condition registers (cr2, cr3 and cr4) early in the system reset interrupt handler (system_reset_pSeries()) before it has been determined if state loss has occurred. If state loss has not occurred, control returns via the power7_wakeup_noloss() path which does not restore those condition registers, leaving them corrupted. Fix this by restoring the condition registers in the power7_wakeup_noloss() case. This is apparent when running a KVM guest on hardware that does not support winkle or sleep and the guest makes use of secondary threads. In practice this means Power7 machines, though some early unreleased Power8 machines may also be susceptible. The secondary CPUs are taken off line before the guest is started and they call pnv_smp_cpu_kill_self(). This checks support for sleep states (in this case there is no support) and power7_nap() is called. When the CPU is woken, power7_nap() returns and because the CPU is still off line, the main while loop executes again. The sleep states support test is executed again, but because the tested values cannot have changed, the compiler has optimized the test away and instead we rely on the result of the first test, which has been left in cr3 and/or cr4. With the result overwritten, the wrong branch is taken and power7_winkle() is called on a CPU that does not support it, leading to it stalling. Fixes: 7cba160ad789 ("powernv/cpuidle: Redesign idle states management") Fixes: 77b54e9f213f ("powernv/powerpc: Add winkle support for offline cpus") [mpe: Massage change log a bit more] Signed-off-by: Sam Bobroff <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-01powerpc/eeh: Delay probing EEH device during hotplugGavin Shan1-0/+6
Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH devices in early stage, which is reasonable to pSeries platform. However, it's wrong for PowerNV platform because the PE# isn't determined until the resources (IO and MMIO) are assigned to PE in hotplug case. So we have to delay probing EEH devices for PowerNV platform until the PE# is assigned. Fixes: ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn") Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-05-01powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()Gavin Shan1-1/+4
When asserting reset in pcibios_set_pcie_reset_state(), the PE is enforced to (hardware) frozen state in order to drop unexpected PCI transactions (except PCI config read/write) automatically by hardware during reset, which would cause recursive EEH error. However, the (software) frozen state EEH_PE_ISOLATED is missed. When users get 0xFF from PCI config or MMIO read, EEH_PE_ISOLATED is set in PE state retrival backend. Unfortunately, nobody (the reset handler or the EEH recovery functinality in host) will clear EEH_PE_ISOLATED when the PE has been passed through to guest. The patch sets and clears EEH_PE_ISOLATED properly during reset in function pcibios_set_pcie_reset_state() to fix the issue. Fixes: 28158cd ("Enhance pcibios_set_pcie_reset_state()") Reported-by: Carol L. Soto <[email protected]> Signed-off-by: Gavin Shan <[email protected]> Tested-by: Carol L. Soto <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-04-30Revert "powerpc/tm: Abort syscalls in active transactions"Michael Ellerman1-19/+0
This reverts commit feba40362b11341bee6d8ed58d54b896abbd9f84. Although the principle of this change is good, the implementation has a few issues. Firstly we can sometimes fail to abort a syscall because r12 may have been clobbered by C code if we went down the virtual CPU accounting path, or if syscall tracing was enabled. Secondly we have decided that it is safer to abort the syscall even earlier in the syscall entry path, so that we avoid the syscall tracing path when we are transactional. So that we have time to thoroughly test those changes we have decided to revert this for this merge window and will merge the fixed version in the next window. NB. Rather than reverting the selftest we just drop tm-syscall from TEST_PROGS so that it's not run by default. Fixes: feba40362b11 ("powerpc/tm: Abort syscalls in active transactions") Signed-off-by: Michael Ellerman <[email protected]>
2015-04-26Merge tag 'powerpc-4.1-2' of ↵Linus Torvalds2-7/+9
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc fixes from Michael Ellerman: - fix for mm_dec_nr_pmds() from Scott. - fixes for oopses seen with KVM + THP from Aneesh. - build fixes from Aneesh & Shreyas. * tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error powerpc/mm/thp: Return pte address if we find trans_splitting. powerpc/mm/thp: Make page table walk safe against thp split/collapse KVM: PPC: Remove page table walk helpers KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
2015-04-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-3/+23
Pull second batch of KVM changes from Paolo Bonzini: "This mostly includes the PPC changes for 4.1, which this time cover Book3S HV only (debugging aids, minor performance improvements and some cleanups). But there are also bug fixes and small cleanups for ARM, x86 and s390. The task_migration_notifier revert and real fix is still pending review, but I'll send it as soon as possible after -rc1" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits) KVM: arm/arm64: check IRQ number on userland injection KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi KVM: VMX: Preserve host CR4.MCE value while in guest mode. KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C KVM: PPC: Book3S HV: Streamline guest entry and exit KVM: PPC: Book3S HV: Use bitmap of active threads rather than count KVM: PPC: Book3S HV: Use decrementer to wake napping threads KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu KVM: PPC: Book3S HV: Minor cleanups KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update KVM: PPC: Book3S HV: Accumulate timing information for real-mode code KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT KVM: PPC: Book3S HV: Add ICP real mode counters KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock KVM: PPC: Book3S HV: Add guest->host real mode completion counters KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte ...
2015-04-21KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8Paul Mackerras1-0/+3
This uses msgsnd where possible for signalling other threads within the same core on POWER8 systems, rather than IPIs through the XICS interrupt controller. This includes waking secondary threads to run the guest, the interrupts generated by the virtual XICS, and the interrupts to bring the other threads out of the guest when exiting. Aggregated statistics from debugfs across vcpus for a guest with 32 vcpus, 8 threads/vcore, running on a POWER8, show this before the change: rm_entry: 3387.6ns (228 - 86600, 1008969 samples) rm_exit: 4561.5ns (12 - 3477452, 1009402 samples) rm_intr: 1660.0ns (12 - 553050, 3600051 samples) and this after the change: rm_entry: 3060.1ns (212 - 65138, 953873 samples) rm_exit: 4244.1ns (12 - 9693408, 954331 samples) rm_intr: 1342.3ns (12 - 1104718, 3405326 samples) for a test of booting Fedora 20 big-endian to the login prompt. The time taken for a H_PROD hcall (which is handled in the host kernel) went down from about 35 microseconds to about 16 microseconds with this change. The noinline added to kvmppc_run_core turned out to be necessary for good performance, at least with gcc 4.9.2 as packaged with Fedora 21 and a little-endian POWER8 host. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2015-04-21KVM: PPC: Book3S HV: Use bitmap of active threads rather than countPaul Mackerras1-1/+1
Currently, the entry_exit_count field in the kvmppc_vcore struct contains two 8-bit counts, one of the threads that have started entering the guest, and one of the threads that have started exiting the guest. This changes it to an entry_exit_map field which contains two bitmaps of 8 bits each. The advantage of doing this is that it gives us a bitmap of which threads need to be signalled when exiting the guest. That means that we no longer need to use the trick of setting the HDEC to 0 to pull the other threads out of the guest, which led in some cases to a spurious HDEC interrupt on the next guest entry. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2015-04-21KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_wokenPaul Mackerras1-1/+0
We can tell when a secondary thread has finished running a guest by the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there is no real need for the nap_count field in the kvmppc_vcore struct. This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu pointers of the secondary threads rather than polling vc->nap_count. Besides reducing the size of the kvmppc_vcore struct by 8 bytes, this also means that we can tell which secondary threads have got stuck and thus print a more informative error message. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2015-04-21KVM: PPC: Book3S HV: Minor cleanupsPaul Mackerras1-1/+0
* Remove unused kvmppc_vcore::n_busy field. * Remove setting of RMOR, since it was only used on PPC970 and the PPC970 KVM support has been removed. * Don't use r1 or r2 in setting the runlatch since they are conventionally reserved for other things; use r0 instead. * Streamline the code a little and remove the ext_interrupt_to_host label. * Add some comments about register usage. * hcall_try_real_mode doesn't need to be global, and can't be called from C code anyway. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2015-04-21KVM: PPC: Book3S HV: Accumulate timing information for real-mode codePaul Mackerras2-0/+19
This reads the timebase at various points in the real-mode guest entry/exit code and uses that to accumulate total, minimum and maximum time spent in those parts of the code. Currently these times are accumulated per vcpu in 5 parts of the code: * rm_entry - time taken from the start of kvmppc_hv_entry() until just before entering the guest. * rm_intr - time from when we take a hypervisor interrupt in the guest until we either re-enter the guest or decide to exit to the host. This includes time spent handling hcalls in real mode. * rm_exit - time from when we decide to exit the guest until the return from kvmppc_hv_entry(). * guest - time spend in the guest * cede - time spent napping in real mode due to an H_CEDE hcall while other threads in the same vcore are active. These times are exposed in debugfs in a directory per vcpu that contains a file called "timings". This file contains one line for each of the 5 timings above, with the name followed by a colon and 4 numbers, which are the count (number of times the code has been executed), the total time, the minimum time, and the maximum time, all in nanoseconds. The overhead of the extra code amounts to about 30ns for an hcall that is handled in real mode (e.g. H_SET_DABR), which is about 25%. Since production environments may not wish to incur this overhead, the new code is conditional on a new config symbol, CONFIG_KVM_BOOK3S_HV_EXIT_TIMING. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2015-04-17powerpc/mm/thp: Make page table walk safe against thp split/collapseAneesh Kumar K.V2-7/+9
We can disable a THP split or a hugepage collapse by disabling irq. We do send IPI to all the cpus in the early part of split/collapse, and disabling local irq ensure we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use returned pte_t pointer outside the irq disabled region. W.r.t to THP split, the pfn remains the same, but then a hugepage collapse will result in a pfn change. There are few steps we can take to avoid a hugepage collapse.One way is to take page reference inside the irq disable region. Other option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking pmd_lock. Another method used by kvm subsystem is to check whether we had a mmu_notifer update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-04-16Merge tag 'powerpc-4.1-1' of ↵Linus Torvalds30-308/+1439
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - Numerous minor fixes, cleanups etc. - More EEH work from Gavin to remove its dependency on device_nodes. - Memory hotplug implemented entirely in the kernel from Nathan Fontenot. - Removal of redundant CONFIG_PPC_OF by Kevin Hao. - Rewrite of VPHN parsing logic & tests from Greg Kurz. - A fix from Nish Aravamudan to reduce memory usage by clamping nodes_possible_map. - Support for pstore on powernv from Hari Bathini. - Removal of old powerpc specific byte swap routines by David Gibson. - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing your firmware when it wasn't. - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver. - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek. - Some fixes for migration from Tyrel Datwyler. - A new syscall to switch the cpu endian by Michael Ellerman. - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn. - A fix for the OPAL sensor driver from Cédric Le Goater. - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman. - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per machine. - Small patch from Sam Bobroff to explicitly abort non-suspended transactions on syscalls, plus a test to exercise it. - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu. - Small patch to enable the hard lockup detector from Anton Blanchard. - Fix from Dave Olson for missing L2 cache information on some CPUs. - Some fixes from Michael Ellerman to get Cell machines booting again. - Freescale updates from Scott: Highlights include BMan device tree nodes, an MSI erratum workaround, a couple minor performance improvements, config updates, and misc fixes/cleanup. * tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits) powerpc/powermac: Fix build error seen with powermac smp builds powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE powerpc: Remove PPC32 code from pseries specific find_and_init_phbs() powerpc/cell: Fix iommu breakage caused by controller_ops change powerpc/eeh: Fix crash in eeh_add_device_early() on Cell powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails powerpc/pseries: Correct memory hotplug locking powerpc: Fix missing L2 cache size in /sys/devices/system/cpu powerpc: Add ppc64 hard lockup detector support oprofile: Disable oprofile NMI timer on ppc64 powerpc/perf/hv-24x7: Add missing put_cpu_var() powerpc/perf/hv-24x7: Break up single_24x7_request powerpc/perf/hv-24x7: Define update_event_count() powerpc/perf/hv-24x7: Whitespace cleanup powerpc/perf/hv-24x7: Define add_event_to_24x7_request() powerpc/perf/hv-24x7: Rename hv_24x7_event_update powerpc/perf/hv-24x7: Move debug prints to separate function powerpc/perf/hv-24x7: Drop event_24x7_request() powerpc/perf/hv-24x7: Use pr_devel() to log message ... Conflicts: tools/testing/selftests/powerpc/Makefile tools/testing/selftests/powerpc/tm/Makefile
2015-04-14Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Usual trivial tree updates. Nothing outstanding -- mostly printk() and comment fixes and unused identifier removals" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: goldfish: goldfish_tty_probe() is not using 'i' any more powerpc: Fix comment in smu.h qla2xxx: Fix printks in ql_log message lib: correct link to the original source for div64_u64 si2168, tda10071, m88ds3103: Fix firmware wording usb: storage: Fix printk in isd200_log_config() qla2xxx: Fix printk in qla25xx_setup_mode init/main: fix reset_device comment ipwireless: missing assignment goldfish: remove unreachable line of code coredump: Fix do_coredump() comment stacktrace.h: remove duplicate declaration task_struct smpboot.h: Remove unused function prototype treewide: Fix typo in printk messages treewide: Fix typo in printk messages mod_devicetable: fix comment for match_flags
2015-04-14powerpc/eeh: Fix crash in eeh_add_device_early() on CellMichael Ellerman1-1/+1
The recent change to the EEH probing causes a crash on Cell because eeh_ops is NULL. Check if EEH is enabled and if not bail out. Fixes: ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn") Signed-off-by: Michael Ellerman <[email protected]>
2015-04-14Merge branch 'next-sriov' of ↵Michael Ellerman2-0/+149
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into next Merge Richard's work to support SR-IOV on PowerNV. All generic PCI patches acked by Bjorn. Some minor conflicts with Daniel's pci_controller_ops work. Conflicts: arch/powerpc/include/asm/machdep.h arch/powerpc/platforms/powernv/pci-ioda.c
2015-04-11powerpc: Fix missing L2 cache size in /sys/devices/system/cpuDave Olson1-10/+34
This problem appears to have been introduced in 2.6.29 by commit 93197a36a9c1 "Rewrite sysfs processor cache info code". This caused lscpu to error out on at least e500v2 devices, eg: error: cannot open /sys/devices/system/cpu/cpu0/cache/index2/size: No such file or directory Some embedded powerpc systems use cache-size in DTS for the unified L2 cache size, not d-cache-size, so we need to allow for both DTS names. Added a new CACHE_TYPE_UNIFIED_D cache_type_info structure to handle this. Fixes: 93197a36a9c1 ("powerpc: Rewrite sysfs processor cache info code") Signed-off-by: Dave Olson <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-04-11powerpc: Add ppc64 hard lockup detector supportAnton Blanchard1-0/+20
The hard lockup detector uses a PMU event as a periodic NMI to detect if we are stuck (where stuck means no timer interrupts have occurred). Ben's rework of the ppc64 soft disable code has made ppc64 PMU exceptions a partial NMI. They can get disabled if an external interrupt comes in, but otherwise PMU interrupts will fire in interrupt disabled regions. We disable the hard lockup detector by default for a few reasons: - It breaks userspace event based branches on POWER8. - It is likely to produce false positives on KVM guests. - Since PMCs can only count to 2^31, counting cycles means we might take multiple PMU exceptions per second per hardware thread even if our hard lockup timeout is 10 seconds. It can be enabled via a boot option, or via procfs. Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-04-11powerpc/tm: Abort syscalls in active transactionsSam bobroff1-0/+19
This patch changes the syscall handler to doom (tabort) active transactions when a syscall is made and return immediately without performing the syscall. Currently, the system call instruction automatically suspends an active transaction which causes side effects to persist when an active transaction fails. This does change the kernel's behaviour, but in a way that was documented as unsupported. It doesn't reduce functionality because syscalls will still be performed after tsuspend. It also provides a consistent interface and makes the behaviour of user code substantially the same across powerpc and platforms that do not support suspended transactions (e.g. x86 and s390). Performance measurements using http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of a system call increases by about 0.5%. Signed-off-by: Sam Bobroff <[email protected]> Acked-By: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-04-11powerpc: Remove shims for pci_controller_ops operationsDaniel Axtens3-10/+45
Remove shims, patch callsites to use pci_controller_ops versions instead. Also move back the probe mode defines, as explained in the patch for pci_probe_mode. Signed-off-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2015-04-11powerpc: fsl_pci, swiotlb: Move controller ops from ppc_md to controller_opsDaniel Axtens1-7/+4
Move the installation of DMA operations out of swiotlb's subsys initcall, and into the generic PCI controller operations struct. These ops are installed conditionally, based on the ppc_swiotlb_enable global. The global can be set in two places: - swiotlb_detect_4g, which is always called at the arch initcall level - setup_pci_atmu, which is called as part of the fsl_add_bridge and fsl_pci_syscore_do_resume. fsl_pci_syscore_do_resume is called late enough that any changes as a result of that call will have no effect. As such, if we test the global and set the operations as part of fsl_add_bridge, after the call to setup_pci_atmu, we can be confident that it will cover all the PCI implementations affected by the changes to dma-swiotlb.c. Signed-off-by: Daniel Axtens <[email protected]> Acked-by: Scott Wood <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>