path: root/arch/powerpc
2016-03-01  powerpc/ps3: gelic_udbg: use struct vlan_hdr from <linux/if_vlan.h>  (Luis Henriques; 1 file changed, -10/+6)
Instead of defining the local struct vlantag, use the standard definition of struct vlan_hdr from <linux/if_vlan.h>. The fields in the <linux/if_vlan.h> definition have different names:
- vlan -> h_vlan_TCI
- subtype -> h_vlan_encapsulated_proto
While there, also use the ETH_P_IP macro instead of a hard-coded 0x0800 value.
Signed-off-by: Luis Henriques <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
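As a rough illustration of the change described above (not the patch itself), a caller using the standard header might look like this; the helper name is made up:

#include <linux/if_ether.h>
#include <linux/if_vlan.h>
#include <linux/types.h>

/* Illustrative only: check whether a VLAN header encapsulates IPv4. */
static bool vlan_carries_ipv4(const struct vlan_hdr *vh)
{
	/* h_vlan_encapsulated_proto replaces the old local "subtype" field */
	return vh->h_vlan_encapsulated_proto == htons(ETH_P_IP);
}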
2016-03-01  powerpc/ps3: gelic_udbg: use struct ethhdr from <linux/if_ether.h>  (Luis Henriques; 1 file changed, -10/+7)
Instead of defining a local version of struct ethhdr, use the standard definition from <linux/if_ether.h>. The fields in the <linux/if_ether.h> definition have different names:
- dest -> h_dest
- src -> h_source
- type -> h_proto
While there, use a few other standard functions/macros:
- eth_broadcast_addr (instead of a memset)
- ETH_ALEN
- ETH_P_8021Q
Signed-off-by: Luis Henriques <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
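A hedged sketch of the same idea for the Ethernet header; the function and its arguments are illustrative, only the struct fields and helpers come from <linux/if_ether.h> and <linux/etherdevice.h>:

#include <linux/etherdevice.h>
#include <linux/if_ether.h>
#include <linux/string.h>

/* Illustrative only: build a broadcast 802.1Q frame header. */
static void fill_bcast_header(struct ethhdr *eh, const u8 *src_mac)
{
	eth_broadcast_addr(eh->h_dest);		/* old code: memset(dest, 0xff, 6) */
	memcpy(eh->h_source, src_mac, ETH_ALEN);
	eh->h_proto = htons(ETH_P_8021Q);	/* old "type" field is now h_proto */
}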
2016-02-29  powerpc/mm/book3s-64: Expand the real page number field of the Linux PTE  (Paul Mackerras; 2 files changed, -8/+8)
Now that other PTE fields have been moved out of the way, we can expand the RPN field of the PTE on 64-bit Book 3S systems and align it with the RPN field in the radix PTE format used by PowerISA v3.0 CPUs in radix mode. For 64k page size, this means we need to move the _PAGE_COMBO and _PAGE_4K_PFN bits. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-29  powerpc/mm/book3s-64: Move software-used bits in PTE  (Paul Mackerras; 1 file changed, -3/+3)
This moves the _PAGE_SPECIAL and _PAGE_SOFT_DIRTY bits in the Linux PTE on 64-bit Book 3S systems to bit positions which are designated for software use in the radix PTE format used by PowerISA v3.0 CPUs in radix mode. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-29  powerpc/mm/book3s-64: Shuffle read, write, execute and user bits in PTE  (Paul Mackerras; 1 file changed, -4/+6)
This moves the _PAGE_EXEC, _PAGE_RW and _PAGE_USER bits around in the Linux PTE on 64-bit Book 3S systems to correspond with the bit positions used in radix mode by PowerISA v3.0 CPUs. This also adds a _PAGE_READ bit corresponding to the read permission bit in the radix PTE. _PAGE_READ is currently unused but could possibly be used in future to improve pte_protnone(). Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-29  powerpc/mm/book3s-64: Move HPTE-related bits in PTE to upper end  (Paul Mackerras; 2 files changed, -6/+7)
This moves the _PAGE_HASHPTE, _PAGE_F_GIX and _PAGE_F_SECOND fields in the Linux PTE on 64-bit Book 3S systems to the most significant byte. Of the 5 bits, one is a software-use bit and the other four are reserved bit positions in the PowerISA v3.0 radix PTE format. Using these bits is OK because these bits are all to do with tracking the HPTE(s) associated with the Linux PTE, and therefore won't be needed in radix mode. This frees up bit positions in the lower two bytes. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-29  powerpc/mm/book3s-64: Move _PAGE_PTE to 2nd most significant bit  (Paul Mackerras; 1 file changed, -1/+1)
This changes _PAGE_PTE for 64-bit Book 3S processors from 0x1 to 0x4000_0000_0000_0000, because that bit is used as the L (leaf) bit by PowerISA v3.0 CPUs in radix mode. The "leaf" bit indicates that the PTE points to a page directly rather than another radix level, which is what the _PAGE_PTE bit means. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-29  powerpc/mm/book3s-64: Move _PAGE_PRESENT to the most significant bit  (Paul Mackerras; 4 files changed, -9/+11)
This changes _PAGE_PRESENT for 64-bit Book 3S processors from 0x2 to 0x8000_0000_0000_0000, because that is where PowerISA v3.0 CPUs in radix mode will expect to find it. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
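Taken together with the previous entry, the two moves put the top of the PTE as follows; a sketch using the values quoted in the commit messages (the real definitions live in the book3s-64 pgtable headers):

/* Positions after these two patches, per the commit messages above. */
#define _PAGE_PTE	0x4000000000000000UL	/* matches the radix "leaf" (L) bit */
#define _PAGE_PRESENT	0x8000000000000000UL	/* where radix-mode hardware expects it */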
2016-02-29  powerpc/mm/book3s-64: Use physical addresses in upper page table tree levels  (Paul Mackerras; 7 files changed, -18/+28)
This changes the Linux page tables to store physical addresses rather than kernel virtual addresses in the upper levels of the tree (pgd, pud and pmd) for 64-bit Book 3S machines. This also changes the hugepd pointers used to implement hugepages when the base page size is 4k to store physical addresses rather than virtual addresses (again just for 64-bit Book3S machines). This frees up some high order bits, and will be needed with PowerISA v3.0 machines which read the page table tree in hardware in radix mode. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-29  Merge tag 'v4.5-rc6' into locking/core, to pick up fixes  (Ingo Molnar; 16 files changed, -15/+105)
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29  Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes  (Ingo Molnar; 16 files changed, -15/+105)
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29  KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection  (Suresh E. Warrier; 3 files changed, -1/+16)
Redirecting the wakeup of a VCPU from the H_IPI hypercall to a core running in the host is usually a good idea; most workloads seemed to benefit. However, in one heavily interrupt-driven SMT1 workload, some regression was observed. This patch adds a kvm_hv module parameter called h_ipi_redirect to control this feature. The default value for this tunable is 1, i.e. the feature is enabled. Signed-off-by: Suresh Warrier <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
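A hedged sketch of such a tunable, assuming the usual module-parameter plumbing in kvm-hv (only the name h_ipi_redirect and the default of 1 come from the commit message):

#include <linux/moduleparam.h>

static int h_ipi_redirect = 1;	/* 1 = redirect H_IPI wakeups to a free host core */
module_param(h_ipi_redirect, int, 0644);
MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");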
2016-02-29  KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU  (Suresh E. Warrier; 3 files changed, -3/+110)
This patch adds support to real-mode KVM to search for a core running in the host partition and send it an IPI message with the VCPU to be woken. This avoids having to switch to the host partition to complete an H_IPI hypercall when the VCPU which is the target of the H_IPI is not loaded (is not running in the guest). The patch also includes the support in the IPI handler running in the host to do the wakeup by calling kvmppc_xics_ipi_action for the PPC_MSG_RM_HOST_ACTION message. When a guest is being destroyed, we need to ensure that there are no pending IPIs waiting to wake up a VCPU before we free the VCPUs of the guest. This is accomplished by:
- Forcing a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs before freeing any VCPUs in kvm_arch_destroy_vm().
- Ensuring that any PPC_MSG_RM_HOST_ACTION messages are executed before any other PPC_MSG_CALL_FUNCTION messages.
Signed-off-by: Suresh Warrier <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2016-02-29  KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM  (Suresh Warrier; 3 files changed, -0/+39)
This patch adds support for the kick VCPU operation in kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function provides the function to be invoked for a host side operation when poked by the real mode KVM. This is initiated by KVM by sending an IPI to any free host core. KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and rm_data to point to the VCPU to be woken up before sending the IPI. Note that we have allocated one kvmppc_host_rm_core structure per core. The above values need to be set in the structure corresponding to the core to which the IPI will be sent. Signed-off-by: Suresh Warrier <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
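A hedged sketch of the real-mode caller's side as described above; the structure layout and the wrapper function are illustrative, only rm_action, rm_data, XICS_RM_KICK_VCPU and PPC_MSG_RM_HOST_ACTION are named by these commits:

/* Illustrative: queue a "kick this VCPU" request for a chosen host core. */
static void rm_request_vcpu_kick(struct kvmppc_host_rm_core *rm_core,
				 struct kvm_vcpu *vcpu)
{
	rm_core->rm_data = vcpu;		/* which VCPU the host should wake */
	smp_wmb();				/* publish rm_data before the action */
	rm_core->rm_action = XICS_RM_KICK_VCPU;	/* what the host core should do */
	/* ...the caller then sends a PPC_MSG_RM_HOST_ACTION IPI to that core */
}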
2016-02-29  KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs  (Suresh Warrier; 1 file changed, -0/+39)
The kvmppc_host_rm_ops structure keeps track of which cores are in the host by maintaining a bitmask of active/runnable online CPUs that have not entered the guest. This patch adds support to manage the bitmask when a CPU is offlined or onlined in the host. Signed-off-by: Suresh Warrier <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2016-02-29  KVM: PPC: Book3S HV: Manage core host state  (Suresh Warrier; 1 file changed, -0/+44)
Update the core host state in kvmppc_host_rm_ops whenever the primary thread of the core enters the guest or returns to the host. Signed-off-by: Suresh Warrier <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2016-02-29  KVM: PPC: Book3S HV: Host-side RM data structures  (Suresh Warrier; 3 files changed, -0/+104)
This patch defines the data structures to support the setting up of host side operations while running in real mode in the guest, and also the functions to allocate and free them. The operations are for now limited to virtual XICS operations. Currently, we have only defined one operation in the data structure:
- Wake up a VCPU sleeping in the host when it receives a virtual interrupt
The operations are assigned at the core level because PowerKVM requires that the host run in SMT off mode. For each core, we will need to manage its state atomically - where the state is defined by:
1. Is the core running in the host?
2. Is there a Real Mode (RM) operation pending on the host?
Currently, core state is only managed at the whole-core level even when the system is in split-core mode. This just limits the number of free or "available" cores in the host to perform any host-side operations. The kvmppc_host_rm_core.rm_data field allows any data to be passed by KVM in real mode to the host core along with the operation to be performed. The kvmppc_host_rm_ops structure is allocated the very first time a guest VM is started. Initial core state is also set - all online cores are in the host. This structure is never deleted, not even when there are no active guests. However, it needs to be freed when the module is unloaded, because the kvmppc_host_rm_ops_hv can contain function pointers to kvm-hv.ko functions for the different supported host operations. Signed-off-by: Suresh Warrier <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
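A hedged sketch of the structures this entry describes; only rm_data, the per-core layout and the vcpu-kick operation are stated in the commit message, the remaining field names are illustrative:

struct kvmppc_host_rm_core {
	unsigned long rm_state;		/* "core in host?" / "RM action pending?" */
	unsigned long rm_action;	/* which host-side operation is requested */
	void *rm_data;			/* payload for that operation (e.g. a vcpu) */
};

struct kvmppc_host_rm_ops {
	struct kvmppc_host_rm_core *rm_core;		/* one entry per core */
	void (*vcpu_kick)(struct kvm_vcpu *vcpu);	/* wake a sleeping VCPU */
};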
2016-02-29  powerpc/xics: Add icp_native_cause_ipi_rm  (Suresh Warrier; 2 files changed, -0/+22)
Function to cause an IPI by directly updating the MFRR register in the XICS. The function is meant for real-mode callers, since they cannot use the smp_ops->cause_ipi function which uses an ioremapped address. Normal usage is for the KVM real mode code to set the IPI message using smp_muxed_ipi_message_pass and then invoke icp_native_cause_ipi_rm to cause the actual IPI. The function requires kvm_hstate.xics_phys to have been initialized with the physical address of XICS. Signed-off-by: Suresh Warrier <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2016-02-29  powerpc/smp: Add smp_muxed_ipi_set_message  (Suresh Warrier; 2 files changed, -1/+9)
smp_muxed_ipi_message_pass() invokes smp_ops->cause_ipi, which uses an ioremapped address to access registers on the XICS interrupt controller to cause the IPI. Because of this, real mode callers cannot call smp_muxed_ipi_message_pass() for IPI messaging. This patch creates a separate function, smp_muxed_ipi_set_message, just to set the IPI message without the cause_ipi routine. After calling this function to set the IPI message, real mode callers must cause the IPI by writing to the XICS registers directly. As part of this, we also change smp_muxed_ipi_message_pass to call smp_muxed_ipi_set_message to set the message instead of doing it directly inside the routine. Signed-off-by: Suresh Warrier <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
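A hedged usage sketch of the resulting two-step pattern for real-mode callers; the wrapper function and the header placement of icp_native_cause_ipi_rm are assumptions, the two callees are the ones added by this and the neighbouring commit:

#include <asm/smp.h>
#include <asm/xics.h>

/* Illustrative: real mode cannot use smp_ops->cause_ipi (ioremapped MMIO). */
static void rm_poke_host_cpu(int cpu)
{
	smp_muxed_ipi_set_message(cpu, PPC_MSG_RM_HOST_ACTION);	/* record the message */
	icp_native_cause_ipi_rm(cpu);	/* raise the IPI via the XICS physical address */
}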
2016-02-29  powerpc/smp: Support more IPI messages  (Suresh Warrier; 2 files changed, -4/+7)
This patch increases the number of demuxed messages for a controller with a single IPI to 8 for 64-bit systems. This is required because we want to use the IPI mechanism to send messages from a CPU running in KVM real mode in a guest to a CPU in the host to take some action. Currently, we only support 4 messages and all 4 are already taken. Define a fifth message PPC_MSG_RM_HOST_ACTION for this purpose. Signed-off-by: Suresh Warrier <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
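For reference, a sketch of the message space after this patch; the first four names are the pre-existing powerpc IPI messages (listed from memory, so treat them as assumptions), the fifth is the one being added:

#define PPC_MSG_CALL_FUNCTION	0
#define PPC_MSG_RESCHEDULE	1
#define PPC_MSG_TICK_BROADCAST	2
#define PPC_MSG_DEBUGGER_BREAK	3
#define PPC_MSG_RM_HOST_ACTION	4	/* new: real-mode guest -> host request */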
2016-02-27  mm: ASLR: use get_random_long()  (Daniel Cashman; 2 files changed, -4/+4)
Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address a shifting bug which, in the case of x86, removed the entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: Daniel Cashman <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: David S. Miller <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Al Viro <[email protected]> Cc: Nick Kralevich <[email protected]> Cc: Jeff Vander Stoep <[email protected]> Cc: Mark Salyzyn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
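A hedged sketch of the pattern being fixed; the wrapper function is illustrative and takes the randomisation width as a parameter to stay self-contained, while the real call sites differ per architecture:

#include <linux/random.h>

/* Illustrative: keep all mmap_rnd_bits of entropy, even when > 31. */
static unsigned long example_mmap_rnd(unsigned int mmap_rnd_bits)
{
	/* old: rnd = (unsigned long)get_random_int() & ((1UL << mmap_rnd_bits) - 1); */
	return get_random_long() & ((1UL << mmap_rnd_bits) - 1);
}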
2016-02-27  powerpc/mm/book3s-64: Free up 7 high-order bits in the Linux PTE  (Paul Mackerras; 5 files changed, -11/+14)
This frees up bits 57-63 in the Linux PTE on 64-bit Book 3S machines. In the 4k page case, this is done just by reducing the size of the RPN field to 39 bits, giving 51-bit real addresses. In the 64k page case, we had 10 unused bits in the middle of the PTE, so this moves the RPN field down 10 bits to make use of those unused bits. This means the RPN field is now 3 bits larger at 37 bits, giving 53-bit real addresses in the normal case, or 49-bit real addresses for the special 4k PFN case. We are doing this in order to be able to move some other PTE bits into the positions where PowerISA V3.0 processors will expect to find them in radix-tree mode. Ultimately we will be able to move the RPN field to lower bit positions and make it larger. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-27  powerpc/mm/book3s-64: Clean up some obsolete or misleading comments  (Paul Mackerras; 3 files changed, -14/+12)
No code changes. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-25  Merge tag 'powerpc-4.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds; 3 files changed, -4/+19)
Pull powerpc fixes from Michael Ellerman:
- eeh: Fix partial hotplug criterion from Gavin Shan
- mm: Clear the invalid slot information correctly from Aneesh Kumar K.V
* tag 'powerpc-4.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm/hash: Clear the invalid slot information correctly
  powerpc/eeh: Fix partial hotplug criterion
2016-02-25  net: Facility to report route quality of connected sockets  (Tom Herbert; 1 file changed, -0/+2)
This patch adds the SO_CNX_ADVICE socket option (setsockopt only). The purpose is to allow an application to give feedback to the kernel about the quality of the network path for a connected socket. The value argument indicates the type of quality report. For this initial patch the only supported advice is a value of 1, which indicates "bad path, please reroute" -- the action taken by the kernel is to call dst_negative_advice, which will attempt to choose a different ECMP route, reset the TX hash for flow label and UDP source port in encapsulation, etc. This facility should be useful for connected UDP sockets where only the application can provide any feedback about path quality. It could also be useful for TCP applications that have additional knowledge about the path outside of the normal TCP control loop. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
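A hedged userspace sketch of how an application might use the new option, assuming SO_CNX_ADVICE is visible through the socket headers on the target system:

#include <sys/socket.h>

/* Illustrative: tell the kernel the current path for this socket is bad. */
int report_bad_path(int sockfd)
{
	int advice = 1;		/* 1 = "bad path, please reroute" */

	return setsockopt(sockfd, SOL_SOCKET, SO_CNX_ADVICE,
			  &advice, sizeof(advice));
}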
2016-02-25  Merge tag 'powerpc-4.5-4' into next  (Michael Ellerman; 14 files changed, -11/+101)
Pull in our current fixes from 4.5, in particular the "Fix Multi hit ERAT" bug is causing folks some grief when testing next.
2016-02-25  KVM: Use simple waitqueue for vcpu->wq  (Marcelo Tosatti; 2 files changed, -14/+13)
The problem: On -rt, an emulated LAPIC timer instance has the following path:
1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled
This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up the vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: This patch reduces the average latency in my tests from 14us to 11us. Daniel writes: Paolo asked for numbers from the kvm-unit-tests/tscdeadline_latency benchmark on mainline. The test was run 1000 times on tip/sched/core 4.4.0-rc8-01134-g0905f04: ./x86-run x86/tscdeadline_latency.flat -cpu host with idle=poll. The test seems not to deliver really stable numbers though most of them are smaller. Paolo writes: "Anything above ~10000 cycles means that the host went to C1 or lower---the number means more or less nothing in that case. The mean shows an improvement indeed."
Before:
                min             max          mean           std
count   1000.000000     1000.000000   1000.000000   1000.000000
mean    5162.596000  2019270.084000   5824.491541  20681.645558
std       75.431231   622607.723969     89.575700   6492.272062
min     4466.000000    23928.000000   5537.926500    585.864966
25%     5163.000000  1613252.750000   5790.132275  16683.745433
50%     5175.000000  2281919.000000   5834.654000  23151.990026
75%     5190.000000  2382865.750000   5861.412950  24148.206168
max     5228.000000  4175158.000000   6254.827300  46481.048691
After:
                min             max          mean           std
count   1000.000000     1000.00000    1000.000000   1000.000000
mean    5143.511000  2076886.10300    5813.312474  21207.357565
std       77.668322   610413.09583      86.541500   6331.915127
min     4427.000000    25103.00000    5529.756600    559.187707
25%     5148.000000  1691272.75000    5784.889825  17473.518244
50%     5160.000000  2308328.50000    5832.025000  23464.837068
75%     5172.000000  2393037.75000    5853.177675  24223.969976
max     5222.000000  3922458.00000    6186.720500  42520.379830
[Patch was originally based on the swait implementation found in the -rt tree. Daniel ported it to mainline's version and gathered the benchmark numbers for the tscdeadline_latency test.]
Signed-off-by: Daniel Wagner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Boqun Feng <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-02-24  powerpc: Fix BUG_ON() reporting in real mode  (Balbir Singh; 1 file changed, -1/+9)
I ran into this issue while debugging an early boot problem. The system hit a BUG_ON() but report_bug() failed to print the line number and file name, the reason being that the system was running in real mode and report_bug() searches for addresses in the PAGE_OFFSET+ region. Suggested-by: Paul Mackerras <[email protected]> Signed-off-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-24  powerpc: Use BUILD_BUG_ON_MSG() for unsupported {cmp}xchg sizes  (pan xinhui; 1 file changed, -16/+7)
__xchg_called_with_bad_pointer() can't tell us which code uses {cmp}xchg with an unsupported size, and no error is reported until the link stage. To make such problems easier to debug, use BUILD_BUG_ON_MSG() instead. Signed-off-by: pan xinhui <[email protected]> [mpe: Tweak change log wording & add relaxed/acquire] Signed-off-by: Michael Ellerman <[email protected]>
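A hedged sketch of the resulting shape of the helper; the size-specific routines named here follow the existing powerpc cmpxchg.h and are assumptions in this sketch:

#include <linux/bug.h>
#include <linux/compiler.h>

static __always_inline unsigned long
__xchg(volatile void *ptr, unsigned long x, unsigned int size)
{
	switch (size) {
	case 4:
		return __xchg_u32(ptr, x);
#ifdef CONFIG_PPC64
	case 8:
		return __xchg_u64(ptr, x);
#endif
	}
	/* Dead-code eliminated for supported sizes; otherwise fails at compile time. */
	BUILD_BUG_ON_MSG(1, "Unsupported size for __xchg");
	return x;
}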
2016-02-24  powerpc/powernv: Add AST graphics driver to powernv_defconfig  (Jeremy Kerr; 1 file changed, -2/+2)
Most current OpenPOWER platforms have an AST BMC, so add graphics support via the AST DRM driver. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-24  powerpc/powernv: Add powernv firmware interface drivers to powernv_defconfig  (Jeremy Kerr; 1 file changed, -0/+6)
There are a few firmware-provided interfaces for OpenPOWER platforms: the PRD infrastructure, IPMI support, and MTD access to the PNOR flash. This change adds these to powernv_defconfig. Signed-off-by: Jeremy Kerr <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-24  powerpc/powernv: Add powernv_defconfig  (Jeremy Kerr; 1 file changed, -0/+307)
This change adds a defconfig for the non-virtualised power platforms, based on pseries_defconfig, but without pseries, and little-endian, and no OF trampoline. Signed-off-by: Jeremy Kerr <[email protected]> Acked-by: Joel Stanley <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-22  powerpc: Add POWER9 cputable entry  (Michael Neuling; 6 files changed, -12/+95)
Add a cputable entry for POWER9. More code is required to actually boot and run on a POWER9, but this gets the base piece in, which we can start building on. Copies over from POWER8 except for:
- Adds a new CPU_FTR_ARCH_300 bit to start hanging new architecture features from (in subsequent patches).
- Advertises new user feature bits PPC_FEATURE2_ARCH_3_00 & HAS_IEEE128 when on POWER9.
- Drops CPU_FTR_SUBCORE.
- Drops PMU code and machine check.
Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-22  powerpc: Use defines for __init_tlb_power[78]  (Michael Neuling; 1 file changed, -2/+3)
Use defines for literals __init_tlb_power[78] rather than hand coding them. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-22  powerpc/powernv: Create separate subcores CPU feature bit  (Michael Neuling; 2 files changed, -2/+3)
Subcores isn't really part of the 2.07 architecture but currently we turn it on using the 2.07 feature bit. Subcores is really a POWER8 specific feature. This adds a new CPU_FTR bit just for subcores and moves the subcore init code over to use this. Reviewed-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-22  powerpc/powernv: don't create OPAL msglog sysfs entry if memcons init fails  (Andrew Donnellan; 1 file changed, -0/+5)
When initialising OPAL interfaces, there is a possibility that opal_msglog_init() may fail to initialise the msglog/memory console. Fix opal_msglog_sysfs_init() so it doesn't try to create the sysfs entry for the msglog if this occurs. Suggested-by: Joel Stanley <[email protected]> Fixes: 9b4fffa14906 ("powerpc/powernv: new function to access OPAL msglog") Signed-off-by: Andrew Donnellan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-22  powerpc/mm/hash: Clear the invalid slot information correctly  (Aneesh Kumar K.V; 2 files changed, -2/+18)
We can get a hash pte fault with 4k base page size and find the pte already inserted with 64K base page size. In that case we need to clear the existing slot information from the old pte. Fix this correctly. With THP, we also clear the slot information for all the 64K hash pte mappings of that 16MB page; they are all invalid now. This makes sure we don't find the slot valid when we fault with 4k base page size. Finding the slot valid should not result in any wrong behavior, because we do check again in the hash page table for validity. But we can avoid that check completely. Fixes: a43c0eb8364c022 ("powerpc/mm: Convert 4k hash insert to C") Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-22  powerpc/eeh: Fix partial hotplug criterion  (Gavin Shan; 1 file changed, -2/+1)
During error recovery, the device could be removed as part of the partial hotplug. The criterion used to decide on partial hotplug is: if the device driver provides error_detected(), slot_reset() and resume() callbacks, it's immune from hotplug. Otherwise, it's going to experience partial hotplug during EEH recovery. But the criterion isn't correct enough: the mlx4_core driver for Mellanox adapters provides error_detected() and slot_reset() callbacks, but resume() isn't there, so those Mellanox adapters shouldn't have to go through the partial hotplug. This fixes the criterion to a practical one: an adapter whose driver provides error_detected() and slot_reset() will be immune from partial hotplug; resume() isn't mandatory. Fixes: f2da4ccf ("powerpc/eeh: More relaxed hotplug criterion") Cc: [email protected] #v4.4+ Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
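A hedged sketch of the adjusted test; the helper itself is illustrative, while the callback fields are the standard ones in struct pci_error_handlers:

#include <linux/pci.h>

/* Illustrative: drivers with these two callbacks are exempt from partial hotplug. */
static bool eeh_driver_can_recover(struct pci_driver *driver)
{
	return driver && driver->err_handler &&
	       driver->err_handler->error_detected &&
	       driver->err_handler->slot_reset;	/* resume() no longer required */
}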
2016-02-20  Merge tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds; 12 files changed, -7/+82)
Pull powerpc fixes from Michael Ellerman:
- Fix build error on 32-bit with checkpoint restart from Aneesh Kumar
- Fix dedotify for binutils >= 2.26 from Andreas Schwab
- Don't trace hcalls on offline CPUs from Denis Kirjanov
- eeh: Fix stale cached primary bus from Gavin Shan
- eeh: Fix stale PE primary bus from Gavin Shan
- mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V
- ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy
* tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/ioda: Set "read" permission when "write" is set
  powerpc/mm: Fix Multi hit ERAT cause by recent THP update
  powerpc/powernv: Fix stale PE primary bus
  powerpc/eeh: Fix stale cached primary bus
  powerpc/pseries: Don't trace hcalls on offline CPUs
  powerpc: Fix dedotify for binutils >= 2.26
  powerpc/book3s_32: Fix build error with checkpoint restart
2016-02-18  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()  (Dave Hansen; 1 file changed, -2/+3)
This plumbs a protection key through calc_vm_flag_bits(). We could have done this in calc_vm_prot_bits(), but I did not feel super strongly which way to go. It was pretty arbitrary which one to use. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arve Hjønnevåg <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Chen Gang <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Airlie <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Geliang Tang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Riley Andrews <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-18  mm/core, x86/mm/pkeys: Differentiate instruction fetches  (Dave Hansen; 1 file changed, -1/+1)
As discussed earlier, we attempt to enforce protection keys in software. However, the code checks all faults to ensure that they are not violating protection key permissions. It was assumed that all faults are either write faults where we check PKRU[key].WD (write disable) or read faults where we check the AD (access disable) bit. But, there is a third category of faults for protection keys: instruction faults. Instruction faults never run afoul of protection keys because they do not affect instruction fetches. So, plumb the PF_INSTR bit down in to the arch_vma_access_permitted() function where we do the protection key checks. We also add a new FAULT_FLAG_INSTRUCTION. This is because handle_mm_fault() is not passed the architecture-specific error_code where we keep PF_INSTR, so we need to encode the instruction fetch information in to the arch-generic fault flags. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-18  mm/core: Do not enforce PKEY permissions on remote mm access  (Dave Hansen; 1 file changed, -1/+2)
We try to enforce protection keys in software the same way that we do in hardware. (See long example below). But, we only want to do this when accessing our *own* process's memory. If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then tried to PTRACE_POKE a target process which just happened to have some mprotect_pkey(pkey=6) memory, we do *not* want to deny the debugger access to that memory. PKRU is fundamentally a thread-local structure and we do not want to enforce it on access to _another_ thread's data. This gets especially tricky when we have workqueues or other delayed-work mechanisms that might run in a random process's context. We can check that we only enforce pkeys when operating on our *own* mm, but delayed work gets performed when a random user context is active. We might end up with a situation where a delayed-work gup fails when running randomly under its "own" task but succeeds when running under another process. We want to avoid that. To avoid that, we use the new GUP flag: FOLL_REMOTE and add a fault flag: FAULT_FLAG_REMOTE. They indicate that we are walking an mm which is not guranteed to be the same as current->mm and should not be subject to protection key enforcement. Thanks to Jerome Glisse for pointing out this scenario. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Alexey Kardashevskiy <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boaz Harrosh <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Gibson <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Dominik Dingel <[email protected]> Cc: Dominik Vogt <[email protected]> Cc: Eric B Munson <[email protected]> Cc: Geliang Tang <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Low <[email protected]> Cc: Jerome Marchand <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mikulas Patocka <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Shachar Raindel <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xie XiuQi <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-18  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys  (Dave Hansen; 1 file changed, -0/+11)
Today, for normal faults and page table walks, we check the VMA and/or PTE to ensure that it is compatible with the action. For instance, if we get a write fault on a non-writeable VMA, we SIGSEGV. We try to do the same thing for protection keys. Basically, we try to make sure that if a user does this: mprotect(ptr, size, PROT_NONE); *ptr = foo; they see the same effects with protection keys when they do this: mprotect(ptr, size, PROT_READ|PROT_WRITE); set_pkey(ptr, size, 4); wrpkru(0xffffff3f); // access disable pkey 4 *ptr = foo; The state to do that checking is in the VMA, but we also sometimes have to do it on the page tables only, like when doing a get_user_pages_fast() where we have no VMA. We add two functions and expose them to generic code: arch_pte_access_permitted(pte_flags, write) arch_vma_access_permitted(vma, write) These are, of course, backed up in x86 arch code with checks against the PTE or VMA's protection key. But, there are also cases where we do not want to respect protection keys. When we ptrace(), for instance, we do not want to apply the tracer's PKRU permissions to the PTEs from the process being traced. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Alexey Kardashevskiy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Boaz Harrosh <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Gibson <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Vrabel <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Dominik Dingel <[email protected]> Cc: Dominik Vogt <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jason Low <[email protected]> Cc: Jerome Marchand <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mikulas Patocka <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Shachar Raindel <[email protected]> Cc: Stephen Smalley <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
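A hedged sketch of the generic fallbacks for the two hooks named above: architectures without protection keys permit everything, and x86 overrides them with PKRU/VMA pkey checks. The exact prototypes may differ slightly from this sketch:

#include <linux/mm_types.h>

static inline bool arch_pte_access_permitted(unsigned long pte_flags, bool write)
{
	return true;	/* no protection keys: nothing extra to enforce */
}

static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write)
{
	return true;
}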
2016-02-18  powerpc: atomic: Implement acquire/release/relaxed variants for cmpxchg  (Boqun Feng; 2 files changed, -1/+158)
Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on which _release variants can be built. To avoid superfluous barriers in the _acquire variants, we implement these operations with assembly code rather than use __atomic_op_acquire() to build them automatically. For the same reason, we keep the assembly implementation of the fully ordered cmpxchg operations. However, we don't do the same for _release, because that would require putting barriers in the middle of ll/sc loops, which is probably a bad idea. Note cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed are not compiler barriers. Signed-off-by: Boqun Feng <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-18  powerpc: atomic: Implement acquire/release/relaxed variants for xchg  (Boqun Feng; 2 files changed, -39/+32)
Implement xchg{,64}_relaxed and atomic{,64}_xchg_relaxed, based on these _relaxed variants, release/acquire variants and fully ordered versions can be built. Note that xchg{,64}_relaxed and atomic_{,64}_xchg_relaxed are not compiler barriers. Signed-off-by: Boqun Feng <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-18  powerpc: atomic: Implement atomic{, 64}_*_return_* variants  (Boqun Feng; 1 file changed, -62/+85)
On powerpc, acquire and release semantics can be achieved with lightweight barriers ("lwsync" and "ctrl+isync"), which can be used to implement __atomic_op_{acquire,release}. For release semantics, since we only need to ensure that all memory accesses issued before the atomic take effect before its -store- part, "lwsync" is all we need. On platforms without "lwsync", "sync" should be used. Therefore in __atomic_op_release() we use PPC_RELEASE_BARRIER. For acquire semantics, "lwsync" is again all we need, for a similar reason. However, on platforms without "lwsync", we can use "isync" rather than "sync" as an acquire barrier. Therefore in __atomic_op_acquire() we use PPC_ACQUIRE_BARRIER, which is barrier() on UP, "lwsync" if available and "isync" otherwise. Implement atomic{,64}_{add,sub,inc,dec}_return_relaxed, and build the other variants with these helpers. Signed-off-by: Boqun Feng <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
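A hedged sketch of the two helper macros this entry describes, built around a _relaxed operation plus the powerpc barrier macros; the exact expansion in the tree may differ slightly:

#include <asm/synch.h>

#define __atomic_op_acquire(op, args...)				\
({									\
	typeof(op##_relaxed(args)) __ret = op##_relaxed(args);		\
	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");	\
	__ret;								\
})

#define __atomic_op_release(op, args...)				\
({									\
	__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
	op##_relaxed(args);						\
})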
2016-02-18  powerpc: Fix kgdb on little endian ppc64le  (Balbir Singh; 1 file changed, -0/+4)
I spent some time trying to use kgdb and debugged my inability to resume from kgdb_handle_breakpoint(). NIP is not incremented and that leads to a loop in the debugger. I've tested this lightly on a virtual instance with KDB enabled. After the patch, I am able to get the "go" command to work as expected. Signed-off-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-02-17  powerpc/ioda: Set "read" permission when "write" is set  (Alexey Kardashevskiy; 1 file changed, -0/+6)
Quite often drivers set only "write" permission assuming that this includes "read" permission as well and this works on plenty of platforms. However IODA2 is strict about this and produces an EEH when "read" permission is not set and reading happens. This adds a workaround in the IODA code to always add the "read" bit when the "write" bit is set. Fixes: 10b35b2b7485 ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE") Cc: [email protected] # 4.2+ Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Tested-by: Douglas Miller <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
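A hedged sketch of the workaround; the flag names follow the powerpc TCE definitions, and the wrapper function is illustrative:

#include <asm/tce.h>

/* Illustrative: IODA2 raises an EEH on reads through write-only mappings. */
static unsigned long ioda_fixup_tce_perm(unsigned long tce)
{
	if (tce & TCE_PCI_WRITE)
		tce |= TCE_PCI_READ;	/* "write" now always implies "read" */
	return tce;
}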
2016-02-17  crypto: xts - fix compile errors  (Stephan Mueller; 1 file changed, -0/+1)
Commit 28856a9e52c7 missed the addition of the crypto/xts.h include file for different architecture-specific AES implementations. Signed-off-by: Stephan Mueller <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2016-02-17  crypto: xts - consolidate sanity check for keys  (Stephan Mueller; 1 file changed, -0/+5)
The patch centralizes the XTS key check logic into the service function xts_check_key, which is invoked from the different XTS implementations. With this, the XTS implementations in ARM, ARM64, PPC and S390 now have a sanity check for the XTS keys similar to the other arches. In addition, this service function received a check to ensure that the key != the tweak key, which is mandated by FIPS 140-2 IG A.9. As this check is not present in the standards defining XTS, it is only enforced in FIPS mode of the kernel. Signed-off-by: Stephan Mueller <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
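A hedged usage sketch: an architecture-specific XTS setkey() calling the new centralised check before expanding the key halves. The surrounding function is illustrative; xts_check_key() itself comes from crypto/xts.h:

#include <crypto/xts.h>
#include <linux/crypto.h>

static int example_xts_setkey(struct crypto_tfm *tfm, const u8 *key,
			      unsigned int keylen)
{
	int err;

	/* Rejects bad key sizes and, in FIPS mode, key == tweak key. */
	err = xts_check_key(tfm, key, keylen);
	if (err)
		return err;

	/* ...then expand the data key and the tweak key halves as before... */
	return 0;
}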