path: root/arch/powerpc/mm
2018-02-13  powerpc/mm: Fix crashes with 16G huge pages  (Aneesh Kumar K.V; 4 files, -2/+6)

To support memory keys, we moved the hash PTE slot information to the second half of the page table. This was fine for PTE entries at level 4 (PTE page) and level 3 (PMD), because we already allocate larger page table pages at those levels to accommodate the extra details: at level 4 the extra space was used to track 4K hash page table entry details, and at level 3 it was allocated to track THP details. With hugetlbfs PTEs we used this extra space at the PMD level to store the slot details. But we also support hugetlbfs PTEs at the PUD level for 16GB pages, and PUD-level pages did not allocate the extra space. This resulted in memory corruption.

Fix this by allocating extra space at the PUD level when HUGETLB is enabled.

Fixes: bf9a95f9a648 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-02-13  powerpc/mm: Flush radix process translations when setting MMU type  (Alexey Kardashevskiy; 1 file, -0/+2)

Radix guests normally invalidate process-scoped translations when a new PID is allocated, but migrated guests do not, so migrated guests crash some time after migration. This is especially easy to reproduce when migration happens within the first 10 seconds after guest boot starts, on the same machine. This adds the "Invalidate process-scoped translations" flush to fix migration of radix guests.

Fixes: 2ee13be34b13 ("KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00")
Cc: [email protected] # v4.10+
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Tested-by: Laurent Vivier <[email protected]>
Tested-by: Daniel Henrique Barboza <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-02-08  powerpc/mm/radix: Split linear mapping on hot-unplug  (Balbir Singh; 1 file, -21/+74)

This patch splits the linear mapping if the hot-unplug range is smaller than the mapping size. The code detects whether the mapping needs to be split into a smaller size and, if so, uses the stop-machine infrastructure to clear the existing mapping and then remap the remaining range using a smaller page size. The code will skip any region of the mapping that overlaps with kernel text, and warn about it once: we don't want to remove a mapping where the kernel text and the LMB we intend to remove share the same TLB mapping, as that may affect the currently executing code.

I've tested these changes under a KVM guest with 2 vcpus from a split-mapping point of view; some of the caveats mentioned above applied to the testing I did.

Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()")
Signed-off-by: Balbir Singh <[email protected]>
[mpe: Tweak change log to match updated behaviour]
Signed-off-by: Michael Ellerman <[email protected]>
2018-02-08  powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID  (Nicholas Piggin; 1 file, -1/+20)

This change restores and formalises the behaviour that access to NULL or other user addresses by the kernel during boot should fault rather than succeed and modify memory. This was inadvertently broken when fixing another bug, because it was previously not well defined and only worked by chance.

powerpc/64s/radix uses the high address bits to select an address space "quadrant", which determines which PID and LPID are used to translate the rest of the address (the effective PID and effective LPID). The kernel mapping at 0xC... selects quadrant 3, which uses PID=0 and LPID=0, so the kernel page tables are installed in the PID 0 process table entry.

An address at 0x0... selects quadrant 0, which uses PID=PIDR for translating the rest of the address (that is, it uses the value of the PIDR register as the effective PID). If PIDR=0, the translation is performed with the PID 0 process table entry's page tables. This is the kernel mapping, so we effectively get another copy of the kernel address space at 0. A NULL pointer access will access physical memory address 0.

To prevent duplicating the kernel address space in quadrant 0, this patch allocates a guard PID containing no translations, and initializes PIDR with it during boot, before the MMU is switched on. Any kernel access to quadrant 0 will use this guard PID for translation, find no valid mappings, and therefore fault.

After boot, this PID will be switched away to user context PIDs, but those contain user mappings (and usually NULL pointer protection) rather than the kernel mapping, which is much safer (and by design). This may be tightened further in future, and the guard PID could be used for that.

Commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table") introduced this problem because it zeroes PIDR at boot. Previously the value was inherited from firmware or kexec, which is not robust and can be zero (e.g., mambo).

Fixes: 371b80447ff3 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table")
Cc: [email protected] # v4.15+
Reported-by: Florian Weimer <[email protected]>
Tested-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
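[Editor's note] A minimal sketch of the guard-PID idea described above. The hook name radix_setup_guard_pid() and the choice of PID 1 are assumptions for illustration; the actual patch may allocate the PID differently.

    #include <asm/reg.h>    /* mfspr/mtspr, SPRN_PID */
    #include <asm/synch.h>  /* isync() */

    /* Hedged sketch: install a PID with no translations into PIDR before
     * the MMU is enabled, so any quadrant-0 (0x0...) access by the kernel
     * finds no valid mapping and faults, instead of hitting a second copy
     * of the kernel address space. */
    static unsigned int guard_pid;

    static void __init radix_setup_guard_pid(void)
    {
        guard_pid = 1;                  /* assumed: kept free of mappings */
        mtspr(SPRN_PID, guard_pid);     /* effective PID for quadrant 0 */
        isync();
    }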
2018-02-08  powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove  (Nathan Fontenot; 1 file, -5/+0)

When DLPAR removing a CPU, the unmapping of the CPU from a node in unmap_cpu_from_node() should also invalidate the CPU's entry in the numa_cpu_lookup_table. There is no guarantee that on a subsequent DLPAR add of the CPU the associativity will be the same, and thus it could be in a different node. Invalidating the entry in the numa_cpu_lookup_table causes the associativity to be read from the device tree at the time of the add.

The current behavior of not invalidating the CPU's entry in the numa_cpu_lookup_table can result in scenarios where the topology layout of CPUs in the partition does not match the device tree or the topology reported by the HMC.

This bug looks like it was introduced in 2004 in the commit titled "ppc64: cpu hotplug notifier for numa", which is 6b15e4e87e32 in the linux-fullhist tree. Hence tag it for all stable releases.

Cc: [email protected]
Signed-off-by: Nathan Fontenot <[email protected]>
Reviewed-by: Tyrel Datwyler <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
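[Editor's note] The shape of the fix, sketched under the assumption that unmap_cpu_from_node() in arch/powerpc/mm/numa.c looks roughly like this; the exact body in the patch may differ.

    /* Hedged sketch: besides clearing the CPU from the node's cpumask,
     * forget the cached node so a later DLPAR add re-reads the
     * associativity from the device tree. */
    static void unmap_cpu_from_node(unsigned long cpu)
    {
        int node = numa_cpu_lookup_table[cpu];

        if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) {
            cpumask_clear_cpu(cpu, node_to_cpumask_map[node]);
            numa_cpu_lookup_table[cpu] = -1;    /* the added invalidation */
        } else {
            pr_warn("cpu %lu not found in node %d\n", cpu, node);
        }
    }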
2018-02-06  Merge branch 'linus' into sched/urgent, to resolve conflicts  (Ingo Molnar; 25 files, -406/+1473)

Conflicts:
    arch/arm64/kernel/entry.S
    arch/x86/Kconfig
    include/linux/sched/mm.h
    kernel/fork.c

Signed-off-by: Ingo Molnar <[email protected]>
2018-02-06  Merge tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm  (Linus Torvalds; 2 files, -15/+13)

Pull libnvdimm updates from Ross Zwisler:

 - Require struct page by default for filesystem DAX to remove a number of surprising failure cases. This includes failures with direct I/O, gdb and fork(2).

 - Add support for the new Platform Capabilities Structure added to the NFIT in ACPI 6.2a. This new table tells us whether the platform supports flushing of CPU and memory controller caches on unexpected power loss events.

 - Revamp vmem_altmap and dev_pagemap handling to clean up code and better support future PCI P2P uses.

 - Deprecate the ND_IOCTL_SMART_THRESHOLD command, whose payload has become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.

 - Enhance nfit_test so we can test some of the new things added in version 1.6 of the DSM specification. This includes testing firmware download and simulating the Last Shutdown State (LSS) status.

* tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
  libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
  acpi, nfit: fix register dimm error handling
  libnvdimm, namespace: make min namespace size 4K
  tools/testing/nvdimm: force nfit_test to depend on instrumented modules
  libnvdimm/nfit_test: adding support for unit testing enable LSS status
  libnvdimm/nfit_test: add firmware download emulation
  nfit-test: Add platform cap support from ACPI 6.2a to test
  libnvdimm: expose platform persistence attribute for nd_region
  acpi: nfit: add persistent memory control flag for nd_region
  acpi: nfit: Add support for detect platform CPU cache flush on power loss
  device-dax: Fix trailing semicolon
  libnvdimm, btt: fix uninitialized err_lock
  dax: require 'struct page' by default for filesystem dax
  ext2: auto disable dax instead of failing mount
  ext4: auto disable dax instead of failing mount
  mm, dax: introduce pfn_t_special()
  mm: Fix devm_memremap_pages() collision handling
  mm: Fix memory size alignment in devm_memremap_pages_release()
  memremap: merge find_dev_pagemap into get_dev_pagemap
  memremap: change devm_memremap_pages interface to use struct dev_pagemap
  ...
2018-02-05  powerpc, membarrier: Skip memory barrier in switch_mm()  (Mathieu Desnoyers; 1 file, -0/+7)

Allow PowerPC to skip the full memory barrier in switch_mm(), and only issue the barrier when scheduling into a task belonging to a process that has registered to use expedited private.

Threads targeting the same VM but which belong to different thread groups are a tricky case. It has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the number of threads using a VM. We can use (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) instead to skip the synchronize_sched() for cases where the VM only has a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set thread flags in all threads using a VM, as it only iterates over the thread group.

Therefore, test the membarrier state variable directly rather than relying on thread flags. This means membarrier_register_private_expedited() needs to set the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY, which allows private expedited membarrier commands to succeed. membarrier_arch_switch_mm() now tests for the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andrea Parri <[email protected]>
Cc: Andrew Hunter <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Dave Watson <[email protected]>
Cc: David Sehr <[email protected]>
Cc: Greg Hackmann <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Maged Michael <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Russell King <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
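[Editor's note] A sketch of the resulting powerpc hook, following the flag names in the commit text above; close to, but not necessarily identical to, the merged code.

    /* Hedged sketch: issue the full barrier in switch_mm() only when the
     * incoming mm has registered for private expedited membarrier and we
     * are actually switching mms. */
    static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
                                                 struct mm_struct *next,
                                                 struct task_struct *tsk)
    {
        if (likely(!(atomic_read(&next->membarrier_state) &
                     MEMBARRIER_STATE_PRIVATE_EXPEDITED) || prev == next))
            return;

        /* The membarrier system call requires a full memory barrier
         * after storing to rq->curr, before returning to user-space. */
        smp_mb();
    }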
2018-02-03  Merge branch 'for-4.16/nfit' into libnvdimm-for-next  (Ross Zwisler; 1 file, -1/+6)
2018-02-02  Merge tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds; 23 files, -367/+1455)

Pull powerpc updates from Michael Ellerman:

 "Highlights:

  - Enable support for memory protection keys aka "pkeys" on Power7/8/9 when using the hash table MMU.

  - Extend our interrupt soft masking to support masking PMU interrupts as well as "normal" interrupts, and then use that to implement local_t for a ~4x speedup vs the current atomics-based implementation.

  - A new driver "ocxl" for "Open Coherent Accelerator Processor Interface (OpenCAPI)" devices.

  - Support for new device tree properties on PowerVM to describe hotpluggable memory and devices.

  - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit VDSO.

  - Freescale updates from Scott: fixes for CPM GPIO and an FSL PCI erratum workaround, plus a minor cleanup patch.

  As well as quite a lot of other changes all over the place, and small fixes and cleanups as always.

  Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G. Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur, David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva, Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee, Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann, Vaibhav Jain, Vasyl Gomonovych"

* tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (199 commits)
  powerpc/mm/radix: Fix build error when RADIX_MMU=n
  macintosh/ams-input: Use true and false for boolean values
  macintosh: change some data types from int to bool
  powerpc/watchdog: Print the NIP in soft_nmi_interrupt()
  powerpc/watchdog: regs can't be null in soft_nmi_interrupt()
  powerpc/watchdog: Tweak watchdog printks
  powerpc/cell: Remove axonram driver
  rtc-opal: Fix handling of firmware error codes, prevent busy loops
  powerpc/mpc52xx_gpt: make use of raw_spinlock variants
  macintosh/adb: Properly mark continued kernel messages
  powerpc/pseries: Fix cpu hotplug crash with memoryless nodes
  powerpc/numa: Ensure nodes initialized for hotplug
  powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
  powerpc/kernel: Block interrupts when updating TIDR
  powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
  powerpc/mm/nohash: do not flush the entire mm when range is a single page
  powerpc/pseries: Add Initialization of VF Bars
  powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV
  powerpc/eeh: Add EEH notify resume sysfs
  powerpc/eeh: Add EEH operations to notify resume
  ...
2018-01-31  mm/thp: remove pmd_huge_split_prepare()  (Aneesh Kumar K.V; 1 file, -22/+0)

Instead of marking the pmd ready for split, invalidate the pmd. This should take care of the powerpc requirement. The only side effect is that we mark the pmd invalid early. This can result in us blocking access to the page a bit longer if we race against a thp split.

[[email protected]: rebased, dirty THP once]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Daney <[email protected]>
Cc: David Miller <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31  powerpc/mm: update pmdp_invalidate to return old pmd value  (Aneesh Kumar K.V; 1 file, -2/+5)

It's required to avoid losing dirty and accessed bits.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2018-01-27  powerpc/pseries: Fix cpu hotplug crash with memoryless nodes  (Michael Bringmann; 1 file, -1/+3)

On powerpc systems with shared configurations of CPUs and memory and memoryless nodes at boot, an event ordering problem was observed on SLES12 build platforms with the hot-add of CPUs to the memoryless nodes.

* The most common error occurred when the memory SLAB driver attempted to reference the memoryless node to which a CPU was being added, before the kernel had finished initializing all of the data structures for the CPU and exited 'device_online' under DLPAR/hot-add. Normally the memoryless node would be initialized through the call path device_online ... arch_update_cpu_topology ... find_cpu_nid ... try_online_node.

This patch ensures that the powerpc node will be initialized as early as possible, even if it was memoryless and CPU-less at the point when we are trying to hot-add a new CPU to it.

Signed-off-by: Michael Bringmann <[email protected]>
Reviewed-by: Nathan Fontenot <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-27  powerpc/numa: Ensure nodes initialized for hotplug  (Michael Bringmann; 1 file, -10/+37)

This patch fixes some problems encountered at runtime with configurations that support memoryless nodes, or that hot-add CPUs into nodes that are memoryless during system execution after boot. The problems of interest include:

* Nodes known to powerpc to be memoryless at boot, but to have CPUs in them, are allowed to be 'possible' and 'online'. Memory allocations for those nodes are taken from another node that does have memory, until and if memory is hot-added to the node.

* Nodes which have no resources assigned at boot, but which may still be referenced subsequently by affinity or associativity attributes, are kept in the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs to the system can reference these nodes and bring them online instead of redirecting the references to one of the set of nodes known to have memory at boot.

Note that this software operates under the context of CPU hotplug. We are not doing memory hotplug in this code, but rather updating the kernel's CPU topology (i.e. arch_update_cpu_topology / numa_update_cpu_topology). We are initializing a node that may be used by CPUs or memory before it can be referenced as invalid by a CPU hotplug operation. CPU hotplug operations are protected by a range of APIs including cpu_maps_update_begin/cpu_maps_update_done, cpus_read/write_lock / cpus_read/write_unlock, device locks, and more. Memory hotplug operations, including try_online_node, are protected by mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the case of CPUs being hot-added to a previously memoryless node, the try_online_node operation occurs wholly within the CPU locks with no overlap.

Using HMC hot-add/hot-remove operations, we have been able to add and remove CPUs to any possible node without failures. HMC operations involve a degree of self-serialization, though.

Signed-off-by: Michael Bringmann <[email protected]>
Reviewed-by: Nathan Fontenot <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-27  powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes  (Michael Bringmann; 1 file, -3/+34)

On powerpc systems which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized. These empty nodes may occur when:

* Dedicated vs. shared resources. Shared resources require information such as the VPHN hcall for CPU assignment to nodes. Associativity decisions made based on dedicated resource rules, such as associativity properties in the device tree, may vary from decisions made using the values returned by the VPHN hcall.

* Memoryless nodes at boot. Nodes need to be defined as 'possible' at boot for operation with other code modules. Previously, the powerpc code would limit the set of possible nodes to those which have memory assigned at boot and were thus online. Subsequent add/remove of CPUs or memory would only work with this subset of possible nodes.

* Memoryless nodes with CPUs at boot. Due to the previous restriction on nodes, nodes that had CPUs but no memory were being collapsed into other nodes that did have memory at boot. In practice this meant that the node assignment presented by the runtime kernel differed from the affinity and associativity attributes presented by the device tree or VPHN hcalls. Nodes that might be known to the pHyp were not 'possible' in the runtime kernel because they did not have memory at boot.

This patch ensures that sufficient nodes are defined to support configuration requirements after boot, as well as at boot. This patch set fixes a couple of problems:

* Nodes known to powerpc to be memoryless at boot, but to have CPUs in them, are allowed to be 'possible' and 'online'. Memory allocations for those nodes are taken from another node that does have memory, until and if memory is hot-added to the node.

* Nodes which have no resources assigned at boot, but which may still be referenced subsequently by affinity or associativity attributes, are kept in the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs to the system can reference these nodes and bring them online instead of redirecting to one of the set of nodes that were known to have memory at boot.

This patch extracts the value of the lowest domain level (number of allocable resources) from the device tree property "ibm,max-associativity-domains" to use as the maximum number of nodes to set up as possibly available in the system. This new setting will override the instruction

    nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). If the "ibm,max-associativity-domains" property is not present at boot, no operation will be performed to define or enable additional nodes, or to enable the above 'nodes_and()'.

Signed-off-by: Michael Bringmann <[email protected]>
Reviewed-by: Nathan Fontenot <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
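[Editor's note] A sketch of how the property can be consumed, assuming the helper name find_possible_nodes() and that min_common_depth indexes the relevant cell, as in arch/powerpc/mm/numa.c; details may differ from the merged code.

    /* Hedged sketch: read the lowest-level cell of
     * "ibm,max-associativity-domains" and mark that many nodes possible,
     * instead of clamping node_possible_map to nodes online at boot. */
    static void __init find_possible_nodes(void)
    {
        struct device_node *rtas;
        u32 numnodes, i;

        rtas = of_find_node_by_path("/rtas");
        if (!rtas)
            return;

        if (of_property_read_u32_index(rtas,
                "ibm,max-associativity-domains",
                min_common_depth, &numnodes))
            goto out;

        for (i = 0; i < numnodes; i++) {
            if (!node_possible(i))
                node_set(i, node_possible_map);
        }
    out:
        of_node_put(rtas);
    }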
2018-01-27  powerpc/mm/nohash: do not flush the entire mm when range is a single page  (Christophe Leroy; 1 file, -1/+4)

Most of the time, flush_tlb_range() is called on single pages. Currently, flush_tlb_range() unconditionally calls flush_tlb_mm(), which flushes at least the entire PID's pages, and on older CPUs like the 4xx or 8xx it flushes the entire TLB table. This patch calls flush_tlb_page() instead of flush_tlb_mm() when the range is a single page.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
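[Editor's note] The change amounts to a one-page special case. A hedged sketch of the shape; the guard condition in the actual patch may differ slightly.

    /* Hedged sketch: a range covering exactly one page takes the cheap
     * per-page flush instead of flushing the whole mm (or, on 4xx/8xx,
     * the entire TLB). */
    void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
                         unsigned long end)
    {
        if (end - start == PAGE_SIZE && !(start & ~PAGE_MASK))
            flush_tlb_page(vma, start);
        else
            flush_tlb_mm(vma->vm_mm);
    }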
2018-01-22  powerpc/radix: Remove trace_tlbie call from radix__flush_tlb_all  (Mahesh Salgaonkar; 1 file, -2/+0)

radix__flush_tlb_all() is called only in the kexec path in real mode, and any tracepoints enabled at this stage will make kexec fail. To verify, enable the tlbie tracepoint before kexec:

    $ echo 1 > /sys/kernel/debug/tracing/events/powerpc/tlbie/enable
    == kexec into new kernel, and kexec fails.

Fix this by not calling trace_tlbie from radix__flush_tlb_all().

Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions")
Cc: [email protected] # v4.13+
Signed-off-by: Mahesh Salgaonkar <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-22  powerpc/pseries: Don't give a warning when HPT resizing isn't available  (David Gibson; 1 file, -1/+1)

As of 438cc81a41 ("powerpc/pseries: Automatically resize HPT for memory hot add/remove"), when running on the pseries platform we always attempt to use the PAPR extension to resize the hashed page table (HPT) when we add or remove memory. This is fine, but when the extension isn't available we'll print a harmless but scary warning. This patch suppresses the warning in that case. It will still warn if the feature is supposed to be available but didn't work.

Signed-off-by: David Gibson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-21  powerpc/hash: Skip non initialized page size in init_hpte_page_sizes  (Aneesh Kumar K.V; 1 file, -1/+1)

One of the easiest ways to test a config with 4K HPTEs is to disable the 64K hardware page size like below:

    int __init htab_dt_scan_page_sizes(unsigned long node,

    		size -= 3; prop += 3;
    		base_idx = get_idx_from_shift(base_shift);
    -		if (base_idx < 0) {
    +		if (base_idx < 0 || base_idx == MMU_PAGE_64K) {
    			/* skip the pte encoding also */
    			prop += lpnum * 2;
    			size -= lpnum * 2;

But this then results in errors in other parts of the code, such as MPSS parsing, where we look at 4K base page size and 64K actual page size support.

This patch fixes MPSS parsing by ignoring the actual page sizes marked unsupported. In reality this can happen only with a corrupt device tree, but it is good to tighten the error check.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-21  Merge branch 'fixes' into next  (Michael Ellerman; 1 file, -1/+6)

Merge our fixes branch from the 4.15 cycle.

Unusually the fixes branch saw some significant features merged, notably the RFI flush patches, so we want the code in next to be tested against that, to avoid any surprises when the two are merged.

There's also some other work on the panic handling that was reverted in fixes and we now want to do properly in next, which would conflict. And we also fix a few other minor merge conflicts.
2018-01-21  powerpc/mm: Invalidate subpage_prot() system call on radix platforms  (Anshuman Khandual; 1 file, -0/+3)

Radix enabled platforms don't support subpage_prot() system calls. But at present the system call goes through without an error and fails later on while validating expected subpage accesses. Let's not allow the system call on powerpc radix platforms to begin with, to prevent this confusion in user space.

Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
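[Editor's note] A sketch of where the guard lands, assuming the sys_subpage_prot() entry point in arch/powerpc/mm/subpage-prot.c; the errno choice here is an assumption.

    /* Hedged sketch: refuse the call up front on radix, where there is
     * no subpage protection support to back it. */
    long sys_subpage_prot(unsigned long addr, unsigned long len,
                          u32 __user *map)
    {
        if (radix_enabled())
            return -ENOENT;     /* assumed errno for "not supported here" */

        /* ... existing hash-MMU implementation ... */
        return 0;
    }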
2018-01-21  powerpc: Enable pkey subsystem  (Ram Pai; 1 file, -9/+55)

PAPR defines the 'ibm,processor-storage-keys' property. It exports two values: the first holds the number of data-access keys and the second holds the number of instruction-access keys. Due to a bug in the firmware, the number of instruction-access keys is always reported as zero. However, any key can be configured to disable data access and/or disable execution access. The unavailability of the second value is not a big handicap, though it could have been used to determine whether the platform supports disabling execution access.

Non-PAPR platforms do not define this property in the device tree yet. Fortunately power8 is the only released non-PAPR platform that is supported; there, we hardcode the number of supported pkeys to 32, by consulting PowerISA 3.0.

This patch calculates the number of keys supported by the platform. It also determines the platform's support for read/write/execution access on pkeys.

Signed-off-by: Ram Pai <[email protected]>
[mpe: Use a PVR check instead of CPU_FTR for execute. Restrict to Power7/8/9 for now until older CPUs are tested.]
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: Deliver SEGV signal on pkey violation  (Ram Pai; 1 file, -12/+27)

The value of the pkey whose protection got violated is made available in the si_pkey field of the siginfo structure.

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
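[Editor's note] A sketch of how the key could reach user space, with a hypothetical helper name (_exception_pkey) and standard siginfo plumbing of this kernel era; not necessarily the patch's exact code.

    /* Hedged sketch: deliver SIGSEGV with SEGV_PKUERR and the violated
     * key in si_pkey, so the handler can tell which key tripped. */
    static void _exception_pkey(struct pt_regs *regs, unsigned long addr,
                                int key)
    {
        siginfo_t info;

        memset(&info, 0, sizeof(info));
        info.si_signo = SIGSEGV;
        info.si_code  = SEGV_PKUERR;
        info.si_addr  = (void __user *)addr;
        info.si_pkey  = key;

        force_sig_info(SIGSEGV, &info, current);
    }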
2018-01-20  powerpc: introduce get_mm_addr_key() helper  (Ram Pai; 1 file, -0/+24)

The get_mm_addr_key() helper returns the pkey associated with an address in a given mm_struct.

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
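[Editor's note] A sketch of the helper under the assumption that it resolves the key via the covering VMA (mmap_sem naming as of this kernel era).

    /* Hedged sketch: look up the VMA covering the address and return its
     * pkey; 0 (the default key) if nothing is mapped there. */
    u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address)
    {
        struct vm_area_struct *vma;
        u16 pkey = 0;

        if (!mm)
            return pkey;

        down_read(&mm->mmap_sem);
        vma = find_vma(mm, address);
        if (vma)
            pkey = vma_pkey(vma);
        up_read(&mm->mmap_sem);

        return pkey;
    }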
2018-01-20  powerpc: Handle exceptions caused by pkey violation  (Ram Pai; 1 file, -0/+22)

Handle data and instruction exceptions caused by a memory protection key. The CPU will detect the key fault if the HPTE is already programmed with the key. However, if the HPTE is not hashed, a key fault will not be detected by the hardware; the software will detect the pkey violation in that case.

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: implementation for arch_vma_access_permitted()  (Ram Pai; 1 file, -0/+34)

This patch provides the implementation for arch_vma_access_permitted(). It returns true if the requested access is allowed by the pkey associated with the vma.

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: helper to validate key-access permissions of a pte  (Ram Pai; 1 file, -0/+28)

Add a helper function that checks whether read/write/execute access is allowed on the PTE.

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: Program HPTE key protection bits  (Ram Pai; 1 file, -0/+1)

Map the PTE protection key bits to the HPTE key protection bits, while creating HPTE entries.

Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: implementation for arch_override_mprotect_pkey()  (Ram Pai; 1 file, -0/+36)

Arch-independent code calls arch_override_mprotect_pkey() to obtain a pkey that best matches the requested protection. This patch provides the implementation.

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: ability to associate pkey to a vma  (Ram Pai; 1 file, -0/+8)

Arch-independent code expects the arch to map a pkey into the vma's protection bit setting. This patch provides that ability.

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
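[Editor's note] The mapping is a straight bitfield in vm_flags. A sketch using the bit names from the generic pkeys code; treat the exact flag set as an assumption.

    /* Hedged sketch: five VM_PKEY_BIT* flags encode the key (32 keys
     * need 5 bits), so extracting it is a mask and a shift. */
    #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
                                VM_PKEY_BIT3 | VM_PKEY_BIT4)

    static inline int vma_pkey(struct vm_area_struct *vma)
    {
        return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
    }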
2018-01-20  powerpc: introduce execute-only pkey  (Ram Pai; 1 file, -0/+58)

This patch provides the implementation of the execute-only pkey. The architecture-independent layer expects the arch-dependent layer to support the ability to create and enable a special key which has execute-only permission.

Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
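[Editor's note] A sketch of the usual execute-only scheme, assuming an mm-level cache field (mm->context.execute_only_pkey) as in the generic pkeys design; the merged powerpc code may differ in detail.

    /* Hedged sketch: allocate one key per mm the first time it is
     * needed and deny read/write on it, leaving only instruction
     * fetch permitted (the IAMR bit for the key stays clear). */
    int execute_only_pkey(struct mm_struct *mm)
    {
        int execute_only_pkey = mm->context.execute_only_pkey;

        if (execute_only_pkey != -1)
            return execute_only_pkey;   /* already set up for this mm */

        execute_only_pkey = mm_pkey_alloc(mm);
        if (execute_only_pkey < 0)
            return -1;

        /* Deny data access on the new key. */
        if (arch_set_user_pkey_access(current, execute_only_pkey,
                                      PKEY_DISABLE_ACCESS)) {
            mm_pkey_free(mm, execute_only_pkey);
            return -1;
        }

        mm->context.execute_only_pkey = execute_only_pkey;
        return execute_only_pkey;
    }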
2018-01-20  powerpc: store and restore the pkey state across context switches  (Ram Pai; 1 file, -1/+51)

Store and restore the AMR, IAMR and UAMOR register state of the task before scheduling out and after scheduling in, respectively.

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: ability to create execute-disabled pkeys  (Ram Pai; 1 file, -0/+16)

powerpc has hardware support to disable execute on a pkey. This patch enables the ability to create execute-disabled keys.

Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: implementation for arch_set_user_pkey_access()  (Ram Pai; 1 file, -0/+40)

This patch provides the detailed implementation for a user to allocate a key and enable it in the hardware. It provides the plumbing, but it cannot be used till the system call is implemented. The next patch will do so.

Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: helper functions to initialize AMR, IAMR and UAMOR registers  (Ram Pai; 1 file, -0/+47)

Introduce helper functions that initialize the bits in the AMR, IAMR and UAMOR registers that correspond to a given pkey.

Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-20  powerpc: helper function to read, write AMR, IAMR, UAMOR registers  (Ram Pai; 1 file, -0/+36)

Implement helper functions to read and write the key-related registers: AMR, IAMR and UAMOR.

The AMR register tracks the read/write permission of a key.
The IAMR register tracks the execute permission of a key.
The UAMOR register enables and disables a key.

Acked-by: Balbir Singh <[email protected]>
Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
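[Editor's note] These helpers are thin SPR accessors. A sketch using the SPR names from asm/reg.h; the helper names are taken from the commit subject and may not match the merged code exactly.

    #include <asm/reg.h>    /* mfspr/mtspr, SPRN_AMR, SPRN_IAMR, SPRN_UAMOR */

    /* Hedged sketch: the three key registers are ordinary SPRs, so the
     * helpers just wrap mfspr/mtspr. */
    static inline u64 read_amr(void)   { return mfspr(SPRN_AMR); }
    static inline u64 read_iamr(void)  { return mfspr(SPRN_IAMR); }
    static inline u64 read_uamor(void) { return mfspr(SPRN_UAMOR); }

    static inline void write_amr(u64 value)   { mtspr(SPRN_AMR, value); }
    static inline void write_iamr(u64 value)  { mtspr(SPRN_IAMR, value); }
    static inline void write_uamor(u64 value) { mtspr(SPRN_UAMOR, value); }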
2018-01-20  powerpc: track allocation status of all pkeys  (Ram Pai; 2 files, -0/+42)

A total of 32 keys are available on power7 and above. However, pkeys 0 and 1 are reserved, so effectively we have 30 pkeys.

On 4K kernels, we do not have 5 bits in the PTE to represent all the keys; we only have 3 bits. Two of those keys are reserved, pkey 0 and pkey 1, so effectively we have 6 pkeys.

This patch keeps track of reserved keys, allocated keys and keys that are currently free. It also adds the skeletal functions and macros that the architecture-independent code expects to be available.

Reviewed-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
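[Editor's note] A sketch of the bookkeeping with assumed field and mask names (pkey_allocation_map as an unsigned long in mm->context, PKEY_RESERVED_MASK); the patch's own names may differ.

    /* Hedged sketch: a per-mm bitmap of allocated keys, with keys 0 and
     * 1 reserved up front and never handed out. */
    #define PKEY_RESERVED_MASK  0x3UL   /* pkey 0 and pkey 1 */

    static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
    {
        if (pkey < 0 || pkey >= arch_max_pkey())
            return false;
        return mm->context.pkey_allocation_map & (1UL << pkey);
    }

    static inline int mm_pkey_alloc(struct mm_struct *mm)
    {
        /* first zero bit that is neither reserved nor allocated */
        int pkey = ffz(mm->context.pkey_allocation_map | PKEY_RESERVED_MASK);

        if (pkey >= arch_max_pkey())
            return -1;
        mm->context.pkey_allocation_map |= (1UL << pkey);
        return pkey;
    }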
2018-01-20  powerpc: initial pkey plumbing  (Ram Pai; 3 files, -0/+31)

Basic plumbing to initialize the pkey system. Nothing is enabled yet. A later patch will enable it once all the infrastructure is in place.

Signed-off-by: Ram Pai <[email protected]>
[mpe: Rework copyrights to use SPDX tags]
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-19  powerpc/64: Rename soft_enabled to irq_soft_mask  (Madhavan Srinivasan; 1 file, -1/+1)

Rename paca->soft_enabled to paca->irq_soft_mask, as it is no longer used as a flag for interrupt state but as a mask.

Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-19  powerpc/64: Add #defines for paca->soft_enabled flags  (Madhavan Srinivasan; 1 file, -1/+1)

Two #defines, IRQS_ENABLED and IRQS_DISABLED, are added to be used when updating paca->soft_enabled. Replace the hardcoded values used when updating paca->soft_enabled with the IRQS_(EN|DIS)ABLED #defines. No logic change.

Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-18  powerpc/pseries: lift RTAS limit for hash  (Nicholas Piggin; 1 file, -3/+5)

With the previous patch to switch to 64-bit mode after returning from RTAS and before doing any memory accesses, the RMA limit need not be clamped to 1GB to avoid RTAS bugs.

Keep the 1GB limit for older firmware (although this is more of a kernel concern than RTAS), and remove it starting with POWER9.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-18  powerpc/pseries: lift RTAS limit for radix  (Nicholas Piggin; 1 file, -17/+4)

With the previous patch to switch to 64-bit mode after returning from RTAS and before doing any memory accesses, the RMA limit need not be clamped to 1GB to avoid RTAS bugs.

Keep the 1GB limit for older firmware (although this is more of a kernel concern than RTAS), and remove it starting with POWER9.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-18  powerpc/pseries: radix is not subject to RMA limit, remove it  (Nicholas Piggin; 1 file, -7/+4)

The radix guest is not subject to the paravirtualized HPT VRMA limit, so remove that from the ppc64_rma_size calculation for that platform.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-18  powerpc/powernv: Remove real mode access limit for early allocations  (Nicholas Piggin; 2 files, -23/+34)

This removes the RMA limit on the powernv platform, which constrains early allocations such as PACAs and stacks. There are still other restrictions that must be followed, such as bolted SLB limits, but real mode addressing has no constraints.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-18  powerpc/64s: Improve local TLB flush for boot and MCE on POWER9  (Nicholas Piggin; 4 files, -0/+177)

There are several cases outside the normal address space management where a CPU's entire local TLB is to be flushed:

1. Booting the kernel, in case something has left stale entries in the TLB (e.g., kexec).

2. Machine check, to clean corrupted TLB entries.

One other place where the TLB is flushed is when waking from deep idle states. The flush is a side effect of calling ->cpu_restore with the intention of re-setting various SPRs. The flush itself is unnecessary because in the first case the TLB should not acquire new corrupted entries as part of sleep/wake (though they may be lost).

This type of TLB flush is coded inflexibly, several times for each CPU type, and it has a number of problems with ISA v3.0B:

- The current radix mode of the MMU is not taken into account; the flush is always done as a hash flush. For IS=2 (LPID-matching flush from host) and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if the R field does not match the current radix mode.

- ISA v3.0B hash must flush the partition and process table caches as well.

- ISA v3.0B radix must flush partition- and process-scoped translations, the partition and process table caches, and also the page walk cache.

So consolidate the flushing code and implement it in C and inline asm under the mm/ directory with the rest of the flush code. Add ISA v3.0B cases for radix and hash, and use the radix flush in a radix environment. Provide a way for IS=2 (LPID flush) to specify the radix mode of the partition, and have KVM pass in the radix mode of the guest.

Take the flushes out of the early cputable/dt_cpu_ftrs detection hooks and move them later in the boot process, after the MMU registers are set up and before relocation is first turned on.

The TLB flush is no longer called when restoring from deep idle states. This was not done as a separate step because booting secondaries uses the same cpu_restore as idle restore, which needs the TLB flush.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-16  powerpc: Use the TRAP macro whenever comparing a trap number  (Benjamin Herrenschmidt; 1 file, -1/+1)

Trap numbers can have extra bits at the bottom that need to be filtered out. There are a few cases where we don't do that. It's possible that we got lucky, but better safe than sorry.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-16  powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP  (Christophe Leroy; 1 file, -1/+1)

When CONFIG_SWAP is set, the TLB miss handlers have to also take into account the _PAGE_ACCESSED flag. At the moment this is done by anding _PAGE_ACCESSED into _PAGE_PRESENT using 3 instructions.

This patch uses the APG for handling _PAGE_ACCESSED, allowing us to just copy the _PAGE_ACCESSED bit into the APG field, hence reducing the action to a single instruction.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-16  powerpc/8xx: Remove _PAGE_USER and handle user access at PMD level  (Christophe Leroy; 1 file, -1/+1)

As the Linux kernel separates the KERNEL and USER address spaces, there is no need to flag USER access at page level. Today, the 8xx TLB handlers already handle user access in the L1 entry through Access Protection Groups; it is then natural to move the user access handling to the PMD level, once _PAGE_NA allows handling PAGE_NONE protection without _PAGE_USER.

In the meantime, as we free up one bit in the PTE, we can use it to include SPS (the page size flag) in the PTE and avoid handling it at every TLB miss, hence removing the special handling based on compiled page size.

For _PAGE_EXEC, we rework it to use the PP PTE bits, avoiding the copy of the _PAGE_EXEC bit into the L1 entry. Unfortunately we are not able to put it at the correct location, as it conflicts with the NA/RO/RW bits for data entries.

Upper bits of the APG in the L1 entry overlap with the PMD base address. In order to avoid having to filter that out, we set up all groups so that the upper bits can have any value.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-16  powerpc/mm: Introduce _PAGE_NA  (Christophe Leroy; 1 file, -7/+11)

Today, PAGE_NONE is defined as a page not having _PAGE_USER. In some circumstances, when the CPU supports it, it might be better to be able to flag a page with NO ACCESS.

In a following patch, the 8xx will switch to user access being flagged in the PMD, so it will no longer be possible to use _PAGE_USER as a way to flag a page with no access.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-01-16  powerpc/mm: extend _PAGE_PRIVILEGED to all CPUs  (Christophe Leroy; 5 files, -33/+6)

commit ac29c64089b74 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED") introduced _PAGE_PRIVILEGED for BOOK3S/64.

This patch generalises _PAGE_PRIVILEGED for all CPUs, allowing a page to have either _PAGE_PRIVILEGED or _PAGE_USER, or both.

PPC_8xx has a _PAGE_SHARED flag which is set for, and only for, all non-user pages. Let's rename it _PAGE_PRIVILEGED to remove confusion, as it has nothing to do with Linux shared pages.

On BookE, there's a _PAGE_BAP_SR bit which has to be set for kernel pages: defining _PAGE_PRIVILEGED as _PAGE_BAP_SR will make this generic.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>