path: root/arch/powerpc/mm
Age  Commit message  (Author; files changed, lines -deleted/+added)
2018-04-05  mm, powerpc: use vma_kernel_pagesize() in vma_mmu_pagesize()  (Dan Williams; 1 file, -4/+1)
Patch series "mm, smaps: MMUPageSize for device-dax", v3. Similar to commit 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct") here is another occasion where we want special-case hugetlbfs/hstate enabling to also apply to device-dax. This prompts the question what other hstate conversions we might do beyond ->split() and ->pagesize(), but this appears to be the last of the usages of hstate_vma() in generic/non-hugetlbfs specific code paths. This patch (of 3): The current powerpc definition of vma_mmu_pagesize() open codes looking up the page size via hstate. It is identical to the generic vma_kernel_pagesize() implementation. Now, vma_kernel_pagesize() is growing support for determining the page size of Device-DAX vmas in addition to the existing Hugetlbfs page size determination. Ideally, if the powerpc vma_mmu_pagesize() used vma_kernel_pagesize() it would automatically benefit from any new vma-type support that is added to vma_kernel_pagesize(). However, the powerpc vma_mmu_pagesize() is prevented from calling vma_kernel_pagesize() due to a circular header dependency that requires vma_mmu_pagesize() to be defined before including <linux/hugetlb.h>. Break this circular dependency by defining the default vma_mmu_pagesize() as a __weak symbol to be overridden by the powerpc version. Link: http://lkml.kernel.org/r/151996254179.27922.2213728278535578744.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Jane Chu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-05  mm/migrate: rename migration reason MR_CMA to MR_CONTIG_RANGE  (Anshuman Khandual; 1 file, -1/+1)
alloc_contig_range() initiates compaction and eventual migration for the purpose of either CMA or HugeTLB allocations. At present, the reason code remains the same MR_CMA for either of these cases. Let's make it MR_CONTIG_RANGE which will appropriately reflect the reason code in both these cases. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-04  powerpc/mm/radix: Parse disable_radix commandline correctly.  (Aneesh Kumar K.V; 1 file, -1/+1)
The kernel parameter disable_radix accepts several forms: disable_radix=yes|no|1|0, or just a bare disable_radix. When using the latter form we get the error below: `Malformed early option 'disable_radix'` Fixes: 1fd6c0220710 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
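The fix follows the usual early_param pattern, which accepts both a bare flag and an explicit boolean value; a sketch of the pattern (variable names illustrative):

    static bool disable_radix;

    static int __init parse_disable_radix(char *p)
    {
            bool val;

            if (!p)                         /* bare "disable_radix" */
                    val = true;
            else if (kstrtobool(p, &val))   /* "yes"/"no"/"1"/"0" */
                    return -EINVAL;

            disable_radix = val;
            return 0;
    }
    early_param("disable_radix", parse_disable_radix);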
2018-04-04  powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb  (Aneesh Kumar K.V; 1 file, -5/+13)
With 64k page size, we have hugetlb pte entries at the pmd and pud level for book3s64. We don't need to create a separate page table cache for that. With 4k we need to make sure hugepd page table cache for 16M is placed at PUD level and 16G at the PGD level. Simplify all these by not using HUGEPD_PD_SHIFT which is confusing for book3s64. Without this patch, with 64k page size we create pagetable caches with shift value 10 and 7 which are not used at all. Fixes: 419df06eea5b ("powerpc: Reduce the PTE_INDEX_SIZE") Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-04  powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix  (Aneesh Kumar K.V; 1 file, -6/+2)
With the split PTL (page table lock) config, we allocate the level 4 (leaf) page table using the pte fragment framework instead of a slab cache like the other levels. This was done to enable us to have a split page table lock at level 4 of the page table. We use page->ptl backing all the level 4 pte fragments for the lock.

Currently with Radix, we use only 16 fragments out of the allocated page. In radix each fragment is 256 bytes, which means we use only 4k out of the allocated 64K page, wasting 60k of the allocated memory. This was done earlier to keep it closer to hash.

This patch updates the pte fragment count to 256, thereby using the full 64K page and reducing the memory usage. Performance tests show really low impact even with THP disabled. With THP disabled we will be contending even less on the level 4 ptl and hence the impact should be even lower.

256 threads, 10 runs of ./ebizzy -m -n 1000 -s 131072 -S 100:

  without patch: median = 15678.5  stdev = 42.1209
  with patch:    median = 15354    stdev = 194.743

This is with THP disabled. With THP enabled the impact of the patch will be less. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
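The arithmetic behind the new count, sketched with illustrative macro names:

    /* A radix leaf fragment is 256 bytes (32 PTEs * 8 bytes each). */
    #define RADIX_PTE_FRAG_SIZE_SHIFT 8
    /* Fragments per 64K page: 65536 >> 8 = 256 (previously 16). */
    #define RADIX_PTE_FRAG_NR (PAGE_SIZE >> RADIX_PTE_FRAG_SIZE_SHIFT)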
2018-04-04  powerpc/mm/keys: Update documentation and remove unnecessary check  (Aneesh Kumar K.V; 2 files, -23/+16)
Add more code comments. Also remove an unnecessary pkey check that followed the pkey error check. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-01  powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT  (Mathieu Malaterre; 1 file, -1/+1)
In commit 9690c1574268 ("powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT") an issue was discovered where `mm->context.id` was being truncated to an `unsigned int`, while the PID is actually an `unsigned long`. Update the earlier patch by fixing one remaining occurrence. Discovered during a compilation with W=1: arch/powerpc/mm/tlb-radix.c:702:19: error: comparison is always false due to limited range of data type [-Werror=type-limits] Signed-off-by: Mathieu Malaterre <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-01  powerpc/64s: Remove POWER4 support  (Nicholas Piggin; 1 file, -4/+5)
POWER4 has been broken since at least the change 49d09bf2a6 ("powerpc/64s: Optimise MSR handling in exception handling"), which requires mtmsrd L=1 support. This was introduced in ISA v2.01, and POWER4 supports ISA v2.00. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-01  powerpc/mm/32: Remove the reserved memory hack  (Jonathan Neuschäfer; 3 files, -8/+1)
This hack, introduced in commit c5df7f775148 ("powerpc: allow ioremap within reserved memory regions") is now unnecessary. Signed-off-by: Jonathan Neuschäfer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-01  powerpc/mm/32: Use page_is_ram to check for RAM  (Jonathan Neuschäfer; 1 file, -0/+1)
On systems where there is MMIO space between different blocks of RAM in the physical address space, __ioremap_caller did not allow mapping these MMIO areas, because they were below the end of RAM and thus considered RAM as well. Use the memblock-based page_is_ram function, which returns false for such MMIO holes.

v2: Keep the check for p < virt_to_phys(high_memory). On 32-bit systems with high memory (memory above physical address 4GiB), the high memory is expected to be available through ioremap. The high_memory variable marks the end of low memory; comparing against it means that only ioremap requests for low RAM will be denied. Reported by Michael Ellerman.

Signed-off-by: Jonathan Neuschäfer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
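In __ioremap_caller() the check then looks roughly like this (a sketch under the constraints above, not the verbatim diff):

    /* Deny ioremap only for low RAM: pages past high_memory (32-bit
     * highmem) and MMIO holes, for which page_is_ram() is false,
     * remain mappable. */
    if (p < virt_to_phys(high_memory) && page_is_ram(p >> PAGE_SHIFT))
            return NULL;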
2018-04-01  powerpc/mm: Use memblock API for PPC32 page_is_ram  (Jonathan Neuschäfer; 1 file, -4/+0)
To support accurate checking for different blocks of memory on PPC32, use the same memblock-based approach on PPC32 that's already used on PPC64. Signed-off-by: Jonathan Neuschäfer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-01  powerpc/mm: Simplify page_is_ram by using memblock_is_memory  (Jonathan Neuschäfer; 1 file, -7/+1)
Instead of open-coding the search in page_is_ram, call memblock_is_memory. Signed-off-by: Jonathan Neuschäfer <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
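The resulting function is essentially a one-liner (a sketch matching the description):

    int page_is_ram(unsigned long pfn)
    {
            return memblock_is_memory(__pfn_to_phys(pfn));
    }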
2018-03-31  Merge branch 'topic/paca' into next  (Michael Ellerman; 6 files, -86/+169)
Bring in yet another series that touches KVM code, and might need to be merged into the kvm-ppc branch to resolve conflicts. This required some changes in pnv_power9_force_smt4_catch/release() due to the paca array becoming an array of pointers.
2018-03-31  powerpc/mm/hash64: Increase the VA range  (Aneesh Kumar K.V; 3 files, -11/+4)
This patch increases the max virtual (effective) address value to 4PB. With the 4K page size config we continue to limit ourselves to 64TB. Signed-off-by: Aneesh Kumar K.V <[email protected]> [mpe: Keep the H_PGTABLE_RANGE test, update it to work] Signed-off-by: Michael Ellerman <[email protected]>
2018-03-31  powerpc/mm: Add support for handling > 512TB address in SLB miss  (Aneesh Kumar K.V; 8 files, -10/+149)
For addresses above 512TB we allocate additional mmu contexts. To make it all easy, addresses above 512TB are handled with IR/DR=1 and with stack frame setup. The mmu_context_t is also updated to track the new extended_ids. To support up to 4PB we need a total of 8 contexts. Signed-off-by: Aneesh Kumar K.V <[email protected]> [mpe: Minor formatting tweaks and comment wording, switch BUG to WARN in get_ea_context().] Signed-off-by: Michael Ellerman <[email protected]>
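The sizing works out as follows (helper name hypothetical; the real code tracks the extra ids in mm_context_t):

    /* One MMU context covers 512TB = 2^49 bytes; reaching 4PB = 2^52
     * therefore needs 2^52 / 2^49 = 8 contexts: the primary id plus
     * 7 extended ids, one per additional 512TB region. */
    static inline int ea_region_index(unsigned long ea)
    {
            return ea >> 49;        /* which 512TB region: 0..7 */
    }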
2018-03-31  powerpc/mm/slice: Consolidate return path in slice_get_unmapped_area()  (Aneesh Kumar K.V; 1 file, -16/+20)
In a following patch, on finding a free area we will need to allocate extra contexts as needed. Consolidating the return path of slice_get_unmapped_area() will make that easier. Split into a separate patch to make review easy. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-31  powerpc/mm: Fix thread_pkey_regs_init()  (Ram Pai; 1 file, -3/+3)
thread_pkey_regs_init() initializes the pkey related registers instead of initializing the fields in the task structures. Fortunately those key related registers are re-set to zero when the task gets scheduled on the CPU. However it's good to fix this glaringly visible error. Fixes: 06bb53b33804 ("powerpc: store and restore the pkey state across context switches") Signed-off-by: Ram Pai <[email protected]> Signed-off-by: Thiago Jung Bauermann <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-31  powerpc/mm: Pass node id into create_section_mapping  (Nicholas Piggin; 4 files, -15/+15)
Signed-off-by: Nicholas Piggin <[email protected]> [mpe: Move __map_kernel_page_nid() inside #ifdef SPARSEMEM_VMEMMAP] Signed-off-by: Michael Ellerman <[email protected]>
2018-03-31  powerpc/64s/radix: Allocate kernel page tables node-local if possible  (Nicholas Piggin; 1 file, -24/+80)
Try to allocate kernel page tables for direct mapping and vmemmap according to the node of the memory they will map. The node is not available for the linear map in early boot, so use range allocation to allocate the page tables from the region they map, which is effectively node-local. Signed-off-by: Nicholas Piggin <[email protected]> [mpe: Fix build error in radix__create_section_mapping()] Signed-off-by: Michael Ellerman <[email protected]>
2018-03-31  powerpc/64s/radix: Split early page table mapping to its own function  (Nicholas Piggin; 1 file, -47/+65)
Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-30  powerpc/mm/numa: move numa topology discovery earlier  (Nicholas Piggin; 2 files, -14/+23)
Split sparsemem initialisation from basic numa topology discovery. Move the parsing earlier in boot, before pacas are allocated. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-30  powerpc/64s: Allocate LPPACAs individually  (Nicholas Piggin; 1 file, -2/+2)
We no longer allocate lppacas in an array, so this patch removes the 1kB static alignment for the structure, and enforces the PAPR alignment requirements at allocation time. We cannot reduce the 1kB allocation size however, due to existing KVM hypervisors. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-30  powerpc/64: Use array of paca pointers and allocate pacas individually  (Nicholas Piggin; 1 file, -1/+1)
Change the paca array into an array of pointers to pacas, and allocate pacas individually. This allows flexibility in where the PACAs are allocated. Future work will allocate them node-local. Platforms that don't have address limits on PACAs would be able to defer PACA allocations until later in boot, rather than allocating all possible ones up-front and then freeing the unused ones. This is slightly more overhead (one additional indirection) for cross-CPU paca references, but those aren't too common. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
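A sketch of the shape of the change (the allocator helper is hypothetical; initialise_paca() is the existing setup routine):

    /* Before: struct paca_struct paca[NR_CPUS], all allocated up front.
     * After: per-CPU pointers, each paca allocated on its own, which
     * later permits node-local placement. */
    struct paca_struct **paca_ptrs;

    void __init allocate_paca(int cpu)
    {
            paca_ptrs[cpu] = alloc_paca_for_cpu(cpu);   /* hypothetical */
            initialise_paca(paca_ptrs[cpu], cpu);
    }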
2018-03-28  Merge branch 'fixes' into next  (Michael Ellerman; 4 files, -80/+107)
Merge our fixes branch from the 4.16 cycle. There were a number of important fixes merged, in particular some Power9 workarounds that we want in next for testing purposes. There have also been some conflicting changes in the CPU features code which are best merged and tested before going upstream.
2018-03-27  powerpc/mm: Fix typo in comments  (Alexey Kardashevskiy; 1 file, -7/+7)
Fixes: 912cc87a6 ("powerpc/mm/radix: Add LPID based tlb flush helpers") Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/mm: Fix section mismatch warning in stop_machine_change_mapping()  (Mauricio Faria de Oliveira; 3 files, -10/+10)
Fix the warning messages for stop_machine_change_mapping(), and a number of other affected functions in its call chain. All modified functions are under CONFIG_MEMORY_HOTPLUG, so __meminit is okay (keeps them / does not discard them). Boot-tested on powernv/power9/radix-mmu and pseries/power8/hash-mmu.

  $ make -j$(nproc) CONFIG_DEBUG_SECTION_MISMATCH=y vmlinux
  ...
  MODPOST vmlinux.o
  WARNING: vmlinux.o(.text+0x6b130): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
  The function stop_machine_change_mapping() references the function __meminit create_physical_mapping().
  This is often because stop_machine_change_mapping lacks a __meminit annotation or the annotation of create_physical_mapping is wrong.

  WARNING: vmlinux.o(.text+0x6b13c): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
  The function stop_machine_change_mapping() references the function __meminit create_physical_mapping().
  This is often because stop_machine_change_mapping lacks a __meminit annotation or the annotation of create_physical_mapping is wrong.
  ...

Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9  (Paul Mackerras; 1 file, -0/+6)
On POWER9, since commit cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9", 2017-01-30), we set both the radix and HPT bits in the client-architecture-support (CAS) vector, which tells the hypervisor that we can do either radix or HPT. According to PAPR, if we use this combination we are promising to do a H_REGISTER_PROC_TBL hcall later on to let the hypervisor know whether we are doing radix or HPT. We currently do this call if we are doing radix but not if we are doing HPT. If the hypervisor is able to support both radix and HPT guests, it would be entitled to defer allocation of the HPT until the H_REGISTER_PROC_TBL call, and to fail any attempts to create HPTEs until the H_REGISTER_PROC_TBL call. Thus we need to do a H_REGISTER_PROC_TBL call when we are doing HPT; otherwise we may crash at boot time. This adds the code to call H_REGISTER_PROC_TBL in this case, before we attempt to create any HPT entries using H_ENTER. Fixes: cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9") Cc: [email protected] # v4.11+ Signed-off-by: Paul Mackerras <[email protected]> Reviewed-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-23  powerpc/mm: Fixup tlbie vs store ordering issue on POWER9  (Aneesh Kumar K.V; 3 files, -1/+31)
On POWER9, under some circumstances, a broadcast TLB invalidation might complete before all previous stores have drained, potentially allowing stale stores to become visible after the invalidation. This works around it by doubling up those TLB invalidations, which was verified by HW to be sufficient to close the risk window. This will be documented in a yet-to-be-published errata. Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Aneesh Kumar K.V <[email protected]> [mpe: Enable the feature in the DT CPU features code for all Power9, rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.] Signed-off-by: Michael Ellerman <[email protected]>
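The shape of the workaround, as a hedged sketch (helper names follow the tlb-radix.c flush helpers; the exact operands of the extra invalidation may differ from the merged fix):

    static inline void fixup_tlbie(void)
    {
            if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
                    /* Order the preceding tlbie, then issue one extra,
                     * harmless tlbie to close the store-drain window. */
                    asm volatile("ptesync" : : : "memory");
                    __tlbie_va(~0UL, 0, mmu_get_ap(MMU_PAGE_64K),
                               RIC_FLUSH_TLB);
            }
    }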
2018-03-23  powerpc/mm/radix: Move the functions that does the actual tlbie closer  (Aneesh Kumar K.V; 1 file, -32/+32)
No functionality change, just code movement to ease later code changes. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-23  powerpc/mm/radix: Remove unused code  (Aneesh Kumar K.V; 1 file, -40/+0)
These functions are not used in the code. Remove them. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-23  powerpc/mm: Workaround Nest MMU bug with TLB invalidations  (Benjamin Herrenschmidt; 1 file, -7/+43)
On POWER9 the Nest MMU may fail to invalidate some translations when doing a tlbie "by PID" or "by LPID" that is targeted at the TLB only and not the page walk cache. This works around it by forcing such invalidations to escalate to RIC=2 (full invalidation of TLB *and* PWC) when a coprocessor is in use for the context. Fixes: 03b8abedf4f4 ("cxl: Enable global TLBIs for cxl contexts") Cc: [email protected] # v4.15+ Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Balbir Singh <[email protected]> [balbirs: fixed spelling and coding style to quiesce checkpatch.pl] Tested-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
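Sketched, the escalation amounts to the following (simplified from the radix flush helpers; the copros counter is introduced by the next commit):

    unsigned long ric = RIC_FLUSH_TLB;      /* TLB only, PWC preserved */

    if (atomic_read(&mm->context.copros) > 0)
            ric = RIC_FLUSH_ALL;            /* Nest MMU: TLB *and* PWC */

    _tlbie_pid(pid, ric);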
2018-03-23  powerpc/mm: Add tracking of the number of coprocessors using a context  (Benjamin Herrenschmidt; 1 file, -0/+1)
Currently, when using coprocessors (which use the Nest MMU), we simply increment the active_cpus count to force all TLB invalidations to become broadcast. Unfortunately, due to an erratum in POWER9, we will need to know more specifically that coprocessors are in use. This maintains a separate copros counter in the MMU context for that purpose. NB. The commit mentioned in the fixes tag below is not at fault for the bug we're fixing in this commit and the next, but this fix applies on top of the infrastructure it introduced. Fixes: 03b8abedf4f4 ("cxl: Enable global TLBIs for cxl contexts") Cc: [email protected] # v4.15+ Signed-off-by: Benjamin Herrenschmidt <[email protected]> Tested-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
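The counter maintenance, roughly (a sketch based on the description; the flush before the decrement is an assumption about the required ordering, not quoted code):

    static inline void mm_context_add_copro(struct mm_struct *mm)
    {
            atomic_inc(&mm->context.copros);
    }

    static inline void mm_context_remove_copro(struct mm_struct *mm)
    {
            /* Flush while the count is still elevated, so concurrent
             * invalidations keep targeting the Nest MMU. */
            flush_all_mm(mm);
            atomic_dec(&mm->context.copros);
    }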
2018-03-13  powerpc/mm/slice: remove radix calls to the slice code  (Nicholas Piggin; 2 files, -15/+8)
This is a tidy up which removes radix MMU calls into the slice code. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-13  powerpc/mm/slice: Use const pointers to cached slice masks where possible  (Nicholas Piggin; 1 file, -41/+38)
The slice_mask cache was a basic conversion which copied the slice mask into caller's structures, because that's how the original code worked. In most cases the pointer can be used directly instead, saving a copy and an on-stack structure. On POWER8, this increases vfork+exec+exit performance by 0.3% and reduces time to mmap+munmap a 64kB page by 2%. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-13  powerpc/mm/slice: remove dead code  (Nicholas Piggin; 1 file, -6/+0)
This code is never compiled in, and it gets broken by the next patch, so remove it. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-13  powerpc/mm/slice: Switch to 3-operand slice bitops helpers  (Nicholas Piggin; 1 file, -15/+23)
This converts the slice_mask bit operation helpers to be the usual 3-operand kind, which allows 2 inputs to set a different output without an extra copy, which is used in the next patch. Adds slice_copy_mask, which will be used in the next patch. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
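For the low slices this is plain word arithmetic; a sketch of the 3-operand form:

    static void slice_or_mask(struct slice_mask *dst,
                              const struct slice_mask *src1,
                              const struct slice_mask *src2)
    {
            dst->low_slices = src1->low_slices | src2->low_slices;
            bitmap_or(dst->high_slices, src1->high_slices,
                      src2->high_slices, SLICE_NUM_HIGH);
    }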
2018-03-13  powerpc/mm/slice: implement slice_check_range_fits  (Nicholas Piggin; 1 file, -28/+34)
Rather than build slice masks from a range then use that to check for fit in a candidate mask, implement slice_check_range_fits that checks if a range fits in a mask directly. This allows several structures to be removed from stacks, and also we don't expect a huge range in a lot of these cases, so building and comparing a full mask is going to be more expensive than testing just one or two bits of the range. On POWER8, this increases vfork+exec+exit performance by 0.3% and reduces time to mmap+munmap a 64kB page by 5%. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
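For the low 4GB the check reduces to a single mask test; a sketch (function name hypothetical, and assuming the range ends below the top low slice so the shift cannot overflow):

    static bool low_range_fits(u64 available, unsigned long start,
                               unsigned long len)
    {
            unsigned long i = GET_LOW_SLICE_INDEX(start);
            unsigned long j = GET_LOW_SLICE_INDEX(start + len - 1);
            u64 range = ((1UL << (j + 1)) - 1) & ~((1UL << i) - 1);

            return (available & range) == range;
    }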
2018-03-13  powerpc/mm/slice: implement a slice mask cache  (Nicholas Piggin; 1 file, -42/+70)
Calculating the slice mask can become a significant overhead for get_unmapped_area. This patch adds a struct slice_mask for each page size in the mm_context, and keeps these in sync with the slices psize arrays and slb_addr_limit. On Book3S/64 this adds 288 bytes to the mm_context_t for the slice mask caches. On POWER8, this increases vfork+exec+exit performance by 9.9% and reduces time to mmap+munmap a 64kB page by 28%. Reduces time to mmap+munmap by about 10% on 8xx. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
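The cached structure is one mask per page size, roughly (a sketch matching the description; field names illustrative):

    struct slice_mask {
            u64 low_slices;                              /* low 4GB slices */
            DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH); /* 1TB high slices */
    };

    /* In mm_context_t: one cached slice_mask per supported psize,
     * kept in sync with the psize arrays and slb_addr_limit. */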
2018-03-13  powerpc/mm/slice: pass pointers to struct slice_mask where possible  (Nicholas Piggin; 1 file, -39/+45)
Pass around const pointers to struct slice_mask where possible, rather than copies of slice_mask, to reduce stack and call overhead. checkstack.pl gives, before:

  0x00000d1c slice_get_unmapped_area [slice.o]:          592
  0x00001864 is_hugepage_only_range [slice.o]:           448
  0x00000754 slice_find_area_topdown [slice.o]:          400
  0x00000484 slice_find_area_bottomup.isra.1 [slice.o]:  272
  0x000017b4 slice_set_range_psize [slice.o]:            224
  0x00000a4c slice_find_area [slice.o]:                  128
  0x00000160 slice_check_fit [slice.o]:                  112

after:

  0x00000ad0 slice_get_unmapped_area [slice.o]:          448
  0x00001464 is_hugepage_only_range [slice.o]:           288
  0x000006c0 slice_find_area [slice.o]:                  144
  0x0000016c slice_check_fit [slice.o]:                  128
  0x00000528 slice_find_area_bottomup.isra.2 [slice.o]:  128
  0x000013e4 slice_set_range_psize [slice.o]:            128

This increases vfork+exec+exit performance by 1.5%. Reduces time to mmap+munmap a 64kB page by 17%. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-13  powerpc/mm/slice: tidy lpsizes and hpsizes update loops  (Nicholas Piggin; 1 file, -10/+12)
Make these loops look the same, and change their form so the important part is not wrapped over so many lines. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-13  powerpc/mm/slice: Simplify and optimise slice context initialisation  (Nicholas Piggin; 3 files, -66/+20)
The slice state of an mm gets zeroed then initialised upon exec. This is the only caller of slice_set_user_psize now, so that can be removed and instead implement a faster and simplified approach that requires no locking or checking existing state. This speeds up vfork+exec+exit performance on POWER8 by 3%. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-13  powerpc/32: Make some functions static  (Mathieu Malaterre; 1 file, -1/+1)
These functions can all be static, make it so. Signed-off-by: Mathieu Malaterre <[email protected]> [mpe: Combine a patch of Mathieu's with some other static conversions] Signed-off-by: Michael Ellerman <[email protected]>
2018-03-13  powerpc/mm: Drop the function native_register_proc_table()  (Anshuman Khandual; 1 file, -15/+0)
This is left over from the segment table implementation and is not getting called from anywhere now. Hence just drop it. Suggested-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-06  powerpc/mm/slice: Allow up to 64 low slices  (Christophe Leroy; 3 files, -37/+41)
While the implementation of the "slices" address space allows a significant number of high slices, it limits the number of low slices to 16, due to the use of a single u64 low_slices_psize element in struct mm_context_t. On the 8xx, the minimum slice size is the size of the area covered by a single PMD entry, i.e. 4M in 4K pages mode and 64M in 16K pages mode. This means we could have at least 64 slices. In order to override this limitation, this patch switches the handling of low_slices_psize to a char array, as done already for high_slices_psize. Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
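With a char array, each slice's page-size id occupies a nibble and is read the same way as high_slices_psize; a sketch (helper name illustrative):

    /* Two 4-bit psize ids per byte; slice i lives in byte i/2. */
    static inline unsigned int low_slice_psize(const unsigned char *lpsizes,
                                               unsigned long i)
    {
            return (lpsizes[i >> 1] >> ((i & 1) * 4)) & 0xf;
    }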
2018-03-06  powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx  (Christophe Leroy; 3 files, -3/+19)
On the 8xx, the page size is set in the PMD entry and applies to all pages of the page table pointed to by that PMD entry. When an app has some regular pages allocated (e.g. see below) and tries to mmap() a huge page at a hint address covered by the same PMD entry, the kernel accepts the hint although the 8xx cannot handle different page sizes in the same PMD entry.

  10000000-10001000 r-xp 00000000 00:0f 2597 /root/malloc
  10010000-10011000 rwxp 00000000 00:0f 2597 /root/malloc

  mmap(0x10080000, 524288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x10080000

This results in the app remaining forever in do_page_fault()/hugetlb_fault() and when interrupting that app, we get the following warning:

  [162980.035629] WARNING: CPU: 0 PID: 2777 at arch/powerpc/mm/hugetlbpage.c:354 hugetlb_free_pgd_range+0xc8/0x1e4
  [162980.035699] CPU: 0 PID: 2777 Comm: malloc Tainted: G W 4.14.6 #85
  [162980.035744] task: c67e2c00 task.stack: c668e000
  [162980.035783] NIP: c000fe18 LR: c00e1eec CTR: c00f90c0
  [162980.035830] REGS: c668fc20 TRAP: 0700 Tainted: G W (4.14.6)
  [162980.035854] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24044224 XER: 20000000
  [162980.036003] GPR00: c00e1eec c668fcd0 c67e2c00 00000010 c6869410 10080000 00000000 77fb4000
  [162980.036003] GPR08: ffff0001 0683c001 00000000 ffffff80 44028228 10018a34 00004008 418004fc
  [162980.036003] GPR16: c668e000 00040100 c668e000 c06c0000 c668fe78 c668e000 c6835ba0 c668fd48
  [162980.036003] GPR24: 00000000 73ffffff 74000000 00000001 77fb4000 100fffff 10100000 10100000
  [162980.036743] NIP [c000fe18] hugetlb_free_pgd_range+0xc8/0x1e4
  [162980.036839] LR [c00e1eec] free_pgtables+0x12c/0x150
  [162980.036861] Call Trace:
  [162980.036939] [c668fcd0] [c00f0774] unlink_anon_vmas+0x1c4/0x214 (unreliable)
  [162980.037040] [c668fd10] [c00e1eec] free_pgtables+0x12c/0x150
  [162980.037118] [c668fd40] [c00eabac] exit_mmap+0xe8/0x1b4
  [162980.037210] [c668fda0] [c0019710] mmput.part.9+0x20/0xd8
  [162980.037301] [c668fdb0] [c001ecb0] do_exit+0x1f0/0x93c
  [162980.037386] [c668fe00] [c001f478] do_group_exit+0x40/0xcc
  [162980.037479] [c668fe10] [c002a76c] get_signal+0x47c/0x614
  [162980.037570] [c668fe70] [c0007840] do_signal+0x54/0x244
  [162980.037654] [c668ff30] [c0007ae8] do_notify_resume+0x34/0x88
  [162980.037744] [c668ff40] [c000dae8] do_user_signal+0x74/0xc4
  [162980.037781] Instruction dump:
  [162980.037821] 7fdff378 81370000 54a3463a 80890020 7d24182e 7c841a14 712a0004 4082ff94
  [162980.038014] 2f890000 419e0010 712a0ff0 408200e0 <0fe00000> 54a9000a 7f984840 419d0094
  [162980.038216] ---[ end trace c0ceeca8e7a5800a ]---
  [162980.038754] BUG: non-zero nr_ptes on freeing mm: 1
  [162985.363322] BUG: non-zero nr_ptes on freeing mm: -1

In order to fix this, this patch uses the address space "slices" implemented for BOOK3S/64 and enhanced to support PPC32 by the preceding patch.

This patch modifies the context.id on the 8xx to be in the range [1:16] instead of [0:15] in order to identify context.id == 0 as not initialised contexts, as done on BOOK3S.

This patch activates CONFIG_PPC_MM_SLICES when CONFIG_HUGETLB_PAGE is selected for the 8xx.

Although we could in theory have as many slices as PMD entries, the current slices implementation limits the number of low slices to 16. This limitation does not prevent us from fixing the initial issue, although it is suboptimal. It will be cured in a subsequent patch.
Fixes: 4b91428699477 ("powerpc/8xx: Implement support of hugepages") Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-06  powerpc/mm/slice: Enhance for supporting PPC32  (Christophe Leroy; 1 file, -8/+29)
In preparation for the following patch, which will fix an issue on the 8xx by re-using the 'slices', this patch enhances the 'slices' implementation to support 32-bit CPUs. On PPC32, the address space is limited to 4Gbytes, hence only the low slices will be used. The high slices use bitmaps. As bitmap functions are not prepared to handle bitmaps of size 0, this patch ensures that bitmap functions are called only when SLICE_NUM_HIGH is non-zero. Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-06  powerpc/mm/slice: Remove intermediate bitmap copy  (Christophe Leroy; 1 file, -8/+4)
bitmap_or() and bitmap_andnot() can work properly with dst identical to src1 or src2. There is no need for an intermediate result bitmap that is copied back to dst in a second step. Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
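That is, the two-step pattern collapses to one in-place call (a sketch of the pattern):

    /* Before: result only staged the value. */
    bitmap_or(result, dst->high_slices, src->high_slices, SLICE_NUM_HIGH);
    bitmap_copy(dst->high_slices, result, SLICE_NUM_HIGH);

    /* After: bitmap_or() is safe with dst aliasing an input. */
    bitmap_or(dst->high_slices, dst->high_slices, src->high_slices,
              SLICE_NUM_HIGH);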
2018-02-23  powerpc/mm/drmem: Fix unexpected flag value in ibm,dynamic-memory-v2  (Bharata B Rao; 1 file, -3/+3)
Memory addition and removal by the count and indexed-count methods temporarily mark the LMBs that are being added/removed with a special flag value, DRMEM_LMB_RESERVED. Accessing the flags value directly at a few places without the proper accessor method is causing two unexpected side-effects:

- The DRMEM_LMB_RESERVED bit is becoming part of the flags word of drconf_cell_v2 entries in the ibm,dynamic-memory-v2 DT property.
- This results in extra drconf_cell entries in ibm,dynamic-memory-v2. For example if 1G memory is added, it leads to one entry for 3 LMBs and 1 separate entry for the last LMB. All the 4 LMBs should be defined by one entry here.

Fix this by always accessing the flags via the accessor method drmem_lmb_flags(). Fixes: 2b31e3aec1db ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property") Signed-off-by: Bharata B Rao <[email protected]> Reviewed-by: Nathan Fontenot <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
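The accessor simply masks out the internal marker, roughly (a sketch; the real helper lives with the drmem definitions):

    static inline u32 drmem_lmb_flags(struct drmem_lmb *lmb)
    {
            /* Internal-only marker bits must never reach the
             * exported device tree flags word. */
            return lmb->flags & ~DRMEM_LMB_RESERVED;
    }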
2018-02-16  powerpc/pseries: Check for zero filled ibm,dynamic-memory property  (Nathan Fontenot; 1 file, -0/+8)
Some versions of QEMU will produce an ibm,dynamic-reconfiguration-memory node with an ibm,dynamic-memory property that is zero-filled. This causes the drmem code to oops trying to parse this property. The fix for this is to validate that the property does contain LMB entries before trying to parse it, and bail if the count is zero.

  Oops: Kernel access of bad area, sig: 11 [#1]
  DAR: 0000000000000010
  NIP read_drconf_v1_cell+0x54/0x9c
  LR read_drconf_v1_cell+0x48/0x9c
  Call Trace:
    __param_initcall_debug+0x0/0x28 (unreliable)
    drmem_init+0x144/0x2f8
    do_one_initcall+0x64/0x1d0
    kernel_init_freeable+0x298/0x38c
    kernel_init+0x24/0x160
    ret_from_kernel_thread+0x5c/0xb4

The ibm,dynamic-reconfiguration-memory device tree property generated that causes this:

  ibm,dynamic-reconfiguration-memory {
          ibm,lmb-size = <0x0 0x10000000>;
          ibm,memory-flags-mask = <0xff>;
          ibm,dynamic-memory = <0x0 0x0 0x0 0x0 0x0 0x0>;
          linux,phandle = <0x7e57eed8>;
          ibm,associativity-lookup-arrays = <0x1 0x4 0x0 0x0 0x0 0x0>;
          ibm,memory-preservation-time = <0x0>;
  };

Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Cyril Bur <[email protected]> Tested-by: Daniel Black <[email protected]> [mpe: Trim oops report] Signed-off-by: Michael Ellerman <[email protected]>
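The guard amounts to reading the LMB count first and bailing out when it is zero; a sketch of the pattern (not the verbatim patch):

    const __be32 *prop;
    int len;
    u32 n_lmbs;

    prop = of_get_flat_dt_prop(node, "ibm,dynamic-memory", &len);
    if (!prop || len < sizeof(__be32))
            return;

    n_lmbs = of_read_number(prop, 1);   /* first cell: LMB entry count */
    if (!n_lmbs)
            return;                     /* zero-filled property */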
2018-02-13  powerpc/mm/hash64: Store the slot information at the right offset for hugetlb  (Aneesh Kumar K.V; 4 files, -11/+20)
The hugetlb pte entries are at the PMD and PUD level, so we can't use PTRS_PER_PTE to find the second half of the page table. Use the right offset for PUD/PMD to get to the second half of the table. Fixes: bf9a95f9a648 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Ram Pai <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>