path: root/arch/sparc/mm
Age  Commit message  (Author; files changed, lines -/+)
2014-05-03  sparc64: Fix bugs in get_user_pages_fast() wrt. THP.  (David S. Miller; 1 file, -1/+1)
The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to decide if it needs to bail and return 0. pmd_large() should therefore just check _PAGE_PMD_HUGE. Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we just need to add a valid bit check. Signed-off-by: David S. Miller <[email protected]>
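In code form, the fix amounts to testing the right bit in the fast-GUP huge-PMD path (a sketch using the pte bit names from pgtable_64.h, not the literal diff):

    /* huge-PMD path of get_user_pages_fast(): bail unless the mapping
     * is really loadable into the TLB; _PAGE_PRESENT alone is not
     * enough, since a PROT_NONE huge PMD clears only the present bit */
    if (!(pmd_val(pmd) & _PAGE_VALID))
            return 0;

    /* and pmd_large() reduces to the huge bit alone */
    #define pmd_large(pmd)  (pmd_val(pmd) & _PAGE_PMD_HUGE)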
2014-05-03  sparc64: Fix huge PMD invalidation.  (David S. Miller; 1 file, -0/+11)
On sparc64 "present" and "valid" are separate PTE bits; this allows us to naturally distinguish between the user explicitly asking for PROT_NONE with mprotect() and other situations. However we weren't handling this properly in the huge PMD paths. First of all, the page table walker in the TSB miss path only checks for _PAGE_PMD_HUGE. So the generic pmdp_invalidate() would clear _PAGE_PRESENT but the TLB miss paths would still load it into the TLB as a valid huge PMD. Fix this by clearing the valid bit in pmdp_invalidate(), and also checking the valid bit in USER_PGTABLE_CHECK_PMD_HUGE using "brgez", since _PAGE_VALID is bit 63 in both the sun4u and sun4v pte layouts. Signed-off-by: David S. Miller <[email protected]>
2014-05-03  sparc64: Fix executable bit testing in set_pmd_at() paths.  (David S. Miller; 1 file, -6/+9)
This code was mistakenly using the exec bit from the PMD in all cases, even when the PMD isn't a huge PMD. If it's not a huge PMD, test the exec bit in the individual ptes down in tlb_batch_pmd_scan(). Signed-off-by: David S. Miller <[email protected]>
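The per-pte variant looks roughly like this (a sketch of tlb_batch_pmd_scan() after the fix; pte_exec() and tlb_batch_add_one() are assumed to behave as elsewhere in this file):

    static void tlb_batch_pmd_scan(struct mm_struct *mm, unsigned long vaddr,
                                   pmd_t pmd)
    {
            unsigned long end = vaddr + HPAGE_SIZE;
            pte_t *pte = pte_offset_map(&pmd, vaddr);

            while (vaddr < end) {
                    if (pte_val(*pte) & _PAGE_VALID) {
                            /* exec bit comes from each pte, not the PMD */
                            bool exec = pte_exec(*pte);

                            tlb_batch_add_one(mm, vaddr, exec);
                    }
                    pte++;
                    vaddr += PAGE_SIZE;
            }
            pte_unmap(pte);
    }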
2014-04-29  sparc32: fix sparse warnings in unaligned_32.c  (Sam Ravnborg; 1 file, -3/+1)
Fix following warnings:

    unaligned_32.c:146:15: warning: symbol 'safe_compute_effective_address' was not declared. Should it be static?
    unaligned_32.c:235:17: warning: symbol 'kernel_unaligned_trap' was not declared. Should it be static?
    unaligned_32.c:319:17: warning: symbol 'user_unaligned_trap' was not declared. Should it be static?

Add proper declarations in kernel.h + setup.h Signed-off-by: Sam Ravnborg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
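This whole run of sparse fixes follows one pattern; a minimal sketch with a hypothetical symbol (foo_trap and its signature are illustrative only):

    /* asm/setup.h: one shared declaration, visible both to callers and
     * at the definition site, silences "was not declared. Should it be
     * static?" */
    void foo_trap(struct pt_regs *regs, unsigned int insn);

    /* unaligned_32.c */
    #include <asm/setup.h>

    void foo_trap(struct pt_regs *regs, unsigned int insn)
    {
            /* ... */
    }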
2014-04-29  sparc32: fix sparse warning in devices.c  (Sam Ravnborg; 1 file, -2/+0)
Fix following warning:

    devices.c:114:13: warning: symbol 'device_scan' was not declared. Should it be static?

Add prototype to asm/setup.h Signed-off-by: Sam Ravnborg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-29  sparc32: fix sparse warnings in setup_32.c  (Sam Ravnborg; 1 file, -1/+1)
Fix following warnings:

    setup_32.c:106:15: warning: symbol 'cmdline_memory_size' was not declared. Should it be static?
    setup_32.c:270:16: warning: symbol 'fake_swapper_regs' was not declared. Should it be static?
    setup_32.c:368:55: warning: Using plain integer as NULL pointer

Add the missing declaration of cmdline_memory_size and remove the local one in init_32.c. fake_swapper_regs was only used locally, so it is now defined static. When replacing 0 with NULL, also add a few spaces around operators. Signed-off-by: Sam Ravnborg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-29sparc32: fix sparse "Should it be static?" in mm/Sam Ravnborg3-4/+7
Fix following warnings: srmmu.c:870:13: warning: symbol 'srmmu_paging_init' was not declared. Should it be static? iommu.c:430:13: warning: symbol 'ld_mmu_iommu' was not declared. Should it be static? leon_mm.c:21:5: warning: symbol 'srmmu_swprobe_trace' was not declared. Should it be static? Add proper prototypes or define static to fix them. Signed-off-by: Sam Ravnborg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-29  sparc32: fix sparse warnings in srmmu.c  (Sam Ravnborg; 4 files, -7/+11)
Fix following warnings:

    srmmu.c:78:5: warning: symbol 'flush_page_for_dma_global' was not declared. Should it be static?
    srmmu.c:85:5: warning: symbol 'viking_mxcc_present' was not declared. Should it be static?
    srmmu.c:103:6: warning: symbol 'srmmu_nocache_bitmap' was not declared. Should it be static?
    srmmu.c:176:24: warning: Using plain integer as NULL pointer
    srmmu.c:731:46: warning: Using plain integer as NULL pointer
    srmmu.c:731:46: warning: Using plain integer as NULL pointer
    srmmu.c:731:46: warning: Using plain integer as NULL pointer
    srmmu.c:870:13: warning: symbol 'srmmu_paging_init' was not declared. Should it be static?

Add proper prototypes in mm_32.h and drop local prototype in init_32.c Replace 0 with NULL Signed-off-by: Sam Ravnborg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-29  sparc32: fix sparse warning in init_32.c  (Sam Ravnborg; 1 file, -2/+0)
Fix following warning:

    init_32.c:112:22: warning: symbol 'bootmem_init' was not declared. Should it be static?

Fix by adding a proper prototype in pgtable_32.h and drop the local prototype in srmmu.c Signed-off-by: Sam Ravnborg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-29  sparc32: fix sparse warning in fault_32.c  (Sam Ravnborg; 2 files, -3/+12)
Fix following warning:

    fault_32.c:38:24: error: symbol 'unhandled_fault' redeclared with different type - different modifiers

When this warning was fixed several new warnings popped up - fix them too. Signed-off-by: Sam Ravnborg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-29  sparc32: rename mm/srmmu.h to mm/mm_32.h  (Sam Ravnborg; 3 files, -2/+2)
This file will be used for more than just srmmu stuff, so the old name was misleading. Signed-off-by: Sam Ravnborg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-03-17  sparc64:tsb.c:use array size macro rather than number  (Doug Wilson; 1 file, -1/+1)
This is a small patch which uses the ARRAY_SIZE macro rather than a hard-coded number, to make the code more readable. Signed-off-by: Doug Wilson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
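The idiom, for reference (a generic sketch; the array name and contents are hypothetical):

    /* ARRAY_SIZE() ties the loop bound to the array definition, so
     * resizing the array cannot silently break the loop */
    static const unsigned long tsb_shifts[] = { 13, 14, 15, 16 };
    unsigned int i;

    for (i = 0; i < ARRAY_SIZE(tsb_shifts); i++)    /* not: i < 4 */
            use_shift(tsb_shifts[i]);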
2014-02-19  sparc32: make copy_to/from_user_page() usable from modular code  (Paul Gortmaker; 1 file, -0/+2)
While copy_to/from_user_page() users are uncommon, there is one in drivers/staging/lustre/lustre/libcfs/linux/linux-curproc.c which leads to the following:

    ERROR: "sparc32_cachetlb_ops" [drivers/staging/lustre/lustre/libcfs/libcfs.ko] undefined!

during routine allmodconfig build coverage. The reason this happens is as follows: in arch/sparc/include/asm/cacheflush_32.h we have:

    #define flush_cache_page(vma, addr, pfn) \
            sparc32_cachetlb_ops->cache_page(vma, addr)
    #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
            do { \
                    flush_cache_page(vma, vaddr, page_to_pfn(page)); \
                    memcpy(dst, src, len); \
            } while (0)
    #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
            do { \
                    flush_cache_page(vma, vaddr, page_to_pfn(page)); \
                    memcpy(dst, src, len); \
            } while (0)

However, sparc32_cachetlb_ops isn't exported and hence the error. Signed-off-by: Paul Gortmaker <[email protected]> Signed-off-by: David S. Miller <[email protected]>
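The fix then amounts to a one-line export next to the definition (a sketch; the ops vector is defined in arch/sparc/mm/srmmu.c in this tree):

    /* make the ops vector visible to modules that pull in
     * copy_to/from_user_page() via the cacheflush macros */
    EXPORT_SYMBOL(sparc32_cachetlb_ops);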
2014-01-28  sparc: delete non-required instances of include <linux/init.h>  (Paul Gortmaker; 2 files, -2/+0)
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just left over from the __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. Signed-off-by: Paul Gortmaker <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-01-21  memblock: make memblock_set_node() support different memblock_type  (Tang Chen; 1 file, -2/+3)
[[email protected]: fix powerpc build] Signed-off-by: Tang Chen <[email protected]> Reviewed-by: Zhang Yanfei <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "Rafael J . Wysocki" <[email protected]> Cc: Chen Tang <[email protected]> Cc: Gong Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Larry Woodman <[email protected]> Cc: Len Brown <[email protected]> Cc: Liu Jiang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Taku Izumi <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Renninger <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vasilis Liaskovitis <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
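The visible effect on callers (a sketch of the updated call; &memblock.memory is the memblock_type an arch typically passes):

    /* old: memblock_set_node(start, size, nid);
     * new: the target memblock_type is explicit */
    memblock_set_node(start, size, &memblock.memory, nid);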
2013-11-18  sparc64: merge fix  (Stephen Rothwell; 1 file, -2/+0)
After merging the final tree, today's linux-next build (sparc64 defconfig) failed like this:

    arch/sparc/mm/init_64.c: In function 'pte_alloc_one':
    arch/sparc/mm/init_64.c:2568:9: error: unused variable 'pte' [-Werror=unused-variable]

Caused by the merge between commit 37b3a8ff3e08 ("sparc64: Move from 4MB to 8MB huge pages") and commit 1ae9ae5f7df7 ("sparc: handle pgtable_page_ctor() fail") (I had the following merge fix in linux-next, but it didn't seem to propagate upstream - may have forgotten to point it out :-(). Signed-off-by: Stephen Rothwell <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-15  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next  (Linus Torvalds; 8 files, -202/+147)
Pull sparc update from David Miller:

 1) Implement support for up to 47-bit physical addresses on sparc64.
 2) Support HAVE_CONTEXT_TRACKING on sparc64, from Kirill Tkhai.
 3) Fix Simba bridge window calculations, from Kjetil Oftedal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
  sparc64: Implement HAVE_CONTEXT_TRACKING
  sparc64: Add self-IPI support for smp_send_reschedule()
  sparc: PCI: Fix incorrect address calculation of PCI Bridge windows on Simba-bridges
  sparc64: Encode huge PMDs using PTE encoding.
  sparc64: Move to 64-bit PGDs and PMDs.
  sparc64: Move from 4MB to 8MB huge pages.
  sparc64: Make PAGE_OFFSET variable.
  sparc64: Fix inconsistent max-physical-address defines.
  sparc64: Document the shift counts used to validate linear kernel addresses.
  sparc64: Define PAGE_OFFSET in terms of physical address bits.
  sparc64: Use PAGE_OFFSET instead of a magic constant.
  sparc64: Clean up 64-bit mmap exclusion defines.
2013-11-15  sparc: handle pgtable_page_ctor() fail  (Kirill A. Shutemov; 2 files, -6/+10)
Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
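The pattern applied here presumably matches the series' generic shape (a sketch of a pte_alloc_one()-style allocator, not the literal sparc diff):

    page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
    if (!page)
            return NULL;
    if (!pgtable_page_ctor(page)) {
            /* the ctor can now fail (e.g. split-ptlock allocation),
             * so unwind instead of returning a half-built page table */
            __free_page(page);
            return NULL;
    }
    return page;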
2013-11-15  mm, thp: do not access mm->pmd_huge_pte directly  (Kirill A. Shutemov; 1 file, -6/+6)
Currently mm->pmd_huge_pte is protected by the page table lock. That will not work with split locks; we have to have a per-pmd pmd_huge_pte for proper access serialization. For now, let's just introduce a wrapper to access mm->pmd_huge_pte. Signed-off-by: Kirill A. Shutemov <[email protected]> Tested-by: Alex Thorlton <[email protected]> Cc: Alex Thorlton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Al Viro <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jones <[email protected]> Cc: David Howells <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Robin Holt <[email protected]> Cc: Sedat Dilek <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
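The interim wrapper is tiny (a sketch of its shape):

    /* all THP code goes through this accessor, so later switching the
     * storage from the mm to per-PMD state becomes a one-line change */
    #define pmd_huge_pte(mm, pmd)   ((mm)->pmd_huge_pte)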
2013-11-14  sparc64: Implement HAVE_CONTEXT_TRACKING  (Kirill Tkhai; 1 file, -5/+9)
Mark the places where the system enters and leaves user mode, as opposed to running in the kernel. This is used to build a full dynticks (tickless) system -- a CONFIG_NO_HZ_FULL dependency. Signed-off-by: Kirill Tkhai <[email protected]> CC: David Miller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
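On the arch side this boils down to calling the context-tracking hooks at the user/kernel boundary (a sketch; the exact call sites on sparc64 live in the trap and syscall paths):

    #include <linux/context_tracking.h>

    /* entering the kernel from user space (trap, syscall, fault): */
    user_exit();
    /* ... handle the event ... */
    /* returning to user space: */
    user_enter();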
2013-11-13  sparc64: Encode huge PMDs using PTE encoding.  (David S. Miller; 3 files, -110/+10)
Now that we have 64-bits for PMDs we can stop using special encodings for the huge PMD values, and just put real PTEs in there. We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and huge ones. It is the same for both 4U and 4V PTE layouts. We also use _PAGE_SPECIAL to indicate the splitting state, since a huge PMD cannot also be special. All of the PMD --> PTE translation code disappears, and most of the huge PMD bit modifications and tests just degenerate into the PTE operations. In particular USER_PGTABLE_CHECK_PMD_HUGE becomes trivial. As a side effect, normal PMDs don't shift the physical address around. This also speeds up the page table walks in the TLB miss paths since they don't have to do the shifts any more. Another non-trivial aspect is that pte_modify() has to be changed to preserve the _PAGE_PMD_HUGE bits as well as the page size field of the pte. Signed-off-by: David S. Miller <[email protected]>
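One consequence described above, in code form (a sketch of how a PMD helper degenerates once a huge PMD holds a real PTE):

    static inline unsigned long pmd_write(pmd_t pmd)
    {
            pte_t pte = __pte(pmd_val(pmd));

            return pte_write(pte);  /* the PTE op does all the work */
    }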
2013-11-12  sparc64: Move to 64-bit PGDs and PMDs.  (David S. Miller; 1 file, -1/+1)
To make the page tables compact, we were using 32-bit PGDs and PMDs. We only had to support <= 43 bits of physical addresses so this was quite feasible. In order to support larger physical addresses we have to move to 64-bit PGDs and PMDs. Most of the changes are straight-forward:

 1) {pgd,pmd}_t --> unsigned long

 2) Anything that tries to use plain "unsigned int" types with pgd/pmd values needs to be adjusted. In particular things like "0U" become "0UL".

 3) {PGDIR,PMD}_BITS decrease by one.

 4) In the assembler page table walkers, use "ldxa" instead of "lduwa" and adjust the low bit masks to clear out the low 3 bits instead of just the low 2 bits during pgd/pmd address formation. Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the swapper_{pg_dir,low_pmd_dir} arrays.

This patch does not try to take advantage of having 64-bits in the PMDs to simplify the hugepage code, that will come in a subsequent change. Signed-off-by: David S. Miller <[email protected]>
2013-11-12  sparc64: Move from 4MB to 8MB huge pages.  (David S. Miller; 3 files, -72/+21)
The impetus for this is that we would like to move to 64-bit PMDs and PGDs, but that would result in only supporting a 42-bit address space with the current page table layout. It'd be nice to support at least 43-bits. The reason we'd end up with only 42-bits after making PMDs and PGDs 64-bit is that we only use half-page sized PTE tables in order to make PMDs line up to 4MB, the hardware huge page size we use. So what we do here is we make huge pages 8MB, and fabricate them using 4MB hw TLB entries. Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in places that really need to operate on hardware 4MB pages. Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT, PGD_SHIFT, and the build time CPP test as needed. Use a CPP test to make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up. This makes the pgtable cache completely unused, so remove the code managing it and the state used in mm_context_t. Now we have less spinlocks taken in the page table allocation path. The technique we use to fabricate the 8MB pages is to transfer bit 22 from the missing virtual address into the PTEs physical address field. That takes care of the transparent huge pages case. For hugetlb, we fill things in at the PTE level and that code already puts the sub huge page physical bits into the PTEs, based upon the offset, so there is nothing special we need to do. It all just works out. So, a small amount of complexity in the THP case, but this code is about to get much simpler when we move the 64-bit PMDs as we can move away from the fancy 32-bit huge PMD encoding and just put a real PTE value in there. With bug fixes and help from Bob Picco. Signed-off-by: David S. Miller <[email protected]>
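The two shift constants make the 4MB-vs-8MB split concrete (a sketch; the values follow from the sizes named above, 2^22 = 4MB and 2^23 = 8MB):

    #define HPAGE_SHIFT             23      /* 8MB huge page, sw view */
    #define REAL_HPAGE_SHIFT        22      /* 4MB TLB entry, hw view */
    #define REAL_HPAGE_SIZE         (_AC(1,UL) << REAL_HPAGE_SHIFT)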
2013-11-12  sparc64: Make PAGE_OFFSET variable.  (David S. Miller; 2 files, -6/+98)
Choose PAGE_OFFSET dynamically based upon cpu type. Original UltraSPARC-I (spitfire) chips only supported a 44-bit virtual address space. Newer chips (T4 and later) support 52-bit virtual addresses and up to 47-bits of physical memory space. Therefore we have to adjust PAGE_OFFSET dynamically based upon the capabilities of the chip. Note that this change alone does not allow us to support > 43-bit physical memory; to do that we need to re-arrange our page table support. The current encodings of the pmd_t and pgd_t pointers restrict us to "32 + 11" == 43 bits. This change can waste quite a bit of memory for the various tables. In particular, a future change should work to size and allocate kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically. This isn't easy as we really cannot take a TLB miss when accessing kern_linear_bitmap[]. We'd have to lock it into the TLB or similar. Signed-off-by: David S. Miller <[email protected]> Acked-by: Bob Picco <[email protected]>
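A sketch of the mechanism (the helper macro follows the naming in this series; the initialization site and max_phys_bits are illustrative):

    /* the linear-mapping base becomes a variable, derived from how
     * many physical address bits the running chip supports */
    #define PAGE_OFFSET_BY_BITS(x)  (-(_AC(1,UL) << (x)))

    unsigned long PAGE_OFFSET;

    /* early setup, after cpu probing: */
    PAGE_OFFSET = PAGE_OFFSET_BY_BITS(max_phys_bits);   /* e.g. 43 or 47 */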
2013-11-12  sparc64: Document the shift counts used to validate linear kernel addresses.  (David S. Miller; 1 file, -1/+3)
This way we can see exactly what they are derived from, and in particular how they would change if we were to use a different PAGE_OFFSET value. Signed-off-by: David S. Miller <[email protected]> Acked-by: Bob Picco <[email protected]>
2013-11-12  sparc64: Use PAGE_OFFSET instead of a magic constant.  (David S. Miller; 1 file, -7/+7)
This pertains to all of the computations of the kernel fast TLB miss xor values. Based upon a patch by Bob Picco. Signed-off-by: David S. Miller <[email protected]> Acked-by: Bob Picco <[email protected]>
2013-11-12  sparc64: Clean up 64-bit mmap exclusion defines.  (David S. Miller; 1 file, -2/+0)
Older UltraSPARC chips had an address space hole due to the MMU only supporting 44-bit virtual addresses. The top end of this hole also has the same value as the current definition of PAGE_OFFSET, so this can be confusing. Consolidate the defines for the userspace mmap exclusion range into page_64.h and use them in sys_sparc_64.c and hugetlbpage.c Signed-off-by: David S. Miller <[email protected]> Acked-by: Bob Picco <[email protected]>
2013-09-12  arch: mm: pass userspace fault flag to generic fault handler  (Johannes Weiner; 2 files, -5/+13)
Unlike global OOM handling, memory cgroup code will invoke the OOM killer in any OOM situation because it has no way of telling faults occurring in kernel context - which could be handled more gracefully - from user-triggered faults. Pass a flag that identifies faults originating in user space from the architecture-specific fault handlers to generic code so that memcg OOM handling can be improved. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: azurIt <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-09-11  mm: migrate: check movability of hugepage in unmap_and_move_huge_page()  (Naoya Horiguchi; 1 file, -0/+5)
Currently hugepage migration works well only for pmd-based hugepages (mainly due to lack of testing), so we had better not enable migration of other levels of hugepages until we are ready for it. Some users of hugepage migration (mbind, move_pages, and migrate_pages) do page table walk and check pud/pmd_huge() there, so they are safe. But the other users (softoffline and memory hotremove) don't do this, so without this patch they can try to migrate unexpected types of hugepages. To prevent this, we introduce hugepage_migration_support() as an architecture dependent check of whether hugepage are implemented on a pmd basis or not. And on some architecture multiple sizes of hugepages are available, so hugepage_migration_support() also checks hugepage size. Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Rik van Riel <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
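The architecture check essentially reduces to a page-size test (a simplified sketch of hugepage_migration_support(); the real helper may also consult an arch hook):

    static inline int hugepage_migration_support(struct hstate *h)
    {
            /* only pmd-sized hugepages are known to migrate safely */
            return huge_page_shift(h) == PMD_SHIFT;
    }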
2013-07-14  sparc: delete __cpuinit/__CPUINIT usage from all users  (Paul Gortmaker; 2 files, -7/+7)
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. The fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) and are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/sparc uses of the __cpuinit macros from C files and removes __CPUINIT from assembly files. Note that even though arch/sparc/kernel/trampoline_64.S has instances of ".previous" in it, they are all paired off against explicit ".section" directives, and not implicitly paired with __CPUINIT (unlike mips and arm were). [1] https://lkml.org/lkml/2013/5/20/589 Cc: "David S. Miller" <[email protected]> Cc: [email protected] Signed-off-by: Paul Gortmaker <[email protected]>
2013-07-10  [PATCH] sparc32: vm_area_struct access for old Sun SPARCs.  (Olivier DANET; 4 files, -16/+16)
Commit e4c6bfd2d79d063017ab19a18915f0bc759f32d9 ("mm: rearrange vm_area_struct for fewer cache misses") changed the layout of the vm_area_struct structure, which broke several SPARC32 assembly routines that used numerical constants for accessing the vm_mm field. This patch defines the VMA_VM_MM constant to replace the immediate values. Signed-off-by: Olivier DANET <[email protected]> Signed-off-by: David S. Miller <[email protected]>
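The standard way to keep such offsets honest is to generate them at build time (a sketch; the C side goes in the arch's asm-offsets.c, and the assembly line is illustrative):

    /* arch/sparc/kernel/asm-offsets.c */
    DEFINE(VMA_VM_MM, offsetof(struct vm_area_struct, vm_mm));

    /* assembly user, instead of a hard-coded immediate offset:
     *      ld      [%o0 + VMA_VM_MM], %o0
     */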
2013-07-04  Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc  (Linus Torvalds; 1 file, -2/+3)
Pull powerpc updates from Ben Herrenschmidt:
 "This is the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are:

  - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size.

  - Base VFIO support for KVM on power by Alexey Kardashevskiy

  - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah

  - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan.

  - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors).

  - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events.

  - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling.

  And more ... I apologize in advance if I've failed to highlight something that somebody deemed worth it."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  pstore: Add hsize argument in write_buf call of pstore_ftrace_call
  powerpc/fsl: add MPIC timer wakeup support
  powerpc/mpic: create mpic subsystem object
  powerpc/mpic: add global timer support
  powerpc/mpic: add irq_set_wake support
  powerpc/85xx: enable coreint for all the 64bit boards
  powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
  powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
  powerpc/mpic: Add get_version API both for internal and external use
  powerpc: Handle both new style and old style reserve maps
  powerpc/hw_brk: Fix off by one error when validating DAWR region end
  powerpc/pseries: Support compression of oops text via pstore
  powerpc/pseries: Re-organise the oops compression code
  pstore: Pass header size in the pstore write callback
  powerpc/powernv: Fix iommu initialization again
  powerpc/pseries: Inform the hypervisor we are using EBB regs
  powerpc/perf: Add power8 EBB support
  powerpc/perf: Core EBB support for 64-bit book3s
  powerpc/perf: Drop MMCRA from thread_struct
  powerpc/perf: Don't enable if we have zero events
  ...
2013-07-03  mm/SPARC: prepare for removing num_physpages and simplify mem_init()  (Jiang Liu; 2 files, -51/+7)
Prepare for removing num_physpages and simplify mem_init(). Signed-off-by: Jiang Liu <[email protected]> Acked-by: Sam Ravnborg <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Tang Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm: concentrate modification of totalram_pages into the mm core  (Jiang Liu; 2 files, -3/+2)
Concentrate the code that modifies totalram_pages into the mm core, so the arch memory init code doesn't need to take care of it. With these changes applied, only the following functions from the mm core modify the global variable totalram_pages: free_bootmem_late(), free_all_bootmem(), free_all_bootmem_node(), adjust_managed_page_count(). With this patch applied, it will be much easier for us to keep totalram_pages and zone->managed_pages consistent. Signed-off-by: Jiang Liu <[email protected]> Acked-by: David Howells <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Tang Chen <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm: change signature of free_reserved_area() to fix building warnings  (Jiang Liu; 2 files, -4/+4)
Change the signature of free_reserved_area() according to Russell King's suggestion to fix the following build warnings:

    arch/arm/mm/init.c: In function 'mem_init':
    arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
       free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
       ^
    In file included from include/linux/mman.h:4:0,
                     from arch/arm/mm/init.c:15:
    include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
     extern unsigned long free_reserved_area(unsigned long start, unsigned long end,

    mm/page_alloc.c: In function 'free_reserved_area':
    >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
    In file included from arch/mips/include/asm/page.h:49:0,
                     from include/linux/mmzone.h:20,
                     from include/linux/gfp.h:4,
                     from include/linux/mm.h:8,
                     from mm/page_alloc.c:18:
    arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'

    mm/page_alloc.c: In function 'free_area_init_nodes':
    mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]

Also address some minor code review comments. Signed-off-by: Jiang Liu <[email protected]> Reported-by: Arnd Bergmann <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Tang Chen <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
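The resulting signature takes pointers, so callers pass virtual addresses without casts (a sketch of the post-change prototype plus the ARM call quoted in the warning above):

    extern unsigned long free_reserved_area(void *start, void *end,
                                            int poison, char *s);

    /* e.g. in an arch's mem_init(): */
    free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);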
2013-07-01  Merge tag 'v3.10' into next  (Benjamin Herrenschmidt; 2 files, -2/+9)
Merge 3.10 in order to get some of the last minute powerpc changes, resolve conflicts and add additional fixes on top of them.
2013-06-20  mm/THP: add pmd args to pgtable deposit and withdraw APIs  (Aneesh Kumar K.V; 1 file, -2/+3)
This will be used later by powerpc THP support. In powerpc we want to use the pgtable for storing the hash index values, so instead of adding them to the mm_context list, we would like to store them in the second half of the pmd. Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Reviewed-by: David Gibson <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
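The API change in sketch form (the post-change prototypes, as the subject describes):

    /* deposit/withdraw a preallocated pte page for a huge pmd; the pmd
     * slot is now passed so an arch can stash data next to that pmd */
    extern void pgtable_trans_huge_deposit(struct mm_struct *mm,
                                           pmd_t *pmdp, pgtable_t pgtable);
    extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
                                                 pmd_t *pmdp);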
2013-06-19  sparc: tsb must be flushed before tlb  (Dave Kleikamp; 1 file, -1/+1)
This fixes a race where a cpu may re-load a tlb from a stale tsb right after it has been flushed by a remote function call. I still see some instability when stressing the system with parallel kernel builds while creating memory pressure by writing to /proc/sys/vm/nr_hugepages, but this patch improves the stability significantly. Signed-off-by: Dave Kleikamp <[email protected]> Acked-by: Bob Picco <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-06-19  sparc64 address-congruence property  (bob picco; 1 file, -1/+8)
The Machine Description (MD) property "address-congruence-offset" is optional. According to the MD specification, the value is assumed to be 0UL when not present. Not handling the absent-property case caused an early boot failure on T5. Signed-off-by: Bob Picco <[email protected]> CC: [email protected] Signed-off-by: David S. Miller <[email protected]>
2013-05-07  mm/SPARC: use common help functions to free reserved pages  (Jiang Liu; 2 files, -56/+9)
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <[email protected]> Acked-by: David S. Miller <[email protected]> Acked-by: Sam Ravnborg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-04  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc  (David S. Miller; 7 files, -59/+181)
Merge sparc bug fixes that didn't make it into v3.9 into sparc-next. Signed-off-by: David S. Miller <[email protected]>
2013-04-29  sparse-vmemmap: specify vmemmap population range in bytes  (Johannes Weiner; 1 file, -4/+3)
The sparse code, when asking the architecture to populate the vmemmap, specifies the section range as a starting page and a number of pages. This is an awkward interface, because none of the arch-specific code actually thinks of the range in terms of 'struct page' units and always translates it to bytes first. In addition, later patches mix huge page and regular page backing for the vmemmap. For this, they need to call vmemmap_populate_basepages() on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these are not necessarily multiples of the 'struct page' size and so this unit is too coarse. Just translate the section range into bytes once in the generic sparse code, then pass byte ranges down the stack. Signed-off-by: Johannes Weiner <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Bernhard Schmidt <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Russell King <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Heiko Carstens <[email protected]> Acked-by: David S. Miller <[email protected]> Tested-by: David S. Miller <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
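The arch hook after the change (a sketch of the byte-based prototype):

    /* the generic sparse code converts the section range to bytes once
     * and hands the arch a plain [start, end) virtual byte range */
    int __meminit vmemmap_populate(unsigned long start, unsigned long end,
                                   int node);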
2013-04-29  mm/SPARC: use free_highmem_page() to free highmem pages into buddy system  (Jiang Liu; 1 file, -10/+2)
Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <[email protected]> Cc: "David S. Miller" <[email protected]> Acked-by: Sam Ravnborg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-04-24  sparc64: Fix missing put_cpu_var() in tlb_batch_add_one() when not batching.  (David S. Miller; 1 file, -1/+2)
Reported-by: Meelis Roos <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-04-19  sparc64: Fix race in TLB batch processing.  (David S. Miller; 3 files, -43/+171)
As reported by Dave Kleikamp, when we emit cross calls to do batched TLB flush processing we have a race because we do not synchronize with the sibling cpus completing the cross call. So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.) and either flushes are missed or flushes will flush the wrong addresses. Fix this by using generic infrastructure to synchronize on the completion of the cross call. This first required getting the flush_tlb_pending() call out from switch_to(), which operates with locks held and interrupts disabled. The problem is that smp_call_function_many() cannot be invoked with IRQs disabled, and this is explicitly checked for with WARN_ON_ONCE(). We get the batch processing outside of locked IRQ disabled sections by using some ideas from the powerpc port. Namely, we only batch inside of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a region, we flush TLBs synchronously.

 1) Get rid of xcall_flush_tlb_pending and per-cpu type implementations.

 2) Do TLB batch cross calls instead via:

        smp_call_function_many()
                tlb_pending_func()
                        __flush_tlb_pending()

 3) Batch only in lazy mmu sequences (see the sketch after this list):

        a) Add 'active' member to struct tlb_batch
        b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
        c) Set 'active' in arch_enter_lazy_mmu_mode()
        d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
        e) Check 'active' in tlb_batch_add_one() and do a synchronous flush if it's clear.

 4) Add infrastructure for synchronous TLB page flushes.

        a) Implement __flush_tlb_page and per-cpu variants, patch as needed.
        b) Likewise for xcall_flush_tlb_page.
        c) Implement smp_flush_tlb_page() to invoke the cross-call.
        d) Wire up global_flush_tlb_page() to the right routine based upon CONFIG_SMP

 5) It turns out that singleton batches are very common: 2 out of every 3 batch flushes have only a single entry in them. The batch flush waiting is very expensive, both because of the poll on sibling cpu completion, as well as because passing the tlb batch pointer to the sibling cpus invokes a shared memory dereference. Therefore, in flush_tlb_pending(), if there is only one entry in the batch, perform a completely asynchronous global_flush_tlb_page() instead.

Reported-by: Dave Kleikamp <[email protected]> Signed-off-by: David S. Miller <[email protected]> Acked-by: Dave Kleikamp <[email protected]>
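A sketch of the lazy-mmu gating from items 3a-3e (the per-cpu accessor is spelled with this_cpu_ptr for brevity; treat field layout as illustrative):

    struct tlb_batch {
            bool active;                    /* 3a: inside a lazy-mmu region? */
            unsigned long tlb_nr;           /* entries queued so far */
            /* ... */
    };
    static DEFINE_PER_CPU(struct tlb_batch, tlb_batch);

    void arch_enter_lazy_mmu_mode(void)
    {
            this_cpu_ptr(&tlb_batch)->active = true;        /* 3c */
    }

    void arch_leave_lazy_mmu_mode(void)
    {
            struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);

            if (tb->tlb_nr)                 /* 3d: run the pending batch */
                    flush_tlb_pending();
            tb->active = false;
    }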
2013-04-08  sparc64: Do not save/restore interrupts in get_new_mmu_context()  (Kirill Tkhai; 1 file, -3/+2)
get_new_mmu_context() is always called with interrupts disabled. So it's possible to do this micro optimization. (Also fix the comment to switch_mm, which is called in both cases) Signed-off-by: Kirill Tkhai <[email protected]> CC: David Miller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
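In code form (a sketch; ctx_alloc_lock is the lock this path takes):

    /* callers already run with IRQs disabled, so the irqsave/irqrestore
     * pair is pure overhead */
    spin_lock(&ctx_alloc_lock);     /* was: spin_lock_irqsave(...) */
    /* ... allocate the new context ... */
    spin_unlock(&ctx_alloc_lock);   /* was: spin_unlock_irqrestore(...) */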
2013-03-31  sparc/iommu: fix typo s/265KB/256KB/  (Akinobu Mita; 1 file, -1/+1)
IOMMU_NPTES is 64K PTEs, so the size is 256KB (= 64K * sizeof(iopte_t)) Signed-off-by: Akinobu Mita <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2013-03-31  sparc/srmmu: clear trailing edge of bitmap properly  (Akinobu Mita; 1 file, -1/+3)
srmmu_nocache_bitmap is cleared by bit_map_init(). But bit_map_init() attempts to clear by memset(), so it can't clear the trailing edge of the bitmap properly on a big-endian architecture if the number of bits is not a multiple of BITS_PER_LONG. Actually, the number of bits in srmmu_nocache_bitmap is not always a multiple of BITS_PER_LONG. It is calculated as below:

    bitmap_bits = srmmu_nocache_size >> SRMMU_NOCACHE_BITMAP_SHIFT;

srmmu_nocache_size is decided proportionally by the amount of system RAM and it is rounded to a multiple of PAGE_SIZE. SRMMU_NOCACHE_BITMAP_SHIFT is defined as (PAGE_SHIFT - 4). So it can only be said that bitmap_bits is a multiple of 16. This fixes the problem by using bitmap_clear() instead of memset() in bit_map_init(), and it also uses BITS_TO_LONGS() to calculate the correct size at bitmap allocation time. Signed-off-by: Akinobu Mita <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
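A sketch of the endian-safe pattern the fix applies:

    /* size the allocation in whole longs ... */
    unsigned long *map = kzalloc(BITS_TO_LONGS(nbits) * sizeof(unsigned long),
                                 GFP_KERNEL);

    /* ... and clear ranges with the bitmap API, which handles a
     * trailing partial word correctly on big-endian, unlike a
     * byte-wise memset() */
    bitmap_clear(map, 0, nbits);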
2013-03-20  sparc64: Do not change num_physpages during initmem freeing  (Tkhai Kirill; 1 file, -2/+0)
Common hibernation code looks at num_physpages during suspend and restore. Restore can be called from an initcall, which runs before initmem is freed. This case leads to a restore failure. Signed-off-by: Kirill Tkhai <[email protected]> CC: David Miller <[email protected]> CC: Sam Ravnborg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-02-23  swap: add per-partition lock for swapfile  (Shaohua Li; 1 file, -1/+1)
swap_lock is heavily contended when I test swap to 3 fast SSDs (even slightly slower than swap to 2 such SSDs). The main contention comes from swap_info_get(). This patch tries to fix the gap by adding a new per-partition lock.

Global data like nr_swapfiles, total_swap_pages, least_priority and swap_list are still protected by swap_lock. nr_swap_pages is an atomic now, so it can be changed without swap_lock. In theory, it's possible that get_swap_page() finds no swap pages when there actually are free swap pages, but that does not sound like a big problem.

Accessing partition-specific data (like scan_swap_map and so on) is only protected by swap_info_struct.lock. Changing swap_info_struct.flags needs to hold both swap_lock and swap_info_struct.lock, because scan_swap_map() will check it; reading the flags is OK with either lock held. If both swap_lock and swap_info_struct.lock must be held, we always take the former first to avoid deadlock.

swap_entry_free() can change swap_list. To delete that code, we add a new highest_priority_index. Whenever get_swap_page() is called, we check it; if it's valid, we use it. It's a pity get_swap_page() still holds swap_lock. But in practice, swap_lock isn't heavily contended in my test with this patch (or I can say there are other much heavier bottlenecks, like TLB flush). And BTW, get_swap_page() doesn't look like it really needs the lock: we never free swap_info[] and we check the SWAP_WRITEOK flag. The only risk without the lock is that we could swap out to some low-priority swap, but we can quickly recover after several rounds of swap, so that does not sound like a big deal to me. But I'd prefer to fix this if it's a real problem.

"swap: make each swap partition have one address_space" improved the swapout speed from 1.7G/s to 2G/s. This patch further improves the speed to 2.3G/s, so around a 15% improvement. It's a multi-process test, so TLB flush isn't the biggest bottleneck before the patches.

[[email protected]: fix it for nommu] [[email protected]: add missing unlock] [[email protected]: get rid of lockdep whinge on sys_swapon] Signed-off-by: Shaohua Li <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Xiao Guangrong <[email protected]> Cc: Dan Magenheimer <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>