Age    Commit message    Author    Files    Lines
2016-12-12kasan: eliminate long stalls during quarantine reductionDmitry Vyukov1-46/+48
Currently we dedicate 1/32 of RAM for the quarantine and then reduce it by 1/4 of the total quarantine size. This can be a significant amount of memory. For example, with 4GB of RAM the total quarantine size is 128MB and it is reduced by 32MB at a time. With 128GB of RAM the total quarantine size is 4GB and it is reduced by 1GB. This leads to several problems:

 - freeing 1GB can take tens of seconds, causes rcu stall warnings and just introduces unexpected long delays at random places
 - if kmalloc() is called under a mutex, other threads stall on that mutex while a thread reduces the quarantine
 - threads wait on quarantine_lock while one thread grabs a large batch of objects to evict
 - we walk the uncached list of objects to free twice, which makes all of the above worse
 - when a thread frees objects, they are already not accounted against global_quarantine.bytes; as a result we can have quarantine_size bytes in the quarantine plus an unbounded amount of memory in large batches in threads that are in the process of freeing

Reduce the size of the quarantine in smaller batches to reduce the delays. The only reason to reduce it in batches is amortization of overheads; the new batch size of 1MB should be enough to amortize the spinlock lock/unlock and a few function calls.

Plus, organize the quarantine as a FIFO array of batches. This allows us to avoid walking the list in quarantine_reduce() under quarantine_lock, which in turn reduces contention and is simply faster.

This improves performance of a heavy load (syzkaller fuzzing) by ~20% with 4 CPUs and 32GB of RAM. It also eliminates frequent (every 5 sec) drops of CPU consumption from ~400% to ~100% (one thread reduces the quarantine while others wait on a mutex).

Some reference numbers:

1. Machine with 4 CPUs and 4GB of memory. Quarantine size 128MB. Currently we free 32MB at a time. With the new code we free 1MB at a time (1024 batches, ~128 are used).
2. Machine with 32 CPUs and 128GB of memory. Quarantine size 4GB. Currently we free 1GB at a time. With the new code we free 8MB at a time (1024 batches, ~512 are used).
3. Machine with 4096 CPUs and 1TB of memory. Quarantine size 32GB. Currently we free 8GB at a time. With the new code we free 4MB at a time (16K batches, ~8K are used).

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dmitry Vyukov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
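The batching idea generalizes beyond KASAN: keep a FIFO of small, fixed-size batches and drain one batch per call, so the work done under any lock stays bounded. Below is a minimal user-space sketch of that shape (the names, sizes and the lock-free simplification are illustrative assumptions, not the kernel's quarantine code):

    #include <stdlib.h>

    #define QUARANTINE_BATCHES 1024
    #define BATCH_BYTES        (1 << 20)   /* ~1MB per batch, as in the changelog */
    #define BATCH_OBJS         4096

    struct qbatch { void *objs[BATCH_OBJS]; size_t n, bytes; };

    static struct qbatch batches[QUARANTINE_BATCHES];
    static size_t head, tail;              /* FIFO indices into batches[] */

    /* queue a freed object; returns 1 when the caller should drain one old batch */
    static int quarantine_put(void *obj, size_t size)
    {
            struct qbatch *b = &batches[tail];

            b->objs[b->n++] = obj;
            b->bytes += size;
            if (b->n == BATCH_OBJS || b->bytes >= BATCH_BYTES) {
                    tail = (tail + 1) % QUARANTINE_BATCHES;   /* close this batch */
                    return 1;
            }
            return 0;
    }

    /* drain exactly one batch: the work per call is bounded, so no long stalls */
    static void quarantine_reduce_one(void)
    {
            struct qbatch *b = &batches[head];

            for (size_t i = 0; i < b->n; i++)
                    free(b->objs[i]);
            b->n = b->bytes = 0;
            head = (head + 1) % QUARANTINE_BATCHES;
    }

The sketch omits the locking and overflow handling a real quarantine needs; the point is only that each reduce step touches one small batch instead of a quarter of the whole quarantine.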
2016-12-12kasan: support panic_on_warnDmitry Vyukov1-0/+2
If the user sets panic_on_warn, they want the kernel to panic if there is anything even slightly wrong with it. KASAN-detected errors are certainly no less serious than an arbitrary kernel WARNING. Panic after KASAN errors if panic_on_warn is set. We use this for continuous fuzzing, where we want the kernel to stop and reboot on any error.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dmitry Vyukov <[email protected]>
Acked-by: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
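A minimal sketch of the escalation described above (illustrative, not necessarily the exact kernel hunk): at the end of the KASAN report path, honour panic_on_warn, clearing the flag first so a warning raised on the panic path cannot recurse:

        if (panic_on_warn) {
                /* clear the flag so the panic path cannot re-trigger us */
                panic_on_warn = 0;
                panic("panic_on_warn set, KASAN error reported\n");
        }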
2016-12-12mm: make transparent hugepage size publicHugh Dickins2-0/+15
Test programs want to know the size of a transparent hugepage. While it is commonly the same as the size of a hugetlbfs page (shown as Hugepagesize in /proc/meminfo), that is not always so: powerpc implements transparent hugepages in a different way from hugetlbfs pages, so it's coincidence when their sizes are the same; and x86 and others can support more than one hugetlbfs page size. Add /sys/kernel/mm/transparent_hugepage/hpage_pmd_size to show the THP size in bytes - it's the same for Anonymous and Shmem hugepages. Call it hpage_pmd_size (after HPAGE_PMD_SIZE) rather than hpage_size, in case some transparent support for pud and pgd pages is added later. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Greg Thelen <[email protected]> Cc: David Rientjes <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: add cond_resched() in gather_pte_stats()Hugh Dickins1-0/+1
The other pagetable walks in task_mmu.c have a cond_resched() after walking their ptes: add a cond_resched() in gather_pte_stats() too, for reading /proc/<id>/numa_maps. Only pagemap_pmd_range() has a cond_resched() in its (unusually expensive) pmd_trans_huge case: more should probably be added, but leave them unchanged for now. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gerald Schaefer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: add three more cond_resched() in swapoffHugh Dickins1-7/+6
Add a cond_resched() in the unuse_pmd_range() loop (so as to call it even when pmd none or trans_huge, like zap_pmd_range() does); and in the unuse_mm() loop (since that might skip over many vmas). shmem_unuse() and radix_tree_locate_item() look good enough already. Those were the obvious places, but in fact the stalls came from find_next_to_unuse(), which sometimes scans through many unused entries. Apply scan_swap_map()'s LATENCY_LIMIT of 256 there too; and only go off to test frontswap_map when a used entry is found. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reported-by: Eric Dumazet <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
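The LATENCY_LIMIT trick mentioned above is a generic pattern: when scanning a long, mostly-empty map, offer to reschedule every N entries so the scan cannot hog the CPU. A rough kernel-style sketch (the helper name is made up for illustration; this is not the actual swapfile.c code):

        #define LATENCY_LIMIT 256

        /* find the next in-use swap slot, yielding every LATENCY_LIMIT entries */
        static unsigned int find_next_used(const unsigned char *swap_map,
                                           unsigned int max, unsigned int start)
        {
                unsigned int i, latency = LATENCY_LIMIT;

                for (i = start; i < max; i++) {
                        if (swap_map[i])
                                return i;       /* only now test frontswap_map */
                        if (!--latency) {
                                cond_resched();
                                latency = LATENCY_LIMIT;
                        }
                }
                return 0;                       /* nothing left to unuse */
        }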
2016-12-12mm, page_alloc: keep pcp count and list contents in sync if struct page is corruptedMel Gorman1-2/+10
Vlastimil Babka pointed out that commit 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") will allow the per-cpu list counter to be out of sync with the per-cpu list contents if a struct page is corrupted. The consequence is an infinite loop if the per-cpu lists get fully drained by free_pcppages_bulk because all the lists are empty but the count is positive. The infinite loop occurs here:

    do {
            batch_free++;
            if (++migratetype == MIGRATE_PCPTYPES)
                    migratetype = 0;
            list = &pcp->lists[migratetype];
    } while (list_empty(list));

What the user sees is a bad page warning followed by a soft lockup with interrupts disabled in free_pcppages_bulk(). This patch keeps the accounting in sync.

Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: <[email protected]> [4.7+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm, rmap: handle anon_vma_prepare() common case inlineVlastimil Babka2-36/+43
anon_vma_prepare() is mostly a large "if (unlikely(...))" block, as the expected common case is that an anon_vma already exists. We could turn the condition around and return 0, but it also makes sense to do it inline and avoid a call for the common case.

Bloat-o-meter naturally shows that inlining the check has some code size costs:

    add/remove: 1/1 grow/shrink: 4/0 up/down: 475/-373 (102)
    function                old     new   delta
    __anon_vma_prepare        -     359    +359
    handle_mm_fault        2744    2796     +52
    hugetlb_cow            1146    1170     +24
    hugetlb_fault          2123    2145     +22
    wp_page_copy           1469    1487     +18
    anon_vma_prepare        373       -    -373

Checking the asm however confirms that the hot paths now avoid a call, which is moved away.

[[email protected]: coding-style fixes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
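This is the classic inline-fast-path / out-of-line-slow-path split. A small self-contained illustration of the shape (simplified types; prepare()/__prepare_slowpath() are stand-ins for anon_vma_prepare()/__anon_vma_prepare()):

    struct vma { void *anon_vma; };

    /* rare case: kept out of line so it does not bloat every call site */
    static int __prepare_slowpath(struct vma *vma)
    {
            vma->anon_vma = vma;            /* placeholder for the real work */
            return 0;
    }

    /* common case: anon_vma already exists, so the inlined check avoids a call */
    static inline int prepare(struct vma *vma)
    {
            if (__builtin_expect(vma->anon_vma != NULL, 1))   /* likely() */
                    return 0;
            return __prepare_slowpath(vma);
    }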
2016-12-12mm, debug: print raw struct page data in __dump_page()Vlastimil Babka1-0/+4
__dump_page() is used when a page metadata inconsistency is detected, either by standard runtime checks or by extra checks in CONFIG_DEBUG_VM builds. It prints some of the relevant metadata, but not the whole struct page, which is based on unions and whose interpretation depends on the context. This means that sometimes e.g. a VM_BUG_ON_PAGE() checks a certain field that is not printed by __dump_page(), and the resulting bug report may then lack clues that could help in determining the root cause.

This patch solves the problem by simply printing the whole struct page word by word, so no part is missing, but the interpretation of the data is left to developers. This is similar to e.g. x86_64 raw stack dumps. Example output:

    page:ffffea00000475c0 count:1 mapcount:0 mapping:          (null) index:0x0
    flags: 0x100000000000400(reserved)
    raw: 0100000000000400 0000000000000000 0000000000000000 00000001ffffffff
    raw: ffffea00000475e0 ffffea00000475e0 0000000000000000 0000000000000000
    page dumped because: VM_BUG_ON_PAGE(1)

[[email protected]: suggested print_hex_dump()]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
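A user-space analogue of the word-by-word dump (purely illustrative and assuming a 64-bit unsigned long; the kernel patch itself uses print_hex_dump() on the real struct page):

    #include <stdio.h>

    struct fake_page {
            unsigned long flags;
            void *mapping;
            unsigned long index;
            unsigned long counters;
            void *lru_next, *lru_prev;
            unsigned long pad[2];
    };

    /* print every word of the object so no union member is hidden from the report */
    static void dump_raw(const void *obj, size_t words)
    {
            const unsigned long *w = obj;

            for (size_t i = 0; i < words; i++)
                    printf("%s%016lx%s", (i % 4) ? " " : "raw: ",
                           w[i], (i % 4 == 3) ? "\n" : "");
    }

    int main(void)
    {
            struct fake_page p = { .flags = 0x100000000000400UL };

            dump_raw(&p, sizeof(p) / sizeof(unsigned long));
            return 0;
    }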
2016-12-12mm: THP page cache support for ppc64Aneesh Kumar K.V6-17/+100
Add an arch-specific callback in the generic THP page cache code that will deposit and withdraw the preallocated page table. Arches like ppc64 use this preallocated table to store the hash pte slot information.

Testing: kernel build of the patch series on tmpfs mounted with option huge=always.

The related thp stats:

    thp_fault_alloc 72939
    thp_fault_fallback 60547
    thp_collapse_alloc 603
    thp_collapse_alloc_failed 0
    thp_file_alloc 253763
    thp_file_mapped 4251
    thp_split_page 51518
    thp_split_page_failed 1
    thp_deferred_split_page 73566
    thp_split_pmd 665
    thp_zero_page_alloc 3
    thp_zero_page_alloc_failed 0

[[email protected]: remove unneeded parentheses, per Kirill]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Balbir Singh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: move vma_is_anonymous check within pmd_move_must_withdrawAneesh Kumar K.V3-15/+18
Independent of whether the vma is for anonymous memory, some arches like ppc64 would like to override pmd_move_must_withdraw(). One option is to encapsulate the vma_is_anonymous() check for general architectures inside pmd_move_must_withdraw() so that it is always called, and architectures that need unconditional overriding can override this function. ppc64 needs to override the function when the MMU is configured to use hash PTEs.

[[email protected]: reworked changelog]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Balbir Singh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: add preempt points into __purge_vmap_area_lazy()Joel Fernandes1-5/+9
Use cond_resched_lock to avoid holding the vmap_area_lock for a potentially long time and thus creating bad latencies for various workloads. [hch: split from a larger patch by Joel, wrote the crappy changelog] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joel Fernandes <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Jisheng Zhang <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Chris Wilson <[email protected]> Cc: John Dias <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
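The core of the change is the cond_resched_lock() idiom: inside a long loop that runs under a spinlock, periodically drop the lock, reschedule if needed, and retake it. Roughly this shape (an illustrative kernel-style sketch, not the exact patch):

        static void purge_list(struct list_head *valist)
        {
                struct vmap_area *va, *n_va;

                spin_lock(&vmap_area_lock);
                list_for_each_entry_safe(va, n_va, valist, list) {
                        __free_vmap_area(va);
                        /* drops vmap_area_lock, schedules if needed, reacquires it */
                        cond_resched_lock(&vmap_area_lock);
                }
                spin_unlock(&vmap_area_lock);
        }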
2016-12-12mm: turn vmap_purge_lock into a mutexChristoph Hellwig1-7/+7
The purge_lock spinlock causes high latencies with non RT kernel. This has been reported multiple times on lkml [1] [2] and affects applications like audio. This patch replaces it with a mutex to allow preemption while holding the lock. Thanks to Joel Fernandes for the detailed report and analysis as well as an earlier attempt at fixing this issue. [1] http://lists.openwall.net/linux-kernel/2016/03/23/29 [2] https://lkml.org/lkml/2016/10/9/59 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Jisheng Zhang <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Chris Wilson <[email protected]> Cc: John Dias <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: mark all calls into the vmalloc subsystem as potentially sleepingChristoph Hellwig1-1/+6
We will take a sleeping lock later in this series, so this adds the proper safeguards.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Jisheng Zhang <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: John Dias <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12x86/ldt: use vfree_atomic() to free ldt entriesAndrey Ryabinin1-1/+1
vfree() is going to use a sleeping lock. free_ldt_struct() may be called with preemption disabled, therefore we must use vfree_atomic() here. E.g. call trace:

    vfree()
    free_ldt_struct()
    destroy_context_ldt()
    __mmdrop()
    finish_task_switch()
    schedule_tail()
    ret_from_fork()

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Jisheng Zhang <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: John Dias <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12kernel/fork: use vfree_atomic() to free thread stackAndrey Ryabinin1-1/+1
vfree() is going to use a sleeping lock. The thread stack can be freed in atomic context, therefore we must use vfree_atomic() here.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Jisheng Zhang <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: John Dias <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: add vfree_atomic()Andrey Ryabinin2-6/+37
We are going to use sleeping lock for freeing vmap. However some vfree() users want to free memory from atomic (but not from interrupt) context. For this we add vfree_atomic() - deferred variation of vfree() which can be used in any atomic context (except NMIs). [[email protected]: tweak comment grammar] [[email protected]: use raw_cpu_ptr() instead of this_cpu_ptr()] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Jisheng Zhang <[email protected]> Cc: Chris Wilson <[email protected]> Cc: John Dias <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
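The deferred-free mechanism follows a common kernel pattern: from atomic context, push the pointer onto a per-CPU lock-less list and kick a workqueue item that performs the real vfree() later, in process context. A hedged sketch of that shape (field names are illustrative and the work-item initialization is omitted; this is not the verbatim patch):

        struct vfree_deferred {
                struct llist_head list;
                struct work_struct wq;     /* runs the real vfree() in process context */
        };
        static DEFINE_PER_CPU(struct vfree_deferred, vfree_deferred);

        void vfree_atomic(const void *addr)
        {
                struct vfree_deferred *p = raw_cpu_ptr(&vfree_deferred);

                if (!addr)
                        return;
                /* reuse the vmalloc'ed memory itself as the llist node */
                if (llist_add((struct llist_node *)addr, &p->list))
                        schedule_work(&p->wq);
        }

llist_add() returns true only when the list was previously empty, so the work item is scheduled at most once per batch of deferred frees.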
2016-12-12mm: refactor __purge_vmap_area_lazy()Christoph Hellwig1-45/+35
Move the purge_lock synchronization to the callers, move the call to purge_fragmented_blocks_allcpus at the beginning of the function to the callers that need it, move the force_flush behavior to the caller that needs it, and pass start and end by value instead of by reference. No change in behavior. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Jisheng Zhang <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Chris Wilson <[email protected]> Cc: John Dias <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: remove free_unmap_vmap_area_addr()Christoph Hellwig1-13/+8
Just inline it into the only caller. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Jisheng Zhang <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Chris Wilson <[email protected]> Cc: John Dias <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: remove free_unmap_vmap_area_noflush()Christoph Hellwig1-11/+2
Patch series "reduce latency in __purge_vmap_area_lazy", v2. This patch (of 10): Sort out the long lock hold times in __purge_vmap_area_lazy. It is based on a patch from Joel. Inline free_unmap_vmap_area_noflush() it into the only caller. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Jisheng Zhang <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Chris Wilson <[email protected]> Cc: John Dias <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: workingset: update shadow limit to reflect bigger active listJohannes Weiner1-19/+25
Since commit 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list") the size of the active file list is no longer limited to half of memory. Increase the shadow node limit accordingly to avoid throwing out shadow entries that might still result in eligible refaults.

The exact size of the active list now depends on the overall size of the page cache, but converges toward taking up most of the space. In mm/vmscan.c::inactive_list_is_low():

    * total     target    max
    * memory    ratio     inactive
    * -------------------------------------
    *   10MB       1         5MB
    *  100MB       1        50MB
    *    1GB       3       250MB
    *   10GB      10       0.9GB
    *  100GB      31         3GB
    *    1TB     101        10GB
    *   10TB     320        32GB

It would be possible to apply the same precise ratios when determining the limit for radix tree nodes containing shadow entries, but since it is merely an approximation of the oldest refault distances in the wild and the code also makes assumptions about the node population density, keep it simple and always target the full cache size.

While at it, clarify the comment and the formula for the memory footprint.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: workingset: restore refault tracking for single-page filesJohannes Weiner1-8/+1
Shadow entries in the page cache used to be accounted behind the radix tree implementation's back in the upper bits of node->count, and the radix tree code extending a single-entry tree with a shadow entry in root->rnode would corrupt that counter. As a result, we could not put shadow entries at index 0 if the tree didn't have any other entries, and that meant no refault detection for any single-page file.

Now that the shadow entries are tracked natively in the radix tree's exceptional counter, this is no longer necessary. Extending and shrinking the tree from and to single entries in root->rnode now does the right thing when the entry is exceptional, so remove that limitation.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: workingset: move shadow entry tracking to radix tree exceptional trackingJohannes Weiner6-138/+60
Currently, we track the shadow entries in the page cache in the upper bits of the radix_tree_node->count, behind the back of the radix tree implementation. Because the radix tree code has no awareness of them, we rely on random subtleties throughout the implementation (such as the node->count != 1 check in the shrinking code, which is meant to exclude multi-entry nodes but also happens to skip nodes with only one shadow entry, as that's accounted in the upper bits). This is error prone and has, in fact, caused the bug fixed in d3798ae8c6f3 ("mm: filemap: don't plant shadow entries without radix tree node"). To remove these subtleties, this patch moves shadow entry tracking from the upper bits of node->count to the existing counter for exceptional entries. node->count goes back to being a simple counter of valid entries in the tree node and can be shrunk to a single byte. This vastly simplifies the page cache code. All accounting happens natively inside the radix tree implementation, and maintaining the LRU linkage of shadow nodes is consolidated into a single function in the workingset code that is called for leaf nodes affected by a change in the page cache tree. This also removes the last user of the __radix_delete_node() return value. Eliminate it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12lib: radix-tree: update callback for changing leaf nodesJohannes Weiner4-16/+36
Support handing __radix_tree_replace() a callback that gets invoked for all leaf nodes that change or get freed as a result of the slot replacement, to assist users tracking nodes with node->private_list. This prepares for putting page cache shadow entries into the radix tree root again and drastically simplifying the shadow tracking. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Suggested-by: Jan Kara <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12lib: radix-tree: add entry deletion support to __radix_tree_replace()Johannes Weiner1-111/+116
Page cache shadow entry handling will be a lot simpler when it can use a single generic replacement function for pages, shadow entries, and emptying slots. Make __radix_tree_replace() properly account insertions and deletions in node->count and garbage collect nodes as they become empty. Then re-implement radix_tree_delete() on top of it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12lib: radix-tree: check accounting of existing slot replacement usersJohannes Weiner10-40/+64
The bug in khugepaged fixed earlier in this series shows that radix tree slot replacement is fragile; and it will become more so when not only NULL<->!NULL transitions need to be caught but transitions from and to exceptional entries as well. We need checks. Re-implement radix_tree_replace_slot() on top of the sanity-checked __radix_tree_replace(). This requires existing callers to also pass the radix tree root, but it'll warn us when somebody replaces slots with contents that need proper accounting (transitions between NULL entries, real entries, exceptional entries) and where a replacement through the slot pointer would corrupt the radix tree node counts. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Suggested-by: Jan Kara <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12lib: radix-tree: native accounting of exceptional entriesJohannes Weiner4-12/+57
The way the page cache is sneaking shadow entries of evicted pages into the radix tree past the node entry accounting and tracking them manually in the upper bits of node->count is fraught with problems. These shadow entries are marked in the tree as exceptional entries, which are a native concept to the radix tree. Maintain an explicit counter of exceptional entries in the radix tree node. Subsequent patches will switch shadow entry tracking over to that counter. DAX and shmem are the other users of exceptional entries. Since slot replacements that change the entry type from regular to exceptional must now be accounted, introduce a __radix_tree_replace() function that does replacement and accounting, and switch DAX and shmem over. The increase in radix tree node size is temporary. A followup patch switches the shadow tracking to this new scheme and we'll no longer need the upper bits in node->count and shrink that back to one byte. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: workingset: turn shadow node shrinker bugs into warningsJohannes Weiner1-8/+12
When the shadow page shrinker tries to reclaim a radix tree node but finds it in an unexpected state - it should contain no pages, and non-zero shadow entries - there is no need to kill the executing task or even the entire system. Warn about the invalid state, then leave that tree node be. Simply don't put it back on the shadow LRU for future reclaim and move on. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: khugepaged: fix radix tree node leak in shmem collapse error pathJohannes Weiner1-2/+4
The radix tree counts valid entries in each tree node. Entries stored in the tree cannot be removed by simply storing NULL in the slot, or the internal counters will be off and the node never gets freed again.

When collapsing a shmem page fails, restore the holes that were filled with radix_tree_insert() with a proper radix tree deletion.

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Reported-by: Jan Kara <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: khugepaged: close use-after-free race during shmem collapsingJohannes Weiner1-0/+5
Patch series "mm: workingset: radix tree subtleties & single-page file refaults", v3. This is another revision of the radix tree / workingset patches based on feedback from Jan and Kirill. This is a follow-up to d3798ae8c6f3 ("mm: filemap: don't plant shadow entries without radix tree node"). That patch fixed an issue that was caused mainly by the page cache sneaking special shadow page entries into the radix tree and relying on subtleties in the radix tree code to make that work. The fix also had to stop tracking refaults for single-page files because shadow pages stored as direct pointers in radix_tree_root->rnode weren't properly handled during tree extension. These patches make the radix tree code explicitely support and track such special entries, to eliminate the subtleties and to restore the thrash detection for single-page files. This patch (of 9): When a radix tree iteration drops the tree lock, another thread might swoop in and free the node holding the current slot. The iteration needs to do another tree lookup from the current index to continue. [[email protected]: re-lookup for replacement] Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12include/linux/backing-dev-defs.h: shrink struct backing_dev_infoAndrew Morton1-1/+1
Move the 4-byte `capabilities' field next to other 4-byte things. Shrinks sizeof(backing_dev_info) by 8 bytes on x86_64. Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: don't cap request size based on read-ahead settingJens Axboe4-11/+31
We ran into a funky issue, where someone doing 256K buffered reads saw 128K requests at the device level. Turns out it is read-ahead capping the request size, since we use 128K as the default setting. This doesn't make a lot of sense - if someone is issuing 256K reads, they should see 256K reads, regardless of the read-ahead setting, if the underlying device can support a 256K read in a single command. This patch introduces a bdi hint, io_pages. This is the soft max IO size for the lower level, I've hooked it up to the bdev settings here. Read-ahead is modified to issue the maximum of the user request size, and the read-ahead max size, but capped to the max request size on the device side. The latter is done to avoid reading ahead too much, if the application asks for a huge read. With this patch, the kernel behaves like the application expects. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Acked-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
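The sizing rule described above is simple arithmetic: take the larger of the application's request and the configured read-ahead size, then cap it by the device's soft limit (the new io_pages hint). A self-contained sketch with hypothetical parameter names (not the actual readahead code):

    /* pages to read: max(user request, ra_max), but never more than io_pages */
    static unsigned long readahead_pages(unsigned long req_pages,
                                         unsigned long ra_max_pages,
                                         unsigned long io_pages)
    {
            unsigned long nr = req_pages > ra_max_pages ? req_pages : ra_max_pages;

            return nr < io_pages ? nr : io_pages;
    }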
2016-12-12shmem: fix compilation warnings on unused functionsJérémy Lefaure1-0/+2
Compiling shmem.c with SHMEM and TRANSPARENT_HUGE_PAGECACHE enabled raises warnings on two unused functions when CONFIG_TMPFS and CONFIG_SYSFS are both disabled:

    mm/shmem.c:390:20: warning: `shmem_format_huge' defined but not used [-Wunused-function]
     static const char *shmem_format_huge(int huge)
                        ^~~~~~~~~~~~~~~~~
    mm/shmem.c:373:12: warning: `shmem_parse_huge' defined but not used [-Wunused-function]
     static int shmem_parse_huge(const char *str)
                ^~~~~~~~~~~~~~~~

Making the compilation of these functions conditional on tmpfs or sysfs removes the warnings.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérémy Lefaure <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
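The fix is a conditional-compilation guard: the two helpers are only used by the sysfs attribute and tmpfs mount-option code, so they are only compiled when one of those users exists. Roughly (prototypes shown for brevity, and the exact guard condition is an assumption based on the changelog; the actual patch wraps the definitions):

        #if defined(CONFIG_SYSFS) || defined(CONFIG_TMPFS)
        static int shmem_parse_huge(const char *str);
        static const char *shmem_format_huge(int huge);
        #endif /* CONFIG_SYSFS || CONFIG_TMPFS */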
2016-12-12fs/fs-writeback.c: remove redundant if checkTahsin Erdogan1-9/+7
b_more_io non-empty check is already preceded by an opposite check. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tahsin Erdogan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm/filemap.c: add comment for confusing logic in page_cache_tree_insert()Kirill A. Shutemov1-1/+4
Unlike THP, hugetlb pages are represented by one entry in the radix-tree. [[email protected]: tweak comment] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: cma: make linux/cma.h standalone includibleThierry Reding1-0/+3
The header uses types and definitions from the linux/init.h as well as linux/types.h headers without explicitly including them. This causes a failure to compile if they are not implicitly pulled in by includers. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thierry Reding <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: disable numa migration faults for dax vmasDan Williams1-0/+8
Mark dax vmas as not migratable to exclude them from task_numa_work(). This is especially relevant for device-dax which wants to ensure predictable access latency and not incur periodic faults. [[email protected]: add comment] Link: http://lkml.kernel.org/r/147892450132.22062.16875659431109209179.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reported-by: Aneesh Kumar K.V <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm/pkeys: generate pkey system call code only if ARCH_HAS_PKEYS is selectedHeiko Carstens1-0/+4
Having code for the pkey_mprotect, pkey_alloc and pkey_free system calls makes sense only if ARCH_HAS_PKEYS is selected. If it is not selected, these system calls will always return -ENOSPC or -EINVAL. To simplify things and generate less code, build the pkey system call code only if ARCH_HAS_PKEYS is selected.

For architectures which have already wired up the system calls but do not select ARCH_HAS_PKEYS, this will result in less generated code and a different return code: the three system calls will now always return -ENOSYS, using the cond_syscall mechanism. For architectures which have not wired up the system calls, less unreachable code will be generated.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc: Mark Rutland <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12dt: add documentation of "hotpluggable" memory propertyReza Arbab1-0/+7
Summarize the "hotpluggable" property of dt memory nodes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Reza Arbab <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Frank Rowand <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nathan Fontenot <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Rob Herring <[email protected]> Cc: Stewart Smith <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12of/fdt: mark hotpluggable memoryReza Arbab3-1/+21
When movable nodes are enabled, any node containing only hotpluggable memory is made movable at boot time. On x86, hotpluggable memory is discovered by parsing the ACPI SRAT, making corresponding calls to memblock_mark_hotplug(). If we introduce a dt property to describe memory as hotpluggable, configs supporting early fdt may then also do this marking and use movable nodes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Reza Arbab <[email protected]> Tested-by: Balbir Singh <[email protected]> Acked-by: Balbir Singh <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Frank Rowand <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nathan Fontenot <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Rob Herring <[email protected]> Cc: Stewart Smith <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: enable CONFIG_MOVABLE_NODE on non-x86 archesReza Arbab1-1/+1
To support movable memory nodes (CONFIG_MOVABLE_NODE), at least one of the following must be true:

1. This config has the capability to identify movable nodes at boot. Right now, only x86 can do this.

2. Our config supports memory hotplug, which means that a movable node can be created by hotplugging all of its memory into ZONE_MOVABLE.

Fix the Kconfig definition of CONFIG_MOVABLE_NODE, which currently recognizes (1), but not (2).

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Reza Arbab <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Frank Rowand <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Stewart Smith <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: remove x86-only restriction of movable_nodeReza Arbab3-21/+25
In commit c5320926e370 ("mem-hotplug: introduce movable_node boot option"), the memblock allocation direction is changed to bottom-up and then back to top-down like this:

1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().
2. memblock_set_bottom_up(false), called by x86's numa_init().

Even though (1) occurs in generic mm code, it is wrapped by #ifdef CONFIG_MOVABLE_NODE, which depends on X86_64. This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches, things will be unbalanced: (1) will happen for them, but (2) will not.

This toggle was added in the first place because x86 has a delay between adding memblocks and marking them as hotpluggable. Since other arches do this marking either immediately or not at all, they do not require the bottom-up toggle.

So, resolve things by moving (1) from cmdline_parse_movable_node() to x86's setup_arch(), immediately after the movable_node parameter has been parsed.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Reza Arbab <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Frank Rowand <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Stewart Smith <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12powerpc/mm: allow memory hotplug into a memoryless nodeReza Arbab1-12/+1
Patch series "enable movable nodes on non-x86 configs", v7. This patchset allows more configs to make use of movable nodes. When CONFIG_MOVABLE_NODE is selected, there are two ways to introduce such nodes into the system: 1. Discover movable nodes at boot. Currently this is only possible on x86, but we will enable configs supporting fdt to do the same. 2. Hotplug and online all of a node's memory using online_movable. This is already possible on any config supporting memory hotplug, not just x86, but the Kconfig doesn't say so. We will fix that. We'll also remove some cruft on power which would prevent (2). This patch (of 5): Remove the check which prevents us from hotplugging into an empty node. The original commit b226e4621245 ("[PATCH] powerpc: don't add memory to empty node/zone"), states that this was intended to be a temporary measure. It is a workaround for an oops which no longer occurs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Reza Arbab <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Acked-by: Balbir Singh <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Frank Rowand <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Nathan Fontenot <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Rob Herring <[email protected]> Cc: Stewart Smith <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm/mempolicy.c: forbid static or relative flags for local NUMA modePiotr Kwapulinski1-1/+3
The MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES flags are irrelevant when setting them for MPOL_LOCAL NUMA memory policy via set_mempolicy or mbind. Return the "invalid argument" from set_mempolicy and mbind whenever any of these flags is passed along with MPOL_LOCAL. It is consistent with MPOL_PREFERRED passed with empty nodemask. It slightly shortens the execution time in paths where these flags are used e.g. when trying to rebind the NUMA nodes for changes in cgroups cpuset mems (mpol_rebind_preferred()) or when just printing the mempolicy structure (/proc/PID/numa_maps). Isolated tests done. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Piotr Kwapulinski <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Liang Chen <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Nathan Zimmer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
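The new rule can be expressed as a small validity check, shown here as an illustrative helper (not the exact kernel function):

    /* MPOL_LOCAL takes no nodemask, so static/relative node flags make no sense */
    static int local_mode_flags_ok(unsigned short mode, unsigned short flags)
    {
            if (mode == MPOL_LOCAL &&
                (flags & (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES)))
                    return -EINVAL;   /* as set_mempolicy()/mbind() now return */
            return 0;
    }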
2016-12-12mm: fix up get_user_pages* commentsLorenzo Stoakes1-10/+6
In the previous round of get_user_pages* changes, the comments attached to __get_user_pages_unlocked() and get_user_pages_unlocked() were rendered incorrect; this patch corrects them. In addition, the get_user_pages_unlocked() comment seems to have already been outdated, as it referred to the tsk and mm parameters which were removed in c12d2da5 ("mm/gup: Remove the macro overload API migration helpers from the get_user*() APIs"); this patch fixes this also.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: remove the page size change check in tlb_remove_pageAneesh Kumar K.V7-81/+15
Now that we check for page size change early in the loop, we can partially revert e9d55e157034a ("mm: change the interface for __tlb_remove_page"). This simplifies the code considerably by removing the need to track the last address with which we adjusted the range. We also go back to the older way of filling the mmu_gather array, i.e., we add an entry and then check whether the gather batch is full.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ross Zwisler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: add tlb_remove_check_page_size_change to track page size changeAneesh Kumar K.V11-1/+78
With commit e77b0852b551 ("mm/mmu_gather: track page size with mmu gather and force flush if page size change") we added the ability to force a tlb flush when the page size changes in a mmu_gather loop. We did that by checking for a page size change every time we added a page to mmu_gather for lazy flush/remove. We can improve that by moving the page size change check earlier and not doing it every time we add a page.

This also helps us to do a tlb flush when invalidating a range covering a dax mapping. With dax mappings we don't have a backing struct page and hence we don't call tlb_remove_page, which earlier forced the tlb flush on page size change. Moving the page size change check earlier means we will do the same even for dax mappings.

We also avoid doing this check on architectures other than powerpc.

In a later patch we will remove the page size check from tlb_remove_page().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ross Zwisler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm/hugetlb: add tlb_remove_hugetlb_entry for handling hugetlb pagesAneesh Kumar K.V7-1/+20
This adds tlb_remove_hugetlb_entry, similar to tlb_remove_pmd_tlb_entry.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ross Zwisler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12mm: update mmu_gather range correctlyAneesh Kumar K.V1-12/+31
We use __tlb_adjust_range to update the range covered by the mmu_gather struct. We later use the 'start' and 'end' to do a mmu_notifier_invalidate_range in tlb_flush_mmu_tlbonly(). Update the 'end' correctly in __tlb_adjust_range so that we call mmu_notifier_invalidate_range with the correct range values.

Wrt tlbflush, this should not have any impact, because a flush with the correct start address will flush the tlb mapping for the range.

Also add a comment w.r.t. updating the range when we free pagetable pages. For now we don't support a range based page table cache flush.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ross Zwisler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
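The essence of the fix is that both ends of the gathered range must be maintained so that mmu_notifier_invalidate_range() sees the full span. Roughly (an illustrative sketch, not the verbatim kernel helper; min()/max() are the usual kernel macros):

        static inline void __tlb_adjust_range(struct mmu_gather *tlb,
                                              unsigned long address,
                                              unsigned int range_size)
        {
                tlb->start = min(tlb->start, address);
                tlb->end   = max(tlb->end, address + range_size);
        }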
2016-12-12mm: use the correct page size when removing the pageAneesh Kumar K.V1-2/+2
We are removing a pmd hugepage here. Use the correct page size. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Dan Williams <[email protected]> Cc: Ross Zwisler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-12shmem: avoid maybe-uninitialized warningArnd Bergmann1-3/+1
After enabling -Wmaybe-uninitialized warnings, we get a false-positive warning for shmem:

    mm/shmem.c: In function `shmem_getpage_gfp':
    include/linux/spinlock.h:332:21: error: `info' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This can be easily avoided, since the correct 'info' pointer is known at the time we first enter the function, so we can simply move the initialization up. Moving it before the first label avoids the warning and lets us remove two later initializations.

Note that the function is so hard to read that it not only confuses the compiler, but also most readers; without this patch it could easily break if one of the 'goto's changed.

Link: https://www.spinics.net/lists/kernel/msg2368133.html
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>