aboutsummaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)AuthorFilesLines
2010-11-09perf_events: Fix perf_counter_mmap() hook in mprotect()Pekka Enberg1-1/+1
As pointed out by Linus, commit dab5855 ("perf_counter: Add mmap event hooks to mprotect()") is fundamentally wrong as mprotect_fixup() can free 'vma' due to merging. Fix the problem by moving perf_event_mmap() hook to mprotect_fixup(). Note: there's another successful return path from mprotect_fixup() if old flags equal to new flags. We don't, however, need to call perf_event_mmap() there because 'perf' already knows the VMA is executable. Reported-by: Dave Jones <[email protected]> Analyzed-by: Linus Torvalds <[email protected]> Cc: Ingo Molnar <[email protected]> Reviewed-by: Peter Zijlstra <[email protected]> Signed-off-by: Pekka Enberg <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-03vmstat: fix offset calculation on void*Wu Fengguang1-1/+1
Fix regression introduced by commit 79da826aee6 ("writeback: report dirty thresholds in /proc/vmstat"). The incorrect pointer arithmetic can result in problems like this: BUG: unable to handle kernel paging request at 07c06d16 IP: [<c050c336>] strnlen+0x6/0x20 Call Trace: [<c050a249>] ? string+0x39/0xe0 [<c042be6b>] ? __wake_up_common+0x4b/0x80 [<c050afcc>] ? vsnprintf+0x1ec/0x380 [<c04b380e>] ? seq_printf+0x2e/0x60 [<c04829a6>] ? vmstat_show+0x26/0x30 [<c04b3bb6>] ? seq_read+0xa6/0x380 [<c04b3b10>] ? seq_read+0x0/0x380 [<c04d5d2f>] ? proc_reg_read+0x5f/0x90 [<c049c4a1>] ? vfs_read+0xa1/0x140 [<c04d5cd0>] ? proc_reg_read+0x0/0x90 [<c049c981>] ? sys_read+0x41/0x70 [<c0402bd0>] ? sysenter_do_call+0x12/0x26 Reported-by: Tetsuo Handa <[email protected]> Cc: Michael Rubin <[email protected]> Signed-off-by: Wu Fengguang <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-02Release page reference during page fault retryMichel Lespinasse1-1/+3
This slipped by when unifying the filemap and swap versions of lock_page_or_retry()... Signed-off-by: Michel Lespinasse <[email protected]> Acked-by: Rik van Riel <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-30audit mmapAl Viro2-0/+4
Normal syscall audit doesn't catch 5th argument of syscall. It also doesn't catch the contents of userland structures pointed to be syscall argument, so for both old and new mmap(2) ABI it doesn't record the descriptor we are mapping. For old one it also misses flags. Signed-off-by: Al Viro <[email protected]>
2010-10-29convert get_sb_nodev() usersAl Viro1-5/+5
Signed-off-by: Al Viro <[email protected]>
2010-10-28numa: fix slab_node(MPOL_BIND)Eric Dumazet1-1/+1
When a node contains only HighMem memory, slab_node(MPOL_BIND) dereferences a NULL pointer. [ This code seems to go back all the way to commit 19770b32609b: "mm: filter based on a nodemask as well as a gfp_mask". Which was back in April 2008, and it got merged into 2.6.26. - Linus ] Signed-off-by: Eric Dumazet <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300Linus Torvalds1-1/+1
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300: (44 commits) MN10300: Save frame pointer in thread_info struct rather than global var MN10300: Change "Matsushita" to "Panasonic". MN10300: Create a defconfig for the ASB2364 board MN10300: Update the ASB2303 defconfig MN10300: ASB2364: Add support for SMSC911X and SMC911X MN10300: ASB2364: Handle the IRQ multiplexer in the FPGA MN10300: Generic time support MN10300: Specify an ELF HWCAP flag for MN10300 Atomic Operations Unit support MN10300: Map userspace atomic op regs as a vmalloc page MN10300: And Panasonic AM34 subarch and implement SMP MN10300: Delete idle_timestamp from irq_cpustat_t MN10300: Make various interrupt priority settings configurable MN10300: Optimise do_csum() MN10300: Implement atomic ops using atomic ops unit MN10300: Make the FPU operate in non-lazy mode under SMP MN10300: SMP TLB flushing MN10300: Use the [ID]PTEL2 registers rather than [ID]PTEL for TLB control MN10300: Make the use of PIDR to mark TLB entries controllable MN10300: Rename __flush_tlb*() to local_flush_tlb*() MN10300: AM34 erratum requires MMUCTR read and write on exception entry ...
2010-10-27fuse: use release_pages()Miklos Szeredi1-0/+1
Replace iterated page_cache_release() with release_pages(), which is faster and shorter. Needs release_pages() to be exported to modules. Suggested-by: Andrew Morton <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27memcg: generic filestat update interfaceKAMEZAWA Hiroyuki1-7/+18
This patch extracts the core logic from mem_cgroup_update_file_mapped() as mem_cgroup_update_file_stat() and adds a wrapper. As a planned future update, memory cgroup has to count dirty pages to implement dirty_ratio/limit. And more, the number of dirty pages is required to kick flusher thread to start writeback. (Now, no kick.) This patch is preparation for it and makes other statistics implementation clearer. Just a clean up. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Balbir Singh <[email protected]> Reviewed-by: Greg Thelen <[email protected]> Cc: Daisuke Nishimura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27memcg: cpu hotplug aware quick acount_move detectionKAMEZAWA Hiroyuki1-7/+30
An event counter MEM_CGROUP_ON_MOVE is used for quick check whether file stat update can be done in async manner or not. Now, it use percpu counter and for_each_possible_cpu to update. This patch replaces for_each_possible_cpu to for_each_online_cpu and adds necessary synchronization logic at CPU HOTPLUG. [[email protected]: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27memcg: cpu hotplug aware percpu count updatesKAMEZAWA Hiroyuki1-9/+93
Now, memcgroup's per cpu coutner uses for_each_possible_cpu() to get the value. It's better to use for_each_online_cpu() and a cpu hotplug handler. This patch only handles statistics counter. MEM_CGROUP_ON_MOVE will be handled in another patch. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27memcg: use for_each_mem_cgroupKAMEZAWA Hiroyuki1-87/+83
In memory cgroup management, we sometimes have to walk through subhierarchy of cgroup to gather informaiton, or lock something, etc. Now, to do that, mem_cgroup_walk_tree() function is provided. It calls given callback function per cgroup found. But the bad thing is that it has to pass a fixed style function and argument, "void*" and it adds much type casting to memcontrol.c. To make the code clean, this patch replaces walk_tree() with for_each_mem_cgroup_tree(iter, root) An iterator style call. The good point is that iterator call doesn't have to assume what kind of function is called under it. A bad point is that it may cause reference-count leak if a caller use "break" from the loop by mistake. I think the benefit is larger. The modified code seems straigtforward and easy to read because we don't have misterious callbacks and pointer cast. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27memcg: avoid lock in updating file_mapped (Was fix race in file_mapped ↵KAMEZAWA Hiroyuki1-14/+85
accouting flag management At accounting file events per memory cgroup, we need to find memory cgroup via page_cgroup->mem_cgroup. Now, we use lock_page_cgroup() for guarantee pc->mem_cgroup is not overwritten while we make use of it. But, considering the context which page-cgroup for files are accessed, we can use alternative light-weight mutual execusion in the most case. At handling file-caches, the only race we have to take care of is "moving" account, IOW, overwriting page_cgroup->mem_cgroup. (See comment in the patch) Unlike charge/uncharge, "move" happens not so frequently. It happens only when rmdir() and task-moving (with a special settings.) This patch adds a race-checker for file-cache-status accounting v.s. account moving. The new per-cpu-per-memcg counter MEM_CGROUP_ON_MOVE is added. The routine for account move 1. Increment it before start moving 2. Call synchronize_rcu() 3. Decrement it after the end of moving. By this, file-status-counting routine can check it needs to call lock_page_cgroup(). In most case, I doesn't need to call it. Following is a perf data of a process which mmap()/munmap 32MB of file cache in a minute. Before patch: 28.25% mmap mmap [.] main 22.64% mmap [kernel.kallsyms] [k] page_fault 9.96% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped 3.67% mmap [kernel.kallsyms] [k] filemap_fault 3.50% mmap [kernel.kallsyms] [k] unmap_vmas 2.99% mmap [kernel.kallsyms] [k] __do_fault 2.76% mmap [kernel.kallsyms] [k] find_get_page After patch: 30.00% mmap mmap [.] main 23.78% mmap [kernel.kallsyms] [k] page_fault 5.52% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped 3.81% mmap [kernel.kallsyms] [k] unmap_vmas 3.26% mmap [kernel.kallsyms] [k] find_get_page 3.18% mmap [kernel.kallsyms] [k] __do_fault 3.03% mmap [kernel.kallsyms] [k] filemap_fault 2.40% mmap [kernel.kallsyms] [k] handle_mm_fault 2.40% mmap [kernel.kallsyms] [k] do_page_fault This patch reduces memcg's cost to some extent. (mem_cgroup_update_file_mapped is called by both of map/unmap) Note: It seems some more improvements are required..but no idea. maybe removing set/unset flag is required. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Daisuke Nishimura <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Greg Thelen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27memcg: fix race in file_mapped accouting flag managementKAMEZAWA Hiroyuki1-1/+2
Presently memory cgroup accounts file-mapped by counter and flag. counter is working in the same way with zone_stat but FileMapped flag only exists in memcg (for helping move_account). This flag can be updated wrongly in a case. Assume CPU0 and CPU1 and a thread mapping a page on CPU0, another thread unmapping it on CPU1. CPU0 CPU1 rmv rmap (mapcount 1->0) add rmap (mapcount 0->1) lock_page_cgroup() memcg counter+1 (some delay) set MAPPED FLAG. unlock_page_cgroup() lock_page_cgroup() memcg counter-1 clear MAPPED flag In the above sequence counter is properly updated but FLAG is not. This means that representing a state by a flag which is maintained by counter needs some special care. To handle this, when clearing a flag, this patch check mapcount directly and clear the flag only when mapcount == 0. (if mapcount >0, someone will make it to zero later and flag will be cleared.) Reverse case, dec-after-inc cannot be a problem because page_table_lock() works well for it. (IOW, to make above sequence, 2 processes should touch the same page at once with map/unmap.) Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Greg Thelen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27mm,x86: fix kmap_atomic_push vs ioremap_32.cPeter Zijlstra1-1/+5
It appears i386 uses kmap_atomic infrastructure regardless of CONFIG_HIGHMEM which results in a compile error when highmem is disabled. Cure this by providing the needed few bits for both CONFIG_HIGHMEM and CONFIG_X86_32. Signed-off-by: Peter Zijlstra <[email protected]> Reported-by: Chris Wilson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-27MN10300: Save frame pointer in thread_info struct rather than global varDavid Howells1-1/+1
Save the current exception frame pointer in the thread_info struct rather than in a global variable as the latter makes SMP tricky, especially when preemption is also enabled. This also replaces __frame with current_frame() and rearranges header file inclusions to make it all compile. Signed-off-by: David Howells <[email protected]> Acked-by: Akira Takeuchi <[email protected]>
2010-10-26Merge branch 'for-linus' of ↵Linus Torvalds2-6/+7
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...
2010-10-26kernel: remove PF_FLUSHERPeter Zijlstra1-1/+1
PF_FLUSHER is only ever set, not tested, remove it. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26use clear_page()/copy_page() in favor of memset()/memcpy() on whole pagesJan Beulich1-1/+1
After all that's what they are intended for. Signed-off-by: Jan Beulich <[email protected]> Cc: Miklos Szeredi <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26replace nested max/min macros with {max,min}3 macroHagen Paul Pfeifer1-1/+1
Use the new {max,min}3 macros to save some cycles and bytes on the stack. This patch substitutes trivial nested macros with their counterpart. Signed-off-by: Hagen Paul Pfeifer <[email protected]> Cc: Joe Perches <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hartley Sweeten <[email protected]> Cc: Russell King <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Sean Hefty <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: do_migrate_range: reduce list_empty() checkBob Liu1-12/+9
Simple code for reducing list_empty(&source) check. Signed-off-by: Bob Liu <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Wu Fengguang <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: do_migrate_range: exit loop if not_managed is trueBob Liu1-4/+6
If not_managed is true all pages will be putback to lru, so break the loop earlier to skip other pages isolate. Signed-off-by: Bob Liu <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Wu Fengguang <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: page_isolation: codeclean fix comment and rm unneeded val initBob Liu1-2/+1
__test_page_isolated_in_pageblock() returns 1 if all pages in the range are isolated, so fix the comment. Variable `pfn' will be initialised in the following loop so remove it. Signed-off-by: Bob Liu <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: fix is_mem_section_removable() page_order BUG_ON checkKAMEZAWA Hiroyuki1-1/+1
page_order() is called by memory hotplug's user interface to check the section is removable or not. (is_mem_section_removable()) It calls page_order() withoug holding zone->lock. So, even if the caller does if (PageBuddy(page)) ret = page_order(page) ... The caller may hit BUG_ON(). For fixing this, there are 2 choices. 1. add zone->lock. 2. remove BUG_ON(). is_mem_section_removable() is used for some "advice" and doesn't need to be 100% accurate. This is_removable() can be called via user program.. We don't want to take this important lock for long by user's request. So, this patch removes BUG_ON(). Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Wu Fengguang <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm/hugetlb.c: add missing spin_lock() to hugetlb_cow()Dean Nelson1-1/+4
Add missing spin_lock() of the page_table_lock before an error return in hugetlb_cow(). Callers of hugtelb_cow() expect it to be held upon return. Signed-off-by: Dean Nelson <[email protected]> Cc: Mel Gorman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: fix error reporting in move_pages() syscallGleb Natapov1-2/+2
The vma returned by find_vma does not necessarily include the target address. If this happens the code tries to follow a page outside of any vma and returns ENOENT instead of EFAULT. Signed-off-by: Gleb Natapov <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Minchan Kim <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26/proc/swaps: support pollingKay Sievers1-1/+48
System management wants to subscribe to changes in swap configuration. Make /proc/swaps pollable like /proc/mounts. [[email protected]: document proc_poll_event] Signed-off-by: Kay Sievers <[email protected]> Acked-by: Greg KH <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: add vzalloc() and vzalloc_node() helpersDave Young2-3/+92
Add vzalloc() and vzalloc_node() to encapsulate the vmalloc-then-memset-zero operation. Use __GFP_ZERO to zero fill the allocated memory. Signed-off-by: Dave Young <[email protected]> Cc: Christoph Lameter <[email protected]> Acked-by: Greg Ungerer <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm/memory_hotplug.c: make scan_lru_pages() staticAndrew Morton1-1/+1
Reported-by: KOSAKI Motohiro <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26vmstat: include compaction.h when CONFIG_COMPACTIONNamhyung Kim1-0/+2
This removes following warning from sparse: mm/vmstat.c:466:5: warning: symbol 'fragmentation_index' was not declared. Should it be static? [[email protected]: move the include to top-of-file] Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26vmalloc: annotate lock context change on s_start/stop()Namhyung Kim1-0/+2
s_start() and s_stop() grab/release vmlist_lock but were missing proper annotations. Add them. Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26vmalloc: rename temporary variable in __insert_vmap_area()Namhyung Kim1-4/+4
Rename redundant 'tmp' to fix following sparse warnings: mm/vmalloc.c:296:34: warning: symbol 'tmp' shadows an earlier one mm/vmalloc.c:293:24: originally declared here Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26rmap: make anon_vma_chain_free() staticNamhyung Kim1-1/+1
Make anon_vma_chain_free() static. It is called only in rmap.c and the corresponding alloc function is already static. Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26rmap: wrap page_check_address() using __cond_lock()Namhyung Kim1-1/+1
The page_check_address() conditionally grabs *@ptlp in case of returning non-NULL. Rename and wrap it using __cond_lock() removes following warnings from sparse: mm/rmap.c:472:9: warning: context imbalance in 'page_mapped_in_vma' - unexpected unlock mm/rmap.c:524:9: warning: context imbalance in 'page_referenced_one' - unexpected unlock mm/rmap.c:706:9: warning: context imbalance in 'page_mkclean_one' - unexpected unlock mm/rmap.c:1066:9: warning: context imbalance in 'try_to_unmap_one' - unexpected unlock Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26rmap: annotate lock context change on page_[un]lock_anon_vma()Namhyung Kim1-1/+3
The page_lock_anon_vma() conditionally grabs RCU and anon_vma lock but page_unlock_anon_vma() releases them unconditionally. This leads sparse to complain about context imbalance. Annotate them. Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: wrap follow_pte() using __cond_lock()Namhyung Kim1-1/+12
The follow_pte() conditionally grabs *@ptlp in case of returning 0. Rename and wrap it using __cond_lock() removes following warnings: mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: add lock release annotation on do_wp_page()Namhyung Kim1-0/+1
The do_wp_page() releases @ptl but was missing proper annotation. Add it. This removes following warnings from sparse: mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: wrap get_locked_pte() using __cond_lock()Namhyung Kim1-1/+1
The get_locked_pte() conditionally grabs 'ptl' in case of returning non-NULL. This leads sparse to complain about context imbalance. Rename and wrap it using __cond_lock() to make sparse happy. Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: add casts to/from gfp_t in gfp_to_alloc_flags()Namhyung Kim1-2/+2
This removes following warning from sparse: mm/page_alloc.c:1934:9: warning: restricted gfp_t degrades to integer Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: remove temporary variable on generic_file_direct_write()Namhyung Kim1-4/+4
'end' shadows earlier one and is not necessary at all. Remove it and use 'pos' instead. This removes following sparse warnings: mm/filemap.c:2180:24: warning: symbol 'end' shadows an earlier one mm/filemap.c:2132:25: originally declared here Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: retry page fault when blocking on disk transferMichel Lespinasse2-3/+23
This change reduces mmap_sem hold times that are caused by waiting for disk transfers when accessing file mapped VMAs. It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call site wants mmap_sem to be released if blocking on a pending disk transfer. In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and do_page_fault() will then re-acquire mmap_sem and retry the page fault. It is expected that the retry will hit the same page which will now be cached, and thus it will complete with a low mmap_sem hold time. Tests: - microbenchmark: thread A mmaps a large file and does random read accesses to the mmaped area - achieves about 55 iterations/s. Thread B does mmap/munmap in a loop at a separate location - achieves 55 iterations/s before, 15000 iterations/s after. - We are seeing related effects in some applications in house, which show significant performance regressions when running without this change. [[email protected]: fix warning & crash] Signed-off-by: Michel Lespinasse <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Nick Piggin <[email protected]> Reviewed-by: Wu Fengguang <[email protected]> Cc: Ying Han <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Acked-by: "H. Peter Anvin" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: filemap_fault: unique path for locking pageMichel Lespinasse1-9/+11
Introduce a single location where filemap_fault() locks the desired page. There used to be two such places, depending if the initial find_get_page() was successful or not. Signed-off-by: Michel Lespinasse <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Nick Piggin <[email protected]> Reviewed-by: Wu Fengguang <[email protected]> Cc: Ying Han <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: add a might_sleep_if() to dma_pool_alloc()Dima Zavin1-0/+2
Buggy drivers (e.g. fsl_udc) could call dma_pool_alloc from atomic context with GFP_KERNEL. In most instances, the first pool_alloc_page call would succeed and the sleeping functions would never be called. This allowed the buggy drivers to slip through the cracks. Add a might_sleep_if() checking for __GFP_WAIT in flags. Signed-off-by: Dima Zavin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: remove pte_*map_nested()Peter Zijlstra2-4/+4
Since we no longer need to provide KM_type, the whole pte_*map_nested() API is now redundant, remove it. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Chris Metcalf <[email protected]> Cc: David Howells <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Russell King <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: David Miller <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: stack based kmap_atomic()Peter Zijlstra1-58/+4
Keep the current interface but ignore the KM_type and use a stack based approach. The advantage is that we get rid of crappy code like: #define __KM_PTE \ (in_nmi() ? KM_NMI_PTE : \ in_irq() ? KM_IRQ_PTE : \ KM_PTE0) and in general can stop worrying about what context we're in and what kmap slots might be appropriate for that. The downside is that FRV kmap_atomic() gets more expensive. For now we use a CPP trick suggested by Andrew: #define kmap_atomic(page, args...) __kmap_atomic(page) to avoid having to touch all kmap_atomic() users in a single patch. [ not compiled on: - mn10300: the arch doesn't actually build with highmem to begin with ] [[email protected]: coding-style fixes] [[email protected]: fix up drivers/gpu/drm/i915/intel_overlay.c] Acked-by: Rik van Riel <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Chris Metcalf <[email protected]> Cc: David Howells <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Russell King <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: David Miller <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Li Zefan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26vmscan,tmpfs: treat used once pages on tmpfs as used onceKOSAKI Motohiro1-1/+1
When a page has PG_referenced, shrink_page_list() discards it only if it is not dirty. This rule works fine if the backing filesystem is a regular one. PG_dirty is a good signal that the page was used recently because the flusher threads clean pages periodically. In addition, page writeback is costlier than simple page discard. However, when a page is on tmpfs this heuristic doesn't work because flusher threads don't write back tmpfs pages. Consequently tmpfs pages always rotate around the lru twice at least and adds unnecessary lru churn. Simple tmpfs streaming io shouldn't cause large anonymous page swap-out. Remove this unncessary reclaim bonus of tmpfs pages. Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: Hugh Dickins <[email protected]> Reviewed-by: Johannes Weiner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26writeback: remove the internal 5% low bound on dirty_ratioWu Fengguang1-11/+5
The dirty_ratio was silently limited in global_dirty_limits() to >= 5%. This is not a user expected behavior. And it's inconsistent with calc_period_shift(), which uses the plain vm_dirty_ratio value. Let's remove the internal bound. At the same time, fix balance_dirty_pages() to work with the dirty_thresh=0 case. This allows applications to proceed when dirty+writeback pages are all cleaned. And ">" fits with the name "exceeded" better than ">=" does. Neil thinks it is an aesthetic improvement as well as a functional one :) Signed-off-by: Wu Fengguang <[email protected]> Cc: Jan Kara <[email protected]> Proposed-by: Con Kolivas <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Neil Brown <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Cc: Michael Rubin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26writeback: do not sleep on the congestion queue if there are no congested ↵Mel Gorman3-11/+96
BDIs or if significant congestion is not being encountered in the current zone If congestion_wait() is called with no BDI congested, the caller will sleep for the full timeout and this may be an unnecessary sleep. This patch adds a wait_iff_congested() that checks congestion and only sleeps if a BDI is congested else, it calls cond_resched() to ensure the caller is not hogging the CPU longer than its quota but otherwise will not sleep. This is aimed at reducing some of the major desktop stalls reported during IO. For example, while kswapd is operating, it calls congestion_wait() but it could just have been reclaiming clean page cache pages with no congestion. Without this patch, it would sleep for a full timeout but after this patch, it'll just call schedule() if it has been on the CPU too long. Similar logic applies to direct reclaimers that are not making enough progress. Signed-off-by: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26vmscan: isolate_lru_pages(): stop neighbour search if neighbour cannot be ↵KOSAKI Motohiro1-6/+11
isolated isolate_lru_pages() does not just isolate LRU tail pages, but also isolates neighbour pages of the eviction page. The neighbour search does not stop even if neighbours cannot be isolated which is excessive as the lumpy reclaim will no longer result in a successful higher order allocation. This patch stops the PFN neighbour pages if an isolation fails and moves on to the next block. Signed-off-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Wu Fengguang <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26vmscan: remove dead code in shrink_inactive_list()KOSAKI Motohiro1-8/+0
After synchrounous lumpy reclaim, the page_list is guaranteed to not have active pages as page activation in shrink_page_list() disables lumpy reclaim. Remove the dead code. Signed-off-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: Rik van Riel <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>