aboutsummaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)AuthorFilesLines
2010-12-22mm/migrate.c: fix compilation errorMichal Nazarewicz1-0/+2
GCC complained about update_mmu_cache() not being defined in migrate.c. Including <asm/tlbflush.h> seems to solve the problem. Signed-off-by: Michal Nazarewicz <[email protected]> Signed-off-by: Kyungmin Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-22writeback: do uninterruptible sleep in balance_dirty_pages()Wu Fengguang1-1/+1
Using TASK_INTERRUPTIBLE in balance_dirty_pages() seems wrong. If it's going to do that then it must break out if signal_pending(), otherwise it's pretty much guaranteed to degenerate into a busywait loop. Plus we *do* want these processes to appear in D state and to contribute to load average. So it should be TASK_UNINTERRUPTIBLE. -- Andrew Morton Signed-off-by: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-22mm/compaction.c: avoid double mem_cgroup_del_lru()Minchan Kim1-1/+0
del_page_from_lru_list() already called mem_cgroup_del_lru(). So we must not call it again. It adds unnecessary overhead. It was not a runtime bug because the TestClearPageCgroupAcctLRU() early in mem_cgroup_del_lru_list() will prevent any double-deletion, etc. Signed-off-by: Minchan Kim <[email protected]> Acked-by: Balbir Singh <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Mel Gorman <[email protected]> Reviewed-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-22Merge branch 'master' into for-nextJiri Kosina17-98/+185
Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.
2010-12-22percpu: print out alloc information with KERN_DEBUG instead of KERN_INFOTejun Heo1-1/+1
Now that percpu allocator is mostly stable, there is no reason to print alloc information with KERN_INFO and clutter the boot messages. Switch it to KERN_DEBUG. Signed-off-by: Tejun Heo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mike Travis <[email protected]>
2010-12-18vmstat: User per cpu atomics to avoid interrupt disable / enableChristoph Lameter1-14/+87
Currently the operations to increment vm counters must disable interrupts in order to not mess up their housekeeping of counters. So use this_cpu_cmpxchg() to avoid the overhead. Since we can no longer count on preremption being disabled we still have some minor issues. The fetching of the counter thresholds is racy. A threshold from another cpu may be applied if we happen to be rescheduled on another cpu. However, the following vmstat operation will then bring the counter again under the threshold limit. The operations for __xxx_zone_state are not changed since the caller has taken care of the synchronization needs (and therefore the cycle count is even less than the optimized version for the irq disable case provided here). The optimization using this_cpu_cmpxchg will only be used if the arch supports efficient this_cpu_ops (must have CONFIG_CMPXCHG_LOCAL set!) The use of this_cpu_cmpxchg reduces the cycle count for the counter operations by %80 (inc_zone_page_state goes from 170 cycles to 32). Signed-off-by: Christoph Lameter <[email protected]>
2010-12-17vmstat: Use this_cpu_inc_return for vm statisticsChristoph Lameter1-6/+2
this_cpu_inc_return() saves us a memory access there. Code size does not change. V1->V2: - Fixed the location of the __per_cpu pointer attributes - Sparse checked V2->V3: - Move fixes to __percpu attribute usage to earlier patch Reviewed-by: Pekka Enberg <[email protected]> Acked-by: H. Peter Anvin <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2010-12-17Merge branch 'this_cpu_ops' into for-2.6.38Tejun Heo4-4/+28
2010-12-17core: Replace __get_cpu_var with __this_cpu_read if not used for an address.Christoph Lameter1-3/+3
__get_cpu_var() can be replaced with this_cpu_read and will then use a single read instruction with implied address calculation to access the correct per cpu instance. However, the address of a per cpu variable passed to __this_cpu_read() cannot be determined (since it's an implied address conversion through segment prefixes). Therefore apply this only to uses of __get_cpu_var where the address of the variable is not used. Cc: Pekka Enberg <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Thomas Gleixner <[email protected]> Acked-by: H. Peter Anvin <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2010-12-17vmstat: Optimize zone counter modifications through the use of this cpu ↵Christoph Lameter1-20/+28
operations this cpu operations can be used to slightly optimize the function. The changes will avoid some address calculations and replace them with the use of the percpu segment register. If one would have this_cpu_inc_return and this_cpu_dec_return then it would be possible to optimize inc_zone_page_state and dec_zone_page_state even more. V1->V2: - Fix __dec_zone_state overflow handling - Use s8 variables for temporary storage. V2->V3: - Put __percpu annotations in correct places. Reviewed-by: Pekka Enberg <[email protected]> Acked-by: H. Peter Anvin <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2010-12-15install_special_mapping skips security_file_mmap check.Tavis Ormandy1-4/+12
The install_special_mapping routine (used, for example, to setup the vdso) skips the security check before insert_vm_struct, allowing a local attacker to bypass the mmap_min_addr security restriction by limiting the available pages for special mappings. bprm_mm_init() also skips the check, and although I don't think this can be used to bypass any restrictions, I don't see any reason not to have the security check. $ uname -m x86_64 $ cat /proc/sys/vm/mmap_min_addr 65536 $ cat install_special_mapping.s section .bss resb BSS_SIZE section .text global _start _start: mov eax, __NR_pause int 0x80 $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o $ ./install_special_mapping & [1] 14303 $ cat /proc/14303/maps 0000f000-00010000 r-xp 00000000 00:00 0 [vdso] 00010000-00011000 r-xp 00001000 00:19 2453665 /home/taviso/install_special_mapping 00011000-ffffe000 rwxp 00000000 00:00 0 [stack] It's worth noting that Red Hat are shipping with mmap_min_addr set to 4096. Signed-off-by: Tavis Ormandy <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Robert Swiecki <[email protected]> [ Changed to not drop the error code - akpm ] Reviewed-by: James Morris <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-15workqueue: convert cancel_rearming_delayed_work[queue]() users to ↵Tejun Heo2-2/+2
cancel_delayed_work_sync() cancel_rearming_delayed_work[queue]() has been superceded by cancel_delayed_work_sync() quite some time ago. Convert all the in-kernel users. The conversions are completely equivalent and trivial. Signed-off-by: Tejun Heo <[email protected]> Acked-by: "David S. Miller" <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Evgeniy Polyakov <[email protected]> Cc: Jeff Garzik <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: [email protected] Cc: Anton Vorontsov <[email protected]> Cc: David Woodhouse <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: Neil Brown <[email protected]> Cc: Alex Elder <[email protected]> Cc: [email protected] Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Cc: Trond Myklebust <[email protected]> Cc: [email protected]
2010-12-14Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds3-0/+16
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix panic after nfs_umount() nfs: remove extraneous and problematic calls to nfs_clear_request nfs: kernel should return EPROTONOSUPPORT when not support NFSv4 NFS: Fix fcntl F_GETLK not reporting some conflicts nfs: Discard ACL cache on mode update NFS: Readdir cleanups NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found NFS: Fix a memory leak in nfs_readdir Call the filesystem back whenever a page is removed from the page cache NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler
2010-12-07percpu: zero memory more efficiently in mm/percpu.c::pcpu_mem_alloc()Jesper Juhl1-6/+2
Don't do vmalloc() + memset() when vzalloc() will do. tj: dropped unnecessary temp variable ptr. Signed-off-by: Jesper Juhl <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2010-12-06Merge branch 'pm-fixes' of ↵Linus Torvalds1-7/+12
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Hibernate: Fix memory corruption related to swap PM / Hibernate: Use async I/O when reading compressed hibernation image
2010-12-06PM / Hibernate: Fix memory corruption related to swapRafael J. Wysocki1-7/+12
There is a problem that swap pages allocated before the creation of a hibernation image can be released and used for storing the contents of different memory pages while the image is being saved. Since the kernel stored in the image doesn't know of that, it causes memory corruption to occur after resume from hibernation, especially on systems with relatively small RAM that need to swap often. This issue can be addressed by keeping the GFP_IOFS bits clear in gfp_allowed_mask during the entire hibernation, including the saving of the image, until the system is finally turned off or the hibernation is aborted. Unfortunately, for this purpose it's necessary to rework the way in which the hibernate and suspend code manipulates gfp_allowed_mask. This change is based on an earlier patch from Hugh Dickins. Signed-off-by: Rafael J. Wysocki <[email protected]> Reported-by: Ondrej Zary <[email protected]> Acked-by: Hugh Dickins <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: [email protected]
2010-12-05Merge branch 'slab/urgent' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Fix a crash during slabinfo -v
2010-12-04slub: Fix a crash during slabinfo -vTero Roponen1-2/+2
Commit f7cb1933621bce66a77f690776a16fe3ebbc4d58 ("SLUB: Pass active and inactive redzone flags instead of boolean to debug functions") missed two instances of check_object(). This caused a lot of warnings during 'slabinfo -v' finally leading to a crash: BUG ext4_xattr: Freepointer corrupt ... BUG buffer_head: Freepointer corrupt ... BUG ext4_alloc_context: Freepointer corrupt ... ... BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff810a291f>] file_sb_list_del+0x1c/0x35 PGD 79d78067 PUD 79e67067 PMD 0 Oops: 0002 [#1] SMP last sysfs file: /sys/kernel/slab/:t-0000192/validate This patch fixes the problem by converting the two missed instances. Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Tero Roponen <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2010-12-04slub: Fix a crash during slabinfo -vTero Roponen1-2/+2
Commit f7cb1933621bce66a77f690776a16fe3ebbc4d58 ("SLUB: Pass active and inactive redzone flags instead of boolean to debug functions") missed two instances of check_object(). This caused a lot of warnings during 'slabinfo -v' finally leading to a crash: BUG ext4_xattr: Freepointer corrupt ... BUG buffer_head: Freepointer corrupt ... BUG ext4_alloc_context: Freepointer corrupt ... ... BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff810a291f>] file_sb_list_del+0x1c/0x35 PGD 79d78067 PUD 79e67067 PMD 0 Oops: 0002 [#1] SMP last sysfs file: /sys/kernel/slab/:t-0000192/validate This patch fixes the problem by converting the two missed instances. Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Tero Roponen <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2010-12-02ksm: annotate ksm_thread_mutex is no deadlock sourceKOSAKI Motohiro1-1/+6
commit 62b61f611e ("ksm: memory hotremove migration only") caused the following new lockdep warning. ======================================================= [ INFO: possible circular locking dependency detected ] ------------------------------------------------------- bash/1621 is trying to acquire lock: ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0 but task is already holding lock: (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (ksm_thread_mutex){+.+.+.}: [<ffffffff8108b70a>] lock_acquire+0xaa/0x140 [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0 [<ffffffff81506228>] mutex_lock_nested+0x48/0x60 [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0 [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0 [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0 [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813afbfb>] memory_notify+0x1b/0x20 [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0 [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0 [<ffffffff813afd62>] store_mem_state+0xe2/0xf0 [<ffffffff813a0bb0>] sysdev_store+0x20/0x30 [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170 [<ffffffff8114f398>] vfs_write+0xc8/0x190 [<ffffffff8114fc14>] sys_write+0x54/0x90 [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b -> #0 ((memory_chain).rwsem){.+.+.+}: [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600 [<ffffffff8108b70a>] lock_acquire+0xaa/0x140 [<ffffffff81506601>] down_read+0x51/0xa0 [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0 [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813afbfb>] memory_notify+0x1b/0x20 [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0 [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0 [<ffffffff813afd62>] store_mem_state+0xe2/0xf0 [<ffffffff813a0bb0>] sysdev_store+0x20/0x30 [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170 [<ffffffff8114f398>] vfs_write+0xc8/0x190 [<ffffffff8114fc14>] sys_write+0x54/0x90 [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b But it's a false positive. Both memory_chain.rwsem and ksm_thread_mutex have an outer lock (mem_hotplug_mutex). So they cannot deadlock. Thus, This patch annotate ksm_thread_mutex is not deadlock source. [[email protected]: update comment, from Hugh] Signed-off-by: KOSAKI Motohiro <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andi Kleen <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-02mem-hotplug: introduce {un}lock_memory_hotplug()KOSAKI Motohiro2-11/+28
Presently hwpoison is using lock_system_sleep() to prevent a race with memory hotplug. However lock_system_sleep() is a no-op if CONFIG_HIBERNATION=n. Therefore we need a new lock. Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Suggested-by: Hugh Dickins <[email protected]> Acked-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-02vmalloc: eagerly clear ptes on vunmapJeremy Fitzhardinge1-11/+17
On stock 2.6.37-rc4, running: # mount lilith:/export /mnt/lilith # find /mnt/lilith/ -type f -print0 | xargs -0 file crashes the machine fairly quickly under Xen. Often it results in oops messages, but the couple of times I tried just now, it just hung quietly and made Xen print some rude messages: (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 1d7058 (pfn 18fa7) (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp 1000000000000000) for mfn 1d2e04 (pfn 1d1fb) (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04 Which means the domain tried to map a pagetable page RW, which would allow it to map arbitrary memory, so Xen stopped it. This is because vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had finished with them, and those pages got recycled as pagetable pages while still having these RW aliases. Removing those mappings immediately removes the Xen-visible aliases, and so it has no problem with those pages being reused as pagetable pages. Deferring the TLB flush doesn't upset Xen because it can flush the TLB itself as needed to maintain its invariants. When unmapping a region in the vmalloc space, clear the ptes immediately. There's no point in deferring this because there's no amortization benefit. The TLBs are left dirty, and they are flushed lazily to amortize the cost of the IPIs. This specific motivation for this patch is an oops-causing regression since 2.6.36 when using NFS under Xen, triggered by the NFS client's use of vm_map_ram() introduced in 56e4ebf877b60 ("NFS: readdir with vmapped pages") . XFS also uses vm_map_ram() and could cause similar problems. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Bryan Schumaker <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Alex Elder <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-02vmstat: fix dirty threshold orderingWu Fengguang1-2/+2
The nr_dirty_[background_]threshold fields are misplaced before the numa_* fields, and users will read strange values. This is the right order. Before patch, nr_dirty_background_threshold will read as 0 (the value from numa_miss). numa_hit 128501 numa_miss 0 numa_foreign 0 numa_interleave 7388 numa_local 128501 numa_other 0 nr_dirty_threshold 144291 nr_dirty_background_threshold 72145 Signed-off-by: Wu Fengguang <[email protected]> Cc: Michael Rubin <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-02mm/mempolicy.c: add rcu read lock to protect pid structureZeng Zhaoming1-0/+3
find_task_by_vpid() should be protected by rcu_read_lock(), to prevent free_pid() reclaiming pid. Signed-off-by: Zeng Zhaoming <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-02mm/hugetlb.c: avoid double unlock_page() in hugetlb_fault()Dean Nelson1-1/+2
Have hugetlb_fault() call unlock_page(page) only if it had previously called lock_page(page). Setting CONFIG_DEBUG_VM=y and then running the libhugetlbfs test suite, resulted in the tripping of VM_BUG_ON(!PageLocked(page)) in unlock_page() having been called by hugetlb_fault() when page == pagecache_page. This patch remedied the problem. Signed-off-by: Dean Nelson <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-02Call the filesystem back whenever a page is removed from the page cacheLinus Torvalds3-0/+16
NFS needs to be able to release objects that are stored in the page cache once the page itself is no longer visible from the page cache. This patch adds a callback to the address space operations that allows filesystems to perform page cleanups once the page has been removed from the page cache. Original patch by: Linus Torvalds <[email protected]> [trondmy: cover the cases of invalidate_inode_pages2() and truncate_inode_pages()] Signed-off-by: Trond Myklebust <[email protected]>
2010-11-28Kill off a bunch of warning: ‘inline’ is not at beginning of declarationJesper Juhl1-1/+1
These warnings are spewed during a build of a 'allnoconfig' kernel (especially the ones from u64_stats_sync.h show up a lot) when building with -Wextra (which I often do).. They are a) annoying b) easy to get rid of. This patch kills them off. include/linux/u64_stats_sync.h:70:1: warning: ‘inline’ is not at beginning of declaration include/linux/u64_stats_sync.h:77:1: warning: ‘inline’ is not at beginning of declaration include/linux/u64_stats_sync.h:84:1: warning: ‘inline’ is not at beginning of declaration include/linux/u64_stats_sync.h:96:1: warning: ‘inline’ is not at beginning of declaration include/linux/u64_stats_sync.h:115:1: warning: ‘inline’ is not at beginning of declaration include/linux/u64_stats_sync.h:127:1: warning: ‘inline’ is not at beginning of declaration kernel/time.c:241:1: warning: ‘inline’ is not at beginning of declaration kernel/time.c:257:1: warning: ‘inline’ is not at beginning of declaration kernel/perf_event.c:4513:1: warning: ‘inline’ is not at beginning of declaration mm/page_alloc.c:4012:1: warning: ‘inline’ is not at beginning of declaration Signed-off-by: Jesper Juhl <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2010-11-28tracing/slab: Move kmalloc tracepoint out of inline codeSteven Rostedt1-15/+23
The tracepoint for kmalloc is in the slab inlined code which causes every instance of kmalloc to have the tracepoint. This patch moves the tracepoint out of the inline code to the slab C file, which removes a large number of inlined trace points. objdump -dr vmlinux.slab| grep 'jmpq.*<trace_kmalloc' |wc -l 213 objdump -dr vmlinux.slab.patched| grep 'jmpq.*<trace_kmalloc' |wc -l 1 This also has a nice impact on size. text data bss dec hex filename 7023060 2121564 2482432 11627056 b16a30 vmlinux.slab 6970579 2109772 2482432 11562783 b06f1f vmlinux.slab.patched Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2010-11-27Merge branch 'cleanup-bd_claim' of ↵Jens Axboe1-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into for-2.6.38/core
2010-11-25mm: remove call to find_vma in pagewalk for non-hugetlbfsDavid Sterba1-2/+3
Commit d33b9f45 ("mm: hugetlb: fix hugepage memory leak in walk_page_range()") introduces a check if a vma is a hugetlbfs one and later in 5dc37642 ("mm hugetlb: add hugepage support to pagemap") it is moved under #ifdef CONFIG_HUGETLB_PAGE but a needless find_vma call is left behind and its result is not used anywhere else in the function. The side-effect of caching vma for @addr inside walk->mm is neither utilized in walk_page_range() nor in called functions. Signed-off-by: David Sterba <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Matt Mackall <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-25mm/page_alloc.c: fix build_all_zonelist() where percpu_alloc() is wrongly ↵KAMEZAWA Hiroyuki1-9/+5
called under stop_machine_run() During memory hotplug, build_allzonelists() may be called under stop_machine_run(). In this function, setup_zone_pageset() is called. But it's bug because it will do page allocation under stop_machine_run(). Here is a report from Alok Kataria. BUG: sleeping function called from invalid context at kernel/mutex.c:94 in_atomic(): 0, irqs_disabled(): 1, pid: 4, name: migration/0 Pid: 4, comm: migration/0 Not tainted 2.6.35.6-45.fc14.x86_64 #1 Call Trace: [<ffffffff8103d12b>] __might_sleep+0xeb/0xf0 [<ffffffff81468245>] mutex_lock+0x24/0x50 [<ffffffff8110eaa6>] pcpu_alloc+0x6d/0x7ee [<ffffffff81048888>] ? load_balance+0xbe/0x60e [<ffffffff8103a1b3>] ? rt_se_boosted+0x21/0x2f [<ffffffff8103e1cf>] ? dequeue_rt_stack+0x18b/0x1ed [<ffffffff8110f237>] __alloc_percpu+0x10/0x12 [<ffffffff81465e22>] setup_zone_pageset+0x38/0xbe [<ffffffff810d6d81>] ? build_zonelists_node.clone.58+0x79/0x8c [<ffffffff81452539>] __build_all_zonelists+0x419/0x46c [<ffffffff8108ef01>] ? cpu_stopper_thread+0xb2/0x198 [<ffffffff8108f075>] stop_machine_cpu_stop+0x8e/0xc5 [<ffffffff8108efe7>] ? stop_machine_cpu_stop+0x0/0xc5 [<ffffffff8108ef57>] cpu_stopper_thread+0x108/0x198 [<ffffffff81467a37>] ? schedule+0x5b2/0x5cc [<ffffffff8108ee4f>] ? cpu_stopper_thread+0x0/0x198 [<ffffffff81065f29>] kthread+0x7f/0x87 [<ffffffff8100aae4>] kernel_thread_helper+0x4/0x10 [<ffffffff81065eaa>] ? kthread+0x0/0x87 [<ffffffff8100aae0>] ? kernel_thread_helper+0x0/0x10 Built 5 zonelists in Node order, mobility grouping on. Total pages: 289456 Policy zone: Normal This patch tries to fix the issue by moving setup_zone_pageset() out from stop_machine_run(). It's obviously not necessary to be called under stop_machine_run(). [[email protected]: remove unneeded local] Reported-by: Alok Kataria <[email protected]> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Petr Vandrovec <[email protected]> Cc: Pekka Enberg <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-25cgroups: make swap accounting default behavior configurableMichal Hocko1-2/+19
Swap accounting can be configured by CONFIG_CGROUP_MEM_RES_CTLR_SWAP configuration option and then it is turned on by default. There is a boot option (noswapaccount) which can disable this feature. This makes it hard for distributors to enable the configuration option as this feature leads to a bigger memory consumption and this is a no-go for general purpose distribution kernel. On the other hand swap accounting may be very usuful for some workloads. This patch adds a new configuration option which controls the default behavior (CGROUP_MEM_RES_CTLR_SWAP_ENABLED). If the option is selected then the feature is turned on by default. It also adds a new boot parameter swapaccount[=1|0] which enhances the original noswapaccount parameter semantic by means of enable/disable logic (defaults to 1 if no value is provided to be still consistent with noswapaccount). The default behavior is unchanged (if CONFIG_CGROUP_MEM_RES_CTLR_SWAP is enabled then CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED is enabled as well) Signed-off-by: Michal Hocko <[email protected]> Acked-by: Daisuke Nishimura <[email protected]> Cc: Balbir Singh <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-25memcg: avoid deadlock between move charge and try_charge()Daisuke Nishimura1-17/+26
__mem_cgroup_try_charge() can be called under down_write(&mmap_sem)(e.g. mlock does it). This means it can cause deadlock if it races with move charge: Ex.1) move charge | try charge --------------------------------------+------------------------------ mem_cgroup_can_attach() | down_write(&mmap_sem) mc.moving_task = current | .. mem_cgroup_precharge_mc() | __mem_cgroup_try_charge() mem_cgroup_count_precharge() | prepare_to_wait() down_read(&mmap_sem) | if (mc.moving_task) -> cannot aquire the lock | -> true | schedule() Ex.2) move charge | try charge --------------------------------------+------------------------------ mem_cgroup_can_attach() | mc.moving_task = current | mem_cgroup_precharge_mc() | mem_cgroup_count_precharge() | down_read(&mmap_sem) | .. | up_read(&mmap_sem) | | down_write(&mmap_sem) mem_cgroup_move_task() | .. mem_cgroup_move_charge() | __mem_cgroup_try_charge() down_read(&mmap_sem) | prepare_to_wait() -> cannot aquire the lock | if (mc.moving_task) | -> true | schedule() To avoid this deadlock, we do all the move charge works (both can_attach() and attach()) under one mmap_sem section. And after this patch, we set/clear mc.moving_task outside mc.lock, because we use the lock only to check mc.from/to. Signed-off-by: Daisuke Nishimura <[email protected]> Cc: Balbir Singh <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-25memcg: fix false positive VM_BUG on non-SMPKirill A. Shutemov1-1/+1
Fix this: kernel BUG at mm/memcontrol.c:2155! invalid opcode: 0000 [#1] last sysfs file: Pid: 18, comm: sh Not tainted 2.6.37-rc3 #3 /Bochs EIP: 0060:[<c10731b2>] EFLAGS: 00000246 CPU: 0 EIP is at mem_cgroup_move_account+0xe2/0xf0 EAX: 00000004 EBX: c6f931d4 ECX: c681c300 EDX: c681c000 ESI: c681c300 EDI: ffffffea EBP: c681c000 ESP: c46f3e30 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process sh (pid: 18, ti=c46f2000 task=c6826e60 task.ti=c46f2000) Stack: 00000155 c681c000 0805f000 c46ee180 c46f3e5c c7058820 c1074d37 00000000 08060000 c46db9a0 c46ec080 c7058820 0805f000 08060000 c46f3e98 c1074c50 c106c75e c46f3e98 c46ec080 08060000 0805ffff c46db9a0 c46f3e98 c46e0340 Call Trace: [<c1074d37>] ? mem_cgroup_move_charge_pte_range+0xe7/0x130 [<c1074c50>] ? mem_cgroup_move_charge_pte_range+0x0/0x130 [<c106c75e>] ? walk_page_range+0xee/0x1d0 [<c10725d6>] ? mem_cgroup_move_task+0x66/0x90 [<c1074c50>] ? mem_cgroup_move_charge_pte_range+0x0/0x130 [<c1072570>] ? mem_cgroup_move_task+0x0/0x90 [<c1042616>] ? cgroup_attach_task+0x136/0x200 [<c1042878>] ? cgroup_tasks_write+0x48/0xc0 [<c1041e9e>] ? cgroup_file_write+0xde/0x220 [<c101398d>] ? do_page_fault+0x17d/0x3f0 [<c108a79d>] ? alloc_fd+0x2d/0xd0 [<c1041dc0>] ? cgroup_file_write+0x0/0x220 [<c1077ba2>] ? vfs_write+0x92/0xc0 [<c1077c81>] ? sys_write+0x41/0x70 [<c1140e3d>] ? syscall_call+0x7/0xb Code: 03 00 74 09 8b 44 24 04 e8 1c f1 ff ff 89 73 04 8d 86 b0 00 00 00 b9 01 00 00 00 89 da 31 ff e8 65 f5 ff ff e9 4d ff ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 90 8d b4 26 00 00 00 00 83 ec 10 8b 0d f4 e3 EIP: [<c10731b2>] mem_cgroup_move_account+0xe2/0xf0 SS:ESP 0068:c46f3e30 ---[ end trace 7daa1582159b6532 ]--- lock_page_cgroup and unlock_page_cgroup are implemented using bit_spinlock. bit_spinlock doesn't touch the bit if we are on non-SMP machine, so we can't use the bit to check whether the lock was taken. Let's introduce is_page_cgroup_locked based on bit_spin_is_locked instead of PageCgroupLocked to fix it. [[email protected]: s/is_page_cgroup_locked/page_is_cgroup_locked/] Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Johannes Weiner <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-25nommu: yield CPU while disposing VMSteven J. Magnani1-0/+1
Depending on processor speed, page size, and the amount of memory a process is allowed to amass, cleanup of a large VM may freeze the system for many seconds. This can result in a watchdog timeout. Make sure other tasks receive some service when cleaning up large VMs. Signed-off-by: Steven J. Magnani <[email protected]> Cc: Greg Ungerer <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-14Merge branch 'for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Fix slub_lock down/up imbalance
2010-11-14slub: Fix slub_lock down/up imbalancePavel Emelyanov1-1/+2
There are two places, that do not release the slub_lock. Respective bugs were introduced by sysfs changes ab4d5ed5 (slub: Enable sysfs support for !CONFIG_SLUB_DEBUG) and 2bce6485 ( slub: Allow removal of slab caches during boot). Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2010-11-13block: make blkdev_get/put() handle exclusive accessTejun Heo1-3/+4
Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Neil Brown <[email protected]> Acked-by: Ryusuke Konishi <[email protected]> Acked-by: Mike Snitzer <[email protected]> Acked-by: Philipp Reisner <[email protected]> Cc: Peter Osterlund <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Jan Kara <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Alex Elder <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Leo Chen <[email protected]> Cc: Scott Branden <[email protected]> Cc: Chris Mason <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Dave Kleikamp <[email protected]> Cc: Joern Engel <[email protected]> Cc: [email protected] Cc: Alexander Viro <[email protected]>
2010-11-12radix-tree: fix RCU bugNick Piggin1-16/+10
Salman Qazi describes the following radix-tree bug: In the following case, we get can get a deadlock: 0. The radix tree contains two items, one has the index 0. 1. The reader (in this case find_get_pages) takes the rcu_read_lock. 2. The reader acquires slot(s) for item(s) including the index 0 item. 3. The non-zero index item is deleted, and as a consequence the other item is moved to the root of the tree. The place where it used to be is queued for deletion after the readers finish. 3b. The zero item is deleted, removing it from the direct slot, it remains in the rcu-delayed indirect node. 4. The reader looks at the index 0 slot, and finds that the page has 0 ref count 5. The reader looks at it again, hoping that the item will either be freed or the ref count will increase. This never happens, as the slot it is looking at will never be updated. Also, this slot can never be reclaimed because the reader is holding rcu_read_lock and is in an infinite loop. The fix is to re-use the same "indirect" pointer case that requires a slot lookup retry into a general "retry the lookup" bit. Signed-off-by: Nick Piggin <[email protected]> Reported-by: Salman Qazi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-12vmscan: avoid setting zone congested if no page dirtyShaohua Li1-1/+1
nr_dirty and nr_congested are increased only when the page is dirty. So if all pages are clean, both them will be zero. In this case, we should not mark the zone congested. Signed-off-by: Shaohua Li <[email protected]> Reviewed-by: Johannes Weiner <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-12mm/vfs: revalidate page->mapping in do_generic_file_read()Dave Hansen1-0/+3
70 hours into some stress tests of a 2.6.32-based enterprise kernel, we ran into a NULL dereference in here: int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc, unsigned long from) { ----> struct inode *inode = page->mapping->host; It looks like page->mapping was the culprit. (xmon trace is below). After closer examination, I realized that do_generic_file_read() does a find_get_page(), and eventually locks the page before calling block_is_partially_uptodate(). However, it doesn't revalidate the page->mapping after the page is locked. So, there's a small window between the find_get_page() and ->is_partially_uptodate() where the page could get truncated and page->mapping cleared. We _have_ a reference, so it can't get reclaimed, but it certainly can be truncated. I think the correct thing is to check page->mapping after the trylock_page(), and jump out if it got truncated. This patch has been running in the test environment for a month or so now, and we have not seen this bug pop up again. xmon info: 1f:mon> e cpu 0x1f: Vector: 300 (Data Access) at [c0000002ae36f770] pc: c0000000001e7a6c: .block_is_partially_uptodate+0xc/0x100 lr: c000000000142944: .generic_file_aio_read+0x1e4/0x770 sp: c0000002ae36f9f0 msr: 8000000000009032 dar: 0 dsisr: 40000000 current = 0xc000000378f99e30 paca = 0xc000000000f66300 pid = 21946, comm = bash 1f:mon> r R00 = 0025c0500000006d R16 = 0000000000000000 R01 = c0000002ae36f9f0 R17 = c000000362cd3af0 R02 = c000000000e8cd80 R18 = ffffffffffffffff R03 = c0000000031d0f88 R19 = 0000000000000001 R04 = c0000002ae36fa68 R20 = c0000003bb97b8a0 R05 = 0000000000000000 R21 = c0000002ae36fa68 R06 = 0000000000000000 R22 = 0000000000000000 R07 = 0000000000000001 R23 = c0000002ae36fbb0 R08 = 0000000000000002 R24 = 0000000000000000 R09 = 0000000000000000 R25 = c000000362cd3a80 R10 = 0000000000000000 R26 = 0000000000000002 R11 = c0000000001e7b60 R27 = 0000000000000000 R12 = 0000000042000484 R28 = 0000000000000001 R13 = c000000000f66300 R29 = c0000003bb97b9b8 R14 = 0000000000000001 R30 = c000000000e28a08 R15 = 000000000000ffff R31 = c0000000031d0f88 pc = c0000000001e7a6c .block_is_partially_uptodate+0xc/0x100 lr = c000000000142944 .generic_file_aio_read+0x1e4/0x770 msr = 8000000000009032 cr = 22000488 ctr = c0000000001e7a60 xer = 0000000020000000 trap = 300 dar = 0000000000000000 dsisr = 40000000 1f:mon> t [link register ] c000000000142944 .generic_file_aio_read+0x1e4/0x770 [c0000002ae36f9f0] c000000000142a14 .generic_file_aio_read+0x2b4/0x770 (unreliable) [c0000002ae36fb40] c0000000001b03e4 .do_sync_read+0xd4/0x160 [c0000002ae36fce0] c0000000001b153c .vfs_read+0xec/0x1f0 [c0000002ae36fd80] c0000000001b1768 .SyS_read+0x58/0xb0 [c0000002ae36fe30] c00000000000852c syscall_exit+0x0/0x40 --- Exception: c00 (System Call) at 00000080a840bc54 SP (fffca15df30) is in userspace 1f:mon> di c0000000001e7a6c c0000000001e7a6c e9290000 ld r9,0(r9) c0000000001e7a70 418200c0 beq c0000000001e7b30 # .block_is_partially_uptodate+0xd0/0x100 c0000000001e7a74 e9440008 ld r10,8(r4) c0000000001e7a78 78a80020 clrldi r8,r5,32 c0000000001e7a7c 3c000001 lis r0,1 c0000000001e7a80 812900a8 lwz r9,168(r9) c0000000001e7a84 39600001 li r11,1 c0000000001e7a88 7c080050 subf r0,r8,r0 c0000000001e7a8c 7f805040 cmplw cr7,r0,r10 c0000000001e7a90 7d6b4830 slw r11,r11,r9 c0000000001e7a94 796b0020 clrldi r11,r11,32 c0000000001e7a98 419d00a8 bgt cr7,c0000000001e7b40 # .block_is_partially_uptodate+0xe0/0x100 c0000000001e7a9c 7fa55840 cmpld cr7,r5,r11 c0000000001e7aa0 7d004214 add r8,r0,r8 c0000000001e7aa4 79080020 clrldi r8,r8,32 c0000000001e7aa8 419c0078 blt cr7,c0000000001e7b20 # .block_is_partially_uptodate+0xc0/0x100 Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: Johannes Weiner <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: Minchan Kim <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-12memcg: null dereference on allocation failureDan Carpenter1-7/+9
The original code had a null dereference if alloc_percpu() failed. This was introduced in commit 711d3d2c9bc3 ("memcg: cpu hotplug aware percpu count updates") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Balbir Singh <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Daisuke Nishimura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-09perf_events: Fix perf_counter_mmap() hook in mprotect()Pekka Enberg1-1/+1
As pointed out by Linus, commit dab5855 ("perf_counter: Add mmap event hooks to mprotect()") is fundamentally wrong as mprotect_fixup() can free 'vma' due to merging. Fix the problem by moving perf_event_mmap() hook to mprotect_fixup(). Note: there's another successful return path from mprotect_fixup() if old flags equal to new flags. We don't, however, need to call perf_event_mmap() there because 'perf' already knows the VMA is executable. Reported-by: Dave Jones <[email protected]> Analyzed-by: Linus Torvalds <[email protected]> Cc: Ingo Molnar <[email protected]> Reviewed-by: Peter Zijlstra <[email protected]> Signed-off-by: Pekka Enberg <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-06slub: Fix slub_lock down/up imbalancePavel Emelyanov1-1/+2
There are two places, that do not release the slub_lock. Respective bugs were introduced by sysfs changes ab4d5ed5 (slub: Enable sysfs support for !CONFIG_SLUB_DEBUG) and 2bce6485 ( slub: Allow removal of slab caches during boot). Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2010-11-06slub tracing: move trace calls out of always inlined functions to reduce ↵Richard Kennedy1-7/+23
kernel code size Having the trace calls defined in the always inlined kmalloc functions in include/linux/slub_def.h causes a lot of code duplication as the trace functions get instantiated for each kamalloc call site. This can simply be removed by pushing the trace calls down into the functions in slub.c. On my x86_64 built this patch shrinks the code size of the kernel by approx 36K and also shrinks the code size of many modules -- too many to list here ;) size vmlinux (2.6.36) reports text data bss dec hex filename 5410611 743172 828928 6982711 6a8c37 vmlinux 5373738 744244 828928 6946910 6a005e vmlinux + patch The resulting kernel has had some testing & kmalloc trace still seems to work. This patch - moves trace_kmalloc out of the inlined kmalloc() and pushes it down into kmem_cache_alloc_trace() so this it only get instantiated once. - rename kmem_cache_alloc_notrace() to kmem_cache_alloc_trace() to indicate that now is does have tracing. (maybe this would better being called something like kmalloc_kmem_cache ?) - adds a new function kmalloc_order() to handle allocation and tracing of large allocations of page order. - removes tracing from the inlined kmalloc_large() replacing them with a call to kmalloc_order(); - move tracing out of inlined kmalloc_node() and pushing it down into kmem_cache_alloc_node_trace - rename kmem_cache_alloc_node_notrace() to kmem_cache_alloc_node_trace() - removes the include of trace/events/kmem.h from slub_def.h. v2 - keep kmalloc_order_trace inline when !CONFIG_TRACE Signed-off-by: Richard Kennedy <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2010-11-03vmstat: fix offset calculation on void*Wu Fengguang1-1/+1
Fix regression introduced by commit 79da826aee6 ("writeback: report dirty thresholds in /proc/vmstat"). The incorrect pointer arithmetic can result in problems like this: BUG: unable to handle kernel paging request at 07c06d16 IP: [<c050c336>] strnlen+0x6/0x20 Call Trace: [<c050a249>] ? string+0x39/0xe0 [<c042be6b>] ? __wake_up_common+0x4b/0x80 [<c050afcc>] ? vsnprintf+0x1ec/0x380 [<c04b380e>] ? seq_printf+0x2e/0x60 [<c04829a6>] ? vmstat_show+0x26/0x30 [<c04b3bb6>] ? seq_read+0xa6/0x380 [<c04b3b10>] ? seq_read+0x0/0x380 [<c04d5d2f>] ? proc_reg_read+0x5f/0x90 [<c049c4a1>] ? vfs_read+0xa1/0x140 [<c04d5cd0>] ? proc_reg_read+0x0/0x90 [<c049c981>] ? sys_read+0x41/0x70 [<c0402bd0>] ? sysenter_do_call+0x12/0x26 Reported-by: Tetsuo Handa <[email protected]> Cc: Michael Rubin <[email protected]> Signed-off-by: Wu Fengguang <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-02Release page reference during page fault retryMichel Lespinasse1-1/+3
This slipped by when unifying the filemap and swap versions of lock_page_or_retry()... Signed-off-by: Michel Lespinasse <[email protected]> Acked-by: Rik van Riel <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-01tree-wide: fix comment/printk typosUwe Kleine-König2-2/+2
"gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2010-10-30audit mmapAl Viro2-0/+4
Normal syscall audit doesn't catch 5th argument of syscall. It also doesn't catch the contents of userland structures pointed to be syscall argument, so for both old and new mmap(2) ABI it doesn't record the descriptor we are mapping. For old one it also misses flags. Signed-off-by: Al Viro <[email protected]>
2010-10-29convert get_sb_nodev() usersAl Viro1-5/+5
Signed-off-by: Al Viro <[email protected]>