path: root/mm
Age | Commit message | Author | Files | Lines
2010-08-05 | memblock: NUMA allocate can now use early_pfn_map | Benjamin Herrenschmidt | 1 | -1/+27
We now provide a default (weak) implementation of memblock_nid_range() which uses the early_pfn_map[] if CONFIG_ARCH_POPULATES_NODE_MAP is set. Sparc still needs to use its own method due to the way the pages can be scattered between nodes. This implementation is inefficient because our main algorithm and callback construct want to work in ascending address order, while early_pfn_map[] would rather work with nids (it's unsorted at that stage). But it should work, and we can look into improving it subsequently, possibly using arch compile options to choose a different algorithm altogether. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
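For reference, a weak default along these lines would fit the description above. This is only a sketch: the exact signature and the use of get_pfn_range_for_nid()/PFN_PHYS() are assumptions, not taken from the commit itself.

	/* Report the node of 'start' and how far that node extends, clamped to 'end'. */
	phys_addr_t __weak memblock_nid_range(phys_addr_t start, phys_addr_t end, int *nid)
	{
	#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
		unsigned long start_pfn, end_pfn;
		int i;

		/* early_pfn_map-derived ranges, probed node by node (inefficient, as noted). */
		for (i = 0; i < MAX_NUMNODES; i++) {
			get_pfn_range_for_nid(i, &start_pfn, &end_pfn);
			if (start < PFN_PHYS(start_pfn) || start >= PFN_PHYS(end_pfn))
				continue;
			*nid = i;
			return min(end, PFN_PHYS(end_pfn));
		}
	#endif
		*nid = 0;
		return end;
	}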
2010-08-05 | memblock: Add "start" argument to memblock_find_base() | Benjamin Herrenschmidt | 1 | -11/+16
This constrains the search for a region between two boundaries, which will be used by the new NUMA-aware allocator among others. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Add arch function to control coalescing of memblock memory regions | Benjamin Herrenschmidt | 1 | -1/+18
Some archs such as ARM want to avoid coalescing across things such as the lowmem/highmem boundary or similar. This provides the option to control it via an arch callback, for which a weak default is provided that always allows coalescing. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
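A minimal sketch of what such a weak default might look like; the function name and signature here are guesses based on the changelog, not verified against the tree.

	/* Default: always allow two adjacent regions to merge. An arch such as
	 * ARM can override this and refuse to coalesce across, for example,
	 * the lowmem/highmem boundary. */
	int __weak memblock_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
						phys_addr_t addr2, phys_addr_t size2)
	{
		return 1;
	}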
2010-08-05 | memblock: Add array resizing support | Benjamin Herrenschmidt | 1 | -2/+102
When one of the arrays gets full, we resize it. After much thinking and a few iterations of that code, I went back to on-demand resizing using the (new) internal memblock_find_base() function, which is pretty much what Yinghai initially proposed, though there are some differences in the details. For this to work, it relies on the default alloc limit being set sensibly by the architecture. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
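Schematically, the on-demand resize amounts to something like the sketch below. It is heavily simplified and only assumes names that appear in the surrounding changelogs (memblock_find_base(), MEMBLOCK_ERROR, memblock.current_limit, memblock_reserve()); the real code also has to cope with resizing the reserved array while reserving memory for it.

	static int memblock_double_array(struct memblock_type *type)
	{
		struct memblock_region *new_array;
		phys_addr_t old_size = type->max * sizeof(struct memblock_region);
		phys_addr_t new_size = old_size << 1;
		phys_addr_t addr;

		/* Find room below the current allocation limit. */
		addr = memblock_find_base(new_size, sizeof(phys_addr_t),
					  0, memblock.current_limit);
		if (addr == MEMBLOCK_ERROR)
			return -1;

		new_array = __va(addr);
		memcpy(new_array, type->regions, old_size);
		memset(new_array + type->max, 0, old_size);

		type->regions = new_array;
		type->max <<= 1;

		/* Keep later allocations away from the new array. */
		memblock_reserve(addr, new_size);
		return 0;
	}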
2010-08-05 | memblock: Move functions around into a more sensible order | Benjamin Herrenschmidt | 1 | -142/+159
Some shuffling is needed for doing array resize so we may as well put some sense into the ordering of the functions in the whole memblock.c file. No code change. Added some comments. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: split memblock_find_base() out of __memblock_alloc_base() | Benjamin Herrenschmidt | 1 | -20/+38
This will be used by the array resize code and might prove useful to some arch code as well, at which point it can be made non-static. Also add a comment as to why aligning the size is important. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
v2: Fix loss of size alignment
v3: Fix result code
2010-08-05 | memblock: Move memblock_init() to the bottom of the file | Benjamin Herrenschmidt | 1 | -27/+27
It's a real PITA to have to search for it in the middle. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Define MEMBLOCK_ERROR internally instead of using ~(phys_addr_t)0 | Benjamin Herrenschmidt | 1 | -5/+7
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Make memblock_find_region() out of memblock_alloc_region() | Benjamin Herrenschmidt | 1 | -11/+9
This function will be used to locate a free area to put the new memblock arrays when attempting to resize them. memblock_alloc_region() is gone; its two callsites now call memblock_add_region(). Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
v2: Fix membase_alloc_nid_region() conversion
2010-08-05 | memblock: Add debug markers at the end of the array | Benjamin Herrenschmidt | 1 | -0/+11
Since we allocate one more than needed, why not do a bit of sanity checking here to ensure we don't walk past the end of the array? Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Move memblock arrays to static storage in memblock.c and make their size a variable | Benjamin Herrenschmidt | 1 | -1/+9
This is in preparation for having resizable arrays. Note that we still allocate one more than needed; this is unchanged from the previous implementation. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Remove memblock_type.size and add memblock.memory_size instead | Benjamin Herrenschmidt | 1 | -4/+4
Right now, both the "memory" and "reserved" memblock_type structures have a "size" member. It represents the calculated memory size in the former case and is unused in the latter. This moves it out to the main memblock structure instead. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Change u64 to phys_addr_t | Benjamin Herrenschmidt | 1 | -58/+60
Let's not waste space and cycles on archs that don't support >32-bit physical address space. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Remove rmo_size, bury it in arch/powerpc where it belongs | Benjamin Herrenschmidt | 1 | -8/+0
The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact server ppc64, though I hijack it on embedded ppc64 for similar purposes) and represents the area of memory that can be accessed in real mode (aka with the MMU off), or on embedded, from the exception vectors (which are bolted in the TLB), which pretty much boils down to the same thing. We take that out of the generic MEMBLOCK data structure and move it into arch/powerpc where it belongs, renaming it to "RMA" while at it. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Introduce default allocation limit and use it to replace explicit ones | Benjamin Herrenschmidt | 1 | -8/+11
This introduces memblock.current_limit, which is used to limit allocations from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE). The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still be used with memblock_alloc_base() to allocate really anywhere. It is -no-longer- cropped to MEMBLOCK_REAL_LIMIT, which disappears.

Note to archs: I'm leaving the default limit to MEMBLOCK_ALLOC_ANYWHERE. I strongly recommend that you ensure that you set an appropriate limit during boot in order to guarantee that a memblock_alloc() at any time results in something that is accessible with a simple __va(). The reason is that a subsequent patch will introduce the ability for the array to resize itself by reallocating itself. The MEMBLOCK core will honor the current limit when performing those allocations.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
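As an illustration of that note to archs, early boot code would do something along these lines. This is a sketch only: memblock_set_current_limit() is assumed to be the setter this series provides, and lowmem_end_addr stands in for whatever the arch considers the end of directly mapped memory.

	#include <linux/memblock.h>

	void __init arch_cap_memblock(void)	/* hypothetical arch hook */
	{
		/*
		 * Cap MEMBLOCK_ALLOC_ACCESSIBLE allocations to memory covered by
		 * the linear mapping, so __va() on any memblock_alloc() result is
		 * always valid.
		 */
		memblock_set_current_limit(lowmem_end_addr);
	}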
2010-08-05 | memblock: Expose MEMBLOCK_ALLOC_ANYWHERE | Benjamin Herrenschmidt | 1 | -2/+0
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Factor the lowest level alloc function | Benjamin Herrenschmidt | 1 | -32/+27
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Remove nid_range argument, arch provides memblock_nid_range() instead | Benjamin Herrenschmidt | 1 | -5/+8
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-05 | memblock: Remove memblock_find() | Benjamin Herrenschmidt | 1 | -32/+0
Nobody uses it anymore. Its semantics were ... weird. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-04 | memblock: Implement memblock_is_memory and memblock_is_region_memory | Benjamin Herrenschmidt | 1 | -8/+34
To make it fast, we steal ARM's binary search for memblock_is_memory(), and we also use it to replace the existing implementation of memblock_is_reserved(). Signed-off-by: Benjamin Herrenschmidt <[email protected]>
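The borrowed lookup is essentially a binary search over the sorted, non-overlapping region array. A sketch of the idea (the helper name is assumed):

	static int memblock_search(struct memblock_type *type, phys_addr_t addr)
	{
		unsigned int left = 0, right = type->cnt;

		do {
			unsigned int mid = (right + left) / 2;

			if (addr < type->regions[mid].base)
				right = mid;
			else if (addr >= (type->regions[mid].base +
					  type->regions[mid].size))
				left = mid + 1;
			else
				return mid;	/* base <= addr < base + size */
		} while (left < right);

		return -1;
	}

memblock_is_memory(addr) then reduces to memblock_search(&memblock.memory, addr) != -1, and memblock_is_reserved() can share the same helper against the reserved array.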
2010-08-04 | memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region | Benjamin Herrenschmidt | 1 | -85/+83
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-08-04 | memblock: Fix memblock_is_region_reserved() to return a boolean | Benjamin Herrenschmidt | 1 | -1/+1
All callers expect a boolean result which is true if the region overlaps a reserved region. However, the implementation actually returns -1 if there is no overlap, and a region index (0 based) if there is. Make it behave as callers (and common sense) expect. Signed-off-by: Benjamin Herrenschmidt <[email protected]>
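In other words, the change boils down to something like this (a sketch; memblock_overlaps_region() is the internal helper that returns a region index or -1, and the u64 types reflect the pre-phys_addr_t code of that date):

	int memblock_is_region_reserved(u64 base, u64 size)
	{
		/* >= 0 means some reserved region overlaps [base, base + size) */
		return memblock_overlaps_region(&memblock.reserved, base, size) >= 0;
	}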
2010-07-30 | mm: fix ia64 crash when gcore reads gate area | Hugh Dickins | 1 | -3/+13
Debian's ia64 autobuilders have been seeing kernel freeze or reboot when running the gdb testsuite (Debian bug 588574): dannf bisected to 2.6.32 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1 "mm: ZERO_PAGE without PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target.

I'd missed updating the gate_vma handling in __get_user_pages(): that happens to use vm_normal_page() (nowadays failing on the zero page), yet reported success even when it failed to get a page - boom when access_process_vm() tried to copy that to its intermediate buffer.

Fix this, resisting cleanups: in particular, leave it for now reporting success when not asked to get any pages - very probably safe to change, but let's not risk it without testing exposure.

Why did ia64 crash with 16kB pages, but succeed with 64kB pages? Because setup_gate() pads each 64kB of its gate area with zero pages.

Reported-by: Andreas Barth <[email protected]>
Bisected-by: dann frazier <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Tested-by: dann frazier <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
2010-07-20 | x86,nobootmem: make alloc_bootmem_node fall back to other node when 32bit numa is used | Yinghai Lu | 2 | -4/+23
Borislav Petkov reported that his 32-bit NUMA system has a problem:

	[ 0.000000] Reserving total of 4c00 pages for numa KVA remap
	[ 0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe
	[ 0.000000] max_pfn = 238000
	[ 0.000000] 8202MB HIGHMEM available.
	[ 0.000000] 885MB LOWMEM available.
	[ 0.000000] mapped low ram: 0 - 375fe000
	[ 0.000000] low ram: 0 - 375fe000
	[ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000
	[ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80
	[ 0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140
	[ 0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000
	[ 0.000000] BUG: unable to handle kernel paging request at 40000000
	[ 0.000000] IP: [<c2c8cff1>] __alloc_memory_core_early+0x147/0x1d6
	[ 0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00
	...
	[ 0.000000] Call Trace:
	[ 0.000000] [<c2c8b4f8>] ? __alloc_bootmem_node+0x216/0x22f
	[ 0.000000] [<c2c90c9b>] ? sparse_early_usemaps_alloc_node+0x5a/0x10b
	[ 0.000000] [<c2c9149e>] ? sparse_init+0x1dc/0x499
	[ 0.000000] [<c2c79118>] ? paging_init+0x168/0x1df
	[ 0.000000] [<c2c780ff>] ? native_pagetable_setup_start+0xef/0x1bb

It looks like it allocates too much high address for bootmem. Try to cut the limit with get_max_mapped().

Reported-by: Borislav Petkov <[email protected]>
Tested-by: Conny Seidel <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Cc: <[email protected]> [2.6.34.x]
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2010-07-20 | mm/vmscan.c: fix mapping use after free | Nick Piggin | 1 | -1/+1
We need lock_page_nosync() here because we have no reference to the mapping when taking the page lock. Signed-off-by: Nick Piggin <[email protected]> Reviewed-by: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-07-19 | mm: add context argument to shrinker callback | Dave Chinner | 1 | -3/+5
The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
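The pattern this enables looks roughly like the sketch below. It is not code from the patch: the three-argument callback signature is my recollection of the post-change API, and the myfs_* names are hypothetical.

	#include <linux/mm.h>

	struct myfs_sb_info {
		struct shrinker	cache_shrinker;
		/* ... per-filesystem cache state ... */
	};

	static int myfs_shrink_cache(struct shrinker *shrink, int nr_to_scan,
				     gfp_t gfp_mask)
	{
		/* Recover the per-filesystem context the shrinker is embedded in. */
		struct myfs_sb_info *sbi =
			container_of(shrink, struct myfs_sb_info, cache_shrinker);

		if (!nr_to_scan)
			return myfs_cache_object_count(sbi);	/* hypothetical helper */

		return myfs_cache_free_objects(sbi, nr_to_scan, gfp_mask);	/* hypothetical helper */
	}

Registration happens at mount time as before, e.g. setting sbi->cache_shrinker.shrink = myfs_shrink_cache, sbi->cache_shrinker.seeks = DEFAULT_SEEKS, then calling register_shrinker(&sbi->cache_shrinker).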
2010-07-19 | Merge branch 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm | Linus Torvalds | 2 | -0/+12
* 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm:
  kmemleak: Add support for NO_BOOTMEM configurations
  kmemleak: Annotate false positive in init_section_page_cgroup()
2010-07-19 | kmemleak: Add support for NO_BOOTMEM configurations | Catalin Marinas | 1 | -0/+5
With commits 08677214 and 59be5a8e, alloc_bootmem()/free_bootmem() and friends use the early_res functions for memory management when NO_BOOTMEM is enabled. This patch adds the kmemleak calls in the corresponding code paths for bootmem allocations. Signed-off-by: Catalin Marinas <[email protected]> Acked-by: Pekka Enberg <[email protected]> Acked-by: Yinghai Lu <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: [email protected]
2010-07-19 | kmemleak: Annotate false positive in init_section_page_cgroup() | Catalin Marinas | 1 | -0/+7
The pointer to the page_cgroup table allocated in init_section_page_cgroup() is stored in section->page_cgroup as (base - pfn). Since this value does not point to the beginning or inside the allocated memory block, kmemleak reports a false positive. This was reported in bugzilla.kernel.org as #16297. Signed-off-by: Catalin Marinas <[email protected]> Reported-by: Adrien Dessemond <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrew Morton <[email protected]>
2010-07-14 | lmb: rename to memblock | Yinghai Lu | 3 | -0/+546
via the following script:

	FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

	sed -i \
		-e 's/lmb/memblock/g' \
		-e 's/LMB/MEMBLOCK/g' \
		$FILES

	for N in $(find . -name lmb.[ch]); do
		M=$(echo $N | sed 's/lmb/memblock/g')
		mv $N $M
	done

and remove some wrong changes like lmbench and dlmb etc.

Also move memblock.c from lib/ to mm/.

Suggested-by: Ingo Molnar <[email protected]>
Acked-by: "H. Peter Anvin" <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2010-07-08 | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block | Linus Torvalds | 2 | -15/+5
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  writeback: simplify the write back thread queue
  writeback: split writeback_inodes_wb
  writeback: remove writeback_inodes_wbc
  fs-writeback: fix kernel-doc warnings
  splice: check f_mode for seekable file
  splice: direct_splice_actor() should not use pos in sd
2010-07-06 | writeback: simplify the write back thread queue | Christoph Hellwig | 1 | -11/+3
First, remove items from work_list as soon as we start working on them. This means we don't have to track any pending or visited state and can get rid of all the RCU magic freeing the work items - we can simply free them once the operation has finished.

Second, use a real completion for tracking synchronous requests - if the caller sets the completion pointer we complete it, otherwise use it as a boolean indicator that we can free the work item directly.

Third, unify struct wb_writeback_args and struct bdi_work into a single data structure, wb_writeback_work. Previously we set all parameters in a struct wb_writeback_args, copied it into a struct bdi_work, then copied it again onto the stack to use it there. Instead, just allocate one structure dynamically or on the stack and use it all the way through the stack.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
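The resulting shape, as described above, is roughly the following; the field names are assumptions, not the actual fs-writeback.c code.

	#include <linux/completion.h>
	#include <linux/writeback.h>
	#include <linux/slab.h>

	struct wb_writeback_work {
		struct super_block		*sb;
		long				nr_pages;
		enum writeback_sync_modes	sync_mode;
		struct list_head		list;	/* entry on the bdi work list */
		struct completion		*done;	/* set by synchronous callers */
	};

	static void wb_finish_work(struct wb_writeback_work *work)
	{
		/*
		 * Synchronous callers wait on (and own) their completion and the
		 * work item; for asynchronous callers the item was allocated
		 * dynamically, so a NULL ->done doubles as "free it here".
		 */
		if (work->done)
			complete(work->done);
		else
			kfree(work);
	}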
2010-07-06 | writeback: remove writeback_inodes_wbc | Christoph Hellwig | 2 | -4/+2
This was just an odd wrapper around writeback_inodes_wb. Removing it also allows us to get rid of the bdi member of struct writeback_control, which was rather out of place there. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-06-29 | mempolicy: fix dangling reference to tmpfs superblock mpol | Lee Schermerhorn | 1 | -4/+5
My patch to "Factor out duplicate put/frees in mpol_shared_policy_init() to a common return path" and Dan Carpenter's fix thereto both left a dangling reference to the incoming tmpfs superblock mempolicy structure. A similar leak was introduced earlier when the nodemask was moved offstack to the scratch area despite the note in the comment block regarding the incoming ref. Move the remaining put of the incoming "mpol" to the common exit path to drop the reference. Signed-off-by: Lee Schermerhorn <[email protected]> Acked-by: Dan Carpenter <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-06-29 | memcg: fix wake up in oom wait queue | KAMEZAWA Hiroyuki | 1 | -1/+3
The OOM waitqueue should be woken up when oom_disable is cancelled. This is a fix for 3c11ecf448eff8f1 ("memcg: oom kill disable and oom status").

How to test: create a cgroup A...
 1. set memory.limit and memory.memsw.limit to a small value
 2. echo 1 > /cgroup/A/memory.oom_control, this disables oom-kill
 3. run a program which must cause OOM

A program executed in 3 will sleep on the oom waitqueue in memcg. Then, the problem is how to wake it up:
 1. echo 0 > /cgroup/A/memory.oom_control (enable OOM-killer)
 2. echo big mem > /cgroup/A/memory.memsw.limit_in_bytes (allow more swap)
etc..

Without the patch, a task sleeping there cannot be woken up.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Daisuke Nishimura <[email protected]>
Cc: Balbir Singh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2010-06-29 | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block | Linus Torvalds | 1 | -3/+2
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: Don't count_vm_events for discard bio in submit_bio.
  cfq: fix recursive call in cfq_blkiocg_update_completion_stats()
  cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=n
  cfq: Don't allow queue merges for queues that have no process references
  block: fix DISCARD_BARRIER requests
  cciss: set SCSI max cmd len to 16, as default is wrong
  cpqarray: fix two more wrong section type
  cpqarray: fix wrong __init type on pci probe function
  drbd: Fixed a race between disk-attach and unexpected state changes
  writeback: fix pin_sb_for_writeback
  writeback: add missing requeue_io in writeback_inodes_wb
  writeback: simplify and split bdi_start_writeback
  writeback: simplify wakeup_flusher_threads
  writeback: fix writeback_inodes_wb from writeback_inodes_sb
  writeback: enforce s_umount locking in writeback_inodes_sb
  writeback: queue work on stack in writeback_inodes_sb
  writeback: fix writeback completion notifications
2010-06-18 | percpu: fix first chunk match in per_cpu_ptr_to_phys() | Tejun Heo | 1 | -3/+28
per_cpu_ptr_to_phys() determines whether the passed in @addr belongs to the first_chunk or not by just matching the address against the address range of the base unit (unit0, used by cpu0). When an address from another cpu was passed in, it will always determine that the address doesn't belong to the first chunk even when it does. This makes the function return a bogus physical address which may lead to a crash. This problem was discovered by Cliff Wickman while investigating a crash during kdump on an SGI UV system. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Cliff Wickman <[email protected]> Tested-by: Cliff Wickman <[email protected]> Cc: [email protected]
2010-06-17 | percpu: fix trivial bugs in pcpu_build_alloc_info() | Pavel V. Panteleev | 1 | -3/+2
Fix the following two trivial bugs in pcpu_build_alloc_info():
 * we should memset group_cnt to 0 by size of group_cnt, not size of group_map (both are of the same size, so the bug isn't dangerous)
 * we can delete useless variable group_cnt_max
Signed-off-by: Pavel V. Panteleev <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
2010-06-11 | writeback: simplify and split bdi_start_writeback | Christoph Hellwig | 1 | -3/+2
bdi_start_writeback now never gets a superblock passed, so we can just remove that case. And to further untangle the code and flatten the call stack, split it into two trivial helpers for its two callers. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-06-08 | writeback: limit write_cache_pages integrity scanning to current EOF | Dave Chinner | 1 | -0/+15
sync can currently take a really long time if a concurrent writer is extending a file. The problem is that the dirty pages on the address space grow in the same direction as write_cache_pages scans, so if the writer keeps ahead of writeback, the writeback will not terminate until the writer stops adding dirty pages.

For a data integrity sync, we only need to write the pages dirty at the time we start the writeback, so we can stop scanning once we get to the page that was at the end of the file at the time the scan started. This will prevent operations like copying a large file preventing sync from completing as it will not write back pages that were dirtied after the sync was started. This does not impact the existing integrity guarantees, as any dirty page (old or new) within the EOF range at the start of the scan will still be captured.

This patch will not prevent sync from blocking on large writes into holes. That requires more complex intervention while this patch only addresses the common append-case of this sync holdoff.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2010-06-08 | writeback: pay attention to wbc->nr_to_write in write_cache_pages | Dave Chinner | 1 | -10/+5
If a filesystem writes more than one page in ->writepage, write_cache_pages fails to notice this and continues to attempt writeback when wbc->nr_to_write has gone negative - this trace was captured from XFS:

	wbc_writeback_start: towrt=1024
	wbc_writepage: towrt=1024
	wbc_writepage: towrt=0
	wbc_writepage: towrt=-1
	wbc_writepage: towrt=-5
	wbc_writepage: towrt=-21
	wbc_writepage: towrt=-85

This has adverse effects on filesystem writeback behaviour. write_cache_pages() needs to terminate after a certain number of pages are written, not after a certain number of calls to ->writepage are made.

This is a regression introduced by 17bc6c30cf6bfffd816bdc53682dd46fc34a2cf4 ("vfs: Add no_nrwrite_index_update writeback control flag"), but cannot be reverted directly due to subsequent bug fixes that have gone in on top of it.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2010-06-04 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 | Linus Torvalds | 1 | -2/+3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Minix: Clean up left over label
  fix truncate inode time modification breakage
  fix setattr error handling in sysfs, configfs
  fcntl: return -EFAULT if copy_to_user fails
  wrong type for 'magic' argument in simple_fill_super()
  fix the deadlock in qib_fs
  mqueue doesn't need make_bad_inode()
2010-06-04 | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block | Linus Torvalds | 1 | -2/+2
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits)
  block: make blk_init_free_list and elevator_init idempotent
  block: avoid unconditionally freeing previously allocated request_queue
  pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface
  pipe: change the privilege required for growing a pipe beyond system max
  pipe: adjust minimum pipe size to 1 page
  block: disable preemption before using sched_clock()
  cciss: call BUG() earlier
  Preparing 8.3.8rc2
  drbd: Reduce verbosity
  drbd: use drbd specific ratelimit instead of global printk_ratelimit
  drbd: fix hang on local read errors while disconnected
  drbd: Removed the now empty w_io_error() function
  drbd: removed duplicated #includes
  drbd: improve usage of MSG_MORE
  drbd: need to set socket bufsize early to take effect
  drbd: improve network latency, TCP_QUICKACK
  drbd: Revert "drbd: Create new current UUID as late as possible"
  brd: support discard
  Revert "writeback: fix WB_SYNC_NONE writeback from umount"
  Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync"
  ...
2010-06-04 | vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim failure | KOSAKI Motohiro | 1 | -13/+16
Greg Thelen reported that Johannes's recent stack diet patch makes the kernel hang. His test is the following:

	mount -t cgroup none /cgroups -o memory
	mkdir /cgroups/cg1
	echo $$ > /cgroups/cg1/tasks
	dd bs=1024 count=1024 if=/dev/null of=/data/foo
	echo $$ > /cgroups/tasks
	echo 1 > /cgroups/cg1/memory.force_empty

Actually, this "OOM, hard to try" logic has been corrupted since the following two-year-old patch:

	commit a41f24ea9fd6169b147c53c2392e2887cc1d9247
	Author: Nishanth Aravamudan <[email protected]>
	Date:   Tue Apr 29 00:58:25 2008 -0700

	    page allocator: smarter retry of costly-order allocations

The original intention was "return success if the system has shrinkable zones even though priority==0 reclaim was a failure". But the above patch changed it to "return nr_reclaimed if .....". That forgot that nr_reclaimed may be 0 if the priority==0 reclaim fails.

And Johannes's patch 0aeb2339e54e ("vmscan: remove all_unreclaimable scan control") made it worse. Originally, a priority==0 reclaim failure on memcg returned 0, but this patch changed it to return 1. That totally confused memcg.

This patch fixes it completely.

Reported-by: Greg Thelen <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Tested-by: Greg Thelen <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2010-06-04 | fix truncate inode time modification breakage | Nick Piggin | 1 | -2/+3
mtime and ctime should be changed only if the file size has actually changed. Patches changing ext2 and tmpfs from vmtruncate to the new truncate sequence have caused regressions where they always update timestamps. There are some strange cases in POSIX where truncate(2) must not update times unless the size has actually changed, see 6e656be89.

This area is all still rather buggy in different ways in a lot of filesystems and needs a cleanup and audit (ideally the vfs will provide a simple attribute or call to direct all filesystems exactly which attributes to change). But coming up with the best solution will take a while and is not appropriate for rc anyway. So fix the recent regression for now.

Signed-off-by: Nick Piggin <[email protected]>
Signed-off-by: Al Viro <[email protected]>
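The rule being restored is simply "bump the timestamps only when the size really changes"; schematically (an illustrative helper, not the actual ext2/tmpfs code):

	#include <linux/fs.h>

	/* Called from a filesystem's truncate/setattr path with i_mutex held. */
	static void myfs_update_times_on_truncate(struct inode *inode, loff_t newsize)
	{
		/* Only a real size change may touch mtime/ctime. */
		if (newsize != inode->i_size)
			inode->i_mtime = inode->i_ctime = CURRENT_TIME;
	}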
2010-06-01 | Merge branch 'master' into for-linus | Jens Axboe | 27 | -761/+2574
Conflicts:
	fs/pipe.c

Signed-off-by: Jens Axboe <[email protected]>
2010-06-01 | Revert "writeback: fix WB_SYNC_NONE writeback from umount" | Jens Axboe | 1 | -2/+2
This reverts commit e913fc825dc685a444cb4c1d0f9d32f372f59861.

We are investigating a hang associated with the WB_SYNC_NONE changes, so revert them for now.

Conflicts:
	fs/fs-writeback.c
	mm/page-writeback.c

Signed-off-by: Jens Axboe <[email protected]>
2010-05-30 | Merge branch 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 | Linus Torvalds | 1 | -22/+11
* 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  SLUB: Allow full duplication of kmalloc array for 390
  slub: move kmem_cache_node into it's own cacheline
2010-05-30 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse | Linus Torvalds | 2 | -0/+2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  mm: export generic_pipe_buf_*() to modules
  fuse: support splice() reading from fuse device
  fuse: allow splice to move pages
  mm: export remove_from_page_cache() to modules
  mm: export lru_cache_add_*() to modules
  fuse: support splice() writing to fuse device
  fuse: get page reference for readpages
  fuse: use get_user_pages_fast()
  fuse: remove unneeded variable
2010-05-27 | tmpfs: convert to use the new truncate convention | [email protected] | 1 | -21/+22
Cc: Christoph Hellwig <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Nick Piggin <[email protected]>
Signed-off-by: Al Viro <[email protected]>