aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2011-03-22ST SPEAr: PCIE gadget suppportPratyush Anand5-0/+1080
This is a configurable gadget. can be configured by configfs interface. Any IP available at PCIE bus can be programmed to be used by host controller.It supoorts both INTX and MSI. By default, the gadget is configured for INTX and SYSRAM1 is mapped to BAR0 with size 0x1000 Signed-off-by: Pratyush Anand <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Jesse Barnes <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Shiraz Hashim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22drivers/misc/bmp085.c: free initmem memoryShubhrajyoti Datta1-3/+3
Free the memory that is used only at init Signed-off-by: Shubhrajyoti Datta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22bh1780gli: convert to dev pm opsShubhrajyoti Datta1-9/+10
Signed-off-by: Shubhrajyoti Datta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22fs.h: remove 8 bytes of padding from block_device on 64bit buildsRichard Kennedy1-1/+1
Re-ordering struct block_inode to remove 8 bytes of padding on 64 bit builds, which also shrinks bdev_inode by 8 bytes (776 -> 768) allowing it to fit into one fewer cache lines. Signed-off-by: Richard Kennedy <[email protected]> Cc: Al Viro <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22include/linux/compiler-gcc*.h: unify macro definitionsBorislav Petkov3-16/+8
Unify identical gcc3.x and gcc4.x macros. Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22fs: use appropriate printk priority levelsMandeep Singh Baines3-3/+4
printk()s without a priority level default to KERN_WARNING. To reduce noise at KERN_WARNING, this patch set the priority level appriopriately for unleveled printks()s. This should be useful to folks that look at dmesg warnings closely. Signed-off-by: Mandeep Singh Baines <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22add the common dma_addr_t typedef to include/linux/types.hFUJITA Tomonori19-106/+6
All architectures can use the common dma_addr_t typedef now. We can remove the arch specific dma_addr_t. Signed-off-by: FUJITA Tomonori <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Matt Turner <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Chris Metcalf <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22um: remove file pointer from ioctlRichard Weinberger2-4/+4
Commit 6caa76b ("tty: now phase out the ioctl file pointer for good") removed the ioctl file pointer. User Mode Linux's line driver uses this ioctl and needs a signature update too. Signed-off-by: Richard Weinberger <[email protected]> Cc: Alan Cox <[email protected]> Cc: Greg KH <[email protected]> Cc: Jeff Dike <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22uml: kernels on {i386,x86_64} produce bad coredumpsPaul Pluzhnikov2-0/+4
One of our users reported that when a user-level program SIGSEGVs under UML kernel, the resulting core dump is not very usable. I have reproduced that with the latest kernel: make ARCH=um defconfig; make ARCH=um Run the resulting kernel, then "inside" run this program: #include <pthread.h> void *fn(void *p) { abort(); } int main() { pthread_t tid; pthread_create(&tid, 0, fn, 0); pthread_join(tid, 0); return 0; } Analyze the coredump with GDB. Here is what you'll see: sudo gdb -q -ex 'set solib-absolute-prefix ../root_fs' -ex 'file ../root_fs/var/tmp/mt-abort' -ex 'core ../root_fs/var/tmp/core.762' Reading symbols from /usr/local/google/root_fs/var/tmp/mt-abort...done. [New Thread 763] [New Thread 762] Core was generated by `./mt-abort'. Program terminated with signal 6, Aborted. #0 0x0000000040255250 in raise () from ../root_fs/lib64/libc.so.6 (gdb) info thread 2 Thread 762 0x0000000000000000 in ?? () * 1 Thread 763 0x0000000040255250 in raise () from ../root_fs/lib64/libc.so.6 Note that thread#2 looks funny. (gdb) thread 2 [Switching to thread 2 (Thread 762)]#0 0x0000000000000000 in ?? () (gdb) info reg rax 0x0 0 rbx 0x0 0 rcx 0x0 0 rdx 0x0 0 rsi 0x0 0 rdi 0x0 0 rbp 0x0 0x0 rsp 0x0 0x0 r8 0x0 0 r9 0x0 0 r10 0x0 0 r11 0x0 0 r12 0x0 0 r13 0x0 0 r14 0x0 0 r15 0x0 0 rip 0x0 0 eflags 0x0 [ ] cs 0x0 0 ss 0x0 0 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 Examining the core shows that NT_PRSTATUS notes for all threads other than the one that crashed are zeroed out. I believe this is happening because neither ELF_CORE_COPY_TASK_REGS nor task_pt_regs are defined under ARCH=um, and so elf_core_copy_task_regs() becomes a no-op. Attached patch fixes this for SUBARCH={x86_64,i386}. Signed-off-by: Paul Pluzhnikov <[email protected]> Cc: Jeff Dike <[email protected]> Acked-by: WANG Cong <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: simplify code of swap.cShaohua Li1-75/+58
Clean up code and remove duplicate code. Next patch will use pagevec_lru_move_fn introduced here too. Signed-off-by: Shaohua Li <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Hiroyuki Kamezawa <[email protected]> Cc: Andi Kleen <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22shmem: let shared anonymous be nonlinear againHugh Dickins1-0/+1
Up to 2.6.22, you could use remap_file_pages(2) on a tmpfs file or a shared mapping of /dev/zero or a shared anonymous mapping. In 2.6.23 we disabled it by default, but set VM_CAN_NONLINEAR to enable it on safe mappings. We made sure to set it in shmem_mmap() for tmpfs files, but missed it in shmem_zero_setup() for the others. Fix that at last. Reported-by: Kenny Simpson <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm/memblock: properly handle overlaps and fix error pathBenjamin Herrenschmidt1-109/+132
Currently memblock_reserve() or memblock_free() don't handle overlaps of any kind. There is some special casing for coalescing exactly adjacent regions but that's about it. This is annoying because typically memblock_reserve() is used to mark regions passed by the firmware as reserved and we all know how much we can trust our firmwares... Also, with the current code, if we do something it doesn't handle right such as trying to memblock_reserve() a large range spanning multiple existing smaller reserved regions for example, or doing overlapping reservations, it can silently corrupt the internal region array, causing odd errors much later on, such as allocations returning reserved regions etc... This patch rewrites the underlying functions that add or remove a region to the arrays. The new code is a lot more robust as it fully handles overlapping regions. It's also, imho, simpler than the previous implementation. In addition, while doing so, I found a bug where if we fail to double the array while adding a region, we would remove the last region of the array rather than the region we just allocated. This fixes it too. Signed-off-by: Benjamin Herrenschmidt <[email protected]> Acked-by: Yinghai Lu <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm/page_alloc.c: use list_move() instead of list_del()/list_add() combinationKirill A. Shutemov1-3/+2
Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22vmalloc: remove confusing comment on vwrite()Namhyung Kim1-2/+0
KM_USER1 is never used for vwrite() path so the caller doesn't need to guarantee it is not used. Only the caller should guarantee is KM_USER0 and it is commented already. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22writeback: make mapping->writeback_index to point to the last written pageJun'ichi Nomura1-1/+2
For range-cyclic writeback (e.g. kupdate), the writeback code sets a continuation point of the next writeback to mapping->writeback_index which is set the page after the last written page. This happens so that we evenly write the whole file even if pages in it get continuously redirtied. However, in some cases, sequential writer is writing in the middle of the page and it just redirties the last written page by continuing from that. For example with an application which uses a file as a big ring buffer we see: [1st writeback session] ... flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898514 + 8 flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898522 + 8 flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898530 + 8 flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898538 + 8 flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898546 + 8 kworker/0:1-11 4571: block_rq_issue: 8,0 W 0 () 94898514 + 40 >> flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898554 + 8 >> flush-8:0-2743 4571: block_rq_issue: 8,0 W 0 () 94898554 + 8 [2nd writeback session after 35sec] flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898562 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898570 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898578 + 8 ... kworker/0:1-11 4606: block_rq_issue: 8,0 W 0 () 94898562 + 640 kworker/0:1-11 4606: block_rq_issue: 8,0 W 0 () 94899202 + 72 ... flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899962 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899970 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899978 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899986 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899994 + 8 kworker/0:1-11 4606: block_rq_issue: 8,0 W 0 () 94899962 + 40 >> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898554 + 8 >> flush-8:0-2743 4606: block_rq_issue: 8,0 W 0 () 94898554 + 8 So we seeked back to 94898554 after we wrote all the pages at the end of the file. This extra seek seems unnecessary. If we continue writeback from the last written page, we can avoid it and do not cause harm to other cases. The original intent of even writeout over the whole file is preserved and if the page does not get redirtied pagevec_lookup_tag() just skips it. As an exceptional case, when I/O error happens, set done_index to the next page as the comment in the code suggests. Tested-by: Wu Fengguang <[email protected]> Signed-off-by: Jun'ichi Nomura <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: remove inline from scan_swap_map()Cesar Eduardo Barros1-2/+2
scan_swap_map() is a large function (224 lines), with several loops and a complex control flow involving several gotos. Given all that, it is a bit silly that it is marked as inline. The compiler agrees with me: on a x86-64 compile, it did not inline the function. Remove the "inline" and let the compiler decide instead. Signed-off-by: Cesar Eduardo Barros <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: separate final enabling of the swapfileCesar Eduardo Barros1-42/+42
The block in sys_swapon which does the final adjustments to the swap_info_struct and to swap_list is the same as the block which re-inserts it again at sys_swapoff on failure of try_to_unuse(). Move this code to a separate function, and use it both in sys_swapon and sys_swapoff. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapoff: change order to match sys_swaponCesar Eduardo Barros1-3/+4
The block in sys_swapon which does the final adjustments to the swap_info_struct and to swap_list is the same as the block which re-inserts it again at sys_swapoff on failure of try_to_unuse(), except for the order of the operations within the lock. Since the order should not matter, arbitrarily change sys_swapoff to match sys_swapon, in preparation to making both share the same code. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: move printk outside lockCesar Eduardo Barros1-7/+8
The block in sys_swapon which does the final adjustments to the swap_info_struct and to swap_list is the same as the block which re-inserts it again at sys_swapoff on failure of try_to_unuse(). To be able to make both share the same code, move the printk() call in the middle of it to just after it. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: remove nr_good_pages variableCesar Eduardo Barros1-5/+3
It still exists within setup_swap_map_and_extents(), but after it nr_good_pages == p->pages. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: simplify error flow in setup_swap_map_and_extents()Cesar Eduardo Barros1-14/+5
Since there is no cleanup to do, there is no reason to jump to a label. Return directly instead. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: separate parsing of bad blocks and extentsCesar Eduardo Barros1-29/+54
Move the code which parses the bad block list and the extents to a separate function. Only code movement, no functional changes. This change uses the fact that, after the success path, nr_good_pages == p->pages. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: call swap_cgroup_swapon() earlierCesar Eduardo Barros1-4/+4
The call to swap_cgroup_swapon is in the middle of loading the swap map and extents. As it only does memory allocation and does not depend on the swapfile layout (map/extents), it can be called earlier (or later). Move it to just after the allocation of swap_map, since it is conceptually similar (allocates a map). Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: simplify error flow in read_swap_header()Cesar Eduardo Barros1-9/+6
Since there is no cleanup to do, there is no reason to jump to a label. Return directly instead. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: separate parsing of swapfile headerCesar Eduardo Barros1-62/+78
Move the code which parses and checks the swapfile header (except for the bad block list) to a separate function. Only code movement, no functional changes. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: move setting of swapfilepages near useCesar Eduardo Barros1-2/+1
There is no reason I can see to read inode->i_size long before it is needed. Move its read to just before it is needed, to reduce the variable lifetime. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Jesper Juhl <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: simplify error flow in claim_swapfile()Cesar Eduardo Barros1-14/+6
Since there is no cleanup to do, there is no reason to jump to a label. Return directly instead. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: separate bdev claim and inode lockCesar Eduardo Barros1-25/+39
Move the code which claims the bdev (S_ISBLK) or locks the inode (S_ISREG) to a separate function. Only code movement, no functional changes. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: use a single error labelCesar Eduardo Barros1-4/+3
sys_swapon currently has two error labels, bad_swap and bad_swap_2. bad_swap does the same as bad_swap_2 plus destroy_swap_extents() and swap_cgroup_swapoff(); both are noops in the places where bad_swap_2 is jumped to. With a single extra test for inode (matching the one in the S_ISREG case below), all the error paths in the function can go to bad_swap. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: do only cleanup in the cleanup blocksCesar Eduardo Barros1-4/+3
The only way error is 0 in the cleanup blocks is when the function is returning successfully. In this case, the cleanup blocks were setting S_SWAPFILE in the S_ISREG case. But this is not a cleanup. Move the setting of S_SWAPFILE to just before the "goto out;" to make this more clear. At this point, we do not need to test for inode because it will never be NULL. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: remove bdev variableCesar Eduardo Barros1-10/+9
The bdev variable is always equivalent to (S_ISBLK(inode->i_mode) ? p->bdev : NULL), as long as it being set is moved to a bit earlier. Use this fact to remove the bdev variable. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: move setting of error nearer useCesar Eduardo Barros1-5/+6
Move the setting of the error variable nearer the goto in a few places. Avoids calling PTR_ERR() if not IS_ERR() in two places, and makes the error condition more explicit in two other places. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Jesper Juhl <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: remove did_down variableCesar Eduardo Barros1-6/+2
Since mutex_lock(&inode->i_mutex) is called just after setting inode, did_down is always equivalent to (inode && S_ISREG(inode->i_mode)). Use this fact to remove the did_down variable. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: remove initial value of name variableCesar Eduardo Barros1-1/+1
Now there is nothing which jumps to the cleanup blocks before the name variable is set. There is no need to set it initially to NULL anymore. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: simplify error flow in alloc_swap_info()Cesar Eduardo Barros1-6/+1
Since there is no cleanup to do, there is no reason to jump to a label. Return directly instead. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: simplify error return from swap_info allocationCesar Eduardo Barros1-4/+2
At this point in sys_swapon, there is nothing to free. Return directly instead of jumping to the cleanup block at the end of the function. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: separate swap_info allocationCesar Eduardo Barros1-20/+37
Move the swap_info allocation to its own function. Only code movement, no functional changes. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: do not depend on "type" after allocationCesar Eduardo Barros1-5/+5
Within sys_swapon, after the swap_info entry has been allocated, we always have type == p->type and swap_info[type] == p. Use this fact to reduce the dependency on the "type" local variable within the function, as a preparation to move the allocation of the swap_info entry to a separate function. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: remove changelog from function commentCesar Eduardo Barros1-5/+0
Changelogs belong in the git history instead of in the source code. Also, "The swapon system call" is redundant with "SYSCALL_DEFINE2(swapon, ...)". Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: Jesper Juhl <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> [ Gaah. That's a _historical_ comment. But the patch-series depends on removal ] Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_swapon: use vzalloc() instead of vmalloc/memsetCesar Eduardo Barros1-2/+1
This patch series refactors the sys_swapon function. sys_swapon is currently a very large function, with 313 lines (more than 12 25-line screens), which can make it a bit hard to read. This patch series reduces this size by half, by extracting large chunks of related code to new helper functions. One of these chunks of code was nearly identical to the part of sys_swapoff which is used in case of a failure return from try_to_unuse(), so this patch series also makes both share the same code. As a side effect of all this refactoring, the compiled code gets a bit smaller (from v1 of this patch series): text data bss dec hex filename 14012 944 276 15232 3b80 mm/swapfile.o.before 13941 944 276 15161 3b39 mm/swapfile.o.after This patch: Use vzalloc() instead of vmalloc/memset. Signed-off-by: Cesar Eduardo Barros <[email protected]> Tested-by: Eric B Munson <[email protected]> Acked-by: Eric B Munson <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Reviewed-by: Jesper Juhl <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: use __GFP_OTHER_NODE for transparent huge pagesAndi Kleen1-9/+11
Pass __GFP_OTHER_NODE for transparent hugepages NUMA allocations done by the hugepages daemon. This way the low level accounting for local versus remote pages works correctly. Contains improvements from Andrea Arcangeli [[email protected]: coding-style fixes] Signed-off-by: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: add __GFP_OTHER_NODE flagAndi Kleen4-5/+12
Add a new __GFP_OTHER_NODE flag to tell the low level numa statistics in zone_statistics() that an allocation is on behalf of another thread. This way the local and remote counters can be still correct, even when background daemons like khugepaged are changing memory mappings. This only affects the accounting, but I think it's worth doing that right to avoid confusing users. I first tried to just pass down the right node, but this required a lot of changes to pass down this parameter and at least one addition of a 10th argument to a 9 argument function. Using the flag is a lot less intrusive. Open: should be also used for migration? [[email protected]: coding-style fixes] Signed-off-by: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: compaction: Use async migration for __GFP_NO_KSWAPD and enforce no writebackAndrea Arcangeli2-16/+34
__GFP_NO_KSWAPD allocations are usually very expensive and not mandatory to succeed as they have graceful fallback. Waiting for I/O in those, tends to be overkill in terms of latencies, so we can reduce their latency by disabling sync migrate. Unfortunately, even with async migration it's still possible for the process to be blocked waiting for a request slot (e.g. get_request_wait in the block layer) when ->writepage is called. To prevent __GFP_NO_KSWAPD blocking, this patch prevents ->writepage being called on dirty page cache for asynchronous migration. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31142 [[email protected]: Avoid writebacks for NFS, retry locked pages, use bool] Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Cc: Arthur Marsh <[email protected]> Cc: Clemens Ladisch <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Minchan Kim <[email protected]> Reported-by: Alex Villacis Lasso <[email protected]> Tested-by: Alex Villacis Lasso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: compaction: minimise the time IRQs are disabled while isolating pages ↵Andrea Arcangeli1-0/+18
for migration compaction_alloc() isolates pages for migration in isolate_migratepages. While it's scanning, IRQs are disabled on the mistaken assumption the scanning should be short. Tests show this to be true for the most part but contention times on the LRU lock can be increased. Before this patch, the IRQ disabled times for a simple test looked like Total sampled time IRQs off (not real total time): 5493 Event shrink_inactive_list..shrink_zone 1596 us count 1 Event shrink_inactive_list..shrink_zone 1530 us count 1 Event shrink_inactive_list..shrink_zone 956 us count 1 Event shrink_inactive_list..shrink_zone 541 us count 1 Event shrink_inactive_list..shrink_zone 531 us count 1 Event split_huge_page..add_to_swap 232 us count 1 Event save_args..call_softirq 36 us count 1 Event save_args..call_softirq 35 us count 2 Event __wake_up..__wake_up 1 us count 1 This patch reduces the worst-case IRQs-disabled latencies by releasing the lock every SWAP_CLUSTER_MAX pages that are scanned and releasing the CPU if necessary. The cost of this is that the processing performing compaction will be slower but IRQs being disabled for too long a time has worse consequences as the following report shows; Total sampled time IRQs off (not real total time): 4367 Event shrink_inactive_list..shrink_zone 881 us count 1 Event shrink_inactive_list..shrink_zone 875 us count 1 Event shrink_inactive_list..shrink_zone 868 us count 1 Event shrink_inactive_list..shrink_zone 555 us count 1 Event split_huge_page..add_to_swap 495 us count 1 Event compact_zone..compact_zone_order 269 us count 1 Event split_huge_page..add_to_swap 266 us count 1 Event shrink_inactive_list..shrink_zone 85 us count 1 Event save_args..call_softirq 36 us count 2 Event __wake_up..__wake_up 1 us count 1 [[email protected]: simplify with s/unlocked/locked/] Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Arthur Marsh <[email protected]> Cc: Clemens Ladisch <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: compaction: minimise the time IRQs are disabled while isolating free pagesMel Gorman1-5/+13
compaction_alloc() isolates free pages to be used as migration targets. While its scanning, IRQs are disabled on the mistaken assumption the scanning should be short. Analysis showed that IRQs were in fact being disabled for substantial time. A simple test was run using large anonymous mappings with transparent hugepage support enabled to trigger frequent compactions. A monitor sampled what the worst IRQ-off latencies were and a post-processing tool found the following; Total sampled time IRQs off (not real total time): 22355 Event compaction_alloc..compaction_alloc 8409 us count 1 Event compaction_alloc..compaction_alloc 7341 us count 1 Event compaction_alloc..compaction_alloc 2463 us count 1 Event compaction_alloc..compaction_alloc 2054 us count 1 Event shrink_inactive_list..shrink_zone 1864 us count 1 Event shrink_inactive_list..shrink_zone 88 us count 1 Event save_args..call_softirq 36 us count 1 Event save_args..call_softirq 35 us count 2 Event __make_request..__blk_run_queue 24 us count 1 Event __alloc_pages_nodemask..__alloc_pages_nodemask 6 us count 1 i.e. compaction is disabled IRQs for a prolonged period of time - 8ms in one instance. The full report generated by the tool can be found at http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-vanilla-micro.report This patch reduces the time IRQs are disabled by simply disabling IRQs at the last possible minute. An updated IRQs-off summary report then looks like; Total sampled time IRQs off (not real total time): 5493 Event shrink_inactive_list..shrink_zone 1596 us count 1 Event shrink_inactive_list..shrink_zone 1530 us count 1 Event shrink_inactive_list..shrink_zone 956 us count 1 Event shrink_inactive_list..shrink_zone 541 us count 1 Event shrink_inactive_list..shrink_zone 531 us count 1 Event split_huge_page..add_to_swap 232 us count 1 Event save_args..call_softirq 36 us count 1 Event save_args..call_softirq 35 us count 2 Event __wake_up..__wake_up 1 us count 1 A full report is again available at http://www.csn.ul.ie/~mel/postings/minfree-20110225/irqsoff-minimiseirq-free-v1r4-micro.report As should be obvious, IRQ disabled latencies due to compaction are almost elimimnated for this particular test. [[email protected]: Fix initialisation of isolated] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Johannes Weiner <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Acked-by: Andrea Arcangeli <[email protected]> Cc: Arthur Marsh <[email protected]> Cc: Clemens Ladisch <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: don't return 0 too early from find_get_pages()Hugh Dickins1-0/+14
Callers of find_get_pages(), or its wrapper pagevec_lookup() - notably truncate_inode_pages_range() - stop looking further when it returns 0. But if an interrupt comes just after its radix_tree_gang_lookup_slot(), especially if we have preemptible RCU enabled, isn't it conceivable that all 14 pages returned could be removed from the page cache by shrink_page_list(), before find_get_pages() gets to process them? So causing it to return 0 although there may be plenty more pages beyond. Make find_get_pages() and find_get_pages_tag() check for this unlikely case, and restart should it occur; but callers of find_get_pages_contig() have no such expectation, it's okay for that to return 0 early. I have not seen this in practice, just worried by the possibility. Signed-off-by: Hugh Dickins <[email protected]> Cc: Nick Piggin <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: Salman Qazi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: remove worrying dead code from find_get_pages()Hugh Dickins1-2/+16
The radix_tree_deref_retry() case in find_get_pages() has a strange little excrescence, not seen in the other gang lookups: it looks like the start of an abandoned attempt to guarantee forward progress in a case that cannot arise. ret should always be 0 here: if it isn't, then going back to restart will leak references to pages already gotten. There used to be a comment saying nr_found is necessarily 1 here: that's not quite true, but the radix_tree_deref_retry() case is peculiar to the entry at index 0, when we race with it being moved out of the radix_tree root or back. Remove the worrisome two lines, add a brief comment here and in find_get_pages_contig() and find_get_pages_tag(), and a WARN_ON in find_get_pages() should it ever be seen elsewhere than at 0. Signed-off-by: Hugh Dickins <[email protected]> Cc: Nick Piggin <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: Salman Qazi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22hugetlbfs: correct handling of negative input to /proc/sys/vm/nr_hugepagesPetr Holasek1-4/+2
When the user inserts a negative value into /proc/sys/vm/nr_hugepages it will cause the kernel to allocate as many hugepages as possible and to then update /proc/meminfo to reflect this. This changes the behavior so that the negative input will result in nr_hugepages value being unchanged. Signed-off-by: Petr Holasek <[email protected]> Signed-off-by: Anton Arapov <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Eric B Munson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: vmscan: kswapd should not free an excessive number of pages when ↵Mel Gorman2-3/+22
balancing small zones When reclaiming for order-0 pages, kswapd requires that all zones be balanced. Each cycle through balance_pgdat() does background ageing on all zones if necessary and applies equal pressure on the inactive zone unless a lot of pages are free already. A "lot of free pages" is defined as a "balance gap" above the high watermark which is currently 7*high_watermark. Historically this was reasonable as min_free_kbytes was small. However, on systems using huge pages, it is recommended that min_free_kbytes is higher and it is tuned with hugeadm --set-recommended-min_free_kbytes. With the introduction of transparent huge page support, this recommended value is also applied. On X86-64 with 4G of memory, min_free_kbytes becomes 67584 so one would expect around 68M of memory to be free. The Normal zone is approximately 35000 pages so under even normal memory pressure such as copying a large file, it gets exhausted quickly. As it is getting exhausted, kswapd applies pressure equally to all zones, including the DMA32 zone. DMA32 is approximately 700,000 pages with a high watermark of around 23,000 pages. In this situation, kswapd will reclaim around (23000*8 where 8 is the high watermark + balance gap of 7 * high watermark) pages or 718M of pages before the zone is ignored. What the user sees is that free memory far higher than it should be. To avoid an excessive number of pages being reclaimed from the larger zones, explicitely defines the "balance gap" to be either 1% of the zone or the low watermark for the zone, whichever is smaller. While kswapd will check all zones to apply pressure, it'll ignore zones that meets the (high_wmark + balance_gap) watermark. To test this, 80G were copied from a partition and the amount of memory being used was recorded. A comparison of a patch and unpatched kernel can be seen at http://www.csn.ul.ie/~mel/postings/minfree-20110222/memory-usage-hydra.ps and shows that kswapd is not reclaiming as much memory with the patch applied. Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: "Chen, Tim C" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mempolicy: remove redundant check in __mpol_equal()Namhyung Kim1-2/+1
The 'flags' field is already checked, no need to do it again. Signed-off-by: Namhyung Kim <[email protected]> Cc: Bob Liu <[email protected]> Cc: Lee Schermerhorn <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>