Age | Commit message | Author | Files | Lines
2011-06-16 | migrate: don't account swapcache as shmem | Andrea Arcangeli | 1 | -1/+1
swapcache will reach the below code path in migrate_page_move_mapping, and swapcache is accounted as NR_FILE_PAGES but it's not accounted as NR_SHMEM.
Hugh pointed out we must use PageSwapCache instead of comparing mapping to &swapper_space, to avoid build failure with CONFIG_SWAP=n.
Signed-off-by: Andrea Arcangeli <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
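The accounting rule above can be modelled in a few lines of user-space C. This is a sketch, not the kernel code: struct page_flags, struct zone_stats and migrate_accounting() are hypothetical names, and the sketch assumes a page counts as shmem when it is swap-backed but not in the swap cache (the PageSwapBacked()/PageSwapCache() pair).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two page flags that matter here. */
struct page_flags {
	bool swap_backed;  /* like PageSwapBacked(): shmem or anonymous memory */
	bool swap_cache;   /* like PageSwapCache(): page is in the swap cache */
};

/* Hypothetical per-zone counters. */
struct zone_stats {
	long nr_file_pages;
	long nr_shmem;
};

/* Move a mapped page's accounting from one zone to another. Swapcache
 * pages count as file pages but never as shmem, so only swap-backed
 * pages that are NOT in the swap cache adjust nr_shmem. */
static void migrate_accounting(struct zone_stats *from, struct zone_stats *to,
			       const struct page_flags *page)
{
	assert(from != NULL && to != NULL && page != NULL);
	from->nr_file_pages--;
	to->nr_file_pages++;
	if (page->swap_backed && !page->swap_cache) {
		from->nr_shmem--;
		to->nr_shmem++;
	}
}
```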
2011-06-16 | Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 | Linus Torvalds | 2 | -2/+3
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
  kbuild: Call depmod.sh via shell
  perf: clear out make flags when calling kernel make kernelver
2011-06-16 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 | Linus Torvalds | 17 | -213/+210
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  AFS: Use i_generation not i_version for the vnode uniquifier
  AFS: Set s_id in the superblock to the volume name
  vfs: Fix data corruption after failed write in __block_write_begin()
  afs: afs_fill_page reads too much, or wrong data
  VFS: Fix vfsmount overput on simultaneous automount
  fix wrong iput on d_inode introduced by e6bc45d65d
  Delay struct net freeing while there's a sysfs instance refering to it
  afs: fix sget() races, close leak on umount
  ubifs: fix sget races
  ubifs: split allocation of ubifs_info into a separate function
  fix leak in proc_set_super()
2011-06-16 | Merge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x | Linus Torvalds | 10 | -42/+140
* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x:
  sh: sh7724: Add USBHS DMAEngine support
  sh: ecovec: Add renesas_usbhs support
  sh, exec: remove redundant set_fs(USER_DS)
  drivers: sh: resume enabled clocks fix
  dmaengine: shdma: SH_DMAC_MAX_CHANNELS message fix
  sh: Fix up xchg/cmpxchg corruption with gUSA RB.
  sh: Remove compressed kernel libgcc dependency.
  sh: fix wrong icache/dcache address-array start addr in cache-debugfs.
2011-06-16 | Merge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x | Linus Torvalds | 4 | -71/+161
* 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x:
  ARM: mach-shmobile: mackerel: tidyup usbhs driver settings
  ARM: mach-shmobile: Correct SCIF port types for SH7367.
  ARM: mach-shmobile: sh73a0 gic_arch_extn.irq_set_wake() fix
  ARM: mach-shmobile: Mackerel USB platform data update
  ARM: mach-shmobile: AG5EVM SDHI1 platform data update
2011-06-16 | Merge branch 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-3.x | Linus Torvalds | 4 | -32/+20
* 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-3.x:
  fbdev: sh_mobile_hdmi: fix regression: statically enable RTPM
  fbdev/atyfb: Fix 2 defined-but-not-used warnings
  efifb: Fix call to wrong unregister function
  video: s3c-fb: move enabling channel for window
  video: s3c-fb: fix virtual resolution checking
  video: s3c-fb: fix misleading kfree in remove function
2011-06-16 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 | Linus Torvalds | 2 | -1/+39
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  SELinux: skip file_name_trans_write() when policy downgraded.
  selinux: fix case of names with whitespace/multibytes on /selinux/create
2011-06-16 | AFS: Use i_generation not i_version for the vnode uniquifier | David Howells | 3 | -10/+11
Store the AFS vnode uniquifier in the i_generation field, not the i_version field of the inode struct. i_version can then be given the AFS data version number.
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
2011-06-16 | AFS: Set s_id in the superblock to the volume name | David Howells | 1 | -0/+1
Set s_id in the superblock to the name of the AFS volume that this superblock corresponds to.
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
2011-06-16 | vfs: Fix data corruption after failed write in __block_write_begin() | Jan Kara | 1 | -3/+1
I've got a report of a file corruption from fsxlinux on ext3. The important operations to the page were:
  mapwrite to a hole
  partial write to the page
  read - found the page zeroed from the end of the normal write
The culprit seems to be that if get_block() fails in __block_write_begin() (e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page). Thus when we retry the write, the logic in __block_write_begin() thinks zeroing of the page is needed and overwrites old data.
In fact, I don't see why we should ever need to clear the uptodate bit here: either the page was uptodate when we entered __block_write_begin() and it should stay so when we leave it, or it was not uptodate and no one had the right to set it uptodate during __block_write_begin(), so it remains !uptodate when we leave as well. So just remove clearing of the bit.
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>
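A deliberately simplified model of the failure sequence, assuming a page reduced to an uptodate flag and a few bytes of data. struct page_model and write_begin_model() are hypothetical, and the real __block_write_begin() zeroes only the new buffers outside the written range rather than the whole page. The model shows why clearing the flag on error lets a retry destroy valid data.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a page: an uptodate flag plus a few bytes of data. */
struct page_model {
	bool uptodate;
	char data[8];
};

/* get_block_fails simulates a transient error such as ENOSPC;
 * clear_on_error reproduces the old ClearPageUptodate() behaviour.
 * A page that is not uptodate is zeroed before the write proceeds. */
static int write_begin_model(struct page_model *page, bool get_block_fails,
			     bool clear_on_error)
{
	assert(page != NULL);
	if (get_block_fails) {
		if (clear_on_error)
			page->uptodate = false;  /* old code: forgets the data is valid */
		return -1;
	}
	if (!page->uptodate)
		memset(page->data, 0, sizeof(page->data));  /* zero before writing */
	return 0;
}
```

With the old behaviour, a failed attempt followed by a successful retry wipes the page; without the clearing, the retry leaves the existing data alone.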
2011-06-16 | afs: afs_fill_page reads too much, or wrong data | Anton Blanchard | 1 | -12/+9
afs_fill_page should read the page that is about to be written but the current implementation has a number of issues. If we aren't extending the file we always read PAGE_CACHE_SIZE at offset 0. If we are extending the file we try to read the entire file.
Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset, clamped to i_size.
While here, avoid calling afs_fill_page when we are doing a PAGE_CACHE_SIZE write.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
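The offset and length calculation described above can be sketched as follows, assuming a 4096-byte page and that the page to fill is the one containing the write position. fill_page_len() and MODEL_PAGE_SIZE are hypothetical names, not the AFS code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration (the kernel uses PAGE_CACHE_SIZE). */
#define MODEL_PAGE_SIZE 4096u

/* Return how many bytes to read to fill the page containing file position
 * pos, storing the page-aligned start offset in *offset. The read is
 * clamped to i_size so it never asks for data beyond end of file. */
static uint64_t fill_page_len(uint64_t pos, uint64_t i_size, uint64_t *offset)
{
	assert(offset != 0);
	uint64_t start = pos - (pos % MODEL_PAGE_SIZE);

	*offset = start;
	if (start >= i_size)
		return 0;  /* page lies entirely past EOF: nothing to read */
	uint64_t remaining = i_size - start;
	return remaining < MODEL_PAGE_SIZE ? remaining : MODEL_PAGE_SIZE;
}
```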
2011-06-16 | staging: fix iio builds when IIO_RING_BUFFER is not enabled | Randy Dunlap | 2 | -2/+2
Fix build by moving enum list outside of #ifdef CONFIG_IIO_RING_BUFFER.
    drivers/staging/iio/accel/adis16201_core.c:413: error: 'ADIS16201_SCAN_SUPPLY' undeclared here (not in a function)
    drivers/staging/iio/accel/adis16201_core.c:417: error: 'ADIS16201_SCAN_TEMP' undeclared here (not in a function)
    ..
    drivers/staging/iio/accel/adis16203_core.c:374: error: 'ADIS16203_SCAN_SUPPLY' undeclared here (not in a function)
    drivers/staging/iio/accel/adis16203_core.c:378: error: 'ADIS16203_SCAN_AUX_ADC' undeclared here (not in a function)
    ..
Signed-off-by: Randy Dunlap <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-16 | VFS: Fix vfsmount overput on simultaneous automount | Al Viro | 1 | -8/+16
[Kudos to dhowells for tracking that crap down]
If two processes attempt to cause automounting on the same mountpoint at the same time, the vfsmount holding the mountpoint will be left with one too few references on it, causing a BUG when the kernel tries to clean up.
The problem is that lock_mount() drops the caller's reference to the mountpoint's vfsmount in the case where it finds something already mounted on the mountpoint as it transits to the mounted filesystem and replaces path->mnt with the new mountpoint vfsmount.
During a pathwalk, however, we don't take a reference on the vfsmount if it is the same as the one in the nameidata struct, but do_add_mount() doesn't know this.
The fix is to make sure we have a ref on the vfsmount of the mountpoint before calling do_add_mount(). However, if lock_mount() doesn't transit, we're then left with an extra ref on the mountpoint vfsmount which needs releasing. We can handle that in follow_managed() by not making assumptions about what we can and what we cannot get from lookup_mnt() as the current code does.
The callers of follow_managed() expect that reference to path->mnt will be grabbed iff path->mnt has been changed. follow_managed() and follow_automount() keep track of whether such reference has been grabbed and assume that it'll happen in those and only those cases that'll have us return with changed path->mnt. That assumption is almost correct - it breaks in case of racing automounts and in even harder to hit race between following a mountpoint and a couple of mount --move. The thing is, we don't need to make that assumption at all - after the end of loop in follow_managed() we can check if path->mnt has ended up unchanged and do mntput() if needed.
The BUG can be reproduced with the following test program:
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(int argc, char **argv)
    {
            int pid, ws;
            struct stat buf;

            pid = fork();
            /* parent and child both stat() the path at the same time */
            stat(argv[1], &buf);
            if (pid > 0)
                    wait(&ws);
            return 0;
    }
and the following procedure:
 (1) Mount an NFS volume that on the server has something else mounted on a subdirectory. For instance, I can mount / from my server:
        mount warthog:/ /mnt -t nfs4 -r
     On the server /data has another filesystem mounted on it, so NFS will see a change in FSID as it walks down the path, and will mark /mnt/data as being a mountpoint. This will cause the automount code to be triggered.
     !!! Do not look inside the mounted fs at this point !!!
 (2) Run the above program on a file within the submount to generate two simultaneous automount requests:
        /tmp/forkstat /mnt/data/testfile
 (3) Unmount the automounted submount:
        umount /mnt/data
 (4) Unmount the original mount:
        umount /mnt
At this point the kernel should throw a BUG with something like the following:
    BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]
Note that the bug appears on the root dentry of the original mount, not the mountpoint and not the submount because sys_umount() hasn't got to its final mntput_no_expire() yet, but this isn't so obvious from the call trace:
    [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
    [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
    [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
    [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
    [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
    [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
    [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
    [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
    [<ffffffff811862a8>] mntput+0x3b/0x44
    [<ffffffff81186d87>] release_mounts+0xa2/0xbf
    [<ffffffff811876af>] sys_umount+0x47a/0x4ba
    [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
    [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b
as do_umount() is inlined. However, you can see release_mounts() in there.
Note also that it may be necessary to have multiple CPU cores to be able to trigger this bug.
Tested-by: Jeff Layton <[email protected]>
Tested-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
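A single-threaded sketch of the reference-count rule the fix relies on: the caller must end up holding a new reference if and only if the mount it points at has changed. struct mnt_model and follow_model() are hypothetical; a non-NULL mounted_on_top stands in for lock_mount() finding something already mounted and transiting to it, and the race itself is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vfsmount's reference count. */
struct mnt_model {
	int refs;
};

/* On entry *mnt is borrowed (no reference of our own). Pin it before the
 * mount attempt; if something is already mounted there, transit to it and
 * move the reference along. Finally, rather than assuming, check whether
 * we ended up where we started and drop the extra pin if so. */
static void follow_model(struct mnt_model **mnt, struct mnt_model *mounted_on_top)
{
	assert(mnt != NULL && *mnt != NULL);
	struct mnt_model *orig = *mnt;

	(*mnt)->refs++;                  /* ref taken before do_add_mount() */
	if (mounted_on_top != NULL) {    /* transit: drop old ref, take new one */
		(*mnt)->refs--;
		mounted_on_top->refs++;
		*mnt = mounted_on_top;
	}
	if (*mnt == orig)
		(*mnt)->refs--;              /* unchanged: release the extra ref */
}
```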
2011-06-16 | fix wrong iput on d_inode introduced by e6bc45d65d | Török Edwin | 1 | -1/+3
Git bisection shows that commit e6bc45d65df8599fdbae73be9cec4ceed274db53 causes BUG_ONs under high I/O load:
    kernel BUG at fs/inode.c:1368!
    [ 2862.501007] Call Trace:
    [ 2862.501007] [<ffffffff811691d8>] d_kill+0xf8/0x140
    [ 2862.501007] [<ffffffff81169c19>] dput+0xc9/0x190
    [ 2862.501007] [<ffffffff8115577f>] fput+0x15f/0x210
    [ 2862.501007] [<ffffffff81152171>] filp_close+0x61/0x90
    [ 2862.501007] [<ffffffff81152251>] sys_close+0xb1/0x110
    [ 2862.501007] [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b
A reliable way to reproduce this bug is: Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk, and apt-get remove openjdk-6-jdk.
The buggy part of the patch is this:
            struct inode *inode = NULL;
    .....
    -               if (nd.last.name[nd.last.len])
    -                       goto slashes;
                    inode = dentry->d_inode;
    -               if (inode)
    -                       ihold(inode);
    +               if (nd.last.name[nd.last.len] || !inode)
    +                       goto slashes;
    +               ihold(inode)
    ...
            if (inode)
                    iput(inode);    /* truncate the inode here */
If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken), and dentry->d_inode is non-NULL, then this code now does an additional iput on the inode, which is wrong.
Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.
Reference: https://lkml.org/lkml/2011/6/15/50
Reported-by: Norbert Preining <[email protected]>
Reported-by: Török Edwin <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Török Edwin <[email protected]>
Signed-off-by: Al Viro <[email protected]>
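The corrected control flow can be modelled with a plain reference count. This is a sketch under assumptions: struct inode_model, ihold_model(), iput_model() and unlink_model() are hypothetical, and -1 stands in for the real error codes. The point is that inode stays NULL on the trailing-slash path, so the shared exit never drops a reference that was not taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode with just a reference count. */
struct inode_model {
	int i_count;
};

static void ihold_model(struct inode_model *inode)
{
	assert(inode->i_count > 0);  /* ihold() requires an existing reference */
	inode->i_count++;
}

static void iput_model(struct inode_model *inode) { inode->i_count--; }

/* 'inode' is only assigned once we know we will take a reference. */
static int unlink_model(struct inode_model *d_inode, bool trailing_slash)
{
	struct inode_model *inode = NULL;
	int error = 0;

	if (trailing_slash)
		goto slashes;        /* inode stays NULL: no reference taken */
	inode = d_inode;
	if (!inode)
		goto slashes;
	ihold_model(inode);
	/* ... the real code removes the directory entry here ... */
	goto out;
slashes:
	error = -1;              /* stands in for -ENOENT / -EISDIR / -ENOTDIR */
out:
	if (inode)
		iput_model(inode);   /* drops exactly the reference taken above */
	return error;
}
```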
2011-06-16 | mm: get rid of the most spurious find_vma_prev() users | Linus Torvalds | 1 | -9/+3
We have some users of this function that date back to before the vma list was doubly linked, and just are silly. These days, you can find the previous vma by just following the vma->vm_prev pointer.
In some cases you don't need any find_vma() lookup at all, and in other cases you're better off with the regular "find_vma()" that uses the vma cache front-end lookup.
Some "find_vma_prev()" users are still valid, though. For example, in the case of a stack that grows up, it can be the case that we don't find any 'vma' at all (because we're looking up an address that is past the last vma), and that the stack that we want to grow is the 'prev' vma.
But that kind of special case aside, we generally should prefer to use 'find_vma()'.
Noticed due to a totally unrelated POWER memory corruption bug that just happened to hit in 'find_vma_prev()' and made me go "Hmm - why are we using that function here?".
Signed-off-by: Linus Torvalds <[email protected]>
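How find_vma() plus the vm_prev pointer replaces find_vma_prev() can be sketched on a simplified VMA list. struct vma_model and find_vma_model() are hypothetical; the function keeps find_vma()'s contract (the first VMA whose end lies above the address) but uses a linear walk instead of the kernel's rbtree and cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct: a sorted, doubly
 * linked list of [vm_start, vm_end) ranges. */
struct vma_model {
	unsigned long vm_start, vm_end;
	struct vma_model *vm_next, *vm_prev;
};

/* Return the first VMA with addr < vm_end, or NULL if addr lies past the
 * last VMA. The returned VMA may start above addr (addr in a gap). */
static struct vma_model *find_vma_model(struct vma_model *first, unsigned long addr)
{
	for (struct vma_model *vma = first; vma != NULL; vma = vma->vm_next)
		if (addr < vma->vm_end)
			return vma;
	return NULL;
}
```

When the lookup succeeds, find_vma_model(first, addr)->vm_prev gives the previous VMA directly. Only when it returns NULL, because the address lies past the last VMA as in the grows-up stack case, is a separate way of locating the last VMA still needed.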
2011-06-16 | sh: sh7724: Add USBHS DMAEngine support | Kuninori Morimoto | 2 | -0/+48
Signed-off-by: Kuninori Morimoto <[email protected]>
Signed-off-by: Paul Mundt <[email protected]>
2011-06-16 | sh: ecovec: Add renesas_usbhs support | Kuninori Morimoto | 1 | -0/+48
Signed-off-by: Kuninori Morimoto <[email protected]>
Signed-off-by: Paul Mundt <[email protected]>
2011-06-15 | Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm | Linus Torvalds | 22 | -57/+86
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: footbridge: fix clock event support
  ARM: footbridge: fix debug macros
  ARM: initrd: disable initrds outside of memory
  ARM: extend Code: line by one 16-bit quantity for Thumb instructions
  ARM: 6955/1: cmpxchg syscall should data abort if page not write
  ARM: 6954/1: zImage: fix Thumb2 breakage
  ARM: 6953/1: DT: don't try to access physical address zero
  ARM: 6949/2: mach-u300: fix compilaton warning in IO accessors
  Revert "ARM: 6944/1: mm: allow ASID 0 to be allocated to tasks"
  Revert "ARM: 6943/1: mm: use TTBR1 instead of reserved context ID"
  davinci: make PCM platform devices static
  arm: davinci: Fix fallout from generic irq chip conversion
  ARM: 6894/1: mmci: trigger card detect IRQs on falling and rising edges
  ARM: 6952/1: fix lockdep warning of "unannotated irqs-off"
  ARM: 6951/1: include .bss in memory layout information
  ARM: 6948/1: Fix .size directives for __arm{7,9}tdmi_proc_info
  ARM: 6947/2: mach-u300: fix compilation error in timer
  ARM: 6946/1: vexpress: move v2m clock init to init_early
  ARM: mx51/sdma: Check the chip revision in run-time
  arm: mxs: include asm/processor.h for cpu_relax()
2011-06-15 | Revert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP" | Linus Torvalds | 1 | -1/+1
This reverts commit 7f81c8890c15a10f5220bebae3b6dfae4961962a.
It turns out that it's not actually a build-time check on x86-64 UML, which does some seriously crazy stuff with VM_STACK_FLAGS.
The VM_STACK_FLAGS define depends on the arch-supplied VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have
arch/um/sys-x86_64/shared/sysdep/vm-flags.h:
    #define VM_STACK_DEFAULT_FLAGS \
            (test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags)

    #define VM_STACK_DEFAULT_FLAGS vm_stack_flags
(yes, seriously: two different #define's for that thing, with the first one being inside an "#ifdef TIF_IA32")
It's possible that it is UML that should just be fixed in this area, but for now let's just undo the (very small) optimization.
Reported-by: Randy Dunlap <[email protected]>
Acked-by: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Weinberger <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | Documentation: fix cgroup typos and formatting | Jörg Sommer | 3 | -13/+13
Fix format and spelling.
Signed-off-by: Jörg Sommer <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | Documentation: update cgroupfs mount point | Jörg Sommer | 11 | -94/+109
According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to mount cgroupfs on") the canonical mountpoint for the cgroup filesystem is /sys/fs/cgroup. Hence, this should be used in the documentation.
Signed-off-by: Jörg Sommer <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | Documentation: update kmemleak supported archs | Maxin B. John | 1 | -1/+3
Instead of listing the architectures that are supported by kmemleak in Documentation/kmemleak.txt, just refer people to the list of supported architectures in lib/Kconfig.debug so that Documentation/kmemleak.txt does not need more updates for this.
Signed-off-by: Maxin B. John <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | Documentation: update printk-formats.txt | Andrew Murray | 1 | -2/+117
This patch updates the incomplete documentation concerning the printk extended format specifiers.
Signed-off-by: Andrew Murray <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -1/+5
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Check if lowest_mask is initialized in find_lowest_rq()
  sched: Fix need_resched() when checking peempt
2011-06-15 | alpha: fix several security issues | Dan Rosenberg | 1 | -4/+7
Fix several security issues in Alpha-specific syscalls. Untested, but mostly trivial.
1. Signedness issue in osf_getdomainname allows copying out-of-bounds kernel memory to userland.
2. Signedness issue in osf_sysinfo allows copying large amounts of kernel memory to userland.
3. Typo (?) in osf_getsysinfo bounds minimum instead of maximum copy size, allowing copying large amounts of kernel memory to userland.
4. Usage of user pointer in osf_wait4 while under KERNEL_DS allows privilege escalation via writing return value of sys_wait4 to kernel memory.
Signed-off-by: Dan Rosenberg <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
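Issues 1 to 3 share a pattern that is easy to show in isolation. The two functions below are hypothetical, generic illustrations rather than the osf_* code: the first lets a negative signed length slip past an upper-bound check, and the second rejects it and clamps to the maximum.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Buggy pattern: a negative len passes the upper-bound check and then
 * becomes a huge size_t when used as a copy length. */
static size_t copy_len_buggy(int len, size_t avail)
{
	assert(avail <= INT_MAX);
	if (len > (int)avail)
		len = (int)avail;
	return (size_t)len;
}

/* Safer pattern: reject negative lengths first, then clamp to the
 * maximum that may be copied. */
static size_t copy_len_checked(int len, size_t avail)
{
	if (len < 0)
		return 0;  /* a syscall would typically return -EINVAL here */
	return (size_t)len < avail ? (size_t)len : avail;
}
```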
2011-06-15 | drivers/misc/apds990x.c: apds990x_chip_on() should depend on CONFIG_PM || CONFIG_PM_RUNTIME | Geert Uytterhoeven | 1 | -0/+2
Fixes this warning:
    drivers/misc/apds990x.c: At top level:
    drivers/misc/apds990x.c:613: warning: `apds990x_chip_on' defined but not used
Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Samu Onkalo <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | ksm: fix NULL pointer dereference in scan_get_next_rmap_item() | Hugh Dickins | 1 | -0/+6
Andrea Righi reported a case where an exiting task can race against ksmd::scan_get_next_rmap_item (http://lkml.org/lkml/2011/6/1/742) easily triggering a NULL pointer dereference in ksmd.
    ksm_scan.mm_slot == &ksm_mm_head with only one registered mm

    CPU 1 (__ksm_exit)            CPU 2 (scan_get_next_rmap_item)
                                  list_empty() is false
    lock                          slot == &ksm_mm_head
    list_del(slot->mm_list)
    (list now empty)
    unlock
                                  lock
                                  slot = list_entry(slot->mm_list.next)
                                  (list is empty, so slot is still ksm_mm_head)
                                  unlock
                                  slot->mm == NULL ... Oops
Close this race by revalidating that the new slot is not simply the list head again.
Andrea's test case:
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define BUFSIZE getpagesize()

    int main(int argc, char **argv)
    {
            void *ptr;

            /* posix_memalign() returns an error number rather than setting errno */
            if ((errno = posix_memalign(&ptr, getpagesize(), BUFSIZE)) != 0) {
                    perror("posix_memalign");
                    exit(1);
            }
            if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
                    perror("madvise");
                    exit(1);
            }
            /* crash deliberately so the task exits while registered with ksm */
            *(char *)NULL = 0;

            return 0;
    }
Reported-by: Andrea Righi <[email protected]>
Tested-by: Andrea Righi <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Chris Wright <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
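The revalidation can be sketched with a minimal circular list in the style of list_head. All names here (struct list_node, struct slot_model, next_slot() and the list helpers) are hypothetical user-space stand-ins. The sketch only shows that once the last slot is deleted, the head's next pointer is the head itself, and that the fix is to treat that as "no slot" instead of dereferencing the head's mm.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head) { head->next = head->prev = head; }

static void list_add_tail_node(struct list_node *entry, struct list_node *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_node(struct list_node *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

/* Hypothetical stand-in for struct mm_slot. mm_list is the first member,
 * so a pointer to it converts back to a pointer to the containing slot. */
struct slot_model {
	struct list_node mm_list;
	void *mm;
};

/* Advance from cur to the next slot; if the list emptied under us, the
 * next entry is the head itself (mm == NULL), so report "no slot". */
static struct slot_model *next_slot(struct slot_model *head, struct slot_model *cur)
{
	struct slot_model *slot = (struct slot_model *)cur->mm_list.next;

	if (slot == head)
		return NULL;
	assert(slot->mm != NULL);
	return slot;
}
```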
2011-06-15 | rtc: fix build warnings in defconfigs | Wanlong Gao | 11 | -11/+11
RTC_CLASS is changed to bool, so 'm' is invalid.
Signed-off-by: Wanlong Gao <[email protected]>
Acked-by: Mike Frysinger <[email protected]>
Acked-by: Wolfram Sang <[email protected]>
Acked-by: Hans-Christian Egtvedt <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
Cc: Guan Xuetao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | drivers/tty/serial/pch_uart.c: don't oops if dmi_get_system_info returns NULL | Alexander Stein | 1 | -1/+3
If dmi_get_system_info() returns NULL, pch_uart_init_port() will dereference a NULL pointer.
This oops was observed on an Atom based board which has no BIOS, but a bootloader which doesn't provide DMI data.
Signed-off-by: Alexander Stein <[email protected]>
Cc: Greg KH <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
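The fix amounts to a NULL check before any string comparison. A minimal sketch, assuming a hypothetical helper board_name_contains() that receives whatever dmi_get_system_info() returned; it is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* board_name may be NULL on systems without DMI data, so test it before
 * handing it to any string function. */
static bool board_name_contains(const char *board_name, const char *wanted)
{
	assert(wanted != NULL);
	return board_name != NULL && strstr(board_name, wanted) != NULL;
}
```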
2011-06-15 | drivers/char/hpet.c: fix periodic-emulation for delayed interrupts | Nils Carlson | 1 | -2/+23
When interrupts are delayed due to interrupt masking or due to other interrupts being serviced the HPET periodic-emulation would fail. This happened because given an interval t and a time for the current interrupt m we would compute the next time as t + m. This works until we are delayed for > t, in which case we would be writing a new value which is in fact in the past.
This can be solved by computing the next time instead as (k * t) + m where k is large enough to be in the future. The exact computation of k is described in a comment to the code.
More detail:
Assuming an interval of 5 between each expected interrupt we have a normal case of
    t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
    t5: interrupt, read t5 from comparator, set next interrupt t5 + 5
    t10: interrupt, read t10 from comparator, set next interrupt t10 + 5
    ...
So, what happens when the interrupt is serviced too late?
    t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
    t11: delayed interrupt serviced, read t5 from comparator, set next interrupt t5 + 5, which is in the past!
    ... counter loops ...
    t10: Much much later, get the next interrupt.
This can happen either because we have interrupts masked for too long (some stupid driver goes on a printk rampage) or just because we are pushing the limits of the interval (too small a period), or both most probably.
My solution is to read the main counter as well and set the next interrupt to occur at the right interval, for example:
    t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
    t11: delayed interrupt serviced, read t5 from comparator, set next interrupt t15 as t10 has been missed.
    t15: back on track.
Signed-off-by: Nils Carlson <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Clemens Ladisch <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
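The (k * t) + m computation can be sketched in C. next_compare() is a hypothetical helper, not the driver's code; it assumes a 32-bit counter that wraps modulo 2^32 and that the current counter value is less than 2^32 ticks past m, and it picks the smallest k that puts the next interrupt strictly in the future.

```c
#include <assert.h>
#include <stdint.h>

/* m: time of the interrupt being serviced (value read from the comparator)
 * t: period; now: current main-counter value.
 * Unsigned 32-bit arithmetic wraps modulo 2^32, like the counter itself. */
static uint32_t next_compare(uint32_t m, uint32_t t, uint32_t now)
{
	assert(t > 0);
	uint32_t elapsed = now - m;     /* ticks since the last programmed time */
	uint32_t k = elapsed / t + 1;   /* periods needed to get past now */
	return m + k * t;
}
```

With the numbers from the delayed example above (t = 5, comparator read back as 5, main counter already at 11), next_compare(5, 5, 11) gives 15, matching the "t15: back on track" step.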
2011-06-15 | Documentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal-schedule.txt | [email protected] | 1 | -17/+0
Commit a77aea92010acf ("cgroup: remove the ns_cgroup") removed the ns_cgroup but it forgot to remove the related doc in feature-removal-schedule.txt.
Signed-off-by: WANG Cong <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | mm: compaction: abort compaction if too many pages are isolated and caller is asynchronous V2 | Mel Gorman | 1 | -5/+24
Asynchronous compaction is used when promoting to huge pages. This is all very nice, but if a number of processes are compacting memory, a large number of pages can be isolated. An "asynchronous" process can stall for long periods of time as a result, with a user reporting that firefox can stall for 10s of seconds. This patch aborts asynchronous compaction if too many pages are isolated, as it's better to fail a hugepage promotion than stall a process.
[[email protected]: return COMPACT_PARTIAL for abort]
Reported-and-tested-by: Ury Stankevich <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
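A sketch of the abort decision, under stated assumptions: struct lru_counts, isolation_check() and enum compact_status are hypothetical (the enumerator names borrow the kernel's result codes), and the threshold (more isolated pages than half of those still on the LRU lists) is assumed for illustration. Synchronous callers, which would throttle and retry, are simply modelled as continuing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone LRU counters. */
struct lru_counts {
	unsigned long active, inactive, isolated;
};

enum compact_status { COMPACT_CONTINUE, COMPACT_PARTIAL };

/* Assumed threshold: more isolated pages than half of the LRU pages. */
static bool too_many_isolated(const struct lru_counts *z)
{
	return z->isolated > (z->active + z->inactive) / 2;
}

/* Async callers give up with a partial result instead of stalling. */
static enum compact_status isolation_check(const struct lru_counts *z, bool sync)
{
	assert(z != NULL);
	if (too_many_isolated(z) && !sync)
		return COMPACT_PARTIAL;
	return COMPACT_CONTINUE;
}
```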
2011-06-15 | mm: vmscan: do not use page_count without a page pin | Andrea Arcangeli | 1 | -2/+14
It is unsafe to run page_count during the physical pfn scan because compound_head could trip on a dangling pointer when reading page->first_page if the compound page is being freed by another CPU.
[[email protected]: split out patch]
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | mm: compaction: ensure that the compaction free scanner does not move to the next zone | Mel Gorman | 1 | -1/+12
Compaction works with two scanners, a migration and a free scanner. When the scanners crossover, migration within the zone is complete. The location of the scanner is recorded on each cycle to avoid excessive scanning.
When a zone is small and mostly reserved, it's very easy for the migration scanner to be close to the end of the zone. Then the following situation can occur:
  o migration scanner isolates some pages near the end of the zone
  o free scanner starts at the end of the zone but finds that the migration scanner is already there
  o free scanner gets reinitialised for the next cycle as cc->migrate_pfn + pageblock_nr_pages, moving the free scanner into the next zone
  o migration scanner moves into the next zone
When this happens, NR_ISOLATED accounting goes haywire because some of the accounting happens against the wrong zone. One zone's counter remains positive while the other goes negative even though the overall global count is accurate. This was reported on X86-32 with !SMP because !SMP allows the negative counters to be visible. The bug should theoretically be possible on SMP as well; it is just not visible there.
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | compaction: checks correct fragmentation index | Shaohua Li | 1 | -2/+4
fragmentation_index() returns -1000 when the allocation might succeed. This doesn't match the comment and code in compaction_suitable(). I thought compaction_suitable should return COMPACT_PARTIAL in the -1000 case, because in this case allocation could succeed depending on watermarks.
The impact of this is that compaction starts and compact_finished() is called which rechecks the watermarks and the free lists. It should have the same result in that compaction should not start but is more expensive.
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
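The intended check can be sketched as a small decision function. suitable_model() and enum compact_status are hypothetical; extfrag_threshold is passed in rather than read from a sysctl, and watermark_ok stands in for the zone watermark test.

```c
#include <assert.h>
#include <stdbool.h>

enum compact_status { COMPACT_SKIPPED, COMPACT_CONTINUE, COMPACT_PARTIAL };

/* fragindex follows the fragmentation_index() convention described above:
 * -1000 means the allocation might already succeed; a value from 0 up to
 * the threshold means failure is due to lack of memory, not fragmentation. */
static enum compact_status suitable_model(int fragindex, int extfrag_threshold,
					  bool watermark_ok)
{
	assert(extfrag_threshold >= 0);
	if (fragindex >= 0 && fragindex <= extfrag_threshold)
		return COMPACT_SKIPPED;   /* compaction would not help */
	if (fragindex == -1000 && watermark_ok)
		return COMPACT_PARTIAL;   /* allocation should succeed already */
	return COMPACT_CONTINUE;
}
```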
2011-06-15 | mm/memory-failure.c: fix page isolated count mismatch | Minchan Kim | 1 | -1/+3
Pages isolated for migration are accounted with the vmstat counters NR_ISOLATED_[ANON|FILE]. Callers of migrate_pages() are expected to increment these counters when pages are isolated from the LRU. Once the pages have been migrated, they are put back on the LRU or freed and the isolated count is decremented.
Memory failure is not properly accounting for pages it isolates, causing the NR_ISOLATED counters to be negative. On SMP builds, this goes unnoticed as negative counters are treated as 0 due to expected per-cpu drift. On UP builds, the counter is treated by too_many_isolated() as a large value, causing processes to enter D state during page reclaim or compaction. This patch accounts for pages isolated by memory failure correctly.
[[email protected]: rewrote changelog]
Reviewed-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
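The counter arithmetic described above can be reproduced in user space. read_counter(), isolate_page() and putback_page() are hypothetical: read_counter() contrasts an SMP-style reader that clamps negative sums to 0 with a UP-style reader that returns the raw value as unsigned long, and the helpers show that skipping the increment on isolation leaves the counter at -1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* SMP: tolerate per-cpu drift by clamping at 0. UP: return the raw value,
 * so -1 converted to unsigned long becomes ULONG_MAX. */
static unsigned long read_counter(long raw, bool smp)
{
	if (smp && raw < 0)
		raw = 0;
	return (unsigned long)raw;
}

/* The bug was isolating without the increment (account == false). */
static void isolate_page(long *nr_isolated, bool account)
{
	assert(nr_isolated != NULL);
	if (account)
		(*nr_isolated)++;
}

/* Putting a page back always decrements the isolated count. */
static void putback_page(long *nr_isolated)
{
	assert(nr_isolated != NULL);
	(*nr_isolated)--;
}
```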
2011-06-15 | gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL | Josh Triplett | 2 | -2/+2
CONFIG_CONSTRUCTORS controls support for running constructor functions at kernel init time. According to commit b99b87f70c7785ab ("kernel: constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However, CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it, and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have CONFIG_GCOV_KERNEL select it, so that the normal case of CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.
Observed in the short list of =y values in a minimal kernel configuration.
Signed-off-by: Josh Triplett <[email protected]>
Acked-by: WANG Cong <[email protected]>
Acked-by: Peter Oberparleiter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | MAINTAINERS: add entry for legacy eeprom driver | Jean Delvare | 1 | -0/+6
I shall maintain the legacy eeprom driver, until we finally get rid of it.
Signed-off-by: Jean Delvare <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | memcg: avoid percpu cached charge draining at softlimit | KAMEZAWA Hiroyuki | 1 | -1/+7
Based on Michal Hocko's comment.
We are not draining per cpu cached charges during soft limit reclaim because background reclaim doesn't care about charges. It tries to free some memory and charges will not give any. Cached charges might influence only selection of the biggest soft limit offender but as the call is done only after the selection has been already done it makes no change.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Daisuke Nishimura <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | memcg: fix percpu cached charge draining frequency | KAMEZAWA Hiroyuki | 1 | -16/+38
For performance, memory cgroup caches some "charge" from res_counter into a per cpu cache. This works well, but because it's a cache, it needs to be flushed in some cases. Typical cases are:
  1. when someone hits the limit.
  2. when rmdir() is called and charges need to drop to 0.
But "1" has a problem.
Recently, with large SMP machines, we see many kworker runs because of flushing memcg's cache. The problem with the implementation is that the drain code is called even if a cpu's cache belongs to a memcg unrelated to the memcg which hit the limit.
This patch does
  A) check whether the percpu cache contains useful data or not.
  B) check that no other asynchronous percpu draining is running.
  C) don't call the local cpu callback.
(*) This patch avoids changing the calling condition with hard-limit.
When I run "cat 1Gfile > /dev/null" under a 300M limit memcg,
[Before]
    13767 kamezawa 20 0 98.6m 424 416 D 10.0 0.0 0:00.61 cat
    58 root 20 0 0 0 0 S 0.6 0.0 0:00.09 kworker/2:1
    60 root 20 0 0 0 0 S 0.6 0.0 0:00.08 kworker/4:1
    4 root 20 0 0 0 0 S 0.3 0.0 0:00.02 kworker/0:0
    57 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/1:1
    61 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/5:1
    62 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/6:1
    63 root 20 0 0 0 0 S 0.3 0.0 0:00.05 kworker/7:1
[After]
    2676 root 20 0 98.6m 416 416 D 9.3 0.0 0:00.87 cat
    2626 kamezawa 20 0 15192 1312 920 R 0.3 0.0 0:00.28 top
    1 root 20 0 19384 1496 1204 S 0.0 0.0 0:00.66 init
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
    3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
    4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0
[[email protected]: make percpu_charge_mutex static, tweak comments]
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Daisuke Nishimura <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Tested-by: Ying Han <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
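Checks A to C can be sketched as a loop over per-CPU caches. struct memcg_model, struct stock_model and schedule_drains() are hypothetical, and "related memcg" is simplified to pointer equality, whereas the real code also considers the cgroup hierarchy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct memcg_model { int id; };

/* Hypothetical per-cpu charge cache. */
struct stock_model {
	const struct memcg_model *cached;  /* memcg whose charges are cached */
	bool drain_pending;                /* async drain already scheduled */
};

/* Queue asynchronous drains only where they can help; returns the count. */
static int schedule_drains(struct stock_model *stock, int ncpus, int this_cpu,
			   const struct memcg_model *over_limit)
{
	int queued = 0;

	assert(stock != NULL && over_limit != NULL);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (cpu == this_cpu)
			continue;                 /* (C) no callback for the local cpu */
		if (stock[cpu].cached != over_limit)
			continue;                 /* (A) cache holds nothing useful */
		if (stock[cpu].drain_pending)
			continue;                 /* (B) a drain is already running */
		stock[cpu].drain_pending = true;
		queued++;
	}
	return queued;
}
```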
2011-06-15 | memcg: fix wrong check of noswap with softlimit | KAMEZAWA Hiroyuki | 1 | -1/+1
Hierarchical reclaim doesn't swap out if the memsw and resource limits are the same (memsw_is_minimum == true) because we would hit the mem+swap limit anyway (during hard limit reclaim). For soft limit reclaim we shouldn't consider memsw_is_minimum at all, because it doesn't make much sense: either the soft limit is below the hard limit, and then we cannot hit the mem+swap limit, or direct reclaim takes precedence. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Acked-by: Daisuke Nishimura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
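The decision described above reduces to a single condition. The helper below is a hypothetical sketch (its name and signature are not from the kernel) that expresses the fixed rule: memsw_is_minimum forces noswap only when the reclaim is not driven by the soft limit.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should hierarchical reclaim avoid swap?
 *   noswap           - caller already asked for no swapping
 *   check_soft       - reclaim is driven by the soft limit
 *   memsw_is_minimum - mem+swap limit equals the memory limit
 */
static bool reclaim_noswap(bool noswap, bool check_soft, bool memsw_is_minimum)
{
	/*
	 * Swapping cannot help under hard limit reclaim when memsw == mem,
	 * but for soft limit reclaim memsw_is_minimum is ignored.
	 */
	if (!check_soft && memsw_is_minimum)
		return true;
	return noswap;
}
```

Before the fix, per the commit text, the memsw_is_minimum check also applied to soft limit reclaim, which in this model corresponds to dropping the `!check_soft` term.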
2011-06-15 | memcg: clear mm->owner when last possible owner leaves | KAMEZAWA Hiroyuki | 1 | -16/+15
The following crash was reported:
> Call Trace:
>  [<ffffffff81139792>] mem_cgroup_from_task+0x15/0x17
>  [<ffffffff8113a75a>] __mem_cgroup_try_charge+0x148/0x4b4
>  [<ffffffff810493f3>] ? need_resched+0x23/0x2d
>  [<ffffffff814cbf43>] ? preempt_schedule+0x46/0x4f
>  [<ffffffff8113afe8>] mem_cgroup_charge_common+0x9a/0xce
>  [<ffffffff8113b6d1>] mem_cgroup_newpage_charge+0x5d/0x5f
>  [<ffffffff81134024>] khugepaged+0x5da/0xfaf
>  [<ffffffff81078ea0>] ? __init_waitqueue_head+0x4b/0x4b
>  [<ffffffff81133a4a>] ? add_mm_counter.constprop.5+0x13/0x13
>  [<ffffffff81078625>] kthread+0xa8/0xb0
>  [<ffffffff814d13e8>] ? sub_preempt_count+0xa1/0xb4
>  [<ffffffff814d5664>] kernel_thread_helper+0x4/0x10
>  [<ffffffff814ce858>] ? retint_restore_args+0x13/0x13
>  [<ffffffff8107857d>] ? __init_kthread_worker+0x5a/0x5a
What happens is that khugepaged tries to charge a huge page against an mm whose last possible owner has already exited, and the memory controller crashes when the stale mm->owner is used to look up the cgroup to charge. mm->owner has never been set to NULL when the last owner went away, but nobody cared until khugepaged came along. Even then it wasn't a problem, because the final mmput() on an mm was forced to acquire and release mmap_sem in write mode, preventing an exiting owner from going away while the mmap_sem was held, and until "692e0b3 mm: thp: optimize memcg charge in khugepaged" the memory cgroup charge was protected by mmap_sem in read mode. Instead of going back to relying on the mmap_sem to enforce the lifetime of a task, this patch ensures that mm->owner is properly set to NULL when the last possible owner is exiting, which the memory controller can handle just fine.
[[email protected]: tweak comments] Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Hugh Dickins <[email protected]> Reported-by: Dave Jones <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
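The core idea (clear the owner pointer instead of leaving it dangling, and make the lookup tolerate NULL) can be sketched with a toy model. Every type and function below is hypothetical; in this model a NULL owner simply means the charge is not attributed to any cgroup, and how the real charge path proceeds in that case is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct cgroup_sketch { int id; };
struct task_sketch { struct cgroup_sketch *memcg; };
struct mm_sketch { struct task_sketch *owner; };

/* Lookup that tolerates a NULL owner instead of dereferencing it. */
static struct cgroup_sketch *memcg_from_task(struct task_sketch *p)
{
	if (p == NULL)
		return NULL;
	return p->memcg;
}

/* Returns 1 and sets *charged if a cgroup was found, 0 otherwise. */
static int try_charge(struct mm_sketch *mm, struct cgroup_sketch **charged)
{
	struct cgroup_sketch *cg = memcg_from_task(mm->owner);

	*charged = cg;
	return cg != NULL;
}

/* The last possible owner exits: clear the pointer, don't leave it stale. */
static void owner_exit_last(struct mm_sketch *mm)
{
	mm->owner = NULL;
}
```

A later charge on the same mm (as khugepaged does in the trace above) then sees NULL rather than a pointer to an exited task.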
2011-06-15 | memcg: fix init_page_cgroup nid with sparsemem | KAMEZAWA Hiroyuki | 1 | -18/+53
Commit 21a3c9646873 ("memcg: allocate memory cgroup structures in local nodes") made page_cgroup allocation NUMA aware. But that caused a problem (https://bugzilla.kernel.org/show_bug.cgi?id=36192): a NID was read from invalid struct pages, which had not been initialized because they lay outside any node, i.e. outside [node_start_pfn, node_end_pfn). Currently, with sparsemem, page_cgroup_init() scans pfns from 0 to max_pfn, so it may reach a pfn that is not on any node and access a memmap that is not initialized. This patch makes page_cgroup_init() for SPARSEMEM node aware and removes the code that gets the nid from page->flags, so a valid NID is always used. [[email protected]: try to fix up comments] Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
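The change from "scan 0..max_pfn" to "scan each node's own pfn range" can be illustrated with a small model. Everything here is hypothetical (a tiny section size, a two-node layout with a hole, an array recording which nid each section was set up for); the point is that the nid comes from the loop over nodes, and pfns in holes between nodes are never visited.

```c
#include <assert.h>

#define PAGES_PER_SECTION_SKETCH 4UL /* tiny power-of-two section size */
#define MAX_PFN_SKETCH 32UL
#define NR_SECTIONS_SKETCH (MAX_PFN_SKETCH / PAGES_PER_SECTION_SKETCH)

/* Hypothetical description of one node's pfn span [start_pfn, end_pfn). */
struct node_span {
	int nid;
	unsigned long start_pfn, end_pfn;
};

/* Which nid each section was initialised for; -1 means never touched. */
static int section_nid[NR_SECTIONS_SKETCH];

static void init_page_cgroup_sketch(const struct node_span *nodes, int nr_nodes)
{
	for (unsigned long s = 0; s < NR_SECTIONS_SKETCH; s++)
		section_nid[s] = -1;

	for (int i = 0; i < nr_nodes; i++) {
		unsigned long start = nodes[i].start_pfn;
		unsigned long end = nodes[i].end_pfn;

		assert(end <= MAX_PFN_SKETCH);
		/* Walk only this node's range, one section at a time; the nid
		 * is taken from the node being walked, not from page->flags. */
		for (unsigned long pfn = start & ~(PAGES_PER_SECTION_SKETCH - 1);
		     pfn < end; pfn += PAGES_PER_SECTION_SKETCH)
			section_nid[pfn / PAGES_PER_SECTION_SKETCH] = nodes[i].nid;
	}
}
```

With nodes covering pfns [0, 8) and [16, 28), sections 2, 3 and 7 form holes and stay untouched, whereas a plain 0..max_pfn scan would visit them.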
2011-06-15 | mm: memory.numa_stat: fix file permission | KAMEZAWA Hiroyuki | 1 | -0/+1
Commit 406eb0c9ba76 ("memcg: add memory.numastat api for numa statistics") added the memory.numa_stat file to the memory cgroup, but the file's permissions are wrong:
[kamezawa@bluextal linux-2.6]$ ls -l /cgroup/memory/A/memory.numa_stat
---------- 1 root root 0 Jun 9 18:36 /cgroup/memory/A/memory.numa_stat
This patch fixes the permissions as follows:
[root@bluextal kamezawa]# ls -l /cgroup/memory/A/memory.numa_stat
-r--r--r-- 1 root root 0 Jun 10 16:49 /cgroup/memory/A/memory.numa_stat
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Ying Han <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | leds: fix the incorrect display in menuconfig | Eric Miao | 1 | -3/+2
It seems that when a config option does not depend on the menuconfig symbol, it messes up the display of the remaining options, even if it is a hidden one. Signed-off-by: Eric Miao <[email protected]> Cc: Richard Purdie <[email protected]> Cc: Valdis Kletnieks <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | mm: fix negative commitlimit when gigantic hugepages are allocated | Rafael Aquini | 1 | -0/+8
When 1GB hugepages are allocated on a system, free(1) reports less available memory than is actually installed in the box. Also, if the total size of hugepages allocated on a system is over half of the total memory size, CommitLimit becomes a negative number. The problem is that gigantic hugepages (order > MAX_ORDER) can only be allocated at boot with bootmem, so their frames are not accounted to 'totalram_pages'. However, they are accounted to hugetlb_total_pages(). What turns CommitLimit into a negative number is this calculation, in fs/proc/meminfo.c:
allowed = ((totalram_pages - hugetlb_total_pages()) * sysctl_overcommit_ratio / 100) + total_swap_pages;
A similar calculation occurs in __vm_enough_memory() in mm/mmap.c. Also, every vm statistic which depends on 'totalram_pages' will show confusing values, as if the system were 'missing' some part of its memory. Impact of this bug: when gigantic hugepages are allocated and sysctl_overcommit_memory == OVERCOMMIT_NEVER, __vm_enough_memory() goes through the 'allowed' calculation above and might mistakenly return -ENOMEM, forcing the system to start reclaiming pages earlier than usual, which could hurt overall system performance, depending on the workload. Besides this scenario, I can only think of this causing annoyances with memory reports from /proc/meminfo and free(1). [[email protected]: standardize comment layout] Reported-by: Russ Anderson <[email protected]> Signed-off-by: Rafael Aquini <[email protected]> Acked-by: Russ Anderson <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
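A worked example of the 'allowed' formula makes the sign flip concrete. The numbers are hypothetical: a box with 16 GiB of 4 KiB pages (4194304 pages, ignoring kernel reservations), ten 1 GiB hugepages (10 × 262144 = 2621440 pages), the default overcommit ratio of 50 and no swap. The kernel uses unsigned long for these counters; the sketch uses signed long so the negative result is visible.

```c
#include <assert.h>

/*
 * The CommitLimit formula quoted from fs/proc/meminfo.c, evaluated in
 * signed arithmetic. All quantities are in pages.
 */
static long commit_limit(long totalram_pages, long hugetlb_total_pages,
			 long overcommit_ratio, long total_swap_pages)
{
	return ((totalram_pages - hugetlb_total_pages) * overcommit_ratio / 100)
	       + total_swap_pages;
}
```

If the gigantic pages are missing from totalram_pages, the call sees 1572864 total pages against 2621440 hugetlb pages and returns -524288; counting them in gives 4194304 total pages and a limit of 786432 pages (3 GiB).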
2011-06-15 | mm/memory_hotplug.c: fix building of node hotplug zonelist | KAMEZAWA Hiroyuki | 1 | -0/+6
During memory hotplug we refresh zonelists when we online a page in a new zone. This means that the node's zonelist is not initialized until pages are onlined. So, for example, the "nid" passed by the MEM_GOING_ONLINE notifier will point to a NODE_DATA(nid) which has no zone fallback list. Moreover, if we hot-add cpu-only nodes, alloc_pages() will do no fallback. This patch builds a zonelist when a new pgdat becomes available. Note: in production at Fujitsu, memory should be onlined before cpus, and our servers had no memory-less nodes, so we saw no problems. But recent changes to MEM_GOING_ONLINE handling in page_cgroup access the node's uninitialized zonelist. In any case, memory-less nodes exist and need some care. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
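Why a missing zonelist matters for a cpu-only node can be shown with a toy model. All names are hypothetical and the model is far simpler than the kernel: each node has a free-page count and a fallback list of node ids, allocation walks that list, and a node whose list was never built cannot fall back at all. Building the list as soon as the node appears, which is what this patch does for a new pgdat, lets allocations on a memory-less node fall back to a node that has memory.

```c
#include <assert.h>

#define MAX_NODES_SKETCH 4

/* Hypothetical per-node state: free pages plus a fallback list of nids. */
struct node_sketch {
	long free_pages;
	int zonelist[MAX_NODES_SKETCH]; /* preferred order, -1 terminated */
	int zonelist_built;
};

static struct node_sketch nodes[MAX_NODES_SKETCH];

/* Build a fallback list: the local node first, then the others by id. */
static void build_zonelist_sketch(int nid, int nr_nodes)
{
	int n = 0;

	assert(nr_nodes <= MAX_NODES_SKETCH);
	nodes[nid].zonelist[n++] = nid;
	for (int other = 0; other < nr_nodes; other++)
		if (other != nid)
			nodes[nid].zonelist[n++] = other;
	if (n < MAX_NODES_SKETCH)
		nodes[nid].zonelist[n] = -1;
	nodes[nid].zonelist_built = 1;
}

/* Allocate one page for @nid; returns the serving node, or -1 on failure. */
static int alloc_page_sketch(int nid)
{
	if (!nodes[nid].zonelist_built)
		return -1; /* no fallback list yet: nothing to walk */
	for (int i = 0; i < MAX_NODES_SKETCH && nodes[nid].zonelist[i] >= 0; i++) {
		int cand = nodes[nid].zonelist[i];

		if (nodes[cand].free_pages > 0) {
			nodes[cand].free_pages--;
			return cand;
		}
	}
	return -1;
}
```

In this model, hot-adding a cpu-only node 1 next to node 0 fails every allocation for node 1 until its list is built; afterwards the allocation is served from node 0.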
2011-06-15 | init/calibrate.c: remove annoying printk | Borislav Petkov | 1 | -3/+0
Remove calibrate_delay_direct()'s KERN_DEBUG printk related to the bogomips calculation: it appears for every core brought up at boot on setups using 'ignore_loglevel', whose dmesg people scan for possible issues. As the message doesn't show very useful information to the widest audience of kernel boot message gazers, it should be removed. Introduced by commit d2b463135f84 ("init/calibrate.c: fix for critical bogoMIPS intermittent calculation failure"). Signed-off-by: Borislav Petkov <[email protected]> Cc: Andrew Worsley <[email protected]> Cc: Phil Carmody <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | w1: W1_MASTER_DS1WM should depend on GENERIC_HARDIRQS | Geert Uytterhoeven | 1 | -1/+1
On m68k (which doesn't support generic hardirqs yet):
drivers/w1/masters/ds1wm.c: In function `ds1wm_probe':
drivers/w1/masters/ds1wm.c: error: implicit declaration of function `irq_set_irq_type'
Signed-off-by: Geert Uytterhoeven <[email protected]> Cc: Evgeniy Polyakov <[email protected]> Cc: Jean-François Dagenais <[email protected]> Cc: Matt Reimer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-06-15 | include/asm-generic/pgtable.h: fix unbalanced parenthesis | Nicolas Kaiser | 1 | -1/+1
Signed-off-by: Nicolas Kaiser <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>