aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2012-01-06Merge branch 'core-locking-for-linus' of ↵Linus Torvalds4-16/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep/waitqueues: Add better annotation lockdep, bug: Exclude TAINT_OOT_MODULE from disabling lock debugging lockdep: Print lock name in lockdep_init_error() init/main.c: Execute lockdep_init() as early as possible lockdep, kmemcheck: Annotate ->lock in lockdep_init_map() lockdep, rtmutex, bug: Show taint flags on error lockdep, bug: Exclude TAINT_FIRMWARE_WORKAROUND from disabling lockdep lockdep: Always try to set ->class_cache in register_lock_class() lockdep_init_map()
2012-01-06Merge branch 'core-debugobjects-for-linus' of ↵Linus Torvalds1-6/+56
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timer: Use debugobjects to catch deletion of uninitialized timers timer: Setup uninitialized timer with a stub callback debugobjects: Extend to assert that an object is initialized debugobjects: Be smarter about static objects
2012-01-05cgroup: fix to allow mounting a hierarchy by nameLi Zefan1-3/+3
If we mount a hierarchy with a specified name, the name is unique, and we can use it to mount the hierarchy without specifying its set of subsystem names. This feature is documented is Documentation/cgroups/cgroups.txt section 2.3 Here's an example: # mount -t cgroup -o cpuset,name=myhier xxx /cgroup1 # mount -t cgroup -o name=myhier xxx /cgroup2 But it was broken by commit 32a8cf235e2f192eb002755076994525cdbaa35a (cgroup: make the mount options parsing more accurate) This fixes the regression. Signed-off-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Cc: [email protected]
2012-01-05Merge branch 'devel-stable' into for-linusRussell King1-5/+7
Conflicts: arch/arm/kernel/setup.c arch/arm/mach-shmobile/board-kota2.c
2012-01-05Merge branch 'pm-sleep' into pm-for-linusRafael J. Wysocki1-0/+64
* pm-sleep: PM / Hibernate: Implement compat_ioctl for /dev/snapshot
2012-01-05PM / Hibernate: Implement compat_ioctl for /dev/snapshotBen Hutchings1-0/+64
This allows uswsusp built for i386 to run on an x86_64 kernel (tested with Debian package version 1.0+20110509-2). References: http://bugs.debian.org/502816 Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2012-01-04ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detachOleg Nesterov2-3/+12
This is the temporary simple fix for 3.2, we need more changes in this area. 1. do_signal_stop() assumes that the running untraced thread in the stopped thread group is not possible. This was our goal but it is not yet achieved: a stopped-but-resumed tracee can clone the running thread which can initiate another group-stop. Remove WARN_ON_ONCE(!current->ptrace). 2. A new thread always starts with ->jobctl = 0. If it is auto-attached and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr) in do_jobctl_trap() if another debugger attaches. Change __ptrace_unlink() to set the artificial SIGSTOP for report. Alternatively we could change ptrace_init_task() to copy signr from current, but this means we can copy it for no reason and hide the possible similar problems. Acked-by: Tejun Heo <[email protected]> Cc: <[email protected]> [3.1] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-01-04ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE raceOleg Nesterov1-1/+8
Test-case: int main(void) { int pid, status; pid = fork(); if (!pid) { for (;;) { if (!fork()) return 0; if (waitpid(-1, &status, 0) < 0) { printf("ERR!! wait: %m\n"); return 0; } } } assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0); assert(waitpid(-1, NULL, 0) == pid); assert(ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACEFORK) == 0); do { ptrace(PTRACE_CONT, pid, 0, 0); pid = waitpid(-1, NULL, 0); } while (pid > 0); return 1; } It fails because ->real_parent sees its child in EXIT_DEAD state while the tracer is going to change the state back to EXIT_ZOMBIE in wait_task_zombie(). The offending commit is 823b018e which moved the EXIT_DEAD check, but in fact we should not blame it. The original code was not correct as well because it didn't take ptrace_reparented() into account and because we can't really trust ->ptrace. This patch adds the additional check to close this particular race but it doesn't solve the whole problem. We simply can't rely on ->ptrace in this case, it can be cleared if the tracer is multithreaded by the exiting ->parent. I think we should kill EXIT_DEAD altogether, we should always remove the soon-to-be-reaped child from ->children or at least we should never do the DEAD->ZOMBIE transition. But this is too complex for 3.2. Reported-and-tested-by: Denys Vlasenko <[email protected]> Tested-by: Lukasz Michalik <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: <[email protected]> [3.0+] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-01-04cgroup: move assignement out of condition in cgroup_attach_proc()Dan Carpenter1-2/+5
Gcc complains about this: "kernel/cgroup.c:2179:4: warning: suggest parentheses around assignment used as truth value [-Wparentheses]" Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2012-01-03auditsc: propage umode_tAl Viro1-2/+2
Signed-off-by: Al Viro <[email protected]>
2012-01-03switch kern_ipc_perm to umode_tAl Viro1-5/+5
Signed-off-by: Al Viro <[email protected]>
2012-01-03switch mq_open() to umode_tAl Viro1-3/+3
2012-01-03sysctl: use umode_t for table permissionsAl Viro1-1/+1
Signed-off-by: Al Viro <[email protected]>
2012-01-03cgroup: propagate mode_tAl Viro1-7/+7
Signed-off-by: Al Viro <[email protected]>
2012-01-03switch debugfs to umode_tAl Viro4-4/+4
Signed-off-by: Al Viro <[email protected]>
2012-01-03switch vfs_mkdir() and ->mkdir() to umode_tAl Viro1-2/+2
vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <[email protected]>
2012-01-03fs: move code out of buffer.cAl Viro1-1/+0
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export kill_bdev as well, so brd doesn't have to open code it. Reduce buffer_head.h requirement accordingly. Removed a rather large comment from invalidate_bdev, as it looked a bit obsolete to bother moving. The small comment replacing it says enough. Signed-off-by: Nick Piggin <[email protected]> Cc: Al Viro <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2012-01-03get rid of timer in kern/acct.cAl Viro1-30/+10
... and clean it up a bit, while we are at it Signed-off-by: Al Viro <[email protected]>
2012-01-03hung_task: fix false positive during vforkMandeep Singh Baines1-4/+10
vfork parent uninterruptibly and unkillably waits for its child to exec/exit. This wait is of unbounded length. Ignore such waits in the hung_task detector. Signed-off-by: Mandeep Singh Baines <[email protected]> Reported-by: Sasha Levin <[email protected]> LKML-Reference: <1325344394.28904.43.camel@lappy> Cc: Linus Torvalds <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrew Morton <[email protected]> Cc: John Kacur <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2012-01-02misc latin1 to utf8 conversionsAl Viro2-2/+2
Signed-off-by: Al Viro <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2011-12-31futex: Fix uninterruptible loop due to gate_areaHugh Dickins1-8/+20
It was found (by Sasha) that if you use a futex located in the gate area we get stuck in an uninterruptible infinite loop, much like the ZERO_PAGE issue. While looking at this problem, PeterZ realized you'll get into similar trouble when hitting any install_special_pages() mapping. And are there still drivers setting up their own special mmaps without page->mapping, and without special VM or pte flags to make get_user_pages fail? In most cases, if page->mapping is NULL, we do not need to retry at all: Linus points out that even /proc/sys/vm/drop_caches poses no problem, because it ends up using remove_mapping(), which takes care not to interfere when the page reference count is raised. But there is still one case which does need a retry: if memory pressure called shmem_writepage in between get_user_pages_fast dropping page table lock and our acquiring page lock, then the page gets switched from filecache to swapcache (and ->mapping set to NULL) whatever the refcount. Fault it back in to get the page->mapping needed for key->shared.inode. Reported-by: Sasha Levin <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2011-12-30Revert "clockevents: Set noop handler in clockevents_exchange_device()"Linus Torvalds1-1/+0
This reverts commit de28f25e8244c7353abed8de0c7792f5f883588c. It results in resume problems for various people. See for example http://thread.gmane.org/gmane.linux.kernel/1233033 http://thread.gmane.org/gmane.linux.kernel/1233389 http://thread.gmane.org/gmane.linux.kernel/1233159 http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877 and the fedora and ubuntu bug reports https://bugzilla.redhat.com/show_bug.cgi?id=767248 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569 which got bisected down to the stable version of this commit. Reported-by: Jonathan Nieder <[email protected]> Reported-by: Phil Miller <[email protected]> Reported-by: Philip Langdale <[email protected]> Reported-by: Tim Gardner <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Greg KH <[email protected]> Cc: [email protected] # for stable kernels that applied the original Signed-off-by: Linus Torvalds <[email protected]>
2011-12-28irq: check domain hwirq range for DT translateRob Herring1-0/+3
A DT node may have more than 1 domain associated with it, so make sure the hwirq number is within range when doing DT translation. Signed-off-by: Rob Herring <[email protected]> Acked-by: Grant Likely <[email protected]> Acked-by: Shawn Guo <[email protected]> Cc: Thomas Gleixner <[email protected]>
2011-12-27cgroup: Remove task_lock() from cgroup_post_fork()Frederic Weisbecker1-3/+12
cgroup_post_fork() is protected between threadgroup_change_begin() and threadgroup_change_end() against concurrent changes of the child's css_set in cgroup_task_migrate(). Also the child can't exit and call cgroup_exit() at this stage, this means it's css_set can't be changed with init_css_set concurrently. For these reasons, we don't need to hold task_lock() on the child because it's css_set can only remain stable in this place. Let's remove the lock there. v2: Update comment to explain that we are safe against cgroup_exit() Signed-off-by: Frederic Weisbecker <[email protected]> Acked-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Cc: Containers <[email protected]> Cc: Cgroups <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul Menage <[email protected]> Cc: Mandeep Singh Baines <[email protected]>
2011-12-27cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()Kirill A. Shutemov1-0/+2
Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2011-12-27cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as staticKirill A. Shutemov1-2/+2
Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2011-12-27Merge remote-tracking branch 'tip/perf/core' into kvm-updates/3.3Avi Kivity12-306/+476
* tip/perf/core: (66 commits) perf, x86: Expose perf capability to other modules perf, x86: Implement arch event mask as quirk x86, perf: Disable non available architectural events jump_label: Provide jump_label_key initializers jump_label, x86: Fix section mismatch perf, core: Rate limit perf_sched_events jump_label patching perf: Fix enable_on_exec for sibling events perf: Remove superfluous arguments perf, x86: Prefer fixed-purpose counters when scheduling perf, x86: Fix event scheduler for constraints with overlapping counters perf, x86: Implement event scheduler helper functions perf: Avoid a useless pmu_disable() in the perf-tick x86/tools: Add decoded instruction dump mode x86: Update instruction decoder to support new AVX formats x86/tools: Fix insn_sanity message outputs x86/tools: Fix instruction decoder message output x86: Fix instruction decoder to handle grouped AVX instructions x86/tools: Fix Makefile to build all test tools perf test: Soft errors shouldn't stop the "Validate PERF_RECORD_" test perf test: Validate PERF_RECORD_ events and perf_sample fields ... Signed-off-by: Avi Kivity <[email protected]> * commit 'b3d9468a8bd218a695e3a0ff112cd4efd27b670a': (66 commits) perf, x86: Expose perf capability to other modules perf, x86: Implement arch event mask as quirk x86, perf: Disable non available architectural events jump_label: Provide jump_label_key initializers jump_label, x86: Fix section mismatch perf, core: Rate limit perf_sched_events jump_label patching perf: Fix enable_on_exec for sibling events perf: Remove superfluous arguments perf, x86: Prefer fixed-purpose counters when scheduling perf, x86: Fix event scheduler for constraints with overlapping counters perf, x86: Implement event scheduler helper functions perf: Avoid a useless pmu_disable() in the perf-tick x86/tools: Add decoded instruction dump mode x86: Update instruction decoder to support new AVX formats x86/tools: Fix insn_sanity message outputs x86/tools: Fix instruction decoder message output x86: Fix instruction decoder to handle grouped AVX instructions x86/tools: Fix Makefile to build all test tools perf test: Soft errors shouldn't stop the "Validate PERF_RECORD_" test perf test: Validate PERF_RECORD_ events and perf_sample fields ...
2011-12-27jump-label: export jump_label_inc/jump_label_decXiao Guangrong1-0/+2
Export these two symbols, they will be used by KVM mmu audit Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2011-12-25Merge branch 'pm-sleep' into pm-for-linusRafael J. Wysocki13-366/+273
* pm-sleep: (51 commits) PM: Drop generic_subsys_pm_ops PM / Sleep: Remove forward-only callbacks from AMBA bus type PM / Sleep: Remove forward-only callbacks from platform bus type PM: Run the driver callback directly if the subsystem one is not there PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers PM / Sleep: Merge internal functions in generic_ops.c PM / Sleep: Simplify generic system suspend callbacks PM / Hibernate: Remove deprecated hibernation snapshot ioctls PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled() PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep() PM / Sleep: Make [un]lock_system_sleep() generic PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count() PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else' Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer PM / Sleep: Unify diagnostic messages from device suspend/resume ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR PM / Hibernate: Remove deprecated hibernation test modes PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path ... Conflicts: kernel/kmod.c
2011-12-25Merge branch 'pm-misc' into pm-for-linusRafael J. Wysocki2-6/+4
* pm-misc: CPU: Add right qualifiers for alloc_frozen_cpus() and cpu_hotplug_pm_sync_init() PM / Usermodehelper: Cleanup remnants of usermodehelper_pm_callback()
2011-12-23ARM: 7235/1: irqdomain: export irq_domain_simple_ops for !CONFIG_OFJamie Iles1-5/+7
irqdomain support is used in interrupt controller drivers that may not have device tree support but only need the basic HW->Linux irq translation. Rather than having each of these implement their own IRQ domain, allow them to use the simple ops. Acked-by: Thomas Gleixner <[email protected]> Acked-by: Rob Herring <[email protected]> Cc: Grant Likely <[email protected]> Signed-off-by: Jamie Iles <[email protected]> Signed-off-by: Russell King <[email protected]>
2011-12-23sched/tracing: Add a new tracepoint for sleeptimeArun Sharma2-2/+1
If CONFIG_SCHEDSTATS is defined, the kernel maintains information about how long the task was sleeping or in the case of iowait, blocking in the kernel before getting woken up. This will be useful for sleep time profiling. Note: this information is only provided for sched_fair. Other scheduling classes may choose to provide this in the future. Note: the delay includes the time spent on the runqueue as well. Signed-off-by: Arun Sharma <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Andrew Vagin <[email protected]> Cc: Frederic Weisbecker <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-12-23sched: Disable scheduler warnings during oopsesDave Jones1-0/+3
The panic-on-framebuffer code seems to cause a schedule to occur during an oops. This causes a bunch of extra spew as can be seen in: https://bugzilla.redhat.com/attachment.cgi?id=549230 Don't do scheduler debug checks when we are oopsing already. Signed-off-by: Dave Jones <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-12-22cgroup: only need to check oldcgrp==newgrp onceMandeep Singh Baines1-16/+6
In cgroup_attach_proc it is now sufficient to only check that oldcgrp==newcgrp once. Now that we are using threadgroup_lock() during the migrations, oldcgrp will not change. Signed-off-by: Mandeep Singh Baines <[email protected]> Acked-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Cc: [email protected] Cc: [email protected] Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul Menage <[email protected]>
2011-12-22cgroup: remove redundant get/put of task structMandeep Singh Baines1-9/+2
threadgroup_lock() guarantees that the target threadgroup will remain stable - no new task will be added, no new PF_EXITING will be set and exec won't happen. Changes in V2: * https://lkml.org/lkml/2011/12/20/369 (Tejun Heo) * Undo incorrect removal of get/put from attach_task_by_pid() * Author * Remove a comment which is made stale by this change Signed-off-by: Mandeep Singh Baines <[email protected]> Acked-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Cc: [email protected] Cc: [email protected] Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul Menage <[email protected]>
2011-12-22cgroup: remove redundant get/put of old css_set from migrateMandeep Singh Baines1-20/+8
We can now assume that the css_set reference held by the task will not go away for an exiting task. PF_EXITING state can be trusted throughout migration by checking it after locking threadgroup. Changes in V4: * https://lkml.org/lkml/2011/12/20/368 (Tejun Heo) * Fix typo in commit message * Undid the rename of css_set_check_fetched * https://lkml.org/lkml/2011/12/20/427 (Li Zefan) * Fix comment in cgroup_task_migrate() Changes in V3: * https://lkml.org/lkml/2011/12/20/255 (Frederic Weisbecker) * Fixed to put error in retval Changes in V2: * https://lkml.org/lkml/2011/12/19/289 (Tejun Heo) * Updated commit message -tj: removed stale patch description about dropped function rename. Signed-off-by: Mandeep Singh Baines <[email protected]> Acked-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Cc: [email protected] Cc: [email protected] Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul Menage <[email protected]>
2011-12-21clockevents: remove sysdev.hKay Sievers1-1/+0
This isn't needed in the clockevents.c file, and the header file is going away soon, so just remove the #include Cc: Thomas Gleixner <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2011-12-21cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystemKay Sievers1-21/+19
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Paul Mundt <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Tigran Aivazian <[email protected]> Cc: Len Brown <[email protected]> Cc: Zhang Rui <[email protected]> Cc: Dave Jones <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Russell King <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: "Srivatsa S. Bhat" <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2011-12-21Merge branch 'master' into pm-sleepRafael J. Wysocki26-84/+467
* master: (848 commits) SELinux: Fix RCU deref check warning in sel_netport_insert() binary_sysctl(): fix memory leak mm/vmalloc.c: remove static declaration of va from __get_vm_area_node ipmi_watchdog: restore settings when BMC reset oom: fix integer overflow of points in oom_badness memcg: keep root group unchanged if creation fails nilfs2: potential integer overflow in nilfs_ioctl_clean_segments() nilfs2: unbreak compat ioctl cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask evm: prevent racing during tfm allocation evm: key must be set once during initialization mmc: vub300: fix type of firmware_rom_wait_states module parameter Revert "mmc: enable runtime PM by default" mmc: sdhci: remove "state" argument from sdhci_suspend_host x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT IB/qib: Correct sense on freectxts increment and decrement RDMA/cma: Verify private data length cgroups: fix a css_set not found bug in cgroup_attach_proc oprofile: Fix uninitialized memory access when writing to writing to oprofilefs Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel" ... Conflicts: kernel/cgroup_freezer.c
2011-12-21cgroup: Remove unnecessary task_lock before fetching css_set on migrationFrederic Weisbecker1-10/+10
When we fetch the css_set of the tasks on cgroup migration, we don't need anymore to synchronize against cgroup_exit() that could swap the old one with init_css_set. Now that we are using threadgroup_lock() during the migrations, we don't need to worry about it anymore. Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Mandeep Singh Baines <[email protected]> Reviewed-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Cc: Containers <[email protected]> Cc: Cgroups <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul Menage <[email protected]>
2011-12-21cgroup: Drop task_lock(parent) on cgroup_fork()Frederic Weisbecker1-6/+17
We don't need to hold the parent task_lock() on the parent in cgroup_fork() because we are already synchronized against the two places that may change the parent css_set concurrently: - cgroup_exit(), but the parent obviously can't exit concurrently - cgroup migration: we are synchronized against threadgroup_lock() So we can safely remove the task_lock() there. Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Cc: Containers <[email protected]> Cc: Cgroups <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul Menage <[email protected]> Cc: Mandeep Singh Baines <[email protected]>
2011-12-21sched: Fix cgroup movement of waking processDaisuke Nishimura1-1/+3
There is a small race between try_to_wake_up() and sched_move_task(), which is trying to move the process being woken up. try_to_wake_up() on CPU0 sched_move_task() on CPU1 --------------------------------+--------------------------------- raw_spin_lock_irqsave(p->pi_lock) task_waking_fair() ->p.se.vruntime -= cfs_rq->min_vruntime ttwu_queue() ->send reschedule IPI to CPU1 raw_spin_unlock_irqsave(p->pi_lock) task_rq_lock() -> tring to aquire both p->pi_lock and rq->lock with IRQ disabled task_move_group_fair() -> p.se.vruntime -= (old)cfs_rq->min_vruntime += (new)cfs_rq->min_vruntime task_rq_unlock() (via IPI) sched_ttwu_pending() raw_spin_lock(rq->lock) ttwu_do_activate() ... enqueue_entity() child.se->vruntime += cfs_rq->min_vruntime raw_spin_unlock(rq->lock) As a result, vruntime of the process becomes far bigger than min_vruntime, if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime. This patch fixes this problem by just ignoring such process in task_move_group_fair(), because the vruntime has already been normalized in task_waking_fair(). Signed-off-by: Daisuke Nishimura <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-12-21sched: Fix cgroup movement of newly created processDaisuke Nishimura1-0/+13
There is a small race between do_fork() and sched_move_task(), which is trying to move the child. do_fork() sched_move_task() --------------------------------+--------------------------------- copy_process() sched_fork() task_fork_fair() -> vruntime of the child is initialized based on that of the parent. -> we can see the child in "tasks" file now. task_rq_lock() task_move_group_fair() -> child.se.vruntime -= (old)cfs_rq->min_vruntime += (new)cfs_rq->min_vruntime task_rq_unlock() wake_up_new_task() ... enqueue_entity() child.se.vruntime += cfs_rq->min_vruntime As a result, vruntime of the child becomes far bigger than min_vruntime, if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime. This patch fixes this problem by just ignoring such process in task_move_group_fair(), because the vruntime has already been normalized in task_fork_fair(). Signed-off-by: Daisuke Nishimura <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-12-21sched: Fix cgroup movement of forking processDaisuke Nishimura1-2/+5
There is a small race between task_fork_fair() and sched_move_task(), which is trying to move the parent. task_fork_fair() sched_move_task() --------------------------------+--------------------------------- cfs_rq = task_cfs_rq(current) -> cfs_rq is the "old" one. curr = cfs_rq->curr -> curr is set to the parent. task_rq_lock() dequeue_task() ->parent.se.vruntime -= (old)cfs_rq->min_vruntime enqueue_task() ->parent.se.vruntime += (new)cfs_rq->min_vruntime task_rq_unlock() raw_spin_lock_irqsave(rq->lock) se->vruntime = curr->vruntime -> vruntime of the child is set to that of the parent which has already been updated by sched_move_task(). se->vruntime -= (old)cfs_rq->min_vruntime. raw_spin_unlock_irqrestore(rq->lock) As a result, vruntime of the child becomes far bigger than expected, if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime. This patch fixes this problem by setting "cfs_rq" and "curr" after holding the rq->lock. Signed-off-by: Daisuke Nishimura <[email protected]> Acked-by: Paul Turner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-12-21sched: Remove cfs bandwidth period check in tg_set_cfs_period()Kamalesh Babulal1-3/+0
Remove cfs bandwidth period check from tg_set_cfs_period. Invalid bandwidth period's lower/upper limits are denoted by min_cfs_quota_period/max_cfs_quota_period repsectively, and are checked against valid period in tg_set_cfs_bandwidth(). As pjt pointed out, negative input will result in very large unsigned numbers and will be caught by the max allowed period test. Signed-off-by: Kamalesh Babulal <[email protected]> Acked-by: Paul Turner <[email protected]> [ammended changelog to mention negative values] Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] -- kernel/sched/core.c | 3 --- 1 file changed, 3 deletions(-) Signed-off-by: Ingo Molnar <[email protected]>
2011-12-21sched: Fix load-balance lock-breakingPeter Zijlstra1-7/+25
The current lock break relies on contention on the rq locks, something which might never come because we've got IRQs disabled. Or will be very likely because on anything with more than 2 cpus a synchronized load-balance pass will very likely cause contention on the rq locks. Also the sched_nr_migrate thing fails when it gets trapped the loops of either the cgroup muck in load_balance_fair() or the move_tasks() load condition. Instead, use the new lb_flags field to propagate break/abort conditions for all these loops and create a new loop outside the irq disabled on the break being required. Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-12-21sched: Replace all_pinned with a generic flags fieldPeter Zijlstra1-16/+19
Replace the all_pinned argument with a flags field so that we can add some extra controls throughout that entire call chain. Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-12-21sched: Only queue remote wakeups when crossing cache boundariesPeter Zijlstra3-30/+70
Mike reported a 13% drop in netperf TCP_RR performance due to the new remote wakeup code. Suresh too noticed some performance issues with it. Reducing the IPIs to only cross cache domains solves the observed performance issues. Reported-by: Suresh Siddha <[email protected]> Reported-by: Mike Galbraith <[email protected]> Acked-by: Suresh Siddha <[email protected]> Acked-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Chris Mason <[email protected]> Cc: Dave Kleikamp <[email protected]> Link: http://lkml.kernel.org/r/1323338531.17673.7.camel@twins Signed-off-by: Ingo Molnar <[email protected]>
2011-12-21lockdep/waitqueues: Add better annotationPeter Zijlstra1-2/+2
-> #2 (&tty->write_wait){-.-...}: is a lot more informative than: -> #2 (key#19){-.....}: Signed-off-by: Peter Zijlstra <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-12-21Merge commit 'v3.2-rc6' into core/lockingIngo Molnar18-53/+265
Merge reason: Pick up the latest fixes. Signed-off-by: Ingo Molnar <[email protected]>