aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2011-01-05Merge branch 'clksrc' into develRussell King1-0/+1
Conflicts: arch/arm/mach-vexpress/v2m.c arch/arm/plat-omap/counter_32k.c arch/arm/plat-versatile/Makefile
2011-01-05Merge commit 'v2.6.37' into perf/coreIngo Molnar2-1/+3
Merge reason: Add the final .37 tree. Signed-off-by: Ingo Molnar <[email protected]>
2011-01-05sched: Change wait_for_completion_*_timeout() to return a signed longNeilBrown1-2/+2
wait_for_completion_*_timeout() can return: 0: if the wait timed out -ve: if the wait was interrupted +ve: if the completion was completed. As they currently return an 'unsigned long', the last two cases are not easily distinguished which can easily result in buggy code, as is the case for the recently added wait_for_completion_interruptible_timeout() call in net/sunrpc/cache.c So change them both to return 'long'. As MAX_SCHEDULE_TIMEOUT is LONG_MAX, a large +ve return value should never overflow. Signed-off-by: NeilBrown <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-01-05Merge commit 'v2.6.37' into sched/coreIngo Molnar13-171/+361
Merge reason: Merge the final .37 tree. Signed-off-by: Ingo Molnar <[email protected]>
2011-01-04sched, autogroup: Fix reference leakMike Galbraith1-1/+1
The cgroup exit mess also uncovered a struct autogroup reference leak. copy_process() was simply freeing vs putting the signal_struct, stranding a reference. Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Oleg Nesterov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-01-04sched, autogroup: Fix potential access to freed memoryMike Galbraith1-8/+17
Oleg pointed out that the /proc interface kref_get() useage may race with the final put during autogroup_move_group(). A signal->autogroup assignment may be in flight when the /proc interface dereference, leaving them taking a reference to an already dead group. Reported-by: Oleg Nesterov <[email protected]> Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-01-04sched: Remove redundant CONFIG_CGROUP_SCHED ifdefYong Zhang1-4/+0
CONFIG_[FAIR|RT]_GROUP_SCHED always means CONFIG_CGROUP_SCHED Signed-off-by: Yong Zhang <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-01-04perf: Clean up power events by introducing new, more generic onesThomas Renninger2-0/+18
Add these new power trace events: power:cpu_idle power:cpu_frequency power:machine_suspend The old C-state/idle accounting events: power:power_start power:power_end Have now a replacement (but we are still keeping the old tracepoints for compatibility): power:cpu_idle and power:power_frequency is replaced with: power:cpu_frequency power:machine_suspend is newly introduced. Jean Pihet has a patch integrated into the generic layer (kernel/power/suspend.c) which will make use of it. the type= field got removed from both, it was never used and the type is differed by the event type itself. perf timechart userspace tool gets adjusted in a separate patch. Signed-off-by: Thomas Renninger <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Acked-by: Jean Pihet <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> LKML-Reference: <[email protected]>
2011-01-04perf: Do not export power_frequency, but power_start eventThomas Renninger1-1/+1
power_frequency moved to drivers/cpufreq/cpufreq.c which has to be compiled in, no need to export it. intel_idle can a be module though... Signed-off-by: Thomas Renninger <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Jean Pihet <[email protected]> Cc: Jean Pihet <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> LKML-Reference: <[email protected]>
2011-01-04Merge commit 'v2.6.37-rc8' into perf/coreIngo Molnar3-14/+63
Merge reason: pick up latest -rc. Signed-off-by: Ingo Molnar <[email protected]>
2011-01-03watchdog: Improve initialisation error message and documentationBen Hutchings1-1/+2
The error message 'NMI watchdog failed to create perf event...' does not make it clear that this is a fatal error for the watchdog. It also currently prints the error value as a pointer, rather than extracting the error code with PTR_ERR(). Fix that. Add a note to the description of the 'nowatchdog' kernel parameter to associate it with this message. Reported-by: Cesare Leonardi <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Don Zickus <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: <[email protected]> # .37.x and later LKML-Reference: <1294009362.3167.126.camel@localhost> Signed-off-by: Ingo Molnar <[email protected]>
2010-12-29fix freeing user_struct in user cacheHillf Danton1-0/+1
When racing on adding into user cache, the new allocated from mm slab is freed without putting user namespace. Since the user namespace is already operated by getting, putting has to be issued. Signed-off-by: Hillf Danton <[email protected]> Acked-by: Serge Hallyn <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2010-12-28Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds1-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ring_buffer: Off-by-one and duplicate events in ring_buffer_read_page
2010-12-24Merge branch 'for-linus' of ↵Linus Torvalds1-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: print out alloc information with KERN_DEBUG instead of KERN_INFO kthread_work: make lockdep happy
2010-12-24PM / Wakeup: Replace pm_check_wakeup_events() with pm_wakeup_pending()Rafael J. Wysocki3-4/+4
To avoid confusion with the meaning and return value of pm_check_wakeup_events() replace it with pm_wakeup_pending() that will work the other way around (ie. return true when system-wide power transition should be aborted). Signed-off-by: Rafael J. Wysocki <[email protected]>
2010-12-24PM / Hibernate: When failed, in_suspend should be resetMyungJoo Ham1-0/+1
When hibernation failed due to an error in swsusp_write() called by hibernate(), it skips calling "power_down()" and returns. When hibernate() is called again (probably after fixing up so that swsusp_write() wouldn't fail again), before "in_suspend = 1" of create_image is called, in_suspend should be 0. However, because hibernate() did not reset "in_suspend" after a failure, it's already 1. This patch fixes such inconsistency of "in_suspend" value. Signed-off-by: MyungJoo Ham <[email protected]> Signed-off-by: Kyungmin Park <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2010-12-24PM / Hibernate: hibernation_ops->leave should be checked tooMyungJoo Ham1-1/+1
Because hibernate calls hibernation_ops->leave() without checking whether hibernation_ops->leave is NULL or not, hiberantion_set_ops should WARN_ON if hibernation_ops->leave is NULL. This patch added one more condition to check hibernation_ops->leave. Signed-off-by: MyungJoo Ham <[email protected]> Signed-off-by: Kyungmin Park <[email protected]> Acked-by: Pavel Machek <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2010-12-24Freezer: Fix a race during freezing of TASK_STOPPED tasksTejun Heo2-2/+13
After calling freeze_task(), try_to_freeze_tasks() see whether the task is stopped or traced and if so, considers it to be frozen; however, nothing guarantees that either the task being frozen sees TIF_FREEZE or the freezer sees TASK_STOPPED -> TASK_RUNNING transition. The task being frozen may wake up and not see TIF_FREEZE while the freezer fails to notice the transition and believes the task is still stopped. This patch fixes the race by making freeze_task() always go through fake_signal_wake_up() for applicable tasks. The function goes through the target task's scheduler lock and thus guarantees that either the target sees TIF_FREEZE or try_to_freeze_task() sees TASK_RUNNING. Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2010-12-24PM: Use proper ccflag flag in kernel/power/MakefileTracey Dent1-4/+1
Use the ccflags-$ flag instead of EXTRA_CFLAGS because EXTRA_CFLAGS is deprecated and should now be switched. According to (documentation/kbuild/makefiles.txt). Signed-off-by: Tracey Dent <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2010-12-23ring_buffer: Off-by-one and duplicate events in ring_buffer_read_pageDavid Sharp1-1/+8
Fix two related problems in the event-copying loop of ring_buffer_read_page. The loop condition for copying events is off-by-one. "len" is the remaining space in the caller-supplied page. "size" is the size of the next event (or two events). If len == size, then there is just enough space for the next event. size was set to rb_event_ts_length, which may include the size of two events if the first event is a time-extend, in order to assure time- extends are kept together with the event after it. However, rb_advance_reader always advances by one event. This would result in the event after any time-extend being duplicated. Instead, get the size of a single event for the memcpy, but use rb_event_ts_length for the loop condition. Signed-off-by: David Sharp <[email protected]> LKML-Reference: <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2010-12-23module: Move RO/NX module protection to after ftrace module updateSteven Rostedt1-12/+12
The commit: 84e1c6bb38eb318e456558b610396d9f1afaabf0 x86: Add RO/NX protection for loadable kernel modules Broke the function tracer with this output: ------------[ cut here ]------------ WARNING: at kernel/trace/ftrace.c:1014 ftrace_bug+0x114/0x171() Hardware name: Precision WorkStation 470 Modules linked in: i2c_core(+) Pid: 86, comm: modprobe Not tainted 2.6.37-rc2+ #68 Call Trace: [<ffffffff8104e957>] warn_slowpath_common+0x85/0x9d [<ffffffffa00026db>] ? __process_new_adapter+0x7/0x34 [i2c_core] [<ffffffffa00026db>] ? __process_new_adapter+0x7/0x34 [i2c_core] [<ffffffff8104e989>] warn_slowpath_null+0x1a/0x1c [<ffffffff810a9dfe>] ftrace_bug+0x114/0x171 [<ffffffffa00026db>] ? __process_new_adapter+0x7/0x34 [i2c_core] [<ffffffff810aa0db>] ftrace_process_locs+0x1ae/0x274 [<ffffffffa00026db>] ? __process_new_adapter+0x7/0x34 [i2c_core] [<ffffffff810aa29e>] ftrace_module_notify+0x39/0x44 [<ffffffff814405cf>] notifier_call_chain+0x37/0x63 [<ffffffff8106e054>] __blocking_notifier_call_chain+0x46/0x5b [<ffffffff8106e07d>] blocking_notifier_call_chain+0x14/0x16 [<ffffffff8107ffde>] sys_init_module+0x73/0x1f3 [<ffffffff8100acf2>] system_call_fastpath+0x16/0x1b ---[ end trace 2aff4f4ca53ec746 ]--- ftrace faulted on writing [<ffffffffa00026db>] __process_new_adapter+0x7/0x34 [i2c_core] The cause was that the module text was set to read only before ftrace could convert the calls to mcount to nops. Thus, the conversions failed due to not being able to write to the text locations. The simple fix is to move setting the module to read only after the module notifiers are called (where ftrace sets the module mcounts to nops). Reported-by: Peter Zijlstra <[email protected]> Acked-by: Rusty Russell <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2010-12-23Merge branch 'rcu/next' of ↵Ingo Molnar9-253/+996
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu
2010-12-23Merge commit 'v2.6.37-rc7' into x86/securityIngo Molnar27-257/+570
2010-12-22taskstats: pad taskstats netlink response for aligment issues on ia64Jeff Mahoney1-13/+44
The taskstats structure is internally aligned on 8 byte boundaries but the layout of the aggregrate reply, with two NLA headers and the pid (each 4 bytes), actually force the entire structure to be unaligned. This causes the kernel to issue unaligned access warnings on some architectures like ia64. Unfortunately, some software out there doesn't properly unroll the NLA packet and assumes that the start of the taskstats structure will always be 20 bytes from the start of the netlink payload. Aligning the start of the taskstats structure breaks this software, which we don't want. So, for now the alignment only happens on architectures that require it and those users will have to update to fixed versions of those packages. Space is reserved in the packet only when needed. This ifdef should be removed in several years e.g. 2012 once we can be confident that fixed versions are installed on most systems. We add the padding before the aggregate since the aggregate is already a defined type. Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems") previously addressed the alignment issues by padding out the pid field. This was supposed to be a compatible change but the circumstances described above mean that it wasn't. This patch backs out that change, since it was a hack, and introduces a new NULL attribute type to provide the padding. Padding the response with 4 bytes avoids allocating an aligned taskstats structure and copying it back. Since the structure weighs in at 328 bytes, it's too big to do it on the stack. Signed-off-by: Jeff Mahoney <[email protected]> Reported-by: Brian Rogers <[email protected]> Cc: Jeff Mahoney <[email protected]> Cc: Guillaume Chazarain <[email protected]> Cc: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-12-22Fix rounding in clocks_calc_mult_shift()john stultz1-0/+1
Russell King reports: | On the ARM dev boards, we have a 32-bit counter running at 24MHz. Calling | clocks_calc_mult_shift(&mult, &shift, 24MHz, NSEC_PER_SEC, 60) gives | us a multiplier of 2796202666 and a shift of 26. | | Over a large counter delta, this produces an error - lets take a count | from 362976315 to 4280663372: | | (4280663372-362976315) * 2796202666 / 2^26 - (4280663372-362976315) * (1000/24) | => -38.91872422891230269990 | | Can we do better? | | (4280663372-362976315) * 2796202667 / 2^26 - (4280663372-362976315) * (1000/24) | 19.45936211449532822051 | | which is about twice as good as the 2796202666 multiplier. | | Looking at the equivalent divisions obtained, 2796202666 / 2^26 gives | 41.66666665673255920410ns per tick, whereas 2796202667 / 2^26 gives | 41.66666667163372039794ns. The actual value wanted is 1000/24 = | 41.66666666666666666666ns. Fix this by ensuring we round to nearest when calculating the multiplier. Signed-off-by: John Stultz <[email protected]> Tested-by: Santosh Shilimkar <[email protected]> Tested-by: Will Deacon <[email protected]> Tested-by: Mikael Pettersson <[email protected]> Tested-by: Eric Miao <[email protected]> Tested-by: Olof Johansson <[email protected]> Tested-by: Jamie Iles <[email protected]> Signed-off-by: Russell King <[email protected]>
2010-12-22hrtimer: fix a typo in commentNamhyung Kim1-1/+1
Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2010-12-22Merge branch 'master' into for-nextJiri Kosina33-291/+658
Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.
2010-12-22Merge branch 'tip/perf/core' of ↵Ingo Molnar2-4/+16
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2010-12-22Merge commit 'v2.6.37-rc7' into perf/coreIngo Molnar7-151/+260
Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar <[email protected]>
2010-12-22kthread_work: make lockdep happyYong Zhang1-0/+11
spinlock in kthread_worker and wait_queue_head in kthread_work both should be lockdep sensible, so change the interface to make it suiltable for CONFIG_LOCKDEP. tj: comment update Reported-by: Nicolas <[email protected]> Signed-off-by: Yong Zhang <[email protected]> Signed-off-by: Andy Walls <[email protected]> Tested-by: Andy Walls <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2010-12-20workqueue: allow chained queueing during destructionTejun Heo1-1/+59
Currently, destroy_workqueue() makes the workqueue deny all new queueing by setting WQ_DYING and flushes the workqueue once before proceeding with destruction; however, there are cases where work items queue more related work items. Currently, such users need to explicitly flush the workqueue multiple times depending on the possible depth of such chained queueing. This patch updates the queueing path such that a work item can queue further work items on the same workqueue even when WQ_DYING is set. The flush on destruction is automatically retried until the workqueue is empty. This guarantees that the workqueue is empty on destruction while allowing chained queueing. The flush retry logic whines if it takes too many retries to drain the workqueue. Signed-off-by: Tejun Heo <[email protected]> Cc: James Bottomley <[email protected]>
2010-12-20Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Remove debugging check
2010-12-19sched: Remove debugging checkIngo Molnar1-1/+0
Linus reported that the new warning introduced by commit f26f9aff6aaf "Sched: fix skip_clock_update optimization" triggers. The need_resched flag can be set by other CPUs asynchronously so this debug check is bogus - remove it. Reported-by: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-12-19Merge branches 'x86-fixes-for-linus' and 'perf-fixes-for-linus' of ↵Linus Torvalds2-8/+39
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32: Make sure we can map all of lowmem if we need to x86, vt-d: Handle previous faults after enabling fault handling x86: Enable the intr-remap fault handling after local APIC setup x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem() bootmem: Add alloc_bootmem_align() x86, gcc-4.6: Use gcc -m options when building vdso x86: HPET: Chose a paranoid safe value for the ETIME check x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix off by one in perf_swevent_init() perf: Fix duplicate events with multiple-pmu vs software events ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO scripts/tags.sh: Add magic for trace-events tracing: Fix panic when lseek() called on "trace" opened for writing
2010-12-19Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds3-52/+245
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix the irqtime code for 32bit sched: Fix the irqtime code to deal with u64 wraps nohz: Fix get_next_timer_interrupt() vs cpu hotplug Sched: fix skip_clock_update optimization sched: Cure more NO_HZ load average woes
2010-12-19sched: Fix interactivity bug by charging unaccounted run-time on entity ↵Paul Turner1-1/+5
re-weight Mike Galbraith reported poor interactivity[*] when the new shares distribution code was combined with autogroups. The root cause turns out to be a mis-ordering of accounting accrued execution time and shares updates. Since update_curr() is issued hierarchically, updating the parent entity weights to reflect child enqueue/dequeue results in the parent's unaccounted execution time then being accrued (vs vruntime) at the new weight as opposed to the weight present at accumulation. While this doesn't have much effect on processes with timeslices that cross a tick, it is particularly problematic for an interactive process (e.g. Xorg) which incurs many (tiny) timeslices. In this scenario almost all updates are at dequeue which can result in significant fairness perturbation (especially if it is the only thread, resulting in potential {tg->shares, MIN_SHARES} transitions). Correct this by ensuring unaccounted time is accumulated prior to manipulating an entity's weight. [*] http://xkcd.com/619/ is perversely Nostradamian here. Signed-off-by: Paul Turner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Linus Torvalds <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-12-19sched: Move periodic share updates to entity_tick()Paul Turner1-4/+17
Long running entities that do not block (dequeue) require periodic updates to maintain accurate share values. (Note: group entities with several threads are quite likely to be non-blocking in many circumstances). By virtue of being long-running however, we will see entity ticks (otherwise the required update occurs in dequeue/put and we are done). Thus we can move the detection (and associated work) for these updates into the periodic path. This restores the 'atomicity' of update_curr() with respect to accounting. Signed-off-by: Paul Turner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-12-19Merge commit 'v2.6.37-rc6' into sched/coreIngo Molnar1-3/+4
Merge reason: Update to the latest -rc. Signed-off-by: Ingo Molnar <[email protected]>
2010-12-18Merge branch 'for-linus' of ↵Linus Torvalds1-94/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86: avoid high BIOS area when allocating address space x86: avoid E820 regions when allocating address space x86: avoid low BIOS area when allocating address space resources: add arch hook for preventing allocation in reserved areas Revert "resources: support allocating space within a region from the top down" Revert "PCI: allocate bus resources from the top down" Revert "x86/PCI: allocate space from the end of a region, not the beginning" Revert "x86: allocate space within a region top-down" Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode" PCI: Update MCP55 quirk to not affect non HyperTransport variants
2010-12-18irq_work: Use per cpu atomics instead of regular atomicsChristoph Lameter1-9/+9
The irq work queue is a per cpu object and it is sufficient for synchronization if per cpu atomics are used. Doing so simplifies the code and reduces the overhead of the code. Before: [email protected]$ size kernel/irq_work.o text data bss dec hex filename 451 8 1 460 1cc kernel/irq_work.o After: [email protected]$ size kernel/irq_work.o text data bss dec hex filename 438 8 1 447 1bf kernel/irq_work.o Cc: Peter Zijlstra <[email protected]> Signed-off-by: Christoph Lameter <[email protected]>
2010-12-17rcu: reduce __call_rcu()-induced contention on rcu_node structuresPaul E. McKenney1-17/+21
When the current __call_rcu() function was written, the expedited APIs did not exist. The __call_rcu() implementation therefore went to great lengths to detect the end of old grace periods and to start new ones, all in the name of reducing grace-period latency. Now the expedited APIs do exist, and the usage of __call_rcu() has increased considerably. This commit therefore causes __call_rcu() to avoid worrying about grace periods unless there are a large number of RCU callbacks stacked up on the current CPU. Signed-off-by: Paul E. McKenney <[email protected]>
2010-12-17rcu: limit rcu_node leaf-level fanoutPaul E. McKenney2-21/+27
Some recent benchmarks have indicated possible lock contention on the leaf-level rcu_node locks. This commit therefore limits the number of CPUs per leaf-level rcu_node structure to 16, in other words, there can be at most 16 rcu_data structures fanning into a given rcu_node structure. Prior to this, the limit was 32 on 32-bit systems and 64 on 64-bit systems. Note that the fanout of non-leaf rcu_node structures is unchanged. The organization of accesses to the rcu_node tree is such that references to non-leaf rcu_node structures are much less frequent than to the leaf structures. Signed-off-by: Paul E. McKenney <[email protected]>
2010-12-17rcu: fine-tune grace-period begin/end checksPaul E. McKenney1-9/+18
Use the CPU's bit in rnp->qsmask to determine whether or not the CPU should try to report a quiescent state. Handle overflow in the check for rdp->gpnum having fallen behind. Signed-off-by: Paul E. McKenney <[email protected]>
2010-12-17rcu: Keep gpnum and completed fields synchronizedFrederic Weisbecker1-0/+9
When a CPU that was in an extended quiescent state wakes up and catches up with grace periods that remote CPUs completed on its behalf, we update the completed field but not the gpnum that keeps a stale value of a backward grace period ID. Later, note_new_gpnum() will interpret the shift between the local CPU and the node grace period ID as some new grace period to handle and will then start to hunt quiescent state. But if every grace periods have already been completed, this interpretation becomes broken. And we'll be stuck in clusters of spurious softirqs because rcu_report_qs_rdp() will make this broken state run into infinite loop. The solution, as suggested by Lai Jiangshan, is to ensure that the gpnum and completed fields are well synchronized when we catch up with completed grace periods on their behalf by other cpus. This way we won't start noting spurious new grace periods. Suggested-by: Lai Jiangshan <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected] Signed-off-by: Paul E. McKenney <[email protected]>
2010-12-17rcu: Stop chasing QS if another CPU did it for usFrederic Weisbecker1-0/+8
When a CPU is idle and others CPUs handled its extended quiescent state to complete grace periods on its behalf, it will catch up with completed grace periods numbers when it wakes up. But at this point there might be no more grace period to complete, but still the woken CPU always keeps its stale qs_pending value and will then continue to chase quiescent states even if its not needed anymore. This results in clusters of spurious softirqs until a new real grace period is started. Because if we continue to chase quiescent states but we have completed every grace periods, rcu_report_qs_rdp() is puzzled and makes that state run into infinite loops. As suggested by Lai Jiangshan, just reset qs_pending if someone completed every grace periods on our behalf. Suggested-by: Lai Jiangshan <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2010-12-17rcu: increase synchronize_sched_expedited() batchingTejun Heo1-20/+62
The fix in commit #6a0cc49 requires more than three concurrent instances of synchronize_sched_expedited() before batching is possible. This patch uses a ticket-counter-like approach that is also not unrelated to Lai Jiangshan's Ring RCU to allow sharing of expedited grace periods even when there are only two concurrent instances of synchronize_sched_expedited(). This commit builds on Tejun's original posting, which may be found at http://lkml.org/lkml/2010/11/9/204, adding memory barriers, avoiding overflow of signed integers (other than via atomic_t), and fixing the detection of batching. Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2010-12-17resources: add arch hook for preventing allocation in reserved areasBjorn Helgaas1-0/+6
This adds arch_remove_reservations(), which an arch can implement if it needs to protect part of the address space from allocation. Sometimes that can be done by just putting a region in the resource tree, but there are cases where that doesn't work well. For example, x86 BIOS E820 reservations are not related to devices, so they may overlap part of, all of, or more than a device resource, so they may not end up at the correct spot in the resource tree. Acked-by: H. Peter Anvin <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Jesse Barnes <[email protected]>
2010-12-17Revert "resources: support allocating space within a region from the top down"Bjorn Helgaas1-94/+4
This reverts commit e7f8567db9a7f6b3151b0b275e245c1cef0d9c70. Acked-by: H. Peter Anvin <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Jesse Barnes <[email protected]>
2010-12-17taskstats: Use this_cpu_opsChristoph Lameter1-3/+2
Use this_cpu_inc_return in one place and avoid ugly __raw_get_cpu in another. V3->V4: - Fix off by one. V4-V4f: - Use &listener_array Cc: Michael Holzheu <[email protected]> Acked-by: H. Peter Anvin <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2010-12-17Merge branch 'this_cpu_ops' into for-2.6.38Tejun Heo3-5/+8