path: root/kernel
Age | Commit message | Author | Files | Lines
2014-03-20 | Merge branches 'pm-runtime' and 'pm-sleep' | Rafael J. Wysocki | 6 | -14/+20
* pm-runtime:
  PM / Runtime: Update runtime_idle() documentation for return value meaning
* pm-sleep:
  PM / sleep: Correct whitespace errors in <linux/pm.h>
  PM: Add missing "freeze" state
  PM / Hibernate: Spelling s/anonymouns/anonymous/
  PM / Runtime: Add missing "it" in comment
  PM / suspend: Remove unnecessary !!
  PCI / PM: Resume runtime-suspended devices later during system suspend
  ACPI / PM: Resume runtime-suspended devices later during system suspend
  PM / sleep: Set pm_generic functions to NULL for !CONFIG_PM_SLEEP
  PM: fix typo in comment
  PM / hibernate: use name_to_dev_t to parse resume
  PM / wakeup: Include appropriate header file in kernel/power/wakelock.c
  PM / sleep: Move prototype declaration to header file kernel/power/power.h
  PM / sleep: Asynchronous threads for suspend_late
  PM / sleep: Asynchronous threads for suspend_noirq
  PM / sleep: Asynchronous threads for resume_early
  PM / sleep: Asynchronous threads for resume_noirq
  PM / sleep: Two flags for async suspend_noirq and suspend_late
2014-03-20 | Merge branches 'pm-qos', 'pm-domains' and 'pm-drivers' | Rafael J. Wysocki | 1 | -6/+12
* pm-qos:
  PM / QoS: Add type to dev_pm_qos_add_ancestor_request() arguments
  ACPI / LPSS: Support for device latency tolerance PM QoS
  ACPI / scan: Add bind/unbind callbacks to struct acpi_scan_handler
  PM / QoS: Introcuce latency tolerance device PM QoS type
  PM / QoS: Add no_constraints_value field to struct pm_qos_constraints
  PM / QoS: Rename device resume latency QoS items
* pm-domains:
  PM / domains: Turn latency warning into debug message
* pm-drivers:
  PM: Add pm_runtime_suspend|resume_force functions
  PM / runtime: Fetch runtime PM callbacks using a macro
2014-03-20 | timer: Remove code redundancy while calling get_nohz_timer_target() | Viresh Kumar | 3 | -21/+6
There are only two users of get_nohz_timer_target(): timer and hrtimer. Both call it under the same circumstances, i.e.

    #ifdef CONFIG_NO_HZ_COMMON
        if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
            return get_nohz_timer_target();
    #endif

So, it makes more sense to get all this as part of get_nohz_timer_target() instead of duplicating code at two places. For this another parameter is required to be passed to this routine, pinned.

Signed-off-by: Viresh Kumar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/1e1b53537217d58d48c2d7a222a9c3ac47d5b64c.1395140107.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <[email protected]>
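The refactoring described above can be sketched in userspace C. This is only a model under stated assumptions: `struct cpu_state`, its fields, and `nohz_timer_target()` are hypothetical stand-ins for the kernel helpers named in the message, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; not the real kernel API. */
struct cpu_state {
    bool timer_migration; /* models get_sysctl_timer_migration() */
    bool this_cpu_idle;   /* models idle_cpu(this_cpu) */
    int this_cpu;         /* CPU the caller is running on */
    int busy_cpu;         /* CPU a migratable timer would be moved to */
};

/*
 * After the refactoring, callers pass 'pinned' and the whole check lives
 * in one place instead of being duplicated in timer and hrtimer code.
 */
static int nohz_timer_target(const struct cpu_state *s, bool pinned)
{
    assert(s != NULL);
    if (pinned || !s->timer_migration || !s->this_cpu_idle)
        return s->this_cpu; /* keep the timer on the local CPU */
    return s->busy_cpu;     /* migrate away from the idle CPU */
}
```

With this shape, both callers reduce to a single call and no longer need their own `#ifdef` block.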
2014-03-20 | timer: Use variable head instead of &work_list in __run_timers() | Viresh Kumar | 1 | -1/+1
We already have a variable 'head' that points to '&work_list', and so we should use that instead wherever possible. Signed-off-by: Viresh Kumar <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/0d8645a6efc8360c4196c9797d59343abbfdcc5e.1395129136.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <[email protected]>
2014-03-19 | Merge branch 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup | Linus Torvalds | 1 | -4/+7
Pull cgroup fix from Tejun Heo:

"One really late cgroup patch to fix error path in create_css(). Hitting this bug would be pretty rare but still possible and if it gets delayed we'd need to backport it through -stable anyway. It only updates error path in create_css() and has low chance of new breakages"

* 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix a failure path in create_css()
2014-03-19 | cgroup: fix cgroup_taskset walking order | Tejun Heo | 1 | -5/+19
cgroup_taskset is used to track and iterate target tasks while migrating a task or process and should guarantee that the first task iterated is the task group leader if a process is being migrated. b3dc094e9390 ("cgroup: use css_set->mg_tasks to track target tasks during migration") replaced flex array cgroup_taskset->tc_array with css_set->mg_tasks list to remove process size limit and dynamic allocation during migration; unfortunately, it incorrectly used list operations which don't preserve order breaking the guarantee that cgroup_taskset_first() returns the leader for a process target. Fix it by using order preserving list operations. Note that as multiple src_csets may map to a single dst_cset, the iteration order may change across cgroup_task_migrate(); however, the leader is still guaranteed to be the first entry. The switch to list_splice_tail_init() at the end of cgroup_migrate() isn't strictly necessary. Let's still do it for consistency. Signed-off-by: Tejun Heo <[email protected]>
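The ordering problem can be illustrated with a minimal circular list, assuming task IDs stand in for tasks and ID 0 is the group leader. All names here (`struct node`, `add_head`, `add_tail`, `migrate_first_id`) are hypothetical and only mimic the shape of the kernel's list helpers.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int id;
    struct node *prev, *next;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void insert_between(struct node *n, struct node *prev, struct node *next)
{
    n->prev = prev;
    n->next = next;
    prev->next = n;
    next->prev = n;
}

/* Insert right after the head: repeated use reverses the source order. */
static void add_head(struct node *n, struct node *head)
{
    insert_between(n, head, head->next);
}

/* Insert right before the head: repeated use preserves the source order. */
static void add_tail(struct node *n, struct node *head)
{
    insert_between(n, head->prev, head);
}

static void unlink_node(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Move every entry from src to dst and return the ID of dst's first entry. */
static int migrate_first_id(struct node *src, struct node *dst, int preserve_order)
{
    assert(src != NULL && dst != NULL);
    while (src->next != src) {
        struct node *n = src->next;

        unlink_node(n);
        if (preserve_order)
            add_tail(n, dst);
        else
            add_head(n, dst);
    }
    return dst->next == dst ? -1 : dst->next->id;
}
```

Moving entries with the tail-insert variant keeps the leader first; the head-insert variant leaves the last task moved at the front, which is the kind of breakage the commit describes.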
2014-03-19 | resources: Set type in __request_region() | Bjorn Helgaas | 1 | -2/+2
We don't set the type (I/O, memory, etc.) of resources added by __request_region(), which leads to confusing messages like this:

    address space collision: [io 0x1000-0x107f] conflicts with ACPI CPU throttle [??? 0x00001010-0x00001015 flags 0x80000000]

Set the type of a new resource added by __request_region() (used by request_region() and request_mem_region()) to the type of its parent. This makes the resource tree internally consistent and fixes messages like the above, where the ACPI CPU throttle resource really is an I/O port region, but request_region() didn't fill in the type, so %pR didn't know how to print it.

Sample dmesg showing the issue at the link below.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=71611
Reported-by: Paul Bolle <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
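A minimal sketch of "inherit the type from the parent", assuming a simplified resource structure. The `RES_*` constants and `request_region_sketch()` are hypothetical; the kernel uses its own IORESOURCE_* definitions and a different function signature.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag layout only, chosen for this sketch. */
#define RES_TYPE_MASK 0x00000f00UL
#define RES_IO        0x00000100UL
#define RES_MEM       0x00000200UL
#define RES_BUSY      0x80000000UL

struct res {
    unsigned long start, end, flags;
    const struct res *parent;
};

/*
 * Carve a busy child region out of 'parent'. The child copies the parent's
 * type bits, so a printer that keys on the type can label it "io" or "mem"
 * instead of "???".
 */
static void request_region_sketch(struct res *child, const struct res *parent,
                                  unsigned long start, unsigned long n)
{
    assert(child != NULL && parent != NULL && n > 0);
    child->start = start;
    child->end = start + n - 1;
    child->flags = (parent->flags & RES_TYPE_MASK) | RES_BUSY;
    child->parent = parent;
}
```

Without the type bits, the child's flags would consist of the busy bit alone, which is consistent with the `flags 0x80000000` value in the quoted message.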
2014-03-19 | cgroup: implement CFTYPE_ONLY_ON_DFL | Tejun Heo | 1 | -0/+2
This cftype flag makes the file only appear on the default hierarchy. This will later be used for cgroup.controllers file. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
2014-03-19 | cgroup: make cgrp_dfl_root mountable | Tejun Heo | 1 | -33/+61
cgrp_dfl_root will be used as the default unified hierarchy. This patch makes cgrp_dfl_root mountable by making the following changes.

* cgroup_init_early() now initializes cgrp_dfl_root w/ CGRP_ROOT_SANE_BEHAVIOR. The default hierarchy is always sane.
* parse_cgroupfs_options() and cgroup_mount() are updated such that cgrp_dfl_root is mounted if sane_behavior is specified w/o any subsystems.
* rebind_subsystems() now populates the root directory of cgrp_dfl_root. Note that the function still guarantees success of rebinding subsystems to cgrp_dfl_root. If populating fails while rebinding to cgrp_dfl_root, it whines but ignores the error.
* For backward compatibility, the default hierarchy shows up in /proc/$PID/cgroup only after it's explicitly mounted so that userland which doesn't make use of it doesn't see any change.
* "current_css_set_cg_links" file of debug cgroup now treats the default hierarchy the same as other hierarchies. This is visible to userland. Given that it's for debug controller, this should be fine.
* While at it, implement cgroup_on_dfl() which tests whether a given cgroup is on the default hierarchy or not.

The above changes make cgrp_dfl_root mostly equivalent to other controllers but the actual unified hierarchy behaviors are not implemented yet. Let's plug child cgroup creation in cgrp_dfl_root from create_cgroup() for now.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
2014-03-19 | cgroup: drop const from @buffer of cftype->write_string() | Tejun Heo | 3 | -3/+3
cftype->write_string() just passes on the writeable buffer from kernfs and there's no reason to add const restriction on the buffer. The only thing const achieves is unnecessarily complicating parsing of the buffer. Drop const from @buffer. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Balbir Singh <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]>
2014-03-19 | cgroup: rename cgroup_dummy_root and related names | Tejun Heo | 1 | -87/+81
The dummy root will be repurposed to serve as the default unified hierarchy. Let's rename things in preparation.

* s/cgroup_dummy_root/cgrp_dfl_root/
* s/cgroupfs_root/cgroup_root/ as we don't do fs part directly anymore
* s/cgroup_root->top_cgroup/cgroup_root->cgrp/ for brevity

This is pure rename.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
2014-03-19 | cgroup: move ->subsys_mask from cgroupfs_root to cgroup | Tejun Heo | 1 | -22/+39
cgroupfs_root->subsys_mask represents the controllers attached to the hierarchy. This patch moves the field to cgroup. Subsystem initialization and rebinding updates the top cgroup's subsys_mask. For !root cgroups, the subsys_mask bits are set from create_css() and cleared from kill_css(), which effectively means that all cgroups will have the same subsys_mask as the top cgroup. While this doesn't make any difference now, this will help implementation of the default unified hierarchy where !root cgroups may have subsets of the top_cgroup's subsys_mask. While at it, __kill_css() is split out of kill_css(). The former doesn't care about the subsys_mask while the latter becomes noop if the controller is already killed and clears the matching bit if not before proceeding to killing the css. This will be used later by the default unified hierarchy implementation. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
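The per-cgroup mask bookkeeping described above might look roughly like the following. This is a sketch under assumptions: `struct cgrp`, the `kills` counter (standing in for the work done by __kill_css()), and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct cgrp {
    unsigned long subsys_mask; /* one bit per attached controller */
    int kills;                 /* counts calls that would reach __kill_css() */
};

/* Creating a css sets the controller's bit in the cgroup's mask. */
static void create_css_sketch(struct cgrp *c, int ssid)
{
    assert(c != NULL && ssid >= 0 && ssid < 32);
    c->subsys_mask |= 1UL << ssid;
}

/*
 * Killing a css is a no-op when the bit is already clear; otherwise the
 * bit is cleared before the actual teardown proceeds.
 */
static void kill_css_sketch(struct cgrp *c, int ssid)
{
    assert(c != NULL && ssid >= 0 && ssid < 32);
    if (!(c->subsys_mask & (1UL << ssid)))
        return;
    c->subsys_mask &= ~(1UL << ssid);
    c->kills++;
}
```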
2014-03-19 | cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding | Tejun Heo | 1 | -44/+56
Currently, while rebinding, cgroup_dummy_root serves as the anchor point. In addition to the target root, rebind_subsystems() takes @added_mask and @removed_mask. The subsystems specified in the former are expected to be on the dummy root and then moved to the target root. The ones in the latter are moved from non-dummy root to dummy. Now that the dummy root is a fully functional one and we're planning to use it for the default unified hierarchy, this level of distinction between dummy and non-dummy roots is quite awkward. This patch updates rebind_subsystems() to take the target root and one subsystem mask and move the specified subsystems to the target root which may or may not be the dummy root. IOW, unbinding now becomes moving the subsystems to the dummy root and binding to non-dummy root. This makes the dummy root mostly equivalent to other hierarchies in terms of the mechanism of moving subsystems around; however, we still retain all the semantical restrictions so that this patch doesn't introduce any visible behavior differences. Another noteworthy detail is that rebind_subsystems() guarantees that moving a subsystem to the dummy root never fails so that valid unmounting attempts always succeed. This unifies binding and unbinding of subsystems. The invocation points of ->bind() were inconsistent between the two and now moved after whole rebinding is complete. This doesn't break the current users and generally makes more sense. All rebind_subsystems() users are converted accordingly. Note that cgroup_remount() now makes two calls to rebind_subsystems() to bind and then unbind the requested subsystems. This will allow repurposing of the dummy hierarchy as the default unified hierarchy and shouldn't make any userland visible behavior difference. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
2014-03-19 | cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root | Tejun Heo | 1 | -23/+20
cgroup_dummy_root is used to host controllers which aren't attached to any other hierarchy. The root is minimally set up during kernfs bootstrap and didn't go through full hierarchy initialization. We're planning to use cgroup_dummy_root for the default unified hierarchy and thus want it to be fully functional. Replace the special initialization, which was collected into cgroup_init() by the previous patch, with an invocation of cgroup_setup_root(). This simplifies the init path and makes cgroup_dummy_root a full hierarchy with its own kernfs_root and all. As this puts the dummy hierarchy on the cgroup_roots list, rename for_each_active_root() to for_each_root() and update its users to skip the dummy root for now. This patch doesn't cause any userland visible behavior changes at this point. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
2014-03-19 | cgroup: reorganize cgroup bootstrapping | Tejun Heo | 1 | -51/+49
* Fields of init_css_set and css_set_count are now set using initializer instead of programmatically from cgroup_init_early().
* init_cgroup_root() now also takes @opts and performs the optional part of initialization too. The leftover part of cgroup_root_from_opts() is collapsed into its only caller - cgroup_mount().
* Initialization of cgroup_root_count and linking of init_css_set are moved from cgroup_init_early() to cgroup_init(). None of the early_init users depends on init_css_set being linked.
* Subsystem initializations are moved after dummy hierarchy init and init_css_set linking.

These changes reorganize the bootstrap logic so that the dummy hierarchy can share the usual hierarchy init path and be made more normal. These changes don't make noticeable behavior changes.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
2014-03-19 | cgroup: relocate setting of CGRP_DEAD | Tejun Heo | 1 | -9/+9
In cgroup_destroy_locked(), move setting of CGRP_DEAD above invocations of kill_css(). This doesn't make any visible behavior difference now but will be used to inhibit manipulating controller enable states of a dying cgroup on the unified hierarchy. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
2014-03-19 | genirq: procfs: Make smp_affinity values go+r | Chema Gonzalez | 1 | -4/+4
Includes:

- /proc/irq/default_smp_affinity
- /proc/irq/*/affinity_hint
- /proc/irq/*/smp_affinity
- /proc/irq/*/smp_affinity_list

Users can distill the same information by reading /proc/interrupts.

Signed-off-by: Chema Gonzalez <[email protected]>
Cc: Eric Dumazet <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
2014-03-19 | softirq: Add linux/irq.h to make it compile again | Thomas Gleixner | 1 | -0/+1
On Sparc and S390 the removal of irq.h from kernel_stat.h causes: kernel/softirq.c:774:9: error: 'NR_IRQS_LEGACY' undeclared Reported-by: Peter Zijlstra <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2014-03-18 | cgroup: fix a failure path in create_css() | Li Zefan | 1 | -4/+7
If online_css() fails, we should remove cgroup files belonging to css->ss. Signed-off-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2014-03-18 | uprobes: allow ignoring of probe hits | David A. Long | 1 | -0/+9
Allow arches to decide to ignore a probe hit. ARM will use this to only call handlers if the conditions to execute a conditionally executed instruction are satisfied. Signed-off-by: David A. Long <[email protected]> Acked-by: Oleg Nesterov <[email protected]>
2014-03-18 | uprobes: Kconfig dependency fix | David A. Long | 1 | -0/+1
Suggested change from Oleg Nesterov. Fixes incomplete dependencies for uprobes feature. Signed-off-by: David A. Long <[email protected]> Acked-by: Oleg Nesterov <[email protected]>
2014-03-16 | Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 3 | -3/+12
Pull scheduler fixes from Ingo Molnar:

"Three small fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/clock: Prevent tracing recursion in sched_clock_cpu()
  stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
  sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy
2014-03-14 | genirq: Add a new IRQCHIP_EOI_THREADED flag | Thomas Gleixner | 3 | -9/+42
The flag is necessary for interrupt chips which require an ACK/EOI after the handler has run. In case of threaded handlers this needs to happen after the threaded handler has completed before the unmask of the interrupt. The flag is only useful in combination with the handle_fasteoi_irq flow control handler. It can be combined with the flag IRQCHIP_EOI_IF_HANDLED, so the EOI is not issued when the interrupt is disabled or in progress. Tested-by: Hans de Goede <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Maxime Ripard <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2014-03-13 | block: remove old blk_iopoll_enabled variable | Jens Axboe | 1 | -12/+0
This was a debugging measure to toggle enabled/disabled when testing. But for real production setups, it's not safe to toggle this setting without either reloading drivers or quiescing IO first. Neither of which the toggle enforces. Additionally, it makes drivers deal with the conditional state. Remove it completely. It's up to the driver whether iopoll is enabled or not. Signed-off-by: Jens Axboe <[email protected]>
2014-03-13 | sched: Remove needless round trip nsecs <-> tick conversion of steal time | Frederic Weisbecker | 2 | -16/+0
When update_rq_clock_task() accounts the pending steal time for a task, it converts the steal delta from nsecs to tick then from tick to nsecs. There is no apparent good reason for doing that though because both the task clock and the prev steal delta are u64 and store values in nsecs. So let's remove the needless conversion. Cc: Ingo Molnar <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Acked-by: Rik van Riel <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2014-03-13 | cputime: Fix jiffies based cputime assumption on steal accounting | Frederic Weisbecker | 1 | -5/+11
The steal guest time accounting code assumes that cputime_t is based on jiffies. So when CONFIG_NO_HZ_FULL=y, which implies that cputime_t is based on nsecs, steal_account_process_tick() passes the delta in jiffies to account_steal_time() which then accounts it as if it's a value in nsecs. As a result, accounting 1 second of steal time (with HZ=100 that would be 100 jiffies) is spuriously accounted as 100 nsecs. As such /proc/stat may report 0 values of steal time even when two guests have run concurrently for a few seconds on the same host and same CPU. In order to fix this, let's convert the nsecs based steal delta to cputime instead of jiffies by using the right conversion API. Given that the steal time is stored in cputime_t and this type can have a smaller granularity than nsecs, we only account the rounded converted value and leave the remaining nsecs for the next deltas. Reported-by: Huiqingding <[email protected]> Reported-by: Marcelo Tosatti <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Acked-by: Rik van Riel <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
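The "account the rounded value, leave the remainder for the next delta" idea can be sketched as follows. The structure, the function name, and the `ns_per_unit` parameter (nanoseconds per cputime unit) are assumptions made for this illustration, not the kernel's conversion API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct steal_acct {
    uint64_t prev_steal_ns; /* part of the steal clock already accounted */
    uint64_t accounted;     /* running total, in cputime units */
};

/*
 * Convert the nanosecond delta to whole cputime units, account only those,
 * and advance prev_steal_ns by exactly the accounted amount so that the
 * leftover nanoseconds are picked up by a later call.
 */
static uint64_t account_steal(struct steal_acct *a, uint64_t steal_clock_ns,
                              uint64_t ns_per_unit)
{
    uint64_t delta_ns, units;

    assert(a != NULL && ns_per_unit > 0);
    assert(steal_clock_ns >= a->prev_steal_ns);

    delta_ns = steal_clock_ns - a->prev_steal_ns;
    units = delta_ns / ns_per_unit;
    a->prev_steal_ns += units * ns_per_unit;
    a->accounted += units;
    return units;
}
```

For example, with 10,000,000 ns per unit (one jiffy at HZ=100), a steal clock reading of 25 ms accounts 2 units and carries 5 ms forward; a later reading of 31 ms then accounts 1 more unit.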
2014-03-13 | Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE | Mathieu Desnoyers | 3 | -3/+8
Users have reported being unable to trace non-signed modules loaded within a kernel supporting module signature. This is caused by tracepoint.c:tracepoint_module_coming() refusing to take into account tracepoints sitting within force-loaded modules (TAINT_FORCED_MODULE). The reason for this check, in the first place, is that a force-loaded module may have a struct module incompatible with the layout expected by the kernel, and can thus cause a kernel crash upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y. Tracepoints, however, specifically accept TAINT_OOT_MODULE and TAINT_CRAP, since those modules do not lead to the "very likely system crash" issue cited above for force-loaded modules. With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed module is tainted re-using the TAINT_FORCED_MODULE taint flag. Unfortunately, this means that Tracepoints treat that module as a force-loaded module, and thus silently refuse to consider any tracepoint within this module. Since an unsigned module does not fit within the "very likely system crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag to specifically address this taint behavior, and accept those modules within Tracepoints. We use the letter 'X' as a taint flag character for a module being loaded that doesn't know how to sign its name (proposed by Steven Rostedt). Also add the missing 'O' entry to trace event show_module_flags() list for the sake of completeness. Signed-off-by: Mathieu Desnoyers <[email protected]> Acked-by: Steven Rostedt <[email protected]> NAKed-by: Ingo Molnar <[email protected]> CC: Thomas Gleixner <[email protected]> CC: David Howells <[email protected]> CC: Greg Kroah-Hartman <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
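The effect of the new taint flag on the tracepoint check can be modeled with a small bitmask test. The `T_*` bit values and both function names are hypothetical; only the accept/refuse logic follows the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit assignments, not the kernel's TAINT_* values. */
#define T_FORCED_MODULE   (1u << 0)
#define T_CRAP            (1u << 1)
#define T_OOT_MODULE      (1u << 2)
#define T_UNSIGNED_MODULE (1u << 3)

/* Tracepoints are considered only if every taint bit set is a tolerated one. */
static bool tracepoints_accept(unsigned int taints)
{
    const unsigned int tolerated = T_CRAP | T_OOT_MODULE | T_UNSIGNED_MODULE;

    /* Force-loaded modules must never be in the tolerated set. */
    assert((tolerated & T_FORCED_MODULE) == 0);
    return (taints & ~tolerated) == 0;
}

/* Taint applied to an unsigned module before and after the change. */
static unsigned int taint_for_unsigned(bool has_dedicated_flag)
{
    return has_dedicated_flag ? T_UNSIGNED_MODULE : T_FORCED_MODULE;
}
```

In this model, an unsigned module is refused while it shares the forced-load bit and accepted once it has its own bit.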
2014-03-13 | module: use pr_cont | Jiri Slaby | 1 | -3/+3
When dumping loaded modules, we print them one by one in separate printks. Let's use pr_cont as they are continuation prints. Signed-off-by: Jiri Slaby <[email protected]> Cc: Rusty Russell <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-03-12 | Merge branch 'irq/for-gpio' into irq/core | Thomas Gleixner | 20 | -98/+168
Merge the request/release callbacks which are in a separate branch for consumption by the gpio folks. Signed-off-by: Thomas Gleixner <[email protected]>
2014-03-12 | genirq: Provide irq_request/release_resources chip callbacks | Thomas Gleixner | 1 | -1/+27
For certain irq types, e.g. gpios, it's necessary to request resources before starting up the irq. This might fail so we cannot use the irq_startup() callback because we might call the irq_set_type() callback before that which does not make sense when the resource is not available. Calling irq_startup() before irq_set_type() can lead to spurious interrupts which is not desired either. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jean-Jacques Hiblot <[email protected]> Cc: Grant Likely <[email protected]> Cc: [email protected] Reviewed-by: Linus Walleij <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2014-03-12 | locking/mutex: Fix debug checks | Peter Zijlstra | 2 | -0/+13
OK, so commit:

    1d8fe7dc8078 ("locking/mutexes: Unlock the mutex without the wait_lock")

generates this boot warning when CONFIG_DEBUG_MUTEXES=y:

    WARNING: CPU: 0 PID: 139 at /usr/src/linux-2.6/kernel/locking/mutex-debug.c:82 debug_mutex_unlock+0x155/0x180()
    DEBUG_LOCKS_WARN_ON(lock->owner != current)

And that makes sense, because as soon as we release the lock a new owner can come in...

One would think that !__mutex_slowpath_needs_to_unlock() implementations suffer the same, but for DEBUG we fall back to mutex-null.h which has an unconditional 1 for that.

The mutex debug code requires the mutex to be unlocked after doing the debug checks, otherwise it can find inconsistent state.

Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2014-03-12 | sched: Clean up the task_hot() function | Alex Shi | 1 | -2/+2
task_hot() doesn't need the 'sched_domain' parameter, so remove it. Signed-off-by: Alex Shi <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-03-12 | sched: Remove double calculation in fix_small_imbalance() | Vincent Guittot | 1 | -4/+2
The tmp value has been already calculated in: scaled_busy_load_per_task = (busiest->load_per_task * SCHED_POWER_SCALE) / busiest->group_power; Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-03-12 | sched: Fix broken setscheduler() | Steven Rostedt | 1 | -1/+9
I decided to run my tests on linux-next, and my wakeup_rt tracer was broken. After running a bisect, I found that the problem commit was:

    linux-next commit c365c292d059 "sched: Consider pi boosting in setscheduler()"

And the reason the wakeup_rt tracer test was failing, was because it had no RT task to trace. I first noticed this when running with sched_switch event and saw that my RT task still had normal SCHED_OTHER priority. Looking at the problem commit, I found:

    - p->normal_prio = normal_prio(p);
    - p->prio = rt_mutex_getprio(p);

With no

    + p->normal_prio = normal_prio(p);
    + p->prio = rt_mutex_getprio(p);

Reading what the commit is supposed to do, I realize that the p->prio can't be set if the task is boosted with a higher prio, but the p->normal_prio still needs to be set regardless, otherwise, when the task is deboosted, it won't get the new priority.

The p->prio has to be set before "check_class_changed()" is called, otherwise the class won't be changed.

Also added fix to newprio to include a check for deadline policy that was missing. This change was suggested by Juri Lelli.

Signed-off-by: Steven Rostedt <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Juri Lelli <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
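The rule stated above (always record normal_prio, let a priority-inheritance boost override only the effective prio) can be modeled in a few lines. This is a simplified sketch, not the patch itself: `struct task`, `NO_BOOST`, and both functions are hypothetical, and it assumes the kernel convention that a lower number means a higher priority.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NO_BOOST INT_MAX /* hypothetical marker: no PI waiter is boosting */

struct task {
    int prio;        /* effective priority, lower value is higher priority */
    int normal_prio; /* priority the task has once any boost is gone */
    int boost_prio;  /* priority donated by a PI waiter, or NO_BOOST */
};

/* Always record the new normal priority; keep a stronger boost in effect. */
static void set_prio_sketch(struct task *p, int new_normal_prio)
{
    assert(p != NULL);
    p->normal_prio = new_normal_prio;
    p->prio = p->boost_prio < new_normal_prio ? p->boost_prio : new_normal_prio;
}

/* When the boost ends, the task falls back to its recorded normal priority. */
static void deboost_sketch(struct task *p)
{
    assert(p != NULL);
    p->boost_prio = NO_BOOST;
    p->prio = p->normal_prio;
}
```

If normal_prio were skipped while boosted, the deboost step would restore the stale value, which matches the symptom described in the message.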
2014-03-11 | ftrace: Constify ftrace_text_reserved | Sasha Levin | 1 | -1/+1
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-03-11 | tracepoints: API doc update to tracepoint_probe_register() return value | Mathieu Desnoyers | 1 | -1/+11
Describe the return values of tracepoint_probe_register(), including -ENODEV added by commit: Author: Steven Rostedt <[email protected]> tracing: Warn if a tracepoint is not set via debugfs Link: http://lkml.kernel.org/r/[email protected] CC: Ingo Molnar <[email protected]> CC: Frederic Weisbecker <[email protected]> CC: Andrew Morton <[email protected]> Signed-off-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-03-11 | tracepoints: API doc update to data argument | Mathieu Desnoyers | 1 | -0/+4
Describe the @data argument (probe private data). Link: http://lkml.kernel.org/r/[email protected] Fixes: 38516ab59fbc "tracing: Let tracepoints have data passed to tracepoint callbacks" CC: Ingo Molnar <[email protected]> CC: Frederic Weisbecker <[email protected]> CC: Andrew Morton <[email protected]> Signed-off-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-03-12 | PM: Add missing "freeze" state | Geert Uytterhoeven | 1 | -2/+2
Fix descriptions of /sys/power/state in the documentation and in a code comment. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Srivatsa S. Bhat <[email protected]> Acked-by: Pavel Machek <[email protected]> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <[email protected]>
2014-03-12 | PM / Hibernate: Spelling s/anonymouns/anonymous/ | Geert Uytterhoeven | 1 | -1/+1
Spelling fix. Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Pavel Machek <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2014-03-11 | ftrace: Fix compilation warning about control_ops_free | Jiri Slaby | 1 | -5/+5
With CONFIG_DYNAMIC_FTRACE=n, I see a warning:

    kernel/trace/ftrace.c:240:13: warning: 'control_ops_free' defined but not used
     static void control_ops_free(struct ftrace_ops *ops)
                 ^

Move that function around to an already existing #ifdef CONFIG_DYNAMIC_FTRACE block as the function is used solely from the dynamic function tracing functions.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Slaby <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
2014-03-11 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace | Linus Torvalds | 3 | -19/+24
Pull audit namespace fixes from Eric Biederman:

"Starting with 3.14-rc1 the audit code is faulty (think oopses and races) with respect to how it computes the network namespace of which socket to reply to, and I happened to notice by chance when reading through the code. My testing and the automated build bots don't find any problems with these fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  audit: Update kdoc for audit_send_reply and audit_list_rules_send
  audit: Send replies in the proper network namespace.
  audit: Use struct net not pid_t to remember the network namespce to reply in
2014-03-11 | locking/mutexes: Add extra reschedule point | Peter Zijlstra | 1 | -0/+7
Add in an extra reschedule in an attempt to avoid getting rescheduled the moment we've acquired the lock. Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-03-11 | locking/mutexes: Introduce cancelable MCS lock for adaptive spinning | Peter Zijlstra | 4 | -5/+200
Since we want a task waiting for a mutex_lock() to go to sleep and reschedule on need_resched() we must be able to abort the mcs_spin_lock() around the adaptive spin. Therefore implement a cancelable mcs lock. Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Jason Low <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-03-11 | locking/mutexes: Unlock the mutex without the wait_lock | Jason Low | 1 | -4/+4
When running workloads that have high contention in mutexes on an 8 socket machine, mutex spinners would often spin for a long time with no lock owner. The main reason why this is occurring is in __mutex_unlock_common_slowpath(), if __mutex_slowpath_needs_to_unlock(), then the owner needs to acquire the mutex->wait_lock before releasing the mutex (setting lock->count to 1). When the wait_lock is contended, this delays the mutex from being released. We should be able to release the mutex without holding the wait_lock. Signed-off-by: Jason Low <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-03-11 | locking/mutexes: Modify the way optimistic spinners are queued | Jason Low | 1 | -9/+6
The mutex->spin_mlock was introduced in order to ensure that only 1 thread spins for lock acquisition at a time to reduce cache line contention. When lock->owner is NULL and the lock->count is still not 1, the spinner(s) will continually release and obtain the lock->spin_mlock. This can generate quite a bit of overhead/contention, and also might just delay the spinner from getting the lock. This patch modifies the way optimistic spinners are queued by queuing before entering the optimistic spinning loop as opposed to acquiring before every call to mutex_spin_on_owner(). So in situations where the spinner requires a few extra spins before obtaining the lock, then there will only be 1 spinner trying to get the lock and it will avoid the overhead from unnecessarily unlocking and locking the spin_mlock. Signed-off-by: Jason Low <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-03-11locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()Jason Low1-0/+3
The mutex_can_spin_on_owner() function should also return false if the task needs to be rescheduled, so that it does not enter the MCS queue when it needs to reschedule. Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
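The early exit described here might look roughly like the following user-space sketch. The types `struct sketch_task` and `struct sketch_lock` and the function `sketch_can_spin_on_owner()` are hypothetical simplifications: the kernel reads the reschedule flag through need_resched() and protects the owner dereference with RCU, both of which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the task and mutex state involved. */
struct sketch_task {
    bool need_resched; /* task has been asked to give up the CPU */
    bool on_cpu;       /* task is currently running on a CPU */
};

struct sketch_lock {
    struct sketch_task *owner; /* NULL when no owner is recorded */
};

/* Decide whether optimistic spinning is worth attempting at all. */
bool sketch_can_spin_on_owner(const struct sketch_task *self,
                              const struct sketch_lock *lock)
{
    assert(self != NULL && lock != NULL);

    /* Do not join the spinner queue if we are about to reschedule. */
    if (self->need_resched)
        return false;

    /* Spinning only pays off while the owner is running. */
    if (lock->owner != NULL)
        return lock->owner->on_cpu;

    /* No recorded owner: the lock may be free soon, so try spinning. */
    return true;
}
```

Checking the flag first means a task that must reschedule never pays the cost of queuing as a spinner only to leave the queue immediately.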
2014-03-11locking: Move mcs_spinlock.h into kernel/locking/Peter Zijlstra2-1/+115
The mcs_spinlock code is not meant (or suitable) as a generic locking primitive, therefore take it away from the normal includes and place it in kernel/locking/. This way the locking primitives implemented there can use it as part of their implementation but we do not risk it getting used inappropriately. Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-03-11sched/numa: Move task_numa_free() to __put_task_struct()Mike Galbraith2-2/+1
Bad idea on -rt: [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf and since upstream does not mind where we do this, be a bit nicer ... Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-03-11sched/fair: Fix endless loop in idle_balance()Kirill Tkhai1-1/+1
Check the number of fair tasks to decide whether we've pulled a task. rq's nr_running may contain throttled RT tasks. Signed-off-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/1394118975.19290.104.camel@tkhai Signed-off-by: Ingo Molnar <[email protected]>
2014-03-11sched/core: Fix endless loop in pick_next_task()Kirill Tkhai3-11/+15
1) Single cpu machine case. When rq has only RT tasks, but none of them can be picked because of throttling, we enter an endless loop. pick_next_task_{dl,rt} return NULL. In pick_next_task_fair() we permanently go to retry: if (rq->nr_running != rq->cfs.h_nr_running) return RETRY_TASK; (rq->nr_running is not decremented when rt_rq becomes throttled). There is no chance to unthrottle any rt_rq or to wake a fair task here, because rq is locked permanently and interrupts are disabled. 2) In case of SMP this can cause a hang too. Although we unlock rq in idle_balance(), interrupts are still disabled. The solution is to check for available tasks in the DL and RT classes instead of checking the sum. Signed-off-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/1394098321.19290.11.camel@tkhai Signed-off-by: Ingo Molnar <[email protected]>
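The two retry conditions can be compared on a throttled run queue with a small sketch. Everything here is an assumption made for illustration: `struct sketch_rq` flattens the per-class counters into one hypothetical struct, `rt_throttled` stands in for the kernel's throttling state, and the exact condition used by the patch may differ from `sketch_retry_by_class()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of the run-queue counters discussed above. */
struct sketch_rq {
    unsigned int nr_running;       /* all tasks, including throttled RT ones */
    unsigned int cfs_h_nr_running; /* runnable fair tasks */
    unsigned int dl_nr_running;    /* runnable deadline tasks */
    unsigned int rt_nr_running;    /* RT tasks queued on the rt_rq */
    bool rt_throttled;             /* rt_rq is throttled, tasks not pickable */
};

/* Old check: retry whenever the sum says non-fair tasks exist. */
bool sketch_retry_by_sum(const struct sketch_rq *rq)
{
    assert(rq != NULL);
    return rq->nr_running != rq->cfs_h_nr_running;
}

/* New check: retry only if DL or RT can actually supply a task. */
bool sketch_retry_by_class(const struct sketch_rq *rq)
{
    assert(rq != NULL);
    if (rq->dl_nr_running > 0)
        return true;
    return rq->rt_nr_running > 0 && !rq->rt_throttled;
}
```

With two throttled RT tasks and no fair or deadline tasks, the sum-based check keeps asking for a retry although nothing can be picked, while the per-class check does not.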