|
* pm-runtime:
PM / Runtime: Update runtime_idle() documentation for return value meaning
* pm-sleep:
PM / sleep: Correct whitespace errors in <linux/pm.h>
PM: Add missing "freeze" state
PM / Hibernate: Spelling s/anonymouns/anonymous/
PM / Runtime: Add missing "it" in comment
PM / suspend: Remove unnecessary !!
PCI / PM: Resume runtime-suspended devices later during system suspend
ACPI / PM: Resume runtime-suspended devices later during system suspend
PM / sleep: Set pm_generic functions to NULL for !CONFIG_PM_SLEEP
PM: fix typo in comment
PM / hibernate: use name_to_dev_t to parse resume
PM / wakeup: Include appropriate header file in kernel/power/wakelock.c
PM / sleep: Move prototype declaration to header file kernel/power/power.h
PM / sleep: Asynchronous threads for suspend_late
PM / sleep: Asynchronous threads for suspend_noirq
PM / sleep: Asynchronous threads for resume_early
PM / sleep: Asynchronous threads for resume_noirq
PM / sleep: Two flags for async suspend_noirq and suspend_late
|
|
* pm-qos:
PM / QoS: Add type to dev_pm_qos_add_ancestor_request() arguments
ACPI / LPSS: Support for device latency tolerance PM QoS
ACPI / scan: Add bind/unbind callbacks to struct acpi_scan_handler
PM / QoS: Introduce latency tolerance device PM QoS type
PM / QoS: Add no_constraints_value field to struct pm_qos_constraints
PM / QoS: Rename device resume latency QoS items
* pm-domains:
PM / domains: Turn latency warning into debug message
* pm-drivers:
PM: Add pm_runtime_suspend|resume_force functions
PM / runtime: Fetch runtime PM callbacks using a macro
|
|
There are only two users of get_nohz_timer_target(): timer and hrtimer. Both
call it under the same circumstances, i.e.
#ifdef CONFIG_NO_HZ_COMMON
if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
return get_nohz_timer_target();
#endif
So it makes more sense to fold all of this into get_nohz_timer_target()
instead of duplicating the code in two places. For this, another
parameter, pinned, needs to be passed to this routine.
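A minimal sketch of how the consolidated helper and its callers might look
after this change (the early-return form below is an assumption, not
necessarily the exact upstream code):

/* kernel/sched/core.c (sketch) */
int get_nohz_timer_target(int pinned)
{
        int cpu = smp_processor_id();

        /* Keep the timer on this CPU unless migration makes sense. */
        if (pinned || !get_sysctl_timer_migration() || !idle_cpu(cpu))
                return cpu;

        /* ... otherwise pick a non-idle CPU, as before ... */
}

/* the timer.c / hrtimer.c callers then reduce to */
cpu = get_nohz_timer_target(pinned);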
Signed-off-by: Viresh Kumar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/1e1b53537217d58d48c2d7a222a9c3ac47d5b64c.1395140107.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
We already have a variable 'head' that points to '&work_list', and so
we should use that instead wherever possible.
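For illustration, the kind of substitution meant here; the surrounding code
is paraphrased from the timer expiry path, so treat it as a sketch rather
than the exact hunk:

struct list_head *head = &work_list;

/* before */
list_replace_init(base->tv1.vec + index, &work_list);

/* after: reuse the existing pointer */
list_replace_init(base->tv1.vec + index, head);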
Signed-off-by: Viresh Kumar <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/0d8645a6efc8360c4196c9797d59343abbfdcc5e.1395129136.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"One really late cgroup patch to fix error path in create_css().
"One really late cgroup patch to fix error path in create_css().
Hitting this bug would be pretty rare but still possible, and if it
gets delayed we'd need to backport it through -stable anyway. It only
updates the error path in create_css() and has a low chance of new
breakages"
* 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix a failure path in create_css()
|
|
cgroup_taskset is used to track and iterate target tasks while
migrating a task or process and should guarantee that the first task
iterated is the task group leader if a process is being migrated.
b3dc094e9390 ("cgroup: use css_set->mg_tasks to track target tasks
during migration") replaced flex array cgroup_taskset->tc_array with
css_set->mg_tasks list to remove process size limit and dynamic
allocation during migration; unfortunately, it incorrectly used list
operations which don't preserve order, breaking the guarantee that
cgroup_taskset_first() returns the leader for a process target.
Fix it by using order preserving list operations. Note that as
multiple src_csets may map to a single dst_cset, the iteration order
may change across cgroup_task_migrate(); however, the leader is still
guaranteed to be the first entry.
The switch to list_splice_tail_init() at the end of cgroup_migrate()
isn't strictly necessary. Let's still do it for consistency.
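For illustration, the difference between the two kinds of list operation
(a generic sketch; the actual field and list names touched by the patch
may differ):

/* head insertion: iteration order is the reverse of insertion order */
list_add(&task->cg_list, &cset->mg_tasks);

/* tail insertion: insertion order is preserved, so the leader, which is
 * added first, is also iterated first */
list_add_tail(&task->cg_list, &cset->mg_tasks);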
Signed-off-by: Tejun Heo <[email protected]>
|
|
We don't set the type (I/O, memory, etc.) of resources added by
__request_region(), which leads to confusing messages like this:
address space collision: [io 0x1000-0x107f] conflicts with ACPI CPU throttle [??? 0x00001010-0x00001015 flags 0x80000000]
Set the type of a new resource added by __request_region() (used by
request_region() and request_mem_region()) to the type of its parent. This
makes the resource tree internally consistent and fixes messages like the
above, where the ACPI CPU throttle resource really is an I/O port region,
but request_region() didn't fill in the type, so %pR didn't know how to
print it.
Sample dmesg showing the issue at the link below.
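A sketch of the kind of change described, assuming the type bits are
simply inherited from the parent resource (not necessarily the exact
upstream hunk):

/* kernel/resource.c: __request_region() (sketch) */
res->name = name;
res->start = start;
res->end = start + n - 1;
res->flags = resource_type(parent);     /* inherit IORESOURCE_IO/MEM/... */
res->flags |= IORESOURCE_BUSY | flags;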
Link: https://bugzilla.kernel.org/show_bug.cgi?id=71611
Reported-by: Paul Bolle <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
This cftype flag makes the file only appear on the default hierarchy.
This will later be used for cgroup.controllers file.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
|
|
cgrp_dfl_root will be used as the default unified hierarchy. This
patch makes cgrp_dfl_root mountable by making the following changes.
* cgroup_init_early() now initializes cgrp_dfl_root w/
CGRP_ROOT_SANE_BEHAVIOR. The default hierarchy is always sane.
* parse_cgroupfs_options() and cgroup_mount() are updated such that
cgrp_dfl_root is mounted if sane_behavior is specified w/o any
subsystems.
* rebind_subsystems() now populates the root directory of
cgrp_dfl_root. Note that the function still guarantees success of
rebinding subsystems to cgrp_dfl_root. If populating fails while
rebinding to cgrp_dfl_root, it whines but ignores the error.
* For backward compatibility, the default hierarchy shows up in
/proc/$PID/cgroup only after it's explicitly mounted so that
userland which doesn't make use of it doesn't see any change.
* "current_css_set_cg_links" file of debug cgroup now treats the
default hierarchy the same as other hierarchies. This is visible to
userland. Given that it's for debug controller, this should be
fine.
* While at it, implement cgroup_on_dfl() which tests whether a given
cgroup is on the default hierarchy or not (see the sketch below).
The above changes make cgrp_dfl_root mostly equivalent to other
controllers but the actual unified hierarchy behaviors are not
implemented yet. Let's plug child cgroup creation in cgrp_dfl_root
from create_cgroup() for now.
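A minimal sketch of the helper mentioned above (assumed form):

static inline bool cgroup_on_dfl(const struct cgroup *cgrp)
{
        return cgrp->root == &cgrp_dfl_root;
}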
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
|
|
cftype->write_string() just passes on the writeable buffer from kernfs
and there's no reason to add a const restriction on the buffer. The
only thing const achieves is unnecessarily complicating parsing of the
buffer. Drop const from @buffer.
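The prototype change amounts to roughly the following (sketch):

/* before */
int (*write_string)(struct cgroup_subsys_state *css, struct cftype *cft,
                    const char *buffer);

/* after */
int (*write_string)(struct cgroup_subsys_state *css, struct cftype *cft,
                    char *buffer);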
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
|
|
The dummy root will be repurposed to serve as the default unified
hierarchy. Let's rename things in preparation.
* s/cgroup_dummy_root/cgrp_dfl_root/
* s/cgroupfs_root/cgroup_root/ as we don't do fs part directly anymore
* s/cgroup_root->top_cgroup/cgroup_root->cgrp/ for brevity
This is pure rename.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
|
|
cgroupfs_root->subsys_mask represents the controllers attached to the
hierarchy. This patch moves the field to cgroup. Subsystem
initialization and rebinding updates the top cgroup's subsys_mask.
For !root cgroups, the subsys_mask bits are set from create_css() and
cleared from kill_css(), which effectively means that all cgroups will
have the same subsys_mask as the top cgroup.
While this doesn't make any difference now, this will help
implementation of the default unified hierarchy where !root cgroups
may have subsets of the top_cgroup's subsys_mask.
While at it, __kill_css() is split out of kill_css(). The former
doesn't care about the subsys_mask, while the latter becomes a noop if
the controller is already killed and otherwise clears the matching bit
before proceeding to kill the css. This will be used later by the
default unified hierarchy implementation.
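A sketch of the split described above (illustrative only; details such as
the mask bookkeeping are simplified):

static void __kill_css(struct cgroup_subsys_state *css)
{
        /* the actual teardown: remove files, kill the base ref, ... */
}

static void kill_css(struct cgroup_subsys_state *css)
{
        struct cgroup *cgrp = css->cgroup;

        /* noop if this controller is already being killed */
        if (!(cgrp->subsys_mask & (1 << css->ss->id)))
                return;

        cgrp->subsys_mask &= ~(1 << css->ss->id);
        __kill_css(css);
}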
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
|
|
Currently, while rebinding, cgroup_dummy_root serves as the anchor
point. In addition to the target root, rebind_subsystems() takes
@added_mask and @removed_mask. The subsystems specified in the former
are expected to be on the dummy root and then moved to the target
root. The ones in the latter are moved from non-dummy root to dummy.
Now that the dummy root is a fully functional one and we're planning
to use it for the default unified hierarchy, this level of distinction
between dummy and non-dummy roots is quite awkward.
This patch updates rebind_subsystems() to take the target root and one
subsystem mask and move the specified subsystems to the target root,
which may or may not be the dummy root. IOW, unbinding now becomes
moving the subsystems to the dummy root and binding becomes moving
them to a non-dummy root.
This makes the dummy root mostly equivalent to other hierarchies in
terms of the mechanism of moving subsystems around; however, we still
retain all the semantic restrictions so that this patch doesn't
introduce any visible behavior differences. Another noteworthy detail
is that rebind_subsystems() guarantees that moving a subsystem to the
dummy root never fails so that valid unmounting attempts always
succeed.
This unifies binding and unbinding of subsystems. The invocation
points of ->bind() were inconsistent between the two and are now moved
to after the whole rebinding is complete. This doesn't break the current
users and generally makes more sense.
All rebind_subsystems() users are converted accordingly. Note that
cgroup_remount() now makes two calls to rebind_subsystems() to bind
and then unbind the requested subsystems.
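Schematically, the remount path then looks like this (a sketch based on
the description above, not the exact code):

ret = rebind_subsystems(root, added_mask);
if (ret)
        goto out_unlock;

/* unbinding == rebinding onto the dummy/default root */
rebind_subsystems(&cgrp_dfl_root, removed_mask);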
This will allow repurposing of the dummy hierarchy as the default
unified hierarchy and shouldn't make any userland visible behavior
difference.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
|
|
cgroup_dummy_root is used to host controllers which aren't attached to
any other hierarchy. The root is minimally set up during kernfs
bootstrap and didn't go through full hierarchy initialization. We're
planning to use cgroup_dummy_root for the default unified hierarchy
and thus want it to be fully functional.
Replace the special initialization, which was collected into
cgroup_init() by the previous patch, with an invocation of
cgroup_setup_root(). This simplifies the init path and makes
cgroup_dummy_root a full hierarchy with its own kernfs_root and all.
As this puts the dummy hierarchy on the cgroup_roots list, rename
for_each_active_root() to for_each_root() and update its users to skip
the dummy root for now.
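Callers that should not see the dummy hierarchy then skip it explicitly,
along these lines (sketch):

for_each_root(root) {
        if (root == &cgrp_dfl_root)
                continue;
        /* ... act on "real" hierarchies only ... */
}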
This patch doesn't cause any userland visible behavior changes at this
point.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
|
|
* Fields of init_css_set and css_set_count are now set using
initializer instead of programmatically from cgroup_init_early().
* init_cgroup_root() now also takes @opts and performs the optional
part of initialization too. The leftover part of
cgroup_root_from_opts() is collapsed into its only caller -
cgroup_mount().
* Initialization of cgroup_root_count and linking of init_css_set are
moved from cgroup_init_early() to cgroup_init(). None of the
early_init users depends on init_css_set being linked.
* Subsystem initializations are moved after dummy hierarchy init and
init_css_set linking.
These changes reorganize the bootstrap logic so that the dummy
hierarchy can share the usual hierarchy init path and be made more
normal. These changes don't make noticeable behavior changes.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
|
|
In cgroup_destroy_locked(), move setting of CGRP_DEAD above
invocations of kill_css(). This doesn't make any visible behavior
difference now but will be used to inhibit manipulating controller
enable states of a dying cgroup on the unified hierarchy.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
|
|
Includes:
- /proc/irq/default_smp_affinity
- /proc/irq/*/affinity_hint
- /proc/irq/*/smp_affinity
- /proc/irq/*/smp_affinity_list
Users can distill the same information by reading /proc/interrupts.
Signed-off-by: Chema Gonzalez <[email protected]>
Cc: Eric Dumazet <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
On Sparc and S390 the removal of irq.h from kernel_stat.h causes:
kernel/softirq.c:774:9: error: 'NR_IRQS_LEGACY' undeclared
Reported-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
If online_css() fails, we should remove cgroup files belonging
to css->ss.
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Allow arches to decide to ignore a probe hit. ARM will use this to
only call handlers if the conditions to execute a conditionally executed
instruction are satisfied.
Signed-off-by: David A. Long <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
|
|
Suggested change from Oleg Nesterov. Fixes incomplete dependencies
for uprobes feature.
Signed-off-by: David A. Long <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Three small fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/clock: Prevent tracing recursion in sched_clock_cpu()
stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy
|
|
The flag is necessary for interrupt chips which require an ACK/EOI
after the handler has run. In case of threaded handlers this needs to
happen after the threaded handler has completed and before the interrupt
is unmasked.
The flag is only useful in combination with the handle_fasteoi_irq
flow control handler.
It can be combined with the flag IRQCHIP_EOI_IF_HANDLED, so the EOI is
not issued when the interrupt is disabled or in progress.
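A sketch of how an irq_chip might combine the two flags. The new flag is
referred to only as "the flag" above; the name IRQCHIP_EOI_THREADED used
below, and the my_chip_* callbacks, are assumptions for illustration:

static struct irq_chip my_chip = {
        .name           = "my-chip",
        .irq_ack        = my_chip_ack,          /* hypothetical callbacks */
        .irq_eoi        = my_chip_eoi,
        .irq_mask       = my_chip_mask,
        .irq_unmask     = my_chip_unmask,
        /* EOI after the threaded handler, but only if it was handled;
         * IRQCHIP_EOI_THREADED is an assumed name for the new flag */
        .flags          = IRQCHIP_EOI_THREADED | IRQCHIP_EOI_IF_HANDLED,
};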
Tested-by: Hans de Goede <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Maxime Ripard <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
This was a debugging measure to toggle enabled/disabled
when testing. But for real production setups, it's not
safe to toggle this setting without either reloading
drivers or quiescing IO first, neither of which the toggle enforces.
Additionally, it makes drivers deal with the conditional
state.
Remove it completely. It's up to the driver whether iopoll
is enabled or not.
Signed-off-by: Jens Axboe <[email protected]>
|
|
When update_rq_clock_task() accounts the pending steal time for a task,
it converts the steal delta from nsecs to ticks and then from ticks back to nsecs.
There is no apparent good reason for doing that though because both
the task clock and the prev steal delta are u64 and store values
in nsecs.
So let's remove the needless conversion.
Cc: Ingo Molnar <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The steal guest time accounting code assumes that cputime_t is based on
jiffies. So when CONFIG_NO_HZ_FULL=y, which implies that cputime_t
is based on nsecs, steal_account_process_tick() passes the delta in
jiffies to account_steal_time() which then accounts it as if it's a
value in nsecs.
As a result, accounting 1 second of steal time (with HZ=100 that would
be 100 jiffies) is spuriously accounted as 100 nsecs.
As such /proc/stat may report 0 values of steal time even when two
guests have run concurrently for a few seconds on the same host and
same CPU.
In order to fix this, let's convert the nsecs based steal delta to
cputime instead of jiffies by using the right conversion API.
Given that the steal time is stored in cputime_t and this type can have
a smaller granularity than nsecs, we only account the rounded converted
value and leave the remaining nsecs for the next deltas.
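A sketch of the conversion described, assuming the nsecs_to_cputime() /
cputime_to_nsecs() helpers (simplified; not necessarily the exact
upstream hunk):

u64 steal;
cputime_t steal_ct;

steal = paravirt_steal_clock(smp_processor_id()) - this_rq()->prev_steal_time;

/* round down to cputime granularity ... */
steal_ct = nsecs_to_cputime(steal);
/* ... and only advance prev_steal_time by what we actually account,
 * leaving the remaining nsecs for the next delta */
this_rq()->prev_steal_time += cputime_to_nsecs(steal_ct);

account_steal_time(steal_ct);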
Reported-by: Huiqingding <[email protected]>
Reported-by: Marcelo Tosatti <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.
This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.
Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.
With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.
Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. We use the letter 'X' as a taint flag character for
a module being loaded that doesn't know how to sign its name (proposed
by Steven Rostedt).
Also add the missing 'O' entry to trace event show_module_flags() list
for the sake of completeness.
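With the new flag, the accept-list check in tracepoint_module_coming()
becomes, roughly:

/* Only trace modules whose taints cannot corrupt struct module layout. */
if (mod->taints & ~((1 << TAINT_OOT_MODULE) | (1 << TAINT_CRAP) |
                    (1 << TAINT_UNSIGNED_MODULE)))
        return 0;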
Signed-off-by: Mathieu Desnoyers <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
NAKed-by: Ingo Molnar <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: David Howells <[email protected]>
CC: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
|
|
When dumping loaded modules, we print them one by one in separate
printks. Let's use pr_cont as they are continuation prints.
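A sketch of the change, assuming the dump happens in print_modules() (the
exact format string is illustrative):

/* before: one printk() per module, each starting a new line */
printk(KERN_DEFAULT " %s%s", mod->name, module_flags(mod, buf));

/* after: continuation print, keeping the whole list on one line */
pr_cont(" %s%s", mod->name, module_flags(mod, buf));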
Signed-off-by: Jiri Slaby <[email protected]>
Cc: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
|
|
Merge the request/release callbacks which are in a separate branch for
consumption by the gpio folks.
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
For certain irq types, e.g. gpios, it's necessary to request resources
before starting up the irq.
This might fail, so we cannot use the irq_startup() callback, because we
might call the irq_set_type() callback before it, which does not make
sense when the resource is not available. Calling irq_startup() before
irq_set_type() can lead to spurious interrupts, which is not desired
either.
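A sketch of the chip callbacks this implies; the names
irq_request_resources/irq_release_resources and their signatures are
assumptions here:

struct irq_chip {
        /* ... existing callbacks ... */
        int     (*irq_request_resources)(struct irq_data *data);       /* assumed name */
        void    (*irq_release_resources)(struct irq_data *data);       /* assumed name */
};

The setup path would then invoke irq_request_resources() before
irq_set_type()/irq_startup(), and the teardown path would invoke
irq_release_resources().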
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jean-Jacques Hiblot <[email protected]>
Cc: Grant Likely <[email protected]>
Cc: [email protected]
Reviewed-by: Linus Walleij <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
OK, so commit:
1d8fe7dc8078 ("locking/mutexes: Unlock the mutex without the wait_lock")
generates this boot warning when CONFIG_DEBUG_MUTEXES=y:
WARNING: CPU: 0 PID: 139 at /usr/src/linux-2.6/kernel/locking/mutex-debug.c:82 debug_mutex_unlock+0x155/0x180() DEBUG_LOCKS_WARN_ON(lock->owner != current)
And that makes sense, because as soon as we release the lock a
new owner can come in...
One would think that !__mutex_slowpath_needs_to_unlock()
implementations suffer the same, but for DEBUG we fall back to
mutex-null.h which has an unconditional 1 for that.
The mutex debug code requires the mutex to be unlocked after
doing the debug checks, otherwise it can find inconsistent
state.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
task_hot() doesn't need the 'sched_domain' parameter, so remove it.
Signed-off-by: Alex Shi <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The tmp value has already been calculated in:
scaled_busy_load_per_task =
(busiest->load_per_task * SCHED_POWER_SCALE) /
busiest->group_power;
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
I decided to run my tests on linux-next, and my wakeup_rt tracer was
broken. After running a bisect, I found that the problem commit was:
linux-next commit c365c292d059
"sched: Consider pi boosting in setscheduler()"
And the reason the wakeup_rt tracer test was failing was because it had
no RT task to trace. I first noticed this when running with
sched_switch event and saw that my RT task still had normal SCHED_OTHER
priority. Looking at the problem commit, I found:
- p->normal_prio = normal_prio(p);
- p->prio = rt_mutex_getprio(p);
With no
+ p->normal_prio = normal_prio(p);
+ p->prio = rt_mutex_getprio(p);
Reading what the commit is supposed to do, I realize that the p->prio
can't be set if the task is boosted with a higher prio, but the
p->normal_prio still needs to be set regardless; otherwise, when the
task is deboosted, it won't get the new priority.
The p->prio has to be set before "check_class_changed()" is called,
otherwise the class won't be changed.
Also added a fix to newprio to include a check for the deadline policy
that was missing. This change was suggested by Juri Lelli.
Signed-off-by: Steven Rostedt <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Juri Lelli <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Describe the return values of tracepoint_probe_register(), including
-ENODEV added by the commit "tracing: Warn if a tracepoint is not set
via debugfs" (Author: Steven Rostedt <[email protected]>).
Link: http://lkml.kernel.org/r/[email protected]
CC: Ingo Molnar <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Andrew Morton <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Describe the @data argument (probe private data).
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 38516ab59fbc "tracing: Let tracepoints have data passed to tracepoint callbacks"
CC: Ingo Molnar <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Andrew Morton <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Fix descriptions of /sys/power/state in the documentation and in
a code comment.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Srivatsa S. Bhat <[email protected]>
Acked-by: Pavel Machek <[email protected]>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Spelling fix.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
With CONFIG_DYNAMIC_FTRACE=n, I see a warning:
kernel/trace/ftrace.c:240:13: warning: 'control_ops_free' defined but not used
static void control_ops_free(struct ftrace_ops *ops)
^
Move that function around to an already existing #ifdef
CONFIG_DYNAMIC_FTRACE block as the function is used solely from the
dynamic function tracing functions.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Slaby <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull audit namespace fixes from Eric Biederman:
"Starting with 3.14-rc1 the audit code is faulty (think oopses and
races) with respect to how it computes the network namespace of which
socket to reply to, and I happened to notice by chance when reading
through the code.
My testing and the automated build bots don't find any problems with
these fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
audit: Update kdoc for audit_send_reply and audit_list_rules_send
audit: Send replies in the proper network namespace.
audit: Use struct net not pid_t to remember the network namespce to reply in
|
|
Add in an extra reschedule in an attempt to avoid getting rescheduled
the moment we've acquired the lock.
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Since we want a task waiting for a mutex_lock() to go to sleep and
reschedule on need_resched() we must be able to abort the
mcs_spin_lock() around the adaptive spin.
Therefore implement a cancelable mcs lock.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Jason Low <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When running workloads that have high contention in mutexes on an 8 socket
machine, mutex spinners would often spin for a long time with no lock owner.
The main reason why this is occurring is in __mutex_unlock_common_slowpath(),
if __mutex_slowpath_needs_to_unlock(), then the owner needs to acquire the
mutex->wait_lock before releasing the mutex (setting lock->count to 1). When
the wait_lock is contended, this delays the mutex from being released.
We should be able to release the mutex without holding the wait_lock.
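Schematically, the reordering described above (a simplified sketch; the
DEBUG_MUTEXES case is handled separately):

/* before: the mutex is only released after wait_lock is acquired */
spin_lock_mutex(&lock->wait_lock, flags);
if (__mutex_slowpath_needs_to_unlock())
        atomic_set(&lock->count, 1);
/* ... wake up a waiter ... */
spin_unlock_mutex(&lock->wait_lock, flags);

/* after: release the mutex first, then take the (possibly contended)
 * wait_lock just to wake up a waiter */
if (__mutex_slowpath_needs_to_unlock())
        atomic_set(&lock->count, 1);
spin_lock_mutex(&lock->wait_lock, flags);
/* ... wake up a waiter ... */
spin_unlock_mutex(&lock->wait_lock, flags);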
Signed-off-by: Jason Low <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The mutex->spin_mlock was introduced in order to ensure that only 1 thread
spins for lock acquisition at a time to reduce cache line contention. When
lock->owner is NULL and the lock->count is still not 1, the spinner(s) will
continually release and obtain the lock->spin_mlock. This can generate
quite a bit of overhead/contention, and also might just delay the spinner
from getting the lock.
This patch modifies the way optimistic spinners are queued by queuing before
entering the optimistic spinning loop as opposed to acquiring before every
call to mutex_spin_on_owner(). So in situations where the spinner requires
a few extra spins before obtaining the lock, there will only be 1 spinner
trying to get the lock and it will avoid the overhead from unnecessarily
unlocking and locking the spin_mlock.
Signed-off-by: Jason Low <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The mutex_can_spin_on_owner() function should also return false if the
task needs to be rescheduled, so that such a task avoids entering the
MCS queue in the first place.
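A sketch of the check being added (simplified):

static inline int mutex_can_spin_on_owner(struct mutex *lock)
{
        int retval = 1;

        /* don't enter the MCS queue if we should reschedule instead */
        if (need_resched())
                return 0;

        /* ... existing owner/on_cpu check ... */
        return retval;
}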
Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The mcs_spinlock code is not meant (or suitable) as a generic locking
primitive, therefore take it away from the normal includes and place
it in kernel/locking/.
This way the locking primitives implemented there can use it as part
of their implementation but we do not risk it getting used
inappropriately.
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Bad idea on -rt:
[ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
[ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130
[ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100
[ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae
[ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0
[ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
[ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
[ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0
[ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
[ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17
[ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf
and since upstream does not mind where we do this, be a bit nicer ...
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Check the number of fair tasks to decide whether we've pulled a task;
rq's nr_running may contain throttled RT tasks.
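Roughly, the change described might look like this (a sketch, not
necessarily the exact upstream hunk):

/* decide based on the fair class's own counter, not rq->nr_running,
 * which may still include throttled RT tasks */
if (this_rq->cfs.h_nr_running && !pulled_task)
        pulled_task = 1;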
Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/1394118975.19290.104.camel@tkhai
Signed-off-by: Ingo Molnar <[email protected]>
|
|
1) Single cpu machine case.
When rq has only RT tasks, but none of them can be picked
because of throttling, we enter an endless loop.
pick_next_task_{dl,rt} return NULL.
In pick_next_task_fair() we permanently go to retry
	if (rq->nr_running != rq->cfs.h_nr_running)
		return RETRY_TASK;
(rq->nr_running is not being decremented when rt_rq becomes
throttled).
There is no chance to unthrottle any rt_rq or to wake a fair task
here, because rq is locked permanently and interrupts are
disabled.
2) In case of SMP this can cause a hang too. Although we unlock
rq in idle_balance(), interrupts are still disabled.
The solution is to check for available tasks in the DL and RT
classes instead of checking the sum.
Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/1394098321.19290.11.camel@tkhai
Signed-off-by: Ingo Molnar <[email protected]>
|