Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
Currently both the wakeup and irqsoff traces do not handle set_graph_notrace
well. The ftrace infrastructure will ignore the return paths of all
functions leaving them hanging without an end:
# echo '*spin*' > set_graph_notrace
# cat trace
[...]
_raw_spin_lock() {
preempt_count_add() {
do_raw_spin_lock() {
update_rq_clock();
Where the '*spin*' functions should have looked like this:
_raw_spin_lock() {
preempt_count_add();
do_raw_spin_lock();
}
update_rq_clock();
Instead, have the wakeup and irqsoff tracers ignore the functions that are
set by the set_graph_notrace like the function_graph tracer does. Move
the logic in the function_graph tracer into a header to allow wakeup and
irqsoff tracers to use it as well.
Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Both the wakeup and irqsoff tracers can use the function graph tracer when
the display-graph option is set. The problem is that they ignore the notrace
file, and record the entry of functions that would be ignored by the
function_graph tracer. This causes the trace->depth to be recorded into the
ring buffer. The set_graph_notrace uses a trick by adding a large negative
number to the trace->depth when a graph function is to be ignored.
On trace output, the graph function uses the depth to record a stack of
functions. But since the depth is negative, it accesses the array with a
negative number and causes an out of bounds access that can cause a kernel
oops or corrupt data.
Have the print functions handle cases where a tracer still records functions
even when they are in set_graph_notrace.
Also add warnings if the depth is below zero before accessing the array.
Note, the function graph logic will still prevent the return of these
functions from being recorded, which means that they will be left hanging
without a return. For example:
# echo '*spin*' > set_graph_notrace
# echo 1 > options/display-graph
# echo wakeup > current_tracer
# cat trace
[...]
_raw_spin_lock() {
preempt_count_add() {
do_raw_spin_lock() {
update_rq_clock();
Where it should look like:
_raw_spin_lock() {
preempt_count_add();
do_raw_spin_lock();
}
update_rq_clock();
Cc: stable@vger.kernel.org
Cc: Namhyung Kim <namhyung.kim@lge.com>
Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Instead of using get_user_pages_fast() and kmap_atomic() when writing
to the trace_marker file, just allocate enough space on the ring buffer
directly, and write into it via copy_from_user().
Writing into the trace_marker file use to allocate a temporary buffer
to perform the copy_from_user(), as we didn't want to write into the
ring buffer if the copy failed. But as a trace_marker write is suppose
to be extremely fast, and allocating memory causes other tracepoints to
trigger, Peter Zijlstra suggested using get_user_pages_fast() and
kmap_atomic() to keep the user space pages in memory and reading it
directly. But Henrik Austad had issues with this because it required taking
the mm->mmap_sem and causing long delays with the write.
Instead, just allocate the space in the ring buffer and use
copy_from_user() directly. If it faults, return -EFAULT and write
"<faulted>" into the ring buffer.
Link: http://lkml.kernel.org/r/20161208124018.72dd0f86@gandalf.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Henrik Austad <henrik@austad.us>
Cc: Peter Zijlstra <peterz@infradead.org>
Updates: d696b58ca2c3ca "tracing: Do not allocate buffer for trace_marker"
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The trace event start up selftests fails when the trace benchmark is
enabled, because it is disabled during boot. It really only needs to be
disabled before scheduling is set up, as it creates a thread.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
If one of the events within a system fails to enable when "1" is written
to the system "enable" file, it should return an error. Note, some events
may still be enabled, but the user should know that something did go wrong.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Trace events are enabled very early on boot up via the boot command line
parameter. The benchmark tool creates a new thread to perform the trace
event benchmarking. But at start up, it is called before scheduling is set
up and because it creates a new thread before the init thread is created,
this crashes the kernel.
Have the benchmark fail to register when started via the kernel command
line.
Also, since the registering of a tracepoint now can handle failure cases,
return -ENOMEM instead of warning if the thread cannot be created.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Some tracepoints have a registration function that gets enabled when the
tracepoint is enabled. There may be cases that the registraction function
must fail (for example, can't allocate enough memory). In this case, the
tracepoint should also fail to register, otherwise the user would not know
why the tracepoint is not working.
Cc: David Howells <dhowells@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The resume code must deal with a clocksource delta which is potentially big
enough to overflow the 64bit mult.
Replace the open coded handling with the proper function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Liav Rehana <liavr@mellanox.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.921674404@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
cycle_t is defined as u64, so casting it to u64 is a pointless and
confusing exercise. cycle_t should simply go away and be replaced with a
plain u64 to avoid further confusion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Liav Rehana <liavr@mellanox.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.844699737@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Propagating a unsigned value through signed variables and functions makes
absolutely no sense and is just prone to (re)introduce subtle signed
vs. unsigned issues as happened recently.
Clean it up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Liav Rehana <liavr@mellanox.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.765843099@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:
s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:
- Time jumps backwards
- __iter_div_u64_rem() which is used in one of the calling code pathes
will take forever to piecewise calculate the seconds/nanoseconds part.
This has been reported by several people with different scenarios:
David observed that when stopping a VM with a debugger:
"It was essentially the stopped by debugger case. I forget exactly why,
but the guest was being explicitly stopped from outside, it wasn't just
scheduling lag. I think it was something in the vicinity of 10 minutes
stopped."
When lifting the stop the machine went dead.
The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.
But this was also observed on a live system by Liav:
"When the OS is too overloaded, delta will get a high enough value for the
msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
after the shift the nsec variable will gain a value similar to
0xffffffffff000000."
Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
timekeeping_get_ns()").
Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:
s64 nsec;
- nsec = (d * m + n) >> s:
+ nsec = d * m + n;
+ nsec >>= s;
d being type of cycle_t adds another level of obfuscation.
This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.
There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.
Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.
Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.
Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.
This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.
It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.
I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.
Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This patch allows XDP prog to extend/remove the packet
data at the head (like adding or removing header). It is
done by adding a new XDP helper bpf_xdp_adjust_head().
It also renames bpf_helper_changes_skb_data() to
bpf_helper_changes_pkt_data() to better reflect
that XDP prog does not work on skb.
This patch adds one "xdp_adjust_head" bit to bpf_prog for the
XDP-capable driver to check if the XDP prog requires
bpf_xdp_adjust_head() support. The driver can then decide
to error out during XDP_SETUP_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commmits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
and 484611357c19 ("bpf: allow access into map value arrays") by themselves
are correct, but in combination they make state equivalence ignore 'id' field
of the register state which can lead to accepting invalid program.
Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
kthread_create_on_cpu() sets KTHREAD_IS_PER_CPU and kthread->cpu, this
only makes sense if this kthread can be parked/unparked by cpuhp code.
kthread workers never call kthread_parkme() so this has no effect.
Change __kthread_create_worker() to simply call kthread_bind(task, cpu).
The very fact that kthread_create_on_cpu() doesn't accept a generic fmt
shows that it should not be used outside of smpboot.c.
Now, the only reason we can not unexport this helper and move it into
smpboot.c is that it sets kthread->cpu and struct kthread is not exported.
And the only reason we can not kill kthread->cpu is that kthread_unpark()
is used by drivers/gpu/drm/amd/scheduler/gpu_scheduler.c and thus we can
not turn _unpark into kthread_unpark(struct smp_hotplug_thread *, cpu).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175110.GA5342@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Now that to_kthread() is always validm change kthread_park() and
kthread_unpark() to use it and kill to_live_kthread().
The conversion of kthread_unpark() is trivial. If KTHREAD_IS_PARKED is set
then the task has called complete(&self->parked) and there the function
cannot race against a concurrent kthread_stop() and exit.
kthread_park() is more tricky, because its semantics are not well
defined. It returns -ENOSYS if the thread exited but this can never happen
and as Roman pointed out kthread_park() can obviously block forever if it
would race with the exiting kthread.
The usage of kthread_park() in cpuhp code (cpu.c, smpboot.c, stop_machine.c)
is fine. It can never see an exiting/exited kthread, smpboot_destroy_threads()
clears *ht->store, smpboot_park_thread() checks it is not NULL under the same
smpboot_threads_lock. cpuhp_threads and cpu_stop_threads never exit, so other
callers are fine too.
But it has two more users:
- watchdog_park_threads():
The code is actually correct, get_online_cpus() ensures that
kthread_park() can't race with itself (note that kthread_park() can't
handle this race correctly), but it should not use kthread_park()
directly.
- drivers/gpu/drm/amd/scheduler/gpu_scheduler.c should not use
kthread_park() either.
kthread_park() must not be called after amd_sched_fini() which does
kthread_stop(), otherwise even to_live_kthread() is not safe because
task_struct can be already freed and sched->thread can point to nowhere.
The usage of kthread_park/unpark should either be restricted to core code
which is properly protected against the exit race or made more robust so it
is safe to use it in drivers.
To catch eventual exit issues, add a WARN_ON(PF_EXITING) for now.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175107.GA5339@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
kthread_stop() had to use to_live_kthread() simply because it was not
possible to access kthread->exited after the exiting task clears
task_struct->vfork_done. Now that to_kthread() is always valid,
wake_up_process() + wait_for_completion() can be done
ununconditionally. It's not an issue anymore if the task has already issued
complete_vfork_done() or died.
The exiting task can get the spurious wakeup after mm_release() but this is
possible without this change too and is fine; do_task_dead() ensures that
this can't make any harm.
As a further enhancement this could be converted to task_work_add() later,
so ->vfork_done can be avoided completely.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175103.GA5336@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
to_live_kthread() function"
This reverts commit 23196f2e5f5d810578a772785807dcdc2b9fdce9.
Now that struct kthread is kmalloc'ed and not longer on the task stack
there is no need anymore to pin the stack.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175100.GA5333@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
commit 23196f2e5f5d "kthread: Pin the stack via try_get_task_stack() /
put_task_stack() in to_live_kthread() function" is a workaround for the
fragile design of struct kthread being allocated on the task stack.
struct kthread in its current form should be removed, but this needs
cleanups outside of kthread.c.
As a first step move struct kthread away from the task stack by making it
kmalloc'ed. This allows to access kthread.exited without the magic of
trying to pin task stack and the try logic in to_live_kthread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chunming Zhou <David1.Zhou@amd.com>
Cc: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161129175057.GA5330@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
notifiers when HOTPLUG_CPU=y while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might keep a stale notifier on the list on the manual clean up during
the pool tear down and thus corrupt the list. Resulting in the following
[ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[ 144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
<snipped>
[ 145.122628] Call Trace:
[ 145.125086] [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
[ 145.131350] [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
[ 145.137268] [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
[ 145.143188] [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
[ 145.149018] [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
[ 145.154940] [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
[ 145.160511] [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
[ 145.167035] [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
[ 145.172694] [<ffffffffa290848d>] module_attr_store+0x1d/0x30
[ 145.178443] [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
[ 145.183925] [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
[ 145.189761] [<ffffffffa2a99248>] __vfs_write+0x18/0x40
[ 145.194982] [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
[ 145.200122] [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
[ 145.205177] [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
This can be even triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.
Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.
Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In __sanitizer_cov_trace_pc we use task_struct and fields within it, but
as we haven't included <linux/sched.h>, it is not guaranteed to be
defined. While we usually happen to acquire the definition through a
transitive include, this is fragile (and hasn't been true in the past,
causing issues with backports).
Include <linux/sched.h> to avoid any fragility.
[mark.rutland@arm.com: rewrote changelog]
Link: http://lkml.kernel.org/r/1481007384-27529-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"An autogroup nice level adjustment bug fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/autogroup: Fix 64-bit kernel nice level adjustment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"A bogus warning fix, a counter width handling fix affecting certain
machines, plus a oneliner hw-enablement patch for Knights Mill CPUs"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Remove invalid warning from list_update_cgroup_even()t
perf/x86: Fix full width counter, counter overflow
perf/x86/intel: Enable C-state residency events for Knights Mill
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two rtmutex race fixes (which miraculously never triggered, that we
know of), plus two lockdep printk formatting regression fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Fix report formatting
locking/rtmutex: Use READ_ONCE() in rt_mutex_owner()
locking/rtmutex: Prevent dequeue vs. unlock race
locking/selftest: Fix output since KERN_CONT changes
|
|
General assumption is that single program can hold up to BPF_MAXINSNS,
that is, 4096 number of instructions. It is the case with cBPF and
that limit was carried over to eBPF. When recently testing digest, I
noticed that it's actually not possible to feed 4096 instructions
via bpf(2).
The check for > BPF_MAXINSNS was added back then to bpf_check() in
cbd357008604 ("bpf: verifier (add ability to receive verification log)").
However, 09756af46893 ("bpf: expand BPF syscall with program load/unload")
added yet another check that comes before that into bpf_prog_load(),
but this time bails out already in case of >= BPF_MAXINSNS.
Fix it up and perform the check early in bpf_prog_load(), so we can drop
the second one in bpf_check(). It makes sense, because also a 0 insn
program is useless and we don't want to waste any resources doing work
up to bpf_check() point. The existing bpf(2) man page documents E2BIG
as the official error for such cases, so just stick with it as well.
Fixes: 09756af46893 ("bpf: expand BPF syscall with program load/unload")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The CPSW CPTS driver is capable of doing timestamping on tx/rx packets and
requires to know mult and shift factors for timestamp conversion from raw
value to nanoseconds (ptp clock). Now these mult and shift factors are
calculated manually and provided through DT, which makes very hard to
support of a lot number of platforms, especially if CPTS refclk is not the
same for some kind of boards and depends on efuse settings (Keystone 2
platforms). Hence, export clocks_calc_mult_shift() to allow drivers like
CPSW CPTS (and other ptp drivesr) to benefit from automaitc calculation of
mult and shift factors.
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before commit b32614c03413 ("tracing/rb: Convert to hotplug state machine")
the allocated cpumask was initialized to the mask of online or possible
CPUs. After the CPU hotplug changes the buffer initialization moved to
trace_rb_cpu_prepare() but the cpumask is allocated with alloc_cpumask()
and therefor has random content. As a consequence the cpu buffers are not
initialized and a later access dereferences a NULL pointer.
Use zalloc_cpumask() instead so trace_rb_cpu_prepare() initializes the
buffers properly.
Fixes: b32614c03413 ("tracing/rb: Convert to hotplug state machine")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20161207133133.hzkcqfllxcdi3joz@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Since commit:
4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
printk() requires KERN_CONT to continue log messages. Lots of printk()
in lockdep.c and print_ip_sym() don't have it. As the result lockdep
reports are completely messed up.
Add missing KERN_CONT and inline print_ip_sym() where necessary.
Example of a messed up report:
0-rc5+ #41 Not tainted
-------------------------------------------------------
syz-executor0/5036 is trying to acquire lock:
(
rtnl_mutex
){+.+.+.}
, at:
[<ffffffff86b3d6ac>] rtnl_lock+0x1c/0x20
but task is already holding lock:
(
&net->packet.sklist_lock
){+.+...}
, at:
[<ffffffff873541a6>] packet_diag_dump+0x1a6/0x1920
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3
(
&net->packet.sklist_lock
+.+...}
...
Without this patch all scripts that parse kernel bug reports are broken.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andreyknvl@google.com
Cc: aryabinin@virtuozzo.com
Cc: joe@perches.com
Cc: syzkaller@googlegroups.com
Link: http://lkml.kernel.org/r/1480343083-48731-1-git-send-email-dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The warning introduced in commit:
864c2357ca89 ("perf/core: Do not set cpuctx->cgrp for unscheduled cgroups")
assumed that a cgroup switch always precedes list_del_event. This is
not the case. Remove warning.
Make sure that cpuctx->cgrp is NULL until a cgroup event is sched in
or ctx->nr_cgroups == 0.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1480841177-27299-1-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When loading a BPF program via bpf(2), calculate the digest over
the program's instruction stream and store it in struct bpf_prog's
digest member. This is done at a point in time before any instructions
are rewritten by the verifier. Any unstable map file descriptor
number part of the imm field will be zeroed for the hash.
fdinfo example output for progs:
# cat /proc/1590/fdinfo/5
pos: 0
flags: 02000002
mnt_id: 11
prog_type: 1
prog_jited: 1
prog_digest: b27e8b06da22707513aa97363dfb11c7c3675d28
memlock: 4096
When programs are pinned and retrieved by an ELF loader, the loader
can check the program's digest through fdinfo and compare it against
one that was generated over the ELF file's program section to see
if the program needs to be reloaded. Furthermore, this can also be
exposed through other means such as netlink in case of a tc cls/act
dump (or xdp in future), but also through tracepoints or other
facilities to identify the program. Other than that, the digest can
also serve as a base name for the work in progress kallsyms support
of programs. The digest doesn't depend/select the crypto layer, since
we need to keep dependencies to a minimum. iproute2 will get support
for this facility.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
copy_from_iter_full(), copy_from_iter_full_nocache() and
csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
et.al., advancing iterator only in case of successful full copy
and returning whether it had been successful or not.
Convert some obvious users. *NOTE* - do not blindly assume that
something is a good candidate for those unless you are sure that
not advancing iov_iter in failure case is the right thing in
this case. Anything that does short read/short write kind of
stuff (or is in a loop, etc.) is unlikely to be a good one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Occasionally, clang (e.g. version 3.8.1) translates a sum between two
constant operands using a BPF_OR instead of a BPF_ADD. The verifier is
currently not handling this scenario, and the destination register type
becomes UNKNOWN_VALUE even if it's still storing a constant. As a result,
the destination register cannot be used as argument to a helper function
expecting a ARG_CONST_STACK_*, limiting some use cases.
Modify the verifier to handle this case, and add a few tests to make sure
all combinations are supported, and stack boundaries are still verified
even with BPF_OR.
Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Couple conflicts resolved here:
1) In the MACB driver, a bug fix to properly initialize the
RX tail pointer properly overlapped with some changes
to support variable sized rings.
2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
overlapping with a reorganization of the driver to support
ACPI, OF, as well as PCI variants of the chip.
3) In 'net' we had several probe error path bug fixes to the
stmmac driver, meanwhile a lot of this code was cleaned up
and reorganized in 'net-next'.
4) The cls_flower classifier obtained a helper function in
'net-next' called __fl_delete() and this overlapped with
Daniel Borkamann's bug fix to use RCU for object destruction
in 'net'. It also overlapped with Jiri's change to guard
the rhashtable_remove_fast() call with a check against
tc_skip_sw().
5) In mlx4, a revert bug fix in 'net' overlapped with some
unrelated changes in 'net-next'.
6) In geneve, a stale header pointer after pskb_expand_head()
bug fix in 'net' overlapped with a large reorganization of
the same code in 'net-next'. Since the 'net-next' code no
longer had the bug in question, there was nothing to do
other than to simply take the 'net-next' hunks.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Lots more phydev and probe error path leaks in various drivers by
Johan Hovold.
2) Fix race in packet_set_ring(), from Philip Pettersson.
3) Use after free in dccp_invalid_packet(), from Eric Dumazet.
4) Signnedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric
Dumazet.
5) When tunneling between ipv4 and ipv6 we can be left with the wrong
skb->protocol value as we enter the IPSEC engine and this causes all
kinds of problems. Set it before the output path does any
dst_output() calls, from Eli Cooper.
6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from
Florian Fainelli.
7) Various netfilter nat bug fixes from FLorian Westphal.
8) Fix memory leak in ipvlan_link_new(), from Gao Feng.
9) Locking fixes, particularly wrt. socket lookups, in l2tp from
Guillaume Nault.
10) Avoid invoking rhash teardowns in atomic context by moving netlink
cb->done() dump completion from a worker thread. Fix from Herbert
Xu.
11) Buffer refcount problems in tun and macvtap on errors, from Jason
Wang.
12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user
selects BBR. Fix from Julian Wollrath.
13) Fix deadlock in transmit path on altera TSE driver, from Lino
Sanfilippo.
14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita
Yushchenko.
15) tc_tunnel_key needs to be properly exported to userspace via uapi,
fix from Roi Dayan.
16) rds_tcp_init_net() doesn't unregister notifier in error path, fix
from Sowmini Varadhan.
17) Stale packet header pointer access after pskb_expand_head() in
genenve driver, fix from Sabrina Dubroca.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
geneve: avoid use-after-free of skb->data
tipc: check minimum bearer MTU
net: renesas: ravb: unintialized return value
sh_eth: remove unchecked interrupts for RZ/A1
net: bcmgenet: Utilize correct struct device for all DMA operations
NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
cdc_ether: Fix handling connection notification
ip6_offload: check segs for NULL in ipv6_gso_segment.
RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
ipv6: Set skb->protocol properly for local output
ipv4: Set skb->protocol properly for local output
packet: fix race condition in packet_set_ring
net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler
net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers
net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
net: ethernet: stmmac: platform: fix outdated function header
net: ethernet: stmmac: dwmac-meson8b: fix probe error path
net: ethernet: stmmac: dwmac-generic: fix probe error path
...
|
|
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.
This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Code move and rename only; no functional change intended.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Registers new BPF program types which correspond to the LWT hooks:
- BPF_PROG_TYPE_LWT_IN => dst_input()
- BPF_PROG_TYPE_LWT_OUT => dst_output()
- BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()
The separate program types are required to differentiate between the
capabilities each LWT hook allows:
* Programs attached to dst_input() or dst_output() are restricted and
may only read the data of an skb. This prevent modification and
possible invalidation of already validated packet headers on receive
and the construction of illegal headers while the IP headers are
still being assembled.
* Programs attached to lwtunnel_xmit() are allowed to modify packet
content as well as prepending an L2 header via a newly introduced
helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is
invoked after the IP header has been assembled completely.
All BPF programs receive an skb with L3 headers attached and may return
one of the following error codes:
BPF_OK - Continue routing as per nexthop
BPF_DROP - Drop skb and return EPERM
BPF_REDIRECT - Redirect skb to device as per redirect() helper.
(Only valid in lwtunnel_xmit() context)
The return codes are binary compatible with their TC_ACT_
relatives to ease compatibility.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rt_mutex_proxy_unlock()/init_proxy_locked()
While debugging the unlock vs. dequeue race which resulted in state
corruption of futexes the lockless nature of rt_mutex_proxy_unlock()
caused some confusion.
Add commentry to explain why it is safe to do this lockless. Add matching
comments to rt_mutex_init_proxy_locked() for completeness sake.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20161130210030.591941927@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This is a left over from the original rtmutex implementation which used
both bit0 and bit1 in the owner pointer. Commit:
8161239a8bcc ("rtmutex: Simplify PI algorithm and make highest prio task get lock")
... removed the usage of bit1, but kept the extra mask around. This is
confusing at best.
Remove it and just use RT_MUTEX_HAS_WAITERS for the masking.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20161130210030.509567906@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
While debugging the rtmutex unlock vs. dequeue race Will suggested to use
READ_ONCE() in rt_mutex_owner() as it might race against the
cmpxchg_release() in unlock_rt_mutex_safe().
Will: "It's a minor thing which will most likely not matter in practice"
Careful search did not unearth an actual problem in todays code, but it's
better to be safe than surprised.
Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
David reported a futex/rtmutex state corruption. It's caused by the
following problem:
CPU0 CPU1 CPU2
l->owner=T1
rt_mutex_lock(l)
lock(l->wait_lock)
l->owner = T1 | HAS_WAITERS;
enqueue(T2)
boost()
unlock(l->wait_lock)
schedule()
rt_mutex_lock(l)
lock(l->wait_lock)
l->owner = T1 | HAS_WAITERS;
enqueue(T3)
boost()
unlock(l->wait_lock)
schedule()
signal(->T2) signal(->T3)
lock(l->wait_lock)
dequeue(T2)
deboost()
unlock(l->wait_lock)
lock(l->wait_lock)
dequeue(T3)
===> wait list is now empty
deboost()
unlock(l->wait_lock)
lock(l->wait_lock)
fixup_rt_mutex_waiters()
if (wait_list_empty(l)) {
owner = l->owner & ~HAS_WAITERS;
l->owner = owner
==> l->owner = T1
}
lock(l->wait_lock)
rt_mutex_unlock(l) fixup_rt_mutex_waiters()
if (wait_list_empty(l)) {
owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
===> Success (l->owner = NULL)
l->owner = owner
==> l->owner = T1
}
That means the problem is caused by fixup_rt_mutex_waiters() which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutexes rbtree.
This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation then the previous owner, which just unlocked
the rtmutex is set as the owner again when the write takes place after the
successfull cmpxchg().
The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.
It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
problem exists there as well. That refusal might explain that this got not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.
Thanks to David for meticulously instrumenting the code and providing the
information which allowed to decode this subtle problem.
Reported-by: David Daney <ddaney@caviumnetworks.com>
Tested-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Install the callbacks via the state machine. The notifier in struct
ring_buffer is replaced by the multi instance interface. Upon
__ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke
the trace_rb_cpu_prepare() on each CPU.
This callback may now fail. This means __ring_buffer_alloc() will fail and
cleanup (like previously) and during a CPU up event this failure will not
allow the CPU to come up.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161126231350.10321-7-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
netlink kernel socket is protected by refcount, not RCU.
Its rcv path is neither protected by RCU. So the synchronize_net()
is just pointless.
Cc: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alarm timers are one of the mechanisms to wake up a system from suspend,
but there exist no tracepoints to analyse which process/thread armed an
alarmtimer.
Add tracepoints for start/cancel/expire of individual alarm timers and one
for tracing the suspend time decision when to resume the system.
The following trace excerpt illustrates the new mechanism:
Binder:3292_2-3304 [000] d..2 149.981123: alarmtimer_cancel:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325463120000000000 now:1325376810370370245
Binder:3292_2-3304 [000] d..2 149.981136: alarmtimer_start:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325376840000000000 now:1325376810370384591
Binder:3292_9-3953 [000] d..2 150.212991: alarmtimer_cancel:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179552000000 now:150154008122
Binder:3292_9-3953 [000] d..2 150.213006: alarmtimer_start:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179551000000 now:150154025622
system_server-3000 [002] ...1 162.701940: alarmtimer_suspend:
alarmtimer type:REALTIME expires:1325376840000000000
The wakeup time which is selected at suspend time allows to map it back to
the task arming the timer: Binder:3292_2.
[ tglx: Store alarm timer expiry time instead of some useless RTC relative
information, add proper type information for wakeups which are
handled via the clock_nanosleep/freezer and massage the changelog. ]
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Link: http://lkml.kernel.org/r/1480372524-15181-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|