aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2013-11-06perf: Fix arch_perf_out_copy_user defaultPeter Zijlstra1-9/+26
The arch_perf_output_copy_user() default of __copy_from_user_inatomic() returns bytes not copied, while all other argument functions given DEFINE_OUTPUT_COPY() return bytes copied. Since copy_from_user_nmi() is the odd duck out by returning bytes copied where all other *copy_{to,from}* functions return bytes not copied, change it over and ammend DEFINE_OUTPUT_COPY() to expect bytes not copied. Oddly enough DEFINE_OUTPUT_COPY() already returned bytes not copied while expecting its worker functions to return bytes copied. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: will.deacon@arm.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131030201622.GR16117@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Update a stale commentPeter Zijlstra1-2/+2
Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-9s5mze78gmlz19agt39i8rii@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Optimize perf_output_begin() -- address calculationPeter Zijlstra1-7/+7
Rewrite the handle address calculation code to be clearer. Saves 8 bytes on x86_64-defconfig. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-3trb2n2henb9m27tncef3ag7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Optimize perf_output_begin() -- lost_event casePeter Zijlstra1-5/+8
Avoid touching the lost_event and sample_data cachelines twince. Its not like we end up doing less work, but it might help to keep all accesses to these cachelines in one place. Due to code shuffle, this looses 4 bytes on x86_64-defconfig. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-zfxnc58qxj0eawdoj31hhupv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Optimize perf_output_begin()Peter Zijlstra1-8/+9
There's no point in re-doing the memory-barrier when we fail the cmpxchg(). Also placing it after the space reservation loop makes it clearer it only separates the userpage->tail read from the data stores. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-c19u6egfldyx86tpyc3zgkw9@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Add unlikely() to the ring-buffer codePeter Zijlstra1-8/+8
Add unlikely() annotations to 'slow' paths: When having a sampling event but no output buffer; you have bigger issues -- also the bail is still faster than actually doing the work. When having a sampling event but a control page only buffer, you have bigger issues -- again the bail is still faster than actually doing work. Optimize for the case where you're not loosing events -- again, not doing the work is still faster but make sure that when you have to actually do work its as fast as possible. The typical watermark is 1/2 the buffer size, so most events will not take this path. Shrinks perf_output_begin() by 16 bytes on x86_64-defconfig. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-wlg3jew3qnutm8opd0hyeuwn@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Simplify the ring-buffer codePeter Zijlstra1-33/+4
By using CIRC_SPACE() we can obviate the need for perf_output_space(). Shrinks the size of perf_output_begin() by 17 bytes on x86_64-defconfig. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-vtb0xb0llebmsdlfn1v5vtfj@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06locking: Move the percpu-rwsem code to kernel/locking/Peter Zijlstra2-1/+167
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-52bjmtty46we26hbfd9sc9iy@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06locking: Move the lglocks code to kernel/locking/Peter Zijlstra3-2/+2
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-amd6pg1mif6tikbyktfvby3y@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06locking: Move the rwsem code to kernel/locking/Peter Zijlstra5-2/+593
Notably: changed lib/rwsem* targets from lib- to obj-, no idea about the ramifications of that. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-g0kynfh5feriwc6p3h6kpbw6@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06locking: Move the rtmutex code to kernel/locking/Peter Zijlstra10-5/+5
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-p9ijt8div0hwldexwfm4nlhj@git.kernel.org [ Fixed build failure in kernel/rcu/tree_plugin.h. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06hung_task: add method to reset detectorMarcelo Tosatti1-0/+11
In certain occasions it is possible for a hung task detector positive to be false: continuation from a paused VM, for example. Add a method to reset detection, similar as is done with other kernel watchdogs. Acked-by: Don Zickus <dzickus@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-11-06locking: Move the semaphore core to kernel/locking/Peter Zijlstra3-2/+2
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-vmw5sf6vzmua1z6nx1cg69h2@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06locking: Move the spinlock code to kernel/locking/Peter Zijlstra4-3/+306
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-b81ol0z3mon45m51o131yc9j@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06locking: Move the lockdep code to kernel/locking/Peter Zijlstra6-6/+6
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-wl7s3tta5isufzfguc23et06@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06locking: Move the mutex code to kernel/locking/Peter Zijlstra6-4/+11
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-1ditvncg30dgbpvrz2bxfmke@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ ↵Ingo Molnar23-804/+2264
file move Conflicts: kernel/Makefile There are conflicts in kernel/Makefile due to file moving in the scheduler tree - resolve them. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06sched: Move completion code from core.c to completion.cPeter Zijlstra3-285/+300
Completions already have their own header file: linux/completion.h Move the implementation out of kernel/sched/core.c and into its own file: kernel/sched/completion.c. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-x2y49rmxu5dljt66ai2lcfuw@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06sched: Move wait code from core.c to wait.cPeter Zijlstra2-105/+105
For some reason only the wait part of the wait api lives in kernel/sched/wait.c and the wake part still lives in kernel/sched/core.c; ammend this. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-ftycee88naznulqk7ei5mbci@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06sched: Move wait.c into kernel/sched/Peter Zijlstra3-1/+2
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-5q5yqvdaen0rmapwloeaotx3@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06Merge branch 'core/rcu' into core/locking, to prepare the kernel/locking/ ↵Ingo Molnar13-103/+251
file move There are conflicts in lockdep.c due to RCU changes, and also the RCU tree changes kernel/Makefile - so pre-merge it to ease the moving of locking related .c files to kernel/locking/. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06Merge tag 'v3.12' into core/locking to pick up mutex upatesIngo Molnar17-66/+247
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-05tracing: Add support for SOFT_DISABLE to syscall eventsTom Zanussi2-12/+34
The original SOFT_DISABLE patches didn't add support for soft disable of syscall events; this adds it. Add an array of ftrace_event_file pointers indexed by syscall number to the trace array and remove the existing enabled bitmaps, which as a result are now redundant. The ftrace_event_file structs in turn contain the soft disable flags we need for per-syscall soft disable accounting. Adding ftrace_event_files also means we can remove the USE_CALL_FILTER bit, thus enabling multibuffer filter support for syscall events. Link: http://lkml.kernel.org/r/6e72b566e85d8df8042f133efbc6c30e21fb017e.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05tracing: Make register/unregister_ftrace_command __initTom Zanussi2-4/+12
register/unregister_ftrace_command() are only ever called from __init functions, so can themselves be made __init. Also make register_snapshot_cmd() __init for the same reason. Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05tracing: Update event filters for multibufferTom Zanussi12-91/+239
The trace event filters are still tied to event calls rather than event files, which means you don't get what you'd expect when using filters in the multibuffer case: Before: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 2048 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 Setting the filter in tracing/instances/test1/events shouldn't affect the same event in tracing/events as it does above. After: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 We'd like to just move the filter directly from ftrace_event_call to ftrace_event_file, but there are a couple cases that don't yet have multibuffer support and therefore have to continue using the current event_call-based filters. For those cases, a new USE_CALL_FILTER bit is added to the event_call flags, whose main purpose is to keep the old behavior for those cases until they can be updated with multibuffer support; at that point, the USE_CALL_FILTER flag (and the new associated call_filter_check_discard() function) can go away. The multibuffer support also made filter_current_check_discard() redundant, so this change removes that function as well and replaces it with filter_check_discard() (or call_filter_check_discard() as appropriate). Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05ftrace: Have control op function callback only trace when RCU is watchingSteven Rostedt (Red Hat)1-0/+9
Dave Jones reported that trinity would be able to trigger the following back trace: =============================== [ INFO: suspicious RCU usage. ] 3.10.0-rc2+ #38 Not tainted ------------------------------- include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 1 lock held by trinity-child1/18786: #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310 stack backtrace: CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38 0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8 ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8 0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48 Call Trace: [<ffffffff816e2f6b>] dump_stack+0x19/0x1b [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310 [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310 [<ffffffff81309289>] ? __const_udelay+0x29/0x30 [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0 [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0 [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90 [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0 [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0 [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20 [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0 [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210 [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20 [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470 [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210 [<ffffffff816f4000>] ftrace_call+0x5/0x2f [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210 [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40 [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40 [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470 [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0 [<ffffffff81103673>] rcu_user_enter+0x13/0x20 [<ffffffff8114541a>] user_enter+0x6a/0xd0 [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140 [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d ------------[ cut here ]------------ Perf uses rcu_read_lock() but as the function tracer can trace functions even when RCU is not currently active, this makes the rcu_read_lock() used by perf ineffective. As perf is currently the only user of the ftrace_ops_control_func() and perf is also the only function callback that actively uses rcu_read_lock(), the quick fix is to prevent the ftrace_ops_control_func() from calling its callbacks if RCU is not active. With Paul's new "rcu_is_watching()" we can tell if RCU is active or not. Reported-by: Dave Jones <davej@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05rcu: Do not trace rcu_is_watching() functionsSteven Rostedt2-3/+3
As perf uses the rcu_read_lock() primitives for recording into its ring buffer, perf tracing can not be called when RCU in inactive. With the perf function tracing, there are functions that can be traced when RCU is not active, and perf must not have its function callback called when this is the case. Luckily, Paul McKenney has created a way to detect when RCU is active or not with the rcu_is_watching() function. Unfortunately, this function can also be traced, and if that happens it can cause a bit of overhead for the perf function calls that do the check. Recursion protection prevents anything bad from happening, but there is a bit of added overhead for every function being traced that must detect that the rcu_is_watching() is also being traced. As rcu_is_watching() is a helper routine and not part of the critical logic in RCU, it does not need to be traced in order to debug RCU itself. Add the "notrace" annotation to all the rcu_is_watching() calls such that we never trace it. Link: http://lkml.kernel.org/r/20131104202736.72dd8e45@gandalf.local.home Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05Merge branch 'idle.2013.09.25a' of ↵Steven Rostedt (Red Hat)6-24/+41
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into HEAD Need to use Paul McKenney's "rcu_is_watching()" changes to fix a perf/ftrace bug.
2013-11-05trace/trace_stat: use rbtree postorder iteration helper instead of opencodingCody P Schafer1-36/+5
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of opencoding an alternate postorder iteration that modifies the tree Link: http://lkml.kernel.org/r/1383345566-25087-2-git-send-email-cody@linux.vnet.ibm.com Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05audit: call audit_bprm() only once to add AUDIT_EXECVE informationRichard Guy Briggs2-5/+0
Move the audit_bprm() call from search_binary_handler() to exec_binprm(). This allows us to get rid of the mm member of struct audit_aux_data_execve since bprm->mm will equal current->mm. This also mitigates the issue that ->argc could be modified by the load_binary() call in search_binary_handler(). audit_bprm() was being called to add an AUDIT_EXECVE record to the audit context every time search_binary_handler() was recursively called. Only one reference is necessary. Reported-by: Oleg Nesterov <onestero@redhat.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> --- This patch is against 3.11, but was developed on Oleg's post-3.11 patches that introduce exec_binprm().
2013-11-05audit: move audit_aux_data_execve contents into audit_context unionRichard Guy Briggs2-29/+16
audit_bprm() was being called to add an AUDIT_EXECVE record to the audit context every time search_binary_handler() was recursively called. Only one reference is necessary, so just update it. Move the the contents of audit_aux_data_execve into the union in audit_context, removing dependence on a kmalloc along the way. Reported-by: Oleg Nesterov <onestero@redhat.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: remove unused envc member of audit_aux_data_execveRichard Guy Briggs1-2/+0
Get rid of write-only audit_aux_data_exeve structure member envc. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: Kill the unused struct audit_aux_data_capsetEric W. Biederman1-6/+0
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from ebiederman commit 6904431d6b41190e42d6b94430b67cb4e7e6a4b7) Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: do not reject all AUDIT_INODE filter typesEric Paris1-0/+1
commit ab61d38ed8cf670946d12dc46b9198b521c790ea tried to merge the invalid filter checking into a single function. However AUDIT_INODE filters were not verified in the new generic checker. Thus such rules were being denied even though they were perfectly valid. Ex: $ auditctl -a exit,always -F arch=b64 -S open -F key=/foo -F inode=6955 -F devmajor=9 -F devminor=1 Error sending add rule data request (Invalid argument) Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: log the audit_names record typeJeff Layton1-0/+20
...to make it clear what the intent behind each record's operation was. In many cases you can infer this, based on the context of the syscall and the result. In other cases it's not so obvious. For instance, in the case where you have a file being renamed over another, you'll have two different records with the same filename but different inode info. By logging this information we can clearly tell which one was created and which was deleted. This fixes what was broken in commit bfcec708. Commit 79f6530c should also be backported to stable v3.7+. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: use given values in tty_audit enable apiRichard Guy Briggs1-1/+1
In send/GET, we don't want the kernel to lie about what value is set. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: use nlmsg_len() to get message payload lengthMathias Krause1-2/+2
Using the nlmsg_len member of the netlink header to test if the message is valid is wrong as it includes the size of the netlink header itself. Thereby allowing to send short netlink messages that pass those checks. Use nlmsg_len() instead to test for the right message length. The result of nlmsg_len() is guaranteed to be non-negative as the netlink message already passed the checks of nlmsg_ok(). Also switch to min_t() to please checkpatch.pl. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: stable@vger.kernel.org # v2.6.6+ for the 1st hunk, v2.6.23+ for the 2nd Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: use memset instead of trying to initialize field by fieldEric Paris1-1/+1
We currently are setting fields to 0 to initialize the structure declared on the stack. This is a bad idea as if the structure has holes or unpacked space these will not be initialized. Just use memset. This is not a performance critical section of code. Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: fix info leak in AUDIT_GET requestsMathias Krause1-0/+1
We leak 4 bytes of kernel stack in response to an AUDIT_GET request as we miss to initialize the mask member of status_set. Fix that. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: stable@vger.kernel.org # v2.6.6+ Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: update AUDIT_INODE filter rule to comparator functionRichard Guy Briggs1-1/+1
It appears this one comparison function got missed in f368c07d (and 9c937dcc). Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: audit feature to set loginuid immutableEric Paris2-1/+5
This adds a new 'audit_feature' bit which allows userspace to set it such that the loginuid is absolutely immutable, even if you have CAP_AUDIT_CONTROL. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: audit feature to only allow unsetting the loginuidEric Paris2-1/+5
This is a new audit feature which only grants processes with CAP_AUDIT_CONTROL the ability to unset their loginuid. They cannot directly set it from a valid uid to another valid uid. The ability to unset the loginuid is nice because a priviledged task, like that of container creation, can unset the loginuid and then priv is not needed inside the container when a login daemon needs to set the loginuid. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: allow unsetting the loginuid (with priv)Eric Paris1-1/+3
If a task has CAP_AUDIT_CONTROL allow that task to unset their loginuid. This would allow a child of that task to set their loginuid without CAP_AUDIT_CONTROL. Thus when launching a new login daemon, a priviledged helper would be able to unset the loginuid and then the daemon, which may be malicious user facing, do not need priv to function correctly. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLEEric Paris1-6/+4
After trying to use this feature in Fedora we found the hard coding policy like this into the kernel was a bad idea. Surprise surprise. We ran into these problems because it was impossible to launch a container as a logged in user and run a login daemon inside that container. This reverts back to the old behavior before this option was added. The option will be re-added in a userspace selectable manor such that userspace can choose when it is and when it is not appropriate. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: loginuid functions coding styleEric Paris1-25/+45
This is just a code rework. It makes things more readable. It does not make any functional changes. It does change the log messages to include both the old session id as well the new and it includes a new res field, which means we get messages even when the user did not have permission to change the loginuid. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: implement generic feature setting and retrievingEric Paris1-0/+109
The audit_status structure was not designed with extensibility in mind. Define a new AUDIT_SET_FEATURE message type which takes a new structure of bits where things can be enabled/disabled/locked one at a time. This structure should be able to grow in the future while maintaining forward and backward compatibility (based loosly on the ideas from capabilities and prctl) This does not actually add any features, but is just infrastructure to allow new on/off types of audit system features. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: change decimal constant to macro for invalid uidRichard Guy Briggs1-1/+1
SFR reported this 2013-05-15: > After merging the final tree, today's linux-next build (i386 defconfig) > produced this warning: > > kernel/auditfilter.c: In function 'audit_data_to_entry': > kernel/auditfilter.c:426:3: warning: this decimal constant is unsigned only > in ISO C90 [enabled by default] > > Introduced by commit 780a7654cee8 ("audit: Make testing for a valid > loginuid explicit") from Linus' tree. Replace this decimal constant in the code with a macro to make it more readable (add to the unsigned cast to quiet the warning). Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: printk USER_AVC messages when audit isn't enabledTyler Hicks1-1/+1
When the audit=1 kernel parameter is absent and auditd is not running, AUDIT_USER_AVC messages are being silently discarded. AUDIT_USER_AVC messages should be sent to userspace using printk(), as mentioned in the commit message of 4a4cd633 ("AUDIT: Optimise the audit-disabled case for discarding user messages"). When audit_enabled is 0, audit_receive_msg() discards all user messages except for AUDIT_USER_AVC messages. However, audit_log_common_recv_msg() refuses to allocate an audit_buffer if audit_enabled is 0. The fix is to special case AUDIT_USER_AVC messages in both functions. It looks like commit 50397bd1 ("[AUDIT] clean up audit_receive_msg()") introduced this bug. Cc: <stable@kernel.org> # v2.6.25+ Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: linux-audit@redhat.com Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit_alloc: clear TIF_SYSCALL_AUDIT if !audit_contextOleg Nesterov1-1/+3
If audit_filter_task() nacks the new thread it makes sense to clear TIF_SYSCALL_AUDIT which can be copied from parent by dup_task_struct(). A wrong TIF_SYSCALL_AUDIT is not really bad but it triggers the "slow" audit paths in entry.S to ensure the task can not miss audit_syscall_*() calls, this is pointless if the task has no ->audit_context. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Steve Grubb <sgrubb@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05Audit: remove duplicate commentsGao feng1-7/+0
Remove it. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>