aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2016-12-02tracing/rb: Convert to hotplug state machineSebastian Andrzej Siewior2-91/+59
Install the callbacks via the state machine. The notifier in struct ring_buffer is replaced by the multi instance interface. Upon __ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke the trace_rb_cpu_prepare() on each CPU. This callback may now fail. This means __ring_buffer_alloc() will fail and cleanup (like previously) and during a CPU up event this failure will not allow the CPU to come up. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-12-01audit: remove useless synchronize_net()WANG Cong1-2/+1
netlink kernel socket is protected by refcount, not RCU. Its rcv path is neither protected by RCU. So the synchronize_net() is just pointless. Cc: Richard Guy Briggs <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-01alarmtimer: Add tracepoints for alarm timersBaolin Wang1-10/+43
Alarm timers are one of the mechanisms to wake up a system from suspend, but there exist no tracepoints to analyse which process/thread armed an alarmtimer. Add tracepoints for start/cancel/expire of individual alarm timers and one for tracing the suspend time decision when to resume the system. The following trace excerpt illustrates the new mechanism: Binder:3292_2-3304 [000] d..2 149.981123: alarmtimer_cancel: alarmtimer:ffffffc1319a7800 type:REALTIME expires:1325463120000000000 now:1325376810370370245 Binder:3292_2-3304 [000] d..2 149.981136: alarmtimer_start: alarmtimer:ffffffc1319a7800 type:REALTIME expires:1325376840000000000 now:1325376810370384591 Binder:3292_9-3953 [000] d..2 150.212991: alarmtimer_cancel: alarmtimer:ffffffc1319a5a00 type:BOOTTIME expires:179552000000 now:150154008122 Binder:3292_9-3953 [000] d..2 150.213006: alarmtimer_start: alarmtimer:ffffffc1319a5a00 type:BOOTTIME expires:179551000000 now:150154025622 system_server-3000 [002] ...1 162.701940: alarmtimer_suspend: alarmtimer type:REALTIME expires:1325376840000000000 The wakeup time which is selected at suspend time allows to map it back to the task arming the timer: Binder:3292_2. [ tglx: Store alarm timer expiry time instead of some useless RTC relative information, add proper type information for wakeups which are handled via the clock_nanosleep/freezer and massage the changelog. ] Signed-off-by: Baolin Wang <[email protected]> Signed-off-by: John Stultz <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-12-01Merge back earlier cpuidle material for v4.10.Rafael J. Wysocki3-67/+111
2016-11-30bpf: fix states equal logic for varlen accessJosef Bacik1-2/+8
If we have a branch that looks something like this int foo = map->value; if (condition) { foo += blah; } else { foo = bar; } map->array[foo] = baz; We will incorrectly assume that the !condition branch is equal to the condition branch as the register for foo will be UNKNOWN_VALUE in both cases. We need to adjust this logic to only do this if we didn't do a varlen access after we processed the !condition branch, otherwise we have different ranges and need to check the other branch as well. Fixes: 484611357c19 ("bpf: allow access into map value arrays") Reported-by: Jann Horn <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-11-30kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.Thiago Jung Bauermann1-5/+20
kexec_locate_mem_hole will be used by the PowerPC kexec_file_load implementation to find free memory for the purgatory stack. Signed-off-by: Thiago Jung Bauermann <[email protected]> Acked-by: Dave Young <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-11-30kexec_file: Change kexec_add_buffer to take kexec_buf as argument.Thiago Jung Bauermann1-48/+40
This is done to simplify the kexec_add_buffer argument list. Adapt all callers to set up a kexec_buf to pass to kexec_add_buffer. In addition, change the type of kexec_buf.buffer from char * to void *. There is no particular reason for it to be a char *, and the change allows us to get rid of 3 existing casts to char * in the code. Signed-off-by: Thiago Jung Bauermann <[email protected]> Acked-by: Dave Young <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-11-30kexec_file: Allow arch-specific memory walking for kexec_add_bufferThiago Jung Bauermann2-24/+22
Allow architectures to specify a different memory walking function for kexec_add_buffer. x86 uses iomem to track reserved memory ranges, but PowerPC uses the memblock subsystem. Signed-off-by: Thiago Jung Bauermann <[email protected]> Acked-by: Dave Young <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-11-30locking/lockdep: Provide a type check for lock_is_heldPeter Zijlstra1-8/+12
Christoph requested lockdep_assert_held() variants that distinguish between held-for-read or held-for-write. Provide: int lock_is_held_type(struct lockdep_map *lock, int read) which takes the same argument as lock_acquire(.read) and matches it to the held_lock instance. Use of this function should be gated by the debug_locks variable. When that is 0 the return value of the lock_is_held_type() function is undefined. This is done to allow both negative and positive tests for holding locks. By default we provide (positive) lockdep_assert_held{,_exclusive,_read}() macros. Requested-by: Christoph Hellwig <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Jens Axboe <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2016-11-29bpf: cgroup: fix documentation of __cgroup_bpf_update()Daniel Mack1-2/+2
There's a 'not' missing in one paragraph. Add it. Fixes: 3007098494be ("cgroup: add support for eBPF programs") Signed-off-by: Daniel Mack <[email protected]> Reported-by: Rami Rosen <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-11-29Re-enable CONFIG_MODVERSIONS in a slightly weaker formLinus Torvalds1-2/+3
This enables CONFIG_MODVERSIONS again, but allows for missing symbol CRC information in order to work around the issue that newer binutils versions seem to occasionally drop the CRC on the floor. binutils 2.26 seems to work fine, while binutils 2.27 seems to break MODVERSIONS of symbols that have been defined in assembler files. [ We've had random missing CRC's before - it may be an old problem that just is now reliably triggered with the weak asm symbols and a new version of binutils ] Some day I really do want to remove MODVERSIONS entirely. Sadly, today does not appear to be that day: Debian people apparently do want the option to enable MODVERSIONS to make it easier to have external modules across kernel versions, and this seems to be a fairly minimal fix for the annoying problem. Cc: Ben Hutchings <[email protected]> Acked-by: Michal Marek <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-11-29audit: add support for session ID user filterRichard Guy Briggs2-0/+7
Define AUDIT_SESSIONID in the uapi and add support for specifying user filters based on the session ID. Also add the new session ID filter to the feature bitmap so userspace knows it is available. https://github.com/linux-audit/audit-kernel/issues/4 RFE: add a session ID filter to the kernel's user filter Signed-off-by: Richard Guy Briggs <[email protected]> [PM: combine multiple patches from Richard into this one] Signed-off-by: Paul Moore <[email protected]>
2016-11-29trace: Add an option for boot clock as trace clockJoel Fernandes1-0/+1
Unlike monotonic clock, boot clock as a trace clock will account for time spent in suspend useful for tracing suspend/resume. This uses earlier introduced infrastructure for using the fast boot clock. Signed-off-by: Joel Fernandes <[email protected]> Signed-off-by: John Stultz <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-11-29timekeeping: Add a fast and NMI safe boot clockJoel Fernandes1-0/+29
This boot clock can be used as a tracing clock and will account for suspend time. To keep it NMI safe since we're accessing from tracing, we're not using a separate timekeeper with updates to monotonic clock and boot offset protected with seqlocks. This has the following minor side effects: (1) Its possible that a timestamp be taken after the boot offset is updated but before the timekeeper is updated. If this happens, the new boot offset is added to the old timekeeping making the clock appear to update slightly earlier: CPU 0 CPU 1 timekeeping_inject_sleeptime64() __timekeeping_inject_sleeptime(tk, delta); timestamp(); timekeeping_update(tk, TK_CLEAR_NTP...); (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be partially updated. Since the tk->offs_boot update is a rare event, this should be a rare occurrence which postprocessing should be able to handle. Signed-off-by: Joel Fernandes <[email protected]> Signed-off-by: John Stultz <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-11-29sched/idle: Add support for tasks that inject idlePeter Zijlstra3-62/+103
Idle injection drivers such as Intel powerclamp and ACPI PAD drivers use realtime tasks to take control of CPU then inject idle. There are two issues with this approach: 1. Low efficiency: injected idle task is treated as busy so sched ticks do not stop during injected idle period, the result of these unwanted wakeups can be ~20% loss in power savings. 2. Idle accounting: injected idle time is presented to user as busy. This patch addresses the issues by introducing a new PF_IDLE flag which allows any given task to be treated as idle task while the flag is set. Therefore, idle injection tasks can run through the normal flow of NOHZ idle enter/exit to get the correct accounting as well as tick stop when possible. The implication is that idle task is then no longer limited to PID == 0. Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Jacob Pan <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-11-29cpuidle: Allow enforcing deepest idle state selectionJacob Pan1-5/+8
When idle injection is used to cap power, we need to override the governor's choice of idle states. For this reason, make it possible the deepest idle state selection to be enforced by setting a flag on a given CPU to achieve the maximum potential power draw reduction. Signed-off-by: Jacob Pan <[email protected]> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-11-27bpf: allow for mount options to specify permissionsDaniel Borkmann1-1/+53
Since we recently converted the BPF filesystem over to use mount_nodev(), we now have the possibility to also hold mount options in sb's s_fs_info. This work implements mount options support for specifying permissions on the sb's inode, which will be used by tc when it manually needs to mount the fs. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-11-27bpf: add owner_prog_type and accounted mem to array map's fdinfoDaniel Borkmann1-2/+15
Allow for checking the owner_prog_type of a program array map. In some cases bpf(2) can return -EINVAL /after/ the verifier passed and did all the rewrites of the bpf program. The reason that lets us fail at this late stage is that program array maps are incompatible. Allow users to inspect this earlier after they got the map fd through BPF_OBJ_GET command. tc will get support for this. Also, display how much we charged the map with regards to RLIMIT_MEMLOCK. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-11-27bpf: drop unnecessary context cast from BPF_PROG_RUNDaniel Borkmann2-2/+2
Since long already bpf_func is not only about struct sk_buff * as input anymore. Make it generic as void *, so that callers don't need to cast for it each time they call BPF_PROG_RUN(). Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-11-27module: extend 'rodata=off' boot cmdline parameter to module mappingsAKASHI Takahiro1-3/+17
The current "rodata=off" parameter disables read-only kernel mappings under CONFIG_DEBUG_RODATA: commit d2aa1acad22f ("mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings") This patch is a logical extension to module mappings ie. read-only mappings at module loading can be disabled even if CONFIG_DEBUG_SET_MODULE_RONX (mainly for debug use). Please note, however, that it only affects RO/RW permissions, keeping NX set. This is the first step to make CONFIG_DEBUG_SET_MODULE_RONX mandatory (always-on) in the future as CONFIG_DEBUG_RODATA on x86 and arm64. Suggested-by: and Acked-by: Mark Rutland <[email protected]> Signed-off-by: AKASHI Takahiro <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Rusty Russell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jessica Yu <[email protected]>
2016-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-8/+42
udplite conflict is resolved by taking what 'net-next' did which removed the backlog receive method assignment, since it is no longer necessary. Two entries were added to the non-priv ethtool operations switch statement, one in 'net' and one in 'net-next, so simple overlapping changes. Signed-off-by: David S. Miller <[email protected]>
2016-11-26module: Fix a comment above strong_try_module_get()Miroslav Benes1-2/+5
The comment above strong_try_module_get() function is not true anymore. Return values changed with commit c9a3ba55bb5d ("module: wait for dependent modules doing init."). Signed-off-by: Miroslav Benes <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [[email protected]: style fixes to make checkpatch.pl happy] Signed-off-by: Jessica Yu <[email protected]>
2016-11-26module: When modifying a module's text ignore modules which are going away tooAaron Tomlin1-1/+7
By default, during the access permission modification of a module's core and init pages, we only ignore modules that are malformed. Albeit for a module which is going away, it does not make sense to change its text to RO since the module should be RW, before deallocation. This patch makes set_all_modules_text_ro() skip modules which are going away too. Signed-off-by: Aaron Tomlin <[email protected]> Acked-by: Rusty Russell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [[email protected]: add comment as suggested by Steven Rostedt] Signed-off-by: Jessica Yu <[email protected]>
2016-11-26module: Ensure a module's state is set accordingly during module coming ↵Aaron Tomlin1-0/+1
cleanup code In load_module() in the event of an error, for e.g. unknown module parameter(s) specified we go to perform some module coming clean up operations. At this point the module is still in a "formed" state when it is actually going away. This patch updates the module's state accordingly to ensure anyone on the module_notify_list waiting for a module going away notification will be notified accordingly. Signed-off-by: Aaron Tomlin <[email protected]> Acked-by: Rusty Russell <[email protected]> Reviewed-by: Miroslav Benes <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jessica Yu <[email protected]>
2016-11-26taint/module: Clean up global and module taint flags handlingPetr Mladek2-48/+38
The commit 66cc69e34e86a231 ("Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE") updated module_taint_flags() to potentially print one more character. But it did not increase the size of the corresponding buffers in m_show() and print_modules(). We have recently done the same mistake when adding a taint flag for livepatching, see https://lkml.kernel.org/r/cfba2c823bb984690b73572aaae1db596b54a082.1472137475.git.jpoimboe@redhat.com Also struct module uses an incompatible type for mod-taints flags. It survived from the commit 2bc2d61a9638dab670d ("[PATCH] list module taint flags in Oops/panic"). There was used "int" for the global taint flags at these times. But only the global tain flags was later changed to "unsigned long" by the commit 25ddbb18aae33ad2 ("Make the taint flags reliable"). This patch defines TAINT_FLAGS_COUNT that can be used to create arrays and buffers of the right size. Note that we could not use enum because the taint flag indexes are used also in assembly code. Then it reworks the table that describes the taint flags. The TAINT_* numbers can be used as the index. Instead, we add information if the taint flag is also shown per-module. Finally, it uses "unsigned long", bit operations, and the updated taint_flags table also for mod->taints. It is not optimal because only few taint flags can be printed by module_taint_flags(). But better be on the safe side. IMHO, it is not worth the optimization and this is a good compromise. Signed-off-by: Petr Mladek <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix broken lkml link in changelog] Signed-off-by: Jessica Yu <[email protected]>
2016-11-25bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commandsDaniel Mack1-0/+81
Extend the bpf(2) syscall by two new commands, BPF_PROG_ATTACH and BPF_PROG_DETACH which allow attaching and detaching eBPF programs to a target. On the API level, the target could be anything that has an fd in userspace, hence the name of the field in union bpf_attr is called 'target_fd'. When called with BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS, the target is expected to be a valid file descriptor of a cgroup v2 directory which has the bpf controller enabled. These are the only use-cases implemented by this patch at this point, but more can be added. If a program of the given type already exists in the given cgroup, the program is swapped automically, so userspace does not have to drop an existing program first before installing a new one, which would otherwise leave a gap in which no program is attached. For more information on the propagation logic to subcgroups, please refer to the bpf cgroup controller implementation. The API is guarded by CAP_NET_ADMIN. Signed-off-by: Daniel Mack <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-11-25cgroup: add support for eBPF programsDaniel Mack3-0/+186
This patch adds two sets of eBPF program pointers to struct cgroup. One for such that are directly pinned to a cgroup, and one for such that are effective for it. To illustrate the logic behind that, assume the following example cgroup hierarchy. A - B - C \ D - E If only B has a program attached, it will be effective for B, C, D and E. If D then attaches a program itself, that will be effective for both D and E, and the program in B will only affect B and C. Only one program of a given type is effective for a cgroup. Attaching and detaching programs will be done through the bpf(2) syscall. For now, ingress and egress inet socket filtering are the only supported use-cases. Signed-off-by: Daniel Mack <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-11-24cpufreq: schedutil: Rectify comment in sugov_irq_work() functionViresh Kumar1-6/+6
This patch rectifies a comment present in sugov_irq_work() function to follow proper grammar. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Viresh Kumar <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-11-24sched: Extend scheduler's asym packingTim Chen3-17/+57
We generalize the scheduler's asym packing to provide an ordering of the cpu beyond just the cpu number. This allows the use of the ASYM_PACKING scheduler machinery to move loads to preferred CPU in a sched domain. The preference is defined with the cpu priority given by arch_asym_cpu_priority(cpu). We also record the most preferred cpu in a sched group when we build the cpu's capacity for fast lookup of preferred cpu during load balancing. Co-developed-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Tim Chen <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Srinivas Pandruvada <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/0e73ae12737dfaafa46c07066cc7c5d3f1675e46.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <[email protected]>
2016-11-24sched/autogroup: Fix 64-bit kernel nice level adjustmentMike Galbraith1-1/+3
Michael Kerrisk reported: > Regarding the previous paragraph... My tests indicate > that writing *any* value to the autogroup [nice priority level] > file causes the task group to get a lower priority. Because autogroup didn't call the then meaningless scale_load()... Autogroup nice level adjustment has been broken ever since load resolution was increased for 64-bit kernels. Use scale_load() to scale group weight. Michael Kerrisk tested this patch to fix the problem: > Applied and tested against 4.9-rc6 on an Intel u7 (4 cores). > Test setup: > > Terminal window 1: running 40 CPU burner jobs > Terminal window 2: running 40 CPU burner jobs > Terminal window 1: running 1 CPU burner job > > Demonstrated that: > * Writing "0" to the autogroup file for TW1 now causes no change > to the rate at which the process on the terminal consume CPU. > * Writing -20 to the autogroup file for TW1 caused those processes > to get the lion's share of CPU while TW2 TW3 get a tiny amount. > * Writing -20 to the autogroup files for TW1 and TW3 allowed the > process on TW3 to get as much CPU as it was getting as when > the autogroup nice values for both terminals were 0. Reported-by: Michael Kerrisk <[email protected]> Tested-by: Michael Kerrisk <[email protected]> Signed-off-by: Mike Galbraith <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: linux-man <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-11-23ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inlineSteven Rostedt (Red Hat)1-2/+2
Both rb_end_commit() and rb_set_commit_to_write() are in the fast path of the ring buffer recording. Make sure they are always inlined. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andi Kleen <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-11-23ring-buffer: Froce rb_update_write_stamp() to be inlinedSteven Rostedt (Red Hat)1-2/+2
The function rb_update_write_stamp() is in the hotpath of the ring buffer recording. Make sure that it is inlined as well. There's not many places that call it. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andi Kleen <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-11-23ring-buffer: Force inline of hotpath helper functionsSteven Rostedt (Red Hat)1-8/+8
There's several small helper functions in ring_buffer.c that are used in the hot path. For some reason, even though they are marked inline, gcc tends not to enforce it. Make sure these functions are always inlined. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andi Kleen <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-11-23tracing: Make __buffer_unlock_commit() always_inlineSteven Rostedt (Red Hat)5-21/+32
The function __buffer_unlock_commit() is called in a few places outside of trace.c. But for the most part, it should really be inlined, as it is in the hot path of the trace_events. For the callers outside of trace.c, create a new function trace_buffer_unlock_commit_nostack(), as the reason it was used was to avoid the stack tracing that trace_buffer_unlock_commit() could do. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andi Kleen <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-11-23tracing: Make tracepoint_printk a static_keySteven Rostedt (Red Hat)3-41/+79
Currently, when tracepoint_printk is set (enabled by the "tp_printk" kernel command line), it causes trace events to print via printk(). This is a very dangerous operation, but is useful for debugging. The issue is, it's seldom used, but it is always checked even if it's not enabled by the kernel command line. Instead of having this feature called by a branch against a variable, turn that variable into a static key, and this will remove the test and jump. To simplify things, the functions output_printk() and trace_event_buffer_commit() were moved from trace_events.c to trace.c. Signed-off-by: Steven Rostedt <[email protected]>
2016-11-23ring-buffer: Always inline rb_event_data()Steven Rostedt (Red Hat)1-1/+1
The rb_event_data() is the fast path of getting the ring buffer data from an event. Externally, ring_buffer_event_data() is used to access this function. But unfortunately, rb_event_data() is not inlined, and calling ring_buffer_event_data() causes that function to be called again. Force rb_event_data() to be inlined to lower the number of operations needed when calling ring_buffer_event_data(). Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andi Kleen <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-11-23ring-buffer: Make rb_reserve_next_event() always inlinedSteven Rostedt (Red Hat)1-1/+1
The function rb_reserved_next_event() is called by two functions: ring_buffer_lock_reserve() and ring_buffer_write(). This is in a very hot path of the tracing code, and it is best that they are not functions. The two callers are basically wrapers for rb_reserver_next_event(). Removing the function calls can save execution time in the hotpath of tracing. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andi Kleen <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-11-23tracing: Create a always_inlined __trace_buffer_lock_reserve()Steven Rostedt (Red Hat)1-39/+48
As Andi Kleen pointed out in the Link below, the trace events has quite a bit of code execution. A lot of that happens to be calling functions, where some of them should simply be inlined. One of these functions happens to be trace_buffer_lock_reserve() which is also a global, but it is used throughout the file it is defined in. Create a __trace_buffer_lock_reserve() that is always inlined that the file can benefit from. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andi Kleen <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-11-23Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-0/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Six fixes for bugs that were found via fuzzing, and a trivial hw-enablement patch for AMD Family-17h CPU PMUs" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Allow only a single PMU/box within an events group perf/x86/intel: Cure bogus unwind from PEBS entries perf/x86: Restore TASK_SIZE check on frame pointer perf/core: Fix address filter parser perf/x86: Add perf support for AMD family-17h processors perf/x86/uncore: Fix crash by removing bogus event_list[] handling for SNB client uncore IMC perf/core: Do not set cpuctx->cgrp for unscheduled cgroups
2016-11-23sched/fair: Clean up the tunable parameter definitionsIngo Molnar1-22/+28
No change in functionality: - align the default values vertically to make them easier to scan - standardize the 'default:' lines - fix minor whitespace typos Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-11-23sched/dl: Fix comment in pick_next_task_dl()T.Zhou1-1/+1
Fix cut & paste oversight: s/pull_rt_task/pull_dl_task Signed-off-by: T.Zhou <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/20161123004832.GA2983@geo Signed-off-by: Ingo Molnar <[email protected]>
2016-11-23Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar4-34/+93
Signed-off-by: Ingo Molnar <[email protected]>
2016-11-23Merge branch 'for-mingo' of ↵Ingo Molnar4-13/+28
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Documentation updates, yet again just simple changes. - Miscellaneous fixes, including a change to call_rcu()'s rcu_head alignment check. - Security-motivated list consistency checks, which are disabled by default behind DEBUG_LIST. - Torture-test updates. Signed-off-by: Ingo Molnar <[email protected]>
2016-11-22tracing: Add error checks to creation of event filesSteven Rostedt (Red Hat)1-9/+21
The creation of the set_event_pid file was assigned to a variable "entry" but that variable was never used. Ideally, it should be used to check if the file was created and warn if it was not. The files header_page, header_event should also be checked and a warning if they fail to be created. The "enable" file was moved up, as it is a more crucial file to have and a hard failure (return -ENOMEM) should be returned if it is not created. Reported-by: David Binderman <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-11-22tracing: Add hook to function tracing for other subsystems to useChunyan Zhang1-1/+128
Currently Function traces can be only exported to the ring buffer. This adds a trace_export concept which can process traces and export them to a registered destination as an addition to the current one that outputs to Ftrace - i.e. ring buffer. In this way, if we want function traces to be sent to other destinations rather than only to the ring buffer, we just need to register a new trace_export and implement its own .write() function for writing traces to storage. With this patch, only function tracing (trace type is TRACE_FN) is supported. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Chunyan Zhang <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-11-22sched/nohz: Convert to hotplug state machineSebastian Andrzej Siewior1-19/+14
Install the callbacks via the state machine. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-11-22exec: Ensure mm->user_ns contains the execed filesEric W. Biederman1-2/+14
When the user namespace support was merged the need to prevent ptrace from revealing the contents of an unreadable executable was overlooked. Correct this oversight by ensuring that the executed file or files are in mm->user_ns, by adjusting mm->user_ns. Use the new function privileged_wrt_inode_uidgid to see if the executable is a member of the user namespace, and as such if having CAP_SYS_PTRACE in the user namespace should allow tracing the executable. If not update mm->user_ns to the parent user namespace until an appropriate parent is found. Cc: [email protected] Reported-by: Jann Horn <[email protected]> Fixes: 9e4a36ece652 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.") Signed-off-by: "Eric W. Biederman" <[email protected]>
2016-11-22ptrace: Don't allow accessing an undumpable mmEric W. Biederman1-6/+36
It is the reasonable expectation that if an executable file is not readable there will be no way for a user without special privileges to read the file. This is enforced in ptrace_attach but if ptrace is already attached before exec there is no enforcement for read-only executables. As the only way to read such an mm is through access_process_vm spin a variant called ptrace_access_vm that will fail if the target process is not being ptraced by the current process, or the current process did not have sufficient privileges when ptracing began to read the target processes mm. In the ptrace implementations replace access_process_vm by ptrace_access_vm. There remain several ptrace sites that still use access_process_vm as they are reading the target executables instructions (for kernel consumption) or register stacks. As such it does not appear necessary to add a permission check to those calls. This bug has always existed in Linux. Fixes: v1.0 Cc: [email protected] Reported-by: Andy Lutomirski <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2016-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-27/+87
All conflicts were simple overlapping changes except perhaps for the Thunder driver. That driver has a change_mtu method explicitly for sending a message to the hardware. If that fails it returns an error. Normally a driver doesn't need an ndo_change_mtu method becuase those are usually just range changes, which are now handled generically. But since this extra operation is needed in the Thunder driver, it has to stay. However, if the message send fails we have to restore the original MTU before the change because the entire call chain expects that if an error is thrown by ndo_change_mtu then the MTU did not change. Therefore code is added to nicvf_change_mtu to remember the original MTU, and to restore it upon nicvf_update_hw_max_frs() failue. Signed-off-by: David S. Miller <[email protected]>
2016-11-22ptrace: Capture the ptracer's creds not PT_PTRACE_CAPEric W. Biederman2-5/+27
When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was overlooked. This can result in incorrect behavior when an application like strace traces an exec of a setuid executable. Further PT_PTRACE_CAP does not have enough information for making good security decisions as it does not report which user namespace the capability is in. This has already allowed one mistake through insufficient granulariy. I found this issue when I was testing another corner case of exec and discovered that I could not get strace to set PT_PTRACE_CAP even when running strace as root with a full set of caps. This change fixes the above issue with strace allowing stracing as root a setuid executable without disabling setuid. More fundamentaly this change allows what is allowable at all times, by using the correct information in it's decision. Cc: [email protected] Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12") Signed-off-by: "Eric W. Biederman" <[email protected]>