Age | Commit message | Author | Files | Lines |
|
wake_up_new_task()
If a newly created task is selected to go to a different CPU in fork
balance when it wakes up the first time, its load averages should
not be removed from the source CPU since they are never added to
it before. The same also applies to a never-used group entity.
Fix it in remove_entity_load_avg(): when entity's last_update_time
is 0, simply return. This should precisely identify the case in
question, because in other migrations, the last_update_time is set
to 0 after remove_entity_load_avg().
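A slightly simplified sketch of the resulting check (names follow the fair.c code of that era; the tail of the function is elided):
  void remove_entity_load_avg(struct sched_entity *se)
  {
      struct cfs_rq *cfs_rq = cfs_rq_of(se);
      u64 last_update_time;

      /*
       * Newly forked tasks (and never-used group entities) were never
       * attached to this cfs_rq, so there is nothing to remove.
       */
      if (!se->avg.last_update_time)
          return;

      last_update_time = cfs_rq_last_update_time(cfs_rq);

      __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
      atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
      atomic_long_add(se->avg.util_avg, &cfs_rq->removed_util_avg);
  }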
Reported-by: Steve Muckle <[email protected]>
Signed-off-by: Yuyang Du <[email protected]>
[peterz: cfs_rq_last_update_time]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Patrick Bellasi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vincent Guittot <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
earliest_dl.next should cache deadline of the earliest ready task that
is also enqueued in the pushable rbtree, as pull algorithm uses this
information to find candidates for migration: if the earliest_dl.next
deadline of source rq is earlier than the earliest_dl.curr deadline of
destination rq, the task from the source rq can be pulled.
However, the current implementation only guarantees that earliest_dl.next is
the deadline of the next ready task instead of the next pushable task;
this can result in taking both rqs' locks and finding nothing to
migrate because of affinity constraints. In addition, the current logic
doesn't update the next candidate for pushing in pick_next_task_dl(),
even if the running task is never eligible.
This patch fixes both problems by updating earliest_dl.next when
pushable dl task is enqueued/dequeued, similar to what we already do for
RT.
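A sketch of the pull-side test this enables (worth_pulling_from() is a made-up name for illustration; the real check lives in the dl pull path):
  static bool worth_pulling_from(struct rq *this_rq, struct rq *src_rq)
  {
      /*
       * Only take the source rq's lock if its earliest *pushable* task
       * beats everything currently queued on this rq.
       */
      return src_rq->dl.earliest_dl.next &&
             dl_time_before(src_rq->dl.earliest_dl.next,
                            this_rq->dl.earliest_dl.curr);
  }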
Tested-by: Luca Abeni <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
new patches
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In the following commit:
7675104990ed ("sched: Implement lockless wake-queues")
we gained lockless wake-queues.
The -RT kernel managed to lock itself up with those. There could be multiple
attempts to enqueue task X for a wakeup _even_ if task X is already
running.
The reason is that task X could be runnable but not yet on a CPU. If the
task performing the wakeup did not leave the CPU, it could perform
multiple wakeups.
With the proper timing, task X could be running while still enqueued for a
wakeup. If this happens while X is performing a fork(), its
child will have a !NULL `wake_q` member copied.
This is not a problem as long as the child task does not participate in
lockless wakeups :)
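A sketch of the kind of fix this implies; the helper name and the hook point (child setup in kernel/fork.c) are assumptions for illustration:
  static void clear_child_wake_q(struct task_struct *tsk)
  {
      /*
       * The parent may currently be linked on somebody's wake queue;
       * the child must not inherit that linkage.
       */
      tsk->wake_q.next = NULL;
  }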
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 7675104990ed ("sched: Implement lockless wake-queues")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Make 'r' 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX'
on 32-bit systems:
UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
signed integer overflow:
87950 * 47742 cannot be represented in type 'int'
The most likely effect of this bug is bad load average numbers
resulting in weird scheduling. It's also likely that this can
persist for a longer time - until the system goes idle for
a long time so that all load avg numbers get reset.
[ This is the CFS load average metric, not the procfs output, which
is separate. ]
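A sketch of the widened type in the load-avg update path (fold_removed_load() is a made-up wrapper; field names follow the fair.c code of that era):
  static void fold_removed_load(struct cfs_rq *cfs_rq, struct sched_avg *sa)
  {
      if (atomic_long_read(&cfs_rq->removed_load_avg)) {
          /* s64 instead of long: r * LOAD_AVG_MAX must not overflow on 32-bit. */
          s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);

          sa->load_avg = max_t(long, sa->load_avg - r, 0);
          sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
      }
  }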
Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/[email protected]
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
There's a race on CPU unplug where we free the swevent hash array
while it can still have events on. This will result in a
use-after-free which is BAD.
Simply do not free the hash array on unplug. This leaves the thing
around and no use-after-free takes place.
When the last swevent dies, we do a for_each_possible_cpu() iteration
anyway to clean these up, at which time we'll free it, so no leakage
will occur.
Reported-by: Sasha Levin <[email protected]>
Tested-by: Sasha Levin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
I managed to tickle this warning:
[ 2338.884942] ------------[ cut here ]------------
[ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
[ 2338.900504] Modules linked in:
[ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
[ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 2338.923071] ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
[ 2338.931382] ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
[ 2338.939678] 0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
[ 2338.947987] Call Trace:
[ 2338.950726] [<ffffffff815c680c>] dump_stack+0x4e/0x82
[ 2338.956474] [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
[ 2338.963195] [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
[ 2338.969720] [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
[ 2338.976244] [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
[ 2338.982575] [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
[ 2338.988810] [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
[ 2338.995339] [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
[ 2339.001669] [<ffffffff8121e297>] search_binary_handler+0x97/0x200
[ 2339.008581] [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
[ 2339.016072] [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
[ 2339.021819] [<ffffffff819fc165>] stub_execve+0x5/0x5
[ 2339.027469] [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
[ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
what we expected it to be.
This is because context switches can swap the task_struct::perf_event_ctxp[]
pointer around. Therefore you have to either disable preemption when looking
at current, or hold ctx->lock.
Fix perf_event_enable_on_exec(): it loads current->perf_event_ctxp[]
before disabling interrupts, so a preemption in the right place
can swap contexts around and leave us using the wrong one.
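A heavily abridged sketch of the ordering fix; the point is only that the context pointer is read after interrupts are off:
  static void perf_event_enable_on_exec(int ctxn)
  {
      struct perf_event_context *ctx;
      unsigned long flags;

      local_irq_save(flags);
      /* Only now is it safe to dereference current's context pointer. */
      ctx = current->perf_event_ctxp[ctxn];
      if (!ctx || !ctx->nr_events)
          goto out;

      /* ... walk the events, enable them, reschedule the context ... */
  out:
      local_irq_restore(flags);
  }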
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kostya Serebryany <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: syzkaller <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two more fixes:
1. The recordmcount change had an output that used sprintf()
(incorrectly) when it should have been an fprintf() to stderr.
2. The printk_formats file could crash if someone added a
trace_printk() in the core kernel, and also added one in a module.
This does not affect production kernels. Only kernels where
developers add trace_printk() for debugging can crash"
* tag 'trace-v4.4-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix setting of start_index in find_next()
ftrace/scripts: Fix incorrect use of sprintf in recordmcount
|
|
Some sysfs attributes in /sys/power/ should really be read-only,
so add support for that, convert those attributes to read-only
and drop the stub .show() routines from them.
Original-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel
panic at t_show.
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 2957 Comm: sh Tainted: G W O 3.14.55-x86_64-01062-gd4acdc7 #2
RIP: 0010:[<ffffffff811375b2>]
[<ffffffff811375b2>] t_show+0x22/0xe0
RSP: 0000:ffff88002b4ebe80 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
FS: 0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
Call Trace:
[<ffffffff811dc076>] seq_read+0x2f6/0x3e0
[<ffffffff811b749b>] vfs_read+0x9b/0x160
[<ffffffff811b7f69>] SyS_read+0x49/0xb0
[<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
---[ end trace 5bd9eb630614861e ]---
Kernel panic - not syncing: Fatal exception
The first time find_next() calls find_next_mod_format(), it should
iterate trace_bprintk_fmt_list to find the first print format of
the module. However, in the current code, start_index is smaller than *pos
at first, so the code does not iterate the list. The subsequent container_of()
then computes the wrong address from the stale v, which makes mod_fmt a
meaningless object, and the returned mod_fmt->fmt along with it.
This patch fixes it by correcting start_index. After the fix,
the first time find_next_mod_format() is called, start_index is
equal to *pos, and the code iterates trace_bprintk_fmt_list to
get the right module printk format, and thus the right mod_fmt->fmt.
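An abridged sketch of the corrected lookup (section symbol names follow kernel/trace/trace_printk.c):
  static const char **find_next(void *v, loff_t *pos)
  {
      const char **fmt = v;
      int start_index;

      /* Number of built-in (core kernel) formats; module formats follow. */
      start_index = __stop___trace_bprintk_fmt - __start___trace_bprintk_fmt;

      if (*pos < start_index)
          return __start___trace_bprintk_fmt + *pos;

      /* *pos >= start_index: walk the per-module trace_bprintk_fmt_list. */
      return find_next_mod_format(start_index, v, fmt, pos);
  }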
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected] # 3.12+
Fixes: 102c9323c35a8 "tracing: Add __tracepoint_string() to export string pointers"
Signed-off-by: Qiu Peiyang <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
A _lot_ of ->write() instances were open-coding it; some are
converted to memdup_user_nul(), a lot more remain...
Signed-off-by: Al Viro <[email protected]>
|
|
These are noisy during boot and not all that interesting.
Signed-off-by: Tejun Heo <[email protected]>
|
|
Both htab_map_update_elem() and htab_map_delete_elem() can be
called from an eBPF program, and they may be in a kernel hot path,
so it isn't efficient to use a per-hashtable lock in these two
helpers.
The per-hashtable spinlock is used for protecting bucket's
hlist, and per-bucket lock is just enough. This patch converts
the per-hashtable lock into per-bucket spinlock, so that
contention can be decreased a lot.
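A sketch of the data-structure change (struct and field names are illustrative, not necessarily those used in the patch):
  struct bucket {
      struct hlist_head head;
      raw_spinlock_t lock;        /* protects only this bucket's hlist */
  };

  struct bpf_htab {
      struct bpf_map map;
      struct bucket *buckets;     /* was: one hlist array plus one global lock */
      atomic_t count;
      u32 n_buckets;
  };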
Signed-off-by: Ming Lei <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The spinlock is just used for protecting the per-bucket
hlist, so it isn't needed for selecting a bucket.
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In preparation for removing the global per-hashtable lock, the
counter needs to be defined as atomic_t first.
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The posix_clock_poll function is supposed to return a bit mask of
POLLxxx values. However, in case the hardware has disappeared (due to
hot plugging for example) this code returns -ENODEV in a futile
attempt to throw an error at the file descriptor level. The kernel's
file_operations interface does not accept such error codes from the
poll method. Instead, this function ought to return POLLERR.
The value -ENODEV does, in fact, contain the POLLERR bit (and almost
all the other POLLxxx bits as well), but only by chance. This patch
fixes the code to return a proper bit mask.
Credit goes to Markus Elfring for pointing out the suspicious
signed/unsigned mismatch.
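A sketch of the fixed function (abridged; the get_posix_clock()/put_posix_clock() helpers are assumed from the posix-clock code):
  static unsigned int posix_clock_poll(struct file *fp, poll_table *wait)
  {
      struct posix_clock *clk = get_posix_clock(fp);
      unsigned int result = 0;

      if (!clk)
          return POLLERR; /* was -ENODEV; poll() must return POLLxxx bits */

      if (clk->ops.poll)
          result = clk->ops.poll(clk, fp, wait);

      put_posix_clock(fp);

      return result;
  }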
Reported-by: Markus Elfring <[email protected]>
Signed-off-by: Richard Cochran <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Julia Lawall <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull another round of GIC changes from Marc:
ACPI support for GICv2m
|
|
Make the inode argument of the inode_getsecid hook non-const so that we
can use it to revalidate invalid security labels.
Signed-off-by: Andreas Gruenbacher <[email protected]>
Acked-by: Stephen Smalley <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
|
|
The file tracing_enable is obsolete and does not exist anymore. Replace
the comment that references it with the proper tracing_on file.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Chuyu Hu <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
The start and end variables were only used when ftrace_module_init() was
split up into multiple functions. No need to keep them around after the
merger.
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Simple cleanup. No need for two functions here.
The whole work can simply be done inside 'ftrace_module_init'.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Abel Vesa <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
This bpf_verifier_ops structure is never modified, like the other
bpf_verifier_ops structures, so declare it as const.
Done with the help of Coccinelle.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Jiri Olsa noted that the change to replace the control_ops did not update
the trampoline for when running perf on a single CPU and with CONFIG_PREEMPT
disabled (where dynamic ops, like perf, can use trampolines directly). The
result was that the perf function could be called when RCU is not watching,
as well as not handling ftrace_local_disable().
Modify the ftrace_ops_get_func() to also check the RCU and PER_CPU ops flags
and use the recursive function if they are set. The recursive function is
modified to check those flags and execute the appropriate checks if they are
set.
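A sketch of the dispatch decision described above (ftrace_ops_assist_func stands in for the recursion-handling list function; the exact name is assumed):
  ftrace_func_t ftrace_ops_get_func(struct ftrace_ops *ops)
  {
      /*
       * Ops that are not recursion safe, need RCU protection, or do
       * per-cpu disabling must go through the list/assist handler
       * instead of being called directly from a trampoline.
       */
      if (!(ops->flags & FTRACE_OPS_FL_RECURSION_SAFE) ||
          ops->flags & (FTRACE_OPS_FL_RCU | FTRACE_OPS_FL_PER_CPU))
          return ftrace_ops_assist_func;

      return ops->func;
  }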
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Jiri Olsa <[email protected]>
Patch-fixed-up-by: Jiri Olsa <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Currently perf has its own list function within the ftrace infrastructure
that seems to be used only to allow for it to have per-cpu disabling as well
as a check to make sure that it's not called while RCU is not watching. It
uses something called the "control_ops" which is used to iterate over ops
under it with the control_list_func().
The problem is that this control_ops and control_list_func unnecessarily
complicates the code. By replacing FTRACE_OPS_FL_CONTROL with two new flags
(FTRACE_OPS_FL_RCU and FTRACE_OPS_FL_PER_CPU) we can remove all the code
that is special with the control ops and add the needed checks within the
generic ftrace_list_func().
Signed-off-by: Steven Rostedt <[email protected]>
|
|
When showing all tramps registered to an ftrace record in the file
enabled_functions, it exits the loop with ops == NULL. But then it is
supposed to show the function on the ops->trampoline and
add_trampoline_func() is called with the given ops. But because ops is now
NULL (to exit the loop), it always shows the static trampoline instead of
the one that is really registered to the record.
The call to add_trampoline_func() that shows the trampoline for the given
ops needs to be called at every iteration.
Fixes: 39daa7b9e895 "ftrace: Show all tramps registered to a record on ftrace_bug()"
Signed-off-by: Steven Rostedt <[email protected]>
|
|
s/ARCH_SUPPORT_FTARCE_OPS/ARCH_SUPPORTS_FTRACE_OPS/
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Li Bin <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Conflicts:
drivers/gpu/drm/i915/intel_pm.c
|
|
Add include asm/paravirt.h to cputime.c, as steal_account_process_tick
calls paravirt_steal_clock, which is defined in asm/paravirt.h.
The ifdef CONFIG_PARAVIRT is necessary because not all archs have an
asm/paravirt.h to include.
The reason why cputime.c currently compiles, even though the include of
<asm/paravirt.h> is missing, is that on x86 asm/paravirt.h is pulled in
by one of the other headers included in kernel/sched/cputime.c.
On arm and arm64, where I am about to introduce asm/paravirt.h and
stolen time support, I would get an error without #include <asm/paravirt.h>
in cputime.c.
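In code, the change amounts to the guarded include in kernel/sched/cputime.c:
  #ifdef CONFIG_PARAVIRT
  #include <asm/paravirt.h>
  #endif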
Signed-off-by: Stefano Stabellini <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
|
|
Since there will be several places checking if fwnode.type
is equal FWNODE_IRQCHIP, this patch adds a convenient function
for this purpose.
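A sketch of such a helper (the name and location are assumptions; the check itself is the one described above):
  static inline bool is_fwnode_irqchip(struct fwnode_handle *fwnode)
  {
      return fwnode && fwnode->type == FWNODE_IRQCHIP;
  }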
Acked-by: Marc Zyngier <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
While reviewing Michael Kerrisk's recent futex manpage update, I noticed
that we allow the FUTEX_CLOCK_REALTIME flag for FUTEX_WAIT_BITSET but
not for FUTEX_WAIT.
FUTEX_WAIT is treated as a simple version of FUTEX_WAIT_BITSET
internally (with a bitmask of FUTEX_BITSET_MATCH_ANY). As such, I cannot
come up with a reason for this exclusion for FUTEX_WAIT.
This change does modify the behavior of the futex syscall, changing a
call with FUTEX_WAIT | FUTEX_CLOCK_REALTIME from returning -ENOSYS to being
equivalent to FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME with a bitset of
FUTEX_BITSET_MATCH_ANY.
Reported-by: Michael Kerrisk <[email protected]>
Signed-off-by: Darren Hart <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Link: http://lkml.kernel.org/r/9f3bdc116d79d23f5ee72ceb9a2a857f5ff8fa29.1450474525.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The out_unlock: label not only drops the locks, it also drops the refcount
on the pi_state. Really intuitive.
Move the label after the put_pi_state() call and use 'break' in the
error handling path of the requeue loop.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: [email protected]
Cc: Andy Lowe <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
In the error handling cases we neither have pi_state nor a reference
to it. Remove the pointless code.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: [email protected]
Cc: Andy Lowe <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Documentation of the pi_state refcounting in the requeue code is
nonexistent. Add it.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: [email protected]
Cc: Andy Lowe <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
free_pi_state() is confusing as it is in fact only freeing/caching the
pi state when the last reference is gone. Rename it to put_pi_state()
which reflects better what it is doing.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: [email protected]
Cc: Andy Lowe <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
If the proxy lock in the requeue loop acquires the rtmutex for a
waiter then it also acquires a refcount on the pi_state related to the
futex, but the waiter side does not drop the reference count.
Add the missing free_pi_state() call.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: [email protected]
Cc: Andy Lowe <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
|
|
The Linux kernel already has the concept of IRQ domain, wherein a
component can expose a set of IRQs which are managed by a particular
interrupt controller chip or other subsystem. The PCI driver exposes
the notion of an IRQ domain for Message-Signaled Interrupts (MSI) from
PCI Express devices. This patch exposes the functions which are
necessary for creating an MSI IRQ domain within a module.
[ tglx: Split it into x86 and core irq parts ]
Signed-off-by: Jake Oshins <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The clocksource validation which makes sure that the newly read value
is not smaller than the last value only works if the clocksource mask
is 64bit, i.e. the counter is 64bit wide. But we want to use that
mechanism also for clocksources which are less than 64bit wide.
So instead of checking whether bit 63 is set, we check whether the
most significant bit of the clocksource mask is set in the delta
result. If it is set, we return 0.
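A sketch of the check (clocksource_delta() is the usual helper for this; the debug plumbing around it is elided):
  static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
  {
      u64 ret = (now - last) & mask;

      /*
       * If the most significant bit of the mask shows up in the result,
       * the "delta" is really a negative value: report it as 0.
       */
      return ret & ~(mask >> 1) ? 0 : ret;
  }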
[ tglx: Simplified the implementation, added a comment and massaged
the commit message ]
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull the MSI wire bridge implementation from Marc Zyngier along with
the first user of it. This is infrastructure to support a wired
interrupt to MSI interrupt bridge. The first user is mbigen, found in
Hisilicon ARM SoCs.
|
|
https://git.linaro.org/people/john.stultz/linux into timers/core
Get the core time(keeping) updates from John Stultz
- NTP robustness tweaks
- Another signed overflow nailed down
- More y2038 changes
- Stop alarmtimer after resume
- MAINTAINERS update
- Selftest fixes
|
|
Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):
CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
nmi_shootdown_cpus() // stop other CPUs
CPU 1:
panic()
crash_kexec()
mutex_trylock() // failed to acquire
smp_send_stop() // stop other CPUs
infinite loop
If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.
In another case:
CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
<NMI>
io_check_error()
panic()
crash_kexec()
mutex_trylock() // failed to acquire
infinite loop
Clearly, this is an undesirable result.
To fix this problem, this patch changes crash_kexec() to exclude others
by using the panic_cpu atomic.
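A sketch of the exclusion (abridged; PANIC_CPU_INVALID and the __crash_kexec() split are assumptions about how the series is structured):
  void crash_kexec(struct pt_regs *regs)
  {
      int old_cpu, this_cpu;

      /* Only one CPU may run the panic()/crash_kexec() machinery at a time. */
      this_cpu = raw_smp_processor_id();
      old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
      if (old_cpu != PANIC_CPU_INVALID)
          return;

      __crash_kexec(regs);

      /* Allow a later panic()/crash_kexec() if kexec did not take over. */
      atomic_set(&panic_cpu, PANIC_CPU_INVALID);
  }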
Signed-off-by: Hidehiro Kawai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: HATAYAMA Daisuke <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Martin Schwidefsky <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Minfei Huang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrs
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.
For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:
CPU 0 CPU 1
=========================== ==========================
receive an unknown NMI
unknown_nmi_error()
panic() receive an unknown NMI
spin_trylock(&panic_lock) unknown_nmi_error()
crash_kexec() panic()
spin_trylock(&panic_lock)
panic_smp_self_stop()
infinite loop
kdump_nmi_shootdown_cpus()
issue NMI IPI -----------> blocked until IRET
infinite loop...
Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.
In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.
To save registers in this case, we need to:
a) Return from NMI handler instead of looping infinitely
or
b) Call the callback function directly from the infinite loop
Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices. So, we chose b).
This patch does the following:
1. Move the infinite looping of CPUs which haven't called panic() in NMI
context (actually done by panic_smp_self_stop()) outside of panic() to
enable us to refer to pt_regs. Please note that panic_smp_self_stop() is
still used for normal context.
2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
registers and do some cleanups after setting waiting_for_crash_ipi which
is used for counting down the number of CPUs which handled the callback.
Signed-off-by: Hidehiro Kawai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Aaron Tomlin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Young <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Gobinda Charan Maji <[email protected]>
Cc: HATAYAMA Daisuke <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Javi Merino <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: lkml <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Nicolas Iooss <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Stefan Lippers-Hollmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. As a result, the kernel stalls after failing to acquire
panic_lock.
To avoid this problem, don't call panic() in NMI context if we've
already entered panic().
For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.
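A sketch of the macro's core logic (PANIC_CPU_INVALID and the exact shape are assumptions; the point is the cmpxchg-based exclusion on panic_cpu):
  #define nmi_panic(fmt, ...)                                             \
  do {                                                                    \
      int __cpu = raw_smp_processor_id();                                 \
                                                                          \
      /* Only the first CPU to claim panic_cpu runs the real panic(). */  \
      if (atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, __cpu) ==         \
          PANIC_CPU_INVALID)                                              \
          panic(fmt, ##__VA_ARGS__);                                      \
  } while (0)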
Signed-off-by: Hidehiro Kawai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Aaron Tomlin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Gobinda Charan Maji <[email protected]>
Cc: HATAYAMA Daisuke <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Javi Merino <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: lkml <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Nicolas Iooss <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vivek Goyal <[email protected]>
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Back in the days where eBPF (or back then "internal BPF" ;->) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef99c9
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.
This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5c3fe ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d88da
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X is being cleared in the prologue also for
eBPF case, which is unnecessary.
Let's move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs are still few. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.
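A sketch of where the clearing ends up: the first two instructions emitted by bpf_convert_filter() (macro names follow the eBPF insn helpers):
  /*
   * Classic BPF expects A and X to start out as zero; emit that once at
   * the head of the converted program instead of in every JIT prologue.
   */
  *new_insn++ = BPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
  *new_insn++ = BPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);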
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Michael Holzheu <[email protected]>
Tested-by: Michael Holzheu <[email protected]>
Cc: Zi Shen Lim <[email protected]>
Cc: Yang Shi <[email protected]>
Acked-by: Yang Shi <[email protected]>
Acked-by: Zi Shen Lim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Conflicts:
drivers/net/geneve.c
Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.
Signed-off-by: David S. Miller <[email protected]>
|
|
The Cavium guys reported a soft lockup on their arm64 machine, caused by
commit c55a6ffa6285 ("locking/osq: Relax atomic semantics"):
mutex_optimistic_spin+0x9c/0x1d0
__mutex_lock_slowpath+0x44/0x158
mutex_lock+0x54/0x58
kernfs_iop_permission+0x38/0x70
__inode_permission+0x88/0xd8
inode_permission+0x30/0x6c
link_path_walk+0x68/0x4d4
path_openat+0xb4/0x2bc
do_filp_open+0x74/0xd0
do_sys_open+0x14c/0x228
SyS_openat+0x3c/0x48
el0_svc_naked+0x24/0x28
This is because in osq_lock we initialise the node for the current CPU:
node->locked = 0;
node->next = NULL;
node->cpu = curr;
and then publish the current CPU in the lock tail:
old = atomic_xchg_acquire(&lock->tail, curr);
Once the update to lock->tail is visible to another CPU, the node is
then live and can be both read and updated by concurrent lockers.
Unfortunately, the ACQUIRE semantics of the xchg operation mean that
there is no guarantee the contents of the node will be visible before
lock tail is updated. This can lead to lock corruption when, for
example, a concurrent locker races to set the next field.
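A sketch of the fix as understood here: publish the node with an operation that also has RELEASE semantics, so the initialisation above cannot be reordered past the tail update:
  node->locked = 0;
  node->next = NULL;
  node->cpu = curr;

  /*
   * Fully ordered xchg: ACQUIRE to pair with the unlock path, RELEASE to
   * publish the node initialisation before other CPUs can observe curr
   * in lock->tail.
   */
  old = atomic_xchg(&lock->tail, curr);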
Fixes: c55a6ffa6285 ("locking/osq: Relax atomic semantics")
Reported-by: David Daney <[email protected]>
Reported-by: Andrew Pinski <[email protected]>
Tested-by: Andrew Pinski <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It has occasionally been noted that users see
confusing warnings like:
Adjusting tsc more than 11% (5941981 vs 7759439)
We try to limit the maximum total adjustment to 11% (10% tick
adjustment + 0.5% frequency adjustment). But this is done by
bounding the requested adjustment values, and the internal
steering that is done by tracking the error from what was
requested and what was applied, does not have any such limits.
This is usually not problematic, but in some cases there is a risk
that an adjustment could cause the clocksource mult value to
overflow, so it's an indication things are outside of what is
expected.
It turns out most of the reports of this 11% warning are on systems
using chrony, which utilizes the adjtimex() ADJ_TICK interface
(which allows a +-10% adjustment). The original rationale for
ADJ_TICK is unclear to me, but my assumption is that it was originally
added to allow broken systems to get a big constant correction at boot
(see the adjtimex userspace package for an example), which would allow
the system to work with ntpd's 0.5% adjustment limit.
Chrony uses ADJ_TICK to make very aggressive short term corrections
(usually right at startup). These push us close enough to the max
bound that a few late ticks can cause the internal steering to push
past the max adjust value (tripping the warning).
Thus this patch adds some extra logic to enforce the max adjustment
cap in the internal steering.
Note: This has the potential to slow corrections when the ADJ_TICK
value is furthest away from the default value. So it would be good to
get some testing from folks using chrony, to make sure we don't
cause any troubles there.
Cc: Miroslav Lichvar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Tested-by: Miroslav Lichvar <[email protected]>
Reported-by: Andy Lutomirski <[email protected]>
Signed-off-by: John Stultz <[email protected]>
|
|
The function second_overflow() uses "unsigned long"
as its input parameter type, which will overflow after
year 2106 on 32-bit systems.
Thus this patch replaces it with the time64_t type.
While the 64-bit division is expensive, "next_ntp_leap_sec"
has been calculated already, so we can just re-use it in the
TIME_INS/DEL cases, allowing one expensive division per
leapsecond instead of re-doing the division once a second after
the leap flag has been set.
Signed-off-by: DengChao <[email protected]>
[jstultz: Tweaked commit message]
Signed-off-by: John Stultz <[email protected]>
|
|
The type of the static variable "time_reftime" and the call of
get_seconds() in ntp are both not y2038 safe.
So change the type of time_reftime to time64_t and replace
get_seconds with __ktime_get_real_seconds.
The local variable "secs" in ntp_update_offset represents
seconds between now and the last ntp adjustment; it seems impossible
that this time will span more than 68 years, so keep its type as
"long".
Reviewed-by: John Stultz <[email protected]>
Signed-off-by: DengChao <[email protected]>
[jstultz: Tweaked commit message]
Signed-off-by: John Stultz <[email protected]>
|
|
In order to fix Y2038 issues in the ntp code we will need to replace
get_seconds() with ktime_get_real_seconds(), but as the ntp code uses
the timekeeping lock which is also used by ktime_get_real_seconds(),
we need a version without locking.
Add a new function __ktime_get_real_seconds() in timekeeping to
do this.
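A sketch of the lockless variant (tk_core and xtime_sec follow the timekeeping internals of that era):
  /*
   * Same as ktime_get_real_seconds(), but without the seqcount protection;
   * only safe while the timekeeping lock is already held (as in the ntp code).
   */
  time64_t __ktime_get_real_seconds(void)
  {
      struct timekeeper *tk = &tk_core.timekeeper;

      return tk->xtime_sec;
  }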
Reviewed-by: John Stultz <[email protected]>
Signed-off-by: DengChao <[email protected]>
Signed-off-by: John Stultz <[email protected]>
|