path: root/kernel
Age | Commit message | Author | Files | Lines
2015-10-22genirq: Make the cpuhotplug migration code less noisyThomas Gleixner1-1/+1
The original arm code has a pr_debug() statement for the case where the irq chip has no set_affinity() callback. That's sufficient for debugging and we really don't want to spam dmesg with useless warnings for the normal case. Fixes: f1e0bb0ad473: "genirq: Introduce generic irq migration for cpu hotunplug" Reported-by: Geert Uytterhoeven <[email protected]> Requested-by: Russell King <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Yang Yingliang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Will Deacon <[email protected]> Cc: Hanjun Guo <[email protected]> Cc: Jiang Liu <[email protected]>
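A rough sketch of the quieted path (identifiers such as chip and d are assumptions, not the verbatim diff): the missing-callback case is now reported at debug level only.

    if (!chip->irq_set_affinity) {
        /* No callback to migrate with: note it quietly and move on. */
        pr_debug("IRQ%u: unable to migrate, no set_affinity callback\n", d->irq);
        return false;
    }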
2015-10-21kexec/crash: Say which char is the unrecognizedBorislav Petkov1-3/+3
It is helpful when the crashkernel cmdline parsing routines actually say which character is the unrecognized one. Make them do so. Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Dave Young <[email protected]> Reviewed-by: Joerg Roedel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Baoquan He <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Salter <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: WANG Chao <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
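An illustrative sketch only (the exact message text and call site inside the crashkernel= parser are assumptions): include the offending character in the warning instead of a generic complaint.

    if (*cur != '@') {
        /* Tell the user exactly which character broke the parse. */
        pr_warn("crashkernel: unrecognized char: %c\n", *cur);
        return -EINVAL;
    }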
2015-10-20tracing: Do not allow stack_tracer to record stack in NMISteven Rostedt (Red Hat)1-0/+4
The stack tracer code should not be executed within an NMI: it grabs spinlocks, so taking a stack trace from NMI context can deadlock. Although this is safe on x86_64, because it does not perform stack traces when the task struct stack is not in use (interrupts and NMIs), it may be an issue for NMIs on i386 and other archs that use the same stack as the NMI. Signed-off-by: Steven Rostedt <[email protected]>
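A minimal sketch of the guard this implies, assuming it sits at the top of the stack-tracer callback (not the verbatim diff):

    /* Never take the stack tracer's spinlock from NMI context. */
    if (unlikely(in_nmi()))
        return;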
2015-10-20tracing: Have stack tracer force RCU to be watchingSteven Rostedt (Red Hat)1-0/+7
The stack tracer was triggering the WARN_ON() in module.c: static void module_assert_mutex_or_preempt(void) { #ifdef CONFIG_LOCKDEP if (unlikely(!debug_locks)) return; WARN_ON(!rcu_read_lock_sched_held() && !lockdep_is_held(&module_mutex)); #endif } The reason is that the stack tracer traces all function calls, and some of those calls happen while exiting or entering user space and idle. Some of these functions are called after RCU has already stopped watching, as RCU does not watch userspace or idle CPUs. If a max stack is hit, then save_stack_trace() is called, which will check module addresses and call module_assert_mutex_or_preempt(), and then trigger the warning. The sad part is that the warning itself will also do a stack trace and trigger the same warning. That probably should be fixed. The warning was added by 0be964be0d45 "module: Sanitize RCU usage and locking" but this bug has probably been around longer. The underlying bug is unlikely to cause much harm, but the new warning causes the system to lock up. Cc: [email protected] # 4.2+ Cc: Peter Zijlstra <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
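A sketch of the idea only (the exact placement inside the max-stack path is an assumption): make RCU watch this CPU while save_stack_trace() walks module address ranges, so the lockdep assertion in module.c is satisfied.

    rcu_irq_enter();    /* force RCU to watch this CPU */
    /* ... record the new maximum stack depth, call save_stack_trace() ... */
    rcu_irq_exit();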
2015-10-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller8-21/+33
Conflicts: drivers/net/usb/asix_common.c net/ipv4/inet_connection_sock.c net/switchdev/switchdev.c In the inet_connection_sock.c case the request socket hashing scheme is completely different in net-next. The other two conflicts were overlapping changes. Signed-off-by: David S. Miller <[email protected]>
2015-10-20Merge branch 'fortglx/4.4/time' of ↵Thomas Gleixner3-16/+16
https://git.linaro.org/people/john.stultz/linux into timers/core Time updates from John Stultz: - More 2038 work from Arnd Bergmann around ntp and pps
2015-10-20sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETSOleg Nesterov1-5/+7
If CONFIG_CPUSETS=n then "case cpuset" changes the state and runs the already failed for_each_cpu() loop again for no reason. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
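A rough shape of the resulting fallback logic (helper names follow the scheduler, but the exact diff is an assumption): with CONFIG_CPUSETS=n the cpuset case falls straight through instead of rescanning the already-failed mask.

    case cpuset:
        if (IS_ENABLED(CONFIG_CPUSETS)) {
            cpuset_cpus_allowed_fallback(p);
            state = possible;
            break;
        }
        /* cpusets compiled out: fall through to the wider mask */
    case possible:
        do_set_cpus_allowed(p, cpu_possible_mask);
        state = fail;
        break;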
2015-10-20sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()Peter Zijlstra2-9/+4
The cpu_active() tests are not fundamentally part of stop_two_cpus(), move then into the scheduler where they belong. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20sched: Start stopper earlyPeter Zijlstra2-4/+9
Ensure the stopper thread is active 'early', because the load balancer pretty much assumes that it's available. And when 'online && active' the load-balancer is fully available. Not only does the numa-balancing stop_two_cpus() caller rely on it, the self-migration code does too, and at CPU_ONLINE time the cpu really is 'free' to run anything. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()Oleg Nesterov1-11/+1
Now that we always use stop_machine_unpark() to wake the stopper thread up, we can kill ->setup() and fold cpu_stop_unpark() into stop_machine_unpark(). And we do not need stopper->lock to set stopper->enabled = true. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce ↵Oleg Nesterov3-4/+12
stop_machine_unpark() 1. Change smpboot_unpark_thread() to check ->selfparking, just like smpboot_park_thread() does. 2. Introduce stop_machine_unpark() which sets ->enabled and calls kthread_unpark(). 3. Change smpboot_thread_call() and cpu_stop_init() to call stop_machine_unpark() by hand. This way: - IMO the ->selfparking logic becomes more consistent. - We can kill the smp_hotplug_thread->pre_unpark() method. - We can easily unpark the stopper thread earlier. Say, we can move stop_machine_unpark() from smpboot_thread_call() to sched_cpu_active() as Peter suggests. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabledOleg Nesterov1-9/+20
Change cpu_stop_queue_two_works() to ensure that both CPUs have stopper->enabled == T, or fail otherwise. This way stop_two_cpus() no longer needs to check cpu_active() to avoid the deadlock. This patch doesn't remove these checks; we will do that later. Note: we need to take both per-CPU stopper->lock spinlocks at the same time, but this will also help to remove lglock from stop_machine.c, so I hope this is fine. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
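A simplified sketch of the scheme (the lglock interaction and some error handling are omitted; structure and helper names follow the commit text but are assumptions): take both per-CPU stopper locks and only queue the two works if both stoppers are enabled.

    static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
                                        int cpu2, struct cpu_stop_work *work2)
    {
        struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
        struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
        int err = -ENOENT;

        spin_lock_irq(&stopper1->lock);
        spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
        if (stopper1->enabled && stopper2->enabled) {
            /* Both stoppers are up: queue atomically under both locks. */
            __cpu_stop_queue_work(stopper1, work1);
            __cpu_stop_queue_work(stopper2, work2);
            err = 0;
        }
        spin_unlock(&stopper2->lock);
        spin_unlock_irq(&stopper1->lock);
        return err;
    }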
2015-10-20stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()Oleg Nesterov1-11/+26
Preparation to simplify the review of the next change. Add two simple helpers, __cpu_stop_queue_work() and cpu_stop_queue_two_works() which simply take a bit of code from their callers. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20stop_machine: Ensure that a queued callback will be called before ↵Oleg Nesterov2-11/+14
cpu_stop_park() cpu_stop_queue_work() checks stopper->enabled before it queues the work, but ->enabled == T can only guarantee cpu_stop_signal_done() if we race with cpu_down(). This is not enough for stop_two_cpus() or stop_machine(), they will deadlock if multi_cpu_stop() won't be called by one of the target CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex. But stop_two_cpus() has to check cpu_active() to avoid the same race with hotplug, and this check is very unobvious and probably not even correct if we race with cpu_up(). Change cpu_down() pass to clear ->enabled before cpu_stopper_thread() flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set. Note also that smpboot_thread_call() calls cpu_stop_unpark() which sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until cpu_stopper_thread() is called at least once. This all means that if cpu_stop_queue_work() succeeds, we know that work->fn() will be called. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20Merge branch 'sched/urgent' into sched/core, to pick up fixes and resolve ↵Ingo Molnar5-11/+13
conflicts Conflicts: kernel/sched/fair.c Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20Merge tag 'v4.3-rc6' into locking/core, to pick up fixes before applying new ↵Ingo Molnar6-18/+15
changes Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20sched/deadline: Fix migration of SCHED_DEADLINE tasksLuca Abeni1-3/+5
Commit: 9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target") broke select_task_rq_dl() and find_lock_later_rq(), because it introduced a comparison between the local task's deadline and dl.earliest_dl.curr of the remote queue. However, if the remote runqueue does not contain any SCHED_DEADLINE task its earliest_dl.curr is 0 (always smaller than the deadline of the local task) and the remote runqueue is not selected for pushing. As a result, if an application creates multiple SCHED_DEADLINE threads, they will never be pushed to runqueues that do not already contain SCHED_DEADLINE tasks. This patch fixes the issue by checking if dl.dl_nr_running == 0. Signed-off-by: Luca Abeni <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wanpeng Li <[email protected]> Fixes: 9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
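An illustrative form of the corrected test (the surrounding context in select_task_rq_dl()/find_lock_later_rq() is an assumption): an empty remote dl runqueue is always an acceptable target, otherwise deadlines are compared as before.

    if (cpu_rq(target)->dl.dl_nr_running == 0 ||
        dl_time_before(p->dl.deadline,
                       cpu_rq(target)->dl.earliest_dl.curr))
        cpu = target;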
2015-10-20nohz: Revert "nohz: Set isolcpus when nohz_full is set"Frederic Weisbecker1-3/+0
This reverts: 8cb9764fc88b ("nohz: Set isolcpus when nohz_full is set") We assumed that full-nohz users always want scheduler isolation on full dynticks CPUs, therefore we included full-nohz CPUs in cpu_isolated_map. This means that tasks run by default on CPUs outside the nohz_full range unless their affinity is explicitly overwritten. This suits pure isolation workloads, but when the machine is needed to run common workloads, the available set of CPUs for common tasks is reduced. We reach an extreme case when CONFIG_NO_HZ_FULL_ALL is enabled, as it leaves only CPU 0 for non-isolation tasks, which makes people think that their supercomputer regressed to 90's UP - which is true in a sense. Some full-nohz users appear to be interested in running normal workloads either before or after an isolation workload. Full-nohz isn't optimized toward normal workloads, but it's still better than UP performance. We are reaching a limitation in kernel presets here. Let's revert this cpu_isolated_map inclusion and let userspace do its own scheduler isolation using cpusets or explicit affinity settings. Reported-by: Ingo Molnar <[email protected]> Reported-by: Mike Galbraith <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Jones <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20sched/fair: Update task group's load_avg after task migrationYuyang Du1-2/+3
When cfs_rq has cfs_rq->removed_load_avg set (when a task migrates from this cfs_rq), we need to update its contribution to the group's load_avg. This should not increase tg's update too much, because in most cases, the cfs_rq has already decayed its load_avg. Tested-by: Dietmar Eggemann <[email protected]> Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20sched/fair: Fix overly small weight for interactive group entitiesYuyang Du1-2/+2
Commit: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") led to an overly small weight for interactive group entities. The bad case can be easily reproduced when a number of CPU hogs compete for the CPUs at the same time (thanks to Mike). This is largely because the task group's load average tracking across CPUs lags behind the real changes. To fix this we accelerate the group share distribution process by using the load.weight of the cfs_rq. This may increase the entire group's share, but we have to do so to protect the (fragile) interactive tasks, especially from CPU hogs. Reported-by: Mike Galbraith <[email protected]> Tested-by: Dietmar Eggemann <[email protected]> Tested-by: Mike Galbraith <[email protected]> Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-19Merge branch 'for-mingo' of ↵Ingo Molnar16-443/+1111
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Miscellaneous fixes. (Paul E. McKenney, Boqun Feng, Oleg Nesterov, Patrick Marlier) - Improvements to expedited grace periods. (Paul E. McKenney) - Performance improvements to and locktorture tests for percpu-rwsem. (Oleg Nesterov, Paul E. McKenney) - Torture-test changes. (Paul E. McKenney, Davidlohr Bueso) - Documentation updates. (Paul E. McKenney) Signed-off-by: Ingo Molnar <[email protected]>
2015-10-17Merge branches 'irq-urgent-for-linus' and 'timers-urgent-for-linus' of ↵Linus Torvalds2-6/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq/timer fixes from Thomas Gleixner: "irq: a fix for the new hierarchical MSI interrupt handling which unbreaks PCI=n configurations. timers: a fix for the new hrtimer clock offset update mechanism to ensure that the boot time offset is respected" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi: Do not use pci_msi_[un]mask_irq as default methods * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Increment clock_was_set_seq in timekeeping_init()
2015-10-16timekeeping: Increment clock_was_set_seq in timekeeping_init()Thomas Gleixner1-1/+1
timekeeping_init() can set the wall time offset, so we need to increment the clock_was_set_seq counter. That way hrtimers will pick up the early offset immediately. Otherwise on a machine which does not set wall time later in the boot process the hrtimer offset is stale at 0 and wall time timers are going to expire with a delay of 45 years. Fixes: 868a3e915f7f "hrtimer: Make offset update smarter" Reported-and-tested-by: Heiko Carstens <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Stefan Liebler <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: John Stultz <[email protected]>
2015-10-16genirq/msi: Do not use pci_msi_[un]mask_irq as default methodsMarc Zyngier1-5/+1
When we create a generic MSI domain with MSI_FLAG_USE_DEF_CHIP_OPS set and either .mask or .unmask NULL in the irq_chip structure, we set them to pci_msi_[un]mask_irq. This is a bad idea for at least two reasons: - PCI_MSI might not be selected, so the kernel fails to build (yes, this is legitimate, at least on arm64!) - This may not be a PCI/MSI domain at all (platform MSI, for example) Either way, this looks wrong. Move the overriding of mask/unmask to the PCI counterpart, and panic if either of these two methods is not set in the core code (they really should be present). Signed-off-by: Marc Zyngier <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Bjorn Helgaas <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-16bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_putTom Herbert1-0/+1
Currently, bpf_prog_uncharge_memlock() is only called from __prog_put_rcu() in the bpf_prog_release path. It also needs to be called from bpf_prog_put() to get correct accounting. Fixes: aaac3ba95e4c8b49 ("bpf: charge user for creation of BPF maps and programs") Signed-off-by: Tom Herbert <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
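A simplified sketch of the intended behaviour (the real function also handles the RCU-deferred free path, so treat this as an outline under that assumption): uncharge on the final reference drop in bpf_prog_put(), not only in the bpf_prog_release path.

    void bpf_prog_put(struct bpf_prog *prog)
    {
        if (atomic_dec_and_test(&prog->aux->refcnt)) {
            bpf_prog_uncharge_memlock(prog);    /* give back the memlock charge */
            bpf_prog_free(prog);
        }
    }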
2015-10-15Merge branch 'for-4.3-fixes' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixlet from Tejun Heo: "Single patch to make delayed work always be queued on the local CPU" This is not actually something we should guarantee, but it's something we by accident have historically done, and at least one call site has grown to depend on it. I'm going to fix that known broken callsite, but in the meantime this makes the accidental behavior be explicit, just in case there are other cases that might depend on it. * 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: make sure delayed work run in local cpu
2015-10-15posix_cpu_timer: Reduce unnecessary sighand lock contentionJason Low1-2/+24
It was found while running a database workload on large systems that significant time was spent trying to acquire the sighand lock. The issue was that whenever an itimer expired, many threads ended up simultaneously trying to send the signal. Most of the time, nothing happened after acquiring the sighand lock because another thread had already sent the signal and updated the "next expire" time. The fastpath_timer_check() didn't help much since the "next expire" time was updated after the threads exited fastpath_timer_check(). This patch addresses this by having the thread_group_cputimer structure maintain a boolean to signify when a thread in the group is already checking for process-wide timers, and adds extra logic in the fastpath to check the boolean. Signed-off-by: Jason Low <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: George Spelvin <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
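A rough sketch of the fastpath idea (field names follow the commit text; exact placement inside fastpath_timer_check() is an assumption): skip the sighand-locked slow path when another thread is already checking the process-wide timers.

    if (READ_ONCE(sig->cputimer.running) &&
        !READ_ONCE(sig->cputimer.checking_timer)) {
        /* ... compare group cputime against the earliest expiry ... */
    }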
2015-10-15posix_cpu_timer: Convert cputimer->running to boolJason Low2-3/+3
In the next patch in this series, a new field 'checking_timer' will be added to 'struct thread_group_cputimer'. Both this and the existing 'running' integer field are just used as boolean values. To save space in the structure, we can make both of these fields booleans. This is a preparatory patch to convert the existing running integer field to a boolean. Suggested-by: George Spelvin <[email protected]> Signed-off-by: Jason Low <[email protected]> Reviewed: George Spelvin <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-15posix_cpu_timer: Check thread timers only when there are active thread timersJason Low1-6/+16
The fastpath_timer_check() contains logic to check whether any timers are set by checking if !task_cputime_zero(). Similarly, we can do this before calling check_thread_timers(). In the case where there are only process-wide timers, this will skip all of the computations for per-thread timers when there are no per-thread timers. As suggested by George, we can put the task_cputime_zero() check in check_thread_timers(), since that is more of an optimization to the function. Similarly, we move the existing check of cputimer->running to check_process_timers(). Signed-off-by: Jason Low <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: George Spelvin <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-15posix_cpu_timer: Optimize fastpath_timer_check()Jason Low1-8/+3
In fastpath_timer_check(), the task_cputime() function is always called to compute the utime and stime values. However, this is not necessary if there are no per-thread timers to check for. This patch modifies the code such that we compute the task_cputime values only when there are per-thread timers set. Signed-off-by: Jason Low <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Reviewed-by: George Spelvin <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
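A sketch close to the description above (the exact code is an assumption): sample utime/stime only when the task actually has per-thread timers armed.

    if (!task_cputime_zero(&tsk->cputime_expires)) {
        struct task_cputime task_sample;

        /* Only pay for the cputime sampling when a per-thread timer exists. */
        task_cputime(tsk, &task_sample.utime, &task_sample.stime);
        task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
        if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
            return 1;
    }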
2015-10-14PM / hibernate: fix a comment typoGeliang Tang1-1/+1
Just fix a typo in a function name in kerneldoc comments. Signed-off-by: Geliang Tang <[email protected]> Acked-by: Pavel Machek <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2015-10-14PM / sleep: Add flags to indicate platform firmware involvementRafael J. Wysocki1-0/+4
There are quite a few cases in which device drivers, bus types or even the PM core itself may benefit from knowing whether or not the platform firmware will be involved in the upcoming system power transition (during system suspend) or whether or not it was involved in it (during system resume). For this reason, introduce global system suspend flags that can be used by the platform code to expose that information for the benefit of the other parts of the kernel and make the ACPI core set them as appropriate. Users of the new flags will be added later. Signed-off-by: Rafael J. Wysocki <[email protected]>
2015-10-13irqdomain/msi: Use fwnode instead of of_nodeMarc Zyngier1-4/+4
As we continue to push of_node towards the outskirts of irq domains, let's start tackling the case of msi_create_irq_domain and its little friends. This has limited impact in both PCI/MSI, platform MSI, and a few drivers. Signed-off-by: Marc Zyngier <[email protected]> Tested-by: Hanjun Guo <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Cc: <[email protected]> Cc: Tomasz Nowicki <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Graeme Gregory <[email protected]> Cc: Jake Oshins <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-13irqdomain: Introduce irq_domain_create_hierarchyMarc Zyngier1-6/+6
As we're about to start converting the various MSI layers to use fwnode_handle instead of device_node, add irq_domain_create_hierarchy as a directly equivalent of irq_domain_add_hierarchy (which still exists as a compatibility interface). Signed-off-by: Marc Zyngier <[email protected]> Tested-by: Hanjun Guo <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Cc: <[email protected]> Cc: Tomasz Nowicki <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Graeme Gregory <[email protected]> Cc: Jake Oshins <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-13irqdomain: Add a fwnode_handle allocatorMarc Zyngier1-0/+51
In order to be able to reference an irqdomain from ACPI, we need to be able to create an identifier, which is usually a struct device_node. This device node doesn't really fit the ACPI infrastructure, so we cunningly allocate a new structure containing a fwnode_handle, and return that. This structure doesn't really point to a device (interrupt controllers are not "real" devices in Linux), but as we cannot really deny that they exist, we create them with a new fwnode_type (FWNODE_IRQCHIP). Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Reviewed-and-tested-by: Hanjun Guo <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Cc: <[email protected]> Cc: Tomasz Nowicki <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Graeme Gregory <[email protected]> Cc: Jake Oshins <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-13irqdomain: Introduce irq_domain_create_{linear, tree}Marc Zyngier1-5/+6
Just like we have irq_domain_add_{linear,tree} to create an irq domain identified by an of_node, introduce irq_domain_create_{linear,tree} that do the same thing, except that they take a struct fwnode_handle. Existing functions get rewritten in terms of the new ones so that everything keeps working as before (and __irq_domain_add is now fwnode_handle based as well). Signed-off-by: Marc Zyngier <[email protected]> Reviewed-and-tested-by: Hanjun Guo <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Cc: <[email protected]> Cc: Tomasz Nowicki <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Graeme Gregory <[email protected]> Cc: Jake Oshins <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-13irqdomain: Introduce irq_create_fwspec_mappingMarc Zyngier1-15/+15
Just like we have irq_create_of_mapping, irq_create_fwspec_mapping creates an IRQ domain mapping for an interrupt described in a struct irq_fwspec. irq_create_of_mapping gets rewritten in terms of the new function, and the hack we introduced before gets removed (now that no stacked irqchip uses of_phandle_args anymore). Signed-off-by: Marc Zyngier <[email protected]> Reviewed-and-tested-by: Hanjun Guo <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Cc: <[email protected]> Cc: Tomasz Nowicki <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Graeme Gregory <[email protected]> Cc: Jake Oshins <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-13irqdomain: Introduce a firmware-specific IRQ specifier structureMarc Zyngier1-11/+48
So far the closest thing to a generic IRQ specifier structure is of_phandle_args, which happens to be pretty OF specific (the of_node pointer in there is quite annoying). Let's introduce 'struct irq_fwspec' that can be used in place of of_phandle_args for OF, but also for other firmware implementations (that'd be ACPI). This is used together with a new 'translate' method that is the counterpart of 'xlate'. We convert irq_create_of_mapping to use this new structure (with a small hack that will be removed later). Signed-off-by: Marc Zyngier <[email protected]> Reviewed-and-tested-by: Hanjun Guo <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Cc: <[email protected]> Cc: Tomasz Nowicki <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Graeme Gregory <[email protected]> Cc: Jake Oshins <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-13irqdomain: Allow irq domain lookup by fwnodeMarc Zyngier1-9/+7
So far, our irq domains are still looked up by device node. Let's change this and allow a domain to be looked up using a fwnode_handle pointer. The existing interfaces are preserved with a couple of helpers. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-and-tested-by: Hanjun Guo <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Cc: <[email protected]> Cc: Tomasz Nowicki <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Graeme Gregory <[email protected]> Cc: Jake Oshins <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-13irqdomain: Convert irqdomain->of_node to fwnodeMarc Zyngier1-1/+5
Now that we have everyone accessing the of_node field via the irq_domain_get_of_node accessor, it is pretty easy to swap it for a pointer to a fwnode_handle. This translates into a few limited changes in __irq_domain_add, and an updated irq_domain_get_of_node. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-and-tested-by: Hanjun Guo <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Cc: <[email protected]> Cc: Tomasz Nowicki <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Graeme Gregory <[email protected]> Cc: Jake Oshins <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-13irqdomain: Use irq_domain_get_of_node() instead of direct field accessMarc Zyngier1-9/+21
The struct irq_domain contains a "struct device_node *" field (of_node) that is almost the only link between the irqdomain and the device tree infrastructure. In order to prepare for the removal of that field, convert all users to use irq_domain_get_of_node() instead. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-and-tested-by: Hanjun Guo <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Cc: <[email protected]> Cc: Tomasz Nowicki <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Graeme Gregory <[email protected]> Cc: Jake Oshins <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-13Merge branch 'linus' into irq/coreThomas Gleixner8-52/+129
Bring in upstream updates for patches which depend on them
2015-10-12bpf: charge user for creation of BPF maps and programsAlexei Starovoitov3-1/+68
Since eBPF programs and maps use kernel memory, consider it 'locked' memory from the user accounting point of view and charge it against the RLIMIT_MEMLOCK limit. This limit is typically set to 64 Kbytes by distros, so almost all bpf+tracing programs would need to increase it, since they use maps and the kernel charges the maximum map size upfront. For example, a hash map of 1024 elements will be charged as 64 Kbytes. This is inconvenient for current users and changes current behavior for root, but it is probably worth doing to be consistent between root and non-root. Similar accounting logic is done by mmap of perf_event. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
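A simplified sketch of the charging logic (helper name and error handling are assumptions): account the map's pages against the owner's RLIMIT_MEMLOCK, mirroring what mmap of a perf_event does.

    static int bpf_map_charge_memlock(struct bpf_map *map)
    {
        struct user_struct *user = get_current_user();
        unsigned long memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

        atomic_long_add(map->pages, &user->locked_vm);
        if (atomic_long_read(&user->locked_vm) > memlock_limit) {
            /* Over the limit: roll the charge back and refuse. */
            atomic_long_sub(map->pages, &user->locked_vm);
            free_uid(user);
            return -EPERM;
        }
        map->user = user;
        return 0;
    }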
2015-10-12bpf: enable non-root eBPF programsAlexei Starovoitov3-14/+116
In order to let unprivileged users load and execute eBPF programs teach verifier to prevent pointer leaks. Verifier will prevent - any arithmetic on pointers (except R10+Imm which is used to compute stack addresses) - comparison of pointers (except if (map_value_ptr == 0) ... ) - passing pointers to helper functions - indirectly passing pointers in stack to helper functions - returning pointer from bpf program - storing pointers into ctx or maps Spill/fill of pointers into stack is allowed, but mangling of pointers stored in the stack or reading them byte by byte is not. Within bpf programs the pointers do exist, since programs need to be able to access maps, pass skb pointer to LD_ABS insns, etc but programs cannot pass such pointer values to the outside or obfuscate them. Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs, so that socket filters (tcpdump), af_packet (quic acceleration) and future kcm can use it. tracing and tc cls/act program types still require root permissions, since tracing actually needs to be able to see all kernel pointers and tc is for root only. For example, the following unprivileged socket filter program is allowed: int bpf_prog1(struct __sk_buff *skb) { u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)); u64 *value = bpf_map_lookup_elem(&my_map, &index); if (value) *value += skb->len; return 0; } but the following program is not: int bpf_prog1(struct __sk_buff *skb) { u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)); u64 *value = bpf_map_lookup_elem(&my_map, &index); if (value) *value += (u64) skb; return 0; } since it would leak the kernel address into the map. Unprivileged socket filter bpf programs have access to the following helper functions: - map lookup/update/delete (but they cannot store kernel pointers into them) - get_random (it's already exposed to unprivileged user space) - get_smp_processor_id - tail_call into another socket filter program - ktime_get_ns The feature is controlled by sysctl kernel.unprivileged_bpf_disabled. This toggle defaults to off (0), but can be set true (1). Once true, bpf programs and maps cannot be accessed from unprivileged process, and the toggle cannot be set back to false. Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-12Merge back earlier 'pm-sleep' material for v4.4.Rafael J. Wysocki2-1/+18
2015-10-12Merge tag 'v4.3-rc5' into timers/core, to pick up fixes before applying new ↵Ingo Molnar11-82/+220
changes Signed-off-by: Ingo Molnar <[email protected]>
2015-10-12sched, tracing: Stop/start critical timings around the idle=poll idle loopDaniel Bristot de Oliveira1-0/+2
When using idle=poll, the preemptoff tracer always shows the idle task as the culprit for long latencies. That happens because critical timings are not stopped before the idle loop. This patch stops critical timings before entering the idle loop and starts them again after leaving it. This problem does not affect the irqsoff tracer because interrupts are enabled before entering the idle loop. Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Reviewed-by: Luis Claudio R. Goncalves <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/10fc3705874aef11dbe152a068b591a7be1899b4.1444314899.git.bristot@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
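A sketch of the change in the idle=poll loop (the loop condition is simplified and placement in cpu_idle_poll() is an assumption): critical timings are stopped around the polling loop so the preemptoff tracer no longer blames the idle task.

    stop_critical_timings();
    while (!tif_need_resched())
        cpu_relax();    /* busy-poll until a reschedule is needed */
    start_critical_timings();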
2015-10-11timers: Use __fls in apply_slack()Rasmus Villemoes1-1/+1
In apply_slack(), find_last_bit() is applied to a bitmask consisting of precisely BITS_PER_LONG bits. Since mask is non-zero, we might as well eliminate the function call and use __fls() directly. On x86_64, this shaves 23 bytes of the only caller, mod_timer(). This also gets rid of Coverity CID 1192106, but that is a false positive: Coverity is not aware that mask != 0 implies that find_last_bit will not return BITS_PER_LONG. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: John Stultz <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
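A sketch of the substitution in apply_slack() (surrounding variable names are assumptions): mask is known to be non-zero at this point, so __fls() gives the same answer as find_last_bit(&mask, BITS_PER_LONG) without the out-of-line call.

    bit = __fls(mask);              /* highest set bit; mask != 0 here */
    mask = (1UL << bit) - 1;
    expires_limit &= ~mask;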
2015-10-11clocksource: Remove return statement from void functionsGuillaume Gomez1-3/+2
Signed-off-by: Guillaume Gomez <[email protected]> Cc: John Stultz <[email protected]> Link: http://lkml.kernel.org/r/CAAOQCfSDgmqSWDBsetau%2ByF8x0%[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-11Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2-7/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "Fix a long standing state race in finish_task_switch()" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix TASK_DEAD race in finish_task_switch()