Age | Commit message (Collapse) | Author | Files | Lines |
|
The original arm code has a pr_debug() statement for the case where
the irq chip has no set_affinity() callback. That's sufficient for
debugging and we really don't want to spam dmesg with useless warnings
for the normal case.
Fixes: f1e0bb0ad473: "genirq: Introduce generic irq migration for cpu hotunplug"
Reported-by: Geert Uytterhoeven <[email protected]>
Requested-by: Russell King <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Yang Yingliang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Hanjun Guo <[email protected]>
Cc: Jiang Liu <[email protected]>
|
|
It is helpful when the crashkernel cmdline parsing routines
actually say which character is the unrecognized one. Make them
do so.
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Dave Young <[email protected]>
Reviewed-by: Joerg Roedel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: WANG Chao <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The code in stack tracer should not be executed within an NMI as it grabs
spinlocks and stack tracing an NMI gives the possibility of causing a
deadlock. Although this is safe on x86_64, because it does not perform stack
traces when the task struct stack is not in use (interrupts and NMIs), it
may be an issue for NMIs on i386 and other archs that use the same stack as
the NMI.
Signed-off-by: Steven Rostedt <[email protected]>
|
|
The stack tracer was triggering the WARN_ON() in module.c:
static void module_assert_mutex_or_preempt(void)
{
#ifdef CONFIG_LOCKDEP
if (unlikely(!debug_locks))
return;
WARN_ON(!rcu_read_lock_sched_held() &&
!lockdep_is_held(&module_mutex));
#endif
}
The reason is that the stack tracer traces all function calls, and some of
those calls happen while exiting or entering user space and idle. Some of
these functions are called after RCU had already stopped watching, as RCU
does not watch userspace or idle CPUs.
If a max stack is hit, then the save_stack_trace() is called, which will
check module addresses and call module_assert_mutex_or_preempt(), and then
trigger the warning. Sad part is, the warning itself will also do a stack
trace and tigger the same warning. That probably should be fixed.
The warning was added by 0be964be0d45 "module: Sanitize RCU usage and
locking" but this bug has probably been around longer. But it's unlikely to
cause much harm, but the new warning causes the system to lock up.
Cc: [email protected] # 4.2+
Cc: Peter Zijlstra <[email protected]>
Cc:"Paul E. McKenney" <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Conflicts:
drivers/net/usb/asix_common.c
net/ipv4/inet_connection_sock.c
net/switchdev/switchdev.c
In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.
The other two conflicts were overlapping changes.
Signed-off-by: David S. Miller <[email protected]>
|
|
https://git.linaro.org/people/john.stultz/linux into timers/core
Time updates from John Stultz:
- More 2038 work from Arnd Bergmann around ntp and pps
|
|
If CONFIG_CPUSETS=n then "case cpuset" changes the state and runs
the already failed for_each_cpu() loop again for no reason.
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The cpu_active() tests are not fundamentally part of stop_two_cpus(),
move then into the scheduler where they belong.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Ensure the stopper thread is active 'early', because the load balancer
pretty much assumes that its available. And when 'online && active' the
load-balancer is fully available.
Not only the numa balancing stop_two_cpus() caller relies on it, but
also the self migration stuff does, and at CPU_ONLINE time the cpu
really is 'free' to run anything.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that we always use stop_machine_unpark() to wake the stopper
threas up, we can kill ->setup() and fold cpu_stop_unpark() into
stop_machine_unpark().
And we do not need stopper->lock to set stopper->enabled = true.
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
stop_machine_unpark()
1. Change smpboot_unpark_thread() to check ->selfparking, just
like smpboot_park_thread() does.
2. Introduce stop_machine_unpark() which sets ->enabled and calls
kthread_unpark().
3. Change smpboot_thread_call() and cpu_stop_init() to call
stop_machine_unpark() by hand.
This way:
- IMO the ->selfparking logic becomes more consistent.
- We can kill the smp_hotplug_thread->pre_unpark() method.
- We can easily unpark the stopper thread earlier. Say, we
can move stop_machine_unpark() from smpboot_thread_call()
to sched_cpu_active() as Peter suggests.
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Change cpu_stop_queue_two_works() to ensure that both CPU's have
stopper->enabled == T or fail otherwise.
This way stop_two_cpus() no longer needs to check cpu_active() to
avoid the deadlock. This patch doesn't remove these checks, we will
do this later.
Note: we need to take both stopper->lock's at the same time, but this
will also help to remove lglock from stop_machine.c, so I hope this
is fine.
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Preparation to simplify the review of the next change. Add two simple
helpers, __cpu_stop_queue_work() and cpu_stop_queue_two_works() which
simply take a bit of code from their callers.
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
cpu_stop_park()
cpu_stop_queue_work() checks stopper->enabled before it queues the
work, but ->enabled == T can only guarantee cpu_stop_signal_done()
if we race with cpu_down().
This is not enough for stop_two_cpus() or stop_machine(), they will
deadlock if multi_cpu_stop() won't be called by one of the target
CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex.
But stop_two_cpus() has to check cpu_active() to avoid the same race
with hotplug, and this check is very unobvious and probably not even
correct if we race with cpu_up().
Change cpu_down() pass to clear ->enabled before cpu_stopper_thread()
flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set.
Note also that smpboot_thread_call() calls cpu_stop_unpark() which
sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until
cpu_stopper_thread() is called at least once. This all means that if
cpu_stop_queue_work() succeeds, we know that work->fn() will be called.
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
conflicts
Conflicts:
kernel/sched/fair.c
Signed-off-by: Ingo Molnar <[email protected]>
|
|
changes
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit:
9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target")
broke select_task_rq_dl() and find_lock_later_rq(), because it introduced
a comparison between the local task's deadline and dl.earliest_dl.curr of
the remote queue.
However, if the remote runqueue does not contain any SCHED_DEADLINE
task its earliest_dl.curr is 0 (always smaller than the deadline of
the local task) and the remote runqueue is not selected for pushing.
As a result, if an application creates multiple SCHED_DEADLINE
threads, they will never be pushed to runqueues that do not already
contain SCHED_DEADLINE tasks.
This patch fixes the issue by checking if dl.dl_nr_running == 0.
Signed-off-by: Luca Abeni <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Wanpeng Li <[email protected]>
Fixes: 9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This reverts:
8cb9764fc88b ("nohz: Set isolcpus when nohz_full is set")
We assumed that full-nohz users always want scheduler isolation on full
dynticks CPUs, therefore we included full-nohz CPUs on cpu_isolated_map.
This means that tasks run by default on CPUs outside the nohz_full range
unless their affinity is explicity overwritten.
This suits pure isolation workloads but when the machine is needed to
run common workloads, the available sets of CPUs to run common tasks
becomes reduced.
We reach an extreme case when CONFIG_NO_HZ_FULL_ALL is enabled as it
leaves only CPU 0 for non-isolation tasks, which makes people think that
their supercomputer regressed to 90's UP - which is true in a sense.
Some full-nohz users appear to be interested in running normal workloads
either before or after an isolation workload. Full-nohz isn't optimized
toward normal workloads but it's still better than UP performance.
We are reaching a limitation in kernel presets here. Lets revert this
cpu_isolated_map inclusion and let userspace do its own scheduler
isolation using cpusets or explicit affinity settings.
Reported-by: Ingo Molnar <[email protected]>
Reported-by: Mike Galbraith <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When cfs_rq has cfs_rq->removed_load_avg set (when a task migrates from
this cfs_rq), we need to update its contribution to the group's load_avg.
This should not increase tg's update too much, because in most cases, the
cfs_rq has already decayed its load_avg.
Tested-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Yuyang Du <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dietmar Eggemann <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit:
9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
led to an overly small weight for interactive group entities. The bad case
can be easily reproduced when a number of CPU hogs compete for the CPUs
at the same time (thanks to Mike). This is largly because the task group's
load average tracking cross CPUs lags behind the real changes.
To fix this we accelerate the group share distribution process by using
the load.weight of the cfs_rq. This may increase the entire group's
share, but we have to do so to protect the (fragile) interactive
tasks, especially from CPU hogs.
Reported-by: Mike Galbraith <[email protected]>
Tested-by: Dietmar Eggemann <[email protected]>
Tested-by: Mike Galbraith <[email protected]>
Signed-off-by: Yuyang Du <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dietmar Eggemann <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
- Miscellaneous fixes. (Paul E. McKenney, Boqun Feng, Oleg Nesterov, Patrick Marlier)
- Improvements to expedited grace periods. (Paul E. McKenney)
- Performance improvements to and locktorture tests for percpu-rwsem.
(Oleg Nesterov, Paul E. McKenney)
- Torture-test changes. (Paul E. McKenney, Davidlohr Bueso)
- Documentation updates. (Paul E. McKenney)
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq/timer fixes from Thomas Gleixner:
"irq: a fix for the new hierarchical MSI interrupt handling which
unbreaks PCI=n configurations.
timers: a fix for the new hrtimer clock offset update mechanism to
ensure that the boot time offset is respected"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi: Do not use pci_msi_[un]mask_irq as default methods
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Increment clock_was_set_seq in timekeeping_init()
|
|
timekeeping_init() can set the wall time offset, so we need to
increment the clock_was_set_seq counter. That way hrtimers will pick
up the early offset immediately. Otherwise on a machine which does not
set wall time later in the boot process the hrtimer offset is stale at
0 and wall time timers are going to expire with a delay of 45 years.
Fixes: 868a3e915f7f "hrtimer: Make offset update smarter"
Reported-and-tested-by: Heiko Carstens <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Stefan Liebler <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: John Stultz <[email protected]>
|
|
When we create a generic MSI domain, that MSI_FLAG_USE_DEF_CHIP_OPS
is set, and that any of .mask or .unmask are NULL in the irq_chip
structure, we set them to pci_msi_[un]mask_irq.
This is a bad idea for at least two reasons:
- PCI_MSI might not be selected, kernel fails to build (yes, this is
legitimate, at least on arm64!)
- This may not be a PCI/MSI domain at all (platform MSI, for example)
Either way, this looks wrong. Move the overriding of mask/unmask to
the PCI counterpart, and panic is any of these two methods is not
set in the core code (they really should be present).
Signed-off-by: Marc Zyngier <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Currently, is only called from __prog_put_rcu in the bpf_prog_release
path. Need this to call this from bpf_prog_put also to get correct
accounting.
Fixes: aaac3ba95e4c8b49 ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Tom Herbert <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixlet from Tejun Heo:
"Single patch to make delayed work always be queued on the local CPU"
This is not actually something we should guarantee, but it's something
we by accident have historically done, and at least one call site has
grown to depend on it.
I'm going to fix that known broken callsite, but in the meantime this
makes the accidental behavior be explicit, just in case there are other
cases that might depend on it.
* 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: make sure delayed work run in local cpu
|
|
It was found while running a database workload on large systems that
significant time was spent trying to acquire the sighand lock.
The issue was that whenever an itimer expired, many threads ended up
simultaneously trying to send the signal. Most of the time, nothing
happened after acquiring the sighand lock because another thread
had just already sent the signal and updated the "next expire" time.
The fastpath_timer_check() didn't help much since the "next expire"
time was updated after the threads exit fastpath_timer_check().
This patch addresses this by having the thread_group_cputimer structure
maintain a boolean to signify when a thread in the group is already
checking for process wide timers, and adds extra logic in the fastpath
to check the boolean.
Signed-off-by: Jason Low <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
In the next patch in this series, a new field 'checking_timer' will
be added to 'struct thread_group_cputimer'. Both this and the
existing 'running' integer field are just used as boolean values. To
save space in the structure, we can make both of these fields booleans.
This is a preparatory patch to convert the existing running integer
field to a boolean.
Suggested-by: George Spelvin <[email protected]>
Signed-off-by: Jason Low <[email protected]>
Reviewed: George Spelvin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The fastpath_timer_check() contains logic to check for if any timers
are set by checking if !task_cputime_zero(). Similarly, we can do this
before calling check_thread_timers(). In the case where there
are only process-wide timers, this will skip all of the computations for
per-thread timers when there are no per-thread timers.
As suggested by George, we can put the task_cputime_zero() check in
check_thread_timers(), since that is more of an optization to the
function. Similarly, we move the existing check of cputimer->running
to check_process_timers().
Signed-off-by: Jason Low <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
In fastpath_timer_check(), the task_cputime() function is always
called to compute the utime and stime values. However, this is not
necessary if there are no per-thread timers to check for. This patch
modifies the code such that we compute the task_cputime values only
when there are per-thread timers set.
Signed-off-by: Jason Low <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Just fix a typo in a function name in kerneldoc comments.
Signed-off-by: Geliang Tang <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
There are quite a few cases in which device drivers, bus types or
even the PM core itself may benefit from knowing whether or not
the platform firmware will be involved in the upcoming system power
transition (during system suspend) or whether or not it was involved
in it (during system resume).
For this reason, introduce global system suspend flags that can be
used by the platform code to expose that information for the benefit
of the other parts of the kernel and make the ACPI core set them
as appropriate.
Users of the new flags will be added later.
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
As we continue to push of_node towards the outskirts of irq domains,
let's start tackling the case of msi_create_irq_domain and its little
friends.
This has limited impact in both PCI/MSI, platform MSI, and a few
drivers.
Signed-off-by: Marc Zyngier <[email protected]>
Tested-by: Hanjun Guo <[email protected]>
Tested-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]>
Cc: Tomasz Nowicki <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Graeme Gregory <[email protected]>
Cc: Jake Oshins <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
As we're about to start converting the various MSI layers to
use fwnode_handle instead of device_node, add irq_domain_create_hierarchy
as a directly equivalent of irq_domain_add_hierarchy (which still
exists as a compatibility interface).
Signed-off-by: Marc Zyngier <[email protected]>
Tested-by: Hanjun Guo <[email protected]>
Tested-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]>
Cc: Tomasz Nowicki <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Graeme Gregory <[email protected]>
Cc: Jake Oshins <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
In order to be able to reference an irqdomain from ACPI, we need
to be able to create an identifier, which is usually a struct
device_node.
This device node does't really fit the ACPI infrastructure, so
we cunningly allocate a new structure containing a fwnode_handle,
and return that.
This structure doesn't really point to a device (interrupt
controllers are not "real" devices in Linux), but as we cannot
really deny that they exist, we create them with a new fwnode_type
(FWNODE_IRQCHIP).
Signed-off-by: Marc Zyngier <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Reviewed-and-tested-by: Hanjun Guo <[email protected]>
Tested-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]>
Cc: Tomasz Nowicki <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Graeme Gregory <[email protected]>
Cc: Jake Oshins <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Just like we have irq_domain_add_{linear,tree} to create a irq domain
identified by an of_node, introduce irq_domain_create_{linear,tree}
that do the same thing, except that they take a struct fwnode_handle.
Existing functions get rewritten in terms of the new ones so that
everything keeps working as before (and __irq_domain_add is now
fwnode_handle based as well).
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-and-tested-by: Hanjun Guo <[email protected]>
Tested-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]>
Cc: Tomasz Nowicki <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Graeme Gregory <[email protected]>
Cc: Jake Oshins <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Just like we have irq_create_of_mapping, irq_create_fwspec_mapping
creates a IRQ domain mapping for an interrupt described in a
struct irq_fwspec.
irq_create_of_mapping gets rewritten in terms of the new function,
and the hack we introduced before gets removed (now that no stacked
irqchip uses of_phandle_args anymore).
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-and-tested-by: Hanjun Guo <[email protected]>
Tested-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]>
Cc: Tomasz Nowicki <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Graeme Gregory <[email protected]>
Cc: Jake Oshins <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
So far the closest thing to a generic IRQ specifier structure is
of_phandle_args, which happens to be pretty OF specific (the of_node
pointer in there is quite annoying).
Let's introduce 'struct irq_fwspec' that can be used in place of
of_phandle_args for OF, but also for other firmware implementations
(that'd be ACPI). This is used together with a new 'translate' method
that is the pendent of 'xlate'.
We convert irq_create_of_mapping to use this new structure (with a
small hack that will be removed later).
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-and-tested-by: Hanjun Guo <[email protected]>
Tested-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]>
Cc: Tomasz Nowicki <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Graeme Gregory <[email protected]>
Cc: Jake Oshins <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
So far, our irq domains are still looked up by device node.
Let's change this and allow a domain to be looked up using
a fwnode_handle pointer.
The existing interfaces are preserved with a couple of helpers.
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-and-tested-by: Hanjun Guo <[email protected]>
Tested-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]>
Cc: Tomasz Nowicki <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Graeme Gregory <[email protected]>
Cc: Jake Oshins <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Now that we have everyone accessing the of_node field via the
irq_domain_get_of_node accessor, it is pretty easy to swap it
for a pointer to a fwnode_handle.
This translates into a few limited changes in __irq_domain_add,
and an updated irq_domain_get_of_node.
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-and-tested-by: Hanjun Guo <[email protected]>
Tested-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]>
Cc: Tomasz Nowicki <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Graeme Gregory <[email protected]>
Cc: Jake Oshins <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The struct irq_domain contains a "struct device_node *" field
(of_node) that is almost the only link between the irqdomain
and the device tree infrastructure.
In order to prepare for the removal of that field, convert all
users to use irq_domain_get_of_node() instead.
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-and-tested-by: Hanjun Guo <[email protected]>
Tested-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]>
Cc: Tomasz Nowicki <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Graeme Gregory <[email protected]>
Cc: Jake Oshins <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Bring in upstream updates for patches which depend on them
|
|
since eBPF programs and maps use kernel memory consider it 'locked' memory
from user accounting point of view and charge it against RLIMIT_MEMLOCK limit.
This limit is typically set to 64Kbytes by distros, so almost all
bpf+tracing programs would need to increase it, since they use maps,
but kernel charges maximum map size upfront.
For example the hash map of 1024 elements will be charged as 64Kbyte.
It's inconvenient for current users and changes current behavior for root,
but probably worth doing to be consistent root vs non-root.
Similar accounting logic is done by mmap of perf_event.
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In order to let unprivileged users load and execute eBPF programs
teach verifier to prevent pointer leaks.
Verifier will prevent
- any arithmetic on pointers
(except R10+Imm which is used to compute stack addresses)
- comparison of pointers
(except if (map_value_ptr == 0) ... )
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps
Spill/fill of pointers into stack is allowed, but mangling
of pointers stored in the stack or reading them byte by byte is not.
Within bpf programs the pointers do exist, since programs need to
be able to access maps, pass skb pointer to LD_ABS insns, etc
but programs cannot pass such pointer values to the outside
or obfuscate them.
Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
so that socket filters (tcpdump), af_packet (quic acceleration)
and future kcm can use it.
tracing and tc cls/act program types still require root permissions,
since tracing actually needs to be able to see all kernel pointers
and tc is for root only.
For example, the following unprivileged socket filter program is allowed:
int bpf_prog1(struct __sk_buff *skb)
{
u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
u64 *value = bpf_map_lookup_elem(&my_map, &index);
if (value)
*value += skb->len;
return 0;
}
but the following program is not:
int bpf_prog1(struct __sk_buff *skb)
{
u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
u64 *value = bpf_map_lookup_elem(&my_map, &index);
if (value)
*value += (u64) skb;
return 0;
}
since it would leak the kernel address into the map.
Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns
The feature is controlled by sysctl kernel.unprivileged_bpf_disabled.
This toggle defaults to off (0), but can be set true (1). Once true,
bpf programs and maps cannot be accessed from unprivileged process,
and the toggle cannot be set back to false.
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
|
|
changes
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When using idle=poll, the preemptoff tracer is always showing
the idle task as the culprit for long latencies. That happens
because critical timings are not stopped before idle loop. This
patch stops critical timings before entering the idle loop,
starting it again after the idle loop.
This problem does not affect the irqsoff tracer because
interruptions are enabled before entering the idle loop.
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Reviewed-by: Luis Claudio R. Goncalves <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/10fc3705874aef11dbe152a068b591a7be1899b4.1444314899.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In apply_slack(), find_last_bit() is applied to a bitmask consisting
of precisely BITS_PER_LONG bits. Since mask is non-zero, we might as
well eliminate the function call and use __fls() directly. On x86_64,
this shaves 23 bytes of the only caller, mod_timer().
This also gets rid of Coverity CID 1192106, but that is a false
positive: Coverity is not aware that mask != 0 implies that
find_last_bit will not return BITS_PER_LONG.
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: John Stultz <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Signed-off-by: Guillaume Gomez <[email protected]>
Cc: John Stultz <[email protected]>
Link: http://lkml.kernel.org/r/CAAOQCfSDgmqSWDBsetau%2ByF8x0%[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"Fix a long standing state race in finish_task_switch()"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix TASK_DEAD race in finish_task_switch()
|