|
Since commit 383776fa7527 ("locking/lockdep: Handle statically initialized
PER_CPU locks properly") we try to collapse per-cpu locks into a single
class by giving them all the same key. For this key we choose the canonical
address of the per-cpu object, which would be the offset into the per-cpu
area.
This has two problems:
- there is a case where we run !0 lock->key through static_obj() and
expect this to pass; it doesn't for canonical pointers.
- 0 is a valid canonical address.
Cure both issues by redefining the canonical address as the address of the
per-cpu variable on the boot CPU.
Since I didn't want to rely on CPU0 being the boot-cpu, or even existing at
all, track the boot CPU in a variable.
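A minimal sketch of the scheme (helper names are hypothetical, not the literal patch):

static int boot_cpu_nr = -1;            /* set once, early in boot */

void __init note_boot_cpu(void)
{
        if (boot_cpu_nr < 0)
                boot_cpu_nr = smp_processor_id();
}

/* Canonical address = address of the variable in the boot CPU's per-cpu
 * area: never 0, and it passes static_obj(). */
static void *pcpu_canonical_addr(void __percpu *ptr)
{
        return per_cpu_ptr(ptr, boot_cpu_nr);
}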
Fixes: 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: kernel test robot <[email protected]>
Cc: LKP <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
On some hardware models (e.g. the Dell Studio 1555 laptop) some
hardware-related functions (e.g. SMIs) must be executed on physical
CPU 0 only. Instead of open coding such functionality multiple times in
the kernel, add a service function for this purpose. This also makes it
possible to take special measures in virtualized environments like Xen.
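A hedged usage sketch of such a service function (smp_call_on_cpu() is the
name it got in mainline; the SMI helper below is hypothetical):

static int dell_issue_smi(void *arg)
{
        /* hypothetical SMI sequence that must run on physical CPU 0 */
        return 0;
}

static int dell_smi_call(void *arg)
{
        /* last argument: treat "0" as a physical CPU number */
        return smp_call_on_cpu(0, dell_issue_smi, arg, true);
}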
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Install the callbacks via the state machine. They are installed at runtime so
smpcfd_prepare_cpu() needs to be invoked by the boot-CPU.
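A hedged sketch of the pattern (the state constant and the registration site
are assumptions, not the literal patch):

static int __init smpcfd_hotplug_init(void)
{
        /* CPUs that come up later run the prepare/dead callbacks ... */
        cpuhp_setup_state_nocalls(CPUHP_SMPCFD_PREPARE, "smpcfd:prepare",
                                  smpcfd_prepare_cpu, smpcfd_dead_cpu);
        /* ... while the already-running boot CPU prepares itself here */
        return smpcfd_prepare_cpu(smp_processor_id());
}
early_initcall(smpcfd_hotplug_init);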
Signed-off-by: Richard Weinberger <[email protected]>
[ Added the dropped CPU dying case back in. ]
Signed-off-by: Richard Cochran <[email protected]>
Signed-off-by: Anna-Maria Gleixner <[email protected]>
Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
Cc: Davidlohr Bueso <dave@stgolabs>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Yes, it should work, but it's a bad idea. Not only did ARM64 not have
the 16-bit access code (there's a separate patch to add it), it's just
not a good atomic type. Some architectures fundamentally can't do atomic
accesses at that size (alpha), and it's not like it saves any space
here anyway because of structure packing issues.
We should normally aim for flags to be "unsigned int" or "unsigned
long". And if space is at a premium, use a single byte (although that
causes problems on alpha again). There might be very special cases
where a 16-bit entity is really wanted, but this is not one of them.
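Illustrative only: on a 64-bit architecture with a pointer member nearby,
the 16-bit field buys nothing:

struct csd_u16 { unsigned short flags; void *ptr; };    /* sizeof() == 16 */
struct csd_int { unsigned int   flags; void *ptr; };    /* sizeof() == 16 */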
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The UP local APIC support can be set up from an early initcall. No need
for horrible hackery in the init code.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Currently kick_all_cpus_sync() can break non-polling idle cpus out of
idle through IPI interrupts.
But sometimes we need to break polling idle cpus immediately so they can
reselect a suitable C-state, and for non-idle cpus we need to do nothing
when trying to wake them up.
Add one new function, wake_up_all_idle_cpus(), to let all cpus out of
idle, based on the function wake_up_if_idle().
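A hedged sketch of the new helper, built on wake_up_if_idle() as described:

void wake_up_all_idle_cpus(void)
{
        int cpu;

        preempt_disable();
        for_each_online_cpu(cpu) {
                if (cpu == smp_processor_id())
                        continue;       /* the local CPU is not idle */
                wake_up_if_idle(cpu);   /* no-op if the CPU is busy */
        }
        preempt_enable();
}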
Signed-off-by: Chuansheng Liu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrew Morton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Srivatsa S. Bhat <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
After all architectures were converted to the generic idle framework,
commit d190e8195b90 ("idle: Remove GENERIC_IDLE_LOOP config switch")
removed the last caller of cpu_idle(). The forward declarations in
header files were forgotten.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The name __smp_call_function_single() doesn't tell much about the
properties of this function, especially when compared to
smp_call_function_single().
The comments above the implementation are also misleading. The main
point of this function is actually not the ability to embed the csd
in an object; that is a requirement which results from the purpose of
this function, which is to raise an IPI asynchronously.
As such it can be called with interrupts disabled. And this feature
comes at the cost of the caller, who then needs to serialize the
IPIs on this csd.
Let's rename the function and enhance the comments so that they reflect
these properties.
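A hedged sketch of the usage pattern described above (the object and handler
names are made up; smp_call_function_single_async() is the name the function
ended up with in mainline):

struct my_req {
        struct call_single_data csd;    /* caller-owned, caller-serialized */
        int payload;
};

static void my_req_ipi(void *info)
{
        struct my_req *req = info;      /* runs on the target CPU */
        (void)req->payload;
}

static void my_req_kick(int cpu, struct my_req *req)
{
        /* caller must ensure the previous IPI on this csd has completed */
        req->csd.func = my_req_ipi;
        req->csd.info = req;
        smp_call_function_single_async(cpu, &req->csd);
}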
Suggested-by: Christoph Hellwig <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The main point of calling __smp_call_function_single() is to send
an IPI in a pure asynchronous way. By embedding a csd in an object,
a caller can send the IPI without waiting for a previous one to complete
as is required by smp_call_function_single() for example. As such,
sending this kind of IPI can be safe even when irqs are disabled.
This flexibility comes at the expense of the caller, who then needs to
synchronize the csd lifecycle itself and make sure that IPIs on a
single csd are serialized.
This is how __smp_call_function_single() works when wait = 0, and that
is the relevant use case.
Now there don't seem to be any usecase with wait = 1 that can't be
covered by smp_call_function_single() instead, which is safer. Let's look
at the two possible scenarios:
1) The user calls __smp_call_function_single(wait = 1) on a csd embedded
in an object. It looks like a nice and convenient pattern at first
sight because we can then retrieve the object from the IPI handler easily.
But actually it is a waste of memory space in the object since the csd
can be allocated from the stack by smp_call_function_single(wait = 1)
and the object can be passed as the IPI argument.
Besides that, embedding the csd in an object is more error prone
because the caller must take care of the serialization of the IPIs
for this csd.
2) The user calls __smp_call_function_single(wait = 1) on a csd that
is allocated on the stack. It's ok but smp_call_function_single()
can do it as well and it already takes care of the allocation on the
stack. Again it's simpler and less error prone.
Therefore, using the underscore-prefixed API version with wait = 1
is a bad pattern and a sign that the caller can do something safer and
simpler.
There was a single user of that, which has just been converted.
So let's remove this option to discourage further users.
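For the synchronous case, the plain API already stacks the csd internally; a
hedged sketch of the preferred pattern (helper names are hypothetical):

static void read_remote(void *info)
{
        *(unsigned int *)info = 42;     /* hypothetical remote read */
}

static int read_remote_value(int cpu, unsigned int *val)
{
        /* wait = 1: smp_call_function_single() keeps the csd on its own stack */
        return smp_call_function_single(cpu, read_remote, val, 1);
}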
Cc: Andrew Morton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Align __smp_call_function_single() with smp_call_function_single() so
that it also checks whether the requested cpu is still online.
Signed-off-by: Jan Kara <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Now that we got rid of all the remaining code which fiddled with csd.list,
let's remove it.
Signed-off-by: Jan Kara <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use what we already do for arch_disable_smp_support() to fix these:
arch/x86/kernel/smpboot.c:1155:6: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
arch/x86/kernel/smpboot.c:1160:6: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
kernel/cpu.c:512:13: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
kernel/cpu.c:516:13: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
Signed-off-by: Paul Gortmaker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make smp_call_function_single and friends more efficient by using a
lockless list.
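A hedged sketch of the lockless queueing (close to, but not literally, the
patched kernel/smp.c):

static DEFINE_PER_CPU(struct llist_head, call_single_queue);

static void csd_queue(int cpu, struct call_single_data *csd)
{
        /* llist_add() returns true if the list was empty, i.e. the target
         * CPU has no IPI pending yet and needs one now */
        if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
                arch_send_call_function_single_ipi(cpu);
}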
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
x86_64 allnoconfig:
kernel/up.c:25: error: redefinition of '__smp_call_function_single'
include/linux/smp.h:154: note: previous definition of '__smp_call_function_single' was here
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We've switched over every architecture that supports SMP to it, so
remove the now-useless config variable.
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
watchdog_thresh controls how often the nmi perf event counter checks the per-cpu
hrtimer_interrupts counter and blows up if the counter hasn't changed
since the last check. The counter is updated by per-cpu
watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
period which guarantees that hrtimer is scheduled 2 times per the main
period. Both hrtimer and perf event are started together when the
watchdog is enabled.
So far so good. But...
But what happens when watchdog_thresh is updated from sysctl handler?
proc_dowatchdog will set a new sampling period and hrtimer callback
(watchdog_timer_fn) will use the new value in the next round. The
problem, however, is that nobody tells the perf event that the sampling
period has changed so it is ticking with the period configured when it
has been set up.
This might result in an ear ripping dissonance between perf and hrtimer
parts if the watchdog_thresh is increased. And even worse it might lead
to KABOOM if the watchdog is configured to panic on such a spurious
lockup.
This patch fixes the issue by updating both the nmi perf event counter and
the hrtimers if the threshold value has changed.
The nmi one is disabled and then reinitialized from scratch. This has
an unpleasant side effect that the allocation of the new event might
fail theoretically so the hard lockup detector would be disabled for
such cpus. On the other hand such a memory allocation failure is very
unlikely because the original event is deallocated right before.
It would be much nicer if we just changed perf event period but there
doesn't seem to be any API to do that right now. It is also unfortunate
that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
cannot use on_each_cpu() and do the same thing from the per-cpu context.
The update from the current CPU should be safe because
perf_event_disable removes the event atomically before it clears the
per-cpu watchdog_ev, so it cannot change anything under the running
handler's feet.
The hrtimer is simply restarted (thanks to Don Zickus who has pointed
this out) if it is queued, because we cannot rely on it firing and adapting
to the new sampling period before a new nmi event triggers (when the
threshold is decreased).
[[email protected]: the UP version of __smp_call_function_single ended up in the wrong place]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Don Zickus <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Fabio Estevam <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
All of the other non-trivial !SMP versions of functions in smp.h are
out-of-line in up.c. Move on_each_cpu() there as well.
This allows us to get rid of the #include <linux/irqflags.h>. The
drawback is that this makes both the x86_64 and i386 defconfig !SMP
kernels about 200 bytes larger each.
Signed-off-by: David Daney <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As in commit f21afc25f9ed ("smp.h: Use local_irq_{save,restore}() in
!SMP version of on_each_cpu()"), we don't want to enable irqs if they
are not already enabled. There are currently no known problematical
callers of these functions, but since it is a known failure pattern, we
preemptively fix them.
Since they are not trivial functions, make them non-inline by moving
them to up.c. This also makes it so we don't have to fix #include
dependencies for preempt_{disable,enable}.
Signed-off-by: David Daney <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Revert commit c846ef7deba2 ("include/linux/smp.h:on_each_cpu(): switch
back to a macro"). It turns out that the problematic linux/irqflags.h
include was fixed within ia64 and mn10300.
Cc: Geert Uytterhoeven <[email protected]>
Cc: David Daney <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit f21afc25f9ed ("smp.h: Use local_irq_{save,restore}() in !SMP
version of on_each_cpu()") converted on_each_cpu() to a C function.
This required inclusion of irqflags.h, which broke ia64 and mn10300 (at
least) due to header ordering hell.
Switch on_each_cpu() back to a macro to fix this.
Reported-by: Geert Uytterhoeven <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: David Daney <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: <[email protected]> [3.10.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Thanks to commit f91eb62f71b3 ("init: scream bloody murder if interrupts
are enabled too early"), "bloody murder" is now being screamed.
With a MIPS OCTEON config, we use on_each_cpu() in our
irq_chip.irq_bus_sync_unlock() function. This gets called early as a
result of the time_init() call. Because the !SMP version of
on_each_cpu() unconditionally enables irqs, we get:
WARNING: at init/main.c:560 start_kernel+0x250/0x410()
Interrupts were enabled early
CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
Call Trace:
show_stack+0x68/0x80
warn_slowpath_common+0x78/0xb0
warn_slowpath_fmt+0x38/0x48
start_kernel+0x250/0x410
Suggested fix: Do what we already do in the SMP version of
on_each_cpu(), and use local_irq_save/local_irq_restore. Because we
need a flags variable, make it a static inline to avoid name space
issues.
[ Change from v1: Convert on_each_cpu to a static inline function, add
#include <linux/irqflags.h> to avoid build breakage on some files.
on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
on_each_cpu(), but they are not causing !SMP bugs for me, so I will
defer changing them to a less urgent patch. ]
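A hedged sketch of the suggested !SMP version (it mirrors the description
above, not the exact diff):

static inline int on_each_cpu(void (*func)(void *), void *info, int wait)
{
        unsigned long flags;

        local_irq_save(flags);          /* preserve the caller's irq state */
        func(info);
        local_irq_restore(flags);       /* instead of enabling irqs blindly */
        return 0;
}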
Signed-off-by: David Daney <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The 'priv' field is redundant; we can pass data via 'info'.
Signed-off-by: liguang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I'm testing a swapout workload on a two-socket Xeon machine. The workload
has 10 threads; each thread sequentially accesses a separate memory
region. TLB flush overhead is very big in this workload. For each page,
page reclaim needs to move it off the active lru list and then unmap it.
Both need a TLB flush. And since this is a multithreaded workload, TLB
flushes happen on 10 CPUs. On x86, TLB flush uses the generic
smp_call_function, so this workload stresses smp_call_function_many heavily.
Without patch, perf shows:
+ 24.49% [k] generic_smp_call_function_interrupt
- 21.72% [k] _raw_spin_lock
- _raw_spin_lock
+ 79.80% __page_check_address
+ 6.42% generic_smp_call_function_interrupt
+ 3.31% get_swap_page
+ 2.37% free_pcppages_bulk
+ 1.75% handle_pte_fault
+ 1.54% put_super
+ 1.41% grab_super_passive
+ 1.36% __swap_duplicate
+ 0.68% blk_flush_plug_list
+ 0.62% swap_info_get
+ 6.55% [k] flush_tlb_func
+ 6.46% [k] smp_call_function_many
+ 5.09% [k] call_function_interrupt
+ 4.75% [k] default_send_IPI_mask_sequence_phys
+ 2.18% [k] find_next_bit
swapout throughput is around 1300M/s.
With the patch, perf shows:
- 27.23% [k] _raw_spin_lock
- _raw_spin_lock
+ 80.53% __page_check_address
+ 8.39% generic_smp_call_function_single_interrupt
+ 2.44% get_swap_page
+ 1.76% free_pcppages_bulk
+ 1.40% handle_pte_fault
+ 1.15% __swap_duplicate
+ 1.05% put_super
+ 0.98% grab_super_passive
+ 0.86% blk_flush_plug_list
+ 0.57% swap_info_get
+ 8.25% [k] default_send_IPI_mask_sequence_phys
+ 7.55% [k] call_function_interrupt
+ 7.47% [k] smp_call_function_many
+ 7.25% [k] flush_tlb_func
+ 3.81% [k] _raw_spin_lock_irqsave
+ 3.78% [k] generic_smp_call_function_single_interrupt
swapout throughput is around 1400M/s. So there is around a 7%
improvement, and total cpu utilization doesn't change.
Without the patch, cfd_data is shared by all CPUs.
generic_smp_call_function_interrupt reads/writes cfd_data several times,
which creates a lot of cache ping-pong. With the patch, the data
becomes per-cpu and the ping-pong is avoided. And from the perf data, this
doesn't make the call_single_queue lock contended.
Next step is to remove generic_smp_call_function_interrupt() from arch
code.
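A hedged sketch of the per-cpu sender-side data (the field names approximate
that era's kernel/smp.c, they are not a quote of the patch):

struct call_function_data {
        struct call_single_data __percpu *csd; /* one csd per target CPU */
        cpumask_var_t           cpumask;       /* CPUs this sender targets */
};

static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);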
Signed-off-by: Shaohua Li <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
No users.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Srivatsa S. Bhat <[email protected]>
Cc: Rusty Russell <[email protected]>
|
|
There is no user of those APIs anymore, just remove it.
Signed-off-by: Yong Zhang <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Acked-by: Srivatsa S. Bhat <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Will replace the misnamed cpu_idle_wait() function, which is copied a
gazillion times all over arch/*.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Preparatory patch to make the idle thread allocation for secondary
cpus generic.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Srivatsa S. Bhat <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Russell King <[email protected]>
Cc: Mike Frysinger <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Hirokazu Takata <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: David Howells <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
|
|
Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and
calculates the cpumask of cpus to IPI by calling a function supplied as a
parameter in order to determine whether to IPI each specific cpu.
The function works around allocation failure of the cpumask variable in
CONFIG_CPUMASK_OFFSTACK=y by iterating over the cpus and sending an IPI
one at a time via smp_call_function_single().
The function is useful since it allows separating the specific code that
decides in each case whether to IPI a specific cpu for a specific request
from the common boilerplate code of creating the mask, handling
failures, etc.
[[email protected]: s/gfpflags/gfp_flags/]
[[email protected]: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func']
[[email protected]: s/CPU/CPUs, use all 80 cols in comment]
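A hedged usage sketch, assuming the signature described here (the predicate
and work functions are hypothetical):

static DEFINE_PER_CPU(int, pending_work);

static bool cpu_has_pending(int cpu, void *info)
{
        return per_cpu(pending_work, cpu) != 0;
}

static void drain_pending(void *info)
{
        __this_cpu_write(pending_work, 0);      /* runs on each chosen CPU */
}

static void drain_all_pending(void)
{
        on_each_cpu_cond(cpu_has_pending, drain_pending, NULL, true, GFP_ATOMIC);
}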
Signed-off-by: Gilad Ben-Yossef <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Matt Mackall <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Avi Kivity <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Cc: Kosaki Motohiro <[email protected]>
Cc: Milton Miller <[email protected]>
Reviewed-by: "Srivatsa S. Bhat" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We have lots of infrastructure in place to partition multi-core systems
such that we have a group of CPUs that are dedicated to specific task:
cgroups, scheduler and interrupt affinity, and the isolcpus= boot parameter.
Still, kernel code will at times interrupt all CPUs in the system via IPIs
for various needs. These IPIs are useful and cannot be avoided
altogether, but in certain cases it is possible to interrupt only specific
CPUs that have useful work to do and not the entire system.
This patch set, inspired by discussions with Peter Zijlstra and Frederic
Weisbecker when testing the nohz task patch set, is a first stab at trying
to explore doing this by locating the places where such global IPI calls
are being made and turning the global IPI into an IPI for a specific group
of CPUs. The purpose of the patch set is to get feedback if this is the
right way to go for dealing with this issue and indeed, if the issue is
even worth dealing with at all. Based on the feedback from this patch set
I plan to offer further patches that address similar issue in other code
paths.
This patch creates an on_each_cpu_mask() and on_each_cpu_cond()
infrastructure API (the former derived from existing arch specific
versions in Tile and Arm) and uses them to turn several global IPI
invocations into per-CPU-group invocations.
Core kernel:
on_each_cpu_mask() calls a function on processors specified by cpumask,
which may or may not include the local processor.
You must not call this function with disabled interrupts or from a
hardware interrupt handler or from a bottom half handler.
arch/arm:
Note that the generic version is a little different than the Arm one:
1. It has the mask as first parameter
2. It calls the function on the calling CPU with interrupts disabled,
but this should be OK since the function is called on the other CPUs
with interrupts disabled anyway.
arch/tile:
The API is the same as the tile-private one, but the generic version
also calls the function on the calling CPU with interrupts disabled in
the UP case. This is OK since the function is called on the other CPUs
with interrupts disabled anyway.
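A hedged usage sketch of the new core helper (the mask computation and the
callback are made up):

static void drain_local_pages_ipi(void *info)
{
        /* hypothetical per-cpu work */
}

static void drain_cpus(const struct cpumask *cpus_with_pages)
{
        /* func runs on every CPU in the mask; wait for completion */
        on_each_cpu_mask(cpus_with_pages, drain_local_pages_ipi, NULL, 1);
}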
Signed-off-by: Gilad Ben-Yossef <[email protected]>
Reviewed-by: Christoph Lameter <[email protected]>
Acked-by: Chris Metcalf <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Matt Mackall <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Avi Kivity <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Cc: Kosaki Motohiro <[email protected]>
Cc: Milton Miller <[email protected]>
Cc: Russell King <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is a problem where kdump (the 2nd kernel) sometimes hangs up due
to a pending IPI from the 1st kernel. A kernel panic occurs because the
IPI arrives before call_single_queue is initialized.
To fix the crash, rename init_call_single_data() to call_function_init()
and call it in start_kernel() so that call_single_queue can be
initialized before enabling interrupts.
The details of the crash are:
(1) 2nd kernel boots up
(2) A pending IPI from 1st kernel comes when irqs are first enabled
in start_kernel().
(3) Kernel tries to handle the interrupt, but call_single_queue
is not initialized yet at this point. As a result, in the
generic_smp_call_function_single_interrupt(), NULL pointer
dereference occurs when list_replace_init() tries to access
&q->list.next.
Therefore this patch changes the name of init_call_single_data()
to call_function_init() and calls it before local_irq_enable()
in start_kernel().
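A hedged sketch of the resulting ordering in start_kernel() (heavily
simplified; the surrounding boot code is elided):

asmlinkage void __init start_kernel(void)
{
        /* ... early setup ... */
        call_function_init();   /* was init_call_single_data(): set up the
                                 * per-cpu call_single_queue lists first   */
        local_irq_enable();     /* only now can a stale kdump IPI be taken */
        /* ... rest of boot ... */
}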
Signed-off-by: Takao Indoh <[email protected]>
Reviewed-by: WANG Cong <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: Vivek Goyal <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Milton Miller <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that powerpc has removed its use of MSG_ALL_BUT_SELF and MSG_ALL
all these MSG_ flags are unused.
Signed-off-by: Milton Miller <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
Commit 34db18a054c6 ("smp: move smp setup functions to kernel/smp.c")
causes this build error on s390 because of a missing init.h include:
CC arch/s390/kernel/asm-offsets.s
In file included from /home2/heicarst/linux-2.6/arch/s390/include/asm/spinlock.h:14:0,
from include/linux/spinlock.h:87,
from include/linux/seqlock.h:29,
from include/linux/time.h:8,
from include/linux/timex.h:56,
from include/linux/sched.h:57,
from arch/s390/kernel/asm-offsets.c:10:
include/linux/smp.h:117:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'setup_nr_cpu_ids'
include/linux/smp.h:118:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'smp_init'
Fix it by adding the include statement.
Signed-off-by: Heiko Carstens <[email protected]>
Acked-by: WANG Cong <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter
setup functions from init/main.c to kernel/smp.c, saving some #ifdef
CONFIG_SMP.
Signed-off-by: WANG Cong <[email protected]>
Cc: Rakib Mullick <[email protected]>
Cc: David Howells <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Akinobu Mita <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Typedef the pointer to the function to be called by smp_call_function() and
friends:
typedef void (*smp_call_func_t)(void *info);
as it is used in a fair number of places.
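For illustration, one of the prototypes it tidies up:

int smp_call_function_single(int cpu, smp_call_func_t func, void *info, int wait);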
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
|
|
smp: Fix documentation.
Fix documentation in include/linux/smp.h: smp_processor_id()
Signed-off-by: Rakib Mullick <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Andrew points out that acpi-cpufreq uses cpumask_any, when it really
would prefer to use the same CPU if possible (to avoid an IPI). In
general, this seems a good idea to offer.
[ tglx: Documented selection preference and Inlined the UP case to
avoid the copy of smp_call_function_single() and the extra
EXPORT ]
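A hedged usage sketch of the helper this adds (smp_call_function_any() in
mainline; the callback is hypothetical):

static void read_freq(void *info)
{
        /* read a per-cpu register on whichever CPU was picked */
}

static int query_freq(const struct cpumask *mask, unsigned int *val)
{
        /* prefers the calling CPU when it is in the mask, avoiding an IPI */
        return smp_call_function_any(mask, read_freq, val, 1);
}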
Signed-off-by: Rusty Russell <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Venkatesh Pallipadi <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Zhao Yakui <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: "Zhang, Yanmin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Everyone is now using smp_call_function_many().
Signed-off-by: Rusty Russell <[email protected]>
|
|
put_cpu_no_resched() is an optimization of put_cpu() which unfortunately
can cause high latencies.
The nfs iostats code uses put_cpu_no_resched() in a code sequence where a
reschedule request caused by an interrupt between the get_cpu() and the
put_cpu_no_resched() can delay the reschedule for at least HZ.
The other users of put_cpu_no_resched() optimize correctly in interrupt
code, but there is no real harm in using the put_cpu() function, which is
an alias for preempt_enable(). The extra check of the preempt count is
not as critical as the potential for missing a reschedule.
Debugged in the preempt-rt tree and verified in mainline.
Impact: remove a high latency source
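Illustrative sketch of the pattern in question (the per-cpu counter is
hypothetical):

static DEFINE_PER_CPU(unsigned long, nfs_iostat_ctr);

static void bump_iostat(void)
{
        int cpu = get_cpu();                    /* disables preemption */

        per_cpu(nfs_iostat_ctr, cpu)++;
        put_cpu();      /* preempt_enable(): a reschedule requested by an
                         * interrupt in between is honored right here */
}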
[[email protected]: build fix]
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Impact: cleanup, no code changed
Remove an ugly #ifdef CONFIG_SMP from panic(), by providing
an smp_send_stop() wrapper on UP too.
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
|
|
Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework
the code so that we can use CSD_FLAG_LOCK for both purposes.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Rusty Russell <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This function should be provided on UP too.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Impact: cleanup
disable_ioapic_setup() in init/main.c is ugly as the function is
x86-specific. The #ifdef inline prototype there is ugly too.
Replace it with a generic arch_disable_smp_support() function - which
has a weak alias for non-x86 architectures and for non-ioapic x86 builds.
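A hedged sketch of the generic weak default; x86 supplies the real override:

void __init __weak arch_disable_smp_support(void)
{
        /* nothing to do on architectures without IO-APIC style setup */
}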
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If you do
smp_call_function_single(expression-with-side-effects, ...)
then expression-with-side-effects never gets evaluated on UP builds.
As always, implementing it in C is the correct thing to do.
While we're there, uninline it for size and possible header dependency
reasons.
And create a new kernel/up.c, as a place in which to put
uniprocessor-specific code and storage. It should mirror kernel/smp.c.
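A hedged sketch of a plausible out-of-line UP version in kernel/up.c:

int smp_call_function_single(int cpu, void (*func)(void *info), void *info,
                             int wait)
{
        unsigned long flags;

        WARN_ON(cpu != 0);              /* only CPU 0 exists on UP */
        local_irq_save(flags);
        func(info);                     /* arguments are now always evaluated */
        local_irq_restore(flags);

        return 0;
}
EXPORT_SYMBOL(smp_call_function_single);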
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Impact: Implementation change to remove cpumask_t from stack.
Actually change smp_call_function_mask() to smp_call_function_many().
We avoid cpumasks on the stack in this version.
(S390 has its own version, but that's going away apparently).
We have to do some dancing to figure out if 0 or 1 other cpus are in
the mask supplied and the online mask without allocating a tmp
cpumask. It's still fairly cheap.
We allocate the cpumask at the end of the call_function_data
structure: if allocation fails we fall back to smp_call_function_single
rather than using the baroque quiescing code (which needs a cpumask on
stack).
(Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!)
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Mike Travis <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
|
|
Impact: add new sysfs files.
Add sysfs files "kernel_max" and "offline" to display the max CPU index
allowed (NR_CPUS-1), and the map of cpus that are offline.
Cpus can be offlined via HOTPLUG, disabled by the BIOS ACPI tables, or
if they exceed the number of cpus allowed by the NR_CPUS config option,
or the "maxcpus=NUM" kernel start parameter.
The "possible_cpus=NUM" parameter can also extend the number of possible
cpus allowed, in which case the cpus not present at startup will be
in the offline state. (These cpus can be HOTPLUGGED ON after system
startup [pending a follow-on patch to provide the capability via the
/sys/devices/sys/cpu/cpuN/online mechanism to bring them online.])
By design, the "offlined cpus > possible cpus" display will always
use the following formats:
* all possible cpus online: "x$" or "x-y$"
* some possible cpus offline: ".*,x$" or ".*,x-y$"
where:
x == number of possible cpus (nr_cpu_ids); and
y == number of cpus >= NR_CPUS or maxcpus (if y > x).
One use of this feature is for distros to select (or configure) the
appropriate kernel to install for the resident system.
Notes:
* cpus offlined <= possible cpus will be printed for all architectures.
* cpus offlined > possible cpus will only be printed for arches that
set 'total_cpus' [X86 only in this patch].
Based on tip/cpus4096 + .../rusty/linux-2.6-for-ingo.git/master +
x86-only-patches sent 12/15.
Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
|
|
Otherwise those using it in transition patches (eg. kvm) can't compile
with CONFIG_SMP=n:
arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function 'make_all_cpus_request':
arch/x86/kvm/../../../virt/kvm/kvm_main.c:380: error: implicit declaration of function 'smp_call_function_many'
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Impact: introduce new APIs
We want to deprecate cpumasks on the stack, as we are headed for
ginormous numbers of CPUs. Eventually, we want to head towards an
undefined 'struct cpumask' so they can never be declared on stack.
1) New cpumask functions which take pointers instead of copies.
(cpus_* -> cpumask_*)
2) Several new helpers to reduce requirements for temporary cpumasks
(cpumask_first_and, cpumask_next_and, cpumask_any_and)
3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
(cpumask_var_t, alloc_cpumask_var and free_cpumask_var)
4) 'struct cpumask' for explicitness and to mark new-style code.
5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
not NR_CPUS for time efficiency and for smaller dynamic allocations
in future.
6) cpumask_copy() so we can allocate less than a full cpumask eventually
(for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
definition eventually.
7) work_on_cpu() helper for doing task on a CPU, rather than saving old
cpumask for current thread and manipulating it.
8) smp_call_function_many() which is smp_call_function_mask() except
taking a cpumask pointer.
Note that this patch simply introduces the new functions and leaves
the obsolescent ones in place. This is to simplify the transition
patches.
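A hedged usage sketch tying several of the new helpers together (the callback
is hypothetical):

static void do_remote_work(void *info)
{
        /* hypothetical per-cpu work */
}

static int kick_other_cpus(void)
{
        cpumask_var_t mask;
        int cpu;

        if (!alloc_cpumask_var(&mask, GFP_KERNEL))      /* off-stack if needed */
                return -ENOMEM;

        cpu = get_cpu();                        /* stay on this CPU */
        cpumask_copy(mask, cpu_online_mask);
        cpumask_clear_cpu(cpu, mask);
        smp_call_function_many(mask, do_remote_work, NULL, true);
        put_cpu();

        free_cpumask_var(mask);
        return 0;
}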
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This is basically a genericization of Jens Axboe's block layer
remote softirq changes.
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|