| Age | Commit message (Collapse) | Author | Files | Lines |
|
This function has a __weak definition and an override that is only used on
freescale powerpc chips. The powerpc definition however does not see the
declaration that is in a .c file:
arch/powerpc/kernel/dma-mask.c:7:6: error: no previous prototype for 'arch_dma_set_mask' [-Werror=missing-prototypes]
Move it into the linux/dma-map-ops.h header where the other arch_dma_* functions
are declared.
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Drivers have no business looking at dma-mapping or swiotlb internals.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
|
|
Commit e1572f1d08be ("cpu/SMT: create and export cpu_smt_possible()")
introduces cpu_smt_possible() to represent if SMT is theoretically
possible. It returns true when SMT is supported and not forcefully
disabled ('nosmt=force'). But the comment of it says "Returns true if
SMT is not supported of forcefully (irreversibly) disabled", which is
wrong. Fix that comment accordingly.
Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
commit 3795de236d67 ("genirq: Distangle kernel/irq/handle.c")
left behind this.
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
There is a possibility of deadlock if synchronize_hardirq() is called
when the nested threaded interrupt is active. The following scenario
was observed on a uniprocessor PREEMPT_NONE system:
Thread 1 Thread 2
handle_nested_thread()
Set INPROGRESS
Call ->thread_fn()
thread_fn goes to sleep
free_irq()
__synchronize_hardirq()
Busy-loop forever waiting for INPROGRESS
to be cleared
The INPROGRESS flag is only supposed to be used for hard interrupt
handlers. Remove the incorrect usage in the nested threaded interrupt
case and instead re-use the threads_active / wait_for_threads mechanism
to wait for nested threaded interrupts to complete.
Signed-off-by: Vincent Whitchurch <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Revert commit 7aa55f2a5902 ("sched/fair: Move unused stub functions to
header"), for while it has the right Changelog, the actual patch
content a revert of the previous 4 patches:
f7df852ad6db ("sched: Make task_vruntime_update() prototype visible")
c0bdfd72fbfb ("sched/fair: Hide unused init_cfs_bandwidth() stub")
378be384e01f ("sched: Add schedule_user() declaration")
d55ebae3f312 ("sched: Hide unused sched_update_scaling()")
So in effect this is a revert of a revert and re-applies those
patches.
Fixes: 7aa55f2a5902 ("sched/fair: Move unused stub functions to header")
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
|
|
We need the serial/tty fixes in here as well for testing and future
development.
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The creation of the trace event directory requires that a TRACE_SYSTEM is
defined that the trace event directory is added within the system it was
defined in.
The code handled the case where a TRACE_SYSTEM was not added, and would
then add the event at the events directory. But nothing should be doing
this. This code also prevents the implementation of creating dynamic
dentrys for the eventfs system.
As this path has never been hit on correct code, remove it. If it does get
hit, issues a WARN_ON_ONCE() and return ENODEV.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Ajay Kaher <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Currently we can resize trace ringbuffer by writing a value into file
'buffer_size_kb', then by reading the file, we get the value that is
usually what we wrote. However, this value may be not actual size of
trace ring buffer because of the round up when doing resize in kernel,
and the actual size would be more useful.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: <[email protected]>
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
As the trace iterator is created and used by various interfaces, the clean
up of it needs to be consistent. Create a free_trace_iter_content() helper
function that frees the content of the iterator and use that to clean it
up in all places that it is used.
Link: https://lkml.kernel.org/r/[email protected]
Cc: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The iterator allocated a descriptor to copy the current_trace. This was done
with the assumption that the function pointers might change. But this was a
false assuption, as it does not change. There's no reason to make a copy of the
current_trace and just use the pointer it points to. This removes needing to
manage freeing the descriptor. Worse yet, there's locations that the iterator
is used but does make a copy and just uses the pointer. This could cause the
actual pointer to the trace descriptor to be freed and not the allocated copy.
This is more of a clean up than a fix.
Link: https://lkml.kernel.org/r/[email protected]
Cc: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants")
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
ring_buffer.c. x86 CMPXCHG instruction returns success in ZF flag,
so this change saves a compare after cmpxchg (and related move
instruction in front of cmpxchg).
No functional change intended.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Steven Rostedt <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
For backward compatibility, older tooling expects to see the kernel_stack
event with a "caller" field that is a fixed size array of 8 addresses. The
code now supports more than 8 with an added "size" field that states the
real number of entries. But the "caller" field still just looks like a
fixed size to user space.
Since the tracing macros that create the user space format files also
creates the structures that those files represent, the kernel_stack event
structure had its "caller" field a fixed size of 8, but in reality, when
it is allocated on the ring buffer, it can hold more if the stack trace is
bigger that 8 functions. The copying of these entries was simply done with
a memcpy():
size = nr_entries * sizeof(unsigned long);
memcpy(entry->caller, fstack->calls, size);
The FORTIFY_SOURCE logic noticed at runtime that when the nr_entries was
larger than 8, that the memcpy() was writing more than what the structure
stated it can hold and it complained about it. This is because the
FORTIFY_SOURCE code is unaware that the amount allocated is actually
enough to hold the size. It does not expect that a fixed size field will
hold more than the fixed size.
This was originally solved by hiding the caller assignment with some
pointer arithmetic.
ptr = ring_buffer_data();
entry = ptr;
ptr += offsetof(typeof(*entry), caller);
memcpy(ptr, fstack->calls, size);
But it is considered bad form to hide from kernel hardening. Instead, make
it work nicely with FORTIFY_SOURCE by adding a new __stack_array() macro
that is specific for this one special use case. The macro will take 4
arguments: type, item, len, field (whereas the __array() macro takes just
the first three). This macro will act just like the __array() macro when
creating the code to deal with the format file that is exposed to user
space. But for the kernel, it will turn the caller field into:
type item[] __counted_by(field);
or for this instance:
unsigned long caller[] __counted_by(size);
Now the kernel code can expose the assignment of the caller to the
FORTIFY_SOURCE and everyone is happy!
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Sven Schnelle <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probe fixes from Masami Hiramatsu:
- probe-events: add NULL check for some BTF API calls which can return
error code and NULL.
- ftrace selftests: check fprobe and kprobe event correctly. This fixes
a miss condition of the test command.
- kprobes: do not allow probing functions that start with "__cfi_" or
"__pfx_" since those are auto generated for kernel CFI and not
executed.
* tag 'probes-fixes-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
kprobes: Prohibit probing on CFI preamble symbol
selftests/ftrace: Fix to check fprobe event eneblement
tracing/probes: Fix to add NULL check for BTF APIs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
- Fix a rtmutex race condition resulting from sharing of the sort key
between the lock waiters and the PI chain tree (->pi_waiters) of a
task by giving each tree their own sort key
* tag 'locking_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Fix task->pi_waiters integrity
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix to /sys/kernel/tracing/per_cpu/cpu*/stats read and entries.
If a resize shrinks the buffer it clears the read count to notify
readers that they need to reset. But the read count is also used for
accounting and this causes the numbers to be off. Instead, create a
separate variable to use to notify readers to reset.
- Fix the ref counts of the "soft disable" mode. The wrong value was
used for testing if soft disable mode should be enabled or disable,
but instead, just change the logic to do the enable and disable in
place when the SOFT_MODE is set or cleared.
- Several kernel-doc fixes
- Removal of unused external declarations
* tag 'trace-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix warning in trace_buffered_event_disable()
ftrace: Remove unused extern declarations
tracing: Fix kernel-doc warnings in trace_seq.c
tracing: Fix kernel-doc warnings in trace_events_trigger.c
tracing/synthetic: Fix kernel-doc warnings in trace_events_synth.c
ring-buffer: Fix kernel-doc warnings in ring_buffer.c
ring-buffer: Fix wrong stat of cpu_buffer->read
|
|
Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those
are used for CFI and not executed. Probing it will break the CFI.
Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
|
|
Warning happened in trace_buffered_event_disable() at
WARN_ON_ONCE(!trace_buffered_event_ref)
Call Trace:
? __warn+0xa5/0x1b0
? trace_buffered_event_disable+0x189/0x1b0
__ftrace_event_enable_disable+0x19e/0x3e0
free_probe_data+0x3b/0xa0
unregister_ftrace_function_probe_func+0x6b8/0x800
event_enable_func+0x2f0/0x3d0
ftrace_process_regex.isra.0+0x12d/0x1b0
ftrace_filter_write+0xe6/0x140
vfs_write+0x1c9/0x6f0
[...]
The cause of the warning is in __ftrace_event_enable_disable(),
trace_buffered_event_enable() was called once while
trace_buffered_event_disable() was called twice.
Reproduction script show as below, for analysis, see the comments:
```
#!/bin/bash
cd /sys/kernel/tracing/
# 1. Register a 'disable_event' command, then:
# 1) SOFT_DISABLED_BIT was set;
# 2) trace_buffered_event_enable() was called first time;
echo 'cmdline_proc_show:disable_event:initcall:initcall_finish' > \
set_ftrace_filter
# 2. Enable the event registered, then:
# 1) SOFT_DISABLED_BIT was cleared;
# 2) trace_buffered_event_disable() was called first time;
echo 1 > events/initcall/initcall_finish/enable
# 3. Try to call into cmdline_proc_show(), then SOFT_DISABLED_BIT was
# set again!!!
cat /proc/cmdline
# 4. Unregister the 'disable_event' command, then:
# 1) SOFT_DISABLED_BIT was cleared again;
# 2) trace_buffered_event_disable() was called second time!!!
echo '!cmdline_proc_show:disable_event:initcall:initcall_finish' > \
set_ftrace_filter
```
To fix it, IIUC, we can change to call trace_buffered_event_enable() at
fist time soft-mode enabled, and call trace_buffered_event_disable() at
last time soft-mode disabled.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: <[email protected]>
Fixes: 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events")
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Fix kernel-doc warning:
kernel/trace/trace_seq.c:142: warning: Function parameter or member
'args' not described in 'trace_seq_vprintf'
Link: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Fix kernel-doc warnings:
kernel/trace/trace_events_trigger.c:59: warning: Function parameter
or member 'buffer' not described in 'event_triggers_call'
kernel/trace/trace_events_trigger.c:59: warning: Function parameter
or member 'event' not described in 'event_triggers_call'
Link: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Fix kernel-doc warning:
kernel/trace/trace_events_synth.c:1257: warning: Function parameter
or member 'mod' not described in 'synth_event_gen_cmd_array_start'
Link: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Fix kernel-doc warnings:
kernel/trace/ring_buffer.c:954: warning: Function parameter or
member 'cpu' not described in 'ring_buffer_wake_waiters'
kernel/trace/ring_buffer.c:3383: warning: Excess function parameter
'event' description in 'ring_buffer_unlock_commit'
kernel/trace/ring_buffer.c:5359: warning: Excess function parameter
'cpu' description in 'ring_buffer_reset_online_cpus'
Link: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
When pages are removed in rb_remove_pages(), 'cpu_buffer->read' is set
to 0 in order to make sure any read iterators reset themselves. However,
this will mess 'entries' stating, see following steps:
# cd /sys/kernel/tracing/
# 1. Enlarge ring buffer prepare for later reducing:
# echo 20 > per_cpu/cpu0/buffer_size_kb
# 2. Write a log into ring buffer of cpu0:
# taskset -c 0 echo "hello1" > trace_marker
# 3. Read the log:
# cat per_cpu/cpu0/trace_pipe
<...>-332 [000] ..... 62.406844: tracing_mark_write: hello1
# 4. Stop reading and see the stats, now 0 entries, and 1 event readed:
# cat per_cpu/cpu0/stats
entries: 0
[...]
read events: 1
# 5. Reduce the ring buffer
# echo 7 > per_cpu/cpu0/buffer_size_kb
# 6. Now entries became unexpected 1 because actually no entries!!!
# cat per_cpu/cpu0/stats
entries: 1
[...]
read events: 0
To fix it, introduce 'page_removed' field to count total removed pages
since last reset, then use it to let read iterators reset themselves
instead of changing the 'read' pointer.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: <[email protected]>
Cc: <[email protected]>
Fixes: 83f40318dab0 ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
In internal testing of test_maps, we sometimes observed failures like:
test_maps: test_maps.c:173: void test_hashmap_percpu(unsigned int, void *):
Assertion `bpf_map_update_elem(fd, &key, value, BPF_ANY) == 0' failed.
where the errno is ENOMEM. After some troubleshooting and enabling
the warnings, we saw:
[ 91.304708] percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left
[ 91.304716] CPU: 51 PID: 24145 Comm: test_maps Kdump: loaded Tainted: G N 6.1.38-smp-DEV #7
[ 91.304719] Hardware name: Google Astoria/astoria, BIOS 0.20230627.0-0 06/27/2023
[ 91.304721] Call Trace:
[ 91.304724] <TASK>
[ 91.304730] [<ffffffffa7ef83b9>] dump_stack_lvl+0x59/0x88
[ 91.304737] [<ffffffffa7ef83f8>] dump_stack+0x10/0x18
[ 91.304738] [<ffffffffa75caa0c>] pcpu_alloc+0x6fc/0x870
[ 91.304741] [<ffffffffa75ca302>] __alloc_percpu_gfp+0x12/0x20
[ 91.304743] [<ffffffffa756785e>] alloc_bulk+0xde/0x1e0
[ 91.304746] [<ffffffffa7566c02>] bpf_mem_alloc_init+0xd2/0x2f0
[ 91.304747] [<ffffffffa7547c69>] htab_map_alloc+0x479/0x650
[ 91.304750] [<ffffffffa751d6e0>] map_create+0x140/0x2e0
[ 91.304752] [<ffffffffa751d413>] __sys_bpf+0x5a3/0x6c0
[ 91.304753] [<ffffffffa751c3ec>] __x64_sys_bpf+0x1c/0x30
[ 91.304754] [<ffffffffa7ef847a>] do_syscall_64+0x5a/0x80
[ 91.304756] [<ffffffffa800009b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
This makes sense, because in atomic context, percpu allocation would
not create new chunks; it would only create in non-atomic contexts.
And if during prefill all precpu chunks are full, -ENOMEM would
happen immediately upon next unit_alloc.
Prefill phase does not actually run in atomic context, so we can
use this fact to allocate non-atomically with GFP_KERNEL instead
of GFP_NOWAIT. This avoids the immediate -ENOMEM.
GFP_NOWAIT has to be used in unit_alloc when bpf program runs
in atomic context. Even if bpf program runs in non-atomic context,
in most cases, rcu read lock is enabled for the program so
GFP_NOWAIT is still needed. This is often also the case for
BPF_MAP_UPDATE_ELEM syscalls.
Signed-off-by: YiFei Zhu <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The kernel test robot reported compilation warnings when -Wparentheses is
added to KBUILD_CFLAGS with gcc compiler. The following is the error message:
.../bpf-next/kernel/bpf/verifier.c: In function ‘coerce_reg_to_size_sx’:
.../bpf-next/kernel/bpf/verifier.c:5901:14:
error: suggest parentheses around comparison in operand of ‘==’ [-Werror=parentheses]
if (s64_max >= 0 == s64_min >= 0) {
~~~~~~~~^~~~
.../bpf-next/kernel/bpf/verifier.c: In function ‘coerce_subreg_to_size_sx’:
.../bpf-next/kernel/bpf/verifier.c:5965:14:
error: suggest parentheses around comparison in operand of ‘==’ [-Werror=parentheses]
if (s32_min >= 0 == s32_max >= 0) {
~~~~~~~~^~~~
To fix the issue, add proper parentheses for the above '>=' condition
to silence the warning/error.
I tried a few clang compilers like clang16 and clang18 and they do not emit
such warnings with -Wparentheses.
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Yonghong Song <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
If the dev is known (e.g. a devm-based sys_off_handler is used), it can
be passed to the handler's callback to have it available there.
Otherwise, cb_data might be set to the dev in most of the cases.
Reviewed-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Benjamin Bara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lee Jones <[email protected]>
|
|
As the emergency restart does not call kernel_restart_prepare(), the
system_state stays in SYSTEM_RUNNING.
Since bae1d3a05a8b, this hinders i2c_in_atomic_xfer_mode() from becoming
active, and therefore might lead to avoidable warnings in the restart
handlers, e.g.:
[ 12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0
[ 12.676926] Voluntary context switch within RCU read-side critical section!
...
[ 12.742376] schedule_timeout from wait_for_completion_timeout+0x90/0x114
[ 12.749179] wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70
...
[ 12.994527] atomic_notifier_call_chain from machine_restart+0x34/0x58
[ 13.001050] machine_restart from panic+0x2a8/0x32c
Avoid these by setting the correct system_state.
Fixes: bae1d3a05a8b ("i2c: core: remove use of in_atomic()")
Cc: [email protected] # v5.2+
Reviewed-by: Dmitry Osipenko <[email protected]>
Tested-by: Nishanth Menon <[email protected]>
Signed-off-by: Benjamin Bara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lee Jones <[email protected]>
|
|
Add support to the /sys/devices/system/cpu/smt/control interface for
enabling a specified number of SMT threads per core, including partial
SMT states where not all threads are brought online.
The current interface accepts "on" and "off", to enable either 1 or all
SMT threads per core.
This commit allows writing an integer, between 1 and the number of SMT
threads supported by the machine. Writing 1 is a synonym for "off", 2 or
more enables SMT with the specified number of threads.
When reading the file, if all threads are online "on" is returned, to
avoid changing behaviour for existing users. If some other number of
threads is online then the integer value is returned.
Architectures like x86 only supporting 1 thread or all threads, should not
define CONFIG_SMT_NUM_THREADS_DYNAMIC. Architecture supporting partial SMT
states, like PowerPC, should define it.
[ ldufour: Slightly reword the commit's description ]
[ ldufour: Remove switch() in __store_smt_control() ]
[ ldufour: Rix build issue in control_show() ]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Some architectures allows partial SMT states, i.e. when not all SMT threads
are brought online.
To support that, add an architecture helper which checks whether a given
CPU is allowed to be brought online depending on how many SMT threads are
currently enabled. Since this is only applicable to architecture supporting
partial SMT, only these architectures should select the new configuration
variable CONFIG_SMT_NUM_THREADS_DYNAMIC. For the other architectures, not
supporting the partial SMT states, there is no need to define
topology_cpu_smt_allowed(), the generic code assumed that all the threads
are allowed or only the primary ones.
Call the helper from cpu_smt_enable(), and cpu_smt_allowed() when SMT is
enabled, to check if the particular thread should be onlined. Notably,
also call it from cpu_smt_disable() if CPU_SMT_ENABLED, to allow
offlining some threads to move from a higher to lower number of threads
online.
[ ldufour: Slightly reword the commit's description ]
[ ldufour: Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC ]
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Since the maximum number of threads is now passed to cpu_smt_set_num_threads(),
checking that value is enough to know whether SMT is supported.
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Some architectures allow partial SMT states at boot time, ie. when not all
SMT threads are brought online.
To support that the SMT code needs to know the maximum number of SMT
threads, and also the currently configured number.
The architecture code knows the max number of threads, so have the
architecture code pass that value to cpu_smt_set_num_threads(). Note that
although topology_max_smt_threads() exists, it is not configured early
enough to be used here. As architecture, like PowerPC, allows the threads
number to be set through the kernel command line, also pass that value.
[ ldufour: Slightly reword the commit message ]
[ ldufour: Rename cpu_smt_check_topology and add a num_threads argument ]
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Move the simple exit cases, i.e. those which don't depend on the value
written, earlier in the function. That makes it clearer that regardless of
the input those states cannot be transitioned out of.
That does have a user-visible effect, in that the error returned will
now always be EPERM/ENODEV for those states, regardless of the value
written. Previously writing an invalid value would return EINVAL even
when in those states.
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In order to export the cpuhp_smt_control enum as part of the interface
between generic and architecture code, the architecture code needs to
include asm/topology.h.
But that leads to circular header dependencies. So split the enum and
related declarations into a separate header.
[ ldufour: Reworded the commit's description ]
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The commit 18415f33e2ac ("cpu/hotplug: Allow "parallel" bringup up to
CPUHP_BP_KICK_AP_STATE") introduce a dependancy against a global variable
cpu_primary_thread_mask exported by the X86 code. This variable is only
used when CONFIG_HOTPLUG_PARALLEL is set.
Since cpuhp_get_primary_thread_mask() and cpuhp_smt_aware() are only used
when CONFIG_HOTPLUG_PARALLEL is set, don't define them when it is not set.
No functional change.
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add asm support for new instructions so kernel verifier and bpftool
xlated insn dumps can have proper asm syntax for new instructions.
Acked-by: Eduard Zingerman <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add interpreter/jit/verifier support for 32bit offset jmp instruction.
If a conditional jmp instruction needs more than 16bit offset,
it can be simulated with a conditional jmp + a 32bit jmp insn.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Handle new insns properly in bpf_jit_blind_insn() function.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add interpreter/jit support for new signed div/mod insns.
The new signed div/mod instructions are encoded with
unsigned div/mod instructions plus insn->off == 1.
Also add basic verifier support to ensure new insns get
accepted.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The existing 'be' and 'le' insns will do conditional bswap
depends on host endianness. This patch implements
unconditional bswap insns.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Currently, if user accesses a ctx member with signed types,
the compiler will generate an unsigned load followed by
necessary left and right shifts.
With the introduction of sign-extension load, compiler may
just emit a ldsx insn instead. Let us do a final movsx sign
extension to the final unsigned ctx load result to
satisfy original sign extension requirement.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add interpreter/jit support for new sign-extension mov insns.
The original 'MOV' insn is extended to support reg-to-reg
signed version for both ALU and ALU64 operations. For ALU mode,
the insn->off value of 8 or 16 indicates sign-extension
from 8- or 16-bit value to 32-bit value. For ALU64 mode,
the insn->off value of 8/16/32 indicates sign-extension
from 8-, 16- or 32-bit value to 64-bit value.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add interpreter/jit support for new sign-extension load insns
which adds a new mode (BPF_MEMSX).
Also add verifier support to recognize these insns and to
do proper verification with new insns. In verifier, besides
to deduce proper bounds for the dst_reg, probed memory access
is also properly handled.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
|
|
Mark the time KUnit test, time64_to_tm_test_date_range, as slow using test
attributes.
This test ran relatively much slower than most other KUnit tests.
By marking this test as slow, the test can now be filtered using the KUnit
test attribute filtering feature. Example: --filter "speed>slow". This will
run only the tests that have speeds faster than slow. The slow attribute
will also be outputted in KTAP.
Reviewed-by: David Gow <[email protected]>
Signed-off-by: Rae Moar <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
Commit eda0047296a1 ("mm: make the page fault mmap locking killable")
intentionally made it much easier to trigger the "page fault fails
because a fatal signal is pending" situation, by having the mmap locking
fail early in that case.
We have long aborted page faults in other fatal cases when the actual IO
for a page is interrupted by SIGKILL - which is particularly useful for
the traditional case of NFS hanging due to network issues, but local
filesystems could cause it too if you happened to get the SIGKILL while
waiting for a page to be faulted in (eg lock_folio_maybe_drop_mmap()).
So aborting the page fault wasn't a new condition - but it now triggers
earlier, before we even get to 'handle_mm_fault()'. And as a result the
error doesn't go through our 'fault_signal_pending()' logic, and doesn't
get filtered away there.
Normally you'd never even notice, because if a fatal signal is pending,
the new SIGSEGV we send ends up being ignored anyway.
But it turns out that there is one very noticeable exception: if you
enable 'show_unhandled_signals', the aborted page fault will be logged
in the kernel messages, and you'll get a scary line looking something
like this in your logs:
pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)
which is rather misleading. It's not really a segfault at all, it's
just "the thread was killed before the page fault completed, so we
aborted the page fault".
Fix this by just making it clear that a pending fatal signal means that
any new signal coming in after that is implicitly handled. This will
avoid the misleading logging, since now the signal isn't 'unhandled' any
more.
Reported-and-tested-by: Fiona Ebner <[email protected]>
Tested-by: Thomas Lamprecht <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Acked-by: Oleg Nesterov <[email protected]>
Fixes: eda0047296a1 ("mm: make the page fault mmap locking killable")
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The flags of the child of a given scheduling domain are used to initialize
the flags of its scheduling groups. When the child of a scheduling domain
is degenerated, the flags of its local scheduling group need to be updated
to align with the flags of its new child domain.
The flag SD_SHARE_CPUCAPACITY was aligned in
Commit bf2dc42d6beb ("sched/topology: Propagate SMT flags when removing degenerate domain").
Further generalize this alignment so other flags can be used later, such as
in cluster-based task wakeup. [1]
Reported-by: Yicong Yang <[email protected]>
Suggested-by: Ricardo Neri <[email protected]>
Signed-off-by: Chen Yu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Tim Chen <[email protected]>
Reviewed-by: Yicong Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
There is no need to use runnable_avg when estimating util_est and that
even generates wrong behavior because one includes blocked tasks whereas
the other one doesn't. This can lead to accounting twice the waking task p,
once with the blocked runnable_avg and another one when adding its
util_est.
cpu's runnable_avg is already used when computing util_avg which is then
compared with util_est.
In some situation, feec will not select prev_cpu but another one on the
same performance domain because of higher max_util
Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Tested-by: Dietmar Eggemann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Since find_btf_func_param() abd btf_type_by_id() can return NULL,
the caller must check the return value correctly.
Link: https://lore.kernel.org/all/169024903951.395371.11361556840733470934.stgit@devnote2/
Fixes: b576e09701c7 ("tracing/probes: Support function parameters if BTF is available")
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
|
|
Splitting these out into separate helper functions means that we
actually pass an uninitialized variable into another function call
if dec_active() happens to not be inlined, and CONFIG_PREEMPT_RT
is disabled:
kernel/bpf/memalloc.c: In function 'add_obj_to_free_list':
kernel/bpf/memalloc.c:200:9: error: 'flags' is used uninitialized [-Werror=uninitialized]
200 | dec_active(c, flags);
Avoid this by passing the flags by reference, so they either get
initialized and dereferenced through a pointer, or the pointer never
gets accessed at all.
Fixes: 18e027b1c7c6d ("bpf: Factor out inc/dec of active flag into helpers.")
Suggested-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|