|
When the boot CPU initializes the per-CPU data on behalf of all possible
CPUs, a sanity check is performed on each of them to make sure none is
initialized in an extended quiescent state.
This check involves a full memory barrier which is useless at this early
boot stage.
Do a plain access instead.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>
|
|
When the grace period kthread checks the extended quiescent state
counter of a CPU, full ordering is necessary to ensure that either:
* If the GP kthread observes the remote target in an extended quiescent
state, then that target must observe all accesses prior to the current
grace period, including the current grace period sequence number, once
it exits that extended quiescent state.
or:
* If the GP kthread observes the remote target NOT in an extended
quiescent state, then the target subsequently entering an extended
quiescent state must observe all accesses prior to the current
grace period, including the current grace period sequence number, once
it enters that extended quiescent state.
This ordering is enforced through a full memory barrier placed right
before taking the first EQS snapshot. However this is superfluous
because the snapshot is taken while holding the target's rnp lock which
provides the necessary ordering through its chain of
smp_mb__after_unlock_lock().
Remove the needless explicit barrier before the snapshot and put a
comment about the implicit barrier newly relied upon here.
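As a rough sketch of the resulting pattern (the identifiers and helper below
are illustrative only, not the actual RCU code), the first snapshot now relies
solely on the ordering provided by the rnp lock chain:
  #include <linux/atomic.h>
  #include <linux/spinlock.h>

  /* Illustrative only: the rnp lock plus smp_mb__after_unlock_lock() already
   * order the snapshot against the target's prior accesses, so no explicit
   * smp_mb() is needed before the read.
   */
  static int first_eqs_snapshot(raw_spinlock_t *rnp_lock, atomic_t *eqs_ctr)
  {
          int snap;

          raw_spin_lock_irq(rnp_lock);
          smp_mb__after_unlock_lock();    /* Full ordering via the lock chain. */
          snap = atomic_read_acquire(eqs_ctr);   /* No smp_mb() needed here. */
          raw_spin_unlock_irq(rnp_lock);
          return snap;
  }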
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
When the grace period kthread checks the extended quiescent state
counter of a CPU, full ordering is necessary to ensure that either:
* If the GP kthread observes the remote target in an extended quiescent
state, then that target must observe all accesses prior to the current
grace period, including the current grace period sequence number, once
it exits that extended quiescent state.
or:
* If the GP kthread observes the remote target NOT in an extended
quiescent state, then the target subsequently entering an extended
quiescent state must observe all accesses prior to the current
grace period, including the current grace period sequence number, once
it enters that extended quiescent state.
This ordering is enforced through a full memory barrier placed right
before taking the first EQS snapshot. However this is superfluous
because the snapshot is taken while holding the target's rnp lock which
provides the necessary ordering through its chain of
smp_mb__after_unlock_lock().
Remove the needless explicit barrier before the snapshot and put a
comment about the implicit barrier newly relied upon here.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
When the grace period kthread checks the extended quiescent state
counter of a CPU, full ordering is necessary to ensure that either:
* If the GP kthread observes the remote target in an extended quiescent
state, then that target must observe all accesses prior to the current
grace period, including the current grace period sequence number, once
it exits that extended quiescent state. Also, the GP kthread must
observe all accesses performed by the target prior to its entering
EQS.
or:
* If the GP kthread observes the remote target NOT in an extended
quiescent state, then the target subsequently entering an extended
quiescent state must observe all accesses prior to the current
grace period, including the current grace period sequence number, once
it enters that extended quiescent state. Also, the GP kthread later
observing that EQS must observe all accesses performed by the
target prior to its entering EQS.
This ordering is explicitly performed both on the first EQS snapshot
and on the second one as well through the combination of a preceding
full barrier followed by an acquire read. However the second snapshot's
full memory barrier is redundant and not needed to enforce the above
guarantees:
GP kthread                                  Remote target
----                                        -----
// Access prior GP
WRITE_ONCE(A, 1)
// first snapshot
smp_mb()
x = smp_load_acquire(EQS)
                                            // Access prior GP
                                            WRITE_ONCE(B, 1)
                                            // EQS enter
                                            // implied full barrier by atomic_add_return()
                                            atomic_add_return(RCU_DYNTICKS_IDX, EQS)
                                            // implied full barrier by atomic_add_return()
                                            READ_ONCE(A)
// second snapshot
y = smp_load_acquire(EQS)
z = READ_ONCE(B)
If the GP kthread above fails to observe the remote target in EQS
(x not in EQS), the remote target will observe A == 1 after it
subsequently enters EQS. Then the second snapshot taken by the GP
kthread only needs to be an acquire read in order to observe z == 1.
Therefore remove the needless full memory barrier on second snapshot.
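A minimal sketch of the resulting second snapshot (hypothetical helper, not
the actual RCU code), pairing its acquire read with the full barrier implied
by the target's atomic_add_return():
  #include <linux/atomic.h>

  /* Illustrative only: no smp_mb() before the read; acquire semantics are
   * enough to observe B == 1 once the snapshot sees the target in EQS.
   */
  static int second_eqs_snapshot(atomic_t *eqs_ctr)
  {
          return atomic_read_acquire(eqs_ctr);
  }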
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
e3f02f32a050 ("ionic: fix kernel panic due to multi-buffer handling")
d9c04209990b ("ionic: Mark error paths in the data path as unlikely")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Without BTF, attempting to load any sched_ext scheduler will result in
an error like the following:
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
This makes sched_ext pretty much unusable, so explicitly depend on
CONFIG_DEBUG_INFO_BTF to prevent these issues.
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Correct eight to weight in the description of the .set_weight()
operation in sched_ext_ops.
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"These are some bugfixes for system call ABI issues I found while
working on a cleanup series. None of these are urgent since these bugs
have gone unnoticed for many years, but I think we probably want to
backport them all to stable kernels, so it makes sense to have the
fixes included as early as possible.
One more fix addresses a compile-time warning in kallsyms that was
uncovered by a patch I did to enable additional warnings in 6.10. I
had mistakenly thought that this fix was already merged through the
module tree, but as Geert pointed out it was still missing"
* tag 'asm-generic-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
kallsyms: rework symbol lookup return codes
linux/syscalls.h: add missing __user annotations
syscalls: mmap(): use unsigned offset type consistently
s390: remove native mmap2() syscall
hexagon: fix fadvise64_64 calling conventions
csky, hexagon: fix broken sys_sync_file_range
sh: rework sync_file_range ABI
powerpc: restore some missing spu syscalls
parisc: use generic sys_fanotify_mark implementation
parisc: use correct compat recv/recvfrom syscalls
sparc: fix compat recv/recvfrom syscalls
sparc: fix old compat_sys_select()
syscalls: fix compat_sys_io_pgetevents_time64 usage
ftruncate: pass a signed offset
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, bpf and netfilter.
There are a bunch of regressions addressed here, but hopefully nothing
spectacular. We are still waiting for the driver fix from Intel, mentioned
by Jakub in the previous networking pull.
Current release - regressions:
- core: add softirq safety to netdev_rename_lock
- tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed
TFO
- batman-adv: fix RCU race at module unload time
Previous releases - regressions:
- openvswitch: get related ct labels from its master if it is not
confirmed
- eth: bonding: fix incorrect software timestamping report
- eth: mlxsw: fix memory corruptions on spectrum-4 systems
- eth: ionic: use dev_consume_skb_any outside of napi
Previous releases - always broken:
- netfilter: fully validate NFT_DATA_VALUE on store to data registers
- unix: several fixes for OoB data
- tcp: fix race for duplicate reqsk on identical SYN
- bpf:
- fix may_goto with negative offset
- fix the corner case with may_goto and jump to the 1st insn
- fix overrunning reservations in ringbuf
- can:
- j1939: recover socket queue on CAN bus error during BAM
transmission
- mcp251xfd: fix infinite loop when xmit fails
- dsa: microchip: monitor potential faults in half-duplex mode
- eth: vxlan: pull inner IP header in vxlan_xmit_one()
- eth: ionic: fix kernel panic due to multi-buffer handling
Misc:
- selftest: unix tests refactor and a lot of new cases added"
* tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
net: mana: Fix possible double free in error handling path
selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c.
af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head.
selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c
selftest: af_unix: Check SIGURG after every send() in msg_oob.c
selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c
af_unix: Don't stop recv() at consumed ex-OOB skb.
selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c.
af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.
af_unix: Stop recv(MSG_PEEK) at consumed OOB skb.
selftest: af_unix: Add msg_oob.c.
selftest: af_unix: Remove test_unix_oob.c.
tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset()
netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
net: usb: qmi_wwan: add Telit FN912 compositions
tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
ionic: use dev_consume_skb_any outside of napi
net: dsa: microchip: fix wrong register write when masking interrupt
Fix race for duplicate reqsk on identical SYN
ibmvnic: Add tx check to prevent skb leak
...
|
|
Building with W=1 in some configurations produces a false positive
warning for kallsyms:
kernel/kallsyms.c: In function '__sprint_symbol.isra':
kernel/kallsyms.c:503:17: error: 'strcpy' source argument is the same as destination [-Werror=restrict]
503 | strcpy(buffer, name);
| ^~~~~~~~~~~~~~~~~~~~
This originally showed up while building with -O3, but later started
happening in other configurations as well, depending on inlining
decisions. The underlying issue is that the local 'name' variable is
always initialized to be the same as 'buffer' in the called functions
that fill the buffer, which gcc notices while inlining, though it could
not see that the address check always skips the copy.
The calling conventions here are rather unusual, as all of the internal
lookup functions (bpf_address_lookup, ftrace_mod_address_lookup,
ftrace_func_address_lookup, module_address_lookup and
kallsyms_lookup_buildid) already use the provided buffer and either return
the address of that buffer to indicate success, or NULL for failure,
but the callers are written to also expect an arbitrary other buffer
to be returned.
Rework the calling conventions to return the length of the filled buffer
instead of its address, which is simpler and easier to follow as well
as avoiding the warning. Leave only the kallsyms_lookup() calling conventions
unchanged, since that is called from 16 different functions and
adapting this would be a much bigger change.
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
Tested-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Two patches to fix kworker name formatting"
* tag 'wq-for-6.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Increase worker desc's length to 32
workqueue: Refactor worker ID formatting and make wq_worker_comm() use full ID string
|
|
Currently, it's possible to pass in a modified CONST_PTR_TO_DYNPTR to
a global function as an argument. The adverse effect of this is that
BPF helpers can continue to make use of this modified
CONST_PTR_TO_DYNPTR from within the context of the global function,
which can unintentionally result in out-of-bounds memory accesses and
therefore compromise overall system stability, i.e.:
[ 244.157771] BUG: KASAN: slab-out-of-bounds in bpf_dynptr_data+0x137/0x140
[ 244.161345] Read of size 8 at addr ffff88810914be68 by task test_progs/302
[ 244.167151] CPU: 0 PID: 302 Comm: test_progs Tainted: G O E 6.10.0-rc3-00131-g66b586715063 #533
[ 244.174318] Call Trace:
[ 244.175787] <TASK>
[ 244.177356] dump_stack_lvl+0x66/0xa0
[ 244.179531] print_report+0xce/0x670
[ 244.182314] ? __virt_addr_valid+0x200/0x3e0
[ 244.184908] kasan_report+0xd7/0x110
[ 244.187408] ? bpf_dynptr_data+0x137/0x140
[ 244.189714] ? bpf_dynptr_data+0x137/0x140
[ 244.192020] bpf_dynptr_data+0x137/0x140
[ 244.194264] bpf_prog_b02a02fdd2bdc5fa_global_call_bpf_dynptr_data+0x22/0x26
[ 244.198044] bpf_prog_b0fe7b9d7dc3abde_callback_adjust_bpf_dynptr_reg_off+0x1f/0x23
[ 244.202136] bpf_user_ringbuf_drain+0x2c7/0x570
[ 244.204744] ? 0xffffffffc0009e58
[ 244.206593] ? __pfx_bpf_user_ringbuf_drain+0x10/0x10
[ 244.209795] bpf_prog_33ab33f6a804ba2d_user_ringbuf_callback_const_ptr_to_dynptr_reg_off+0x47/0x4b
[ 244.215922] bpf_trampoline_6442502480+0x43/0xe3
[ 244.218691] __x64_sys_prlimit64+0x9/0xf0
[ 244.220912] do_syscall_64+0xc1/0x1d0
[ 244.223043] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 244.226458] RIP: 0033:0x7ffa3eb8f059
[ 244.228582] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 1d 0d 00 f7 d8 64 89 01 48
[ 244.241307] RSP: 002b:00007ffa3e9c6eb8 EFLAGS: 00000206 ORIG_RAX: 000000000000012e
[ 244.246474] RAX: ffffffffffffffda RBX: 00007ffa3e9c7cdc RCX: 00007ffa3eb8f059
[ 244.250478] RDX: 00007ffa3eb162b4 RSI: 0000000000000000 RDI: 00007ffa3e9c7fb0
[ 244.255396] RBP: 00007ffa3e9c6ed0 R08: 00007ffa3e9c76c0 R09: 0000000000000000
[ 244.260195] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffff80
[ 244.264201] R13: 000000000000001c R14: 00007ffc5d6b4260 R15: 00007ffa3e1c7000
[ 244.268303] </TASK>
Add a check_func_arg_reg_off() call to the path in which the BPF
verifier verifies the arguments of global functions, specifically
those which take an argument of type ARG_PTR_TO_DYNPTR |
MEM_RDONLY. Also, process_dynptr_func() doesn't appear to perform any
explicit and strict type matching on the supplied register type, so
let's also enforce that a register of either type PTR_TO_STACK or
CONST_PTR_TO_DYNPTR is passed by the caller.
Reported-by: Kumar Kartikeya Dwivedi <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Matt Bobrowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
On a ~2000 CPU powerpc system, hard lockups have been observed in the
workqueue code when stop_machine runs (in this case due to CPU hotplug).
This is due to lots of CPUs spinning in multi_cpu_stop, calling
touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
and that can find itself in the same cacheline as other important
workqueue data, which slows down operations to the point of lockups.
In the case of the following abridged trace, worker_pool_idr was in
the hot line, causing the lockups to always appear at idr_find.
watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
Call Trace:
get_work_pool
__queue_work
call_timer_fn
run_timer_softirq
__do_softirq
do_softirq_own_stack
irq_exit
timer_interrupt
decrementer_common_virt
* interrupt: 900 (timer) at multi_cpu_stop
multi_cpu_stop
cpu_stopper_thread
smpboot_thread_fn
kthread
Fix this by having wq_watchdog_touch() only write to the line if the
time since the last recorded touch exceeds 1/4 of the watchdog threshold.
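A simplified sketch of the idea (the variable below is a stand-in, not the
exact workqueue code):
  #include <linux/jiffies.h>
  #include <linux/compiler.h>

  /* Only dirty the shared cacheline when the previous touch is older than a
   * quarter of the watchdog threshold (both values in jiffies).
   */
  static void wq_watchdog_touch_sketch(unsigned long *touched, unsigned long thresh)
  {
          unsigned long now = jiffies;

          if (time_after(now, READ_ONCE(*touched) + thresh / 4))
                  WRITE_ONCE(*touched, now);
  }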
Reported-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Warn in the case it is called with cpu == -1. This does not appear
to happen anywhere.
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
The hrtimer function callback must not be NULL. It has to be specified by
the call site, but it is not validated by the hrtimer code. When a hrtimer
is queued without a function callback, the kernel crashes with a null
pointer dereference when trying to execute the callback in __run_hrtimer().
Introduce a validation before queuing the hrtimer in
hrtimer_start_range_ns().
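A simplified sketch of the added check (the wrapper name is hypothetical; the
real validation sits inside hrtimer_start_range_ns()):
  #include <linux/hrtimer.h>
  #include <linux/bug.h>

  static void hrtimer_start_checked(struct hrtimer *timer, ktime_t tim,
                                    u64 range_ns, const enum hrtimer_mode mode)
  {
          /* Refuse to queue a timer without a callback instead of crashing
           * later in __run_hrtimer().
           */
          if (WARN_ON_ONCE(!timer->function))
                  return;

          hrtimer_start_range_ns(timer, tim, range_ns, mode);
  }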
[anna-maria: Rephrase commit message]
Signed-off-by: Phil Chang <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Anna-Maria Behnsen <[email protected]>
|
|
Using sys_io_pgetevents() as the entry point for compat mode tasks
works almost correctly, but misses the sign extension for the min_nr
and nr arguments.
This was addressed on parisc by switching to
compat_sys_io_pgetevents_time64() in commit 6431e92fc827 ("parisc:
io_pgetevents_time64() needs compat syscall in 32-bit compat mode"),
as well as by using more sophisticated system call wrappers on x86 and
s390. However, arm64, mips, powerpc, sparc and riscv still have the
same bug.
Change all of them over to use compat_sys_io_pgetevents_time64()
like parisc already does. This was clearly the intention when the
function was originally added, but it got hooked up incorrectly in
the tables.
Cc: [email protected]
Fixes: 48166e6ea47d ("y2038: add 64-bit time_t syscalls to all 32-bit architectures")
Acked-by: Heiko Carstens <[email protected]> # s390
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
This reverts commit f03e8c1060f86c23eb49bafee99d9fcbd1c1bd77.
Let's roll back all of the serial core and printk console changes that
went into 6.10-rc1 as there still are problems with them that need to be
sorted out.
Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1
Reported-by: Petr Mladek <[email protected]>
Reported-by: Tony Lindgren <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This reverts commit 8a831c584e6e80cf68f79893dc395c16cdf47dc8.
Let's roll back all of the serial core and printk console changes that
went into 6.10-rc1 as there still are problems with them that need to be
sorted out.
Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1
Reported-by: Petr Mladek <[email protected]>
Reported-by: Tony Lindgren <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This reverts commit b73c9cbe4f1fc02645228aa575998dd54067f8ef.
Let's roll back all of the serial core and printk console changes that
went into 6.10-rc1 as there still are problems with them that need to be
sorted out.
Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1
Reported-by: Petr Mladek <[email protected]>
Reported-by: Tony Lindgren <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
If hung_task_panic is enabled, don't consider the value of
hung_task_warnings and always display the information about the hung tasks.
In some cases, hung_task_panic might not be set up initially; after
several hung tasks occur, the hung_task_warnings count reaches zero. If
hung_task_panic is set up later, it may not display any helpful hung task
info in dmesg, only showing messages like:
Kernel panic - not syncing: hung_task: blocked tasks
CPU: 3 PID: 58 Comm: khungtaskd Not tainted 6.10.0-rc3 #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
panic+0x2f3/0x320
watchdog+0x2dd/0x510
? __pfx_watchdog+0x10/0x10
kthread+0xe0/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
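A rough sketch of the intended decision logic (simplified, not the exact
khungtaskd code):
  #include <linux/types.h>

  static bool should_dump_hung_task(bool panic_enabled, int *warnings_left)
  {
          /* With hung_task_panic set, always dump the blocked task details,
           * even if the warnings budget already dropped to zero.
           */
          if (panic_enabled)
                  return true;

          if (*warnings_left) {
                  if (*warnings_left > 0)   /* -1 means unlimited */
                          (*warnings_left)--;
                  return true;
          }
          return false;
  }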
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yongliang Gao <[email protected]>
Reviewed-by: Huang Cun <[email protected]>
Cc: Joel Granados <[email protected]>
Cc: John Siddle <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The following warning is reported, so remove the duplicated header
include:
./kernel/crash_reserve.c: linux/kexec.h is included more than once.
This is just a code cleanup; no logic is changed.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wenchao Hao <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
With nearly 20 taint flags and respective characters, it's getting a bit
difficult to remember what each taint flag character means. Add verbose
logging of the set taints in the format:
Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN
in dump_stack_print_info() when there are taints.
Note that the "negative flag" G is not included.
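An illustrative-only sketch of producing such verbose output (the tiny table
below is a toy, not the kernel's taint_flags[] array):
  #include <linux/bits.h>
  #include <linux/kernel.h>
  #include <linux/printk.h>

  struct taint_demo {
          unsigned int bit;
          char c;
          const char *name;
  };

  static const struct taint_demo demo_taints[] = {
          { 0, 'P', "PROPRIETARY_MODULE" },
          { 9, 'W', "WARN" },
  };

  static void print_verbose_taints(unsigned long tainted_mask)
  {
          bool first = true;
          size_t i;

          for (i = 0; i < ARRAY_SIZE(demo_taints); i++) {
                  if (!(tainted_mask & BIT(demo_taints[i].bit)))
                          continue;
                  /* Appended to the current dump_stack() line. */
                  pr_cont("%s[%c]=%s", first ? "Tainted: " : ", ",
                          demo_taints[i].c, demo_taints[i].name);
                  first = false;
          }
          if (!first)
                  pr_cont("\n");
  }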
Link: https://lkml.kernel.org/r/7321e306166cb2ca2807ab8639e665baa2462e9c.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Make it easier to extend struct taint_flags in follow-up.
Link: https://lkml.kernel.org/r/8a2498285d37953cfad9dce939ed3abef61051bd.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert print_tainted() to use struct seq_buf internally in order to be
more aware of the buffer constraints as well as make it easier to extend
in follow-up work.
Link: https://lkml.kernel.org/r/cb6006fa7c0f82a6b6885e8eea2920fcdc4fc9d0.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Reduce indent to make follow-up changes slightly easier on the eyes.
Link: https://lkml.kernel.org/r/01d6c03de1c9d1b52b59c652a3704a0a9886ed63.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
After adding min_heap_sift_up(), the naming convention has been adjusted
to maintain consistency with min_heap_sift_up(). Consequently,
min_heapify() has been renamed to min_heap_sift_down().
Link: https://lkml.kernel.org/CAP-5=fVcBAxt8Mw72=NCJPRJfjDaJcqk4rjbadgouAEAHz_q1A@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Wei Chiu <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Ching-Chun (Jim) Huang <[email protected]>
Cc: Coly Li <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Sakai <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add a third parameter 'args' for the 'less' and 'swp' functions in the
'struct min_heap_callbacks'. This additional parameter allows these
comparison and swap functions to handle extra arguments when necessary.
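A rough sketch of the extended callback signatures (close to, but not
necessarily identical to, the kernel's struct min_heap_callbacks), together
with a comparator that uses the extra argument:
  #include <linux/types.h>

  struct demo_heap_callbacks {
          bool (*less)(const void *lhs, const void *rhs, void *args);
          void (*swp)(void *lhs, void *rhs, void *args);
  };

  /* Compare two element indices by a caller-provided key array passed via 'args'. */
  static bool less_by_key(const void *lhs, const void *rhs, void *args)
  {
          const int *keys = args;

          return keys[*(const int *)lhs] < keys[*(const int *)rhs];
  }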
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Wei Chiu <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Ching-Chun (Jim) Huang <[email protected]>
Cc: Coly Li <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Sakai <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Implement a type-safe interface for min_heap using strong type pointers
instead of void * in the data field. This change includes adding small
macro wrappers around functions, enabling the use of __minheap_cast and
__minheap_obj_size macros for type casting and obtaining element size.
This implementation removes the necessity of passing element size in
min_heap_callbacks. Additionally, introduce the MIN_HEAP_PREALLOCATED
macro for preallocating some elements.
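An illustrative-only sketch of the idea (not the kernel's exact macros): the
heap struct carries a strongly typed data pointer plus optional preallocated
storage, so the element size follows from the type itself.
  #define DEMO_MIN_HEAP_PREALLOCATED(_type, _name, _nr)                      \
  struct _name {                                                             \
          int nr;                 /* current number of elements */           \
          int size;               /* capacity */                             \
          _type *data;            /* preallocated[] or an external buffer */ \
          _type preallocated[_nr];                                           \
  }

  DEMO_MIN_HEAP_PREALLOCATED(int, demo_int_heap, 8);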
Link: https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Wei Chiu <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Ching-Chun (Jim) Huang <[email protected]>
Cc: Coly Li <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Sakai <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "treewide: Refactor heap related implementation", v6.
This patch series focuses on several adjustments related to heap
implementation. Firstly, a type-safe interface has been added to the
min_heap, along with the introduction of several new functions to enhance
its functionality. Additionally, the heap implementation for bcache and
bcachefs has been replaced with the generic min_heap implementation from
include/linux. Furthermore, several typos have been corrected.
Previous discussion with Kent Overstreet:
https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu
This patch (of 16):
Replace 'artifically' with 'artificially'.
Replace 'irrespecive' with 'irrespective'.
Replace 'futher' with 'further'.
Replace 'sufficent' with 'sufficient'.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Wei Chiu <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Ching-Chun (Jim) Huang <[email protected]>
Cc: Coly Li <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Sakai <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use this_cpu_try_cmpxchg() instead of the this_cpu_cmpxchg(*ptr, old, new) ==
old pattern in try_release_thread_stack_to_cache(). The x86 CMPXCHG
instruction returns success in the ZF flag, so this change saves a compare
after the cmpxchg (and the related move instruction in front of the cmpxchg).
No functional change intended.
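A minimal sketch of the pattern change (the per-CPU variable below is a
stand-in for the real stack cache):
  #include <linux/types.h>
  #include <linux/percpu.h>

  static DEFINE_PER_CPU(void *, demo_cached_stack);

  static bool demo_try_cache_stack(void *stack)
  {
          void *old = NULL;

          /* Before: this_cpu_cmpxchg(demo_cached_stack, NULL, stack) == NULL
           * After:  the try variant folds the comparison into the operation,
           *         letting x86 use the ZF result of CMPXCHG directly.
           */
          return this_cpu_try_cmpxchg(demo_cached_stack, &old, stack);
  }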
[[email protected]: simplify the for loop a bit]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uros Bizjak <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Fix the 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/backtracetest.o
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jeff Johnson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-06-24
We've added 12 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 412 insertions(+), 16 deletions(-).
The main changes are:
1) Fix a BPF verifier issue validating may_goto with a negative offset,
from Alexei Starovoitov.
2) Fix a BPF verifier validation bug with may_goto combined with jump to
the first instruction, also from Alexei Starovoitov.
3) Fix a bug with overrunning reservations in BPF ring buffer,
from Daniel Borkmann.
4) Fix a bug in BPF verifier due to missing proper var_off setting related
to movsx instruction, from Yonghong Song.
5) Silence unnecessary syzkaller-triggered warning in __xdp_reg_mem_model(),
from Daniil Dulov.
* tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xdp: Remove WARN() from __xdp_reg_mem_model()
selftests/bpf: Add tests for may_goto with negative offset.
bpf: Fix may_goto with negative offset.
selftests/bpf: Add more ring buffer test coverage
bpf: Fix overrunning reservations in ringbuf
selftests/bpf: Tests with may_goto and jumps to the 1st insn
bpf: Fix the corner case with may_goto and jump to the 1st insn.
bpf: Update BPF LSM maintainer list
bpf: Fix remap of arena.
selftests/bpf: Add a few tests to cover
bpf: Add missed var_off setting in coerce_subreg_to_size_sx()
bpf: Add missed var_off setting in set_sext32_default_val()
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When the kernel has pending uretprobes installed, it hijacks the original user
function return address on the stack with a uretprobe trampoline
address. There could be multiple such pending uretprobes (either on
different user functions or on the same recursive one) at any given
time within the same task.
This approach interferes with the user stack trace capture logic, which
would report surprising addresses (like 0x7fffffffe000) that correspond
to a special "[uprobes]" section that kernel installs in the target
process address space for uretprobe trampoline code, while logically it
should be an address somewhere within the calling function of another
traced user function.
This is easy to correct for, though. The uprobes subsystem keeps track of
pending uretprobes and records the original return addresses. This patch
uses that to do a post-processing step and restore each trampoline
address entry to the correct original return address. This is done only
if there are pending uretprobes for the current task.
This is a similar approach to what fprobe/kretprobe infrastructure is
doing when capturing kernel stack traces in the presence of pending
return probes.
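A rough sketch of the post-processing idea (identifiers are illustrative, not
the actual uprobes code):
  /* Walk the captured user stack trace and put back the original return
   * addresses that pending uretprobes replaced with the trampoline address,
   * innermost pending uretprobe first.
   */
  static void fixup_uretprobe_trampolines(unsigned long *entries, unsigned int nr,
                                          unsigned long tramp_addr,
                                          const unsigned long *orig_ret,
                                          unsigned int nr_pending)
  {
          unsigned int i, next = 0;

          for (i = 0; i < nr && next < nr_pending; i++) {
                  if (entries[i] == tramp_addr)
                          entries[i] = orig_ret[next++];
          }
  }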
Link: https://lore.kernel.org/all/[email protected]/
Reported-by: Riham Selim <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
|
|
The flush lists, which are accessed from within the NAPI callback
(xdp_do_flush() for instance), are per-CPU. They are subject to the
same problem as struct bpf_redirect_info.
Add the per-CPU lists cpu_map_flush_list, dev_map_flush_list and
xskmap_map_flush_list to struct bpf_net_context. Add wrappers for the
access. The lists are initialized on first usage (similar to
bpf_net_ctx_get_ri()).
Cc: "Björn Töpel" <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Jonathan Lemon <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Maciej Fijalkowski <[email protected]>
Cc: Magnus Karlsson <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Yonghong Song <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The XDP redirect process is two staged:
- bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the
packet and makes decisions. While doing that, the per-CPU variable
bpf_redirect_info is used.
- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
and it may also access other per-CPU variables like xskmap_flush_list.
At the very end of the NAPI callback, xdp_do_flush() is invoked which
does not access bpf_redirect_info but will touch the individual per-CPU
lists.
The per-CPU variables are only used in the NAPI callback hence disabling
bottom halves is the only protection mechanism. Users from preemptible
context (like cpu_map_kthread_run()) explicitly disable bottom halves
for protection reasons.
Since local_bh_disable() does not provide locking on PREEMPT_RT, this data
structure requires explicit locking.
PREEMPT_RT has forced-threaded interrupts enabled and every
NAPI-callback runs in a thread. If each thread has its own data
structure then locking can be avoided.
Create a struct bpf_net_context which contains struct bpf_redirect_info.
Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
it, bpf_net_ctx_clear() removes it again.
bpf_net_ctx_set() may nest. For instance, a function can be used from
within NET_RX_SOFTIRQ/net_rx_action, which uses bpf_net_ctx_set(), and
from NET_TX_SOFTIRQ, which does not. Therefore only the first invocation
updates the pointer.
Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
bpf_redirect_info. The returned data structure is zero initialized to
ensure nothing is leaked from stack. This is done on first usage of the
struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
note that initialisation is required. First invocation of
bpf_net_ctx_get_ri() will memset() the data structure and update
bpf_redirect_info::kern_flags.
bpf_redirect_info::nh is excluded from memset because it is only used
once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is
moved past nh to exclude it from memset.
The pointer to bpf_net_context is saved in the task's task_struct. Always
using the bpf_net_context approach has the advantage that there are
almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds.
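A hedged usage sketch based on the description above (exact signatures may
differ; error handling and the real NAPI plumbing are elided):
  #include <linux/filter.h>

  static void demo_napi_poll_body(void)
  {
          struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;

          /* Outermost caller wins; nested calls keep the existing pointer. */
          bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);

          /* ... bpf_prog_run_xdp() / xdp_do_redirect() fetch the redirect
           * info via bpf_net_ctx_get_ri() ...
           */

          bpf_net_ctx_clear(bpf_net_ctx);
  }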
Cc: Andrii Nakryiko <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Yonghong Song <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add local_lock_nested_bh() locking. It is based on local_lock_t and the
naming follows the preempt_disable_nested() example.
For !PREEMPT_RT + !LOCKDEP it is a per-CPU annotation for locking
assumptions based on local_bh_disable(). The macro is optimized away
during compilation.
For !PREEMPT_RT + LOCKDEP the local_lock_nested_bh() is reduced to
the usual lock-acquire plus lockdep_assert_in_softirq() - ensuring that
BH is disabled.
For PREEMPT_RT local_lock_nested_bh() acquires the specified per-CPU
lock. It does not disable CPU migration because it relies on
local_bh_disable() disabling CPU migration.
With LOCKDEP it performs the usual lockdep checks as with !PREEMPT_RT.
Due to include hell the softirq check has been moved to spinlock.c.
The intention is to use this locking in places where locking of a per-CPU
variable relies on BH being disabled. Instead of treating disabled
bottom halves as a big per-CPU lock, PREEMPT_RT can use this to reduce
the locking scope to what actually needs protecting.
A side effect is that it also documents the protection scope of the
per-CPU variables.
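A hedged usage sketch (the per-CPU data below is a stand-in): the lock both
documents and, on PREEMPT_RT, actually serializes the per-CPU state that
previously relied only on BH being disabled.
  #include <linux/local_lock.h>
  #include <linux/percpu.h>

  struct demo_pcpu_state {
          int counter;
          local_lock_t bh_lock;
  };

  static DEFINE_PER_CPU(struct demo_pcpu_state, demo_state) = {
          .bh_lock = INIT_LOCAL_LOCK(bh_lock),
  };

  static void demo_update(void)
  {
          /* Caller already runs with bottom halves disabled (e.g. NAPI). */
          local_lock_nested_bh(&demo_state.bh_lock);
          this_cpu_inc(demo_state.counter);
          local_unlock_nested_bh(&demo_state.bh_lock);
  }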
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Zac's syzbot crafted a bpf prog that exposed two bugs in may_goto.
The 1st bug is the way may_goto is patched. When offset is negative
it should be patched differently.
The 2nd bug is in the verifier:
when current state may_goto_depth is equal to visited state may_goto_depth
it means there is an actual infinite loop. It's not correct to prune
exploration of the program at this point.
Note, that this check doesn't limit the program to only one may_goto insn,
since 2nd and any further may_goto will increment may_goto_depth only
in the queued state pushed for future exploration. The current state
will have may_goto_depth == 0 regardless of number of may_goto insns
and the verifier has to explore the program until bpf_exit.
Fixes: 011832b97b31 ("bpf: Introduce may_goto instruction")
Reported-by: Zac Ecob <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Closes: https://lore.kernel.org/bpf/CAADnVQL-15aNp04-cyHRn47Yv61NXfYyhopyZtUyxNojUZUXpA@mail.gmail.com/
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The kernel test robot reports that the kernel build fails with the
resilient split BTF changes.
Examining the associated config and code, we see that
btf_relocate_id() is defined under CONFIG_DEBUG_INFO_BTF_MODULES.
Moving it outside the #ifdef solves the issue.
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Alan Maguire <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
After the rework of "Parallel CPU bringup", the cmdline "nosmp" and
"maxcpus=0" parameters are not working anymore. These parameters set
setup_max_cpus to zero and that's handed to bringup_nonboot_cpus().
The code there does a decrement before checking for zero, which brings it
into the negative space and brings up all CPUs.
Add a zero check at the beginning of the function to prevent this.
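A minimal sketch of the fix (simplified; the surrounding bringup logic is
elided):
  void bringup_nonboot_cpus_sketch(unsigned int max_cpus)
  {
          /* "nosmp" / "maxcpus=0": nothing to bring up, and the decrement
           * below must not wrap into a huge unsigned value.
           */
          if (!max_cpus)
                  return;

          /* ... existing bringup loop, which decrements max_cpus ... */
  }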
[ tglx: Massaged change log ]
Fixes: 18415f33e2ac4ab382 ("cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE")
Fixes: 06c6796e0304234da6 ("cpu/hotplug: Fix off by one in cpuhp_bringup_mask()")
Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
Fix up the incomplete kernel-doc style comments for do_adjtimex() and
hardpps() by documenting the function parameters.
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9301
|
|
The scx_bpf_cpuperf_set() kfunc allows a BPF program to set the relative
performance target of a specified CPU. Commit d86adb4fc065 ("sched_ext: Add
cpuperf support") defined the @cpu argument to be unsigned. Let's update it
to be signed to match the norm for the rest of ext.c and the kernel.
Note that the kfunc declaration of scx_bpf_cpuperf_set() in the
common.bpf.h header in tools/sched_ext already listed the cpu as signed, so
this also fixes the build for tools/sched_ext and the sched_ext selftests
due to kfunc declarations now being emitted in vmlinux.h based on BTF (thus
causing the compiler to error due to observing conflicting types).
Fixes: d86adb4fc065 ("sched_ext: Add cpuperf support")
Signed-off-by: David Vernet <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Modify the comment formatting in the irq_find_matching_fwspec() function to
enhance code readability and maintain consistency.
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
sched_ext currently does not integrate with schedutil. When schedutil is the
governor, frequencies are left unregulated and usually get stuck close to
the highest performance level from running RT tasks.
Add CPU performance monitoring and scaling support by integrating into
schedutil. The following kfuncs are added:
- scx_bpf_cpuperf_cap(): Query the relative performance capacity of
different CPUs in the system.
- scx_bpf_cpuperf_cur(): Query the current performance level of a CPU
relative to its max performance.
- scx_bpf_cpuperf_set(): Set the current target performance level of a CPU.
This gives direct control over CPU performance setting to the BPF scheduler.
The only changes on the schedutil side are accounting for the utilization
factor from sched_ext and disabling the frequency-holding heuristics, as they
may not apply well to sched_ext schedulers, which may have a much weaker
connection between tasks and their current / last CPU.
With cpuperf support added, there is no reason to block uclamp. Enable while
at it.
A toy implementation of cpuperf is added to scx_qmap as a demonstration of
the feature.
v2: Ignore cpu_util_cfs_boost() when scx_switched_all() in sugov_get_util()
to avoid factoring in stale util metric. (Christian)
Signed-off-by: Tejun Heo <[email protected]>
Reviewed-by: David Vernet <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Christian Loehle <[email protected]>
|
|
sugov_cpu_is_busy() is used to avoid decreasing the performance level while
the CPU is busy and is called by sugov_update_single_freq() and
sugov_update_single_perf(). Both callers repeat the same pattern: first
test for uclamp and then for busyness. Let's refactor so that the tests
aren't repeated.
The new helper is named sugov_hold_freq() and tests both the uclamp
exception and CPU busyness. No functional changes. This will make adding
more exception conditions easier.
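A rough sketch of the refactored helper (the names and exact conditions are
illustrative, not necessarily the final schedutil code):
  #include <linux/types.h>

  static bool sugov_hold_freq_sketch(bool uclamp_is_used, bool cpu_is_busy)
  {
          /* uclamp users always get accurate frequency selection: never hold. */
          if (uclamp_is_used)
                  return false;

          /* Otherwise avoid decreasing the performance level while busy. */
          return cpu_is_busy;
  }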
Signed-off-by: Tejun Heo <[email protected]>
Reviewed-by: David Vernet <[email protected]>
Reviewed-by: Christian Loehle <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
|
|
A dying worker is first moved from pool->workers to pool->dying_workers
in set_worker_dying() and removed from pool->dying_workers in
detach_dying_workers(). The whole procedure takes place within the same
locking context of wq_pool_attach_mutex.
So pool->dying_workers is useless, just remove it and keep the dying
worker in pool->workers after set_worker_dying() and remove it in
detach_dying_workers() with wq_pool_attach_mutex held.
Cc: Valentin Schneider <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
The code to kick off the destruction of workers is now in a process
context (idle_cull_fn()), and the detaching of a worker is not required
to be inside the worker thread now, so just do the detaching directly
in idle_cull_fn().
wake_dying_workers() is renamed to detach_dying_workers() and the unneeded
wakeup in wake_dying_workers() is also removed.
Cc: Valentin Schneider <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
This ensures that when the rescuer is woken up next time, it will not
interrupt the last working CPU, which might be busy with other crucial
work that has nothing to do with the rescuer's incoming work.
Cc: Valentin Schneider <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
The code to kick off the destruction of workers is now in a process
context (idle_cull_fn()), so kthread_stop() can be used in the process
context to replace the work of pool->detach_completion.
The wakeup in wake_dying_workers() is unneeded after this change, but it
is harmless; just keep it here until the next patch renames
wake_dying_workers(), rather than renaming it again and again.
Cc: Valentin Schneider <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Share relocation implementation with the kernel. As part of this,
we also need the type/string iteration functions, so also share the
btf_iter.c file. Relocation code in the kernel and userspace is identical
save for the implementation of the reparenting of split BTF to the
relocated base BTF and retrieval of the BTF header from "struct btf";
these small functions need separate user-space and kernel implementations
for the separate "struct btf"s they operate upon.
One other wrinkle on the kernel side is we have to map .BTF.ids in
modules as they were generated with the type ids used at BTF encoding
time. btf_relocate() optionally returns an array mapping from old BTF
ids to relocated ids, so we use that to fix up these references where
needed for kfuncs.
Signed-off-by: Alan Maguire <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
...as this will allow split BTF modules with a base BTF
representation (rather than the full vmlinux BTF at time of
BTF encoding) to resolve their references to kernel types in a
way that is more resilient to small changes in kernel types.
This will allow modules that are not rebuilt with every kernel build
to provide more resilient BTF, rather than having it invalidated
every time the BTF ids of core kernel types change.
Fields are ordered to avoid holes in struct module.
Signed-off-by: Alan Maguire <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|