| Age | Commit message (Collapse) | Author | Files | Lines |
|
Commit:
5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG")
amended warn_slowpath_fmt() to disable preemption until the WARN splat
has been emitted.
However the commit neglected to reenable preemption in the !fmt codepath,
i.e. when a WARN splat is emitted without additional format string.
One consequence is that users may see more splats than intended. E.g. a
WARN splat emitted in a work item results in at least two extra splats:
BUG: workqueue leaked lock or atomic
(emitted by process_one_work())
BUG: scheduling while atomic
(emitted by worker_thread() -> schedule())
Ironically the point of the commit was to *avoid* extra splats. ;)
Fix it.
Fixes: 5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG")
Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/r/3ec48fde01e4ee6505f77908ba351bad200ae3d1.1694763684.git.lukas@wunner.de
|
|
Current code charges modmem for regular trampoline, but not for struct_ops
trampoline. Add bpf_jit_[charge|uncharge]_modmem() to struct_ops so the
trampoline is charged in both cases.
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
The futex code already handles compound pages correctly, but using a folio
tells the compiler that there is already a reference to the head page and
it doesn't need to call compound_head() again.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When using rcutorture as a module, there are a number of conditions that
can abort the modprobe operation, for example, when attempting to run
both RCU CPU stall warning tests and forward-progress tests. This can
cause rcu_torture_cleanup() to be invoked on the unwind path out of
rcu_rcu_torture_init(), which will mean that rcu_gp_slow_unregister()
is invoked without a matching rcu_gp_slow_register(). This will cause
a splat because rcu_gp_slow_unregister() is passed rcu_fwd_cb_nodelay,
which does not match a NULL pointer.
This commit therefore forgives a mismatch involving a NULL pointer, thus
avoiding this false-positive splat.
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
When a structure containing an RCU callback rhp is (incorrectly) freed
and reallocated after rhp is passed to call_rcu(), it is not unusual for
rhp->func to be set to NULL. This defeats the debugging prints used by
__call_rcu_common() in kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y,
which expect to identify the offending code using the identity of this
function.
And in kernels build without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things
are even worse, as can be seen from this splat:
Unable to handle kernel NULL pointer dereference at virtual address 0
... ...
PC is at 0x0
LR is at rcu_do_batch+0x1c0/0x3b8
... ...
(rcu_do_batch) from (rcu_core+0x1d4/0x284)
(rcu_core) from (__do_softirq+0x24c/0x344)
(__do_softirq) from (__irq_exit_rcu+0x64/0x108)
(__irq_exit_rcu) from (irq_exit+0x8/0x10)
(irq_exit) from (__handle_domain_irq+0x74/0x9c)
(__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
(gic_handle_irq) from (__irq_svc+0x5c/0x94)
(__irq_svc) from (arch_cpu_idle+0x20/0x3c)
(arch_cpu_idle) from (default_idle_call+0x4c/0x78)
(default_idle_call) from (do_idle+0xf8/0x150)
(do_idle) from (cpu_startup_entry+0x18/0x20)
(cpu_startup_entry) from (0xc01530)
This commit therefore adds calls to mem_dump_obj(rhp) to output some
information, for example:
slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256
This provides the rough size of the memory block and the offset of the
rcu_head structure, which as least provides at least a few clues to help
locate the problem. If the problem is reproducible, additional slab
debugging can be enabled, for example, CONFIG_DEBUG_SLAB=y, which can
provide significantly more information.
Signed-off-by: Zhen Lei <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
When running a series of stress tests all making heavy use of RCU,
it is all too possible to OOM the system when the prior test's RCU
callbacks don't get invoked until after the subsequent test starts.
One way of handling this is just a timed wait, but this fails when a
given CPU has so many callbacks queued that they take longer to invoke
than allowed for by that timed wait.
This commit therefore adds an rcutree.do_rcu_barrier module parameter that
is accessible from sysfs. Writing one of the many synonyms for boolean
"true" will cause an rcu_barrier() to be invoked, but will guarantee that
no more than one rcu_barrier() will be invoked per sixteenth of a second
via this mechanism. The flip side is that a given request might wait a
second or three longer than absolutely necessary, but only when there are
multiple uses of rcutree.do_rcu_barrier within a one-second time interval.
This commit unnecessarily serializes the rcu_barrier() machinery, given
that serialization is already provided by procfs. This has the advantage
of allowing throttled rcu_barrier() from other sources within the kernel.
Reported-by: Johannes Weiner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The return keyword is not needed here.
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The current error handling in init_srcu_struct_fields() is a bit
inconsistent. If init_srcu_struct_nodes() fails, the function either
returns -ENOMEM or 0 depending on whether ssp->sda_is_static is true or
false. This can make init_srcu_struct_fields() return 0 even if memory
allocation failed!
Simplify the error handling by always returning -ENOMEM if either
init_srcu_struct_nodes() or the per-CPU allocation fails. This makes the
control flow easier to follow and avoids the inconsistent return values.
Add goto labels to avoid duplicating the error cleanup code.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Add missing LOCKDOWN checks for eventfs callers
When LOCKDOWN is active for tracing, it causes inconsistent state
when some functions succeed and others fail.
- Use dput() to free the top level eventfs descriptor
There was a race between accesses and freeing it.
- Fix a long standing bug that eventfs exposed due to changing timings
by dynamically creating files. That is, If a event file is opened for
an instance, there's nothing preventing the instance from being
removed which will make accessing the files cause use-after-free
bugs.
- Fix a ring buffer race that happens when iterating over the ring
buffer while writers are active. Check to make sure not to read the
event meta data if it's beyond the end of the ring buffer sub buffer.
- Fix the print trigger that disappeared because the test to create it
was looking for the event dir field being filled, but now it has the
"ef" field filled for the eventfs structure.
- Remove the unused "dir" field from the event structure.
- Fix the order of the trace_dynamic_info as it had it backwards for
the offset and len fields for which one was for which endianess.
- Fix NULL pointer dereference with eventfs_remove_rec()
If an allocation fails in one of the eventfs_add_*() functions, the
caller of it in event_subsystem_dir() or event_create_dir() assigns
the result to the structure. But it's assigning the ERR_PTR and not
NULL. This was passed to eventfs_remove_rec() which expects either a
good pointer or a NULL, not ERR_PTR. The fix is to not assign the
ERR_PTR to the structure, but to keep it NULL on error.
- Fix list_for_each_rcu() to use list_for_each_srcu() in
dcache_dir_open_wrapper(). One iteration of the code used RCU but
because it had to call sleepable code, it had to be changed to use
SRCU, but one of the iterations was missed.
- Fix synthetic event print function to use "as_u64" instead of passing
in a pointer to the union. To fix big/little endian issues, the u64
that represented several types was turned into a union to define the
types properly.
* tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()
tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper()
tracing/synthetic: Print out u64 values properly
tracing/synthetic: Fix order of struct trace_dynamic_info
selftests/ftrace: Fix dependencies for some of the synthetic event tests
tracing: Remove unused trace_event_file dir field
tracing: Use the new eventfs descriptor for print trigger
ring-buffer: Do not attempt to read past "commit"
tracefs/eventfs: Free top level files on removal
ring-buffer: Avoid softlockup in ring_buffer_resize()
tracing: Have event inject files inc the trace array ref count
tracing: Have option files inc the trace array ref count
tracing: Have current_trace inc the trace array ref count
tracing: Have tracing_max_latency inc the trace array ref count
tracing: Increase trace array ref count on enable and filter files
tracefs/eventfs: Use dput to free the toplevel events directory
tracefs/eventfs: Add missing lockdown checks
tracefs: Add missing lockdown check to tracefs_create_dir()
|
|
For SMT4, any group with more than 2 tasks will be marked as
group_smt_balance. Retain the behaviour of group_has_spare by marking
the busiest group as the group which has the least number of idle_cpus.
Also, handle rounding effect of adding (ncores_local + ncores_busy) when
the local is fully idle and busy group imbalance is less than 2 tasks.
Local group should try to pull at least 1 task in this case so imbalance
should be set to 2 instead.
Fixes: fee1759e4f04 ("sched/fair: Determine active load balance for SMT sched groups")
Acked-by: Shrikanth Hegde <[email protected]>
Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Random remaining guard use...
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
call_single_data_t is a size-aligned typedef of struct __call_single_data.
This alignment is desirable in order to have smp_call_function*() avoid
bouncing an extra cacheline in case of an unaligned csd, given this
would hurt performance.
Since the removal of struct request->csd in commit 660e802c76c8
("blk-mq: use percpu csd to remote complete instead of per-rq csd") there
are no current users of smp_call_function*() with unaligned csd.
Change every 'struct __call_single_data' function parameter to
'call_single_data_t', so we have warnings if any new code tries to
introduce an smp_call_function*() call with unaligned csd.
Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Commit 8ac04063354a ("swiotlb: reduce the number of areas to match
actual memory pool size") calculated the reduced number of areas in
swiotlb_init_remap() but didn't actually use the value. Replace usage of
default_nareas accordingly.
Fixes: 8ac04063354a ("swiotlb: reduce the number of areas to match actual memory pool size")
Signed-off-by: Ross Lagerwall <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Fix missing or extra function parameter kernel-doc warnings
in cgroup.c:
kernel/bpf/cgroup.c:1359: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_skb'
kernel/bpf/cgroup.c:1359: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_skb'
kernel/bpf/cgroup.c:1439: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_sk'
kernel/bpf/cgroup.c:1439: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_sk'
kernel/bpf/cgroup.c:1467: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_sock_addr'
kernel/bpf/cgroup.c:1467: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_sock_addr'
kernel/bpf/cgroup.c:1512: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_sock_ops'
kernel/bpf/cgroup.c:1512: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_sock_ops'
kernel/bpf/cgroup.c:1685: warning: Excess function parameter 'type' description in '__cgroup_bpf_run_filter_sysctl'
kernel/bpf/cgroup.c:1685: warning: Function parameter or member 'atype' not described in '__cgroup_bpf_run_filter_sysctl'
kernel/bpf/cgroup.c:795: warning: Excess function parameter 'type' description in '__cgroup_bpf_replace'
kernel/bpf/cgroup.c:795: warning: Function parameter or member 'new_prog' not described in '__cgroup_bpf_replace'
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: [email protected]
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
snprintf() does not return negative error code on error, it returns the
number of characters which *would* be generated for the given input.
Fix the error handling check.
Fixes: 57539b1c0ac2 ("bpf: Enable annotating trusted nested pointers")
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/393bdebc87b22563c08ace094defa7160eb7a6c0.1694190795.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct audit_chunk.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Paul Moore <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: "Gustavo A. R. Silva" <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
|
|
From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), the tailcall on x64 works better than before.
From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
to other BPF programs"), BPF program is able to trace other BPF programs.
How about combining them all together?
1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls the subprogram's caller.
As a result, a tailcall infinite loop comes up. And the loop would halt
the machine.
As we know, in tail call context, the tail_call_cnt propagates by stack
and rax register between BPF subprograms. So do in trampolines.
Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Leon Hwang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
When perf-record with a large AUX area, e.g 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
and it reveals a WARNING with __alloc_pages():
------------[ cut here ]------------
WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
Call trace:
__alloc_pages+0x1ec/0x248
__kmalloc_large_node+0xc0/0x1f8
__kmalloc_node+0x134/0x1e8
rb_alloc_aux+0xe0/0x298
perf_mmap+0x440/0x660
mmap_region+0x308/0x8a8
do_mmap+0x3c0/0x528
vm_mmap_pgoff+0xf4/0x1b8
ksys_mmap_pgoff+0x18c/0x218
__arm64_sys_mmap+0x38/0x58
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x58/0x188
do_el0_svc+0x34/0x50
el0_svc+0x34/0x108
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x1a4/0x1a8
'rb->aux_pages' allocated by kcalloc() is a pointer array which is used to
maintains AUX trace pages. The allocated page for this array is physically
contiguous (and virtually contiguous) with an order of 0..MAX_ORDER. If the
size of pointer array crosses the limitation set by MAX_ORDER, it reveals a
WARNING.
So bail out early with -ENOMEM if the request AUX area is out of bound,
e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Inject fault while probing btrfs.ko, if kstrdup() fails in
eventfs_prepare_ef() in eventfs_add_dir(), it will return ERR_PTR
to assign file->ef. But the eventfs_remove() check NULL in
trace_module_remove_events(), which causes the below NULL
pointer dereference.
As both Masami and Steven suggest, allocater side should handle the
error carefully and remove it, so fix the places where it failed.
Could not create tracefs 'raid56_write' directory
Btrfs loaded, zoned=no, fsverity=no
Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=0000000102544000
[000000000000001c] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in: btrfs(-) libcrc32c xor xor_neon raid6_pq cfg80211 rfkill 8021q garp mrp stp llc ipv6 [last unloaded: btrfs]
CPU: 15 PID: 1343 Comm: rmmod Tainted: G N 6.5.0+ #40
Hardware name: linux,dummy-virt (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : eventfs_remove_rec+0x24/0xc0
lr : eventfs_remove+0x68/0x1d8
sp : ffff800082d63b60
x29: ffff800082d63b60 x28: ffffb84b80ddd00c x27: ffffb84b3054ba40
x26: 0000000000000002 x25: ffff800082d63bf8 x24: ffffb84b8398e440
x23: ffffb84b82af3000 x22: dead000000000100 x21: dead000000000122
x20: ffff800082d63bf8 x19: fffffffffffffff4 x18: ffffb84b82508820
x17: 0000000000000000 x16: 0000000000000000 x15: 000083bc876a3166
x14: 000000000000006d x13: 000000000000006d x12: 0000000000000000
x11: 0000000000000001 x10: 00000000000017e0 x9 : 0000000000000001
x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffb84b84289804
x5 : 0000000000000000 x4 : 9696969696969697 x3 : ffff33a5b7601f38
x2 : 0000000000000000 x1 : ffff800082d63bf8 x0 : fffffffffffffff4
Call trace:
eventfs_remove_rec+0x24/0xc0
eventfs_remove+0x68/0x1d8
remove_event_file_dir+0x88/0x100
event_remove+0x140/0x15c
trace_module_notify+0x1fc/0x230
notifier_call_chain+0x98/0x17c
blocking_notifier_call_chain+0x4c/0x74
__arm64_sys_delete_module+0x1a4/0x298
invoke_syscall+0x44/0x100
el0_svc_common.constprop.1+0x68/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x3c/0xc4
el0t_64_sync_handler+0xa0/0xc4
el0t_64_sync+0x174/0x178
Code: 5400052c a90153b3 aa0003f3 aa0103f4 (f9401400)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: 0x384b00c00000 from 0xffff800080000000
PHYS_OFFSET: 0xffffcc5b80000000
CPU features: 0x88000203,3c020000,1000421b
Memory Limit: none
Rebooting in 1 seconds..
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Cc: Ajay Kaher <[email protected]>
Fixes: 5bdcd5f5331a ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Jinjie Ruan <[email protected]>
Suggested-by: Masami Hiramatsu (Google) <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Commit 5904de0d735b ("PM: hibernate: Do not get block device exclusively
in test_resume mode") fixes a hibernation issue under test_resume mode.
That commit is supposed to open the block device in non-exclusive mode
when in test_resume. However the code does the opposite, which is against
its description.
In summary, the swap device is only opened exclusively by swsusp_check()
with its corresponding *close(), and must be in non test_resume mode.
This is to avoid the race condition that different processes scribble the
device at the same time. All the other cases should use non-exclusive mode.
Fix it by really disabling exclusive mode under test_resume.
Fixes: 5904de0d735b ("PM: hibernate: Do not get block device exclusively in test_resume mode")
Closes: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Pengfei Xu <[email protected]>
Signed-off-by: Chen Yu <[email protected]>
Tested-by: Chenzhou Feng <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Several functions reply on snapshot_test to decide whether to
open the resume device exclusively. However there is no strict
connection between the snapshot_test and the open mode. Rename
the 'snapshot_test' input parameter to 'exclusive' to better reflect
the use case.
No functional change is expected.
Signed-off-by: Chen Yu <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Fix for a bug observable under the following sequence of events:
1. Create a network device that does not support XDP offload.
2. Load a device bound XDP program with BPF_F_XDP_DEV_BOUND_ONLY flag
(such programs are not offloaded).
3. Load a device bound XDP program with zero flags
(such programs are offloaded).
At step (2) __bpf_prog_dev_bound_init() associates with device (1)
a dummy bpf_offload_netdev struct with .offdev field set to NULL.
At step (3) __bpf_prog_dev_bound_init() would reuse dummy struct
allocated at step (2).
However, downstream usage of the bpf_offload_netdev assumes that
.offdev field can't be NULL, e.g. in bpf_prog_offload_verifier_prep().
Adjust __bpf_prog_dev_bound_init() to require bpf_offload_netdev
with non-NULL .offdev for offloaded BPF programs.
Fixes: 2b3486bc2d23 ("bpf: Introduce device-bound XDP programs")
Reported-by: [email protected]
Closes: https://lore.kernel.org/bpf/[email protected]/
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
The bcachefs module wants it, and there doesn't seem to be any
reason it shouldn't be exported like the other functions.
Signed-off-by: Christopher James Halse Rogers <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Sysbot discovered that the queue and stack maps can deadlock if they are
being used from a BPF program that can be called from NMI context (such as
one that is attached to a perf HW counter event). To fix this, add an
in_nmi() check and use raw_spin_trylock() in NMI context, erroring out if
grabbing the lock fails.
Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
Reported-by: Hsin-Wei Hung <[email protected]>
Tested-by: Hsin-Wei Hung <[email protected]>
Co-developed-by: Hsin-Wei Hung <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The synth traces incorrectly print pointer to the synthetic event values
instead of the actual value when using u64 type. Fix by addressing the
contents of the union properly.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Fixes: ddeea494a16f ("tracing/synthetic: Use union instead of casts")
Cc: [email protected]
Signed-off-by: Tero Kristo <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The refscale.verbose_batched and refscale.lookup_instances module
parameters are omitted from the ref_scale_print_module_parms()
beginning-of-test output. This commit therefore adds them.
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
This commit fixes a misplaced data re-read in the typesafe code.
The reason that this was not noticed is that this is a performance test
with no writers, so a mismatch could not occur.
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The rcu_tasks_lazy_ms variable is not used outside the file tasks.h,
so this commit marks it static.
kernel/rcu/tasks.h:1085:5: warning: symbol 'rcu_tasks_lazy_ms' was not declared. Should it be static?
Reported-by: Abaci Robot <[email protected]>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6086
Signed-off-by: Jiapeng Chong <[email protected]>
Reviewed-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The rcu_tasks_need_gpcb() samples ->percpu_dequeue_lim as part of the
condition clause of a "for" loop, which is a bit confusing. This commit
therefore hoists this sampling out of the loop, using the result loaded
in the condition clause.
So why does this work in the face of a concurrent switch from single-CPU
queueing to per-CPU queueing?
o The call_rcu_tasks_generic() that makes the change has already
enqueued its callback, which means that all of the other CPU's
callback queues are empty.
o For the call_rcu_tasks_generic() that first notices
the switch to per-CPU queues, the smp_store_release()
used to update ->percpu_enqueue_lim pairs with the
raw_spin_trylock_rcu_node()'s full barrier that is
between the READ_ONCE(rtp->percpu_enqueue_shift) and the
rcu_segcblist_enqueue() that enqueues the callback.
o Because this CPU's queue is empty (unless it happens to
be the original single queue, in which case there is no
need for synchronization), this call_rcu_tasks_generic()
will do an irq_work_queue() to schedule a handler for the
needed rcuwait_wake_up() call. This call will be ordered
after the first call_rcu_tasks_generic() function's change to
->percpu_dequeue_lim.
o This rcuwait_wake_up() will either happen before or after the
set_current_state() in rcuwait_wait_event(). If it happens
before, the "condition" argument's call to rcu_tasks_need_gpcb()
will be ordered after the original change, and all callbacks on
all CPUs will be visible. Otherwise, if it happens after, then
the grace-period kthread's state will be set back to running,
which will result in a later call to rcuwait_wait_event() and
thus to rcu_tasks_need_gpcb(), which will again see the change.
So it all works out.
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Currently, rcu_tasks_initiate_self_tests() prints a message and then
initiates self tests on up to three different RCU Tasks flavors. If one
of the flavors has a grace-period hang, it is not easy to work out which
of the three hung. This commit therefore prints a message prior to each
individual test.
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
There are instances where rcu_cpu_stall_reset() is called when jiffies
did not get a chance to update for a long time. Before jiffies is
updated, the CPU stall detector can go off triggering false-positives
where a just-started grace period appears to be ages old. In the past,
we disabled stall detection in rcu_cpu_stall_reset() however this got
changed [1]. This is resulting in false-positives in KGDB usecase [2].
Fix this by deferring the update of jiffies to the third run of the FQS
loop. This is more robust, as, even if rcu_cpu_stall_reset() is called
just before jiffies is read, we would end up pushing out the jiffies
read by 3 more FQS loops. Meanwhile the CPU stall detection will be
delayed and we will not get any false positives.
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
Tested with rcutorture.cpu_stall option as well to verify stall behavior
with/without patch.
Tested-by: Huacai Chen <[email protected]>
Reported-by: Binbin Zhou <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Suggested-by: Paul McKenney <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: a80be428fbc1 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
This commit registers an RCU CPU stall notifier when testing RCU CPU
stalls. The notifier logs a message similar to the following:
rcu_torture_stall_nf: v=1, duration=21001.
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
It is sometimes helpful to have a way for the subsystem causing
the stall to dump its state when an RCU CPU stall occurs. This
commit therefore bases rcu_stall_chain_notifier_register() and
rcu_stall_chain_notifier_unregister() on atomic notifiers in order to
provide this functionality.
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The code and comments of self-detected and other-detected RCU CPU stall
warnings are identical except the output function. This commit therefore
refactors so as to consolidate the duplicate code.
Signed-off-by: Zhen Lei <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The stacks of all stalled CPUs will be dumped in rcu_dump_cpu_stacks().
If the CPU on where RCU GP kthread last ran is stalled, its stack does
not need to be dumped again. We can search the corresponding backtrace
based on the printed CPU ID.
For example:
[ 87.328275] rcu: rcu_sched kthread starved for ... ->cpu=3 <--------|
... ... |
[ 89.385007] NMI backtrace for cpu 3 <--------|
[ 89.385179] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.0+ #22 <--|
[ 89.385188] Hardware name: linux,dummy-virt (DT)
[ 89.385196] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[ 89.385204] pc : arch_cpu_idle+0x40/0xc0
[ 89.385211] lr : arch_cpu_idle+0x2c/0xc0
... ...
[ 89.385566] Call trace:
[ 89.385574] arch_cpu_idle+0x40/0xc0
[ 89.385581] default_idle_call+0x100/0x450
[ 89.385589] cpuidle_idle_call+0x2f8/0x460
[ 89.385596] do_idle+0x1dc/0x3d0
[ 89.385604] cpu_startup_entry+0x5c/0xb0
[ 89.385613] secondary_start_kernel+0x35c/0x520
Signed-off-by: Zhen Lei <[email protected]>
Reviewed-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The rcu_check_gp_kthread_starvation() function uses task_cpu() to sample
the last CPU that the grace-period kthread ran on, and task_cpu() samples
the thread_info structure's ->cpu field. But this field will always
contain a number corresponding to a CPU that was online some time in
the past, thus never a negative number. This invariant is checked by
a WARN_ON_ONCE() in set_task_cpu().
This means that if the grace-period kthread exists, that is, if the "gpk"
local variable is non-NULL, the "cpu" local variable will be non-negative.
This in turn means that the existing check for non-negative "cpu" is
redundant with the enclosing check for non-NULL "gpk".
This commit threefore removes the redundant check of "cpu".
Signed-off-by: Zhen Lei <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Add extra check in bpf_mem_alloc_init() to ensure the unit_size of
bpf_mem_cache is matched with the object_size of underlying slab cache.
If these two sizes are unmatched, print a warning once and return
-EINVAL in bpf_mem_alloc_init(), so the mismatch can be found early and
the potential issue can be prevented.
Suggested-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
When the unit_size of a bpf_mem_cache is unmatched with the object_size
of the underlying slab cache, the bpf_mem_cache will not be used, and
the allocation will be redirected to a bpf_mem_cache with a bigger
unit_size instead, so there is no need to prefill for these
unused bpf_mem_caches.
Signed-off-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The following warning was reported when running "./test_progs -a
link_api -a linked_list" on a RISC-V QEMU VM:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342 bpf_mem_refill
Modules linked in: bpf_testmod(OE)
CPU: 3 PID: 261 Comm: test_progs- ... 6.5.0-rc5-01743-gdcb152bb8328 #2
Hardware name: riscv-virtio,qemu (DT)
epc : bpf_mem_refill+0x1fc/0x206
ra : irq_work_single+0x68/0x70
epc : ffffffff801b1bc4 ra : ffffffff8015fe84 sp : ff2000000001be20
gp : ffffffff82d26138 tp : ff6000008477a800 t0 : 0000000000046600
t1 : ffffffff812b6ddc t2 : 0000000000000000 s0 : ff2000000001be70
s1 : ff5ffffffffe8998 a0 : ff5ffffffffe8998 a1 : ff600003fef4b000
a2 : 000000000000003f a3 : ffffffff80008250 a4 : 0000000000000060
a5 : 0000000000000080 a6 : 0000000000000000 a7 : 0000000000735049
s2 : ff5ffffffffe8998 s3 : 0000000000000022 s4 : 0000000000001000
s5 : 0000000000000007 s6 : ff5ffffffffe8570 s7 : ffffffff82d6bd30
s8 : 000000000000003f s9 : ffffffff82d2c5e8 s10: 000000000000ffff
s11: ffffffff82d2c5d8 t3 : ffffffff81ea8f28 t4 : 0000000000000000
t5 : ff6000008fd28278 t6 : 0000000000040000
[<ffffffff801b1bc4>] bpf_mem_refill+0x1fc/0x206
[<ffffffff8015fe84>] irq_work_single+0x68/0x70
[<ffffffff8015feb4>] irq_work_run_list+0x28/0x36
[<ffffffff8015fefa>] irq_work_run+0x38/0x66
[<ffffffff8000828a>] handle_IPI+0x3a/0xb4
[<ffffffff800a5c3a>] handle_percpu_devid_irq+0xa4/0x1f8
[<ffffffff8009fafa>] generic_handle_domain_irq+0x28/0x36
[<ffffffff800ae570>] ipi_mux_process+0xac/0xfa
[<ffffffff8000a8ea>] sbi_ipi_handle+0x2e/0x88
[<ffffffff8009fafa>] generic_handle_domain_irq+0x28/0x36
[<ffffffff807ee70e>] riscv_intc_irq+0x36/0x4e
[<ffffffff812b5d3a>] handle_riscv_irq+0x54/0x86
[<ffffffff812b6904>] do_irq+0x66/0x98
---[ end trace 0000000000000000 ]---
The warning is due to WARN_ON_ONCE(tgt->unit_size != c->unit_size) in
free_bulk(). The direct reason is that a object is allocated and
freed by bpf_mem_caches with different unit_size.
The root cause is that KMALLOC_MIN_SIZE is 64 and there is no 96-bytes
slab cache in the specific VM. When linked_list test allocates a
72-bytes object through bpf_obj_new(), bpf_global_ma will allocate it
from a bpf_mem_cache with 96-bytes unit_size, but this bpf_mem_cache is
backed by 128-bytes slab cache. When the object is freed, bpf_mem_free()
uses ksize() to choose the corresponding bpf_mem_cache. Because the
object is allocated from 128-bytes slab cache, ksize() returns 128,
bpf_mem_free() chooses a 128-bytes bpf_mem_cache to free the object and
triggers the warning.
A similar warning will also be reported when using CONFIG_SLAB instead
of CONFIG_SLUB in a x86-64 kernel. Because CONFIG_SLUB defines
KMALLOC_MIN_SIZE as 8 but CONFIG_SLAB defines KMALLOC_MIN_SIZE as 32.
An alternative fix is to use kmalloc_size_round() in bpf_mem_alloc() to
choose a bpf_mem_cache which has the same unit_size with the backing
slab cache, but it may introduce performance degradation, so fix the
warning by adjusting the indexes in size_index according to the value of
KMALLOC_MIN_SIZE just like setup_kmalloc_cache_index_table() does.
Fixes: 822fb26bdb55 ("bpf: Add a hint to allocated objects.")
Reported-by: Björn Töpel <[email protected]>
Closes: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Before exporting these helpers to modules, make their names more
meaningful.
The names mnt_{get,put)_write_access*() were chosen, because they rhyme
with the inode {get,put)_write_access() helpers, which have a very close
meaning for the inode object.
Suggested-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/r/20230817-anfechtbar-ruhelosigkeit-8c6cca8443fc@brauner/
Signed-off-by: Amir Goldstein <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
The Itanium architecture is obsolete, and an informal survey [0] reveals
that any residual use of Itanium hardware in production is mostly HP-UX
or OpenVMS based. The use of Linux on Itanium appears to be limited to
enthusiasts that occasionally boot a fresh Linux kernel to see whether
things are still working as intended, and perhaps to churn out some
distro packages that are rarely used in practice.
None of the original companies behind Itanium still produce or support
any hardware or software for the architecture, and it is listed as
'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers
that contributed on behalf of those companies (nor anyone else, for that
matter) have been willing to support or maintain the architecture
upstream or even be responsible for applying the odd fix. The Intel
firmware team removed all IA-64 support from the Tianocore/EDK2
reference implementation of EFI in 2018. (Itanium is the original
architecture for which EFI was developed, and the way Linux supports it
deviates significantly from other architectures.) Some distros, such as
Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have
dropped support years ago.
While the argument is being made [1] that there is a 'for the common
good' angle to being able to build and run existing projects such as the
Grid Community Toolkit [2] on Itanium for interoperability testing, the
fact remains that none of those projects are known to be deployed on
Linux/ia64, and very few people actually have access to such a system in
the first place. Even if there were ways imaginable in which Linux/ia64
could be put to good use today, what matters is whether anyone is
actually doing that, and this does not appear to be the case.
There are no emulators widely available, and so boot testing Itanium is
generally infeasible for ordinary contributors. GCC still supports IA-64
but its compile farm [3] no longer has any IA-64 machines. GLIBC would
like to get rid of IA-64 [4] too because it would permit some overdue
code cleanups. In summary, the benefits to the ecosystem of having IA-64
be part of it are mostly theoretical, whereas the maintenance overhead
of keeping it supported is real.
So let's rip off the band aid, and remove the IA-64 arch code entirely.
This follows the timeline proposed by the Debian/ia64 maintainer [5],
which removes support in a controlled manner, leaving IA-64 in a known
good state in the most recent LTS release. Other projects will follow
once the kernel support is removed.
[0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/
[1] https://lore.kernel.org/all/[email protected]/
[2] https://gridcf.org/gct-docs/latest/index.html
[3] https://cfarm.tetaneutral.net/machines/list/
[4] https://lore.kernel.org/all/[email protected]/
[5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/
Acked-by: Tony Luck <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- The kernel now dynamically probes for misaligned access speed, as
opposed to relying on a table of known implementations.
- Support for non-coherent devices on systems using the Andes AX45MP
core, including the RZ/Five SoCs.
- Support for the V extension in ptrace(), again.
- Support for KASLR.
- Support for the BPF prog pack allocator in RISC-V.
- A handful of bug fixes and cleanups.
* tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (25 commits)
soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met
riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config
riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config
riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled
bpf, riscv: use prog pack allocator in the BPF JIT
riscv: implement a memset like function for text
riscv: extend patch_text_nosync() for multiple pages
bpf: make bpf_prog_pack allocator portable
riscv: libstub: Implement KASLR by using generic functions
libstub: Fix compilation warning for rv32
arm64: libstub: Move KASLR handling functions to kaslr.c
riscv: Dump out kernel offset information on panic
riscv: Introduce virtual kernel mapping KASLR
RISC-V: Add ptrace support for vectors
soc: renesas: Kconfig: Select the required configs for RZ/Five SoC
cache: Add L2 cache management for Andes AX45MP RISC-V core
dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller
riscv: mm: dma-noncoherent: nonstandard cache operations support
riscv: errata: Add Andes alternative ports
riscv: asm: vendorid_list: Add Andes Technology to the vendors list
...
|