Age | Commit message (Collapse) | Author | Files | Lines |
|
KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw
sockets uses the pad field in struct inet_diag_req_v2 for the
underlying protocol. This field corresponds to the sdiag_raw_protocol
field in struct inet_diag_req_raw.
inet_diag_get_exact_compat() converts inet_diag_req to
inet_diag_req_v2, but leaves the pad field uninitialized. So the issue
occurs when raw_lookup() accesses the sdiag_raw_protocol field.
Fix this by initializing the pad field in
inet_diag_get_exact_compat(). Also, do the same fix in
inet_diag_dump_compat() to avoid the similar issue in the future.
[1]
BUG: KMSAN: uninit-value in raw_lookup net/ipv4/raw_diag.c:49 [inline]
BUG: KMSAN: uninit-value in raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
raw_lookup net/ipv4/raw_diag.c:49 [inline]
raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
inet_diag_cmd_exact+0x7d9/0x980
inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x332/0x3d0 net/socket.c:745
____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
__sys_sendmsg net/socket.c:2668 [inline]
__do_sys_sendmsg net/socket.c:2677 [inline]
__se_sys_sendmsg net/socket.c:2675 [inline]
__x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was stored to memory at:
raw_sock_get+0x650/0x800 net/ipv4/raw_diag.c:71
raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
inet_diag_cmd_exact+0x7d9/0x980
inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x332/0x3d0 net/socket.c:745
____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
__sys_sendmsg net/socket.c:2668 [inline]
__do_sys_sendmsg net/socket.c:2677 [inline]
__se_sys_sendmsg net/socket.c:2675 [inline]
__x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Local variable req.i created at:
inet_diag_get_exact_compat net/ipv4/inet_diag.c:1396 [inline]
inet_diag_rcv_msg_compat+0x2a6/0x530 net/ipv4/inet_diag.c:1426
sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
CPU: 1 PID: 8888 Comm: syz-executor.6 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Fixes: 432490f9d455 ("net: ip, diag -- Add diag interface for raw sockets")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Shigeru Yoshida <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When we process segments with TCP AO, we don't check it in
tcp_parse_options(). Thus, opt_rx->saw_unknown is set to 1,
which unconditionally triggers the BPF TCP option parser.
Let's avoid the unnecessary BPF invocation.
Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Dmitry Safonov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
A couple copy/paste mistakes in the code that selects steering targets
for OADDRM and INSTANCE0 unintentionally clobbered the steering target
for DSS ranges in some cases.
The OADDRM/INSTANCE0 values were also not assigned as intended, although
that mistake wound up being harmless since the desired values for those
specific ranges were '0' which the kzalloc of the GT structure should
have already taken care of implicitly.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 4f82ac6102788112e599a6074d2c1f2afce923df)
Signed-off-by: Thomas Hellström <[email protected]>
|
|
Don't call drm_suballoc_free with sa_bo pointing to PTR_ERR.
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2120
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: <[email protected]> # v6.8+
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit ce6b63336f79ec5f3996de65f452330e395f99ae)
Signed-off-by: Thomas Hellström <[email protected]>
|
|
Bos can be put with multiple unrelated dma-resv locks held. But
imported bos attempt to grab the bo dma-resv during dma-buf detach
that typically happens during cleanup. That leads to lockde splats
similar to the below and a potential ABBA deadlock.
Fix this by always taking the delayed workqueue cleanup path for
imported bos.
Requesting stable fixes from when the Xe driver was introduced,
since its usage of drm_exec and wide vm dma_resvs appear to be
the first reliable trigger of this.
[22982.116427] ============================================
[22982.116428] WARNING: possible recursive locking detected
[22982.116429] 6.10.0-rc2+ #10 Tainted: G U W
[22982.116430] --------------------------------------------
[22982.116430] glxgears:sh0/5785 is trying to acquire lock:
[22982.116431] ffff8c2bafa539a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: dma_buf_detach+0x3b/0xf0
[22982.116438]
but task is already holding lock:
[22982.116438] ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[22982.116442]
other info that might help us debug this:
[22982.116442] Possible unsafe locking scenario:
[22982.116443] CPU0
[22982.116444] ----
[22982.116444] lock(reservation_ww_class_mutex);
[22982.116445] lock(reservation_ww_class_mutex);
[22982.116447]
*** DEADLOCK ***
[22982.116447] May be due to missing lock nesting notation
[22982.116448] 5 locks held by glxgears:sh0/5785:
[22982.116449] #0: ffff8c2d9aba58c8 (&xef->vm.lock){+.+.}-{3:3}, at: xe_file_close+0xde/0x1c0 [xe]
[22982.116507] #1: ffff8c2e28cc8480 (&vm->lock){++++}-{3:3}, at: xe_vm_close_and_put+0x161/0x9b0 [xe]
[22982.116578] #2: ffff8c2e31982970 (&val->lock){.+.+}-{3:3}, at: xe_validation_ctx_init+0x6d/0x70 [xe]
[22982.116647] #3: ffffacdc469478a8 (reservation_ww_class_acquire){+.+.}-{0:0}, at: xe_vma_destroy_unlocked+0x7f/0xe0 [xe]
[22982.116716] #4: ffff8c2d9aba6da8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x49/0x2b0 [drm_exec]
[22982.116719]
stack backtrace:
[22982.116720] CPU: 8 PID: 5785 Comm: glxgears:sh0 Tainted: G U W 6.10.0-rc2+ #10
[22982.116721] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[22982.116723] Call Trace:
[22982.116724] <TASK>
[22982.116725] dump_stack_lvl+0x77/0xb0
[22982.116727] __lock_acquire+0x1232/0x2160
[22982.116730] lock_acquire+0xcb/0x2d0
[22982.116732] ? dma_buf_detach+0x3b/0xf0
[22982.116734] ? __lock_acquire+0x417/0x2160
[22982.116736] __ww_mutex_lock.constprop.0+0xd0/0x13b0
[22982.116738] ? dma_buf_detach+0x3b/0xf0
[22982.116741] ? dma_buf_detach+0x3b/0xf0
[22982.116743] ? ww_mutex_lock+0x2b/0x90
[22982.116745] ww_mutex_lock+0x2b/0x90
[22982.116747] dma_buf_detach+0x3b/0xf0
[22982.116749] drm_prime_gem_destroy+0x2f/0x40 [drm]
[22982.116775] xe_ttm_bo_destroy+0x32/0x220 [xe]
[22982.116818] ? __mutex_unlock_slowpath+0x3a/0x290
[22982.116821] drm_exec_unlock_all+0xa1/0xd0 [drm_exec]
[22982.116823] drm_exec_fini+0x12/0xb0 [drm_exec]
[22982.116824] xe_validation_ctx_fini+0x15/0x40 [xe]
[22982.116892] xe_vma_destroy_unlocked+0xb1/0xe0 [xe]
[22982.116959] xe_vm_close_and_put+0x41a/0x9b0 [xe]
[22982.117025] ? xa_find+0xe3/0x1e0
[22982.117028] xe_file_close+0x10a/0x1c0 [xe]
[22982.117074] drm_file_free+0x22a/0x280 [drm]
[22982.117099] drm_release_noglobal+0x22/0x70 [drm]
[22982.117119] __fput+0xf1/0x2d0
[22982.117122] task_work_run+0x59/0x90
[22982.117125] do_exit+0x330/0xb40
[22982.117127] do_group_exit+0x36/0xa0
[22982.117129] get_signal+0xbd2/0xbe0
[22982.117131] arch_do_signal_or_restart+0x3e/0x240
[22982.117134] syscall_exit_to_user_mode+0x1e7/0x290
[22982.117137] do_syscall_64+0xa1/0x180
[22982.117139] ? lock_acquire+0xcb/0x2d0
[22982.117140] ? __set_task_comm+0x28/0x1e0
[22982.117141] ? find_held_lock+0x2b/0x80
[22982.117144] ? __set_task_comm+0xe1/0x1e0
[22982.117145] ? lock_release+0xca/0x290
[22982.117147] ? __do_sys_prctl+0x245/0xab0
[22982.117149] ? lockdep_hardirqs_on_prepare+0xde/0x190
[22982.117150] ? syscall_exit_to_user_mode+0xb0/0x290
[22982.117152] ? do_syscall_64+0xa1/0x180
[22982.117154] ? __lock_acquire+0x417/0x2160
[22982.117155] ? reacquire_held_locks+0xd1/0x1f0
[22982.117156] ? do_user_addr_fault+0x30c/0x790
[22982.117158] ? lock_acquire+0xcb/0x2d0
[22982.117160] ? find_held_lock+0x2b/0x80
[22982.117162] ? do_user_addr_fault+0x357/0x790
[22982.117163] ? lock_release+0xca/0x290
[22982.117164] ? do_user_addr_fault+0x361/0x790
[22982.117166] ? trace_hardirqs_off+0x4b/0xc0
[22982.117168] ? clear_bhb_loop+0x45/0xa0
[22982.117170] ? clear_bhb_loop+0x45/0xa0
[22982.117172] ? clear_bhb_loop+0x45/0xa0
[22982.117174] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[22982.117176] RIP: 0033:0x7f943d267169
[22982.117192] Code: Unable to access opcode bytes at 0x7f943d26713f.
[22982.117193] RSP: 002b:00007f9430bffc80 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[22982.117195] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f943d267169
[22982.117196] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00005622f89579d0
[22982.117197] RBP: 00007f9430bffcb0 R08: 0000000000000000 R09: 00000000ffffffff
[22982.117198] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[22982.117199] R13: 0000000000000000 R14: 0000000000000000 R15: 00005622f89579d0
[22982.117202] </TASK>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Christian König <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # v6.8+
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Zijian Zhang says:
====================
fix OOM and order check in msg_zerocopy selftest
In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.
Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.
And, we find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.
Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <[email protected]>
Signed-off-by: Xiaochun Lu <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.
Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.
Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <[email protected]>
Signed-off-by: Xiaochun Lu <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-06-25 (ice)
This series contains updates to ice driver only.
Milena adds disabling of extts events when PTP is disabled.
Jake prevents possible NULL pointer by checking that timestamps are
ready before processing extts events and adds checks for unsupported
PTP pin configuration.
Petr Oros replaces _test_bit() with the correct test_bit() macro.
v1: https://lore.kernel.org/netdev/[email protected]/
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Do not use _test_bit() macro for testing bit. The proper macro for this
is one without underline.
_test_bit() is what test_bit() was prior to const-optimization. It
directly calls arch_test_bit(), i.e. the arch-specific implementation
(or the generic one). It's strictly _internal_ and shouldn't be used
anywhere outside the actual test_bit() macro.
test_bit() is a wrapper which checks whether the bitmap and the bit
number are compile-time constants and if so, it calls the optimized
function which evaluates this call to a compile-time constant as well.
If either of them is not a compile-time constant, it just calls _test_bit().
test_bit() is the actual function to use anywhere in the kernel.
IOW, calling _test_bit() avoids potential compile-time optimizations.
The sensors is not a compile-time constant, thus most probably there
are no object code changes before and after the patch.
But anyway, we shouldn't call internal wrappers instead of
the actual API.
Fixes: 4da71a77fc3b ("ice: read internal temperature sensor")
Acked-by: Ivan Vecera <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Signed-off-by: Petr Oros <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The driver receives requests for configuring pins via the .enable
callback of the PTP clock object. These requests come into the driver
with flags which modify the requested behavior from userspace. Current
implementation in ice does not reject flags that it doesn't support.
This causes the driver to incorrectly apply requests with such flags as
PTP_PEROUT_DUTY_CYCLE, or any future flags added by the kernel which it
is not yet aware of.
Fix this by properly validating flags in both ice_ptp_cfg_perout and
ice_ptp_cfg_extts. Ensure that we check by bit-wise negating supported
flags rather than just checking and rejecting known un-supported flags.
This is preferable, as it ensures better compatibility with future
kernels.
Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The ice_ptp_extts_event() function can race with ice_ptp_release() and
result in a NULL pointer dereference which leads to a kernel panic.
Panic occurs because the ice_ptp_extts_event() function calls
ptp_clock_event() with a NULL pointer. The ice driver has already
released the PTP clock by the time the interrupt for the next external
timestamp event occurs.
To fix this, modify the ice_ptp_extts_event() function to check the
PTP state and bail early if PTP is not ready.
Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Extts events are disabled and enabled by the application ts2phc.
However, in case where the driver is removed when the application is
running, a specific extts event remains enabled and can cause a kernel
crash.
As a side effect, when the driver is reloaded and application is started
again, remaining extts event for the channel from a previous run will
keep firing and the message "extts on unexpected channel" might be
printed to the user.
To avoid that, extts events shall be disabled when PTP is released.
Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <[email protected]>
Co-developed-by: Jacob Keller <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Milena Olech <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
syzkaller reported a KMSAN splat in __unix_walk_scc() while backtracking
edge_stack after finalising SCC.
Let's add a test case exercising the path.
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Shigeru Yoshida <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
KMSAN reported uninit-value access in __unix_walk_scc() [1].
In the list_for_each_entry_reverse() loop, when the vertex's index
equals it's scc_index, the loop uses the variable vertex as a
temporary variable that points to a vertex in scc. And when the loop
is finished, the variable vertex points to the list head, in this case
scc, which is a local variable on the stack (more precisely, it's not
even scc and might underflow the call stack of __unix_walk_scc():
container_of(&scc, struct unix_vertex, scc_entry)).
However, the variable vertex is used under the label prev_vertex. So
if the edge_stack is not empty and the function jumps to the
prev_vertex label, the function will access invalid data on the
stack. This causes the uninit-value access issue.
Fix this by introducing a new temporary variable for the loop.
[1]
BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline]
BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline]
BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
__unix_walk_scc net/unix/garbage.c:478 [inline]
unix_walk_scc net/unix/garbage.c:526 [inline]
__unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
kthread+0x3c4/0x530 kernel/kthread.c:389
ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Uninit was stored to memory at:
unix_walk_scc net/unix/garbage.c:526 [inline]
__unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
kthread+0x3c4/0x530 kernel/kthread.c:389
ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Local variable entries created at:
ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222
netdev_tracker_free include/linux/netdevice.h:4058 [inline]
netdev_put include/linux/netdevice.h:4075 [inline]
dev_put include/linux/netdevice.h:4101 [inline]
update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813
CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: events_unbound __unix_gc
Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Shigeru Yoshida <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In function bond_option_arp_ip_targets_set(), if newval->string is an
empty string, newval->string+1 will point to the byte after the
string, causing an out-of-bound read.
BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418
Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107
CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0xc1/0x5e0 mm/kasan/report.c:475
kasan_report+0xbe/0xf0 mm/kasan/report.c:588
strlen+0x7d/0xa0 lib/string.c:418
__fortify_strlen include/linux/fortify-string.h:210 [inline]
in4_pton+0xa3/0x3f0 net/core/utils.c:130
bond_option_arp_ip_targets_set+0xc2/0x910
drivers/net/bonding/bond_options.c:1201
__bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767
__bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792
bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817
bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156
dev_attr_store+0x54/0x80 drivers/base/core.c:2366
sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x96a/0xd80 fs/read_write.c:584
ksys_write+0x122/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
---[ end trace ]---
Fix it by adding a check of string length before using it.
Fixes: f9de11a16594 ("bonding: add ip checks when store ip target")
Signed-off-by: Yue Sun <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Reviewed-by: Hangbin Liu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The use-after-free is actually in rswitch_tx_free(), which is inlined in
rswitch_poll(). Since `skb` and `gq->skbs[gq->dirty]` are in fact the
same pointer, the skb is first freed using dev_kfree_skb_any(), then the
value in skb->len is used to update the interface statistics.
Let's move around the instructions to use skb->len before the skb is
freed.
This bug is trivial to reproduce using KFENCE. It will trigger a splat
every few packets. A simple ARP request or ICMP echo request is enough.
Fixes: 271e015b9153 ("net: rswitch: Add unmap_addrs instead of dma address in each desc")
Signed-off-by: Radu Rendec <[email protected]>
Reviewed-by: Yoshihiro Shimoda <[email protected]>
Reviewed-by: Niklas Söderlund <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Another improper use of __folio_put() in an error path after freshly
allocating pages/folios which returns them with the refcount initialized
to 1. The refactor from __free_pages() -> __folio_put() (instead of
folio_put) removed a refcount decrement found in __free_pages() and
folio_put but absent from __folio_put().
Fixes: 13df3775efca ("btrfs: cleanup metadata page pointer usage")
CC: [email protected] # 6.8+
Tested-by: Ed Tomlinson <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Boris Burkov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
The conversion to folios switched __free_page() to __folio_put() in the
error path in btrfs_do_encoded_write().
However, this gets the page refcounting wrong. If we do hit that error
path (I reproduced by modifying btrfs_do_encoded_write to pretend to
always fail in a way that jumps to out_folios and running the fstests
case btrfs/281), then we always hit the following BUG freeing the folio:
BUG: Bad page state in process btrfs pfn:40ab0b
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x61be5 pfn:0x40ab0b
flags: 0x5ffff0000000000(node=0|zone=2|lastcpupid=0x1ffff)
raw: 05ffff0000000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000061be5 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: nonzero _refcount
Call Trace:
<TASK>
dump_stack_lvl+0x3d/0xe0
bad_page+0xea/0xf0
free_unref_page+0x8e1/0x900
? __mem_cgroup_uncharge+0x69/0x90
__folio_put+0xe6/0x190
btrfs_do_encoded_write+0x445/0x780
? current_time+0x25/0xd0
btrfs_do_write_iter+0x2cc/0x4b0
btrfs_ioctl_encoded_write+0x2b6/0x340
It turns out __free_page() decreases the page reference count while
__folio_put() does not. Switch __folio_put() to folio_put() which
decreases the folio reference count first.
Fixes: 400b172b8cdc ("btrfs: compression: migrate compression/decompression paths to folios")
Tested-by: Ed Tomlinson <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Boris Burkov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
syzbot reports:
KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
KASAN: slab-uaf int nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
[..]
Workqueue: events nf_tables_trans_destroy_work
Call Trace:
nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Problem is that the notifier does a conditional flush, but its possible
that the table-to-be-removed is still referenced by transactions being
processed by the worker, so we need to flush unconditionally.
We could make the flush_work depend on whether we found a table to delete
in nf-next to avoid the flush for most cases.
AFAICS this problem is only exposed in nf-next, with
commit e169285f8c56 ("netfilter: nf_tables: do not store nft_ctx in transaction objects"),
with this commit applied there is an unconditional fetch of
table->family which is whats triggering the above splat.
Fixes: 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier")
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
When del_timer_sync() is called in an interrupt context it throws a warning
because of potential deadlock. The timer is used only to exit from
wait_for_completion() after a timeout so replacing the call with
wait_for_completion_timeout() allows to remove the problematic timer and
its related functions altogether.
Fixes: 41561f28e76a ("i2c: New Philips PNX bus driver")
Signed-off-by: Piotr Wojtaszczyk <[email protected]>
Signed-off-by: Andi Shyti <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix ioctl conflict with memmapped ring buffer ioctl
It was reported that the ioctl() number used to update the ring buffer
memory mapping conflicted with the TCGETS ioctl causing strace to
report:
$ strace -e ioctl stty
ioctl(0, TCGETS or TRACE_MMAP_IOCTL_GET_READER, {c_iflag=ICRNL|IXON, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0
Since this ioctl hasn't been in a full release yet, change it from
"T", 0x1 to "R" 0x20, and also reserve 0x20-0x2F for future ioctl
commands, as some more are being worked on for the future"
* tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F
|
|
To prevent conflicts with other ioctl numbers to allow strace to have an
idea of what is happening, add the range of ioctls for the trace buffer
mapping from _IO("T", 0x1) to the range of "R" 0x20 - 0x2F.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Cc: Jonathan Corbet <[email protected]>
Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer")
Link: https://lore.kernel.org/[email protected]
Reported-by: "Dmitry V. Levin" <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
If the kexec crash code is called in the interrupt context, the
machine_kexec_mask_interrupts() function will trigger a deadlock while
trying to acquire the irqdesc spinlock and then deactivate irqchip in
irq_set_irqchip_state() function.
Unlike arm64, riscv only requires irq_eoi handler to complete EOI and
keeping irq_set_irqchip_state() will only leave this possible deadlock
without any use. So we simply remove it.
Link: https://lore.kernel.org/linux-riscv/[email protected]/
Fixes: b17d19a5314a ("riscv: kexec: Fixup irq controller broken in kexec crash path")
Signed-off-by: Song Shuai <[email protected]>
Reviewed-by: Ryo Takakura <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
ftrace_graph_ret_addr() takes an `idx` integer pointer that is used to
optimize the stack unwinding. Pass it a valid pointer to utilize the
optimizations that might be available in the future.
The commit is making riscv's usage of ftrace_graph_ret_addr() match
x86_64.
Signed-off-by: Puranjay Mohan <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Clang does not support implicit LMUL in the vset* instruction sequences.
Introduce an explicit LMUL in the vsetivli instruction.
Signed-off-by: Charlie Jenkins <[email protected]>
Fixes: 9d5328eeb185 ("riscv: selftests: Add signal handling vector tests")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Atish Patra <[email protected]> says:
This series contains 3 fixes out of which the first one is a new fix
for invalid event data reported in lkml[2]. The last two are v3 of Samuel's
patch[1]. I added the RB/TB/Fixes tag and moved 1 unrelated change
to its own patch. I also changed an error message in kvm vcpu_pmu from
pr_err to pr_debug to avoid redundant failure error messages generated
due to the boot time quering of events implemented in the patch[1]
Here is the original cover letter for the patch[1]
Before this patch:
$ perf list hw
List of pre-defined events (to be used in -e or -M):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
$ perf stat -ddd true
Performance counter stats for 'true':
4.36 msec task-clock # 0.744 CPUs utilized
1 context-switches # 229.325 /sec
0 cpu-migrations # 0.000 /sec
38 page-faults # 8.714 K/sec
4,375,694 cycles # 1.003 GHz (60.64%)
728,945 instructions # 0.17 insn per cycle
79,199 branches # 18.162 M/sec
17,709 branch-misses # 22.36% of all branches
181,734 L1-dcache-loads # 41.676 M/sec
5,547 L1-dcache-load-misses # 3.05% of all L1-dcache accesses
<not counted> LLC-loads (0.00%)
<not counted> LLC-load-misses (0.00%)
<not counted> L1-icache-loads (0.00%)
<not counted> L1-icache-load-misses (0.00%)
<not counted> dTLB-loads (0.00%)
<not counted> dTLB-load-misses (0.00%)
<not counted> iTLB-loads (0.00%)
<not counted> iTLB-load-misses (0.00%)
<not counted> L1-dcache-prefetches (0.00%)
<not counted> L1-dcache-prefetch-misses (0.00%)
0.005860375 seconds time elapsed
0.000000000 seconds user
0.010383000 seconds sys
After this patch:
$ perf list hw
List of pre-defined events (to be used in -e or -M):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
$ perf stat -ddd true
Performance counter stats for 'true':
5.16 msec task-clock # 0.848 CPUs utilized
1 context-switches # 193.817 /sec
0 cpu-migrations # 0.000 /sec
37 page-faults # 7.171 K/sec
5,183,625 cycles # 1.005 GHz
961,696 instructions # 0.19 insn per cycle
85,853 branches # 16.640 M/sec
20,462 branch-misses # 23.83% of all branches
243,545 L1-dcache-loads # 47.203 M/sec
5,974 L1-dcache-load-misses # 2.45% of all L1-dcache accesses
<not supported> LLC-loads
<not supported> LLC-load-misses
<not supported> L1-icache-loads
<not supported> L1-icache-load-misses
<not supported> dTLB-loads
19,619 dTLB-load-misses
<not supported> iTLB-loads
6,831 iTLB-load-misses
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
0.006085625 seconds time elapsed
0.000000000 seconds user
0.013022000 seconds sys
[1] https://lore.kernel.org/linux-riscv/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
* b4-shazam-merge:
perf: RISC-V: Check standard event availability
drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus
drivers/perf: riscv: Do not update the event data if uptodate
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The RISC-V SBI PMU specification defines several standard hardware and
cache events. Currently, all of these events are exposed to userspace,
even when not actually implemented. They appear in the `perf list`
output, and commands like `perf stat` try to use them.
This is more than just a cosmetic issue, because the PMU driver's .add
function fails for these events, which causes pmu_groups_sched_in() to
prematurely stop scheduling in other (possibly valid) hardware events.
Add logic to check which events are supported by the hardware (i.e. can
be mapped to some counter), so only usable events are reported to
userspace. Since the kernel does not know the mapping between events and
possible counters, this check must happen during boot, when no counters
are in use. Make the check asynchronous to minimize impact on boot time.
Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension")
Signed-off-by: Samuel Holland <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Tested-by: Atish Patra <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Currently, we stop all the counters while a new cpu is brought online.
However, the hpmevent to counter mappings are not reset. The firmware may
have some stale encoding in their mapping structure which may lead to
undesirable results. We have not encountered such scenario though.
Signed-off-by: Samuel Holland <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
In case of an counter overflow, the event data may get corrupted
if called from an external overflow handler. This happens because
we can't update the counter without starting it when SBI PMU
extension is in use. However, the prev_count has been already
updated at the first pass while the counter value is still the
old one.
The solution is simple where we don't need to update it again
if it is already updated which can be detected using hwc state.
The event state in the overflow handler is updated in the following
patch. Thus, this fix can't be backported to kernel version where
overflow support was added.
Fixes: a8625217a054 ("drivers/perf: riscv: Implement SBI PMU snapshot function")
Closes:https://lore.kernel.org/all/[email protected]/
Reported-by: [email protected]
Signed-off-by: Atish Patra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.
Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <[email protected]>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "nilfs2: fix potential issues related to reserved inodes".
This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The dirty throttling logic is interspersed with assumptions that dirty
limits in PAGE_SIZE units fit into 32-bit (so that various multiplications
fit into 64-bits). If limits end up being larger, we will hit overflows,
possible divisions by 0 etc. Fix these problems by never allowing so
large dirty limits as they have dubious practical value anyway. For
dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
so large limits. For dirty_ratio / dirty_background_ratio it isn't so
simple as the dirty limit is computed from the amount of available memory
which can change due to memory hotplug etc. So when converting dirty
limits from ratios to numbers of pages, we just don't allow the result to
exceed UINT_MAX.
This is root-only triggerable problem which occurs when the operator
sets dirty limits to >16 TB.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Reported-by: Zach O'Keefe <[email protected]>
Reviewed-By: Zach O'Keefe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <[email protected]>
Reviewed-By: Zach O'Keefe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jinliang Zheng <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Pull io_uring fix from Jens Axboe:
"A fix for a feature that went into the 6.10 merge window actually
ended up causing a regression in building bundles for receives.
Fix that up by ensuring we don't overwrite msg_inq before we use
it in the loop"
* tag 'io_uring-6.10-20240703' of git://git.kernel.dk/linux:
io_uring/net: don't clear msg_inq before io_recv_buf_select() needs it
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Some fixes related to the IPU6 driver"
* tag 'media/v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: ivsc: Depend on IPU_BRIDGE or not IPU_BRIDGE
media: intel/ipu6: Fix a null pointer dereference in ipu6_isys_query_stream_by_source
media: ipu6: Use the ISYS auxdev device as the V4L2 device's device
|
|
Fix invalid dereferencing of indirect CCW data pointer in
dasd_eckd_dump_sense() that leads to a kernel panic in error cases.
When using indirect addressing for DASD CCWs (IDAW) the CCW CDA pointer
does not contain the data address itself but a pointer to the IDAL.
This needs to be translated from physical to virtual as well before
using it.
This dereferencing is also used for dasd_page_cache and also fixed
although it is very unlikely that this code path ever gets used.
Fixes: c0bd39601c13 ("s390/dasd: use new address translation helpers")
Cc: [email protected]
Signed-off-by: Stefan Haberland <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
iwl_mvm_get_bss_vif might return a NULL or ERR_PTR. Some of the callers
check only the NULL case, and some doesn't check at all.
Some of the callers even have a pointer to the mvmvif of the bss vif,
so we don't even need to call this function, and can simply get the vif
from mvmvif. Do it for those cases, and for the others - properly check
if IS_ERR_OR_NULL
Fixes: ec0d43d26f2c ("wifi: iwlwifi: mvm: Activate EMLSR based on traffic volume")
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://patch.msgid.link/20240703064027.a661f8c65aac.I45cf09b01af8ee3d55828863958ead741ea43b7f@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
We already iterate the link bss_conf/link_info and have the
pointer, or know that deflink/bss_conf is used, so avoid an
extra lookup and just pass the pointer. This may also avoid
a crash when this is processed during restart, where the FW
to link conf array (link_id_to_link_conf) may be NULLed out.
Fixes: c1e458b987f2 ("wifi: iwlwifi: mvm: Move beacon filtering to be per link")
Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Ilan Peer <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://patch.msgid.link/20240703064026.346a6ef67a86.Iba5d65d728ca9f58518c88d029496c1250670544@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
Since we now want to sync the queues even when we're in RFKILL, we
shouldn't wake up the wait queue since we still expect to get all the
notifications from the firmware.
Fixes: 4d08c0b3357c ("wifi: iwlwifi: mvm: handle BA session teardown in RF-kill")
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://patch.msgid.link/20240703064027.be7a9dbeacde.I5586cb3ca8d6e44f79d819a48a0c22351ff720c9@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
The WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK should be set based on the
WOWLAN_KEK_KCK_MATERIAL command version. Currently, the command
version in the firmware has advanced to 4, which prevents the
flag from being set correctly, fix that.
Signed-off-by: Daniel Gabay <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://patch.msgid.link/20240703064026.a0f162108575.If1a9785727d2a1b0197a396680965df1b53d4096@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
Commit 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path")
made sure that the IEs data was manipulated under the relevant RCU section.
Unfortunately, while doing so, the commit brought a faulty implicit cast
from int to u8 on the ies_len variable, making the parsing fail to be
performed correctly if the IEs block is larger than 255 bytes. This failure
can be observed with Access Points appending a lot of IEs TLVs in their
beacon frames (reproduced with a Pixel phone acting as an Access Point,
which brough 273 bytes of IE data in my testing environment).
Fix IEs parsing by removing this undesired implicit cast.
Fixes: 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path")
Signed-off-by: Jozef Hopko <[email protected]>
Signed-off-by: Alexis Lothoré <[email protected]>
Acked-by: Ajay Singh <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://patch.msgid.link/[email protected]
|
|
bgpio_bits must be aligned with the data bus width. For example, on a
32 bit big endian system and we only have 16 GPIOs. If we only assume
bgpio_bits=16 we can never control the GPIO because the base address
is the lowest address.
low address high address
-------------------------------------------------
| byte3 | byte2 | byte1 | byte0 |
-------------------------------------------------
| NaN | NaN | gpio8-15 | gpio0-7 |
-------------------------------------------------
Fixes: 55b2395e4e92 ("gpio: mmio: handle "ngpios" properly in bgpio_init()")
Fixes: https://github.com/openwrt/openwrt/issues/15739
Reported-by: Mark Mentovai <[email protected]>
Signed-off-by: Shiji Yang <[email protected]>
Suggested-By: Mark Mentovai <[email protected]>
Reviewed-by: Jonas Gorski <[email protected]>
Tested-by: Lóránd Horváth <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/TYCP286MB089577B47D70F0AB25ABA6F5BCD52@TYCP286MB0895.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
The processing of a Transfer-In-Channel (TIC) CCW requires locating
the target of the CCW in the channel program, and updating the
address to reflect what will actually be sent to hardware.
An error exists where the 64-bit virtual address is truncated to
32-bits (variable "cda") when performing this math. Since s390
addresses of that size are 31-bits, this leaves that additional
bit enabled such that the resulting I/O triggers a channel
program check. This shows up occasionally when booting a KVM
guest from a passthrough DASD device:
..snip...
Interrupt Response Block Data:
: 0x0000000000003990
Function Ctrl : [Start]
Activity Ctrl :
Status Ctrl : [Alert] [Primary] [Secondary] [Status-Pending]
Device Status :
Channel Status : [Program-Check]
cpa=: 0x00000000008d0018
prev_ccw=: 0x0000000000000000
this_ccw=: 0x0000000000000000
...snip...
dasd-ipl: Failed to run IPL1 channel program
The channel program address of "0x008d0018" in the IRB doesn't
look wrong, but tracing the CCWs shows the offending bit enabled:
ccw=0x0000012e808d0000 cda=00a0b030
ccw=0x0000012e808d0008 cda=00a0b038
ccw=0x0000012e808d0010 cda=808d0008
ccw=0x0000012e808d0018 cda=00a0b040
Fix the calculation of the TIC CCW's data address such that it points
to a valid 31-bit address regardless of the input address.
Fixes: bd36cfbbb9e1 ("s390/vfio_ccw_cp: use new address translation helpers")
Signed-off-by: Eric Farman <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Commit 750011e239a5 ("net: stmmac: Add support for HW-accelerated VLAN
stripping") enables MAC level VLAN tag stripping for all MAC cores, but
leaves set_hw_vlan_mode() and rx_hw_vlan() un-implemented for both gmac
and xgmac.
On gmac and xgmac, ethtool reports rx-vlan-offload is on, both MAC and
driver do nothing about VLAN packets actually, although VLAN works well.
Driver level stripping should be used on gmac and xgmac for now.
Fixes: 750011e239a5 ("net: stmmac: Add support for HW-accelerated VLAN stripping")
Signed-off-by: Furong Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Starting with kernel 6.7, the framebuffer text console is not working
anymore with the virtio-gpu device on s390x hosts. Such big endian fb
devices are usinga different pixel ordering than little endian devices,
e.g. DRM_FORMAT_BGRX8888 instead of DRM_FORMAT_XRGB8888.
This used to work fine as long as drm_client_buffer_addfb() was still
calling drm_mode_addfb() which called drm_driver_legacy_fb_format()
internally to get the right format. But drm_client_buffer_addfb() has
recently been reworked to call drm_mode_addfb2() instead with the
format value that has been passed to it as a parameter (see commit
6ae2ff23aa43 ("drm/client: Convert drm_client_buffer_addfb() to drm_mode_addfb2()").
That format parameter is determined in drm_fbdev_generic_helper_fb_probe()
via the drm_mode_legacy_fb_format() function - which only generates
formats suitable for little endian devices. So to fix this issue
switch to drm_driver_legacy_fb_format() here instead to take the
device endianness into consideration.
Fixes: 6ae2ff23aa43 ("drm/client: Convert drm_client_buffer_addfb() to drm_mode_addfb2()")
Closes: https://issues.redhat.com/browse/RHEL-45158
Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Acked-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A sync-only job is meant to provide a synchronization point on a
queue, so we can't return a NULL fence there, we have to add a signal
operation to the command stream which executes after all other
previously submitted jobs are done.
v2:
- Fixed a UAF bug
- Added R-bs
Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Liviu Dudau <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The user is likely to leave all the drm_panthor_obj_array fields
to zero when the array is empty, which will cause an EINVAL failure.
v2:
- Added R-bs
Fixes: 4bdca1150792 ("drm/panthor: Add the driver frontend block")
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Liviu Dudau <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|