This patch fixes inconsistencies in the parsing rules of levels 1 and 2
of kselftest_deps.sh. It also adds levels 4 and 5 to account for a few
edge cases present in some tests, and fixes some minor indentation
styling (s/ /\t/g).
Signed-off-by: Ricardo B. Marliere <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
Timeouts in kselftest are done using the "timeout" command with the
"--foreground" option. Without the "foreground" option, it is not
possible for a user to cancel the runner using SIGINT, because the
signal is not propagated to timeout which is running in a different
process group. The "--foreground" option places timeout in the same
process group as its parent, but on timeout it only sends SIGTERM to
the forked process. Unfortunately, this does not play nicely
with all kselftests, e.g. "net:fcnal-test.sh", where the child
processes will linger because timeout does not send SIGTERM to the
group.
Some users have noted these hangs [1].
Fix this by nesting the timeout with an additional timeout without the
foreground option.
Link: https://lore.kernel.org/all/[email protected]/ # [1]
Fixes: 651e0d881461 ("kselftest/runner: allow to properly deliver signals to tests")
Signed-off-by: Björn Töpel <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
The wiki has been archived and is not updated anymore. Remove or replace
the links in files that contain it (MAINTAINERS, Kconfig, docs).
Signed-off-by: Bhaskar Chowdhury <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
When removing a delayed item, or releasing it (which also removes it),
we modify one of the delayed node's rbtrees and its item counter if the
delayed item is in one of the rbtrees. This requires the delayed node's
mutex to be locked, otherwise we race with other tasks modifying the
rbtrees and the counter.
This is motivated by a previous version of another patch that actually
called btrfs_release_delayed_item() after unlocking the delayed node's
mutex, on a delayed item that was in an rbtree.
So assert at __btrfs_remove_delayed_item() that the delayed node's mutex
is locked.
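As a rough user-space analogy of the assertion described above (not the btrfs code), the sketch below pairs a pthread mutex with a debug flag and checks it inside the removal helper, the way a lockdep_assert_held()-style check would in the kernel. The struct and function names are hypothetical, and unlike lockdep the flag only records that some thread holds the mutex, not which one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical delayed node: a mutex, a debug flag and a counter. */
struct delayed_node_sketch {
	pthread_mutex_t mutex;
	bool mutex_held;	/* debug aid playing the role of lockdep */
	int count;		/* item counter protected by @mutex */
};

static void node_lock(struct delayed_node_sketch *n)
{
	pthread_mutex_lock(&n->mutex);
	n->mutex_held = true;
}

static void node_unlock(struct delayed_node_sketch *n)
{
	n->mutex_held = false;
	pthread_mutex_unlock(&n->mutex);
}

/* Removing an item modifies shared state, so insist the mutex is held. */
static void remove_item_sketch(struct delayed_node_sketch *n)
{
	assert(n->mutex_held);	/* analogous to asserting the mutex is held */
	n->count--;
}
```

Calling remove_item_sketch() between node_lock() and node_unlock() passes; calling it after node_unlock(), as the earlier patch version effectively did, trips the assertion.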
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Instead of calling BUG() when we fail to insert a delayed dir index item
into the delayed node's tree, we can just release all the resources we
have allocated/acquired before and return the error to the caller. This is
fine because all existing call chains undo anything they have done before
calling btrfs_insert_delayed_dir_index() or BUG_ON (when creating pending
snapshots in the transaction commit path).
So remove the BUG() call and do proper error handling.
This relates to a syzbot report linked below, but does not fix it: it
only prevents hitting a BUG(), and does not fix the issue where we somehow
attempt to use the same index number twice for different index items.
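A minimal sketch of the "undo what we acquired and return the error" pattern that replaces the BUG() call, assuming a toy node with a fixed-size index array and a reservation counter standing in for the resources acquired before the insertion. All names here are hypothetical; the real function works on the delayed node's rbtree.

```c
#include <assert.h>
#include <errno.h>

#define MAX_ITEMS 8

/* Hypothetical stand-in for a delayed node's dir index tree. */
struct node_sketch {
	unsigned long long index[MAX_ITEMS];
	int nr;
	int reserved;	/* models metadata reserved for pending items */
};

/* Insert @idx; fail with -EEXIST if that index number is already used. */
static int tree_insert(struct node_sketch *n, unsigned long long idx)
{
	for (int i = 0; i < n->nr; i++)
		if (n->index[i] == idx)
			return -EEXIST;
	if (n->nr == MAX_ITEMS)
		return -ENOSPC;
	n->index[n->nr++] = idx;
	return 0;
}

static int insert_dir_index_sketch(struct node_sketch *n, unsigned long long idx)
{
	int ret;

	n->reserved++;			/* acquire: reserve space for the item */
	ret = tree_insert(n, idx);
	if (ret) {
		n->reserved--;		/* undo what this function acquired */
		return ret;		/* the caller unwinds its own work */
	}
	return 0;
}
```

A duplicate index now surfaces as -EEXIST with the reservation released, instead of crashing the machine.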
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
CC: [email protected] # 5.4+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
If we fail to add a delayed dir index item because there's already another
item with the same index number, we print an error message (and then BUG).
However, that message isn't very helpful for debugging because it doesn't
tell us the index number or the values of the index counters in the inode
and its delayed inode (the index_cnt fields of struct btrfs_inode and
struct btrfs_delayed_node).
So update the error message to include the index number and counters.
We actually had a recent case where this issue was hit by a syzbot report
(see the link below).
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
[BUG]
After commit 72a69cd03082 ("btrfs: subpage: pack all subpage bitmaps
into a larger bitmap"), the DEBUG section of btree_dirty_folio() would
no longer compile.
[CAUSE]
If DEBUG is defined, we would do extra checks for btree_dirty_folio(),
mostly to make sure the range we marked dirty has an extent buffer and
that extent buffer is dirty.
For subpage, we need to iterate through all the extent buffers covered
by that page range, and make sure they all match the criteria.
However, commit 72a69cd03082 ("btrfs: subpage: pack all subpage bitmaps
into a larger bitmap") changed how we store the bitmaps: all the 16-bit
bitmaps are packed into one larger bitmap, which saves some space.
This means we no longer have btrfs_subpage::dirty_bitmap; instead, the
dirty bitmap starts at btrfs_subpage_info::dirty_offset and has a
length of btrfs_subpage_info::bitmap_nr_bits.
[FIX]
Although I'm not sure it still makes sense to maintain such code, at
least let it compile.
This patch makes the check test the bits one by one through the packed
bitmap.
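The packed layout can be illustrated with a small user-space sketch: each per-state bitmap occupies bitmap_nr_bits bits starting at its own offset inside one larger bitmap, so the dirty state of sector i lives at bit dirty_offset + i. The struct subpage_info_sketch and the helper names are hypothetical stand-ins, not the btrfs definitions, and the 16-bit sizing in the usage below assumes a 64K page with 4K sectors.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for btrfs_subpage_info: sub-bitmap offsets. */
struct subpage_info_sketch {
	unsigned int bitmap_nr_bits;	/* bits per sub-bitmap */
	unsigned int uptodate_offset;
	unsigned int dirty_offset;
};

static bool sketch_test_bit(const unsigned long *map, unsigned int nr)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

static void sketch_set_bit(unsigned long *map, unsigned int nr)
{
	map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Is sector @i dirty? Look it up inside the packed bitmap. */
static bool sector_is_dirty(const struct subpage_info_sketch *info,
			    const unsigned long *packed, unsigned int i)
{
	assert(i < info->bitmap_nr_bits);
	return sketch_test_bit(packed, info->dirty_offset + i);
}

/* Count dirty sectors by testing the dirty sub-bitmap bit by bit. */
static unsigned int count_dirty_sectors(const struct subpage_info_sketch *info,
					const unsigned long *packed)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < info->bitmap_nr_bits; i++)
		if (sector_is_dirty(info, packed, i))
			n++;
	return n;
}
```

Setting bit 3 (an uptodate bit) and bit dirty_offset + 5 in the packed array makes only sector 5 report as dirty, which is exactly the offset arithmetic the old dirty_bitmap field used to hide.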
CC: [email protected] # 6.1+
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
If we do fast tree logging we increment a counter on the current
transaction for every ordered extent we need to wait for. This means we
expect the transaction to still be there when we clear pending on the
ordered extent. However if we happen to abort the transaction and clean
it up, there could be no running transaction, and thus we'll trip the
"ASSERT(trans)" check. This assertion is obviously wrong, since the code
properly deals with the case where the transaction doesn't exist. Fix
this ASSERT() to only fire if there's no trans and we don't have
BTRFS_FS_ERROR() set on the file system.
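A minimal sketch of the relaxed check, using hypothetical structs in place of the transaction and fs_info, a plain bool in place of BTRFS_FS_ERROR(), and the standard assert() in place of the kernel's ASSERT() macro. It only illustrates the condition "a missing transaction is acceptable once the file system is in an error state".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transaction and the fs_info. */
struct trans_sketch {
	int pending_ordered;	/* ordered extents the log commit waits on */
};

struct fs_sketch {
	bool fs_error;			/* models BTRFS_FS_ERROR() being set */
	struct trans_sketch *running_trans;
};

/* Clear "pending" for one ordered extent; returns true if counted down. */
static bool clear_pending_sketch(struct fs_sketch *fs)
{
	struct trans_sketch *trans = fs->running_trans;

	/* Old check was assert(trans), which fired after an abort. */
	assert(trans || fs->fs_error);
	if (!trans)
		return false;	/* aborted and cleaned up: nothing to do */
	trans->pending_ordered--;
	return true;
}
```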
CC: [email protected] # 4.14+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
delayed items
When running delayed items we are holding a delayed node's mutex and then
we will attempt to modify a subvolume btree to insert/update/delete the
delayed items. However, if we have an error during the insertions, for example,
btrfs_insert_delayed_items() may return with a path that has locked extent
buffers (a leaf at the very least), and then we attempt to release the
delayed node at __btrfs_run_delayed_items(), which requires taking the
delayed node's mutex, causing an ABBA type of deadlock. This was reported
by syzbot and the lockdep splat is the following:
WARNING: possible circular locking dependency detected
6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0 Not tainted
------------------------------------------------------
syz-executor.2/13257 is trying to acquire lock:
ffff88801835c0c0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256
but task is already holding lock:
ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (btrfs-tree-00){++++}-{3:3}:
__lock_release kernel/locking/lockdep.c:5475 [inline]
lock_release+0x36f/0x9d0 kernel/locking/lockdep.c:5781
up_write+0x79/0x580 kernel/locking/rwsem.c:1625
btrfs_tree_unlock_rw fs/btrfs/locking.h:189 [inline]
btrfs_unlock_up_safe+0x179/0x3b0 fs/btrfs/locking.c:239
search_leaf fs/btrfs/ctree.c:1986 [inline]
btrfs_search_slot+0x2511/0x2f80 fs/btrfs/ctree.c:2230
btrfs_insert_empty_items+0x9c/0x180 fs/btrfs/ctree.c:4376
btrfs_insert_delayed_item fs/btrfs/delayed-inode.c:746 [inline]
btrfs_insert_delayed_items fs/btrfs/delayed-inode.c:824 [inline]
__btrfs_commit_inode_delayed_items+0xd24/0x2410 fs/btrfs/delayed-inode.c:1111
__btrfs_run_delayed_items+0x1db/0x430 fs/btrfs/delayed-inode.c:1153
flush_space+0x269/0xe70 fs/btrfs/space-info.c:723
btrfs_async_reclaim_metadata_space+0x106/0x350 fs/btrfs/space-info.c:1078
process_one_work+0x92c/0x12c0 kernel/workqueue.c:2600
worker_thread+0xa63/0x1210 kernel/workqueue.c:2751
kthread+0x2b8/0x350 kernel/kthread.c:389
ret_from_fork+0x2e/0x60 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
-> #0 (&delayed_node->mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
__mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603
__mutex_lock kernel/locking/mutex.c:747 [inline]
mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799
__btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256
btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline]
__btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156
btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276
btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988
vfs_fsync_range fs/sync.c:188 [inline]
vfs_fsync fs/sync.c:202 [inline]
do_fsync fs/sync.c:212 [inline]
__do_sys_fsync fs/sync.c:220 [inline]
__se_sys_fsync fs/sync.c:218 [inline]
__x64_sys_fsync+0x196/0x1e0 fs/sync.c:218
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(btrfs-tree-00);
lock(&delayed_node->mutex);
lock(btrfs-tree-00);
lock(&delayed_node->mutex);
*** DEADLOCK ***
3 locks held by syz-executor.2/13257:
#0: ffff88802c1ee370 (btrfs_trans_num_writers){++++}-{0:0}, at: spin_unlock include/linux/spinlock.h:391 [inline]
#0: ffff88802c1ee370 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0xb87/0xe00 fs/btrfs/transaction.c:287
#1: ffff88802c1ee398 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0xbb2/0xe00 fs/btrfs/transaction.c:288
#2: ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198
stack backtrace:
CPU: 0 PID: 13257 Comm: syz-executor.2 Not tainted 6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
__mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603
__mutex_lock kernel/locking/mutex.c:747 [inline]
mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799
__btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256
btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline]
__btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156
btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276
btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988
vfs_fsync_range fs/sync.c:188 [inline]
vfs_fsync fs/sync.c:202 [inline]
do_fsync fs/sync.c:212 [inline]
__do_sys_fsync fs/sync.c:220 [inline]
__se_sys_fsync fs/sync.c:218 [inline]
__x64_sys_fsync+0x196/0x1e0 fs/sync.c:218
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f3ad047cae9
Code: 28 00 00 00 75 (...)
RSP: 002b:00007f3ad12510c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
RAX: ffffffffffffffda RBX: 00007f3ad059bf80 RCX: 00007f3ad047cae9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007f3ad04c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f3ad059bf80 R15: 00007ffe56af92f8
</TASK>
------------[ cut here ]------------
Fix this by releasing the path before releasing the delayed node in the
error path at __btrfs_run_delayed_items().
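The ordering problem and its fix can be modelled with a toy lock-state tracker, assuming (as the first chain in the splat shows) that elsewhere the delayed node's mutex is taken before the tree locks. The names are hypothetical and nothing here is the btrfs API; the point is only that dropping the path's tree locks before taking the node mutex keeps a single lock order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock bookkeeping, a crude stand-in for lockdep. */
struct lock_state {
	bool tree_locked;	/* path still holds an extent buffer lock */
	bool order_violated;	/* node mutex taken while tree lock held */
};

static void release_path_sketch(struct lock_state *ls)
{
	ls->tree_locked = false;
}

/* Elsewhere the order is "node mutex, then tree lock"; flag the reverse. */
static void lock_node_mutex_sketch(struct lock_state *ls)
{
	if (ls->tree_locked)
		ls->order_violated = true;
}

/* Error path after a failed insertion; returns true on an ABBA ordering. */
static bool run_items_error_path(bool fixed)
{
	struct lock_state ls = { .tree_locked = true, .order_violated = false };

	if (fixed)
		release_path_sketch(&ls);	/* fix: drop tree locks first */
	lock_node_mutex_sketch(&ls);		/* releasing the delayed node */
	release_path_sketch(&ls);		/* old code released it only here */
	return ls.order_violated;
}
```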
Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
CC: [email protected] # 4.14+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Internally I got a report of very long stalls on normal operations like
creating a new file when auto relocation was running. The reporter used
the 'bpf offcputime' tracer to show that we would get stuck in
start_transaction for 5 to 30 seconds, and were always being woken up by
the transaction commit.
Using my timing-everything script, which times how long a function takes
and what percentage of that total time is taken up by its children, I
saw several traces like this:
1083 took 32812902424 ns
29929002926 ns 91.2110% wait_for_commit_duration
25568 ns 7.7920e-05% commit_fs_roots_duration
1007751 ns 0.00307% commit_cowonly_roots_duration
446855602 ns 1.36182% btrfs_run_delayed_refs_duration
271980 ns 0.00082% btrfs_run_delayed_items_duration
2008 ns 6.1195e-06% btrfs_apply_pending_changes_duration
9656 ns 2.9427e-05% switch_commit_roots_duration
1598 ns 4.8700e-06% btrfs_commit_device_sizes_duration
4314 ns 1.3147e-05% btrfs_free_log_root_tree_duration
Here I was only tracing functions that run while we are between
START_COMMIT and UNBLOCKED, in order to see what was keeping us
blocked for so long. The wait_for_commit() we do is where we wait for a
previous transaction that hasn't completed its commit. This can
include all of the unpin work and other cleanups, which tends to be the
longest part of our transaction commit.
There is no reason to block new things from entering the transaction at
this point; doing so just adds random latency spikes.
Fix this by adding a PREP stage. This allows us to properly deal with
multiple committers coming in at the same time: we retain the behavior
that the winner waits on the previous transaction and the losers all
wait for this transaction commit to occur. Nothing else is blocked
during the PREP stage, and then once the wait is complete we switch to
COMMIT_START and all of the same behavior as before is maintained.
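A simplified state-machine sketch of the PREP idea, with hypothetical state names loosely modelled on the ones mentioned above: the winning committer moves the transaction into a PREP state while it waits for the previous commit, and only the later states (from COMMIT_START until UNBLOCKED) block new joiners. It is an illustration under those assumptions, not the btrfs implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified transaction states, in commit order. */
enum trans_state_sketch {
	STATE_RUNNING,
	STATE_COMMIT_PREP,	/* new: winner waits for the previous commit */
	STATE_COMMIT_START,
	STATE_COMMIT_DOING,
	STATE_UNBLOCKED,
	STATE_COMPLETED,
};

/* Must a new start_transaction()-style caller block on this state? */
static bool joiner_blocked(enum trans_state_sketch s)
{
	/* PREP is deliberately outside the blocking window. */
	return s >= STATE_COMMIT_START && s < STATE_UNBLOCKED;
}

/* Returns true if the caller wins and drives the commit; losers wait. */
static bool try_become_committer(enum trans_state_sketch *s)
{
	if (*s != STATE_RUNNING)
		return false;
	*s = STATE_COMMIT_PREP;
	return true;
}

/* Winner: called once the previous transaction's commit has finished. */
static void previous_commit_done(enum trans_state_sketch *s)
{
	assert(*s == STATE_COMMIT_PREP);
	*s = STATE_COMMIT_START;	/* from here on, behaviour is unchanged */
}
```

The wait_for_commit_duration seen in the trace above now elapses while the state is STATE_COMMIT_PREP, where joiner_blocked() is false.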
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
During the ino lookup ioctl we can end up calling btrfs_iget() to get an
inode reference while we are holding a path on a root's btree. If btrfs_iget()
needs to lookup the inode from the root's btree, because it's not
currently loaded in memory, then it will need to lock another or the
same path in the same root btree. This may result in a deadlock and
trigger the following lockdep splat:
WARNING: possible circular locking dependency detected
6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Not tainted
------------------------------------------------------
syz-executor277/5012 is trying to acquire lock:
ffff88802df41710 (btrfs-tree-01){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
but task is already holding lock:
ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (btrfs-tree-00){++++}-{3:3}:
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_search_slot+0x13a4/0x2f80 fs/btrfs/ctree.c:2302
btrfs_init_root_free_objectid+0x148/0x320 fs/btrfs/disk-io.c:4955
btrfs_init_fs_root fs/btrfs/disk-io.c:1128 [inline]
btrfs_get_root_ref+0x5ae/0xae0 fs/btrfs/disk-io.c:1338
btrfs_get_fs_root fs/btrfs/disk-io.c:1390 [inline]
open_ctree+0x29c8/0x3030 fs/btrfs/disk-io.c:3494
btrfs_fill_super+0x1c7/0x2f0 fs/btrfs/super.c:1154
btrfs_mount_root+0x7e0/0x910 fs/btrfs/super.c:1519
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
fc_mount fs/namespace.c:1112 [inline]
vfs_kern_mount+0xbc/0x150 fs/namespace.c:1142
btrfs_mount+0x39f/0xb50 fs/btrfs/super.c:1579
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (btrfs-tree-01){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(btrfs-tree-00);
lock(btrfs-tree-01);
lock(btrfs-tree-00);
rlock(btrfs-tree-01);
*** DEADLOCK ***
1 lock held by syz-executor277/5012:
#0: ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
stack backtrace:
CPU: 1 PID: 5012 Comm: syz-executor277 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
__btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f0bec94ea39
Fix this simply by releasing the path before calling btrfs_iget(), as at
that point we don't need the path anymore.
Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl")
CC: [email protected] # 4.19+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Commit 675dfe1223a6 ("btrfs: fix block group item corruption after
inserting new block group") fixed one race that resulted in not persisting
a block group's item when its "used" bytes field decreases to zero.
However there's another race that can happen in a much shorter time window
that results in the same problem. The following sequence of steps explains
how it can happen:
1) Task A creates a metadata block group X, its "used" and "commit_used"
fields are initialized to 0;
2) Two extents are allocated from block group X, so its "used" field is
updated to 32K, and its "commit_used" field remains as 0;
3) Transaction commit starts, by some task B, and it enters
btrfs_start_dirty_block_groups(). There it tries to update the block
group item for block group X, which currently has its "used" field with
a value of 32K and its "commit_used" field with a value of 0. However
that fails since the block group item was not yet inserted, so at
update_block_group_item(), the btrfs_search_slot() call returns 1, and
then we set 'ret' to -ENOENT. Before jumping to the label 'fail'...
4) The block group item is inserted by task A, when for example
btrfs_create_pending_block_groups() is called when releasing its
transaction handle. This results in insert_block_group_item() inserting
the block group item in the extent tree (or block group tree), with a
"used" field having a value of 32K and setting "commit_used", in struct
btrfs_block_group, to the same value (32K);
5) Task B jumps to the 'fail' label and then resets the "commit_used"
field to 0. At btrfs_start_dirty_block_groups(), because -ENOENT was
returned from update_block_group_item(), we add the block group again
to the list of dirty block groups, so that we will try again in the
critical section of the transaction commit when calling
btrfs_write_dirty_block_groups();
6) Later the two extents from block group X are freed, so its "used" field
becomes 0;
7) If no more extents are allocated from block group X before we get into
btrfs_write_dirty_block_groups(), then when we call
update_block_group_item() again for block group X, we will not update
the block group item to reflect that it has 0 bytes used, because the
"used" and "commit_used" fields in struct btrfs_block_group have the
same value, a value of 0.
As a result after committing the transaction we have an empty block
group with its block group item having a 32K value for its "used" field.
This will trigger errors from fsck ("btrfs check" command) and, after
mounting the fs again, the cleaner kthread will not automatically delete
the empty block group, since its "used" field is not 0. There may also
be other issues due to this inconsistency.
When this issue happens, the error reported by fsck is like this:
[1/7] checking root items
[2/7] checking extents
block group [1104150528 1073741824] used 39796736 but extent items used 0
ERROR: errors found in extent allocation tree or chunk allocation
(...)
So fix this by not resetting the "commit_used" field of a block group when
we don't find the block group item at update_block_group_item().
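The race can be replayed deterministically in a small model. It assumes, as the 'fail' reset implies, that update_block_group_item() saves the old "commit_used", sets it to "used" before searching for the item, and reverts it on error; the struct and helpers are hypothetical simplifications, and the "fixed" flag skips the revert for -ENOENT as described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of struct btrfs_block_group. */
struct bg_sketch {
	uint64_t used;		/* in-memory "used" bytes */
	uint64_t commit_used;	/* "used" value we believe is persisted */
	bool item_exists;	/* block group item inserted in the tree? */
	uint64_t item_used;	/* "used" value stored in that item */
};

/* First half of the update: remember old commit_used, then bump it. */
static uint64_t update_begin(struct bg_sketch *bg)
{
	uint64_t old = bg->commit_used;

	bg->commit_used = bg->used;
	return old;
}

/* 'fail' label: the item lookup returned @ret; maybe revert commit_used. */
static void update_fail(struct bg_sketch *bg, uint64_t old, int ret, bool fixed)
{
	if (ret < 0 && (!fixed || ret != -ENOENT))
		bg->commit_used = old;
}

/* Inserting the item persists "used" and records it as committed. */
static void insert_item(struct bg_sketch *bg)
{
	bg->item_exists = true;
	bg->item_used = bg->used;
	bg->commit_used = bg->used;
}

/* Later update: skipped entirely when "used" equals "commit_used". */
static void update_item(struct bg_sketch *bg)
{
	if (bg->used == bg->commit_used)
		return;
	bg->commit_used = bg->used;
	if (bg->item_exists)
		bg->item_used = bg->used;
}

/* Replay steps 1) to 7); return the "used" value left in the item. */
static uint64_t replay_race(bool fixed)
{
	struct bg_sketch bg = { 0 };		/* 1) new block group */
	uint64_t old;

	bg.used = 32 * 1024;			/* 2) two extents allocated */
	old = update_begin(&bg);		/* 3) task B: search finds no item */
	insert_item(&bg);			/* 4) task A inserts the item */
	update_fail(&bg, old, -ENOENT, fixed);	/* 5) task B hits 'fail' */
	bg.used = 0;				/* 6) both extents freed */
	update_item(&bg);			/* 7) critical-section update */
	return bg.item_used;
}
```

With the old revert, replay_race(false) leaves 32K in the item of an empty block group; skipping the revert for -ENOENT makes step 7 see 0 != 32K and persist 0.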
Fixes: 7248e0cebbef ("btrfs: skip update of block group item if used bytes are the same")
CC: [email protected] # 6.2+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This reverts commit 3fa6456ebe13adab3ba1817c8e515a5b88f95dce.
The commit broke CMA region creation through DT on arm64, as shown by
the logs below with "memblock=debug":
[ 0.000000] memblock_phys_alloc_range: 41943040 bytes align=0x200000
from=0x0000000000000000 max_addr=0x00000000ffffffff
early_init_dt_alloc_reserved_memory_arch+0x34/0xa0
[ 0.000000] memblock_reserve: [0x00000000fd600000-0x00000000ffdfffff]
memblock_alloc_range_nid+0xc0/0x19c
[ 0.000000] Reserved memory: overlap with other memblock reserved region
From the call flow, the region we defined in DT was always reserved before
entering rmem_cma_setup. Also, rmem_cma_setup has a routine,
cma_init_reserved_mem, to ensure the region was reserved. Checking that the
region is not reserved here therefore seems incorrect.
early_init_fdt_scan_reserved_mem:
    fdt_scan_reserved_mem
        __reserved_mem_reserve_reg
            early_init_dt_reserve_memory
                memblock_reserve(using “reg” prop case)
    fdt_init_reserved_mem
        __reserved_mem_alloc_size
            *early_init_dt_alloc_reserved_memory_arch*
                memblock_reserve(dynamic alloc case)
        __reserved_mem_init_node
            rmem_cma_setup(region overlap check here should always fail)
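The argument above boils down to a range-overlap test: once a dynamically allocated region has been passed to memblock_reserve(), a later "does this range overlap a reservation?" check on the same range must report an overlap. A toy reserved-range list makes that concrete; the array and helpers are hypothetical stand-ins for memblock, and the usage reuses the 0x2800000-byte range at 0xfd600000 from the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for memblock's list of reserved ranges. */
struct range_sketch {
	uint64_t base;
	uint64_t size;
};

#define MAX_RESERVED 16
static struct range_sketch reserved[MAX_RESERVED];
static int nr_reserved;

/* Like the dynamic-allocation path: the chosen range gets reserved. */
static void reserve_sketch(uint64_t base, uint64_t size)
{
	assert(nr_reserved < MAX_RESERVED);
	reserved[nr_reserved].base = base;
	reserved[nr_reserved].size = size;
	nr_reserved++;
}

/* Half-open interval test: [base, base + size) vs. each reserved range. */
static bool overlaps_reserved(uint64_t base, uint64_t size)
{
	for (int i = 0; i < nr_reserved; i++) {
		const struct range_sketch *r = &reserved[i];

		if (base < r->base + r->size && r->base < base + size)
			return true;
	}
	return false;
}
```

Because the region handed to rmem_cma_setup is, by construction, one that was already reserved, such a check can only ever fail for it.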
Example DT that can be used to reproduce the issue:
dump_mem: mem_dump_region {
	compatible = "shared-dma-pool";
	alloc-ranges = <0x0 0x00000000 0x0 0xffffffff>;
	reusable;
	size = <0 0x2800000>;
};
Signed-off-by: Zhenhua Huang <[email protected]>
|
|
I got the warning below when doing fuzz testing:
unregister_netdevice: waiting for bond0 to become free. Usage count = 2
It can be reproduced via:
ip link add bond0 type bond
sysctl -w net.ipv4.conf.bond0.promote_secondaries=1
ip addr add 4.117.174.103/0 scope 0x40 dev bond0
ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0
ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0
ip addr del 4.117.174.103/0 scope 0x40 dev bond0
ip link delete bond0 type bond
In this reproduction test case, an incorrect 'last_prim' is found in
__inet_del_ifa(); as a result, the secondary address (0.0.0.4/0 scope 0x40)
is lost. The memory of the secondary address is leaked, and so are the
references to in_device and net_device.
Fix this problem by looking for 'last_prim' starting at the location of
the deleted IP, and inserting the promoted IP at the location of
'last_prim'.
Fixes: 0ff60a45678e ("[IPV4]: Fix secondary IP addresses after promotion")
Signed-off-by: Liu Jian <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull drm fixes from Dave Airlie:
"Regular rounds of rc1 fixes, a large bunch for amdgpu since it's three
weeks in one go, one i915, one nouveau and one ivpu.
I think there might be a few more fixes in misc that I haven't pulled
in yet, but we should get them all for rc2.
amdgpu:
- Display replay fixes
- Fixes for headless boards
- Fix documentation breakage
- RAS fixes
- Handle newer IP discovery tables
- SMU 13.0.6 fixes
- SR-IOV fixes
- Display vstartup fixes
- NBIO 7.9 fixes
- Display scaling mode fixes
- Debugfs power reporting fix
- GC 9.4.3 fixes
- Dirty framebuffer fixes for fbcon
- eDP fixes
- DCN 3.1.5 fix
- Display ODM fixes
- GPU core dump fix
- Re-enable zops property now that IGT test is fixed
- Fix possible UAF in CS code
- Cursor degamma fix
amdkfd:
- HMM fixes
- Interrupt masking fix
- GFX11 MQD fixes
i915:
- Mark requests for GuC virtual engines to avoid use-after-free
nouveau:
- Fix fence state in nouveau_fence_emit()
ivpu:
- replace strncpy"
* tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm: (51 commits)
drm/amdgpu: Restrict bootloader wait to SMUv13.0.6
drm/amd/display: prevent potential division by zero errors
drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma
drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1
Revert "drm/amd/display: Remove v_startup workaround for dcn3+"
drm/amdgpu: fix amdgpu_cs_p1_user_fence
Revert "Revert "drm/amd/display: Implement zpos property""
drm/amdkfd: Add missing gfx11 MQD manager callbacks
drm/amdgpu: Free ras cmd input buffer properly
drm/amdgpu: Hide xcp partition sysfs under SRIOV
drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting
drm/amdkfd: use mask to get v9 interrupt sq data bits correctly
drm/amdgpu: Allocate coredump memory in a nonblocking way
drm/amdgpu: Support query ecc cap for aqua_vanjaram
drm/amdgpu: Add umc_info v4_0 structure
drm/amd/display: always switch off ODM before committing more streams
drm/amd/display: Remove wait while locked
drm/amd/display: update blank state on ODM changes
drm/amd/display: Add smu write msg id fail retry process
drm/amdgpu: Add SMU v13.0.6 default reset methods
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking updates from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- eth: stmmac: fix failure to probe without MAC interface specified
Current release - new code bugs:
- docs: netlink: fix missing classic_netlink doc reference
Previous releases - regressions:
- deal with integer overflows in kmalloc_reserve()
- use sk_forward_alloc_get() in sk_get_meminfo()
- bpf_sk_storage: fix the missing uncharge in sk_omem_alloc
- fib: avoid warn splat in flow dissector after packet mangling
- skb_segment: call zero copy functions before using skbuff frags
- eth: sfc: check for zero length in EF10 RX prefix
Previous releases - always broken:
- af_unix: fix msg_controllen test in scm_pidfd_recv() for
MSG_CMSG_COMPAT
- xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()
- netfilter:
- nft_exthdr: fix non-linear header modification
- xt_u32, xt_sctp: validate user space input
- nftables: exthdr: fix 4-byte stack OOB write
- nfnetlink_osf: avoid OOB read
- one more fix for the garbage collection work from last release
- igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
- bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t
- handshake: fix null-deref in handshake_nl_done_doit()
- ip: ignore dst hint for multipath routes to ensure packets are
hashed across the nexthops
- phy: micrel:
- correct bit assignments for cable test errata
- disable EEE according to the KSZ9477 errata
Misc:
- docs/bpf: document compile-once-run-everywhere (CO-RE) relocations
- Revert "net: macsec: preserve ingress frame ordering", it appears
to have been developed against an older kernel, problem doesn't
exist upstream"
* tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
Revert "net: team: do not use dynamic lockdep key"
net: hns3: remove GSO partial feature bit
net: hns3: fix the port information display when sfp is absent
net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
net: hns3: fix debugfs concurrency issue between kfree buffer and read
net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
net: hns3: Support query tx timeout threshold by debugfs
net: hns3: fix tx timeout issue
net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull more devicetree updates from Rob Herring:
"A couple of conversions which didn't get picked up by the subsystems
and one fix:
- Convert st,stih407-irq-syscfg and Omnivision OV7251 bindings to DT
schema
- Merge Omnivision OV5695 into OV5693 binding
- Fix of_overlay_fdt_apply prototype when !CONFIG_OF_OVERLAY"
* tag 'devicetree-fixes-for-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: irqchip: convert st,stih407-irq-syscfg to DT schema
media: dt-bindings: Convert Omnivision OV7251 to DT schema
media: dt-bindings: Merge OV5695 into OV5693 binding
of: overlay: Fix of_overlay_fdt_apply prototype when !CONFIG_OF_OVERLAY
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"Various cleanups and fixes across the board"
* tag 'pwm/for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (31 commits)
pwm: lpc32xx: Remove handling of PWM channels
pwm: atmel: Simplify using devm functions
dt-bindings: pwm: brcm,kona-pwm: convert to YAML
pwm: stmpe: Handle errors when disabling the signal
pwm: stm32: Simplify using devm_pwmchip_add()
pwm: stm32: Don't modify HW state in .remove() callback
pwm: Fix order of freeing resources in pwmchip_remove()
pwm: ntxec: Use device_set_of_node_from_dev()
pwm: ntxec: Drop a write-only variable from driver data
pwm: pxa: Don't reimplement of_device_get_match_data()
pwm: lpc18xx-sct: Simplify using devm_clk_get_enabled()
pwm: atmel-tcb: Don't track polarity in driver data
pwm: atmel-tcb: Unroll atmel_tcb_pwm_set_polarity() into only caller
pwm: atmel-tcb: Put per-channel data into driver data
pwm: atmel-tcb: Fix resource freeing in error path and remove
pwm: atmel-tcb: Harmonize resource allocation order
pwm: Drop unused #include <linux/radix-tree.h>
pwm: rz-mtu3: Fix build warning 'num_channel_ios' not described
pwm: Remove outdated documentation for pwmchip_remove()
pwm: atmel: Enable clk when pwm already enabled in bootloader
...
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-6.6-2023-09-06:
amdgpu:
- Display replay fixes
- Fixes for headless boards
- Fix documentation breakage
- RAS fixes
- Handle newer IP discovery tables
- SMU 13.0.6 fixes
- SR-IOV fixes
- Display vstartup fixes
- NBIO 7.9 fixes
- Display scaling mode fixes
- Debugfs power reporting fix
- GC 9.4.3 fixes
- Dirty framebuffer fixes for fbcon
- eDP fixes
- DCN 3.1.5 fix
- Display ODM fixes
- GPU core dump fix
- Re-enable zpos property now that IGT test is fixed
- Fix possible UAF in CS code
- Cursor degamma fix
amdkfd:
- HMM fixes
- Interrupt masking fix
- GFX11 MQD fixes
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Mark requests for GuC virtual engines to avoid use-after-free (Andrzej).
Signed-off-by: Dave Airlie <[email protected]>
From: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Subsystem:
- Add a way for drivers to tell the core the supported alarm range is
smaller than the date range. This is not used yet but will be
useful for the alarmtimers in the next release.
- fix Wvoid-pointer-to-enum-cast warnings
- remove redundant of_match_ptr()
- stop warning for invalid alarms when the alarm is disabled
Drivers:
- isl12022: allow setting the trip level for battery level detection
- pcf2127: add support for PCF2131 and multiple timestamps
- stm32: time precision improvement, many fixes
- twl: NVRAM support"
* tag 'rtc-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (73 commits)
dt-bindings: rtc: ds3231: Remove text binding
rtc: wm8350: remove unnecessary messages
rtc: twl: remove unnecessary messages
rtc: sun6i: remove unnecessary message
rtc: stop warning for invalid alarms when the alarm is disabled
rtc: twl: add NVRAM support
rtc: pcf85363: Allow to wake up system without IRQ
rtc: m48t86: add DT support for m48t86
dt-bindings: rtc: Add ST M48T86
rtc: pcf2127: remove useless check
rtc: rzn1: Report maximum alarm limit to rtc core
rtc: ds1305: Report maximum alarm limit to rtc core
rtc: tps6586x: Report maximum alarm limit to rtc core
rtc: cmos: Report supported alarm limit to rtc infrastructure
rtc: cros-ec: Detect and report supported alarm window size
rtc: Add support for limited alarm timer offsets
rtc: isl1208: Fix incorrect logic in isl1208_set_xtoscb()
MAINTAINERS: remove obsolete pattern in RTC SUBSYSTEM section
rtc: tps65910: Remove redundant dev_warn() and do not check for 0 return after calling platform_get_irq()
rtc: omap: Do not check for 0 return after calling platform_get_irq()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"Core:
- Fix SETDASA when static and dynamic address are equal
- Fix cmd_v1 DAA exit criteria
Drivers:
- svc: allow probing without any device"
* tag 'i3c/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: svc: fix probe failure when no i3c device exist
i3c: master: Fix SETDASA process
dt-bindings: i3c: Fix description for assigned-address
i3c: master: svc: Describe member 'saved_regs'
i3c: master: svc: Do not check for 0 return after calling platform_get_irq()
i3c/master: cmd_v1: Fix the exit criteria for the daa procedure
i3c: Explicitly include correct DT includes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes that came in during the merge window, both driver
specific - one for a bug that came up in testing, one for a bug due
to a misreading of the datasheet"
* tag 'regulator-fix-v6.6-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: tps6594-regulator: Fix random kernel crash
regulator: tps6287x: Fix n_voltages
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A couple of fixes for the sun6i driver. The patch to reduce DMA RX to
single byte width all the time is *hopefully* excessively cautious but
it's unclear which SoCs are affected so the fix just covers everything
for safety"
* tag 'spi-fix-v6.6-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: sun6i: fix race between DMA RX transfer completion and RX FIFO drain
spi: sun6i: reduce DMA RX transfer width to single byte
|
|
Pull kvm updates from Paolo Bonzini:
"ARM:
- Clean up vCPU targets, always returning generic v8 as the preferred
target
- Trap forwarding infrastructure for nested virtualization (used for
traps that are taken from an L2 guest and are needed by the L1
hypervisor)
- FEAT_TLBIRANGE support to only invalidate specific ranges of
addresses when collapsing a table PTE to a block PTE. This avoids
forcing the guest to refill the TLBs for addresses that aren't
covered by the table PTE.
- Fix vPMU issues related to handling of PMUver.
- Don't unnecessarily align non-stack allocations in the EL2 VA space
- Drop HCR_VIRT_EXCP_MASK, which was never used...
- Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
parameter instead
- Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
- Remove prototypes without implementations
RISC-V:
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
s390:
- PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) loses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
- Guest debug fixes (Ilya)
x86:
- Clean up KVM's handling of Intel architectural events
- Intel bugfixes
- Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
debug registers and generate/handle #DBs
- Clean up LBR virtualization code
- Fix a bug where KVM fails to set the target pCPU during an IRTE
update
- Fix fatal bugs in SEV-ES intrahost migration
- Fix a bug where the recent (architecturally correct) change to
reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
skip it)
- Retry APIC map recalculation if a vCPU is added/enabled
- Overhaul emergency reboot code to bring SVM up to par with VMX, tie
the "emergency disabling" behavior to KVM actually being loaded,
and move all of the logic within KVM
- Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
TSC ratio MSR cannot diverge from the default when TSC scaling is
disabled, and clean up related code
- Add a framework to allow "caching" feature flags so that KVM can
check if the guest can use a feature without needing to search
guest CPUID
- Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
- Fix KVM's handling of !visible guest roots to avoid premature
triple fault injection
- Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
API surface that is needed by external users (currently only
KVMGT), and fix a variety of issues in the process
Generic:
- Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
events to pass action specific data without needing to constantly
update the main handlers.
- Drop unused function declarations
Selftests:
- Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
- Add support for printf() in guest code and convert all guest asserts
to use printf-based reporting
- Clean up the PMU event filter test and add new testcases
- Include x86 selftests in the KVM x86 MAINTAINERS entry"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
KVM: x86/mmu: Include mmu.h in spte.h
KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
KVM: x86/mmu: Disallow guest from using !visible slots for page tables
KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
KVM: x86/mmu: Harden new PGD against roots without shadow pages
KVM: x86/mmu: Add helper to convert root hpa to shadow page
drm/i915/gvt: Drop final dependencies on KVM internal details
KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
KVM: x86/mmu: Assert that correct locks are held for page write-tracking
KVM: x86/mmu: Rename page-track APIs to reflect the new reality
KVM: x86/mmu: Drop infrastructure for multiple page-track modes
KVM: x86/mmu: Use page-track notifiers iff there are external users
KVM: x86/mmu: Move KVM-only page-track declarations to internal header
KVM: x86: Remove the unused page-track hook track_flush_slot()
drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
KVM: x86: Add a new page-track hook to handle memslot deletion
drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
KVM: x86: Reject memslot MOVE operations if KVMGT is attached
...
|
|
When a user resizes all trace ring buffers through the 'buffer_size_kb'
file, ring_buffer_resize() allocates buffer pages for each CPU in a
loop.
If the kernel preemption model is PREEMPT_NONE and there are many CPUs
and many buffer pages to allocate, the loop may not give up the CPU
for a long time and eventually cause a soft lockup.
To avoid it, call cond_resched() after each cpu buffer allocation.
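The userspace C sketch below is a rough illustration only. It mirrors the shape of that loop: it allocates a batch of pages for each simulated CPU and offers the processor back between CPUs. sched_yield() stands in for cond_resched(), which is a loose analogy because userspace code is always preemptible. alloc_all_cpu_buffers(), free_all_cpu_buffers() and their parameters are hypothetical names, not ring buffer APIs.

```c
#include <sched.h>
#include <stdlib.h>

/* Hypothetical analogue of the per-CPU allocation loop. The caller passes a
 * zero-initialized bufs[nr_cpus] array so cleanup after a failure is safe. */
static int alloc_all_cpu_buffers(void ***bufs, int nr_cpus, int nr_pages,
                                 size_t page_size)
{
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        bufs[cpu] = calloc((size_t)nr_pages, sizeof(void *));
        if (!bufs[cpu])
            return -1;
        for (int i = 0; i < nr_pages; i++) {
            bufs[cpu][i] = malloc(page_size);
            if (!bufs[cpu][i])
                return -1;
        }
        /* Voluntary scheduling point after each CPU's buffer, standing in
         * for the cond_resched() call the fix adds. */
        sched_yield();
    }
    return 0;
}

/* Frees whatever was allocated, including after a partial failure. */
static void free_all_cpu_buffers(void ***bufs, int nr_cpus, int nr_pages)
{
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        if (!bufs[cpu])
            continue;
        for (int i = 0; i < nr_pages; i++)
            free(bufs[cpu][i]);  /* free(NULL) is a no-op */
        free(bufs[cpu]);
        bufs[cpu] = NULL;
    }
}
```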
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: <[email protected]>
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The event inject files add events for a specific trace array. For an
instance, if the file is opened and the instance is deleted, reading or
writing to the file will cause a use after free.
Up the ref count of the trace_array when an event inject file is opened.
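This fix applies a get-on-open / put-on-release pattern. So do the option file, current_trace, tracing_max_latency and event enable/filter fixes that follow. Below is a minimal single-threaded sketch of that pattern, assuming a plain integer counter. struct instance and the instance_*() helpers are hypothetical stand-ins for a trace_array and its reference counting. The -EBUSY refusal in instance_delete() is an assumption modeled on the statement that the reference keeps the instance from being deleted while files are open.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a trace_array instance. */
struct instance {
    int ref;  /* 1 = held only by the instance directory itself */
};

static struct instance *instance_create(void)
{
    struct instance *tr = calloc(1, sizeof(*tr));
    if (tr)
        tr->ref = 1;
    return tr;
}

/* open(): pin the instance for as long as the file stays open.
 * Real code must do this (and the check below) under a lock. */
static void instance_get(struct instance *tr)
{
    tr->ref++;
}

/* release(): drop the reference taken at open(). */
static void instance_put(struct instance *tr)
{
    tr->ref--;
}

/* rmdir(): refuse while any open file still holds a reference, so a
 * reader or writer can never touch freed memory. */
static int instance_delete(struct instance *tr)
{
    if (tr->ref > 1)
        return -EBUSY;
    free(tr);
    return 0;
}
```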
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Zheng Yejian <[email protected]>
Fixes: 6c3edaf9fd6a ("tracing: Introduce trace event injection")
Tested-by: Linux Kernel Functional Testing <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The option files update the options for a given trace array. For an
instance, if the file is opened and the instance is deleted, reading or
writing to the file will cause a use after free.
Up the ref count of the trace_array when an option file is opened.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Zheng Yejian <[email protected]>
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The current_trace file updates the trace array's tracer. For an instance, if the
file is opened and the instance is deleted, reading or writing to the file
will cause a use after free.
Up the ref count of the trace array when current_trace is opened.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Zheng Yejian <[email protected]>
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The tracing_max_latency file points to the trace_array max_latency field.
For an instance, if the file is opened and the instance is deleted,
reading or writing to the file will cause a use after free.
Up the ref count of the trace_array when tracing_max_latency is opened.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Zheng Yejian <[email protected]>
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
Short summary of fixes pull:
* ivpu: Replace strncpy
* nouveau: Fix fence state in nouveau_fence_emit()
Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20230901070123.GA6987@linux-uq9g
|
|
When the trace event enable and filter files are opened, increment the
trace array ref counter, otherwise they can be accessed when the trace
array is being deleted. The ref counter keeps the trace array from being
deleted while those files are opened.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>
Reported-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Currently when rmdir on an instance is done, eventfs_remove_events_dir()
is called and it does a dput on the dentry and then frees the
eventfs_inode that represents the events directory.
But there's no protection against a reader reading the top level events
directory at the same time and we can get a use after free error. Instead,
use the dput() associated with the dentry to also free the eventfs_inode
associated with the events directory, as that will get called when the last
reference to the directory is released.
This issue triggered the following KASAN report:
==================================================================
BUG: KASAN: slab-use-after-free in eventfs_root_lookup+0x88/0x1b0
Read of size 8 at addr ffff888120130ca0 by task ftracetest/1201
CPU: 4 PID: 1201 Comm: ftracetest Not tainted 6.5.0-test-10737-g469e0a8194e7 #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x90
print_report+0xcf/0x670
? __pfx_ring_buffer_record_off+0x10/0x10
? _raw_spin_lock_irqsave+0x2b/0x70
? __virt_addr_valid+0xd9/0x160
kasan_report+0xd4/0x110
? eventfs_root_lookup+0x88/0x1b0
? eventfs_root_lookup+0x88/0x1b0
eventfs_root_lookup+0x88/0x1b0
? eventfs_root_lookup+0x33/0x1b0
__lookup_slow+0x194/0x2a0
? __pfx___lookup_slow+0x10/0x10
? down_read+0x11c/0x330
walk_component+0x166/0x220
link_path_walk.part.0.constprop.0+0x3a3/0x5a0
? seqcount_lockdep_reader_access+0x82/0x90
? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10
path_openat+0x143/0x11f0
? __lock_acquire+0xa1a/0x3220
? __pfx_path_openat+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
do_filp_open+0x166/0x290
? __pfx_do_filp_open+0x10/0x10
? lock_is_held_type+0xce/0x120
? preempt_count_sub+0xb7/0x100
? _raw_spin_unlock+0x29/0x50
? alloc_fd+0x1a0/0x320
do_sys_openat2+0x126/0x160
? rcu_is_watching+0x34/0x60
? __pfx_do_sys_openat2+0x10/0x10
? __might_resched+0x2cf/0x3b0
? __fget_light+0xdf/0x100
__x64_sys_openat+0xcd/0x140
? __pfx___x64_sys_openat+0x10/0x10
? syscall_enter_from_user_mode+0x22/0x90
? lockdep_hardirqs_on+0x7d/0x100
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f1dceef5e51
Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d 9a 27 0e 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 93 00 00 00 48 8b 54 24 28 64 48 2b 14 25
RSP: 002b:00007fff2cddf380 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000241 RCX: 00007f1dceef5e51
RDX: 0000000000000241 RSI: 000055d7520677d0 RDI: 00000000ffffff9c
RBP: 000055d7520677d0 R08: 000000000000001e R09: 0000000000000001
R10: 00000000000001b6 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000003 R14: 000055d752035678 R15: 000055d752067788
</TASK>
Allocated by task 1200:
kasan_save_stack+0x2f/0x50
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x8b/0x90
eventfs_create_events_dir+0x54/0x220
create_event_toplevel_files+0x42/0x130
event_trace_add_tracer+0x33/0x180
trace_array_create_dir+0x52/0xf0
trace_array_create+0x361/0x410
instance_mkdir+0x6b/0xb0
tracefs_syscall_mkdir+0x57/0x80
vfs_mkdir+0x275/0x380
do_mkdirat+0x1da/0x210
__x64_sys_mkdir+0x74/0xa0
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Freed by task 1251:
kasan_save_stack+0x2f/0x50
kasan_set_track+0x21/0x30
kasan_save_free_info+0x27/0x40
__kasan_slab_free+0x106/0x180
__kmem_cache_free+0x149/0x2e0
event_trace_del_tracer+0xcb/0x120
__remove_instance+0x16a/0x340
instance_rmdir+0x77/0xa0
tracefs_syscall_rmdir+0x77/0xc0
vfs_rmdir+0xed/0x2d0
do_rmdir+0x235/0x280
__x64_sys_rmdir+0x5f/0x90
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
The buggy address belongs to the object at ffff888120130ca0
which belongs to the cache kmalloc-16 of size 16
The buggy address is located 0 bytes inside of
freed 16-byte region [ffff888120130ca0, ffff888120130cb0)
The buggy address belongs to the physical page:
page:000000004dbddbb0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x120130
flags: 0x17ffffc0000800(slab|node=0|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 0017ffffc0000800 ffff8881000423c0 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000800080 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888120130b80: 00 00 fc fc 00 05 fc fc 00 00 fc fc 00 02 fc fc
ffff888120130c00: 00 07 fc fc 00 00 fc fc 00 00 fc fc fa fb fc fc
>ffff888120130c80: 00 00 fc fc fa fb fc fc 00 00 fc fc 00 00 fc fc
^
ffff888120130d00: 00 00 fc fc 00 00 fc fc 00 00 fc fc fa fb fc fc
ffff888120130d80: 00 00 fc fc 00 00 fc fc 00 00 fc fc 00 00 fc fc
==================================================================
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Cc: Ajay Kaher <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Tested-by: Linux Kernel Functional Testing <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>
Reported-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
jbd2_alloc() allocates a buffer from slab when the block size is smaller
than PAGE_SIZE, and slab may be using a compound page. Before commit
8147c4c4546f, we set b_page to the precise page containing the buffer
and this code worked well. Now we set b_page to the head page of the
allocation, so we can no longer use offset_in_page(). While we could
do a 1:1 replacement with offset_in_folio(), use the more idiomatic
bh_offset() and the folio APIs to map the buffer.
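Here is a worked example of the offset problem. Assume 4 KiB base pages and a 16 KiB (order-2) compound page starting at address 0x10000, with a buffer at 0x11400 inside the second page. offset_in_page() yields 0x400, so adding it to the head page lands at 0x10400, which is the wrong page. The offset from the start of the folio is 0x1400. The helpers below are hypothetical integer models of offset_in_page() and offset_in_folio(), not the kernel macros. Subtracting the folio base gives the same result as masking for a naturally aligned folio.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB base pages */

/* Model of offset_in_page(): offset within the base page holding addr. */
static unsigned long sketch_offset_in_page(uintptr_t addr)
{
    return addr & (SKETCH_PAGE_SIZE - 1);
}

/* Model of offset_in_folio(): offset from the first byte of the whole
 * (possibly multi-page) allocation that starts at folio_base. */
static unsigned long sketch_offset_in_folio(uintptr_t folio_base, uintptr_t addr)
{
    return addr - folio_base;
}
```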
This isn't enough to support a b_size larger than PAGE_SIZE on HIGHMEM
machines, but this is good enough to fix the actual bug we're seeing.
Fixes: 8147c4c4546f ("jbd2: use a folio in jbd2_journal_write_metadata_buffer()")
Reported-by: Zorro Lang <[email protected]>
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
[converted to be more folio]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
If the buffer pointed to by the buffer_head is part of a compound page,
bh_offset() assumes that b_page is the precise page that contains
the data. A recent change to jbd2 inadvertently violated that assumption.
By using page_size(), we support both b_page being set to the head page
(as page_size() will return the size of the entire folio) and the precise
page (as it will return PAGE_SIZE for a tail page).
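The updated bh_offset() can be modeled as masking b_data with page_size(b_page) - 1. The integer sketch below uses hypothetical helper names and assumes 4 KiB base pages and naturally aligned compound pages. The same buffer address yields the offset from the head page when b_page is the head. It yields the offset within the single page when b_page is the precise tail page.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB base pages */

/* Model of page_size(): a head page reports the size of the whole compound
 * page; a tail (or ordinary) page reports a single base page. */
static unsigned long sketch_page_size(bool is_head, unsigned int order)
{
    return is_head ? (SKETCH_PAGE_SIZE << order) : SKETCH_PAGE_SIZE;
}

/* Model of the updated bh_offset(): b_data & (page_size(b_page) - 1).
 * Correct because compound pages are aligned to their own size. */
static unsigned long sketch_bh_offset(uintptr_t b_data, bool b_page_is_head,
                                      unsigned int order)
{
    return b_data & (sketch_page_size(b_page_is_head, order) - 1);
}
```

With an order-2 folio at 0x10000 and b_data at 0x11400, the head-page case gives 0x1400 and the tail-page case (page at 0x11000) gives 0x400. Both point back at 0x11400.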
Fixes: 8147c4c4546f ("jbd2: use a folio in jbd2_journal_write_metadata_buffer()")
Reported-by: Zorro Lang <[email protected]>
Tested-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
enetc_psi_create() returns an ERR_PTR() or a valid station interface
pointer, but checking only whether the return code is non-NULL blurs
that difference away. So if enetc_psi_create() fails, we call
enetc_psi_destroy() when we shouldn't. This will likely result in
crashes, since enetc_psi_create() cleans up everything after itself when
it returns an ERR_PTR().
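ERR_PTR() encodes a negative errno in the topmost addresses. The resulting pointer is never NULL, so a plain non-NULL check treats a failure as success. The userspace re-creation below is a sketch. The sketch_*() helpers imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() using the kernel's MAX_ERRNO bound of 4095. psi_create() and use_psi() are hypothetical stand-ins for enetc_psi_create() and its caller, with free() standing in for enetc_psi_destroy().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095  /* same bound as the kernel's MAX_ERRNO */

static inline void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }

/* True for pointers in the top SKETCH_MAX_ERRNO addresses, i.e. encoded errors. */
static inline bool sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical constructor: a valid object, or an encoded error after
 * having already cleaned up after itself. */
static void *psi_create(bool fail)
{
    return fail ? sketch_err_ptr(-ENOMEM) : malloc(16);
}

/* The pattern the fix adopts: test with IS_ERR(), not for NULL. */
static int use_psi(bool fail)
{
    void *p = psi_create(fail);

    if (sketch_is_err(p))
        return (int)sketch_ptr_err(p);  /* nothing to destroy */
    free(p);                            /* stands in for the destroy call */
    return 0;
}
```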
Fixes: f0168042a212 ("net: enetc: reimplement RFS/RSS memory clearing as PCI quirk")
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Suggested-by: Dan Carpenter <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This reverts commit 39285e124edbc752331e98ace37cc141a6a3747a.
Looks like the change has unintended consequences in exposing
objects before they are initialized. Let's drop this patch
and try again in net-next.
Reported-by: [email protected]
Fixes: 39285e124edb ("net: team: do not use dynamic lockdep key")
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- A couple of virtual vs physical address confusion fixes
- Rework locking in dcssblk driver to address a lockdep warning
- Remove support for "noexec" kernel command line option since there is
no use case where it would make sense
- Simplify kernel mapping setup and get rid of quite a bit of code
- Add architecture specific __set_memory_yy() functions which allow us
to modify kernel mappings. Unlike the set_memory_xx() variants they
take void pointer start and end parameters, which allows using them
without the usual casts and on areas larger than 8TB.
Note that the set_memory_xx() family comes with an int num_pages
parameter which overflows at 8TB. This could be addressed by
changing the num_pages parameter to unsigned long; however, that
requires changing all architectures, since the module code expects an
int parameter (see module_set_memory()).
This was indeed an issue since for debug_pagealloc() we call
set_memory_4k() on the whole identity mapping. Therefore address this
for now with the __set_memory_yy() variant, and address common code
later
- Use dev_set_name() and also fix memory leak in zcrypt driver error
handling
- Remove unused lsi_mask from airq_struct
- Add warning for invalid kernel mapping requests
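The 8TB limit on set_memory_xx() falls out of the page arithmetic. This assumes 8TB means 8 TiB and uses 4 KiB pages, the s390 base page size. 8 TiB is then 2^43 / 2^12 = 2^31 pages, one more than INT_MAX (2^31 - 1). The small C check below tests that arithmetic. pages_in_range() and fits_int_num_pages() are hypothetical helpers, not kernel functions.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Pages spanned by [start, end), computed in 64 bits so nothing truncates. */
static uint64_t pages_in_range(uint64_t start, uint64_t end)
{
    return (end - start) >> SKETCH_PAGE_SHIFT;
}

/* Would this count survive being passed as an int num_pages argument? */
static bool fits_int_num_pages(uint64_t nr_pages)
{
    return nr_pages <= (uint64_t)INT_MAX;
}
```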
* tag 's390-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vmem: do not silently ignore mapping limit
s390/zcrypt: utilize dev_set_name() ability to use a formatted string
s390/zcrypt: don't leak memory if dev_set_name() fails
s390/mm: fix MAX_DMA_ADDRESS physical vs virtual confusion
s390/airq: remove lsi_mask from airq_struct
s390/mm: use __set_memory() variants where useful
s390/set_memory: add __set_memory() variant
s390/set_memory: generate all set_memory() functions
s390/mm: improve description of mapping permissions of prefix pages
s390/amode31: change type of __samode31, __eamode31, etc
s390/mm: simplify kernel mapping setup
s390: remove "noexec" option
s390/vmem: fix virtual vs physical address confusion
s390/dcssblk: fix lockdep warning
s390/monreader: fix virtual vs physical address confusion
|
|
Pull MIPS updates from Thomas Bogendoerfer:
"Just cleanups and fixes"
* tag 'mips_6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: TXx9: Do PCI error checks on own line
arch/mips/configs/*_defconfig cleanup
MIPS: VDSO: Conditionally export __vdso_gettimeofday()
Mips: loongson3_defconfig: Enable ast drm driver by default
mips: remove <asm/export.h>
mips: replace #include <asm/export.h> with #include <linux/export.h>
mips: remove unneeded #include <asm/export.h>
MIPS: Loongson64: Fix more __iomem attributes
MIPS: loongson32: Remove regs-rtc.h
MIPS: loongson32: Remove regs-clk.h
MIPS: More explicit DT include clean-ups
MIPS: Fixup explicit DT include clean-up
Revert MIPS: Loongson: Fix build error when make modules_install
MIPS: Only fiddle with CHECKFLAGS if `need-compiler'
MIPS: Fix CONFIG_CPU_DADDI_WORKAROUNDS `modules_install' regression
MIPS: Explicitly include correct DT includes
|
|
Pull xtensa updates from Max Filippov:
- enable MTD XIP support
- fix base address of the xtensa perf module in newer hardware
* tag 'xtensa-20230905' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: add XIP-aware MTD support
xtensa: PMU: fix base address for the newer hardware
|
|
Recently we moved most cleanup from ntfs_put_super() into
ntfs3_kill_sb() as part of a bigger cleanup. This accidentally also moved
dropping inode references stashed in ntfs3's sb->s_fs_info from
@sb->put_super() to @sb->kill_sb(). But generic_shutdown_super()
verifies that there are no busy inodes past sb->put_super(). Fix this
and disentangle dropping inode references from freeing @sb->s_fs_info.
Fixes: a4f64a300a29 ("ntfs3: free the sbi in ->kill_sb") # mainline only
Reported-by: Guenter Roeck <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Mateusz reports that glibc turns 'fstat()' calls into 'fstatat()', and
that seems to have been going on for quite a long time due to glibc
having tried to simplify its stat logic into just one point.
This turns out to cause completely unnecessary overhead, where we then
go off and allocate the kernel side pathname, and actually look up the
empty path. Sure, our path lookup is quite optimized, but it still
causes a fair bit of allocation overhead and a couple of completely
unnecessary rounds of lockref accesses etc.
This is all hopefully getting fixed in user space, and there is a patch
floating around for just having glibc use the native fstat() system
call. But even with the current situation we can at least improve on
things by catching the situation and short-circuiting it.
Note that this is still measurably slower than just a plain 'fstat()',
since just checking that the filename is actually empty is somewhat
expensive due to inevitable user space access overhead from the kernel
(ie verifying pointers, and SMAP on x86). But it's still quite a bit
faster than actually looking up the path for real.
To quote numbers from Mateusz:
"Sapphire Rapids, will-it-scale, ops/s
stock fstat 5088199
patched fstat 7625244 (+49%)
real fstat 8540383 (+67% / +12%)"
where that 'stock fstat' is the glibc translation of fstat into
fstatat() with an empty path, the 'patched fstat' is with this short
circuiting of the path lookup, and the 'real fstat' is the actual native
fstat() system call with none of this overhead.
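From user space, the two call shapes being compared look roughly like this. stat_via_fstatat() mirrors the empty-path fstatat() form described above, which the kernel now short-circuits. stat_via_fstat() is the native call. The helper names are hypothetical. The AT_EMPTY_PATH fallback definition is only there in case the libc headers were included without _GNU_SOURCE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000  /* Linux value; normally exposed via _GNU_SOURCE */
#endif

/* The translated form: an empty pathname plus AT_EMPTY_PATH makes the
 * kernel operate on fd itself, but it must still inspect the path string. */
static int stat_via_fstatat(int fd, struct stat *st)
{
    return fstatat(fd, "", st, AT_EMPTY_PATH);
}

/* The native system call: no pathname to copy in or look up. */
static int stat_via_fstat(int fd, struct stat *st)
{
    return fstat(fd, st);
}

/* Both routes should describe the same file. */
static bool same_file(int fd)
{
    struct stat a, b;

    if (stat_via_fstatat(fd, &a) != 0 || stat_via_fstat(fd, &b) != 0)
        return false;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```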
Link: https://lore.kernel.org/lkml/20230903204858.lv7i3kqvw6eamhgz@f/
Reported-by: Mateusz Guzik <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 254986e324ad ("drm/radeon: Use the drm suballocation manager implementation.")
made the fence wait in amdgpu_sa_bo_new() interruptible but there is no
code to handle an interrupt. This caused the kernel to randomly explode
in high-VRAM-pressure situations so make it uninterruptible again.
Fixes: 254986e324ad ("drm/radeon: Use the drm suballocation manager implementation.")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2769
Signed-off-by: Alex Deucher <[email protected]>
CC: [email protected] # 6.4+
CC: Simon Pilkington <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Christian König <[email protected]>
|
|
This reverts commit b484a40dc1f16edb58e5430105a021e1916e6f27.
This commit cancels all requests with io-wq, not just the ones from the
originating task. This breaks use cases that have thread pools, or just
multiple tasks issuing requests on the same ring. The liburing
regression test for this also shows that problem:
$ test/thread-exit.t
cqe->res=-125, Expected 512
where an IO thread gets its request canceled rather than completing
successfully.
Signed-off-by: Jens Axboe <[email protected]>
|
|
[ 71.490669] WARNING: CPU: 3 PID: 17070 at io_uring/io_uring.c:769
io_cqring_event_overflow+0x47b/0x6b0
[ 71.498381] Call Trace:
[ 71.498590] <TASK>
[ 71.501858] io_req_cqe_overflow+0x105/0x1e0
[ 71.502194] __io_submit_flush_completions+0x9f9/0x1090
[ 71.503537] io_submit_sqes+0xebd/0x1f00
[ 71.503879] __do_sys_io_uring_enter+0x8c5/0x2380
[ 71.507360] do_syscall_64+0x39/0x80
We decoupled CQ locking from ->task_complete but haven't fixed up places
forcing locking for CQ overflows.
Fixes: ec26c225f06f5 ("io_uring: merge iopoll and normal completion paths")
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
io-wq will retry iopoll even when it failed with -EAGAIN. If that
races with task exit, which sets TIF_NOTIFY_SIGNAL for all its workers,
such workers might spin indefinitely, retrying iopoll again and again
and failing each time on some allocation, wait, etc. Don't keep
spinning if io-wq is dying.
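Below is a minimal sketch of the "don't keep spinning" rule, assuming a C11 atomic flag as the pool's dying state. struct pool, issue_with_retry() and the demo callback are hypothetical. They model only the control flow, not io-wq's actual data structures or how it detects that it is exiting.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical worker-pool state; "dying" is set when the owning task exits. */
struct pool {
    atomic_bool dying;
};

typedef int (*issue_fn)(void *arg);

/* Retry -EAGAIN, but give up (returning the last error) once the pool is
 * being torn down, instead of spinning on a request that cannot succeed. */
static int issue_with_retry(struct pool *p, issue_fn issue, void *arg)
{
    int ret;

    for (;;) {
        ret = issue(arg);
        if (ret != -EAGAIN || atomic_load(&p->dying))
            return ret;
    }
}

/* Demo operation: always -EAGAIN, and marks the pool as dying after
 * die_after attempts to mimic a concurrent task exit. */
struct demo_ctx {
    struct pool *pool;
    int calls;
    int die_after;
};

static int demo_always_eagain(void *arg)
{
    struct demo_ctx *c = arg;

    if (++c->calls >= c->die_after)
        atomic_store(&c->pool->dying, true);
    return -EAGAIN;
}
```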
Fixes: 561fb04a6a225 ("io_uring: replace workqueue usage with io-wq")
Cc: [email protected]
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This reverts commit 3e00123a13d824d63072b1824c9da59cd78356d9.
No, we never export random symbols for out of tree modules.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Petr Mladek <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.6
A bunch of fixes and new IDs that came in since the initial pull request
- all driver specific and nothing too exciting.
There's a trivial conflict in the AMD driver ID table due to the last
v6.5 fixes not having been merged up.
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
This PR contains nf_tables updates for your *net* tree.
This time almost all fixes are for old bugs:
First patch fixes a 4-byte stack OOB write, from myself.
This was broken ever since nftables was switched from 128 to 32bit
register addressing in v4.1.
2nd patch fixes an out-of-bounds read.
This has been broken ever since xt_osf got added in 2.6.31, the bug
was then just moved around during refactoring, from Wander Lairson Costa.
3rd patch adds a missing enum description, from Phil Sutter.
4th patch fixes a UaF in nftables that occurs when userspace adds
elements with a timeout so small that expiration happens while the
transaction is still in progress. Fix from Pablo Neira Ayuso.
Patch 5 fixes an out-of-bounds memory access; this was
broken since v4.20. Patch from Kyle Zeng and Jozsef Kadlecsik.
Patch 6 fixes another bogus memory access when building audit
record. Bug added in the previous pull request, fix from Pablo.
netfilter pull request 2023-09-06
* tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Although commit c2c24edb1d9c ("arm64: csum: Fix pathological zero-length
calls") added an early return for zero-length input, syzkaller has
popped up with an example of a _negative_ length which causes an
undefined shift and an out-of-bounds read:
| BUG: KASAN: slab-out-of-bounds in do_csum+0x44/0x254 arch/arm64/lib/csum.c:39
| Read of size 4294966928 at addr ffff0000d7ac0170 by task syz-executor412/5975
|
| CPU: 0 PID: 5975 Comm: syz-executor412 Not tainted 6.4.0-rc4-syzkaller-g908f31f2a05b #0
| Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
| Call trace:
| dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:233
| show_stack+0x2c/0x44 arch/arm64/kernel/stacktrace.c:240
| __dump_stack lib/dump_stack.c:88 [inline]
| dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
| print_address_description mm/kasan/report.c:351 [inline]
| print_report+0x174/0x514 mm/kasan/report.c:462
| kasan_report+0xd4/0x130 mm/kasan/report.c:572
| kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:187
| __kasan_check_read+0x20/0x30 mm/kasan/shadow.c:31
| do_csum+0x44/0x254 arch/arm64/lib/csum.c:39
| csum_partial+0x30/0x58 lib/checksum.c:128
| gso_make_checksum include/linux/skbuff.h:4928 [inline]
| __udp_gso_segment+0xaf4/0x1bc4 net/ipv4/udp_offload.c:332
| udp6_ufo_fragment+0x540/0xca0 net/ipv6/udp_offload.c:47
| ipv6_gso_segment+0x5cc/0x1760 net/ipv6/ip6_offload.c:119
| skb_mac_gso_segment+0x2b4/0x5b0 net/core/gro.c:141
| __skb_gso_segment+0x250/0x3d0 net/core/dev.c:3401
| skb_gso_segment include/linux/netdevice.h:4859 [inline]
| validate_xmit_skb+0x364/0xdbc net/core/dev.c:3659
| validate_xmit_skb_list+0x94/0x130 net/core/dev.c:3709
| sch_direct_xmit+0xe8/0x548 net/sched/sch_generic.c:327
| __dev_xmit_skb net/core/dev.c:3805 [inline]
| __dev_queue_xmit+0x147c/0x3318 net/core/dev.c:4210
| dev_queue_xmit include/linux/netdevice.h:3085 [inline]
| packet_xmit+0x6c/0x318 net/packet/af_packet.c:276
| packet_snd net/packet/af_packet.c:3081 [inline]
| packet_sendmsg+0x376c/0x4c98 net/packet/af_packet.c:3113
| sock_sendmsg_nosec net/socket.c:724 [inline]
| sock_sendmsg net/socket.c:747 [inline]
| __sys_sendto+0x3b4/0x538 net/socket.c:2144
Extend the early return to reject negative lengths as well, aligning our
implementation with the generic code in lib/checksum.c.
Cc: Robin Murphy <[email protected]>
Fixes: 5777eaed566a ("arm64: Implement optimised checksum routine")
Reported-by: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|