aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-09-08selftests: fix dependency checker scriptRicardo B. Marliere1-12/+65
This patch fixes inconsistencies in the parsing rules of the levels 1 and 2 of the kselftest_deps.sh. It was added the levels 4 and 5 to account for a few edge cases that are present in some tests, also some minor identation styling have been fixed (s/ /\t/g). Signed-off-by: Ricardo B. Marliere <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-09-08kselftest/runner.sh: Propagate SIGTERM to runner childBjörn Töpel1-1/+2
Timeouts in kselftest are done using the "timeout" command with the "--foreground" option. Without the "foreground" option, it is not possible for a user to cancel the runner using SIGINT, because the signal is not propagated to timeout which is running in a different process group. The "forground" options places the timeout in the same process group as its parent, but only sends the SIGTERM (on timeout) signal to the forked process. Unfortunately, this does not play nice with all kselftests, e.g. "net:fcnal-test.sh", where the child processes will linger because timeout does not send SIGTERM to the group. Some users have noted these hangs [1]. Fix this by nesting the timeout with an additional timeout without the foreground option. Link: https://lore.kernel.org/all/[email protected]/ # [1] Fixes: 651e0d881461 ("kselftest/runner: allow to properly deliver signals to tests") Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-09-08MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.orgBhaskar Chowdhury3-3/+1
The wiki has been archived and is not updated anymore. Remove or replace the links in files that contain it (MAINTAINERS, Kconfig, docs). Signed-off-by: Bhaskar Chowdhury <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08btrfs: assert delayed node locked when removing delayed itemFilipe Manana1-4/+8
When removing a delayed item, or releasing which will remove it as well, we will modify one of the delayed node's rbtrees and item counter if the delayed item is in one of the rbtrees. This require having the delayed node's mutex locked, otherwise we will race with other tasks modifying the rbtrees and the counter. This is motivated by a previous version of another patch actually calling btrfs_release_delayed_item() after unlocking the delayed node's mutex and against a delayed item that is in a rbtree. So assert at __btrfs_remove_delayed_item() that the delayed node's mutex is locked. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08btrfs: remove BUG() after failure to insert delayed dir index itemFilipe Manana1-27/+47
Instead of calling BUG() when we fail to insert a delayed dir index item into the delayed node's tree, we can just release all the resources we have allocated/acquired before and return the error to the caller. This is fine because all existing call chains undo anything they have done before calling btrfs_insert_delayed_dir_index() or BUG_ON (when creating pending snapshots in the transaction commit path). So remove the BUG() call and do proper error handling. This relates to a syzbot report linked below, but does not fix it because it only prevents hitting a BUG(), it does not fix the issue where somehow we attempt to use twice the same index number for different index items. Link: https://lore.kernel.org/linux-btrfs/[email protected]/ CC: [email protected] # 5.4+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08btrfs: improve error message after failure to add delayed dir index itemFilipe Manana1-3/+4
If we fail to add a delayed dir index item because there's already another item with the same index number, we print an error message (and then BUG). However that message isn't very helpful to debug anything because we don't know what's the index number and what are the values of index counters in the inode and its delayed inode (index_cnt fields of struct btrfs_inode and struct btrfs_delayed_node). So update the error message to include the index number and counters. We actually had a recent case where this issue was hit by a syzbot report (see the link below). Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08btrfs: fix a compilation error if DEBUG is defined in btree_dirty_folioQu Wenruo1-6/+8
[BUG] After commit 72a69cd03082 ("btrfs: subpage: pack all subpage bitmaps into a larger bitmap"), the DEBUG section of btree_dirty_folio() would no longer compile. [CAUSE] If DEBUG is defined, we would do extra checks for btree_dirty_folio(), mostly to make sure the range we marked dirty has an extent buffer and that extent buffer is dirty. For subpage, we need to iterate through all the extent buffers covered by that page range, and make sure they all matches the criteria. However commit 72a69cd03082 ("btrfs: subpage: pack all subpage bitmaps into a larger bitmap") changes how we store the bitmap, we pack all the 16 bits bitmaps into a larger bitmap, which would save some space. This means we no longer have btrfs_subpage::dirty_bitmap, instead the dirty bitmap is starting at btrfs_subpage_info::dirty_offset, and has a length of btrfs_subpage_info::bitmap_nr_bits. [FIX] Although I'm not sure if it still makes sense to maintain such code, at least let it compile. This patch would let us test the bits one by one through the bitmaps. CC: [email protected] # 6.1+ Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08btrfs: check for BTRFS_FS_ERROR in pending ordered assertJosef Bacik1-1/+1
If we do fast tree logging we increment a counter on the current transaction for every ordered extent we need to wait for. This means we expect the transaction to still be there when we clear pending on the ordered extent. However if we happen to abort the transaction and clean it up, there could be no running transaction, and thus we'll trip the "ASSERT(trans)" check. This is obviously incorrect, and the code properly deals with the case that the transaction doesn't exist. Fix this ASSERT() to only fire if there's no trans and we don't have BTRFS_FS_ERROR() set on the file system. CC: [email protected] # 4.14+ Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08btrfs: fix lockdep splat and potential deadlock after failure running ↵Filipe Manana1-3/+16
delayed items When running delayed items we are holding a delayed node's mutex and then we will attempt to modify a subvolume btree to insert/update/delete the delayed items. However if have an error during the insertions for example, btrfs_insert_delayed_items() may return with a path that has locked extent buffers (a leaf at the very least), and then we attempt to release the delayed node at __btrfs_run_delayed_items(), which requires taking the delayed node's mutex, causing an ABBA type of deadlock. This was reported by syzbot and the lockdep splat is the following: WARNING: possible circular locking dependency detected 6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0 Not tainted ------------------------------------------------------ syz-executor.2/13257 is trying to acquire lock: ffff88801835c0c0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 but task is already holding lock: ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (btrfs-tree-00){++++}-{3:3}: __lock_release kernel/locking/lockdep.c:5475 [inline] lock_release+0x36f/0x9d0 kernel/locking/lockdep.c:5781 up_write+0x79/0x580 kernel/locking/rwsem.c:1625 btrfs_tree_unlock_rw fs/btrfs/locking.h:189 [inline] btrfs_unlock_up_safe+0x179/0x3b0 fs/btrfs/locking.c:239 search_leaf fs/btrfs/ctree.c:1986 [inline] btrfs_search_slot+0x2511/0x2f80 fs/btrfs/ctree.c:2230 btrfs_insert_empty_items+0x9c/0x180 fs/btrfs/ctree.c:4376 btrfs_insert_delayed_item fs/btrfs/delayed-inode.c:746 [inline] btrfs_insert_delayed_items fs/btrfs/delayed-inode.c:824 [inline] __btrfs_commit_inode_delayed_items+0xd24/0x2410 fs/btrfs/delayed-inode.c:1111 __btrfs_run_delayed_items+0x1db/0x430 fs/btrfs/delayed-inode.c:1153 flush_space+0x269/0xe70 fs/btrfs/space-info.c:723 btrfs_async_reclaim_metadata_space+0x106/0x350 fs/btrfs/space-info.c:1078 process_one_work+0x92c/0x12c0 kernel/workqueue.c:2600 worker_thread+0xa63/0x1210 kernel/workqueue.c:2751 kthread+0x2b8/0x350 kernel/kthread.c:389 ret_from_fork+0x2e/0x60 arch/x86/kernel/process.c:145 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 -> #0 (&delayed_node->mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761 __mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603 __mutex_lock kernel/locking/mutex.c:747 [inline] mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799 __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline] __btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156 btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276 btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988 vfs_fsync_range fs/sync.c:188 [inline] vfs_fsync fs/sync.c:202 [inline] do_fsync fs/sync.c:212 [inline] __do_sys_fsync fs/sync.c:220 [inline] __se_sys_fsync fs/sync.c:218 [inline] __x64_sys_fsync+0x196/0x1e0 fs/sync.c:218 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-tree-00); lock(&delayed_node->mutex); lock(btrfs-tree-00); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by syz-executor.2/13257: #0: ffff88802c1ee370 (btrfs_trans_num_writers){++++}-{0:0}, at: spin_unlock include/linux/spinlock.h:391 [inline] #0: ffff88802c1ee370 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0xb87/0xe00 fs/btrfs/transaction.c:287 #1: ffff88802c1ee398 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0xbb2/0xe00 fs/btrfs/transaction.c:288 #2: ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198 stack backtrace: CPU: 0 PID: 13257 Comm: syz-executor.2 Not tainted 6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195 check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761 __mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603 __mutex_lock kernel/locking/mutex.c:747 [inline] mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799 __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline] __btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156 btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276 btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988 vfs_fsync_range fs/sync.c:188 [inline] vfs_fsync fs/sync.c:202 [inline] do_fsync fs/sync.c:212 [inline] __do_sys_fsync fs/sync.c:220 [inline] __se_sys_fsync fs/sync.c:218 [inline] __x64_sys_fsync+0x196/0x1e0 fs/sync.c:218 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f3ad047cae9 Code: 28 00 00 00 75 (...) RSP: 002b:00007f3ad12510c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 00007f3ad059bf80 RCX: 00007f3ad047cae9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005 RBP: 00007f3ad04c847a R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f3ad059bf80 R15: 00007ffe56af92f8 </TASK> ------------[ cut here ]------------ Fix this by releasing the path before releasing the delayed node in the error path at __btrfs_run_delayed_items(). Reported-by: [email protected] Link: https://lore.kernel.org/linux-btrfs/[email protected]/ CC: [email protected] # 4.14+ Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08btrfs: do not block starts waiting on previous transaction commitJosef Bacik4-20/+30
Internally I got a report of very long stalls on normal operations like creating a new file when auto relocation was running. The reporter used the 'bpf offcputime' tracer to show that we would get stuck in start_transaction for 5 to 30 seconds, and were always being woken up by the transaction commit. Using my timing-everything script, which times how long a function takes and what percentage of that total time is taken up by its children, I saw several traces like this 1083 took 32812902424 ns 29929002926 ns 91.2110% wait_for_commit_duration 25568 ns 7.7920e-05% commit_fs_roots_duration 1007751 ns 0.00307% commit_cowonly_roots_duration 446855602 ns 1.36182% btrfs_run_delayed_refs_duration 271980 ns 0.00082% btrfs_run_delayed_items_duration 2008 ns 6.1195e-06% btrfs_apply_pending_changes_duration 9656 ns 2.9427e-05% switch_commit_roots_duration 1598 ns 4.8700e-06% btrfs_commit_device_sizes_duration 4314 ns 1.3147e-05% btrfs_free_log_root_tree_duration Here I was only tracing functions that happen where we are between START_COMMIT and UNBLOCKED in order to see what would be keeping us blocked for so long. The wait_for_commit() we do is where we wait for a previous transaction that hasn't completed it's commit. This can include all of the unpin work and other cleanups, which tends to be the longest part of our transaction commit. There is no reason we should be blocking new things from entering the transaction at this point, it just adds to random latency spikes for no reason. Fix this by adding a PREP stage. This allows us to properly deal with multiple committers coming in at the same time, we retain the behavior that the winner waits on the previous transaction and the losers all wait for this transaction commit to occur. Nothing else is blocked during the PREP stage, and then once the wait is complete we switch to COMMIT_START and all of the same behavior as before is maintained. Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08btrfs: release path before inode lookup during the ino lookup ioctlFilipe Manana1-1/+7
During the ino lookup ioctl we can end up calling btrfs_iget() to get an inode reference while we are holding on a root's btree. If btrfs_iget() needs to lookup the inode from the root's btree, because it's not currently loaded in memory, then it will need to lock another or the same path in the same root btree. This may result in a deadlock and trigger the following lockdep splat: WARNING: possible circular locking dependency detected 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Not tainted ------------------------------------------------------ syz-executor277/5012 is trying to acquire lock: ffff88802df41710 (btrfs-tree-01){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136 but task is already holding lock: ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (btrfs-tree-00){++++}-{3:3}: down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645 __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136 btrfs_search_slot+0x13a4/0x2f80 fs/btrfs/ctree.c:2302 btrfs_init_root_free_objectid+0x148/0x320 fs/btrfs/disk-io.c:4955 btrfs_init_fs_root fs/btrfs/disk-io.c:1128 [inline] btrfs_get_root_ref+0x5ae/0xae0 fs/btrfs/disk-io.c:1338 btrfs_get_fs_root fs/btrfs/disk-io.c:1390 [inline] open_ctree+0x29c8/0x3030 fs/btrfs/disk-io.c:3494 btrfs_fill_super+0x1c7/0x2f0 fs/btrfs/super.c:1154 btrfs_mount_root+0x7e0/0x910 fs/btrfs/super.c:1519 legacy_get_tree+0xef/0x190 fs/fs_context.c:611 vfs_get_tree+0x8c/0x270 fs/super.c:1519 fc_mount fs/namespace.c:1112 [inline] vfs_kern_mount+0xbc/0x150 fs/namespace.c:1142 btrfs_mount+0x39f/0xb50 fs/btrfs/super.c:1579 legacy_get_tree+0xef/0x190 fs/fs_context.c:611 vfs_get_tree+0x8c/0x270 fs/super.c:1519 do_new_mount+0x28f/0xae0 fs/namespace.c:3335 do_mount fs/namespace.c:3675 [inline] __do_sys_mount fs/namespace.c:3884 [inline] __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (btrfs-tree-01){++++}-{3:3}: check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761 down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645 __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136 btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline] btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281 btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline] btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154 btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412 btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline] btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716 btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline] btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105 btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(btrfs-tree-00); lock(btrfs-tree-01); lock(btrfs-tree-00); rlock(btrfs-tree-01); *** DEADLOCK *** 1 lock held by syz-executor277/5012: #0: ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136 stack backtrace: CPU: 1 PID: 5012 Comm: syz-executor277 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195 check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761 down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645 __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136 btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline] btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281 btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline] btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154 btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412 btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline] btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716 btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline] btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105 btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f0bec94ea39 Fix this simply by releasing the path before calling btrfs_iget() as at point we don't need the path anymore. Reported-by: [email protected] Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Fixes: 23d0b79dfaed ("btrfs: Add unprivileged version of ino_lookup ioctl") CC: [email protected] # 4.19+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08btrfs: fix race between finishing block group creation and its item updateFilipe Manana1-2/+10
Commit 675dfe1223a6 ("btrfs: fix block group item corruption after inserting new block group") fixed one race that resulted in not persisting a block group's item when its "used" bytes field decreases to zero. However there's another race that can happen in a much shorter time window that results in the same problem. The following sequence of steps explains how it can happen: 1) Task A creates a metadata block group X, its "used" and "commit_used" fields are initialized to 0; 2) Two extents are allocated from block group X, so its "used" field is updated to 32K, and its "commit_used" field remains as 0; 3) Transaction commit starts, by some task B, and it enters btrfs_start_dirty_block_groups(). There it tries to update the block group item for block group X, which currently has its "used" field with a value of 32K and its "commit_used" field with a value of 0. However that fails since the block group item was not yet inserted, so at update_block_group_item(), the btrfs_search_slot() call returns 1, and then we set 'ret' to -ENOENT. Before jumping to the label 'fail'... 4) The block group item is inserted by task A, when for example btrfs_create_pending_block_groups() is called when releasing its transaction handle. This results in insert_block_group_item() inserting the block group item in the extent tree (or block group tree), with a "used" field having a value of 32K and setting "commit_used", in struct btrfs_block_group, to the same value (32K); 5) Task B jumps to the 'fail' label and then resets the "commit_used" field to 0. At btrfs_start_dirty_block_groups(), because -ENOENT was returned from update_block_group_item(), we add the block group again to the list of dirty block groups, so that we will try again in the critical section of the transaction commit when calling btrfs_write_dirty_block_groups(); 6) Later the two extents from block group X are freed, so its "used" field becomes 0; 7) If no more extents are allocated from block group X before we get into btrfs_write_dirty_block_groups(), then when we call update_block_group_item() again for block group X, we will not update the block group item to reflect that it has 0 bytes used, because the "used" and "commit_used" fields in struct btrfs_block_group have the same value, a value of 0. As a result after committing the transaction we have an empty block group with its block group item having a 32K value for its "used" field. This will trigger errors from fsck ("btrfs check" command) and after mounting again the fs, the cleaner kthread will not automatically delete the empty block group, since its "used" field is not 0. Possibly there are other issues due to this inconsistency. When this issue happens, the error reported by fsck is like this: [1/7] checking root items [2/7] checking extents block group [1104150528 1073741824] used 39796736 but extent items used 0 ERROR: errors found in extent allocation tree or chunk allocation (...) So fix this by not resetting the "commit_used" field of a block group when we don't find the block group item at update_block_group_item(). Fixes: 7248e0cebbef ("btrfs: skip update of block group item if used bytes are the same") CC: [email protected] # 6.2+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-09-08Revert "dma-contiguous: check for memory region overlap"Zhenhua Huang1-5/+0
This reverts commit 3fa6456ebe13adab3ba1817c8e515a5b88f95dce. The Commit broke the CMA region creation through DT on arm64, as showed below logs with "memblock=debug": [ 0.000000] memblock_phys_alloc_range: 41943040 bytes align=0x200000 from=0x0000000000000000 max_addr=0x00000000ffffffff early_init_dt_alloc_reserved_memory_arch+0x34/0xa0 [ 0.000000] memblock_reserve: [0x00000000fd600000-0x00000000ffdfffff] memblock_alloc_range_nid+0xc0/0x19c [ 0.000000] Reserved memory: overlap with other memblock reserved region >From call flow, region we defined in DT was always reserved before entering into rmem_cma_setup. Also, rmem_cma_setup has one routine cma_init_reserved_mem to ensure the region was reserved. Checking the region not reserved here seems not correct. early_init_fdt_scan_reserved_mem: fdt_scan_reserved_mem __reserved_mem_reserve_reg early_init_dt_reserve_memory memblock_reserve(using “reg” prop case) fdt_init_reserved_mem __reserved_mem_alloc_size *early_init_dt_alloc_reserved_memory_arch* memblock_reserve(dynamic alloc case) __reserved_mem_init_node rmem_cma_setup(region overlap check here should always fail) Example DT can be used to reproduce issue: dump_mem: mem_dump_region { compatible = "shared-dma-pool"; alloc-ranges = <0x0 0x00000000 0x0 0xffffffff>; reusable; size = <0 0x2800000>; }; Signed-off-by: Zhenhua Huang <[email protected]>
2023-09-08net: ipv4: fix one memleak in __inet_del_ifa()Liu Jian1-5/+5
I got the below warning when do fuzzing test: unregister_netdevice: waiting for bond0 to become free. Usage count = 2 It can be repoduced via: ip link add bond0 type bond sysctl -w net.ipv4.conf.bond0.promote_secondaries=1 ip addr add 4.117.174.103/0 scope 0x40 dev bond0 ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0 ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0 ip addr del 4.117.174.103/0 scope 0x40 dev bond0 ip link delete bond0 type bond In this reproduction test case, an incorrect 'last_prim' is found in __inet_del_ifa(), as a result, the secondary address(0.0.0.4/0 scope 0x40) is lost. The memory of the secondary address is leaked and the reference of in_device and net_device is leaked. Fix this problem: Look for 'last_prim' starting at location of the deleted IP and inserting the promoted IP into the location of 'last_prim'. Fixes: 0ff60a45678e ("[IPV4]: Fix secondary IP addresses after promotion") Signed-off-by: Liu Jian <[email protected]> Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-07Merge tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drmLinus Torvalds62-227/+684
Pull drm fixes from Dave Airlie: "Regular rounds of rc1 fixes, a large bunch for amdgpu since it's three weeks in one go, one i915, one nouveau and one ivpu. I think there might be a few more fixes in misc that I haven't pulled in yet, but we should get them all for rc2. amdgpu: - Display replay fixes - Fixes for headless boards - Fix documentation breakage - RAS fixes - Handle newer IP discovery tables - SMU 13.0.6 fixes - SR-IOV fixes - Display vstartup fixes - NBIO 7.9 fixes - Display scaling mode fixes - Debugfs power reporting fix - GC 9.4.3 fixes - Dirty framebuffer fixes for fbcon - eDP fixes - DCN 3.1.5 fix - Display ODM fixes - GPU core dump fix - Re-enable zops property now that IGT test is fixed - Fix possible UAF in CS code - Cursor degamma fix amdkfd: - HMM fixes - Interrupt masking fix - GFX11 MQD fixes i915: - Mark requests for GuC virtual engines to avoid use-after-free nouveau: - Fix fence state in nouveau_fence_emit() ivpu: - replace strncpy" * tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm: (51 commits) drm/amdgpu: Restrict bootloader wait to SMUv13.0.6 drm/amd/display: prevent potential division by zero errors drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1 Revert "drm/amd/display: Remove v_startup workaround for dcn3+" drm/amdgpu: fix amdgpu_cs_p1_user_fence Revert "Revert "drm/amd/display: Implement zpos property"" drm/amdkfd: Add missing gfx11 MQD manager callbacks drm/amdgpu: Free ras cmd input buffer properly drm/amdgpu: Hide xcp partition sysfs under SRIOV drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting drm/amdkfd: use mask to get v9 interrupt sq data bits correctly drm/amdgpu: Allocate coredump memory in a nonblocking way drm/amdgpu: Support query ecc cap for aqua_vanjaram drm/amdgpu: Add umc_info v4_0 structure drm/amd/display: always switch off ODM before committing more streams drm/amd/display: Remove wait while locked drm/amd/display: update blank state on ODM changes drm/amd/display: Add smu write msg id fail retry process drm/amdgpu: Add SMU v13.0.6 default reset methods ...
2023-09-07Merge tag 'net-6.6-rc1' of ↵Linus Torvalds118-410/+1405
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking updates from Jakub Kicinski: "Including fixes from netfilter and bpf. Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream" * tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs() Revert "net: team: do not use dynamic lockdep key" net: hns3: remove GSO partial feature bit net: hns3: fix the port information display when sfp is absent net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue net: hns3: fix debugfs concurrency issue between kfree buffer and read net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read() net: hns3: Support query tx timeout threshold by debugfs net: hns3: fix tx timeout issue net: phy: Provide Module 4 KSZ9477 errata (DS80000754C) netfilter: nf_tables: Unbreak audit log reset netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID netfilter: nfnetlink_osf: avoid OOB read netfilter: nftables: exthdr: fix 4-byte stack OOB write selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc bpf: bpf_sk_storage: Fix invalid wait context lockdep report s390/bpf: Pass through tail call counter in trampolines ...
2023-09-07Merge tag 'devicetree-fixes-for-6.6-1' of ↵Linus Torvalds9-132/+202
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull more devicetree updates from Rob Herring: "A couple of conversions which didn't get picked up by the subsystems and one fix: - Convert st,stih407-irq-syscfg and Omnivision OV7251 bindings to DT schema - Merge Omnivision OV5695 into OV5693 binding - Fix of_overlay_fdt_apply prototype when !CONFIG_OF_OVERLAY" * tag 'devicetree-fixes-for-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: irqchip: convert st,stih407-irq-syscfg to DT schema media: dt-bindings: Convert Omnivision OV7251 to DT schema media: dt-bindings: Merge OV5695 into OV5693 binding of: overlay: Fix of_overlay_fdt_apply prototype when !CONFIG_OF_OVERLAY
2023-09-07Merge tag 'pwm/for-6.6-rc1' of ↵Linus Torvalds40-290/+281
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "Various cleanups and fixes across the board" * tag 'pwm/for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (31 commits) pwm: lpc32xx: Remove handling of PWM channels pwm: atmel: Simplify using devm functions dt-bindings: pwm: brcm,kona-pwm: convert to YAML pwm: stmpe: Handle errors when disabling the signal pwm: stm32: Simplify using devm_pwmchip_add() pwm: stm32: Don't modify HW state in .remove() callback pwm: Fix order of freeing resources in pwmchip_remove() pwm: ntxec: Use device_set_of_node_from_dev() pwm: ntxec: Drop a write-only variable from driver data pwm: pxa: Don't reimplement of_device_get_match_data() pwm: lpc18xx-sct: Simplify using devm_clk_get_enabled() pwm: atmel-tcb: Don't track polarity in driver data pwm: atmel-tcb: Unroll atmel_tcb_pwm_set_polarity() into only caller pwm: atmel-tcb: Put per-channel data into driver data pwm: atmel-tcb: Fix resource freeing in error path and remove pwm: atmel-tcb: Harmonize resource allocation order pwm: Drop unused #include <linux/radix-tree.h> pwm: rz-mtu3: Fix build warning 'num_channel_ios' not described pwm: Remove outdated documentation for pwmchip_remove() pwm: atmel: Enable clk when pwm already enabled in bootloader ...
2023-09-08Merge tag 'amd-drm-fixes-6.6-2023-09-06' of ↵Dave Airlie50-180/+632
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-fixes-6.6-2023-09-06: amdgpu: - Display replay fixes - Fixes for headless boards - Fix documentation breakage - RAS fixes - Handle newer IP discovery tables - SMU 13.0.6 fixes - SR-IOV fixes - Display vstartup fixes - NBIO 7.9 fixes - Display scaling mode fixes - Debugfs power reporting fix - GC 9.4.3 fixes - Dirty framebuffer fixes for fbcon - eDP fixes - DCN 3.1.5 fix - Display ODM fixes - GPU core dump fix - Re-enable zops property now that IGT test is fixed - Fix possible UAF in CS code - Cursor degamma fix amdkfd: - HMM fixes - Interrupt masking fix - GFX11 MQD fixes Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-09-08Merge tag 'drm-intel-next-fixes-2023-08-31' of ↵Dave Airlie3-5/+6
git://anongit.freedesktop.org/drm/drm-intel into drm-next - Mark requests for GuC virtual engines to avoid use-after-free (Andrzej). Signed-off-by: Dave Airlie <[email protected]> From: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-09-07Merge tag 'rtc-6.6' of ↵Linus Torvalds59-616/+1334
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "Subsystem: - Add a way for drivers to tell the core the supported alarm range is smaller than the date range. This is not used yet but will be useful for the alarmtimers in the next release. - fix Wvoid-pointer-to-enum-cast warnings - remove redundant of_match_ptr() - stop warning for invalid alarms when the alarm is disabled Drivers: - isl12022: allow setting the trip level for battery level detection - pcf2127: add support for PCF2131 and multiple timestamps - stm32: time precision improvement, many fixes - twl: NVRAM support" * tag 'rtc-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (73 commits) dt-bindings: rtc: ds3231: Remove text binding rtc: wm8350: remove unnecessary messages rtc: twl: remove unnecessary messages rtc: sun6i: remove unnecessary message rtc: stop warning for invalid alarms when the alarm is disabled rtc: twl: add NVRAM support rtc: pcf85363: Allow to wake up system without IRQ rtc: m48t86: add DT support for m48t86 dt-bindings: rtc: Add ST M48T86 rtc: pcf2127: remove useless check rtc: rzn1: Report maximum alarm limit to rtc core rtc: ds1305: Report maximum alarm limit to rtc core rtc: tps6586x: Report maximum alarm limit to rtc core rtc: cmos: Report supported alarm limit to rtc infrastructure rtc: cros-ec: Detect and report supported alarm window size rtc: Add support for limited alarm timer offsets rtc: isl1208: Fix incorrect logic in isl1208_set_xtoscb() MAINTAINERS: remove obsolete pattern in RTC SUBSYSTEM section rtc: tps65910: Remove redundant dev_warn() and do not check for 0 return after calling platform_get_irq() rtc: omap: Do not check for 0 return after calling platform_get_irq() ...
2023-09-07Merge tag 'i3c/for-6.6' of ↵Linus Torvalds6-12/+32
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c updates from Alexandre Belloni: "Core: - Fix SETDASA when static and dynamic adress are equal - Fix cmd_v1 DAA exit criteria Drivers: - svc: allow probing without any device" * tag 'i3c/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c: master: svc: fix probe failure when no i3c device exist i3c: master: Fix SETDASA process dt-bindings: i3c: Fix description for assigned-address i3c: master: svc: Describe member 'saved_regs' i3c: master: svc: Do not check for 0 return after calling platform_get_irq() i3c/master: cmd_v1: Fix the exit criteria for the daa procedure i3c: Explicitly include correct DT includes
2023-09-07Merge tag 'regulator-fix-v6.6-merge-window' of ↵Linus Torvalds2-17/+16
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A couple of fixes that came in during the merge window, both driver specific - one for a bug that came up in testing, one for a bug due to a misreading of the datasheet" * tag 'regulator-fix-v6.6-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: tps6594-regulator: Fix random kernel crash regulator: tps6287x: Fix n_voltages
2023-09-07Merge tag 'spi-fix-v6.6-merge-window' of ↵Linus Torvalds1-2/+29
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple of fixes for the sun6i driver. The patch to reduce DMA RX to single byte width all the time is *hopefully* excessively cautious but it's unclear which SoCs are affected so the fix just covers everything for safety" * tag 'spi-fix-v6.6-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: sun6i: fix race between DMA RX transfer completion and RX FIFO drain spi: sun6i: reduce DMA RX transfer width to single byte
2023-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds177-3127/+8673
Pull kvm updates from Paolo Bonzini: "ARM: - Clean up vCPU targets, always returning generic v8 as the preferred target - Trap forwarding infrastructure for nested virtualization (used for traps that are taken from an L2 guest and are needed by the L1 hypervisor) - FEAT_TLBIRANGE support to only invalidate specific ranges of addresses when collapsing a table PTE to a block PTE. This avoids that the guest refills the TLBs again for addresses that aren't covered by the table PTE. - Fix vPMU issues related to handling of PMUver. - Don't unnecessary align non-stack allocations in the EL2 VA space - Drop HCR_VIRT_EXCP_MASK, which was never used... - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu parameter instead - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort() - Remove prototypes without implementations RISC-V: - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest - Added ONE_REG interface for SATP mode - Added ONE_REG interface to enable/disable multiple ISA extensions - Improved error codes returned by ONE_REG interfaces - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V - Added get-reg-list selftest for KVM RISC-V s390: - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya) x86: - Clean up KVM's handling of Intel architectural events - Intel bugfixes - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug registers and generate/handle #DBs - Clean up LBR virtualization code - Fix a bug where KVM fails to set the target pCPU during an IRTE update - Fix fatal bugs in SEV-ES intrahost migration - Fix a bug where the recent (architecturally correct) change to reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it) - Retry APIC map recalculation if a vCPU is added/enabled - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR cannot diverge from the default when TSC scaling is disabled up related code - Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID - Rip out the ancient MMU_DEBUG crud and replace the useful bits with CONFIG_KVM_PROVE_MMU - Fix KVM's handling of !visible guest roots to avoid premature triple fault injection - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface that is needed by external users (currently only KVMGT), and fix a variety of issues in the process Generic: - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers. - Drop unused function declarations Selftests: - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs - Add support for printf() in guest code and covert all guest asserts to use printf-based reporting - Clean up the PMU event filter test and add new testcases - Include x86 selftests in the KVM x86 MAINTAINERS entry" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits) KVM: x86/mmu: Include mmu.h in spte.h KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots KVM: x86/mmu: Disallow guest from using !visible slots for page tables KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page KVM: x86/mmu: Harden new PGD against roots without shadow pages KVM: x86/mmu: Add helper to convert root hpa to shadow page drm/i915/gvt: Drop final dependencies on KVM internal details KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers KVM: x86/mmu: Drop @slot param from exported/external page-track APIs KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled KVM: x86/mmu: Assert that correct locks are held for page write-tracking KVM: x86/mmu: Rename page-track APIs to reflect the new reality KVM: x86/mmu: Drop infrastructure for multiple page-track modes KVM: x86/mmu: Use page-track notifiers iff there are external users KVM: x86/mmu: Move KVM-only page-track declarations to internal header KVM: x86: Remove the unused page-track hook track_flush_slot() drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() KVM: x86: Add a new page-track hook to handle memslot deletion drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot KVM: x86: Reject memslot MOVE operations if KVMGT is attached ...
2023-09-07ring-buffer: Avoid softlockup in ring_buffer_resize()Zheng Yejian1-0/+2
When user resize all trace ring buffer through file 'buffer_size_kb', then in ring_buffer_resize(), kernel allocates buffer pages for each cpu in a loop. If the kernel preemption model is PREEMPT_NONE and there are many cpus and there are many buffer pages to be allocated, it may not give up cpu for a long time and finally cause a softlockup. To avoid it, call cond_resched() after each cpu buffer allocation. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: <[email protected]> Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-09-07tracing: Have event inject files inc the trace array ref countSteven Rostedt (Google)1-1/+2
The event inject files add events for a specific trace array. For an instance, if the file is opened and the instance is deleted, reading or writing to the file will cause a use after free. Up the ref count of the trace_array when a event inject file is opened. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Zheng Yejian <[email protected]> Fixes: 6c3edaf9fd6a ("tracing: Introduce trace event injection") Tested-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-09-07tracing: Have option files inc the trace array ref countSteven Rostedt (Google)1-1/+22
The option files update the options for a given trace array. For an instance, if the file is opened and the instance is deleted, reading or writing to the file will cause a use after free. Up the ref count of the trace_array when an option file is opened. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Zheng Yejian <[email protected]> Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()") Tested-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-09-07tracing: Have current_trace inc the trace array ref countSteven Rostedt (Google)1-1/+2
The current_trace updates the trace array tracer. For an instance, if the file is opened and the instance is deleted, reading or writing to the file will cause a use after free. Up the ref count of the trace array when current_trace is opened. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Zheng Yejian <[email protected]> Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()") Tested-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-09-07tracing: Have tracing_max_latency inc the trace array ref countSteven Rostedt (Google)1-5/+10
The tracing_max_latency file points to the trace_array max_latency field. For an instance, if the file is opened and the instance is deleted, reading or writing to the file will cause a use after free. Up the ref count of the trace_array when tracing_max_latency is opened. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Zheng Yejian <[email protected]> Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()") Tested-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-09-08Merge tag 'drm-misc-next-fixes-2023-09-01' of ↵Dave Airlie9-42/+46
git://anongit.freedesktop.org/drm/drm-misc into drm-next Short summary of fixes pull: * ivpu: Replace strncpy * nouveau: Fix fence state in nouveau_fence_emit() Signed-off-by: Dave Airlie <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/20230901070123.GA6987@linux-uq9g
2023-09-07tracing: Increase trace array ref count on enable and filter filesSteven Rostedt (Google)3-2/+33
When the trace event enable and filter files are opened, increment the trace array ref counter, otherwise they can be accessed when the trace array is being deleted. The ref counter keeps the trace array from being deleted while those files are opened. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()") Tested-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Reported-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-09-07tracefs/eventfs: Use dput to free the toplevel events directorySteven Rostedt (Google)3-8/+16
Currently when rmdir on an instance is done, eventfs_remove_events_dir() is called and it does a dput on the dentry and then frees the eventfs_inode that represents the events directory. But there's no protection against a reader reading the top level events directory at the same time and we can get a use after free error. Instead, use the dput() associated to the dentry to also free the eventfs_inode associated to the events directory, as that will get called when the last reference to the directory is released. This issue triggered the following KASAN report: ================================================================== BUG: KASAN: slab-use-after-free in eventfs_root_lookup+0x88/0x1b0 Read of size 8 at addr ffff888120130ca0 by task ftracetest/1201 CPU: 4 PID: 1201 Comm: ftracetest Not tainted 6.5.0-test-10737-g469e0a8194e7 #13 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x90 print_report+0xcf/0x670 ? __pfx_ring_buffer_record_off+0x10/0x10 ? _raw_spin_lock_irqsave+0x2b/0x70 ? __virt_addr_valid+0xd9/0x160 kasan_report+0xd4/0x110 ? eventfs_root_lookup+0x88/0x1b0 ? eventfs_root_lookup+0x88/0x1b0 eventfs_root_lookup+0x88/0x1b0 ? eventfs_root_lookup+0x33/0x1b0 __lookup_slow+0x194/0x2a0 ? __pfx___lookup_slow+0x10/0x10 ? down_read+0x11c/0x330 walk_component+0x166/0x220 link_path_walk.part.0.constprop.0+0x3a3/0x5a0 ? seqcount_lockdep_reader_access+0x82/0x90 ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10 path_openat+0x143/0x11f0 ? __lock_acquire+0xa1a/0x3220 ? __pfx_path_openat+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 do_filp_open+0x166/0x290 ? __pfx_do_filp_open+0x10/0x10 ? lock_is_held_type+0xce/0x120 ? preempt_count_sub+0xb7/0x100 ? _raw_spin_unlock+0x29/0x50 ? alloc_fd+0x1a0/0x320 do_sys_openat2+0x126/0x160 ? rcu_is_watching+0x34/0x60 ? __pfx_do_sys_openat2+0x10/0x10 ? __might_resched+0x2cf/0x3b0 ? __fget_light+0xdf/0x100 __x64_sys_openat+0xcd/0x140 ? __pfx___x64_sys_openat+0x10/0x10 ? syscall_enter_from_user_mode+0x22/0x90 ? lockdep_hardirqs_on+0x7d/0x100 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f1dceef5e51 Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d 9a 27 0e 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 93 00 00 00 48 8b 54 24 28 64 48 2b 14 25 RSP: 002b:00007fff2cddf380 EFLAGS: 00000202 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000241 RCX: 00007f1dceef5e51 RDX: 0000000000000241 RSI: 000055d7520677d0 RDI: 00000000ffffff9c RBP: 000055d7520677d0 R08: 000000000000001e R09: 0000000000000001 R10: 00000000000001b6 R11: 0000000000000202 R12: 0000000000000000 R13: 0000000000000003 R14: 000055d752035678 R15: 000055d752067788 </TASK> Allocated by task 1200: kasan_save_stack+0x2f/0x50 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x8b/0x90 eventfs_create_events_dir+0x54/0x220 create_event_toplevel_files+0x42/0x130 event_trace_add_tracer+0x33/0x180 trace_array_create_dir+0x52/0xf0 trace_array_create+0x361/0x410 instance_mkdir+0x6b/0xb0 tracefs_syscall_mkdir+0x57/0x80 vfs_mkdir+0x275/0x380 do_mkdirat+0x1da/0x210 __x64_sys_mkdir+0x74/0xa0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Freed by task 1251: kasan_save_stack+0x2f/0x50 kasan_set_track+0x21/0x30 kasan_save_free_info+0x27/0x40 __kasan_slab_free+0x106/0x180 __kmem_cache_free+0x149/0x2e0 event_trace_del_tracer+0xcb/0x120 __remove_instance+0x16a/0x340 instance_rmdir+0x77/0xa0 tracefs_syscall_rmdir+0x77/0xc0 vfs_rmdir+0xed/0x2d0 do_rmdir+0x235/0x280 __x64_sys_rmdir+0x5f/0x90 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 The buggy address belongs to the object at ffff888120130ca0 which belongs to the cache kmalloc-16 of size 16 The buggy address is located 0 bytes inside of freed 16-byte region [ffff888120130ca0, ffff888120130cb0) The buggy address belongs to the physical page: page:000000004dbddbb0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x120130 flags: 0x17ffffc0000800(slab|node=0|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0017ffffc0000800 ffff8881000423c0 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000800080 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888120130b80: 00 00 fc fc 00 05 fc fc 00 00 fc fc 00 02 fc fc ffff888120130c00: 00 07 fc fc 00 00 fc fc 00 00 fc fc fa fb fc fc >ffff888120130c80: 00 00 fc fc fa fb fc fc 00 00 fc fc 00 00 fc fc ^ ffff888120130d00: 00 00 fc fc 00 00 fc fc 00 00 fc fc fa fb fc fc ffff888120130d80: 00 00 fc fc 00 00 fc fc 00 00 fc fc 00 00 fc fc ================================================================== Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: Ajay Kaher <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 5bdcd5f5331a2 eventfs: ("Implement removal of meta data from eventfs") Tested-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Reported-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-09-07jbd2: Remove page size assumptionsRitesh Harjani (IBM)2-18/+10
jbd2_alloc() allocates a buffer from slab when the block size is smaller than PAGE_SIZE, and slab may be using a compound page. Before commit 8147c4c4546f, we set b_page to the precise page containing the buffer and this code worked well. Now we set b_page to the head page of the allocation, so we can no longer use offset_in_page(). While we could do a 1:1 replacement with offset_in_folio(), use the more idiomatic bh_offset() and the folio APIs to map the buffer. This isn't enough to support a b_size larger than PAGE_SIZE on HIGHMEM machines, but this is good enough to fix the actual bug we're seeing. Fixes: 8147c4c4546f ("jbd2: use a folio in jbd2_journal_write_metadata_buffer()") Reported-by: Zorro Lang <[email protected]> Signed-off-by: Ritesh Harjani (IBM) <[email protected]> [converted to be more folio] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2023-09-07buffer: Make bh_offset() work for compound pagesMatthew Wilcox (Oracle)1-1/+4
If the buffer pointed to by the buffer_head is part of a compound page, bh_offset() assumes that b_page is the precise page that contains the data. A recent change to jbd2 inadvertently violated that assumption. By using page_size(), we support both b_page being set to the head page (as page_size() will return the size of the entire folio) and the precise page (as it will return PAGE_SIZE for a tail page). Fixes: 8147c4c4546f ("jbd2: use a folio in jbd2_journal_write_metadata_buffer()") Reported-by: Zorro Lang <[email protected]> Tested-by: Ritesh Harjani (IBM) <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2023-09-07net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()Vladimir Oltean1-1/+1
enetc_psi_create() returns an ERR_PTR() or a valid station interface pointer, but checking for the non-NULL quality of the return code blurs that difference away. So if enetc_psi_create() fails, we call enetc_psi_destroy() when we shouldn't. This will likely result in crashes, since enetc_psi_create() cleans up everything after itself when it returns an ERR_PTR(). Fixes: f0168042a212 ("net: enetc: reimplement RFS/RSS memory clearing as PCI quirk") Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Suggested-by: Dan Carpenter <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-09-07Revert "net: team: do not use dynamic lockdep key"Jakub Kicinski3-85/+60
This reverts commit 39285e124edbc752331e98ace37cc141a6a3747a. Looks like the change has unintended consequences in exposing objects before they are initialized. Let's drop this patch and try again in net-next. Reported-by: [email protected] Fixes: 39285e124edb ("net: team: do not use dynamic lockdep key") Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-09-07Merge tag 's390-6.6-2' of ↵Linus Torvalds21-223/+98
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - A couple of virtual vs physical address confusion fixes - Rework locking in dcssblk driver to address a lockdep warning - Remove support for "noexec" kernel command line option since there is no use case where it would make sense - Simplify kernel mapping setup and get rid of quite a bit of code - Add architecture specific __set_memory_yy() functions which allow us to modify kernel mappings. Unlike the set_memory_xx() variants they take void pointer start and end parameters, which allows using them without the usual casts, and also to use them on areas larger than 8TB. Note that the set_memory_xx() family comes with an int num_pages parameter which overflows with 8TB. This could be addressed by changing the num_pages parameter to unsigned long, however requires to change all architectures, since the module code expects an int parameter (see module_set_memory()). This was indeed an issue since for debug_pagealloc() we call set_memory_4k() on the whole identity mapping. Therefore address this for now with the __set_memory_yy() variant, and address common code later - Use dev_set_name() and also fix memory leak in zcrypt driver error handling - Remove unused lsi_mask from airq_struct - Add warning for invalid kernel mapping requests * tag 's390-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/vmem: do not silently ignore mapping limit s390/zcrypt: utilize dev_set_name() ability to use a formatted string s390/zcrypt: don't leak memory if dev_set_name() fails s390/mm: fix MAX_DMA_ADDRESS physical vs virtual confusion s390/airq: remove lsi_mask from airq_struct s390/mm: use __set_memory() variants where useful s390/set_memory: add __set_memory() variant s390/set_memory: generate all set_memory() functions s390/mm: improve description of mapping permissions of prefix pages s390/amode31: change type of __samode31, __eamode31, etc s390/mm: simplify kernel mapping setup s390: remove "noexec" option s390/vmem: fix virtual vs physical address confusion s390/dcssblk: fix lockdep warning s390/monreader: fix virtual vs physical address confusion
2023-09-07Merge tag 'mips_6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds44-252/+149
Pull MIPS updates from Thomas Bogendoerfer: "Just cleanups and fixes" * tag 'mips_6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: TXx9: Do PCI error checks on own line arch/mips/configs/*_defconfig cleanup MIPS: VDSO: Conditionally export __vdso_gettimeofday() Mips: loongson3_defconfig: Enable ast drm driver by default mips: remove <asm/export.h> mips: replace #include <asm/export.h> with #include <linux/export.h> mips: remove unneeded #include <asm/export.h> MIPS: Loongson64: Fix more __iomem attributes MIPS: loongson32: Remove regs-rtc.h MIPS: loongson32: Remove regs-clk.h MIPS: More explicit DT include clean-ups MIPS: Fixup explicit DT include clean-up Revert MIPS: Loongson: Fix build error when make modules_install MIPS: Only fiddle with CHECKFLAGS if `need-compiler' MIPS: Fix CONFIG_CPU_DADDI_WORKAROUNDS `modules_install' regression MIPS: Explicitly include correct DT includes
2023-09-07Merge tag 'xtensa-20230905' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds7-5/+56
Pull xtensa updates from Max Filippov: - enable MTD XIP support - fix base address of the xtensa perf module in newer hardware * tag 'xtensa-20230905' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: add XIP-aware MTD support xtensa: PMU: fix base address for the newer hardware
2023-09-07ntfs3: drop inode references in ntfs_put_super()Christian Brauner1-6/+12
Recently we moved most cleanup from ntfs_put_super() into ntfs3_kill_sb() as part of a bigger cleanup. This accidently also moved dropping inode references stashed in ntfs3's sb->s_fs_info from @sb->put_super() to @sb->kill_sb(). But generic_shutdown_super() verifies that there are no busy inodes past sb->put_super(). Fix this and disentangle dropping inode references from freeing @sb->s_fs_info. Fixes: a4f64a300a29 ("ntfs3: free the sbi in ->kill_sb") # mainline only Reported-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2023-09-07vfs: mostly undo glibc turning 'fstat()' into 'fstatat(AT_EMPTY_PATH)'Linus Torvalds1-0/+17
Mateusz reports that glibc turns 'fstat()' calls into 'fstatat()', and that seems to have been going on for quite a long time due to glibc having tried to simplify its stat logic into just one point. This turns out to cause completely unnecessary overhead, where we then go off and allocate the kernel side pathname, and actually look up the empty path. Sure, our path lookup is quite optimized, but it still causes a fair bit of allocation overhead and a couple of completely unnecessary rounds of lockref accesses etc. This is all hopefully getting fixed in user space, and there is a patch floating around for just having glibc use the native fstat() system call. But even with the current situation we can at least improve on things by catching the situation and short-circuiting it. Note that this is still measurably slower than just a plain 'fstat()', since just checking that the filename is actually empty is somewhat expensive due to inevitable user space access overhead from the kernel (ie verifying pointers, and SMAP on x86). But it's still quite a bit faster than actually looking up the path for real. To quote numers from Mateusz: "Sapphire Rapids, will-it-scale, ops/s stock fstat 5088199 patched fstat 7625244 (+49%) real fstat 8540383 (+67% / +12%)" where that 'stock fstat' is the glibc translation of fstat into fstatat() with an empty path, the 'patched fstat' is with this short circuiting of the path lookup, and the 'real fstat' is the actual native fstat() system call with none of this overhead. Link: https://lore.kernel.org/lkml/20230903204858.lv7i3kqvw6eamhgz@f/ Reported-by: Mateusz Guzik <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2023-09-07drm/radeon: make fence wait in suballocator uninterrruptableAlex Deucher1-1/+1
Commit 254986e324ad ("drm/radeon: Use the drm suballocation manager implementation.") made the fence wait in amdgpu_sa_bo_new() interruptible but there is no code to handle an interrupt. This caused the kernel to randomly explode in high-VRAM-pressure situations so make it uninterruptible again. Fixes: 254986e324ad ("drm/radeon: Use the drm suballocation manager implementation.") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2769 Signed-off-by: Alex Deucher <[email protected]> CC: [email protected] # 6.4+ CC: Simon Pilkington <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
2023-09-07Revert "io_uring: fix IO hang in io_wq_put_and_exit from do_exit()"Jens Axboe1-32/+0
This reverts commit b484a40dc1f16edb58e5430105a021e1916e6f27. This commit cancels all requests with io-wq, not just the ones from the originating task. This breaks use cases that have thread pools, or just multiple tasks issuing requests on the same ring. The liburing regression test for this also shows that problem: $ test/thread-exit.t cqe->res=-125, Expected 512 where an IO thread gets its request canceled rather than complete successfully. Signed-off-by: Jens Axboe <[email protected]>
2023-09-07io_uring: fix unprotected iopoll overflowPavel Begunkov1-2/+2
[ 71.490669] WARNING: CPU: 3 PID: 17070 at io_uring/io_uring.c:769 io_cqring_event_overflow+0x47b/0x6b0 [ 71.498381] Call Trace: [ 71.498590] <TASK> [ 71.501858] io_req_cqe_overflow+0x105/0x1e0 [ 71.502194] __io_submit_flush_completions+0x9f9/0x1090 [ 71.503537] io_submit_sqes+0xebd/0x1f00 [ 71.503879] __do_sys_io_uring_enter+0x8c5/0x2380 [ 71.507360] do_syscall_64+0x39/0x80 We decoupled CQ locking from ->task_complete but haven't fixed up places forcing locking for CQ overflows. Fixes: ec26c225f06f5 ("io_uring: merge iopoll and normal completion paths") Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2023-09-07io_uring: break out of iowq iopoll on teardownPavel Begunkov3-0/+13
io-wq will retry iopoll even when it failed with -EAGAIN. If that races with task exit, which sets TIF_NOTIFY_SIGNAL for all its workers, such workers might potentially infinitely spin retrying iopoll again and again and each time failing on some allocation / waiting / etc. Don't keep spinning if io-wq is dying. Fixes: 561fb04a6a225 ("io_uring: replace workqueue usage with io-wq") Cc: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2023-09-07Revert "printk: export symbols for debug modules"Christoph Hellwig1-2/+0
This reverts commit 3e00123a13d824d63072b1824c9da59cd78356d9. No, we never export random symbols for out of tree modules. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Petr Mladek <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-09-07Merge tag 'asoc-fix-v6.6-merge-window' of ↵Takashi Iwai11-31/+96
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.6 A bunch of fixes and new IDs that came in since the initial pull request - all driver specific and nothing too exciting. There's a trivial conflict in the AMD driver ID table due to the last v6.5 fixes not having been merged up.
2023-09-07Merge tag 'nf-23-09-06' of ↵Paolo Abeni6-15/+36
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter updates for net This PR contains nf_tables updates for your *net* tree. This time almost all fixes are for old bugs: First patch fixes a 4-byte stack OOB write, from myself. This was broken ever since nftables was switches from 128 to 32bit register addressing in v4.1. 2nd patch fixes an out-of-bounds read. This has been broken ever since xt_osf got added in 2.6.31, the bug was then just moved around during refactoring, from Wander Lairson Costa. 3rd patch adds a missing enum description, from Phil Sutter. 4th patch fixes a UaF inftables that occurs when userspace adds elements with a timeout so small that expiration happens while the transaction is still in progress. Fix from Pablo Neira Ayuso. Patch 5 fixes a memory out of bounds access, this was broken since v4.20. Patch from Kyle Zeng and Jozsef Kadlecsik. Patch 6 fixes another bogus memory access when building audit record. Bug added in the previous pull request, fix from Pablo. netfilter pull request 2023-09-06 * tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: Unbreak audit log reset netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID netfilter: nfnetlink_osf: avoid OOB read netfilter: nftables: exthdr: fix 4-byte stack OOB write ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-09-07arm64: csum: Fix OoB access in IP checksum code for negative lengthsWill Deacon1-1/+1
Although commit c2c24edb1d9c ("arm64: csum: Fix pathological zero-length calls") added an early return for zero-length input, syzkaller has popped up with an example of a _negative_ length which causes an undefined shift and an out-of-bounds read: | BUG: KASAN: slab-out-of-bounds in do_csum+0x44/0x254 arch/arm64/lib/csum.c:39 | Read of size 4294966928 at addr ffff0000d7ac0170 by task syz-executor412/5975 | | CPU: 0 PID: 5975 Comm: syz-executor412 Not tainted 6.4.0-rc4-syzkaller-g908f31f2a05b #0 | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 | Call trace: | dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:233 | show_stack+0x2c/0x44 arch/arm64/kernel/stacktrace.c:240 | __dump_stack lib/dump_stack.c:88 [inline] | dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106 | print_address_description mm/kasan/report.c:351 [inline] | print_report+0x174/0x514 mm/kasan/report.c:462 | kasan_report+0xd4/0x130 mm/kasan/report.c:572 | kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:187 | __kasan_check_read+0x20/0x30 mm/kasan/shadow.c:31 | do_csum+0x44/0x254 arch/arm64/lib/csum.c:39 | csum_partial+0x30/0x58 lib/checksum.c:128 | gso_make_checksum include/linux/skbuff.h:4928 [inline] | __udp_gso_segment+0xaf4/0x1bc4 net/ipv4/udp_offload.c:332 | udp6_ufo_fragment+0x540/0xca0 net/ipv6/udp_offload.c:47 | ipv6_gso_segment+0x5cc/0x1760 net/ipv6/ip6_offload.c:119 | skb_mac_gso_segment+0x2b4/0x5b0 net/core/gro.c:141 | __skb_gso_segment+0x250/0x3d0 net/core/dev.c:3401 | skb_gso_segment include/linux/netdevice.h:4859 [inline] | validate_xmit_skb+0x364/0xdbc net/core/dev.c:3659 | validate_xmit_skb_list+0x94/0x130 net/core/dev.c:3709 | sch_direct_xmit+0xe8/0x548 net/sched/sch_generic.c:327 | __dev_xmit_skb net/core/dev.c:3805 [inline] | __dev_queue_xmit+0x147c/0x3318 net/core/dev.c:4210 | dev_queue_xmit include/linux/netdevice.h:3085 [inline] | packet_xmit+0x6c/0x318 net/packet/af_packet.c:276 | packet_snd net/packet/af_packet.c:3081 [inline] | packet_sendmsg+0x376c/0x4c98 net/packet/af_packet.c:3113 | sock_sendmsg_nosec net/socket.c:724 [inline] | sock_sendmsg net/socket.c:747 [inline] | __sys_sendto+0x3b4/0x538 net/socket.c:2144 Extend the early return to reject negative lengths as well, aligning our implementation with the generic code in lib/checksum.c Cc: Robin Murphy <[email protected]> Fixes: 5777eaed566a ("arm64: Implement optimised checksum routine") Reported-by: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>