path: root/fs
Age / Commit message / Author / Files / Lines
2023-02-09btrfs: lock the inode in shared mode before starting fiemapFilipe Manana1-0/+2
Currently fiemap does not take the inode's lock (VFS lock), it only locks a file range in the inode's io tree. This however can lead to a deadlock if we have a concurrent fsync on the file and fiemap code triggers a fault when accessing the user space buffer with fiemap_fill_next_extent(). The deadlock happens on the inode's i_mmap_lock semaphore, which is taken both by fsync and btrfs_page_mkwrite(). This deadlock was recently reported by syzbot and triggers a trace like the following: task:syz-executor361 state:D stack:20264 pid:5668 ppid:5119 flags:0x00004004 Call Trace: <TASK> context_switch kernel/sched/core.c:5293 [inline] __schedule+0x995/0xe20 kernel/sched/core.c:6606 schedule+0xcb/0x190 kernel/sched/core.c:6682 wait_on_state fs/btrfs/extent-io-tree.c:707 [inline] wait_extent_bit+0x577/0x6f0 fs/btrfs/extent-io-tree.c:751 lock_extent+0x1c2/0x280 fs/btrfs/extent-io-tree.c:1742 find_lock_delalloc_range+0x4e6/0x9c0 fs/btrfs/extent_io.c:488 writepage_delalloc+0x1ef/0x540 fs/btrfs/extent_io.c:1863 __extent_writepage+0x736/0x14e0 fs/btrfs/extent_io.c:2174 extent_write_cache_pages+0x983/0x1220 fs/btrfs/extent_io.c:3091 extent_writepages+0x219/0x540 fs/btrfs/extent_io.c:3211 do_writepages+0x3c3/0x680 mm/page-writeback.c:2581 filemap_fdatawrite_wbc+0x11e/0x170 mm/filemap.c:388 __filemap_fdatawrite_range mm/filemap.c:421 [inline] filemap_fdatawrite_range+0x175/0x200 mm/filemap.c:439 btrfs_fdatawrite_range fs/btrfs/file.c:3850 [inline] start_ordered_ops fs/btrfs/file.c:1737 [inline] btrfs_sync_file+0x4ff/0x1190 fs/btrfs/file.c:1839 generic_write_sync include/linux/fs.h:2885 [inline] btrfs_do_write_iter+0xcd3/0x1280 fs/btrfs/file.c:1684 call_write_iter include/linux/fs.h:2189 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x7dc/0xc50 fs/read_write.c:584 ksys_write+0x177/0x2a0 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 
0033:0x7f7d4054e9b9 RSP: 002b:00007f7d404fa2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f7d405d87a0 RCX: 00007f7d4054e9b9 RDX: 0000000000000090 RSI: 0000000020000000 RDI: 0000000000000006 RBP: 00007f7d405a51d0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 61635f65646f6e69 R13: 65646f7475616f6e R14: 7261637369646f6e R15: 00007f7d405d87a8 </TASK> INFO: task syz-executor361:5697 blocked for more than 145 seconds. Not tainted 6.2.0-rc3-syzkaller-00376-g7c6984405241 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor361 state:D stack:21216 pid:5697 ppid:5119 flags:0x00004004 Call Trace: <TASK> context_switch kernel/sched/core.c:5293 [inline] __schedule+0x995/0xe20 kernel/sched/core.c:6606 schedule+0xcb/0x190 kernel/sched/core.c:6682 rwsem_down_read_slowpath+0x5f9/0x930 kernel/locking/rwsem.c:1095 __down_read_common+0x54/0x2a0 kernel/locking/rwsem.c:1260 btrfs_page_mkwrite+0x417/0xc80 fs/btrfs/inode.c:8526 do_page_mkwrite+0x19e/0x5e0 mm/memory.c:2947 wp_page_shared+0x15e/0x380 mm/memory.c:3295 handle_pte_fault mm/memory.c:4949 [inline] __handle_mm_fault mm/memory.c:5073 [inline] handle_mm_fault+0x1b79/0x26b0 mm/memory.c:5219 do_user_addr_fault+0x69b/0xcb0 arch/x86/mm/fault.c:1428 handle_page_fault arch/x86/mm/fault.c:1519 [inline] exc_page_fault+0x7a/0x110 arch/x86/mm/fault.c:1575 asm_exc_page_fault+0x22/0x30 arch/x86/include/asm/idtentry.h:570 RIP: 0010:copy_user_short_string+0xd/0x40 arch/x86/lib/copy_user_64.S:233 Code: 74 0a 89 (...) 
RSP: 0018:ffffc9000570f330 EFLAGS: 00050202 RAX: ffffffff843e6601 RBX: 00007fffffffefc8 RCX: 0000000000000007 RDX: 0000000000000000 RSI: ffffc9000570f3e0 RDI: 0000000020000120 RBP: ffffc9000570f490 R08: 0000000000000000 R09: fffff52000ae1e83 R10: fffff52000ae1e83 R11: 1ffff92000ae1e7c R12: 0000000000000038 R13: ffffc9000570f3e0 R14: 0000000020000120 R15: ffffc9000570f3e0 copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline] raw_copy_to_user arch/x86/include/asm/uaccess_64.h:58 [inline] _copy_to_user+0xe9/0x130 lib/usercopy.c:34 copy_to_user include/linux/uaccess.h:169 [inline] fiemap_fill_next_extent+0x22e/0x410 fs/ioctl.c:144 emit_fiemap_extent+0x22d/0x3c0 fs/btrfs/extent_io.c:3458 fiemap_process_hole+0xa00/0xad0 fs/btrfs/extent_io.c:3716 extent_fiemap+0xe27/0x2100 fs/btrfs/extent_io.c:3922 btrfs_fiemap+0x172/0x1e0 fs/btrfs/inode.c:8209 ioctl_fiemap fs/ioctl.c:219 [inline] do_vfs_ioctl+0x185b/0x2980 fs/ioctl.c:810 __do_sys_ioctl fs/ioctl.c:868 [inline] __se_sys_ioctl+0x83/0x170 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f7d4054e9b9 RSP: 002b:00007f7d390d92f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f7d405d87b0 RCX: 00007f7d4054e9b9 RDX: 0000000020000100 RSI: 00000000c020660b RDI: 0000000000000005 RBP: 00007f7d405a51d0 R08: 00007f7d390d9700 R09: 0000000000000000 R10: 00007f7d390d9700 R11: 0000000000000246 R12: 61635f65646f6e69 R13: 65646f7475616f6e R14: 7261637369646f6e R15: 00007f7d405d87b8 </TASK> What happens is the following: 1) Task A is doing an fsync, enters btrfs_sync_file() and flushes delalloc before locking the inode and the i_mmap_lock semaphore, that is, before calling btrfs_inode_lock(); 2) After task A flushes delalloc and before it calls btrfs_inode_lock(), another task dirties a page; 3) Task B starts a fiemap without FIEMAP_FLAG_SYNC, so the page dirtied at step 2 remains dirty 
and unflushed. Then it enters extent_fiemap() and locks a file range that includes the range of the page dirtied in step 2;
4) Task A calls btrfs_inode_lock() and locks the inode (VFS lock) and the inode's i_mmap_lock semaphore in write mode. Then it tries to flush delalloc by calling start_ordered_ops(), which will block, at find_lock_delalloc_range(), when trying to lock the range of the page dirtied at step 2, since this range was locked by the fiemap task (at step 3);
5) Task B generates a page fault when accessing the user space fiemap buffer with a call to fiemap_fill_next_extent(). The fault handler needs to call btrfs_page_mkwrite() for some other page of our inode, and there we deadlock when trying to lock the inode's i_mmap_lock semaphore in read mode, since the fsync task locked it in write mode (step 4) and the fsync task cannot progress because it's waiting to lock a file range that is currently locked by us (the fiemap task, step 3).
Fix this by taking the inode's lock (VFS lock) in shared mode when entering fiemap. This effectively serializes fiemap with fsync (except the most expensive part of fsync, the log sync), preventing this deadlock.
Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
CC: [email protected] # 6.1+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2023-02-09ext4: don't show commit interval if it is zeroWang Jianjian1-1/+1
A commit interval of 0 means the default value is in use, so do not show it. Fixes: 6e47a3cc68fc ("ext4: get rid of super block and sbi from handle_mount_ops()") Signed-off-by: Wang Jianjian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-02-09ext4: use ext4_fc_tl_mem in fast-commit replay pathEric Biggers1-18/+26
To avoid 'sparse' warnings about missing endianness conversions, don't store native endianness values into struct ext4_fc_tl. Instead, use a separate struct type, ext4_fc_tl_mem. Fixes: dcc5827484d6 ("ext4: factor out ext4_fc_get_tl()") Cc: Ye Bin <[email protected]> Signed-off-by: Eric Biggers <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
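The idea of keeping on-disk and in-memory representations in separate types can be sketched in plain userspace C. This is only an illustration: the names tl_disk, tl_mem and tl_get are hypothetical stand-ins, the field layout (a 16-bit tag followed by a 16-bit length, both little-endian on disk) is an assumption, and the kernel code uses its own endian-annotated types and helpers instead of manual byte decoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed on-disk layout: both fields are stored little-endian. */
struct tl_disk {
    uint8_t tag[2];
    uint8_t len[2];
};

/* In-memory form: fields are plain host-endian integers. */
struct tl_mem {
    uint16_t tag;
    uint16_t len;
};

/* Decode a 16-bit little-endian value byte by byte (host-endian safe). */
static uint16_t le16_decode(const uint8_t b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Read one tag/length header from a raw buffer into the in-memory type. */
static void tl_get(struct tl_mem *out, const uint8_t *buf)
{
    struct tl_disk raw;

    assert(out != NULL && buf != NULL);
    memcpy(&raw, buf, sizeof(raw)); /* buf may be unaligned */
    out->tag = le16_decode(raw.tag);
    out->len = le16_decode(raw.len);
}
```

Because host-endian values only ever live in tl_mem, a checker (or a reviewer) can tell from the type alone whether a value still needs conversion.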
2023-02-09ext4: improve xattr consistency checking and error reportingTheodore Ts'o1-46/+80
Refactor the in-inode and xattr block consistency checking, and produce more fine-grained reports of the consistency problems. Also add more consistency checks for ea_inode number. Reviewed-by: Andreas Dilger <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-02-09udf: Avoid directory type conversion failure due to ENOMEMJan Kara1-3/+6
When converting a directory from in-ICB to normal format, the last iteration through the directory fixing up directory entries can fail due to ENOMEM. We do not expect this iteration to fail since the directory is already verified to be correct and it is difficult to undo the conversion at this point. So just use GFP_NOFAIL to make sure the small allocation cannot fail. Reported-by: [email protected] Fixes: 0aba4860b0d0 ("udf: Allocate name buffer in directory iterator on heap") Signed-off-by: Jan Kara <[email protected]>
2023-02-08coda: Avoid partial allocation of sig_inputArgsKees Cook1-1/+1
GCC does not like having a partially allocated object, since it cannot reason about it for bounds checking when it is passed to other code. Instead, fully allocate sig_inputArgs. (Alternatively, sig_inputArgs should be defined as a struct coda_in_hdr, if it is actually not using any other part of the union.) Seen under GCC 13: ../fs/coda/upcall.c: In function 'coda_upcall': ../fs/coda/upcall.c:801:22: warning: array subscript 'union inputArgs[0]' is partly outside array bounds of 'unsigned char[20]' [-Warray-bounds=] 801 | sig_inputArgs->ih.opcode = CODA_SIGNAL; | ^~ Cc: Jan Harkes <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-07fscrypt: clean up fscrypt_add_test_dummy_key()Eric Biggers3-22/+11
Now that fscrypt_add_test_dummy_key() is only called by setup_file_encryption_key() and not by the individual filesystems, un-export it. Also change its prototype to take the fscrypt_key_specifier directly, as the caller already has it. Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-07fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()Eric Biggers1-1/+0
Now that the key associated with the "test_dummy_encryption" mount option is added on-demand when it's needed, rather than immediately when the filesystem is mounted, fscrypt_destroy_keyring() no longer needs to be called from __put_super() to avoid a memory leak on mount failure. Remove this call, which was causing confusion because it appeared to be a sleep-in-atomic bug (though it wasn't, for a somewhat-subtle reason). Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-07f2fs: stop calling fscrypt_add_test_dummy_key()Eric Biggers1-6/+0
Now that fs/crypto/ adds the test dummy encryption key on-demand when it's needed, there's no need for individual filesystems to call fscrypt_add_test_dummy_key(). Remove the call to it from f2fs. Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-07ext4: stop calling fscrypt_add_test_dummy_key()Eric Biggers1-12/+1
Now that fs/crypto/ adds the test dummy encryption key on-demand when it's needed, there's no need for individual filesystems to call fscrypt_add_test_dummy_key(). Remove the call to it from ext4. Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-07fscrypt: add the test dummy encryption key on-demandEric Biggers3-4/+25
When the key for an inode is not found but the inode is using the test_dummy_encryption policy, automatically add the test_dummy_encryption key to the filesystem keyring. This eliminates the need for all the individual filesystems to do this at mount time, which is a bit tricky to clean up from on failure. Note: this covers the call to fscrypt_find_master_key() from inode key setup, but not from the fscrypt ioctls. So, this isn't *exactly* the same as the key being present from the very beginning. I think we can tolerate that, though, since the inode key setup caller is the only one that actually matters in the context of test_dummy_encryption. Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-07f2fs: fix to set ipu policyYangtao Li3-5/+29
In LFS mode, f2fs should always update out of place (OPU); in-place updates (IPU) are not needed. When LFS mode is used on a small-volume device, IPU is not used and OPU writes are actually performed, yet F2FS_IPU_FORCE can still be read from the ipu_policy node, which does not match the actual behavior. In addition, remounting to lfs mode should be disallowed when f2fs IPU is enabled. Fix both issues. Fixes: 84b89e5d943d ("f2fs: add auto tuning for small devices") Signed-off-by: Yangtao Li <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-07f2fs: fix typos in commentsJinyoung CHOI7-14/+14
This patch is to fix typos in f2fs files. Signed-off-by: Jinyoung Choi <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-07f2fs: fix kernel crash due to null io->bioJaegeuk Kim1-0/+4
We should return when io->bio is null before doing anything. Otherwise, panic. BUG: kernel NULL pointer dereference, address: 0000000000000010 RIP: 0010:__submit_merged_write_cond+0x164/0x240 [f2fs] Call Trace: <TASK> f2fs_submit_merged_write+0x1d/0x30 [f2fs] commit_checkpoint+0x110/0x1e0 [f2fs] f2fs_write_checkpoint+0x9f7/0xf00 [f2fs] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs] __checkpoint_and_complete_reqs+0x84/0x190 [f2fs] ? preempt_count_add+0x82/0xc0 ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs] issue_checkpoint_thread+0x4c/0xf0 [f2fs] ? __pfx_autoremove_wake_function+0x10/0x10 kthread+0xff/0x130 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Cc: [email protected] # v5.18+ Fixes: 64bf0eef0171 ("f2fs: pass the bio operation to bio_alloc_bioset") Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-07f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx()Yangtao Li3-38/+33
Convert to use iostat_lat_type as parameter instead of raw number. BTW, move NUM_PREALLOC_IOSTAT_CTXS to the header file, adjust iostat_lat[{0,1,2}] to iostat_lat[{READ_IO,WRITE_SYNC_IO,WRITE_ASYNC_IO}] in tracepoint function, and rename iotype to page_type to match the definition. Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Yangtao Li <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-07f2fs: add sysfs nodes to set last_age_weightqixiaoyu13-6/+21
Signed-off-by: qixiaoyu1 <[email protected]> Signed-off-by: xiongping1 <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-07ceph: flush cap releases when the session is flushedXiubo Li1-0/+6
MDS expects the completed cap release prior to responding to the session flush for cache drop. Cc: [email protected] Link: http://tracker.ceph.com/issues/38009 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Venky Shankar <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-02-07udf: Use unsigned variables for size calculationsKees Cook1-2/+3
To avoid confusing the compiler about possible negative sizes, switch various size variables that can never be negative from int to u32. Seen with GCC 13: ../fs/udf/directory.c: In function 'udf_copy_fi': ../include/linux/fortify-string.h:57:33: warning: '__builtin_memcpy' pointer overflow between offset 80 and size [-2147483648, -1] [-Warray-bounds=] 57 | #define __underlying_memcpy __builtin_memcpy | ^ ... ../fs/udf/directory.c:102:9: note: in expansion of macro 'memcpy' 102 | memcpy(&iter->fi, iter->bh[0]->b_data + off, len); | ^~~~~~ Cc: Jan Kara <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <[email protected]>
2023-02-07fanotify,audit: Allow audit to use the full permission event responseRichard Guy Briggs1-1/+2
This patch passes the full response so that the audit function can use all of it. The audit function was updated to log the additional information in the AUDIT_FANOTIFY record. Currently the only type of fanotify info that is defined is an audit rule number, but convert it to hex encoding to future-proof the field. Hex encoding suggested by Paul Moore <[email protected]>. The {subj,obj}_trust values are {0,1,2}, corresponding to no, yes, unknown. Sample records: type=FANOTIFY msg=audit(1600385147.372:590): resp=2 fan_type=1 fan_info=3137 subj_trust=3 obj_trust=5 type=FANOTIFY msg=audit(1659730979.839:284): resp=1 fan_type=0 fan_info=0 subj_trust=2 obj_trust=2 Suggested-by: Steve Grubb <[email protected]> Link: https://lore.kernel.org/r/3075502.aeNJFYEL58@x2 Tested-by: Steve Grubb <[email protected]> Acked-by: Steve Grubb <[email protected]> Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <bcb6d552e517b8751ece153e516d8b073459069c.1675373475.git.rgb@redhat.com>
2023-02-07fanotify: define struct members to hold response decision contextRichard Guy Briggs3-22/+73
This patch adds a flag, FAN_INFO and an extensible buffer to provide additional information about response decisions. The buffer contains one or more headers defining the information type and the length of the following information. The patch defines one additional information type, FAN_RESPONSE_INFO_AUDIT_RULE, to audit a rule number. This will allow for the creation of other information types in the future if other users of the API identify different needs. The kernel can be tested if it supports a given info type by supplying the complete info extension but setting fd to FAN_NOFD. It will return the expected size but not issue an audit record. Suggested-by: Steve Grubb <[email protected]> Link: https://lore.kernel.org/r/2745105.e9J7NaK4W3@x2 Suggested-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Steve Grubb <[email protected]> Acked-by: Steve Grubb <[email protected]> Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <10177cfcae5480926b7176321a28d9da6835b667.1675373475.git.rgb@redhat.com>
2023-02-07fanotify: Ensure consistent variable type for responseRichard Guy Briggs2-4/+4
The user space API for the response variable is __u32. This patch makes sure that the whole path through the kernel uses u32 so that there is no sign extension or truncation of the user space response. Suggested-by: Steve Grubb <[email protected]> Link: https://lore.kernel.org/r/12617626.uLZWGnKmhe@x2 Signed-off-by: Richard Guy Briggs <[email protected]> Acked-by: Paul Moore <[email protected]> Tested-by: Steve Grubb <[email protected]> Acked-by: Steve Grubb <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <3778cb0b3501bc4e686ba7770b20eb9ab0506cf4.1675373475.git.rgb@redhat.com>
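Why a signed intermediate matters can be shown with a small userspace sketch. The function names and the sample value are hypothetical (0x80000001 is not claimed to be a real fanotify response), and converting an out-of-range unsigned value to int is implementation-defined in C; the comments describe what common compilers such as GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Carry a 32-bit response through a signed int before widening.
 * On common compilers a value with the top bit set becomes negative,
 * so widening to 64 bits sign-extends and sets the upper 32 bits.
 */
static uint64_t widen_via_int(uint32_t response)
{
    int tmp = (int)response; /* implementation-defined if > INT_MAX */

    return (uint64_t)(int64_t)tmp;
}

/* Keep the value unsigned the whole way: widening zero-extends. */
static uint64_t widen_via_u32(uint32_t response)
{
    return (uint64_t)response;
}
```

With a hypothetical response of 0x80000001, the first helper yields 0xFFFFFFFF80000001 on such compilers, while the second preserves 0x80000001, which is the kind of mismatch a consistent u32 path avoids.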
2023-02-07udf: remove reporting loc in debug outputTom Rix1-2/+2
clang build fails with fs/udf/partition.c:86:28: error: variable 'loc' is uninitialized when used here [-Werror,-Wuninitialized] sb, block, partition, loc, index); ^~~ loc is now only known when bh is valid. So remove reporting loc in debug output. Fixes: 4215db46d538 ("udf: Use udf_bread() in udf_get_pblock_virt15()") Reported-by: kernel test robot <[email protected]> Reported-by: "kernelci.org bot" <[email protected]> Signed-off-by: Tom Rix <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2023-02-07udf: Check consistency of Space Bitmap DescriptorVladislav Efanov1-4/+27
Bits that correspond to the logical blocks of the Bitmap Descriptor are not reset when buffer headers are allocated for them. As a result, these logical blocks can be treated as free and be used for other blocks. This can cause one buffer header to be used for several types of data. UDF issues WARNING in this situation: WARNING: CPU: 0 PID: 2703 at fs/udf/inode.c:2014 __udf_add_aext+0x685/0x7d0 fs/udf/inode.c:2014 RIP: 0010:__udf_add_aext+0x685/0x7d0 fs/udf/inode.c:2014 Call Trace: udf_setup_indirect_aext+0x573/0x880 fs/udf/inode.c:1980 udf_add_aext+0x208/0x2e0 fs/udf/inode.c:2067 udf_insert_aext fs/udf/inode.c:2233 [inline] udf_update_extents fs/udf/inode.c:1181 [inline] inode_getblk+0x1981/0x3b70 fs/udf/inode.c:885 Found by Linux Verification Center (linuxtesting.org) with syzkaller. [JK: Somewhat cleaned up the boundary checks] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Vladislav Efanov <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2023-02-06cifs: Fix use-after-free in rdata->read_into_pages()ZhaoLong Wang1-2/+2
When the network status is unstable, a use-after-free may occur when reading data from the server. BUG: KASAN: use-after-free in readpages_fill_pages+0x14c/0x7e0 Call Trace: <TASK> dump_stack_lvl+0x38/0x4c print_report+0x16f/0x4a6 kasan_report+0xb7/0x130 readpages_fill_pages+0x14c/0x7e0 cifs_readv_receive+0x46d/0xa40 cifs_demultiplex_thread+0x121c/0x1490 kthread+0x16b/0x1a0 ret_from_fork+0x2c/0x50 </TASK> Allocated by task 2535: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 __kasan_kmalloc+0x82/0x90 cifs_readdata_direct_alloc+0x2c/0x110 cifs_readdata_alloc+0x2d/0x60 cifs_readahead+0x393/0xfe0 read_pages+0x12f/0x470 page_cache_ra_unbounded+0x1b1/0x240 filemap_get_pages+0x1c8/0x9a0 filemap_read+0x1c0/0x540 cifs_strict_readv+0x21b/0x240 vfs_read+0x395/0x4b0 ksys_read+0xb8/0x150 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 79: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2e/0x50 __kasan_slab_free+0x10e/0x1a0 __kmem_cache_free+0x7a/0x1a0 cifs_readdata_release+0x49/0x60 process_one_work+0x46c/0x760 worker_thread+0x2a4/0x6f0 kthread+0x16b/0x1a0 ret_from_fork+0x2c/0x50 Last potentially related work creation: kasan_save_stack+0x22/0x50 __kasan_record_aux_stack+0x95/0xb0 insert_work+0x2b/0x130 __queue_work+0x1fe/0x660 queue_work_on+0x4b/0x60 smb2_readv_callback+0x396/0x800 cifs_abort_connection+0x474/0x6a0 cifs_reconnect+0x5cb/0xa50 cifs_readv_from_socket.cold+0x22/0x6c cifs_read_page_from_socket+0xc1/0x100 readpages_fill_pages.cold+0x2f/0x46 cifs_readv_receive+0x46d/0xa40 cifs_demultiplex_thread+0x121c/0x1490 kthread+0x16b/0x1a0 ret_from_fork+0x2c/0x50 The following function calls will cause UAF of the rdata pointer. 
readpages_fill_pages
 cifs_read_page_from_socket
  cifs_readv_from_socket
   cifs_reconnect
    __cifs_reconnect
     cifs_abort_connection
      mid->callback() --> smb2_readv_callback
       queue_work(&rdata->work)  # if the worker completes first,
                                 # the rdata is freed
        cifs_readv_complete
         kref_put
          cifs_readdata_release
           kfree(rdata)
 return rdata->... # UAF in readpages_fill_pages()
Similarly, this problem also occurs in uncached_fill_pages(). Fix this by adjusting the order of the conditions evaluated in the return statement.
Signed-off-by: ZhaoLong Wang <[email protected]>
Cc: [email protected]
Acked-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
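Reordering conditions helps because C evaluates the operands of && left to right and stops at the first false one. The sketch below is a userspace illustration, not the cifs code: struct read_ctx and pick_return() are hypothetical, and it is an assumption that an aborted connection (-ECONNABORTED) is the case in which the object may already be freed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-read state that may be freed. */
struct read_ctx {
    int got_bytes;
};

/*
 * Check the result code first. When the connection was aborted, the
 * left operand of && is false, so ctx is never dereferenced and a
 * stale (here: NULL) pointer is harmless.
 */
static int pick_return(int result, const struct read_ctx *ctx)
{
    return (result != -ECONNABORTED && ctx->got_bytes > 0)
        ? ctx->got_bytes : result;
}
```

Writing the same condition with ctx->got_bytes first would read through the pointer unconditionally, which is the pattern the ordering avoids.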
2023-02-06btrfs: simplify update of last_dir_index_offset when logging a directoryFilipe Manana2-8/+17
When logging a directory, we always set the inode's last_dir_index_offset to the offset of the last dir index item we found. This is using an extra field in the log context structure, and it makes more sense to update it only after we insert dir index items, and we could directly update the inode's last_dir_index_offset field instead. So make this simpler by updating the inode's last_dir_index_offset only when we actually insert dir index keys in the log tree, and getting rid of the last_dir_item_offset field in the log context structure. Reported-by: David Arendt <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reported-by: Maxim Mikityanskiy <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reported-by: Hunter Wardlaw <[email protected]> Link: https://bugzilla.suse.com/show_bug.cgi?id=1207231 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216851 CC: [email protected] # 6.1+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-06Merge tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds4-8/+20
Pull btrfs fixes from David Sterba:
- explicitly initialize zlib work memory to fix a KCSAN warning
- limit number of send clones by maximum memory allocated
- limit device extents to the device size in case a device shrink races with chunk allocation
- raid56 fixes:
  - fix copy&paste error in RAID6 stripe recovery
  - make error bitmap update atomic
* tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: raid56: make error_bitmap update atomic
  btrfs: send: limit number of clones and allocated memory size
  btrfs: zlib: zero-initialize zlib workspace
  btrfs: limit device extents to the device size
  btrfs: raid56: fix stripes if vertical errors are found
2023-02-05f2fs: fix f2fs_show_options to show nogc_merge mount optionYangtao Li1-0/+2
Commit 5911d2d1d1a3 ("f2fs: introduce gc_merge mount option") forgot to show nogc_merge option, let's fix it. Signed-off-by: Yangtao Li <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-05f2fs: fix cgroup writeback accounting with fs-layer encryptionEric Biggers1-3/+3
When writing a page from an encrypted file that is using filesystem-layer encryption (not inline encryption), f2fs encrypts the pagecache page into a bounce page, then writes the bounce page. It also passes the bounce page to wbc_account_cgroup_owner(). That's incorrect, because the bounce page is a newly allocated temporary page that doesn't have the memory cgroup of the original pagecache page. This makes wbc_account_cgroup_owner() not account the I/O to the owner of the pagecache page as it should. Fix this by always passing the pagecache page to wbc_account_cgroup_owner(). Fixes: 578c647879f7 ("f2fs: implement cgroup writeback support") Cc: [email protected] Reported-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Eric Biggers <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-05f2fs: fix wrong calculation of block ageqixiaoyu11-3/+10
Currently we wrongly calculate the new block age to old * LAST_AGE_WEIGHT / 100. Fix it to new * (100 - LAST_AGE_WEIGHT) / 100 + old * LAST_AGE_WEIGHT / 100. Signed-off-by: qixiaoyu1 <[email protected]> Signed-off-by: xiongping1 <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
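The corrected formula is a weighted average of the new and old ages. A minimal sketch, assuming the weight is a percentage in the range 0..100 and ignoring the overflow handling a kernel implementation would need; calc_block_age() is a hypothetical name, not the f2fs function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Weighted average of the new and old block age.
 * weight is the percentage given to the old age (0..100).
 * Note: the multiplications can overflow for very large ages; this
 * sketch does not guard against that.
 */
static uint64_t calc_block_age(uint64_t new_age, uint64_t old_age,
                               unsigned int weight)
{
    assert(weight <= 100);
    return new_age * (100 - weight) / 100 + old_age * weight / 100;
}
```

For example, with new = 1000, old = 2000 and a weight of 30, the result is 700 + 600 = 1300, whereas the old expression old * 30 / 100 alone would give 600 and ignore the new age entirely.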
2023-02-05Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-24/+24
Pull ELF fix from Al Viro: "One of the many equivalent build warning fixes for !CONFIG_ELF_CORE configs. Geert's is the earliest one I've been able to find" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coredump: Move dump_emit_page() to kill unused warning
2023-02-05xfs: don't use BMBT btree split workers for IO completionDave Chinner1-2/+16
When we split a BMBT due to record insertion, we offload it to a worker thread because we can be deep in the stack when we try to allocate a new block for the BMBT. Allocation can use several kilobytes of stack (full memory reclaim, swap and/or IO path can end up on the stack during allocation) and we can already be several kilobytes deep in the stack when we need to split the BMBT.
A recent workload demonstrated a deadlock in this BMBT split offload. It requires several things to happen at once:
1. Two inodes need a BMBT split at the same time; one must be unwritten extent conversion from IO completion, the other must be from extent allocation.
2. There must be no xfs_alloc_wq worker threads available in the worker pool.
3. There must be sustained severe memory shortages such that new kworker threads cannot be allocated to the xfs_alloc_wq pool for both threads that need split work to be run.
4. The split work from the unwritten extent conversion must run first.
5. When the BMBT block allocation runs from the split work, it must loop over all AGs and either fail to trylock an AGF, or find that each AGF it is able to lock has no space available for a single block allocation.
6. The BMBT allocation must then attempt to lock the AGF that the second task queued to the rescuer thread already has locked before it finds an AGF it can allocate from.
At this point, we have an ABBA deadlock between tasks queued on the xfs_alloc_wq rescuer thread and a locked AGF, i.e. the queued task holding the AGF lock can't be run by the rescuer thread until the task the rescuer thread is running gets the AGF lock.
This is a highly improbable series of events, but there it is. There are a couple of ways to fix this, but the easiest is to ensure that we only punt tasks with a locked AGF that holds enough space for the BMBT block allocations to the worker thread.
This works for unwritten extent conversion in IO completion (which doesn't have a locked AGF and space reservations) because we have tight control over the IO completion stack. It is typically only 6 functions deep when xfs_btree_split() is called because we've already offloaded the IO completion work to a worker thread and hence we don't need to worry about stack overruns here.
The other place we can be called for a BMBT split without a preceding allocation is __xfs_bunmapi() when punching out the center of an existing extent. We don't remove extents in the IO path, so these operations don't tend to be called with a lot of stack consumed. Hence we don't really need to ship the split off to a worker thread in these cases, either.
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
2023-02-05xfs: fix confusing variable names in xfs_refcount_item.cDarrick J. Wong1-27/+27
Variable names in this code module are inconsistent and confusing. xfs_phys_extent describe physical mappings, so rename them "pmap". xfs_refcount_intents describe refcount intents, so rename them "ri". Signed-off-by: Darrick J. Wong <[email protected]>
2023-02-05xfs: pass refcount intent directly through the log intent codeDarrick J. Wong4-103/+74
Pass the incore refcount intent through the CUI logging code instead of repeatedly boxing and unboxing parameters. Signed-off-by: Darrick J. Wong <[email protected]>
2023-02-05xfs: fix confusing variable names in xfs_rmap_item.cDarrick J. Wong1-39/+40
Variable names in this code module are inconsistent and confusing. xfs_map_extent describe file mappings, so rename them "map". xfs_rmap_intents describe block mapping intents, so rename them "ri". Signed-off-by: Darrick J. Wong <[email protected]>
2023-02-05xfs: pass rmap space mapping directly through the log intent codeDarrick J. Wong3-66/+55
Pass the incore rmap space mapping through the RUI logging code instead of repeatedly boxing and unboxing parameters. Signed-off-by: Darrick J. Wong <[email protected]>
2023-02-05xfs: fix confusing xfs_extent_item variable namesDarrick J. Wong2-51/+51
Change the name of all pointers to xfs_extent_item structures to "xefi" to make the name consistent and because the current selections ("new" and "free") mean other things in C. Signed-off-by: Darrick J. Wong <[email protected]>
2023-02-05xfs: pass xfs_extent_free_item directly through the log intent codeDarrick J. Wong1-25/+30
Pass the incore xfs_extent_free_item through the EFI logging code instead of repeatedly boxing and unboxing parameters. Signed-off-by: Darrick J. Wong <[email protected]>
2023-02-05xfs: fix confusing variable names in xfs_bmap_item.cDarrick J. Wong1-28/+28
Variable names in this code module are inconsistent and confusing. xfs_map_extent structures describe file mappings, so rename them "map". xfs_bmap_intent structures describe block mapping intents, so rename them "bi". Signed-off-by: Darrick J. Wong <[email protected]>
2023-02-05xfs: pass the xfs_bmbt_irec directly through the log intent codeDarrick J. Wong3-72/+46
Instead of repeatedly boxing and unboxing the incore extent mapping structure as it passes through the BUI code, pass the pointer directly through. Signed-off-by: Darrick J. Wong <[email protected]>
2023-02-05xfs: use strscpy() to instead of strncpy()Xu Panda1-3/+1
strscpy() is more robust and safer than strncpy(), and it is now the recommended way to copy NUL-terminated strings. Signed-off-by: Xu Panda <[email protected]> Signed-off-by: Yang Yang <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
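To illustrate why the replacement is considered safer, here is a minimal userspace sketch of the strscpy() contract. sketch_strscpy() is a hypothetical stand-in, not the kernel implementation; it assumes only the documented behaviour: the destination is always NUL-terminated when the size is non-zero, and truncation is reported as -E2BIG. strncpy(), by contrast, leaves the destination unterminated when the source is at least as long as the buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical userspace stand-in for strscpy(); not the kernel code.
 * Copies at most size - 1 bytes, always terminates dst when size > 0,
 * and returns the number of bytes copied or -E2BIG on truncation.
 */
static ssize_t sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If the source did not end here, it was truncated. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

A caller can therefore detect truncation from the return value instead of having to terminate the buffer by hand after the copy.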
2023-02-05kbuild: remove --include-dir MAKEFLAG from top MakefileMasahiro Yamada1-1/+1
I added $(srctree)/ to some included Makefiles in the following commits:

 - 3204a7fb98a3 ("kbuild: prefix $(srctree)/ to some included Makefiles")
 - d82856395505 ("kbuild: do not require sub-make for separate output tree builds")

They were a preparation for removing --include-dir flag.

I have never thought --include-dir useful. Rather, it _is_ harmful.

For example, run the following commands:

  $ make -s ARCH=x86 mrproper defconfig
  $ make ARCH=arm O=foo dtbs
  make[1]: Entering directory '/tmp/linux/foo'
    HOSTCC  scripts/basic/fixdep
  Error: kernelrelease not valid - run 'make prepare' to update it
    UPD     include/config/kernel.release
  make[1]: Leaving directory '/tmp/linux/foo'

The first command configures the source tree for x86. The next command tries to build ARM device trees in the separate foo/ directory - this must stop because the directory foo/ has not been configured yet.

However, due to --include-dir=$(abs_srctree), the top Makefile includes the wrong include/config/auto.conf from the source tree and continues building. Kbuild traverses the directory tree, but of course it does not work correctly. The Error message is also pointless - 'make prepare' does not help at all for fixing the issue.

This commit fixes more arch Makefile, and finally removes --include-dir from the top Makefile.

There are more breakages under drivers/, but I do not volunteer to fix them all. I just moved --include-dir to drivers/Makefile.

With this commit, the second command will stop with a sensible message.

  $ make -s ARCH=x86 mrproper defconfig
  $ make ARCH=arm O=foo dtbs
  make[1]: Entering directory '/tmp/linux/foo'
    SYNC    include/config/auto.conf.cmd
  ***
  *** The source tree is not clean, please run 'make ARCH=arm mrproper'
  *** in /tmp/linux
  ***
  make[2]: *** [../Makefile:646: outputmakefile] Error 1
  /tmp/linux/Makefile:770: include/config/auto.conf.cmd: No such file or directory
  make[1]: *** [/tmp/linux/Makefile:793: include/config/auto.conf.cmd] Error 2
  make[1]: Leaving directory '/tmp/linux/foo'
  make: *** [Makefile:226: __sub-make] Error 2

Signed-off-by: Masahiro Yamada <[email protected]>
2023-02-03revert "squashfs: harden sanity check in squashfs_read_xattr_id_table"Andrew Morton1-1/+1
This fix was nacked by Phillip, for reasons identified in the email linked below. Link: https://lkml.kernel.org/r/[email protected] Fixes: 72e544b1b28325 ("squashfs: harden sanity check in squashfs_read_xattr_id_table") Cc: Alexey Khoroshilov <[email protected]> Cc: Fedor Pchelkin <[email protected]> Cc: Phillip Lougher <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-03fsdax: dax_unshare_iter() should return a valid lengthShiyang Ruan1-2/+3
copy_mc_to_kernel() returns 0 when it succeeds, so in that case the return value should be set to the length that was copied. [[email protected]: don't mess up `ret', per Matthew] Link: https://lkml.kernel.org/r/[email protected] Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax") Signed-off-by: Shiyang Ruan <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
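The return-value convention described above can be shown with a small userspace sketch. sketch_copy() and sketch_unshare() are hypothetical stand-ins, not the fsdax code: sketch_copy() imitates a copy routine that returns the number of bytes it could not copy (0 on success), and the `limit` parameter exists only to simulate a fault part-way through. Mapping a failed copy to -EIO is an assumption made for this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical copy routine: copies at most `limit` bytes and returns the
 * number of bytes NOT copied, so 0 means the whole range was copied.
 */
static unsigned long sketch_copy(void *dst, const void *src, size_t len,
				 size_t limit)
{
	size_t n = len < limit ? len : limit;

	memcpy(dst, src, n);
	return len - n;
}

/*
 * The caller must not return the copy routine's 0 directly: an iterator
 * that reports 0 bytes processed looks like "no progress". On success it
 * reports the length instead (error code chosen here as an assumption).
 */
static long sketch_unshare(void *dst, const void *src, size_t len,
			   size_t limit)
{
	if (sketch_copy(dst, src, len, limit) != 0)
		return -EIO;
	return (long)len;
}
```

With this shape, a successful 8-byte copy reports 8 rather than 0, and a partial copy reports an error.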
2023-02-03aio: fix mremap after fork null-derefSeth Jenkins1-0/+4
Commit e4a0d3e720e7 ("aio: Make it possible to remap aio ring") introduced a null-deref if mremap is called on an old aio mapping after fork as mm->ioctx_table will be set to NULL. [[email protected]: fix 80 column issue] Link: https://lkml.kernel.org/r/[email protected] Fixes: e4a0d3e720e7 ("aio: Make it possible to remap aio ring") Signed-off-by: Seth Jenkins <[email protected]> Signed-off-by: Jeff Moyer <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: Jann Horn <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
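The failure mode described above comes down to walking a table pointer that may be NULL. The following userspace sketch shows the kind of guard that avoids it; every type and function name (sketch_mm, sketch_table, sketch_ctx, sketch_ring_remap) is hypothetical, and the -EINVAL return is an assumption for illustration, not a statement about the actual patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-mm context table. */
struct sketch_ctx {
	unsigned long	ring_addr;
};

struct sketch_table {
	unsigned int		nr;
	struct sketch_ctx	*ctx[4];
};

struct sketch_mm {
	struct sketch_table	*ioctx_table;	/* may be NULL after fork */
};

/* Update the ring address of the context mapped at old_addr, if any. */
static int sketch_ring_remap(struct sketch_mm *mm, unsigned long old_addr,
			     unsigned long new_addr)
{
	struct sketch_table *table = mm->ioctx_table;
	unsigned int i;

	/* Without this check, the loop below dereferences a NULL table. */
	if (!table)
		return -EINVAL;

	for (i = 0; i < table->nr; i++) {
		struct sketch_ctx *ctx = table->ctx[i];

		if (ctx && ctx->ring_addr == old_addr) {
			ctx->ring_addr = new_addr;
			return 0;
		}
	}
	return -EINVAL;
}
```

A process whose table pointer is NULL then gets an error back instead of crashing when it remaps the old mapping.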
2023-02-03Merge tag 'ceph-for-6.2-rc7' of https://github.com/ceph/ceph-clientLinus Torvalds6-10/+103
Pull ceph fix from Ilya Dryomov: "A safeguard to prevent the kernel client from further damaging the filesystem after running into a case of an invalid snap trace. The root cause of this metadata corruption is still being investigated but it appears to be stemming from the MDS. As such, this is the best we can do for now"

* tag 'ceph-for-6.2-rc7' of https://github.com/ceph/ceph-client:
  ceph: blocklist the kclient when receiving corrupted snap trace
  ceph: move mount state enum to super.h
2023-02-03Merge tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds6-10/+8
Pull misc fixes from Andrew Morton: "25 hotfixes, mainly for MM. 13 are cc:stable"

* tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits)
  mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
  Kconfig.debug: fix the help description in SCHED_DEBUG
  mm/swapfile: add cond_resched() in get_swap_pages()
  mm: use stack_depot_early_init for kmemleak
  Squashfs: fix handling and sanity checking of xattr_ids count
  sh: define RUNTIME_DISCARD_EXIT
  highmem: round down the address passed to kunmap_flush_on_unmap()
  migrate: hugetlb: check for hugetlb shared PMD in node migration
  mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
  mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
  Revert "mm: kmemleak: alloc gray object for reserved region with direct map"
  freevxfs: Kconfig: fix spelling
  maple_tree: should get pivots boundary by type
  .mailmap: update e-mail address for Eugen Hristev
  mm, mremap: fix mremap() expanding for vma's with vm_ops->close()
  squashfs: harden sanity check in squashfs_read_xattr_id_table
  ia64: fix build error due to switch case label appearing next to declaration
  mm: multi-gen LRU: fix crash during cgroup migration
  Revert "mm: add nodes= arg to memory.reclaim"
  zsmalloc: fix a race with deferred_handles storing
  ...
2023-02-03splice: use bvec_set_page to initialize a bvecChristoph Hellwig1-3/+2
Use the bvec_set_page helper to initialize a bvec. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
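As a rough picture of what such a helper does, here is a userspace sketch that sets the three fields of a bio_vec-like structure in one call. The sketch_ types and sketch_bvec_set_page() are hypothetical stand-ins; only the field names bv_page, bv_len and bv_offset are taken from the kernel's struct bio_vec, and the argument order shown is an assumption of this sketch.

```c
/* Hypothetical stand-in for struct page. */
struct sketch_page {
	int	id;
};

/* Hypothetical stand-in for struct bio_vec, using its field names. */
struct sketch_bio_vec {
	struct sketch_page	*bv_page;
	unsigned int		bv_len;
	unsigned int		bv_offset;
};

/* Initialize all three fields together so none is left unset. */
static inline void sketch_bvec_set_page(struct sketch_bio_vec *bv,
					struct sketch_page *page,
					unsigned int len,
					unsigned int offset)
{
	bv->bv_page = page;
	bv->bv_len = len;
	bv->bv_offset = offset;
}
```

Callers that previously assigned the fields by hand collapse to a single call, which is consistent with the small negative line counts of these conversions.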
2023-02-03orangefs: use bvec_set_{page,folio} to initialize bvecsChristoph Hellwig1-15/+7
Use the bvec_set_page and bvec_set_folio helpers to initialize bvecs. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-02-03nfs: use bvec_set_page to initialize bvecsChristoph Hellwig1-10/+6
Use the bvec_set_page helper to initialize bvecs. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Trond Myklebust <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-02-03coredump: use bvec_set_page to initialize a bvecChristoph Hellwig1-5/+2
Use the bvec_set_page helper to initialize a bvec. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>