aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-09-06f2fs: convert f2fs_write_end() to use folioChao Yu1-6/+7
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_write_begin() to use folioChao Yu1-21/+23
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_submit_page_read() to use folioChao Yu1-6/+6
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_handle_page_eio() to use folioChao Yu3-4/+6
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_read_multi_pages() to use folioChao Yu1-12/+16
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert __f2fs_write_meta_page() to use folioChao Yu1-6/+7
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_do_write_meta_page() to use folioChao Yu3-9/+9
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_write_single_data_page() to use folioChao Yu3-16/+18
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_write_inline_data() to use folioChao Yu4-8/+8
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_clear_page_cache_dirty_tag() to use folioChao Yu5-6/+5
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_vm_page_mkwrite() to use folioChao Yu1-16/+16
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_compress_ctx_add_page() to use folioChao Yu3-10/+10
onvert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: Use sysfs_emit_at() to simplify codeChristophe JAILLET1-24/+21
This file already uses sysfs_emit(). So be consistent and also use sysfs_emit_at(). This slightly simplifies the code and makes it more readable. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: atomic: fix to forbid dio in atomic_fileChao Yu1-12/+24
atomic write can only be used via buffered IO, let's fail direct IO on atomic_file and return -EOPNOTSUPP. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: compress: don't redirty sparse cluster during {,de}compressYeongjin Gil3-26/+61
In f2fs_do_write_data_page, when the data block is NULL_ADDR, it skips writepage considering that it has been already truncated. This results in an infinite loop as the PAGECACHE_TAG_TOWRITE tag is not cleared during the writeback process for a compressed file including NULL_ADDR in compress_mode=user. This is the reproduction process: 1. dd if=/dev/zero bs=4096 count=1024 seek=1024 of=testfile 2. f2fs_io compress testfile 3. dd if=/dev/zero bs=4096 count=1 conv=notrunc of=testfile 4. f2fs_io decompress testfile To prevent the problem, let's check whether the cluster is fully allocated before redirty its pages. Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Reviewed-by: Sungjong Seo <[email protected]> Reviewed-by: Sunmin Jeong <[email protected]> Tested-by: Jaewook Kim <[email protected]> Signed-off-by: Yeongjin Gil <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: check discard support for conventional zonesShin'ichiro Kawasaki1-0/+7
As the helper function f2fs_bdev_support_discard() shows, f2fs checks if the target block devices support discard by calling bdev_max_discard_sectors() and bdev_is_zoned(). This check works well for most cases, but it does not work for conventional zones on zoned block devices. F2fs assumes that zoned block devices support discard, and calls __submit_discard_cmd(). When __submit_discard_cmd() is called for sequential write required zones, it works fine since __submit_discard_cmd() issues zone reset commands instead of discard commands. However, when __submit_discard_cmd() is called for conventional zones, __blkdev_issue_discard() is called even when the devices do not support discard. The inappropriate __blkdev_issue_discard() call was not a problem before the commit 30f1e7241422 ("block: move discard checks into the ioctl handler") because __blkdev_issue_discard() checked if the target devices support discard or not. If not, it returned EOPNOTSUPP. After the commit, __blkdev_issue_discard() no longer checks it. It always returns zero and sets NULL to the given bio pointer. This NULL pointer triggers f2fs_bug_on() in __submit_discard_cmd(). The BUG is recreated with the commands below at the umount step, where /dev/nullb0 is a zoned null_blk with 5GB total size, 128MB zone size and 10 conventional zones. $ mkfs.f2fs -f -m /dev/nullb0 $ mount /dev/nullb0 /mnt $ for ((i=0;i<5;i++)); do dd if=/dev/zero of=/mnt/test bs=65536 count=1600 conv=fsync; done $ umount /mnt To fix the BUG, avoid the inappropriate __blkdev_issue_discard() call. When discard is requested for conventional zones, check if the device supports discard or not. If not, return EOPNOTSUPP. Fixes: 30f1e7241422 ("block: move discard checks into the ioctl handler") Cc: [email protected] Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Chao Yu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread()Chao Yu3-4/+11
syzbot reports a f2fs bug as below: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_report+0xe8/0x550 mm/kasan/report.c:491 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline] __refcount_add include/linux/refcount.h:184 [inline] __refcount_inc include/linux/refcount.h:241 [inline] refcount_inc include/linux/refcount.h:258 [inline] get_task_struct include/linux/sched/task.h:118 [inline] kthread_stop+0xca/0x630 kernel/kthread.c:704 f2fs_stop_gc_thread+0x65/0xb0 fs/f2fs/gc.c:210 f2fs_do_shutdown+0x192/0x540 fs/f2fs/file.c:2283 f2fs_ioc_shutdown fs/f2fs/file.c:2325 [inline] __f2fs_ioctl+0x443a/0xbe60 fs/f2fs/file.c:4325 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is below race condition, it may cause use-after-free issue in sbi->gc_th pointer. - remount - f2fs_remount - f2fs_stop_gc_thread - kfree(gc_th) - f2fs_ioc_shutdown - f2fs_do_shutdown - f2fs_stop_gc_thread - kthread_stop(gc_th->f2fs_gc_task) : sbi->gc_thread = NULL; We will call f2fs_do_shutdown() in two paths: - for f2fs_ioc_shutdown() path, we should grab sb->s_umount semaphore for fixing. - for f2fs_shutdown() path, it's safe since caller has already grabbed sb->s_umount semaphore. Reported-by: [email protected] Closes: https://lore.kernel.org/linux-f2fs-devel/[email protected] Fixes: 7950e9ac638e ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: atomic: fix to truncate pagecache before on-disk metadata truncationChao Yu1-0/+4
We should always truncate pagecache while truncating on-disk data. Fixes: a46bebd502fe ("f2fs: synchronize atomic write aborts") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: fix to wait page writeback before setting gcing flagChao Yu1-0/+4
Soft IRQ Thread - f2fs_write_end_io - f2fs_defragment_range - set_page_private_gcing - type = WB_DATA_TYPE(page, false); : assign type w/ F2FS_WB_CP_DATA due to page_private_gcing() is true - dec_page_count() w/ wrong type - end_page_writeback() Value of F2FS_WB_CP_DATA reference count may become negative under above race condition, the root cause is we missed to wait page writeback before setting gcing page private flag, let's fix it. Fixes: 2d1fe8a86bf5 ("f2fs: fix to tag gcing flag on page during file defragment") Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: Create COW inode from parent dentry for atomic writeYeongjin Gil1-9/+3
The i_pino in f2fs_inode_info has the previous parent's i_ino when inode was renamed, which may cause f2fs_ioc_start_atomic_write to fail. If file_wrong_pino is true and i_nlink is 1, then to find a valid pino, we should refer to the dentry from inode. To resolve this issue, let's get parent inode using parent dentry directly. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Reviewed-by: Sungjong Seo <[email protected]> Reviewed-by: Sunmin Jeong <[email protected]> Signed-off-by: Yeongjin Gil <[email protected]> Reviewed-by: Daeho Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: Require FMODE_WRITE for atomic write ioctlsJann Horn1-0/+9
The F2FS ioctls for starting and committing atomic writes check for inode_owner_or_capable(), but this does not give LSMs like SELinux or Landlock an opportunity to deny the write access - if the caller's FSUID matches the inode's UID, inode_owner_or_capable() immediately returns true. There are scenarios where LSMs want to deny a process the ability to write particular files, even files that the FSUID of the process owns; but this can currently partially be bypassed using atomic write ioctls in two ways: - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can truncate an inode to size 0 - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert changes another process concurrently made to a file Fix it by requiring FMODE_WRITE for these operations, just like for F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these ioctls when intending to write into the file, that seems unlikely to break anything. Fixes: 88b88a667971 ("f2fs: support atomic writes") Cc: [email protected] Signed-off-by: Jann Horn <[email protected]> Reviewed-by: Chao Yu <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: clean up val{>>,<<}F2FS_BLKSIZE_BITSZhiguo Niu6-9/+9
Use F2FS_BYTES_TO_BLK(bytes) and F2FS_BLK_TO_BYTES(blk) for cleanup Signed-off-by: Zhiguo Niu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: fix to use per-inode maxbytes and cleanupZhiguo Niu3-9/+7
This is a supplement to commit 6d1451bf7f84 ("f2fs: fix to use per-inode maxbytes") for some missed cases, also cleanup redundant code in f2fs_llseek. Cc: Chengguang Xu <[email protected]> Signed-off-by: Zhiguo Niu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: use f2fs_get_node_page when write inline dataZijie Wang1-12/+11
We just need inode page when write inline data, use f2fs_get_node_page() to get it instead of using dnode_of_data, which can eliminate unnecessary struct use. Signed-off-by: Zijie Wang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: sysfs: support atgc_enabledliujinbao12-0/+14
When we add "atgc" to the fstab table, ATGC is not immediately enabled. There is a 7-day time threshold, and we can use "atgc_enabled" to show whether ATGC is enabled. Signed-off-by: liujinbao1 <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15Revert "f2fs: use flush command instead of FUA for zoned device"Wenjie Cheng2-3/+2
This reverts commit c550e25bca660ed2554cbb48d32b82d0bb98e4b1. Commit c550e25bca660ed2554cbb48d32b82d0bb98e4b1 ("f2fs: use flush command instead of FUA for zoned device") used additional flush command to keep write order. Since Commit dd291d77cc90eb6a86e9860ba8e6e38eebd57d12 ("block: Introduce zone write plugging") has enabled the block layer to handle this order issue, there is no need to use flush command. Signed-off-by: Wenjie Cheng <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: get rid of buffer_head useChao Yu5-41/+66
Convert to use folio and related functionality. Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: fix to avoid racing in between read and OPU dio writeChao Yu1-0/+4
If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectly, and for dio read, the race condition exists as well. Thread A Thread B - f2fs_file_write_iter - f2fs_dio_write_iter - __iomap_dio_rw - f2fs_iomap_begin - f2fs_map_blocks - __allocate_data_block - allocated blkaddr #x - iomap_dio_submit_bio - f2fs_file_read_iter - filemap_read - f2fs_read_data_folio - f2fs_mpage_readpages - f2fs_map_blocks : get blkaddr #x - f2fs_submit_read_bio IRQ - f2fs_read_end_io : read IO on blkaddr #x complete IRQ - iomap_dio_bio_end_io : direct write IO on blkaddr #x complete In LFS mode, if there is inflight dio, let's wait for its completion, this policy won't cover all race cases, however it is a tradeoff which avoids abusing lock around IO paths. Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: fix to wait dio completionChao Yu1-0/+13
It should wait all existing dio write IOs before block removal, otherwise, previous direct write IO may overwrite data in the block which may be reused by other inode. Cc: [email protected] Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: reduce expensive checkpoint trigger frequencyChao Yu4-3/+19
We may trigger high frequent checkpoint for below case: 1. mkdir /mnt/dir1; set dir1 encrypted 2. touch /mnt/file1; fsync /mnt/file1 3. mkdir /mnt/dir2; set dir2 encrypted 4. touch /mnt/file2; fsync /mnt/file2 ... Although, newly created dir and file are not related, due to commit bbf156f7afa7 ("f2fs: fix lost xattrs of directories"), we will trigger checkpoint whenever fsync() comes after a new encrypted dir created. In order to avoid such performance regression issue, let's record an entry including directory's ino in global cache whenever we update directory's xattr data, and then triggerring checkpoint() only if xattr metadata of target file's parent was updated. This patch updates to cover below no encryption case as well: 1) parent is checkpointed 2) set_xattr(dir) w/ new xnid 3) create(file) 4) fsync(file) Fixes: bbf156f7afa7 ("f2fs: fix lost xattrs of directories") Reported-by: wangzijie <[email protected]> Reported-by: Zhiguo Niu <[email protected]> Tested-by: Zhiguo Niu <[email protected]> Reported-by: Yunlei He <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: atomic: fix to avoid racing w/ GCChao Yu2-2/+16
Case #1: SQLite App GC Thread Kworker Shrinker - f2fs_ioc_start_atomic_write - f2fs_ioc_commit_atomic_write - f2fs_commit_atomic_write - filemap_write_and_wait_range : write atomic_file's data to cow_inode echo 3 > drop_caches to drop atomic_file's cache. - f2fs_gc - gc_data_segment - move_data_page - set_page_dirty - writepages - f2fs_do_write_data_page : overwrite atomic_file's data to cow_inode - f2fs_down_write(&fi->i_gc_rwsem[WRITE]) - __f2fs_commit_atomic_write - f2fs_up_write(&fi->i_gc_rwsem[WRITE]) Case #2: SQLite App GC Thread Kworker - f2fs_ioc_start_atomic_write - __writeback_single_inode - do_writepages - f2fs_write_cache_pages - f2fs_write_single_data_page - f2fs_do_write_data_page : write atomic_file's data to cow_inode - f2fs_gc - gc_data_segment - move_data_page - set_page_dirty - writepages - f2fs_do_write_data_page : overwrite atomic_file's data to cow_inode - f2fs_ioc_commit_atomic_write In above cases racing in between atomic_write and GC, previous data in atomic_file may be overwrited to cow_file, result in data corruption. This patch introduces PAGE_PRIVATE_ATOMIC_WRITE bit flag in page.private, and use it to indicate that there is last dirty data in atomic file, and the data should be writebacked into cow_file, if the flag is not tagged in page, we should never write data across files. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: Daeho Jeong <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: fix macro definition stat_inc_cp_countJulian Sun1-1/+1
The macro stat_inc_cp_count accepts a parameter si, but it was not used, rather the variable sbi was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: fix macro definition on_f2fs_build_free_nidsJulian Sun1-1/+1
The macro on_f2fs_build_free_nids accepts a parameter nmi, but it was not used, rather the variable nm_i was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: add write priority option based on zone UFSLiao Yuanhong5-1/+58
Currently, we are using a mix of traditional UFS and zone UFS to support some functionalities that cannot be achieved on zone UFS alone. However, there are some issues with this approach. There exists a significant performance difference between traditional UFS and zone UFS. Under normal usage, we prioritize writes to zone UFS. However, in critical conditions (such as when the entire UFS is almost full), we cannot determine whether data will be written to traditional UFS or zone UFS. This can lead to significant performance fluctuations, which is not conducive to development and testing. To address this, we have added an option zlu_io_enable under sys with the following three modes: 1) zlu_io_enable == 0:Normal mode, prioritize writing to zone UFS; 2) zlu_io_enable == 1:Zone UFS only mode, only allow writing to zone UFS; 3) zlu_io_enable == 2:Traditional UFS priority mode, prioritize writing to traditional UFS. Signed-off-by: Liao Yuanhong <[email protected]> Signed-off-by: Wu Bo <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: avoid potential int overflow in sanity_check_area_boundary()Nikita Zhandarovich1-2/+2
While calculating the end addresses of main area and segment 0, u32 may be not enough to hold the result without the danger of int overflow. Just in case, play it safe and cast one of the operands to a wider type (u64). Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: fd694733d523 ("f2fs: cover large section in sanity check of super") Cc: [email protected] Signed-off-by: Nikita Zhandarovich <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: fix several potential integer overflows in file offsetsNikita Zhandarovich2-3/+3
When dealing with large extents and calculating file offsets by summing up according extent offsets and lengths of unsigned int type, one may encounter possible integer overflow if the values are big enough. Prevent this from happening by expanding one of the addends to (pgoff_t) type. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: d323d005ac4a ("f2fs: support file defragment") Cc: [email protected] Signed-off-by: Nikita Zhandarovich <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: prevent possible int overflow in dir_block_index()Nikita Zhandarovich1-1/+2
The result of multiplication between values derived from functions dir_buckets() and bucket_blocks() *could* technically reach 2^30 * 2^2 = 2^32. While unlikely to happen, it is prudent to ensure that it will not lead to integer overflow. Thus, use mul_u32_u32() as it's more appropriate to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 3843154598a0 ("f2fs: introduce large directory support") Cc: [email protected] Signed-off-by: Nikita Zhandarovich <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: clean up data_blkaddr() and get_dnode_addr()Chao Yu1-29/+17
Introudce a new help get_dnode_base() to wrap common code from get_dnode_addr() and data_blkaddr() for cleanup. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05Merge tag 'slab-fixes-for-6.11-rc2' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: "Since v6.8 we've had a subtle breakage in SLUB with KFENCE enabled, that can cause a crash. It hasn't been found earlier due to quite specific conditions necessary (OOM during kmem_cache_alloc_bulk())" * tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm, slub: do not call do_slab_free for kfence object
2024-08-04Linux 6.11-rc2Linus Torvalds1-1/+1
2024-08-04profiling: remove profile=sleep supportTetsuo Handa4-24/+2
The kernel sleep profile is no longer working due to a recursive locking bug introduced by commit 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked") Booting with the 'profile=sleep' kernel command line option added or executing # echo -n sleep > /sys/kernel/profiling after boot causes the system to lock up. Lockdep reports kthreadd/3 is trying to acquire lock: ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70 but task is already holding lock: ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370 with the call trace being lock_acquire+0xc8/0x2f0 get_wchan+0x32/0x70 __update_stats_enqueue_sleeper+0x151/0x430 enqueue_entity+0x4b0/0x520 enqueue_task_fair+0x92/0x6b0 ttwu_do_activate+0x73/0x140 try_to_wake_up+0x213/0x370 swake_up_locked+0x20/0x50 complete+0x2f/0x40 kthread+0xfb/0x180 However, since nobody noticed this regression for more than two years, let's remove 'profile=sleep' support based on the assumption that nobody needs this functionality. Fixes: 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked") Cc: [email protected] # v5.16+ Signed-off-by: Tetsuo Handa <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-08-04Merge tag 'x86-urgent-2024-08-04' of ↵Linus Torvalds8-17/+36
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver. A recent change in the ACPI code which consolidated code pathes moved the invocation of init_freq_invariance_cppc() to be moved to a CPU hotplug handler. The first invocation on AMD CPUs ends up enabling a static branch which dead locks because the static branch enable tries to acquire cpu_hotplug_lock but that lock is already held write by the hotplug machinery. Use static_branch_enable_cpuslocked() instead and take the hotplug lock read for the Intel code path which is invoked from the architecture code outside of the CPU hotplug operations. - Fix the number of reserved bits in the sev_config structure bit field so that the bitfield does not exceed 64 bit. - Add missing Zen5 model numbers - Fix the alignment assumptions of pti_clone_pgtable() and clone_entry_text() on 32-bit: The code assumes PMD aligned code sections, but on 32-bit the kernel entry text is not PMD aligned. So depending on the code size and location, which is configuration and compiler dependent, entry text can cross a PMD boundary. As the start is not PMD aligned adding PMD size to the start address is larger than the end address which results in partially mapped entry code for user space. That causes endless recursion on the first entry from userspace (usually #PF). Cure this by aligning the start address in the addition so it ends up at the next PMD start address. clone_entry_text() enforces PMD mapping, but on 32-bit the tail might eventually be PTE mapped, which causes a map fail because the PMD for the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for the clone() invocation which resolves to PTE on 32-bit and PMD on 64-bit. - Zero the 8-byte case for get_user() on range check failure on 32-bit The recend consolidation of the 8-byte get_user() case broke the zeroing in the failure case again. Establish it by clearing ECX before the range check and not afterwards as that obvioulsy can't be reached when the range check fails * tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit x86/mm: Fix pti_clone_entry_text() for i386 x86/mm: Fix pti_clone_pgtable() alignment assumption x86/setup: Parse the builtin command line before merging x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range x86/sev: Fix __reserved field in sev_config x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
2024-08-04Merge tag 'timers-urgent-2024-08-04' of ↵Linus Torvalds2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes for the timer/clocksource code: - The recent fix to make the take over of the broadcast timer more reliable retrieves a per CPU pointer in preemptible context. This went unnoticed in testing as some compilers hoist the access into the non-preemotible section where the pointer is actually used, but obviously compilers can rightfully invoke it where the code put it. Move it into the non-preemptible section right to the actual usage side to cure it. - The clocksource watchdog is supposed to emit a warning when the retry count is greater than one and the number of retries reaches the limit. The condition is backwards and warns always when the count is greater than one. Fixup the condition to prevent spamming dmesg" * tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: Fix brown-bag boolean thinko in cs_watchdog_read() tick/broadcast: Move per CPU pointer access into the atomic section
2024-08-04Merge tag 'sched-urgent-2024-08-04' of ↵Linus Torvalds2-21/+53
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: - When stime is larger than rtime due to accounting imprecision, then utime = rtime - stime becomes negative. As this is unsigned math, the result becomes a huge positive number. Cure it by resetting stime to rtime in that case, so utime becomes 0. - Restore consistent state when sched_cpu_deactivate() fails. When offlining a CPU fails in sched_cpu_deactivate() after the SMT present counter has been decremented, then the function aborts but fails to increment the SMT present counter and leaves it imbalanced. Consecutive operations cause it to underflow. Add the missing fixup for the error path. For SMT accounting the runqueue needs to marked online again in the error exit path to restore consistent state. * tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate() sched/core: Introduce sched_set_rq_on/offline() helper sched/smt: Fix unbalance sched_smt_present dec/inc sched/smt: Introduce sched_smt_present_inc/dec() helper sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
2024-08-04Merge tag 'perf-urgent-2024-08-04' of ↵Linus Torvalds2-12/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Thomas Gleixner: - Move the smp_processor_id() invocation back into the non-preemtible region, so that the result is valid to use - Add the missing package C2 residency counters for Sierra Forest CPUs to make the newly added support actually useful * tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix smp_processor_id()-in-preemptible warnings perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest
2024-08-04Merge tag 'irq-urgent-2024-08-04' of ↵Linus Torvalds4-16/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A couple of fixes for interrupt chip drivers: - Make sure to skip the clear register space in the MBIGEN driver when calculating the node register index. Otherwise the clear register is clobbered and the wrong node registers are accessed. - Fix a signed/unsigned confusion in the loongarch CPU driver which converts an error code to a huge "valid" interrupt number. - Convert the mesion GPIO interrupt controller lock to a raw spinlock so it works on RT. - Add a missing static to a internal function in the pic32 EVIC driver" * tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/mbigen: Fix mbigen node address layout irqchip/meson-gpio: Convert meson_gpio_irq_controller::lock to 'raw_spinlock_t' irqchip/irq-pic32-evic: Add missing 'static' to internal function irqchip/loongarch-cpu: Fix return value of lpic_gsi_to_irq()
2024-08-04Merge tag 'locking-urgent-2024-08-04' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Two fixes for locking and jump labels: - Ensure that the atomic_cmpxchg() conditions are correct and evaluating to true on any non-zero value except 1. The missing check of the return value leads to inconsisted state of the jump label counter. - Add a missing type conversion in the paravirt spinlock code which makes loongson build again" * tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: jump_label: Fix the fix, brown paper bags galore locking/pvqspinlock: Correct the type of "old" variable in pv_kick_node()
2024-08-04arm: dts: arm: versatile-ab: Fix duplicate clock node nameRob Herring (Arm)1-1/+1
Commit 04f08ef291d4 ("arm/arm64: dts: arm: Use generic clock and regulator nodenames") renamed nodes and created 2 "clock-24000000" nodes (at different paths). The kernel can't handle these duplicate names even though they are at different paths. Fix this by renaming one of the nodes to "clock-pclk". This name is aligned with other Arm boards (those didn't have a known frequency to use in the node name). Fixes: 04f08ef291d4 ("arm/arm64: dts: arm: Use generic clock and regulator nodenames") Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Rob Herring (Arm) <[email protected]> Tested-by: Guenter Roeck <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Tested-by: Linus Walleij <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-08-04Merge tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds10-96/+119
Pull smb client fixes from Steve French: - two reparse point fixes - minor cleanup - additional trace point (to help debug a recent problem) * tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal version number smb: client: fix FSCTL_GET_REPARSE_POINT against NetApp smb3: add dynamic tracepoints for shutdown ioctl cifs: Remove cifs_aio_ctx smb: client: handle lack of FSCTL_GET_REPARSE_POINT support
2024-08-04Merge tag 'media/v6.11-2' of ↵Linus Torvalds3-6/+9
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - two Kconfig fixes - one fix for the UVC driver addressing probing time detection of a UVC custom controls - one fix related to PDF generation * tag 'media/v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: v4l: Fix missing tabular column hint for Y14P format media: intel/ipu6: select AUXILIARY_BUS in Kconfig media: ipu-bridge: fix ipu6 Kconfig dependencies media: uvcvideo: Fix custom control mapping probing