aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-09-11f2fs: add reserved_segments sysfs nodeDaeho Jeong2-0/+8
For the fine tuning of GC behavior, add reserved_segments sysfs node. Signed-off-by: Daeho Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-11f2fs: introduce migration_window_granularityDaeho Jeong6-10/+41
We can control the scanning window granularity for GC migration. For more frequent scanning and GC on zoned devices, we need a fine grained control knob for it. Signed-off-by: Daeho Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-11f2fs: make BG GC more aggressive for zoned devicesDaeho Jeong4-6/+65
Since we don't have any GC on device side for zoned devices, need more aggressive BG GC. So, tune the parameters for that. Signed-off-by: Daeho Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-11f2fs: avoid unused block when dio write in LFS modeDaejun Park1-0/+8
This patch addresses the problem that when using LFS mode, unused blocks may occur in f2fs_map_blocks() during block allocation for dio writes. If a new section is allocated during block allocation, it will not be included in the map struct by map_is_mergeable() if the LBA of the allocated block is not contiguous. However, the block already allocated in this process will remain unused due to the LFS mode. This patch avoids the possibility of unused blocks by escaping f2fs_map_blocks() when allocating the last block in a section. Signed-off-by: Daejun Park <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-11f2fs: fix to check atomic_file in f2fs ioctl interfacesChao Yu1-1/+12
Some f2fs ioctl interfaces like f2fs_ioc_set_pin_file(), f2fs_move_file_range(), and f2fs_defragment_range() missed to check atomic_write status, which may cause potential race issue, fix it. Cc: [email protected] Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-11f2fs: get rid of online repaire on corrupted directoryChao Yu3-80/+1
syzbot reports a f2fs bug as below: kernel BUG at fs/f2fs/inode.c:896! RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896 Call Trace: evict+0x532/0x950 fs/inode.c:704 dispose_list fs/inode.c:747 [inline] evict_inodes+0x5f9/0x690 fs/inode.c:797 generic_shutdown_super+0x9d/0x2d0 fs/super.c:627 kill_block_super+0x44/0x90 fs/super.c:1696 kill_f2fs_super+0x344/0x690 fs/f2fs/super.c:4898 deactivate_locked_super+0xc4/0x130 fs/super.c:473 cleanup_mnt+0x41f/0x4b0 fs/namespace.c:1373 task_work_run+0x24f/0x310 kernel/task_work.c:228 ptrace_notify+0x2d2/0x380 kernel/signal.c:2402 ptrace_report_syscall include/linux/ptrace.h:415 [inline] ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline] syscall_exit_work+0xc6/0x190 kernel/entry/common.c:173 syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline] syscall_exit_to_user_mode+0x279/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896 Online repaire on corrupted directory in f2fs_lookup() can generate dirty data/meta while racing w/ readonly remount, it may leave dirty inode after filesystem becomes readonly, however, checkpoint() will skips flushing dirty inode in a state of readonly mode, result in above panic. Let's get rid of online repaire in f2fs_lookup(), and leave the work to fsck.f2fs. Fixes: 510022a85839 ("f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries") Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected] Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-11f2fs: prevent atomic file from being dirtied before commitDaeho Jeong3-1/+14
Keep atomic file clean while updating and make it dirtied during commit in order to avoid unnecessary and excessive inode updates in the previous fix. Fixes: 4bf78322346f ("f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag") Signed-off-by: Daeho Jeong <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: get rid of page->indexChao Yu6-12/+15
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert read_node_page() to use folioChao Yu1-4/+5
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert __write_node_page() to use folioChao Yu1-9/+10
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_write_data_page() to use folioChao Yu1-4/+5
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_do_write_data_page() to use folioChao Yu1-13/+13
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_set_compressed_page() to use folioChao Yu1-3/+5
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_write_end() to use folioChao Yu1-6/+7
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_write_begin() to use folioChao Yu1-21/+23
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_submit_page_read() to use folioChao Yu1-6/+6
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Li Zetao <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_handle_page_eio() to use folioChao Yu3-4/+6
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_read_multi_pages() to use folioChao Yu1-12/+16
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert __f2fs_write_meta_page() to use folioChao Yu1-6/+7
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_do_write_meta_page() to use folioChao Yu3-9/+9
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_write_single_data_page() to use folioChao Yu3-16/+18
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_write_inline_data() to use folioChao Yu4-8/+8
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_clear_page_cache_dirty_tag() to use folioChao Yu5-6/+5
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_vm_page_mkwrite() to use folioChao Yu1-16/+16
Convert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-09-06f2fs: convert f2fs_compress_ctx_add_page() to use folioChao Yu3-10/+10
onvert to use folio, so that we can get rid of 'page->index' to prepare for removal of 'index' field in structure page [1]. [1] https://lore.kernel.org/all/[email protected]/ Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: Use sysfs_emit_at() to simplify codeChristophe JAILLET1-24/+21
This file already uses sysfs_emit(). So be consistent and also use sysfs_emit_at(). This slightly simplifies the code and makes it more readable. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: atomic: fix to forbid dio in atomic_fileChao Yu1-12/+24
atomic write can only be used via buffered IO, let's fail direct IO on atomic_file and return -EOPNOTSUPP. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: compress: don't redirty sparse cluster during {,de}compressYeongjin Gil3-26/+61
In f2fs_do_write_data_page, when the data block is NULL_ADDR, it skips writepage considering that it has been already truncated. This results in an infinite loop as the PAGECACHE_TAG_TOWRITE tag is not cleared during the writeback process for a compressed file including NULL_ADDR in compress_mode=user. This is the reproduction process: 1. dd if=/dev/zero bs=4096 count=1024 seek=1024 of=testfile 2. f2fs_io compress testfile 3. dd if=/dev/zero bs=4096 count=1 conv=notrunc of=testfile 4. f2fs_io decompress testfile To prevent the problem, let's check whether the cluster is fully allocated before redirty its pages. Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Reviewed-by: Sungjong Seo <[email protected]> Reviewed-by: Sunmin Jeong <[email protected]> Tested-by: Jaewook Kim <[email protected]> Signed-off-by: Yeongjin Gil <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: check discard support for conventional zonesShin'ichiro Kawasaki1-0/+7
As the helper function f2fs_bdev_support_discard() shows, f2fs checks if the target block devices support discard by calling bdev_max_discard_sectors() and bdev_is_zoned(). This check works well for most cases, but it does not work for conventional zones on zoned block devices. F2fs assumes that zoned block devices support discard, and calls __submit_discard_cmd(). When __submit_discard_cmd() is called for sequential write required zones, it works fine since __submit_discard_cmd() issues zone reset commands instead of discard commands. However, when __submit_discard_cmd() is called for conventional zones, __blkdev_issue_discard() is called even when the devices do not support discard. The inappropriate __blkdev_issue_discard() call was not a problem before the commit 30f1e7241422 ("block: move discard checks into the ioctl handler") because __blkdev_issue_discard() checked if the target devices support discard or not. If not, it returned EOPNOTSUPP. After the commit, __blkdev_issue_discard() no longer checks it. It always returns zero and sets NULL to the given bio pointer. This NULL pointer triggers f2fs_bug_on() in __submit_discard_cmd(). The BUG is recreated with the commands below at the umount step, where /dev/nullb0 is a zoned null_blk with 5GB total size, 128MB zone size and 10 conventional zones. $ mkfs.f2fs -f -m /dev/nullb0 $ mount /dev/nullb0 /mnt $ for ((i=0;i<5;i++)); do dd if=/dev/zero of=/mnt/test bs=65536 count=1600 conv=fsync; done $ umount /mnt To fix the BUG, avoid the inappropriate __blkdev_issue_discard() call. When discard is requested for conventional zones, check if the device supports discard or not. If not, return EOPNOTSUPP. Fixes: 30f1e7241422 ("block: move discard checks into the ioctl handler") Cc: [email protected] Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Chao Yu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread()Chao Yu3-4/+11
syzbot reports a f2fs bug as below: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_report+0xe8/0x550 mm/kasan/report.c:491 kasan_report+0x143/0x180 mm/kasan/report.c:601 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline] __refcount_add include/linux/refcount.h:184 [inline] __refcount_inc include/linux/refcount.h:241 [inline] refcount_inc include/linux/refcount.h:258 [inline] get_task_struct include/linux/sched/task.h:118 [inline] kthread_stop+0xca/0x630 kernel/kthread.c:704 f2fs_stop_gc_thread+0x65/0xb0 fs/f2fs/gc.c:210 f2fs_do_shutdown+0x192/0x540 fs/f2fs/file.c:2283 f2fs_ioc_shutdown fs/f2fs/file.c:2325 [inline] __f2fs_ioctl+0x443a/0xbe60 fs/f2fs/file.c:4325 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is below race condition, it may cause use-after-free issue in sbi->gc_th pointer. - remount - f2fs_remount - f2fs_stop_gc_thread - kfree(gc_th) - f2fs_ioc_shutdown - f2fs_do_shutdown - f2fs_stop_gc_thread - kthread_stop(gc_th->f2fs_gc_task) : sbi->gc_thread = NULL; We will call f2fs_do_shutdown() in two paths: - for f2fs_ioc_shutdown() path, we should grab sb->s_umount semaphore for fixing. - for f2fs_shutdown() path, it's safe since caller has already grabbed sb->s_umount semaphore. Reported-by: [email protected] Closes: https://lore.kernel.org/linux-f2fs-devel/[email protected] Fixes: 7950e9ac638e ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: atomic: fix to truncate pagecache before on-disk metadata truncationChao Yu1-0/+4
We should always truncate pagecache while truncating on-disk data. Fixes: a46bebd502fe ("f2fs: synchronize atomic write aborts") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: fix to wait page writeback before setting gcing flagChao Yu1-0/+4
Soft IRQ Thread - f2fs_write_end_io - f2fs_defragment_range - set_page_private_gcing - type = WB_DATA_TYPE(page, false); : assign type w/ F2FS_WB_CP_DATA due to page_private_gcing() is true - dec_page_count() w/ wrong type - end_page_writeback() Value of F2FS_WB_CP_DATA reference count may become negative under above race condition, the root cause is we missed to wait page writeback before setting gcing page private flag, let's fix it. Fixes: 2d1fe8a86bf5 ("f2fs: fix to tag gcing flag on page during file defragment") Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: Create COW inode from parent dentry for atomic writeYeongjin Gil1-9/+3
The i_pino in f2fs_inode_info has the previous parent's i_ino when inode was renamed, which may cause f2fs_ioc_start_atomic_write to fail. If file_wrong_pino is true and i_nlink is 1, then to find a valid pino, we should refer to the dentry from inode. To resolve this issue, let's get parent inode using parent dentry directly. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Reviewed-by: Sungjong Seo <[email protected]> Reviewed-by: Sunmin Jeong <[email protected]> Signed-off-by: Yeongjin Gil <[email protected]> Reviewed-by: Daeho Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: Require FMODE_WRITE for atomic write ioctlsJann Horn1-0/+9
The F2FS ioctls for starting and committing atomic writes check for inode_owner_or_capable(), but this does not give LSMs like SELinux or Landlock an opportunity to deny the write access - if the caller's FSUID matches the inode's UID, inode_owner_or_capable() immediately returns true. There are scenarios where LSMs want to deny a process the ability to write particular files, even files that the FSUID of the process owns; but this can currently partially be bypassed using atomic write ioctls in two ways: - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can truncate an inode to size 0 - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert changes another process concurrently made to a file Fix it by requiring FMODE_WRITE for these operations, just like for F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these ioctls when intending to write into the file, that seems unlikely to break anything. Fixes: 88b88a667971 ("f2fs: support atomic writes") Cc: [email protected] Signed-off-by: Jann Horn <[email protected]> Reviewed-by: Chao Yu <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-21f2fs: clean up val{>>,<<}F2FS_BLKSIZE_BITSZhiguo Niu6-9/+9
Use F2FS_BYTES_TO_BLK(bytes) and F2FS_BLK_TO_BYTES(blk) for cleanup Signed-off-by: Zhiguo Niu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: fix to use per-inode maxbytes and cleanupZhiguo Niu3-9/+7
This is a supplement to commit 6d1451bf7f84 ("f2fs: fix to use per-inode maxbytes") for some missed cases, also cleanup redundant code in f2fs_llseek. Cc: Chengguang Xu <[email protected]> Signed-off-by: Zhiguo Niu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: use f2fs_get_node_page when write inline dataZijie Wang1-12/+11
We just need inode page when write inline data, use f2fs_get_node_page() to get it instead of using dnode_of_data, which can eliminate unnecessary struct use. Signed-off-by: Zijie Wang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: sysfs: support atgc_enabledliujinbao12-0/+14
When we add "atgc" to the fstab table, ATGC is not immediately enabled. There is a 7-day time threshold, and we can use "atgc_enabled" to show whether ATGC is enabled. Signed-off-by: liujinbao1 <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15Revert "f2fs: use flush command instead of FUA for zoned device"Wenjie Cheng2-3/+2
This reverts commit c550e25bca660ed2554cbb48d32b82d0bb98e4b1. Commit c550e25bca660ed2554cbb48d32b82d0bb98e4b1 ("f2fs: use flush command instead of FUA for zoned device") used additional flush command to keep write order. Since Commit dd291d77cc90eb6a86e9860ba8e6e38eebd57d12 ("block: Introduce zone write plugging") has enabled the block layer to handle this order issue, there is no need to use flush command. Signed-off-by: Wenjie Cheng <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: get rid of buffer_head useChao Yu5-41/+66
Convert to use folio and related functionality. Cc: Matthew Wilcox <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: fix to avoid racing in between read and OPU dio writeChao Yu1-0/+4
If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectly, and for dio read, the race condition exists as well. Thread A Thread B - f2fs_file_write_iter - f2fs_dio_write_iter - __iomap_dio_rw - f2fs_iomap_begin - f2fs_map_blocks - __allocate_data_block - allocated blkaddr #x - iomap_dio_submit_bio - f2fs_file_read_iter - filemap_read - f2fs_read_data_folio - f2fs_mpage_readpages - f2fs_map_blocks : get blkaddr #x - f2fs_submit_read_bio IRQ - f2fs_read_end_io : read IO on blkaddr #x complete IRQ - iomap_dio_bio_end_io : direct write IO on blkaddr #x complete In LFS mode, if there is inflight dio, let's wait for its completion, this policy won't cover all race cases, however it is a tradeoff which avoids abusing lock around IO paths. Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: fix to wait dio completionChao Yu1-0/+13
It should wait all existing dio write IOs before block removal, otherwise, previous direct write IO may overwrite data in the block which may be reused by other inode. Cc: [email protected] Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-15f2fs: reduce expensive checkpoint trigger frequencyChao Yu4-3/+19
We may trigger high frequent checkpoint for below case: 1. mkdir /mnt/dir1; set dir1 encrypted 2. touch /mnt/file1; fsync /mnt/file1 3. mkdir /mnt/dir2; set dir2 encrypted 4. touch /mnt/file2; fsync /mnt/file2 ... Although, newly created dir and file are not related, due to commit bbf156f7afa7 ("f2fs: fix lost xattrs of directories"), we will trigger checkpoint whenever fsync() comes after a new encrypted dir created. In order to avoid such performance regression issue, let's record an entry including directory's ino in global cache whenever we update directory's xattr data, and then triggerring checkpoint() only if xattr metadata of target file's parent was updated. This patch updates to cover below no encryption case as well: 1) parent is checkpointed 2) set_xattr(dir) w/ new xnid 3) create(file) 4) fsync(file) Fixes: bbf156f7afa7 ("f2fs: fix lost xattrs of directories") Reported-by: wangzijie <[email protected]> Reported-by: Zhiguo Niu <[email protected]> Tested-by: Zhiguo Niu <[email protected]> Reported-by: Yunlei He <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: atomic: fix to avoid racing w/ GCChao Yu2-2/+16
Case #1: SQLite App GC Thread Kworker Shrinker - f2fs_ioc_start_atomic_write - f2fs_ioc_commit_atomic_write - f2fs_commit_atomic_write - filemap_write_and_wait_range : write atomic_file's data to cow_inode echo 3 > drop_caches to drop atomic_file's cache. - f2fs_gc - gc_data_segment - move_data_page - set_page_dirty - writepages - f2fs_do_write_data_page : overwrite atomic_file's data to cow_inode - f2fs_down_write(&fi->i_gc_rwsem[WRITE]) - __f2fs_commit_atomic_write - f2fs_up_write(&fi->i_gc_rwsem[WRITE]) Case #2: SQLite App GC Thread Kworker - f2fs_ioc_start_atomic_write - __writeback_single_inode - do_writepages - f2fs_write_cache_pages - f2fs_write_single_data_page - f2fs_do_write_data_page : write atomic_file's data to cow_inode - f2fs_gc - gc_data_segment - move_data_page - set_page_dirty - writepages - f2fs_do_write_data_page : overwrite atomic_file's data to cow_inode - f2fs_ioc_commit_atomic_write In above cases racing in between atomic_write and GC, previous data in atomic_file may be overwrited to cow_file, result in data corruption. This patch introduces PAGE_PRIVATE_ATOMIC_WRITE bit flag in page.private, and use it to indicate that there is last dirty data in atomic file, and the data should be writebacked into cow_file, if the flag is not tagged in page, we should never write data across files. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: Daeho Jeong <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: fix macro definition stat_inc_cp_countJulian Sun1-1/+1
The macro stat_inc_cp_count accepts a parameter si, but it was not used, rather the variable sbi was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: fix macro definition on_f2fs_build_free_nidsJulian Sun1-1/+1
The macro on_f2fs_build_free_nids accepts a parameter nmi, but it was not used, rather the variable nm_i was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: add write priority option based on zone UFSLiao Yuanhong5-1/+58
Currently, we are using a mix of traditional UFS and zone UFS to support some functionalities that cannot be achieved on zone UFS alone. However, there are some issues with this approach. There exists a significant performance difference between traditional UFS and zone UFS. Under normal usage, we prioritize writes to zone UFS. However, in critical conditions (such as when the entire UFS is almost full), we cannot determine whether data will be written to traditional UFS or zone UFS. This can lead to significant performance fluctuations, which is not conducive to development and testing. To address this, we have added an option zlu_io_enable under sys with the following three modes: 1) zlu_io_enable == 0:Normal mode, prioritize writing to zone UFS; 2) zlu_io_enable == 1:Zone UFS only mode, only allow writing to zone UFS; 3) zlu_io_enable == 2:Traditional UFS priority mode, prioritize writing to traditional UFS. Signed-off-by: Liao Yuanhong <[email protected]> Signed-off-by: Wu Bo <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: avoid potential int overflow in sanity_check_area_boundary()Nikita Zhandarovich1-2/+2
While calculating the end addresses of main area and segment 0, u32 may be not enough to hold the result without the danger of int overflow. Just in case, play it safe and cast one of the operands to a wider type (u64). Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: fd694733d523 ("f2fs: cover large section in sanity check of super") Cc: [email protected] Signed-off-by: Nikita Zhandarovich <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: fix several potential integer overflows in file offsetsNikita Zhandarovich2-3/+3
When dealing with large extents and calculating file offsets by summing up according extent offsets and lengths of unsigned int type, one may encounter possible integer overflow if the values are big enough. Prevent this from happening by expanding one of the addends to (pgoff_t) type. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: d323d005ac4a ("f2fs: support file defragment") Cc: [email protected] Signed-off-by: Nikita Zhandarovich <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-08-05f2fs: prevent possible int overflow in dir_block_index()Nikita Zhandarovich1-1/+2
The result of multiplication between values derived from functions dir_buckets() and bucket_blocks() *could* technically reach 2^30 * 2^2 = 2^32. While unlikely to happen, it is prudent to ensure that it will not lead to integer overflow. Thus, use mul_u32_u32() as it's more appropriate to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 3843154598a0 ("f2fs: introduce large directory support") Cc: [email protected] Signed-off-by: Nikita Zhandarovich <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>