aboutsummaryrefslogtreecommitdiff
path: root/fs/f2fs/gc.c
AgeCommit message (Collapse)AuthorFilesLines
2024-09-13f2fs: forcibly migrate to secure space for zoned device file pinningDaeho Jeong1-2/+1
We need to migrate data blocks even though it is full to secure space for zoned device file pinning. Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-12f2fs: remove unused parametersliuderong1-3/+3
Remove unused parameter segno from f2fs_usable_segs_in_sec. Signed-off-by: liuderong <liuderong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-11f2fs: add valid block ratio not to do excessive GC for one time GCDaeho Jeong1-4/+12
We need to introduce a valid block ratio threshold not to trigger excessive GC for zoned deivces. The initial value of it is 95%. So, F2FS will stop the thread from intiating GC for sections having valid blocks exceeding the ratio. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-11f2fs: create gc_no_zoned_gc_percent and gc_boost_zoned_gc_percentDaeho Jeong1-3/+9
Added control knobs for gc_no_zoned_gc_percent and gc_boost_zoned_gc_percent. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-11f2fs: do FG_GC when GC boosting is required for zoned devicesDaeho Jeong1-7/+17
Under low free section count, we need to use FG_GC instead of BG_GC to recover free sections. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-11f2fs: increase BG GC migration window granularity when boosted for zoned devicesDaeho Jeong1-2/+10
Need bigger BG GC migration window granularity when free section is running low. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-11f2fs: introduce migration_window_granularityDaeho Jeong1-10/+21
We can control the scanning window granularity for GC migration. For more frequent scanning and GC on zoned devices, we need a fine grained control knob for it. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-11f2fs: make BG GC more aggressive for zoned devicesDaeho Jeong1-4/+21
Since we don't have any GC on device side for zoned devices, need more aggressive BG GC. So, tune the parameters for that. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10f2fs: use meta inode for GC of COW fileSunmin Jeong1-2/+5
In case of the COW file, new updates and GC writes are already separated to page caches of the atomic file and COW file. As some cases that use the meta inode for GC, there are some race issues between a foreground thread and GC thread. To handle them, we need to take care when to invalidate and wait writeback of GC pages in COW files as the case of using the meta inode. Also, a pointer from the COW inode to the original inode is required to check the state of original pages. For the former, we can solve the problem by using the meta inode for GC of COW files. Then let's get a page from the original inode in move_data_block when GCing the COW file to avoid race condition. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: stable@vger.kernel.org #v5.19+ Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10f2fs: use meta inode for GC of atomic fileSunmin Jeong1-3/+3
The page cache of the atomic file keeps new data pages which will be stored in the COW file. It can also keep old data pages when GCing the atomic file. In this case, new data can be overwritten by old data if a GC thread sets the old data page as dirty after new data page was evicted. Also, since all writes to the atomic file are redirected to COW inodes, GC for the atomic file is not working well as below. f2fs_gc(gc_type=FG_GC) - select A as a victim segment do_garbage_collect - iget atomic file's inode for block B move_data_page f2fs_do_write_data_page - use dn of cow inode - set fio->old_blkaddr from cow inode - seg_freed is 0 since block B is still valid - goto gc_more and A is selected as victim again To solve the problem, let's separate GC writes and updates in the atomic file by using the meta inode for GC writes. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: stable@vger.kernel.org #v5.19+ Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-06-12f2fs: fix to remove redundant SBI_NEED_FSCK flag setZhiguo Niu1-1/+0
Subsequent f2fs_stop_checkpoint will set cp_err, so this SBI_NEED_FSCK flag set action is invalid. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-06-12f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GCChao Yu1-0/+10
syzbot reports a f2fs bug as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/inline.c:258! CPU: 1 PID: 34 Comm: kworker/u8:2 Not tainted 6.9.0-rc6-syzkaller-00012-g9e4bc4bcae01 #0 RIP: 0010:f2fs_write_inline_data+0x781/0x790 fs/f2fs/inline.c:258 Call Trace: f2fs_write_single_data_page+0xb65/0x1d60 fs/f2fs/data.c:2834 f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline] __f2fs_write_data_pages fs/f2fs/data.c:3288 [inline] f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3315 do_writepages+0x35b/0x870 mm/page-writeback.c:2612 __writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1650 writeback_sb_inodes+0x905/0x1260 fs/fs-writeback.c:1941 wb_writeback+0x457/0xce0 fs/fs-writeback.c:2117 wb_do_writeback fs/fs-writeback.c:2264 [inline] wb_workfn+0x410/0x1090 fs/fs-writeback.c:2304 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa12/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f2/0x390 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback. Let's add sanity check on F2FS_INLINE_DATA flag in inode during GC, so that, it can forbid migrating inline_data inode's data block for fixing. Reported-by: syzbot+848062ba19c8782ca5c8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000d103ce06174d7ec3@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-05-11f2fs: fix to add missing iput() in gc_data_segment()Chao Yu1-2/+7
During gc_data_segment(), if inode state is abnormal, it missed to call iput(), fix it. Fixes: b73e52824c89 ("f2fs: reposition unlock_new_inode to prevent accessing invalid inode") Fixes: 9056d6489f5a ("f2fs: fix to do sanity check on inode type during garbage collection") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12f2fs: use folio_test_writebackJaegeuk Kim1-1/+1
Let's convert PageWriteback to folio_test_writeback. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-12f2fs: fix to handle error paths of {new,change}_curseg()Zhiguo Niu1-2/+5
{new,change}_curseg() may return error in some special cases, error handling should be did in their callers, and this will also facilitate subsequent error path expansion in {new,change}_curseg(). Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-12f2fs: unify the error handling of f2fs_is_valid_blkaddrZhiguo Niu1-2/+0
There are some cases of f2fs_is_valid_blkaddr not handled as ERROR_INVALID_BLKADDR,so unify the error handling about all of f2fs_is_valid_blkaddr. Do f2fs_handle_error in __f2fs_is_valid_blkaddr for cleanup. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-04f2fs: introduce SEGS_TO_BLKS/BLKS_TO_SEGS for cleanupChao Yu1-5/+5
Just cleanup, no functional change. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-04f2fs: fix to check return value of f2fs_gc_rangeZhiguo Niu1-0/+3
f2fs_gc_range may return error, so its caller f2fs_allocate_pinning_section should determine whether to do retry based on ist return value. Also just do f2fs_gc_range when f2fs_allocate_new_section return -EAGAIN, and check cp error case in f2fs_gc_range. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29f2fs: fix to handle segment allocation failure correctlyChao Yu1-1/+6
If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that we run out of free segment, we should not panic kernel, instead, let's handle such error correctly in its caller. Signed-off-by: Chao Yu <chao@kernel.org> Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: support file pinning for zoned devicesDaeho Jeong1-4/+10
Support file pinning with conventional storage area for zoned devices Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: kill heap-based allocationJaegeuk Kim1-3/+2
No one uses this feature. Let's kill it. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: separate f2fs_gc_range() to use GC for a rangeDaeho Jeong1-21/+28
Make f2fs_gc_range() an extenal function to use it for GC for a range. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27f2fs: use BLKS_PER_SEG, BLKS_PER_SEC, and SEGS_PER_SECJaegeuk Kim1-20/+20
No functional change. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-20f2fs: deprecate io_bitsJaegeuk Kim1-9/+1
Let's deprecate an unused io_bits feature to save CPU cycles and memory. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-26f2fs: Add error handling for negative returns from do_garbage_collectYongpeng Yang1-0/+3
The function do_garbage_collect can return a value less than 0 due to f2fs_cp_error being true or page allocation failure, as a result of calling f2fs_get_sum_page. However, f2fs_gc does not account for such cases, which could potentially lead to an abnormal total_freed and thus cause subsequent code to behave unexpectedly. Given that an f2fs_cp_error is irrecoverable, and considering that do_garbage_collect already retries page allocation errors through its call to f2fs_get_sum_page->f2fs_get_meta_page_retry, any error reported by do_garbage_collect should immediately terminate the current GC. Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-26f2fs: Use wait_event_freezable_timeout() for freezable kthreadKevin Hao1-3/+3
A freezable kernel thread can enter frozen state during freezing by either calling try_to_freeze() or using wait_event_freezable() and its variants. So for the following snippet of code in a kernel thread loop: wait_event_interruptible_timeout(); try_to_freeze(); We can change it to a simple wait_event_freezable_timeout() and then eliminate the function calls to try_to_freeze() and freezing(). Signed-off-by: Kevin Hao <haokexin@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11f2fs: introduce f2fs_invalidate_internal_cache() for cleanupChao Yu1-3/+2
Just cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11f2fs: delete obsolete FI_FIRST_BLOCK_WRITTENChao Yu1-2/+0
Commit 3c6c2bebef79 ("f2fs: avoid punch_hole overhead when releasing volatile data") introduced FI_FIRST_BLOCK_WRITTEN as below reason: This patch is to avoid some punch_hole overhead when releasing volatile data. If volatile data was not written yet, we just can make the first page as zero. After commit 7bc155fec5b3 ("f2fs: kill volatile write support"), we won't support volatile write, but it missed to remove obsolete FI_FIRST_BLOCK_WRITTEN, delete it in this patch. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-09-02Merge tag 'f2fs-for-6-6-rc1' of ↵Linus Torvalds1-6/+12
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we don't have a highlighted feature enhancement, but mostly have fixed issues mainly in two parts: 1) zoned block device, and 2) compression support. For zoned block device, we've tried to improve the power-off recovery flow as much as possible. For compression, we found some corner cases caused by wrong compression policy and logics. Other than them, there were some reverts and stat corrections. Bug fixes: - use finish zone command when closing a zone - check zone type before sending async reset zone command - fix to assign compress_level for lz4 correctly - fix error path of f2fs_submit_page_read() - don't {,de}compress non-full cluster - send small discard commands during checkpoint back - flush inode if atomic file is aborted - correct to account gc/cp stats And, there are minor bug fixes, avoiding false lockdep warning, and clean-ups" * tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits) f2fs: use finish zone command when closing a zone f2fs: compress: fix to assign compress_level for lz4 correctly f2fs: fix error path of f2fs_submit_page_read() f2fs: clean up error handling in sanity_check_{compress_,}inode() f2fs: avoid false alarm of circular locking Revert "f2fs: do not issue small discard commands during checkpoint" f2fs: doc: fix description of max_small_discards f2fs: should update REQ_TIME for direct write f2fs: fix to account cp stats correctly f2fs: fix to account gc stats correctly f2fs: remove unneeded check condition in __f2fs_setxattr() f2fs: fix to update i_ctime in __f2fs_setxattr() Revert "f2fs: fix to do sanity check on extent cache correctly" f2fs: increase usage of folio_next_index() helper f2fs: Only lfs mode is allowed with zoned block device feature f2fs: check zone type before sending async reset zone command f2fs: compress: don't {,de}compress non-full cluster f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted f2fs: don't reopen the main block device in f2fs_scan_devices f2fs: fix to avoid mmap vs set_compress_option case ...
2023-08-14f2fs: fix to account cp stats correctlyChao Yu1-0/+5
cp_foreground_calls sysfs entry shows total CP call count rather than foreground CP call count, fix it. Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14f2fs: fix to account gc stats correctlyChao Yu1-6/+7
As reported, status debugfs entry shows inconsistent GC stats as below: GC calls: 6008 (BG: 6161) - data segments : 3053 (BG: 3053) - node segments : 2955 (BG: 2955) Total GC calls is larger than BGGC calls, the reason is: - f2fs_stat_info.call_count accounts total migrated section count by f2fs_gc() - f2fs_stat_info.bg_gc accounts total call times of f2fs_gc() from background gc_thread Another issue is gc_foreground_calls sysfs entry shows total GC call count rather than FGGC call count. This patch changes as below for fix: - account GC calls and migrated segment count separately - support to account migrated section count if it enables large section mode - fix to show correct value in gc_foreground_calls sysfs entry Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-07-17fs: distinguish between user initiated freeze and kernel initiated freezeDarrick J. Wong1-3/+5
Userspace can freeze a filesystem using the FIFREEZE ioctl or by suspending the block device; this state persists until userspace thaws the filesystem with the FITHAW ioctl or resuming the block device. Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for the fsfreeze ioctl") we only allow the first freeze command to succeed. The kernel may decide that it is necessary to freeze a filesystem for its own internal purposes, such as suspends in progress, filesystem fsck activities, or quiescing a device prior to removal. Userspace thaw commands must never break a kernel freeze, and kernel thaw commands shouldn't undo userspace's freeze command. Introduce a couple of freeze holder flags and wire it into the sb_writers state. One kernel and one userspace freeze are allowed to coexist at the same time; the filesystem will not thaw until both are lifted. I wonder if the f2fs/gfs2 code should be using a kernel freeze here, but for now we'll use FREEZE_HOLDER_USERSPACE to preserve existing behaviors. Cc: mcgrof@kernel.org Cc: jack@suse.cz Cc: hch@infradead.org Cc: ruansy.fnst@fujitsu.com Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
2023-06-26f2fs: check return value of freeze_super()Chao Yu1-1/+3
freeze_super() can fail, it needs to check its return value and do error handling in f2fs_resize_fs(). Fixes: 04f0b2eaa3b3 ("f2fs: ioctl for removing a range from F2FS") Fixes: b4b10061ef98 ("f2fs: refactor resize_fs to avoid meta updates in progress") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12f2fs: fix to avoid NULL pointer dereference f2fs_write_end_io()Chao Yu1-3/+18
butt3rflyh4ck reports a bug as below: When a thread always calls F2FS_IOC_RESIZE_FS to resize fs, if resize fs is failed, f2fs kernel thread would invoke callback function to update f2fs io info, it would call f2fs_write_end_io and may trigger null-ptr-deref in NODE_MAPPING. general protection fault, probably for non-canonical address KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] RIP: 0010:NODE_MAPPING fs/f2fs/f2fs.h:1972 [inline] RIP: 0010:f2fs_write_end_io+0x727/0x1050 fs/f2fs/data.c:370 <TASK> bio_endio+0x5af/0x6c0 block/bio.c:1608 req_bio_endio block/blk-mq.c:761 [inline] blk_update_request+0x5cc/0x1690 block/blk-mq.c:906 blk_mq_end_request+0x59/0x4c0 block/blk-mq.c:1023 lo_complete_rq+0x1c6/0x280 drivers/block/loop.c:370 blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1101 __do_softirq+0x1d4/0x8ef kernel/softirq.c:571 run_ksoftirqd kernel/softirq.c:939 [inline] run_ksoftirqd+0x31/0x60 kernel/softirq.c:931 smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164 kthread+0x33e/0x440 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 The root cause is below race case can cause leaving dirty metadata in f2fs after filesystem is remount as ro: Thread A Thread B - f2fs_ioc_resize_fs - f2fs_readonly --- return false - f2fs_resize_fs - f2fs_remount - write_checkpoint - set f2fs as ro - free_segment_range - update meta_inode's data Then, if f2fs_put_super() fails to write_checkpoint due to readonly status, and meta_inode's dirty data will be writebacked after node_inode is put, finally, f2fs_write_end_io will access NULL pointer on sbi->node_inode. Thread A IRQ context - f2fs_put_super - write_checkpoint fails - iput(node_inode) - node_inode = NULL - iput(meta_inode) - write_inode_now - f2fs_write_meta_page - f2fs_write_end_io - NODE_MAPPING(sbi) : access NULL pointer on node_inode Fixes: b4b10061ef98 ("f2fs: refactor resize_fs to avoid meta updates in progress") Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Closes: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn Tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12f2fs: Fix over-estimating free section during FG GCYonggil Song1-5/+11
There was a bug that finishing FG GC unconditionally because free sections are over-estimated after checkpoint in FG GC. This patch initializes sec_freed by every checkpoint in FG GC. Signed-off-by: Yonggil Song <yonggil.song@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-05-08f2fs: support errors=remount-ro|continue|panic mountoptionChao Yu1-1/+1
This patch supports errors=remount-ro|continue|panic mount option for f2fs. f2fs behaves as below in three different modes: mode continue remount-ro panic access ops normal noraml N/A syscall errors -EIO -EROFS N/A mount option rw ro N/A pending dir write keep keep N/A pending non-dir write drop keep N/A pending node write drop keep N/A pending meta write keep keep N/A By default it uses "continue" mode. [Yangtao helps to clean up function's name] Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-24f2fs: remove power-of-two limitation of zoned deviceJaegeuk Kim1-2/+2
In f2fs, there's no reason to force po2. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-18f2fs: add has_enough_free_secs()Yangtao Li1-2/+2
Replace !has_not_enough_free_secs w/ has_enough_free_secs. BTW avoid nested 'if' statements in f2fs_balance_fs(). Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-18f2fs: refactor f2fs_gc to call checkpoint in urgent conditionJaegeuk Kim1-14/+13
The major change is to call checkpoint, if there's not enough space while having some prefree segments in FG_GC case. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-12f2fs: fix to keep consistent i_gc_rwsem lock orderChao Yu1-4/+4
i_gc_rwsem[WRITE] and i_gc_rwsem[READ] lock order is reversed in gc_data_segment() and f2fs_dio_write_iter(), fix to keep consistent lock order as below: 1. lock i_gc_rwsem[WRITE] 2. lock i_gc_rwsem[READ] Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-10f2fs: Fix system crash due to lack of free space in LFSYonggil Song1-2/+8
When f2fs tries to checkpoint during foreground gc in LFS mode, system crash occurs due to lack of free space if the amount of dirty node and dentry pages generated by data migration exceeds free space. The reproduction sequence is as follows. - 20GiB capacity block device (null_blk) - format and mount with LFS mode - create a file and write 20,000MiB - 4k random write on full range of the file RIP: 0010:new_curseg+0x48a/0x510 [f2fs] Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 48 04 45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff RSP: 0018:ffff977bc397b218 EFLAGS: 00010246 RAX: 00000000000027b9 RBX: 0000000000000000 RCX: 00000000000027c0 RDX: 0000000000000000 RSI: 00000000000027b9 RDI: ffff8c25ab4e74f8 RBP: ffff977bc397b268 R08: 00000000000027b9 R09: ffff8c29e4a34b40 R10: 0000000000000001 R11: ffff977bc397b0d8 R12: 0000000000000000 R13: ffff8c25b4dd81a0 R14: 0000000000000000 R15: ffff8c2f667f9000 FS: 0000000000000000(0000) GS:ffff8c344ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00055d000 CR3: 0000000e30810003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> allocate_segment_by_default+0x9c/0x110 [f2fs] f2fs_allocate_data_block+0x243/0xa30 [f2fs] ? __mod_lruvec_page_state+0xa0/0x150 do_write_page+0x80/0x160 [f2fs] f2fs_do_write_node_page+0x32/0x50 [f2fs] __write_node_page+0x339/0x730 [f2fs] f2fs_sync_node_pages+0x5a6/0x780 [f2fs] block_operations+0x257/0x340 [f2fs] f2fs_write_checkpoint+0x102/0x1050 [f2fs] f2fs_gc+0x27c/0x630 [f2fs] ? folio_mark_dirty+0x36/0x70 f2fs_balance_fs+0x16f/0x180 [f2fs] This patch adds checking whether free sections are enough before checkpoint during gc. Signed-off-by: Yonggil Song <yonggil.song@samsung.com> [Jaegeuk Kim: code clean-up] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-10f2fs: remove struct victim_selection default_v_opsYangtao Li1-11/+4
There is only single instance of these ops, and Jaegeuk point out that: Originally this was intended to give a chance to provide other allocation option. Anyway, it seems quit hard to do it anymore. So remove the indirection and call f2fs_get_victim() directly. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-29f2fs: factor out victim_entry usage from general rb_tree useJaegeuk Kim1-54/+85
Let's reduce the complexity of mixed use of rb_tree in victim_entry from extent_cache and discard_cmd. This should fix arm32 memory alignment issue caused by shared rb_entry. [struct victim_entry] [struct rb_entry] [0] struct rb_node rb_node; [0] struct rb_node rb_node; union { struct { unsigned int ofs; unsigned int len; }; [16] unsigned long long mtime; [12] unsigned long long key; } __packed; Cc: <stable@vger.kernel.org> Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-29f2fs: fix uninitialized skipped_gc_rwsemYonggil Song1-1/+1
When f2fs skipped a gc round during victim migration, there was a bug which would skip all upcoming gc rounds unconditionally because skipped_gc_rwsem was not initialized. It fixes the bug by correctly initializing the skipped_gc_rwsem inside the gc loop. Fixes: 6f8d4455060d ("f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc") Signed-off-by: Yonggil Song <yonggil.song@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-02f2fs: reduce stack memory cost by using bitfield in struct f2fs_io_infoChao Yu1-4/+4
This patch tries to use bitfield in struct f2fs_io_info to improve memory usage. struct f2fs_io_info { ... unsigned int need_lock:8; /* indicate we need to lock cp_rwsem */ unsigned int version:8; /* version of the node */ unsigned int submitted:1; /* indicate IO submission */ unsigned int in_list:1; /* indicate fio is in io_list */ unsigned int is_por:1; /* indicate IO is from recovery or not */ unsigned int retry:1; /* need to reallocate block address */ unsigned int encrypted:1; /* indicate file is encrypted */ unsigned int post_read:1; /* require post read */ ... }; After this patch, size of struct f2fs_io_info reduces from 136 to 120. [Nathan: fix a compile warning (single-bit-bitfield-constant-conversion)] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-11f2fs: convert discard_wake and gc_wake to bool typeYangtao Li1-2/+2
discard_wake and gc_wake have only two values, 0 or 1. So there is no need to use int type to store them. BTW, move discard_wake to the end of the discard_cmd_control structure. Before: - sizeof(struct discard_cmd_control): 8392 After move: - sizeof(struct discard_cmd_control): 8384 Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-11f2fs: merge f2fs_show_injection_info() into time_to_inject()Yangtao Li1-3/+1
There is no need to additionally use f2fs_show_injection_info() to output information. Concatenate time_to_inject() and __time_to_inject() via a macro. In the new __time_to_inject() function, pass in the caller function name and parent function. In this way, we no longer need the f2fs_show_injection_info() function, and let's remove it. Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06f2fs: avoid to check PG_error flagChao Yu1-1/+0
After below changes: commit 14db0b3c7b83 ("fscrypt: stop using PG_error to track error status") commit 98dc08bae678 ("fsverity: stop using PG_error to track error status") There is no place in f2fs we will set PG_error flag in page, let's remove other PG_error usage in f2fs, as a step towards freeing the PG_error flag for other uses. Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06f2fs: add a f2fs_lookup_extent_cache_block helperChristoph Hellwig1-3/+2
All but three callers of f2fs_lookup_extent_cache just want the block address. Add a small helper to simplify them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-14Merge tag 'f2fs-for-6.2-rc1' of ↵Linus Torvalds1-36/+43
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've added two features: F2FS_IOC_START_ATOMIC_REPLACE and a per-block age-based extent cache. F2FS_IOC_START_ATOMIC_REPLACE is a variant of the previous atomic write feature which guarantees a per-file atomicity. It would be more efficient than AtomicFile implementation in Android framework. The per-block age-based extent cache implements another type of extent cache in memory which keeps the per-block age in a file, so that block allocator could split the hot and cold data blocks more accurately. Enhancements: - introduce F2FS_IOC_START_ATOMIC_REPLACE - refactor extent_cache to add a new per-block-age-based extent cache support - introduce discard_urgent_util, gc_mode, max_ordered_discard sysfs knobs - add proc entry to show discard_plist info - optimize iteration over sparse directories - add barrier mount option Bug fixes: - avoid victim selection from previous victim section - fix to enable compress for newly created file if extension matches - set zstd compress level correctly - initialize locks early in f2fs_fill_super() to fix bugs reported by syzbot - correct i_size change for atomic writes - allow to read node block after shutdown - allow to set compression for inlined file - fix gc mode when gc_urgent_high_remaining is 1 - should put a page when checking the summary info Minor fixes and various clean-ups in GC, discard, debugfs, sysfs, and doc" * tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits) f2fs: reset wait_ms to default if any of the victims have been selected f2fs: fix some format WARNING in debug.c and sysfs.c f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super() f2fs: fix iostat parameter for discard f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache f2fs: add block_age-based extent cache f2fs: allocate the extent_cache by default f2fs: refactor extent_cache to support for read and more f2fs: remove unnecessary __init_extent_tree f2fs: move internal functions into extent_cache.c f2fs: specify extent cache for read explicitly f2fs: introduce f2fs_is_readonly() for readability f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro f2fs: do some cleanup for f2fs module init MAINTAINERS: Add f2fs bug tracker link f2fs: remove the unused flush argument to change_curseg f2fs: open code allocate_segment_by_default f2fs: remove struct segment_allocation default_salloc_ops f2fs: introduce discard_urgent_util sysfs node f2fs: define MIN_DISCARD_GRANULARITY macro ...