aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2023-02-02f2fs: clean up i_compress_flag and i_compress_level usageChao Yu3-12/+19
.i_compress_level was introduced by commit 3fde13f817e2 ("f2fs: compress: support compress level"), but never be used. This patch updates as below: - load high 8-bits of on-disk .i_compress_flag to in-memory .i_compress_level - load low 8-bits of on-disk .i_compress_flag to in-memory .i_compress_flag - change type of in-memory .i_compress_flag from unsigned short to unsigned char. w/ above changes, we can avoid unneeded bit shift whenever during .init_compress_ctx(), and shrink size of struct f2fs_inode_info. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-02f2fs: reduce stack memory cost by using bitfield in struct f2fs_io_infoChao Yu7-27/+28
This patch tries to use bitfield in struct f2fs_io_info to improve memory usage. struct f2fs_io_info { ... unsigned int need_lock:8; /* indicate we need to lock cp_rwsem */ unsigned int version:8; /* version of the node */ unsigned int submitted:1; /* indicate IO submission */ unsigned int in_list:1; /* indicate fio is in io_list */ unsigned int is_por:1; /* indicate IO is from recovery or not */ unsigned int retry:1; /* need to reallocate block address */ unsigned int encrypted:1; /* indicate file is encrypted */ unsigned int post_read:1; /* require post read */ ... }; After this patch, size of struct f2fs_io_info reduces from 136 to 120. [Nathan: fix a compile warning (single-bit-bitfield-constant-conversion)] Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-02f2fs: factor the read/write tracing logic into a helperChristoph Hellwig1-35/+26
Factor the logic to log a path for reads and writs into a helper shared between the read_iter and write_iter methods. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-02f2fs: remove __has_curseg_spaceChristoph Hellwig1-11/+8
Just open code the logic in the only caller, where it is more obvious. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-02f2fs: refactor next blk selectionChristoph Hellwig1-25/+23
Remove __refresh_next_blkoff by opencoding the SSR vs LFS segment check in the only caller, and then add helpers for SSR block selection and blkoff randomization instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-02f2fs: remove __allocate_new_sectionChristoph Hellwig1-7/+1
Just fold this trivial wrapper into the only caller. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-02f2fs: refactor __allocate_new_segmentChristoph Hellwig1-9/+5
Simplify the check whether to allocate a new segment or reuse an open one. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-02f2fs: add a f2fs_curseg_valid_blocks helperChristoph Hellwig2-23/+15
Add a helper to return the valid blocks on log and SSR segments, and replace the last two uses of curseg_blkoff with it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-02f2fs: simplify do_checkpointChristoph Hellwig1-12/+10
For each loop add a local curseg_info pointer insted of looking it up for each of the three fields. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-02-02jffs2: Fix list_del corruption if compressors initialized failedZhang Xiaoxu1-5/+28
There is a list_del corruption when remove the jffs2 module: list_del corruption, ffffffffa0623e60->next is NULL WARNING: CPU: 6 PID: 6332 at lib/list_debug.c:49 __list_del_entry_valid+0x98/0x130 Modules linked in: jffs2(-) ] CPU: 6 PID: 6332 Comm: rmmod Tainted: G W 6.1.0-rc2+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:__list_del_entry_valid+0x98/0x130 ... Call Trace: <TASK> jffs2_unregister_compressor+0x3e/0xe0 [jffs2] jffs2_zlib_exit+0x11/0x30 [jffs2] jffs2_compressors_exit+0x1e/0x30 [jffs2] exit_jffs2_fs+0x16/0x44f [jffs2] __do_sys_delete_module.constprop.0+0x244/0x370 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 If one of the compressor initialize failed, the module always insert success since jffs2_compressors_init() always return success, then something bad may happen during remove the module. For this scenario, let's insmod failed. Signed-off-by: Zhang Xiaoxu <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02jffs2: Use function instead of macro when initialize compressorsZhang Xiaoxu2-22/+21
The initialized compressors should be released if one of them initialize fail, this is the pre-patch for fix the problem, use function instead of the macro in jffs2_compressors_init() to simplify the codes, no functional change intended. Signed-off-by: Zhang Xiaoxu <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02jffs2: fix spelling mistake "neccecary"->"necessary"Yu Zhe1-1/+1
There is a spelling mistake in comment. Fix it. Signed-off-by: Yu Zhe <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Fix kernel-docYang Li1-3/+3
Fix function name in fs/ubifs/io.c kernel-doc comment to remove some warnings found by clang(make W=1 LLVM=1). fs/ubifs/io.c:497: warning: expecting prototype for wbuf_timer_callback(). Prototype was for wbuf_timer_callback_nolock() instead fs/ubifs/io.c:513: warning: expecting prototype for new_wbuf_timer(). Prototype was for new_wbuf_timer_nolock() instead fs/ubifs/io.c:538: warning: expecting prototype for cancel_wbuf_timer(). Prototype was for cancel_wbuf_timer_nolock() instead Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Fix some kernel-doc commentsYang Li1-2/+6
Remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/ubifs/journal.c:1221: warning: Function parameter or member 'old_inode' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Function parameter or member 'old_nm' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Function parameter or member 'new_inode' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Function parameter or member 'new_nm' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Function parameter or member 'whiteout' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Excess function parameter 'old_dentry' description in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Excess function parameter 'new_dentry' description in 'ubifs_jnl_rename' Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Reviewed-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this processZhihao Cheng1-5/+14
There are two states for ubifs writing pages: 1. Dirty, Private 2. Not Dirty, Not Private The normal process cannot go to ubifs_releasepage() which means there exists pages being private but not dirty. Reproducer[1] shows that it could occur (which maybe related to [2]) with following process: PA PB PC lock(page)[PA] ubifs_write_end attach_page_private // set Private __set_page_dirty_nobuffers // set Dirty unlock(page) write_cache_pages[PA] lock(page) clear_page_dirty_for_io(page) // clear Dirty ubifs_writepage do_truncation[PB] truncate_setsize i_size_write(inode, newsize) // newsize = 0 i_size = i_size_read(inode) // i_size = 0 end_index = i_size >> PAGE_SHIFT if (page->index > end_index) goto out // jump out: unlock(page) // Private, Not Dirty generic_fadvise[PC] lock(page) invalidate_inode_page try_to_release_page ubifs_releasepage ubifs_assert(c, 0) // bad assertion! unlock(page) truncate_pagecache[PB] Then we may get following assertion failed: UBIFS error (ubi0:0 pid 1683): ubifs_assert_failed [ubifs]: UBIFS assert failed: 0, in fs/ubifs/file.c:1513 UBIFS warning (ubi0:0 pid 1683): ubifs_ro_mode [ubifs]: switched to read-only mode, error -22 CPU: 2 PID: 1683 Comm: aa Not tainted 5.16.0-rc5-00184-g0bca5994cacc-dirty #308 Call Trace: dump_stack+0x13/0x1b ubifs_ro_mode+0x54/0x60 [ubifs] ubifs_assert_failed+0x4b/0x80 [ubifs] ubifs_releasepage+0x67/0x1d0 [ubifs] try_to_release_page+0x57/0xe0 invalidate_inode_page+0xfb/0x130 __invalidate_mapping_pages+0xb9/0x280 invalidate_mapping_pagevec+0x12/0x20 generic_fadvise+0x303/0x3c0 ksys_fadvise64_64+0x4c/0xb0 [1] https://bugzilla.kernel.org/show_bug.cgi?id=215373 [2] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: ubifs_writepage: Mark page dirty after writing inode failedZhihao Cheng1-3/+9
There are two states for ubifs writing pages: 1. Dirty, Private 2. Not Dirty, Not Private There is a third possibility which maybe related to [1] that page is private but not dirty caused by following process: PA lock(page) ubifs_write_end attach_page_private // set Private __set_page_dirty_nobuffers // set Dirty unlock(page) write_cache_pages lock(page) clear_page_dirty_for_io(page) // clear Dirty ubifs_writepage write_inode // fail, goto out, following codes are not executed // do_writepage // set_page_writeback // set Writeback // detach_page_private // clear Private // end_page_writeback // clear Writeback out: unlock(page) // Private, Not Dirty PB ksys_fadvise64_64 generic_fadvise invalidate_inode_page // page is neither Dirty nor Writeback invalidate_complete_page // page_has_private is true try_to_release_page ubifs_releasepage ubifs_assert(c, 0) !!! Then we may get following assertion failed: UBIFS error (ubi0:0 pid 1492): ubifs_assert_failed [ubifs]: UBIFS assert failed: 0, in fs/ubifs/file.c:1499 UBIFS warning (ubi0:0 pid 1492): ubifs_ro_mode [ubifs]: switched to read-only mode, error -22 CPU: 2 PID: 1492 Comm: aa Not tainted 5.16.0-rc2-00012-g7bb767dee0ba-dirty Call Trace: dump_stack+0x13/0x1b ubifs_ro_mode+0x54/0x60 [ubifs] ubifs_assert_failed+0x4b/0x80 [ubifs] ubifs_releasepage+0x7e/0x1e0 [ubifs] try_to_release_page+0x57/0xe0 invalidate_inode_page+0xfb/0x130 invalidate_mapping_pagevec+0x12/0x20 generic_fadvise+0x303/0x3c0 vfs_fadvise+0x35/0x40 ksys_fadvise64_64+0x4c/0xb0 Jump [2] to find a reproducer. [1] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty [2] https://bugzilla.kernel.org/show_bug.cgi?id=215357 Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: dirty_cow_znode: Fix memleak in error handling pathZhihao Cheng1-1/+8
Following process will cause a memleak for copied up znode: dirty_cow_znode zn = copy_znode(c, znode); err = insert_old_idx(c, zbr->lnum, zbr->offs); if (unlikely(err)) return ERR_PTR(err); // No one refers to zn. Fix it by adding copied znode back to tnc, then it will be freed by ubifs_destroy_tnc_subtree() while closing tnc. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216705 Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Re-statistic cleaned znode count if commit failedZhihao Cheng1-0/+15
Dirty znodes will be written on flash in committing process with following states: process A | znode state ------------------------------------------------------ do_commit | DIRTY_ZNODE ubifs_tnc_start_commit | DIRTY_ZNODE get_znodes_to_commit | DIRTY_ZNODE | COW_ZNODE layout_commit | DIRTY_ZNODE | COW_ZNODE fill_gap | 0 write master | 0 or OBSOLETE_ZNODE process B | znode state ------------------------------------------------------ do_commit | DIRTY_ZNODE[1] ubifs_tnc_start_commit | DIRTY_ZNODE get_znodes_to_commit | DIRTY_ZNODE | COW_ZNODE ubifs_tnc_end_commit | DIRTY_ZNODE | COW_ZNODE write_index | 0 write master | 0 or OBSOLETE_ZNODE[2] or | DIRTY_ZNODE[3] [1] znode is dirtied without concurrent committing process [2] znode is copied up (re-dirtied by other process) before cleaned up in committing process [3] znode is re-dirtied after cleaned up in committing process Currently, the clean znode count is updated in free_obsolete_znodes(), which is called only in normal path. If do_commit failed, clean znode count won't be updated, which triggers a failure ubifs assertion[4] in ubifs_tnc_close(): ubifs_assert_failed [ubifs]: UBIFS assert failed: freed == n [4] Commit 380347e9ca7682 ("UBIFS: Add an assertion for clean_zn_cnt"). Fix it by re-statisticing cleaned znode count in tnc_destroy_cnext(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216704 Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Fix memory leak in alloc_wbufs()Li Zetao1-4/+13
kmemleak reported a sequence of memory leaks, and show them as following: unreferenced object 0xffff8881575f8400 (size 1024): comm "mount", pid 19625, jiffies 4297119604 (age 20.383s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8176cecd>] __kmalloc+0x4d/0x150 [<ffffffffa0406b2b>] ubifs_mount+0x307b/0x7170 [ubifs] [<ffffffff819fa8fd>] legacy_get_tree+0xed/0x1d0 [<ffffffff81936f2d>] vfs_get_tree+0x7d/0x230 [<ffffffff819b2bd4>] path_mount+0xdd4/0x17b0 [<ffffffff819b37aa>] __x64_sys_mount+0x1fa/0x270 [<ffffffff83c14295>] do_syscall_64+0x35/0x80 [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 unreferenced object 0xffff8881798a6e00 (size 512): comm "mount", pid 19677, jiffies 4297121912 (age 37.816s) hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace: [<ffffffff8176cecd>] __kmalloc+0x4d/0x150 [<ffffffffa0418342>] ubifs_wbuf_init+0x52/0x480 [ubifs] [<ffffffffa0406ca5>] ubifs_mount+0x31f5/0x7170 [ubifs] [<ffffffff819fa8fd>] legacy_get_tree+0xed/0x1d0 [<ffffffff81936f2d>] vfs_get_tree+0x7d/0x230 [<ffffffff819b2bd4>] path_mount+0xdd4/0x17b0 [<ffffffff819b37aa>] __x64_sys_mount+0x1fa/0x270 [<ffffffff83c14295>] do_syscall_64+0x35/0x80 [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 The problem is that the ubifs_wbuf_init() returns an error in the loop which in the alloc_wbufs(), then the wbuf->buf and wbuf->inodes that were successfully alloced before are not freed. Fix it by adding error hanging path in alloc_wbufs() which frees the memory alloced before when ubifs_wbuf_init() returns an error. Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Li Zetao <[email protected]> Reviewed-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Reserve one leb for each journal head while doing budgetZhihao Cheng1-4/+3
UBIFS calculates available space by c->main_bytes - c->lst.total_used (which means non-index lebs' free and dirty space is accounted into total available), then index lebs and four lebs (one for gc_lnum, one for deletions, two for journal heads) are deducted. In following situation, ubifs may get -ENOSPC from make_reservation(): LEB 84: DATAHD free 122880 used 1920 dirty 2176 dark 6144 LEB 110:DELETION free 126976 used 0 dirty 0 dark 6144 (empty) LEB 201:gc_lnum free 126976 used 0 dirty 0 dark 6144 LEB 272:GCHD free 77824 used 47672 dirty 1480 dark 6144 LEB 356:BASEHD free 0 used 39776 dirty 87200 dark 6144 OTHERS: index lebs, zero-available non-index lebs UBIFS calculates the available bytes is 6888 (How to calculate it: 126976 * 5[remain main bytes] - 1920[used] - 47672[used] - 39776[used] - 126976 * 1[deletions] - 126976 * 1[gc_lnum] - 126976 * 2[journal heads] - 6144 * 5[dark] = 6888) after doing budget, however UBIFS cannot use BASEHD's dirty space(87200), because UBIFS cannot find next BASEHD to reclaim current BASEHD. (c->bi.min_idx_lebs equals to c->lst.idx_lebs, the empty leb won't be found by ubifs_find_free_space(), and dirty index lebs won't be picked as gced lebs. All non-index lebs has dirty space less then c->dead_wm, non-index lebs won't be picked as gced lebs either. So new free lebs won't be produced.). See more details in Link. To fix it, reserve one leb for each journal head while doing budget. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216562 Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: do_rename: Fix wrong space budget when target inode's nlink > 1Zhihao Cheng1-0/+2
If target inode is a special file (eg. block/char device) with nlink count greater than 1, the inode with ui->data will be re-written on disk. However, UBIFS losts target inode's data_len while doing space budget. Bad space budget may let make_reservation() return with -ENOSPC, which could turn ubifs to read-only mode in do_writepage() process. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216494 Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Fix wrong dirty space budget for dirty inodeZhihao Cheng1-1/+1
Each dirty inode should reserve 'c->bi.inode_budget' bytes in space budget calculation. Currently, space budget for dirty inode reports more space than what UBIFS actually needs to write. Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Add comments and debug info for ubifs_xrename()Zhihao Cheng1-0/+9
Just like other operations (eg. ubifs_create, do_rename), add comments and debug information for ubifs_xrename(). Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Rectify space budget for ubifs_xrename()Zhihao Cheng1-0/+5
There is no space budget for ubifs_xrename(). It may let make_reservation() return with -ENOSPC, which could turn ubifs to read-only mode in do_writepage() process. Fix it by adding space budget for ubifs_xrename(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216569 Fixes: 9ec64962afb170 ("ubifs: Implement RENAME_EXCHANGE") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Rectify space budget for ubifs_symlink() if symlink is encryptedZhihao Cheng1-1/+1
Fix bad space budget when symlink file is encrypted. Bad space budget may let make_reservation() return with -ENOSPC, which could turn ubifs to read-only mode in do_writepage() process. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216490 Fixes: ca7f85be8d6cf9 ("ubifs: Add support for encrypted symlinks") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Fix memory leak in ubifs_sysfs_init()Liu Shixin1-0/+2
When insmod ubifs.ko, a kmemleak reported as below: unreferenced object 0xffff88817fb1a780 (size 8): comm "insmod", pid 25265, jiffies 4295239702 (age 100.130s) hex dump (first 8 bytes): 75 62 69 66 73 00 ff ff ubifs... backtrace: [<ffffffff81b3fc4c>] slab_post_alloc_hook+0x9c/0x3c0 [<ffffffff81b44bf3>] __kmalloc_track_caller+0x183/0x410 [<ffffffff8198d3da>] kstrdup+0x3a/0x80 [<ffffffff8198d486>] kstrdup_const+0x66/0x80 [<ffffffff83989325>] kvasprintf_const+0x155/0x190 [<ffffffff83bf55bb>] kobject_set_name_vargs+0x5b/0x150 [<ffffffff83bf576b>] kobject_set_name+0xbb/0xf0 [<ffffffff8100204c>] do_one_initcall+0x14c/0x5a0 [<ffffffff8157e380>] do_init_module+0x1f0/0x660 [<ffffffff815857be>] load_module+0x6d7e/0x7590 [<ffffffff8158644f>] __do_sys_finit_module+0x19f/0x230 [<ffffffff815866b3>] __x64_sys_finit_module+0x73/0xb0 [<ffffffff88c98e85>] do_syscall_64+0x35/0x80 [<ffffffff88e00087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd When kset_register() failed, we should call kset_put to cleanup it. Fixes: 2e3cbf425804 ("ubifs: Export filesystem error counters") Signed-off-by: Liu Shixin <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ubifs: Fix build errors as symbol undefinedLi Hua1-0/+5
With CONFIG_UBIFS_FS_AUTHENTICATION not set, the compiler can assume that ubifs_node_check_hash() is never true and drops the call to ubifs_bad_hash(). Is CONFIG_CC_OPTIMIZE_FOR_SIZE enabled this optimization does not happen anymore. So When CONFIG_UBIFS_FS and CONFIG_CC_OPTIMIZE_FOR_SIZE is enabled but CONFIG_UBIFS_FS_AUTHENTICATION is not set, the build errors is as followd: ERROR: modpost: "ubifs_bad_hash" [fs/ubifs/ubifs.ko] undefined! Fix it by add no-op ubifs_bad_hash() for the CONFIG_UBIFS_FS_AUTHENTICATION=n case. Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes") Signed-off-by: Li Hua <[email protected]> Reviewed-by: Sascha Hauer <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02jffs2: correct logic when creating a hole in jffs2_write_beginYifei Liu1-8/+7
Bug description and fix: 1. Write data to a file, say all 1s from offset 0 to 16. 2. Truncate the file to a smaller size, say 8 bytes. 3. Write new bytes (say 2s) from an offset past the original size of the file, say at offset 20, for 4 bytes. This is supposed to create a "hole" in the file, meaning that the bytes from offset 8 (where it was truncated above) up to the new write at offset 20, should all be 0s (zeros). 4. Flush all caches using "echo 3 > /proc/sys/vm/drop_caches" (or unmount and remount) the f/s. 5. Check the content of the file. It is wrong. The 1s that used to be between bytes 9 and 16, before the truncation, have REAPPEARED (they should be 0s). We wrote a script and helper C program to reproduce the bug (reproduce_jffs2_write_begin_issue.sh, write_file.c, and Makefile). We can make them available to anyone. The above example is shown when writing a small file within the same first page. But the bug happens for larger files, as long as steps 1, 2, and 3 above all happen within the same page. The problem was traced to the jffs2_write_begin code, where it goes into an 'if' statement intended to handle writes past the current EOF (i.e., writes that may create a hole). The code computes a 'pageofs' that is the floor of the write position (pos), aligned to the page size boundary. In other words, 'pageofs' will never be larger than 'pos'. The code then sets the internal jffs2_raw_inode->isize to the size of max(current inode size, pageofs) but that is wrong: the new file size should be the 'pos', which is larger than both the current inode size and pageofs. Similarly, the code incorrectly sets the internal jffs2_raw_inode->dsize to the difference between the pageofs minus current inode size; instead it should be the current pos minus the current inode size. Finally, inode->i_size was also set incorrectly. The patch below fixes this bug. The bug was discovered using a new tool for finding f/s bugs using model checking, called MCFS (Model Checking File Systems). Signed-off-by: Yifei Liu <[email protected]> Signed-off-by: Erez Zadok <[email protected]> Signed-off-by: Manish Adkar <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-02-02ceph: blocklist the kclient when receiving corrupted snap traceXiubo Li6-10/+93
When received corrupted snap trace we don't know what exactly has happened in MDS side. And we shouldn't continue IOs and metadatas access to MDS, which may corrupt or get incorrect contents. This patch will just block all the further IO/MDS requests immediately and then evict the kclient itself. The reason why we still need to evict the kclient just after blocking all the further IOs is that the MDS could revoke the caps faster. Link: https://tracker.ceph.com/issues/57686 Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Venky Shankar <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-02-02ceph: move mount state enum to super.hXiubo Li1-0/+10
These flags are only used in ceph filesystem in fs/ceph, so just move it to the place it should be. Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Venky Shankar <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-02-01hostfs: Replace kmap() with kmap_local_page()Fabio M. De Francesco1-7/+8
The use of kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as the mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. Therefore, replace kmap() with kmap_local_page() in hostfs_kern.c, it being the only file with kmap() call sites currently left in fs/hostfs. Cc: "Venkataramanan, Anirudh" <[email protected]> Suggested-by: Ira Weiny <[email protected]> Signed-off-by: Fabio M. De Francesco <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2023-01-31Sync mm-stable with mm-hotfixes-stable to pick up dependent patchesAndrew Morton6-10/+8
Merge branch 'mm-hotfixes-stable' into mm-stable
2023-01-31Squashfs: fix handling and sanity checking of xattr_ids countPhillip Lougher4-5/+5
A Sysbot [1] corrupted filesystem exposes two flaws in the handling and sanity checking of the xattr_ids count in the filesystem. Both of these flaws cause computation overflow due to incorrect typing. In the corrupted filesystem the xattr_ids value is 4294967071, which stored in a signed variable becomes the negative number -225. Flaw 1 (64-bit systems only): The signed integer xattr_ids variable causes sign extension. This causes variable overflow in the SQUASHFS_XATTR_*(A) macros. The variable is first multiplied by sizeof(struct squashfs_xattr_id) where the type of the sizeof operator is "unsigned long". On a 64-bit system this is 64-bits in size, and causes the negative number to be sign extended and widened to 64-bits and then become unsigned. This produces the very large number 18446744073709548016 or 2^64 - 3600. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0 (stored in len). Flaw 2 (32-bit systems only): On a 32-bit system the integer variable is not widened by the unsigned long type of the sizeof operator (32-bits), and the signedness of the variable has no effect due it always being treated as unsigned. The above corrupted xattr_ids value of 4294967071, when multiplied overflows and produces the number 4294963696 or 2^32 - 3400. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows again and produces a length of 0. The effect of the 0 length computation: In conjunction with the corrupted xattr_ids field, the filesystem also has a corrupted xattr_table_start value, where it matches the end of filesystem value of 850. This causes the following sanity check code to fail because the incorrectly computed len of 0 matches the incorrect size of the table reported by the superblock (0 bytes). len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids); indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids); /* * The computed size of the index table (len bytes) should exactly * match the table start and end points */ start = table_start + sizeof(*id_table); end = msblk->bytes_used; if (len != (end - start)) return ERR_PTR(-EINVAL); Changing the xattr_ids variable to be "usigned int" fixes the flaw on a 64-bit system. This relies on the fact the computation is widened by the unsigned long type of the sizeof operator. Casting the variable to u64 in the above macro fixes this flaw on a 32-bit system. It also means 64-bit systems do not implicitly rely on the type of the sizeof operator to widen the computation. [1] https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup") Signed-off-by: Phillip Lougher <[email protected]> Reported-by: <[email protected]> Cc: Alexey Khoroshilov <[email protected]> Cc: Fedor Pchelkin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smapsMike Kravetz1-3/+1
Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs". This issue of mapcount in hugetlb pages referenced by shared PMDs was discussed in [1]. The following two patches address user visible behavior caused by this issue. [1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/ This patch (of 2): A hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. This is because only the first process increases the map count, and subsequent processes just add the shared PMD page to their page table. page_mapcount is being used to decide if a hugetlb page is shared or private in /proc/PID/smaps. Pages referenced via a shared PMD were incorrectly being counted as private. To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found count the hugetlb page as shared. A new helper to check for a shared PMD is added. [[email protected]: simplification, per David] [[email protected]: hugetlb.h: include page_ref.h for page_count()] Link: https://lkml.kernel.org/r/[email protected] Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps") Signed-off-by: Mike Kravetz <[email protected]> Acked-by: Peter Xu <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: James Houghton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31freevxfs: Kconfig: fix spellingRandy Dunlap1-1/+1
Fix a spello in freevxfs Kconfig. (reported by codespell) Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31squashfs: harden sanity check in squashfs_read_xattr_id_tableFedor Pchelkin1-1/+1
While mounting a corrupted filesystem, a signed integer '*xattr_ids' can become less than zero. This leads to the incorrect computation of 'len' and 'indexes' values which can cause null-ptr-deref in copy_bio_to_actor() or out-of-bounds accesses in the next sanity checks inside squashfs_read_xattr_id_table(). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Link: https://lkml.kernel.org/r/[email protected] Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup") Reported-by: <[email protected]> Signed-off-by: Fedor Pchelkin <[email protected]> Signed-off-by: Alexey Khoroshilov <[email protected]> Cc: Phillip Lougher <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31gfs2: Improve gfs2_make_fs_rw error handlingAndreas Gruenbacher1-2/+6
In gfs2_make_fs_rw(), make sure to call gfs2_consist() to report an inconsistency and mark the filesystem as withdrawn when gfs2_find_jhead() fails. At the end of gfs2_make_fs_rw(), when we discover that the filesystem has been withdrawn, make sure we report an error. This also replaces the gfs2_withdrawn() check after gfs2_find_jhead(). Reported-by: Tetsuo Handa <[email protected]> Cc: [email protected] Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31Revert "GFS2: free disk inode which is deleted by remote node -V2"Bob Peterson1-18/+0
This reverts commit 970343cd4904 ("GFS2: free disk inode which is deleted by remote node -V2"). The original intent behind commit 970343cd49 was to cull dentries when a remote node requests to demote an iopen glock, which happens when the remote node tries to delete the inode. This is now handled by gfs2_try_evict(), which is called via iopen_go_callback() -> gfs2_queue_try_to_evict(). Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Evict inodes cooperativelyAndreas Gruenbacher3-0/+59
Add a gfs2_evict_inodes() helper that evicts inodes cooperatively across the cluster. This avoids running into timeouts during unmount unnecessarily. Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Flush delete work before shrinking inode cacheAndreas Gruenbacher2-3/+12
In gfs2_kill_sb(), flush the delete work queue after setting the SDF_DEACTIVATING flag. This ensures that no new inodes will be instantiated anymore, and the inode cache will be empty after the following kill_block_super() -> generic_shutdown_super() -> evict_inodes() call. With that, function gfs2_make_fs_ro() now calls gfs2_flush_delete_work() after the workqueue has been destroyed. Skip that by checking for the presence of the SDF_DEACTIVATING flag. Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Cease delete work during unmountBob Peterson2-1/+5
Add a check to delete_work_func() so that it quits when it finds that the filesystem is deactivating. This speeds up the delete workqueue draining in gfs2_kill_sb(). In addition, make sure that iopen_go_callback() won't queue any new delete work while the filesystem is deactivating. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Add SDF_DEACTIVATING super block flagBob Peterson3-0/+4
Add a new SDF_DEACTIVATING super block flag that is set when the filesystem has started to deactivate. This will be used in the next patch to stop and drain the delete work during unmount. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: check gl_object in rgrp glopsBob Peterson1-3/+6
Function gfs2_clear_rgrpd() is called during unmount to free all rgrps and their sub-objects. If the rgrp glock is held (e.g. in SH) it calls gfs2_glock_cb() to unlock, then calls flush_delayed_work() to make sure any glock work is finished. However, there is a race with other cluster nodes who may request the rgrp glock in another mode (say, EX). Func gfs2_clear_rgrpd() calls glock_clear_object() which sets gl_object to NULL but that's done without holding the gl_lockref spin_lock. While the lock is not held Another node's demote request can cause the state machine to run again, and since the gl_lockref is released in do_xmote, the second process's call to do_xmote can call go_inval (rgrp_go_inval) after the gl_object has been cleared, which results in NULL pointer reference of the rgrp glock's gl_object. Other go_inval glops functions don't require the gl_object to exist, as evidenced by function inode_go_inval() which explicitly checks for if (ip) before referencing gl_object. This patch does the same thing for rgrp glocks. Both the go_inval and go_sync ops are patched to check the existence of gl_object (rgd) before trying to dereference it. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Split the two kinds of glock "delete" workAndreas Gruenbacher6-22/+45
Function delete_work_func() is used for two purposes: * to immediately try to evict the glock's inode, and * to verify after a little while that the inode has been deleted as expected, and didn't just get skipped. These two operations are not separated very well, so introduce two new glock flags to improved that. Split gfs2_queue_delete_work() into gfs2_queue_try_to_evict and gfs2_queue_verify_evict(). Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Move delete workqueue into super blockAndreas Gruenbacher5-16/+22
Move the global delete workqueue into struct gfs2_sbd so that we can flush / drain it without interfering with other filesystems. Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Get rid of GLF_PENDING_DELETE flagAndreas Gruenbacher4-34/+5
Get rid of the GLF_PENDING_DELETE glock flag introduced by commit a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work"). The only use of that flag is to prevent the iopen glock from being demoted (i.e., unlocked) while delete work is pending. It turns out that demoting the iopen glock while delete work is pending is perfectly fine; we only need to make sure that the glock isn't being freed while still in use. This is ensured by the previous patch because delete_work_func() owns a reference while the work is queued or running. With these changes, gfs2_queue_delete_work() no longer takes the glock spin lock, so we can use it in iopen_go_callback() instead of open-coding it there. Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Make glock lru list scanning saferAndreas Gruenbacher1-5/+9
In __gfs2_glock_put(), remove the glock from the lru list *after* dropping the glock lock. This prevents deadlocks against gfs2_scan_glock_lru(). In gfs2_scan_glock_lru(), make sure that the glock's reference count is zero before moving the glock to the dispose list. This skips glocks that are marked dead as well as glocks that are still in use. Additionally, switch to spin_trylock() as we already do in gfs2_dispose_glock_lru(); this alone would also be enough to prevent deadlocks against __gfs2_glock_put(). Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Clean up gfs2_scan_glock_lruAndreas Gruenbacher1-9/+4
Switch to list_for_each_entry_safe() and eliminate the "skipped" list in gfs2_scan_glock_lru(). At the same time, scan the requested number of items to scan, not one more than that number. Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31gfs2: Improve gfs2_upgrade_iopen_glock commentAndreas Gruenbacher1-8/+16
Improve the comment describing the inode and iopen glock interactions and the glock poking related to inode evict. Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-01-31f2fs: remove __add_sum_entryChristoph Hellwig1-21/+2
This function just assigns a summary entry. This can be done entirely typesafe with an open code struct assignment that relies on array indexing. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>