path: root/fs
Age | Commit message | Author | Files | Lines
2022-12-07 | orangefs: Fix kmemleak in orangefs_{kernel,client}_debug_init() | Zhang Xiaoxu | 1 | -23/+3
When the orangefs module is inserted and then removed, memory is leaked as shown below:

  unreferenced object 0xffff88816b0cc000 (size 2048):
    comm "insmod", pid 783, jiffies 4294813439 (age 65.512s)
    hex dump (first 32 bytes):
      6e 6f 6e 65 0a 00 00 00 00 00 00 00 00 00 00 00  none............
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<0000000031ab7788>] kmalloc_trace+0x27/0xa0
      [<000000005b405fee>] orangefs_debugfs_init.cold+0xaf/0x17f
      [<00000000e5a0085b>] 0xffffffffa02780f9
      [<000000004232d9f7>] do_one_initcall+0x87/0x2a0
      [<0000000054f22384>] do_init_module+0xdf/0x320
      [<000000003263bdea>] load_module+0x2f98/0x3330
      [<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
      [<00000000250ae02b>] do_syscall_64+0x35/0x80
      [<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Use the global variable as the buffer, rather than allocating it dynamically, to solve the problem.

Signed-off-by: Zhang Xiaoxu <[email protected]>
Signed-off-by: Mike Marshall <[email protected]>
2022-12-07 | orangefs: Fix kmemleak in orangefs_sysfs_init() | Zhang Xiaoxu | 1 | -8/+63
When the orangefs module is inserted and then removed, kobject memory is leaked as shown below:

  unreferenced object 0xffff88810f95af00 (size 64):
    comm "insmod", pid 783, jiffies 4294813439 (age 65.512s)
    hex dump (first 32 bytes):
      a0 83 af 01 81 88 ff ff 08 af 95 0f 81 88 ff ff  ................
      08 af 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<0000000031ab7788>] kmalloc_trace+0x27/0xa0
      [<000000005a6e4dfe>] orangefs_sysfs_init+0x42/0x3a0
      [<00000000722645ca>] 0xffffffffa02780fe
      [<000000004232d9f7>] do_one_initcall+0x87/0x2a0
      [<0000000054f22384>] do_init_module+0xdf/0x320
      [<000000003263bdea>] load_module+0x2f98/0x3330
      [<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
      [<00000000250ae02b>] do_syscall_64+0x35/0x80
      [<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

  unreferenced object 0xffff88810f95ae80 (size 64):
    comm "insmod", pid 783, jiffies 4294813439 (age 65.512s)
    hex dump (first 32 bytes):
      c8 90 0f 02 81 88 ff ff 88 ae 95 0f 81 88 ff ff  ................
      88 ae 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<0000000031ab7788>] kmalloc_trace+0x27/0xa0
      [<000000001a4841fa>] orangefs_sysfs_init+0xc7/0x3a0
      [<00000000722645ca>] 0xffffffffa02780fe
      [<000000004232d9f7>] do_one_initcall+0x87/0x2a0
      [<0000000054f22384>] do_init_module+0xdf/0x320
      [<000000003263bdea>] load_module+0x2f98/0x3330
      [<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
      [<00000000250ae02b>] do_syscall_64+0x35/0x80
      [<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

  unreferenced object 0xffff88810f95ae00 (size 64):
    comm "insmod", pid 783, jiffies 4294813440 (age 65.511s)
    hex dump (first 32 bytes):
      60 87 a1 00 81 88 ff ff 08 ae 95 0f 81 88 ff ff  `...............
      08 ae 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<0000000031ab7788>] kmalloc_trace+0x27/0xa0
      [<000000005915e797>] orangefs_sysfs_init+0x12b/0x3a0
      [<00000000722645ca>] 0xffffffffa02780fe
      [<000000004232d9f7>] do_one_initcall+0x87/0x2a0
      [<0000000054f22384>] do_init_module+0xdf/0x320
      [<000000003263bdea>] load_module+0x2f98/0x3330
      [<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
      [<00000000250ae02b>] do_syscall_64+0x35/0x80
      [<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

  unreferenced object 0xffff88810f95ad80 (size 64):
    comm "insmod", pid 783, jiffies 4294813440 (age 65.511s)
    hex dump (first 32 bytes):
      78 90 0f 02 81 88 ff ff 88 ad 95 0f 81 88 ff ff  x...............
      88 ad 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<0000000031ab7788>] kmalloc_trace+0x27/0xa0
      [<000000007a14eb35>] orangefs_sysfs_init+0x1ac/0x3a0
      [<00000000722645ca>] 0xffffffffa02780fe
      [<000000004232d9f7>] do_one_initcall+0x87/0x2a0
      [<0000000054f22384>] do_init_module+0xdf/0x320
      [<000000003263bdea>] load_module+0x2f98/0x3330
      [<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
      [<00000000250ae02b>] do_syscall_64+0x35/0x80
      [<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

  unreferenced object 0xffff88810f95ac00 (size 64):
    comm "insmod", pid 783, jiffies 4294813440 (age 65.531s)
    hex dump (first 32 bytes):
      e0 ff 67 02 81 88 ff ff 08 ac 95 0f 81 88 ff ff  ..g.............
      08 ac 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<0000000031ab7788>] kmalloc_trace+0x27/0xa0
      [<000000001f38adcb>] orangefs_sysfs_init+0x291/0x3a0
      [<00000000722645ca>] 0xffffffffa02780fe
      [<000000004232d9f7>] do_one_initcall+0x87/0x2a0
      [<0000000054f22384>] do_init_module+0xdf/0x320
      [<000000003263bdea>] load_module+0x2f98/0x3330
      [<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
      [<00000000250ae02b>] do_syscall_64+0x35/0x80
      [<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

  unreferenced object 0xffff88810f95ab80 (size 64):
    comm "insmod", pid 783, jiffies 4294813441 (age 65.530s)
    hex dump (first 32 bytes):
      50 bf 2f 02 81 88 ff ff 88 ab 95 0f 81 88 ff ff  P./.............
      88 ab 95 0f 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<0000000031ab7788>] kmalloc_trace+0x27/0xa0
      [<000000009cc7d95b>] orangefs_sysfs_init+0x2f5/0x3a0
      [<00000000722645ca>] 0xffffffffa02780fe
      [<000000004232d9f7>] do_one_initcall+0x87/0x2a0
      [<0000000054f22384>] do_init_module+0xdf/0x320
      [<000000003263bdea>] load_module+0x2f98/0x3330
      [<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0
      [<00000000250ae02b>] do_syscall_64+0x35/0x80
      [<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Add a release function for each kobject_type to free the memory.

Signed-off-by: Zhang Xiaoxu <[email protected]>
Signed-off-by: Mike Marshall <[email protected]>
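The idea of giving each object type a release callback, so that dropping the last reference frees the memory, can be illustrated in plain userspace C. This is only a sketch: `struct obj`, `struct obj_type`, `obj_create()`, `obj_put()` and `live_objects` are hypothetical stand-ins and not the kernel's kobject API.

```c
#include <assert.h>
#include <stdlib.h>

struct obj;

/* Hypothetical stand-in for a kobject type: it owns the release hook. */
struct obj_type {
    void (*release)(struct obj *o); /* called when the last reference drops */
};

/* Hypothetical stand-in for a kobject: a refcount plus its type. */
struct obj {
    int refcount;
    const struct obj_type *type;
};

static int live_objects; /* allocations that have not been freed yet */

/* Release hook: without it, the allocation would never be freed. */
static void obj_release(struct obj *o)
{
    free(o);
    live_objects--;
}

static const struct obj_type typed = { .release = obj_release };

static struct obj *obj_create(const struct obj_type *type)
{
    struct obj *o = calloc(1, sizeof(*o));

    if (!o)
        return NULL;
    o->refcount = 1;
    o->type = type;
    live_objects++;
    return o;
}

/* Drop one reference; the type's release hook frees the object at zero. */
static void obj_put(struct obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0 && o->type->release)
        o->type->release(o);
}
```

With a type whose `release` pointer is NULL, `obj_put()` would drop the count to zero and leave the allocation behind, which is the shape of leak that kmemleak reports above.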
2022-12-07 | orangefs: Fix kmemleak in orangefs_prepare_debugfs_help_string() | Zhang Xiaoxu | 1 | -0/+3
When the orangefs module is inserted and then removed, debug_help_string is leaked:

  unreferenced object 0xffff8881652ba000 (size 4096):
    comm "insmod", pid 1701, jiffies 4294893639 (age 13218.530s)
    hex dump (first 32 bytes):
      43 6c 69 65 6e 74 20 44 65 62 75 67 20 4b 65 79  Client Debug Key
      77 6f 72 64 73 20 61 72 65 20 75 6e 6b 6e 6f 77  words are unknow
    backtrace:
      [<0000000004e6f8e3>] kmalloc_trace+0x27/0xa0
      [<0000000006f75d85>] orangefs_prepare_debugfs_help_string+0x5e/0x480 [orangefs]
      [<0000000091270a2a>] _sub_I_65535_1+0x57/0xf70 [crc_itu_t]
      [<000000004b1ee1a3>] do_one_initcall+0x87/0x2a0
      [<000000001d0614ae>] do_init_module+0xdf/0x320
      [<00000000efef068c>] load_module+0x2f98/0x3330
      [<000000006533b44d>] __do_sys_finit_module+0x113/0x1b0
      [<00000000a0da6f99>] do_syscall_64+0x35/0x80
      [<000000007790b19b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Always free debug_help_string when the module is removed, and always free the previously allocated buffer when debug_help_string is changed.

Signed-off-by: Zhang Xiaoxu <[email protected]>
Signed-off-by: Mike Marshall <[email protected]>
2022-12-07 | orangefs: Fix sysfs not cleanup when dev init failed | Zhang Xiaoxu | 1 | -4/+4
When the dev init fails, sysfs should be cleaned up; otherwise the module can never be loaded again, because the duplicate sysfs directory cannot be created:

  sysfs: cannot create duplicate filename '/fs/orangefs'
  CPU: 1 PID: 6549 Comm: insmod Tainted: G W 6.0.0+ #44
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   sysfs_warn_dup.cold+0x17/0x24
   sysfs_create_dir_ns+0x16d/0x180
   kobject_add_internal+0x156/0x3a0
   kobject_init_and_add+0xcf/0x120
   orangefs_sysfs_init+0x7e/0x3a0 [orangefs]
   orangefs_init+0xfe/0x1000 [orangefs]
   do_one_initcall+0x87/0x2a0
   do_init_module+0xdf/0x320
   load_module+0x2f98/0x3330
   __do_sys_finit_module+0x113/0x1b0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0
  kobject_add_internal failed for orangefs with -EEXIST, don't try to register things with the same name in the same directory.

Fixes: 2f83ace37181 ("orangefs: put register_chrdev immediately before register_filesystem")
Signed-off-by: Zhang Xiaoxu <[email protected]>
Signed-off-by: Mike Marshall <[email protected]>
2022-12-07 | orangefs: remove redundant assignment to variable buffer_index | Colin Ian King | 1 | -1/+0
The variable buffer_index is assigned a value that is never read; it is assigned just before the function returns. The assignment is redundant and can be removed. Cleans up clang scan build warning: fs/orangefs/file.c:276:3: warning: Value stored to 'buffer_index' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Mike Marshall <[email protected]>
2022-12-07 | orangefs: remove variable i | Colin Ian King | 1 | -2/+0
Variable i is just being incremented and it's never used anywhere else. The variable and the increment are redundant so remove it. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Mike Marshall <[email protected]>
2022-12-07 | fscache: Fix oops due to race with cookie_lru and use_cookie | Dave Wysochanski | 1 | -0/+8
If a cookie expires from the LRU and the LRU_DISCARD flag is set, but the state machine has not run yet, another thread can call fscache_use_cookie and begin to use it. When the cookie_worker finally runs, it sees the LRU_DISCARD flag set and transitions cookie->state to LRU_DISCARDING, which then withdraws the cookie. Once the cookie is withdrawn, the object is removed and the oops below occurs, because the object associated with the cookie is now NULL.

Fix the oops by clearing the LRU_DISCARD bit if another thread uses the cookie before the cookie_worker runs.

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  ...
  CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G E 6.0.0-5.dneg.x86_64 #1
  Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
  Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
  RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
  ...
  Call Trace:
   netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
   process_one_work+0x217/0x3e0
   worker_thread+0x4a/0x3b0
   kthread+0xd6/0x100

Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Reported-by: Daire Byrne <[email protected]>
Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: David Howells <[email protected]>
Tested-by: Daire Byrne <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2
Signed-off-by: Linus Torvalds <[email protected]>
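The ordering problem described in this message can be modelled in a few lines of single-threaded userspace C. This is a sketch under stated assumptions: `struct cookie`, `COOKIE_DO_LRU_DISCARD` and the three functions are hypothetical names, and the real fscache code uses atomic bit operations and a state machine rather than plain fields.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for the LRU_DISCARD flag. */
enum { COOKIE_DO_LRU_DISCARD = 1u << 0 };

/* Hypothetical, much simplified cookie. */
struct cookie {
    unsigned int flags;
    int n_active;   /* number of current users */
    bool withdrawn; /* set once the worker has withdrawn the cookie */
};

/* LRU expiry only marks the cookie; the worker acts on the mark later. */
static void cookie_lru_expire(struct cookie *c)
{
    c->flags |= COOKIE_DO_LRU_DISCARD;
}

/* A new user arrives before the worker ran: cancel the pending discard. */
static void cookie_use(struct cookie *c)
{
    c->n_active++;
    c->flags &= ~COOKIE_DO_LRU_DISCARD;
}

/* Deferred worker: withdraw the cookie only if the discard is still pending. */
static void cookie_worker(struct cookie *c)
{
    if (c->flags & COOKIE_DO_LRU_DISCARD) {
        c->flags &= ~COOKIE_DO_LRU_DISCARD;
        c->withdrawn = true;
    }
}
```

If `cookie_use()` did not clear the flag, the worker would withdraw a cookie that has an active user, which corresponds to the NULL object dereference in the trace.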
2022-12-07 | Merge tag 'v6.1-rc8' into efi/next | Ard Biesheuvel | 41 | -208/+411
Linux 6.1-rc8
2022-12-07 | erofs: validate the extent length for uncompressed pclusters | Gao Xiang | 1 | -0/+5
syzkaller reported a KASAN use-after-free:
https://syzkaller.appspot.com/bug?extid=2ae90e873e97f1faf6f2

The referenced fuzzed image actually has two issues:
 - m_pa == 0 as a non-inlined pcluster;
 - The logical length is longer than its physical length.

The first issue has already been addressed. This patch addresses the second issue by checking the extent length validity.

Reported-by: [email protected]
Fixes: 02827e1796b3 ("staging: erofs: add erofs_map_blocks_iter")
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
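A length check of the kind described for the second issue might look roughly like the following userspace sketch. The `struct extent` layout, the `SKETCH_EFSCORRUPTED` value and `extent_validate()` are assumptions made for illustration; they are not taken from the erofs sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical error code for a corrupted on-disk structure. */
#define SKETCH_EFSCORRUPTED 117

/* Hypothetical, simplified extent descriptor. */
struct extent {
    uint64_t llen;   /* logical (decoded) length in bytes */
    uint64_t plen;   /* physical (on-disk) length in bytes */
    bool compressed; /* false for uncompressed (plain) data */
};

/*
 * Uncompressed data cannot expand, so a logical length larger than the
 * physical length can only come from a corrupted or fuzzed image.
 */
static int extent_validate(const struct extent *e)
{
    if (!e->compressed && e->llen > e->plen)
        return -SKETCH_EFSCORRUPTED;
    return 0;
}
```

Rejecting such an extent up front keeps later copy loops from reading past the end of the physical data.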
2022-12-07 | erofs: fix missing unmap if z_erofs_get_extent_compressedlen() fails | Gao Xiang | 1 | -4/+2
Otherwise, meta buffers could be leaked. Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes") Reviewed-by: Yue Hu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-07 | erofs: Fix pcluster memleak when its block address is zero | Chen Zhongjin | 1 | -1/+2
syzkaller reported a memleak:
https://syzkaller.appspot.com/bug?id=62f37ff612f0021641eda5b17f056f1668aa9aed

  unreferenced object 0xffff88811009c7f8 (size 136):
    ...
    backtrace:
      [<ffffffff821db19b>] z_erofs_do_read_page+0x99b/0x1740
      [<ffffffff821dee9e>] z_erofs_readahead+0x24e/0x580
      [<ffffffff814bc0d6>] read_pages+0x86/0x3d0
      ...

syzkaller constructed a case: in z_erofs_register_pcluster(), ztailpacking = false and map->m_pa = zero. This makes pcl->obj.index zero although pcl is not an inline pcluster.

The following path then adds a refcount for grp, but the refcount is never put because pcl is considered inline:

  z_erofs_readahead()
    z_erofs_do_read_page() # for another page
      z_erofs_collector_begin()
        erofs_find_workgroup()
          erofs_workgroup_get()

Since it's illegal for the block address of a non-inlined pcluster to be zero, add a check here to avoid registering a pcluster which would be leaked.

Fixes: cecf864d3d76 ("erofs: support inline data decompression")
Reported-by: [email protected]
Signed-off-by: Chen Zhongjin <[email protected]>
Reviewed-by: Yue Hu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Link: https://lore.kernel.org/r/Y42Kz6sVkf+XqJRB@debian
Signed-off-by: Gao Xiang <[email protected]>
2022-12-07 | erofs: use kmap_local_page() only for erofs_bread() | Gao Xiang | 5 | -14/+10
Convert all mapped erofs_bread() users to use kmap_local_page() instead of kmap() or kmap_atomic(). Signed-off-by: Gao Xiang <[email protected]> Reviewed-and-tested-by: Jingbo Xu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2022-12-07 | erofs: enable large folios for fscache mode | Jingbo Xu | 1 | -2/+1
Enable large folios for fscache mode. Enable this feature for non-compressed format for now, until the compression part supports large folios later. One thing worth noting is that, the feature is not enabled for the meta data routine since meta inodes don't need large folios for now, nor do they support readahead yet. Also document this new feature. Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Jia Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2022-12-07 | erofs: support large folios for fscache mode | Jingbo Xu | 1 | -68/+80
When large folios are supported, one folio can be split into several slices, each of which may be mapped to META/UNMAPPED/MAPPED, and the folio can be unlocked as a whole only when all slices have completed. Thus always allocate erofs_fscache_request for each .read_folio() or .readahead(), in which case the allocated request is responsible for unlocking folios when all slices have completed. As described above, each folio or folio range can be mapped into several slices, while these slices may be mapped to different cookies, and thus each slice needs its own netfs_cache_resources. Here we introduce chained requests to support this, where each .read_folio() or .readahead() call can correspond to multiple requests. Each request has its own netfs_cache_resources and thus is used to access one cookie. Among these requests, there's a primary request, with the others pointing to the primary request. Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Jia Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2022-12-07 | erofs: switch to prepare_ondemand_read() in fscache mode | Jingbo Xu | 1 | -167/+94
Switch to prepare_ondemand_read() interface and a self-contained request completion to get rid of netfs_io_[request|subrequest]. The whole request will still be split into slices (subrequest) according to the cache state of the backing file. As long as one of the subrequests fails, the whole request will be marked as failed. Reviewed-by: Gao Xiang <[email protected]> Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Jia Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2022-12-07 | fscache,cachefiles: add prepare_ondemand_read() callback | Jingbo Xu | 1 | -27/+50
Add prepare_ondemand_read() callback dedicated for the on-demand read scenario, so that callers from this scenario can be decoupled from netfs_io_subrequest. The original cachefiles_prepare_read() is now refactored to a generic routine accepting a parameter list instead of netfs_io_subrequest. There's no logic change, except that the debug id of subrequest and request is removed from trace_cachefiles_prep_read(). Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Jingbo Xu <[email protected]> Acked-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2022-12-07 | erofs: clean up cached I/O strategies | Gao Xiang | 1 | -46/+31
After commit 4c7e42552b3a ("erofs: remove useless cache strategy of DELAYEDALLOC"), only one cached I/O allocation strategy is supported: When cached I/O is preferred, page allocation is applied without direct reclaim. If allocation fails, fall back to inplace I/O. Let's get rid of z_erofs_cache_alloctype. No logical changes. Reviewed-by: Yue Hu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Yue Hu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-07 | erofs: check the uniqueness of fsid in shared domain in advance | Hou Tao | 3 | -15/+44
When shared domain is enabled, mounting twice with the same fsid and domain_id triggers a sysfs warning as shown below:

  sysfs: cannot create duplicate filename '/fs/erofs/d0,meta.bin'
  CPU: 15 PID: 1051 Comm: mount Not tainted 6.1.0-rc6+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Call Trace:
   <TASK>
   dump_stack_lvl+0x38/0x49
   dump_stack+0x10/0x12
   sysfs_warn_dup.cold+0x17/0x27
   sysfs_create_dir_ns+0xb8/0xd0
   kobject_add_internal+0xb1/0x240
   kobject_init_and_add+0x71/0xa0
   erofs_register_sysfs+0x89/0x110
   erofs_fc_fill_super+0x98c/0xaf0
   vfs_get_super+0x7d/0x100
   get_tree_nodev+0x16/0x20
   erofs_fc_get_tree+0x20/0x30
   vfs_get_tree+0x24/0xb0
   path_mount+0x2fa/0xa90
   do_mount+0x7c/0xa0
   __x64_sys_mount+0x8b/0xe0
   do_syscall_64+0x30/0x60
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

The reason is that erofs_fscache_register_cookie() doesn't guarantee that the primary data blob (aka fsid) is unique in the shared domain, so erofs_register_sysfs() invoked by the second mount fails due to the duplicated fsid in the shared domain and reports the warning.

It is better to check the uniqueness of fsid before calling erofs_register_sysfs(), so add a new flags parameter to erofs_fscache_register_cookie() and do the uniqueness check if EROFS_REG_COOKIE_NEED_NOEXIST is enabled.

After the patch, the error in dmesg for the duplicated mount would be:

  erofs: ...: erofs_domain_register_cookie: XX already exists in domain YY

Reviewed-by: Jia Zhu <[email protected]>
Reviewed-by: Jingbo Xu <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 7d41963759fe ("erofs: Support sharing cookies in the same domain")
Signed-off-by: Gao Xiang <[email protected]>
2022-12-07 | erofs: enable large folios for iomap mode | Jingbo Xu | 2 | -0/+4
Enable large folios for iomap mode. Then the readahead routine will pass down large folios containing multiple pages. Let's enable this for non-compressed format for now, until the compression part supports large folios later. When large folios are supported, the iomap routine will allocate iomap_page for each large folio and thus we need iomap_release_folio() and iomap_invalidate_folio() to free iomap_page when these folios get reclaimed or invalidated. Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2022-12-06 | NFSv4.x: Fail client initialisation if state manager thread can't run | Trond Myklebust | 1 | -0/+2
If the state manager thread fails to start, then we should just mark the client initialisation as failed so that other processes or threads don't get stuck in nfs_wait_client_init_complete(). Reported-by: ChenXiaoSong <[email protected]> Fixes: 4697bd5e9419 ("NFSv4: Fix a race in the net namespace mount notification") Signed-off-by: Trond Myklebust <[email protected]>
2022-12-06 | fs: nfs: sysfs: use sysfs_emit() to instead of scnprintf() | ye xingchen | 1 | -1/+1
Follow the advice of Documentation/filesystems/sysfs.rst: show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-12-06 | NFS: use sysfs_emit() to instead of scnprintf() | ye xingchen | 1 | -1/+1
Follow the advice of Documentation/filesystems/sysfs.rst: show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-12-06 | NFS: Allow very small rsize & wsize again | Anna Schumaker | 1 | -4/+2
940261a19508 introduced nfs_io_size() to clamp the iosize to a multiple of PAGE_SIZE. This had the unintended side effect of no longer allowing iosizes less than a page, which could be useful in some situations. UDP already has an exception that causes it to fall back on the power-of-two style sizes instead. This patch adds an additional exception for very small iosizes. Reported-by: Jeff Layton <[email protected]> Fixes: 940261a19508 ("NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE") Signed-off-by: Anna Schumaker <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
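The clamping behaviour described here, page-multiple sizes in general with a power-of-two fallback for sizes below one page, can be sketched in userspace C as follows. The constants, `round_down_pow2()` and `io_size()` are assumptions for illustration only; the kernel's nfs_io_size() and its limits may differ in detail.

```c
/* Hypothetical limits; the kernel's actual values may differ. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MIN_IO    1024UL
#define SKETCH_MAX_IO    1048576UL

/* Round down to a power of two by clearing low set bits until one is left. */
static unsigned long round_down_pow2(unsigned long x)
{
    while (x & (x - 1))
        x &= x - 1;
    return x;
}

/*
 * Clamp a requested I/O size:
 *  - below one page: fall back to a power-of-two size (the added exception)
 *  - otherwise: round down to a multiple of the page size
 */
static unsigned long io_size(unsigned long iosize)
{
    if (iosize < SKETCH_MIN_IO)
        iosize = SKETCH_MIN_IO;
    if (iosize > SKETCH_MAX_IO)
        iosize = SKETCH_MAX_IO;
    if (iosize < SKETCH_PAGE_SIZE)
        return round_down_pow2(iosize);
    return iosize & ~(SKETCH_PAGE_SIZE - 1);
}
```

Without the sub-page branch, a request such as 2048 would be masked down to zero by the page-multiple rounding, which is why small sizes need their own path.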
2022-12-06 | NFSv4.2: Fix up READ_PLUS alignment | Anna Schumaker | 1 | -3/+4
Assume that the first segment will be a DATA segment, and place the data directly into the xdr pages so it doesn't need to be shifted. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-12-06 | NFSv4.2: Set the correct size scratch buffer for decoding READ_PLUS | Anna Schumaker | 1 | -1/+1
The scratch_buf array is 16 bytes, but I was passing 32 to the xdr_set_scratch_buffer() function. Fix this by using sizeof(), which is what I probably should have been doing this whole time. Fixes: d3b00a802c84 ("NFS: Replace the READ_PLUS decoding code") Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-12-06 | NFS: avoid spurious warning of lost lock that is being unlocked. | NeilBrown | 3 | -2/+5
When the NFSv4 state manager recovers state after a server restart, it reports that locks have been lost if it finds any lock state for which recovery hasn't been successful. i.e. any for which NFS_LOCK_INITIALIZED is not set. However it only tries to recover locks that are still linked to inode->i_flctx. So if a lock has been removed from inode->i_flctx, but the state for that lock has not yet been destroyed, then a spurious warning results. nfs4_proc_unlck() calls locks_lock_inode_wait() - which removes the lock from ->i_flctx - before sending the unlock request to the server and before the final nfs4_put_lock_state() is called. This allows a window in which a spurious warning can be produced. So add a new flag NFS_LOCK_UNLOCKING which is set once the decision has been made to unlock the lock. This will prevent it from triggering any warning. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-12-06 | nfs: fix possible null-ptr-deref when parsing param | Hawkins Jiawei | 1 | -0/+6
According to commit "vfs: parse: deal with zero length string value", the kernel sets param->string to a null pointer in vfs_parse_fs_string() if the fs string has zero length. The problem is that nfs_fs_context_parse_param() dereferences param->string without checking whether it is a null pointer, which may trigger a null-ptr-deref bug. This patch solves it by adding a sanity check on param->string in nfs_fs_context_parse_param(). Signed-off-by: Hawkins Jiawei <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
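The sanity check described above amounts to refusing a parameter whose string value is absent before touching it. A minimal userspace sketch, in which `struct param`, `SKETCH_EINVAL` and `parse_param()` are hypothetical names and not the NFS fs_context code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error code standing in for an "invalid value" result. */
#define SKETCH_EINVAL 22

/* Hypothetical, simplified mount parameter. */
struct param {
    const char *key;
    const char *string; /* NULL when the option was given with no value */
};

/* Reject a missing value up front instead of dereferencing NULL later. */
static int parse_param(const struct param *p, size_t *out_len)
{
    if (p->string == NULL)
        return -SKETCH_EINVAL;
    *out_len = strlen(p->string);
    return 0;
}
```

The early return keeps every later use of `p->string` in the function safe without repeating the check.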
2022-12-06 | NFSv4: check FMODE_EXEC from open context mode in nfs4_opendata_access() | ChenXiaoSong | 1 | -11/+5
After converting the file's f_flags to the open context mode with flags_to_mode(), the open context mode has FMODE_EXEC set when the file is opened for exec, so check FMODE_EXEC in the open context mode. No functional change; this just simplifies the code. Signed-off-by: ChenXiaoSong <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-12-06 | NFS: make sure open context mode have FMODE_EXEC when file open for exec | ChenXiaoSong | 2 | -9/+6
Because the file's f_mode never has FMODE_EXEC set, the open context mode cannot get FMODE_EXEC from f_mode. The open context mode only cares about FMODE_READ/FMODE_WRITE/FMODE_EXEC, and all of this information can be derived from the file's f_flags, so convert f_flags to the open context mode with flags_to_mode(). Signed-off-by: ChenXiaoSong <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
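Deriving a read/write/exec mode purely from open flags can be sketched as below. All `SK_*` constants are hypothetical values chosen for the example (in particular `SK_O_EXEC` stands in for a kernel-internal "opened for exec" flag), and `sketch_flags_to_mode()` is an illustration, not the kernel's flags_to_mode().

```c
/* Hypothetical access-mode encoding in the low two flag bits. */
#define SK_O_RDONLY  0x0u
#define SK_O_WRONLY  0x1u
#define SK_O_RDWR    0x2u
#define SK_O_ACCMODE 0x3u
/* Hypothetical flag marking a file opened for execution. */
#define SK_O_EXEC    0x20u

/* Hypothetical mode bits tracked by the open context. */
#define SK_FMODE_READ  0x1u
#define SK_FMODE_WRITE 0x2u
#define SK_FMODE_EXEC  0x20u

/* Derive the open context mode from the open flags alone. */
static unsigned int sketch_flags_to_mode(unsigned int flags)
{
    unsigned int mode = 0;

    if (flags & SK_O_EXEC)
        mode |= SK_FMODE_EXEC;
    if ((flags & SK_O_ACCMODE) != SK_O_WRONLY)
        mode |= SK_FMODE_READ;  /* read-only or read-write */
    if ((flags & SK_O_ACCMODE) != SK_O_RDONLY)
        mode |= SK_FMODE_WRITE; /* write-only or read-write */
    return mode;
}
```

Because the exec bit travels in the flags, a caller that later needs to know whether the file was opened for exec can test the resulting mode instead of re-inspecting the flags.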
2022-12-06 | gfs2: Partially revert gfs2_inode_lookup change | Andreas Gruenbacher | 1 | -0/+2
Commit c412a97cf6c5 changed delete_work_func() to always perform an inode lookup when gfs2_try_evict() fails. This doesn't make sense as a gfs2_try_evict() failure indicates that the inode is likely still in use. Revert that change. Fixes: c412a97cf6c5 ("gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes") Signed-off-by: Andreas Gruenbacher <[email protected]>
2022-12-06 | gfs2: Add gfs2_inode_lookup comment | Andreas Gruenbacher | 1 | -0/+5
Add comment on when and why gfs2_cancel_delete_work() needs to be skipped in gfs2_inode_lookup(). Signed-off-by: Andreas Gruenbacher <[email protected]>
2022-12-06 | gfs2: Uninline and improve glock_{set,clear}_object | Andreas Gruenbacher | 2 | -26/+45
Those functions have reached a size at which having them inline isn't useful anymore, so uninline them. In addition, report the glock name on assertion failures. Signed-off-by: Andreas Gruenbacher <[email protected]>
2022-12-06 | gfs2: Simply dequeue iopen glock in gfs2_evict_inode | Andreas Gruenbacher | 1 | -5/+2
With the previous change, to simplify things, we can always just dequeue and uninitialize the iopen glock in gfs2_evict_inode() even if it isn't queued anymore. Signed-off-by: Andreas Gruenbacher <[email protected]>
2022-12-06 | gfs2: Clean up after gfs2_create_inode rework | Andreas Gruenbacher | 2 | -21/+14
Since commit 3d36e57ff768 ("gfs2: gfs2_create_inode rework"), gfs2_evict_inode() and gfs2_create_inode() / gfs2_inode_lookup() will synchronize via the inode hash table and we can be certain that once a new inode is inserted into the inode hash table(), gfs2_evict_inode() has completely destroyed any previous versions. We no longer need to worry about overlapping inode object lifespans. Update the code and comments accordingly. Signed-off-by: Andreas Gruenbacher <[email protected]>
2022-12-06 | gfs2: Avoid dequeuing GL_ASYNC glock holders twice | Andreas Gruenbacher | 1 | -0/+8
When a locking request fails, the associated glock holder is automatically dequeued from the list of active and waiting holders. For GL_ASYNC locking requests, this will obviously happen asynchronously and it can race with attempts to cancel that locking request via gfs2_glock_dq(). Therefore, don't forget to check if a locking request has already been dequeued in gfs2_glock_dq(). Signed-off-by: Andreas Gruenbacher <[email protected]>
2022-12-06 | gfs2: Make gfs2_glock_hold return its glock argument | Andreas Gruenbacher | 3 | -6/+5
This allows code like 'gl = gfs2_glock_hold(...)'. Signed-off-by: Andreas Gruenbacher <[email protected]>
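The convenience of a "hold" helper that returns its argument can be shown with a small userspace sketch. `struct glock`, `struct holder`, `glock_hold()` and `holder_init()` are hypothetical simplifications and not the gfs2 definitions.

```c
/* Hypothetical, simplified reference-counted lock object. */
struct glock {
    int ref;
};

/* Hypothetical holder that keeps a counted reference to a glock. */
struct holder {
    struct glock *gl;
};

/* Take a reference and hand the same pointer back to the caller. */
static struct glock *glock_hold(struct glock *gl)
{
    gl->ref++;
    return gl;
}

/* Taking the reference and storing the pointer become one statement. */
static void holder_init(struct holder *gh, struct glock *gl)
{
    gh->gl = glock_hold(gl);
}
```

Combining the two steps makes it harder to store a pointer without also taking the reference that keeps it valid.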
2022-12-06 | gfs2: Always check inode size of inline inodes | Andreas Gruenbacher | 3 | -5/+3
Check if the inode size of stuffed (inline) inodes is within the allowed range when reading inodes from disk (gfs2_dinode_in()). This protects us from on-disk corruption. The two checks in stuffed_readpage() and gfs2_unstuffer_page() that just truncate inline data to the maximum allowed size don't actually make sense, and they can be removed now as well. Reported-by: [email protected] Signed-off-by: Andreas Gruenbacher <[email protected]>
2022-12-06 | gfs2: Cosmetic gfs2_dinode_{in,out} cleanup | Andreas Gruenbacher | 2 | -33/+35
In each of the two functions, add an inode variable that points to &ip->i_inode and use that throughout the rest of the function. Signed-off-by: Andreas Gruenbacher <[email protected]>
2022-12-05 | pstore: Avoid kcore oops by vmap()ing with VM_IOREMAP | Stephen Boyd | 1 | -1/+5
An oops can be induced by running 'cat /proc/kcore > /dev/null' on devices using pstore with the ram backend because kmap_atomic() assumes lowmem pages are accessible with __va(). Unable to handle kernel paging request at virtual address ffffff807ff2b000 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000081d87000 [ffffff807ff2b000] pgd=180000017fe18003, p4d=180000017fe18003, pud=180000017fe18003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: dm_integrity CPU: 7 PID: 21179 Comm: perf Not tainted 5.15.67-10882-ge4eb2eb988cd #1 baa443fb8e8477896a370b31a821eb2009f9bfba Hardware name: Google Lazor (rev3 - 8) (DT) pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __memcpy+0x110/0x260 lr : vread+0x194/0x294 sp : ffffffc013ee39d0 x29: ffffffc013ee39f0 x28: 0000000000001000 x27: ffffff807ff2b000 x26: 0000000000001000 x25: ffffffc0085a2000 x24: ffffff802d4b3000 x23: ffffff80f8a60000 x22: ffffff802d4b3000 x21: ffffffc0085a2000 x20: ffffff8080b7bc68 x19: 0000000000001000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffd3073f2e60 x14: ffffffffad588000 x13: 0000000000000000 x12: 0000000000000001 x11: 00000000000001a2 x10: 00680000fff2bf0b x9 : 03fffffff807ff2b x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000000 x5 : ffffff802d4b4000 x4 : ffffff807ff2c000 x3 : ffffffc013ee3a78 x2 : 0000000000001000 x1 : ffffff807ff2b000 x0 : ffffff802d4b3000 Call trace: __memcpy+0x110/0x260 read_kcore+0x584/0x778 proc_reg_read+0xb4/0xe4 During early boot, memblock reserves the pages for the ramoops reserved memory node in DT that would otherwise be part of the direct lowmem mapping. 
Pstore's ram backend reuses those reserved pages to change the memory type (writeback or non-cached) by passing the pages to vmap() (see pfn_to_page() usage in persistent_ram_vmap() for more details) with specific flags. When read_kcore() starts iterating over the vmalloc region, it runs over the virtual address that vmap() returned for ramoops. In aligned_vread() the virtual address is passed to vmalloc_to_page() which returns the page struct for the reserved lowmem area. That lowmem page is passed to kmap_atomic(), which effectively calls page_to_virt() that assumes a lowmem page struct must be directly accessible with __va() and friends. These pages are mapped via vmap() though, and the lowmem mapping was never made, so accessing them via the lowmem virtual address oopses like above. Let's side-step this problem by passing VM_IOREMAP to vmap(). This will tell vread() to not include the ramoops region in the kcore. Instead the area will look like a bunch of zeros. The alternative is to teach kmap() about vmalloc areas that intersect with lowmem. Presumably such a change isn't a one-liner, and there isn't much interest in inspecting the ramoops region in kcore files anyway, so the most expedient route is taken for now. Cc: Brian Geffon <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 404a6043385d ("staging: android: persistent_ram: handle reserving and mapping memory") Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05gfs2: Handle -EBUSY result of insert_inode_locked4Andreas Gruenbacher1-1/+5
When creating a new inode, there is a small chance that an inode lookup for a previous version of the same inode is still in progress. In that case, that previous lookup will eventually fail, but we may still need to retry here. Signed-off-by: Andreas Gruenbacher <[email protected]>
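The retry described in this message can be illustrated with a small userspace sketch. It is not the gfs2 code: `struct fake_cache`, `try_insert()` and `insert_with_retry()` are hypothetical names that only model an insert that reports -EBUSY while a stale lookup of the same inode is still pending.

```c
#include <errno.h>

/* Hypothetical model: counts stale lookups still in flight for an inode. */
struct fake_cache {
	int stale_lookups;
};

/* Hypothetical insert: fails with -EBUSY while a stale lookup is pending. */
static int try_insert(struct fake_cache *c)
{
	if (c->stale_lookups > 0) {
		c->stale_lookups--;	/* the stale lookup eventually fails and goes away */
		return -EBUSY;
	}
	return 0;
}

/* Retry only on -EBUSY, and give up after max_tries attempts. */
static int insert_with_retry(struct fake_cache *c, int max_tries)
{
	int err = -EBUSY;
	int i;

	for (i = 0; i < max_tries && err == -EBUSY; i++)
		err = try_insert(c);
	return err;
}
```

The bounded loop is a choice made for this sketch; how the real code decides when to stop retrying is not stated in the message above.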
2022-12-05NFS4.x/pnfs: Fix up logging of layout stateidsTrond Myklebust1-2/+2
If the layout is invalid, then just log a '0' value. Signed-off-by: Trond Myklebust <[email protected]>
2022-12-05btrfs: print transaction aborted messages with an error levelFilipe Manana1-3/+3
Currently we print the transaction aborted message with a debug level, but a transaction abort is an exceptional event that indicates something went wrong and it's useful to have it printed with an error level as it helps analysing problems in a production environment, where debug level messages are typically not logged. For example reports from syzbot never include the transaction aborted message, since the log level on the test machines is above the debug level. So change the log level from debug to error. Reviewed-by: Anand Jain <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2022-12-05btrfs: do not BUG_ON() on ENOMEM when dropping extent items for a rangeFilipe Manana1-2/+8
If we get -ENOMEM while dropping file extent items in a given range, at btrfs_drop_extents(), due to failure to allocate memory when attempting to increment the reference count for an extent or drop the reference count, we handle it with a BUG_ON(). This is excessive, instead we can simply abort the transaction and return the error to the caller. In fact most callers of btrfs_drop_extents(), directly or indirectly, already abort the transaction if btrfs_drop_extents() returns any error. Also, we already have error paths at btrfs_drop_extents() that may return -ENOMEM and in those cases we abort the transaction, like for example anything that changes the b+tree may return -ENOMEM due to a failure to allocate a new extent buffer when COWing an existing extent buffer, such as a call to btrfs_duplicate_item() for example. So replace the BUG_ON() calls with proper logic to abort the transaction and return the error. Reported-by: [email protected] Link: https://lore.kernel.org/linux-btrfs/[email protected]/ CC: [email protected] # 5.4+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
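The change in error handling can be sketched in userspace C. This is an illustration only, not the btrfs code: `struct txn`, `txn_abort()`, `update_ref()` and `drop_range()` are hypothetical stand-ins for the transaction handle, the abort helper, the reference update that may fail, and btrfs_drop_extents().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified transaction handle. */
struct txn {
	bool aborted;
	int abort_err;
};

/* Record the first error that aborted the transaction. */
static void txn_abort(struct txn *t, int err)
{
	if (!t->aborted) {
		t->aborted = true;
		t->abort_err = err;
	}
}

/* Hypothetical reference update that may fail to allocate memory. */
static int update_ref(bool simulate_enomem)
{
	return simulate_enomem ? -ENOMEM : 0;
}

static int drop_range(struct txn *t, bool simulate_enomem)
{
	int ret = update_ref(simulate_enomem);

	if (ret) {
		/* Instead of crashing on the error, abort and propagate it. */
		txn_abort(t, ret);
		return ret;
	}
	return 0;
}
```

With this shape the caller sees the error code and an aborted transaction, which matches what the message says most callers already expect.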
2022-12-05btrfs: fix extent map use-after-free when handling missing device in read_one_chunkvoid0red1-1/+2
Store the error code before freeing the extent_map. Though it's a reference counted structure, in that function it's the first and last allocation so this would lead to a potential use-after-free. The error can happen e.g. when a chunk is stored on a missing device and the degraded mount option is missing. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216721 Reported-by: eriri <[email protected]> Fixes: adfb69af7d8c ("btrfs: add_missing_dev() should return the actual error") CC: [email protected] # 4.9+ Signed-off-by: void0red <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
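The ordering fix can be shown with a minimal userspace sketch, assuming a heap object that carries the error code. `struct chunk_map` and `load_chunk()` are hypothetical and only mirror the pattern, not the btrfs structures.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object that owns the error code we want to return. */
struct chunk_map {
	int dev_status;	/* negative errno if the device lookup failed */
};

static int load_chunk(int lookup_result)
{
	struct chunk_map *map = malloc(sizeof(*map));
	int ret;

	if (!map)
		return -ENOMEM;

	map->dev_status = lookup_result;
	if (map->dev_status < 0) {
		/* Copy the error out while map is still valid ... */
		ret = map->dev_status;
		/* ... and only then release the object. */
		free(map);
		return ret;
	}

	free(map);
	return 0;
}
```

Reading `map->dev_status` after `free(map)` would be the use-after-free that the message describes; saving it in a local first avoids that.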
2022-12-05btrfs: remove outdated logic from overwrite_item() and add assertionFilipe Manana1-5/+9
As of commit 193df6245704 ("btrfs: search for last logged dir index if it's not cached in the inode"), the overwrite_item() function is always called for a root that is from a fs/subvolume tree. In other words, now it's only used during log replay to modify a fs/subvolume tree. Therefore we can remove the logic that checks if we are dealing with a log tree at overwrite_item(). So remove that logic, replacing it with an assertion and document that if we ever need to support a log root there, we will need to clone the leaf from the fs/subvolume tree and then release it before modifying the log tree, which is needed to avoid a potential deadlock, similar to the one recently fixed by a patch with the subject: "btrfs: do not modify log tree while holding a leaf from fs tree locked" Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2022-12-05btrfs: unify overwrite_item() and do_overwrite_item()Filipe Manana1-52/+24
After commit 193df6245704 ("btrfs: search for last logged dir index if it's not cached in the inode"), there are no more callers of do_overwrite_item(), except overwrite_item(). Originally both used to be the same function, but were split in commit 086dcbfa50d3 ("btrfs: insert items in batches when logging a directory when possible"), as there was the need to execute all logic of overwrite_item() but skip the tree search, since in the context of directory logging we already had a path with a leaf to copy data from. So unify them again as there is no more need to have them split. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2022-12-05btrfs: replace strncpy() with strscpy()Artem Chernyshev2-7/+8
Using strncpy() on NUL-terminated strings is deprecated. To avoid possibly forming a non-terminated string, strscpy() should be used. Found by Linux Verification Center (linuxtesting.org) with SVACE. CC: [email protected] # 4.9+ Signed-off-by: Artem Chernyshev <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
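The difference matters because strncpy() leaves the destination without a terminating NUL when the source is at least as long as the buffer. A userspace sketch of a copy that always terminates is given below; `bounded_copy()` is a hypothetical helper written for this illustration, modeled on the documented behaviour of the kernel's strscpy() (returns the copied length, or -E2BIG on truncation or a zero-sized buffer), and is not the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical strscpy-like helper: copies at most size - 1 bytes and
 * always NUL-terminates dst when size > 0.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If src did not end here, the copy was truncated. */
	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

A caller can check for a negative return value to detect truncation, something strncpy() gives no direct way to do.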
2022-12-05btrfs: fix uninitialized variable in find_first_clear_extent_bitJosef Bacik1-1/+1
This was caught when syncing extent-io-tree.c into btrfs-progs. This however isn't really a problem, the only way next would be uninitialized is if we found the range we were looking for, and in this case we don't care about next. However it's a compile error, so fix it up. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2022-12-05btrfs: fix uninitialized parent in insert_stateJosef Bacik1-1/+1
I don't know how this isn't caught when we build this in the kernel, but while syncing extent-io-tree.c into btrfs-progs I got an error because parent could potentially be uninitialized when we link in a new node, specifically when the extent_io_tree is empty. This means we could have garbage in the parent color. I don't know what the ramifications are of that, but it's probably not great, so fix this by initializing parent to NULL. I spot checked all of our other usages in btrfs and we appear to be doing the correct thing everywhere else. Fixes: c7e118cf98c7 ("btrfs: open code rbtree search in insert_state") CC: [email protected] # 6.0+ Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
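The empty-tree case can be reproduced with a plain binary search tree in userspace. The sketch below is not the kernel rbtree or insert_state(); `struct node` and `tree_insert()` are hypothetical and only show why the parent pointer needs an initial value when the search loop never runs.

```c
#include <stddef.h>

/* Hypothetical tree node with a parent link, as in an rbtree. */
struct node {
	int key;
	struct node *parent;
	struct node *left;
	struct node *right;
};

static void tree_insert(struct node **root, struct node *new)
{
	struct node **link = root;
	struct node *parent = NULL;	/* stays NULL when the tree is empty */

	/* Walk down to the empty slot where the new node belongs. */
	while (*link) {
		parent = *link;
		link = new->key < parent->key ? &parent->left : &parent->right;
	}

	/* On an empty tree the loop body never ran, so parent must be NULL. */
	new->parent = parent;
	new->left = NULL;
	new->right = NULL;
	*link = new;
}
```

Without the `= NULL` initializer, the first insert into an empty tree would store an indeterminate value as the node's parent.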
2022-12-05btrfs: add might_sleep() annotationsChenXiaoSong1-0/+4
Add annotations to functions that might sleep due to allocations or IO and could be called from various contexts. In case of btrfs_search_slot it's not obvious why it would sleep:

    btrfs_search_slot
      setup_nodes_for_search
        reada_for_balance
          btrfs_readahead_node_child
            btrfs_readahead_tree_block
              btrfs_find_create_tree_block
                alloc_extent_buffer
                  kmem_cache_zalloc
                    /* allocate memory non-atomically, might sleep */
                    kmem_cache_alloc(GFP_NOFS|__GFP_NOFAIL|__GFP_ZERO)
              read_extent_buffer_pages
                submit_extent_page
                  /* disk IO, might sleep */
                  submit_one_bio

Sleeping can also happen in update_qgroup_limit_item(), which might sleep in 3 places, for example:

    update_qgroup_limit_item
      btrfs_alloc_path
        /* allocate memory non-atomically, might sleep */
        kmem_cache_zalloc(btrfs_path_cachep, GFP_NOFS)

Signed-off-by: ChenXiaoSong <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>