aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-12-13ubifs: Pass node length in all node dumping callersZhihao Cheng15-52/+54
Function ubifs_dump_node() has been modified to avoid memory oob accessing while dumping node, node length (corresponding to the size of allocated memory for node) should be passed into all node dumping callers. Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13Revert "ubifs: Fix out-of-bounds memory access caused by abnormal value of ↵Zhihao Cheng1-14/+2
node_len" This reverts commit acc5af3efa30 ("ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len") No need to avoid memory oob in dumping for data node alone. Later, node length will be passed into function 'ubifs_dump_node()' which replaces all node dumping places. Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13ubifs: Limit dumping length by size of memory which is allocated for the nodeZhihao Cheng2-14/+52
To prevent memory out-of-bounds accessing in ubifs_dump_node(), actual dumping length should be restricted by another condition(size of memory which is allocated for the node). This patch handles following situations (These situations may be caused by bit flipping due to hardware error, writing bypass ubifs, unknown bugs in ubifs, etc.): 1. bad node_len: Dumping data according to 'ch->len' which may exceed the size of memory allocated for node. 2. bad node content: Some kinds of node can record additional data, eg. index node and orphan node, make sure the size of additional data not beyond the node length. 3. node_type changes: Read data according to type A, but expected type B, before that, node is allocated according to type B's size. Length of type A node is greater than type B node. Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13ubifs: Remove the redundant return in dbg_check_nondata_nodes_orderChengsong Ke1-1/+0
There is a redundant return in dbg_check_nondata_nodes_order, which will be never reached. In addition, error code should be returned instead of zero in this branch. Signed-off-by: Chengsong Ke <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13jffs2: Fix NULL pointer dereference in rp_size fs option parsingJamie Iles1-5/+5
syzkaller found the following JFFS2 splat: Unable to handle kernel paging request at virtual address dfffa00000000001 Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [dfffa00000000001] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 12745 Comm: syz-executor.5 Tainted: G S 5.9.0-rc8+ #98 Hardware name: linux,dummy-virt (DT) pstate: 20400005 (nzCv daif +PAN -UAO BTYPE=--) pc : jffs2_parse_param+0x138/0x308 fs/jffs2/super.c:206 lr : jffs2_parse_param+0x108/0x308 fs/jffs2/super.c:205 sp : ffff000022a57910 x29: ffff000022a57910 x28: 0000000000000000 x27: ffff000057634008 x26: 000000000000d800 x25: 000000000000d800 x24: ffff0000271a9000 x23: ffffa0001adb5dc0 x22: ffff000023fdcf00 x21: 1fffe0000454af2c x20: ffff000024cc9400 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: ffffa000102dbdd0 x15: 0000000000000000 x14: ffffa000109e44bc x13: ffffa00010a3a26c x12: ffff80000476e0b3 x11: 1fffe0000476e0b2 x10: ffff80000476e0b2 x9 : ffffa00010a3ad60 x8 : ffff000023b70593 x7 : 0000000000000003 x6 : 00000000f1f1f1f1 x5 : ffff000023fdcf00 x4 : 0000000000000002 x3 : ffffa00010000000 x2 : 0000000000000001 x1 : dfffa00000000000 x0 : 0000000000000008 Call trace: jffs2_parse_param+0x138/0x308 fs/jffs2/super.c:206 vfs_parse_fs_param+0x234/0x4e8 fs/fs_context.c:117 vfs_parse_fs_string+0xe8/0x148 fs/fs_context.c:161 generic_parse_monolithic+0x17c/0x208 fs/fs_context.c:201 parse_monolithic_mount_data+0x7c/0xa8 fs/fs_context.c:649 do_new_mount fs/namespace.c:2871 [inline] path_mount+0x548/0x1da8 fs/namespace.c:3192 do_mount+0x124/0x138 fs/namespace.c:3205 __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __arm64_sys_mount+0x164/0x238 fs/namespace.c:3390 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common.constprop.0+0x15c/0x598 arch/arm64/kernel/syscall.c:149 do_el0_svc+0x60/0x150 arch/arm64/kernel/syscall.c:195 el0_svc+0x34/0xb0 arch/arm64/kernel/entry-common.c:226 el0_sync_handler+0xc8/0x5b4 arch/arm64/kernel/entry-common.c:236 el0_sync+0x15c/0x180 arch/arm64/kernel/entry.S:663 Code: d2d40001 f2fbffe1 91002260 d343fc02 (38e16841) ---[ end trace 4edf690313deda44 ]--- This is because since ec10a24f10c8, the option parsing happens before fill_super and so the MTD device isn't associated with the filesystem. Defer the size check until there is a valid association. Fixes: ec10a24f10c8 ("vfs: Convert jffs2 to use the new mount API") Cc: <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Jamie Iles <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13ubifs: Fixed print foramt mismatch in ubifsFangping Liang1-1/+1
fs/ubifs/super.c: function mount_ubifs: the format specifier "lld" need arg type "long long", but the according arg "old_idx_sz" has type "unsigned long long" Signed-off-by: Fangping Liang <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13jffs2: remove trailing semicolon in macro definitionTom Rix1-2/+2
The macro use will already have a semicolon. Signed-off-by: Tom Rix <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13ubifs: Fix error return code in ubifs_init_authentication()Wang ShaoBo1-1/+3
Fix to return PTR_ERR() error code from the error handling case where ubifs_hash_get_desc() failed instead of 0 in ubifs_init_authentication(), as done elsewhere in this function. Fixes: 49525e5eecca5 ("ubifs: Add helper functions for authentication support") Signed-off-by: Wang ShaoBo <[email protected]> Reviewed-by: Sascha Hauer <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13ubifs: wbuf: Don't leak kernel memory to flashRichard Weinberger1-2/+11
Write buffers use a kmalloc()'ed buffer, they can leak up to seven bytes of kernel memory to flash if writes are not aligned. So use ubifs_pad() to fill these gaps with padding bytes. This was never a problem while scanning because the scanner logic manually aligns node lengths and skips over these gaps. Cc: <[email protected]> Fixes: 1e51764a3c2ac05a2 ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <[email protected]> Reviewed-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13ubifs: Fix the printing type of c->big_lptChengsong Ke2-3/+3
Ubifs uses %d to print c->big_lpt, but c->big_lpt is a variable of type unsigned int and should be printed with %u. Signed-off-by: Chengsong Ke <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13jffs2: Allow setting rp_size to zero during remountinglizhe2-2/+6
Set rp_size to zero will be ignore during remounting. The method to identify whether we input a remounting option of rp_size is to check if the rp_size input is zero. It can not work well if we pass "rp_size=0". This patch add a bool variable "set_rp_size" to fix this problem. Reported-by: Jubin Zhong <[email protected]> Signed-off-by: lizhe <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13jffs2: Fix ignoring mounting options problem during remountinglizhe1-0/+17
The jffs2 mount options will be ignored when remounting jffs2. It can be easily reproduced with the steps listed below. 1. mount -t jffs2 -o compr=none /dev/mtdblockx /mnt 2. mount -o remount compr=zlib /mnt Since ec10a24f10c8, the option parsing happens before fill_super and then pass fc, which contains the options parsing results, to function jffs2_reconfigure during remounting. But function jffs2_reconfigure do not update c->mount_opts. This patch add a function jffs2_update_mount_opts to fix this problem. By the way, I notice that tmpfs use the same way to update remounting options. If it is necessary to unify them? Cc: <[email protected]> Fixes: ec10a24f10c8 ("vfs: Convert jffs2 to use the new mount API") Signed-off-by: lizhe <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13jffs2: Fix GC exit abnormallyZhe Li1-0/+16
The log of this problem is: jffs2: Error garbage collecting node at 0x***! jffs2: No space for garbage collection. Aborting GC thread This is because GC believe that it do nothing, so it abort. After going over the image of jffs2, I find a scene that can trigger this problem stably. The scene is: there is a normal dirent node at summary-area, but abnormal at corresponding not-summary-area with error name_crc. The reason that GC exit abnormally is because it find that abnormal dirent node to GC, but when it goes to function jffs2_add_fd_to_list, it cannot meet the condition listed below: if ((*prev)->nhash == new->nhash && !strcmp((*prev)->name, new->name)) So no node is marked obsolete, statistical information of erase_block do not change, which cause GC exit abnormally. The root cause of this problem is: we do not check the name_crc of the abnormal dirent node with summary is enabled. Noticed that in function jffs2_scan_dirent_node, we use function jffs2_scan_dirty_space to deal with the dirent node with error name_crc. So this patch add a checking code in function read_direntry to ensure the correctness of dirent node. If checked failed, the dirent node will be marked obsolete so GC will pass this node and this problem will be fixed. Cc: <[email protected]> Signed-off-by: Zhe Li <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13ubifs: Code cleanup by removing ifdef macro surroundingChengguang Xu4-10/+4
Define ubifs_listxattr and ubifs_xattr_handlers to NULL when CONFIG_UBIFS_FS_XATTR is not enabled, then we can remove many ugly ifdef macros in the code. Signed-off-by: Chengguang Xu <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13jffs2: Fix if/else empty body warningsRandy Dunlap1-11/+12
When debug (print) macros are not enabled, change them to use the no_printk() macro instead of <nothing>. This fixes gcc warnings when -Wextra is used: ../fs/jffs2/nodelist.c:255:37: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body] ../fs/jffs2/nodelist.c:278:38: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body] ../fs/jffs2/nodelist.c:558:52: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body] ../fs/jffs2/xattr.c:1247:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/jffs2/xattr.c:1281:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] Builds without warnings on all 3 levels of CONFIG_JFFS2_FS_DEBUG. Signed-off-by: Randy Dunlap <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: [email protected] Signed-off-by: Richard Weinberger <[email protected]>
2020-12-13ubifs: Delete duplicated words + other fixesRandy Dunlap7-8/+7
Delete repeated words in fs/ubifs/. {negative, is, of, and, one, it} where "it it" was changed to "if it". Signed-off-by: Randy Dunlap <[email protected]> To: [email protected] Cc: Richard Weinberger <[email protected]> Cc: [email protected] Signed-off-by: Richard Weinberger <[email protected]>
2020-12-12fs/xfs: convert comma to semicolonZheng Yongjun1-1/+1
Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Eric Sandeen <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-12-12xfs: open code updating i_mode in xfs_set_aclChristoph Hellwig3-31/+27
Rather than going through the big and hairy xfs_setattr_nonsize function, just open code a transactional i_mode and i_ctime update. This allows to mark xfs_setattr_nonsize and remove the flags argument to it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-12-12xfs: remove xfs_vn_setattr_nonsizeChristoph Hellwig2-20/+7
Merge xfs_vn_setattr_nonsize into the only caller. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-12-12xfs: kill ialloced in xfs_dialloc()Gao Xiang1-13/+9
It's enough to just use return code, and get rid of an argument. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-12-12xfs: spilt xfs_dialloc() into 2 functionsDave Chinner3-37/+48
This patch explicitly separates free inode chunk allocation and inode allocation into two individual high level operations. Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-12-12xfs: move xfs_dialloc_roll() into xfs_dialloc()Dave Chinner3-94/+24
Get rid of the confusing ialloc_context and failure handling around xfs_dialloc() by moving xfs_dialloc_roll() into xfs_dialloc(). Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-12-12xfs: move on-disk inode allocation out of xfs_ialloc()Dave Chinner3-161/+86
So xfs_ialloc() will only address in-core inode allocation then, Also, rename xfs_ialloc() to xfs_dir_ialloc_init() in order to keep everything in xfs_inode.c under the same namespace. Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-12-12xfs: introduce xfs_dialloc_roll()Dave Chinner3-30/+41
Introduce a helper to make the on-disk inode allocation rolling logic clearer in preparation of the following cleanup. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-12-12xfs: convert noroom, okalloc in xfs_dialloc() to boolGao Xiang1-4/+4
Boolean is preferred for such use. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-12-12Merge tag 'io_uring-5.10-2020-12-11' of git://git.kernel.dk/linux-blockLinus Torvalds1-8/+11
Pull io_uring fixes from Jens Axboe: "Two fixes in here, fixing issues introduced in this merge window" * tag 'io_uring-5.10-2020-12-11' of git://git.kernel.dk/linux-block: io_uring: fix file leak on error path of io ctx creation io_uring: fix mis-seting personality's creds
2020-12-12io_uring: remove 'twa_signal_ok' deadlock work-aroundJens Axboe1-15/+6
The TIF_NOTIFY_SIGNAL based implementation of TWA_SIGNAL is always safe to use, regardless of context, as we won't be recursing into the signal lock. So now that all archs are using that, we can drop this deadlock work-around as it's always safe to use TWA_SIGNAL. Signed-off-by: Jens Axboe <[email protected]>
2020-12-12io_uring: JOBCTL_TASK_WORK is no longer used by task_workJens Axboe1-7/+2
Remove the dead code, TWA_SIGNAL will never set JOBCTL_TASK_WORK at this point. Signed-off-by: Jens Axboe <[email protected]>
2020-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski17-104/+162
xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-11Merge tag 'zonefs-5.10-rc7' of ↵Linus Torvalds1-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: "A single patch in this pull request to fix a BIO and page reference leak when writing sequential zone files" * tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: fix page reference and BIO leak
2020-12-11proc: use untagged_addr() for pagemap_read addressesMiles Chen1-2/+6
When we try to visit the pagemap of a tagged userspace pointer, we find that the start_vaddr is not correct because of the tag. To fix it, we should untag the userspace pointers in pagemap_read(). I tested with 5.10-rc4 and the issue remains. Explanation from Catalin in [1]: "Arguably, that's a user-space bug since tagged file offsets were never supported. In this case it's not even a tag at bit 56 as per the arm64 tagged address ABI but rather down to bit 47. You could say that the problem is caused by the C library (malloc()) or whoever created the tagged vaddr and passed it to this function. It's not a kernel regression as we've never supported it. Now, pagemap is a special case where the offset is usually not generated as a classic file offset but rather derived by shifting a user virtual address. I guess we can make a concession for pagemap (only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)" My test code is based on [2]: A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8 userspace program: uint64 OsLayer::VirtualToPhysical(void *vaddr) { uint64 frame, paddr, pfnmask, pagemask; int pagesize = sysconf(_SC_PAGESIZE); off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0 int fd = open(kPagemapPath, O_RDONLY); ... if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) { int err = errno; string errtxt = ErrorString(err); if (fd >= 0) close(fd); return 0; } ... } kernel fs/proc/task_mmu.c: static ssize_t pagemap_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { ... src = *ppos; svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54 start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000 end_vaddr = mm->task_size; /* watch out for wraparound */ // svpfn == 0xb400007662f54 // (mm->task_size >> PAGE) == 0x8000000 if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4 start_vaddr = end_vaddr; ret = 0; while (count && (start_vaddr < end_vaddr)) { // we cannot visit correct entry because start_vaddr is set to end_vaddr int len; unsigned long end; ... } ... } [1] https://lore.kernel.org/patchwork/patch/1343258/ [2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miles Chen <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Will Deacon <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Song Bao Hua (Barry Song) <[email protected]> Cc: <[email protected]> [5.4-] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-11fsnotify: fix events reported to watching parent and childAmir Goldstein2-37/+54
fsnotify_parent() used to send two separate events to backends when a parent inode is watching children and the child inode is also watching. In an attempt to avoid duplicate events in fanotify, we unified the two backend callbacks to a single callback and handled the reporting of the two separate events for the relevant backends (inotify and dnotify). However the handling is buggy and can result in inotify and dnotify listeners receiving events of the type they never asked for or spurious events. The problem is the unified event callback with two inode marks (parent and child) is called when any of the parent and child inodes are watched and interested in the event, but the parent inode's mark that is interested in the event on the child is not necessarily the one we are currently reporting to (it could belong to a different group). So before reporting the parent or child event flavor to backend we need to check that the mark is really interested in that event flavor. The semantics of INODE and CHILD marks were hard to follow and made the logic more complicated than it should have been. Replace it with INODE and PARENT marks semantics to hopefully make the logic more clear. Thanks to Hugh Dickins for spotting a bug in the earlier version of this patch. Fixes: 497b0c5a7c06 ("fsnotify: send event to parent and child with single callback") CC: [email protected] Link: https://lore.kernel.org/r/[email protected] Reported-by: Hugh Dickins <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2020-12-10Merge tag 'nfs-for-5.10-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds7-26/+72
Pull NFS client fixes from Anna Schumaker: "Here are a handful more bugfixes for 5.10. Unfortunately, we found some problems with the new READ_PLUS operation that aren't easy to fix. We've decided to disable this codepath through a Kconfig option for now, but a series of patches going into 5.11 will clean up the code and fix the issues at the same time. This seemed like the best way to go about it. Summary: - Fix array overflow when flexfiles mirroring is enabled - Fix rpcrdma_inline_fixup() crash with new LISTXATTRS - Fix 5 second delay when doing inter-server copy - Disable READ_PLUS by default" * tag 'nfs-for-5.10-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Disable READ_PLUS by default NFSv4.2: Fix 5 seconds delay when doing inter server copy NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled
2020-12-10Make sure that make_create_in_sticky() never sees uninitialized value of ↵Al Viro1-1/+3
dir_mode make sure nd->dir_mode is always initialized after success exit from link_path_walk(); in case of empty path it did not happen. Reported-by: Anant Thazhemadam <[email protected]> Tested-by: Anant Thazhemadam <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-12-10fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is setHao Li1-1/+8
If DCACHE_REFERENCED is set, fast_dput() will return true, and then retain_dentry() have no chance to check DCACHE_DONTCACHE. As a result, the dentry won't be killed and the corresponding inode can't be evicted. In the following example, the DAX policy can't take effects unless we do a drop_caches manually. # DCACHE_LRU_LIST will be set echo abcdefg > test.txt # DCACHE_REFERENCED will be set and DCACHE_DONTCACHE can't do anything xfs_io -c 'chattr +x' test.txt # Drop caches to make DAX changing take effects echo 2 > /proc/sys/vm/drop_caches What this patch does is preventing fast_dput() from returning true if DCACHE_DONTCACHE is set. Then retain_dentry() will detect the DCACHE_DONTCACHE and will return false. As a result, the dentry will be killed and the inode will be evicted. In this way, if we change per-file DAX policy, it will take effects automatically after this file is closed by all processes. I also add some comments to make the code more clear. Signed-off-by: Hao Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-12-10fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()Hao Li1-1/+3
If generic_drop_inode() returns true, it means iput_final() can evict this inode regardless of whether it is dirty or not. If we check I_DONTCACHE in generic_drop_inode(), any inode with this bit set will be evicted unconditionally. This is not the desired behavior because I_DONTCACHE only means the inode shouldn't be cached on the LRU list. As for whether we need to evict this inode, this is what generic_drop_inode() should do. This patch corrects the usage of I_DONTCACHE. This patch was proposed in [1]. [1]: https://lore.kernel.org/linux-fsdevel/[email protected]/ Fixes: dae2f8ed7992 ("fs: Lift XFS_IDONTCACHE to the VFS layer") Signed-off-by: Hao Li <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-12-10fs/namespace.c: WARN if mnt_count has become negativeEric Biggers2-4/+7
Missing calls to mntget() (or equivalently, too many calls to mntput()) are hard to detect because mntput() delays freeing mounts using task_work_add(), then again using call_rcu(). As a result, mnt_count can often be decremented to -1 without getting a KASAN use-after-free report. Such cases are still bugs though, and they point to real use-after-frees being possible. For an example of this, see the bug fixed by commit 1b0b9cc8d379 ("vfs: fsmount: add missing mntget()"), discussed at https://lkml.kernel.org/linux-fsdevel/20190605135401.GB30925@xxxxxxxxxxxxxxxxxxxxxxxxx/T/#u. This bug *should* have been trivial to find. But actually, it wasn't found until syzkaller happened to use fchdir() to manipulate the reference count just right for the bug to be noticeable. Address this by making mntput_no_expire() issue a WARN if mnt_count has become negative. Suggested-by: Miklos Szeredi <[email protected]> Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-12-10NFS: Disable READ_PLUS by defaultAnna Schumaker2-1/+10
We've been seeing failures with xfstests generic/091 and generic/263 when using READ_PLUS. I've made some progress on these issues, and the tests fail later on but still don't pass. Let's disable READ_PLUS by default until we can work out what is going on. Signed-off-by: Anna Schumaker <[email protected]>
2020-12-10NFSv4.2: Fix 5 seconds delay when doing inter server copyDai Ngo1-1/+1
Since commit b4868b44c5628 ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5 seconds delay regardless of the size of the copy. The delay is from nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential fails because the seqid in both nfs4_state and nfs4_stateid are 0. Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state, until after the call to update_open_stateid, to indicate this is the 1st open. This fix is part of a 2 patches, the other patch is the fix in the source server to return the stateid for COPY_NOTIFY request with seqid 1 instead of 0. Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy") Signed-off-by: Dai Ngo <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2020-12-10NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operationChuck Lever2-9/+13
By switching to an XFS-backed export, I am able to reproduce the ibcomp worker crash on my client with xfstests generic/013. For the failing LISTXATTRS operation, xdr_inline_pages() is called with page_len=12 and buflen=128. - When ->send_request() is called, rpcrdma_marshal_req() does not set up a Reply chunk because buflen is smaller than the inline threshold. Thus rpcrdma_convert_iovs() does not get invoked at all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked on the receive buffer. - During reply processing, rpcrdma_inline_fixup() tries to copy received data into rq_rcv_buf->pages because page_len is positive. But there are no receive pages because rpcrdma_marshal_req() never allocated them. The result is that the ibcomp worker faults and dies. Sometimes that causes a visible crash, and sometimes it results in a transport hang without other symptoms. RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and should eventually be fixed or replaced. However, my preference is that upper-layer operations should explicitly allocate their receive buffers (using GFP_KERNEL) when possible, rather than relying on XDRBUF_SPARSE_PAGES. Reported-by: Olga kornievskaia <[email protected]> Suggested-by: Olga kornievskaia <[email protected]> Fixes: c10a75145feb ("NFSv4.2: add the extended attribute proc functions.") Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Olga kornievskaia <[email protected]> Reviewed-by: Frank van der Linden <[email protected]> Tested-by: Olga kornievskaia <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2020-12-10exec: Transform exec_update_mutex into a rw_semaphoreEric W. Biederman2-11/+11
Recently syzbot reported[0] that there is a deadlock amongst the users of exec_update_mutex. The problematic lock ordering found by lockdep was: perf_event_open (exec_update_mutex -> ovl_i_mutex) chown (ovl_i_mutex -> sb_writes) sendfile (sb_writes -> p->lock) by reading from a proc file and writing to overlayfs proc_pid_syscall (p->lock -> exec_update_mutex) While looking at possible solutions it occured to me that all of the users and possible users involved only wanted to state of the given process to remain the same. They are all readers. The only writer is exec. There is no reason for readers to block on each other. So fix this deadlock by transforming exec_update_mutex into a rw_semaphore named exec_update_lock that only exec takes for writing. Cc: Jann Horn <[email protected]> Cc: Vasiliy Kulikov <[email protected]> Cc: Al Viro <[email protected]> Cc: Bernd Edlinger <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Christopher Yeoh <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Sargun Dhillon <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex") [0] https://lkml.kernel.org/r/[email protected] Reported-by: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
2020-12-10exec: Move io_uring_task_cancel after the point of no returnEric W. Biederman1-5/+5
Now that unshare_files happens in begin_new_exec after the point of no return, io_uring_task_cancel can also happen later. Effectively this means io_uring activities for a task are only canceled when exec succeeds. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
2020-12-10coredump: Document coredump code exclusively used by cell spufsEric W. Biederman2-0/+3
Oleg Nesterov recently asked[1] why is there an unshare_files in do_coredump. After digging through all of the callers of lookup_fd it turns out that it is arch/powerpc/platforms/cell/spufs/coredump.c:coredump_next_context that needs the unshare_files in do_coredump. Looking at the history[2] this code was also the only piece of coredump code that required the unshare_files when the unshare_files was added. Looking at that code it turns out that cell is also the only architecture that implements elf_coredump_extra_notes_size and elf_coredump_extra_notes_write. I looked at the gdb repo[3] support for cell has been removed[4] in binutils 2.34. Geoff Levand reports he is still getting questions on how to run modern kernels on the PS3, from people using 3rd party firmware so this code is not dead. According to Wikipedia the last PS3 shipped in Japan sometime in 2017. So it will probably be a little while before everyone's hardware dies. Add some comments briefly documenting the coredump code that exists only to support cell spufs to make it easier to understand the coredump code. Eventually the hardware will be dead, or their won't be userspace tools, or the coredump code will be refactored and it will be too difficult to update a dead architecture and these comments make it easy to tell where to pull to remove cell spufs support. [1] https://lkml.kernel.org/r/[email protected] [2] 179e037fc137 ("do_coredump(): make sure that descriptor table isn't shared") [3] git://sourceware.org/git/binutils-gdb.git [4] abf516c6931a ("Remove Cell Broadband Engine debugging support"). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
2020-12-10file: Remove get_files_structEric W. Biederman1-13/+0
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance. Now that get_files_struct has no more users and can not cause the problems for posix file locking and fget_light remove get_files_struct so that it does not gain any new users. [1] https://lkml.kernel.org/r/[email protected] Suggested-by: Oleg Nesterov <[email protected]> Acked-by: Christian Brauner <[email protected]> v1: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
2020-12-10file: Rename __close_fd_get_file close_fd_get_fileEric W. Biederman2-3/+3
The function close_fd_get_file is explicitly a variant of __close_fd[1]. Now that __close_fd has been renamed close_fd, rename close_fd_get_file to be consistent with close_fd. When __alloc_fd, __close_fd and __fd_install were introduced the double underscore indicated that the function took a struct files_struct parameter. The function __close_fd_get_file never has so the naming has always been inconsistent. This just cleans things up so there are not any lingering mentions or references __close_fd left in the code. [1] 80cd795630d6 ("binder: fix use-after-free due to ksys_close() during fdget()") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
2020-12-10file: Replace ksys_close with close_fdEric W. Biederman1-2/+3
Now that ksys_close is exactly identical to close_fd replace the one caller of ksys_close with close_fd. [1] https://lkml.kernel.org/r/[email protected] Suggested-by: Christoph Hellwig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
2020-12-10file: Rename __close_fd to close_fd and remove the files parameterEric W. Biederman2-7/+5
The function __close_fd was added to support binder[1]. Now that binder has been fixed to no longer need __close_fd[2] all calls to __close_fd pass current->files. Therefore transform the files parameter into a local variable initialized to current->files, and rename __close_fd to close_fd to reflect this change, and keep it in sync with the similar changes to __alloc_fd, and __fd_install. This removes the need for callers to care about the extra care that needs to be take if anything except current->files is passed, by limiting the callers to only operation on current->files. [1] 483ce1d4b8c3 ("take descriptor-related part of close() to file.c") [2] 44d8047f1d87 ("binder: use standard functions to allocate fds") Acked-by: Christian Brauner <[email protected]> v1: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
2020-12-10file: Merge __alloc_fd into alloc_fdEric W. Biederman1-8/+3
The function __alloc_fd was added to support binder[1]. With binder fixed[2] there are no more users. As alloc_fd just calls __alloc_fd with "files=current->files", merge them together by transforming the files parameter into a local variable initialized to current->files. [1] dcfadfa4ec5a ("new helper: __alloc_fd()") [2] 44d8047f1d87 ("binder: use standard functions to allocate fds") Acked-by: Christian Brauner <[email protected]> v1: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
2020-12-10file: In f_dupfd read RLIMIT_NOFILE once.Eric W. Biederman1-4/+5
Simplify the code, and remove the chance of races by reading RLIMIT_NOFILE only once in f_dupfd. Pass the read value of RLIMIT_NOFILE into alloc_fd which is the other location the rlimit was read in f_dupfd. As f_dupfd is the only caller of alloc_fd this changing alloc_fd is trivially safe. Further this causes alloc_fd to take all of the same arguments as __alloc_fd except for the files_struct argument. Acked-by: Christian Brauner <[email protected]> v1: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>
2020-12-10file: Merge __fd_install into fd_installEric W. Biederman1-19/+6
The function __fd_install was added to support binder[1]. With binder fixed[2] there are no more users. As fd_install just calls __fd_install with "files=current->files", merge them together by transforming the files parameter into a local variable initialized to current->files. [1] f869e8a7f753 ("expose a low-level variant of fd_install() for binder") [2] 44d8047f1d87 ("binder: use standard functions to allocate fds") Acked-by: Christian Brauner <[email protected]> v1:https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric W. Biederman <[email protected]>