aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-10-01ext4: fix potential infinite loop in ext4_dx_readdir()yangerkun1-3/+3
When ext4_htree_fill_tree() fails, ext4_dx_readdir() can run into an infinite loop since if info->last_pos != ctx->pos this will reset the directory scan and reread the failing entry. For example: 1. a dx_dir which has 3 block, block 0 as dx_root block, block 1/2 as leaf block which own the ext4_dir_entry_2 2. block 1 read ok and call_filldir which will fill the dirent and update the ctx->pos 3. block 2 read fail, but we has already fill some dirent, so we will return back to userspace will a positive return val(see ksys_getdents64) 4. the second ext4_dx_readdir will reset the world since info->last_pos != ctx->pos, and will also init the curr_hash which pos to block 1 5. So we will read block1 too, and once block2 still read fail, we can only fill one dirent because the hash of the entry in block1(besides the last one) won't greater than curr_hash 6. this time, we forget update last_pos too since the read for block2 will fail, and since we has got the one entry, ksys_getdents64 can return success 7. Latter we will trapped in a loop with step 4~6 Cc: [email protected] Signed-off-by: yangerkun <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-01ext4: flush s_error_work before journal destroy in ext4_fill_superyangerkun1-1/+4
The error path in ext4_fill_super forget to flush s_error_work before journal destroy, and it may trigger the follow bug since flush_stashed_error_work can run concurrently with journal destroy without any protection for sbi->s_journal. [32031.740193] EXT4-fs (loop66): get root inode failed [32031.740484] EXT4-fs (loop66): mount failed [32031.759805] ------------[ cut here ]------------ [32031.759807] kernel BUG at fs/jbd2/transaction.c:373! [32031.760075] invalid opcode: 0000 [#1] SMP PTI [32031.760336] CPU: 5 PID: 1029268 Comm: kworker/5:1 Kdump: loaded 4.18.0 [32031.765112] Call Trace: [32031.765375] ? __switch_to_asm+0x35/0x70 [32031.765635] ? __switch_to_asm+0x41/0x70 [32031.765893] ? __switch_to_asm+0x35/0x70 [32031.766148] ? __switch_to_asm+0x41/0x70 [32031.766405] ? _cond_resched+0x15/0x40 [32031.766665] jbd2__journal_start+0xf1/0x1f0 [jbd2] [32031.766934] jbd2_journal_start+0x19/0x20 [jbd2] [32031.767218] flush_stashed_error_work+0x30/0x90 [ext4] [32031.767487] process_one_work+0x195/0x390 [32031.767747] worker_thread+0x30/0x390 [32031.768007] ? process_one_work+0x390/0x390 [32031.768265] kthread+0x10d/0x130 [32031.768521] ? kthread_flush_work_fn+0x10/0x10 [32031.768778] ret_from_fork+0x35/0x40 static int start_this_handle(...) BUG_ON(journal->j_flags & JBD2_UNMOUNT); <---- Trigger this Besides, after we enable fast commit, ext4_fc_replay can add work to s_error_work but return success, so the latter journal destroy in ext4_load_journal can trigger this problem too. Fix this problem with two steps: 1. Call ext4_commit_super directly in ext4_handle_error for the case that called from ext4_fc_replay 2. Since it's hard to pair the init and flush for s_error_work, we'd better add a extras flush_work before journal destroy in ext4_fill_super Besides, this patch will call ext4_commit_super in ext4_handle_error for any nojournal case too. But it seems safe since the reason we call schedule_work was that we should save error info to sb through journal if available. Conversely, for the nojournal case, it seems useless delay commit superblock to s_error_work. Fixes: c92dc856848f ("ext4: defer saving error info from atomic context") Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available") Cc: [email protected] Signed-off-by: yangerkun <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-01ext4: fix loff_t overflow in ext4_max_bitmap_size()Ritesh Harjani1-5/+5
We should use unsigned long long rather than loff_t to avoid overflow in ext4_max_bitmap_size() for comparison before returning. w/o this patch sbi->s_bitmap_maxbytes was becoming a negative value due to overflow of upper_limit (with has_huge_files as true) Below is a quick test to trigger it on a 64KB pagesize system. sudo mkfs.ext4 -b 65536 -O ^has_extents,^64bit /dev/loop2 sudo mount /dev/loop2 /mnt sudo echo "hello" > /mnt/hello -> This will error out with "echo: write error: File too large" Signed-off-by: Ritesh Harjani <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/594f409e2c543e90fd836b78188dfa5c575065ba.1622867594.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <[email protected]>
2021-10-01ext4: fix reserved space counter leakageJeffle Xu2-0/+11
When ext4_insert_delayed block receives and recovers from an error from ext4_es_insert_delayed_block(), e.g., ENOMEM, it does not release the space it has reserved for that block insertion as it should. One effect of this bug is that s_dirtyclusters_counter is not decremented and remains incorrectly elevated until the file system has been unmounted. This can result in premature ENOSPC returns and apparent loss of free space. Another effect of this bug is that /sys/fs/ext4/<dev>/delayed_allocation_blocks can remain non-zero even after syncfs has been executed on the filesystem. Besides, add check for s_dirtyclusters_counter when inode is going to be evicted and freed. s_dirtyclusters_counter can still keep non-zero until inode is written back in .evict_inode(), and thus the check is delayed to .destroy_inode(). Fixes: 51865fda28e5 ("ext4: let ext4 maintain extent status tree") Cc: [email protected] Suggested-by: Gao Xiang <[email protected]> Signed-off-by: Jeffle Xu <[email protected]> Reviewed-by: Eric Whitney <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-01ext4: limit the number of blocks in one ADD_RANGE TLVHou Tao1-0/+6
Now EXT4_FC_TAG_ADD_RANGE uses ext4_extent to track the newly-added blocks, but the limit on the max value of ee_len field is ignored, and it can lead to BUG_ON as shown below when running command "fallocate -l 128M file" on a fast_commit-enabled fs: kernel BUG at fs/ext4/ext4_extents.h:199! invalid opcode: 0000 [#1] SMP PTI CPU: 3 PID: 624 Comm: fallocate Not tainted 5.14.0-rc6+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:ext4_fc_write_inode_data+0x1f3/0x200 Call Trace: ? ext4_fc_write_inode+0xf2/0x150 ext4_fc_commit+0x93b/0xa00 ? ext4_fallocate+0x1ad/0x10d0 ext4_sync_file+0x157/0x340 ? ext4_sync_file+0x157/0x340 vfs_fsync_range+0x49/0x80 do_fsync+0x3d/0x70 __x64_sys_fsync+0x14/0x20 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Simply fixing it by limiting the number of blocks in one EXT4_FC_TAG_ADD_RANGE TLV. Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: [email protected] Signed-off-by: Hou Tao <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-09-30ksmbd: missing check for NULL in convert_to_nt_pathname()Dan Carpenter1-10/+7
The kmalloc() does not have a NULL check. This code can be re-written slightly cleaner to just use the kstrdup(). Fixes: 265fd1991c1d ("ksmbd: use LOOKUP_BENEATH to prevent the out of share access") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Namjae Jeon <[email protected]> Acked-by: Hyunchul Lee <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-30nfsd4: Handle the NFSv4 READDIR 'dircount' hint being zeroTrond Myklebust1-8/+11
RFC3530 notes that the 'dircount' field may be zero, in which case the recommendation is to ignore it, and only enforce the 'maxcount' field. In RFC5661, this recommendation to ignore a zero valued field becomes a requirement. Fixes: aee377644146 ("nfsd4: fix rd_dircount enforcement") Cc: <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2021-09-30fs/ntfs3: Check for NULL if ATTR_EA_INFO is incorrectKonstantin Komarov1-1/+3
This can be reason for reported panic https://lore.kernel.org/ntfs3/[email protected]/ Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") Reported-by: Mohammad Rasim <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-30fs/ntfs3: Refactoring of ntfs_init_from_bootKonstantin Komarov2-12/+9
Remove ntfs_sb_info members sector_size and sector_bits. Print details why mount failed. Reviewed-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-30fs/ntfs3: Reject mount if boot's cluster size < media sector sizeKonstantin Komarov1-1/+12
If we continue to work in this case, then we can corrupt fs. Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block"). Reviewed-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-30nfsd: fix error handling of register_pernet_subsys() in init_nfsd()Patrick Ho1-1/+1
init_nfsd() should not unregister pernet subsys if the register fails but should instead unwind from the last successful operation which is register_filesystem(). Unregistering a failed register_pernet_subsys() call can result in a kernel GPF as revealed by programmatically injecting an error in register_pernet_subsys(). Verified the fix handled failure gracefully with no lingering nfsd entry in /proc/filesystems. This change was introduced by the commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first"), the original error handling logic was correct. Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Cc: [email protected] Signed-off-by: Patrick Ho <[email protected]> Acked-by: J. Bruce Fields <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2021-09-30ksmbd: fix transform header validationNamjae Jeon1-9/+9
Validate that the transform and smb request headers are present before checking OriginalMessageSize and SessionId fields. Cc: Ronnie Sahlberg <[email protected]> Cc: Ralph Böhme <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Reviewed-by: Tom Talpey <[email protected]> Acked-by: Hyunchul Lee <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-30ksmbd: add buffer validation for SMB2_CREATE_CONTEXTHyunchul Lee3-13/+74
Add buffer validation for SMB2_CREATE_CONTEXT. Cc: Ronnie Sahlberg <[email protected]> Reviewed-by: Ralph Boehme <[email protected]> Signed-off-by: Hyunchul Lee <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-30ksmbd: add validation in smb2 negotiateNamjae Jeon2-6/+68
This patch add validation to check request buffer check in smb2 negotiate and fix null pointer deferencing oops in smb3_preauth_hash_rsp() that found from manual test. Cc: Tom Talpey <[email protected]> Cc: Ronnie Sahlberg <[email protected]> Cc: Ralph Böhme <[email protected]> Cc: Hyunchul Lee <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Reviewed-by: Ralph Boehme <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-30ksmbd: add request buffer validation in smb2_set_infoNamjae Jeon1-42/+107
Add buffer validation in smb2_set_info, and remove unused variable in set_file_basic_info. and smb2_set_info infolevel functions take structure pointer argument. Cc: Tom Talpey <[email protected]> Cc: Ronnie Sahlberg <[email protected]> Cc: Ralph Böhme <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Acked-by: Hyunchul Lee <[email protected]> Reviewed-by: Ralph Boehme <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-30ksmbd: use correct basic info level in set_file_basic_info()Namjae Jeon2-7/+15
Use correct basic info level in set/get_file_basic_info(). Reviewed-by: Ralph Boehme <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-29ksmbd: remove NTLMv1 authenticationNamjae Jeon3-229/+0
Remove insecure NTLMv1 authentication. Cc: Ronnie Sahlberg <[email protected]> Cc: Ralph Böhme <[email protected]> Reviewed-by: Tom Talpey <[email protected]> Acked-by: Steve French <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-28ksmbd: fix documentation for 2 functionsEnzo Matsumiya1-2/+2
ksmbd_kthread_fn() and create_socket() returns 0 or error code, and not task_struct/ERR_PTR. Signed-off-by: Enzo Matsumiya <[email protected]> Acked-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-28kernfs: also call kernfs_set_rev() for positive dentryHou Tao1-2/+7
A KMSAN warning is reported by Alexander Potapenko: BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 d_revalidate fs/namei.c:854 lookup_dcache fs/namei.c:1522 __lookup_hash+0x3a6/0x590 fs/namei.c:1543 filename_create+0x312/0x7c0 fs/namei.c:3657 do_mkdirat+0x103/0x930 fs/namei.c:3900 __do_sys_mkdir fs/namei.c:3931 __se_sys_mkdir fs/namei.c:3929 __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929 do_syscall_x64 arch/x86/entry/common.c:51 It seems a positive dentry in kernfs becomes a negative dentry directly through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized when accessing it in kernfs_dop_revalidate(), because it is only initialized when created as negative dentry in kernfs_iop_lookup(). The problem can be reproduced by the following command: cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi A simple fixes seems to be initializing d->d_time for positive dentry in kernfs_iop_lookup() as well. The downside is the negative dentry will be revalidated again after it becomes negative in d_delete(), because the revison of its parent must have been increased due to its removal. Alternative solution is implement .d_iput for kernfs, and assign d_time for the newly-generated negative dentry in it. But we may need to take kernfs_rwsem to protect again the concurrent kernfs_link_sibling() on the parent directory, it is a little over-killing. Now the simple fix is chosen. Link: https://marc.info/?l=linux-fsdevel&m=163249838610499 Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching") Reported-by: Alexander Potapenko <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2021-09-28Merge tag 'fsverity-for-linus' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity fix from Eric Biggers: "Fix an integer overflow when computing the Merkle tree layout of extremely large files, exposed by btrfs adding support for fs-verity" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fs-verity: fix signed integer overflow with i_size near S64_MAX
2021-09-28ovl: fix IOCB_DIRECT if underlying fs doesn't support direct IOMiklos Szeredi1-1/+14
Normally the check at open time suffices, but e.g loop device does set IOCB_DIRECT after doing its own checks (which are not sufficent for overlayfs). Make sure we don't call the underlying filesystem read/write method with the IOCB_DIRECT if it's not supported. Reported-by: Huang Jianan <[email protected]> Fixes: 16914e6fc7e1 ("ovl: add ovl_read_iter()") Cc: <[email protected]> # v4.19 Tested-by: Huang Jianan <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-09-27vboxfs: fix broken legacy mount signature checkingLinus Torvalds1-10/+2
Commit 9d682ea6bcc7 ("vboxsf: Fix the check for the old binary mount-arguments struct") was meant to fix a build error due to sign mismatch in 'char' and the use of character constants, but it just moved the error elsewhere, in that on some architectures characters and signed and on others they are unsigned, and that's just how the C standard works. The proper fix is a simple "don't do that then". The code was just being silly and odd, and it should never have cared about signed vs unsigned characters in the first place, since what it is testing is not four "characters", but four bytes. And the way to compare four bytes is by using "memcmp()". Which compilers will know to just turn into a single 32-bit compare with a constant, as long as you don't have crazy debug options enabled. Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Arnd Bergmann <[email protected]> Cc: Hans de Goede <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-27io-wq: exclusively gate signal based exit on get_signal() returnJens Axboe1-4/+1
io-wq threads block all signals, except SIGKILL and SIGSTOP. We should not need any extra checking of signal_pending or fatal_signal_pending, rely exclusively on whether or not get_signal() tells us to exit. The original debugging of this issue led to the false positive that we were exiting on non-fatal signals, but that is not the case. The issue was around races with nr_workers accounting. Fixes: 87c169665578 ("io-wq: ensure we exit if thread group is exiting") Fixes: 15e20db2e0ce ("io-wq: only exit on fatal signals") Reported-by: Eric W. Biederman <[email protected]> Reported-by: Linus Torvalds <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-26ksmbd: fix invalid request buffer access in compoundNamjae Jeon1-2/+11
Ronnie reported invalid request buffer access in chained command when inserting garbage value to NextCommand of compound request. This patch add validation check to avoid this issue. Cc: Tom Talpey <[email protected]> Cc: Ronnie Sahlberg <[email protected]> Cc: Ralph Böhme <[email protected]> Tested-by: Steve French <[email protected]> Reviewed-by: Steve French <[email protected]> Acked-by: Hyunchul Lee <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-26ksmbd: remove RFC1002 check in smb2 requestRonnie Sahlberg2-22/+1
In smb_common.c you have this function : ksmbd_smb_request() which is called from connection.c once you have read the initial 4 bytes for the next length+smb2 blob. It checks the first byte of this 4 byte preamble for valid values, i.e. a NETBIOSoverTCP SESSION_MESSAGE or a SESSION_KEEP_ALIVE. We don't need to check this for ksmbd since it only implements SMB2 over TCP port 445. The netbios stuff was only used in very old servers when SMB ran over TCP port 139. Now that we run over TCP port 445, this is actually not a NB header anymore and you can just treat it as a 4 byte length field that must be less than 16Mbyte. and remove the references to the RFC1002 constants that no longer applies. Cc: Tom Talpey <[email protected]> Cc: Ronnie Sahlberg <[email protected]> Cc: Ralph Böhme <[email protected]> Cc: Steve French <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Acked-by: Hyunchul Lee <[email protected]> Signed-off-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-26Merge tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds8-260/+164
Pull ksmbd fixes from Steve French: "Five fixes for the ksmbd kernel server, including three security fixes: - remove follow symlinks support - use LOOKUP_BENEATH to prevent out of share access - SMB3 compounding security fix - fix for returning the default streams correctly, fixing a bug when writing ppt or doc files from some clients - logging more clearly that ksmbd is experimental (at module load time)" * tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: use LOOKUP_BENEATH to prevent the out of share access ksmbd: remove follow symlinks support ksmbd: check protocol id in ksmbd_verify_smb_message() ksmbd: add default data stream name in FILE_STREAM_INFORMATION ksmbd: log that server is experimental at module load
2021-09-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-3/+8
Merge misc fixes from Andrew Morton: "16 patches. Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts, lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache, debug, and pagemap)" * emailed patches from Andrew Morton <[email protected]>: mm: fix uninitialized use in overcommit_policy_handler mm/memory_failure: fix the missing pte_unmap() call kasan: always respect CONFIG_KASAN_STACK sh: pgtable-3level: fix cast to pointer from integer of different size mm/debug: sync up latest migrate_reason to migrate_reason_names mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN mm: fs: invalidate bh_lrus for only cold path lib/zlib_inflate/inffast: check config in C to avoid unused function warning tools/vm/page-types: remove dependency on opt_file for idle page tracking scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error ocfs2: drop acl cache for directories too mm/shmem.c: fix judgment error in shmem_is_huge() xtensa: increase size of gcc stack frame check mm/damon: don't use strnlen() with known-bogus source length kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
2021-09-25Merge tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-blockLinus Torvalds2-16/+72
Pull io_uring fixes from Jens Axboe: "This one looks a bit bigger than it is, but that's mainly because 2/3 of it is enabling IORING_OP_CLOSE to close direct file descriptors. We've had a few folks using them and finding it confusing that the way to close them is through using -1 for file update, this just brings API symmetry for direct descriptors. Hence I think we should just do this now and have a better API for 5.15 release. There's some room for de-duplicating the close code, but we're leaving that for the next merge window. Outside of that, just small fixes: - Poll race fixes (Hao) - io-wq core dump exit fix (me) - Reschedule around potentially intensive tctx and buffer iterators on teardown (me) - Fix for always ending up punting files update to io-wq (me) - Put the provided buffer meta data under memcg accounting (me) - Tweak for io_write(), removing dead code that was added with the iterator changes in this release (Pavel)" * tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block: io_uring: make OP_CLOSE consistent with direct open io_uring: kill extra checks in io_write() io_uring: don't punt files update to io-wq unconditionally io_uring: put provided buffer meta data under memcg accounting io_uring: allow conditional reschedule for intensive iterators io_uring: fix potential req refcount underflow io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow io_uring: fix race between poll completion and cancel_hash insertion io-wq: ensure we exit if thread group is exiting
2021-09-25Merge tag 'erofs-for-5.15-rc3-fixes' of ↵Linus Torvalds2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "Two bugfixes to fix the 4KiB blockmap chunk format availability and a dangling pointer usage. There is also a trivial cleanup to clarify compacted_2b if compacted_4b_initial > totalidx. Summary: - fix the dangling pointer use in erofs_lookup tracepoint - fix unsupported chunk format check - zero out compacted_2b if compacted_4b_initial > totalidx" * tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: clear compacted_2b if compacted_4b_initial > totalidx erofs: fix misbehavior of unsupported chunk format check erofs: fix up erofs_lookup tracepoint
2021-09-25Merge tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds4-9/+21
Pull cifs fixes from Steve French: "Six small cifs/smb3 fixes, two for stable: - important fix for deferred close (found by a git functional test) related to attribute caching on close. - four (two cosmetic, two more serious) small fixes for problems pointed out by smatch via Dan Carpenter - fix for comment formatting problems pointed out by W=1" * tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix incorrect check for null pointer in header_assemble smb3: correct server pointer dereferencing check to be more consistent smb3: correct smb3 ACL security descriptor cifs: Clear modified attribute bit from inode flags cifs: Deal with some warnings from W=1 cifs: fix a sign extension bug
2021-09-24ksmbd: use LOOKUP_BENEATH to prevent the out of share accessHyunchul Lee5-206/+140
instead of removing '..' in a given path, call kern_path with LOOKUP_BENEATH flag to prevent the out of share access. ran various test on this: smb2-cat-async smb://127.0.0.1/homes/../out_of_share smb2-cat-async smb://127.0.0.1/homes/foo/../../out_of_share smbclient //127.0.0.1/homes -c "mkdir ../foo2" smbclient //127.0.0.1/homes -c "rename bar ../bar" Cc: Ronnie Sahlberg <[email protected]> Cc: Ralph Boehme <[email protected]> Tested-by: Steve French <[email protected]> Tested-by: Namjae Jeon <[email protected]> Acked-by: Namjae Jeon <[email protected]> Signed-off-by: Hyunchul Lee <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-24mm: fs: invalidate bh_lrus for only cold pathMinchan Kim1-2/+6
The kernel test robot reported the regression of fio.write_iops[1] with commit 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration"). Since lru_add_drain is called frequently, invalidate bh_lrus there could increase bh_lrus cache miss ratio, which needs more IO in the end. This patch moves the bh_lrus invalidation from the hot path( e.g., zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all, lru_cache_disable). Zhengjun Xing confirmed "I test the patch, the regression reduced to -2.9%" [1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/ [2] 8cc621d2f45d, mm: fs: invalidate BH LRU during page migration Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Chris Goldsworthy <[email protected]> Tested-by: "Xing, Zhengjun" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24ocfs2: drop acl cache for directories tooWengang Wang1-1/+2
ocfs2_data_convert_worker() is currently dropping any cached acl info for FILE before down-converting meta lock. It should also drop for DIRECTORY. Otherwise the second acl lookup returns the cached one (from VFS layer) which could be already stale. The problem we are seeing is that the acl changes on one node doesn't get refreshed on other nodes in the following case: Node 1 Node 2 -------------- ---------------- getfacl dir1 getfacl dir1 <-- this is OK setfacl -m u:user1:rwX dir1 getfacl dir1 <-- see the change for user1 getfacl dir1 <-- can't see change for user1 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Wengang Wang <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24io_uring: make OP_CLOSE consistent with direct openPavel Begunkov1-1/+51
From recently open/accept are now able to manipulate fixed file table, but it's inconsistent that close can't. Close the gap, keep API same as with open/accept, i.e. via sqe->file_slot. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-24ovl: fix missing negative dentry check in ovl_rename()Zheng Liang1-3/+7
The following reproducer mkdir lower upper work merge touch lower/old touch lower/new mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merge rm merge/new mv merge/old merge/new & unlink upper/new may result in this race: PROCESS A: rename("merge/old", "merge/new"); overwrite=true,ovl_lower_positive(old)=true, ovl_dentry_is_whiteout(new)=true -> flags |= RENAME_EXCHANGE PROCESS B: unlink("upper/new"); PROCESS A: lookup newdentry in new_upperdir call vfs_rename() with negative newdentry and RENAME_EXCHANGE Fix by adding the missing check for negative newdentry. Signed-off-by: Zheng Liang <[email protected]> Fixes: e9be9d5e76e3 ("overlay filesystem") Cc: <[email protected]> # v3.18 Signed-off-by: Miklos Szeredi <[email protected]>
2021-09-24Merge tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-clientLinus Torvalds1-2/+2
Pull ceph fix from Ilya Dryomov: "A fix for a potential array out of bounds access from Dan" * tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-client: ceph: fix off by one bugs in unsafe_request_wait()
2021-09-24Merge tag 'fixes_for_v5.15-rc3' of ↵Linus Torvalds2-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc filesystem fixes from Jan Kara: "A for ext2 sleep in atomic context in case of some fs problems and a cleanup of an invalidate_lock initialization" * tag 'fixes_for_v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: fix sleeping in atomic bugs on error mm: Fully initialize invalidate_lock, amend lock class later
2021-09-24io_uring: kill extra checks in io_write()Pavel Begunkov1-3/+0
We don't retry short writes and so we would never get to async setup in io_write() in that case. Thus ret2 > 0 is always false and iov_iter_advance() is never used. Apparently, the same is found by Coverity, which complains on the code. Fixes: cd65869512ab ("io_uring: use iov_iter state save/restore helpers") Reported-by: Dave Jones <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/5b33e61034748ef1022766efc0fb8854cfcf749c.1632500058.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: don't punt files update to io-wq unconditionallyJens Axboe1-5/+2
There's no reason to punt it unconditionally, we just need to ensure that the submit lock grabbing is conditional. Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: put provided buffer meta data under memcg accountingJens Axboe1-1/+1
For each provided buffer, we allocate a struct io_buffer to hold the data associated with it. As a large number of buffers can be provided, account that data with memcg. Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: allow conditional reschedule for intensive iteratorsJens Axboe1-2/+6
If we have a lot of threads and rings, the tctx list can get quite big. This is especially true if we keep creating new threads and rings. Likewise for the provided buffers list. Be nice and insert a conditional reschedule point while iterating the nodes for deletion. Link: https://lore.kernel.org/io-uring/[email protected]/ Reported-by: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: fix potential req refcount underflowHao Xu1-2/+7
For multishot mode, there may be cases like: iowq original context io_poll_add _arm_poll() mask = vfs_poll() is not 0 if mask (2) io_poll_complete() compl_unlock (interruption happens tw queued to original context) io_poll_task_func() compl_lock (3) done = io_poll_complete() is true compl_unlock put req ref (1) if (poll->flags & EPOLLONESHOT) put req ref EPOLLONESHOT flag in (1) may be from (2) or (3), so there are multiple combinations that can cause ref underfow. Let's address it by: - check the return value in (2) as done - change (1) to if (done) in this way, we only do ref put in (1) if 'oneshot flag' is from (2) - do poll.done check in io_poll_task_func(), so that we won't put ref for the second time. Signed-off-by: Hao Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: fix missing set of EPOLLONESHOT for CQ ring overflowHao Xu1-1/+3
We should set EPOLLONESHOT if cqring_fill_event() returns false since io_poll_add() decides to put req or not by it. Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow") Signed-off-by: Hao Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: fix race between poll completion and cancel_hash insertionHao Xu1-3/+3
If poll arming and poll completion runs in parallel, there maybe races. For instance, run io_poll_add in iowq and io_poll_task_func in original context, then: iowq original context io_poll_add vfs_poll (interruption happens tw queued to original context) io_poll_task_func generate cqe del from cancel_hash[] if !poll.done insert to cancel_hash[] The entry left in cancel_hash[], similar case for fast poll. Fix it by set poll.done = true when del from cancel_hash[]. Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow") Signed-off-by: Hao Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io-wq: ensure we exit if thread group is exitingJens Axboe1-1/+2
Dave reports that a coredumping workload gets stuck in 5.15-rc2, and identified the culprit in the Fixes line below. The problem is that relying solely on fatal_signal_pending() to gate whether to exit or not fails miserably if a process gets eg SIGILL sent. Don't exclusively rely on fatal signals, also check if the thread group is exiting. Fixes: 15e20db2e0ce ("io-wq: only exit on fatal signals") Reported-by: Dave Chinner <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-24fs/ntfs3: Refactoring lock in ntfs_init_aclKonstantin Komarov1-41/+14
This is possible because of moving lock into ntfs_create_inode. Reviewed-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-24fs/ntfs3: Change posix_acl_equiv_mode to posix_acl_update_modeKonstantin Komarov1-11/+4
Right now ntfs3 uses posix_acl_equiv_mode instead of posix_acl_update_mode like all other fs. Reviewed-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-24fs/ntfs3: Pass flags to ntfs_set_ea in ntfs_set_acl_exKonstantin Komarov1-2/+7
In case of removing of xattr there must be XATTR_REPLACE flag and zero length. We already check XATTR_REPLACE in ntfs_set_ea, so now we pass XATTR_REPLACE to ntfs_set_ea. Reviewed-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-24fs/ntfs3: Refactor ntfs_get_acl_ex for better readabilityKonstantin Komarov1-3/+6
We can safely move set_cached_acl because it works with NULL acl too. Reviewed-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-24fs/ntfs3: Move ni_lock_dir and ni_unlock into ntfs_create_inodeKonstantin Komarov2-23/+14
Now ntfs3 locks mutex for smaller time. Theoretically in successful cases those locks aren't needed at all. But proving the same for error cases is difficult. So instead of removing them we just move them. Reviewed-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>