aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-09-17Merge tag 'iov_iter.3-5.15-2021-09-17' of git://git.kernel.dk/linux-blockLinus Torvalds1-40/+76
Pull io_uring iov_iter retry fixes from Jens Axboe: "This adds a helper to save/restore iov_iter state, and modifies io_uring to use it. After that is done, we can now kill the iter->truncated addition that we added for this release. The io_uring change is being overly cautious with the save/restore/advance, but better safe than sorry and we can always improve that and reduce the overhead if it proves to be of concern. The only case to be worried about in this regard is huge IO, where iteration can take a while to iterate segments. I spent some time writing test cases, and expanded the coverage quite a bit from the last posting of this. liburing carries this regression test case now: https://git.kernel.dk/cgit/liburing/tree/test/file-verify.c which exercises all of this. It now also supports provided buffers, and explicitly tests for end-of-file/device truncation as well. On top of that, Pavel sanitized the IOPOLL retry path to follow the exact same pattern as normal IO" * tag 'iov_iter.3-5.15-2021-09-17' of git://git.kernel.dk/linux-block: io_uring: move iopoll reissue into regular IO path Revert "iov_iter: track truncated size" io_uring: use iov_iter state save/restore helpers iov_iter: add helper to save iov_iter state
2021-09-17Merge tag 'io_uring-5.15-2021-09-17' of git://git.kernel.dk/linux-blockLinus Torvalds2-51/+81
Pull io_uring fixes from Jens Axboe: "Mostly fixes for regressions in this cycle, but also a few fixes that predate this release. The odd one out is a tweak to the direct files added in this release, where attempting to reuse a slot is allowed instead of needing an explicit removal of that slot first. It's a considerable improvement in usability to that API, hence I'm sending it for -rc2. - io-wq race fix and cleanup (Hao) - loop_rw_iter() type fix - SQPOLL max worker race fix - Allow poll arm for O_NONBLOCK files, fixing a case where it's impossible to properly use io_uring if you cannot modify the file flags - Allow direct open to simply reuse a slot, instead of needing it explicitly removed first (Pavel) - Fix a case where we missed signal mask restoring in cqring_wait, if we hit -EFAULT (Xiaoguang)" * tag 'io_uring-5.15-2021-09-17' of git://git.kernel.dk/linux-block: io_uring: allow retry for O_NONBLOCK if async is supported io_uring: auto-removal for direct open/accept io_uring: fix missing sigmask restore in io_cqring_wait() io_uring: pin SQPOLL data before unlocking ring lock io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items io-wq: fix potential race of acct->nr_workers io-wq: code clean of io_wqe_create_worker() io_uring: ensure symmetry in handling iter types in loop_rw_iter()
2021-09-17nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWNDai Ngo1-3/+13
When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client recovers by sending BIND_CONN_TO_SESSION but the server fails to recover the back channel and leaves it as NFSD4_CB_DOWN. Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel by calling nfsd4_probe_callback. Signed-off-by: Dai Ngo <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2021-09-17NLM: Fix svcxdr_encode_owner()Chuck Lever1-11/+2
Dai Ngo reports that, since the XDR overhaul, the NLM server crashes when the TEST procedure wants to return NLM_DENIED. There is a bug in svcxdr_encode_owner() that none of our standard test cases found. Replace the open-coded function with a call to an appropriate pre-fabricated XDR helper. Reported-by: Dai Ngo <[email protected]> Fixes: a6a63ca5652e ("lockd: Common NLM XDR helpers") Signed-off-by: Chuck Lever <[email protected]>
2021-09-17ksmbd: transport_rdma: Don't include rwlock.h directlyMike Galbraith1-1/+0
rwlock.h specifically asks to not be included directly. In fact, the proper spinlock.h include isn't needed either, it comes with the huge pile that kthread.h ends up pulling in, so just drop it entirely. Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-17mm: Fully initialize invalidate_lock, amend lock class laterSebastian Andrzej Siewior1-2/+4
The function __init_rwsem() is not part of the official API, it just a helper function used by init_rwsem(). Changing the lock's class and name should be done by using lockdep_set_class_and_name() after the has been fully initialized. The overhead of the additional class struct and setting it twice is negligible and it works across all locks. Fully initialize the lock with init_rwsem() and then set the custom class and name for the lock. Fixes: 730633f0b7f95 ("mm: Protect operations adding pages to page cache with invalidate_lock") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2021-09-16fs/ntfs3: Use min/max macros instated of ternary operatorsKari Argillander3-9/+11
We can make code little bit more readable by using min/max macros. These were found with Coccinelle. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-16fs/ntfs3: Use clamp/max macros instead of comparisonsKari Argillander1-5/+3
We can make code little more readable by using kernel macros clamp/max. This were found with kernel included Coccinelle minmax script. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-16fs/ntfs3: Remove always false condition checkKari Argillander1-4/+1
We do not need this check as this is same thing as NTFS_MIN_MFT_ZONE > zlen. We already check NTFS_MIN_MFT_ZONE <= zlen and exit because is too big request. Remove it so code is cleaner. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-16fs/ntfs3: Fix ntfs_look_for_free_space() does only report -ENOSPCKari Argillander1-24/+27
If ntfs_refresh_zone() returns error it will be changed to -ENOSPC. It is not right. Also caller of this functions also check other errors. Fixes: 78ab59fee07f ("fs/ntfs3: Rework file operations") Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-16fs/ntfs3: Remove tabs before spaces from commentKari Argillander1-1/+1
Remove tabs before spaces from comment as recommended by kernel coding style. Checkpatch also warn about these. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-16fs/ntfs3: Remove braces from single statment blockKari Argillander1-2/+1
Remove braces from single statment block as they are not needed. Also Linux kernel coding style guide recommend this and checkpatch warn about this. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-16fs/ntfs3: Place Comparisons constant right side of the testKari Argillander1-1/+1
For better code readability place constant always right side of the test. This will also address checkpatch warning. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-16fs/ntfs3: Remove '+' before constant in ni_insert_resident()Kari Argillander1-1/+1
No need for plus sign here. So remove it. Checkpatch will also be happy. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-15qnx4: avoid stringop-overread errorsLinus Torvalds1-17/+34
The qnx4 directory entries are 64-byte blocks that have different contents depending on the a status byte that is in the last byte of the block. In particular, a directory entry can be either a "link info" entry with a 48-byte name and pointers to the real inode information, or an "inode entry" with a smaller 16-byte name and the full inode information. But the code was written to always just treat the directory name as if it was part of that "inode entry", and just extend the name to the longer case if the status byte said it was a link entry. That work just fine and gives the right results, but now that gcc is tracking data structure accesses much more, the code can trigger a compiler error about using up to 48 bytes (the long name) in a structure that only has that shorter name in it: fs/qnx4/dir.c: In function ‘qnx4_readdir’: fs/qnx4/dir.c:51:32: error: ‘strnlen’ specified bound 48 exceeds source size 16 [-Werror=stringop-overread] 51 | size = strnlen(de->di_fname, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from fs/qnx4/qnx4.h:3, from fs/qnx4/dir.c:16: include/uapi/linux/qnx4_fs.h:45:25: note: source object declared here 45 | char di_fname[QNX4_SHORT_NAME_MAX]; | ^~~~~~~~ which is because the source code doesn't really make this whole "one of two different types" explicit. Fix this by introducing a very explicit union of the two types, and basically explaining to the compiler what is really going on. Signed-off-by: Linus Torvalds <[email protected]>
2021-09-15io_uring: move iopoll reissue into regular IO pathPavel Begunkov1-19/+15
230d50d448acb ("io_uring: move reissue into regular IO path") made non-IOPOLL I/O to not retry from ki_complete handler. Follow it steps and do the same for IOPOLL. Same problems, same implementation, same -EAGAIN assumptions. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/f80dfee2d5fa7678f0052a8ab3cfca9496a112ca.1631699928.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-09-15io_uring: use iov_iter state save/restore helpersJens Axboe1-21/+61
Get rid of the need to do re-expand and revert on an iterator when we encounter a short IO, or failure that warrants a retry. Use the new state save/restore helpers instead. We keep the iov_iter_state persistent across retries, if we need to restart the read or write operation. If there's a pending retry, the operation will always exit with the state correctly saved. Signed-off-by: Jens Axboe <[email protected]>
2021-09-14io_uring: allow retry for O_NONBLOCK if async is supportedJens Axboe1-5/+11
A common complaint is that using O_NONBLOCK files with io_uring can be a bit of a pain. Be a bit nicer and allow normal retry IFF the file does support async behavior. This makes it possible to use io_uring more reliably with O_NONBLOCK files, for use cases where it either isn't possible or feasible to modify the file flags. Cc: [email protected] Reported-and-tested-by: Dan Melnic <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-14io_uring: auto-removal for direct open/acceptPavel Begunkov1-18/+34
It might be inconvenient that direct open/accept deviates from the update semantics and fails if the slot is taken instead of removing a file sitting there. Implement this auto-removal. Note that removal might need to allocate and so may fail. However, if an empty slot is specified, it's guaraneed to not fail on the fd installation side for valid userspace programs. It's needed for users who can't tolerate such failures, e.g. accept where the other end never retries. Suggested-by: Franz-B. Tuneke <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/c896f14ea46b0eaa6c09d93149e665c2c37979b4.1631632300.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-09-14io_uring: fix missing sigmask restore in io_cqring_wait()Xiaoguang Wang1-8/+8
Move get_timespec() section in io_cqring_wait() before the sigmask saving, otherwise we'll fail to restore sigmask once get_timespec() returns error. Fixes: c73ebb685fb6 ("io_uring: add timeout support for io_uring_enter()") Signed-off-by: Xiaoguang Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-13io_uring: pin SQPOLL data before unlocking ring lockJens Axboe1-3/+9
We need to re-check sqd->thread after we've dropped the lock. Pin the sqd before doing the lockdep lock dance, and check if the thread is alive after that. It's either NULL or alive, as the SQPOLL thread cannot exit without holding the same sqd->lock. Reported-and-tested-by: [email protected] Fixes: fa84693b3c89 ("io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL") Signed-off-by: Jens Axboe <[email protected]>
2021-09-13cifs: fix incorrect kernel doc commentsSteve French2-3/+11
Correct kernel-doc comments pointed out by the automated kernel test robot. Reported-by: kernel test robot <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-13cifs: remove pathname for file from SPDX headerSteve French47-48/+8
checkpatch complains about source files with filenames (e.g. in these cases just below the SPDX header in comments at the top of various files in fs/cifs). It also is helpful to change this now so will be less confusing when the parent directory is renamed e.g. from fs/cifs to fs/smb_client (or fs/smbfs) Reviewed-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-13fs/ntfs3: Always use binary search with entry searchKari Argillander2-47/+6
We do not have any reason to keep old linear search in. Before this was used for error path or if table was so big that it cannot be allocated. Current binary search implementation won't need error path. Remove old references to linear entry search. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Make binary search to search smaller chunks in beginningKari Argillander1-2/+5
We could try to optimize algorithm to first fill just small table and after that use bigger table all the way up to ARRAY_SIZE(offs). This way we can use bigger search array, but not lose benefits with entry count smaller < ARRAY_SIZE(offs). Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Limit binary search table sizeKari Argillander1-69/+41
Current binary search allocates memory for table and fill whole table before we start actual binary search. This is quite inefficient because table fill will always be O(n). Also if table is huge we need to reallocate memory which is costly. This implementation use just stack memory and always when table is full we will check if last element is <= and if not start table fill again. The idea was that it would be same cost as table reallocation. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Remove unneeded header files from c filesKari Argillander15-40/+0
We have lot of unnecessary headers in these files. Remove them so that we help compiler a little bit. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Change right headers to lznt.cKari Argillander1-5/+5
There is lot of headers which we do not need in this file. Delete them and add what we really need. Here is list which identify why we need this header. <linux/kernel.h> // min() <linux/slab.h> // kzalloc() <linux/stddef.h> // offsetof() <linux/string.h> // memcpy(), memset() <linux/types.h> // u8, size_t, etc. "debug.h" // PtrOffset() Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Change right headers to upcase.cKari Argillander1-6/+2
There is no headers. They will be included through ntfs_fs.c, but that is not right thing to do. Let's include headers what this file need straight away. types.h is needed for __le16, u8 etc. kernel.h is needed for le16_to_cpu() Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Change right headers to bitfunc.cKari Argillander1-6/+1
We only need linux/types.h for types like u8 etc. So we can remove rest and help compiler a little bit. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Add missing header and guards to lib/ headersKari Argillander2-0/+11
size_t needs header. Add missing header guards so that compiler will only include these ones. Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Add missing headers and forward declarations to ntfs_fs.hKari Argillander1-0/+31
We do not have headers at all in this file. We should have them so that not every .c file needs to include all of the stuff which this file need for building. This way we can remove some headers from other files and get better picture what is needed. This can save some compilation time. And this can help if we sometimes want to separate this one big header. Also use forward declarations for structs and enums when it not included straight with include and it is used in function declarations input. This will prevent possible compiler warning: xxx declared inside parameter list will not be visible outside of this definition or declaration Here is list which I made when parsing this. There is not necessarily all example from this header file, but this just proofs we need it. <linux/blkdev.h> SECTOR_SHIFT <linux/buffer_head.h> sb_bread(), put_bh <linux/cleancache.h> put_page() <linux/fs.h> struct inode (Just struct ntfs_inode need it) <linux/highmem.h> kunmap(), kmap() <linux/kernel.h> cpu_to_leXX() ALIGN <linux/mm.h> kvfree() <linux/mutex.h> struct mutex, mutex_(un/try)lock() <linux/page-flags.h> PageError() <linux/pagemap.h> read_mapping_page() <linux/rbtree.h> struct rb_root <linux/rwsem.h> struct rw_semaphore <linux/slab.h> krfree(), kzalloc() <linux/string.h> memset() <linux/time64.h> struct timespec64 <linux/types.h> uXX, __leXX <linux/uidgid.h> kuid_t, kgid_t <asm/div64.h> do_div() <asm/page.h> PAGE_SIZE "debug.h" ntfs_err() (Just one entry. Maybe we can drop this) "ntfs.h" Do you even ask? Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Add missing header files to ntfs.hKari Argillander1-0/+9
We do not have header files at all in this file. Add following headers and there is also explanation which for it was added. Note that explanation might not be complete, but it just proofs it is needed. <linux/blkdev.h> // SECTOR_SHIFT <linux/build_bug.h> // static_assert() <linux/kernel.h> // cpu_to_le64, cpu_to_le32, ALIGN <linux/stddef.h> // offsetof() <linux/string.h> // memcmp() <linux/types.h> //__le32, __le16 "debug.h" // PtrOffset(), Add2Ptr() Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3. Add forward declarations for structs to debug.hKari Argillander1-0/+3
Add forward declarations for structs so that we can include this file without warnings even without linux/fs.h Signed-off-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13fs/ntfs3: Remove redundant initialization of variable errColin Ian King1-1/+1
The variable err is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Kari Argillander <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]>
2021-09-13io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg itemsEugene Syromiatnikov1-0/+5
The items passed in the array pointed by the arg parameter of IORING_REGISTER_IOWQ_MAX_WORKERS io_uring_register operation carry certain semantics: they refer to different io-wq worker categories; provide IO_WQ_* constants in the UAPI, so these categories can be referenced in the user space code. Suggested-by: Jens Axboe <[email protected]> Complements: 2e480058ddc21ec5 ("io-wq: provide a way to limit max number of workers") Signed-off-by: Eugene Syromiatnikov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-13afs: Fix updating of i_blocks on file/dir extensionDavid Howells4-13/+13
When an afs file or directory is modified locally such that the total file size is extended, i_blocks needs to be recalculated too. Fix this by making afs_write_end() and afs_edit_dir_add() call afs_set_i_size() rather than setting inode->i_size directly as that also recalculates inode->i_blocks. This can be tested by creating and writing into directories and files and then examining them with du. Without this change, directories show a 4 blocks (they start out at 2048 bytes) and files show 0 blocks; with this change, they should show a number of blocks proportional to the file size rounded up to 1024. Fixes: 31143d5d515e ("AFS: implement basic file write support") Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") Reported-by: Markus Suvanto <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Marc Dionne <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://lore.kernel.org/r/163113612442.352844.11162345591911691150.stgit@warthog.procyon.org.uk/
2021-09-13afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS serverDavid Howells5-12/+49
AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and Linux's afs client switches between them when talking to a non-YFS server if the read size, the file position or the sum of the two have the upper 32 bits set of the 64-bit value. This is a problem, however, since the file position and length fields of FS.FetchData are *signed* 32-bit values. Fix this by capturing the capability bits obtained from the fileserver when it's sent an FS.GetCapabilities RPC, rather than just discarding them, and then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be used to decide whether to use FS.FetchData or FS.FetchData64 - and also FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to switch on the parameter values. This capabilities flag could also be used to limit the maximum size of the file, but all servers must be checked for that. Note that the issue does not exist with FS.StoreData - that uses *unsigned* 32-bit values. It's also not a problem with Auristor servers as its YFS.FetchData64 op uses unsigned 64-bit values. This can be tested by cloning a git repo through an OpenAFS client to an OpenAFS server and then doing "git status" on it from a Linux afs client[1]. Provided the clone has a pack file that's in the 2G-4G range, the git status will show errors like: error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index This can be observed in the server's FileLog with something like the following appearing: Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001 Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001 Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154 Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866 ... Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5 Note the file position of 18446744071815340032. This is the requested file position sign-extended. Fixes: b9b1f8d5930a ("AFS: write support fixes") Reported-by: Markus Suvanto <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Marc Dionne <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1] Link: https://lore.kernel.org/r/[email protected]/
2021-09-13afs: Try to avoid taking RCU read lock when checking vnode validityDavid Howells4-45/+48
Try to avoid taking the RCU read lock when checking the validity of a vnode's callback state. The only thing it's needed for is to pin the parent volume's server list whilst we search it to find the record of the server we're currently using to see if it has been reinitialised (ie. it sent us a CB.InitCallBackState* RPC). Do this by the following means: (1) Keep an additional per-cell counter (fs_s_break) that's incremented each time any of the fileservers in the cell reinitialises. Since the new counter can be accessed without RCU from the vnode, we can check that first - and only if it differs, get the RCU read lock and check the volume's server list. (2) Replace afs_get_s_break_rcu() with afs_check_server_good() which now indicates whether the callback promise is still expected to be present on the server. This does the checks as described in (1). (3) Restructure afs_check_validity() to take account of the change in (2). We can also get rid of the valid variable and just use the need_clear variable with the addition of the afs_cb_break_no_promise reason. (4) afs_check_validity() probably shouldn't be altering vnode->cb_v_break and vnode->cb_s_break when it doesn't have cb_lock exclusively locked. Move the change to vnode->cb_v_break to __afs_break_callback(). Delegate the change to vnode->cb_s_break to afs_select_fileserver() and set vnode->cb_fs_s_break there also. (5) afs_validate() no longer needs to get the RCU read lock around its call to afs_check_validity() - and can skip the call entirely if we don't have a promise. Signed-off-by: David Howells <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://lore.kernel.org/r/163111669583.283156.1397603105683094563.stgit@warthog.procyon.org.uk/
2021-09-13afs: Fix mmap coherency vs 3rd-party changesDavid Howells6-3/+119
Fix the coherency management of mmap'd data such that 3rd-party changes become visible as soon as possible after the callback notification is delivered by the fileserver. This is done by the following means: (1) When we break a callback on a vnode specified by the CB.CallBack call from the server, we queue a work item (vnode->cb_work) to go and clobber all the PTEs mapping to that inode. This causes the CPU to trip through the ->map_pages() and ->page_mkwrite() handlers if userspace attempts to access the page(s) again. (Ideally, this would be done in the service handler for CB.CallBack, but the server is waiting for our reply before considering, and we have a list of vnodes, all of which need breaking - and the process of getting the mmap_lock and stripping the PTEs on all CPUs could be quite slow.) (2) Call afs_validate() from the ->map_pages() handler to check to see if the file has changed and to get a new callback promise from the server. Also handle the fileserver telling us that it's dropping all callbacks, possibly after it's been restarted by sending us a CB.InitCallBackState* call by the following means: (3) Maintain a per-cell list of afs files that are currently mmap'd (cell->fs_open_mmaps). (4) Add a work item to each server that is invoked if there are any open mmaps when CB.InitCallBackState happens. This work item goes through the aforementioned list and invokes the vnode->cb_work work item for each one that is currently using this server. This causes the PTEs to be cleared, causing ->map_pages() or ->page_mkwrite() to be called again, thereby calling afs_validate() again. I've chosen to simply strip the PTEs at the point of notification reception rather than invalidate all the pages as well because (a) it's faster, (b) we may get a notification for other reasons than the data being altered (in which case we don't want to clobber the pagecache) and (c) we need to ask the server to find out - and I don't want to wait for the reply before holding up userspace. This was tested using the attached test program: #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <sys/mman.h> int main(int argc, char *argv[]) { size_t size = getpagesize(); unsigned char *p; bool mod = (argc == 3); int fd; if (argc != 2 && argc != 3) { fprintf(stderr, "Format: %s <file> [mod]\n", argv[0]); exit(2); } fd = open(argv[1], mod ? O_RDWR : O_RDONLY); if (fd < 0) { perror(argv[1]); exit(1); } p = mmap(NULL, size, mod ? PROT_READ|PROT_WRITE : PROT_READ, MAP_SHARED, fd, 0); if (p == MAP_FAILED) { perror("mmap"); exit(1); } for (;;) { if (mod) { p[0]++; msync(p, size, MS_ASYNC); fsync(fd); } printf("%02x", p[0]); fflush(stdout); sleep(1); } } It runs in two modes: in one mode, it mmaps a file, then sits in a loop reading the first byte, printing it and sleeping for a second; in the second mode it mmaps a file, then sits in a loop incrementing the first byte and flushing, then printing and sleeping. Two instances of this program can be run on different machines, one doing the reading and one doing the writing. The reader should see the changes made by the writer, but without this patch, they aren't because validity checking is being done lazily - only on entry to the filesystem. Testing the InitCallBackState change is more complicated. The server has to be taken offline, the saved callback state file removed and then the server restarted whilst the reading-mode program continues to run. The client machine then has to poke the server to trigger the InitCallBackState call. Signed-off-by: David Howells <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://lore.kernel.org/r/163111668833.283156.382633263709075739.stgit@warthog.procyon.org.uk/
2021-09-13afs: Fix incorrect triggering of sillyrename on 3rd-party invalidationDavid Howells1-39/+7
The AFS filesystem is currently triggering the silly-rename cleanup from afs_d_revalidate() when it sees that a dentry has been changed by a third party[1]. It should not be doing this as the cleanup includes deleting the silly-rename target file on iput. Fix this by removing the places in the d_revalidate handling that validate anything other than the directory and the dirent. It probably should not be looking to validate the target inode of the dentry also. This includes removing the point in afs_d_revalidate() where the inode that a dentry used to point to was marked as being deleted (AFS_VNODE_DELETED). We don't know it got deleted. It could have been renamed or it could have hard links remaining. This was reproduced by cloning a git repo onto an afs volume on one machine, switching to another machine and doing "git status", then switching back to the first and doing "git status". The second status would show weird output due to ".git/index" getting deleted by the above mentioned mechanism. A simpler way to do it is to do: machine 1: touch a machine 2: touch b; mv -f b a machine 1: stat a on an afs volume. The bug shows up as the stat failing with ENOENT and the file server log showing that machine 1 deleted "a". Fixes: 79ddbfa500b3 ("afs: Implement sillyrename for unlink and rename") Reported-by: Markus Suvanto <[email protected]> Signed-off-by: David Howells <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c4 [1] Link: https://lore.kernel.org/r/163111668100.283156.3851669884664475428.stgit@warthog.procyon.org.uk/
2021-09-13afs: Add missing vnode validation checksDavid Howells3-3/+41
afs_d_revalidate() should only be validating the directory entry it is given and the directory to which that belongs; it shouldn't be validating the inode/vnode to which that dentry points. Besides, validation need to be done even if we don't call afs_d_revalidate() - which might be the case if we're starting from a file descriptor. In order for afs_d_revalidate() to be fixed, validation points must be added in some other places. Certain directory operations, such as afs_unlink(), already check this, but not all and not all file operations either. Note that the validation of a vnode not only checks to see if the attributes we have are correct, but also gets a promise from the server to notify us if that file gets changed by a third party. Add the following checks: - Check the vnode we're going to make a hard link to. - Check the vnode we're going to move/rename. - Check the vnode we're going to read from. - Check the vnode we're going to write to. - Check the vnode we're going to sync. - Check the vnode we're going to make a mapped page writable for. Some of these aren't strictly necessary as we're going to perform a server operation that might get the attributes anyway from which we can determine if something changed - though it might not get us a callback promise. Signed-off-by: David Howells <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://lore.kernel.org/r/163111667354.283156.12720698333342917516.stgit@warthog.procyon.org.uk/
2021-09-12io-wq: fix potential race of acct->nr_workersHao Xu1-2/+1
Given max_worker is 1, and we currently have 1 running and it is exiting. There may be race like: io_wqe_enqueue worker1 no work there and timeout unlock(wqe->lock) ->insert work -->io_worker_exit lock(wqe->lock) ->if(!nr_workers) //it's still 1 unlock(wqe->lock) goto run_cancel lock(wqe->lock) nr_workers-- ->dec_running ->worker creation fails unlock(wqe->lock) We enqueued one work but there is no workers, causes hung. Signed-off-by: Hao Xu <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-12io-wq: code clean of io_wqe_create_worker()Hao Xu1-12/+7
Remove do_create to save a local variable. Signed-off-by: Hao Xu <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-12io_uring: ensure symmetry in handling iter types in loop_rw_iter()Jens Axboe1-3/+6
When setting up the next segment, we check what type the iter is and handle it accordingly. However, when incrementing and processed amount we do not, and both iter advance and addr/len are adjusted, regardless of type. Split the increment side just like we do on the setup side. Fixes: 4017eb91a9e7 ("io_uring: make loop_rw_iter() use original user supplied pointers") Cc: [email protected] Reported-by: Valentina Palmiotti <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-12Merge branch 'misc.namei' of ↵Linus Torvalds2-59/+58
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull namei updates from Al Viro: "Clearing fallout from mkdirat in io_uring series. The fix in the kern_path_locked() patch plus associated cleanups" * 'misc.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: putname(): IS_ERR_OR_NULL() is wrong here namei: Standardize callers of filename_create() namei: Standardize callers of filename_lookup() rename __filename_parentat() to filename_parentat() namei: Fix use after free in kern_path_locked
2021-09-12Merge tag '5.15-rc-cifs-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds12-21/+37
Pull smbfs updates from Steve French: "cifs/smb3 updates: - DFS reconnect fix - begin creating common headers for server and client - rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb (smb3 or smbfs is more accurate, as the very old cifs dialect has long been superseded by smb3 dialects). In the future we can rename the fs/cifs directory to fs/smbfs. This does not include the set of multichannel fixes nor the two deferred close fixes (they are still being reviewed and tested)" * tag '5.15-rc-cifs-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: properly invalidate cached root handle when closing it cifs: move SMB FSCTL definitions to common code cifs: rename cifs_common to smbfs_common cifs: update FSCTL definitions
2021-09-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-0/+6
Pull virtio updates from Michael Tsirkin: - vduse driver ("vDPA Device in Userspace") supporting emulated virtio block devices - virtio-vsock support for end of record with SEQPACKET - vdpa: mac and mq support for ifcvf and mlx5 - vdpa: management netlink for ifcvf - virtio-i2c, gpio dt bindings - misc fixes and cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits) Documentation: Add documentation for VDUSE vduse: Introduce VDUSE - vDPA Device in Userspace vduse: Implement an MMU-based software IOTLB vdpa: Support transferring virtual addressing during DMA mapping vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() vhost-iotlb: Add an opaque pointer for vhost IOTLB vhost-vdpa: Handle the failure of vdpa_reset() vdpa: Add reset callback in vdpa_config_ops vdpa: Fix some coding style issues file: Export receive_fd() to modules eventfd: Export eventfd_wake_count to modules iova: Export alloc_iova_fast() and free_iova_fast() virtio-blk: remove unneeded "likely" statements virtio-balloon: Use virtio_find_vqs() helper vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro vsock_test: update message bounds test for MSG_EOR af_vsock: rename variables in receive loop virtio/vsock: support MSG_EOR bit processing vhost/vsock: support MSG_EOR bit processing ...
2021-09-11Merge tag 'io_uring-5.15-2021-09-11' of git://git.kernel.dk/linux-blockLinus Torvalds2-13/+44
Pull io_uring fixes from Jens Axboe: - Fix an off-by-one in a BUILD_BUG_ON() check. Not a real issue right now as we have plenty of flags left, but could become one. (Hao) - Fix lockdep issue introduced in this merge window (me) - Fix a few issues with the worker creation (me, Pavel, Qiang) - Fix regression with wq_has_sleeper() for IOPOLL (Pavel) - Timeout link error propagation fix (Pavel) * tag 'io_uring-5.15-2021-09-11' of git://git.kernel.dk/linux-block: io_uring: fix off-by-one in BUILD_BUG_ON check of __REQ_F_LAST_BIT io_uring: fail links of cancelled timeouts io-wq: fix memory leak in create_io_worker() io-wq: fix silly logic error in io_task_work_match() io_uring: drop ctx->uring_lock before acquiring sqd->lock io_uring: fix missing mb() before waitqueue_active io-wq: fix cancellation on create-worker failure
2021-09-11Merge tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-blockLinus Torvalds3-1697/+2
Pull block fixes from Jens Axboe: - NVMe pull request from Christoph: - fix nvmet command set reporting for passthrough controllers (Adam Manzanares) - update a MAINTAINERS email address (Chaitanya Kulkarni) - set QUEUE_FLAG_NOWAIT for nvme-multipth (me) - handle errors from add_disk() (Luis Chamberlain) - update the keep alive interval when kato is modified (Tatsuya Sasaki) - fix a buffer overrun in nvmet_subsys_attr_serial (Hannes Reinecke) - do not reset transport on data digest errors in nvme-tcp (Daniel Wagner) - only call synchronize_srcu when clearing current path (Daniel Wagner) - revalidate paths during rescan (Hannes Reinecke) - Split out the fs/block_dev into block/fops.c and block/bdev.c, which has been long overdue. Do this now before -rc1, to avoid annoying conflicts due to this (Christoph) - blk-throtl use-after-free fix (Li) - Improve plug depth for multi-device plugs, greatly increasing md resync performance (Song) - blkdev_show() locking fix (Tetsuo) - n64cart error check fix (Yang) * tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-block: n64cart: fix return value check in n64cart_probe() blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queues block: move fs/block_dev.c to block/bdev.c block: split out operations on block special files blk-throttle: fix UAF by deleteing timer in blk_throtl_exit() block: genhd: don't call blkdev_show() with major_names_lock held nvme: update MAINTAINERS email address nvme: add error handling support for add_disk() nvme: only call synchronize_srcu when clearing current path nvme: update keep alive interval when kato is modified nvme-tcp: Do not reset transport on data digest errors nvmet: fixup buffer overrun in nvmet_subsys_attr_serial() nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req nvmet: looks at the passthrough controller when initializing CAP nvme: move nvme_multi_css into nvme.h nvme-multipath: revalidate paths during rescan nvme-multipath: set QUEUE_FLAG_NOWAIT