Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull io_uring iov_iter retry fixes from Jens Axboe:
"This adds a helper to save/restore iov_iter state, and modifies
io_uring to use it.
After that is done, we can now kill the iter->truncated addition that
we added for this release. The io_uring change is being overly
cautious with the save/restore/advance, but better safe than sorry and
we can always improve that and reduce the overhead if it proves to be
of concern. The only case to be worried about in this regard is huge
IO, where iteration can take a while to iterate segments.
I spent some time writing test cases, and expanded the coverage quite
a bit from the last posting of this. liburing carries this regression
test case now:
https://git.kernel.dk/cgit/liburing/tree/test/file-verify.c
which exercises all of this. It now also supports provided buffers,
and explicitly tests for end-of-file/device truncation as well.
On top of that, Pavel sanitized the IOPOLL retry path to follow the
exact same pattern as normal IO"
* tag 'iov_iter.3-5.15-2021-09-17' of git://git.kernel.dk/linux-block:
io_uring: move iopoll reissue into regular IO path
Revert "iov_iter: track truncated size"
io_uring: use iov_iter state save/restore helpers
iov_iter: add helper to save iov_iter state
|
|
Pull io_uring fixes from Jens Axboe:
"Mostly fixes for regressions in this cycle, but also a few fixes that
predate this release.
The odd one out is a tweak to the direct files added in this release,
where attempting to reuse a slot is allowed instead of needing an
explicit removal of that slot first. It's a considerable improvement
in usability to that API, hence I'm sending it for -rc2.
- io-wq race fix and cleanup (Hao)
- loop_rw_iter() type fix
- SQPOLL max worker race fix
- Allow poll arm for O_NONBLOCK files, fixing a case where it's
impossible to properly use io_uring if you cannot modify the file
flags
- Allow direct open to simply reuse a slot, instead of needing it
explicitly removed first (Pavel)
- Fix a case where we missed signal mask restoring in cqring_wait, if
we hit -EFAULT (Xiaoguang)"
* tag 'io_uring-5.15-2021-09-17' of git://git.kernel.dk/linux-block:
io_uring: allow retry for O_NONBLOCK if async is supported
io_uring: auto-removal for direct open/accept
io_uring: fix missing sigmask restore in io_cqring_wait()
io_uring: pin SQPOLL data before unlocking ring lock
io-wq: provide IO_WQ_* constants for IORING_REGISTER_IOWQ_MAX_WORKERS arg items
io-wq: fix potential race of acct->nr_workers
io-wq: code clean of io_wqe_create_worker()
io_uring: ensure symmetry in handling iter types in loop_rw_iter()
|
|
When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
recovers by sending BIND_CONN_TO_SESSION but the server fails to recover
the back channel and leaves it as NFSD4_CB_DOWN.
Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
by calling nfsd4_probe_callback.
Signed-off-by: Dai Ngo <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
Dai Ngo reports that, since the XDR overhaul, the NLM server crashes
when the TEST procedure wants to return NLM_DENIED. There is a bug
in svcxdr_encode_owner() that none of our standard test cases found.
Replace the open-coded function with a call to an appropriate
pre-fabricated XDR helper.
Reported-by: Dai Ngo <[email protected]>
Fixes: a6a63ca5652e ("lockd: Common NLM XDR helpers")
Signed-off-by: Chuck Lever <[email protected]>
|
|
rwlock.h specifically asks to not be included directly.
In fact, the proper spinlock.h include isn't needed either,
it comes with the huge pile that kthread.h ends up pulling
in, so just drop it entirely.
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
The function __init_rwsem() is not part of the official API, it just a helper
function used by init_rwsem().
Changing the lock's class and name should be done by using
lockdep_set_class_and_name() after the has been fully initialized. The overhead
of the additional class struct and setting it twice is negligible and it works
across all locks.
Fully initialize the lock with init_rwsem() and then set the custom class and
name for the lock.
Fixes: 730633f0b7f95 ("mm: Protect operations adding pages to page cache with invalidate_lock")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
|
|
We can make code little bit more readable by using min/max macros.
These were found with Coccinelle.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
We can make code little more readable by using kernel macros clamp/max.
This were found with kernel included Coccinelle minmax script.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
We do not need this check as this is same thing as
NTFS_MIN_MFT_ZONE > zlen. We already check NTFS_MIN_MFT_ZONE <= zlen and
exit because is too big request. Remove it so code is cleaner.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
If ntfs_refresh_zone() returns error it will be changed to -ENOSPC. It
is not right. Also caller of this functions also check other errors.
Fixes: 78ab59fee07f ("fs/ntfs3: Rework file operations")
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
Remove tabs before spaces from comment as recommended by kernel coding
style. Checkpatch also warn about these.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
Remove braces from single statment block as they are not needed. Also
Linux kernel coding style guide recommend this and checkpatch warn about
this.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
For better code readability place constant always right side of the
test. This will also address checkpatch warning.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
No need for plus sign here. So remove it. Checkpatch will also be happy.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
The qnx4 directory entries are 64-byte blocks that have different
contents depending on the a status byte that is in the last byte of the
block.
In particular, a directory entry can be either a "link info" entry with
a 48-byte name and pointers to the real inode information, or an "inode
entry" with a smaller 16-byte name and the full inode information.
But the code was written to always just treat the directory name as if
it was part of that "inode entry", and just extend the name to the
longer case if the status byte said it was a link entry.
That work just fine and gives the right results, but now that gcc is
tracking data structure accesses much more, the code can trigger a
compiler error about using up to 48 bytes (the long name) in a structure
that only has that shorter name in it:
fs/qnx4/dir.c: In function ‘qnx4_readdir’:
fs/qnx4/dir.c:51:32: error: ‘strnlen’ specified bound 48 exceeds source size 16 [-Werror=stringop-overread]
51 | size = strnlen(de->di_fname, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from fs/qnx4/qnx4.h:3,
from fs/qnx4/dir.c:16:
include/uapi/linux/qnx4_fs.h:45:25: note: source object declared here
45 | char di_fname[QNX4_SHORT_NAME_MAX];
| ^~~~~~~~
which is because the source code doesn't really make this whole "one of
two different types" explicit.
Fix this by introducing a very explicit union of the two types, and
basically explaining to the compiler what is really going on.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
230d50d448acb ("io_uring: move reissue into regular IO path")
made non-IOPOLL I/O to not retry from ki_complete handler. Follow it
steps and do the same for IOPOLL. Same problems, same implementation,
same -EAGAIN assumptions.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/f80dfee2d5fa7678f0052a8ab3cfca9496a112ca.1631699928.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
Get rid of the need to do re-expand and revert on an iterator when we
encounter a short IO, or failure that warrants a retry. Use the new
state save/restore helpers instead.
We keep the iov_iter_state persistent across retries, if we need to
restart the read or write operation. If there's a pending retry, the
operation will always exit with the state correctly saved.
Signed-off-by: Jens Axboe <[email protected]>
|
|
A common complaint is that using O_NONBLOCK files with io_uring can be a
bit of a pain. Be a bit nicer and allow normal retry IFF the file does
support async behavior. This makes it possible to use io_uring more
reliably with O_NONBLOCK files, for use cases where it either isn't
possible or feasible to modify the file flags.
Cc: [email protected]
Reported-and-tested-by: Dan Melnic <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
It might be inconvenient that direct open/accept deviates from the
update semantics and fails if the slot is taken instead of removing a
file sitting there. Implement this auto-removal.
Note that removal might need to allocate and so may fail. However, if an
empty slot is specified, it's guaraneed to not fail on the fd
installation side for valid userspace programs. It's needed for users
who can't tolerate such failures, e.g. accept where the other end
never retries.
Suggested-by: Franz-B. Tuneke <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/c896f14ea46b0eaa6c09d93149e665c2c37979b4.1631632300.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move get_timespec() section in io_cqring_wait() before the sigmask
saving, otherwise we'll fail to restore sigmask once get_timespec()
returns error.
Fixes: c73ebb685fb6 ("io_uring: add timeout support for io_uring_enter()")
Signed-off-by: Xiaoguang Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
We need to re-check sqd->thread after we've dropped the lock. Pin
the sqd before doing the lockdep lock dance, and check if the thread
is alive after that. It's either NULL or alive, as the SQPOLL thread
cannot exit without holding the same sqd->lock.
Reported-and-tested-by: [email protected]
Fixes: fa84693b3c89 ("io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL")
Signed-off-by: Jens Axboe <[email protected]>
|
|
Correct kernel-doc comments pointed out by the
automated kernel test robot.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
checkpatch complains about source files with filenames (e.g. in
these cases just below the SPDX header in comments at the top of
various files in fs/cifs). It also is helpful to change this now
so will be less confusing when the parent directory is renamed
e.g. from fs/cifs to fs/smb_client (or fs/smbfs)
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
We do not have any reason to keep old linear search in. Before this was
used for error path or if table was so big that it cannot be allocated.
Current binary search implementation won't need error path. Remove old
references to linear entry search.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
We could try to optimize algorithm to first fill just small table and
after that use bigger table all the way up to ARRAY_SIZE(offs). This
way we can use bigger search array, but not lose benefits with entry
count smaller < ARRAY_SIZE(offs).
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
Current binary search allocates memory for table and fill whole table
before we start actual binary search. This is quite inefficient because
table fill will always be O(n). Also if table is huge we need to
reallocate memory which is costly.
This implementation use just stack memory and always when table is full
we will check if last element is <= and if not start table fill again.
The idea was that it would be same cost as table reallocation.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
We have lot of unnecessary headers in these files. Remove them so that
we help compiler a little bit.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
There is lot of headers which we do not need in this file. Delete them
and add what we really need. Here is list which identify why we need
this header.
<linux/kernel.h> // min()
<linux/slab.h> // kzalloc()
<linux/stddef.h> // offsetof()
<linux/string.h> // memcpy(), memset()
<linux/types.h> // u8, size_t, etc.
"debug.h" // PtrOffset()
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
There is no headers. They will be included through ntfs_fs.c, but that
is not right thing to do. Let's include headers what this file need
straight away.
types.h is needed for __le16, u8 etc.
kernel.h is needed for le16_to_cpu()
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
We only need linux/types.h for types like u8 etc. So we can remove rest
and help compiler a little bit.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
size_t needs header. Add missing header guards so that compiler will
only include these ones.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
We do not have headers at all in this file. We should have them so that
not every .c file needs to include all of the stuff which this file need
for building. This way we can remove some headers from other files and
get better picture what is needed. This can save some compilation time.
And this can help if we sometimes want to separate this one big header.
Also use forward declarations for structs and enums when it not included
straight with include and it is used in function declarations input.
This will prevent possible compiler warning:
xxx declared inside parameter list will not be visible
outside of this definition or declaration
Here is list which I made when parsing this. There is not necessarily
all example from this header file, but this just proofs we need it.
<linux/blkdev.h> SECTOR_SHIFT
<linux/buffer_head.h> sb_bread(), put_bh
<linux/cleancache.h> put_page()
<linux/fs.h> struct inode (Just struct ntfs_inode need it)
<linux/highmem.h> kunmap(), kmap()
<linux/kernel.h> cpu_to_leXX() ALIGN
<linux/mm.h> kvfree()
<linux/mutex.h> struct mutex, mutex_(un/try)lock()
<linux/page-flags.h> PageError()
<linux/pagemap.h> read_mapping_page()
<linux/rbtree.h> struct rb_root
<linux/rwsem.h> struct rw_semaphore
<linux/slab.h> krfree(), kzalloc()
<linux/string.h> memset()
<linux/time64.h> struct timespec64
<linux/types.h> uXX, __leXX
<linux/uidgid.h> kuid_t, kgid_t
<asm/div64.h> do_div()
<asm/page.h> PAGE_SIZE
"debug.h" ntfs_err() (Just one entry. Maybe we can drop this)
"ntfs.h" Do you even ask?
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
We do not have header files at all in this file. Add following headers
and there is also explanation which for it was added. Note that
explanation might not be complete, but it just proofs it is needed.
<linux/blkdev.h> // SECTOR_SHIFT
<linux/build_bug.h> // static_assert()
<linux/kernel.h> // cpu_to_le64, cpu_to_le32, ALIGN
<linux/stddef.h> // offsetof()
<linux/string.h> // memcmp()
<linux/types.h> //__le32, __le16
"debug.h" // PtrOffset(), Add2Ptr()
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
Add forward declarations for structs so that we can include this file
without warnings even without linux/fs.h
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
The variable err is being initialized with a value that is never read, it
is being updated later on. The assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
The items passed in the array pointed by the arg parameter
of IORING_REGISTER_IOWQ_MAX_WORKERS io_uring_register operation
carry certain semantics: they refer to different io-wq worker categories;
provide IO_WQ_* constants in the UAPI, so these categories can be referenced
in the user space code.
Suggested-by: Jens Axboe <[email protected]>
Complements: 2e480058ddc21ec5 ("io-wq: provide a way to limit max number of workers")
Signed-off-by: Eugene Syromiatnikov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
When an afs file or directory is modified locally such that the total file
size is extended, i_blocks needs to be recalculated too.
Fix this by making afs_write_end() and afs_edit_dir_add() call
afs_set_i_size() rather than setting inode->i_size directly as that also
recalculates inode->i_blocks.
This can be tested by creating and writing into directories and files and
then examining them with du. Without this change, directories show a 4
blocks (they start out at 2048 bytes) and files show 0 blocks; with this
change, they should show a number of blocks proportional to the file size
rounded up to 1024.
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...")
Reported-by: Markus Suvanto <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Marc Dionne <[email protected]>
Tested-by: Markus Suvanto <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163113612442.352844.11162345591911691150.stgit@warthog.procyon.org.uk/
|
|
AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
Linux's afs client switches between them when talking to a non-YFS server
if the read size, the file position or the sum of the two have the upper 32
bits set of the 64-bit value.
This is a problem, however, since the file position and length fields of
FS.FetchData are *signed* 32-bit values.
Fix this by capturing the capability bits obtained from the fileserver when
it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be
used to decide whether to use FS.FetchData or FS.FetchData64 - and also
FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
switch on the parameter values.
This capabilities flag could also be used to limit the maximum size of the
file, but all servers must be checked for that.
Note that the issue does not exist with FS.StoreData - that uses *unsigned*
32-bit values. It's also not a problem with Auristor servers as its
YFS.FetchData64 op uses unsigned 64-bit values.
This can be tested by cloning a git repo through an OpenAFS client to an
OpenAFS server and then doing "git status" on it from a Linux afs
client[1]. Provided the clone has a pack file that's in the 2G-4G range,
the git status will show errors like:
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
This can be observed in the server's FileLog with something like the
following appearing:
Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
...
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5
Note the file position of 18446744071815340032. This is the requested file
position sign-extended.
Fixes: b9b1f8d5930a ("AFS: write support fixes")
Reported-by: Markus Suvanto <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Marc Dionne <[email protected]>
Tested-by: Markus Suvanto <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
Link: https://lore.kernel.org/r/[email protected]/
|
|
Try to avoid taking the RCU read lock when checking the validity of a
vnode's callback state. The only thing it's needed for is to pin the
parent volume's server list whilst we search it to find the record of the
server we're currently using to see if it has been reinitialised (ie. it
sent us a CB.InitCallBackState* RPC).
Do this by the following means:
(1) Keep an additional per-cell counter (fs_s_break) that's incremented
each time any of the fileservers in the cell reinitialises.
Since the new counter can be accessed without RCU from the vnode, we
can check that first - and only if it differs, get the RCU read lock
and check the volume's server list.
(2) Replace afs_get_s_break_rcu() with afs_check_server_good() which now
indicates whether the callback promise is still expected to be present
on the server. This does the checks as described in (1).
(3) Restructure afs_check_validity() to take account of the change in (2).
We can also get rid of the valid variable and just use the need_clear
variable with the addition of the afs_cb_break_no_promise reason.
(4) afs_check_validity() probably shouldn't be altering vnode->cb_v_break
and vnode->cb_s_break when it doesn't have cb_lock exclusively locked.
Move the change to vnode->cb_v_break to __afs_break_callback().
Delegate the change to vnode->cb_s_break to afs_select_fileserver()
and set vnode->cb_fs_s_break there also.
(5) afs_validate() no longer needs to get the RCU read lock around its
call to afs_check_validity() - and can skip the call entirely if we
don't have a promise.
Signed-off-by: David Howells <[email protected]>
Tested-by: Markus Suvanto <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163111669583.283156.1397603105683094563.stgit@warthog.procyon.org.uk/
|
|
Fix the coherency management of mmap'd data such that 3rd-party changes
become visible as soon as possible after the callback notification is
delivered by the fileserver. This is done by the following means:
(1) When we break a callback on a vnode specified by the CB.CallBack call
from the server, we queue a work item (vnode->cb_work) to go and
clobber all the PTEs mapping to that inode.
This causes the CPU to trip through the ->map_pages() and
->page_mkwrite() handlers if userspace attempts to access the page(s)
again.
(Ideally, this would be done in the service handler for CB.CallBack,
but the server is waiting for our reply before considering, and we
have a list of vnodes, all of which need breaking - and the process of
getting the mmap_lock and stripping the PTEs on all CPUs could be
quite slow.)
(2) Call afs_validate() from the ->map_pages() handler to check to see if
the file has changed and to get a new callback promise from the
server.
Also handle the fileserver telling us that it's dropping all callbacks,
possibly after it's been restarted by sending us a CB.InitCallBackState*
call by the following means:
(3) Maintain a per-cell list of afs files that are currently mmap'd
(cell->fs_open_mmaps).
(4) Add a work item to each server that is invoked if there are any open
mmaps when CB.InitCallBackState happens. This work item goes through
the aforementioned list and invokes the vnode->cb_work work item for
each one that is currently using this server.
This causes the PTEs to be cleared, causing ->map_pages() or
->page_mkwrite() to be called again, thereby calling afs_validate()
again.
I've chosen to simply strip the PTEs at the point of notification reception
rather than invalidate all the pages as well because (a) it's faster, (b)
we may get a notification for other reasons than the data being altered (in
which case we don't want to clobber the pagecache) and (c) we need to ask
the server to find out - and I don't want to wait for the reply before
holding up userspace.
This was tested using the attached test program:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
size_t size = getpagesize();
unsigned char *p;
bool mod = (argc == 3);
int fd;
if (argc != 2 && argc != 3) {
fprintf(stderr, "Format: %s <file> [mod]\n", argv[0]);
exit(2);
}
fd = open(argv[1], mod ? O_RDWR : O_RDONLY);
if (fd < 0) {
perror(argv[1]);
exit(1);
}
p = mmap(NULL, size, mod ? PROT_READ|PROT_WRITE : PROT_READ,
MAP_SHARED, fd, 0);
if (p == MAP_FAILED) {
perror("mmap");
exit(1);
}
for (;;) {
if (mod) {
p[0]++;
msync(p, size, MS_ASYNC);
fsync(fd);
}
printf("%02x", p[0]);
fflush(stdout);
sleep(1);
}
}
It runs in two modes: in one mode, it mmaps a file, then sits in a loop
reading the first byte, printing it and sleeping for a second; in the
second mode it mmaps a file, then sits in a loop incrementing the first
byte and flushing, then printing and sleeping.
Two instances of this program can be run on different machines, one doing
the reading and one doing the writing. The reader should see the changes
made by the writer, but without this patch, they aren't because validity
checking is being done lazily - only on entry to the filesystem.
Testing the InitCallBackState change is more complicated. The server has
to be taken offline, the saved callback state file removed and then the
server restarted whilst the reading-mode program continues to run. The
client machine then has to poke the server to trigger the InitCallBackState
call.
Signed-off-by: David Howells <[email protected]>
Tested-by: Markus Suvanto <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163111668833.283156.382633263709075739.stgit@warthog.procyon.org.uk/
|
|
The AFS filesystem is currently triggering the silly-rename cleanup from
afs_d_revalidate() when it sees that a dentry has been changed by a third
party[1]. It should not be doing this as the cleanup includes deleting the
silly-rename target file on iput.
Fix this by removing the places in the d_revalidate handling that validate
anything other than the directory and the dirent. It probably should not
be looking to validate the target inode of the dentry also.
This includes removing the point in afs_d_revalidate() where the inode that
a dentry used to point to was marked as being deleted (AFS_VNODE_DELETED).
We don't know it got deleted. It could have been renamed or it could have
hard links remaining.
This was reproduced by cloning a git repo onto an afs volume on one
machine, switching to another machine and doing "git status", then
switching back to the first and doing "git status". The second status
would show weird output due to ".git/index" getting deleted by the above
mentioned mechanism.
A simpler way to do it is to do:
machine 1: touch a
machine 2: touch b; mv -f b a
machine 1: stat a
on an afs volume. The bug shows up as the stat failing with ENOENT and the
file server log showing that machine 1 deleted "a".
Fixes: 79ddbfa500b3 ("afs: Implement sillyrename for unlink and rename")
Reported-by: Markus Suvanto <[email protected]>
Signed-off-by: David Howells <[email protected]>
Tested-by: Markus Suvanto <[email protected]>
cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c4 [1]
Link: https://lore.kernel.org/r/163111668100.283156.3851669884664475428.stgit@warthog.procyon.org.uk/
|
|
afs_d_revalidate() should only be validating the directory entry it is
given and the directory to which that belongs; it shouldn't be validating
the inode/vnode to which that dentry points. Besides, validation need to
be done even if we don't call afs_d_revalidate() - which might be the case
if we're starting from a file descriptor.
In order for afs_d_revalidate() to be fixed, validation points must be
added in some other places. Certain directory operations, such as
afs_unlink(), already check this, but not all and not all file operations
either.
Note that the validation of a vnode not only checks to see if the
attributes we have are correct, but also gets a promise from the server to
notify us if that file gets changed by a third party.
Add the following checks:
- Check the vnode we're going to make a hard link to.
- Check the vnode we're going to move/rename.
- Check the vnode we're going to read from.
- Check the vnode we're going to write to.
- Check the vnode we're going to sync.
- Check the vnode we're going to make a mapped page writable for.
Some of these aren't strictly necessary as we're going to perform a server
operation that might get the attributes anyway from which we can determine
if something changed - though it might not get us a callback promise.
Signed-off-by: David Howells <[email protected]>
Tested-by: Markus Suvanto <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163111667354.283156.12720698333342917516.stgit@warthog.procyon.org.uk/
|
|
Given max_worker is 1, and we currently have 1 running and it is
exiting. There may be race like:
io_wqe_enqueue worker1
no work there and timeout
unlock(wqe->lock)
->insert work
-->io_worker_exit
lock(wqe->lock)
->if(!nr_workers) //it's still 1
unlock(wqe->lock)
goto run_cancel
lock(wqe->lock)
nr_workers--
->dec_running
->worker creation fails
unlock(wqe->lock)
We enqueued one work but there is no workers, causes hung.
Signed-off-by: Hao Xu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Remove do_create to save a local variable.
Signed-off-by: Hao Xu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When setting up the next segment, we check what type the iter is and
handle it accordingly. However, when incrementing and processed amount
we do not, and both iter advance and addr/len are adjusted, regardless
of type. Split the increment side just like we do on the setup side.
Fixes: 4017eb91a9e7 ("io_uring: make loop_rw_iter() use original user supplied pointers")
Cc: [email protected]
Reported-by: Valentina Palmiotti <[email protected]>
Reviewed-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull namei updates from Al Viro:
"Clearing fallout from mkdirat in io_uring series. The fix in the
kern_path_locked() patch plus associated cleanups"
* 'misc.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
putname(): IS_ERR_OR_NULL() is wrong here
namei: Standardize callers of filename_create()
namei: Standardize callers of filename_lookup()
rename __filename_parentat() to filename_parentat()
namei: Fix use after free in kern_path_locked
|
|
Pull smbfs updates from Steve French:
"cifs/smb3 updates:
- DFS reconnect fix
- begin creating common headers for server and client
- rename the cifs_common directory to smbfs_common to be more
consistent ie change use of the name cifs to smb (smb3 or smbfs is
more accurate, as the very old cifs dialect has long been
superseded by smb3 dialects).
In the future we can rename the fs/cifs directory to fs/smbfs.
This does not include the set of multichannel fixes nor the two
deferred close fixes (they are still being reviewed and tested)"
* tag '5.15-rc-cifs-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: properly invalidate cached root handle when closing it
cifs: move SMB FSCTL definitions to common code
cifs: rename cifs_common to smbfs_common
cifs: update FSCTL definitions
|
|
Pull virtio updates from Michael Tsirkin:
- vduse driver ("vDPA Device in Userspace") supporting emulated virtio
block devices
- virtio-vsock support for end of record with SEQPACKET
- vdpa: mac and mq support for ifcvf and mlx5
- vdpa: management netlink for ifcvf
- virtio-i2c, gpio dt bindings
- misc fixes and cleanups
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits)
Documentation: Add documentation for VDUSE
vduse: Introduce VDUSE - vDPA Device in Userspace
vduse: Implement an MMU-based software IOTLB
vdpa: Support transferring virtual addressing during DMA mapping
vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
vhost-iotlb: Add an opaque pointer for vhost IOTLB
vhost-vdpa: Handle the failure of vdpa_reset()
vdpa: Add reset callback in vdpa_config_ops
vdpa: Fix some coding style issues
file: Export receive_fd() to modules
eventfd: Export eventfd_wake_count to modules
iova: Export alloc_iova_fast() and free_iova_fast()
virtio-blk: remove unneeded "likely" statements
virtio-balloon: Use virtio_find_vqs() helper
vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro
vsock_test: update message bounds test for MSG_EOR
af_vsock: rename variables in receive loop
virtio/vsock: support MSG_EOR bit processing
vhost/vsock: support MSG_EOR bit processing
...
|
|
Pull io_uring fixes from Jens Axboe:
- Fix an off-by-one in a BUILD_BUG_ON() check. Not a real issue right
now as we have plenty of flags left, but could become one. (Hao)
- Fix lockdep issue introduced in this merge window (me)
- Fix a few issues with the worker creation (me, Pavel, Qiang)
- Fix regression with wq_has_sleeper() for IOPOLL (Pavel)
- Timeout link error propagation fix (Pavel)
* tag 'io_uring-5.15-2021-09-11' of git://git.kernel.dk/linux-block:
io_uring: fix off-by-one in BUILD_BUG_ON check of __REQ_F_LAST_BIT
io_uring: fail links of cancelled timeouts
io-wq: fix memory leak in create_io_worker()
io-wq: fix silly logic error in io_task_work_match()
io_uring: drop ctx->uring_lock before acquiring sqd->lock
io_uring: fix missing mb() before waitqueue_active
io-wq: fix cancellation on create-worker failure
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph:
- fix nvmet command set reporting for passthrough controllers (Adam Manzanares)
- update a MAINTAINERS email address (Chaitanya Kulkarni)
- set QUEUE_FLAG_NOWAIT for nvme-multipth (me)
- handle errors from add_disk() (Luis Chamberlain)
- update the keep alive interval when kato is modified (Tatsuya Sasaki)
- fix a buffer overrun in nvmet_subsys_attr_serial (Hannes Reinecke)
- do not reset transport on data digest errors in nvme-tcp (Daniel Wagner)
- only call synchronize_srcu when clearing current path (Daniel Wagner)
- revalidate paths during rescan (Hannes Reinecke)
- Split out the fs/block_dev into block/fops.c and block/bdev.c, which
has been long overdue. Do this now before -rc1, to avoid annoying
conflicts due to this (Christoph)
- blk-throtl use-after-free fix (Li)
- Improve plug depth for multi-device plugs, greatly increasing md
resync performance (Song)
- blkdev_show() locking fix (Tetsuo)
- n64cart error check fix (Yang)
* tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-block:
n64cart: fix return value check in n64cart_probe()
blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queues
block: move fs/block_dev.c to block/bdev.c
block: split out operations on block special files
blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()
block: genhd: don't call blkdev_show() with major_names_lock held
nvme: update MAINTAINERS email address
nvme: add error handling support for add_disk()
nvme: only call synchronize_srcu when clearing current path
nvme: update keep alive interval when kato is modified
nvme-tcp: Do not reset transport on data digest errors
nvmet: fixup buffer overrun in nvmet_subsys_attr_serial()
nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req
nvmet: looks at the passthrough controller when initializing CAP
nvme: move nvme_multi_css into nvme.h
nvme-multipath: revalidate paths during rescan
nvme-multipath: set QUEUE_FLAG_NOWAIT
|