path: root/io_uring
Age  Commit message  Author  Files  Lines
2024-04-15  io_uring: drop ->prep_async()  (Jens Axboe; 5 files, -65/+25)
It's now unused, drop the code related to it. This includes the io_issue_defs->manual_alloc field. While in there, and since ->async_size is now being used a bit more frequently and in the issue path, move it to io_issue_defs[]. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/uring_cmd: defer SQE copying until it's needed  (Jens Axboe; 1 file, -6/+19)
The previous commit turned on async data for uring_cmd, and did the basic conversion of setting everything up on the prep side. However, for a lot of use cases, -EIOCBQUEUED will get returned on issue, as the operation got successfully queued. For that case, a persistent SQE isn't needed, as it's just used for issue. Unless execution goes async immediately, defer copying the double SQE until it's necessary. This greatly reduces the overhead of such commands, as evidenced by a perf diff from before and after this change: 10.60% -8.58% [kernel.vmlinux] [k] io_uring_cmd_prep where the prep side drops from 10.60% to ~2%, which is more expected. Performance also rises from ~113M IOPS to ~122M IOPS, bringing us back to where it was before the async command prep. Tested-by: Anuj Gupta <[email protected]> Reviewed-by: Anuj Gupta <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
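To illustrate the lazy-copy idea described above, here is a minimal userspace C sketch (hypothetical names, not the kernel code): keep only a pointer to the caller's descriptor for the inline fast path, and duplicate it only when the operation has to outlive submission.

#include <stdlib.h>
#include <string.h>

/* hypothetical stand-in for an SQE-sized descriptor */
struct sqe_like { unsigned char data[128]; };

struct cmd_state {
	const struct sqe_like *sqe;   /* borrowed, valid only during issue */
	struct sqe_like *copy;        /* owned, only allocated if we go async */
};

/* prep: cheap, no allocation or copy */
static void cmd_prep(struct cmd_state *st, const struct sqe_like *sqe)
{
	st->sqe = sqe;
	st->copy = NULL;
}

/* called only when the command cannot complete inline */
static int cmd_go_async(struct cmd_state *st)
{
	if (st->copy)
		return 0;
	st->copy = malloc(sizeof(*st->copy));
	if (!st->copy)
		return -1;
	memcpy(st->copy, st->sqe, sizeof(*st->copy));
	st->sqe = st->copy;           /* from now on, use the stable copy */
	return 0;
}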
2024-04-15  io_uring/uring_cmd: switch to always allocating async data  (Jens Axboe; 4 files, -23/+68)
Basic conversion ensuring async_data is allocated off the prep path. Adds a basic alloc cache as well, as passthrough IO can be quite high in rate. Tested-by: Anuj Gupta <[email protected]> Reviewed-by: Anuj Gupta <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
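A minimal sketch of the alloc-cache idea referenced here, in plain userspace C with hypothetical names (the real kernel cache also handles locking and KASAN poisoning): a small capped free list so high-rate passthrough IO reuses objects instead of hitting the allocator.

#include <stdlib.h>

#define CACHE_MAX 128                          /* capped; cf. the alloc_cache max-entries commit below */

struct obj { char payload[96]; };              /* stand-in for the async data struct */

struct alloc_cache {
	struct obj *slots[CACHE_MAX];
	unsigned int nr;
};

static struct obj *cache_get(struct alloc_cache *c)
{
	if (c->nr)
		return c->slots[--c->nr];      /* fast path: reuse a cached object */
	return malloc(sizeof(struct obj));     /* slow path: fresh allocation */
}

static void cache_put(struct alloc_cache *c, struct obj *o)
{
	if (c->nr < CACHE_MAX)
		c->slots[c->nr++] = o;         /* recycle for the next request */
	else
		free(o);                       /* cache full, return to allocator */
}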
2024-04-15  io_uring/net: move connect to always using async data  (Jens Axboe; 3 files, -37/+12)
While doing that, get rid of io_async_connect and just use the generic io_async_msghdr. Both of them have a struct sockaddr_storage in there, and while io_async_msghdr is bigger, if the same type can be used then the netmsg_cache can get reused for connect as well. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/rw: add iovec recycling  (Jens Axboe; 2 files, -6/+39)
Let the io_async_rw hold on to the iovec and reuse it, rather than always allocating and freeing them. Also enables KASAN for the iovec entries, so that reuse can be detected even while they are in the cache. While doing so, shrink io_async_rw by getting rid of the bigger embedded fast iovec. Since iovecs are being recycled now, shrink the fast iovec from 8 entries to 1. This reduces the io_async_rw size from 264 to 160 bytes, a 40% reduction. Signed-off-by: Jens Axboe <[email protected]>
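A small userspace sketch of the recycling shape described above (hypothetical names, not the kernel code): a single inline fast entry for the common case, plus a heap array that is kept and regrown across requests instead of being freed after each one.

#include <stdlib.h>
#include <sys/uio.h>

struct rw_iovs {
	struct iovec fast_iov;                 /* single inline entry */
	struct iovec *iov;                     /* heap array, recycled across requests */
	unsigned int cap;
};

static struct iovec *rw_get_iovs(struct rw_iovs *s, unsigned int nr)
{
	if (nr <= 1)
		return &s->fast_iov;           /* common case, no allocation */
	if (nr > s->cap) {
		struct iovec *n = realloc(s->iov, nr * sizeof(*n));
		if (!n)
			return NULL;
		s->iov = n;
		s->cap = nr;
	}
	return s->iov;                         /* reused, not freed per request */
}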
2024-04-15  io_uring/rw: cleanup retry path  (Jens Axboe; 1 file, -27/+8)
We no longer need to gate a potential retry on whether or not the context matches our original task, as all read/write operations have been fully prepared upfront. This means there's never any re-import needed, and hence we can always retry requests. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: get rid of struct io_rw_state  (Jens Axboe; 2 files, -29/+26)
A separate state struct is not needed anymore, just fold it in with io_async_rw. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/rw: always setup io_async_rw for read/write requests  (Jens Axboe; 4 files, -298/+277)
read/write requests try to put everything on the stack, and then alloc and copy if a retry is needed. This necessitates a bunch of nasty code that deals with intermediate state. Get rid of this, and have the prep side setup everything that is needed upfront, which greatly simplifies the opcode handlers. This includes adding an alloc cache for io_async_rw, to make it cheap to handle. In terms of cost, this should be basically free and transparent. For the worst case of {READ,WRITE}_FIXED which didn't need it before, performance is unaffected in the normal peak workload that is being used to test that. Still runs at 122M IOPS. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: drop 'kmsg' parameter from io_req_msg_cleanup()  (Jens Axboe; 1 file, -6/+5)
Now that iovec recycling is being done, the iovec is no longer being freed in there. Hence the kmsg parameter is now useless. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: add iovec recycling  (Jens Axboe; 2 files, -53/+91)
Right now the io_async_msghdr is recycled to avoid the overhead of allocating+freeing it for every request. But the iovec is not included, hence that will be allocated and freed for each transfer regardless. This commit enables recycling of the iovec between io_async_msghdr recycles. This avoids alloc+free for each one if an iovec is used, and on top of that, it extends the cache hot nature of msg to the iovec as well. Also enables KASAN for the iovec entries, so that reuse can be detected even while they are in the cache. The io_async_msghdr also shrinks from 376 -> 288 bytes, an 88 byte saving (or ~23% smaller), as the fast_iovec entry is dropped from 8 entries to a single entry. There's no point keeping a big fast iovec entry if iovecs aren't being allocated and freed continually. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: remove (now) dead code in io_netmsg_recycle()  (Jens Axboe; 1 file, -1/+1)
All net commands have async data at this point, so there's no reason to check whether that is the case or not. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: kill io_msg_alloc_async_prep()  (Jens Axboe; 1 file, -21/+10)
We now ONLY call io_msg_alloc_async() from inside prep handling, which is always locked. No need for this helper anymore, or the check in io_msg_alloc_async() on whether the ring is locked or not. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: get rid of ->prep_async() for send side  (Jens Axboe; 3 files, -114/+46)
Move the io_async_msghdr out of the issue path and into prep handling, since it's now done unconditionally and hence does not need to be part of the issue path. This means any usage of io_sendrecv_prep_async() and io_sendmsg_prep_async() can be dropped, and hence the forced async setup path is now unified with the normal prep setup. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: get rid of ->prep_async() for receive side  (Jens Axboe; 3 files, -46/+28)
Move the io_async_msghdr out of the issue path and into prep handling, since it's now done unconditionally and hence does not need to be part of the issue path. This reduces the footprint of the multishot fast path of multiple invocations of ->issue() per prep, and also means that using ->prep_async() can be dropped for recvmsg, as this is now done via setup on the prep side. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: always set kmsg->msg.msg_control_user before issue  (Jens Axboe; 1 file, -2/+3)
We currently set this separately for async/sync entry, but let's just move it to a generic pre-issue spot and eliminate the difference between the two. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: always setup an io_async_msghdr  (Jens Axboe; 1 file, -70/+47)
Rather than use an on-stack one and then need to allocate and copy if async execution is required, always grab one upfront. This should be very cheap, and potentially even have cache hotness benefits for back-to-back send/recv requests. For any recv type of request, this is probably a good choice in general, as it's expected that no data is available initially. For send this is not necessarily the case, as space in the socket buffer is expected to be available. However, getting a cached io_async_msghdr is very cheap, and as it should be cache hot, probably the difference here is negligible, if any. A nice side benefit is that io_setup_async_msg can get killed completely, which has some nasty iovec manipulation code. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: unify cleanup handling  (Jens Axboe; 1 file, -15/+11)
Now that recv/recvmsg both do the same cleanup, put it in the retry and finish handlers. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: switch io_recv() to using io_async_msghdr  (Jens Axboe; 3 files, -31/+53)
No functional changes in this patch, just in preparation for carrying more state than what is available now, if necessary. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr  (Jens Axboe; 2 files, -94/+101)
No functional changes in this patch, just in preparation for carrying more state than what is being done now, if necessary. While unifying some of this code, add a generic send setup prep handler that they can both use. This gets rid of some manual msghdr and sockaddr on the stack, and makes it look a bit more like the sendmsg/recvmsg variants. Going forward, more can get unified on top. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/alloc_cache: shrink default max entries from 512 to 128  (Jens Axboe; 1 file, -1/+1)
In practice, we just need to recycle a few elements for (by far) most use cases. Shrink the total size down from 512 to 128, which should be more than plenty. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: remove timeout/poll specific cancelations  (Jens Axboe; 1 file, -9/+0)
For historical reasons these were special cased, as they were the only ones that needed cancelation. But now we handle cancelations generally, and hence there's no need to check for these in io_ring_ctx_wait_and_kill() when io_uring_try_cancel_requests() handles both these and the rest as well. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: flush delayed fallback task_work in cancelation  (Jens Axboe; 1 file, -0/+2)
Just like we run the inline task_work, ensure we also factor in and run the fallback task_work. Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: clean up io_lockdep_assert_cq_locked  (Pavel Begunkov; 1 file, -6/+2)
Move CONFIG_PROVE_LOCKING checks inside of io_lockdep_assert_cq_locked() and kill the else branch. Signed-off-by: Pavel Begunkov <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/bbf33c429c9f6d7207a8fe66d1a5866ec2c99850.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: refactor io_req_complete_post()  (Pavel Begunkov; 2 files, -19/+11)
Make io_req_complete_post() push all IORING_SETUP_IOPOLL requests to task_work; it's much cleaner and is how it should normally happen. We couldn't do it before because there was a possibility of looping in complete_post() -> tw -> complete_post() -> ... Also, unexport the function and inline __io_req_complete_post(). Signed-off-by: Pavel Begunkov <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/ea19c032ace3e0dd96ac4d991a063b0188037014.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: remove current check from complete_post  (Pavel Begunkov; 1 file, -1/+1)
task_work execution is now always locked, and we shouldn't get into io_req_complete_post() from task_work. That means that complete_post() is always called out of the original task context and we don't even need to check current. Signed-off-by: Pavel Begunkov <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/24ec27f27db0d8f58c974d8118dca1d345314ddc.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: get rid of intermediate aux cqe caches  (Pavel Begunkov; 2 files, -50/+14)
io_post_aux_cqe(), which is used for multishot requests, delays completions by putting CQEs into a temporary array for the purpose of completion lock/flush batching. DEFER_TASKRUN doesn't need any locking, so for it we can put completions directly into the CQ and defer post completion handling with a flag. That leaves !DEFER_TASKRUN, which is not that interesting / hot for multishot requests, so have conditional locking with deferred flush for them. Signed-off-by: Pavel Begunkov <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/b1d05a81fd27aaa2a07f9860af13059e7ad7a890.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
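The conditional-locking shape described here can be illustrated with a very reduced userspace sketch (hypothetical names; it ignores flushing, overflow and wakeups): a single-issuer ring posts straight into the CQ, while a shared ring takes the completion lock.

#include <pthread.h>
#include <stdbool.h>

struct cq_sketch {
	pthread_mutex_t lock;
	bool single_issuer;                    /* stands in for DEFER_TASKRUN */
	unsigned int tail;
	int cqes[4096];
};

static void post_aux_cqe(struct cq_sketch *cq, int res)
{
	if (cq->single_issuer) {
		cq->cqes[cq->tail++ & 4095] = res;     /* no locking needed */
		return;
	}
	pthread_mutex_lock(&cq->lock);                 /* shared ring: take the lock */
	cq->cqes[cq->tail++ & 4095] = res;
	pthread_mutex_unlock(&cq->lock);
}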
2024-04-15  io_uring: refactor io_fill_cqe_req_aux  (Pavel Begunkov; 6 files, -24/+9)
The restriction on multishot execution context disallowing io-wq is driven by rules of io_fill_cqe_req_aux(): it should only be called in the master task context, either from the syscall path or in task_work. Since task_work now always takes the ctx lock implying IO_URING_F_COMPLETE_DEFER, we can just assume that the function is always called with its defer argument set to true. Kill the argument. Also rename the function for more consistency, as "fill" in CQE-related functions was usually meant for raw interfaces that only copy data into the CQ without any locking, waking the user or other accounting, which "post" functions take care of. Signed-off-by: Pavel Begunkov <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/93423d106c33116c7d06bf277f651aa68b427328.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: remove struct io_tw_state::locked  (Pavel Begunkov; 7 files, -45/+17)
ctx is always locked for task_work now, so get rid of struct io_tw_state::locked. Note I'm stopping one step before removing io_tw_state altogether, which is now empty, because it still serves the purpose of indicating which function is a tw callback and forcing users not to invoke them carelessly out of a wrong context. The removal can always be done later. Signed-off-by: Pavel Begunkov <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/e95e1ea116d0bfa54b656076e6a977bc221392a4.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring: force tw ctx locking  (Pavel Begunkov; 1 file, -12/+9)
We can run normal task_work without locking the ctx, however we try to lock anyway and most handlers prefer or require it locked. It might have been interesting for a multi-submitter ring with high contention completing async read/write requests via task_work, however that will still need to go through io_req_complete_post() and potentially take the lock for rsrc node putting or some other case. In other words, it's hard to care about it, so always force the locking. The case described would also need the lock anyway because of various io_uring caches. Signed-off-by: Pavel Begunkov <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/6ae858f2ef562e6ed9f13c60978c0d48926954ba.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/rw: avoid punting to io-wq directly  (Pavel Begunkov; 3 files, -12/+5)
kiocb_done() shouldn't have to care about specifically redirecting requests to io-wq. Remove the hop to tw that then queues io-wq work; return -EAGAIN instead and let the core io_uring code handle the offloading. Signed-off-by: Pavel Begunkov <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/cmd: fix tw <-> issue_flags conversion  (Pavel Begunkov; 1 file, -3/+10)
!IO_URING_F_UNLOCKED does not translate to availability of the deferred completion infra, IO_URING_F_COMPLETE_DEFER does; that's what we should pass and look for to use io_req_complete_defer() and other variants. Luckily, it's not a real problem as two wrongs actually made it right, at least as far as io_uring_cmd_work() goes. Signed-off-by: Pavel Begunkov <[email protected]> Reviewed-by: Ming Lei <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/aef76d34fe9410df8ecc42a14544fd76cd9d8b9e.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/cmd: kill one issue_flags to tw conversion  (Pavel Begunkov; 1 file, -4/+4)
io_uring cmd converts struct io_tw_state to issue_flags and later back to io_tw_state; it's awfully ill-fated, not to mention that the intermediate issue_flags state is not correct. Get rid of the last conversion, drag through tw everything that came with IO_URING_F_UNLOCKED, and replace the round-trip with a direct call to io_req_complete_defer(), at least for the time being. Signed-off-by: Pavel Begunkov <[email protected]> Reviewed-by: Ming Lei <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/c53fa3df749752bd058cf6f824a90704822d6bcc.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-15  io_uring/cmd: move io_uring_try_cancel_uring_cmd()  (Pavel Begunkov; 4 files, -38/+41)
io_uring_try_cancel_uring_cmd() is a part of the cmd handling so let's move it closer to all cmd bits into uring_cmd.c Signed-off-by: Pavel Begunkov <[email protected]> Reviewed-by: Ming Lei <[email protected]> Tested-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/43a3937af4933655f0fd9362c381802f804f43de.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-04-08  io_uring/net: restore msg_control on sendzc retry  (Pavel Begunkov; 1 file, -0/+1)
cac9e4418f4cb ("io_uring/net: save msghdr->msg_control for retries") reinstantiates msg_control before every __sys_sendmsg_sock(), since the function can overwrite the value in msghdr. We need to do the same for zerocopy sendmsg. Cc: [email protected] Fixes: 493108d95f146 ("io_uring/net: zerocopy sendmsg") Link: https://github.com/axboe/liburing/issues/1067 Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/cc1d5d9df0576fa66ddad4420d240a98a020b267.1712596179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
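The underlying save/restore-before-retry pattern can be sketched in plain C with hypothetical names (this is an abstract illustration, not the kernel or socket code): a helper consumes a field of the request, so the saved original is put back before every attempt, not just the first one.

#include <errno.h>
#include <stddef.h>

struct req_sketch {
	void *control;                         /* field the helper may overwrite */
	size_t controllen;
	void *saved_control;                   /* captured once at prep time */
	size_t saved_controllen;
};

static int helper_that_clobbers(struct req_sketch *r)
{
	r->control = NULL;                     /* models the helper consuming the field */
	r->controllen = 0;
	return -EAGAIN;                        /* caller will retry later */
}

static int issue(struct req_sketch *r)
{
	r->control = r->saved_control;         /* restore before each attempt */
	r->controllen = r->saved_controllen;
	return helper_that_clobbers(r);
}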
2024-04-07  fs: claw back a few FMODE_* bits  (Christian Brauner; 2 files, -5/+6)
There's a bunch of flags that are purely based on what the file operations support while also never being conditionally set or unset. IOW, they're not subject to change for individual files. Imho, such flags don't need to live in f_mode; they might as well live in the fops struct itself. And the fops struct already has that lonely mmap_supported_flags member. We might as well turn that into a generic fop_flags member and move a few flags from FMODE_* space into FOP_* space. That gets us four FMODE_* bits back and the ability for new static flags that are about file ops to not have to live in FMODE_* space but in their own FOP_* space. It's not the most beautiful thing ever but it gets the job done. Yes, there'll be an additional pointer chase but hopefully that won't matter for these flags. I suspect there's a few more we can move into there and that we can also redirect a bunch of new flag suggestions that follow this pattern into the fop_flags field instead of f_mode. Link: https://lore.kernel.org/r/20240328-gewendet-spargel-aa60a030ef74@brauner Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
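A reduced sketch of the split described above, with hypothetical names and flags (not the real VFS structures): static capability flags live once in the shared ops struct and are checked through the ops pointer, while per-open state stays in the file's own mode bits.

#include <stdbool.h>

#define FOP_EXAMPLE_CAPABILITY  (1u << 0)      /* hypothetical static flag */

struct file_ops_sketch {
	unsigned int fop_flags;                /* one copy per driver, not per file */
	/* ... function pointers would live here ... */
};

struct file_sketch {
	const struct file_ops_sketch *f_op;
	unsigned int f_mode;                   /* reserved for genuinely per-file state */
};

static bool file_supports(const struct file_sketch *f, unsigned int flag)
{
	/* extra pointer chase, but frees up per-file mode bits */
	return f->f_op->fop_flags & flag;
}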
2024-04-05  io_uring: Fix io_cqring_wait() not restoring sigmask on get_timespec64() failure  (Alexey Izbyshev; 1 file, -13/+13)
This bug was introduced in commit 950e79dd7313 ("io_uring: minor io_cqring_wait() optimization"), which was made in preparation for adc8682ec690 ("io_uring: Add support for napi_busy_poll"). The latter got reverted in cb3182167325 ("Revert "io_uring: Add support for napi_busy_poll""), so simply undo the former as well. Cc: [email protected] Fixes: 950e79dd7313 ("io_uring: minor io_cqring_wait() optimization") Signed-off-by: Alexey Izbyshev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-04-02  io_uring/kbuf: hold io_buffer_list reference over mmap  (Jens Axboe; 3 files, -14/+36)
If we look up the kbuf, ensure that it doesn't get unregistered until after we're done with it. Since we're inside mmap, we cannot safely use the io_uring lock. Rely on the fact that we can now look up the buffer list under RCU and grab a reference to it, preventing it from being unregistered until we're done with it. The lookup returns the io_buffer_list directly with it referenced. Cc: [email protected] # v6.4+ Fixes: 5cf4f52e6d8a ("io_uring: free io_buffer_list entries via RCU") Signed-off-by: Jens Axboe <[email protected]>
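Only the pin/unpin discipline of this pattern can be shown in a self-contained userspace sketch (hypothetical names); the kernel side performs the initial lookup under RCU, which has no direct userspace analogue here.

#include <stdatomic.h>
#include <stdbool.h>

struct buf_list_sketch {
	atomic_int refs;                       /* 0 means the object is being torn down */
};

static bool bl_tryget(struct buf_list_sketch *bl)
{
	int v = atomic_load(&bl->refs);

	while (v != 0) {                       /* cf. refcount_inc_not_zero() */
		if (atomic_compare_exchange_weak(&bl->refs, &v, v + 1))
			return true;           /* pinned: safe to use outside the lock */
	}
	return false;                          /* already dying, caller must bail out */
}

static void bl_put(struct buf_list_sketch *bl)
{
	if (atomic_fetch_sub(&bl->refs, 1) == 1) {
		/* last reference dropped: teardown/free would happen here */
	}
}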
2024-04-02  io_uring/kbuf: protect io_buffer_list teardown with a reference  (Jens Axboe; 2 files, -4/+13)
No functional changes in this patch, just in preparation for being able to keep the buffer list alive outside of the ctx->uring_lock. Cc: [email protected] # v6.4+ Signed-off-by: Jens Axboe <[email protected]>
2024-04-02  io_uring/kbuf: get rid of bl->is_ready  (Jens Axboe; 2 files, -10/+0)
Now that xarray is being exclusively used for the buffer_list lookup, this check is no longer needed. Get rid of it and the is_ready member. Cc: [email protected] # v6.4+ Signed-off-by: Jens Axboe <[email protected]>
2024-04-02  io_uring/kbuf: get rid of lower BGID lists  (Jens Axboe; 2 files, -64/+8)
Just rely on the xarray for any kind of bgid. This simplifies things, and the special casing of lower BGIDs really doesn't bring us much, if anything. Cc: [email protected] # v6.4+ Signed-off-by: Jens Axboe <[email protected]>
2024-04-02  io_uring: use private workqueue for exit work  (Jens Axboe; 1 file, -1/+4)
Rather than use the system unbound event workqueue, use an io_uring specific one. This avoids dependencies with the tty, which also uses the system_unbound_wq, and issues flushes of said workqueue from inside its poll handling. Cc: [email protected] Reported-by: Rasmus Karlsson <[email protected]> Tested-by: Rasmus Karlsson <[email protected]> Tested-by: Iskren Chernev <[email protected]> Link: https://github.com/axboe/liburing/issues/1113 Signed-off-by: Jens Axboe <[email protected]>
2024-04-01  io_uring: disable io-wq execution of multishot NOWAIT requests  (Jens Axboe; 1 file, -4/+9)
Do the same check for direct io-wq execution for multishot requests that commit 2a975d426c82 did for the inline execution, and disable multishot mode (and revert to single shot) if the file type doesn't support NOWAIT, and isn't opened in O_NONBLOCK mode. For multishot to work properly, it's a requirement that nonblocking read attempts can be done. Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-04-01  io_uring/rw: don't allow multishot reads without NOWAIT support  (Jens Axboe; 1 file, -1/+8)
Supporting multishot reads requires support for NOWAIT, as the alternative would be always having io-wq execute the work item whenever the poll readiness triggered. Any fast file type will have NOWAIT support (eg it understands both O_NONBLOCK and IOCB_NOWAIT). If the given file type does not, then simply resort to single shot execution. Cc: [email protected] Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT") Signed-off-by: Jens Axboe <[email protected]>
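A userspace sketch of the gating decision (hypothetical name; userspace can only observe O_NONBLOCK, whereas the kernel additionally knows whether the file itself supports IOCB_NOWAIT): multishot is only worth arming when a read attempt can be made without blocking, otherwise fall back to single shot.

#include <fcntl.h>
#include <stdbool.h>

static bool multishot_read_ok(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	/* if the fd can't do nonblocking reads, use single shot instead */
	return fl >= 0 && (fl & O_NONBLOCK);
}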
2024-03-18  io_uring/sqpoll: early exit thread if task_context wasn't allocated  (Jens Axboe; 1 file, -1/+5)
Ideally we'd want to simply kill the task rather than wake it, but for now let's just add a startup check that causes the thread to exit. This can only happen if io_uring_alloc_task_context() fails, which generally requires fault injection. Reported-by: Ubisectech Sirius <[email protected]> Fixes: af5d68f8892f ("io_uring/sqpoll: manage task_work privately") Signed-off-by: Jens Axboe <[email protected]>
2024-03-16  io_uring: clear opcode specific data for an early failure  (Jens Axboe; 1 file, -9/+16)
If failure happens before the opcode prep handler is called, ensure that we clear the opcode specific area of the request, which holds data specific to that request type. This prevents errors where opcode handlers don't get to clear per-request private data because prep isn't even called. Reported-and-tested-by: [email protected] Signed-off-by: Jens Axboe <[email protected]>
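A small sketch of the idea, with hypothetical request layout (not the real io_kiocb): the per-opcode area is a union that only ->prep() initializes, so an early failure path must clear it before any completion code can look at it.

#include <string.h>

struct req_sketch {
	int opcode;
	int result;
	union {
		struct { void *buf; unsigned int len; } rw;
		struct { int fd; unsigned int flags; } net;
		unsigned char raw[32];
	} cmd;                                 /* opcode-specific, set up by ->prep() */
};

static void fail_before_prep(struct req_sketch *req, int err)
{
	memset(&req->cmd, 0, sizeof(req->cmd));        /* don't leak uninitialized state */
	req->result = err;
	/* ...post the failed completion here... */
}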
2024-03-16  io_uring/net: ensure async prep handlers always initialize ->done_io  (Jens Axboe; 1 file, -1/+8)
If we get a request with IOSQE_ASYNC set, then we first run the prep async handlers. But if we then fail setting it up and want to post a CQE with -EINVAL, we use ->done_io. This was previously guarded with REQ_F_PARTIAL_IO, and the normal setup handlers do set it up before any potential errors, but we need to cover the async setup too. Fixes: 9817ad85899f ("io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io") Signed-off-by: Jens Axboe <[email protected]>
2024-03-15  io_uring/waitid: always remove waitid entry for cancel all  (Jens Axboe; 1 file, -6/+1)
We know the request is either being removed, or already in the process of being removed through task_work, so we can delete it from our waitid list upfront. This is important for "remove all" conditions, as we otherwise will find it multiple times and prevent cancelation progress. Also remove the dead check in cancelation for whether the hash_node is empty or not. We already have a waitid reference check for ownership, so we don't need to check the list too. Cc: [email protected] Fixes: f31ecf671ddc ("io_uring: add IORING_OP_WAITID support") Signed-off-by: Jens Axboe <[email protected]>
2024-03-15  io_uring/futex: always remove futex entry for cancel all  (Jens Axboe; 1 file, -0/+1)
We know the request is either being removed, or already in the process of being removed through task_work, so we can delete it from our futex list upfront. This is important for "remove all" conditions, as we otherwise will find it multiple times and prevent cancelation progress. Cc: [email protected] Fixes: 194bb58c6090 ("io_uring: add support for futex wake and wait") Fixes: 8f350194d5cf ("io_uring: add support for vectored futex waits") Signed-off-by: Jens Axboe <[email protected]>
2024-03-15  io_uring: fix poll_remove stalled req completion  (Pavel Begunkov; 1 file, -2/+2)
Taking the ctx lock is not enough to use the deferred request completion infrastructure: it'll get queued into the list, but no one would expect it there, so it will sit there until the next io_submit_flush_completions(). It's hard to care about the cancellation path, so complete it via tw. Fixes: ef7dfac51d8ed ("io_uring/poll: serialize poll linked timer start with poll removal") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/c446740bc16858f8a2a8dcdce899812f21d15f23.1710514702.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2024-03-13  io_uring: Fix release of pinned pages when __io_uaddr_map fails  (Gabriel Krisman Bertazi; 1 file, -9/+13)
Looking at the error path of __io_uaddr_map, if we fail after pinning the pages for any reason, ret will be set to -EINVAL and the error handler won't properly release the pinned pages. I didn't manage to trigger it without forcing a failure, but it can happen in real life when memory is heavily fragmented. Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Fixes: 223ef4743164 ("io_uring: don't allow IORING_SETUP_NO_MMAP rings on highmem pages") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
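The cleanup discipline at stake can be shown with a generic userspace sketch (hypothetical names; malloc/free stand in for pinning and unpinning pages): once resources are acquired, every later error path must unwind through a label that releases them rather than returning directly.

#include <stdlib.h>

static int map_region(size_t nr_pages, void ***pagesp)
{
	void **pages;
	size_t i, pinned = 0;
	int ret = -1;

	pages = calloc(nr_pages, sizeof(*pages));
	if (!pages)
		return -1;

	for (; pinned < nr_pages; pinned++) {
		pages[pinned] = malloc(4096);          /* stands in for pinning a page */
		if (!pages[pinned])
			goto err_unpin;
	}

	/* ...later validation: any failure must also jump to err_unpin... */

	*pagesp = pages;
	return 0;

err_unpin:
	for (i = 0; i < pinned; i++)
		free(pages[i]);                        /* stands in for unpin_user_pages() */
	free(pages);
	return ret;
}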