Age | Commit message | Author | Files | Lines
2021-02-12 | io_uring: optimise SQPOLL mm/files grabbing | Pavel Begunkov | 1 | -14/+13
There are two reasons for this. The first is to optimise io_sq_thread_acquire_mm_files() for the non-SQPOLL case, which currently does too many checks and function calls in the hot path, e.g. in io_init_req(). The second is to not grab mm/files when they are not needed. As __io_queue_sqe() issues only one request now, we can reuse io_sq_thread_acquire_mm_files() instead of unconditionally acquiring mm/files. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-12 | io_uring: optimise out unlikely link queue | Pavel Begunkov | 1 | -32/+10
__io_queue_sqe() tries to issue as many requests of a link as it can, and uses io_put_req_find_next() to extract the next one, targeting inline-completed requests. Now that __io_queue_sqe() is always used together with struct io_comp_state, this next-propagation is left with only a small window, and only for async reqs, which doesn't justify its existence. Remove it and make __io_queue_sqe() issue only the head request. It simplifies the code and will allow other optimisations. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-12 | io_uring: take compl state from submit state | Pavel Begunkov | 1 | -6/+3
Completion and submission states are now coupled together; it's weird to get one from an argument and the other from ctx, so do it consistently for io_req_free_batch(). It's also faster as we already have @state cached in registers. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: inline io_complete_rw_common() | Pavel Begunkov | 1 | -17/+9
__io_complete_rw() casts the request to a kiocb only for it to be immediately container_of()'ed back by io_complete_rw_common(). And the latter function's name doesn't do a great job of illuminating its purpose, so just inline it into its only user. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: move res check out of io_rw_reissue() | Pavel Begunkov | 1 | -9/+8
We pass the return code into io_rw_reissue() only to be able to check if it's -EAGAIN. That's not the cleanest approach and may prevent inlining of the non-EAGAIN fast path, so do the check at the call sites. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: simplify iopoll reissuing | Pavel Begunkov | 1 | -21/+5
Don't stash -EAGAIN'ed iopoll requests into a list to reissue them later; do it eagerly. That removes the overhead of keeping and checking that list, and, in case of failure, allows these requests to be completed through the normal iopoll completion path. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: clean up io_req_free_batch_finish() | Pavel Begunkov | 1 | -3/+1
io_req_free_batch_finish() is final and does not permit struct req_batch to be reused without re-init. To be more consistent, don't clear ->task there. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: move submit side state closer in the ring | Jens Axboe | 1 | -7/+10
We recently added the submit side req cache, but it was placed at the end of the struct. Move it near the other submission state for better memory placement, and reshuffle a few other members at the same time. Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: assign file_slot prior to calling io_sqe_file_register() | Jens Axboe | 1 | -1/+2
We use the assigned slot in io_sqe_file_register(), and a previous patch moved the assignment to after we have called it. This isn't super pretty, and will get cleaned up in the future. For now, fix the regression by restoring the previous assignment/clear of the file_slot. Fixes: ea64ec02b31d ("io_uring: deduplicate file table slot calculation") Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: remove redundant initialization of variable ret | Colin Ian King | 1 | -1/+1
The variable ret is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: unpark SQPOLL thread for cancelation | Pavel Begunkov | 1 | -0/+5
We park the SQPOLL task before going into io_uring_cancel_files(), so the task won't run task_works, including those that might be important for the cancellation passes. In this case it's io_poll_remove_one(), which frees requests via io_put_req_deferred(). Unpark it while waiting; that's ok, as we disable submissions beforehand, so no new requests will be generated. INFO: task syz-executor893:8493 blocked for more than 143 seconds. Call Trace: context_switch kernel/sched/core.c:4327 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5078 schedule+0xcf/0x270 kernel/sched/core.c:5157 io_uring_cancel_files fs/io_uring.c:8912 [inline] io_uring_cancel_task_requests+0xe70/0x11a0 fs/io_uring.c:8979 __io_uring_files_cancel+0x110/0x1b0 fs/io_uring.c:9067 io_uring_files_cancel include/linux/io_uring.h:51 [inline] do_exit+0x2fe/0x2ae0 kernel/exit.c:780 do_group_exit+0x125/0x310 kernel/exit.c:922 __do_sys_exit_group kernel/exit.c:933 [inline] __se_sys_exit_group kernel/exit.c:931 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: [email protected] # 5.5+ Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: place ring SQ/CQ arrays under memcg memory limits | Jens Axboe | 1 | -75/+10
Instead of imposing rlimit memlock limits for the rings themselves, ensure that we account them properly under memcg with __GFP_ACCOUNT. We retain rlimit memlock for registered buffers; this is just for the ring arrays themselves. Signed-off-by: Jens Axboe <[email protected]>
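A minimal sketch of what the allocation side of this looks like (the flag set and the exact shape of io_mem_alloc() here are assumptions, not a verbatim copy of fs/io_uring.c): adding __GFP_ACCOUNT makes the ring pages memcg-charged rather than memlock-accounted.

static void *io_mem_alloc(size_t size)
{
	gfp_t gfp = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP |
		    __GFP_ACCOUNT;	/* charge the ring pages to the task's memcg */

	return (void *) __get_free_pages(gfp, get_order(size));
}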
2021-02-10 | io_uring: enable kmemcg account for io_uring requests | Jens Axboe | 1 | -1/+2
This puts io_uring under the memory cgroups accounting and limits for requests. Signed-off-by: Jens Axboe <[email protected]>
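The request-side counterpart is roughly a one-liner (a sketch; the exact slab flags in fs/io_uring.c are assumed): passing SLAB_ACCOUNT when creating the io_kiocb cache makes every request allocation count against the owning cgroup.

	/* io_kiocb allocations become memcg-accounted via SLAB_ACCOUNT */
	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
					  SLAB_ACCOUNT);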
2021-02-10 | io_uring: enable req cache for IRQ driven IO | Jens Axboe | 1 | -20/+51
This is the last class of requests that cannot utilize the req alloc cache. Add a per-ctx req cache that is protected by the completion_lock, and refill our submit side cache when it gets over our batch count. Signed-off-by: Jens Axboe <[email protected]>
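A rough sketch of that scheme (struct layout, field names, and IO_REQ_CACHE_SIZE are illustrative assumptions, not the exact fs/io_uring.c code): IRQ-driven completions stash freed requests on a list under completion_lock, and the submission side pulls a batch of them into its lockless cache when it runs dry.

struct io_comp_state {
	struct list_head	free_list;	/* protected by ctx->completion_lock */
	unsigned int		nr_free;
};

/* completion side, called with ctx->completion_lock held (e.g. from the IRQ path) */
static void io_req_cache_free_locked(struct io_ring_ctx *ctx, struct io_kiocb *req)
{
	list_add(&req->compl.list, &ctx->comp.free_list);
	ctx->comp.nr_free++;
}

/* submission side: move a batch from the locked list into the lockless cache */
static bool io_refill_req_cache(struct io_ring_ctx *ctx, struct io_submit_state *state)
{
	spin_lock_irq(&ctx->completion_lock);
	while (ctx->comp.nr_free && state->free_reqs < IO_REQ_CACHE_SIZE) {
		struct io_kiocb *req = list_first_entry(&ctx->comp.free_list,
							struct io_kiocb, compl.list);

		list_del(&req->compl.list);
		ctx->comp.nr_free--;
		state->reqs[state->free_reqs++] = req;
	}
	spin_unlock_irq(&ctx->completion_lock);
	return state->free_reqs != 0;
}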
2021-02-10 | io_uring: fix possible deadlock in io_uring_poll | Hao Xu | 1 | -2/+15
Abaci reported follow issue: [ 30.615891] ====================================================== [ 30.616648] WARNING: possible circular locking dependency detected [ 30.617423] 5.11.0-rc3-next-20210115 #1 Not tainted [ 30.618035] ------------------------------------------------------ [ 30.618914] a.out/1128 is trying to acquire lock: [ 30.619520] ffff88810b063868 (&ep->mtx){+.+.}-{3:3}, at: __ep_eventpoll_poll+0x9f/0x220 [ 30.620505] [ 30.620505] but task is already holding lock: [ 30.621218] ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 30.622349] [ 30.622349] which lock already depends on the new lock. [ 30.622349] [ 30.623289] [ 30.623289] the existing dependency chain (in reverse order) is: [ 30.624243] [ 30.624243] -> #1 (&ctx->uring_lock){+.+.}-{3:3}: [ 30.625263] lock_acquire+0x2c7/0x390 [ 30.625868] __mutex_lock+0xae/0x9f0 [ 30.626451] io_cqring_overflow_flush.part.95+0x6d/0x70 [ 30.627278] io_uring_poll+0xcb/0xd0 [ 30.627890] ep_item_poll.isra.14+0x4e/0x90 [ 30.628531] do_epoll_ctl+0xb7e/0x1120 [ 30.629122] __x64_sys_epoll_ctl+0x70/0xb0 [ 30.629770] do_syscall_64+0x2d/0x40 [ 30.630332] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 30.631187] [ 30.631187] -> #0 (&ep->mtx){+.+.}-{3:3}: [ 30.631985] check_prevs_add+0x226/0xb00 [ 30.632584] __lock_acquire+0x1237/0x13a0 [ 30.633207] lock_acquire+0x2c7/0x390 [ 30.633740] __mutex_lock+0xae/0x9f0 [ 30.634258] __ep_eventpoll_poll+0x9f/0x220 [ 30.634879] __io_arm_poll_handler+0xbf/0x220 [ 30.635462] io_issue_sqe+0xa6b/0x13e0 [ 30.635982] __io_queue_sqe+0x10b/0x550 [ 30.636648] io_queue_sqe+0x235/0x470 [ 30.637281] io_submit_sqes+0xcce/0xf10 [ 30.637839] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 30.638465] do_syscall_64+0x2d/0x40 [ 30.638999] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 30.639643] [ 30.639643] other info that might help us debug this: [ 30.639643] [ 30.640618] Possible unsafe locking scenario: [ 30.640618] [ 30.641402] CPU0 CPU1 [ 30.641938] ---- ---- [ 30.642664] lock(&ctx->uring_lock); [ 30.643425] lock(&ep->mtx); [ 30.644498] lock(&ctx->uring_lock); [ 30.645668] lock(&ep->mtx); [ 30.646321] [ 30.646321] *** DEADLOCK *** [ 30.646321] [ 30.647642] 1 lock held by a.out/1128: [ 30.648424] #0: ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 30.649954] [ 30.649954] stack backtrace: [ 30.650592] CPU: 1 PID: 1128 Comm: a.out Not tainted 5.11.0-rc3-next-20210115 #1 [ 30.651554] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 30.652290] Call Trace: [ 30.652688] dump_stack+0xac/0xe3 [ 30.653164] check_noncircular+0x11e/0x130 [ 30.653747] ? check_prevs_add+0x226/0xb00 [ 30.654303] check_prevs_add+0x226/0xb00 [ 30.654845] ? add_lock_to_list.constprop.49+0xac/0x1d0 [ 30.655564] __lock_acquire+0x1237/0x13a0 [ 30.656262] lock_acquire+0x2c7/0x390 [ 30.656788] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.657379] ? __io_queue_proc.isra.88+0x180/0x180 [ 30.658014] __mutex_lock+0xae/0x9f0 [ 30.658524] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.659112] ? mark_held_locks+0x5a/0x80 [ 30.659648] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.660229] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 30.660885] ? trace_hardirqs_on+0x46/0x110 [ 30.661471] ? __io_queue_proc.isra.88+0x180/0x180 [ 30.662102] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.662696] __ep_eventpoll_poll+0x9f/0x220 [ 30.663273] ? __ep_eventpoll_poll+0x220/0x220 [ 30.663875] __io_arm_poll_handler+0xbf/0x220 [ 30.664463] io_issue_sqe+0xa6b/0x13e0 [ 30.664984] ? __lock_acquire+0x782/0x13a0 [ 30.665544] ? 
__io_queue_proc.isra.88+0x180/0x180 [ 30.666170] ? __io_queue_sqe+0x10b/0x550 [ 30.666725] __io_queue_sqe+0x10b/0x550 [ 30.667252] ? __fget_files+0x131/0x260 [ 30.667791] ? io_req_prep+0xd8/0x1090 [ 30.668316] ? io_queue_sqe+0x235/0x470 [ 30.668868] io_queue_sqe+0x235/0x470 [ 30.669398] io_submit_sqes+0xcce/0xf10 [ 30.669931] ? xa_load+0xe4/0x1c0 [ 30.670425] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 30.671051] ? lockdep_hardirqs_on_prepare+0xde/0x180 [ 30.671719] ? syscall_enter_from_user_mode+0x2b/0x80 [ 30.672380] do_syscall_64+0x2d/0x40 [ 30.672901] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 30.673503] RIP: 0033:0x7fd89c813239 [ 30.673962] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 30.675920] RSP: 002b:00007ffc65a7c628 EFLAGS: 00000217 ORIG_RAX: 00000000000001aa [ 30.676791] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd89c813239 [ 30.677594] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 0000000000000003 [ 30.678678] RBP: 00007ffc65a7c720 R08: 0000000000000000 R09: 0000000003000000 [ 30.679492] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400ff0 [ 30.680282] R13: 00007ffc65a7c840 R14: 0000000000000000 R15: 0000000000000000 This might happen if we do epoll_wait on a uring fd while reading/writing the former epoll fd in a sqe in the former uring instance. So let's don't flush cqring overflow list, just do a simple check. Reported-by: Abaci <[email protected]> Fixes: 6c503150ae33 ("io_uring: patch up IOPOLL overflow_flush sync") Signed-off-by: Hao Xu <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
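The fix boils down to replacing the overflow flush in io_uring_poll() with a lockless peek, roughly as below (a sketch; io_cqring_events() and the cq_check_overflow bit match the names used in this code, but the exact lines are paraphrased).

	/*
	 * Don't flush the overflow list here -- that needs uring_lock and can
	 * deadlock against ep->mtx. A racy check is enough: report readable
	 * if there are CQEs or if the overflow bit is set.
	 */
	if (io_cqring_events(ctx) || test_bit(0, &ctx->cq_check_overflow))
		mask |= EPOLLIN | EPOLLRDNORM;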
2021-02-10 | io_uring: defer flushing cached reqs | Pavel Begunkov | 1 | -9/+19
While there are requests in the allocation cache, use them; only when those have run out go for the stashed memory in comp.free_list. As list manipulations are generally heavy and not cache friendly, flush them all, or as many as we can, in one go. Signed-off-by: Pavel Begunkov <[email protected]> [axboe: return success/failure from io_flush_cached_reqs()] Signed-off-by: Jens Axboe <[email protected]>
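With that, the allocation path roughly looks like this (a sketch; io_flush_cached_reqs() is the helper mentioned in the note above, while the surrounding field names are assumptions): take from the lockless cache first, flush the locked free_list only when the cache is empty, and fall back to the slab allocator last.

static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
{
	struct io_submit_state *state = &ctx->submit_state;

	if (!state->free_reqs) {
		/* cache is dry: flush comp.free_list into it in one go */
		if (!io_flush_cached_reqs(ctx))
			return kmem_cache_alloc(req_cachep, GFP_KERNEL);
	}
	return state->reqs[--state->free_reqs];
}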
2021-02-10 | io_uring: take comp_state from ctx | Pavel Begunkov | 1 | -19/+18
__io_queue_sqe() is always called with a non-NULL comp_state, which is taken directly from the context. Don't pass it around; infer it from ctx. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: enable req cache for task_work items | Jens Axboe | 1 | -1/+21
task_work is run without utilizing the req alloc cache, so any deferred items don't get to take advantage of either the alloc or free side of it. With task_work now being wrapped by io_uring, we can use the ctx completion state to get both the req cache and the completion flush batching. With this, the only request type that cannot take advantage of the req cache is IRQ driven IO for regular files / block devices. Anything else, including IOPOLL polled IO to those same types, will take advantage of it. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: provide FIFO ordering for task_work | Jens Axboe | 3 | -16/+108
task_work is a LIFO list, due to how it's implemented as a lockless list. For long chains of task_work, this can be problematic as the first entry added is the last one processed. Similarly, we'd waste a lot of CPU cycles reversing this list. Wrap the task_work so we have a single task_work entry per task per ctx, and use that to run it in the right order. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
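Conceptually the wrapper looks something like the sketch below (a simplified illustration, not the actual fs/io_uring.c implementation; the structs, io_task_work_add()/io_task_work_run(), and the queued flag are all assumptions): every item goes onto a per-task FIFO list, and only one real task_work callback is queued at a time to drain it in order.

struct io_work_item {
	struct list_head	list;
	void			(*func)(struct io_work_item *);
};

struct io_task_ctx {
	struct callback_head	task_work;	/* the single entry handed to task_work_add() */
	struct list_head	work_list;	/* FIFO of pending items */
	spinlock_t		lock;
	bool			queued;		/* is task_work already pending? */
};

static void io_task_work_run(struct callback_head *cb)
{
	struct io_task_ctx *tctx = container_of(cb, struct io_task_ctx, task_work);
	LIST_HEAD(list);

	spin_lock_irq(&tctx->lock);
	list_splice_init(&tctx->work_list, &list);	/* grab everything, order preserved */
	tctx->queued = false;
	spin_unlock_irq(&tctx->lock);

	while (!list_empty(&list)) {
		struct io_work_item *item = list_first_entry(&list, struct io_work_item, list);

		list_del(&item->list);
		item->func(item);	/* runs in submission (FIFO) order */
	}
}

/* tctx->task_work is set up once with init_task_work(&tctx->task_work, io_task_work_run) */
static int io_task_work_add(struct task_struct *task, struct io_task_ctx *tctx,
			    struct io_work_item *item)
{
	bool need_queue;

	spin_lock_irq(&tctx->lock);
	list_add_tail(&item->list, &tctx->work_list);	/* tail-add keeps FIFO order */
	need_queue = !tctx->queued;
	tctx->queued = true;
	spin_unlock_irq(&tctx->lock);

	if (need_queue)
		return task_work_add(task, &tctx->task_work, TWA_SIGNAL);
	return 0;
}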
2021-02-10 | io_uring: use persistent request cache | Jens Axboe | 1 | -18/+30
Now that we have the submit_state in the ring itself, we can have io_kiocb allocations that are persistent across invocations. This reduces the time spent doing slab allocations and frees. [sil: rebased] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: feed reqs back into alloc cache | Pavel Begunkov | 1 | -7/+12
Make io_req_free_batch(), which is used for inline-executed requests and IOPOLL, return requests back into the allocation cache, so we avoid most of the kmalloc()/kfree() for those cases. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: persistent req cache | Pavel Begunkov | 1 | -8/+13
Don't free batch-allocated requests across syscalls. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: count ctx refs separately from reqs | Pavel Begunkov | 1 | -1/+5
Currently, batch free handles request memory freeing and ctx ref putting together. Separate them and use different counters; that will be needed for reusing reqs memory. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: remove fallback_req | Pavel Begunkov | 1 | -36/+2
Remove fallback_req for now, it gets in the way of other changes. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: submit-completion free batching | Pavel Begunkov | 1 | -20/+29
io_submit_flush_completions() does completion batching, but may also use free batching as iopoll does. The main beneficiaries should be buffered reads/writes and send/recv. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: replace list with array for compl batch | Pavel Begunkov | 1 | -24/+11
Reincarnation of an old patch that replaces a list in struct io_compl_batch with an array. It's needed to avoid hooking requests via their compl.list, because it won't always be available in the future. It's also nice to split io_submit_flush_completions() so that freeing happens outside the locks, removing an unlock/lock pair along with a long comment describing when that can be done. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
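The gist of the array version, sketched below (names roughly follow the surrounding commits; the batch size, helper signatures, and the split between fill-under-lock and free-outside-lock are assumptions based on the message, not the exact code).

#define IO_COMPL_BATCH	32	/* illustrative batch size */

struct io_comp_state {
	unsigned int		nr;
	struct io_kiocb		*reqs[IO_COMPL_BATCH];	/* array instead of compl.list linkage */
};

static void io_submit_flush_completions(struct io_ring_ctx *ctx, struct io_comp_state *cs)
{
	int i;

	/* post CQEs under completion_lock... */
	spin_lock_irq(&ctx->completion_lock);
	for (i = 0; i < cs->nr; i++)
		__io_cqring_fill_event(cs->reqs[i], cs->reqs[i]->result, 0);
	io_commit_cqring(ctx);
	spin_unlock_irq(&ctx->completion_lock);
	io_cqring_ev_posted(ctx);

	/* ...but drop request references outside of it */
	for (i = 0; i < cs->nr; i++)
		io_put_req(cs->reqs[i]);
	cs->nr = 0;
}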
2021-02-10 | io_uring: don't reinit submit state every time | Pavel Begunkov | 1 | -5/+4
As submit_state is now retained across syscalls, we can save ourselves from initialising it from the ground up for each io_submit_sqes(). Set some fields during ctx allocation, and just keep them always consistent. Signed-off-by: Pavel Begunkov <[email protected]> [axboe: remove unnecessary zeroing of ctx members] Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: remove ctx from comp_state | Pavel Begunkov | 1 | -11/+9
Completion state is closely bound to ctx; we don't need to store ctx inside, as we always have it around to pass to flush. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: don't keep submit_state on stack | Pavel Begunkov | 1 | -44/+46
struct io_submit_state is quite big (168 bytes) and going to grow. It's better not to keep it on the stack as it is now. Move it into the context; it's always protected by uring_lock, so it's fine to have only one instance of it. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: don't propagate io_comp_state | Pavel Begunkov | 1 | -91/+73
There is no reason to drag io_comp_state into opcode handlers; we just need a flag, and the actual work will be done in __io_queue_sqe(). Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-09 | io_uring: make op handlers always take issue flags | Pavel Begunkov | 1 | -12/+13
Make opcode handler interfaces a bit more consistent by always passing in issue flags. A bulky but pretty easy and mechanical change. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-09 | io_uring: replace force_nonblock with flags | Pavel Begunkov | 1 | -82/+96
Replace bool force_nonblock with flags. This has the long-standing goal of differentiating the context from which we execute. Currently we have some places where invariants, like holding uring_lock, are only subtly inferred. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
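In code terms, the bool becomes a bitmask along these lines (IO_URING_F_NONBLOCK and IO_URING_F_COMPLETE_DEFER follow the naming used by this series; treat the exact values and the full set as a sketch).

/*
 * before: static int io_read(struct io_kiocb *req, bool force_nonblock);
 * after:  static int io_read(struct io_kiocb *req, unsigned int issue_flags);
 *
 * inside a handler the old bool is recovered as:
 *	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 */
enum {
	IO_URING_F_NONBLOCK		= 1,	/* handler must not block */
	IO_URING_F_COMPLETE_DEFER	= 2,	/* uring_lock held, completions may be batched */
};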
2021-02-08 | io_uring: cleanup up cancel SQPOLL reqs across exec | Pavel Begunkov | 1 | -21/+36
For SQPOLL rings, tctx_inflight() always returns zero, so it might skip doing a full cancellation. It's fine because we jam all sqpoll submissions in any case and do go through files cancel for them, but it's not nice. Do the intended full cancellation by mimicking __io_uring_task_cancel()'s waiting but impersonating the SQPOLL task. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-05 | io_uring: refactor sendmsg/recvmsg iov managing | Pavel Begunkov | 1 | -30/+27
Current iov handling with recvmsg/sendmsg may be confusing. First, make a rule for msg->iov: either it points to an allocated iov that has to be kfree()'d later, or it's NULL and we use fast_iov. That's much better than the current 3-state (it can also point to fast_iov). And rename it to free_iov for uniformity with read/write. Also, the fixing up of msg.msg_iter.iov after copying struct io_async_msghdr has been happening in io_recvmsg()/io_sendmsg(); move it into io_setup_async_msg(), which is the right place. Signed-off-by: Pavel Begunkov <[email protected]> [axboe: add comment on NULL check before kfree()] Signed-off-by: Jens Axboe <[email protected]>
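The resulting rule can be sketched like this (the struct layout is trimmed and the cleanup helper name is made up for illustration): free_iov is either NULL, meaning the inline fast_iov is in use, or a kmalloc'ed array the request owns and must kfree().

struct io_async_msghdr {
	struct iovec	fast_iov[UIO_FASTIOV];
	/* either NULL (fast_iov in use) or a kmalloc'ed iovec we must free */
	struct iovec	*free_iov;
	struct msghdr	msg;
};

static void io_msghdr_cleanup(struct io_async_msghdr *kmsg)
{
	/* kfree(NULL) is a no-op, so the NULL/fast_iov case needs no special handling */
	kfree(kmsg->free_iov);
	kmsg->free_iov = NULL;
}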
2021-02-05 | io_uring: clean iov usage for recvmsg buf select | Pavel Begunkov | 1 | -6/+4
Don't pretend we don't know that REQ_F_BUFFER_SELECT for recvmsg always uses fast_iov -- clean up the confusing intermixing of kmsg->iov and kmsg->fast_iov for buffer select. Also don't init the iter with garbage in __io_recvmsg_copy_hdr() only for it to be set shortly after in io_recvmsg(). Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-05 | io_uring: set msg_name on msg fixup | Pavel Begunkov | 1 | -2/+1
io_setup_async_msg() should fully prepare io_async_msghdr, let it also handle assigning msg_name and don't hand code it in [send,recv]msg(). Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring/io-wq: return 2-step work swap scheme | Pavel Begunkov | 3 | -34/+12
Saving one lock/unlock for io-wq is not super important, but it adds some ugliness to the code. More importantly, atomic decs that don't bring the count to zero won't give the right ordering/barriers on some archs, so io_steal_work() may pretty easily get subtly and completely broken. Bring back the 2-step io-wq work exchange and clean it up. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: deduplicate file table slot calculation | Pavel Begunkov | 1 | -22/+19
Extract a helper io_fixed_file_slot() returning a place in our fixed files table, so we don't hand-code it three times in the code. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
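The helper is essentially a two-level table lookup; a sketch is below (struct names and the SHIFT/MASK constants reflect how the fixed file table is split into pages, but treat them as assumptions).

static struct file **io_fixed_file_slot(struct fixed_rsrc_data *file_data, unsigned int i)
{
	/* pick the table page first, then the slot within that page */
	struct fixed_rsrc_table *table = &file_data->table[i >> IORING_FILE_TABLE_SHIFT];

	return &table->files[i & IORING_FILE_TABLE_MASK];
}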
2021-02-04 | io_uring: io_import_iovec return type cleanup | Pavel Begunkov | 1 | -19/+11
io_import_iovec() doesn't return the IO size anymore, only an error code. Make that more apparent by returning int instead of ssize_t, and clean up the leftovers. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: treat NONBLOCK and RWF_NOWAIT similarly | Pavel Begunkov | 1 | -14/+16
Make the decision about whether we need to retry a read/write similar for O_NONBLOCK and RWF_NOWAIT: set REQ_F_NOWAIT when either is specified and use it for all relevant checks. Also fix resubmitting NOWAIT requests via io_rw_reissue(). Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
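The unified check amounts to something like this (a sketch; the helper name io_rw_set_nowait() is illustrative, while IOCB_NOWAIT, O_NONBLOCK and REQ_F_NOWAIT are the real flags involved).

/* both RWF_NOWAIT (surfaced as IOCB_NOWAIT on the kiocb) and O_NONBLOCK set the same flag */
static void io_rw_set_nowait(struct io_kiocb *req, struct kiocb *kiocb)
{
	if ((kiocb->ki_flags & IOCB_NOWAIT) || (req->file->f_flags & O_NONBLOCK))
		req->flags |= REQ_F_NOWAIT;
}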
2021-02-04 | io_uring: highlight read-retry loop | Pavel Begunkov | 1 | -20/+20
We already have an implicit do-while for read retries, but with a goto at the end. Convert it to an actual do-while; it highlights the loop, making it a bit more understandable, and is cleaner in general. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: inline io_read()'s iovec freeing | Pavel Begunkov | 1 | -18/+13
io_read() doesn't have the simplest control flow, with a lot of jumps, and it's hard to read. One of those is an out_free: label, which frees the iovec. However, from the middle of io_read() the iovec is NULL'ed, so kfree(iovec) is a no-op; that leaves us with two places where we can inline it and further clean up the code. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: don't forget to adjust io_size | Pavel Begunkov | 1 | -9/+5
We have an invariant in io_read(): how much we're trying to read is spilled into both the iter and the io_size variable. The latter controls the decision making about whether to do read retries. However, io_size is modified only after the first read attempt, so if we happen to go for a third retry in a single call to io_read(), io_size will be greater than what's in the iterator, which may lead to various side effects, up to live-locking. Modify io_size each time. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: let io_setup_async_rw take care of iovec | Pavel Begunkov | 1 | -15/+9
Now we hand ownership of the iovec to io_setup_async_rw(), so it either sets the request's context right or frees the iovec itself on error. That makes our life a bit easier at call sites. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: further simplify do_read error parsing | Pavel Begunkov | 1 | -9/+8
First, instead of checking iov_iter_count(iter) for 0 to find out that all needed bytes were read, just compare returned code against io_size. It's more reliable and arguably cleaner. Also, place the half-read case into an else branch and delete an extra label. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: refactor io_read for unsupported nowait | Pavel Begunkov | 1 | -6/+6
The !io_file_supports_async() case of io_read() is hard to read; it jumps somewhere into the middle of the function just to do async setup and fail on a similar check. Call io_setup_async_rw() directly for this case, it's much easier to follow. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: refactor io_cqring_wait | Pavel Begunkov | 1 | -21/+22
It's easy to make a mistake in io_cqring_wait() because for all break/continue clauses we need to make sure prepare/finish_wait are used correctly. Extract all of that into a new helper, io_cqring_wait_schedule(), transforming the loop into a simple series of function calls: prepare(); check_and_schedule(); finish(); Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
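After the refactoring, the waiting loop reads roughly like this (a sketch; io_cqring_wait_schedule() is the helper named above, but its exact signature and return convention are assumptions).

	do {
		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq, TASK_INTERRUPTIBLE);
		/* returns > 0 to keep waiting, <= 0 to stop (done, signal, or timeout) */
		ret = io_cqring_wait_schedule(ctx, &iowq, &timeout);
		finish_wait(&ctx->wait, &iowq.wq);
	} while (ret > 0);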
2021-02-04 | io_uring: refactor scheduling in io_cqring_wait | Pavel Begunkov | 1 | -11/+8
schedule_timeout() with timeout=MAX_SCHEDULE_TIMEOUT is guaranteed to work just like schedule(), so instead of hand-coding the choice based on arguments, always use the timeout version and simplify the code. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
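In other words, the hand-rolled branch can collapse into a single call, along these lines (a sketch; the -ETIME return mirrors how an expired wait is reported here, but treat it as an assumption).

	/* before: if (uts) timeout = schedule_timeout(timeout); else schedule(); */
	timeout = schedule_timeout(timeout);	/* MAX_SCHEDULE_TIMEOUT behaves like plain schedule() */
	if (!timeout)
		return -ETIME;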
2021-02-04 | io_uring: deduplicate core cancellations sequence | Pavel Begunkov | 1 | -45/+40
Files and task cancellations go over the same steps trying to cancel requests in io-wq, poll, etc. Deduplicate it with a helper. Note: the new io_uring_try_cancel_requests() is the former __io_uring_cancel_task_requests() with files passed as an argument and flushing of overflowed requests added. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01 | io_uring: simplify do_read return parsing | Pavel Begunkov | 1 | -5/+2
do_read() returning 0 bytes read (not -EAGAIN/etc.) is not an important enough case to prioritise. Fold it into the ret < 0 check, so we get rid of an extra if and make it a bit more readable. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>