Age | Commit message | Author | Files | Lines
2021-02-12 | io_uring: optimise SQPOLL mm/files grabbing | Pavel Begunkov | 1 | -14/+13
There are two reasons for this. The first is to optimise io_sq_thread_acquire_mm_files() for the non-SQPOLL case, which currently does too many checks and function calls in the hot path, e.g. in io_init_req(). The second is to not grab mm/files when they are not needed. As __io_queue_sqe() issues only one request now, we can reuse io_sq_thread_acquire_mm_files() instead of unconditionally acquiring mm/files. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-12 | io_uring: optimise out unlikely link queue | Pavel Begunkov | 1 | -32/+10
__io_queue_sqe() tries to issue as many requests of a link as it can, and uses io_put_req_find_next() to extract the next one, targeting inline-completed requests. Now that __io_queue_sqe() is always used together with struct io_comp_state, this next-propagation is left with only a small window, and only for async reqs, which doesn't justify its existence. Remove it and make __io_queue_sqe() issue only the head request. It simplifies the code and will allow other optimisations. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-12 | io_uring: take compl state from submit state | Pavel Begunkov | 1 | -6/+3
Completion and submission states are now coupled together; it's weird to get one from an argument and the other from ctx, so do it consistently for io_req_free_batch(). It's also faster as we already have @state cached in registers. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: inline io_complete_rw_common() | Pavel Begunkov | 1 | -17/+9
__io_complete_rw() casts the request to a kiocb only for it to be immediately container_of()'ed back by io_complete_rw_common(). And the latter function's name doesn't do a great job of illuminating its purpose, so just inline it into its only user. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: move res check out of io_rw_reissue() | Pavel Begunkov | 1 | -9/+8
We pass the return code into io_rw_reissue() only to be able to check if it's -EAGAIN. That's not the cleanest approach and may prevent inlining of the non-EAGAIN fast path, so do the check at the call sites. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: simplify iopoll reissuing | Pavel Begunkov | 1 | -21/+5
Don't stash -EAGAIN'ed iopoll requests into a list to reissue them later; do it eagerly. That removes the overhead of keeping and checking that list, and, in case of failure, allows these requests to be completed through the normal iopoll completion path. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: clean up io_req_free_batch_finish() | Pavel Begunkov | 1 | -3/+1
io_req_free_batch_finish() is final and does not permit struct req_batch to be reused without re-init. To be more consistent, don't clear ->task there. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: move submit side state closer in the ring | Jens Axboe | 1 | -7/+10
We recently added the submit side req cache, but it was placed at the end of the struct. Move it near the other submission state for better memory placement, and reshuffle a few other members at the same time. Signed-off-by: Jens Axboe <[email protected]>
2021-02-11 | io_uring: assign file_slot prior to calling io_sqe_file_register() | Jens Axboe | 1 | -1/+2
We use the assigned slot in io_sqe_file_register(), and a previous patch moved the assignment to after we have called it. This isn't super pretty, and will get cleaned up in the future. For now, fix the regression by restoring the previous assignment/clear of the file_slot. Fixes: ea64ec02b31d ("io_uring: deduplicate file table slot calculation") Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: remove redundant initialization of variable ret | Colin Ian King | 1 | -1/+1
The variable ret is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: unpark SQPOLL thread for cancelation | Pavel Begunkov | 1 | -0/+5
We park the SQPOLL task before going into io_uring_cancel_files(), so the task won't run task_works, including those that might be important for the cancellation passes. In this case it's io_poll_remove_one(), which frees requests via io_put_req_deferred(). Unpark it while waiting; that's ok, as we disable submissions beforehand, so no new requests will be generated. INFO: task syz-executor893:8493 blocked for more than 143 seconds. Call Trace: context_switch kernel/sched/core.c:4327 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5078 schedule+0xcf/0x270 kernel/sched/core.c:5157 io_uring_cancel_files fs/io_uring.c:8912 [inline] io_uring_cancel_task_requests+0xe70/0x11a0 fs/io_uring.c:8979 __io_uring_files_cancel+0x110/0x1b0 fs/io_uring.c:9067 io_uring_files_cancel include/linux/io_uring.h:51 [inline] do_exit+0x2fe/0x2ae0 kernel/exit.c:780 do_group_exit+0x125/0x310 kernel/exit.c:922 __do_sys_exit_group kernel/exit.c:933 [inline] __se_sys_exit_group kernel/exit.c:931 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: [email protected] # 5.5+ Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: place ring SQ/CQ arrays under memcg memory limits | Jens Axboe | 1 | -75/+10
Instead of imposing rlimit memlock limits for the rings themselves, ensure that we account them properly under memcg with __GFP_ACCOUNT. We retain rlimit memlock for registered buffers; this is just for the ring arrays themselves. Signed-off-by: Jens Axboe <[email protected]>
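A minimal sketch of what the allocation side of this looks like (the flag set and the exact shape of io_mem_alloc() here are assumptions, not a verbatim copy of fs/io_uring.c): adding __GFP_ACCOUNT makes the ring pages memcg-charged rather than memlock-accounted.

static void *io_mem_alloc(size_t size)
{
	gfp_t gfp = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP |
		    __GFP_ACCOUNT;	/* charge the ring pages to the task's memcg */

	return (void *) __get_free_pages(gfp, get_order(size));
}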
2021-02-10 | io_uring: enable kmemcg account for io_uring requests | Jens Axboe | 1 | -1/+2
This puts io_uring under the memory cgroups accounting and limits for requests. Signed-off-by: Jens Axboe <[email protected]>
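The request-side counterpart is roughly a one-liner (a sketch; the exact slab flags in fs/io_uring.c are assumed): passing SLAB_ACCOUNT when creating the io_kiocb cache makes every request allocation count against the owning cgroup.

	/* io_kiocb allocations become memcg-accounted via SLAB_ACCOUNT */
	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
					  SLAB_ACCOUNT);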
2021-02-10 | io_uring: enable req cache for IRQ driven IO | Jens Axboe | 1 | -20/+51
This is the last class of requests that cannot utilize the req alloc cache. Add a per-ctx req cache that is protected by the completion_lock, and refill our submit side cache when it gets over our batch count. Signed-off-by: Jens Axboe <[email protected]>
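A rough sketch of that scheme (struct layout, field names, and IO_REQ_CACHE_SIZE are illustrative assumptions, not the exact fs/io_uring.c code): IRQ-driven completions stash freed requests on a list under completion_lock, and the submission side pulls a batch of them into its lockless cache when it runs dry.

struct io_comp_state {
	struct list_head	free_list;	/* protected by ctx->completion_lock */
	unsigned int		nr_free;
};

/* completion side, called with ctx->completion_lock held (e.g. from the IRQ path) */
static void io_req_cache_free_locked(struct io_ring_ctx *ctx, struct io_kiocb *req)
{
	list_add(&req->compl.list, &ctx->comp.free_list);
	ctx->comp.nr_free++;
}

/* submission side: move a batch from the locked list into the lockless cache */
static bool io_refill_req_cache(struct io_ring_ctx *ctx, struct io_submit_state *state)
{
	spin_lock_irq(&ctx->completion_lock);
	while (ctx->comp.nr_free && state->free_reqs < IO_REQ_CACHE_SIZE) {
		struct io_kiocb *req = list_first_entry(&ctx->comp.free_list,
							struct io_kiocb, compl.list);

		list_del(&req->compl.list);
		ctx->comp.nr_free--;
		state->reqs[state->free_reqs++] = req;
	}
	spin_unlock_irq(&ctx->completion_lock);
	return state->free_reqs != 0;
}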
2021-02-10 | io_uring: fix possible deadlock in io_uring_poll | Hao Xu | 1 | -2/+15
Abaci reported follow issue: [ 30.615891] ====================================================== [ 30.616648] WARNING: possible circular locking dependency detected [ 30.617423] 5.11.0-rc3-next-20210115 #1 Not tainted [ 30.618035] ------------------------------------------------------ [ 30.618914] a.out/1128 is trying to acquire lock: [ 30.619520] ffff88810b063868 (&ep->mtx){+.+.}-{3:3}, at: __ep_eventpoll_poll+0x9f/0x220 [ 30.620505] [ 30.620505] but task is already holding lock: [ 30.621218] ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 30.622349] [ 30.622349] which lock already depends on the new lock. [ 30.622349] [ 30.623289] [ 30.623289] the existing dependency chain (in reverse order) is: [ 30.624243] [ 30.624243] -> #1 (&ctx->uring_lock){+.+.}-{3:3}: [ 30.625263] lock_acquire+0x2c7/0x390 [ 30.625868] __mutex_lock+0xae/0x9f0 [ 30.626451] io_cqring_overflow_flush.part.95+0x6d/0x70 [ 30.627278] io_uring_poll+0xcb/0xd0 [ 30.627890] ep_item_poll.isra.14+0x4e/0x90 [ 30.628531] do_epoll_ctl+0xb7e/0x1120 [ 30.629122] __x64_sys_epoll_ctl+0x70/0xb0 [ 30.629770] do_syscall_64+0x2d/0x40 [ 30.630332] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 30.631187] [ 30.631187] -> #0 (&ep->mtx){+.+.}-{3:3}: [ 30.631985] check_prevs_add+0x226/0xb00 [ 30.632584] __lock_acquire+0x1237/0x13a0 [ 30.633207] lock_acquire+0x2c7/0x390 [ 30.633740] __mutex_lock+0xae/0x9f0 [ 30.634258] __ep_eventpoll_poll+0x9f/0x220 [ 30.634879] __io_arm_poll_handler+0xbf/0x220 [ 30.635462] io_issue_sqe+0xa6b/0x13e0 [ 30.635982] __io_queue_sqe+0x10b/0x550 [ 30.636648] io_queue_sqe+0x235/0x470 [ 30.637281] io_submit_sqes+0xcce/0xf10 [ 30.637839] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 30.638465] do_syscall_64+0x2d/0x40 [ 30.638999] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 30.639643] [ 30.639643] other info that might help us debug this: [ 30.639643] [ 30.640618] Possible unsafe locking scenario: [ 30.640618] [ 30.641402] CPU0 CPU1 [ 30.641938] ---- ---- [ 30.642664] lock(&ctx->uring_lock); [ 30.643425] lock(&ep->mtx); [ 30.644498] lock(&ctx->uring_lock); [ 30.645668] lock(&ep->mtx); [ 30.646321] [ 30.646321] *** DEADLOCK *** [ 30.646321] [ 30.647642] 1 lock held by a.out/1128: [ 30.648424] #0: ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 30.649954] [ 30.649954] stack backtrace: [ 30.650592] CPU: 1 PID: 1128 Comm: a.out Not tainted 5.11.0-rc3-next-20210115 #1 [ 30.651554] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 30.652290] Call Trace: [ 30.652688] dump_stack+0xac/0xe3 [ 30.653164] check_noncircular+0x11e/0x130 [ 30.653747] ? check_prevs_add+0x226/0xb00 [ 30.654303] check_prevs_add+0x226/0xb00 [ 30.654845] ? add_lock_to_list.constprop.49+0xac/0x1d0 [ 30.655564] __lock_acquire+0x1237/0x13a0 [ 30.656262] lock_acquire+0x2c7/0x390 [ 30.656788] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.657379] ? __io_queue_proc.isra.88+0x180/0x180 [ 30.658014] __mutex_lock+0xae/0x9f0 [ 30.658524] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.659112] ? mark_held_locks+0x5a/0x80 [ 30.659648] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.660229] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 30.660885] ? trace_hardirqs_on+0x46/0x110 [ 30.661471] ? __io_queue_proc.isra.88+0x180/0x180 [ 30.662102] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.662696] __ep_eventpoll_poll+0x9f/0x220 [ 30.663273] ? __ep_eventpoll_poll+0x220/0x220 [ 30.663875] __io_arm_poll_handler+0xbf/0x220 [ 30.664463] io_issue_sqe+0xa6b/0x13e0 [ 30.664984] ? __lock_acquire+0x782/0x13a0 [ 30.665544] ? 
__io_queue_proc.isra.88+0x180/0x180 [ 30.666170] ? __io_queue_sqe+0x10b/0x550 [ 30.666725] __io_queue_sqe+0x10b/0x550 [ 30.667252] ? __fget_files+0x131/0x260 [ 30.667791] ? io_req_prep+0xd8/0x1090 [ 30.668316] ? io_queue_sqe+0x235/0x470 [ 30.668868] io_queue_sqe+0x235/0x470 [ 30.669398] io_submit_sqes+0xcce/0xf10 [ 30.669931] ? xa_load+0xe4/0x1c0 [ 30.670425] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 30.671051] ? lockdep_hardirqs_on_prepare+0xde/0x180 [ 30.671719] ? syscall_enter_from_user_mode+0x2b/0x80 [ 30.672380] do_syscall_64+0x2d/0x40 [ 30.672901] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 30.673503] RIP: 0033:0x7fd89c813239 [ 30.673962] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 30.675920] RSP: 002b:00007ffc65a7c628 EFLAGS: 00000217 ORIG_RAX: 00000000000001aa [ 30.676791] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd89c813239 [ 30.677594] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 0000000000000003 [ 30.678678] RBP: 00007ffc65a7c720 R08: 0000000000000000 R09: 0000000003000000 [ 30.679492] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400ff0 [ 30.680282] R13: 00007ffc65a7c840 R14: 0000000000000000 R15: 0000000000000000 This might happen if we do epoll_wait on a uring fd while reading/writing the former epoll fd in a sqe in the former uring instance. So let's don't flush cqring overflow list, just do a simple check. Reported-by: Abaci <[email protected]> Fixes: 6c503150ae33 ("io_uring: patch up IOPOLL overflow_flush sync") Signed-off-by: Hao Xu <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
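The fix boils down to replacing the overflow flush in io_uring_poll() with a lockless peek, roughly as below (a sketch; io_cqring_events() and the cq_check_overflow bit match the names used in this code, but the exact lines are paraphrased).

	/*
	 * Don't flush the overflow list here -- that needs uring_lock and can
	 * deadlock against ep->mtx. A racy check is enough: report readable
	 * if there are CQEs or if the overflow bit is set.
	 */
	if (io_cqring_events(ctx) || test_bit(0, &ctx->cq_check_overflow))
		mask |= EPOLLIN | EPOLLRDNORM;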
2021-02-10 | io_uring: defer flushing cached reqs | Pavel Begunkov | 1 | -9/+19
While there are requests in the allocation cache, use them; only when those have run out go for the stashed memory in comp.free_list. As list manipulations are generally heavy and not cache friendly, flush them all, or as many as we can, in one go. Signed-off-by: Pavel Begunkov <[email protected]> [axboe: return success/failure from io_flush_cached_reqs()] Signed-off-by: Jens Axboe <[email protected]>
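With that, the allocation path roughly looks like this (a sketch; io_flush_cached_reqs() is the helper mentioned in the note above, while the surrounding field names are assumptions): take from the lockless cache first, flush the locked free_list only when the cache is empty, and fall back to the slab allocator last.

static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
{
	struct io_submit_state *state = &ctx->submit_state;

	if (!state->free_reqs) {
		/* cache is dry: flush comp.free_list into it in one go */
		if (!io_flush_cached_reqs(ctx))
			return kmem_cache_alloc(req_cachep, GFP_KERNEL);
	}
	return state->reqs[--state->free_reqs];
}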
2021-02-10 | io_uring: take comp_state from ctx | Pavel Begunkov | 1 | -19/+18
__io_queue_sqe() is always called with a non-NULL comp_state, which is taken directly from the context. Don't pass it around; infer it from ctx. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: enable req cache for task_work items | Jens Axboe | 1 | -1/+21
task_work is run without utilizing the req alloc cache, so any deferred items don't get to take advantage of either the alloc or free side of it. With task_work now being wrapped by io_uring, we can use the ctx completion state to get both the req cache and the completion flush batching. With this, the only request type that cannot take advantage of the req cache is IRQ driven IO for regular files / block devices. Anything else, including IOPOLL polled IO to those same types, will take advantage of it. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: provide FIFO ordering for task_work | Jens Axboe | 3 | -16/+108
task_work is a LIFO list, due to how it's implemented as a lockless list. For long chains of task_work, this can be problematic as the first entry added is the last one processed. Similarly, we'd waste a lot of CPU cycles reversing this list. Wrap the task_work so we have a single task_work entry per task per ctx, and use that to run it in the right order. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
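Conceptually the wrapper looks something like the sketch below (a simplified illustration, not the actual fs/io_uring.c implementation; the structs, io_task_work_add()/io_task_work_run(), and the queued flag are all assumptions): every item goes onto a per-task FIFO list, and only one real task_work callback is queued at a time to drain it in order.

struct io_work_item {
	struct list_head	list;
	void			(*func)(struct io_work_item *);
};

struct io_task_ctx {
	struct callback_head	task_work;	/* the single entry handed to task_work_add() */
	struct list_head	work_list;	/* FIFO of pending items */
	spinlock_t		lock;
	bool			queued;		/* is task_work already pending? */
};

static void io_task_work_run(struct callback_head *cb)
{
	struct io_task_ctx *tctx = container_of(cb, struct io_task_ctx, task_work);
	LIST_HEAD(list);

	spin_lock_irq(&tctx->lock);
	list_splice_init(&tctx->work_list, &list);	/* grab everything, order preserved */
	tctx->queued = false;
	spin_unlock_irq(&tctx->lock);

	while (!list_empty(&list)) {
		struct io_work_item *item = list_first_entry(&list, struct io_work_item, list);

		list_del(&item->list);
		item->func(item);	/* runs in submission (FIFO) order */
	}
}

/* tctx->task_work is set up once with init_task_work(&tctx->task_work, io_task_work_run) */
static int io_task_work_add(struct task_struct *task, struct io_task_ctx *tctx,
			    struct io_work_item *item)
{
	bool need_queue;

	spin_lock_irq(&tctx->lock);
	list_add_tail(&item->list, &tctx->work_list);	/* tail-add keeps FIFO order */
	need_queue = !tctx->queued;
	tctx->queued = true;
	spin_unlock_irq(&tctx->lock);

	if (need_queue)
		return task_work_add(task, &tctx->task_work, TWA_SIGNAL);
	return 0;
}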
2021-02-10 | io_uring: use persistent request cache | Jens Axboe | 1 | -18/+30
Now that we have the submit_state in the ring itself, we can have io_kiocb allocations that are persistent across invocations. This reduces the time spent doing slab allocations and frees. [sil: rebased] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: feed reqs back into alloc cache | Pavel Begunkov | 1 | -7/+12
Make io_req_free_batch(), which is used for inline-executed requests and IOPOLL, return requests back into the allocation cache, so we avoid most of the kmalloc()/kfree() for those cases. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: persistent req cache | Pavel Begunkov | 1 | -8/+13
Don't free batch-allocated requests across syscalls. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: count ctx refs separately from reqs | Pavel Begunkov | 1 | -1/+5
Currently, batch free handles request memory freeing and ctx ref putting together. Separate them and use different counters; that will be needed for reusing reqs memory. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: remove fallback_req | Pavel Begunkov | 1 | -36/+2
Remove fallback_req for now, it gets in the way of other changes. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: submit-completion free batching | Pavel Begunkov | 1 | -20/+29
io_submit_flush_completions() does completion batching, but may also use free batching as iopoll does. The main beneficiaries should be buffered reads/writes and send/recv. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: replace list with array for compl batch | Pavel Begunkov | 1 | -24/+11
Reincarnation of an old patch that replaces a list in struct io_compl_batch with an array. It's needed to avoid hooking requests via their compl.list, because it won't always be available in the future. It's also nice to split io_submit_flush_completions() so that freeing happens outside the locks, removing an unlock/lock pair along with a long comment describing when that can be done. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
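The gist of the array version, sketched below (names roughly follow the surrounding commits; the batch size, helper signatures, and the split between fill-under-lock and free-outside-lock are assumptions based on the message, not the exact code).

#define IO_COMPL_BATCH	32	/* illustrative batch size */

struct io_comp_state {
	unsigned int		nr;
	struct io_kiocb		*reqs[IO_COMPL_BATCH];	/* array instead of compl.list linkage */
};

static void io_submit_flush_completions(struct io_ring_ctx *ctx, struct io_comp_state *cs)
{
	int i;

	/* post CQEs under completion_lock... */
	spin_lock_irq(&ctx->completion_lock);
	for (i = 0; i < cs->nr; i++)
		__io_cqring_fill_event(cs->reqs[i], cs->reqs[i]->result, 0);
	io_commit_cqring(ctx);
	spin_unlock_irq(&ctx->completion_lock);
	io_cqring_ev_posted(ctx);

	/* ...but drop request references outside of it */
	for (i = 0; i < cs->nr; i++)
		io_put_req(cs->reqs[i]);
	cs->nr = 0;
}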
2021-02-10 | io_uring: don't reinit submit state every time | Pavel Begunkov | 1 | -5/+4
As submit_state is now retained across syscalls, we can save ourselves from initialising it from the ground up for each io_submit_sqes(). Set some fields during ctx allocation, and just keep them always consistent. Signed-off-by: Pavel Begunkov <[email protected]> [axboe: remove unnecessary zeroing of ctx members] Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: remove ctx from comp_state | Pavel Begunkov | 1 | -11/+9
Completion state is closely bound to ctx; we don't need to store ctx inside, as we always have it around to pass to flush. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: don't keep submit_state on stack | Pavel Begunkov | 1 | -44/+46
struct io_submit_state is quite big (168 bytes) and going to grow. It's better not to keep it on the stack as it is now. Move it into the context; it's always protected by uring_lock, so it's fine to have only one instance of it. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10 | io_uring: don't propagate io_comp_state | Pavel Begunkov | 1 | -91/+73
There is no reason to drag io_comp_state into opcode handlers; we just need a flag, and the actual work will be done in __io_queue_sqe(). Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-09 | io_uring: make op handlers always take issue flags | Pavel Begunkov | 1 | -12/+13
Make opcode handler interfaces a bit more consistent by always passing in issue flags. A bulky but pretty easy and mechanical change. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-09 | io_uring: replace force_nonblock with flags | Pavel Begunkov | 1 | -82/+96
Replace bool force_nonblock with flags. This has the long-standing goal of differentiating the context from which we execute. Currently we have some places where invariants, like holding uring_lock, are only subtly inferred. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
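In code terms, the bool becomes a bitmask along these lines (IO_URING_F_NONBLOCK and IO_URING_F_COMPLETE_DEFER follow the naming used by this series; treat the exact values and the full set as a sketch).

/*
 * before: static int io_read(struct io_kiocb *req, bool force_nonblock);
 * after:  static int io_read(struct io_kiocb *req, unsigned int issue_flags);
 *
 * inside a handler the old bool is recovered as:
 *	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 */
enum {
	IO_URING_F_NONBLOCK		= 1,	/* handler must not block */
	IO_URING_F_COMPLETE_DEFER	= 2,	/* uring_lock held, completions may be batched */
};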
2021-02-08 | io_uring: cleanup up cancel SQPOLL reqs across exec | Pavel Begunkov | 1 | -21/+36
For SQPOLL rings, tctx_inflight() always returns zero, so it might skip doing a full cancellation. It's fine because we jam all sqpoll submissions in any case and do go through files cancel for them, but it's not nice. Do the intended full cancellation by mimicking __io_uring_task_cancel()'s waiting but impersonating the SQPOLL task. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-05 | io_uring: refactor sendmsg/recvmsg iov managing | Pavel Begunkov | 1 | -30/+27
Current iov handling with recvmsg/sendmsg may be confusing. First, make a rule for msg->iov: either it points to an allocated iov that has to be kfree()'d later, or it's NULL and we use fast_iov. That's much better than the current 3-state (it can also point to fast_iov). And rename it to free_iov for uniformity with read/write. Also, the fixing up of msg.msg_iter.iov after copying struct io_async_msghdr has been happening in io_recvmsg()/io_sendmsg(); move it into io_setup_async_msg(), which is the right place. Signed-off-by: Pavel Begunkov <[email protected]> [axboe: add comment on NULL check before kfree()] Signed-off-by: Jens Axboe <[email protected]>
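The resulting rule can be sketched like this (the struct layout is trimmed and the cleanup helper name is made up for illustration): free_iov is either NULL, meaning the inline fast_iov is in use, or a kmalloc'ed array the request owns and must kfree().

struct io_async_msghdr {
	struct iovec	fast_iov[UIO_FASTIOV];
	/* either NULL (fast_iov in use) or a kmalloc'ed iovec we must free */
	struct iovec	*free_iov;
	struct msghdr	msg;
};

static void io_msghdr_cleanup(struct io_async_msghdr *kmsg)
{
	/* kfree(NULL) is a no-op, so the NULL/fast_iov case needs no special handling */
	kfree(kmsg->free_iov);
	kmsg->free_iov = NULL;
}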
2021-02-05 | io_uring: clean iov usage for recvmsg buf select | Pavel Begunkov | 1 | -6/+4
Don't pretend we don't know that REQ_F_BUFFER_SELECT for recvmsg always uses fast_iov -- clean up the confusing intermixing of kmsg->iov and kmsg->fast_iov for buffer select. Also don't init the iter with garbage in __io_recvmsg_copy_hdr() only for it to be set shortly after in io_recvmsg(). Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-05 | io_uring: set msg_name on msg fixup | Pavel Begunkov | 1 | -2/+1
io_setup_async_msg() should fully prepare io_async_msghdr, let it also handle assigning msg_name and don't hand code it in [send,recv]msg(). Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring/io-wq: return 2-step work swap scheme | Pavel Begunkov | 3 | -34/+12
Saving one lock/unlock for io-wq is not super important, but it adds some ugliness to the code. More importantly, atomic decs that don't bring the count to zero won't give the right ordering/barriers on some archs, so io_steal_work() may pretty easily get subtly and completely broken. Bring back the 2-step io-wq work exchange and clean it up. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: deduplicate file table slot calculation | Pavel Begunkov | 1 | -22/+19
Extract a helper io_fixed_file_slot() returning a place in our fixed files table, so we don't hand-code it three times in the code. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
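The helper is essentially a two-level table lookup; a sketch is below (struct names and the SHIFT/MASK constants reflect how the fixed file table is split into pages, but treat them as assumptions).

static struct file **io_fixed_file_slot(struct fixed_rsrc_data *file_data, unsigned int i)
{
	/* pick the table page first, then the slot within that page */
	struct fixed_rsrc_table *table = &file_data->table[i >> IORING_FILE_TABLE_SHIFT];

	return &table->files[i & IORING_FILE_TABLE_MASK];
}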
2021-02-04 | io_uring: io_import_iovec return type cleanup | Pavel Begunkov | 1 | -19/+11
io_import_iovec() doesn't return the IO size anymore, only an error code. Make that more apparent by returning int instead of ssize_t, and clean up the leftovers. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: treat NONBLOCK and RWF_NOWAIT similarly | Pavel Begunkov | 1 | -14/+16
Make the decision about whether we need to retry a read/write similar for O_NONBLOCK and RWF_NOWAIT: set REQ_F_NOWAIT when either is specified and use it for all relevant checks. Also fix resubmitting NOWAIT requests via io_rw_reissue(). Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
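The unified check amounts to something like this (a sketch; the helper name io_rw_set_nowait() is illustrative, while IOCB_NOWAIT, O_NONBLOCK and REQ_F_NOWAIT are the real flags involved).

/* both RWF_NOWAIT (surfaced as IOCB_NOWAIT on the kiocb) and O_NONBLOCK set the same flag */
static void io_rw_set_nowait(struct io_kiocb *req, struct kiocb *kiocb)
{
	if ((kiocb->ki_flags & IOCB_NOWAIT) || (req->file->f_flags & O_NONBLOCK))
		req->flags |= REQ_F_NOWAIT;
}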
2021-02-04 | io_uring: highlight read-retry loop | Pavel Begunkov | 1 | -20/+20
We already have an implicit do-while for read retries, but with a goto at the end. Convert it to an actual do-while; it highlights the loop, making it a bit more understandable, and is cleaner in general. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: inline io_read()'s iovec freeing | Pavel Begunkov | 1 | -18/+13
io_read() doesn't have the simplest control flow, with a lot of jumps, and it's hard to read. One of those is an out_free: label, which frees the iovec. However, from the middle of io_read() the iovec is NULL'ed, so kfree(iovec) is a no-op; that leaves us with two places where we can inline it and further clean up the code. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: don't forget to adjust io_size | Pavel Begunkov | 1 | -9/+5
We have an invariant in io_read(): how much we're trying to read is spilled into both the iter and the io_size variable. The latter controls the decision making about whether to do read retries. However, io_size is modified only after the first read attempt, so if we happen to go for a third retry in a single call to io_read(), io_size will be greater than what's in the iterator, which may lead to various side effects, up to live-locking. Modify io_size each time. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: let io_setup_async_rw take care of iovec | Pavel Begunkov | 1 | -15/+9
Now we hand ownership of the iovec to io_setup_async_rw(), so it either sets the request's context right or frees the iovec itself on error. That makes our life a bit easier at call sites. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: further simplify do_read error parsing | Pavel Begunkov | 1 | -9/+8
First, instead of checking iov_iter_count(iter) for 0 to find out that all needed bytes were read, just compare returned code against io_size. It's more reliable and arguably cleaner. Also, place the half-read case into an else branch and delete an extra label. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: refactor io_read for unsupported nowait | Pavel Begunkov | 1 | -6/+6
The !io_file_supports_async() case of io_read() is hard to read; it jumps somewhere into the middle of the function just to do async setup and fail on a similar check. Call io_setup_async_rw() directly for this case, it's much easier to follow. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04 | io_uring: refactor io_cqring_wait | Pavel Begunkov | 1 | -21/+22
It's easy to make a mistake in io_cqring_wait() because for all break/continue clauses we need to make sure prepare/finish_wait are used correctly. Extract all of that into a new helper, io_cqring_wait_schedule(), transforming the loop into a simple series of function calls: prepare(); check_and_schedule(); finish(); Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
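After the refactoring, the waiting loop reads roughly like this (a sketch; io_cqring_wait_schedule() is the helper named above, but its exact signature and return convention are assumptions).

	do {
		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq, TASK_INTERRUPTIBLE);
		/* returns > 0 to keep waiting, <= 0 to stop (done, signal, or timeout) */
		ret = io_cqring_wait_schedule(ctx, &iowq, &timeout);
		finish_wait(&ctx->wait, &iowq.wq);
	} while (ret > 0);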
2021-02-04 | io_uring: refactor scheduling in io_cqring_wait | Pavel Begunkov | 1 | -11/+8
schedule_timeout() with timeout=MAX_SCHEDULE_TIMEOUT is guaranteed to work just like schedule(), so instead of hand-coding the choice based on arguments, always use the timeout version and simplify the code. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
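In other words, the hand-rolled branch can collapse into a single call, along these lines (a sketch; the -ETIME return mirrors how an expired wait is reported here, but treat it as an assumption).

	/* before: if (uts) timeout = schedule_timeout(timeout); else schedule(); */
	timeout = schedule_timeout(timeout);	/* MAX_SCHEDULE_TIMEOUT behaves like plain schedule() */
	if (!timeout)
		return -ETIME;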
2021-02-04 | io_uring: deduplicate core cancellations sequence | Pavel Begunkov | 1 | -45/+40
Files and task cancellations go over the same steps trying to cancel requests in io-wq, poll, etc. Deduplicate it with a helper. Note: the new io_uring_try_cancel_requests() is the former __io_uring_cancel_task_requests() with files passed as an argument and flushing of overflowed requests added. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01 | io_uring: simplify do_read return parsing | Pavel Begunkov | 1 | -5/+2
do_read() returning 0 bytes read (not -EAGAIN/etc.) is not an important enough case to prioritise. Fold it into the ret < 0 check, so we get rid of an extra if and make it a bit more readable. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>