aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-02-01io_uring: kill not used needs_file_no_errorPavel Begunkov1-4/+1
We have no request types left using needs_file_no_error, remove it. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: fix inconsistent lock statePavel Begunkov1-4/+11
WARNING: inconsistent lock state inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. syz-executor217/8450 [HC1[1]:SC0[0]:HE0:SE1] takes: ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline] ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_req_clean_work fs/io_uring.c:1398 [inline] ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&fs->lock); <Interrupt> lock(&fs->lock); *** DEADLOCK *** 1 lock held by syz-executor217/8450: #0: ffff88802417c3e8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x1071/0x1f30 fs/io_uring.c:9442 stack backtrace: CPU: 1 PID: 8450 Comm: syz-executor217 Not tainted 5.11.0-rc5-next-20210129-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> [...] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] io_req_clean_work fs/io_uring.c:1398 [inline] io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029 __io_free_req+0x3d/0x2e0 fs/io_uring.c:2046 io_free_req fs/io_uring.c:2269 [inline] io_double_put_req fs/io_uring.c:2392 [inline] io_put_req+0xf9/0x570 fs/io_uring.c:2388 io_link_timeout_fn+0x30c/0x480 fs/io_uring.c:6497 __run_hrtimer kernel/time/hrtimer.c:1519 [inline] __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583 hrtimer_interrupt+0x334/0x940 kernel/time/hrtimer.c:1645 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1085 [inline] __sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1102 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_sysvec_on_irqstack arch/x86/include/asm/irq_stack.h:37 [inline] run_sysvec_on_irqstack_cond arch/x86/include/asm/irq_stack.h:89 [inline] sysvec_apic_timer_interrupt+0xbd/0x100 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629 RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline] RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:199 spin_unlock_irq include/linux/spinlock.h:404 [inline] io_queue_linked_timeout+0x194/0x1f0 fs/io_uring.c:6525 __io_queue_sqe+0x328/0x1290 fs/io_uring.c:6594 io_queue_sqe+0x631/0x10d0 fs/io_uring.c:6639 io_queue_link_head fs/io_uring.c:6650 [inline] io_submit_sqe fs/io_uring.c:6697 [inline] io_submit_sqes+0x19b5/0x2720 fs/io_uring.c:6960 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9443 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Don't free requests from under hrtimer context (softirq) as it may sleep or take spinlocks improperly (e.g. non-irq versions). Cc: [email protected] # 5.6+ Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: Fix NULL dereference in error in io_sqe_files_register()Dan Carpenter1-1/+1
If we hit a "goto out_free;" before the "ctx->file_data" pointer has been assigned then it leads to a NULL derefence when we call: free_fixed_rsrc_data(ctx->file_data); We can fix this by moving the assignment earlier. Fixes: 1ad555c6ae6e ("io_uring: create common fixed_rsrc_data allocation routines") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: check kthread parked flag before sqthread goes to sleepHao Xu1-4/+1
Abaci reported this issue: #[ 605.170872] INFO: task kworker/u4:1:53 blocked for more than 143 seconds. [ 605.172123] Not tainted 5.10.0+ #1 [ 605.172811] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.173915] task:kworker/u4:1 state:D stack: 0 pid: 53 ppid: 2 flags:0x00004000 [ 605.175130] Workqueue: events_unbound io_ring_exit_work [ 605.175931] Call Trace: [ 605.176334] __schedule+0xe0e/0x25a0 [ 605.176971] ? firmware_map_remove+0x1a1/0x1a1 [ 605.177631] ? write_comp_data+0x2a/0x80 [ 605.178272] schedule+0xd0/0x270 [ 605.178811] schedule_timeout+0x6b6/0x940 [ 605.179415] ? mark_lock.part.0+0xca/0x1420 [ 605.180062] ? usleep_range+0x170/0x170 [ 605.180684] ? wait_for_completion+0x16d/0x280 [ 605.181392] ? mark_held_locks+0x9e/0xe0 [ 605.182079] ? rwlock_bug.part.0+0x90/0x90 [ 605.182853] ? lockdep_hardirqs_on_prepare+0x286/0x400 [ 605.183817] wait_for_completion+0x175/0x280 [ 605.184713] ? wait_for_completion_interruptible+0x340/0x340 [ 605.185611] ? _raw_spin_unlock_irq+0x24/0x30 [ 605.186307] ? migrate_swap_stop+0x9c0/0x9c0 [ 605.187046] kthread_park+0x127/0x1c0 [ 605.187738] io_sq_thread_stop+0xd5/0x530 [ 605.188459] io_ring_exit_work+0xb1/0x970 [ 605.189207] process_one_work+0x92c/0x1510 [ 605.189947] ? pwq_dec_nr_in_flight+0x360/0x360 [ 605.190682] ? rwlock_bug.part.0+0x90/0x90 [ 605.191430] ? write_comp_data+0x2a/0x80 [ 605.192207] worker_thread+0x9b/0xe20 [ 605.192900] ? process_one_work+0x1510/0x1510 [ 605.193599] kthread+0x353/0x460 [ 605.194154] ? _raw_spin_unlock_irq+0x24/0x30 [ 605.194910] ? kthread_create_on_node+0x100/0x100 [ 605.195821] ret_from_fork+0x1f/0x30 [ 605.196605] [ 605.196605] Showing all locks held in the system: [ 605.197598] 1 lock held by khungtaskd/25: [ 605.198301] #0: ffffffff8b5f76a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.0+0x0/0x30 [ 605.199914] 3 locks held by kworker/u4:1/53: [ 605.200609] #0: ffff888100109938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x82a/0x1510 [ 605.202108] #1: ffff888100e47dc0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x85e/0x1510 [ 605.203681] #2: ffff888116931870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park.part.0+0x19/0x50 [ 605.205183] 3 locks held by systemd-journal/161: [ 605.206037] 1 lock held by syslog-ng/254: [ 605.206674] 2 locks held by agetty/311: [ 605.207292] #0: ffff888101097098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x27/0x80 [ 605.208715] #1: ffffc900000332e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x222/0x1bb0 [ 605.210131] 2 locks held by bash/677: [ 605.210723] #0: ffff88810419a098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x27/0x80 [ 605.212105] #1: ffffc900000512e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x222/0x1bb0 [ 605.213777] [ 605.214151] ============================================= I believe this is caused by the follow race: (ctx_list is empty now) => io_put_sq_data | ==> kthread_park(sqd->thread); | ====> set KTHREAD_SHOULD_PARK | ====> wake_up_process(k) | sq thread is running | | | needs_sched is true since no ctx, | so TASK_INTERRUPTIBLE set and schedule | out then never wake up again | ====> wait_for_completion | (stuck here) So check if sqthread gets park flag right before schedule(). since ctx_list is always empty when this problem happens, here I put kthread_should_park() before setting the wakeup flag(ctx_list is empty so this for loop is fast), where is close enough to schedule(). The problem doesn't show again in my repro testing after this fix. Reported-by: Abaci <[email protected]> Signed-off-by: Hao Xu <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: Add skip option for __io_sqe_files_updatenoah1-0/+3
This patch adds support for skipping a file descriptor when using IORING_REGISTER_FILES_UPDATE. __io_sqe_files_update will skip fds set to IORING_REGISTER_FILES_SKIP. IORING_REGISTER_FILES_SKIP is inturn added as a #define in io_uring.h Signed-off-by: noah <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: cleanup files_update loopingPavel Begunkov1-6/+2
Replace a while with a simple for loop, that looks way more natural, and enables us to use "continue" as indexes are no more updated by hand in the end of the loop. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: consolidate putting reqs taskPavel Begunkov1-20/+14
We grab a task for each request and while putting it it also have to do extra work like inflight accounting and waking up that task. This sequence is duplicated several time, it's good time to add a helper. More to that, the helper generates better code due to better locality and so not failing alias analysis. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: ensure only sqo_task has file notesPavel Begunkov1-0/+4
For SQPOLL io_uring we want to have only one file note held by sqo_task. Add a warning to make sure it holds. It's deep in io_uring_add_task_file() out of hot path, so shouldn't hurt. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: simplify io_remove_personalities()Yejune Deng1-17/+11
The function io_remove_personalities() is very similar to io_unregister_personality(),so implement io_remove_personalities() calling io_unregister_personality(). Signed-off-by: Yejune Deng <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring/io-wq: kill off now unused IO_WQ_WORK_NO_CANCELJens Axboe3-6/+1
It's no longer used as IORING_OP_CLOSE got rid for the need of flagging it as uncancelable, kill it of. Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: get rid of intermediate IORING_OP_CLOSE stageJens Axboe1-29/+35
We currently split the close into two, in case we have a ->flush op that we can't safely handle from non-blocking context. This requires us to flag the op as uncancelable if we do need to punt it async, and that means special handling for just this op type. Use __close_fd_get_file() and grab the files lock so we can get the file and check if we need to go async in one atomic operation. That gets rid of the need for splitting this into two steps, and hence the need for IO_WQ_WORK_NO_CANCEL. Signed-off-by: Jens Axboe <[email protected]>
2021-02-01fs: provide locked helper variant of close_fd_get_file()Jens Axboe2-11/+26
Assumes current->files->file_lock is already held on invocation. Helps the caller check the file before removing the fd, if it needs to. Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: save atomic dec for inline executed reqsPavel Begunkov1-7/+10
When a request is completed with comp_state, its completion reference put is deferred to io_submit_flush_completions(), but the submission is put not far from there, so do it together to save one atomic dec per request. That targets requests that complete inline, e.g. buffered rw, send/recv. Proper benchmarking haven't been conducted but for nops(batch=32) it was around 7901 vs 8117 KIOPS (~2.7%), or ~4% per perf profiling. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: don't flush CQEs deep down the stackPavel Begunkov1-3/+10
io_submit_flush_completions() is called down the stack in the _state version of io_req_complete(), that's ok because is only called by io_uring opcode handler functions directly. Move it up to __io_queue_sqe() as preparation. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: help inlining of io_req_complete()Pavel Begunkov1-15/+21
__io_req_complete() inlining is a bit weird, some compilers don't optimise out the non-NULL branch of it even when called as io_req_complete(). Help it a bit by extracting state and stateless helpers out of __io_req_complete(). Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: add a helper timeout mode calculationPavel Begunkov1-12/+11
Deduplicates translation of timeout flags into hrtimer_mode. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: deduplicate failing task_work_addPavel Begunkov1-29/+17
When io_req_task_work_add() fails, the request will be cancelled by enqueueing via task_works of io-wq. Extract a function for that. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: remove __io_state_file_putPavel Begunkov1-9/+5
The check in io_state_file_put() is optimised pretty well when called from __io_file_get(). Don't pollute the code with all these variants. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: simplify io_alloc_req()Pavel Begunkov1-3/+1
Get rid of a label in io_alloc_req(), it's cleaner to do return directly. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: further deduplicate #CQ events calcPavel Begunkov1-8/+7
Apparently, there is one more place hand coded calculation of number of CQ events in the ring. Use __io_cqring_events() helper in io_get_cqring() as well. Naturally, assembly stays identical. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: inline __io_commit_cqring()Pavel Begunkov1-9/+3
Inline it in its only user, that's cleaner Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: inline io_async_submit()Pavel Begunkov1-6/+1
The name is confusing and it's used only in one place. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: cleanup personalities under uring_lockPavel Begunkov1-1/+1
personality_idr is usually synchronised by uring_lock, the exception would be removing personalities in io_ring_ctx_wait_and_kill(), which is legit as refs are killed by that point but still would be more resilient to do it under the lock. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: refactor io_resubmit_prep()Pavel Begunkov1-20/+13
It's awkward to pass return a value into a function for it to return it back. Check it at the caller site and clean up io_resubmit_prep() a bit. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: optimise io_rw_reissue()Pavel Begunkov1-3/+4
The hot path is IO completing on the first try. Reshuffle io_rw_reissue() so it's checked first. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: make percpu_ref_release names consistentBijan Mottahedeh1-4/+4
Make the percpu ref release function names consistent between rsrc data and nodes. Signed-off-by: Bijan Mottahedeh <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: create common fixed_rsrc_data allocation routinesBijan Mottahedeh1-15/+29
Create common alloc/free fixed_rsrc_data routines for both files and buffers. Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Bijan Mottahedeh <[email protected]> [remove buffer part] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: create common fixed_rsrc_ref_node handling routinesBijan Mottahedeh1-12/+26
Create common routines to be used for both files/buffers registration. [remove io_sqe_rsrc_set_node substitution] Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Bijan Mottahedeh <[email protected]> [merge, quiesce only for files] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: split ref_node alloc and initPavel Begunkov1-13/+11
A simple prep patch allowing to set refnode callbacks after it was allocated. This needed to 1) keep ourself off of hi-level functions where it's not pretty and they are not necessary 2) amortise ref_node allocation in the future, e.g. for updates. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: split alloc_fixed_file_ref_nodeBijan Mottahedeh1-2/+14
Split alloc_fixed_file_ref_node into resource generic/specific parts, to be leveraged for fixed buffers. Signed-off-by: Bijan Mottahedeh <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: add rsrc_ref locking routinesBijan Mottahedeh1-6/+16
Encapsulate resource reference locking into separate routines. Signed-off-by: Bijan Mottahedeh <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: separate ref_list from fixed_rsrc_dataBijan Mottahedeh1-17/+18
Uplevel ref_list and make it common to all resources. This is to allow one common ref_list to be used for both files, and buffers in upcoming patches. Signed-off-by: Bijan Mottahedeh <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: generalize io_queue_rsrc_removalBijan Mottahedeh1-7/+15
Generalize io_queue_rsrc_removal to handle both files and buffers. Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Bijan Mottahedeh <[email protected]> [remove io_mapped_ubuf from rsrc tables/etc. for now] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: rename file related variables to rsrcBijan Mottahedeh1-111/+117
This is a prep rename patch for subsequent patches to generalize file registration. [io_uring_rsrc_update:: rename fds -> data] Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Bijan Mottahedeh <[email protected]> [leave io_uring_files_update as struct] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: modularize io_sqe_buffers_registerBijan Mottahedeh1-17/+34
Move allocation of buffer management structures, and validation of buffers into separate routines. Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Bijan Mottahedeh <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: modularize io_sqe_buffer_registerBijan Mottahedeh1-103/+107
Split io_sqe_buffer_register into two routines: - io_sqe_buffer_register() registers a single buffer - io_sqe_buffers_register iterates over all user specified buffers Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Bijan Mottahedeh <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-01io_uring: enable LOOKUP_CACHED path resolution for filename lookupsJens Axboe1-20/+27
Instead of being pessimistic and assume that path lookup will block, use LOOKUP_CACHED to attempt just a cached lookup. This ensures that the fast path is always done inline, and we only punt to async context if IO is needed to satisfy the lookup. For forced nonblock open attempts, mark the file O_NONBLOCK over the actual ->open() call as well. We can safely clear this again before doing fd_install(), so it'll never be user visible that we fiddled with it. This greatly improves the performance of file open where the dentry is already cached: ached 5.10-git 5.10-git+LOOKUP_CACHED Speedup --------------------------------------------------------------- 33% 1,014,975 900,474 1.1x 89% 545,466 292,937 1.9x 100% 435,636 151,475 2.9x The more cache hot we are, the faster the inline LOOKUP_CACHED optimization helps. This is unsurprising and expected, as a thread offload becomes a more dominant part of the total overhead. If we look at io_uring tracing, doing an IORING_OP_OPENAT on a file that isn't in the dentry cache will yield: 275.550481: io_uring_create: ring 00000000ddda6278, fd 3 sq size 8, cq size 16, flags 0 275.550491: io_uring_submit_sqe: ring 00000000ddda6278, op 18, data 0x0, non block 1, sq_thread 0 275.550498: io_uring_queue_async_work: ring 00000000ddda6278, request 00000000c0267d17, flags 69760, normal queue, work 000000003d683991 275.550502: io_uring_cqring_wait: ring 00000000ddda6278, min_events 1 275.550556: io_uring_complete: ring 00000000ddda6278, user_data 0x0, result 4 which shows a failed nonblock lookup, then punt to worker, and then we complete with fd == 4. This takes 65 usec in total. Re-running the same test case again: 281.253956: io_uring_create: ring 0000000008207252, fd 3 sq size 8, cq size 16, flags 0 281.253967: io_uring_submit_sqe: ring 0000000008207252, op 18, data 0x0, non block 1, sq_thread 0 281.253973: io_uring_complete: ring 0000000008207252, user_data 0x0, result 4 shows the same request completing inline, also returning fd == 4. This takes 6 usec. Signed-off-by: Jens Axboe <[email protected]>
2021-02-01Merge branch 'work.namei' of ↵Jens Axboe2-45/+49
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into for-5.12/io_uring Merge RESOLVE_CACHED bits from Al, as the io_uring changes will build on top of that. * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED fs: add support for LOOKUP_CACHED saner calling conventions for unlazy_child() fs: make unlazy_walk() error handling consistent fs/namei.c: Remove unlikely of status being -ECHILD in lookup_fast() do_tmpfile(): don't mess with finish_open()
2021-01-31Merge tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-25/+44
Pull NFS client fixes from Trond Myklebust: - SUNRPC: Handle 0 length opaque XDR object data properly - Fix a layout segment leak in pnfs_layout_process() - pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn - pNFS/NFSv4: Improve rejection of out-of-order layouts - pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() * tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Handle 0 length opaque XDR object data properly SUNRPC: Move simple_get_bytes and simple_get_netobj into private header pNFS/NFSv4: Improve rejection of out-of-order layouts pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
2021-01-30Merge tag '5.11-rc5-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds6-20/+81
Pull cifs fixes from Steve French: "Four cifs patches found in additional testing of the conversion to the new mount API: three small option processing ones, and one fixing domain based DFS referrals" * tag '5.11-rc5-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix dfs domain referrals cifs: returning mount parm processing errors correctly cifs: fix mounts to subdirectories of target cifs: ignore auto and noauto options if given
2021-01-29Merge tag 'for-5.11-rc5-tag' of ↵Linus Torvalds6-51/+46
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes for a late rc: - fix lockdep complaint on 32bit arches and also remove an unsafe memory use due to device vs filesystem lifetime - two fixes for free space tree: * race during log replay and cache rebuild, now more likely to happen due to changes in this dev cycle * possible free space tree corruption with online conversion during initial tree population" * tag 'for-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix log replay failure due to race with space cache rebuild btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch btrfs: fix possible free space tree corruption with online conversion
2021-01-29Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+9
Pull block fixes from Jens Axboe: "All over the place fixes for this release: - blk-cgroup iteration teardown resched fix (Baolin) - NVMe pull request from Christoph: - add another Write Zeroes quirk (Chaitanya Kulkarni) - handle a no path available corner case (Daniel Wagner) - use the proper RCU aware list_add helper (Chao Leng) - bcache regression fix (Coly) - bdev->bd_size_lock IRQ fix. This will be fixed in drivers for 5.12, but for now, we'll make it IRQ safe (Damien) - null_blk zoned init fix (Damien) - add_partition() error handling fix (Dinghao) - s390 dasd kobject fix (Jan) - nbd fix for freezing queue while adding connections (Josef) - tag queueing regression fix (Ming) - revert of a patch that inadvertently meant that we regressed write performance on raid (Maxim)" * tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block: null_blk: cleanup zoned mode initialization nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head nvme-multipath: Early exit if no path is available nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES block: fix bd_size_lock use blk-cgroup: Use cond_resched() when destroy blkgs Revert "block: simplify set_init_blocksize" to regain lost performance nbd: freeze the queue while we're adding connections s390/dasd: Fix inconsistent kobject removal block: Fix an error handling in add_partition blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
2021-01-29Merge tag 'io_uring-5.11-2021-01-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-42/+53
Pull io_uring fixes from Jens Axboe: "We got the cancelation story sorted now, so for all intents and purposes, this should be it for 5.11 outside of any potential little fixes that may come in. This contains: - task_work task state fixes (Hao, Pavel) - Cancelation fixes (me, Pavel) - Fix for an inflight req patch in this release (Pavel) - Fix for a lock deadlock issue (Pavel)" * tag 'io_uring-5.11-2021-01-29' of git://git.kernel.dk/linux-block: io_uring: reinforce cancel on flush during exit io_uring: fix sqo ownership false positive warning io_uring: fix list corruption for splice file_get io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE io_uring: fix wqe->lock/completion_lock deadlock io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE io_uring: only call io_cqring_ev_posted() if events were posted io_uring: if we see flush on exit, cancel related tasks
2021-01-28cifs: fix dfs domain referralsRonnie Sahlberg6-16/+75
The new mount API requires additional changes to how DFS is handled. Additional testing of DFS uncovered problems with domain based DFS referrals (a follow on patch addresses DFS links) which this patch addresses. Signed-off-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-01-28Merge tag 'ecryptfs-5.11-rc6-setxattr-fix' of ↵Linus Torvalds1-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fix from Tyler Hicks: "Fix a regression that resulted in two rounds of UID translations when setting v3 namespaced file capabilities in some configurations" * tag 'ecryptfs-5.11-rc6-setxattr-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: ecryptfs: fix uid translation for setxattr on security.capability
2021-01-28io_uring: reinforce cancel on flush during exitPavel Begunkov1-2/+1
What 84965ff8a84f0 ("io_uring: if we see flush on exit, cancel related tasks") really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably. That can be achieved by io_uring_cancel_files(), but we'll miss it calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(), because it will go through __io_uring_cancel_task_requests(). Just always call io_uring_cancel_files() during cancel, it's good enough for now. Cc: [email protected] # 5.9+ Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-01-28cifs: returning mount parm processing errors correctlySteve French1-4/+4
During additional testing of the updated cifs.ko with the new mount API support, we found a few additional cases where we were logging errors, but not returning them to the user. For example: a) invalid security mechanisms b) invalid cache options c) unsupported rdma d) invalid smb dialect requested Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Acked-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-01-28io_uring: fix sqo ownership false positive warningPavel Begunkov1-2/+0
WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042 io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042 Call Trace: io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227 filp_close+0xb4/0x170 fs/open.c:1295 close_files fs/file.c:403 [inline] put_files_struct fs/file.c:418 [inline] put_files_struct+0x1cc/0x350 fs/file.c:415 exit_files+0x7e/0xa0 fs/file.c:435 do_exit+0xc22/0x2ae0 kernel/exit.c:820 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x427/0x20f0 kernel/signal.c:2773 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Now io_uring_cancel_task_requests() can be called not through file notes but directly, remove a WARN_ONCE() there that give us false positives. That check is not very important and we catch it in other places. Fixes: 84965ff8a84f0 ("io_uring: if we see flush on exit, cancel related tasks") Cc: [email protected] # 5.9+ Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-01-28io_uring: fix list corruption for splice file_getPavel Begunkov1-1/+2
kernel BUG at lib/list_debug.c:29! Call Trace: __list_add include/linux/list.h:67 [inline] list_add include/linux/list.h:86 [inline] io_file_get+0x8cc/0xdb0 fs/io_uring.c:6466 __io_splice_prep+0x1bc/0x530 fs/io_uring.c:3866 io_splice_prep fs/io_uring.c:3920 [inline] io_req_prep+0x3546/0x4e80 fs/io_uring.c:6081 io_queue_sqe+0x609/0x10d0 fs/io_uring.c:6628 io_submit_sqe fs/io_uring.c:6705 [inline] io_submit_sqes+0x1495/0x2720 fs/io_uring.c:6953 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9353 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_file_get() may be called from splice, and so REQ_F_INFLIGHT may already be set. Fixes: 02a13674fa0e8 ("io_uring: account io_uring internal files as REQ_F_INFLIGHT") Cc: [email protected] # 5.9+ Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-01-28cifs: fix mounts to subdirectories of targetSteve French1-0/+1
The "prefixpath" mount option needs to be ignored which was missed in the recent conversion to the new mount API (prefixpath would be set by the mount helper if mounting a subdirectory of the root of a share e.g. //server/share/subdir) Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Suggested-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]> Reviewed-by: Ronnie Sahlberg <[email protected]>