aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-02-08btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_nodeJosef Bacik1-7/+0
Zygo reported the following panic when testing my error handling patches for relocation: kernel BUG at fs/btrfs/backref.c:2545! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 3 PID: 8472 Comm: btrfs Tainted: G W 14 Hardware name: QEMU Standard PC (i440FX + PIIX, Call Trace: btrfs_backref_error_cleanup+0x4df/0x530 build_backref_tree+0x1a5/0x700 ? _raw_spin_unlock+0x22/0x30 ? release_extent_buffer+0x225/0x280 ? free_extent_buffer.part.52+0xd7/0x140 relocate_tree_blocks+0x2a6/0xb60 ? kasan_unpoison_shadow+0x35/0x50 ? do_relocation+0xc10/0xc10 ? kasan_kmalloc+0x9/0x10 ? kmem_cache_alloc_trace+0x6a3/0xcb0 ? free_extent_buffer.part.52+0xd7/0x140 ? rb_insert_color+0x342/0x360 ? add_tree_block.isra.36+0x236/0x2b0 relocate_block_group+0x2eb/0x780 ? merge_reloc_roots+0x470/0x470 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x18f0 ? pvclock_clocksource_read+0xeb/0x190 ? btrfs_relocate_chunk+0x120/0x120 ? lock_contended+0x620/0x6e0 ? do_raw_spin_lock+0x1e0/0x1e0 ? do_raw_spin_unlock+0xa8/0x140 btrfs_ioctl_balance+0x1f9/0x460 btrfs_ioctl+0x24c8/0x4380 ? __kasan_check_read+0x11/0x20 ? check_chain_key+0x1f4/0x2f0 ? __asan_loadN+0xf/0x20 ? btrfs_ioctl_get_supported_features+0x30/0x30 ? kvm_sched_clock_read+0x18/0x30 ? check_chain_key+0x1f4/0x2f0 ? lock_downgrade+0x3f0/0x3f0 ? handle_mm_fault+0xad6/0x2150 ? do_vfs_ioctl+0xfc/0x9d0 ? ioctl_file_clone+0xe0/0xe0 ? check_flags.part.50+0x6c/0x1e0 ? check_flags.part.50+0x6c/0x1e0 ? check_flags+0x26/0x30 ? lock_is_held_type+0xc3/0xf0 ? syscall_enter_from_user_mode+0x1b/0x60 ? do_syscall_64+0x13/0x80 ? rcu_read_lock_sched_held+0xa1/0xd0 ? __kasan_check_read+0x11/0x20 ? __fget_light+0xae/0x110 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This occurs because of this check if (RB_EMPTY_NODE(&upper->rb_node)) BUG_ON(!list_empty(&node->upper)); As we are dropping the backref node, if we discover that our upper node in the edge we just cleaned up isn't linked into the cache that we are now done with this node, thus the BUG_ON(). However this is an erroneous assumption, as we will look up all the references for a node first, and then process the pending edges. All of the 'upper' nodes in our pending edges won't be in the cache's rb_tree yet, because they haven't been processed. We could very well have many edges still left to cleanup on this node. The fact is we simply do not need this check, we can just process all of the edges only for this node, because below this check we do the following if (list_empty(&upper->lower)) { list_add_tail(&upper->lower, &cache->leaves); upper->lowest = 1; } If the upper node truly isn't used yet, then we add it to the cache->leaves list to be cleaned up later. If it is still used then the last child node that has it linked into its node will add it to the leaves list and then it will be cleaned up. Fix this problem by dropping this logic altogether. With this fix I no longer see the panic when testing with error injection in the backref code. CC: [email protected] # 4.4+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: keep track of the root owner for relocation readsJosef Bacik1-3/+43
While testing the error paths in relocation, I hit the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.10.0-rc3+ #206 Not tainted ------------------------------------------------------ btrfs-balance/1571 is trying to acquire lock: ffff8cdbcc8f77d0 (&head_ref->mutex){+.+.}-{3:3}, at: btrfs_lookup_extent_info+0x156/0x3b0 but task is already holding lock: ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-tree-00){++++}-{3:3}: down_write_nested+0x43/0x80 __btrfs_tree_lock+0x27/0x100 btrfs_search_slot+0x248/0x890 relocate_tree_blocks+0x490/0x650 relocate_block_group+0x1ba/0x5d0 kretprobe_trampoline+0x0/0x50 -> #1 (btrfs-csum-01){++++}-{3:3}: down_read_nested+0x43/0x130 __btrfs_tree_read_lock+0x27/0x100 btrfs_read_lock_root_node+0x31/0x40 btrfs_search_slot+0x5ab/0x890 btrfs_del_csums+0x10b/0x3c0 __btrfs_free_extent+0x49d/0x8e0 __btrfs_run_delayed_refs+0x283/0x11f0 btrfs_run_delayed_refs+0x86/0x220 btrfs_start_dirty_block_groups+0x2ba/0x520 kretprobe_trampoline+0x0/0x50 -> #0 (&head_ref->mutex){+.+.}-{3:3}: __lock_acquire+0x1167/0x2150 lock_acquire+0x116/0x3e0 __mutex_lock+0x7e/0x7b0 btrfs_lookup_extent_info+0x156/0x3b0 walk_down_proc+0x1c3/0x280 walk_down_tree+0x64/0xe0 btrfs_drop_subtree+0x182/0x260 do_relocation+0x52e/0x660 relocate_tree_blocks+0x2ae/0x650 relocate_block_group+0x1ba/0x5d0 kretprobe_trampoline+0x0/0x50 other info that might help us debug this: Chain exists of: &head_ref->mutex --> btrfs-csum-01 --> btrfs-tree-00 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-tree-00); lock(btrfs-csum-01); lock(btrfs-tree-00); lock(&head_ref->mutex); *** DEADLOCK *** 5 locks held by btrfs-balance/1571: #0: ffff8cdb89749ff8 (&fs_info->delete_unused_bgs_mutex){+.+.}-{3:3}, at: btrfs_balance+0x563/0xf40 #1: ffff8cdb89748838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x156/0x300 #2: ffff8cdbc2c16650 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x413/0x5c0 #3: ffff8cdbc135f538 (btrfs-treloc-01){+.+.}-{3:3}, at: __btrfs_tree_lock+0x27/0x100 #4: ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100 stack backtrace: CPU: 1 PID: 1571 Comm: btrfs-balance Not tainted 5.10.0-rc3+ #206 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x8b/0xb0 check_noncircular+0xcf/0xf0 ? trace_call_bpf+0x139/0x260 __lock_acquire+0x1167/0x2150 lock_acquire+0x116/0x3e0 ? btrfs_lookup_extent_info+0x156/0x3b0 __mutex_lock+0x7e/0x7b0 ? btrfs_lookup_extent_info+0x156/0x3b0 ? btrfs_lookup_extent_info+0x156/0x3b0 ? release_extent_buffer+0x124/0x170 ? _raw_spin_unlock+0x1f/0x30 ? release_extent_buffer+0x124/0x170 btrfs_lookup_extent_info+0x156/0x3b0 walk_down_proc+0x1c3/0x280 walk_down_tree+0x64/0xe0 btrfs_drop_subtree+0x182/0x260 do_relocation+0x52e/0x660 relocate_tree_blocks+0x2ae/0x650 ? add_tree_block+0x149/0x1b0 relocate_block_group+0x1ba/0x5d0 elfcorehdr_read+0x40/0x40 ? elfcorehdr_read+0x40/0x40 ? btrfs_balance+0x796/0xf40 ? __kthread_parkme+0x66/0x90 ? btrfs_balance+0xf40/0xf40 ? balance_kthread+0x37/0x50 ? kthread+0x137/0x150 ? __kthread_bind_mask+0x60/0x60 ? ret_from_fork+0x1f/0x30 As you can see this is bogus, we never take another tree's lock under the csum lock. This happens because sometimes we have to read tree blocks from disk without knowing which root they belong to during relocation. We defaulted to an owner of 0, which translates to an fs tree. This is fine as all fs trees have the same class, but obviously isn't fine if the block belongs to a COW only tree. Thankfully COW only trees only have their owners root as a reference to them, and since we already look up the extent information during relocation, go ahead and check and see if this block might belong to a COW only tree, and if so save the owner in the tree_block struct. This allows us to read_tree_block with the proper owner, which gets rid of this lockdep splat. Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: introduce helper to grab an existing extent buffer from a pageQu Wenruo1-20/+30
This patch will extract the code to grab an extent buffer from a page into a helper, grab_extent_buffer_from_page(). This reduces one indent level, and provides the work place for later expansion for subapge support. Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: update comment for btrfs_dirty_pagesQu Wenruo1-6/+5
The original comment is from the initial merge, which has several problems: - No holes check any more - No inline decision is made Update the out-of-date comment with more correct one. Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: refactor __extent_writepage_io() to improve readabilityQu Wenruo1-21/+16
The refactoring involves the following modifications: - iosize alignment In fact we don't really need to manually do alignment at all. All extent maps should already be aligned, thus basic ASSERT() check would be enough. - redundant variables We have extra variable like blocksize/pg_offset/end. They are all unnecessary. @blocksize can be replaced by sectorsize size directly, and it's only used to verify the em start/size is aligned. @pg_offset can be easily calculated using @cur and page_offset(page). @end is just assigned from @page_end and never modified, use "start + PAGE_SIZE - 1" directly and remove @page_end. - remove some BUG_ON()s The BUG_ON()s are for extent map, which we have tree-checker to check on-disk extent data item and runtime check. ASSERT() should be enough. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: rename parameter offset to disk_bytenr in submit_extent_pageQu Wenruo1-15/+15
The parameter offset is confusing, it's supposed to be the disk bytenr of metadata/data. Rename it to disk_bytenr and update the comment. Also rename each offset passed to submit_extent_page() as @disk_bytenr so they're consistent. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: refactor btrfs_dec_test_* functions for ordered extentsQu Wenruo3-55/+65
The refactoring involves the following modifications: - Return bool instead of int - Parameter update for @cached of btrfs_dec_test_first_ordered_pending() For btrfs_dec_test_first_ordered_pending(), @cached is only used to return the finished ordered extent. Rename it to @finished_ret. - Comment updates * Change one stale comment Which still refers to btrfs_dec_test_ordered_pending(), but the context is calling btrfs_dec_test_first_ordered_pending(). * Follow the common comment style for both functions Add more detailed descriptions for parameters and the return value * Move the reason why test_and_set_bit() is used into the call sites - Change how the return value is calculated The most anti-human part of the return value is: if (...) ret = 1; ... return ret == 0; This means, when we set ret to 1, the function returns 0. Change the local variable name to @finished, and directly return the value of it. Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: make btrfs_dio_private::bytes u32Qu Wenruo1-1/+2
btrfs_dio_private::bytes is only assigned from bio::bi_iter::bi_size, which is never larger than U32. Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: remove always true condition in btrfs_start_delalloc_rootsNikolay Borisov1-1/+1
Following the rework in e076ab2a2ca7 ("btrfs: shrink delalloc pages instead of full inodes") the nr variable is no longer passed by reference to start_delalloc_inodes hence it cannot change. Additionally we are always guaranteed for it to be positive number hence it's redundant to have it as a condition in the loop. Simply remove that usage. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: make btrfs_start_delalloc_root's nr argument a longNikolay Borisov5-7/+8
It's currently u64 which gets instantly translated either to LONG_MAX (if U64_MAX is passed) or cast to an unsigned long (which is in fact, wrong because writeback_control::nr_to_write is a signed, long type). Just convert the function's argument to be long time which obviates the need to manually convert u64 value to a long. Adjust all call sites which pass U64_MAX to pass LONG_MAX. Finally ensure that in shrink_delalloc the u64 is converted to a long without overflowing, resulting in a negative number. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: send: remove stale code when checking for shared extentsFilipe Manana1-16/+0
After commit 040ee6120cb670 ("Btrfs: send, improve clone range") we do not use anymore the data_offset field of struct backref_ctx, as after that we do all the necessary checks for the data offset of file extent items at clone_range(). Since there are no more users of data_offset from that structure, remove it. Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: consolidate btrfs_previous_item ret val handling in btrfs_shrink_deviceNikolay Borisov1-4/+3
Instead of having three 'if' to handle non-NULL return value consolidate this in one 'if (ret)'. That way the code is more obvious: - Always drop delete_unused_bgs_mutex if ret is not NULL - If ret is negative -> goto done - If it's 1 -> reset ret to 0, release the path and finish the loop. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: ref-verify: make sure owner is set for all refsJosef Bacik1-13/+10
I noticed that shared ref entries in ref-verify didn't have the proper owner set, which caused me to think there was something seriously wrong. However the problem is if we have a parent we simply weren't filling out the owner part of the reference, even though we have it. Fix this by making sure we set all the proper fields when we modify a reference, this way we'll have the proper owner if a problem happens and we don't waste time thinking we're updating the wrong level. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: ref-verify: pass down tree block level when building refsJosef Bacik1-8/+12
I noticed that sometimes I would have the wrong level printed out with ref-verify while testing some error injection related problems. This is because we only get the level from the main extent item, but our references could go off the current leaf into another, and at that point we lose our level. Fix this by keeping track of the last tree block level that we found, the same way we keep track of our bytenr and num_bytes, in case we happen to wander into another leaf while still processing the references for a bytenr. Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: noinline btrfs_should_cancel_balanceJosef Bacik1-1/+1
I was attempting to reproduce a problem that Zygo hit, but my error injection wasn't firing for a few of the common calls to btrfs_should_cancel_balance. This is because the compiler decided to inline it at these spots. Keep this from happening by explicitly marking the function as noinline so that error injection will always work. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: allow error injection for btrfs_search_slot and btrfs_cow_blockJosef Bacik1-0/+2
The following patches are going to address error handling in relocation, in order to test those patches I need to be able to inject errors in btrfs_search_slot and btrfs_cow_block, as we call both of these pretty often in different cases during relocation. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: remove new_dirid argument from btrfs_create_subvol_rootNikolay Borisov3-7/+4
It's no longer used. While at it also remove new_dirid in create_subvol as it's used in a single place and open code it. No functional changes. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: make btrfs_root::free_objectid hold the next available objectidNikolay Borisov3-10/+10
Adjust the way free_objectid is being initialized, it now stores BTRFS_FIRST_FREE_OBJECTID rather than the, somewhat arbitrary, BTRFS_FIRST_FREE_OBJECTID - 1. This change also has the added benefit that now it becomes unnecessary to explicitly initialize free_objectid for a newly create fs root. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: rename btrfs_root::highest_objectid to free_objectidNikolay Borisov3-9/+9
This reflects the true purpose of the member as it's being used solely in context where a new objectid is being allocated. Future changes will also change the way it's being used to closely follow this semantics. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: rename btrfs_find_free_objectid to btrfs_get_free_objectidNikolay Borisov7-12/+12
This better reflects the semantics of the function i.e no search is performed whatsoever. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: rename btrfs_find_highest_objectid to btrfs_init_root_free_objectidNikolay Borisov3-10/+7
This function is used to initialize the in-memory btrfs_root::highest_objectid member, which is used to get an available objectid. Rename it to better reflect its semantics. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: cleanup local variables in btrfs_file_write_iterNikolay Borisov1-9/+7
First replace all inode instances with a pointer to btrfs_inode. This removes multiple invocations of the BTRFS_I macro, subsequently remove 2 local variables as they are called only once and simply refer to them directly. Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: clarify error returns values in __load_free_space_cacheZhihao Cheng1-1/+5
Return value in __load_free_space_cache is not properly set after (unlikely) memory allocation failures and 0 is returned instead. This is not a problem for the caller load_free_space_cache because only value 1 is considered as 'cache loaded' but for clarity it's better to set the errors accordingly. Fixes: a67509c30079 ("Btrfs: add a io_ctl struct and helpers for dealing with the space cache") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Zhihao Cheng <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-08btrfs: fix error handling in commit_fs_rootsJosef Bacik1-5/+6
While doing error injection I would sometimes get a corrupt file system. This is because I was injecting errors at btrfs_search_slot, but would only do it one time per stack. This uncovered a problem in commit_fs_roots, where if we get an error we would just break. However we're in a nested loop, the first loop being a loop to find all the dirty fs roots, and then subsequent root updates would succeed clearing the error value. This isn't likely to happen in real scenarios, however we could potentially get a random ENOMEM once and then not again, and we'd end up with a corrupted file system. Fix this by moving the error checking around a bit to the main loop, as this is the only place where something will fail, and return the error as soon as it occurs. With this patch my reproducer no longer corrupts the file system. Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-02-06Merge tag '5.11-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds3-6/+36
Pull cifs fixes from Steve French: "Three small smb3 fixes for stable" * tag '5.11-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: report error instead of invalid when revalidating a dentry fails smb3: fix crediting for compounding when only one request in flight smb3: Fix out-of-bounds bug in SMB2_negotiate()
2021-02-06Merge tag 'io_uring-5.11-2021-02-05' of git://git.kernel.dk/linux-blockLinus Torvalds1-6/+3
Pull io_uring fixes from Jens Axboe: "Two small fixes that should go into 5.11: - task_work resource drop fix (Pavel) - identity COW fix (Xiaoguang)" * tag 'io_uring-5.11-2021-02-05' of git://git.kernel.dk/linux-block: io_uring: drop mm/files between task_work_submit io_uring: don't modify identity's files uncess identity is cowed
2021-02-05cifs: report error instead of invalid when revalidating a dentry failsAurelien Aptel1-2/+20
Assuming - //HOST/a is mounted on /mnt - //HOST/b is mounted on /mnt/b On a slow connection, running 'df' and killing it while it's processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS. This triggers the following chain of events: => the dentry revalidation fail => dentry is put and released => superblock associated with the dentry is put => /mnt/b is unmounted This patch makes cifs_d_revalidate() return the error instead of 0 (invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file deleted) and ESTALE (file recreated). Signed-off-by: Aurelien Aptel <[email protected]> Suggested-by: Shyam Prasad N <[email protected]> Reviewed-by: Shyam Prasad N <[email protected]> CC: [email protected] Signed-off-by: Steve French <[email protected]>
2021-02-05mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB pageMuchun Song1-1/+2
If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active, just leave clear_page_huge_active as static. Because there are no external users. Link: https://lkml.kernel.org/r/[email protected] Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by: Muchun Song <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-05smb3: fix crediting for compounding when only one request in flightPavel Shilovsky1-3/+15
Currently we try to guess if a compound request is going to succeed waiting for credits or not based on the number of requests in flight. This approach doesn't work correctly all the time because there may be only one request in flight which is going to bring multiple credits satisfying the compound request. Change the behavior to fail a request only if there are no requests in flight at all and proceed waiting for credits otherwise. Cc: <[email protected]> # 5.1+ Signed-off-by: Pavel Shilovsky <[email protected]> Reviewed-by: Tom Talpey <[email protected]> Reviewed-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-02-04io_uring: drop mm/files between task_work_submitPavel Begunkov1-0/+3
Since SQPOLL task can be shared and so task_work entries can be a mix of them, we need to drop mm and files before trying to issue next request. Cc: [email protected] # 5.10+ Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-04Merge tag 'ovl-fixes-5.11-rc7' of ↵Linus Torvalds9-35/+85
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: - Fix capability conversion and minor overlayfs bugs that are related to the unprivileged overlay mounts introduced in this cycle. - Fix two recent (v5.10) and one old (v4.10) bug. - Clean up security xattr copy-up (related to a SELinux regression). * tag 'ovl-fixes-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: implement volatile-specific fsync error behaviour ovl: skip getxattr of security labels ovl: fix dentry leak in ovl_get_redirect ovl: avoid deadlock on directory ioctl cap: fix conversions on getxattr ovl: perform vfs_getxattr() with mounter creds ovl: add warning on user_ns mismatch
2021-02-04io_uring: don't modify identity's files uncess identity is cowedXiaoguang Wang1-6/+0
Abaci Robot reported following panic: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 800000010ef3f067 P4D 800000010ef3f067 PUD 10d9df067 PMD 0 Oops: 0002 [#1] SMP PTI CPU: 0 PID: 1869 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc3+ #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:put_files_struct+0x1b/0x120 Code: 24 18 c7 00 f4 ff ff ff e9 4d fd ff ff 66 90 0f 1f 44 00 00 41 57 41 56 49 89 fe 41 55 41 54 55 53 48 83 ec 08 e8 b5 6b db ff 41 ff 0e 74 13 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 9c RSP: 0000:ffffc90002147d48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88810d9a5300 RCX: 0000000000000000 RDX: ffff88810d87c280 RSI: ffffffff8144ba6b RDI: 0000000000000000 RBP: 0000000000000080 R08: 0000000000000001 R09: ffffffff81431500 R10: ffff8881001be000 R11: 0000000000000000 R12: ffff88810ac2f800 R13: ffff88810af38a00 R14: 0000000000000000 R15: ffff8881057130c0 FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010dbaa002 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __io_clean_op+0x10c/0x2a0 io_dismantle_req+0x3c7/0x600 __io_free_req+0x34/0x280 io_put_req+0x63/0xb0 io_worker_handle_work+0x60e/0x830 ? io_wqe_worker+0x135/0x520 io_wqe_worker+0x158/0x520 ? __kthread_parkme+0x96/0xc0 ? io_worker_handle_work+0x830/0x830 kthread+0x134/0x180 ? kthread_create_worker_on_cpu+0x90/0x90 ret_from_fork+0x1f/0x30 Modules linked in: CR2: 0000000000000000 ---[ end trace c358ca86af95b1e7 ]--- I guess case below can trigger above panic: there're two threads which operates different io_uring ctxs and share same sqthread identity, and later one thread exits, io_uring_cancel_task_requests() will clear task->io_uring->identity->files to be NULL in sqpoll mode, then another ctx that uses same identity will panic. Indeed we don't need to clear task->io_uring->identity->files here, io_grab_identity() should handle identity->files changes well, if task->io_uring->identity->files is not equal to current->files, io_cow_identity() should handle this changes well. Cc: [email protected] # 5.5+ Reported-by: Abaci Robot <[email protected]> Signed-off-by: Xiaoguang Wang <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-02Merge tag 'net-5.11-rc7' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.11-rc7, including fixes from bpf and mac80211 trees. Current release - regressions: - ip_tunnel: fix mtu calculation - mlx5: fix function calculation for page trees Previous releases - regressions: - vsock: fix the race conditions in multi-transport support - neighbour: prevent a dead entry from updating gc_list - dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add Previous releases - always broken: - bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF cgroup getsockopt infra when user space is trying to race against optlen, from Loris Reiff. - bpf: add missing fput() in BPF inode storage map update helper - udp: ipv4: manipulate network header of NATed UDP GRO fraglist - mac80211: fix station rate table updates on assoc - r8169: work around RTL8125 UDP HW bug - igc: report speed and duplex as unknown when device is runtime suspended - rxrpc: fix deadlock around release of dst cached on udp tunnel" * tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits) net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary net: ipa: fix two format specifier errors net: ipa: use the right accessor in ipa_endpoint_status_skip() net: ipa: be explicit about endianness net: ipa: add a missing __iomem attribute net: ipa: pass correct dma_handle to dma_free_coherent() r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS net: mvpp2: TCAM entry enable should be written after SRAM data net: lapb: Copy the skb before sending a packet net/mlx5e: Release skb in case of failure in tc update skb net/mlx5e: Update max_opened_tc also when channels are closed net/mlx5: Fix leak upon failure of rule creation net/mlx5: Fix function calculation for page trees docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc udp: ipv4: manipulate network header of NATed UDP GRO fraglist net: ip_tunnel: fix mtu calculation vsock: fix the race conditions in multi-transport support net: sched: replaced invalid qdisc tree flush helper in qdisc_replace ibmvnic: device remove has higher precedence over reset ...
2021-02-01smb3: Fix out-of-bounds bug in SMB2_negotiate()Gustavo A. R. Silva1-1/+1
While addressing some warnings generated by -Warray-bounds, I found this bug that was introduced back in 2017: CC [M] fs/cifs/smb2pdu.o fs/cifs/smb2pdu.c: In function ‘SMB2_negotiate’: fs/cifs/smb2pdu.c:822:16: warning: array subscript 1 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 822 | req->Dialects[1] = cpu_to_le16(SMB30_PROT_ID); | ~~~~~~~~~~~~~^~~ fs/cifs/smb2pdu.c:823:16: warning: array subscript 2 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 823 | req->Dialects[2] = cpu_to_le16(SMB302_PROT_ID); | ~~~~~~~~~~~~~^~~ fs/cifs/smb2pdu.c:824:16: warning: array subscript 3 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 824 | req->Dialects[3] = cpu_to_le16(SMB311_PROT_ID); | ~~~~~~~~~~~~~^~~ fs/cifs/smb2pdu.c:816:16: warning: array subscript 1 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 816 | req->Dialects[1] = cpu_to_le16(SMB302_PROT_ID); | ~~~~~~~~~~~~~^~~ At the time, the size of array _Dialects_ was changed from 1 to 3 in struct validate_negotiate_info_req, and then in 2019 it was changed from 3 to 4, but those changes were never made in struct smb2_negotiate_req, which has led to a 3 and a half years old out-of-bounds bug in function SMB2_negotiate() (fs/cifs/smb2pdu.c). Fix this by increasing the size of array _Dialects_ in struct smb2_negotiate_req to 4. Fixes: 9764c02fcbad ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)") Fixes: d5c7076b772a ("smb3: add smb3.1.1 to default dialect list") Cc: [email protected] Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-01-31Merge tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-25/+44
Pull NFS client fixes from Trond Myklebust: - SUNRPC: Handle 0 length opaque XDR object data properly - Fix a layout segment leak in pnfs_layout_process() - pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn - pNFS/NFSv4: Improve rejection of out-of-order layouts - pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() * tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Handle 0 length opaque XDR object data properly SUNRPC: Move simple_get_bytes and simple_get_netobj into private header pNFS/NFSv4: Improve rejection of out-of-order layouts pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
2021-01-30Merge tag '5.11-rc5-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds6-20/+81
Pull cifs fixes from Steve French: "Four cifs patches found in additional testing of the conversion to the new mount API: three small option processing ones, and one fixing domain based DFS referrals" * tag '5.11-rc5-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix dfs domain referrals cifs: returning mount parm processing errors correctly cifs: fix mounts to subdirectories of target cifs: ignore auto and noauto options if given
2021-01-29rxrpc: Fix deadlock around release of dst cached on udp tunnelDavid Howells1-3/+3
AF_RXRPC sockets use UDP ports in encap mode. This causes socket and dst from an incoming packet to get stolen and attached to the UDP socket from whence it is leaked when that socket is closed. When a network namespace is removed, the wait for dst records to be cleaned up happens before the cleanup of the rxrpc and UDP socket, meaning that the wait never finishes. Fix this by moving the rxrpc (and, by dependence, the afs) private per-network namespace registrations to the device group rather than subsys group. This allows cached rxrpc local endpoints to be cleared and their UDP sockets closed before we try waiting for the dst records. The symptom is that lines looking like the following: unregister_netdevice: waiting for lo to become free get emitted at regular intervals after running something like the referenced syzbot test. Thanks to Vadim for tracking this down and work out the fix. Reported-by: [email protected] Reported-by: Vadim Fedorenko <[email protected]> Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook") Signed-off-by: David Howells <[email protected]> Acked-by: Vadim Fedorenko <[email protected]> Link: https://lore.kernel.org/r/161196443016.3868642.5577440140646403533.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-29Merge tag 'for-5.11-rc5-tag' of ↵Linus Torvalds6-51/+46
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes for a late rc: - fix lockdep complaint on 32bit arches and also remove an unsafe memory use due to device vs filesystem lifetime - two fixes for free space tree: * race during log replay and cache rebuild, now more likely to happen due to changes in this dev cycle * possible free space tree corruption with online conversion during initial tree population" * tag 'for-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix log replay failure due to race with space cache rebuild btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch btrfs: fix possible free space tree corruption with online conversion
2021-01-29Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+9
Pull block fixes from Jens Axboe: "All over the place fixes for this release: - blk-cgroup iteration teardown resched fix (Baolin) - NVMe pull request from Christoph: - add another Write Zeroes quirk (Chaitanya Kulkarni) - handle a no path available corner case (Daniel Wagner) - use the proper RCU aware list_add helper (Chao Leng) - bcache regression fix (Coly) - bdev->bd_size_lock IRQ fix. This will be fixed in drivers for 5.12, but for now, we'll make it IRQ safe (Damien) - null_blk zoned init fix (Damien) - add_partition() error handling fix (Dinghao) - s390 dasd kobject fix (Jan) - nbd fix for freezing queue while adding connections (Josef) - tag queueing regression fix (Ming) - revert of a patch that inadvertently meant that we regressed write performance on raid (Maxim)" * tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block: null_blk: cleanup zoned mode initialization nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head nvme-multipath: Early exit if no path is available nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES block: fix bd_size_lock use blk-cgroup: Use cond_resched() when destroy blkgs Revert "block: simplify set_init_blocksize" to regain lost performance nbd: freeze the queue while we're adding connections s390/dasd: Fix inconsistent kobject removal block: Fix an error handling in add_partition blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
2021-01-29Merge tag 'io_uring-5.11-2021-01-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-42/+53
Pull io_uring fixes from Jens Axboe: "We got the cancelation story sorted now, so for all intents and purposes, this should be it for 5.11 outside of any potential little fixes that may come in. This contains: - task_work task state fixes (Hao, Pavel) - Cancelation fixes (me, Pavel) - Fix for an inflight req patch in this release (Pavel) - Fix for a lock deadlock issue (Pavel)" * tag 'io_uring-5.11-2021-01-29' of git://git.kernel.dk/linux-block: io_uring: reinforce cancel on flush during exit io_uring: fix sqo ownership false positive warning io_uring: fix list corruption for splice file_get io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE io_uring: fix wqe->lock/completion_lock deadlock io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE io_uring: only call io_cqring_ev_posted() if events were posted io_uring: if we see flush on exit, cancel related tasks
2021-01-28cifs: fix dfs domain referralsRonnie Sahlberg6-16/+75
The new mount API requires additional changes to how DFS is handled. Additional testing of DFS uncovered problems with domain based DFS referrals (a follow on patch addresses DFS links) which this patch addresses. Signed-off-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-01-28Merge tag 'ecryptfs-5.11-rc6-setxattr-fix' of ↵Linus Torvalds1-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fix from Tyler Hicks: "Fix a regression that resulted in two rounds of UID translations when setting v3 namespaced file capabilities in some configurations" * tag 'ecryptfs-5.11-rc6-setxattr-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: ecryptfs: fix uid translation for setxattr on security.capability
2021-01-28io_uring: reinforce cancel on flush during exitPavel Begunkov1-2/+1
What 84965ff8a84f0 ("io_uring: if we see flush on exit, cancel related tasks") really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably. That can be achieved by io_uring_cancel_files(), but we'll miss it calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(), because it will go through __io_uring_cancel_task_requests(). Just always call io_uring_cancel_files() during cancel, it's good enough for now. Cc: [email protected] # 5.9+ Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-01-28cifs: returning mount parm processing errors correctlySteve French1-4/+4
During additional testing of the updated cifs.ko with the new mount API support, we found a few additional cases where we were logging errors, but not returning them to the user. For example: a) invalid security mechanisms b) invalid cache options c) unsupported rdma d) invalid smb dialect requested Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Acked-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-01-28io_uring: fix sqo ownership false positive warningPavel Begunkov1-2/+0
WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042 io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042 Call Trace: io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227 filp_close+0xb4/0x170 fs/open.c:1295 close_files fs/file.c:403 [inline] put_files_struct fs/file.c:418 [inline] put_files_struct+0x1cc/0x350 fs/file.c:415 exit_files+0x7e/0xa0 fs/file.c:435 do_exit+0xc22/0x2ae0 kernel/exit.c:820 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x427/0x20f0 kernel/signal.c:2773 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Now io_uring_cancel_task_requests() can be called not through file notes but directly, remove a WARN_ONCE() there that give us false positives. That check is not very important and we catch it in other places. Fixes: 84965ff8a84f0 ("io_uring: if we see flush on exit, cancel related tasks") Cc: [email protected] # 5.9+ Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-01-28io_uring: fix list corruption for splice file_getPavel Begunkov1-1/+2
kernel BUG at lib/list_debug.c:29! Call Trace: __list_add include/linux/list.h:67 [inline] list_add include/linux/list.h:86 [inline] io_file_get+0x8cc/0xdb0 fs/io_uring.c:6466 __io_splice_prep+0x1bc/0x530 fs/io_uring.c:3866 io_splice_prep fs/io_uring.c:3920 [inline] io_req_prep+0x3546/0x4e80 fs/io_uring.c:6081 io_queue_sqe+0x609/0x10d0 fs/io_uring.c:6628 io_submit_sqe fs/io_uring.c:6705 [inline] io_submit_sqes+0x1495/0x2720 fs/io_uring.c:6953 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9353 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_file_get() may be called from splice, and so REQ_F_INFLIGHT may already be set. Fixes: 02a13674fa0e8 ("io_uring: account io_uring internal files as REQ_F_INFLIGHT") Cc: [email protected] # 5.9+ Reported-by: [email protected] Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-01-28cifs: fix mounts to subdirectories of targetSteve French1-0/+1
The "prefixpath" mount option needs to be ignored which was missed in the recent conversion to the new mount API (prefixpath would be set by the mount helper if mounting a subdirectory of the root of a share e.g. //server/share/subdir) Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Suggested-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]> Reviewed-by: Ronnie Sahlberg <[email protected]>
2021-01-28cifs: ignore auto and noauto options if givenAdam Harvey1-0/+1
In 24e0a1eff9e2, the noauto and auto options were missed when migrating to the new mount API. As a result, users with noauto in their fstab mount options are now unable to mount cifs filesystems, as they'll receive an "Unknown parameter" error. This restores the old behaviour of ignoring noauto and auto if they're given. Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Signed-off-by: Adam Harvey <[email protected]> Reviewed-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-01-28ovl: implement volatile-specific fsync error behaviourSargun Dhillon6-11/+63
Overlayfs's volatile option allows the user to bypass all forced sync calls to the upperdir filesystem. This comes at the cost of safety. We can never ensure that the user's data is intact, but we can make a best effort to expose whether or not the data is likely to be in a bad state. The best way to handle this in the time being is that if an overlayfs's upperdir experiences an error after a volatile mount occurs, that error will be returned on fsync, fdatasync, sync, and syncfs. This is contradictory to the traditional behaviour of VFS which fails the call once, and only raises an error if a subsequent fsync error has occurred, and been raised by the filesystem. One awkward aspect of the patch is that we have to manually set the superblock's errseq_t after the sync_fs callback as opposed to just returning an error from syncfs. This is because the call chain looks something like this: sys_syncfs -> sync_filesystem -> __sync_filesystem -> /* The return value is ignored here sb->s_op->sync_fs(sb) _sync_blockdev /* Where the VFS fetches the error to raise to userspace */ errseq_check_and_advance Because of this we call errseq_set every time the sync_fs callback occurs. Due to the nature of this seen / unseen dichotomy, if the upperdir is an inconsistent state at the initial mount time, overlayfs will refuse to mount, as overlayfs cannot get a snapshot of the upperdir's errseq that will increment on error until the user calls syncfs. Signed-off-by: Sargun Dhillon <[email protected]> Suggested-by: Amir Goldstein <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Fixes: c86243b090bc ("ovl: provide a mount option "volatile"") Cc: [email protected] Reviewed-by: Vivek Goyal <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-01-28ovl: skip getxattr of security labelsAmir Goldstein1-7/+8
When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will intercept in inode_getxattr hooks. When selinux LSM is installed but not initialized, it will list the security.selinux xattr in inode_listsecurity, but will not intercept it in inode_getxattr. This results in -ENODATA for a getxattr call for an xattr returned by listxattr. This situation was manifested as overlayfs failure to copy up lower files from squashfs when selinux is built-in but not initialized, because ovl_copy_xattr() iterates the lower inode xattrs by vfs_listxattr() and vfs_getxattr(). ovl_copy_xattr() skips copy up of security labels that are indentified by inode_copy_up_xattr LSM hooks, but it does that after vfs_getxattr(). Since we are not going to copy them, skip vfs_getxattr() of the security labels. Reported-by: Michael Labriola <[email protected]> Tested-by: Michael Labriola <[email protected]> Link: https://lore.kernel.org/linux-unionfs/[email protected]/ Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>