blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-08-21	smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()	David Hildenbrand	3	-5/+8
	We shouldn't be using a GUP-internal helper if it can be avoided. Similar to smaps_pte_entry() that uses vm_normal_page(), let's use vm_normal_page_pmd() that similarly refuses to return the huge zeropage. In contrast to follow_trans_huge_pmd(), vm_normal_page_pmd(): (1) Will always return the head page, not a tail page of a THP. If we'd ever call smaps_account with a tail page while setting "compound = true", we could be in trouble, because smaps_account() would look at the memmap of unrelated pages. If we're unlucky, that memmap does not exist at all. Before we removed PG_doublemap, we could have triggered something similar as in commit 24d7275ce279 ("fs/proc: task_mmu.c: don't read mapcount for migration entry"). This can theoretically happen ever since commit ff9f47f6f00c ("mm: proc: smaps_rollup: do not stall write attempts on mmap_lock"): (a) We're in show_smaps_rollup() and processed a VMA (b) We release the mmap lock in show_smaps_rollup() because it is contended (c) We merged that VMA with another VMA (d) We collapsed a THP in that merged VMA at that position If the end address of the original VMA falls into the middle of a THP area, we would call smap_gather_stats() with a start address that falls into a PMD-mapped THP. It's probably very rare to trigger when not really forced. (2) Will succeed on a is_pci_p2pdma_page(), like vm_normal_page() Treat such PMDs here just like smaps_pte_entry() would treat such PTEs. If such pages would be anonymous, we most certainly would want to account them. (3) Will skip over pmd_devmap(), like vm_normal_page() for pte_devmap() As noted in vm_normal_page(), that is only for handling legacy ZONE_DEVICE pages. So just like smaps_pte_entry(), we'll now also ignore such PMD entries. Especially, follow_pmd_mask() never ends up calling follow_trans_huge_pmd() on pmd_devmap(). Instead it calls follow_devmap_pmd() -- which will fail if neither FOLL_GET nor FOLL_PIN is set. So skipping pmd_devmap() pages seems to be the right thing to do. (4) Will properly handle VM_MIXEDMAP/VM_PFNMAP, like vm_normal_page() We won't be returning a memmap that should be ignored by core-mm, or worse, a memmap that does not even exist. Note that while walk_page_range() will skip VM_PFNMAP mappings, walk_page_vma() won't. Most probably this case doesn't currently really happen on the PMD level, otherwise we'd already be able to trigger kernel crashes when reading smaps / smaps_rollup. So most probably only (1) is relevant in practice as of now, but could only cause trouble in extreme corner cases. Let's move follow_trans_huge_pmd() to mm/internal.h to discourage future reuse in wrong context. Link: https://lkml.kernel.org/r/[email protected] Fixes: ff9f47f6f00c ("mm: proc: smaps_rollup: do not stall write attempts on mmap_lock") Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: liubo <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Xu <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-21	mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT	David Hildenbrand	4	-14/+49
	Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") missed that follow_page() and follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting or due to inaccessible (PROT_NONE) VMAs. As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from gup/gup_fast"): "Other follow_page callers like KSM should not use FOLL_NUMA, or they would fail to get the pages if they use follow_page instead of get_user_pages." liubo reported [1] that smaps_rollup results are imprecise, because they miss accounting of pages that are mapped PROT_NONE. Further, it's easy to reproduce that KSM no longer works on inaccessible VMAs on x86-64, because pte_protnone()/pmd_protnone() also indictaes "true" in inaccessible VMAs, and follow_page() refuses to return such pages right now. As KVM really depends on these NUMA hinting faults, removing the pte_protnone()/pmd_protnone() handling in GUP code completely is not really an option. To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT to restore the original behavior for now and add better comments. Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in is_valid_gup_args(), to add that flag for all external GUP users. Note that there are three GUP-internal __get_user_pages() users that don't end up calling is_valid_gup_args() and consequently won't get FOLL_HONOR_NUMA_FAULT set. 1) get_dump_page(): we really don't want to handle NUMA hinting faults. It specifies FOLL_FORCE and wouldn't have honored NUMA hinting faults already. 2) populate_vma_page_range(): we really don't want to handle NUMA hinting faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have honored NUMA hinting faults already. 3) faultin_vma_page_range(): we similarly don't want to handle NUMA hinting faults. To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in inaccessible VMAs properly, we have to perform VMA accessibility checks in gup_can_follow_protnone(). As GUP-fast should reject such pages either way in pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and arm64 that both implement pte_protnone() -- let's just always fallback to ordinary GUP when stumbling over pte_protnone()/pmd_protnone(). As Linus notes [2], honoring NUMA faults might only make sense for selected GUP users. So we should really see if we can instead let relevant GUP callers specify it manually, and not trigger NUMA hinting faults from GUP as default. Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and adding appropriate documenation. While at it, remove a stale comment from follow_trans_huge_pmd(): That comment for pmd_protnone() was added in commit 2b4847e73004 ("mm: numa: serialise parallel get_user_page against THP migration"), which noted: THP does not unmap pages due to a lack of support for migration entries at a PMD level. This allows races with get_user_pages Nowadays, we do have PMD migration entries, so the comment no longer applies. Let's drop it. [1] https://lore.kernel.org/r/[email protected] [2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=g@mail.gmail.com Link: https://lkml.kernel.org/r/[email protected] Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") Signed-off-by: David Hildenbrand <[email protected]> Reported-by: liubo <[email protected]> Closes: https://lore.kernel.org/r/[email protected] Reported-by: Peter Xu <[email protected]> Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/ Acked-by: Mel Gorman <[email protected]> Acked-by: Peter Xu <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Shuah Khan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-21	nsproxy: Convert nsproxy.count to refcount_t	Elena Reshetova	2	-6/+5
	atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nsproxy.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in refcount.h have different memory ordering guarantees than their atomic counterparts. Please check Documentation/core-api/refcount-vs-atomic.rst for more information. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nsproxy.count it might make a difference in following places: - put_nsproxy() and switch_task_namespaces(): decrement in refcount_dec_and_test() only provides RELEASE ordering and ACQUIRE ordering on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Reviewed-by: David Windsor <[email protected]> Reviewed-by: Hans Liljestrand <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2023-08-21	irqchip: Add support for Amlogic-C3 SoCs	Huqiang Qin	1	-0/+5
	The Amlogic-C3 SoCs support 12 GPIO IRQ lines compared with previous serial chips and have something different, details are as below. IRQ Number: - 54 1 pins on bank TESTN - 53:40 14 pins on bank X - 39:33 7 pins on bank D - 32:27 6 pins on bank A - 26:22 5 pins on bank E - 21:15 7 pins on bank C - 14:0 15 pins on bank B Signed-off-by: Huqiang Qin <[email protected]> Acked-by: Martin Blumenstingl <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	dt-bindings: interrupt-controller: Add support for Amlogic-C3 SoCs	Huqiang Qin	1	-0/+1
	Update dt-binding document for GPIO interrupt controller of Amlogic-C3 SoCs Signed-off-by: Huqiang Qin <[email protected]> Acked-by: Conor Dooley <[email protected]> Acked-by: Martin Blumenstingl <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchip/irq-mvebu-sei: Use devm_platform_get_and_ioremap_resource()	Yangtao Li	1	-2/+1
	Convert platform_get_resource(), devm_ioremap_resource() to a single call to devm_platform_get_and_ioremap_resource(), as this is exactly what this function does. Signed-off-by: Yangtao Li <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchip/ls-scfg-msi: Use devm_platform_get_and_ioremap_resource()	Yangtao Li	1	-2/+1
	Convert platform_get_resource(), devm_ioremap_resource() to a single call to devm_platform_get_and_ioremap_resource(), as this is exactly what this function does. Signed-off-by: Yangtao Li <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchip: Explicitly include correct DT includes	Rob Herring	23	-28/+18
	The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it as merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchip/orion: Use of_address_count() helper	Yang Yingliang	1	-2/+1
	After commit 16988c742968 ("of/address: introduce of_address_count() helper"), Use of_address_count() to instead of open-coding it, it's no functional change. Signed-off-by: Yang Yingliang <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchip/irq-pruss-intc: Do not check for 0 return after calling ↵	Ruan Jinjie	1	-2/+2
	platform_get_irq() It is not possible for platform_get_irq() to return 0. Use the return value from platform_get_irq(). Signed-off-by: Ruan Jinjie <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchip/imx-mu-msi: Do not check for 0 return after calling platform_get_irq()	Ruan Jinjie	1	-2/+2
	It is not possible for platform_get_irq() to return 0. Use the return value from platform_get_irq(). Signed-off-by: Ruan Jinjie <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchipr/i8259: Mark i8259_of_init() static	Arnd Bergmann	1	-1/+1
	i8259_of_init() is only used as an initcall and does not need to be global, so mark it static to avoid: drivers/irqchip/irq-i8259.c:343:12: warning: no previous prototype for 'i8259_of_init' [-Wmissing-prototypes] Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchip/mips-gic: Mark gic_irq_domain_free() static	Arnd Bergmann	1	-1/+1
	This function is only used locally and should be static to avoid a warning: drivers/irqchip/irq-mips-gic.c:560:6: error: no previous prototype for 'gic_irq_domain_free' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Reviewed-by: Serge Semin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchip/xtensa-pic: Include header for xtensa_pic_init_legacy()	Arnd Bergmann	1	-0/+1
	The declaration for this function is not included, which leads to a harmless warning: drivers/irqchip/irq-xtensa-pic.c:91:12: error: no previous prototype for 'xtensa_pic_init_legacy' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Max Filippov <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	irqchip/loongson-eiointc: Fix return value checking of eiointc_index	Bibo Mao	1	-1/+1
	Return value of function eiointc_index is int, however it is converted into uint32_t and then compared smaller than zero, this will cause logic problem. Fixes: dd281e1a1a93 ("irqchip: Add Loongson Extended I/O interrupt controller support") Signed-off-by: Bibo Mao <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21	ice: Fix NULL pointer deref during VF reset	Petr Oros	1	-7/+8
	During stress test with attaching and detaching VF from KVM and simultaneously changing VFs spoofcheck and trust there was a NULL pointer dereference in ice_reset_vf that VF's VSI is null. More than one instance of ice_reset_vf() can be running at a given time. When we rebuild the VSI in ice_reset_vf, another reset can be triaged from ice_service_task. In this case we can access the currently uninitialized VSI and cause panic. The window for this racing condition has been around for a long time but it's much worse after commit 227bf4500aaa ("ice: move VSI delete outside deconfig") because the reset runs faster. ice_reset_vf() using vf->cfg_lock and when we move this lock before accessing to the VF VSI, we can fix BUG for all cases. Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes in ice_vsi_stop_all_rx_rings() With our reproducer, we can hit BUG: ~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig"). ~20m after commit 227bf4500aaa ("ice: move VSI delete outside deconfig"). After this fix we are not able to reproduce it after ~48h There was commit cf90b74341ee ("ice: Fix call trace with null VSI during VF reset") which also tried to fix this issue, but it was only partially resolved and the bug still exists. [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 6420.665382] #PF: supervisor read access in kernel mode [ 6420.670521] #PF: error_code(0x0000) - not-present page [ 6420.675659] PGD 0 [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1 [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022 [ 6420.698729] Workqueue: ice ice_service_task [ice] [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice] [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83 [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246 [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000 [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828 [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000 [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0 [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff [ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000 [ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0 [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002) [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6420.811841] PKRU: 55555554 [ 6420.811842] Call Trace: [ 6420.811843] <TASK> [ 6420.811844] ice_reset_vf+0x9a/0x450 [ice] [ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice] [ 6420.841343] ice_service_task+0x23b/0x600 [ice] [ 6420.845884] ? __schedule+0x212/0x550 [ 6420.849550] process_one_work+0x1e2/0x3b0 [ 6420.853563] ? rescuer_thread+0x390/0x390 [ 6420.857577] worker_thread+0x50/0x3a0 [ 6420.861242] ? rescuer_thread+0x390/0x390 [ 6420.865253] kthread+0xdd/0x100 [ 6420.868400] ? kthread_complete_and_exit+0x20/0x20 [ 6420.873194] ret_from_fork+0x1f/0x30 [ 6420.876774] </TASK> [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk r Fixes: efe41860008e ("ice: Fix memory corruption in VF driver") Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of trusted VF") Signed-off-by: Petr Oros <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-08-21	Revert "ice: Fix ice VF reset during iavf initialization"	Petr Oros	4	-25/+4
	This reverts commit 7255355a0636b4eff08d5e8139c77d98f151c4fc. After this commit we are not able to attach VF to VM: virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52 error: Failed to attach interface error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable ice_check_vf_ready_for_cfg() already contain waiting for reset. New condition in ice_check_vf_ready_for_reset() causing only problems. Fixes: 7255355a0636 ("ice: Fix ice VF reset during iavf initialization") Signed-off-by: Petr Oros <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-08-21	ice: fix receive buffer size miscalculation	Jesse Brandeburg	1	-1/+2
	The driver is misconfiguring the hardware for some values of MTU such that it could use multiple descriptors to receive a packet when it could have simply used one. Change the driver to use a round-up instead of the result of a shift, as the shift can truncate the lower bits of the size, and result in the problem noted above. It also aligns this driver with similar code in i40e. The insidiousness of this problem is that everything works with the wrong size, it's just not working as well as it could, as some MTU sizes end up using two or more descriptors, and there is no way to tell that is happening without looking at ice_trace or a bus analyzer. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Jesse Brandeburg <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-08-21	super: wait until we passed kill super	Christian Brauner	2	-7/+65
	Recent rework moved block device closing out of sb->put_super() and into sb->kill_sb() to avoid deadlocks as s_umount is held in put_super() and blkdev_put() can end up taking s_umount again. That means we need to move the removal of the superblock from @fs_supers out of generic_shutdown_super() and into deactivate_locked_super() to ensure that concurrent mounters don't fail to open block devices that are still in use because blkdev_put() in sb->kill_sb() hasn't been called yet. We can now do this as we can make iterators through @fs_super and @super_blocks wait without holding s_umount. Concurrent mounts will wait until a dying superblock is fully dead so until sb->kill_sb() has been called and SB_DEAD been set. Concurrent iterators can already discard any SB_DYING superblock. Reviewed-by: Jan Kara <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	super: wait for nascent superblocks	Christian Brauner	2	-51/+154
	Recent patches experiment with making it possible to allocate a new superblock before opening the relevant block device. Naturally this has intricate side-effects that we get to learn about while developing this. Superblock allocators such as sget{_fc}() return with s_umount of the new superblock held and lock ordering currently requires that block level locks such as bdev_lock and open_mutex rank above s_umount. Before aca740cecbe5 ("fs: open block device after superblock creation") ordering was guaranteed to be correct as block devices were opened prior to superblock allocation and thus s_umount wasn't held. But now s_umount must be dropped before opening block devices to avoid locking violations. This has consequences. The main one being that iterators over @super_blocks and @fs_supers that grab a temporary reference to the superblock can now also grab s_umount before the caller has managed to open block devices and called fill_super(). So whereas before such iterators or concurrent mounts would have simply slept on s_umount until SB_BORN was set or the superblock was discard due to initalization failure they can now needlessly spin through sget{_fc}(). If the caller is sleeping on bdev_lock or open_mutex one caller waiting on SB_BORN will always spin somewhere and potentially this can go on for quite a while. It should be possible to drop s_umount while allowing iterators to wait on a nascent superblock to either be born or discarded. This patch implements a wait_var_event() mechanism allowing iterators to sleep until they are woken when the superblock is born or discarded. This also allows us to avoid relooping through @fs_supers and @super_blocks if a superblock isn't yet born or dying. Link: aca740cecbe5 ("fs: open block device after superblock creation") Reviewed-by: Jan Kara <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	efi/runtime-wrapper: Move workqueue manipulation out of line	Ard Biesheuvel	1	-28/+33
	efi_queue_work() is a macro that implements the non-trivial manipulation of the EFI runtime workqueue and completion data structure, most of which is generic, and could be shared between all the users of the macro. So move it out of the macro and into a new helper function. Signed-off-by: Ard Biesheuvel <[email protected]>
2023-08-21	efi/runtime-wrappers: Use type safe encapsulation of call arguments	Ard Biesheuvel	2	-76/+138
	The current code that marshalls the EFI runtime call arguments to hand them off to a async helper does so in a type unsafe and slightly messy manner - everything is cast to void* except for some integral types that are passed by reference and dereferenced on the receiver end. Let's clean this up a bit, and record the arguments of each runtime service invocation exactly as they are issued, in a manner that permits the compiler to check the types of the arguments at both ends. Signed-off-by: Ard Biesheuvel <[email protected]>
2023-08-21	efi/riscv: Move EFI runtime call setup/teardown helpers out of line	Ard Biesheuvel	2	-10/+15
	Only the arch_efi_call_virt() macro that some architectures override needs to be a macro, given that it is variadic and encapsulates calls via function pointers that have different prototypes. The associated setup and teardown code are not special in this regard, and don't need to be instantiated at each call site. So turn them into ordinary C functions and move them out of line. Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2023-08-21	efi/arm64: Move EFI runtime call setup/teardown helpers out of line	Ard Biesheuvel	2	-16/+18
	Only the arch_efi_call_virt() macro that some architectures override needs to be a macro, given that it is variadic and encapsulates calls via function pointers that have different prototypes. The associated setup and teardown code are not special in this regard, and don't need to be instantiated at each call site. So turn them into ordinary C functions and move them out of line. Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2023-08-21	cachefiles: use kiocb_{start,end}_write() helpers	Amir Goldstein	1	-13/+3
	Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	ovl: use kiocb_{start,end}_write() helpers	Amir Goldstein	1	-8/+2
	Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	aio: use kiocb_{start,end}_write() helpers	Amir Goldstein	1	-17/+3
	Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	io_uring: use kiocb_{start,end}_write() helpers	Amir Goldstein	1	-19/+4
	Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	fs: create kiocb_{start,end}_write() helpers	Amir Goldstein	1	-0/+36
	aio, io_uring, cachefiles and overlayfs, all open code an ugly variant of file_{start,end}_write() to silence lockdep warnings. Create helpers for this lockdep dance so we can use the helpers in all the callers. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	fs: add kerneldoc to file_{start,end}_write() helpers	Amir Goldstein	1	-1/+14
	and use sb_end_write() instead of open coded version. Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	io_uring: rename kiocb_end_write() local helper	Amir Goldstein	1	-5/+5
	This helper does not take a kiocb as input and we want to create a common helper by that name that takes a kiocb as input. Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	splice: Convert page_cache_pipe_buf_confirm() to use a folio	Matthew Wilcox (Oracle)	1	-11/+9
	Convert buf->page to a folio once instead of five times. There's only one uptodate bit per folio, not per page, so we lose nothing here. Signed-off-by: "Matthew Wilcox (Oracle)" <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	libfs: Convert simple_write_begin and simple_write_end to use a folio	Matthew Wilcox (Oracle)	1	-20/+20
	Remove a number of implicit calls to compound_head() and various calls to compatibility functions. This is not sufficient to enable support for large folios; generic_perform_write() must be converted first. Signed-off-by: "Matthew Wilcox (Oracle)" <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21	tracing: Introduce pipe_cpumask to avoid race on trace_pipes	Zheng Yejian	2	-7/+50
	There is race issue when concurrently splice_read main trace_pipe and per_cpu trace_pipes which will result in data read out being different from what actually writen. As suggested by Steven: > I believe we should add a ref count to trace_pipe and the per_cpu > trace_pipes, where if they are opened, nothing else can read it. > > Opening trace_pipe locks all per_cpu ref counts, if any of them are > open, then the trace_pipe open will fail (and releases any ref counts > it had taken). > > Opening a per_cpu trace_pipe will up the ref count for just that > CPU buffer. This will allow multiple tasks to read different per_cpu > trace_pipe files, but will prevent the main trace_pipe file from > being opened. But because we only need to know whether per_cpu trace_pipe is open or not, using a cpumask instead of using ref count may be easier. After this patch, users will find that: - Main trace_pipe can be opened by only one user, and if it is opened, all per_cpu trace_pipes cannot be opened; - Per_cpu trace_pipes can be opened by multiple users, but each per_cpu trace_pipe can only be opened by one user. And if one of them is opened, main trace_pipe cannot be opened. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Suggested-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Zheng Yejian <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-08-21	drm/panfrost: Skip speed binning on EOPNOTSUPP	David Michael	1	-1/+1
	Encountered on an ARM Mali-T760 MP4, attempting to read the nvmem variable can also return EOPNOTSUPP instead of ENOENT when speed binning is unsupported. Cc: <[email protected]> Fixes: 7d690f936e9b ("drm/panfrost: Add basic support for speed binning") Signed-off-by: David Michael <[email protected]> Reviewed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-08-21	of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock	Rob Herring	1	-22/+9
	While originally it was fine to format strings using "%pOF" while holding devtree_lock, this now causes a deadlock. Lockdep reports: of_get_parent from of_fwnode_get_parent+0x18/0x24 ^^^^^^^^^^^^^ of_fwnode_get_parent from fwnode_count_parents+0xc/0x28 fwnode_count_parents from fwnode_full_name_string+0x18/0xac fwnode_full_name_string from device_node_string+0x1a0/0x404 device_node_string from pointer+0x3c0/0x534 pointer from vsnprintf+0x248/0x36c vsnprintf from vprintk_store+0x130/0x3b4 Fix this by moving the printing in __of_changeset_entry_apply() outside the lock. As the only difference in the multiple prints is the action name, use the existing "action_names" to refactor the prints into a single print. Fixes: a92eb7621b9fb2c2 ("lib/vsprintf: Make use of fwnode API to obtain node names and separators") Cc: [email protected] Reported-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-08-21	of: unittest: Fix EXPECT for parse_phandle_with_args_map() test	Rob Herring	1	-2/+2
	Commit 12e17243d8a1 ("of: base: improve error msg in of_phandle_iterator_next()") added printing of the phandle value on error, but failed to update the unittest. Fixes: 12e17243d8a1 ("of: base: improve error msg in of_phandle_iterator_next()") Cc: [email protected] Reviewed-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-08-21	btrfs: tests: test invalid splitting when skipping pinned drop extent_map	Josef Bacik	1	-0/+138
	This reproduces the bug fixed by "btrfs: fix incorrect splitting in btrfs_drop_extent_map_range", we were improperly calculating the range for the split extent. Add a test that exercises this scenario and validates that we get the correct resulting extent_maps in our tree. Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: tests: add a test for btrfs_add_extent_mapping	Josef Bacik	1	-0/+56
	This helper is different from the normal add_extent_mapping in that it will stuff an em into a gap that exists between overlapping em's in the tree. It appeared there was a bug so I wrote a self test to validate it did the correct thing when it worked with two side by side ems. Thankfully it is correct, but more testing is better. Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: tests: add extent_map tests for dropping with odd layouts	Josef Bacik	1	-0/+218
	While investigating weird problems with the extent_map I wrote a self test testing the various edge cases of btrfs_drop_extent_map_range. This can split in different ways and behaves different in each case, so test the various edge cases to make sure everything is functioning properly. Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: scrub: move write back of repaired sectors to ↵	Qu Wenruo	1	-47/+25
	scrub_stripe_read_repair_worker() Currently the scrub_stripe_read_repair_worker() only does reads to rebuild the corrupted sectors, it doesn't do any writeback. The design is mostly to put writeback into a more ordered manner, to co-operate with dev-replace with zoned mode, which requires every write to be submitted in their bytenr order. However the writeback for repaired sectors into the original mirror doesn't need such strong sync requirement, as it can only happen for non-zoned devices. This patch would move the writeback for repaired sectors into scrub_stripe_read_repair_worker(), which removes two calls sites for repaired sectors writeback. (one from flush_scrub_stripes(), one from scrub_raid56_parity_stripe()) Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: scrub: don't go ordered workqueue for dev-replace	Qu Wenruo	1	-7/+3
	The workqueue fs_info->scrub_worker would go ordered workqueue if it's a device replace operation. However the scrub is relying on multiple workers to do data csum verification, and we always submit several read requests in a row. Thus there is no need to use ordered workqueue just for dev-replace. We have extra synchronization (the main thread will always submit-and-wait for dev-replace writes) to handle it for zoned devices. Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: scrub: fix grouping of read IO	Qu Wenruo	1	-25/+71
	[REGRESSION] There are several regression reports about the scrub performance with v6.4 kernel. On a PCIe 3.0 device, the old v6.3 kernel can go 3GB/s scrub speed, but v6.4 can only go 1GB/s, an obvious 66% performance drop. [CAUSE] Iostat shows a very different behavior between v6.3 and v6.4 kernel: Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 9731.00 3425544.00 17237.00 63.92 2.18 352.02 21.18 100.00 nvme0n1p3 15578.00 993616.00 5.00 0.03 0.09 63.78 1.32 100.00 The upper one is v6.3 while the lower one is v6.4. There are several obvious differences: - Very few read merges This turns out to be a behavior change that we no longer do bio plug/unplug. - Very low aqu-sz This is due to the submit-and-wait behavior of flush_scrub_stripes(), and extra extent/csum tree search. Both behaviors are not that obvious on SATA SSDs, as SATA SSDs have NCQ to merge the reads, while SATA SSDs can not handle high queue depth well either. [FIX] For now this patch focuses on the read speed fix. Dev-replace replace speed needs more work. For the read part, we go two directions to fix the problems: - Re-introduce blk plug/unplug to merge read requests This is pretty simple, and the behavior is pretty easy to observe. This would enlarge the average read request size to 512K. - Introduce multi-group reads and no longer wait for each group Instead of the old behavior, which submits 8 stripes and waits for them, here we would enlarge the total number of stripes to 16 * 8. Which is 8M per device, the same limit as the old scrub in-flight bios size limit. Now every time we fill a group (8 stripes), we submit them and continue to next stripes. Only when the full 16 * 8 stripes are all filled, we submit the remaining ones (the last group), and wait for all groups to finish. Then submit the repair writes and dev-replace writes. This should enlarge the queue depth. This would greatly improve the merge rate (thus read block size) and queue depth: Before (with regression, and cached extent/csum path): Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 20666.00 1318240.00 10.00 0.05 0.08 63.79 1.63 100.00 After (with all patches applied): nvme0n1p3 5165.00 2278304.00 30557.00 85.54 0.55 441.10 2.81 100.00 i.e. 1287 to 2224 MB/s. CC: [email protected] # 6.4+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: scrub: avoid unnecessary csum tree search preparing stripes	Qu Wenruo	4	-27/+46
	One of the bottleneck of the new scrub code is the extra csum tree search. The old code would only do the csum tree search for each scrub bio, which can be as large as 512KiB, thus they can afford to allocate a new path each time. But the new scrub code is doing csum tree search for each stripe, which is only 64KiB, this means we'd better re-use the same csum path during each search. This patch would introduce a per-sctx path for csum tree search, as we don't need to re-allocate the path every time we need to do a csum tree search. With this change we can further improve the queue depth and improve the scrub read performance: Before (with regression and cached extent tree path): Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 15875.00 1013328.00 12.00 0.08 0.08 63.83 1.35 100.00 After (with both cached extent/csum tree path): nvme0n1p3 17759.00 1133280.00 10.00 0.06 0.08 63.81 1.50 100.00 Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") CC: [email protected] # 6.4+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: scrub: avoid unnecessary extent tree search preparing stripes	Qu Wenruo	1	-12/+29
	Since commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"), scrub no longer re-use the same path for extent tree search. This can lead to unnecessary extent tree search, especially for the new stripe based scrub, as we have way more stripes to prepare. This patch would re-introduce a shared path for extent tree search, and properly release it when the block group is scrubbed. This change alone can improve scrub performance slightly by reducing the time spend preparing the stripe thus improving the queue depth. Before (with regression): Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 15578.00 993616.00 5.00 0.03 0.09 63.78 1.32 100.00 After (with this patch): nvme0n1p3 15875.00 1013328.00 12.00 0.08 0.08 63.83 1.35 100.00 Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") CC: [email protected] # 6.4+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: copy dir permission and time when creating a stub subvolume	Lee Trager	1	-5/+7
	btrfs supports creating nested subvolumes however snapshots are not recursive. When a snapshot is taken of a volume which contains a subvolume the subvolume is replaced with a stub subvolume which has the same name and uses inode number 2[1]. The stub subvolume kept the directory name but did not set the time or permissions of the stub subvolume. This resulted in all time information being the current time and ownership defaulting to root. When subvolumes and snapshots are created using unshare this results in a snapshot directory the user created but has no permissions for. Test case: [vmuser@archvm ~]# sudo -i [root@archvm ~]# mkdir -p /mnt/btrfs/test [root@archvm ~]# chown vmuser:users /mnt/btrfs/test/ [root@archvm ~]# exit logout [vmuser@archvm ~]$ cd /mnt/btrfs/test [vmuser@archvm test]$ unshare --user --keep-caps --map-auto --map-root-user [root@archvm test]# btrfs subvolume create subvolume Create subvolume './subvolume' [root@archvm test]# btrfs subvolume create subvolume/subsubvolume Create subvolume 'subvolume/subsubvolume' [root@archvm test]# btrfs subvolume snapshot subvolume snapshot Create a snapshot of 'subvolume' in './snapshot' [root@archvm test]# exit logout [vmuser@archvm test]$ tree -ug [vmuser users ] . ├── [vmuser users ] snapshot │ └── [vmuser users ] subsubvolume <-- Without patch perm is root:root └── [vmuser users ] subvolume └── [vmuser users ] subsubvolume 5 directories, 0 files [1] https://btrfs.readthedocs.io/en/latest/btrfs-subvolume.html#nested-subvolumes Signed-off-by: Lee Trager <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: remove pointless empty list check when reading delayed dir indexes	Filipe Manana	1	-3/+0
	At btrfs_readdir_delayed_dir_index(), called when reading a directory, we have this check for an empty list to return immediately, but it's not needed since list_for_each_entry_safe(), called immediately after, is prepared to deal with an empty list, it simply does nothing. So remove the empty list check. Besides shorter source code, it also slightly reduces the binary text size: Before this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1609408 167269 16864 1793541 1b5e05 fs/btrfs/btrfs.ko After this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1609392 167269 16864 1793525 1b5df5 fs/btrfs/btrfs.ko Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: drop redundant check to use fs_devices::metadata_uuid	Anand Jain	1	-10/+5
	fs_devices::metadata_uuid value is already updated based on the super_block::METADATA_UUID flag for either fsid or metadata_uuid as appropriate. So, fs_devices::metadata_uuid can be used directly. Reviewed-by: Johannes Thumshirn <[email protected]> Tested-by: Guilherme G. Piccoli <[email protected]> Signed-off-by: Anand Jain <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super	Anand Jain	1	-5/+3
	The function btrfs_validate_super() should verify the metadata_uuid in the provided superblock argument. Because, all its callers expect it to do that. Such as in the following stacks: write_all_supers() sb = fs_info->super_for_commit; btrfs_validate_write_super(.., sb) btrfs_validate_super(.., sb, ..) scrub_one_super() btrfs_validate_super(.., sb, ..) And check_dev_super() btrfs_validate_super(.., sb, ..) However, it currently verifies the fs_info::super_copy::metadata_uuid instead. Fix this using the correct metadata_uuid in the superblock argument. CC: [email protected] # 5.4+ Reviewed-by: Johannes Thumshirn <[email protected]> Tested-by: Guilherme G. Piccoli <[email protected]> Signed-off-by: Anand Jain <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21	btrfs: use the correct superblock to compare fsid in btrfs_validate_super	Anand Jain	1	-3/+2
	The function btrfs_validate_super() should verify the fsid in the provided superblock argument. Because, all its callers expect it to do that. Such as in the following stack: write_all_supers() sb = fs_info->super_for_commit; btrfs_validate_write_super(.., sb) btrfs_validate_super(.., sb, ..) scrub_one_super() btrfs_validate_super(.., sb, ..) And check_dev_super() btrfs_validate_super(.., sb, ..) However, it currently verifies the fs_info::super_copy::fsid instead, which is not correct. Fix this using the correct fsid in the superblock argument. CC: [email protected] # 5.4+ Reviewed-by: Johannes Thumshirn <[email protected]> Tested-by: Guilherme G. Piccoli <[email protected]> Signed-off-by: Anand Jain <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>