aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-08-21ACPI: x86: s2idle: Add a function to get LPS0 constraint for a deviceMario Limonciello2-0/+30
Other parts of the kernel may use LPS0 constraints information to make decisions on what power state to put a device into. Signed-off-by: Mario Limonciello <[email protected]> [ rjw: Rewrite kerneldoc, rearrange if () statement, edit subject and changelog ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-08-21ACPI: x86: s2idle: Add for_each_lpi_constraint() helperAndy Shevchenko1-11/+15
We have one existing and one coming user of this macro. Introduce a helper. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-08-21ACPI: x86: s2idle: Add more debugging for AMD constraints parsingMario Limonciello1-3/+7
While parsing the constraints show all the entries for the table to aid with debugging other problems later. Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-08-21ACPI: x86: s2idle: Fix a logic error parsing AMD constraints tableMario Limonciello1-19/+12
The constraints table should be resetting the `list` object after running through all of `info_obj` iterations. This adjusts whitespace as well as less code will now be included with each loop. This fixes a functional problem is fixed where a badly formed package in the inner loop may have incorrect data. Fixes: 146f1ed852a8 ("ACPI: PM: s2idle: Add AMD support to handle _DSM") Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-08-21ACPI: x86: s2idle: Catch multiple ACPI_TYPE_PACKAGE objectsMario Limonciello1-0/+6
If a badly constructed firmware includes multiple `ACPI_TYPE_PACKAGE` objects while evaluating the AMD LPS0 _DSM, there will be a memory leak. Explicitly guard against this. Suggested-by: Bjorn Helgaas <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-08-21ACPI: x86: s2idle: Post-increment variables when getting constraintsMario Limonciello1-4/+4
When code uses a pre-increment it makes the reader question "why". In the constraint fetching code there is no reason for the variables to be pre-incremented so adjust to post-increment. No intended functional changes. Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]> Suggested-by: Bjorn Helgaas <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-08-21ACPI: Adjust #ifdef for *_lps0_dev useMario Limonciello1-2/+2
The `#ifdef` for acpi_register_lps0_dev() currently is guarded against `CONFIG_X86`, but actually the functions contained in the block are specifically sleep related functions. Adjust the guard to also check for `CONFIG_SUSPEND`. Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-08-21irqchip: Add support for Amlogic-C3 SoCsHuqiang Qin1-0/+5
The Amlogic-C3 SoCs support 12 GPIO IRQ lines compared with previous serial chips and have something different, details are as below. IRQ Number: - 54 1 pins on bank TESTN - 53:40 14 pins on bank X - 39:33 7 pins on bank D - 32:27 6 pins on bank A - 26:22 5 pins on bank E - 21:15 7 pins on bank C - 14:0 15 pins on bank B Signed-off-by: Huqiang Qin <[email protected]> Acked-by: Martin Blumenstingl <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21dt-bindings: interrupt-controller: Add support for Amlogic-C3 SoCsHuqiang Qin1-0/+1
Update dt-binding document for GPIO interrupt controller of Amlogic-C3 SoCs Signed-off-by: Huqiang Qin <[email protected]> Acked-by: Conor Dooley <[email protected]> Acked-by: Martin Blumenstingl <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchip/irq-mvebu-sei: Use devm_platform_get_and_ioremap_resource()Yangtao Li1-2/+1
Convert platform_get_resource(), devm_ioremap_resource() to a single call to devm_platform_get_and_ioremap_resource(), as this is exactly what this function does. Signed-off-by: Yangtao Li <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchip/ls-scfg-msi: Use devm_platform_get_and_ioremap_resource()Yangtao Li1-2/+1
Convert platform_get_resource(), devm_ioremap_resource() to a single call to devm_platform_get_and_ioremap_resource(), as this is exactly what this function does. Signed-off-by: Yangtao Li <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchip: Explicitly include correct DT includesRob Herring23-28/+18
The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it as merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchip/orion: Use of_address_count() helperYang Yingliang1-2/+1
After commit 16988c742968 ("of/address: introduce of_address_count() helper"), Use of_address_count() to instead of open-coding it, it's no functional change. Signed-off-by: Yang Yingliang <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchip/irq-pruss-intc: Do not check for 0 return after calling ↵Ruan Jinjie1-2/+2
platform_get_irq() It is not possible for platform_get_irq() to return 0. Use the return value from platform_get_irq(). Signed-off-by: Ruan Jinjie <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchip/imx-mu-msi: Do not check for 0 return after calling platform_get_irq()Ruan Jinjie1-2/+2
It is not possible for platform_get_irq() to return 0. Use the return value from platform_get_irq(). Signed-off-by: Ruan Jinjie <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchipr/i8259: Mark i8259_of_init() staticArnd Bergmann1-1/+1
i8259_of_init() is only used as an initcall and does not need to be global, so mark it static to avoid: drivers/irqchip/irq-i8259.c:343:12: warning: no previous prototype for 'i8259_of_init' [-Wmissing-prototypes] Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchip/mips-gic: Mark gic_irq_domain_free() staticArnd Bergmann1-1/+1
This function is only used locally and should be static to avoid a warning: drivers/irqchip/irq-mips-gic.c:560:6: error: no previous prototype for 'gic_irq_domain_free' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Reviewed-by: Serge Semin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchip/xtensa-pic: Include header for xtensa_pic_init_legacy()Arnd Bergmann1-0/+1
The declaration for this function is not included, which leads to a harmless warning: drivers/irqchip/irq-xtensa-pic.c:91:12: error: no previous prototype for 'xtensa_pic_init_legacy' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Max Filippov <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21irqchip/loongson-eiointc: Fix return value checking of eiointc_indexBibo Mao1-1/+1
Return value of function eiointc_index is int, however it is converted into uint32_t and then compared smaller than zero, this will cause logic problem. Fixes: dd281e1a1a93 ("irqchip: Add Loongson Extended I/O interrupt controller support") Signed-off-by: Bibo Mao <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-21ice: Fix NULL pointer deref during VF resetPetr Oros1-7/+8
During stress test with attaching and detaching VF from KVM and simultaneously changing VFs spoofcheck and trust there was a NULL pointer dereference in ice_reset_vf that VF's VSI is null. More than one instance of ice_reset_vf() can be running at a given time. When we rebuild the VSI in ice_reset_vf, another reset can be triaged from ice_service_task. In this case we can access the currently uninitialized VSI and cause panic. The window for this racing condition has been around for a long time but it's much worse after commit 227bf4500aaa ("ice: move VSI delete outside deconfig") because the reset runs faster. ice_reset_vf() using vf->cfg_lock and when we move this lock before accessing to the VF VSI, we can fix BUG for all cases. Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes in ice_vsi_stop_all_rx_rings() With our reproducer, we can hit BUG: ~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig"). ~20m after commit 227bf4500aaa ("ice: move VSI delete outside deconfig"). After this fix we are not able to reproduce it after ~48h There was commit cf90b74341ee ("ice: Fix call trace with null VSI during VF reset") which also tried to fix this issue, but it was only partially resolved and the bug still exists. [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 6420.665382] #PF: supervisor read access in kernel mode [ 6420.670521] #PF: error_code(0x0000) - not-present page [ 6420.675659] PGD 0 [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1 [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022 [ 6420.698729] Workqueue: ice ice_service_task [ice] [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice] [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83 [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246 [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000 [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828 [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000 [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0 [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff [ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000 [ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0 [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002) [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6420.811841] PKRU: 55555554 [ 6420.811842] Call Trace: [ 6420.811843] <TASK> [ 6420.811844] ice_reset_vf+0x9a/0x450 [ice] [ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice] [ 6420.841343] ice_service_task+0x23b/0x600 [ice] [ 6420.845884] ? __schedule+0x212/0x550 [ 6420.849550] process_one_work+0x1e2/0x3b0 [ 6420.853563] ? rescuer_thread+0x390/0x390 [ 6420.857577] worker_thread+0x50/0x3a0 [ 6420.861242] ? rescuer_thread+0x390/0x390 [ 6420.865253] kthread+0xdd/0x100 [ 6420.868400] ? kthread_complete_and_exit+0x20/0x20 [ 6420.873194] ret_from_fork+0x1f/0x30 [ 6420.876774] </TASK> [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk r Fixes: efe41860008e ("ice: Fix memory corruption in VF driver") Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of trusted VF") Signed-off-by: Petr Oros <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-08-21Revert "ice: Fix ice VF reset during iavf initialization"Petr Oros4-25/+4
This reverts commit 7255355a0636b4eff08d5e8139c77d98f151c4fc. After this commit we are not able to attach VF to VM: virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52 error: Failed to attach interface error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable ice_check_vf_ready_for_cfg() already contain waiting for reset. New condition in ice_check_vf_ready_for_reset() causing only problems. Fixes: 7255355a0636 ("ice: Fix ice VF reset during iavf initialization") Signed-off-by: Petr Oros <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-08-21ice: fix receive buffer size miscalculationJesse Brandeburg1-1/+2
The driver is misconfiguring the hardware for some values of MTU such that it could use multiple descriptors to receive a packet when it could have simply used one. Change the driver to use a round-up instead of the result of a shift, as the shift can truncate the lower bits of the size, and result in the problem noted above. It also aligns this driver with similar code in i40e. The insidiousness of this problem is that everything works with the wrong size, it's just not working as well as it could, as some MTU sizes end up using two or more descriptors, and there is no way to tell that is happening without looking at ice_trace or a bus analyzer. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Jesse Brandeburg <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-08-21super: wait until we passed kill superChristian Brauner2-7/+65
Recent rework moved block device closing out of sb->put_super() and into sb->kill_sb() to avoid deadlocks as s_umount is held in put_super() and blkdev_put() can end up taking s_umount again. That means we need to move the removal of the superblock from @fs_supers out of generic_shutdown_super() and into deactivate_locked_super() to ensure that concurrent mounters don't fail to open block devices that are still in use because blkdev_put() in sb->kill_sb() hasn't been called yet. We can now do this as we can make iterators through @fs_super and @super_blocks wait without holding s_umount. Concurrent mounts will wait until a dying superblock is fully dead so until sb->kill_sb() has been called and SB_DEAD been set. Concurrent iterators can already discard any SB_DYING superblock. Reviewed-by: Jan Kara <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21super: wait for nascent superblocksChristian Brauner2-51/+154
Recent patches experiment with making it possible to allocate a new superblock before opening the relevant block device. Naturally this has intricate side-effects that we get to learn about while developing this. Superblock allocators such as sget{_fc}() return with s_umount of the new superblock held and lock ordering currently requires that block level locks such as bdev_lock and open_mutex rank above s_umount. Before aca740cecbe5 ("fs: open block device after superblock creation") ordering was guaranteed to be correct as block devices were opened prior to superblock allocation and thus s_umount wasn't held. But now s_umount must be dropped before opening block devices to avoid locking violations. This has consequences. The main one being that iterators over @super_blocks and @fs_supers that grab a temporary reference to the superblock can now also grab s_umount before the caller has managed to open block devices and called fill_super(). So whereas before such iterators or concurrent mounts would have simply slept on s_umount until SB_BORN was set or the superblock was discard due to initalization failure they can now needlessly spin through sget{_fc}(). If the caller is sleeping on bdev_lock or open_mutex one caller waiting on SB_BORN will always spin somewhere and potentially this can go on for quite a while. It should be possible to drop s_umount while allowing iterators to wait on a nascent superblock to either be born or discarded. This patch implements a wait_var_event() mechanism allowing iterators to sleep until they are woken when the superblock is born or discarded. This also allows us to avoid relooping through @fs_supers and @super_blocks if a superblock isn't yet born or dying. Link: aca740cecbe5 ("fs: open block device after superblock creation") Reviewed-by: Jan Kara <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21efi/runtime-wrapper: Move workqueue manipulation out of lineArd Biesheuvel1-28/+33
efi_queue_work() is a macro that implements the non-trivial manipulation of the EFI runtime workqueue and completion data structure, most of which is generic, and could be shared between all the users of the macro. So move it out of the macro and into a new helper function. Signed-off-by: Ard Biesheuvel <[email protected]>
2023-08-21efi/runtime-wrappers: Use type safe encapsulation of call argumentsArd Biesheuvel2-76/+138
The current code that marshalls the EFI runtime call arguments to hand them off to a async helper does so in a type unsafe and slightly messy manner - everything is cast to void* except for some integral types that are passed by reference and dereferenced on the receiver end. Let's clean this up a bit, and record the arguments of each runtime service invocation exactly as they are issued, in a manner that permits the compiler to check the types of the arguments at both ends. Signed-off-by: Ard Biesheuvel <[email protected]>
2023-08-21efi/riscv: Move EFI runtime call setup/teardown helpers out of lineArd Biesheuvel2-10/+15
Only the arch_efi_call_virt() macro that some architectures override needs to be a macro, given that it is variadic and encapsulates calls via function pointers that have different prototypes. The associated setup and teardown code are not special in this regard, and don't need to be instantiated at each call site. So turn them into ordinary C functions and move them out of line. Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2023-08-21efi/arm64: Move EFI runtime call setup/teardown helpers out of lineArd Biesheuvel2-16/+18
Only the arch_efi_call_virt() macro that some architectures override needs to be a macro, given that it is variadic and encapsulates calls via function pointers that have different prototypes. The associated setup and teardown code are not special in this regard, and don't need to be instantiated at each call site. So turn them into ordinary C functions and move them out of line. Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2023-08-21cachefiles: use kiocb_{start,end}_write() helpersAmir Goldstein1-13/+3
Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21ovl: use kiocb_{start,end}_write() helpersAmir Goldstein1-8/+2
Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21aio: use kiocb_{start,end}_write() helpersAmir Goldstein1-17/+3
Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21io_uring: use kiocb_{start,end}_write() helpersAmir Goldstein1-19/+4
Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21fs: create kiocb_{start,end}_write() helpersAmir Goldstein1-0/+36
aio, io_uring, cachefiles and overlayfs, all open code an ugly variant of file_{start,end}_write() to silence lockdep warnings. Create helpers for this lockdep dance so we can use the helpers in all the callers. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21fs: add kerneldoc to file_{start,end}_write() helpersAmir Goldstein1-1/+14
and use sb_end_write() instead of open coded version. Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21io_uring: rename kiocb_end_write() local helperAmir Goldstein1-5/+5
This helper does not take a kiocb as input and we want to create a common helper by that name that takes a kiocb as input. Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21splice: Convert page_cache_pipe_buf_confirm() to use a folioMatthew Wilcox (Oracle)1-11/+9
Convert buf->page to a folio once instead of five times. There's only one uptodate bit per folio, not per page, so we lose nothing here. Signed-off-by: "Matthew Wilcox (Oracle)" <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21libfs: Convert simple_write_begin and simple_write_end to use a folioMatthew Wilcox (Oracle)1-20/+20
Remove a number of implicit calls to compound_head() and various calls to compatibility functions. This is not sufficient to enable support for large folios; generic_perform_write() must be converted first. Signed-off-by: "Matthew Wilcox (Oracle)" <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-08-21tracing: Introduce pipe_cpumask to avoid race on trace_pipesZheng Yejian2-7/+50
There is race issue when concurrently splice_read main trace_pipe and per_cpu trace_pipes which will result in data read out being different from what actually writen. As suggested by Steven: > I believe we should add a ref count to trace_pipe and the per_cpu > trace_pipes, where if they are opened, nothing else can read it. > > Opening trace_pipe locks all per_cpu ref counts, if any of them are > open, then the trace_pipe open will fail (and releases any ref counts > it had taken). > > Opening a per_cpu trace_pipe will up the ref count for just that > CPU buffer. This will allow multiple tasks to read different per_cpu > trace_pipe files, but will prevent the main trace_pipe file from > being opened. But because we only need to know whether per_cpu trace_pipe is open or not, using a cpumask instead of using ref count may be easier. After this patch, users will find that: - Main trace_pipe can be opened by only one user, and if it is opened, all per_cpu trace_pipes cannot be opened; - Per_cpu trace_pipes can be opened by multiple users, but each per_cpu trace_pipe can only be opened by one user. And if one of them is opened, main trace_pipe cannot be opened. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Suggested-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Zheng Yejian <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-08-21drm/panfrost: Skip speed binning on EOPNOTSUPPDavid Michael1-1/+1
Encountered on an ARM Mali-T760 MP4, attempting to read the nvmem variable can also return EOPNOTSUPP instead of ENOENT when speed binning is unsupported. Cc: <[email protected]> Fixes: 7d690f936e9b ("drm/panfrost: Add basic support for speed binning") Signed-off-by: David Michael <[email protected]> Reviewed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-08-21of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lockRob Herring1-22/+9
While originally it was fine to format strings using "%pOF" while holding devtree_lock, this now causes a deadlock. Lockdep reports: of_get_parent from of_fwnode_get_parent+0x18/0x24 ^^^^^^^^^^^^^ of_fwnode_get_parent from fwnode_count_parents+0xc/0x28 fwnode_count_parents from fwnode_full_name_string+0x18/0xac fwnode_full_name_string from device_node_string+0x1a0/0x404 device_node_string from pointer+0x3c0/0x534 pointer from vsnprintf+0x248/0x36c vsnprintf from vprintk_store+0x130/0x3b4 Fix this by moving the printing in __of_changeset_entry_apply() outside the lock. As the only difference in the multiple prints is the action name, use the existing "action_names" to refactor the prints into a single print. Fixes: a92eb7621b9fb2c2 ("lib/vsprintf: Make use of fwnode API to obtain node names and separators") Cc: [email protected] Reported-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-08-21of: unittest: Fix EXPECT for parse_phandle_with_args_map() testRob Herring1-2/+2
Commit 12e17243d8a1 ("of: base: improve error msg in of_phandle_iterator_next()") added printing of the phandle value on error, but failed to update the unittest. Fixes: 12e17243d8a1 ("of: base: improve error msg in of_phandle_iterator_next()") Cc: [email protected] Reviewed-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-08-21kunit: fix struct kunit_attr headerRae Moar1-0/+2
Add parameter descriptions to struct kunit_attr header for the parameters attr_default and print. Fixes: 39e92cb1e4a1 ("kunit: Add test attributes API structure") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Rae Moar <[email protected]> Reviewed-by: David Gow <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-08-21xen: Fix one kernel-doc commentYang Li1-1/+1
Use colon to separate parameter name from their specific meaning. silence the warning: drivers/xen/grant-table.c:1051: warning: Function parameter or member 'nr_pages' not described in 'gnttab_free_pages' Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6030 Signed-off-by: Yang Li <[email protected]> Acked-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2023-08-21btrfs: tests: test invalid splitting when skipping pinned drop extent_mapJosef Bacik1-0/+138
This reproduces the bug fixed by "btrfs: fix incorrect splitting in btrfs_drop_extent_map_range", we were improperly calculating the range for the split extent. Add a test that exercises this scenario and validates that we get the correct resulting extent_maps in our tree. Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21btrfs: tests: add a test for btrfs_add_extent_mappingJosef Bacik1-0/+56
This helper is different from the normal add_extent_mapping in that it will stuff an em into a gap that exists between overlapping em's in the tree. It appeared there was a bug so I wrote a self test to validate it did the correct thing when it worked with two side by side ems. Thankfully it is correct, but more testing is better. Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21btrfs: tests: add extent_map tests for dropping with odd layoutsJosef Bacik1-0/+218
While investigating weird problems with the extent_map I wrote a self test testing the various edge cases of btrfs_drop_extent_map_range. This can split in different ways and behaves different in each case, so test the various edge cases to make sure everything is functioning properly. Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21btrfs: scrub: move write back of repaired sectors to ↵Qu Wenruo1-47/+25
scrub_stripe_read_repair_worker() Currently the scrub_stripe_read_repair_worker() only does reads to rebuild the corrupted sectors, it doesn't do any writeback. The design is mostly to put writeback into a more ordered manner, to co-operate with dev-replace with zoned mode, which requires every write to be submitted in their bytenr order. However the writeback for repaired sectors into the original mirror doesn't need such strong sync requirement, as it can only happen for non-zoned devices. This patch would move the writeback for repaired sectors into scrub_stripe_read_repair_worker(), which removes two calls sites for repaired sectors writeback. (one from flush_scrub_stripes(), one from scrub_raid56_parity_stripe()) Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21btrfs: scrub: don't go ordered workqueue for dev-replaceQu Wenruo1-7/+3
The workqueue fs_info->scrub_worker would go ordered workqueue if it's a device replace operation. However the scrub is relying on multiple workers to do data csum verification, and we always submit several read requests in a row. Thus there is no need to use ordered workqueue just for dev-replace. We have extra synchronization (the main thread will always submit-and-wait for dev-replace writes) to handle it for zoned devices. Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21btrfs: scrub: fix grouping of read IOQu Wenruo1-25/+71
[REGRESSION] There are several regression reports about the scrub performance with v6.4 kernel. On a PCIe 3.0 device, the old v6.3 kernel can go 3GB/s scrub speed, but v6.4 can only go 1GB/s, an obvious 66% performance drop. [CAUSE] Iostat shows a very different behavior between v6.3 and v6.4 kernel: Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 9731.00 3425544.00 17237.00 63.92 2.18 352.02 21.18 100.00 nvme0n1p3 15578.00 993616.00 5.00 0.03 0.09 63.78 1.32 100.00 The upper one is v6.3 while the lower one is v6.4. There are several obvious differences: - Very few read merges This turns out to be a behavior change that we no longer do bio plug/unplug. - Very low aqu-sz This is due to the submit-and-wait behavior of flush_scrub_stripes(), and extra extent/csum tree search. Both behaviors are not that obvious on SATA SSDs, as SATA SSDs have NCQ to merge the reads, while SATA SSDs can not handle high queue depth well either. [FIX] For now this patch focuses on the read speed fix. Dev-replace replace speed needs more work. For the read part, we go two directions to fix the problems: - Re-introduce blk plug/unplug to merge read requests This is pretty simple, and the behavior is pretty easy to observe. This would enlarge the average read request size to 512K. - Introduce multi-group reads and no longer wait for each group Instead of the old behavior, which submits 8 stripes and waits for them, here we would enlarge the total number of stripes to 16 * 8. Which is 8M per device, the same limit as the old scrub in-flight bios size limit. Now every time we fill a group (8 stripes), we submit them and continue to next stripes. Only when the full 16 * 8 stripes are all filled, we submit the remaining ones (the last group), and wait for all groups to finish. Then submit the repair writes and dev-replace writes. This should enlarge the queue depth. This would greatly improve the merge rate (thus read block size) and queue depth: Before (with regression, and cached extent/csum path): Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 20666.00 1318240.00 10.00 0.05 0.08 63.79 1.63 100.00 After (with all patches applied): nvme0n1p3 5165.00 2278304.00 30557.00 85.54 0.55 441.10 2.81 100.00 i.e. 1287 to 2224 MB/s. CC: [email protected] # 6.4+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-08-21btrfs: scrub: avoid unnecessary csum tree search preparing stripesQu Wenruo4-27/+46
One of the bottleneck of the new scrub code is the extra csum tree search. The old code would only do the csum tree search for each scrub bio, which can be as large as 512KiB, thus they can afford to allocate a new path each time. But the new scrub code is doing csum tree search for each stripe, which is only 64KiB, this means we'd better re-use the same csum path during each search. This patch would introduce a per-sctx path for csum tree search, as we don't need to re-allocate the path every time we need to do a csum tree search. With this change we can further improve the queue depth and improve the scrub read performance: Before (with regression and cached extent tree path): Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 15875.00 1013328.00 12.00 0.08 0.08 63.83 1.35 100.00 After (with both cached extent/csum tree path): nvme0n1p3 17759.00 1133280.00 10.00 0.06 0.08 63.81 1.50 100.00 Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") CC: [email protected] # 6.4+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>