aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-07-10xen/blkback: add missing MODULE_DESCRIPTION() macroJeff Johnson1-0/+1
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/block/xen-blkback/xen-blkback.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-10io_uring/napi: Remove unnecessary s64 castThorsten Blum1-1/+1
Since the do_div() macro casts the divisor to u32 anyway, remove the unnecessary s64 cast and fix the following Coccinelle/coccicheck warning reported by do_div.cocci: WARNING: do_div() does a 64-by-32 division, please consider using div64_s64 instead Signed-off-by: Thorsten Blum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-10swiotlb: reduce swiotlb pool lookupsMichael Kelley7-106/+129
With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple places, the pool is found and used in one function, and then must be found again in the next function that is called because only the tlb_addr is passed as an argument. These are the six call sites: dma_direct_map_page: 1. swiotlb_map -> swiotlb_tbl_map_single -> swiotlb_bounce dma_direct_unmap_page: 2. dma_direct_sync_single_for_cpu -> is_swiotlb_buffer 3. dma_direct_sync_single_for_cpu -> swiotlb_sync_single_for_cpu -> swiotlb_bounce 4. is_swiotlb_buffer 5. swiotlb_tbl_unmap_single -> swiotlb_del_transient 6. swiotlb_tbl_unmap_single -> swiotlb_release_slots Reduce the number of calls by finding the pool at a higher level, and passing it as an argument instead of searching again. A key change is for is_swiotlb_buffer() to return a pool pointer instead of a boolean, and then pass this pool pointer to subsequent swiotlb functions. There are 9 occurrences of is_swiotlb_buffer() used to test if a buffer is a swiotlb buffer before calling a swiotlb function. To reduce code duplication in getting the pool pointer and passing it as an argument, introduce inline wrappers for this pattern. The generated code is essentially unchanged. Since is_swiotlb_buffer() no longer returns a boolean, rename some functions to reflect the change: * swiotlb_find_pool() becomes __swiotlb_find_pool() * is_swiotlb_buffer() becomes swiotlb_find_pool() * is_xen_swiotlb_buffer() becomes xen_swiotlb_find_pool() With these changes, a round-trip map/unmap pair requires only 2 pool lookups (listed using the new names and wrappers): dma_direct_unmap_page: 1. dma_direct_sync_single_for_cpu -> swiotlb_find_pool 2. swiotlb_tbl_unmap_single -> swiotlb_find_pool These changes come from noticing the inefficiencies in a code review, not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC, __swiotlb_find_pool() is not trivial, and it uses an RCU read lock, so avoiding the redundant calls helps performance in a hot path. When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction is minimal and the perf benefits are likely negligible, but no harm is done. No functional change is intended. Signed-off-by: Michael Kelley <[email protected]> Reviewed-by: Petr Tesarik <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2024-07-10PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq()Dan Carpenter1-1/+1
Smatch complains that "ret" could be uninitialized if "pcie->icc_mem" is NULL and "pm_suspend_target_state == PM_SUSPEND_MEM". Silence this warning by initializing ret to zero. Fixes: 78b5f6f8855e ("PCI: qcom: Add OPP support to scale performance") Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Tested-by: Anders Roxell <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]>
2024-07-10PCI: qcom: Prevent potential error pointer dereferenceDan Carpenter1-1/+1
Only call dev_pm_opp_put() if dev_pm_opp_find_freq_exact() succeeds; otherwise it leads to an error pointer dereference. Fixes: 78b5f6f8855e ("PCI: qcom: Add OPP support to scale performance") Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Tested-by: Anders Roxell <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]>
2024-07-10PCI: qcom: Fix missing error code in qcom_pcie_probe()Dan Carpenter1-1/+2
Return a negative error code if dev_pm_opp_find_freq_floor() fails; don't return success. Fixes: 78b5f6f8855e ("PCI: qcom: Add OPP support to scale performance") Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Tested-by: Anders Roxell <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]>
2024-07-10minixfs: Fix minixfs_rename with HIGHMEMMatthew Wilcox (Oracle)1-2/+1
minixfs now uses kmap_local_page(), so we can't call kunmap() to undo it. This one call was missed as part of the commit this fixes. Fixes: 6628f69ee66a (minixfs: Use dir_put_page() in minix_unlink() and minix_rename()) Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-07-10PCI: Give pcim_set_mwi() its own devres cleanup callbackPhilipp Stanner2-12/+18
Managing pci_set_mwi() with devres can easily be done with its own callback, without the necessity to store any state about it in a device-related struct. Remove the MWI state from struct pci_devres. Give pcim_set_mwi() a separate devres cleanup callback. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-10PCI: Move struct pci_devres.pinned bit to struct pci_devPhilipp Stanner3-12/+6
The bit describing whether the PCI device is currently pinned is stored in struct pci_devres. To clean up and simplify the PCI devres API, it's better if this information is stored in struct pci_dev. This will later permit simplifying pcim_enable_device(). Move the 'pinned' boolean bit to struct pci_dev. Restructure bits in struct pci_dev so the pm / pme fields are next to each other. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-10PCI: Remove struct pci_devres.enabled status bitPhilipp Stanner3-14/+4
The struct pci_devres has a separate boolean to track whether a device is enabled. That, however, can easily be tracked in an agnostic manner through the function pci_is_enabled(). Using it allows for simplifying the PCI devres implementation. Replace the separate 'enabled' status bit from struct pci_devres with calls to pci_is_enabled() at the appropriate places. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-10PCI: Document hybrid devres hazardsPhilipp Stanner2-1/+62
These functions: pci_request_region() pci_request_regions() pci_request_regions_exclusive() pci_request_selected_regions() pci_request_selected_regions_exclusive() pci_intx() are "hybrid" functions that are managed if pcim_enable_device() has been called, but unmanaged otherwise. This is confusing and has already caused a bug (in 8558de401b5f ("drm/vboxvideo: use managed pci functions")) because users believe all PCI functions, such as pci_iomap_range(), can become managed that way, which is not the case. Add comments to the relevant functions' docstrings that warn users about this behavior. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-10PCI: Add managed pcim_request_region()Philipp Stanner3-65/+47
These existing functions: pci_request_region() pci_request_selected_regions() pci_request_selected_regions_exclusive() are "hybrid" functions built on __pci_request_region() and are managed if pcim_enable_device() has been called, but unmanaged otherwise. Add these new functions: pcim_request_region() pcim_request_region_exclusive() These are *always* managed and use the new pcim_addr_devres tracking infrastructure instead of find_pci_dr() and struct pci_devres.region_mask. Implement the hybrid functions using the new "pure" functions and remove struct pci_devres.region_mask, which is no longer needed. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-10PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()Philipp Stanner1-1/+11
Deprecate pcim_iomap_table(). It returns a pointer to a table of ioremapped BARs, or NULL if it fails. This makes uses like this: addr = pcim_iomap_table(pdev)[0]; problematic because it causes a NULL pointer dereference on failure. Callers should use pcim_iomap() instead. Deprecate pcim_iomap_regions_request_all() because it is built on __pci_request_region() and is managed if pcim_enable_device() has been called, but unmanaged otherwise, which is prone to errors. Callers should either use pcim_iomap_regions() to request and map BARs, or use pcim_request_region() followed by pcim_iomap(). Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> [bhelgaas: commit log, sphinx markup] Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-10PCI: Add managed partial-BAR request and map infrastructurePhilipp Stanner3-65/+569
The pcim_iomap_devres table tracks entire-BAR mappings, so we can't use it to build a managed version of pci_iomap_range(), which maps partial BARs. Add struct pcim_addr_devres, which can track request and mapping of both entire BARs and partial BARs. Add the following internal devres functions based on struct pcim_addr_devres: pcim_iomap_region() # request & map entire BAR pcim_iounmap_region() # unmap & release entire BAR pcim_request_region() # request entire BAR pcim_release_region() # release entire BAR pcim_request_all_regions() # request all entire BARs pcim_release_all_regions() # release all entire BARs Rework the following public interfaces using the new infrastructure listed above: pcim_iomap() # map partial BAR pcim_iounmap() # unmap partial BAR pcim_iomap_regions() # request & map specified BARs pcim_iomap_regions_request_all() # request all BARs, map specified BARs pcim_iounmap_regions() # unmap & release specified BARs Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-10PCI: Add devres helpers for iomap tablePhilipp Stanner1-19/+58
The pcim_iomap_devres.table administrated by pcim_iomap_table() has its entries set and unset at several places throughout devres.c using manual iterations which are effectively code duplications. Add pcim_add_mapping_to_legacy_table() and pcim_remove_mapping_from_legacy_table() helper functions and use them where possible. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-10PCI: Add and use devres helper for bit masksPhilipp Stanner1-4/+8
The current devres implementation uses manual shift operations to check whether a bit in a mask is set. The code can be made more readable by writing a small helper function for that. Implement mask_contains_bar() and use it where applicable. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-10net: phy: aquantia: add support for aqr115cBartosz Golaszewski1-0/+26
Add support for a new model to the Aquantia driver. This PHY supports 2.5 gigabit speeds. The PHY mode is referred to by the manufacturer as Overclocked SGMII (OCSGMII) but this actually is just 2500BASEX without in-band signalling so reuse the existing mode to avoid changing the uAPI. Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-10net: phy: aquantia: wait for the GLOBAL_CFG to start returning real valuesBartosz Golaszewski1-1/+7
When the PHY is first coming up (or resuming from suspend), it's possible that although the FW status shows as running, we still see zeroes in the GLOBAL_CFG set of registers and cannot determine available modes. Since all models support 10M, add a poll and wait the config to become available. Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-10net: phy: aquantia: wait for FW reset before checking the vendor IDBartosz Golaszewski1-0/+4
Checking the firmware register before it complete the boot process makes no sense, it will report 0 even if FW is available from internal memory. Always wait for FW to boot before continuing or we'll unnecessarily try to load it from nvmem/filesystem and fail. Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-10net: phy: aquantia: rename and export aqr107_wait_reset_complete()Bartosz Golaszewski2-3/+5
This function is quite generic in this driver and not limited to aqr107. We will use it outside its current compilation unit soon so rename it and declare it in the header. Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-09bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCUMatt Bobrowski4-15/+12
Currently, BPF kfuncs which accept trusted pointer arguments i.e. those flagged as KF_TRUSTED_ARGS, KF_RCU, or KF_RELEASE, all require an original/unmodified trusted pointer argument to be supplied to them. By original/unmodified, it means that the backing register holding the trusted pointer argument that is to be supplied to the BPF kfunc must have its fixed offset set to zero, or else the BPF verifier will outright reject the BPF program load. However, this zero fixed offset constraint that is currently enforced by the BPF verifier onto BPF kfuncs specifically flagged to accept KF_TRUSTED_ARGS or KF_RCU trusted pointer arguments is rather unnecessary, and can limit their usability in practice. Specifically, it completely eliminates the possibility of constructing a derived trusted pointer from an original trusted pointer. To put it simply, a derived pointer is a pointer which points to one of the nested member fields of the object being pointed to by the original trusted pointer. This patch relaxes the zero fixed offset constraint that is enforced upon BPF kfuncs which specifically accept KF_TRUSTED_ARGS, or KF_RCU arguments. Although, the zero fixed offset constraint technically also applies to BPF kfuncs accepting KF_RELEASE arguments, relaxing this constraint for such BPF kfuncs has subtle and unwanted side-effects. This was discovered by experimenting a little further with an initial version of this patch series [0]. The primary issue with relaxing the zero fixed offset constraint on BPF kfuncs accepting KF_RELEASE arguments is that it'd would open up the opportunity for BPF programs to supply both trusted pointers and derived trusted pointers to them. For KF_RELEASE BPF kfuncs specifically, this could be problematic as resources associated with the backing pointer could be released by the backing BPF kfunc and cause instabilities for the rest of the kernel. With this new fixed offset semantic in-place for BPF kfuncs accepting KF_TRUSTED_ARGS and KF_RCU arguments, we now have more flexibility when it comes to the BPF kfuncs that we're able to introduce moving forward. Early discussions covering the possibility of relaxing the zero fixed offset constraint can be found using the link below. This will provide more context on where all this has stemmed from [1]. Notably, pre-existing tests have been updated such that they provide coverage for the updated zero fixed offset functionality. Specifically, the nested offset test was converted from a negative to positive test as it was already designed to assert zero fixed offset semantics of a KF_TRUSTED_ARGS BPF kfunc. [0] https://lore.kernel.org/bpf/[email protected]/ [1] https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Matt Bobrowski <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-07-09Merge branch 'fix-libbpf-bpf-skeleton-forward-backward-compat'Alexei Starovoitov2-45/+72
Andrii Nakryiko says: ==================== Fix libbpf BPF skeleton forward/backward compat Fix recently identified (but long standing) bug with handling BPF skeleton forward and backward compatibility. On libbpf side, even though BPF skeleton was always designed to be forward and backwards compatible through recording actual size of constrituents of BPF skeleton itself (map/prog/var skeleton definitions), libbpf implementation did implicitly hard-code those sizes by virtue of using a trivial array access syntax. This issue will only affect libbpf used as a shared library. Statically compiled libbpfs will always be in sync with BPF skeleton, bypassing this problem altogether. This patch set fixes libbpf, but also mitigates the problem for old libbpf versions by teaching bpftool to generate more conservative BPF skeleton, if possible (i.e., if there are no struct_ops maps defined). v1->v2: - fix SOB, add acks, typo fixes (Quentin, Eduard); - improve reporting of skipped map auto-attachment (Alan, Eduard). ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-07-09libbpf: improve old BPF skeleton handling for map auto-attachAndrii Nakryiko1-12/+14
Improve how we handle old BPF skeletons when it comes to BPF map auto-attachment. Emit one warn-level message per each struct_ops map that could have been auto-attached, if user provided recent enough BPF skeleton version. Don't spam log if there are no relevant struct_ops maps, though. This should help users realize that they probably need to regenerate BPF skeleton header with more recent bpftool/libbpf-cargo (or whatever other means of BPF skeleton generation). Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-07-09libbpf: fix BPF skeleton forward/backward compat handlingAndrii Nakryiko1-20/+27
BPF skeleton was designed from day one to be extensible. Generated BPF skeleton code specifies actual sizes of map/prog/variable skeletons for that reason and libbpf is supposed to work with newer/older versions correctly. Unfortunately, it was missed that we implicitly embed hard-coded most up-to-date (according to libbpf's version of libbpf.h header used to compile BPF skeleton header) sizes of those structs, which can differ from the actual sizes at runtime when libbpf is used as a shared library. We have a few places were we just index array of maps/progs/vars, which implicitly uses these potentially invalid sizes of structs. This patch aims to fix this problem going forward. Once this lands, we'll backport these changes in Github repo to create patched releases for older libbpfs. Acked-by: Eduard Zingerman <[email protected]> Reviewed-by: Alan Maguire <[email protected]> Fixes: d66562fba1ce ("libbpf: Add BPF object skeleton support") Fixes: 430025e5dca5 ("libbpf: Add subskeleton scaffolding") Fixes: 08ac454e258e ("libbpf: Auto-attach struct_ops BPF maps in BPF skeleton") Co-developed-by: Mykyta Yatsenko <[email protected]> Signed-off-by: Mykyta Yatsenko <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-07-09bpftool: improve skeleton backwards compat with old buggy libbpfsAndrii Nakryiko1-14/+32
Old versions of libbpf don't handle varying sizes of bpf_map_skeleton struct correctly. As such, BPF skeleton generated by newest bpftool might not be compatible with older libbpf (though only when libbpf is used as a shared library), even though it, by design, should. Going forward libbpf will be fixed, plus we'll release bug fixed versions of relevant old libbpfs, but meanwhile try to mitigate from bpftool side by conservatively assuming older and smaller definition of bpf_map_skeleton, if possible. Meaning, if there are no struct_ops maps. If there are struct_ops, then presumably user would like to have auto-attaching logic and struct_ops map link placeholders, so use the full bpf_map_skeleton definition in that case. Acked-by: Quentin Monnet <[email protected]> Co-developed-by: Mykyta Yatsenko <[email protected]> Signed-off-by: Mykyta Yatsenko <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-07-09net: ethernet: lantiq_etop: fix double free in detachAleksander Jan Bajkowski1-2/+2
The number of the currently released descriptor is never incremented which results in the same skb being released multiple times. Fixes: 504d4721ee8e ("MIPS: Lantiq: Add ethernet driver") Reported-by: Joe Perches <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Aleksander Jan Bajkowski <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09netxen_nic: Use {low,upp}er_32_bits() helpersGeert Uytterhoeven1-5/+2
Use the existing {low,upp}er_32_bits() helpers instead of defining custom variants. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/319d4a5313ac75f7bbbb6b230b6802b18075c3e0.1720430602.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09Merge branch 'mlx5-misc-patches-2023-07-08'Jakub Kicinski4-6/+8
Tariq Toukan says: ==================== mlx5 misc patches 2023-07-08 This patchset contains features and small enhancements from the team to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09net/mlx5e: CT: Initialize err to 0 to avoid warningCosmin Ratiu1-1/+1
It is theoretically possible to return bogus uninitialized values from mlx5_tc_ct_entry_replace_rules, even though in practice this will never be the case as the flow rule will be part of at least the regular ct table or the ct nat table, if not both. But to reduce noise, initialize err to 0. Fixes: 49d37d05f216 ("net/mlx5: CT: Separate CT and CT-NAT tuple entries") Signed-off-by: Cosmin Ratiu <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09net/mlx5e: SHAMPO, Add missing aggregate counterDragos Tatulea1-0/+2
When the rx_hds_nodata_packets/bytes counters were added, the aggregate counters were omitted. This patch adds them. Fixes: e95c5b9e8912 ("net/mlx5e: SHAMPO, Add header-only ethtool counters for header data split") Signed-off-by: Dragos Tatulea <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09net/mlx5: DR, Remove definer functions from SW Steering APIYevgeny Kliteynik2-5/+5
No need to expose definer get/put functions as part of SW Steering API - they are internal functions. Signed-off-by: Yevgeny Kliteynik <[email protected]> Reviewed-by: Alex Vesker <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09Merge branch 'mlxsw-improvements'Jakub Kicinski3-26/+35
Petr Machata says: ==================== mlxsw: Improvements This patchset contains assortments of improvements to the mlxsw driver. Please see individual patches for details. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09mlxsw: pci: Lock configuration space of upstream bridge during resetIdo Schimmel1-0/+6
The driver triggers a "Secondary Bus Reset" (SBR) by calling __pci_reset_function_locked() which asserts the SBR bit in the "Bridge Control Register" in the configuration space of the upstream bridge for 2ms. This is done without locking the configuration space of the upstream bridge port, allowing user space to access it concurrently. Linux 6.11 will start warning about such unlocked resets [1][2]: pcieport 0000:00:01.0: unlocked secondary bus reset via: pci_reset_bus_function+0x51c/0x6a0 Avoid the warning and the concurrent access by locking the configuration space of the upstream bridge prior to the reset and unlocking it afterwards. [1] https://lore.kernel.org/all/171711746953.1628941.4692125082286867825.stgit@dwillia2-xfh.jf.intel.com/ [2] https://lore.kernel.org/all/20240531213150.GA610983@bhelgaas/ Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Link: https://patch.msgid.link/9937b0afdb50f2f2825945393c94c093c04a5897.1720447210.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09mlxsw: core_thermal: Report valid current state during cooling device ↵Ido Schimmel1-26/+25
registration Commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") changed the thermal core to read the current state of the cooling device as part of the cooling device's registration. This is incompatible with the current implementation of the cooling device operations in mlxsw, leading to initialization failure with errors such as: mlxsw_spectrum 0000:01:00.0: Failed to register cooling device mlxsw_spectrum 0000:01:00.0: cannot register bus device The reason for the failure is that when the get current state operation is invoked the driver tries to derive the index of the cooling device by walking a per thermal zone array and looking for the matching cooling device pointer. However, the pointer is returned from the registration function and therefore only set in the array after the registration. The issue was later fixed by commit 1af89dedc8a5 ("thermal: core: Do not fail cdev registration because of invalid initial state") by not failing the registration of the cooling device if it cannot report a valid current state during registration, although drivers are responsible for ensuring that this will not happen. Therefore, make sure the driver is able to report a valid current state for the cooling device during registration by passing to the registration function a per cooling device private data that already has the cooling device index populated. While at it, call thermal_cooling_device_unregister() unconditionally since the function returns immediately if the cooling device pointer is NULL. Reviewed-by: Vadim Pasternak <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Link: https://patch.msgid.link/c823c4678b6b7afb902c35b3551c81a053afd110.1720447210.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09mlxsw: Warn about invalid accesses to array fieldsPetr Machata1-0/+4
A forgotten or buggy variable initialization can cause out-of-bounds access to a register or other item array field. For an overflow, such access would mangle adjacent parts of the register payload. For an underflow, due to all variables being unsigned, the access would likely trample unrelated memory. Since neither is correct, replace these accesses with accesses at the index of 0, and warn about the issue. Suggested-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Link: https://patch.msgid.link/b988fb265c2f6c1206fe12d5bfdcfa188b7672d1.1720447210.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09i40e: Fix XDP program unloading while removing the driverMichal Kubiak1-5/+4
The commit 6533e558c650 ("i40e: Fix reset path while removing the driver") introduced a new PF state "__I40E_IN_REMOVE" to block modifying the XDP program while the driver is being removed. Unfortunately, such a change is useful only if the ".ndo_bpf()" callback was called out of the rmmod context because unloading the existing XDP program is also a part of driver removing procedure. In other words, from the rmmod context the driver is expected to unload the XDP program without reporting any errors. Otherwise, the kernel warning with callstack is printed out to dmesg. Example failing scenario: 1. Load the i40e driver. 2. Load the XDP program. 3. Unload the i40e driver (using "rmmod" command). The example kernel warning log: [ +0.004646] WARNING: CPU: 94 PID: 10395 at net/core/dev.c:9290 unregister_netdevice_many_notify+0x7a9/0x870 [...] [ +0.010959] RIP: 0010:unregister_netdevice_many_notify+0x7a9/0x870 [...] [ +0.002726] Call Trace: [ +0.002457] <TASK> [ +0.002119] ? __warn+0x80/0x120 [ +0.003245] ? unregister_netdevice_many_notify+0x7a9/0x870 [ +0.005586] ? report_bug+0x164/0x190 [ +0.003678] ? handle_bug+0x3c/0x80 [ +0.003503] ? exc_invalid_op+0x17/0x70 [ +0.003846] ? asm_exc_invalid_op+0x1a/0x20 [ +0.004200] ? unregister_netdevice_many_notify+0x7a9/0x870 [ +0.005579] ? unregister_netdevice_many_notify+0x3cc/0x870 [ +0.005586] unregister_netdevice_queue+0xf7/0x140 [ +0.004806] unregister_netdev+0x1c/0x30 [ +0.003933] i40e_vsi_release+0x87/0x2f0 [i40e] [ +0.004604] i40e_remove+0x1a1/0x420 [i40e] [ +0.004220] pci_device_remove+0x3f/0xb0 [ +0.003943] device_release_driver_internal+0x19f/0x200 [ +0.005243] driver_detach+0x48/0x90 [ +0.003586] bus_remove_driver+0x6d/0xf0 [ +0.003939] pci_unregister_driver+0x2e/0xb0 [ +0.004278] i40e_exit_module+0x10/0x5f0 [i40e] [ +0.004570] __do_sys_delete_module.isra.0+0x197/0x310 [ +0.005153] do_syscall_64+0x85/0x170 [ +0.003684] ? syscall_exit_to_user_mode+0x69/0x220 [ +0.004886] ? do_syscall_64+0x95/0x170 [ +0.003851] ? exc_page_fault+0x7e/0x180 [ +0.003932] entry_SYSCALL_64_after_hwframe+0x71/0x79 [ +0.005064] RIP: 0033:0x7f59dc9347cb [ +0.003648] Code: 73 01 c3 48 8b 0d 65 16 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 16 0c 00 f7 d8 64 89 01 48 [ +0.018753] RSP: 002b:00007ffffac99048 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ +0.007577] RAX: ffffffffffffffda RBX: 0000559b9bb2f6e0 RCX: 00007f59dc9347cb [ +0.007140] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559b9bb2f748 [ +0.007146] RBP: 00007ffffac99070 R08: 1999999999999999 R09: 0000000000000000 [ +0.007133] R10: 00007f59dc9a5ac0 R11: 0000000000000206 R12: 0000000000000000 [ +0.007141] R13: 00007ffffac992d8 R14: 0000559b9bb2f6e0 R15: 0000000000000000 [ +0.007151] </TASK> [ +0.002204] ---[ end trace 0000000000000000 ]--- Fix this by checking if the XDP program is being loaded or unloaded. Then, block only loading a new program while "__I40E_IN_REMOVE" is set. Also, move testing "__I40E_IN_REMOVE" flag to the beginning of XDP_SETUP callback to avoid unnecessary operations and checks. Fixes: 6533e558c650 ("i40e: Fix reset path while removing the driver") Signed-off-by: Michal Kubiak <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-10tracing/kprobes: Fix build error when find_module() is not availableMasami Hiramatsu (Google)1-6/+19
The kernel test robot reported that the find_module() is not available if CONFIG_MODULES=n. Fix this error by hiding find_modules() in #ifdef CONFIG_MODULES with related rcu locks as try_module_get_by_name(). Link: https://lore.kernel.org/all/172056819167.201571.250053007194508038.stgit@devnote2/ Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2024-07-09Merge branch 'selftests-drv-net-rss_ctx-more-tests'Jakub Kicinski1-25/+189
Jakub Kicinski says: ==================== selftests: drv-net: rss_ctx: more tests Add a few more tests for RSS. v1: https://lore.kernel.org/all/[email protected]/ ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09selftests: drv-net: rss_ctx: test flow rehashing without impacting trafficJakub Kicinski1-1/+31
Some workloads may want to rehash the flows in response to an imbalance. Most effective way to do that is changing the RSS key. Check that changing the key does not cause link flaps or traffic disruption. Disrupting traffic for key update is not incorrect, but makes the key update unusable for rehashing under load. Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09selftests: drv-net: rss_ctx: check behavior of indirection table resizingJakub Kicinski1-1/+36
Some devices dynamically increase and decrease the size of the RSS indirection table based on the number of enabled queues. When that happens driver must maintain the balance of entries (preferably duplicating the smaller table). Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09selftests: drv-net: rss_ctx: test queue changes vs user RSS configJakub Kicinski1-1/+80
By default main RSS table should change to include all queues. When user sets a specific RSS config the driver should preserve it, even when queue count changes. Driver should refuse to deactivate queues used in the user-set RSS config. For additional contexts driver should still refuse to deactivate queues in use. Whether the contexts should get resized like context 0 when queue count increases is a bit unclear. I anticipate most drivers today don't do that. Since main use case for additional contexts is to set the indir table - it doesn't seem worthwhile to care about behavior of the default table too much. Don't test that. Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09selftests: drv-net: rss_ctx: factor out send traffic and checkJakub Kicinski1-19/+39
Wrap up sending traffic and checking in which queues it landed in a helper. The method used for testing is to send a lot of iperf traffic and check which queues received the most packets. Those should be the queues where we expect iperf to land - either because we installed a filter for the port iperf uses, or we didn't and expect it to use context 0. Contexts get disjoint queue sets, but the main context (AKA context 0) may receive some background traffic (noise). Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09selftests: drv-net: rss_ctx: fix cleanup in the basic testJakub Kicinski1-4/+4
The basic test may fail without resetting the RSS indir table. Use the .exec() method to run cleanup early since we re-test with traffic that returning to default state works. While at it reformat the doc a tiny bit. Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-09PCI: dw-rockchip: Use pci_epc_init_notify() directlyNiklas Cassel1-1/+1
A previous commit ("PCI: dwc: ep: Remove dw_pcie_ep_init_notify() wrapper") removed the dw_pcie_ep_init_notify() wrapper and changed the DWC glue drivers to instead use pci_epc_init_notify() directly. The endpoint support for the dw-rockchip had not been merged at that point in time, so the previous commit wrapper") did not update dw-rockchip. Do the same change for dw-rockchip, so that the driver will not try to use a function that has now been removed. Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-07-09PCI: dw-rockchip: Add endpoint mode supportNiklas Cassel3-5/+228
The PCIe controller in rk3568 and rk3588 can operate in endpoint mode. This endpoint mode support heavily leverages the existing code in pcie-designware-ep.c. Add support for endpoint mode to the existing pcie-dw-rockchip glue driver. [kwilczynski: squash with patch adding the PCI_ENDPOINT dependency] Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]>
2024-07-09PCI: dw-rockchip: Refactor the driver to prepare for EP modeNiklas Cassel1-24/+60
Refactor the driver to prepare for EP mode. Add of-match data to the existing compatible, and explicitly define it as DW_PCIE_RC_TYPE. This way, we will be able to add EP mode in a follow-up commit in a much less intrusive way, which makes the follow-up commit much easier to review. No functional change intended. Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]>
2024-07-09PCI: dw-rockchip: Add rockchip_pcie_get_ltssm() helperNiklas Cassel1-1/+6
Add a rockchip_pcie_ltssm() helper function that reads the LTSSM status. This helper will be used in additional places in follow-up commits. Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]>
2024-07-09PCI: dw-rockchip: Fix weird indentationNiklas Cassel1-4/+3
Fix the indentation of rockchip_pcie_{readl,writel}_apb() parameters to match the opening parenthesis. Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]>
2024-07-09PCI: dw-rockchip: Fix initial PERST# GPIO valueNiklas Cassel1-1/+1
PERST# is active low according to the PCIe specification. However, the existing pcie-dw-rockchip.c driver does: gpiod_set_value(..., 0); msleep(100); gpiod_set_value(..., 1); when asserting + deasserting PERST#. This is of course wrong, but because all the device trees for this compatible string have also incorrectly marked this GPIO as ACTIVE_HIGH: $ git grep -B 10 reset-gpios arch/arm64/boot/dts/rockchip/rk3568* $ git grep -B 10 reset-gpios arch/arm64/boot/dts/rockchip/rk3588* The actual toggling of PERST# is correct, and we cannot change it anyway, since that would break device tree compatibility. However, this driver does request the GPIO to be initialized as GPIOD_OUT_HIGH, which does cause a silly sequence where PERST# gets toggled back and forth for no good reason. Fix this by requesting the GPIO to be initialized as GPIOD_OUT_LOW (which for this driver means PERST# asserted). This will avoid an unnecessary signal change where PERST# gets deasserted (by devm_gpiod_get_optional()) and then gets asserted (by rockchip_pcie_start_link()) just a few instructions later. Before patch, debug prints on EP side, when booting RC: [ 845.606810] pci: PERST# asserted by host! [ 852.483985] pci: PERST# de-asserted by host! [ 852.503041] pci: PERST# asserted by host! [ 852.610318] pci: PERST# de-asserted by host! After patch, debug prints on EP side, when booting RC: [ 125.107921] pci: PERST# asserted by host! [ 132.111429] pci: PERST# de-asserted by host! This extra, very short, PERST# assertion + deassertion has been reported to cause issues with certain WLAN controllers, e.g. RTL8822CE. Fixes: 0e898eb8df4e ("PCI: rockchip-dwc: Add Rockchip RK356X host controller driver") Link: https://lore.kernel.org/linux-pci/[email protected] Tested-by: Heiko Stuebner <[email protected]> Tested-by: Jianfeng Liu <[email protected]> Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Heiko Stuebner <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Cc: [email protected] # v5.15+
2024-07-09PCI: dw-rockchip: Add error messages in .probe() error pathsUwe Kleine-König1-8/+13
Drivers that silently fail to probe provide a bad user experience and make it unnecessarily hard to debug such a failure. Fix it by using dev_err_probe() instead of a plain return. [kwilczynski: commit log] Link: https://lore.kernel.org/linux-pci/[email protected] Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Heiko Stuebner <[email protected]> Reviewed-by: Jesper Nilsson <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]>