aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-10-20bpf: Move the declaration of __bpf_obj_drop_impl() to bpf.hHou Tao3-4/+1
both syscall.c and helpers.c have the declaration of __bpf_obj_drop_impl(), so just move it to a common header file. Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-10-20bpf: Use pcpu_alloc_size() in bpf_mem_free{_rcu}()Hou Tao2-2/+15
For bpf_global_percpu_ma, the pointer passed to bpf_mem_free_rcu() is allocated by kmalloc() and its size is fixed (16-bytes on x86-64). So no matter which cache allocates the dynamic per-cpu area, on x86-64 cache[2] will always be used to free the per-cpu area. Fix the unbalance by checking whether the bpf memory allocator is per-cpu or not and use pcpu_alloc_size() instead of ksize() to find the correct cache for per-cpu free. Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-10-20bpf: Re-enable unit_size checking for global per-cpu allocatorHou Tao1-10/+12
With pcpu_alloc_size() in place, check whether or not the size of the dynamic per-cpu area is matched with unit_size. Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-10-20mm/percpu.c: introduce pcpu_alloc_size()Hou Tao2-0/+32
Introduce pcpu_alloc_size() to get the size of the dynamic per-cpu area. It will be used by bpf memory allocator in the following patches. BPF memory allocator maintains per-cpu area caches for multiple area sizes and its free API only has the to-be-freed per-cpu pointer, so it needs the size of dynamic per-cpu area to select the corresponding cache when bpf program frees the dynamic per-cpu pointer. Acked-by: Dennis Zhou <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-10-20Merge tag 'nfs-for-6.6-4' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds5-21/+38
Pull NFS client fixes from Anna Schumaker: "Stable Fix: - Fix a pNFS hang in nfs4_evict_inode() Fixes: - Force update of suid/sgid bits after an NFS v4.2 ALLOCATE op - Fix a potential oops in nfs_inode_remove_request() - Check the validity of the layout pointer in ff_layout_mirror_prepare_stats() - Fix incorrectly marking the pNFS MDS with USE_PNFS_DS in some cases" * tag 'nfs-for-6.6-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.1: fixup use EXCHGID4_FLAG_USE_PNFS_DS for DS server pNFS/flexfiles: Check the layout validity in ff_layout_mirror_prepare_stats pNFS: Fix a hang in nfs4_evict_inode() NFS: Fix potential oops in nfs_inode_remove_request() nfs42: client needs to strip file mode's suid/sgid bit after ALLOCATE op
2023-10-20Merge tag 'fsnotify_for_v6.6-rc7' of ↵Linus Torvalds1-8/+17
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "Disable superblock / mount marks for filesystems that can encode file handles but not open them (currently only overlayfs). It is not clear the functionality is useful in any way so let's better disable it before someone comes up with some creative misuse" * tag 'fsnotify_for_v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: limit reporting of event with non-decodeable file handles
2023-10-20Merge tag 'acpi-6.6-rc7' of ↵Linus Torvalds2-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix the ACPI initialization ordering on ARM and ACPI IRQ management in the cases when irq_create_fwspec_mapping() fails. Specifics: - Fix ACPI initialization ordering on ARM that was changed incorrectly during the 6.5 development cycle (Hanjun Guo) - Make acpi_register_gsi() return an error code as appropriate when irq_create_fwspec_mapping() returns 0 on failure (Sunil V L)" * tag 'acpi-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: bus: Move acpi_arm_init() to the place of after acpi_ghes_init() ACPI: irq: Fix incorrect return value in acpi_register_gsi()
2023-10-20Merge tag 'scsi-fixes' of ↵Linus Torvalds2-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes, both in drivers. The mptsas one is really fixing an error path issue where it can leave the misc driver loaded even though the sas driver fails to initialize" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: qla2xxx: Fix double free of dsd_list during driver load scsi: mpt3sas: Fix in error path
2023-10-20Merge tag 'pinctrl-v6.6-3' of ↵Linus Torvalds2-15/+18
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Concurrent register updates in the Qualcomm LPASS pin controller gets a proper lock. - revert a mutex fix that was causing problems: contention on the mutex or something of the sort lead to probe reordering and MMC block devices start to register in a different order, which unsuspecting userspace is not ready to handle * tag 'pinctrl-v6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: Revert "pinctrl: avoid unsafe code pattern in find_pinctrl()" pinctrl: qcom: lpass-lpi: fix concurrent register updates
2023-10-20Merge tag 'mtd/fixes-for-6.6-rc7' of ↵Linus Torvalds12-5/+73
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD fixes from Miquel Raynal: "In the raw NAND subsystem, the major fix prevents using cached reads with devices not supporting it. There was two bug reports about this. Apart from that, three drivers (pl353, arasan and marvell) could sometimes hide page program failures due to their their own program page helper not being fully compliant with the specification (many drivers use the default helpers shared by the core). Adding a missing check prevents these situation. Finally, the Qualcomm driver had a broken error path. In the SPI-NAND subsystem one Micron device used a wrong bitmak reporting possibly corrupted ECC status. Finally, the physmap-core got stripped from its map_rom fallback by mistake, this feature is added back" * tag 'mtd/fixes-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: Ensure the nand chip supports cached reads mtd: rawnand: qcom: Unmap the right resource upon probe failure mtd: rawnand: pl353: Ensure program page operations are successful mtd: rawnand: arasan: Ensure program page operations are successful mtd: spinand: micron: correct bitmask for ecc status mtd: physmap-core: Restore map_rom fallback mtd: rawnand: marvell: Ensure program page operations are successful
2023-10-20Merge tag 'mmc-v6.6-rc3' of ↵Linus Torvalds7-55/+99
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Capture correct oemid-bits for eMMC cards - Fix error propagation for some ioctl commands - Hold retuning if SDIO is in 1-bit mode MMC host: - mtk-sd: Use readl_poll_timeout_atomic to not "schedule while atomic" - sdhci-msm: Correct minimum number of clocks - sdhci-pci-gli: Fix LPM negotiation so x86/S0ix SoCs can suspend - sdhci-sprd: Fix error code in sdhci_sprd_tuning()" * tag 'mmc-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Capture correct oemid-bits for eMMC cards mmc: mtk-sd: Use readl_poll_timeout_atomic in msdc_reset_hw mmc: core: Fix error propagation for some ioctl commands mmc: sdhci-sprd: Fix error code in sdhci_sprd_tuning() mmc: sdhci-pci-gli: fix LPM negotiation so x86/S0ix SoCs can suspend mmc: core: sdio: hold retuning if sdio in 1-bit mode dt-bindings: mmc: sdhci-msm: correct minimum number of clocks
2023-10-20Merge tag 'block-6.6-2023-10-20' of git://git.kernel.dk/linuxLinus Torvalds7-18/+25
Pull block fixes from Jens Axboe: "A fix for a regression with sed-opal and saved keys, and outside of that an NVMe pull request fixing a few minor issues on that front" * tag 'block-6.6-2023-10-20' of git://git.kernel.dk/linux: nvme-pci: add BOGUS_NID for Intel 0a54 device nvmet-auth: complete a request only after freeing the dhchap pointers nvme: sanitize metadata bounce buffer for reads block: Fix regression in sed-opal for a saved key. nvme-auth: use chap->s2 to indicate bidirectional authentication nvmet-tcp: Fix a possible UAF in queue intialization setup nvme-rdma: do not try to stop unallocated queues
2023-10-20Merge tag 'io_uring-6.6-2023-10-20' of git://git.kernel.dk/linuxLinus Torvalds1-0/+6
Pull io_uring fix from Jens Axboe: "Just a single fix for a bug report that came in, fixing a case where failure to init a ring with IORING_SETUP_NO_MMAP can trigger a NULL pointer dereference" * tag 'io_uring-6.6-2023-10-20' of git://git.kernel.dk/linux: io_uring: fix crash with IORING_SETUP_NO_MMAP and invalid SQ ring address
2023-10-20mm/percpu.c: don't acquire pcpu_lock for pcpu_chunk_addr_search()Hou Tao1-3/+1
There is no need to acquire pcpu_lock for pcpu_chunk_addr_search(): 1) both pcpu_first_chunk & pcpu_reserved_chunk must have been initialized before the invocation of free_percpu(). 2) The dynamically-created chunk must be valid before the per-cpu pointers allocated from it are freed. So acquire pcpu_lock() after the invocation of pcpu_chunk_addr_search(). Acked-by: Dennis Zhou <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-10-20Merge tag 'sound-6.6-rc7' of ↵Linus Torvalds16-34/+146
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Still higher volume than wished, but all are driver-specific small fixes and look safe for this late RC. The majority of changes are for ASoC, especially for wcd938x driver and Cirrus codec drivers, while there are other random fixes including usual HD-audio quirks" * tag 'sound-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (22 commits) ASoC: da7219: Correct the process of setting up Gnd switch in AAD ALSA: hda/realtek - Fixed ASUS platform headset Mic issue ALSA: hda/realtek: Add quirk for ASUS ROG GU603ZV ALSA: hda/relatek: Enable Mute LED on HP Laptop 15s-fq5xxx ASoC: dwc: Fix non-DT instantiation ASoC: codecs: tas2780: Fix log of failed reset via I2C. ASoC: rt5650: fix the wrong result of key button ASoC: cs42l42: Fix missing include of gpio/consumer.h ASoC: cs42l43: Update values for bias sense ASoC: dt-bindings: cirrus,cs42l43: Update values for bias sense ASoC: cs35l56: ASP1 DOUT must default to Hi-Z when not transmitting ASoC: pxa: fix a memory leak in probe() ASoC: cs35l56: Fix illegal use of init_completion() ASoC: codecs: wcd938x-sdw: fix runtime PM imbalance on probe errors ASoC: codecs: wcd938x-sdw: fix use after free on driver unbind ASoC: codecs: wcd938x: fix runtime PM imbalance on remove ASoC: codecs: wcd938x: fix regulator leaks on probe errors ASoC: codecs: wcd938x: fix resource leaks on bind errors ASoC: codecs: wcd938x: fix unbind tear down order ASoC: codecs: wcd938x: drop bogus bind error handling ...
2023-10-20Merge tag 'drm-fixes-2023-10-20' of git://anongit.freedesktop.org/drm/drmLinus Torvalds23-72/+109
Pull drm fixes from Dave Airlie: "Regular fixes for the week, amdgpu, i915, nouveau, with some other scattered around, nothing major. amdgpu: - Fix possible NULL pointer dereference - Avoid possible BUG_ON in GPUVM updates - Disable AMD_CTX_PRIORITY_UNSET i915: - Fix display issue that was blocking S0ix - Retry gtt fault when out of fence registers bridge: - ti-sn65dsi86: Fix device lifetime edid: - Add quirk for BenQ GW2765 ivpu: - Extend address range for MMU mmap nouveau: - DP-connector fixes - Documentation fixes panel: - Move AUX B116XW03 into panel-simple scheduler: - Eliminate DRM_SCHED_PRIORITY_UNSET ttm: - Fix possible NULL-ptr deref in cleanup mediatek: - Correctly free sg_table in gem prime vmap" * tag 'drm-fixes-2023-10-20' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: Reserve fences for VM update drm/amdgpu: Fix possible null pointer dereference accel/ivpu: Extend address range for MMU mmap Revert "accel/ivpu: Use cached buffers for FW loading" accel/ivpu: Don't enter d0i3 during FLR drm/i915: Retry gtt fault when out of fence registers drm/i915/cx0: Only clear/set the Pipe Reset bit of the PHY Lanes Owned gpu/drm: Eliminate DRM_SCHED_PRIORITY_UNSET drm/amdgpu: Unset context priority is now invalid drm/mediatek: Correctly free sg_table in gem prime vmap drm/edid: add 8 bpc quirk to the BenQ GW2765 drm/ttm: Reorder sys manager cleanup step drm/nouveau/disp: fix DP capable DSM connectors drm/nouveau: exec: fix ioctl kernel-doc warning drm/panel: Move AUX B116XW03 out of panel-edp back to panel-simple drm/bridge: ti-sn65dsi86: Associate DSI device lifetime with auxiliary device
2023-10-20selftests/bpf: Make linked_list failure test more robustKumar Kartikeya Dwivedi2-9/+5
The linked list failure test 'pop_front_off' and 'pop_back_off' currently rely on matching exact instruction and register values. The purpose of the test is to ensure the offset is correctly incremented for the returned pointers from list pop helpers, which can then be used with container_of to obtain the real object. Hence, somehow obtaining the information that the offset is 48 will work for us. Make the test more robust by relying on verifier error string of bpf_spin_lock and remove dependence on fragile instruction index or register number, which can be affected by different clang versions used to build the selftests. Fixes: 300f19dcdb99 ("selftests/bpf: Add BPF linked list API tests") Reported-by: Andrii Nakryiko <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-10-20Merge 3rd batch of EFI fixes into efi/urgentArd Biesheuvel4-10/+71
2023-10-20efi/unaccepted: Fix soft lockups caused by parallel memory acceptanceKirill A. Shutemov1-4/+60
Michael reported soft lockups on a system that has unaccepted memory. This occurs when a user attempts to allocate and accept memory on multiple CPUs simultaneously. The root cause of the issue is that memory acceptance is serialized with a spinlock, allowing only one CPU to accept memory at a time. The other CPUs spin and wait for their turn, leading to starvation and soft lockup reports. To address this, the code has been modified to release the spinlock while accepting memory. This allows for parallel memory acceptance on multiple CPUs. A newly introduced "accepting_list" keeps track of which memory is currently being accepted. This is necessary to prevent parallel acceptance of the same memory block. If a collision occurs, the lock is released and the process is retried. Such collisions should rarely occur. The main path for memory acceptance is the page allocator, which accepts memory in MAX_ORDER chunks. As long as MAX_ORDER is equal to or larger than the unit_size, collisions will never occur because the caller fully owns the memory block being accepted. Aside from the page allocator, only memblock and deferered_free_range() accept memory, but this only happens during boot. The code has been tested with unit_size == 128MiB to trigger collisions and validate the retry codepath. Fixes: 2053bc57f367 ("efi: Add unaccepted memory support") Signed-off-by: Kirill A. Shutemov <[email protected]> Reported-by: Michael Roth <[email protected] Reviewed-by: Nikolay Borisov <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Tested-by: Michael Roth <[email protected]> [ardb: drop unnecessary cpu_relax() call] Signed-off-by: Ard Biesheuvel <[email protected]>
2023-10-20Merge branch 'acpi-irq'Rafael J. Wysocki1-1/+6
Merge ACPI IRQ management fix for 6.6-rc7 (Sunil V L). * acpi-irq: ACPI: irq: Fix incorrect return value in acpi_register_gsi()
2023-10-20selftests/ftrace: Add new test case which checks non unique symbolFrancis Laniel1-0/+13
If name_show() is non unique, this test will try to install a kprobe on this function which should fail returning EADDRNOTAVAIL. On kernel where name_show() is not unique, this test is skipped. Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Signed-off-by: Francis Laniel <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-10-20tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbolsFrancis Laniel2-0/+64
When a kprobe is attached to a function that's name is not unique (is static and shares the name with other functions in the kernel), the kprobe is attached to the first function it finds. This is a bug as the function that it is attaching to is not necessarily the one that the user wants to attach to. Instead of blindly picking a function to attach to what is ambiguous, error with EADDRNOTAVAIL to let the user know that this function is not unique, and that the user must use another unique function with an address offset to get to the function they want to attach to. Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Fixes: 413d37d1eb69 ("tracing: Add kprobe-based event tracer") Suggested-by: Masami Hiramatsu <[email protected]> Signed-off-by: Francis Laniel <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-10-20igb: Fix potential memory leak in igb_add_ethtool_nfc_entryMateusz Palczewski1-1/+5
Add check for return of igb_update_ethtool_nfc_entry so that in case of any potential errors the memory alocated for input will be freed. Fixes: 0e71def25281 ("igb: add support of RX network flow classification") Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Mateusz Palczewski <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20treewide: Spelling fix in commentKunwu Chan1-1/+1
reques -> request Fixes: 09dde54c6a69 ("PS3: gelic: Add wireless support for PS3") Signed-off-by: Kunwu Chan <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20Merge branch 'ice-vf-resource-tracking'David S. Miller18-108/+398
Jacob Keller says: ==================== Intel Wired LAN Driver Updates 2023-10-19 (ice, igb, ixgbe) This series contains improvements to the ice driver related to VF MSI-X resource tracking, as well as other minor cleanups. Dan fixes code in igb and ixgbe where the conversion to list_for_each_entry failed to account for logic which assumed a NULL pointer after iteration. Jacob makes ice_get_pf_c827_idx static, and refactors ice_find_netlist_node based on feedback that got missed before the function merged. Michal adds a switch rule to drop all traffic received by an inactive LAG port. He also implements ops to allow individual control of MSI-X vectors for SR-IOV VFs. Przemek removes some unused fields in struct ice_flow_entry, and modifies the ice driver to cache the VF PCI device inside struct ice_vf rather than performing lookup at run time. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-10-20ixgbe: fix end of loop test in ixgbe_set_vf_macvlan()Dan Carpenter1-9/+10
The list iterator in a list_for_each_entry() loop can never be NULL. If the loop exits without hitting a break then the iterator points to an offset off the list head and dereferencing it is an out of bounds access. Before we transitioned to using list_for_each_entry() loops, then it was possible for "entry" to be NULL and the comments mention this. I have updated the comments to match the new code. Fixes: c1fec890458a ("ethernet/intel: Use list_for_each_entry() helper") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20igb: Fix an end of loop testDan Carpenter1-3/+6
When we exit a list_for_each_entry() without hitting a break statement, the list iterator isn't NULL, it just point to an offset off the list_head. In that situation, it wouldn't be too surprising for entry->free to be true and we end up corrupting memory. The way to test for these is to just set a flag. Fixes: c1fec890458a ("ethernet/intel: Use list_for_each_entry() helper") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20ice: cleanup ice_find_netlist_nodeJacob Keller1-15/+15
The ice_find_netlist_node function was introduced in commit 8a3a565ff210 ("ice: add admin commands to access cgu configuration"). Variations of this function were reviewed concurrently on both intel-wired-lan[1][2], and netdev [3][4] [1]: https://lore.kernel.org/intel-wired-lan/[email protected]/ [2]: https://lore.kernel.org/intel-wired-lan/[email protected]/ [3]: https://lore.kernel.org/netdev/[email protected]/ [4]: https://lore.kernel.org/netdev/[email protected]/ The variant I posted had a few changes due to review feedback which were never incorporated into the DPLL series: * Replace the references to ancient and long removed ICE_SUCCESS and ICE_ERR_DOES_NOT_EXIST status codes in the function comment. * Return -ENOENT instead of -ENOTBLK, as a more common way to indicate that an entry doesn't exist. * Avoid the use of memset() and use simple static initialization for the cmd variable. * Use FIELD_PREP to assign the node_type_ctx. * Remove an unnecessary local variable to keep track of rec_node_handle, just pass the node_handle pointer directly into ice_aq_get_netlist_node. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Reviewed-by: Paul Menzel <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20ice: make ice_get_pf_c827_idx staticJacob Keller2-2/+1
The ice_get_pf_c827_idx function is only called inside of ice_ptp_hw.c, so there is no reason to export it. Mark it static and remove the declaration from ice_ptp_hw.h Signed-off-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: David S. Miller <[email protected]>
2023-10-20ice: manage VFs MSI-X using resource trackingMichal Swiatkowski1-19/+151
Track MSI-X for VFs using bitmap, by setting and clearing bitmap during allocation and freeing. Try to linearize irqs usage for VFs, by freeing them and allocating once again. Do it only for VFs that aren't currently running. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20ice: set MSI-X vector count on VFMichal Swiatkowski3-0/+84
Implement ops needed to set MSI-X vector count on VF. sriov_get_vf_total_msix() should return total number of MSI-X that can be used by the VFs. Return the value set by devlink resources API (pf->req_msix.vf). sriov_set_msix_vec_count() will set number of MSI-X on particular VF. Disable VF register mapping, rebuild VSI with new MSI-X and queues values and enable new VF register mapping. For best performance set number of queues equal to number of MSI-X. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20ice: add bitmap to track VF MSI-X usageMichal Swiatkowski2-0/+11
Create a bitamp to track MSI-X usage for VFs. The bitmap has the size of total MSI-X amount on device, because at init time the amount of MSI-X used by VFs isn't known. The bitmap is used in follow up patchset to provide a block of continuous block of MSI-X indexes for each created VF. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20ice: implement num_msix field per VFMichal Swiatkowski4-7/+14
Store the amount of MSI-X per VF instead of storing it in pf struct. It is used to calculate number of q_vectors (and queues) for VF VSI. This is necessary because with follow up changes the number of MSI-X can be different between VFs. Use it instead of using pf->vf_msix value in all cases. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20ice: store VF's pci_dev ptr in ice_vfPrzemek Kitszel5-28/+32
Extend struct ice_vf by vfdev. Calculation of vfdev falls more nicely into ice_create_vf_entries(). Caching of vfdev enables simplification of ice_restore_all_vfs_msi_state(). Reviewed-by: Jesse Brandeburg <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Co-developed-by: Mateusz Polchlopek <[email protected]> Signed-off-by: Mateusz Polchlopek <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20ice: add drop rule matching on not active lportMichal Swiatkowski3-20/+75
Inactive LAG port should not receive any packets, as it can cause adding invalid FDBs (bridge offload). Add a drop rule matching on inactive lport in LAG. Reviewed-by: Simon Horman <[email protected]> Co-developed-by: Marcin Szycik <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20ice: remove unused ice_flow_entry fieldsPrzemek Kitszel2-7/+1
Remove ::entry and ::entry_sz fields of &ice_flow_entry, as they were never set. Suggested-by: Maciej Fijalkowski <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20i40e: Fix I40E_FLAG_VF_VLAN_PRUNING valueIvan Vecera1-1/+1
Commit c87c938f62d8f1 ("i40e: Add VF VLAN pruning") added new PF flag I40E_FLAG_VF_VLAN_PRUNING but its value collides with existing I40E_FLAG_TOTAL_PORT_SHUTDOWN_ENABLED flag. Move the affected flag at the end of the flags and fix its value. Reproducer: [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close on [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning on [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off [ 6323.142585] i40e 0000:02:00.0: Setting link-down-on-close not supported on this port (because total-port-shutdown is enabled) netlink error: Operation not supported [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning off [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off The link-down-on-close flag cannot be modified after setting vf-vlan-pruning because vf-vlan-pruning shares the same bit with total-port-shutdown flag that prevents any modification of link-down-on-close flag. Fixes: c87c938f62d8 ("i40e: Add VF VLAN pruning") Cc: Mateusz Palczewski <[email protected]> Cc: Simon Horman <[email protected]> Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: David S. Miller <[email protected]>
2023-10-20ethtool: untangle the linkmode and ethtool headersJakub Kicinski3-35/+37
Commit 26c5334d344d ("ethtool: Add forced speed to supported link modes maps") added a dependency between ethtool.h and linkmode.h. The dependency in the opposite direction already exists so the new code was inserted in an awkward place. The reason for ethtool.h to include linkmode.h, is that ethtool_forced_speed_maps_init() is a static inline helper. That's not really necessary. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Paul Greenwalt <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.Heng Guo8-10/+16
Reproduce environment: network with 3 VM linuxs is connected as below: VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3 VM1: eth0 ip: 192.168.122.207 MTU 1500 VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500 VM3: eth0 ip: 192.168.123.240 MTU 1500 Reproduce: VM1 send 1400 bytes UDP data to VM3 using tools scapy with flags=0. scapy command: send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1, inter=1.000000) Result: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0 ...... ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0 ...... ---------------------------------------------------------------------- "ForwDatagrams" increase from 4 to 5 and "OutRequests" also increase from 7 to 8. Issue description and patch: IPSTATS_MIB_OUTPKTS("OutRequests") is counted with IPSTATS_MIB_OUTOCTETS ("OutOctets") in ip_finish_output2(). According to RFC 4293, it is "OutOctets" counted with "OutTransmits" but not "OutRequests". "OutRequests" does not include any datagrams counted in "ForwDatagrams". ipSystemStatsOutOctets OBJECT-TYPE DESCRIPTION "The total number of octets in IP datagrams delivered to the lower layers for transmission. Octets from datagrams counted in ipIfStatsOutTransmits MUST be counted here. ipSystemStatsOutRequests OBJECT-TYPE DESCRIPTION "The total number of IP datagrams that local IP user- protocols (including ICMP) supplied to IP in requests for transmission. Note that this counter does not include any datagrams counted in ipSystemStatsOutForwDatagrams. So do patch to define IPSTATS_MIB_OUTPKTS to "OutTransmits" and add IPSTATS_MIB_OUTREQUESTS for "OutRequests". Add IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for ipv4 and add IPSTATS_MIB_OUT counter in ip6_finish_output2() for ipv6. Test result with patch: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4 ...... root@qemux86-64:~# cat /proc/net/netstat ...... IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0 ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates OutTransmits Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5 ...... root@qemux86-64:~# cat /proc/net/netstat ...... IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0 ---------------------------------------------------------------------- "ForwDatagrams" increase from 1 to 2 and "OutRequests" is keeping 3. "OutTransmits" increase from 4 to 5 and "OutOctets" increase 1428. Signed-off-by: Heng Guo <[email protected]> Reviewed-by: Kun Song <[email protected]> Reviewed-by: Filip Pudak <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20Merge branch 'ksz886x-forced-link-modes'David S. Miller3-3/+109
Oleksij Rempel says: ==================== fix forced link mode for KSZ886X switches changes v3: - squash patch 1 and 2 - use genphy_config_aneg() instead of genphy_setup_forced() changes v2: - address kernel test robot warning - change comment explaining clearing of KSZ886X_CTRL_FORCE_LINK bit - s/PHY we create/PHY will create/ ==================== Signed-off-by: David S. Miller <[email protected]>
2023-10-20net: phy: micrel: Fix forced link mode for KSZ886X switchesOleksij Rempel1-0/+22
Address a link speed detection issue in KSZ886X PHY driver when in forced link mode. Previously, link partners like "ASIX AX88772B" with KSZ8873 could fall back to 10Mbit instead of configured 100Mbit. The issue arises as KSZ886X PHY continues sending Fast Link Pulses (FLPs) even with autonegotiation off, misleading link partners in autoneg mode, leading to incorrect link speed detection. Now, when autonegotiation is disabled, the driver sets the link state forcefully using KSZ886X_CTRL_FORCE_LINK bit. This action, beyond just disabling autonegotiation, makes the PHY state more reliably detected by link partners using parallel detection, thus fixing the link speed misconfiguration. With autonegotiation enabled, link state is not forced, allowing proper autonegotiation process participation. Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Divya Koppera <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20net: dsa: microchip: ksz8: Enable MIIM PHY Control reg accessOleksij Rempel2-3/+87
Provide access to MIIM PHY Control register (Reg. 31) through ksz8_r_phy_ctrl() and ksz8_w_phy_ctrl() functions. Necessary for upcoming micrel.c patch to address forced link mode configuration. Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reported-by: kernel test robot <[email protected]> Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20Merge branch 'mlxsw-lag-table-allocation'David S. Miller9-81/+202
Petr Machata says: ==================== mlxsw: Move allocation of LAG table to the driver PGT is an in-HW table that maps addresses to sets of ports. Then when some HW process needs a set of ports as an argument, instead of embedding the actual set in the dynamic configuration, what gets configured is the address referencing the set. The HW then works with the appropriate PGT entry. Within the PGT is placed a LAG table. That is a contiguous block of PGT memory where each entry describes which ports are members of the corresponding LAG port. The PGT is split to two parts: one managed by the FW, and one managed by the driver. Historically, the FW part included also the LAG table, referred to as FW LAG mode. Giving the responsibility for placement of the LAG table to the driver, referred to as SW LAG mode, makes the whole system more flexible. The FW currently supports both FW and SW LAG modes. To shed complexity, the FW should in the future only support SW LAG mode. Hence this patchset, where support for placement of LAG is added to mlxsw. There are FW versions out there that do not support SW LAG mode, and on Spectrum-1 in particular, there is no plan to support it at all. mlxsw will therefore have to support both modes of operation. Another aspect is that at least on Spectrum-1, there are FW versions out there that claim to support driver-placed LAG table, but then reject or ignore configurations enabling the same. The driver thus has to have a say in whether an attempt to configure SW LAG mode should even be done. The feature is therefore expressed in terms of "does the driver prefer SW LAG mode?", and "what LAG mode the PCI module managed to configure the FW with". This is unlike current flood mode configuration, where the driver can give a strict value, and that's what gets configured. But it gives a chance to the driver to determine whether LAG mode should be enabled at all. The "does the driver prefer SW LAG mode?" bit is expressed as a boolean lag_mode_prefer_sw. The reason for this is largely another feature that will be introduced in a follow-up patchset: support for CFF flood mode. The driver currently requires that the FW be configured with what is called controlled flood mode. But on capable systems, CFF would be preferred. So there are two values in flight: the preferred flood mode, and the fallback. This could be expressed with an array of flood modes ordered by preference, but that looks like an overkill in comparison. This flag/value model is then reused for LAG mode as well, except the fallback value is absent and implied to be FW, because there are no other values to choose from. The patchset progresses as follows: - Patches #1 to #5 adjust reg.h and cmd.h with new register fields, constants and remarks. - Patches #6 and #7 add the ability to request SW LAG mode and to query the LAG mode that was actually negotiated. This is where the abovementioned lag_mode_prefer_sw flag is added. - Patches #7 to #9 generalize PGT allocations to make it possible to allocate the LAG table, which is done in patch #10. - In patch #11, toggle lag_mode_prefer_sw on Spectrum-2 and above, which makes the newly-added code live. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-10-20mlxsw: spectrum: Set SW LAG mode on Spectrum>1Petr Machata1-0/+2
On Spectrum-2, Spectrum-3 and Spectrum-4 machines, request SW responsibility for placement of the LAG table. On Spectrum-1, some FW versions claim to support lag_mode field despite quietly ignoring any settings made to that field. Thus refrain from attempting to configure lag_mode on those systems at all. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20mlxsw: spectrum: Allocate LAG table when in SW LAG modePetr Machata2-11/+83
In this patch, if the LAG mode is SW, allocate the LAG table and configure SGCR to indicate where it was allocated. We use the default "DDD" (for dynamic data duplication) layout of the LAG table. In the DDD mode, the membership information for each LAG is copied in 8 PGT entries. This is done for performance reasons. The LAG table then needs to be allocated on an address aligned to 8. Deal with this by moving the LAG init ahead so that the LAG table is allocated at address 0. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20mlxsw: spectrum_pgt: Generalize PGT allocationPetr Machata3-23/+7
PGT blocks are allocated through the function mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows which piece of PGT exactly they want to get. That was fine while the FID code was the only client allocating blocks of PGT. However for SW-allocated LAG table, there will be an additional client: mlxsw_sp_lag_init(). The interface should therefore be changed to not require particular coordinates, but to take just the requested size, allocate the block wherever, and give back the PGT address. In this patch, change the interface accordingly. Initialize FID family's pgt_base from the result of the PGT allocation (note that mlxsw makes a copy of the family structure, so what gets initialized is not actually the global structure). Drop the now-unnecessary pgt_base initializations and the corresponding defines. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20mlxsw: spectrum_fid: Allocate PGT for the whole FID family in one goPetr Machata1-30/+33
PGT blocks are allocated through the function mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows which piece of PGT exactly they want to get. That was fine while the FID code was the only client allocating blocks of PGT. However for SW-allocated LAG table, there will be an additional client: mlxsw_sp_lag_init(). The interface should therefore be changed to not require particular coordinates, but to take just the requested size, allocate the block wherever, and give back the PGT address. The current FID mode has one place where PGT address can be stored: the FID family's pgt_base. The allocation scheme should therefore be changed from allocating a block per FID flood table, to allocating a block per FID family. Do just that in this patch. The per-family allocation is going to be useful for another related feature as well: the CFF mode. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20mlxsw: pci: Permit toggling LAG modePetr Machata2-4/+13
Add to struct mlxsw_config_profile a field lag_mode_prefer_sw for the driver to indicate that SW LAG mode should be configured if possible. Add to the PCI module code to set lag_mode as appropriate. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20mlxsw: core, pci: Add plumbing related to LAG modePetr Machata3-0/+24
lag_mode describes where the responsibility for LAG table placement lies: SW or FW. The bus module determines whether LAG is supported, can configure it if it is, and knows what (if any) configuration has been applied. Therefore add a bus callback to determine the configured LAG mode. Also add to core an API to query it. The LAG mode is for now kept at the default value of 0 for FW-managed. The code to actually toggle it will be added later. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20mlxsw: cmd: Add QUERY_FW.lag_mode_supportPetr Machata1-0/+6
Add QUERY_FW.lag_mode_support, which determines whether CONFIG_PROFILE.lag_mode is available. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>