path: root/drivers/net/ethernet
Age         Commit message                                  (Author; files changed, lines -/+)
2024-04-24  eth: bnxt: fix counting packets discarded due to OOM and netpoll  (Jakub Kicinski; 1 file, -26/+18)

I added OOM and netpoll discard counters, naively assuming that the cpr pointer is pointing to a common completion ring. Turns out that is usually *a* completion ring but not *the* completion ring which bnapi->cp_ring points to. bnapi->cp_ring is where the stats are read from, so we end up reporting 0 through ethtool -S and qstat even though the drop events have happened. Make 100% sure we're recording statistics in the correct structure.

Fixes: 907fd4a294db ("bnxt: count discards due to memory allocation errors")
Reviewed-by: Michael Chan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
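A minimal sketch of the fix's idea (bnxt field names assumed; not the actual diff):

  /* Record the drop on the ring that ethtool -S and qstat read from,
   * i.e. bnapi->cp_ring, not whichever sub-ring cpr happens to be.
   */
  static void bnxt_record_rx_oom_discard(struct bnxt_napi *bnapi)
  {
          struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;

          cpr->sw_stats.rx.rx_oom_discards++;
  }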
2024-04-24  igc: Fix LED-related deadlock on driver unbind  (Lukas Wunner; 3 files, -8/+35)

Roman reports a deadlock on unplug of a Thunderbolt docking station containing an Intel I225 Ethernet adapter. The root cause is that led_classdev's for LEDs on the adapter are registered such that they're device-managed by the netdev. That results in recursive acquisition of the rtnl_lock() mutex on unplug: When the driver calls unregister_netdev(), it acquires rtnl_lock(), then frees the device-managed resources. Upon unregistering the LEDs, netdev_trig_deactivate() invokes unregister_netdevice_notifier(), which tries to acquire rtnl_lock() again. Avoid by using non-device-managed LED registration. Stack trace for posterity:

  schedule+0x6e/0xf0
  schedule_preempt_disabled+0x15/0x20
  __mutex_lock+0x2a0/0x750
  unregister_netdevice_notifier+0x40/0x150
  netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
  led_trigger_set+0x102/0x330
  led_classdev_unregister+0x4b/0x110
  release_nodes+0x3d/0xb0
  devres_release_all+0x8b/0xc0
  device_del+0x34f/0x3c0
  unregister_netdevice_many_notify+0x80b/0xaf0
  unregister_netdev+0x7c/0xd0
  igc_remove+0xd8/0x1e0 [igc]
  pci_device_remove+0x3f/0xb0

Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Reported-by: Roman Lozko <[email protected]>
Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
Reported-by: "Marek Marczykowski-Górecki" <[email protected]>
Closes: https://lore.kernel.org/r/ZhRD3cOtz5i-61PB@mail-itl/
Signed-off-by: Kurt Kanzenbach <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Cc: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Kurt Kanzenbach <[email protected]>
Tested-by: Kurt Kanzenbach <[email protected]> # Intel i225
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
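A sketch of the shape of the fix, with assumed igc-style names; the point is that LED registration is no longer devres-managed, so teardown happens explicitly rather than inside devres_release_all() under the recursive rtnl_lock():

  /* probe path: plain, non-device-managed registration */
  err = led_classdev_register(&adapter->pdev->dev, &ldev->leds[i].cdev);
  if (err)
          goto err_unregister_leds;

  /* remove path: unregister explicitly, outside the devres teardown,
   * so netdev_trig_deactivate() is not reached from within
   * unregister_netdev() while rtnl_lock() is already held.
   */
  for (i = 0; i < IGC_NUM_LEDS; i++)  /* IGC_NUM_LEDS is hypothetical */
          led_classdev_unregister(&ldev->leds[i].cdev);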
2024-04-24  Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue  (Jakub Kicinski; 28 files, -74/+598)

Tony Nguyen says:

====================
ice: Support 5 layer Tx scheduler topology

Mateusz Polchlopek says:

For performance reasons there is a need to support a selectable Tx scheduler topology. Currently the firmware supports only the default 9-layer and a 5-layer topology. This patch series enables switching from the default to the 5-layer topology, if the user decides to opt in.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Document tx_scheduling_layers parameter
  ice: Add tx_scheduling_layers devlink param
  ice: Enable switching default Tx scheduler topology
  ice: Adjust the VSI/Aggregator layers
  ice: Support 5 layer topology
  devlink: extend devlink_param *set pointer
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  octeontx2-pf: flower: check for unsupported control flags  (Asbjørn Sloth Tønnesen; 1 file, -4/+4)

Use flow_rule_is_supp_control_flags() to reject filters with unsupported control flags. In case any unsupported control flags are masked, flow_rule_is_supp_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Remove the FLOW_DIS_FIRST_FRAG specific error message, and treat it as any other unsupported control flag. Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Sunil Goutham <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
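A sketch of the pattern applied here and in the similar flower commits below (the helpers are from include/net/flow_offload.h; the driver glue is illustrative):

  struct flow_rule *rule = flow_cls_offload_flow_rule(f);
  struct flow_match_control match;

  /* Drivers that support no control flags reject any masked flag: */
  if (flow_rule_match_has_control_flags(rule, f->common.extack))
          return -EOPNOTSUPP;

  /* Drivers that support a subset pass the supported set explicitly: */
  if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_CONTROL)) {
          flow_rule_match_control(rule, &match);
          if (!flow_rule_is_supp_control_flags(FLOW_DIS_IS_FRAGMENT,
                                               match.mask->flags,
                                               f->common.extack))
                  return -EOPNOTSUPP;
  }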
2024-04-24  net: hns3: flower: validate control flags  (Asbjørn Sloth Tønnesen; 1 file, -3/+15)

This driver currently doesn't support any control flags. Use flow_rule_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Also propagate extack to hclge_get_cls_key_ip(), and convert it to return an error code. Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Tested-by: Jijie Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  net: ethernet: ti: cpsw: flower: validate control flags  (Asbjørn Sloth Tønnesen; 1 file, -0/+3)

This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  net: ethernet: ti: am65-cpsw: flower: validate control flags  (Asbjørn Sloth Tønnesen; 1 file, -0/+3)

This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  bnxt_en: flower: validate control flags  (Asbjørn Sloth Tønnesen; 1 file, -0/+4)

This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Sriharsha Basavapatna <[email protected]>
Tested-by: Sriharsha Basavapatna <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  net: ethernet: ti: am65-cpsw-nuss: Enable SGMII mode for J784S4 CPSW9G  (Chintan Vankar; 1 file, -1/+2)

TI's J784S4 SoC supports SGMII mode with the CPSW9G instance of the CPSW Ethernet Switch. Thus, enable it by adding SGMII mode to the extra_modes member of the "j784s4_cpswxg_pdata" SoC data.

Reviewed-by: Roger Quadros <[email protected]>
Signed-off-by: Chintan Vankar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  Revert "net: txgbe: fix clk_name exceed MAX_DEV_ID limits"  (Duanqiang Wen; 1 file, -1/+1)

This reverts commit e30cef001da259e8df354b813015d0e5acc08740. Commit 99f4570cfba1 ("clkdev: Update clkdev id usage to allow for longer names") fixes clk_name exceeding the MAX_DEV_ID limit, so this commit is no longer needed.

Signed-off-by: Duanqiang Wen <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  Revert "net: txgbe: fix i2c dev name cannot match clkdev"  (Duanqiang Wen; 1 file, -5/+3)

This reverts commit c644920ce9220d83e070f575a4df711741c07f07. When registering the I2C device, txgbe shortened "i2c_designware" to "i2c_dw", which prevented the I2C device from matching the platform driver i2c_designware_platform.

Signed-off-by: Duanqiang Wen <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash work  (Ido Schimmel; 1 file, -1/+5)

The rehash delayed work is rescheduled with a delay if the number of credits at the end of the work is not negative, as supposedly that means the migration ended. Otherwise, it is rescheduled immediately. After "mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash" the above is no longer accurate, as a non-negative number of credits is no longer indicative of the migration being done. It can also happen if the work encountered an error, in which case the migration will resume the next time the work is scheduled. The significance of the above is that it is possible for the work to be pending and associated with hints that were allocated when the migration started. This leads to the hints being leaked [1] when the work is canceled while pending as part of ACL region dismantle. Fix by freeing the hints if they are associated with a work that was canceled while pending. Blame the original commit since the reliance on not having a pending work associated with hints is fragile.

[1]
  unreferenced object 0xffff88810e7c3000 (size 256):
    comm "kworker/0:16", pid 176, jiffies 4295460353
    hex dump (first 32 bytes):
      00 30 95 11 81 88 ff ff 61 00 00 00 00 00 00 80  .0......a.......
      00 00 61 00 40 00 00 00 00 00 00 00 04 00 00 00  ..a.@...........
    backtrace (crc 2544ddb9):
      [<00000000cf8cfab3>] kmalloc_trace+0x23f/0x2a0
      [<000000004d9a1ad9>] objagg_hints_get+0x42/0x390
      [<000000000b143cf3>] mlxsw_sp_acl_erp_rehash_hints_get+0xca/0x400
      [<0000000059bdb60a>] mlxsw_sp_acl_tcam_vregion_rehash_work+0x868/0x1160
      [<00000000e81fd734>] process_one_work+0x59c/0xf20
      [<00000000ceee9e81>] worker_thread+0x799/0x12c0
      [<00000000bda6fe39>] kthread+0x246/0x300
      [<0000000070056d23>] ret_from_fork+0x34/0x70
      [<00000000dea2b93e>] ret_from_fork_asm+0x1a/0x30

Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work")
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/0cc12ebb07c4d4c41a1265ee2c28b392ff997a86.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
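A sketch of the idea, with assumed mlxsw field names:

  /* cancel_delayed_work_sync() returns true if the work was still
   * pending; in that case the hints it allocated were never consumed
   * and must be freed here instead.
   */
  if (cancel_delayed_work_sync(&vregion->rehash.dw) &&
      vregion->rehash.ctx.hints_priv) {
          objagg_hints_put(vregion->rehash.ctx.hints_priv);
          vregion->rehash.ctx.hints_priv = NULL;
  }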
2024-04-24  mlxsw: spectrum_acl_tcam: Fix incorrect list API usage  (Ido Schimmel; 1 file, -0/+7)

Both the function that migrates all the chunks within a region and the function that migrates all the entries within a chunk call list_first_entry() on the respective lists without checking that the lists are not empty. This is incorrect usage of the API, which leads to the following warning [1]. Fix by returning if the lists are empty, as there is nothing to migrate in this case.

[1]
  WARNING: CPU: 0 PID: 6437 at drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:1266 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0>
  Modules linked in:
  CPU: 0 PID: 6437 Comm: kworker/0:37 Not tainted 6.9.0-rc3-custom-00883-g94a65f079ef6 #39
  Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
  Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
  RIP: 0010:mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0x2c0
  [...]
  Call Trace:
   <TASK>
   mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x4a0
   process_one_work+0x151/0x370
   worker_thread+0x2cb/0x3e0
   kthread+0xd0/0x100
   ret_from_fork+0x34/0x50
   ret_from_fork_asm+0x1a/0x30
   </TASK>

Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration")
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/4628e9a22d1d84818e28310abbbc498e7bc31bc9.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
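The guard is trivial; sketched with assumed structure names:

  /* Nothing to migrate if the chunk has no entries. */
  if (list_empty(&vchunk->ventry_list))
          return 0;

  ventry = list_first_entry(&vchunk->ventry_list,
                            struct mlxsw_sp_acl_tcam_ventry, list);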
2024-04-24  mlxsw: spectrum_acl_tcam: Fix warning during rehash  (Ido Schimmel; 1 file, -3/+17)

As previously explained, the rehash delayed work migrates filters from one region to another. This is done by iterating over all chunks (all the filters with the same priority) in the region and in each chunk iterating over all the filters. When the work runs out of credits it stores the current chunk and entry as markers in the per-work context so that it would know where to resume the migration from the next time the work is scheduled. Upon error, the chunk marker is reset to NULL, but without resetting the entry markers despite them being relative to it. This can result in migration being resumed from an entry that does not belong to the chunk being migrated. In turn, this will eventually lead to a chunk being iterated over as if it is an entry. Because of how the two structures happen to be defined, this does not lead to KASAN splats, but to warnings such as [1]. Fix by creating a helper that resets all the markers and call it from all the places that currently only reset the chunk marker. For good measure, also call it when starting a completely new rehash. Add a warning to avoid future cases.

[1]
  WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0
  Modules linked in:
  CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G W 6.9.0-rc3-custom-00880-g29e61d91b77b #29
  Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
  Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
  RIP: 0010:mlxsw_afk_encode+0x242/0x2f0
  [...]
  Call Trace:
   <TASK>
   mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0
   mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
   mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290
   mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470
   process_one_work+0x151/0x370
   worker_thread+0x2cb/0x3e0
   kthread+0xd0/0x100
   ret_from_fork+0x34/0x50
   </TASK>

Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration")
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
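A sketch of such a helper (name and fields assumed):

  /* Reset every migration marker together, so an entry marker can
   * never outlive the chunk it is relative to.
   */
  static void
  mlxsw_sp_acl_tcam_rehash_ctx_reset(struct mlxsw_sp_acl_tcam_rehash_ctx *ctx)
  {
          ctx->current_vchunk = NULL;
          ctx->start_ventry = NULL;
          ctx->stop_ventry = NULL;
  }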
2024-04-24  mlxsw: spectrum_acl_tcam: Fix memory leak during rehash  (Ido Schimmel; 1 file, -0/+4)

The rehash delayed work migrates filters from one region to another. This is done by iterating over all chunks (all the filters with the same priority) in the region and in each chunk iterating over all the filters. If the migration fails, the code tries to migrate the filters back to the old region. However, the rollback itself can also fail, in which case another migration will be erroneously performed. Besides the fact that this ping pong is not a very good idea, it also creates a problem. Each virtual chunk references two chunks: the currently used one ('vchunk->chunk') and a backup ('vchunk->chunk2'). During migration the first holds the chunk we want to migrate filters to and the second holds the chunk we are migrating filters from. The code currently assumes - but does not verify - that the backup chunk does not exist (NULL) if the currently used chunk does not reference the target region. This assumption breaks when we are trying to roll back a rollback, resulting in the backup chunk being overwritten and leaked [1]. Fix by not rolling back a failed rollback, and add a warning to avoid future cases.

[1]
  WARNING: CPU: 5 PID: 1063 at lib/parman.c:291 parman_destroy+0x17/0x20
  Modules linked in:
  CPU: 5 PID: 1063 Comm: kworker/5:11 Tainted: G W 6.9.0-rc2-custom-00784-gc6a05c468a0b #14
  Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
  Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
  RIP: 0010:parman_destroy+0x17/0x20
  [...]
  Call Trace:
   <TASK>
   mlxsw_sp_acl_atcam_region_fini+0x19/0x60
   mlxsw_sp_acl_tcam_region_destroy+0x49/0xf0
   mlxsw_sp_acl_tcam_vregion_rehash_work+0x1f1/0x470
   process_one_work+0x151/0x370
   worker_thread+0x2cb/0x3e0
   kthread+0xd0/0x100
   ret_from_fork+0x34/0x50
   ret_from_fork_asm+0x1a/0x30
   </TASK>

Fixes: 843500518509 ("mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()")
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/d5edd4f4503934186ae5cfe268503b16345b4e0f.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  mlxsw: spectrum_acl_tcam: Rate limit error message  (Ido Schimmel; 1 file, -1/+1)

In the rare cases when the device resources are exhausted it is likely that the rehash delayed work will fail. An error message will be printed whenever this happens, which can be overwhelming considering the fact that the work is per-region and that there can be hundreds of regions. Fix by rate limiting the error message.

Fixes: e5e7962ee5c2 ("mlxsw: spectrum_acl: Implement region migration according to hints")
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/c510763b2ebd25e7990d80183feff91cde593145.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
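In other words, switching the per-region failure message to the rate-limited printk variant; a sketch with an assumed message string:

  dev_err_ratelimited(mlxsw_sp->bus_info->dev,
                      "Failed to migrate ACL vregion\n");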
2024-04-24  mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash  (Ido Schimmel; 1 file, -0/+1)

The rehash delayed work migrates filters from one region to another according to the number of available credits. The migrated-from region is destroyed at the end of the work if the number of credits is non-negative, as the assumption is that this is indicative of migration being complete. This assumption is incorrect as a non-negative number of credits can also be the result of a failed migration. The destruction of a region that still has filters referencing it can result in a use-after-free [1]. Fix by not destroying the region if migration failed.

[1]
  BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
  Read of size 8 at addr ffff8881735319e8 by task kworker/0:31/3858

  CPU: 0 PID: 3858 Comm: kworker/0:31 Tainted: G W 6.9.0-rc2-custom-00782-gf2275c2157d8 #5
  Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
  Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
  Call Trace:
   <TASK>
   dump_stack_lvl+0xc6/0x120
   print_report+0xce/0x670
   kasan_report+0xd7/0x110
   mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
   mlxsw_sp_acl_ctcam_entry_del+0x2e/0x70
   mlxsw_sp_acl_atcam_entry_del+0x81/0x210
   mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3cd/0xb50
   mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
   process_one_work+0x8eb/0x19b0
   worker_thread+0x6c9/0xf70
   kthread+0x2c9/0x3b0
   ret_from_fork+0x4d/0x80
   ret_from_fork_asm+0x1a/0x30
   </TASK>

  Allocated by task 174:
   kasan_save_stack+0x33/0x60
   kasan_save_track+0x14/0x30
   __kasan_kmalloc+0x8f/0xa0
   __kmalloc+0x19c/0x360
   mlxsw_sp_acl_tcam_region_create+0xdf/0x9c0
   mlxsw_sp_acl_tcam_vregion_rehash_work+0x954/0x1300
   process_one_work+0x8eb/0x19b0
   worker_thread+0x6c9/0xf70
   kthread+0x2c9/0x3b0
   ret_from_fork+0x4d/0x80
   ret_from_fork_asm+0x1a/0x30

  Freed by task 7:
   kasan_save_stack+0x33/0x60
   kasan_save_track+0x14/0x30
   kasan_save_free_info+0x3b/0x60
   poison_slab_object+0x102/0x170
   __kasan_slab_free+0x14/0x30
   kfree+0xc1/0x290
   mlxsw_sp_acl_tcam_region_destroy+0x272/0x310
   mlxsw_sp_acl_tcam_vregion_rehash_work+0x731/0x1300
   process_one_work+0x8eb/0x19b0
   worker_thread+0x6c9/0xf70
   kthread+0x2c9/0x3b0
   ret_from_fork+0x4d/0x80
   ret_from_fork_asm+0x1a/0x30

Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work")
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/3e412b5659ec2310c5c615760dfe5eac18dd7ebd.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity update  (Ido Schimmel; 1 file, -2/+8)

The rule activity update delayed work periodically traverses the list of configured rules and queries their activity from the device. As part of this task it accesses the entry pointed to by 'ventry->entry', but this entry can be changed concurrently by the rehash delayed work, leading to a use-after-free [1]. Fix by closing the race and performing the activity query under the 'vregion->lock' mutex.

[1]
  BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
  Read of size 8 at addr ffff8881054ed808 by task kworker/0:18/181

  CPU: 0 PID: 181 Comm: kworker/0:18 Not tainted 6.9.0-rc2-custom-00781-gd5ab772d32f7 #2
  Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
  Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work
  Call Trace:
   <TASK>
   dump_stack_lvl+0xc6/0x120
   print_report+0xce/0x670
   kasan_report+0xd7/0x110
   mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
   mlxsw_sp_acl_rule_activity_update_work+0x219/0x400
   process_one_work+0x8eb/0x19b0
   worker_thread+0x6c9/0xf70
   kthread+0x2c9/0x3b0
   ret_from_fork+0x4d/0x80
   ret_from_fork_asm+0x1a/0x30
   </TASK>

  Allocated by task 1039:
   kasan_save_stack+0x33/0x60
   kasan_save_track+0x14/0x30
   __kasan_kmalloc+0x8f/0xa0
   __kmalloc+0x19c/0x360
   mlxsw_sp_acl_tcam_entry_create+0x7b/0x1f0
   mlxsw_sp_acl_tcam_vchunk_migrate_all+0x30d/0xb50
   mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
   process_one_work+0x8eb/0x19b0
   worker_thread+0x6c9/0xf70
   kthread+0x2c9/0x3b0
   ret_from_fork+0x4d/0x80
   ret_from_fork_asm+0x1a/0x30

  Freed by task 1039:
   kasan_save_stack+0x33/0x60
   kasan_save_track+0x14/0x30
   kasan_save_free_info+0x3b/0x60
   poison_slab_object+0x102/0x170
   __kasan_slab_free+0x14/0x30
   kfree+0xc1/0x290
   mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3d7/0xb50
   mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
   process_one_work+0x8eb/0x19b0
   worker_thread+0x6c9/0xf70
   kthread+0x2c9/0x3b0
   ret_from_fork+0x4d/0x80
   ret_from_fork_asm+0x1a/0x30

Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/1fcce0a60b231ebeb2515d91022284ba7b4ffe7a.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
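The shape of the fix, sketched with an assumed query-function name:

  mutex_lock(&vregion->lock);
  err = mlxsw_sp_acl_tcam_ventry_activity_get(mlxsw_sp, ventry, &activity);
  mutex_unlock(&vregion->lock);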
2024-04-24  mlxsw: spectrum_acl_tcam: Fix race during rehash delayed work  (Ido Schimmel; 1 file, -2/+2)

The purpose of the rehash delayed work is to reduce the number of masks (eRPs) used by an ACL region, as the eRP bank is a global and limited resource. This is done in three steps:

1. Creating a new set of masks and a new ACL region which will use the new masks and to which the existing filters will be migrated. The new region is assigned to 'vregion->region' and the region from which the filters are migrated is assigned to 'vregion->region2'.

2. Migrating all the filters from the old region to the new region.

3. Destroying the old region and setting 'vregion->region2' to NULL.

Only the second step is performed under the 'vregion->lock' mutex, although its comment says that among other things it "Protects consistency of region, region2 pointers". This is problematic as the first step can race with filter insertion from user space that uses 'vregion->region', but under the mutex. Fix by holding the mutex across the entirety of the delayed work and not only during the second step.

Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/1ec1d54edf2bad0a369e6b4fa030aba64e1f124b.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24  mlxsw: spectrum_acl_tcam: Fix race in region ID allocation  (Ido Schimmel; 2 files, -36/+30)

Region identifiers can be allocated both when user space tries to insert a new tc filter and when filters are migrated from one region to another as part of the rehash delayed work. There is no lock protecting the bitmap from which these identifiers are allocated, which is racy and leads to bad parameter errors from the device's firmware. Fix by converting the bitmap to IDA, which handles its own locking. For consistency, do the same for the group identifiers that are part of the same structure.

Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Reported-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Tested-by: Alexander Zubkov <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/ce494b7940cadfe84f3e18da7785b51ef5f776e3.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
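A sketch of the IDA conversion (identifier names assumed):

  #include <linux/idr.h>

  static DEFINE_IDA(tcam_region_ida);

  /* ida_alloc_max()/ida_free() are internally locked, so allocations
   * from the rehash work and from user-space inserts cannot race.
   */
  id = ida_alloc_max(&tcam_region_ida, max_regions - 1, GFP_KERNEL);
  if (id < 0)
          return id;
  /* ... */
  ida_free(&tcam_region_ida, id);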
2024-04-24  iavf: switch to Page Pool  (Alexander Lobakin; 4 files, -214/+94)

Now that the IAVF driver simply uses dev_alloc_page() + free_page() with no custom recycling logic, it can easily be switched to using the Page Pool / libeth API instead. This allows removing the whole dance around headroom, HW buffer size, and page order. All DMA-for-device is now done in the PP core; for-CPU, in the libeth helper. Use skb_mark_for_recycle() to bring back the recycling and restore the performance. Speaking of performance: on par with the baseline and faster with the PP optimization series applied. But the memory usage for 1500b MTU is now almost 2x lower (x86_64) thanks to allocating a page every second descriptor.

Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2024-04-24  iavf: pack iavf_ring more efficiently  (Alexander Lobakin; 2 files, -25/+9)

Before replacing the Rx buffer management with libie, clean up &iavf_ring a bit. There are several fields not used anywhere in the code -- simply remove them. Move ::tail up to remove a hole. Replace the ::arm_wb boolean with a 1-bit flag in ::flags to free 1 more byte. Finally, move ::prev_pkt_ctr out of &iavf_tx_queue_stats -- it doesn't belong there (it is used for Tx stall detection). Place it next to the stats on the ring itself to fill the 4-byte slot. The result: no holes, and all the hot fields fit into the first 64-byte cacheline.

Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
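A hypothetical before/after illustrating the packing tricks described (not the actual iavf layout):

  struct ring_before {
          void *desc;
          bool arm_wb;            /* 1 byte + 7 bytes of padding */
          void __iomem *tail;     /* sits after the hole */
  };

  struct ring_after {
          void *desc;
          void __iomem *tail;     /* moved up: the hole is gone */
          u32 flags;
  #define RING_FL_ARM_WB  BIT(0)  /* 1-bit flag replaces the bool */
          u32 prev_pkt_ctr;       /* moved out of the stats struct */
  };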
2024-04-24  libeth: add Rx buffer management  (Alexander Lobakin; 2 files, -0/+99)

Add a couple of intuitive helpers to hide the Rx buffer implementation details in the library and not duplicate them between drivers. The settings are somewhat optimized for 100G+ NICs, but nothing is really HW-specific here. Use the new page_pool_dev_alloc() to dynamically switch between split-page and full-page modes depending on MTU, page size, required headroom etc. For example, on x86_64 with the default driver settings each page is shared between 2 buffers. Turning on XDP (not in this series) -> increasing the headroom requirement pushes the truesize out of the 2048 boundary, leading to each buffer getting a full page. The "ceiling" limit is %PAGE_SIZE, as only order-0 pages are used to avoid compound overhead. For the above architecture, this means a maximum linear frame size of 3712 w/o XDP. Note that &libeth_buf_queue is not a complete queue/ring structure for now, rather a shim, but eventually the libeth-enabled drivers will move to it, with iavf being the first one.

Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
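A sketch of how a driver consumes the helper (the queue field names are assumed):

  unsigned int offset, size = bq->rx_buf_len;  /* assumed field */
  struct page *page;

  /* Internally picks split-page or full-page mode depending on how
   * much of the page the requested truesize would consume.
   */
  page = page_pool_dev_alloc(bq->pp, &offset, &size);
  if (!page)
          return -ENOMEM;
  /* ... attach the buffer to the skb, then: */
  skb_mark_for_recycle(skb);  /* pages go back to the pool on free */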
2024-04-24  iavf: drop page splitting and recycling  (Alexander Lobakin; 5 files, -241/+10)

As an intermediate step, remove all page splitting/recycling code. Just always allocate a new page and don't touch its refcount, so that it gets freed by the core stack later. Same for the "in-place" recycling, i.e. when an unused buffer gets assigned to a first needs-refilling descriptor. In some cases, this was leading to moving up to 63 &iavf_rx_buf structures around the ring on a per-field basis -- not something wanted on the hotpath. The change allows greatly simplifying certain parts of the code:

Function: add/remove: 0/2 grow/shrink: 0/7 up/down: 0/-744 (-744)

Although the array of &iavf_rx_buf is barely used now and could be replaced with just a page pointer array, don't touch it now, to not complicate replacing it with the libie Rx buffer struct later on. No surprise perf loses up to 30% here, but that regression will go away once PP lands. Note that the iavf_rx_pg_*() definitions are left to reduce the diffstat. They will be removed with the conversion to Page Pool.

Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2024-04-24  iavf: kill "legacy-rx" for good  (Alexander Lobakin; 6 files, -256/+8)

Ever since build_skb() became stable, the old way with allocating an skb for storing the headers separately, which are then copied manually, was slower, less flexible, and thus obsolete:

* it had higher pressure on MM since it actually allocates new pages, which then get split and refcount-biased (NAPI page cache);
* it implies memcpy() of packet headers (40+ bytes per each frame);
* the actual header length was calculated via eth_get_headlen(), which invokes Flow Dissector and thus wastes a bunch of CPU cycles;
* XDP makes it even more weird since it requires headroom for long and also tailroom for some time (since mbuf landed). Take a look at the ice driver, which is built around work-arounds to make XDP work with it.

Even on some quite low-end hardware (not a common case for 100G NICs) it was performing worse. The only advantage "legacy-rx" had is that it didn't require any reserved headroom and tailroom. But iavf didn't use this, as it always splits pages into two halves of 2k, while that save would only be useful when striding. And again, XDP effectively removes that sole pro. There's a train of features to land in IAVF soon: Page Pool, XDP, XSk, multi-buffer etc. Each new one would require adding more and more Danse Macabre for absolutely no reason, besides making the hotpath less and less effective. Remove the "feature" with all the related code. This includes at least one very hot branch (typically hit on each new frame), which was either always-true or always-false at least for a complete NAPI bulk of 64 frames, the whole private flags cruft, and so on. Some stats:

Function: add/remove: 0/4 grow/shrink: 0/7 up/down: 0/-721 (-721)
RO Data: add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-40 (-40)

Reviewed-by: Alexander Duyck <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2024-04-24  net: intel: introduce {, Intel} Ethernet common library  (Alexander Lobakin; 21 files, -1220/+268)

Not a secret there's a ton of code duplication between two and more Intel ethernet modules. Before introducing new changes, which would need to be copied over again, start decoupling the already existing duplicate functionality into a new module, which will be shared between several Intel Ethernet drivers. Add the lookup table which converts an 8/10-bit hardware packet type into a parsed bitfield structure for easy checking of packet format parameters, such as payload level, IP version, etc. This is currently used by i40e, ice and iavf, and it's all the same in all three drivers. The only difference introduced in this implementation is that instead of defining a 256 (or 1024 in the case of ice) element array, add an unlikely() condition to limit the input to 154 (the current maximum non-reserved packet type). There's no reason to waste 600 (or even 3600) bytes only to not hurt very unlikely exception packets. The hash computation function now takes the payload level directly as a pkt_hash_type. There are a couple of cases when non-IP ptypes are marked as L3 payload, and in the previous versions their hash level would be 2, not 3. But skb_set_hash() only sees the difference between L4 and non-L4, thus this won't change anything at all. The module is behind a hidden Kconfig symbol, which the drivers will select when needed. The exports are behind the 'LIBIE' namespace to limit the scope of the functions. Note that non-HW-specific symbols will live in yet another module, libeth. This is done to easily distinguish pretty generic code ready for reuse by any other vendor and/or for moving the layer up from the code useful in Intel's 1-100G drivers only.

Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
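A sketch of the bounded lookup described above (table and helper names assumed):

  /* 154 valid hardware ptypes; everything above is reserved. */
  if (unlikely(ptype >= LIBIE_RX_PT_NUM))
          return;  /* leave the skb hash/csum fields untouched */

  pt = libie_rx_pt_lut[ptype];
  skb_set_hash(skb, rx_hash, libeth_rx_pt_gen_hash_type(pt));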
2024-04-24  net: sparx5: add support for matchall mirror stats  (Daniel Machon; 3 files, -0/+67)

Add support for tc matchall mirror stats. When a new matchall mirror rule is added, the baseline stats for that port are saved.

Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Steen Hegelund <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-04-24  net: sparx5: add the tc glue to support port mirroring  (Daniel Machon; 1 file, -2/+36)

Add the necessary tc glue to add and delete mirror rules through tc matchall.

Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Steen Hegelund <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-04-24  net: sparx5: add port mirroring implementation  (Daniel Machon; 3 files, -1/+212)

The hardware supports three independent mirroring probes. Each probe can be configured to mirror rx or tx traffic (direction). Using tc matchall, it is now possible to add a source port and a monitor port to a mirror probe. Depending on the mirror direction, rx or tx traffic from a source port will be mirrored to the monitor port. A single source port can be a member of multiple mirror probes.

Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Steen Hegelund <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-04-24  net: sparx5: add bookkeeping code for matchall rules  (Daniel Machon; 3 files, -8/+67)

In preparation for new tc matchall rules, we add a bit of bookkeeping code to keep track of them. The rules are identified by the cookie passed from the tc stack.

Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Steen Hegelund <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-04-24  net: sparx5: add new register definitions  (Daniel Machon; 1 file, -0/+68)

In preparation for port mirroring support through tc matchall, add the required register definitions.

Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Steen Hegelund <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-04-24  net: ibm/emac: allocate dummy net_device dynamically  (Breno Leitao; 2 files, -4/+12)

Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_device from the private struct by converting it into a pointer. Then leverage the new alloc_netdev_dummy() helper to allocate and initialize dummy devices.

[1] https://lore.kernel.org/all/[email protected]/

Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
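A sketch of the conversion, which applies equally to this and the two commits below (structure and poll-function names assumed):

  /* probe: the dummy netdev is now a pointer, allocated dynamically */
  priv->dummy_dev = alloc_netdev_dummy(0);
  if (!priv->dummy_dev)
          return -ENOMEM;
  netif_napi_add(priv->dummy_dev, &priv->napi, my_poll);

  /* remove: */
  free_netdev(priv->dummy_dev);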
2024-04-24  net: mediatek: mtk_eth_soc: allocate dummy net_device dynamically  (Breno Leitao; 2 files, -5/+14)

Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_device from the private struct by converting it into a pointer. Then leverage the new alloc_netdev_dummy() helper to allocate and initialize dummy devices.

[1] https://lore.kernel.org/all/[email protected]/

Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-04-24  net: marvell: prestera: allocate dummy net_device dynamically  (Breno Leitao; 1 file, -3/+12)

Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_device from the private struct by converting it into a pointer. Then leverage the new alloc_netdev_dummy() helper to allocate and initialize dummy devices.

[1] https://lore.kernel.org/all/[email protected]/

Signed-off-by: Breno Leitao <[email protected]>
Acked-by: Elad Nachman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-04-23  net: stmmac: Move MAC caps init to phylink MAC caps getter  (Serge Semin; 1 file, -19/+17)

After a set of recent fixes, the stmmac_phy_setup() and stmmac_reinit_queues() methods have ended up with some duplicated code. Let's get rid of the duplication by moving the MAC-capabilities initialization to the PHYLINK MAC-capabilities getter. The getter is called during each network device interface open/close cycle, so the MAC capabilities will be initialized in the generic device open procedure and in case of Tx/Rx queue re-initialization, as the original code semantics implies.

Signed-off-by: Serge Semin <[email protected]>
Reviewed-by: Romain Gantois <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
2024-04-23  net: stmmac: Rename phylink_get_caps() callback to update_caps()  (Serge Semin; 3 files, -11/+11)

Since recent commits the stmmac_ops::phylink_get_caps() callback has no longer been responsible for getting the phylink MAC capabilities, but merely updates the MAC capabilities in the mac_device_info::link::caps field. Rename the callback to match what the method does now.

Signed-off-by: Serge Semin <[email protected]>
Reviewed-by: Romain Gantois <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
2024-04-23  net: ethernet: ti: am65-cpsw/ethtool: Enable RX HW timestamp only for PTP packets  (Chintan Vankar; 4 files, -57/+35)

In the current mechanism of timestamping, the am65-cpsw-nuss driver enables hardware timestamping for all received packets by setting the TSTAMP_EN bit in the CPTS_CONTROL register, which directs the CPTS module to timestamp all received packets, followed by passing the timestamp via DMA descriptors. This mechanism causes the CPSW port to lock up. To prevent the port lockup, don't enable RX packet timestamping by setting the TSTAMP_EN bit in the CPTS_CONTROL register. The workaround for timestamping received packets is to utilize the CPTS Event FIFO, which records timestamps corresponding to certain events. The CPTS module is configured to generate timestamps for Multicast Ethernet, UDP/IPv4 and UDP/IPv6 PTP packets. Update the supported hwtstamp_rx_filters values for CPSW's timestamping capability.

Fixes: b1f66a5bee07 ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support")
Signed-off-by: Chintan Vankar <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
2024-04-23  net: ethernet: ti: am65-cpts: Enable RX HW timestamp for PTP packets using CPTS FIFO  (Chintan Vankar; 2 files, -7/+83)

Add a new function, am65_cpts_rx_timestamp(), which identifies PTP packets from the header and timestamps them. Add another function, am65_cpts_find_rx_ts(), which finds the CPTS FIFO event to get the timestamp of the received PTP packet.

Signed-off-by: Chintan Vankar <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
2024-04-22  net: ethernet: mtk_eth_soc: flower: validate control flags  (Asbjørn Sloth Tønnesen; 1 file, -0/+4)

This driver currently doesn't support any control flags. Use flow_rule_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-22  dpaa2-switch: flower: validate control flags  (Asbjørn Sloth Tønnesen; 1 file, -0/+6)

This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-22  cxgb4: flower: validate control flags  (Asbjørn Sloth Tønnesen; 1 file, -0/+3)

This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested; no hardware available.

Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-22  net/mlx5e: Implement ethtool callbacks for supporting per-queue coalescing  (Rahul Rameshbabu; 3 files, -0/+152)

Use mlx5 on-the-fly coalescing configuration support to enable individual channel configuration.

Co-developed-by: Nabil S. Alramli <[email protected]>
Signed-off-by: Nabil S. Alramli <[email protected]>
Co-developed-by: Joe Damato <[email protected]>
Signed-off-by: Joe Damato <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
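A sketch of the ethtool glue (the mlx5e callback names are assumed); the ethtool core routes `ethtool --per-queue <dev> queue_mask <mask> --coalesce ...` to these ops:

  static const struct ethtool_ops mlx5e_ethtool_ops = {
          /* ... */
          .get_per_queue_coalesce = mlx5e_get_per_queue_coalesce,
          .set_per_queue_coalesce = mlx5e_set_per_queue_coalesce,
  };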
2024-04-22  net/mlx5e: Support updating coalescing configuration without resetting channels  (Rahul Rameshbabu; 10 files, -165/+454)

When the CQE mode or DIM state is changed, gracefully reconfigure channels to handle the new configuration. Previously, the driver would create new channels reflecting the changes rather than update the original channels.

Co-developed-by: Nabil S. Alramli <[email protected]>
Signed-off-by: Nabil S. Alramli <[email protected]>
Co-developed-by: Joe Damato <[email protected]>
Signed-off-by: Joe Damato <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-22  net/mlx5e: Dynamically allocate DIM structure for SQs/RQs  (Rahul Rameshbabu; 4 files, -12/+31)

Make it possible for the DIM structure to be torn down while an SQ or RQ is still active. Changing the CQ period mode is an example where the previous sampling done with the DIM structure would need to be invalidated.

Co-developed-by: Nabil S. Alramli <[email protected]>
Signed-off-by: Nabil S. Alramli <[email protected]>
Co-developed-by: Joe Damato <[email protected]>
Signed-off-by: Joe Damato <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-22  net/mlx5e: Use DIM constants for CQ period mode parameter  (Rahul Rameshbabu; 5 files, -52/+51)

Use core DIM CQ period mode enum values for the CQ parameter for the period mode. Translate the value to the specific mlx5 device constant for the selected period mode when creating a CQ. Avoid needing to translate mlx5 device constants to DIM constants for core DIM functionality.

Co-developed-by: Nabil S. Alramli <[email protected]>
Signed-off-by: Nabil S. Alramli <[email protected]>
Co-developed-by: Joe Damato <[email protected]>
Signed-off-by: Joe Damato <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-22  net/mlx5e: Move DIM function declarations to en/dim.h  (Rahul Rameshbabu; 5 files, -3/+18)

Create a header specifically for DIM-related declarations. Move existing DIM-specific functionality from en.h. Future DIM-related functionality will be declared in en/dim.h in subsequent patches.

Co-developed-by: Nabil S. Alramli <[email protected]>
Signed-off-by: Nabil S. Alramli <[email protected]>
Co-developed-by: Joe Damato <[email protected]>
Signed-off-by: Joe Damato <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-22  net: hns3: Remove io_stop_wc() calls after __iowrite64_copy()  (Jason Gunthorpe; 1 file, -4/+0)

Now that the ARM64 arch implementation does the DGH as part of __iowrite64_copy() there is no reason to open code this in drivers.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jijie Shao <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2024-04-22  ice: Add tx_scheduling_layers devlink param  (Lukasz Czapnik; 6 files, -10/+191)

It was observed that Tx performance was inconsistent across all queues and/or VSIs and that it was directly connected to the existing 9-layer topology of the Tx scheduler. Introduce a new private devlink param - tx_scheduling_layers. This parameter gives the user the flexibility to choose the 5-layer transmit scheduler topology, which helps to smooth out the transmit performance. Allowed parameter values are 5 and 9.

Example usage:

Show:
  devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
  pci/0000:4b:00.0:
    name tx_scheduling_layers type driver-specific
    values:
      cmode permanent value 9

Set:
  devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5 cmode permanent
  devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 9 cmode permanent

Signed-off-by: Lukasz Czapnik <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Co-developed-by: Mateusz Polchlopek <[email protected]>
Signed-off-by: Mateusz Polchlopek <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2024-04-22  ice: Enable switching default Tx scheduler topology  (Michal Wilczynski; 1 file, -19/+89)

Introduce support for Tx scheduler topology change, based on user selection, from the default 9-layer to a 5-layer topology. The change requires NVM (version 3.20 or newer) and DDP package (OS Package 1.3.30 or newer - available for over a year in linux-firmware, since its commit aed71f296637 ("ice: Update package to 1.3.30.0")):

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=aed71f296637

Enable the 5-layer topology switch in the init path of the driver. To accomplish that, the upload of the DDP package needs to be delayed until the change in Tx topology is finished. To trigger the Tx change, the user selection should be changed in the NVM using devlink. Then the platform should be rebooted.

Signed-off-by: Michal Wilczynski <[email protected]>
Co-developed-by: Mateusz Polchlopek <[email protected]>
Signed-off-by: Mateusz Polchlopek <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2024-04-22  ice: Adjust the VSI/Aggregator layers  (Raj Victor; 1 file, -18/+19)

Adjust the VSI/Aggregator layers based on the number of logical layers supported by the FW. Currently the VSI and Aggregator layers are fixed, based on the 9-layer scheduler tree layout. For performance reasons the number of layers of the scheduler tree is changing from 9 to 5, which requires readjusting these VSI/Aggregator layer values.

Signed-off-by: Raj Victor <[email protected]>
Co-developed-by: Michal Wilczynski <[email protected]>
Signed-off-by: Michal Wilczynski <[email protected]>
Signed-off-by: Mateusz Polchlopek <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>