aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice
AgeCommit message (Collapse)AuthorFilesLines
2022-06-30ice: Add support for double VLAN in switchdevMartyna Szapar-Mudlaw4-2/+275
Enable support for adding TC rules with both C-tag and S-tag that can filter on the inner and outer VLAN in QinQ for basic packets (not tunneled cases). Signed-off-by: Wiktor Pilarczyk <[email protected]> Signed-off-by: Martyna Szapar-Mudlaw <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-7/+89
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-06-21ice: ethtool: Prohibit improper channel config for DCBAnatolii Gerasymenko2-5/+47
Do not allow setting less channels, than Traffic Classes there are via ethtool. There must be at least one channel per Traffic Class. If you set less channels, than Traffic Classes there are, then during ice_vsi_rebuild there would be allocated only the requested amount of tx/rx rings in ice_vsi_alloc_arrays. But later in ice_vsi_setup_q_map there would be requested at least one channel per Traffic Class. This results in setting num_rxq > alloc_rxq and num_txq > alloc_txq. Later, there would be a NULL pointer dereference in ice_vsi_map_rings_to_vectors, because we go beyond of rx_rings or tx_rings arrays. Change ice_set_channels() to return error if you try to allocate less channels, than Traffic Classes there are. Change ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() to return status code instead of void. Add error handling for ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() in ice_vsi_init() and ice_vsi_cfg_tc(). [53753.889983] INFO: Flow control is disabled for this traffic class (0) on this vsi. [53763.984862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 [53763.992915] PGD 14b45f5067 P4D 0 [53763.996444] Oops: 0002 [#1] SMP NOPTI [53764.000312] CPU: 12 PID: 30661 Comm: ethtool Kdump: loaded Tainted: GOE --------- - - 4.18.0-240.el8.x86_64 #1 [53764.011825] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P21.2012150710 12/15/2020 [53764.022584] RIP: 0010:ice_vsi_map_rings_to_vectors+0x7e/0x120 [ice] [53764.029089] Code: 41 0d 0f b7 b7 12 05 00 00 0f b6 d0 44 29 de 44 0f b7 c6 44 01 c2 41 39 d0 7d 2d 4c 8b 47 28 44 0f b7 ce 83 c6 01 4f 8b 04 c8 <49> 89 48 28 4 c 8b 89 b8 01 00 00 4d 89 08 4c 89 81 b8 01 00 00 44 [53764.048379] RSP: 0018:ff550dd88ea47b20 EFLAGS: 00010206 [53764.053884] RAX: 0000000000000002 RBX: 0000000000000004 RCX: ff385ea42fa4a018 [53764.061301] RDX: 0000000000000006 RSI: 0000000000000005 RDI: ff385e9baeedd018 [53764.068717] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000004 [53764.076133] R10: 0000000000000002 R11: 0000000000000004 R12: 0000000000000000 [53764.083553] R13: 0000000000000000 R14: ff385e658fdd9000 R15: ff385e9baeedd018 [53764.090976] FS: 000014872c5b5740(0000) GS:ff385e847f100000(0000) knlGS:0000000000000000 [53764.099362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [53764.105409] CR2: 0000000000000028 CR3: 0000000a820fa002 CR4: 0000000000761ee0 [53764.112851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [53764.120301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [53764.127747] PKRU: 55555554 [53764.130781] Call Trace: [53764.133564] ice_vsi_rebuild+0x611/0x870 [ice] [53764.138341] ice_vsi_recfg_qs+0x94/0x100 [ice] [53764.143116] ice_set_channels+0x1a8/0x3e0 [ice] [53764.147975] ethtool_set_channels+0x14e/0x240 [53764.152667] dev_ethtool+0xd74/0x2a10 [53764.156665] ? __mod_lruvec_state+0x44/0x110 [53764.161280] ? __mod_lruvec_state+0x44/0x110 [53764.165893] ? page_add_file_rmap+0x15/0x170 [53764.170518] ? inet_ioctl+0xd1/0x220 [53764.174445] ? netdev_run_todo+0x5e/0x290 [53764.178808] dev_ioctl+0xb5/0x550 [53764.182485] sock_do_ioctl+0xa0/0x140 [53764.186512] sock_ioctl+0x1a8/0x300 [53764.190367] ? selinux_file_ioctl+0x161/0x200 [53764.195090] do_vfs_ioctl+0xa4/0x640 [53764.199035] ksys_ioctl+0x60/0x90 [53764.202722] __x64_sys_ioctl+0x16/0x20 [53764.206845] do_syscall_64+0x5b/0x1a0 [53764.210887] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 87324e747fde ("ice: Implement ethtool ops for channels") Signed-off-by: Anatolii Gerasymenko <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-06-21ice: ethtool: advertise 1000M speeds properlyAnatolii Gerasymenko1-1/+38
In current implementation ice_update_phy_type enables all link modes for selected speed. This approach doesn't work for 1000M speeds, because both copper (1000baseT) and optical (1000baseX) standards cannot be enabled at once. Fix this, by adding the function `ice_set_phy_type_from_speed()` for 1000M speeds. Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Anatolii Gerasymenko <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-06-21ice: Fix switchdev rules book keepingWojciech Drewek1-0/+1
Adding two filters with same matching criteria ends up with one rule in hardware with act = ICE_FWD_TO_VSI_LIST. In order to remove them properly we have to keep the information about vsi handle which is used in VSI bitmap (ice_adv_fltr_mgmt_list_entry::vsi_list_info::vsi_map). Fixes: 0d08a441fb1a ("ice: ndo_setup_tc implementation for PF") Reported-by: Sridhar Samudrala <[email protected]> Signed-off-by: Wojciech Drewek <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-21ice: ignore protocol field in GTP offloadMarcin Szycik1-1/+3
Commit 34a897758efe ("ice: Add support for inner etype in switchdev") added the ability to match on inner ethertype. A side effect of that change is that it is now impossible to add some filters for protocols which do not contain inner ethtype field. tc requires the protocol field to be specified when providing certain other options, e.g. src_ip. This is a problem in case of GTP - when user wants to specify e.g. src_ip, they also need to specify protocol in tc command (otherwise tc fails with: Illegal "src_ip"). Because GTP is a tunnel, the protocol field is treated as inner protocol. GTP does not contain inner ethtype field and the filter cannot be added. To fix this, ignore the ethertype field in case of GTP filters. Fixes: 9a225f81f540 ("ice: Support GTP-U and GTP-C offload in switchdev") Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-46/+94
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-06-14ice: Fix memory corruption in VF driverPrzemyslaw Patynowski1-0/+5
Disable VF's RX/TX queues, when it's disabled. VF can have queues enabled, when it requests a reset. If PF driver assumes that VF is disabled, while VF still has queues configured, VF may unmap DMA resources. In such scenario device still can map packets to memory, which ends up silently corrupting it. Previously, VF driver could experience memory corruption, which lead to crash: [ 5119.170157] BUG: unable to handle kernel paging request at 00001b9780003237 [ 5119.170166] PGD 0 P4D 0 [ 5119.170173] Oops: 0002 [#1] PREEMPT_RT SMP PTI [ 5119.170181] CPU: 30 PID: 427592 Comm: kworker/u96:2 Kdump: loaded Tainted: G W I --------- - - 4.18.0-372.9.1.rt7.166.el8.x86_64 #1 [ 5119.170189] Hardware name: Dell Inc. PowerEdge R740/014X06, BIOS 2.3.10 08/15/2019 [ 5119.170193] Workqueue: iavf iavf_adminq_task [iavf] [ 5119.170219] RIP: 0010:__page_frag_cache_drain+0x5/0x30 [ 5119.170238] Code: 0f 0f b6 77 51 85 f6 74 07 31 d2 e9 05 df ff ff e9 90 fe ff ff 48 8b 05 49 db 33 01 eb b4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 f6 c4 80 74 0f 0f b6 77 51 85 f6 74 [ 5119.170244] RSP: 0018:ffffa43b0bdcfd78 EFLAGS: 00010282 [ 5119.170250] RAX: ffffffff896b3e40 RBX: ffff8fb282524000 RCX: 0000000000000002 [ 5119.170254] RDX: 0000000049000000 RSI: 0000000000000000 RDI: 00001b9780003203 [ 5119.170259] RBP: ffff8fb248217b00 R08: 0000000000000022 R09: 0000000000000009 [ 5119.170262] R10: 2b849d6300000000 R11: 0000000000000020 R12: 0000000000000000 [ 5119.170265] R13: 0000000000001000 R14: 0000000000000009 R15: 0000000000000000 [ 5119.170269] FS: 0000000000000000(0000) GS:ffff8fb1201c0000(0000) knlGS:0000000000000000 [ 5119.170274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5119.170279] CR2: 00001b9780003237 CR3: 00000008f3e1a003 CR4: 00000000007726e0 [ 5119.170283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5119.170286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5119.170290] PKRU: 55555554 [ 5119.170292] Call Trace: [ 5119.170298] iavf_clean_rx_ring+0xad/0x110 [iavf] [ 5119.170324] iavf_free_rx_resources+0xe/0x50 [iavf] [ 5119.170342] iavf_free_all_rx_resources.part.51+0x30/0x40 [iavf] [ 5119.170358] iavf_virtchnl_completion+0xd8a/0x15b0 [iavf] [ 5119.170377] ? iavf_clean_arq_element+0x210/0x280 [iavf] [ 5119.170397] iavf_adminq_task+0x126/0x2e0 [iavf] [ 5119.170416] process_one_work+0x18f/0x420 [ 5119.170429] worker_thread+0x30/0x370 [ 5119.170437] ? process_one_work+0x420/0x420 [ 5119.170445] kthread+0x151/0x170 [ 5119.170452] ? set_kthread_struct+0x40/0x40 [ 5119.170460] ret_from_fork+0x35/0x40 [ 5119.170477] Modules linked in: iavf sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nfp tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common isst_if_common skx_edac irdma nfit libnvdimm x86_pkg_temp_thermal i40e intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ib_uverbs rapl ipmi_ssif intel_cstate intel_uncore mei_me pcspkr acpi_ipmi ib_core mei lpc_ich i2c_i801 ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ice ahci drm libahci crc32c_intel libata tg3 megaraid_sas [ 5119.170613] i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: iavf] [ 5119.170627] CR2: 00001b9780003237 Fixes: ec4f5a436bdf ("ice: Check if VF is disabled for Opcode and other operations") Signed-off-by: Przemyslaw Patynowski <[email protected]> Co-developed-by: Slawomir Laba <[email protected]> Signed-off-by: Slawomir Laba <[email protected]> Signed-off-by: Mateusz Palczewski <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-14ice: Fix queue config fail handlingPrzemyslaw Patynowski1-27/+26
Disable VF's RX/TX queues, when VIRTCHNL_OP_CONFIG_VSI_QUEUES fail. Not disabling them might lead to scenario, where PF driver leaves VF queues enabled, when VF's VSI failed queue config. In this scenario VF should not have RX/TX queues enabled. If PF failed to set up VF's queues, VF will reset due to TX timeouts in VF driver. Initialize iterator 'i' to -1, so if error happens prior to configuring queues then error path code will not disable queue 0. Loop that configures queues will is using same iterator, so error path code will only disable queues that were configured. Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Suggested-by: Slawomir Laba <[email protected]> Signed-off-by: Przemyslaw Patynowski <[email protected]> Signed-off-by: Mateusz Palczewski <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-14ice: Sync VLAN filtering features for DVMRoman Storozhenko1-18/+31
VLAN filtering features, that is C-Tag and S-Tag, in DVM mode must be both enabled or disabled. In case of turning off/on only one of the features, another feature must be turned off/on automatically with issuing an appropriate message to the kernel log. Fixes: 1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev") Signed-off-by: Roman Storozhenko <[email protected]> Co-developed-by: Anatolii Gerasymenko <[email protected]> Signed-off-by: Anatolii Gerasymenko <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-06-14ice: Fix PTP TX timestamp offset calculationMichal Michalik2-1/+32
The offset was being incorrectly calculated for E822 - that led to collisions in choosing TX timestamp register location when more than one port was trying to use timestamping mechanism. In E822 one quad is being logically split between ports, so quad 0 is having trackers for ports 0-3, quad 1 ports 4-7 etc. Each port should have separate memory location for tracking timestamps. Due to error for example ports 1 and 2 had been assigned to quad 0 with same offset (0), while port 1 should have offset 0 and 1 offset 16. Fix it by correctly calculating quad offset. Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support") Signed-off-by: Michal Michalik <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-06-10ethernet: Remove vf rate limit check for driversBin Chen1-10/+0
The commit a14857c27a50 ("rtnetlink: verify rate parameters for calls to ndo_set_vf_rate") has been merged to master, so we can to remove the now-duplicate checks in drivers. Signed-off-by: Bin Chen <[email protected]> Signed-off-by: Baowen Zheng <[email protected]> Signed-off-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-06-10Merge branch '10GbE' of ↵Jakub Kicinski1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 10GbE Intel Wired LAN Driver Updates 2022-06-09 Maximilian Heyne adds reporting of VF statistics on ixgbe via iproute2 interface. Kai-Heng Feng removes duplicate defines from igb. Jiaqing Zhao fixes typos in e1000, ixgb, and ixgbe drivers. Julia Lawall fixes typos for fm10k, ixgbe, and ice drivers. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: drivers/net/ethernet/intel: fix typos in comments ixgbe: Fix typos in comments ixgb: Fix typos in comments e1000: Fix typos in comments igb: Remove duplicate defines drivers, ixgbe: export vf statistics ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-06-09drivers/net/ethernet/intel: fix typos in commentsJulia Lawall1-1/+1
Spelling mistakes (triple letters) in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-08ice: Use correct order for the parameters of devm_kcalloc()Christophe JAILLET1-2/+2
We should have 'n', then 'size', not the opposite. This is harmless because the 2 values are just multiplied, but having the correct order silence a (unpublished yet) smatch warning. While at it use '*tun_seg' instead '*seg'. The both variable have the same type, so the result is the same, but it lokks more logical. Signed-off-by: Christophe JAILLET <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-06-08ice: remove u16 arithmetic in ice_gnssKarol Kolacinski1-5/+6
Change u16 to unsigned int where arithmetic occurs. Signed-off-by: Karol Kolacinski <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-06-08ice: remove VLAN representor specific opsMichal Swiatkowski2-40/+7
In switchdev mode VF VLAN caps will not be set there is no need to have specific VLAN ops for representor that only returns not supported error. As VLAN configuration commands will be blocked, the VF driver can't disable VLAN stripping at initialization. It leads to the situation when VLAN stripping on VF VSI is on, but in kernel it is off. To prevent this, disable VLAN stripping in VSI initialization. It doesn't break other usecases, because it is set according to kernel settings. Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-08ice: don't set VF VLAN caps in switchdevMichal Swiatkowski1-27/+50
In switchdev mode any VLAN manipulation from VF side isn't allowed. In order to prevent parsing VLAN commands don't set VF VLAN caps. This will result in removing VLAN specific opcodes from allowlist. If VF send any VLAN specific opcode PF driver will answer with not supported error. With this approach VF driver know that VLAN caps aren't supported so it shouldn't send VLAN specific opcodes. Thanks to that, some ugly errors will not show up in dmesg (ex. on creating VFs in switchdev mode there are errors about not supported VLAN insertion and stripping) Move setting VLAN caps to separate function, including switchdev mode specific code. Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-02ice: fix access-beyond-end in the switch codeAlexander Lobakin4-139/+115
Global `-Warray-bounds` enablement revealed some problems, one of which is the way we define and use AQC rules messages. In fact, they have a shared header, followed by the actual message, which can be of one of several different formats. So it is straightforward enough to define that header as a separate struct and then embed it into message structures as needed, but currently all the formats reside in one union coupled with the header. Then, the code allocates only the memory needed for a particular message format, leaving the union potentially incomplete. There are no actual reads or writes beyond the end of an allocated chunk, but at the same time, the whole implementation is fragile and backed by an equilibrium rather than strong type and memory checks. Define the structures the other way around: one for the common header and the rest for the actual formats with the header embedded. There are no places where several union members would be used at the same time anyway. This allows to use proper struct_size() and let the compiler know what is going to be done. Finally, unsilence `-Warray-bounds` back for ice_switch.c. Other little things worth mentioning: * &ice_sw_rule_vsi_list_query is not used anywhere, remove it. It's weird anyway to talk to hardware with purely kernel types (bitmaps); * expand the ICE_SW_RULE_*_SIZE() macros to pass a structure variable name to struct_size() to let it do strict typechecking; * rename ice_sw_rule_lkup_rx_tx::hdr to ::hdr_data to keep ::hdr for the header structure to have the same name for it constistenly everywhere; * drop the duplicate of %ICE_SW_RULE_RX_TX_NO_HDR_SIZE residing in ice_switch.h. Fixes: 9daf8208dd4d ("ice: Add support for switch filter programming") Fixes: 66486d8943ba ("ice: replace single-element array used for C struct hack") Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Marcin Szycik <[email protected]> Acked-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-05-22eth: ice: silence the GCC 12 array-bounds warningJakub Kicinski1-0/+5
GCC 12 gets upset because driver allocates partial struct ice_aqc_sw_rules_elem buffers. The writes are within bounds. Silence these warnings for now, our build bot runs GCC 12 so we won't allow any new instances. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-18/+35
drivers/net/ethernet/mellanox/mlx5/core/main.c b33886971dbc ("net/mlx5: Initialize flow steering during driver probe") 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support") f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table") https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/ 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") https://lore.kernel.org/all/[email protected]/ tools/testing/selftests/net/mptcp/mptcp_join.sh e274f7154008 ("selftests: mptcp: add subflow limits test-cases") b6e074e171bc ("selftests: mptcp: add infinite map testcase") 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type") https://lore.kernel.org/all/[email protected]/ net/mptcp/options.c ba2c89e0ea74 ("mptcp: fix checksum byte order") 1e39e5a32ad7 ("mptcp: infinite mapping sending") ea66758c1795 ("tcp: allow MPTCP to update the announced window") https://lore.kernel.org/all/[email protected]/ net/mptcp/pm.c 95d686517884 ("mptcp: fix subflow accounting on close") 4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") https://lore.kernel.org/all/[email protected]/ net/mptcp/subflow.c ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure") 0348c690ed37 ("mptcp: add the fallback check") f8d4bcacff3b ("mptcp: infinite mapping receiving") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-17ice: Fix interrupt moderation settings getting clearedMichal Wilczynski2-11/+16
Adaptive-rx and Adaptive-tx are interrupt moderation settings that can be enabled/disabled using ethtool: ethtool -C ethX adaptive-rx on/off adaptive-tx on/off Unfortunately those settings are getting cleared after changing number of queues, or in ethtool world 'channels': ethtool -L ethX rx 1 tx 1 Clearing was happening due to introduction of bit fields in ice_ring_container struct. This way only itr_setting bits were rebuilt during ice_vsi_rebuild_set_coalesce(). Introduce an anonymous struct of bitfields and create a union to refer to them as a single variable. This way variable can be easily saved and restored. Fixes: 61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild") Signed-off-by: Michal Wilczynski <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-05-17ice: fix possible under reporting of ethtool Tx and Rx statisticsPaul Greenwalt1-3/+4
The hardware statistics counters are not cleared during resets so the drivers first access is to initialize the baseline and then subsequent reads are for reporting the counters. The statistics counters are read during the watchdog subtask when the interface is up. If the baseline is not initialized before the interface is up, then there can be a brief window in which some traffic can be transmitted/received before the initial baseline reading takes place. Directly initialize ethtool statistics in driver open so the baseline will be initialized when the interface is up, and any dropped packets incremented before the interface is up won't be reported. Fixes: 28dc1b86f8ea9 ("ice: ignore dropped packets during init") Signed-off-by: Paul Greenwalt <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-05-17ice: fix crash when writing timestamp on RX ringsArkadiusz Kubalewski1-4/+15
Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: 77a781155a65 ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Michal Schmidt <[email protected]> Tested-by: Dave Cain <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-05-13ice: Expose RSS indirection tables for queue groups via ethtoolSridhar Samudrala1-18/+51
When ADQ queue groups (TCs) are created via tc mqprio command, RSS contexts and associated RSS indirection tables are configured automatically per TC based on the queue ranges specified for each traffic class. For ex: tc qdisc add dev enp175s0f0 root mqprio num_tc 3 map 0 1 2 \ queues 2@0 8@2 4@10 hw 1 mode channel will create 3 queue groups (TC 0-2) with queue ranges 2, 8 and 4 in 3 queue groups. Each queue group is associated with its own RSS context and RSS indirection table. Add support to expose RSS indirection tables for all ADQ queue groups using ethtool RSS contexts interface. ethtool -x enp175s0f0 context <tc-num> Signed-off-by: Sridhar Samudrala <[email protected]> Signed-off-by: Sudheer Mogilappagari <[email protected]> Tested-by: Bharathi Sreenivas <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-28/+78
No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-09Merge branch '100GbE' of ↵Jakub Kicinski2-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-05-06 Marcin Szycik says: This patchset adds support for systemd defined naming scheme for port representors, as well as re-enables displaying PCI bus-info in ethtool. bus-info information has previously been removed from ethtool for port representors, as a workaround for a bug in lshw tool, where the tool would sometimes display wrong descriptions for port representors/PF. Now the bug has been fixed in lshw tool [1]. Removing the workaround can be considered a regression (user might be running an older, unpatched version of lshw) (see [2] for discussion). However, calling SET_NETDEV_DEV also produces the same effect as removing the workaround, i.e. lshw is able to access PCI bus-info (this time not via ethtool, but in some other way) and the bug can occur. Adding SET_NETDEV_DEV is important, as it greatly improves netdev naming - - port representors are named based on PF name. Currently port representors are named "ethX", which might be confusing, especially when spawning VFs on multiple PFs. Furthermore, it's currently harder to determine to which PF does a particular port representor belong, as bus-info is not shown in ethtool. Consider the following three cases: Case 1: current code - driver workaround in place, no SET_NETDEV_DEV, lshw with or without fix. Port representors are not displayed because they don't have bus-info (the workaround), PFs are labelled correctly: $ sudo ./lshw -c net -businfo Bus info Device Class Description ======================================================== pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP <-- PF pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function <-- VF pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... Case 2: driver workaround in place, SET_NETDEV_DEV, no lshw fix. Port representors have predictable names. lshw is able to get bus-info because of SET_NETDEV_DEV and netdevs CAN be mislabelled: $ sudo ./lshw -c net -businfo Bus info Device Class Description ============================================================= pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP <-- mislabeled port representor pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... pci@0000:02:00.0 ens6f0npf0vf26 network Ethernet interface pci@0000:02:00.0 ens6f0 network Ethernet interface <-- mislabeled PF pci@0000:02:00.0 ens6f0npf0vf81 network Ethernet interface ... $ sudo ethtool -i ens6f0npf0vf60 driver: ice ... bus-info: ... Output of lshw would be the same with workaround removed; it does not change the fact that lshw labels netdevs incorrectly, while at the same time it prevents ethtool from displaying potentially useful data (bus-info). Case 3: workaround removed, SET_NETDEV_DEV, lshw fix: $ sudo ./lshw -c net -businfo Bus info Device Class Description ============================================================= pci@0000:02:00.0 ens6f0npf0vf73 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... pci@0000:02:00.0 ens6f0npf0vf5 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP ... $ sudo ethtool -i ens6f0npf0vf73 driver: ice ... bus-info: 0000:02:00.0 ... In this case poort representors have predictable names, netdevs are not mislabelled in lshw, and bus-info is shown in ethtool. [1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1 [2] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/[email protected] * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode" ice: link representors to PCI device ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-09rtnetlink: add extack support in fdb del handlersAlaa Mohamed1-1/+2
Add extack support to .ndo_fdb_del in netdevice.h and all related methods. Signed-off-by: Alaa Mohamed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-06ice: fix PTP stale Tx timestamps cleanupMichal Michalik1-2/+8
Read stale PTP Tx timestamps from PHY on cleanup. After running out of Tx timestamps request handlers, hardware (HW) stops reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used to only clean up stale handlers in driver and was leaving the hardware registers not read. Not reading stale PTP Tx timestamps prevents next interrupts from arriving and makes timestamping unusable. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Michal Michalik <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-05-06ice: clear stale Tx queue settings before configuringAnatolii Gerasymenko1-18/+50
The iAVF driver uses 3 virtchnl op codes to communicate with the PF regarding the VF Tx queues: * VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware logic for the Tx queues * VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts * VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings. There is a bug in the iAVF driver due to the race condition between VF reset request and shutdown being executed in parallel. This leads to a break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent. If this occurs, the PF driver never cleans up the Tx queues. This results in leaving behind stale Tx queue settings in the hardware and firmware. The most obvious outcome is that upon the next VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx scheduler node due to a lack of space. We need to protect ICE driver against such situation. To fix this, make sure we clear existing stale settings out when handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the previous settings. Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if the queue is not configured. The function already handles the case when the Tx queue is not currently configured and exits with a 0 return in that case. Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Anatolii Gerasymenko <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-06ice: Fix race during aux device (un)pluggingIvan Vecera3-8/+20
Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] <TASK> [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] </TASK> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Dave Ertman <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-05-06Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode"Marcin Szycik1-5/+3
This reverts commit bfaaba99e680bf82bf2cbf69866c3f37434ff766. Commit bfaaba99e680 ("ice: Hide bus-info in ethtool for PRs in switchdev mode") was a workaround for lshw tool displaying incorrect descriptions for port representors and PF in switchdev mode. Now the issue has been fixed in the lshw tool itself [1]. Removing the workaround can be considered a regression, as the user might be running older, unpatched lshw version. However, another important change (ice: link representors to PCI device, which improves port representor netdev naming with SET_NETDEV_DEV) also causes the same "regression" as removing the workaround, i.e. unpatched lshw is able to access bus-info information (this time not via ethtool) and the bug can occur. Therefore, the workaround no longer prevents the bug and can be removed. [1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1 Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-06ice: link representors to PCI deviceMichal Swiatkowski1-0/+1
Link port representors to parent PCI device to benefit from systemd defined naming scheme. Example from ip tool: - without linking: eth0 ... - with linking: eth0 ... altname enp24s0f0npf0vf0 The port representor name is being shown in altname, because the name is longer than IFNAMSIZ (16) limit. Altname can be used in ip tool. Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: remove period on argument description in ice_for_each_vfJacob Keller1-2/+2
The ice_for_each_vf macros have comments describing the implementation. One of the arguments has a period on the end, which is not our typical style. Remove the unnecessary period. Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: add a function comment for ice_cfg_mac_antispoofJacob Keller1-0/+7
This function definition was missing a comment describing its implementation. Add one. Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: fix wording in comment for ice_reset_vfJacob Keller1-2/+2
The comment explaining ice_reset_vf has an extraneous "the" with the "if the resets are disabled". Remove it. Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: remove return value comment for ice_reset_all_vfsJacob Keller1-2/+2
Since commit fe99d1c06c16 ("ice: make ice_reset_all_vfs void"), the ice_reset_all_vfs function has not returned anything. The function comment still indicated it did. Fix this. While here, also add a line to clarify the function resets all VFs at once in response to hardware resets such as a PF reset. Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: always check VF VSI pointer valuesJacob Keller6-7/+77
The ice_get_vf_vsi function can return NULL in some cases, such as if handling messages during a reset where the VSI is being removed and recreated. Several places throughout the driver do not bother to check whether this VSI pointer is valid. Static analysis tools maybe report issues because they detect paths where a potentially NULL pointer could be dereferenced. Fix this by checking the return value of ice_get_vf_vsi everywhere. Signed-off-by: Jacob Keller <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: add newline to dev_dbg in ice_vf_fdir_dump_infoJacob Keller1-1/+1
The debug print in ice_vf_fdir_dump_info does not end in newlines. This can look confusing when reading the kernel log, as the next print will immediately continue on the same line. Fix this by adding the forgotten newline. Signed-off-by: Jacob Keller <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: get switch id on switchdev devicesMichal Swiatkowski2-0/+37
Switch id should be the same for each netdevice on a driver. The id must be unique between devices on the same system, but does not need to be unique between devices on different systems. The switch id is used to locate ports on a switch and to know if aggregated ports belong to the same switch. To meet this requirements, use pci_get_dsn as switch id value, as this is unique value for each devices on the same system. Implementing switch id is needed by automatic tools for kubernetes. Set switch id by setting devlink port attribiutes and calling devlink_port_attrs_set while creating pf (for uplink) and vf (for representator) devlink port. To get switch id (in switchdev mode): cat /sys/class/net/$PF0/phys_switch_id Signed-off-by: Michal Swiatkowski <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: return ENOSPC when exceeding ICE_MAX_CHAIN_WORDSWojciech Drewek2-2/+4
When number of words exceeds ICE_MAX_CHAIN_WORDS, -ENOSPC should be returned not -EINVAL. Do not overwrite this error code in ice_add_tc_flower_adv_fltr. Signed-off-by: Wojciech Drewek <[email protected]> Suggested-by: Marcin Szycik <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: introduce common helper for retrieving VSI by vsi_numMaciej Fijalkowski3-35/+17
Both ice_idc.c and ice_virtchnl.c carry their own implementation of a helper function that is looking for a given VSI based on provided vsi_num. Their functionality is the same, so let's introduce the common function in ice.h that both of the mentioned sites will use. This is a strictly cleanup thing, no functionality is changed. Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-05ice: use min_t() to make code cleaner in ice_gnssWan Jiabing1-2/+1
Fix the following coccicheck warning: ./drivers/net/ethernet/intel/ice/ice_gnss.c:79:26-27: WARNING opportunity for min() Signed-off-by: Wan Jiabing <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-04-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-19/+13
include/linux/netdevice.h net/core/dev.c 6510ea973d8d ("net: Use this_cpu_inc() to increment net->core_stats") 794c24e9921f ("net-core: rx_otherhost_dropped to core_stats") https://lore.kernel.org/all/[email protected]/ drivers/net/wan/cosa.c d48fea8401cf ("net: cosa: fix error check return value of register_chrdev()") 89fbca3307d4 ("net: wan: remove support for COSA and SRP synchronous serial boards") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-04-27Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2-20/+34
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-04-27 We've added 85 non-merge commits during the last 18 day(s) which contain a total of 163 files changed, 4499 insertions(+), 1521 deletions(-). The main changes are: 1) Teach libbpf to enhance BPF verifier log with human-readable and relevant information about failed CO-RE relocations, from Andrii Nakryiko. 2) Add typed pointer support in BPF maps and enable it for unreferenced pointers (via probe read) and referenced ones that can be passed to in-kernel helpers, from Kumar Kartikeya Dwivedi. 3) Improve xsk to break NAPI loop when rx queue gets full to allow for forward progress to consume descriptors, from Maciej Fijalkowski & Björn Töpel. 4) Fix a small RCU read-side race in BPF_PROG_RUN routines which dereferenced the effective prog array before the rcu_read_lock, from Stanislav Fomichev. 5) Implement BPF atomic operations for RV64 JIT, and add libbpf parsing logic for USDT arguments under riscv{32,64}, from Pu Lehui. 6) Implement libbpf parsing of USDT arguments under aarch64, from Alan Maguire. 7) Enable bpftool build for musl and remove nftw with FTW_ACTIONRETVAL usage so it can be shipped under Alpine which is musl-based, from Dominique Martinet. 8) Clean up {sk,task,inode} local storage trace RCU handling as they do not need to use call_rcu_tasks_trace() barrier, from KP Singh. 9) Improve libbpf API documentation and fix error return handling of various API functions, from Grant Seltzer. 10) Enlarge offset check for bpf_skb_{load,store}_bytes() helpers given data length of frags + frag_list may surpass old offset limit, from Liu Jian. 11) Various improvements to prog_tests in area of logging, test execution and by-name subtest selection, from Mykola Lysenko. 12) Simplify map_btf_id generation for all map types by moving this process to build time with help of resolve_btfids infra, from Menglong Dong. 13) Fix a libbpf bug in probing when falling back to legacy bpf_probe_read*() helpers; the probing caused always to use old helpers, from Runqing Yang. 14) Add support for ARCompact and ARCv2 platforms for libbpf's PT_REGS tracing macros, from Vladimir Isaev. 15) Cleanup BPF selftests to remove old & unneeded rlimit code given kernel switched to memcg-based memory accouting a while ago, from Yafang Shao. 16) Refactor of BPF sysctl handlers to move them to BPF core, from Yan Zhu. 17) Fix BPF selftests in two occasions to work around regressions caused by latest LLVM to unblock CI until their fixes are worked out, from Yonghong Song. 18) Misc cleanups all over the place, from various others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Add libbpf's log fixup logic selftests libbpf: Fix up verifier log for unguarded failed CO-RE relos libbpf: Simplify bpf_core_parse_spec() signature libbpf: Refactor CO-RE relo human description formatting routine libbpf: Record subprog-resolved CO-RE relocations unconditionally selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests libbpf: Avoid joining .BTF.ext data with BPF programs by section name libbpf: Fix logic for finding matching program for CO-RE relocation libbpf: Drop unhelpful "program too large" guess libbpf: Fix anonymous type check in CO-RE logic bpf: Compute map_btf_id during build time selftests/bpf: Add test for strict BTF type check selftests/bpf: Add verifier tests for kptr selftests/bpf: Add C tests for kptr libbpf: Add kptr type tag macros to bpf_helpers.h bpf: Make BTF type match stricter for release arguments bpf: Teach verifier about kptr_get kfunc helpers bpf: Wire up freeing of referenced kptr bpf: Populate pairs of btf_id and destructor kfunc in btf bpf: Adapt copy_map_value for multiple offset case ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-04-26ice: fix use-after-free when deinitializing mailbox snapshotJacob Keller1-1/+1
During ice_sriov_configure, if num_vfs is 0, we are being asked by the kernel to remove all VFs. The driver first de-initializes the snapshot before freeing all the VFs. This results in a use-after-free BUG detected by KASAN. The bug occurs because the snapshot can still be accessed until all VFs are removed. Fix this by freeing all the VFs first before calling ice_mbx_deinit_snapshot. [ +0.032591] ================================================================== [ +0.000021] BUG: KASAN: use-after-free in ice_mbx_vf_state_handler+0x1c3/0x410 [ice] [ +0.000315] Write of size 28 at addr ffff889908eb6f28 by task kworker/55:2/1530996 [ +0.000029] CPU: 55 PID: 1530996 Comm: kworker/55:2 Kdump: loaded Tainted: G S I 5.17.0-dirty #1 [ +0.000022] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 1.6.13 12/17/2018 [ +0.000013] Workqueue: ice ice_service_task [ice] [ +0.000279] Call Trace: [ +0.000012] <TASK> [ +0.000011] dump_stack_lvl+0x33/0x42 [ +0.000030] print_report.cold.13+0xb2/0x6b3 [ +0.000028] ? ice_mbx_vf_state_handler+0x1c3/0x410 [ice] [ +0.000295] kasan_report+0xa5/0x120 [ +0.000026] ? __switch_to_asm+0x21/0x70 [ +0.000024] ? ice_mbx_vf_state_handler+0x1c3/0x410 [ice] [ +0.000298] kasan_check_range+0x183/0x1e0 [ +0.000019] memset+0x1f/0x40 [ +0.000018] ice_mbx_vf_state_handler+0x1c3/0x410 [ice] [ +0.000304] ? ice_conv_link_speed_to_virtchnl+0x160/0x160 [ice] [ +0.000297] ? ice_vsi_dis_spoofchk+0x40/0x40 [ice] [ +0.000305] ice_is_malicious_vf+0x1aa/0x250 [ice] [ +0.000303] ? ice_restore_all_vfs_msi_state+0x160/0x160 [ice] [ +0.000297] ? __mutex_unlock_slowpath.isra.15+0x410/0x410 [ +0.000022] ? ice_debug_cq+0xb7/0x230 [ice] [ +0.000273] ? __kasan_slab_alloc+0x2f/0x90 [ +0.000022] ? memset+0x1f/0x40 [ +0.000017] ? do_raw_spin_lock+0x119/0x1d0 [ +0.000022] ? rwlock_bug.part.2+0x60/0x60 [ +0.000024] __ice_clean_ctrlq+0x3a6/0xd60 [ice] [ +0.000273] ? newidle_balance+0x5b1/0x700 [ +0.000026] ? ice_print_link_msg+0x2f0/0x2f0 [ice] [ +0.000271] ? update_cfs_group+0x1b/0x140 [ +0.000018] ? load_balance+0x1260/0x1260 [ +0.000022] ? ice_process_vflr_event+0x27/0x130 [ice] [ +0.000301] ice_service_task+0x136e/0x1470 [ice] [ +0.000281] process_one_work+0x3b4/0x6c0 [ +0.000030] worker_thread+0x65/0x660 [ +0.000023] ? __kthread_parkme+0xe4/0x100 [ +0.000021] ? process_one_work+0x6c0/0x6c0 [ +0.000020] kthread+0x179/0x1b0 [ +0.000018] ? kthread_complete_and_exit+0x20/0x20 [ +0.000022] ret_from_fork+0x22/0x30 [ +0.000026] </TASK> [ +0.000018] Allocated by task 10742: [ +0.000013] kasan_save_stack+0x1c/0x40 [ +0.000018] __kasan_kmalloc+0x84/0xa0 [ +0.000016] kmem_cache_alloc_trace+0x16c/0x2e0 [ +0.000015] intel_iommu_probe_device+0xeb/0x860 [ +0.000015] __iommu_probe_device+0x9a/0x2f0 [ +0.000016] iommu_probe_device+0x43/0x270 [ +0.000015] iommu_bus_notifier+0xa7/0xd0 [ +0.000015] blocking_notifier_call_chain+0x90/0xc0 [ +0.000017] device_add+0x5f3/0xd70 [ +0.000014] pci_device_add+0x404/0xa40 [ +0.000015] pci_iov_add_virtfn+0x3b0/0x550 [ +0.000016] sriov_enable+0x3bb/0x600 [ +0.000013] ice_ena_vfs+0x113/0xa79 [ice] [ +0.000293] ice_sriov_configure.cold.17+0x21/0xe0 [ice] [ +0.000291] sriov_numvfs_store+0x160/0x200 [ +0.000015] kernfs_fop_write_iter+0x1db/0x270 [ +0.000018] new_sync_write+0x21d/0x330 [ +0.000013] vfs_write+0x376/0x410 [ +0.000013] ksys_write+0xba/0x150 [ +0.000012] do_syscall_64+0x3a/0x80 [ +0.000012] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000028] Freed by task 10742: [ +0.000011] kasan_save_stack+0x1c/0x40 [ +0.000015] kasan_set_track+0x21/0x30 [ +0.000016] kasan_set_free_info+0x20/0x30 [ +0.000012] __kasan_slab_free+0x104/0x170 [ +0.000016] kfree+0x9b/0x470 [ +0.000013] devres_destroy+0x1c/0x20 [ +0.000015] devm_kfree+0x33/0x40 [ +0.000012] ice_mbx_deinit_snapshot+0x39/0x70 [ice] [ +0.000295] ice_sriov_configure+0xb0/0x260 [ice] [ +0.000295] sriov_numvfs_store+0x1bc/0x200 [ +0.000015] kernfs_fop_write_iter+0x1db/0x270 [ +0.000016] new_sync_write+0x21d/0x330 [ +0.000012] vfs_write+0x376/0x410 [ +0.000012] ksys_write+0xba/0x150 [ +0.000012] do_syscall_64+0x3a/0x80 [ +0.000012] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000024] Last potentially related work creation: [ +0.000010] kasan_save_stack+0x1c/0x40 [ +0.000016] __kasan_record_aux_stack+0x98/0xa0 [ +0.000013] insert_work+0x34/0x160 [ +0.000015] __queue_work+0x20e/0x650 [ +0.000016] queue_work_on+0x4c/0x60 [ +0.000015] nf_nat_masq_schedule+0x297/0x2e0 [nf_nat] [ +0.000034] masq_device_event+0x5a/0x60 [nf_nat] [ +0.000031] raw_notifier_call_chain+0x5f/0x80 [ +0.000017] dev_close_many+0x1d6/0x2c0 [ +0.000015] unregister_netdevice_many+0x4e3/0xa30 [ +0.000015] unregister_netdevice_queue+0x192/0x1d0 [ +0.000014] iavf_remove+0x8f9/0x930 [iavf] [ +0.000058] pci_device_remove+0x65/0x110 [ +0.000015] device_release_driver_internal+0xf8/0x190 [ +0.000017] pci_stop_bus_device+0xb5/0xf0 [ +0.000014] pci_stop_and_remove_bus_device+0xe/0x20 [ +0.000016] pci_iov_remove_virtfn+0x19c/0x230 [ +0.000015] sriov_disable+0x4f/0x170 [ +0.000014] ice_free_vfs+0x9a/0x490 [ice] [ +0.000306] ice_sriov_configure+0xb8/0x260 [ice] [ +0.000294] sriov_numvfs_store+0x1bc/0x200 [ +0.000015] kernfs_fop_write_iter+0x1db/0x270 [ +0.000016] new_sync_write+0x21d/0x330 [ +0.000012] vfs_write+0x376/0x410 [ +0.000012] ksys_write+0xba/0x150 [ +0.000012] do_syscall_64+0x3a/0x80 [ +0.000012] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000025] The buggy address belongs to the object at ffff889908eb6f00 which belongs to the cache kmalloc-96 of size 96 [ +0.000016] The buggy address is located 40 bytes inside of 96-byte region [ffff889908eb6f00, ffff889908eb6f60) [ +0.000026] The buggy address belongs to the physical page: [ +0.000010] page:00000000b7e99a2e refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1908eb6 [ +0.000016] flags: 0x57ffffc0000200(slab|node=1|zone=2|lastcpupid=0x1fffff) [ +0.000024] raw: 0057ffffc0000200 ffffea0069d9fd80 dead000000000002 ffff88810004c780 [ +0.000015] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ +0.000009] page dumped because: kasan: bad access detected [ +0.000016] Memory state around the buggy address: [ +0.000012] ffff889908eb6e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ +0.000014] ffff889908eb6e80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ +0.000014] >ffff889908eb6f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ +0.000011] ^ [ +0.000013] ffff889908eb6f80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ +0.000013] ffff889908eb7000: fa fb fb fb fb fb fb fb fc fc fc fc fa fb fb fb [ +0.000012] ================================================================== Fixes: 0891c89674e8 ("ice: warn about potentially malicious VFs") Reported-by: Slawomir Laba <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-04-26ice: wait 5 s for EMP reset after firmware flashPetr Oros1-0/+3
We need to wait 5 s for EMP reset after firmware flash. Code was extracted from OOT driver (ice v1.8.3 downloaded from sourceforge). Without this wait, fw_activate let card in inconsistent state and recoverable only by second flash/activate. Flash was tested on these fw's: From -> To 3.00 -> 3.10/3.20 3.10 -> 3.00/3.20 3.20 -> 3.00/3.10 Reproducer: [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin Preparing to flash [fw.mgmt] Erasing [fw.mgmt] Erasing done [fw.mgmt] Flashing 100% [fw.mgmt] Flashing done 100% [fw.undi] Erasing [fw.undi] Erasing done [fw.undi] Flashing 100% [fw.undi] Flashing done 100% [fw.netlist] Erasing [fw.netlist] Erasing done [fw.netlist] Flashing 100% [fw.netlist] Flashing done 100% Activate new firmware by devlink reload [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate reload_actions_performed: fw_activate [root@host ~]# ip link show ens7f0 71: ens7f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff altname enp202s0f0 dmesg after flash: [ 55.120788] ice: Copyright (c) 2018, Intel Corporation. [ 55.274734] ice 0000:ca:00.0: Get PHY capabilities failed status = -5, continuing anyway [ 55.569797] ice 0000:ca:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.28.0 [ 55.603629] ice 0000:ca:00.0: Get PHY capability failed. [ 55.608951] ice 0000:ca:00.0: ice_init_nvm_phy_type failed: -5 [ 55.647348] ice 0000:ca:00.0: PTP init successful [ 55.675536] ice 0000:ca:00.0: DCB is enabled in the hardware, max number of TCs supported on this port are 8 [ 55.685365] ice 0000:ca:00.0: FW LLDP is disabled, DCBx/LLDP in SW mode. [ 55.692179] ice 0000:ca:00.0: Commit DCB Configuration to the hardware [ 55.701382] ice 0000:ca:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x8 link at 0000:c9:02.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link) Reboot doesn’t help, only second flash/activate with OOT or patched driver put card back in consistent state. After patch: [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin Preparing to flash [fw.mgmt] Erasing [fw.mgmt] Erasing done [fw.mgmt] Flashing 100% [fw.mgmt] Flashing done 100% [fw.undi] Erasing [fw.undi] Erasing done [fw.undi] Flashing 100% [fw.undi] Flashing done 100% [fw.netlist] Erasing [fw.netlist] Erasing done [fw.netlist] Flashing 100% [fw.netlist] Flashing done 100% Activate new firmware by devlink reload [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate reload_actions_performed: fw_activate [root@host ~]# ip link show ens7f0 19: ens7f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff altname enp202s0f0 Fixes: 399e27dbbd9e94 ("ice: support immediate firmware activation via devlink reload") Signed-off-by: Petr Oros <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-04-26ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()Ivan Vecera1-7/+5
Previous patch labelled "ice: Fix incorrect locking in ice_vc_process_vf_msg()" fixed an issue with ignored messages sent by VF driver but a small race window still left. Recently caught trace during 'ip link set ... vf 0 vlan ...' operation: [ 7332.995625] ice 0000:3b:00.0: Clearing port VLAN on VF 0 [ 7333.001023] iavf 0000:3b:01.0: Reset indication received from the PF [ 7333.007391] iavf 0000:3b:01.0: Scheduling reset task [ 7333.059575] iavf 0000:3b:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 3 [ 7333.059626] ice 0000:3b:00.0: Invalid message from VF 0, opcode 3, len 4, error -1 Setting of VLAN for VF causes a reset of the affected VF using ice_reset_vf() function that runs with cfg_lock taken: 1. ice_notify_vf_reset() informs IAVF driver that reset is needed and IAVF schedules its own reset procedure 2. Bit ICE_VF_STATE_DIS is set in vf->vf_state 3. Misc initialization steps 4. ice_sriov_post_vsi_rebuild() -> ice_vf_set_initialized() and that clears ICE_VF_STATE_DIS in vf->vf_state Step 3 is mentioned race window because IAVF reset procedure runs in parallel and one of its step is sending of VIRTCHNL_OP_GET_VF_RESOURCES message (opcode==3). This message is handled in ice_vc_process_vf_msg() and if it is received during the mentioned race window then it's marked as invalid and error is returned to VF driver. Protect vf_state check in ice_vc_process_vf_msg() by cfg_lock to avoid this race condition. Fixes: e6ba5273d4ed ("ice: Fix race conditions between virtchnl handling and VF ndo ops") Tested-by: Fei Liu <[email protected]> Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-04-26ice: Fix incorrect locking in ice_vc_process_vf_msg()Ivan Vecera1-14/+7
Usage of mutex_trylock() in ice_vc_process_vf_msg() is incorrect because message sent from VF is ignored and never processed. Use mutex_lock() instead to fix the issue. It is safe because this mutex is used to prevent races between VF related NDOs and handlers processing request messages from VF and these handlers are running in ice_service_task() context. Additionally move this mutex lock prior ice_vc_is_opcode_allowed() call to avoid potential races during allowlist access. Fixes: e6ba5273d4ed ("ice: Fix race conditions between virtchnl handling and VF ndo ops") Signed-off-by: Ivan Vecera <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-04-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni4-3/+10
drivers/net/ethernet/microchip/lan966x/lan966x_main.c d08ed852560e ("net: lan966x: Make sure to release ptp interrupt") c8349639324a ("net: lan966x: Add FDMA functionality") Signed-off-by: Paolo Abeni <[email protected]>