path: root/drivers/net/ethernet/intel
Age | Commit message | Author | Files | Lines
2023-09-21 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Paolo Abeni | 6 | -27/+35
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <[email protected]>
2023-09-21 | igc: Expose tx-usecs coalesce setting to user | Muhammad Husaini Zulkifli | 1 | -12/+19
When users attempt to obtain the coalesce setting using the ethtool command, current code always returns 0 for tx-usecs. This is because I225/6 always uses a queue pair setting, hence tx_coalesce_usecs does not return a value during the igc_ethtool_get_coalesce() callback process. The pair queue condition checking in igc_ethtool_get_coalesce() is removed by this patch so that the user gets information of the value of tx-usecs. Even if i225/6 is using queue pair setting, there is no harm in notifying the user of the tx-usecs. The implementation of the current code may have previously been a copy of the legacy code i210. Since I225 has the queue pair setting enabled, tx-usecs will always adhere to the user-set rx-usecs value. An error message will appear when the user attempts to set the tx-usecs value for the input parameters because, by default, they should only set the rx-usecs value. This patch also adds the helper function to get the previous rx coalesce value similar to tx coalesce.
How to test: User can get the coalesce value using ethtool command.
Example command:
Get: ethtool -c <interface>
Previous output:
rx-usecs: 3
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a
tx-usecs: 0
tx-frames: n/a
tx-usecs-irq: n/a
tx-frames-irq: n/a
New output:
rx-usecs: 3
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a
tx-usecs: 3
tx-frames: n/a
tx-usecs-irq: n/a
tx-frames-irq: n/a
Fixes: 8c5ad0dae93c ("igc: Add ethtool support") Signed-off-by: Muhammad Husaini Zulkifli <[email protected]> Tested-by: Naama Meir <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
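The reporting change described above can be modelled in a few lines. This is only a sketch under stated assumptions: `coalesce_state`, `coalesce_report`, `report_old()` and `report_new()` are hypothetical stand-ins, not the igc driver's structures or functions, and the model simply encodes the commit's statement that tx-usecs follows rx-usecs when queue pairs are in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the adapter's coalesce state. */
struct coalesce_state {
	bool queue_pairs;   /* Rx and Tx share one interrupt vector */
	uint32_t rx_usecs;  /* user-set rx-usecs */
	uint32_t tx_usecs;  /* only independent when queues are not paired */
};

/* What the ethtool "get coalesce" query reports back (hypothetical type). */
struct coalesce_report {
	uint32_t rx_usecs;
	uint32_t tx_usecs;
};

/* Old behaviour per the commit text: tx-usecs is skipped when paired. */
struct coalesce_report report_old(const struct coalesce_state *s)
{
	struct coalesce_report r = { .rx_usecs = s->rx_usecs, .tx_usecs = 0 };

	if (!s->queue_pairs)
		r.tx_usecs = s->tx_usecs;
	return r;
}

/* New behaviour: tx-usecs is always reported; when paired it mirrors rx. */
struct coalesce_report report_new(const struct coalesce_state *s)
{
	struct coalesce_report r = { .rx_usecs = s->rx_usecs };

	r.tx_usecs = s->queue_pairs ? s->rx_usecs : s->tx_usecs;
	return r;
}
```

With `queue_pairs` set and `rx_usecs` equal to 3, `report_old()` yields a tx-usecs of 0 and `report_new()` yields 3, matching the "Previous output" and "New output" listings in the commit message.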
2023-09-20 | ice: Remove the FW shared parameters | Michal Michalik | 3 | -86/+0
The only feature using the Firmware (FW) shared parameters was the PTP clock ID. Since this ID is now shared using the auxiliary bus, remove the FW shared parameters from the code. Signed-off-by: Michal Michalik <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-20 | ice: PTP: add clock domain number to auxiliary interface | Michal Michalik | 4 | -147/+34
The PHC clock id used to be moved between PFs using FW admin queue shared parameters - move the implementation to auxiliary bus. Signed-off-by: Karol Kolacinski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Michal Michalik <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-20 | ice: Use PTP auxbus for all PHYs restart in E822 | Michal Michalik | 1 | -3/+21
The E822 (and other devices based on the same PHY) has an issue when setting the PHC timer: the PHY timers drift from the PHC. After such a set, all PHYs need to be restarted and resynchronised; do this using the auxiliary bus. Signed-off-by: Karol Kolacinski <[email protected]> Signed-off-by: Michal Michalik <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-20 | ice: Auxbus devices & driver for E822 TS | Michal Michalik | 5 | -17/+430
There is a hardware problem in E822-based devices that leads to a race condition. It might happen that, in order:
- PF0 (which owns the PHC) requests a few timestamps,
- PF1 requests a timestamp,
- an interrupt is triggered and both PF0 and PF1 threads are woken up,
- PF0 gets one timestamp and is still waiting for the others, so it does not go to sleep,
- PF1 gets its timestamp, processes it and goes to sleep,
- PF1 requests a timestamp again,
- just before PF0 goes to sleep, the timestamp of PF1 appears,
- PF0 finishes all its timestamps and goes to sleep (PF1 is also sleeping).
That leaves the PF1 timestamp memory unread, which blocks the next interrupt from arriving. Fix it by adding auxiliary devices and only one driver to handle all the timestamps for all PFs by the PHC owner. In the past each PF requested its own timestamps and processed them from start to end, which causes the problem described above. Currently each PF requests the timestamps as before, but the actual reading of the completed timestamps is done by the PTP auxiliary driver, which is registered by the PF which owns the PHC. Additionally, the newly introduced auxiliary driver/devices for the PTP clock owner will be used for other features in all products (including E810).
Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Karol Kolacinski <[email protected]> Signed-off-by: Michal Michalik <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-18 | ice: check netlist before enabling ICE_F_GNSS | Jacob Keller | 4 | -0/+21
Similar to the change made for ICE_F_SMA_CTRL, check the netlist before enabling support for ICE_F_GNSS. This ensures that the driver only enables the GNSS feature on devices which actually have the feature enabled in the firmware device configuration. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-18 | ice: introduce ice_pf_src_tmr_owned | Jacob Keller | 4 | -5/+7
Add ice_pf_src_tmr_owned() macro to check the function capability bit indicating if the current function owns the PTP hardware clock. This is slightly shorter than the more verbose access via hw.func_caps.ts_func_info.src_tmr_owned. Use this where possible rather than open coding its equivalent. Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
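A minimal sketch of what such an accessor macro looks like. The nested structures below are stripped-down, hypothetical stand-ins that keep only the field path quoted in the commit message (`hw.func_caps.ts_func_info.src_tmr_owned`); the macro name `pf_src_tmr_owned` is likewise illustrative and is not the driver's definition.

```c
#include <stdbool.h>

/* Hypothetical, minimal mirror of the nesting named in the commit text. */
struct ts_func_info_sketch {
	bool src_tmr_owned; /* this function owns the PTP hardware clock */
};

struct func_caps_sketch {
	struct ts_func_info_sketch ts_func_info;
};

struct hw_sketch {
	struct func_caps_sketch func_caps;
};

struct pf_sketch {
	struct hw_sketch hw;
};

/* Shorthand for the verbose field access, in the spirit of the new macro. */
#define pf_src_tmr_owned(pf) ((pf)->hw.func_caps.ts_func_info.src_tmr_owned)
```

Call sites then read `if (pf_src_tmr_owned(pf))` instead of spelling out the full field path each time.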
2023-09-18 | ice: fix pin assignment for E810-T without SMA control | Jacob Keller | 1 | -3/+7
Since commit 43c4958a3ddb ("ice: Merge pin initialization of E810 and E810T adapters"), the ice_ptp_setup_pins_e810() function has been used for both E810 and E810-T devices. The new implementation only distinguishes between whether the device has SMA control or not. It was assumed this is always true for E810-T devices. In addition, it does not set the n_per_out value appropriately when SMA control is enabled. In some cases, the E810-T device may not have access to SMA control. In that case, the E810-T device actually has access to fewer pins than a standard E810 device. Fix the implementation to correctly assign the appropriate pin counts for E810-T devices both with and without SMA control. The mentioned commit already includes the appropriate macro values for these pin counts but they were unused. Instead of assigning the default E810 values and then overwriting them, handle the cases separately in order of E810-T with SMA, E810-T without SMA, and then standard E810. This flow makes following the logic easier. Fixes: 43c4958a3ddb ("ice: Merge pin initialization of E810 and E810T adapters") Signed-off-by: Jacob Keller <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
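The three-way ordering described above (E810-T with SMA, E810-T without SMA, then standard E810) can be sketched as follows. Everything here is an assumption made for illustration: the pin counts are placeholders and not the driver's macro values, and `setup_pins()` is a hypothetical function. The placeholders only preserve the relation stated in the commit text, namely that an E810-T without SMA control has fewer pins than a standard E810.

```c
#include <stdbool.h>

/* Placeholder pin counts: NOT the real macro values from the driver. */
enum {
	SKETCH_N_EXT_TS_E810 = 3,
	SKETCH_N_PER_OUT_E810 = 4,
	SKETCH_N_EXT_TS_E810T_SMA = 3,
	SKETCH_N_PER_OUT_E810T_SMA = 3,
	SKETCH_N_EXT_TS_E810T_NO_SMA = 2,
	SKETCH_N_PER_OUT_E810T_NO_SMA = 2,
};

struct pin_counts {
	int n_ext_ts;  /* external timestamp pins */
	int n_per_out; /* periodic output pins */
};

/* Handle each case separately instead of assigning defaults and overwriting. */
struct pin_counts setup_pins(bool is_e810t, bool has_sma_ctrl)
{
	struct pin_counts c;

	if (is_e810t && has_sma_ctrl) {
		c.n_ext_ts = SKETCH_N_EXT_TS_E810T_SMA;
		c.n_per_out = SKETCH_N_PER_OUT_E810T_SMA;
	} else if (is_e810t) {
		c.n_ext_ts = SKETCH_N_EXT_TS_E810T_NO_SMA;
		c.n_per_out = SKETCH_N_PER_OUT_E810T_NO_SMA;
	} else {
		c.n_ext_ts = SKETCH_N_EXT_TS_E810;
		c.n_per_out = SKETCH_N_PER_OUT_E810;
	}
	return c;
}
```

Keeping one branch per hardware variant means every count, including n_per_out, is assigned exactly once for each case.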
2023-09-18 | ice: remove ICE_F_PTP_EXTTS feature flag | Jacob Keller | 3 | -5/+1
The ICE_F_PTP_EXTTS feature flag is ostensibly intended to support checking whether the device supports external timestamp pins. It is only checked in E810-specific code flows, and is enabled for all E810-based devices. E822 and E823 flows unconditionally enable external timestamp support. This makes the feature flag meaningless, as it is always enabled. Just unconditionally enable support for external timestamp pins and remove this unnecessary flag. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-18 | ice: PTP: move quad value check inside ice_fill_phy_msg_e822 | Karol Kolacinski | 1 | -7/+12
The callers of ice_fill_phy_msg_e822 check for whether the quad number is within the expected range. Move this check inside the ice_fill_phy_msg_e822 function instead of duplicating it twice. Signed-off-by: Karol Kolacinski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
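The refactoring pattern, moving a range check out of every caller and into the helper, might look roughly like this. It is a sketch only: `SKETCH_MAX_QUAD`, the address layout, `fill_phy_msg()` and `read_quad_reg()` are all hypothetical and do not reflect the real sideband message format or the driver's signatures.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_QUAD 2 /* hypothetical number of valid quads */

struct phy_msg_sketch {
	uint32_t addr;
};

/* The helper validates the quad itself, so callers no longer repeat it. */
int fill_phy_msg(struct phy_msg_sketch *msg, unsigned int quad, uint16_t offset)
{
	if (quad >= SKETCH_MAX_QUAD)
		return -EINVAL;

	/* Hypothetical address layout, for illustration only. */
	msg->addr = ((uint32_t)quad << 16) | offset;
	return 0;
}

/* A caller just propagates the helper's error instead of checking the range. */
int read_quad_reg(unsigned int quad, uint16_t offset, uint32_t *addr_out)
{
	struct phy_msg_sketch msg;
	int err = fill_phy_msg(&msg, quad, offset);

	if (err)
		return err;
	*addr_out = msg.addr;
	return 0;
}
```

The trade-off is that the helper now needs a return value, so each caller gains an error path in exchange for dropping its own copy of the check.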
2023-09-18 | ice: PTP: Rename macros used for PHY/QUAD port definitions | Karol Kolacinski | 2 | -11/+11
The ice_fill_phy_msg_e822 function uses several macros to specify the correct address when sending a sideband message to the PHY block in hardware. The names of these macros are fairly generic and confusing. Future development is going to extend the driver to support new hardware families which have different relationships between PHY and QUAD. Rename the macros for clarity and to indicate that they are E822 specific. This also matches closer to the hardware specification in the data sheet. Signed-off-by: Karol Kolacinski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-18 | ice: PTP: Clean up timestamp registers correctly | Karol Kolacinski | 1 | -25/+45
E822 PHY TS registers should not be written and the only way to clean them up is to reset QUAD memory. To ensure that the status bit for the timestamp index is cleared, ensure that ice_clear_phy_tstamp implementations first read the timestamp out. Implementations which can write the register continue to do so. Add a note to indicate this function should only be called on timestamps which have their valid bit set. Update the dynamic debug messages to reflect the actual action taken. Signed-off-by: Karol Kolacinski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-18 | ice: introduce hw->phy_model for handling PTP PHY differences | Jacob Keller | 4 | -27/+117
The ice driver has PTP support which works across a couple of different device families. The device families each have different PHY hardware which have unique requirements for programming. Today, there is E810-based hardware, and E822-based hardware. To handle this, the driver checks the ice_is_e810() function to separate between the two existing families of hardware. Future development is going to add new hardware designs which have further unique requirements. To make this easier, introduce a phy_model field to the HW structure. This field represents what PHY model the current device has, and is used to allow distinguishing which logic a particular device needs. This will make supporting future upcoming hardware easier, by providing an obvious place to initialize the PHY model, and by already using switch/case statements instead of the previous if statements. Astute reviewers may notice that there are a handful of remaining checks for ice_is_e810() left in ice_ptp.c. These conflict with some other cleanup patches in development, and will be fixed in the near future. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
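As a rough illustration of dispatching on a PHY model field with switch/case rather than a boolean family check, consider the sketch below. The enum values, the `hw_sketch` structure and the `init_phc_*()` functions are hypothetical; only the idea of a `phy_model` field in the HW structure comes from the commit message.

```c
#include <errno.h>

/* Hypothetical PHY model identifiers; the driver's enum may differ. */
enum phy_model_sketch {
	SKETCH_PHY_UNKNOWN = 0,
	SKETCH_PHY_E810,
	SKETCH_PHY_E822,
};

struct hw_sketch {
	enum phy_model_sketch phy_model;
	enum phy_model_sketch init_path; /* records which init routine ran */
};

int init_phc_e810(struct hw_sketch *hw)
{
	hw->init_path = SKETCH_PHY_E810;
	return 0;
}

int init_phc_e822(struct hw_sketch *hw)
{
	hw->init_path = SKETCH_PHY_E822;
	return 0;
}

/* One switch per operation: a new PHY family only adds a new case label. */
int init_phc(struct hw_sketch *hw)
{
	switch (hw->phy_model) {
	case SKETCH_PHY_E810:
		return init_phc_e810(hw);
	case SKETCH_PHY_E822:
		return init_phc_e822(hw);
	default:
		return -EOPNOTSUPP;
	}
}
```

Compared with an `if (is_e810) ... else ...` chain, the default branch makes an unrecognised model fail explicitly instead of silently falling into the E822 path.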
2023-09-18 | ice: Support cross-timestamping for E823 devices | Jacob Keller | 3 | -11/+21
The E822 hardware has cross timestamping support using a device feature termed "Hammock Harbor" by the data sheet. This device feature is similar to PCIe PTM, and captures the Always Running Timer (ART) simultaneously with the PTP hardware clock time. This functionality also exists on E823 devices, but is not currently enabled. Rename the cross-timestamp functions to use the _e82x postfix, indicating that the support works across the E82x family of devices and not just the E822 hardware. The flow for capturing a cross-timestamp requires an additional step on E823 devices. The GLTSYN_CMD register must be programmed with the READ_TIME command. Otherwise, the cross timestamp will always report a value of zero for the PTP hardware clock time. To fix this, call ice_ptp_src_cmd() prior to initiating the cross timestamp logic. Once the cross timestamp has completed, call ice_ptp_src_cmd() with ICE_PTP_NOP to ensure that the timer command registers are cleared. Co-developed-by: Sergey Temerkhanov <[email protected]> Signed-off-by: Sergey Temerkhanov <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-18 | ice: retry acquiring hardware semaphore during cross-timestamp request | Karol Kolacinski | 1 | -7/+15
The hardware for performing a cross-timestamp on E822 uses a hardware semaphore which we must acquire before initiating the cross-timestamp operation. The current implementation only attempts to acquire the semaphore once, and assumes that it will succeed. If the semaphore is busy for any reason, the cross-timestamp operation fails with -EFAULT. Instead of immediately failing, try to acquire the lock a few times with a small sleep between attempts. This ensures that most requests will go through without issue. Additionally, return -EBUSY instead of -EFAULT if the operation can't continue due to the semaphore being busy. Signed-off-by: Karol Kolacinski <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
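The retry-then-`-EBUSY` pattern can be sketched in plain C. The names here are hypothetical: `fake_sem` simulates a semaphore that stays busy for a fixed number of polls, `SKETCH_MAX_TRIES` is an arbitrary attempt count, and the `delay` callback stands in for the short sleep the driver would perform between attempts.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TRIES 5 /* arbitrary attempt count for illustration */

/* Simulated hardware semaphore: busy for the next `busy_polls` polls. */
struct fake_sem {
	int busy_polls;
};

bool sem_try_acquire(struct fake_sem *s)
{
	if (s->busy_polls > 0) {
		s->busy_polls--;
		return false;
	}
	return true;
}

/*
 * Poll a bounded number of times, pausing between attempts.
 * Returns 0 once the semaphore is acquired, -EBUSY if it never frees up.
 */
int acquire_with_retry(struct fake_sem *s, void (*delay)(void))
{
	for (int i = 0; i < SKETCH_MAX_TRIES; i++) {
		if (sem_try_acquire(s))
			return 0;
		if (delay != NULL)
			delay(); /* stand-in for a short sleep */
	}
	return -EBUSY;
}
```

Returning `-EBUSY` rather than `-EFAULT` tells the caller that the request was valid but the resource was contended, so retrying later is a reasonable response.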
2023-09-18 | ice: prefix clock timer command enumeration values with ICE_PTP | Sergey Temerkhanov | 2 | -51/+52
The ice driver has an enumeration for the various commands that can be programmed to the MAC and PHY for setting up hardware clock operations. Prefix these with ICE_PTP so that they are clearly namespaced to the ice driver. Signed-off-by: Sergey Temerkhanov <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-09-17 | Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue | David S. Miller | 4 | -14/+15
Tony Nguyen says: ==================== This series contains updates to iavf and i40e drivers. Radoslaw prevents admin queue operations being added when the driver is being removed for iavf. Petr Oros immediately starts reconfiguration on changes to VLANs on iavf. Ivan Vecera moves reset of VF to occur after port VLAN values are set on i40e. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-09-17 | ice: implement dpll interface to control cgu | Arkadiusz Kubalewski | 6 | -1/+2020
Control over clock generation unit is required for further development of Synchronous Ethernet feature. Interface provides ability to obtain current state of a dpll, its sources and outputs which are pins, and allows their configuration. Co-developed-by: Milena Olech <[email protected]> Signed-off-by: Milena Olech <[email protected]> Co-developed-by: Michal Michalik <[email protected]> Signed-off-by: Michal Michalik <[email protected]> Signed-off-by: Arkadiusz Kubalewski <[email protected]> Signed-off-by: Vadim Fedorenko <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-17 | ice: add admin commands to access cgu configuration | Arkadiusz Kubalewski | 8 | -33/+1385
Add firmware admin command to access clock generation unit configuration, it is required to enable Extended PTP and SyncE features in the driver. Add definitions of possible hardware variations of input and output pins related to clock generation unit and functions to access the data. Signed-off-by: Arkadiusz Kubalewski <[email protected]> Signed-off-by: Vadim Fedorenko <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-17 | Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | David S. Miller | 5 | -5/+132
Tony Nguyen says:
====================
Support rx-fcs on/off for VFs
Ahmed Zaki says:
Allow the user to turn on/off the CRC/FCS stripping through ethtool. We first add the CRC offload capability in the virtchannel, then the feature is enabled in ice and iavf drivers. We make sure that the netdev features are fixed such that CRC stripping cannot be disabled if VLAN rx offload (VLAN strip) is enabled. Also, VLAN stripping cannot be enabled unless CRC stripping is ON.
Testing was done using tcpdump to make sure that the CRC is included in the frame after:
# ethtool -K <interface> rx-fcs on
and is not included when it is back "off". Also, ethtool should return an error for the above command if "rx-vlan-offload" is already on and at least one VLAN interface/filter exists on the VF.
====================
Signed-off-by: David S. Miller <[email protected]>
2023-09-16 | igc: Fix infinite initialization loop with early XDP redirect | Vinicius Costa Gomes | 1 | -1/+1
When an XDP redirect happens before the link is ready, that transmission will not finish and will time out, causing an adapter reset. If the redirects do not stop, the adapter will not stop resetting. Wait for the driver to signal that there's a carrier before allowing transmissions to proceed. The previous code assumed that once __IGC_DOWN is cleared the NIC is ready to transmit, because all the queues are ready. In fact, the carrier presence is only signaled later, after the watchdog workqueue has had a chance to run. If any transmission happens during this interval (between clearing __IGC_DOWN and the watchdog running), the timeout is emitted (detected by igc_tx_timeout()), which causes the reset, with the potential for an infinite loop. Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action") Reported-by: Ferenc Fejes <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Vinicius Costa Gomes <[email protected]> Tested-by: Ferenc Fejes <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
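The window between clearing the "down" flag and the carrier being signalled can be modelled with two booleans. This is a hedged sketch, not the driver's code: `nic_state`, `xmit_gate_old()` and `xmit_gate_new()` are hypothetical, and returning `-ENETDOWN` for a refused transmission is an assumption made for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced view of the adapter state relevant to the fix. */
struct nic_state {
	bool down;       /* models the __IGC_DOWN state bit */
	bool carrier_ok; /* set later, once the watchdog has seen link */
};

/* Old gate: transmit as soon as the down flag is cleared. */
int xmit_gate_old(const struct nic_state *n)
{
	return n->down ? -ENETDOWN : 0;
}

/* New gate: refuse redirected frames until a carrier is reported. */
int xmit_gate_new(const struct nic_state *n)
{
	return n->carrier_ok ? 0 : -ENETDOWN;
}
```

For the state `down == false, carrier_ok == false`, which is exactly the interval the commit describes, the old gate lets the frame through (and it later times out), while the new gate rejects it up front.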
2023-09-16 | Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | David S. Miller | 24 | -0/+19030
Tony Nguyen says:
====================
Introduce Intel IDPF driver
Pavan Kumar Linga says:
This patch series introduces the Intel Infrastructure Data Path Function (IDPF) driver. It is used for both physical and virtual functions. Except for some of the device operations the rest of the functionality is the same for both PF and VF. IDPF uses virtchnl version2 opcodes and structures defined in the virtchnl2 header file which helps the driver to learn the capabilities and register offsets from the device Control Plane (CP) instead of assuming the default values.
The format of the series follows the driver init flow to interface open. To start with, probe gets called and kicks off the driver initialization by spawning the 'vc_event_task' work queue which in turn calls the 'hard reset' function. As part of that, the mailbox is initialized which is used to send/receive the virtchnl messages to/from the CP. Once that is done, 'core init' kicks in which requests all the required global resources from the CP and spawns the 'init_task' work queue to create the vports.
Based on the capability information received, the driver creates the said number of vports (one or many) where each vport is associated to a netdev. Also, each vport has its own resources such as queues, vectors etc. From there, rest of the netdev_ops and data path are added.
IDPF implements both single queue which is traditional queueing model as well as split queue model. In split queue model, it uses separate queue for both completion descriptors and buffers which helps to implement out-of-order completions. It also helps to implement asymmetric queues, for example multiple RX completion queues can be processed by a single RX buffer queue and multiple TX buffer queues can be processed by a single TX completion queue. In single queue model, same queue is used for both descriptor completions as well as buffer completions. It also supports features such as generic checksum offload, generic receive offload (hardware GRO) etc.
---
v7:
Patch 2:
* removed pci_[disable|enable]_pcie_error_reporting as they are dropped from the core
Patch 4, 9:
* used 'kasprintf' instead of 'snprintf' to avoid providing explicit character string size which also fixes "-Wformat-truncation" warnings
Patch 14:
* used 'ethtool_sprintf' instead of 'snprintf' to avoid providing explicit character string size which also fixes "-Wformat-truncation" warning
* add string format argument to the 'ethtool_sprintf' to avoid warning on "-Wformat-security"

v6: https://lore.kernel.org/netdev/[email protected]/
Note: 'Acked-by' was only added to patches 1, 2, 12 and not to the other patches because of the changes in v6
Patch 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15:
* renamed 'reset_lock' to 'vport_ctrl_lock' to reflect the lock usage
* to avoid defensive programming, used 'vport_ctrl_lock' for the user callbacks that access the 'vport' to prevent the hardware reset thread from releasing the 'vport', when the user callback is in progress
* added some variables to netdev private structure to avoid vport access if possible from ethtool and ndo callbacks
* moved 'mac_filter_list_lock' and MAC related flags to vport_config structure and refactored mac filter flow to handle asynchronous ndo mac filter callbacks
* stop the queues before starting the reset flow to avoid TX hangs
* removed 'sw_mutex' and 'stop_mutex' as they are not needed anymore
* added missing clear bit in 'init_task' error path
* renamed labels appropriately
Patch 8:
* replaced page_pool_put_page with page_pool_put_full_page
* for the page pool max_len, used PAGE_SIZE
Patch 10, 11, 13:
* made use of the 'netif_txq_maybe_stop', '__netif_txq_completed_wake' helper macros
Patch 13:
* removed IDPF_HR_RESET_IN_PROG flag check in idpf_tx_singleq_start as it is defensive
Patch 14:
* removed max descriptor check as the core does that
* removed unnecessary error messages
* removed the stats that are common between the ones reported by ethtool and ip link
* replaced snprintf with ethtool_sprintf
* added a comment to explain the reason for the max queue check
* as the netdev queues are set on alloc, there is no need to set them again on reset unless there is a queue change, so move the 'idpf_set_real_num_queues' to 'idpf_initiate_soft_reset'
Patch 15:
* reworded the 'configure SRIOV' in the commit message

v5: https://lore.kernel.org/netdev/[email protected]/
Most Patches:
* wrapped line limit to 80 chars to those which don't affect readability
Patch 12:
* in skb_add_rx_frag, offset 'headlen' w.r.t page_offset when adding a frag to avoid adding the header again
Patch 14:
* added NULL check for 'rxq' when dereferencing it in page_pool_get_stats

v4: https://lore.kernel.org/netdev/[email protected]/
Patch 1:
* s/virtcnl/virtchnl
* removed the kernel doc for the error code definitions that don't exist
* reworded the summary part in the virtchnl2 header
Patch 3:
* don't set local variable to NULL on error
* renamed sq_send_command_out label with err_unlock
* don't use __GFP_ZERO in dma_alloc_coherent
Patch 4:
* introduced mailbox workqueue to process mailbox interrupts
Patch 3, 4, 5, 6, 7, 8, 9, 11, 15:
* removed unnecessary variable 0-init
Patch 3, 5, 7, 8, 9, 15:
* removed defensive programming checks wherever applicable
* removed IDPF_CAP_FIELD_LAST as it can be treated as defensive programming
Patch 3, 4, 5, 6, 7:
* replaced IDPF_DFLT_MBX_BUF_SIZE with IDPF_CTLQ_MAX_BUF_LEN
Patch 2 to 15:
* add kernel-doc for idpf.h and idpf_txrx.h enums and structures
Patch 4, 5, 15:
* adjusted the destroy sequence of the workqueues as per the alloc sequence
Patch 4, 5, 9, 15:
* scrub unnecessary flags in 'idpf_flags'
  - IDPF_REMOVE_IN_PROG flag can take care of the cases where IDPF_REL_RES_IN_PROG is used, removed the latter one
  - IDPF_REQ_[TX|RX]_SPLITQ are replaced with struct variables
  - IDPF_CANCEL_[SERVICE|STATS]_TASK are redundant as the work queue doesn't get rescheduled again after 'cancel_delayed_work_sync'
  - IDPF_HR_CORE_RESET is removed as there is no set_bit for this flag
  - IDPF_MB_INTR_TRIGGER is removed as it is not needed anymore with the mailbox workqueue implementation
Patch 7 to 15:
* replaced the custom buffer recycling code with page pool API
* switched the header split buffer allocations from using a bunch of pages to using one large chunk of DMA memory
* reordered some of the flows in vport_open to support page pool
Patch 8, 12:
* don't suppress the alloc errors by using __GFP_NOWARN
Patch 9:
* removed dyn_ctl_clrpba_m as it is not being used
Patch 14:
* introduced enum idpf_vport_reset_cause instead of using vport flags
* introduced page pool stats

v3: https://lore.kernel.org/netdev/[email protected]/
Patch 5:
* instead of void, used 'struct virtchnl2_create_vport' type for vport_params_recvd and vport_params_reqd and removed the typecasting
* used u16/u32 as needed instead of int for variables which cannot be negative and updated in all the places wherever applicable
Patch 6:
* changed the commit message to "add ptypes and MAC filter support"
* used the sender Signed-off-by as the last tag on all the patches
* removed unnecessary variables 0-init
* instead of fixing the code in this commit, fixed it in the commit where the change was introduced first
* moved get_type_info struct on to the stack instead of memory alloc
* moved mutex_lock and ptype_info memory alloc outside while loop and adjusted the return flow
* used 'break' instead of 'continue' in ptype id switch case

v2: https://lore.kernel.org/netdev/[email protected]/
Patch 2:
* added "Intel(R)" to the DRV_SUMMARY and Makefile.
Patch 4, 5, 6, 15:
* replaced IDPF_VC_MSG_PENDING flag with mutex 'vc_buf_lock' for the adapter related virtchnl opcodes.
* get the mutex lock in the virtchnl send thread itself instead of in receive thread.
Patch 5, 6, 7, 8, 9, 11, 14, 15:
* replaced IDPF_VPORT_VC_MSG_PENDING flag with mutex 'vc_buf_lock' for the vport related virtchnl opcodes.
* get the mutex lock in the virtchnl send thread itself instead of in receive thread.
Patch 6:
* converted get_ptype_info logic from 1:N to 1:1 message exchange for better handling of mutex lock.
Patch 15:
* introduced 'stats_lock' spinlock to avoid concurrent stats update.

v1: https://lore.kernel.org/netdev/[email protected]/
====================
Signed-off-by: David S. Miller <[email protected]>
2023-09-15 | i40e: Fix VF VLAN offloading when port VLAN is configured | Ivan Vecera | 1 | -3/+5
If port VLAN is configured on a VF then any other VLANs on top of this VF are broken. During the i40e_ndo_set_vf_port_vlan() call the i40e driver resets the VF and the iavf driver asks the PF (using VIRTCHNL_OP_GET_VF_RESOURCES) for VF capabilities, but this reset occurs too early, prior to setting the vf->info.pvid field. Because this field can be zero during i40e_vc_get_vf_resources_msg(), the VIRTCHNL_VF_OFFLOAD_VLAN capability is reported to the iavf driver. This is wrong because the iavf driver should not report VLAN offloading capability when port VLAN is configured, as i40e does not support QinQ offloading. Fix the issue by moving the VF reset after setting of the vf->port_vlan_id field.
Without this patch:
$ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs
$ ip link set enp2s0f0 vf 0 vlan 3
$ ip link set enp2s0f0v0 up
$ ip link add link enp2s0f0v0 name vlan4 type vlan id 4
$ ip link set vlan4 up
...
$ ethtool -k enp2s0f0v0 | grep vlan-offload
rx-vlan-offload: on
tx-vlan-offload: on
$ dmesg -l err | grep iavf
[1292500.742914] iavf 0000:02:02.0: Failed to add VLAN filter, error IAVF_ERR_INVALID_QP_ID
With this patch:
$ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs
$ ip link set enp2s0f0 vf 0 vlan 3
$ ip link set enp2s0f0v0 up
$ ip link add link enp2s0f0v0 name vlan4 type vlan id 4
$ ip link set vlan4 up
...
$ ethtool -k enp2s0f0v0 | grep vlan-offload
rx-vlan-offload: off [requested on]
tx-vlan-offload: off [requested on]
$ dmesg -l err | grep iavf
Fixes: f9b4b6278d51 ("i40e: Reset the VF upon conflicting VLAN configuration") Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
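The ordering bug can be reproduced in a toy model. Under the assumptions of this sketch, `vf_sketch`, `vf_reset()` and the two `set_port_vlan_*()` variants are hypothetical; the reset is modelled as the moment the VF's capabilities are recomputed, with VLAN offload advertised only when no port VLAN is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced VF state. */
struct vf_sketch {
	uint16_t pvid;         /* port VLAN ID, 0 means none configured */
	bool vlan_offload_cap; /* capability reported to the VF driver */
};

/* Models the reset: capabilities are recomputed from the current pvid. */
void vf_reset(struct vf_sketch *vf)
{
	vf->vlan_offload_cap = (vf->pvid == 0);
}

/* Old order: reset first, so the capability is computed from a stale pvid. */
void set_port_vlan_old(struct vf_sketch *vf, uint16_t vid)
{
	vf_reset(vf);
	vf->pvid = vid;
}

/* New order: store the port VLAN, then reset. */
void set_port_vlan_new(struct vf_sketch *vf, uint16_t vid)
{
	vf->pvid = vid;
	vf_reset(vf);
}
```

Starting from `pvid == 0` and setting VLAN 3, the old order leaves `vlan_offload_cap` true even though a port VLAN is now configured, which corresponds to the `rx-vlan-offload: on` output in the "Without this patch" transcript.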
2023-09-15 | iavf: schedule a request immediately after add/delete vlan | Petr Oros | 1 | -2/+2
When the iavf driver wants to reconfigure the VLAN filters (iavf_add_vlan, iavf_del_vlan), it sets a flag in aq_required:
  adapter->aq_required |= IAVF_FLAG_AQ_ADD_VLAN_FILTER;
or:
  adapter->aq_required |= IAVF_FLAG_AQ_DEL_VLAN_FILTER;
This is later processed by the watchdog_task, but it runs periodically every 2 seconds, so it can be a long time before it processes the request. In the worst case, the interface is unable to receive traffic for more than 2 seconds for no objective reason.
Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Petr Oros <[email protected]> Co-developed-by: Michal Schmidt <[email protected]> Signed-off-by: Michal Schmidt <[email protected]> Co-developed-by: Ivan Vecera <[email protected]> Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Ahmed Zaki <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-15 | iavf: add iavf_schedule_aq_request() helper | Petr Oros | 3 | -8/+6
Add a helper that sets an iavf aq request flag (IAVF_FLAG_AQ_*) and immediately schedules the watchdog_task. The helper will be used in cases where it is necessary to run aq requests as soon as possible. Signed-off-by: Petr Oros <[email protected]> Co-developed-by: Michal Schmidt <[email protected]> Signed-off-by: Michal Schmidt <[email protected]> Co-developed-by: Ivan Vecera <[email protected]> Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
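A small model of the "set the flag and kick the watchdog now" idea follows. It is an illustration under assumptions: `adapter_sketch`, the flag bit values and both functions are hypothetical, and the pending delay of the periodic task is represented by a plain integer rather than a real delayed work item.

```c
#include <stdint.h>

/* Hypothetical flag bits; the driver's IAVF_FLAG_AQ_* values differ. */
#define SKETCH_AQ_ADD_VLAN_FILTER (1ULL << 0)
#define SKETCH_AQ_DEL_VLAN_FILTER (1ULL << 1)

struct adapter_sketch {
	uint64_t aq_required;           /* pending admin queue requests */
	unsigned int watchdog_delay_ms; /* time until the watchdog next runs */
};

/* Old pattern: set the flag and wait for the next periodic run. */
void request_aq_deferred(struct adapter_sketch *a, uint64_t flags)
{
	a->aq_required |= flags;
}

/* Helper pattern: set the flag and reschedule the watchdog with no delay. */
void schedule_aq_request(struct adapter_sketch *a, uint64_t flags)
{
	a->aq_required |= flags;
	a->watchdog_delay_ms = 0;
}
```

With a 2000 ms period, `request_aq_deferred()` can leave a VLAN filter change pending for up to the full period, whereas `schedule_aq_request()` makes the next run immediate.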
2023-09-15 | iavf: do not process adminq tasks when __IAVF_IN_REMOVE_TASK is set | Radoslaw Tyl | 1 | -1/+2
Prevent schedule operations for adminq during device remove and when __IAVF_IN_REMOVE_TASK flag is set. Currently, the iavf_down function adds operations for adminq that shouldn't be processed when the device is in the __IAVF_REMOVE state.
Reproduction:
echo 4 > /sys/bus/pci/devices/0000:17:00.0/sriov_numvfs
ip link set dev ens1f0 vf 0 trust on
ip link set dev ens1f0 vf 1 trust on
ip link set dev ens1f0 vf 2 trust on
ip link set dev ens1f0 vf 3 trust on
ip link set dev ens1f0 vf 0 mac 00:22:33:44:55:66
ip link set dev ens1f0 vf 1 mac 00:22:33:44:55:67
ip link set dev ens1f0 vf 2 mac 00:22:33:44:55:68
ip link set dev ens1f0 vf 3 mac 00:22:33:44:55:69
echo 0000:17:02.0 > /sys/bus/pci/devices/0000\:17\:02.0/driver/unbind
echo 0000:17:02.1 > /sys/bus/pci/devices/0000\:17\:02.1/driver/unbind
echo 0000:17:02.2 > /sys/bus/pci/devices/0000\:17\:02.2/driver/unbind
echo 0000:17:02.3 > /sys/bus/pci/devices/0000\:17\:02.3/driver/unbind
sleep 10
echo 0000:17:02.0 > /sys/bus/pci/drivers/iavf/bind
echo 0000:17:02.1 > /sys/bus/pci/drivers/iavf/bind
echo 0000:17:02.2 > /sys/bus/pci/drivers/iavf/bind
echo 0000:17:02.3 > /sys/bus/pci/drivers/iavf/bind
modprobe vfio-pci
echo 8086 154c > /sys/bus/pci/drivers/vfio-pci/new_id
qemu-system-x86_64 -accel kvm -m 4096 -cpu host \
  -drive file=centos9.qcow2,if=none,id=virtio-disk0 \
  -device virtio-blk-pci,drive=virtio-disk0,bootindex=0 -smp 4 \
  -device vfio-pci,host=17:02.0 -net none \
  -device vfio-pci,host=17:02.1 -net none \
  -device vfio-pci,host=17:02.2 -net none \
  -device vfio-pci,host=17:02.3 -net none \
  -daemonize -vnc :5
Current result: There is a probability that the MAC of the VF in the guest is inconsistent with the one in the host.
Expected result: When passing a NIC VF through to a guest, the VF in the guest should always get the same MAC as the one in the host.
Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage") Signed-off-by: Radoslaw Tyl <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-14  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  Paolo Abeni  2  -14/+19
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <[email protected]>
2023-09-13  idpf: add SRIOV support and other ndo_ops  Joshua Hay  8  -3/+953
Add support for SRIOV: send the requested number of VFs to the device Control Plane, via the virtchnl message and then enable the VFs using 'pci_enable_sriov'. Add other ndo ops supported by the driver such as features_check, set_rx_mode, validate_addr, set_mac_address, change_mtu, get_stats64, set_features, and tx_timeout. Initialize the statistics task which requests the queue related statistics to the CP. Add loopback and promiscuous mode support and the respective virtchnl messages. Finally, add documentation and build support for the driver. Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  idpf: add ethtool callbacks  Alan Brady  7  -3/+1851
Initialize all the ethtool ops that are supported by the driver and add the necessary support for the ethtool callbacks. Also add asynchronous link notification virtchnl support, where the device Control Plane sends the link status and link speed as an asynchronous event message. The driver reports the link speed on the ethtool .idpf_get_link_ksettings query. Introduce a soft reset function, which is used by some of the ethtool callbacks such as .set_channels, .set_ringparam etc. to change the existing queue configuration. It deletes the existing queues by sending the delete queues virtchnl message to the CP and calls the 'vport_stop' flow, which disables the queues, vport etc. A new set of queues is then requested from the CP and the queue context is reconfigured by calling the 'vport_open' flow. The soft reset flow also adjusts the number of vectors associated with a vport if .set_channels is called. Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Alice Michael <[email protected]> Signed-off-by: Alice Michael <[email protected]> Co-developed-by: Joshua Hay <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  idpf: add singleq start_xmit and napi poll  Joshua Hay  8  -31/+1290
Add the start_xmit, TX and RX napi poll support for the single queue model. Unlike the split queue model, the single queue model uses the same queue to post buffer descriptors and completed descriptors. Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  idpf: add RX splitq napi poll support  Alan Brady  4  -6/+892
Add support to handle interrupts for the RX completion queue and RX buffer queue. When the interrupt fires on RX completion queue, process the RX descriptors that are received. Allocate and prepare the SKB with the RX packet info, for both data and header buffer. IDPF uses software maintained refill queues to manage buffers between RX queue producer and the buffer queue consumer. They are required in order to maintain a lockless buffer management system and are strictly software only constructs. Instead of updating the RX buffer queue tail with available buffers right after the clean routine, it posts the buffer ids to the refill queues, only to post them to the HW later. If the generic receive offload (GRO) is enabled in the capabilities and turned on by default or via ethtool, then HW performs the packet coalescing if certain criteria are met by the incoming packets and updates the RX descriptor. Similar to GRO, if generic checksum is enabled, HW computes the checksum and updates the respective fields in the descriptor. Add support to update the SKB fields with the GRO and the generic checksum received. Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Joshua Hay <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
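The refill-queue idea described above can be modeled as a single-producer/single-consumer ring of buffer ids. The following is a minimal Python sketch for illustration only, not the driver's code: the class name `RefillQueue` and its methods are hypothetical, and the full/empty detection here uses a plain head/tail comparison, which is an assumption rather than the scheme idpf uses.

```python
class RefillQueue:
    """Fixed-size ring of buffer ids with one producer and one consumer.

    Hypothetical model: the RX clean routine is the only writer of `head`,
    the buffer-queue refill path is the only writer of `tail`, so neither
    side needs a lock.
    """

    def __init__(self, size):
        self.ring = [None] * size
        self.size = size
        self.head = 0  # next slot the producer writes
        self.tail = 0  # next slot the consumer reads

    def post(self, buf_id):
        """Producer side: hand a cleaned buffer id back for later reuse."""
        nxt = (self.head + 1) % self.size
        if nxt == self.tail:
            return False  # ring is full; one slot is kept empty on purpose
        self.ring[self.head] = buf_id
        self.head = nxt
        return True

    def drain(self):
        """Consumer side: collect all pending buffer ids to post to HW."""
        ids = []
        while self.tail != self.head:
            ids.append(self.ring[self.tail])
            self.tail = (self.tail + 1) % self.size
        return ids
```

Keeping one slot empty lets the two sides distinguish "full" from "empty" without sharing a counter, at the cost of one unusable entry.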
2023-09-13  idpf: add TX splitq napi poll support  Joshua Hay  6  -5/+926
Add support to handle the interrupts for the TX completion queue and process the various completion types. In the flow scheduling mode, the driver processes primarily buffer completions as well as descriptor completions occasionally. This mode supports out of order TX completions. To do so, HW generates one buffer completion per packet. Each of those completions contains the unique tag provided during the TX encoding which is used to locate the packet either on the TX buffer ring or in a hash table. The hash table is used to track TX buffer information so the descriptor(s) for a given packet can be reused while the driver is still waiting on the buffer completion(s). Packets end up in the hash table in one of 2 ways: 1) a packet was stashed during descriptor completion cleaning, or 2) an out of order buffer completion was processed. A descriptor completion arrives only every so often and is primarily used to guarantee the TX descriptor ring can be reused without having to wait on the individual buffer completions. E.g. a descriptor completion for N+16 guarantees HW read all of the descriptors for packets N through N+15, therefore all of the buffers for packets N through N+15 are stashed into the hash table and the descriptors can be reused for more TX packets. Similarly, a packet can be stashed in the hash table because an out of order buffer completion was processed. E.g. processing a buffer completion for packet N+3 implies that HW read all of the descriptors for packets N through N+3 and they can be reused. However, the HW did not do the DMA yet. The buffers for packets N through N+2 cannot be freed, so they are stashed in the hash table. In either case, the buffer completions will eventually be processed for all of the stashed packets, and all of the buffers will be cleaned from the hash table. In queue based scheduling mode, the driver processes primarily descriptor completions and cleans the TX ring the conventional way.
Finally, the driver triggers a TX queue drain after sending the disable queues virtchnl message. When the HW completes the queue draining, it sends the driver a queue marker packet completion. The driver determines when all TX queues have been drained and proceeds with the disable flow. With this, the driver can send TX packets and clean up the resources properly. Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
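The two ways a packet ends up in the hash table can be illustrated with a small model. This is a hedged Python sketch, not the driver's data structures: `TxQueueModel` and its method names are hypothetical, and it assumes one descriptor per packet for simplicity.

```python
from collections import deque


class TxQueueModel:
    """Toy model of flow-scheduling TX cleaning (one descriptor per packet)."""

    def __init__(self):
        self.ring = deque()  # (tag, buffer) still occupying ring descriptors
        self.stash = {}      # tag -> buffer whose descriptor was reclaimed
        self.freed = []      # buffers released after their buffer completion

    def send(self, tag, buf):
        self.ring.append((tag, buf))

    def descriptor_completion(self, count):
        """HW read `count` descriptors: reclaim them, keep buffers stashed."""
        for _ in range(min(count, len(self.ring))):
            tag, buf = self.ring.popleft()
            self.stash[tag] = buf

    def buffer_completion(self, tag):
        """HW finished with the buffer of packet `tag`, possibly out of order."""
        if tag in self.stash:
            self.freed.append(self.stash.pop(tag))
            return
        if not any(t == tag for t, _ in self.ring):
            raise KeyError(tag)
        # Everything ahead of `tag` was read by HW but is not yet complete,
        # so those buffers move to the stash and their descriptors are reusable.
        while True:
            t, buf = self.ring.popleft()
            if t == tag:
                self.freed.append(buf)
                return
            self.stash[t] = buf
```

For example, after sending packets 0 through 3, a buffer completion for packet 3 frees that buffer and stashes packets 0 through 2 until their own completions arrive.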
2023-09-13  idpf: add splitq start_xmit  Joshua Hay  5  -1/+1107
Add start_xmit support for the split queue model. To start with, add the necessary checks to linearize the skb if it uses more buffers than the hardware supported limit. Stop the transmit queue if there are not enough descriptors available for the skb to use or if the completion queue could potentially be overrun. Finally, prepare the descriptor with all the required information and update the tail. Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
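The pre-transmit checks in this commit message can be sketched as two small predicates. This is an illustrative Python model under stated assumptions: the function names, the parameters and the exact overrun threshold (one more pending completion than the queue can hold) are hypothetical and are not taken from the driver.

```python
def needs_linearize(nr_bufs, hw_max_bufs):
    """True if the skb spans more buffers than HW accepts per packet."""
    return nr_bufs > hw_max_bufs


def should_stop_queue(free_descs, descs_needed, complq_pending, complq_size):
    """True if transmitting now could exhaust descriptors or completions.

    Assumption: each packet produces one completion, so the completion
    queue would overrun once pending completions plus this packet exceed
    its size.
    """
    if free_descs < descs_needed:
        return True
    return complq_pending + 1 > complq_size
```

Checking both conditions before writing any descriptor means the queue is stopped while the ring is still in a consistent state.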
2023-09-13  idpf: initialize interrupts and enable vport  Pavan Kumar Linga  10  -3/+1389
To further continue 'vport open', initialize all the resources required for the interrupts. To start with, initialize the queue vector indices with the ones received from the device Control Plane. Now that all the TX and RX queues are initialized, map the RX descriptor and buffer queues as well as TX completion queues to the allocated vectors. Initialize and enable the napi handler for the napi polling. Finally, request the IRQs for the interrupt vectors from the stack and setup the interrupt handler. Once the interrupt init is done, send 'map queue vector', 'enable queues' and 'enable vport' virtchnl messages to the CP to complete the 'vport open' flow. Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Joshua Hay <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  idpf: configure resources for RX queues  Alan Brady  8  -12/+1705
Similar to the TX, RX also supports both single and split queue models. In single queue model, the same descriptor queue is used by SW to post buffer descriptors to HW and by HW to post completed descriptors to SW. In split queue model, "RX buffer queues" are used to pass descriptor buffers from SW to HW whereas "RX queues" are used to post the descriptor completions i.e. descriptors that point to completed buffers, from HW to SW. "RX queue group" is a set of RX queues grouped together and will be serviced by a "RX buffer queue group". IDPF supports 2 buffer queues i.e. large buffer (4KB) queue and small buffer (2KB) queue per buffer queue group. HW uses large buffers for the 'hardware gro' feature and also if the packet size is more than 2KB; otherwise 2KB buffers are used. Add all the resources required for the RX queues initialization. Allocate memory for the RX queue and RX buffer queue groups. Initialize the software maintained refill queues for the buffer management algorithm. As with the TX queues, initialize the queue parameters for the RX queues and send the config RX queue virtchnl message to the device Control Plane. Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Alice Michael <[email protected]> Signed-off-by: Alice Michael <[email protected]> Co-developed-by: Joshua Hay <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
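The buffer-size rule stated above (large buffers for hardware GRO or packets over 2KB, small buffers otherwise) is applied by the HW, but it can be written down as a one-line decision. The Python sketch below only restates that rule; the function and constant names are hypothetical.

```python
LARGE_BUF = 4096  # 4KB buffer queue
SMALL_BUF = 2048  # 2KB buffer queue


def pick_rx_buffer_size(pkt_len, hw_gro):
    """Return the buffer size used for a packet per the rule in the text."""
    if hw_gro or pkt_len > SMALL_BUF:
        return LARGE_BUF
    return SMALL_BUF
```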
2023-09-13  idpf: configure resources for TX queues  Alan Brady  6  -0/+1410
IDPF supports two queue models i.e. single queue which is a traditional queueing model as well as split queue model. In single queue model, the same descriptor queue is used by SW to post descriptors to the HW, HW to post completed descriptors to SW. In split queue model, "TX Queues" are used to pass buffers from SW to HW and "TX Completion Queues" are used to post descriptor completions from HW to SW. Device supports asymmetric ratio of TX queues to TX completion queues. Considering this, queue group mechanism is used i.e. some TX queues are grouped together which will be serviced by only one TX completion queue per TX queue group. Add all the resources required for the TX queues initialization. To start with, allocate memory for the TX queue groups, TX queues and TX completion queues. Then, allocate the descriptors for both TX and TX completion queues, and bookkeeping buffers for TX queues alone. Also, allocate queue vectors for the vport and initialize the TX queue related fields for each queue vector. Initialize the queue parameters such as q_id, q_type and tail register offset with the info received from the device control plane (CP). Once all the TX queues are configured, send config TX queue virtchnl message to the CP with all the TX queue context information. Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Alice Michael <[email protected]> Signed-off-by: Alice Michael <[email protected]> Co-developed-by: Joshua Hay <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
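The queue group mechanism, where several TX queues share one TX completion queue, can be pictured with a small grouping helper. This Python sketch is illustrative only: `build_tx_groups` and the fixed `txq_per_group` split are assumptions, since the commit message does not say how the driver chooses group sizes.

```python
def build_tx_groups(num_txq, txq_per_group):
    """Split TX queue ids into groups, each served by one completion queue."""
    groups = []
    for start in range(0, num_txq, txq_per_group):
        end = min(start + txq_per_group, num_txq)
        groups.append({
            "complq": len(groups),            # one completion queue per group
            "txqs": list(range(start, end)),  # TX queues it services
        })
    return groups
```

With 5 TX queues and 2 queues per group, this yields three groups, the last holding a single TX queue.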
2023-09-13  idpf: add ptypes and MAC filter support  Pavan Kumar Linga  4  -1/+837
Add the virtchnl support to request the packet types. Parse the responses received from CP and based on the protocol headers, populate the packet type structure with necessary information. Initialize the MAC address and add the virtchnl support to add and del MAC address. Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Joshua Hay <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Co-developed-by: Shailendra Bhatnagar <[email protected]> Signed-off-by: Shailendra Bhatnagar <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  idpf: add create vport and netdev configuration  Pavan Kumar Linga  7  -20/+1522
Add the required support to create a vport by spawning the init task. Once the vport is created, initialize and allocate the resources needed for it. Configure and register a netdev for each vport with all the features supported by the device based on the capabilities received from the device Control Plane. Spawn the init task till all the default vports are created. Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Joshua Hay <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Co-developed-by: Shailendra Bhatnagar <[email protected]> Signed-off-by: Shailendra Bhatnagar <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  idpf: add core init and interrupt request  Pavan Kumar Linga  9  -3/+1471
As the mailbox is setup, add the necessary send and receive mailbox message framework to support the virtchnl communication between the driver and device Control Plane (CP). Add the core initialization. To start with, driver confirms the virtchnl version with the CP. Once that is done, it requests and gets the required capabilities and resources needed such as max vectors, queues etc. Based on the vector information received in 'VIRTCHNL2_OP_GET_CAPS', request the stack to allocate the required vectors. Finally add the interrupt handling mechanism for the mailbox queue and enable the interrupt. Note: Checkpatch issues a warning about IDPF_FOREACH_VPORT_VC_STATE and IDPF_GEN_STRING being complex macros and should be enclosed in parentheses but it's not the case. They are never used as a statement and instead only used to define the enum and array. Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Emil Tantilov <[email protected]> Signed-off-by: Emil Tantilov <[email protected]> Co-developed-by: Joshua Hay <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Co-developed-by: Shailendra Bhatnagar <[email protected]> Signed-off-by: Shailendra Bhatnagar <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  idpf: add controlq init and reset checks  Joshua Hay  14  -3/+1851
At the end of the probe, initialize and schedule the event workqueue. It calls the hard reset function where reset checks are done to find if the device is out of the reset. Control queue initialization and the necessary control queue support is added. Introduce function pointers for the register operations which are different between PF and VF devices. Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Co-developed-by: Shailendra Bhatnagar <[email protected]> Signed-off-by: Shailendra Bhatnagar <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  idpf: add module register and probe functionality  Phani Burra  5  -0/+193
Add the required support to register IDPF PCI driver, as well as probe and remove call backs. Enable the PCI device and request the kernel to reserve the memory resources that will be used by the driver. Finally map the BAR0 address space. Signed-off-by: Phani Burra <[email protected]> Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Shailendra Bhatnagar <[email protected]> Signed-off-by: Shailendra Bhatnagar <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Co-developed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  virtchnl: add virtchnl version 2 ops  Pavan Kumar Linga  2  -0/+1724
Virtchnl version 1 is an interface used by the current generation of foundational NICs to negotiate the capabilities and configure the HW resources such as queues, vectors, RSS LUT, etc between the PF and VF drivers. It is not extensible to enable new features supported in the next generation of NICs/IPUs and to negotiate descriptor types, packet types and register offsets. To overcome the limitations of the existing interface, introduce the virtchnl version 2 and add the necessary opcodes, structures, definitions, and descriptor formats. The driver also learns the data queue and other register offsets to use instead of hardcoding them. The advantage of this approach is that it gives the flexibility to modify the register offsets if needed, restrict the use of certain descriptor types and negotiate the supported packet types. Co-developed-by: Alan Brady <[email protected]> Signed-off-by: Alan Brady <[email protected]> Co-developed-by: Joshua Hay <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Madhu Chittim <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Co-developed-by: Phani Burra <[email protected]> Signed-off-by: Phani Burra <[email protected]> Co-developed-by: Sridhar Samudrala <[email protected]> Signed-off-by: Sridhar Samudrala <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  iavf: Add ability to turn off CRC stripping for VF  Norbert Zulinski  3  -1/+64
Previously, CRC stripping was always enabled for the VF. Now it is possible to turn off CRC stripping via ethtool: #ethtool -K <interface> rx-fcs on To turn off CRC stripping, VLAN stripping must first be disabled: #ethtool -K <interface> rx-vlan-offload off if any VLAN interfaces exist; otherwise VLAN stripping will be turned off by the driver. In iavf_configure_queues, add a check whether CRC stripping is enabled for the VF; if it is, set crc_disabled to false on every VF queue. In iavf_set_features, add a check whether the CRC stripping setting was changed and, if so, schedule a reset. Signed-off-by: Norbert Zulinski <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Signed-off-by: Ahmed Zaki <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  ice: Check CRC strip requirement for VLAN strip  Haiyue Wang  2  -9/+58
When VLAN strip is enabled, the CRC strip must not be disabled. And when the CRC strip is disabled, the VLAN strip should not be enabled. The driver needs to check the CRC strip disable setting before configuring the Rx/Tx queues. Otherwise, with the current error handling, the already set Tx queue context is not rolled back correctly, which causes the next Tx queue setup to fail with: "Failed to set LAN Tx queue context" Signed-off-by: Haiyue Wang <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Signed-off-by: Ahmed Zaki <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
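The ordering described above, validating the CRC strip setting for every queue before any queue context is written, can be sketched as follows. This is a simplified Python model, not the ice driver's code: `configure_rx_queues`, the `crc_disable` dictionary key and the `configured` list are hypothetical stand-ins.

```python
def configure_rx_queues(vlan_strip_enabled, rx_queues, configured):
    """Configure all queues, or none if the request is inconsistent.

    `rx_queues` is a list of dicts with hypothetical keys "id" and
    "crc_disable"; `configured` collects the ids that were set up.
    """
    # Validate the whole request first so a rejection leaves no
    # half-configured queue context behind that would need a rollback.
    if vlan_strip_enabled and any(q["crc_disable"] for q in rx_queues):
        return False
    for q in rx_queues:
        configured.append(q["id"])
    return True
```

Because the check runs before the loop, a rejected request never touches `configured`, which is the property the commit relies on to avoid the later setup failure.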
2023-09-13  ice: Support FCS/CRC strip disable for VF  Haiyue Wang  1  -0/+15
To support CRC strip enable/disable functionality, the VF needs to explicitly request the VIRTCHNL_VF_OFFLOAD_CRC offload. The queue context is then set up according to the crc_disable flag in the Rx queue configuration information. Signed-off-by: Haiyue Wang <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Signed-off-by: Ahmed Zaki <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-09-13  igb: clean up in all error paths when enabling SR-IOV  Corinna Vinschen  1  -1/+4
After commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), removing the igb module could hang or crash (depending on the machine) when the module has been loaded with the max_vfs parameter set to some value != 0. In case of one test machine with a dual port 82580, this hang occurred:
    [ 232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
    [ 233.093257] igb 0000:41:00.1: IOV Disabled
    [ 233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
    [ 233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
    [ 233.352248] igb 0000:41:00.0: device [8086:1516] error status/mask=00100000
    [ 233.361088] igb 0000:41:00.0: [20] UnsupReq (First)
    [ 233.368183] igb 0000:41:00.0: AER: TLP Header: 40000001 0000040f cdbfc00c c
    [ 233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
    [ 233.388779] igb 0000:41:00.1: device [8086:1516] error status/mask=00100000
    [ 233.397629] igb 0000:41:00.1: [20] UnsupReq (First)
    [ 233.404736] igb 0000:41:00.1: AER: TLP Header: 40000001 0000040f cdbfc00c c
    [ 233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
    [ 233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
    [ 233.546197] pcieport 0000:40:01.0: AER: device recovery failed
    [ 234.157244] igb 0000:41:00.0: IOV Disabled
    [ 371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
    [ 371.627489] Not tainted 6.4.0-dirty #2
    [ 371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
    [ 371.641000] task:irq/35-aerdrv state:D stack:0 pid:257 ppid:2 f0
    [ 371.650330] Call Trace:
    [ 371.653061] <TASK>
    [ 371.655407] __schedule+0x20e/0x660
    [ 371.659313] schedule+0x5a/0xd0
    [ 371.662824] schedule_preempt_disabled+0x11/0x20
    [ 371.667983] __mutex_lock.constprop.0+0x372/0x6c0
    [ 371.673237] ? __pfx_aer_root_reset+0x10/0x10
    [ 371.678105] report_error_detected+0x25/0x1c0
    [ 371.682974] ? __pfx_report_normal_detected+0x10/0x10
    [ 371.688618] pci_walk_bus+0x72/0x90
    [ 371.692519] pcie_do_recovery+0xb2/0x330
    [ 371.696899] aer_process_err_devices+0x117/0x170
    [ 371.702055] aer_isr+0x1c0/0x1e0
    [ 371.705661] ? __set_cpus_allowed_ptr+0x54/0xa0
    [ 371.710723] ? __pfx_irq_thread_fn+0x10/0x10
    [ 371.715496] irq_thread_fn+0x20/0x60
    [ 371.719491] irq_thread+0xe6/0x1b0
    [ 371.723291] ? __pfx_irq_thread_dtor+0x10/0x10
    [ 371.728255] ? __pfx_irq_thread+0x10/0x10
    [ 371.732731] kthread+0xe2/0x110
    [ 371.736243] ? __pfx_kthread+0x10/0x10
    [ 371.740430] ret_from_fork+0x2c/0x50
    [ 371.744428] </TASK>
The reproducer was a simple script:
    #!/bin/sh
    for i in `seq 1 5`; do
      modprobe -rv igb
      modprobe -v igb max_vfs=1
      sleep 1
      modprobe -rv igb
    done
It turned out that this could only be reproduced on 82580 (quad and dual-port), but not on 82576, i350 and i210. Further debugging showed that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because dev->is_physfn is 0 on 82580. Prior to commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), igb_enable_sriov() jumped into the "err_out" cleanup branch. After this commit it only returned the error code. So the cleanup didn't take place, and the incorrect VF setup in the igb_adapter structure fooled the igb driver into assuming that VFs have been set up where no VF actually existed. Fix this problem by cleaning up again if pci_enable_sriov() fails.
Fixes: 50f303496d92 ("igb: Enable SR-IOV after reinit") Signed-off-by: Corinna Vinschen <[email protected]> Reviewed-by: Akihiko Odaki <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-13  ixgbe: fix timestamp configuration code  Vadim Fedorenko  1  -13/+15
The commit in the Fixes tag introduced flags to control the status of hardware configuration while processing packets. At the same time, another structure is used to provide the timestamper configuration to user-space applications. The way it was coded makes these structures go out of sync easily. The repro is easy for 82599 chips:
    [root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1
    current settings:
    tx_type 0
    rx_filter 0
    new settings:
    tx_type 1
    rx_filter 12
The eth0 device is properly configured to timestamp any PTPv2 events.
    [root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1
    current settings:
    tx_type 1
    rx_filter 12
    SIOCSHWTSTAMP failed: Numerical result out of range
    The requested time stamping mode is not supported by the hardware.
The error is properly returned because HW doesn't support timestamping all packets. But adapter->flags is cleared of timestamp flags even though no HW configuration was done. From that point on, no RX timestamps are received by the user-space application. But the configuration shows good values:
    [root@hostname ~]# hwstamp_ctl -i eth0
    current settings:
    tx_type 1
    rx_filter 12
Fix the issue by applying new flags only when the HW was actually configured.
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices") Signed-off-by: Vadim Fedorenko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
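The fix pattern in this commit, computing the new flags on the side and committing them only after the request is accepted, can be shown with a small model. This Python sketch is an assumption-laden illustration: `AdapterModel`, `set_hwtstamp`, the flag name `RX_TSTAMP` and the set of supported filters are hypothetical and do not mirror the ixgbe code.

```python
import errno

SUPPORTED_RX_FILTERS = {0, 12}  # hypothetical: "none" and PTPv2 events


class AdapterModel:
    def __init__(self):
        self.flags = set()
        self.tx_type = 0
        self.rx_filter = 0


def set_hwtstamp(adapter, tx_type, rx_filter):
    """Apply a timestamping config; leave the adapter untouched on error."""
    new_flags = set(adapter.flags)  # work on a copy, not the live flags
    if rx_filter not in SUPPORTED_RX_FILTERS:
        return -errno.ERANGE        # rejected before any state changed
    if rx_filter == 0:
        new_flags.discard("RX_TSTAMP")
    else:
        new_flags.add("RX_TSTAMP")
    # Only now, with the request accepted, commit flags and config together.
    adapter.flags = new_flags
    adapter.tx_type = tx_type
    adapter.rx_filter = rx_filter
    return 0
```

Replaying the sequence from the commit message, a valid request followed by an unsupported one leaves both the flags and the reported configuration at the values of the first request, so they cannot drift apart.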
2023-09-11  iavf: Fix promiscuous mode configuration flow messages  Brett Creeley  3  -60/+74
Currently when configuring promiscuous mode on the AVF we detect a change in the netdev->flags. We use IFF_PROMISC and IFF_ALLMULTI to determine whether or not we need to request/release promiscuous mode and/or multicast promiscuous mode. The problem is that the AQ calls for setting/clearing promiscuous/multicast mode are treated separately. This leads to a case where we can trigger two promiscuous mode AQ calls in a row with the incorrect state. To fix this, make a few changes. Use IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE instead of the previous IAVF_FLAG_AQ_[REQUEST|RELEASE]_[PROMISC|ALLMULTI] flags. In iavf_set_rx_mode() detect if there is a change in the netdev->flags in comparison with adapter->flags and set the IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE aq_required bit. Then in iavf_process_aq_command() only check for IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE and call iavf_set_promiscuous() if it's set. In iavf_set_promiscuous() check again to see which (if any) promiscuous mode bits have changed when comparing the netdev->flags with the adapter->flags. Use this to set the flags which get sent to the PF driver. Add a spinlock that is used for updating current_netdev_promisc_flags and only allows one promiscuous mode AQ at a time. [1] Fixes the fact that we will only have one AQ call in the aq_required queue at any one time. [2] Streamlines the change in promiscuous mode to only set one AQ required bit. [3] This allows us to keep track of the current state of the flags and also makes it so we can take the most recent netdev->flags promiscuous mode state. [4] This fixes the problem where a change in the netdev->flags can cause IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE to be set in iavf_set_rx_mode(), but cleared in iavf_set_promiscuous() before the change is ever made via AQ call.
Fixes: 47d3483988f6 ("i40evf: Add driver support for promiscuous mode") Signed-off-by: Brett Creeley <[email protected]> Signed-off-by: Ahmed Zaki <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
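The single-bit scheme described in this commit can be modeled in a few lines. The Python sketch below is a simplified illustration, not the iavf code: `AdapterModel`, `set_rx_mode` and `process_aq_command` are hypothetical stand-ins, a `threading.Lock` plays the role of the spinlock, and the `sent` list stands in for messages to the PF. The `IFF_PROMISC` and `IFF_ALLMULTI` values match the Linux definitions.

```python
import threading

IFF_PROMISC = 0x100
IFF_ALLMULTI = 0x200
PROMISC_MASK = IFF_PROMISC | IFF_ALLMULTI


class AdapterModel:
    def __init__(self):
        self.current_promisc_flags = 0     # last state sent to the PF
        self.aq_configure_promisc = False  # the single "AQ required" bit
        self.lock = threading.Lock()       # stands in for the spinlock
        self.sent = []                     # flag words sent to the PF


def set_rx_mode(adapter, netdev_flags):
    """Request an AQ call only if the promiscuous bits actually changed."""
    with adapter.lock:
        if (netdev_flags & PROMISC_MASK) != adapter.current_promisc_flags:
            adapter.aq_configure_promisc = True


def process_aq_command(adapter, netdev_flags):
    """Send one message reflecting the most recent netdev flags."""
    with adapter.lock:
        if not adapter.aq_configure_promisc:
            return
        adapter.aq_configure_promisc = False
        wanted = netdev_flags & PROMISC_MASK
        if wanted == adapter.current_promisc_flags:
            return  # state changed back before the AQ call was made
        adapter.sent.append(wanted)
        adapter.current_promisc_flags = wanted
```

Two flag changes arriving before the AQ task runs collapse into a single message carrying the latest state, which is the behavior points [1] through [3] describe.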
2023-09-11  i40e: fix potential memory leaks in i40e_remove()  Andrii Staikov  1  -3/+7
Instead of freeing memory of a single VSI, make sure the memory for all VSIs is cleared before releasing VSIs. Add releasing of their resources in a loop with the iteration number equal to the number of allocated VSIs. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Andrii Staikov <[email protected]> Signed-off-by: Aleksandr Loktionov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>