aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice
AgeCommit message (Collapse)AuthorFilesLines
2023-01-30devlink: remove devlink featuresJiri Pirko1-1/+0
Devlink features were introduced to disallow devlink reload calls of userspace before the devlink was fully initialized. The reason for this workaround was the fact that devlink reload was originally called without devlink instance lock held. However, with recent changes that converted devlink reload to be performed under devlink instance lock, this is redundant so remove devlink features entirely. Note that mlx5 used this to enable devlink reload conditionally only when device didn't act as multi port slave. Move the multi port check into mlx5_devlink_reload_down() callback alongside with the other checks preventing the device from reload in certain states. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-11/+17
Conflicts: drivers/net/ethernet/intel/ice/ice_main.c 418e53401e47 ("ice: move devlink port creation/deletion") 643ef23bd9dd ("ice: Introduce local var for readability") https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/[email protected]/ drivers/net/ethernet/engleder/tsnep_main.c 3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues") 25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/[email protected]/ net/netfilter/nf_conntrack_proto_sctp.c 13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths") f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-27ice: Prevent set_channel from changing queues while RDMA activeDave Ertman5-19/+43
The PF controls the set of queues that the RDMA auxiliary_driver requests resources from. The set_channel command will alter that pool and trigger a reconfiguration of the VSI, which breaks RDMA functionality. Prevent set_channel from executing when RDMA driver bound to auxiliary device. Adding a locked variable to pass down the call chain to avoid double locking the device_lock. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-27ice: remove pointless calls to devlink_param_driverinit_value_set()Jiri Pirko1-18/+2
devlink_param_driverinit_value_set() call makes sense only for "driverinit" params. However here, both params are "runtime". devlink_param_driverinit_value_set() returns -EOPNOTSUPP in such case and does not do anything. So remove the pointless calls to devlink_param_driverinit_value_set() entirely. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-01-24ice: move devlink port creation/deletionPaul M Stillwell Jr2-11/+17
Commit a286ba738714 ("ice: reorder PF/representor devlink port register/unregister flows") moved the code to create and destroy the devlink PF port. This was fine, but created a corner case issue in the case of ice_register_netdev() failing. In that case, the driver would end up calling ice_devlink_destroy_pf_port() twice. Additionally, it makes no sense to tie creation of the devlink PF port to the creation of the netdev so separate out the code to create/destroy the devlink PF port from the netdev code. This makes it a cleaner interface. Fixes: a286ba738714 ("ice: reorder PF/representor devlink port register/unregister flows") Signed-off-by: Paul M Stillwell Jr <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-20ice: use GNSS subsystem instead of TTYArkadiusz Kubalewski4-257/+146
Previously support for GNSS was implemented as a TTY driver, it allowed to access GNSS receiver on /dev/ttyGNSS_<bus><func>. Use generic GNSS subsystem API instead of implementing own TTY driver. The receiver is accessible on /dev/gnss<id>. In case of multiple receivers in the OS, correct device can be found by enumerating either: - /sys/class/net/<eth port>/device/gnss/ - /sys/class/gnss/gnss<id>/device/ Using GNSS subsystem is superior to implementing own TTY driver, as the GNSS subsystem was designed solely for this purpose. It also implements TTY driver but in a common and defined way. From user perspective, there is no difference in communicating with a device, except new path to the device shall be used. The device will provide same information to the userspace as the old one, and can be used in the same way, i.e.: old # gpsmon /dev/ttyGNSS_2100_0 new # gpsmon /dev/gnss0 There is no other impact on userspace tools. User expecting onboard GNSS receiver support is required to enable CONFIG_GNSS=y/m in kernel config. Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: Karol Kolacinski <[email protected]> Signed-off-by: Michal Michalik <[email protected]> Signed-off-by: Arkadiusz Kubalewski <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-01-19ice: Remove excess spaceTony Nguyen1-1/+1
smatch reports inconsistent indenting due to an extra space; remove it to resolve the issue. smatch warnings: drivers/net/ethernet/intel/ice/ice_lib.c:1673 ice_vsi_alloc_ring_stats() warn: inconsistent indenting Reported-by: kernel test robot <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: Introduce local var for readabilityTony Nguyen1-3/+6
Based on previous feedback[1], introduce a local var to make things more readable. [1] https://lore.kernel.org/netdev/20220315203218.607f612b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ Signed-off-by: Tony Nguyen <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
2023-01-19ice: Match parameter name for ice_cfg_phy_fc()Tony Nguyen1-1/+1
The parameter name in the function declaration and definition do not match; adjust the naming for consistency and to avoid confusion. Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: Explicitly return 0Tony Nguyen2-2/+2
Previous checks, and goto, will catch all errors meaning these returns will only return 0; explicitly return 0 for these cases. Signed-off-by: Tony Nguyen <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
2023-01-19ice: Reduce scope of variablesTony Nguyen2-5/+10
There are some places where the scope of a variable can be reduced, so do that. Signed-off-by: Tony Nguyen <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
2023-01-19ice: Move support DDP code out of ice_flex_pipe.cSergey Temerkhanov6-2300/+2344
Currently, ice_flex_pipe.c includes the DDP loading functions and has grown large. Although flexible processing support code is related to DDP loading, these parts are distinct. Move the DDP loading functionality from ice_flex_pipe.c to a separate file. Signed-off-by: Sergey Temerkhanov <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: Remove cppcheck suppressionsTony Nguyen4-8/+0
The use of suppressions for cppcheck in the kernel does not look to be standard as the ice driver is the only one doing it. Remove the comments/suppressions. Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: combine cases in ice_ksettings_find_adv_link_speed()Przemek Kitszel1-12/+8
Combine if statements setting the same link speed together. Suggested-by: Tony Nguyen <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Acked-by: Anirudh Venkataramanan <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: Add support for 100G KR2/CR2/SR2 link reportingAnirudh Venkataramanan1-9/+32
Commit 2736d94f351b ("ethtool: Added support for 50Gbps per lane link modes") in v5.1 added (among other things) support for 100G CR2/KR2/SR2 link modes. Advertise these link modes if the firmware reports the corresponding PHY types. Signed-off-by: Anirudh Venkataramanan <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: add missing checks for PF vsi typeJesse Brandeburg1-9/+8
There were a few places we had missed checking the VSI type to make sure it was definitely a PF VSI, before calling setup functions intended only for the PF VSI. This doesn't fix any explicit bugs but cleans up the code in a few places and removes one explicit != vsi->type check that can be superseded by this code (it's a super set) Signed-off-by: Jesse Brandeburg <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: remove redundant non-null check in ice_setup_pf_sw()Anirudh Venkataramanan1-7/+5
Remove a redundant null check, as vsi could not be null at this point. Signed-off-by: Anirudh Venkataramanan <[email protected]> Signed-off-by: Przemek Kitszel <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: restrict PTP HW clock freq adjustments to 100, 000, 000 PPBSiddaraju DH1-1/+1
The PHY provides only 39b timestamp. With current timing implementation, we discard lower 7b, leaving 32b timestamp. The driver reconstructs the full 64b timestamp by correlating the 32b timestamp with cached_time for performance. The reconstruction algorithm does both forward & backward interpolation. The 32b timeval has overflow duration of 2^32 counts ~= 4.23 second. Due to interpolation in both direction, its now ~= 2.125 second IIRC, going with at least half a duration, the cached_time is updated with periodic thread of 1 second (worst-case) periodicity. But the 1 second periodicity is based on System-timer. With PPB adjustments, if the 1588 timers increments at say double the rate, (2s in-place of 1s), the Nyquist rate/half duration sampling/update of cached_time with 1 second periodic thread will lead to incorrect interpolations. Hence we should restrict the PPB adjustments to at least half duration of cached_time update which translates to 500,000,000 PPB. Since the periodicity of the cached-time system thread can vary, it is good to have some buffer time and considering practicality of PPB adjustments, limiting the max_adj to 100,000,000. Signed-off-by: Siddaraju DH <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: Support drop actionAmritha Nambiar2-20/+40
Currently the drop action is supported only in switchdev mode. Add support for offloading receive filters with action drop in ADQ/non-ADQ modes. This is in addition to other actions such as forwarding to a VSI (ADQ) or a queue (ADQ/non-ADQ). Also renamed 'ch_vsi' to 'dest_vsi' as it is valid for multiple actions such as forward to vsi/queue which may/may not create a channel vsi. Reviewed-by: Sridhar Samudrala <[email protected]> Signed-off-by: Amritha Nambiar <[email protected]> Tested-by: Bharathi Sreenivas <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: Handle LLDP MIB Pending changeAnatolii Gerasymenko3-15/+91
If the number of Traffic Classes (TC) is decreased, the FW will no longer remove TC nodes, but will send a pending change notification. This will allow RDMA to destroy corresponding Control QP markers. After RDMA finishes outstanding operations, the ice driver will send an execute MIB Pending change admin queue command to FW to finish DCB configuration change. The FW will buffer all incoming Pending changes, so there can be only one active Pending change. RDMA driver guarantees to remove Control QP markers within 5000 ms. Hence, LLDP response timeout txTTL (default 30 sec) will be met. In the case of a Pending change, LLDP MIB Change Event (opcode 0x0A01) will contain the whole new MIB. But Get LLDP MIB (opcode 0x0A00) AQ call would still return an old MIB, as the Pending change hasn't been applied yet. Add ice_get_dcb_cfg_from_mib_change() function to retrieve DCBX config from LLDP MIB Change Event's buffer for Pending changes. Co-developed-by: Dave Ertman <[email protected]> Signed-off-by: Dave Ertman <[email protected]> Signed-off-by: Anatolii Gerasymenko <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-19ice: Add 'Execute Pending LLDP MIB' Admin Queue commandTsotne Chakhvadze4-2/+33
In DCB Willing Mode (FW managed LLDP), when the link partner changes configuration which requires fewer TCs, the TCs that are no longer needed are suspended by EMP FW, removed, and never resumed. This occurs before a MIB change event is indicated to SW. The permanent suspension and removal of these TC nodes in the scheduler prevents RDMA from being able to destroy QPs associated with this TC, requiring a CORE reset to recover. A new DCBX configuration change flow is defined to allow SW driver and other SW components (RDMA) to properly adjust to the configuration changes before they are taking effect in HW. This flow includes a two-way handshake between EMP FW<->LAN SW<->RDMA SW. List of changes: - Add 'Execute Pending LLDP MIB' AQC. - Add 'Pending Event Enable' bit. - Add additional logic to ignore Pending Event Enable' request while 'LLDP MIB Chnage' event is disabled. - Add 'Execute Pending LLDP MIB' AQC sending function to FW, which is needed to take place MIB Event change. Signed-off-by: Tsotne Chakhvadze <[email protected]> Co-developed-by: Karen Sornek <[email protected]> Signed-off-by: Karen Sornek <[email protected]> Co-developed-by: Dave Ertman <[email protected]> Signed-off-by: Dave Ertman <[email protected]> Co-developed-by: Anatolii Gerasymenko <[email protected]> Signed-off-by: Anatolii Gerasymenko <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-09ice: Add check for kzallocJiasheng Jiang1-9/+14
Add the check for the return value of kzalloc in order to avoid NULL pointer dereference. Moreover, use the goto-label to share the clean code. Fixes: d6b98c8d242a ("ice: add write functionality for GNSS TTY") Signed-off-by: Jiasheng Jiang <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-09ice: Fix potential memory leak in ice_gnss_tty_write()Yuan Can1-0/+1
The ice_gnss_tty_write() return directly if the write_buf alloc failed, leaking the cmd_buf. Fix by free cmd_buf if write_buf alloc failed. Fixes: d6b98c8d242a ("ice: add write functionality for GNSS TTY") Signed-off-by: Yuan Can <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-21ice: xsk: do not use xdp_return_frame() on tx_buf->raw_bufMaciej Fijalkowski1-1/+1
Previously ice XDP xmit routine was changed in a way that it avoids xdp_buff->xdp_frame conversion as it is simply not needed for handling XDP_TX action and what is more it saves us CPU cycles. This routine is re-used on ZC driver to handle XDP_TX action. Although for XDP_TX on Rx ZC xdp_buff that comes from xsk_buff_pool is converted to xdp_frame, xdp_frame itself is not stored inside ice_tx_buf, we only store raw data pointer. Casting this pointer to xdp_frame and calling against it xdp_return_frame in ice_clean_xdp_tx_buf() results in undefined behavior. To fix this, simply call page_frag_free() on tx_buf->raw_buf. Later intention is to remove the buff->frame conversion in order to simplify the codebase and improve XDP_TX performance on ZC. Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Reported-and-tested-by: Robin Cowley <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Piotr Raczynski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-08ice: reschedule ice_ptp_wait_for_offset_valid during resetJacob Keller1-1/+6
If the ice_ptp_wait_for_offest_valid function is scheduled to run while the driver is resetting, it will exit without completing calibration. The work function gets scheduled by ice_ptp_port_phy_restart which will be called as part of the reset recovery process. It is possible for the first execution to occur before the driver has completely cleared its resetting flags. Ensure calibration completes by rescheduling the task until reset is fully completed. Reported-by: Siddaraju DH <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: make Tx and Rx vernier offset calibration independentSiddaraju DH3-173/+91
The Tx and Rx calibration and timestamp generation blocks are independent. However, the ice driver waits until both blocks are ready before configuring either block. This can result in delay of configuring one block because we have not yet received a packet in the other block. There is no reason to wait to finish programming Tx just because we haven't received a packet. Similarly there is no reason to wait to program Rx just because we haven't transmitted a packet. Instead of checking both offset status before programming either block, refactor the ice_phy_cfg_tx_offset_e822 and ice_phy_cfg_rx_offset_e822 functions so that they perform their own offset status checks. Additionally, make them also check the offset ready bit to determine if the offset values have already been programmed. Call the individual configure functions directly in ice_ptp_wait_for_offset_valid. The functions will now correctly check status, and program the offsets if ready. Once the offset is programmed, the functions will exit quickly after just checking the offset ready register. Remove the ice_phy_calc_vernier_e822 in ice_ptp_hw.c, as well as the offset valid check functions in ice_ptp.c entirely as they are no longer necessary. With this change, the Tx and Rx blocks will each be enabled as soon as possible without waiting for the other block to complete calibration. This can enable timestamps faster in setups which have a low rate of transmitted or received packets. In particular, it can stop a situation where one port never receives traffic, and thus never finishes calibration of the Tx block, resulting in continuous faults reported by the ptp4l daemon application. Signed-off-by: Siddaraju DH <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: only check set bits in ice_ptp_flush_tx_trackerJacob Keller1-9/+29
The ice_ptp_flush_tx_tracker function is called to clear all outstanding Tx timestamp requests when the port is being brought down. This function iterates over the entire list, but this is unnecessary. We only need to check the bits which are actually set in the ready bitmap. Replace this logic with for_each_set_bit, and follow a similar flow as in ice_ptp_tx_tstamp_cleanup. Note that it is safe to call dev_kfree_skb_any on a NULL pointer as it will perform a no-op so we do not need to verify that the skb is actually NULL. The new implementation also avoids clearing (and thus reading!) the PHY timestamp unless the index is marked as having a valid timestamp in the timestamp status bitmap. This ensures that we properly clear the status registers as appropriate. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstampJacob Keller2-36/+67
In the event of a PTP clock time change due to .adjtime or .settime, the ice driver needs to update the cached copy of the PHC time and also discard any outstanding Tx timestamps. This is required because otherwise the wrong copy of the PHC time will be used when extending the Tx timestamp. This could result in reporting incorrect timestamps to the stack. The current approach taken to handle this is to call ice_ptp_flush_tx_tracker, which will discard any timestamps which are not yet complete. This is problematic for two reasons: 1) it could lead to a potential race condition where the wrong timestamp is associated with a future packet. This can occur with the following flow: 1. Thread A gets request to transmit a timestamped packet, and picks an index and transmits the packet 2. Thread B calls ice_ptp_flush_tx_tracker and sees the index in use, marking is as disarded. No timestamp read occurs because the status bit is not set, but the index is released for re-use 3. Thread A gets a new request to transmit another timestamped packet, picks the same (now unused) index and transmits that packet. 4. The PHY transmits the first packet and updates the timestamp slot and generates an interrupt. 5. The ice_ptp_tx_tstamp thread executes and sees the interrupt and a valid timestamp but associates it with the new Tx SKB and not the one that actual timestamp for the packet as expected. This could result in the previous timestamp being assigned to a new packet producing incorrect timestamps and leading to incorrect behavior in PTP applications. This is most likely to occur when the packet rate for Tx timestamp requests is very high. 2) on E822 hardware, we must avoid reading a timestamp index more than once each time its status bit is set and an interrupt is generated by hardware. We do have some extensive checks for the unread flag to ensure that only one of either the ice_ptp_flush_tx_tracker or ice_ptp_tx_tstamp threads read the timestamp. However, even with this we can still have cases where we "flush" a timestamp that was actually completed in hardware. This can lead to cases where we don't read the timestamp index as appropriate. To fix both of these issues, we must avoid calling ice_ptp_flush_tx_tracker outside of the teardown path. Rather than using ice_ptp_flush_tx_tracker, introduce a new state bitmap, the stale bitmap. Start this as cleared when we begin a new timestamp request. When we're about to extend a timestamp and send it up to the stack, first check to see if that stale bit was set. If so, drop the timestamp without sending it to the stack. When we need to update the cached PHC timestamp out of band, just mark all currently outstanding timestamps as stale. This will ensure that once hardware completes the timestamp we'll ignore it correctly and avoid reporting bogus timestamps to userspace. With this change, we fix potential issues caused by calling ice_ptp_flush_tx_tracker during normal operation. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: cleanup allocations in ice_ptp_alloc_tx_trackerJacob Keller1-7/+11
The ice_ptp_alloc_tx_tracker function must allocate the timestamp array and the bitmap for tracking the currently in use indexes. A future change is going to add yet another allocation to this function. If these allocations fail we need to ensure that we properly cleanup and ensure that the pointers in the ice_ptp_tx structure are NULL. Simplify this logic by allocating to local variables first. If any allocation fails, then free everything and exit. Only update the ice_ptp_tx structure if all allocations succeed. This ensures that we have no side effects on the Tx structure unless all allocations have succeeded. Thus, no code will see an invalid pointer and we don't need to re-assign NULL on cleanup. This is safe because kernel "free" functions are designed to be NULL safe and perform no action if passed a NULL pointer. Thus its safe to simply always call kfree or bitmap_free even if one of those pointers was NULL. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: protect init and calibrating check in ice_ptp_request_tsJacob Keller2-6/+32
When requesting a new timestamp, the ice_ptp_request_ts function does not hold the Tx tracker lock while checking init and calibrating. This means that we might issue a new timestamp request just after the Tx timestamp tracker starts being deinitialized. This could lead to incorrect access of the timestamp structures. Correct this by moving the init and calibrating checks under the lock, and updating the flows which modify these fields to use the lock. Note that we do not need to hold the lock while checking for tx->init in ice_ptp_tx_tstamp. This is because the teardown function will use synchronize_irq after clearing the flag to ensure that the threaded interrupt completes. Either a) the tx->init flag will be cleared before the ice_ptp_tx_tstamp function starts, thus it will exit immediately, or b) the threaded interrupt will be executing and the synchronize_irq will wait until the threaded interrupt has completed at which point we know the init field has definitely been set and new interrupts will not execute the Tx timestamp thread function. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: synchronize the misc IRQ when tearing down Tx trackerJacob Keller1-4/+6
Since commit 1229b33973c7 ("ice: Add low latency Tx timestamp read") the ice driver has used a threaded IRQ for handling Tx timestamps. This change did not add a call to synchronize_irq during ice_ptp_release_tx_tracker. Thus it is possible that an interrupt could occur just as the tracker is being removed. This could lead to a use-after-free of the Tx tracker structure data. Fix this by calling sychronize_irq in ice_ptp_release_tx_tracker after we've cleared the init flag. In addition, make sure that we re-check the init flag at the end of ice_ptp_tx_tstamp before we exit ensuring that we will stop polling for new timestamps once the tracker de-initialization has begun. Refactor the ts_handled variable into "more_timestamps" so that we can simply directly assign this boolean instead of relying on an initialized value of true. This makes the new combined check easier to read. With this change, the ice_ptp_release_tx_tracker function will now wait for the threaded interrupt to complete if it was executing while the init flag was cleared. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: check Tx timestamp memory register for ready timestampsJacob Keller4-12/+126
The PHY for E822 based hardware has a register which indicates which timestamps are valid in the PHY timestamp memory block. Each bit in the register indicates whether the associated index in the timestamp memory is valid. Hardware sets this bit when the timestamp is captured, and clears the bit when the timestamp is read. Use of this register is important as reading timestamp registers can impact the way that hardware generates timestamp interrupts. This occurs because the PHY has an internal value which is incremented when hardware captures a timestamp and decremented when software reads a timestamp. Reading timestamps which are not marked as valid still decrement the internal value and can result in the Tx timestamp interrupt not triggering in the future. To prevent this, use the timestamp memory value to determine which timestamps are ready to be read. The ice_get_phy_tx_tstamp_ready function reads this value. For E810 devices, this just always returns with all bits set. Skip any timestamp which is not set in this bitmap, avoiding reading extra timestamps on E822 devices. The stale check against a cached timestamp value is no longer necessary for PHYs which support the timestamp ready bitmap properly. E810 devices still need this. Introduce a new verify_cached flag to the ice_ptp_tx structure. Use this to determine if we need to perform the verification against the cached timestamp value. Set this to 1 for the E810 Tx tracker init function. Notice that many of the fields in ice_ptp_tx are simple 1 bit flags. Save some structure space by using bitfields of length 1 for these values. Modify the ICE_PTP_TS_VALID check to simply drop the timestamp immediately so that in an event of getting such an invalid timestamp the driver does not attempt to re-read the timestamp again in a future poll of the register. With these changes, the driver now reads each timestamp register exactly once, and does not attempt any re-reads. This ensures the interrupt tracking logic in the PHY will not get stuck. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: handle discarding old Tx requests in ice_ptp_tx_tstampJacob Keller1-63/+48
Currently the driver uses the PTP kthread to process handling and discarding of stale Tx timestamp requests. The function ice_ptp_tx_tstamp_cleanup is used for this. A separate thread creates complications for the driver as we now have both the main Tx timestamp processing IRQ checking timestamps as well as the kthread. Rather than using the kthread to handle this, simply check for stale timestamps within the ice_ptp_tx_tstamp function. This function must already process the timestamps anyways. If a Tx timestamp has been waiting for 2 seconds we simply clear the bit and discard the SKB. This avoids the complication of having separate threads polling, reducing overall CPU work. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: always call ice_ptp_link_change and make it voidJacob Keller3-21/+19
The ice_ptp_link_change function is currently only called for E822 based hardware. Future changes are going to extend this function to perform additional tasks on link change. Always call this function, moving the E810 check from the callers down to just before we call the E822-specific function required to restart the PHY. This function also returns an error value, but none of the callers actually check it. In general, the errors it produces are more likely systemic problems such as invalid or corrupt port numbers. No caller checks these, and so no warning is logged. Re-order the flag checks so that ICE_FLAG_PTP is checked first. Drop the unnecessary check for ICE_FLAG_PTP_SUPPORTED, as ICE_FLAG_PTP will not be set except when ICE_FLAG_PTP_SUPPORTED is set. Convert the port checks to WARN_ON_ONCE, in order to generate a kernel stack trace when they are hit. Convert the function to void since no caller actually checks these return values. Co-developed-by: Dave Ertman <[email protected]> Signed-off-by: Dave Ertman <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: fix misuse of "link err" with "link status"Jacob Keller1-1/+1
The ice_ptp_link_change function has a comment which mentions "link err" when referring to the current link status. We are storing the status of whether link is up or down, which is not an error. It is appears that this use of err accidentally got included due to an overzealous search and replace when removing the ice_status enum and local status variable. Fix the wording to use the correct term. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: Reset TS memory for all quadsKarol Kolacinski3-27/+42
In E822 products, the owner PF should reset memory for all quads, not only for the one where assigned lport is. Signed-off-by: Karol Kolacinski <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: Remove the E822 vernier "bypass" logicMilena Olech3-145/+14
The E822 devices support an extended "vernier" calibration which enables higher precision timestamps by accounting for delays in the PHY, and compensating for them. These delays are measured by hardware as part of its vernier calibration logic. The driver currently starts the PHY in "bypass" mode which skips the compensation. Then it later attempts to switch from bypass to vernier. This unfortunately does not work as expected. Instead of properly compensating for the delays, the hardware continues operating in bypass without the improved precision expected. Because we cannot dynamically switch between bypass and vernier mode, refactor the driver to always operate in vernier mode. This has a slight downside: Tx timestamp and Rx timestamp requests that occur as the very first packet set after link up will not complete properly and may be reported to applications as missing timestamps. This occurs frequently in test environments where traffic is light or targeted specifically at testing PTP. However, in practice most environments will have transmitted or received some data over the network before such initial requests are made. Signed-off-by: Milena Olech <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-12-08ice: Use more generic names for ice_ptp_tx fieldsSergey Temerkhanov2-18/+19
Some supported devices have per-port timestamp memory blocks while others have shared ones within quads. Rename the struct ice_ptp_tx fields to reflect the block entities it works with Signed-off-by: Sergey Temerkhanov <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-11-30net: devlink: let the core report the driver name instead of the driversVincent Mailhol1-6/+0
The driver name is available in device_driver::name. Right now, drivers still have to report this piece of information themselves in their devlink_ops::info_get callback function. In order to factorize code, make devlink_nl_info_fill() add the driver name attribute. Now that the core sets the driver name attribute, drivers are not supposed to call devlink_info_driver_name_put() anymore. Remove devlink_info_driver_name_put() and clean-up all the drivers using this function in their callback. Signed-off-by: Vincent Mailhol <[email protected]> Tested-by: Ido Schimmel <[email protected]> # mlxsw Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30ice: implement direct read for NVM and Shadow RAM regionsJacob Keller1-0/+69
Implement the .read handler for the NVM and Shadow RAM regions. This enables user space to read a small chunk of the flash without needing the overhead of creating a full snapshot. Update the documentation for ice to detail which regions have direct read support. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30ice: use same function to snapshot both NVM and Shadow RAMJacob Keller1-72/+23
The ice driver supports a region for both the flat NVM contents as well as the Shadow RAM which is a layer built on top of the flash during device initialization. These regions use an almost identical read function, except that the NVM needs to set the direct flag when reading, while Shadow RAM needs to read without the direct flag set. They each call ice_read_flat_nvm with the only difference being whether to set the direct flash flag. The NVM region read function also was fixed to read the NVM in blocks to avoid a situation where the firmware reclaims the lock due to taking too long. Note that the region snapshot function takes the ops pointer so the function can easily determine which region to read. Make use of this and re-use the NVM snapshot function for both the NVM and Shadow RAM regions. This makes Shadow RAM benefit from the same block approach as the NVM region. It also reduces code in the ice driver. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-16/+16
tools/lib/bpf/ringbuf.c 927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap") b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-23ice: Use ICE_RLAN_BASE_S instead of magic numberAnatolii Gerasymenko1-1/+1
Commit 72adf2421d9b ("ice: Move common functions out of ice_main.c part 2/7") moved an older version of ice_setup_rx_ctx() function with usage of magic number 7. Reimplement the commit 5ab522443bd1 ("ice: Cleanup magic number") to use ICE_RLAN_BASE_S instead of magic number. Signed-off-by: Anatolii Gerasymenko <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-11-23ice: Fix configuring VIRTCHNL_OP_CONFIG_VSI_QUEUES with unbalanced queuesMarcin Szycik1-20/+17
Currently the VIRTCHNL_OP_CONFIG_VSI_QUEUES command may fail if there are less RX queues than TX queues requested. To fix it, only configure RXDID if RX queue exists. Fixes: e753df8fbca5 ("ice: Add support Flex RXD") Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-11-23ice: Accumulate ring statistics over resetBenjamin Mikailenko9-61/+351
Resets may occur with or without user interaction. For example, a TX hang or reconfiguration of parameters will result in a reset. During reset, the VSI is freed, freeing any statistics structures inside as well. This would create an issue for the user where a reset happens in the background, statistics set to zero, and the user checks ring statistics expecting them to be populated. To ensure this doesn't happen, accumulate ring statistics over reset. Define a new ring statistics structure, ice_ring_stats. The new structure lives in the VSI's parent, preserving ring statistics when VSI is freed. 1. Define a new structure vsi_ring_stats in the PF scope 2. Allocate/free stats only during probe, unload, or change in ring size 3. Replace previous ring statistics functionality with new structure Signed-off-by: Benjamin Mikailenko <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-11-23ice: Accumulate HW and Netdev statistics over resetBenjamin Mikailenko4-4/+37
Resets happen with or without user interaction. For example, incidents such as TX hang or a reconfiguration of parameters will result in a reset. During reset, hardware and software statistics were set to zero. This created an issue for the user where a reset happens in the background, statistics set to zero, and the user checks statistics expecting them to be populated. To ensure this doesn't happen, keep accumulating stats over reset. 1. Remove function calls which reset hardware and netdev statistics. 2. Do not rollover statistics in ice_stat_update40 during reset. Signed-off-by: Benjamin Mikailenko <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-11-23ice: Remove and replace ice speed defines with ethtool.h versionsBrett Creeley5-109/+69
The driver is currently using ICE_LINK_SPEED_* defines that mirror what ethtool.h defines, with one exception ICE_LINK_SPEED_UNKNOWN. This issue is fixed by the following changes: 1. replace ICE_LINK_SPEED_UNKNOWN with 0 because SPEED_UNKNOWN in ethtool.h is "-1" and that doesn't match the driver's expected behavior 2. transform ICE_LINK_SPEED_*MBPS to SPEED_* using static tables and fls()-1 to convert from BIT() to an index in a table. Suggested-by: Alexander Lobakin <[email protected]> Signed-off-by: Brett Creeley <[email protected]> Co-developed-by: Jesse Brandeburg <[email protected]> Signed-off-by: Jesse Brandeburg <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-11-23ice: Check for PTP HW lock more frequentlyKarol Kolacinski1-5/+7
It was observed that PTP HW semaphore can be held for ~50 ms in worst case. SW should wait longer and check more frequently if the HW lock is held. Signed-off-by: Karol Kolacinski <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-11-21ice: fix handling of burst Tx timestampsJacob Keller2-16/+16
Commit 1229b33973c7 ("ice: Add low latency Tx timestamp read") refactored PTP timestamping logic to use a threaded IRQ instead of a separate kthread. This implementation introduced ice_misc_intr_thread_fn and redefined the ice_ptp_process_ts function interface to return a value of whether or not the timestamp processing was complete. ice_misc_intr_thread_fn would take the return value from ice_ptp_process_ts and convert it into either IRQ_HANDLED if there were no more timestamps to be processed, or IRQ_WAKE_THREAD if the thread should continue processing. This is not correct, as the kernel does not re-schedule threaded IRQ functions automatically. IRQ_WAKE_THREAD can only be used by the main IRQ function. This results in the ice_ptp_process_ts function (and in turn the ice_ptp_tx_tstamp function) from only being called exactly once per interrupt. If an application sends a burst of Tx timestamps without waiting for a response, the interrupt will trigger for the first timestamp. However, later timestamps may not have arrived yet. This can result in dropped or discarded timestamps. Worse, on E822 hardware this results in the interrupt logic getting stuck such that no future interrupts will be triggered. The result is complete loss of Tx timestamp functionality. Fix this by modifying the ice_misc_intr_thread_fn to perform its own polling of the ice_ptp_process_ts function. We sleep for a few microseconds between attempts to avoid wasting significant CPU time. The value was chosen to allow time for the Tx timestamps to complete without wasting so much time that we overrun application wait budgets in the worst case. The ice_ptp_process_ts function also currently returns false in the event that the Tx tracker is not initialized. This would result in the threaded IRQ handler never exiting if it gets started while the tracker is not initialized. Fix the function to appropriately return true when the tracker is not initialized. Note that this will not reproduce with default ptp4l behavior, as the program always synchronously waits for a timestamp response before sending another timestamp request. Reported-by: Siddaraju DH <[email protected]> Fixes: 1229b33973c7 ("ice: Add low latency Tx timestamp read") Signed-off-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-17ice: Prevent ADQ, DCB coexistence with Custom Tx schedulerMichal Wilczynski6-0/+111
ADQ, DCB might interfere with Custom Tx Scheduler changes that user might introduce using devlink-rate API. Check if ADQ, DCB is active, when user tries to change any setting in exported Tx scheduler tree. If any of those are active block the user from doing so, and log an appropriate message. Remove the exported hierarchy if user enable ADQ or DCB. Prevent ADQ or DCB from getting configured if user already made some changes using devlink-rate API. Signed-off-by: Michal Wilczynski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>