path: root/drivers/net/ethernet/intel/ice/ice_lib.c
2021-10-15  ice: introduce XDP_TX fallback path  (Maciej Fijalkowski, 1 file, -1/+3)
Under rare circumstances the requirement of one XDP Tx queue per CPU cannot be fulfilled and some Tx resources have to be shared between CPUs. This requires placing accesses to xdp_ring inside a critical section protected by a spinlock. These accesses happen in the hot path, so introduce a static branch that is enabled from the control plane when the driver could not provide a dedicated XDP Tx queue for each CPU.

The current design allows any number of XDP Tx queues that is at least half the number of CPUs on the platform; for anything lower the driver bails out and tells the user that there were not enough Tx resources to configure XDP. Ring sharing is signalled via static branch enablement, which in turn indicates that the lock for xdp_ring accesses needs to be taken in the hot path. The static branch approach has no impact on the performance of the non-fallback path.

Note that the static branch acts as a global driver switch, meaning that if one PF runs out of Tx resources, the other PFs serviced by the ice driver will suffer as well. However, given that the hardware handled by the ice driver has 1024 Tx queues per PF, this is currently an unlikely scenario.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
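A minimal sketch of the hot-path pattern described above, assuming a static key and a per-ring tx_lock; the names and the surrounding descriptor handling are illustrative, not the driver's exact code.

#include <linux/jump_label.h>
#include <linux/spinlock.h>

/* Minimal stand-in for the XDP Tx ring; only the lock matters here */
struct xdp_tx_ring_sketch {
        spinlock_t tx_lock;
        /* ... descriptor ring fields ... */
};

/* Assumed key name; enabled from the control plane when fewer XDP Tx
 * rings than CPUs could be allocated.
 */
DEFINE_STATIC_KEY_FALSE(xdp_locking_key);

static void enable_xdp_ring_sharing(void)
{
        static_branch_inc(&xdp_locking_key);
}

/* Hot path: the lock is taken only when rings are shared, so the
 * dedicated per-CPU ring case pays no locking cost.
 */
static void xmit_on_xdp_ring(struct xdp_tx_ring_sketch *xdp_ring)
{
        if (static_branch_unlikely(&xdp_locking_key))
                spin_lock(&xdp_ring->tx_lock);

        /* ... produce Tx descriptors onto xdp_ring and bump the tail ... */

        if (static_branch_unlikely(&xdp_locking_key))
                spin_unlock(&xdp_ring->tx_lock);
}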
2021-10-15  ice: unify xdp_rings accesses  (Maciej Fijalkowski, 1 file, -1/+1)
There has been a long-standing issue of improper xdp_rings indexing for XDP_TX and XDP_REDIRECT actions. Because rx_ring->q_index is currently mixed with smp_processor_id(), Tx descriptors can be produced onto an XDP Tx ring whose tail is never bumped, for example when a particular queue id is pinned to a non-matching IRQ line. Address this by ignoring the user ring count setting and always initializing the xdp_rings array to num_possible_cpus() entries, then always using smp_processor_id() as the index into the xdp_rings array. This provides serialization, as only a single softirq can run on a particular CPU at a given time.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
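A tiny illustration of the per-CPU ring selection described above; the struct names are placeholders and the real ring type of that era differs.

#include <linux/smp.h>

struct xdp_ring_sketch;

struct vsi_sketch {
        struct xdp_ring_sketch **xdp_rings;     /* sized num_possible_cpus() */
};

/* Called from softirq context: smp_processor_id() is always a valid index
 * because the array covers every possible CPU, and only one softirq runs
 * per CPU at a time, which serializes access to the chosen ring.
 */
static struct xdp_ring_sketch *pick_xdp_ring(struct vsi_sketch *vsi)
{
        return vsi->xdp_rings[smp_processor_id()];
}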
2021-10-15  ice: split ice_ring onto Tx/Rx separate structs  (Maciej Fijalkowski, 1 file, -24/+40)
While it was convenient to have a generic ring structure that served both Tx and Rx sides, next commits are going to introduce several Tx-specific fields, so in order to avoid hurting the Rx side, let's pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs. Rx ring could be handled by the old ice_ring which would reduce the code churn within this patch, but this would make things asymmetric. Make the union out of the ring container within ice_q_vector so that it is possible to iterate over newly introduced ice_tx_ring. Remove the @size as it's only accessed from control path and it can be calculated pretty easily. Change definitions of ice_update_ring_stats and ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be used for both Rx and Tx rings. Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In Rx ring xdp_rxq_info occupies its own cacheline, so it's the major difference now. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15  ice: remove ring_active from ice_ring  (Maciej Fijalkowski, 1 file, -2/+0)
This field is dead and driver is not making any use of it. Simply remove it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14  ice: Fix failure to re-add LAN/RDMA Tx queues  (Brett Creeley, 1 file, -0/+9)
Currently if the VSI is rebuilt/removed and the RDMA PF driver is active the RDMA Tx queue scheduler node configuration will not be cleaned up. This will cause the rebuild/re-add of the VSI to fail due to the software structures not being correctly cleaned up for the VSI index. Fix this by always calling ice_rm_vsi_rdma_cfg() for all VSI. If there are no RDMA scheduler nodes created, then there is no harm in calling ice_rm_vsi_rdma_cfg(). This change applies to all VSI types, so if RDMA support is added for other VSI types they will also get this change. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Jerzy Wiktor Jurkowski <jerzy.wiktor.jurkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14  ice: Implement support for SMA and U.FL on E810-T  (Maciej Machnikowski, 1 file, -0/+15)
Expose SMA and U.FL connectors as ptp_pins on E810-T based adapters and allow controlling them. E810-T adapters are equipped with:
- 2 external bidirectional SMA connectors
- 1 internal TX U.FL
- 1 internal RX U.FL

The U.FL connectors share signal lines with the SMA connectors: TX U.FL1 shares its line with SMA1 and RX U.FL2 shares its line with SMA2. This dependency is controlled by ice_verify_pin_e810t. Additionally, add support for E810-T-based devices which don't use the SMA/U.FL controller: if the IO expander is not detected, don't expose the pins and use 2 predefined 1PPS input and output pins instead.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: introduce new type of VSI for switchdev  (Grzegorz Nitka, 1 file, -1/+47)
A new VSI type has to be defined for the switchdev control plane VSI. The number of allocated Tx and Rx queues has to be equal to the number of VFs, because each port representor should have one Tx and one Rx queue. Also, to avoid increasing the number of used IRQs too much, the control plane VSI uses only one q_vector and handles all queues in one IRQ. To allow handling all queues in one IRQ, a new function to clean MSI-X for the eswitch was introduced; it schedules NAPI for each representor instead of only for one, as the normal clean IRQ function does. Only one additional MSI-X has to be requested, and it is always requested in the ice_ena_msix_range function.

Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: manage VSI antispoof and destination override  (Michal Swiatkowski, 1 file, -0/+61)
Implement functions to make setting the VSI security config easier. The main function, ice_update_security, fills the security section field and checks for errors when updating the VSI. Reset functions are responsible for filling the config correctly according to user expectations. This helper is needed because the destination override is located in this section; the driver has to set this bit to allow steering Tx packets on a VSI based on the value in the Tx descriptors.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07  ice: Move devlink port to PF/VF struct  (Wojciech Drewek, 1 file, -1/+2)
Keeping devlink port inside VSI data structure causes some issues. Since VF VSI is released during reset that means that we have to unregister devlink port and register it again every time reset is triggered. With the new changes in devlink API it might cause deadlock issues. After calling devlink_port_register/devlink_port_unregister devlink API is going to lock rtnl_mutex. It's an issue when VF reset is triggered in netlink operation context (like setting VF MAC address or VLAN), because rtnl_lock is already taken by netlink. Another call of rtnl_lock from devlink API results in dead-lock. By moving devlink port to PF/VF we avoid creating/destroying it during reset. Since this patch, devlink ports are created during ice_probe, destroyed during ice_remove for PF and created during ice_repr_add, destroyed during ice_repr_rem for VF. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-09-28  ice: Add feature bitmap, helpers and a check for DSCP  (Anirudh Venkataramanan, 1 file, -0/+47)
DSCP a.k.a L3 QoS is only supported on certain devices. To enforce this, this patch introduces a bitmap of features and helper functions. The feature bitmap is set based on device IDs on driver init. Currently, DSCP is the only feature in this bitmap, but there will be more in the future. In the DCB netlink flow, check if the feature bit is set before exercising DSCP. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
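As an illustration of the bitmap-plus-helpers pattern this commit describes, here is a small sketch; the enum values, struct layout and helper names are assumptions for the example, not necessarily the driver's.

#include <linux/bitmap.h>
#include <linux/bitops.h>

/* Assumed feature IDs; the driver's real enum may contain more entries */
enum ice_feature_sketch {
        ICE_F_DSCP,
        ICE_F_MAX,
};

/* Stand-in for the PF structure carrying the feature bitmap */
struct ice_features_sketch {
        DECLARE_BITMAP(bits, ICE_F_MAX);
};

/* Set once at driver init, keyed off the device ID */
static void init_feature_support(struct ice_features_sketch *f,
                                 bool is_dscp_capable_dev)
{
        bitmap_zero(f->bits, ICE_F_MAX);
        if (is_dscp_capable_dev)
                set_bit(ICE_F_DSCP, f->bits);
}

/* The DCB netlink flow would call this before exercising DSCP */
static bool is_feature_supported(struct ice_features_sketch *f,
                                 enum ice_feature_sketch id)
{
        if (id >= ICE_F_MAX)
                return false;
        return test_bit(id, f->bits);
}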
2021-06-18  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski, 1 file, -8/+10)
Trivial conflicts in net/can/isotp.c and tools/testing/selftests/net/mptcp/mptcp_connect.sh scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c to include/linux/ptp_clock_kernel.h in -next so re-apply the fix there. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-17  ice: reduce scope of variables  (Paul M Stillwell Jr, 1 file, -4/+4)
There are some places where the scope of a variable can be reduced so do that. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-11  ice: enable transmit timestamps for E810 devices  (Jacob Keller, 1 file, -0/+1)
Add support for enabling Tx timestamp requests for outgoing packets on E810 devices. The ice hardware can support multiple outstanding Tx timestamp requests. When sending a descriptor to hardware, a Tx timestamp request is made by setting a request bit, and assigning an index that represents which Tx timestamp index to store the timestamp in. Hardware makes no effort to synchronize the index use, so it is up to software to ensure that Tx timestamp indexes are not re-used before the timestamp is reported back. To do this, introduce a Tx timestamp tracker which will keep track of currently in-use indexes. In the hot path, if a packet has a timestamp request, an index will be requested from the tracker. Unfortunately, this does require a lock as the indexes are shared across all queues on a PHY. There are not enough indexes to reliably assign only 1 to each queue. For the E810 devices, the timestamp indexes are not shared across PHYs, so each port can have its own tracking. Once hardware captures a timestamp, an interrupt is fired. In this interrupt, trigger a new work item that will figure out which timestamp was completed, and report the timestamp back to the stack. This function loops through the Tx timestamp indexes and checks whether there is now a valid timestamp. If so, it clears the PHY timestamp indication in the PHY memory, locks and removes the SKB and bit in the tracker, then reports the timestamp to the stack. It is possible in some cases that a timestamp request will be initiated but never completed. This might occur if the packet is dropped by software or hardware before it reaches the PHY. Add a task to the periodic work function that will check whether a timestamp request is more than a few seconds old. If so, the timestamp index is cleared in the PHY, and the SKB is released. Just as with Rx timestamps, the Tx timestamps are only 40 bits wide, and use the same overall logic for extending to 64 bits of nanoseconds. With this change, E810 devices should be able to perform basic PTP functionality. Future changes will extend the support to cover the E822-based devices. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
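A rough sketch of the in-use index tracker described above; the slot count, locking granularity and field names are assumptions, and the real driver additionally ties each slot to PHY memory and the timeout handling mentioned in the text.

#include <linux/spinlock.h>
#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/skbuff.h>
#include <linux/jiffies.h>

#define TX_TSTAMP_SLOTS 64      /* assumed number of hardware timestamp indexes */

struct tx_tstamp_tracker {
        spinlock_t lock;        /* indexes are shared by every queue on the PHY */
        DECLARE_BITMAP(in_use, TX_TSTAMP_SLOTS);
        struct sk_buff *skb[TX_TSTAMP_SLOTS];   /* skb awaiting its timestamp */
        unsigned long start[TX_TSTAMP_SLOTS];   /* jiffies, for the staleness check */
};

/* Hot path: reserve a free index for an outgoing packet, or return -1 if
 * every slot is still waiting on hardware.
 */
static int tx_tstamp_request(struct tx_tstamp_tracker *t, struct sk_buff *skb)
{
        int idx;

        spin_lock(&t->lock);
        idx = find_first_zero_bit(t->in_use, TX_TSTAMP_SLOTS);
        if (idx < TX_TSTAMP_SLOTS) {
                set_bit(idx, t->in_use);
                t->skb[idx] = skb_get(skb);
                t->start[idx] = jiffies;
        }
        spin_unlock(&t->lock);

        return idx < TX_TSTAMP_SLOTS ? idx : -1;
}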
2021-06-11  ice: enable receive hardware timestamping  (Jacob Keller, 1 file, -1/+7)
Add SIOCGHWTSTAMP and SIOCSHWTSTAMP ioctl handlers to respond to requests to enable timestamping support. If the request is for enabling Rx timestamps, set a bit in the Rx descriptors to indicate that receive timestamps should be reported.

Hardware captures receive timestamps in the PHY which only captures part of the timer, and reports only 40 bits into the Rx descriptor. The upper 32 bits represent the contents of GLTSYN_TIME_L at the point of packet reception, while the lower 8 bits represent the upper 8 bits of GLTSYN_TIME_0.

The networking and PTP stack expect 64 bit timestamps in nanoseconds. To support this, implement some logic to extend the timestamps by using the full PHC time. If the Rx timestamp was captured prior to the PHC time, then the real timestamp is PHC - (lower_32_bits(PHC) - timestamp). If the Rx timestamp was captured after the PHC time, then the real timestamp is PHC + (timestamp - lower_32_bits(PHC)).

These calculations are correct as long as neither the PHC timestamp nor the Rx timestamps are more than 2^32-1 nanoseconds old. Further, we can detect when the Rx timestamp is before or after the PHC as long as the PHC timestamp is no more than 2^31-1 nanoseconds old. In that case, we calculate the delta between the lower 32 bits of the PHC and the Rx timestamp. If it's larger than 2^31-1 then the Rx timestamp must have been captured in the past. If it's smaller, then the Rx timestamp must have been captured after PHC time.

Add an ice_ptp_extend_32b_ts function that relies on a cached copy of the PHC time and implements this algorithm to calculate the proper upper 32 bits of the Rx timestamps.

Cache the PHC time periodically in all of the Rx rings. This enables each Rx ring to simply call the extension function with a recent copy of the PHC time. By ensuring that the PHC time is kept up to date periodically, we ensure this algorithm doesn't use stale data and produce incorrect results.

To cache the time, introduce a kworker and a kwork item to periodically store the Rx time. It might seem like we should use the .do_aux_work interface of the PTP clock. This doesn't work because all PFs must cache this time, but only one PF owns the PTP clock device. Thus, the ice driver will manage its own kthread instead of relying on the PTP do_aux_work handler.

With this change, the driver can now report Rx timestamps on all incoming packets.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
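The extension logic above can be captured in a short, self-contained sketch, mirroring what a function like ice_ptp_extend_32b_ts is described as doing; this is an illustration of the algorithm, not the driver's exact code.

#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/limits.h>

/* Extend a 32-bit nanosecond timestamp to 64 bits using a cached PHC time,
 * assuming the two are less than ~2^31 ns apart as described above.
 */
static u64 extend_32b_ts(u64 cached_phc_time, u32 in_tstamp)
{
        u32 phc_lo = lower_32_bits(cached_phc_time);
        u32 delta = in_tstamp - phc_lo;

        /* A delta above 2^31 - 1 means the timestamp predates the cached
         * PHC time, so the difference is subtracted instead of added.
         */
        if (delta > (U32_MAX / 2))
                return cached_phc_time - (phc_lo - in_tstamp);

        return cached_phc_time + delta;
}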
2021-06-11  ice: add support for sideband messages  (Jacob Keller, 1 file, -1/+10)
In order to support certain device features, including enabling the PTP hardware clock, the ice driver needs to control some registers on the device PHY. These registers are accessed by sending sideband messages. For some hardware, these messages must be sent over the device admin queue, while other hardware has a dedicated control queue for the sideband messages. Add the neighbor device message structure for sending a message to the neighboring device. Where supported, initialize the sideband control queue and handle cleanup. Add a wrapper function for sending sideband control queue messages that read or write a neighboring device register. Because some devices send sideband messages over the AdminQ, also increase the length of the admin queue to allow more messages to be queued up. This is important because the sideband messages add additional pressure on the AQ usage. This support will be used in following patches to enable support for CONFIG_1588_PTP_CLOCK. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-09  ice: parameterize functions responsible for Tx ring management  (Maciej Fijalkowski, 1 file, -8/+10)
Commit ae15e0ba1b33 ("ice: Change number of XDP Tx queues to match number of Rx queues") tried to address the incorrect setting of the XDP queue count that was based on the Tx queue count, whereas in theory we should provide one XDP queue per Rx queue. However, the routines that set up and destroy the set of Tx resources are still based on vsi->num_txq. Ice supports asymmetric Tx/Rx queue counts, so for a setup where vsi->num_txq > vsi->num_rxq, ice_vsi_stop_tx_rings and ice_vsi_cfg_txqs will access vsi->xdp_rings out of bounds. Parameterize the two mentioned functions so they take the size of the Tx resources array as input.

Fixes: ae15e0ba1b33 ("ice: Change number of XDP Tx queues to match number of Rx queues")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-07  Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue  (David S. Miller, 1 file, -14/+67)
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-06-07

This series contains updates to virtchnl header file and ice driver.

Brett adds capability bits to virtchnl to specify whether a primary or secondary MAC address is being requested and adds the implementation to ice. He also adds storing of VF MAC address so that it will be preserved across reboots of VM and refactors VF queue configuration to remove the expectation that configuration be done all at once.

Krzysztof refactors ice_setup_rx_ctx() to remove configuration not related to Rx context into a new function, ice_vsi_cfg_rxq().

Liwei Song extends the wait time for the global config timeout.

Salil Mehta refactors code in ice_vsi_set_num_qs() to remove an unnecessary call when the user has requested specific number of Rx or Tx queues.

Jesse converts define macros to static inlines for NOP configurations.

Jake adds messaging when devlink fails to read device capabilities and when pldmfw cannot find the requested firmware. Adds a wait for reset completion when reporting devlink info and reinitializes NVM during rebuild to ensure values are current.

Ani adds detection and reporting of modules exceeding supported power levels and changes an error message to a debug message.

Paul fixes a clang warning for deadcode.DeadStores.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07  Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net  (David S. Miller, 1 file, -0/+12)
Bug fixes overlapping feature additions and refactoring, mostly. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07  ice: downgrade error print to debug print  (Anirudh Venkataramanan, 1 file, -1/+1)
Failing to add or remove LLDP filter doesn't seem to be a fatal error, so downgrade the dev_err message to a dev_dbg message. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-07  ice: wait for reset before reporting devlink info  (Jacob Keller, 1 file, -0/+28)
Requesting device firmware information while the device is busy cleaning up after a reset can result in an unexpected failure. This occurs because the command is attempting to access the device AdminQ while it is down. Resolve this by having the command wait for a while until the reset is complete.

To do this, introduce a reset_wait_queue and an associated helper function, "ice_wait_for_reset". This helper uses the wait queue to sleep until the driver is done rebuilding. Use of a wait queue is preferred because the potential sleep duration can be several seconds. To ensure that the thread wakes up properly, a new wake_up call is added on all code paths which clear the reset state bits associated with the driver rebuild flow.

This ensures that tools can request device information without worrying about whether the driver is cleaning up from a reset. Specifically, it is expected that a flash update could result in a device reset, and it is better to delay the response for information until the reset is complete rather than exit with an immediate failure.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
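A hedged sketch of the wait-queue pattern described here; the struct fields and the reset-in-progress predicate are placeholders for the driver's real state bits.

#include <linux/wait.h>
#include <linux/errno.h>

/* Placeholder for the PF fields this commit talks about */
struct pf_reset_state {
        wait_queue_head_t reset_wait_queue;
        bool rebuild_in_progress;
};

/* Sleep until the rebuild finishes, a signal arrives, or the timeout expires */
static int wait_for_reset(struct pf_reset_state *pf, unsigned long timeout)
{
        long ret = wait_event_interruptible_timeout(pf->reset_wait_queue,
                                                    !pf->rebuild_in_progress,
                                                    timeout);
        if (ret < 0)
                return ret;             /* interrupted by a signal */
        if (!ret)
                return -EBUSY;          /* timed out, reset still running */
        return 0;
}

/* Every path that clears the rebuild/reset state bits then wakes the waiters */
static void reset_done(struct pf_reset_state *pf)
{
        pf->rebuild_in_progress = false;
        wake_up(&pf->reset_wait_queue);
}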
2021-06-07  ice: Re-organizes reqstd/avail {R, T}XQ check/code for efficiency  (Salil Mehta, 1 file, -6/+8)
If user has explicitly requested the number of {R,T}XQs, then it is unnecessary to get the count of already available {R,T}XQs from the PF avail_{r,t}xqs bitmap. This value will get overridden by user specified value in any case. Re-organize this code for improving the flow, readability and efficiency. This scope of improvement was found during the review of the ICE driver code. Fixes: 87324e747fde ("ice: Implement ethtool ops for channels") Cc: intel-wired-lan@lists.osuosl.org Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-07  ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling  (Brett Creeley, 1 file, -0/+27)
Currently, when a VF requests queue configuration via VIRTCHNL_OP_CONFIG_VSI_QUEUES the PF driver expects that this message will only be called once and we always assume the queues being configured start from 0. This is incorrect and is causing issues when a VF tries to send this message for multiple queue blocks. Fix this by using the queue_id specified in the virtchnl message and allowing for individual Rx and/or Tx queues to be configured. Also, reduce the duplicated for loops for configuring the queues by moving all the logic into a single for loop. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-07  ice: Refactor ice_setup_rx_ctx  (Krzysztof Kazimierczak, 1 file, -7/+3)
Move AF_XDP logic and buffer allocation out of ice_setup_rx_ctx() to a new function ice_vsi_cfg_rxq(), so the function actually sets up the Rx context. Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Co-developed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
2021-06-04  ice: Fix allowing VF to request more/less queues via virtchnl  (Brett Creeley, 1 file, -0/+2)
Commit 12bb018c538c ("ice: Refactor VF reset") caused a regression that removes the ability for a VF to request a different number of queues via VIRTCHNL_OP_REQUEST_QUEUES. This prevents VF drivers from either increasing or decreasing the number of queue pairs they are allocated. Fix this by using the variable vf->num_req_qs when determining vf->num_vf_qs during VF VSI creation.

Fixes: 12bb018c538c ("ice: Refactor VF reset")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03  ice: track AF_XDP ZC enabled queues in bitmap  (Maciej Fijalkowski, 1 file, -0/+10)
Commit c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") silently introduced a regression and broke the Tx side of AF_XDP in copy mode. xsk_pool on ice_ring is set only based on the existence of the XDP prog on the VSI, which in turn picks ice_clean_tx_irq_zc to be executed. That is not something that should happen for copy mode, as it should use the regular data path ice_clean_tx_irq.

This results in the following splat when xdpsock is run in txonly or l2fwd scenarios in copy mode:

<snip>
[ 106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 106.057269] #PF: supervisor read access in kernel mode
[ 106.062493] #PF: error_code(0x0000) - not-present page
[ 106.067709] PGD 0 P4D 0
[ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
[ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
[ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
[ 106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
[ 106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
[ 106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
[ 106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
[ 106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
[ 106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
[ 106.157117] FS: 0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
[ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
[ 106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 106.192898] PKRU: 55555554
[ 106.195653] Call Trace:
[ 106.198143] <IRQ>
[ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
[ 106.205087] ice_napi_poll+0x3e/0x590 [ice]
[ 106.209356] __napi_poll+0x2a/0x160
[ 106.212911] net_rx_action+0xd6/0x200
[ 106.216634] __do_softirq+0xbf/0x29b
[ 106.220274] irq_exit_rcu+0x88/0xc0
[ 106.223819] common_interrupt+0x7b/0xa0
[ 106.227719] </IRQ>
[ 106.229857] asm_common_interrupt+0x1e/0x40
</snip>

Fix this by introducing a bitmap of queues that are zero-copy enabled, where each bit, corresponding to a queue id that an xsk pool is being configured on, is set/cleared within ice_xsk_pool_{en,dis}able and checked within ice_xsk_pool(). The latter is the function used for deciding which napi poll routine is executed. The idea is taken from our other drivers such as i40e and ixgbe.

Fixes: c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
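A sketch of the bitmap check described above; the bitmap name and helper layout are taken from the description or assumed, while xsk_get_pool_from_qid() is the core helper that looks a pool up by queue id.

#include <linux/bitops.h>
#include <net/xdp_sock_drv.h>

/* Enable/disable paths mark the queue id in the per-VSI bitmap ... */
static void mark_zc_qid(unsigned long *af_xdp_zc_qps, u16 qid, bool enable)
{
        if (enable)
                set_bit(qid, af_xdp_zc_qps);
        else
                clear_bit(qid, af_xdp_zc_qps);
}

/* ... and a pool is only reported for queues that really run zero-copy,
 * so copy mode keeps using the regular ice_clean_tx_irq path.
 */
static struct xsk_buff_pool *ring_xsk_pool(struct net_device *netdev,
                                           unsigned long *af_xdp_zc_qps,
                                           u16 qid)
{
        if (!test_bit(qid, af_xdp_zc_qps))
                return NULL;

        return xsk_get_pool_from_qid(netdev, qid);
}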
2021-05-28  ice: Initialize RDMA support  (Dave Ertman, 1 file, -0/+11)
Probe the device's capabilities to see if it supports RDMA. If so, allocate and reserve resources to support its operation; populate structures with initial values. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14  ice: remove return variable  (Paul M Stillwell Jr, 1 file, -5/+3)
We were saving the return value from ice_vsi_manage_rss_lut(), but the errors from that function are not critical so change it to return void and remove the code that saved the value. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14  ice: Set vsi->vf_id as ICE_INVAL_VFID for non VF VSI types  (Brett Creeley, 1 file, -0/+2)
Currently the vsi->vf_id is set only for ICE_VSI_VF and it's left as 0 for all other VSI types. This is confusing and could be problematic since 0 is a valid vf_id. Fix this by always setting non VF VSI types to ICE_INVAL_VFID. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14  ice: use local for consistency  (Jesse Brandeburg, 1 file, -5/+7)
Do a minor refactor on ice_vsi_rebuild to use a local variable to store vsi->type. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14  ice: refactor ITR data structures  (Jesse Brandeburg, 1 file, -7/+0)
Use a dedicated bitfield in order to both increase the amount of checking around the length of ITR writes as well as simplify the checks of dynamic mode. Basically unpack the "high bit means dynamic" logic into bitfields. Also, remove some unused ITR defines. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14  ice: replace custom AIM algorithm with kernel's DIM library  (Jacob Keller, 1 file, -8/+5)
The ice driver has support for adaptive interrupt moderation, an algorithm for tuning the interrupt rate dynamically. This algorithm is based on various assumptions about ring size, socket buffer size, link speed, SKB overhead, ethernet frame overhead and more. The Linux kernel has support for a dynamic interrupt moderation algorithm known as "dimlib". Replace the custom driver-specific implementation of dynamic interrupt moderation with the kernel's algorithm. The Intel hardware has a different hardware implementation than the originators of the dimlib code had to work with, which requires the driver to use a slightly different set of inputs for the actual moderation values, while getting all the advice from dimlib of better/worse, shift left or right. The change made for this implementation is to use a pair of values for each of the 5 "slots" that the dimlib moderation expects, and the driver will program those pairs when dimlib recommends a slot to use. The currently implementation uses two tables, one for receive and one for transmit, and the pairs of values in each slot set the maximum delay of an interrupt and a maximum number of interrupts per second (both expressed in microseconds). There are two separate kinds of bugs fixed by using DIMLIB, one is UDP single stream send was too slow, and the other is that 8K ping-pong was going to the most aggressive moderation and has much too high latency. The overall result of using DIMLIB is that we meet or exceed our performance expectations set based on the old algorithm. Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
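For reference, the dimlib flow the commit describes looks roughly like the sketch below; the container fields are assumed, and the dim API signatures shown are the ones from this kernel era (they have shifted slightly in later kernels).

#include <linux/dim.h>

/* Per-interrupt bookkeeping assumed for the example */
struct ring_container_sketch {
        struct dim dim;
        u64 total_pkts;
        u64 total_bytes;
        u16 total_events;
};

/* Called from the interrupt/NAPI path: feed dimlib a sample and let it
 * decide whether to move to a better or worse moderation "slot".
 */
static void update_dim(struct ring_container_sketch *rc)
{
        struct dim_sample sample = {};

        dim_update_sample(rc->total_events, rc->total_pkts, rc->total_bytes,
                          &sample);
        net_dim(&rc->dim, sample);
}

/* dimlib's worker then reports a profile index; per the commit, the driver
 * maps that index to its own (max delay, max interrupt rate) pair and
 * programs the ITR register with it, instead of using dimlib's generic
 * usec/packet tables directly.
 */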
2021-04-14  ice: refactor interrupt moderation writes  (Jesse Brandeburg, 1 file, -76/+95)
Introduce several new helpers for writing ITR and GLINT_RATE registers, and refactor the code calling them. This resulted in removal of several duplicate functions and rolled a bunch of simple code back into the calling routines. In particular this removes some code that was doing both a store and a set in a helper function, which seems better done as separate tasks in the caller (and generally takes less lines of code even with a tiny bit of repetition). Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14  ice: Add new VSI states to track netdev alloc/registration  (Anirudh Venkataramanan, 1 file, -6/+15)
Add two new VSI states, one to track if a netdev for the VSI has been allocated and the other to track if the netdev has been registered. Call unregister_netdev/free_netdev only when the corresponding state bits are set. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
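A small sketch of the teardown ordering this enables; the bit names follow the commit description and the surrounding VSI structure is simplified.

#include <linux/netdevice.h>
#include <linux/bitmap.h>
#include <linux/bitops.h>

enum {
        VSI_NETDEV_ALLOCD,
        VSI_NETDEV_REGISTERED,
        VSI_STATE_NBITS,
};

struct vsi_netdev_sketch {
        struct net_device *netdev;
        DECLARE_BITMAP(state, VSI_STATE_NBITS);
};

/* Only undo what the state bits say was actually done, so a failure part
 * way through probe does not unregister or free a netdev twice.
 */
static void vsi_clear_netdev(struct vsi_netdev_sketch *vsi)
{
        if (!vsi->netdev)
                return;

        if (test_and_clear_bit(VSI_NETDEV_REGISTERED, vsi->state))
                unregister_netdev(vsi->netdev);

        if (test_and_clear_bit(VSI_NETDEV_ALLOCD, vsi->state))
                free_netdev(vsi->netdev);

        vsi->netdev = NULL;
}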
2021-04-14  ice: Drop leading underscores in enum ice_pf_state  (Anirudh Venkataramanan, 1 file, -9/+9)
Remove the leading underscores in enum ice_pf_state. This is not really communicating anything and is unnecessary. No functional change. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-09  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski, 1 file, -3/+2)
Conflicts:

MAINTAINERS
 - keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
 - simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
 - trivial
include/linux/ethtool.h
 - trivial, fix kdoc while at it
include/linux/skmsg.h
 - move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
 - add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
 - trivial

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-07  ice: Ignore EMODE return for opcode 0x0605  (Anirudh Venkataramanan, 1 file, -0/+37)
When link is owned by manageability, the driver is not allowed to fiddle with link. FW returns ICE_AQ_RC_EMODE if the driver attempts to do so. This patch adds a new function ice_set_link which abstracts the call to ice_aq_set_link_restart_an and provides a clean way to turn on/off link. While making this change, I also spotted that an int variable was being used to hold both an ice_status return code and the Linux errno return code. This pattern more often than not results in the driver inadvertently returning ice_status back to kernel which is a major boo-boo. Clean it up. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31  ice: Consolidate VSI state and flags  (Anirudh Venkataramanan, 1 file, -7/+7)
struct ice_vsi has two fields, state and flags which seem to be serving the same purpose. Consolidate them into one field 'state'. enum ice_state is used to represent state information of the PF. While some of these enum values can be use to represent VSI state, it makes more sense to represent VSI state with its own enum. So derive a new enum ice_vsi_state from ice_vsi_flags and ice_state and use it. Also rename enum ice_state to ice_pf_state for clarity. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31  ice: Refactor ice_set/get_rss into LUT and key specific functions  (Brett Creeley, 1 file, -30/+12)
Currently ice_set/get_rss are used to set/get the RSS LUT and/or RSS key. However nearly everywhere these functions are called only the LUT or key are set/get. Also, making this change reduces how many things ice_set/get_rss are doing. Fix this by adding ice_set/get_rss_lut and ice_set/get_rss_key functions. Also, consolidate all calls for setting/getting the RSS LUT and RSS Key to use ice_set/get_rss_lut() and ice_set/get_rss_key(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31  ice: Refactor get/set RSS LUT to use struct parameter  (Brett Creeley, 1 file, -2/+7)
Update ice_aq_get_rss_lut() and ice_aq_set_rss_lut() to take a new structure ice_aq_get_set_rss_params instead of passing individual parameters. This is done for 2 reasons: 1. Reduce the number of parameters passed to the functions. 2. Reduce the amount of change required if the arguments ever need to be updated in the future. Also, reduce duplicate code that was checking for an invalid vsi_handle and lut parameter by moving the checks to the lower level __ice_aq_get_set_rss_lut(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31  ice: Change ice_vsi_setup_q_map() to not depend on RSS  (Brett Creeley, 1 file, -34/+16)
Currently, ice_vsi_setup_q_map() depends on the VSI's rss_size. However, the Rx Queue Mapping section of the VSI context has no dependency on RSS. Instead, limit the maximum number of Rx queues per TC based on the Rx Queue mapping section of the VSI context, which currently allows for up to 256 Rx queues per TC. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31  ice: handle increasing Tx or Rx ring sizes  (Paul M Stillwell Jr, 1 file, -33/+90)
There is an issue when the Tx or Rx ring size increases using 'ethtool -L ...' where the new rings don't get the correct ITR values because when we rebuild the VSI we don't know that some of the rings may be new. Fix this by looking at the original number of rings and determining if the rings in ice_vsi_rebuild_set_coalesce() were not present in the original rings received in ice_vsi_rebuild_get_coalesce(). Also change the code to return an error if we can't allocate memory for the coalesce data in ice_vsi_rebuild(). Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-29  ice: remove DCBNL_DEVRESET bit from PF state  (Dave Ertman, 1 file, -1/+0)
The original purpose of the ICE_DCBNL_DEVRESET was to protect the driver during DCBNL device resets. But, the flow for DCBNL device resets now consists of only calls up the stack such as dev_close() and dev_open() that will result in NDO calls to the driver. These will be handled with state changes from the stack. Also, there is a problem of the dev_close and dev_open being blocked by checks for reset in progress also using the ICE_DCBNL_DEVRESET bit. Since the ICE_DCBNL_DEVRESET bit is not necessary for protecting the driver from DCBNL device resets and it is actually blocking changes coming from the DCBNL interface, remove the bit from the PF state and don't block driver function based on DCBNL reset in progress. Fixes: b94b013eb626 ("ice: Implement DCBNL support") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-29  ice: prevent ice_open and ice_stop during reset  (Krzysztof Goreczny, 1 file, -2/+2)
There is a possibility of a race between ice_open or ice_stop calls performed by the OS and the reset handling routine, both trying to modify VSI resources. Observed scenarios:
- the reset handler deallocates memory in ice_vsi_free_arrays and ice_open tries to access it in ice_vsi_cfg_txq, leading to a driver crash
- the reset handler deallocates memory in ice_vsi_free_arrays and ice_close tries to access it in ice_down, leading to a driver crash
- the reset handler clears the port scheduler topology and sets the port state to ICE_SCHED_PORT_STATE_INIT, leading to an ice_ena_vsi_txq failure in ice_open

To prevent this, additional checks are introduced in ice_open and ice_stop to make sure that the OS is not allowed to alter the VSI config while a reset is in progress.

Fixes: cdedef59deb0 ("ice: Configure VSIs for Tx/Rx")
Signed-off-by: Krzysztof Goreczny <krzysztof.goreczny@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
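The added checks amount to an early-return guard at the top of the ndo_open/ndo_stop handlers, roughly like the sketch below; the reset predicate and the error message are illustrative.

#include <linux/netdevice.h>
#include <linux/errno.h>

/* Illustrative guard: refuse to touch VSI resources while a reset owns them */
static int guarded_open(struct net_device *netdev, bool reset_in_progress)
{
        if (reset_in_progress) {
                netdev_err(netdev, "can't open net device while reset is in progress\n");
                return -EBUSY;
        }

        /* ... normal open path (ice_vsi_cfg_txq and friends) runs only here ... */
        return 0;
}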
2021-03-22  ice: Add support for per VF ctrl VSI enabling  (Qi Zhang, 1 file, -8/+56)
We are going to enable FDIR configuration for AVF through the virtual channel. The first step is to add helper functions to support control VSI setup. A control VSI will be allocated for a VF when the AVF creates its first FDIR rule through ice_vf_ctrl_vsi_setup(). The patch also allocates FDIR rule space for the VF's control VSI. If a VF asks for flow director rules, those should come entirely from the best effort pool and not from the guaranteed pool, so the patch allows a VF VSI to have space only in the best effort rules.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08  ice: Refactor DCB related variables out of the ice_port_info struct  (Chinh T Cao, 1 file, -1/+1)
Refactor the DCB related variables out of the ice_port_info_struct. The goal is to make the ice_port_info struct cleaner. Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08  ice: create scheduler aggregator node config and move VSIs  (Kiran Patil, 1 file, -0/+125)
Create scheduler aggregator nodes and move VSIs into their respective scheduler nodes. The maximum number of children per aggregator node is 64. Two types of aggregator nodes are created:
1. a dedicated node for the PF and _CTRL VSIs
2. dedicated node(s) for VFs

As part of reset and rebuild, the aggregator nodes are recreated and the VSIs are moved to their respective aggregator node. Keeping related VSIs in their respective trees avoids starvation between PF and VF with respect to Tx bandwidth.

Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Co-developed-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Victor Raj <victor.raj@intel.com>
Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08  ice: Add initial support framework for LAG  (Dave Ertman, 1 file, -0/+2)
Add the framework and initial implementation for receiving and processing netdev bonding events. This is only the software support and the implementation of the HW offload for bonding support will be coming at a later time. There are some architectural gaps that need to be closed before that happens. Because this is a software only solution that supports in kernel bonding, SR-IOV is not supported with this implementation. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08  ice: implement new LLDP filter command  (Dave Ertman, 1 file, -3/+10)
There is an issue with some NVMs where an already existent LLDP filter is blocking the creation of a filter to allow LLDP packets to be redirected to the default VSI for the interface. This is blocking all LLDP functionality based in the kernel when the FW LLDP agent is disabled (e.g. software based DCBx). Implement the new AQ command to allow adding VSI destinations to existent filters on NVM versions that support the new command. The new lldp_fltr_ctrl AQ command supports Rx filters only, so the code flow for adding filters to disable Tx of control frames will remain intact. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-01-26  ice: Fix MSI-X vector fallback logic  (Brett Creeley, 1 file, -5/+9)
The current MSI-X enablement logic tries to enable best-case MSI-X vectors and if that fails we only support a bare-minimum set. This includes a single MSI-X for 1 Tx and 1 Rx queue and a single MSI-X for the OICR interrupt. Unfortunately, the driver fails to load when we don't get as many MSI-X as requested, for a couple of reasons.

First, the code to allocate MSI-X in the driver tries to allocate num_online_cpus() MSI-X for LAN traffic without caring about the number of MSI-X actually enabled/requested from the kernel for LAN traffic. So, when calling ice_get_res() for the PF VSI, it returns failure because the number of available vectors is less than requested. Fix this by not allowing the PF VSI to allocate more than pf->num_lan_msix MSI-X vectors and pf->num_lan_msix Rx/Tx queues. Limiting the number of queues is done because we don't want more than 1 Tx/Rx queue per interrupt due to performance concerns.

Second, the driver assigns pf->num_lan_msix = 2, to account for LAN traffic and the OICR. However, pf->num_lan_msix is only meant for LAN MSI-X. This is causing a failure when the PF VSI tries to allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has already been reserved, so there may not be enough MSI-X vectors left. Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for the failure case.

Update the related defines used in ice_ena_msix_range() to align with the above behavior and remove the unused RDMA defines because RDMA is currently not supported. Also, remove the now incorrect comment.

Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
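A simplified sketch of the fallback accounting described above; the defines, the container struct and the decision to drop straight to the minimum when fewer vectors than requested are granted are assumptions made for the example.

#include <linux/pci.h>

#define MIN_LAN_MSIX    1       /* one vector for a single Tx/Rx queue pair */
#define OICR_MSIX       1       /* one vector for the OICR interrupt */
#define MIN_MSIX        (MIN_LAN_MSIX + OICR_MSIX)

/* Hypothetical container standing in for the relevant PF fields */
struct msix_budget {
        int num_lan_msix;       /* LAN-only vector budget, OICR excluded */
};

/* Try for the best-case vector count; if the kernel grants less than asked
 * for, fall back to the minimum of one LAN vector plus the OICR vector.
 */
static int ena_msix_range(struct pci_dev *pdev, struct msix_entry *entries,
                          struct msix_budget *budget, int wanted)
{
        int got = pci_enable_msix_range(pdev, entries, MIN_MSIX, wanted);

        if (got < 0)
                return got;     /* not even the minimum was available */

        /* num_lan_msix must never count the OICR vector */
        budget->num_lan_msix = (got < wanted) ? MIN_LAN_MSIX
                                              : got - OICR_MSIX;
        return got;
}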
2020-10-09  ice: refactor devlink_port to be per-VSI  (Jacob Keller, 1 file, -1/+4)
Currently, the devlink_port structure is stored within the ice_pf. This made sense because we create a single devlink_port for each PF. This setup does not mesh with the abstractions in the driver very well, and led to a flow where we accidentally call devlink_port_unregister twice during error cleanup. In particular, if devlink_port_register or devlink_port_unregister are called twice, this leads to a kernel panic. This appears to occur during some possible flows while cleaning up from a failure during driver probe. If register_netdev fails, then we will call devlink_port_unregister in ice_cfg_netdev as it cleans up. Later, we again call devlink_port_unregister since we assume that we must cleanup the port that is associated with the PF structure. This occurs because we cleanup the devlink_port for the main PF even though it was not allocated. We allocated the port within a per-VSI function for managing the main netdev, but did not release the port when cleaning up that VSI, the allocation and destruction are not aligned. Instead of attempting to manage the devlink_port as part of the PF structure, manage it as part of the PF VSI. Doing this has advantages, as we can match the de-allocation of the devlink_port with the unregister_netdev associated with the main PF VSI. Moving the port to the VSI is preferable as it paves the way for handling devlink ports allocated for other purposes such as SR-IOV VFs. Since we're changing up how we allocate the devlink_port, also change the indexing. Originally, we indexed the port using the PF id number. This came from an old goal of sharing a devlink for each physical function. Managing devlink instances across multiple function drivers is not workable. Instead, lets set the port number to the logical port number returned by firmware and set the index using the VSI index (sometimes referred to as VSI handle). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>