path: root/drivers/net/ethernet
2023-02-01  ice: Remove next_{dd,rs} fields from ice_tx_ring  (Maciej Fijalkowski)  [4 files, -8/+0]
Now that both ZC and standard XDP data paths stopped using Tx logic based on next_dd and next_rs fields, we can safely remove these fields and shrink Tx ring structure. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-01  ice: Add support for XDP multi-buffer on Tx side  (Maciej Fijalkowski)  [5 files, -96/+199]
Similarly to the Rx side in the previous patch, the XDP Tx logic in the ice driver needs to be adjusted for multi-buffer support - specifically, the way HW Tx descriptors are produced and cleaned.

Currently, XDP_TX works on strict ring boundaries, meaning it sets the RS bit (on the producer side) / looks up the DD bit (on the consumer/cleaning side) every quarter of the ring. This means that if, for example, a multi-buffer frame spans a ring quarter boundary (say the frame consists of 4 buffers and we start from descriptor 62 on a ring sized to 256 entries), the RS bit would be produced in the middle of the multi-buffer frame, which would be broken behavior, as it needs to be set on the last descriptor of the frame.

To make it work, set the RS bit on the last descriptor of the batch of frames that the XDP_TX action was used on, and make the first entry remember the index of the last descriptor with the RS bit set. This way, the cleaning side can take the index of the descriptor with the RS bit, look up the DD bit's presence and clean from the first entry to the last.

In order to clean up the code base, introduce the common ice_set_rs_bit(), which returns the index of the descriptor that got the RS bit produced on it, so that the standard driver can store this within the proper ice_tx_buf and the ZC driver can simply ignore the return value.

Co-developed-by: Martyna Szapar-Mudlaw <[email protected]>
Signed-off-by: Martyna Szapar-Mudlaw <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
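A minimal sketch of the RS-bit batching described above, assuming the ice driver's descriptor helpers (ICE_TX_DESC(), ICE_TX_DESC_CMD_RS, ICE_TXD_QW1_CMD_S); the rs_idx field name and the overall shape are illustrative, not the actual ice_set_rs_bit() implementation:

	/* Produce the RS bit only on the last descriptor of an XDP_TX batch and
	 * remember its index in the first buffer so the cleaning side knows
	 * where to look for the DD bit. The rs_idx field name is illustrative.
	 */
	static u16 ice_set_rs_bit_sketch(struct ice_tx_ring *xdp_ring, u16 first, u16 last)
	{
		struct ice_tx_desc *tx_desc = ICE_TX_DESC(xdp_ring, last);

		tx_desc->cmd_type_offset_bsz |=
			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
		xdp_ring->tx_buf[first].rs_idx = last;
		return last;
	}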
2023-02-01  ice: Add support for XDP multi-buffer on Rx side  (Maciej Fijalkowski)  [6 files, -98/+200]
The ice driver needs to be reworked a bit on the Rx data path in order to support multi-buffer XDP. For the skb path, it currently works in a way that the Rx ring carries a pointer to the skb, so if the driver didn't manage to combine a fragmented frame in the current NAPI instance, it can restore the state on the next instance and keep looking for the last fragment (a descriptor with the EOP bit set). What needs to be achieved is that the xdp_buff is combined in such a way (linear + frags part) in the first place. Then the skb will be ready to go in case of XDP_PASS or no BPF program being present on the interface. If a BPF program is there, it will work on the multi-buffer XDP. At this point the xdp_buff resides directly on the Rx ring, so given the fact that the skb will be built straight from the xdp_buff, there will be no further need to carry the skb on the Rx ring.

Besides removing the skb pointer from the Rx ring, lots of members have been moved around within ice_rx_ring. The first and foremost reason was to place rx_buf with xdp_buff on the same cacheline. This means that once we touch rx_buf (a preceding step before touching xdp_buff), xdp_buff will already be hot in cache. The second was that xdp_rxq is used rather rarely and occupies a separate cacheline, so maybe it is better to have it at the end of ice_rx_ring.

Another change that affects ice_rx_ring is the introduction of ice_rx_ring::first_desc. Its purpose is twofold - the first is to propagate rx_buf->act to all the parts of the current xdp_buff after running the XDP program, so that ice_put_rx_buf(), which got moved out of the main Rx processing loop, will be able to take an appropriate action on each buffer. The second is for ice_construct_skb(). ice_construct_skb() has a copybreak mechanism which had an explicit impact on the xdp_buff->skb conversion in the new approach when the legacy Rx flag is toggled. It works in a way that the linear part is 256 bytes long; if the frame is bigger than that, the remaining bytes go as a frag to skb_shared_info. This means that while memcpying frags from the xdp_buff to the newly allocated skb, care needs to be taken when picking the destination frag array entry. By the time ice_construct_skb() is called, when dealing with a fragmented frame, the current rx_buf points to the *last* fragment, but copybreak needs to be done against the first one. That's where ice_rx_ring::first_desc helps.

When frame building spans across NAPI polls (the DD bit is not set on the current descriptor and xdp->data is not NULL), the current Rx buffer handling state could cause problems. Since calls to ice_put_rx_buf() were pulled out of the main Rx processing loop and were scoped from cached_ntc to the current ntc, remember that the mentioned function now relies on rx_buf->act, which is set within ice_run_xdp(). ice_run_xdp() is called when the EOP bit is found, so currently we could put an Rx buffer with rx_buf->act being *uninitialized*. To address this, change the scoping to rely on first_desc on both boundaries instead.

This also implies that cleaned_count, which is used as an input to ice_alloc_rx_buffers() and tells how many new buffers should be refilled, has to be adjusted. If it stayed as-is, ntc could end up going past ntu. Therefore, remove cleaned_count altogether and, in the allocation routine, use the newly introduced ICE_RX_DESC_UNUSED() macro, which is the equivalent of ICE_DESC_UNUSED() dedicated to the Rx side and based on struct ice_rx_ring::first_desc instead of next_to_clean.

Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
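The ICE_RX_DESC_UNUSED() macro mentioned above can be pictured as a first_desc-based variant of the classic ring "unused descriptors" formula; the following is a sketch and may differ from the actual definition in the driver:

	/* Unused descriptors = distance from next_to_use back to first_desc,
	 * i.e. how many entries can be refilled without overwriting the oldest
	 * in-flight (possibly multi-buffer) frame.
	 */
	#define ICE_RX_DESC_UNUSED(R)						\
		((((R)->first_desc > (R)->next_to_use) ? 0 : (R)->count) +	\
		 (R)->first_desc - (R)->next_to_use - 1)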
2023-02-01  ice: Use xdp->frame_sz instead of recalculating truesize  (Maciej Fijalkowski)  [1 file, -24/+9]
The SKB path calculates truesize in three different functions, which could be avoided as the xdp_buff already carries the calculated truesize in xdp_buff::frame_sz. If ice_add_rx_frag() is adjusted to take the xdp_buff as an input, just like the functions responsible for creating the sk_buff initially, the codebase can be simplified by removing these redundant recalculations and relying on xdp_buff::frame_sz instead. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
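As an illustration of the simplification (a sketch against the ice driver's internal types, not the exact diff), the frag helper can take the truesize straight from the xdp_buff:

	/* Add one Rx buffer as a frag; truesize comes straight from the
	 * xdp_buff that was set up at ring init instead of being recomputed.
	 */
	static void ice_add_rx_frag_sketch(struct xdp_buff *xdp, struct ice_rx_buf *rx_buf,
					   struct sk_buff *skb, unsigned int size)
	{
		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
				rx_buf->page_offset, size, xdp->frame_sz);
	}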
2023-02-01  ice: Do not call ice_finalize_xdp_rx() unnecessarily  (Maciej Fijalkowski)  [1 file, -1/+1]
Currently, ice_finalize_xdp_rx() is called only when an xdp_prog is present on the VSI, which is a good thing. However, this optimization can be enhanced to check whether any XDP_TX/XDP_REDIRECT action actually took place in the current Rx processing. A non-zero value of @xdp_xmit indicates that, which in turn implies that an xdp_prog is present on the VSI. This way, XDP_DROP-based workloads will not suffer from unnecessary calls to an external function. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-01  ice: Use ice_max_xdp_frame_size() in ice_xdp_setup_prog()  (Maciej Fijalkowski)  [1 file, -14/+14]
This should have been used in there from day 1, let us address that before introducing XDP multi-buffer support for Rx side. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-01  ice: Centralize Rx buffer recycling  (Maciej Fijalkowski)  [2 files, -38/+44]
Currently, calls to ice_put_rx_buf() are sprinkled through ice_clean_rx_irq() - the first place is the explicit flow director descriptor handling, the second is after running the XDP prog, and the last one is after taking care of the skb.

The first call site was actually only there to bump ntc, as the Rx buffer to be recycled is not even passed to the function.

It is possible to walk through the Rx buffers processed in a particular NAPI cycle by caching ntc at the beginning of ice_clean_rx_irq(). To do so, store the XDP verdict inside ice_rx_buf, so the action to take on each buffer is known. When no XDP prog is present, just store ICE_XDP_PASS as the verdict.

Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
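Roughly, the deferred recycling walk described above looks like the sketch below; the loop shape is illustrative and ice_put_rx_buf()'s argument list is simplified:

	/* Walk the buffers handled in this NAPI cycle, from the ntc cached at
	 * the start of ice_clean_rx_irq() up to the current one;
	 * ice_put_rx_buf() recycles or releases each page based on the
	 * verdict stored in rx_buf->act.
	 */
	static void ice_recycle_rx_bufs_sketch(struct ice_rx_ring *rx_ring,
					       u32 cached_ntc, u32 ntc)
	{
		while (cached_ntc != ntc) {
			struct ice_rx_buf *rx_buf = &rx_ring->rx_buf[cached_ntc];

			ice_put_rx_buf(rx_ring, rx_buf);
			if (++cached_ntc == rx_ring->count)
				cached_ntc = 0;
		}
	}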
2023-02-01  ice: Inline eop check  (Maciej Fijalkowski)  [2 files, -21/+22]
This might be used by the ZC driver in the future and might potentially yield a minor performance boost. While at it, constify the arguments that ice_is_non_eop() takes; since they are pointers, this will help the compiler when generating asm. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-01  ice: Pull out next_to_clean bump out of ice_put_rx_buf()  (Maciej Fijalkowski)  [1 file, -13/+16]
The plan is to move ice_put_rx_buf() to the end of ice_clean_rx_irq(), so in order to keep the ability to walk through HW Rx descriptors, pull next_to_clean handling out of ice_put_rx_buf(). Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-01  ice: Store page count inside ice_rx_buf  (Maciej Fijalkowski)  [2 files, -17/+12]
This will allow us to avoid carrying an additional auxiliary array of page counts when dealing with XDP multi-buffer support. Previously, combining a fragmented frame into an skb was not affected in the same way that XDP is, since the whole frame needs to be in place before executing the XDP prog. Therefore, when going through HW Rx descriptors one-by-one, calls to ice_put_rx_buf() need to happen *after* running the XDP prog on a potentially multi-buffered frame, so some additional storage of the page count is needed.

Adding the page count to the rx buf makes it easier to walk through the processed entries at the end of the Rx cleaning routine and decide whether or not buffers should be recycled.

While at it, bump ice_rx_buf::pagecnt_bias from u16 up to u32. It has been proven many times that calculations on variables smaller than standard register size are harmful. This was also the case during experiments with embedding the page count in ice_rx_buf - when this was added as a u16 it had a performance impact.

Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
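An illustrative (not verbatim) view of the ice_rx_buf members these patches touch; the real struct has more fields and a different exact layout:

	struct ice_rx_buf_sketch {
		struct page *page;
		unsigned int page_offset;
		unsigned int pgcnt;		/* page_count() snapshot taken when the buffer is fetched */
		unsigned int pagecnt_bias;	/* bumped from u16 to u32 per the commit above */
		u32 act;			/* cached XDP verdict, see "ice: Centralize Rx buffer recycling" */
	};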
2023-02-01  ice: Add xdp_buff to ice_rx_ring struct  (Maciej Fijalkowski)  [3 files, -16/+25]
In preparation for XDP multi-buffer support, let's store the xdp_buff on the Rx ring struct. This will allow us to combine fragmented frames across separate NAPI cycles in the same way as skb fragments are currently handled. This means that the skb pointer on the Rx ring will become redundant and will be removed. For now it is kept and the layout of the Rx ring struct is not touched; some member movement will be needed later on, and that will be the time to take care of it. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-01  ice: Prepare legacy-rx for upcoming XDP multi-buffer support  (Maciej Fijalkowski)  [5 files, -23/+17]
The Rx path is going to be modified in a way that a fragmented frame will be gathered within an xdp_buff in the first place. This approach implies that the underlying buffer has to provide tailroom for skb_shared_info. This is currently the case when the ring uses build_skb, but not when the legacy-rx knob is turned on. That case configures 2k Rx buffers and has no way to provide either headroom or tailroom - FWIW it currently has XDP_PACKET_HEADROOM, which is broken and is removed here. 2k Rx buffers were used so that the driver in this setting could support 9k MTU, as it can chain up to 5 Rx buffers. With an offset configured, HW writing 2k of data would cross the half-page boundary, which broke the assumption behind our internal page recycling tricks.

Now, if the above were fixed and the legacy-rx path left as-is, referring to skb_shared_info via xdp_get_shared_info_from_buff() would corrupt the packet's contents again. Hence the Rx buffer size needs to be lowered, and therefore the supported MTU as well. This operation allows us to keep a unified data path, and users with 8k MTU (if any use legacy-rx) are still good to go. However, the tendency is to drop support for this code path at some point.

Add ICE_RXBUF_1664 as vsi::rx_buf_len and ICE_MAX_FRAME_LEGACY_RX (8320) as vsi::max_frame for legacy-rx. For bigger page sizes, configure 3k Rx buffers, not 2k.

Since headroom support is removed, disable data_meta support on legacy-rx. When preparing the XDP buff, rely on the ice_rx_ring::rx_offset setting when deciding whether to support data_meta or not.

Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
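For reference, the new legacy-rx constants restated from the text above (the definitions are a sketch; note that 8320 is exactly 5 x 1664, matching the up-to-5-buffer chaining mentioned earlier):

	#define ICE_RXBUF_1664			1664	/* vsi::rx_buf_len for legacy-rx */
	#define ICE_MAX_FRAME_LEGACY_RX		8320	/* vsi::max_frame for legacy-rx */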
2023-01-31  Merge tag 'mlx5-updates-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux  (Jakub Kicinski)  [14 files, -74/+835]
Saeed Mahameed says:

====================
mlx5-updates-2023-01-30

Add fast update encryption key

Jianbo Liu Says:
================

Data encryption keys (DEKs) are the keys used for data encryption and decryption operations. Starting from version 22.33.0783, firmware is optimized to accelerate the update of user keys into DEK objects in hardware. Support for bulk allocation and destruction of DEK objects is added, and the bulk-allocated DEKs are uninitialized, as bulk creation requires no input key. When offloading encryption/decryption, the user gets one object from a bulk and updates its key with a new "modify DEK" command. This command is the same as create DEK object, but requires no heavy context memory allocation in firmware, which consumes most of the CPU cycles of the create DEK command.

DEKs are cached internally by the NIC, so invalidating the internal NIC caches is required before reusing DEKs. The SYNC_CRYPTO command is added to support this. A DEK object can be reused; the keys in it can be updated after this command is executed.

This patchset enhances the key creation and destruction flow to make use of this new feature. Any user, for example ktls, ipsec and macsec, can use it to offload keys. But only ktls uses it, as the others don't need many keys, and caching too many DEKs in the pool is wasteful.

Two new data structs are added:
a. DEK pool. One pool is created for each key type. The bulks of that type are placed on the pool's different bulk lists, according to the number of available and in_use DEKs in the bulk.
b. DEK bulk. All DEKs from one bulk allocation are stored here. There are two bitmaps to indicate the state of each DEK.

New APIs are then added. When a user needs a DEK object:
a. Fetch one bulk with available DEKs, from the partial_list or avail_list, otherwise create a new one.
b. Pick one DEK, and set its need_sync and in_use bits to 1. Move the bulk to full_list if no more keys are available, or put it on partial_list if the bulk is newly created.
c. Update the DEK object's key with the user key, by the "modify DEK" command.
d. Return the DEK struct to the user, which then gets the object id and fills it into the offload commands.

When a user frees a DEK:
a. Set its in_use bit to 0. If all need_sync bits are 1 and all in_use bits of this bulk are 0, move it to sync_list.
b. If the number of DEKs freed by users is over the threshold (128), schedule a workqueue to do the sync process.

For the sync process, the SYNC_CRYPTO command is executed first. Then, for each bulk in partial_list, full_list and sync_list, reset the need_sync bits of the freed DEK objects. If all need_sync bits in one bulk are zero, move it to avail_list.

TIS pool support to recycle the TISes was already added. With this series and the TIS pool, TLS CPS performance is improved greatly.

We tested https on the following system:
CPU: dual AMD EPYC 7763 64-Core processors
RAM: 512G
DEV: ConnectX-6 DX, with FW ver 22.33.0838 and TLS_OPTIMISE=true

TLS CPS performance numbers are:
Before: 11k connections/sec
After: 101k connections/sec
================

* tag 'mlx5-updates-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: kTLS, Improve connection rate by using fast update encryption key
  net/mlx5: Keep only one bulk of full available DEKs
  net/mlx5: Add async garbage collector for DEK bulk
  net/mlx5: Reuse DEKs after executing SYNC_CRYPTO command
  net/mlx5: Use bulk allocation for fast update encryption key
  net/mlx5: Add bulk allocation and modify_dek operation
  net/mlx5: Add support SYNC_CRYPTO command
  net/mlx5: Add new APIs for fast update encryption key
  net/mlx5: Refactor the encryption key creation
  net/mlx5: Add const to the key pointer of encryption key creation
  net/mlx5: Prepare for fast crypto key update if hardware supports it
  net/mlx5: Change key type to key purpose
  net/mlx5: Add IFC bits and enums for crypto key
  net/mlx5: Add IFC bits for general obj create param
  net/mlx5: Header file for crypto
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
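A condensed sketch of the DEK pool/bulk bookkeeping described in the merge message above; all field and struct names here are illustrative stand-ins for the real mlx5 core crypto structures:

	#include <linux/list.h>
	#include <linux/mutex.h>
	#include <linux/types.h>

	struct dek_bulk_sketch {
		u32 base_obj_id;		/* object id of the first DEK in the bulk */
		unsigned long *need_sync;	/* 1 = key must be invalidated before reuse */
		unsigned long *in_use;		/* 1 = key currently handed out to a user */
		struct list_head entry;		/* linked on one of the pool lists below */
	};

	struct dek_pool_sketch {
		u8 key_purpose;			/* one pool per key type */
		struct mutex lock;
		struct list_head partial_list;	/* bulks with some DEKs available */
		struct list_head full_list;	/* bulks with no DEKs available */
		struct list_head avail_list;	/* synced bulks, all DEKs available */
		struct list_head sync_list;	/* used-up bulks waiting for SYNC_CRYPTO */
	};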
2023-01-31  net: fman: memac: free mdio device if lynx_pcs_create() fails  (Vladimir Oltean)  [1 file, -0/+3]
When memory allocation fails in lynx_pcs_create() and it returns NULL, there remains a dangling reference to the mdiodev returned by of_mdio_find_device() which is leaked as soon as memac_pcs_create() returns empty-handed. Fixes: a7c2a32e7f22 ("net: fman: memac: Use lynx pcs driver") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Sean Anderson <[email protected]> Acked-by: Madalin Bucur <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-31  Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue  (Jakub Kicinski)  [8 files, -39/+0]
Tony Nguyen says:

====================
Intel Wired LAN: Remove redundant Device Control Error Reporting Enable

Bjorn Helgaas says:

Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core sets the Device Control bits that enable error reporting for PCIe devices. This series removes redundant calls to pci_enable_pcie_error_reporting() that do the same thing from several NIC drivers. There are several more drivers where this should be removed; I started with just the Intel drivers here.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ixgbe: Remove redundant pci_enable_pcie_error_reporting()
  igc: Remove redundant pci_enable_pcie_error_reporting()
  igb: Remove redundant pci_enable_pcie_error_reporting()
  ice: Remove redundant pci_enable_pcie_error_reporting()
  iavf: Remove redundant pci_enable_pcie_error_reporting()
  i40e: Remove redundant pci_enable_pcie_error_reporting()
  fm10k: Remove redundant pci_enable_pcie_error_reporting()
  e1000e: Remove redundant pci_enable_pcie_error_reporting()
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-31  net: ethernet: mtk_eth_soc: Avoid truncating allocation  (Kees Cook)  [2 files, -3/+1]
There doesn't appear to be a reason to truncate the allocation used for flow_info, so do a full allocation and remove the unused empty struct. GCC does not like having a reference to an object that has been partially allocated, as bounds checking may become impossible when such an object is passed to other code. Seen with GCC 13:

../drivers/net/ethernet/mediatek/mtk_ppe.c: In function 'mtk_foe_entry_commit_subflow':
../drivers/net/ethernet/mediatek/mtk_ppe.c:623:18: warning: array subscript 'struct mtk_flow_entry[0]' is partly outside array bounds of 'unsigned char[48]' [-Warray-bounds=]
  623 |         flow_info->l2_data.base_flow = entry;
      |                  ^~

Cc: Felix Fietkau <[email protected]>
Cc: John Crispin <[email protected]>
Cc: Sean Wang <[email protected]>
Cc: Mark Lee <[email protected]>
Cc: Lorenzo Bianconi <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
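The shape of the fix, as a sketch inferred from the warning above rather than the exact hunk (the GFP flags are an assumption based on the atomic context of the surrounding PPE code):

	/* Allocate the whole mtk_flow_entry instead of a truncated prefix so
	 * the later write through flow_info is fully inside the object.
	 */
	flow_info = kzalloc(sizeof(*flow_info), GFP_ATOMIC);
	if (!flow_info)
		return;
	flow_info->l2_data.base_flow = entry;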
2023-01-31  ibmvnic: Toggle between queue types in affinity mapping  (Nick Child)  [1 file, -13/+16]
Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all the IRQs for transmit queues, then assigning all the IRQs for receive queues. With multi-threaded processors, in a heavy RX or TX environment, physical cores would either be overloaded or underutilized (due to the IRQ assignment algorithm). This approach is sub-optimal because IRQs for the same subprocess (RX or TX) would be bound to adjacent CPU numbers, meaning they were more likely to be contending for the same core.

For example, in a system with 64 CPUs and 32 queues, the IRQs would be bound to CPUs in the following pattern:

IRQ type | CPU number
-----------------------
TX0      | 0-1
TX1      | 2-3
<etc>
RX0      | 32-33
RX1      | 34-35
<etc>

Observe that in SMT-8, the first 4 TX queues would be sharing the same core.

A more optimal algorithm would balance the number of RX and TX IRQs across the physical cores. Therefore, to increase performance, distribute RX and TX IRQs across cores by alternating between assigning IRQs for RX and TX queues to CPUs. With a system with 64 CPUs and 32 queues, this results in the following pattern:

IRQ type | CPU number
-----------------------
TX0      | 0-1
RX0      | 2-3
TX1      | 4-5
RX1      | 6-7
<etc>

Observe that in SMT-8 there is an equal distribution of RX and TX IRQs per core. In the above case, each core handles 2 TX and 2 RX IRQs.

Signed-off-by: Nick Child <[email protected]>
Reviewed-by: Haren Myneni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
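A sketch of the TX/RX interleaving described above; assign_to_next_cpus() is a hypothetical stand-in for ibmvnic's real affinity-hint helper, and the loop shape is illustrative:

	/* Walk TX and RX queues in lockstep so consecutive CPU ranges alternate
	 * between the two IRQ types instead of clustering per type.
	 */
	static void ibmvnic_map_queue_irqs_sketch(struct ibmvnic_sub_crq_queue **tx_scrq,
						  int num_tx,
						  struct ibmvnic_sub_crq_queue **rx_scrq,
						  int num_rx)
	{
		int i = 0, j = 0;

		while (i < num_tx || j < num_rx) {
			if (i < num_tx)
				assign_to_next_cpus(tx_scrq[i++]);	/* hypothetical helper */
			if (j < num_rx)
				assign_to_next_cpus(rx_scrq[j++]);	/* hypothetical helper */
		}
	}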
2023-01-30  net: mscc: ocelot: expose vsc7514_regmap definition  (Colin Foster)  [2 files, -14/+15]
The VSC7514 target regmap is identical to that of similar hardware, specifically the VSC7512. Share this resource, and change the name to match the pattern of other exported resources. Signed-off-by: Colin Foster <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Tested-by: Vladimir Oltean <[email protected]> # regression Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-30  net: mscc: ocelot: expose ocelot_reset routine  (Colin Foster)  [2 files, -45/+47]
Resetting the switch core is the same whether it is done internally or externally. Move this routine to the ocelot library so it can be used by other drivers. Signed-off-by: Colin Foster <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Tested-by: Vladimir Oltean <[email protected]> # regression Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-30  net: mscc: ocelot: expose vcap_props structure  (Colin Foster)  [2 files, -43/+44]
The vcap_props structure is common to other devices, specifically the VSC7512 chip that can only be controlled externally. Export this structure so it doesn't need to be recreated. Signed-off-by: Colin Foster <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Tested-by: Vladimir Oltean <[email protected]> # regression Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-30  net: mscc: ocelot: expose regfield definition to be used by other drivers  (Colin Foster)  [2 files, -59/+60]
The ocelot_regfields struct is common between several different chips, some of which can only be controlled externally. Export this structure so it doesn't have to be duplicated in these other drivers. Rename the structure as well, to follow the conventions of other shared resources. Signed-off-by: Colin Foster <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Tested-by: Vladimir Oltean <[email protected]> # regression Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-30  net: mscc: ocelot: expose ocelot wm functions  (Colin Foster)  [2 files, -28/+31]
Expose ocelot_wm functions so they can be shared with other drivers. Signed-off-by: Colin Foster <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Tested-by: Vladimir Oltean <[email protected]> # regression Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-30  net: b44: Remove the unused function __b44_cam_read()  (Jiapeng Chong)  [1 file, -22/+0]
The function __b44_cam_read() is defined in the b44.c file, but not called elsewhere, so remove this unused function. drivers/net/ethernet/broadcom/b44.c:199:20: warning: unused function '__b44_cam_read'. Reported-by: Abaci Robot <[email protected]> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3858 Signed-off-by: Jiapeng Chong <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-30  Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue  (Jakub Kicinski)  [5 files, -19/+43]
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-27 (ice)

This series contains updates to ice driver only.

Dave prevents modifying channels when RDMA is active as this will break RDMA traffic.

Michal fixes a broken URL.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Fix broken link in ice NAPI doc
  ice: Prevent set_channel from changing queues while RDMA active
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-30  net/mlx5e: kTLS, Improve connection rate by using fast update encryption key  (Jianbo Liu)  [4 files, -30/+45]
As the fast DEK update is now fully implemented, use it for kTLS to get better performance. TIS pool support to recycle the TISes was already added. With this series and the TIS pool, TLS CPS is improved ~9x, from 11k/s to 101k/s. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Keep only one bulk of full available DEKs  (Jianbo Liu)  [1 file, -5/+13]
One bulk with all of its keys available is left undestroyed, to service possible requests from users quickly. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Add async garbage collector for DEK bulk  (Jianbo Liu)  [1 file, -9/+65]
After invalidation, an idle bulk with all DEKs available for use is destroyed, to free keys and memory. To get better performance, the firmware destruction operation is done asynchronously: idle bulks are first enqueued on destroy_list, then destroyed in the system workqueue. This improves performance, as the destruction doesn't need to hold the pool's mutex. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Reuse DEKs after executing SYNC_CRYPTO command  (Jianbo Liu)  [1 file, -9/+183]
To fast update encryption keys, freed keys in a bulk with need_sync bit 1 and in_use bit 0 can be recycled. The keys are cached internally by the NIC, so invalidating the internal NIC caches with the SYNC_CRYPTO command is required before reusing them. A threshold is added in the driver to avoid invalidating on every update: only when the number of DEKs that need to be synced is over this threshold does the sync process start. Besides, it is done in the system workqueue.

After the SYNC_CRYPTO command is executed successfully, the bitmaps of each bulk must be reset accordingly, so that the freed DEKs can be reused. From the analysis in the previous patch, the number of reused DEKs can be calculated by hweight_long(need_sync XOR in_use), and the need_sync bits can be reset by simply copying the in_use bits.

Two more lists (avail_list and sync_list) are added to each pool. The avail_list holds bulks whose need_sync bits have all been reset after sync. If a bulk has no available DEKs and all of them have been freed by users, the bulk is moved to sync_list instead of being destroyed as in the previous patch, and waits for the invalidation. During sync, such bulks simply have their need_sync bits reset and are moved to avail_list.

Besides, add a wait_for_free list for the to-be-freed DEKs. It avoids this corner case: when thread A is done with SYNC_CRYPTO but has not yet started to reset the bitmaps, thread B allocates a DEK and frees it immediately. It's obvious that this DEK can't be reused this time, so put it on the waiting list and free it after the bulk bitmap reset is finished.

Signed-off-by: Jianbo Liu <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
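The post-SYNC_CRYPTO bookkeeping described above can be sketched as follows (an illustrative helper, not the driver's actual function): the number of DEKs that become reusable in a bulk is popcount(need_sync XOR in_use), and need_sync is reset by copying in_use over it:

	#include <linux/bitops.h>

	/* For one bulk: count keys that went (1,0) -> reusable, then make
	 * need_sync mirror in_use so those keys read as available again.
	 */
	static unsigned int dek_bulk_reset_synced(unsigned long *need_sync,
						  const unsigned long *in_use,
						  unsigned int ndeks)
	{
		unsigned int i, reused = 0;

		for (i = 0; i < BITS_TO_LONGS(ndeks); i++) {
			reused += hweight_long(need_sync[i] ^ in_use[i]);
			need_sync[i] = in_use[i];
		}
		return reused;
	}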
2023-01-30  net/mlx5: Use bulk allocation for fast update encryption key  (Jianbo Liu)  [1 file, -7/+208]
We create a pool for each key type. For the pool, there is a struct to store the info for all DEK objects of one bulk allocation. As we use crypto->log_dek_obj_range, which was set to 12 in the previous patch, for the log_obj_range of the bulk allocation, 4096 DEKs are allocated at a time.

To track the state of all the keys in a bulk, two bitmaps are created. The need_sync bitmap indicates the availability of the corresponding key. If the bit is 0, the key can be used (available), as it either was newly created by FW, or SYNC_CRYPTO has been executed and the bit was reset after the key was freed by an upper-layer user (this is the case handled in a later patch). Otherwise, the key needs to be synced. The in_use bitmap indicates that the key is being used, and is reset when the user frees it. When ktls, ipsec or macsec needs a key from a bulk, it gets one with need_sync bit 0, then sets both the need_sync and in_use bits to 1. When the user frees a key, only the in_use bit is reset to 0. So, for the combinations of (need_sync, in_use) of one DEK object:
- (0,0) means the key is ready for use,
- (1,1) means the key is currently being used by a user,
- (1,0) means the key is freed, and waiting to be synced,
- (0,1) is an invalid state.

There are two lists in each pool, partial_list and full_list, according to the number of available DEKs in a bulk. When a user needs a key, it gets a bulk, either from the partial list, or by creating a new one from FW. Then the bulk is put on one of the pool's lists according to the number of available DEKs it has. If there are no available DEKs and all of them have been freed by users, the bulk is, for now, destroyed.

To speed up the bitmap search, a variable (avail_start) is added to indicate where to start searching the need_sync bitmap for an available key.

Signed-off-by: Jianbo Liu <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
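Picking a free DEK from one bulk, as described above, can be sketched like this (an illustrative helper using the avail_start hint; not the actual mlx5 code):

	#include <linux/bitops.h>
	#include <linux/errno.h>

	/* Find a key with need_sync == 0 starting from the avail_start hint,
	 * then mark it as both in use and in need of a future sync.
	 */
	static int dek_bulk_get_free_idx(unsigned long *need_sync, unsigned long *in_use,
					 unsigned int ndeks, unsigned int *avail_start)
	{
		unsigned int idx = find_next_zero_bit(need_sync, ndeks, *avail_start);

		if (idx >= ndeks)
			return -ENOSPC;
		set_bit(idx, need_sync);
		set_bit(idx, in_use);
		*avail_start = idx + 1;
		return idx;
	}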
2023-01-30  net/mlx5: Add bulk allocation and modify_dek operation  (Jianbo Liu)  [1 file, -2/+83]
To support fast update of keys into hardware, we optimize firmware to achieve the maximum rate. The approach is to create DEK objects in bulk, and update each of them with modify command. This patch supports bulk allocation and modify_dek commands for new firmware. However, as log_obj_range is 0 for now, only one DEK obj is allocated each time, and then updated with user key by modify_dek. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Add support SYNC_CRYPTO command  (Jianbo Liu)  [2 files, -0/+39]
Add support for SYNC_CRYPTO command. For now, it is executed only when initializing DEK, but needed when reusing keys in later patch. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Add new APIs for fast update encryption key  (Jianbo Liu)  [2 files, -5/+98]
New APIs are added to support fast update of DEKs. As a pool is created for each key purpose (type), one pair of pool APIs gets/puts a pool. Another pair of DEK APIs gets a DEK object from the pool and updates it with the user key, or frees it back to the pool. As bulk allocation and destruction will be supported in later patches, the old implementation is used here. To support these APIs, pool and dek structs are defined first. Only a small number of fields are stored in them, for example key_purpose and refcnt in the pool struct, and the DEK object id in the dek struct. More fields will be added to these structs in later patches, for example the different bulk lists for the pool struct, the pointer to the bulk a dek struct belongs to, and a list_entry for the list in a pool, which is used to save keys waiting to be freed while another thread is doing the sync. Besides the creation and destruction interfaces, a new one is also added to get the object id. Currently these APIs are planned to be used by TLS only. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Refactor the encryption key creation  (Jianbo Liu)  [1 file, -24/+53]
Move the common code to general functions which can be used by fast update encryption key in later patches. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Add const to the key pointer of encryption key creation  (Jianbo Liu)  [3 files, -3/+3]
Change key pointer to const void *, as there is no need to change the key content. This is also to avoid modifying the key by mistake. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Prepare for fast crypto key update if hardware supports it  (Jianbo Liu)  [5 files, -0/+55]
Add CAP for crypto offload, do the simple initialization if hardware supports it. Currently set log_dek_obj_range to 12, so 4k DEKs will be created in one bulk allocation. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Change key type to key purpose  (Jianbo Liu)  [2 files, -4/+4]
Change the naming of key type in DEK fields and macros, to be consistent with the device spec. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Add IFC bits for general obj create param  (Jianbo Liu)  [1 file, -2/+4]
Before this patch, the log_obj_range was defined inside general_obj_in_cmd_hdr to support bulk allocation. However, we need to modify/query one of the objects in the bulk in a later patch, so change those fields to param bits for parameters specific to the cmd header, and add general_obj_create_param according to what was updated in the spec. We will also add general_obj_query_param for modify/query later. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  net/mlx5: Header file for crypto  (Tariq Toukan)  [6 files, -15/+23]
Take crypto API out of the generic mlx5.h header into a dedicated header. Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Jianbo Liu <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-01-30  ixgbe: Remove redundant pci_enable_pcie_error_reporting()  (Bjorn Helgaas)  [1 file, -5/+0]
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: [email protected] Cc: [email protected] Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-30  igc: Remove redundant pci_enable_pcie_error_reporting()  (Bjorn Helgaas)  [1 file, -5/+0]
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: [email protected] Cc: [email protected] Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-01-30  igb: Remove redundant pci_enable_pcie_error_reporting()  (Bjorn Helgaas)  [1 file, -5/+0]
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: [email protected] Cc: [email protected] Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-30  ice: Remove redundant pci_enable_pcie_error_reporting()  (Bjorn Helgaas)  [1 file, -3/+0]
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: [email protected] Cc: [email protected] Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-30  iavf: Remove redundant pci_enable_pcie_error_reporting()  (Bjorn Helgaas)  [1 file, -5/+0]
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: [email protected] Cc: [email protected] Tested-by: Marek Szlosek <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-01-30  i40e: Remove redundant pci_enable_pcie_error_reporting()  (Bjorn Helgaas)  [1 file, -4/+0]
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: [email protected] Cc: [email protected] Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-01-30  fm10k: Remove redundant pci_enable_pcie_error_reporting()  (Bjorn Helgaas)  [1 file, -5/+0]
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Tony Nguyen <[email protected]>
2023-01-30  e1000e: Remove redundant pci_enable_pcie_error_reporting()  (Bjorn Helgaas)  [1 file, -7/+0]
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Tony Nguyen <[email protected]>
2023-01-30  devlink: remove devlink features  (Jiri Pirko)  [7 files, -10/+5]
Devlink features were introduced to disallow devlink reload calls from userspace before the devlink was fully initialized. The reason for this workaround was the fact that devlink reload was originally called without the devlink instance lock held. However, with recent changes that converted devlink reload to be performed under the devlink instance lock, this is redundant, so remove devlink features entirely. Note that mlx5 used this to enable devlink reload conditionally, only when the device didn't act as a multi-port slave. Move the multi-port check into the mlx5_devlink_reload_down() callback alongside the other checks preventing the device from reloading in certain states. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-01-30  net: microchip: sparx5: Add KUNIT tests for enabling/disabling chains  (Steen Hegelund)  [1 file, -0/+66]
This enhances the KUNIT test of the VCAP API with tests of the chaining functionality. Signed-off-by: Steen Hegelund <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-01-30  net: microchip: sparx5: Add TC support for the ES2 VCAP  (Steen Hegelund)  [2 files, -4/+39]
This enables the TC command to use the Sparx5 ES2 VCAP, and provides a new ES2 ethertype table and handling of rule links between IS0 and ES2. Signed-off-by: Steen Hegelund <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-01-30  net: microchip: sparx5: Add ingress information to VCAP instance  (Steen Hegelund)  [11 files, -19/+46]
This allows the check of the goto action to be specific to the ingress and egress VCAP instances. The debugfs support is also updated to show this information. Signed-off-by: Steen Hegelund <[email protected]> Signed-off-by: David S. Miller <[email protected]>