|
The conversion of the driver to the gpiod API done in 468ba54bd616 ("fec:
convert to gpio descriptor") incorrectly made the reset line mandatory and
resulted in aborting driver probe when no reset line was specified
(note: this way of specifying the PHY reset line is actually deprecated).
Switch to devm_gpiod_get_optional() and skip manipulating the reset
line if it cannot be located.
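A minimal sketch of the intended handling (the exact probe code in
fec_main.c differs; the con_id, delay and surrounding logic here are
illustrative):

    struct gpio_desc *phy_reset;

    /* NULL (no reset line found) is not an error with the _optional variant */
    phy_reset = devm_gpiod_get_optional(&pdev->dev, "phy-reset",
                                        GPIOD_OUT_HIGH);
    if (IS_ERR(phy_reset))
        return PTR_ERR(phy_reset);  /* real lookup errors still abort probe */

    if (phy_reset) {
        /* only touch the line when it was actually specified */
        udelay(reset_duration);     /* delay taken from DT in the real driver */
        gpiod_set_value_cansleep(phy_reset, 0);
    }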
Fixes: 468ba54bd616 ("fec: convert to gpio descriptor")
Signed-off-by: Dmitry Torokhov <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reported-by: Marc Kleine-Budde <[email protected]>
Tested-by: Marc Kleine-Budde <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
A summary of the flags being set for various drivers is given below.
Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features
that can be turned off and on at runtime. This means that these flags
may be set and unset under RTNL lock protection by the driver. Hence,
READ_ONCE must be used by code loading the flag value.
Also, these flags are not used for synchronization against the availability
of XDP resources on a device. They are merely hints, and hence a read
may race with the actual teardown of XDP resources on the device. This
may change in the future, e.g. with operations taking a reference on the
XDP resources of the driver and in turn inhibiting turning off these flags.
For now, however, they can only be used as a hint to check whether the
device supports becoming a redirection target.
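As a hedged illustration of the reader side described above (field and
flag names follow the xdp_features series; treat this sketch as an
assumption, not the exact helper):

    static bool dev_is_redirect_target_hint(const struct net_device *dev)
    {
        /* flags may flip under RTNL, so sample them exactly once */
        xdp_features_t feats = READ_ONCE(dev->xdp_features);

        /* hint only: may race with teardown of XDP resources */
        return feats & NETDEV_XDP_ACT_NDO_XMIT;
    }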
Turn 'hw-offload' feature flag on for:
- netronome (nfp)
- netdevsim.
Turn 'native' and 'zerocopy' features flags on for:
- intel (i40e, ice, ixgbe, igc)
- mellanox (mlx5).
- stmmac
- netronome (nfp)
Turn 'native' features flags on for:
- amazon (ena)
- broadcom (bnxt)
- freescale (dpaa, dpaa2, enetc)
- funeth
- intel (igb)
- marvell (mvneta, mvpp2, octeontx2)
- mellanox (mlx4)
- mtk_eth_soc
- qlogic (qede)
- sfc
- socionext (netsec)
- ti (cpsw)
- tap
- tsnep
- veth
- xen
- virtio_net.
Turn 'basic' (tx, pass, aborted and drop) features flags on for:
- netronome (nfp)
- cavium (thunder)
- hyperv.
Turn 'redirect_target' feature flag on for:
- amazon (ena)
- broadcom (bnxt)
- freescale (dpaa, dpaa2)
- intel (i40e, ice, igb, ixgbe)
- ti (cpsw)
- marvell (mvneta, mvpp2)
- sfc
- socionext (netsec)
- qlogic (qede)
- mellanox (mlx5)
- tap
- veth
- virtio_net
- xen
Reviewed-by: Gerhard Engleder <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Co-developed-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Co-developed-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Marek Majtyka <[email protected]>
Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
net/core/gro.c
7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO")
b1a78b9b9886 ("net: add support for ipv4 big tcp")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently there is no IRQ handling (even though the SGMII supports it).
Enable polling to support SFP ports.
Fixes: 14a44ab0330d ("net: mtk_eth_soc: partially convert to phylink_pcs")
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Alexander Couzens <[email protected]>
[ bmork: changed "1" => "true" ]
Signed-off-by: Bjørn Mork <[email protected]>
Acked-by: Daniel Golle <[email protected]>
Tested-by: Daniel Golle <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The logic of the duplex bit is inverted: setting it means half
duplex, not full duplex.
Fix this and rename the macro to avoid confusion.
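A sketch of the rename (the bit position follows the mtk SGMII register
layout; treat the exact value and surrounding code as assumptions):

    #define SGMII_DUPLEX_HALF	BIT(4)	/* was misleadingly named SGMII_DUPLEX_FULL */

    /* set bit => half duplex, so the test is inverted accordingly */
    if (duplex != DUPLEX_FULL)
        sgm_mode |= SGMII_DUPLEX_HALF;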
Fixes: 7e538372694b ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Bjørn Mork <[email protected]>
Acked-by: Daniel Golle <[email protected]>
Tested-by: Daniel Golle <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The code expects the PHY to be in power down, which is only true after reset.
Allow changing the SGMII parameters more than once.
Only power down when reconfiguring, to avoid bouncing the link when there's
no reason to - based on code from Russell King.
There are cases when the SGMII_PHYA_PWD register contains 0x9, which
prevents the SGMII from working. The SGMII still shows link but no traffic
can flow. Writing 0x0 to the PHYA_PWD register fixes the issue. 0x0 was
taken from a good working state of the SGMII interface.
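A condensed sketch of the reworked flow (register and field names follow
the mtk SGMII code; the details are assumptions):

    if (mode_changed) {
        /* power down only while actually reprogramming the PHYA */
        regmap_write(mpcs->regmap, SGMSYS_QPHY_PWD_CTRL, SGMII_PHYA_PWD);
        /* ... update speed/duplex/AN configuration here ... */
    }

    /* 0x0 is the known-good value; 0x9 leaves link up but passes no traffic */
    regmap_write(mpcs->regmap, SGMSYS_QPHY_PWD_CTRL, 0x0);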
Fixes: 42c03844e93d ("net-next: mediatek: add support for MediaTek MT7622 SoC")
Suggested-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Alexander Couzens <[email protected]>
[ bmork: rebased and squashed into one patch ]
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Bjørn Mork <[email protected]>
Acked-by: Daniel Golle <[email protected]>
Tested-by: Daniel Golle <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
According to my tests on MT7621AT and MT7623NI SoCs, hardware DSA untagging
won't work on the second MAC. Therefore, disable this feature when the
second MAC of the MT7621 and MT7623 SoCs is being used.
Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Link: https://lore.kernel.org/netdev/[email protected]/
Tested-by: Arınç ÜNAL <[email protected]>
Signed-off-by: Arınç ÜNAL <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The existing implementation for non-Autonegotiation 10G speed modes does
not enable RX adaptation in the driver and firmware. The RX equalization
settings (AFE settings alone) are configured manually, and the existing
link-up sequence in the driver does not perform the RX adaptation process
described in the Synopsys databook. There is a customer request for 10G
backplane mode without Auto-negotiation, and for longer DAC cables that
likewise operate without Auto-negotiation. These modes require the PHY to
perform RX adaptation.
The proposed logic adds the necessary changes to Yellow Carp devices to
ensure seamless RX adaptation for 10G-SFI (LONG DAC) and 10G-KR without
AN (CL72 not present). The RX adaptation core algorithm is executed by
firmware; however, to achieve that, a new mailbox sub-command needs to be
sent by the driver.
Co-developed-by: Shyam Sundar S K <[email protected]>
Signed-off-by: Shyam Sundar S K <[email protected]>
Signed-off-by: Raju Rangoju <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Add support to the driver to fully recognize and enable 2.5GbE speed in
10GBaseT mode.
Acked-by: Shyam Sundar S K <[email protected]>
Signed-off-by: Raju Rangoju <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The NPC exact match feature is supported on only one silicon
variant. Remove the debug messages which print that this
feature is not available on all other silicon variants.
Signed-off-by: Sunil Goutham <[email protected]>
Signed-off-by: Ratheesh Kannoth <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The exact match feature is only available on CN10K-B silicon.
Unregister the exact match devlink entry only for this variant.
Fixes: 87e4ea29b030 ("octeontx2-af: Debugsfs support for exact match.")
Signed-off-by: Ratheesh Kannoth <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
clang static analysis reports
drivers/net/ethernet/intel/igc/igc_ptp.c:673:3: warning: The left operand of
'+' is a garbage value [core.UndefinedBinaryOperatorResult]
ktime_add_ns(shhwtstamps.hwtstamp, adjust);
^ ~~~~~~~~~~~~~~~~~~~~
igc_ptp_systim_to_hwtstamp() silently returns without setting the hwtstamp
if the mac type is unknown. This should be treated as an error.
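A sketch of the corrected contract (the exact signature in igc_ptp.c is
assumed here): the helper reports unknown MAC types as an error, and the
caller bails out before touching the timestamp.

    /* helper: return an error instead of silently leaving hwtstamp unset */
    switch (adapter->hw.mac.type) {
    case igc_i225:
        /* ... convert systim to hwtstamp ... */
        break;
    default:
        return -EINVAL;
    }

    /* caller: never feed a garbage value to ktime_add_ns() */
    if (igc_ptp_systim_to_hwtstamp(adapter, &shhwtstamps, systim))
        return;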
Fixes: 81b055205e8b ("igc: Add support for RX timestamping")
Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Sasha Neftin <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This patch corrects two oversights relating to releasing resources
and DCB initialisation.
1. If mapping of the dcbcfg_tbl area fails, an error should be
propagated, allowing partial initialisation (probe) to be unwound.
2. Conversely, where dcbcfg_tbl is successfully mapped, it should
be unmapped in nfp_nic_dcb_clean(), which is called via various error
cleanup paths and on shutdown or removal of the PCIe device.
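A sketch of both fixes; nfp_map_dcbcfg_tbl()/nfp_unmap_dcbcfg_tbl() are
hypothetical stand-ins for the real mapping helpers:

    static int nfp_nic_dcb_init(struct nfp_net *nn)
    {
        nn->dcbcfg_tbl = nfp_map_dcbcfg_tbl(nn);
        if (!nn->dcbcfg_tbl)
            return -ENOMEM;            /* 1. propagate, unwind probe */
        return 0;
    }

    static void nfp_nic_dcb_clean(struct nfp_net *nn)
    {
        if (nn->dcbcfg_tbl)
            nfp_unmap_dcbcfg_tbl(nn);  /* 2. unmap on cleanup/shutdown */
    }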
Fixes: 9b7fe8046d74 ("nfp: add DCB IEEE support")
Signed-off-by: Huayu Chen <[email protected]>
Reviewed-by: Niklas Söderlund <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
A mutex may sleep, which is not permitted in atomic context.
Avoid a case where this may arise by moving the call to
nfp_flower_lag_get_info_from_netdev() in nfp_tun_write_neigh() out from
under the spinlock.
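A sketch of the reordering (lock and variable names are assumptions):
the sleepable lookup happens before the spinlock is taken, never inside it.

    /* may take a mutex and sleep: do it in sleepable context */
    do_bond = nfp_flower_lag_get_info_from_netdev(app, netdev, lag_info);

    spin_lock_bh(&priv->predt_lock);
    /* ... write the neighbour entry; no sleeping allowed here ... */
    spin_unlock_bh(&priv->predt_lock);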
Fixes: abc210952af7 ("nfp: flower: tunnel neigh support bond offload")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Yanguo Li <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Modified the bnxt_en code to create and pre-configure RDMA devices
with the right MSI-X vector count for the ROCE driver to use.
This is to align the ROCE driver to the auxiliary device model which
will simply bind the driver without getting into PCI-related handling.
All PCI-related logic will now be in the bnxt_en driver.
Suggested-by: Leon Romanovsky <[email protected]>
Signed-off-by: Ajit Khaparde <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
Remove the SRIOV config callback which the bnxt_en was calling
to reconfigure the chip resources for a PF device when VFs are
created. The code is now modified to provision the VF resources
based on the total VF count instead of the actual VF count.
This allows the SRIOV config callback to be removed from the
list of ulp_ops.
Suggested-by: Leon Romanovsky <[email protected]>
Signed-off-by: Ajit Khaparde <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
Decouple RoCE driver from directly accessing L2's private bnxt
structure. Move the fields needed by RoCE driver into bnxt_en_dev.
They'll be passed to RoCE driver by bnxt_rdma_aux_device_add()
function.
Signed-off-by: Hongguang Gao <[email protected]>
Signed-off-by: Ajit Khaparde <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Reviewed-by: Selvin Xavier <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
Wherever possible use the function ops provided by auxiliary bus
instead of using proprietary ops.
Defined bnxt_re_suspend and bnxt_re_resume calls which can be
invoked by the bnxt_en driver instead of the ULP stop/start calls.
Signed-off-by: Ajit Khaparde <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Reviewed-by: Selvin Xavier <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
For a single ULP user there is no need for complicated function
indirection calls. Remove all this complexity in favour of direct
function calls exported by the bnxt_en driver. This greatly
simplifies the code. Also remove the unused ulp_async_notifier.
Suggested-by: Leon Romanovsky <[email protected]>
Signed-off-by: Ajit Khaparde <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Reviewed-by: Selvin Xavier <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
Since the driver continues to use the single ULP model,
the extra complexity and indirection is unnecessary.
Remove the usage of ulp_id from the code.
Suggested-by: Leon Romanovsky <[email protected]>
Signed-off-by: Ajit Khaparde <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Reviewed-by: Selvin Xavier <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
Use the auxiliary driver interface for loading and unloading the RoCE
driver. The driver no longer needs to register the interface using the
netdev notifier. Remove the bnxt_re_dev_list, which is not needed.
Currently probe, remove and shutdown ops have been implemented for
the auxiliary device.
Also remove excessive validation checks for rdev.
Signed-off-by: Ajit Khaparde <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Reviewed-by: Selvin Xavier <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
Add auxiliary driver support.
An auxiliary device will be created if the hardware indicates
support for RDMA.
The bnxt_ulp_probe() function has been removed and a new
bnxt_rdma_aux_device_add() function has been added.
The bnxt_free_msix_vecs() and bnxt_req_msix_vecs() will now hold
the RTNL lock when they call bnxt_close_nic() and bnxt_open_nic(),
since device close and open need to be protected under the RTNL lock.
The operations between the bnxt_en and bnxt_re will be protected
using the en_ops_lock.
This will be used by the bnxt_re driver in a follow-on patch
to create ROCE interfaces.
Signed-off-by: Ajit Khaparde <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Reviewed-by: Selvin Xavier <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
Let us store a pointer to the xdp_buff that came from the xsk_buff_pool
in tx_buf, so that it will be possible to recycle it via xsk_buff_free()
on the Tx cleaning side. This way it is not necessary to do an expensive
copy to another xdp_buff backed by a newly allocated page.
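A sketch of the idea (ice-style field names, assumed here):

    /* xmit: stash the pool-backed buffer instead of copying its contents */
    tx_buf->xdp = xdp;

    /* clean: hand the very same buffer back to the xsk_buff_pool */
    xsk_buff_free(tx_buf->xdp);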
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Now that both ZC and standard XDP data paths stopped using Tx logic
based on next_dd and next_rs fields, we can safely remove these fields
and shrink Tx ring structure.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Similarly to the Rx side in the previous patch, the XDP Tx logic in the
ice driver needs to be adjusted for multi-buffer support - specifically,
the way HW Tx descriptors are produced and cleaned.
Currently, XDP_TX works on strict ring boundaries, meaning it sets the RS
bit (on the producer side) / looks up the DD bit (on the consumer/cleaning
side) every quarter of the ring. This means that if, for example, a
multi-buffer frame were to span a ring-quarter boundary (say the frame
consists of 4 fragments and starts at descriptor 62 of a ring sized to
256 entries), the RS bit would be produced in the middle of the multi-buffer
frame, which would be broken behavior, as it needs to be set on the
last descriptor of the frame.
To make it work, set RS bit at the last descriptor from the batch of
frames that XDP_TX action was used on and make the first entry remember
the index of last descriptor with RS bit set. This way, cleaning side
can take the index of descriptor with RS bit, look up DD bit's presence
and clean from first entry to last.
In order to clean up the code base introduce the common ice_set_rs_bit()
which will return index of descriptor that got RS bit produced on so
that standard driver can store this within proper ice_tx_buf and ZC
driver can simply ignore return value.
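A minimal sketch of such a helper (close in spirit to the ice code, but
the exact descriptor plumbing is an assumption):

    static u16 ice_set_rs_bit(struct ice_tx_ring *xdp_ring)
    {
        u16 rs_idx = xdp_ring->next_to_use ? xdp_ring->next_to_use - 1 :
                                             xdp_ring->count - 1;
        struct ice_tx_desc *tx_desc = ICE_TX_DESC(xdp_ring, rs_idx);

        /* RS goes on the last descriptor of the whole XDP_TX batch */
        tx_desc->cmd_type_offset_bsz |=
            cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
        return rs_idx;
    }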
Co-developed-by: Martyna Szapar-Mudlaw <[email protected]>
Signed-off-by: Martyna Szapar-Mudlaw <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Ice driver needs to be a bit reworked on Rx data path in order to
support multi-buffer XDP. For skb path, it currently works in a way that
Rx ring carries pointer to skb so if driver didn't manage to combine
fragmented frame at current NAPI instance, it can restore the state on
next instance and keep looking for last fragment (so descriptor with EOP
bit set). What needs to be achieved is that xdp_buff needs to be
combined in such way (linear + frags part) in the first place. Then skb
will be ready to go in case of XDP_PASS or BPF program being not present
on interface. If BPF program is there, it would work on multi-buffer
XDP. At this point xdp_buff resides directly on Rx ring, so given the
fact that skb will be built straight from xdp_buff, there will be no
further need to carry skb on Rx ring.
Besides removing skb pointer from Rx ring, lots of members have been
moved around within ice_rx_ring. First and foremost reason was to place
rx_buf with xdp_buff on the same cacheline. This means that once we
touch rx_buf (which is a preceding step before touching xdp_buff),
xdp_buff will already be hot in cache. Second thing was that xdp_rxq is
used rather rarely and it occupies a separate cacheline, so maybe it is
better to have it at the end of ice_rx_ring.
Another change that affects ice_rx_ring is the introduction of
ice_rx_ring::first_desc. Its purpose is twofold - the first is to propagate
rx_buf->act to all the parts of the current xdp_buff after running the XDP
program, so that ice_put_rx_buf(), which got moved out of the main Rx
processing loop, will be able to take an appropriate action on each
buffer. The second is for ice_construct_skb().
ice_construct_skb() has a copybreak mechanism which had an explicit
impact on the xdp_buff->skb conversion in the new approach when the legacy
Rx flag is toggled. It works in such a way that the linear part is 256
bytes long; if the frame is bigger than that, the remaining bytes go as a
frag to skb_shared_info.
This means that while memcpying frags from the xdp_buff to the newly
allocated skb, care needs to be taken when picking the destination frag
array entry. By the time ice_construct_skb() is called, when dealing with
a fragmented frame, the current rx_buf points to the *last* fragment, but
copybreak needs to be done against the first one. That's where
ice_rx_ring::first_desc helps.
When frame building spans NAPI polls (the DD bit is not set on the
current descriptor and xdp->data is not NULL), the current Rx buffer
handling state can cause problems.
Since calls to ice_put_rx_buf() were pulled out of the main Rx
processing loop and scoped from cached_ntc to the current ntc, remember
that the mentioned function now relies on rx_buf->act, which is set within
ice_run_xdp(). ice_run_xdp() is only called when the EOP bit is found, so
currently we could put an Rx buffer with rx_buf->act *uninitialized*.
To address this, change the scoping to rely on first_desc on both
boundaries instead.
This also implies that cleaned_count, which is used as an input to
ice_alloc_rx_buffers() and tells how many new buffers should be refilled,
has to be adjusted. If it stayed as is, ntc could end up going past ntu.
Therefore, remove cleaned_count altogether and have the allocation
routine use the newly introduced ICE_RX_DESC_UNUSED() macro, an
equivalent of ICE_DESC_UNUSED() dedicated to the Rx side and based on
struct ice_rx_ring::first_desc instead of next_to_clean.
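A sketch of what such a macro can look like, mirroring ICE_DESC_UNUSED()
but anchored at first_desc (the real definition may differ):

    #define ICE_RX_DESC_UNUSED(R)	\
        ((((R)->first_desc > (R)->next_to_use) ? 0 : (R)->count) + \
         (R)->first_desc - (R)->next_to_use - 1)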
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The skb path calculates truesize in three different functions, which could
be avoided, as the xdp_buff already carries the calculated truesize in
xdp_buff::frame_sz. If ice_add_rx_frag() is adjusted to take the
xdp_buff as an input, just like the functions responsible for creating
the sk_buff initially, the codebase can be simplified by removing these
redundant recalculations and relying on xdp_buff::frame_sz instead.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Currently ice_finalize_xdp_rx() is called only when xdp_prog is present
on the VSI, which is a good thing. However, this optimization can be
enhanced to check whether any XDP_TX/XDP_REDIRECT actually took place in
the current Rx processing; a non-zero @xdp_xmit also implies that
xdp_prog is present on the VSI. This way XDP_DROP-based workloads will not
suffer from unnecessary calls to an external function.
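The tightened call site then reduces to something like the following
(argument list assumed):

    /* non-zero xdp_xmit implies xdp_prog is present; skip the call otherwise */
    if (xdp_xmit)
        ice_finalize_xdp_rx(xdp_ring, xdp_xmit, cached_ntc);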
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This should have been used there from day 1; let us address that
before introducing XDP multi-buffer support on the Rx side.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Currently, calls to ice_put_rx_buf() are sprinkled through
ice_clean_rx_irq() - the first is for explicit flow director
descriptor handling, the second is after running the XDP prog, and the
last is after taking care of the skb.
The first call site was actually only there to bump ntc, as the Rx buffer
to be recycled is not even passed to the function.
It is possible to walk through the Rx buffers processed in a particular
NAPI cycle by caching ntc at the beginning of ice_clean_rx_irq().
To do so, let us store the XDP verdict inside ice_rx_buf, so the action to
take on each buffer is known. When the XDP prog is absent, just store
ICE_XDP_PASS as the verdict.
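A sketch of the per-buffer verdict (the call shape of ice_run_xdp() here
is an assumption):

    /* remember what to do with this buffer during the deferred put pass */
    rx_buf->act = xdp_prog ? ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring)
                           : ICE_XDP_PASS;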
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This might in the future be used by the ZC driver and might potentially
yield a minor performance boost. While at it, constify the arguments that
ice_is_non_eop() takes, since they are pointers and this will help the
compiler when generating asm.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The plan is to move ice_put_rx_buf() to the end of ice_clean_rx_irq(), so
in order to keep the ability to walk through HW Rx descriptors, pull
next_to_clean handling out of ice_put_rx_buf().
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This will allow us to avoid carrying an additional auxiliary array of page
counts when dealing with XDP multi-buffer support. Previously, combining
a fragmented frame into an skb was not affected in the same way as XDP is,
since for XDP the whole frame needs to be in place before the XDP prog is
executed. Therefore, when going through HW Rx descriptors one-by-one,
calls to ice_put_rx_buf() need to happen *after* running the XDP prog on a
potentially multi-buffered frame, so some additional storage of the page
count is needed.
Adding the page count to the rx buf makes it easier to walk through the
processed entries at the end of the Rx cleaning routine and decide whether
or not buffers should be recycled.
While at it, bump ice_rx_buf::pagecnt_bias from u16 up to u32. It has been
proven many times that calculations on variables smaller than the standard
register size are harmful. This was also the case during experiments
with embedding the page count in ice_rx_buf - when it was added as a u16
it had a performance impact.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
In preparation for XDP multi-buffer support, let's store the xdp_buff on
the Rx ring struct. This will allow us to combine fragmented frames across
separate NAPI cycles in the same way as skb fragments are currently
handled. This means that the skb pointer on the Rx ring will become
redundant and will be removed. For now it is kept; the layout of the Rx
ring struct was not inspected, as some member movement will be needed
later on, and that will be the time to take care of it.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Rx path is going to be modified in a way that fragmented frame will be
gathered within xdp_buff in the first place. This approach implies that
underlying buffer has to provide tailroom for skb_shared_info. This is
currently the case when ring uses build_skb but not when legacy-rx knob
is turned on. This case configures 2k Rx buffers and has no way to
provide either headroom or tailroom - FWIW it currently has
XDP_PACKET_HEADROOM, which is broken and is removed here. 2k Rx
buffers were used so that the driver in this setting was able to support
9k MTU, as it can chain up to 5 Rx buffers. With an offset configured, the
HW writing 2k of data would cross the half of the page, which broke the
assumption of our internal page recycling tricks.
Now, if the above were fixed and the legacy-rx path left as is, the
packet's contents would be corrupted again when referring to
skb_shared_info via xdp_get_shared_info_from_buff(). Hence the size of the
Rx buffer, and therefore the supported MTU, needs to be lowered. This will
allow us to keep a unified data path, and 8k MTU users (if there are any
on legacy-rx) will still be good to go. However, the tendency is to drop
support for this code path at some point.
Add ICE_RXBUF_1664 as vsi::rx_buf_len and ICE_MAX_FRAME_LEGACY_RX (8320)
as vsi::max_frame for legacy-rx. For bigger page sizes configure 3k Rx
buffers, not 2k.
Since headroom support is removed, disable data_meta support on legacy-rx.
When preparing XDP buff, rely on ice_rx_ring::rx_offset setting when
deciding whether to support data_meta or not.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-01-30
Add fast update encryption key
Jianbo Liu Says:
================
Data encryption keys (DEKs) are the keys used for data encryption and
decryption operations. Starting from version 22.33.0783, firmware is
optimized to accelerate the update of user keys into DEK object in
hardware. Support for bulk allocation and destruction of DEK objects
is added, and the bulk-allocated DEKs are uninitialized, as bulk
creation requires no input key. When offloading encryption/decryption,
the user gets one object from a bulk and updates its key with the new
"modify DEK" command. This command is the same as create DEK object, but
requires no heavy context memory allocation in firmware, which consumes
most of the CPU cycles of the create DEK command.
DEKs are cached internally by the NIC, so invalidating the internal NIC
caches is required before reusing DEKs. The SYNC_CRYPTO command is
added to support this. DEK objects can be reused, and the keys in them
can be updated after this command is executed.
This patchset enhances the key creation and destruction flow to make
use of this new feature. Any user, for example kTLS, IPsec or MACsec,
can use it to offload keys. But only kTLS uses it, as the others don't
need many keys, and caching too many DEKs in the pool is wasteful.
There are two new data structs added:
a. DEK pool. One pool is created for each key type. The bulks of that
type are placed in the pool's different bulk lists, according to the
number of available and in-use DEKs in the bulk.
b. DEK bulk. All DEKs from one bulk allocation are stored here. There
are two bitmaps to indicate the state of each DEK.
New APIs are then added. When a user needs a DEK object:
a. Fetch one bulk with available DEKs from the partial_list or
avail_list; otherwise create a new one.
b. Pick one DEK and set its need_sync and in_use bits to 1.
Move the bulk to full_list if no more keys are available, or put it on
partial_list if the bulk was newly created.
c. Update the DEK object's key with the user key, via the "modify DEK"
command.
d. Return the DEK struct to the user, who then gets the object id and
fills it into the offload commands.
When a user frees a DEK:
a. Set its in_use bit to 0. If all need_sync bits are 1 and all in_use
bits of this bulk are 0, move it to sync_list.
b. If the number of DEKs freed by users exceeds the threshold (128),
schedule a workqueue to do the sync process.
For the sync process, the SYNC_CRYPTO command is executed first. Then,
for each bulk in partial_list, full_list and sync_list, reset the
need_sync bits of the freed DEK objects. If all need_sync bits in one
bulk are zero, move it to avail_list.
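A rough sketch of the two structures described above (mlx5-style naming;
the field layout is an assumption, not the merged definition):

    struct mlx5_crypto_dek_bulk {
        u32 base_obj_id;                /* first DEK object id in the bulk */
        int num_deks;
        struct list_head entry;         /* linked into one of the pool lists */
        unsigned long *need_sync;       /* bitmap: handed out since last sync */
        unsigned long *in_use;          /* bitmap: currently held by a user */
    };

    struct mlx5_crypto_dek_pool {
        u8 key_purpose;                 /* one pool per key type */
        struct mutex lock;
        struct list_head partial_list;  /* bulks with some DEKs available */
        struct list_head full_list;     /* bulks with no DEKs available */
        struct list_head avail_list;    /* bulks with all DEKs available */
        struct list_head sync_list;     /* fully freed, awaiting SYNC_CRYPTO */
        struct work_struct sync_work;   /* scheduled once >128 DEKs are freed */
    };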
We already support a TIS pool to recycle the TISes. With this series
and the TIS pool, TLS CPS performance is improved greatly.
And we tested https on the system:
CPU: dual AMD EPYC 7763 64-Core processors
RAM: 512G
DEV: ConnectX-6 DX, with FW ver 22.33.0838 and TLS_OPTIMISE=true
TLS CPS performance numbers are:
Before: 11k connections/sec
After: 101k connections/sec
================
* tag 'mlx5-updates-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: kTLS, Improve connection rate by using fast update encryption key
net/mlx5: Keep only one bulk of full available DEKs
net/mlx5: Add async garbage collector for DEK bulk
net/mlx5: Reuse DEKs after executing SYNC_CRYPTO command
net/mlx5: Use bulk allocation for fast update encryption key
net/mlx5: Add bulk allocation and modify_dek operation
net/mlx5: Add support SYNC_CRYPTO command
net/mlx5: Add new APIs for fast update encryption key
net/mlx5: Refactor the encryption key creation
net/mlx5: Add const to the key pointer of encryption key creation
net/mlx5: Prepare for fast crypto key update if hardware supports it
net/mlx5: Change key type to key purpose
net/mlx5: Add IFC bits and enums for crypto key
net/mlx5: Add IFC bits for general obj create param
net/mlx5: Header file for crypto
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When memory allocation fails in lynx_pcs_create() and it returns NULL,
there remains a dangling reference to the mdiodev returned by
of_mdio_find_device() which is leaked as soon as memac_pcs_create()
returns empty-handed.
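A sketch of the fix (the exact release helper used in memac_pcs_create()
is an assumption; the point is that the failure path must drop the
reference):

    pcs = lynx_pcs_create(mdiodev);
    if (!pcs)
        put_device(&mdiodev->dev);  /* don't leak the of_mdio_find_device() ref */
    return pcs;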
Fixes: a7c2a32e7f22 ("net: fman: memac: Use lynx pcs driver")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Sean Anderson <[email protected]>
Acked-by: Madalin Bucur <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN: Remove redundant Device Control Error Reporting Enable
Bjorn Helgaas says:
Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"),
the PCI core sets the Device Control bits that enable error reporting for
PCIe devices.
This series removes redundant calls to pci_enable_pcie_error_reporting()
that do the same thing from several NIC drivers.
There are several more drivers where this should be removed; I started with
just the Intel drivers here.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbe: Remove redundant pci_enable_pcie_error_reporting()
igc: Remove redundant pci_enable_pcie_error_reporting()
igb: Remove redundant pci_enable_pcie_error_reporting()
ice: Remove redundant pci_enable_pcie_error_reporting()
iavf: Remove redundant pci_enable_pcie_error_reporting()
i40e: Remove redundant pci_enable_pcie_error_reporting()
fm10k: Remove redundant pci_enable_pcie_error_reporting()
e1000e: Remove redundant pci_enable_pcie_error_reporting()
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There doesn't appear to be a reason to truncate the allocation used for
flow_info, so do a full allocation and remove the unused empty struct.
GCC does not like having a reference to an object that has been
partially allocated, as bounds checking may become impossible when
such an object is passed to other code. Seen with GCC 13:
../drivers/net/ethernet/mediatek/mtk_ppe.c: In function 'mtk_foe_entry_commit_subflow':
../drivers/net/ethernet/mediatek/mtk_ppe.c:623:18: warning: array subscript 'struct mtk_flow_entry[0]' is partly outside array bounds of 'unsigned char[48]' [-Warray-bounds=]
623 | flow_info->l2_data.base_flow = entry;
| ^~
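A sketch of the change (the truncated expression is paraphrased from
mtk_ppe.c):

    /* before: allocation cut short at the l2_data member */
    flow_info = kzalloc(offsetof(struct mtk_flow_entry, l2_data.end),
                        GFP_ATOMIC);

    /* after: a full allocation that GCC can bounds-check */
    flow_info = kzalloc(sizeof(*flow_info), GFP_ATOMIC);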
Cc: Felix Fietkau <[email protected]>
Cc: John Crispin <[email protected]>
Cc: Sean Wang <[email protected]>
Cc: Mark Lee <[email protected]>
Cc: Lorenzo Bianconi <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all
the IRQs for transmit queues then assigning all the IRQs for receive
queues. With multi-threaded processors, in a heavy RX or TX environment,
physical cores would either be overloaded or underutilized (due to the
IRQ assignment algorithm). This approach is sub-optimal because IRQs for
the same subprocess (RX or TX) would be bound to adjacent CPU numbers,
meaning they were more likely to be contending for the same core.
For example, in a system with 64 CPUs and 32 queues, the IRQs would
be bound to CPUs in the following pattern:
IRQ type | CPU number
-----------------------
TX0 | 0-1
TX1 | 2-3
<etc>
RX0 | 32-33
RX1 | 34-35
<etc>
Observe that in SMT-8, the first 4 tx queues would be sharing the
same core.
A more optimal algorithm would balance the number of RX and TX IRQs across
the physical cores. Therefore, to increase performance, distribute RX and
TX IRQs across cores by alternating between assigning IRQs for RX and TX
queues to CPUs.
With a system with 64 CPUs and 32 queues, this results in the following
pattern:
IRQ type | CPU number
-----------------------
TX0 | 0-1
RX0 | 2-3
TX1 | 4-5
RX1 | 6-7
<etc>
Observe that in SMT-8 there is an equal distribution of RX and TX IRQs
per core; in the above case, each core handles 2 TX and 2 RX IRQs.
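A sketch of the alternating assignment (the ibmvnic_* names here are
illustrative, and assign_irq_to_cpu() is a hypothetical helper that binds
one IRQ and returns the next CPU to use):

    static void ibmvnic_distribute_irqs(struct ibmvnic_adapter *adapter)
    {
        unsigned int i, cpu = cpumask_first(cpu_online_mask);
        unsigned int nqueues = max(adapter->num_tx_queues,
                                   adapter->num_rx_queues);

        /* interleave TX and RX so each core services a mix of both */
        for (i = 0; i < nqueues; i++) {
            if (i < adapter->num_tx_queues)
                cpu = assign_irq_to_cpu(adapter->tx_irq[i], cpu);
            if (i < adapter->num_rx_queues)
                cpu = assign_irq_to_cpu(adapter->rx_irq[i], cpu);
        }
    }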
Signed-off-by: Nick Child <[email protected]>
Reviewed-by: Haren Myneni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The VSC7514 target regmap is identical to ones shared with similar
hardware, specifically the VSC7512. Share this resource, and change the
name to match the pattern of other exported resources.
Signed-off-by: Colin Foster <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]> # regression
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Resetting the switch core is the same whether it is done internally or
externally. Move this routine to the ocelot library so it can be used by
other drivers.
Signed-off-by: Colin Foster <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]> # regression
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The vcap_props structure is common to other devices, specifically the
VSC7512 chip that can only be controlled externally. Export this structure
so it doesn't need to be recreated.
Signed-off-by: Colin Foster <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]> # regression
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The ocelot_regfields struct is common between several different chips, some
of which can only be controlled externally. Export this structure so it
doesn't have to be duplicated in these other drivers.
Rename the structure as well, to follow the conventions of other shared
resources.
Signed-off-by: Colin Foster <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Tested-by: Vladimir Oltean <[email protected]> # regression
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Expose ocelot_wm functions so they can be shared with other drivers.
Signed-off-by: Colin Foster <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Tested-by: Vladimir Oltean <[email protected]> # regression
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The function __b44_cam_read() is defined in the b44.c file, but not called
elsewhere, so remove this unused function.
drivers/net/ethernet/broadcom/b44.c:199:20: warning: unused function '__b44_cam_read'.
Reported-by: Abaci Robot <[email protected]>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3858
Signed-off-by: Jiapeng Chong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-01-27 (ice)
This series contains updates to ice driver only.
Dave prevents modifying channels when RDMA is active as this will break
RDMA traffic.
Michal fixes a broken URL.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix broken link in ice NAPI doc
ice: Prevent set_channel from changing queues while RDMA active
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
As the fast DEK update is fully implemented, use it for kTLS to get
better performance.
A TIS pool was already supported to recycle the TISes. With this series
and the TIS pool, TLS CPS improves roughly 9x, from 11k/s to 101k/s.
Signed-off-by: Jianbo Liu <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
One bulk with all its keys available is left undestroyed, to service
possible requests from users quickly.
Signed-off-by: Jianbo Liu <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
After invalidation, an idle bulk, with all its DEKs available for use, is
destroyed to free keys and memory.
To get better performance, the firmware destruction operation is done
asynchronously. So idle bulks are enqueued in destroy_list first, then
destroyed in system workqueue. This will improve performance, as the
destruction doesn't need to hold pool's mutex.
Signed-off-by: Jianbo Liu <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|