Age | Commit message (Collapse) | Author | Files | Lines |
|
Introduce a new generic helper ice_vf_init_host_cfg which performs common
host configuration initialization tasks that will need to be done for both
Single Root IOV and the new Scalable IOV implementation.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Some of the initialization code for Single Root IOV VFs will need to be
reused when we introduce Scalable IOV. Pull this code out into a new
ice_initialize_vf_entry helper function.
Co-developed-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The Single Root IOV implementation of .post_vsi_rebuild performs some tasks
that will ultimately need to be shared with the Scalable IOV implementation
such as rebuilding the host configuration.
Refactor by introducing a new wrapper function, ice_vf_post_vsi_rebuild
which performs the tasks that will be shared between SR-IOV and Scalable
IOV. Move the ice_vf_rebuild_host_cfg and ice_vf_set_initialized calls into
this wrapper. Then call the implementation specific post_vsi_rebuild
handler afterwards.
This ensures that we will properly re-initialize filters and expected
settings for both SR-IOV and Scalable IOV.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_vf_vsi_release function will be used in a future change to
refactor the .vsi_rebuild function. Move this over to ice_vf_lib.c so
that it can be used there.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_vsi_alloc and ice_vsi_cfg functions are used together to allocate
and configure a new VSI, called as part of the ice_vsi_setup function.
In the future with the addition of the subfunction code the ice driver
will want to be able to allocate a VSI while delaying the configuration to
a later point of the port activation.
Currently this requires that the port code know what type of VSI should
be allocated. This is required because ice_vsi_alloc assigns the VSI type.
Refactor the ice_vsi_alloc and ice_vsi_cfg functions so that VSI type
assignment isn't done until the configuration stage. This will allow the
devlink port addition logic to reserve a VSI as early as possible before
the type of the port is known. In this way, the port add can fail in the
event that all hardware VSI resources are exhausted.
Since the ice_vsi_cfg function already takes the ice_vsi_cfg_params
structure, this is relatively straight forward.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_vsi_setup function, ice_vsi_alloc, and ice_vsi_cfg functions have
grown a large number of parameters. These parameters are used to initialize
a new VSI, as well as re-configure an existing VSI
Any time we want to add a new parameter to this function chain, even if it
will usually be unset, we have to change many call sites due to changing
the function signature.
A future change is going to refactor ice_vsi_alloc and ice_vsi_cfg to move
the VSI configuration and initialization all into ice_vsi_cfg.
Before this, refactor the VSI setup flow to use a new ice_vsi_cfg_params
structure. This will contain the configuration (mainly pointers) used to
initialize a VSI.
Pass this from ice_vsi_setup into the related functions such as
ice_vsi_alloc, ice_vsi_cfg, and ice_vsi_cfg_def.
Introduce a helper, ice_vsi_to_params to convert an existing VSI to the
parameters used to initialize it. This will aid in the flows where we
rebuild an existing VSI.
Since we also pass the ICE_VSI_FLAG_INIT to more functions which do not
need (or cannot yet have) the VSI parameters, lets make this clear by
renaming the function parameter to vsi_flags and using a u32 instead of a
signed integer. The name vsi_flags also makes it clear that we may extend
the flags in the future.
This change will make it easier to refactor the setup flow in the future,
and will reduce the complexity required to add a new parameter for
configuration in the future.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The vsi->vf pointer gets assigned early on during ice_vsi_alloc. Several
functions currently take a VF pointer, but they can just use the existing
vsi->vf pointer as needed. Modify these functions to drop the unnecessary
VF parameter.
Note that ice_vsi_cfg is not changed as a following change will refactor so
that the VF pointer is assigned during ice_vsi_cfg rather than
ice_vsi_alloc.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Since commit 1d2e32275de7 ("ice: split ice_vsi_setup into smaller
functions") ice_vsi_alloc has not been responsible for all of the behavior
implied by the comment for ice_vsi_setup_vector_base.
Fix the comment to refer to the new function ice_vsi_alloc_def().
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Extend the usage of function ice_get_vf_vsi(vf) in multiple places
instead of VF's VSI by using a long string of dereferences
(i.e. vf->pf->vsi[vf->lan_vsi_idx]).
Signed-off-by: Brett Creeley <[email protected]>
Signed-off-by: Kalyan Kodamagula <[email protected]>
Tested-by: Piotr Tyda <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
There are 2 classes of in-tree drivers currently:
- those who act upon struct tc_taprio_sched_entry :: gate_mask as if it
holds a bit mask of TXQs
- those who act upon the gate_mask as if it holds a bit mask of TCs
When it comes to the standard, IEEE 802.1Q-2018 does say this in the
second paragraph of section 8.6.8.4 Enhancements for scheduled traffic:
| A gate control list associated with each Port contains an ordered list
| of gate operations. Each gate operation changes the transmission gate
| state for the gate associated with each of the Port's traffic class
| queues and allows associated control operations to be scheduled.
In typically obtuse language, it refers to a "traffic class queue"
rather than a "traffic class" or a "queue". But careful reading of
802.1Q clarifies that "traffic class" and "queue" are in fact
synonymous (see 8.6.6 Queuing frames):
| A queue in this context is not necessarily a single FIFO data structure.
| A queue is a record of all frames of a given traffic class awaiting
| transmission on a given Bridge Port. The structure of this record is not
| specified.
i.o.w. their definition of "queue" isn't the Linux TX queue.
The gate_mask really is input into taprio via its UAPI as a mask of
traffic classes, but taprio_sched_to_offload() converts it into a TXQ
mask.
The breakdown of drivers which handle TC_SETUP_QDISC_TAPRIO is:
- hellcreek, felix, sja1105: these are DSA switches, it's not even very
clear what TXQs correspond to, other than purely software constructs.
Only the mqprio configuration with 8 TCs and 1 TXQ per TC makes sense.
So it's fine to convert these to a gate mask per TC.
- enetc: I have the hardware and can confirm that the gate mask is per
TC, and affects all TXQs (BD rings) configured for that priority.
- igc: in igc_save_qbv_schedule(), the gate_mask is clearly interpreted
to be per-TXQ.
- tsnep: Gerhard Engleder clarifies that even though this hardware
supports at most 1 TXQ per TC, the TXQ indices may be different from
the TC values themselves, and it is the TXQ indices that matter to
this hardware. So keep it per-TXQ as well.
- stmmac: I have a GMAC datasheet, and in the EST section it does
specify that the gate events are per TXQ rather than per TC.
- lan966x: again, this is a switch, and while not a DSA one, the way in
which it implements lan966x_mqprio_add() - by only allowing num_tc ==
NUM_PRIO_QUEUES (8) - makes it clear to me that TXQs are a purely
software construct here as well. They seem to map 1:1 with TCs.
- am65_cpsw: from looking at am65_cpsw_est_set_sched_cmds(), I get the
impression that the fetch_allow variable is treated like a prio_mask.
This definitely sounds closer to a per-TC gate mask rather than a
per-TXQ one, and TI documentation does seem to recomment an identity
mapping between TCs and TXQs. However, Roger Quadros would like to do
some testing before making changes, so I'm leaving this driver to
operate as it did before, for now. Link with more details at the end.
Based on this breakdown, we have 5 drivers with a gate mask per TC and
4 with a gate mask per TXQ. So let's make the gate mask per TXQ the
opt-in and the gate mask per TC the default.
Benefit from the TC_QUERY_CAPS feature that Jakub suggested we add, and
query the device driver before calling the proper ndo_setup_tc(), and
figure out if it expects one or the other format.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/#25193204
Cc: Horatiu Vultur <[email protected]>
Cc: Siddharth Vadapalli <[email protected]>
Cc: Roger Quadros <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Acked-by: Kurt Kanzenbach <[email protected]> # hellcreek
Reviewed-by: Gerhard Engleder <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Since mqprio is a scheduler and not a classifier, move its offload
structure to pkt_sched.h, where struct tc_taprio_qopt_offload also lies.
Also update some header inclusions in drivers that access this
structure, to the best of my abilities.
Cc: Igor Russkikh <[email protected]>
Cc: Yisen Zhuang <[email protected]>
Cc: Salil Mehta <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: Thomas Petazzoni <[email protected]>
Cc: Saeed Mahameed <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Horatiu Vultur <[email protected]>
Cc: Lars Povlsen <[email protected]>
Cc: Steen Hegelund <[email protected]>
Cc: Daniel Machon <[email protected]>
Cc: [email protected]
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Call ice_unload() and ice_load() in driver reinit flow.
Block reinit when switchdev, ADQ or SRIOV is active. In reload path we
don't want to rebuild all features. Ask user to remove them instead of
quitely removing it in reload path.
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
ice_vsi_cfg() is called from different contexts:
1) VSI exsist in HW, but it is reconfigured, because of changing queues
for example -> update instead of init should be used
2) VSI doesn't exsist, because rest has happened -> init command should
be sent
To support both cases pass boolean value which will store information
what type of command has to be sent to HW.
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In deconfig VSI shouldn't be deleted from hw.
Rewrite VSI delete function to reflect that sometimes it is only needed
to remove VSI from hw without freeing the memory:
ice_vsi_delete() -> delete from HW and free memory
ice_vsi_delete_from_hw() -> delete only from HW
Value returned from ice_vsi_free() is never used. Change return type to
void.
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In driver reload path the netdev isn't removed, but VSI is. Remove
filters on netdev right after removing them on VSI.
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Part of code from probe can be reused in reload flow. Move this code to
separate function. Create unroll functions for each part of
initialization, like: ice_init_dev() and ice_deinit_dev(). It
simplifies unrolling and can be used in remove flow.
Avoid freeing port info as it could be reused in reload path.
Will be freed in remove path since is allocated via devm_kzalloc().
Also clean the remove path to reflect the init steps.
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
When allocating the ICE_VSI_CTRL, the allocated struct ice_vsi pointer is
stored into the PF's pf->vsi array at a fixed location. This was
historically done on the basis that it could provide an O(1) lookup for the
special control VSI.
Since we store the ctrl_vsi_idx, we already have O(1) lookup regardless of
where in the array we store this VSI.
Simplify the logic in ice_vsi_alloc by using the same method of storing the
control VSI as other types of VSIs.
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Main goal is to reuse the same functions in VSI config and rebuild
paths.
To do this split ice_vsi_setup into smaller pieces and reuse it during
rebuild.
ice_vsi_alloc() should only alloc memory, not set the default values
for VSI.
Move setting defaults to separate function. This will allow config of
already allocated VSI, for example in reload path.
The path is mostly moving code around without introducing new
functionality. Functions ice_vsi_cfg() and ice_vsi_decfg() were
added, but they are using code that already exist.
Use flag to pass information about VSI initialization during rebuild
instead of using boolean value.
Co-developed-by: Jacob Keller <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Do few small cleanups:
1) Rename the function to reflect that it doesn't configure all things
related to VSI. ice_vsi_cfg_lan() better fits to what function is doing.
ice_vsi_cfg() can be use to name function that will configure whole VSI.
2) Remove unused ethtype field from VSI. There is no need to set
ethtype here, because it is never used.
3) Remove unnecessary check for ICE_VSI_CHNL. There is check for
ICE_VSI_CHNL in ice_vsi_get_qs, so there is no need to check it before
calling the function.
4) Simplify ice_vsi_alloc() call. There is no need to check the type of
VSI before calling ice_vsi_alloc(). For ICE_VSI_CHNL vf is always NULL
(ice_vsi_setup() is called with vf=NULL).
For ICE_VSI_VF or ICE_VSI_CTRL ch is always NULL and for other VSI types
ch and vf are always NULL.
5) Remove unnecessary call to ice_vsi_dis_irq(). ice_vsi_dis_irq() will
be called in ice_vsi_close() flow (ice_vsi_close() -> ice_vsi_down() ->
ice_vsi_dis_irq()). Remove unnecessary call.
6) Don't remove specific filters in release. All hw filters are removed
in ice_fltr_remove_alli(), which is always called in VSI release flow.
There is no need to remove only ethertype filters before calling
ice_fltr_remove_all().
7) Rename ice_vsi_clear() to ice_vsi_free(). As ice_vsi_clear() only
free memory allocated in ice_vsi_alloc() rename it to ice_vsi_free()
which better shows what function is doing.
8) Free coalesce param in rebuild. There is potential memory leak if
configuration of VSI lan fails. Free coalesce to avoid it.
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Use xa_array instead of deprecated ida to alloc id for RDMA aux driver.
Signed-off-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Simplify probe flow by moving all RDMA related code to ice_init_rdma().
Unroll irq allocation if RDMA initialization fails.
Implement ice_deinit_rdma() and use it in remove flow.
Signed-off-by: Michal Swiatkowski <[email protected]>
Acked-by: Dave Ertman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
A summary of the flags being set for various drivers is given below.
Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features
that can be turned off and on at runtime. This means that these flags
may be set and unset under RTNL lock protection by the driver. Hence,
READ_ONCE must be used by code loading the flag value.
Also, these flags are not used for synchronization against the availability
of XDP resources on a device. It is merely a hint, and hence the read
may race with the actual teardown of XDP resources on the device. This
may change in the future, e.g. operations taking a reference on the XDP
resources of the driver, and in turn inhibiting turning off this flag.
However, for now, it can only be used as a hint to check whether device
supports becoming a redirection target.
Turn 'hw-offload' feature flag on for:
- netronome (nfp)
- netdevsim.
Turn 'native' and 'zerocopy' features flags on for:
- intel (i40e, ice, ixgbe, igc)
- mellanox (mlx5).
- stmmac
- netronome (nfp)
Turn 'native' features flags on for:
- amazon (ena)
- broadcom (bnxt)
- freescale (dpaa, dpaa2, enetc)
- funeth
- intel (igb)
- marvell (mvneta, mvpp2, octeontx2)
- mellanox (mlx4)
- mtk_eth_soc
- qlogic (qede)
- sfc
- socionext (netsec)
- ti (cpsw)
- tap
- tsnep
- veth
- xen
- virtio_net.
Turn 'basic' (tx, pass, aborted and drop) features flags on for:
- netronome (nfp)
- cavium (thunder)
- hyperv.
Turn 'redirect_target' feature flag on for:
- amanzon (ena)
- broadcom (bnxt)
- freescale (dpaa, dpaa2)
- intel (i40e, ice, igb, ixgbe)
- ti (cpsw)
- marvell (mvneta, mvpp2)
- sfc
- socionext (netsec)
- qlogic (qede)
- mellanox (mlx5)
- tap
- veth
- virtio_net
- xen
Reviewed-by: Gerhard Engleder <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Co-developed-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Co-developed-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Marek Majtyka <[email protected]>
Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
net/core/gro.c
7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO")
b1a78b9b9886 ("net: add support for ipv4 big tcp")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
clang static analysis reports
drivers/net/ethernet/intel/igc/igc_ptp.c:673:3: warning: The left operand of
'+' is a garbage value [core.UndefinedBinaryOperatorResult]
ktime_add_ns(shhwtstamps.hwtstamp, adjust);
^ ~~~~~~~~~~~~~~~~~~~~
igc_ptp_systim_to_hwtstamp() silently returns without setting the hwtstamp
if the mac type is unknown. This should be treated as an error.
Fixes: 81b055205e8b ("igc: Add support for RX timestamping")
Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Sasha Neftin <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Let us store pointer to xdp_buff that came from xsk_buff_pool on tx_buf
so that it will be possible to recycle it via xsk_buff_free() on Tx
cleaning side. This way it is not necessary to do expensive copy to
another xdp_buff backed by a newly allocated page.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Now that both ZC and standard XDP data paths stopped using Tx logic
based on next_dd and next_rs fields, we can safely remove these fields
and shrink Tx ring structure.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Similarly as for Rx side in previous patch, logic on XDP Tx in ice
driver needs to be adjusted for multi-buffer support. Specifically, the
way how HW Tx descriptors are produced and cleaned.
Currently, XDP_TX works on strict ring boundaries, meaning it sets RS
bit (on producer side) / looks up DD bit (on consumer/cleaning side)
every quarter of the ring. It means that if for example multi buffer
frame would span across the ring quarter boundary (say that frame
consists of 4 frames and we start from 62 descriptor where ring is sized
to 256 entries), RS bit would be produced in the middle of multi buffer
frame, which would be a broken behavior as it needs to be set on the
last descriptor of the frame.
To make it work, set RS bit at the last descriptor from the batch of
frames that XDP_TX action was used on and make the first entry remember
the index of last descriptor with RS bit set. This way, cleaning side
can take the index of descriptor with RS bit, look up DD bit's presence
and clean from first entry to last.
In order to clean up the code base introduce the common ice_set_rs_bit()
which will return index of descriptor that got RS bit produced on so
that standard driver can store this within proper ice_tx_buf and ZC
driver can simply ignore return value.
Co-developed-by: Martyna Szapar-Mudlaw <[email protected]>
Signed-off-by: Martyna Szapar-Mudlaw <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Ice driver needs to be a bit reworked on Rx data path in order to
support multi-buffer XDP. For skb path, it currently works in a way that
Rx ring carries pointer to skb so if driver didn't manage to combine
fragmented frame at current NAPI instance, it can restore the state on
next instance and keep looking for last fragment (so descriptor with EOP
bit set). What needs to be achieved is that xdp_buff needs to be
combined in such way (linear + frags part) in the first place. Then skb
will be ready to go in case of XDP_PASS or BPF program being not present
on interface. If BPF program is there, it would work on multi-buffer
XDP. At this point xdp_buff resides directly on Rx ring, so given the
fact that skb will be built straight from xdp_buff, there will be no
further need to carry skb on Rx ring.
Besides removing skb pointer from Rx ring, lots of members have been
moved around within ice_rx_ring. First and foremost reason was to place
rx_buf with xdp_buff on the same cacheline. This means that once we
touch rx_buf (which is a preceding step before touching xdp_buff),
xdp_buff will already be hot in cache. Second thing was that xdp_rxq is
used rather rarely and it occupies a separate cacheline, so maybe it is
better to have it at the end of ice_rx_ring.
Other change that affects ice_rx_ring is the introduction of
ice_rx_ring::first_desc. Its purpose is twofold - first is to propagate
rx_buf->act to all the parts of current xdp_buff after running XDP
program, so that ice_put_rx_buf() that got moved out of the main Rx
processing loop will be able to tak an appriopriate action on each
buffer. Second is for ice_construct_skb().
ice_construct_skb() has a copybreak mechanism which had an explicit
impact on xdp_buff->skb conversion in the new approach when legacy Rx
flag is toggled. It works in a way that linear part is 256 bytes long,
if frame is bigger than that, remaining bytes are going as a frag to
skb_shared_info.
This means while memcpying frags from xdp_buff to newly allocated skb,
care needs to be taken when picking the destination frag array entry.
Upon the time ice_construct_skb() is called, when dealing with
fragmented frame, current rx_buf points to the *last* fragment, but
copybreak needs to be done against the first one. That's where
ice_rx_ring::first_desc helps.
When frame building spans across NAPI polls (DD bit is not set on
current descriptor and xdp->data is not NULL) with current Rx buffer
handling state there might be some problems.
Since calls to ice_put_rx_buf() were pulled out of the main Rx
processing loop and were scoped from cached_ntc to current ntc, remember
that now mentioned function relies on rx_buf->act, which is set within
ice_run_xdp(). ice_run_xdp() is called when EOP bit was found, so
currently we could put Rx buffer with rx_buf->act being *uninitialized*.
To address this, change scoping to rely on first_desc on both boundaries
instead.
This also implies that cleaned_count which is used as an input to
ice_alloc_rx_buffers() and tells how many new buffers should be refilled
has to be adjusted. If it stayed as is, what could happen is a case
where ntc would go over ntu.
Therefore, remove cleaned_count altogether and use against allocing
routine newly introduced ICE_RX_DESC_UNUSED() macro which is an
equivalent of ICE_DESC_UNUSED() dedicated for Rx side and based on
struct ice_rx_ring::first_desc instead of next_to_clean.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
SKB path calculates truesize on three different functions, which could
be avoided as xdp_buff carries the already calculated truesize under
xdp_buff::frame_sz. If ice_add_rx_frag() is adjusted to take the
xdp_buff as an input just like functions responsible for creating
sk_buff initially, codebase could be simplified by removing these
redundant recalculations and rely on xdp_buff::frame_sz instead.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Currently ice_finalize_xdp_rx() is called only when xdp_prog is present
on VSI, which is a good thing. However, this optimization can be
enhanced and check only if any of the XDP_TX/XDP_REDIRECT took place in
current Rx processing. Non-zero value of @xdp_xmit indicates that
xdp_prog is present on VSI. This way XDP_DROP-based workloads will not
suffer from unnecessary calls to external function.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This should have been used in there from day 1, let us address that
before introducing XDP multi-buffer support for Rx side.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Currently calls to ice_put_rx_buf() are sprinkled through
ice_clean_rx_irq() - first place is for explicit flow director's
descriptor handling, second is after running XDP prog and the last one
is after taking care of skb.
1st callsite was actually only for ntc bump purpose, as Rx buffer to be
recycled is not even passed to a function.
It is possible to walk through Rx buffers processed in particular NAPI
cycle by caching ntc from beginning of the ice_clean_rx_irq().
To do so, let us store XDP verdict inside ice_rx_buf, so action we need
to take on will be known. For XDP prog absence, just store ICE_XDP_PASS
as a verdict.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This might be in future used by ZC driver and might potentially yield a
minor performance boost. While at it, constify arguments that
ice_is_non_eop() takes, since they are pointers and this will help compiler
while generating asm.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Plan is to move ice_put_rx_buf() to the end of ice_clean_rx_irq() so
in order to keep the ability of walking through HW Rx descriptors, pull
out next_to_clean handling out of ice_put_rx_buf().
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This will allow us to avoid carrying additional auxiliary array of page
counts when dealing with XDP multi buffer support. Previously combining
fragmented frame to skb was not affected in the same way as XDP would be
as whole frame is needed to be in place before executing XDP prog.
Therefore, when going through HW Rx descriptors one-by-one, calls to
ice_put_rx_buf() need to be taken *after* running XDP prog on a
potentially multi buffered frame, so some additional storage of
page count is needed.
By adding page count to rx buf, it will make it easier to walk through
processed entries at the end of rx cleaning routine and decide whether
or not buffers should be recycled.
While at it, bump ice_rx_buf::pagecnt_bias from u16 up to u32. It was
proven many times that calculations on variables smaller than standard
register size are harmful. This was also the case during experiments
with embedding page count to ice_rx_buf - when this was added as u16 it
had a performance impact.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
In preparation for XDP multi-buffer support, let's store xdp_buff on
Rx ring struct. This will allow us to combine fragmented frames across
separate NAPI cycles in the same way as currently skb fragments are
handled. This means that skb pointer on Rx ring will become redundant
and will be removed. For now it is kept and layout of Rx ring struct was
not inspected, some member movement will be needed later on so that will
be the time to take care of it.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Rx path is going to be modified in a way that fragmented frame will be
gathered within xdp_buff in the first place. This approach implies that
underlying buffer has to provide tailroom for skb_shared_info. This is
currently the case when ring uses build_skb but not when legacy-rx knob
is turned on. This case configures 2k Rx buffers and has no way to
provide either headroom or tailroom - FWIW it currently has
XDP_PACKET_HEADROOM which is broken and in here it is removed. 2k Rx
buffers were used so driver in this setting was able to support 9k MTU
as it can chain up to 5 Rx buffers. With offset configuring HW writing
2k of a data was passing the half of the page which broke the assumption
of our internal page recycling tricks.
Now if above got fixed and legacy-rx path would be left as is, when
referring to skb_shared_info via xdp_get_shared_info_from_buff(),
packet's content would be corrupted again. Hence size of Rx buffer needs
to be lowered and therefore supported MTU. This operation will allow us
to keep the unified data path and with 8k MTU users (if any of
legacy-rx) would still be good to go. However, tendency is to drop the
support for this code path at some point.
Add ICE_RXBUF_1664 as vsi::rx_buf_len and ICE_MAX_FRAME_LEGACY_RX (8320)
as vsi::max_frame for legacy-rx. For bigger page sizes configure 3k Rx
buffers, not 2k.
Since headroom support is removed, disable data_meta support on legacy-rx.
When preparing XDP buff, rely on ice_rx_ring::rx_offset setting when
deciding whether to support data_meta or not.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Tony Nguyen <[email protected]>
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Devlink features were introduced to disallow devlink reload calls of
userspace before the devlink was fully initialized. The reason for this
workaround was the fact that devlink reload was originally called
without devlink instance lock held.
However, with recent changes that converted devlink reload to be
performed under devlink instance lock, this is redundant so remove
devlink features entirely.
Note that mlx5 used this to enable devlink reload conditionally only
when device didn't act as multi port slave. Move the multi port check
into mlx5_devlink_reload_down() callback alongside with the other
checks preventing the device from reload in certain states.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Conflicts:
drivers/net/ethernet/intel/ice/ice_main.c
418e53401e47 ("ice: move devlink port creation/deletion")
643ef23bd9dd ("ice: Introduce local var for readability")
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
drivers/net/ethernet/engleder/tsnep_main.c
3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues")
25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/[email protected]/
net/netfilter/nf_conntrack_proto_sctp.c
13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths")
f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The PF controls the set of queues that the RDMA auxiliary_driver requests
resources from. The set_channel command will alter that pool and trigger a
reconfiguration of the VSI, which breaks RDMA functionality.
Prevent set_channel from executing when RDMA driver bound to auxiliary
device.
Adding a locked variable to pass down the call chain to avoid double
locking the device_lock.
Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
devlink_param_driverinit_value_set() call makes sense only for
"driverinit" params. However here, both params are "runtime".
devlink_param_driverinit_value_set() returns -EOPNOTSUPP in such case
and does not do anything. So remove the pointless calls to
devlink_param_driverinit_value_set() entirely.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
virtchnl: update and refactor
Jesse Brandeburg says:
The virtchnl.h file is used by i40e/ice physical function (PF) drivers
and irdma when talking to the iavf driver. This series cleans up the
header file by removing unused elements, adding/cleaning some comments,
fixing the data structures so they are explicitly defined, including
padding, and finally does a long overdue rename of the IWARP members in
the structures to RDMA, since the ice driver and it's associated Intel
Ethernet E800 series adapters support both RDMA and IWARP.
The whole series should result in no functional change, but hopefully
clearer code.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
virtchnl: i40e/iavf: rename iwarp to rdma
virtchnl: do structure hardening
virtchnl: update header and increase header clarity
virtchnl: remove unused structure declaration
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|