|
Time synchronization was not properly enabled on non-MSI-X platforms.
Fixes: 2c344ae24501 ("igc: Add support for TX timestamping")
Signed-off-by: James McLaughlin <[email protected]>
Reviewed-by: Vinicius Costa Gomes <[email protected]>
Tested-by: Nechama Kraus <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
It was reported that when PCIe PTM is enabled, some lockups could
be observed with some integrated i225-V models.
While the issue is being investigated, we can disable crosstimestamping for
those models and see no loss of functionality, because those models
don't have any support for time synchronization.
Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
Link: https://lore.kernel.org/all/[email protected]/
Reported-by: Stefan Dietrich <[email protected]>
Signed-off-by: Vinicius Costa Gomes <[email protected]>
Tested-by: Nechama Kraus <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
ixgbevf driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
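For illustration, a minimal sketch of what the switch looks like in the Rx
"build skb" helpers of these drivers (the wrapper name here is hypothetical;
the per-driver helpers differ, but the one-line change is the same):

static struct sk_buff *intel_rx_build_skb(struct xdp_buff *xdp,
					  unsigned int truesize)
{
	struct sk_buff *skb;

	/* Before: skb = build_skb(xdp->data_hard_start, truesize); */
	skb = napi_build_skb(xdp->data_hard_start, truesize);
	if (unlikely(!skb))
		return NULL;

	/* Same layout handling as before, only the skbuff_head now comes
	 * from the per-CPU NAPI cache warmed by napi_consume_skb(). */
	skb_reserve(skb, xdp->data - xdp->data_hard_start);
	__skb_put(skb, xdp->data_end - xdp->data);

	return skb;
}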
|
|
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
ixgbe driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
igc driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Nechama Kraus <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
igb driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
ice driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
iavf driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
i40e driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx element.
e1000 driver runs Tx completion polling cycle right before the Rx
one. Now that e1000 uses napi_consume_skb() to put skbuff_heads of
completed entries into the cache, it will never empty and always
warm at that moment. Switch to the napi_build_skb() to relax mm
pressure on heavy Rx and increase throughput.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In order to take the best from per-cpu NAPI skbuff_head caches and
CPU cycles, let's switch from dev_kfree_skb_any(), which passes skb
back to the mm layer, to napi_consume_skb(), which feeds those
caches on non-zero budget instead (falls back to the former on 0).
Do the replacement in e1000_unmap_and_free_tx_resource(). There are
4 call sites of this function throughout the driver:
* e1000_clean_tx_ring(). Slowpath, process context, cleans the
whole Tx ring on ifdown. Use budget of 0 here;
* e1000_tx_map(). Hotpath, net Tx softirq, unmaps the buffers in
case of error. Use 0 as well;
* e1000_clean_tx_irq(). Hotpath, NAPI Tx completion polling cycle.
As the driver doesn't count completed Tx entries towards the NAPI
budget, just use the poll budget of 64 to utilize caches.
Apart from being a preparation for switching to napi_build_skb(),
this is useful on its own as well, as napi_consume_skb() flushes
skb caches by batches of 32 instead of one-at-a-time.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
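A hedged sketch of the pattern described above (struct and field names are
abbreviated; treat the exact signature as an assumption):

/* The unmap/free helper gains a budget argument and forwards it to
 * napi_consume_skb(): non-zero budget frees into the per-CPU NAPI cache,
 * budget == 0 falls back to dev_kfree_skb_any() behaviour. */
static void e1000_unmap_and_free_tx_resource(struct e1000_adapter *adapter,
					     struct e1000_tx_buffer *buffer_info,
					     int budget)
{
	/* ... DMA unmapping elided ... */
	if (buffer_info->skb) {
		napi_consume_skb(buffer_info->skb, budget);
		buffer_info->skb = NULL;
	}
}

/* Call sites then pass:
 *   0  from e1000_clean_tx_ring() and e1000_tx_map() (ifdown / error paths)
 *   64 from e1000_clean_tx_irq() (the NAPI poll budget)
 */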
|
|
include/net/sock.h
commit 8f905c0e7354 ("inet: fully convert sk->sk_rx_dst to RCU rules")
commit 43f51df41729 ("net: move early demux fields close to sk_refcnt")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Fix an odd indent where some code was left indented, which causes smatch
to warn:
ice_log_pkg_init() warn: inconsistent indenting
While here, for consistency, add a break after the default case.
This commit has a Fixes: but we caught this while it was only in net-next.
Fixes: 247dd97d713c ("ice: Refactor status flow for DDP load")
Signed-off-by: Jesse Brandeburg <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-12-21
This series contains updates to ice driver only.
Karol modifies the reset flow to correct issues with PTP reset.
Jake extends PTP support for E822 based devices. This includes a few
cleanup patches, that fix some minor issues. In addition, there are some
slight refactors to ease the addition of E822 support, followed by adding
the new hardware implementation ice_ptp_hw.c.
There are a few major differences with E822 support compared to E810
support:
*) The E822 device has a Clock Generation Unit which must be initialized in
order to generate proper clock frequencies on the output that drives the PTP
hardware clock registers.
*) The E822 PHY is a bit different and requires a more complex
initialization procedure which must be rerun any time the link configuration
changes.
*) The E822 devices support enhanced timestamp calibration by making use of
a process called Vernier offset measurement. This allows the hardware to
measure phase offset related to the PHY clocks for Serdes and FEC, reducing
the inaccuracy of the timestamp relative to the actual packet transmission
and receipt. Making use of this requires data gathered from the first
transmitted and received packets, and waiting for the PHY to complete the
calibration measurements. This is done as part of a new kthread, ov_work.
Note that to avoid delay in enabling timestamps, we start the PHY in
'bypass' mode which allows timestamps to be captured without the Vernier
calibration measurement. Once the first packets have been sent and received,
we then complete the calibration setup and exit bypass mode and begin using
the more precise timestamps. According to the datasheet, timestamps without
calibration data can be incorrect relative to actual receipt or transmission
by up to 1 clock cycle (~1.25 nanoseconds), while calibrated timestamps
should be correct to within 1/8th of a clock cycle (~0.15 nanoseconds).
*) E822 devices support crosstimestamping via PCIe PTM, which we enable when
available on the platform.
There is a fair amount of logic required to perform PHY and CGU
initialization, which is the vast majority of the new code, but it is fairly
self-contained within ice_ptp_hw.c, with the exception of monitoring for
offset validity being handled by a kthread.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: support crosstimestamping on E822 devices if supported
ice: exit bypass mode once hardware finishes timestamp calibration
ice: ensure the hardware Clock Generation Unit is configured
ice: implement basic E822 PTP support
ice: convert clk_freq capability into time_ref
ice: introduce ice_ptp_init_phc function
ice: use 'int err' instead of 'int status' in ice_ptp_hw.c
ice: PTP: move setting of tstamp_config
ice: introduce ice_base_incval function
ice: Fix E810 PTP reset flow
====================
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Delete the redundant word 'by'.
Signed-off-by: Xiang wangx <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Refactoring "PF still resetting" message, because previous version looked
like a bug - it informed about changes that worked as designed but might
confuse users. Changes requested to make message more user-friendly.
Signed-off-by: Karen Sornek <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The variable used for the return status in the `igb_write_xmdio_reg()`
function is never changed, and the function just needs to return 0. Thus,
`ret_val` can be removed and 0 returned directly at the end of
`igb_write_xmdio_reg()`.
Signed-off-by: Jason Wang <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The 'MII_CR_FULL_DUPLEX' define is not in use. This patch tidies up the
obsolete define.
Signed-off-by: Sasha Neftin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
'IGC_CTRL_EXT_LINK_MODE_MASK' is not in use. This patch tidies up the
obsolete define.
Signed-off-by: Sasha Neftin <[email protected]>
Tested-by: Nechama Kraus <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
i225 devices use only the SPI NVM type. This patch tidies up the
obsolete NVM types.
Signed-off-by: Sasha Neftin <[email protected]>
Tested-by: Nechama Kraus <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The _phy_none type is not in use. Clean up the code accordingly
and get rid of the unused enum entry.
Signed-off-by: Sasha Neftin <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Nechama Kraus <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
_I_PHY_ID is not in use. Clean up the code accordingly
and get rid of the unused define.
Signed-off-by: Sasha Neftin <[email protected]>
Tested-by: Nechama Kraus <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
E822 devices on supported platforms can generate a cross timestamp
between the platform ART and the device time. This process allows for
very precise measurement of the difference between the PTP hardware
clock and the platform time.
This is only supported if we know the TSC frequency relative to ART, so
we do not enable this unless the boot CPU has a known TSC frequency (as
required by convert_art_ns_to_tsc).
Because PCIe PTM support is not available on all platforms, introduce
CONFIG_ICE_HWTS and make it depend on X86 where we know the support
exists.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
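As a rough sketch of how the cross-timestamp hooks into the core
(ice_ptp_get_syncdevicetime is assumed to be the driver callback that
latches device time together with ART; the container_of lookup is likewise
an assumption):

#if IS_ENABLED(CONFIG_ICE_HWTS)
static int ice_ptp_getcrosststamp_e822(struct ptp_clock_info *info,
				       struct system_device_crosststamp *cts)
{
	struct ice_pf *pf = container_of(info, struct ice_pf, ptp.info);

	/* get_device_system_crosststamp() pairs the device time returned by
	 * the callback with system time derived from ART via the known TSC
	 * frequency (convert_art_ns_to_tsc). */
	return get_device_system_crosststamp(ice_ptp_get_syncdevicetime,
					     pf, NULL, cts);
}
#endif /* CONFIG_ICE_HWTS */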
|
|
Once the E822 device has sent and received one packet, the hardware
computes the internal delay of the PHY using a process known as Vernier
calibration. This calibration calculates a more accurate offset for the
Tx and Rx timestamps. To make use of this offset, we need to exit the
bypass mode. This cannot be done until the PHY has completed offset
calibration, as indicated by the offset valid bits.
To handle this, introduce a kthread work item which will poll the offset
valid bits every few milliseconds seeing if it is safe to exit bypass
mode.
Once we have finished calibrating the offsets, we can program the total
Tx and Rx offset registers and turn off the bypass bit. This allows the
hardware to include the more precise Vernier calibration offset and
improves the timestamp precision.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
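A minimal sketch of the polling work item, assuming an
ice_phy_exit_bypass_e822() helper that only succeeds once the offset-valid
bits are set (the lookups and the 10 ms poll interval are illustrative):

static void ice_ptp_wait_for_offset_valid(struct kthread_work *work)
{
	struct ice_ptp_port *port = container_of(work, struct ice_ptp_port,
						 ov_work.work);
	struct ice_pf *pf = container_of(port, struct ice_pf, ptp.port);

	/* Program total Tx/Rx offsets and clear the bypass bit; this fails
	 * until the PHY reports its offset-valid bits. */
	if (ice_phy_exit_bypass_e822(&pf->hw, port->port_num)) {
		/* Not calibrated yet, poll again shortly. */
		kthread_queue_delayed_work(pf->ptp.kworker, &port->ov_work,
					   msecs_to_jiffies(10));
		return;
	}

	dev_info(ice_pf_to_dev(pf), "port %d exited timestamp bypass mode\n",
		 port->port_num);
}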
|
|
The E822 device has a Clock Generation Unit (CGU) responsible for
determining the clock frequency that drives the timers.
Ensure this unit is initialized when bringing up PTP support, so
that the clock has a known frequency.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Implement support for the basic operations needed to enable the PTP
hardware clock on E822 devices.
This includes implementations for the various PHY access functions, as
well as the ability to start and stop the PHY timers. This is different
from the E810 device because the configuration depends on link speed, so
we cannot just start the PHYs immediately. We must wait until the link
is up to get proper values for the speed based initialization.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Convert the clk_freq value into the associated time_ref frequency value
for E822 devices. This simplifies determining the time reference value
for the clock.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
When we enable support for E822 devices, there are some additional
steps required to initialize the PTP hardware clock. To make this easier
to implement as device-specific behavior, refactor the register setups
in ice_ptp_init_owner to a new ice_ptp_init_phc function defined in
ice_ptp_hw.c.
This function will have a common section, and an e810 specific
sub-implementation.
This will enable easily extending the functionality to cover the E822
specific setup required to initialize the hardware clock generation
unit. It also makes it clear which steps are E810 specific vs which ones
are necessary for all ice devices.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
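A sketch of the resulting shape (register names follow the existing ice
GLTSYN defines; treat the exact common steps as an assumption):

int ice_ptp_init_phc(struct ice_hw *hw)
{
	u8 src_idx = hw->func_caps.ts_func_info.tmr_index_owned;

	/* Common setup for all devices: enable the owned source timer. */
	wr32(hw, GLTSYN_ENA(src_idx), GLTSYN_ENA_TSYN_ENA_M);

	/* Device-specific initialization. */
	if (ice_is_e810(hw))
		return ice_ptp_init_phc_e810(hw);
	else
		return ice_ptp_init_phc_e822(hw);
}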
|
|
The ice_ptp_hw.c file introduced a bunch of uses of "int status" instead
of the more traditional "int err" or "int ret". These are actually
traditional Linux error codes (as opposed to the recently removed
ice_status enumeration values).
We're about to add a bunch of new functions to ice_ptp_hw.c. It's
normally preferred in the ice driver to use "int ret" or "int err" when
dealing with error code values.
Instead of making the new functions use "int status", let's just fix all
of ice_ptp_hw.c to use "int err". This will match the new functions and
ensures a consistent style across at least the PTP related files.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The tstamp_config structure is being set inside of
ice_ptp_cfg_timestamp, which is the function used to set Tx and
Rx timestamping during initialization.
This function is also used in order to set the PHY port timestamping
status. However, it makes sense to always set the tstamp_config directly
whenever the ice_set_tx_tstamp or ice_set_rx_tstamp functions are
called.
Move assignment of tstamp_config into the related functions and out of
ice_ptp_cfg_timestamp.
Now that we assign the timestamp mode in the relevant functions, we no
longer modify the config value in ice_set_timestamp_mode. In turn, we
no longer want to copy that config value into the PF cached structure.
Instead, this is now the source of truth for actual configuration. On
success of ice_set_timestamp_mode, copy the real configured mode back to
report it out to userspace.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
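A minimal sketch of where the cached config now gets written (field names
follow the existing pf->ptp.tstamp_config structure; the hardware
programming itself is elided):

static void ice_set_tx_tstamp(struct ice_pf *pf, bool on)
{
	/* ... enable/disable the Tx timestamp interrupt ... */

	pf->ptp.tstamp_config.tx_type = on ? HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
}

static void ice_set_rx_tstamp(struct ice_pf *pf, bool on)
{
	/* ... set ring->ptp_rx on the PF VSI Rx rings ... */

	pf->ptp.tstamp_config.rx_filter = on ? HWTSTAMP_FILTER_ALL :
					       HWTSTAMP_FILTER_NONE;
}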
|
|
A future change will add additional possible increment values for the
E822 device support. To handle this, we want to look up the increment
value to use instead of hard coding it to the nominal value for E810
devices. Introduce ice_base_incval as a function to get the best nominal
increment value to use.
For now, it just returns the E810 value, but will be refactored in the
future to look up the value based on the device type and configured
clock frequency.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
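Per the description above, the new helper amounts to the following sketch
(the E822 lookup comes later):

static u64 ice_base_incval(struct ice_pf *pf)
{
	/* For now every supported device uses the E810 nominal increment;
	 * E822 support will select this based on the TIME_REF frequency. */
	return ICE_PTP_NOMINAL_INCVAL_E810;
}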
|
|
The PF reset does not reset the PHC and PHY clocks, so it's unnecessary to
stop them and reinitialize them after the reset.
Configuring timestamping changes the VSI fields, so it needs to be
performed after the VSIs are initialized, which was not done in the case
of a reset.
Suggested-by: Patrick Talbert <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Tested-by: Pasi Vaananen <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Recent net core changes caused an issue with a few Intel drivers
(reportedly igb), where taking RTNL in the RPM resume path results in a
deadlock. See [0] for a bug report. I don't think the core changes
are wrong, but taking RTNL in the RPM resume path isn't needed.
The Intel drivers are the only ones doing this. See [1] for a
discussion of the issue. The following patch changes the RPM resume path
to not take RTNL.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=215129
[1] https://lore.kernel.org/netdev/20211125074949.5f897431@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/t/
Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
Fixes: f32a21376573 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops")
Tested-by: Martin Stolpe <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
For VIRTCHNL_VF_OFFLOAD_VLAN, PFs would limit the number of VLAN
filters a VF was allowed to add. However, by the time the opcode failed,
the VLAN netdev had already been added. VIRTCHNL_VF_OFFLOAD_VLAN_V2
added the ability for a PF to tell the VF how many VLAN filters it's
allowed to add. Make changes to support that functionality.
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The new VIRTCHNL_VF_OFFLOAD_VLAN_V2 capability added support that allows
the VF to support 802.1Q and 802.1ad VLAN insertion and stripping if
successfully negotiated via VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS.
Multiple changes were needed to support this new functionality.
1. Added new aq_required flags to support any kind of VLAN stripping and
insertion offload requests via virtchnl.
2. Added the new method iavf_set_vlan_offload_features() that's
used during VF initialization, VF reset, and iavf_set_features() to
set the aq_required bits based on the current VLAN offload
configuration of the VF's netdev.
3. Added virtchnl handling for VIRTCHNL_OP_ENABLE_STRIPPING_V2,
VIRTCHNL_OP_DISABLE_STRIPPING_V2, VIRTCHNL_OP_ENABLE_INSERTION_V2,
and VIRTCHNL_OP_DISABLE_INSERTION_V2.
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The new VIRTCHNL_VF_OFFLOAD_VLAN_V2 capability added support that allows
the PF to set the location of the Tx and Rx VLAN tag for insertion and
stripping offloads. In order to support this functionality a few changes
are needed.
1. Add a new method to cache the VLAN tag location based on negotiated
capabilities for the Tx and Rx ring flags. This needs to be called in
the initialization and reset paths.
2. Refactor the transmit hotpath to account for the new Tx ring flags.
When IAVF_TXR_FLAGS_VLAN_LOC_L2TAG2 is set, then the driver needs to
insert the VLAN tag in the L2TAG2 field of the transmit descriptor.
When the IAVF_TXRX_FLAGS_VLAN_LOC_L2TAG1 is set, then the driver needs
to use the l2tag1 field of the data descriptor (same behavior as
before).
3. Refactor the iavf_tx_prepare_vlan_flags() function to simplify
transmit hardware VLAN offload functionality by only depending on the
skb_vlan_tag_present() function. This can be done because the OS
won't request transmit offload for a VLAN unless the driver told the
OS it's supported and enabled.
4. Refactor the receive hotpath to account for the new Rx ring flags and
VLAN ethertypes. This requires checking the Rx ring flags and
descriptor status bits to determine the location of the VLAN tag.
Also, since only a single ethertype can be supported at a time, check
the enabled netdev features before specifying a VLAN ethertype in
__vlan_hwaccel_put_tag().
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
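A hedged illustration of the Tx-side decision in point 2; the helper is
hypothetical and the ring-flag name is taken from the text above, so treat
both as assumptions:

static void iavf_tx_fill_vlan(struct iavf_ring *tx_ring, struct sk_buff *skb,
			      u32 *tx_flags, __le16 *cd_l2tag2)
{
	if (!skb_vlan_tag_present(skb))
		return;

	if (tx_ring->flags & IAVF_TXR_FLAGS_VLAN_LOC_L2TAG2) {
		/* Negotiated location: L2TAG2 of the context descriptor. */
		*cd_l2tag2 = cpu_to_le16(skb_vlan_tag_get(skb));
	} else {
		/* Default location: L2TAG1 of the data descriptor. */
		*tx_flags |= (skb_vlan_tag_get(skb) << IAVF_TX_FLAGS_VLAN_SHIFT) |
			     IAVF_TX_FLAGS_HW_VLAN;
	}
}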
|
|
Based on VIRTCHNL_VF_OFFLOAD_VLAN_V2, the VF can now support more VLAN
capabilities (i.e. 802.1AD offloads and filtering). In order to
communicate these capabilities to the netdev layer, the VF needs to
parse its VLAN capabilities based on whether it was able to negotiate
VIRTCHNL_VF_OFFLOAD_VLAN or VIRTCHNL_VF_OFFLOAD_VLAN_V2 or neither of
these.
In order to support this, add the following functionality:
iavf_get_netdev_vlan_hw_features() - This is used to determine the VLAN
features that the underlying hardware supports and that can be toggled
off/on based on the negotiated capabilities. For example, if
VIRTCHNL_VF_OFFLOAD_VLAN_V2 was negotiated, then any capability marked
with VIRTCHNL_VLAN_TOGGLE can be toggled on/off by the VF. If
VIRTCHNL_VF_OFFLOAD_VLAN was negotiated, then only VLAN insertion and/or
stripping can be toggled on/off.
iavf_get_netdev_vlan_features() - This is used to determine the VLAN
features that the underlying hardware supports and that should be
enabled by default. For example, if VIRTCHNL_VF_OFFLOAD_VLAN_V2 was
negotiated, then any supported capability that has its ethertype_init
field set should be enabled by default. If VIRTCHNL_VF_OFFLOAD_VLAN was
negotiated, then filtering, stripping, and insertion should be enabled
by default.
Also, refactor iavf_fix_features() to take into account the new
capabilities. To do this, query all the supported features (enabled by
default and toggleable) and make sure the requested change is supported.
If VIRTCHNL_VF_OFFLOAD_VLAN_V2 is successfully negotiated, there is no
need to check VIRTCHNL_VLAN_TOGGLE here because the driver already told
the netdev layer which features can be toggled via netdev->hw_features
during iavf_process_config(), so only those features will be requested
to change.
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In order to support the new VIRTCHNL_VF_OFFLOAD_VLAN_V2 capability the
VF driver needs to rework its initialization state machine and reset
flow. This has to be done because successful negotiation of
VIRTCHNL_VF_OFFLOAD_VLAN_V2 requires the VF driver to perform a second
capability request via VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS before
configuring the adapter and its netdev.
Add the VIRTCHNL_VF_OFFLOAD_VLAN_V2 bit when sending the
VIRTCHNL_OP_GET_VF_RESOURCES message. The underlying PF will either
support VIRTCHNL_VF_OFFLOAD_VLAN or VIRTCHNL_VF_OFFLOAD_VLAN_V2 or
neither. Both of these offloads should never be supported together.
Based on this, add 2 new states to the initialization state machine:
__IAVF_INIT_GET_OFFLOAD_VLAN_V2_CAPS
__IAVF_INIT_CONFIG_ADAPTER
The __IAVF_INIT_GET_OFFLOAD_VLAN_V2_CAPS state is used to request/store
the new VLAN capabilities if and only if VIRTCHNL_VF_OFFLOAD_VLAN_V2
was successfully negotiated in the __IAVF_INIT_GET_RESOURCES state.
The __IAVF_INIT_CONFIG_ADAPTER state is used to configure the
adapter/netdev after the resource requests have finished. The VF will
move into this state regardless of whether it successfully negotiated
VIRTCHNL_VF_OFFLOAD_VLAN or VIRTCHNL_VF_OFFLOAD_VLAN_V2.
Also, add the new flag IAVF_FLAG_AQ_GET_OFFLOAD_VLAN_V2_CAPS and set
it during VF reset. If VIRTCHNL_VF_OFFLOAD_VLAN_V2 was successfully
negotiated then the VF will request its VLAN capabilities via
VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS during the reset. This is needed
because the PF may change/modify the VF's configuration during VF reset
(i.e. modifying the VF's port VLAN configuration).
This also requires the VF to call netdev_update_features() since its
VLAN features may change during VF reset. Make sure to call this under
rtnl_lock().
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Currently cleaned_count is initialized to ICE_DESC_UNUSED(rx_ring) and
later on during Rx processing it is incremented for each frame that the
driver consumed. This can result in an excessive number of buffers being
requested from the xsk pool based on that value.
To address this, just drop cleaned_count and pass
ICE_DESC_UNUSED(rx_ring) directly as a function argument to
ice_alloc_rx_bufs_zc(). The idea is to ask for as many buffers as were
consumed.
Let us also call ice_alloc_rx_bufs_zc unconditionally at the end of
ice_clean_rx_irq_zc. This was already changed this way in the
corresponding ice_clean_rx_irq, but not here.
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Kiran Bhandare <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
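A sketch of how the tail of ice_clean_rx_irq_zc() looks with this change
(the need_wakeup handling is shown only for context and follows the
existing xsk helpers):

	/* No cleaned_count bookkeeping: ask for exactly as many buffers as
	 * are currently unused, and do it unconditionally. */
	failure |= !ice_alloc_rx_bufs_zc(rx_ring, ICE_DESC_UNUSED(rx_ring));

	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
			xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
		else
			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);

		return (int)total_rx_packets;
	}

	return failure ? budget : (int)total_rx_packets;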
|
|
Commit ac6f733a7bd5 ("ice: allow empty Rx descriptors") stated that ice
HW can produce empty descriptors that are valid and they should be
processed.
Add this support to the xsk ZC path to avoid potential processing problems.
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Kiran Bhandare <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The descriptor that ntu is pointing at when we exit
ice_alloc_rx_bufs_zc() should not have its corresponding DD bit cleared,
as that descriptor is not allocated there and is not valid for HW
usage.
The allocation routine at entry will fill the descriptor that ntu
points to, after ntu was set to ntu + nb_buffs on the previous call.
Even the spec says:
"The tail pointer should be set to one descriptor beyond the last empty
descriptor in host descriptor ring."
Therefore, step away from clearing the status_error0 on ntu + nb_buffs
descriptor.
Fixes: db804cfc21e9 ("ice: Use the xsk batched rx allocation interface")
Reported-by: Elza Mathew <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Kiran Bhandare <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The 'if (ntu == rx_ring->count)' block in ice_alloc_rx_bufs_zc()
was previously residing in the loop, but after introducing the
batched interface it is used only to wrap around the NTU descriptor,
so there is no longer a need to assign 'xdp'.
Fixes: db804cfc21e9 ("ice: Use the xsk batched rx allocation interface")
Signed-off-by: Alexander Lobakin <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Tested-by: Kiran Bhandare <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Currently, the zero-copy data path is reusing the memory region that was
initially allocated for an array of struct ice_rx_buf for its own
purposes. This is error prone as it is based on the ice_rx_buf struct
always being the same size or bigger than what the zero-copy path needs.
There can also be old values present in that array giving rise to errors
when the zero-copy path uses it.
Fix this by freeing the ice_rx_buf region and allocating a new array for
the zero-copy path that has the right length and is initialized to zero.
Fixes: 57f7f8b6bc0b ("ice: Use xdp_buf instead of rx_buf for xsk zero-copy")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Kiran Bhandare <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Currently we only NULL the xdp_buff pointer in the internal SW ring but
we never give it back to the xsk buffer pool. This means that buffers
can be leaked out of the buff pool and never be used again.
Add missing xsk_buff_free() call to the routine that is supposed to
clean the entries that are left in the ring so that these buffers in the
umem can be used by other sockets.
Also, only go through the space that is actually left to be cleaned
instead of the whole ring.
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Kiran Bhandare <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
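A minimal sketch of the cleanup routine after this fix, assuming the
ice_xdp_buf() accessor from the xdp_buf conversion and a power-of-two ring
size:

void ice_xsk_clean_rx_ring(struct ice_rx_ring *rx_ring)
{
	u16 count_mask = rx_ring->count - 1;
	u16 ntc = rx_ring->next_to_clean;
	u16 ntu = rx_ring->next_to_use;

	/* Walk only the filled region and hand every buffer back to the
	 * xsk buffer pool instead of merely NULLing the SW ring entry. */
	while (ntc != ntu) {
		struct xdp_buff *xdp = *ice_xdp_buf(rx_ring, ntc);

		xsk_buff_free(xdp);
		ntc = (ntc + 1) & count_mask;
	}
}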
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The MDIO bus speed must be initialized before talking to the PHY the first
time in order to avoid talking to it using a speed that the PHY doesn't
support.
This fixes HW initialization error -17 (IXGBE_ERR_PHY_ADDR_INVALID) on
Denverton CPUs (a.k.a. the Atom C3000 family) on ports with a 10Gb network
plugged in. On those devices, HLREG0[MDCSPD] resets to 1, which combined
with the 10Gb network results in a 24MHz MDIO speed, which is apparently
too fast for the connected PHY. PHY register reads over MDIO bus return
garbage, leading to initialization failure.
Reproduced with Linux kernel 4.19 and 5.15-rc7. Can be reproduced using
the following setup:
* Use an Atom C3000 family system with at least one X552 LAN on the SoC
* Disable PXE or other BIOS network initialization if possible
(the interface must not be initialized before Linux boots)
* Connect a live 10Gb Ethernet cable to an X550 port
* Power cycle (not reset, doesn't always work) the system and boot Linux
* Observe: ixgbe interfaces w/ 10GbE cables plugged in fail with error -17
Fixes: e84db7272798 ("ixgbe: Introduce function to control MDIO speed")
Signed-off-by: Cyril Novikov <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Commit a296d665eae1 ("ixgbe: Add ethtool support to enable 2.5 and 5.0
Gbps support") introduced suppression of the advertisement of NBASE-T
speeds by default. According to Todd Fujinaka, this was done to accommodate
customers with network switches which could not cope with the advertised
NBASE-T speeds, as posted on the E1000-devel mailing list:
https://sourceforge.net/p/e1000/mailman/message/37106269/
However, the suppression was not documented at all, nor was how to
enable NBASE-T support.
Properly document the NBASE-T suppression and how to enable NBASE-T
support.
Fixes: a296d665eae1 ("ixgbe: Add ethtool support to enable 2.5 and 5.0 Gbps support")
Reported-by: Robert Schlabbach <[email protected]>
Signed-off-by: Robert Schlabbach <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The LTR maximum value was incorrectly written using the scale from
the LTR minimum value. This would cause incorrect values to be sent,
in cases where the initial calculation led to different min/max scales.
Fixes: 707abf069548 ("igc: Add initial LTR support")
Suggested-by: Dima Ruinskiy <[email protected]>
Signed-off-by: Sasha Neftin <[email protected]>
Tested-by: Nechama Kraus <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In `igbvf_probe`, if register_netdev() fails, the program will go to
label err_hw_init, and then to label err_ioremap. In free_netdev(), which
is just below label err_ioremap, there are `list_for_each_entry_safe` and
`netif_napi_del`, which aim to delete all entries in `dev->napi_list`.
The program has added an entry, `adapter->rx_ring->napi`, which is added by
`netif_napi_add` in igbvf_alloc_queues(). However, adapter->rx_ring has
already been freed below label err_hw_init. So this is a use-after-free (UAF).
In terms of how to patch the problem, we can refer to igbvf_remove() and
delete the napi entry before freeing `adapter->rx_ring`.
The KASAN logs are as follows:
[ 35.126075] BUG: KASAN: use-after-free in free_netdev+0x1fd/0x450
[ 35.127170] Read of size 8 at addr ffff88810126d990 by task modprobe/366
[ 35.128360]
[ 35.128643] CPU: 1 PID: 366 Comm: modprobe Not tainted 5.15.0-rc2+ #14
[ 35.129789] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[ 35.131749] Call Trace:
[ 35.132199] dump_stack_lvl+0x59/0x7b
[ 35.132865] print_address_description+0x7c/0x3b0
[ 35.133707] ? free_netdev+0x1fd/0x450
[ 35.134378] __kasan_report+0x160/0x1c0
[ 35.135063] ? free_netdev+0x1fd/0x450
[ 35.135738] kasan_report+0x4b/0x70
[ 35.136367] free_netdev+0x1fd/0x450
[ 35.137006] igbvf_probe+0x121d/0x1a10 [igbvf]
[ 35.137808] ? igbvf_vlan_rx_add_vid+0x100/0x100 [igbvf]
[ 35.138751] local_pci_probe+0x13c/0x1f0
[ 35.139461] pci_device_probe+0x37e/0x6c0
[ 35.165526]
[ 35.165806] Allocated by task 366:
[ 35.166414] ____kasan_kmalloc+0xc4/0xf0
[ 35.167117] foo_kmem_cache_alloc_trace+0x3c/0x50 [igbvf]
[ 35.168078] igbvf_probe+0x9c5/0x1a10 [igbvf]
[ 35.168866] local_pci_probe+0x13c/0x1f0
[ 35.169565] pci_device_probe+0x37e/0x6c0
[ 35.179713]
[ 35.179993] Freed by task 366:
[ 35.180539] kasan_set_track+0x4c/0x80
[ 35.181211] kasan_set_free_info+0x1f/0x40
[ 35.181942] ____kasan_slab_free+0x103/0x140
[ 35.182703] kfree+0xe3/0x250
[ 35.183239] igbvf_probe+0x1173/0x1a10 [igbvf]
[ 35.184040] local_pci_probe+0x13c/0x1f0
Fixes: d4e0fe01a38a0 ("igbvf: add new driver to support 82576 virtual functions")
Reported-by: Zheyu Ma <[email protected]>
Signed-off-by: Letu Ren <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Move the check of the VF MAC filter condition before clearing
or adding a MAC filter to the VF, to prevent a potential blackout caused
by the removal of a necessary, working VF MAC filter.
Fixes: 1b8b062a99dc ("igb: add VF trust infrastructure")
Signed-off-by: Karen Sornek <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|