Age | Commit message (Collapse) | Author | Files | Lines |
|
No conflicts.
Signed-off-by: Paolo Abeni <[email protected]>
|
|
It appears that dpaa_enable_tx_csum() only calls skb_reset_mac_header()
to get to the VLAN header using skb_mac_header().
We can use skb_vlan_eth_hdr() to get to the VLAN header based on
skb->data directly. This avoids spending a few cycles to set
skb->mac_header.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Acked-by: Madalin Bucur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The return value is not initialized on the success path.
Fixes: 901bdff2f529 ("net: fman: Change return type of disable to void")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Madalin Bucur <[email protected]>
Reviewed-by: Sean Anderson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
These have been useful in debugging various problems related to frame
preemption, so make them available through ethtool --register-dump for
later too.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This was left as TODO in commit 01e23b2b3bad ("net: enetc: add support
for preemptible traffic classes") since it's relatively complicated.
Where this makes a difference is with a configuration as follows:
ethtool --set-mm eno0 pmac-enabled on tx-enabled on verify-enabled on
Preemptible packets should only be sent when the MAC Merge TX direction
becomes active (i.o.w. when the verification process succeeds, aka when
the link partner confirms it can process preemptible traffic). But the
tc qdisc with the preemptible traffic classes is offloaded completely
asynchronously w.r.t. the MM becoming active.
The ENETC manual does suggest that this should be handled in the driver:
"On startup, software should wait for the verification process to
complete (MMCSR[VSTS]=011) before initiating traffic".
Adding the necessary logic allows future selftests to uphold the claim
that an inactive or disabled MAC Merge layer should never send data
packets through the pMAC.
This change moves enetc_set_ptcfpr() from enetc.c to enetc_ethtool.c,
where its only caller is now - enetc_mm_commit_preemptible_tcs().
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The MMCSR register contains 2 fields with overlapping meaning:
- LPA (Local preemption active):
This read-only status bit indicates whether preemption is active for
this port. This bit will be set if preemption is both enabled and has
completed the verification process.
- TXSTS (Merge status):
This read-only status field provides the state of the MAC Merge sublayer
transmit status as defined in IEEE Std 802.3-2018 Clause 99.
00 Transmit preemption is inactive
01 Transmit preemption is active
10 Reserved
11 Reserved
However none of these 2 fields offer reliable reporting to software.
When connecting ENETC to a link partner which is not capable of Frame
Preemption, the expectation is that ENETC's verification should fail
(VSTS=4) and its MM TX direction should be inactive (LPA=0, TXSTS=00)
even though the MM TX is enabled (ME=1). But surprise, the LPA bit of
MMCSR stays set even if VSTS=4 and ME=1.
OTOH, the TXSTS field has the opposite problem. I cannot get its value
to change from 0, even when connecting to a link partner capable of
frame preemption, which does respond to its verification frames (ME=1
and VSTS=3, "SUCCEEDED").
The only option with such buggy hardware seems to be to reimplement the
formula for calculating tx-active in software, which is for tx-enabled
to be true, and for the verify-status to be either SUCCEEDED, or
DISABLED.
Without reliable tx-active reporting, we have no good indication when
to commit the preemptible traffic classes to hardware, which makes it
possible (but not desirable) to send preemptible traffic to a link
partner incapable of receiving it. However, currently we do not have the
logic to wait for TX to be active yet, so the impact is limited.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Current enetc_set_mm() is designed to set the priv->active_offloads bit
ENETC_F_QBU for enetc_mm_link_state_update() to act on, but if the link
is already up, it modifies the ENETC_MMCSR_ME ("Merge Enable") bit
directly.
The problem is that it only *sets* ENETC_MMCSR_ME if the link is up, it
doesn't *clear* it if needed. So subsequent enetc_get_mm() calls still
see tx-enabled as true, up until a link down event, which is when
enetc_mm_link_state_update() will get called.
This is not a functional issue as far as I can assess. It has only come
up because I'd like to uphold a simple API rule in core ethtool code:
the pMAC cannot be disabled if TX is going to be enabled. Currently,
the fact that TX remains enabled for longer than expected (after the
enetc_set_mm() call that disables it) is going to violate that rule,
which is how it was caught.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
PFs which support the MAC Merge layer also have a set of 8 registers
called "Port traffic class N frame preemption register (PTC0FPR - PTC7FPR)".
Through these, a traffic class (group of TX rings of same dequeue
priority) can be mapped to the eMAC or to the pMAC.
There's nothing particularly spectacular here. We should probably only
commit the preemptible TCs to hardware once the MAC Merge layer became
active, but unlike Felix, we don't have an IRQ that notifies us of that.
We'd have to sleep for up to verifyTime (127 ms) to wait for a
resolution coming from the verification state machine; not only from the
ndo_setup_tc() code path, but also from enetc_mm_link_state_update().
Since it's relatively complicated and has a relatively small benefit,
I'm not doing it.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ferenc Fejes <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
To gain access to the larger encapsulating structure which has the type
tc_mqprio_qopt_offload, rename just the "qopt" field as "qopt".
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ferenc Fejes <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Conflicts:
tools/testing/selftests/net/config
62199e3f1658 ("selftests: net: Add VXLAN MDB test")
3a0385be133e ("selftests: add the missing CONFIG_IP_SCTP in net config")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
I have observed an issue where the RX direction of the LS1028A ENETC pMAC
seems unresponsive. The minimal procedure to reproduce the issue is:
1. Connect ENETC port 0 with a loopback RJ45 cable to one of the Felix
switch ports (0).
2. Bring the ports up (MAC Merge layer is not enabled on either end).
3. Send a large quantity of unidirectional (express) traffic from Felix
to ENETC. I tried altering frame size and frame count, and it doesn't
appear to be specific to either of them, but rather, to the quantity
of octets received. Lowering the frame count, the minimum quantity of
packets to reproduce relatively consistently seems to be around 37000
frames at 1514 octets (w/o FCS) each.
4. Using ethtool --set-mm, enable the pMAC in the Felix and in the ENETC
ports, in both RX and TX directions, and with verification on both
ends.
5. Wait for verification to complete on both sides.
6. Configure a traffic class as preemptible on both ends.
7. Send some packets again.
The issue is at step 5, where the verification process of ENETC ends
(meaning that Felix responds with an SMD-R and ENETC sees the response),
but the verification process of Felix never ends (it remains VERIFYING).
If step 3 is skipped or if ENETC receives less traffic than
approximately that threshold, the test runs all the way through
(verification succeeds on both ends, preemptible traffic passes fine).
If, between step 4 and 5, the step below is also introduced:
4.1. Disable and re-enable PM0_COMMAND_CONFIG bit RX_EN
then again, the sequence of steps runs all the way through, and
verification succeeds, even if there was the previous RX traffic
injected into ENETC.
Traffic sent *by* the ENETC port prior to enabling the MAC Merge layer
does not seem to influence the verification result, only received
traffic does.
The LS1028A manual does not mention any relationship between
PM0_COMMAND_CONFIG and MMCSR, and the hardware people don't seem to
know for now either.
The bit that is toggled to work around the issue is also toggled
by enetc_mac_enable(), called from phylink's mac_link_down() and
mac_link_up() methods - which is how the workaround was found:
verification would work after a link down/up.
Fixes: c7b9e8086902 ("net: enetc: add support for MAC Merge layer")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
A number of MDIO drivers make use of devm_mdiobus_alloc_size(). This
is only available when CONFIG_MDIO_DEVRES is enabled. Add missing
depends or selects, depending on if there are circular dependencies or
not. This avoids linker errors, especially for randconfig builds.
Signed-off-by: Andrew Lunn <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Conflicts:
drivers/net/ethernet/google/gve/gve.h
3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts")
75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format")
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
Adjacent changes:
net/can/isotp.c
051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()")
96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Not all fec MDIO bus drivers support C45 mode transactions. The older fec
hardware block in many ColdFire SoCs does not appear to support them, at
least according to most of the different ColdFire SoC reference manuals.
The bits used to generate C45 access on the iMX parts, in the OP field
of the MMFR register, are documented as generating non-compliant MII
frames (it is not documented as to exactly how they are non-compliant).
Commit 8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions")
means the fec driver will always register c45 MDIO read and write
methods. During probe these will always be accessed now generating
non-compliant MII accesses on ColdFire based devices.
Add a quirk define, FEC_QUIRK_HAS_MDIO_C45, that can be used to
distinguish silicon that supports MDIO C45 framing or not. Add this to
all the existing iMX quirks, so they will be behave as they do now (*).
(*) it seems that some iMX parts may not support C45 transactions either.
The iMX25 and iMX50 Reference Manuals contain similar wording to
the ColdFire Reference Manuals on this.
Fixes: 8d03ad1ab0b0 ("net: fec: Separate C22 and C45 transactions")
Signed-off-by: Greg Ungerer <[email protected]>
Reviewed-by: Wei Fang <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
6e9d51b1a5cb ("net/mlx5e: Initialize link speed to zero")
1bffcea42926 ("net/mlx5e: Add devlink hairpin queues parameters")
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
Adjacent changes:
drivers/net/phy/phy.c
323fe43cf9ae ("net: phy: Improved PHY error reporting in state machine")
4203d84032e2 ("net: phy: Ensure state transitions are processed from phy_stop()")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The Autoneg bit in the advertising bitmap and state->an_enabled are
always identical. Thus, we will be removing state->an_enabled.
Use the Autoneg bit in the advertising bitmap to indicate whether
autonegotiation should be used, rather than using the an_enabled
member which will be going away. This means we use the same condition
as phylink_mii_c22_pcs_config().
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When running "ethtool -S eno0 --groups rmon" without an explicit "--src
emac|pmac" argument, the kernel will not report
rx-rmon-etherStatsPkts64to64Octets, rx-rmon-etherStatsPkts65to127Octets,
etc. This is because on ETHTOOL_MAC_STATS_SRC_AGGREGATE, we do not
populate the "ranges" argument.
ocelot_port_get_rmon_stats() does things differently and things work
there. I had forgotten to make sure that the code is structured the same
way in both drivers, so do that now.
Fixes: cf52bd238b75 ("net: enetc: add support for MAC Merge statistics counters")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties.
Convert reading boolean properties to of_property_read_bool().
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Marc Kleine-Budde <[email protected]> # for net/can
Acked-by: Kalle Valo <[email protected]>
Acked-by: Nicolas Ferre <[email protected]>
Acked-by: Francois Romieu <[email protected]>
Reviewed-by: Wei Fang <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"This features a bunch of new device support, a couple of new drivers,
yaml conversion and updates of a few drivers.
Core support:
- New devm_of_phy_optional_get() API with users and conversion
New hardware support:
- Mediatek MT7986 phy support
- Qualcomm SM8550 UFS, PCIe, combo phy support, SM6115 / SM4250 USB3
phy support, SM6350 combo phy support, SM6125 UFS PHY support amd
SM8350 & SM8450 combo phy support
- Qualcomm SNPS eUSB2 eUSB2 repeater drivers
- Allwinner F1C100s USB PHY support
- Tegra xusb support for Tegra234
Updates:
- Yaml conversion for Qualcomm pcie2 phy and usb-hsic-phy
- G4 mode support in Qualcomm UFS phy and support for various SoCs
- Yaml conversion for Meson usb2 phy
- TI Type C support for usb phy for j721
- Yaml conversion for Tegra xusb binding"
* tag 'phy-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (106 commits)
phy: qcom: phy-qcom-snps-eusb2: Add support for eUSB2 repeater
phy: qcom: Add QCOM SNPS eUSB2 repeater driver
dt-bindings: phy: qcom,snps-eusb2-phy: Add phys property for the repeater
dt-bindings: phy: Add qcom,snps-eusb2-repeater schema file
dt-bindings: phy: amlogic,g12a-usb3-pcie-phy: add missing optional phy-supply property
phy: rockchip-typec: Fix unsigned comparison with less than zero
phy: rockchip-typec: fix tcphy_get_mode error case
phy: qcom: snps-eusb2: Add missing headers
phy: qcom-qmp-combo: Add support for SM8550
phy: qcom-qmp: Add v6 DP register offsets
phy: qcom-qmp: pcs-usb: Add v6 register offsets
dt-bindings: phy: qcom,sc8280xp-qmp-usb43dp: Document SM8550 compatible
phy: qcom: Add QCOM SNPS eUSB2 driver
dt-bindings: phy: Add qcom,snps-eusb2-phy schema file
phy: qcom-qmp-pcie: Add support for SM8550 g3x2 and g4x2 PCIEs
phy: qcom-qmp: qserdes-lane-shared: Add v6 register offsets
phy: qcom-qmp: qserdes-txrx: Add v6.20 register offsets
phy: qcom-qmp: pcs-pcie: Add v6.20 register offsets
phy: qcom-qmp: pcs-pcie: Add v6 register offsets
phy: qcom-qmp: pcs: Add v6.20 register offsets
...
|
|
Do not always add NETDEV_XDP_ACT_XSK_ZEROCOPY bit in xdp_features flag
but check if the NIC really supports it.
Signed-off-by: Lorenzo Bianconi <[email protected]>
Reviewed-by: Larysa Zaremba <[email protected]>
Link: https://lore.kernel.org/r/3dba6ea42dc343a9f2d7d1a6a6a6c173235e1ebf.1676471386.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <[email protected]>
|
|
====================
pull-request: bpf-next 2023-02-11
We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).
There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e533d01 ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:
[...]
ice_set_netdev_features(netdev);
netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
NETDEV_XDP_ACT_XSK_ZEROCOPY;
ice_set_ops(netdev);
[...]
Stephen's merge conflict mail:
https://lore.kernel.org/bpf/[email protected]/
The main changes are:
1) Add support for BPF trampoline on s390x which finally allows to remove many
test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.
2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.
3) Add capability to export the XDP features supported by the NIC.
Along with that, add a XDP compliance test tool,
from Lorenzo Bianconi & Marek Majtyka.
4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
from David Vernet.
5) Add a deep dive documentation about the verifier's register
liveness tracking algorithm, from Eduard Zingerman.
6) Fix and follow-up cleanups for resolve_btfids to be compiled
as a host program to avoid cross compile issues,
from Jiri Olsa & Ian Rogers.
7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
when testing on different NICs, from Jesper Dangaard Brouer.
8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.
9) Extend libbpf to add an option for when the perf buffer should
wake up, from Jon Doron.
10) Follow-up fix on xdp_metadata selftest to just consume on TX
completion, from Stanislav Fomichev.
11) Extend the kfuncs.rst document with description on kfunc
lifecycle & stability expectations, from David Vernet.
12) Fix bpftool prog profile to skip attaching to offline CPUs,
from Tonghao Zhang.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add PF driver support for the following:
- Viewing the standardized MAC Merge layer counters.
- Viewing the standardized Ethernet MAC and RMON counters associated
with the pMAC.
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add PF driver support for viewing and changing the MAC Merge sublayer
parameters, and seeing the verification state machine's current state.
The verification handshake with the link partner is driven by hardware.
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We assume that the mqprio queue configuration from taprio has a simple
1:1 mapping between prio and traffic class, and one TX queue per TC.
That might not be the case. Actually parse and act upon the mqprio
config.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Regardless of the requested queue count per traffic class, the enetc
driver allocates a number of TX rings equal to the number of TCs, and
hardcodes a queue configuration of "1@0 1@1 ... 1@max-tc". Other
configurations are silently ignored and treated the same.
Improve that by allowing what the user requests to be actually
fulfilled. This allows more than one TX ring per traffic class.
For example:
$ tc qdisc add dev eno0 root handle 1: mqprio num_tc 4 \
map 0 0 1 1 2 2 3 3 queues 2@0 2@2 2@4 2@6
[ 146.267648] fsl_enetc 0000:00:00.0 eno0: TX ring 0 prio 0
[ 146.273451] fsl_enetc 0000:00:00.0 eno0: TX ring 1 prio 0
[ 146.283280] fsl_enetc 0000:00:00.0 eno0: TX ring 2 prio 1
[ 146.293987] fsl_enetc 0000:00:00.0 eno0: TX ring 3 prio 1
[ 146.300467] fsl_enetc 0000:00:00.0 eno0: TX ring 4 prio 2
[ 146.306866] fsl_enetc 0000:00:00.0 eno0: TX ring 5 prio 2
[ 146.313261] fsl_enetc 0000:00:00.0 eno0: TX ring 6 prio 3
[ 146.319622] fsl_enetc 0000:00:00.0 eno0: TX ring 7 prio 3
$ tc qdisc del dev eno0 root
[ 178.238418] fsl_enetc 0000:00:00.0 eno0: TX ring 0 prio 0
[ 178.244369] fsl_enetc 0000:00:00.0 eno0: TX ring 1 prio 0
[ 178.251486] fsl_enetc 0000:00:00.0 eno0: TX ring 2 prio 0
[ 178.258006] fsl_enetc 0000:00:00.0 eno0: TX ring 3 prio 0
[ 178.265038] fsl_enetc 0000:00:00.0 eno0: TX ring 4 prio 0
[ 178.271557] fsl_enetc 0000:00:00.0 eno0: TX ring 5 prio 0
[ 178.277910] fsl_enetc 0000:00:00.0 eno0: TX ring 6 prio 0
[ 178.284281] fsl_enetc 0000:00:00.0 eno0: TX ring 7 prio 0
$ tc qdisc add dev eno0 root handle 1: mqprio num_tc 8 \
map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 1
[ 186.113162] fsl_enetc 0000:00:00.0 eno0: TX ring 0 prio 0
[ 186.118764] fsl_enetc 0000:00:00.0 eno0: TX ring 1 prio 1
[ 186.124374] fsl_enetc 0000:00:00.0 eno0: TX ring 2 prio 2
[ 186.130765] fsl_enetc 0000:00:00.0 eno0: TX ring 3 prio 3
[ 186.136404] fsl_enetc 0000:00:00.0 eno0: TX ring 4 prio 4
[ 186.142049] fsl_enetc 0000:00:00.0 eno0: TX ring 5 prio 5
[ 186.147674] fsl_enetc 0000:00:00.0 eno0: TX ring 6 prio 6
[ 186.153305] fsl_enetc 0000:00:00.0 eno0: TX ring 7 prio 7
The driver used to set TC_MQPRIO_HW_OFFLOAD_TCS, near which there is
this comment in the UAPI header:
TC_MQPRIO_HW_OFFLOAD_TCS, /* offload TCs, no queue counts */
which is what enetc was doing up until now (and no longer is; we offload
queue counts too), remove that assignment.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The enetc driver does not validate the mqprio queue configuration, so it
currently allows things like this:
$ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
map 0 1 2 3 4 5 6 7 queues 3@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 1
But also things like this, completely omitting the queue configuration:
$ tc qdisc add dev eno0 root handle 1: mqprio num_tc 8 \
map 0 1 2 3 4 5 6 7 hw 1
By requesting validation via the mqprio capability structure, this is no
longer allowed, and we bring what is accepted by hardware in line with
what is accepted by software.
The check that num_tc <= real_num_tx_queues also becomes superfluous and
can be dropped, because mqprio_validate_queue_counts() validates that no
TXQ range exceeds real_num_tx_queues. That is a stronger check, because
there is at least 1 TXQ per TC, so there are at least as many TXQs as TCs.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently it can happen that an mqprio qdisc is installed with num_tc 8,
and this will reserve 8 (out of 8) TXQs for the network stack. Then we
can attach an XDP program, and this will crop 2 TXQs, leaving just 6 for
mqprio. That's not what the user requested, and we should fail it.
On the other hand, if mqprio isn't requested, we still give the 8 TXQs
to the network stack (with hashing among a single traffic class), but
then, cropping 2 TXQs for XDP is fine, because the user didn't
explicitly ask for any number of TXQs, so no expectations are violated.
Simply put, the logic that mqprio should impose a minimum number of TXQs
for the network never existed. Let's say (more or less arbitrarily) that
without mqprio, the driver expects a minimum number of TXQs equal to the
number of CPUs (on NXP LS1028A, that is either 1, or 2). And with mqprio,
mqprio gives the minimum required number of TXQs.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Since the blamed net-next commit, enetc_setup_xdp_prog() no longer goes
through enetc_open(), and therefore, the function which was supposed to
detect whether a BPF program exists (in order to crop some TX queues
from network stack usage), enetc_num_stack_tx_queues(), no longer gets
called.
We can move the netif_set_real_num_rx_queues() call to enetc_alloc_msix()
(probe time), since it is a runtime invariant. We can do the same thing
with netif_set_real_num_tx_queues(), and let enetc_reconfigure_xdp_cb()
explicitly recalculate and change the number of stack TX queues.
Fixes: c33bfaf91c4c ("net: enetc: set up XDP program under enetc_reconfigure()")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
enetc_reconfigure() was modified in commit c33bfaf91c4c ("net: enetc:
set up XDP program under enetc_reconfigure()") to take an optional
callback that runs while the netdev is down, but this callback currently
cannot fail.
Code up the error handling so that the interface is restarted with the
old resources if the callback fails.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We keep a pointer to the xdp_prog in the private netdev structure as
well; what's replicated per RX ring is done so just for more convenient
access from the NAPI poll procedure.
Simplify enetc_num_stack_tx_queues() by looking at priv->xdp_prog rather
than iterating through the information replicated per RX ring.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use the new devm_of_phy_optional_get() helper instead of open-coding the
same operation.
As devm_of_phy_optional_get() returns NULL if either the PHY cannot be
found, or if support for the PHY framework is not enabled, it is no
longer needed to check for -ENODEV or -ENOSYS.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Sean Anderson <[email protected]>
Link: https://lore.kernel.org/r/f2d801cd73cca36a7162819289480d7fc91fcc7e.1674584626.git.geert+renesas@glider.be
Signed-off-by: Vinod Koul <[email protected]>
|
|
Conversion to gpiod API done in commit 468ba54bd616 ("fec: convert
to gpio descriptor") clashed with gpiolib applying the same quirk to the
reset GPIO polarity (introduced in commit b02c85c9458c). This results in
the reset line being left active/device being left in reset state when
reset line is "active low".
Remove handling of 'phy-reset-active-high' property from the driver and
rely on gpiolib to apply needed adjustments to avoid ending up with the
double inversion/flipped logic.
Fixes: 468ba54bd616 ("fec: convert to gpio descriptor")
Signed-off-by: Dmitry Torokhov <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Conversion of the driver to gpiod API done in 468ba54bd616 ("fec:
convert to gpio descriptor") incorrectly made reset line mandatory and
resulted in aborting driver probe in cases where reset line was not
specified (note: this way of specifying PHY reset line is actually
deprecated).
Switch to using devm_gpiod_get_optional() and skip manipulating reset
line if it can not be located.
Fixes: 468ba54bd616 ("fec: convert to gpio descriptor")
Signed-off-by: Dmitry Torokhov <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reported-by: Marc Kleine-Budde <[email protected]>
Tested-by: Marc Kleine-Budde <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
A summary of the flags being set for various drivers is given below.
Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features
that can be turned off and on at runtime. This means that these flags
may be set and unset under RTNL lock protection by the driver. Hence,
READ_ONCE must be used by code loading the flag value.
Also, these flags are not used for synchronization against the availability
of XDP resources on a device. It is merely a hint, and hence the read
may race with the actual teardown of XDP resources on the device. This
may change in the future, e.g. operations taking a reference on the XDP
resources of the driver, and in turn inhibiting turning off this flag.
However, for now, it can only be used as a hint to check whether device
supports becoming a redirection target.
Turn 'hw-offload' feature flag on for:
- netronome (nfp)
- netdevsim.
Turn 'native' and 'zerocopy' features flags on for:
- intel (i40e, ice, ixgbe, igc)
- mellanox (mlx5).
- stmmac
- netronome (nfp)
Turn 'native' features flags on for:
- amazon (ena)
- broadcom (bnxt)
- freescale (dpaa, dpaa2, enetc)
- funeth
- intel (igb)
- marvell (mvneta, mvpp2, octeontx2)
- mellanox (mlx4)
- mtk_eth_soc
- qlogic (qede)
- sfc
- socionext (netsec)
- ti (cpsw)
- tap
- tsnep
- veth
- xen
- virtio_net.
Turn 'basic' (tx, pass, aborted and drop) features flags on for:
- netronome (nfp)
- cavium (thunder)
- hyperv.
Turn 'redirect_target' feature flag on for:
- amanzon (ena)
- broadcom (bnxt)
- freescale (dpaa, dpaa2)
- intel (i40e, ice, igb, ixgbe)
- ti (cpsw)
- marvell (mvneta, mvpp2)
- sfc
- socionext (netsec)
- qlogic (qede)
- mellanox (mlx5)
- tap
- veth
- virtio_net
- xen
Reviewed-by: Gerhard Engleder <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Co-developed-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Co-developed-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Marek Majtyka <[email protected]>
Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
net/core/gro.c
7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO")
b1a78b9b9886 ("net: add support for ipv4 big tcp")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When memory allocation fails in lynx_pcs_create() and it returns NULL,
there remains a dangling reference to the mdiodev returned by
of_mdio_find_device() which is leaked as soon as memac_pcs_create()
returns empty-handed.
Fixes: a7c2a32e7f22 ("net: fman: memac: Use lynx pcs driver")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Sean Anderson <[email protected]>
Acked-by: Madalin Bucur <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The driver can be trivially converted, as it only triggers the gpio
pin briefly to do a reset, and it already only supports DT.
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Conflicts:
drivers/net/ethernet/intel/ice/ice_main.c
418e53401e47 ("ice: move devlink port creation/deletion")
643ef23bd9dd ("ice: Introduce local var for readability")
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
drivers/net/ethernet/engleder/tsnep_main.c
3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues")
25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/[email protected]/
net/netfilter/nf_conntrack_proto_sctp.c
13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths")
f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.
Fixes: d678be1dc1ec ("dpaa2-eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.
Fixes: a1e031ffb422 ("dpaa_eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Acked-by: Camelia Groza <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The pMAC (ENETC_PFPMR_PMACE) is probably unconditionally enabled in the
enetc driver to allow RX of preemptible packets and not see them as
error frames. I don't know why TX preemption (ENETC_MMCSR_ME) is enabled
though. With no way to say which traffic classes are preemptible (all
are express by default), no preemptible frames would be transmitted
anyway.
Lastly, it may have been believed that the register write lock-step mode
(now deleted) needed the pMAC to be enabled at all times. I don't know
if that's true. However, I've checked that driver writes to PM1
registers do propagate through to the ENETC IP even when the pMAC is
disabled.
With such incomplete support for frame preemption, it's best to just
remove whatever exists right now and come with something more coherent
later.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently the enetc driver duplicates its writes to the PM0 registers
also to PM1, but it doesn't do this consistently - for example we write
to ENETC_PM0_MAXFRM but not to ENETC_PM1_MAXFRM.
Create enetc_port_mac_wr() which writes both the PM0 and PM1 register
with the same value (if frame preemption is supported on this port).
Also create enetc_port_mac_rd() which reads from PM0 - the assumption
being that PM1 contains just the same value.
This will be necessary when we enable the MAC Merge layer properly, and
the pMAC becomes operational.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The MWLM bit (MAC write lock-step mode) allows register writes to the
pMAC to be auto-performed whenever the corresponding eMAC register is
written by the driver. This allows their configuration to remain
in sync.
The driver has set this bit since the initial commit, but it doesn't do
anything, since the hardware feature doesn't work (and the bit has been
removed from more recent versions of the documentation).
The driver does attempt, more or less, to keep those MAC registers in
sync by writing the same value once to e.g. ENETC_PM0_CMD_CFG (eMAC) and
once to ENETC_PM1_CMD_CFG (pMAC). Because the lockstep feature doesn't
work, that's what it will stick to.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This is a preliminary patch which replaces the hardcoded 0x1000 present
in other PM1 (port MAC 1, aka pMAC) register definitions, which is an
offset to the PM0 (port MAC 0, aka eMAC) equivalent register.
This definition will be used in more places by future code.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Similar to other TSN features, query the Station Interface capability
register to see whether preemption is supported on this port or not.
On LS1028A, preemption is available on ports 0 and 2, but not on 1
and 3.
This will allow us in the future to write the pMAC registers only on the
ENETC ports where a pMAC actually exists.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The build system is complaining about the following:
enetc.o is added to multiple modules: fsl-enetc fsl-enetc-vf
enetc_cbdr.o is added to multiple modules: fsl-enetc fsl-enetc-vf
enetc_ethtool.o is added to multiple modules: fsl-enetc fsl-enetc-vf
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The page_pool_release_page was used when freeing rx buffers, and this
function just unmaps the page (if mapped) and does not recycle the page.
So after hundreds of down/up the eth0, the system will out of memory.
For more details, please refer to the following reproduce steps and
bug logs. To solve this issue and refer to the doc of page pool, the
page_pool_put_full_page should be used to replace page_pool_release_page.
Because this API will try to recycle the page if the page refcnt equal to
1. After testing 20000 times, the issue can not be reproduced anymore
(about testing 391 times the issue will occur on i.MX8MN-EVK before).
Reproduce steps:
Create the test script and run the script. The script content is as
follows:
LOOPS=20000
i=1
while [ $i -le $LOOPS ]
do
echo "TINFO:ENET $curface up and down test $i times"
org_macaddr=$(cat /sys/class/net/eth0/address)
ifconfig eth0 down
ifconfig eth0 hw ether $org_macaddr up
i=$(expr $i + 1)
done
sleep 5
if cat /sys/class/net/eth0/operstate | grep 'up';then
echo "TEST PASS"
else
echo "TEST FAIL"
fi
Bug detail logs:
TINFO:ENET up and down test 391 times
[ 850.471205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 853.535318] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 853.541694] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 870.590531] page_pool_release_retry() stalled pool shutdown 199 inflight 60 sec
[ 931.006557] page_pool_release_retry() stalled pool shutdown 199 inflight 120 sec
TINFO:ENET up and down test 392 times
[ 991.426544] page_pool_release_retry() stalled pool shutdown 192 inflight 181 sec
[ 1051.838531] page_pool_release_retry() stalled pool shutdown 170 inflight 241 sec
[ 1093.751217] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 1096.446520] page_pool_release_retry() stalled pool shutdown 308 inflight 60 sec
[ 1096.831245] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1096.839092] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1112.254526] page_pool_release_retry() stalled pool shutdown 103 inflight 302 sec
[ 1156.862533] page_pool_release_retry() stalled pool shutdown 308 inflight 120 sec
[ 1172.674516] page_pool_release_retry() stalled pool shutdown 103 inflight 362 sec
[ 1217.278532] page_pool_release_retry() stalled pool shutdown 308 inflight 181 sec
TINFO:ENET up and down test 393 times
[ 1233.086535] page_pool_release_retry() stalled pool shutdown 103 inflight 422 sec
[ 1277.698513] page_pool_release_retry() stalled pool shutdown 308 inflight 241 sec
[ 1293.502525] page_pool_release_retry() stalled pool shutdown 86 inflight 483 sec
[ 1338.110518] page_pool_release_retry() stalled pool shutdown 308 inflight 302 sec
[ 1353.918540] page_pool_release_retry() stalled pool shutdown 32 inflight 543 sec
[ 1361.179205] Qualcomm Atheros AR8031/AR8033 30be0000.ethernet-1:00: attached PHY driver (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)
[ 1364.255298] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1364.263189] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1371.998532] page_pool_release_retry() stalled pool shutdown 310 inflight 60 sec
[ 1398.530542] page_pool_release_retry() stalled pool shutdown 308 inflight 362 sec
[ 1414.334539] page_pool_release_retry() stalled pool shutdown 16 inflight 604 sec
[ 1432.414520] page_pool_release_retry() stalled pool shutdown 310 inflight 120 sec
[ 1458.942523] page_pool_release_retry() stalled pool shutdown 308 inflight 422 sec
[ 1474.750521] page_pool_release_retry() stalled pool shutdown 16 inflight 664 sec
TINFO:ENET up and down test 394 times
[ 1492.830522] page_pool_release_retry() stalled pool shutdown 310 inflight 181 sec
[ 1519.358519] page_pool_release_retry() stalled pool shutdown 308 inflight 483 sec
[ 1535.166545] page_pool_release_retry() stalled pool shutdown 2 inflight 724 sec
[ 1537.090278] eth_test2.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
[ 1537.101192] CPU: 3 PID: 2379 Comm: eth_test2.sh Tainted: G C 6.1.1+g56321e101aca #1
[ 1537.110249] Hardware name: NXP i.MX8MNano EVK board (DT)
[ 1537.115561] Call trace:
[ 1537.118005] dump_backtrace.part.0+0xe0/0xf0
[ 1537.122289] show_stack+0x18/0x40
[ 1537.125608] dump_stack_lvl+0x64/0x80
[ 1537.129276] dump_stack+0x18/0x34
[ 1537.132592] dump_header+0x44/0x208
[ 1537.136083] oom_kill_process+0x2b4/0x2c0
[ 1537.140097] out_of_memory+0xe4/0x594
[ 1537.143766] __alloc_pages+0xb68/0xd00
[ 1537.147521] alloc_pages+0xac/0x160
[ 1537.151013] __get_free_pages+0x14/0x40
[ 1537.154851] pgd_alloc+0x1c/0x30
[ 1537.158082] mm_init+0xf8/0x1d0
[ 1537.161228] mm_alloc+0x48/0x60
[ 1537.164368] alloc_bprm+0x7c/0x240
[ 1537.167777] do_execveat_common.isra.0+0x70/0x240
[ 1537.172486] __arm64_sys_execve+0x40/0x54
[ 1537.176502] invoke_syscall+0x48/0x114
[ 1537.180255] el0_svc_common.constprop.0+0xcc/0xec
[ 1537.184964] do_el0_svc+0x2c/0xd0
[ 1537.188280] el0_svc+0x2c/0x84
[ 1537.191340] el0t_64_sync_handler+0xf4/0x120
[ 1537.195613] el0t_64_sync+0x18c/0x190
[ 1537.199334] Mem-Info:
[ 1537.201620] active_anon:342 inactive_anon:10343 isolated_anon:0
[ 1537.201620] active_file:54 inactive_file:112 isolated_file:0
[ 1537.201620] unevictable:0 dirty:0 writeback:0
[ 1537.201620] slab_reclaimable:2620 slab_unreclaimable:7076
[ 1537.201620] mapped:1489 shmem:2473 pagetables:466
[ 1537.201620] sec_pagetables:0 bounce:0
[ 1537.201620] kernel_misc_reclaimable:0
[ 1537.201620] free:136672 free_pcp:96 free_cma:129241
[ 1537.240419] Node 0 active_anon:1368kB inactive_anon:41372kB active_file:216kB inactive_file:5052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB s
[ 1537.271422] Node 0 DMA free:541636kB boost:0kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB active_anon:1368kB inactive_anon:41372kB actiB
[ 1537.300219] lowmem_reserve[]: 0 0 0 0
[ 1537.303929] Node 0 DMA: 1015*4kB (UMEC) 743*8kB (UMEC) 417*16kB (UMEC) 235*32kB (UMEC) 116*64kB (UMEC) 25*128kB (UMEC) 4*256kB (UC) 2*512kB (UC) 0*1024kBB
[ 1537.323938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1537.332708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[ 1537.341292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1537.349776] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[ 1537.358087] 2939 total pagecache pages
[ 1537.361876] 0 pages in swap cache
[ 1537.365229] Free swap = 0kB
[ 1537.368147] Total swap = 0kB
[ 1537.371065] 516096 pages RAM
[ 1537.373959] 0 pages HighMem/MovableOnly
[ 1537.377834] 17302 pages reserved
[ 1537.381103] 163840 pages cma reserved
[ 1537.384809] 0 pages hwpoisoned
[ 1537.387902] Tasks state (memory values in pages):
[ 1537.392652] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 1537.401356] [ 201] 993 201 1130 72 45056 0 0 rpcbind
[ 1537.409772] [ 202] 0 202 4529 1640 77824 0 -250 systemd-journal
[ 1537.418861] [ 222] 0 222 4691 801 69632 0 -1000 systemd-udevd
[ 1537.427787] [ 248] 994 248 20914 130 65536 0 0 systemd-timesyn
[ 1537.436884] [ 497] 0 497 620 31 49152 0 0 atd
[ 1537.444938] [ 500] 0 500 854 77 53248 0 0 crond
[ 1537.453165] [ 503] 997 503 1470 160 49152 0 -900 dbus-daemon
[ 1537.461908] [ 505] 0 505 633 24 40960 0 0 firmwared
[ 1537.470491] [ 513] 0 513 2507 180 61440 0 0 ofonod
[ 1537.478800] [ 514] 990 514 69640 137 81920 0 0 parsec
[ 1537.487120] [ 533] 0 533 599 39 40960 0 0 syslogd
[ 1537.495518] [ 534] 0 534 4546 148 65536 0 0 systemd-logind
[ 1537.504560] [ 535] 0 535 690 24 45056 0 0 tee-supplicant
[ 1537.513564] [ 540] 996 540 2769 168 61440 0 0 systemd-network
[ 1537.522680] [ 566] 0 566 3878 228 77824 0 0 connmand
[ 1537.531168] [ 645] 998 645 1538 133 57344 0 0 avahi-daemon
[ 1537.540004] [ 646] 998 646 1461 64 57344 0 0 avahi-daemon
[ 1537.548846] [ 648] 992 648 781 41 45056 0 0 rpc.statd
[ 1537.557415] [ 650] 64371 650 590 23 45056 0 0 ninfod
[ 1537.565754] [ 653] 61563 653 555 24 45056 0 0 rdisc
[ 1537.573971] [ 655] 0 655 374569 2999 290816 0 -999 containerd
[ 1537.582621] [ 658] 0 658 1311 20 49152 0 0 agetty
[ 1537.590922] [ 663] 0 663 1529 97 49152 0 0 login
[ 1537.599138] [ 666] 0 666 3430 202 69632 0 0 wpa_supplicant
[ 1537.608147] [ 667] 0 667 2344 96 61440 0 0 systemd-userdbd
[ 1537.617240] [ 677] 0 677 2964 314 65536 0 100 systemd
[ 1537.625651] [ 679] 0 679 3720 646 73728 0 100 (sd-pam)
[ 1537.634138] [ 687] 0 687 1289 403 45056 0 0 sh
[ 1537.642108] [ 789] 0 789 970 93 45056 0 0 eth_test2.sh
[ 1537.650955] [ 2355] 0 2355 2346 94 61440 0 0 systemd-userwor
[ 1537.660046] [ 2356] 0 2356 2346 94 61440 0 0 systemd-userwor
[ 1537.669137] [ 2358] 0 2358 2346 95 57344 0 0 systemd-userwor
[ 1537.678258] [ 2379] 0 2379 970 93 45056 0 0 eth_test2.sh
[ 1537.687098] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/[email protected],tas0
[ 1537.703009] Out of memory: Killed process 679 ((sd-pam)) total-vm:14880kB, anon-rss:2584kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_ad0
[ 1553.246526] page_pool_release_retry() stalled pool shutdown 310 inflight 241 sec
Fixes: 95698ff6177b ("net: fec: using page pool to manage RX buffers")
Signed-off-by: Wei Fang <[email protected]>
Reviewed-by: shenwei wang <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
drivers/net/ipa/ipa_interrupt.c
drivers/net/ipa/ipa_interrupt.h
9ec9b2a30853 ("net: ipa: disable ipa interrupt during suspend")
8e461e1f092b ("net: ipa: introduce ipa_interrupt_enable()")
d50ed3558719 ("net: ipa: enable IPA interrupt handlers separate from registration")
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Deciding if to probe of PHYs using C45 is now determine by if the bus
provides the C45 read method. This makes probe_capabilities redundant
so remove it.
Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: Michael Walle <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Acked-by: Andrew Jeffery <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
napi_synchronize() from enetc_stop() waits until the softirq has
finished execution and no longer wants to be rescheduled. However under
high traffic load, this will never happen, and the interface can never
be closed.
The problem is the fact that the NAPI poll routine is written to update
the consumer index which makes the device want to put more buffers in
the RX ring, which restarts the madness again.
Browsing around, it seems that some drivers like i40e keep a bit
(__I40E_VSI_DOWN) which they use as communication between the control
path and the data path. But that isn't my first choice, because
complications ensue - since the enetc hardirq may trigger while we are
in a theoretical ENETC_DOWN state, it may happen that enetc_msix() masks
it, but enetc_poll() never unmasks it. To prevent a stall in that case,
one would need to schedule all NAPI instances when ENETC_DOWN gets
cleared, to process what's pending.
I find it more desirable for the control path - enetc_stop() - to just
quiesce the RX ring and let the softirq finish what remains there,
without any explicit communication, just by making hardware not provide
any more packets.
This seems possible with the Enable bit of the RX BD ring (RBaMR[EN]).
I can't seem to find an exact definition of what this bit does, but when
the RX ring is disabled, the port seems to no longer update the producer
index, and not react to software updates of the consumer index.
In fact, the RBaMR[EN] bit is already toggled by the driver, but too
late for what we want:
enetc_close()
-> enetc_stop()
-> napi_synchronize()
-> enetc_clear_bdrs()
-> enetc_clear_rxbdr()
The enetc_clear_bdrs() function contains not only logic to disable the
RX and TX rings, but also logic to wait for the TX ring stop being busy.
We split enetc_clear_bdrs() into enetc_disable_bdrs() and
enetc_wait_bdrs(). One needs to run before napi_synchronize() and the
other after (NAPI also processes TX completions, so we maximize our
chances of not waiting for the ENETC_TBSR_BUSY bit - unless a packet is
stuck for some reason, ofc).
We also split off enetc_enable_bdrs() from enetc_setup_bdrs(), and call
this from the mirror position in enetc_start() compared to enetc_stop(),
i.e. right before netif_tx_start_all_queues().
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|