Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2024-08-29
HW-Managed Flow Steering in mlx5 driver
Yevgeny Kliteynik says:
=======================
1. Overview
-----------
ConnectX devices support packet matching, modification, and redirection.
This functionality is referred as Flow Steering.
To configure a steering rule, the rule is written to the device-owned
memory. This memory is accessed and cached by the device when processing
a packet.
The first implementation of Flow Steering was done in FW, and it is
referred in the mlx5 driver as Device-Managed Flow Steering (DMFS).
Later we introduced SW-managed Flow Steering (SWS or SMFS), where the
driver is writing directly to the device's configuration memory (ICM)
through RC QP using RDMA operations (RDMA-read and RDAM-write), thus
achieving higher rates of rule insertion/deletion.
Now we introduce a new flow steering implementation: HW-Managed Flow
Steering (HWS or HMFS).
In this new approach, the driver is configuring steering rules directly
to the HW using the WQs with a special new type of WQE. This way we can
reach higher rule insertion/deletion rate with much lower CPU utilization
compared to SWS.
The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
- HW calculates CRC for each entry
- HW handles tree hash collisions
- HW & FW manage objects refcount
+ HW keeps cache coherency:
- HW provides tree access locking and synchronization
- HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
- Dedicated HW components that handle insertion
2. Performance
--------------
Measuring Connection Tracking with simple IPv4 flows w/o NAT, we
are able to get ~5 times more flows offloaded per second using HWS.
3. Configuration
----------------
The enablement of HWS mode in eswitch manager is done using the same
devlink param that is already used for switching between FW-managed
steering and SW-managed steering modes:
# devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs
4. Upstream Submission
----------------------
HWS support consists of 3 main components:
+ Steering:
- The lower layer that exposes HWS API to upper layers and implements
all the management of flow steering building blocks
+ FS-Core
- Implementation of fs_hws layer to enable fs_core to use HWS instead
of FW or SW steering
- Create HW steering action pools to utilize the ability of HWS to
share steering actions among different rules
- Add support for configuring HWS mode through devlink command,
similar to configuring SWS mode
+ Connection Tracking
- Implementation of CT support for HW steering
- Hooks up the CT ops for the new steering mode and uses the HWS API
to implement connection tracking.
Because of the large number of patches, we need to perform the submission
in several separate patch series. This series is the first submission that
lays the ground work for the next submissions, where an actual user of HWS
will be added.
5. Patches in this series
-------------------------
This patch series contains implementation of the first bullet from above.
=======================
* tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: HWS, added API and enabled HWS support
net/mlx5: HWS, added send engine and context handling
net/mlx5: HWS, added debug dump and internal headers
net/mlx5: HWS, added backward-compatible API handling
net/mlx5: HWS, added memory management handling
net/mlx5: HWS, added vport handling
net/mlx5: HWS, added modify header pattern and args handling
net/mlx5: HWS, added FW commands handling
net/mlx5: HWS, added matchers functionality
net/mlx5: HWS, added definers handling
net/mlx5: HWS, added rules handling
net/mlx5: HWS, added tables handling
net/mlx5: HWS, added actions handling
net/mlx5: Added missing definitions in preparation for HW Steering
net/mlx5: Added missing mlx5_ifc definition for HW Steering
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-09-10
1) Remove an unneeded WARN_ON on packet offload.
From Patrisious Haddad.
2) Add a copy from skb_seq_state to buffer function.
This is needed for the upcomming IPTFS patchset.
From Christian Hopps.
3) Spelling fix in xfrm.h.
From Simon Horman.
4) Speed up xfrm policy insertions.
From Florian Westphal.
5) Add and revert a patch to support xfrm interfaces
for packet offload. This patch was just half cooked.
6) Extend usage of the new xfrm_policy_is_dead_or_sk helper.
From Florian Westphal.
7) Update comments on sdb and xfrm_policy.
From Florian Westphal.
8) Fix a null pointer dereference in the new policy insertion
code From Florian Westphal.
9) Fix an uninitialized variable in the new policy insertion
code. From Nathan Chancellor.
* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
xfrm: policy: fix null dereference
Revert "xfrm: add SA information to the offloaded packet"
xfrm: minor update to sdb and xfrm_policy comments
xfrm: policy: use recently added helper in more places
xfrm: add SA information to the offloaded packet
xfrm: policy: remove remaining use of inexact list
xfrm: switch migrate to xfrm_policy_lookup_bytype
xfrm: policy: don't iterate inexact policies twice at insert time
selftests: add xfrm policy insertion speed test script
xfrm: Correct spelling in xfrm.h
net: add copy from skb_seq_state to buffer function
xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Michael Chan says:
====================
bnxt_en: MSIX improvements
This patchset makes some improvements related to MSIX. The first
patch adjusts the default MSIX vectors assigned for RoCE. On the
PF, the number of MSIX is increased to 64 from the current 9. The
second patch allocates additional MSIX vectors ahead of time when
changing ethtool channels if dynamic MSIX is supported. The 3rd
patch makes sure that the IRQ name is not truncated.
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The name field of struct bnxt_irq is written using snprintf in
bnxt_setup_msix(). Make the field large enough to fit the maximal
formatted string to prevent truncation. Truncated IRQ names are
less meaningful to the user. For example, "enp4s0f0np0-TxRx-0"
gets truncated to "enp4s0f0np0-TxRx-" with the existing code.
Make sure we have space for the extra characters added to the IRQ
names:
- the characters introduced by the static format string: hyphens
- the maximal static substituted ring type string: "TxRx"
- the maximum length of an integer formatted as a string, even
though reasonable ring numbers would never be as long as this.
Signed-off-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
bnxt_check_rings() is called to ensure that we have the hardware ring
resources before committing to reinitialize with the new number of
rings. MSIX vectors are never checked at this point, because up
until recently we must first disable MSIX before we can allocate the
new set of MSIX vectors.
Now that we support dynamic MSIX allocation, check to make sure we
can dynamically allocate the new MSIX vectors as the last step in
bnxt_check_rings() if dynamic MSIX is supported.
For example, the IOMMU group may limit the number of MSIX vectors
for the device. With this patch, the ring change will fail more
gracefully when there is not enough MSIX vectors.
It is also better to move bnxt_check_rings() to be called as the last
step when changing ethtool rings.
Reviewed-by: Somnath Kotur <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
If RocE is supported on the device, set the number of RoCE MSIX vectors
to the number of online CPUs + 1 and capped at these maximums:
VF: 2
NPAR: 5
PF: 64
For the PF, the maximum is now increased from the previous value
of 9 to get better performance for kernel applications.
Remove the unnecessary check for BNXT_FLAG_ROCE_CAP.
bnxt_set_dflt_ulp_msix() will only be called if the flag is set.
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The "amlogic,tx-delay-ns" property schema has unnecessary type reference
as it's a standard unit suffix, and the constraints are in freeform
text rather than schema.
Signed-off-by: Rob Herring (Arm) <[email protected]>
Reviewed-by: Martin Blumenstingl <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Sean Anderson says:
====================
net: xilinx: axienet: Partial checksum offload improvements
Partial checksum offload is not always used when it could be.
Enable it in more cases.
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The partial rx checksum feature computes a checksum over the entire
packet, regardless of the L3 protocol. Remove the check for IPv4.
Additionally, testing with csum.py (from kselftests) shows no anomalies
with 64-byte packets, so we can remove that check as well.
Signed-off-by: Sean Anderson <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When it is supported by hardware, we enable receive checksum offload
unconditionally. Update features to reflect this.
Signed-off-by: Sean Anderson <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Partial tx chechsumming is completely generic and does not depend on the
L3/L4 protocol. Signal this to the net subsystem by enabling the
more-generic offload feature (instead of restricting ourselves to
TCP/UDP over IPv4 checksumming only like is necessary with full
checksumming).
Signed-off-by: Sean Anderson <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
These variables are set but never used. Remove them.
Signed-off-by: Sean Anderson <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Radhey Shyam Pandey <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There is a spelling mistake in the struct field tx_underun, rename
it to tx_underrun.
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There is a spelling mistake in the struct field tx_underun, rename
it to tx_underrun.
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Heiner Kallweit <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
sch_cake uses a cache of the first 16 values of the inverse square root
calculation for the Cobalt AQM to save some cycles on the fast path.
This cache is populated when the qdisc is first loaded, but there's
really no reason why it can't just be pre-populated. So change it to be
pre-populated with constants, which also makes it possible to constify
it.
This gives a modest space saving for the module (not counting debug data):
.text: -224 bytes
.rodata: +80 bytes
.bss: -64 bytes
Total: -192 bytes
Signed-off-by: Dave Taht <[email protected]>
[ fixed up comment, rewrote commit message ]
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Remove magic number 7 by introducing a GENMASK macro instead.
Remove magic number 0x80 by using the BIT macro instead.
Signed-off-by: Pieter Van Trappen <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
'net-timestamp-introduce-a-flag-to-filter-out-rx-software-and-hardware-report'
Jason Xing says:
====================
net-timestamp: introduce a flag to filter out rx software and hardware report
When one socket is set SOF_TIMESTAMPING_RX_SOFTWARE which means the
whole system turns on the netstamp_needed_key button, other sockets
that only have SOF_TIMESTAMPING_SOFTWARE will be affected and then
print the rx timestamp information even without setting
SOF_TIMESTAMPING_RX_SOFTWARE generation flag.
How to solve it without breaking users?
We introduce a new flag named SOF_TIMESTAMPING_OPT_RX_FILTER. Using
it together with SOF_TIMESTAMPING_SOFTWARE can stop reporting the
rx software timestamp.
Similarly, we also filter out the hardware case where one process
enables the rx hardware generation flag, then another process only
passing SOF_TIMESTAMPING_RAW_HARDWARE gets the timestamp. So we can set
both SOF_TIMESTAMPING_RAW_HARDWARE and SOF_TIMESTAMPING_OPT_RX_FILTER
to stop reporting rx hardware timestamp after this patch applied.
v6: https://lore.kernel.org/[email protected]
v5: https://lore.kernel.org/[email protected]
v4: https://lore.kernel.org/[email protected]
v3: https://lore.kernel.org/[email protected]
v2: https://lore.kernel.org/[email protected]
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Test a few possible cases where we use SOF_TIMESTAMPING_OPT_RX_FILTER
with software or hardware report/generation flag.
Signed-off-by: Jason Xing <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive
path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter
out rx software timestamp report, especially after a process turns on
netstamp_needed_key which can time stamp every incoming skb.
Previously, we found out if an application starts first which turns on
netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE
could also get rx timestamp. Now we handle this case by introducing this
new flag without breaking users.
Quoting Willem to explain why we need the flag:
"why a process would want to request software timestamp reporting, but
not receive software timestamp generation. The only use I see is when
the application does request
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE."
Similarly, this new flag could also be used for hardware case where we
can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive
hardware receive timestamp.
Another thing about errqueue in this patch I have a few words to say:
In this case, we need to handle the egress path carefully, or else
reporting the tx timestamp will fail. Egress path and ingress path will
finally call sock_recv_timestamp(). We have to distinguish them.
Errqueue is a good indicator to reflect the flow direction.
Suggested-by: Willem de Bruijn <[email protected]>
Signed-off-by: Jason Xing <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
SOF_TIMESTAMPING_RAW_HARDWARE is a report flag which passes the
timestamps generated by either SOF_TIMESTAMPING_TX_HARDWARE or
SOF_TIMESTAMPING_RX_HARDWARE to the userspace all the time.
So let us revise the doc here.
Link: https://lore.kernel.org/all/[email protected]/
Suggested-by: Willem de Bruijn <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Jason Xing <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Furong Xu says:
====================
net: stmmac: FPE via ethtool + tc
Move the Frame Preemption(FPE) over to the new standard API which uses
ethtool-mm/tc-mqprio/tc-taprio.
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
ethtool --show-mm can get real-time state of FPE.
fpe_irq_status logs should keep quiet.
tc-taprio can always query driver state, delete unbalanced logs.
Signed-off-by: Furong Xu <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://patch.msgid.link/39943d7967f291674a97ef0572878aca273087e9.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
tc-taprio can select whether traffic classes are express or preemptible.
0) tc qdisc add dev eth1 parent root handle 100 taprio \
num_tc 4 \
map 0 1 2 3 2 2 2 2 2 2 2 2 2 2 2 3 \
queues 1@0 1@1 1@2 1@3 \
base-time 1000000000 \
sched-entry S 03 10000000 \
sched-entry S 0e 10000000 \
flags 0x2 fp P E E E
1) After some traffic tests, MAC merge layer statistics are all good.
Local device:
[ {
"ifname": "eth1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 0,
"MACMergeFragCountRx": 0,
"MACMergeFragCountTx": 17837,
"MACMergeHoldCount": 18639
}
} ]
Remote device:
[ {
"ifname": "end1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 17189,
"MACMergeFragCountRx": 17837,
"MACMergeFragCountTx": 0,
"MACMergeHoldCount": 0
}
} ]
Tested on DWMAC CORE 5.10a
Signed-off-by: Furong Xu <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://patch.msgid.link/0d21ae356fb3cab77337527e87d46748a4852055.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
tc-mqprio can select whether traffic classes are express or preemptible.
After some traffic tests, MAC merge layer statistics are all good.
Local device:
ethtool --include-statistics --json --show-mm eth1
[ {
"ifname": "eth1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 0,
"MACMergeFragCountRx": 0,
"MACMergeFragCountTx": 35105,
"MACMergeHoldCount": 0
}
} ]
Remote device:
ethtool --include-statistics --json --show-mm end1
[ {
"ifname": "end1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 35105,
"MACMergeFragCountRx": 35105,
"MACMergeFragCountTx": 0,
"MACMergeHoldCount": 0
}
} ]
Tested on DWMAC CORE 5.10a
Signed-off-by: Furong Xu <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://patch.msgid.link/592965ea93ed8240f0a1b8f6f8ebb8914f69419b.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Implement ethtool --show-mm and --set-mm callbacks.
NIC up/down, link up/down, suspend/resume, kselftest-ethtool_mm,
all tested okay.
Signed-off-by: Furong Xu <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://patch.msgid.link/06ed409314fe0ee37b78b800922f2c0cce762532.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop driver defined stmmac_fpe_state, and switch to common
ethtool_mm_verify_status for local TX verification status.
Local side and remote side verification processes are completely
independent. There is no reason at all to keep a local state and
a remote state.
Add a spinlock to avoid races among ISR, timer, link update
and register configuration.
This patch is based on Vladimir Oltean's proposal.
Vladimir Oltean says:
====================
In the INITIAL state, the timer sends MPACKET_VERIFY. Eventually the
stmmac_fpe_event_status() IRQ fires and advances the state to VERIFYING,
then rearms the timer after verify_time ms. If a subsequent IRQ comes in
and modifies the state to SUCCEEDED after getting MPACKET_RESPONSE, the
timer sees this. It must enable the EFPE bit now. Otherwise, it
decrements the verify_limit counter and tries again. Eventually it
moves the status to FAILED, from which the IRQ cannot move it anywhere
else, except for another stmmac_fpe_apply() call.
====================
Co-developed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Furong Xu <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://patch.msgid.link/151f86c8428eba967039718c6bf90a7d841e703b.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
ethtool --set-mm can trigger FPE verification process by calling
stmmac_fpe_send_mpacket, stmmac_fpe_handshake should be gone.
Signed-off-by: Furong Xu <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://patch.msgid.link/42018b1a15eb3ced567fd6a73798c7cd4e08799a.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
By moving the fpe_cfg field to the stmmac_priv data, stmmac_fpe_cfg
becomes platform-data eventually, instead of a run-time config.
Suggested-by: Serge Semin <[email protected]>
Signed-off-by: Furong Xu <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Link: https://patch.msgid.link/d9b3d7ecb308c5e39778a4c8ae9df288a2754379.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Was slightly misleading before, because printed is pointer to fwnode,
not to phy device, as placement in message suggested. Include header
for dev_dbg() declaration while at it.
Output before:
[ +0.001247] mdio_bus f802c000.ethernet-ffffffff: registered phy 2612f00a fwnode at address 3
Output after:
[ +0.001229] mdio_bus f802c000.ethernet-ffffffff: registered phy fwnode /ahb/apb/ethernet@f802c000/ethernet-phy@3 at address 3
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Alexander Dahl <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
In commit 48b6190a0042 ("net/smc: Limit SMC visits when handshake workqueue congested"),
we introduce a mechanism to put constraint on SMC connections visit
according to the pressure of SMC handshake process.
At that time, we believed that controlling the feature through netlink
was sufficient. However, most people have realized now that netlink is
not convenient in container scenarios, and sysctl is a more suitable
approach.
In addition, since commit 462791bbfa35 ("net/smc: add sysctl interface for SMC")
had introcuded smc_sysctl_net_init(), it is reasonable for us to
initialize limit_smc_hs in it instead of initializing it in
smc_pnet_net_int().
Signed-off-by: D. Wythe <[email protected]>
Reviewed-by: Wen Gu <[email protected]>
Reviewed-by: Jan Karcher <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This adds support to show firmware version information for both stored and
running firmware versions. The version and commit is displayed separately
to aid monitoring tools which only care about the version.
Example output:
# devlink dev info
pci/0000:01:00.0:
driver fbnic
serial_number 88-25-08-ff-ff-01-50-92
versions:
running:
fw 24.07.15-017
fw.commit h999784ae9df0
fw.bootloader 24.07.10-000
fw.bootloader.commit hfef3ac835ce7
stored:
fw 24.07.24-002
fw.commit hc9d14a68b3f2
fw.bootloader 24.07.22-000
fw.bootloader.commit h922f8493eb96
fw.undi 01.00.03-000
Signed-off-by: Lee Trager <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Daniel Machon says:
====================
net: lan966x: use the newly introduced FDMA library
This patch series is the second of a 2-part series [1], that adds a new
common FDMA library for Microchip switch chips Sparx5 and lan966x. These
chips share the same FDMA engine, and as such will benefit from a common
library with a common implementation. This also has the benefit of
removing a lot of open-coded bookkeeping and duplicate code for the two
drivers.
In this second series, the FDMA library will be taken into use by the
lan966x switch driver.
###################
# Example of use: #
###################
- Initialize the rx and tx fdma structs with values for: number of
DCB's, number of DB's, channel ID, DB size (data buffer size), and
total size of the requested memory. Also provide two callbacks:
nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr.
- Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent().
- Initialize the DCB's with fdma_dcb_init().
- Add new DCB's with fdma_dcb_add().
- Free memory with fdma_free_phys() or fdma_free_coherent().
#####################
# Patch breakdown: #
#####################
Patch #1: select FDMA library for lan966x.
Patch #2: includes the fdma_api.h header and removes old symbols.
Patch #3: replaces old rx and tx variables with equivalent ones from the
fdma struct. Only the variables that can be changed without
breaking traffic is changed in this patch.
Patch #4: uses the library for allocation of rx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #5: uses the library for adding DCB's in the rx path.
Patch #6: uses the library for freeing rx buffers.
Patch #7: uses the library for allocation of tx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #8: uses the library for adding DCB's in the tx path.
Patch #9: uses the library helpers in the tx path.
Patch #10: ditch last_in_use variable and use library instead.
Patch #11: uses library helpers throughout.
Patch #12: refactor lan966x_fdma_reload() function.
[1] https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Daniel Machon <[email protected]>
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Now that we store everything in the fdma structs, refactor
lan966x_fdma_reload() to store and restore the entire struct.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The library provides helpers for a number of DCB and DB operations. Use
these throughout the code and remove the old ones.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This variable is used in the tx path to determine the last used DCB. The
library has the variable last_dcb for the exact same purpose. Ditch the
last_in_use variable throughout.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Use the fdma_dcb_add() function to add DCB's in the tx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: tx->dma, tx->dcbs and tx->curr_entry
with the equivalents from the FDMA struct.
- add lan966x_fdma_tx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Use the fdma_dcb_add() function to add DCB's in the rx path. This gets
rid of the open-coding of nextptr and dataptr handling and the functions
for adding DCB's.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: rx->dma, rx->dcbs and rx->last_entry
with the equivalents from the FDMA struct.
- make use of fdma->db_size for rx buffer size.
- add lan966x_fdma_rx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Replace the old rx and tx variables: channel_id, FDMA_DCB_MAX,
FDMA_RX_DCB_MAX_DBS, FDMA_TX_DCB_MAX_DBS, dcb_index and db_index with
the equivalents from the FDMA rx and tx structs. These variables are not
entangled in any buffer allocation and can therefore be replaced in
advance.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Include and use the new FDMA header, which now provides the required
masks and bit offsets for operating on the DCB's and DB's.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Select the newly introduced FDMA library.
Signed-off-by: Daniel Machon <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Brett Creeley says:
====================
ionic: convert Rx queue buffers to use page_pool
Our home-grown buffer management needs to go away and we need to play
nicely with the page_pool infrastructure. This patchset cleans up some
of our API use and converts the Rx traffic queues to use page_pool.
The first few patches are for tidying up things, then a small XDP
configuration refactor, adding page_pool support, and finally adding
support to hot swap an XDP program without having to reconfigure
anything.
The result is code that more closely follows current patterns, as well as
a either a performance boost or equivalent performance as seen with
iperf testing:
mss netio tx_pps rx_pps total_pps tx_bw rx_bw total_bw
---- ------- ---------- ---------- ----------- ------- ------- ----------
Before:
256 bidir 13,839,293 15,515,227 29,354,520 34 38 71
512 bidir 13,913,249 14,671,693 28,584,942 62 65 127
1024 bidir 13,006,189 13,695,413 26,701,602 109 115 224
1448 bidir 12,489,905 12,791,734 25,281,639 145 149 294
2048 bidir 9,195,622 9,247,649 18,443,271 148 149 297
4096 bidir 5,149,716 5,247,917 10,397,633 160 163 323
8192 bidir 3,029,993 3,008,882 6,038,875 179 179 358
9000 bidir 2,789,358 2,800,744 5,590,102 181 180 361
After:
256 bidir 21,540,037 21,344,644 42,884,681 52 52 104
512 bidir 23,170,014 19,207,260 42,377,274 103 85 188
1024 bidir 17,934,280 17,819,247 35,753,527 150 149 299
1448 bidir 15,242,515 14,907,030 30,149,545 167 174 341
2048 bidir 10,692,542 10,663,023 21,355,565 177 176 353
4096 bidir 6,024,977 6,083,580 12,108,557 187 180 367
8192 bidir 3,090,449 3,048,266 6,138,715 180 176 356
9000 bidir 2,859,146 2,864,226 5,723,372 178 180 358
v2: https://lore.kernel.org/[email protected]
v1: https://lore.kernel.org/[email protected]
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Using examples of other driver(s), add the ability to hot-swap an XDP
program without having to reconfigure the queues. To prevent the
q->xdp_prog to be read/written more than once use READ_ONCE() and
WRITE_ONCE() on the q->xdp_prog.
The q->xdp_prog was being checked in multiple different for loops in the
hot path. The change to allow xdp_prog hot swapping created the
possibility for many READ_ONCE(q->xdp_prog) calls during a single napi
callback. Refactor the Rx napi handling to allow a previous
READ_ONCE(q->xdp_prog) (or NULL for hwstamp_rxq) to be passed into the
relevant functions.
Also, move other Rx related hotpath handling into the newly created
ionic_rx_cq_service() function to reduce the scope of the xdp_prog
local variable and put all Rx handling in one function similar to Tx.
Signed-off-by: Brett Creeley <[email protected]>
Signed-off-by: Shannon Nelson <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Our home-grown buffer management needs to go away and we need
to be playing nicely with the page_pool infrastructure. This
converts the Rx traffic queues to use page_pool.
Also, since ionic_rx_buf_size() was removed, redefine
IONIC_PAGE_SIZE to account for IONIC_MAX_BUF_LEN being the
largest allowed buffer to prevent overflowing u16 variables,
which could happen when PAGE_SIZE is defined as >= 64KB.
include/linux/minmax.h:93:37: warning: conversion from 'long unsigned int' to 'u16' {aka 'short unsigned int'} changes value from '65536' to '0' [-Woverflow]
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Brett Creeley <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently when going to/from a NULL XDP program the driver uses
ionic_stop_queues_reconfig() and then ionic_start_queues_reconfig() in
order to re-register the xdp_rxq_info and re-init the queues. This is
fine until page_pool(s) are used in an upcoming patch.
In preparation for adding page_pool support make sure to completely
rebuild the queues when going to/from a NULL XDP program. Without this
change the call to mem_allocator_disconnect() never happens when going
to a NULL XDP program, which eventually results in
xdp_rxq_info_reg_mem_model() failing with -ENOSPC due to the mem_id_pool
ida having no remaining space.
Signed-off-by: Brett Creeley <[email protected]>
Signed-off-by: Shannon Nelson <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Instead of setting up and tearing down the rxq_info only when the XDP
program is loaded or unloaded, we will build the rxq_info whether or not
XDP is in use. This is the more common use pattern and better supports
future conversion to page_pool. Since the rxq_info wants the napi_id
we re-order things slightly to tie this into the queue init and deinit
functions where we do the add and delete of napi.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Brett Creeley <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We originally were using a per-interface xdp_prog variable to track
a loaded XDP program since we knew there would never be support for a
per-queue XDP program. With that, we only built the per queue rxq_info
struct when an XDP program was loaded and removed it on XDP program unload,
and used the pointer as an indicator in the Rx hotpath to know to how build
the buffers. However, that's really not the model generally used, and
makes a conversion to page_pool Rx buffer cacheing a little problematic.
This patch converts the driver to use the more common approach of using
a per-queue xdp_prog pointer to work out buffer allocations and need
for bpf_prog_run_xdp(). We jostle a couple of fields in the queue struct
in order to keep the new xdp_prog pointer in a warm cacheline.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Brett Creeley <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|