Age | Commit message (Collapse) | Author | Files | Lines |
|
With the 10us interval, we were seeing PTM transactions take around 12us.
Hardware team suggested this interval could be lowered to 1us which was
confirmed with PCIe sniffer. With the 1us interval, PTM dialogs took
around 2us.
Suggested-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Sasha Neftin <[email protected]>
Tested-by: Muhammad Husaini Zulkifli <[email protected]>
Reviewed-by: Muhammad Husaini Zulkifli <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add support for using the four sets of timestamping registers that
i225/i226 have available for TX.
In some workloads, where multiple applications request hardware
transmission timestamps, it was possible that some of those requests
were denied because the only in use register was already occupied.
This is also in preparation to future support for hardware
timestamping with multiple PTP domains. With multiple domains chances
of multiple TX timestamps being requested at the same time increase.
Before:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37
| responses | TX timestamp offset (ns)
rate clients | lost invalid basic xleave | min mean max stddev
1000 100 0.00% 0.00% 0.00% 100.00% +1 +41 +73 13
1500 150 0.00% 0.00% 0.00% 100.00% +9 +49 +87 15
2250 225 0.00% 0.00% 0.00% 100.00% +9 +42 +79 13
3375 337 0.00% 0.00% 0.00% 100.00% +11 +46 +81 13
5062 506 0.00% 0.00% 0.00% 100.00% +7 +44 +80 13
7593 759 0.00% 0.00% 0.00% 100.00% +9 +44 +79 12
11389 1138 0.00% 0.00% 0.00% 100.00% +14 +51 +87 13
17083 1708 0.00% 0.00% 0.00% 100.00% +1 +41 +80 14
25624 2562 0.00% 0.00% 0.00% 100.00% +11 +50 +5107 51
38436 3843 0.00% 0.00% 0.00% 100.00% -2 +36 +7843 38
57654 5765 0.00% 0.00% 0.00% 100.00% +4 +42 +10503 69
86481 8648 0.00% 0.00% 0.00% 100.00% +11 +54 +5492 65
129721 12972 0.00% 0.00% 0.00% 100.00% +31 +2680 +6942 2606
194581 16384 16.79% 0.00% 0.87% 82.34% +73 +4444 +15879 3116
291871 16384 35.05% 0.00% 1.53% 63.42% +188 +5381 +17019 3035
437806 16384 54.95% 0.00% 2.55% 42.50% +233 +6302 +13885 2846
After:
$ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37
| responses | TX timestamp offset (ns)
rate clients | lost invalid basic xleave | min mean max stddev
1000 100 0.00% 0.00% 0.00% 100.00% -20 +12 +43 13
1500 150 0.00% 0.00% 0.00% 100.00% -23 +18 +57 14
2250 225 0.00% 0.00% 0.00% 100.00% -2 +33 +67 13
3375 337 0.00% 0.00% 0.00% 100.00% +1 +38 +76 13
5062 506 0.00% 0.00% 0.00% 100.00% +9 +52 +93 14
7593 759 0.00% 0.00% 0.00% 100.00% +11 +47 +82 13
11389 1138 0.00% 0.00% 0.00% 100.00% -9 +27 +74 13
17083 1708 0.00% 0.00% 0.00% 100.00% -13 +25 +66 14
25624 2562 0.00% 0.00% 0.00% 100.00% -8 +28 +65 13
38436 3843 0.00% 0.00% 0.00% 100.00% -13 +28 +69 13
57654 5765 0.00% 0.00% 0.00% 100.00% -11 +32 +71 14
86481 8648 0.00% 0.00% 0.00% 100.00% +2 +44 +83 14
129721 12972 15.36% 0.00% 0.35% 84.29% -2 +2248 +22907 4252
194581 16384 42.98% 0.00% 1.98% 55.04% -4 +5278 +65039 5856
291871 16384 54.33% 0.00% 2.21% 43.46% -3 +6306 +22608 5665
We can see that with 4 registers, as expected, we are able to handle a
increasing number of requests more consistently, but as soon as all
registers are in use, the decrease in quality of service happens in a
sharp step.
Signed-off-by: Vinicius Costa Gomes <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Muhammad Husaini Zulkifli <[email protected]>
Reviewed-by: Kurt Kanzenbach <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
mlx5 MACsec RoCEv2 support
From Patrisious:
This series extends previously added MACsec offload support
to cover RoCE traffic either.
In order to achieve that, we need configure MACsec with offload between
the two endpoints, like below:
REMOTE_MAC=10:70:fd:43:71:c0
* ip addr add 1.1.1.1/16 dev eth2
* ip link set dev eth2 up
* ip link add link eth2 macsec0 type macsec encrypt on
* ip macsec offload macsec0 mac
* ip macsec add macsec0 tx sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16
* ip macsec add macsec0 rx port 1 address $REMOTE_MAC
* ip macsec add macsec0 rx port 1 address $REMOTE_MAC sa 0 pn 1 on key 01 ead3664f508eb06c40ac7104cdae4ce5
* ip addr add 10.1.0.1/16 dev macsec0
* ip link set dev macsec0 up
And in a similar manner on the other machine, while noting the keys order
would be reversed and the MAC address of the other machine.
RDMA traffic is separated through relevant GID entries and in case
of IP ambiguity issue - meaning we have a physical GIDs and a MACsec
GIDs with the same IP/GID, we disable our physical GID in order
to force the user to only use the MACsec GID.
v0: https://lore.kernel.org/netdev/[email protected]/
* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion
net/mlx5: Add RoCE MACsec steering infrastructure in core
net/mlx5: Configure MACsec steering for ingress RoCEv2 traffic
net/mlx5: Configure MACsec steering for egress RoCEv2 traffic
IB/core: Reorder GID delete code for RoCE
net/mlx5: Add MACsec priorities in RDMA namespaces
RDMA/mlx5: Implement MACsec gid addition and deletion
net/mlx5: Maintain fs_id xarray per MACsec device inside macsec steering
net/mlx5: Remove netdevice from MACsec steering
net/mlx5e: Move MACsec flow steering and statistics database from ethernet to core
net/mlx5e: Rename MACsec flow steering functions/parameters to suit core naming style
net/mlx5: Remove dependency of macsec flow steering on ethernet
net/mlx5e: Move MACsec flow steering operations to be used as core library
macsec: add functions to get macsec real netdevice and check offload
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
include/net/inet_sock.h
f866fbc842de ("ipv4: fix data-races around inet->inet_id")
c274af224269 ("inet: introduce inet->inet_flags")
https://lore.kernel.org/all/[email protected]/
Adjacent changes:
drivers/net/bonding/bond_alb.c
e74216b8def3 ("bonding: fix macvlan over alb bond support")
f11e5bd159b0 ("bonding: support balance-alb with openvswitch")
drivers/net/ethernet/broadcom/bgmac.c
d6499f0b7c7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()")
23a14488ea58 ("net: bgmac: Fix return value check for fixed_phy_register()")
drivers/net/ethernet/broadcom/genet/bcmmii.c
32bbe64a1386 ("net: bcmgenet: Fix return value check for fixed_phy_register()")
acf50d1adbf4 ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()")
net/sctp/socket.c
f866fbc842de ("ipv4: fix data-races around inet->inet_id")
b09bde5c3554 ("inet: move inet->mc_loop to inet->inet_frags")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-08-22
1) Patches #1..#13 From Jiri:
The goal of this patchset is to make the SF code cleaner.
Benefit from previously introduced devlink_port struct containerization
to avoid unnecessary lookups in devlink port ops.
Also, benefit from the devlink locking changes and avoid unnecessary
reference counting.
2) Patches #14,#15:
Add ability to configure proto both UDP and TCP selectors in RX and TX
directions.
* tag 'mlx5-updates-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Support IPsec upper TCP protocol selector
net/mlx5e: Support IPsec upper protocol selector field offload for RX
net/mlx5: Store vport in struct mlx5_devlink_port and use it in port ops
net/mlx5: Check vhca_resource_manager capability in each op and add extack msg
net/mlx5: Relax mlx5_devlink_eswitch_get() return value checking
net/mlx5: Return -EOPNOTSUPP in mlx5_devlink_port_fn_migratable_set() directly
net/mlx5: Reduce number of vport lookups passing vport pointer instead of index
net/mlx5: Embed struct devlink_port into driver structure
net/mlx5: Don't register ops for non-PF/VF/SF port and avoid checks in ops
net/mlx5: Remove no longer used mlx5_esw_offloads_sf_vport_enable/disable()
net/mlx5: Introduce mlx5_eswitch_load/unload_sf_vport() and use it from SF code
net/mlx5: Allow mlx5_esw_offloads_devlink_port_register() to register SFs
net/mlx5: Push devlink port PF/VF init/cleanup calls out of devlink_port_register/unregister()
net/mlx5: Push out SF devlink port init and cleanup code to separate helpers
net/mlx5: Rework devlink port alloc/free into init/cleanup
====================
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Systems having 4 GiB of RAM and more require DMA addressing beyond the
current 32-bit limit. Starting from MT7988 the hardware now supports
36-bit DMA addressing, let's use that new capability in the driver to
avoid running into swiotlb on systems with 4 GiB of RAM or more.
Signed-off-by: Daniel Golle <[email protected]>
Link: https://lore.kernel.org/r/95b919c98876c9e49761e44662e7c937479eecb8.1692721443.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
MT7981, MT7986 and MT7988 come with in-SoC SRAM dedicated for Ethernet
DMA rings. Support using the SRAM without breaking existing device tree
bindings, ie. only new SoC starting from MT7988 will have the SRAM
declared as additional resource in device tree. For MT7981 and MT7986
an offset on top of the main I/O base is used.
Signed-off-by: Daniel Golle <[email protected]>
Link: https://lore.kernel.org/r/e45e0f230c63ad58869e8fe35b95a2fb8925b625.1692721443.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add bits needed to reset the frame engine on MT7988.
Fixes: 445eb6448ed3 ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC")
Signed-off-by: Daniel Golle <[email protected]>
Link: https://lore.kernel.org/r/89b6c38380e7a3800c1362aa7575600717bc7543.1692721443.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
More register macros need to be adjusted for the 3rd GMAC on MT7988.
Account for added bit in SYSCFG0_SGMII_MASK.
Fixes: 445eb6448ed3 ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC")
Signed-off-by: Daniel Golle <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/1c8da012e2ca80939906d85f314138c552139f0f.1692721443.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
As we already added the exception tracing for XDP_TX, I think it is
necessary to add the exception tracing for other XDP actions, such
as XDP_REDIRECT, XDP_ABORTED and unknown error actions.
Signed-off-by: Wei Fang <[email protected]>
Suggested-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use the standard error pointer macro to shorten the code and simplify.
Signed-off-by: Yu Liao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When building for power4, newer binutils don't recognise the "dcbfl"
extended mnemonic.
dcbfl RA, RB is equivalent to dcbf RA, RB, 1.
Switch to "dcbf" to avoid the build error.
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add check for pf->vf not being NULL before dereferencing
pf->vf[vsi->vf_id] in updating VSI filter sync.
Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
in the condition for clearing promisc mode bit.
Fixes: c87c938f62d8 ("i40e: Add VF VLAN pruning")
Signed-off-by: Andrii Staikov <[email protected]>
Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
All callers of build_skb() (*) in bnxt are in NAPI context.
The budget checking is somewhat convoluted because in the shared
completion queue cases Rx packets are discarded by netpoll
by forcing an error (E). But that happens before skb allocation.
Only a call chain starting at __bnxt_poll_work() can lead to
an skb allocation and it checks budget (b).
* bnxt_rx_multi_page_skb
* bnxt_rx_skb
` bp->rx_skb_func
* bnxt_tpa_end
` bnxt_rx_pkt
E bnxt_force_rx_discard
E bnxt_poll_nitroa0
b __bnxt_poll_work
Use napi_build_skb() to take advantage of the skb cache.
In iperf tests with HW-GRO enabled it barely makes a difference
but in cases where HW-GRO is not as effective (or disabled) it
can give even a >10% boost (20.7Gbps -> 23.1Gbps).
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After the conversion to use the auxiliary bus, the custom device
management is not needed anymore and can be deleted.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use the auxiliary bus to perform device management of the infiniband
part of the mlx4 driver.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use the auxiliary bus to perform device management of the ethernet part
of the mlx4 driver.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add an auxiliary virtual bus to model the mlx4 driver structure. The
code is added along the current custom device management logic.
Subsequent patches switch mlx4_en and mlx4_ib to the auxiliary bus and
the old interface is then removed.
Structure mlx4_priv gains a new adev dynamic array to keep track of its
auxiliary devices. Access to the array is protected by the global
mlx4_intf mutex.
Functions mlx4_register_device() and mlx4_unregister_device() are
updated to expose auxiliary devices on the bus in order to load mlx4_en
and/or mlx4_ib. Functions mlx4_register_auxiliary_driver() and
mlx4_unregister_auxiliary_driver() are added to substitute
mlx4_register_interface() and mlx4_unregister_interface(), respectively.
Function mlx4_do_bond() is adjusted to walk over the adev array and
re-adds a specific auxiliary device if its driver sets the
MLX4_INTFF_BONDING flag.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The mlx4_core driver has a logic that allows a sub-driver to set the
MLX4_INTFF_BONDING flag which then causes that function mlx4_do_bond()
asks the sub-driver to fully re-probe a device when its bonding
configuration changes.
Performing this operation is disallowed in mlx4_register_interface()
when it is detected that any mlx4 device is multifunction (SRIOV). The
code then resets MLX4_INTFF_BONDING in the driver flags.
Move this check directly into mlx4_do_bond(). It provides a better
separation as mlx4_core no longer directly modifies the sub-driver flags
and it will allow to get rid of explicitly keeping track of all mlx4
devices by the intf.c code when it is switched to an auxiliary bus.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Function mlx4_en_queue_bond_work() is used in mlx4_en to start a bond
reconfiguration. It gathers data about a new port map setting, takes
a reference on the netdev that triggered the change and queues a work
object on mlx4_en_priv.mdev.workqueue to perform the operation. The
scheduled work is mlx4_en_bond_work() which calls
mlx4_bond()/mlx4_unbond() and consequently mlx4_do_bond().
At the same time, function mlx4_change_port_types() in mlx4_core might
be invoked to change the port type configuration. As part of its logic,
it re-registers the whole device by calling mlx4_unregister_device(),
followed by mlx4_register_device().
The two operations can result in concurrent access to the data about
currently active interfaces on the device.
Functions mlx4_register_device() and mlx4_unregister_device() lock the
intf_mutex to gain exclusive access to this data. The current
implementation of mlx4_do_bond() doesn't do that which could result in
an unexpected behavior. An updated version of mlx4_do_bond() for use
with an auxiliary bus goes and locks the intf_mutex when accessing a new
auxiliary device array.
However, doing so can then result in the following deadlock:
* A two-port mlx4 device is configured as an Ethernet bond.
* One of the ports is changed from eth to ib, for instance, by writing
into a mlx4_port<x> sysfs attribute file.
* mlx4_change_port_types() is called to update port types. It invokes
mlx4_unregister_device() to unregister the device which locks the
intf_mutex and starts removing all associated interfaces.
* Function mlx4_en_remove() gets invoked and starts destroying its first
netdev. This triggers mlx4_en_netdev_event() which recognizes that the
configured bond is broken. It runs mlx4_en_queue_bond_work() which
takes a reference on the netdev. Removing the netdev now cannot
proceed until the work is completed.
* Work function mlx4_en_bond_work() gets scheduled. It calls
mlx4_unbond() -> mlx4_do_bond(). The latter function tries to lock the
intf_mutex but that is not possible because it is held already by
mlx4_unregister_device().
This particular case could be possibly solved by unregistering the
mlx4_en_netdev_event() notifier in mlx4_en_remove() earlier, but it
seems better to decouple mlx4_en more and break this reference order.
Avoid then this scenario by recognizing that the bond reconfiguration
operates only on a mlx4_dev. The logic to queue and execute the bond
work can be moved into the mlx4_core driver. Only a reference on the
respective mlx4_dev object is needed to be taken during the work's
lifetime. This removes a call from mlx4_en that can directly result in
needing to lock the intf_mutex, it remains a privilege of the core
driver.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The mlx4_interface.activate callback was introduced in commit
79857cd31fe7 ("net/mlx4: Postpone the registration of net_device"). It
dealt with a situation when a netdev notifier received a NETDEV_REGISTER
event for a new net_device created by mlx4_en but the same device was
not yet visible to mlx4_get_protocol_dev(). The callback can be removed
now that mlx4_get_protocol_dev() is gone.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use a notifier to implement mlx4_dispatch_event() in preparation to
switch mlx4_en and mlx4_ib to be an auxiliary device.
A problem is that if the mlx4_interface.event callback was replaced with
something as mlx4_adrv.event then the implementation of
mlx4_dispatch_event() would need to acquire a lock on a given device
before executing this callback. That is necessary because otherwise
there is no guarantee that the associated driver cannot get unbound when
the callback is running. However, taking this lock is not possible
because mlx4_dispatch_event() can be invoked from the hardirq context.
Using an atomic notifier allows the driver to accurately record when it
wants to receive these events and solves this problem.
A handler registration is done by both mlx4_en and mlx4_ib at the end of
their mlx4_interface.add callback. This matches the current situation
when mlx4_add_device() would enable events for a given device
immediately after this callback, by adding the device on the
mlx4_priv.list.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Function mlx4_dispatch_event() takes an 'unsigned long' as its event
parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR),
a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit
integer (remaining events).
In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device,
the mlx4_interface.event callback is replaced with a notifier and
function mlx4_dispatch_event() gets updated to invoke
atomic_notifier_call_chain(). This requires forwarding the input 'param'
value from the former function to the latter. A problem is that the
notifier call takes 'void *' as its 'param' value, compared to
'unsigned long' used by mlx4_dispatch_event(). Re-passing the value
would need either punning it to 'void *' or passing down the address of
the input 'param'. Both approaches create a number of unnecessary casts.
Change instead the input 'param' of mlx4_dispatch_event() from
'unsigned long' to 'void *'. A mlx4_eqe pointer can be passed directly,
callers using an int value are adjusted to pass its address.
Signed-off-by: Petr Pavlu <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Rename the mlx4_en_dev.nb notifier_block member to netdev_nb in
preparation to add a mlx4 core notifier_block.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev()
and the associated mlx4_interface.get_dev callbacks. This is done in
preparation to use an auxiliary bus to model the mlx4 driver structure.
The change is motivated by the following situation:
* The mlx4_en interface is being initialized by mlx4_en_add() and
mlx4_en_activate().
* The latter activate function calls mlx4_en_init_netdev() ->
register_netdev() to register a new net_device.
* A netdev event NETDEV_REGISTER is raised for the device.
* The netdev notififier mlx4_ib_netdev_event() is called and it invokes
mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() ->
mlx4_en_get_netdev() [via mlx4_interface.get_dev].
This chain creates a problem when mlx4_en gets switched to be an
auxiliary driver. It contains two device calls which would both need to
take a respective device lock.
Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer
call mlx4_get_protocol_dev() but instead to utilize the information
passed in net_device.parent and net_device.dev_port. This data is
sufficient to determine that an updated port is one that the mlx4_ib
driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to
date.
Following that, update mlx4_ib_get_netdev() to also not call
mlx4_get_protocol_dev() and instead scan all current netdevs to find
find a matching one. Note that mlx4_ib_get_netdev() is called early from
ib_register_device() and cannot use data tracked in
mlx4_ib_dev.iboe.netdevs which is not at that point yet set.
Finally, remove function mlx4_get_protocol_dev() and the
mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they
became unused.
Signed-off-by: Petr Pavlu <[email protected]>
Tested-by: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 8cd160a29415 ("qede: convert to new udp_tunnel_nic infra")
removed qede_udp_tunnel_{add,del}() but not the declarations.
Commit 0ebcebbef1cc ("qed: Read device port count from the shmem")
removed qed_device_num_engines() but not its declaration.
Commit 1e128c81290a ("qed: Add support for hardware offloaded FCoE.")
declared but never implemented qed_fcoe_set_pf_params().
Signed-off-by: Yue Haibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some of the newer silicon versions in CN10K series supports a feature
where in the current PTP timestamp in HW can be updated atomically
without losing any cpu cycles unlike read/modify/write register.
This patch uses this feature so that PTP accuracy can be improved
while adjusting the master offset in HW. There is no need for SW
timecounter when using this feature. So removed references to SW
timecounter wherever appropriate.
Signed-off-by: Sai Krishna <[email protected]>
Signed-off-by: Naveen Mamindlapalli <[email protected]>
Signed-off-by: Sunil Kovvuri Goutham <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Support TCP as protocol selector for policy and state in IPsec
packet offload mode.
Example of state configuration is as follows:
ip xfrm state add src 192.168.25.3 dst 192.168.25.1 \
proto esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \
0x54a7588d36873b031e4bd46301be5a86b3a53879 128 mode transport \
offload packet dev re0 dir in sel src 192.168.25.3 dst 192.168.25.1 \
proto tcp dport 9003
Acked-by: Raed Salem <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Support RX policy/state upper protocol selector field offload,
to enable selecting RX traffic for IPsec operation based on l4
protocol UDP with specific source/destination port.
Signed-off-by: Emeel Hakim <[email protected]>
Reviewed-by: Raed Salem <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Instead of using internal devlink_port->index to perform vport lookup in
every devlink port op, store the vport pointer to the container struct
mlx5_devlink_port and use it directly in port ops.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Since the follow-up patch is going to remove
mlx5_devlink_port_fn_get_vport() entirely, move the vhca_resource_manager
capability checking to individual ops. Add proper extack message
on the way.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
If called from port ops, it is not needed to perform the checks in
mlx5_devlink_eswitch_get(). The reason is devlink port would not be
registered if the checks are not true. Introduce relaxed version
mlx5_devlink_eswitch_nocheck_get() and use it in port ops.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Instead of initializing "err" variable, just return "-EOPNOTSUPP"
directly where it is needed.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
During devlink port init/cleanup and register/unregister calls, there
are many lookups of vport. Instead of passing vport_num as argument to
functions, pass the vport struct pointer directly and avoid repeated
lookups.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Struct devlink_port is usually embedded in a driver-specific struct
which allows to carry driver context to devlink port ops.
Introduce a container struct to include devlink_port struct
in preparation to also include driver context for devlink port ops.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently each PF/VF/SF devlink port op called into mlx5 code calls
is_port_function_supported() to check if the port is either
PF, VF or SF. So make sure that the ops are registered with devlink
port only for those and avoid the is_port_function_supported() checks
in ops.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Since the previous patch removed the only users of these functions,
remove them.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Similar to the PF/VF helpers, introduce a set of load/unload helpers
for SF vports. From there, call mlx5_eswitch_load/unload_vport() which
are common for PFs/VFs and newly introduced SF helpers.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently there is a separate set of functions used to
register/unregister the SF. The only difference is currently the ops
struct. Move the struct up and use it for SFs in
mlx5_esw_offloads_devlink_port_register().
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
devlink_port_register/unregister()
In order to prepare for
mlx5_esw_offloads_devlink_port_register/unregister() to be used
for SFs as well, push out the PF/VF specific init/cleanup calls outside.
Introduce mlx5_eswitch_load/unload_pf_vf_vport() and call them from
there. Use these new helpers of PF/VF loading and make
mlx5_eswitch_local/unload_vport() reusable for SFs.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Similar to what was done for PFs/VFs, introduce devlink port init and
cleanup helpers for SFs and manage the vport->dl_port pointer there.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
In order to prepare the devlink port registration function to be common
for PFs/VFs and SFs, change the existing devlink port allocation and
free functions into PF/VF init and cleanup, so similar helpers could be
later on introduced for SFs. Make the init/cleanup helpers responsible
for setting/clearing the vport->dl_port pointer.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM
requests. The bit resolution of this field is six bits. That bit five was
missing in the mask. This patch comes to correct the typo in the
IGC_PTM_CTRL_SHRT_CYC macro.
Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
Signed-off-by: Sasha Neftin <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting
PTP related workqueues.
In this way we can fix this:
BUG: unable to handle page fault for address: ffffc9000440b6f8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0
Oops: 0000 [#1] PREEMPT SMP
[...]
Workqueue: events igb_ptp_overflow_check
RIP: 0010:igb_rd32+0x1f/0x60
[...]
Call Trace:
igb_ptp_read_82580+0x20/0x50
timecounter_read+0x15/0x60
igb_ptp_overflow_check+0x1a/0x50
process_one_work+0x1cb/0x3c0
worker_thread+0x53/0x3f0
? rescuer_thread+0x370/0x370
kthread+0x142/0x160
? kthread_associate_blkcg+0xc0/0xc0
ret_from_fork+0x1f/0x30
Fixes: 1f6e8178d685 ("igb: Prevent dropped Tx timestamps via work items and interrupts.")
Fixes: d339b1331616 ("igb: add PTP Hardware Clock code")
Signed-off-by: Alessio Igor Bogani <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-08-21 (ice)
This series contains updates to ice driver only.
Jesse fixes an issue on calculating buffer size.
Petr Oros reverts a commit that does not fully resolve VF reset issues
and implements one that provides a fuller fix.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix NULL pointer deref during VF reset
Revert "ice: Fix ice VF reset during iavf initialization"
ice: fix receive buffer size miscalculation
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
While injecting PCIe errors to the upstream PCIe switch of
a BCM57810 NIC, system hangs/crashes were observed.
After several calls to bnx2x_tx_timout() complete,
bnx2x_nic_unload() is called to free up HW resources
and bnx2x_napi_disable() is called to release NAPI objects.
Later, when the EEH driver calls bnx2x_io_slot_reset() to
complete the recovery process, bnx2x attempts to disable
NAPI again by calling bnx2x_napi_disable() and freeing
resources which have already been freed, resulting in a
hang or crash.
Introduce a new flag to track the HW resource and NAPI
allocation state, refactor duplicated code into a single
function, check page pool allocation status before freeing,
and reduces debug output when a TX timeout event occurs.
Reviewed-by: Manish Chopra <[email protected]>
Tested-by: Abdul Haleem <[email protected]>
Tested-by: David Christensen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Venkata Sai Duggi <[email protected]>
Signed-off-by: Thinh Tran <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cited commits passed a size to alloc_skb that was only big enough for
the actual packet contents, but the following skb_put + memcpy writes
the whole struct efx_loopback_payload including leading and trailing
padding bytes (which are then stripped off with skb_pull/skb_trim).
This could cause an skb_over_panic, although in practice we get saved
by kmalloc_size_roundup.
Pass the entire size we use, instead of the size of the final packet.
Reported-by: Andy Moreton <[email protected]>
Fixes: cf60ed469629 ("sfc: use padding to fix alignment in loopback test")
Fixes: 30c24dd87f3f ("sfc: siena: use padding to fix alignment in loopback test")
Fixes: 1186c6b31ee1 ("sfc: falcon: use padding to fix alignment in loopback test")
Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-08-16
1) aRFS ethtool stats
Improve aRFS observability by adding new set of counters. Each Rx
ring will have this set of counters listed below.
These counters are exposed through ethtool -S.
1.1) arfs_add: number of times a new rule has been created.
1.2) arfs_request_in: number of times a rule was requested to move from
its current Rx ring to a new Rx ring (incremented on the destination
Rx ring).
1.3) arfs_request_out: number of times a rule was requested to move out
from its current Rx ring (incremented on source/current Rx ring).
1.4) arfs_expired: number of times a rule has been expired by the
kernel and removed from HW.
1.5) arfs_err: number of times a rule creation or modification has
failed.
2) Supporting inline WQE when possible in SW steering
3) Misc cleanups and fixups to net-next branch
* tag 'mlx5-updates-2023-08-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Devcom, only use devcom after NULL check in mlx5_devcom_send_event()
net/mlx5: DR, Supporting inline WQE when possible
net/mlx5: Rename devlink port ops struct for PFs/VFs
net/mlx5: Remove VPORT_UPLINK handling from devlink_port.c
net/mlx5: Call mlx5_esw_offloads_rep_load/unload() for uplink port directly
net/mlx5: Update dead links in Kconfig documentation
net/mlx5: Remove health syndrome enum duplication
net/mlx5: DR, Remove unneeded local variable
net/mlx5: DR, Fix code indentation
net/mlx5: IRQ, consolidate irq and affinity mask allocation
net/mlx5e: Fix spelling mistake "Faided" -> "Failed"
net/mlx5e: aRFS, Introduce ethtool stats
net/mlx5e: aRFS, Warn if aRFS table does not exist for aRFS rule
net/mlx5e: aRFS, Prevent repeated kernel rule migrations requests
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When a hardware reset is triggered on devices not initializing WED the
calls to mtk_wed_fe_reset and mtk_wed_fe_reset_complete dereference a
pointer on uninitialized stack memory.
Break out of both functions in case a hw_list entry is 0.
Fixes: 08a764a7c51b ("net: ethernet: mtk_wed: add reset/reset_complete callbacks")
Signed-off-by: Daniel Golle <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Lorenzo Bianconi <[email protected]>
Link: https://lore.kernel.org/r/5465c1609b464cc7407ae1530c40821dcdf9d3e6.1692634266.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Commit e8609e69470f ("net: ethernet: ti: am65-cpsw: Convert to PHYLINK")
removed am65_cpsw_nuss_adjust_link() but not its declaration.
Commit 84640e27f230 ("net: netcp: Add Keystone NetCP core ethernet driver")
declared but never implemented netcp_device_find_module().
Signed-off-by: Yue Haibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Roger Quadros <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|