Age | Commit message (Collapse) | Author | Files | Lines |
|
Vladimir Oltean says:
====================
Fix rtnl_mutex deadlock with DPAA2 and SFP modules
This patch set deliberately targets net-next and lacks Fixes: tags due
to caution on my part.
While testing some SFP modules on the Solidrun Honeycomb LX2K platform,
I noticed that rebooting causes a deadlock:
============================================
WARNING: possible recursive locking detected
6.1.0-rc5-07010-ga9b9500ffaac-dirty #656 Not tainted
--------------------------------------------
systemd-shutdow/1 is trying to acquire lock:
ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30
but task is already holding lock:
ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(rtnl_mutex);
lock(rtnl_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
6 locks held by systemd-shutdow/1:
#0: ffffa62db6863c70 (system_transition_mutex){+.+.}-{4:4}, at: __do_sys_reboot+0xd4/0x260
#1: ffff2f2b0176f100 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xf4/0x260
#2: ffff2f2b017be900 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x104/0x260
#3: ffff2f2b017680f0 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x40/0x260
#4: ffff2f2b0e1608f0 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x40/0x260
#5: ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30
stack backtrace:
CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 6.1.0-rc5-07010-ga9b9500ffaac-dirty #656
Hardware name: SolidRun LX2160A Honeycomb (DT)
Call trace:
lock_acquire+0x68/0x84
__mutex_lock+0x98/0x460
mutex_lock_nested+0x2c/0x40
rtnl_lock+0x1c/0x30
sfp_bus_del_upstream+0x1c/0xac
phylink_destroy+0x1c/0x50
dpaa2_mac_disconnect+0x28/0x70
dpaa2_eth_remove+0x1dc/0x1f0
fsl_mc_driver_remove+0x24/0x60
device_remove+0x70/0x80
device_release_driver_internal+0x1f0/0x260
device_links_unbind_consumers+0xe0/0x110
device_release_driver_internal+0x138/0x260
device_release_driver+0x18/0x24
bus_remove_device+0x12c/0x13c
device_del+0x16c/0x424
fsl_mc_device_remove+0x28/0x40
__fsl_mc_device_remove+0x10/0x20
device_for_each_child+0x5c/0xac
dprc_remove+0x94/0xb4
fsl_mc_driver_remove+0x24/0x60
device_remove+0x70/0x80
device_release_driver_internal+0x1f0/0x260
device_release_driver+0x18/0x24
bus_remove_device+0x12c/0x13c
device_del+0x16c/0x424
fsl_mc_bus_remove+0x8c/0x10c
fsl_mc_bus_shutdown+0x10/0x20
platform_shutdown+0x24/0x3c
device_shutdown+0x15c/0x260
kernel_restart+0x40/0xa4
__do_sys_reboot+0x1e4/0x260
__arm64_sys_reboot+0x24/0x30
But fixing this appears to be not so simple. The patch set represents my
attempt to address it.
In short, the problem is that dpaa2_mac_connect() and dpaa2_mac_disconnect()
call 2 phylink functions in a row, one takes rtnl_lock() itself -
phylink_create(), and one which requires rtnl_lock() to be held by the
caller - phylink_fwnode_phy_connect(). The existing approach in the
drivers is too simple. We take rtnl_lock() when calling dpaa2_mac_connect(),
which is what results in the deadlock.
Fixing just that creates another problem. The drivers make use of
rtnl_lock() for serializing with other code paths too. I think I've
found all those code paths, and established other mechanisms for
serializing with them.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
After the introduction of a private mac_lock that serializes access to
priv->mac (and port_priv->mac in the switch), the only remaining purpose
of rtnl_lock() is to satisfy the locking requirements of
phylink_fwnode_phy_connect() and phylink_disconnect_phy().
But the functions these live in, dpaa2_mac_connect() and
dpaa2_mac_disconnect(), have contradictory locking requirements.
While phylink_fwnode_phy_connect() wants rtnl_lock() to be held,
phylink_create() wants it to not be held.
Move the rtnl_lock() from top-level (in the dpaa2-eth and dpaa2-switch
drivers) to only surround the phylink calls that require it, in the
dpaa2-mac library code.
This is possible because dpaa2_mac_connect() and dpaa2_mac_disconnect()
run unlocked, and there isn't any danger of an AB/BA deadlock between
the rtnl_mutex and other private locks.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The dpaa2-switch driver uses a DPMAC in the same way as the dpaa2-eth
driver, so we need to duplicate the locking solution established by the
previous change to the switch driver as well.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The dpaa2 architecture permits dynamic connections between objects on
the fsl-mc bus, specifically between a DPNI object (represented by a
struct net_device) and a DPMAC object (represented by a struct phylink).
The DPNI driver is notified when those connections are created/broken
through the dpni_irq0_handler_thread() method. To ensure that ethtool
operations, as well as netdev up/down operations serialize with the
connection/disconnection of the DPNI with a DPMAC,
dpni_irq0_handler_thread() takes the rtnl_lock() to block those other
operations from taking place.
There is code called by dpaa2_mac_connect() which wants to acquire the
rtnl_mutex once again, see phylink_create() -> phylink_register_sfp() ->
sfp_bus_add_upstream() -> rtnl_lock(). So the strategy doesn't quite
work out, even though it's fairly simple.
Create a different strategy, where all code paths in the dpaa2-eth
driver access priv->mac only while they are holding priv->mac_lock.
The phylink instance is not created or connected to the PHY under the
priv->mac_lock, but only assigned to priv->mac then. This will eliminate
the reliance on the rtnl_mutex.
Add lockdep annotations and put comments where holding the lock is not
necessary, and priv->mac can be dereferenced freely.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
dpaa2_eth_connect_mac() is called both from dpaa2_eth_probe() and from
dpni_irq0_handler_thread().
It could happen that the DPNI gets connected to a DPMAC on the fsl-mc
bus exactly during probe, as soon as the "endpoint change" interrupt is
requested in dpaa2_eth_setup_irqs(). This will cause the
dpni_irq0_handler_thread() to register a phylink instance for that DPMAC.
Then, the probing function will also try to register a phylink instance
for the same DPMAC, operation which should fail (and this will fail the
probing of the driver).
Reorder dpaa2_eth_setup_irqs() and dpaa2_eth_connect_mac(), such that
dpni_irq0_handler_thread() never races with the DPMAC-related portion of
the probing path.
Also reorder dpaa2_eth_disconnect_mac() to be in the mirror position of
dpaa2_eth_connect_mac() in the teardown path.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The helper function will gain a lockdep annotation in a future patch.
Make sure to benefit from it.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
DPNIs and DPSW objects can connect and disconnect at runtime from DPMAC
objects on the same fsl-mc bus. The DPMAC object also holds "ethtool -S"
unstructured counters. Those counters are only shown for the entity
owning the netdev (DPNI, DPSW) if it's connected to a DPMAC.
The ethtool stringset code path is split into multiple callbacks, but
currently, connecting and disconnecting the DPMAC takes the rtnl_lock().
This blocks the entire ethtool code path from running, see
ethnl_default_doit() -> rtnl_lock() -> ops->prepare_data() ->
strset_prepare_data().
This is going to be a problem if we are going to no longer require
rtnl_lock() when connecting/disconnecting the DPMAC, because the DPMAC
could appear between ops->get_sset_count() and ops->get_strings().
If it appears out of the blue, we will provide a stringset into an array
that was dimensioned thinking the DPMAC wouldn't be there => array
accessed out of bounds.
There isn't really a good way to work around that, and I don't want to
put too much pressure on the ethtool framework by playing locking games.
Just make the DPMAC counters be always available. They'll be zeroes if
the DPNI or DPSW isn't connected to a DPMAC.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The dpaa2-switch has the exact same locking requirements when connected
to a DPMAC, so it needs port_priv->mac to always point either to NULL,
or to a DPMAC with a fully initialized phylink instance.
Make the same preparatory change in the dpaa2-switch driver as in the
dpaa2-eth one.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
There are 2 requirements for correct code:
- Any time the driver accesses the priv->mac pointer at runtime, it
either holds NULL to indicate a DPNI-DPNI connection (or unconnected
DPNI), or a struct dpaa2_mac whose phylink instance was fully
initialized (created and connected to the PHY). No changes are made to
priv->mac while it is being used. Currently, rtnl_lock() watches over
the call to dpaa2_eth_connect_mac(), so it serves the purpose of
serializing this with all readers of priv->mac.
- dpaa2_mac_connect() should run unlocked, because inside it are 2
phylink calls with incompatible locking requirements: phylink_create()
requires that the rtnl_mutex isn't held, and phylink_fwnode_phy_connect()
requires that the rtnl_mutex is held. The only way to solve those
contradictory requirements is to let dpaa2_mac_connect() take
rtnl_lock() when it needs to.
To solve both requirements, we need to identify the writer side of the
priv->mac pointer, which can be wrapped in a mutex private to the driver
in a future patch. The dpaa2_mac_connect() cannot be part of the writer
side critical section, because of an AB/BA deadlock with rtnl_lock().
So the strategy needs to be that where we prepare the DPMAC by calling
dpaa2_mac_connect(), and only make priv->mac point to it once it's fully
prepared. This ensures that the writer side critical section has the
absolute minimum surface it can.
The reverse strategy is adopted in the dpaa2_eth_disconnect_mac() code
path. This makes sure that priv->mac is NULL when we start tearing down
the DPMAC that we disconnected from, and concurrent code will simply not
see it.
No locking changes in this patch (concurrent code is still blocked by
the rtnl_mutex).
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
dpaa2_mac_disconnect() will only be called with a NULL mac->phylink if
dpaa2_mac_connect() failed, or was never called.
The callers are these:
dpaa2_eth_disconnect_mac():
if (dpaa2_eth_is_type_phy(priv))
dpaa2_mac_disconnect(priv->mac);
dpaa2_switch_port_disconnect_mac():
if (dpaa2_switch_port_is_type_phy(port_priv))
dpaa2_mac_disconnect(port_priv->mac);
priv->mac can be NULL, but in that case, dpaa2_eth_is_type_phy() returns
false, and dpaa2_mac_disconnect() is never called. Similar for
dpaa2-switch.
When priv->mac is non-NULL, it means that dpaa2_mac_connect() returned
zero (success), and therefore, priv->mac->phylink is also a valid
pointer.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The phylink handling is intended to be hidden inside the dpaa2_mac
object. Move the phylink_start() call into dpaa2_mac_start(), and
phylink_stop() into dpaa2_mac_stop().
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
dpaa2_mac_is_type_fixed() is a header with no implementation and no
callers, which is referenced from the documentation though. It can be
deleted.
On the other hand, it would be useful to reuse the code between
dpaa2_eth_is_type_phy() and dpaa2_switch_port_is_type_phy(). That common
code should be called dpaa2_mac_is_type_phy(), so let's create that.
The removal and the addition are merged into the same patch because,
in fact, is_type_phy() is the logical opposite of is_type_fixed().
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
dpaa2_eth_setup_dpni() is called from the probe path and
dpaa2_eth_set_link_ksettings() is propagated to user space.
include/linux/errno.h says that ENOTSUPP is "Defined for the NFSv3
protocol". Conventional wisdom has it to not use it in networking
drivers. Replace it with -EOPNOTSUPP.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Tested-by: Ioana Ciornei <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
If vcap_dup_rule() fails that leads to an error pointer dereference
side the call to vcap_free_rule(). Also it only returns an error if the
very last call to vcap_read_rule() fails and it returns success for
other errors.
I've changed it to just stop printing after the first error and return
an error code.
Fixes: 3a7921560d2f ("net: microchip: sparx5: Add VCAP rule debugFS support for the VCAP API")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Steen Hegelund <[email protected]>
Link: https://lore.kernel.org/r/Y4XUUx9kzurBN+BV@kili
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The "ignore_updelay" variable needs to be initialized to false.
Fixes: f8a65ab2f3ff ("bonding: fix link recovery in mode 2 when updelay is nonzero")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Pavan Chebbi <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Link: https://lore.kernel.org/r/Y4SWJlh3ohJ6EPTL@kili
Signed-off-by: Paolo Abeni <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-11-29
Misc update for mlx5 driver
1) Various trivial cleanups
2) Maor Dickman, Adds support for trap offload with additional actions
3) From Tariq, UMR (device memory registrations) cleanups,
UMR WQE must be aligned to 64B per device spec, (not a bug fix).
* tag 'mlx5-updates-2022-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Support devlink reload of IPsec core
net/mlx5e: TC, Add offload support for trap with additional actions
net/mlx5e: Do early return when setup vports dests for slow path flow
net/mlx5: Remove redundant check
net/mlx5e: Delete always true DMA check
net/mlx5e: Don't access directly DMA device pointer
net/mlx5e: Don't use termination table when redundant
net/mlx5: Fix orthography errors in documentation
net/mlx5: Use generic definition for UMR KLM alignment
net/mlx5: Generalize name of UMR alignment definition
net/mlx5: Remove unused UMR MTT definitions
net/mlx5e: Add padding when needed in UMR WQEs
net/mlx5: Remove unused ctx variables
net/mlx5e: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
net/mlx5e: Remove unneeded io-mapping.h #include
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
If the external phy used by current mac interface is
managed by another mac interface, it means that this
network port cannot work independently, especially
when the system suspends and resumes, the following
trace may appear, so we should create a device link
between phy dev and mac dev.
WARNING: CPU: 0 PID: 24 at drivers/net/phy/phy.c:983 phy_error+0x20/0x68
Modules linked in:
CPU: 0 PID: 24 Comm: kworker/0:2 Not tainted 6.1.0-rc3-00011-g5aaef24b5c6d-dirty #34
Hardware name: Freescale i.MX6 SoloX (Device Tree)
Workqueue: events_power_efficient phy_state_machine
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x90
dump_stack_lvl from __warn+0xb4/0x24c
__warn from warn_slowpath_fmt+0x5c/0xd8
warn_slowpath_fmt from phy_error+0x20/0x68
phy_error from phy_state_machine+0x22c/0x23c
phy_state_machine from process_one_work+0x288/0x744
process_one_work from worker_thread+0x3c/0x500
worker_thread from kthread+0xf0/0x114
kthread from ret_from_fork+0x14/0x28
Exception stack(0xf0951fb0 to 0xf0951ff8)
Signed-off-by: Xiaolei Wang <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Vincent Mailhol says:
====================
net: devlink: return the driver name in devlink_nl_info_fill
The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.
The goal of this series is to have the devlink core to report this
information instead of the drivers.
The first patch fulfills the actual goal of this series: modify
devlink core to report the driver name and clean-up all drivers. Both
have to be done in an atomic change to avoid attribute
duplication. This same patch also removes the
devlink_info_driver_name_put() function to prevent future drivers from
reporting the driver name themselves.
The second patch allows the core to call devlink_nl_info_fill() even
if the devlink_ops::info_get() callback is NULL. This leads to the
third and final patch which cleans up the drivers which have an empty
info_get().
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
devlink_ops::info_get() is now optional and devlink will continue to
report information even if that callback gets removed.
Remove all the empty devlink_ops::info_get() callbacks from the
drivers.
Signed-off-by: Vincent Mailhol <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Some drivers only reported the driver name in their
devlink_ops::info_get() callback. Now that the core provides this
information, the callback became empty. For such drivers, just
removing the callback would prevent the core from executing
devlink_nl_info_fill() meaning that "devlink dev info" would not
return anything.
Make the callback function optional by executing
devlink_nl_info_fill() even if devlink_ops::info_get() is NULL.
N.B.: the drivers with devlink support which previously did not
implement devlink_ops::info_get() will now also be able to report
the driver name.
Signed-off-by: Vincent Mailhol <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.
In order to factorize code, make devlink_nl_info_fill() add the driver
name attribute.
Now that the core sets the driver name attribute, drivers are not
supposed to call devlink_info_driver_name_put() anymore. Remove
devlink_info_driver_name_put() and clean-up all the drivers using this
function in their callback.
Signed-off-by: Vincent Mailhol <[email protected]>
Tested-by: Ido Schimmel <[email protected]> # mlxsw
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The NQ310 is another NFC chip from NXP, document the compatible in the
bindings.
Signed-off-by: Luca Weiss <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Jacob Keller says:
====================
support direct read from region
A long time ago when initially implementing devlink regions in ice I
proposed the ability to allow reading from a region without taking a
snapshot [1]. I eventually dropped this work from the original series due to
size. Then I eventually lost track of submitting this follow up.
This can be useful when interacting with some region that has some
definitive "contents" from which snapshots are made. For example the ice
driver has regions representing the contents of the device flash.
If userspace wants to read the contents today, it must first take a snapshot
and then read from that snapshot. This makes sense if you want to read a
large portion of data or you want to be sure reads are consistently from the
same recording of the flash.
However if user space only wants to read a small chunk, it must first
generate a snapshot of the entire contents, perform a read from the
snapshot, and then delete the snapshot after reading.
For such a use case, a direct read from the region makes more sense. This
can be achieved by allowing the devlink region read command to work without
a snapshot. Instead the portion to be read can be forwarded directly to the
driver via a new .read callback.
This avoids the need to read the entire region contents into memory first
and avoids the software overhead of creating a snapshot and then deleting
it.
This series implements such behavior and hooks up the ice NVM and shadow RAM
regions to allow it.
[1] https://lore.kernel.org/netdev/[email protected]/
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Implement the .read handler for the NVM and Shadow RAM regions. This
enables user space to read a small chunk of the flash without needing the
overhead of creating a full snapshot.
Update the documentation for ice to detail which regions have direct read
support.
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
78ad87da9978 ("ice: devlink: add shadow-ram region to snapshot Shadow RAM")
added support for the 'shadow-ram' devlink region, but did not document it
in the ice devlink documentation. Fix this.
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The ice driver supports a region for both the flat NVM contents as well as
the Shadow RAM which is a layer built on top of the flash during device
initialization.
These regions use an almost identical read function, except that the NVM
needs to set the direct flag when reading, while Shadow RAM needs to read
without the direct flag set. They each call ice_read_flat_nvm with the only
difference being whether to set the direct flash flag.
The NVM region read function also was fixed to read the NVM in blocks to
avoid a situation where the firmware reclaims the lock due to taking too
long.
Note that the region snapshot function takes the ops pointer so the
function can easily determine which region to read. Make use of this and
re-use the NVM snapshot function for both the NVM and Shadow RAM regions.
This makes Shadow RAM benefit from the same block approach as the NVM
region. It also reduces code in the ice driver.
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
To read from a region, user space must currently request a new snapshot of
the region and then read from that snapshot. This can sometimes be overkill
if user space only reads a tiny portion. They first create the snapshot,
then request a read, then destroy the snapshot.
For regions which have a single underlying "contents", it makes sense to
allow supporting direct reading of the region data.
Extend the DEVLINK_CMD_REGION_READ to allow direct reading from a region if
requested via the new DEVLINK_ATTR_REGION_DIRECT. If this attribute is set,
then perform a direct read instead of using a snapshot. Direct read is
mutually exclusive with DEVLINK_ATTR_REGION_SNAPSHOT_ID, and care is taken
to ensure that we reject commands which provide incorrect attributes.
Regions must enable support for direct read by implementing the .read()
callback function. If a region does not support such direct reads, a
suitable extended error message is reported.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The devlink_nl_region_read_snapshot_fill is used to copy the contents of
a snapshot into a message for reporting to userspace via the
DEVLINK_CMG_REGION_READ netlink message.
A future change is going to add support for directly reading from
a region. Almost all of the logic for this new capability is identical.
To help reduce code duplication and make this logic more generic,
refactor the function to take a cb and cb_priv pointer for doing the
actual copy.
Add a devlink_region_snapshot_fill implementation that will simply copy
the relevant chunk of the region. This does require allocating some
storage for the chunk as opposed to simply passing the correct address
forward to the devlink_nl_cmg_region_read_chunk_fill function.
A future change to implement support for directly reading from a region
without a snapshot will provide a separate implementation that calls the
newly added devlink region operation.
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The devlink parameter of the devlink_nl_cmd_region_read_chunk_fill
function is not used. Remove it, to simplify the function signature.
Once removed, it is also obvious that the devlink parameter is not
necessary for the devlink_nl_region_read_snapshot_fill either.
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The snapshot pointer is obtained inside of the function
devlink_nl_region_read_snapshot_fill. Simplify this function by locating
the snapshot upfront in devlink_nl_cmd_region_read_dumpit instead. This
aligns with how other netlink attributes are handled, and allows us to
exit slightly earlier if an invalid snapshot ID is provided.
It also allows us to pass the snapshot pointer directly to the
devlink_nl_region_read_snapshot_fill, and remove the now unused attrs
parameter.
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Report extended error details in the devlink_nl_cmd_region_read_dumpit()
function, by using the extack structure from the netlink_callback.
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The calculation for the data_size in the devlink_nl_read_snapshot_fill
function uses an if statement that is better expressed using the min_t
macro.
Signed-off-by: Jacob Keller <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently whenever a new rule id is generated, it picks up the next
number bigger than previous id. So it would always be 1, 2, 3, etc.
When the rule with id 1 will be deleted and a new rule will be added,
it will have the id 4 and not id 1.
In theory this can be a problem if at some point a rule will be added
and removed ~0 times. Then no more rules can be added because there
are no more ids.
Change this such that when a new rule is added, search for an empty
rule id starting with value of 1 as value 0 is reserved.
Fixes: c9da1ac1c212 ("net: microchip: sparx5: Adding initial tc flower support for VCAP API")
Signed-off-by: Horatiu Vultur <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We know that table_size = table->mem_table.depth * table->mem_table.ways,
so use it instead, it is less verbose.
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/5230dabe27f48948a9fd0f50a62e2437b65e6a6e.1669378798.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This allocation is really spurious.
The size of the bitmap is 'tot_ids' and it is used as such in the driver.
So we could expect something like:
table->id_bmap = devm_kcalloc(rvu->dev, BITS_TO_LONGS(table->tot_ids),
sizeof(long), GFP_KERNEL);
However, when the bitmap is allocated, we allocate:
BITS_TO_LONGS(table->tot_ids) * table->tot_ids ~=
table->tot_ids / 32 * table->tot_ids ~=
table->tot_ids^2 / 32
It is proportional to the square of 'table->tot_ids' which seems to
potentially be big.
Allocate the expected amount of memory, and switch to the bitmap API to
have it more straightforward.
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/ce2710771939065d68f95d86a27cf7cea7966365.1669378798.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use devm_bitmap_zalloc() instead of hand-writing it.
This also makes the comment "Allocate bitmap for 32 entry mcam" more
explicit because now 32 is really used in the allocation function, instead
of an obscure 'sizeof(long)'.
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/24177a9ee7043259448b735263d9cfd6a70e89a4.1669378798.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use kzalloc() instead of kmalloc()/memset().
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/60ea220ccf3b61963f7d5a97e3df2c76a5feb837.1669378798.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When this error message is displayed, we know that the all the bits in the
bitmap are set.
So, bitmap_weight() will return the number of bits of the bitmap, which is
'table->tot_ids'.
It is unlikely that a bit will be cleared between mutex_unlock() and
dev_err(), but, in order to simplify the code and avoid this possibility,
just take 'table->tot_ids'.
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/5ce01c402f86412dc57884ff0994b63f0c5b3871.1669378798.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Test NIC hardware checksum offload:
- Rx + Tx
- IPv4 + IPv6
- TCP + UDP
Optional features:
- zero checksum 0xFFFF
- checksum disable 0x0000
- transport encap headers
- randomization
See file header for detailed comments.
Expected results differ depending on NIC features:
- CHECKSUM_UNNECESSARY vs CHECKSUM_COMPLETE
- NETIF_F_HW_CSUM (csum_start/csum_off) vs NETIF_F_IP(V6)_CSUM
Signed-off-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Change IPsec initialization flow to allow future creation of hardware
resources that should be released and allocated during devlink reload
operation. As part of that change, update function signature to be
void as no callers are actually interested in it.
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
TC trap action offload is currently supported only when trap is the sole action
in the flow.
This patch remove this limitation by changing trap action offload to not use
MLX5_ATTR_FLAG_SLOW_PATH flag and instead set the flow destination table
explicitly to be the slow table. This will allow offload of the additional
actions.
TC flow example:
tc filter add dev $REP2 protocol ip prio 2 root \
flower skip_sw dst_mac $mac0 \
action mirred egress redirect dev $REP3 \
action pedit ex munge eth dst set $mac2 pipe \
action trap
Signed-off-by: Maor Dickman <[email protected]>
Reviewed-by: Raed Salem <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Adding flow flag cases in setup vport dests before the slow path
case is incorrect as the slow path should take precedence.
Current code doesn't show this importance so make the slow path
case return early and separate from the other cases and remove
the redundant comparison of it in the sample case.
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Chris Mi <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
If ASO failed in creation, it won't be called to destroy either.
The kernel coding pattern is to make sure that callers are calling
to destroy only for valid objects.
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
DMA address always exists for MACsec ASO object.
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Use specialized helper to fetch DMA device pointer.
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Current code used termination table for each vport destination
while it's only required for hairpin, i.e. uplink to uplink, or
when vlan push on rx action being used.
Fix to skip using termination table for vport destinations that
do not require it.
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Improve general readability of the device driver documentation.
Signed-off-by: Rahul Rameshbabu <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
MLX5_UMR_KLM_ALIGNMENT is in units of number of entries, while
MLX5_UMR_MTT_ALIGNMENT (generalized and renamed to
MLX5_UMR_FLEX_ALIGNMENT) is in byte units. This is misleading and
confusing.
Replace this KLM definition with one based on the generic definition.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Per the device spec, MLX5_UMR_MTT_ALIGNMENT is good not only for UMR MTT
entries, but for all other entries as well, like KLMs and KSMs.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Defines MLX5_UMR_MTT_MASK and MLX5_UMR_MTT_MIN_CHUNK_SIZE are not in
use. Remove them.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|