aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-12-01Merge branch 'fix-rtnl_mutex-deadlock-with-dpaa2-and-sfp-modules'Paolo Abeni9-104/+212
Vladimir Oltean says: ==================== Fix rtnl_mutex deadlock with DPAA2 and SFP modules This patch set deliberately targets net-next and lacks Fixes: tags due to caution on my part. While testing some SFP modules on the Solidrun Honeycomb LX2K platform, I noticed that rebooting causes a deadlock: ============================================ WARNING: possible recursive locking detected 6.1.0-rc5-07010-ga9b9500ffaac-dirty #656 Not tainted -------------------------------------------- systemd-shutdow/1 is trying to acquire lock: ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30 but task is already holding lock: ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(rtnl_mutex); lock(rtnl_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 6 locks held by systemd-shutdow/1: #0: ffffa62db6863c70 (system_transition_mutex){+.+.}-{4:4}, at: __do_sys_reboot+0xd4/0x260 #1: ffff2f2b0176f100 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xf4/0x260 #2: ffff2f2b017be900 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x104/0x260 #3: ffff2f2b017680f0 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x40/0x260 #4: ffff2f2b0e1608f0 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x40/0x260 #5: ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30 stack backtrace: CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 6.1.0-rc5-07010-ga9b9500ffaac-dirty #656 Hardware name: SolidRun LX2160A Honeycomb (DT) Call trace: lock_acquire+0x68/0x84 __mutex_lock+0x98/0x460 mutex_lock_nested+0x2c/0x40 rtnl_lock+0x1c/0x30 sfp_bus_del_upstream+0x1c/0xac phylink_destroy+0x1c/0x50 dpaa2_mac_disconnect+0x28/0x70 dpaa2_eth_remove+0x1dc/0x1f0 fsl_mc_driver_remove+0x24/0x60 device_remove+0x70/0x80 device_release_driver_internal+0x1f0/0x260 device_links_unbind_consumers+0xe0/0x110 device_release_driver_internal+0x138/0x260 device_release_driver+0x18/0x24 bus_remove_device+0x12c/0x13c device_del+0x16c/0x424 fsl_mc_device_remove+0x28/0x40 __fsl_mc_device_remove+0x10/0x20 device_for_each_child+0x5c/0xac dprc_remove+0x94/0xb4 fsl_mc_driver_remove+0x24/0x60 device_remove+0x70/0x80 device_release_driver_internal+0x1f0/0x260 device_release_driver+0x18/0x24 bus_remove_device+0x12c/0x13c device_del+0x16c/0x424 fsl_mc_bus_remove+0x8c/0x10c fsl_mc_bus_shutdown+0x10/0x20 platform_shutdown+0x24/0x3c device_shutdown+0x15c/0x260 kernel_restart+0x40/0xa4 __do_sys_reboot+0x1e4/0x260 __arm64_sys_reboot+0x24/0x30 But fixing this appears to be not so simple. The patch set represents my attempt to address it. In short, the problem is that dpaa2_mac_connect() and dpaa2_mac_disconnect() call 2 phylink functions in a row, one takes rtnl_lock() itself - phylink_create(), and one which requires rtnl_lock() to be held by the caller - phylink_fwnode_phy_connect(). The existing approach in the drivers is too simple. We take rtnl_lock() when calling dpaa2_mac_connect(), which is what results in the deadlock. Fixing just that creates another problem. The drivers make use of rtnl_lock() for serializing with other code paths too. I think I've found all those code paths, and established other mechanisms for serializing with them. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-mac: move rtnl_lock() only around phylink_{,dis}connect_phy()Vladimir Oltean3-8/+5
After the introduction of a private mac_lock that serializes access to priv->mac (and port_priv->mac in the switch), the only remaining purpose of rtnl_lock() is to satisfy the locking requirements of phylink_fwnode_phy_connect() and phylink_disconnect_phy(). But the functions these live in, dpaa2_mac_connect() and dpaa2_mac_disconnect(), have contradictory locking requirements. While phylink_fwnode_phy_connect() wants rtnl_lock() to be held, phylink_create() wants it to not be held. Move the rtnl_lock() from top-level (in the dpaa2-eth and dpaa2-switch drivers) to only surround the phylink calls that require it, in the dpaa2-mac library code. This is possible because dpaa2_mac_connect() and dpaa2_mac_disconnect() run unlocked, and there isn't any danger of an AB/BA deadlock between the rtnl_mutex and other private locks. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-switch: serialize changes to priv->mac with a mutexVladimir Oltean3-10/+55
The dpaa2-switch driver uses a DPMAC in the same way as the dpaa2-eth driver, so we need to duplicate the locking solution established by the previous change to the switch driver as well. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-eth: serialize changes to priv->mac with a mutexVladimir Oltean3-16/+91
The dpaa2 architecture permits dynamic connections between objects on the fsl-mc bus, specifically between a DPNI object (represented by a struct net_device) and a DPMAC object (represented by a struct phylink). The DPNI driver is notified when those connections are created/broken through the dpni_irq0_handler_thread() method. To ensure that ethtool operations, as well as netdev up/down operations serialize with the connection/disconnection of the DPNI with a DPMAC, dpni_irq0_handler_thread() takes the rtnl_lock() to block those other operations from taking place. There is code called by dpaa2_mac_connect() which wants to acquire the rtnl_mutex once again, see phylink_create() -> phylink_register_sfp() -> sfp_bus_add_upstream() -> rtnl_lock(). So the strategy doesn't quite work out, even though it's fairly simple. Create a different strategy, where all code paths in the dpaa2-eth driver access priv->mac only while they are holding priv->mac_lock. The phylink instance is not created or connected to the PHY under the priv->mac_lock, but only assigned to priv->mac then. This will eliminate the reliance on the rtnl_mutex. Add lockdep annotations and put comments where holding the lock is not necessary, and priv->mac can be dereferenced freely. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-eth: connect to MAC before requesting the "endpoint changed" IRQVladimir Oltean1-9/+9
dpaa2_eth_connect_mac() is called both from dpaa2_eth_probe() and from dpni_irq0_handler_thread(). It could happen that the DPNI gets connected to a DPMAC on the fsl-mc bus exactly during probe, as soon as the "endpoint change" interrupt is requested in dpaa2_eth_setup_irqs(). This will cause the dpni_irq0_handler_thread() to register a phylink instance for that DPMAC. Then, the probing function will also try to register a phylink instance for the same DPMAC, operation which should fail (and this will fail the probing of the driver). Reorder dpaa2_eth_setup_irqs() and dpaa2_eth_connect_mac(), such that dpni_irq0_handler_thread() never races with the DPMAC-related portion of the probing path. Also reorder dpaa2_eth_disconnect_mac() to be in the mirror position of dpaa2_eth_connect_mac() in the teardown path. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-switch replace direct MAC access with dpaa2_switch_port_has_mac()Vladimir Oltean1-1/+1
The helper function will gain a lockdep annotation in a future patch. Make sure to benefit from it. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2: publish MAC stringset to ethtool -S even if MAC is missingVladimir Oltean2-18/+5
DPNIs and DPSW objects can connect and disconnect at runtime from DPMAC objects on the same fsl-mc bus. The DPMAC object also holds "ethtool -S" unstructured counters. Those counters are only shown for the entity owning the netdev (DPNI, DPSW) if it's connected to a DPMAC. The ethtool stringset code path is split into multiple callbacks, but currently, connecting and disconnecting the DPMAC takes the rtnl_lock(). This blocks the entire ethtool code path from running, see ethnl_default_doit() -> rtnl_lock() -> ops->prepare_data() -> strset_prepare_data(). This is going to be a problem if we are going to no longer require rtnl_lock() when connecting/disconnecting the DPMAC, because the DPMAC could appear between ops->get_sset_count() and ops->get_strings(). If it appears out of the blue, we will provide a stringset into an array that was dimensioned thinking the DPMAC wouldn't be there => array accessed out of bounds. There isn't really a good way to work around that, and I don't want to put too much pressure on the ethtool framework by playing locking games. Just make the DPMAC counters be always available. They'll be zeroes if the DPNI or DPSW isn't connected to a DPMAC. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-switch: assign port_priv->mac after dpaa2_mac_connect() callVladimir Oltean1-9/+12
The dpaa2-switch has the exact same locking requirements when connected to a DPMAC, so it needs port_priv->mac to always point either to NULL, or to a DPMAC with a fully initialized phylink instance. Make the same preparatory change in the dpaa2-switch driver as in the dpaa2-eth one. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-eth: assign priv->mac after dpaa2_mac_connect() callVladimir Oltean1-9/+12
There are 2 requirements for correct code: - Any time the driver accesses the priv->mac pointer at runtime, it either holds NULL to indicate a DPNI-DPNI connection (or unconnected DPNI), or a struct dpaa2_mac whose phylink instance was fully initialized (created and connected to the PHY). No changes are made to priv->mac while it is being used. Currently, rtnl_lock() watches over the call to dpaa2_eth_connect_mac(), so it serves the purpose of serializing this with all readers of priv->mac. - dpaa2_mac_connect() should run unlocked, because inside it are 2 phylink calls with incompatible locking requirements: phylink_create() requires that the rtnl_mutex isn't held, and phylink_fwnode_phy_connect() requires that the rtnl_mutex is held. The only way to solve those contradictory requirements is to let dpaa2_mac_connect() take rtnl_lock() when it needs to. To solve both requirements, we need to identify the writer side of the priv->mac pointer, which can be wrapped in a mutex private to the driver in a future patch. The dpaa2_mac_connect() cannot be part of the writer side critical section, because of an AB/BA deadlock with rtnl_lock(). So the strategy needs to be that where we prepare the DPMAC by calling dpaa2_mac_connect(), and only make priv->mac point to it once it's fully prepared. This ensures that the writer side critical section has the absolute minimum surface it can. The reverse strategy is adopted in the dpaa2_eth_disconnect_mac() code path. This makes sure that priv->mac is NULL when we start tearing down the DPMAC that we disconnected from, and concurrent code will simply not see it. No locking changes in this patch (concurrent code is still blocked by the rtnl_mutex). Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-mac: remove defensive check in dpaa2_mac_disconnect()Vladimir Oltean1-3/+0
dpaa2_mac_disconnect() will only be called with a NULL mac->phylink if dpaa2_mac_connect() failed, or was never called. The callers are these: dpaa2_eth_disconnect_mac(): if (dpaa2_eth_is_type_phy(priv)) dpaa2_mac_disconnect(priv->mac); dpaa2_switch_port_disconnect_mac(): if (dpaa2_switch_port_is_type_phy(port_priv)) dpaa2_mac_disconnect(port_priv->mac); priv->mac can be NULL, but in that case, dpaa2_eth_is_type_phy() returns false, and dpaa2_mac_disconnect() is never called. Similar for dpaa2-switch. When priv->mac is non-NULL, it means that dpaa2_mac_connect() returned zero (success), and therefore, priv->mac->phylink is also a valid pointer. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-mac: absorb phylink_start() call into dpaa2_mac_start()Vladimir Oltean3-8/+10
The phylink handling is intended to be hidden inside the dpaa2_mac object. Move the phylink_start() call into dpaa2_mac_start(), and phylink_stop() into dpaa2_mac_stop(). Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2: replace dpaa2_mac_is_type_fixed() with dpaa2_mac_is_type_phy()Vladimir Oltean4-17/+16
dpaa2_mac_is_type_fixed() is a header with no implementation and no callers, which is referenced from the documentation though. It can be deleted. On the other hand, it would be useful to reuse the code between dpaa2_eth_is_type_phy() and dpaa2_switch_port_is_type_phy(). That common code should be called dpaa2_mac_is_type_phy(), so let's create that. The removal and the addition are merged into the same patch because, in fact, is_type_phy() is the logical opposite of is_type_fixed(). Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: dpaa2-eth: don't use -ENOTSUPP error codeVladimir Oltean2-2/+2
dpaa2_eth_setup_dpni() is called from the probe path and dpaa2_eth_set_link_ksettings() is propagated to user space. include/linux/errno.h says that ENOTSUPP is "Defined for the NFSv3 protocol". Conventional wisdom has it to not use it in networking drivers. Replace it with -EOPNOTSUPP. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Tested-by: Ioana Ciornei <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01net: microchip: sparx5: Fix error handling in vcap_show_admin()Dan Carpenter1-4/+11
If vcap_dup_rule() fails that leads to an error pointer dereference side the call to vcap_free_rule(). Also it only returns an error if the very last call to vcap_read_rule() fails and it returns success for other errors. I've changed it to just stop printing after the first error and return an error code. Fixes: 3a7921560d2f ("net: microchip: sparx5: Add VCAP rule debugFS support for the VCAP API") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Steen Hegelund <[email protected]> Link: https://lore.kernel.org/r/Y4XUUx9kzurBN+BV@kili Signed-off-by: Paolo Abeni <[email protected]>
2022-12-01bonding: uninitialized variable in bond_miimon_inspect()Dan Carpenter1-1/+1
The "ignore_updelay" variable needs to be initialized to false. Fixes: f8a65ab2f3ff ("bonding: fix link recovery in mode 2 when updelay is nonzero") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Acked-by: Jay Vosburgh <[email protected]> Link: https://lore.kernel.org/r/Y4SWJlh3ohJ6EPTL@kili Signed-off-by: Paolo Abeni <[email protected]>
2022-11-30Merge tag 'mlx5-updates-2022-11-29' of ↵Jakub Kicinski22-135/+153
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-11-29 Misc update for mlx5 driver 1) Various trivial cleanups 2) Maor Dickman, Adds support for trap offload with additional actions 3) From Tariq, UMR (device memory registrations) cleanups, UMR WQE must be aligned to 64B per device spec, (not a bug fix). * tag 'mlx5-updates-2022-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Support devlink reload of IPsec core net/mlx5e: TC, Add offload support for trap with additional actions net/mlx5e: Do early return when setup vports dests for slow path flow net/mlx5: Remove redundant check net/mlx5e: Delete always true DMA check net/mlx5e: Don't access directly DMA device pointer net/mlx5e: Don't use termination table when redundant net/mlx5: Fix orthography errors in documentation net/mlx5: Use generic definition for UMR KLM alignment net/mlx5: Generalize name of UMR alignment definition net/mlx5: Remove unused UMR MTT definitions net/mlx5e: Add padding when needed in UMR WQEs net/mlx5: Remove unused ctx variables net/mlx5e: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper net/mlx5e: Remove unneeded io-mapping.h #include ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30net: phy: Add link between phy dev and mac devXiaolei Wang2-0/+16
If the external phy used by current mac interface is managed by another mac interface, it means that this network port cannot work independently, especially when the system suspends and resumes, the following trace may appear, so we should create a device link between phy dev and mac dev. WARNING: CPU: 0 PID: 24 at drivers/net/phy/phy.c:983 phy_error+0x20/0x68 Modules linked in: CPU: 0 PID: 24 Comm: kworker/0:2 Not tainted 6.1.0-rc3-00011-g5aaef24b5c6d-dirty #34 Hardware name: Freescale i.MX6 SoloX (Device Tree) Workqueue: events_power_efficient phy_state_machine unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x90 dump_stack_lvl from __warn+0xb4/0x24c __warn from warn_slowpath_fmt+0x5c/0xd8 warn_slowpath_fmt from phy_error+0x20/0x68 phy_error from phy_state_machine+0x22c/0x23c phy_state_machine from process_one_work+0x288/0x744 process_one_work from worker_thread+0x3c/0x500 worker_thread from kthread+0xf0/0x114 kthread from ret_from_fork+0x14/0x28 Exception stack(0xf0951fb0 to 0xf0951ff8) Signed-off-by: Xiaolei Wang <[email protected]> Tested-by: Florian Fainelli <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30Merge branch 'net-devlink-return-the-driver-name-in-devlink_nl_info_fill'Jakub Kicinski22-130/+26
Vincent Mailhol says: ==================== net: devlink: return the driver name in devlink_nl_info_fill The driver name is available in device_driver::name. Right now, drivers still have to report this piece of information themselves in their devlink_ops::info_get callback function. The goal of this series is to have the devlink core to report this information instead of the drivers. The first patch fulfills the actual goal of this series: modify devlink core to report the driver name and clean-up all drivers. Both have to be done in an atomic change to avoid attribute duplication. This same patch also removes the devlink_info_driver_name_put() function to prevent future drivers from reporting the driver name themselves. The second patch allows the core to call devlink_nl_info_fill() even if the devlink_ops::info_get() callback is NULL. This leads to the third and final patch which cleans up the drivers which have an empty info_get(). ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30net: devlink: clean-up empty devlink_ops::info_get()Vincent Mailhol3-22/+0
devlink_ops::info_get() is now optional and devlink will continue to report information even if that callback gets removed. Remove all the empty devlink_ops::info_get() callbacks from the drivers. Signed-off-by: Vincent Mailhol <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30net: devlink: make the devlink_ops::info_get() callback optionalVincent Mailhol1-7/+6
Some drivers only reported the driver name in their devlink_ops::info_get() callback. Now that the core provides this information, the callback became empty. For such drivers, just removing the callback would prevent the core from executing devlink_nl_info_fill() meaning that "devlink dev info" would not return anything. Make the callback function optional by executing devlink_nl_info_fill() even if devlink_ops::info_get() is NULL. N.B.: the drivers with devlink support which previously did not implement devlink_ops::info_get() will now also be able to report the driver name. Signed-off-by: Vincent Mailhol <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30net: devlink: let the core report the driver name instead of the driversVincent Mailhol22-106/+25
The driver name is available in device_driver::name. Right now, drivers still have to report this piece of information themselves in their devlink_ops::info_get callback function. In order to factorize code, make devlink_nl_info_fill() add the driver name attribute. Now that the core sets the driver name attribute, drivers are not supposed to call devlink_info_driver_name_put() anymore. Remove devlink_info_driver_name_put() and clean-up all the drivers using this function in their callback. Signed-off-by: Vincent Mailhol <[email protected]> Tested-by: Ido Schimmel <[email protected]> # mlxsw Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30dt-bindings: nfc: nxp,nci: Document NQ310 compatibleLuca Weiss1-1/+3
The NQ310 is another NFC chip from NXP, document the compatible in the bindings. Signed-off-by: Luca Weiss <[email protected]> Acked-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30Merge branch 'support-direct-read-from-region'Jakub Kicinski6-80/+215
Jacob Keller says: ==================== support direct read from region A long time ago when initially implementing devlink regions in ice I proposed the ability to allow reading from a region without taking a snapshot [1]. I eventually dropped this work from the original series due to size. Then I eventually lost track of submitting this follow up. This can be useful when interacting with some region that has some definitive "contents" from which snapshots are made. For example the ice driver has regions representing the contents of the device flash. If userspace wants to read the contents today, it must first take a snapshot and then read from that snapshot. This makes sense if you want to read a large portion of data or you want to be sure reads are consistently from the same recording of the flash. However if user space only wants to read a small chunk, it must first generate a snapshot of the entire contents, perform a read from the snapshot, and then delete the snapshot after reading. For such a use case, a direct read from the region makes more sense. This can be achieved by allowing the devlink region read command to work without a snapshot. Instead the portion to be read can be forwarded directly to the driver via a new .read callback. This avoids the need to read the entire region contents into memory first and avoids the software overhead of creating a snapshot and then deleting it. This series implements such behavior and hooks up the ice NVM and shadow RAM regions to allow it. [1] https://lore.kernel.org/netdev/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30ice: implement direct read for NVM and Shadow RAM regionsJacob Keller2-2/+75
Implement the .read handler for the NVM and Shadow RAM regions. This enables user space to read a small chunk of the flash without needing the overhead of creating a full snapshot. Update the documentation for ice to detail which regions have direct read support. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30ice: document 'shadow-ram' devlink regionJacob Keller1-0/+5
78ad87da9978 ("ice: devlink: add shadow-ram region to snapshot Shadow RAM") added support for the 'shadow-ram' devlink region, but did not document it in the ice devlink documentation. Fix this. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30ice: use same function to snapshot both NVM and Shadow RAMJacob Keller1-72/+23
The ice driver supports a region for both the flat NVM contents as well as the Shadow RAM which is a layer built on top of the flash during device initialization. These regions use an almost identical read function, except that the NVM needs to set the direct flag when reading, while Shadow RAM needs to read without the direct flag set. They each call ice_read_flat_nvm with the only difference being whether to set the direct flash flag. The NVM region read function also was fixed to read the NVM in blocks to avoid a situation where the firmware reclaims the lock due to taking too long. Note that the region snapshot function takes the ops pointer so the function can easily determine which region to read. Make use of this and re-use the NVM snapshot function for both the NVM and Shadow RAM regions. This makes Shadow RAM benefit from the same block approach as the NVM region. It also reduces code in the ice driver. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30devlink: support directly reading from region memoryJacob Keller4-17/+94
To read from a region, user space must currently request a new snapshot of the region and then read from that snapshot. This can sometimes be overkill if user space only reads a tiny portion. They first create the snapshot, then request a read, then destroy the snapshot. For regions which have a single underlying "contents", it makes sense to allow supporting direct reading of the region data. Extend the DEVLINK_CMD_REGION_READ to allow direct reading from a region if requested via the new DEVLINK_ATTR_REGION_DIRECT. If this attribute is set, then perform a direct read instead of using a snapshot. Direct read is mutually exclusive with DEVLINK_ATTR_REGION_SNAPSHOT_ID, and care is taken to ensure that we reject commands which provide incorrect attributes. Regions must enable support for direct read by implementing the .read() callback function. If a region does not support such direct reads, a suitable extended error message is reported. Signed-off-by: Jacob Keller <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30devlink: refactor region_read_snapshot_fill to use a callback functionJacob Keller1-9/+35
The devlink_nl_region_read_snapshot_fill is used to copy the contents of a snapshot into a message for reporting to userspace via the DEVLINK_CMG_REGION_READ netlink message. A future change is going to add support for directly reading from a region. Almost all of the logic for this new capability is identical. To help reduce code duplication and make this logic more generic, refactor the function to take a cb and cb_priv pointer for doing the actual copy. Add a devlink_region_snapshot_fill implementation that will simply copy the relevant chunk of the region. This does require allocating some storage for the chunk as opposed to simply passing the correct address forward to the devlink_nl_cmg_region_read_chunk_fill function. A future change to implement support for directly reading from a region without a snapshot will provide a separate implementation that calls the newly added devlink region operation. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30devlink: remove unnecessary parameter from chunk_fill functionJacob Keller1-8/+2
The devlink parameter of the devlink_nl_cmd_region_read_chunk_fill function is not used. Remove it, to simplify the function signature. Once removed, it is also obvious that the devlink parameter is not necessary for the devlink_nl_region_read_snapshot_fill either. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30devlink: find snapshot in devlink_nl_cmd_region_read_dumpitJacob Keller1-11/+14
The snapshot pointer is obtained inside of the function devlink_nl_region_read_snapshot_fill. Simplify this function by locating the snapshot upfront in devlink_nl_cmd_region_read_dumpit instead. This aligns with how other netlink attributes are handled, and allows us to exit slightly earlier if an invalid snapshot ID is provided. It also allows us to pass the snapshot pointer directly to the devlink_nl_region_read_snapshot_fill, and remove the now unused attrs parameter. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30devlink: report extended error message in region_read_dumpit()Jacob Keller1-4/+12
Report extended error details in the devlink_nl_cmd_region_read_dumpit() function, by using the extack structure from the netlink_callback. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30devlink: use min_t to calculate data_sizeJacob Keller1-4/+2
The calculation for the data_size in the devlink_nl_read_snapshot_fill function uses an if statement that is better expressed using the min_t macro. Signed-off-by: Jacob Keller <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30net: microchip: vcap: Change how the rule id is generatedHoratiu Vultur2-7/+1
Currently whenever a new rule id is generated, it picks up the next number bigger than previous id. So it would always be 1, 2, 3, etc. When the rule with id 1 will be deleted and a new rule will be added, it will have the id 4 and not id 1. In theory this can be a problem if at some point a rule will be added and removed ~0 times. Then no more rules can be added because there are no more ids. Change this such that when a new rule is added, search for an empty rule id starting with value of 1 as value 0 is reserved. Fixes: c9da1ac1c212 ("net: microchip: sparx5: Adding initial tc flower support for VCAP API") Signed-off-by: Horatiu Vultur <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30octeontx2-af: Simplify a size computation in rvu_npc_exact_init()Christophe JAILLET1-1/+1
We know that table_size = table->mem_table.depth * table->mem_table.ways, so use it instead, it is less verbose. Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/5230dabe27f48948a9fd0f50a62e2437b65e6a6e.1669378798.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30octeontx2-af: Fix the size of memory allocated for the 'id_bmap' bitmapChristophe JAILLET1-2/+2
This allocation is really spurious. The size of the bitmap is 'tot_ids' and it is used as such in the driver. So we could expect something like: table->id_bmap = devm_kcalloc(rvu->dev, BITS_TO_LONGS(table->tot_ids), sizeof(long), GFP_KERNEL); However, when the bitmap is allocated, we allocate: BITS_TO_LONGS(table->tot_ids) * table->tot_ids ~= table->tot_ids / 32 * table->tot_ids ~= table->tot_ids^2 / 32 It is proportional to the square of 'table->tot_ids' which seems to potentially be big. Allocate the expected amount of memory, and switch to the bitmap API to have it more straightforward. Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/ce2710771939065d68f95d86a27cf7cea7966365.1669378798.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30octeontx2-af: Use the bitmap API to allocate bitmapsChristophe JAILLET1-3/+3
Use devm_bitmap_zalloc() instead of hand-writing it. This also makes the comment "Allocate bitmap for 32 entry mcam" more explicit because now 32 is really used in the allocation function, instead of an obscure 'sizeof(long)'. Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/24177a9ee7043259448b735263d9cfd6a70e89a4.1669378798.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30octeontx2-af: Slightly simplify rvu_npc_exact_init()Christophe JAILLET1-2/+1
Use kzalloc() instead of kmalloc()/memset(). Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/60ea220ccf3b61963f7d5a97e3df2c76a5feb837.1669378798.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30octeontx2-af: Fix a potentially spurious error messageChristophe JAILLET1-1/+1
When this error message is displayed, we know that the all the bits in the bitmap are set. So, bitmap_weight() will return the number of bits of the bitmap, which is 'table->tot_ids'. It is unlikely that a bit will be cleared between mutex_unlock() and dev_err(), but, in order to simplify the code and avoid this possibility, just take 'table->tot_ids'. Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/5ce01c402f86412dc57884ff0994b63f0c5b3871.1669378798.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-29selftests/net: add csum offload testWillem de Bruijn3-0/+988
Test NIC hardware checksum offload: - Rx + Tx - IPv4 + IPv6 - TCP + UDP Optional features: - zero checksum 0xFFFF - checksum disable 0x0000 - transport encap headers - randomization See file header for detailed comments. Expected results differ depending on NIC features: - CHECKSUM_UNNECESSARY vs CHECKSUM_COMPLETE - NETIF_F_HW_CSUM (csum_start/csum_off) vs NETIF_F_IP(V6)_CSUM Signed-off-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-29net/mlx5e: Support devlink reload of IPsec coreLeon Romanovsky4-23/+16
Change IPsec initialization flow to allow future creation of hardware resources that should be released and allocated during devlink reload operation. As part of that change, update function signature to be void as no callers are actually interested in it. Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5e: TC, Add offload support for trap with additional actionsMaor Dickman3-19/+18
TC trap action offload is currently supported only when trap is the sole action in the flow. This patch remove this limitation by changing trap action offload to not use MLX5_ATTR_FLAG_SLOW_PATH flag and instead set the flow destination table explicitly to be the slow table. This will allow offload of the additional actions. TC flow example: tc filter add dev $REP2 protocol ip prio 2 root \ flower skip_sw dst_mac $mac0 \ action mirred egress redirect dev $REP3 \ action pedit ex munge eth dst set $mac2 pipe \ action trap Signed-off-by: Maor Dickman <[email protected]> Reviewed-by: Raed Salem <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5e: Do early return when setup vports dests for slow path flowRoi Dayan1-5/+8
Adding flow flag cases in setup vport dests before the slow path case is incorrect as the slow path should take precedence. Current code doesn't show this importance so make the slow path case return early and separate from the other cases and remove the redundant comparison of it in the sample case. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Chris Mi <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5: Remove redundant checkLeon Romanovsky1-3/+0
If ASO failed in creation, it won't be called to destroy either. The kernel coding pattern is to make sure that callers are calling to destroy only for valid objects. Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5e: Delete always true DMA checkLeon Romanovsky1-5/+5
DMA address always exists for MACsec ASO object. Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5e: Don't access directly DMA device pointerLeon Romanovsky1-1/+1
Use specialized helper to fetch DMA device pointer. Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5e: Don't use termination table when redundantRoi Dayan1-4/+28
Current code used termination table for each vport destination while it's only required for hairpin, i.e. uplink to uplink, or when vlan push on rx action being used. Fix to skip using termination table for vport destinations that do not require it. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5: Fix orthography errors in documentationRahul Rameshbabu1-41/+41
Improve general readability of the device driver documentation. Signed-off-by: Rahul Rameshbabu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Reviewed-by: Gal Pressman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5: Use generic definition for UMR KLM alignmentTariq Toukan3-7/+7
MLX5_UMR_KLM_ALIGNMENT is in units of number of entries, while MLX5_UMR_MTT_ALIGNMENT (generalized and renamed to MLX5_UMR_FLEX_ALIGNMENT) is in byte units. This is misleading and confusing. Replace this KLM definition with one based on the generic definition. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Gal Pressman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5: Generalize name of UMR alignment definitionTariq Toukan6-17/+16
Per the device spec, MLX5_UMR_MTT_ALIGNMENT is good not only for UMR MTT entries, but for all other entries as well, like KLMs and KSMs. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Gal Pressman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-11-29net/mlx5: Remove unused UMR MTT definitionsTariq Toukan1-2/+0
Defines MLX5_UMR_MTT_MASK and MLX5_UMR_MTT_MIN_CHUNK_SIZE are not in use. Remove them. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Gal Pressman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>