Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-01-19 (ice)
This series contains updates to ice driver only.
Tsotne and Anatolii implement new handling, and AdminQ command, for
firmware LLDP, adding a pending notification to allow for proper
cleanup between TC changes.
Amritha extends support for drop action outside of switchdev.
Siddaraju adjusts restriction for PTP HW clock adjustments.
Ani removes an unneeded non-null check and improves reporting of some link
modes to utilize more appropriate values.
Jesse adds checks to ensure PF VSI type.
Przemek combines duplicate checks of the same condition into one check.
Tony makes various cleanups to code: removes comments for cppcheck
suppressions, reduces scope of some variables, changes some return
statements to reflect an explicit 0 return, matches naming for function
declaration and definition, adds local variable for readability, and
fixes indenting.
Sergey separates DDP (Dynamic Device Personalization) code into its own
file.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add support for DSCP rewrite in Sparx5 driver. On egress DSCP is
rewritten from either classified DSCP, or frame DSCP. Classified DSCP is
determined by the Analyzer Classifier on ingress, and is mapped from
classified QoS class and DP level. Classification of DSCP is by default
enabled for all ports.
It is required that DSCP is trusted for the egress port *and* rewrite
table is not empty, in order to rewrite DSCP based on classified DSCP,
otherwise DSCP is always rewritten from frame DSCP.
classified_dscp = qos_dscp_map[8 * dp_level + qos_class];
if (active_mappings && dscp_is_trusted)
rewritten_dscp = classified_dscp
else
rewritten_dscp = frame_dscp
To rewrite DSCP to 20 for any frames with priority 7:
$ dcb apptrust set dev eth0 order dscp
$ dcb rewr add dev eth0 7:20 <-- not in iproute2/dcb yet
Signed-off-by: Daniel Machon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add support for rewrite of PCP and DEI, based on classified Quality of
Service (QoS) class and Drop-Precedence (DP) level.
The DCB rewrite table is queried for mappings between priority and
PCP/DEI. The classified DP level is then encoded in the DEI bit, if a
mapping for DEI exists.
Sparx5 has four DP levels, where by default, 0 is mapped to DE0 and 1-3
are mapped to DE1. If a mapping exists where DEI=1, then all classified
DP levels mapped to DE1 will set the DEI bit. The other way around for
DEI=0. Effectively, this means that the tagged DEI bit will reflect the
DP level for any mappings where DEI=1.
Map priority=1 to PCP=1 and DEI=1:
$ dcb rewr add dev eth0 pcp-prio 1:1de
Map priority=7 to PCP=2 and DEI=0
$ dcb rewr add dev eth0 pcp-prio 7:2nd
Also, sparx5_dcb_ieee_dscp_setdel() has been refactored, to work for
both APP and rewrite entries.
Signed-off-by: Daniel Machon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In the original implementation of dwmac5
commit 8bf993a5877e ("net: stmmac: Add support for DWMAC5 and implement Safety Features")
all safety features were enabled by default.
Later it seems some implementations didn't have support for all the
features, so in
commit 5ac712dcdfef ("net: stmmac: enable platform specific safety features")
the safety_feat_cfg structure was added to the callback and defined for
some platforms to selectively enable these safety features.
The problem is that only certain platforms were given that software
support. If the automotive safety package bit is set in the hardware
features register the safety feature callback is called for the platform,
and for platforms that didn't get a safety_feat_cfg defined this results
in the following NULL pointer dereference:
[ 7.933303] Call trace:
[ 7.935812] dwmac5_safety_feat_config+0x20/0x170 [stmmac]
[ 7.941455] __stmmac_open+0x16c/0x474 [stmmac]
[ 7.946117] stmmac_open+0x38/0x70 [stmmac]
[ 7.950414] __dev_open+0x100/0x1dc
[ 7.954006] __dev_change_flags+0x18c/0x204
[ 7.958297] dev_change_flags+0x24/0x6c
[ 7.962237] do_setlink+0x2b8/0xfa4
[ 7.965827] __rtnl_newlink+0x4ec/0x840
[ 7.969766] rtnl_newlink+0x50/0x80
[ 7.973353] rtnetlink_rcv_msg+0x12c/0x374
[ 7.977557] netlink_rcv_skb+0x5c/0x130
[ 7.981500] rtnetlink_rcv+0x18/0x2c
[ 7.985172] netlink_unicast+0x2e8/0x340
[ 7.989197] netlink_sendmsg+0x1a8/0x420
[ 7.993222] ____sys_sendmsg+0x218/0x280
[ 7.997249] ___sys_sendmsg+0xac/0x100
[ 8.001103] __sys_sendmsg+0x84/0xe0
[ 8.004776] __arm64_sys_sendmsg+0x24/0x30
[ 8.008983] invoke_syscall+0x48/0x114
[ 8.012840] el0_svc_common.constprop.0+0xcc/0xec
[ 8.017665] do_el0_svc+0x38/0xb0
[ 8.021071] el0_svc+0x2c/0x84
[ 8.024212] el0t_64_sync_handler+0xf4/0x120
[ 8.028598] el0t_64_sync+0x190/0x194
Go back to the original behavior, if the automotive safety package
is found to be supported in hardware enable all the features unless
safety_feat_cfg is passed in saying this particular platform only
supports a subset of the features.
Fixes: 5ac712dcdfef ("net: stmmac: enable platform specific safety features")
Reported-by: Ning Cai <[email protected]>
Signed-off-by: Andrew Halaney <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
CPT HW would trigger the CPT AF FLT interrupt when CPT engines
hits some uncorrectable errors and AF is the one which receives
the interrupt and recovers the engines.
This patch adds a mailbox for CPT VFs to request for CPT faulted
and recovered engines info.
Signed-off-by: Srujana Challa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The CN10K CPT coprocessor contains a context processor
to accelerate updates to the IPsec security association
contexts. The context processor contains a context cache.
This patch updates CPT LF ALLOC mailbox to config ctx_ilen
requested by VFs. CPT_LF_ALLOC:ctx_ilen is the size of
initial context fetch.
Signed-off-by: Srujana Challa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
CN10K CPT coprocessor includes a component named RXC which
is responsible for reassembly of inner IP packets. RXC has
the feature to evict oldest entries based on age/threshold.
The age/threshold is being set to minimum values to evict
all entries at the time of teardown.
This patch adds code to restore timeout and threshold config
after teardown sequence is complete as it is global config.
Signed-off-by: Nithin Dabilpuram <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Optimize CPT PF identification in mbox handling for faster
mbox response by doing it at AF driver probe instead of
every mbox message.
Signed-off-by: Srujana Challa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
On OcteonTX2 platform CPT instruction enqueue is only
possible via LMTST operations.
The existing FLR sequence mentioned in HRM requires
a dummy LMTST to CPT but LMTST can't be submitted from
AF driver. So, HW team provided a new sequence to avoid
dummy LMTST. This patch adds code for the same.
Signed-off-by: Srujana Challa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
On OcteonTX2 SoC, the admin function (AF) is the only one with all
priviliges to configure HW and alloc resources, PFs and it's VFs
have to request AF via mailbox for all their needs.
This patch adds a new mailbox for CPT VFs to request for CPT LF
reset.
Signed-off-by: Srujana Challa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When CPT engine has uncorrectable errors, it will get halted and
must be disabled and re-enabled. This patch adds code for the same.
Signed-off-by: Srujana Challa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-01-18
1) From Rahul,
1.1) extended range for PTP adjtime and adjphase
1.2) adjphase function to support hardware-only offset control
2) From Roi, code cleanup to the TC module.
3) From Maor, TC support for Geneve and GRE with VF tunnel offload
4) Cleanups and minor updates.
* tag 'mlx5-updates-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Use read lock for eswitch get callbacks
net/mlx5e: Remove redundant allocation of spec in create indirect fwd group
net/mlx5e: Support Geneve and GRE with VF tunnel offload
net/mlx5: E-Switch, Fix typo for egress
net/mlx5e: Warn when destroying mod hdr hash table that is not empty
net/mlx5e: TC, Use common function allocating flow mod hdr or encap mod hdr
net/mlx5e: TC, Add tc prefix to attach/detach hdr functions
net/mlx5e: TC, Pass flow attr to attach/detach mod hdr functions
net/mlx5e: Add warning when log WQE size is smaller than log stride size
net/mlx5e: Fail with messages when params are not valid for XSK
net/mlx5: E-switch, Remove redundant comment about meta rules
net/mlx5: Add hardware extended range support for PTP adjtime and adjphase
net/mlx5: Add adjphase function to support hardware-only offset control
net/mlx5: Suppress error logging on UCTX creation
net/mlx5e: Suppress Send WQEBB room warning for PAGE_SIZE >= 16KB
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Remove port-specific health reporter destroy function as it is
currently the same as the instance one so no longer needed. Inline
__devlink_health_reporter_destroy() as it is no longer called from
multiple places.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Similar to other devlink objects, protect the reporters list
by devlink instance lock. Alongside add unlocked versions
of health reporter create/destroy functions and use them in drivers
on call paths where the instance lock is held.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The MLX5E_LOCKED_FLOW flag is not checked anywhere now so remove it
entirely.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The fact that devlink instance lock is held over mlx5 auxiliary devices
probe and remove routines brought a need to conditionally take devlink
instance lock there. The code is checking a MLX5E_LOCKED_FLOW flag
in mlx5 priv struct.
This is racy and may lead to access devlink objects without holding
instance lock or deadlock.
To avoid this, the only lock-wise sane solution is to make the
devlink entities created by the auxiliary device independent on
the original pci devlink instance. Create devlink instance for the
auxiliary device and put the uplink port instance there alongside with
the port health reporters.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Similar to other devlink objects, convert the linecards list to be
protected by devlink instance lock. Alongside with that rename the
create/destroy() functions to devl_* to indicate the devlink instance
lock needs to be held while calling them.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In the am65_cpsw_init_serdes_phy() function, the error handling for the
call to the devm_of_phy_get() function misses the case where the return
value of devm_of_phy_get() is ERR_PTR(-EPROBE_DEFER). Proceeding without
handling this case will result in a crash when the "phy" pointer with
this value is dereferenced by phy_init() in am65_cpsw_enable_phy().
Fix this by adding appropriate error handling code.
Reported-by: Geert Uytterhoeven <[email protected]>
Fixes: dab2b265dd23 ("net: ethernet: ti: am65-cpsw: Add support for SERDES configuration")
Suggested-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Siddharth Vadapalli <[email protected]>
Reviewed-by: Roger Quadros <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
smatch reports inconsistent indenting due to an extra space; remove it to
resolve the issue.
smatch warnings:
drivers/net/ethernet/intel/ice/ice_lib.c:1673 ice_vsi_alloc_ring_stats() warn: inconsistent indenting
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Based on previous feedback[1], introduce a local var to make things more
readable.
[1] https://lore.kernel.org/netdev/20220315203218.607f612b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
|
|
The parameter name in the function declaration and definition do not
match; adjust the naming for consistency and to avoid confusion.
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Previous checks, and goto, will catch all errors meaning these returns
will only return 0; explicitly return 0 for these cases.
Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
|
|
There are some places where the scope of a variable can
be reduced, so do that.
Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
|
|
Currently, ice_flex_pipe.c includes the DDP loading functions
and has grown large. Although flexible processing support
code is related to DDP loading, these parts are distinct.
Move the DDP loading functionality from ice_flex_pipe.c to
a separate file.
Signed-off-by: Sergey Temerkhanov <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The use of suppressions for cppcheck in the kernel does not look to be
standard as the ice driver is the only one doing it. Remove the
comments/suppressions.
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Combine if statements setting the same link speed together.
Suggested-by: Tony Nguyen <[email protected]>
Signed-off-by: Przemek Kitszel <[email protected]>
Acked-by: Anirudh Venkataramanan <[email protected]>
Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Commit 2736d94f351b ("ethtool: Added support for 50Gbps per lane link modes")
in v5.1 added (among other things) support for 100G CR2/KR2/SR2 link modes.
Advertise these link modes if the firmware reports the corresponding PHY types.
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Signed-off-by: Przemek Kitszel <[email protected]>
Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
There were a few places we had missed checking the VSI type to make sure
it was definitely a PF VSI, before calling setup functions intended only
for the PF VSI.
This doesn't fix any explicit bugs but cleans up the code in a few
places and removes one explicit != vsi->type check that can be
superseded by this code (it's a super set)
Signed-off-by: Jesse Brandeburg <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Remove a redundant null check, as vsi could not be null at this point.
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Signed-off-by: Przemek Kitszel <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The PHY provides only 39b timestamp. With current timing
implementation, we discard lower 7b, leaving 32b timestamp.
The driver reconstructs the full 64b timestamp by correlating the
32b timestamp with cached_time for performance. The reconstruction
algorithm does both forward & backward interpolation.
The 32b timeval has overflow duration of 2^32 counts ~= 4.23 second.
Due to interpolation in both direction, its now ~= 2.125 second
IIRC, going with at least half a duration, the cached_time is updated
with periodic thread of 1 second (worst-case) periodicity.
But the 1 second periodicity is based on System-timer.
With PPB adjustments, if the 1588 timers increments at say
double the rate, (2s in-place of 1s), the Nyquist rate/half duration
sampling/update of cached_time with 1 second periodic thread will
lead to incorrect interpolations.
Hence we should restrict the PPB adjustments to at least half duration
of cached_time update which translates to 500,000,000 PPB.
Since the periodicity of the cached-time system thread can vary,
it is good to have some buffer time and considering practicality of
PPB adjustments, limiting the max_adj to 100,000,000.
Signed-off-by: Siddaraju DH <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Currently the drop action is supported only in switchdev mode.
Add support for offloading receive filters with action drop
in ADQ/non-ADQ modes. This is in addition to other actions
such as forwarding to a VSI (ADQ) or a queue (ADQ/non-ADQ).
Also renamed 'ch_vsi' to 'dest_vsi' as it is valid for multiple
actions such as forward to vsi/queue which may/may not create a
channel vsi.
Reviewed-by: Sridhar Samudrala <[email protected]>
Signed-off-by: Amritha Nambiar <[email protected]>
Tested-by: Bharathi Sreenivas <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
If the number of Traffic Classes (TC) is decreased, the FW will no
longer remove TC nodes, but will send a pending change notification. This
will allow RDMA to destroy corresponding Control QP markers. After RDMA
finishes outstanding operations, the ice driver will send an execute MIB
Pending change admin queue command to FW to finish DCB configuration
change.
The FW will buffer all incoming Pending changes, so there can be only
one active Pending change.
RDMA driver guarantees to remove Control QP markers within 5000 ms.
Hence, LLDP response timeout txTTL (default 30 sec) will be met.
In the case of a Pending change, LLDP MIB Change Event (opcode 0x0A01) will
contain the whole new MIB. But Get LLDP MIB (opcode 0x0A00) AQ call would
still return an old MIB, as the Pending change hasn't been applied yet.
Add ice_get_dcb_cfg_from_mib_change() function to retrieve DCBX config
from LLDP MIB Change Event's buffer for Pending changes.
Co-developed-by: Dave Ertman <[email protected]>
Signed-off-by: Dave Ertman <[email protected]>
Signed-off-by: Anatolii Gerasymenko <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In DCB Willing Mode (FW managed LLDP), when the link partner changes
configuration which requires fewer TCs, the TCs that are no longer
needed are suspended by EMP FW, removed, and never resumed. This occurs
before a MIB change event is indicated to SW. The permanent suspension and
removal of these TC nodes in the scheduler prevents RDMA from being able
to destroy QPs associated with this TC, requiring a CORE reset to recover.
A new DCBX configuration change flow is defined to allow SW driver and
other SW components (RDMA) to properly adjust to the configuration
changes before they are taking effect in HW. This flow includes a
two-way handshake between EMP FW<->LAN SW<->RDMA SW.
List of changes:
- Add 'Execute Pending LLDP MIB' AQC.
- Add 'Pending Event Enable' bit.
- Add additional logic to ignore Pending Event Enable' request
while 'LLDP MIB Chnage' event is disabled.
- Add 'Execute Pending LLDP MIB' AQC sending function to FW,
which is needed to take place MIB Event change.
Signed-off-by: Tsotne Chakhvadze <[email protected]>
Co-developed-by: Karen Sornek <[email protected]>
Signed-off-by: Karen Sornek <[email protected]>
Co-developed-by: Dave Ertman <[email protected]>
Signed-off-by: Dave Ertman <[email protected]>
Co-developed-by: Anatolii Gerasymenko <[email protected]>
Signed-off-by: Anatolii Gerasymenko <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Deciding if to probe of PHYs using C45 is now determine by if the bus
provides the C45 read method. This makes probe_capabilities redundant
so remove it.
Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: Michael Walle <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Acked-by: Andrew Jeffery <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Some C22 PHYs do bad things when there are C45 transactions on the
bus. In order to handle this, the bus needs to be scanned first for
C22 at all addresses, and then C45 scanned for all addresses.
The Marvell pxa168 driver scans a specific address on the bus to find
its PHY. This is a C22 only device, so update it to use the c22
helper.
Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: Michael Walle <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net: mlx5: eliminate anonymous module_init & module_exit
net/mlx5: E-switch, Fix switchdev mode after devlink reload
net/mlx5e: Protect global IPsec ASO
net/mlx5e: Remove optimization which prevented update of ESN state
net/mlx5e: Set decap action based on attr for sample
net/mlx5e: QoS, Fix wrongfully setting parent_element_id on MODIFY_SCHEDULING_ELEMENT
net/mlx5: E-switch, Fix setting of reserved fields on MODIFY_SCHEDULING_ELEMENT
net/mlx5e: Remove redundant xsk pointer check in mlx5e_mpwrq_validate_xsk
net/mlx5e: Avoid false lock dependency warning on tc_ht even more
net/mlx5: fix missing mutex_unlock in mlx5_fw_fatal_reporter_err_work()
====================
Link: https://lore.kernel.org/r/
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The commit 4af1b64f80fb ("octeontx2-pf: Fix lmtst ID used in aura
free") uses the get/put_cpu() to protect the usage of percpu pointer
in ->aura_freeptr() callback, but it also unnecessarily disable the
preemption for the blockable memory allocation. The commit 87b93b678e95
("octeontx2-pf: Avoid use of GFP_KERNEL in atomic context") tried to
fix these sleep inside atomic warnings. But it only fix the one for
the non-rt kernel. For the rt kernel, we still get the similar warnings
like below.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by swapper/0/1:
#0: ffff800009fc5fe8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
#1: ffff000100c276c0 (&mbox->lock){+.+.}-{3:3}, at: otx2_init_hw_resources+0x8c/0x3a4
#2: ffffffbfef6537e0 (&cpu_rcache->lock){+.+.}-{2:2}, at: alloc_iova_fast+0x1ac/0x2ac
Preemption disabled at:
[<ffff800008b1908c>] otx2_rq_aura_pool_init+0x14c/0x284
CPU: 20 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc3-rt1-yocto-preempt-rt #1
Hardware name: Marvell OcteonTX CN96XX board (DT)
Call trace:
dump_backtrace.part.0+0xe8/0xf4
show_stack+0x20/0x30
dump_stack_lvl+0x9c/0xd8
dump_stack+0x18/0x34
__might_resched+0x188/0x224
rt_spin_lock+0x64/0x110
alloc_iova_fast+0x1ac/0x2ac
iommu_dma_alloc_iova+0xd4/0x110
__iommu_dma_map+0x80/0x144
iommu_dma_map_page+0xe8/0x260
dma_map_page_attrs+0xb4/0xc0
__otx2_alloc_rbuf+0x90/0x150
otx2_rq_aura_pool_init+0x1c8/0x284
otx2_init_hw_resources+0xe4/0x3a4
otx2_open+0xf0/0x610
__dev_open+0x104/0x224
__dev_change_flags+0x1e4/0x274
dev_change_flags+0x2c/0x7c
ic_open_devs+0x124/0x2f8
ip_auto_config+0x180/0x42c
do_one_initcall+0x90/0x4dc
do_basic_setup+0x10c/0x14c
kernel_init_freeable+0x10c/0x13c
kernel_init+0x2c/0x140
ret_from_fork+0x10/0x20
Of course, we can shuffle the get/put_cpu() to only wrap the invocation
of ->aura_freeptr() as what commit 87b93b678e95 does. But there are only
two ->aura_freeptr() callbacks, otx2_aura_freeptr() and
cn10k_aura_freeptr(). There is no usage of perpcu variable in the
otx2_aura_freeptr() at all, so the get/put_cpu() seems redundant to it.
We can move the get/put_cpu() into the corresponding callback which
really has the percpu variable usage and avoid the sprinkling of
get/put_cpu() in several places.
Fixes: 4af1b64f80fb ("octeontx2-pf: Fix lmtst ID used in aura free")
Signed-off-by: Kevin Hao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Add fixed_phy support at 1Gbps full duplex for the lan7431 device
if a phy not found over MDIO. Tested with a MAC to MAC connection
from LAN7431 to a KSZ9893 switch. This avoids the Driver open error
in LAN743x. TX delay and internal CLK125 generation is already
enabled in EEPROM.
Signed-off-by: Pavithra Sathyanarayanan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Add logic to read the Phy interface from MAC_CR register for LAN743x
driver.
Checks for the LAN7430/31 or pci11x1x devices and the adapter
interface is updated accordingly. For LAN7431, adapter interface is set
based on Bit 19 of MAC_CR register as MII or RGMII which removes the
forced RGMII/GMII configurations in lan743x_phy_open().
Signed-off-by: Pavithra Sathyanarayanan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Remove the MII/RGMII Selection settings in driver as it is preset
by the EEPROM and has the required configurations before the driver
loads for LAN743x.
Signed-off-by: Pavithra Sathyanarayanan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
napi_synchronize() from enetc_stop() waits until the softirq has
finished execution and no longer wants to be rescheduled. However under
high traffic load, this will never happen, and the interface can never
be closed.
The problem is the fact that the NAPI poll routine is written to update
the consumer index which makes the device want to put more buffers in
the RX ring, which restarts the madness again.
Browsing around, it seems that some drivers like i40e keep a bit
(__I40E_VSI_DOWN) which they use as communication between the control
path and the data path. But that isn't my first choice, because
complications ensue - since the enetc hardirq may trigger while we are
in a theoretical ENETC_DOWN state, it may happen that enetc_msix() masks
it, but enetc_poll() never unmasks it. To prevent a stall in that case,
one would need to schedule all NAPI instances when ENETC_DOWN gets
cleared, to process what's pending.
I find it more desirable for the control path - enetc_stop() - to just
quiesce the RX ring and let the softirq finish what remains there,
without any explicit communication, just by making hardware not provide
any more packets.
This seems possible with the Enable bit of the RX BD ring (RBaMR[EN]).
I can't seem to find an exact definition of what this bit does, but when
the RX ring is disabled, the port seems to no longer update the producer
index, and not react to software updates of the consumer index.
In fact, the RBaMR[EN] bit is already toggled by the driver, but too
late for what we want:
enetc_close()
-> enetc_stop()
-> napi_synchronize()
-> enetc_clear_bdrs()
-> enetc_clear_rxbdr()
The enetc_clear_bdrs() function contains not only logic to disable the
RX and TX rings, but also logic to wait for the TX ring stop being busy.
We split enetc_clear_bdrs() into enetc_disable_bdrs() and
enetc_wait_bdrs(). One needs to run before napi_synchronize() and the
other after (NAPI also processes TX completions, so we maximize our
chances of not waiting for the ENETC_TBSR_BUSY bit - unless a packet is
stuck for some reason, ofc).
We also split off enetc_enable_bdrs() from enetc_setup_bdrs(), and call
this from the mirror position in enetc_start() compared to enetc_stop(),
i.e. right before netif_tx_start_all_queues().
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Offloading a BPF program to the RX path of the driver suffers from the
same problems as the PTP reconfiguration - improper error checking can
leave the driver in an invalid state, and the link on the PHY is lost.
Reuse the enetc_reconfigure() procedure, but here, we need to run some
code in the middle of the ring reconfiguration procedure - while the
interface is still down. Introduce a callback which makes that possible.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the convention from this driver, which is to name "struct
net_device *" as "ndev", and the convention from other drivers, to name
"struct netdev_bpf *" as "bpf".
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The crude enetc_stop() -> enetc_open() mechanism suffers from 2
problems:
1. improper error checking
2. it involves phylink_stop() -> phylink_start() which loses the link
Right now, the driver is prepared to offer a better alternative: a ring
reconfiguration procedure which takes the RX BD size (normal or
extended) as argument. It allocates new resources (failing if that
fails), stops the traffic, and assigns the new resources to the rings.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We want to introduce a fast interface reconfiguration procedure, which
involves temporarily stopping the rings.
But we want enetc_start() and enetc_stop() to not restart PHY autoneg,
because that can take a few seconds until it completes again.
So we need part of enetc_start() and enetc_stop(), but not all of them.
Move phylink_start() right next to phylink_of_phy_connect(), and
phylink_stop() right next to phylink_disconnect_phy(), both still in
ndo_open() and ndo_stop().
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We have a few instances in the enetc driver where the ring resources
(BD ring iomem, software BD ring, software TSO headers, basically
everything except RX buffers) need to be reallocated. For example, when
RX timestamping is enabled, the RX BD format changes to an extended one
(twice as large).
Currently, this is done using a simplistic enetc_close() -> enetc_open()
procedure. But this is quite crude, since it also invokes phylink_stop()
-> phylink_start(), the link is lost, and a few seconds need to pass for
autoneg to complete again.
In fact it's bad also due to the improper (yolo) error checking. In case
we fail to allocate new resources, we've already freed the old ones, so
the interface is more or less stuck.
To avoid that, we need a system where reconfiguration is possible in a
way in which resources are allocated upfront. This means that there will
be a higher memory usage temporarily, but the assignment of resources to
rings can be done when both the old and new resources are still available.
Introduce a struct enetc_bdr_resource which holds the resources for a
ring, be it RX or TX. This structure duplicates a lot of fields from
struct enetc_bdr (and access to the same fields in the ring structure
was left duplicated, to not change cache characteristics in the fast
path).
When enetc_alloc_tx_resources() runs, it returns an array of resource
elements (one per TX ring), in addition to the existing priv->tx_res.
To populate priv->tx_res with that array, one must call
enetc_assign_tx_resources(), and this also frees the old resources.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Extended RX buffer descriptors are necessary if they carry RX
timestamps, which will be true when PTP timestamping is enabled.
Right now, the rx_ring->ext_en is set from the function that allocates
ring resources (enetc_alloc_rx_resources() -> enetc_alloc_rxbdr()), and
also used later, in enetc_setup_rxbdr(). It is also used in the
enetc_rxbd() and enetc_rxbd_next() fast path helpers.
We want to decouple resource allocation from BD ring setup, but both
procedures depend on BD size (extended or not). Move the "extended"
boolean to enetc_open() and pass it both to the RX allocation procedure
as well as to the RX ring setup procedure. The latter will set
rx_ring->ext_en from now on.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The call path in enetc_close() is:
enetc_close()
-> enetc_free_rxtx_rings()
-> enetc_free_tx_ring()
-> enetc_free_tx_frame()
-> enetc_free_tx_resources()
-> enetc_free_txbdr()
-> enetc_free_tx_frame()
The enetc_free_tx_frame() function is written such that the second call
exits without doing anything, but nonetheless, it is completely
redundant. Delete it. This makes the TX teardown path more similar to
the RX one, where rx_swbd freeing is done in enetc_free_rx_ring(), not
in enetc_free_rxbdr().
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The call path in enetc_close() is:
enetc_close()
-> enetc_free_rxtx_rings()
-> enetc_free_rx_ring()
-> tests whether rx_ring->rx_swbd is NULL
-> enetc_free_tx_ring()
-> tests whether tx_ring->tx_swbd is NULL
-> enetc_free_rx_resources()
-> enetc_free_rxbdr()
-> sets rxr->rx_swbd to NULL
-> enetc_free_tx_resources()
-> enetc_free_txbdr()
-> setx txr->tx_swbd to NULL
From the above, it is clear that due to the function ordering, the
checks for NULL are redundant, since the software buffer descriptor
arrays have not yet been set to NULL. Drop these checks.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This is a refactoring change which introduces the opposite function of
enetc_dma_alloc_bdr().
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|