|
Only move whole files and fix the Makefile and a bunch of includes.
Some changes to the ice_devlink file were made even in the representor
part (Tx topology), so keep this as the final patch to avoid messing up
rebasing.
After moving to the devlink folder there is no need for such long names
for these files. Rename them to simply devlink.
Reviewed-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Fixes Coccinelle/coccicheck warnings reported by newline_in_nl_msg.cocci.
Signed-off-by: Thorsten Blum <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
New E810 firmware supports the corresponding functionality, so the driver
allows PFs to subscribe to the same switch recipes. When a PF is done
with a switch recipe, it can ask firmware to free that switch recipe.
When users configure a rule to PFn into the E810 switch component and there
is no existing recipe matching this rule's pattern, the driver requests
firmware to allocate and return a new recipe resource for the rule by
calling ice_add_sw_recipe() and ice_alloc_recipe(). If there is an existing
recipe matching this rule's pattern with a different key value, or this is
a second, identical rule to PFm into the switch component, the driver looks
up this recipe by calling ice_find_recp() and tells firmware to share the
same recipe resource by calling ice_subscribable_recp_shared() and
ice_subscribe_recipe().
When firmware detects that all subscribing PFs have freed the switch
recipe, firmware will free the switch recipe so that it can be reused.
This feature also fixes a problem where all switch recipes would eventually
be exhausted because switch recipes could not be freed, as freeing a shared
recipe could potentially break other PFs that were using it.
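The subscribe/free lifecycle described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct recipe, recipe_subscribe() and recipe_release() are hypothetical names, the subscriber set is modelled as an 8-bit mask, and in the real device the bookkeeping is done by firmware, not by this code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one switch recipe that several PFs may share. */
struct recipe {
	bool allocated;      /* recipe resource is currently in use */
	uint8_t subscribers; /* bit n set: PF n is subscribed (n in 0..7) */
};

/* PF 'pf' starts using the recipe; the first subscriber allocates it. */
void recipe_subscribe(struct recipe *r, unsigned int pf)
{
	r->allocated = true;
	r->subscribers |= (uint8_t)(1u << pf);
}

/*
 * PF 'pf' is done with the recipe. The resource is only freed once no
 * subscriber is left, so one PF cannot break the others.
 * Returns true when the recipe was actually freed.
 */
bool recipe_release(struct recipe *r, unsigned int pf)
{
	r->subscribers &= (uint8_t)~(1u << pf);
	if (r->subscribers)
		return false;
	r->allocated = false;
	return true;
}
```

With two subscribers, the first release leaves the recipe allocated and only the second one frees it for reuse.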
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Andrii Staikov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Steven Zou <[email protected]>
Tested-by: Mayank Sharma <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
This is a cleanup. It is unnecessary to have this function just to call
another function.
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Michal Schmidt <[email protected]>
Reviewed-by: Sai Krishna <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Reviewed-by: Kalesh AP <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The PTP hardware semaphore (PFTSYN_SEM) is used to synchronize
operations that program the PTP timers. The operations involve issuing
commands to the sideband queue. The E810 does not have a hardware
sideband queue, so the admin queue is used. The admin queue is slow.
I have observed delays of hundreds of milliseconds waiting for
ice_sq_done.
When phc2sys reads the time from the ice PTP clock and PFTSYN_SEM is
held by a task performing one of the slow operations, ice_ptp_lock can
easily time out. phc2sys gets -EBUSY and the kernel prints:
ice 0000:XX:YY.0: PTP failed to get time
These messages appear once every few seconds, causing log spam.
The E810 datasheet recommends an algorithm for reading the upper 64 bits
of the GLTSYN_TIME register. It matches what's implemented in
ice_ptp_read_src_clk_reg. It is robust against wrap-around, but not
necessarily against the concurrent setting of the register (with
GLTSYN_CMD_{INIT,ADJ}_TIME commands). Perhaps that's why
ice_ptp_gettimex64 also takes PFTSYN_SEM.
The race with time setters can be prevented without relying on the PTP
hardware semaphore. Using the "ice_adapter" from the previous patch,
we can have a common spinlock for the PFs that share the clock hardware.
It will protect the reading and writing to the GLTSYN_TIME register.
The writing is performed indirectly, by the hardware, as a result of
the driver writing GLTSYN_CMD_SYNC in ice_ptp_exec_tmr_cmd. I wasn't
sure if the ice_flush there is enough to make sure GLTSYN_TIME has been
updated, but it works well in my testing.
My test code can be seen here:
https://gitlab.com/mschmidt2/linux/-/commits/ice-ptp-host-side-lock-10
It consists of:
- kernel threads reading the time in a busy loop and looking at the
deltas between consecutive values, reporting new maxima;
- a shell script that sets the time repeatedly;
- a bpftrace probe to produce a histogram of the measured deltas.
Without the spinlock ptp_gltsyn_time_lock, it is easy to see tearing.
Deltas in the [2G, 4G) range appear in the histograms.
With the spinlock added, there is no tearing and the biggest delta I saw
was in the range [1M, 2M), that is under 2 ms.
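To illustrate the kind of wrap-around-safe read mentioned above, here is a minimal userspace sketch. It is not the driver's code: struct fake_timer, read_lo(), read_hi() and read_time_wrap_safe() are hypothetical names, and the "hardware" is simulated by a counter that advances on every register read. The sketch assumes the counter advances by much less than 2^32 between reads.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a free-running 64-bit timer exposed as two
 * 32-bit registers. Every register read advances the timer by 'step'.
 */
struct fake_timer {
	uint64_t now;
	uint64_t step;
};

uint32_t read_lo(struct fake_timer *t)
{
	uint32_t v = (uint32_t)t->now;

	t->now += t->step;
	return v;
}

uint32_t read_hi(struct fake_timer *t)
{
	uint32_t v = (uint32_t)(t->now >> 32);

	t->now += t->step;
	return v;
}

/*
 * Read lo, hi, then lo again. If the second lo is smaller than the
 * first, the low word wrapped in between and hi may belong to either
 * side of the wrap, so read both halves once more.
 */
uint64_t read_time_wrap_safe(struct fake_timer *t)
{
	uint32_t lo = read_lo(t);
	uint32_t hi = read_hi(t);
	uint32_t lo2 = read_lo(t);

	if (lo2 < lo) {
		lo = read_lo(t);
		hi = read_hi(t);
	}
	return ((uint64_t)hi << 32) | lo;
}
```

This only handles the wrap of the low word. It does nothing about a concurrent write to the timer, which is the case the shared spinlock is meant to cover.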
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Michal Schmidt <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
There is a need for synchronization between ice PFs on the same physical
adapter.
Add a "struct ice_adapter" for holding data shared between PFs of the
same multifunction PCI device. The struct is refcounted - each ice_pf
holds a reference to it.
Its first use will be for PTP. I expect it will be useful also to
improve the ugliness that is ice_prot_id_tbl.
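The idea of a refcounted object shared by all functions of one device can be sketched in userspace as follows. This is a simplified illustration, not the driver's implementation: struct adapter, adapter_get(), adapter_put() and the integer key are hypothetical, and the list is not protected by any lock, which real code would need.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shared state, looked up by an opaque device key. */
struct adapter {
	unsigned long key;
	unsigned int refcount;
	struct adapter *next;
};

struct adapter *adapters; /* list head; real code would need locking */

/* Return the adapter for 'key', creating it on first use. */
struct adapter *adapter_get(unsigned long key)
{
	struct adapter *a;

	for (a = adapters; a; a = a->next) {
		if (a->key == key) {
			a->refcount++;
			return a;
		}
	}
	a = calloc(1, sizeof(*a));
	if (!a)
		return NULL;
	a->key = key;
	a->refcount = 1;
	a->next = adapters;
	adapters = a;
	return a;
}

/* Drop one reference; the last put unlinks and frees the adapter. */
void adapter_put(struct adapter *a)
{
	struct adapter **pp;

	if (--a->refcount)
		return;
	for (pp = &adapters; *pp; pp = &(*pp)->next) {
		if (*pp == a) {
			*pp = a->next;
			break;
		}
	}
	free(a);
}
```

Two callers using the same key get the same pointer, so anything stored in the struct (a lock, for example) is shared between them.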
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Michal Schmidt <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add support for creating PFCP filters in switchdev mode. Add support
for parsing PFCP-specific tc options: S flag and SEID.
To create a PFCP filter, a special netdev must be created and passed
to tc command:
ip link add pfcp0 type pfcp
tc filter add dev eth0 ingress prio 1 flower pfcp_opts \
1:123/ff:fffffffffffffff0 skip_hw action mirred egress redirect \
dev pfcp0
Changes in iproute2 [1] are required to be able to use pfcp_opts in tc.
ICE COMMS package is required to create a filter as it contains PFCP
profiles.
Link: https://lore.kernel.org/netdev/[email protected] [1]
Signed-off-by: Marcin Szycik <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
FLOW_DISSECTOR_KEY_ENC_OPTS can be used for multiple headers, but currently
it is treated as GTP-exclusive in ice. Rename ICE_TC_FLWR_FIELD_ENC_OPTS to
ICE_TC_FLWR_FIELD_GTP_OPTS and check for tunnel type earlier. After this
refactor, it is easier to add new headers using FLOW_DISSECTOR_KEY_ENC_OPTS
- instead of checking tunnel type in ice_tc_count_lkups() and
ice_tc_fill_tunnel_outer(), it needs to be checked only once, in
ice_parse_tunnel_attr().
Signed-off-by: Marcin Szycik <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There are, especially with multi-attr arrays, many cases
of needing to iterate all attributes of a specific type
in a netlink message or a nested attribute. Add specific
macros to support that case.
Also convert many instances using this spatch:
@@
iterator nla_for_each_attr;
iterator name nla_for_each_attr_type;
identifier nla;
expression head, len, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_attr(nla, head, len, rem)
+nla_for_each_attr_type(nla, ATTR, head, len, rem)
{
<... T x; ...>
-if (nla_type(nla) == ATTR) {
...
-}
}
@@
identifier nla;
iterator nla_for_each_nested;
iterator name nla_for_each_nested_type;
expression attr, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_nested(nla, attr, rem)
+nla_for_each_nested_type(nla, ATTR, attr, rem)
{
<... T x; ...>
-if (nla_type(nla) == ATTR) {
...
-}
}
@@
iterator nla_for_each_attr;
iterator name nla_for_each_attr_type;
identifier nla;
expression head, len, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_attr(nla, head, len, rem)
+nla_for_each_attr_type(nla, ATTR, head, len, rem)
{
<... T x; ...>
-if (nla_type(nla) != ATTR) continue;
...
}
@@
identifier nla;
iterator nla_for_each_nested;
iterator name nla_for_each_nested_type;
expression attr, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_nested(nla, attr, rem)
+nla_for_each_nested_type(nla, ATTR, attr, rem)
{
<... T x; ...>
-if (nla_type(nla) != ATTR) continue;
...
}
Although I had to undo one bad change this made, and
I also adjusted some other code for whitespace and to
use direct variable initialization now.
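The shape of such a type-filtering iterator can be shown with a userspace analogue. The names below (struct attr, for_each_attr, for_each_attr_type, sum_values_of_type) are hypothetical and work on a plain array; they are not the kernel's netlink definitions, only a sketch of the same pattern.

```c
#include <stddef.h>

/* Hypothetical attribute: a type tag plus a payload. */
struct attr {
	int type;
	int value;
};

/* Plain iteration over an array of n attributes. */
#define for_each_attr(pos, head, n) \
	for ((pos) = (head); (pos) < (head) + (n); (pos)++)

/* Same loop, but the body only runs for attributes of one type. */
#define for_each_attr_type(pos, attr_type, head, n) \
	for_each_attr(pos, head, n) \
		if ((pos)->type == (attr_type))

/* Sum the payloads of all attributes with the given type. */
int sum_values_of_type(const struct attr *attrs, size_t n, int type)
{
	const struct attr *pos;
	int sum = 0;

	for_each_attr_type(pos, type, attrs, n)
		sum += pos->value;
	return sum;
}
```

Because the macro ends in an if, a loop body written without braces must not be followed by an else, which would bind to the hidden if.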
Signed-off-by: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Switch the Intel networking drivers to use the new power management ops
declaration formats and macros, which allows us to drop __maybe_unused,
as well as a bunch of ifdef checking CONFIG_PM.
This is safe to do because the compiler drops the unused functions,
verified by checking for any of the power management function symbols
being present in System.map for a build without CONFIG_PM.
If a driver has runtime PM, define the ops with pm_ptr(), and if the
driver has Simple PM, use pm_sleep_ptr(), as well as the new versions of
the macros for declaring the members of the pm_ops structs.
Checked with network-enabled allnoconfig, allyesconfig, allmodconfig on
x86_64.
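Why this lets the #ifdef and __maybe_unused go away can be sketched in userspace. The example assumes a hypothetical FEATURE_PM switch in place of the real config option, and struct pm_ops, demo_suspend(), demo_resume() and demo_get_pm_ops() are made up for illustration. The ops stay visible to the compiler, so they are type-checked and count as used, but with the feature off the pointer becomes NULL and the dead code can be discarded.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel config option such as CONFIG_PM. */
#define FEATURE_PM 0

/* Evaluates to ptr when cond is true, otherwise to NULL. */
#define PTR_IF(cond, ptr) ((cond) ? (ptr) : NULL)

struct pm_ops {
	int (*suspend)(void);
	int (*resume)(void);
};

static int demo_suspend(void) { return 0; }
static int demo_resume(void) { return 0; }

static const struct pm_ops demo_pm_ops = {
	.suspend = demo_suspend,
	.resume = demo_resume,
};

/*
 * No #ifdef needed: with FEATURE_PM == 0 this returns NULL, and the
 * compiler is free to drop demo_pm_ops and both callbacks.
 */
const struct pm_ops *demo_get_pm_ops(void)
{
	return PTR_IF(FEATURE_PM, &demo_pm_ops);
}
```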
Reviewed-by: Alan Brady <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: use less resources in switchdev
Michal Swiatkowski says:
Switchdev uses one queue per created port representor. This can
quickly lead to an Rx queue shortage, as with subfunction support a user
can create a large number of PRs.
Save one MSI-X and 'number of PRs' * 1 queues.
Refactor switchdev slow-path to use less resources (even no additional
resources). Do this by removing control plane VSI and move its
functionality to PF VSI. Even with current solution PF is acting like
uplink and can't be used simultaneously for other use cases (adding
filters can break slow-path).
In short, do Tx via PF VSI and Rx packets using PF resources. Rx needs
additional code in interrupt handler to choose correct PR netdev.
Previous solution had to queue filters, it was way more elegant but
needed one queue per PRs. Beside that this refactor mostly simplifies
switchdev configuration.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: count representor stats
ice: do switchdev slow-path Rx using PF VSI
ice: change repr::id values
ice: remove switchdev control plane VSI
ice: control default Tx rule in lag
ice: default Tx rule instead of to queue
ice: do Tx through PF netdev in slow-path
ice: remove eswitch changing queues algorithm
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
__napi_alloc_skb() is napi_alloc_skb() with the added flexibility
of choosing gfp_mask. This is a NAPI function, so GFP_ATOMIC is
implied. The only practical choice the caller has is whether to
set __GFP_NOWARN. But that's a false choice, too, allocation failures
in atomic context will happen, and printing warnings in logs,
effectively for a packet drop, is both too much and very likely
non-actionable.
This leads me to a conclusion that most uses of napi_alloc_skb()
are simply misguided, and should use __GFP_NOWARN in the first
place. We also have a "standard" way of reporting allocation
failures via the queue stat API (qstats::rx-alloc-fail).
The direct motivation for this patch is that one of the drivers
used at Meta calls napi_alloc_skb() (so prior to this patch without
__GFP_NOWARN), and the resulting OOM warning is the top networking
warning in our fleet.
Reviewed-by: Alexander Lobakin <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, WiFi and netfilter.
Current release - regressions:
- ipv6: fix address dump when IPv6 is disabled on an interface
Current release - new code bugs:
- bpf: temporarily disable atomic operations in BPF arena
- nexthop: fix uninitialized variable in nla_put_nh_group_stats()
Previous releases - regressions:
- bpf: protect against int overflow for stack access size
- hsr: fix the promiscuous mode in offload mode
- wifi: don't always use FW dump trig
- tls: adjust recv return with async crypto and failed copy to
userspace
- tcp: properly terminate timers for kernel sockets
- ice: fix memory corruption bug with suspend and rebuild
- at803x: fix kernel panic with at8031_probe
- qeth: handle deferred cc1
Previous releases - always broken:
- bpf: fix bug in BPF_LDX_MEMSX
- netfilter: reject table flag and netdev basechain updates
- inet_defrag: prevent sk release while still in use
- wifi: pick the version of SESSION_PROTECTION_NOTIF
- wwan: t7xx: split 64bit accesses to fix alignment issues
- mlxbf_gige: call request_irq() after NAPI initialized
- hns3: fix kernel crash when devlink reload during pf
initialization"
* tag 'net-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
inet: inet_defrag: prevent sk release while still in use
Octeontx2-af: fix pause frame configuration in GMP mode
net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
net: bcmasp: Remove phy_{suspend/resume}
net: bcmasp: Bring up unimac after PHY link up
net: phy: qcom: at803x: fix kernel panic with at8031_probe
netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c
netfilter: nf_tables: skip netdev hook unregistration if table is dormant
netfilter: nf_tables: reject table flag and netdev basechain updates
netfilter: nf_tables: reject destroy command to remove basechain hooks
bpf: update BPF LSM designated reviewer list
bpf: Protect against int overflow for stack access size
bpf: Check bloom filter map value size
bpf: fix warning for crash_kexec
selftests: netdevsim: set test timeout to 10 minutes
net: wan: framer: Add missing static inline qualifiers
mlxbf_gige: call request_irq() after NAPI initialized
tls: get psock ref after taking rxlock to avoid leak
selftests: tls: add test with a partially invalid iov
tls: adjust recv return with async crypto and failed copy to userspace
...
|
|
Removing the control plane VSI results in no information about slow-path
statistics. In the current solution, statistics need to be counted in the
driver.
Patch is based on similar implementation done by Simon Horman in nfp:
commit eadfa4c3be99 ("nfp: add stats and xmit helpers for representors")
Add const modifier to netdev parameter in ice_netdev_to_repr(). It isn't
(and shouldn't be) modified in the function.
Reviewed-by: Marcin Szycik <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add an ICE_RX_FLAG_MULTIDEV flag to Rx ring.
If it is set, try to find the correct port representor. Do it based on
the src_vsi value stored in the flex descriptor. The IDs of the representor
pointers stored in the xarray are equal to the corresponding src_vsi values.
Thanks to that, we can directly get the correct representor if we have the
src_vsi value.
Set multidev flag during ring configuration.
If the mode is switchdev, change the ring descriptor to the one that
contains src_vsi value.
PF netdev should be reconfigured, do it by calling ice_down() and
ice_up() if the netdev was up before configuring switchdev.
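The direct lookup described above can be sketched with a fixed-size table in place of the xarray. Everything here is hypothetical (struct repr, struct repr_table, MAX_VSI, repr_add(), repr_from_src_vsi()); the point is only that storing each representor at an index equal to its src_vsi value turns the Rx-path lookup into a single indexed load.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VSI 16 /* hypothetical table size */

/* Hypothetical port representor; 'id' equals its src_vsi value. */
struct repr {
	uint16_t id;
	const char *name;
};

struct repr_table {
	struct repr *slot[MAX_VSI];
};

/* Store the representor at the index given by its id. */
int repr_add(struct repr_table *t, struct repr *r)
{
	if (r->id >= MAX_VSI || t->slot[r->id])
		return -1;
	t->slot[r->id] = r;
	return 0;
}

/* Rx path: map the src_vsi from the descriptor to a representor. */
struct repr *repr_from_src_vsi(const struct repr_table *t, uint16_t src_vsi)
{
	return src_vsi < MAX_VSI ? t->slot[src_vsi] : NULL;
}
```

A NULL result means no representor is registered for that src_vsi, in which case the packet would stay with the default netdev.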
Reviewed-by: Marcin Szycik <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Instead of getting repr::id from xa_alloc() value, set it to the
src_vsi::num_vsi value. It is unique for each PR.
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Marcin Szycik <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The PF VSI is used for slow-path Rx and Tx. There is no need to have a
control plane VSI. Remove all code related to it.
Eswitch rebuild can't fail without rebuilding the control plane VSI. Return
void from ice_eswitch_rebuild().
Reviewed-by: Marcin Szycik <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The Tx rule in switchdev was changed to use the PF instead of an additional
control plane VSI. Because of that, we should control it during lag.
Control means adding and removing the default Tx rule during lag
active/inactive switching.
It can be done the same way as for the default Rx rule.
Reviewed-by: Wojciech Drewek <[email protected]>
Reviewed-by: Marcin Szycik <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Steer all packets that miss other rules to PF VSI. Previously in
switchdev mode, PF VSI received missed packets, but only ones marked
as Rx. Now it is receiving all missed packets.
A to-queue rule per PR isn't needed, because we use the PF VSI instead of
the control VSI now, and it's already correctly configured.
Add a flag to correctly set the LAN_EN bit in the default Tx rule. It
shouldn't allow a packet to go outside when there is a match.
Reviewed-by: Marcin Szycik <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Tx can be done using PF netdev.
Checks before Tx are unnecessary. Checking if switchdev mode is set
seems too defensive (there is no PR netdev in legacy mode). If
corresponding VF is disabled or during reset, PR netdev also should be
down.
Reviewed-by: Marcin Szycik <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Changing queues used by eswitch will be done through PF netdev.
There is no need to reserve queues if the number of used queues
is known.
Reviewed-by: Wojciech Drewek <[email protected]>
Reviewed-by: Marcin Szycik <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice driver would previously panic after suspend. This is caused
by the driver *only* calling the ice_vsi_free_q_vectors() function by
itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent
NULL pointer deref during reload") the driver has zeroed out
num_q_vectors, and only restored it in ice_vsi_cfg_def().
This further causes the ice_rebuild() function to allocate a zero length
buffer, after which num_q_vectors is updated, and then the new value of
num_q_vectors is used to index into the zero length buffer, which
corrupts memory.
The fix entails making sure all the code referencing num_q_vectors only
does so after it has been reset via ice_vsi_cfg_def().
I didn't perform a full bisect, but I was able to test against 6.1.77
kernel and that ice driver works fine for suspend/resume with no panic,
so sometime since then, this problem was introduced.
Also clean up an un-needed init of a local variable in the function
being modified.
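The ordering problem can be illustrated with a reduced userspace sketch. The names (struct vsi, vsi_cfg_def(), alloc_coalesce()) are hypothetical and the logic is heavily simplified. It assumes, as described above, that the vector count is zero until the configuration step restores it, so any buffer sized by that count must be allocated afterwards.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced VSI: only the field that matters here. */
struct vsi {
	unsigned int num_q_vectors; /* zeroed on suspend */
};

/* Configuration step that restores the vector count. */
void vsi_cfg_def(struct vsi *v, unsigned int n)
{
	v->num_q_vectors = n;
}

/*
 * Allocate one saved-coalesce slot per vector. Calling this before
 * vsi_cfg_def() would size the buffer with a stale zero count, and a
 * later loop over the restored count would write past its end.
 * Returns NULL when there is nothing to allocate.
 */
uint16_t *alloc_coalesce(const struct vsi *v)
{
	if (!v->num_q_vectors)
		return NULL;
	return calloc(v->num_q_vectors, sizeof(uint16_t));
}
```

Calling vsi_cfg_def() first and alloc_coalesce() second keeps the buffer size and the loop bound in agreement.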
PANIC from 6.8.0-rc1:
[1026674.915596] PM: suspend exit
[1026675.664697] ice 0000:17:00.1: PTP reset successful
[1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time
[1026675.667660] ice 0000:b1:00.0: PTP reset successful
[1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time
[1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None
[1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010
[1026677.192753] ice 0000:17:00.0: PTP reset successful
[1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time
[1026677.197928] #PF: supervisor read access in kernel mode
[1026677.197933] #PF: error_code(0x0000) - not-present page
[1026677.197937] PGD 1557a7067 P4D 0
[1026677.212133] ice 0000:b1:00.1: PTP reset successful
[1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time
[1026677.212575]
[1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1
[1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[1026677.269367] Workqueue: ice ice_service_task [ice]
[1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6
[1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202
[1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000
[1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828
[1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010
[1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0
[1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000
[1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0
[1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1026677.381952] PKRU: 55555554
[1026677.385116] Call Trace:
[1026677.388023] <TASK>
[1026677.390589] ? __die+0x20/0x70
[1026677.394105] ? page_fault_oops+0x82/0x160
[1026677.398576] ? do_user_addr_fault+0x65/0x6a0
[1026677.403307] ? exc_page_fault+0x6a/0x150
[1026677.407694] ? asm_exc_page_fault+0x22/0x30
[1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice]
[1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[1026677.429147] ice_rebuild+0x18b/0x520 [ice]
[1026677.433746] ? delay_tsc+0x8f/0xc0
[1026677.437630] ice_do_reset+0xa3/0x190 [ice]
[1026677.442231] ice_service_task+0x26/0x440 [ice]
[1026677.447180] process_one_work+0x174/0x340
[1026677.451669] worker_thread+0x27e/0x390
[1026677.455890] ? __pfx_worker_thread+0x10/0x10
[1026677.460627] kthread+0xee/0x120
[1026677.464235] ? __pfx_kthread+0x10/0x10
[1026677.468445] ret_from_fork+0x2d/0x50
[1026677.472476] ? __pfx_kthread+0x10/0x10
[1026677.476671] ret_from_fork_asm+0x1b/0x30
[1026677.481050] </TASK>
Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload")
Reported-by: Robert Elliott <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
According to the datasheet, the recipe association data is an 8-byte
little-endian value. It is described as 'Bitmap of the recipe indexes
associated with this profile' and occupies bytes 24 to 31 of the FW area.
Therefore, define it as '__le64 recipe_assoc' in struct
ice_aqc_recipe_to_profile, and fix the bitmap casting issue, as we
must never use casts for the bitmap type.
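Why an explicit conversion is preferable to a cast can be shown in a few lines of portable C. The helpers le64_to_host() and recipe_is_associated() are hypothetical userspace names, not the driver's functions. The sketch assembles the eight little-endian bytes into a host-order value and only then tests bits, so the result does not depend on host endianness or on the word size of a bitmap type.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assemble 8 little-endian bytes into a host-order 64-bit value.
 * b[0] is the least significant byte.
 */
uint64_t le64_to_host(const uint8_t b[8])
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* Test whether recipe index 'recipe' is set in the association map. */
bool recipe_is_associated(uint64_t assoc, unsigned int recipe)
{
	return recipe < 64 && ((assoc >> recipe) & 1);
}
```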
Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface")
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Andrii Staikov <[email protected]>
Reviewed-by: Jan Sokolowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Steven Zou <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The norm should be flexible array structures with __counted_by
annotations, so DEFINE_FLEX() is updated to expect that. Rename
the non-annotated version to DEFINE_RAW_FLEX(), and update the
few existing users. Additionally add selftests for the macros.
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
|
|
Merge in late fixes to prepare for the 6.9 net-next PR.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ethtool: ice: Support for RSS settings to GTP
Takeru Hayasaka enables RSS functionality for GTP packets on ice driver
with ethtool.
A user can include TEID and make RSS work for GTP-U over IPv4 by doing the
following: `ethtool -N ens3 rx-flow-hash gtpu4 sde`
In addition to gtpu(4|6), we now support gtpc(4|6), gtpc(4|6)t, gtpu(4|6)e,
gtpu(4|6)u, and gtpu(4|6)d.
gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does
not include a TEID.
gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that
includes a TEID.
gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios.
gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6.
gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended
header includes Uplink, applicable to both IPv4 and IPv6.
gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink,
for both IPv4 and IPv6.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Move RPS related structures and helpers from include/linux/netdevice.h
and include/net/sock.h to a new include file.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
net/core/page_pool_user.c
0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors")
429679dcf7d9 ("page_pool: fix netlink dump stop/resume")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Following the addition of new GTP RSS hash options to ethtool.h, this patch
implements the corresponding RSS settings for GTP packets in the Intel ice
driver. It enables users to configure RSS for GTP-U and GTP-C traffic over IPv4
and IPv6, utilizing the newly defined hash options.
The implementation covers the handling of gtpu(4|6), gtpc(4|6), gtpc(4|6)t,
gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d traffic, providing enhanced load
distribution for GTP traffic across multiple processing units.
Signed-off-by: Takeru Hayasaka <[email protected]>
Reviewed-by: Marcin Szycik <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Simplify the stats accumulation logic to fix the case where we don't take
the previous stat value into account; we should always respect it.
Main netdev stats of our PF (Tx/Rx packets/bytes) were reported orders of
magnitude too big during OpenStack reconfiguration events, possibly other
reconfiguration cases too.
The regression was reported to be between 6.1 and 6.2, so I was almost
certain that one of the two "preserve stats over reset" commits was the
culprit. While reading the code, it was found that in some cases we will
increase the stats by an arbitrarily large number (thanks to ignoring the
"-prev" part of the condition, after zeroing it).
Note that this also fixes the case where we were around the limits of u64,
but that was not the regression reported.
Full disclosure: I remember suggesting this particular piece of code to
Ben a few years ago, so the blame is on me.
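A generic form of "always respect the previous value" can be sketched as follows. This is not the driver's code: stat_accumulate() and stat_source_reset() are hypothetical helpers. The sketch assumes a running total that must survive resets of the underlying counter, and that the saved previous sample is zeroed together with the counter.

```c
#include <stdint.h>

/*
 * Add the growth since the previous sample to a running total.
 * The subtraction is always performed; unsigned arithmetic keeps the
 * delta correct even if the source counter wrapped past UINT64_MAX.
 */
void stat_accumulate(uint64_t *total, uint64_t *prev, uint64_t cur)
{
	*total += cur - *prev;
	*prev = cur;
}

/*
 * The source counter restarted from zero (for example after a reset):
 * restart the previous sample too, so the next delta is not computed
 * against a stale value.
 */
void stat_source_reset(uint64_t *prev)
{
	*prev = 0;
}
```

Skipping the subtraction on some paths is what lets a single update add an arbitrarily large amount to the total.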
Fixes: 2fd5e433cd26 ("ice: Accumulate HW and Netdev statistics over reset")
Reported-by: Nebojsa Stevanovic <[email protected]>
Link: https://lore.kernel.org/intel-wired-lan/VI1PR02MB439744DEDAA7B59B9A2833FE912EA@VI1PR02MB4397.eurprd02.prod.outlook.com
Reported-by: Christian Rohmann <[email protected]>
Link: https://lore.kernel.org/intel-wired-lan/[email protected]
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Przemek Kitszel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-03-05 (idpf, ice, i40e, igc, e1000e)
This series contains updates to idpf, ice, i40e, igc and e1000e drivers.
Emil disables local BH on NAPI schedule for proper handling of softirqs
on idpf.
Jake stops reporting of virtchannel RSS option which is unsupported on
ice.
Rand Deeb adds null check to prevent possible null pointer dereference
on ice.
Michal Schmidt moves DPLL mutex initialization to resolve uninitialized
mutex usage for ice.
Jesse fixes incorrect variable usage for calculating Tx stats on ice.
Ivan Vecera corrects logic for firmware equals check on i40e.
Florian Kauer prevents memory corruption for XDP_REDIRECT on igc.
Sasha reverts an incorrect use of FIELD_GET which caused a regression
for Wake on LAN on e1000e.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Older versions of GCC really want to know the full definition
of the type involved in rcu_assign_pointer().
struct dpll_pin is defined in a local header, net/core can't
reach it. Move all the netdev <> dpll code into dpll, where
the type is known. Otherwise we'd need multiple function calls
to jump between the compilation units.
This is the same problem the commit under fixes was trying to address,
but with rcu_assign_pointer() not rcu_dereference().
Some of the exports are not needed, networking core can't
be a module, we only need exports for the helpers used by
drivers.
Reported-by: Geert Uytterhoeven <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 640f41ed33b5 ("dpll: fix build failure due to rcu_dereference_check() on unknown type")
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Fix an obviously incorrect assignment, created with a typo or cut-n-paste
error.
Fixes: 5995ef88e3a8 ("ice: realloc VSI stats arrays")
Signed-off-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The pf->dplls.lock mutex is initialized too late, after its first use.
Move it to the top of ice_dpll_init.
Note that the "err_exit" error path destroys the mutex. And the mutex is
the last thing destroyed in ice_dpll_deinit.
This fixes the following warning with CONFIG_DEBUG_MUTEXES:
ice 0000:10:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.36.0
ice 0000:10:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
ice 0000:10:00.0: PTP init successful
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 0 PID: 410 at kernel/locking/mutex.c:587 __mutex_lock+0x773/0xd40
Modules linked in: crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ice(+) nvme nvme_c>
CPU: 0 PID: 410 Comm: kworker/0:4 Not tainted 6.8.0-rc5+ #3
Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 10/19/2023
Workqueue: events work_for_cpu_fn
RIP: 0010:__mutex_lock+0x773/0xd40
Code: c0 0f 84 1d f9 ff ff 44 8b 35 0d 9c 69 01 45 85 f6 0f 85 0d f9 ff ff 48 c7 c6 12 a2 a9 85 48 c7 c7 12 f1 a>
RSP: 0018:ff7eb1a3417a7ae0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff85ac2bff RDI: 00000000ffffffff
RBP: ff7eb1a3417a7b80 R08: 0000000000000000 R09: 00000000ffffbfff
R10: ff7eb1a3417a7978 R11: ff32b80f7fd2e568 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ff32b7f02c50e0d8
FS: 0000000000000000(0000) GS:ff32b80efe800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b5852cc000 CR3: 000000003c43a004 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x84/0x170
? __mutex_lock+0x773/0xd40
? report_bug+0x1c7/0x1d0
? prb_read_valid+0x1b/0x30
? handle_bug+0x42/0x70
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? __mutex_lock+0x773/0xd40
? rcu_is_watching+0x11/0x50
? __kmalloc_node_track_caller+0x346/0x490
? ice_dpll_lock_status_get+0x28/0x50 [ice]
? __pfx_ice_dpll_lock_status_get+0x10/0x10 [ice]
? ice_dpll_lock_status_get+0x28/0x50 [ice]
ice_dpll_lock_status_get+0x28/0x50 [ice]
dpll_device_get_one+0x14f/0x2e0
dpll_device_event_send+0x7d/0x150
dpll_device_register+0x124/0x180
ice_dpll_init_dpll+0x7b/0xd0 [ice]
ice_dpll_init+0x224/0xa40 [ice]
? _dev_info+0x70/0x90
ice_load+0x468/0x690 [ice]
ice_probe+0x75b/0xa10 [ice]
? _raw_spin_unlock_irqrestore+0x4f/0x80
? process_one_work+0x1a3/0x500
local_pci_probe+0x47/0xa0
work_for_cpu_fn+0x17/0x30
process_one_work+0x20d/0x500
worker_thread+0x1df/0x3e0
? __pfx_worker_thread+0x10/0x10
kthread+0x103/0x140
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
irq event stamp: 125197
hardirqs last enabled at (125197): [<ffffffff8416409d>] finish_task_switch.isra.0+0x12d/0x3d0
hardirqs last disabled at (125196): [<ffffffff85134044>] __schedule+0xea4/0x19f0
softirqs last enabled at (105334): [<ffffffff84e1e65a>] napi_get_frags_check+0x1a/0x60
softirqs last disabled at (105332): [<ffffffff84e1e65a>] napi_get_frags_check+0x1a/0x60
---[ end trace 0000000000000000 ]---
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Signed-off-by: Michal Schmidt <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The function ice_bridge_setlink() may encounter a NULL pointer dereference
if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently
in nla_for_each_nested(). To address this issue, add a check to ensure that
br_spec is not NULL before proceeding with the nested attribute iteration.
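The shape of the check can be sketched in plain userspace C. This is only
an illustration: the sketch_* names, the attribute layout and the -EINVAL
return value are assumptions made for the example, not the driver's code
or the netlink API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute types, standing in for netlink attribute IDs. */
enum { SKETCH_ATTR_OTHER = 1, SKETCH_ATTR_SPEC = 2 };

struct sketch_attr {
	int type;
	int value;
};

/* Like a "find attribute" helper: returns NULL when the type is absent. */
static const struct sketch_attr *sketch_find_attr(const struct sketch_attr *attrs,
						  size_t count, int type)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (attrs[i].type == type)
			return &attrs[i];

	return NULL;
}

static int sketch_setlink(const struct sketch_attr *attrs, size_t count)
{
	const struct sketch_attr *spec;

	spec = sketch_find_attr(attrs, count, SKETCH_ATTR_SPEC);
	if (!spec)
		return -EINVAL;	/* assumed error code; bail out before any dereference */

	/* Only reached when spec is valid, so the dereference is safe. */
	return spec->value;
}
```

Without the early return, a message lacking the attribute would reach the
dereference with a NULL pointer.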
Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Signed-off-by: Rand Deeb <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The E800 series hardware uses the same iAVF driver as older devices,
including the virtchnl negotiation scheme.
This negotiation scheme includes a mechanism to determine what type of RSS
should be supported, including RSS over PF virtchnl messages, RSS over
firmware AdminQ messages, and RSS via direct register access.
The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if it's
supported by the VF driver. However, if an older VF driver is loaded, it
may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ.
The ice driver happily agrees to support these methods. Unfortunately, the
underlying hardware does not support these mechanisms. The E800 series VFs
don't have the appropriate registers for RSS_REG. The mailbox queue used by
VFs for VF to PF communication blocks messages which do not have the
VF-to-PF opcode.
Stop lying to the VF that it could support RSS over AdminQ or registers, as
these interfaces do not work when the hardware is operating on an E800
series device.
In practice this is unlikely to be hit by any normal user. The iAVF driver
has supported RSS over PF virtchnl commands since 2016, and always defaults
to using RSS_PF if possible.
In principle, nothing actually stops the existing VF from attempting to
access the registers or send an AQ command. However, a properly coded VF
will check the capability flags and report a more useful error if it
detects that the PF driver does not support any of the RSS offloads that
the VF does.
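A minimal userspace sketch of the negotiation after this change, under
stated assumptions: the SKETCH_* flag names and bit positions are
illustrative only and are not the real virtchnl definitions, and
sketch_negotiate_rss() is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative capability bits (assumed values, not the virtchnl ones). */
#define SKETCH_OFFLOAD_RSS_AQ	(UINT32_C(1) << 0)
#define SKETCH_OFFLOAD_RSS_REG	(UINT32_C(1) << 1)
#define SKETCH_OFFLOAD_RSS_PF	(UINT32_C(1) << 2)

/*
 * Return the RSS capabilities granted to a VF. Only RSS over PF messages
 * is ever granted; requests for the AdminQ or register methods are
 * silently dropped because the hardware cannot honor them.
 */
static uint32_t sketch_negotiate_rss(uint32_t vf_requested)
{
	uint32_t granted = 0;

	if (vf_requested & SKETCH_OFFLOAD_RSS_PF)
		granted |= SKETCH_OFFLOAD_RSS_PF;

	return granted;
}
```

An old VF that asks only for the AQ or register method gets no RSS
capability back, which it can detect and report.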
Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support")
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Alan Brady <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
1. pcaps are freed right after the AQ routines are done, so there is no
need for devm_ allocations.
2. The test frame for the loopback test in ethtool -t is destroyed at the
end of the test, so we don't need devm_ here either.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
ice_down() clears QINT_TQCTL_CAUSE_ENA_M bit twice, which is not
necessary. First clearing happens in ice_vsi_dis_irq() and second in
ice_vsi_stop_tx_ring() - remove the first one.
While at it, make ice_vsi_dis_irq() static as ice_down() is the only
current caller of it.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The indentation for ice_set_ctx and ice_write_rxq_ctx breaks the function
name after the return type. This style of breaking is used a lot throughout
the ice driver, even in cases where it's not actually helpful for
readability. We no longer prefer this style of line splitting in the
driver, and new code is avoiding it.
Normally, I would leave this alone unless the actual function contents or
description needed updating. However, a future change is going to add
inverse functions for converting packed context to unpacked context
structures. To keep this code uniform with the existing set functions, fix
up the style to the modern format of keeping the type on the same line.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The functions used to pack the Tx and Rx context into the hardware format
rely on using BIT() and then subtracting 1 to get a bitmask. These
functions even have a comment about how x86 machines can't use this method
for certain widths because the SHL instructions will not work properly.
The Linux kernel already provides the GENMASK macro for generating a
suitable bitmask. Further, GENMASK is capable of generating the mask
including the shift_width. Since width is the total field width, take care
to subtract one to get the final bit position.
Since we now include the shifted bits as part of the mask, shift the source
value first before applying the mask.
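The approach can be sketched in userspace C as follows. This is an
illustration under stated assumptions, not the driver's code:
SKETCH_GENMASK32() is a stand-in for the kernel's GENMASK(), and
sketch_pack_field() is a hypothetical helper working on a 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for GENMASK(h, l): bits h..l set, with l <= h < 32. */
#define SKETCH_GENMASK32(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/*
 * Hypothetical helper: merge the low `width` bits of `src` into `dest`
 * starting at bit `shift`, leaving all other bits of `dest` untouched.
 */
static uint32_t sketch_pack_field(uint32_t dest, uint32_t src,
				  unsigned int shift, unsigned int width)
{
	uint32_t mask;

	assert(width >= 1 && shift + width <= 32);

	/* width is the total field width, so the top bit is shift + width - 1 */
	mask = SKETCH_GENMASK32(shift + width - 1, shift);

	/* the mask already includes the shift, so shift the source first */
	src = (src << shift) & mask;

	return (dest & ~mask) | src;
}
```

Because the mask never comes from a shift by the full word size, a
32-bit-wide field needs no special case here.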
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In ice_common.c there are 4 functions used for converting the unpacked
software Tx and Rx context structure data into the packed format used by
hardware. These functions have extremely generic names:
* ice_write_byte
* ice_write_word
* ice_write_dword
* ice_write_qword
When I saw these function names my first thought was "write what? to
where?". Understanding what these functions do requires looking at the
implementation details. The functions take bits from an unpacked structure
and copy them into the packed layout used by hardware.
As part of live migration, we will want functions which perform the inverse
operation of reading bits from the packed layout and copying them into the
unpacked format. Naming these as "ice_read_byte", etc would be very
confusing since they appear to write data.
In preparation for adding this new inverse operation, rename the existing
functions to use the prefix "ice_pack_ctx_". This makes it clear that they
perform the bit packing while copying from the unpacked software context
structure to the packed hardware context.
The inverse operations can then neatly be named ice_unpack_ctx_*, clearly
indicating they perform the bit unpacking while copying from the packed
hardware context to the unpacked software context structure.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The lan_vsi_num field of the VF structure is no longer used for any
purpose. Remove it.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
When initializing over virtchnl, the PF is required to pass a VSI ID to the
VF as part of its capabilities exchange. The VF driver reports this value
back to the PF in a variety of commands. The PF driver validates that this
value matches the value it sent to the VF.
Some hardware families such as the E700 series could use this value when
reading RSS registers or communicating directly with firmware over the
Admin Queue.
However, E800 series hardware does not support any of these interfaces and
the VF's only use for this value is to report it back to the PF. Thus,
there is no requirement that this value be an actual VSI ID value of any
kind.
The PF driver already does not trust that the VF sends it a real VSI ID.
The VSI structure is always looked up from the VF structure. The PF does
validate that the VSI ID provided matches a VSI associated with the VF, but
otherwise does not use the VSI ID for any purpose.
Instead of reporting the VSI number relative to the PF space, report a
fixed value of 1. When communicating with the VF over virtchnl, validate
that the VSI number is returned appropriately.
This avoids leaking information about the firmware or the PF state.
Currently the ice driver only supplies a VF with a single VSI. However, it
appears that virtchnl has some support for allowing multiple VSIs. I did
not attempt to implement this. However, space is left open to allow further
relative indexes if additional VSIs are provided in future feature
development. For this reason, keep the ice_vc_isvalid_vsi_id function in
place to allow extending it for multiple VSIs in the future.
This change will also simplify handling of live migration in a future
series. Since we no longer will provide a real VSI number to the VF, there
will be no need to keep track of this number when migrating to a new host.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_vc_fdir_param_check() function validates that the VSI ID of the
virtchnl flow director command matches the VSI number of the VF. This is
already checked by the call to ice_vc_isvalid_vsi_id() immediately
following this.
This check is unnecessary since ice_vc_isvalid_vsi_id() already confirms
this by checking that the VSI ID can locate the VSI associated with the VF
structure.
Furthermore, a following change is going to refactor the ice driver to
report VSI IDs using a relative index for each VF instead of reporting the
PF VSI number. This additional check would break that logic since it
enforces that the VSI ID matches the VSI number.
Since this check duplicates the logic in ice_vc_isvalid_vsi_id() and gets
in the way of refactoring that logic, remove it.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_vc_isvalid_q_id() function takes a VSI index and a queue ID. It
looks up the VSI from its index, and then validates that the queue number
is valid for that VSI.
The VSI ID passed is typically a VSI index from the VF. This VSI number is
validated by the PF to ensure that it matches the VSI associated with the
VF already.
In every flow where ice_vc_isvalid_q_id() is called, the PF driver already
has a pointer to the VSI associated with the VF. This pointer is obtained
using ice_get_vf_vsi(), rather than looking up the VSI using the index sent
by the VF.
Since we already know which VSI to operate on, we can modify
ice_vc_isvalid_q_id() to take a VSI pointer instead of a VSI index. Pass
the VSI we found from ice_get_vf_vsi() instead of re-doing the lookup. This
removes some unnecessary computation and scanning of the VSI list.
It also removes the last place where the driver directly used the VSI
number from the VF. This will pave the way for refactoring to communicate
relative VSI numbers to the VF instead of absolute numbers from the PF
space.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
During VSI reconfiguration, the filters and VSI config set in
ice_vf_init_host_cfg() are lost. Call the host configuration function
again to restore them.
Without this config, a VF whose MSI-X count was changed might have
connection problems.
Fixes: 4d38cb44bd32 ("ice: manage VFs MSI-X using resource tracking")
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
ice_qp_dis() currently does things in a very mixed order. Tx is stopped
before the IRQ on the related queue vector is disabled, then Rx is
disabled, and finally NAPI is disabled.
Start by disabling IRQs, followed by turning off NAPI. Then it is safe
to handle the queues.
One subtle change on top of that: even though ice_qp_ena() looks more
sane, clear ICE_CFG_BUSY as the last thing there.
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Acked-by: Magnus Karlsson <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/mptcp/protocol.c
adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket")
9426ce476a70 ("mptcp: annotate lockless access for RX path fields")
https://lore.kernel.org/all/[email protected]/
Adjacent changes:
drivers/dpll/dpll_core.c
0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()")
e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order")
drivers/net/veth.c
1ce7d306ea63 ("veth: try harder when allocating queue memory")
0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers")
drivers/net/wireless/intel/iwlwifi/mvm/d3.c
8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO")
78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists")
net/wireless/nl80211.c
f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change")
414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Commit 91fdbce7e8d6 ("ice: Add support in the driver for associating
queue with napi") introduced calls to netif_queue_set_napi(). This
kernel function must be called with rtnl_lock held; otherwise an
ASSERT_RTNL() warning is triggered. ice_vsi_rebuild(), which initiates
this call, runs under rtnl_lock when the rebuild is in response to
configuration changes from external interfaces (such as tc or ethtool,
which hold the lock). However, a VSI rebuild generated from service
tasks and resets (PFR/CORER/GLOBR) is not under rtnl_lock protection.
Handle these cases as well by taking the lock before the kernel call
(by setting the 'locked' boolean to false).
netif_queue_set_napi() is also used to clear a previously set napi
in the q_vector unroll flow. Handle this for both the locked and the
lockless execution paths.
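The 'locked' pattern can be sketched in userspace C as below. All sketch_*
names are hypothetical stand-ins: a flag plays the role of the RTNL lock
and a counter plays the role of the kernel call that asserts the lock is
held. This is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the RTNL lock: records whether it is currently held. */
static bool sketch_rtnl_held;

static void sketch_rtnl_lock(void)
{
	assert(!sketch_rtnl_held);
	sketch_rtnl_held = true;
}

static void sketch_rtnl_unlock(void)
{
	assert(sketch_rtnl_held);
	sketch_rtnl_held = false;
}

/* Stand-in for a kernel call that requires the lock to be held. */
static int sketch_calls;

static void sketch_queue_set_napi(void)
{
	assert(sketch_rtnl_held);	/* plays the role of ASSERT_RTNL() */
	sketch_calls++;
}

/*
 * `locked` tells the helper whether the caller already holds the lock.
 * When it is false, the helper takes and releases the lock itself.
 */
static void sketch_set_napi(bool locked)
{
	if (!locked)
		sketch_rtnl_lock();

	sketch_queue_set_napi();

	if (!locked)
		sketch_rtnl_unlock();
}
```

A reset path would call sketch_set_napi(false), while a path entered from
a configuration request that already holds the lock would pass true.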
Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi")
Signed-off-by: Amritha Nambiar <[email protected]>
Reviewed-by: Sridhar Samudrala <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Do not allow setting the phase adjust value for a pin while a PF reset is
in progress. This would cause confusing netlink extack errors, as the
firmware cannot process the request properly during the reset.
Return -EBUSY and report an extack error to a user who tries to configure
pin phase adjust during the reset.
Test by looping over the steps below until a netlink error appears:
- perform PF reset
$ echo 1 > /sys/class/net/<ice PF>/device/reset
- change pin phase adjust value:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-set --json '{"id":0, "phase-adjust":1000}'
Fixes: 90e1c90750d7 ("ice: dpll: implement phase related callbacks")
Reviewed-by: Igor Bagnucki <[email protected]>
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|