Age | Commit message (Collapse) | Author | Files | Lines |
|
Since mqprio is a scheduler and not a classifier, move its offload
structure to pkt_sched.h, where struct tc_taprio_qopt_offload also lies.
Also update some header inclusions in drivers that access this
structure, to the best of my abilities.
Cc: Igor Russkikh <irusskikh@marvell.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: Daniel Machon <daniel.machon@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Part of code from probe can be reused in reload flow. Move this code to
separate function. Create unroll functions for each part of
initialization, like: ice_init_dev() and ice_deinit_dev(). It
simplifies unrolling and can be used in remove flow.
Avoid freeing port info as it could be reused in reload path.
Will be freed in remove path since is allocated via devm_kzalloc().
Also clean the remove path to reflect the init steps.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Do few small cleanups:
1) Rename the function to reflect that it doesn't configure all things
related to VSI. ice_vsi_cfg_lan() better fits to what function is doing.
ice_vsi_cfg() can be use to name function that will configure whole VSI.
2) Remove unused ethtype field from VSI. There is no need to set
ethtype here, because it is never used.
3) Remove unnecessary check for ICE_VSI_CHNL. There is check for
ICE_VSI_CHNL in ice_vsi_get_qs, so there is no need to check it before
calling the function.
4) Simplify ice_vsi_alloc() call. There is no need to check the type of
VSI before calling ice_vsi_alloc(). For ICE_VSI_CHNL vf is always NULL
(ice_vsi_setup() is called with vf=NULL).
For ICE_VSI_VF or ICE_VSI_CTRL ch is always NULL and for other VSI types
ch and vf are always NULL.
5) Remove unnecessary call to ice_vsi_dis_irq(). ice_vsi_dis_irq() will
be called in ice_vsi_close() flow (ice_vsi_close() -> ice_vsi_down() ->
ice_vsi_dis_irq()). Remove unnecessary call.
6) Don't remove specific filters in release. All hw filters are removed
in ice_fltr_remove_alli(), which is always called in VSI release flow.
There is no need to remove only ethertype filters before calling
ice_fltr_remove_all().
7) Rename ice_vsi_clear() to ice_vsi_free(). As ice_vsi_clear() only
free memory allocated in ice_vsi_alloc() rename it to ice_vsi_free()
which better shows what function is doing.
8) Free coalesce param in rebuild. There is potential memory leak if
configuration of VSI lan fails. Free coalesce to avoid it.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Simplify probe flow by moving all RDMA related code to ice_init_rdma().
Unroll irq allocation if RDMA initialization fails.
Implement ice_deinit_rdma() and use it in remove flow.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Acked-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
net/core/gro.c
7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO")
b1a78b9b9886 ("net: add support for ipv4 big tcp")
https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The PF controls the set of queues that the RDMA auxiliary_driver requests
resources from. The set_channel command will alter that pool and trigger a
reconfiguration of the VSI, which breaks RDMA functionality.
Prevent set_channel from executing when RDMA driver bound to auxiliary
device.
Adding a locked variable to pass down the call chain to avoid double
locking the device_lock.
Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Previously support for GNSS was implemented as a TTY driver, it allowed
to access GNSS receiver on /dev/ttyGNSS_<bus><func>.
Use generic GNSS subsystem API instead of implementing own TTY driver.
The receiver is accessible on /dev/gnss<id>. In case of multiple receivers
in the OS, correct device can be found by enumerating either:
- /sys/class/net/<eth port>/device/gnss/
- /sys/class/gnss/gnss<id>/device/
Using GNSS subsystem is superior to implementing own TTY driver, as the
GNSS subsystem was designed solely for this purpose. It also implements
TTY driver but in a common and defined way.
From user perspective, there is no difference in communicating with a
device, except new path to the device shall be used. The device will
provide same information to the userspace as the old one, and can be used
in the same way, i.e.:
old # gpsmon /dev/ttyGNSS_2100_0
new # gpsmon /dev/gnss0
There is no other impact on userspace tools.
User expecting onboard GNSS receiver support is required to enable
CONFIG_GNSS=y/m in kernel config.
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Resets may occur with or without user interaction. For example, a TX hang
or reconfiguration of parameters will result in a reset. During reset, the
VSI is freed, freeing any statistics structures inside as well. This would
create an issue for the user where a reset happens in the background,
statistics set to zero, and the user checks ring statistics expecting them
to be populated.
To ensure this doesn't happen, accumulate ring statistics over reset.
Define a new ring statistics structure, ice_ring_stats. The new structure
lives in the VSI's parent, preserving ring statistics when VSI is freed.
1. Define a new structure vsi_ring_stats in the PF scope
2. Allocate/free stats only during probe, unload, or change in ring size
3. Replace previous ring statistics functionality with new structure
Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Resets happen with or without user interaction. For example, incidents
such as TX hang or a reconfiguration of parameters will result in a reset.
During reset, hardware and software statistics were set to zero. This
created an issue for the user where a reset happens in the background,
statistics set to zero, and the user checks statistics expecting them to
be populated.
To ensure this doesn't happen, keep accumulating stats over reset.
1. Remove function calls which reset hardware and netdev statistics.
2. Do not rollover statistics in ice_stat_update40 during reset.
Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add new VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC flag, opcode
VIRTCHNL_OP_GET_SUPPORTED_RXDIDS and add member rxdid
in struct virtchnl_rxq_info to support AVF Flex RXD
extension.
Add support to allow VF to query flexible descriptor RXDIDs supported
by DDP package and configure Rx queues with selected RXDID for IAVF.
Add code to allow VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message to be
processed. Add necessary macros for registers.
Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Signed-off-by: Xu Ting <ting.xu@intel.com>
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20221025161252.1952939-1-jacob.e.keller@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch uses TC skbedit queue_mapping action to support
forwarding packets to a device queue. Such filters with action
forward to queue will be the highest priority switch filter in
HW.
Example:
$ tc filter add dev ens4f0 protocol ip ingress flower\
dst_ip 192.168.1.12 ip_proto tcp dst_port 5001\
action skbedit queue_mapping 5 skip_sw
The above command adds an ingress filter, incoming packets
qualifying the match will be accepted into queue 5. The queue
number is in decimal format.
Refactored ice_add_tc_flower_adv_fltr() to consolidate code with
action FWD_TO_VSI and FWD_TO QUEUE.
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()")
c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer")
https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/
https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ice driver allocates per cpu XDP queues so that redirect path can safely
use smp_processor_id() as an index to the array. At the same time
though, XDP rings are used to pick NAPI context to call napi_schedule()
or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and
num_possible_cpus() of underlying platform is 44, then this means queue
vectors with correlated NAPI contexts will carry several XDP queues.
This in turn can result in a broken behavior where NAPI context of
interest will never be scheduled and AF_XDP socket will not process any
traffic.
To fix this, let us change the way how XDP rings are assigned to Rx
rings and use this information later on when setting
ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated
queue vector and walk through Tx ring's linked list. Once we stumble
upon XDP ring in it, assign this ring to ice_rx_ring::xdp_ring.
Previous [0] approach of fixing this issue was for txonly scenario
because of the described grouping of XDP rings across queue vectors. So,
relying on Rx ring meant that NAPI context could be scheduled with a
queue vector without XDP ring with associated XSK pool.
[0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The driver can allow the user to configure whether the CRC aka the FCS
(Frame Check Sequence) is DMA'd to the host as part of the receive
buffer. The driver usually wants this feature disabled so that the
hardware checks the FCS and strips it in order to save PCI bandwidth.
Control the reception of FCS to the host using the command:
ethtool -K eth0 rx-fcs <on|off>
The default shown in ethtool -k eth0 | grep fcs; should be "off", as the
hardware will drop any frame with a bad checksum, and DMA of the
checksum is useless overhead especially for small packets.
Testing Hints:
test the FCS/CRC arrives with received packets using
tcpdump -nnpi eth0 -xxxx
and it should show crc data as the last 4 bytes of the packet. Can also
use wireshark to turn on CRC checking and check the data is correct.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Co-developed-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In current implementation default VSI switch filter is only able to
forward traffic to a single VSI. This limits promiscuous mode with
private flag 'vf-true-promisc-support' to a single VF. Enabling it on
the second VF won't work. Also allmulticast support doesn't seem to be
properly implemented when vf-true-promisc-support is true.
Use standard ice_add_rule_internal() function that already implements
forwarding to multiple VSI's instead of constructing AQ call manually.
Add switch filter for allmulticast mode when vf-true-promisc-support is
enabled. The same filter is added regardless of the flag - it doesn't
matter for this case.
Remove unnecessary fields in switch structure. From now on book keeping
will be done by ice_add_rule_internal().
Refactor unnecessarily passed function arguments.
To test:
1) Create 2 VM's, and two VF's. Attach VF's to VM's.
2) Enable promiscuous mode on both of them and check if
traffic is seen on both of them.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add support for creating PPPoE filters in switchdev mode. Add support
for parsing PPPoE and PPP-specific tc options: pppoe_sid and ppp_proto.
Example filter:
tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \
1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR
Changes in iproute2 are required to use the new fields.
ICE COMMS DDP package is required to create a filter as it contains PPPoE
profiles. Added a warning message when loaded DDP package does not contain
required profiles.
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add the possibility to write raw bytes to the GNSS module through the
first TTY device. This allows user to configure the module.
Create a second read-only TTY device.
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
External time stamp sources are supported only on certain devices. Enforce
the right support matrix by adding the ICE_F_PTP_EXTTS bit to the feature
bitmap set.
Co-developed-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
No conflicts.
Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Function ice_plug_aux_dev() assigns pf->adev field too early prior
aux device initialization and on other side ice_unplug_aux_dev()
starts aux device deinit and at the end assigns NULL to pf->adev.
This is wrong because pf->adev should always be non-NULL only when
aux device is fully initialized and ready. This wrong order causes
a crash when ice_send_event_to_aux() call occurs because that function
depends on non-NULL value of pf->adev and does not assume that
aux device is half-initialized or half-destroyed.
After order correction the race window is tiny but it is still there,
as Leon mentioned and manipulation with pf->adev needs to be protected
by mutex.
Fix (un-)plugging functions so pf->adev field is set after aux device
init and prior aux device destroy and protect pf->adev assignment by
new mutex. This mutex is also held during ice_send_event_to_aux()
call to ensure that aux device is valid during that call.
Note that device lock used ice_send_event_to_aux() needs to be kept
to avoid race with aux drv unload.
Reproducer:
cycle=1
while :;do
echo "#### Cycle: $cycle"
ip link set ens7f0 mtu 9000
ip link add bond0 type bond mode 1 miimon 100
ip link set bond0 up
ifenslave bond0 ens7f0
ip link set bond0 mtu 9000
ethtool -L ens7f0 combined 1
ip link del bond0
ip link set ens7f0 mtu 1500
sleep 1
let cycle++
done
In short when the device is added/removed to/from bond the aux device
is unplugged/plugged. When MTU of the device is changed an event is
sent to aux device asynchronously. This can race with (un)plugging
operation and because pf->adev is set too early (plug) or too late
(unplug) the function ice_send_event_to_aux() can touch uninitialized
or destroyed fields. In the case of crash below pf->adev->dev.mutex.
Crash:
[ 53.372066] bond0: (slave ens7f0): making interface the new active one
[ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u
p link
[ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up
link
[ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval
idating tc mappings. Priority traffic classification disabled!
[ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval
idating tc mappings. Priority traffic classification disabled!
[ 54.248204] bond0: (slave ens7f0): Releasing backup interface
[ 54.253955] bond0: (slave ens7f1): making interface the new active one
[ 54.274875] bond0: (slave ens7f1): Releasing backup interface
[ 54.289153] bond0 (unregistering): Released all slaves
[ 55.383179] MII link monitoring set to 100 ms
[ 55.398696] bond0: (slave ens7f0): making interface the new active one
[ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
[ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u
p link
[ 55.412198] #PF: supervisor write access in kernel mode
[ 55.412200] #PF: error_code(0x0002) - not-present page
[ 55.412201] PGD 25d2ad067 P4D 0
[ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S
5.17.0-13579-g57f2d6540f03 #1
[ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up
link
[ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/
2021
[ 55.430226] Workqueue: ice ice_service_task [ice]
[ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20
[ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
[ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
[ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
[ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
[ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
[ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
[ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
[ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
[ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
[ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 55.567305] PKRU: 55555554
[ 55.570018] Call Trace:
[ 55.572474] <TASK>
[ 55.574579] ice_service_task+0xaab/0xef0 [ice]
[ 55.579130] process_one_work+0x1c5/0x390
[ 55.583141] ? process_one_work+0x390/0x390
[ 55.587326] worker_thread+0x30/0x360
[ 55.590994] ? process_one_work+0x390/0x390
[ 55.595180] kthread+0xe6/0x110
[ 55.598325] ? kthread_complete_and_exit+0x20/0x20
[ 55.603116] ret_from_fork+0x1f/0x30
[ 55.606698] </TASK>
Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Both ice_idc.c and ice_virtchnl.c carry their own implementation of a
helper function that is looking for a given VSI based on provided
vsi_num. Their functionality is the same, so let's introduce the common
function in ice.h that both of the mentioned sites will use.
This is a strictly cleanup thing, no functionality is changed.
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Unfortunately, the ice driver doesn't respect the RCU critical section that
XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to
paths that destroy resources that might be in use.
This was addressed in other AF_XDP ZC enabled drivers, for reference see
for example commit b3873a5be757 ("net/i40e: Fix concurrency issues
between config flow and XSK")
Fixes: efc2214b6047 ("ice: Add support for XDP")
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Handling of all-multicast flag and associated multicast promiscuous
mode is broken in ice driver. When an user switches allmulticast
flag on or off the driver checks whether any VLANs are configured
over the interface (except default VLAN 0).
If any extra VLANs are registered it enables multicast promiscuous
mode for all these VLANs (including default VLAN 0) using
ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all
multicast packets tagged with known VLAN ID or untagged are received
and multicast packets tagged with unknown VLAN ID ignored.
If no extra VLANs are registered (so only VLAN 0 exists) it enables
multicast promiscuous mode for VLAN 0 and uses ICE_SW_LKUP_PROMISC
look-up type. In this situation any multicast packets including
tagged ones are received.
The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way:
ice_vsi_sync_fltr() {
...
if (changed_flags & IFF_ALLMULTI) {
if (netdev->flags & IFF_ALLMULTI) {
if (vsi->num_vlans > 1)
ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
else
ice_set_promisc(..., ICE_MCAST_PROMISC_BITS);
} else {
if (vsi->num_vlans > 1)
ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
else
ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS);
}
}
...
}
The code above depends on value vsi->num_vlan that specifies number
of VLANs configured over the interface (including VLAN 0) and
this is problem because that value is modified in NDO callbacks
ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid().
Scenario 1:
1. ip link set ens7f0 allmulticast on
2. ip link add vlan10 link ens7f0 type vlan id 10
3. ip link set ens7f0 allmulticast off
4. ip link set ens7f0 allmulticast on
[1] In this scenario IFF_ALLMULTI is enabled and the driver calls
ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) that installs
multicast promisc rule with non-VLAN look-up type.
[2] Then VLAN with ID 10 is added and vsi->num_vlan incremented to 2
[3] Command switches IFF_ALLMULTI off and the driver calls
ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS) but this
call is effectively NOP because it looks for multicast promisc
rules for VLAN 0 and VLAN 10 with VLAN look-up type but no such
rules exist. So the all-multicast remains enabled silently
in hardware.
[4] Command tries to switch IFF_ALLMULTI on and the driver calls
ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS) but this call
fails (-EEXIST) because non-VLAN multicast promisc rule already
exists.
Scenario 2:
1. ip link add vlan10 link ens7f0 type vlan id 10
2. ip link set ens7f0 allmulticast on
3. ip link add vlan20 link ens7f0 type vlan id 20
4. ip link del vlan10 ; ip link del vlan20
5. ip link set ens7f0 allmulticast off
[1] VLAN with ID 10 is added and vsi->num_vlan==2
[2] Command switches IFF_ALLMULTI on and driver installs multicast
promisc rules with VLAN look-up type for VLAN 0 and 10
[3] VLAN with ID 20 is added and vsi->num_vlan==3 but no multicast
promisc rules is added for this new VLAN so the interface does
not receive MC packets from VLAN 20
[4] Both VLANs are removed but multicast rule for VLAN 10 remains
installed so interface receives multicast packets from VLAN 10
[5] Command switches IFF_ALLMULTI off and because vsi->num_vlan is 1
the driver tries to remove multicast promisc rule for VLAN 0
with non-VLAN look-up that does not exist.
All-multicast looks disabled from user point of view but it
is partially enabled in HW (interface receives all multicast
packets either untagged or tagged with VLAN ID 10)
To resolve these issues the patch introduces these changes:
1. Adds handling for IFF_ALLMULTI to ice_vlan_rx_add_vid() and
ice_vlan_rx_kill_vid() callbacks. So when VLAN is added/removed
and IFF_ALLMULTI is enabled an appropriate multicast promisc
rule for that VLAN ID is added/removed.
2. In ice_vlan_rx_add_vid() when first VLAN besides VLAN 0 is added
so (vsi->num_vlan == 2) and IFF_ALLMULTI is enabled then look-up
type for existing multicast promisc rule for VLAN 0 is updated
to ICE_MCAST_VLAN_PROMISC_BITS.
3. In ice_vlan_rx_kill_vid() when last VLAN besides VLAN 0 is removed
so (vsi->num_vlan == 1) and IFF_ALLMULTI is enabled then look-up
type for existing multicast promisc rule for VLAN 0 is updated
to ICE_MCAST_PROMISC_BITS.
4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY
bit protection to avoid races with ice_vsi_sync_fltr() that runs
in ice_service_task() context.
5. Bit ICE_VSI_VLAN_FLTR_CHANGED is use-less and can be removed.
6. Error messages added to ice_fltr_*_vsi_promisc() helper functions
to avoid them in their callers
7. Small improvements to increase readability
Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ice driver tries to always create XDP rings array to be
num_possible_cpus() sized, regardless of user's queue count setting that
can be changed via ethtool -L for example.
Currently, ice_tx_xsk_pool() calculates the qid by decrementing the
ring->q_index by the count of XDP queues, but ring->q_index is set to 'i
+ vsi->alloc_txq'.
When user did ethtool -L $IFACE combined 1, alloc_txq is 1, but
vsi->num_xdp_txq is still num_possible_cpus(). Then, ice_tx_xsk_pool()
will do OOB access and in the final result ring would not get xsk_pool
pointer assigned. Then, each ice_xsk_wakeup() call will fail with error
and it will not be possible to get into NAPI and do the processing from
driver side.
Fix this by decrementing vsi->alloc_txq instead of vsi->num_xdp_txq from
ring-q_index in ice_tx_xsk_pool() so the calculation is reflected to the
setting of ring->q_index.
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220328142123.170157-5-maciej.fijalkowski@intel.com
|
|
Merge in overtime fixes, no conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's a kernel BUG splat on processing aux critical error
interrupts in ice_misc_intr():
[ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000
...
[ 2101.060770] Call Trace:
[ 2101.063229] <IRQ>
[ 2101.065252] dump_stack+0x41/0x60
[ 2101.068587] __schedule_bug.cold.100+0x4c/0x58
[ 2101.073060] __schedule+0x6a4/0x830
[ 2101.076570] schedule+0x35/0xa0
[ 2101.079727] schedule_preempt_disabled+0xa/0x10
[ 2101.084284] __mutex_lock.isra.7+0x310/0x420
[ 2101.088580] ? ice_misc_intr+0x201/0x2e0 [ice]
[ 2101.093078] ice_send_event_to_aux+0x25/0x70 [ice]
[ 2101.097921] ice_misc_intr+0x220/0x2e0 [ice]
[ 2101.102232] __handle_irq_event_percpu+0x40/0x180
[ 2101.106965] handle_irq_event_percpu+0x30/0x80
[ 2101.111434] handle_irq_event+0x36/0x53
[ 2101.115292] handle_edge_irq+0x82/0x190
[ 2101.119148] handle_irq+0x1c/0x30
[ 2101.122480] do_IRQ+0x49/0xd0
[ 2101.125465] common_interrupt+0xf/0xf
[ 2101.129146] </IRQ>
...
As Andrew correctly mentioned previously[0], the following call
ladder happens:
ice_misc_intr() <- hardirq
ice_send_event_to_aux()
device_lock()
mutex_lock()
might_sleep()
might_resched() <- oops
Add a new PF state bit which indicates that an aux critical error
occurred and serve it in ice_service_task() in process context.
The new ice_pf::oicr_err_reg is read-write in both hardirq and
process contexts, but only 3 bits of non-critical data probably
aren't worth explicit synchronizing (and they're even in the same
byte [31:24]).
[0] https://lore.kernel.org/all/YeSRUVmrdmlUXHDn@lunn.ch
Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Several headers in the ice driver include ice.h even though they are
themselves included by that header. The most notable of these is
ice_common.h, but several other headers also do this.
Such a recursive inclusion is problematic as it forces headers to be
included in a strict order, otherwise compilation errors can result. The
circular inclusions do not trigger an endless loop due to standard
header inclusion guards, however other errors can occur.
For example, ice_flow.h defines ice_rss_hash_cfg, which is used by
ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx.
ice_flow.h includes ice_acl.h, which includes ice_common.h, and which
finally includes ice.h. Since ice.h itself includes ice_sriov.h, this
creates a circular dependency.
The definition in ice_sriov.h requires things from ice_flow.h, but
ice_flow.h itself will lead to trying to load ice_sriov.h as part of its
process for expanding ice.h. The current code avoids this issue by
having an implicit dependency without the include of ice_flow.h.
If we were to fix that so that ice_sriov.h explicitly depends on
ice_flow.h the following pattern would occur:
ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h
At this point, during the expansion of, the header guard for ice_flow.h
is already set, so when ice_sriov.h attempts to load the ice_flow.h
header it is skipped. Then, we go on to begin including the rest of
ice_sriov.h, including structure definitions which depend on
ice_rss_hash_cfg. This produces a compiler warning because
ice_rss_hash_cfg hasn't yet been included. Remember, we're just at the
start of ice_flow.h!
If the order of headers is incorrect (ice_flow.h is not implicitly
loaded first in all files which include ice_sriov.h) then we get the
same failure.
Removing this recursive inclusion requires fixing a few cases where some
headers depended on the header inclusions from ice.h. In addition, a few
other changes are also required.
Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h,
which is the likely reason that ice_common.h includes ice.h at all. This
macro implementation requires the full definition of ice_pf in order to
properly compile.
Fix this by moving it to a function declared in ice_main.c, so that we
do not require all files to depend on the layout of the ice_pf
structure.
Note that this change only fixes circular dependencies, but it does not
fully resolve all implicit dependencies where one header may depend on
the inclusion of another. I tried to fix as many of the implicit
dependencies as I noticed, but fixing them all requires a somewhat
tedious analysis of each header and attempting to compile it separately.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the
code for implementing Single Root IOV virtualization resides. This code
includes support for bringing up and tearing down VFs, hooks into the
kernel SR-IOV netdev operations, and for handling virtchnl messages from
VFs.
In the future, we plan to support Scalable IOV in addition to Single
Root IOV as an alternative virtualization scheme. This implementation
will re-use some but not all of the code in ice_virtchnl_pf.c
To prepare for this future, we want to refactor and split up the code in
ice_virtchnl_pf.c into the following scheme:
* ice_vf_lib.[ch]
Basic VF structures and accessors. This is where scheme-independent
code will reside.
* ice_virtchnl.[ch]
Virtchnl message handling. This is where the bulk of the logic for
processing messages from VFs using the virtchnl messaging scheme will
reside. This is separated from ice_vf_lib.c because it is distinct
and has a bulk of the processing code.
* ice_sriov.[ch]
Single Root IOV implementation, including initialization and the
routines for interacting with SR-IOV based netdev operations.
* (future) ice_siov.[ch]
Scalable IOV implementation.
As a first step, lets assume that all of the code in
ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to
ice_sriov.c and its header to ice_sriov.h
Future changes will further split out the code in these files following
the plan outlined here.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice_sriov.c file primarily contains code which handles the logic for
mailbox overflow detection and some other utility functions related to
the virtualization mailbox.
The bulk of the SR-IOV implementation is actually found in
ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific.
In the future, the ice driver will support an additional virtualization
scheme known as Scalable IOV, and the code in this file will be used
for this alternative implementation.
Rename this file (and its associated header) to ice_vf_mbx.c, so that we
can later re-use the ice_sriov.c file as the SR-IOV specific file.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add support for creating filters for GTP-U and GTP-C in switchdev mode. Add
support for parsing GTP-specific options (QFI and PDU type) and TEID.
By default, a filter for GTP-U will be added. To add a filter for GTP-C,
specify enc_dst_port = 2123, e.g.:
tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \
enc_dst_port 2123 action mirred egress redirect dev $VF1_PR
Note: GTP-U with outer IPv6 offload is not supported yet.
Note: GTP-U with no payload offload is not supported yet.
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-03-09
This series contains updates to ice driver only.
Martyna implements switchdev filtering on inner EtherType field for
tunnels.
Marcin adds reporting of slowpath statistics for port representors.
Jonathan Toppins changes a non-fatal link error message from warning to
debug.
Maciej removes unnecessary checks in ice_clean_tx_irq().
Amritha adds support for ADQ to match outer destination MAC for tunnels.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Add support for outer dest MAC for ADQ tunnels
ice: avoid XDP checks in ice_clean_tx_irq()
ice: change "can't set link" message to dbg level
ice: Add slow path offload stats on port representor in switchdev
ice: Add support for inner etype in switchdev
====================
Link: https://lore.kernel.org/r/20220309190315.1380414-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
net/dsa/dsa2.c
commit afb3cc1a397d ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails")
commit e83d56537859 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master")
https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/
drivers/net/ethernet/intel/ice/ice.h
commit 97b0129146b1 ("ice: Fix error with handling of bonding MTU")
commit 43113ff73453 ("ice: add TTY for GNSS module for E810T device")
https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/
drivers/staging/gdm724x/gdm_lte.c
commit fc7f750dc9d1 ("staging: gdm724x: fix use after free in gdm_lte_rx()")
commit 4bcc4249b4cf ("staging: Use netif_rx().")
https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating
auxiliary device") changes a process of re-creation of aux device
so ice_plug_aux_dev() is called from ice_service_task() context.
This unfortunately opens a race window that can result in dead-lock
when interface has left LAG and immediately enters LAG again.
Reproducer:
```
#!/bin/sh
ip link add lag0 type bond mode 1 miimon 100
ip link set lag0
for n in {1..10}; do
echo Cycle: $n
ip link set ens7f0 master lag0
sleep 1
ip link set ens7f0 nomaster
done
```
This results in:
[20976.208697] Workqueue: ice ice_service_task [ice]
[20976.213422] Call Trace:
[20976.215871] __schedule+0x2d1/0x830
[20976.219364] schedule+0x35/0xa0
[20976.222510] schedule_preempt_disabled+0xa/0x10
[20976.227043] __mutex_lock.isra.7+0x310/0x420
[20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
[20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
[20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core]
[20976.261079] ib_register_device+0x40d/0x580 [ib_core]
[20976.266139] irdma_ib_register_device+0x129/0x250 [irdma]
[20976.281409] irdma_probe+0x2c1/0x360 [irdma]
[20976.285691] auxiliary_bus_probe+0x45/0x70
[20976.289790] really_probe+0x1f2/0x480
[20976.298509] driver_probe_device+0x49/0xc0
[20976.302609] bus_for_each_drv+0x79/0xc0
[20976.306448] __device_attach+0xdc/0x160
[20976.310286] bus_probe_device+0x9d/0xb0
[20976.314128] device_add+0x43c/0x890
[20976.321287] __auxiliary_device_add+0x43/0x60
[20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice]
[20976.330109] ice_service_task+0xd0c/0xed0 [ice]
[20976.342591] process_one_work+0x1a7/0x360
[20976.350536] worker_thread+0x30/0x390
[20976.358128] kthread+0x10a/0x120
[20976.365547] ret_from_fork+0x1f/0x40
...
[20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084
[20976.446469] Call Trace:
[20976.448921] __schedule+0x2d1/0x830
[20976.452414] schedule+0x35/0xa0
[20976.455559] schedule_preempt_disabled+0xa/0x10
[20976.460090] __mutex_lock.isra.7+0x310/0x420
[20976.464364] device_del+0x36/0x3c0
[20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice]
[20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice]
[20976.477288] notifier_call_chain+0x47/0x70
[20976.481386] __netdev_upper_dev_link+0x18b/0x280
[20976.489845] bond_enslave+0xe05/0x1790 [bonding]
[20976.494475] do_setlink+0x336/0xf50
[20976.502517] __rtnl_newlink+0x529/0x8b0
[20976.543441] rtnl_newlink+0x43/0x60
[20976.546934] rtnetlink_rcv_msg+0x2b1/0x360
[20976.559238] netlink_rcv_skb+0x4c/0x120
[20976.563079] netlink_unicast+0x196/0x230
[20976.567005] netlink_sendmsg+0x204/0x3d0
[20976.570930] sock_sendmsg+0x4c/0x50
[20976.574423] ____sys_sendmsg+0x1eb/0x250
[20976.586807] ___sys_sendmsg+0x7c/0xc0
[20976.606353] __sys_sendmsg+0x57/0xa0
[20976.609930] do_syscall_64+0x5b/0x1a0
[20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca
1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev()
is called from ice_service_task() context, aux device is created
and associated device->lock is taken.
2. Command 'ip link ... set master...' calls ice's notifier under
RTNL lock and that notifier calls ice_unplug_aux_dev(). That
function tries to take aux device->lock but this is already taken
by ice_plug_aux_dev() in step 1
3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already
taken in step 2
4. Dead-lock
The patch fixes this issue by following changes:
- Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev()
call in ice_service_task()
- The bit is checked in ice_clear_rdma_cap() and only if it is not set
then ice_unplug_aux_dev() is called. If it is set (in other words
plugging of aux device was requested and ice_plug_aux_dev() is
potentially running) then the function only clears the bit
- Once ice_plug_aux_dev() call (in ice_service_task) is finished
the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked
whether it was already cleared by ice_clear_rdma_cap(). If so then
aux device is unplugged.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement callbacks to check for stats and fetch port representor stats.
Stats are taken from RX/TX ring corresponding to port representor and show
the number of bytes/packets that were not offloaded.
To see slow path stats run:
ifstat -x cpu_hits -a
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When a bonded interface is destroyed, .ndo_change_mtu can be called
during the tear-down process while the RTNL lock is held. This is a
problem since the auxiliary driver linked to the LAN driver needs to be
notified of the MTU change, and this requires grabbing a device_lock on
the auxiliary_device's dev. Currently this is being attempted in the
same execution context as the call to .ndo_change_mtu which is causing a
dead-lock.
Move the notification of the changed MTU to a separate execution context
(watchdog service task) and eliminate the "before" notification.
Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
We maintain a number of values for VFs within the ice_pf structure. This
includes the VF table, the number of allocated VFs, the maximum number
of supported SR-IOV VFs, the number of queue pairs per VF, the number of
MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.
We're about to add a few more variables to this list. Clean this up
first by extracting these members out into a new ice_vfs structure
defined in ice_virtchnl_pf.h
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The VSI structure contains a vf_id field used to associate a VSI with a
VF. This is used mainly for ICE_VSI_VF as well as partially for
ICE_VSI_CTRL associated with the VFs.
This API was designed with the idea that VFs are stored in a simple
array that was expected to be static throughout most of the driver's
life.
We plan on refactoring VF storage in a few key ways:
1) converting from a simple static array to a hash table
2) using krefs to track VF references obtained from the hash table
3) use RCU to delay release of VF memory until after all references
are dropped
This is motivated by the goal to ensure that the lifetime of VF
structures is accounted for, and prevent various use-after-free bugs.
With the existing vsi->vf_id, the reference tracking for VFs would
become somewhat convoluted, because each VSI maintains a vf_id field
which will then require performing a look up. This means all these flows
will require reference tracking and proper usage of rcu_read_lock, etc.
We know that the VF VSI will always be backed by a valid VF structure,
because the VSI is created during VF initialization and removed before
the VF is destroyed. Rely on this and store a reference to the VF in the
VSI structure instead of storing a VF ID. This will simplify the usage
and avoid the need to perform lookups on the hash table in the future.
For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
debugging code which violates this assumption and avoid more disastrous
panics.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add a new ice_gnss.c file for holding the basic GNSS module functions.
If the device supports GNSS module, call the new ice_gnss_init and
ice_gnss_release functions where appropriate.
Implement basic functionality for reading the data from GNSS module
using TTY device.
Add I2C read AQ command. It is now required for controlling the external
physical connectors via external I2C port expander on E810-T adapters.
Future changes will introduce write functionality.
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Sudhansu Sekhar Mishra <sudhansu.mishra@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tools/testing/selftests/net/mptcp/mptcp_join.sh
34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers")
857898eb4b28 ("selftests: mptcp: add missing join check")
6ef84b1517e0 ("selftests: mptcp: more robust signal race test")
https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c
fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions")
c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information")
09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr")
84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr")
efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow")
3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit c503e63200c6 ("ice: Stop processing VF messages during teardown")
introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
intended to prevent some issues with concurrently handling messages from
VFs while tearing down the VFs.
This change was motivated by crashes caused while tearing down and
bringing up VFs in rapid succession.
It turns out that the fix actually introduces issues with the VF driver
caused because the PF no longer responds to any messages sent by the VF
during its .remove routine. This results in the VF potentially removing
its DMA memory before the PF has shut down the device queues.
Additionally, the fix doesn't actually resolve concurrency issues within
the ice driver. It is possible for a VF to initiate a reset just prior
to the ice driver removing VFs. This can result in the remove task
concurrently operating while the VF is being reset. This results in
similar memory corruption and panics purportedly fixed by that commit.
Fix this concurrency at its root by protecting both the reset and
removal flows using the existing VF cfg_lock. This ensures that we
cannot remove the VF while any outstanding critical tasks such as a
virtchnl message or a reset are occurring.
This locking change also fixes the root cause originally fixed by commit
c503e63200c6 ("ice: Stop processing VF messages during teardown"), so we
can simply revert it.
Note that I kept these two changes together because simply reverting the
original commit alone would leave the driver vulnerable to worse race
conditions.
Fixes: c503e63200c6 ("ice: Stop processing VF messages during teardown")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The status of support for RDMA is currently being tracked with two
separate status flags. This is unnecessary with the current state of
the driver.
Simplify status tracking down to a single flag.
Rename the helper function to denote the RDMA specific status and
universally use the helper function to test the status bit.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a call to re-create the auxiliary device happens in a context that has
already taken the RTNL lock, then the call flow that recreates auxiliary
device can hang if there is another attempt to claim the RTNL lock by the
auxiliary driver.
To avoid this, any call to re-create auxiliary devices that comes from
an source that is holding the RTNL lock (e.g. netdev notifier when
interface exits a bond) should execute in a separate thread. To
accomplish this, add a flag to the PF that will be evaluated in the
service task and dealt with there.
Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
VFs by default are able to see all tagged traffic regardless of trust
and VLAN filters. Based on legacy devices (i.e. ixgbe, i40e), customers
expect VFs to receive all VLAN tagged traffic with a matching
destination MAC.
Add an ethtool private flag 'vf-vlan-pruning' and set the default to
off so VFs will receive all VLAN traffic directed towards them. When
the flag is turned on, VF will only be able to receive untagged
traffic or traffic with VLAN tags it has created interfaces for.
Also, the flag cannot be changed while any VFs are allocated. This was
done to simplify the implementation. So, if this flag is needed, then
the PF admin must enable it. If the user tries to enable the flag while
VFs are active, then print an unsupported message with the
vf-vlan-pruning flag included. In case multiple flags were specified, this
makes it clear to the user which flag failed.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN
ops are only available when the device is in Double VLAN Mode (DVM).
Depending on the VSI type, the requirements for what operations to
use/allow differ.
By default all VSI's have unsupported inner and outer VSI VLAN ops. This
implementation was chosen to prevent unexpected crashes due to null
pointer dereferences. Instead, if a VSI calls an unsupported op, it will
just return -EOPNOTSUPP.
Add implementations to support modifying outer VLAN fields for VSI
context. This includes the ability to modify VLAN stripping, insertion,
and the port VLAN based on the outer VLAN handling fields of the VSI
context.
These functions should only ever be used if DVM is enabled because that
means the firmware supports the outer VLAN fields in the VSI context. If
the device is in DVM, then always use the outer_vlan_ops, else use the
vlan_ops since the device is in Single VLAN Mode (SVM).
Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to
ice_vsi_vlan_setup() as the latter function is specific to the PF and
all other VSI types that need an untagged VLAN 0 filter already do this
in their specific flows. Without this change, Flow Director is failing
to initialize because it does not implement any VSI VLAN ops.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and
offloads require more flexibility when configuring VLANs. The VSI VLAN
interface will allow flexibility for configuring VLANs for all VSI
types. Add new files to separate the VSI VLAN ops and move functions to
make the code more organized.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add support to enable flow-director filter when multiple TCs are
configured. Flow director filter can be configured using ethtool
(--config-ntuple option). When multiple TCs are configured, each
TC is mapped to an unique HW VSI. So VSI corresponding to queue
used in filter is identified and flow director context is updated
with correct VSI while configuring ntuple filter in HW.
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ice hardware contains an embedded chip with firmware which can be
updated using devlink flash. The firmware which runs on this chip is
referred to as the Embedded Management Processor firmware (EMP
firmware).
Activating the new firmware image currently requires that the system be
rebooted. This is not ideal as rebooting the system can cause unwanted
downtime.
In practical terms, activating the firmware does not always require a
full system reboot. In many cases it is possible to activate the EMP
firmware immediately. There are a couple of different scenarios to
cover.
* The EMP firmware itself can be reloaded by issuing a special update
to the device called an Embedded Management Processor reset (EMP
reset). This reset causes the device to reset and reload the EMP
firmware.
* PCI configuration changes are only reloaded after a cold PCIe reset.
Unfortunately there is no generic way to trigger this for a PCIe
device without a system reboot.
When performing a flash update, firmware is capable of responding with
some information about the specific update requirements.
The driver updates the flash by programming a secondary inactive bank
with the contents of the new image, and then issuing a command to
request to switch the active bank starting from the next load.
The response to the final command for updating the inactive NVM flash
bank includes an indication of the minimum reset required to fully
update the device. This can be one of the following:
* A full power on is required
* A cold PCIe reset is required
* An EMP reset is required
The response to the command to switch flash banks includes an indication
of whether or not the firmware will allow an EMP reset request.
For most updates, an EMP reset is sufficient to load the new EMP
firmware without issues. In some cases, this reset is not sufficient
because the PCI configuration space has changed. When this could cause
incompatibility with the new EMP image, the firmware is capable of
rejecting the EMP reset request.
Add logic to ice_fw_update.c to handle the response data flash update
AdminQ commands.
For the reset level, issue a devlink status notification informing the
user of how to complete the update with a simple suggestion like
"Activate new firmware by rebooting the system".
Cache the status of whether or not firmware will restrict the EMP reset
for use in implementing devlink reload.
Implement support for devlink reload with the "fw_activate" flag. This
allows user space to request the firmware be activated immediately.
For the .reload_down handler, we will issue a request for the EMP reset
using the appropriate firmware AdminQ command. If we know that the
firmware will not allow an EMP reset, simply exit with a suitable
netlink extended ACK message indicating that the EMP reset is not
available.
For the .reload_up handler, simply wait until the driver has finished
resetting. Logic to handle processing of an EMP reset already exists in
the driver as part of its reset and rebuild flows.
Implement support for the devlink reload interface with the
"fw_activate" action. This allows userspace to request activation of
firmware without a reboot.
Note that support for indicating the required reset and EMP reset
restriction is not supported on old versions of firmware. The driver can
determine if the two features are supported by checking the device
capabilities report. I confirmed support has existed since at least
version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the
EMP reset request has existed in all version of the EMP firmware for the
ice hardware.
Check the device capabilities report to determine whether or not the
indications are reported by the running firmware. If the reset
requirement indication is not supported, always assume a full power on
is necessary. If the reset restriction capability is not supported,
always assume the EMP reset is available.
Users can verify if the EMP reset has activated the firmware by using
the devlink info report to check that the 'running' firmware version has
updated. For example a user might do the following:
# Check current version
$ devlink dev info
# Update the device
$ devlink dev flash pci/0000:af:00.0 file firmware.bin
# Confirm stored version updated
$ devlink dev info
# Reload to activate new firmware
$ devlink dev reload pci/0000:af:00.0 action fw_activate
# Confirm running version updated
$ devlink dev info
Finally, this change does *not* implement basic driver-only reload
support. I did look into trying to do this. However, it requires
significant refactor of how the ice driver probes and loads everything.
The ice driver probe and allocation flows were not designed with such
a reload in mind. Refactoring the flow to support this is beyond the
scope of this change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
We have a region for reading the contents of the NVM flash as
a snapshot. This region does not allow reading the Shadow RAM, as it
always passes the FLASH_ONLY bit to the low level firmware interface.
Add a separate shadow-ram region which will allow snapshot of the
current contents of the Shadow RAM. This data is built from the NVM
contents but is distinct as the device builds up the Shadow RAM during
initialization, so being able to snapshot its contents can be useful
when attempting to debug flash related issues.
Fix the comment description of the nvm-flash region which incorrectly
stated that it filled the shadow-ram region, and add a comment
explaining that the nvm-flash region does not actually read the Shadow
RAM.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Remove the ice_stat_str() function which prints the string
representation of the ice_status error code. With upcoming changes
moving away from ice_status, there will be no need for this function.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
|