aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice/ice.h
AgeCommit message (Collapse)AuthorFilesLines
2023-02-06net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.hVladimir Oltean1-0/+1
Since mqprio is a scheduler and not a classifier, move its offload structure to pkt_sched.h, where struct tc_taprio_qopt_offload also lies. Also update some header inclusions in drivers that access this structure, to the best of my abilities. Cc: Igor Russkikh <irusskikh@marvell.com> Cc: Yisen Zhuang <yisen.zhuang@huawei.com> Cc: Salil Mehta <salil.mehta@huawei.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: Daniel Machon <daniel.machon@microchip.com> Cc: UNGLinuxDriver@microchip.com Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-03ice: split probe into smaller functionsMichal Swiatkowski1-0/+2
Part of code from probe can be reused in reload flow. Move this code to separate function. Create unroll functions for each part of initialization, like: ice_init_dev() and ice_deinit_dev(). It simplifies unrolling and can be used in remove flow. Avoid freeing port info as it could be reused in reload path. Will be freed in remove path since is allocated via devm_kzalloc(). Also clean the remove path to reflect the init steps. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-03ice: cleanup in VSI config/deconfig codeMichal Swiatkowski1-2/+1
Do few small cleanups: 1) Rename the function to reflect that it doesn't configure all things related to VSI. ice_vsi_cfg_lan() better fits to what function is doing. ice_vsi_cfg() can be use to name function that will configure whole VSI. 2) Remove unused ethtype field from VSI. There is no need to set ethtype here, because it is never used. 3) Remove unnecessary check for ICE_VSI_CHNL. There is check for ICE_VSI_CHNL in ice_vsi_get_qs, so there is no need to check it before calling the function. 4) Simplify ice_vsi_alloc() call. There is no need to check the type of VSI before calling ice_vsi_alloc(). For ICE_VSI_CHNL vf is always NULL (ice_vsi_setup() is called with vf=NULL). For ICE_VSI_VF or ICE_VSI_CTRL ch is always NULL and for other VSI types ch and vf are always NULL. 5) Remove unnecessary call to ice_vsi_dis_irq(). ice_vsi_dis_irq() will be called in ice_vsi_close() flow (ice_vsi_close() -> ice_vsi_down() -> ice_vsi_dis_irq()). Remove unnecessary call. 6) Don't remove specific filters in release. All hw filters are removed in ice_fltr_remove_alli(), which is always called in VSI release flow. There is no need to remove only ethertype filters before calling ice_fltr_remove_all(). 7) Rename ice_vsi_clear() to ice_vsi_free(). As ice_vsi_clear() only free memory allocated in ice_vsi_alloc() rename it to ice_vsi_free() which better shows what function is doing. 8) Free coalesce param in rebuild. There is potential memory leak if configuration of VSI lan fails. Free coalesce to avoid it. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-03ice: move RDMA init to ice_idc.cMichal Swiatkowski1-0/+1
Simplify probe flow by moving all RDMA related code to ice_init_rdma(). Unroll irq allocation if RDMA initialization fails. Implement ice_deinit_rdma() and use it in remove flow. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
net/core/gro.c 7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO") b1a78b9b9886 ("net: add support for ipv4 big tcp") https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27ice: Prevent set_channel from changing queues while RDMA activeDave Ertman1-1/+1
The PF controls the set of queues that the RDMA auxiliary_driver requests resources from. The set_channel command will alter that pool and trigger a reconfiguration of the VSI, which breaks RDMA functionality. Prevent set_channel from executing when RDMA driver bound to auxiliary device. Adding a locked variable to pass down the call chain to avoid double locking the device_lock. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-20ice: use GNSS subsystem instead of TTYArkadiusz Kubalewski1-3/+3
Previously support for GNSS was implemented as a TTY driver, it allowed to access GNSS receiver on /dev/ttyGNSS_<bus><func>. Use generic GNSS subsystem API instead of implementing own TTY driver. The receiver is accessible on /dev/gnss<id>. In case of multiple receivers in the OS, correct device can be found by enumerating either: - /sys/class/net/<eth port>/device/gnss/ - /sys/class/gnss/gnss<id>/device/ Using GNSS subsystem is superior to implementing own TTY driver, as the GNSS subsystem was designed solely for this purpose. It also implements TTY driver but in a common and defined way. From user perspective, there is no difference in communicating with a device, except new path to the device shall be used. The device will provide same information to the userspace as the old one, and can be used in the same way, i.e.: old # gpsmon /dev/ttyGNSS_2100_0 new # gpsmon /dev/gnss0 There is no other impact on userspace tools. User expecting onboard GNSS receiver support is required to enable CONFIG_GNSS=y/m in kernel config. Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Michal Michalik <michal.michalik@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-23ice: Accumulate ring statistics over resetBenjamin Mikailenko1-0/+6
Resets may occur with or without user interaction. For example, a TX hang or reconfiguration of parameters will result in a reset. During reset, the VSI is freed, freeing any statistics structures inside as well. This would create an issue for the user where a reset happens in the background, statistics set to zero, and the user checks ring statistics expecting them to be populated. To ensure this doesn't happen, accumulate ring statistics over reset. Define a new ring statistics structure, ice_ring_stats. The new structure lives in the VSI's parent, preserving ring statistics when VSI is freed. 1. Define a new structure vsi_ring_stats in the PF scope 2. Allocate/free stats only during probe, unload, or change in ring size 3. Replace previous ring statistics functionality with new structure Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-11-23ice: Accumulate HW and Netdev statistics over resetBenjamin Mikailenko1-0/+1
Resets happen with or without user interaction. For example, incidents such as TX hang or a reconfiguration of parameters will result in a reset. During reset, hardware and software statistics were set to zero. This created an issue for the user where a reset happens in the background, statistics set to zero, and the user checks statistics expecting them to be populated. To ensure this doesn't happen, keep accumulating stats over reset. 1. Remove function calls which reset hardware and netdev statistics. 2. Do not rollover statistics in ice_stat_update40 during reset. Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-10-27ice: Add support Flex RXDMichal Jaron1-0/+2
Add new VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC flag, opcode VIRTCHNL_OP_GET_SUPPORTED_RXDIDS and add member rxdid in struct virtchnl_rxq_info to support AVF Flex RXD extension. Add support to allow VF to query flexible descriptor RXDIDs supported by DDP package and configure Rx queues with selected RXDID for IAVF. Add code to allow VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message to be processed. Add necessary macros for registers. Signed-off-by: Leyi Rong <leyi.rong@intel.com> Signed-off-by: Xu Ting <ting.xu@intel.com> Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221025161252.1952939-1-jacob.e.keller@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25ice: Enable RX queue selection using skbedit actionAmritha Nambiar1-0/+15
This patch uses TC skbedit queue_mapping action to support forwarding packets to a device queue. Such filters with action forward to queue will be the highest priority switch filter in HW. Example: $ tc filter add dev ens4f0 protocol ip ingress flower\ dst_ip 192.168.1.12 ip_proto tcp dst_port 5001\ action skbedit queue_mapping 5 skip_sw The above command adds an ingress filter, incoming packets qualifying the match will be accepted into queue 5. The queue number is in decimal format. Refactored ice_add_tc_flower_adv_fltr() to consolidate code with action FWD_TO_VSI and FWD_TO QUEUE. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-13/+23
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()") c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer") https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/ https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-22ice: xsk: use Rx ring's XDP ring when picking NAPI contextMaciej Fijalkowski1-13/+23
Ice driver allocates per cpu XDP queues so that redirect path can safely use smp_processor_id() as an index to the array. At the same time though, XDP rings are used to pick NAPI context to call napi_schedule() or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and num_possible_cpus() of underlying platform is 44, then this means queue vectors with correlated NAPI contexts will carry several XDP queues. This in turn can result in a broken behavior where NAPI context of interest will never be scheduled and AF_XDP socket will not process any traffic. To fix this, let us change the way how XDP rings are assigned to Rx rings and use this information later on when setting ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated queue vector and walk through Tx ring's linked list. Once we stumble upon XDP ring in it, assign this ring to ice_rx_ring::xdp_ring. Previous [0] approach of fixing this issue was for txonly scenario because of the described grouping of XDP rings across queue vectors. So, relying on Rx ring meant that NAPI context could be scheduled with a queue vector without XDP ring with associated XSK pool. [0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/ Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-18ice: Implement control of FCS/CRC strippingJesse Brandeburg1-0/+1
The driver can allow the user to configure whether the CRC aka the FCS (Frame Check Sequence) is DMA'd to the host as part of the receive buffer. The driver usually wants this feature disabled so that the hardware checks the FCS and strips it in order to save PCI bandwidth. Control the reception of FCS to the host using the command: ethtool -K eth0 rx-fcs <on|off> The default shown in ethtool -k eth0 | grep fcs; should be "off", as the hardware will drop any frame with a bad checksum, and DMA of the checksum is useless overhead especially for small packets. Testing Hints: test the FCS/CRC arrives with received packets using tcpdump -nnpi eth0 -xxxx and it should show crc data as the last 4 bytes of the packet. Can also use wireshark to turn on CRC checking and check the data is correct. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Co-developed-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com> Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-28ice: Introduce enabling promiscuous mode on multiple VF'sMichal Wilczynski1-2/+0
In current implementation default VSI switch filter is only able to forward traffic to a single VSI. This limits promiscuous mode with private flag 'vf-true-promisc-support' to a single VF. Enabling it on the second VF won't work. Also allmulticast support doesn't seem to be properly implemented when vf-true-promisc-support is true. Use standard ice_add_rule_internal() function that already implements forwarding to multiple VSI's instead of constructing AQ call manually. Add switch filter for allmulticast mode when vf-true-promisc-support is enabled. The same filter is added regardless of the flag - it doesn't matter for this case. Remove unnecessary fields in switch structure. From now on book keeping will be done by ice_add_rule_internal(). Refactor unnecessarily passed function arguments. To test: 1) Create 2 VM's, and two VF's. Attach VF's to VM's. 2) Enable promiscuous mode on both of them and check if traffic is seen on both of them. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Add support for PPPoE hardware offloadMarcin Szycik1-0/+1
Add support for creating PPPoE filters in switchdev mode. Add support for parsing PPPoE and PPP-specific tc options: pppoe_sid and ppp_proto. Example filter: tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \ 1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR Changes in iproute2 are required to use the new fields. ICE COMMS DDP package is required to create a filter as it contains PPPoE profiles. Added a warning message when loaded DDP package does not contain required profiles. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-21ice: add write functionality for GNSS TTYKarol Kolacinski1-2/+2
Add the possibility to write raw bytes to the GNSS module through the first TTY device. This allows user to configure the module. Create a second read-only TTY device. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-15ice: Add EXTTS feature to the feature bitmapAnirudh Venkataramanan1-0/+1
External time stamp sources are supported only on certain devices. Enforce the right support matrix by adding the ICE_F_PTP_EXTTS bit to the feature bitmap set. Co-developed-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+1
No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06ice: Fix race during aux device (un)pluggingIvan Vecera1-0/+1
Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] <TASK> [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] </TASK> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: introduce common helper for retrieving VSI by vsi_numMaciej Fijalkowski1-0/+15
Both ice_idc.c and ice_virtchnl.c carry their own implementation of a helper function that is looking for a given VSI based on provided vsi_num. Their functionality is the same, so let's introduce the common function in ice.h that both of the mentioned sites will use. This is a strictly cleanup thing, no functionality is changed. Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05ice: synchronize_rcu() when terminating ringsMaciej Fijalkowski1-1/+1
Unfortunately, the ice driver doesn't respect the RCU critical section that XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to paths that destroy resources that might be in use. This was addressed in other AF_XDP ZC enabled drivers, for reference see for example commit b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK") Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-01ice: Fix broken IFF_ALLMULTI handlingIvan Vecera1-1/+0
Handling of all-multicast flag and associated multicast promiscuous mode is broken in ice driver. When an user switches allmulticast flag on or off the driver checks whether any VLANs are configured over the interface (except default VLAN 0). If any extra VLANs are registered it enables multicast promiscuous mode for all these VLANs (including default VLAN 0) using ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all multicast packets tagged with known VLAN ID or untagged are received and multicast packets tagged with unknown VLAN ID ignored. If no extra VLANs are registered (so only VLAN 0 exists) it enables multicast promiscuous mode for VLAN 0 and uses ICE_SW_LKUP_PROMISC look-up type. In this situation any multicast packets including tagged ones are received. The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way: ice_vsi_sync_fltr() { ... if (changed_flags & IFF_ALLMULTI) { if (netdev->flags & IFF_ALLMULTI) { if (vsi->num_vlans > 1) ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS); else ice_set_promisc(..., ICE_MCAST_PROMISC_BITS); } else { if (vsi->num_vlans > 1) ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS); else ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS); } } ... } The code above depends on value vsi->num_vlan that specifies number of VLANs configured over the interface (including VLAN 0) and this is problem because that value is modified in NDO callbacks ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid(). Scenario 1: 1. ip link set ens7f0 allmulticast on 2. ip link add vlan10 link ens7f0 type vlan id 10 3. ip link set ens7f0 allmulticast off 4. ip link set ens7f0 allmulticast on [1] In this scenario IFF_ALLMULTI is enabled and the driver calls ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) that installs multicast promisc rule with non-VLAN look-up type. [2] Then VLAN with ID 10 is added and vsi->num_vlan incremented to 2 [3] Command switches IFF_ALLMULTI off and the driver calls ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS) but this call is effectively NOP because it looks for multicast promisc rules for VLAN 0 and VLAN 10 with VLAN look-up type but no such rules exist. So the all-multicast remains enabled silently in hardware. [4] Command tries to switch IFF_ALLMULTI on and the driver calls ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS) but this call fails (-EEXIST) because non-VLAN multicast promisc rule already exists. Scenario 2: 1. ip link add vlan10 link ens7f0 type vlan id 10 2. ip link set ens7f0 allmulticast on 3. ip link add vlan20 link ens7f0 type vlan id 20 4. ip link del vlan10 ; ip link del vlan20 5. ip link set ens7f0 allmulticast off [1] VLAN with ID 10 is added and vsi->num_vlan==2 [2] Command switches IFF_ALLMULTI on and driver installs multicast promisc rules with VLAN look-up type for VLAN 0 and 10 [3] VLAN with ID 20 is added and vsi->num_vlan==3 but no multicast promisc rules is added for this new VLAN so the interface does not receive MC packets from VLAN 20 [4] Both VLANs are removed but multicast rule for VLAN 10 remains installed so interface receives multicast packets from VLAN 10 [5] Command switches IFF_ALLMULTI off and because vsi->num_vlan is 1 the driver tries to remove multicast promisc rule for VLAN 0 with non-VLAN look-up that does not exist. All-multicast looks disabled from user point of view but it is partially enabled in HW (interface receives all multicast packets either untagged or tagged with VLAN ID 10) To resolve these issues the patch introduces these changes: 1. Adds handling for IFF_ALLMULTI to ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid() callbacks. So when VLAN is added/removed and IFF_ALLMULTI is enabled an appropriate multicast promisc rule for that VLAN ID is added/removed. 2. In ice_vlan_rx_add_vid() when first VLAN besides VLAN 0 is added so (vsi->num_vlan == 2) and IFF_ALLMULTI is enabled then look-up type for existing multicast promisc rule for VLAN 0 is updated to ICE_MCAST_VLAN_PROMISC_BITS. 3. In ice_vlan_rx_kill_vid() when last VLAN besides VLAN 0 is removed so (vsi->num_vlan == 1) and IFF_ALLMULTI is enabled then look-up type for existing multicast promisc rule for VLAN 0 is updated to ICE_MCAST_PROMISC_BITS. 4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY bit protection to avoid races with ice_vsi_sync_fltr() that runs in ice_service_task() context. 5. Bit ICE_VSI_VLAN_FLTR_CHANGED is use-less and can be removed. 6. Error messages added to ice_fltr_*_vsi_promisc() helper functions to avoid them in their callers 7. Small improvements to increase readability Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-28ice: xsk: Fix indexing in ice_tx_xsk_pool()Maciej Fijalkowski1-1/+1
Ice driver tries to always create XDP rings array to be num_possible_cpus() sized, regardless of user's queue count setting that can be changed via ethtool -L for example. Currently, ice_tx_xsk_pool() calculates the qid by decrementing the ring->q_index by the count of XDP queues, but ring->q_index is set to 'i + vsi->alloc_txq'. When user did ethtool -L $IFACE combined 1, alloc_txq is 1, but vsi->num_xdp_txq is still num_possible_cpus(). Then, ice_tx_xsk_pool() will do OOB access and in the final result ring would not get xsk_pool pointer assigned. Then, each ice_xsk_wakeup() call will fail with error and it will not be possible to get into NAPI and do the processing from driver side. Fix this by decrementing vsi->alloc_txq instead of vsi->num_xdp_txq from ring-q_index in ice_tx_xsk_pool() so the calculation is reflected to the setting of ring->q_index. Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-5-maciej.fijalkowski@intel.com
2022-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+2
Merge in overtime fixes, no conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23ice: fix 'scheduling while atomic' on aux critical err interruptAlexander Lobakin1-0/+2
There's a kernel BUG splat on processing aux critical error interrupts in ice_misc_intr(): [ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000 ... [ 2101.060770] Call Trace: [ 2101.063229] <IRQ> [ 2101.065252] dump_stack+0x41/0x60 [ 2101.068587] __schedule_bug.cold.100+0x4c/0x58 [ 2101.073060] __schedule+0x6a4/0x830 [ 2101.076570] schedule+0x35/0xa0 [ 2101.079727] schedule_preempt_disabled+0xa/0x10 [ 2101.084284] __mutex_lock.isra.7+0x310/0x420 [ 2101.088580] ? ice_misc_intr+0x201/0x2e0 [ice] [ 2101.093078] ice_send_event_to_aux+0x25/0x70 [ice] [ 2101.097921] ice_misc_intr+0x220/0x2e0 [ice] [ 2101.102232] __handle_irq_event_percpu+0x40/0x180 [ 2101.106965] handle_irq_event_percpu+0x30/0x80 [ 2101.111434] handle_irq_event+0x36/0x53 [ 2101.115292] handle_edge_irq+0x82/0x190 [ 2101.119148] handle_irq+0x1c/0x30 [ 2101.122480] do_IRQ+0x49/0xd0 [ 2101.125465] common_interrupt+0xf/0xf [ 2101.129146] </IRQ> ... As Andrew correctly mentioned previously[0], the following call ladder happens: ice_misc_intr() <- hardirq ice_send_event_to_aux() device_lock() mutex_lock() might_sleep() might_resched() <- oops Add a new PF state bit which indicates that an aux critical error occurred and serve it in ice_service_task() in process context. The new ice_pf::oicr_err_reg is read-write in both hardirq and process contexts, but only 3 bits of non-critical data probably aren't worth explicit synchronizing (and they're even in the same byte [31:24]). [0] https://lore.kernel.org/all/YeSRUVmrdmlUXHDn@lunn.ch Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Michal Kubiak <michal.kubiak@intel.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14ice: remove circular header dependencies on ice.hJacob Keller1-3/+0
Several headers in the ice driver include ice.h even though they are themselves included by that header. The most notable of these is ice_common.h, but several other headers also do this. Such a recursive inclusion is problematic as it forces headers to be included in a strict order, otherwise compilation errors can result. The circular inclusions do not trigger an endless loop due to standard header inclusion guards, however other errors can occur. For example, ice_flow.h defines ice_rss_hash_cfg, which is used by ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx. ice_flow.h includes ice_acl.h, which includes ice_common.h, and which finally includes ice.h. Since ice.h itself includes ice_sriov.h, this creates a circular dependency. The definition in ice_sriov.h requires things from ice_flow.h, but ice_flow.h itself will lead to trying to load ice_sriov.h as part of its process for expanding ice.h. The current code avoids this issue by having an implicit dependency without the include of ice_flow.h. If we were to fix that so that ice_sriov.h explicitly depends on ice_flow.h the following pattern would occur: ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h At this point, during the expansion of, the header guard for ice_flow.h is already set, so when ice_sriov.h attempts to load the ice_flow.h header it is skipped. Then, we go on to begin including the rest of ice_sriov.h, including structure definitions which depend on ice_rss_hash_cfg. This produces a compiler warning because ice_rss_hash_cfg hasn't yet been included. Remember, we're just at the start of ice_flow.h! If the order of headers is incorrect (ice_flow.h is not implicitly loaded first in all files which include ice_sriov.h) then we get the same failure. Removing this recursive inclusion requires fixing a few cases where some headers depended on the header inclusions from ice.h. In addition, a few other changes are also required. Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h, which is the likely reason that ice_common.h includes ice.h at all. This macro implementation requires the full definition of ice_pf in order to properly compile. Fix this by moving it to a function declared in ice_main.c, so that we do not require all files to depend on the layout of the ice_pf structure. Note that this change only fixes circular dependencies, but it does not fully resolve all implicit dependencies where one header may depend on the inclusion of another. I tried to fix as many of the implicit dependencies as I noticed, but fixing them all requires a somewhat tedious analysis of each header and attempting to compile it separately. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14ice: rename ice_virtchnl_pf.c to ice_sriov.cJacob Keller1-1/+1
The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the code for implementing Single Root IOV virtualization resides. This code includes support for bringing up and tearing down VFs, hooks into the kernel SR-IOV netdev operations, and for handling virtchnl messages from VFs. In the future, we plan to support Scalable IOV in addition to Single Root IOV as an alternative virtualization scheme. This implementation will re-use some but not all of the code in ice_virtchnl_pf.c To prepare for this future, we want to refactor and split up the code in ice_virtchnl_pf.c into the following scheme: * ice_vf_lib.[ch] Basic VF structures and accessors. This is where scheme-independent code will reside. * ice_virtchnl.[ch] Virtchnl message handling. This is where the bulk of the logic for processing messages from VFs using the virtchnl messaging scheme will reside. This is separated from ice_vf_lib.c because it is distinct and has a bulk of the processing code. * ice_sriov.[ch] Single Root IOV implementation, including initialization and the routines for interacting with SR-IOV based netdev operations. * (future) ice_siov.[ch] Scalable IOV implementation. As a first step, lets assume that all of the code in ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to ice_sriov.c and its header to ice_sriov.h Future changes will further split out the code in these files following the plan outlined here. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14ice: rename ice_sriov.c to ice_vf_mbx.cJacob Keller1-1/+1
The ice_sriov.c file primarily contains code which handles the logic for mailbox overflow detection and some other utility functions related to the virtualization mailbox. The bulk of the SR-IOV implementation is actually found in ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific. In the future, the ice driver will support an additional virtualization scheme known as Scalable IOV, and the code in this file will be used for this alternative implementation. Rename this file (and its associated header) to ice_vf_mbx.c, so that we can later re-use the ice_sriov.c file as the SR-IOV specific file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11ice: Support GTP-U and GTP-C offload in switchdevMarcin Szycik1-0/+1
Add support for creating filters for GTP-U and GTP-C in switchdev mode. Add support for parsing GTP-specific options (QFI and PDU type) and TEID. By default, a filter for GTP-U will be added. To add a filter for GTP-C, specify enc_dst_port = 2123, e.g.: tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \ enc_dst_port 2123 action mirred egress redirect dev $VF1_PR Note: GTP-U with outer IPv6 offload is not supported yet. Note: GTP-U with no payload offload is not supported yet. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-10Merge branch '100GbE' of ↵Jakub Kicinski1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-03-09 This series contains updates to ice driver only. Martyna implements switchdev filtering on inner EtherType field for tunnels. Marcin adds reporting of slowpath statistics for port representors. Jonathan Toppins changes a non-fatal link error message from warning to debug. Maciej removes unnecessary checks in ice_clean_tx_irq(). Amritha adds support for ADQ to match outer destination MAC for tunnels. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Add support for outer dest MAC for ADQ tunnels ice: avoid XDP checks in ice_clean_tx_irq() ice: change "can't set link" message to dbg level ice: Add slow path offload stats on port representor in switchdev ice: Add support for inner etype in switchdev ==================== Link: https://lore.kernel.org/r/20220309190315.1380414-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+11
net/dsa/dsa2.c commit afb3cc1a397d ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails") commit e83d56537859 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master") https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/ drivers/net/ethernet/intel/ice/ice.h commit 97b0129146b1 ("ice: Fix error with handling of bonding MTU") commit 43113ff73453 ("ice: add TTY for GNSS module for E810T device") https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/ drivers/staging/gdm724x/gdm_lte.c commit fc7f750dc9d1 ("staging: gdm724x: fix use after free in gdm_lte_rx()") commit 4bcc4249b4cf ("staging: Use netif_rx().") https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10ice: Fix race condition during interface enslaveIvan Vecera1-1/+10
Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating auxiliary device") changes a process of re-creation of aux device so ice_plug_aux_dev() is called from ice_service_task() context. This unfortunately opens a race window that can result in dead-lock when interface has left LAG and immediately enters LAG again. Reproducer: ``` #!/bin/sh ip link add lag0 type bond mode 1 miimon 100 ip link set lag0 for n in {1..10}; do echo Cycle: $n ip link set ens7f0 master lag0 sleep 1 ip link set ens7f0 nomaster done ``` This results in: [20976.208697] Workqueue: ice ice_service_task [ice] [20976.213422] Call Trace: [20976.215871] __schedule+0x2d1/0x830 [20976.219364] schedule+0x35/0xa0 [20976.222510] schedule_preempt_disabled+0xa/0x10 [20976.227043] __mutex_lock.isra.7+0x310/0x420 [20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core] [20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core] [20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core] [20976.261079] ib_register_device+0x40d/0x580 [ib_core] [20976.266139] irdma_ib_register_device+0x129/0x250 [irdma] [20976.281409] irdma_probe+0x2c1/0x360 [irdma] [20976.285691] auxiliary_bus_probe+0x45/0x70 [20976.289790] really_probe+0x1f2/0x480 [20976.298509] driver_probe_device+0x49/0xc0 [20976.302609] bus_for_each_drv+0x79/0xc0 [20976.306448] __device_attach+0xdc/0x160 [20976.310286] bus_probe_device+0x9d/0xb0 [20976.314128] device_add+0x43c/0x890 [20976.321287] __auxiliary_device_add+0x43/0x60 [20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice] [20976.330109] ice_service_task+0xd0c/0xed0 [ice] [20976.342591] process_one_work+0x1a7/0x360 [20976.350536] worker_thread+0x30/0x390 [20976.358128] kthread+0x10a/0x120 [20976.365547] ret_from_fork+0x1f/0x40 ... [20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084 [20976.446469] Call Trace: [20976.448921] __schedule+0x2d1/0x830 [20976.452414] schedule+0x35/0xa0 [20976.455559] schedule_preempt_disabled+0xa/0x10 [20976.460090] __mutex_lock.isra.7+0x310/0x420 [20976.464364] device_del+0x36/0x3c0 [20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice] [20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice] [20976.477288] notifier_call_chain+0x47/0x70 [20976.481386] __netdev_upper_dev_link+0x18b/0x280 [20976.489845] bond_enslave+0xe05/0x1790 [bonding] [20976.494475] do_setlink+0x336/0xf50 [20976.502517] __rtnl_newlink+0x529/0x8b0 [20976.543441] rtnl_newlink+0x43/0x60 [20976.546934] rtnetlink_rcv_msg+0x2b1/0x360 [20976.559238] netlink_rcv_skb+0x4c/0x120 [20976.563079] netlink_unicast+0x196/0x230 [20976.567005] netlink_sendmsg+0x204/0x3d0 [20976.570930] sock_sendmsg+0x4c/0x50 [20976.574423] ____sys_sendmsg+0x1eb/0x250 [20976.586807] ___sys_sendmsg+0x7c/0xc0 [20976.606353] __sys_sendmsg+0x57/0xa0 [20976.609930] do_syscall_64+0x5b/0x1a0 [20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca 1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev() is called from ice_service_task() context, aux device is created and associated device->lock is taken. 2. Command 'ip link ... set master...' calls ice's notifier under RTNL lock and that notifier calls ice_unplug_aux_dev(). That function tries to take aux device->lock but this is already taken by ice_plug_aux_dev() in step 1 3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already taken in step 2 4. Dead-lock The patch fixes this issue by following changes: - Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev() call in ice_service_task() - The bit is checked in ice_clear_rdma_cap() and only if it is not set then ice_unplug_aux_dev() is called. If it is set (in other words plugging of aux device was requested and ice_plug_aux_dev() is potentially running) then the function only clears the bit - Once ice_plug_aux_dev() call (in ice_service_task) is finished the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked whether it was already cleared by ice_clear_rdma_cap(). If so then aux device is unplugged. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Co-developed-by: Petr Oros <poros@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09ice: Add slow path offload stats on port representor in switchdevMarcin Szycik1-0/+3
Implement callbacks to check for stats and fetch port representor stats. Stats are taken from RX/TX ring corresponding to port representor and show the number of bytes/packets that were not offloaded. To see slow path stats run: ifstat -x cpu_hits -a Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08ice: Fix error with handling of bonding MTUDave Ertman1-0/+1
When a bonded interface is destroyed, .ndo_change_mtu can be called during the tear-down process while the RTNL lock is held. This is a problem since the auxiliary driver linked to the LAN driver needs to be notified of the MTU change, and this requires grabbing a device_lock on the auxiliary_device's dev. Currently this is being attempted in the same execution context as the call to .ndo_change_mtu which is causing a dead-lock. Move the notification of the changed MTU to a separate execution context (watchdog service task) and eliminate the "before" notification. Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: factor VF variables to separate structureJacob Keller1-9/+1
We maintain a number of values for VFs within the ice_pf structure. This includes the VF table, the number of allocated VFs, the maximum number of supported SR-IOV VFs, the number of queue pairs per VF, the number of MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events. We're about to add a few more variables to this list. Clean this up first by extracting these members out into a new ice_vfs structure defined in ice_virtchnl_pf.h Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: store VF pointer instead of VF IDJacob Keller1-2/+1
The VSI structure contains a vf_id field used to associate a VSI with a VF. This is used mainly for ICE_VSI_VF as well as partially for ICE_VSI_CTRL associated with the VFs. This API was designed with the idea that VFs are stored in a simple array that was expected to be static throughout most of the driver's life. We plan on refactoring VF storage in a few key ways: 1) converting from a simple static array to a hash table 2) using krefs to track VF references obtained from the hash table 3) use RCU to delay release of VF memory until after all references are dropped This is motivated by the goal to ensure that the lifetime of VF structures is accounted for, and prevent various use-after-free bugs. With the existing vsi->vf_id, the reference tracking for VFs would become somewhat convoluted, because each VSI maintains a vf_id field which will then require performing a look up. This means all these flows will require reference tracking and proper usage of rcu_read_lock, etc. We know that the VF VSI will always be backed by a valid VF structure, because the VSI is created during VF initialization and removed before the VF is destroyed. Rely on this and store a reference to the VF in the VSI structure instead of storing a VF ID. This will simplify the usage and avoid the need to perform lookups on the hash table in the future. For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a vsi->vf pointer is valid when dealing with VF VSIs. This will aid in debugging code which violates this assumption and avoid more disastrous panics. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: add TTY for GNSS module for E810T deviceKarol Kolacinski1-0/+6
Add a new ice_gnss.c file for holding the basic GNSS module functions. If the device supports GNSS module, call the new ice_gnss_init and ice_gnss_release functions where appropriate. Implement basic functionality for reading the data from GNSS module using TTY device. Add I2C read AQ command. It is now required for controlling the external physical connectors via external I2C port expander on E810-T adapters. Future changes will introduce write functionality. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Sudhansu Sekhar Mishra <sudhansu.mishra@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+0
tools/testing/selftests/net/mptcp/mptcp_join.sh 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers") 857898eb4b28 ("selftests: mptcp: add missing join check") 6ef84b1517e0 ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions") c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") 09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") 84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr") efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") 3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18ice: fix concurrent reset and removal of VFsJacob Keller1-1/+0
Commit c503e63200c6 ("ice: Stop processing VF messages during teardown") introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is intended to prevent some issues with concurrently handling messages from VFs while tearing down the VFs. This change was motivated by crashes caused while tearing down and bringing up VFs in rapid succession. It turns out that the fix actually introduces issues with the VF driver caused because the PF no longer responds to any messages sent by the VF during its .remove routine. This results in the VF potentially removing its DMA memory before the PF has shut down the device queues. Additionally, the fix doesn't actually resolve concurrency issues within the ice driver. It is possible for a VF to initiate a reset just prior to the ice driver removing VFs. This can result in the remove task concurrently operating while the VF is being reset. This results in similar memory corruption and panics purportedly fixed by that commit. Fix this concurrency at its root by protecting both the reset and removal flows using the existing VF cfg_lock. This ensures that we cannot remove the VF while any outstanding critical tasks such as a virtchnl message or a reset are occurring. This locking change also fixes the root cause originally fixed by commit c503e63200c6 ("ice: Stop processing VF messages during teardown"), so we can simply revert it. Note that I kept these two changes together because simply reverting the original commit alone would leave the driver vulnerable to worse race conditions. Fixes: c503e63200c6 ("ice: Stop processing VF messages during teardown") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-14ice: Simplify tracking status of RDMA supportDave Ertman1-3/+0
The status of support for RDMA is currently being tracked with two separate status flags. This is unnecessary with the current state of the driver. Simplify status tracking down to a single flag. Rename the helper function to denote the RDMA specific status and universally use the helper function to test the status bit. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+2
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-10ice: Avoid RTNL lock when re-creating auxiliary deviceDave Ertman1-1/+2
If a call to re-create the auxiliary device happens in a context that has already taken the RTNL lock, then the call flow that recreates auxiliary device can hang if there is another attempt to claim the RTNL lock by the auxiliary driver. To avoid this, any call to re-create auxiliary devices that comes from an source that is holding the RTNL lock (e.g. netdev notifier when interface exits a bond) should execute in a separate thread. To accomplish this, add a flag to the PF that will be evaluated in the service task and dealt with there. Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add ability for PF admin to enable VF VLAN pruningBrett Creeley1-0/+1
VFs by default are able to see all tagged traffic regardless of trust and VLAN filters. Based on legacy devices (i.e. ixgbe, i40e), customers expect VFs to receive all VLAN tagged traffic with a matching destination MAC. Add an ethtool private flag 'vf-vlan-pruning' and set the default to off so VFs will receive all VLAN traffic directed towards them. When the flag is turned on, VF will only be able to receive untagged traffic or traffic with VLAN tags it has created interfaces for. Also, the flag cannot be changed while any VFs are allocated. This was done to simplify the implementation. So, if this flag is needed, then the PF admin must enable it. If the user tries to enable the flag while VFs are active, then print an unsupported message with the vf-vlan-pruning flag included. In case multiple flags were specified, this makes it clear to the user which flag failed. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add outer_vlan_ops and VSI specific VLAN ops implementationsBrett Creeley1-1/+2
Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN ops are only available when the device is in Double VLAN Mode (DVM). Depending on the VSI type, the requirements for what operations to use/allow differ. By default all VSI's have unsupported inner and outer VSI VLAN ops. This implementation was chosen to prevent unexpected crashes due to null pointer dereferences. Instead, if a VSI calls an unsupported op, it will just return -EOPNOTSUPP. Add implementations to support modifying outer VLAN fields for VSI context. This includes the ability to modify VLAN stripping, insertion, and the port VLAN based on the outer VLAN handling fields of the VSI context. These functions should only ever be used if DVM is enabled because that means the firmware supports the outer VLAN fields in the VSI context. If the device is in DVM, then always use the outer_vlan_ops, else use the vlan_ops since the device is in Single VLAN Mode (SVM). Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to ice_vsi_vlan_setup() as the latter function is specific to the PF and all other VSI types that need an untagged VLAN 0 filter already do this in their specific flows. Without this change, Flow Director is failing to initialize because it does not implement any VSI VLAN ops. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add new VSI VLAN opsBrett Creeley1-0/+2
Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and offloads require more flexibility when configuring VLANs. The VSI VLAN interface will allow flexibility for configuring VLANs for all VSI types. Add new files to separate the VSI VLAN ops and move functions to make the code more organized. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-30ice: Add flow director support for channel modeKiran Patil1-1/+6
Add support to enable flow-director filter when multiple TCs are configured. Flow director filter can be configured using ethtool (--config-ntuple option). When multiple TCs are configured, each TC is mapped to an unique HW VSI. So VSI corresponding to queue used in filter is identified and flow director context is updated with correct VSI while configuring ntuple filter in HW. Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15ice: support immediate firmware activation via devlink reloadJacob Keller1-0/+1
The ice hardware contains an embedded chip with firmware which can be updated using devlink flash. The firmware which runs on this chip is referred to as the Embedded Management Processor firmware (EMP firmware). Activating the new firmware image currently requires that the system be rebooted. This is not ideal as rebooting the system can cause unwanted downtime. In practical terms, activating the firmware does not always require a full system reboot. In many cases it is possible to activate the EMP firmware immediately. There are a couple of different scenarios to cover. * The EMP firmware itself can be reloaded by issuing a special update to the device called an Embedded Management Processor reset (EMP reset). This reset causes the device to reset and reload the EMP firmware. * PCI configuration changes are only reloaded after a cold PCIe reset. Unfortunately there is no generic way to trigger this for a PCIe device without a system reboot. When performing a flash update, firmware is capable of responding with some information about the specific update requirements. The driver updates the flash by programming a secondary inactive bank with the contents of the new image, and then issuing a command to request to switch the active bank starting from the next load. The response to the final command for updating the inactive NVM flash bank includes an indication of the minimum reset required to fully update the device. This can be one of the following: * A full power on is required * A cold PCIe reset is required * An EMP reset is required The response to the command to switch flash banks includes an indication of whether or not the firmware will allow an EMP reset request. For most updates, an EMP reset is sufficient to load the new EMP firmware without issues. In some cases, this reset is not sufficient because the PCI configuration space has changed. When this could cause incompatibility with the new EMP image, the firmware is capable of rejecting the EMP reset request. Add logic to ice_fw_update.c to handle the response data flash update AdminQ commands. For the reset level, issue a devlink status notification informing the user of how to complete the update with a simple suggestion like "Activate new firmware by rebooting the system". Cache the status of whether or not firmware will restrict the EMP reset for use in implementing devlink reload. Implement support for devlink reload with the "fw_activate" flag. This allows user space to request the firmware be activated immediately. For the .reload_down handler, we will issue a request for the EMP reset using the appropriate firmware AdminQ command. If we know that the firmware will not allow an EMP reset, simply exit with a suitable netlink extended ACK message indicating that the EMP reset is not available. For the .reload_up handler, simply wait until the driver has finished resetting. Logic to handle processing of an EMP reset already exists in the driver as part of its reset and rebuild flows. Implement support for the devlink reload interface with the "fw_activate" action. This allows userspace to request activation of firmware without a reboot. Note that support for indicating the required reset and EMP reset restriction is not supported on old versions of firmware. The driver can determine if the two features are supported by checking the device capabilities report. I confirmed support has existed since at least version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the EMP reset request has existed in all version of the EMP firmware for the ice hardware. Check the device capabilities report to determine whether or not the indications are reported by the running firmware. If the reset requirement indication is not supported, always assume a full power on is necessary. If the reset restriction capability is not supported, always assume the EMP reset is available. Users can verify if the EMP reset has activated the firmware by using the devlink info report to check that the 'running' firmware version has updated. For example a user might do the following: # Check current version $ devlink dev info # Update the device $ devlink dev flash pci/0000:af:00.0 file firmware.bin # Confirm stored version updated $ devlink dev info # Reload to activate new firmware $ devlink dev reload pci/0000:af:00.0 action fw_activate # Confirm running version updated $ devlink dev info Finally, this change does *not* implement basic driver-only reload support. I did look into trying to do this. However, it requires significant refactor of how the ice driver probes and loads everything. The ice driver probe and allocation flows were not designed with such a reload in mind. Refactoring the flow to support this is beyond the scope of this change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15ice: devlink: add shadow-ram region to snapshot Shadow RAMJacob Keller1-0/+1
We have a region for reading the contents of the NVM flash as a snapshot. This region does not allow reading the Shadow RAM, as it always passes the FLASH_ONLY bit to the low level firmware interface. Add a separate shadow-ram region which will allow snapshot of the current contents of the Shadow RAM. This data is built from the NVM contents but is distinct as the device builds up the Shadow RAM during initialization, so being able to snapshot its contents can be useful when attempting to debug flash related issues. Fix the comment description of the nvm-flash region which incorrectly stated that it filled the shadow-ram region, and add a comment explaining that the nvm-flash region does not actually read the Shadow RAM. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-14ice: Remove string printing for ice_statusTony Nguyen1-1/+0
Remove the ice_stat_str() function which prints the string representation of the ice_status error code. With upcoming changes moving away from ice_status, there will be no need for this function. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>