aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2022-02-15net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()Vladimir Oltean1-1/+5
ocelot_vlan_member_del() will free the struct ocelot_bridge_vlan, so if this is the same as the port's pvid_vlan which we access afterwards, what we're accessing is freed memory. Fix the bug by determining whether to clear ocelot_port->pvid_vlan prior to calling ocelot_vlan_member_del(). Fixes: d4004422f6f9 ("net: mscc: ocelot: track the port pvid using a pointer") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-02-15dpaa2-eth: Initialize mutex used in one step timestamping pathRadu Bulie1-1/+1
1588 Single Step Timestamping code path uses a mutex to enforce atomicity for two events: - update of ptp single step register - transmit ptp event packet Before this patch the mutex was not initialized. This caused unexpected crashes in the Tx function. Fixes: c55211892f463 ("dpaa2-eth: support PTP Sync packet one-step timestamping") Signed-off-by: Radu Bulie <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-02-15dpaa2-switch: fix default return of dpaa2_switch_flower_parse_mirror_keyTom Rix1-1/+3
Clang static analysis reports this representative problem dpaa2-switch-flower.c:616:24: warning: The right operand of '==' is a garbage value tmp->cfg.vlan_id == vlan) { ^ ~~~~ vlan is set in dpaa2_switch_flower_parse_mirror_key(). However this function can return success without setting vlan. So change the default return to -EOPNOTSUPP. Fixes: 0f3faece5808 ("dpaa2-switch: add VLAN based mirroring") Signed-off-by: Tom Rix <[email protected]> Reviewed-by: Ioana Ciornei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-02-14ice: enable parsing IPSEC SPI headers for RSSJesse Brandeburg1-0/+6
The COMMS package can enable the hardware parser to recognize IPSEC frames with ESP header and SPI identifier. If this package is available and configured for loading in /lib/firmware, then the driver will succeed in enabling this protocol type for RSS. This in turn allows the hardware to hash over the SPI and use it to pick a consistent receive queue for the same secure flow. Without this all traffic is steered to the same queue for multiple traffic threads from the same IP address. For that reason this is marked as a fix, as the driver supports the model, but it wasn't enabled. If the package is not available, adding this type will fail, but the failure is ignored on purpose as it has no negative affect. Fixes: c90ed40cefe1 ("ice: Enable writing hardware filtering tables") Signed-off-by: Jesse Brandeburg <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-02-11atl1c: fix tx timeout after link flap on Mikrotik 10/25G NICGatis Peisenieks1-1/+1
If NIC had packets in tx queue at the moment link down event happened, it could result in tx timeout when link got back up. Since device has more than one tx queue we need to reset them accordingly. Fixes: 057f4af2b171 ("atl1c: add 4 RX/TX queue support for Mikrotik 10/25G NIC") Signed-off-by: Gatis Peisenieks <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-10Merge branch '100GbE' of ↵Jakub Kicinski5-16/+53
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-02-10 Dan Carpenter propagates an error in FEC configuration. Jesse fixes TSO offloads of IPIP and SIT frames. Dave adds a dedicated LAG unregister function to resolve a KASAN error and moves auxiliary device re-creation after LAG removal to the service task to avoid issues with RTNL lock. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Avoid RTNL lock when re-creating auxiliary device ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler ice: fix IPIP and SIT TSO offload ice: fix an error code in ice_cfg_phy_fec() ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-10net: mscc: ocelot: fix mutex lock error during ethtool stats readColin Foster1-4/+7
An ongoing workqueue populates the stats buffer. At the same time, a user might query the statistics. While writing to the buffer is mutex-locked, reading from the buffer wasn't. This could lead to buggy reads by ethtool. This patch fixes the former blamed commit, but the bug was introduced in the latter. Signed-off-by: Colin Foster <[email protected]> Fixes: 1e1caa9735f90 ("ocelot: Clean up stats update deferred work") Fixes: a556c76adc052 ("net: mscc: Add initial Ocelot switch support") Reported-by: Vladimir Oltean <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-10ice: Avoid RTNL lock when re-creating auxiliary deviceDave Ertman2-1/+5
If a call to re-create the auxiliary device happens in a context that has already taken the RTNL lock, then the call flow that recreates auxiliary device can hang if there is another attempt to claim the RTNL lock by the auxiliary driver. To avoid this, any call to re-create auxiliary devices that comes from an source that is holding the RTNL lock (e.g. netdev notifier when interface exits a bond) should execute in a separate thread. To accomplish this, add a flag to the PF that will be evaluated in the service task and dealt with there. Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <[email protected]> Reviewed-by: Jonathan Toppins <[email protected]> Tested-by: Gurucharan G <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-02-10ice: Fix KASAN error in LAG NETDEV_UNREGISTER handlerDave Ertman1-6/+28
Currently, the same handler is called for both a NETDEV_BONDING_INFO LAG unlink notification as for a NETDEV_UNREGISTER call. This is causing a problem though, since the netdev_notifier_info passed has a different structure depending on which event is passed. The problem manifests as a call trace from a BUG: KASAN stack-out-of-bounds error. Fix this by creating a handler specific to NETDEV_UNREGISTER that only is passed valid elements in the netdev_notifier_info struct for the NETDEV_UNREGISTER event. Also included is the removal of an unbalanced dev_put on the peer_netdev and related braces. Fixes: 6a8b357278f5 ("ice: Respond to a NETDEV_UNREGISTER event for LAG") Signed-off-by: Dave Ertman <[email protected]> Acked-by: Jonathan Toppins <[email protected]> Tested-by: Sunitha Mekala <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-02-10ice: fix IPIP and SIT TSO offloadJesse Brandeburg2-8/+18
The driver was avoiding offload for IPIP (at least) frames due to parsing the inner header offsets incorrectly when trying to check lengths. This length check works for VXLAN frames but fails on IPIP frames because skb_transport_offset points to the inner header in IPIP frames, which meant the subtraction of transport_header from inner_network_header returns a negative value (-20). With the code before this patch, everything continued to work, but GSO was being used to segment, causing throughputs of 1.5Gb/s per thread. After this patch, throughput is more like 10Gb/s per thread for IPIP traffic. Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version") Signed-off-by: Jesse Brandeburg <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Tested-by: Gurucharan G <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-02-10ice: fix an error code in ice_cfg_phy_fec()Dan Carpenter1-1/+2
Propagate the error code from ice_get_link_default_override() instead of returning success. Fixes: ea78ce4dab05 ("ice: add link lenient and default override support") Signed-off-by: Dan Carpenter <[email protected]> Tested-by: Gurucharan G <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-02-10dpaa2-eth: unregister the netdev before disconnecting from the PHYRobert-Ionut Alexa1-2/+2
The netdev should be unregistered before we are disconnecting from the MAC/PHY so that the dev_close callback is called and the PHY and the phylink workqueues are actually stopped before we are disconnecting and destroying the phylink instance. Fixes: 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink") Signed-off-by: Robert-Ionut Alexa <[email protected]> Signed-off-by: Ioana Ciornei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-02-10net: macb: Align the dma and coherent dma masksMarc St-Amand1-1/+1
Single page and coherent memory blocks can use different DMA masks when the macb accesses physical memory directly. The kernel is clever enough to allocate pages that fit into the requested address width. When using the ARM SMMU, the DMA mask must be the same for single pages and big coherent memory blocks. Otherwise the translation tables turn into one big mess. [ 74.959909] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK [ 74.959989] arm-smmu fd800000.smmu: Unhandled context fault: fsr=0x402, iova=0x3165687460, fsynr=0x20001, cbfrsynra=0x877, cb=1 [ 75.173939] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK [ 75.173955] arm-smmu fd800000.smmu: Unhandled context fault: fsr=0x402, iova=0x3165687460, fsynr=0x20001, cbfrsynra=0x877, cb=1 Since using the same DMA mask does not hurt direct 1:1 physical memory mappings, this commit always aligns DMA and coherent masks. Signed-off-by: Marc St-Amand <[email protected]> Signed-off-by: Harini Katakam <[email protected]> Acked-by: Nicolas Ferre <[email protected]> Tested-by: Conor Dooley <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-02-09net: amd-xgbe: disable interrupts during pci removalRaju Rangoju1-0/+3
Hardware interrupts are enabled during the pci probe, however, they are not disabled during pci removal. Disable all hardware interrupts during pci removal to avoid any issues. Fixes: e75377404726 ("amd-xgbe: Update PCI support to use new IRQ functions") Suggested-by: Selwin Sebastian <[email protected]> Signed-off-by: Raju Rangoju <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-02-08nfp: flower: fix ida_idx not being releasedLouis Peens1-4/+8
When looking for a global mac index the extra NFP_TUN_PRE_TUN_IDX_BIT that gets set if nfp_flower_is_supported_bridge is true is not taken into account. Consequently the path that should release the ida_index in cleanup is never triggered, causing messages like: nfp 0000:02:00.0: nfp: Failed to offload MAC on br-ex. nfp 0000:02:00.0: nfp: Failed to offload MAC on br-ex. nfp 0000:02:00.0: nfp: Failed to offload MAC on br-ex. after NFP_MAX_MAC_INDEX number of reconfigs. Ultimately this lead to new tunnel flows not being offloaded. Fix this by unsetting the NFP_TUN_PRE_TUN_IDX_BIT before checking if the port is of type OTHER. Fixes: 2e0bc7f3cb55 ("nfp: flower: encode mac indexes with pre-tunnel rule check") Signed-off-by: Louis Peens <[email protected]> Signed-off-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-08net: ethernet: litex: Add the dependency on HAS_IOMEMCai Huoqing1-1/+1
The LiteX driver uses devm io function API which needs HAS_IOMEM enabled, so add the dependency on HAS_IOMEM. Fixes: ee7da21ac4c3 ("net: Add driver for LiteX's LiteETH network interface") Signed-off-by: Cai Huoqing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-08ibmvnic: don't release napi in __ibmvnic_open()Sukadev Bhattiprolu1-4/+9
If __ibmvnic_open() encounters an error such as when setting link state, it calls release_resources() which frees the napi structures needlessly. Instead, have __ibmvnic_open() only clean up the work it did so far (i.e. disable napi and irqs) and leave the rest to the callers. If caller of __ibmvnic_open() is ibmvnic_open(), it should release the resources immediately. If the caller is do_reset() or do_hard_reset(), they will release the resources on the next reset. This fixes following crash that occurred when running the drmgr command several times to add/remove a vnic interface: [102056] ibmvnic 30000003 env3: Disabling rx_scrq[6] irq [102056] ibmvnic 30000003 env3: Disabling rx_scrq[7] irq [102056] ibmvnic 30000003 env3: Replenished 8 pools Kernel attempted to read user page (10) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000010 Faulting instruction address: 0xc000000000a3c840 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries ... CPU: 9 PID: 102056 Comm: kworker/9:2 Kdump: loaded Not tainted 5.16.0-rc5-autotest-g6441998e2e37 #1 Workqueue: events_long __ibmvnic_reset [ibmvnic] NIP: c000000000a3c840 LR: c0080000029b5378 CTR: c000000000a3c820 REGS: c0000000548e37e0 TRAP: 0300 Not tainted (5.16.0-rc5-autotest-g6441998e2e37) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28248484 XER: 00000004 CFAR: c0080000029bdd24 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0 GPR00: c0080000029b55d0 c0000000548e3a80 c0000000028f0200 0000000000000000 ... NIP [c000000000a3c840] napi_enable+0x20/0xc0 LR [c0080000029b5378] __ibmvnic_open+0xf0/0x430 [ibmvnic] Call Trace: [c0000000548e3a80] [0000000000000006] 0x6 (unreliable) [c0000000548e3ab0] [c0080000029b55d0] __ibmvnic_open+0x348/0x430 [ibmvnic] [c0000000548e3b40] [c0080000029bcc28] __ibmvnic_reset+0x500/0xdf0 [ibmvnic] [c0000000548e3c60] [c000000000176228] process_one_work+0x288/0x570 [c0000000548e3d00] [c000000000176588] worker_thread+0x78/0x660 [c0000000548e3da0] [c0000000001822f0] kthread+0x1c0/0x1d0 [c0000000548e3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7d2948f8 792307e0 4e800020 60000000 3c4c01eb 384239e0 f821ffd1 39430010 38a0fff6 e92d1100 f9210028 39200000 <e9030010> f9010020 60420000 e9210020 ---[ end trace 5f8033b08fd27706 ]--- Fixes: ed651a10875f ("ibmvnic: Updated reset handling") Reported-by: Abdul Haleem <[email protected]> Signed-off-by: Sukadev Bhattiprolu <[email protected]> Reviewed-by: Dany Madden <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-08gve: Recording rx queue before sending to napiTao Liu1-0/+1
This caused a significant performance degredation when using generic XDP with multiple queues. Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Tao Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-05net: mscc: ocelot: fix all IP traffic getting trapped to CPU with PTP over IPVladimir Oltean1-0/+8
The filters for the PTP trap keys are incorrectly configured, in the sense that is2_entry_set() only looks at trap->key.ipv4.dport or trap->key.ipv6.dport if trap->key.ipv4.proto or trap->key.ipv6.proto is set to IPPROTO_TCP or IPPROTO_UDP. But we don't do that, so is2_entry_set() goes through the "else" branch of the IP protocol check, and ends up installing a rule for "Any IP protocol match" (because msk is also 0). The UDP port is ignored. This means that when we run "ptp4l -i swp0 -4", all IP traffic is trapped to the CPU, which hinders bridging. Fix this by specifying the IP protocol in the VCAP IS2 filters for PTP over UDP. Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-02-04ixgbevf: Require large buffers for build_skb on 82599VFSamuel Mendoza-Jonas1-6/+7
From 4.17 onwards the ixgbevf driver uses build_skb() to build an skb around new data in the page buffer shared with the ixgbe PF. This uses either a 2K or 3K buffer, and offsets the DMA mapping by NET_SKB_PAD + NET_IP_ALIGN. When using a smaller buffer RXDCTL is set to ensure the PF does not write a full 2K bytes into the buffer, which is actually 2K minus the offset. However on the 82599 virtual function, the RXDCTL mechanism is not available. The driver attempts to work around this by using the SET_LPE mailbox method to lower the maximm frame size, but the ixgbe PF driver ignores this in order to keep the PF and all VFs in sync[0]. This means the PF will write up to the full 2K set in SRRCTL, causing it to write NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the buffer. With 4K pages split into two buffers, this means it either writes NET_SKB_PAD + NET_IP_ALIGN bytes past the first buffer (and into the second), or NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the DMA mapping. Avoid this by only enabling build_skb when using "large" buffers (3K). These are placed in each half of an order-1 page, preventing the PF from writing past the end of the mapping. [0]: Technically it only ever raises the max frame size, see ixgbe_set_vf_lpe() in ixgbe_sriov.c Fixes: f15c5ba5b6cd ("ixgbevf: add support for using order 1 pages to receive large frames") Signed-off-by: Samuel Mendoza-Jonas <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-02-03net: sparx5: Fix get_stat64 crash in tcpdumpSteen Hegelund1-1/+1
This problem was found with Sparx5 when the tcpdump tool requests the do_get_stats64 (sparx5_get_stats64) statistic. The portstats pointer was incorrectly incremented when fetching priority based statistics. Fixes: af4b11022e2d (net: sparx5: add ethtool configuration and statistics support) Signed-off-by: Steen Hegelund <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-03net: stmmac: ensure PTP time register reads are consistentYannick Vignon1-7/+12
Even if protected from preemption and interrupts, a small time window remains when the 2 register reads could return inconsistent values, each time the "seconds" register changes. This could lead to an about 1-second error in the reported time. Add logic to ensure the "seconds" and "nanoseconds" values are consistent. Fixes: 92ba6888510c ("stmmac: add the support for PTP hw clock driver") Signed-off-by: Yannick Vignon <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-02net: sparx5: do not refer to skb after passing it onSteen Hegelund1-1/+1
Do not try to use any SKB fields after the packet has been passed up in the receive stack. Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Steen Hegelund <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-02Merge tag 'mlx5-fixes-2022-02-01' of ↵David S. Miller16-53/+98
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2022-02-01 This series provides bug fixes to mlx5 driver. Please pull and let me know if there is any problem. Sorry about the long series, but I had to move the top two patches from net-next to net to help avoiding a build break when kspp branch is merged into linus-next on next merge window. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-02-01Merge branch '1GbE' of ↵Jakub Kicinski3-19/+44
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-02-01 This series contains updates to e1000e driver only. Sasha removes CSME handshake with TGL platform as this is not supported and is causing hardware unit hangs to be reported. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: e1000e: Handshake with CSME starts from ADL platforms e1000e: Separate ADP board type from TGP ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-01net/mlx5e: Avoid field-overflowing memcpy()Kees Cook2-4/+6
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. Use flexible arrays instead of zero-element arrays (which look like they are always overflowing) and split the cross-field memcpy() into two halves that can be appropriately bounds-checked by the compiler. We were doing: #define ETH_HLEN 14 #define VLAN_HLEN 4 ... #define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN) ... struct mlx5e_tx_wqe *wqe = mlx5_wq_cyc_get_wqe(wq, pi); ... struct mlx5_wqe_eth_seg *eseg = &wqe->eth; struct mlx5_wqe_data_seg *dseg = wqe->data; ... memcpy(eseg->inline_hdr.start, xdptxd->data, MLX5E_XDP_MIN_INLINE); target is wqe->eth.inline_hdr.start (which the compiler sees as being 2 bytes in size), but copying 18, intending to write across start (really vlan_tci, 2 bytes). The remaining 16 bytes get written into wqe->data[0], covering byte_count (4 bytes), lkey (4 bytes), and addr (8 bytes). struct mlx5e_tx_wqe { struct mlx5_wqe_ctrl_seg ctrl; /* 0 16 */ struct mlx5_wqe_eth_seg eth; /* 16 16 */ struct mlx5_wqe_data_seg data[]; /* 32 0 */ /* size: 32, cachelines: 1, members: 3 */ /* last cacheline: 32 bytes */ }; struct mlx5_wqe_eth_seg { u8 swp_outer_l4_offset; /* 0 1 */ u8 swp_outer_l3_offset; /* 1 1 */ u8 swp_inner_l4_offset; /* 2 1 */ u8 swp_inner_l3_offset; /* 3 1 */ u8 cs_flags; /* 4 1 */ u8 swp_flags; /* 5 1 */ __be16 mss; /* 6 2 */ __be32 flow_table_metadata; /* 8 4 */ union { struct { __be16 sz; /* 12 2 */ u8 start[2]; /* 14 2 */ } inline_hdr; /* 12 4 */ struct { __be16 type; /* 12 2 */ __be16 vlan_tci; /* 14 2 */ } insert; /* 12 4 */ __be32 trailer; /* 12 4 */ }; /* 12 4 */ /* size: 16, cachelines: 1, members: 9 */ /* last cacheline: 16 bytes */ }; struct mlx5_wqe_data_seg { __be32 byte_count; /* 0 4 */ __be32 lkey; /* 4 4 */ __be64 addr; /* 8 8 */ /* size: 16, cachelines: 1, members: 3 */ /* last cacheline: 16 bytes */ }; So, split the memcpy() so the compiler can reason about the buffer sizes. "pahole" shows no size nor member offset changes to struct mlx5e_tx_wqe nor struct mlx5e_umr_wqe. "objdump -d" shows no meaningful object code changes (i.e. only source line number induced differences and optimizations). Fixes: b5503b994ed5 ("net/mlx5e: XDP TX forwarding support") Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: Use struct_group() for memcpy() regionKees Cook1-1/+1
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. Use struct_group() in struct vlan_ethhdr around members h_dest and h_source, so they can be referenced together. This will allow memcpy() and sizeof() to more easily reason about sizes, improve readability, and avoid future warnings about writing beyond the end of h_dest. "pahole" shows no size nor member offset changes to struct vlan_ethhdr. "objdump -d" shows no object code changes. Fixes: 34802a42b352 ("net/mlx5e: Do not modify the TX SKB") Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: Avoid implicit modify hdr for decap drop ruleRoi Dayan1-1/+2
Currently the driver adds implicit modify hdr action for decap rules on tunnel devices if the port is an ovs port. This is also done if the action is drop and makes the modify hdr redundant and also the FW doesn't support it and will generate a syndrome. kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3) Fix it by adding the implicit modify hdr only for fwd actions. Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device") Fixes: 077cdda764c7 ("net/mlx5e: TC, Fix memory leak with rules with internal port") Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Ariel Levkovich <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP trafficRaed Salem1-2/+11
IPsec Tunnel mode crypto offload software parser (SWP) setting in data path currently always set the inner L4 offset regardless of the encapsulated L4 header type and whether it exists in the first place, this breaks non TCP/UDP traffic as such. Set the SWP inner L4 offset only when the IPsec tunnel encapsulated L4 header protocol is TCP/UDP. While at it fix inner ip protocol read for setting MLX5_ETH_WQE_SWP_INNER_L4_UDP flag to address the case where the ip header protocol is IPv6. Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload") Signed-off-by: Raed Salem <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated trafficRaed Salem1-3/+6
IPsec crypto offload always set the ethernet segment checksum flags with the inner L4 header checksum flag enabled for encapsulated IPsec offloaded packet regardless of the encapsulated L4 header type, and even if it doesn't exists in the first place, this breaks non TCP/UDP traffic as such. Set the inner L4 checksum flag only when the encapsulated L4 header protocol is TCP/UDP using software parser swp_inner_l4_offset field as indication. Fixes: 5cfb540ef27b ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.") Signed-off-by: Raed Salem <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: Don't treat small ceil values as unlimited in HTB offloadMaxim Mikityanskiy1-1/+2
The hardware spec defines max_average_bw == 0 as "unlimited bandwidth". max_average_bw is calculated as `ceil / BYTES_IN_MBIT`, which can become 0 when ceil is small, leading to an undesired effect of having no bandwidth limit. This commit fixes it by rounding up small values of ceil to 1 Mbit/s. Fixes: 214baf22870c ("net/mlx5e: Support HTB offload") Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5: E-Switch, Fix uninitialized variable modactMaor Dickman1-1/+1
The variable modact is not initialized before used in command modify header allocation which can cause command to fail. Fix by initializing modact with zeros. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 8f1e0b97cc70 ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping") Signed-off-by: Maor Dickman <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: Fix handling of wrong devices during bond neteventMaor Dickman1-18/+14
Current implementation of bond netevent handler only check if the handled netdev is VF representor and it missing a check if the VF representor is on the same phys device of the bond handling the netevent. Fix by adding the missing check and optimizing the check if the netdev is VF representor so it will not access uninitialized private data and crashes. BUG: kernel NULL pointer dereference, address: 000000000000036c PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI Workqueue: eth3bond0 bond_mii_monitor [bonding] RIP: 0010:mlx5e_is_uplink_rep+0xc/0x50 [mlx5_core] RSP: 0018:ffff88812d69fd60 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8881cf800000 RCX: 0000000000000000 RDX: ffff88812d69fe10 RSI: 000000000000001b RDI: ffff8881cf800880 RBP: ffff8881cf800000 R08: 00000445cabccf2b R09: 0000000000000008 R10: 0000000000000004 R11: 0000000000000008 R12: ffff88812d69fe10 R13: 00000000fffffffe R14: ffff88820c0f9000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88846fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000036c CR3: 0000000103d80006 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mlx5e_eswitch_uplink_rep+0x31/0x40 [mlx5_core] mlx5e_rep_is_lag_netdev+0x94/0xc0 [mlx5_core] mlx5e_rep_esw_bond_netevent+0xeb/0x3d0 [mlx5_core] raw_notifier_call_chain+0x41/0x60 call_netdevice_notifiers_info+0x34/0x80 netdev_lower_state_changed+0x4e/0xa0 bond_mii_monitor+0x56b/0x640 [bonding] process_one_work+0x1b9/0x390 worker_thread+0x4d/0x3d0 ? rescuer_thread+0x350/0x350 kthread+0x124/0x150 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 Fixes: 7e51891a237f ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule") Signed-off-by: Maor Dickman <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: Fix broken SKB allocation in HW-GROKhalid Manaa1-9/+17
In case the HW doesn't perform header-data split, it will write the whole packet into the data buffer in the WQ, in this case the SHAMPO CQE handler couldn't use the header entry to build the SKB, instead it should allocate a new memory to build the SKB using the function: mlx5e_skb_from_cqe_mpwrq_nonlinear. Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support") Signed-off-by: Khalid Manaa <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: Fix wrong calculation of header index in HW_GROKhalid Manaa2-2/+7
The HW doesn't wrap the CQE.shampo.header_index field according to the headers buffer size, instead it always increases it until reaching overflow of u16 size. Thus the mlx5e_handle_rx_cqe_mpwrq_shampo handler should mask the CQE header_index field to find the actual header index in the headers buffer. Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support") Signed-off-by: Khalid Manaa <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5: Bridge, Fix devlink deadlock on net namespace deletionRoi Dayan1-2/+2
When changing mode to switchdev, rep bridge init registered to netdevice notifier holds the devlink lock and then takes pernet_ops_rwsem. At that time deleting a netns holds pernet_ops_rwsem and then takes the devlink lock. Example sequence is: $ ip netns add foo $ devlink dev eswitch set pci/0000:00:08.0 mode switchdev & $ ip netns del foo deleting netns trace: [ 1185.365555] ? devlink_pernet_pre_exit+0x74/0x1c0 [ 1185.368331] ? mutex_lock_io_nested+0x13f0/0x13f0 [ 1185.370984] ? xt_find_table+0x40/0x100 [ 1185.373244] ? __mutex_lock+0x24a/0x15a0 [ 1185.375494] ? net_generic+0xa0/0x1c0 [ 1185.376844] ? wait_for_completion_io+0x280/0x280 [ 1185.377767] ? devlink_pernet_pre_exit+0x74/0x1c0 [ 1185.378686] devlink_pernet_pre_exit+0x74/0x1c0 [ 1185.379579] ? devlink_nl_cmd_get_dumpit+0x3a0/0x3a0 [ 1185.380557] ? xt_find_table+0xda/0x100 [ 1185.381367] cleanup_net+0x372/0x8e0 changing mode to switchdev trace: [ 1185.411267] down_write+0x13a/0x150 [ 1185.412029] ? down_write_killable+0x180/0x180 [ 1185.413005] register_netdevice_notifier+0x1e/0x210 [ 1185.414000] mlx5e_rep_bridge_init+0x181/0x360 [mlx5_core] [ 1185.415243] mlx5e_uplink_rep_enable+0x269/0x480 [mlx5_core] [ 1185.416464] ? mlx5e_uplink_rep_disable+0x210/0x210 [mlx5_core] [ 1185.417749] mlx5e_attach_netdev+0x232/0x400 [mlx5_core] [ 1185.418906] mlx5e_netdev_attach_profile+0x15b/0x1e0 [mlx5_core] [ 1185.420172] mlx5e_netdev_change_profile+0x15a/0x1d0 [mlx5_core] [ 1185.421459] mlx5e_vport_rep_load+0x557/0x780 [mlx5_core] [ 1185.422624] ? mlx5e_stats_grp_vport_rep_num_stats+0x10/0x10 [mlx5_core] [ 1185.424006] mlx5_esw_offloads_rep_load+0xdb/0x190 [mlx5_core] [ 1185.425277] esw_offloads_enable+0xd74/0x14a0 [mlx5_core] Fix this by registering rep bridges for per net netdev notifier instead of global one, which operats on the net namespace without holding the pernet_ops_rwsem. Fixes: 19e9bfa044f3 ("net/mlx5: Bridge, add offload infrastructure") Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLEDima Chumak1-3/+4
Only prio 1 is supported for nic mode when there is no ignore flow level support in firmware. But for switchdev mode, which supports fixed number of statically pre-allocated prios, this restriction is not relevant so it can be relaxed. Fixes: d671e109bd85 ("net/mlx5: Fix tc max supported prio for nic mode") Signed-off-by: Dima Chumak <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: TC, Reject rules with forward and drop actionsRoi Dayan1-0/+6
Such rules are redundant but allowed and passed to the driver. The driver does not support offloading such rules so return an error. Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads") Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5: Use del_timer_sync in fw reset flow of halting pollMaher Sanalla1-1/+1
Substitute del_timer() with del_timer_sync() in fw reset polling deactivation flow, in order to prevent a race condition which occurs when del_timer() is called and timer is deactivated while another process is handling the timer interrupt. A situation that led to the following call trace: RIP: 0010:run_timer_softirq+0x137/0x420 <IRQ> recalibrate_cpu_khz+0x10/0x10 ktime_get+0x3e/0xa0 ? sched_clock_cpu+0xb/0xc0 __do_softirq+0xf5/0x2ea irq_exit_rcu+0xc1/0xf0 sysvec_apic_timer_interrupt+0x9e/0xc0 asm_sysvec_apic_timer_interrupt+0x12/0x20 </IRQ> Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") Signed-off-by: Maher Sanalla <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: Fix module EEPROM queryGal Pressman1-4/+5
When querying the module EEPROM, there was a misusage of the 'offset' variable vs the 'query.offset' field. Fix that by always using 'offset' and assigning its value to 'query.offset' right before the mcia register read call. While at it, the cross-pages read size adjustment was changed to be more intuitive. Fixes: e19b0a3474ab ("net/mlx5: Refactor module EEPROM query") Reported-by: Wang Yugui <[email protected]> Signed-off-by: Gal Pressman <[email protected]> Reviewed-by: Maxim Mikityanskiy <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5e: TC, Reject rules with drop and modify hdr actionRoi Dayan1-0/+6
This kind of action is not supported by firmware and generates a syndrome. kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3) Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions") Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5: Bridge, ensure dev_name is null-terminatedVlad Buslov1-1/+1
Even though net_device->name is guaranteed to be null-terminated string of size<=IFNAMSIZ, the test robot complains that return value of netdev_name() can be larger: In file included from include/trace/define_trace.h:102, from drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h:113, from drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c:12: drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h: In function 'trace_event_raw_event_mlx5_esw_bridge_fdb_template': >> drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h:24:29: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 20 [-Wstringop-truncation] 24 | strncpy(__entry->dev_name, | ^~~~~~~~~~~~~~~~~~~~~~~~~~ 25 | netdev_name(fdb->dev), | ~~~~~~~~~~~~~~~~~~~~~~ 26 | IFNAMSIZ); | ~~~~~~~~~ This is caused by the fact that default value of IFNAMSIZ is 16, while placeholder value that is returned by netdev_name() for unnamed net devices is larger than that. The offending code is in a tracing function that is only called for mlx5 representors, so there is no straightforward way to reproduce the issue but let's fix it for correctness sake by replacing strncpy() with strscpy() to ensure that resulting string is always null-terminated. Fixes: 9724fd5d9c2a ("net/mlx5: Bridge, add tracepoints") Reported-by: kernel test robot <[email protected]> Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01net/mlx5: Bridge, take rtnl lock in init error handlerVlad Buslov2-0/+6
The mlx5_esw_bridge_cleanup() is expected to be called with rtnl lock taken, which is true for mlx5e_rep_bridge_cleanup() function but not for error handling code in mlx5e_rep_bridge_init(). Add missing rtnl lock/unlock calls and extend both mlx5_esw_bridge_cleanup() and its dual function mlx5_esw_bridge_init() with ASSERT_RTNL() to verify the invariant from now on. Fixes: 7cd6a54a8285 ("net/mlx5: Bridge, handle FDB events") Fixes: 19e9bfa044f3 ("net/mlx5: Bridge, add offload infrastructure") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-02-01Merge branch '40GbE' of ↵Jakub Kicinski2-2/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-01-31 This series contains updates to i40e driver only. Jedrzej fixes a condition check which would cause an error when resetting bandwidth when DCB is active with one TC. Karen resolves a null pointer dereference that could occur when removing the driver while VSI rings are being disabled. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: Fix reset path while removing the driver i40e: Fix reset bw limit when DCB enabled with 1 TC ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-01ethernet: smc911x: fix indentation in get/set EEPROMJakub Kicinski1-4/+4
Build bot produced a smatch indentation warning, the code looks correct but it mixes spaces and tabs. Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-02-01e1000e: Handshake with CSME starts from ADL platformsSasha Neftin1-2/+4
Handshake with CSME/AMT on none provisioned platforms during S0ix flow is not supported on TGL platform and can cause to HW unit hang. Update the handshake with CSME flow to start from the ADL platform. Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix") Signed-off-by: Sasha Neftin <[email protected]> Tested-by: Nechama Kraus <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-02-01e1000e: Separate ADP board type from TGPSasha Neftin3-17/+40
We have the same LAN controller on different PCH's. Separate ADP board type from a TGP which will allow for specific fixes to be applied for ADP platforms. Suggested-by: Kai-Heng Feng <[email protected]> Suggested-by: Dima Ruinskiy <[email protected]> Signed-off-by: Sasha Neftin <[email protected]> Tested-by: Nechama Kraus <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-01-31net: stmmac: dump gmac4 DMA registers correctlyCamel Guo2-2/+18
Unlike gmac100, gmac1000, gmac4 has 27 DMA registers and they are located at DMA_CHAN_BASE_ADDR (0x1100). In order for ethtool to dump gmac4 DMA registers correctly, this commit checks if a net_device has gmac4 and uses different logic to dump its DMA registers. This fixes the following KASAN warning, which can normally be triggered by a command similar like "ethtool -d eth0": BUG: KASAN: vmalloc-out-of-bounds in dwmac4_dump_dma_regs+0x6d4/0xb30 Write of size 4 at addr ffffffc010177100 by task ethtool/1839 kasan_report+0x200/0x21c __asan_report_store4_noabort+0x34/0x60 dwmac4_dump_dma_regs+0x6d4/0xb30 stmmac_ethtool_gregs+0x110/0x204 ethtool_get_regs+0x200/0x4b0 dev_ethtool+0x1dac/0x3800 dev_ioctl+0x7c0/0xb50 sock_ioctl+0x298/0x6c4 ... Fixes: fbf68229ffe7 ("net: stmmac: unify registers dumps methods") Signed-off-by: Camel Guo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-01-31i40e: Fix reset path while removing the driverKaren Sornek2-1/+19
Fix the crash in kernel while dereferencing the NULL pointer, when the driver is unloaded and simultaneously the VSI rings are being stopped. The hardware requires 50msec in order to finish RX queues disable. For this purpose the driver spins in mdelay function for the operation to be completed. For example changing number of queues which requires reset would fail in the following call stack: 1) i40e_prep_for_reset 2) i40e_pf_quiesce_all_vsi 3) i40e_quiesce_vsi 4) i40e_vsi_close 5) i40e_down 6) i40e_vsi_stop_rings 7) i40e_vsi_control_rx -> disable requires the delay of 50msecs 8) continue back in i40e_down function where i40e_clean_tx_ring(vsi->tx_rings[i]) is going to crash When the driver was spinning vsi_release called i40e_vsi_free_arrays where the vsi->tx_rings resources were freed and the pointer was set to NULL. Fixes: 5b6d4a7f20b0 ("i40e: Fix crash during removing i40e driver") Signed-off-by: Slawomir Laba <[email protected]> Signed-off-by: Sylwester Dziedziuch <[email protected]> Signed-off-by: Karen Sornek <[email protected]> Tested-by: Gurucharan G <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-01-31i40e: Fix reset bw limit when DCB enabled with 1 TCJedrzej Jagielski1-1/+11
There was an AQ error I40E_AQ_RC_EINVAL when trying to reset bw limit as part of bw allocation setup. This was caused by trying to reset bw limit with DCB enabled. Bw limit should not be reset when DCB is enabled. The code was relying on the pf->flags to check if DCB is enabled but if only 1 TC is available this flag will not be set even though DCB is enabled. Add a check for number of TC and if it is 1 don't try to reset bw limit even if pf->flags shows DCB as disabled. Fixes: fa38e30ac73f ("i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled") Suggested-by: Alexander Lobakin <[email protected]> # Flatten the condition Signed-off-by: Sylwester Dziedziuch <[email protected]> Signed-off-by: Jedrzej Jagielski <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Tested-by: Imam Hassan Reza Biswas <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>