aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-03-12Merge branch '100GbE' of ↵David S. Miller5-1/+63
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: GTP support in switchdev Marcin Szycik says: Add support for adding GTP-C and GTP-U filters in switchdev mode. To create a filter for GTP, create a GTP-type netdev with ip tool, enable hardware offload, add qdisc and add a filter in tc: ip link add $GTP0 type gtp role <sgsn/ggsn> hsize <hsize> ethtool -K $PF0 hw-tc-offload on tc qdisc add dev $GTP0 ingress tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \ action mirred egress redirect dev $VF1_PR By default, a filter for GTP-U will be added. To add a filter for GTP-C, specify enc_dst_port = 2123, e.g.: tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \ enc_dst_port 2123 action mirred egress redirect dev $VF1_PR Note: outer IPv6 offload is not supported yet. Note: GTP-U with no payload offload is not supported yet. ICE COMMS package is required to create a filter as it contains GTP profiles. Changes in iproute2 [1] are required to be able to add GTP netdev and use GTP-specific options (QFI and PDU type). [1] https://lore.kernel.org/netdev/20220211182902.11542-1-wojciech.drewek@intel.com/T --- v2: Add more CC v3: Fix mail thread, sorry for spam v4: Add GTP echo response in gtp module v5: Change patch order v6: Add GTP echo request in gtp module v7: Fix kernel-docs in ice v8: Remove handling of GTP Echo Response v9: Add sending of multicast message on GTP Echo Response, fix GTP-C dummy packet selection v10: Rebase, fixed most 80 char line limits v11: Rebase, collect Harald's Reviewed-by on patch 3 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-11net: add per-cpu storage and net->core_statsEric Dumazet2-10/+38
Before adding yet another possibly contended atomic_long_t, it is time to add per-cpu storage for existing ones: dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler Because many devices do not have to increment such counters, allocate the per-cpu storage on demand, so that dev_get_stats() does not have to spend considerable time folding zero counters. Note that some drivers have abused these counters which were supposed to be only used by core networking stack. v4: should use per_cpu_ptr() in dev_get_stats() (Jakub) v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo) v2: add a missing include (reported by kernel test robot <lkp@intel.com>) Change in netdev_core_stats_alloc() (Jakub) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: jeffreyji <jeffreyji@google.com> Reviewed-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11nfp: add support for NFP3800/NFP3803 PCIe devicesDirk van der Merwe1-0/+2
Enable binding the nfp driver to NFP3800 and NFP3803 devices. The PCIE_SRAM offset is different for the NFP3800 device, which also only supports a single explicit group. Changes to Dirk's work: * 48-bit dma addressing is not ready yet. Keep 40-bit dma addressing for NFP3800. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Fei Qin <fei.qin@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11Merge tag 'wireless-next-2022-03-11' of ↵Jakub Kicinski4-9/+530
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== brcmfmac * add BCM43454/6 support rtw89 * add support for 160 MHz channels and 6 GHz band * hardware scan support iwlwifi * support UHB TAS enablement via BIOS * remove a bunch of W=1 warnings * add support for channel switch offload * support 32 Rx AMPDU sessions in newer devices * add support for a couple of new devices * add support for band disablement via BIOS mt76 * mt7915 thermal management improvements * SAR support for more mt76 drivers * mt7986 wmac support on mt7915 ath11k * debugfs interface to configure firmware debug log level * debugfs interface to test Target Wake Time (TWT) * provide 802.11ax High Efficiency (HE) data via radiotap ath9k * use hw_random API instead of directly dumping into random.c wcn36xx * fix wcn3660 to work on 5 GHz band ath6kl * add device ID for WLU5150-D81 cfg80211/mac80211 * initial EHT (from 802.11be) support (EHT rates, 320 MHz, larger block-ack) * support disconnect on HW restart * tag 'wireless-next-2022-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (247 commits) mac80211: Add support to trigger sta disconnect on hardware restart mac80211: fix potential double free on mesh join mac80211: correct legacy rates check in ieee80211_calc_rx_airtime nl80211: fix typo of NL80211_IF_TYPE_OCB in documentation mac80211: Use GFP_KERNEL instead of GFP_ATOMIC when possible mac80211: replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE rtw89: 8852c: process logic efuse map rtw89: 8852c: process efuse of phycap rtw89: support DAV efuse reading operation rtw89: 8852c: add chip::dle_mem rtw89: add page_regs to handle v1 chips rtw89: add chip_info::{h2c,c2h}_reg to support more chips rtw89: add hci_func_en_addr to support variant generation rtw89: add power_{on/off}_func rtw89: read chip version depends on chip ID rtw89: pci: use a struct to describe all registers address related to DMA channel rtw89: pci: add V1 of PCI channel address rtw89: pci: add struct rtw89_pci_info rtw89: 8852c: add 8852c empty files MAINTAINERS: add devicetree bindings entry for mt76 ... ==================== Link: https://lore.kernel.org/r/20220311124029.213470-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11gtp: Add support for checking GTP device typeWojciech Drewek1-0/+6
Add a function that checks if a net device type is GTP. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11net/sched: Allow flower to match on GTP optionsWojciech Drewek3-1/+23
Options are as follows: PDU_TYPE:QFI and they refernce to the fields from the PDU Session Protocol. PDU Session data is conveyed in GTP-U Extension Header. GTP-U Extension Header is described in 3GPP TS 29.281. PDU Session Protocol is described in 3GPP TS 38.415. PDU_TYPE - indicates the type of the PDU Session Information (4 bits) QFI - QoS Flow Identifier (6 bits) # ip link add gtp_dev type gtp role sgsn # tc qdisc add dev gtp_dev ingress # tc filter add dev gtp_dev protocol ip parent ffff: \ flower \ enc_key_id 11 \ gtp_opts 1:8/ff:ff \ action mirred egress redirect dev eth0 Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11gtp: Implement GTP echo requestWojciech Drewek1-0/+1
Adding GTP device through ip link creates the situation where GTP instance is not able to send GTP echo requests. Echo requests are used to check if GTP peer is still alive. With this patch, gtp_genl_ops are extended by new cmd (GTP_CMD_ECHOREQ) which allows to send echo request in the given version of GTP protocol (v0 or v1), from the given ms address to he given peer. TID is not inclued because in all path management messages it should be equal to 0. When GTP echo response is detected, multicast message is send to everyone in the gtp_genl_family. Message contains GTP version, ms address and peer address. Suggested-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11gtp: Implement GTP echo responseWojciech Drewek2-0/+32
Adding GTP device through ip link creates the situation where there is no userspace daemon which would handle GTP messages (Echo Request for example). GTP-U instance which would not respond to echo requests would violate GTP specification. When GTP packet arrives with GTP_ECHO_REQ message type, GTP_ECHO_RSP is send to the sender. GTP_ECHO_RSP message should contain information element with GTPIE_RECOVERY tag and restart counter value. For GTPv1 restart counter is not used and should be equal to 0, for GTPv0 restart counter contains information provided from userspace(IFLA_GTP_RESTART_COUNT). Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Suggested-by: Harald Welte <laforge@gnumonks.org> Reviewed-by: Harald Welte <laforge@gnumonks.org> Tested-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11gtp: Allow to create GTP device without FDsWojciech Drewek1-0/+1
Currently, when the user wants to create GTP device, he has to provide file handles to the sockets created in userspace (IFLA_GTP_FD0, IFLA_GTP_FD1). This behaviour is not ideal, considering the option of adding support for GTP device creation through ip link. Ip link application is not a good place to create such sockets. This patch allows to create GTP device without providing IFLA_GTP_FD0 and IFLA_GTP_FD1 arguments. If the user sets IFLA_GTP_CREATE_SOCKETS attribute, then GTP module takes care of creating UDP sockets by itself. Sockets are created with the commonly known UDP ports used for GTP protocol (GTP0_PORT and GTP1U_PORT). In this case we don't have to provide encap_destroy because no extra deinitialization is needed, everything is covered by udp_tunnel_sock_release. Note: GTP instance created with only this change applied, does not handle GTP Echo Requests. This is implemented in the following patch. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11mac80211: Add support to trigger sta disconnect on hardware restartYoughandhar Chintala1-0/+10
Currently in case of target hardware restart, we just reconfig and re-enable the security keys and enable the network queues to start data traffic back from where it was interrupted. Many ath10k wifi chipsets have sequence numbers for the data packets assigned by firmware and the mac sequence number will restart from zero after target hardware restart leading to mismatch in the sequence number expected by the remote peer vs the sequence number of the frame sent by the target firmware. This mismatch in sequence number will cause out-of-order packets on the remote peer and all the frames sent by the device are dropped until we reach the sequence number which was sent before we restarted the target hardware In order to fix this, we trigger a sta disconnect, in case of target hw restart. After this there will be a fresh connection and thereby avoiding the dropping of frames by remote peer. The right fix would be to pull the entire data path into the host which is not feasible or would need lots of complex changes and will still be inefficient. Tested on ath10k using WCN3990, QCA6174 Signed-off-by: Youghandhar Chintala <youghand@codeaurora.org> Link: https://lore.kernel.org/r/20220308115325.5246-2-youghand@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-11powerpc/net: Implement powerpc specific csum_shift() to remove branchChristophe Leroy1-0/+2
Today's implementation of csum_shift() leads to branching based on parity of 'offset' 000002f8 <csum_block_add>: 2f8: 70 a5 00 01 andi. r5,r5,1 2fc: 41 a2 00 08 beq 304 <csum_block_add+0xc> 300: 54 84 c0 3e rotlwi r4,r4,24 304: 7c 63 20 14 addc r3,r3,r4 308: 7c 63 01 94 addze r3,r3 30c: 4e 80 00 20 blr Use first bit of 'offset' directly as input of the rotation instead of branching. 000002f8 <csum_block_add>: 2f8: 54 a5 1f 38 rlwinm r5,r5,3,28,28 2fc: 20 a5 00 20 subfic r5,r5,32 300: 5c 84 28 3e rotlw r4,r4,r5 304: 7c 63 20 14 addc r3,r3,r4 308: 7c 63 01 94 addze r3,r3 30c: 4e 80 00 20 blr And change to left shift instead of right shift to skip one more instruction. This has no impact on the final sum. 000002f8 <csum_block_add>: 2f8: 54 a5 1f 38 rlwinm r5,r5,3,28,28 2fc: 5c 84 28 3e rotlw r4,r4,r5 300: 7c 63 20 14 addc r3,r3,r4 304: 7c 63 01 94 addze r3,r3 308: 4e 80 00 20 blr Seems like only powerpc benefits from a branchless implementation. Other main architectures like ARM or X86 get better code with the generic implementation and its branch. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-11nl80211: fix typo of NL80211_IF_TYPE_OCB in documentationVeerendranath Jakkam1-1/+1
It should be NL80211_IFTYPE_OCB instead. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/1645542399-4680-1-git-send-email-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-10net/mlx5: Parse module mapping using mlx5_ifcGal Pressman2-5/+4
The assumption that the first byte in the module mapping dword is the module number shouldn't be hard-coded in the driver, but come from mlx5_ifc structs. While at it, fix the incorrect width for the 'rx_lane' and 'tx_lane' fields. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-10net/mlx5: Query the maximum MCIA register read size from firmwareGal Pressman2-2/+3
The MCIA register supports either 12 or 32 dwords, use the correct value by querying the capability from the MCAM register. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-10net: openvswitch: fix uAPI incompatibility with existing user spaceIlya Maximets1-4/+14
Few years ago OVS user space made a strange choice in the commit [1] to define types only valid for the user space inside the copy of a kernel uAPI header. '#ifndef __KERNEL__' and another attribute was added later. This leads to the inevitable clash between user space and kernel types when the kernel uAPI is extended. The issue was unveiled with the addition of a new type for IPv6 extension header in kernel uAPI. When kernel provides the OVS_KEY_ATTR_IPV6_EXTHDRS attribute to the older user space application, application tries to parse it as OVS_KEY_ATTR_PACKET_TYPE and discards the whole netlink message as malformed. Since OVS_KEY_ATTR_IPV6_EXTHDRS is supplied along with every IPv6 packet that goes to the user space, IPv6 support is fully broken. Fixing that by bringing these user space attributes to the kernel uAPI to avoid the clash. Strictly speaking this is not the problem of the kernel uAPI, but changing it is the only way to avoid breakage of the older user space applications at this point. These 2 types are explicitly rejected now since they should not be passed to the kernel. Additionally, OVS_KEY_ATTR_TUNNEL_INFO moved out from the '#ifdef __KERNEL__' as there is no good reason to hide it from the userspace. And it's also explicitly rejected now, because it's for in-kernel use only. Comments with warnings were added to avoid the problem coming back. (1 << type) converted to (1ULL << type) to avoid integer overflow on OVS_KEY_ATTR_IPV6_EXTHDRS, since it equals 32 now. [1] beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") Fixes: 28a3f0601727 ("net: openvswitch: IPv6: Add IPv6 extension header support") Link: https://lore.kernel.org/netdev/3adf00c7-fe65-3ef4-b6d7-6d8a0cad8a5f@nvidia.com Link: https://github.com/openvswitch/ovs/commit/beb75a40fdc295bfd6521b0068b4cd12f6de507c Reported-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20220309222033.3018976-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10Merge tag 'linux-can-next-for-5.18-20220310' of ↵Jakub Kicinski1-6/+22
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-03-10 The first 3 patches are by Oliver Hartkopp, target the CAN ISOTP protocol and update the CAN frame sending behavior, and increases the max PDU size to 64 kByte. The next 2 patches are also by Oliver Hartkopp and update the virtual VXCAN driver so that CAN frames send into the peer name space show up as RX'ed CAN frames. Vincent Mailhol contributes a patch for the etas_es58x driver to fix a false positive dereference uninitialized variable warning. 2 patches by Ulrich Hecht add r8a779a0 SoC support to the rcar_canfd driver. The remaining 21 patches target the gs_usb driver and are by Peter Fink, Ben Evans, Eric Evenchick and me. This series cleans up the gs-usb driver, documents some bits of the USB ABI used by the widely used open source firmware candleLight, adds support for up to 3 CAN interfaces per USB device, adds CAN-FD support, adds quirks for some hardware and software workarounds and finally adds support for 2 new devices. * tag 'linux-can-next-for-5.18-20220310' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (29 commits) can: gs_usb: add VID/PID for ABE CAN Debugger devices can: gs_usb: add VID/PID for CES CANext FD devices can: gs_usb: add extended bt_const feature can: gs_usb: activate quirks for CANtact Pro unconditionally can: gs_usb: add quirk for CANtact Pro overlapping GS_USB_BREQ value can: gs_usb: add usb quirk for NXP LPC546xx controllers can: gs_usb: add CAN-FD support can: gs_usb: use union and FLEX_ARRAY for data in struct gs_host_frame can: gs_usb: support up to 3 channels per device can: gs_usb: gs_usb_probe(): introduce udev and make use of it can: gs_usb: document the PAD_PKTS_TO_MAX_PKT_SIZE feature can: gs_usb: document the USER_ID feature can: gs_usb: update GS_CAN_FEATURE_IDENTIFY documentation can: gs_usb: add HW timestamp mode bit can: gs_usb: gs_make_candev(): call SET_NETDEV_DEV() after handling all bt_const->feature can: gs_usb: rewrap usb_control_msg() and usb_fill_bulk_urb() can: gs_usb: rewrap error messages can: gs_usb: GS_CAN_FLAG_OVERFLOW: make use of BIT() can: gs_usb: sort include files alphabetically can: gs_usb: fix checkpatch warning ... ==================== Link: https://lore.kernel.org/r/20220310142903.341658-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski16-56/+136
net/dsa/dsa2.c commit afb3cc1a397d ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails") commit e83d56537859 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master") https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/ drivers/net/ethernet/intel/ice/ice.h commit 97b0129146b1 ("ice: Fix error with handling of bonding MTU") commit 43113ff73453 ("ice: add TTY for GNSS module for E810T device") https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/ drivers/staging/gdm724x/gdm_lte.c commit fc7f750dc9d1 ("staging: gdm724x: fix use after free in gdm_lte_rx()") commit 4bcc4249b4cf ("staging: Use netif_rx().") https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10Merge tag 'net-5.17-rc8' of ↵Linus Torvalds4-5/+8
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, and ipsec. Current release - regressions: - Bluetooth: fix unbalanced unlock in set_device_flags() - Bluetooth: fix not processing all entries on cmd_sync_work, make connect with qualcomm and intel adapters reliable - Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0" - xdp: xdp_mem_allocator can be NULL in trace_mem_connect() - eth: ice: fix race condition and deadlock during interface enslave Current release - new code bugs: - tipc: fix incorrect order of state message data sanity check Previous releases - regressions: - esp: fix possible buffer overflow in ESP transformation - dsa: unlock the rtnl_mutex when dsa_master_setup() fails - phy: meson-gxl: fix interrupt handling in forced mode - smsc95xx: ignore -ENODEV errors when device is unplugged Previous releases - always broken: - xfrm: fix tunnel mode fragmentation behavior - esp: fix inter address family tunneling on GSO - tipc: fix null-deref due to race when enabling bearer - sctp: fix kernel-infoleak for SCTP sockets - eth: macb: fix lost RX packet wakeup race in NAPI receive - eth: intel stop disabling VFs due to PF error responses - eth: bcmgenet: don't claim WOL when its not available" * tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) xdp: xdp_mem_allocator can be NULL in trace_mem_connect(). ice: Fix race condition during interface enslave net: phy: meson-gxl: improve link-up behavior net: bcmgenet: Don't claim WOL when its not available net: arc_emac: Fix use after free in arc_mdio_probe() sctp: fix kernel-infoleak for SCTP sockets net: phy: correct spelling error of media in documentation net: phy: DP83822: clear MISR2 register to disable interrupts gianfar: ethtool: Fix refcount leak in gfar_get_ts_info selftests: pmtu.sh: Kill nettest processes launched in subshell. selftests: pmtu.sh: Kill tcpdump processes launched by subshell. NFC: port100: fix use-after-free in port100_send_complete net/mlx5e: SHAMPO, reduce TIR indication net/mlx5e: Lag, Only handle events from highest priority multipath entry net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE net/mlx5: Fix a race on command flush flow net/mlx5: Fix size field in bufferx_reg struct ax25: Fix NULL pointer dereference in ax25_kill_by_device net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr net: ethernet: lpc_eth: Handle error for clk_enable ...
2022-03-10net: phy: correct spelling error of media in documentationColin Foster1-2/+2
The header file incorrectly referenced "median-independant interface" instead of media. Correct this typo. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Fixes: 4069a572d423 ("net: phy: Document core PHY structures") Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20220309062544.3073-1-colin.foster@in-advantage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10Merge tag 'mlx5-updates-2022-03-09' of ↵Jakub Kicinski2-12/+35
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-03-09 1) Remove kernel log prints on FW events regarding FW pages management and replace that with debugfs entries to track FW pages management commands failures and general stats, we do that for all FW commands in general since it's the same effort to do so under the already existing debugfs entry for FW commands. 2) Add support for ConnectX-7 Software managed steering, in other words STEv2 which shares a lot in common with STE V1, the difference is in specific offsets in the devices, the logic is almost the same, thus we implement STEv1 and STEv2 in the same file. * tag 'mlx5-updates-2022-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: DR, Add support for ConnectX-7 steering net/mlx5: DR, Refactor ste_ctx handling for STE v0/1 net/mlx5: DR, Rename action modify fields to reflect naming in HW spec net/mlx5: DR, Fix handling of different actions on the same STE in STEv1 net/mlx5: DR, Remove unneeded comments net/mlx5: DR, Add support for matching on Internet Header Length (IHL) net/mlx5: DR, Align mlx5dv_dr API vport action with FW behavior net/mlx5: Add debugfs counters for page commands failures net/mlx5: Add pages debugfs net/mlx5: Move debugfs entries to separate struct net/mlx5: Change release_all_pages cap bit location net/mlx5: Remove redundant error on reclaim pages net/mlx5: Remove redundant error on give pages net/mlx5: Remove redundant notify fail on give pages net/mlx5: Add command failures data to debugfs net/mlx5e: TC, Fix use after free in mlx5e_clone_flow_attr_for_post_act() ==================== Link: https://lore.kernel.org/r/20220309213755.610202-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10can: isotp: set default value for N_As to 50 micro secondsOliver Hartkopp1-6/+22
The N_As value describes the time a CAN frame needs on the wire when transmitted by the CAN controller. Even very short CAN FD frames need arround 100 usecs (bitrate 1Mbit/s, data bitrate 8Mbit/s). Having N_As to be zero (the former default) leads to 'no CAN frame separation' when STmin is set to zero by the receiving node. This 'burst mode' should not be enabled by default as it could potentially dump a high number of CAN frames into the netdev queue from the soft hrtimer context. This does not affect the system stability but is just not nice and cooperative. With this N_As/frame_txtime value the 'burst mode' is disabled by default. As user space applications usually do not set the frame_txtime element of struct can_isotp_options the new in-kernel default is very likely overwritten with zero when the sockopt() CAN_ISOTP_OPTS is invoked. To make sure that a N_As value of zero is only set intentional the value '0' is now interpreted as 'do not change the current value'. When a frame_txtime of zero is required for testing purposes this CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime. Link: https://lore.kernel.org/all/20220309120416.83514-2-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-03-09Merge tag 'xsa396-5.17-tag' of ↵Linus Torvalds1-2/+17
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Several Linux PV device frontends are using the grant table interfaces for removing access rights of the backends in ways being subject to race conditions, resulting in potential data leaks, data corruption by malicious backends, and denial of service triggered by malicious backends: - blkfront, netfront, scsifront and the gntalloc driver are testing whether a grant reference is still in use. If this is not the case, they assume that a following removal of the granted access will always succeed, which is not true in case the backend has mapped the granted page between those two operations. As a result the backend can keep access to the memory page of the guest no matter how the page will be used after the frontend I/O has finished. The xenbus driver has a similar problem, as it doesn't check the success of removing the granted access of a shared ring buffer. - blkfront, netfront, scsifront, usbfront, dmabuf, xenbus, 9p, kbdfront, and pvcalls are using a functionality to delay freeing a grant reference until it is no longer in use, but the freeing of the related data page is not synchronized with dropping the granted access. As a result the backend can keep access to the memory page even after it has been freed and then re-used for a different purpose. - netfront will fail a BUG_ON() assertion if it fails to revoke access in the rx path. This will result in a Denial of Service (DoS) situation of the guest which can be triggered by the backend" * tag 'xsa396-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/netfront: react properly to failing gnttab_end_foreign_access_ref() xen/gnttab: fix gnttab_end_foreign_access() without page specified xen/pvcalls: use alloc/free_pages_exact() xen/9p: use alloc/free_pages_exact() xen/usb: don't use gnttab_end_foreign_access() in xenhcd_gnttab_done() xen: remove gnttab_query_foreign_access() xen/gntalloc: don't use gnttab_query_foreign_access() xen/scsifront: don't use gnttab_query_foreign_access() for mapped status xen/netfront: don't use gnttab_query_foreign_access() for mapped status xen/blkfront: don't use gnttab_query_foreign_access() for mapped status xen/grant-table: add gnttab_try_end_foreign_access() xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
2022-03-09tcp: adjust TSO packet sizes based on min_rttEric Dumazet1-1/+2
Back when tcp_tso_autosize() and TCP pacing were introduced, our focus was really to reduce burst sizes for long distance flows. The simple heuristic of using sk_pacing_rate/1024 has worked well, but can lead to too small packets for hosts in the same rack/cluster, when thousands of flows compete for the bottleneck. Neal Cardwell had the idea of making the TSO burst size a function of both sk_pacing_rate and tcp_min_rtt() Indeed, for local flows, sending bigger bursts is better to reduce cpu costs, as occasional losses can be repaired quite fast. This patch is based on Neal Cardwell implementation done more than two years ago. bbr is adjusting max_pacing_rate based on measured bandwidth, while cubic would over estimate max_pacing_rate. /proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable this new feature, in logarithmic steps. Tested: 100Gbit NIC, two hosts in the same rack, 4K MTU. 600 flows rate-limited to 20000000 bytes per second. Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO) ~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered" 96005 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000': 65,945.29 msec task-clock # 2.845 CPUs utilized 1,314,632 context-switches # 19935.279 M/sec 5,292 cpu-migrations # 80.249 M/sec 940,641 page-faults # 14264.023 M/sec 201,117,030,926 cycles # 3049769.216 GHz (83.45%) 17,699,435,405 stalled-cycles-frontend # 8.80% frontend cycles idle (83.48%) 136,584,015,071 stalled-cycles-backend # 67.91% backend cycles idle (83.44%) 53,809,530,436 instructions # 0.27 insn per cycle # 2.54 stalled cycles per insn (83.36%) 9,062,315,523 branches # 137422329.563 M/sec (83.22%) 153,008,621 branch-misses # 1.69% of all branches (83.32%) 23.182970846 seconds time elapsed TcpInSegs 15648792 0.0 TcpOutSegs 58659110 0.0 # Average of 3.7 4K segments per TSO packet TcpExtTCPDelivered 58654791 0.0 TcpExtTCPDeliveredCE 19 0.0 After patch: ~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered" 96046 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000': 48,982.58 msec task-clock # 2.104 CPUs utilized 186,014 context-switches # 3797.599 M/sec 3,109 cpu-migrations # 63.472 M/sec 941,180 page-faults # 19214.814 M/sec 153,459,763,868 cycles # 3132982.807 GHz (83.56%) 12,069,861,356 stalled-cycles-frontend # 7.87% frontend cycles idle (83.32%) 120,485,917,953 stalled-cycles-backend # 78.51% backend cycles idle (83.24%) 36,803,672,106 instructions # 0.24 insn per cycle # 3.27 stalled cycles per insn (83.18%) 5,947,266,275 branches # 121417383.427 M/sec (83.64%) 87,984,616 branch-misses # 1.48% of all branches (83.43%) 23.281200256 seconds time elapsed TcpInSegs 1434706 0.0 TcpOutSegs 58883378 0.0 # Average of 41 4K segments per TSO packet TcpExtTCPDelivered 58878971 0.0 TcpExtTCPDeliveredCE 9664 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net/tls: Provide {__,}tls_driver_ctx() unconditionallyDimitris Michailidis1-2/+0
Having the definitions of {__,}tls_driver_ctx() under an #if guard means code referencing them also needs to rely on the preprocessor. The protection doesn't appear needed so make the definitions unconditional. Fixes: db37bc177dae ("net/funeth: add the data path") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09ptp: idt82p33: use rsmu driver to access i2c/spi busMin Li1-0/+3
rsmu (Renesas Synchronization Management Unit ) driver is located in drivers/mfd and responsible for creating multiple devices including idt82p33 phc, which will then use the exposed regmap and mutex handle to access i2c/spi bus. Signed-off-by: Min Li <min.li.xe@renesas.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/1646748651-16811-1-git-send-email-min.li.xe@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net/mlx5: DR, Add support for ConnectX-7 steeringYevgeny Kliteynik1-0/+1
Add support for a new SW format version that is implemented by ConnectX-7. Except for several differences, the STEv2 is identical to STEv1, so for most callbacks the STEv2 context struct will call STEv1 functions. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09net/mlx5: DR, Add support for matching on Internet Header Length (IHL)Yevgeny Kliteynik1-1/+4
Add support for matching on new field - Internet Header Length (IHL). Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09net/mlx5: Add debugfs counters for page commands failuresMoshe Shemesh1-0/+3
Add the following new debugfs counters for debug and verbosity: fw_pages_alloc_failed - number of pages FW requested but driver failed to allocate. give_pages_dropped - number of pages given to FW, but command give pages failed by FW. reclaim_pages_discard - number of pages which were about to reclaim back and FW failed the command. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09net/mlx5: Add pages debugfsMoshe Shemesh1-3/+6
Add pages debugfs to expose the following counters for debuggability: fw_pages_total - How many pages were given to FW and not returned yet. vfs_pages - For SRIOV, how many pages were given to FW for virtual functions usage. host_pf_pages - For ECPF, how many pages were given to FW for external hosts physical functions usage. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09net/mlx5: Move debugfs entries to separate structMoshe Shemesh1-7/+10
Move the debugfs entry pointers under priv to their own struct. Add get function for device debugfs root. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09net/mlx5: Change release_all_pages cap bit locationMoshe Shemesh1-1/+2
mlx5 FW has changed release_all_pages cap bit by one bit offset to reflect a fix in the FW flow for release_all_pages. The driver should use the new bit to ensure it calls release_all_pages only if the FW fix is there. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09net/mlx5: Add command failures data to debugfsMoshe Shemesh1-0/+9
Add new counters to command interface debugfs to count command failures. The following counters added: total_failed - number of times command failed (any kind of failure). failed_mbox_status - number of times command failed on bad status returned by FW. In addition, add data about last command failure to command interface debugfs: last_failed_errno - last command failed returned errno. last_failed_mbox_status - last bad status returned by FW. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09net/mlx5e: SHAMPO, reduce TIR indicationBen Ben-Ishay1-1/+0
SHAMPO is an RQ / WQ feature, an indication was added to the TIR in the first place to enforce suitability between connected TIR and RQ, this enforcement does not exist in current the Firmware implementation and was redundant in the first place. Fixes: 83439f3c37aa ("net/mlx5e: Add HW-GRO offload") Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09net/mlx5: Fix size field in bufferx_reg structMohammad Kabat1-2/+2
According to HW spec the field "size" should be 16 bits in bufferx register. Fixes: e281682bf294 ("net/mlx5_core: HW data structs/types definitions cleanup") Signed-off-by: Mohammad Kabat <mohammadkab@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09net: tcp: fix shim definition of tcp_inbound_md5_hashVladimir Oltean1-1/+1
When CONFIG_TCP_MD5SIG isn't enabled, there is a compilation bug due to the fact that the static inline definition of tcp_inbound_md5_hash() has an unexpected semicolon. Remove it. Fixes: 1330b6ef3313 ("skb: make drop reason booleanable") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220309122012.668986-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09Merge branch 'master' of ↵David S. Miller2-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-03-09 1) Fix IPv6 PMTU discovery for xfrm interfaces. From Lina Wang. 2) Revert failing for policies and states that are configured with XFRMA_IF_ID 0. It broke a user configuration. From Kai Lueke. 3) Fix a possible buffer overflow in the ESP output path. 4) Fix ESP GSO for tunnel and BEET mode on inter address family tunnels. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09skb: make drop reason booleanableJakub Kicinski2-10/+12
We have a number of cases where function returns drop/no drop decision as a boolean. Now that we want to report the reason code as well we have to pass extra output arguments. We can make the reason code evaluate correctly as bool. I believe we're good to reorder the reasons as they are reported to user space as strings. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09net: dsa: felix: avoid early deletion of host FDB entriesVladimir Oltean1-0/+6
The Felix driver declares FDB isolation but puts all standalone ports in VID 0. This is mostly problem-free as discussed with Alvin here: https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870 however there is one catch. DSA still thinks that FDB entries are installed on the CPU port as many times as there are user ports, and this is problematic when multiple user ports share the same MAC address. Consider the default case where all user ports inherit their MAC address from the DSA master, and then the user runs: ip link set swp0 address 00:01:02:03:04:05 The above will make dsa_slave_set_mac_address() call dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's standalone database, and dsa_port_standalone_host_fdb_del() for the old address of swp0, again in swp0's standalone database. Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down to the felix driver, which will end up deleting the old MAC address from the CPU port. But this is still in use by other user ports, so we end up breaking unicast termination for them. There isn't a problem in the fact that DSA keeps track of host standalone addresses in the individual database of each user port: some drivers like sja1105 need this. There also isn't a problem in the fact that some drivers choose the same VID/FID for all standalone ports. It is just that the deletion of these host addresses must be delayed until they are known to not be in use any longer, and only the driver has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and &cpu_db->mdbs, it is just a matter of walking over those lists and see whether the same MAC address is present on the CPU port in the port db of another user port. I have considered reusing the generic dsa_port_walk_fdbs() and dsa_port_walk_mdbs() schemes for this, but locking makes it difficult. In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that we introduce an unlocked variant of the address iterator, we'd still need some relatively complex data structures, and a void *ctx in the dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are able to figure out, after iterating, whether the same MAC address is or isn't present in the port db of another port. All the above, plus the fact that I expect other drivers to follow the same model as felix where all standalone ports use the same FID, made me conclude that a generic method provided by DSA is necessary: dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this from the ->port_fdb_del() handler for the CPU port, when the database was classified to either a port db, or a LAG db. For symmetry, we also call this from ->port_fdb_add(), because if the address was installed once, then installing it a second time serves no purpose: it's already in hardware in VID 0 and it affects all standalone ports. This change moves dsa_db_equal() from switch.c to dsa.c, since it now has one more caller. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-08mptcp: introduce implicit endpointsPaolo Abeni1-0/+1
In some edge scenarios, an MPTCP subflows can use a local address mapped by a "implicit" endpoint created by the in-kernel path manager. Such endpoints presence can be confusing, as it's creation is hard to track and will prevent the later endpoint creation from the user-space using the same address. Define a new endpoint flag to mark implicit endpoints and allow the user-space to replace implicit them with user-provided data at endpoint creation time. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08mptcp: add tracepoint in mptcp_sendmsg_fragGeliang Tang1-0/+4
The tracepoint in get_mapping_status() only dumped the incoming mpext fields. This patch added a new tracepoint in mptcp_sendmsg_frag() to dump the outgoing mpext too. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08Merge tag 'fuse-fixes-5.17-rc8' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: - Fix an issue with splice on the fuse device - Fix a regression in the fileattr API conversion - Add a small userspace API improvement * tag 'fuse-fixes-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix pipe buffer lifetime for direct_io fuse: move FUSE_SUPER_MAGIC definition to magic.h fuse: fix fileattr op failure
2022-03-08Merge tag 'arm64-spectre-bhb-for-v5.17-2' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 spectre fixes from James Morse: "ARM64 Spectre-BHB mitigations: - Make EL1 vectors per-cpu - Add mitigation sequences to the EL1 and EL2 vectors on vulnerble CPUs - Implement ARCH_WORKAROUND_3 for KVM guests - Report Vulnerable when unprivileged eBPF is enabled" * tag 'arm64-spectre-bhb-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting arm64: Use the clearbhb instruction in mitigations KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated arm64: Mitigate spectre style branch history side channels arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2 arm64: Add percpu vectors for EL1 arm64: entry: Add macro for reading symbol addresses from the trampoline arm64: entry: Add vectors that have the bhb mitigation sequences arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations arm64: entry: Allow the trampoline text to occupy multiple pages arm64: entry: Make the kpti trampoline's kpti sequence optional arm64: entry: Move trampoline macros out of ifdef'd section arm64: entry: Don't assume tramp_vectors is the start of the vectors arm64: entry: Allow tramp_alias to access symbols after the 4K boundary arm64: entry: Move the trampoline data page before the text page arm64: entry: Free up another register on kpti's tramp_exit path arm64: entry: Make the trampoline cleanup optional KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit arm64: entry.S: Add ventry overflow sanity checks
2022-03-08net: phy: exported the genphy_read_master_slave functionArun Ramadoss1-0/+1
genphy_read_master_slave function allows to configure the master/slave for gigabit phys only. In order to use this function irrespective of speed, moved the speed check to the genphy_read_status call. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-07Merge tag 'x86_bugs_for_v5.17' of ↵Linus Torvalds1-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 spectre fixes from Borislav Petkov: - Mitigate Spectre v2-type Branch History Buffer attacks on machines which support eIBRS, i.e., the hardware-assisted speculation restriction after it has been shown that such machines are vulnerable even with the hardware mitigation. - Do not use the default LFENCE-based Spectre v2 mitigation on AMD as it is insufficient to mitigate such attacks. Instead, switch to retpolines on all AMD by default. - Update the docs and add some warnings for the obviously vulnerable cmdline configurations. * tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT x86/speculation: Warn about Spectre v2 LFENCE mitigation x86/speculation: Update link to AMD speculation whitepaper x86/speculation: Use generic retpoline by default on AMD x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting Documentation/hw-vuln: Update spectre doc x86/speculation: Add eIBRS + Retpoline options x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
2022-03-07Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds3-8/+14
Pull virtio fixes from Michael Tsirkin: "Some last minute fixes that took a while to get ready. Not regressions, but they look safe and seem to be worth to have" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: tools/virtio: handle fallout from folio work tools/virtio: fix virtio_test execution vhost: remove avail_event arg from vhost_update_avail_event() virtio: drop default for virtio-mem vdpa: fix use-after-free on vp_vdpa_remove virtio-blk: Remove BUG_ON() in virtio_queue_rq() virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero vhost: fix hung thread due to erroneous iotlb entries vduse: Fix returning wrong type in vduse_domain_alloc_iova() vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command vdpa/mlx5: should verify CTRL_VQ feature exists for MQ vdpa: factor out vdpa_set_features_unlocked for vdpa internal use virtio_console: break out of buf poll on remove virtio: document virtio_reset_device virtio: acknowledge all features before access virtio: unexport virtio_finalize_features
2022-03-07swiotlb: rework "fix info leak with DMA_FROM_DEVICE"Halil Pasic1-8/+0
Unfortunately, we ended up merging an old version of the patch "fix info leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph (the swiotlb maintainer), he asked me to create an incremental fix (after I have pointed this out the mix up, and asked him for guidance). So here we go. The main differences between what we got and what was agreed are: * swiotlb_sync_single_for_device is also required to do an extra bounce * We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters * The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE must take precedence over DMA_ATTR_SKIP_CPU_SYNC Thus this patch removes DMA_ATTR_OVERWRITE, and makes swiotlb_sync_single_for_device() bounce unconditionally (that is, also when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale data from the swiotlb buffer. Let me note, that if the size used with dma_sync_* API is less than the size used with dma_[un]map_*, under certain circumstances we may still end up with swiotlb not being transparent. In that sense, this is no perfect fix either. To get this bullet proof, we would have to bounce the entire mapping/bounce buffer. For that we would have to figure out the starting address, and the size of the mapping in swiotlb_sync_single_for_device(). While this does seem possible, there seems to be no firm consensus on how things are supposed to work. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-07net: Fix esp GSO on inter address family tunnels.Steffen Klassert1-0/+2
The esp tunnel GSO handlers use skb_mac_gso_segment to push the inner packet to the segmentation handlers. However, skb_mac_gso_segment takes the Ethernet Protocol ID from 'skb->protocol' which is wrong for inter address family tunnels. We fix this by introducing a new skb_eth_gso_segment function. This function can be used if it is necessary to pass the Ethernet Protocol ID directly to the segmentation handler. First users of this function will be the esp4 and esp6 tunnel segmentation handlers. Fixes: c35fe4106b92 ("xfrm: Add mode handlers for IPsec on layer 2") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07esp: Fix possible buffer overflow in ESP transformationSteffen Klassert1-0/+2
The maximum message size that can be send is bigger than the maximum site that skb_page_frag_refill can allocate. So it is possible to write beyond the allocated buffer. Fix this by doing a fallback to COW in that case. v2: Avoid get get_order() costs as suggested by Linus Torvalds. Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Reported-by: valis <sec@valis.email> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07net: Remove netif_rx_any_context() and netif_rx_ni().Sebastian Andrzej Siewior1-10/+0
Remove netif_rx_any_context and netif_rx_ni() because there are no more users in tree. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-07ptp: Add generic PTP is_sync() functionKurt Kanzenbach1-0/+15
PHY drivers such as micrel or dp83640 need to analyze whether a given skb is a PTP sync message for one step functionality. In order to avoid code duplication introduce a generic function and move it to ptp classify. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>