aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-10-03net/mlx5: Start health poll at earlier stage of driver loadMoshe Shemesh1-0/+1
Start health poll at earlier stage, so if fw fatal issue occurred before or during initialization commands such as init_hca or set_hca_cap the poll health can detect and indicate that the driver is already in error state. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03net/mlx5e: Expose rx_oversize_pkts_buffer counterGal Pressman1-2/+6
Add the rx_oversize_pkts_buffer counter to ethtool statistics. This counter exposes the number of dropped received packets due to length which arrived to RQ and exceed software buffer size allocated by the device for incoming traffic. It might imply that the device MTU is larger than the software buffers size. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03net/mlx5e: xsk: Use umr_mode to calculate striding RQ parametersMaxim Mikityanskiy1-0/+4
Instead of passing the unaligned flag, pass an enum that indicates the UMR mode. The next commit will add the third mode (KLM for certain configurations of XSK), which will be added to this enum instead of adding another bool flag everywhere. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski13-31/+311
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-03 We've added 143 non-merge commits during the last 27 day(s) which contain a total of 151 files changed, 8321 insertions(+), 1402 deletions(-). The main changes are: 1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu. 2) Add support for struct-based arguments for trampoline based BPF programs, from Yonghong Song. 3) Fix entry IP for kprobe-multi and trampoline probes under IBT enabled, from Jiri Olsa. 4) Batch of improvements to veristat selftest tool in particular to add CSV output, a comparison mode for CSV outputs and filtering, from Andrii Nakryiko. 5) Add preparatory changes needed for the BPF core for upcoming BPF HID support, from Benjamin Tissoires. 6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program types, from Daniel Xu. 7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler. 8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer / single-kernel-consumer semantics for BPF ring buffer, from David Vernet. 9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF hashtab's bucket lock, from Hou Tao. 10) Allow creating an iterator that loops through only the resources of one task/thread instead of all, from Kui-Feng Lee. 11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi. 12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT entry which is not yet inserted, from Lorenzo Bianconi. 13) Remove invalid recursion check for struct_ops for TCP congestion control BPF programs, from Martin KaFai Lau. 14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu. 15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa. 16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen. 17) Add invocation for cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu. 18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta. 19) Various libbpf fixes and cleanups including a libbpf NULL pointer deref, from Xin Liu. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits) net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c Documentation: bpf: Add implementation notes documentations to table of contents bpf, docs: Delete misformatted table. selftests/xsk: Fix double free bpftool: Fix error message of strerror libbpf: Fix overrun in netlink attribute iteration selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged" samples/bpf: Fix typo in xdp_router_ipv4 sample bpftool: Remove unused struct event_ring_info bpftool: Remove unused struct btf_attach_point bpf, docs: Add TOC and fix formatting. bpf, docs: Add Clang note about BPF_ALU bpf, docs: Move Clang notes to a separate file bpf, docs: Linux byteswap note bpf, docs: Move legacy packet instructions to a separate file selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline ... ==================== Link: https://lore.kernel.org/r/20221003194915.11847-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.cLorenzo Bianconi1-0/+19
Remove circular dependency between nf_nat module and nf_conntrack one moving bpf_ct_set_nat_info kfunc in nf_nat_bpf.c Fixes: 0fabd2aa199f ("net: netfilter: add bpf_ct_set_nat_info kfunc helper") Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/51a65513d2cda3eeb0754842e8025ab3966068d8.1664490511.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-03once: add DO_ONCE_SLOW() for sleepable contextsEric Dumazet1-0/+28
Christophe Leroy reported a ~80ms latency spike happening at first TCP connect() time. This is because __inet_hash_connect() uses get_random_once() to populate a perturbation table which became quite big after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") get_random_once() uses DO_ONCE(), which block hard irqs for the duration of the operation. This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock for operations where we prefer to stay in process context. Then __inet_hash_connect() can use get_random_slow_once() to populate its perturbation table. Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time") Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03net: Remove DECnet leftovers from flow.h.Guillaume Nault1-26/+0
DECnet was removed by commit 1202cdd66531 ("Remove DECnet support from kernel"). Let's also revome its flow structure. Compile-tested only (allmodconfig). Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03net: phy: mdio-i2c: support I2C MDIO protocol for RollBall SFP modulesMarek Behún1-1/+3
Some multigig SFPs from RollBall and Hilink do not expose functional MDIO access to the internal PHY of the SFP via I2C address 0x56 (although there seems to be read-only clause 22 access on this address). Instead these SFPs PHY can be accessed via I2C via the SFP Enhanced Digital Diagnostic Interface - I2C address 0x51. The SFP_PAGE has to be selected to 3 and the password must be filled with 0xff bytes for this PHY communication to work. This extends the mdio-i2c driver to support this protocol by adding a special parameter to mdio_i2c_alloc function via which this RollBall protocol can be selected. Signed-off-by: Marek Behún <kabel@kernel.org> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03net: sfp: create/destroy I2C mdiobus before PHY probe/after PHY releaseMarek Behún1-0/+6
Instead of configuring the I2C mdiobus when SFP driver is probed, create/destroy the mdiobus before the PHY is probed for/after it is released. This way we can tell the mdio-i2c code which protocol to use for each SFP transceiver. Move the code that determines MDIO I2C protocol from sfp_sm_probe_for_phy() to sfp_sm_mod_probe(), where most of the SFP ID parsing is done. Don't allocate I2C bus if no PHY is expected. Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03net: phylink: pass supported host PHY interface modes to phylib for SFP's PHYsMarek Behún1-0/+4
Pass the supported PHY interface types to phylib if the PHY we are connecting is inside a SFP, so that the PHY driver can select an appropriate host configuration mode for their interface according to the host capabilities. For example the Marvell 88X3310 PHY inside RollBall SFP modules defaults to 10gbase-r mode on host's side, and the marvell10g driver currently does not change this setting. But a host may not support 10gbase-r. For example Turris Omnia only supports sgmii, 1000base-x and 2500base-x modes. The PHY can be configured to use those modes, but in order for the PHY driver to do that, it needs to know which modes are supported. Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03net: sfp: augment SFP parsing with phy_interface_t bitmapRussell King1-2/+3
We currently parse the SFP EEPROM to a bitmap of ethtool link modes, and then attempt to convert the link modes to a PHY interface mode. While this works at present, there are cases where this is sub-optimal. For example, where a module can operate with several different PHY interface modes. To start addressing this, arrange for the SFP EEPROM parsing to also provide a bitmap of the possible PHY interface modes. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03net: Add helper function to parse netlink msg of ip_tunnel_parmLiu Jian1-0/+3
Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm. Reduces duplicate code, no actual functional changes. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03net: Add helper function to parse netlink msg of ip_tunnel_encapLiu Jian1-0/+3
Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap. Reduces duplicate code, no actual functional changes. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03Merge branch 'master' of ↵David S. Miller5-8/+60
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Refactor selftests to use an array of structs in xfrm_fill_key(). From Gautam Menghani. 2) Drop an unused argument from xfrm_policy_match. From Hongbin Wang. 3) Support collect metadata mode for xfrm interfaces. From Eyal Birger. 4) Add netlink extack support to xfrm. From Sabrina Dubroca. Please note, there is a merge conflict in: include/net/dst_metadata.h between commit: 0a28bfd4971f ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support") from the net-next tree and commit: 5182a5d48c3d ("net: allow storing xfrm interface metadata in metadata_dst") from the ipsec-next tree. Can be solved as done in linux-next. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-02net: sched: cls_api: introduce tc_cls_bind_class() helperZhengchao Shao1-0/+12
All the bind_class callback duplicate the same logic, this patch introduces tc_cls_bind_class() helper for common usage. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30net: dsa: remove bool devlink_port_setupVladimir Oltean1-2/+0
Since dsa_port_devlink_setup() and dsa_port_devlink_teardown() are already called from code paths which only execute once per port (due to the existing bool dp->setup), keeping another dp->devlink_port_setup is redundant, because we can already manage to balance the calls properly (and not call teardown when setup was never called, or call setup twice, or things like that). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net: devlink: add port_init/fini() helpers to allow ↵Jiri Pirko1-1/+5
pre-register/post-unregister functions Lifetime of some of the devlink objects, like regions, is currently forced to be different for devlink instance and devlink port instance (per-port regions). The reason is that for devlink ports, the internal structures initialization happens only after devlink_port_register() is called. To resolve this inconsistency, introduce new set of helpers to allow driver to initialize devlink pointer and region list before devlink_register() is called. That allows port regions to be created before devlink port registration and destroyed after devlink port unregistration. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net: devlink: introduce a flag to indicate devlink port being registeredJiri Pirko1-1/+2
Instead of relying on devlink pointer not being initialized, introduce an extra flag to indicate if devlink port is registered. This is needed as later on devlink pointer is going to be initialized even in case devlink port is not registered yet. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30Merge tag 'for-net-next-2022-09-30' of ↵Jakub Kicinski5-3/+80
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next - Add RTL8761BUV device (Edimax BT-8500) - Add a new PID/VID 13d3/3583 for MT7921 - Add Realtek RTL8852C support ID 0x13D3:0x3592 - Add VID/PID 0489/e0e0 for MediaTek MT7921 - Add a new VID/PID 0e8d/0608 for MT7921 - Add a new PID/VID 13d3/3578 for MT7921 - Add BT device 0cb8:c549 from RTW8852AE - Add support for Intel Magnetor * tag 'for-net-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (49 commits) Bluetooth: hci_sync: Fix not indicating power state Bluetooth: L2CAP: Fix user-after-free Bluetooth: Call shutdown for HCI_USER_CHANNEL Bluetooth: Prevent double register of suspend Bluetooth: hci_core: Fix not handling link timeouts propertly Bluetooth: hci_event: Make sure ISO events don't affect non-ISO connections Bluetooth: hci_debugfs: Fix not checking conn->debugfs Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times Bluetooth: MGMT: fix zalloc-simple.cocci warnings Bluetooth: hci_{ldisc,serdev}: check percpu_init_rwsem() failure Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create() Bluetooth: RFCOMM: Fix possible deadlock on socket shutdown/release Bluetooth: hci_sync: allow advertise when scan without RPA Bluetooth: btusb: Add a new VID/PID 0e8d/0608 for MT7921 Bluetooth: btusb: Add a new PID/VID 13d3/3583 for MT7921 Bluetooth: avoid hci_dev_test_and_set_flag() in mgmt_init_hdev() Bluetooth: btintel: Mark Intel controller to support LE_STATES quirk Bluetooth: btintel: Add support for Magnetor Bluetooth: btusb: Add a new PID/VID 13d3/3578 for MT7921 ... ==================== Link: https://lore.kernel.org/r/20221001004602.297366-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30Merge tag 'wireless-next-2022-09-30' of ↵Jakub Kicinski3-37/+154
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.1 Few stack changes and lots of driver changes in this round. brcmfmac has more activity as usual and it gets new hardware support. ath11k improves WCN6750 support and also other smaller features. And of course changes all over. Note: in early September wireless tree was merged to wireless-next to avoid some conflicts with mac80211 patches, this shouldn't cause any problems but wanted to mention anyway. Major changes: mac80211 - refactoring and preparation for Wi-Fi 7 Multi-Link Operation (MLO) feature continues brcmfmac - support CYW43439 SDIO chipset - support BCM4378 on Apple platforms - support CYW89459 PCIe chipset rtw89 - more work to get rtw8852c supported - P2P support - support for enabling and disabling MSDU aggregation via nl80211 mt76 - tx status reporting improvements ath11k - cold boot calibration support on WCN6750 - Target Wake Time (TWT) debugfs support for STA interface - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - implement SRAM dump debugfs interface - enable threaded NAPI on all hardware - WoW support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz wcn36xx - add SNR from a received frame as a source of system entropy * tag 'wireless-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (231 commits) wifi: rtl8xxxu: Improve rtl8xxxu_queue_select wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM wifi: rtl8xxxu: gen2: Enable 40 MHz channel width wifi: rtw89: 8852b: configure DLE mem wifi: rtw89: check DLE FIFO size with reserved size wifi: rtw89: mac: correct register of report IMR wifi: rtw89: pci: set power cut closed for 8852be wifi: rtw89: pci: add to do PCI auto calibration wifi: rtw89: 8852b: implement chip_ops::{enable,disable}_bb_rf wifi: rtw89: add DMA busy checking bits to chip info wifi: rtw89: mac: define DMA channel mask to avoid unsupported channels wifi: rtw89: pci: mask out unsupported TX channels iwlegacy: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper ipw2x00: Replace zero-length array with DECLARE_FLEX_ARRAY() helper wifi: iwlwifi: Track scan_cmd allocation size explicitly brcmfmac: Remove the call to "dtim_assoc" IOVAR brcmfmac: increase dcmd maximum buffer size brcmfmac: Support 89459 pcie brcmfmac: increase default max WOWL patterns to 16 cw1200: fix incorrect check to determine if no element is found in list ... ==================== Link: https://lore.kernel.org/r/20220930150413.A7984C433D6@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30xsk: Remove unused xsk_buff_discardMaxim Mikityanskiy1-7/+0
The previous commit removed the last usage of xsk_buff_discard in mlx5e, so the function that is no longer used can be removed. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> CC: "Björn Töpel" <bjorn@kernel.org> CC: Magnus Karlsson <magnus.karlsson@intel.com> CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net/mlx5e: xsk: Use KSM for unaligned XSKMaxim Mikityanskiy1-0/+6
UMR MTTs used in striding RQ have certain alignment requirements. While it's guaranteed to work when UMR pages are aligned to the UMR page size, in practice it works then UMR pages are aligned to 8 bytes. However, it's still not enough flexibility for the unaligned mode of XSK. This patch leverages KSM to map UMR pages without alignment requirements, when unaligned XSK is active. The downside is that KSM entries are twice as big as MTTs, which limits the maximum WQE size, so regular RQs and aligned XSK continue using MTTs. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net/mlx5e: Use runtime page_shift for striding RQMaxim Mikityanskiy1-0/+2
This commit allows striding RQ to determine MTT page size at runtime, instead of sticking to the compile-time PAGE_SIZE. This functionality will be used by a following commit that adjusts the MTT page size to the XSK frame size. Stick with PAGE_SIZE for XSK on legacy RQ, as frag_stride is not used in data path, it only helps calculate how pages are partitioned into fragments, and PAGE_SIZE will ensure each fragment starts at the beginning of a new allocation unit (XSK frame). Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30xsk: Expose min chunk size to driversMaxim Mikityanskiy1-0/+3
Drivers should be aware of the range of valid UMEM chunk sizes to be able to allocate their internal structures of an appropriate size. It will be used by mlx5e in the following patches. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> CC: "Björn Töpel" <bjorn@kernel.org> CC: Magnus Karlsson <magnus.karlsson@intel.com> CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net-next: skbuff: refactor pskb_pullRichard Gobert1-14/+9
pskb_may_pull already contains all of the checks performed by pskb_pull. Use pskb_may_pull for validation in pskb_pull, eliminating the duplication and making __pskb_pull obsolete. Replace __pskb_pull with pskb_pull where applicable. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-29net/sched: taprio: allow user input of per-tc max SDUVladimir Oltean2-0/+16
IEEE 802.1Q clause 12.29.1.1 "The queueMaxSDUTable structure and data types" and 8.6.8.4 "Enhancements for scheduled traffic" talk about the existence of a per traffic class limitation of maximum frame sizes, with a fallback on the port-based MTU. As far as I am able to understand, the 802.1Q Service Data Unit (SDU) represents the MAC Service Data Unit (MSDU, i.e. L2 payload), excluding any number of prepended VLAN headers which may be otherwise present in the MSDU. Therefore, the queueMaxSDU is directly comparable to the device MTU (1500 means L2 payload sizes are accepted, or frame sizes of 1518 octets, or 1522 plus one VLAN header). Drivers which offload this are directly responsible of translating into other units of measurement. To keep the fast path checks optimized, we keep 2 arrays in the qdisc, one for max_sdu translated into frame length (so that it's comparable to skb->len), and another for offloading and for dumping back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29net/sched: query offload capabilities through ndo_setup_tc()Vladimir Oltean3-0/+9
When adding optional new features to Qdisc offloads, existing drivers must reject the new configuration until they are coded up to act on it. Since modifying all drivers in lockstep with the changes in the Qdisc can create problems of its own, it would be nice if there existed an automatic opt-in mechanism for offloading optional features. Jakub proposes that we multiplex one more kind of call through ndo_setup_tc(): one where the driver populates a Qdisc-specific capability structure. First user will be taprio in further changes. Here we are introducing the definitions for the base functionality. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220923163310.3192733-3-vladimir.oltean@nxp.com/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29net: skb: introduce and use a single page frag cachePaolo Abeni1-0/+1
After commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") we are observing 10-20% regressions in performance tests with small packets. The perf trace points to high pressure on the slab allocator. This change tries to improve the allocation schema for small packets using an idea originally suggested by Eric: a new per CPU page frag is introduced and used in __napi_alloc_skb to cope with small allocation requests. To ensure that the above does not lead to excessive truesize underestimation, the frag size for small allocation is inflated to 1K and all the above is restricted to build with 4K page size. Note that we need to update accordingly the run-time check introduced with commit fd9ea57f4e95 ("net: add napi_get_frags_check() helper"). Alex suggested a smart page refcount schema to reduce the number of atomic operations and deal properly with pfmemalloc pages. Under small packet UDP flood, I measure a 15% peak tput increases. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Alexander H Duyck <alexanderduyck@fb.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/6b6f65957c59f86a353fc09a5127e83a32ab5999.1664350652.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29docs: netlink: clarify the historical baggage of Netlink flagsJakub Kicinski1-1/+1
nlmsg_flags are full of historical baggage, inconsistencies and strangeness. Try to document it more thoroughly. Explain the meaning of the ECHO flag (and while at it clarify the comment in the uAPI). Handwave a little about the NEW request flags and how they make sense on the surface but cater to really old paradigm before commands were a thing. I will add more notes on how to make use of ECHO and discouragement for reuse of flags to the kernel-side documentation. Link: https://lore.kernel.org/r/20220927212306.823862-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-68/+47
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itselfMartin KaFai Lau1-0/+6
When a bad bpf prog '.init' calls bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop: .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ... ... => .init => bpf_setsockopt(tcp_cc). It was prevented by the prog->active counter before but the prog->active detection cannot be used in struct_ops as explained in the earlier patch of the set. In this patch, the second bpf_setsockopt(tcp_cc) is not allowed in order to break the loop. This is done by using a bit of an existing 1 byte hole in tcp_sock to check if there is on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock. Note that this essentially limits only the first '.init' can call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer does not support ECN) and the second '.init' cannot fallback to another cc. This applies even the second bpf_setsockopt(TCP_CONGESTION) will not cause a loop. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220929070407.965581-5-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampolineMartin KaFai Lau1-0/+4
The struct_ops prog is to allow using bpf to implement the functions in a struct (eg. kernel module). The current usage is to implement the tcp_congestion. The kernel does not call the tcp-cc's ops (ie. the bpf prog) in a recursive way. The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It is needed for tracing prog. However, it turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. Skip running the '.ssthresh' will end up returning random value to the caller. The patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops trampoline. They do not track the prog->active to detect recursion. One exception is when the tcp_congestion's '.init' ops is doing bpf_setsockopt(TCP_CONGESTION) and then recurs to the same '.init' ops. This will be addressed in the following patches. Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-29Merge tag 'net-6.0-rc8' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wifi and can. Current release - regressions: - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume() - wifi: fix locking in mac80211 mlme - eth: - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()" - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe Previous releases - regressions: - wifi: fix regression with non-QoS drivers Previous releases - always broken: - mptcp: fix unreleased socket in accept queue - wifi: - don't start TX with fq->lock to fix deadlock - fix memory corruption in minstrel_ht_update_rates() - eth: - macb: fix ZynqMP SGMII non-wakeup source resume failure - mt7531: only do PLL once after the reset - usbnet: fix memory leak in usbnet_disconnect() Misc: - usb: qmi_wwan: add new usb-id for Dell branded EM7455" * tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits) mptcp: fix unreleased socket in accept queue mptcp: factor out __mptcp_close() without socket lock net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2} net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge can: c_can: don't cache TX messages for C_CAN cores ice: xsk: drop power of 2 ring size restriction for AF_XDP ice: xsk: change batched Tx descriptor cleaning net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455 selftests: Fix the if conditions of in test_extra_filter() net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume() net: stmmac: power up/down serdes in stmmac_open/release wifi: mac80211: mlme: Fix double unlock on assoc success handling wifi: mac80211: mlme: Fix missing unlock on beacon RX wifi: mac80211: fix memory corruption in minstrel_ht_update_rates() wifi: mac80211: fix regression with non-QoS drivers wifi: mac80211: ensure vif queues are operational after start wifi: mac80211: don't start TX with fq->lock to fix deadlock wifi: cfg80211: fix MCS divisor value net: hippi: Add missing pci_disable_device() in rr_init_one() net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe ...
2022-09-29Merge tag 'ata-6.0-rc7' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA fixes from Damien Le Moal: "Three late patches to fix problems discovered recently: - Add a horkage to disable link power management by default for the Pioneer BDR-207M and BDR-205 DVD drives (from Niklas) - Two patches to fix setting the maximum queue depth of libsas owned ATA devices (from me)" * tag 'ata-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: libata-sata: Fix device queue depth control ata: libata-scsi: Fix initialization of device queue depth libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
2022-09-29xfrm: ipcomp: add extack to ipcomp{4,6}_init_stateSabrina Dubroca1-1/+1
And the shared helper ipcomp_init_state. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-29xfrm: pass extack down to xfrm_type ->init_stateSabrina Dubroca1-1/+2
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-28net/mlx5: Add the log_min_mkey_entity_size capabilityMaxim Mikityanskiy1-1/+7
Add the capability that will allow the driver to determine the minimal MTT page size to be able to map the smallest possible pages in XSK. The older firmwares that don't have this capability default to 12 (i.e. 4096-byte pages). Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28Merge branch 'mlx5-next' of ↵Jakub Kicinski5-98/+131
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== updates from mlx5-next 2022-09-24 Updates form mlx5-next including[1]: 1) HW definitions and support for NPPS clock settings. 2) various cleanups 3) Enable hash mode by default for all NICs 4) page tracker and advanced virtualization HW definitions for vfio [1] https://lore.kernel.org/netdev/20220907233636.388475-1-saeed@kernel.org/ * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Remove from FPGA IFC file not-needed definitions net/mlx5: Remove unused structs net/mlx5: Remove unused functions net/mlx5: detect and enable bypass port select flow table net/mlx5: Lag, enable hash mode by default for all NICs net/mlx5: Lag, set active ports if support bypass port select flow table RDMA/mlx5: Don't set tx affinity when lag is in hash mode net/mlx5: add IFC bits for bypassing port select flow table net/mlx5: Add support for NPPS with real time mode net/mlx5: Expose NPPS related registers net/mlx5: Query ADV_VIRTUALIZATION capabilities net/mlx5: Introduce ifc bits for page tracker RDMA/mlx5: Move function mlx5_core_query_ib_ppcnt() to mlx5_ib ==================== Link: https://lore.kernel.org/all/20220927201906.234015-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28net: drop the weight argument from netif_napi_addJakub Kicinski1-3/+2
We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28net: Fix incorrect address comparison when searching for a bind2 bucketMartin KaFai Lau1-0/+3
The v6_rcv_saddr and rcv_saddr are inside a union in the 'struct inet_bind2_bucket'. When searching a bucket by following the bhash2 hashtable chain, eg. inet_bind2_bucket_match, it is only using the sk->sk_family and there is no way to check if the inet_bind2_bucket has a v6 or v4 address in the union. This leads to an uninit-value KMSAN report in [0] and also potentially incorrect matches. This patch fixes it by adding a family member to the inet_bind2_bucket and then tests 'sk->sk_family != tb->family' before matching the sk's address to the tb's address. Cc: Joanne Koong <joannelkoong@gmail.com> Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20220927002544.3381205-1-kafai@fb.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28tcp: export tcp_sendmsg_fastopenBenjamin Hesmans1-0/+2
It will be used to support TCP FastOpen with MPTCP in the following commit. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28netns: Replace zero-length array with DECLARE_FLEX_ARRAY() helperGustavo A. R. Silva1-1/+1
Zero-length arrays are deprecated and we are moving towards adopting C99 flexible-array members, instead. So, replace zero-length arrays declarations in anonymous union with the new DECLARE_FLEX_ARRAY() helper macro. This helper allows for flexible-array members in unions. Link: https://github.com/KSPP/linux/issues/193 Link: https://github.com/KSPP/linux/issues/225 Link: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/YzIvfGXxfjdXmIS3@work Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28net: shrink struct ubuf_infoPavel Begunkov1-18/+4
We can benefit from a smaller struct ubuf_info, so leave only mandatory fields and let users to decide how they want to extend it. Convert MSG_ZEROCOPY to struct ubuf_info_msgzc and remove duplicated fields. This reduces the size from 48 bytes to just 16. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28net: introduce struct ubuf_info_msgzcPavel Begunkov1-0/+21
We're going to split struct ubuf_info and leave there only mandatory fields. Users are free to extend it. Add struct ubuf_info_msgzc, which will be an extended version for MSG_ZEROCOPY and some other users. It duplicates of struct ubuf_info for now and will be removed in a couple of patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28bpf: Handle bpf_link_info for the parameterized task BPF iterators.Kui-Feng Lee1-0/+4
Add new fields to bpf_link_info that users can query it through bpf_obj_get_info_by_fd(). Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-3-kuifeng@fb.com
2022-09-28bpf: Parameterize task iterators.Kui-Feng Lee2-0/+31
Allow creating an iterator that loops through resources of one thread/process. People could only create iterators to loop through all resources of files, vma, and tasks in the system, even though they were interested in only the resources of a specific task or process. Passing the additional parameters, people can now create an iterator to go through all resources or only the resources of a task. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-2-kuifeng@fb.com
2022-09-28Revert "net: set proper memcg for net_init hooks allocations"Shakeel Butt1-45/+0
This reverts commit 1d0403d20f6c281cb3d14c5f1db5317caeec48e9. Anatoly Pugachev reported that the commit 1d0403d20f6c ("net: set proper memcg for net_init hooks allocations") is somehow causing the sparc64 VMs failed to boot and the VMs boot fine with that patch reverted. So, revert the patch for now and later we can debug the issue. Link: https://lore.kernel.org/all/20220918092849.GA10314@u164.east.ru/ Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Shakeel Butt <shakeelb@google.com> Cc: Vasily Averin <vvs@openvz.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: cgroups@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Tested-by: Anatoly Pugachev <matorola@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Fixes: 1d0403d20f6c ("net: set proper memcg for net_init hooks allocations") Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-28ata: libata-sata: Fix device queue depth controlDamien Le Moal1-2/+2
The function __ata_change_queue_depth() uses the helper ata_scsi_find_dev() to get the ata_device structure of a scsi device and set that device maximum queue depth. However, when the ata device is managed by libsas, ata_scsi_find_dev() returns NULL, turning __ata_change_queue_depth() into a nop, which prevents the user from setting the maximum queue depth of ATA devices used with libsas based HBAs. Fix this by renaming __ata_change_queue_depth() to ata_change_queue_depth() and adding a pointer to the ata_device structure of the target device as argument. This pointer is provided by ata_scsi_change_queue_depth() using ata_scsi_find_dev() in the case of a libata managed device and by sas_change_queue_depth() using sas_to_ata_dev() in the case of a libsas managed ata device. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: John Garry <john.garry@huawei.com>
2022-09-27net: tls: Add ARIA-GCM algorithmTaehee Yoo1-0/+30
RFC 6209 describes ARIA for TLS 1.2. ARIA-128-GCM and ARIA-256-GCM are defined in RFC 6209. This patch would offer performance increment and an opportunity for hardware offload. Benchmark results: iperf-ssl are used. CPU: intel i3-12100. TLS(openssl-3.0-dev) [ 3] 0.0- 1.0 sec 185 MBytes 1.55 Gbits/sec [ 3] 1.0- 2.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 2.0- 3.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 3.0- 4.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 4.0- 5.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 0.0- 5.0 sec 927 MBytes 1.56 Gbits/sec kTLS(aria-generic) [ 3] 0.0- 1.0 sec 198 MBytes 1.66 Gbits/sec [ 3] 1.0- 2.0 sec 194 MBytes 1.62 Gbits/sec [ 3] 2.0- 3.0 sec 194 MBytes 1.63 Gbits/sec [ 3] 3.0- 4.0 sec 194 MBytes 1.63 Gbits/sec [ 3] 4.0- 5.0 sec 194 MBytes 1.62 Gbits/sec [ 3] 0.0- 5.0 sec 974 MBytes 1.63 Gbits/sec kTLS(aria-avx wirh GFNI) [ 3] 0.0- 1.0 sec 632 MBytes 5.30 Gbits/sec [ 3] 1.0- 2.0 sec 657 MBytes 5.51 Gbits/sec [ 3] 2.0- 3.0 sec 657 MBytes 5.51 Gbits/sec [ 3] 3.0- 4.0 sec 656 MBytes 5.50 Gbits/sec [ 3] 4.0- 5.0 sec 656 MBytes 5.50 Gbits/sec [ 3] 0.0- 5.0 sec 3.18 GBytes 5.47 Gbits/sec Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/20220925150033.24615-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27net/mlx5: Remove from FPGA IFC file not-needed definitionsLeon Romanovsky2-24/+16
Move IP layout bits definitions to be close to the place that actually uses it, together with removal extra defines that not in-use. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>