aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-09-30net/mlx5: Fix spelling mistake "syndrom" -> "syndrome"Colin Ian King1-1/+1
There is a spelling mistake in a devlink_health_report message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30net: bna: Fix spelling mistake "muliple" -> "multiple"Colin Ian King1-1/+1
There is a spelling mistake in a literal string in the array bnad_net_stats_strings. Fix it. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30ibmveth: Ethtool set queue supportNick Child1-29/+120
Implement channel management functions to allow dynamic addition and removal of transmit queues. The `ethtool --show-channels` and `ethtool --set-channels` commands can be used to get and set the number of queues, respectively. Allow the ability to add as many transmit queues as available processors but never allow more than the hard maximum of 16. The number of receive queues is one and cannot be modified. Depending on whether the requested number of queues is larger or smaller than the current value, either allocate or free long term buffers. Since long term buffer construction and destruction can occur in two different areas, from either channel set requests or device open/close, define functions for performing this work. If allocation of a new buffer fails, then attempt to revert back to the previous number of queues. Signed-off-by: Nick Child <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30ibmveth: Implement multi queue on xmitNick Child2-31/+43
The `ndo_start_xmit` function is protected by a spinlock on the tx queue being used to transmit the skb. Allow concurrent calls to `ndo_start_xmit` by using more than one tx queue. This allows for greater throughput when several jobs are trying to transmit data. Introduce 16 tx queues (leave single rx queue as is) which each correspond to one DMA mapped long term buffer. Signed-off-by: Nick Child <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30ibmveth: Copy tx skbs into a premapped bufferNick Child2-133/+74
Rather than DMA mapping and unmapping every outgoing skb, copy the skb into a buffer that was mapped during the drivers open function. Copying the skb and its frags have proven to be more time efficient than mapping and unmapping. As an effect, performance increases by 3-5 Gbits/s. Allocate and DMA map one continuous 64KB buffer at `ndo_open`. This buffer is maintained until `ibmveth_close` is called. This buffer is large enough to hold the largest possible linnear skb. During `ndo_start_xmit`, copy the skb and all of it's frags into the continuous buffer. By manually linnearizing all the socket buffers, time is saved during memcpy as well as more efficient handling in FW. As a result, we no longer need to worry about the firmware limitation of handling a max of 6 frags. So, we only need to maintain 1 descriptor instead of 6 and can hardcode 0 for the other 5 descriptors during h_send_logical_lan. Since, DMA allocation/mapping issues can no longer arise in xmit functions, we can further reduce code size by removing the need for a bounce buffer on DMA errors. Signed-off-by: Nick Child <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30bnx2: Fix spelling mistake "bufferred" -> "buffered"Colin Ian King1-2/+2
There are spelling mistakes in two literal strings. Fix these. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30net: bridge: assign path_cost for 2.5G and 5G link speedSteven Hsieh1-1/+10
As 2.5G, 5G ethernet ports are more common and affordable, these ports are being used in LAN bridge devices. STP port_cost() is missing path_cost assignment for these link speeds, causes highest cost 100 being used. This result in lower speed port being picked when there is loop between 5G and 1G ports. Original path_cost: 10G=2, 1G=4, 100m=19, 10m=100 Adjusted path_cost: 10G=2, 5G=3, 2.5G=4, 1G=5, 100m=19, 10m=100 speed greater than 10G = 1 Signed-off-by: Steven Hsieh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30net: lan966x: Fix spelling mistake "tarffic" -> "traffic"Colin Ian King1-1/+1
There is a spelling mistake in a netdev_err message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30net-next: skbuff: refactor pskb_pullRichard Gobert4-18/+13
pskb_may_pull already contains all of the checks performed by pskb_pull. Use pskb_may_pull for validation in pskb_pull, eliminating the duplication and making __pskb_pull obsolete. Replace __pskb_pull with pskb_pull where applicable. Signed-off-by: Richard Gobert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30net: bonding: Convert to use sysfs_emit()/sysfs_emit_at() APIsWang Yufen2-67/+67
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Wang Yufen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30net-sysfs: Convert to use sysfs_emit() APIsWang Yufen1-29/+29
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Wang Yufen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30net: tun: Convert to use sysfs_emit() APIsWang Yufen1-7/+7
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Wang Yufen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30Merge branch 'net-tsnep-multiqueue'David S. Miller8-116/+693
Gerhard Engleder says: ==================== tsnep: multi queue support and some other improvements Add support for additional TX/RX queues along with RX flow classification support. Binding is extended to allow additional interrupts for additional TX/RX queues. Also dma-coherent is allowed as minor improvement. RX path optimisation is done by using page pool as preparations for future XDP support. v4: - rework dma-coherent commit message (Krzysztof Kozlowski) - fixed order of interrupt-names in binding (Krzysztof Kozlowski) - add line break between examples in binding (Krzysztof Kozlowski) - add RX_CLS_LOC_ANY support to RX flow classification v3: - now with changes in cover letter v2: - use netdev_name() (Jakub Kicinski) - use ENOENT if RX flow rule is not found (Jakub Kicinski) - eliminate return code of tsnep_add_rule() (Jakub Kicinski) - remove commit with lazy refill due to depletion problem (Jakub Kicinski) ==================== Signed-off-by: David S. Miller <[email protected]>
2022-09-30tsnep: Use page pool for RXGerhard Engleder3-68/+100
Use page pool for RX buffer handling. Makes RX path more efficient and is required prework for future XDP support. Signed-off-by: Gerhard Engleder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30tsnep: Add EtherType RX flow classification supportGerhard Engleder6-5/+403
Received Ethernet frames are assigned to first RX queue per default. Based on EtherType Ethernet frames can be assigned to other RX queues. This enables processing of real-time Ethernet protocols on dedicated RX queues. Add RX flow classification interface for EtherType based RX queue assignment. Signed-off-by: Gerhard Engleder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30tsnep: Support multiple TX/RX queue pairsGerhard Engleder3-11/+53
Support additional TX/RX queue pairs if dedicated interrupt is available. Interrupts are detected by name in device tree. Signed-off-by: Gerhard Engleder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30tsnep: Move interrupt from device to queueGerhard Engleder2-33/+99
For multiple queues multiple interrupts shall be used. Therefore, rework global interrupt to per queue interrupt. Every interrupt name shall contain interface name and queue information. To get a valid interface name, the interrupt request needs to by done during open like in other drivers. Additionally, this allows the removal of some initialisation checks in the interrupt handler. Signed-off-by: Gerhard Engleder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30dt-bindings: net: tsnep: Allow additional interruptsGerhard Engleder1-2/+39
Additional TX/RX queue pairs require dedicated interrupts. Extend binding with additional interrupts. Signed-off-by: Gerhard Engleder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30dt-bindings: net: tsnep: Allow dma-coherentGerhard Engleder1-0/+2
Within SoCs like ZynqMP, FPGA logic can be connected to different kinds of AXI master ports. Also cache coherent AXI master ports are available. The property "dma-coherent" is used to signal that DMA is cache coherent. Add "dma-coherent" property to allow the configuration of cache coherent DMA. Signed-off-by: Gerhard Engleder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-30Merge branch 'xfrm: add netlink extack to all the ->init_stat'Steffen Klassert13-87/+140
Sabrina Dubroca says: ============ This series completes extack support for state creation. ============ Signed-off-by: Steffen Klassert <[email protected]>
2022-09-29Merge branch '100GbE' of ↵Jakub Kicinski6-58/+93
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-28 (ice) Arkadiusz implements a single pin initialization function, checking feature bits, instead of having separate device functions and updates sub-device IDs for recognizing E810T devices. Martyna adds support for switchdev filters on VLAN priority field. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Add support for VLAN priority filters in switchdev ice: support features on new E810T variants ice: Merge pin initialization of E810 and E810T adapters ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net: phy: Convert to use sysfs_emit() APIsWang Yufen2-7/+7
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Wang Yufen <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29Merge branch 'add-tc-taprio-support-for-queuemaxsdu'Jakub Kicinski14-74/+433
Vladimir Oltean says: ==================== Add tc-taprio support for queueMaxSDU The tc-taprio offload mode supported by the Felix DSA driver has limitations surrounding its guard bands. The initial discussion was at: https://lore.kernel.org/netdev/[email protected]/ with the latest status being that we now have a vsc9959_tas_guard_bands_update() method which makes a best-guess attempt at how much useful space to reserve for packet scheduling in a taprio interval, and how much to reserve for guard bands. IEEE 802.1Q actually does offer a tunable variable (queueMaxSDU) which can determine the max MTU supported per traffic class. In turn we can determine the size we need for the guard bands, depending on the queueMaxSDU. This way we can make the guard band of small taprio intervals smaller than one full MTU worth of transmission time, if we know that said traffic class will transport only smaller packets. As discussed with Gerhard Engleder, the queueMaxSDU may also be useful in limiting the latency on an endpoint, if some of the TX queues are outside of the control of the Linux driver. https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/ Allow input of queueMaxSDU through netlink into tc-taprio, offload it to the hardware I have access to (LS1028A), and (implicitly) deny non-default values to everyone else. Kurt Kanzenbach has also kindly tested and shared a patch to offload this to hellcreek. v3 at: https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/ v2 at: https://patchwork.kernel.org/project/netdevbpf/list/?series=679954&state=* v1 at: https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net: enetc: offload per-tc max SDU from tc-taprioVladimir Oltean3-6/+57
The driver currently sets the PTCMSDUR register statically to the max MTU supported by the interface. Keep this logic if tc-taprio is absent or if the max_sdu for a traffic class is 0, and follow the requested max SDU size otherwise. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net: enetc: use common naming scheme for PTGCR and PTGCAPR registersVladimir Oltean2-12/+11
The Port Time Gating Control Register (PTGCR) and Port Time Gating Capability Register (PTGCAPR) have definitions in the driver which aren't in line with the other registers. Rename these. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net: enetc: cache accesses to &priv->si->hwVladimir Oltean3-48/+49
The &priv->si->hw construct dereferences 2 pointers and makes lines longer than they need to be, in turn making the code harder to read. Replace &priv->si->hw accesses with a "hw" variable when there are 2 or more accesses within a function that dereference this. This includes loops, since &priv->si->hw is a loop invariant. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net: dsa: hellcreek: Offload per-tc max SDU from tc-taprioKurt Kanzenbach2-2/+81
Add support for configuring the max SDU per priority and per port. If not specified, keep the default. Signed-off-by: Kurt Kanzenbach <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net: dsa: hellcreek: refactor hellcreek_port_setup_tc() to use switch/caseVladimir Oltean1-8/+12
The following patch will need to make this function also respond to TC_QUERY_BASE, so make the processing more structured around the tc_setup_type. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net: dsa: felix: offload per-tc max SDU from tc-taprioVladimir Oltean1-2/+35
Our current vsc9959_tas_guard_bands_update() algorithm has a limitation imposed by the hardware design. To avoid packet overruns between one gate interval and the next (which would add jitter for scheduled traffic in the next gate), we configure the switch to use guard bands. These are as large as the largest packet which is possible to be transmitted. The problem is that at tc-taprio intervals of sizes comparable to a guard band, there isn't an obvious place in which to split the interval between the useful portion (for scheduling) and the guard band portion (where scheduling is blocked). For example, a 10 us interval at 1Gbps allows 1225 octets to be transmitted. We currently split the interval between the bare minimum of 33 ns useful time (required to schedule a single packet) and the rest as guard band. But 33 ns of useful scheduling time will only allow a single packet to be sent, be that packet 1200 octets in size, or 60 octets in size. It is impossible to send 2 60 octets frames in the 10 us window. Except that if we reduced the guard band (and therefore the maximum allowable SDU size) to 5 us, the useful time for scheduling is now also 5 us, so more packets could be scheduled. The hardware inflexibility of not scheduling according to individual packet lengths must unfortunately propagate to the user, who needs to tune the queueMaxSDU values if he wants to fit more small packets into a 10 us interval, rather than one large packet. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net/sched: taprio: allow user input of per-tc max SDUVladimir Oltean3-1/+167
IEEE 802.1Q clause 12.29.1.1 "The queueMaxSDUTable structure and data types" and 8.6.8.4 "Enhancements for scheduled traffic" talk about the existence of a per traffic class limitation of maximum frame sizes, with a fallback on the port-based MTU. As far as I am able to understand, the 802.1Q Service Data Unit (SDU) represents the MAC Service Data Unit (MSDU, i.e. L2 payload), excluding any number of prepended VLAN headers which may be otherwise present in the MSDU. Therefore, the queueMaxSDU is directly comparable to the device MTU (1500 means L2 payload sizes are accepted, or frame sizes of 1518 octets, or 1522 plus one VLAN header). Drivers which offload this are directly responsible of translating into other units of measurement. To keep the fast path checks optimized, we keep 2 arrays in the qdisc, one for max_sdu translated into frame length (so that it's comparable to skb->len), and another for offloading and for dumping back to the user. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net/sched: query offload capabilities through ndo_setup_tc()Vladimir Oltean4-0/+26
When adding optional new features to Qdisc offloads, existing drivers must reject the new configuration until they are coded up to act on it. Since modifying all drivers in lockstep with the changes in the Qdisc can create problems of its own, it would be nice if there existed an automatic opt-in mechanism for offloading optional features. Jakub proposes that we multiplex one more kind of call through ndo_setup_tc(): one where the driver populates a Qdisc-specific capability structure. First user will be taprio in further changes. Here we are introducing the definitions for the base functionality. Link: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/ Suggested-by: Jakub Kicinski <[email protected]> Co-developed-by: Jakub Kicinski <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net/tipc: Remove unused struct distr_queue_itemYuan Can1-8/+0
After commit 09b5678c778f("tipc: remove dead code in tipc_net and relatives"), struct distr_queue_item is not used any more and can be removed as well. Signed-off-by: Yuan Can <[email protected]> Acked-by: Jon Maloy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net: skb: introduce and use a single page frag cachePaolo Abeni3-22/+104
After commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") we are observing 10-20% regressions in performance tests with small packets. The perf trace points to high pressure on the slab allocator. This change tries to improve the allocation schema for small packets using an idea originally suggested by Eric: a new per CPU page frag is introduced and used in __napi_alloc_skb to cope with small allocation requests. To ensure that the above does not lead to excessive truesize underestimation, the frag size for small allocation is inflated to 1K and all the above is restricted to build with 4K page size. Note that we need to update accordingly the run-time check introduced with commit fd9ea57f4e95 ("net: add napi_get_frags_check() helper"). Alex suggested a smart page refcount schema to reduce the number of atomic operations and deal properly with pfmemalloc pages. Under small packet UDP flood, I measure a 15% peak tput increases. Suggested-by: Eric Dumazet <[email protected]> Suggested-by: Alexander H Duyck <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Link: https://lore.kernel.org/r/6b6f65957c59f86a353fc09a5127e83a32ab5999.1664350652.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29net: sched: cls_u32: Avoid memcpy() false-positive warningKees Cook1-1/+5
To work around a misbehavior of the compiler's ability to see into composite flexible array structs (as detailed in the coming memcpy() hardening series[1]), use unsafe_memcpy(), as the sizing, bounds-checking, and allocation are all very tightly coupled here. This silences the false-positive reported by syzbot: memcpy: detected field-spanning write (size 80) of single field "&n->sel" at net/sched/cls_u32.c:1043 (size 16) [1] https://lore.kernel.org/linux-hardening/[email protected] Cc: Cong Wang <[email protected]> Cc: Jiri Pirko <[email protected]> Reported-by: [email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29dt-bindings: net: snps,dwmac: Document stmmac-axi-config subnodeMarek Vasut1-0/+54
The stmmac-axi-config subnode is present in multiple dwmac instance DTs, document its content per snps,axi-config property description which is a phandle to this subnode. Signed-off-by: Marek Vasut <[email protected]> Reviewed-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29docs: netlink: clarify the historical baggage of Netlink flagsJakub Kicinski2-14/+49
nlmsg_flags are full of historical baggage, inconsistencies and strangeness. Try to document it more thoroughly. Explain the meaning of the ECHO flag (and while at it clarify the comment in the uAPI). Handwave a little about the NEW request flags and how they make sense on the surface but cater to really old paradigm before commands were a thing. I will add more notes on how to make use of ECHO and discouragement for reuse of flags to the kernel-side documentation. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski249-1826/+1622
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29Bluetooth: L2CAP: Fix user-after-freeLuiz Augusto von Dentz1-0/+7
This uses l2cap_chan_hold_unless_zero() after calling __l2cap_get_chan_blah() to prevent the following trace: Bluetooth: l2cap_core.c:static void l2cap_chan_destroy(struct kref *kref) Bluetooth: chan 0000000023c4974d Bluetooth: parent 00000000ae861c08 ================================================================== BUG: KASAN: use-after-free in __mutex_waiter_is_first kernel/locking/mutex.c:191 [inline] BUG: KASAN: use-after-free in __mutex_lock_common kernel/locking/mutex.c:671 [inline] BUG: KASAN: use-after-free in __mutex_lock+0x278/0x400 kernel/locking/mutex.c:729 Read of size 8 at addr ffff888006a49b08 by task kworker/u3:2/389 Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Sungwoo Kim <[email protected]>
2022-09-29Merge branch 'bpf: Remove recursion check for struct_ops prog'Alexei Starovoitov8-24/+112
Martin KaFai Lau says: ==================== From: Martin KaFai Lau <[email protected]> The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. The kernel does not call the tcp-cc's ops in a recursive way, so this set is to remove the recursion check for struct_ops prog. v3: - Clear the bpf_chg_cc_inprogress from the newly cloned tcp_sock in tcp_create_openreq_child() because the listen sk can be cloned without lock being held. (Eric Dumazet) v2: - v1 [0] turned into a long discussion on a few cases and also whether it needs to follow the bpf_run_ctx chain if there is tracing bpf_run_ctx (kprobe/trace/trampoline) running in between. It is a good signal that it is not obvious enough to reason about it and needs a tradeoff for a more straight forward approach. This revision uses one bit out of an existing 1 byte hole in the tcp_sock. It is in Patch 4. [0]: https://lore.kernel.org/bpf/[email protected]/T/#md98d40ac5ec295fdadef476c227a3401b2b6b911 ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-29selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)Martin KaFai Lau2-8/+21
This patch changes the bpf_dctcp test to ensure the recurred bpf_setsockopt(TCP_CONGESTION) returns -EBUSY. Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-29bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itselfMartin KaFai Lau3-1/+34
When a bad bpf prog '.init' calls bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop: .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ... ... => .init => bpf_setsockopt(tcp_cc). It was prevented by the prog->active counter before but the prog->active detection cannot be used in struct_ops as explained in the earlier patch of the set. In this patch, the second bpf_setsockopt(tcp_cc) is not allowed in order to break the loop. This is done by using a bit of an existing 1 byte hole in tcp_sock to check if there is on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock. Note that this essentially limits only the first '.init' can call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer does not support ECN) and the second '.init' cannot fallback to another cc. This applies even the second bpf_setsockopt(TCP_CONGESTION) will not cause a loop. Signed-off-by: Martin KaFai Lau <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-29bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another functionMartin KaFai Lau1-17/+28
This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into another function. The next patch will add extra logic to avoid recursion and this will make the latter patch easier to follow. Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-29bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()Martin KaFai Lau1-6/+7
The check on the tcp-cc, "cdg", is done in the bpf_sk_setsockopt which is used by the bpf_tcp_ca, bpf_lsm, cg_sockopt, and tcp_iter hooks. However, it is not done for cg sock_ddr, cg sockops, and some of the bpf_lsm_cgroup hooks. The tcp-cc "cdg" should have very limited usage. This patch is to move the "cdg" check to the common sol_tcp_sockopt() so that all hooks have a consistent behavior. The motivation to make this check consistent now is because the latter patch will refactor the bpf_setsockopt(TCP_CONGESTION) into another function, so it is better to take this chance to refactor this piece also. Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-29bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampolineMartin KaFai Lau3-0/+30
The struct_ops prog is to allow using bpf to implement the functions in a struct (eg. kernel module). The current usage is to implement the tcp_congestion. The kernel does not call the tcp-cc's ops (ie. the bpf prog) in a recursive way. The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It is needed for tracing prog. However, it turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. Skip running the '.ssthresh' will end up returning random value to the caller. The patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops trampoline. They do not track the prog->active to detect recursion. One exception is when the tcp_congestion's '.init' ops is doing bpf_setsockopt(TCP_CONGESTION) and then recurs to the same '.init' ops. This will be addressed in the following patches. Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism") Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-29Merge tag 'net-6.0-rc8' of ↵Linus Torvalds32-191/+223
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wifi and can. Current release - regressions: - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume() - wifi: fix locking in mac80211 mlme - eth: - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()" - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe Previous releases - regressions: - wifi: fix regression with non-QoS drivers Previous releases - always broken: - mptcp: fix unreleased socket in accept queue - wifi: - don't start TX with fq->lock to fix deadlock - fix memory corruption in minstrel_ht_update_rates() - eth: - macb: fix ZynqMP SGMII non-wakeup source resume failure - mt7531: only do PLL once after the reset - usbnet: fix memory leak in usbnet_disconnect() Misc: - usb: qmi_wwan: add new usb-id for Dell branded EM7455" * tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits) mptcp: fix unreleased socket in accept queue mptcp: factor out __mptcp_close() without socket lock net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2} net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge can: c_can: don't cache TX messages for C_CAN cores ice: xsk: drop power of 2 ring size restriction for AF_XDP ice: xsk: change batched Tx descriptor cleaning net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455 selftests: Fix the if conditions of in test_extra_filter() net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume() net: stmmac: power up/down serdes in stmmac_open/release wifi: mac80211: mlme: Fix double unlock on assoc success handling wifi: mac80211: mlme: Fix missing unlock on beacon RX wifi: mac80211: fix memory corruption in minstrel_ht_update_rates() wifi: mac80211: fix regression with non-QoS drivers wifi: mac80211: ensure vif queues are operational after start wifi: mac80211: don't start TX with fq->lock to fix deadlock wifi: cfg80211: fix MCS divisor value net: hippi: Add missing pci_disable_device() in rr_init_one() net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe ...
2022-09-29Merge tag 'input-for-v6.0-rc7' of ↵Linus Torvalds4-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - small fixes for iqs62x-keys and melfas_mip4 drivers - corrected register address in snvs_pwrkey driver - Synaptic driver will stop trying to use intertouch (native) mode on some Lenovo AMD devices * tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address Input: synaptics - disable Intertouch for Lenovo T14 and P14s AMD G1 Input: iqs62x-keys - drop unused device node references Input: melfas_mip4 - fix return value check in mip4_probe()
2022-09-29Merge tag 'ata-6.0-rc7' of ↵Linus Torvalds5-21/+24
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA fixes from Damien Le Moal: "Three late patches to fix problems discovered recently: - Add a horkage to disable link power management by default for the Pioneer BDR-207M and BDR-205 DVD drives (from Niklas) - Two patches to fix setting the maximum queue depth of libsas owned ATA devices (from me)" * tag 'ata-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: libata-sata: Fix device queue depth control ata: libata-scsi: Fix initialization of device queue depth libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
2022-09-29Merge tag 'loongarch-fixes-6.0-3' of ↵Linus Torvalds3-15/+4
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Some trivial fixes and cleanup" * tag 'loongarch-fixes-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Clean up loongson3_smp_ops declaration LoongArch: Fix and cleanup csr_era handling in do_ri() LoongArch: Align the address of kernel_entry to 4KB
2022-09-29net: cpmac: Add __init/__exit annotations to module init/exit funcsruanjinjie1-2/+2
Add __init/__exit annotations to module init/exit funcs Signed-off-by: ruanjinjie <[email protected]> Acked-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-09-29ethernet: 8390: remove unnecessary check of memYang Yingliang1-2/+1
The 'mem' returned by platform_get_resource() has been checked in probe function, so it is no need do this check in remove function. Signed-off-by: Yang Yingliang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>