aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-05-16ipv6: add READ_ONCE(sk->sk_bound_dev_if) in INET6_MATCH()Eric Dumazet4-14/+24
INET6_MATCH() runs without holding a lock on the socket. We probably need to annotate most reads. This patch makes INET6_MATCH() an inline function to ease our changes. v2: inline function only defined if IS_ENABLED(CONFIG_IPV6) Change the name to inet6_match(), this is no longer a macro. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16l2tp: use add READ_ONCE() to fetch sk->sk_bound_dev_ifEric Dumazet2-4/+8
Use READ_ONCE() in paths not holding the socket lock. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net_sched: em_meta: add READ_ONCE() in var_sk_bound_if()Eric Dumazet1-2/+5
sk->sk_bound_dev_if can change under us, use READ_ONCE() annotation. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16inet: add READ_ONCE(sk->sk_bound_dev_if) in inet_csk_bind_conflict()Eric Dumazet1-4/+8
inet_csk_bind_conflict() can access sk->sk_bound_dev_if for unlocked sockets. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16dccp: use READ_ONCE() to read sk->sk_bound_dev_ifEric Dumazet2-3/+3
When reading listener sk->sk_bound_dev_if locklessly, we must use READ_ONCE(). Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: core: add READ_ONCE/WRITE_ONCE annotations for sk->sk_bound_dev_ifEric Dumazet1-4/+7
sock_bindtoindex_locked() needs to use WRITE_ONCE(sk->sk_bound_dev_if, val), because other cpus/threads might locklessly read this field. sock_getbindtodevice(), sock_getsockopt() need READ_ONCE() because they run without socket lock held. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16tcp: sk->sk_bound_dev_if once in inet_request_bound_dev_if()Eric Dumazet1-2/+3
inet_request_bound_dev_if() reads sk->sk_bound_dev_if twice while listener socket is not locked. Another cpu could change this field under us. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16sctp: read sk->sk_bound_dev_if once in sctp_rcv()Eric Dumazet1-1/+3
sctp_rcv() reads sk->sk_bound_dev_if twice while the socket is not locked. Another cpu could change this field under us. Fixes: 0fd9a65a76e8 ("[SCTP] Support SO_BINDTODEVICE socket option on incoming packets.") Signed-off-by: Eric Dumazet <[email protected]> Cc: Neil Horman <[email protected]> Cc: Vlad Yasevich <[email protected]> Cc: Marcelo Ricardo Leitner <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: annotate races around sk->sk_bound_dev_ifEric Dumazet4-11/+13
UDP sendmsg() is lockless, and reads sk->sk_bound_dev_if while this field can be changed by another thread. Adds minimal annotations to avoid KCSAN splats for UDP. Following patches will add more annotations to potential lockless readers. BUG: KCSAN: data-race in __ip6_datagram_connect / udpv6_sendmsg write to 0xffff888136d47a94 of 4 bytes by task 7681 on cpu 0: __ip6_datagram_connect+0x6e2/0x930 net/ipv6/datagram.c:221 ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272 inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576 __sys_connect_file net/socket.c:1900 [inline] __sys_connect+0x197/0x1b0 net/socket.c:1917 __do_sys_connect net/socket.c:1927 [inline] __se_sys_connect net/socket.c:1924 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:1924 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888136d47a94 of 4 bytes by task 7670 on cpu 1: udpv6_sendmsg+0xc60/0x16e0 net/ipv6/udp.c:1436 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:652 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00000000 -> 0xffffff9b Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 7670 Comm: syz-executor.3 Tainted: G W 5.18.0-rc1-syzkaller-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 I chose to not add Fixes: tag because race has minor consequences and stable teams busy enough. Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16Merge branch 'big-tcp'David S. Miller30-67/+321
Eric Dumazet says: ==================== tcp: BIG TCP implementation This series implements BIG TCP as presented in netdev 0x15: https://netdevconf.info/0x15/session.html?BIG-TCP Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/ Standard TSO/GRO packet limit is 64KB With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic. Note that this feature is by default not enabled, because it might break some eBPF programs assuming TCP header immediately follows IPv6 header. While tcpdump recognizes the HBH/Jumbo header, standard pcap filters are unable to skip over IPv6 extension headers. Reducing number of packets traversing networking stack usually improves performance, as shown on this experiment using a 100Gbit NIC, and 4K MTU. 'Standard' performance with current (74KB) limits. for i in {1..10}; do ./netperf -t TCP_RR -H iroa23 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done 77 138 183 8542.19 79 143 178 8215.28 70 117 164 9543.39 80 144 176 8183.71 78 126 155 9108.47 80 146 184 8115.19 71 113 165 9510.96 74 113 164 9518.74 79 137 178 8575.04 73 111 171 9561.73 Now enable BIG TCP on both hosts. ip link set dev eth0 gro_max_size 185000 gso_max_size 185000 for i in {1..10}; do ./netperf -t TCP_RR -H iroa23 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done 57 83 117 13871.38 64 118 155 11432.94 65 116 148 11507.62 60 105 136 12645.15 60 103 135 12760.34 60 102 134 12832.64 62 109 132 10877.68 58 82 115 14052.93 57 83 124 14212.58 57 82 119 14196.01 We see an increase of transactions per second, and lower latencies as well. v7: adopt unsafe_memcpy() in mlx5 to avoid FORTIFY warnings. v6: fix a compilation error for CONFIG_IPV6=n in "net: allow gso_max_size to exceed 65536", reported by kernel bots. v5: Replaced two patches (that were adding new attributes) with patches from Alexander Duyck. Idea is to reuse existing gso_max_size/gro_max_size v4: Rebased on top of Jakub series (Merge branch 'tso-gso-limit-split') max_tso_size is now family independent. v3: Fixed a typo in RFC number (Alexander) Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts. v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series. Addressed feedback, for Alexander and nvidia folks. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-05-16mlx5: support BIG TCP packetsEric Dumazet2-23/+89
mlx5 supports LSOv2. IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header with JUMBO TLV for big packets. We need to ignore/skip this HBH header when populating TX descriptor. Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only. v7: adopt unsafe_memcpy() and MLX5_UNSAFE_MEMCPY_DISCLAIMER v2: clear hopbyhop in mlx5e_tx_get_gso_ihs() v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y Signed-off-by: Coco Li <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Cc: Saeed Mahameed <[email protected]> Cc: Leon Romanovsky <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16mlx4: support BIG TCP packetsEric Dumazet2-9/+41
mlx4 supports LSOv2 just fine. IPv6 stack inserts a temporary Hop-by-Hop header with JUMBO TLV for big packets. We need to ignore the HBH header when populating TX descriptor. Tested: Before: (not enabling bigger TSO/GRO packets) ip link set dev eth0 gso_max_size 65536 gro_max_size 65536 netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000 MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind Local /Remote Socket Size Request Resp. Elapsed Trans. CPU CPU S.dem S.dem Send Recv Size Size Time Rate local remote local remote bytes bytes bytes bytes secs. per sec % S % S us/Tr us/Tr 262144 540000 70000 70000 10.00 6591.45 0.86 1.34 62.490 97.446 262144 540000 After: (enabling bigger TSO/GRO packets) ip link set dev eth0 gso_max_size 185000 gro_max_size 185000 netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000 MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind Local /Remote Socket Size Request Resp. Elapsed Trans. CPU CPU S.dem S.dem Send Recv Size Size Time Rate local remote local remote bytes bytes bytes bytes secs. per sec % S % S us/Tr us/Tr 262144 540000 70000 70000 10.00 8383.95 0.95 1.01 54.432 57.584 262144 540000 Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16veth: enable BIG TCP packetsEric Dumazet1-0/+1
Set the TSO driver limit to GSO_MAX_SIZE (512 KB). This allows the admin/user to set a GSO limit up to this value. ip link set dev veth10 gso_max_size 200000 Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: loopback: enable BIG TCP packetsEric Dumazet1-0/+2
Set the driver limit to GSO_MAX_SIZE (512 KB). This allows the admin/user to set a GSO limit up to this value. Tested: ip link set dev lo gso_max_size 200000 netperf -H ::1 -t TCP_RR -l 100 -- -r 80000,80000 & tcpdump shows : 18:28:42.962116 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626480001:3626560001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000 18:28:42.962138 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0 18:28:42.962152 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626560001:3626640001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000 18:28:42.962157 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0 18:28:42.962180 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626560001:3626640001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000 18:28:42.962214 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 0 18:28:42.962228 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626640001:3626720001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 80000 18:28:42.962233 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626720001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179266], length 0 Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ipv6: Add hop-by-hop header to jumbograms in ip6_outputCoco Li2-2/+21
Instead of simply forcing a 0 payload_len in IPv6 header, implement RFC 2675 and insert a custom extension header. Note that only TCP stack is currently potentially generating jumbograms, and that this extension header is purely local, it wont be sent on a physical link. This is needed so that packet capture (tcpdump and friends) can properly dissect these large packets. Signed-off-by: Coco Li <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: allow gro_max_size to exceed 65536Alexander Duyck6-12/+16
Allow the gro_max_size to exceed a value larger than 65536. There weren't really any external limitations that prevented this other than the fact that IPv4 only supports a 16 bit length field. Since we have the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to exceed this value and for IPv4 and non-TCP flows we can cap things at 65536 via a constant rather than relying on gro_max_size. [edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ipv6/gro: insert temporary HBH/jumbo headerEric Dumazet1-2/+30
Following patch will add GRO_IPV6_MAX_SIZE, allowing gro to build BIG TCP ipv6 packets (bigger than 64K). This patch changes ipv6_gro_complete() to insert a HBH/jumbo header so that resulting packet can go through IPv6/TCP stacks. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ipv6/gso: remove temporary HBH/jumbo headerEric Dumazet2-1/+56
ipv6 tcp and gro stacks will soon be able to build big TCP packets, with an added temporary Hop By Hop header. If GSO is involved for these large packets, we need to remove the temporary HBH header before segmentation happens. v2: perform HBH removal from ipv6_gso_segment() instead of skb_segment() (Alexander feedback) Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ipv6: add struct hop_jumbo_hdr definitionEric Dumazet1-0/+11
Following patches will need to add and remove local IPv6 jumbogram options to enable BIG TCP. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16tcp_cubic: make hystart_ack_delay() aware of BIG TCPEric Dumazet1-2/+2
hystart_ack_delay() had the assumption that a TSO packet would not be bigger than GSO_MAX_SIZE. This will no longer be true. We should use sk->sk_gso_max_size instead. This reduces chances of spurious Hystart ACK train detections. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: limit GSO_MAX_SIZE to 524280 bytesEric Dumazet1-2/+5
Make sure we will not overflow shinfo->gso_segs Minimal TCP MSS size is 8 bytes, and shinfo->gso_segs is a 16bit field. TCP_MIN_GSO_SIZE is currently defined in include/net/tcp.h, it seems cleaner to not bring tcp details into include/linux/netdevice.h Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: allow gso_max_size to exceed 65536Alexander Duyck16-17/+40
The code for gso_max_size was added originally to allow for debugging and workaround of buggy devices that couldn't support TSO with blocks 64K in size. The original reason for limiting it to 64K was because that was the existing limits of IPv4 and non-jumbogram IPv6 length fields. With the addition of Big TCP we can remove this limit and allow the value to potentially go up to UINT_MAX and instead be limited by the tso_max_size value. So in order to support this we need to go through and clean up the remaining users of the gso_max_size value so that the values will cap at 64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper limit for GSO_MAX_SIZE. v6: (edumazet) fixed a compile error if CONFIG_IPV6=n, in a new sk_trim_gso_size() helper. netif_set_tso_max_size() caps the requested TSO size with GSO_MAX_SIZE. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: add IFLA_TSO_{MAX_SIZE|SEGS} attributesEric Dumazet3-0/+10
New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS are used to report to user-space the device TSO limits. ip -d link sh dev eth1 ... tso_max_size 65536 tso_max_segs 65535 Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16Merge branch 'Renesas-RSZ-V2M-support'David S. Miller4-35/+168
Phil Edworthy says: ==================== Add Renesas RZ/V2M Ethernet support The RZ/V2M Ethernet is very similar to R-Car Gen3 Ethernet-AVB, though some small parts are the same as R-Car Gen2. Other differences are: * It has separate data (DI), error (Line 1) and management (Line 2) irqs rather than one irq for all three. * Instead of using the High-speed peripheral bus clock for gPTP, it has a separate gPTP reference clock. v4: * Add clk_disable_unprepare() for gptp ref clk v3: * Really renamed irq_en_dis_regs to irq_en_dis this time * Modified ravb_ptp_extts() to use irq_en_dis * Added Reviewed-by tags v2: * Just net patches in this series * Instead of reusing ch22 and ch24 interrupt names, use the proper names * Renamed irq_en_dis_regs to irq_en_dis * Squashed use of GIC reg versus GIE/GID and got rid of separate gptp_ptm_gic feature. * Move err_mgmt_irqs code under multi_irqs * Minor editing of the commit msgs ==================== Signed-off-by: David S. Miller <[email protected]>
2022-05-16ravb: Add support for RZ/V2MPhil Edworthy1-0/+26
RZ/V2M Ethernet is very similar to R-Car Gen3 Ethernet-AVB, though some small parts are the same as R-Car Gen2. Other differences to R-Car Gen3 and Gen2 are: * It has separate data (DI), error (Line 1) and management (Line 2) irqs rather than one irq for all three. * Instead of using the High-speed peripheral bus clock for gPTP, it has a separate gPTP reference clock. Signed-off-by: Phil Edworthy <[email protected]> Reviewed-by: Biju Das <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ravb: Use separate clock for gPTPPhil Edworthy2-3/+21
RZ/V2M has a separate gPTP reference clock that is used when the AVB-DMAC Mode Register (CCC) gPTP Clock Select (CSEL) bits are set to "01: High-speed peripheral bus clock". Therefore, add a feature that allows this clock to be used for gPTP. Signed-off-by: Phil Edworthy <[email protected]> Reviewed-by: Biju Das <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ravb: Support separate Line0 (Desc), Line1 (Err) and Line2 (Mgmt) irqsPhil Edworthy2-6/+53
R-Car has a combined interrupt line, ch22 = Line0_DiA | Line1_A | Line2_A. RZ/V2M has separate interrupt lines for each of these, so add a feature that allows the driver to get these interrupts and call the common handler. Signed-off-by: Phil Edworthy <[email protected]> Reviewed-by: Biju Das <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ravb: Separate handling of irq enable/disable regs into featurePhil Edworthy3-5/+7
Currently, when the HW has a single interrupt, the driver uses the GIC, TIC, RIC0 registers to enable and disable interrupts. When the HW has multiple interrupts, it uses the GIE, GID, TIE, TID, RIE0, RID0 registers. However, other devices, e.g. RZ/V2M, have multiple irqs and only have the GIC, TIC, RIC0 registers. Therefore, split this into a separate feature. Signed-off-by: Phil Edworthy <[email protected]> Reviewed-by: Biju Das <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16dt-bindings: net: renesas,etheravb: Document RZ/V2M SoCPhil Edworthy1-21/+61
Document the Ethernet AVB IP found on RZ/V2M SoC. It includes the Ethernet controller (E-MAC) and Dedicated Direct memory access controller (DMAC) for transferring transmitted Ethernet frames to and received Ethernet frames from respective storage areas in the RAM at high speed. The AVB-DMAC is compliant with IEEE 802.1BA, IEEE 802.1AS timing and synchronization protocol, IEEE 802.1Qav real-time transfer, and the IEEE 802.1Qat stream reservation protocol. R-Car has a pair of combined interrupt lines: ch22 = Line0_DiA | Line1_A | Line2_A ch23 = Line0_DiB | Line1_B | Line2_B Line0 for descriptor interrupts (which we call dia and dib). Line1 for error related interrupts (which we call err_a and err_b). Line2 for management and gPTP related interrupts (mgmt_a and mgmt_b). RZ/V2M hardware has separate interrupt lines for each of these. It has 3 clocks; the main AXI clock, the AMBA CHI (Coherent Hub Interface) clock and a gPTP reference clock. Signed-off-by: Phil Edworthy <[email protected]> Reviewed-by: Biju Das <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextDavid S. Miller23-386/+494
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next This is v2 including deadlock fix in conntrack ecache rework reported by Jakub Kicinski. The following patchset contains Netfilter updates for net-next, mostly updates to conntrack from Florian Westphal. 1) Add a dedicated list for conntrack event redelivery. 2) Include event redelivery list in conntrack dumps of dying type. 3) Remove per-cpu dying list for event redelivery, not used anymore. 4) Add netns .pre_exit to cttimeout to zap timeout objects before synchronize_rcu() call. 5) Remove nf_ct_unconfirmed_destroy. 6) Add generation id for conntrack extensions for conntrack timeout and helpers. 7) Detach timeout policy from conntrack on cttimeout module removal. 8) Remove __nf_ct_unconfirmed_destroy. 9) Remove unconfirmed list. 10) Remove unconditional local_bh_disable in init_conntrack(). 11) Consolidate conntrack iterator nf_ct_iterate_cleanup(). 12) Detect if ctnetlink listeners exist to short-circuit event path early. 13) Un-inline nf_ct_ecache_ext_add(). 14) Add nf_conntrack_events autodetect ctnetlink listener mode and make it default. 15) Add nf_ct_ecache_exist() to check for event cache extension. 16) Extend flowtable reverse route lookup to include source, iif, tos and mark, from Sven Auhagen. 17) Do not verify zero checksum UDP packets in nf_reject, from Kevin Mitchell. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-05-16Merge tag 'linux-can-fixes-for-5.18-20220514' of ↵David S. Miller3-65/+10
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-05-14 this is a pull request of 2 patches for net/master. Changes to linux-can-fixes-for-5.18-20220513: - adjusted Fixes: Tag on "Revert "can: m_can: pci: use custom bit timings for Elkhart Lake"" (Thanks Jakub) Both patches are by Jarkko Nikula, target the m_can PCI driver bindings, and fix usage of wrong bit timing constants for the Elkhart Lake platform. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-05-16mac80211: minstrel_ht: support ieee80211_rate_statusJonas Jelonek2-9/+133
This patch adds support for the new struct ieee80211_rate_status and its annotation in struct ieee80211_tx_status in minstrel_ht. In minstrel_ht_tx_status, a check for the presence of instances of the new struct in ieee80211_tx_status is added. Based on this, minstrel_ht then gets and updates internal rate stats with either struct ieee80211_rate_status or ieee80211_tx_info->status.rates. Adjusted variants of minstrel_ht_txstat_valid, minstrel_ht_get_stats, minstrel_{ht/vht}_get_group_idx are added which use struct ieee80211_rate_status and struct rate_info instead of the legacy structs. struct rate_info from cfg80211.h does not provide whether short preamble was used for the transmission. So we retrieve this information from VIF and STA configuration and cache it in a new flag in struct minstrel_ht_sta per rate control instance. Compile-Tested: current wireless-next tree with all flags on Tested-on: Xiaomi 4A Gigabit (MediaTek MT7603E, MT7612E) with OpenWrt Linux 5.10.113 Signed-off-by: Jonas Jelonek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: extend current rate control tx status APIJonas Jelonek4-45/+92
This patch adds the new struct ieee80211_rate_status and replaces 'struct rate_info *rate' in ieee80211_tx_status with pointer and length annotation. The struct ieee80211_rate_status allows to: (1) receive tx power status feedback for transmit power control (TPC) per packet or packet retry (2) dynamic mapping of wifi chip specific multi-rate retry (mrr) chains with different lengths (3) increase the limit of annotatable rate indices to support IEEE802.11ac rate sets and beyond ieee80211_tx_info, control and status buffer, and ieee80211_tx_rate cannot be used to achieve these goals due to fixed size limitations. Our new struct contains a struct rate_info to annotate the rate that was used, retry count of the rate and tx power. It is intended for all information related to RC and TPC that needs to be passed from driver to mac80211 and its RC/TPC algorithms like Minstrel_HT. It corresponds to one stage in an mrr. Multiple subsequent instances of this struct can be included in struct ieee80211_tx_status via a pointer and a length variable. Those instances can be allocated on-stack. The former reference to a single instance of struct rate_info is replaced with our new annotation. An extension is introduced to struct ieee80211_hw. There are two new members called 'tx_power_levels' and 'max_txpwr_levels_idx' acting as a tx power level table. When a wifi device is registered, the driver shall supply all supported power levels in this list. This allows to support several quirks like differing power steps in power level ranges or alike. TPC can use this for algorithm and thus be designed more abstract instead of handling all possible step widths individually. Further mandatory changes in status.c, mt76 and ath11k drivers due to the removal of 'struct rate_info *rate' are also included. status.c already uses the information in ieee80211_tx_status->rate in radiotap, this is now changed to use ieee80211_rate_status->rate_idx. mt76 driver already uses struct rate_info to pass the tx rate to status path. The new members of the ieee80211_tx_status are set to NULL and 0 because the previously passed rate is not relevant to rate control and accurate information is passed via tx_info->status.rates. For ath11k, the txrate can be passed via this struct because ath11k uses firmware RC and thus the information does not interfere with software RC. Compile-Tested: current wireless-next tree with all flags on Tested-on: Xiaomi 4A Gigabit (MediaTek MT7603E, MT7612E) with OpenWrt Linux 5.10.113 Signed-off-by: Jonas Jelonek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: minstrel_ht: fill all requested ratesPeter Seiderer1-7/+7
Fill all requested rates (in case of ath9k 4 rate slots are available, so fill all 4 instead of only 3), improves throughput in noisy environment. Signed-off-by: Peter Seiderer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: disable BSS color collision detection in case of no free colorsLavanya Suresh1-0/+9
AP may run out of BSS color after color collision detection event from driver. Disable BSS color collision detection if no free colors are available based on bss color disabled bit sent as a part of NL80211_ATTR_HE_BSS_COLOR attribute sent in NL80211_CMD_SET_BEACON. It can be reenabled once new color is available. Signed-off-by: Lavanya Suresh <[email protected]> Signed-off-by: Rameshkumar Sundaram <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-05-16nl80211: Parse NL80211_ATTR_HE_BSS_COLOR as a part of nl80211_parse_beaconRameshkumar Sundaram3-36/+39
NL80211_ATTR_HE_BSS_COLOR attribute can be included in both NL80211_CMD_START_AP and NL80211_CMD_SET_BEACON commands. Move he_bss_color from cfg80211_ap_settings to cfg80211_beacon_data and parse NL80211_ATTR_HE_BSS_COLOR as a part of nl80211_parse_beacon() to have bss color settings parsed for both start ap and set beacon commands. Add a new flag he_bss_color_valid to indicate whether NL80211_ATTR_HE_BSS_COLOR attribute is included. Signed-off-by: Rameshkumar Sundaram <[email protected]> Link: https://lore.kernel.org/r/[email protected] [fix build ...] Signed-off-by: Johannes Berg <[email protected]>
2022-05-16ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machineAndy Chi1-0/+1
The HP EliteBook 630 is using ALC236 codec which used 0x02 to control mute LED and 0x01 to control micmute LED. Therefore, add a quirk to make it works. Signed-off-by: Andy Chi <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2022-05-16xfrm: fix "disable_policy" flag use when arriving from different devicesEyal Birger3-6/+32
In IPv4 setting the "disable_policy" flag on a device means no policy should be enforced for traffic originating from the device. This was implemented by seting the DST_NOPOLICY flag in the dst based on the originating device. However, dsts are cached in nexthops regardless of the originating devices, in which case, the DST_NOPOLICY flag value may be incorrect. Consider the following setup: +------------------------------+ | ROUTER | +-------------+ | +-----------------+ | | ipsec src |----|-|ipsec0 | | +-------------+ | |disable_policy=0 | +----+ | | +-----------------+ |eth1|-|----- +-------------+ | +-----------------+ +----+ | | noipsec src |----|-|eth0 | | +-------------+ | |disable_policy=1 | | | +-----------------+ | +------------------------------+ Where ROUTER has a default route towards eth1. dst entries for traffic arriving from eth0 would have DST_NOPOLICY and would be cached and therefore can be reused by traffic originating from ipsec0, skipping policy check. Fix by setting a IPSKB_NOPOLICY flag in IPCB and observing it instead of the DST in IN/FWD IPv4 policy checks. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Shmulik Ladkani <[email protected]> Signed-off-by: Eyal Birger <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-05-16mac80211: mlme: track assoc_bss/associated separatelyJohannes Berg2-10/+13
We currently track whether we're associated and which the BSS is in the same variable (ifmgd->associated), but for MLD we'll need to move the BSS pointer to be per link, while the question whether we're associated or not is for the whole interface. Add ifmgd->assoc_bss that stores the pointer and change ifmgd->associated to be just a bool, so the question of whether we're associated can continue working after MLD rework, without requiring changes, while the BSS pointer will have to be changed/used checked per link. Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: remove useless bssid copyJohannes Berg1-3/+1
We don't need to copy this locally, we now only use the variable to print before doing other things. Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: remove unused argument to ieee80211_sta_connection_lost()Johannes Berg3-9/+7
We never use the bssid argument to ieee80211_sta_connection_lost() so we might as well just remove it. Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: mlme: use local SSID copyJohannes Berg1-12/+2
There's no need to look it up from the ifmgd->associated BSS configuration, we already maintain a local copy since commit b0140fda626e ("mac80211: mlme: save ssid info to ieee80211_bss_conf while assoc"). Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: use ifmgd->bssid instead of ifmgd->associated->bssidJohannes Berg5-21/+21
Since we always track the BSSID there when we get associated, these are equivalent, but ifmgd->bssid saves a dereference and thus makes the code a bit smaller, and more readable. Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: mlme: move in RSSI reporting codeJohannes Berg2-40/+40
This code is tightly coupled to the sdata->u.mgd data structure, so there's no reason for it to be in utils. Move it to mlme.c. Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: remove stray multi_sta_back_32bit docsJohannes Berg1-1/+0
This field doesn't exist, remove the docs for it. Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: fix typo in documentationJohannes Berg1-1/+1
This is called offload_flags, remove the extra 'a'. Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: unify CCMP/GCMP AAD constructionJohannes Berg1-56/+31
Ping-Ke's previous patch adjusted the CCMP AAD construction to properly take the order bit into account, but failed to update the (identical) GCMP AAD construction as well. Unify the AAD construction between the two cases. Reported-by: Jouni Malinen <[email protected]> Link: https://lore.kernel.org/r/20220506105150.51d66e2a6f3c.I65f12be82c112365169e8a9f48c7a71300e814b9@changeid Signed-off-by: Johannes Berg <[email protected]>
2022-05-15Linux 5.18-rc7Linus Torvalds1-1/+1
2022-05-15Merge tag 'driver-core-5.18-rc7' of ↵Linus Torvalds2-3/+21
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here is one fix, and three documentation updates for 5.18-rc7. The fix is for the firmware loader which resolves a long-reported problem where the credentials of the firmware loader could be set to a userspace process without enough permissions to actually load the firmware image. Many Android vendors have been reporting this for quite some time. The documentation updates are for the embargoed-hardware-issues.rst file to add a new entry, change an existing one, and sort the list to make changes easier in the future. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Documentation/process: Update ARM contact for embargoed hardware issues Documentation/process: Add embargoed HW contact for Ampere Computing Documentation/process: Make groups alphabetical and use tabs consistently firmware_loader: use kernel credentials when reading firmware
2022-05-15Merge tag 'char-misc-5.18-rc7' of ↵Linus Torvalds2-3/+9
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are two small driver fixes for 5.18-rc7 that resolve reported problems: - slimbus driver irq bugfix - interconnect sync state bugfix Both of these have been in linux-next with no reported problems" * tag 'char-misc-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: slimbus: qcom: Fix IRQ check in qcom_slim_probe interconnect: Restore sync state by ignoring ipa-virt in provider count