aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-05-19Merge tag 'linux-can-next-for-5.19-20220519' of ↵Jakub Kicinski2-61/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-05-19 Oliver Hartkopp contributes a patch for the ISO-TP CAN protocol to update the validation of address information during bind. The next patch is by Jakub Kicinski and converts the CAN network drivers from netif_napi_add() to the netif_napi_add_weight() function. Another patch by Oliver Hartkopp removes obsolete CAN specific LED support. Vincent Mailhol's patch for the mcp251xfd driver fixes a -Wunaligned-access warning by clang-14. * tag 'linux-can-next-for-5.19-20220519' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: mcp251xfd: silence clang's -Wunaligned-access warning can: can-dev: remove obsolete CAN LED support can: can-dev: move to netif_napi_add_weight() can: isotp: isotp_bind(): do not validate unused address information ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-19can: can-dev: remove obsolete CAN LED supportOliver Hartkopp2-61/+0
Since commit 30f3b42147ba6f ("can: mark led trigger as broken") the CAN specific LED support was disabled and marked as BROKEN. As the common LED support with CONFIG_LEDS_TRIGGER_NETDEV should do this work now the code can be removed as preparation for a CAN netdevice Kconfig rework. Link: https://lore.kernel.org/all/[email protected] Suggested-by: Vincent Mailhol <[email protected]> Signed-off-by: Oliver Hartkopp <[email protected]> [mkl: remove led.h from MAINTAINERS] Signed-off-by: Marc Kleine-Budde <[email protected]>
2022-05-19Merge tag 'wireless-next-2022-05-19' of ↵Jakub Kicinski3-11/+41
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v5.19 Second set of patches for v5.19 and most likely the last one. rtw89 got support for 8852ce devices and mt76 now supports Wireless Ethernet Dispatch. Major changes: cfg80211/mac80211 - support disabling EHT mode rtw89 - add support for Realtek 8852ce devices mt76 - Wireless Ethernet Dispatch support for flow offload - non-standard VHT MCS10-11 support - mt7921 AP mode support - mt7921 ipv6 NS offload support ath11k - enable keepalive during WoWLAN suspend - implement remain-on-channel support * tag 'wireless-next-2022-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (135 commits) iwlwifi: mei: fix potential NULL-ptr deref iwlwifi: mei: clear the sap data header before sending iwlwifi: mvm: remove vif_count iwlwifi: mvm: always tell the firmware to accept MCAST frames in BSS iwlwifi: mvm: add OTP info in case of init failure iwlwifi: mvm: fix assert 1F04 upon reconfig iwlwifi: fw: init SAR GEO table only if data is present iwlwifi: mvm: clean up authorized condition iwlwifi: mvm: use NULL instead of ERR_PTR when parsing wowlan status iwlwifi: pcie: simplify MSI-X cause mapping rtw89: pci: only mask out INT indicator register for disable interrupt v1 rtw89: convert rtw89_band to nl80211_band precisely rtw89: 8852c: update txpwr tables to HALRF_027_00_052 rtw89: cfo: check mac_id to avoid out-of-bounds rtw89: 8852c: set TX antenna path rtw89: add ieee80211::sta_rc_update ops wireless: Fix Makefile to be in alphabetical order mac80211: refactor freeing the next_beacon cfg80211: fix kernel-doc for cfg80211_beacon_data mac80211: minstrel_ht: support ieee80211_rate_status ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski8-9/+23
drivers/net/ethernet/mellanox/mlx5/core/main.c b33886971dbc ("net/mlx5: Initialize flow steering during driver probe") 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support") f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table") https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/ 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") https://lore.kernel.org/all/[email protected]/ tools/testing/selftests/net/mptcp/mptcp_join.sh e274f7154008 ("selftests: mptcp: add subflow limits test-cases") b6e074e171bc ("selftests: mptcp: add infinite map testcase") 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type") https://lore.kernel.org/all/[email protected]/ net/mptcp/options.c ba2c89e0ea74 ("mptcp: fix checksum byte order") 1e39e5a32ad7 ("mptcp: infinite mapping sending") ea66758c1795 ("tcp: allow MPTCP to update the announced window") https://lore.kernel.org/all/[email protected]/ net/mptcp/pm.c 95d686517884 ("mptcp: fix subflow accounting on close") 4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") https://lore.kernel.org/all/[email protected]/ net/mptcp/subflow.c ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure") 0348c690ed37 ("mptcp: add the fallback check") f8d4bcacff3b ("mptcp: infinite mapping receiving") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-19Merge tag 'net-5.18-rc8' of ↵Linus Torvalds4-3/+17
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, xfrm and netfilter subtrees. Notably this reverts a recent TCP/DCCP netns-related change to address a possible UaF. Current release - regressions: - tcp: revert "tcp/dccp: get rid of inet_twsk_purge()" - xfrm: set dst dev to blackhole_netdev instead of loopback_dev in ifdown Previous releases - regressions: - netfilter: flowtable: fix TCP flow teardown - can: revert "can: m_can: pci: use custom bit timings for Elkhart Lake" - xfrm: check encryption module availability consistency - eth: vmxnet3: fix possible use-after-free bugs in vmxnet3_rq_alloc_rx_buf() - eth: mlx5: initialize flow steering during driver probe - eth: ice: fix crash when writing timestamp on RX rings Previous releases - always broken: - mptcp: fix checksum byte order - eth: lan966x: fix assignment of the MAC address - eth: mlx5: remove HW-GRO from reported features - eth: ftgmac100: disable hardware checksum on AST2600" * tag 'net-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) net: bridge: Clear offload_fwd_mark when passing frame up bridge interface. ptp: ocp: change sysfs attr group handling selftests: forwarding: fix missing backslash netfilter: nf_tables: disable expression reduction infra netfilter: flowtable: move dst_check to packet path netfilter: flowtable: fix TCP flow teardown net: ftgmac100: Disable hardware checksum on AST2600 igb: skip phy status check where unavailable nfc: pn533: Fix buggy cleanup order mptcp: Do TCP fallback on early DSS checksum failure mptcp: fix checksum byte order net: af_key: check encryption module availability consistency net: af_key: add check for pfkey_broadcast in function pfkey_process net/mlx5: Drain fw_reset when removing device net/mlx5e: CT: Fix setting flow_source for smfs ct tuples net/mlx5e: CT: Fix support for GRE tuples net/mlx5e: Remove HW-GRO from reported features net/mlx5e: Properly block HW GRO when XDP is enabled net/mlx5e: Properly block LRO when XDP is enabled net/mlx5e: Block rx-gro-hw feature in switchdev mode ...
2022-05-19tls: Add opt-in zerocopy mode of sendfile()Boris Pismenny2-0/+3
TLS device offload copies sendfile data to a bounce buffer before transmitting. It allows to maintain the valid MAC on TLS records when the file contents change and a part of TLS record has to be retransmitted on TCP level. In many common use cases (like serving static files over HTTPS) the file contents are not changed on the fly. In many use cases breaking the connection is totally acceptable if the file is changed during transmission, because it would be received corrupted in any case. This commit allows to optimize performance for such use cases to providing a new optional mode of TLS sendfile(), in which the extra copy is skipped. Removing this copy improves performance significantly, as TLS and TCP sendfile perform the same operations, and the only overhead is TLS header/trailer insertion. The new mode can only be enabled with the new socket option named TLS_TX_ZEROCOPY_SENDFILE on per-socket basis. It preserves backwards compatibility with existing applications that rely on the copying behavior. The new mode is safe, meaning that unsolicited modifications of the file being sent can't break integrity of the kernel. The worst thing that can happen is sending a corrupted TLS record, which is in any case not forbidden when using regular TCP sockets. Sockets other than TLS device offload are not affected by the new socket option. The actual status of zerocopy sendfile can be queried with sock_diag. Performance numbers in a single-core test with 24 HTTPS streams on nginx, under 100% CPU load: * non-zerocopy: 33.6 Gbit/s * zerocopy: 79.92 Gbit/s CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz Signed-off-by: Boris Pismenny <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-05-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski1-1/+1
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Reduce number of hardware offload retries from flowtable datapath which might hog system with retries, from Felix Fietkau. 2) Skip neighbour lookup for PPPoE device, fill_forward_path() already provides this and set on destination address from fill_forward_path for PPPoE device, also from Felix. 4) When combining PPPoE on top of a VLAN device, set info->outdev to the PPPoE device so software offload works, from Felix. 5) Fix TCP teardown flowtable state, races with conntrack gc might result in resetting the state to ESTABLISHED and the time to one day. Joint work with Oz Shlomo and Sven Auhagen. 6) Call dst_check() from flowtable datapath to check if dst is stale instead of doing it from garbage collector path. 7) Disable register tracking infrastructure, either user-space or kernel need to pre-fetch keys inconditionally, otherwise register tracking assumes data is already available in register that might not well be there, leading to incorrect reductions. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: disable expression reduction infra netfilter: flowtable: move dst_check to packet path netfilter: flowtable: fix TCP flow teardown netfilter: nft_flow_offload: fix offload with pppoe + vlan net: fix dev_fill_forward_path with pppoe + bridge netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices netfilter: flowtable: fix excessive hw offload attempts after failure ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-18Merge tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull io_uring fixes from Jens Axboe: "Two small changes fixing issues from the 5.18 merge window: - Fix wrong ordering of a tracepoint (Dylan) - Fix MSG_RING on IOPOLL rings (me)" * tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block: io_uring: don't attempt to IOPOLL for MSG_RING requests io_uring: fix ordering of args in io_uring_queue_async_work
2022-05-18Merge branch 'master' of ↵David S. Miller2-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-05-18 1) Fix "disable_policy" flag use when arriving from different devices. From Eyal Birger. 2) Fix error handling of pfkey_broadcast in function pfkey_process. From Jiasheng Jiang. 3) Check the encryption module availability consistency in pfkey. From Thomas Bartschies. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-05-17net/mlx5: Support multiport eswitch modeEli Cohen1-2/+3
Multiport eswitch mode is a LAG mode that allows to add rules that forward traffic to a specific physical port without being affected by LAG affinity configuration. This mode of operation is mutual exclusive with the other LAG modes used by multipath and bonding. To make the transition between the modes, we maintain a counter on the number of rules specifying one of the uplink representors as the target of mirred egress redirect action. An example of such rule would be: $ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \ 00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0 If the reference count just grows to one and LAG is not in use, we create the LAG in multiport eswitch mode. Other mode changes are not allowed while in this mode. When the reference count reaches zero, we destroy the LAG and let other modes be used if needed. logic also changed such that if forwarding to some uplink destination cannot be guaranteed, we fail the operation so the rule will eventually be in software and not in hardware. Signed-off-by: Eli Cohen <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-17net/mlx5: Inline db alloc API functionTariq Toukan1-1/+6
Take the wrapper version which picks default node into a header file. This reduces the number of exported functions. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-17net/mlx5: Add last command failure syndrome to debugfsMoshe Shemesh1-0/+2
Add syndrome of last command failure per command type to debugfs to ease debugging of such failure. last_failed_syndrome - last command failed syndrome returned by FW. Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-17ptp: ptp_clockmatrix: Add PTP_CLK_REQ_EXTTS supportMin Li1-1/+11
Use TOD_READ_SECONDARY for extts to keep TOD_READ_PRIMARY for gettime and settime exclusively. Before this change, TOD_READ_PRIMARY was used for both extts and gettime/settime, which would result in changing TOD read/write triggers between operations. Using TOD_READ_SECONDARY would make extts independent of gettime/settime operation Signed-off-by: Min Li <[email protected]> Acked-by: Richard Cochran <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-17audit,io_uring,io-wq: call __audit_uring_exit for dummy contextsJulian Orth1-1/+1
Not calling the function for dummy contexts will cause the context to not be reset. During the next syscall, this will cause an error in __audit_syscall_entry: WARN_ON(context->context != AUDIT_CTX_UNUSED); WARN_ON(context->name_count); if (context->context != AUDIT_CTX_UNUSED || context->name_count) { audit_panic("unrecoverable error in audit_syscall_entry()"); return; } These problematic dummy contexts are created via the following call chain: exit_to_user_mode_prepare -> arch_do_signal_or_restart -> get_signal -> task_work_run -> tctx_task_work -> io_req_task_submit -> io_issue_sqe -> audit_uring_entry Cc: [email protected] Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") Signed-off-by: Julian Orth <[email protected]> [PM: subject line tweaks] Signed-off-by: Paul Moore <[email protected]>
2022-05-17cfg80211: fix kernel-doc for cfg80211_beacon_dataJohannes Berg1-1/+1
The kernel-doc comment is formatted badly, resulting in a warning: include/net/cfg80211.h:1188: warning: bad line: [...] Fix that. Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2022-05-16Merge tag 'linux-can-next-for-5.19-20220516' of ↵Jakub Kicinski1-12/+13
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-05-16 the first 2 patches are by me and target the CAN raw protocol. The 1st removes an unneeded assignment, the other one adds support for SO_TXTIME/SCM_TXTIME. Oliver Hartkopp contributes 2 patches for the ISOTP protocol. The 1st adds support for transmission without flow control, the other let's bind() return an error on incorrect CAN ID formatting. Geert Uytterhoeven contributes a patch to clean up ctucanfd's Kconfig file. Vincent Mailhol's patch for the slcan driver uses the proper function to check for invalid CAN frames in the xmit callback. The next patch is by Geert Uytterhoeven and makes the interrupt-names of the renesas,rcar-canfd dt bindings mandatory. A patch by my update the ctucanfd dt bindings to include the common CAN controller bindings. The last patch is by Akira Yokosawa and fixes a breakage the ctucanfd's documentation. * tag 'linux-can-next-for-5.19-20220516' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: docs: ctucanfd: Use 'kernel-figure' directive instead of 'figure' dt-bindings: can: ctucanfd: include common CAN controller bindings dt-bindings: can: renesas,rcar-canfd: Make interrupt-names required can: slcan: slc_xmit(): use can_dropped_invalid_skb() instead of manual check can: ctucanfd: Let users select instead of depend on CAN_CTUCANFD can: isotp: isotp_bind(): return -EINVAL on incorrect CAN ID formatting can: isotp: add support for transmission without flow control can: raw: add support for SO_TXTIME/SCM_TXTIME can: raw: raw_sendmsg(): remove not needed setting of skb->sk ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-16net: skb: Remove skb_data_area_size()Ricardo Martinez1-5/+0
skb_data_area_size() is not needed. As Jakub pointed out [1]: For Rx, drivers can use the size passed during skb allocation or use skb_tailroom(). For Tx, drivers should use skb_headlen(). [1] https://lore.kernel.org/netdev/CAHNKnsTmH-rGgWi3jtyC=ktM1DW2W1VJkYoTMJV2Z_Bt498bsg@mail.gmail.com/ Signed-off-by: Ricardo Martinez <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Sergey Ryazanov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-16can: isotp: add support for transmission without flow controlOliver Hartkopp1-12/+13
Usually the ISO 15765-2 protocol is a point-to-point protocol to transfer segmented PDUs to a dedicated receiver. This receiver sends a flow control message to specify protocol options and timings (e.g. block size / STmin). The so called functional addressing communication allows a 1:N communication but is limited to a single frame length. This new CAN_ISOTP_CF_BROADCAST allows an unconfirmed 1:N communication with PDU length that would not fit into a single frame. This feature is not covered by the ISO 15765-2 standard. Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Oliver Hartkopp <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2022-05-16net: fix dev_fill_forward_path with pppoe + bridgeFelix Fietkau1-1/+1
When calling dev_fill_forward_path on a pppoe device, the provided destination address is invalid. In order for the bridge fdb lookup to succeed, the pppoe code needs to update ctx->daddr to the correct value. Fix this by storing the address inside struct net_device_path_ctx Fixes: f6efc675c9dd ("net: ppp: resolve forwarding path for bridge pppoe devices") Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-16net: fix possible race in skb_attempt_defer_free()Eric Dumazet1-0/+1
A cpu can observe sd->defer_count reaching 128, and call smp_call_function_single_async() Problem is that the remote CPU can clear sd->defer_count before the IPI is run/acknowledged. Other cpus can queue more packets and also decide to call smp_call_function_single_async() while the pending IPI was not yet delivered. This is a common issue with smp_call_function_single_async(). Callers must ensure correct synchronization and serialization. I triggered this issue while experimenting smaller threshold. Performing the call to smp_call_function_single_async() under sd->defer_lock protection did not solve the problem. Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async() to insert locked csd") replaced an informative WARN_ON_ONCE() with a return of -EBUSY, which is often ignored. Test of CSD_FLAG_LOCK presence is racy anyway. Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: skb: change the definition SKB_DR_SET()Menglong Dong1-1/+2
The SKB_DR_OR() is used to set the drop reason to a value when it is not set or specified yet. SKB_NOT_DROPPED_YET should also be considered as not set. Reviewed-by: Jiang Biao <[email protected]> Reviewed-by: Hao Peng <[email protected]> Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16inet: rename INET_MATCH()Eric Dumazet1-1/+1
This is no longer a macro, but an inlined function. INET_MATCH() -> inet_match() Signed-off-by: Eric Dumazet <[email protected]> Suggested-by: Olivier Hartkopp <[email protected]> Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ipv6: add READ_ONCE(sk->sk_bound_dev_if) in INET6_MATCH()Eric Dumazet1-9/+19
INET6_MATCH() runs without holding a lock on the socket. We probably need to annotate most reads. This patch makes INET6_MATCH() an inline function to ease our changes. v2: inline function only defined if IS_ENABLED(CONFIG_IPV6) Change the name to inet6_match(), this is no longer a macro. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16tcp: sk->sk_bound_dev_if once in inet_request_bound_dev_if()Eric Dumazet1-2/+3
inet_request_bound_dev_if() reads sk->sk_bound_dev_if twice while listener socket is not locked. Another cpu could change this field under us. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: annotate races around sk->sk_bound_dev_ifEric Dumazet2-3/+4
UDP sendmsg() is lockless, and reads sk->sk_bound_dev_if while this field can be changed by another thread. Adds minimal annotations to avoid KCSAN splats for UDP. Following patches will add more annotations to potential lockless readers. BUG: KCSAN: data-race in __ip6_datagram_connect / udpv6_sendmsg write to 0xffff888136d47a94 of 4 bytes by task 7681 on cpu 0: __ip6_datagram_connect+0x6e2/0x930 net/ipv6/datagram.c:221 ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272 inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576 __sys_connect_file net/socket.c:1900 [inline] __sys_connect+0x197/0x1b0 net/socket.c:1917 __do_sys_connect net/socket.c:1927 [inline] __se_sys_connect net/socket.c:1924 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:1924 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888136d47a94 of 4 bytes by task 7670 on cpu 1: udpv6_sendmsg+0xc60/0x16e0 net/ipv6/udp.c:1436 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:652 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00000000 -> 0xffffff9b Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 7670 Comm: syz-executor.3 Tainted: G W 5.18.0-rc1-syzkaller-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 I chose to not add Fixes: tag because race has minor consequences and stable teams busy enough. Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ipv6: Add hop-by-hop header to jumbograms in ip6_outputCoco Li1-0/+1
Instead of simply forcing a 0 payload_len in IPv6 header, implement RFC 2675 and insert a custom extension header. Note that only TCP stack is currently potentially generating jumbograms, and that this extension header is purely local, it wont be sent on a physical link. This is needed so that packet capture (tcpdump and friends) can properly dissect these large packets. Signed-off-by: Coco Li <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: allow gro_max_size to exceed 65536Alexander Duyck2-2/+6
Allow the gro_max_size to exceed a value larger than 65536. There weren't really any external limitations that prevented this other than the fact that IPv4 only supports a 16 bit length field. Since we have the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to exceed this value and for IPv4 and non-TCP flows we can cap things at 65536 via a constant rather than relying on gro_max_size. [edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ipv6/gso: remove temporary HBH/jumbo headerEric Dumazet1-0/+33
ipv6 tcp and gro stacks will soon be able to build big TCP packets, with an added temporary Hop By Hop header. If GSO is involved for these large packets, we need to remove the temporary HBH header before segmentation happens. v2: perform HBH removal from ipv6_gso_segment() instead of skb_segment() (Alexander feedback) Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16ipv6: add struct hop_jumbo_hdr definitionEric Dumazet1-0/+11
Following patches will need to add and remove local IPv6 jumbogram options to enable BIG TCP. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: limit GSO_MAX_SIZE to 524280 bytesEric Dumazet1-2/+5
Make sure we will not overflow shinfo->gso_segs Minimal TCP MSS size is 8 bytes, and shinfo->gso_segs is a 16bit field. TCP_MIN_GSO_SIZE is currently defined in include/net/tcp.h, it seems cleaner to not bring tcp details into include/linux/netdevice.h Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: allow gso_max_size to exceed 65536Alexander Duyck1-1/+3
The code for gso_max_size was added originally to allow for debugging and workaround of buggy devices that couldn't support TSO with blocks 64K in size. The original reason for limiting it to 64K was because that was the existing limits of IPv4 and non-jumbogram IPv6 length fields. With the addition of Big TCP we can remove this limit and allow the value to potentially go up to UINT_MAX and instead be limited by the tso_max_size value. So in order to support this we need to go through and clean up the remaining users of the gso_max_size value so that the values will cap at 64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper limit for GSO_MAX_SIZE. v6: (edumazet) fixed a compile error if CONFIG_IPV6=n, in a new sk_trim_gso_size() helper. netif_set_tso_max_size() caps the requested TSO size with GSO_MAX_SIZE. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16net: add IFLA_TSO_{MAX_SIZE|SEGS} attributesEric Dumazet1-0/+2
New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS are used to report to user-space the device TSO limits. ip -d link sh dev eth1 ... tso_max_size 65536 tso_max_segs 65535 Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextDavid S. Miller8-80/+70
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next This is v2 including deadlock fix in conntrack ecache rework reported by Jakub Kicinski. The following patchset contains Netfilter updates for net-next, mostly updates to conntrack from Florian Westphal. 1) Add a dedicated list for conntrack event redelivery. 2) Include event redelivery list in conntrack dumps of dying type. 3) Remove per-cpu dying list for event redelivery, not used anymore. 4) Add netns .pre_exit to cttimeout to zap timeout objects before synchronize_rcu() call. 5) Remove nf_ct_unconfirmed_destroy. 6) Add generation id for conntrack extensions for conntrack timeout and helpers. 7) Detach timeout policy from conntrack on cttimeout module removal. 8) Remove __nf_ct_unconfirmed_destroy. 9) Remove unconfirmed list. 10) Remove unconditional local_bh_disable in init_conntrack(). 11) Consolidate conntrack iterator nf_ct_iterate_cleanup(). 12) Detect if ctnetlink listeners exist to short-circuit event path early. 13) Un-inline nf_ct_ecache_ext_add(). 14) Add nf_conntrack_events autodetect ctnetlink listener mode and make it default. 15) Add nf_ct_ecache_exist() to check for event cache extension. 16) Extend flowtable reverse route lookup to include source, iif, tos and mark, from Sven Auhagen. 17) Do not verify zero checksum UDP packets in nf_reject, from Kevin Mitchell. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-05-16mac80211: extend current rate control tx status APIJonas Jelonek1-2/+31
This patch adds the new struct ieee80211_rate_status and replaces 'struct rate_info *rate' in ieee80211_tx_status with pointer and length annotation. The struct ieee80211_rate_status allows to: (1) receive tx power status feedback for transmit power control (TPC) per packet or packet retry (2) dynamic mapping of wifi chip specific multi-rate retry (mrr) chains with different lengths (3) increase the limit of annotatable rate indices to support IEEE802.11ac rate sets and beyond ieee80211_tx_info, control and status buffer, and ieee80211_tx_rate cannot be used to achieve these goals due to fixed size limitations. Our new struct contains a struct rate_info to annotate the rate that was used, retry count of the rate and tx power. It is intended for all information related to RC and TPC that needs to be passed from driver to mac80211 and its RC/TPC algorithms like Minstrel_HT. It corresponds to one stage in an mrr. Multiple subsequent instances of this struct can be included in struct ieee80211_tx_status via a pointer and a length variable. Those instances can be allocated on-stack. The former reference to a single instance of struct rate_info is replaced with our new annotation. An extension is introduced to struct ieee80211_hw. There are two new members called 'tx_power_levels' and 'max_txpwr_levels_idx' acting as a tx power level table. When a wifi device is registered, the driver shall supply all supported power levels in this list. This allows to support several quirks like differing power steps in power level ranges or alike. TPC can use this for algorithm and thus be designed more abstract instead of handling all possible step widths individually. Further mandatory changes in status.c, mt76 and ath11k drivers due to the removal of 'struct rate_info *rate' are also included. status.c already uses the information in ieee80211_tx_status->rate in radiotap, this is now changed to use ieee80211_rate_status->rate_idx. mt76 driver already uses struct rate_info to pass the tx rate to status path. The new members of the ieee80211_tx_status are set to NULL and 0 because the previously passed rate is not relevant to rate control and accurate information is passed via tx_info->status.rates. For ath11k, the txrate can be passed via this struct because ath11k uses firmware RC and thus the information does not interfere with software RC. Compile-Tested: current wireless-next tree with all flags on Tested-on: Xiaomi 4A Gigabit (MediaTek MT7603E, MT7612E) with OpenWrt Linux 5.10.113 Signed-off-by: Jonas Jelonek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-05-16nl80211: Parse NL80211_ATTR_HE_BSS_COLOR as a part of nl80211_parse_beaconRameshkumar Sundaram1-2/+5
NL80211_ATTR_HE_BSS_COLOR attribute can be included in both NL80211_CMD_START_AP and NL80211_CMD_SET_BEACON commands. Move he_bss_color from cfg80211_ap_settings to cfg80211_beacon_data and parse NL80211_ATTR_HE_BSS_COLOR as a part of nl80211_parse_beacon() to have bss color settings parsed for both start ap and set beacon commands. Add a new flag he_bss_color_valid to indicate whether NL80211_ATTR_HE_BSS_COLOR attribute is included. Signed-off-by: Rameshkumar Sundaram <[email protected]> Link: https://lore.kernel.org/r/[email protected] [fix build ...] Signed-off-by: Johannes Berg <[email protected]>
2022-05-16xfrm: fix "disable_policy" flag use when arriving from different devicesEyal Birger2-1/+14
In IPv4 setting the "disable_policy" flag on a device means no policy should be enforced for traffic originating from the device. This was implemented by seting the DST_NOPOLICY flag in the dst based on the originating device. However, dsts are cached in nexthops regardless of the originating devices, in which case, the DST_NOPOLICY flag value may be incorrect. Consider the following setup: +------------------------------+ | ROUTER | +-------------+ | +-----------------+ | | ipsec src |----|-|ipsec0 | | +-------------+ | |disable_policy=0 | +----+ | | +-----------------+ |eth1|-|----- +-------------+ | +-----------------+ +----+ | | noipsec src |----|-|eth0 | | +-------------+ | |disable_policy=1 | | | +-----------------+ | +------------------------------+ Where ROUTER has a default route towards eth1. dst entries for traffic arriving from eth0 would have DST_NOPOLICY and would be cached and therefore can be reused by traffic originating from ipsec0, skipping policy check. Fix by setting a IPSKB_NOPOLICY flag in IPCB and observing it instead of the DST in IN/FWD IPv4 policy checks. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Shmulik Ladkani <[email protected]> Signed-off-by: Eyal Birger <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-05-16mac80211: remove stray multi_sta_back_32bit docsJohannes Berg1-1/+0
This field doesn't exist, remove the docs for it. Signed-off-by: Johannes Berg <[email protected]>
2022-05-16mac80211: fix typo in documentationJohannes Berg1-1/+1
This is called offload_flags, remove the extra 'a'. Signed-off-by: Johannes Berg <[email protected]>
2022-05-15Merge tag 'sched-urgent-2022-05-15' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "The recent expansion of the sched switch tracepoint inserted a new argument in the middle of the arguments. This reordering broke BPF programs which relied on the old argument list. While tracepoints are not considered stable ABI, it's not trivial to make BPF cope with such a change, but it's being worked on. For now restore the original argument order and move the new argument to the end of the argument list" * tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/tracing: Append prev_state to tp args instead
2022-05-13Merge tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-1/+1
Pull NFS client bugfixes from Trond Myklebust: "One more pull request. There was a bug in the fix to ensure that gss- proxy continues to work correctly after we fixed the AF_LOCAL socket leak in the RPC code. This therefore reverts that broken patch, and replaces it with one that works correctly. Stable fixes: - SUNRPC: Ensure that the gssproxy client can start in a connected state Bugfixes: - Revert "SUNRPC: Ensure gss-proxy connects on setup" - nfs: fix broken handling of the softreval mount option" * tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix broken handling of the softreval mount option SUNRPC: Ensure that the gssproxy client can start in a connected state Revert "SUNRPC: Ensure gss-proxy connects on setup"
2022-05-13Merge branch 'master' of ↵Jakub Kicinski1-8/+12
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-05-13 1) Cleanups for the code behind the XFRM offload API. This is a preparation for the extension of the API for policy offload. From Leon Romanovsky. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: drop not needed flags variable in XFRM offload struct net/mlx5e: Use XFRM state direction instead of flags netdevsim: rely on XFRM state direction instead of flags ixgbe: propagate XFRM offload state direction instead of flags xfrm: store and rely on direction to construct offload flags xfrm: rename xfrm_state_offload struct to allow reuse xfrm: delete not used number of external headers xfrm: free not used XFRM_ESP_NO_TRAILER flag ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-13netfilter: conntrack: skip verification of zero UDP checksumKevin Mitchell1-4/+17
The checksum is optional for UDP packets. However nf_reject would previously require a valid checksum to elicit a response such as ICMP_DEST_UNREACH. Add some logic to nf_reject_verify_csum to determine if a UDP packet has a zero checksum and should therefore not be verified. Signed-off-by: Kevin Mitchell <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-13netfilter: prefer extension check to pointer checkFlorian Westphal2-17/+16
The pointer check usually results in a 'false positive': its likely that the ctnetlink module is loaded but no event monitoring is enabled. After recent change to autodetect ctnetlink usage and only allocate the ecache extension if a listener is active, check if the extension is present on a given conntrack. If its not there, there is nothing to report and calls to the notification framework can be elided. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-13netfilter: conntrack: un-inline nf_ct_ecache_ext_addFlorian Westphal1-25/+5
Only called when new ct is allocated or the extension isn't present. This function will be extended, place this in the conntrack module instead of inlining. The callers already depend on nf_conntrack module. Return value is changed to bool, noone used the returned pointer. Make sure that the core drops the newly allocated conntrack if the extension is requested but can't be added. This makes it necessary to ifdef the section, as the stub always returns false we'd drop every new conntrack if the the ecache extension is disabled in kconfig. Add from data path (xt_CT, nft_ct) is unchanged. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-13netfilter: nfnetlink: allow to detect if ctnetlink listeners existFlorian Westphal1-0/+1
At this time, every new conntrack gets the 'event cache extension' enabled for it. This is because the 'net.netfilter.nf_conntrack_events' sysctl defaults to 1. Changing the default to 0 means that commands that rely on the event notification extension, e.g. 'conntrack -E' or conntrackd, stop working. We COULD detect if there is a listener by means of 'nfnetlink_has_listeners()' and only add the extension if this is true. The downside is a dependency from conntrack module to nfnetlink module. This adds a different way: inc/dec a counter whenever a ctnetlink group is being (un)subscribed and toggle a flag in struct net. Next patches will take advantage of this and will only add the event extension if the flag is set. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-13netfilter: conntrack: add nf_ct_iter_data object for nf_ct_iterate_cleanup*()Pablo Neira Ayuso1-3/+9
This patch adds a structure to collect all the context data that is passed to the cleanup iterator. struct nf_ct_iter_data { struct net *net; void *data; u32 portid; int report; }; There is a netns field that allows to clean up conntrack entries specifically owned by the specified netns. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-13netfilter: conntrack: remove unconfirmed listFlorian Westphal2-7/+0
It has no function anymore and can be removed. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-13netfilter: extensions: introduce extension genid countFlorian Westphal2-16/+25
Multiple netfilter extensions store pointers to external data in their extension area struct. Examples: 1. Timeout policies 2. Connection tracking helpers. No references are taken for these. When a helper or timeout policy is removed, the conntrack table gets traversed and affected extensions are cleared. Conntrack entries not yet in the hashtable are referenced via a special list, the unconfirmed list. On removal of a policy or connection tracking helper, the unconfirmed list gets traversed an all entries are marked as dying, this prevents them from getting committed to the table at insertion time: core checks for dying bit, if set, the conntrack entry gets destroyed at confirm time. The disadvantage is that each new conntrack has to be added to the percpu unconfirmed list, and each insertion needs to remove it from this list. The list is only ever needed when a policy or helper is removed -- a rare occurrence. Add a generation ID count: Instead of adding to the list and then traversing that list on policy/helper removal, increment a counter that is stored in the extension area. For unconfirmed conntracks, the extension has the genid valid at ct allocation time. Removal of a helper/policy etc. increments the counter. At confirmation time, validate that ext->genid == global_id. If the stored number is not the same, do not allow the conntrack insertion, just like as if a confirmed-list traversal would have flagged the entry as dying. After insertion, the genid is no longer relevant (conntrack entries are now reachable via the conntrack table iterators and is set to 0. This allows removal of the percpu unconfirmed list. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-13netfilter: remove nf_ct_unconfirmed_destroy helperFlorian Westphal1-3/+0
This helper tags connections not yet in the conntrack table as dying. These nf_conn entries will be dropped instead when the core attempts to insert them from the input or postrouting 'confirm' hook. After the previous change, the entries get unlinked from the list earlier, so that by the time the actual exit hook runs, new connections no longer have a timeout policy assigned. Its enough to walk the hashtable instead. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-13netfilter: cttimeout: decouple unlink and free on netns destructionFlorian Westphal1-8/+0
Make it so netns pre_exit unlinks the objects from the pernet list, so they cannot be found anymore. netns core issues a synchronize_rcu() before calling the exit hooks so any the time the exit hooks run unconfirmed nf_conn entries have been free'd or they have been committed to the hashtable. The exit hook still tags unconfirmed entries as dying, this can now be removed in a followup change. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>