Age | Commit message | Author | Files | Lines
2023-06-21 | mptcp: track some aggregate data counters | Paolo Abeni | 5 | -8/+47
Currently there are no data transfer counters accounting for all the subflows used by a given MPTCP socket. User space can compute such figures by aggregating the subflow info, but that is inaccurate if any subflow is closed before the MPTCP socket itself. Add the new counters in the MPTCP socket itself and expose them via the existing diag and sockopt. While touching mptcp_diag_fill_info(), acquire the relevant locks before fetching the msk data, to ensure better data consistency. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/385 Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
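The inaccuracy of summing per-subflow figures can be shown with a small user-space model. The struct and function names below are hypothetical (they are not the kernel's `mptcp_info` layout): every transfer is accounted on both the subflow and the MPTCP-level socket, so the aggregate survives a subflow being closed, while a sum over the still-open subflows does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-subflow counters (not the kernel's struct layout). */
struct subflow_model {
	uint64_t bytes_sent;
	int closed;
};

/* Hypothetical msk-level aggregate, updated on every transfer. */
struct msk_model {
	uint64_t bytes_sent;
};

/* Account a transmission both on the subflow and on the msk. */
static void account_tx(struct msk_model *msk, struct subflow_model *sf,
		       uint64_t len)
{
	sf->bytes_sent += len;
	msk->bytes_sent += len;
}

/* What a user-space tool could compute: the sum over the subflows that are
 * still reported, i.e. the ones that have not been closed yet. */
static uint64_t sum_open_subflows(const struct subflow_model *sfs, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (!sfs[i].closed)
			total += sfs[i].bytes_sent;
	return total;
}
```

After sending 100 bytes on one subflow and 50 on another and then closing the first, the per-subflow sum reports 50 while the msk-level counter still reports 150.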
2023-06-21 | mptcp: move snd_una update earlier for fallback socket | Paolo Abeni | 2 | -6/+6
This avoids an unneeded conditional in both the fast path and the fallback case, and slightly simplifies the next patch. Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | Merge branch 'mptcp-fixes-for-6-4' | Jakub Kicinski | 4 | -107/+76
Matthieu Baerts says: ==================== mptcp: fixes for 6.4 Patch 1 correctly handles disconnect() failures that can happen in some specific cases: now the socket state is set as unconnected as expected. That fixes an issue introduced in v6.2. Patch 2 fixes a divide by zero bug in mptcp_recvmsg() with a fix similar to a recent one from Eric Dumazet for TCP introducing the sk_wait_pending flag. It should address an issue present in MPTCP from almost the beginning, from v5.9. Patch 3 fixes a possible list corruption on passive MPJ; even if the race seems very unlikely, better safe than sorry. The possible issue is present from v5.17. Patch 4 consolidates fallback and non fallback state machines to avoid leaking some MPTCP sockets. The fix is likely needed for versions from v5.11. Patch 5 drops code that is no longer used after the introduction of patch 4/6. This is not really a fix, but this patch can probably land in the -net tree as well, so as not to leave unused code. Patch 6 ensures listeners are unhashed before updating their sk status to avoid possible deadlocks when diag info is going to be retrieved with a lock. Even if it should not be visible with the way we are currently getting diag info, the issue is present from v5.17. ==================== Link: https://lore.kernel.org/r/20230620-upstream-net-20230620-misc-fixes-for-v6-4-v1-0-f36aa5eae8b9@tessares.net Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | mptcp: ensure listener is unhashed before updating the sk status | Paolo Abeni | 2 | -12/+20
The MPTCP protocol accesses the listener subflow in a lockless manner in a couple of places (poll, diag). That works only if the msk leaves the listener status after the subflow itself has been closed/disconnected. Otherwise we risk a deadlock in diag, as reported by Christoph. Address the issue by ensuring that the first subflow (the listener one) is always disconnected before updating the msk socket status. Reported-by: Christoph Paasch <[email protected]> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/407 Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | mptcp: drop legacy code around RX EOF | Paolo Abeni | 2 | -53/+1
Thanks to the previous patch -- "mptcp: consolidate fallback and non fallback state machine" -- we can finally drop the "temporary hack" used to detect rx eof. Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | mptcp: consolidate fallback and non fallback state machine | Paolo Abeni | 2 | -33/+25
An orphaned msk releases its resources via the worker, when the latter first sees the msk in CLOSED status. If the msk status transitions to TCP_CLOSE in the release callback invoked by the worker's final release_sock(), that worker instance will not take any action. Additionally, the MPTCP code prevents scheduling the worker once the socket reaches the CLOSE status, so such msk resources will be leaked. The only code path that can trigger the above scenario is __mptcp_check_send_data_fin() in fallback mode. Address the issue by removing the special handling of fallback sockets in __mptcp_check_send_data_fin(), consolidating the state machine for fallback and non-fallback sockets. Since fallback sockets do not send and do not receive data_fin, the mptcp code can update the msk internal status to match the next step in the SM every time a data fin (ack) should be generated or received. As a consequence, we can remove a bunch of fallback checks from the fast path. Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
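For a fallback socket, which carries no real DATA_FIN, the idea is to advance the msk state at the points where a DATA_FIN or its ack would be generated or received. The user-space sketch below models such transitions. The enum and function names are hypothetical, and the exact transitions (in particular going from FIN_WAIT2 straight to CLOSE, with no TIME_WAIT at this level) are assumptions based on a TCP-like close sequence, not a copy of the kernel code.

```c
/* Hypothetical subset of msk states used only for this sketch. */
enum msk_state {
	MSK_ESTABLISHED,
	MSK_CLOSE_WAIT,
	MSK_FIN_WAIT1,
	MSK_FIN_WAIT2,
	MSK_CLOSING,
	MSK_LAST_ACK,
	MSK_CLOSE,
};

/* A DATA_FIN would be generated (local shutdown/close). */
static enum msk_state on_data_fin_sent(enum msk_state s)
{
	switch (s) {
	case MSK_ESTABLISHED:
		return MSK_FIN_WAIT1;
	case MSK_CLOSE_WAIT:
		return MSK_LAST_ACK;
	default:
		return s;
	}
}

/* Our DATA_FIN would be acknowledged by the peer. */
static enum msk_state on_data_fin_acked(enum msk_state s)
{
	switch (s) {
	case MSK_FIN_WAIT1:
		return MSK_FIN_WAIT2;
	case MSK_CLOSING:
	case MSK_LAST_ACK:
		return MSK_CLOSE;
	default:
		return s;
	}
}

/* The peer's DATA_FIN would be received. */
static enum msk_state on_data_fin_received(enum msk_state s)
{
	switch (s) {
	case MSK_ESTABLISHED:
		return MSK_CLOSE_WAIT;
	case MSK_FIN_WAIT1:
		return MSK_CLOSING;
	case MSK_FIN_WAIT2:
		return MSK_CLOSE;	/* simplification: no TIME_WAIT here */
	default:
		return s;
	}
}
```

Driving a fallback socket through these same three hooks means it reaches CLOSE at the same points as a regular msk, rather than through a separate fallback-only path.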
2023-06-21 | mptcp: fix possible list corruption on passive MPJ | Paolo Abeni | 1 | -3/+9
At passive MPJ time, if the msk socket lock is held by the user, the new subflow is appended to the msk->join_list under the msk data lock. In mptcp_release_cb()/__mptcp_flush_join_list(), the subflows in that list are moved from the join_list into the conn_list under the msk socket lock. Append and removal could race, possibly corrupting the list. Address the issue by splicing the join list into a temporary one while still under the msk data lock. Found by code inspection; the race itself should be almost impossible to trigger in practice. Fixes: 3e5014909b56 ("mptcp: cleanup MPJ subflow list handling") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
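The "splice under the lock, process outside it" pattern can be sketched in user space. This is a minimal model, not the kernel's `list_head` API: the singly linked list, the C11 `atomic_flag` spinlock standing in for the msk data lock, and all function names are hypothetical. The producer appends under the data lock; the consumer detaches the whole join list in O(1) while holding that lock, so the later walk never touches a list the producer can modify.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal singly linked list with a tail pointer (hypothetical). */
struct node {
	struct node *next;
	int id;
};

struct list {
	struct node *head;
	struct node **tail;
};

static void list_init(struct list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

static void list_append(struct list *l, struct node *n)
{
	n->next = NULL;
	*l->tail = n;
	l->tail = &n->next;
}

/* Move all nodes of src to the end of dst in O(1) and leave src empty. */
static void list_splice_tail_init(struct list *src, struct list *dst)
{
	if (!src->head)
		return;
	*dst->tail = src->head;
	dst->tail = src->tail;
	list_init(src);
}

/* Hypothetical msk model: data_lock guards join_list only. */
struct msk_model {
	atomic_flag data_lock;
	struct list join_list;
	struct list conn_list;	/* touched only by the socket-lock owner */
};

static void data_lock(struct msk_model *m)
{
	while (atomic_flag_test_and_set_explicit(&m->data_lock,
						 memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void data_unlock(struct msk_model *m)
{
	atomic_flag_clear_explicit(&m->data_lock, memory_order_release);
}

static void msk_init(struct msk_model *m)
{
	atomic_flag_clear(&m->data_lock);
	list_init(&m->join_list);
	list_init(&m->conn_list);
}

/* Passive MPJ path: a new subflow is appended under the data lock. */
static void add_join(struct msk_model *m, struct node *n)
{
	data_lock(m);
	list_append(&m->join_list, n);
	data_unlock(m);
}

/* Release-callback path: detach the whole join list while holding the data
 * lock, then move the detached nodes without racing with add_join(). */
static int flush_join_list(struct msk_model *m)
{
	struct list tmp;
	struct node *n, *next;
	int moved = 0;

	list_init(&tmp);
	data_lock(m);
	list_splice_tail_init(&m->join_list, &tmp);
	data_unlock(m);

	for (n = tmp.head; n; n = next) {
		next = n->next;	/* save before list_append() resets it */
		list_append(&m->conn_list, n);
		moved++;
	}
	return moved;
}
```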
2023-06-21 | mptcp: fix possible divide by zero in recvmsg() | Paolo Abeni | 1 | -0/+7
Christoph reported a divide by zero bug in mptcp_recvmsg(): divide error: 0000 [#1] PREEMPT SMP CPU: 1 PID: 19978 Comm: syz-executor.6 Not tainted 6.4.0-rc2-gffcc7899081b #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:__tcp_select_window+0x30e/0x420 net/ipv4/tcp_output.c:3018 Code: 11 ff 0f b7 cd c1 e9 0c b8 ff ff ff ff d3 e0 89 c1 f7 d1 01 cb 21 c3 eb 17 e8 2e 83 11 ff 31 db eb 0e e8 25 83 11 ff 89 d8 99 <f7> 7c 24 04 29 d3 65 48 8b 04 25 28 00 00 00 48 3b 44 24 10 75 60 RSP: 0018:ffffc90000a07a18 EFLAGS: 00010246 RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000040000 RDX: 0000000000000000 RSI: 000000000003ffff RDI: 0000000000040000 RBP: 000000000000ffd7 R08: ffffffff820cf297 R09: 0000000000000001 R10: 0000000000000000 R11: ffffffff8103d1a0 R12: 0000000000003f00 R13: 0000000000300000 R14: ffff888101cf3540 R15: 0000000000180000 FS: 00007f9af4c09640(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b33824000 CR3: 000000012f241001 CR4: 0000000000170ee0 Call Trace: <TASK> __tcp_cleanup_rbuf+0x138/0x1d0 net/ipv4/tcp.c:1611 mptcp_recvmsg+0xcb8/0xdd0 net/mptcp/protocol.c:2034 inet_recvmsg+0x127/0x1f0 net/ipv4/af_inet.c:861 ____sys_recvmsg+0x269/0x2b0 net/socket.c:1019 ___sys_recvmsg+0xe6/0x260 net/socket.c:2764 do_recvmmsg+0x1a5/0x470 net/socket.c:2858 __do_sys_recvmmsg net/socket.c:2937 [inline] __se_sys_recvmmsg net/socket.c:2953 [inline] __x64_sys_recvmmsg+0xa6/0x130 net/socket.c:2953 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f9af58fc6a9 Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007f9af4c08cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 
00000000006bc050 RCX: 00007f9af58fc6a9 RDX: 0000000000000001 RSI: 0000000020000140 RDI: 0000000000000004 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000f00 R11: 0000000000000246 R12: 00000000006bc05c R13: fffffffffffffea8 R14: 00000000006bc050 R15: 000000000001fe40 </TASK> mptcp_recvmsg() is allowed to release the msk socket lock when blocking, and before re-acquiring it another thread could have switched the sock to TCP_LISTEN status (with a prior connect(AF_UNSPEC)), also clearing icsk_ack.rcv_mss. Address the issue by preventing the disconnect if some other process is concurrently performing a blocking syscall on the same socket, similar to commit 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting"). Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning") Cc: [email protected] Reported-by: Christoph Paasch <[email protected]> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/404 Signed-off-by: Paolo Abeni <[email protected]> Tested-by: Christoph Paasch <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
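The "deny disconnect while threads are waiting" idea can be sketched as a counter that blocking readers raise before dropping the socket lock and lower after re-acquiring it. This user-space model is loosely inspired by the sk_wait_pending approach mentioned in the merge message; the struct, the function names, and the use of -EBUSY as the refusal code are assumptions of the sketch.

```c
#include <errno.h>

/* Hypothetical socket model, not the kernel's struct sock. */
struct sock_model {
	int wait_pending;	/* threads blocked with the socket lock released */
	unsigned int rcv_mss;	/* cleared on disconnect */
};

/* A blocking reader marks itself before releasing the socket lock... */
static void wait_begin(struct sock_model *sk)
{
	sk->wait_pending++;
}

/* ...and unmarks itself after re-acquiring it. */
static void wait_end(struct sock_model *sk)
{
	sk->wait_pending--;
}

/* Disconnect is refused while any thread is waiting, so a woken reader
 * cannot find rcv_mss reset to zero underneath it. */
static int disconnect_model(struct sock_model *sk)
{
	if (sk->wait_pending)
		return -EBUSY;
	sk->rcv_mss = 0;
	return 0;
}

/* Reader-side computation that would divide by zero after a disconnect. */
static unsigned int window_in_segments(const struct sock_model *sk,
				       unsigned int window)
{
	return window / sk->rcv_mss;
}
```

While a reader is blocked, the disconnect attempt fails and the reader's division stays safe; once no thread is waiting, the disconnect succeeds and clears the MSS.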
2023-06-21 | mptcp: handle correctly disconnect() failures | Paolo Abeni | 1 | -6/+14
Currently the mptcp code assumes that disconnect() can fail only at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and doesn't even bother returning an error code. Soon mptcp_disconnect() will handle more error conditions: let's track them explicitly. As a bonus, explicitly annotate TCP-level disconnect as not failing: the mptcp code never blocks for events on the subflows. Fixes: 7d803344fdc3 ("mptcp: fix deadlock in fastopen error path") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Tested-by: Christoph Paasch <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
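Tracking disconnect failures explicitly means the caller only reports the socket as unconnected when the disconnect really happened. A minimal user-space sketch, assuming hypothetical state names, a hypothetical `busy` failure condition, and -EBUSY as the error code:

```c
#include <errno.h>

/* Hypothetical user-visible socket states. */
enum sock_state_model {
	SOCK_UNCONNECTED,
	SOCK_CONNECTING,
};

struct sock_model {
	enum sock_state_model state;
	int busy;	/* hypothetical condition that makes disconnect fail */
};

/* Protocol-level disconnect: reports failure instead of staying silent. */
static int proto_disconnect(struct sock_model *sk)
{
	if (sk->busy)
		return -EBUSY;
	return 0;
}

/* Error path of a connect-like call: mark the socket unconnected only when
 * the disconnect really happened, and propagate the error otherwise. */
static int abort_connect(struct sock_model *sk)
{
	int err = proto_disconnect(sk);

	if (!err)
		sk->state = SOCK_UNCONNECTED;
	return err;
}
```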
2023-06-21 | net: ena: Fix rst format issues in readme | David Arinzon | 1 | -0/+2
This patch fixes a warning in the ena documentation file identified by the kernel automatic tools. The patch also adds a missing newline between sections. Signed-off-by: David Arinzon <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | inet: Cleanup on charging memory for newly accepted sockets | Abel Wu | 1 | -7/+10
If there is no net-memcg associated with the sock, don't bother calculating its memory usage for charge. Signed-off-by: Abel Wu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
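The cleanup boils down to computing the charge only when a memory cgroup is attached. A user-space sketch, assuming a 4 KiB page and hypothetical struct and function names (the kernel uses its own page-rounding helpers and memcg charging calls):

```c
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE (1 << MODEL_PAGE_SHIFT)	/* assumed 4 KiB pages */

/* Hypothetical memory cgroup and socket models. */
struct memcg_model {
	long charged_pages;
};

struct sock_model {
	struct memcg_model *memcg;	/* NULL when no net-memcg is associated */
	int forward_alloc;
	int rmem_alloc;
};

/* Round a byte count up to whole pages. */
static int mem_pages(int bytes)
{
	return (bytes + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

/* Charge a newly accepted socket; skip the computation without a memcg. */
static int charge_accepted(struct sock_model *sk)
{
	int pages;

	if (!sk->memcg)
		return 0;
	pages = mem_pages(sk->forward_alloc + sk->rmem_alloc);
	if (pages)
		sk->memcg->charged_pages += pages;
	return pages;
}
```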
2023-06-21 | selftests: tc-testing: add one test for flushing explicitly created chain | renmingshuai | 1 | -0/+25
Add the test for additional reference to chains that are explicitly created by RTM_NEWCHAIN message. The test result: 1..1 ok 1 c2b4 - soft lockup alarm will be not generated after delete the prio 0 filter of the chain This is a follow up to commit c9a82bec02c3 ("net/sched: cls_api: Fix lockup on flushing explicitly created chain"). Signed-off-by: Mingshuai Ren <[email protected]> Acked-by: Pedro Tammela <[email protected]> Acked-by: Victor Nogueira <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | dt-bindings: net: micrel,ks8851: allow SPI device properties | Krzysztof Kozlowski | 1 | -1/+2
The Micrel KS8851 can be attached to SPI or parallel bus and the difference is expressed in compatibles. Allow common SPI properties when this is a SPI variant and narrow the parallel memory bus properties to the second case. This fixes dtbs_check warning: qcom-msm8960-cdp.dtb: ethernet@0: Unevaluated properties are not allowed ('spi-max-frequency' was unexpected) Signed-off-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | dt-bindings: net: bluetooth: qualcomm: document VDD_CH1 | Krzysztof Kozlowski | 1 | -0/+3
WCN3990 comes with two chains - CH0 and CH1 - where each takes VDD regulator. It seems VDD_CH1 is optional (Linux driver does not care about it), so document it to fix dtbs_check warnings like: sdm850-lenovo-yoga-c630.dtb: bluetooth: 'vddch1-supply' does not match any of the regexes: 'pinctrl-[0-9]+' Signed-off-by: Krzysztof Kozlowski <[email protected]> Acked-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | net: hsr: Disable promiscuous mode in offload mode | Ravi Gunasekaran | 3 | -4/+17
When port-to-port forwarding for interfaces in an HSR node is enabled, disable promiscuous mode, since L2 frame forwarding happens in the offloaded hardware. Signed-off-by: Ravi Gunasekaran <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | selftests/bpf: Add vrf_socket_lookup tests | Gilad Sever | 2 | -0/+400
Verify that socket lookup via TC/XDP with all BPF APIs is VRF aware. Signed-off-by: Gilad Sever <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Eyal Birger <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-21 | bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings | Gilad Sever | 2 | -30/+48
When calling bpf_sk_lookup_tcp(), bpf_sk_lookup_udp() or bpf_skc_lookup_tcp() from tc/xdp ingress, VRF socket bindings aren't respected, i.e. unbound sockets are returned, and bound sockets aren't found. VRF binding is determined by the sdif argument to sk_lookup(); however, when called from tc, the IP SKB control block isn't initialized and thus inet{,6}_sdif() always returns 0. Fix by calculating sdif for the tc/xdp flows by observing the device's l3 enslaved state. The cg/sk_skb hooking points, which are expected to support inet{,6}_sdif(), pass sdif=-1, which makes __bpf_skc_lookup() use the existing logic. Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Signed-off-by: Gilad Sever <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Shmulik Ladkani <[email protected]> Reviewed-by: Eyal Birger <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Cc: David Ahern <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
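The sdif selection can be modeled in user space. `struct netdev_model`, `tc_sdif()` and `effective_sdif()` are hypothetical names, and using the enslaved device's own ifindex as sdif is an assumption of this sketch. The `-1` sentinel, meaning "fall back to the existing control-block-based logic" for the cg/sk_skb callers, follows the commit text; here that existing logic is represented by a plain `cb_sdif` value.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a network device. */
struct netdev_model {
	int ifindex;
	bool is_l3_slave;	/* enslaved to an L3 master such as a VRF */
};

/* tc/xdp path: the IP control block is not initialized, so derive sdif
 * from the device's L3 enslaved state instead. */
static int tc_sdif(const struct netdev_model *dev)
{
	return dev->is_l3_slave ? dev->ifindex : 0;
}

/* Shared lookup: a negative sdif asks for the existing logic, modeled
 * here by cb_sdif (what inet{,6}_sdif() would have returned). */
static int effective_sdif(int sdif, int cb_sdif)
{
	return sdif < 0 ? cb_sdif : sdif;
}
```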
2023-06-21 | bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint | Gilad Sever | 1 | -6/+18
skb->dev always exists in the tc flow. There is no need to use bpf_skc_lookup(), bpf_sk_lookup() from this code path. This change facilitates fixing the tc flow to be VRF aware. Signed-off-by: Gilad Sever <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Shmulik Ladkani <[email protected]> Reviewed-by: Eyal Birger <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-21 | bpf: Factor out socket lookup functions for the TC hookpoint. | Gilad Sever | 1 | -3/+60
Change BPF helper socket lookup functions to use TC specific variants: bpf_tc_sk_lookup_tcp() / bpf_tc_sk_lookup_udp() / bpf_tc_skc_lookup_tcp() instead of sharing implementation with the cg / sk_skb hooking points. This allows introducing a separate logic for the TC flow. The tc functions are identical to the original code. Signed-off-by: Gilad Sever <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Shmulik Ladkani <[email protected]> Reviewed-by: Eyal Birger <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-21 | Merge branch 'leds-trigger-netdev-add-additional-modes' | Jakub Kicinski | 2 | -10/+109
Christian Marangi says: ==================== leds: trigger: netdev: add additional modes This is a continuation of [1]. It was decided to take a more gradual approach to implementing LED support for switches and PHYs, starting with basic support and then implementing the hw control part once all the prerequisites are done. This should be the final part for the netdev trigger. I added the net-next tag and the netdev mailing list since I was informed that this should be merged with the netdev branch. We collected some info and found a good set of modes that are common to almost all PHYs and switches. These modes are: - Modes for dedicated link speed (10, 100, 1000 Mbps). Additional modes can be added later following this example. - Modes for half and full duplex. The original idea was to add hw-control-only modes. While the concept makes sense, in practice it would result in lots of additional code and extra checks to make sure we are setting correct modes. Andrew pointed out that using the ethtool APIs we can actually get the current link speed and duplex, which effectively removes the problem of hw-control-only modes since we can fall back to software. Since these modes are supported in software, we can skip providing a user for this in the LED driver to support hw control for these new modes (that will come right after this is merged) and avoid turning this into another multi-subsystem series. For link speed and duplex we use ethtool APIs. Calling ethtool APIs requires the rtnl lock, but taking it can be skipped when handling netdev events, as the lock is already held. [1] https://lore.kernel.org/lkml/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | leds: trigger: netdev: expose hw_control status via sysfs | Christian Marangi | 1 | -0/+11
Expose hw_control status via sysfs for the netdev trigger to give userspace better understanding of the current state of the trigger and the LED. Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Acked-by: Lee Jones <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | leds: trigger: netdev: add additional specific link duplex mode | Christian Marangi | 2 | -2/+27
Add additional modes for specific link duplex. Use ethtool APIs to get the current link duplex and enable the LED accordingly. In the netdev event handler the rtnl lock is already held, so it does not need to be taken to access ethtool APIs. This is especially useful for PHYs and switches that support LED hw control for specific link duplex. Add additional modes: - half_duplex: Turn on LED when link is half duplex - full_duplex: Turn on LED when link is full duplex Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Lee Jones <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
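The half_duplex/full_duplex decision can be sketched as a check of the enabled mode bits against the current duplex. The bit and enum names are hypothetical, and the duplex value is passed in directly instead of being read through ethtool as the real trigger does.

```c
#include <stdbool.h>

/* Hypothetical mode bits named after the sysfs attributes in the commit. */
#define MODE_HALF_DUPLEX (1u << 0)
#define MODE_FULL_DUPLEX (1u << 1)

/* Hypothetical duplex values, standing in for what ethtool reports. */
enum duplex_model { DUPLEX_M_HALF, DUPLEX_M_FULL, DUPLEX_M_UNKNOWN };

/* Turn the LED on only if the link is up and its duplex matches an
 * enabled mode. */
static bool led_on_for_duplex(unsigned int modes, bool carrier,
			      enum duplex_model duplex)
{
	if (!carrier)
		return false;
	if ((modes & MODE_HALF_DUPLEX) && duplex == DUPLEX_M_HALF)
		return true;
	if ((modes & MODE_FULL_DUPLEX) && duplex == DUPLEX_M_FULL)
		return true;
	return false;
}
```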
2023-06-21 | leds: trigger: netdev: add additional specific link speed mode | Christian Marangi | 2 | -10/+73
Add additional modes for specific link speed. Use ethtool APIs to get the current link speed and enable the LED accordingly. In the netdev event handler the rtnl lock is already held, so it does not need to be taken to access ethtool APIs. This is especially useful for PHYs and switches that support LED hw control for specific link speed. (Example scenario: a PHY with two LEDs connected, one green and one orange, where the green one is turned on at 1000 Mbps and the orange one at 10 Mbps.) When the mode is set from sysfs, we check whether a split link speed mode is enabled and reject enabling the generic link mode, to prevent wrong and redundant configuration. Rework the set-baseline-state logic to support these new modes when deciding whether to turn the LED on or off. Add additional modes: - link_10: Turn on LED when link speed is 10Mbps - link_100: Turn on LED when link speed is 100Mbps - link_1000: Turn on LED when link speed is 1000Mbps Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Lee Jones <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
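A user-space sketch of the speed modes and of the sysfs-side rejection of mixing the generic link mode with split speed modes. The bit names are hypothetical, -EINVAL as the rejection code is an assumption, and the speed is passed in as a plain integer in Mbps instead of being read via ethtool.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mode bits named after the sysfs attributes in the commit. */
#define MODE_LINK	 (1u << 0)
#define MODE_LINK_10	 (1u << 1)
#define MODE_LINK_100	 (1u << 2)
#define MODE_LINK_1000	 (1u << 3)
#define MODE_LINK_SPEEDS (MODE_LINK_10 | MODE_LINK_100 | MODE_LINK_1000)

/* Reject mixing the generic "link" mode with speed-specific modes. */
static int validate_modes(unsigned int modes)
{
	if ((modes & MODE_LINK) && (modes & MODE_LINK_SPEEDS))
		return -EINVAL;
	return 0;
}

/* Decide the baseline LED state from the carrier and the current speed. */
static bool led_on_for_link(unsigned int modes, bool carrier, int speed_mbps)
{
	if (!carrier)
		return false;
	if (modes & MODE_LINK)
		return true;
	if ((modes & MODE_LINK_10) && speed_mbps == 10)
		return true;
	if ((modes & MODE_LINK_100) && speed_mbps == 100)
		return true;
	if ((modes & MODE_LINK_1000) && speed_mbps == 1000)
		return true;
	return false;
}
```

For the green/orange scenario, the green LED would be configured with only the 1000 Mbps bit and the orange LED with only the 10 Mbps bit, so each lights up at its own speed.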
2023-06-21 | bnxt_en: Link representors to PCI device | Ivan Vecera | 1 | -0/+1
Link VF representors to the parent PCI device to benefit from the systemd-defined naming scheme. Without this change, the representor is visible as ethN. Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Michael Chan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | Merge branch 'selftests-preparations-for-out-of-order-operations-patches-in-mlxsw' | Jakub Kicinski | 19 | -25/+88
Petr Machata says: ==================== selftests: Preparations for out-of-order-operations patches in mlxsw The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. Over the course of the following several patchsets, mlxsw code is going to be adjusted to diminish the space of wrongly offloaded configurations. Ideally the offload state will reflect the actual state, regardless of the sequence of operations used to construct that state. Several selftests build configurations that will not be offloadable in the future on some systems. The reason is that what will get offloaded is the actual configuration, not the configuration steps. For example, when a port is added to a bridge that has an IP address, that bridge will get a RIF, which it would not have with the current code. But on Nvidia Spectrum-1 machines, MAC addresses of all RIFs need to have the same prefix, which the bridge will violate. The RIF thus couldn't be created, and the enslavement is therefore canceled, because it would lead to an unoffloadable configuration. This breaks some selftests. In this patchset, adjust selftests to avoid the configurations that mlxsw would be incapable of offloading, while maintaining relevance with regard to the feature that is being tested.
There are generally two cases of fixes: - Disabling IPv6 autogen on bridges that do not participate in routing, either because of the abovementioned requirement to keep the same MAC prefix on all in-HW router interfaces, or, on 802.1ad bridges, because in-HW router interfaces are not supported at all. - Setting the bridge MAC address to what it will become after the first member port is attached, so that the in-HW router interface is created with a supported MAC address. The patchset is then split thus: - Patches #1-#7 adjust generic selftests - Patches #8-#16 adjust mlxsw-specific selftests ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | selftests: mlxsw: one_armed_router: Use port MAC for bridge address | Petr Machata | 1 | -1/+2
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The bridge eventually inherits MAC address from its first member, after the enslavement is acked. A number of (mainly VXLAN) selftests already work around the problem by setting the MAC address to whatever it will eventually be anyway. Do the same for this selftest. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
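The 38-bit prefix constraint can be made concrete: 38 bits cover the first four bytes of a 48-bit MAC address (32 bits) plus the top six bits of the fifth byte. The helper below is a hypothetical illustration of that arithmetic, not an mlxsw function.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the first 38 bits of two 48-bit MAC addresses are equal:
 * bytes 0-3 must match exactly, plus the top 6 bits of byte 4. */
static bool mac_same_38bit_prefix(const uint8_t a[6], const uint8_t b[6])
{
	for (int i = 0; i < 4; i++)
		if (a[i] != b[i])
			return false;
	return (a[4] & 0xFC) == (b[4] & 0xFC);
}
```

For example, 00:11:22:33:44:01 and 00:11:22:33:47:ff share the prefix (0x44 and 0x47 agree in their top six bits), while 00:11:22:33:48:01 does not; a randomly generated bridge MAC will almost never match, which is why the selftests set the bridge address to the port MAC.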
2023-06-21 | selftests: mlxsw: vxlan: Disable IPv6 autogen on bridges | Petr Machata | 1 | -10/+31
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge (this holds for all bridges used here), the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks various aspects of VXLAN offloading and the bridges do not need to participate in routing traffic. The IP addresses or the RIFs are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridges in this selftest, thus exempting them from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | selftests: mlxsw: spectrum: q_in_vni_veto: Disable IPv6 autogen on a bridge | Petr Machata | 1 | -0/+1
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks vetoing of a different aspect of the configuration and the bridge does not need to participate in routing traffic. The IP address or the RIF are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | selftests: mlxsw: qos_mc_aware: Disable IPv6 autogen on bridges | Petr Machata | 1 | -0/+2
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge (this holds for both bridges used here), the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks traffic prioritization and scheduling, and the bridges serve for their L2 forwarding capabilities, and do not need to participate in routing traffic. The IP addresses or the RIFs are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridges in this selftest, thus exempting them from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | selftests: mlxsw: qos_ets_strict: Disable IPv6 autogen on bridges | Petr Machata | 1 | -2/+6
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge (this holds for both bridges used here), the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks traffic prioritization and scheduling, and the bridges serve for their L2 forwarding capabilities, and do not need to participate in routing traffic. The IP addresses or the RIFs are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridges in this selftest, thus exempting them from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | selftests: mlxsw: qos_dscp_bridge: Disable IPv6 autogen on a bridge | Petr Machata | 1 | -0/+1
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks DCB DSCP-based prioritization, and the bridge serves for its L2 forwarding capabilities, and does not need to participate in routing traffic. The IP address or the RIF are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | selftests: mlxsw: mirror_gre_scale: Disable IPv6 autogen on a bridge | Petr Machata | 1 | -0/+1
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks how many mirroring sessions a machine is capable of offloading. The IP address or the RIF are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21 | selftests: mlxsw: extack: Disable IPv6 autogen on bridges | Petr Machata | 1 | -6/+18
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge (this holds for all bridges used here), the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks whether a different vetoed aspect of the configuration provides an extack. The IP address or the RIF are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridges in this selftest, thus exempting them from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21selftests: mlxsw: q_in_q_veto: Disable IPv6 autogen on bridgesPetr Machata1-0/+8
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. The swp enslavement to the 802.1ad bridge is not allowed, because RIFs are not allowed to be created for 802.1ad bridges, but the address indicates one needs to be created. Thus the veto selftests already fail during the port enslavement. The subsequent attempt to create a VLAN on top of the same bridge is then not vetoed, because the bridge is not related to mlxsw, and the selftest fails. Fix by disabling automatic IPv6 address generation for the bridges in this selftest, thus exempting them from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21selftests: forwarding: router_bridge: Use port MAC for bridge addressPetr Machata1-1/+2
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The bridge eventually inherits the MAC address of its first member, after the enslavement is acked. A number of (mainly VXLAN) selftests already work around the problem by setting the MAC address to whatever it will eventually be anyway. Do the same here. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21selftests: forwarding: mirror_gre_*: Use port MAC for bridge addressPetr Machata3-3/+6
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The bridge eventually inherits the MAC address of its first member, after the enslavement is acked. A number of (mainly VXLAN) selftests already work around the problem by setting the MAC address to whatever it will eventually be anyway. Do the same for several mirror_gre selftests. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21selftests: forwarding: mirror_gre_*: Disable IPv6 autogen on bridgesPetr Machata2-0/+2
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. These two selftests, however, check mirroring traffic to a gretap netdevice. The bridge here does not participate in routing traffic, and neither the IP address nor the RIF is relevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridges in these selftests, thus exempting them from mlxsw router attention. Since the bridges are only used for L2 forwarding, this change should not hinder the usefulness of these selftests for testing the SW datapath or HW datapaths in other devices. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21selftests: forwarding: pedit_dsfield: Disable IPv6 autogen on a bridgePetr Machata1-1/+3
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself, however, checks whether skbedit changes packet priority as appropriate. The bridge thus does not need to participate in routing traffic, and neither the IP address nor the RIF is relevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Since the bridge is only used for L2 forwarding, this change should not hinder the usefulness of this selftest for testing the SW datapath or HW datapaths in other devices. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21selftests: forwarding: skbedit_priority: Disable IPv6 autogen on a bridgePetr Machata1-1/+3
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself, however, checks the operation of pedit on the IPv4 and IPv6 dsfield and its parts. The bridge thus does not need to participate in routing traffic, and neither the IP address nor the RIF is relevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Since the bridge is only used for L2 forwarding, this change should not hinder the usefulness of this selftest for testing the SW datapath or HW datapaths in other devices. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21selftests: forwarding: dual_vxlan_bridge: Disable IPv6 autogen on bridgesPetr Machata1-0/+1
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. This will cause this selftest to fail spuriously. The swp enslavement to the 802.1ad bridge is not allowed, because RIFs are not allowed to be created for 802.1ad bridges, but the address indicates one needs to be created. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21selftests: forwarding: q_in_vni: Disable IPv6 autogen on bridgesPetr Machata1-0/+1
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. This will cause this selftest to fail spuriously. The swp enslavement to the 802.1ad bridge is not allowed, because RIFs are not allowed to be created for 802.1ad bridges, but the address indicates one needs to be created. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Jakub Kicinski7-15/+181
Daniel Borkmann says: ==================== pull-request: bpf 2023-06-21 We've added 7 non-merge commits during the last 14 day(s) which contain a total of 7 files changed, 181 insertions(+), 15 deletions(-). The main changes are: 1) Fix a verifier id tracking issue with scalars upon spill, from Maxim Mikityanskiy. 2) Fix NULL dereference if an exception is generated while a BPF subprogram is running, from Krister Johansen. 3) Fix a BTF verification failure when compiling kernel with LLVM_IAS=0, from Florent Revest. 4) Fix expected_attach_type enforcement for kprobe_multi link, from Jiri Olsa. 5) Fix a bpf_jit_dump issue for x86_64 to pick the correct JITed image, from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Force kprobe multi expected_attach_type for kprobe_multi link bpf/btf: Accept function names that contain dots selftests/bpf: add a test for subprogram extables bpf: ensure main program has an extable bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable. selftests/bpf: Add test cases to assert proper ID tracking on spill bpf: Fix verifier id tracking of scalars on spill ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-21cgroup/misc: Expose misc.current on cgroup v2 rootLeiZhou-972-2/+1
Hello, This patch exposes misc.current on the cgroup v2 root for tracking how much of the resource has been consumed in total on the system. Most cloud infrastructure uses cgroup to fetch host information for scheduling purposes. Currently, the misc controller can be used by Intel TDX HKIDs and AMD SEV ASIDs, which are both used for creating encrypted VMs. Intel TDX and AMD SEV are mostly used by cloud providers for providing confidential VMs. In actual use of a server, these confidential VMs may be launched in different ways. For the cloud solution, there are KubeVirt and coco (tracked by kubepods.slice); on the host, they can be booted directly through qemu by the end user (tracked by user.slice), etc. In this complex environment, to know how much of the resource is used in total, one has to iterate through all existing slices, read each misc.current and add the values up to calculate the total number of consumed keys. Exposing misc.current in the root cgroup therefore makes it much easier to calculate how much of the resource has been used in total, which helps the cloud infrastructure schedule and count resources. Signed-off-by: LeiZhou-97 <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
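The manual aggregation that this patch aims to make unnecessary can be sketched as follows. The sketch assumes the flat-keyed misc.current format of one `<resource> <usage>` pair per line. The helper names are hypothetical, file reading is omitted so that the parsing works on in-memory strings, and `sev`/`sev_es` serve only as example keys.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given the contents of one misc.current file
 * ("<resource> <usage>" per line), return the usage of @name, or 0 if
 * the key is not present.
 */
static unsigned long long misc_usage(const char *buf, const char *name)
{
	size_t name_len;
	const char *line;

	assert(buf != NULL && name != NULL);
	name_len = strlen(name);

	for (line = buf; *line != '\0'; ) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);

		/* Exact key match: the name, one space, then a digit. */
		if (len > name_len + 1 && strncmp(line, name, name_len) == 0 &&
		    line[name_len] == ' ' &&
		    isdigit((unsigned char)line[name_len + 1]))
			return strtoull(line + name_len + 1, NULL, 10);

		if (nl == NULL)
			break;
		line = nl + 1;
	}
	return 0; /* key not present */
}

/* Sum @name over several misc.current contents, e.g. one per slice. */
static unsigned long long misc_usage_total(const char *const *bufs,
					   size_t n, const char *name)
{
	unsigned long long total = 0;

	for (size_t i = 0; i < n; i++)
		total += misc_usage(bufs[i], name);
	return total;
}
```

With misc.current readable on the root cgroup, the same total would come from a single `misc_usage()` call on the root file instead of a loop over every slice.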
2023-06-21cgroup: Avoid -Wstringop-overflow warningsGustavo A. R. Silva1-0/+6
Address the following -Wstringop-overflow warnings seen when built with ARM architecture and aspeed_g4_defconfig configuration (notice that under this configuration CGROUP_SUBSYS_COUNT == 0): kernel/cgroup/cgroup.c:1208:16: warning: 'find_existing_css_set' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:1258:15: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:6089:18: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:6153:18: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] These changes are based on commit d20d30ebb199 ("cgroup: Avoid compiler warnings with no subsystems"). Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-06-21Merge tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Linus Torvalds2-13/+13
Pull timer fix from Thomas Gleixner: "A single regression fix for a regression fix: For a long time the tick was aligned to clock MONOTONIC so that the tick event happened at a multiple of nanoseconds per tick starting from clock MONOTONIC = 0. At some point this changed as the refined jiffies clocksource, which is used during boot before the TSC or other clocksources become usable, was adjusted with a boot offset, so that time 0 is closer to the point where the kernel starts. This broke the assumption in the tick code that when the tick setup happens early on, ktime_get() will return a multiple of nanoseconds per tick. As a consequence, applications which aligned their periodic execution so that it does not collide with the tick were no longer guaranteed that the tick period starts from time 0. The fix for this regression was to realign the tick when it is initially set up to a multiple of tick periods. That works as long as the underlying tick device supports periodic mode, but breaks under certain conditions when the tick device supports only one-shot mode. Depending on the offset, the alignment delta to clock MONOTONIC can fall into a range where the minimal programming delta of the underlying clock event device is larger than the calculated delta to the next tick. This results in a boot hang as the tick code tries to play catch-up, but as the tick never fires, jiffies are not advanced, so it keeps trying forever. Solve this by moving the tick alignment into the NOHZ / HIGHRES enablement code because at that point it is guaranteed that the underlying clocksource is high resolution capable and no longer depends on the tick. 
This is far before user space starts, so at the point where applications try to align their timers, the old behaviour of the tick happening at a multiple of nanoseconds per tick starting from clock MONOTONIC = 0 is restored" * tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/common: Align tick period during sched_timer setup
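The alignment arithmetic and the one-shot failure mode described in this pull request can be modeled in a few lines. This is an illustrative sketch under stated assumptions, not the kernel's tick code: the function names are hypothetical, all times are plain nanosecond values of clock MONOTONIC, "aligned" means a multiple of the tick period counted from MONOTONIC = 0, and the device's minimal programming delta is a single fixed number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: return the first multiple of @period_ns strictly
 * after @now_ns, i.e. the next tick aligned to MONOTONIC = 0.
 */
static uint64_t next_aligned_tick(uint64_t now_ns, uint64_t period_ns)
{
	assert(period_ns > 0);
	return (now_ns / period_ns + 1) * period_ns;
}

/*
 * Hypothetical model of a one-shot clock event device that cannot be
 * programmed closer than @min_delta_ns into the future. Returns false
 * when the aligned tick is too near to be armed.
 */
static bool tick_programmable(uint64_t now_ns, uint64_t period_ns,
			      uint64_t min_delta_ns)
{
	return next_aligned_tick(now_ns, period_ns) - now_ns >= min_delta_ns;
}
```

For example, with a 4 ms period (HZ=250) and a current time of 11.999 ms, the aligned tick at 12 ms is only 1 µs away. If the device's minimal delta is 5 µs, `tick_programmable()` returns false, which is the kind of condition the pull request identifies as leading to the boot hang.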
2023-06-21block: Improve kernel-doc headersBart Van Assche2-1/+3
Fix the documentation of the devt_from_partuuid() return value. Fix the following two recently introduced kernel-doc warnings: block/bdev.c:570: warning: Function parameter or member 'hops' not described in 'bd_finish_claiming' block/early-lookup.c:46: warning: Function parameter or member 'devt' not described in 'devt_from_partuuid' Cc: Christoph Hellwig <[email protected]> Fixes: 0718afd47f70 ("block: introduce holder ops") Fixes: cf056a431215 ("init: improve the name_to_dev_t interface") Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
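For reference, kernel-doc expects each parameter to be described by an `@name:` line and the return value by a `Return:` section. A missing `@name:` entry produces warnings like the two quoted above. The function below is hypothetical and only illustrates the comment layout. It is neither devt_from_partuuid() nor bd_finish_claiming().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/**
 * parse_dev_number() - Parse a decimal device number (hypothetical example).
 * @str: NUL-terminated string to parse.
 * @devt: Where to store the parsed number on success.
 *
 * Return: 0 on success, -EINVAL if @str is not a decimal number that
 * fits in an unsigned long.
 */
static int parse_dev_number(const char *str, unsigned long *devt)
{
	char *end;
	unsigned long val;

	assert(str != NULL && devt != NULL);

	/* Reject empty input, leading whitespace and signs up front. */
	if (!isdigit((unsigned char)str[0]))
		return -EINVAL;

	errno = 0;
	val = strtoul(str, &end, 10);
	if (*end != '\0' || errno == ERANGE)
		return -EINVAL;

	*devt = val;
	return 0;
}
```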
2023-06-21nfsd: remove redundant assignments to variable lenColin Ian King1-7/+5
There are a few assignments to variable len where the value is not being read and so the assignments are redundant and can be removed. In one case, the variable len can be removed completely. Cleans up 4 clang scan warnings of the form: fs/nfsd/export.c:100:7: warning: Although the value stored to 'len' is used in the enclosing expression, the value is never actually read from 'len' [deadcode.DeadStores] Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
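The pattern that clang's deadcode.DeadStores checker reports here can be shown in a small sketch. The function names are hypothetical and the code is not taken from fs/nfsd/export.c. It only shows the shape of the warning: the value of the assignment feeds the comparison, but the variable itself is never read afterwards, so the store can be dropped without changing behaviour.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length of the leading space-delimited field of @s. */
static int field_len(const char *s)
{
	assert(s != NULL);
	return (int)strcspn(s, " ");
}

/*
 * Before: 'len' receives a value that only feeds the comparison and is
 * never read again, which clang's deadcode.DeadStores checker reports.
 */
static int check_before(const char *s)
{
	int len;

	if ((len = field_len(s)) <= 0)
		return -1;
	return 0;
}

/* After: same behaviour, with no dead store and no 'len' variable at all. */
static int check_after(const char *s)
{
	if (field_len(s) <= 0)
		return -1;
	return 0;
}
```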
2023-06-21cgroup: remove obsolete comment on cgroup_on_dfl()Miaohe Lin1-2/+0
The debug feature has been supported since commit 8cc38fa7fa31 ("cgroup: make debug an implicit controller on cgroup2"), so update the corresponding comment. Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-06-21wifi: rtlwifi: cleanup USB interfaceDmitry Antipov2-27/+6
Drop the unused '_usb_writen_sync()' and the relevant pointer from 'struct rtl_io', handle a possible write error in '_usb_write_async()', and adjust related code. Signed-off-by: Dmitry Antipov <[email protected]> Acked-by: Ping-Ke Shih <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-21wifi: rtlwifi: simplify LED managementDmitry Antipov40-391/+177
Introduce 'rtl_init_sw_leds()' to replace per-chip LED initialization code (and so drop 'struct rtl_led' as no longer used), drop the 'init_sw_leds' and 'deinit_sw_leds' fields from 'struct rtl_hal_ops', and adjust related code. Signed-off-by: Dmitry Antipov <[email protected]> Acked-by: Ping-Ke Shih <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected]