aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-09-20net: dsa: allow masters to join a LAGVladimir Oltean5-10/+298
There are 2 ways in which a DSA user port may become handled by 2 CPU ports in a LAG: (1) its current DSA master joins a LAG ip link del bond0 && ip link add bond0 type bond mode 802.3ad ip link set eno2 master bond0 When this happens, all user ports with "eno2" as DSA master get automatically migrated to "bond0" as DSA master. (2) it is explicitly configured as such by the user # Before, the DSA master was eno3 ip link set swp0 type dsa master bond0 The design of this configuration is that the LAG device dynamically becomes a DSA master through dsa_master_setup() when the first physical DSA master becomes a LAG slave, and stops being so through dsa_master_teardown() when the last physical DSA master leaves. A LAG interface is considered as a valid DSA master only if it contains existing DSA masters, and no other lower interfaces. Therefore, we mainly rely on method (1) to enter this configuration. Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when it becomes a standalone DSA master again. But the LAG master also has a dev->dsa_ptr, and this is actually duplicated from one of the physical LAG slaves, and therefore needs to be balanced when LAG slaves come and go. To the switch driver, putting DSA masters in a LAG is seen as putting their associated CPU ports in a LAG. We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG, by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: propagate extack to port_lag_joinVladimir Oltean3-2/+4
Drivers could refuse to offload a LAG configuration for a variety of reasons, mainly having to do with its TX type. Additionally, since DSA masters may now also be LAG interfaces, and this will translate into a call to port_lag_join on the CPU ports, there may be extra restrictions there. Propagate the netlink extack to this DSA method in order for drivers to give a meaningful error message back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: suppress device links to LAG DSA mastersVladimir Oltean1-6/+8
These don't work (print a harmless error about the operation failing) and make little sense to have anyway, because when a LAG DSA master goes away, we will introduce logic to move our CPU port back to the first physical DSA master. So suppress these device links in preparation for adding support for LAG DSA masters. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: suppress appending ethtool stats to LAG DSA mastersVladimir Oltean1-0/+9
Similar to the discussion about tracking the admin/oper state of LAG DSA masters, we have the problem here that struct dsa_port *cpu_dp caches a single pair of orig_ethtool_ops and netdev_ops pointers. So if we call dsa_master_setup(bond0, cpu_dp) where cpu_dp is also the dev->dsa_ptr of one of the physical DSA masters, we'd effectively overwrite what we cached from that physical netdev with what replaced from the bonding interface. We don't need DSA ethtool stats on the bonding interface when used as DSA master, it's good enough to have them just on the physical DSA masters, so suppress this logic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: don't keep track of admin/oper state on LAG DSA mastersVladimir Oltean1-0/+12
We store information about the DSA master's state in cpu_dp->master_admin_up and cpu_dp->master_oper_up, and this assumes a bijective association between a CPU port and a DSA master. However, when we have CPU ports in a LAG (and DSA masters in a LAG too), the way in which we set up things is that the physical DSA masters still have dev->dsa_ptr pointing to our cpu_dp, but the bonding/team device itself also has its dev->dsa_ptr pointing towards one of the CPU port structures (the first one). So logically speaking, that first cpu_dp can't keep track of both the physical master's admin/oper state, and of the bonding master's state. This isn't even needed; the reason why we keep track of the DSA master's state is to know when it is available for Ethernet-based register access. For that use case, we don't even need LAG; we just need to decide upon one of the physical DSA masters (if there is more than 1 available) and use that. This change suppresses dsa_tree_master_{admin,oper}_state_change() calls on LAG DSA masters (which will be supported in a future change), to allow the tracking of just physical DSA masters. Link: https://lore.kernel.org/netdev/628cc94d.1c69fb81.15b0d.422d@mx.google.com/ Suggested-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: allow the DSA master to be seen and changed through rtnetlinkVladimir Oltean7-1/+354
Some DSA switches have multiple CPU ports, which can be used to improve CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(), sets up only the first one, leading to suboptimal use of hardware. The desire is to not change the default configuration but to permit the user to create a dynamic mapping between individual user ports and the CPU port that they are served by, configurable through rtnetlink. It is also intended to permit load balancing between CPU ports, and in that case, the foreseen model is for the DSA master to be a bonding interface whose lowers are the physical DSA masters. To that end, we create a struct rtnl_link_ops for DSA user ports with the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that contains the ifindex of the newly desired DSA master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: introduce dsa_port_get_master()Vladimir Oltean5-27/+26
There is a desire to support for DSA masters in a LAG. That configuration is intended to work by simply enslaving the master to a bonding/team device. But the physical DSA master (the LAG slave) still has a dev->dsa_ptr, and that cpu_dp still corresponds to the physical CPU port. However, we would like to be able to retrieve the LAG that's the upper of the physical DSA master. In preparation for that, introduce a helper called dsa_port_get_master() that replaces all occurrences of the dp->cpu_dp->master pattern. The distinction between LAG and non-LAG will be made later within the helper itself. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20flow_offload: Introduce flow_match_l2tpv3Wojciech Drewek1-0/+7
Allow to offload L2TPv3 filters by adding flow_rule_match_l2tpv3. Drivers can extract L2TPv3 specific fields from now on. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net/sched: flower: Add L2TPv3 filterWojciech Drewek1-0/+16
Add support for matching on L2TPv3 session ID. Session ID can be specified only when ip proto was set to IPPROTO_L2TP. Example filter: # tc filter add dev $PF1 ingress prio 1 protocol ip \ flower \ ip_proto l2tp \ l2tpv3_sid 1234 \ skip_sw \ action mirred egress redirect dev $VF1_PR Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20flow_dissector: Add L2TPv3 dissectorsWojciech Drewek1-0/+28
Allow to dissect L2TPv3 specific field which is: - session ID (32 bits) L2TPv3 might be transported over IP or over UDP, this implementation is only about L2TPv3 over IP. IP protocol carries L2TPv3 when ip_proto is IPPROTO_L2TP (115). Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-19openvswitch: Change the return type for vport_ops.send function hook to intNathan Huckleberry2-2/+2
All usages of the vport_ops struct have the .send field set to dev_queue_xmit or internal_dev_recv. Since most usages are set to dev_queue_xmit, the function hook should match the signature of dev_queue_xmit. The only call to vport_ops->send() is in net/openvswitch/vport.c and it throws away the return value. This mismatched return type breaks forward edge kCFI since the underlying function definition does not match the function hook definition. Reported-by: Dan Carpenter <error27@gmail.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1703 Cc: llvm@lists.linux.dev Signed-off-by: Nathan Huckleberry <nhuck@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20220913230739.228313-1-nhuck@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19Merge tag 'batadv-next-pullrequest-20220916' of ↵Jakub Kicinski4-43/+1
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - drop unused headers in trace.h, by Sven Eckelmann - drop initialization of flexible ethtool_link_ksettings, by Sven Eckelmann - remove unused struct definitions, by Marek Lindner * tag 'batadv-next-pullrequest-20220916' of git://git.open-mesh.org/linux-merge: batman-adv: remove unused struct definitions batman-adv: Drop initialization of flexible ethtool_link_ksettings batman-adv: Drop unused headers in trace.h batman-adv: Start new development cycle ==================== Link: https://lore.kernel.org/r/20220916161454.1413154-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19Merge tag 'batadv-net-pullrequest-20220916' of ↵Jakub Kicinski1-0/+4
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Fix hang up with small MTU hard-interface, by Shigeru Yoshida * tag 'batadv-net-pullrequest-20220916' of git://git.open-mesh.org/linux-merge: batman-adv: Fix hang up with small MTU hard-interface ==================== Link: https://lore.kernel.org/r/20220916160931.1412407-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19net: rds: add missing __init/__exit annotations to module init/exit funcsXiu Jianfeng3-4/+4
Add missing __init/__exit annotations to module init/exit funcs. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Link: https://lore.kernel.org/r/20220909091840.247946-1-xiujianfeng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19rxrpc: remove rxrpc_max_call_lifetime declarationGaosheng Cui1-1/+0
rxrpc_max_call_lifetime has been removed since commit a158bdd3247b ("rxrpc: Fix call timeouts"), so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://lore.kernel.org/r/20220909064042.1149404-1-cuigaosheng1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-16Merge tag 'linux-can-next-for-6.1-20220915' of ↵David S. Miller6-66/+111
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== Sept. 15, 2022, 8:19 a.m. UTC Hello Jakub, hello David, this is a pull request of 23 patches for net-next/master. the first 2 patches are by me and fix a typo in the rx-offload helper and the flexcan driver. Christophe JAILLET's patch cleans up the error handling in rcar_canfd driver's probe function. Kenneth Lee's patch converts the kvaser_usb driver from kcalloc() to kzalloc(). Biju Das contributes 2 patches to the sja1000 driver which update the DT bindings and support for the RZ/N1 SJA1000 CAN controller. Jinpeng Cui provides 2 patches that remove redundant variables from the sja1000 and kvaser_pciefd driver. 2 patches by John Whittington and me add hardware timestamp support to the gs_usb driver. Gustavo A. R. Silva's patch converts the etas_es58x driver to make use of DECLARE_FLEX_ARRAY(). Krzysztof Kozlowski's patch cleans up the sja1000 DT bindings. Dario Binacchi fixes his invalid email in the flexcan driver documentation. Ziyang Xuan contributes 2 patches that clean up the CAN RAW protocol. Yang Yingliang's patch switches the flexcan driver to dev_err_probe(). The last 7 patches are by Oliver Hartkopp and add support for the next generation of the CAN protocol: CAN with eXtended data Length (CAN XL). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16tcp: Use WARN_ON_ONCE() in tcp_read_skb()Peilin Ye1-1/+1
Prevent tcp_read_skb() from flooding the syslog. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net/ieee802154: fix uninit value bug in dgram_sendmsgHaimin Zhang1-19/+23
There is uninit value bug in dgram_sendmsg function in net/ieee802154/socket.c when the length of valid data pointed by the msg->msg_name isn't verified. We introducing a helper function ieee802154_sockaddr_check_size to check namelen. First we check there is addr_type in ieee802154_addr_sa. Then, we check namelen according to addr_type. Also fixed in raw_bind, dgram_bind, dgram_connect. Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16vsock/vmci: fix repeated words in commentsJilin Yuan1-1/+1
Delete the redundant word 'that'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16rtnetlink: advertise allmulti counterNicolas Dichtel1-0/+3
Like what was done with IFLA_PROMISCUITY, add IFLA_ALLMULTI to advertise the allmulti counter. The flag IFF_ALLMULTI is advertised only if it was directly set by a userland app. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-15mptcp: account memory allocation in mptcp_nl_cmd_add_addr() to userThomas Haller1-1/+1
Now that non-root users can configure MPTCP endpoints, account the memory allocation to the user. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-15mptcp: allow privileged operations from user namespacesThomas Haller1-9/+9
GENL_ADMIN_PERM checks that the user has CAP_NET_ADMIN in the initial namespace by calling netlink_capable(). Instead, use GENL_UNS_ADMIN_PERM which uses netlink_ns_capable(). This checks that the caller has CAP_NET_ADMIN in the current user namespace. See also commit 4a92602aa1cd ("openvswitch: allow management from inside user namespaces") which introduced this mechanism. See also commit 5617c6cd6f84 ("nl80211: Allow privileged operations from user namespaces") which introduced this for nl80211. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-15mptcp: add do_check_data_fin to replace copiedGeliang Tang1-3/+4
This patch adds a new bool variable 'do_check_data_fin' to replace the original int variable 'copied' in __mptcp_push_pending(), check it to determine whether to call __mptcp_check_send_data_fin(). Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-15mptcp: add mptcp_for_each_subflow_safe helperMatthieu Baerts3-4/+6
Similar to mptcp_for_each_subflow(): this is clearer now that the _safe version is used in multiple places. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-15can: raw: add CAN XL supportOliver Hartkopp1-12/+43
Enable CAN_RAW sockets to read and write CAN XL frames analogue to the CAN FD extension (new CAN_RAW_XL_FRAMES sockopt). A CAN XL network interface is capable to handle Classical CAN, CAN FD and CAN XL frames. When CAN_RAW_XL_FRAMES is enabled, the CAN_RAW socket checks whether the addressed CAN network interface is capable to handle the provided CAN frame. In opposite to the fixed number of bytes for - CAN frames (CAN_MTU = sizeof(struct can_frame)) - CAN FD frames (CANFD_MTU = sizeof(struct can_frame)) the number of bytes when reading/writing CAN XL frames depends on the number of data bytes. For efficiency reasons the length of the struct canxl_frame is truncated to the needed size for read/write operations. This leads to a calculated size of CANXL_HDR_SIZE + canxl_frame::len which is enforced on write() operations and guaranteed on read() operations. NB: Valid length values are 1 .. 2048 (CANXL_MIN_DLEN .. CANXL_MAX_DLEN). Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-8-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-15can: canxl: update CAN infrastructure for CAN XL framesOliver Hartkopp1-1/+24
- add new ETH_P_CANXL ethernet protocol type - update skb checks for CAN XL - add alloc_canxl_skb() which now needs a data length parameter - introduce init_can_skb_reserve() to reduce code duplication Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-6-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-15can: set CANFD_FDF flag in all CAN FD frame structuresOliver Hartkopp1-0/+5
To simplify the testing in user space all struct canfd_frame's provided by the CAN subsystem of the Linux kernel now have the CANFD_FDF flag set in canfd_frame::flags. NB: Handcrafted ETH_P_CANFD frames introduced via PF_PACKET socket might not set this bit correctly. During the check for sufficient headroom in PF_PACKET sk_buffs the uninitialized CAN sk_buff data structures are filled. In the case of a CAN FD frame the CANFD_FDF flag is set accordingly. As the CAN frame content is already zero initialized in alloc_canfd_skb() the obsolete initialization of cf->flags in the CTU CAN FD driver has been removed as it would overwrite the already set CANFD_FDF flag. Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-4-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-15can: skb: unify skb CAN frame identification helpersOliver Hartkopp5-45/+24
Replace open coded checks for sk_buffs containing Classical CAN and CAN FD frame structures as a preparation for CAN XL support. With the added length check the unintended processing of CAN XL frames having the CANXL_XLF bit set can be suppressed even when the skb->len fits to non CAN XL frames. The CAN_RAW socket needs a rework to use these helpers. Therefore the use of these helpers is postponed to the CAN_RAW CAN XL integration. The J1939 protocol gets a check for Classical CAN frames too. Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-2-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-15batman-adv: remove unused struct definitionsMarek Lindner1-39/+0
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-09-13mptcp: fix fwd memory accounting on coalescePaolo Abeni1-1/+7
The intel bot reported a memory accounting related splat: [ 240.473094] ------------[ cut here ]------------ [ 240.478507] page_counter underflow: -4294828518 nr_pages=4294967290 [ 240.485500] WARNING: CPU: 2 PID: 14986 at mm/page_counter.c:56 page_counter_cancel+0x96/0xc0 [ 240.570849] CPU: 2 PID: 14986 Comm: mptcp_connect Tainted: G S 5.19.0-rc4-00739-gd24141fe7b48 #1 [ 240.581637] Hardware name: HP HP Z240 SFF Workstation/802E, BIOS N51 Ver. 01.63 10/05/2017 [ 240.590600] RIP: 0010:page_counter_cancel+0x96/0xc0 [ 240.596179] Code: 00 00 00 45 31 c0 48 89 ef 5d 4c 89 c6 41 5c e9 40 fd ff ff 4c 89 e2 48 c7 c7 20 73 39 84 c6 05 d5 b1 52 04 01 e8 e7 95 f3 01 <0f> 0b eb a9 48 89 ef e8 1e 25 fc ff eb c3 66 66 2e 0f 1f 84 00 00 [ 240.615639] RSP: 0018:ffffc9000496f7c8 EFLAGS: 00010082 [ 240.621569] RAX: 0000000000000000 RBX: ffff88819c9c0120 RCX: 0000000000000000 [ 240.629404] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff5200092deeb [ 240.637239] RBP: ffff88819c9c0120 R08: 0000000000000001 R09: ffff888366527a2b [ 240.645069] R10: ffffed106cca4f45 R11: 0000000000000001 R12: 00000000fffffffa [ 240.652903] R13: ffff888366536118 R14: 00000000fffffffa R15: ffff88819c9c0000 [ 240.660738] FS: 00007f3786e72540(0000) GS:ffff888366500000(0000) knlGS:0000000000000000 [ 240.669529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 240.675974] CR2: 00007f966b346000 CR3: 0000000168cea002 CR4: 00000000003706e0 [ 240.683807] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 240.691641] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 240.699468] Call Trace: [ 240.702613] <TASK> [ 240.705413] page_counter_uncharge+0x29/0x80 [ 240.710389] drain_stock+0xd0/0x180 [ 240.714585] refill_stock+0x278/0x580 [ 240.718951] __sk_mem_reduce_allocated+0x222/0x5c0 [ 240.729248] __mptcp_update_rmem+0x235/0x2c0 [ 240.734228] __mptcp_move_skbs+0x194/0x6c0 [ 240.749764] mptcp_recvmsg+0xdfa/0x1340 [ 240.763153] inet_recvmsg+0x37f/0x500 [ 240.782109] sock_read_iter+0x24a/0x380 [ 240.805353] new_sync_read+0x420/0x540 [ 240.838552] vfs_read+0x37f/0x4c0 [ 240.842582] ksys_read+0x170/0x200 [ 240.864039] do_syscall_64+0x5c/0x80 [ 240.872770] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 240.878526] RIP: 0033:0x7f3786d9ae8e [ 240.882805] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 6e 18 0a 00 e8 89 e8 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28 [ 240.902259] RSP: 002b:00007fff7be81e08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 240.910533] RAX: ffffffffffffffda RBX: 0000000000002000 RCX: 00007f3786d9ae8e [ 240.918368] RDX: 0000000000002000 RSI: 00007fff7be87ec0 RDI: 0000000000000005 [ 240.926206] RBP: 0000000000000005 R08: 00007f3786e6a230 R09: 00007f3786e6a240 [ 240.934046] R10: fffffffffffff288 R11: 0000000000000246 R12: 0000000000002000 [ 240.941884] R13: 00007fff7be87ec0 R14: 00007fff7be87ec0 R15: 0000000000002000 [ 240.949741] </TASK> [ 240.952632] irq event stamp: 27367 [ 240.956735] hardirqs last enabled at (27366): [<ffffffff81ba50ea>] mem_cgroup_uncharge_skmem+0x6a/0x80 [ 240.966848] hardirqs last disabled at (27367): [<ffffffff81b8fd42>] refill_stock+0x282/0x580 [ 240.976017] softirqs last enabled at (27360): [<ffffffff83a4d8ef>] mptcp_recvmsg+0xaf/0x1340 [ 240.985273] softirqs last disabled at (27364): [<ffffffff83a4d30c>] __mptcp_move_skbs+0x18c/0x6c0 [ 240.994872] ---[ end trace 0000000000000000 ]--- After commit d24141fe7b48 ("mptcp: drop SK_RECLAIM_* macros"), if rmem_fwd_alloc become negative, mptcp_rmem_uncharge() can try to reclaim a negative amount of pages, since the expression: reclaimable >= PAGE_SIZE will evaluate to true for any negative value of the int 'reclaimable': 'PAGE_SIZE' is an unsigned long and the negative integer will be promoted to a (very large) unsigned long value. Still after the mentioned commit, kfree_skb_partial() in mptcp_try_coalesce() will reclaim most of just released fwd memory, so that following charging of the skb delta size will lead to negative fwd memory values. At that point a racing recvmsg() can trigger the splat. Address the issue switching the order of the memory accounting operations. The fwd memory can still transiently reach negative values, but that will happen in an atomic scope and no code path could touch/use such value. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: d24141fe7b48 ("mptcp: drop SK_RECLAIM_* macros") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20220906180404.1255873-1-matthieu.baerts@tessares.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-12Merge tag 'nfs-for-5.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2-4/+7
Pull NFS client bugfixes from Trond Myklebust: - Fix SUNRPC call completion races with call_decode() that trigger a WARN_ON() - NFSv4.0 cannot support open-by-filehandle and NFS re-export - Revert "SUNRPC: Remove unreachable error condition" to allow handling of error conditions - Update suid/sgid mode bits after ALLOCATE and DEALLOCATE * tag 'nfs-for-5.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: Revert "SUNRPC: Remove unreachable error condition" NFSv4.2: Update mode bits after ALLOCATE and DEALLOCATE NFSv4: Turn off open-by-filehandle and NFS re-export for NFSv4.0 SUNRPC: Fix call completion races with call_decode()
2022-09-09net: core: fix flow symmetric hashLudovic Cintrat1-3/+2
__flow_hash_consistentify() wrongly swaps ipv4 addresses in few cases. This function is indirectly used by __skb_get_hash_symmetric(), which is used to fanout packets in AF_PACKET. Intrusion detection systems may be impacted by this issue. __flow_hash_consistentify() computes the addresses difference then swaps them if the difference is negative. In few cases src - dst and dst - src are both negative. The following snippet mimics __flow_hash_consistentify(): ``` #include <stdio.h> #include <stdint.h> int main(int argc, char** argv) { int diffs_d, diffd_s; uint32_t dst = 0xb225a8c0; /* 178.37.168.192 --> 192.168.37.178 */ uint32_t src = 0x3225a8c0; /* 50.37.168.192 --> 192.168.37.50 */ uint32_t dst2 = 0x3325a8c0; /* 51.37.168.192 --> 192.168.37.51 */ diffs_d = src - dst; diffd_s = dst - src; printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n", src, dst, diffs_d, diffs_d, diffd_s, diffd_s); diffs_d = src - dst2; diffd_s = dst2 - src; printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n", src, dst2, diffs_d, diffs_d, diffd_s, diffd_s); return 0; } ``` Results: src:3225a8c0 dst:b225a8c0, \ diff(s-d)=-2147483648(0x80000000) \ diff(d-s)=-2147483648(0x80000000) src:3225a8c0 dst:3325a8c0, \ diff(s-d)=-16777216(0xff000000) \ diff(d-s)=16777216(0x1000000) In the first case the addresses differences are always < 0, therefore __flow_hash_consistentify() always swaps, thus dst->src and src->dst packets have differents hashes. Fixes: c3f8324188fa8 ("net: Add full IPv6 addresses to flow_keys") Signed-off-by: Ludovic Cintrat <ludovic.cintrat@gatewatcher.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: openvswitch: fix repeated words in commentsJilin Yuan1-1/+1
Delete the redundant word 'is'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller3-9/+33
Florian Westhal says: ==================== netfilter: bugfixes for net The following set contains four netfilter patches for your *net* tree. When there are multiple Contact headers in a SIP message its possible the next headers won't be found because the SIP helper confuses relative and absolute offsets in the message. From Igor Ryzhov. Make the nft_concat_range self-test support socat, this makes the selftest pass on my test VM, from myself. nf_conntrack_irc helper can be tricked into opening a local port forward that the client never requested by embedding a DCC message in a PING request sent to the client. Fix from David Leadbeater. Both have been broken since the kernel 2.6.x days. The 'osf' match might indicate success while it could not find anything, broken since 5.2 . Fix from Pablo Neira. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_vlan: get rid of tcf_vlan_walker and tcf_vlan_searchZhengchao Shao1-19/+0
tcf_vlan_walker() and tcf_vlan_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_tunnel_key: get rid of tunnel_key_walker and tunnel_key_searchZhengchao Shao1-19/+0
tunnel_key_walker() and tunnel_key_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_skbmod: get rid of tcf_skbmod_walker and tcf_skbmod_searchZhengchao Shao1-19/+0
tcf_skbmod_walker() and tcf_skbmod_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_skbedit: get rid of tcf_skbedit_walker and tcf_skbedit_searchZhengchao Shao1-19/+0
tcf_skbedit_walker() and tcf_skbedit_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_simple: get rid of tcf_simp_walker and tcf_simp_searchZhengchao Shao1-19/+0
tcf_simp_walker() and tcf_simp_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_sample: get rid of tcf_sample_walker and tcf_sample_searchZhengchao Shao1-19/+0
tcf_sample_walker() and tcf_sample_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_police: get rid of tcf_police_walker and tcf_police_searchZhengchao Shao1-19/+0
tcf_police_walker() and tcf_police_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_pedit: get rid of tcf_pedit_walker and tcf_pedit_searchZhengchao Shao1-19/+0
tcf_pedit_walker() and tcf_pedit_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_nat: get rid of tcf_nat_walker and tcf_nat_searchZhengchao Shao1-19/+0
tcf_nat_walker() and tcf_nat_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_mpls: get rid of tcf_mpls_walker and tcf_mpls_searchZhengchao Shao1-19/+0
tcf_mpls_walker() and tcf_mpls_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_mirred: get rid of tcf_mirred_walker and tcf_mirred_searchZhengchao Shao1-19/+0
tcf_mirred_walker() and tcf_mirred_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_ipt: get rid of tcf_ipt_walker/tcf_xt_walker and ↵Zhengchao Shao1-38/+0
tcf_ipt_search/tcf_xt_search tcf_ipt_walker()/tcf_xt_walker() and tcf_ipt_search()/tcf_xt_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_ife: get rid of tcf_ife_walker and tcf_ife_searchZhengchao Shao1-19/+0
tcf_ife_walker() and tcf_ife_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_gate: get rid of tcf_gate_walker and tcf_gate_searchZhengchao Shao1-19/+0
tcf_gate_walker() and tcf_gate_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_gact: get rid of tcf_gact_walker and tcf_gact_searchZhengchao Shao1-19/+0
tcf_gact_walker() and tcf_gact_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_ctinfo: get rid of tcf_ctinfo_walker and tcf_ctinfo_searchZhengchao Shao1-19/+0
tcf_ctinfo_walker() and tcf_ctinfo_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>