aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-03-30crypto/chcr: fix incorrect ipv6 packet lengthRohit Maheshwari1-1/+1
IPv6 header's payload length field shouldn't include IPv6 header length. Signed-off-by: Rohit Maheshwari <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-30net: stmmac: Add support for VLAN Rx filteringWong Vee Khee5-0/+245
Add support for VLAN ID-based filtering by the MAC controller for MAC drivers that support it. Only the 12-bit VID field is used. Signed-off-by: Chuah Kim Tatt <[email protected]> Signed-off-by: Ong Boon Leong <[email protected]> Signed-off-by: Wong Vee Khee <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-30Merge branch 'crypto-chelsio-Fixes-issues-during-chcr-driver-registration'David S. Miller2-8/+28
Ayush Sawal says: ==================== crypto: chelsio: Fixes issues during chcr driver registration Patch 1: Avoid the accessing of wrong u_ctx pointer. Patch 2: Fixes a deadlock between rtnl_lock and uld_mutex. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-30Crypto: chelsio - Fixes a deadlock between rtnl_lock and uld_mutexAyush Sawal2-6/+28
The locks are taken in this order during driver registration (uld_mutex), at: cxgb4_register_uld.part.14+0x49/0xd60 [cxgb4] (rtnl_mutex), at: rtnetlink_rcv_msg+0x2db/0x400 (uld_mutex), at: cxgb_up+0x3a/0x7b0 [cxgb4] (rtnl_mutex), at: chcr_add_xfrmops+0x83/0xa0 [chcr](stucked here) To avoid this now the netdev features are updated after the cxgb4_register_uld function is completed. Fixes: 6dad4e8ab3ec6 ("chcr: Add support for Inline IPSec"). Signed-off-by: Ayush Sawal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-30Crypto: chelsio - Fixes a hang issue during driver registrationAyush Sawal1-2/+0
This issue occurs only when multiadapters are present. Hang happens because assign_chcr_device returns u_ctx pointer of adapter which is not yet initialized as for this adapter cxgb_up is not been called yet. The last_dev pointer is used to determine u_ctx pointer and it is initialized two times in chcr_uld_add in chcr_dev_add respectively. The fix here is don't initialize the last_dev pointer during chcr_uld_add. Only assign to value to it when the adapter's initialization is completed i.e in chcr_dev_add. Fixes: fef4912b66d62 ("crypto: chelsio - Handle PCI shutdown event"). Signed-off-by: Ayush Sawal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-30selftests:mptcp: fix failure due to whitespace damageMatthieu Baerts2-8/+8
'pm_nl_ctl' was adding a trailing whitespace after having printed the IP. But at the end, the IP element is currently always the last one. The bash script launching 'pm_nl_ctl' had trailing whitespaces in the expected result on purpose. But these whitespaces have been removed when the patch has been applied upstream. To avoid trailing whitespaces in the bash code, 'pm_nl_ctl' and expected results have now been adapted. The MPTCP PM selftest can now pass again. Fixes: eedbc685321b (selftests: add PM netlink functional tests) Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-30net: ethernet: ti: fix spelling mistake "rundom" -> "random"Colin Ian King1-1/+1
There is a spelling mistake in a dev_err error message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-30Merge tag 'mlx5-updates-2020-03-29' of ↵David S. Miller16-86/+153
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-03-29 1) mlx5 core level updates for mkey APIs + migrate some code to mlx5_ib 2) Use a separate work queue for fib event handling 3) Move new eswitch chains files to a new directory, to prepare for upcoming E-Switch and offloads features. 4) Support indr block setup (TC_SETUP_FT) in Flow Table mode. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-29net/mlx5e: add mlx5e_rep_indr_setup_ft_cb supportwenxu1-0/+49
Add mlx5e_rep_indr_setup_ft_cb to support indr block setup in FT mode. Both tc rules and flow table rules are of the same format, It can re-use tc parsing for that, and move the flow table rules to their steering domain(the specific chain_index), the indr block offload in FT also follow this scenario. Signed-off-by: wenxu <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-03-29net/mlx5e: refactor indr setup blockwenxu1-21/+21
Refactor indr setup block for support ft indr setup in the next patch. The function mlx5e_rep_indr_offload exposes 'flags' in order set additional flag for FT in next patch. Rename mlx5e_rep_indr_setup_tc_block to mlx5e_rep_indr_setup_block and add flow_setup_cb_t callback parameters in order set the specific callback for FT in next patch. Signed-off-by: wenxu <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-03-29net/mlx5: E-Switch: Move eswitch chains to a new directorySaeed Mahameed8-6/+10
eswitch_offloads_chains.{c,h} were just introduced this kernel release cycle, eswitch is in high development demand right now and many features are planned to be added to it. eswitch deserves its own directory and here we move these new files to there, in preparation for upcoming eswitch features and new files. Signed-off-by: Saeed Mahameed <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Reviewed-by: Roi Dayan <[email protected]>
2020-03-29net/mlx5: Use a separate work queue for fib event handlingMark Zhang2-4/+11
In VF lag mode when remove the bonding module without bring down the bond device first, we could potentially have circular dependency when we unload IB devices and also handle fib events: 1. The bond work starts first; 2. The "modprobe -rv bonding" process tries to release the bond device, with the "pernet_ops_rwsem" lock hold; 3. The bond work blocks in unregister_netdevice_notifier() and waits for the lock because fib event came right before; 4. The kernel fib module tries to free all the fib entries by broadcasting the "FIB_EVENT_NH_DEL" event; 5. Upon the fib event this lag_mp module holds the fib lock and queue a fib work. So: bond work -> modprobe task -> kernel fib module -> lag_mp -> bond work Today we either reload IB devices in roce lag in nic mode or either handle fib events in switchdev mode, but a new feature could change that we'll need to reload IB devices also in switchdev mode so this is a future proof fix as one may not notice this later. Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-03-29Merge branch 'mlx5-next' of ↵Saeed Mahameed6-55/+62
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: mlx5: Remove uninitialized use of key in mlx5_core_create_mkey {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib {IB,net}/mlx5: Assign mkey variant in mlx5_ib only {IB,net}/mlx5: Setup mkey variant before mr create command invocation Signed-off-by: Saeed Mahameed <[email protected]>
2020-03-29Merge branch 'ethtool-netlink-interface-part-4'David S. Miller17-31/+1350
Michal Kubecek says: ==================== ethtool netlink interface, part 4 Implementation of more netlink request types: - coalescing (ethtool -c/-C, patches 2-4) - pause parameters (ethtool -a/-A, patches 5-7) - EEE settings (--show-eee / --set-eee, patches 8-10) - timestamping info (-T, patches 11-12) Patch 1 is a fix for netdev reference leak similar to commit 2f599ec422ad ("ethtool: fix reference leak in some *_SET handlers") but fixing a code Changes in v3 - change "one-step-*" Tx type names to "onestep-*", (patch 11, suggested by Richard Cochran - use "TSINFO" rather than "TIMESTAMP" for timestamping information constants and adjust symbol names (patch 12, suggested by Richard Cochran) Changes in v2: - fix compiler warning in net_hwtstamp_validate() (patch 11) - fix follow-up lines alignment (whitespace only, patches 3 and 8) which is only in net-next tree at the moment. ==================== Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: provide timestamping information with TSINFO_GET requestMichal Kubecek9-21/+225
Implement TSINFO_GET request to get timestamping information for a network device. This is traditionally available via ETHTOOL_GET_TS_INFO ioctl request. Move part of ethtool_get_ts_info() into common.c so that ioctl and netlink code use the same logic to get timestamping information from the device. v3: use "TSINFO" rather than "TIMESTAMP", suggested by Richard Cochran Signed-off-by: Michal Kubecek <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: add timestamping related string setsMichal Kubecek6-0/+87
Add three string sets related to timestamping information: ETH_SS_SOF_TIMESTAMPING: SOF_TIMESTAMPING_* flags ETH_SS_TS_TX_TYPES: timestamping Tx types ETH_SS_TS_RX_FILTERS: timestamping Rx filters These will be used for TIMESTAMP_GET request. v2: avoid compiler warning ("enumeration value not handled in switch") in net_hwtstamp_validate() v3: omit dash in Tx type names ("one-step-*" -> "onestep-*"), suggested by Richard Cochran Signed-off-by: Michal Kubecek <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: add EEE_NTF notificationMichal Kubecek5-1/+12
Send ETHTOOL_MSG_EEE_NTF notification whenever EEE settings of a network device are modified using ETHTOOL_MSG_EEE_SET netlink message or ETHTOOL_SEEE ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: set EEE settings with EEE_SET requestMichal Kubecek5-1/+104
Implement EEE_SET netlink request to set EEE settings of a network device. These are traditionally set with ETHTOOL_SEEE ioctl request. The netlink interface allows setting the EEE status for all link modes supported by kernel but only first 32 link modes can be set at the moment as only those are supported by the ethtool_ops callback. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: provide EEE settings with EEE_GET requestMichal Kubecek6-2/+192
Implement EEE_GET request to get EEE settings of a network device. These are traditionally available via ETHTOOL_GEEE ioctl request. The netlink interface allows reporting EEE status for all link modes supported by kernel but only first 32 link modes are provided at the moment as only those are reported by the ethtool_ops callback and drivers. v2: fix alignment (whitespace only) Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: add PAUSE_NTF notificationMichal Kubecek5-1/+12
Send ETHTOOL_MSG_PAUSE_NTF notification whenever pause parameters of a network device are modified using ETHTOOL_MSG_PAUSE_SET netlink message or ETHTOOL_SPAUSEPARAM ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: set pause parameters with PAUSE_SET requestMichal Kubecek5-1/+70
Implement PAUSE_SET netlink request to set pause parameters of a network device. Thease are traditionally set with ETHTOOL_SPAUSEPARAM ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: provide pause parameters with PAUSE_GET requestMichal Kubecek6-2/+146
Implement PAUSE_GET request to get pause parameters of a network device. These are traditionally available via ETHTOOL_GPAUSEPARAM ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: add COALESCE_NTF notificationMichal Kubecek5-1/+12
Send ETHTOOL_MSG_COALESCE_NTF notification whenever coalescing parameters of a network device are modified using ETHTOOL_MSG_COALESCE_SET netlink message or ETHTOOL_SCOALESCE ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: set coalescing parameters with COALESCE_SET requestMichal Kubecek5-1/+184
Implement COALESCE_SET netlink request to set coalescing parameters of a network device. These are traditionally set with ETHTOOL_SCOALESCE ioctl request. This commit adds only support for device coalescing parameters, not per queue coalescing parameters. Like the ioctl implementation, the generic ethtool code checks if only supported parameters are modified; if not, first offending attribute is reported using extack. v2: fix alignment (whitespace only) Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: provide coalescing parameters with COALESCE_GET requestMichal Kubecek6-2/+306
Implement COALESCE_GET request to get coalescing parameters of a network device. These are traditionally available via ETHTOOL_GCOALESCE ioctl request. This commit adds only support for device coalescing parameters, not per queue coalescing parameters. Omit attributes with zero values unless they are declared as supported (i.e. the corresponding bit in ethtool_ops::supported_coalesce_params is set). Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29ethtool: fix reference leak in ethnl_set_privflags()Michal Kubecek1-1/+3
Andrew noticed that some handlers for *_SET commands leak a netdev reference if required ethtool_ops callbacks do not exist. One of them is ethnl_set_privflags(), a simple reproducer would be e.g. ip link add veth1 type veth peer name veth2 ethtool --set-priv-flags veth1 foo on ip link del veth1 Make sure dev_put() is called when ethtool_ops check fails. Fixes: f265d799596a ("ethtool: set device private flags with PRIVFLAGS_SET request") Reported-by: Andrew Lunn <[email protected]> Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29Merge branch 'ipv6-add-rpl-source-routing'David S. Miller27-31/+940
Alexander Aring says: ==================== net: ipv6: add rpl source routing This patch series will add handling for RPL source routing handling and insertion (implement as lwtunnel)! I did an example prototype implementation in rpld for using this implementation in non-storing mode: https://github.com/linux-wpan/rpld/tree/nonstoring_mode I will also present a talk at netdev about it: https://netdevconf.info/0x14/session.html?talk-extend-segment-routing-for-RPL In receive handling I add handling for IPIP encapsulation as RFC6554 describes it as possible. For reasons I didn't implemented it yet for generating such packets because I am not really sure how/when this should happen. So far I understand there exists a draft yet which describes the cases (inclusive a Hop-by-Hop option which we also not support yet). https://tools.ietf.org/html/draft-ietf-roll-useofrplinfo-35 This is just the beginning to start implementation everything for yet, step by step. It works for my use cases yet to have it running on a 6LOWPAN _only_ network. I have some patches for iproute2 as well. A sidenote: I check on local addresses if they are part of segment routes, this is just to avoid stupid settings. A use can add addresses afterwards what I cannot control anymore but then it's users fault to make such thing. The receive handling checks for this as well which is required by RFC6554, so the next hops or when it comes back should drop it anyway. To make this possible I added functionality to pass the net structure to the build_state of lwtunnel (I hope I caught all lwtunnels). Another sidenote: I set the headroom value to 0 as I figured out it will break on interfaces with IPv6 min mtu if set to non zero for tunnels on L3. - Alex changes since v3: - use parse_nested which isn't deprecated - Thanks David Ahern - change to return -1 instead errno in exthdr handling to unify error code - change function name from ipv6_rpl_srh_decompress_size to ipv6_rpl_srh_size changes since v2: - add additional segdata length in lwtunnel build_state - fix build_state patch by not catching one inline noop function if LWTUNNEL is disabled Alexander Aring (5): include: uapi: linux: add rpl sr header definition addrconf: add functionality to check on rpl requirements net: ipv6: add support for rpl sr exthdr net: add net available in build_state net: ipv6: add rpl sr tunnel ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-29net: ipv6: add rpl sr tunnelAlexander Aring8-0/+434
This patch adds functionality to configure routes for RPL source routing functionality. There is no IPIP functionality yet implemented which can be added later when the cases when to use IPv6 encapuslation comes more clear. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29net: add net available in build_stateAlexander Aring13-28/+32
The build_state callback of lwtunnel doesn't contain the net namespace structure yet. This patch will add it so we can check on specific address configuration at creation time of rpl source routes. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29net: ipv6: add support for rpl sr exthdrAlexander Aring7-3/+370
This patch adds rpl source routing receive handling. Everything works only if sysconf "rpl_seg_enabled" and source routing is enabled. Mostly the same behaviour as IPv6 segmentation routing. To handle compression and uncompression a rpl.c file is created which contains the necessary functionality. The receive handling will also care about IPv6 encapsulated so far it's specified as possible nexthdr in RFC 6554. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29addrconf: add functionality to check on rpl requirementsAlexander Aring2-0/+56
This patch adds a functionality to addrconf to check on a specific RPL address configuration. According to RFC 6554: To detect loops in the SRH, a router MUST determine if the SRH includes multiple addresses assigned to any interface on that router. If such addresses appear more than once and are separated by at least one address not assigned to that router. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29include: uapi: linux: add rpl sr header definitionAlexander Aring1-0/+48
This patch adds a uapi header for rpl struct definition. The segments data can be accessed over rpl_segaddr or rpl_segdata macros. In case of compri and compre is zero the segment data is not compressed and can be accessed by rpl_segaddr. In the other case the compressed data can be accessed by rpl_segdata and interpreted as byte array. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29Merge branch 'mptcp-multiple-subflows-path-management'David S. Miller27-125/+4158
Mat Martineau says: ==================== Multipath TCP part 3: Multiple subflows and path management v2 -> v3: Remove 'inline' in .c files, fix uapi bit macros, and rebase. v1 -> v2: Rebase on current net-next, fix for netlink limit setting, and update .gitignore for selftest. This patch set allows more than one TCP subflow to be established and used for a multipath TCP connection. Subflows are added to an existing connection using the MP_JOIN option during the 3-way handshake. With multiple TCP subflows available, sent data is now stored in the MPTCP socket so it may be retransmitted on any TCP subflow if there is no DATA_ACK before a timeout. If an MPTCP-level timeout occurs, data is retransmitted using an available subflow. Storing this sent data requires the addition of memory accounting at the MPTCP level, which was previously delegated to the single subflow. Incoming DATA_ACKs now free data from the MPTCP-level retransmit buffer. IP addresses available for new subflow connections can now be advertised and received with the ADD_ADDR option, and the corresponding REMOVE_ADDR option likewise advertises that an address is no longer available. The MPTCP path manager netlink interface has commands to set in-kernel limits for the number of concurrent subflows and control the advertisement of IP addresses between peers. To track and debug MPTCP connections there are new MPTCP MIB counters, and subflow context can be requested using inet_diag. The MPTCP self-tests now validate multiple-subflow operation and the netlink path manager interface. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-29selftests: add test-cases for MPTCP MP_JOINPaolo Abeni3-4/+383
Use the pm netlink to configure the creation of several subflows, and verify that via MIB counters. Update the mptcp_connect program to allow reliable MP_JOIN handshake even on small data file Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29selftests: add PM netlink functional testsPaolo Abeni4-3/+751
This introduces basic self-tests for the PM netlink, checking the basic APIs and possible exceptional values. Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: add netlink-based PMPaolo Abeni5-2/+928
Expose a new netlink family to userspace to control the PM, setting: - list of local addresses to be signalled. - list of local addresses used to created subflows. - maximum number of add_addr option to react When the msk is fully established, the PM netlink attempts to announce the 'signal' list via the ADD_ADDR option. Since we currently lack the ADD_ADDR echo (and related event) only the first addr is sent. After exhausting the 'announce' list, the PM tries to create subflow for each addr in 'local' list, waiting for each connection to be completed before attempting the next one. Idea is to add an additional PM hook for ADD_ADDR echo, to allow the PM netlink announcing multiple addresses, in sequence. Co-developed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: add and use MIB counter infrastructureFlorian Westphal9-15/+172
Exported via same /proc file as the Linux TCP MIB counters, so "netstat -s" or "nstat" will show them automatically. The MPTCP MIB counters are allocated in a distinct pcpu area in order to avoid bloating/wasting TCP pcpu memory. Counters are allocated once the first MPTCP socket is created in a network namespace and free'd on exit. If no sockets have been allocated, all-zero mptcp counters are shown. The MIB counter list is taken from the multipath-tcp.org kernel, but only a few counters have been picked up so far. The counter list can be increased at any time later on. v2 -> v3: - remove 'inline' in foo.c files (David S. Miller) Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: allow dumping subflow context to userspaceDavide Caratti7-1/+146
add ulp-specific diagnostic functions, so that subflow information can be dumped to userspace programs like 'ss'. v2 -> v3: - uapi: use bit macros appropriate for userspace Co-developed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: implement and use MPTCP-level retransmissionPaolo Abeni2-4/+95
On timeout event, schedule a work queue to do the retransmission. Retransmission code closely resembles the sendmsg() implementation and re-uses mptcp_sendmsg_frag, providing a dummy msghdr - for flags' sake - and peeking the relevant dfrag from the rtx head. Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: rework mptcp_sendmsg_frag to accept optional dfragPaolo Abeni1-49/+74
This will simplify mptcp-level retransmission implementation in the next patch. If dfrag is provided by the caller, skip kernel space memory allocation and use data and metadata provided by the dfrag itself. Because a peer could ack data at TCP level but refrain from sending mptcp-level ACKs, we could grow the mptcp socket backlog indefinitely. We should thus block mptcp_sendmsg until the peer has acked some of the sent data. In order to be able to do so, increment the mptcp socket wmem_queued counter on memory allocation and decrement it when releasing the memory on mptcp-level ack reception. Because TCP performns sndbuf auto-tuning up to tcp_wmem_max[2], make this the mptcp sk_sndbuf limit. In the future we could add experiment with autotuning as TCP does in tcp_sndbuf_expand(). v2 -> v3: - remove 'inline' in foo.c files (David S. Miller) Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: allow partial cleaning of rtx head dfragFlorian Westphal2-0/+26
After adding wmem accounting for the mptcp socket we could get into a situation where the mptcp socket can't transmit more data, and mptcp_clean_una doesn't reduce wmem even if snd_una has advanced because it currently will only remove entire dfrags. Allow advancing the dfrag head sequence and reduce wmem, even though this isn't correct (as we can't release the page). Because we will soon block on mptcp sk in case wmem is too large, call sk_stream_write_space() in case we reduced the backlog so userspace task blocked in sendmsg or poll will be woken up. This isn't an issue if the send buffer is large, but it is when SO_SNDBUF is used to reduce it to a lower value. Note we can still get a deadlock for low SO_SNDBUF values in case both sides of the connection write to the socket: both could be blocked due to wmem being too small -- and current mptcp stack will only increment mptcp ack_seq on recv. This doesn't happen with the selftest as it uses poll() and will always call recv if there is data to read. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: implement memory accounting for mptcp rtx queuePaolo Abeni1-3/+39
Charge the data on the rtx queue to the master MPTCP socket, too. Such memory in uncharged when the data is acked/dequeued. Also account mptcp sockets inuse via a protocol specific pcpu counter. Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: introduce MPTCP retransmission timerPaolo Abeni3-2/+93
The timer will be used to schedule retransmission. It's frequency is based on the current subflow RTO estimation and is reset on every una_seq update The timer is clearer for good by __mptcp_clear_xmit() Also clean MPTCP rtx queue before each transmission. Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: queue data for mptcp level retransmissionPaolo Abeni2-8/+147
Keep the send page fragment on an MPTCP level retransmission queue. The queue entries are allocated inside the page frag allocator, acquiring an additional reference to the page for each list entry. Also switch to a custom page frag refill function, to ensure that the current page fragment can always host an MPTCP rtx queue entry. The MPTCP rtx queue is flushed at disconnect() and close() time Note that now we need to call __mptcp_init_sock() regardless of mptcp enable status, as the destructor will try to walk the rtx_queue. v2 -> v3: - remove 'inline' in foo.c files (David S. Miller) Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: update per unacked sequence on pkt receptionPaolo Abeni3-6/+49
So that we keep per unacked sequence number consistent; since we update per msk data, use an atomic64 cmpxchg() to protect against concurrent updates from multiple subflows. Initialize the snd_una at connect()/accept() time. Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: Implement path manager interface commandsPeter Krystad3-5/+129
Fill in more path manager functionality by adding a worker function and modifying the related stub functions to schedule the worker. Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Peter Krystad <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: Add handling of outgoing MP_JOIN requestsPeter Krystad5-17/+287
Subflow creation may be initiated by the path manager when the primary connection is fully established and a remote address has been received via ADD_ADDR. Create an in-kernel sock and use kernel_connect() to initiate connection. Passive sockets can't acquire the mptcp socket lock at subflow creation time, so an additional list protected by a new spinlock is used to track the MPJ subflows. Such list is spliced into conn_list tail every time the msk socket lock is acquired, so that it will not interfere with data flow on the original connection. Data flow and connection failover not addressed by this commit. Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Co-developed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Peter Krystad <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: Add handling of incoming MP_JOIN requestsPeter Krystad8-46/+390
Process the MP_JOIN option in a SYN packet with the same flow as MP_CAPABLE but when the third ACK is received add the subflow to the MPTCP socket subflow list instead of adding it to the TCP socket accept queue. The subflow is added at the end of the subflow list so it will not interfere with the existing subflows operation and no data is expected to be transmitted on it. Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Peter Krystad <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: Add path manager interfacePeter Krystad6-19/+264
Add enough of a path manager interface to allow sending of ADD_ADDR when an incoming MPTCP connection is created. Capable of sending only a single IPv4 ADD_ADDR option. The 'pm_data' element of the connection sock will need to be expanded to handle multiple interfaces and IPv6. Partial processing of the incoming ADD_ADDR is included so the path manager notification of that event happens at the proper time, which involves validating the incoming address information. This is a skeleton interface definition for events generated by MPTCP. Co-developed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Co-developed-by: Mat Martineau <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Peter Krystad <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29mptcp: Add ADD_ADDR handlingPeter Krystad5-18/+262
Add handling for sending and receiving the ADD_ADDR, ADD_ADDR6, and RM_ADDR suboptions. Co-developed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Peter Krystad <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>