path: root/net
Age | Commit message | Author | Files | Lines
2020-05-22 | nexthop: add support for notifiers | Roopa Prabhu | 1 | -0/+27
This patch adds nexthop add/del notifiers, to be used by the vxlan driver in a later patch. They could also be used by switchdev drivers in the future. Signed-off-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-22 | vxlan: ecmp support for mac fdb entries | Roopa Prabhu | 1 | -0/+2
Today's vxlan mac fdb entries can point to multiple remote ips (rdsts) with the sole purpose of replicating broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (E-VPN multihoming is analogous to multi-homed LAG implementations, but with the inter-switch peerlink replaced with a vxlan tunnel). In other words, it needs support for mac ecmp. Furthermore, for faster convergence, E-VPN multihoming needs the ability to update fdb ecmp nexthops independently of the fdb entries. The new route nexthop API is perfect for this use case. This patch extends the vxlan fdb code to take a nexthop id pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually exclusive
- since this is a new use case and the requirement is for ecmp nexthop groups, the fdb add and update path checks that the nexthop is really an ecmp nexthop group. This check can be relaxed in the future, if we want to introduce replication fdb nexthop groups and allow their use in lieu of current rdst lists.
- fdb update requests with nexthop ids are only allowed for existing fdbs that have nexthop ids
- learning will not override an existing fdb entry with a nexthop group
- I have wrapped the switchdev offload code around the presence of rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2020-05-22 | nexthop: support for fdb ecmp nexthops | Roopa Prabhu | 2 | -25/+112
This patch introduces ecmp nexthops and nexthop groups for mac fdb entries. In subsequent patches this is used by the vxlan driver fdb entries. The use case is E-VPN multihoming [1,2,3], which requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (this is analogous to a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops referenced by fdb entries. These nexthops only have ip. This patch includes appropriate checks to avoid routes referencing such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb
$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 102 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 - fixed uninitialized variable reported by kernel test robot
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Roopa Prabhu <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
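The load-balancing idea behind an fdb nexthop group can be sketched in user-space C as follows. This is a minimal illustration, not the kernel's implementation: every `demo_*` name, the fixed-size member array, and the modulo-based selection are assumptions made for the example, since the commit message does not say how a group member is chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one fdb nexthop: an id plus a remote VTEP address. */
struct demo_nh {
	uint32_t id;
	uint32_t vtep_ip;	/* IPv4 address in host byte order */
};

/* Hypothetical model of an ecmp nexthop group such as "group 12/13". */
struct demo_nh_group {
	uint32_t id;
	size_t n;
	struct demo_nh members[4];
};

/*
 * Pick one member for a flow. Hash modulo member count is an assumption
 * of this sketch; it only shows that a given flow hash always maps to the
 * same remote VTEP while different flows can spread across members.
 */
static const struct demo_nh *demo_pick(const struct demo_nh_group *g,
				       uint32_t flow_hash)
{
	if (g->n == 0)
		return NULL;
	return &g->members[flow_hash % g->n];
}

/* Build the group from the example above: nexthops 12 and 13 in one group. */
static struct demo_nh_group demo_example_group(void)
{
	struct demo_nh_group g = {
		.id = 102,
		.n = 2,
		.members = {
			{ .id = 12, .vtep_ip = 0xAC100102 },	/* 172.16.1.2 */
			{ .id = 13, .vtep_ip = 0xAC100103 },	/* 172.16.1.3 */
		},
	};
	return g;
}
```

Calling `demo_pick()` on the result of `demo_example_group()` with two different flow hashes can return different members, which is the property the fdb entry relies on when it references a group instead of a single remote ip.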
2020-05-21 | ethtool: provide UAPI for PHY Signal Quality Index (SQI) | Oleksij Rempel | 1 | -1/+74
Signal Quality Index is a mandatory value required by "OPEN Alliance SIG" for the 100Base-T1 PHYs [1]. This indicator can be used for cable integrity diagnostics and for investigating other noise sources, and it is implemented by at least two vendors: NXP [2] and TI [3]. [1] http://www.opensig.org/download/document/218/Advanced_PHY_features_for_automotive_Ethernet_V1.0.pdf [2] https://www.nxp.com/docs/en/data-sheet/TJA1100.pdf [3] https://www.ti.com/product/DP83TC811R-Q1 Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-21 | net: psample: Add tunnel support | Chris Mi | 1 | -0/+157
Currently, psample can only send the packet bits after decapsulation, so the tunnel information is lost. Add tunnel support. If the sampled packet has no tunnel info, the behavior is the same as before. If it has, add a nested metadata field named PSAMPLE_ATTR_TUNNEL and include the tunnel subfields if applicable. Increase the metadata length for sampled packets with tunnel info. If new subfields of tunnel info should be included, update the metadata length accordingly. Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-20 | atm: switch do_atmif_sioc() to direct use of atm_dev_ioctl() | Al Viro | 1 | -21/+4
Signed-off-by: Al Viro <[email protected]>
2020-05-20 | atm: lift copyin from atm_dev_ioctl() | Al Viro | 3 | -33/+31
Signed-off-by: Al Viro <[email protected]>
2020-05-20 | atm: switch do_atm_iobuf() to direct use of atm_getnames() | Al Viro | 1 | -22/+3
... and sod the compat_alloc_user_space() with its complications Signed-off-by: Al Viro <[email protected]>
2020-05-20 | atm: move copyin from atm_getnames() into the caller | Al Viro | 3 | -20/+20
Signed-off-by: Al Viro <[email protected]>
2020-05-20 | atm: separate ATM_GETNAMES handling from the rest of atm_dev_ioctl() | Al Viro | 3 | -44/+51
atm_dev_ioctl() does copyin in two different ways - one for ATM_GETNAMES, another for everything else. Start with separating the former into a new helper (atm_getnames()). The next step will be to lift the copyin into the callers. Signed-off-by: Al Viro <[email protected]>
2020-05-20 | batadv_socket_read(): get rid of pointless access_ok() | Al Viro | 1 | -3/+0
address is passed only to copy_to_user() Signed-off-by: Al Viro <[email protected]>
2020-05-20 | get rid of compat_mc_setsockopt() | Al Viro | 1 | -90/+0
not used anymore Signed-off-by: Al Viro <[email protected]>
2020-05-20 | handle the group_source_req options directly | Al Viro | 2 | -4/+42
Native ->setsockopt() handling of these options (MCAST_..._SOURCE_GROUP and MCAST_{,UN}BLOCK_SOURCE) consists of copyin + call of a helper that does the actual work. The only change needed for ->compat_setsockopt() is a slightly different copyin - the helpers can be reused as-is. Signed-off-by: Al Viro <[email protected]>
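The "copyin plus shared helper" shape described here can be sketched in user-space C. This is an illustration under stated assumptions rather than the kernel code: the `demo_*` structs and functions are hypothetical, `memcpy()` stands in for `copy_from_user()`, and `-1` stands in for a real error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical native request: the 64-bit field sits at offset 8. */
struct demo_req {
	uint32_t iface;
	uint64_t group;
};

/* Hypothetical 32-bit request: same information, no padding after iface. */
struct demo_req32 {
	uint32_t iface;
	uint32_t group_lo;
	uint32_t group_hi;
};

static struct demo_req demo_last;	/* what the helper saw last */

/* The helper that does the actual work; it never touches "user" memory. */
static int demo_do_source_op(const struct demo_req *req)
{
	if (req->iface == 0)
		return -1;
	demo_last = *req;
	return 0;
}

/* Native entry point: copy in the native layout, then call the helper. */
static int demo_setsockopt(const void *ubuf, size_t len)
{
	struct demo_req req;

	if (len < sizeof(req))
		return -1;
	memcpy(&req, ubuf, sizeof(req));	/* stands in for copy_from_user() */
	return demo_do_source_op(&req);
}

/* Compat entry point: only the copyin differs; the helper is reused as-is. */
static int demo_compat_setsockopt(const void *ubuf, size_t len)
{
	struct demo_req32 req32;
	struct demo_req req;

	if (len < sizeof(req32))
		return -1;
	memcpy(&req32, ubuf, sizeof(req32));
	req.iface = req32.iface;
	req.group = ((uint64_t)req32.group_hi << 32) | req32.group_lo;
	return demo_do_source_op(&req);
}
```

Both entry points end in the same `demo_do_source_op()` call, so the compat path needs no round trip through a second user-space buffer; it only converts the layout once after its own copyin.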
2020-05-20 | ipv6: take handling of group_source_req options into a helper | Al Viro | 1 | -29/+36
Signed-off-by: Al Viro <[email protected]>
2020-05-20 | ipv4: take handling of group_source_req options into a helper | Al Viro | 1 | -39/+44
Signed-off-by: Al Viro <[email protected]>
2020-05-20 | ipv[46]: do compat setsockopt for MCAST_{JOIN,LEAVE}_GROUP directly | Al Viro | 2 | -0/+59
direct parallel to the way these two are handled in the native ->setsockopt() instances - the helpers that do the real work are already separated and can be reused as-is in this case. Signed-off-by: Al Viro <[email protected]>
2020-05-20 | ipv6: do compat setsockopt for MCAST_MSFILTER directly | Al Viro | 1 | -1/+47
similar to the ipv4 counterpart of that patch - the same trick used to align the tail array properly. Signed-off-by: Al Viro <[email protected]>
2020-05-20 | ip6_mc_msfilter(): pass the address list separately | Al Viro | 2 | -4/+5
that way we'll be able to reuse it for compat case Signed-off-by: Al Viro <[email protected]>
2020-05-20 | ipv4: do compat setsockopt for MCAST_MSFILTER directly | Al Viro | 1 | -1/+47
Parallel to what the native setsockopt() does, except that unlike the native setsockopt() we do not use memdup_user() - we want the sockaddr_storage fields properly aligned, so we allocate 4 bytes more and copy compat_group_filter at the offset 4, which yields the proper alignments. Signed-off-by: Al Viro <[email protected]>
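The offset-4 trick can be checked with a small layout sketch. The structs below are hypothetical stand-ins, not the kernel definitions: they assume a 128-byte sockaddr_storage that is 8-byte aligned natively and 4-byte aligned in the 32-bit layout, which is what the description above implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for sockaddr_storage as a 64-bit kernel sees it: 8-byte aligned. */
struct demo_storage {
	_Alignas(8) uint16_t family;
	char data[126];
};

/* The same 128 bytes as laid out by a 32-bit ABI: only 4-byte aligned. */
struct demo_storage32 {
	_Alignas(4) uint16_t family;
	char data[126];
};

/* Stand-in for the native request; padding puts group at offset 8. */
struct demo_filter {
	uint32_t iface;
	struct demo_storage group;
	uint32_t fmode;
	uint32_t numsrc;
	struct demo_storage slist[1];
};

/* Stand-in for the compat request; group follows iface at offset 4. */
struct demo_compat_filter {
	uint32_t iface;
	struct demo_storage32 group;
	uint32_t fmode;
	uint32_t numsrc;
	struct demo_storage32 slist[1];
};

/* Offset of a compat member once the struct is copied at byte 4 of a buffer. */
#define DEMO_SHIFTED(member) \
	(4 + offsetof(struct demo_compat_filter, member))

/* True when both address fields land on 8-byte boundaries after the shift. */
static int demo_shift_aligns_storage(void)
{
	return DEMO_SHIFTED(group) % 8 == 0 && DEMO_SHIFTED(slist) % 8 == 0;
}
```

In this model the shifted compat fields `group` and `slist` land at offsets 8 and 144, the same offsets they have in the native stand-in, which is why allocating 4 extra bytes is enough to make the address fields usable in place.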
2020-05-20 | set_mcast_msfilter(): take the guts of setsockopt(MCAST_MSFILTER) into a helper | Al Viro | 1 | -33/+40
Signed-off-by: Al Viro <[email protected]>
2020-05-20 | get rid of compat_mc_getsockopt() | Al Viro | 3 | -85/+79
now we can do MCAST_MSFILTER in compat ->getsockopt() without playing silly buggers with copying things back and forth. We can form a native struct group_filter (sans the variable-length tail) on stack, pass that + pointer to the tail of original request to the helper doing the bulk of the work, then do the rest of copyout - same as the native getsockopt() does. Signed-off-by: Al Viro <[email protected]>
2020-05-20 | ip*_mc_gsfget(): lift copyout of struct group_filter into callers | Al Viro | 4 | -29/+36
pass the userland pointer to the array in its tail, so that part gets copied out by our functions; copyout of everything else is done in the callers. Rationale: reuse for compat; the array is the same in native and compat, the layout of parts before it is different for compat. Signed-off-by: Al Viro <[email protected]>
2020-05-20 | compat_ip{,v6}_setsockopt(): enumerate MCAST_... options explicitly | Al Viro | 2 | -2/+18
We want to check if optname is among the MCAST_... ones; do that as an explicit switch. Signed-off-by: Al Viro <[email protected]>
2020-05-20 | lift compat definitions of mcast [sg]etsockopt requests into net/compat.h | Al Viro | 1 | -25/+0
We want to get rid of compat_mc_[sg]etsockopt() and to have that stuff handled without compat_alloc_user_space(), extra copying through userland, etc. To do that we'll need ipv4 and ipv6 instances of ->compat_[sg]etsockopt() to manipulate the 32bit variants of mcast requests, so we need to move the definitions of those out of net/compat.c and into a public header. This patch just does a mechanical move to include/net/compat.h Signed-off-by: Al Viro <[email protected]>
2020-05-20 | rds: fix crash in rds_info_getsockopt() | John Hubbard | 1 | -1/+2
The conversion to pin_user_pages() had a bug: it overlooked the case of allocation of pages failing. Fix that by restoring an equivalent check. Reported-by: [email protected] Fixes: dbfe7d74376e ("rds: convert get_user_pages() --> pin_user_pages()") Cc: David S. Miller <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: John Hubbard <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | net: unexport skb_gro_receive() | Eric Dumazet | 1 | -2/+0
skb_gro_receive() used to be used by SCTP; that is no longer the case. skb_gro_receive_list() is in the same category: never used from modules. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | ipv6: use ->ndo_tunnel_ctl in addrconf_set_dstaddr | Christoph Hellwig | 1 | -7/+2
Use the new ->ndo_tunnel_ctl instead of overriding the address limit and using ->ndo_do_ioctl just to do a pointless user copy. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | ipv6: streamline addrconf_set_dstaddr | Christoph Hellwig | 1 | -49/+38
Factor out an addrconf_set_sit_dstaddr helper for the actual work if we found a SIT device, and only hold the rtnl lock around the device lookup and that new helper, as there is no point in holding it over a copy_from_user call. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | ipv6: stub out even more of addrconf_set_dstaddr if SIT is disabled | Christoph Hellwig | 1 | -2/+3
There is no point in copying the structure from userspace or looking up a device if SIT support is disabled and we'll eventually return -ENODEV anyway. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | sit: implement ->ndo_tunnel_ctl | Christoph Hellwig | 1 | -39/+34
Implement the ->ndo_tunnel_ctl method, and use ip_tunnel_ioctl to handle userspace requests for the SIOCGETTUNNEL, SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | sit: refactor ipip6_tunnel_ioctl | Christoph Hellwig | 1 | -158/+210
Split the ioctl handler into one function per command instead of having all the logic sit in one giant switch statement. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
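The refactoring pattern, one function per command with a thin dispatcher, can be sketched as below. The `demo_*` names, the single-tunnel state, and the ttl field are hypothetical and chosen only for illustration; the real handler works on tunnel parameter structures not modelled here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical commands standing in for get/add/change/delete requests. */
enum demo_cmd { DEMO_GET, DEMO_ADD, DEMO_CHG, DEMO_DEL };

/* Hypothetical state: at most one tunnel with a single parameter. */
struct demo_tunnel {
	int present;
	int ttl;
};

static int demo_get(const struct demo_tunnel *t, int *ttl)
{
	if (!t->present)
		return -ENOENT;
	*ttl = t->ttl;
	return 0;
}

static int demo_add(struct demo_tunnel *t, int ttl)
{
	if (t->present)
		return -EEXIST;
	t->present = 1;
	t->ttl = ttl;
	return 0;
}

static int demo_chg(struct demo_tunnel *t, int ttl)
{
	if (!t->present)
		return -ENOENT;
	t->ttl = ttl;
	return 0;
}

static int demo_del(struct demo_tunnel *t)
{
	if (!t->present)
		return -ENOENT;
	t->present = 0;
	return 0;
}

/* The dispatcher only routes; each command's logic lives in its own function. */
static int demo_ioctl(struct demo_tunnel *t, enum demo_cmd cmd, int *ttl)
{
	switch (cmd) {
	case DEMO_GET:
		return demo_get(t, ttl);
	case DEMO_ADD:
		return demo_add(t, *ttl);
	case DEMO_CHG:
		return demo_chg(t, *ttl);
	case DEMO_DEL:
		return demo_del(t);
	}
	return -EINVAL;
}
```

Keeping the switch this small means each command's error handling can be read and changed in isolation, which is the readability gain the refactoring is after.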
2020-05-19 | ipmr: use ->ndo_tunnel_ctl in ipmr_new_tunnel | Christoph Hellwig | 1 | -11/+3
Use the new ->ndo_tunnel_ctl instead of overriding the address limit and using ->ndo_do_ioctl just to do a pointless user copy. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | net: add a new ndo_tunnel_ioctl method | Christoph Hellwig | 4 | -62/+51
This method is used to properly allow kernel callers of the IPv4 route management ioctls. The existing ip_tunnel_ioctl helper is renamed to ip_tunnel_ctl to better reflect that it doesn't directly implement ioctls touching user memory, and is used for the guts of ndo_tunnel_ctl implementations. A new ip_tunnel_ioctl helper is added that can be wired up directly to the ndo_do_ioctl method and takes care of the copy to and from userspace. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | ipv4: consolidate the VIFF_TUNNEL handling in ipmr_new_tunnel | Christoph Hellwig | 1 | -40/+13
Also move the dev_set_allmulti call and the error handling into the ioctl helper. This allows reusing already looked up tunnel_dev pointer and the set up argument structure for the deletion in the error handler. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | ipv4: streamline ipmr_new_tunnel | Christoph Hellwig | 1 | -37/+36
Reduce a few levels of indentation to simplify the function. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | net/af_iucv: clean up function prototypes | Julian Wiedmann | 1 | -57/+51
Remove a bunch of forward declarations (trivially shifting code around where needed), and make a few functions static. Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | net/af_iucv: remove a redundant zero initialization | Julian Wiedmann | 1 | -1/+0
txmsg is declared as {0}, no need to clear individual fields later on. Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | net/af_iucv: replace open-coded U16_MAX | Julian Wiedmann | 1 | -1/+2
Improve the readability of a range check. Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
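As a rough illustration of what replacing an open-coded limit buys, compare the two checks below. This is a user-space sketch with hypothetical function names; it uses `UINT16_MAX` from `<stdint.h>` in place of the kernel's U16_MAX and assumes the original check compared a length against the literal 0xffff, which the commit message does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Open-coded bound: the reader must recognise 0xffff as the 16-bit limit. */
static int demo_fits_open_coded(size_t len)
{
	return len <= 0xffff;
}

/* Named bound: the constant states the intent of the range check. */
static int demo_fits_named(size_t len)
{
	return len <= UINT16_MAX;
}
```

Both functions accept exactly the same values; only the second tells the reader that the limit comes from a 16-bit field.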
2020-05-19 | net/af_iucv: remove pm support | Julian Wiedmann | 1 | -140/+1
commit 394216275c7d ("s390: remove broken hibernate / power management support") removed support for ARCH_HIBERNATION_POSSIBLE from s390. So drop the unused pm ops from the s390-only af_iucv socket code. Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-19 | net/iucv: remove pm support | Julian Wiedmann | 1 | -188/+0
commit 394216275c7d ("s390: remove broken hibernate / power management support") removed support for ARCH_HIBERNATION_POSSIBLE from s390. So drop the unused pm ops from the s390-only iucv bus driver. CC: Hendrik Brueckner <[email protected]> Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-18 | ipv4,appletalk: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctl | Christoph Hellwig | 3 | -73/+76
To prepare removing the global routing_ioctl hack start lifting the code into the ipv4 and appletalk ->compat_ioctl handlers. Unlike the existing handler we don't bother copying in the name - there are no compat issues for char arrays. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-18 | appletalk: factor out a atrtr_ioctl_addrt helper | Christoph Hellwig | 1 | -13/+20
Add a helper that can be shared with the upcoming compat ioctl handler. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-18 | ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctl | Christoph Hellwig | 7 | -46/+75
To prepare removing the global routing_ioctl hack start lifting the code into a newly added ipv6 ->compat_ioctl handler. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-18 | ipv6: lift copy_from_user out of ipv6_route_ioctl | Christoph Hellwig | 2 | -34/+26
Prepare for better compat ioctl handling by moving the user copy out of ipv6_route_ioctl. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-17 | rds: convert get_user_pages() --> pin_user_pages() | John Hubbard | 1 | -4/+2
This code was using get_user_pages_fast(), in a "Case 2" scenario (DMA/RDMA), using the categorization from [1]. That means that it's time to convert the get_user_pages_fast() + put_page() calls to pin_user_pages_fast() + unpin_user_pages() calls. There is some helpful background in [2]: basically, this is a small part of fixing a long-standing disconnect between pinning pages, and file systems' use of those pages. [1] Documentation/core-api/pin_user_pages.rst [2] "Explicit pinning of user-space pages": https://lwn.net/Articles/807108/ Cc: David S. Miller <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: John Hubbard <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-17 | net: allow __skb_ext_alloc to sleep | Florian Westphal | 2 | -4/+8
mptcp calls this from the transmit side, from process context. Allow a sleeping allocation instead of unconditional GFP_ATOMIC. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-17 | mptcp: remove inner wait loop from mptcp_sendmsg_frag | Florian Westphal | 1 | -14/+0
previous patches made sure we only call into this function when these prerequisites are met, so no need to wait on the subflow socket anymore. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/7 Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-17 | mptcp: fill skb page frag cache outside of mptcp_sendmsg_frag | Florian Westphal | 1 | -1/+6
The mptcp_sendmsg_frag helper contains a loop that will wait on the subflow sk. It seems preferable to only wait in mptcp_sendmsg() when blocking io is requested. mptcp_sendmsg already has such a wait loop that is used when no subflow socket is available for transmission. This is another preparation patch that makes sure we call mptcp_sendmsg_frag only if the page frag cache has been refilled. A followup patch will remove the wait loop from mptcp_sendmsg_frag(). The retransmit worker doesn't need to do this refill as it won't transmit new mptcp-level data. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-17 | mptcp: fill skb extension cache outside of mptcp_sendmsg_frag | Florian Westphal | 1 | -4/+14
The mptcp_sendmsg_frag helper contains a loop that will wait on the subflow sk. It seems preferable to only wait in mptcp_sendmsg() when blocking io is requested. mptcp_sendmsg already has such a wait loop that is used when no subflow socket is available for transmission. This is a preparation patch that makes sure we call mptcp_sendmsg_frag only if an skb extension has been allocated. Moreover, such allocation currently uses GFP_ATOMIC while it could use a sleeping allocation instead. Followup patches will remove the wait loop from mptcp_sendmsg_frag() and will allow a sleeping allocation for the extension. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-17 | mptcp: avoid blocking in tcp_sendpages | Florian Westphal | 1 | -3/+32
The transmit loop continues to xmit new data until an error is returned or all data was transmitted. For the blocking i/o case, this means that tcp_sendpages() may block on the subflow until more space becomes available, i.e. we end up sleeping with the mptcp socket lock held. Instead we should check if a different subflow is ready to be used. This restarts the subflow sk lookup when the tx operation succeeded and the tcp subflow can't accept more data or if tcp_sendpages indicates -EAGAIN on a blocking mptcp socket. In that case we also need to set the NOSPACE bit to make sure we get notified once memory becomes available. In case all subflows are busy, the existing logic will wait until a subflow is ready, releasing the mptcp socket lock while doing so. The mptcp worker already sets DONTWAIT, so no need to make changes there. v2: * set NOSPACE bit * add a comment to clarify that mptcp-sk sndbuf limits need to be checked as well. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>