aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2019-04-08xfrm: make xfrm modes builtinFlorian Westphal1-9/+4
after previous changes, xfrm_mode contains no function pointers anymore and all modules defining such struct contain no code except an init/exit functions to register the xfrm_mode struct with the xfrm core. Just place the xfrm modes core and remove the modules, the run-time xfrm_mode register/unregister functionality is removed. Before: text data bss dec filename 7523 200 2364 10087 net/xfrm/xfrm_input.o 40003 628 440 41071 net/xfrm/xfrm_state.o 15730338 6937080 4046908 26714326 vmlinux 7389 200 2364 9953 net/xfrm/xfrm_input.o 40574 656 440 41670 net/xfrm/xfrm_state.o 15730084 6937068 4046908 26714060 vmlinux The xfrm*_mode_{transport,tunnel,beet} modules are gone. v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6 ones rather than removing them. Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-04-08xfrm: remove afinfo pointer from xfrm_modeFlorian Westphal1-1/+0
Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from ipv6 in case of CONFIG_IPV6=m. This change has virtually no effect on vmlinux size, but it reduces afinfo size and allows followup patch to make xfrm modes const. v2: mark if (afinfo) tests as likely (Sabrina) re-fetch afinfo according to inner_mode in xfrm_prepare_input(). Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-04-08xfrm: remove output2 indirection from xfrm_modeFlorian Westphal1-13/+0
similar to previous patch: no external module dependencies, so we can avoid the indirection by placing this in the core. This change removes the last indirection from xfrm_mode and the xfrm4|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore. Before: text data bss dec hex filename 3957 136 0 4093 ffd net/xfrm/xfrm_output.o 587 44 0 631 277 net/ipv4/xfrm4_mode_beet.o 649 32 0 681 2a9 net/ipv4/xfrm4_mode_tunnel.o 625 44 0 669 29d net/ipv6/xfrm6_mode_beet.o 599 32 0 631 277 net/ipv6/xfrm6_mode_tunnel.o After: text data bss dec hex filename 5359 184 0 5543 15a7 net/xfrm/xfrm_output.o 171 24 0 195 c3 net/ipv4/xfrm4_mode_beet.o 171 24 0 195 c3 net/ipv4/xfrm4_mode_tunnel.o 172 24 0 196 c4 net/ipv6/xfrm6_mode_beet.o 172 24 0 196 c4 net/ipv6/xfrm6_mode_tunnel.o v2: fold the *encap_add functions into xfrm*_prepare_output preserve (move) output2 comment (Sabrina) use x->outer_mode->encap, not inner fix a build breakage on ppc (kbuild robot) Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-04-08xfrm: remove input2 indirection from xfrm_modeFlorian Westphal1-13/+0
No external dependencies on any module, place this in the core. Increase is about 1800 byte for xfrm_input.o. The beet helpers get added to internal header, as they can be reused from xfrm_output.c in the next patch (kernel contains several copies of them in the xfrm{4,6}_mode_beet.c files). Before: text data bss dec filename 5578 176 2364 8118 net/xfrm/xfrm_input.o 1180 64 0 1244 net/ipv4/xfrm4_mode_beet.o 171 40 0 211 net/ipv4/xfrm4_mode_transport.o 1163 40 0 1203 net/ipv4/xfrm4_mode_tunnel.o 1083 52 0 1135 net/ipv6/xfrm6_mode_beet.o 172 40 0 212 net/ipv6/xfrm6_mode_ro.o 172 40 0 212 net/ipv6/xfrm6_mode_transport.o 1056 40 0 1096 net/ipv6/xfrm6_mode_tunnel.o After: text data bss dec filename 7373 200 2364 9937 net/xfrm/xfrm_input.o 587 44 0 631 net/ipv4/xfrm4_mode_beet.o 171 32 0 203 net/ipv4/xfrm4_mode_transport.o 649 32 0 681 net/ipv4/xfrm4_mode_tunnel.o 625 44 0 669 net/ipv6/xfrm6_mode_beet.o 172 32 0 204 net/ipv6/xfrm6_mode_ro.o 172 32 0 204 net/ipv6/xfrm6_mode_transport.o 599 32 0 631 net/ipv6/xfrm6_mode_tunnel.o v2: pass inner_mode to xfrm_inner_mode_encap_remove to fix AF_UNSPEC selector breakage (bisected by Benedict Wong) Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-04-08xfrm: remove gso_segment indirection from xfrm_modeFlorian Westphal1-5/+0
These functions are small and we only have versions for tunnel and transport mode for ipv4 and ipv6 respectively. Just place the 'transport or tunnel' conditional in the protocol specific function instead of using an indirection. Before: 3226 12 0 3238 net/ipv4/esp4_offload.o 7004 492 0 7496 net/ipv4/ip_vti.o 3339 12 0 3351 net/ipv6/esp6_offload.o 11294 460 0 11754 net/ipv6/ip6_vti.o 1180 72 0 1252 net/ipv4/xfrm4_mode_beet.o 428 48 0 476 net/ipv4/xfrm4_mode_transport.o 1271 48 0 1319 net/ipv4/xfrm4_mode_tunnel.o 1083 60 0 1143 net/ipv6/xfrm6_mode_beet.o 172 48 0 220 net/ipv6/xfrm6_mode_ro.o 429 48 0 477 net/ipv6/xfrm6_mode_transport.o 1164 48 0 1212 net/ipv6/xfrm6_mode_tunnel.o 15730428 6937008 4046908 26714344 vmlinux After: 3461 12 0 3473 net/ipv4/esp4_offload.o 7000 492 0 7492 net/ipv4/ip_vti.o 3574 12 0 3586 net/ipv6/esp6_offload.o 11295 460 0 11755 net/ipv6/ip6_vti.o 1180 64 0 1244 net/ipv4/xfrm4_mode_beet.o 171 40 0 211 net/ipv4/xfrm4_mode_transport.o 1163 40 0 1203 net/ipv4/xfrm4_mode_tunnel.o 1083 52 0 1135 net/ipv6/xfrm6_mode_beet.o 172 40 0 212 net/ipv6/xfrm6_mode_ro.o 172 40 0 212 net/ipv6/xfrm6_mode_transport.o 1056 40 0 1096 net/ipv6/xfrm6_mode_tunnel.o 15730424 6937008 4046908 26714340 vmlinux Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-04-08xfrm: remove xmit indirection from xfrm_modeFlorian Westphal1-5/+0
There are only two versions (tunnel and transport). The ip/ipv6 versions are only differ in sizeof(iphdr) vs ipv6hdr. Place this in the core and use x->outer_mode->encap type to call the correct adjustment helper. Before: text data bss dec filename 15730311 6937008 4046908 26714227 vmlinux After: 15730428 6937008 4046908 26714344 vmlinux (about 117 byte increase) v2: use family from x->outer_mode, not inner Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-04-08xfrm: remove output indirection from xfrm_modeFlorian Westphal1-14/+5
Same is input indirection. Only exception: we need to export xfrm_outer_mode_output for pktgen. Increases size of vmlinux by about 163 byte: Before: text data bss dec filename 15730208 6936948 4046908 26714064 vmlinux After: 15730311 6937008 4046908 26714227 vmlinux xfrm_inner_extract_output has no more external callers, make it static. v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca) add WARN_ON_ONCE for 'call AF_INET6 related output function, but CONFIG_IPV6=n' case. make xfrm_inner_extract_output static Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-04-08xfrm: remove input indirection from xfrm_modeFlorian Westphal1-11/+0
No need for any indirection or abstraction here, both functions are pretty much the same and quite small, they also have no external dependencies. xfrm_prepare_input can then be made static. With allmodconfig build, size increase of vmlinux is 25 byte: Before: text data bss dec filename 15730207 6936924 4046908 26714039 vmlinux After: 15730208 6936948 4046908 26714064 vmlinux v2: Fix INET_XFRM_MODE_TRANSPORT name in is-enabled test (Sabrina Dubroca) change copied comment to refer to transport and network header, not skb->{h,nh}, which don't exist anymore. (Sabrina) make xfrm_prepare_input static (Eyal Birger) Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-04-08xfrm: place af number into xfrm_mode structFlorian Westphal1-4/+5
This will be useful to know if we're supposed to decode ipv4 or ipv6. While at it, make the unregister function return void, all module_exit functions did just BUG(); there is never a point in doing error checks if there is no way to handle such error. Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-04-06nfc: nci: Potential off by one in ->pipes[] arrayDan Carpenter1-1/+1
This is similar to commit e285d5bfb7e9 ("NFC: Fix the number of pipes") where we changed NFC_HCI_MAX_PIPES from 127 to 128. As the comment next to the define explains, the pipe identifier is 7 bits long. The highest possible pipe is 127, but the number of possible pipes is 128. As the code is now, then there is potential for an out of bounds array access: net/nfc/nci/hci.c:297 nci_hci_cmd_received() warn: array off by one? 'ndev->hci_dev->pipes[pipe]' '0-127 == 127' Fixes: 11f54f228643 ("NFC: nci: Add HCI over NCI protocol support") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-16/+41
Minor comment merge conflict in mlx5. Staging driver has a fixup due to the skb->xmit_more changes in 'net-next', but was removed in 'net'. Signed-off-by: David S. Miller <[email protected]>
2019-04-04net: devlink: introduce devlink_compat_switch_id_get() helperJiri Pirko1-0/+9
Introduce devlink_compat_switch_id_get() helper which fills up switch_id according to passed netdev pointer. Call it directly from dev_get_port_parent_id() as a fallback when ndo_get_port_parent_id is not defined for given netdev. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-04net: devlink: extend port attrs for switch IDJiri Pirko1-2/+6
Extend devlink_port_attrs_set() to pass switch ID for ports which are part of switch and store it in port attrs. For other ports, this is NULL. Note that this allows the driver to group devlink ports into one or more switches according to the actual topology. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-04net: devlink: convert devlink_port_attrs bools to bitsJiri Pirko1-2/+2
In order to save space in the struct, convert bools to bits. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-03ipv6: Flip to fib_nexthop_infoDavid Ahern1-0/+5
Export fib_nexthop_info and fib_add_nexthop for use by IPv6 code. Remove rt6_nexthop_info and rt6_add_nexthop in favor of the IPv4 versions. Update fib_nexthop_info for IPv6 linkdown check and RTA_GATEWAY for AF_INET6. Signed-off-by: David Ahern <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-03ipv4: Add fib_nh_common to fib_resultDavid Ahern1-26/+21
Most of the ipv4 code only needs data from fib_nh_common. Add fib_nh_common selection to fib_result and update users to use it. Right now, fib_nh_common in fib_result will point to a fib_nh struct that is embedded within a fib_info: fib_info --> fib_nh fib_nh ... fib_nh ^ fib_result->nhc ----+ Later, nhc can point to a fib_nh within a nexthop struct: fib_info --> nexthop --> fib_nh ^ fib_result->nhc ---------------+ or for a nexthop group: fib_info --> nexthop --> nexthop --> fib_nh nexthop --> fib_nh ... nexthop --> fib_nh ^ fib_result->nhc ---------------------------+ In all cases nhsel within fib_result will point to which leg in the multipath route is used. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-01net: dsa: read mac address from DT for slave deviceXiaofei Shen1-0/+1
Before creating a slave netdevice, get the mac address from DTS and apply in case it is valid. Signed-off-by: Xiaofei Shen <[email protected]> Signed-off-by: Vinod Koul <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-01net: sched: introduce and use qdisc tree flush/purge helpersPaolo Abeni1-7/+19
The same code to flush qdisc tree and purge the qdisc queue is duplicated in many places and in most cases it does not respect NOLOCK qdisc: the global backlog len is used and the per CPU values are ignored. This change addresses the above, factoring-out the relevant code and using the helpers introduced by the previous patch to fetch the correct backlog len. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-01net: sched: introduce and use qstats read helpersPaolo Abeni1-0/+18
Classful qdiscs can't access directly the child qdiscs backlog length: if such qdisc is NOLOCK, per CPU values should be accounted instead. Most qdiscs no not respect the above. As a result, qstats fetching for most classful qdisc is currently incorrect: if the child qdisc is NOLOCK, it always reports 0 len backlog. This change introduces a pair of helpers to safely fetch both backlog and qlen and use them in stats class dumping functions, fixing the above issue and cleaning a bit the code. DRR needs also to access the child qdisc queue length, so it needs custom handling. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-01vrf: check accept_source_route on the original netdeviceStephen Suryaputra1-1/+1
Configuration check to accept source route IP options should be made on the incoming netdevice when the skb->dev is an l3mdev master. The route lookup for the source route next hop also needs the incoming netdev. v2->v3: - Simplify by passing the original netdevice down the stack (per David Ahern). Signed-off-by: Stephen Suryaputra <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29ipv6: Move ipv6 stubs to a separate header fileDavid Ahern3-48/+64
The number of stubs is growing and has nothing to do with addrconf. Move the definition of the stubs to a separate header file and update users. In the move, drop the vxlan specific comment before ipv6_stub. Code move only; no functional change intended. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29net: Use common nexthop init and release helpersDavid Ahern1-0/+4
With fib_nh_common in place, move common initialization and release code into helpers used by both ipv4 and ipv6. For the moment, the init is just the lwt encap and the release is both the netdev reference and the the lwt state reference. More will be added later. Signed-off-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29net: Add fib_nh_common and update fib_nh and fib6_nhDavid Ahern2-18/+33
Add fib_nh_common struct with common nexthop attributes. Convert fib_nh and fib6_nh to use it. Use macros to move existing fib_nh_* references to the new nh_common.nhc_*. Signed-off-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29ipv6: Rename fib6_nh entriesDavid Ahern2-11/+13
Rename fib6_nh entries that will be moved to a fib_nh_common struct. Specifically, the device, gateway, flags, and lwtstate are common with all nexthop definitions. In some places new temporary variables are declared or local variables renamed to maintain line lengths. Rename only; no functional change intended. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29ipv4: Rename fib_nh entriesDavid Ahern1-12/+12
Rename fib_nh entries that will be moved to a fib_nh_common struct. Specifically, the device, oif, gateway, flags, scope, lwtstate, nh_weight and nh_upper_bound are common with all nexthop definitions. In the process shorten fib_nh_lwtstate to fib_nh_lws to avoid really long lines. Rename only; no functional change intended. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29ipv6: Refactor fib6_ignore_linkdownDavid Ahern1-0/+8
fib6_ignore_linkdown takes a fib6_info but only looks at the net_device and its IPv6 config. Change it to take a net_device over a fib6_info as its input argument. In addition, move it to a header file to make the check inline and usable later with IPv4 code without going through the ipv6 stub, and rename to ip6_ignore_linkdown since it is only checking the setting based on the ipv6 struct on a device. Signed-off-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29ipv6: Move gateway checks to a fib6_nh settingDavid Ahern2-2/+3
The gateway setting is not per fib6_info entry but per-fib6_nh. Add a new fib_nh_has_gw flag to fib6_nh and convert references to RTF_GATEWAY to the new flag. For IPv6 address the flag is cheaper than checking that nh_gw is non-0 like IPv4 does. While this increases fib6_nh by 8-bytes, the effective allocation size of a fib6_info is unchanged. The 8 bytes is recovered later with a fib_nh_common change. Signed-off-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29ipv6: Create cleanup helper for fib6_nhDavid Ahern1-0/+1
Move the fib6_nh cleanup code to a new helper, fib6_nh_release. Signed-off-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29ipv6: Create init helper for fib6_nhDavid Ahern1-0/+4
Similar to IPv4, consolidate the fib6_nh initialization into a helper. As a new standalone function, add a cleanup path to put lwtstate on error. To avoid modifying fib6_config flags, move the reject check to a helper that is invoked once by fib6_nh_init to reset the device and then again in ip6_route_info_create to set the fib6_flags. Signed-off-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29ipv4: Create cleanup helper for fib_nhDavid Ahern1-0/+1
Move the fib_nh cleanup code from free_fib_info_rcu into a new helper, fib_nh_release. Move classid accounting into fib_nh_release which is called per fib_nh to make accounting symmetrical with fib_nh_init. Export the helper to allow for use with nexthop objects in the future. Signed-off-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29ipv4: Create init helper for fib_nhDavid Ahern1-0/+4
Consolidate the fib_nh initialization which is duplicated between fib_create_info for single path and fib_get_nhs for multipath. Export the helper to allow for use with nexthop objects in the future. Signed-off-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-29mac80211: rework locking for txq scheduling / airtime fairnessFelix Fietkau1-30/+19
Holding the lock around the entire duration of tx scheduling can create some nasty lock contention, especially when processing airtime information from the tx status or the rx path. Improve locking by only holding the active_txq_lock for lookups / scheduling list modifications. Signed-off-by: Felix Fietkau <[email protected]> Acked-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-03-28netns: provide pure entropy for net_hash_mix()Eric Dumazet2-8/+3
net_hash_mix() currently uses kernel address of a struct net, and is used in many places that could be used to reveal this address to a patient attacker, thus defeating KASLR, for the typical case (initial net namespace, &init_net is not dynamically allocated) I believe the original implementation tried to avoid spending too many cycles in this function, but security comes first. Also provide entropy regardless of CONFIG_NET_NS. Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Amit Klein <[email protected]> Reported-by: Benny Pinkas <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-28netfilter: Export nf_ct_{set,destroy}_timeout()Yi-Hung Wei1-0/+15
This patch exports nf_ct_set_timeout() and nf_ct_destroy_timeout(). The two functions are derived from xt_ct_destroy_timeout() and xt_ct_set_timeout() in xt_CT.c, and moved to nf_conntrack_timeout.c without any functional change. It would be useful for other users (i.e. OVS) that utilizes the finer-grain conntrack timeout feature. CC: Pablo Neira Ayuso <[email protected]> CC: Pravin Shelar <[email protected]> Signed-off-by: Yi-Hung Wei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-28net: devlink: remove unused devlink_port_get_phys_port_name() functionJiri Pirko1-2/+0
Now it is unused, remove it. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-28net: devlink: introduce devlink_compat_phys_port_name_get()Jiri Pirko1-0/+9
Introduce devlink_compat_phys_port_name_get() helper that gets the physical port name for specified netdevice according to devlink port attributes. Call this helper from dev_get_phys_port_name() in case ndo_get_phys_port_name is not defined. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-28net: replace ndo_get_devlink with ndo_get_devlink_portJiri Pirko1-2/+12
Follow-up patch is going to need a devlink port instance according to a netdev. Devlink port instance should be always available when devlink is used. So change the recently introduced ndo_get_devlink to ndo_get_devlink_port. With that, adjust the wrapper for the only user to get devlink pointer. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Michal Kubecek <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-3/+9
2019-03-27inet: switch IP ID generator to siphashEric Dumazet1-0/+2
According to Amit Klein and Benny Pinkas, IP ID generation is too weak and might be used by attackers. Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix()) having 64bit key and Jenkins hash is risky. It is time to switch to siphash and its 128bit keys. Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Amit Klein <[email protected]> Reported-by: Benny Pinkas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-27tcp: fix zerocopy and notsent_lowat issuesEric Dumazet1-3/+4
My recent patch had at least three problems : 1) TX zerocopy wants notification when skb is acknowledged, thus we need to call skb_zcopy_clear() if the skb is cached into sk->sk_tx_skb_cache 2) Some applications might expect precise EPOLLOUT notifications, so we need to update sk->sk_wmem_queued and call sk_mem_uncharge() from sk_wmem_free_skb() in all cases. The SOCK_QUEUE_SHRUNK flag must also be set. 3) Reuse of saved skb should have used skb_cloned() instead of simply checking if the fast clone has been freed. Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx") Signed-off-by: Eric Dumazet <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Tested-by: Holger Hoffstätte <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-27xfrm: Honor original L3 slave device in xfrmi policy lookupMartin Willi1-1/+2
If an xfrmi is associated to a vrf layer 3 master device, xfrm_policy_check() fails after traffic decapsulation. The input interface is replaced by the layer 3 master device, and hence xfrmi_decode_session() can't match the xfrmi anymore to satisfy policy checking. Extend ingress xfrmi lookup to honor the original layer 3 slave device, allowing xfrm interfaces to operate within a vrf domain. Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Martin Willi <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-03-26xfrm: clean up xfrm protocol checksCong Wang1-0/+17
In commit 6a53b7593233 ("xfrm: check id proto in validate_tmpl()") I introduced a check for xfrm protocol, but according to Herbert IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so it should be removed from validate_tmpl(). And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific protocols, this is why xfrm_state_flush() could still miss IPPROTO_ROUTING, which leads that those entries are left in net->xfrm.state_all before exit net. Fix this by replacing IPSEC_PROTO_ANY with zero. This patch also extracts the check from validate_tmpl() to xfrm_id_proto_valid() and uses it in parse_ipsecrequest(). With this, no other protocols should be added into xfrm. Fixes: 6a53b7593233 ("xfrm: check id proto in validate_tmpl()") Reported-by: [email protected] Cc: Steffen Klassert <[email protected]> Cc: Herbert Xu <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2019-03-24net: devlink: select NET_DEVLINK from driversJiri Pirko1-492/+3
Some drivers are becoming more dependent on NET_DEVLINK being selected in configuration. With upcoming compat functions, the behavior would be wrong in case devlink was not compiled in. So make the drivers select NET_DEVLINK and rely on the functions being there, not just stubs. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-24net: devlink: add port type spinlockJiri Pirko1-0/+4
Add spinlock to protect port type and type_dev pointer consistency. Without that, userspace may see inconsistent type and type_dev combinations. Signed-off-by: Jiri Pirko <[email protected]> v1->v2: - rebased Signed-off-by: David S. Miller <[email protected]>
2019-03-23Merge tag 'mlx5-updates-2019-03-20' of ↵David S. Miller2-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-03-20 This series includes updates to mlx5 driver, 1) Compiler warnings cleanup from Saeed Mahameed 2) Parav Pandit simplifies sriov enable/disables 3) Gustavo A. R. Silva, Removes a redundant assignment 4) Moshe Shemesh, Adds Geneve tunnel stateless offload support 5) Eli Britstein, Adds the Support for VLAN modify action and Replaces TC VLAN pop and push actions with VLAN modify Note: This series includes two simple non-mlx5 patches, 1) Declare IANA_VXLAN_UDP_PORT definition in include/net/vxlan.h, and use it in some drivers. 2) Declare GENEVE_UDP_PORT definition in include/net/geneve.h, and use it in mlx5 and nfp drivers. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-03-23tcp: add one skb cache for rxEric Dumazet1-0/+10
Often times, recvmsg() system calls and BH handling for a particular TCP socket are done on different cpus. This means the incoming skb had to be allocated on a cpu, but freed on another. This incurs a high spinlock contention in slab layer for small rpc, but also a high number of cache line ping pongs for larger packets. A full size GRO packet might use 45 page fragments, meaning that up to 45 put_page() can be involved. More over performing the __kfree_skb() in the recvmsg() context adds a latency for user applications, and increase probability of trapping them in backlog processing, since the BH handler might found the socket owned by the user. This patch, combined with the prior one increases the rpc performance by about 10 % on servers with large number of cores. (tcp_rr workload with 10,000 flows and 112 threads reach 9 Mpps instead of 8 Mpps) This also increases single bulk flow performance on 40Gbit+ links, since in this case there are often two cpus working in tandem : - CPU handling the NIC rx interrupts, feeding the receive queue, and (after this patch) freeing the skbs that were consumed. - CPU in recvmsg() system call, essentially 100 % busy copying out data to user space. Having at most one skb in a per-socket cache has very little risk of memory exhaustion, and since it is protected by socket lock, its management is essentially free. Note that if rps/rfs is used, we do not enable this feature, because there is high chance that the same cpu is handling both the recvmsg() system call and the TCP rx path, but that another cpu did the skb allocations in the device driver right before the RPS/RFS logic. To properly handle this case, it seems we would need to record on which cpu skb was allocated, and use a different channel to give skbs back to this cpu. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-23tcp: add one skb cache for txEric Dumazet1-0/+5
On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks. 20.69% [kernel] [k] queued_spin_lock_slowpath 5.64% [kernel] [k] _raw_spin_lock 3.83% [kernel] [k] syscall_return_via_sysret 3.48% [kernel] [k] __entry_text_start 1.76% [kernel] [k] __netif_receive_skb_core 1.64% [kernel] [k] __fget For each sendmsg(), we allocate one skb, and free it at the time ACK packet comes. In many cases, ACK packets are handled by another cpus, and this unfortunately incurs heavy costs for slab layer. This patch uses an extra pointer in socket structure, so that we try to reuse the same skb and avoid these expensive costs. We cache at most one skb per socket so this should be safe as far as memory pressure is concerned. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-23net: convert rps_needed and rfs_needed to new static branch apiEric Dumazet1-1/+1
We prefer static_branch_unlikely() over static_key_false() these days. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-23net: sched: add empty status flag for NOLOCK qdiscPaolo Abeni1-0/+11
The queue is marked not empty after acquiring the seqlock, and it's up to the NOLOCK qdisc clearing such flag on dequeue. Since the empty status lays on the same cache-line of the seqlock, it's always hot on cache during the updates. This makes the empty flag update a little bit loosy. Given the lack of synchronization between enqueue and dequeue, this is unavoidable. v2 -> v3: - qdisc_is_empty() has a const argument (Eric) v1 -> v2: - use really an 'empty' flag instead of 'not_empty', as suggested by Eric Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Ivan Vecera <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-22net: Add IANA_VXLAN_UDP_PORT definition to vxlan header fileMoshe Shemesh1-0/+2
Added IANA_VXLAN_UDP_PORT (4789) definition to vxlan header file so it can be used by drivers instead of local definition. Updated drivers which locally defined it as 4789 to use it. Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Or Gerlitz <[email protected]> Cc: John Hurley <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Yunsheng Lin <[email protected]> Cc: Peng Li <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>