aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2020-07-31mac80211: add a function for running rx without passing skbs to the stackFelix Fietkau1-0/+25
This can be used to run mac80211 rx processing on a batch of frames in NAPI poll before passing them to the network stack in a large batch. This can improve icache footprint, or it can be used to pass frames via netif_receive_skb_list. Signed-off-by: Felix Fietkau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31mac80211: parse radiotap header when selecting Tx queueMathy Vanhoef1-0/+8
Already parse the radiotap header in ieee80211_monitor_select_queue. In a subsequent commit this will allow us to add a radiotap flag that influences the queue on which injected packets will be sent. This also fixes the incomplete validation of the injected frame in ieee80211_monitor_select_queue: currently an out of bounds memory access may occur in in the called function ieee80211_select_queue_80211 if the 802.11 header is too small. Note that in ieee80211_monitor_start_xmit the radiotap header is parsed again, which is necessairy because ieee80211_monitor_select_queue is not always called beforehand. Signed-off-by: Mathy Vanhoef <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31mac80211: add radiotap flag to prevent sequence number overwriteMathy Vanhoef2-0/+4
The radiotap specification contains a flag to indicate that the sequence number of an injected frame should not be overwritten. Parse this flag and define and set a corresponding Tx control flag. Signed-off-by: Mathy Vanhoef <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31net/fq_impl: use skb_get_hash instead of skb_get_hash_perturbFelix Fietkau2-3/+1
This avoids unnecessarily regenerating the skb flow hash Signed-off-by: Felix Fietkau <[email protected]> Link: https://lore.kernel.org/r/[email protected] [small commit message fixup] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31cfg80211/mac80211: add connected to auth server to station infoMarkus Theil1-0/+3
This patch adds the necessary bits to later query the auth server flag for every peer from iw. Signed-off-by: Markus Theil <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31cfg80211/mac80211: add connected to auth server to meshconfMarkus Theil1-0/+1
Besides information about num of peerings and gate connectivity, the mesh formation byte also contains a flag for authentication server connectivity, that currently cannot be set in the mesh conf. This patch adds this capability, which is necessary to implement 802.1X authentication in mesh mode. Signed-off-by: Markus Theil <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31cfg80211/mac80211: add mesh_param "mesh_nolearn" to skip path discoveryLinus Lüssing1-0/+6
Currently, before being able to forward a packet between two 802.11s nodes, both a PLINK handshake is performed upon receiving a beacon and then later a PREQ/PREP exchange for path discovery is performed on demand upon receiving a data frame to forward. When running a mesh protocol on top of an 802.11s interface, like batman-adv, we do not need the multi-hop mesh routing capabilities of 802.11s and usually set mesh_fwding=0. However, even with mesh_fwding=0 the PREQ/PREP path discovery is still performed on demand. Even though in this scenario the next hop PREQ/PREP will determine is always the direct 11s neighbor node. The new mesh_nolearn parameter allows to skip the PREQ/PREP exchange in this scenario, leading to a reduced delay, reduced packet buffering and simplifies HWMP in general. mesh_nolearn is still rather conservative in that if the packet destination is not a direct 11s neighbor, it will fall back to PREQ/PREP path discovery. For normal, multi-hop 802.11s mesh routing it is usually not advisable to enable mesh_nolearn as a transmission to a direct but distant neighbor might be worse than reaching that same node via a more robust / higher throughput etc. multi-hop path. Cc: Sven Eckelmann <[email protected]> Cc: Simon Wunderlich <[email protected]> Signed-off-by: Linus Lüssing <[email protected]> Link: https://lore.kernel.org/r/[email protected] [fix nl80211 policy to range 0/1 only] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31cfg80211: allow the low level driver to flush the BSS tableEmmanuel Grumbach1-0/+6
The low level driver adds its own opaque information in the BSS table in the cfg80211_bss structure. The low level driver may need to signal that this information is no longer relevant and needs to be recreated. Add an API to allow the low level driver to do that. iwlwifi needs this because it keeps there an information about the firmware's internal clock. This is kept in mac80211's struct ieee80211_bss::sync_device_ts. This information is populated while we scan, we add the internal firmware's clock to each beacon which allows us to program the firmware correctly after association so that it'll know when (in terms of its internal clock) the DTIM and TBTT will happen. When the firmware is reset this internal clock is reset as well and ieee80211_bss::sync_device_ts is no longer accurate. iwlwifi will call this new API any time the firmware is started. Signed-off-by: Emmanuel Grumbach <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31net/wireless: regulatory.h: drop duplicate word in commentRandy Dunlap1-1/+1
Drop doubled word "of" in a comment. Signed-off-by: Randy Dunlap <[email protected]> Cc: [email protected] Cc: Kalle Valo <[email protected]> Cc: [email protected] Cc: Johannes Berg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31net/wireless: mac80211.h: drop duplicate words in commentsRandy Dunlap1-3/+3
Drop doubled words "are" and "by" in comments. Change doubled "to to" to "to the". Signed-off-by: Randy Dunlap <[email protected]> Cc: [email protected] Cc: Kalle Valo <[email protected]> Cc: [email protected] Cc: Johannes Berg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31net/wireless: cfg80211.h: drop duplicate words in commentsRandy Dunlap1-2/+2
Drop doubled word "by" in a comment. Change "operate in in" to "operate with in" as is used below. Signed-off-by: Randy Dunlap <[email protected]> Cc: [email protected] Cc: Kalle Valo <[email protected]> Cc: [email protected] Cc: Johannes Berg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-31nl80211: S1G band and channel definitionsThomas Pedersen1-0/+17
Gives drivers the definitions needed to advertise support for S1G bands. Signed-off-by: Thomas Pedersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-07-30ipv6: fix memory leaks on IPV6_ADDRFORM pathCong Wang1-0/+1
IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket to IPv4, particularly struct ipv6_ac_socklist. Similar to struct ipv6_mc_socklist, we should just close it on this path. This bug can be easily reproduced with the following C program: #include <stdio.h> #include <string.h> #include <sys/types.h> #include <sys/socket.h> #include <arpa/inet.h> int main() { int s, value; struct sockaddr_in6 addr; struct ipv6_mreq m6; s = socket(AF_INET6, SOCK_DGRAM, 0); addr.sin6_family = AF_INET6; addr.sin6_port = htons(5000); inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr); connect(s, (struct sockaddr *)&addr, sizeof(addr)); inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr); m6.ipv6mr_interface = 5; setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6)); value = AF_INET; setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value)); close(s); return 0; } Reported-by: [email protected] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-30Merge branch 'master' of ↵David S. Miller1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-07-30 Please note that I did the first time now --no-ff merges of my testing branch into the master branch to include the [PATCH 0/n] message of a patchset. Please let me know if this is desirable, or if I should do it any different. 1) Introduce a oseq-may-wrap flag to disable anti-replay protection for manually distributed ICVs as suggested in RFC 4303. From Petr Vaněk. 2) Patchset to fully support IPCOMP for vti4, vti6 and xfrm interfaces. From Xin Long. 3) Switch from a linear list to a hash list for xfrm interface lookups. From Eyal Birger. 4) Fixes to not register one xfrm(6)_tunnel object twice. From Xin Long. 5) Fix two compile errors that were introduced with the IPCOMP support for vti and xfrm interfaces. Also from Xin Long. 6) Make the policy hold queue work with VTI. This was forgotten when VTI was implemented. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-07-30Bluetooth: Enable controller RPA resolution using Experimental featureSathish Narasimman1-0/+1
This patch adds support to enable the use of RPA Address resolution using expermental feature mgmt command. Signed-off-by: Sathish Narasimman <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-07-30Bluetooth: Enable RPA TimeoutSathish Narasimman1-0/+2
Enable RPA timeout during bluetooth initialization. The RPA timeout value is used from hdev, which initialized from debug_fs Signed-off-by: Sathish Narasimman <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-07-30Bluetooth: Configure controller address resolution if availableMarcel Holtmann1-0/+3
When the LL Privacy support is available, then as part of enabling or disabling passive background scanning, it is required to set up the controller based address resolution as well. Since only passive background scanning is utilizing the whitelist, the address resolution is now bound to the whitelist and passive background scanning. All other resolution can be easily done by the host stack. Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sathish Narsimman <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-07-30Bluetooth: Translate additional address type correctlyMarcel Holtmann1-2/+4
When using controller based address resolution, then the new address types 0x02 and 0x03 are used. These types need to be converted back into either public address or random address types. Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sathish Narsimman <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-07-29mlxsw: spectrum: Use different trap group for externally routed packetsIdo Schimmel1-0/+3
Cited commit mistakenly removed the trap group for externally routed packets (e.g., via the management interface) and grouped locally routed and externally routed packet traps under the same group, thereby subjecting them to the same policer. This can result in problems, for example, when FRR is restarted and suddenly all transient traffic is trapped to the CPU because of a default route through the management interface. Locally routed packets required to re-establish a BGP connection will never reach the CPU and the routing tables will not be re-populated. Fix this by using a different trap group for externally routed packets. Fixes: 8110668ecd9a ("mlxsw: spectrum_trap: Register layer 3 control traps") Reported-by: Alex Veber <[email protected]> Tested-by: Alex Veber <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-29netfilter: conntrack: Use sequence counter with associated spinlockAhmed S. Darwish1-1/+1
A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t data type, which allows to associate a spinlock with the sequence counter. This enables lockdep to verify that the spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-07-28fib: use indirect call wrappers in the most common fib_rules_opsBrian Vazquez1-0/+18
This avoids another inderect call per RX packet which save us around 20-40 ns. Changelog: v1 -> v2: - Move declaraions to fib_rules.h to remove warnings Reported-by: kernel test robot <[email protected]> Signed-off-by: Brian Vazquez <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-28Bluetooth: btusb: Fix and detect most of the Chinese Bluetooth controllersIsmael Ferreras Morezuelas2-0/+13
For some reason they tend to squat on the very first CSR/ Cambridge Silicon Radio VID/PID instead of paying fees. This is an extremely common problem; the issue goes as back as 2013 and these devices are only getting more popular, even rebranded by reputable vendors and sold by retailers everywhere. So, at this point in time there are hundreds of modern dongles reusing the ID of what originally was an early Bluetooth 1.1 controller. Linux is the only place where they don't work due to spotty checks in our detection code. It only covered a minimum subset. So what's the big idea? Take advantage of the fact that all CSR chips report the same internal version as both the LMP sub-version and HCI revision number. It always matches, couple that with the manufacturer code, that rarely lies, and we now have a good idea of who is who. Additionally, by compiling a list of user-reported HCI/lsusb dumps, and searching around for legit CSR dongles in similar product ranges we can find what CSR BlueCore firmware supported which Bluetooth versions. That way we can narrow down ranges of fakes for each of them. e.g. Real CSR dongles with LMP subversion 0x73 are old enough that support BT 1.1 only; so it's a dead giveaway when some third-party BT 4.0 dongle reuses it. So, to sum things up; there are multiple classes of fake controllers reusing the same 0A12:0001 VID/PID. This has been broken for a while. Known 'fake' bcdDevices: 0x0100, 0x0134, 0x1915, 0x2520, 0x7558, 0x8891 IC markings on 0x7558: FR3191AHAL 749H15143 (???) https://bugzilla.kernel.org/show_bug.cgi?id=60824 Fixes: 81cac64ba258ae (Deal with USB devices that are faking CSR vendor) Reported-by: Michał Wiśniewski <[email protected]> Tested-by: Mike Johnson <[email protected]> Tested-by: Ricardo Rodrigues <[email protected]> Tested-by: M.Hanny Sabbagh <[email protected]> Tested-by: Oussama BEN BRAHIM <[email protected]> Tested-by: Ismael Ferreras Morezuelas <[email protected]> Signed-off-by: Ismael Ferreras Morezuelas <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-07-25bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commandsAndrii Nakryiko1-2/+0
Now that BPF program/link management is centralized in generic net_device code, kernel code never queries program id from drivers, so XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary. This patch removes all the implementations of those commands in kernel, along the xdp_attachment_query(). This patch was compile-tested on allyesconfig. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-07-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-1/+0
The UDP reuseport conflict was a little bit tricky. The net-next code, via bpf-next, extracted the reuseport handling into a helper so that the BPF sk lookup code could invoke it. At the same time, the logic for reuseport handling of unconnected sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace which changed the logic to carry on the reuseport result into the rest of the lookup loop if we do not return immediately. This requires moving the reuseport_has_conns() logic into the callers. While we are here, get rid of inline directives as they do not belong in foo.c files. The other changes were cases of more straightforward overlapping modifications. Signed-off-by: David S. Miller <[email protected]>
2020-07-24net: pass a sockptr_t into ->setsockoptChristoph Hellwig6-9/+10
Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Stefan Schmidt <[email protected]> [ieee802154] Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-24net/tcp: switch ->md5_parse to sockptr_tChristoph Hellwig1-1/+1
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-24net/udp: switch udp_lib_setsockopt to sockptr_tChristoph Hellwig1-1/+1
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-24net/ipv6: switch ipv6_flowlabel_opt to sockptr_tChristoph Hellwig1-1/+1
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Note that the get case is pretty weird in that it actually copies data back to userspace from setsockopt. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-24net/ipv4: merge ip_options_get and ip_options_get_from_userChristoph Hellwig1-3/+2
Use the sockptr_t type to merge the versions. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-24net/xfrm: switch xfrm_user_policy to sockptr_tChristoph Hellwig1-3/+5
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-24net: switch sock_set_timeout to sockptr_tChristoph Hellwig1-1/+2
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-24net/flow_dissector: add packet hash dissectionAriel Levkovich1-0/+9
Retreive a hash value from the SKB and store it in the dissector key for future matching. Signed-off-by: Ariel Levkovich <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-24flow_offload: Move rhashtable inclusion to the source fileHerbert Xu1-1/+0
I noticed that touching linux/rhashtable.h causes lib/vsprintf.c to be rebuilt. This dependency came through a bogus inclusion in the file net/flow_offload.h. This patch moves it to the right place. This patch also removes a lingering rhashtable inclusion in cls_api created by the same commit. Fixes: 4e481908c51b ("flow_offload: move tc indirect block to...") Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-23net: dsa: stop overriding master's ndo_get_phys_port_nameVladimir Oltean1-23/+0
The purpose of this override is to give the user an indication of what the number of the CPU port is (in DSA, the CPU port is a hardware implementation detail and not a network interface capable of traffic). However, it has always failed (by design) at providing this information to the user in a reliable fashion. Prior to commit 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers"), the behavior was to only override this callback if it was not provided by the DSA master. That was its first failure: if the DSA master itself was a DSA port or a switchdev, then the user would not see the number of the CPU port in /sys/class/net/eth0/phys_port_name, but the number of the DSA master port within its respective physical switch. But that was actually ok in a way. The commit mentioned above changed that behavior, and now overrides the master's ndo_get_phys_port_name unconditionally. That comes with problems of its own, which are worse in a way. The idea is that it's typical for switchdev users to have udev rules for consistent interface naming. These are based, among other things, on the phys_port_name attribute. If we let the DSA switch at the bottom to start randomly overriding ndo_get_phys_port_name with its own CPU port, we basically lose any predictability in interface naming, or even uniqueness, for that matter. So, there are reasons to let DSA override the master's callback (to provide a consistent interface, a number which has a clear meaning and must not be interpreted according to context), and there are reasons to not let DSA override it (it breaks udev matching for the DSA master). But, there is an alternative method for users to retrieve the number of the CPU port of each DSA switch in the system: $ devlink port pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0 pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2 pci/0000:00:00.5/4: type notset flavour cpu port 4 spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0 spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1 spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2 spi/spi2.0/4: type notset flavour cpu port 4 spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0 spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1 spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2 spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3 spi/spi2.1/4: type notset flavour cpu port 4 So remove this duplicated, unreliable and troublesome method. From this patch on, the phys_port_name attribute of the DSA master will only contain information about itself (if at all). If the users need reliable information about the CPU port they're probably using devlink anyway. Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-13/+29
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-07-21 The following pull-request contains BPF updates for your *net-next* tree. We've added 46 non-merge commits during the last 6 day(s) which contain a total of 68 files changed, 4929 insertions(+), 526 deletions(-). The main changes are: 1) Run BPF program on socket lookup, from Jakub. 2) Introduce cpumap, from Lorenzo. 3) s390 JIT fixes, from Ilya. 4) teach riscv JIT to emit compressed insns, from Luke. 5) use build time computed BTF ids in bpf iter, from Yonghong. ==================== Purely independent overlapping changes in both filter.h and xdp.h Signed-off-by: David S. Miller <[email protected]>
2020-07-21bareudp: Reverted support to enable & disable rx metadata collectionMartin Varghese1-1/+0
The commit fe80536acf83 ("bareudp: Added attribute to enable & disable rx metadata collection") breaks the the original(5.7) default behavior of bareudp module to collect RX metadadata at the receive. It was added to avoid the crash at the kernel neighbour subsytem when packet with metadata from bareudp is processed. But it is no more needed as the commit 394de110a733 ("net: Added pointer check for dst->ops->neigh_lookup in dst_neigh_lookup_skb") solves this crash. Fixes: fe80536acf83 ("bareudp: Added attribute to enable & disable rx metadata collection") Signed-off-by: Martin Varghese <[email protected]> Acked-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-22ipvs: queue delayed work to expire no destination connections if ↵Andrew Sy Kim1-0/+29
expire_nodest_conn=1 When expire_nodest_conn=1 and a destination is deleted, IPVS does not expire the existing connections until the next matching incoming packet. If there are many connection entries from a single client to a single destination, many packets may get dropped before all the connections are expired (more likely with lots of UDP traffic). An optimization can be made where upon deletion of a destination, IPVS queues up delayed work to immediately expire any connections with a deleted destination. This ensures any reused source ports from a client (within the IPVS timeouts) are scheduled to new real servers instead of silently dropped. Signed-off-by: Andrew Sy Kim <[email protected]> Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-07-21devlink: Add comment for devlink instance lockParav Pandit1-1/+3
Add comment to describe the purpose of devlink instance lock. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-21xfrm: Fix crash when the hold queue is used.Steffen Klassert1-2/+2
The commits "xfrm: Move dst->path into struct xfrm_dst" and "net: Create and use new helper xfrm_dst_child()." changed xfrm bundle handling under the assumption that xdst->path and dst->child are not a NULL pointer only if dst->xfrm is not a NULL pointer. That is true with one exception. If the xfrm hold queue is used to wait until a SA is installed by the key manager, we create a dummy bundle without a valid dst->xfrm pointer. The current xfrm bundle handling crashes in that case. Fix this by extending the NULL check of dst->xfrm with a test of the DST_XFRM_QUEUE flag. Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst") Fixes: b92cf4aab8e6 ("net: Create and use new helper xfrm_dst_child().") Signed-off-by: Steffen Klassert <[email protected]>
2020-07-20net: dsa: Setup dsa_netdev_opsFlorian Fainelli1-1/+0
Now that we have all the infrastructure in place for calling into the dsa_ptr->netdev_ops function pointers, install them when we configure the DSA CPU/management interface and tear them down. The flow is unchanged from before, but now we preserve equality of tests when network device drivers do tests like dev->netdev_ops == &foo_ops which was not the case before since we were allocating an entirely new structure. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-20net: dsa: Add wrappers for overloaded ndo_opsFlorian Fainelli1-0/+70
Add definitions for the dsa_netdevice_ops structure which is a subset of the net_device_ops structure for the specific operations that we care about overlaying on top of the DSA CPU port net_device and provide inline stubs that take core managing whether DSA code is reachable. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19icmp: support rfc 4884Willem de Bruijn1-0/+1
Add setsockopt SOL_IP/IP_RECVERR_4884 to return the offset to an extension struct if present. ICMP messages may include an extension structure after the original datagram. RFC 4884 standardized this behavior. It stores the offset in words to the extension header in u8 icmphdr.un.reserved[1]. The field is valid only for ICMP types destination unreachable, time exceeded and parameter problem, if length is at least 128 bytes and entire packet does not exceed 576 bytes. Return the offset to the start of the extension struct when reading an ICMP error from the error queue, if it matches the above constraints. Do not return the raw u8 field. Return the offset from the start of the user buffer, in bytes. The kernel does not return the network and transport headers, so subtract those. Also validate the headers. Return the offset regardless of validation, as an invalid extension must still not be misinterpreted as part of the original datagram. Note that !invalid does not imply valid. If the extension version does not match, no validation can take place, for instance. For backward compatibility, make this optional, set by setsockopt SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c For forward compatibility, reserve only setsockopt value 1, leaving other bits for additional icmp extensions. Changes v1->v2: - convert word offset to byte offset from start of user buffer - return in ee_data as u8 may be insufficient - define extension struct and object header structs - return len only if constraints met - if returning len, also validate Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19xdp: introduce xdp_get_shared_info_from_{buff, frame} utility routinesLorenzo Bianconi1-0/+15
Introduce xdp_get_shared_info_from_{buff,frame} utility routines to get skb_shared_info from xdp buffer/frame pointer. xdp_get_shared_info_from_{buff,frame} will be used to implement xdp multi-buffer support Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19net: make ->{get,set}sockopt in proto_ops optionalChristoph Hellwig1-2/+0
Just check for a NULL method instead of wiring up sock_no_{get,set}sockopt. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Marc Kleine-Budde <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19net/ipv6: remove compat_ipv6_{get,set}sockoptChristoph Hellwig5-39/+0
Handle the few cases that need special treatment in-line using in_compat_syscall(). This also removes all the now unused compat_{get,set}sockopt methods. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19net/ipv4: remove compat_ip_{get,set}sockoptChristoph Hellwig1-4/+0
Handle the few cases that need special treatment in-line using in_compat_syscall(). Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19net: remove compat_sock_common_{get,set}sockoptChristoph Hellwig1-4/+0
Add the compat handling to sock_common_{get,set}sockopt instead, keyed of in_compat_syscall(). This allow to remove the now unused ->compat_{get,set}sockopt methods from struct proto_ops. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Matthieu Baerts <[email protected]> Acked-by: Stefan Schmidt <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19net: simplify cBPF setsockopt compat handlingChristoph Hellwig1-1/+0
Add a helper that copies either a native or compat bpf_fprog from userspace after verifying the length, and remove the compat setsockopt handlers that now aren't required. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-16Revert "net: sched: Pass root lock to Qdisc_ops.enqueue"Petr Machata1-4/+2
This reverts commit aebe4426ccaa4838f36ea805cdf7d76503e65117. Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-07-16net: sched: Do not drop root lock in tcf_qevent_handle()Petr Machata1-2/+2
Mirred currently does not mix well with blocks executed after the qdisc root lock is taken. This includes classification blocks (such as in PRIO, ETS, DRR qdiscs) and qevents. The locking caused by the packet mirrored by mirred can cause deadlocks: either when the thread of execution attempts to take the lock a second time, or when two threads end up waiting on each other's locks. The qevent patchset attempted to not introduce further badness of this sort, and dropped the lock before executing the qevent block. However this lead to too little locking and races between qdisc configuration and packet enqueue in the RED qdisc. Before the deadlock issues are solved in a way that can be applied across many qdiscs reasonably easily, do for qevents what is done for the classification blocks and just keep holding the root lock. Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>