Commit log for net/

2016-02-16  batman-adv: Only put gw_node list reference when removed  (Sven Eckelmann; 1 file, -3/+4)

The batadv_gw_node reference counter in batadv_gw_node_update can only be reduced when the list entry was actually removed. Otherwise the reference counter may reach zero when batadv_gw_node_update is called from two different contexts for the same gw_node but only one context is actually removing the entry from the list.

The release function for this gw_node is not called inside the list_lock spinlock protected region because the function batadv_gw_node_update still holds a gw_node reference for the object pointer on the stack. Thus the actual release function (when required) will be called only at the end of the function.

Fixes: bd3524c14bd0 ("batman-adv: remove obsolete deleted attribute for gateway node")
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

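The resulting pattern looks roughly like this (a sketch distilled from the commit message, not the verbatim diff; variable names follow the batman-adv style but are simplified):

    spin_lock_bh(&bat_priv->gw.list_lock);
    /* drop the list's reference only when the entry is still hashed;
     * a concurrent updater may already have unlinked it */
    if (!hlist_unhashed(&gw_node->list)) {
            hlist_del_init_rcu(&gw_node->list);
            batadv_gw_node_free_ref(gw_node);  /* safe inside the lock:
                                                * our own reference keeps
                                                * the refcount above zero */
    }
    spin_unlock_bh(&bat_priv->gw.list_lock);

    /* ... work with gw_node under the caller's own reference ... */

    batadv_gw_node_free_ref(gw_node);          /* put the local reference */
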
2016-02-16  mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm  (Dave Hansen; 1 file, -1/+1)

We will soon modify the vanilla get_user_pages() so it can no longer be used on mm/tasks other than 'current/current->mm', which is by far the most common way it is called. For now, we allow the old-style calls, but warn when they are used. (implemented in previous patch)

This patch switches all callers of:

  get_user_pages()
  get_user_pages_unlocked()
  get_user_pages_locked()

to stop passing tsk/mm so they will no longer see the warnings.

Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

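Concretely, a call site changes shape like this (illustrative; the separate write/force int arguments reflect the get_user_pages() prototype of this era):

    /* before: task and mm are passed explicitly */
    ret = get_user_pages(current, current->mm, start, nr_pages,
                         write, force, pages, vmas);

    /* after: the tsk/mm pair is implicit (current, current->mm) */
    ret = get_user_pages(start, nr_pages, write, force, pages, vmas);
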
2016-02-15  treewide: Fix typo in printk  (Masanari Iida; 1 file, -1/+1)

This patch fixes spelling typos found in printk messages and Kconfig.

Signed-off-by: Masanari Iida <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

2016-02-14  Merge 4.5-rc4 into tty-next  (Greg Kroah-Hartman; 72 files, -268/+470)

We want the fixes in here, and this resolves a merge error in tty_io.c.

Signed-off-by: Greg Kroah-Hartman <[email protected]>

2016-02-13  vsock: Fix blocking ops call in prepare_to_wait  (Laura Abbott; 1 file, -13/+6)

We received a bug report from someone using vmware:

WARNING: CPU: 3 PID: 660 at kernel/sched/core.c:7389 __might_sleep+0x7d/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810fa68d>] prepare_to_wait+0x2d/0x90
Modules linked in: vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event snd_ens1371 iosf_mbi gameport snd_rawmidi snd_ac97_codec ac97_bus snd_seq coretemp snd_seq_device snd_pcm snd_timer snd soundcore ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vmw_vmci vmw_balloon i2c_piix4 shpchp parport_pc parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc btrfs xor raid6_pq 8021q garp stp llc mrp crc32c_intel serio_raw mptspi vmwgfx drm_kms_helper ttm drm scsi_transport_spi mptscsih e1000 ata_generic mptbase pata_acpi
CPU: 3 PID: 660 Comm: vmtoolsd Not tainted 4.2.0-0.rc1.git3.1.fc23.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
 0000000000000000 0000000049e617f3 ffff88006ac37ac8 ffffffff818641f5
 0000000000000000 ffff88006ac37b20 ffff88006ac37b08 ffffffff810ab446
 ffff880068009f40 ffffffff81c63bc0 0000000000000061 0000000000000000
Call Trace:
 [<ffffffff818641f5>] dump_stack+0x4c/0x65
 [<ffffffff810ab446>] warn_slowpath_common+0x86/0xc0
 [<ffffffff810ab4d5>] warn_slowpath_fmt+0x55/0x70
 [<ffffffff8112551d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
 [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
 [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
 [<ffffffff810da2bd>] __might_sleep+0x7d/0x90
 [<ffffffff812163b3>] __might_fault+0x43/0xa0
 [<ffffffff81430477>] copy_from_iter+0x87/0x2a0
 [<ffffffffa039460a>] __qp_memcpy_to_queue+0x9a/0x1b0 [vmw_vmci]
 [<ffffffffa0394740>] ? qp_memcpy_to_queue+0x20/0x20 [vmw_vmci]
 [<ffffffffa0394757>] qp_memcpy_to_queue_iov+0x17/0x20 [vmw_vmci]
 [<ffffffffa0394d50>] qp_enqueue_locked+0xa0/0x140 [vmw_vmci]
 [<ffffffffa039593f>] vmci_qpair_enquev+0x4f/0xd0 [vmw_vmci]
 [<ffffffffa04847bb>] vmci_transport_stream_enqueue+0x1b/0x20 [vmw_vsock_vmci_transport]
 [<ffffffffa047ae05>] vsock_stream_sendmsg+0x2c5/0x320 [vsock]
 [<ffffffff810fabd0>] ? wake_atomic_t_function+0x70/0x70
 [<ffffffff81702af8>] sock_sendmsg+0x38/0x50
 [<ffffffff81702ff4>] SYSC_sendto+0x104/0x190
 [<ffffffff8126e25a>] ? vfs_read+0x8a/0x140
 [<ffffffff817042ee>] SyS_sendto+0xe/0x10
 [<ffffffff8186d9ae>] entry_SYSCALL_64_fastpath+0x12/0x76

transport->stream_enqueue copies to/from user space (note the copy_from_iter -> __might_fault frames above) and may therefore sleep, so it should not be called inside a prepare_to_wait. Narrow the scope of the prepare_to_wait to avoid the bad call. The same applies to vsock_stream_recvmsg.

Reported-by: Vinson Lee <[email protected]>
Tested-by: Vinson Lee <[email protected]>
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

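The shape of the fix, reduced to a sketch (have_space() and the local variables are placeholders, not the literal vsock code):

    /* Broken shape: the enqueue runs after prepare_to_wait() has put the
     * task in TASK_INTERRUPTIBLE, so any faulting copy trips the warning. */
    prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
    while (!have_space(vsk)) {
            timeout = schedule_timeout(timeout);
            prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
    }
    written = transport->stream_enqueue(vsk, msg, len);   /* may fault */
    finish_wait(sk_sleep(sk), &wait);

    /* Fixed shape: only the condition re-check and the sleep sit between
     * prepare_to_wait() and finish_wait(); the copy runs in TASK_RUNNING. */
    while (!have_space(vsk)) {
            prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
            if (!have_space(vsk))
                    timeout = schedule_timeout(timeout);
            finish_wait(sk_sleep(sk), &wait);
    }
    written = transport->stream_enqueue(vsk, msg, len);   /* safe here */
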
2016-02-13  ipv4: fix memory leaks in ip_cmsg_send() callers  (Eric Dumazet; 4 files, -3/+11)

Dmitry reported memory leaks of IP options allocated in ip_cmsg_send() when/if this function returns an error. Callers are responsible for the freeing.

Many thanks to Dmitry for the report and diagnostic.

Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

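The caller-side pattern being fixed looks roughly like this (a sketch; udp_sendmsg() is one of the affected call sites, with arguments abbreviated). The point is that ip_cmsg_send() may have allocated ipc.opt before bailing out, so a failing call still needs the kfree():

    struct ipcm_cookie ipc;
    int err;

    ipc.opt = NULL;
    if (msg->msg_controllen) {
            err = ip_cmsg_send(sock_net(sk), msg, &ipc, false);
            if (unlikely(err)) {
                    kfree(ipc.opt);         /* the leak being plugged */
                    return err;
            }
    }
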
2016-02-12  net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloads  (Edward Cree; 6 files, -23/+14)

All users now pass false, so we can remove it, and remove the code that was conditional upon it.

Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-12  net: gre: Implement LCO for GRE over IPv4  (Edward Cree; 1 file, -3/+13)

Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-12  fou: enable LCO in FOU and GUE  (Edward Cree; 1 file, -8/+6)

Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-12  net: udp: always set up for CHECKSUM_PARTIAL offload  (Edward Cree; 2 files, -25/+2)

If the dst device doesn't support it, it'll get fixed up later anyway by validate_xmit_skb(). Also, this allows us to take advantage of LCO to avoid summing the payload multiple times.

Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-12  net: local checksum offload for encapsulation  (Edward Cree; 3 files, -22/+22)

The arithmetic properties of the ones-complement checksum mean that a correctly checksummed inner packet, including its checksum, has a ones-complement sum depending only on whatever value was used to initialise the checksum field before checksumming (in the case of TCP and UDP, this is the ones-complement sum of the pseudo header, complemented).

Consequently, if we are going to offload the inner checksum with CHECKSUM_PARTIAL, we can compute the outer checksum based only on the packet data not covered by the inner checksum, and the initial value of the inner checksum field.

Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

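A standalone demonstration of the property relied on here: once a block carries a correct ones-complement checksum, summing the whole block (checksum field included) always folds to the constant 0xffff, independent of the payload, so an encapsulating header's checksum need not re-sum those bytes. This little program is illustrative only and not taken from the kernel sources:

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    /* 16-bit ones-complement sum (the Internet checksum arithmetic) */
    static uint16_t csum16(const uint8_t *buf, size_t len, uint32_t sum)
    {
            for (size_t i = 0; i + 1 < len; i += 2)
                    sum += (uint32_t)buf[i] << 8 | buf[i + 1];
            if (len & 1)
                    sum += (uint32_t)buf[len - 1] << 8;
            while (sum >> 16)               /* fold carries back in */
                    sum = (sum & 0xffff) + (sum >> 16);
            return (uint16_t)sum;
    }

    int main(void)
    {
            uint8_t pkt[64] = { 0x45, 0x00, 0x12, 0x34, 0xab }; /* toy payload */

            /* sum the block with its checksum field (bytes 6-7) zeroed,
             * then store the complement there, as TCP/UDP do */
            uint16_t c = ~csum16(pkt, sizeof(pkt), 0);
            pkt[6] = c >> 8;
            pkt[7] = c & 0xff;

            /* prints 0xffff whatever the payload bytes were */
            printf("sum over checksummed block = 0x%04x\n",
                   csum16(pkt, sizeof(pkt), 0));
            return 0;
    }
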
2016-02-12  tcp/dccp: better use of ephemeral ports in bind()  (Eric Dumazet; 1 file, -126/+114)

Implement the strategy used in __inet_hash_connect() in the opposite way: try to find a candidate using odd ports, then fall back to even ports.

We no longer disable BH for the whole traversal, but one bucket at a time. We also use cond_resched() to yield the cpu to other tasks if needed.

I removed one indentation level and tried to mirror the loop we have in __inet_hash_connect() and variable names to ease code maintenance.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-12  tcp/dccp: better use of ephemeral ports in connect()  (Eric Dumazet; 1 file, -85/+85)

In commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range in connect()"), I added a very simple heuristic, so that we got better chances to use even ports, and allow bind() users to have more available slots. It gave nice results, but with more than 200,000 TCP sessions on a typical server, the ~30,000 ephemeral ports are still a rare resource.

I chose to go a step further, by looking at all even ports, and if none was available, fall back to odd ports. The companion patch does the same in bind(), but in the opposite way.

I've seen exec times of up to 30ms on busy servers, so I no longer disable BH for the whole traversal, but only for each hash bucket. I also call cond_resched() to be gentle to other tasks.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

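The two-pass parity strategy itself is easy to state outside the kernel. The following is a toy, self-contained model of the selection order only (it is not the kernel code, which also randomizes the starting offset and disables BH one hash bucket at a time; the real logic lives in __inet_hash_connect() and inet_csk_get_port()):

    #include <stdbool.h>
    #include <stdio.h>

    #define LOW  32768      /* a typical ip_local_port_range */
    #define HIGH 60999

    static bool in_use[HIGH + 1];

    /* first free port of the given parity (0 = even, 1 = odd), or -1 */
    static int scan_parity(int parity)
    {
            for (int port = LOW; port <= HIGH; port++)
                    if ((port & 1) == parity && !in_use[port])
                            return port;
            return -1;
    }

    /* connect()-style pick per the commit message: all even ports first,
     * odd ports only as a fallback; bind() does the opposite */
    static int pick_connect_port(void)
    {
            int port = scan_parity(0);
            return port >= 0 ? port : scan_parity(1);
    }

    int main(void)
    {
            for (int p = LOW; p <= HIGH; p += 2)
                    in_use[p] = true;           /* exhaust the even class */
            printf("picked port %d\n", pick_connect_port()); /* an odd port */
            return 0;
    }
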
2016-02-11  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (Linus Torvalds; 15 files, -24/+85)

Pull networking fixes from David Miller:

 1) Fix BPF handling of branch offset adjustments on backjumps, from Daniel Borkmann.

 2) Make sure selinux knows about SOCK_DESTROY netlink messages, from Lorenzo Colitti.

 3) Fix openvswitch tunnel mtu regression, from David Wragg.

 4) Fix ICMP handling of TCP sockets in syn_recv state, from Eric Dumazet.

 5) Fix SCTP user hmacid byte ordering bug, from Xin Long.

 6) Fix recursive locking in ipv6 addrconf, from Subash Abhinov Kasiviswanathan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  bpf: fix branch offset adjustment on backjumps after patching ctx expansion
  vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
  geneve: Relax MTU constraints
  vxlan: Relax MTU constraints
  flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  of: of_mdio: Add marvell,88e1145 to whitelist of PHY compatibilities.
  selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tables
  sctp: translate network order to host order when users get a hmacid
  enic: increment devcmd2 result ring in case of timeout
  tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
  net: Add sysctl_max_skb_frags
  tcp: do not drop syn_recv on all icmp reports
  ipv6: fix a lockdep splat
  unix: correctly track in-flight fds in sending process user_struct
  update be2net maintainers' email addresses
  dwc_eth_qos: Reset hardware before PHY start
  ipv6: addrconf: Fix recursive spin lock call

2016-02-11  net: bulk free SKBs that were delay free'ed due to IRQ context  (Jesper Dangaard Brouer; 2 files, -3/+13)

The network stack defers freeing SKBs in case the free happens in IRQ context or while IRQs are disabled. This happens in __dev_kfree_skb_irq(), which writes SKBs that were freed during IRQ to the softirq completion queue (softnet_data.completion_queue).

These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ in function net_tx_action(). Take advantage of this and use the skb defer and flush API, as we are already in softirq context.

For modern drivers this rarely happens, although most drivers do call dev_kfree_skb_any(), which detects the situation and calls __dev_kfree_skb_irq() when needed, because netpoll can call from IRQ context.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  net: bulk free infrastructure for NAPI context, use napi_consume_skb  (Jesper Dangaard Brouer; 2 files, -6/+78)

We discovered that the network stack was hitting the kmem_cache/SLUB slowpath when freeing SKBs. Doing bulk free with kmem_cache_free_bulk can speed up this slowpath.

NAPI context is a bit special; let's take advantage of that for bulk freeing SKBs. In NAPI context we are running in softirq, which gives us certain protection. A softirq can run on several CPUs at once, BUT the important part is that a softirq will never preempt another softirq running on the same CPU. This gives us the opportunity to access per-cpu variables in softirq context.

Extend napi_alloc_cache (which before only contained a page_frag_cache) to be a struct with a small array-based stack for holding SKBs. Introduce a SKB defer and flush API for accessing this.

Introduce napi_consume_skb() as a replacement for e.g. dev_consume_skb_any() when running in NAPI context. A small trick to handle/detect if we are called from netpoll is to see if budget is 0. In that case, we need to invoke dev_consume_skb_irq().

Joint work with Alexander Duyck.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

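A hedged sketch of how a driver's NAPI poll routine might use the new helper (the mydrv_* structures and helpers are hypothetical stand-ins; napi_consume_skb() and the budget == 0 netpoll convention are from the patch):

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    struct mydrv_ring {                     /* hypothetical driver state */
            struct napi_struct napi;
            /* ... descriptor ring bookkeeping ... */
    };

    /* hypothetical helpers standing in for real descriptor handling */
    bool tx_desc_completed(struct mydrv_ring *r);
    struct sk_buff *tx_desc_pop(struct mydrv_ring *r);
    int mydrv_rx_clean(struct mydrv_ring *r, int budget);

    static int mydrv_napi_poll(struct napi_struct *napi, int budget)
    {
            struct mydrv_ring *ring = container_of(napi, struct mydrv_ring, napi);
            int done;

            /* reclaim completed TX skbs through the bulk-free cache */
            while (tx_desc_completed(ring)) {
                    /* budget == 0 signals a netpoll-style caller; the helper
                     * then falls back to dev_consume_skb_irq() instead of
                     * the per-cpu NAPI bulk-free cache */
                    napi_consume_skb(tx_desc_pop(ring), budget);
            }

            done = mydrv_rx_clean(ring, budget);
            if (done < budget)
                    napi_complete(napi);
            return done;
    }
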
2016-02-11  igmp: Namespaceify igmp_qrv sysctl knob  (Nikolay Borisov; 3 files, -22/+28)

Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  igmp: Namespaceify igmp_llm_reports sysctl knob  (Nikolay Borisov; 3 files, -12/+18)

This was initially introduced in df2cf4a78e488d26 ("IGMP: Inhibit reports for local multicast groups") by defining the sysctl in the ipv4_net_table array; however, it was never implemented to be namespace aware. Fix this by changing the code accordingly.

Signed-off-by: David S. Miller <[email protected]>

2016-02-11  igmp: Namespaceify igmp_max_msf sysctl knob  (Nikolay Borisov; 4 files, -13/+12)

Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  igmp: Namespaceify igmp_max_memberships sysctl knob  (Nikolay Borisov; 3 files, -10/+10)

Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  openvswitch: allow management from inside user namespaces  (Tycho Andersen; 2 files, -10/+14)

Operations with the GENL_ADMIN_PERM flag fail permissions checks because this flag means we call netlink_capable, which uses the init user ns. Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM, for operations which should be allowed inside a user namespace.

The motivation for this is to be able to run openvswitch in unprivileged containers. I've tested this and it seems to work, but I really have no idea about the security consequences of this patch, so thoughts would be much appreciated.

v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one massive one

Reported-by: James Page <[email protected]>
Signed-off-by: Tycho Andersen <[email protected]>
CC: Eric Biederman <[email protected]>
CC: Pravin Shelar <[email protected]>
CC: Justin Pettit <[email protected]>
CC: "David S. Miller" <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

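What opting in looks like at the operation table, sketched with a hypothetical command and handler (only the GENL_UNS_ADMIN_PERM flag itself comes from the patch):

    #include <net/genetlink.h>

    enum { DEMO_CMD_NEW = 1 };  /* hypothetical command, for illustration */

    static int demo_cmd_new_doit(struct sk_buff *skb, struct genl_info *info);

    static const struct genl_ops demo_genl_ops[] = {
            {
                    .cmd   = DEMO_CMD_NEW,
                    /* admitted with CAP_NET_ADMIN over the socket's own
                     * network namespace; GENL_ADMIN_PERM would instead
                     * demand the capability in the init user namespace */
                    .flags = GENL_UNS_ADMIN_PERM,
                    .doit  = demo_cmd_new_doit,
            },
    };
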
2016-02-11  rds: duplicate include net/tcp.h  (stephen hemminger; 1 file, -1/+0)

Duplicate include detected.

Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  net: Allow tunnels to use inner checksum offloads with outer checksums needed  (Alexander Duyck; 1 file, -2/+1)

This patch enables us to use inner checksum offloads if provided by hardware with outer checksums computed by software.

It basically reduces encap_hdr_csum to an advisory flag for now, but based on the fact that SCTP may be getting segmentation support before long, I thought we may want to keep it, as it is possible we may need to support CRC32c and 1's complement checksum in the same packet at some point in the future.

Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  udp: Use uh->len instead of skb->len to compute checksum in segmentation  (Alexander Duyck; 1 file, -15/+13)

The segmentation code was having to do a bunch of work to pull the skb->len and strip the udp header offset before the value could be used to adjust the checksum. Instead of doing all this work we can just use the value that goes into uh->len, since that is the correct value with the correct byte order that we need anyway. By using this value we can save ourselves a bunch of pain as there is no need to do multiple byte swaps.

Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  udp: Clean up the use of flags in UDP segmentation offload  (Alexander Duyck; 1 file, -19/+18)

This patch goes through and cleans up the logic related to several of the control flags used in UDP segmentation. Specifically, the use of dont_encap isn't really needed, as we can just check the skb for CHECKSUM_PARTIAL; if it isn't set then we don't need to update the internal headers. As such we can just drop that value.

Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  gre: Use inner_proto to obtain inner header protocol  (Alexander Duyck; 1 file, -4/+2)

Instead of parsing headers to determine the inner protocol we can just pull the value from inner_proto.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  gre: Use GSO flags to determine csum need instead of GRE flags  (Alexander Duyck; 1 file, -34/+30)

This patch updates the gre checksum path to follow something much closer to the UDP checksum path. By doing this we can avoid needing to do as much header inspection and can just make use of the fields we were already reading in the sk_buff structure.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  net: Move skb_has_shared_frag check out of GRE code and into segmentation  (Alexander Duyck; 2 files, -11/+5)

The call skb_has_shared_frag is used in the GRE path and skb_checksum_help to verify that no frags can be modified by an external entity. This check really doesn't belong in the GRE path but in the skb_segment function itself. This way any protocol that might be segmented will be performing this check before attempting to offload a checksum to software.

Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  net: Store checksum result for offloaded GSO checksums  (Alexander Duyck; 1 file, -2/+6)

This patch makes it so that we can offload the checksums for a packet up to a certain point and then begin computing the checksums via software. Setting this up is fairly straightforward, as all we need to do is reset the values stored in csum and csum_start for the GSO context block.

One complication for this is remote checksum offload. In order to allow the inner checksums to be offloaded while computing the outer checksum manually, we needed to have some way of indicating that the offload wasn't real. In order to do that I replaced CHECKSUM_PARTIAL with CHECKSUM_UNNECESSARY in the case of us computing checksums for the outer header while skipping computing checksums for the inner headers. We clean up the ip_summed flag and set it to either CHECKSUM_PARTIAL or CHECKSUM_NONE once we hand the packet off to the next lower level.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  net: Update remote checksum segmentation to support use of GSO checksum  (Alexander Duyck; 2 files, -16/+16)

This patch addresses two main issues. First, in the case of remote checksum offload we were avoiding dealing with scatter-gather issues. As a result it would be possible to assemble a series of frames that used frags instead of being linearized, as they should have been if remote checksum offload was enabled.

Second, I have updated the code so that we now let GSO take care of doing the checksum on the data itself and drop the special case that was added for remote checksum offload.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  net: Move GSO csum into SKB_GSO_CB  (Alexander Duyck; 1 file, -7/+9)

This patch moves the checksum maintained by GSO out of skb->csum and into the GSO context block, in order to allow us to work on outer checksums while maintaining the inner checksum offsets in the case of the inner checksum being offloaded, while the outer checksums will be computed.

While updating the code I also did a minor clean-up of gso_make_checksum. The change is mostly to make it so that we store the values and compute the checksum instead of computing the checksum and then storing the values we needed to update.

Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  net: Drop unnecessary enc_features variable from tunnel segmentation functions  (Alexander Duyck; 2 files, -6/+6)

The enc_features variable isn't necessary, since features isn't used anywhere after we create enc_features; instead just use a destructive AND on features itself and save ourselves the variable declaration.

Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  ipv6: add option to drop unsolicited neighbor advertisements  (Johannes Berg; 2 files, -0/+17)

In certain 802.11 wireless deployments, there will be NA proxies that use knowledge of the network to correctly answer requests. To prevent unsolicited advertisements on the shared medium from being a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_unsolicited_na".

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  ipv6: add option to drop unicast encapsulated in L2 multicast  (Johannes Berg; 2 files, -0/+18)

In order to solve a problem with 802.11, the so-called hole-196 attack, add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if enabled, causes the stack to drop IPv6 unicast packets encapsulated in link-layer multi- or broadcast frames. Such frames can (as an attack) be created by any member of the same wireless network and transmitted as valid encrypted frames since the symmetric key for broadcast frames is shared between all stations.

Reviewed-by: Julian Anastasov <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  ipv4: add option to drop gratuitous ARP packets  (Johannes Berg; 2 files, -0/+10)

In certain 802.11 wireless deployments, there will be ARP proxies that use knowledge of the network to correctly answer requests. To prevent gratuitous ARP frames on the shared medium from being a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_gratuitous_arp".

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  ipv4: add option to drop unicast encapsulated in L2 multicast  (Johannes Berg; 2 files, -1/+26)

In order to solve a problem with 802.11, the so-called hole-196 attack, add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if enabled, causes the stack to drop IPv4 unicast packets encapsulated in link-layer multi- or broadcast frames. Such frames can (as an attack) be created by any member of the same wireless network and transmitted as valid encrypted frames since the symmetric key for broadcast frames is shared between all stations.

Additionally, enabling this option provides compliance with a SHOULD clause of RFC 1122.

Reviewed-by: Julian Anastasov <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

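Roughly the shape of the test such an option gates on the IPv4 input path (a hedged sketch, not the literal patch; the real check reads a per-device sysctl after the route lookup in the receive path): the IP destination resolved to unicast, but the link-layer frame was multicast or broadcast, and the new option is enabled.

    if ((skb->pkt_type == PACKET_BROADCAST ||
         skb->pkt_type == PACKET_MULTICAST) &&
        rt->rt_type == RTN_UNICAST &&
        in_dev && IN_DEV_ORCONF(in_dev, DROP_UNICAST_IN_L2_MULTICAST))
            goto drop;
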
2016-02-11  net: Add support for filtering link dump by master device and kind  (David Ahern; 1 file, -1/+67)

Add support for filtering link dumps by master device and kind, similar to the filtering implemented for neighbor dumps.

Each net_device that exists adds between 1196 bytes (eth) and 1556 bytes (bridge) to the link dump. As the number of interfaces increases, so does the amount of data pushed to user space for a link list. If the user only wants to see a list of specific devices (e.g., interfaces enslaved to a specific bridge or a list of VRFs) most of that data is thrown away. Passing the filters to the kernel to have only relevant data returned makes the dump more efficient.

Signed-off-by: David Ahern <[email protected]>
Acked-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  soreuseport: fast reuseport TCP socket selection  (Craig Gallek; 5 files, -11/+89)

This change extends the fast SO_REUSEPORT socket lookup implemented for UDP to TCP. Listener sockets with SO_REUSEPORT and the same receive address are additionally added to an array for faster random access. This means that only a single socket from the group must be found in the listener list before any socket in the group can be used to receive a packet. Previously, every socket in the group needed to be considered before handing off the incoming packet.

This feature also exposes the ability to use a BPF program when selecting a socket from a reuseport group.

Signed-off-by: Craig Gallek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

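From user space the feature combines two existing socket options. A minimal, runnable sketch that spreads a TCP listener group across CPUs follows; the port number, group size, and the CPU-based policy are arbitrary choices, error handling is minimal, and headers new enough to define SO_ATTACH_REUSEPORT_CBPF are assumed:

    #include <arpa/inet.h>
    #include <linux/filter.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
            /* classic BPF: return the current CPU number; the kernel uses
             * the returned value as an index into the reuseport group and
             * falls back to hash-based selection if it is out of range */
            struct sock_filter code[] = {
                    { BPF_LD | BPF_W | BPF_ABS, 0, 0, SKF_AD_OFF + SKF_AD_CPU },
                    { BPF_RET | BPF_A,          0, 0, 0 },
            };
            struct sock_fprog prog = { .len = 2, .filter = code };
            struct sockaddr_in addr = {
                    .sin_family      = AF_INET,
                    .sin_port        = htons(8080),
                    .sin_addr.s_addr = htonl(INADDR_ANY),
            };
            int one = 1, fds[4];

            for (int i = 0; i < 4; i++) {
                    fds[i] = socket(AF_INET, SOCK_STREAM, 0);
                    /* SO_REUSEPORT must be enabled before bind() */
                    if (setsockopt(fds[i], SOL_SOCKET, SO_REUSEPORT,
                                   &one, sizeof(one)) ||
                        bind(fds[i], (struct sockaddr *)&addr, sizeof(addr)) ||
                        listen(fds[i], 128)) {
                            perror("listener setup");
                            return 1;
                    }
            }

            /* attaching to any one member applies to the whole group */
            if (setsockopt(fds[0], SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
                           &prog, sizeof(prog))) {
                    perror("SO_ATTACH_REUSEPORT_CBPF");
                    return 1;
            }
            pause();
            return 0;
    }
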
2016-02-11  soreuseport: Prep for fast reuseport TCP socket selection  (Craig Gallek; 2 files, -1/+2)

Both of the lines in this patch probably should have been included in the initial implementation of this code for generic socket support, but weren't technically necessary since only UDP sockets were supported.

First, the sk_reuseport_cb points to a structure which assumes each socket in the group has this pointer assigned at the same time it's added to the array in the structure. The sk_clone_lock function breaks this assumption. Since a child socket shouldn't implicitly be in a reuseport group, the simple fix is to clear the field in the clone.

Second, the SO_ATTACH_REUSEPORT_xBPF socket options require that SO_REUSEPORT also be set first. For UDP sockets, this is easily enforced at bind-time since that process both puts the socket in the appropriate receive hlist and updates the reuseport structures. Since these operations can happen at two different times for TCP sockets (bind and listen) it must be explicitly checked to enforce the use of SO_REUSEPORT with SO_ATTACH_REUSEPORT_xBPF in the setsockopt call.

Signed-off-by: Craig Gallek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  inet: refactor inet[6]_lookup functions to take skb  (Craig Gallek; 9 files, -32/+64)

This is a preliminary step to allow fast socket lookup of SO_REUSEPORT groups. Doing so with a BPF filter will require access to the skb in question. This change plumbs the skb (and offset to payload data) through the call stack to the listening socket lookup implementations where it will be used in a following patch.

Signed-off-by: Craig Gallek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  inet: create IPv6-equivalent inet_hash function  (Craig Gallek; 5 files, -46/+61)

In order to support fast lookups for TCP sockets with SO_REUSEPORT, the function that adds sockets to the listening hash set needs to be able to check receive address equality. Since this equality check is different for IPv4 and IPv6, we will need two different socket hashing functions.

This patch adds inet6_hash, identical to the existing inet_hash function, and updates the appropriate references. A following patch will differentiate the two by passing different comparison functions to __inet_hash.

Additionally, in order to use the IPv6 address equality function from inet6_hashtables (which is compiled as a built-in object when IPv6 is enabled) it also needs to be in a built-in object file as well. This moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this.

Signed-off-by: Craig Gallek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2016-02-11  sock: struct proto hash function may error  (Craig Gallek; 9 files, -17/+44)

In order to support fast reuseport lookups in TCP, the hash function defined in struct proto must be capable of returning an error code. This patch changes the function signature of all related hash functions to return an integer and handles or propagates this return value at all call sites.

Signed-off-by: Craig Gallek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

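In practice this is a mechanical signature change plus error propagation at the call sites; a hedged sketch of the idea, not the literal diff:

    /* the struct proto member changes from
     *      void (*hash)(struct sock *sk);
     * to
     *      int  (*hash)(struct sock *sk);
     * so a listen()-path caller can now unwind on failure: */
    int err = sk->sk_prot->hash(sk);

    if (unlikely(err)) {
            sk->sk_state = TCP_CLOSE;   /* undo the listen transition */
            return err;
    }
    return 0;
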
2016-02-10  batman-adv: Convert batadv_tt_common_entry to kref  (Sven Eckelmann; 2 files, -13/+49)

batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the chance that the reference counting will be missed.

Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

2016-02-10  batman-adv: Convert batadv_orig_node to kref  (Sven Eckelmann; 8 files, -17/+22)

batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the chance that the reference counting will be missed.

Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

2016-02-10  batman-adv: Convert batadv_orig_node_vlan to kref  (Sven Eckelmann; 2 files, -7/+21)

batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the chance that the reference counting will be missed.

Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

2016-02-10  batman-adv: Convert batadv_hard_iface to kref  (Sven Eckelmann; 5 files, -23/+27)

batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the chance that the reference counting will be missed.

Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

2016-02-10  batman-adv: Convert batadv_neigh_node to kref  (Sven Eckelmann; 4 files, -17/+20)

batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the chance that the reference counting will be missed.

Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

2016-02-10  batman-adv: Convert batadv_orig_ifinfo to kref  (Sven Eckelmann; 3 files, -9/+13)

batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the chance that the reference counting will be missed.

Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

2016-02-10  batman-adv: Convert batadv_neigh_ifinfo to kref  (Sven Eckelmann; 2 files, -8/+11)

batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the chance that the reference counting will be missed.

Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

2016-02-10  batman-adv: Convert batadv_tt_orig_list_entry to kref  (Sven Eckelmann; 2 files, -10/+18)

batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the chance that the reference counting will be missed.

Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>