aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-03-13sctp: allow sctp_transmit_packet and others to use gfpMarcelo Ricardo Leitner7-66/+83
Currently sctp_sendmsg() triggers some calls that will allocate memory with GFP_ATOMIC even when not necessary. In the case of sctp_packet_transmit it will allocate a linear skb that will be used to construct the packet and this may cause sends to fail due to ENOMEM more often than anticipated specially with big MTUs. This patch thus allows it to inherit gfp flags from upper calls so that it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or similar. All others, like retransmits or flushes started from BH, are still allocated using GFP_ATOMIC. In netperf tests this didn't result in any performance drawbacks when memory is not too fragmented and made it trigger ENOMEM way less often. Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-13ovs: allow nl 'flow set' to use ufid without flow keySamuel Gauthier1-11/+17
When we want to change a flow using netlink, we have to identify it to be able to perform a lookup. Both the flow key and unique flow ID (ufid) are valid identifiers, but we always have to specify the flow key in the netlink message. When both attributes are there, the ufid is used. The flow key is used to validate the actions provided by the userland. This commit allows to use the ufid without having to provide the flow key, as it is already done in the netlink 'flow get' and 'flow del' path. The flow key remains mandatory when an action is provided. Signed-off-by: Samuel Gauthier <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-13netconf: add macro to represent all attributesZhang Shengju2-32/+44
This patch adds macro NETCONFA_ALL to represent all type of netconf attributes for IPv4 and IPv6. Signed-off-by: Zhang Shengju <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-13sctp: fix the transports round robin issue when init is retransmittedXin Long2-2/+2
prior to this patch, at the beginning if we have two paths in one assoc, they may have the same params other than the last_time_heard, it will try the paths like this: 1st cycle try trans1 fail. then trans2 is selected.(cause it's last_time_heard is after trans1). 2nd cycle: try trans2 fail then trans2 is selected.(cause it's last_time_heard is after trans1). 3rd cycle: try trans2 fail then trans2 is selected.(cause it's last_time_heard is after trans1). .... trans1 will never have change to be selected, which is not what we expect. we should keeping round robin all the paths if they are just added at the beginning. So at first every tranport's last_time_heard should be initialized 0, so that we ensure they have the same value at the beginning, only by this, all the transports could get equal chance to be selected. Then for sctp_trans_elect_best, it should return the trans_next one when *trans == *trans_next, so that we can try next if it fails, but now it always return trans. so we can fix it by exchanging these two params when we calls sctp_trans_elect_tie(). Fixes: 4c47af4d5eb2 ('net: sctp: rework multihoming retransmission path selection to rfc4960') Signed-off-by: Xin Long <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-13rxrpc: Replace all unsigned with unsigned intDavid Howells8-39/+39
Replace all "unsigned" types with "unsigned int" types. Reported-by: David Miller <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-13gro: Defer clearing of flush bit in tunnel pathsAlexander Duyck2-4/+2
This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that we do not clear the flush bit until after we have called the next level GRO handler. Previously this was being cleared before parsing through the list of frames, however this resulted in several paths where either the bit needed to be reset but wasn't as in the case of FOU, or cases where it was being set as in GENEVE. By just deferring the clearing of the bit until after the next level protocol has been parsed we can avoid any unnecessary bit twiddling and avoid bugs. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-12netfilter: x_tables: check for size overflowFlorian Westphal1-0/+3
Ben Hawkes says: integer overflow in xt_alloc_table_info, which on 32-bit systems can lead to small structure allocation and a copy_from_user based heap corruption. Reported-by: Ben Hawkes <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-03-11bpf: support flow label for bpf_skb_{set, get}_tunnel_keyDaniel Borkmann1-2/+12
This patch extends bpf_tunnel_key with a tunnel_label member, that maps to ip_tunnel_key's label so underlying backends like vxlan and geneve can propagate the label to udp_tunnel6_xmit_skb(), where it's being set in the IPv6 header. It allows for having 20 more bits to encode/decode flow related meta information programmatically. Tested with vxlan and geneve. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-11ip_tunnel: add support for setting flow label via collect metadataDaniel Borkmann2-4/+4
This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label from call sites. Currently, there's no such option and it's always set to zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so that flow-based tunnels via collect metadata frontends can make use of it. vxlan and geneve will be converted to add flow label support separately. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-11bridge: allow zero ageing timeStephen Hemminger1-3/+8
This fixes a regression in the bridge ageing time caused by: commit c62987bbd8a1 ("bridge: push bridge setting ageing_time down to switchdev") There are users of Linux bridge which use the feature that if ageing time is set to 0 it causes entries to never expire. See: https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge For a pure software bridge, it is unnecessary for the code to have arbitrary restrictions on what values are allowable. Signed-off-by: Stephen Hemminger <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-11net/flower: Fix pointer castAmir Vadai1-6/+6
Cast pointer to unsigned long instead of u64, to fix compilation warning on 32 bit arch, spotted by 0day build. Fixes: 5b33f48 ("net/flower: Introduce hardware offload support") Signed-off-by: Amir Vadai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-11Bluetooth: Fix potential buffer overflow with Add AdvertisingJohan Hedberg1-0/+4
The Add Advertising command handler does the appropriate checks for the AD and Scan Response data, however fails to take into account the general length of the mgmt command itself, which could lead to potential buffer overflows. This patch adds the necessary check that the mgmt command length is consistent with the given ad and scan_rsp lengths. Signed-off-by: Johan Hedberg <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Cc: [email protected]
2016-03-11Bluetooth: Fix setting correct flags in ADJohan Hedberg1-1/+3
A recent change added MGMT_ADV_FLAG_DISCOV to the flags returned by get_adv_instance_flags(), however failed to take into account limited discoverable mode. This patch fixes the issue by setting the correct discoverability flag in the AD data. Signed-off-by: Johan Hedberg <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-03-11netfilter: nft_compat: check match/targetinfo attr sizeFlorian Westphal1-0/+6
We copy according to ->target|matchsize, so check that the netlink attribute (which can include padding and might be larger) contains enough data. Reported-by: Julia Lawall <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-03-11Merge tag 'ipvs-fixes-for-v4.5' of ↵Pablo Neira Ayuso3-12/+35
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs Simon Horman says: ==================== please consider these IPVS fixes for v4.5 or if it is too late please consider them for v4.6. * Arnd Bergman has corrected an error whereby the SIP persistence engine may incorrectly access protocol fields * Julian Anastasov has corrected a problem reported by Jiri Bohac with the connection rescheduling mechanism added in 3.10 when new SYNs in connection to dead real server can be redirected to another real server. * Marco Angaroni resolved a problem in the SIP persistence engine whereby the Call-ID could not be found if it was at the beginning of a SIP message. ==================== Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-03-10net/9p: convert to new CQ APIChristoph Hellwig1-55/+31
Trivial conversion to the new RDMA CQ API. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Dominique Martinet <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2016-03-10net/flow_dissector: Make dissector_uses_key() and ↵Amir Vadai1-13/+0
skb_flow_dissector_target() public Will be used in a following patch to query if a key is being used, and what it's value in the target object. Acked-by: John Fastabend <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Amir Vadai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-10net/flower: Introduce hardware offload supportAmir Vadai1-1/+63
This patch is based on a patch made by John Fastabend. It adds support for offloading cls_flower. when NETIF_F_HW_TC is on: flags = 0 => Rule will be processed twice - by hardware, and if still relevant, by software. flags = SKIP_HW => Rull will be processed by software only If hardware fail/not capabale to apply the rule, operation will NOT fail. Filter will be processed by SW only. Acked-by: Jiri Pirko <[email protected]> Suggested-by: John Fastabend <[email protected]> Signed-off-by: Amir Vadai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-10net: dsa: Fix cleanup resources upon module removalNeil Armstrong1-8/+8
The initial commit badly merged into the dsa_resume method instead of the dsa_remove_dst method. As consequence, the dst->master_netdev->dsa_ptr is not set to NULL on removal and re-bind of the dsa device fails with error -17. Fixes: b0dc635d923c ("net: dsa: cleanup resources upon module removal ") Signed-off-by: Neil Armstrong <[email protected]> Acked-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-10Bluetooth: Increment management interface revisionJohan Hedberg1-1/+1
Increment the mgmt revision due to the recently added limited privacy mode. Signed-off-by: Johan Hedberg <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-03-10Bluetooth: Add support for limited privacy modeJohan Hedberg4-11/+75
Introduce a limited privacy mode indicated by value 0x02 to the mgmt Set Privacy command. With value 0x02 the kernel will use privacy mode with a resolvable private address. In case the controller is bondable and discoverable the identity address will be used. Signed-off-by: Johan Hedberg <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-03-10Bluetooth: Fix adding discoverable to adv instance flagsJohan Hedberg1-0/+3
When lookup up the advertising instance flags for the default advertising instance (0) the discoverable flag should be filled in based on the HCI_DISCOVERABLE flag. Signed-off-by: Johan Hedberg <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-03-10Bluetooth: Move memset closer to where it's neededJohan Hedberg1-2/+2
Minor fix to not do the memset until the variable it clears is actually used. Signed-off-by: Johan Hedberg <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-03-106lowpan: iphc: fix SAM/DAM bit commentAlexander Aring1-2/+2
This patch fixes the comments for SAM/DAM value. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-03-106lowpan: debugfs: add missing staticAlexander Aring1-2/+2
This patch solves the sparse warning: net/6lowpan/debugfs.c:164:30: warning: symbol 'lowpan_ctx_pfx_fops' was not declared. Should it be static? net/6lowpan/debugfs.c:241:30: warning: symbol 'lowpan_context_fops' was not declared. Should it be static? Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-03-10Merge branch 'master' of git://blackhole.kfki.hu/nfPablo Neira Ayuso4-31/+32
Jozsef Kadlecsik says: ==================== Please apply the next patch against the nf tree: - Deniz Eren reported that parallel flush/dump of list:set type of sets can lead to kernel crash. The bug was due to non-RCU compatible flushing, listing in the set type, fixed by me. - Julia Lawall pointed out that IPSET_ATTR_ETHER netlink attribute length was not checked explicitly. The patch adds the missing checkings. ==================== Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-03-09packet: validate variable length ll headersWillem de Bruijn1-25/+18
Replace link layer header validation check ll_header_truncate with more generic dev_validate_header. Validation based on hard_header_len incorrectly drops valid packets in variable length protocols, such as AX25. dev_validate_header calls header_ops.validate for such protocols to ensure correctness below hard_header_len. See also http://comments.gmane.org/gmane.linux.network/401064 Fixes 9c7077622dd9 ("packet: make packet_snd fail on len smaller than l2 header") Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09ax25: add link layer header validation functionWillem de Bruijn1-0/+15
As variable length protocol, AX25 fails link layer header validation tests based on a minimum length. header_ops.validate allows protocols to validate headers that are shorter than hard_header_len. Implement this callback for AX25. See also http://comments.gmane.org/gmane.linux.network/401064 Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09kcm: Add receive message timeoutTom Herbert2-2/+36
This patch adds receive timeout for message assembly on the attached TCP sockets. The timeout is set when a new messages is started and the whole message has not been received by TCP (not in the receive queue). If the completely message is subsequently received the timer is cancelled, if the timer expires the RX side is aborted. The timeout value is taken from the socket timeout (SO_RCVTIMEO) that is set on a TCP socket (i.e. set by get sockopt before attaching a TCP socket to KCM. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09kcm: Add memory limit for receive message constructionTom Herbert2-2/+48
Message assembly is performed on the TCP socket. This is logically equivalent of an application that performs a peek on the socket to find out how much memory is needed for a receive buffer. The receive socket buffer also provides the maximum message size which is checked. The receive algorithm is something like: 1) Receive the first skbuf for a message (or skbufs if multiple are needed to determine message length). 2) Check the message length against the number of bytes in the TCP receive queue (tcp_inq()). - If all the bytes of the message are in the queue (incluing the skbuf received), then proceed with message assembly (it should complete with the tcp_read_sock) - Else, mark the psock with the number of bytes needed to complete the message. 3) In TCP data ready function, if the psock indicates that we are waiting for the rest of the bytes of a messages, check the number of queued bytes against that. - If there are still not enough bytes for the message, just return - Else, clear the waiting bytes and proceed to receive the skbufs. The message should now be received in one tcp_read_sock Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09kcm: Sendpage supportTom Herbert1-2/+145
Implement kcm_sendpage. Set in sendpage to kcm_sendpage in both dgram and seqpacket ops. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09kcm: Splice supportTom Herbert1-2/+96
Implement kcm_splice_read. This is supported only for seqpacket. Add kcm_seqpacket_ops and set splice read to kcm_splice_read. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09kcm: Add statistics and proc interfacesTom Herbert3-1/+503
This patch adds various counters for KCM. These include counters for messages and bytes received or sent, as well as counters for number of attached/unattached TCP sockets and other error or edge events. The statistics are exposed via a proc interface. /proc/net/kcm provides statistics per KCM socket and per psock (attached TCP sockets). /proc/net/kcm_stats provides aggregate statistics. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09kcm: Kernel Connection Multiplexor moduleTom Herbert5-0/+2031
This module implements the Kernel Connection Multiplexor. Kernel Connection Multiplexor (KCM) is a facility that provides a message based interface over TCP for generic application protocols. With KCM an application can efficiently send and receive application protocol messages over TCP using datagram sockets. For more information see the included Documentation/networking/kcm.txt Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09tcp: Add tcp_inq to get available receive bytes on socketTom Herbert1-14/+1
Create a common kernel function to get the number of bytes available on a TCP socket. This is based on code in INQ getsockopt and we now call the function for that getsockopt. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09net: Walk fragments in __skb_splice_bitsTom Herbert1-23/+16
Add walking of fragments in __skb_splice_bits. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09net: Add MSG_BATCH flagTom Herbert1-0/+5
Add a new msg flag called MSG_BATCH. This flag is used in sendmsg to indicate that more messages will follow (i.e. a batch of messages is being sent). This is similar to MSG_MORE except that the following messages are not merged into one packet, they are sent individually. sendmmsg is updated so that each contained message except for the last one is marked as MSG_BATCH. MSG_BATCH is a performance optimization in cases where a socket implementation can benefit by transmitting packets in a batch. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09net: Allow MSG_EOR in each msghdr of sendmmsgTom Herbert1-4/+6
This patch allows setting MSG_EOR in each individual msghdr passed in sendmmsg. This allows a sendmmsg to send multiple messages when using SOCK_SEQPACKET. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-09net: Make sock_alloc exportableTom Herbert1-1/+2
Export it for cases where we want to create sockets by hand. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-08ipv6: per netns FIB garbage collectionMichal Kubeček1-5/+4
One of our customers observed issues with FIB6 garbage collectors running in different network namespaces blocking each other, resulting in soft lockups (fib6_run_gc() initiated from timer runs always in forced mode). Now that FIB6 walkers are separated per namespace, there is no more need for instances of fib6_run_gc() in different namespaces blocking each other. There is still a call to icmp6_dst_gc() which operates on shared data but this function is protected by its own shared lock. Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-08ipv6: per netns fib6 walkersMichal Kubeček1-32/+36
The IPv6 FIB data structures are separated per network namespace but there is still only one global walkers list and one global walker list lock. This means changes in one namespace unnecessarily interfere with walkers in other namespaces. Replace the global list with per-netns lists (and give each its own lock). Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-08ipv6: replace global gc_args with local variableMichal Kubeček1-6/+8
Global variable gc_args is only used in fib6_run_gc() and functions called from it. As fib6_run_gc() makes sure there is at most one instance of fib6_clean_all() running at any moment, we can replace gc_args with a local variable which will be needed once multiple instances (per netns) of garbage collector are allowed. Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-08sctp: fix copying more bytes than expected in sctp_add_bind_addrMarcelo Ricardo Leitner4-8/+14
Dmitry reported that sctp_add_bind_addr may read more bytes than expected in case the parameter is a IPv4 addr supplied by the user through calls such as sctp_bindx_add(), because it always copies sizeof(union sctp_addr) while the buffer may be just a struct sockaddr_in, which is smaller. This patch then fixes it by limiting the memcpy to the min between the union size and a (new parameter) provided addr size. Where possible this parameter still is the size of that union, except for reading from user-provided buffers, which then it accounts for protocol type. Reported-by: Dmitry Vyukov <[email protected]> Tested-by: Dmitry Vyukov <[email protected]> Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-08netfilter: ipset: Check IPSET_ATTR_ETHER netlink attribute lengthJozsef Kadlecsik2-1/+4
Julia Lawall pointed out that IPSET_ATTR_ETHER netlink attribute length was not checked explicitly, just for the maximum possible size. Malicious netlink clients could send shorter attribute and thus resulting a kernel read after the buffer. The patch adds the explicit length checkings. Reported-by: Julia Lawall <[email protected]> Signed-off-by: Jozsef Kadlecsik <[email protected]>
2016-03-08net_sched: dsmark: use qdisc_dequeue_peeked()Kyeong Yoo1-1/+1
This fix is for dsmark similar to commit 3557619f0f6f7496ed453d4825e249 ("net_sched: prio: use qdisc_dequeue_peeked") and makes use of qdisc_dequeue_peeked() instead of direct dequeue() call. First time, wrr peeks dsmark, which will then peek into sfq. sfq dequeues an skb and it's stored in sch->gso_skb. Next time, wrr tries to dequeue from dsmark, which will call sfq dequeue directly. This results skipping the previously peeked skb. So changed dsmark dequeue to call qdisc_dequeue_peeked() instead to use peeked skb if exists. Signed-off-by: Kyeong Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller24-320/+586
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Remove useless debug message when deleting IPVS service, from Yannick Brosseau. 2) Get rid of compilation warning when CONFIG_PROC_FS is unset in several spots of the IPVS code, from Arnd Bergmann. 3) Add prandom_u32 support to nft_meta, from Florian Westphal. 4) Remove unused variable in xt_osf, from Sudip Mukherjee. 5) Don't calculate IP checksum twice from netfilter ipv4 defrag hook since fixing af_packet defragmentation issues, from Joe Stringer. 6) On-demand hook registration for iptables from netns. Instead of registering the hooks for every available netns whenever we need one of the support tables, we register this on the specific netns that needs it, patchset from Florian Westphal. 7) Add missing port range selection to nf_tables masquerading support. BTW, just for the record, there is a typo in the description of 5f6c253ebe93b0 ("netfilter: bridge: register hooks only when bridge interface is added") that refers to the cluster match as deprecated, but it is actually the CLUSTERIP target (which registers hooks inconditionally) the one that is scheduled for removal. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-03-08bpf, vxlan, geneve, gre: fix usage of dst_cache on xmitDaniel Borkmann2-5/+7
The assumptions from commit 0c1d70af924b ("net: use dst_cache for vxlan device"), 468dfffcd762 ("geneve: add dst caching support") and 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage when ip_tunnel_info is used is unfortunately not always valid as assumed. While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre with different remote dsts, tos, etc, therefore they cannot be assumed as packet independent. Right now vxlan, geneve, gre would cache the dst for eBPF and every packet would reuse the same entry that was first created on the initial route lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have a different one. Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test in vxlan needs to be handeled differently in this context as it is currently inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable() helper is added for the three tunnel cases, which checks if we can use dst cache. Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device") Fixes: 468dfffcd762 ("geneve: add dst caching support") Fixes: 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Paolo Abeni <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-08bpf: support for access to tunnel optionsDaniel Borkmann1-6/+77
After eBPF being able to programmatically access/manage tunnel key meta data via commit d3aa45ce6b94 ("bpf: add helpers to access tunnel metadata") and more recently also for IPv6 through c6c33454072f ("bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key"), this work adds two complementary helpers to generically access their auxiliary tunnel options. Geneve and vxlan support this facility. For geneve, TLVs can be pushed, and for the vxlan case its GBP extension. I.e. setting tunnel key for geneve case only makes sense, if we can also read/write TLVs into it. In the GBP case, it provides the flexibility to easily map the group policy ID in combination with other helpers or maps. I chose to model this as two separate helpers, bpf_skb_{set,get}_tunnel_opt(), for a couple of reasons. bpf_skb_{set,get}_tunnel_key() is already rather complex by itself, and there may be cases for tunnel key backends where tunnel options are not always needed. If we would have integrated this into bpf_skb_{set,get}_tunnel_key() nevertheless, we are very limited with remaining helper arguments, so keeping compatibility on structs in case of passing in a flat buffer gets more cumbersome. Separating both also allows for more flexibility and future extensibility, f.e. options could be fed directly from a map, etc. Moreover, change geneve's xmit path to test only for info->options_len instead of TUNNEL_GENEVE_OPT flag. This makes it more consistent with vxlan's xmit path and allows for avoiding to specify a protocol flag in the API on xmit, so it can be protocol agnostic. Having info->options_len is enough information that is needed. Tested with vxlan and geneve. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-08bpf: allow to propagate df in bpf_skb_set_tunnel_keyDaniel Borkmann1-1/+5
Added by 9a628224a61b ("ip_tunnel: Add dont fragment flag."), allow to feed df flag into tunneling facilities (currently supported on TX by vxlan, geneve and gre) as a hint from eBPF's bpf_skb_set_tunnel_key() helper. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-08bpf: make helper function protos staticDaniel Borkmann1-9/+9
They are only used here, so there's no reason they should not be static. Only the vlan push/pop protos are used in the test_bpf suite. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>