aboutsummaryrefslogtreecommitdiff
path: root/net/openvswitch
AgeCommit message (Collapse)AuthorFilesLines
2020-08-03openvswitch: Prevent kernel-infoleak in ovs_ct_put_key()Peilin Ye1-18/+20
ovs_ct_put_key() is potentially copying uninitialized kernel stack memory into socket buffers, since the compiler may leave a 3-byte hole at the end of `struct ovs_key_ct_tuple_ipv4` and `struct ovs_key_ct_tuple_ipv6`. Fix it by initializing `orig` with memset(). Fixes: 9dd7f8907c37 ("openvswitch: Add original direction conntrack tuple to sw_flow_key.") Suggested-by: Dan Carpenter <[email protected]> Signed-off-by: Peilin Ye <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03net/sched: act_ct: fix miss set mru for ovs after defrag in act_ctwenxu1-0/+1
When openvswitch conntrack offload with act_ct action. Fragment packets defrag in the ingress tc act_ct action and miss the next chain. Then the packet pass to the openvswitch datapath without the mru. The over mtu packet will be dropped in output action in openvswitch for over mtu. "kernel: net2: dropped over-mtu packet: 1528 > 1500" This patch add mru in the tc_skb_ext for adefrag and miss next chain situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru to the qdisc_skb_cb when the packet defrag. And When the chain miss, The mru is set to tc_skb_ext which can be got by ovs datapath. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: wenxu <[email protected]> Reviewed-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-24net: openvswitch: fixes potential deadlock in dp cleanup codeEelco Chaudron2-14/+13
The previous patch introduced a deadlock, this patch fixes it by making sure the work is canceled without holding the global ovs lock. This is done by moving the reorder processing one layer up to the netns level. Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage") Reported-by: [email protected] Reported-by: [email protected] Reviewed-by: Paolo <[email protected]> Signed-off-by: Eelco Chaudron <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-17net: openvswitch: reorder masks array based on usageEelco Chaudron4-7/+207
This patch reorders the masks array every 4 seconds based on their usage count. This greatly reduces the masks per packet hit, and hence the overall performance. Especially in the OVS/OVN case for OpenShift. Here are some results from the OVS/OVN OpenShift test, which use 8 pods, each pod having 512 uperf connections, each connection sends a 64-byte request and gets a 1024-byte response (TCP). All uperf clients are on 1 worker node while all uperf servers are on the other worker node. Kernel without this patch : 7.71 Gbps Kernel with this patch applied: 14.52 Gbps We also run some tests to verify the rebalance activity does not lower the flow insertion rate, which does not. Signed-off-by: Eelco Chaudron <[email protected]> Tested-by: Andrew Theurer <[email protected]> Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-13net: openvswitch: kerneldoc fixesAndrew Lunn2-4/+5
Simple fixes which require no deep knowledge of the code. Cc: Pravin B Shelar <[email protected]> Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-06-24openvswitch: take into account de-fragmentation/gso_size in ↵Lorenzo Bianconi1-2/+7
execute_check_pkt_len ovs connection tracking module performs de-fragmentation on incoming fragmented traffic. Take info account if traffic has been de-fragmented in execute_check_pkt_len action otherwise we will perform the wrong nested action considering the original packet size. This issue typically occurs if ovs-vswitchd adds a rule in the pipeline that requires connection tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action. Moreover take into account GSO fragment size for GSO packet in execute_check_pkt_len routine Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len") Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-06-14treewide: replace '---help---' in Kconfig files with 'help'Masahiro Yamada1-4/+4
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <[email protected]>
2020-04-25net: openvswitch: use div_u64() for 64-by-32 divisionsTonghao Zhang1-1/+1
Compile the kernel for arm 32 platform, the build warning found. To fix that, should use div_u64() for divisions. | net/openvswitch/meter.c:396: undefined reference to `__udivdi3' [add more commit msg, change reported tag, and use div_u64 instead of do_div by Tonghao] Fixes: e57358873bb5d6ca ("net: openvswitch: use u64 for meter bucket") Reported-by: kbuild test robot <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Tonghao Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-25net: openvswitch: suitable access to the dp_metersTonghao Zhang1-3/+3
To fix the following sparse warning: | net/openvswitch/meter.c:109:38: sparse: sparse: incorrect type | in assignment (different address spaces) ... | net/openvswitch/meter.c:720:45: sparse: sparse: incorrect type | in argument 1 (different address spaces) ... Reported-by: kbuild test robot <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2-2/+5
Simple overlapping changes to linux/vermagic.h Signed-off-by: David S. Miller <[email protected]>
2020-04-23net: openvswitch: use u64 for meter bucketTonghao Zhang2-2/+2
When setting the meter rate to 4+Gbps, there is an overflow, the meters don't work as expected. Cc: Pravin B Shelar <[email protected]> Cc: Andy Zhou <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-23net: openvswitch: make EINVAL return value more obviousTonghao Zhang1-3/+2
Cc: Pravin B Shelar <[email protected]> Cc: Andy Zhou <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-23net: openvswitch: remove the unnecessary checkTonghao Zhang1-5/+4
Before invoking the ovs_meter_cmd_reply_stats, "meter" was checked, so don't check it agin in that function. Cc: Pravin B Shelar <[email protected]> Cc: Andy Zhou <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-23net: openvswitch: set max limitation to metersTonghao Zhang2-10/+49
Don't allow user to create meter unlimitedly, which may cause to consume a large amount of kernel memory. The max number supported is decided by physical memory and 20K meters as default. Cc: Pravin B Shelar <[email protected]> Cc: Andy Zhou <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-23net: openvswitch: expand the meters supported numberTonghao Zhang3-63/+195
In kernel datapath of Open vSwitch, there are only 1024 buckets of meter in one datapath. If installing more than 1024 (e.g. 8192) meters, it may lead to the performance drop. But in some case, for example, Open vSwitch used as edge gateway, there should be 20K at least, where meters used for IP address bandwidth limitation. [Open vSwitch userspace datapath has this issue too.] For more scalable meter, this patch use meter array instead of hash tables, and expand/shrink the array when necessary. So we can install more meters than before in the datapath. Introducing the struct *dp_meter_instance, it's easy to expand meter though changing the *ti point in the struct *dp_meter_table. Cc: Pravin B Shelar <[email protected]> Cc: Andy Zhou <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-20net: openvswitch: ovs_ct_exit to be done under ovs_lockTonghao Zhang2-2/+5
syzbot wrote: | ============================= | WARNING: suspicious RCU usage | 5.7.0-rc1+ #45 Not tainted | ----------------------------- | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!! | | other info that might help us debug this: | rcu_scheduler_active = 2, debug_locks = 1 | ... | | stack backtrace: | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 | Workqueue: netns cleanup_net | Call Trace: | ... | ovs_ct_exit | ovs_exit_net | ops_exit_list.isra.7 | cleanup_net | process_one_work | worker_thread To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add lockdep_ovsl_is_held as optional lockdep expression. Link: https://lore.kernel.org/lkml/[email protected] Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Cc: Pravin B Shelar <[email protected]> Cc: Yi-Hung Wei <[email protected]> Reported-by: [email protected] Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-02net: openvswitch: use hlist_for_each_entry_rcu instead of hlist_for_each_entryTonghao Zhang1-4/+6
The struct sw_flow is protected by RCU, when traversing them, use hlist_for_each_entry_rcu. Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Reviewed-by: Greg Rose <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-29net: Fix typo of SKB_SGO_CB_OFFSETCambda Zhu1-1/+1
The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the offset of the GSO in skb cb. This patch fixes the typo. Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation") Signed-off-by: Cambda Zhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-0/+1
Minor overlapping changes, nothing serious. Signed-off-by: David S. Miller <[email protected]>
2020-03-03openvswitch: add missing attribute validation for hashJakub Kicinski1-0/+1
Add missing attribute validation for OVS_PACKET_ATTR_HASH to the netlink policy. Fixes: bd1903b7c459 ("net: openvswitch: add hash info to upcall") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Greg Rose <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller5-15/+24
Conflict resolution of ice_virtchnl_pf.c based upon work by Stephen Rothwell. Signed-off-by: David S. Miller <[email protected]>
2020-02-20openvswitch: Distribute switch variables for initializationKees Cook1-8/+10
Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. net/openvswitch/flow_netlink.c: In function ‘validate_set’: net/openvswitch/flow_netlink.c:2711:29: warning: statement will never be executed [-Wswitch-unreachable] 2711 | const struct ovs_key_ipv4 *ipv4_key; | ^~~~~~~~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-18flow_table.c: Use built-in RCU list checkingMadhuparna Bhowmik1-2/+4
hlist_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-18datapath.c: Use built-in RCU list checkingMadhuparna Bhowmik1-3/+6
hlist_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-18vport.c: Use built-in RCU list checkingMadhuparna Bhowmik1-1/+2
hlist_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-18meter.c: Use built-in RCU list checkingMadhuparna Bhowmik1-1/+2
hlist_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-16openvswitch: add TTL decrement actionMatteo Croce2-0/+137
New action to decrement TTL instead of setting it to a fixed value. This action will decrement the TTL and, in case of expired TTL, drop it or execute an action passed via a nested attribute. The default TTL expired action is to drop the packet. Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively. Tested with a corresponding change in the userspace: # ovs-dpctl dump-flows in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1 in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},2 in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2 in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1 # ping -c1 192.168.0.2 -t 42 IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64 # ping -c1 192.168.0.2 -t 120 IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64 # ping -c1 192.168.0.2 -t 1 # Co-developed-by: Bindiya Kurle <[email protected]> Signed-off-by: Bindiya Kurle <[email protected]> Signed-off-by: Matteo Croce <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14net: openvswitch: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld1-7/+4
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-12-24openvswitch: New MPLS actions for layer 2 tunnellingMartin Varghese2-6/+58
The existing PUSH MPLS action inserts MPLS header between ethernet header and the IP header. Though this behaviour is fine for L3 VPN where an IP packet is encapsulated inside a MPLS tunnel, it does not suffice the L2 VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should encapsulate the ethernet packet. The new mpls action ADD_MPLS inserts MPLS header at the start of the packet or at the start of the l3 header depending on the value of l3 tunnel flag in the ADD_MPLS arguments. POP_MPLS action is extended to support ethertype 0x6558. Signed-off-by: Martin Varghese <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya2-3/+3
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <[email protected]> Link: https://lore.kernel.org/r/[email protected] Co-developed-by: Kees Cook <[email protected]> Signed-off-by: Kees Cook <[email protected]> Acked-by: David Miller <[email protected]> # for net
2019-12-04net: Fixed updating of ethertype in skb_mpls_push()Martin Varghese1-1/+2
The skb_mpls_push was not updating ethertype of an ethernet packet if the packet was originally received from a non ARPHRD_ETHER device. In the below OVS data path flow, since the device corresponding to port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does not update the ethertype of the packet even though the previous push_eth action had added an ethernet header to the packet. recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no), actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00), push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4 Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper") Signed-off-by: Martin Varghese <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-12-04openvswitch: support asymmetric conntrackAaron Conole1-0/+11
The openvswitch module shares a common conntrack and NAT infrastructure exposed via netfilter. It's possible that a packet needs both SNAT and DNAT manipulation, due to e.g. tuple collision. Netfilter can support this because it runs through the NAT table twice - once on ingress and again after egress. The openvswitch module doesn't have such capability. Like netfilter hook infrastructure, we should run through NAT twice to keep the symmetry. Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Aaron Conole <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-12-02Fixed updating of ethertype in function skb_mpls_popMartin Varghese1-1/+2
The skb_mpls_pop was not updating ethertype of an ethernet packet if the packet was originally received from a non ARPHRD_ETHER device. In the below OVS data path flow, since the device corresponding to port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update the ethertype of the packet even though the previous push_eth action had added an ethernet header to the packet. recirc_id(0),in_port(7),eth_type(0x8847), mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1), actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00), pop_mpls(eth_type=0x800),4 Fixes: ed246cee09b9 ("net: core: move pop MPLS functionality from OvS to core helper") Signed-off-by: Martin Varghese <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-12-01openvswitch: remove another BUG_ON()Paolo Abeni1-1/+5
If we can't build the flow del notification, we can simply delete the flow, no need to crash the kernel. Still keep a WARN_ON to preserve debuggability. Note: the BUG_ON() predates the Fixes tag, but this change can be applied only after the mentioned commit. v1 -> v2: - do not leak an skb on error Fixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical section.") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-12-01openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()Paolo Abeni1-1/+4
All the callers of ovs_flow_cmd_build_info() already deal with error return code correctly, so we can handle the error condition in a more gracefull way. Still dump a warning to preserve debuggability. v1 -> v2: - clarify the commit message - clean the skb and report the error (DaveM) Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-26openvswitch: fix flow command message sizePaolo Abeni1-1/+5
When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant flow has no UFID, we can exceed the computed size, as ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY attribute. Take the above in account when computing the flow command message size. Fixes: 74ed7ab9264c ("openvswitch: Add support for unique flow IDs.") Reported-by: Qi Jun Ding <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-15net: openvswitch: don't call pad_packet if not necessaryTonghao Zhang1-14/+8
The nla_put_u16/nla_put_u32 makes sure that *attrlen is align. The call tree is that: nla_put_u16/nla_put_u32 -> nla_put attrlen = sizeof(u16) or sizeof(u32) -> __nla_put attrlen -> __nla_reserve attrlen -> skb_put(skb, nla_total_size(attrlen)) nla_total_size returns the total length of attribute including padding. Cc: Joe Stringer <[email protected]> Cc: William Tu <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-14net: openvswitch: add hash info to upcallTonghao Zhang2-1/+37
When using the kernel datapath, the upcall don't include skb hash info relatived. That will introduce some problem, because the hash of skb is important in kernel stack. For example, VXLAN module uses it to select UDP src port. The tx queue selection may also use the hash in stack. Hash is computed in different ways. Hash is random for a TCP socket, and hash may be computed in hardware, or software stack. Recalculation hash is not easy. Hash of TCP socket is computed: tcp_v4_connect -> sk_set_txhash (is random) __tcp_transmit_skb -> skb_set_hash_from_sk There will be one upcall, without information of skb hash, to ovs-vswitchd, for the first packet of a TCP session. The rest packets will be processed in Open vSwitch modules, hash kept. If this tcp session is forward to VXLAN module, then the UDP src port of first tcp packet is different from rest packets. TCP packets may come from the host or dockers, to Open vSwitch. To fix it, we store the hash info to upcall, and restore hash when packets sent back. +---------------+ +-------------------------+ | Docker/VMs | | ovs-vswitchd | +----+----------+ +-+--------------------+--+ | ^ | | | | | | upcall v restore packet hash (not recalculate) | +-+--------------------+--+ | tap netdev | | vxlan module +---------------> +--> Open vSwitch ko +--> or internal type | | +-------------------------+ Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-06net: openvswitch: select vport upcall portid directlyTonghao Zhang1-2/+3
The commit 69c51582ff786 ("dpif-netlink: don't allocate per thread netlink sockets"), in Open vSwitch ovs-vswitchd, has changed the number of allocated sockets to just one per port by moving the socket array from a per handler structure to a per datapath one. In the kernel datapath, a vport will have only one socket in most case, if so select it directly in fast-path. Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05Change in Openvswitch to support MPLS label depth of 3 in ingress directionMartin Varghese4-33/+85
The openvswitch was supporting a MPLS label depth of 1 in the ingress direction though the userspace OVS supports a max depth of 3 labels. This change enables openvswitch module to support a max depth of 3 labels in the ingress. Signed-off-by: Martin Varghese <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: simplify the ovs_dp_cmd_newTonghao Zhang1-22/+38
use the specified functions to init resource. Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: don't unlock mutex when changing the user_features failsTonghao Zhang1-1/+1
Unlocking of a not locked mutex is not allowed. Other kernel thread may be in critical section while we unlock it because of setting user_feature fail. Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index") Cc: Paul Blakey <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Acked-by: William Tu <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: fix possible memleak on destroy flow-tableTonghao Zhang1-88/+98
When we destroy the flow tables which may contain the flow_mask, so release the flow mask struct. Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: add likely in flow_lookupTonghao Zhang1-2/+2
The most case *index < ma->max, and flow-mask is not NULL. We add un/likely for performance. Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Acked-by: William Tu <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: simplify the flow_hashTonghao Zhang1-5/+2
Simplify the code and remove the unnecessary BUILD_BUG_ON. Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Acked-by: William Tu <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: optimize flow-mask looking upTonghao Zhang1-51/+53
The full looking up on flow table traverses all mask array. If mask-array is too large, the number of invalid flow-mask increase, performance will be drop. One bad case, for example: M means flow-mask is valid and NULL of flow-mask means deleted. +-------------------------------------------+ | M | NULL | ... | NULL | M| +-------------------------------------------+ In that case, without this patch, openvswitch will traverses all mask array, because there will be one flow-mask in the tail. This patch changes the way of flow-mask inserting and deleting, and the mask array will be keep as below: there is not a NULL hole. In the fast path, we can "break" "for" (not "continue") in flow_lookup when we get a NULL flow-mask. "break" v +-------------------------------------------+ | M | M | NULL |... | NULL | NULL| +-------------------------------------------+ This patch don't optimize slow or control path, still using ma->max to traverse. Slow path: * tbl_mask_array_realloc * ovs_flow_tbl_lookup_exact * flow_mask_find Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: optimize flow mask cache hash collisionTonghao Zhang1-42/+53
Port the codes to linux upstream and with little changes. Pravin B Shelar, says: | In case hash collision on mask cache, OVS does extra flow | lookup. Following patch avoid it. Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1 Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: shrink the mask array if necessaryTonghao Zhang1-10/+23
When creating and inserting flow-mask, if there is no available flow-mask, we realloc the mask array. When removing flow-mask, if necessary, we shrink mask array. Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Acked-by: William Tu <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: convert mask list in mask arrayTonghao Zhang3-51/+167
Port the codes to linux upstream and with little changes. Pravin B Shelar, says: | mask caches index of mask in mask_list. On packet recv OVS | need to traverse mask-list to get cached mask. Therefore array | is better for retrieving cached mask. This also allows better | cache replacement algorithm by directly checking mask's existence. Link: https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5 Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Acked-by: William Tu <[email protected]> Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-03net: openvswitch: add flow-mask cache for performanceTonghao Zhang3-16/+107
The idea of this optimization comes from a patch which is committed in 2014, openvswitch community. The author is Pravin B Shelar. In order to get high performance, I implement it again. Later patches will use it. Pravin B Shelar, says: | On every packet OVS needs to lookup flow-table with every | mask until it finds a match. The packet flow-key is first | masked with mask in the list and then the masked key is | looked up in flow-table. Therefore number of masks can | affect packet processing performance. Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5 Signed-off-by: Tonghao Zhang <[email protected]> Tested-by: Greg Rose <[email protected]> Acked-by: William Tu <[email protected]> Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>