aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2013-08-30ipv6: Remove redundant sk variableThomas Graf1-1/+0
A sk variable initialized to ndisc_sk is already available outside of the branch. Signed-off-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-30tcp: do not use cached RTT for RTT estimationYuchung Cheng1-33/+11
RTT cached in the TCP metrics are valuable for the initial timeout because SYN RTT usually does not account for serialization delays on low BW path. However using it to seed the RTT estimator maybe disruptive because other components (e.g., pacing) require the smooth RTT to be obtained from actual connection. The solution is to use the higher cached RTT to set the first RTO conservatively like tcp_rtt_estimator(), but avoid seeding the other RTT estimator variables such as srtt. It is also a good idea to keep RTO conservative to obtain the first RTT sample, and the performance is insured by TCP loss probe if SYN RTT is available. To keep the seeding formula consistent across SYN RTT and cached RTT, the rttvar is twice the cached RTT instead of cached RTTVAR value. The reason is because cached variation may be too small (near min RTO) which defeats the purpose of being conservative on first RTO. However the metrics still keep the RTT variations as they might be useful for user applications (through ip). Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Tested-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-30pkt_sched: fq: prefetch() fixEric Dumazet1-1/+2
kbuild bot reported following m68k build error : net/sched/sch_fq.c: In function 'fq_dequeue': >> net/sched/sch_fq.c:491:2: error: implicit declaration of function 'prefetch' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors While we are fixing this, move this prefetch() call a bit earlier. Reported-by: Wu Fengguang <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29pkt_sched: fq: Fair Queue packet schedulerEric Dumazet3-0/+807
- Uses perfect flow match (not stochastic hash like SFQ/FQ_codel) - Uses the new_flow/old_flow separation from FQ_codel - New flows get an initial credit allowing IW10 without added delay. - Special FIFO queue for high prio packets (no need for PRIO + FQ) - Uses a hash table of RB trees to locate the flows at enqueue() time - Smart on demand gc (at enqueue() time, RB tree lookup evicts old unused flows) - Dynamic memory allocations. - Designed to allow millions of concurrent flows per Qdisc. - Small memory footprint : ~8K per Qdisc, and 104 bytes per flow. - Single high resolution timer for throttled flows (if any). - One RB tree to link throttled flows. - Ability to have a max rate per flow. We might add a socket option to add per socket limitation. Attempts have been made to add TCP pacing in TCP stack, but this seems to add complex code to an already complex stack. TCP pacing is welcomed for flows having idle times, as the cwnd permits TCP stack to queue a possibly large number of packets. This removes the 'slow start after idle' choice, hitting badly large BDP flows, and applications delivering chunks of data as video streams. Nicely spaced packets : Here interface is 10Gbit, but flow bottleneck is ~20Mbit cwin is big, yet FQ avoids the typical bursts generated by TCP (as in netperf TCP_RR -- -r 100000,100000) 15:01:23.545279 IP A > B: . 78193:81089(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.545394 IP B > A: . ack 81089 win 3668 <nop,nop,timestamp 11597985 1115> 15:01:23.546488 IP A > B: . 81089:83985(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.546565 IP B > A: . ack 83985 win 3668 <nop,nop,timestamp 11597986 1115> 15:01:23.547713 IP A > B: . 83985:86881(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.547778 IP B > A: . ack 86881 win 3668 <nop,nop,timestamp 11597987 1115> 15:01:23.548911 IP A > B: . 86881:89777(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.548949 IP B > A: . ack 89777 win 3668 <nop,nop,timestamp 11597988 1115> 15:01:23.550116 IP A > B: . 89777:92673(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.550182 IP B > A: . ack 92673 win 3668 <nop,nop,timestamp 11597989 1115> 15:01:23.551333 IP A > B: . 92673:95569(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.551406 IP B > A: . ack 95569 win 3668 <nop,nop,timestamp 11597991 1115> 15:01:23.552539 IP A > B: . 95569:98465(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.552576 IP B > A: . ack 98465 win 3668 <nop,nop,timestamp 11597992 1115> 15:01:23.553756 IP A > B: . 98465:99913(1448) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554138 IP A > B: P 99913:100001(88) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554204 IP B > A: . ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.554234 IP B > A: . 65248:68144(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.555620 IP B > A: . 68144:71040(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.557005 IP B > A: . 71040:73936(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.558390 IP B > A: . 73936:76832(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.559773 IP B > A: . 76832:79728(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.561158 IP B > A: . 79728:82624(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.562543 IP B > A: . 82624:85520(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.563928 IP B > A: . 85520:88416(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.565313 IP B > A: . 88416:91312(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.566698 IP B > A: . 91312:94208(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.568083 IP B > A: . 94208:97104(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.569467 IP B > A: . 97104:100000(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.570852 IP B > A: . 100000:102896(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.572237 IP B > A: . 102896:105792(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.573639 IP B > A: . 105792:108688(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.575024 IP B > A: . 108688:111584(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.576408 IP B > A: . 111584:114480(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.577793 IP B > A: . 114480:117376(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> TCP timestamps show that most packets from B were queued in the same ms timeframe (TSval 1159799{3,4}), but FQ managed to send them right in time to avoid a big burst. In slow start or steady state, very few packets are throttled [1] FQ gets a bunch of tunables as : limit : max number of packets on whole Qdisc (default 10000) flow_limit : max number of packets per flow (default 100) quantum : the credit per RR round (default is 2 MTU) initial_quantum : initial credit for new flows (default is 10 MTU) maxrate : max per flow rate (default : unlimited) buckets : number of RB trees (default : 1024) in hash table. (consumes 8 bytes per bucket) [no]pacing : disable/enable pacing (default is enable) All of them can be changed on a live qdisc. $ tc qd add dev eth0 root fq help Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ] [ quantum BYTES ] [ initial_quantum BYTES ] [ maxrate RATE ] [ buckets NUMBER ] [ [no]pacing ] $ tc -s -d qd qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 256 quantum 3028 initial_quantum 15140 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14) backlog 0b 0p requeues 14 511 flows, 511 inactive, 0 throttled 110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit [1] Except if initial srtt is overestimated, as if using cached srtt in tcp metrics. We'll provide a fix for this issue. Signed-off-by: Eric Dumazet <[email protected]> Cc: Yuchung Cheng <[email protected]> Cc: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29can: gw: add a per rule limitation of frame hopsOliver Hartkopp1-4/+31
Usually the received CAN frames can be processed/routed as much as 'max_hops' times (which is given at module load time of the can-gw module). Introduce a new configuration option to reduce the number of possible hops for a specific gateway rule to a value smaller then max_hops. Signed-off-by: Oliver Hartkopp <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2013-08-29net: packet: use reciprocal_divide in fanout_demux_hashDaniel Borkmann1-1/+1
Instead of hard-coding reciprocal_divide function, use the inline function from reciprocal_div.h. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29net: packet: add randomized fanout schedulerDaniel Borkmann1-1/+12
We currently allow for different fanout scheduling policies in pf_packet such as scheduling by skb's rxhash, round-robin, by cpu, and rollover. Also allow for a random, equidistributed selection of the socket from the fanout process group. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29net: add netdev_upper_get_next_dev_rcu(dev, iter)Veaceslav Falico1-0/+25
This function returns the next dev in the dev->upper_dev_list after the struct list_head **iter position, and updates *iter accordingly. Returns NULL if there are no devices left. Caller must hold RCU read lock. CC: "David S. Miller" <[email protected]> CC: Eric Dumazet <[email protected]> CC: Jiri Pirko <[email protected]> CC: Alexander Duyck <[email protected]> CC: Cong Wang <[email protected]> Signed-off-by: Veaceslav Falico <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29net: remove search_list from netdev_adjacentVeaceslav Falico1-36/+1
We already don't need it cause we see every upper/lower device in the list already. CC: "David S. Miller" <[email protected]> CC: Eric Dumazet <[email protected]> CC: Jiri Pirko <[email protected]> CC: Alexander Duyck <[email protected]> CC: Cong Wang <[email protected]> Signed-off-by: Veaceslav Falico <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29net: add lower_dev_list to net_device and make a full meshVeaceslav Falico1-27/+258
This patch adds lower_dev_list list_head to net_device, which is the same as upper_dev_list, only for lower devices, and begins to use it in the same way as the upper list. It also changes the way the whole adjacent device lists work - now they contain *all* of upper/lower devices, not only the first level. The first level devices are distinguished by the bool neighbour field in netdev_adjacent, also added by this patch. There are cases when a device can be added several times to the adjacent list, the simplest would be: /---- eth0.10 ---\ eth0- --- bond0 \---- eth0.20 ---/ where both bond0 and eth0 'see' each other in the adjacent lists two times. To avoid duplication of netdev_adjacent structures ref_nr is being kept as the number of times the device was added to the list. The 'full view' is achieved by adding, on link creation, all of the upper_dev's upper_dev_list devices as upper devices to all of the lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice versa. On unlink they are removed using the same logic. I've tested it with thousands vlans/bonds/bridges, everything works ok and no observable lags even on a huge number of interfaces. Memory footprint for 128 devices interconnected with each other via both upper and lower (which is impossible, but for the comparison) lists would be: 128*128*2*sizeof(netdev_adjacent) = 1.5MB but in the real world we usualy have at most several devices with slaves and a lot of vlans, so the footprint will be much lower. CC: "David S. Miller" <[email protected]> CC: Eric Dumazet <[email protected]> CC: Jiri Pirko <[email protected]> CC: Alexander Duyck <[email protected]> CC: Cong Wang <[email protected]> Signed-off-by: Veaceslav Falico <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29net: rename netdev_upper to netdev_adjacentVeaceslav Falico1-12/+12
Rename the structure to reflect the upcoming addition of lower_dev_list. CC: "David S. Miller" <[email protected]> CC: Eric Dumazet <[email protected]> CC: Jiri Pirko <[email protected]> CC: Alexander Duyck <[email protected]> CC: Cong Wang <[email protected]> Signed-off-by: Veaceslav Falico <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29net: sctp: sctp_verify_init: clean up mandatory checks and add commentDaniel Borkmann1-14/+12
Add a comment related to RFC4960 explaning why we do not check for initial TSN, and while at it, remove yoda notation checks and clean up code from checks of mandatory conditions. That's probably just really minor, but makes reviewing easier. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Neil Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29tcp: TSO packets automatic sizingEric Dumazet4-7/+65
After hearing many people over past years complaining against TSO being bursty or even buggy, we are proud to present automatic sizing of TSO packets. One part of the problem is that tcp_tso_should_defer() uses an heuristic relying on upcoming ACKS instead of a timer, but more generally, having big TSO packets makes little sense for low rates, as it tends to create micro bursts on the network, and general consensus is to reduce the buffering amount. This patch introduces a per socket sk_pacing_rate, that approximates the current sending rate, and allows us to size the TSO packets so that we try to send one packet every ms. This field could be set by other transports. Patch has no impact for high speed flows, where having large TSO packets makes sense to reach line rate. For other flows, this helps better packet scheduling and ACK clocking. This patch increases performance of TCP flows in lossy environments. A new sysctl (tcp_min_tso_segs) is added, to specify the minimal size of a TSO packet (default being 2). A follow-up patch will provide a new packet scheduler (FQ), using sk_pacing_rate as an input to perform optional per flow pacing. This explains why we chose to set sk_pacing_rate to twice the current rate, allowing 'slow start' ramp up. sk_pacing_rate = 2 * cwnd * mss / srtt v2: Neal Cardwell reported a suspect deferring of last two segments on initial write of 10 MSS, I had to change tcp_tso_should_defer() to take into account tp->xmit_size_goal_segs Signed-off-by: Eric Dumazet <[email protected]> Cc: Neal Cardwell <[email protected]> Cc: Yuchung Cheng <[email protected]> Cc: Van Jacobson <[email protected]> Cc: Tom Herbert <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29ipv6: drop fragmented ndisc packets by default (RFC 6980)Hannes Frederic Sowa2-0/+27
This patch implements RFC6980: Drop fragmented ndisc packets by default. If a fragmented ndisc packet is received the user is informed that it is possible to disable the check. Cc: Fernando Gont <[email protected]> Cc: YOSHIFUJI Hideaki <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29bridge: inherit slave devices needed_headroomFlorian Fainelli1-0/+3
Some slave devices may have set a dev->needed_headroom value which is different than the default one, most likely in order to prepend a hardware descriptor in front of the Ethernet frame to send. Whenever a new slave is added to a bridge, ensure that we update the needed_headroom value accordingly to account for the slave needed_headroom value. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29Merge branch 'master' of ↵John W. Linville30-583/+1046
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c
2013-08-28Merge branch 'for-john' of ↵John W. Linville9-22/+27
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
2013-08-28Merge branch 'for-john' of ↵John W. Linville1-8/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2013-08-28Merge branch 'master' of ↵John W. Linville4-21/+48
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c net/mac80211/ibss.c
2013-08-28batman-adv: send GW_DEL event when the gw client mode is deselectedAntonio Quartulli3-0/+32
Whenever the GW client mode is deselected, a DEL event has to be sent in order to tell userspace that the current gateway has been lost. Send the uevent on state change only if a gateway was currently selected. Reported-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2013-08-28batman-adv: Start new development cycleSimon Wunderlich1-1/+1
Signed-off-by: Simon Wunderlich <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2013-08-28batman-adv: move enum definition at the top of the fileAntonio Quartulli1-15/+16
Signed-off-by: Antonio Quartulli <[email protected]>
2013-08-28batman-adv: set skb priority according to contentSimon Wunderlich10-2/+91
The skb priority field may help the wireless driver to choose the right queue (e.g. WMM queues). This should be set in batman-adv, as this information is only available here. This patch adds support for IPv4/IPv6 DS fields and VLAN PCP. Note that only VLAN PCP is used if a VLAN header is present. Also initially set TC_PRIO_CONTROL only for self-generated packets, and keep the priority set by higher layers. Signed-off-by: Simon Wunderlich <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2013-08-27Merge branch 'master' of ↵David S. Miller10-542/+1291
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A number of significant new features and optimizations for net-next/3.12. Highlights are: * "Megaflows", an optimization that allows userspace to specify which flow fields were used to compute the results of the flow lookup. This allows for a major reduction in flow setups (the major performance bottleneck in Open vSwitch) without reducing flexibility. * Converting netlink dump operations to use RCU, allowing for additional parallelism in userspace. * Matching and modifying SCTP protocol fields. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-08-28netfilter: ctnetlink: fix uninitialized variableFlorian Westphal1-1/+1
net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_nfqueue_attach_expect': 'helper' may be used uninitialized in this function It was only initialized in if CTA_EXPECT_HELP_NAME attribute was present, it must be NULL otherwise. Problem added recently in bd077937 (netfilter: nfnetlink_queue: allow to attach expectations to conntracks). Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28netfilter: add IPv6 SYNPROXY targetPatrick McHardy3-0/+509
Add an IPv6 version of the SYNPROXY target. The main differences to the IPv4 version is routing and IP header construction. Signed-off-by: Patrick McHardy <[email protected]> Tested-by: Martin Topholm <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28net: syncookies: export cookie_v6_init_sequence/cookie_v6_checkPatrick McHardy1-9/+16
Extract the local TCP stack independant parts of tcp_v6_init_sequence() and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY target. Signed-off-by: Patrick McHardy <[email protected]> Acked-by: David S. Miller <[email protected]> Tested-by: Martin Topholm <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28netfilter: add SYNPROXY core/targetPatrick McHardy9-0/+966
Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy core with common functions and an address family specific target. The SYNPROXY receives the connection request from the client, responds with a SYN/ACK containing a SYN cookie and announcing a zero window and checks whether the final ACK from the client contains a valid cookie. It then establishes a connection to the original destination and, if successful, sends a window update to the client with the window size announced by the server. Support for timestamps, SACK, window scaling and MSS options can be statically configured as target parameters if the features of the server are known. If timestamps are used, the timestamp value sent back to the client in the SYN/ACK will be different from the real timestamp of the server. In order to now break PAWS, the timestamps are translated in the direction server->client. Signed-off-by: Patrick McHardy <[email protected]> Tested-by: Martin Topholm <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28net: syncookies: export cookie_v4_init_sequence/cookie_v4_checkPatrick McHardy1-11/+18
Extract the local TCP stack independant parts of tcp_v4_init_sequence() and cookie_v4_check() and export them for use by the upcoming SYNPROXY target. Signed-off-by: Patrick McHardy <[email protected]> Acked-by: David S. Miller <[email protected]> Tested-by: Martin Topholm <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28netfilter: nf_conntrack: make sequence number adjustments usuable without NATPatrick McHardy11-338/+300
Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a seperate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <[email protected]> Tested-by: Martin Topholm <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28netfilter: nf_defrag_ipv6.o included twiceNathan Hintz1-1/+1
'nf_defrag_ipv6' is built as a separate module; it shouldn't be included in the 'nf_conntrack_ipv6' module as well. Signed-off-by: Nathan Hintz <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28netfilter: ip[6]t_REJECT: tcp-reset using wrong MAC source if bridgedPhil Oester2-2/+39
As reported by Casper Gripenberg, in a bridged setup, using ip[6]t_REJECT with the tcp-reset option sends out reset packets with the src MAC address of the local bridge interface, instead of the MAC address of the intended destination. This causes some routers/firewalls to drop the reset packet as it appears to be spoofed. Fix this by bypassing ip[6]_local_out and setting the MAC of the sender in the tcp reset packet. This closes netfilter bugzilla #531. Signed-off-by: Phil Oester <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-27openvswitch: optimize flow compare and mask functionsAndy Zhou2-39/+44
Make sure the sw_flow_key structure and valid mask boundaries are always machine word aligned. Optimize the flow compare and mask operations using machine word size operations. This patch improves throughput on average by 15% when CPU is the bottleneck of forwarding packets. This patch is inspired by ideas and code from a patch submitted by Peter Klausler titled "replace memcmp() with specialized comparator". However, The original patch only optimizes for architectures support unaligned machine word access. This patch optimizes for all architectures. Signed-off-by: Andy Zhou <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-27net: tcp_probe: allow more advanced ingress filtering by markDaniel Borkmann1-4/+11
Currently, the tcp_probe snooper can either filter packets by a given port (handed to the module via module parameter e.g. port=80) or lets all TCP traffic pass (port=0, default). When a port is specified, the port number is tested against the sk's source/destination port. Thus, if one of them matches, the information will be further processed for the log. As this is quite limited, allow for more advanced filtering possibilities which can facilitate debugging/analysis with the help of the tcp_probe snooper. Therefore, similarly as added to BPF machine in commit 7e75f93e ("pkt_sched: ingress socket filter by mark"), add the possibility to use skb->mark as a filter. If the mark is not being used otherwise, this allows ingress filtering by flow (e.g. in order to track updates from only a single flow, or a subset of all flows for a given port) and other things such as dynamic logging and reconfiguration without removing/re-inserting the tcp_probe module, etc. Simple example: insmod net/ipv4/tcp_probe.ko fwmark=8888 full=1 ... iptables -A INPUT -i eth4 -t mangle -p tcp --dport 22 \ --sport 60952 -j MARK --set-mark 8888 [... sampling interval ...] iptables -D INPUT -i eth4 -t mangle -p tcp --dport 22 \ --sport 60952 -j MARK --set-mark 8888 The current option to filter by a given port is still being preserved. A similar approach could be done for the sctp_probe module as a follow-up. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-26openvswitch: Rename key_len to key_endAndy Zhou2-16/+17
Key_end is a better name describing the ending boundary than key_len. Rename those variables to make it less confusing. Signed-off-by: Andy Zhou <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-26openvswitch: Add SCTP supportJoe Stringer5-4/+115
This patch adds support for rewriting SCTP src,dst ports similar to the functionality already available for TCP/UDP. Rewriting SCTP ports is expensive due to double-recalculation of the SCTP checksums; this is performed to ensure that packets traversing OVS with invalid checksums will continue to the destination with any checksum corruption intact. Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Joe Stringer <[email protected]> Signed-off-by: Ben Pfaff <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller13-43/+73
Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <[email protected]>
2013-08-26mac80211: fix change_interface queue assignmentsJohannes Berg1-8/+11
Jouni reported that with mac80211_hwsim, multicast TX was causing crashes due to invalid vif->cab_queue assignment. It turns out that this is caused by change_interface() getting invoked and not having the vif->type/vif->p2p assigned correctly before calling the queue check (ieee80211_check_queues). Fix this by passing the 'external' interface type to the function and adjusting it accordingly. While at it, also fix the error path in change_interface, it wasn't correctly resetting to the external type but using the internal one instead. Fortunately this affects on hwsim because all other drivers set the vif->type/vif->p2p variables when changing iftype. This shouldn't be needed, but almost all implementations actually do it for their own internal handling. Reported-by: Jouni Malinen <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2013-08-25ipip: potential race in ip_tunnel_init_net()Dan Carpenter1-6/+4
Eric Dumazet says that my previous fix for an ERR_PTR dereference (ea857f28ab 'ipip: dereferencing an ERR_PTR in ip_tunnel_init_net()') could be racy and suggests the following fix instead. Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-23openvswitch: Mega flow implementationAndy Zhou5-512/+1123
Add wildcarded flow support in kernel datapath. Wildcarded flow can improve OVS flow set up performance by avoid sending matching new flows to the user space program. The exact performance boost will largely dependent on wildcarded flow hit rate. In case all new flows hits wildcard flows, the flow set up rate is within 5% of that of linux bridge module. Pravin has made significant contributions to this patch. Including API clean ups and bug fixes. Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: Andy Zhou <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-23openvswitch: check CONFIG_OPENVSWITCH_GRE in makefileCong Wang2-4/+4
Cc: Jesse Gross <[email protected]> Cc: Pravin B Shelar <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-23openvswitch: Fix argument descriptions in vport.c.Justin Pettit1-1/+2
Signed-off-by: Justin Pettit <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-23openvswitch:: link upper device for port devicesJiri Pirko1-1/+19
Link upper device properly. That will make IFLA_MASTER filled up. Set the master to port 0 of the datapath under which the port belongs. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-23openvswitch: Use non rcu hlist_del() flow table entry.Pravin B Shelar1-1/+1
Flow table destroy is done in rcu call-back context. Therefore there is no need to use rcu variant of hlist_del(). Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-23openvswitch: Use RCU lock for dp dump operation.Pravin B Shelar1-6/+7
RCUfy dp-dump operation which is already read-only. This makes all ovs dump operations lockless. Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-23openvswitch: Use RCU lock for flow dump operation.Pravin B Shelar1-8/+9
Flow dump operation is read-only operation. There is no need to take ovs-lock. Following patch use rcu-lock for dumping flows. Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-08-23mac80211: ignore (E)CSA in probe response framesJohannes Berg1-8/+3
Seth reports that some APs, notably the Netgear WNDAP360, send invalid ECSA IEs in probe response frames with the operating class and channel number both set to zero, even when no channel switch is being done. As a result, any scan while connected to such an AP results in the connection being dropped. Fix this by ignoring any channel switch announcment in probe response frames entirely, since we're connected to the AP we will be receiving a beacon (and maybe even an action frame) if a channel switch is done, which is sufficient. Cc: [email protected] # 3.10 Reported-by: Seth Forshee <[email protected]> Tested-by: Seth Forshee <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2013-08-23cfg80211: add flags to cfg80211_rx_mgmt()Vladimir Kondratiev4-7/+8
Add flags intended to report various auxiliary information and introduce the NL80211_RXMGMT_FLAG_ANSWERED flag to report that the frame was already answered by the device. Signed-off-by: Vladimir Kondratiev <[email protected]> [REPLIED->ANSWERED, reword commit message] Signed-off-by: Johannes Berg <[email protected]>
2013-08-23mac80211: assign seqnums for group QoS framesBob Copeland1-2/+4
According to 802.11-2012 9.3.2.10, paragraph 4, QoS data frames with a group address in the Address 1 field have sequence numbers allocated from the same counter as non-QoS data and management frames. Without this flag, some drivers may not assign sequence numbers, and in rare cases frames might get dropped. Set the control flag accordingly. Signed-off-by: Bob Copeland <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2013-08-23mac80211: only respond to probe request with mesh IDChun-Yeow Yeoh1-0/+3
Previously, the mesh STA responds to probe request from legacy STA but now it will only respond to legacy STA if the legacy STA does include the specific mesh ID or wildcard mesh ID in the probe request. The iw patch "iw: scan using meshid" can be used either by legacy STA or by mesh STA to do active scanning by inserting the mesh ID in the probe request frame. Signed-off-by: Chun-Yeow Yeoh <[email protected]> Acked-by: Thomas Pedersen <[email protected]> Acked-by: Javier Cardona <[email protected]> Signed-off-by: Johannes Berg <[email protected]>