Age | Commit message | Author | Files | Lines
2013-11-02 | epic100: replace printk with netdev_ calls | Ben Boeckel | 1 | -67/+58
Also snipes some whitespace errors. Signed-off-by: Ben Boeckel <mathstuf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-02 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next | David S. Miller | 6 | -28/+27
Jeff Kirsher says: ==================== This series contains updates to e1000, igb, ixgbe and ixgbevf. Hong Zhiguo provides a fix for e1000 where tx_ring and adapter->tx_ring are already of type "struct e1000_tx_ring *", so there is no need to divide by the e1000_tx_ring size in the idx calculation. Emil provides a fix for ixgbevf to remove a redundant workaround related to header split and a fix for ixgbe to resolve an issue where the MTA table can be cleared when the interface is reset while in promisc mode. Todd provides a fix for igb to prevent ethtool from writing to the iNVM in i210/i211 devices. This issue was reported by Marek Vasut <marex@denx.de>. Anton Blanchard provides a fix for ixgbe to reduce memory consumption with larger page sizes, seen on PPC. Don provides a cleanup in ixgbe to replace the IXGBE_DESC_UNUSED macro with the inline function ixgbe_desc_unused() to make the logic a bit more readable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-02 | Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next | David S. Miller | 2 | -2/+8
Ben Hutchings says: ==================== A single fix by Alexandre Rames for the recent changes to TSO. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-01 | ixgbe: fix inconsistent clearing of the multicast table | Emil Tantilov | 1 | -8/+7
This patch resolves an issue where the MTA table can be cleared when the interface is reset while in promisc mode. As a result, IPv6 traffic between VFs will be interrupted. This patch makes the update of the MTA table unconditional to avoid the inconsistent clearing on reset. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-11-01 | ixgbe: cleanup IXGBE_DESC_UNUSED | Don Skidmore | 2 | -8/+12
This patch replaces the IXGBE_DESC_UNUSED macro with a like-named inline function, ixgbe_desc_unused(). The inline function makes the logic a bit more readable. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-11-01 | ixgbe: Reduce memory consumption with larger page sizes | Anton Blanchard | 1 | -0/+4
The ixgbe driver allocates pages for its receive rings. It currently uses 512 pages, regardless of page size. During receive handling it adds the unused part of the page back into the rx ring, avoiding the need for a new allocation. On a ppc64 box with 64 threads and 64kB pages, we end up with 512 entries * 64 rx queues * 64kB = 2GB memory used. Even more of a concern is that we use up 2GB of IOMMU space in order to map all this memory. The driver makes a number of decisions based on if PAGE_SIZE is less than 8kB, so use this as the breakpoint and only allocate 128 entries on 8kB or larger page sizes. Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
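The arithmetic in this message can be checked with a short sketch. The helper name `rx_ring_bytes` is hypothetical; the entry counts (512 and 128), the 64 rx queues and the 64kB page size are taken from the message above.

```python
KIB = 1024

def rx_ring_bytes(entries: int, queues: int, page_size: int) -> int:
    """Memory held by the receive rings when every ring entry owns one page."""
    return entries * queues * page_size

# ppc64 example from the message: 64 rx queues, 64kB pages.
before = rx_ring_bytes(512, 64, 64 * KIB)   # 512 entries per ring
after = rx_ring_bytes(128, 64, 64 * KIB)    # 128 entries per ring

print(before // KIB**3, "GiB with 512 entries")
print(after // KIB**2, "MiB with 128 entries")
```

The same amount of IOMMU space is needed to map the pages, so cutting the entry count by four cuts both figures by four.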
2013-11-01 | igb: Don't let ethtool try to write to iNVM in i210/i211 | Fujinaka, Todd | 1 | -1/+3
Don't let ethtool try to write to iNVM in i210/i211. This fixes an issue seen by Marek Vasut. Reported-by: Marek Vasut <marex@denx.de> Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-11-01 | ixgbevf: remove redundant workaround | Emil Tantilov | 1 | -9/+0
This patch removes a workaround related to header split, which is redundant because the driver does not support splitting packet headers on Rx. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-11-01 | e1000: fix wrong queue idx calculation | Hong Zhiguo | 1 | -2/+1
tx_ring and adapter->tx_ring are already of type "struct e1000_tx_ring *", so their difference is already the queue index and there is no need to divide it by the size of struct e1000_tx_ring. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
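A small model of the pointer arithmetic may help here. It is a sketch only: Python has no typed pointers, so `typed_ptr_diff` imitates what C computes for `p - base` when both are `struct e1000_tx_ring *`, and `ELEM_SIZE` is a made-up structure size chosen for illustration.

```python
ELEM_SIZE = 48  # hypothetical sizeof(struct e1000_tx_ring); not the real value

def typed_ptr_diff(addr: int, base: int, elem_size: int = ELEM_SIZE) -> int:
    """Model of C pointer subtraction: the result is counted in elements."""
    return (addr - base) // elem_size

def idx_buggy(addr: int, base: int) -> int:
    # Divides the element count by the structure size a second time.
    return typed_ptr_diff(addr, base) // ELEM_SIZE

def idx_fixed(addr: int, base: int) -> int:
    # The typed pointer difference already is the queue index.
    return typed_ptr_diff(addr, base)

base = 0x1000                      # address of ring 0 (arbitrary)
ring3 = base + 3 * ELEM_SIZE       # address of ring 3
print(idx_buggy(ring3, base), idx_fixed(ring3, base))
```

With any realistic structure size the extra division collapses every small index to 0, which is why the wrong calculation is easy to miss on single-queue setups.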
2013-10-31 | sfc: Fix DMA unmapping issue with firmware assisted TSO | Alexandre Rames | 2 | -2/+8
When using firmware assisted TSO, we use a single DMA mapping for the linear area of a TSO skb. We still have to segment the super-packet and insert a descriptor containing the original headers before each segment of payload, so we can unmap the linear area only after the last segment is completed. The unmapping information for the linear area is therefore associated with the last header descriptor. We calculate the DMA address to unmap from using the map length and the invariant that the end of the DMA mapping matches the end of the data referenced by the last descriptor. But this invariant is broken when there is TCP payload in the linear area. Fix this by adding and using an explicit dma_offset field. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
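The broken invariant can be illustrated with a toy model. This is a sketch under assumptions: the `HeaderDesc` class and its field layout are hypothetical, only the `dma_offset` name comes from the message, and the addresses and lengths are invented.

```python
from dataclasses import dataclass

@dataclass
class HeaderDesc:
    """Hypothetical stand-in for the last header descriptor of a TSO skb."""
    dma_addr: int    # DMA address of the data this descriptor references
    length: int      # length of that data (the headers)
    unmap_len: int   # length of the whole linear-area mapping
    dma_offset: int  # offset of dma_addr from the start of the mapping

def unmap_addr_old(d: HeaderDesc) -> int:
    # Assumes the mapping ends exactly where the descriptor's data ends.
    return d.dma_addr + d.length - d.unmap_len

def unmap_addr_new(d: HeaderDesc) -> int:
    # Uses the explicit offset, so no assumption about the mapping's end.
    return d.dma_addr - d.dma_offset

MAP_START = 0x1000
headers_only = HeaderDesc(MAP_START, 54, 54, 0)         # linear area = headers
with_payload = HeaderDesc(MAP_START, 54, 54 + 100, 0)   # 100 payload bytes follow

print(hex(unmap_addr_old(headers_only)), hex(unmap_addr_new(headers_only)))
print(hex(unmap_addr_old(with_payload)), hex(unmap_addr_new(with_payload)))
```

In the second case the old calculation lands 100 bytes before the real start of the mapping, while the explicit offset still recovers it.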
2013-10-30 | Merge branch '6lowpan' | David S. Miller | 2 | -21/+13
Alexander Aring says: ==================== This patch series cleans up the 6LoWPAN header creation and extends the use of skb_*_header functions. Patch 2/4 fixes issues with parsing the MAC header. The IEEE 802.15.4 header has a dynamic size which depends on frame control bits. This patch replaces the static MAC header length calculation with a dynamic one. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 | 6lowpan: cleanup skb copy data | Alexander Aring | 1 | -5/+8
This patch drops the direct memcpy on skb and uses the right skb memcpy functions. Also remove an unnecessary check if plen is non zero. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 | 6lowpan: set 6lowpan network and transport header | Alexander Aring | 1 | -0/+2
This is necessary to access the network header with the skb_network_header function instead of calculating the position with mac_len, etc. Do the same for the transport header, when we replace the IPv6 header with the 6LoWPAN header. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 | 6lowpan: set and use mac_len for mac header length | Alexander Aring | 2 | -12/+3
Set the MAC header length while creating the 802.15.4 MAC header. Drop the function that recalculated the MAC header length in upper layers; it used a static length and worked for intra-PAN communication only. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 | 6lowpan: remove unnecessary set of headers | Alexander Aring | 1 | -4/+0
On the receiving side we don't need to set any headers in the skb because the 6LoWPAN layer does not access them. Currently these values end up being set twice once netif_rx is called. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 | ipv6: remove the unnecessary statement in find_match() | Duan Jiong | 1 | -1/+1
Reading rt6_check_neigh() shows that RT6_NUD_FAIL_SOFT can be returned only when IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) is false, so find_match() does not need to evaluate !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) again. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 | mac802154: Use pr_err(...) rather than printk(KERN_ERR ...) | Chen Weilong | 1 | -4/+2
This change is inspired by checkpatch. Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 | bgmac: pass received packet to the netif instead of copying it | Rafał Miłecki | 1 | -27/+39
Copying whole packets with skb_copy_from_linear_data_offset is a pretty bad idea. CPU was spending time in __copy_user_common and network performance was lower. With the new solution iperf-measured speed increased from 116Mb/s to 134Mb/s. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-30 | tipc: remove two indentation levels in tipc_recv_msg routine | Ying Xue | 1 | -89/+84
The message dispatching part of tipc_recv_msg() is wrapped in layers of while/if/if/switch, causing out-of-control indentation that does not look very good. We reduce two indentation levels by separating the message dispatching from the blocks that check link state and sequence numbers, allowing longer function and arg names to be consistently indented without wrapping. Additionally, we rename the "cont" label to "discard" and add one new label called "unlock_discard" to make the code clearer. In all, these are cosmetic changes that do not alter the operation of TIPC in any way. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Cc: David Laight <david.laight@aculab.com> Cc: Andreas Bofjäll <andreas.bofjall@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | arc_emac: drop redundant mac address check | Luka Perkov | 1 | -3/+3
Checking if MAC address is valid using is_valid_ether_addr() is already done in of_get_mac_address(). While at it, reorganize checking so it matches checks in other drivers. Signed-off-by: Luka Perkov <luka@openwrt.org> CC: Alexey Brodkin <Alexey.Brodkin@synopsys.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | mvneta: drop redundant mac address check | Luka Perkov | 1 | -1/+1
Checking if MAC address is valid using is_valid_ether_addr() is already done in of_get_mac_address(). Signed-off-by: Luka Perkov <luka@openwrt.org> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | octeon_mgmt: drop redundant mac address check | Luka Perkov | 1 | -1/+1
Checking if MAC address is valid using is_valid_ether_addr() is already done in of_get_mac_address(). Signed-off-by: Luka Perkov <luka@openwrt.org> Acked-by: David Daney <david.daney@cavium.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | tcp: temporarily disable Fast Open on SYN timeout | Yuchung Cheng | 2 | -3/+8
Fast Open currently has a fallback feature to address SYN-data being dropped, but it requires the middle-box to pass on a regular SYN retry after SYN-data. This is implemented in commit aab487435 ("net-tcp: Fast Open client - detecting SYN-data drops"). However, some NAT boxes will drop all subsequent packets after the first SYN-data and blackhole the entire connection. An example is in commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast Open". The sender should note such incidents and temporarily fall back to the regular TCP handshake on subsequent attempts as well: after the second SYN times out, the original Fast Open SYN is most likely lost. When such an event recurs, Fast Open is disabled for a period that grows exponentially with the number of recurrences. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
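One way to picture the exponential back-off is the sketch below. It is not the kernel's code: the function name, the 60 second base and the rule that a single recorded loss already triggers the back-off are assumptions made for illustration; the message only states that the disable period depends exponentially on the number of recurrences.

```python
def fastopen_allowed(now: float, last_syn_loss: float, syn_losses: int,
                     base_seconds: float = 60.0) -> bool:
    """Return True if a Fast Open SYN may be tried at time `now`.

    syn_losses counts how often a Fast Open SYN was presumed lost;
    each recurrence doubles the time Fast Open stays disabled.
    """
    if syn_losses == 0:
        return True  # nothing recorded, Fast Open is not restricted
    disabled_for = base_seconds * (2 ** syn_losses)
    return now >= last_syn_loss + disabled_for

# One recorded loss at t=0: disabled for 120 s under the assumed base.
print(fastopen_allowed(100.0, 0.0, 1))
print(fastopen_allowed(120.0, 0.0, 1))
```

During the disabled window the sender would simply use the regular three-way handshake.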
2013-10-29 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next | David S. Miller | 9 | -81/+366
Jeff Kirsher says: ==================== This series contains updates to vxlan, net, ixgbe, ixgbevf, and i40e. Joseph provides a single patch against vxlan which removes the burden from the NIC drivers to check if the vxlan driver is enabled in the kernel and also makes available the vxlan headrooms to the drivers. Jacob provides the majority of the patches, with patches against net, ixgbe and ixgbevf. His net patch adds a might_sleep() call to napi_disable so that every use of napi_disable during atomic context will be visible. Then Jacob provides a patch to fix the qv_lock_napi call in ixgbe_napi_disable_all. The other ixgbe patches clean up the ixgbe_check_minimum_link function to correctly show that there is some minor loss of encoding, even though we don't calculate it, and remove unnecessary duplication of the PCIe bandwidth display. Lastly, Jacob provides 4 patches against ixgbevf to add ixgbevf_rx_skb in line with how ixgbe handles the variations on how packets can be received, and to add support for tracking how many packets were cleaned during busy poll as part of the extended statistics. Wei Yongjun provides a fix for i40e to return -ENOMEM in the memory allocation error handling case instead of returning 0, as done elsewhere in this function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | net: mvmdio: doc: mvmdio now used by mv643xx_eth | Leigh Brown | 1 | -5/+3
Amend the documentation in the mvmdio driver to note the fact that it is now used by both the mvneta and mv643xx_eth drivers. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | net: mvmdio: slight optimisation of orion_mdio_write | Leigh Brown | 1 | -6/+4
Make only a single call to mutex_unlock in orion_mdio_write. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | net: mvmdio: orion_mdio_ready: remove manual poll | Leigh Brown | 1 | -21/+13
Replace manual poll of MVMDIO_SMI_READ_VALID with a call to orion_mdio_wait_ready. This ensures a consistent timeout, eliminates a busy loop, and allows for use of interrupts on systems that support them. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | net: mvmdio: make orion_mdio_wait_ready consistent | Leigh Brown | 1 | -22/+30
Amend orion_mdio_wait_ready so that the same timeout is used when polling or using wait_event_timeout. Set the timeout to 1ms. Replace udelay with usleep_range to avoid a busy loop, and set the polling interval range as 45us to 55us, so that the first sleep will be enough in almost all cases. Generate the same log message at timeout when polling or using wait_event_timeout. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
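The polling branch described above can be sketched as a bounded wait loop. This is a userspace analogy, not the driver: `wait_ready`, its injectable `clock` and `sleep` parameters, and the `FakeClock` helper are hypothetical; only the 1ms timeout and the 45us to 55us sleep range come from the message.

```python
import time
from typing import Callable

TIMEOUT_S = 1e-3     # 1 ms overall timeout
POLL_MIN_S = 45e-6   # lower bound of the sleep range
POLL_MAX_S = 55e-6   # upper bound of the sleep range

def wait_ready(is_ready: Callable[[], bool],
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready() until it returns True or the timeout expires."""
    deadline = clock() + TIMEOUT_S
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False  # the caller would log the timeout here
        # Sleep instead of spinning; the midpoint stands in for usleep_range().
        sleep((POLL_MIN_S + POLL_MAX_S) / 2)

class FakeClock:
    """Deterministic clock so the loop can be exercised without real delays."""
    def __init__(self) -> None:
        self.t = 0.0
    def now(self) -> float:
        return self.t
    def sleep(self, seconds: float) -> None:
        self.t += seconds

fake = FakeClock()
print(wait_ready(lambda: False, fake.now, fake.sleep), fake.t)
```

Because readiness is checked once more after the final sleep, a device that becomes ready just before the deadline is still reported as ready.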
2013-10-29 | net/benet: Make lancer_wait_ready() static | Gavin Shan | 2 | -2/+1
The function does not need to be public, so make it static. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | net/benet: Remove interface type | Gavin Shan | 2 | -6/+0
The interface type, which is tracked by "struct be_adapter::if_type", isn't currently used, so we can safely remove it, in line with Sathya's comments. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | netconsole: Convert to pr_<level> | Joe Perches | 1 | -30/+27
Use a more current logging style. Convert printks to pr_<level>. Consolidate multiple printks into a single printk to avoid any possible dmesg interleaving. Add a default "event" msg in case the listed types are ever expanded. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | net: sched: cls_bpf: add BPF-based classifier | Daniel Borkmann | 4 | -0/+410
This work contains a lightweight BPF-based traffic classifier that can serve as a flexible alternative to ematch-based tree classification, i.e. now that the BPF filter engine can also be JITed in the kernel. Naturally, tc actions and policies are supported as well with cls_bpf. Multiple BPF programs/filters can be attached for a class, or they can just as well be written within a single BPF program; it is really up to the user how they wish to run/optimize the code, e.g. also for inversion of verdicts etc.

The notion of a BPF program's return/exit codes is being kept as follows:

    0: No match
    -1: Select classid given in "tc filter ..." command
    else: flowid, overwrite the default one

As a minimal usage example with iproute2, we use a 3 band prio root qdisc on a router with sfq each as leaf, and assign ssh and icmp bpf-based filters to band 1, http traffic to band 2 and the rest to band 3. For the first two bands we load the bytecode from a file, for the third we load it inline as an example:

    echo 1 > /proc/sys/net/core/bpf_jit_enable
    tc qdisc del dev em1 root
    tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    tc qdisc add dev em1 parent 1:1 sfq perturb 16
    tc qdisc add dev em1 parent 1:2 sfq perturb 16
    tc qdisc add dev em1 parent 1:3 sfq perturb 16
    tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
    tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
    tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
    tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline 'bytecode' or 'bytecode-file'. There are a couple of front-ends that can compile opcodes, for example:

1) People familiar with tcpdump-like filters:

    tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF extensions that lack support by libpcap's compiler:

    bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

ssh.ops example code:

    ldh [12]
    jne #0x800, drop
    ldb [23]
    jneq #6, drop
    ldh [20]
    jset #0x1fff, drop
    ldxb 4 * ([14] & 0xf)
    ldh [%x + 14]
    jeq #0x16, pass
    ldh [%x + 16]
    jne #0x16, drop
    pass: ret #-1
    drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation, tc filter list dev em1, is then able to show the exact commands again. Possible follow-up work could also include a small expression compiler for iproute2. Tested with the help of bmon. This idea came up during the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from Eric Dumazet! Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
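The return-code convention for a cls_bpf program can be written down as a tiny decision function. This is a sketch of the rule as stated in the message, not the kernel implementation; the function name `resolve_classid` and the helper `classid` are hypothetical, and the return value is treated as a signed integer for readability.

```python
from typing import Optional

def classid(major: int, minor: int) -> int:
    """Pack a tc handle such as 1:2 into a single 32-bit value."""
    return (major << 16) | minor

def resolve_classid(bpf_ret: int, filter_classid: int) -> Optional[int]:
    """Map a BPF program's return code to the class a packet is steered to.

    Returns None when the filter does not match.
    """
    if bpf_ret == 0:
        return None              # no match
    if bpf_ret == -1:
        return filter_classid    # use the flowid given on the tc command line
    return bpf_ret               # the program supplies its own flowid

default = classid(1, 1)          # e.g. "flowid 1:1" from the tc filter command
print(resolve_classid(0, default))
print(hex(resolve_classid(-1, default)))
print(hex(resolve_classid(classid(1, 3), default)))
```

In the ssh.ops example, `ret #-1` therefore selects the flowid from the command line and `ret #0` lets the packet fall through to the next filter.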
2013-10-29 | bgmac: separate RX descriptor setup code into a new function | Rafał Miłecki | 1 | -19/+22
This cleans code a bit and will be useful when allocating buffers in other places (like RX path, to avoid skb_copy_from_linear_data_offset). Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | i40e: fix error return code in i40e_probe() | Wei Yongjun | 1 | -1/+3
Fix to return -ENOMEM in the memory alloc error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | ixgbevf: Add zero_base handler to network statistics | Don Skidmore | 2 | -34/+45
This patch removes the need to keep a zero_base variable in the adapter structure. Now we just use two different macros to set the non-zero and zero base. This adds to readability and shortens some of the structure initialization under 80 columns. The gathering of status for ethtool was slightly modified to again better fit into 80 columns and become a bit more readable. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | ixgbevf: add BP_EXTENDED_STATS for CONFIG_NET_RX_BUSY_POLL | Jacob Keller | 3 | -0/+60
This patch adds the extended statistics similar to the ixgbe driver. These statistics keep track of how often the busy polling yields, as well as how many packets are cleaned or missed by the polling routine. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | ixgbevf: implement CONFIG_NET_RX_BUSY_POLL | Jacob Keller | 2 | -0/+177
This patch enables CONFIG_NET_RX_BUSY_POLL support in the VF code. This enables sockets which have enabled the SO_BUSY_POLL socket option to use the ndo_busy_poll_recv operation which could result in lower latency, at the cost of higher CPU utilization, and increased power usage. This support is similar to how the ixgbe driver works. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | ixgbevf: have clean_rx_irq return total_rx_packets cleaned | Jacob Keller | 1 | -6/+7
Rather than return true/false indicating whether there was budget left, return the total packets cleaned. This currently has no use, but will be used in a following patch which enables CONFIG_NET_RX_BUSY_POLL support in order to track how many packets were cleaned during the busy poll as part of the extended statistics. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | ixgbevf: add ixgbevf_rx_skb | Jacob Keller | 1 | -1/+15
This patch adds ixgbevf_rx_skb in line with how ixgbe handles the variations on how packets can be received. It will be extended in a following patch for CONFIG_NET_RX_BUSY_POLL support. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | ixgbe: remove unnecessary duplication of PCIe bandwidth display | Jacob Keller | 1 | -23/+13
This patch removes the unnecessary display of PCIe bandwidth twice. Since the ixgbe_check_minimum_link does a better job, and ensures accurate detection on even complex chains, this older check is no longer necessary. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | ixgbe: show <2% for encoding loss on PCIe Gen3 | Jacob Keller | 1 | -2/+2
This patch updates the ixgbe_check_minimum_link function to correctly show that there is some minor loss of encoding, even though we don't calculate it in the max GT/s equation. It is small enough to not bother, but is better to report it than not. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
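The "<2%" figure follows from the line encoding. PCIe Gen3 uses 128b/130b encoding, while Gen1 and Gen2 use 8b/10b; the sketch below only does that arithmetic, and the helper name `encoding_loss` is hypothetical.

```python
def encoding_loss(payload_bits: int, line_bits: int) -> float:
    """Fraction of raw link bandwidth consumed by the line encoding."""
    return 1 - payload_bits / line_bits

gen2_loss = encoding_loss(8, 10)      # 8b/10b, PCIe Gen1 and Gen2
gen3_loss = encoding_loss(128, 130)   # 128b/130b, PCIe Gen3

print(f"8b/10b loss:    {gen2_loss:.1%}")
print(f"128b/130b loss: {gen3_loss:.2%}")
```

The Gen3 overhead is small enough to leave out of the max GT/s equation, but it is not zero, which is what the updated message reports.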
2013-10-29 | ixgbe: fix qv_lock_napi call in ixgbe_napi_disable_all | Jacob Keller | 2 | -16/+38
ixgbe_napi_disable_all calls napi_disable on each queue, however the busy polling code introduced a local_bh_disable()d context around the napi_disable. The original author did not realize that napi_disable might sleep, which would cause a sleep while atomic BUG. In addition, on a single processor system, the ixgbe_qv_lock_napi loop shouldn't have to mdelay. This patch adds an ixgbe_qv_disable along with a new IXGBE_QV_STATE_DISABLED bit, which it uses to indicate to the poll and napi routines that the q_vector has been disabled. Now the ixgbe_napi_disable_all function will wait until all pending work has been finished and prevent any future work from being started. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: Alexander Duyck <alexander.duyck@intel.com> Cc: Hyong-Youb Kim <hykim@myri.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Dmitry Kravkov <dmitry@broadcom.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | net: add might_sleep() call to napi_disable | Jacob Keller | 1 | -0/+1
napi_disable uses an msleep() call to wait for outstanding napi work to be finished after setting the disable bit. It does not always sleep, because there may be no outstanding work. This resulted in a rare bug in the ixgbe_down operation where a napi_disable call took place inside of a local_bh_disable()d context. In order to enable easier detection of future sleep while atomic BUGs, this patch adds a might_sleep() call, so that every use of napi_disable during atomic context will be visible. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: Alexander Duyck <alexander.duyck@intel.com> Cc: Hyong-Youb Kim <hykim@myri.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Dmitry Kravkov <dmitry@broadcom.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | vxlan: Have the NIC drivers do less work for offloads | Joseph Gasparakis | 2 | -4/+11
This patch removes the burden from the NIC drivers to check if the vxlan driver is enabled in the kernel and also makes available the vxlan headrooms to them. Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29 | net, mc: fix the incorrect comments in two mc-related functions | Zhi Yong Wu | 1 | -2/+2
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | net, iovec: fix the incorrect comment in memcpy_fromiovecend() | Zhi Yong Wu | 1 | -1/+1
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | net, datagram: fix the incorrect comment in zerocopy_sg_from_iovec() | Zhi Yong Wu | 1 | -1/+1
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | vxlan: silence one build warning | Zhi Yong Wu | 1 | -17/+14
drivers/net/vxlan.c: In function ‘vxlan_sock_add’:
drivers/net/vxlan.c:2298:11: warning: ‘sock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/net/vxlan.c:2275:17: note: ‘sock’ was declared here
  LD drivers/net/built-in.o
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 | ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK | Hannes Frederic Sowa | 1 | -4/+8
UFO as well as UDP_CORK do not respect IP_PMTUDISC_DO and IP_PMTUDISC_PROBE well enough. UFO enabled packet delivery just appends all frags to the cork and hands it over to the network card. So we just deliver non-DF udp fragments (DF-flag may get overwritten by hardware or virtual UFO enabled interface). UDP_CORK does enqueue the data until the cork is disengaged. At this point it sets the correct IP_DF and local_df flags and hands it over to ip_fragment which in this case will generate an icmp error which gets appended to the error socket queue. This is not reflected in the syscall error (of course, if UFO is enabled this also won't happen). Improve this by checking the pmtudisc flags before appending data to the socket and if we still can fit all data in one packet when IP_PMTUDISC_DO or IP_PMTUDISC_PROBE is set, only then proceed. We use (mtu-fragheaderlen) to check for the maximum length because we ensure not to generate a fragment and non-fragmented data does not need to have its length aligned on 64 bit boundaries. Also the passed in ip_options are already aligned correctly. Maybe, we can relax some other checks around ip_fragment. This needs more research. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
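The length check described above can be sketched as follows. It is a simplified model, not the kernel code: the function name `check_append` and its parameters are hypothetical, while the IP_PMTUDISC_DO and IP_PMTUDISC_PROBE values match the Linux UAPI definitions.

```python
import errno

IP_PMTUDISC_DO = 2     # always set DF, never fragment locally
IP_PMTUDISC_PROBE = 3  # set DF but ignore the cached path MTU

def check_append(queued: int, length: int, mtu: int,
                 fragheaderlen: int, pmtudisc: int) -> int:
    """Return 0 if `length` more bytes may be appended, else -EMSGSIZE.

    queued is the data already corked on the socket; fragheaderlen is the
    IP header length including options.
    """
    if pmtudisc in (IP_PMTUDISC_DO, IP_PMTUDISC_PROBE):
        # Everything must fit into one unfragmented packet.
        if queued + length > mtu - fragheaderlen:
            return -errno.EMSGSIZE
    return 0

# 1500-byte MTU, 20-byte IP header: at most 1480 bytes of transport data.
print(check_append(0, 1480, 1500, 20, IP_PMTUDISC_DO))
print(check_append(0, 1481, 1500, 20, IP_PMTUDISC_DO))
```

Checking before the data is appended lets the error surface in the syscall's return value rather than only on the socket's error queue.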
2013-10-28 | virtio_net: migrate mergeable rx buffers to page frag allocators | Michael Dalton | 1 | -58/+106
The virtio_net driver's mergeable receive buffer allocator uses 4KB packet buffers. For MTU-sized traffic, SKB truesize is > 4KB but only ~1500 bytes of the buffer is used to store packet data, reducing the effective TCP window size substantially. This patch addresses the performance concerns with mergeable receive buffers by allocating MTU-sized packet buffers using page frag allocators. If more than MAX_SKB_FRAGS buffers are needed, the SKB frag_list is used. Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
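A rough calculation shows why buffer size matters for the effective window. This is a simplified sketch: it charges only the buffer itself against the receive budget, and the 256 KiB budget and the 1536-byte MTU-sized buffer are hypothetical values; the 4KB buffer and the ~1500 bytes of payload come from the message.

```python
def buffer_utilization(payload: int, buffer_size: int) -> float:
    """Fraction of each receive buffer that carries packet data."""
    return payload / buffer_size

def packets_in_budget(budget: int, buffer_size: int) -> int:
    """How many buffers fit when each is charged at its full size."""
    return budget // buffer_size

PAYLOAD = 1500            # bytes of packet data per MTU-sized frame
OLD_BUF = 4096            # 4KB mergeable buffer
NEW_BUF = 1536            # hypothetical MTU-sized page-frag buffer
BUDGET = 256 * 1024       # hypothetical receive memory budget

print(f"old: {buffer_utilization(PAYLOAD, OLD_BUF):.0%} used, "
      f"{packets_in_budget(BUDGET, OLD_BUF)} packets")
print(f"new: {buffer_utilization(PAYLOAD, NEW_BUF):.0%} used, "
      f"{packets_in_budget(BUDGET, NEW_BUF)} packets")
```

Under these assumptions the same memory budget holds more than twice as many MTU-sized packets, which is the effect the patch is after.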