aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-05-04Merge branch 'master' of ↵David S. Miller3-1/+34
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2016-05-04 1) The flowcache can hit an OOM condition if too many entries are in the gc_list. Fix this by counting the entries in the gc_list and refuse new allocations if the value is too high. 2) The inner headers are invalid after a xfrm transformation, so reset the skb encapsulation field to ensure nobody tries access the inner headers. Otherwise tunnel devices stacked on top of xfrm may build the outer headers based on wrong informations. 3) Add pmtu handling to vti, we need it to report pmtu informations for local generated packets. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-05-04Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller17-231/+273
Antonio Quartulli says: ==================== pull request: batman-adv 20160504 In this pull request you have: - two changes to the MAINTAINERS file where one marks our mailing list as moderated and the other adds a missing documentation file - kernel-doc fixes - code refactoring and various cleanups ==================== Signed-off-by: David S. Miller <[email protected]>
2016-05-04net: fix infoleak in rtnetlinkKangjie Lu1-8/+10
The stack object “map” has a total size of 32 bytes. Its last 4 bytes are padding generated by compiler. These padding bytes are not initialized and sent out via “nla_put”. Signed-off-by: Kangjie Lu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04net: fix infoleak in llcKangjie Lu1-0/+1
The stack object “info” has a total size of 12 bytes. Its last byte is padding which is not initialized and leaked via “put_cmsg”. Signed-off-by: Kangjie Lu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04net: remove dev->trans_startFlorian Westphal1-7/+3
previous patches removed all direct accesses to dev->trans_start, so change the netif_trans_update helper to update trans_start of netdev queue 0 instead and then remove trans_start from struct net_device. AFAICS a lot of the netif_trans_update() invocations are now useless because they occur in ndo_start_xmit and driver doesn't set LLTX (i.e. stack already took care of the update). As I can't test any of them it seems better to just leave them alone. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04treewide: replace dev->trans_start update with helperFlorian Westphal5-6/+6
Replace all trans_start updates with netif_trans_update helper. change was done via spatch: struct net_device *d; @@ - d->trans_start = jiffies + netif_trans_update(d) Compile tested only. Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Florian Westphal <[email protected]> Acked-by: Felipe Balbi <[email protected]> Acked-by: Mugunthan V N <[email protected]> Acked-by: Antonio Quartulli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04gre6: add Kconfig dependency for NET_IPGRE_DEMUXArnd Bergmann1-0/+1
The ipv6 gre implementation was cleaned up to share more code with the ipv4 version, but it can be enabled even when NET_IPGRE_DEMUX is disabled, resulting in a link error: net/built-in.o: In function `gre_rcv': :(.text+0x17f5d0): undefined reference to `gre_parse_header' ERROR: "gre_parse_header" [net/ipv6/ip6_gre.ko] undefined! This adds a Kconfig dependency to prevent that now invalid configuration. Signed-off-by: Arnd Bergmann <[email protected]> Fixes: 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions") Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04gre: receive also TEB packets for lwtunnelsJiri Benc1-11/+28
For ipgre interfaces in collect metadata mode, receive also traffic with encapsulated Ethernet headers. The lwtunnel users are supposed to sort this out correctly. This allows to have mixed Ethernet + L3-only traffic on the same lwtunnel interface. This is the same way as VXLAN-GPE behaves. To keep backwards compatibility and prevent any surprises, gretap interfaces have priority in receiving packets with Ethernet headers. Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04gre: move iptunnel_pull_header down to ipgre_rcvJiri Benc1-5/+10
This will allow to make the pull dependent on the tunnel type. Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04gre: remove superfluous pskb_may_pullJiri Benc1-4/+1
The call to gre_parse_header is either followed by iptunnel_pull_header, or in the case of ICMP error path, the actual header is not accessed at all. In the first case, iptunnel_pull_header will call pskb_may_pull anyway and it's pointless to do it twice. The only difference is what call will fail with what error code but the net effect is still the same in all call sites. In the second case, pskb_may_pull is pointless, as skb->data is at the outer IP header and not at the GRE header. Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04net: Fix netdev_fix_features so that TSO_MANGLEID is only available with TSOAlexander Duyck1-0/+4
This change makes it so that we will strip the TSO_MANGLEID bit if TSO is not present. This way we will also handle ECN correctly of TSO is not present. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04gso: Only allow GSO_PARTIAL if we can checksum the inner protocolAlexander Duyck1-3/+3
This patch addresses a possible issue that can occur if we get into any odd corner cases where we support TSO for a given protocol but not the checksum or scatter-gather offload. There are few drivers floating around that setup their tunnels this way and by enforcing the checksum piece we can avoid mangling any frames. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04gso: Do not perform partial GSO if number of partial segments is 1 or lessAlexander Duyck1-1/+4
In the event that the number of partial segments is equal to 1 we don't really need to perform partial segmentation offload. As such we should skip multiplying the MSS and instead just clear the partial_segs value since it will not provide any gain to advertise the frame as being GSO when it is a single frame. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04gre: change gre_parse_header to return the header lengthJiri Benc3-10/+8
It's easier for gre_parse_header to return the header length instead of filing it into a parameter. That way, the callers that don't care about the header length can just check whether the returned value is lower than zero. In gre_err, the tunnel header must not be pulled. See commit b7f8fe251e46 ("gre: do not pull header in ICMP error processing") for details. This patch reduces the conflict between the mentioned commit and commit 95f5c64c3c13 ("gre: Move utility functions to common headers"). Signed-off-by: Jiri Benc <[email protected]> Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04tcp: guarantee forward progress in tcp_sendmsg()Eric Dumazet1-2/+5
Under high rx pressure, it is possible tcp_sendmsg() never has a chance to allocate an skb and loop forever as sk_flush_backlog() would always return true. Fix this by calling sk_flush_backlog() only if one skb had been allocated and filled before last backlog check. Fixes: d41a69f1d390 ("tcp: make tcp_sendmsg() aware of socket backlog") Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller28-151/+256
Conflicts: net/ipv4/ip_gre.c Minor conflicts between tunnel bug fixes in net and ipv6 tunnel cleanups in net-next. Signed-off-by: David S. Miller <[email protected]>
2016-05-04nfc: nci: Add nci_nfcc_loopback to the nci coreChristophe Ricard1-0/+77
For test purpose, provide the generic nci loopback function. Signed-off-by: Christophe Ricard <[email protected]> Signed-off-by: Samuel Ortiz <[email protected]>
2016-05-04nfc: nci: Add an additional parameter to identify a connection idChristophe Ricard3-11/+36
According to NCI specification, destination type and destination specific parameters shall uniquely identify a single destination for the Logical Connection. Signed-off-by: Christophe Ricard <[email protected]> Signed-off-by: Samuel Ortiz <[email protected]>
2016-05-04nfc: nci: Fix nci_core_conn_closeChristophe Ricard2-1/+3
nci_core_conn_close was not retrieving a conn_info using the correct connection id. Signed-off-by: Christophe Ricard <[email protected]> Signed-off-by: Samuel Ortiz <[email protected]>
2016-05-04nfc: nci: Fix nci_core_conn_create to allowing empty destinationChristophe Ricard1-9/+9
NCI_CORE_CONN_CREATE may not have any destination type parameter. Signed-off-by: Christophe Ricard <[email protected]> Signed-off-by: Samuel Ortiz <[email protected]>
2016-05-03ipv6/ila: fix nlsize calculation for lwtunnelNicolas Dichtel1-2/+1
The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR. Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module") CC: Tom Herbert <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03ipv6: add new struct ipcm6_cookieWei Wang9-101/+110
In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local variables like hlimits, tclass, opt and dontfrag and pass them to corresponding functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames. This is not a good practice and makes it hard to add new parameters. This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in ipv4 and include the above mentioned variables. And we only pass the pointer to this structure to corresponding functions. This makes it easier to add new parameters in the future and makes the function cleaner. Signed-off-by: Wei Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock.Sowmini Varadhan4-10/+33
An arbitration scheme for duelling SYNs is implemented as part of commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()") which ensures that both nodes involved will arrive at the same arbitration decision. However, this needs to be synchronized with an outgoing SYN to be generated by rds_tcp_conn_connect(). This commit achieves the synchronization through the t_conn_lock mutex in struct rds_tcp_connection. The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring the t_conn_lock mutex. A SYN is sent out only if the RDS connection is not already UP (an UP would indicate that rds_tcp_accept_one() has completed 3WH, so no SYN needs to be generated). Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after acquiring the t_conn_lock mutex. The only acceptable states (to allow continuation of the arbitration logic) are UP (i.e., outgoing SYN was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent outgoing SYN before we saw incoming SYN). Signed-off-by: Sowmini Varadhan <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sockSowmini Varadhan2-17/+25
There is a race condition between rds_send_xmit -> rds_tcp_xmit and the code that deals with resolution of duelling syns added by commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()"). Specifically, we may end up derefencing a null pointer in rds_send_xmit if we have the interleaving sequence: rds_tcp_accept_one rds_send_xmit conn is RDS_CONN_UP, so invoke rds_tcp_xmit tc = conn->c_transport_data rds_tcp_restore_callbacks /* reset t_sock */ null ptr deref from tc->t_sock The race condition can be avoided without adding the overhead of additional locking in the xmit path: have rds_tcp_accept_one wait for rds_tcp_xmit threads to complete before resetting callbacks. The synchronization can be done in the same manner as rds_conn_shutdown(). First set the rds_conn_state to something other than RDS_CONN_UP (so that new threads cannot get into rds_tcp_xmit()), then wait for RDS_IN_XMIT to be cleared in the conn->c_flags indicating that any threads in rds_tcp_xmit are done. Fixes: 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()") Signed-off-by: Sowmini Varadhan <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03net: add __sock_wfree() helperEric Dumazet2-1/+25
Hosts sending lot of ACK packets exhibit high sock_wfree() cost because of cache line miss to test SOCK_USE_WRITE_QUEUE We could move this flag close to sk_wmem_alloc but it is better to perform the atomic_sub_and_test() on a clean cache line, as it avoid one extra bus transaction. skb_orphan_partial() can also have a fast track for packets that either are TCP acks, or already went through another skb_orphan_partial() Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03net: Disable segmentation if checksumming is not supportedAlexander Duyck1-1/+1
In the case of the mlx4 and mlx5 driver they do not support IPv6 checksum offload for tunnels. With this being the case we should disable GSO in addition to the checksum offload features when we find that a device cannot perform a checksum on a given packet type. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03tipc: redesign connection-level flow controlJon Paul Maloy5-62/+122
There are two flow control mechanisms in TIPC; one at link level that handles network congestion, burst control, and retransmission, and one at connection level which' only remaining task is to prevent overflow in the receiving socket buffer. In TIPC, the latter task has to be solved end-to-end because messages can not be thrown away once they have been accepted and delivered upwards from the link layer, i.e, we can never permit the receive buffer to overflow. Currently, this algorithm is message based. A counter in the receiving socket keeps track of number of consumed messages, and sends a dedicated acknowledge message back to the sender for each 256 consumed message. A counter at the sending end keeps track of the sent, not yet acknowledged messages, and blocks the sender if this number ever reaches 512 unacknowledged messages. When the missing acknowledge arrives, the socket is then woken up for renewed transmission. This works well for keeping the message flow running, as it almost never happens that a sender socket is blocked this way. A problem with the current mechanism is that it potentially is very memory consuming. Since we don't distinguish between small and large messages, we have to dimension the socket receive buffer according to a worst-case of both. I.e., the window size must be chosen large enough to sustain a reasonable throughput even for the smallest messages, while we must still consider a scenario where all messages are of maximum size. Hence, the current fix window size of 512 messages and a maximum message size of 66k results in a receive buffer of 66 MB when truesize(66k) = 131k is taken into account. It is possible to do much better. This commit introduces an algorithm where we instead use 1024-byte blocks as base unit. This unit, always rounded upwards from the actual message size, is used when we advertise windows as well as when we count and acknowledge transmitted data. The advertised window is based on the configured receive buffer size in such a way that even the worst-case truesize/msgsize ratio always is covered. Since the smallest possible message size (from a flow control viewpoint) now is 1024 bytes, we can safely assume this ratio to be less than four, which is the value we are now using. This way, we have been able to reduce the default receive buffer size from 66 MB to 2 MB with maintained performance. In order to keep this solution backwards compatible, we introduce a new capability bit in the discovery protocol, and use this throughout the message sending/reception path to always select the right unit. Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03tipc: propagate peer node capabilities to socket layerJon Paul Maloy3-2/+22
During neighbor discovery, nodes advertise their capabilities as a bit map in a dedicated 16-bit field in the discovery message header. This bit map has so far only be stored in the node structure on the peer nodes, but we now see the need to keep a copy even in the socket structure. This commit adds this functionality. Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03tipc: re-enable compensation for socket receive buffer double countingJon Paul Maloy1-1/+1
In the refactoring commit d570d86497ee ("tipc: enqueue arrived buffers in socket in separate function") we did by accident replace the test if (sk->sk_backlog.len == 0) atomic_set(&tsk->dupl_rcvcnt, 0); with if (sk->sk_backlog.len) atomic_set(&tsk->dupl_rcvcnt, 0); This effectively disables the compensation we have for the double receive buffer accounting that occurs temporarily when buffers are moved from the backlog to the socket receive queue. Until now, this has gone unnoticed because of the large receive buffer limits we are applying, but becomes indispensable when we reduce this buffer limit later in this series. We now fix this by inverting the mentioned condition. Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03Remove unnecessary allocationJ. Bruce Fields1-3/+2
Reported-by: Benjamin Coddington <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-05-04batman-adv: Split batadv_iv_ogm_orig_del_if functionSven Eckelmann1-39/+76
batadv_iv_ogm_orig_del_if handles two different buffers bcast_own and bcast_own_sum which should be resized. The error handling two for allocating these buffers causes the complexity of this function. This can be avoided completely when the function is split into a main function handling the locking, freeing and call of the subfunctions. The subfunction can then independently handle the resize of the buffers. This also allows to easily reuse the old buffer (which always is larger) in case a smaller buffer could not be allocated without increasing the code complexity. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: Merge batadv_v_ogm_orig_update into batadv_v_ogm_route_updateSimon Wunderlich1-71/+46
Since batadv_v_ogm_orig_update() was only called from one place and the calling function became very short, merge these two functions together. This should also reflect the protocol description of B.A.T.M.A.N. V better. Signed-off-by: Simon Wunderlich <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: move and restructure batadv_v_ogm_forwardSimon Wunderlich1-47/+63
To match our code better to the protocol description of B.A.T.M.A.N. V, move batadv_v_ogm_forward() out into batadv_v_ogm_process_per_outif() and move all checks directly deciding whether the OGM should be forwarded into batadv_v_ogm_forward(). Signed-off-by: Simon Wunderlich <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: fix debuginfo macro style issueSimon Wunderlich1-8/+11
Structure initialization within the macros should follow the general coding style used in the kernel: put the initialization of the first variable and the closing brace on a separate line. Reported-by: Antonio Quartulli <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]> [[email protected]: fix conflicts with current version] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: Fix function names on new line starting with '*'Sven Eckelmann3-16/+16
Some really long function names in batman-adv require a newline between return type and the function name. This has lead to some lines starting with *batadv_... This * belongs to the return type and thus should be on the same line as the return type. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: Add kernel-doc for batadv_interface_rxSven Eckelmann1-0/+18
Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: Fix kerneldoc for batadv_compare_claimSven Eckelmann1-1/+1
Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: Fix checkpatch warning about 'unsigned' typeSven Eckelmann1-6/+6
checkpatch.pl warns about the use of 'unsigned' as a short form for 'unsigned int'. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: fix wrong names in kerneldocAntonio Quartulli8-12/+14
Signed-off-by: Antonio Quartulli <[email protected]> [[email protected]: Fix additional names] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-05-04batman-adv: use to_delayed_workGeliang Tang6-7/+7
Use to_delayed_work() instead of open-coding it. Signed-off-by: Geliang Tang <[email protected]> Reviewed-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: use list_for_each_entry_safeGeliang Tang1-13/+9
Use list_for_each_entry_safe() instead of list_for_each_safe() to simplify the code. Signed-off-by: Geliang Tang <[email protected]> Acked-by: Antonio Quartulli <[email protected]> Reviewed-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-04batman-adv: use static string for table headersAntonio Quartulli5-21/+16
Use a static string when showing table headers rather then a nonsense parametric one with fixed arguments. It is easier to grep and it does not need to be recomputed at runtime each time. Reported-by: Joe Perches <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]> [[email protected]: fix conflicts with current version] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-05-04batman-adv: Start new development cycleSimon Wunderlich1-1/+1
Signed-off-by: Simon Wunderlich <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-05-03VSOCK: constify vsock_transport structureJulia Lawall1-1/+1
The vsock_transport structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03fq_codel: add batch ability to fq_codel_drop()Eric Dumazet1-19/+45
In presence of inelastic flows and stress, we can call fq_codel_drop() for every packet entering fq_codel qdisc. fq_codel_drop() is quite expensive, as it does a linear scan of 4 KB of memory to find a fat flow. Once found, it drops the oldest packet of this flow. Instead of dropping a single packet, try to drop 50% of the backlog of this fat flow, with a configurable limit of 64 packets per round. TCA_FQ_CODEL_DROP_BATCH_SIZE is the new attribute to make this limit configurable. With this strategy the 4 KB search is amortized to a single cache line per drop [1], so fq_codel_drop() no longer appears at the top of kernel profile in presence of few inelastic flows. [1] Assuming a 64byte cache line, and 1024 buckets Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dave Taht <[email protected]> Cc: Jonathan Morton <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Dave Taht Signed-off-by: David S. Miller <[email protected]>
2016-05-03netem: Segment GSO packets on enqueueNeil Horman1-2/+59
This was recently reported to me, and reproduced on the latest net kernel, when attempting to run netperf from a host that had a netem qdisc attached to the egress interface: [ 788.073771] ---------------------[ cut here ]--------------------------- [ 788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda() [ 788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962 data_len=0 gso_size=1448 gso_type=1 ip_summed=3 [ 788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log dm_mod [ 788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G W ------------ 3.10.0-327.el7.x86_64 #1 [ 788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012 [ 788.542260] ffff880437c036b8 f7afc56532a53db9 ffff880437c03670 ffffffff816351f1 [ 788.576332] ffff880437c036a8 ffffffff8107b200 ffff880633e74200 ffff880231674000 [ 788.611943] 0000000000000001 0000000000000003 0000000000000000 ffff880437c03710 [ 788.647241] Call Trace: [ 788.658817] <IRQ> [<ffffffff816351f1>] dump_stack+0x19/0x1b [ 788.686193] [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0 [ 788.713803] [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80 [ 788.741314] [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100 [ 788.767018] [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda [ 788.796117] [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190 [ 788.823392] [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem] [ 788.854487] [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570 [ 788.880870] [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0 ... The problem occurs because netem is not prepared to handle GSO packets (as it uses skb_checksum_help in its enqueue path, which cannot manipulate these frames). The solution I think is to simply segment the skb in a simmilar fashion to the way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes. When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt the first segment, and enqueue the remaining ones. tested successfully by myself on the latest net kernel, to which this applies Signed-off-by: Neil Horman <[email protected]> CC: Jamal Hadi Salim <[email protected]> CC: "David S. Miller" <[email protected]> CC: [email protected] CC: [email protected] CC: [email protected] Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03net: relax expensive skb_unclone() in iptunnel_handle_offloads()Eric Dumazet1-1/+1
Locally generated TCP GSO packets having to go through a GRE/SIT/IPIP tunnel have to go through an expensive skb_unclone() Reallocating skb->head is a lot of work. Test should really check if a 'real clone' of the packet was done. TCP does not care if the original gso_type is changed while the packet travels in the stack. This adds skb_header_unclone() which is a variant of skb_clone() using skb_header_cloned() check instead of skb_cloned(). This variant can probably be used from other points. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-03Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller6-56/+41
Antonio Quartulli says: ==================== In this small batch of patches you have: - a fix for our Distributed ARP Table that makes sure that the input provided to the hash function during a query is the same as the one provided during an insert (so to prevent false negatives), by Antonio Quartulli - a fix for our new protocol implementation B.A.T.M.A.N. V that ensures that a hard interface is properly re-activated when it is brought down and then up again, by Antonio Quartulli - two fixes respectively to the reference counting of the tt_local_entry and neigh_node objects, by Sven Eckelmann. Such bug is rather severe as it would prevent the netdev objects references by batman-adv from being released after shutdown. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-05-02bridge: netlink: export per-vlan statsNikolay Aleksandrov3-0/+99
Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and get_linkxstats_size) in order to export the per-vlan stats. The paddings were added because soon these fields will be needed for per-port per-vlan stats (or something else if someone beats me to it) so avoiding at least a few more netlink attributes. Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-05-02bridge: vlan: learn to countNikolay Aleksandrov4-16/+109
Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets allocated a per-cpu stats which is then set in each per-port vlan context for quick access. The br_allowed_ingress() common function is used to account for Rx packets and the br_handle_vlan() common function is used to account for Tx packets. Stats accounting is performed only if the bridge-wide vlan_stats_enabled option is set either via sysfs or netlink. A struct hole between vlan_enabled and vlan_proto is used for the new option so it is in the same cache line. Currently it is binary (on/off) but it is intentionally restricted to exactly 0 and 1 since other values will be used in the future for different purposes (e.g. per-port stats). Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>