aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2014-10-02netfilter: nf_tables: allow to filter from prerouting and postroutingPablo Neira Ayuso1-0/+2
This allows us to emulate the NAT table in ebtables, which is actually a plain filter chain that hooks at prerouting, output and postrouting. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-10-02netfilter: nft_compat: remove incomplete 32/64 bits arch compat codePablo Neira Ayuso1-101/+15
This code was based on the wrong asumption that you can probe based on the match/target private size that we get from userspace. This doesn't work at all when you have to dump the info back to userspace since you don't know what word size the userspace utility is using. Currently, the extensions that require arch compat are limit match and the ebt_mark match/target. The standard targets are not used by the nft-xt compat layer, so they are not affected. We can work around this limitation with a new revision that uses arch agnostic types. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-10-02netfilter: nf_tables: wait for call_rcu completion on module removalPablo Neira Ayuso1-0/+1
Make sure the objects have been released before the nf_tables modules is removed. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-10-02netfilter: use IS_ENABLED(CONFIG_BRIDGE_NETFILTER)Pablo Neira Ayuso10-20/+20
In 34666d4 ("netfilter: bridge: move br_netfilter out of the core"), the bridge netfilter code has been modularized. Use IS_ENABLED instead of ifdef to cover the module case. Fixes: 34666d4 ("netfilter: bridge: move br_netfilter out of the core") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-10-02netfilter: move nf_send_resetX() code to nf_reject_ipvX modulesPablo Neira Ayuso6-0/+308
Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and nf_reject_ipv6 respectively. This code is shared by x_tables and nf_tables. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-10-02netfilter: nft_reject: introduce icmp code abstraction for inet and bridgePablo Neira Ayuso4-10/+217
This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides an abstraction to the ICMP and ICMPv6 codes that you can use from the inet and bridge tables, they are: * NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable * NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable * NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable * NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratevely prohibited You can still use the specific codes when restricting the rule to match the corresponding layer 3 protocol. I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have different semantics depending on the table family and to allow the user to specify ICMP family specific codes if they restrict it to the corresponding family. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-10-02Bluetooth: 6lowpan: Check transmit errors for multicast packetsJukka Rissanen1-3/+10
We did not return error if multicast packet transmit failed. This might not be desired so return error also in this case. If there are multiple 6lowpan devices where the multicast packet is sent, then return error even if sending to only one of them fails. Signed-off-by: Jukka Rissanen <[email protected]> Signed-off-by: Johan Hedberg <[email protected]>
2014-10-02Bluetooth: 6lowpan: Return EAGAIN error also for multicast packetsJukka Rissanen1-11/+5
Make sure that we are able to return EAGAIN from l2cap_chan_send() even for multicast packets. The error code was ignored unncessarily. Signed-off-by: Jukka Rissanen <[email protected]> Signed-off-by: Johan Hedberg <[email protected]>
2014-10-02Bluetooth: 6lowpan: Avoid memory leak if memory allocation failsJukka Rissanen1-2/+6
If skb_unshare() returns NULL, then we leak the original skb. Solution is to use temp variable to hold the new skb. Signed-off-by: Jukka Rissanen <[email protected]> Signed-off-by: Johan Hedberg <[email protected]>
2014-10-02Bluetooth: 6lowpan: Memory leak as the skb is not freedJukka Rissanen1-0/+2
The earlier multicast commit 36b3dd250dde ("Bluetooth: 6lowpan: Ensure header compression does not corrupt IPv6 header") lost one skb free which then caused memory leak. Signed-off-by: Jukka Rissanen <[email protected]> Signed-off-by: Johan Hedberg <[email protected]>
2014-10-02Bluetooth: Fix lockdep warning with l2cap_chan_connectJohan Hedberg1-5/+8
The L2CAP connection's channel list lock (conn->chan_lock) must never be taken while already holding a channel lock (chan->lock) in order to avoid lock-inversion and lockdep warnings. So far the l2cap_chan_connect function has acquired the chan->lock early in the function and then later called l2cap_chan_add(conn, chan) which will try to take the conn->chan_lock. This violates the correct order of taking the locks and may lead to the following type of lockdep warnings: -> #1 (&conn->chan_lock){+.+...}: [<c109324d>] lock_acquire+0x9d/0x140 [<c188459c>] mutex_lock_nested+0x6c/0x420 [<d0aab48e>] l2cap_chan_add+0x1e/0x40 [bluetooth] [<d0aac618>] l2cap_chan_connect+0x348/0x8f0 [bluetooth] [<d0cc9a91>] lowpan_control_write+0x221/0x2d0 [bluetooth_6lowpan] -> #0 (&chan->lock){+.+.+.}: [<c10928d8>] __lock_acquire+0x1a18/0x1d20 [<c109324d>] lock_acquire+0x9d/0x140 [<c188459c>] mutex_lock_nested+0x6c/0x420 [<d0ab05fd>] l2cap_connect_cfm+0x1dd/0x3f0 [bluetooth] [<d0a909c4>] hci_le_meta_evt+0x11a4/0x1260 [bluetooth] [<d0a910eb>] hci_event_packet+0x3ab/0x3120 [bluetooth] [<d0a7cb08>] hci_rx_work+0x208/0x4a0 [bluetooth] CPU0 CPU1 ---- ---- lock(&conn->chan_lock); lock(&chan->lock); lock(&conn->chan_lock); lock(&chan->lock); Before calling l2cap_chan_add() the channel is not part of the conn->chan_l list, and can therefore only be accessed by the L2CAP user (such as l2cap_sock.c). We can therefore assume that it is the responsibility of the user to handle mutual exclusion until this point (which we can see is already true in l2cap_sock.c by it in many places touching chan members without holding chan->lock). Since the hci_conn and by exctension l2cap_conn creation in the l2cap_chan_connect() function depend on chan details we cannot simply add a mutex_lock(&conn->chan_lock) in the beginning of the function (since the conn object doesn't yet exist there). What we can do however is move the chan->lock taking later into the function where we already have the conn object and can that way take conn->chan_lock first. This patch implements the above strategy and does some other necessary changes such as using __l2cap_chan_add() which assumes conn->chan_lock is held, as well as adding a second needed label so the unlocking happens as it should. Reported-by: Jukka Rissanen <[email protected]> Signed-off-by: Johan Hedberg <[email protected]> Tested-by: Jukka Rissanen <[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2014-10-01net: pktgen: packet bursting via skb->xmit_moreAlexei Starovoitov1-2/+24
This patch demonstrates the effect of delaying update of HW tailptr. (based on earlier patch by Jesper) burst=1 is the default. It sends one packet with xmit_more=false burst=2 sends one packet with xmit_more=true and 2nd copy of the same packet with xmit_more=false burst=3 sends two copies of the same packet with xmit_more=true and 3rd copy with xmit_more=false Performance with ixgbe (usec 30): burst=1 tx:9.2 Mpps burst=2 tx:13.5 Mpps burst=3 tx:14.5 Mpps full 10G line rate Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01net: bridge: add a br_set_state helper functionFlorian Fainelli6-11/+17
In preparation for being able to propagate port states to e.g: notifiers or other kernel parts, do not manipulate the port state directly, but instead use a helper function which will allow us to do a bit more than just setting the state. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01net_sched: avoid calling tcf_unbind_filter() in call_rcu callbackWANG Cong1-4/+6
This fixes the following crash: [ 63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648 [ 63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000 [ 63.980094] RIP: 0010:[<ffffffff817e6d07>] [<ffffffff817e6d07>] u32_destroy_key+0x27/0x6d [ 63.980094] RSP: 0018:ffff880117dffcc0 EFLAGS: 00010202 [ 63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000 [ 63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b [ 63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000 [ 63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001 [ 63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30 [ 63.980094] FS: 0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000 [ 63.980094] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0 [ 63.980094] Stack: [ 63.980094] ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68 [ 63.980094] ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8 [ 63.980094] ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a [ 63.980094] Call Trace: [ 63.980094] [<ffffffff817e6d68>] u32_delete_key_freepf_rcu+0x1b/0x1d [ 63.980094] [<ffffffff810cb4c7>] rcu_process_callbacks+0x3bb/0x691 [ 63.980094] [<ffffffff810cb3cd>] ? rcu_process_callbacks+0x2c1/0x691 [ 63.980094] [<ffffffff817e6d4d>] ? u32_destroy_key+0x6d/0x6d [ 63.980094] [<ffffffff810780a4>] __do_softirq+0x142/0x323 [ 63.980094] [<ffffffff810782a8>] run_ksoftirqd+0x23/0x53 [ 63.980094] [<ffffffff81092126>] smpboot_thread_fn+0x203/0x221 [ 63.980094] [<ffffffff81091f23>] ? smpboot_unpark_thread+0x33/0x33 [ 63.980094] [<ffffffff8108e44d>] kthread+0xc9/0xd1 [ 63.980094] [<ffffffff819e00ea>] ? do_wait_for_common+0xf8/0x125 [ 63.980094] [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61 [ 63.980094] [<ffffffff819e43ec>] ret_from_fork+0x7c/0xb0 [ 63.980094] [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61 tp could be freed in call_rcu callback too, the order is not guaranteed. John Fastabend says: ==================== Its worth noting why this is safe. Any running schedulers will either read the valid class field or it will be zeroed. All schedulers today when the class is 0 do a lookup using the same call used by the tcf_exts_bind(). So even if we have a running classifier hit the null class pointer it will do a lookup and get to the same result. This is particularly fragile at the moment because the only way to verify this is to audit the schedulers call sites. ==================== Cc: John Fastabend <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01net_sched: fix another crash in cls_tcindexWANG Cong1-3/+9
This patch fixes the following crash: [ 166.670795] BUG: unable to handle kernel NULL pointer dereference at (null) [ 166.674230] IP: [<ffffffff814b739f>] __list_del_entry+0x5c/0x98 [ 166.674230] PGD d0ea5067 PUD ce7fc067 PMD 0 [ 166.674230] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 166.674230] CPU: 1 PID: 775 Comm: tc Not tainted 3.17.0-rc6+ #642 [ 166.674230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 166.674230] task: ffff8800d03c4d20 ti: ffff8800cae7c000 task.ti: ffff8800cae7c000 [ 166.674230] RIP: 0010:[<ffffffff814b739f>] [<ffffffff814b739f>] __list_del_entry+0x5c/0x98 [ 166.674230] RSP: 0018:ffff8800cae7f7d0 EFLAGS: 00010207 [ 166.674230] RAX: 0000000000000000 RBX: ffff8800cba8d700 RCX: ffff8800cba8d700 [ 166.674230] RDX: 0000000000000000 RSI: dead000000200200 RDI: ffff8800cba8d700 [ 166.674230] RBP: ffff8800cae7f7d0 R08: 0000000000000001 R09: 0000000000000001 [ 166.674230] R10: 0000000000000000 R11: 000000000000859a R12: ffffffffffffffe8 [ 166.674230] R13: ffff8800cba8c5b8 R14: 0000000000000001 R15: ffff8800cba8d700 [ 166.674230] FS: 00007fdb5f04a740(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000 [ 166.674230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 166.674230] CR2: 0000000000000000 CR3: 00000000cf929000 CR4: 00000000000006e0 [ 166.674230] Stack: [ 166.674230] ffff8800cae7f7e8 ffffffff814b73e8 ffff8800cba8d6e8 ffff8800cae7f828 [ 166.674230] ffffffff817caeec 0000000000000046 ffff8800cba8c5b0 ffff8800cba8c5b8 [ 166.674230] 0000000000000000 0000000000000001 ffff8800cf8e33e8 ffff8800cae7f848 [ 166.674230] Call Trace: [ 166.674230] [<ffffffff814b73e8>] list_del+0xd/0x2b [ 166.674230] [<ffffffff817caeec>] tcf_action_destroy+0x4c/0x71 [ 166.674230] [<ffffffff817ca0ce>] tcf_exts_destroy+0x20/0x2d [ 166.674230] [<ffffffff817ec2b5>] tcindex_delete+0x196/0x1b7 struct list_head can not be simply copied and we should always init it. Cc: John Fastabend <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01gre: Set inner protocol in v4 and v6 GRE transmitTom Herbert2-2/+8
Call skb_set_inner_protocol to set inner Ethernet protocol to protocol being encapsulation by GRE before tunnel_xmit. This is needed for GSO if UDP encapsulation (fou) is being done. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01ipip: Set inner IP protocol in ipipTom Herbert1-0/+2
Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV4 before tunnel_xmit. This is needed if UDP encapsulation (fou) is being done. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01sit: Set inner IP protocol in sitTom Herbert1-0/+4
Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV6 before tunnel_xmit. This is needed if UDP encapsulation (fou) is being done. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01udp: Generalize skb_udp_segmentTom Herbert2-6/+47
skb_udp_segment is the function called from udp4_ufo_fragment to segment a UDP tunnel packet. This function currently assumes segmentation is transparent Ethernet bridging (i.e. VXLAN encapsulation). This patch generalizes the function to operate on either Ethertype or IP protocol. The inner_protocol field must be set to the protocol of the inner header. This can now be either an Ethertype or an IP protocol (in a union). A new flag in the skbuff indicates which type is effective. skb_set_inner_protocol and skb_set_inner_ipproto helper functions were added to set the inner_protocol. These functions are called from the point where the tunnel encapsulation is occuring. When skb_udp_tunnel_segment is called, the function to segment the inner packet is selected based on the inner IP or Ethertype. In the case of an IP protocol encapsulation, the function is derived from inet[6]_offloads. In the case of Ethertype, skb->protocol is set to the inner_protocol and skb_mac_gso_segment is called. (GRE currently does this, but it might be possible to lookup the protocol in offload_base and call the appropriate segmenation function directly). Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01net: avoid one atomic operation in skb_clone()Eric Dumazet1-6/+17
Fast clone cloning can actually avoid an atomic_inc(), if we guarantee prior clone_ref value is 1. This requires a change kfree_skbmem(), to perform the atomic_dec_and_test() on clone_ref before setting fclone to SKB_FCLONE_UNAVAILABLE. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01net/dccp/ccid.c: add __init to ccid_activateFabian Frederick1-1/+1
ccid_activate is only called by __init ccid_initialize_builtins in same module. Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01net/dccp/proto.c: add __init to dccp_mib_initFabian Frederick1-1/+1
dccp_mib_init is only called by __init dccp_init in same module. Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01net: cleanup and document skb fclone layoutEric Dumazet3-28/+22
Lets use a proper structure to clearly document and implement skb fast clones. Then, we might experiment more easily alternative layouts. This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm, to stop leaking of implementation details. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01tcp: abort orphan sockets stalling on zero window probesYuchung Cheng2-21/+22
Currently we have two different policies for orphan sockets that repeatedly stall on zero window ACKs. If a socket gets a zero window ACK when it is transmitting data, the RTO is used to probe the window. The socket is aborted after roughly tcp_orphan_retries() retries (as in tcp_write_timeout()). But if the socket was idle when it received the zero window ACK, and later wants to send more data, we use the probe timer to probe the window. If the receiver always returns zero window ACKs, icsk_probes keeps getting reset in tcp_ack() and the orphan socket can stall forever until the system reaches the orphan limit (as commented in tcp_probe_timer()). This opens up a simple attack to create lots of hanging orphan sockets to burn the memory and the CPU, as demonstrated in the recent netdev post "TCP connection will hang in FIN_WAIT1 after closing if zero window is advertised." http://www.spinics.net/lists/netdev/msg296539.html This patch follows the design in RTO-based probe: we abort an orphan socket stalling on zero window when the probe timer reaches both the maximum backoff and the maximum RTO. For example, an 100ms RTT connection will timeout after roughly 153 seconds (0.3 + 0.6 + .... + 76.8) if the receiver keeps the window shut. If the orphan socket passes this check, but the system already has too many orphans (as in tcp_out_of_resources()), we still abort it but we'll also send an RST packet as the connection may still be active. In addition, we change TCP_USER_TIMEOUT to cover (life or dead) sockets stalled on zero-window probes. This changes the semantics of TCP_USER_TIMEOUT slightly because it previously only applies when the socket has pending transmission. Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Reported-by: Andrey Dmitrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01cipso: add __init to cipso_v4_cache_initFabian Frederick1-1/+1
cipso_v4_cache_init is only called by __init cipso_v4_init Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01inet: frags: add __init to ip4_frags_ctl_registerFabian Frederick1-2/+2
ip4_frags_ctl_register is only called by __init ipfrag_init Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01tcp: add __init to tcp_init_memFabian Frederick1-1/+1
tcp_init_mem is only called by __init tcp_init. Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01net: dsa: Fix build warning for !PM_SLEEPThierry Reding1-0/+2
The dsa_switch_suspend() and dsa_switch_resume() functions are only used when PM_SLEEP is enabled, so they need #ifdef CONFIG_PM_SLEEP protection to avoid a compiler warning. Signed-off-by: Thierry Reding <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01ipv4: mentions skb_gro_postpull_rcsum() in inet_gro_receive()Eric Dumazet1-0/+3
Proper CHECKSUM_COMPLETE support needs to adjust skb->csum when we remove one header. Its done using skb_gro_postpull_rcsum() In the case of IPv4, we know that the adjustment is not really needed, because the checksum over IPv4 header is 0. Lets add a comment to ease code comprehension and avoid copy/paste errors. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01ieee802154: fix __init functionsFabian Frederick1-2/+2
Commit 3243acd37fd9 ("ieee802154: add __init to lowpan_frags_sysctl_register") added __init to lowpan_frags_ns_sysctl_register instead of lowpan_frags_sysctl_register Suggested-by: Alexander Aring <[email protected]> Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-30Merge branch 'bugfixes' into linux-nextTrond Myklebust1-0/+3
* bugfixes: NFSv4.1: Fix an NFSv4.1 state renewal regression NFSv4: fix open/lock state recovery error handling NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails NFS: Fabricate fscache server index key correctly SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT nfs: fix duplicate proc entries
2014-09-30tcp: Change tcp_slow_start function to return voidLi RongQing1-3/+1
No caller uses the return value, so make this function return void. Signed-off-by: Li RongQing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-30ieee802154: add __init to lowpan_frags_sysctl_registerFabian Frederick1-2/+2
lowpan_frags_sysctl_register is only called by __init lowpan_net_frag_init (part of the lowpan module). Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-30irda: add __init to irlan_openFabian Frederick1-2/+2
irlan_open is only called by __init irlan_init in same module. Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-30netfilter: bridge: build br_nf_core only if requiredFlorian Westphal2-3/+4
Eric reports build failure with CONFIG_BRIDGE_NETFILTER=n We insist to build br_nf_core.o unconditionally, but we must only do so if br_netfilter was enabled, else it fails to build due to functions being defined to empty stubs (and some structure members being defined out). Also, BRIDGE_NETFILTER=y|m makes no sense when BRIDGE=n. Fixes: 34666d467 (netfilter: bridge: move br_netfilter out of the core) Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-30ipv6: remove rt6i_genidHannes Frederic Sowa4-5/+29
Eric Dumazet noticed that all no-nonexthop or no-gateway routes which are already marked DST_HOST (e.g. input routes routes) will always be invalidated during sk_dst_check. Thus per-socket dst caching absolutely had no effect and early demuxing had no effect. Thus this patch removes rt6i_genid: fn_sernum already gets modified during add operations, so we only must ensure we mutate fn_sernum during ipv6 address remove operations. This is a fairly cost extensive operations, but address removal should not happen that often. Also our mtu update functions do the same and we heard no complains so far. xfrm policy changes also cause a call into fib6_flush_trees. Also plug a hole in rt6_info (no cacheline changes). I verified via tracing that this change has effect. Cc: Eric Dumazet <[email protected]> Cc: YOSHIFUJI Hideaki <[email protected]> Cc: Vlad Yasevich <[email protected]> Cc: Nicolas Dichtel <[email protected]> Cc: Martin Lau <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-01Merge commit 'v3.16' into nextJames Morris56-337/+536
2014-09-30net: sched: enable per cpu qstatsJohn Fastabend15-23/+74
After previous patches to simplify qstats the qstats can be made per cpu with a packed union in Qdisc struct. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-30net: sched: restrict use of qstats qlenJohn Fastabend15-33/+30
This removes the use of qstats->qlen variable from the classifiers and makes it an explicit argument to gnet_stats_copy_queue(). The qlen represents the qdisc queue length and is packed into the qstats at the last moment before passnig to user space. By handling it explicitely we avoid, in the percpu stats case, having to figure out which per_cpu variable to put it in. It would probably be best to remove it from qstats completely but qstats is a user space ABI and can't be broken. A future patch could make an internal only qstats structure that would avoid having to allocate an additional u32 variable on the Qdisc struct. This would make the qstats struct 128bits instead of 128+32. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-30net: sched: implement qstat helper routinesJohn Fastabend24-75/+75
This adds helpers to manipulate qstats logic and replaces locations that touch the counters directly. This simplifies future patches to push qstats onto per cpu counters. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-30net: sched: make bstats per cpu and estimator RCU safeJohn Fastabend17-50/+132
In order to run qdisc's without locking statistics and estimators need to be handled correctly. To resolve bstats make the statistics per cpu. And because this is only needed for qdiscs that are running without locks which is not the case for most qdiscs in the near future only create percpu stats when qdiscs set the TCQ_F_CPUSTATS flag. Next because estimators use the bstats to calculate packets per second and bytes per second the estimator code paths are updated to use the per cpu statistics. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-29ematch: Fix matching of inverted containers.Ignacy Gawędzki1-2/+4
Negated expressions and sub-expressions need to have their flags checked for TCF_EM_INVERT and their result negated accordingly. Signed-off-by: Ignacy Gawędzki <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-29gro: fix aggregation for skb using frag_listEric Dumazet1-0/+3
In commit 8a29111c7ca6 ("net: gro: allow to build full sized skb") I added a regression for linear skb that traditionally force GRO to use the frag_list fallback. Erez Shitrit found that at most two segments were aggregated and the "if (skb_gro_len(p) != pinfo->gso_size)" test was failing. This is because pinfo at this spot still points to the last skb in the chain, instead of the first one, where we find the correct gso_size information. Signed-off-by: Eric Dumazet <[email protected]> Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb") Reported-by: Erez Shitrit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller59-480/+1428
Pablo Neira Ayuso says: ==================== pull request: netfilter/ipvs updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are: 1) Four patches to make the new nf_tables masquerading support independent of the x_tables infrastructure. This also resolves a compilation breakage if the masquerade target is disabled but the nf_tables masq expression is enabled. 2) ipset updates via Jozsef Kadlecsik. This includes the addition of the skbinfo extension that allows you to store packet metainformation in the elements. This can be used to fetch and restore this to the packets through the iptables SET target, patches from Anton Danilov. 3) Add the hash:mac set type to ipset, from Jozsef Kadlecsick. 4) Add simple weighted fail-over scheduler via Simon Horman. This provides a fail-over IPVS scheduler (unlike existing load balancing schedulers). Connections are directed to the appropriate server based solely on highest weight value and server availability, patch from Kenny Mathis. 5) Support IPv6 real servers in IPv4 virtual-services and vice versa. Simon Horman informs that the motivation for this is to allow more flexibility in the choice of IP version offered by both virtual-servers and real-servers as they no longer need to match: An IPv4 connection from an end-user may be forwarded to a real-server using IPv6 and vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell and Julian Anastasov. 6) Add global generation ID to the nf_tables ruleset. When dumping from several different object lists, we need a way to identify that an update has ocurred so userspace knows that it needs to refresh its lists. This also includes a new command to obtain the 32-bits generation ID. The less significant 16-bits of this ID is also exposed through res_id field in the nfnetlink header to quickly detect the interference and retry when there is no risk of ID wraparound. 7) Move br_netfilter out of the bridge core. The br_netfilter code is built in the bridge core by default. This causes problems of different kind to people that don't want this: Jesper reported performance drop due to the inconditional hook registration and I remember to have read complains on netdev from people regarding the unexpected behaviour of our bridging stack when br_netfilter is enabled (fragmentation handling, layer 3 and upper inspection). People that still need this should easily undo the damage by modprobing the new br_netfilter module. 8) Dump the set policy nf_tables that allows set parameterization. So userspace can keep user-defined preferences when saving the ruleset. From Arturo Borrero. 9) Use __seq_open_private() helper function to reduce boiler plate code in x_tables, From Rob Jones. 10) Safer default behaviour in case that you forget to load the protocol tracker. Daniel Borkmann and Florian Westphal detected that if your ruleset is stateful, you allow traffic to at least one single SCTP port and the SCTP protocol tracker is not loaded, then any SCTP traffic may be pass through unfiltered. After this patch, the connection tracking classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has been compiled with support for these modules. ==================== Trivially resolved conflict in include/linux/skbuff.h, Eric moved some netfilter skbuff members around, and the netfilter tree adjusted the ifdef guards for the bridging info pointer. Signed-off-by: David S. Miller <[email protected]>
2014-09-29tcp: change TCP_ECN prefixes to lower caseFlorian Westphal3-31/+34
Suggested by Stephen. Also drop inline keyword and let compiler decide. gcc 4.7.3 decides to no longer inline tcp_ecn_check_ce, so split it up. The actual evaluation is not inlined anymore while the ECN_OK test is. Suggested-by: Stephen Hemminger <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-29tcp: move TCP_ECN_create_request out of headerFlorian Westphal1-1/+35
After Octavian Purdilas tcp ipv4/ipv6 unification work this helper only has a single callsite. While at it, convert name to lowercase, suggested by Stephen. Suggested-by: Stephen Hemminger <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-29svcrdma: advertise the correct max payloadSteve Wise2-1/+8
Svcrdma currently advertises 1MB, which is too large. The correct value is the minimum of RPCSVC_MAXPAYLOAD and the max scatter-gather allowed in an NFSRDMA IO chunk * the host page size. This bug is usually benign because the Linux X64 NFSRDMA client correctly limits the payload size to the correct value (64*4096 = 256KB). But if the Linux client is PPC64 with a 64KB page size, then the client will indeed use a payload size that will overflow the server. Signed-off-by: Steve Wise <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-09-29tcp: remove unnecessary assignment.Li RongQing1-1/+1
This variable i is overwritten to 0 by following code Signed-off-by: Li RongQing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-29net: reorganize sk_buff for faster __copy_skb_header()Eric Dumazet1-39/+41
With proliferation of bit fields in sk_buff, __copy_skb_header() became quite expensive, showing as the most expensive function in a GSO workload. __copy_skb_header() performance is also critical for non GSO TCP operations, as it is used from skb_clone() This patch carefully moves all the fields that were not copied in a separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more Then I moved all other fields and all other copied fields in a section delimited by headers_start[0]/headers_end[0] section so that we can use a single memcpy() call, inlined by compiler using long word load/stores. I also tried to make all copies in the natural orders of sk_buff, to help hardware prefetching. I made sure sk_buff size did not change. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-29Bluetooth: 6lowpan: Enable multicast supportJukka Rissanen1-1/+2
Set multicast support for 6lowpan network interface. This is needed in every network interface that supports IPv6. Signed-off-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>