aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-03-31xprtrdma: Make rpcrdma_{un}map_one() into inline functionsChuck Lever5-46/+73
These functions are called in a loop for each page transferred via RDMA READ or WRITE. Extract loop invariants and inline them to reduce CPU overhead. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Handle non-SEND completions via a calloutChuck Lever3-10/+28
Allow each memory registration mode to plug in a callout that handles the completion of a memory registration operation. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Add "open" memreg opChuck Lever5-46/+70
The open op determines the size of various transport data structures based on device capabilities and memory registration mode. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Add "destroy MRs" memreg opChuck Lever5-51/+40
Memory Region objects associated with a transport instance are destroyed before the instance is shutdown and destroyed. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Add "reset MRs" memreg opChuck Lever5-101/+83
This method is invoked when a transport instance is about to be reconnected. Each Memory Region object is reset to its initial state. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Add "init MRs" memreg opChuck Lever5-101/+119
This method is used when setting up a new transport instance to create a pool of Memory Region objects that will be used to register memory during operation. Memory Regions are not needed for "physical" registration, since ->prepare and ->release are no-ops for that mode. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Add a "deregister_external" op for each memreg modeChuck Lever7-90/+84
There is very little common processing among the different external memory deregistration functions. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Add a "register_external" op for each memreg modeChuck Lever6-169/+160
There is very little common processing among the different external memory registration functions. Have rpcrdma_create_chunks() call the registration method directly. This removes a stack frame and a switch statement from the external registration path. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Add a "max_payload" op for each memreg modeChuck Lever6-36/+59
The max_payload computation is generalized to ensure that the payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of data segments that can be transmitted in an inline buffer. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Add vector of ops for each memory registration strategyChuck Lever6-5/+89
Instead of employing switch() statements, let's use the typical Linux kernel idiom for handling behavioral variation: virtual functions. Start by defining a vector of operations for each supported memory registration mode, and by adding a source file for each mode. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Prevent infinite loop in rpcrdma_ep_create()Chuck Lever1-2/+3
If a provider advertizes a zero max_fast_reg_page_list_len, FRWR depth detection loops forever. Instead of just failing the mount, try other memory registration modes. Fixes: 0fc6c4e7bb28 ("xprtrdma: mind the device's max fast . . .") Reported-by: Devesh Sharma <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Byte-align FRWR registrationChuck Lever1-8/+4
The RPC/RDMA transport's FRWR registration logic registers whole pages. This means areas in the first and last pages that are not involved in the RDMA I/O are needlessly exposed to the server. Buffered I/O is typically page-aligned, so not a problem there. But for direct I/O, which can be byte-aligned, and for reply chunks, which are nearly always smaller than a page, the transport could expose memory outside the I/O buffer. FRWR allows byte-aligned memory registration, so let's use it as it was intended. Reported-by: Sagi Grimberg <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Perform a full marshal on retransmitChuck Lever3-52/+34
Commit 6ab59945f292 ("xprtrdma: Update rkeys after transport reconnect" added logic in the ->send_request path to update the chunk list when an RPC/RDMA request is retransmitted. Note that rpc_xdr_encode() resets and re-encodes the entire RPC send buffer for each retransmit of an RPC. The RPC send buffer is not preserved from the previous transmission of an RPC. Revert 6ab59945f292, and instead, just force each request to be fully marshaled every time through ->send_request. This should preserve the fix from 6ab59945f292, while also performing pullup during retransmits. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Sagi Grimberg <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-31xprtrdma: Display IPv6 addresses and port numbers correctlyChuck Lever2-21/+47
Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-by: Devesh Sharma <[email protected]> Tested-by: Meghana Cheripady <[email protected]> Tested-by: Veeresh U. Kokatnur <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-03-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller9-13/+51
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix missing initialization of tuple structure in nfnetlink_cthelper to avoid mismatches when looking up to attach userspace helpers to flows, from Ian Wilson. 2) Fix potential crash in nft_hash when we hit -EAGAIN in nft_hash_walk(), from Herbert Xu. 3) We don't need to indicate the hook information to update the basechain default policy in nf_tables. 4) Restore tracing over nfnetlink_log due to recent rework to accomodate logging infrastructure into nf_tables. 5) Fix wrong IP6T_INV_PROTO check in xt_TPROXY. 6) Set IP6T_F_PROTO flag in nft_compat so we can use SYNPROXY6 and REJECT6 from xt over nftables. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-03-22netfilter: nft_compat: set IP6T_F_PROTO flag if protocol is setPablo Neira Ayuso1-0/+6
ip6tables extensions check for this flag to restrict match/target to a given protocol. Without this flag set, SYNPROXY6 returns an error. Signed-off-by: Pablo Neira Ayuso <[email protected]> Acked-by: Patrick McHardy <[email protected]>
2015-03-20net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfromAl Viro1-0/+4
Cc: [email protected] # v3.19 Signed-off-by: Al Viro <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-20net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() ↵Catalin Marinas1-0/+7
behaviour Commit db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error) introduced the clamping of msg_namelen when the unsigned value was larger than sizeof(struct sockaddr_storage). This caused a msg_namelen of -1 to be valid. The native code was subsequently fixed by commit dbb490b96584 (net: socket: error on a negative msg_namelen). In addition, the native code sets msg_namelen to 0 when msg_name is NULL. This was done in commit (6a2a2b3ae075 net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland) and subsequently updated by 08adb7dabd48 (fold verify_iovec() into copy_msghdr_from_user()). This patch brings the get_compat_msghdr() in line with copy_msghdr_from_user(). Fixes: db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error) Cc: David S. Miller <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Catalin Marinas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-20tcp: fix tcp fin memory accountingJosh Hunt1-5/+1
tcp_send_fin() does not account for the memory it allocates properly, so sk_forward_alloc can be negative in cases where we've sent a FIN: ss example output (ss -amn | grep -B1 f4294): tcp FIN-WAIT-1 0 1 192.168.0.1:45520 192.0.2.1:8080 skmem:(r0,rb87380,t0,tb87380,f4294966016,w1280,o0,bl0) Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-20ipv6: fix backtracking for throw routesSteven Barth1-0/+1
for throw routes to trigger evaluation of other policy rules EAGAIN needs to be propagated up to fib_rules_lookup similar to how its done for IPv4 A simple testcase for verification is: ip -6 rule add lookup 33333 priority 33333 ip -6 route add throw 2001:db8::1 ip -6 route add 2001:db8::1 via fe80::1 dev wlan0 table 33333 ip route get 2001:db8::1 Signed-off-by: Steven Barth <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-20ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in ↵Sabrina Dubroca1-5/+3
udp6_ufo_fragment Matt Grant reported frequent crashes in ipv6_select_ident when udp6_ufo_fragment is called from openvswitch on a skb that doesn't have a dst_entry set. ipv6_proxy_select_ident generates the frag_id without using the dst associated with the skb. This approach was suggested by Vladislav Yasevich. Fixes: 0508c07f5e0c ("ipv6: Select fragment id during UFO segmentation if not set.") Cc: Vladislav Yasevich <[email protected]> Reported-by: Matt Grant <[email protected]> Tested-by: Matt Grant <[email protected]> Signed-off-by: Sabrina Dubroca <[email protected]> Acked-by: Vladislav Yasevich <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-20netfilter: xt_TPROXY: fix invflags check in tproxy_tg6_check()Pablo Neira Ayuso1-2/+2
We have to check for IP6T_INV_PROTO in invflags, instead of flags. Signed-off-by: Pablo Neira Ayuso <[email protected]> Acked-by: Balazs Scheidler <[email protected]>
2015-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds26-86/+189
Pull networking fixes from David Miller: 1) Fix packet header offset calculation in _decode_session6(), from Hajime Tazaki. 2) Fix route leak in error paths of xfrm_lookup(), from Huaibin Wang. 3) Be sure to clear state properly when scans fail in iwlwifi mvm code, from Luciano Coelho. 4) iwlwifi tries to stop scans that aren't actually running, also from Luciano Coelho. 5) mac80211 should drop mesh frames that are not encrypted, fix from Bob Copeland. 6) Add new device ID to b43 wireless driver for BCM432228 chips, from Rafał Miłecki. 7) Fix accidental addition of members after variable sized array in struct tc_u_hnode, from WANG Cong. 8) Don't re-enable interrupts until after we call napi_complete() in ibmveth and WIZnet drivers, frm Yongbae Park. 9) Fix regression in vlan tag handling of fec driver, from Fugang Duan. 10) If a network namespace change fails during rtnl_newlink(), we don't unwind the device registry properly. 11) Fix two TCP regressions, from Neal Cardwell: - Don't allow snd_cwnd_cnt to accumulate huge values due to missing test in tcp_cong_avoid_ai(). - Restore CUBIC back to advancing cwnd by 1.5x packets per RTT. 12) Fix performance regression in xne-netback involving push TX notifications, from David Vrabel. 13) __skb_tstamp_tx() can be called with a NULL sk pointer, do not dereference blindly. From Willem de Bruijn. 14) Fix potential stack overflow in RDS protocol stack, from Arnd Bergmann. 15) VXLAN_VID_MASK used incorrectly in new remote checksum offload support of VXLAN driver. Fix from Alexey Kodanev. 16) Fix too small netlink SKB allocation in inet_diag layer, from Eric Dumazet. 17) ieee80211_check_combinations() does not count interfaces correctly, from Andrei Otcheretianski. 18) Hardware feature determination in bxn2x driver references a piece of software state that actually isn't initialized yet, fix from Michal Schmidt. 19) inet_csk_wait_for_connect() needs a sched_annotate_sleep() annoation, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits) Revert "net: cx82310_eth: use common match macro" net/mlx4_en: Set statistics bitmap at port init IB/mlx4: Saturate RoCE port PMA counters in case of overflow net/mlx4_en: Fix off-by-one in ethtool statistics display IB/mlx4: Verify net device validity on port change event act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome Revert "smc91x: retrieve IRQ and trigger flags in a modern way" inet: Clean up inet_csk_wait_for_connect() vs. might_sleep() ip6_tunnel: fix error code when tunnel exists netdevice.h: fix ndo_bridge_* comments bnx2x: fix encapsulation features on 57710/57711 mac80211: ignore CSA to same channel nl80211: ignore HT/VHT capabilities without QoS/WMM mac80211: ask for ECSA IE to be considered for beacon parse CRC mac80211: count interfaces correctly for combination checks isdn: icn: use strlcpy() when parsing setup options rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() caif: fix MSG_OOB test in caif_seqpkt_recvmsg() bridge: reset bridge mtu after deleting an interface can: kvaser_usb: Fix tx queue start/stop race conditions ...
2015-03-19netfilter: restore rule tracing via nfnetlink_logPablo Neira Ayuso4-10/+34
Since fab4085 ("netfilter: log: nf_log_packet() as real unified interface"), the loginfo structure that is passed to nf_log_packet() is used to explicitly indicate the logger type you want to use. This is a problem for people tracing rules through nfnetlink_log since packets are always routed to the NF_LOG_TYPE logger after the aforementioned patch. We can fix this by removing the trace loginfo structures, but that still changes the log level from 4 to 5 for tracing messages and there may be someone relying on this outthere. So let's just introduce a new nf_log_trace() function that restores the former behaviour. Reported-by: Markus Kötter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-03-17act_bpf: allow non-default TC_ACT opcodes as BPF exec outcomeDaniel Borkmann1-8/+28
Revisiting commit d23b8ad8ab23 ("tc: add BPF based action") with regards to eBPF support, I was thinking that it might be better to improve return semantics from a BPF program invoked through BPF_PROG_RUN(). Currently, in case filter_res is 0, we overwrite the default action opcode with TC_ACT_SHOT. A default action opcode configured through tc's m_bpf can be: TC_ACT_RECLASSIFY, TC_ACT_PIPE, TC_ACT_SHOT, TC_ACT_UNSPEC, TC_ACT_OK. In cls_bpf, we have the possibility to overwrite the default class associated with the classifier in case filter_res is _not_ 0xffffffff (-1). That allows us to fold multiple [e]BPF programs into a single one, where they would otherwise need to be defined as a separate classifier with its own classid, needlessly redoing parsing work, etc. Similarly, we could do better in act_bpf: Since above TC_ACT* opcodes are exported to UAPI anyway, we reuse them for return-code-to-tc-opcode mapping, where we would allow above possibilities. Thus, like in cls_bpf, a filter_res of 0xffffffff (-1) means that the configured _default_ action is used. Any unkown return code from the BPF program would fail in tcf_bpf() with TC_ACT_UNSPEC. Should we one day want to make use of TC_ACT_STOLEN or TC_ACT_QUEUED, which both have the same semantics, we have the option to either use that as a default action (filter_res of 0xffffffff) or non-default BPF return code. All that will allow us to transparently use tcf_bpf() for both BPF flavours. Signed-off-by: Daniel Borkmann <[email protected]> Cc: Jiri Pirko <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jamal Hadi Salim <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-17inet: Clean up inet_csk_wait_for_connect() vs. might_sleep()Eric Dumazet1-0/+1
I got the following trace with current net-next kernel : [14723.885290] WARNING: CPU: 26 PID: 22658 at kernel/sched/core.c:7285 __might_sleep+0x89/0xa0() [14723.885325] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810e8734>] prepare_to_wait_exclusive+0x34/0xa0 [14723.885355] CPU: 26 PID: 22658 Comm: netserver Not tainted 4.0.0-dbg-DEV #1379 [14723.885359] ffffffff81a223a8 ffff881fae9e7ca8 ffffffff81650b5d 0000000000000001 [14723.885364] ffff881fae9e7cf8 ffff881fae9e7ce8 ffffffff810a72e7 0000000000000000 [14723.885367] ffffffff81a57620 000000000000093a 0000000000000000 ffff881fae9e7e64 [14723.885371] Call Trace: [14723.885377] [<ffffffff81650b5d>] dump_stack+0x4c/0x65 [14723.885382] [<ffffffff810a72e7>] warn_slowpath_common+0x97/0xe0 [14723.885386] [<ffffffff810a73e6>] warn_slowpath_fmt+0x46/0x50 [14723.885390] [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0 [14723.885393] [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0 [14723.885396] [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0 [14723.885399] [<ffffffff810ccdc9>] __might_sleep+0x89/0xa0 [14723.885403] [<ffffffff81581846>] lock_sock_nested+0x36/0xb0 [14723.885406] [<ffffffff815829a3>] ? release_sock+0x173/0x1c0 [14723.885411] [<ffffffff815ea1f7>] inet_csk_accept+0x157/0x2a0 [14723.885415] [<ffffffff810e8900>] ? abort_exclusive_wait+0xc0/0xc0 [14723.885419] [<ffffffff8161b96d>] inet_accept+0x2d/0x150 [14723.885424] [<ffffffff8157db6f>] SYSC_accept4+0xff/0x210 [14723.885428] [<ffffffff8165a451>] ? retint_swapgs+0xe/0x44 [14723.885431] [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0 [14723.885437] [<ffffffff81369c0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [14723.885441] [<ffffffff8157ef40>] SyS_accept+0x10/0x20 [14723.885444] [<ffffffff81659872>] system_call_fastpath+0x12/0x17 [14723.885447] ---[ end trace ff74cd83355b1873 ]--- In commit 26cabd31259ba43f68026ce3f62b78094124333f Peter added a sched_annotate_sleep() in sk_wait_event() Is the following patch needed as well ? Alternative would be to use sk_wait_event() from inet_csk_wait_for_connect() Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-17ip6_tunnel: fix error code when tunnel existsNicolas Dichtel1-16/+17
After commit 2b0bb01b6edb, the kernel returns -ENOBUFS when user tries to add an existing tunnel with ioctl API: $ ip -6 tunnel add ip6tnl1 mode ip6ip6 dev eth1 add tunnel "ip6tnl0" failed: No buffer space available It's confusing, the right error is EEXIST. This patch also change a bit the code returned: - ENOBUFS -> ENOMEM - ENOENT -> ENODEV Fixes: 2b0bb01b6edb ("ip6_tunnel: Return an error when adding an existing tunnel.") CC: Steffen Klassert <[email protected]> Reported-by: Pierre Cheynier <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-17Merge tag 'virtio-next-for-linus' of ↵Linus Torvalds1-4/+20
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio fixes from Rusty Russell: "Not entirely surprising: the ongoing QEMU work on virtio 1.0 has revealed more minor issues with our virtio 1.0 drivers just introduced in the kernel. (I would normally use my fixes branch for this, but there were a batch of them...)" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: virtio_mmio: fix access width for mmio uapi/virtio_scsi: allow overriding CDB/SENSE size virtio_mmio: generation support virtio_rpmsg: set DRIVER_OK before using device 9p/trans_virtio: fix hot-unplug virtio-balloon: do not call blocking ops when !TASK_RUNNING virtio_blk: fix comment for virtio 1.0 virtio_blk: typo fix virtio_balloon: set DRIVER_OK before using device virtio_console: avoid config access from irq virtio_console: init work unconditionally
2015-03-17netfilter: nf_tables: allow to change chain policy without hook if it existsPablo Neira Ayuso1-1/+4
If there's an existing base chain, we have to allow to change the default policy without indicating the hook information. However, if the chain doesn't exists, we have to enforce the presence of the hook attribute. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-03-16Merge tag 'mac80211-for-davem-2015-03-16' of ↵David S. Miller5-8/+47
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Here are a few fixes that I'd like to still get in: * disable U-APSD for better interoperability, from Michal Kazior * drop unencrypted frames in mesh forwarding, from Bob Copeland * treat non-QoS/WMM HT stations as non-HT, to fix confusion when they connect and then get QoS packets anyway due to HT * fix counting interfaces for combination checks, otherwise the interface combinations aren't properly enforced (from Andrei) * fix pure ECSA by reacting to the IE change * ignore erroneous (E)CSA to the current channel which sometimes happens due to AP/GO bugs ==================== Signed-off-by: David S. Miller <[email protected]>
2015-03-16Merge branch 'master' of ↵David S. Miller4-8/+9
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2015-03-16 1) Fix the network header offset in _decode_session6 when multiple IPv6 extension headers are present. From Hajime Tazaki. 2) Fix an interfamily tunnel crash. We set outer mode protocol too early and may dispatch to the wrong address family. Move the setting of the outer mode protocol behind the last accessing of the inner mode to fix the crash. 3) Most callers of xfrm_lookup() expect that dst_orig is released on error. But xfrm_lookup_route() may need dst_orig to handle certain error cases. So introduce a flag that tells what should be done in case of error. From Huaibin Wang. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-03-16mac80211: ignore CSA to same channelJohannes Berg2-0/+14
If the AP is confused and starts doing a CSA to the same channel, just ignore that request instead of trying to act it out since it was likely sent in error anyway. In the case of the bug I was investigating the GO was misbehaving and sending out a beacon with CSA IEs still included after having actually done the channel switch. Signed-off-by: Johannes Berg <[email protected]>
2015-03-16nl80211: ignore HT/VHT capabilities without QoS/WMMJohannes Berg1-0/+10
As HT/VHT depend heavily on QoS/WMM, it's not a good idea to let userspace add clients that have HT/VHT but not QoS/WMM. Since it does so in certain cases we've observed (client is using HT IEs but not QoS/WMM) just ignore the HT/VHT info at this point and don't pass it down to the drivers which might unconditionally use it. Cc: [email protected] Signed-off-by: Johannes Berg <[email protected]>
2015-03-16mac80211: ask for ECSA IE to be considered for beacon parse CRCJohannes Berg1-1/+2
When a beacon from the AP contains only the ECSA IE, and not a CSA IE as well, this ECSA IE is not considered for calculating the CRC and the beacon might be dropped as not being interesting. This is clearly wrong, it should be handled and the channel switch should be executed. Fix this by including the ECSA IE ID in the bitmap of interesting IEs. Reported-by: Gil Tribush <[email protected]> Reviewed-by: Luciano Coelho <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-03-16mac80211: count interfaces correctly for combination checksAndrei Otcheretianski1-1/+1
Since moving the interface combination checks to mac80211, it's broken because it now only considers interfaces with an assigned channel context, so for example any interface that isn't active can still be up, which is clearly an issue; also, in particular P2P-Device wdevs are an issue since they never have a chanctx. Fix this by counting running interfaces instead the ones with a channel context assigned. Cc: [email protected] [3.16+] Fixes: 73de86a38962b ("cfg80211/mac80211: move interface counting for combination check to mac80211") Signed-off-by: Andrei Otcheretianski <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> [rewrite commit message, dig out the commit it fixes] Signed-off-by: Johannes Berg <[email protected]>
2015-03-15rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()Al Viro1-1/+1
[I would really like an ACK on that one from dhowells; it appears to be quite straightforward, but...] MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit in there. It gets passed via flags; in fact, another such check in the same function is done correctly - as flags & MSG_PEEK. It had been that way (effectively disabled) for 8 years, though, so the patch needs beating up - that case had never been tested. If it is correct, it's -stable fodder. Signed-off-by: Al Viro <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-15caif: fix MSG_OOB test in caif_seqpkt_recvmsg()Al Viro1-1/+1
It should be checking flags, not msg->msg_flags. It's ->sendmsg() instances that need to look for that in ->msg_flags, ->recvmsg() ones (including the other ->recvmsg() instance in that file, as well as unix_dgram_recvmsg() this one claims to be imitating) check in flags. Braino had been introduced in commit dcda13 ("caif: Bugfix - use MSG_TRUNC in receive") back in 2010, so it goes quite a while back. Signed-off-by: Al Viro <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-14bridge: reset bridge mtu after deleting an interfaceVenkat Venkatsubra1-0/+2
On adding an interface br_add_if() sets the MTU to the min of all the interfaces. Do the same thing on removing an interface too in br_del_if. Signed-off-by: Venkat Venkatsubra <[email protected]> Acked-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-13inet_diag: fix possible overflow in inet_diag_dump_one_icsk()Eric Dumazet1-3/+15
inet_diag_dump_one_icsk() allocates too small skb. Add inet_sk_attr_size() helper right before inet_sk_diag_fill() so that it can be updated if/when new attributes are added. iproute2/ss currently does not use this dump_one() interface, this might explain nobody noticed this problem yet. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-13netfilter: Fix potential crash in nft_hash walkerHerbert Xu1-0/+2
When we get back an EAGAIN from rhashtable_walk_next we were treating it as a valid object which obviously doesn't work too well. Luckily this is hard to trigger so it seems nobody has run into it yet. This patch fixes it by redoing the next call when we get an EAGAIN. Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-03-139p/trans_virtio: fix hot-unplugMichael S. Tsirkin1-4/+20
On device hot-unplug, 9p/virtio currently will kfree channel while it might still be in use. Of course, it might stay used forever, so it's an extremely ugly hack, but it seems better than use-after-free that we have now. [ Unused variable removed, whitespace cleanup, msg single-lined --RR ] Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2015-03-12netfilter: Zero the tuple in nfnl_cthelper_parse_tuple()Ian Wilson1-0/+3
nfnl_cthelper_parse_tuple() is called from nfnl_cthelper_new(), nfnl_cthelper_get() and nfnl_cthelper_del(). In each case they pass a pointer to an nf_conntrack_tuple data structure local variable: struct nf_conntrack_tuple tuple; ... ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]); The problem is that this local variable is not initialized, and nfnl_cthelper_parse_tuple() only initializes two fields: src.l3num and dst.protonum. This leaves all other fields with undefined values based on whatever is on the stack: tuple->src.l3num = ntohs(nla_get_be16(tb[NFCTH_TUPLE_L3PROTONUM])); tuple->dst.protonum = nla_get_u8(tb[NFCTH_TUPLE_L4PROTONUM]); The symptom observed was that when the rpc and tns helpers were added then traffic to port 1536 was being sent to user-space. Signed-off-by: Ian Wilson <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-03-12rds: avoid potential stack overflowArnd Bergmann1-18/+22
The rds_iw_update_cm_id function stores a large 'struct rds_sock' object on the stack in order to pass a pair of addresses. This happens to just fit withint the 1024 byte stack size warning limit on x86, but just exceed that limit on ARM, which gives us this warning: net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] As the use of this large variable is basically bogus, we can rearrange the code to not do that. Instead of passing an rds socket into rds_iw_get_device, we now just pass the two addresses that we have available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly, to create two address structures on the stack there. Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Sowmini Varadhan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-12sock: fix possible NULL sk dereference in __skb_tstamp_txWillem de Bruijn1-2/+6
Test that sk != NULL before reading sk->sk_tsflags. Fixes: 49ca0d8bfaf3 ("net-timestamp: no-payload option") Reported-by: One Thousand Gnomes <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-11xps: must clear sender_cpu before forwardingEric Dumazet3-1/+3
John reported that my previous commit added a regression on his router. This is because sender_cpu & napi_id share a common location, so get_xps_queue() can see garbage and perform an out of bound access. We need to make sure sender_cpu is cleared before doing the transmit, otherwise any NIC busy poll enabled (skb_mark_napi_id()) can trigger this bug. Signed-off-by: Eric Dumazet <[email protected]> Reported-by: John <[email protected]> Bisected-by: John <[email protected]> Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices") Signed-off-by: David S. Miller <[email protected]>
2015-03-11net: sysctl_net_core: check SNDBUF and RCVBUF for min lengthAlexey Kodanev1-4/+6
sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be set to incorrect values. Given that 'struct sk_buff' allocates from rcvbuf, incorrectly set buffer length could result to memory allocation failures. For example, set them as follows: # sysctl net.core.rmem_default=64 net.core.wmem_default = 64 # sysctl net.core.wmem_default=64 net.core.wmem_default = 64 # ping localhost -s 1024 -i 0 > /dev/null This could result to the following failure: skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32 head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL> kernel BUG at net/core/skbuff.c:102! invalid opcode: 0000 [#1] SMP ... task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000 RIP: 0010:[<ffffffff8155fbd1>] [<ffffffff8155fbd1>] skb_put+0xa1/0xb0 RSP: 0018:ffff88003ae8bc68 EFLAGS: 00010296 RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000 RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8 RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900 FS: 00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0 Stack: ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00 Call Trace: [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470 [<ffffffff81556f56>] sock_write_iter+0x146/0x160 [<ffffffff811d9612>] new_sync_write+0x92/0xd0 [<ffffffff811d9cd6>] vfs_write+0xd6/0x180 [<ffffffff811da499>] SyS_write+0x59/0xd0 [<ffffffff81651532>] system_call_fastpath+0x12/0x17 Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 RIP [<ffffffff8155fbd1>] skb_put+0xa1/0xb0 RSP <ffff88003ae8bc68> Kernel panic - not syncing: Fatal exception Moreover, the possible minimum is 1, so we can get another kernel panic: ... BUG: unable to handle kernel paging request at ffff88013caee5c0 IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0 ... Signed-off-by: Alexey Kodanev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-11tcp: restore 1.5x per RTT limit to CUBIC cwnd growth in congestion avoidanceNeal Cardwell1-2/+4
Commit 814d488c6126 ("tcp: fix the timid additive increase on stretch ACKs") fixed a bug where tcp_cong_avoid_ai() would either credit a connection with an increase of snd_cwnd_cnt, or increase snd_cwnd, but not both, resulting in cwnd increasing by 1 packet on at most every alternate invocation of tcp_cong_avoid_ai(). Although the commit correctly implemented the CUBIC algorithm, which can increase cwnd by as much as 1 packet per 1 packet ACKed (2x per RTT), in practice that could be too aggressive: in tests on network paths with small buffers, YouTube server retransmission rates nearly doubled. This commit restores CUBIC to a maximum cwnd growth rate of 1 packet per 2 packets ACKed (1.5x per RTT). In YouTube tests this restored retransmit rates to low levels. Testing: This patch has been tested in datacenter netperf transfers and live youtube.com and google.com servers. Fixes: 9cd981dcf174 ("tcp: fix stretch ACK bugs in CUBIC") Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-11tcp: fix tcp_cong_avoid_ai() credit accumulation bug with decreases in wNeal Cardwell1-0/+6
The recent change to tcp_cong_avoid_ai() to handle stretch ACKs introduced a bug where snd_cwnd_cnt could accumulate a very large value while w was large, and then if w was reduced snd_cwnd could be incremented by a large delta, leading to a large burst and high packet loss. This was tickled when CUBIC's bictcp_update() sets "ca->cnt = 100 * cwnd". This bug crept in while preparing the upstream version of 814d488c6126. Testing: This patch has been tested in datacenter netperf transfers and live youtube.com and google.com servers. Fixes: 814d488c6126 ("tcp: fix the timid additive increase on stretch ACKs") Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-03-10net: Handle unregister properly when netdev namespace change fails.David S. Miller1-13/+13
If rtnl_newlink() fails on it's call to dev_change_net_namespace(), we have to make use of the ->dellink() method, if present, just like we do when rtnl_configure_link() fails. Fixes: 317f4810e45e ("rtnl: allow to create device with IFLA_LINK_NETNSID set") Signed-off-by: David S. Miller <[email protected]>
2015-03-10net: add comment for sock_efree() usageOliver Hartkopp1-0/+4
Signed-off-by: Oliver Hartkopp <[email protected]> Acked-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>