aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-11-03mac80211: treat bad WMM parameters more gracefullyJohannes Berg1-94/+48
As WMM is required for HT/VHT operation, treat bad WMM parameters more gracefully by falling back to default parameters instead of not using WMM assocation. This makes it possible to still use HT or VHT, although potentially with reduced quality of service due to unintended WMM parameters. Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: fixup AIFSN instead of disabling WMMEmmanuel Grumbach1-7/+7
Disabling WMM has a huge impact these days. It implies that HT and VHT will be disabled which means that the throughput will be drammatically reduced. Since the AIFSN is a transmission parameter, we can play a bit and fix it up to make it compliant with the 802.11 specification which requires it to be at least 2. Increasing it from 1 to 2 will slightly reduce the likelyhood to get a transmission opportunity compared to other clients that would accept to set AIFSN=1, but at least it will allow HT and VHT which is a huge gain. Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: make enable_qos parameter to ieee80211_set_wmm_default()Johannes Berg5-16/+11
The function currently determines this value, for use in bss_info.qos, based on the interface type itself. Make it a parameter instead and set it with the same logic for now. Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: fix crash on mesh local link ID generation with VIFsMatthias Schiffer1-0/+3
llid_in_use needs to be limited to stations of the same VIF, otherwise it will cause a NULL deref as the sta_info of non-mesh-VIFs don't have sta->mesh set. Steps to reproduce: modprobe mac80211_hwsim channels=2 iw phy phy0 interface add ibss0 type ibss iw phy phy0 interface add mesh0 type mp iw phy phy1 interface add ibss1 type ibss iw phy phy1 interface add mesh1 type mp ip link set ibss0 up ip link set mesh0 up ip link set ibss1 up ip link set mesh1 up iw dev ibss0 ibss join foo 2412 iw dev ibss1 ibss join foo 2412 # Ensure that ibss0 and ibss1 are actually associated; I often need to # leave and join the cell on ibss1 a second time. iw dev mesh0 mesh join bar iw dev mesh1 mesh join bar # crash Signed-off-by: Matthias Schiffer <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: TDLS: add proper HT-oper IEArik Nemtsov5-7/+18
When 11n peers performs a TDLS connection on a legacy BSS, the HT operation IE must be specified according to IEEE802.11-2012 section 9.23.3.2. Otherwise HT-protection is compromised and the medium becomes noisy for both the TDLS and the BSS links. Signed-off-by: Arik Nemtsov <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: don't reconfigure sched scan in case of wowlanEliad Peller5-35/+45
Scheduled scan has to be reconfigured only if wowlan wasn't configured, since otherwise it should continue to run (with the 'any' trigger) or be aborted. The current code will end up asking the driver to start a new scheduled scan without stopping the previous one, and leaking some memory (from the previous request.) Fix this by doing the abort/restart under the proper conditions. Signed-off-by: Eliad Peller <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: call drv_stop only if driver is startedEliad Peller3-31/+48
If drv_start() fails during hw_restart, all the running interfaces are being closed/stopped, which results in drv_stop() being called, although the driver was never started successfully. This might cause drivers to perform operations on uninitialized memory (as they assume it was initialized on drv_start) Consider the local->started flag, and call the driver's stop() op only if drv_start() succeeded before. Move drv_start() and drv_stop() to driver-ops.c, as they are no longer simple wrappers. Signed-off-by: Eliad Peller <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: Remove WARN_ON_ONCE in ieee80211_recalc_smpsAndrei Otcheretianski1-1/+7
The recalc_smps work can run after the station disassociates. At this stage we already released the channel, but the work will be cancelled only when the interface stops. In this scenario we can hit the warning in ieee80211_recalc_smps, so just remove it. Signed-off-by: Andrei Otcheretianski <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: use freezable workqueue for restart workEliad Peller2-1/+12
Requesting hw restart during suspend might result in the restart work being executed after mac80211 and the hw are suspended. Solve the race by simply scheduling the restart work on a freezable workqueue. Note that there can be some cases of reconfiguration on resume (besides the hardware restart): * wowlan is not configured - All the interfaces removed were removed on suspend, and drv_stop() was called. At this point the driver shouldn't expect for hw_restart anyway, so we can simply cancel it (on resume). * wowlan is configured, drv_resume() == 1 There is no definitive expected behavior in this case, as each driver might have different expectations (e.g. setting some flags on suspend/restart vs. not handling spurious recovery). For now, simply let the hw_restart work run again after resume, and hope the driver will handle it well (or at least initiate another hw restart). Signed-off-by: Eliad Peller <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: Fix local deauth while associatingAndrei Otcheretianski1-0/+19
Local request to deauthenticate wasn't handled while associating, thus the association could continue even when the user space required to disconnect. Cc: [email protected] Signed-off-by: Andrei Otcheretianski <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: allow null chandef in tracingArik Nemtsov1-5/+5
In TDLS channel-switch operations the chandef can sometimes be NULL. Avoid an oops in the trace code for these cases and just print a chandef full of zeros. Cc: [email protected] Fixes: a7a6bdd0670fe ("mac80211: introduce TDLS channel switch ops") Signed-off-by: Arik Nemtsov <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03nl80211: Fix potential memory leak from parse_acl_dataOla Olsson1-6/+6
If parse_acl_data succeeds but the subsequent parsing of smps attributes fails, there will be a memory leak due to early returns. Fix that by moving the ACL parsing later. Cc: [email protected] Fixes: 18998c381b19b ("cfg80211: allow requesting SMPS mode on ap start") Signed-off-by: Ola Olsson <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-03mac80211: fix divide by zero when NOA update[email protected]1-0/+7
In case of one shot NOA the interval can be 0, catch that instead of potentially (depending on the driver) crashing like this: divide error: 0000 [#1] SMP [...] Call Trace: <IRQ> [<ffffffffc08e891c>] ieee80211_extend_absent_time+0x6c/0xb0 [mac80211] [<ffffffffc08e8a17>] ieee80211_update_p2p_noa+0xb7/0xe0 [mac80211] [<ffffffffc069cc30>] ath9k_p2p_ps_timer+0x170/0x190 [ath9k] [<ffffffffc070adf8>] ath_gen_timer_isr+0xc8/0xf0 [ath9k_hw] [<ffffffffc0691156>] ath9k_tasklet+0x296/0x2f0 [ath9k] [<ffffffff8107ad65>] tasklet_action+0xe5/0xf0 [...] Cc: [email protected] [3.16+, due to d463af4a1c34 using it] Signed-off-by: Janusz Dziedzic <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-11-02net/core: generic support for disabling netdev features down stackJarod Wilson1-0/+50
There are some netdev features, which when disabled on an upper device, such as a bonding master or a bridge, must be disabled and cannot be re-enabled on underlying devices. This is a rework of an earlier more heavy-handed appraoch, which simply disables and prevents re-enabling of netdev features listed in a new define in include/net/netdev_features.h, NETIF_F_UPPER_DISABLES. Any upper device that disables a flag in that feature mask, the disabling will propagate down the stack, and any lower device that has any upper device with one of those flags disabled should not be able to enable said flag. Initially, only LRO is included for proof of concept, and because this code effectively does the same thing as dev_disable_lro(), though it will also activate from the ethtool path, which was one of the goals here. [root@dell-per730-01 ~]# ethtool -k bond0 |grep large large-receive-offload: on [root@dell-per730-01 ~]# ethtool -k p5p1 |grep large large-receive-offload: on [root@dell-per730-01 ~]# ethtool -K bond0 lro off [root@dell-per730-01 ~]# ethtool -k bond0 |grep large large-receive-offload: off [root@dell-per730-01 ~]# ethtool -k p5p1 |grep large large-receive-offload: off dmesg dump: [ 1033.277986] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2. [ 1034.067949] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83 [ 1034.753612] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1. [ 1035.591019] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71 This has been successfully tested with bnx2x, qlcnic and netxen network cards as slaves in a bond interface. Turning LRO on or off on the master also turns it on or off on each of the slaves, new slaves are added with LRO in the same state as the master, and LRO can't be toggled on the slaves. Also, this should largely remove the need for dev_disable_lro(), and most, if not all, of its call sites can be replaced by simply making sure NETIF_F_LRO isn't included in the relevant device's feature flags. Note that this patch is driven by bug reports from users saying it was confusing that bonds and slaves had different settings for the same features, and while it won't be 100% in sync if a lower device doesn't support a feature like LRO, I think this is a good step in the right direction. CC: "David S. Miller" <[email protected]> CC: Eric Dumazet <[email protected]> CC: Jay Vosburgh <[email protected]> CC: Veaceslav Falico <[email protected]> CC: Andy Gospodarek <[email protected]> CC: Jiri Pirko <[email protected]> CC: Nikolay Aleksandrov <[email protected]> CC: Michal Kubecek <[email protected]> CC: Alexander Duyck <[email protected]> CC: [email protected] Signed-off-by: Jarod Wilson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02sit: fix sit0 percpu double allocationsEric Dumazet1-22/+4
sit0 device allocates its percpu storage twice : - One time in ipip6_tunnel_init() - One time in ipip6_fb_tunnel_init() Thus we leak 48 bytes per possible cpu per network namespace dismantle. ipip6_fb_tunnel_init() can be much simpler and does not return an error, and should be called after register_netdev() Note that ipip6_tunnel_clone_6rd() also needs to be called after register_netdev() (calling ipip6_tunnel_init()) Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Cc: Steffen Klassert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02net: fix percpu memory leaksEric Dumazet5-18/+35
This patch fixes following problems : 1) percpu_counter_init() can return an error, therefore init_frag_mem_limit() must propagate this error so that inet_frags_init_net() can do the same up to its callers. 2) If ip[46]_frags_ns_ctl_register() fail, we must unwind properly and free the percpu_counter. Without this fix, we leave freed object in percpu_counters global list (if CONFIG_HOTPLUG_CPU) leading to crashes. This bug was detected by KASAN and syzkaller tool (http://github.com/google/syzkaller) Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02libceph: clear msg->con in ceph_msg_release() onlyIlya Dryomov2-28/+20
The following bit in ceph_msg_revoke_incoming() is unsafe: struct ceph_connection *con = msg->con; if (!con) return; mutex_lock(&con->mutex); <more msg->con use> There is nothing preventing con from getting destroyed right after msg->con test. One easy way to reproduce this is to disable message signing only on the server side and try to map an image. The system will go into a libceph: read_partial_message ffff880073f0ab68 signature check failed libceph: osd0 192.168.255.155:6801 bad crc/signature libceph: read_partial_message ffff880073f0ab68 signature check failed libceph: osd0 192.168.255.155:6801 bad crc/signature loop which has to be interrupted with Ctrl-C. Hit Ctrl-C and you are likely to end up with a random GP fault if the reset handler executes "within" ceph_msg_revoke_incoming(): <yet another reply w/o a signature> ... <Ctrl-C> rbd_obj_request_end ceph_osdc_cancel_request __unregister_request ceph_osdc_put_request ceph_msg_revoke_incoming ... osd_reset __kick_osd_requests __reset_osd remove_osd ceph_con_close reset_connection <clear con->in_msg->con> <put con ref> put_osd <free osd/con> <msg->con use> <-- !!! If ceph_msg_revoke_incoming() executes "before" the reset handler, osd/con will be leaked because ceph_msg_revoke_incoming() clears con->in_msg but doesn't put con ref, while reset_connection() only puts con ref if con->in_msg != NULL. The current msg->con scheme was introduced by commits 38941f8031bf ("libceph: have messages point to their connection") and 92ce034b5a74 ("libceph: have messages take a connection reference"), which defined when messages get associated with a connection and when that association goes away. Part of the problem is that this association is supposed to go away in much too many places; closing this race entirely requires either a rework of the existing or an addition of a new layer of synchronization. In lieu of that, we can make it *much* less likely to hit by disassociating messages only on their destruction and resend through a different connection. This makes the code simpler and is probably a good thing to do regardless - this patch adds a msg_con_set() helper which is is called from only three places: ceph_con_send() and ceph_con_in_msg_alloc() to set msg->con and ceph_msg_release() to clear it. Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02libceph: add nocephx_sign_messages optionIlya Dryomov3-1/+20
Support for message signing was merged into 3.19, along with nocephx_require_signatures option. But, all that option does is allow the kernel client to talk to clusters that don't support MSG_AUTH feature bit. That's pretty useless, given that it's been supported since bobtail. Meanwhile, if one disables message signing on the server side with "cephx sign messages = false", it becomes impossible to use the kernel client since it expects messages to be signed if MSG_AUTH was negotiated. Add nocephx_sign_messages option to support this use case. Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02libceph: stop duplicating client fields in messengerIlya Dryomov2-22/+10
supported_features and required_features serve no purpose at all, while nocrc and tcp_nodelay belong to ceph_options::flags. Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02libceph: drop authorizer check from cephx msg signing routinesIlya Dryomov1-4/+1
I don't see a way for auth->authorizer to be NULL in ceph_x_sign_message() or ceph_x_check_message_signature(). Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02libceph: msg signing callouts don't need con argumentIlya Dryomov2-8/+10
We can use msg->con instead - at the point we sign an outgoing message or check the signature on the incoming one, msg->con is always set. We wouldn't know how to sign a message without an associated session (i.e. msg->con == NULL) and being able to sign a message using an explicitly provided authorizer is of no use. Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02libceph: evaluate osd_req_op_data() arguments only onceIoana Ciornei1-5/+7
This patch changes the osd_req_op_data() macro to not evaluate arguments more than once in order to follow the kernel coding style. Signed-off-by: Ioana Ciornei <[email protected]> Reviewed-by: Alex Elder <[email protected]> [[email protected]: changelog, formatting] Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02libceph: introduce ceph_x_authorizer_cleanup()Ilya Dryomov2-12/+20
Commit ae385eaf24dc ("libceph: store session key in cephx authorizer") introduced ceph_x_authorizer::session_key, but didn't update all the exit/error paths. Introduce ceph_x_authorizer_cleanup() to encapsulate ceph_x_authorizer cleanup and switch to it. This fixes ceph_x_destroy(), which currently always leaks key and ceph_x_build_authorizer() error paths. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Yan, Zheng <[email protected]>
2015-11-02libceph: use local variable cursor instead of &msg->cursorShraddha Barke1-6/+5
Use local variable cursor in place of &msg->cursor in read_partial_msg_data() and write_partial_msg_data(). Signed-off-by: Shraddha Barke <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02libceph: remove con argument in handle_reply()Shraddha Barke1-3/+2
Since handle_reply() does not use its con argument, remove it. Signed-off-by: Shraddha Barke <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2015-11-02Merge tag 'nfs-rdma-4.4-2' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust14-317/+923
NFS: NFSoRDMA Client Side Changes In addition to a variety of bugfixes, these patches are mostly geared at enabling both swap and backchannel support to the NFS over RDMA client. Signed-off-by: Anna Schumake <[email protected]>
2015-11-02ipv6: fix crash on ICMPv6 redirects with prohibited/blackholed sourceMatthias Schiffer1-2/+1
There are other error values besides ip6_null_entry that can be returned by ip6_route_redirect(): fib6_rule_action() can also result in ip6_blk_hole_entry and ip6_prohibit_entry if such ip rules are installed. Only checking for ip6_null_entry in rt6_do_redirect() causes ip6_ins_rt() to be called with rt->rt6i_table == NULL in these cases, making the kernel crash. Signed-off-by: Matthias Schiffer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02NFS: Enable client side NFSv4.1 backchannel to use other transportsChuck Lever4-0/+35
Forechannel transports get their own "bc_up" method to create an endpoint for the backchannel service. Signed-off-by: Chuck Lever <[email protected]> [Anna Schumaker: Add forward declaration of struct net to xprt.h] Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02net: make skb_set_owner_w() more robustEric Dumazet2-3/+23
skb_set_owner_w() is called from various places that assume skb->sk always point to a full blown socket (as it changes sk->sk_wmem_alloc) We'd like to attach skb to request sockets, and in the future to timewait sockets as well. For these kind of pseudo sockets, we need to take a traditional refcount and use sock_edemux() as the destructor. It is now time to un-inline skb_set_owner_w(), being too big. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <[email protected]> Bisected-by: Haiyang Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02bridge: vlan: Use rcu_dereference instead of rtnl_dereferenceIdo Schimmel1-1/+1
br_should_learn() is protected by RCU and not by RTNL, so use correct flavor of nbp_vlan_group(). Fixes: 907b1e6e83ed ("bridge: vlan: use proper rcu for the vlgrp member") Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() ↵Ani Sinha1-3/+3
in preemptible context. Fixes the following kernel BUG : BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758 caller is __this_cpu_preempt_check+0x13/0x15 CPU: 0 PID: 2758 Comm: bash Tainted: P O 3.18.19 #2 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8 Call Trace: [<ffffffff81482b2a>] dump_stack+0x52/0x80 [<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1 [<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15 [<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c [<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e [<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810e6974>] ? pollwake+0x4d/0x51 [<ffffffff81058ac0>] ? default_wake_function+0x0/0xf [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810613d9>] ? __wake_up_common+0x45/0x77 [<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32 [<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53 [<ffffffff8139a519>] ? sock_def_readable+0x71/0x75 [<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55 [<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41 [<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86 [<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d [<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b [<ffffffff810d5738>] ? new_sync_read+0x82/0xaa [<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99 [<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32 [<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f [<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf [<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3 [<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e Signed-off-by: Ani Sinha <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02bridge: vlan: Use correct flag name in commentIdo Schimmel1-3/+3
The flag used to indicate if a VLAN should be used for filtering - as opposed to context only - on the bridge itself (e.g. br0) is called 'brentry' and not 'brvlan'. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02bridge: vlan: Prevent possible use-after-freeIdo Schimmel1-0/+2
When adding a port to a bridge we initialize VLAN filtering on it. We do not bail out in case an error occurred in nbp_vlan_init, as it can be used as a non VLAN filtering bridge. However, if VLAN filtering is required and an error occurred in nbp_vlan_init, we should set vlgrp to NULL, so that VLAN filtering functions (e.g. br_vlan_find, br_get_pvid) will know the struct is invalid and will not try to access it. Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02tcp/dccp: fix ireq->pktopts raceEric Dumazet2-17/+17
IPv6 request sockets store a pointer to skb containing the SYN packet to be able to transfer it to full blown socket when 3WHS is done (ireq->pktopts -> np->pktoptions) As explained in commit 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions"), we must transfer the skb only if we won the hashdance race, if multiple cpus receive the 'ack' packet completing 3WHS at the same time. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02RDS: convert bind hash table to re-sizable hashtable[email protected]3-86/+56
To further improve the RDS connection scalabilty on massive systems where number of sockets grows into tens of thousands of sockets, there is a need of larger bind hashtable. Pre-allocated 8K or 16K table is not very flexible in terms of memory utilisation. The rhashtable infrastructure gives us the flexibility to grow the hashtbable based on use and also comes up with inbuilt efficient bucket(chain) handling. Reviewed-by: David Miller <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02net: rds: changing the return type from int to voidSaurabh Sengar1-4/+2
as result of function rds_iw_flush_mr_pool is nowhere checked, changing its return type from int to void. also removing the unused variable rc as there is nothing to return Signed-off-by: Saurabh Sengar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02ipv4: use l4 hash for locally generated multipath flowsPaolo Abeni1-1/+2
This patch changes how the multipath hash is computed for locally generated flows: now the hash comprises l4 information. This allows better utilization of the available paths when the existing flows have the same source IP and the same destination IP: with l3 hash, even when multiple connections are in place simultaneously, a single path will be used, while with l4 hash we can use all the available paths. v2 changes: - use get_hash_from_flowi4() instead of implementing just another l4 hash function Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02SUNRPC: Remove the TCP-only restriction in bc_svc_process()Chuck Lever1-5/+0
Allow the use of other transport classes when handling a backward direction RPC call. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02svcrdma: Add backward direction service for RPC/RDMA transportChuck Lever2-0/+64
On NFSv4.1 mount points, the Linux NFS client uses this transport endpoint to receive backward direction calls and route replies back to the NFSv4.1 server. Signed-off-by: Chuck Lever <[email protected]> Acked-by: "J. Bruce Fields" <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Handle incoming backward direction RPC callsChuck Lever3-0/+161
Introduce a code path in the rpcrdma_reply_handler() to catch incoming backward direction RPC calls and route them to the ULP's backchannel server. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Add support for sending backward direction RPC repliesChuck Lever3-0/+51
Backward direction RPC replies are sent via the client transport's send_request method, the same way forward direction RPC calls are sent. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Pre-allocate Work Requests for backchannelChuck Lever3-2/+26
Pre-allocate extra send and receive Work Requests needed to handle backchannel receives and sends. The transport doesn't know how many extra WRs to pre-allocate until the xprt_setup_backchannel() call, but that's long after the WRs are allocated during forechannel setup. So, use a fixed value for now. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffersChuck Lever5-12/+309
xprtrdma's backward direction send and receive buffers are the same size as the forechannel's inline threshold, and must be pre- registered. The consumer has no control over which receive buffer the adapter chooses to catch an incoming backwards-direction call. Any receive buffer can be used for either a forward reply or a backward call. Thus both types of RPC message must all be the same size. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02SUNRPC: Abstract backchannel operationsChuck Lever2-2/+27
xprt_{setup,destroy}_backchannel() won't be adequate for RPC/RMDA bi-direction. In particular, receive buffers have to be pre- registered and posted in order to receive incoming backchannel requests. Add a virtual function call to allow the insertion of appropriate backchannel setup and destruction methods for each transport. In addition, freeing a backchannel request is a little different for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Saving IRQs no longer needed for rb_lockChuck Lever1-14/+10
Now that RPC replies are processed in a workqueue, there's no need to disable IRQs when managing send and receive buffers. This saves noticeable overhead per RPC. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Remove reply taskletChuck Lever1-43/+0
Clean up: The reply tasklet is no longer used. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Use workqueue to process RPC/RDMA repliesChuck Lever4-18/+65
The reply tasklet is fast, but it's single threaded. After reply traffic saturates a single CPU, there's no more reply processing capacity. Replace the tasklet with a workqueue to spread reply handling across all CPUs. This also moves RPC/RDMA reply handling out of the soft IRQ context and into a context that allows sleeps. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Replace send and receive arraysChuck Lever2-91/+73
The rb_send_bufs and rb_recv_bufs arrays are used to implement a pair of stacks for keeping track of free rpcrdma_req and rpcrdma_rep structs. Replace those arrays with free lists. To allow more than 512 RPCs in-flight at once, each of these arrays would be larger than a page (assuming 8-byte addresses and 4KB pages). Allowing up to 64K in-flight RPCs (as TCP now does), each buffer array would have to be 128 pages. That's an order-6 allocation. (Not that we're going there.) A list is easier to expand dynamically. Instead of allocating a larger array of pointers and copying the existing pointers to the new array, simply append more buffers to each list. This also makes it simpler to manage receive buffers that might catch backwards-direction calls, or to post receive buffers in bulk to amortize the overhead of ib_post_recv. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Devesh Sharma <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Refactor reply handler error handlingChuck Lever3-40/+53
Clean up: The error cases in rpcrdma_reply_handler() almost never execute. Ensure the compiler places them out of the hot path. No behavior change expected. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Devesh Sharma <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2015-11-02xprtrdma: Prevent loss of completion signalsChuck Lever2-41/+38
Commit 8301a2c047cc ("xprtrdma: Limit work done by completion handler") was supposed to prevent xprtrdma's upcall handlers from starving other softIRQ work by letting them return to the provider before all CQEs have been polled. The logic assumes the provider will call the upcall handler again immediately if the CQ is re-armed while there are still queued CQEs. This assumption is invalid. The IBTA spec says that after a CQ is armed, the hardware must interrupt only when a new CQE is inserted. xprtrdma can't rely on the provider calling again, even though some providers do. Therefore, leaving CQEs on queue makes sense only when there is another mechanism that ensures all remaining CQEs are consumed in a timely fashion. xprtrdma does not have such a mechanism. If a CQE remains queued, the transport can wait forever to send the next RPC. Finally, move the wcs array back onto the stack to ensure that the poll array is always local to the CPU where the completion upcall is running. Fixes: 8301a2c047cc ("xprtrdma: Limit work done by completion ...") Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Devesh Sharma <[email protected]> Tested-By: Devesh Sharma <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>