aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2019-02-22mac80211: ignore quiet mode in probeSara Sharon1-3/+3
Some buggy APs keep the CSA IE in probes after the channel switch was completed and can silence us for no good reason. Apply quiet mode only from beacons. If there is real channel switch going on, we will see the beacon anyway. Signed-off-by: Sara Sharon <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-02-22mac80211: allow CSA to self with immediate quietSara Sharon1-1/+2
Currently, due to some buggy APs that continue to include CSA IEs after the switch, we ignore CSA to same channel. However, some other APs may do CSA to self in order to have immediate quiet. Allow it. Do it only for beacons. Signed-off-by: Sara Sharon <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-02-22mac80211: notify driver on subsequent CSA beaconsSara Sharon3-14/+70
Some drivers may want to track further the CSA beacons, for example to compensate for buggy APs that change the beacon count or quiet mode during CSA flow. Signed-off-by: Sara Sharon <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-02-22mac80211: fix position of vendor_data readLiad Kaufman1-2/+19
The ieee80211_vendor_radiotap was read from the beginning of the skb->data regardless of the existence of other elements in radiotap that would cause it to move to another position. Fix this by taking into account where it really should be. Signed-off-by: Liad Kaufman <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-02-22mac80211: abort CSA if beacon does not include CSA IEsSara Sharon3-6/+57
In case we receive a beacon without CSA IE while we are in the middle of channel switch - abort the operation. Signed-off-by: Sara Sharon <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-02-22mac80211: support max channel switch time elementSara Sharon4-0/+13
2018 REVmd of the spec introduces the max channel switch time element which is optionally included in beacons/probes when there is a channel switch / extended channel switch element. The value represents the maximum delay between the time the AP transmitted the last beacon in current channel and the expected time of the first beacon in the new channel, in TU. Parse the value and pass it to the driver. Signed-off-by: Sara Sharon <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-02-22cfg80211: Report Association Request frame IEs in association eventsJouni Malinen5-14/+41
This extends the NL80211_CMD_ASSOCIATE event case to report NL80211_ATTR_REQ_IE similarly to what is already done with the NL80211_CMD_CONNECT events if the driver provides this information. In practice, this adds (Re)Association Request frame information element reporting to mac80211 drivers for the cases where user space SME is used. This provides more information for user space to figure out which capabilities were negotiated for the association. For example, this can be used to determine whether HT, VHT, or HE is used. Signed-off-by: Jouni Malinen <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-02-22cfg80211: pmsr: use eth_broadcast_addr() to assign broadcast addressMao Wenan1-1/+1
This patch is to use eth_broadcast_addr() to assign broadcast address insetad of memset(). Signed-off-by: Mao Wenan <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-02-22mac80211: Change default tx_sk_pacing_shift to 7Toke Høiland-Jørgensen1-2/+2
When we did the original tests for the optimal value of sk_pacing_shift, we came up with 6 ms of buffering as the default. Sadly, 6 is not a power of two, so when picking the shift value I erred on the size of less buffering and picked 4 ms instead of 8. This was probably wrong; those 2 ms of extra buffering makes a larger difference than I thought. So, change the default pacing shift to 7, which corresponds to 8 ms of buffering. The point of diminishing returns really kicks in after 8 ms, and so having this as a default should cut down on the need for extensive per-device testing and overrides needed in the drivers. Cc: [email protected] Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-02-21phonet: fix building with clangArnd Bergmann1-16/+16
clang warns about overflowing the data[] member in the struct pnpipehdr: net/phonet/pep.c:295:8: warning: array index 4 is past the end of the array (which contains 1 element) [-Warray-bounds] if (hdr->data[4] == PEP_IND_READY) ^ ~ include/net/phonet/pep.h:66:3: note: array 'data' declared here u8 data[1]; Using a flexible array member at the end of the struct avoids the warning, but since we cannot have a flexible array member inside of the union, each index now has to be moved back by one, which makes it a little uglier. Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Rémi Denis-Courmont <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21Merge branch 'master' of ↵David S. Miller8-44/+44
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2019-02-21 1) Don't do TX bytes accounting for the esp trailer when sending from a request socket as this will result in an out of bounds memory write. From Martin Willi. 2) Destroy xfrm_state synchronously on net exit path to avoid nested gc flush callbacks that may trigger a warning in xfrm6_tunnel_net_exit(). From Cong Wang. 3) Do an unconditionally clone in pfkey_broadcast_one() to avoid a race when freeing the skb. From Sean Tranchetti. 4) Fix inbound traffic via XFRM interfaces across network namespaces. We did the lookup for interfaces and policies in the wrong namespace. From Tobias Brunner. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretapLorenzo Bianconi1-18/+18
Report erspan version field to userspace in ip6gre_fill_info just for erspan_v6 tunnels. Moreover report IFLA_GRE_ERSPAN_INDEX only for erspan version 1. The issue can be triggered with the following reproducer: $ip link add name gre6 type ip6gre local 2001::1 remote 2002::2 $ip link set gre6 up $ip -d link sh gre6 14: grep6@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1448 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/gre6 2001::1 peer 2002::2 promiscuity 0 minmtu 0 maxmtu 0 ip6gre remote 2002::2 local 2001::1 hoplimit 64 encaplimit 4 tclass 0x00 flowlabel 0x00000 erspan_index 0 erspan_ver 0 addrgenmode eui64 Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support") Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: ip_gre: do not report erspan_ver for gre or gretapLorenzo Bianconi1-16/+17
Report erspan version field to userspace in ipgre_fill_info just for erspan tunnels. The issue can be triggered with the following reproducer: $ip link add name gre1 type gre local 192.168.0.1 remote 192.168.1.1 $ip link set dev gre1 up $ip -d link sh gre1 13: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/gre 192.168.0.1 peer 192.168.1.1 promiscuity 0 minmtu 0 maxmtu 0 gre remote 192.168.1.1 local 192.168.0.1 ttl inherit erspan_ver 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre") Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: Get rid of switchdev_port_attr_get()Florian Fainelli1-7/+0
With the bridge no longer calling switchdev_port_attr_get() to obtain the supported bridge port flags from a driver but instead trying to set the bridge port flags directly and relying on driver to reject unsupported configurations, we can effectively get rid of switchdev_port_attr_get() entirely since this was the only place where it was called. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: Remove SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORTFlorian Fainelli1-15/+1
Now that we have converted the bridge code and the drivers to check for bridge port(s) flags at the time we try to set them, there is no need for a get() -> set() sequence anymore and SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT therefore becomes unused. Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: bridge: Stop calling switchdev_port_attr_get()Florian Fainelli1-6/+5
Now that all switchdev drivers have been converted to check the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS flags and report flags that they do not support accordingly, we can migrate the bridge code to try to set that attribute first, check the results and then do the actual setting. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: dsa: Add setter for SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGSFlorian Fainelli3-0/+18
In preparation for removing SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT, add support for a function that processes the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attributes and returns not supported for any flag set, since DSA does not currently support toggling those bridge port attributes (yet). Signed-off-by: Florian Fainelli <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: dsa: enable flooding for bridge portsRussell King1-3/+13
Switches work by learning the MAC address for each attached station by monitoring traffic from each station. When a station sends a packet, the switch records which port the MAC address is connected to. With IPv4 networking, before communication commences with a neighbour, an ARP packet is broadcasted to all stations asking for the MAC address corresponding with the IPv4. The desired station responds with an ARP reply, and the ARP reply causes the switch to learn which port the station is connected to. With IPv6 networking, the situation is rather different. Rather than broadcasting ARP packets, a "neighbour solicitation" is multicasted rather than broadcasted. This multicast needs to reach the intended station in order for the neighbour to be discovered. Once a neighbour has been discovered, and entered into the sending stations neighbour cache, communication can restart at a point later without sending a new neighbour solicitation, even if the entry in the neighbour cache is marked as stale. This can be after the MAC address has expired from the forwarding cache of the DSA switch - when that occurs, there is a long pause in communication. Our DSA implementation for mv88e6xxx switches disables flooding of multicast and unicast frames for bridged ports. As per the above description, this is fine for IPv4 networking, since the broadcasted ARP queries will be sent to and received by all stations on the same network. However, this breaks IPv6 very badly - blocking neighbour solicitations and later causing connections to stall. The defaults that the Linux bridge code expect from bridges are for unknown unicast and unknown multicast frames to be flooded to all ports on the bridge, which is at odds to the defaults adopted by our DSA implementation for mv88e6xxx switches. This commit enables by default flooding of both unknown unicast and unknown multicast frames whenever a port is added to a bridge, and disables the flooding when a port leaves the bridge. This means that mv88e6xxx DSA switches now behave as per the bridge(8) man page, and IPv6 works flawlessly through such a switch. Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: Russell King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: dsa: add support for bridge flagsRussell King3-0/+28
The Linux bridge implementation allows various properties of the bridge to be controlled, such as flooding unknown unicast and multicast frames. This patch adds the necessary DSA infrastructure to allow the Linux bridge support to control these properties for DSA switches. Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: Russell King <[email protected]> [florian: Add missing dp and ds variables declaration to fix build] Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21tipc: improve function tipc_wait_for_rcvmsg()Tung Nguyen1-4/+5
This commit replaces schedule_timeout() with wait_woken() in function tipc_wait_for_rcvmsg(). wait_woken() uses memory barriers in its implementation to avoid potential race condition when putting a process into sleeping state and then waking it up. Acked-by: Ying Xue <[email protected]> Acked-by: Jon Maloy <[email protected]> Signed-off-by: Tung Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21tipc: improve function tipc_wait_for_cond()Tung Nguyen1-1/+1
Commit 844cf763fba6 ("tipc: make macro tipc_wait_for_cond() smp safe") replaced finish_wait() with remove_wait_queue() but still used prepare_to_wait(). This causes unnecessary conditional checking before adding to wait queue in prepare_to_wait(). This commit replaces prepare_to_wait() with add_wait_queue() as the pair function with remove_wait_queue(). Acked-by: Ying Xue <[email protected]> Acked-by: Jon Maloy <[email protected]> Signed-off-by: Tung Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21bridge: remove redundant check on err in br_multicast_ipv4_rcvLi RongQing1-6/+1
br_ip4_multicast_mrd_rcv only return 0 and -ENOMSG, no other negative value Signed-off-by: Li RongQing <[email protected]> Acked-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: remove unneeded switch fall-throughLi RongQing2-2/+0
This case block has been terminated by a return, so not need a switch fall-through Signed-off-by: Li RongQing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net: sched: potential NULL dereference in tcf_block_find()Dan Carpenter1-1/+3
The error code isn't set on this path so it would result in returning ERR_PTR(0) and a NULL dereference in the caller. Fixes: 18d3eefb17cf ("net: sched: refactor tcf_block_find() into standalone functions") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21ipmr: ip6mr: Create new sockopt to clear mfc cache or vifsCallum Sinclair2-54/+99
Currently the only way to clear the forwarding cache was to delete the entries one by one using the MRT_DEL_MFC socket option or to destroy and recreate the socket. Create a new socket option which with the use of optional flags can clear any combination of multicast entries (static or not static) and multicast vifs (static or not static). Calling the new socket option MRT_FLUSH with the flags MRT_FLUSH_MFC and MRT_FLUSH_VIFS will clear all entries and vifs on the socket except for static entries. Signed-off-by: Callum Sinclair <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21devlink: Modify reply of DEVLINK_CMD_HEALTH_REPORTER_GETAya Levin1-2/+4
Avoid sending attributes related to recovery: DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD and DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER in reply to DEVLINK_CMD_HEALTH_REPORTER_GET for a reporter which didn't register a recover operation. These parameters can't be configured on a reporter that did not provide a recover operation, thus not needed to return them. Fixes: 7afe335a8bed ("devlink: Add health get command") Signed-off-by: Aya Levin <[email protected]> Signed-off-by: Eran Ben Elisha <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21devlink: Rename devlink health attributesAya Levin1-2/+2
Rename devlink health attributes for better reflect the attributes use. Add COUNT prefix on error counter attribute and recovery counter attribute. Fixes: 7afe335a8bed ("devlink: Add health get command") Signed-off-by: Aya Levin <[email protected]> Signed-off-by: Eran Ben Elisha <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net/smc: allow PCI IDs as ib device names in the pnet tableHans Wippel1-2/+4
SMC-D devices are identified by their PCI IDs in the pnet table. In order to make usage of the pnet table more consistent for users, this patch adds this form of identification for ib devices as well. Signed-off-by: Hans Wippel <[email protected]> Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net/smc: add pnet table namespace supportHans Wippel4-43/+162
This patch adds namespace support to the pnet table code. Each network namespace gets its own pnet table. Infiniband and smcd device pnetids can only be modified in the initial namespace. In other namespaces they can still be used as if they were set by the underlying hardware. Signed-off-by: Hans Wippel <[email protected]> Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net/smc: add smcd support to the pnet tableHans Wippel1-7/+80
Currently, users can only set pnetids for netdevs and ib devices in the pnet table. This patch adds support for smcd devices to the pnet table. Signed-off-by: Hans Wippel <[email protected]> Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net/smc: rework pnet tableHans Wippel4-194/+249
If a device does not have a pnetid, users can set a temporary pnetid for said device in the pnet table. This patch reworks the pnet table to make it more flexible. Multiple entries with the same pnetid but differing devices are now allowed. Additionally, the netlink interface now sends each mapping from pnetid to device separately to the user while maintaining the message format existing applications might expect. Also, the SMC data structure for ib devices already has a pnetid attribute. So, it is used to store the user defined pnetids. As a result, the pnet table entries are only used for netdevs. Signed-off-by: Hans Wippel <[email protected]> Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net/smc: cleanup for smcr_tx_sndbuf_nonemptyUrsula Braun1-3/+2
Use local variable pflags from the beginning of function smcr_tx_sndbuf_nonempty Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21net/smc: fix smc_poll in SMC_INIT stateUrsula Braun1-3/+3
smc_poll() returns with mask bit EPOLLPRI if the connection urg_state is SMC_URG_VALID. Since SMC_URG_VALID is zero, smc_poll signals EPOLLPRI errorneously if called in state SMC_INIT before the connection is created, for instance in a non-blocking connect scenario. This patch switches to non-zero values for the urg states. Reviewed-by: Karsten Graul <[email protected]> Fixes: de8474eb9d50 ("net/smc: urgent data support") Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21ipv6: route: enforce RCU protection in ip6_route_check_nh_onlink()Paolo Abeni1-1/+5
We need a RCU critical section around rt6_info->from deference, and proper annotation. Fixes: 4ed591c8ab44 ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route") Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21ipv6: route: enforce RCU protection in rt6_update_exception_stamp_rt()Paolo Abeni1-5/+6
We must access rt6_info->from under RCU read lock: move the dereference under such lock, with proper annotation. v1 -> v2: - avoid using multiple, racy, fetch operations for rt->from Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected") Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-21Merge tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-clientLinus Torvalds1-6/+9
Pull ceph fixes from Ilya Dryomov: "Two bug fixes for old issues, both marked for stable" * tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-client: ceph: avoid repeatedly adding inode to mdsc->snap_flush_list libceph: handle an empty authorize reply
2019-02-21Revert "xsk: simplify AF_XDP socket teardown"Björn Töpel1-1/+15
This reverts commit e2ce3674883ecba2605370404208c9d4a07ae1c3. It turns out that the sock destructor xsk_destruct was needed after all. The cleanup simplification broke the skb transmit cleanup path, due to that the umem was prematurely destroyed. The umem cannot be destroyed until all outstanding skbs are freed, which means that we cannot remove the umem until the sk_destruct has been called. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-02-21svcrpc: fix UDP on servers with lots of threadsJ. Bruce Fields1-10/+10
James Pearson found that an NFS server stopped responding to UDP requests if started with more than 1017 threads. sv_max_mesg is about 2^20, so that is probably where the calculation performed by svc_sock_setbufsize(svsk->sk_sock, (serv->sv_nrthreads+3) * serv->sv_max_mesg, (serv->sv_nrthreads+3) * serv->sv_max_mesg); starts to overflow an int. Reported-by: James Pearson <[email protected]> Tested-by: James Pearson <[email protected]> Cc: [email protected] Signed-off-by: J. Bruce Fields <[email protected]>
2019-02-20net_sched: fix a memory leak in cls_tcindexCong Wang1-13/+24
(cherry picked from commit 033b228e7f26b29ae37f8bfa1bc6b209a5365e9f) When tcindex_destroy() destroys all the filter results in the perfect hash table, it invokes the walker to delete each of them. However, results with class==0 are skipped in either tcindex_walk() or tcindex_delete(), which causes a memory leak reported by kmemleak. This patch fixes it by skipping the walker and directly deleting these filter results so we don't miss any filter result. As a result of this change, we have to initialize exts->net properly in tcindex_alloc_perfect_hash(). For net-next, we need to consider whether we should initialize ->net in tcf_exts_init() instead, before that just directly test CONFIG_NET_CLS_ACT=y. Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-20net_sched: fix a race condition in tcindex_destroy()Cong Wang1-7/+11
(cherry picked from commit 8015d93ebd27484418d4952284fd02172fa4b0b2) tcindex_destroy() invokes tcindex_destroy_element() via a walker to delete each filter result in its perfect hash table, and tcindex_destroy_element() calls tcindex_delete() which schedules tcf RCU works to do the final deletion work. Unfortunately this races with the RCU callback __tcindex_destroy(), which could lead to use-after-free as reported by Adrian. Fix this by migrating this RCU callback to tcf RCU work too, as that workqueue is ordered, we will not have use-after-free. Note, we don't need to hold netns refcnt because we don't call tcf_exts_destroy() here. Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter") Reported-by: Adrian <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-20missing barriers in some of unix_sock ->addr and ->path accessesAl Viro2-25/+35
Several u->addr and u->path users are not holding any locks in common with unix_bind(). unix_state_lock() is useless for those purposes. u->addr is assign-once and *(u->addr) is fully set up by the time we set u->addr (all under unix_table_lock). u->path is also set in the same critical area, also before setting u->addr, and any unix_sock with ->path filled will have non-NULL ->addr. So setting ->addr with smp_store_release() is all we need for those "lockless" users - just have them fetch ->addr with smp_load_acquire() and don't even bother looking at ->path if they see NULL ->addr. Users of ->addr and ->path fall into several classes now: 1) ones that do smp_load_acquire(u->addr) and access *(u->addr) and u->path only if smp_load_acquire() has returned non-NULL. 2) places holding unix_table_lock. These are guaranteed that *(u->addr) is seen fully initialized. If unix_sock is in one of the "bound" chains, so's ->path. 3) unix_sock_destructor() using ->addr is safe. All places that set u->addr are guaranteed to have seen all stores *(u->addr) while holding a reference to u and unix_sock_destructor() is called when (atomic) refcount hits zero. 4) unix_release_sock() using ->path is safe. unix_bind() is serialized wrt unix_release() (normally - by struct file refcount), and for the instances that had ->path set by unix_bind() unix_release_sock() comes from unix_release(), so they are fine. Instances that had it set in unix_stream_connect() either end up attached to a socket (in unix_accept()), in which case the call chain to unix_release_sock() and serialization are the same as in the previous case, or they never get accept'ed and unix_release_sock() is called when the listener is shut down and its queue gets purged. In that case the listener's queue lock provides the barriers needed - unix_stream_connect() shoves our unix_sock into listener's queue under that lock right after having set ->path and eventual unix_release_sock() caller picks them from that queue under the same lock right before calling unix_release_sock(). 5) unix_find_other() use of ->path is pointless, but safe - it happens with successful lookup by (abstract) name, so ->path.dentry is guaranteed to be NULL there. earlier-variant-reviewed-by: "Paul E. McKenney" <[email protected]> Signed-off-by: Al Viro <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-20SUNRPC: Remove the redundant 'zerocopy' argument to xs_sendpages()Trond Myklebust1-12/+4
Signed-off-by: Trond Myklebust <[email protected]>
2019-02-20SUNRPC: Further cleanups of xs_sendpages()Trond Myklebust1-10/+1
Now that we send the pages using a struct msghdr, instead of using sendpage(), we no longer need to 'prime the socket' with an address for unconnected UDP messages. Signed-off-by: Trond Myklebust <[email protected]>
2019-02-20SUNRPC: Convert socket page send code to use iov_iter()Trond Myklebust2-36/+14
Simplify the page send code using iov_iter and bvecs. Signed-off-by: Trond Myklebust <[email protected]>
2019-02-20SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()Trond Myklebust1-15/+23
Prepare to the socket transmission code to use iov_iter. Signed-off-by: Trond Myklebust <[email protected]>
2019-02-20SUNRPC: Initiate a connection close on an ESHUTDOWN error in stream receiveTrond Myklebust1-1/+4
If the client stream receive code receives an ESHUTDOWN error either because the server closed the connection, or because it sent a callback which cannot be processed, then we should shut down the connection. Signed-off-by: Trond Myklebust <[email protected]>
2019-02-20SUNRPC: Don't suppress socket errors when a message read completesTrond Myklebust1-5/+2
If the message read completes, but the socket returned an error condition, we should ensure to propagate that error. Signed-off-by: Trond Myklebust <[email protected]>
2019-02-20SUNRPC: Handle zero length fragments correctlyTrond Myklebust1-14/+29
A zero length fragment is really a bug, but let's ensure we don't go nuts when one turns up. Signed-off-by: Trond Myklebust <[email protected]>
2019-02-20SUNRPC: Don't reset the stream record info when the receive worker is runningTrond Myklebust1-3/+9
To ensure that the receive worker has exclusive access to the stream record info, we must not reset the contents other than when holding the transport->recv_mutex. Signed-off-by: Trond Myklebust <[email protected]>
2019-02-20SUNRPC: remove pointless test in unx_match()NeilBrown1-1/+1
As reported by Dan Carpenter, this test for acred->cred being set is inconsistent with the dereference of the pointer a few lines earlier. An 'auth_cred' *always* has ->cred set - every place that creates one initializes this field, often as the first thing done. So remove this test. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>