aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2019-08-03netfilter: conntrack: use shared sysctl constantsMatteo Croce1-18/+16
Use shared sysctl variables for zero and one constants, as in commit eec4844fae7c ("proc/sysctl: add shared variables for range check") Fixes: 8f14c99c7eda ("netfilter: conntrack: limit sysctl setting for boolean options") Signed-off-by: Matteo Croce <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-08-03netfilter: synproxy: rename mss synproxy_options fieldFernando Fernandez Mancera4-10/+10
After introduce "mss_encode" field in the synproxy_options struct the field "mss" is a little confusing. It has been renamed to "mss_option". Signed-off-by: Fernando Fernandez Mancera <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-08-02hv_sock: Fix hang when a connection is closedDexuan Cui1-0/+8
There is a race condition for an established connection that is being closed by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the 'remove_sock' is false): 1 for the initial value; 1 for the sk being in the bound list; 1 for the sk being in the connected list; 1 for the delayed close_work. After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may* decrease the refcnt to 3. Concurrently, hvs_close_connection() runs in another thread: calls vsock_remove_sock() to decrease the refcnt by 2; call sock_put() to decrease the refcnt to 0, and free the sk; next, the "release_sock(sk)" may hang due to use-after-free. In the above, after hvs_release() finishes, if hvs_close_connection() runs faster than "__vsock_release() -> sock_put(sk)", then there is not any issue, because at the beginning of hvs_close_connection(), the refcnt is still 4. The issue can be resolved if an extra reference is taken when the connection is established. Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close") Signed-off-by: Dexuan Cui <[email protected]> Reviewed-by: Sunil Muthuswamy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-08-01tipc: reduce risk of wakeup queue starvationJon Maloy1-8/+21
In commit 365ad353c256 ("tipc: reduce risk of user starvation during link congestion") we allowed senders to add exactly one list of extra buffers to the link backlog queues during link congestion (aka "oversubscription"). However, the criteria for when to stop adding wakeup messages to the input queue when the overload abates is inaccurate, and may cause starvation problems during very high load. Currently, we stop adding wakeup messages after 10 total failed attempts where we find that there is no space left in the backlog queue for a certain importance level. The counter for this is accumulated across all levels, which may lead the algorithm to leave the loop prematurely, although there may still be plenty of space available at some levels. The result is sometimes that messages near the wakeup queue tail are not added to the input queue as they should be. We now introduce a more exact algorithm, where we keep adding wakeup messages to a level as long as the backlog queue has free slots for the corresponding level, and stop at the moment there are no more such slots or when there are no more wakeup messages to dequeue. Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion") Reported-by: Tung Nguyen <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-08-01tipc: compat: allow tipc commands without argumentsTaras Kondratiuk1-4/+7
Commit 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit") broke older tipc tools that use compat interface (e.g. tipc-config from tipcutils package): % tipc-config -p operation not supported The commit started to reject TIPC netlink compat messages that do not have attributes. It is too restrictive because some of such messages are valid (they don't need any arguments): % grep 'tx none' include/uapi/linux/tipc_config.h #define TIPC_CMD_NOOP 0x0000 /* tx none, rx none */ #define TIPC_CMD_GET_MEDIA_NAMES 0x0002 /* tx none, rx media_name(s) */ #define TIPC_CMD_GET_BEARER_NAMES 0x0003 /* tx none, rx bearer_name(s) */ #define TIPC_CMD_SHOW_PORTS 0x0006 /* tx none, rx ultra_string */ #define TIPC_CMD_GET_REMOTE_MNG 0x4003 /* tx none, rx unsigned */ #define TIPC_CMD_GET_MAX_PORTS 0x4004 /* tx none, rx unsigned */ #define TIPC_CMD_GET_NETID 0x400B /* tx none, rx unsigned */ #define TIPC_CMD_NOT_NET_ADMIN 0xC001 /* tx none, rx none */ This patch relaxes the original fix and rejects messages without arguments only if such arguments are expected by a command (reg_type is non zero). Fixes: 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit") Cc: [email protected] Signed-off-by: Taras Kondratiuk <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-08-01hrtimer/treewide: Use hrtimer_sleeper_start_expires()Thomas Gleixner1-1/+1
hrtimer_sleepers will gain a scheduling class dependent treatment on PREEMPT_RT. Use the new hrtimer_sleeper_start_expires() function to make that possible. Signed-off-by: Thomas Gleixner <[email protected]>
2019-08-01hrtimer: Consolidate hrtimer_init() + hrtimer_init_sleeper() callsSebastian Andrzej Siewior1-3/+1
hrtimer_init_sleeper() calls require prior initialisation of the hrtimer object which is embedded into the hrtimer_sleeper. Combine the initialization and spare a function call. Fixup all call sites. This is also a preparatory change for PREEMPT_RT to do hrtimer sleeper specific initializations of the embedded hrtimer without modifying any of the call sites. No functional change. [ anna-maria: Minor cleanups ] [ tglx: Adopted to the removal of the task argument of hrtimer_init_sleeper() and trivial polishing. Folded a fix from Stephen Rothwell for the vsoc code ] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-07-31net: bridge: mcast: add delete due to fast-leave mdb flagNikolay Aleksandrov3-1/+4
In user-space there's no way to distinguish why an mdb entry was deleted and that is a problem for daemons which would like to keep the mdb in sync with remote ends (e.g. mlag) but would also like to converge faster. In almost all cases we'd like to age-out the remote entry for performance and convergence reasons except when fast-leave is enabled. In that case we want explicit immediate remote delete, thus add mdb flag which is set only when the entry is being deleted due to fast-leave. Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-31net: bridge: mcast: don't delete permanent entries when fast leave is enabledNikolay Aleksandrov1-0/+3
When permanent entries were introduced by the commit below, they were exempt from timing out and thus igmp leave wouldn't affect them unless fast leave was enabled on the port which was added before permanent entries existed. It shouldn't matter if fast leave is enabled or not if the user added a permanent entry it shouldn't be deleted on igmp leave. Before: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show $ After: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent Fixes: ccb1c31a7a87 ("bridge: add flags to distinguish permanent mdb entires") Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-31Merge tag 'mac80211-next-for-davem-2019-07-31' of ↵David S. Miller27-318/+891
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a reasonably large number of changes: * lots more HE (802.11ax) support, particularly things relevant for the the AP side, but also mesh support * debugfs cleanups from Greg * some more work on extended key ID * start using genl parallel_ops, as preparation for weaning ourselves off RTNL and getting parallelism * various other changes all over ==================== Signed-off-by: David S. Miller <[email protected]>
2019-07-31Merge tag 'mac80211-for-davem-2019-07-31' of ↵David S. Miller6-14/+41
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just a few fixes: * revert NETIF_F_LLTX usage as it caused problems * avoid warning on WMM parameters from AP that are too short * fix possible null-ptr dereference in hwsim * fix interface combinations with 4-addr and crypto control ==================== Signed-off-by: David S. Miller <[email protected]>
2019-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller6-39/+29
Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains Netfilter fixes for your net tree: 1) memleak in ebtables from the error path for the 32/64 compat layer, from Florian Westphal. 2) Fix inverted meta ifname/ifidx matching when no interface is set on either from the input/output path, from Phil Sutter. 3) Remove goto label in nft_meta_bridge, also from Phil. 4) Missing include guard in xt_connlabel, from Masahiro Yamada. 5) Two patch to fix ipset destination MAC matching coming from Stephano Brivio, via Jozsef Kadlecsik. 6) Fix set rename and listing concurrency problem, from Shijie Luo. Patch also coming via Jozsef Kadlecsik. 7) ebtables 32/64 compat missing base chain policy in rule count, from Florian Westphal. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-07-31mac80211: HE STA disassoc due to QOS NULL not sentShay Bar1-1/+4
In case of HE AP-STA link, ieee80211_send_nullfunc() will not send the QOS NULL packet to check if AP is still associated. In this case, probe_send_count will be non-zero and ieee80211_sta_work() will later disassociate the AP, even though no packet was ever sent. Fix this by decrementing probe_send_count and not calling ieee80211_send_nullfunc() in case of HE link, so that we still wait for some time for the AP beacon to reappear and don't disconnect right away. Signed-off-by: Shay Bar <[email protected]> Link: https://lore.kernel.org/r/[email protected] [clarify commit message] Signed-off-by: Johannes Berg <[email protected]>
2019-07-31mac80211: allow setting spatial reuse parameters from bss_confJohn Crispin4-1/+32
Store the OBSS PD parameters inside bss_conf when bringing up an AP and/or when a station connects to an AP. This allows the driver to configure the HW accordingly. Signed-off-by: John Crispin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-07-31nl80211: add strict start typeJohannes Berg1-0/+1
Add a strict start type so all new attributes starting from NL80211_ATTR_HE_OBSS_PD are validated strictly. Signed-off-by: Johannes Berg <[email protected]>
2019-07-31cfg80211: add support for parsing OBBS_PD attributesJohn Crispin1-0/+45
Add the data structure, policy and parsing code allowing userland to send the OBSS PD information into the kernel. Signed-off-by: John Crispin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-07-31mac80211: reject zero MAC address in add stationKarthikeyan Periyasamy1-1/+1
This came up in fuzz testing, and really we don't consider all-zeroes to be a valid MAC address in most places, so also reject it here to avoid confusion later on. Signed-off-by: Karthikeyan Periyasamy <[email protected]> Link: https://lore.kernel.org/r/[email protected] [rewrite commit message] Signed-off-by: Johannes Berg <[email protected]>
2019-07-31cfg80211: use parallel_ops for genlJohannes Berg1-30/+78
Over time, we really need to get rid of all of our global locking. One of the things needed is to use parallel_ops. This isn't really the most important (RTNL is much more important) but OTOH we just keep adding uses of genl_family_attrbuf() now. Use .parallel_ops to disallow this. Reviewed-By: Denis Kenzior <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-07-31mac80211: add missing null return check from call to ieee80211_get_sbandColin Ian King1-0/+2
The return from ieee80211_get_sband can potentially be a null pointer, so it seems prudent to add a null check to avoid a null pointer dereference on sband. Addresses-Coverity: ("Dereference null return") Fixes: 2ab45876756f ("mac80211: add support for the ADDBA extension element") Signed-off-by: Colin Ian King <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-07-30bpf: add bpf_tcp_gen_syncookie helperPetar Penkov1-0/+73
This helper function allows BPF programs to try to generate SYN cookies, given a reference to a listener socket. The function works from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a socket in both cases. Signed-off-by: Petar Penkov <[email protected]> Suggested-by: Eric Dumazet <[email protected]> Reviewed-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-07-30tcp: add skb-less helpers to retrieve SYN cookiePetar Penkov3-0/+103
This patch allows generation of a SYN cookie before an SKB has been allocated, as is the case at XDP. Signed-off-by: Petar Penkov <[email protected]> Reviewed-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-07-30tcp: tcp_syn_flood_action read port from socketPetar Penkov1-5/+3
This allows us to call this function before an SKB has been allocated. Signed-off-by: Petar Penkov <[email protected]> Reviewed-by: Lorenz Bauer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-07-30net: dsa: ksz: Add KSZ8795 tag codeTristram Ha1-0/+62
Add DSA tag code for Microchip KSZ8795 switch. The switch is simpler and the tag is only 1 byte, instead of 2 as is the case with KSZ9477. Signed-off-by: Tristram Ha <[email protected]> Signed-off-by: Marek Vasut <[email protected]> Cc: Andrew Lunn <[email protected]> Cc: David S. Miller <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: Tristram Ha <[email protected]> Cc: Vivien Didelot <[email protected]> Cc: Woojung Huh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30vsock/virtio: change the maximum packet size allowedStefano Garzarella1-2/+2
Since now we are able to split packets, we can avoid limiting their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE. Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size. Signed-off-by: Stefano Garzarella <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30vhost/vsock: split packets to send using multiple buffersStefano Garzarella1-3/+12
If the packets to sent to the guest are bigger than the buffer available, we can split them, using multiple buffers and fixing the length in the packet header. This is safe since virtio-vsock supports only stream sockets. Signed-off-by: Stefano Garzarella <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()Stefano Garzarella1-2/+2
fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use the same spinlock also if we are in the TX path. Move also buf_alloc under the same lock. Signed-off-by: Stefano Garzarella <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30vsock/virtio: reduce credit update messagesStefano Garzarella1-3/+13
In order to reduce the number of credit update messages, we send them only when the space available seen by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. Signed-off-by: Stefano Garzarella <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30vsock/virtio: limit the memory used per-socketStefano Garzarella2-9/+52
Since virtio-vsock was introduced, the buffers filled by the host and pushed to the guest using the vring, are directly queued in a per-socket list. These buffers are preallocated by the guest with a fixed size (4 KB). The maximum amount of memory used by each socket should be controlled by the credit mechanism. The default credit available per-socket is 256 KB, but if we use only 1 byte per packet, the guest can queue up to 262144 of 4 KB buffers, using up to 1 GB of memory per-socket. In addition, the guest will continue to fill the vring with new 4 KB free buffers to avoid starvation of other sockets. This patch mitigates this issue copying the payload of small packets (< 128 bytes) into the buffer of last packet queued, in order to avoid wasting memory. Signed-off-by: Stefano Garzarella <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30hrtimer: Remove task argument from hrtimer_init_sleeper()Thomas Gleixner1-1/+1
All callers hand in 'current' and that's the only task pointer which actually makes sense. Remove the task argument and set current in the function. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-07-30compat_ioctl: pppoe: fix PPPOEIOCSFWD handlingArnd Bergmann1-0/+3
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in linux-2.5.69 along with hundreds of other commands, but was always broken sincen only the structure is compatible, but the command number is not, due to the size being sizeof(size_t), or at first sizeof(sizeof((struct sockaddr_pppox)), which is different on 64-bit architectures. Guillaume Nault adds: And the implementation was broken until 2016 (see 29e73269aa4d ("pppoe: fix reference counting in PPPoE proxy")), and nobody ever noticed. I should probably have removed this ioctl entirely instead of fixing it. Clearly, it has never been used. Fix it by adding a compat_ioctl handler for all pppoe variants that translates the command number and then calls the regular ioctl function. All other ioctl commands handled by pppoe are compatible between 32-bit and 64-bit, and require compat_ptr() conversion. This should apply to all stable kernels. Acked-by: Guillaume Nault <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30tipc: fix unitilized skb list crashJon Maloy1-2/+1
Our test suite somtimes provokes the following crash: Description of problem: [ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8 [ 1092.605072] PGD 0 P4D 0 [ 1092.607620] Oops: 0000 [#1] SMP PTI [ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1 [ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018 [ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc] [ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7 [ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282 [ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045 [ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000 [ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a [ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000 [ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a [ 1092.692614] FS: 0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000 [ 1092.700702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0 [ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1092.727843] PKRU: 55555554 [ 1092.730556] Call Trace: [ 1092.733010] <IRQ> [ 1092.735034] tipc_sk_filter_rcv+0x7ca/0xb80 [tipc] [ 1092.739828] ? __kmalloc_node_track_caller+0x1cb/0x290 [ 1092.744974] ? dev_hard_start_xmit+0xa5/0x210 [ 1092.749332] tipc_sk_rcv+0x389/0x640 [tipc] [ 1092.753519] tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc] [ 1092.758224] tipc_rcv+0x57a/0xf20 [tipc] [ 1092.762154] ? ktime_get_real_ts64+0x40/0xe0 [ 1092.766432] ? tpacket_rcv+0x50/0x9f0 [ 1092.770098] tipc_l2_rcv_msg+0x4a/0x70 [tipc] [ 1092.774452] __netif_receive_skb_core+0xb62/0xbd0 [ 1092.779164] ? enqueue_entity+0xf6/0x630 [ 1092.783084] ? kmem_cache_alloc+0x158/0x1c0 [ 1092.787272] ? __build_skb+0x25/0xd0 [ 1092.790849] netif_receive_skb_internal+0x42/0xf0 [ 1092.795557] napi_gro_receive+0xba/0xe0 [ 1092.799417] mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core] [ 1092.804564] mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core] [ 1092.809536] mlx5e_napi_poll+0xb2/0xce0 [mlx5_core] [ 1092.814415] ? __wake_up_common_lock+0x89/0xc0 [ 1092.818861] net_rx_action+0x149/0x3b0 [ 1092.822616] __do_softirq+0xe3/0x30a [ 1092.826193] irq_exit+0x100/0x110 [ 1092.829512] do_IRQ+0x85/0xd0 [ 1092.832483] common_interrupt+0xf/0xf [ 1092.836147] </IRQ> [ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0 [ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f [ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd [ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f [ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000 [ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940 [ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78 [ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000 [ 1092.905196] ? cpuidle_enter_state+0x92/0x2a0 [ 1092.909555] do_idle+0x236/0x280 [ 1092.912785] cpu_startup_entry+0x6f/0x80 [ 1092.916715] start_secondary+0x1a7/0x200 [ 1092.920642] secondary_startup_64+0xb7/0xc0 [...] The reason is that the skb list tipc_socket::mc_method.deferredq only is initialized for connectionless sockets, while nothing stops arriving multicast messages from being filtered by connection oriented sockets, with subsequent access to the said list. We fix this by initializing the list unconditionally at socket creation. This eliminates the crash, while the message still is dropped further down in tipc_sk_filter_rcv() as it should be. Reported-by: Li Shuang <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Reviewed-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30net: Use skb_frag_off accessorsJonathan Lemon11-44/+46
Use accessor functions for skb fragment's page_offset instead of direct references, in preparation for bvec conversion. Signed-off-by: Jonathan Lemon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30sctp: factor out sctp_connect_add_peerXin Long1-45/+31
In this function factored out from sctp_sendmsg_new_asoc() and __sctp_connect(), it adds a peer with the other addr into the asoc after this asoc is created with the 1st addr. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30sctp: factor out sctp_connect_new_asocXin Long1-84/+76
In this function factored out from sctp_sendmsg_new_asoc() and __sctp_connect(), it creates the asoc and adds a peer with the 1st addr. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30sctp: clean up __sctp_connectXin Long1-136/+73
__sctp_connect is doing quit similar things as sctp_sendmsg_new_asoc. To factor out common functions, this patch is to clean up their code to make them look more similar: 1. create the asoc and add a peer with the 1st addr. 2. add peers with the other addrs into this asoc one by one. while at it, also remove the unused 'addrcnt'. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30sctp: check addr_size with sa_family_t size in __sctp_setsockopt_connectxXin Long1-1/+2
Now __sctp_connect() is called by __sctp_setsockopt_connectx() and sctp_inet_connect(), the latter has done addr_size check with size of sa_family_t. In the next patch to clean up __sctp_connect(), we will remove addr_size check with size of sa_family_t from __sctp_connect() for the 1st address. So before doing that, __sctp_setsockopt_connectx() should do this check first, as sctp_inet_connect() does. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30sctp: only copy the available addr data in sctp_transport_initXin Long1-1/+1
'addr' passed to sctp_transport_init is not always a whole size of union sctp_addr, like the path: sctp_sendmsg() -> sctp_sendmsg_new_asoc() -> sctp_assoc_add_peer() -> sctp_transport_new() -> sctp_transport_init() In the next patches, we will also pass the address length of data only to sctp_assoc_add_peer(). So sctp_transport_init() should copy the only available data from addr to peer->ipaddr, instead of 'peer->ipaddr = *addr' which may cause slab-out-of-bounds. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30rxrpc: Fix -Wframe-larger-than= warnings from on-stack cryptoDavid Howells4-20/+96
rxkad sometimes triggers a warning about oversized stack frames when building with clang for a 32-bit architecture: net/rxrpc/rxkad.c:243:12: error: stack frame size of 1088 bytes in function 'rxkad_secure_packet' [-Werror,-Wframe-larger-than=] net/rxrpc/rxkad.c:501:12: error: stack frame size of 1088 bytes in function 'rxkad_verify_packet' [-Werror,-Wframe-larger-than=] The problem is the combination of SYNC_SKCIPHER_REQUEST_ON_STACK() in rxkad_verify_packet()/rxkad_secure_packet() with the relatively large scatterlist in rxkad_verify_packet_1()/rxkad_secure_packet_encrypt(). The warning does not show up when using gcc, which does not inline the functions as aggressively, but the problem is still the same. Allocate the cipher buffers from the slab instead, caching the allocated packet crypto request memory used for DATA packet crypto in the rxrpc_call struct. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Arnd Bergmann <[email protected]> Signed-off-by: David Howells <[email protected]> Acked-by: Arnd Bergmann <[email protected]> cc: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-30SUNRPC: Track writers of the 'channel' file to improve cache_listeners_existDave Wysochanski1-4/+8
The sunrpc cache interface is susceptible to being fooled by a rogue process just reading a 'channel' file. If this happens the kernel may think a valid daemon exists to service the cache when it does not. For example, the following may fool the kernel: cat /proc/net/rpc/auth.unix.gid/channel Change the tracking of readers to writers when considering whether a listener exists as all valid daemon processes either open a channel file O_RDWR or O_WRONLY. While this does not prevent a rogue process from "stealing" a message from the kernel, it does at least improve the kernels perception of whether a valid process servicing the cache exists. Signed-off-by: Dave Wysochanski <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2019-07-30rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packetDavid Howells1-0/+1
Fix the fact that a notification isn't sent to the recvmsg side to indicate a call failed when sendmsg() fails to transmit a DATA packet with the error ENETUNREACH, EHOSTUNREACH or ECONNREFUSED. Without this notification, the afs client just sits there waiting for the call to complete in some manner (which it's not now going to do), which also pins the rxrpc call in place. This can be seen if the client has a scope-level IPv6 address, but not a global-level IPv6 address, and we try and transmit an operation to a server's IPv6 address. Looking in /proc/net/rxrpc/calls shows completed calls just sat there with an abort code of RX_USER_ABORT and an error code of -ENETUNREACH. Fixes: c54e43d752c7 ("rxrpc: Fix missing start of call timeout") Signed-off-by: David Howells <[email protected]> Reviewed-by: Marc Dionne <[email protected]> Reviewed-by: Jeffrey Altman <[email protected]>
2019-07-30rxrpc: Fix potential deadlockDavid Howells3-1/+20
There is a potential deadlock in rxrpc_peer_keepalive_dispatch() whereby rxrpc_put_peer() is called with the peer_hash_lock held, but if it reduces the peer's refcount to 0, rxrpc_put_peer() calls __rxrpc_put_peer() - which the tries to take the already held lock. Fix this by providing a version of rxrpc_put_peer() that can be called in situations where the lock is already held. The bug may produce the following lockdep report: ============================================ WARNING: possible recursive locking detected 5.2.0-next-20190718 #41 Not tainted -------------------------------------------- kworker/0:3/21678 is trying to acquire lock: 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh /./include/linux/spinlock.h:343 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: __rxrpc_put_peer /net/rxrpc/peer_object.c:415 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_put_peer+0x2d3/0x6a0 /net/rxrpc/peer_object.c:435 but task is already holding lock: 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh /./include/linux/spinlock.h:343 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_peer_keepalive_dispatch /net/rxrpc/peer_event.c:378 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_peer_keepalive_worker+0x6b3/0xd02 /net/rxrpc/peer_event.c:430 Fixes: 330bdcfadcee ("rxrpc: Fix the keepalive generator [ver #2]") Reported-by: [email protected] Signed-off-by: David Howells <[email protected]> Reviewed-by: Marc Dionne <[email protected]> Reviewed-by: Jeffrey Altman <[email protected]>
2019-07-30Revert "mac80211: set NETIF_F_LLTX when using intermediate tx queues"Johannes Berg1-1/+0
Revert this for now, it has been reported multiple times that it completely breaks connectivity on various devices. Cc: [email protected] Fixes: 8dbb000ee73b ("mac80211: set NETIF_F_LLTX when using intermediate tx queues") Reported-by: Jean Delvare <[email protected]> Reported-by: Peter Lebbing <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2019-07-30Merge branch 'master' of git://blackhole.kfki.hu/nfPablo Neira Ayuso3-7/+3
Jozsef Kadlecsik says: ==================== ipset patches for the nf tree - When the support of destination MAC addresses for hash:mac sets was introduced, it was forgotten to add the same functionality to hash:ip,mac types of sets. The patch from Stefano Brivio adds the missing part. - When the support of destination MAC addresses for hash:mac sets was introduced, a copy&paste error was made in the code of the hash:ip,mac and bitmap:ip,mac types: the MAC address in these set types is in the second position and not in the first one. Stefano Brivio's patch fixes the issue. - There was still a not properly handled concurrency handling issue between renaming and listing sets at the same time, reported by Shijie Luo. ==================== Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-07-30netfilter: ebtables: also count base chain policiesFlorian Westphal1-11/+17
ebtables doesn't include the base chain policies in the rule count, so we need to add them manually when we call into the x_tables core to allocate space for the comapt offset table. This lead syzbot to trigger: WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649 xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649 Reported-by: [email protected] Fixes: 2035f3ff8eaa ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-07-30drivers: Introduce device lookup variants by nameSuzuki K Poulose1-6/+1
Add a helper to match the device name for device lookup. Also reuse this generic exported helper for the existing bus_find_device_by_name(). and add similar variants for driver/class. Cc: Alessandro Zummo <[email protected]> Cc: Alexander Aring <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Murphy <[email protected]> Cc: Harald Freudenberger <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Jacek Anaszewski <[email protected]> Cc: Lee Jones <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Maxime Coquelin <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Peter Oberparleiter <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Stefan Schmidt <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Reviewed-by: Heikki Krogerus <[email protected]> Acked-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-07-29can: fix ioctl function removalOliver Hartkopp2-2/+16
Commit 60649d4e0af ("can: remove obsolete empty ioctl() handler") replaced the almost empty can_ioctl() function with sock_no_ioctl() which always returns -EOPNOTSUPP. Even though we don't have any ioctl() functions on socket/network layer we need to return -ENOIOCTLCMD to be able to forward ioctl commands like SIOCGIFINDEX to the network driver layer. This patch fixes the wrong return codes in the CAN network layer protocols. Reported-by: kernel test robot <[email protected]> Fixes: 60649d4e0af ("can: remove obsolete empty ioctl() handler") Signed-off-by: Oliver Hartkopp <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-29net: sctp: drop unneeded likely() call around IS_ERR()Enrico Weigelt1-2/+2
IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-29xdp: Add devmap_hash map type for looking up devices by hashed indexToke Høiland-Jørgensen1-2/+7
A common pattern when using xdp_redirect_map() is to create a device map where the lookup key is simply ifindex. Because device maps are arrays, this leaves holes in the map, and the map has to be sized to fit the largest ifindex, regardless of how many devices actually are actually needed in the map. This patch adds a second type of device map where the key is looked up using a hashmap, instead of being used as an array index. This allows maps to be densely packed, so they can be smaller. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: Yonghong Song <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-07-29netfilter: ipset: Fix rename concurrency with listingJozsef Kadlecsik1-1/+1
Shijie Luo reported that when stress-testing ipset with multiple concurrent create, rename, flush, list, destroy commands, it can result ipset <version>: Broken LIST kernel message: missing DATA part! error messages and broken list results. The problem was the rename operation was not properly handled with respect of listing. The patch fixes the issue. Reported-by: Shijie Luo <[email protected]> Signed-off-by: Jozsef Kadlecsik <[email protected]>
2019-07-29netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and ↵Stefano Brivio2-2/+2
hash:ip,mac sets In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the KADT functions for sets matching on MAC addreses the copy of source or destination MAC address depending on the configured match. This was done correctly for hash:mac, but for hash:ip,mac and bitmap:ip,mac, copying and pasting the same code block presents an obvious problem: in these two set types, the MAC address is the second dimension, not the first one, and we are actually selecting the MAC address depending on whether the first dimension (IP address) specifies source or destination. Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags. This way, mixing source and destination matches for the two dimensions of ip,mac set types works as expected. With this setup: ip netns add A ip link add veth1 type veth peer name veth2 netns A ip addr add 192.0.2.1/24 dev veth1 ip -net A addr add 192.0.2.2/24 dev veth2 ip link set veth1 up ip -net A link set veth2 up dst=$(ip netns exec A cat /sys/class/net/veth2/address) ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16 ip netns exec A ipset add test_bitmap 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP ip netns exec A ipset create test_hash hash:ip,mac ip netns exec A ipset add test_hash 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP ipset correctly matches a test packet: # ping -c1 192.0.2.2 >/dev/null # echo $? 0 Reported-by: Chen Yi <[email protected]> Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Jozsef Kadlecsik <[email protected]>