Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently, when we enforce alignment tracking on direct packet access,
the verifier lets the following program pass despite doing a packet
write with unaligned access:
0: (61) r2 = *(u32 *)(r1 +76)
1: (61) r3 = *(u32 *)(r1 +80)
2: (61) r7 = *(u32 *)(r1 +8)
3: (bf) r0 = r2
4: (07) r0 += 14
5: (25) if r7 > 0x1 goto pc+4
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
6: (2d) if r0 > r3 goto pc+1
R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
7: (63) *(u32 *)(r0 -4) = r0
8: (b7) r0 = 0
9: (95) exit
from 6 to 8:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
8: (b7) r0 = 0
9: (95) exit
from 5 to 10:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=2 R10=fp
10: (07) r0 += 1
11: (05) goto pc-6
6: safe <----- here, wrongly found safe
processed 15 insns
However, if we enforce a pruning mismatch by adding state into r8
which is then being mismatched in states_equal(), we find that for
the otherwise same program, the verifier detects a misaligned packet
access when actually walking that path:
0: (61) r2 = *(u32 *)(r1 +76)
1: (61) r3 = *(u32 *)(r1 +80)
2: (61) r7 = *(u32 *)(r1 +8)
3: (b7) r8 = 1
4: (bf) r0 = r2
5: (07) r0 += 14
6: (25) if r7 > 0x1 goto pc+4
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=0,max_value=1
R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
7: (2d) if r0 > r3 goto pc+1
R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
R3=pkt_end R7=inv,min_value=0,max_value=1
R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
8: (63) *(u32 *)(r0 -4) = r0
9: (b7) r0 = 0
10: (95) exit
from 7 to 9:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=0,max_value=1
R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
9: (b7) r0 = 0
10: (95) exit
from 6 to 11:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=2
R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
11: (07) r0 += 1
12: (b7) r8 = 0
13: (05) goto pc-7 <----- mismatch due to r8
7: (2d) if r0 > r3 goto pc+1
R0=pkt(id=0,off=15,r=15) R1=ctx R2=pkt(id=0,off=0,r=15)
R3=pkt_end R7=inv,min_value=2
R8=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
8: (63) *(u32 *)(r0 -4) = r0
misaligned packet access off 2+15+-4 size 4
The reason why we fail to see it in states_equal() is that the
third test in compare_ptrs_to_packet() ...
if (old->off <= cur->off &&
old->off >= old->range && cur->off >= cur->range)
return true;
... will let the above pass. The situation we run into is that
old->off <= cur->off (14 <= 15), meaning that prior walked paths
went with smaller offset, which was later used in the packet
access after successful packet range check and found to be safe
already.
For example: Given is R0=pkt(id=0,off=0,r=0). Adding offset 14
as in above program to it, results in R0=pkt(id=0,off=14,r=0)
before the packet range test. Now, testing this against R3=pkt_end
with 'if r0 > r3 goto out' will transform R0 into R0=pkt(id=0,off=14,r=14)
for the case when we're within bounds. A write into the packet
at offset *(u32 *)(r0 -4), that is, 2 + 14 -4, is valid and
aligned (2 is for NET_IP_ALIGN). After processing this with
all fall-through paths, we later on check paths from branches.
When the above skb->mark test is true, then we jump near the
end of the program, perform r0 += 1, and jump back to the
'if r0 > r3 goto out' test we've visited earlier already. This
time, R0 is of type R0=pkt(id=0,off=15,r=0), and we'll prune
that part because this time we'll have a larger safe packet
range, and we already found that with off=14 all further insn
were already safe, so it's safe as well with a larger off.
However, the problem is that the subsequent write into the packet
with 2 + 15 -4 is then unaligned, and not caught by the alignment
tracking. Note that min_align, aux_off, and aux_off_align were
all 0 in this example.
Since we cannot tell at this time what kind of packet access was
performed in the prior walk and what minimal requirements it has
(we might do so in the future, but that requires more complexity),
fix it to disable this pruning case for strict alignment for now,
and let the verifier do check such paths instead. With that applied,
the test cases pass and reject the program due to misalignment.
Fixes: d1174416747d ("bpf: Track alignment of register values in the verifier.")
Reference: http://patchwork.ozlabs.org/patch/761909/
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 7d472a59c0e5ec117220a05de6b370447fb6cb66 ("arp: always override
existing neigh entries with gratuitous ARP") introduced a compiler
warning:
net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
this function [-Wmaybe-uninitialized]
While the code logic seems to be correct and doesn't allow the variable
to be used uninitialized, and the warning is not consistently
reproducible, it's still worth fixing it for other people not to waste
time looking at the warning in case it pops up in the build environment.
Yes, compiler is probably at fault, but we will need to accommodate.
Fixes: 7d472a59c0e5 ("arp: always override existing neigh entries with gratuitous ARP")
Signed-off-by: Ihar Hrachyshka <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fastopen API should be used to perform fastopen operations on the TCP
socket. It does not make sense to use fastopen API to perform disconnect
by calling it with AF_UNSPEC. The fastopen data path is also prone to
race conditions and bugs when using with AF_UNSPEC.
One issue reported and analyzed by Vegard Nossum is as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thread A: Thread B:
------------------------------------------------------------------------
sendto()
- tcp_sendmsg()
- sk_stream_memory_free() = 0
- goto wait_for_sndbuf
- sk_stream_wait_memory()
- sk_wait_event() // sleep
| sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
| - tcp_sendmsg()
| - tcp_sendmsg_fastopen()
| - __inet_stream_connect()
| - tcp_disconnect() //because of AF_UNSPEC
| - tcp_transmit_skb()// send RST
| - return 0; // no reconnect!
| - sk_stream_wait_connect()
| - sock_error()
| - xchg(&sk->sk_err, 0)
| - return -ECONNRESET
- ... // wake up, see sk->sk_err == 0
- skb_entail() on TCP_CLOSE socket
If the connection is reopened then we will send a brand new SYN packet
after thread A has already queued a buffer. At this point I think the
socket internal state (sequence numbers etc.) becomes messed up.
When the new connection is closed, the FIN-ACK is rejected because the
sequence number is outside the window. The other side tries to
retransmit,
but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
and return EOPNOTSUPP to user if such case happens.
Fixes: cf60af03ca4e7 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Reported-by: Vegard Nossum <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The default value for somaxconn is set in sysctl_core_net_init(), but this
function is not called when kernel is configured without CONFIG_SYSCTL.
This results in the kernel not being able to accept TCP connections,
because the backlog has zero size. Usually, the user ends up with:
"TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request. Check SNMP counters."
If SYN cookies are not enabled the connection is rejected.
Before ef547f2ac16 (tcp: remove max_qlen_log), the effects were less
severe, because the backlog was always at least eight slots long.
Signed-off-by: Roman Kapl <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add null check to avoid a potential null pointer dereference.
Addresses-Coverity-ID: 1408831
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Acked-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Since 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") fill_info
does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
because it uses ip_tunnel_info_af() with the device level info, which is
not valid for COLLECT_METADATA.
Fix by checking for the presence of the actual sockets.
Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Daniel Borkmann says:
====================
BPF pruning follow-up
Follow-up to fix incorrect pruning when alignment tracking is
in use and to properly clear regs after call to not leave stale
data behind. For details, please see individual patches.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Since virtio does not provide it's own ndo_features_check handler,
TSO, and now checksum offload, are disabled for stacked vlans.
Re-enable the support and let the host take care of it. This
restores/improves Guest-to-Guest performance over Q-in-Q vlans.
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Vladislav Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
At least some of the be2net cards do not seem to be capabled
of performing checksum offload computions on Q-in-Q packets.
In these case, the recevied checksum on the remote is invalid
and TCP syn packets are dropped.
This patch adds a call to check disbled acceleration features
on Q-in-Q tagged traffic.
CC: Sathya Perla <[email protected]>
CC: Ajit Khaparde <[email protected]>
CC: Sriharsha Basavapatna <[email protected]>
CC: Somnath Kotur <[email protected]>
Signed-off-by: Vladislav Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
It appears that TCP checksum offloading has been broken for
Q-in-Q vlans. The behavior was execerbated by the
series
commit afb0bc972b52 ("Merge branch 'stacked_vlan_tso'")
that that enabled accleleration features on stacked vlans.
However, event without that series, it is possible to trigger
this issue. It just requires a lot more specialized configuration.
The root cause is the interaction between how
netdev_intersect_features() works, the features actually set on
the vlan devices and HW having the ability to run checksum with
longer headers.
The issue starts when netdev_interesect_features() replaces
NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
if the HW advertises IP|IPV6 specific checksums. This happens
for tagged and multi-tagged packets. However, HW that enables
IP|IPV6 checksum offloading doesn't gurantee that packets with
arbitrarily long headers can be checksummed.
This patch disables IP|IPV6 checksums on the packet for multi-tagged
packets.
CC: Toshiaki Makita <[email protected]>
CC: Michal Kubecek <[email protected]>
Signed-off-by: Vladislav Yasevich <[email protected]>
Acked-by: Toshiaki Makita <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The 88m1101 has an errata when configuring autoneg. However, it was
being applied to many other Marvell PHYs as well. Limit its scope to
just the 88m1101.
Fixes: 76884679c644 ("phylib: Add support for Marvell 88e1111S and 88e1145")
Reported-by: Daniel Walker <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Acked-by: Harini Katakam <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fix build errors by making this driver depend on OF_MDIO, like
several other similar drivers do.
drivers/built-in.o: In function `octeon_mdiobus_remove':
mdio-octeon.c:(.text+0x196ee0): undefined reference to `mdiobus_unregister'
mdio-octeon.c:(.text+0x196ee8): undefined reference to `mdiobus_free'
drivers/built-in.o: In function `octeon_mdiobus_probe':
mdio-octeon.c:(.text+0x196f1d): undefined reference to `devm_mdiobus_alloc_size'
mdio-octeon.c:(.text+0x196ffe): undefined reference to `of_mdiobus_register'
mdio-octeon.c:(.text+0x197010): undefined reference to `mdiobus_free'
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-fixes-2017-05-23
Some TC offloads fixes from Or Gerlitz.
From Erez, mlx5 IPoIB RX fix to improve GRO.
From Mohamad, Command interface fix to improve mitigation against FW
commands timeouts.
From Tariq, Driver load Tolerance against affinity settings failures.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just two fixes this time:
* fix the scheduled scan "BUG: scheduling while atomic"
* check mesh address extension flags more strictly
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
rtnl_fdb_dump() failed to check the result of nlmsg_parse(), which led
to contents of |ifm| being uninitialized because nlh->nlmsglen was too
small to accommodate |ifm|. The uninitialized data may affect some
branches and result in unwanted effects, although kernel data doesn't
seem to leak to the userspace directly.
The bug has been detected with KMSAN and syzkaller.
For the record, here is the KMSAN report:
==================================================================
BUG: KMSAN: use of unitialized memory in rtnl_fdb_dump+0x5dc/0x1000
CPU: 0 PID: 1039 Comm: probe Not tainted 4.11.0-rc5+ #2727
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x143/0x1b0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
__kmsan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:491
rtnl_fdb_dump+0x5dc/0x1000 net/core/rtnetlink.c:3230
netlink_dump+0x84f/0x1190 net/netlink/af_netlink.c:2168
__netlink_dump_start+0xc97/0xe50 net/netlink/af_netlink.c:2258
netlink_dump_start ./include/linux/netlink.h:165
rtnetlink_rcv_msg+0xae9/0xb40 net/core/rtnetlink.c:4094
netlink_rcv_skb+0x339/0x5a0 net/netlink/af_netlink.c:2339
rtnetlink_rcv+0x83/0xa0 net/core/rtnetlink.c:4110
netlink_unicast_kernel net/netlink/af_netlink.c:1272
netlink_unicast+0x13b7/0x1480 net/netlink/af_netlink.c:1298
netlink_sendmsg+0x10b8/0x10f0 net/netlink/af_netlink.c:1844
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
__sys_sendmsg net/socket.c:2031
SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
SyS_sendmsg+0x87/0xb0 net/socket.c:2038
do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
RIP: 0033:0x401300
RSP: 002b:00007ffc3b0e6d58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401300
RDX: 0000000000000000 RSI: 00007ffc3b0e6d80 RDI: 0000000000000003
RBP: 00007ffc3b0e6e00 R08: 000000000000000b R09: 0000000000000004
R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004065a0 R14: 0000000000406630 R15: 0000000000000000
origin: 000000008fe00056
save_stack_trace+0x59/0x60 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:352
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:247
kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:260
slab_alloc_node mm/slub.c:2743
__kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4349
__kmalloc_reserve net/core/skbuff.c:138
__alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
alloc_skb ./include/linux/skbuff.h:933
netlink_alloc_large_skb net/netlink/af_netlink.c:1144
netlink_sendmsg+0x934/0x10f0 net/netlink/af_netlink.c:1819
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
__sys_sendmsg net/socket.c:2031
SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
SyS_sendmsg+0x87/0xb0 net/socket.c:2038
do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
==================================================================
and the reproducer:
==================================================================
#include <sys/socket.h>
#include <net/if_arp.h>
#include <linux/netlink.h>
#include <stdint.h>
int main()
{
int sock = socket(PF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK, 0);
struct msghdr msg;
memset(&msg, 0, sizeof(msg));
char nlmsg_buf[32];
memset(nlmsg_buf, 0, sizeof(nlmsg_buf));
struct nlmsghdr *nlmsg = nlmsg_buf;
nlmsg->nlmsg_len = 0x11;
nlmsg->nlmsg_type = 0x1e; // RTM_NEWROUTE = RTM_BASE + 0x0e
// type = 0x0e = 1110b
// kind = 2
nlmsg->nlmsg_flags = 0x101; // NLM_F_ROOT | NLM_F_REQUEST
nlmsg->nlmsg_seq = 0;
nlmsg->nlmsg_pid = 0;
nlmsg_buf[16] = (char)7;
struct iovec iov;
iov.iov_base = nlmsg_buf;
iov.iov_len = 17;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
sendmsg(sock, &msg, 0);
return 0;
}
==================================================================
Signed-off-by: Alexander Potapenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some PHY require to wait for a bit after the reset GPIO has been
toggled. This adds support for the DT property `phy-reset-post-delay`
which gives the delay in milliseconds to wait after reset.
If the DT property is not given, no delay is observed. Post reset delay
greater than 1000ms are invalid.
Signed-off-by: Quentin Schulz <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Acked-by: Fugang Duan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Xin Long says:
====================
sctp: a bunch of fixes for processing dupcookie
After introducing transport hashtable and per stream info into sctp,
some regressions were caused when processing dupcookie, this patchset
is to fix them.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
After sctp changed to use transport hashtable, a transport would be
added into global hashtable when adding the peer to an asoc, then
the asoc can be got by searching the transport in the hashtbale.
The problem is when processing dupcookie in sctp_sf_do_5_2_4_dupcook,
a new asoc would be created. A peer with the same addr and port as
the one in the old asoc might be added into the new asoc, but fail
to be added into the hashtable, as they also belong to the same sk.
It causes that sctp's dupcookie processing can not really work.
Since the new asoc will be freed after copying it's information to
the old asoc, it's more like a temp asoc. So this patch is to fix
it by setting it as a temp asoc to avoid adding it's any transport
into the hashtable and also avoid allocing assoc_id.
An extra thing it has to do is to also alloc stream info for any
temp asoc, as sctp dupcookie process needs it to update old asoc.
But I don't think it would hurt something, as a temp asoc would
always be freed after finishing processing cookie echo packet.
Reported-by: Jianwen Ji <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Since commit 3dbcc105d556 ("sctp: alloc stream info when initializing
asoc"), stream and stream.out info are always alloced when creating
an asoc.
So it's not correct to check !asoc->stream before updating stream
info when processing dupcookie, but would be better to check asoc
state instead.
Fixes: 3dbcc105d556 ("sctp: alloc stream info when initializing asoc")
Signed-off-by: Xin Long <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Masks for extracting part of the Completion Queue Entry (CQE)
field rss_hash_type was swapped, namely CQE_RSS_HTYPE_IP and
CQE_RSS_HTYPE_L4.
The bug resulted in setting skb->l4_hash, even-though the
rss_hash_type indicated that hash was NOT computed over the
L4 (UDP or TCP) part of the packet.
Added comments from the datasheet, to make it more clear what
these masks are selecting.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some devices need their multicast filter reset but others are crashed by that.
So the methods need to be separated.
Signed-off-by: Oliver Neukum <[email protected]>
Reported-by: "Ridgway, Keith" <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2017-05-23
1) Fix wrong header offset for esp4 udpencap packets.
2) Fix a stack access out of bounds when creating a bundle
with sub policies. From Sabrina Dubroca.
3) Fix slab-out-of-bounds in pfkey due to an incorrect
sadb_x_sec_len calculation.
4) We checked the wrong feature flags when taking down
an interface with IPsec offload enabled.
Fix from Ilan Tayari.
5) Copy the anti replay sequence numbers when doing a state
migration, otherwise we get out of sync with the sequence
numbers. Fix from Antony Antony.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add tolerance to failures of irq_set_affinity_hint().
Its role is to give hints that optimizes performance,
and should not block the driver load.
In non-SMP systems, functionality is not available as
there is a single core, and all these calls definitely
fail. Hence, do not call the function and avoid the
warning prints.
Fixes: db058a186f98 ("net/mlx5_core: Set irq affinity hints")
Signed-off-by: Tariq Toukan <[email protected]>
Cc: [email protected]
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently when firmware command gets stuck or it takes long time to
complete, the driver command will get timeout and the command slot is
freed and can be used for new commands, and if the firmware receive new
command on the old busy slot its behavior is unexpected and this could
be harmful.
To fix this when the driver command gets timeout we return failure,
but we don't free the command slot and we wait for the firmware to
explicitly respond to that command.
Once all the entries are busy we will stop processing new firmware
commands.
Fixes: 9cba4ebcf374 ('net/mlx5: Fix potential deadlock in command mode change')
Signed-off-by: Mohamad Haj Yahia <[email protected]>
Cc: [email protected]
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
IPoIB packet contains the pseudo header area, we need to pull it prior
to reset_mac_header in order to let the GRO work well.
In more details:
GRO checks the mac address of the new coming packet, it does that by
comparing the hard_header_len size of the current packet to the previous
one in that session, the comparison is over hard_header_len size.
Now, the driver prepares that area in the skb by allocating area from the
reserved part and resetting the correct mac header to it.
Fixes: 9d6bd752c63c ("net/mlx5e: IPoIB, RX handler")
Signed-off-by: Erez Shitrit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The sparse tool emits these correct complaints:
drivers/net/ethernet/mellanox/mlx5/core//en_tc.c:1005:25: warning: cast to restricted __be32
drivers/net/ethernet/mellanox/mlx5/core//en_tc.c:1007:25: warning: cast to restricted __be16
The value is provided from user-space in network order, but there's
no way for them to realize that, avoid the warnings by casting to the
appropriate type.
Fixes: d79b6df6b10a ('net/mlx5e: Add parsing of TC pedit actions to HW format')
Signed-off-by: Or Gerlitz <[email protected]>
Reported-by: Leon Romanovsky <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently we don't support partial header re-writes through TC pedit
action offloading. However, the code that enforces that wasn't err-ing
on cases where the first and last bits of the mask are set but there is
some zero bit between them, such as in the below example, fix that!
tc filter add dev enp1s0 protocol ip parent ffff: prio 10 flower
ip_proto udp dst_port 2001 skip_sw
action pedit munge ip src set 1.0.0.1 retain 0xff0000ff
Fixes: d79b6df6b10a ('net/mlx5e: Add parsing of TC pedit actions to HW format')
Signed-off-by: Or Gerlitz <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When offloading header re-writes, the HW re-calculates the relevant L3/L4
checksums. Hence, when upper layers (as done by OVS) ask for TC checksum action
offload together with pedit offload, don't err. This command now works:
tc filter add dev ens1f0 protocol ip parent ffff: prio 20 flower skip_sw
ip_proto tcp dst_port 9001
action pedit ex munge tcp dport set 0x1234 pipe
action csum tcp
Signed-off-by: Or Gerlitz <[email protected]>
Reported-by: Paul Blakey <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Add the accessors for realizing if this is a csum action,
and for which fields checksum is needed.
Signed-off-by: Or Gerlitz <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
We wrongly direcly invoke hlist_del_rcu() and not hash_del_rcu() which
does a slightly different call now and may change later, fix that.
Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Or Gerlitz <[email protected]>
Reported-by: Paul Blakey <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Drivers should be able to call cfg80211_sched_scan_results() from atomic
context. However, with the introduction of multiple scheduled scan feature
this requirement was not taken into account resulting in regression shown
below.
[ 119.021594] BUG: scheduling while atomic: irq/47-iwlwifi/517/0x00000200
[ 119.021604] Modules linked in: [...]
[ 119.021759] CPU: 1 PID: 517 Comm: irq/47-iwlwifi Not tainted 4.12.0-rc2-t440s-20170522+ #1
[ 119.021763] Hardware name: LENOVO 20AQS03H00/20AQS03H00, BIOS GJET91WW (2.41 ) 09/21/2016
[ 119.021766] Call Trace:
[ 119.021778] ? dump_stack+0x5c/0x84
[ 119.021784] ? __schedule_bug+0x4c/0x70
[ 119.021792] ? __schedule+0x496/0x5c0
[ 119.021798] ? schedule+0x2d/0x80
[ 119.021804] ? schedule_preempt_disabled+0x5/0x10
[ 119.021810] ? __mutex_lock.isra.0+0x18e/0x4c0
[ 119.021817] ? __wake_up+0x2f/0x50
[ 119.021833] ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
[ 119.021844] ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
[ 119.021859] ? iwl_mvm_rx_lmac_scan_iter_complete_notif+0x17/0x30 [iwlmvm]
[ 119.021869] ? iwl_pcie_rx_handle+0x2a9/0x7e0 [iwlwifi]
[ 119.021878] ? iwl_pcie_irq_handler+0x17c/0x730 [iwlwifi]
[ 119.021884] ? irq_forced_thread_fn+0x60/0x60
[ 119.021887] ? irq_thread_fn+0x16/0x40
[ 119.021892] ? irq_thread+0x109/0x180
[ 119.021896] ? wake_threads_waitq+0x30/0x30
[ 119.021901] ? kthread+0xf2/0x130
[ 119.021905] ? irq_thread_dtor+0x90/0x90
[ 119.021910] ? kthread_create_on_node+0x40/0x40
[ 119.021915] ? ret_from_fork+0x26/0x40
Fixes: b34939b98369 ("cfg80211: add request id to cfg80211_sched_scan_*() api")
Reported-by: Sander Eikelenboom <[email protected]>
Signed-off-by: Arend van Spriel <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fix from Kees Cook:
"Marta noticed another misbehavior in EFI pstore, which this fixes.
Hopefully this is the last of the v4.12 fixes for pstore!"
* tag 'pstore-v4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
efi-pstore: Fix write/erase id tracking
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert a 4.11 change that turned out to be problematic and add a
.gitignore file.
Specifics:
- Revert a 4.11 commit related to the ACPI-based handling of laptop
lids that made changes incompatible with existing user space stacks
and broke things there (Lv Zheng).
- Add .gitignore to the ACPI tools directory (Prarit Bhargava)"
* tag 'acpi-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI / button: Remove lid_init_state=method mode"
tools/power/acpi: Add .gitignore file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix RTC wakeup from suspend-to-idle broken recently, fix CPU
idleness detection condition in the schedutil cpufreq governor, fix a
cpufreq driver build failure, fix an error code path in the power
capping framework, clean up the hibernate core and update the
intel_pstate documentation.
Specifics:
- Fix RTC wakeup from suspend-to-idle broken by the recent rework of
ACPI wakeup handling (Rafael Wysocki).
- Update intel_pstate driver documentation to reflect the current
code and explain how it works in more detail (Rafael Wysocki).
- Fix an issue related to CPU idleness detection on systems with
shared cpufreq policies in the schedutil governor (Juri Lelli).
- Fix a possible build issue in the dbx500 cpufreq driver (Arnd
Bergmann).
- Fix a function in the power capping framework core to return an
error code instead of 0 when there's an error (Dan Carpenter).
- Clean up variable definition in the hibernation core (Pushkar
Jambhlekar)"
* tag 'pm-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: dbx500: add a Kconfig symbol
PM / hibernate: Declare variables as static
PowerCap: Fix an error code in powercap_register_zone()
RTC: rtc-cmos: Fix wakeup from suspend-to-idle
PM / wakeup: Fix up wakeup_source_report_event()
cpufreq: intel_pstate: Document the current behavior and user interface
cpufreq: schedutil: use now as reference when aggregating shared policy requests
|
|
We need to initializes those variables to 0 for platforms that do not
provide ACPI parameters. Otherwise, we set sda_hold_time to random
values, breaking e.g. Galileo and IOT2000 boards.
Reported-and-tested-by: Linus Torvalds <[email protected]>
Reported-by: Tobias Klausmann <[email protected]>
Fixes: 9d6408433019 ("i2c: designware: don't infer timings described by ACPI from clock rate")
Signed-off-by: Jan Kiszka <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Acked-by: Jarkko Nikula <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Prior to the pstore interface refactoring, the "id" generated during
a backend pstore_write() was only retained by the internal pstore
inode tracking list. Additionally the "part" was ignored, so EFI
would encode this in the id. This corrects the misunderstandings
and correctly sets "id" during pstore_write(), and uses "part"
directly during pstore_erase().
Reported-by: Marta Lofstedt <[email protected]>
Fixes: 76cc9580e3fb ("pstore: Replace arguments for write() API")
Fixes: a61072aae693 ("pstore: Replace arguments for erase() API")
Signed-off-by: Kees Cook <[email protected]>
Tested-by: Marta Lofstedt <[email protected]>
|
|
Pull networking fixes from David Miller:
"Mostly netfilter bug fixes in here, but we have some bits elsewhere as
well.
1) Don't do SNAT replies for non-NATed connections in IPVS, from
Julian Anastasov.
2) Don't delete conntrack helpers while they are still in use, from
Liping Zhang.
3) Fix zero padding in xtables's xt_data_to_user(), from Willem de
Bruijn.
4) Add proper RCU protection to nf_tables_dump_set() because we
cannot guarantee that we hold the NFNL_SUBSYS_NFTABLES lock. From
Liping Zhang.
5) Initialize rcv_mss in tcp_disconnect(), from Wei Wang.
6) smsc95xx devices can't handle IPV6 checksums fully, so don't
advertise support for offloading them. From Nisar Sayed.
7) Fix out-of-bounds access in __ip6_append_data(), from Eric
Dumazet.
8) Make atl2_probe() propagate the error code properly on failures,
from Alexey Khoroshilov.
9) arp_target[] in bond_check_params() is used uninitialized. This
got changes from a global static to a local variable, which is how
this mistake happened. Fix from Jarod Wilson.
10) Fix fallout from unnecessary NULL check removal in cls_matchall,
from Jiri Pirko. This is definitely brown paper bag territory..."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
net: sched: cls_matchall: fix null pointer dereference
vsock: use new wait API for vsock_stream_sendmsg()
bonding: fix randomly populated arp target array
net: Make IP alignment calulations clearer.
bonding: fix accounting of active ports in 3ad
net: atheros: atl2: don't return zero on failure path in atl2_probe()
ipv6: fix out of bound writes in __ip6_append_data()
bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
smsc95xx: Support only IPv4 TCP/UDP csum offload
arp: always override existing neigh entries with gratuitous ARP
arp: postpone addr_type calculation to as late as possible
arp: decompose is_garp logic into a separate function
arp: fixed error in a comment
tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT
ebtables: arpreply: Add the standard target sanity check
netfilter: nf_tables: revisit chain/object refcounting from elements
netfilter: nf_tables: missing sanitization in data from userspace
netfilter: nf_tables: can't assume lock is acquired when dumping set elems
netfilter: synproxy: fix conntrackd interaction
...
|
|
Since the head is guaranteed by the check above to be null, the call_rcu
would explode. Remove the previously logically dead code that was made
logically very much alive and kicking.
Fixes: 985538eee06f ("net/sched: remove redundant null check on head")
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As reported by Michal, vsock_stream_sendmsg() could still
sleep at vsock_stream_has_space() after prepare_to_wait():
vsock_stream_has_space
vmci_transport_stream_has_space
vmci_qpair_produce_free_space
qp_lock
qp_acquire_queue_mutex
mutex_lock
Just switch to the new wait API like we did for commit
d9dc8b0f8b4e ("net: fix sleeping for sk_wait_event()").
Reported-by: Michal Kubecek <[email protected]>
Cc: Stefan Hajnoczi <[email protected]>
Cc: Jorgen Hansen <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In commit dc9c4d0fe023, the arp_target array moved from a static global
to a local variable. By the nature of static globals, the array used to
be initialized to all 0. At present, it's full of random data, which
that gets interpreted as arp_target values, when none have actually been
specified. Systems end up booting with spew along these lines:
[ 32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
[ 32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
[ 32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0
[ 32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
[ 32.204892] lacp0: Setting MII monitoring interval to 100
[ 32.211071] lacp0: Removing ARP target 216.124.228.17
[ 32.216824] lacp0: Removing ARP target 218.160.255.255
[ 32.222646] lacp0: Removing ARP target 185.170.136.184
[ 32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal
[ 32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
[ 32.243987] lacp0: Removing ARP target 56.125.228.17
[ 32.249625] lacp0: Removing ARP target 218.160.255.255
[ 32.255432] lacp0: Removing ARP target 15.157.233.184
[ 32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal
[ 32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
[ 32.276632] lacp0: Removing ARP target 16.0.0.0
[ 32.281755] lacp0: Removing ARP target 218.160.255.255
[ 32.287567] lacp0: Removing ARP target 72.125.228.17
[ 32.293165] lacp0: Removing ARP target 218.160.255.255
[ 32.298970] lacp0: Removing ARP target 8.125.228.17
[ 32.304458] lacp0: Removing ARP target 218.160.255.255
None of these were actually specified as ARP targets, and the driver does
seem to clean up the mess okay, but it's rather noisy and confusing, leaks
values to userspace, and the 255.255.255.255 spew shows up even when debug
prints are disabled.
The fix: just zero out arp_target at init time.
While we're in here, init arp_all_targets_value in the right place.
Fixes: dc9c4d0fe023 ("bonding: reduce scope of some global variables")
CC: Mahesh Bandewar <[email protected]>
CC: Jay Vosburgh <[email protected]>
CC: Veaceslav Falico <[email protected]>
CC: Andy Gospodarek <[email protected]>
CC: [email protected]
CC: [email protected]
Signed-off-by: Jarod Wilson <[email protected]>
Acked-by: Andy Gospodarek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
* pm-sleep:
PM / hibernate: Declare variables as static
RTC: rtc-cmos: Fix wakeup from suspend-to-idle
PM / wakeup: Fix up wakeup_source_report_event()
* powercap:
PowerCap: Fix an error code in powercap_register_zone()
|
|
* acpi-button:
Revert "ACPI / button: Remove lid_init_state=method mode"
* acpi-tools:
tools/power/acpi: Add .gitignore file
|
|
* intel_pstate:
cpufreq: intel_pstate: Document the current behavior and user interface
* pm-cpufreq:
cpufreq: dbx500: add a Kconfig symbol
* pm-cpufreq-sched:
cpufreq: schedutil: use now as reference when aggregating shared policy requests
|
|
The assignmnet:
ip_align = strict ? 2 : NET_IP_ALIGN;
in compare_pkt_ptr_alignment() trips up Coverity because we can only
get to this code when strict is true, therefore ip_align will always
be 2 regardless of NET_IP_ALIGN's value.
So just assign directly to '2' and explain the situation in the
comment above.
Reported-by: "Gustavo A. R. Silva" <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As of 7bb11dc9f59d and 0622cab0341c, bond slaves in a 3ad bond are not
removed from the aggregator when they are down, and the active slave count
is NOT equal to number of ports in the aggregator, but rather the number
of ports in the aggregator that are still enabled. The sysfs spew for
bonding_show_ad_num_ports() has a comment that says "Show number of active
802.3ad ports.", but it's currently showing total number of ports, both
active and inactive. Remedy it by using the same logic introduced in
0622cab0341c in __bond_3ad_get_active_agg_info(), so sysfs, procfs and
netlink all report the number of active ports. Note that this means that
IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of
NUM_PORTS, and thus perhaps should be renamed for clarity.
Lightly tested on a dual i40e lacp bond, simulating link downs with an ip
link set dev <slave2> down, was able to produce the state where I could
see both in the same aggregator, but a number of ports count of 1.
MII Status: up
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2 <---
Slave Interface: ens10
MII Status: up <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1
MII Status: up
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1 <---
Slave Interface: ens10
MII Status: down <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1
CC: Jay Vosburgh <[email protected]>
CC: Veaceslav Falico <[email protected]>
CC: Andy Gospodarek <[email protected]>
CC: [email protected]
Signed-off-by: Jarod Wilson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If dma mask checks fail in atl2_probe(), it breaks off initialization,
deallocates all resources, but returns zero.
The patch adds proper error code return value and
make error code setup unified.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Andrey Konovalov and [email protected] reported crashes caused by
one skb shared_info being overwritten from __ip6_append_data()
Andrey program lead to following state :
copy -4200 datalen 2000 fraglen 2040
maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200
The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
fraggap, 0); is overwriting skb->head and skb_shared_info
Since we apparently detect this rare condition too late, move the
code earlier to even avoid allocating skb and risking crashes.
Once again, many thanks to Andrey and syzkaller team.
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Reported-by: <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
|
|
The code to fetch a 64-bit value from user space was entirely buggered,
and has been since the code was merged in early 2016 in commit
b2f680380ddf ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
kernels").
Happily the buggered routine is almost certainly entirely unused, since
the normal way to access user space memory is just with the non-inlined
"get_user()", and the inlined version didn't even historically exist.
The normal "get_user()" case is handled by external hand-written asm in
arch/x86/lib/getuser.S that doesn't have either of these issues.
There were two independent bugs in __get_user_asm_u64():
- it still did the STAC/CLAC user space access marking, even though
that is now done by the wrapper macros, see commit 11f1a4b9755f
("x86: reorganize SMAP handling in user space accesses").
This didn't result in a semantic error, it just means that the
inlined optimized version was hugely less efficient than the
allegedly slower standard version, since the CLAC/STAC overhead is
quite high on modern Intel CPU's.
- the double register %eax/%edx was marked as an output, but the %eax
part of it was touched early in the asm, and could thus clobber other
inputs to the asm that gcc didn't expect it to touch.
In particular, that meant that the generated code could look like
this:
mov (%eax),%eax
mov 0x4(%eax),%edx
where the load of %edx obviously was _supposed_ to be from the 32-bit
word that followed the source of %eax, but because %eax was
overwritten by the first instruction, the source of %edx was
basically random garbage.
The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
the 64-bit output as early-clobber to let gcc know that no inputs should
alias with the output register.
Cc: Al Viro <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected] # v4.8+
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Al noticed that unsafe_put_user() had type problems, and fixed them in
commit a7cc722fff0b ("fix unsafe_put_user()"), which made me look more
at those functions.
It turns out that unsafe_get_user() had a type issue too: it limited the
largest size of the type it could handle to "unsigned long". Which is
fine with the current users, but doesn't match our existing normal
get_user() semantics, which can also handle "u64" even when that does
not fit in a long.
While at it, also clean up the type cast in unsafe_put_user(). We
actually want to just make it an assignment to the expected type of the
pointer, because we actually do want warnings from types that don't
convert silently. And it makes the code more readable by not having
that one very long and complex line.
[ This patch might become stable material if we ever end up back-porting
any new users of the unsafe uaccess code, but as things stand now this
doesn't matter for any current existing uses. ]
Cc: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|