Age | Commit message (Collapse) | Author | Files | Lines |
|
Due to the way I did the RX bitrate conversions in mac80211 with
spatch, going setting flags to setting the value, many drivers now
don't set the bandwidth value for 20 MHz, since with the flags it
wasn't necessary to (there was no 20 MHz flag, only the others.)
Rather than go through and try to fix up all the drivers, instead
renumber the enum so that 20 MHz, which is the typical bandwidth,
actually has the value 0, making those drivers all work again.
If VHT was hit used with a driver not reporting it, e.g. iwlmvm,
this manifested in hitting the bandwidth warning in
cfg80211_calculate_bitrate_vht().
Reported-by: Linus Torvalds <[email protected]>
Tested-by: Jens Axboe <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev
since it is always NULL.
This is clearly wrong, we have code to initialize it to loopback_dev,
unfortunately the order is still not correct.
loopback_dev is registered very early during boot, we lose a chance
to re-initialize it in notifier. addrconf_init() is called after
ip6_route_init(), which means we have no chance to correct it.
Fix it by moving this initialization explicitly after
ipv6_add_dev(init_net.loopback_dev) in addrconf_init().
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Sudarsana Reddy Kalluru says:
====================
qed*: Bug fix series.
The series contains minor bug fixes for qed/qede drivers.
Please consider applying it to 'net' branch.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Fail the configuration of advertised speed-autoneg value if the config
update is not supported.
Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Driver currently uses advertised-autoneg value to populate the
supported-autoneg field. When advertised field is updated, user gets
the same value for supported field. Supported-autoneg value need to be
populated from the link capabilities value returned by the MFW.
Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Value for status block id could be more than 256 in 100G mode, need to
update its data type from u8 to u16.
Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Static checkers complain that we should unlock before returning. Which
is true.
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Frank Rowand <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
|
|
As I noted on the mailing list, it's easier than I originally thought to
create intentional collisions in the digested names. Unfortunately it's
not too easy to solve this, so for now just fix the comment to not lie.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Now that there has been a dedicated mailing list, patchwork project, and
git repository set up for filesystem encryption, update the MAINTAINERS
file accordingly.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
When ext4 encryption was originally merged, we were encrypting the
user-specified filename in ext4_match(), introducing a lot of additional
complexity into ext4_match() and its callers. This has since been
changed to encrypt the filename earlier, so we can remove the gunk
that's no longer needed. This more or less reverts ext4_search_dir()
and ext4_find_dest_de() to the way they were in the v4.0 kernel.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Switch f2fs directory searches to use the fscrypt_match_name() helper
function. There should be no functional change.
Signed-off-by: Eric Biggers <[email protected]>
Acked-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Switch ext4 directory searches to use the fscrypt_match_name() helper
function. There should be no functional change.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Introduce a helper function fscrypt_match_name() which tests whether a
fscrypt_name matches a directory entry. Also clean up the magic numbers
and document things properly.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
When accessing an encrypted directory without the key, userspace must
operate on filenames derived from the ciphertext names, which contain
arbitrary bytes. Since we must support filenames as long as NAME_MAX,
we can't always just base64-encode the ciphertext, since that may make
it too long. Currently, this is solved by presenting long names in an
abbreviated form containing any needed filesystem-specific hashes (e.g.
to identify a directory block), then the last 16 bytes of ciphertext.
This needs to be sufficient to identify the actual name on lookup.
However, there is a bug. It seems to have been assumed that due to the
use of a CBC (ciphertext block chaining)-based encryption mode, the last
16 bytes (i.e. the AES block size) of ciphertext would depend on the
full plaintext, preventing collisions. However, we actually use CBC
with ciphertext stealing (CTS), which handles the last two blocks
specially, causing them to appear "flipped". Thus, it's actually the
second-to-last block which depends on the full plaintext.
This caused long filenames that differ only near the end of their
plaintexts to, when observed without the key, point to the wrong inode
and be undeletable. For example, with ext4:
# echo pass | e4crypt add_key -p 16 edir/
# seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
# find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
100000
# sync
# echo 3 > /proc/sys/vm/drop_caches
# keyctl new_session
# find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
2004
# rm -rf edir/
rm: cannot remove 'edir/_A7nNFi3rhkEQlJ6P,hdzluhODKOeWx5V': Structure needs cleaning
...
To fix this, when presenting long encrypted filenames, encode the
second-to-last block of ciphertext rather than the last 16 bytes.
Although it would be nice to solve this without depending on a specific
encryption mode, that would mean doing a cryptographic hash like SHA-256
which would be much less efficient. This way is sufficient for now, and
it's still compatible with encryption modes like HEH which are strong
pseudorandom permutations. Also, changing the presented names is still
allowed at any time because they are only provided to allow applications
to do things like delete encrypted directories. They're not designed to
be used to persistently identify files --- which would be hard to do
anyway, given that they're encrypted after all.
For ease of backports, this patch only makes the minimal fix to both
ext4 and f2fs. It leaves ubifs as-is, since ubifs doesn't compare the
ciphertext block yet. Follow-on patches will clean things up properly
and make the filesystems use a shared helper function.
Fixes: 5de0b4d0cd15 ("ext4 crypto: simplify and speed up filename encryption")
Reported-by: Gwendal Grignou <[email protected]>
Cc: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
If user has no key under an encrypted dir, fscrypt gives digested dentries.
Previously, when looking up a dentry, f2fs only checks its hash value with
first 4 bytes of the digested dentry, which didn't handle hash collisions fully.
This patch enhances to check entire dentry bytes likewise ext4.
Eric reported how to reproduce this issue by:
# seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
100000
# sync
# echo 3 > /proc/sys/vm/drop_caches
# keyctl new_session
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
99999
Cc: <[email protected]>
Reported-by: Eric Biggers <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
(fixed f2fs_dentry_hash() to work even when the hash is 0)
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
As ext4 and f2fs do, ubifs should check for consistent encryption
contexts during ->lookup() in an encrypted directory. This protects
certain users of filesystem encryption against certain types of offline
attacks.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
As for ext4, now that fscrypt_has_permitted_context() correctly handles
the case where we have the key for the parent directory but not the
child, f2fs_lookup() no longer has to work around it. Also add the same
warning message that ext4 uses.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Now that fscrypt_has_permitted_context() correctly handles the case
where we have the key for the parent directory but not the child, we
don't need to try to work around this in ext4_lookup().
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
To mitigate some types of offline attacks, filesystem encryption is
designed to enforce that all files in an encrypted directory tree use
the same encryption policy (i.e. the same encryption context excluding
the nonce). However, the fscrypt_has_permitted_context() function which
enforces this relies on comparing struct fscrypt_info's, which are only
available when we have the encryption keys. This can cause two
incorrect behaviors:
1. If we have the parent directory's key but not the child's key, or
vice versa, then fscrypt_has_permitted_context() returned false,
causing applications to see EPERM or ENOKEY. This is incorrect if
the encryption contexts are in fact consistent. Although we'd
normally have either both keys or neither key in that case since the
master_key_descriptors would be the same, this is not guaranteed
because keys can be added or removed from keyrings at any time.
2. If we have neither the parent's key nor the child's key, then
fscrypt_has_permitted_context() returned true, causing applications
to see no error (or else an error for some other reason). This is
incorrect if the encryption contexts are in fact inconsistent, since
in that case we should deny access.
To fix this, retrieve and compare the fscrypt_contexts if we are unable
to set up both fscrypt_infos.
While this slightly hurts performance when accessing an encrypted
directory tree without the key, this isn't a case we really need to be
optimizing for; access *with* the key is much more important.
Furthermore, the performance hit is barely noticeable given that we are
already retrieving the fscrypt_context and doing two keyring searches in
fscrypt_get_encryption_info(). If we ever actually wanted to optimize
this case we might start by caching the fscrypt_contexts.
Cc: [email protected] # 4.0+
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
IFLA_PHYS_PORT_NAME is a string attribute, so terminate it with \0.
Otherwise libnl3 fails to validate netlink messages with this attribute.
"ip -detail a" assumes too that the attribute is NUL-terminated when
printing it. It often was, due to padding.
I noticed this as libvirtd failing to start on a system with sfc driver
after upgrading it to Linux 4.11, i.e. when sfc added support for
phys_port_name.
Signed-off-by: Michal Schmidt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This fixes a race where vmbus callback for new packet arriving
could occur before NAPI is initialized.
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
V2: using "aquantia" subsystem tag.
The command "ethtool -i ethX" should display driver name (driver: atlantic)
instead vendor name (driver: aquantia).
Signed-off-by: Pavel Belous <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
raw_send_hdrinc() and rawv6_send_hdrinc() expect that the buffer copied
from the userspace contains the IPv4/IPv6 header, so if too few bytes are
copied, parts of the header may remain uninitialized.
This bug has been detected with KMSAN.
For the record, the KMSAN report:
==================================================================
BUG: KMSAN: use of unitialized memory in nf_ct_frag6_gather+0xf5a/0x44a0
inter: 0
CPU: 0 PID: 1036 Comm: probe Not tainted 4.11.0-rc5+ #2455
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x143/0x1b0 lib/dump_stack.c:52
kmsan_report+0x16b/0x1e0 mm/kmsan/kmsan.c:1078
__kmsan_warning_32+0x5c/0xa0 mm/kmsan/kmsan_instr.c:510
nf_ct_frag6_gather+0xf5a/0x44a0 net/ipv6/netfilter/nf_conntrack_reasm.c:577
ipv6_defrag+0x1d9/0x280 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn ./include/linux/netfilter.h:102
nf_hook_slow+0x13f/0x3c0 net/netfilter/core.c:310
nf_hook ./include/linux/netfilter.h:212
NF_HOOK ./include/linux/netfilter.h:255
rawv6_send_hdrinc net/ipv6/raw.c:673
rawv6_sendmsg+0x2fcb/0x41a0 net/ipv6/raw.c:919
inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
SyS_sendto+0xbc/0xe0 net/socket.c:1664
do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
RIP: 0033:0x436e03
RSP: 002b:00007ffce48baf38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000436e03
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffce48baf90 R08: 00007ffce48baf50 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000401790 R14: 0000000000401820 R15: 0000000000000000
origin: 00000000d9400053
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:362
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:257
kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:270
slab_alloc_node mm/slub.c:2735
__kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4341
__kmalloc_reserve net/core/skbuff.c:138
__alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
alloc_skb ./include/linux/skbuff.h:933
alloc_skb_with_frags+0x209/0xbc0 net/core/skbuff.c:4678
sock_alloc_send_pskb+0x9ff/0xe00 net/core/sock.c:1903
sock_alloc_send_skb+0xe4/0x100 net/core/sock.c:1920
rawv6_send_hdrinc net/ipv6/raw.c:638
rawv6_sendmsg+0x2918/0x41a0 net/ipv6/raw.c:919
inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633
sock_sendmsg net/socket.c:643
SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
SyS_sendto+0xbc/0xe0 net/socket.c:1664
do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
==================================================================
, triggered by the following syscalls:
socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = 3
sendto(3, NULL, 0, 0, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "ff00::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EPERM
A similar report is triggered in net/ipv4/raw.c if we use a PF_INET socket
instead of a PF_INET6 one.
Signed-off-by: Alexander Potapenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently jbd2_write_superblock() silently adds REQ_SYNC to flags with
which journal superblock is written. Make this explicit by making flags
passed down to jbd2_write_superblock() contain REQ_SYNC.
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
head is previously null checked and so the 2nd null check on head
is redundant and therefore can be removed.
Detected by CoverityScan, CID#1399505 ("Logically dead code")
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Under fuzzer stress, it is possible that a child gets a non NULL
fastopen_req pointer from its parent at accept() time, when/if parent
morphs from listener to active session.
We need to make sure this can not happen, by clearing the field after
socket cloning.
BUG: Double free or freeing an invalid pointer
Unexpected shadow byte: 0xFB
CPU: 3 PID: 20933 Comm: syz-executor3 Not tainted 4.11.0+ #306
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x292/0x395 lib/dump_stack.c:52
kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
kasan_report_double_free+0x5c/0x70 mm/kasan/report.c:185
kasan_slab_free+0x9d/0xc0 mm/kasan/kasan.c:580
slab_free_hook mm/slub.c:1357 [inline]
slab_free_freelist_hook mm/slub.c:1379 [inline]
slab_free mm/slub.c:2961 [inline]
kfree+0xe8/0x2b0 mm/slub.c:3882
tcp_free_fastopen_req net/ipv4/tcp.c:1077 [inline]
tcp_disconnect+0xc15/0x13e0 net/ipv4/tcp.c:2328
inet_child_forget+0xb8/0x600 net/ipv4/inet_connection_sock.c:898
inet_csk_reqsk_queue_add+0x1e7/0x250
net/ipv4/inet_connection_sock.c:928
tcp_get_cookie_sock+0x21a/0x510 net/ipv4/syncookies.c:217
cookie_v4_check+0x1a19/0x28b0 net/ipv4/syncookies.c:384
tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1384 [inline]
tcp_v4_do_rcv+0x731/0x940 net/ipv4/tcp_ipv4.c:1421
tcp_v4_rcv+0x2dc0/0x31c0 net/ipv4/tcp_ipv4.c:1715
ip_local_deliver_finish+0x4cc/0xc20 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:257 [inline]
ip_local_deliver+0x1ce/0x700 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:492 [inline]
ip_rcv_finish+0xb1d/0x20b0 net/ipv4/ip_input.c:396
NF_HOOK include/linux/netfilter.h:257 [inline]
ip_rcv+0xd8c/0x19c0 net/ipv4/ip_input.c:487
__netif_receive_skb_core+0x1ad1/0x3400 net/core/dev.c:4210
__netif_receive_skb+0x2a/0x1a0 net/core/dev.c:4248
process_backlog+0xe5/0x6c0 net/core/dev.c:4868
napi_poll net/core/dev.c:5270 [inline]
net_rx_action+0xe70/0x18e0 net/core/dev.c:5335
__do_softirq+0x2fb/0xb99 kernel/softirq.c:284
do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:899
</IRQ>
do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
do_softirq kernel/softirq.c:176 [inline]
__local_bh_enable_ip+0x1cf/0x1e0 kernel/softirq.c:181
local_bh_enable include/linux/bottom_half.h:31 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:931 [inline]
ip_finish_output2+0x9ab/0x15e0 net/ipv4/ip_output.c:230
ip_finish_output+0xa35/0xdf0 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip_output+0x1f6/0x7b0 net/ipv4/ip_output.c:404
dst_output include/net/dst.h:486 [inline]
ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
ip_queue_xmit+0x9a8/0x1a10 net/ipv4/ip_output.c:503
tcp_transmit_skb+0x1ade/0x3470 net/ipv4/tcp_output.c:1057
tcp_write_xmit+0x79e/0x55b0 net/ipv4/tcp_output.c:2265
__tcp_push_pending_frames+0xfa/0x3a0 net/ipv4/tcp_output.c:2450
tcp_push+0x4ee/0x780 net/ipv4/tcp.c:683
tcp_sendmsg+0x128d/0x39b0 net/ipv4/tcp.c:1342
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x660/0x810 net/socket.c:1696
SyS_sendto+0x40/0x50 net/socket.c:1664
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x446059
RSP: 002b:00007faa6761fb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 0000000000446059
RDX: 0000000000000001 RSI: 0000000020ba3fcd RDI: 0000000000000017
RBP: 00000000006e40a0 R08: 0000000020ba4ff0 R09: 0000000000000010
R10: 0000000020000000 R11: 0000000000000282 R12: 0000000000708150
R13: 0000000000000000 R14: 00007faa676209c0 R15: 00007faa67620700
Object at ffff88003b5bbcb8, in cache kmalloc-64 size: 64
Allocated:
PID = 20909
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745
kmalloc include/linux/slab.h:490 [inline]
kzalloc include/linux/slab.h:663 [inline]
tcp_sendmsg_fastopen net/ipv4/tcp.c:1094 [inline]
tcp_sendmsg+0x221a/0x39b0 net/ipv4/tcp.c:1139
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x660/0x810 net/socket.c:1696
SyS_sendto+0x40/0x50 net/socket.c:1664
entry_SYSCALL_64_fastpath+0x1f/0xbe
Freed:
PID = 20909
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
slab_free_hook mm/slub.c:1357 [inline]
slab_free_freelist_hook mm/slub.c:1379 [inline]
slab_free mm/slub.c:2961 [inline]
kfree+0xe8/0x2b0 mm/slub.c:3882
tcp_free_fastopen_req net/ipv4/tcp.c:1077 [inline]
tcp_disconnect+0xc15/0x13e0 net/ipv4/tcp.c:2328
__inet_stream_connect+0x20c/0xf90 net/ipv4/af_inet.c:593
tcp_sendmsg_fastopen net/ipv4/tcp.c:1111 [inline]
tcp_sendmsg+0x23a8/0x39b0 net/ipv4/tcp.c:1139
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x660/0x810 net/socket.c:1696
SyS_sendto+0x40/0x50 net/socket.c:1664
entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 7db92362d2fe ("tcp: fix potential double free issue for fastopen_req")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Acked-by: Wei Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Milan Broz <[email protected]>
|
|
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_FUA implementation.
generic_make_request_checks() however strips REQ_FUA flag from a bio
when the storage doesn't report volatile write cache and thus write
effectively becomes asynchronous which can lead to performance
regressions. This affects superblock writes for ext4. Fix the problem
by marking superblock writes always as synchronous.
Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Since netif_carrier_on() will do nothing if device's
carrier is already on, so it's unnecessary to do
carrier status check.
It's the same for netif_carrier_off().
Signed-off-by: Zhu Yanjun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Depending on the passed @idle arg, there may be no need to calculate
'nr_free' or 'nr_clean' respectively in free_target_met() and
clean_target_met().
Signed-off-by: Mike Snitzer <[email protected]>
|
|
dm-cache's smq policy tries hard to do it's work during the idle periods
when there is no IO. But if there are no idle periods (eg, a long fio
run) we still need to allow some demotions and promotions to occur.
To achieve this, pass @idle=true to queue_promotion()'s
free_target_met() call so that free_target_met() doesn't short-circuit
the possibility of demotion simply because it isn't an idle period.
Fixes: b29d4986d0 ("dm cache: significant rework to leverage dm-bio-prison-v2")
Reported-by: John Harrigan <[email protected]>
Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Expose the fifo lists, cached next requests, batching state, and
dispatch list. It'd also be possible to add the sorted lists, but there
aren't already seq_file helpers for rbtrees.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Expose the domain token pools, asynchronous sbitmap depth, domain
request lists, and batching state.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This provides the infrastructure for schedulers to expose their internal
state through debugfs. We add a list of queue attributes and a list of
hctx attributes to struct elevator_type and wire them up when switching
schedulers.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Add missing seq_file.h header in blk-mq-debugfs.h
Signed-off-by: Jens Axboe <[email protected]>
|
|
Originally, I tied debugfs registration/unregistration together with
sysfs. There's no reason to do this, and it's getting in the way of
letting schedulers define their own debugfs attributes. Instead, tie the
debugfs registration to the lifetime of the structures themselves.
The saner lifetimes mean we can also get rid of the extra mq directory
and move everything one level up. I.e., nvme0n1/mq/hctx0/tags is now
just nvme0n1/hctx0/tags.
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Preparation for adding more declarations.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In commit e869b5462f83 ("blk-mq: Unregister debugfs attributes
earlier"), we shuffled the debugfs cleanup around so that the "state"
attribute was removed before we freed the blk-mq data structures.
However, later changes are going to undo that, so we need to explicitly
disallow running a dead queue.
[Omar: rebased and updated commit message]
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
A large part of blk-mq-debugfs.c is file_operations and seq_file
boilerplate. This sucks as is but will suck even more when schedulers
can define their own debugfs entries. Factor it all out into a single
blk_mq_debugfs_fops which multiplexes as needed. We store the
request_queue, blk_mq_hw_ctx, or blk_mq_ctx in the parent directory
dentry, which is kind of hacky, but it works.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
It's not clear what these numbered directories represent unless you
consult the code. We're about to get rid of the intermediate "mq"
directory, so these would be even more confusing without that context.
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Slightly more readable, plus we also strip leading spaces.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
blk_queue_flags_store() currently truncates and returns a short write if
the operation being written is too long. This can give us weird results,
like here:
$ echo "run bar"
echo: write error: invalid argument
$ dmesg
[ 1103.075435] blk_queue_flags_store: unsupported operation bar. Use either 'run' or 'start'
Instead, return an error if the user does this. While we're here, make
the argument names consistent with everywhere else in this file.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Make sure the spelled out flag names match the definition. This also
adds a missing hctx state, BLK_MQ_S_START_ON_RUN, and a missing
cmd_flag, __REQ_NOUNMAP.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This reads more naturally than spaces.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In commit 0d3b12584972 "nfs: Convert to separately allocated bdi" I have
wrongly cloned bdi reference in nfs_clone_super(). Further inspection
has shown that originally the code was actually allocating a new bdi (in
->clone_server callback) which was later registered in
nfs_fs_mount_common() and used for sb->s_bdi in nfs_initialise_sb().
This could later result in bdi for the original superblock not getting
unregistered when that superblock got shutdown (as the cloned sb still
held bdi reference) and later when a new superblock was created under
the same anonymous device number, a clash in sysfs has happened on bdi
registration:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 10284 at /linux-next/fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x74
sysfs: cannot create duplicate filename '/devices/virtual/bdi/0:32'
Modules linked in: axp20x_usb_power gpio_axp209 nvmem_sunxi_sid sun4i_dma sun4i_ss virt_dma
CPU: 1 PID: 10284 Comm: mount.nfs Not tainted 4.11.0-rc4+ #14
Hardware name: Allwinner sun7i (A20) Family
[<c010f19c>] (unwind_backtrace) from [<c010bc74>] (show_stack+0x10/0x14)
[<c010bc74>] (show_stack) from [<c03c6e24>] (dump_stack+0x78/0x8c)
[<c03c6e24>] (dump_stack) from [<c0122200>] (__warn+0xe8/0x100)
[<c0122200>] (__warn) from [<c0122250>] (warn_slowpath_fmt+0x38/0x48)
[<c0122250>] (warn_slowpath_fmt) from [<c02ac178>] (sysfs_warn_dup+0x64/0x74)
[<c02ac178>] (sysfs_warn_dup) from [<c02ac254>] (sysfs_create_dir_ns+0x84/0x94)
[<c02ac254>] (sysfs_create_dir_ns) from [<c03c8b8c>] (kobject_add_internal+0x9c/0x2ec)
[<c03c8b8c>] (kobject_add_internal) from [<c03c8e24>] (kobject_add+0x48/0x98)
[<c03c8e24>] (kobject_add) from [<c048d75c>] (device_add+0xe4/0x5a0)
[<c048d75c>] (device_add) from [<c048ddb4>] (device_create_groups_vargs+0xac/0xbc)
[<c048ddb4>] (device_create_groups_vargs) from [<c048dde4>] (device_create_vargs+0x20/0x28)
[<c048dde4>] (device_create_vargs) from [<c02075c8>] (bdi_register_va+0x44/0xfc)
[<c02075c8>] (bdi_register_va) from [<c023d378>] (super_setup_bdi_name+0x48/0xa4)
[<c023d378>] (super_setup_bdi_name) from [<c0312ef4>] (nfs_fill_super+0x1a4/0x204)
[<c0312ef4>] (nfs_fill_super) from [<c03133f0>] (nfs_fs_mount_common+0x140/0x1e8)
[<c03133f0>] (nfs_fs_mount_common) from [<c03335cc>] (nfs4_remote_mount+0x50/0x58)
[<c03335cc>] (nfs4_remote_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
[<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
[<c025cba0>] (vfs_kern_mount) from [<c033352c>] (nfs_do_root_mount+0x80/0xa0)
[<c033352c>] (nfs_do_root_mount) from [<c0333818>] (nfs4_try_mount+0x28/0x3c)
[<c0333818>] (nfs4_try_mount) from [<c0313874>] (nfs_fs_mount+0x2cc/0x8c4)
[<c0313874>] (nfs_fs_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
[<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
[<c025cba0>] (vfs_kern_mount) from [<c02600f0>] (do_mount+0x158/0xc7c)
[<c02600f0>] (do_mount) from [<c0260f98>] (SyS_mount+0x8c/0xb4)
[<c0260f98>] (SyS_mount) from [<c0107840>] (ret_fast_syscall+0x0/0x3c)
Fix the problem by always creating new bdi for a superblock as we used
to do.
Reported-and-tested-by: Corentin Labbe <[email protected]>
Fixes: 0d3b12584972ce5781179ad3f15cca3cdb5cae05
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
By poking at /debug/sched_features I triggered the following splat:
[] ======================================================
[] WARNING: possible circular locking dependency detected
[] 4.11.0-00873-g964c8b7-dirty #694 Not tainted
[] ------------------------------------------------------
[] bash/2109 is trying to acquire lock:
[] (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff8120cb8b>] static_key_slow_dec+0x1b/0x50
[]
[] but task is already holding lock:
[] (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
[]
[] which lock already depends on the new lock.
[]
[]
[] the existing dependency chain (in reverse order) is:
[]
[] -> #2 (&sb->s_type->i_mutex_key#4){+++++.}:
[] lock_acquire+0x100/0x210
[] down_write+0x28/0x60
[] start_creating+0x5e/0xf0
[] debugfs_create_dir+0x13/0x110
[] blk_mq_debugfs_register+0x21/0x70
[] blk_mq_register_dev+0x64/0xd0
[] blk_register_queue+0x6a/0x170
[] device_add_disk+0x22d/0x440
[] loop_add+0x1f3/0x280
[] loop_init+0x104/0x142
[] do_one_initcall+0x43/0x180
[] kernel_init_freeable+0x1de/0x266
[] kernel_init+0xe/0x100
[] ret_from_fork+0x31/0x40
[]
[] -> #1 (all_q_mutex){+.+.+.}:
[] lock_acquire+0x100/0x210
[] __mutex_lock+0x6c/0x960
[] mutex_lock_nested+0x1b/0x20
[] blk_mq_init_allocated_queue+0x37c/0x4e0
[] blk_mq_init_queue+0x3a/0x60
[] loop_add+0xe5/0x280
[] loop_init+0x104/0x142
[] do_one_initcall+0x43/0x180
[] kernel_init_freeable+0x1de/0x266
[] kernel_init+0xe/0x100
[] ret_from_fork+0x31/0x40
[] *** DEADLOCK ***
[]
[] 3 locks held by bash/2109:
[] #0: (sb_writers#11){.+.+.+}, at: [<ffffffff81292bcd>] vfs_write+0x17d/0x1a0
[] #1: (debugfs_srcu){......}, at: [<ffffffff8155a90d>] full_proxy_write+0x5d/0xd0
[] #2: (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
[]
[] stack backtrace:
[] CPU: 9 PID: 2109 Comm: bash Not tainted 4.11.0-00873-g964c8b7-dirty #694
[] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[] Call Trace:
[] lock_acquire+0x100/0x210
[] get_online_cpus+0x2a/0x90
[] static_key_slow_dec+0x1b/0x50
[] static_key_disable+0x20/0x30
[] sched_feat_write+0x131/0x170
[] full_proxy_write+0x97/0xd0
[] __vfs_write+0x28/0x120
[] vfs_write+0xb5/0x1a0
[] SyS_write+0x49/0xa0
[] entry_SYSCALL_64_fastpath+0x23/0xc2
This is because of the cpu hotplug lock rework. Break the chain at #1
by reversing the lock acquisition order. This way i_mutex_key#4 no
longer depends on cpu_hotplug_lock and things are good.
Cc: Jens Axboe <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Free memory correctly when an allocation fails on a loop and we free
backwards previously successful allocations.
Signed-off-by: Javier González <[email protected]>
Reviewed-by: Matias Bjørling <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Create nvme command before allocating a request using
nvme_alloc_request, which uses the command direction. Up until now, the
command has been zeroized, so all commands have been allocated as a
read operation.
Signed-off-by: Javier González <[email protected]>
Reviewed-by: Matias Bjørling <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The #ifndef was removed in 75aaafb79f73516b69d5639ad30a72d72e75c8b4,
but it was also protecting smp_send_reschedule() in kvm_vcpu_kick().
Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Macs send the maximum buffer size in response on ioctl to validate
negotiate security information, which causes us to fail the mount
as the response buffer is larger than the expected response.
Changed ioctl response processing to allow for padding of validate
negotiate ioctl response and limit the maximum response size to
maximum buffer size.
Signed-off-by: Steve French <[email protected]>
CC: Stable <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
- Minor code cleanups
- Fix section alignment for .init_array
* tag 'modules-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
kallsyms: Use bounded strnchr() when parsing string
module: Unify the return value type of try_module_get
module: set .init_array alignment to 8
|