|
Reading from ADDR_EMPTY is out of bounds. The current code generates a
static checker warning because we check for out of bounds "lba" before
we check for ADDR_EMPTY, so the second check is always false. It looks
like we intended ADDR_EMPTY to be a no-op without printing a warning.
Fixes: a4bd217b4326 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Javier González <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
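The ordering problem described above can be sketched in plain C. This is a minimal userspace model, not the pblk code: classify_lba(), the enum, and the ADDR_EMPTY value used here are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in: the sentinel is larger than any valid lba. */
#define ADDR_EMPTY UINT64_MAX

enum lookup_result { LOOKUP_OK, LOOKUP_EMPTY, LOOKUP_OUT_OF_BOUNDS };

/*
 * Test the sentinel before the bounds check. With the checks in the
 * opposite order, ADDR_EMPTY is caught by the bounds check and the
 * sentinel test can never be true.
 */
static enum lookup_result classify_lba(uint64_t lba, uint64_t nr_lbas)
{
	if (lba == ADDR_EMPTY)
		return LOOKUP_EMPTY;		/* silent no-op */

	if (lba >= nr_lbas) {
		fprintf(stderr, "lba %llu out of bounds\n",
			(unsigned long long)lba);
		return LOOKUP_OUT_OF_BOUNDS;	/* warn */
	}

	return LOOKUP_OK;
}
```

With this ordering an empty address returns quietly and only a genuinely out-of-range lba produces a warning.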
|
|
This is a static checker fix, and perhaps not a real bug. The static
checker thinks that nr_secs could be negative. It would result in
zeroing more memory than intended. Anyway, even if it's not a bug,
changing this variable to unsigned makes the code easier to audit.
Fixes: a4bd217b4326 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Javier González <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
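Why a negative count would zero "more memory than intended" can be shown with a small sketch. The helper names are hypothetical; the point is only the conversion that happens when a signed count is turned into a byte length.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte length a memset()-style call would receive
 * when the sector count is held in a signed int. A negative count is
 * converted to a huge size_t value.
 */
static size_t bytes_to_clear_signed(int nr_secs, size_t entry_size)
{
	return (size_t)nr_secs * entry_size;
}

/* With an unsigned count there is no negative value to audit for. */
static size_t bytes_to_clear_unsigned(unsigned int nr_secs, size_t entry_size)
{
	return (size_t)nr_secs * entry_size;
}
```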
|
|
Pull networking fixes from David Miller:
1) Don't race in IPSEC dumps, from Yuejie Shi.
2) Verify lengths properly in IPSEC requests, from Herbert Xu.
3) Fix out of bounds access in ipv6 segment routing code, from David
Lebrun.
4) Don't write into the header of cloned SKBs in smsc95xx driver, from
James Hughes.
5) Several other drivers have this bug too, fix them. From Eric
Dumazet.
6) Fix access to uninitialized data in TC action cookie code, from
Wolfgang Bumiller.
7) Fix double free in IPV6 segment routing, again from David Lebrun.
8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern.
9) Fix use after free in qrtr code, from Dan Carpenter.
10) Don't double-destroy devices in ip6mr code, from Nikolay
Aleksandrov.
11) Don't pass out-of-range TX queue indices into drivers, from Tushar
Dave.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
netpoll: Check for skb->queue_mapping
ip6mr: fix notification device destruction
bpf, doc: update bpf maintainers entry
net: qrtr: potential use after free in qrtr_sendmsg()
bpf: Fix values type used in test_maps
net: ipv6: RTF_PCPU should not be settable from userspace
gso: Validate assumption of frag_list segementation
kaweth: use skb_cow_head() to deal with cloned skbs
ch9200: use skb_cow_head() to deal with cloned skbs
lan78xx: use skb_cow_head() to deal with cloned skbs
sr9700: use skb_cow_head() to deal with cloned skbs
cx82310_eth: use skb_cow_head() to deal with cloned skbs
smsc75xx: use skb_cow_head() to deal with cloned skbs
ipv6: sr: fix double free of skb after handling invalid SRH
MAINTAINERS: Add "B:" field for networking.
net sched actions: allocate act cookie early
qed: Fix issue in populating the PFC config paramters.
qed: Fix possible system hang in the dcbnl-getdcbx() path.
qed: Fix sending an invalid PFC error mask to MFW.
qed: Fix possible error in populating max_tc field.
...
|
|
Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:
if (bi->profile)
disk->queue->backing_dev_info->capabilities |=
BDI_CAP_STABLE_WRITES;
else
disk->queue->backing_dev_info->capabilities &=
~BDI_CAP_STABLE_WRITES;
It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time. This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.
Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there. This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.
Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: [email protected] # 4.4+, needs backporting
Tested-by: Dan Williams <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
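The ownership change described above can be modelled in a few lines of userspace C. This is a sketch under assumptions: struct disk_model, the model_* functions, and CAP_STABLE_WRITES are stand-ins, not the block layer's types or API.

```c
#include <stddef.h>

#define CAP_STABLE_WRITES 0x1u	/* stand-in for BDI_CAP_STABLE_WRITES */

struct disk_model {
	unsigned int capabilities;
	const void *profile;	/* non-NULL when an integrity profile is set */
};

/* Registering a profile turns stable writes on ... */
static void model_integrity_register(struct disk_model *d, const void *profile)
{
	d->profile = profile;
	d->capabilities |= CAP_STABLE_WRITES;
}

/* ... and unregistering it turns them off again. */
static void model_integrity_unregister(struct disk_model *d)
{
	d->profile = NULL;
	d->capabilities &= ~CAP_STABLE_WRITES;
}

/*
 * Revalidation no longer touches the flag, so a driver that set it
 * without using the integrity infrastructure keeps it across rescans.
 */
static void model_revalidate(struct disk_model *d)
{
	(void)d;
}
```

In this model a driver such as rbd that sets the flag itself is unaffected by model_revalidate(), while drivers that register a profile still get the flag managed for them.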
|
|
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
otherwise skbs with queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in kernel panic.
One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing the number of tx queues via
ethtool on the same device.
e.g.
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 0000000000001525
tsk->{mm,active_mm}->pgd = fff800130ff9a000
\|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
kworker/48:1(475): Oops [#1]
CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G OE
4.11.0-rc3-davem-net+ #7
Workqueue: events queue_process
task: fff80013113299c0 task.stack: fff800131132c000
TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y:
00000000 Tainted: G OE
TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3:
0000000000000001
g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7:
00000000000000c0
o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3:
0000000000000003
o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc:
000000000049ed94
RPC: <set_next_entity+0x34/0xb80>
l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3:
0000000000000000
l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7:
fff8001fa7605028
i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3:
0000000000000000
i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7:
00000000103fa4b0
I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
Call Trace:
[00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
[0000000000998c74] netpoll_start_xmit+0xf4/0x200
[0000000000998e10] queue_process+0x90/0x160
[0000000000485fa8] process_one_work+0x188/0x480
[0000000000486410] worker_thread+0x170/0x4c0
[000000000048c6b8] kthread+0xd8/0x120
[0000000000406064] ret_from_fork+0x1c/0x2c
[0000000000000000] (null)
Disabling lock debugging due to kernel taint
Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
Caller[0000000000998e10]: queue_process+0x90/0x160
Caller[0000000000485fa8]: process_one_work+0x188/0x480
Caller[0000000000486410]: worker_thread+0x170/0x4c0
Caller[000000000048c6b8]: kthread+0xd8/0x120
Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
Caller[0000000000000000]: (null)
Signed-off-by: Tushar Dave <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
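The check this message calls for can be sketched as follows. The helper name is hypothetical, and folding a stale index back into range with a modulo is one plausible policy assumed here for illustration; the sketch also assumes real_num_tx_queues is nonzero.

```c
/*
 * Hypothetical helper: keep a queue index within the number of TX
 * queues that currently exist. An skb queued before the count was
 * reduced may still carry a larger index.
 */
static unsigned int clamp_queue_index(unsigned int q_index,
				      unsigned int real_num_tx_queues)
{
	if (q_index >= real_num_tx_queues)
		q_index %= real_num_tx_queues;	/* assumed fallback policy */

	return q_index;
}
```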
|
|
Andrey Konovalov reported a BUG in the ip6mr code, caused by calling
unregister_netdevice_many for a device that is already being
destroyed. In IPv4's ipmr this was resolved a long time ago by two
commits that introduced the "notify" parameter to the delete function
and avoided the unregister when called from a notifier, so let's do
the same for ip6mr.
The trace from Andrey:
------------[ cut here ]------------
kernel BUG at net/core/dev.c:6813!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Workqueue: netns cleanup_net
task: ffff880069208000 task.stack: ffff8800692d8000
RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
FS: 0000000000000000(0000) GS:ffff88006cb00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
Call Trace:
unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
__raw_notifier_call_chain kernel/notifier.c:394
raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
call_netdevice_notifiers net/core/dev.c:1663
rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
unregister_netdevice_many net/core/dev.c:7880
default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
kthread+0x35e/0x430 kernel/kthread.c:231
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
---[ end trace e0b29c57e9b3292c ]---
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The ADT7475 and ADT7476 have the STRT bit cleared by default[1]. Before any
monitoring activities the STRT bit needs to be set. Logically this needs
to happen before any of the sensors are read so the probe() function
seems the best place for it.
[1] - https://www.onsemi.com/pub/Collateral/ADT7475-D.PDF
Signed-off-by: Chris Packham <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
The shunt voltage and current registers are signed 16-bit values so
handle them as such.
Signed-off-by: Joe Schaack <[email protected]>
Reviewed-by: Aaron Sierra <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
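A short sketch of what "handle them as such" means for a raw 16-bit register value. The function name is illustrative; the cast relies on the two's complement conversion that GCC and Clang implement.

```c
#include <stdint.h>

/*
 * Interpret a raw 16-bit register value as signed. Without the cast,
 * 0xFFFF would be read as 65535 instead of -1.
 */
static int reg_to_signed(uint16_t raw)
{
	return (int16_t)raw;
}
```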
|
|
Add various related files that have been missing under
BPF entry covering essential parts of its infrastructure
and also add myself as co-maintainer.
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If skb_pad() fails then it frees the skb so we should check for errors.
Fixes: bdabad3e363d ("net: Add Qualcomm IPC router")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Maps of per-cpu type have their value element size adjusted to 8 if it
is specified smaller during various map operations.
This makes test_maps as a 32-bit binary fail, in fact the kernel
writes past the end of the value's array on the user's stack.
To be quite honest, I think the kernel should reject creation of a
per-cpu map that doesn't have a value size of at least 8 if that's
what the kernel is going to silently adjust to later.
If the user passed something smaller, it is a sizeof() calculation
based upon the type they will actually use (just like in this testcase
code) in later calls to the map operations.
Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Signed-off-by: David S. Miller <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
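The size mismatch can be worked through with a small sketch. The helper names are hypothetical; the rounding to a multiple of 8 follows the adjustment described in the message.

```c
#include <stddef.h>

/* Per-CPU slot size after the value size is adjusted up to 8 bytes. */
static size_t percpu_slot_size(size_t value_size)
{
	return (value_size + 7) & ~(size_t)7;
}

/* Bytes written into the user buffer for a lookup across nr_cpus CPUs. */
static size_t percpu_buffer_size(size_t value_size, unsigned int nr_cpus)
{
	return percpu_slot_size(value_size) * nr_cpus;
}
```

For a 4-byte value on 4 CPUs this gives 32 bytes, while an array of four 4-byte values on the user's stack holds only 16, which is the overrun described above.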
|
|
|
|
From userspace calling ioctl(NVM_DEV_CREATE) was returning ENOMEM for
invalid arguments even though pblk (pblk_init) was returning correctly
-EINVAL to nvm_create_tgt inside core. This patch propagates the
correct return value to userspace.
Because pblk was introduced recently this only needs to go in 4.12.
Fixes: a4bd217b4326 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Rakesh Pandit <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Avoid that the following kernel bug gets triggered:
BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:349
in_atomic(): 1, irqs_disabled(): 0, pid: 8019, name: find
CPU: 10 PID: 8019 Comm: find Tainted: G W I 4.11.0-rc4-dbg+ #2
Call Trace:
dump_stack+0x68/0x93
___might_sleep+0x16e/0x230
__might_sleep+0x4a/0x80
__ext4_get_inode_loc+0x1e0/0x4e0
ext4_iget+0x70/0xbc0
ext4_iget_normal+0x2f/0x40
ext4_lookup+0xb6/0x1f0
lookup_slow+0x104/0x1e0
walk_component+0x19a/0x330
path_lookupat+0x4b/0x100
filename_lookup+0x9a/0x110
user_path_at_empty+0x36/0x40
vfs_statx+0x67/0xc0
SYSC_newfstatat+0x20/0x40
SyS_newfstatat+0xe/0x10
entry_SYSCALL_64_fastpath+0x18/0xad
This happens since the big if/else in blk_mq_make_request() doesn't
have a final else section that also drops the ctx. Add that.
Fixes: b00c53e8f411 ("blk-mq: fix schedule-while-atomic with scheduler attached")
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Omar Sandoval <[email protected]>
Added a bit more to the commit log.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Andrey reported a fault in the IPv6 route code:
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880069809600 task.stack: ffff880062dc8000
RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
RSP: 0018:ffff880062dced30 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
Call Trace:
ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
...
Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
set. Flags passed to the kernel are blindly copied to the allocated
rt6_info by ip6_route_info_create making a newly inserted route appear
as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
and expects rt->dst.from to be set - which it is not since it is not
really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
generates the fault.
Fix by checking for the flag and failing with EINVAL.
Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info")
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
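The fix amounts to validating flags at the userspace boundary. A minimal sketch, where validate_route_flags() and MODEL_RTF_PCPU (including its value) are placeholders rather than the kernel's definitions:

```c
#include <errno.h>

/* Placeholder for a kernel-internal route flag; the value is assumed. */
#define MODEL_RTF_PCPU 0x40000000u

/*
 * Reject requests that set a flag reserved for internal use instead of
 * copying it blindly into the new route.
 */
static int validate_route_flags(unsigned int flags)
{
	if (flags & MODEL_RTF_PCPU)
		return -EINVAL;

	return 0;
}
```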
|
|
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
pointer") assumes that all SKBs in a frag_list (except maybe the last
one) contain the same amount of GSO payload.
This assumption is not always correct, resulting in the following
warning message in the log:
skb_segment: too many frags
For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
one frag, and some with 2 frags.
After GRO, the frag_list SKBs end up having different amounts of payload.
If this frag_list SKB is then forwarded, the aforementioned assumption
is violated.
Validate the assumption, and fall back to software GSO if it is not true.
Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212
Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: Ilan Tayari <[email protected]>
Signed-off-by: Ilya Lesokhin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
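The assumption being validated can be expressed as a simple predicate. This sketch works on an array of payload lengths rather than on real SKBs; the function name and the array representation are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when every entry except possibly the last carries the
 * same payload length as the first one. A caller would fall back to
 * software GSO when this returns false.
 */
static bool frag_list_is_uniform(const size_t *payload_len, size_t count)
{
	for (size_t i = 1; i + 1 < count; i++) {
		if (payload_len[i] != payload_len[0])
			return false;
	}

	return true;
}
```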
|
|
Eric Dumazet says:
====================
net: use skb_cow_head() to deal with cloned skbs
James Hughes found an issue with smsc95xx driver. Same problematic code
is found in other drivers.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
We can use skb_cow_head() to properly deal with clones,
especially the ones coming from TCP stack that allow their head being
modified. This avoids a copy.
Signed-off-by: Eric Dumazet <[email protected]>
Cc: James Hughes <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: James Hughes <[email protected]>
Cc: Matthew Garrett <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: James Hughes <[email protected]>
Cc: Woojung Huh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: James Hughes <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: cc28a20e77b2 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: James Hughes <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: James Hughes <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The icmpv6_param_prob() function already does a kfree_skb(),
this patch removes the duplicate one.
Fixes: 1ababeba4a21f3dba3da3523c670b207fb2feb62 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
Reported-by: Dan Carpenter <[email protected]>
Cc: Dan Carpenter <[email protected]>
Signed-off-by: David Lebrun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Christoph writes:
This is the current NVMe pile: virtualization extensions, lots of FC
updates and various misc bits. There are a few more FC bits that didn't
make the cut, but we'd like to get this request out before the merge
window for sure.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Just two fixes.
The first fixes kprobing a stdu, and is marked for stable as it's been
broken for ~ever. In hindsight this could have gone in next.
The other is a fix for a change we merged this cycle, where if we take
a certain exception when the kernel is running relocated (currently
only used for kdump), we checkstop the box.
Thanks to Ravi Bangoria"
* tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"Sorry this is so late. It's been in -next for over a week, but I
forgot to send it on until now.
A single fix to the DT binding of the HiSilicon PCIe host support"
* tag 'pci-v4.11-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: hisi: Fix DT binding (hisi-pcie-almost-ecam)
|
|
Pull block layer fixes from Jens Axboe:
"A couple of last minute fixes for regressions in this cycle. More
specifically:
- Two patches from Andy, adjusting the NVMe APST quirks to avoid some
issues specific to one Toshiba drive, and some variant of Samsung
on two specific Dell laptops.
- A fix for mtip32xx, turning off mq scheduling on that device. We
have a real fix for this, but it's too late in the cycle.
Thankfully we already have a NO_SCHED flag we can apply here. A
prep patch for this is ensuring that we honor the NO_SCHED flag
when attempting to online switch schedulers, previously we only did
so for drive load time. From Ming.
- Fixing an oops in blk-mq polling with scheduling attached. This one
is easily reproducible, it would be a shame to release 4.11 with
that issue. From me.
I'd prefer not having to send in patches at this point in time, but
the above are all things that have regressed in this cycle and the
fixes are relatively straightforward"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: fix potential oops with polling and blk-mq scheduler
nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA"
nvme: Adjust the Samsung APST quirk
mtip32xx: pass BLK_MQ_F_NO_SCHED
block: respect BLK_MQ_F_NO_SCHED
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI build fix from Rafael Wysocki:
"This avoids a false-positive build warning from the compiler.
Specifics:
- Avoid a false-positive warning regarding a variable that may not be
initialized that started to trigger after a previous general build
fix (Arnd Bergmann)"
* tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / power: Avoid maybe-uninitialized warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- kmalloc sdio scratch buffer to make it DMA-friendly
MMC host:
- dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used
- sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50
cards"
* tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card
mmc: dw_mmc: Don't allow Runtime PM for SDIO cards
mmc: sdio: fix alignment issue in struct sdio_func
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixlet from Dmitry Torokhov:
"An update to Elan PS/2 driver to allow working on yet another
Lifebook"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled
|
|
If skb_pad() fails then it frees skb and we don't need to free it again
at the end of the function.
Fixes: dc7bf5d7 ("HSI: Introduce driver for SSI Protocol")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Sebastian Reichel <[email protected]>
|
|
We need to get the command payload from the request before
we attempt to dereference it.
Fixes: 4dda4735c581 ("mtip32xx: add a status field to struct mtip_cmd")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We want people to report bugs to the netdev list.
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently most IOs that return NVMe error codes are retried on the
other path if they return EIO from the NVMe driver. This patch lets
multipath distinguish NVMe media error codes and some generic or
command-specific NVMe error codes so that multipath will not retry
those kinds of IO, to save bandwidth.
Signed-off-by: Junxiong Guan <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
If an IO timeout occurs, it's helpful to know if the controller did not
post a completion or the driver missed an interrupt. While we never expect
the latter, this patch will make it possible to tell the difference so
we don't have to guess.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
The FC-NVME spec revised syntax to avoid comma separators.
Sync with the change in the parser for traddr on port attachments.
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
|
|
Remoteport teardown never aborted the LS operations. Add support.
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
|
|
Link LS's on the remoteport rather than the controller. LS's are
between nport's. Makes more sense, especially on async teardown where
the controller is torn down regardless of the LS (LS is more of a notifier
to the target of the teardown), to have them on the remoteport.
While revising ls send/done routines, issues were seen relative to
refcounting and cleanup, especially in async path. Reworked these code
paths.
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
|
|
Add missing reference in add_port
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
|
|
target transport:
----------------------
There are cases when there is a need to abort in-progress target
operations (writedata) so that controller termination or errors can
clean up. That can't happen currently as the abort is another target
op type, so it can't be used till the running one finishes (and it may
not). Solve by removing the abort op type and creating a separate
downcall from the transport to the lldd to request an io to be aborted.
The transport will abort ios on queue teardown or io errors. In general
the transport tries to call the lldd abort only when the io state is
idle. Meaning: ops that transmit data (readdata or rsp) will always
finish their transmit (or the lldd will see a state on the
link or initiator port that fails the transmit) and the done call for
the operation will occur. The transport will wait for the op done
upcall before calling the abort function, and as the io is idle, the
io can be cleaned up immediately after the abort call; Similarly, ios
that are not waiting for data or transmitting data must be in the nvmet
layer being processed. The transport will wait for the nvmet layer
completion before calling the abort function, and as the io is idle,
the io can be cleaned up immediately after the abort call; As for ops
that are waiting for data (writedata), they may be outstanding
indefinitely if the lldd doesn't see a condition where the initiator
port or link is bad. In those cases, the transport will call the abort
function and wait for the lldd's op done upcall for the operation, where
it will then clean up the io.
Additionally, if a lldd receives an ABTS and matches it to an outstanding
request in the transport, a new transport upcall was created to abort
the outstanding request in the transport. The transport expects any
outstanding op call (readdata or writedata) will be completed by the lldd and
the operation upcall made. The transport doesn't act on the reported
abort (e.g. clean up the io) until an op done upcall occurs, a new op is
attempted, or the nvmet layer completes the io processing.
fcloop:
----------------------
Updated to support the new target apis.
On fcp io aborts from the initiator, the loopback context is updated to
NULL out the half that has completed. The initiator side is immediately
called after the abort request with an io completion (abort status).
On fcp io aborts from the target, the io is stopped and the initiator side
sees it as an aborted io. Target side ops, perhaps in progress while the
initiator side is done, continue but noop the data movement as there's no
structure on the initiator side to reference.
patch also contains:
----------------------
Revised lpfc to support the new abort api
commonized rsp buffer syncing and nulling of private data based on
calling paths.
errors in op done calls don't take action on the fod. They're bad
operations which implies the fod may be bad.
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
|
|
Current design has the fcloop job struct, used for both initiator and
target processing, allocated as part of the initiator request structure.
On aborts, the initiator side (based on the request) may terminate, yet
the target side wants to continue processing. The target side can't do
that if the initiator side goes away.
Revise fcloop to allocate an independent target side structure when it
starts an io from the initiator.
Added a lock to the request struct as well to synchronize pointer updates
on abort calls.
Modified target downcalls to recognize conditions where initiator has
aborted the io (thus nulled the pointer between job structs), thus
avoid referencing sgl lists which are gone and no longer making upcalls
to the initiator.
In conditions where the targetport is no longer connected, have the
initiator return an access failure rather than simulating a command
completion.
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
|
|
With the advent of the opdone calls changing context, the lldd can no
longer assume that once the op->done call returns for RSP operations
that the request struct is no longer being accessed.
As such, revise the lldd api for a req_release callback that the
transport will call when the job is complete. This will also be used
with abort cases.
Fixed text in api header for change in io complete semantics.
Revised lpfc to support the new req_release api.
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
|
|
Two new feature flags were added to control whether upcalls to the
transport result in context switches or stay in the calling context.
NVMET_FCTGTFEAT_CMD_IN_ISR:
By default, if the flag is not set, the transport assumes the
lldd is in a non-isr context and in the cpu context it should be
for the io queue. As such, the cmd handler is called directly in the
calling context.
If the flag is set, indicating the upcall is an isr context, the
transport mandates a transition to a workqueue. The workqueue assigned
to the queue is used for the context.
NVMET_FCTGTFEAT_OPDONE_IN_ISR
By default, if the flag is not set, the transport assumes the
lldd is in a non-isr context and in the cpu context it should be
for the io queue. As such, the fcp operation done callback is called
directly in the calling context.
If the flag is set, indicating the upcall is an isr context, the
transport mandates a transition to a workqueue. The workqueue assigned
to the queue is used for the context.
Updated lpfc for flags
Signed-off-by: James Smart <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
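The decision the two flags control can be sketched as below. The MODEL_* flag values, the enum, and the function names are placeholders chosen for illustration and do not reflect the values or structure used by the transport.

```c
/* Placeholder bit values; the real flag definitions may differ. */
#define MODEL_FEAT_CMD_IN_ISR		(1u << 0)
#define MODEL_FEAT_OPDONE_IN_ISR	(1u << 1)

enum dispatch { DISPATCH_DIRECT, DISPATCH_WORKQUEUE };

/* Command upcall: call directly unless the lldd says it is in an isr. */
static enum dispatch cmd_dispatch(unsigned int features)
{
	return (features & MODEL_FEAT_CMD_IN_ISR) ?
		DISPATCH_WORKQUEUE : DISPATCH_DIRECT;
}

/* Op done upcall: same rule, controlled by its own flag. */
static enum dispatch opdone_dispatch(unsigned int features)
{
	return (features & MODEL_FEAT_OPDONE_IN_ISR) ?
		DISPATCH_WORKQUEUE : DISPATCH_DIRECT;
}
```

Keeping the two flags independent lets an lldd defer only the upcalls it actually makes from interrupt context.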
|
|
This is safer as it doesn't rely on the data being stored in
a single page in an sgl.
It also aids our effort to start phasing out users of sg_page. See [1].
For this we kmalloc some memory, copy to it and free at the end. Note:
we can't allocate this memory on the stack as the kbuild test robot
reports some frame size overflows on i386.
[1] https://lwn.net/Articles/720053/
Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
|
|
This change provides a mechanism to reduce the number of MMIO doorbell
writes for the NVMe driver. When running in a virtualized environment
like QEMU, the cost of an MMIO is quite hefty. The main idea of
the patch is to provide the device with two memory locations:
1) to store the doorbell values so they can be looked up without the doorbell
MMIO write
2) to store an event index.
I believe the doorbell value is obvious, the event index not so much.
Similar to the virtio specification, the virtual device can tell the
driver (guest OS) not to write MMIO unless you are writing past this
value.
FYI: doorbell values are written by the nvme driver (guest OS) and the
event index is written by the virtual device (host OS).
The patch implements a new admin command that will communicate where
these two memory locations reside. If the command fails, the nvme
driver will work as before without any optimizations.
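A minimal sketch of the "only write MMIO when moving past the event index" check, assuming 16-bit indices and the wraparound comparison used by the virtio event-index convention. The names `need_mmio()` and `update_doorbell()` are hypothetical, and the memory barriers a real driver needs between the shadow write and the event index read are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the doorbell moved past event_idx, i.e. event_idx lies in
 * [old_idx, new_idx) modulo 2^16. The casts make the unsigned wraparound
 * explicit after C's integer promotion.
 */
static bool need_mmio(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/*
 * Driver side: always record the new value in the shadow doorbell (memory
 * location 1), then consult the event index written by the device (memory
 * location 2) to decide whether the MMIO write is required.
 */
static bool update_doorbell(uint16_t *shadow_db, const uint16_t *event_idx,
                            uint16_t new_value)
{
    uint16_t old_value = *shadow_db;

    *shadow_db = new_value;
    return need_mmio(*event_idx, new_value, old_value);
}
```

For example, moving the doorbell from 3 to 5 requires an MMIO write if the event index is 3 or 4, but not if it is 5 or 2.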
Contributions:
Eric Northup <[email protected]>
Frank Swiderski <[email protected]>
Ted Tso <[email protected]>
Keith Busch <[email protected]>
Just to give an idea of the performance boost with the vendor
extension: running fio [1] with a stock NVMe driver I get about 200K read
IOPS; with my vendor patch I get about 1000K read IOPS. This was
running with a null device, i.e. the backing device simply returned
success on every read IO request.
[1] Running on a 4 core machine:
fio --time_based --name=benchmark --runtime=30
--filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32
--direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4
--rw=randread --blocksize=4k --randrepeat=false
Signed-off-by: Rob Nelson <[email protected]>
[mlin: port for upstream]
Signed-off-by: Ming Lin <[email protected]>
[koike: updated for upstream]
Signed-off-by: Helen Koike <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
The QPRIO field is only valid if weighted round robin arbitration is used,
and this driver doesn't enable that controller configuration option.
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
|
|
No point in providing and exporting this helper. There's just
one (real) user of it, just use rq_data_dir().
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fengguang Wu's zero day bot triggered a stack unwinder dump. This can
be easily triggered when CONFIG_FRAME_POINTERS is enabled and -mfentry
is in use on x86_32.
># cd /sys/kernel/debug/tracing
># echo 'p:schedule schedule' > kprobe_events
># echo stacktrace > events/kprobes/schedule/trigger
This is because the code that implemented fentry in the ftrace_regs_caller
tried to use the least amount of #ifdefs, and modified ebp when
CC_USING_FENTRY was defined to point to the parent ip as it does when
CC_USING_FENTRY is not defined. But when CONFIG_FRAME_POINTERS is set, it
corrupts the ebp register for this frame while doing the tracing.
NOTE, it does not corrupt ebp in any other way. It is just a bad frame
pointer when calling into the tracing infrastructure. The original ebp is
restored before returning from the fentry call. But if a stack trace is
performed inside the tracing, the unwinder will notice the bad ebp.
Instead of toying with ebp with CC_USING_FENTRY, just slap the parent ip
into the second parameter (%edx), and have an #else that does it the
original way.
The unwinder will unfortunately miss the function being traced, as the
stack frame is not set up yet for it, as it is for x86_64. But fixing that
is a bit more complex and did not work before anyway.
This has been tested with and without FRAME_POINTERS being set while using
-mfentry, as well as using an older compiler that uses mcount.
Analyzed-by: Josh Poimboeuf <[email protected]>
Fixes: 644e0e8dc76b ("x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Link: https://lists.01.org/pipermail/lkp/2017-April/006165.html
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
I lacked the basic understanding of what segments mean, so we were being
limited to 512 KiB requests even with higher max_sectors sizes set.
Setting the maximum number of segments to unlimited allows us to
actually have arbitrarily large IOs go through NBD.
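The arithmetic behind the 512 KiB ceiling can be sketched as below. It assumes a default limit of 128 segments and a worst case of one 4 KiB page per segment. Both numbers are assumptions for illustration, and `max_request_bytes()` is a hypothetical helper, not a block layer function.

```c
#include <stdint.h>

/*
 * Worst-case request size: the smaller of the max_sectors limit
 * (512-byte sectors) and the segment limit times the bytes per segment.
 */
static uint64_t max_request_bytes(uint64_t max_sectors, uint64_t max_segments,
                                  uint64_t segment_bytes)
{
    uint64_t by_sectors = max_sectors * 512;
    uint64_t by_segments = max_segments * segment_bytes;

    return by_sectors < by_segments ? by_sectors : by_segments;
}
```

With max_sectors raised to 65536 (32 MiB), 128 segments of 4096 bytes still cap a request at 128 * 4096 = 524288 bytes, i.e. 512 KiB. Once the segment limit is large enough, max_sectors becomes the effective bound.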
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|