aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-12-07blk-mq: re-build queue map in case of kdump kernelMing Lei1-2/+3
Now almost all .map_queues() implementation based on managed irq affinity doesn't update queue mapping and it just retrieves the old built mapping, so if nr_hw_queues is changed, the mapping talbe includes stale mapping. And only blk_mq_map_queues() may rebuild the mapping talbe. One case is that we limit .nr_hw_queues as 1 in case of kdump kernel. However, drivers often builds queue mapping before allocating tagset via pci_alloc_irq_vectors_affinity(), but set->nr_hw_queues can be set as 1 in case of kdump kernel, so wrong queue mapping is used, and kernel panic[1] is observed during booting. This patch fixes the kernel panic triggerd on nvme by rebulding the mapping table via blk_mq_map_queues(). [1] kernel panic log [ 4.438371] nvme nvme0: 16/0/0 default/read/poll queues [ 4.443277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 [ 4.444681] PGD 0 P4D 0 [ 4.445367] Oops: 0000 [#1] SMP NOPTI [ 4.446342] CPU: 3 PID: 201 Comm: kworker/u33:10 Not tainted 4.20.0-rc5-00664-g5eb02f7ee1eb-dirty #459 [ 4.447630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 [ 4.448689] Workqueue: nvme-wq nvme_scan_work [nvme_core] [ 4.449368] RIP: 0010:blk_mq_map_swqueue+0xfb/0x222 [ 4.450596] Code: 04 f5 20 28 ef 81 48 89 c6 39 55 30 76 93 89 d0 48 c1 e0 04 48 03 83 f8 05 00 00 48 8b 00 42 8b 3c 28 48 8b 43 58 48 8b 04 f8 <48> 8b b8 98 00 00 00 4c 0f a3 37 72 42 f0 4c 0f ab 37 66 8b b8 f6 [ 4.453132] RSP: 0018:ffffc900023b3cd8 EFLAGS: 00010286 [ 4.454061] RAX: 0000000000000000 RBX: ffff888174448000 RCX: 0000000000000001 [ 4.456480] RDX: 0000000000000001 RSI: ffffe8feffc506c0 RDI: 0000000000000001 [ 4.458750] RBP: ffff88810722d008 R08: ffff88817647a880 R09: 0000000000000002 [ 4.464580] R10: ffffc900023b3c10 R11: 0000000000000004 R12: ffff888174448538 [ 4.467803] R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000001 [ 4.469220] FS: 0000000000000000(0000) GS:ffff88817bac0000(0000) knlGS:0000000000000000 [ 4.471554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.472464] CR2: 0000000000000098 CR3: 0000000174e4e001 CR4: 0000000000760ee0 [ 4.474264] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4.476007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4.477061] PKRU: 55555554 [ 4.477464] Call Trace: [ 4.478731] blk_mq_init_allocated_queue+0x36a/0x3ad [ 4.479595] blk_mq_init_queue+0x32/0x4e [ 4.480178] nvme_validate_ns+0x98/0x623 [nvme_core] [ 4.480963] ? nvme_submit_sync_cmd+0x1b/0x20 [nvme_core] [ 4.481685] ? nvme_identify_ctrl.isra.8+0x70/0xa0 [nvme_core] [ 4.482601] nvme_scan_work+0x23a/0x29b [nvme_core] [ 4.483269] ? _raw_spin_unlock_irqrestore+0x25/0x38 [ 4.483930] ? try_to_wake_up+0x38d/0x3b3 [ 4.484478] ? process_one_work+0x179/0x2fc [ 4.485118] process_one_work+0x1d3/0x2fc [ 4.485655] ? rescuer_thread+0x2ae/0x2ae [ 4.486196] worker_thread+0x1e9/0x2be [ 4.486841] kthread+0x115/0x11d [ 4.487294] ? kthread_park+0x76/0x76 [ 4.487784] ret_from_fork+0x3a/0x50 [ 4.488322] Modules linked in: nvme nvme_core qemu_fw_cfg virtio_scsi ip_tables [ 4.489428] Dumping ftrace buffer: [ 4.489939] (ftrace buffer empty) [ 4.490492] CR2: 0000000000000098 [ 4.491052] ---[ end trace 03cd268ad5a86ff7 ]--- Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: David Milburn <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: put back rcu lock in blkcg_bio_issue_check()Dennis Zhou1-0/+3
I was a little overzealous in removing the rcu_read_lock() call from blkcg_bio_issue_check() and it broke blk-throttle. Put it back. Fixes: e35403a034bf ("blkcg: associate blkg when associating a device") Signed-off-by: Dennis Zhou <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07block: convert io-latency to use rq_qos_waitJosef Bacik1-23/+8
Now that we have this common helper, convert io-latency over to use it as well. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07block: convert wbt_wait() to use rq_qos_wait()Josef Bacik1-54/+11
Now that we have rq_qos_wait() in place, convert wbt_wait() over to using it with it's specific callbacks. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07block: add rq_qos_wait to rq_qosJosef Bacik2-0/+92
Originally when I split out the common code from blk-wbt into rq_qos I left the wbt_wait() where it was and simply copied and modified it slightly to work for io-latency. However they are both basically the same thing, and as time has gone on wbt_wait() has ended up much smarter and kinder than it was when I copied it into io-latency, which means io-latency has lost out on these improvements. Since they are the same thing essentially except for a few minor things, create rq_qos_wait() that replicates what wbt_wait() currently does with callbacks that can be passed in for the snowflakes to do their own thing as appropriate. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: rename blkg_try_get() to blkg_tryget()Dennis Zhou4-11/+8
blkg reference counting now uses percpu_ref rather than atomic_t. Let's make this consistent with css_tryget. This renames blkg_try_get to blkg_tryget and now returns a bool rather than the blkg or %NULL. Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: change blkg reference counting to use percpu_refDennis Zhou2-12/+44
Every bio is now associated with a blkg putting blkg_get, blkg_try_get, and blkg_put on the hot path. Switch over the refcnt in blkg to use percpu_ref. Signed-off-by: Dennis Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: remove bio_disassociate_task()Dennis Zhou2-12/+1
Now that a bio only holds a blkg reference, so clean up is simply putting back that reference. Remove bio_disassociate_task() as it just calls bio_disassociate_blkg() and call the latter directly. Signed-off-by: Dennis Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: remove additional reference to the cssDennis Zhou4-88/+69
The previous patch in this series removed carrying around a pointer to the css in blkg. However, the blkg association logic still relied on taking a reference on the css to ensure we wouldn't fail in getting a reference for the blkg. Here the implicit dependency on the css is removed. The association continues to rely on the tryget logic walking up the blkg tree. This streamlines the three ways that association can happen: normal, swap, and writeback. Signed-off-by: Dennis Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: remove bio->bi_css and instead use bio->bi_blkgDennis Zhou8-66/+32
Prior patches ensured that any bio that interacts with a request_queue is properly associated with a blkg. This makes bio->bi_css unnecessary as blkg maintains a reference to blkcg already. This removes the bio field bi_css and transfers corresponding uses to access via bi_blkg. Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: associate writeback bios with a blkgDennis Zhou6-11/+37
One of the goals of this series is to remove a separate reference to the css of the bio. This can and should be accessed via bio_blkcg(). In this patch, wbc_init_bio() now requires a bio to have a device associated with it. Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: associate a blkg for pages being evicted by swapDennis Zhou3-28/+42
A prior patch in this series added blkg association to bios issued by cgroups. There are two other paths that we want to attribute work back to the appropriate cgroup: swap and writeback. Here we modify the way swap tags bios to include the blkg. Writeback will be tackle in the next patch. Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: consolidate bio_issue_init() to be a part of coreDennis Zhou5-10/+11
bio_issue_init among other things initializes the timestamp for an IO. Rather than have this logic handled by policies, this consolidates it to be on the init paths (normal, clone, bounce clone). Signed-off-by: Dennis Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Liu Bo <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: associate blkg when associating a deviceDennis Zhou5-12/+14
Previously, blkg association was handled by controller specific code in blk-throttle and blk-iolatency. However, because a blkg represents a relationship between a blkcg and a request_queue, it makes sense to keep the blkg->q and bio->bi_disk->queue consistent. This patch moves association into the bio_set_dev macro(). This should cover the majority of cases where the device is set/changed keeping the two pointers consistent. Fallback code is added to blkcg_bio_issue_check() to catch any missing paths. Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07dm: set the static flush bio device on demandDennis Zhou2-1/+12
The next patch changes the macro bio_set_dev() to associate a bio with a blkg based on the device set. However, dm creates a static bio to be used as the basis for cloning empty flush bios on creation. The bio_set_dev() call in alloc_dev() will cause problems with the next patch adding association to bio_set_dev() because the call is before the bdev is associated with a gendisk (bd_disk is %NULL). To get around this, set the device on the static bio every time and use that to clone to the other bios. Signed-off-by: Dennis Zhou <[email protected]> Acked-by: Mike Snitzer <[email protected]> Cc: Alasdair Kergon <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: introduce common blkg association logicDennis Zhou4-21/+62
There are 3 ways blkg association can happen: association with the current css, with the page css (swap), or from the wbc css (writeback). This patch handles how association is done for the first case where we are associating bsaed on the current css. If there is already a blkg associated, the css will be reused and association will be redone as the request_queue may have changed. Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: convert blkg_lookup_create() to find closest blkgDennis Zhou5-38/+44
There are several scenarios where blkg_lookup_create() can fail such as the blkcg dying, request_queue is dying, or simply being OOM. Most handle this by simply falling back to the q->root_blkg and calling it a day. This patch implements the notion of closest blkg. During blkg_lookup_create(), if it fails to create, return the closest blkg found or the q->root_blkg. blkg_try_get_closest() is introduced and used during association so a bio is always attached to a blkg. Signed-off-by: Dennis Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: update blkg_lookup_create() to do lockingDennis Zhou3-5/+29
To know when to create a blkg, the general pattern is to do a blkg_lookup() and if that fails, lock and do the lookup again, and if that fails finally create. It doesn't make much sense for everyone who wants to do creation to write this themselves. This changes blkg_lookup_create() to do locking and implement this pattern. The old blkg_lookup_create() is renamed to __blkg_lookup_create(). If a call site wants to do its own error handling or already owns the queue lock, they can use __blkg_lookup_create(). This will be used in upcoming patches. Signed-off-by: Dennis Zhou <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blkcg: fix ref count issue with bio_blkcg() using task_cssDennis Zhou5-14/+102
The bio_blkcg() function turns out to be inconsistent and consequently dangerous to use. The first part returns a blkcg where a reference is owned by the bio meaning it does not need to be rcu protected. However, the third case, the last line, is problematic: return css_to_blkcg(task_css(current, io_cgrp_id)); This can race against task migration and the cgroup dying. It is also semantically different as it must be called rcu protected and is susceptible to failure when trying to get a reference to it. This patch adds association ahead of calling bio_blkcg() rather than after. This makes association a required and explicit step along the code paths for calling bio_blkcg(). In blk-iolatency, association is moved above the bio_blkcg() call to ensure it will not return %NULL. BFQ uses the old bio_blkcg() function, but I do not want to address it in this series due to the complexity. I have created a private version documenting the inconsistency and noting not to use it. Signed-off-by: Dennis Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07blk-mq: remove QUEUE_FLAG_POLL from default MQ flagsJens Axboe1-2/+1
We only support polling if we have poll queues now, but the flag is being set by default. Remove the default QUEUE_FLAG_POLL setting, we'll set it in blk_mq_init_allocated_queue() if we have poll queues available for this device. Fixes: 6544d229bf43 ("block: enable polling by default if a poll map is initalized") Reported-by: Kirill Tkhai <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07Merge branch 'skb-headroom-slab-out-of-bounds'David S. Miller2-26/+44
Stefano Brivio says: ==================== Fix slab out-of-bounds on insufficient headroom for IPv6 packets Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX. Patch 2/2 makes sure we avoid writing before the allocated buffer in neigh_hh_output() in case the headroom is enough for the unaligned hardware header size, but not enough for the aligned one, and that we warn if we hit this condition. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-12-07neighbour: Avoid writing before skb->head in neigh_hh_output()Stefano Brivio1-5/+23
While skb_push() makes the kernel panic if the skb headroom is less than the unaligned hardware header size, it will proceed normally in case we copy more than that because of alignment, and we'll silently corrupt adjacent slabs. In the case fixed by the previous patch, "ipv6: Check available headroom in ip6_xmit() even without options", we end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware header and write 16 bytes, starting 2 bytes before the allocated buffer. Always check we're not writing before skb->head and, if the headroom is not enough, warn and drop the packet. v2: - instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet (Eric Dumazet) - if we avoid the panic, though, we need to explicitly check the headroom before the memcpy(), otherwise we'll have corrupted slabs on a running kernel, after we warn - use __skb_push() instead of skb_push(), as the headroom check is already implemented here explicitly (Eric Dumazet) Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-12-07ipv6: Check available headroom in ip6_xmit() even without optionsStefano Brivio1-21/+21
Even if we send an IPv6 packet without options, MAX_HEADER might not be enough to account for the additional headroom required by alignment of hardware headers. On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL, sending short SCTP packets over IPv4 over L2TP over IPv6, we start with 100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54 bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2(). Those would be enough to append our 14 bytes header, but we're going to align that to 16 bytes, and write 2 bytes out of the allocated slab in neigh_hh_output(). KASan says: [ 264.967848] ================================================================== [ 264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70 [ 264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201 [ 264.967870] [ 264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1 [ 264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0) [ 264.967887] Call Trace: [ 264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0) [ 264.967903] [<00000000017e379c>] dump_stack+0x23c/0x290 [ 264.967912] [<00000000007bc594>] print_address_description+0xf4/0x290 [ 264.967919] [<00000000007bc8fc>] kasan_report+0x13c/0x240 [ 264.967927] [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70 [ 264.967935] [<000000000163f890>] ip6_finish_output+0x430/0x7f0 [ 264.967943] [<000000000163fe44>] ip6_output+0x1f4/0x580 [ 264.967953] [<000000000163882a>] ip6_xmit+0xfea/0x1ce8 [ 264.967963] [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8 [ 264.968033] [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core] [ 264.968037] [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth] [ 264.968041] [<0000000001220020>] dev_hard_start_xmit+0x268/0x928 [ 264.968069] [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350 [ 264.968071] [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478 [ 264.968075] [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0 [ 264.968078] [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8 [ 264.968081] [<00000000013ddd1e>] ip_output+0x226/0x4c0 [ 264.968083] [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938 [ 264.968100] [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp] [ 264.968116] [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp] [ 264.968131] [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp] [ 264.968146] [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp] [ 264.968161] [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp] [ 264.968177] [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp] [ 264.968192] [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp] [ 264.968208] [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp] [ 264.968212] [<0000000001197942>] __sys_connect+0x21a/0x450 [ 264.968215] [<000000000119aff8>] sys_socketcall+0x3d0/0xb08 [ 264.968218] [<000000000184ea7a>] system_call+0x2a2/0x2c0 [...] Just like ip_finish_output2() does for IPv4, check that we have enough headroom in ip6_xmit(), and reallocate it if we don't. This issue is older than git history. Reported-by: Jianlin Shi <[email protected]> Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-12-07tcp: lack of available data can also cause TSO deferEric Dumazet1-11/+24
tcp_tso_should_defer() can return true in three different cases : 1) We are cwnd-limited 2) We are rwnd-limited 3) We are application limited. Neal pointed out that my recent fix went too far, since it assumed that if we were not in 1) case, we must be rwnd-limited Fix this by properly populating the is_cwnd_limited and is_rwnd_limited booleans. After this change, we can finally move the silly check for FIN flag only for the application-limited case. The same move for EOR bit will be handled in net-next, since commit 1c09f7d073b1 ("tcp: do not try to defer skbs with eor mark (MSG_EOR)") is scheduled for linux-4.21 Tested by running 200 concurrent netperf -t TCP_RR -- -r 60000,100 and checking none of them was rwnd_limited in the chrono_stat output from "ss -ti" command. Fixes: 41727549de3e ("tcp: Do not underestimate rwnd_limited") Signed-off-by: Eric Dumazet <[email protected]> Suggested-by: Neal Cardwell <[email protected]> Reviewed-by: Neal Cardwell <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Reviewed-by: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-12-07Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-34/+62
Pull vhost/virtio fixes from Michael Tsirkin: "A couple of last-minute fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/vsock: fix use-after-free in network stack callers virtio/s390: fix race in ccw_io_helper() virtio/s390: avoid race on vcdev->config vhost/vsock: fix reset orphans race with close timeout
2018-12-07Merge tag 'arm64-fixes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Avoid sending IPIs with interrupts disabled" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: hibernate: Avoid sending cross-calling with interrupts disabled
2018-12-07Merge tag 'gcc-plugins-v4.20-rc6' of ↵Linus Torvalds2-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc stackleak plugin fixes from Kees Cook: - Remove tracing for inserted stack depth marking function (Anders Roxell) - Move gcc-plugin pass location to avoid objtool warnings (Alexander Popov) * tag 'gcc-plugins-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: stackleak: Register the 'stackleak_cleanup' pass before the '*free_cfg' pass stackleak: Mark stackleak_track_stack() as notrace
2018-12-07Merge branch 'linus' of ↵Linus Torvalds4-7/+13
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - Disable the new crypto stats interface as it's still being changed - Fix potential uses-after-free in cbc/cfb/pcbc. * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: user - Disable statistics interface crypto: do not free algorithm before using
2018-12-07Merge tag 'pci-v4.20-fixes-3' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Revert ASPM change that caused a regression" * tag 'pci-v4.20-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: Revert "PCI/ASPM: Do not initialize link state when aspm_disabled is set"
2018-12-07ipv6: sr: properly initialize flowi6 prior passing to ip6_route_outputShmulik Ladkani1-0/+1
In 'seg6_output', stack variable 'struct flowi6 fl6' was missing initialization. Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: Shmulik Ladkani <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-12-07Merge tag 'for-linus-20181207' of git://git.kernel.dk/linux-blockLinus Torvalds6-55/+123
Pull block fixes from Jens Axboe: "Let's try this again... We're finally happy with the DM livelock issue, and it's also passed overnight testing and the corruption regression test. The end result is much nicer now too, which is great. Outside of that fix, there's a pull request for NVMe with two small fixes, and a regression fix for BFQ from this merge window. The BFQ fix looks bigger than it is, it's 90% comment updates" * tag 'for-linus-20181207' of git://git.kernel.dk/linux-block: blk-mq: punt failed direct issue to dispatch list nvmet-rdma: fix response use after free nvme: validate controller state before rescheduling keep alive block, bfq: fix decrement of num_active_groups
2018-12-07Merge branch 'i2c/for-current-fixed' of ↵Linus Torvalds6-30/+93
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "A set of driver bugfixes for the I2C subsystem" * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: uniphier-f: fix violation of tLOW requirement for Fast-mode i2c: uniphier: fix violation of tLOW requirement for Fast-mode i2c: uniphier-f: fill TX-FIFO only in IRQ handler for repeated START i2c: uniphier-f: fix timeout error after reading 8 bytes i2c: scmi: Fix probe error on devices with an empty SMB0001 ACPI device node i2c: axxia: properly handle master timeout i2c: rcar: check bus state before reinitializing i2c: nvidia-gpu: limit reads also for combined messages i2c: nvidia-gpu: adhere to I2C fault codes
2018-12-07Merge tag 'dmaengine-fix-4.20-rc6' of ↵Linus Torvalds3-29/+62
git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "Another pull request for dmaengine. We got bunch of fixes early this week and all are tagged to stable. Hope this is last fix for this cycle: - Fix imx-sdma handling of channel terminations, this involves reverting two commits and implement async termination - Fix cppi dma channel deletion from pending list on stop - Fix FIFO size for dw controller in Intel Merrifield" * tag 'dmaengine-fix-4.20-rc6' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: dw: Fix FIFO size for Intel Merrifield dmaengine: cppi41: delete channel from pending list when stop channel dmaengine: imx-sdma: use GFP_NOWAIT for dma descriptor allocations dmaengine: imx-sdma: implement channel termination via worker Revert "dmaengine: imx-sdma: alloclate bd memory from dma pool" Revert "dmaengine: imx-sdma: Use GFP_NOWAIT for dma allocations"
2018-12-07x86/vdso: Drop implicit common-page-size linker flagNick Desaulniers1-2/+2
GNU linker's -z common-page-size's default value is based on the target architecture. arch/x86/entry/vdso/Makefile sets it to the architecture default, which is implicit and redundant. Drop it. Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Reported-by: Dmitry Golovin <[email protected]> Reported-by: Bill Wendling <[email protected]> Suggested-by: Dmitry Golovin <[email protected]> Suggested-by: Rui Ueyama <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Fangrui Song <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: x86-ml <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://bugs.llvm.org/show_bug.cgi?id=38774 Link: https://github.com/ClangBuiltLinux/linux/issues/31
2018-12-07arm64: hibernate: Avoid sending cross-calling with interrupts disabledWill Deacon1-1/+1
Since commit 3b8c9f1cdfc50 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings"), a call to flush_icache_range() will use an IPI to cross-call other online CPUs so that any stale instructions are flushed from their pipelines. This triggers a WARN during the hibernation resume path, where flush_icache_range() is called with interrupts disabled and is therefore prone to deadlock: | Disabling non-boot CPUs ... | CPU1: shutdown | psci: CPU1 killed. | CPU2: shutdown | psci: CPU2 killed. | CPU3: shutdown | psci: CPU3 killed. | WARNING: CPU: 0 PID: 1 at ../kernel/smp.c:416 smp_call_function_many+0xd4/0x350 | Modules linked in: | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc4 #1 Since all secondary CPUs have been taken offline prior to invalidating the I-cache, there's actually no need for an IPI and we can simply call __flush_icache_range() instead. Cc: <[email protected]> Fixes: 3b8c9f1cdfc50 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings") Reported-by: Kunihiko Hayashi <[email protected]> Tested-by: Kunihiko Hayashi <[email protected]> Tested-by: James Morse <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2018-12-07Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linusJens Axboe2-2/+11
Pull NVMe fixes from Christoph. * 'nvme-4.20' of git://git.infradead.org/nvme: nvmet-rdma: fix response use after free nvme: validate controller state before rescheduling keep alive
2018-12-07blk-mq: punt failed direct issue to dispatch listJens Axboe1-28/+5
After the direct dispatch corruption fix, we permanently disallow direct dispatch of non read/write requests. This works fine off the normal IO path, as they will be retried like any other failed direct dispatch request. But for the blk_insert_cloned_request() that only DM uses to bypass the bottom level scheduler, we always first attempt direct dispatch. For some types of requests, that's now a permanent failure, and no amount of retrying will make that succeed. This results in a livelock. Instead of making special cases for what we can direct issue, and now having to deal with DM solving the livelock while still retaining a BUSY condition feedback loop, always just add a request that has been through ->queue_rq() to the hardware queue dispatch list. These are safe to use as no merging can take place there. Additionally, if requests do have prepped data from drivers, we aren't dependent on them not sharing space in the request structure to safely add them to the IO scheduler lists. This basically reverts ffe81d45322c and is based on a patch from Ming, but with the list insert case covered as well. Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue") Cc: [email protected] Suggested-by: Ming Lei <[email protected]> Reported-by: Bart Van Assche <[email protected]> Tested-by: Ming Lei <[email protected]> Acked-by: Mike Snitzer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07nvmet-rdma: fix response use after freeIsrael Rukshin1-1/+2
nvmet_rdma_release_rsp() may free the response before using it at error flow. Fixes: 8407879 ("nvmet-rdma: fix possible bogus dereference under heavy load") Signed-off-by: Israel Rukshin <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-12-07nvme: validate controller state before rescheduling keep aliveJames Smart1-1/+9
Delete operations are seeing NULL pointer references in call_timer_fn. Tracking these back, the timer appears to be the keep alive timer. nvme_keep_alive_work() which is tied to the timer that is cancelled by nvme_stop_keep_alive(), simply starts the keep alive io but doesn't wait for it's completion. So nvme_stop_keep_alive() only stops a timer when it's pending. When a keep alive is in flight, there is no timer running and the nvme_stop_keep_alive() will have no affect on the keep alive io. Thus, if the io completes successfully, the keep alive timer will be rescheduled. In the failure case, delete is called, the controller state is changed, the nvme_stop_keep_alive() is called while the io is outstanding, and the delete path continues on. The keep alive happens to successfully complete before the delete paths mark it as aborted as part of the queue termination, so the timer is restarted. The delete paths then tear down the controller, and later on the timer code fires and the timer entry is now corrupt. Fix by validating the controller state before rescheduling the keep alive. Testing with the fix has confirmed the condition above was hit. Signed-off-by: James Smart <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-12-07block, bfq: fix decrement of num_active_groupsPaolo Valente3-25/+107
Since commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")', if there are process groups with I/O requests waiting for completion, then BFQ tags the scenario as 'asymmetric'. This detection is needed for preserving service guarantees (for details, see comments on the computation * of the variable asymmetric_scenario in the function bfq_better_to_idle). Unfortunately, commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")' contains an error exactly in the updating of the number of groups with I/O requests waiting for completion: if a group has more than one descendant process, then the above number of groups, which is renamed from num_active_groups to a more appropriate num_groups_with_pending_reqs by this commit, may happen to be wrongly decremented multiple times, namely every time one of the descendant processes gets all its pending I/O requests completed. A correct, complete solution should work as follows. Consider a group that is inactive, i.e., that has no descendant process with pending I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs is still accounting for this group, because the group still has some descendant process with some I/O request still in flight. num_groups_with_pending_reqs should be decremented when the in-flight request of the last descendant process is finally completed (assuming that nothing else has changed for the group in the meantime, in terms of composition of the group and active/inactive state of child groups and processes). To accomplish this, an additional pending-request counter must be added to entities, and must be updated correctly. To avoid this additional field and operations, this commit resorts to the following tradeoff between simplicity and accuracy: for an inactive group that is still counted in num_groups_with_pending_reqs, this commit decrements num_groups_with_pending_reqs when the first descendant process of the group remains with no request waiting for completion. This simplified scheme provides a fix to the unbalanced decrements introduced by 2d29c9f89fcd. Since this error was also caused by lack of comments on this non-trivial issue, this commit also adds related comments. Fixes: 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection") Reported-by: Steven Barrett <[email protected]> Tested-by: Steven Barrett <[email protected]> Tested-by: Lucjan Lucjanov <[email protected]> Reviewed-by: Federico Motta <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-07Merge tag 'gnss-4.20-rc6' of ↵Greg Kroah-Hartman2-3/+5
https://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss into char-misc-linus Johan writes: GNSS fixes for 4.20-rc6 Here's a fix for a broken activation retry loop in the sirf driver. Included are also two MAINTAINERS updates. All have been in linux-next with no reported issues. Signed-off-by: Johan Hovold <[email protected]> * tag 'gnss-4.20-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss: MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching MAINTAINERS: add gnss scm tree gnss: sirf: fix activation retry handling
2018-12-07CIFS: Avoid returning EBUSY to upper layer VFSLong Li1-25/+6
EBUSY is not handled by VFS, and will be passed to user-mode. This is not correct as we need to wait for more credits. This patch also fixes a bug where rsize or wsize is used uninitialized when the call to server->ops->wait_mtu_credits() fails. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Long Li <[email protected]> Signed-off-by: Steve French <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]>
2018-12-07crypto: user - Disable statistics interfaceHerbert Xu1-1/+1
Since this user-space API is still undergoing significant changes, this patch disables it for the current merge window. Signed-off-by: Herbert Xu <[email protected]>
2018-12-06Merge tag 'drm-fixes-2018-12-07' of git://anongit.freedesktop.org/drm/drmLinus Torvalds38-157/+235
Pull drm fixes from Dave Airlie: "There's a bit more in here than I'd like, and I'm hoping things calm down when I'm out. msm: - a bunch of display fixes for the new DPU - a couple of command submission fixes omap: - some DSI fixes ast: - driver unload crash fix core: - fix the lease uevent so userspace can distinguish it amd: - fix a bpc regression - fix lru handling regression - fixed firmware support for new GPUs - power management fixes for vega20" * tag 'drm-fixes-2018-12-07' of git://anongit.freedesktop.org/drm/drm: (37 commits) drm/ast: Fix connector leak during driver unload drm/amdgpu/vcn: Update vcn.cur_state during suspend drm/amd/display: Fix overflow/truncation from strncpy. drm/amd/powerplay: improve OD code robustness drm/amdgpu: enlarge maximum waiting time of KIQ drm/fb-helper: Fix typo in parameter description drm/amd/powerplay: support SoftMin/Max setting for some specific DPM drm/amd/powerplay: issue pre-display settings for display change event drm/amd/powerplay: support new pptable upload on Vega20 drm/amdgpu/gmc8: always load MC firmware in the driver drm/amdgpu/gmc8: update MC firmware for polaris drm/amdgpu: update mc firmware image for polaris12 variants drm/msm: Fix error return checking drm/msm/dpu: Ignore alpha for XBGR8888 format drm/msm: dpu: Fix "WARNING: invalid free of devm_ allocated data" drm/msm/hdmi: Drop pointless static qualifier in msm_hdmi_bind() drm/msm: Move fence put to where failure occurs drm/msm: dpu: Don't set legacy plane->crtc pointer drm/msm/gpu: Don't map command buffers with nr_relocs equal to 0 drm/msm/hdmi: Enable HPD after HDMI IRQ is set up ...
2018-12-06Merge tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds7-49/+73
Pull NFS client bugfixes from Trond Myklebust: "This is mainly fallout from the updates to the SUNRPC code that is being triggered from less common combinations of NFS mount options. Highlights include: Stable fixes: - Fix a page leak when using RPCSEC_GSS/krb5p to encrypt data. Bugfixes: - Fix a regression that causes the RPC receive code to hang - Fix call_connect_status() so that it handles tasks that got transmitted while queued waiting for the socket lock. - Fix a memory leak in call_encode() - Fix several other connect races. - Fix receive code error handling. - Use the discard iterator rather than MSG_TRUNC for compatibility with AF_UNIX/AF_LOCAL sockets. - nfs: don't dirty kernel pages read by direct-io - pnfs/Flexfiles fix to enforce per-mirror stateid only for NFSv4 data servers" * tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Don't force a redundant disconnection in xs_read_stream() SUNRPC: Fix up socket polling SUNRPC: Use the discard iterator rather than MSG_TRUNC SUNRPC: Treat EFAULT as a truncated message in xs_read_stream_request() SUNRPC: Fix up handling of the XDRBUF_SPARSE_PAGES flag SUNRPC: Fix RPC receive hangs SUNRPC: Fix a potential race in xprt_connect() SUNRPC: Fix a memory leak in call_encode() SUNRPC: Fix leak of krb5p encode pages SUNRPC: call_connect_status() must handle tasks that got transmitted nfs: don't dirty kernel pages read by direct-io flexfiles: enforce per-mirror stateid only for v4 DSes
2018-12-06Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds1-0/+10
Pull ARM spectre fix from Russell King: "Exynos folk noticed that CPU hotplug wasn't working with their kernel configuration, and have tested this as fixing the problem" * 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: ensure that processor vtables is not lost after boot
2018-12-06Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds4-10/+16
Pull ARM fixes from Russell King: "Some small fixes that have been accumulated: - Chris Cole noticed that in a SMP environment, the DMA cache coherence handling can produce undesirable results in a corner case - Propagate that fix for ARMv7M as well - Fix a false positive with source fortification - Fix an uninitialised return that Nathan Jones spotted" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8816/1: dma-mapping: fix potential uninitialized return ARM: 8815/1: V7M: align v7m_dma_inv_range() with v7 counterpart ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE
2018-12-06i2c: uniphier-f: fix violation of tLOW requirement for Fast-modeMasahiro Yamada1-1/+18
Currently, the clock duty is set as tLOW/tHIGH = 1/1. For Fast-mode, tLOW is set to 1.25 us while the I2C spec requires tLOW >= 1.3 us. tLOW/tHIGH = 5/4 would meet both Standard-mode and Fast-mode: Standard-mode: tLOW = 5.56 us, tHIGH = 4.44 us Fast-mode: tLOW = 1.39 us, tHIGH = 1.11 us Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2018-12-06i2c: uniphier: fix violation of tLOW requirement for Fast-modeMasahiro Yamada1-1/+7
Currently, the clock duty is set as tLOW/tHIGH = 1/1. For Fast-mode, tLOW is set to 1.25 us while the I2C spec requires tLOW >= 1.3 us. tLOW/tHIGH = 5/4 would meet both Standard-mode and Fast-mode: Standard-mode: tLOW = 5.56 us, tHIGH = 4.44 us Fast-mode: tLOW = 1.39 us, tHIGH = 1.11 us Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2018-12-06i2c: uniphier-f: fill TX-FIFO only in IRQ handler for repeated STARTMasahiro Yamada1-4/+9
- For a repeated START condition, this controller starts data transfer immediately after the slave address is written to the TX-FIFO. - Once the TX-FIFO empty interrupt is asserted, the controller makes a pause even if additional data are written to the TX-FIFO. Given those circumstances, the data after a repeated START may not be transferred if the interrupt is asserted while the TX-FIFO is being filled up. A more reliable way is to append TX data only in the interrupt handler. Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>