aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-12-07block, bfq: fix decrement of num_active_groupsPaolo Valente3-25/+107
Since commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")', if there are process groups with I/O requests waiting for completion, then BFQ tags the scenario as 'asymmetric'. This detection is needed for preserving service guarantees (for details, see comments on the computation * of the variable asymmetric_scenario in the function bfq_better_to_idle). Unfortunately, commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")' contains an error exactly in the updating of the number of groups with I/O requests waiting for completion: if a group has more than one descendant process, then the above number of groups, which is renamed from num_active_groups to a more appropriate num_groups_with_pending_reqs by this commit, may happen to be wrongly decremented multiple times, namely every time one of the descendant processes gets all its pending I/O requests completed. A correct, complete solution should work as follows. Consider a group that is inactive, i.e., that has no descendant process with pending I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs is still accounting for this group, because the group still has some descendant process with some I/O request still in flight. num_groups_with_pending_reqs should be decremented when the in-flight request of the last descendant process is finally completed (assuming that nothing else has changed for the group in the meantime, in terms of composition of the group and active/inactive state of child groups and processes). To accomplish this, an additional pending-request counter must be added to entities, and must be updated correctly. To avoid this additional field and operations, this commit resorts to the following tradeoff between simplicity and accuracy: for an inactive group that is still counted in num_groups_with_pending_reqs, this commit decrements num_groups_with_pending_reqs when the first descendant process of the group remains with no request waiting for completion. This simplified scheme provides a fix to the unbalanced decrements introduced by 2d29c9f89fcd. Since this error was also caused by lack of comments on this non-trivial issue, this commit also adds related comments. Fixes: 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection") Reported-by: Steven Barrett <[email protected]> Tested-by: Steven Barrett <[email protected]> Tested-by: Lucjan Lucjanov <[email protected]> Reviewed-by: Federico Motta <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-12-04blk-mq: fix corruption with direct issueJens Axboe1-1/+25
If we attempt a direct issue to a SCSI device, and it returns BUSY, then we queue the request up normally. However, the SCSI layer may have already setup SG tables etc for this particular command. If we later merge with this request, then the old tables are no longer valid. Once we issue the IO, we only read/write the original part of the request, not the new state of it. This causes data corruption, and is most often noticed with the file system complaining about the just read data being invalid: [ 235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256) because most of it is garbage... This doesn't happen from the normal issue path, as we will simply defer the request to the hardware queue dispatch list if we fail. Once it's on the dispatch list, we never merge with it. Fix this from the direct issue path by flagging the request as REQ_NOMERGE so we don't change the size of it before issue. See also: https://bugzilla.kernel.org/show_bug.cgi?id=201685 Tested-by: Guenter Roeck <[email protected]> Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'") Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2018-12-03libata: whitelist all SAMSUNG MZ7KM* solid-state disksJuha-Matti Tilli1-0/+1
These devices support read zero after trim (RZAT), as they advertise to the OS. However, the OS doesn't believe the SSDs unless they are explicitly whitelisted. Acked-by: Martin K. Petersen <[email protected]> Signed-off-by: Juha-Matti Tilli <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-30Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linusJens Axboe4-4/+11
Pull NVMe fixes from Christoph: "Various fixlets all over." * 'nvme-4.20' of git://git.infradead.org/nvme: nvme-rdma: fix double freeing of async event data nvme: flush namespace scanning work just before removing namespaces nvme: warn when finding multi-port subsystems without multipathing enabled nvme-pci: fix surprise removal nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request() nvme: Free ctrl device name on init failure
2018-11-30block: fix single range discard mergeMing Lei1-1/+1
There are actually two kinds of discard merge: - one is the normal discard merge, just like normal read/write request, and call it single-range discard - another is the multi-range discard, queue_max_discard_segments(rq->q) > 1 For the former case, queue_max_discard_segments(rq->q) is 1, and we should handle this kind of discard merge like the normal read/write request. This patch fixes the following kernel panic issue[1], which is caused by not removing the single-range discard request from elevator queue. Guangwu has one raid discard test case, in which this issue is a bit easier to trigger, and I verified that this patch can fix the kernel panic issue in Guangwu's test case. [1] kernel panic log from Jens's report BUG: unable to handle kernel NULL pointer dereference at 0000000000000148 PGD 0 P4D 0. Oops: 0000 [#1] SMP PTI CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \ 4.20.0-rc3-00649-ge64d9a554a91-dirty #14 Hardware name: Wiwynn \ Leopard-Orv2/Leopard-DDR BW, BIOS LBM08 03/03/2017 Workqueue: kblockd \ blk_mq_run_work_fn RIP: \ 0010:blk_mq_get_driver_tag+0x81/0x120 Code: 24 \ 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \ 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \ f6 87 b0 00 00 00 02 RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246 \ RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8 RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000 R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300 R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000 FS: 0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: blk_mq_dispatch_rq_list+0xec/0x480 ? elv_rb_del+0x11/0x30 blk_mq_do_dispatch_sched+0x6e/0xf0 blk_mq_sched_dispatch_requests+0xfa/0x170 __blk_mq_run_hw_queue+0x5f/0xe0 process_one_work+0x154/0x350 worker_thread+0x46/0x3c0 kthread+0xf5/0x130 ? process_one_work+0x350/0x350 ? kthread_destroy_worker+0x50/0x50 ret_from_fork+0x1f/0x30 Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \ kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \ cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \ button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \ nvme_core fuse sg loop efivarfs autofs4 CR2: 0000000000000148 \ ---[ end trace 340a1fb996df1b9b ]--- RIP: 0010:blk_mq_get_driver_tag+0x81/0x120 Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \ 00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 \ 20 72 37 f6 87 b0 00 00 00 02 Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached") Reported-by: Jens Axboe <[email protected]> Cc: Guangwu Zhang <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jianchao Wang <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-30nvme-rdma: fix double freeing of async event dataPrabhath Sajeepa1-0/+2
Some error paths in configuration of admin queue free data buffer associated with async request SQE without resetting the data buffer pointer to NULL, This buffer is also freed up again if the controller is shutdown or reset. Signed-off-by: Prabhath Sajeepa <[email protected]> Reviewed-by: Roland Dreier <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-11-30nvme: flush namespace scanning work just before removing namespacesSagi Grimberg1-1/+3
nvme_stop_ctrl can be called also for reset flow and there is no need to flush the scan_work as namespaces are not being removed. This can cause deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers before controller teardown (and specifically I/O cancellation of the scan_work itself) takes place, but the scan_work will be blocked anyways so there is no need to flush it. Instead, move scan_work flush to nvme_remove_namespaces() where it really needs to flush. Reported-by: Ming Lei <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed by: James Smart <[email protected]> Tested-by: Ewan D. Milne <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-11-30nvme: warn when finding multi-port subsystems without multipathing enabledChristoph Hellwig1-0/+3
Without CONFIG_NVME_MULTIPATH enabled a multi-port subsystem might show up as invididual devices and cause problems, warn about it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2018-11-30fs: fix lost error code in dio_completeMaximilian Heyne1-2/+2
commit e259221763a40403d5bb232209998e8c45804ab8 ("fs: simplify the generic_write_sync prototype") reworked callers of generic_write_sync(), and ended up dropping the error return for the directio path. Prior to that commit, in dio_complete(), an error would be bubbled up the stack, but after that commit, errors passed on to dio_complete were eaten up. This was reported on the list earlier, and a fix was proposed in https://lore.kernel.org/lkml/[email protected]/, but never followed up with. We recently hit this bug in our testing where fencing io errors, which were previously erroring out with EIO, were being returned as success operations after this commit. The fix proposed on the list earlier was a little short -- it would have still called generic_write_sync() in case `ret` already contained an error. This fix ensures generic_write_sync() is only called when there's no pending error in the write. Additionally, transferred is replaced with ret to bring this code in line with other callers. Fixes: e259221763a4 ("fs: simplify the generic_write_sync prototype") Reported-by: Ravi Nankani <[email protected]> Signed-off-by: Maximilian Heyne <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> CC: Torsten Mehlan <[email protected]> CC: Uwe Dannowski <[email protected]> CC: Amit Shah <[email protected]> CC: David Woodhouse <[email protected]> CC: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2018-11-27nvme-pci: fix surprise removalIgor Konopko1-1/+1
When a PCIe NVMe device is not present, nvme_dev_remove_admin() calls blk_cleanup_queue() on the admin queue, which frees the hctx for that queue. Moments later, on the same path nvme_kill_queues() calls blk_mq_unquiesce_queue() on admin queue and tries to access hctx of it, which leads to following OOPS: Oops: 0000 [#1] SMP PTI RIP: 0010:sbitmap_any_bit_set+0xb/0x40 Call Trace: blk_mq_run_hw_queue+0xd5/0x150 blk_mq_run_hw_queues+0x3a/0x50 nvme_kill_queues+0x26/0x50 nvme_remove_namespaces+0xb2/0xc0 nvme_remove+0x60/0x140 pci_device_remove+0x3b/0xb0 Fixes: cb4bfda62afa2 ("nvme-pci: fix hot removal during error handling") Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-11-27nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request()Ewan D. Milne1-1/+1
__nvme_fc_init_request() invokes memset() on the nvme_fcp_op_w_sgl structure, which NULLed-out the nvme_req(req)->ctrl field previously set by nvme_fc_init_request(). This apparently was not referenced until commit faf4a44fff ("nvme: support traffic based keep-alive") which now results in a crash in nvme_complete_rq(): [ 8386.897130] RIP: 0010:panic+0x220/0x26c [ 8386.901406] Code: 83 3d 6f ee 72 01 00 74 05 e8 e8 54 02 00 48 c7 c6 40 fd 5b b4 48 c7 c7 d8 8d c6 b3 31e [ 8386.922359] RSP: 0018:ffff99650019fc40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 [ 8386.930804] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006 [ 8386.938764] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8e325f8168b0 [ 8386.946725] RBP: ffff99650019fcb0 R08: 0000000000000000 R09: 00000000000004f8 [ 8386.954687] R10: 0000000000000000 R11: ffff99650019f9b8 R12: ffffffffb3c55f3c [ 8386.962648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 [ 8386.970613] oops_end+0xd1/0xe0 [ 8386.974116] no_context+0x1b2/0x3c0 [ 8386.978006] do_page_fault+0x32/0x140 [ 8386.982090] page_fault+0x1e/0x30 [ 8386.985786] RIP: 0010:nvme_complete_rq+0x65/0x1d0 [nvme_core] [ 8386.992195] Code: 41 bc 03 00 00 00 74 16 0f 86 c3 00 00 00 66 3d 83 00 41 bc 06 00 00 00 0f 85 e7 00 000 [ 8387.013147] RSP: 0018:ffff99650019fe18 EFLAGS: 00010246 [ 8387.018973] RAX: 0000000000000000 RBX: ffff8e322ae51280 RCX: 0000000000000001 [ 8387.026935] RDX: 0000000000000400 RSI: 0000000000000001 RDI: ffff8e322ae51280 [ 8387.034897] RBP: ffff8e322ae51280 R08: 0000000000000000 R09: ffffffffb2f0b890 [ 8387.042859] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ 8387.050821] R13: 0000000000000100 R14: 0000000000000004 R15: ffff8e2b0446d990 [ 8387.058782] ? swiotlb_unmap_page+0x40/0x40 [ 8387.063448] nvme_fc_complete_rq+0x2d/0x70 [nvme_fc] [ 8387.068986] blk_done_softirq+0xa1/0xd0 [ 8387.073264] __do_softirq+0xd6/0x2a9 [ 8387.077251] run_ksoftirqd+0x26/0x40 [ 8387.081238] smpboot_thread_fn+0x10e/0x160 [ 8387.085807] kthread+0xf8/0x130 [ 8387.089309] ? sort_range+0x20/0x20 [ 8387.093198] ? kthread_stop+0x110/0x110 [ 8387.097475] ret_from_fork+0x35/0x40 [ 8387.101462] ---[ end trace 7106b0adf5e422f8 ]--- Fixes: faf4a44fff ("nvme: support traffic based keep-alive") Signed-off-by: Ewan D. Milne <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-11-27nvme: Free ctrl device name on init failureKeith Busch1-1/+1
Free the kobject name that was allocated for the controller device on failure rather than its parent. Fixes: d22524a4782a9 ("nvme: switch controller refcounting to use struct device") Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-11-21Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linusJens Axboe1-10/+63
Pull NVMe fix from Christoph. * 'nvme-4.20' of git://git.infradead.org/nvme: nvme-fc: resolve io failures during connect
2018-11-15nvme-fc: resolve io failures during connectJames Smart1-10/+63
If an io error occurs on an io issued while connecting, recovery of the io falls flat as the state checking ends up nooping the error handler. Create an err_work work item that is scheduled upon an io error while connecting. The work thread terminates all io on all queues and marks the queues as not connected. The termination of the io will return back to the callee, which will then back out of the connection attempt and will reschedule, if possible, the connection attempt. The changes: - in case there are several commands hitting the error handler, a state flag is kept so that the error work is only scheduled once, on the first error. The subsequent errors can be ignored. - The calling sequence to stop keep alive and terminate the queues and their io is lifted from the reset routine. Made a small service routine used by both reset and err_work. - During debugging, found that the teardown path can reference an uninitialized pointer, resulting in a NULL pointer oops. The aen_ops weren't initialized yet. Add validation on their initialization before calling the teardown routine. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-11-14SCSI: fix queue cleanup race before queue initialization is doneMing Lei2-3/+10
c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has already fixed this race, however the implied synchronize_rcu() in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused performance regression. Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") tried to quiesce queue for avoiding unnecessary synchronize_rcu() only when queue initialization is done, because it is usual to see lots of inexistent LUNs which need to be probed. However, turns out it isn't safe to quiesce queue only when queue initialization is done. Because when one SCSI command is completed, the user of sending command can be waken up immediately, then the scsi device may be removed, meantime the run queue in scsi_end_request() is still in-progress, so kernel panic can be caused. In Red Hat QE lab, there are several reports about this kind of kernel panic triggered during kernel booting. This patch tries to address the issue by grabing one queue usage counter during freeing one request and the following run queue. Fixes: 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") Cc: Andrew Jones <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: [email protected] Cc: Martin K. Petersen <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: stable <[email protected]> Cc: jianchao.wang <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-14block: fix 32 bit overflow in __blkdev_issue_discard()Dave Chinner1-1/+3
A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to fall into an endless loop in the discard code. The test is creating a device that is exactly 2^32 sectors in size to test mkfs boundary conditions around the 32 bit sector overflow region. mkfs issues a discard for the entire device size by default, and hence this throws a sector count of 2^32 into blkdev_issue_discard(). It takes the number of sectors to discard as a sector_t - a 64 bit value. The commit ba5d73851e71 ("block: cleanup __blkdev_issue_discard") takes this sector count and casts it to a 32 bit value before comapring it against the maximum allowed discard size the device has. This truncates away the upper 32 bits, and so if the lower 32 bits of the sector count is zero, it starts issuing discards of length 0. This causes the code to fall into an endless loop, issuing a zero length discards over and over again on the same sector. Fixes: ba5d73851e71 ("block: cleanup __blkdev_issue_discard") Tested-by: Darrick J. Wong <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Killed pointless WARN_ON(). Signed-off-by: Jens Axboe <[email protected]>
2018-11-12libata: blacklist SAMSUNG MZ7TD256HAFV-000L9 SSDDiego Viola1-1/+1
med_power_with_dipm still causes freezes after updating the firmware to the latest version (DXT04L5Q). Set model_rev to NULL and blacklist the device. Cc: [email protected] Signed-off-by: Diego Viola <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-12block: copy ioprio in __bio_clone_fast() and bounceHannes Reinecke2-0/+2
We need to copy the io priority, too; otherwise the clone will run with a different priority than the original one. Fixes: 43b62ce3ff0a ("block: move bio io prio to a new field") Signed-off-by: Hannes Reinecke <[email protected]> Signed-off-by: Jean Delvare <[email protected]> Fixed up subject, and ordered stores. Signed-off-by: Jens Axboe <[email protected]>
2018-11-12kyber: fix wrong strlcpy() size in trace_kyber_latency()Omar Sandoval1-4/+4
When copying to the latency type, we should be passing LATENCY_TYPE_LEN, not DOMAIN_LEN (this isn't a problem in practice because we only pass "total" or "I/O"). Fix it by changing all of the strlcpy() calls to use sizeof(). Fixes: 6c3b7af1c975 ("kyber: add tracepoints") Reported-by: Jordan Glover <[email protected]> Tested-by: Jordan Glover <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-10floppy: fix race condition in __floppy_read_block_0()Jens Axboe1-1/+2
LKP recently reported a hang at bootup in the floppy code: [ 245.678853] INFO: task mount:580 blocked for more than 120 seconds. [ 245.679906] Tainted: G T 4.19.0-rc6-00172-ga9f38e1 #1 [ 245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 245.682181] mount D 6372 580 1 0x00000004 [ 245.683023] Call Trace: [ 245.683425] __schedule+0x2df/0x570 [ 245.683975] schedule+0x2d/0x80 [ 245.684476] schedule_timeout+0x19d/0x330 [ 245.685090] ? wait_for_common+0xa5/0x170 [ 245.685735] wait_for_common+0xac/0x170 [ 245.686339] ? do_sched_yield+0x90/0x90 [ 245.686935] wait_for_completion+0x12/0x20 [ 245.687571] __floppy_read_block_0+0xfb/0x150 [ 245.688244] ? floppy_resume+0x40/0x40 [ 245.688844] floppy_revalidate+0x20f/0x240 [ 245.689486] check_disk_change+0x43/0x60 [ 245.690087] floppy_open+0x1ea/0x360 [ 245.690653] __blkdev_get+0xb4/0x4d0 [ 245.691212] ? blkdev_get+0x1db/0x370 [ 245.691777] blkdev_get+0x1f3/0x370 [ 245.692351] ? path_put+0x15/0x20 [ 245.692871] ? lookup_bdev+0x4b/0x90 [ 245.693539] blkdev_get_by_path+0x3d/0x80 [ 245.694165] mount_bdev+0x2a/0x190 [ 245.694695] squashfs_mount+0x10/0x20 [ 245.695271] ? squashfs_alloc_inode+0x30/0x30 [ 245.695960] mount_fs+0xf/0x90 [ 245.696451] vfs_kern_mount+0x43/0x130 [ 245.697036] do_mount+0x187/0xc40 [ 245.697563] ? memdup_user+0x28/0x50 [ 245.698124] ksys_mount+0x60/0xc0 [ 245.698639] sys_mount+0x19/0x20 [ 245.699167] do_int80_syscall_32+0x61/0x130 [ 245.699813] entry_INT80_32+0xc7/0xc7 showing that we never complete that read request. The reason is that the completion setup is racy - it initializes the completion event AFTER submitting the IO, which means that the IO could complete before/during the init. If it does, we are passing garbage to complete() and we may sleep forever waiting for the event to occur. Fixes: 7b7b68bba5ef ("floppy: bail out in open() if drive is not responding to block0 read") Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-09Merge tag 'for-linus-20181109' of git://git.kernel.dk/linux-blockLinus Torvalds11-51/+43
Pull block layer fixes from Jens Axboe: - Two fixes for an ubd regression, one for missing locking, and one for a missing initialization of a field. The latter was an old latent bug, but it's now visible and triggers (Me, Anton Ivanov) - Set of NVMe fixes via Christoph, but applied manually due to a git tree mixup (Christoph, Sagi) - Fix for a discard split regression, in three patches (Ming) - Update libata git trees (Geert) - SPDX identifier for sata_rcar (Kuninori Morimoto) - Virtual boundary merge fix (Johannes) - Preemptively clear memory we are going to pass to userspace, in case the driver does a short read (Keith) * tag 'for-linus-20181109' of git://git.kernel.dk/linux-block: block: make sure writesame bio is aligned with logical block size block: cleanup __blkdev_issue_discard() block: make sure discard bio is aligned with logical block size Revert "nvmet-rdma: use a private workqueue for delete" nvme: make sure ns head inherits underlying device limits nvmet: don't try to add ns to p2p map unless it actually uses it sata_rcar: convert to SPDX identifiers ubd: fix missing initialization of io_req block: Clear kernel memory before copying to user MAINTAINERS: Fix remaining pointers to obsolete libata.git ubd: fix missing lock around request issue block: respect virtual boundary mask in bvecs
2018-11-09Merge tag 'ceph-for-4.20-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds4-19/+15
Pull Ceph fixes from Ilya Dryomov: "Two CephFS fixes (copy_file_range and quota) and a small feature bit cleanup" * tag 'ceph-for-4.20-rc2' of https://github.com/ceph/ceph-client: libceph: assume argonaut on the server side ceph: quota: fix null pointer dereference in quota check ceph: add destination file data sync before doing any remote copy
2018-11-09Merge tag 'mips_fixes_4.20_2' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Paul Burton: "A couple of small MIPS fixes for 4.20: - Extend an array to avoid overruns on some Octeon hardware, fixing a bug introduced in 4.3. - Fix a coherent DMA regression for systems without cache-coherent DMA introduced in the 4.20 merge window" * tag 'mips_fixes_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Fix `dma_alloc_coherent' returning a non-coherent allocation MIPS: OCTEON: fix out of bounds array access on CN68XX
2018-11-09block: make sure writesame bio is aligned with logical block sizeMing Lei1-1/+1
Obviously the created writesame bio has to be aligned with logical block size, and use bio_allowed_max_sectors() to retrieve this number. Cc: [email protected] Cc: Mike Snitzer <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Xiao Ni <[email protected]> Cc: Mariusz Dabrowski <[email protected]> Fixes: b49a0871be31a745b2ef ("block: remove split code in blkdev_issue_{discard,write_same}") Tested-by: Rui Salvaterra <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-09block: cleanup __blkdev_issue_discard()Ming Lei1-17/+6
Cleanup __blkdev_issue_discard() a bit: - remove local variable of 'end_sect' - remove code block of 'fail' Cc: Mike Snitzer <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Xiao Ni <[email protected]> Cc: Mariusz Dabrowski <[email protected]> Tested-by: Rui Salvaterra <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-09block: make sure discard bio is aligned with logical block sizeMing Lei3-3/+13
Obviously the created discard bio has to be aligned with logical block size. This patch introduces the helper of bio_allowed_max_sectors() for this purpose. Cc: [email protected] Cc: Mike Snitzer <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Xiao Ni <[email protected]> Cc: Mariusz Dabrowski <[email protected]> Fixes: 744889b7cbb56a6 ("block: don't deal with discard limit in blkdev_issue_discard()") Fixes: a22c4d7e34402cc ("block: re-add discard_granularity and alignment checks") Reported-by: Rui Salvaterra <[email protected]> Tested-by: Rui Salvaterra <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-09Revert "nvmet-rdma: use a private workqueue for delete"Christoph Hellwig1-15/+4
This reverts commit 2acf70ade79d26b97611a8df52eb22aa33814cd4. The commit never really fixed the intended issue and caused all kinds of other issues, including a use before initialization. Suggested-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-09nvme: make sure ns head inherits underlying device limitsSagi Grimberg2-1/+4
Whenever we update ns_head info, we need to make sure it is still compatible with all underlying backing devices because although nvme multipath doesn't have any explicit use of these limits, other devices can still be stacked on top of it which may rely on the underlying limits. Start with unlimited stacking limits, and every info update iterate over siblings and adjust queue limits. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-09nvmet: don't try to add ns to p2p map unless it actually uses itSagi Grimberg1-1/+1
Even without CONFIG_P2PDMA this results in a error print: nvmet: no peer-to-peer memory is available that's supported by rxe0 and /dev/nullb0 Fixes: c6925093d0b2 ("nvmet: Optionally use PCI P2P memory") Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Logan Gunthorpe <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-09Merge tag 's390-4.20-2' of ↵Linus Torvalds31-93/+175
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: - A fix for the pgtable_bytes misaccounting on s390. The patch changes common code part in regard to page table folding and adds extra checks to mm_[inc|dec]_nr_[pmds|puds]. - Add FORCE for all build targets using if_changed - Use non-loadable phdr for the .vmlinux.info section to avoid a segment overlap that confuses kexec - Cleanup the attribute definition for the diagnostic sampling - Increase stack size for CONFIG_KASAN=y builds - Export __node_distance to fix a build error - Correct return code of a PMU event init function - An update for the default configs * tag 's390-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/perf: Change CPUM_CF return code in event init function s390: update defconfigs s390/mm: Fix ERROR: "__node_distance" undefined! s390/kasan: increase instrumented stack size to 64k s390/cpum_sf: Rework attribute definition for diagnostic sampling s390/mm: fix mis-accounting of pgtable_bytes mm: add mm_pxd_folded checks to pgtable_bytes accounting functions mm: introduce mm_[p4d|pud|pmd]_folded mm: make the __PAGETABLE_PxD_FOLDED defines non-empty s390: avoid vmlinux segments overlap s390/vdso: add missing FORCE to build targets s390/decompressor: add missing FORCE to build targets
2018-11-08Merge tag 'xfs-4.20-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds3-4/+11
Pull xfs fixes from Darrick Wong: - fix incorrect dropping of error code from bmap - print buffer offsets instead of useless hashed pointers when dumping corrupt metadata - fix integer overflow in attribute verifier * tag 'xfs-4.20-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix overflow in xfs_attr3_leaf_verify xfs: print buffer offsets when dumping corrupt buffers xfs: Fix error code in 'xfs_ioc_getbmap()'
2018-11-08Merge tag 'led-fixes-for-4.20-rc2' of ↵Linus Torvalds2-21/+10
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED fixes from Jacek Anaszewski: "All three fixes are related to the newly added pattern trigger: - remove mutex_lock() from timer callback, which would trigger problems related to sleeping in atomic context, the removal is harmless since mutex protection turned out to be redundant in this case - fix pattern parsing to properly handle intervals with brightness == 0 - fix typos in the ABI documentation" * tag 'led-fixes-for-4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: Documentation: ABI: led-trigger-pattern: Fix typos leds: trigger: Fix sleeping function called from invalid context Fix pattern handling optimalization
2018-11-08Merge tag 'sound-4.20-rc2' of ↵Linus Torvalds2-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Two small regression fixes for HD-audio: one about vga_switcheroo and runtime PM, and another about Oops on some Thinkpads" * tag 'sound-4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix incorrect clearance of thinkpad_acpi hooks vga_switcheroo: Fix missing gpu_bound call at audio client registration
2018-11-08libceph: assume argonaut on the server sideIlya Dryomov2-16/+4
No one is running pre-argonaut. In addition one of the argonaut features (NOSRCADDR) has been required since day one (and a half, 2.6.34 vs 2.6.35) of the kernel client. Allow for the possibility of reusing these feature bits later. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2018-11-08ceph: quota: fix null pointer dereference in quota checkLuis Henriques1-1/+2
This patch fixes a possible null pointer dereference in check_quota_exceeded, detected by the static checker smatch, with the following warning:    fs/ceph/quota.c:240 check_quota_exceeded()     error: we previously assumed 'realm' could be null (see line 188) Fixes: b7a2921765cf ("ceph: quota: support for ceph.quota.max_files") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Luis Henriques <[email protected]> Reviewed-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-11-08ceph: add destination file data sync before doing any remote copyLuis Henriques1-2/+9
If we try to copy into a file that was just written, any data that is remote copied will be overwritten by our buffered writes once they are flushed.  When this happens, the call to invalidate_inode_pages2_range will also return a -EBUSY error. This patch fixes this by also sync'ing the destination file before starting any copy. Fixes: 503f82a9932d ("ceph: support copy_file_range file operation") Signed-off-by: Luis Henriques <[email protected]> Reviewed-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-11-08sata_rcar: convert to SPDX identifiersKuninori Morimoto1-5/+1
This patch updates license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Kuninori Morimoto <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-08ubd: fix missing initialization of io_reqAnton Ivanov1-2/+1
The SYNC path doesn't initialize io_req->error, which can cause random errors. Before the conversion to blk-mq, we always completed requests with BLK_STS_OK status, but now we actually look at the error field and this issue becomes apparent. Signed-off-by: Anton Ivanov <[email protected]> [axboe: fixed up commit message to explain what is actually going on] Signed-off-by: Jens Axboe <[email protected]>
2018-11-08Merge tag 'compiler-attributes-for-linus-v4.20-rc2' of ↵Linus Torvalds2-5/+13
https://github.com/ojeda/linux Pull compiler attribute fixlets from Miguel Ojeda: "Small improvements to Compiler Attributes: - Define asm_volatile_goto for non-gcc compilers (Nick Desaulniers) - Improve the explanation of compiler_attributes.h" * tag 'compiler-attributes-for-linus-v4.20-rc2' of https://github.com/ojeda/linux: Compiler Attributes: improve explanation of header include/linux/compiler*.h: define asm_volatile_goto
2018-11-08Merge tag 'mtd/fixes-for-4.20-rc2' of git://git.infradead.org/linux-mtdLinus Torvalds6-10/+18
Pull MTD fixes from Boris Brezillon: "MTD changes: - Kill a VLA in sa1100 SPI NOR changes: - Make sure ->addr_width is restored when SFDP parsing fails - Propate errors happening in cqspi_direct_read_execute() NAND changes: - Fix kernel-doc mismatch - Fix nanddev_neraseblocks() to return the correct value - Avoid selection of BCH_CONST_PARAMS when some users require dynamic BCH settings" * tag 'mtd/fixes-for-4.20-rc2' of git://git.infradead.org/linux-mtd: mtd: nand: Fix nanddev_pos_next_page() kernel-doc header mtd: sa1100: avoid VLA in sa1100_setup_mtd mtd: spi-nor: Reset nor->addr_width when SFDP parsing failed mtd: spi-nor: cadence-quadspi: Return error code in cqspi_direct_read_execute() mtd: nand: Fix nanddev_neraseblocks() mtd: nand: drop kernel-doc notation for a deleted function parameter mtd: docg3: don't set conflicting BCH_CONST_PARAMS option
2018-11-08Compiler Attributes: improve explanation of headerMiguel Ojeda1-5/+9
Explain better what "optional" attributes are, and avoid calling them so to avoid confusion. Simply retain "Optional" as a word to look for in the comments. Moreover, add a couple sentences to explain a bit more the intention and the documentation links. Signed-off-by: Miguel Ojeda <[email protected]>
2018-11-08s390/perf: Change CPUM_CF return code in event init functionThomas Richter1-1/+1
The function perf_init_event() creates a new event and assignes it to a PMU. This a done in a loop over all existing PMUs. For each listed PMU the event init function is called and if this function does return any other error than -ENOENT, the loop is terminated the creation of the event fails. If the event is invalid, return -ENOENT to try other PMUs. Signed-off-by: Thomas Richter <[email protected]> Reviewed-by: Hendrik Brueckner <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2018-11-07block: Clear kernel memory before copying to userKeith Busch1-0/+1
If the kernel allocates a bounce buffer for user read data, this memory needs to be cleared before copying it to the user, otherwise it may leak kernel memory to user space. Laurence Oberman <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-07MAINTAINERS: Fix remaining pointers to obsolete libata.gitGeert Uytterhoeven1-3/+3
libata.git no longer exists. Replace the remaining pointers to it by pointers to the block tree, which is where all libata development happens now. Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-07ubd: fix missing lock around request issueJens Axboe1-2/+7
We need to hold the device lock (and disable interrupts) while writing new commands, or we could be interrupted while that is happening and read invalid requests in the completion path. Fixes: 4e6da0fe8058 ("um: Convert ubd driver to blk-mq") Tested-by: Richard Weinberger <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-07Documentation: ABI: led-trigger-pattern: Fix typosGeert Uytterhoeven1-2/+2
- Spelling s/brigntess/brightness/, - Double "use". Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Pavel Machek <[email protected]> Signed-off-by: Jacek Anaszewski <[email protected]>
2018-11-07leds: trigger: Fix sleeping function called from invalid contextBaolin Wang1-16/+4
We will meet below issue due to mutex_lock() is called in interrupt context. The mutex lock is used to protect the pattern trigger data, but before changing new pattern trigger data (pattern values or repeat value) by users, we always cancel the timer firstly to clear previous patterns' performance. That means there is no race in pattern_trig_timer_function(), so we can drop the mutex lock in pattern_trig_timer_function() to avoid this issue. Moreover we can move the timer cancelling into mutex protection, since there is no deadlock risk if we remove the mutex lock in pattern_trig_timer_function(). BUG: sleeping function called from invalid context at kernel/locking/mutex.c:254 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.20.0-rc1-koelsch-00841-ga338c8181013c1a9 #171 Hardware name: Generic R-Car Gen2 (Flattened Device Tree) [<c020f19c>] (unwind_backtrace) from [<c020aecc>] (show_stack+0x10/0x14) [<c020aecc>] (show_stack) from [<c07affb8>] (dump_stack+0x7c/0x9c) [<c07affb8>] (dump_stack) from [<c02417d4>] (___might_sleep+0xf4/0x158) [<c02417d4>] (___might_sleep) from [<c07c92c4>] (mutex_lock+0x18/0x60) [<c07c92c4>] (mutex_lock) from [<c067b28c>] (pattern_trig_timer_function+0x1c/0x11c) [<c067b28c>] (pattern_trig_timer_function) from [<c027f6fc>] (call_timer_fn+0x1c/0x90) [<c027f6fc>] (call_timer_fn) from [<c027f944>] (expire_timers+0x94/0xa4) [<c027f944>] (expire_timers) from [<c027fc98>] (run_timer_softirq+0x108/0x15c) [<c027fc98>] (run_timer_softirq) from [<c02021cc>] (__do_softirq+0x1d4/0x258) [<c02021cc>] (__do_softirq) from [<c0224d24>] (irq_exit+0x64/0xc4) [<c0224d24>] (irq_exit) from [<c0268dd0>] (__handle_domain_irq+0x80/0xb4) [<c0268dd0>] (__handle_domain_irq) from [<c045e1b0>] (gic_handle_irq+0x58/0x90) [<c045e1b0>] (gic_handle_irq) from [<c02019f8>] (__irq_svc+0x58/0x74) Exception stack(0xeb483f60 to 0xeb483fa8) 3f60: 00000000 00000000 eb9afaa0 c0217e80 00000000 ffffe000 00000000 c0e06408 3f80: 00000002 c0e0647c c0c6a5f0 00000000 c0e04900 eb483fb0 c0207ea8 c0207e98 3fa0: 60020013 ffffffff [<c02019f8>] (__irq_svc) from [<c0207e98>] (arch_cpu_idle+0x1c/0x38) [<c0207e98>] (arch_cpu_idle) from [<c0247ca8>] (do_idle+0x138/0x268) [<c0247ca8>] (do_idle) from [<c0248050>] (cpu_startup_entry+0x18/0x1c) [<c0248050>] (cpu_startup_entry) from [<402022ec>] (0x402022ec) Fixes: 5fd752b6b3a2 ("leds: core: Introduce LED pattern trigger") Signed-off-by: Baolin Wang <[email protected]> Reported-by: Geert Uytterhoeven <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Jacek Anaszewski <[email protected]>
2018-11-07block: respect virtual boundary mask in bvecsJohannes Thumshirn2-2/+2
With drivers that are settting a virtual boundary constrain, we are seeing a lot of bio splitting and smaller I/Os being submitted to the driver. This happens because the bio gap detection code does not account cases where PAGE_SIZE - 1 is bigger than queue_virt_boundary() and thus will split the bio unnecessarily. Cc: Jan Kara <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: Ming Lei <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Acked-by: Keith Busch <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-11-07Merge tag 'hwmon-for-v4.20-rc2' of ↵Linus Torvalds2-8/+7
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Remove bogus __init annotations in ibmpowernv driver - Fix double-free in error handling of __hwmon_device_register() * tag 'hwmon-for-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (ibmpowernv) Remove bogus __init annotations hwmon: (core) Fix double-free in __hwmon_device_register()
2018-11-07Merge tag 'armsoc-fixes' of ↵Linus Torvalds9-31/+39
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "A few more fixes that have come in, and one revert of a previous fix. I was a bit too trigger happy to enable PREEMPT on multi_v7_defconfig, and it ended up regressing at least BeagleBone XM boards. While we get that debugged for next merge window, let's disable it again. Beyond that: - Stratix change to fix multicast filtering - Minor DT fixes for Renesas and i.MX - Ethernet fix for a Renesas board (switching main interfaces) - Ethernet phy regulator fix for i.MX6SX" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm64: dts: stratix10: fix multicast filtering ARM: defconfig: Disable PREEMPT again on multi_v7 arm64: dts: renesas: condor: switch from EtherAVB to GEther dt-bindings: arm: Fix RZ/G2E part number arm64: dts: renesas: r8a7795: add missing dma-names on hscif2 ARM: dts: imx6sx-sdb: Fix enet phy regulator ARM: dts: fsl: Fix improperly quoted stdout-path values ARM: dts: imx6sll: fix typo for fsl,imx6sll-i2c node