aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-02-21block: bounce: make sure that bvec table is updatedMing Lei1-2/+6
Block bounce needs to allocate new page for doing IO, and the new page has to be updated to bvec table. Commit 6dc4f100c switches __blk_queue_bounce() to use the new bio_for_each_segment_all() interface. Unfortunately the new bio_for_each_segment_all() can't be used to update bvec table. This patch fixes this issue by retrieving bvec from the table directly, then the new allocated page can be updated to the bio. This way is safe because the cloned bio is single page bvec. Fixes: 6dc4f100c ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec") Cc: Christoph Hellwig <[email protected]> Cc: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-21Merge branch 'nvme-5.1' of git://git.infradead.org/nvme into for-5.1/blockJens Axboe29-289/+168
Pull NVMe changes for 5.1 from Christoph * 'nvme-5.1' of git://git.infradead.org/nvme: (22 commits) nvme-rdma: use nr_phys_segments when map rq to sgl nvmet: convert to SPDX identifiers nvmet-rdma: convert to SPDX identifiers nvme-loop: convert to SPDX identifiers nvmet-fcloop: convert to SPDX identifiers nvmet-fc: convert to SPDX identifiers nvme: convert to SPDX identifiers nvme-pci: convert to SPDX identifiers nvme-lightnvm: convert to SPDX identifiers nvme-rdma: convert to SPDX identifiers nvme-fc: convert to SPDX identifiers nvme-fabrics: convert to SPDX identifiers nvme-tcp.h: fix SPDX header nvme_ioctl.h: remove duplicate GPL boilerplate nvme: return error from nvme_alloc_ns() nvme: avoid that deleting a controller triggers a circular locking complaint nvme: introduce a helper function for controller deletion nvme: unexport nvme_delete_ctrl_sync() nvme-pci: check kstrtoint() return value in queue_count_set() nvme-fabrics: document the poll function argument ...
2019-02-21nvme-rdma: use nr_phys_segments when map rq to sglChaitanya Kulkarni1-2/+2
Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check if a command contains data to be mapped. This fixes the case where a struct request contains LBAs, but it has no payload, such as Write Zeroes support. Fixes: 6e02318eaea5 ("nvme: add support for the Write Zeroes command") Reported-by: Ming Lei <[email protected]> Signed-off-by: Chaitanya Kulkarni <[email protected]> Tested-by: Ming Lei <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-02-20nvmet: convert to SPDX identifiersChristoph Hellwig7-63/+7
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvmet-rdma: convert to SPDX identifiersChristoph Hellwig1-9/+1
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme-loop: convert to SPDX identifiersChristoph Hellwig1-9/+1
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvmet-fcloop: convert to SPDX identifiersChristoph Hellwig1-12/+1
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvmet-fc: convert to SPDX identifiersChristoph Hellwig1-13/+1
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme: convert to SPDX identifiersChristoph Hellwig6-46/+6
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme-pci: convert to SPDX identifiersChristoph Hellwig1-9/+1
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme-lightnvm: convert to SPDX identifiersChristoph Hellwig1-15/+1
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme-rdma: convert to SPDX identifiersChristoph Hellwig2-18/+2
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme-fc: convert to SPDX identifiersChristoph Hellwig3-35/+3
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme-fabrics: convert to SPDX identifiersChristoph Hellwig2-18/+2
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme-tcp.h: fix SPDX headerChristoph Hellwig1-1/+1
For .h files we need to use /* */ style comments. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme_ioctl.h: remove duplicate GPL boilerplateChristoph Hellwig2-18/+1
We already have a ЅPDX header, so no need to duplicate the information. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2019-02-20nvme: return error from nvme_alloc_ns()Hannes Reinecke1-10/+21
nvme_alloc_ns() might fail, so we should be returning an error code. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-02-20nvme: avoid that deleting a controller triggers a circular locking complaintBart Van Assche1-2/+3
Rework nvme_delete_ctrl_sync() such that it does not have to wait for queued work. This patch avoids that test nvme/008 triggers the following complaint: WARNING: possible circular locking dependency detected 5.0.0-rc6-dbg+ #10 Not tainted ------------------------------------------------------ nvme/7918 is trying to acquire lock: 000000009a1a7b69 ((work_completion)(&ctrl->delete_work)){+.+.}, at: __flush_work+0x379/0x410 but task is already holding lock: 00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (kn->count#389){++++}: lock_acquire+0xc5/0x1e0 __kernfs_remove+0x42a/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x68/0xb0 device_del+0x24d/0x570 cdev_device_del+0x1a/0x50 nvme_delete_ctrl_work+0xbd/0xe0 process_one_work+0x4f1/0xa40 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 ((work_completion)(&ctrl->delete_work)){+.+.}: __lock_acquire+0x1323/0x17b0 lock_acquire+0xc5/0x1e0 __flush_work+0x399/0x410 flush_work+0x10/0x20 nvme_delete_ctrl_sync+0x65/0x70 nvme_sysfs_delete+0x4f/0x60 dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x186/0x240 __vfs_write+0xd7/0x430 vfs_write+0xfa/0x260 ksys_write+0xab/0x130 __x64_sys_write+0x43/0x50 do_syscall_64+0x71/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kn->count#389); lock((work_completion)(&ctrl->delete_work)); lock(kn->count#389); lock((work_completion)(&ctrl->delete_work)); *** DEADLOCK *** 3 locks held by nvme/7918: #0: 00000000e2223b44 (sb_writers#6){.+.+}, at: vfs_write+0x1eb/0x260 #1: 000000003404976f (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240 #2: 00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210 stack backtrace: CPU: 4 PID: 7918 Comm: nvme Not tainted 5.0.0-rc6-dbg+ #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xca print_circular_bug.isra.36.cold.54+0x173/0x1d5 check_prev_add.constprop.45+0x996/0x1110 __lock_acquire+0x1323/0x17b0 lock_acquire+0xc5/0x1e0 __flush_work+0x399/0x410 flush_work+0x10/0x20 nvme_delete_ctrl_sync+0x65/0x70 nvme_sysfs_delete+0x4f/0x60 dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x186/0x240 __vfs_write+0xd7/0x430 vfs_write+0xfa/0x260 ksys_write+0xab/0x130 __x64_sys_write+0x43/0x50 do_syscall_64+0x71/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-02-20nvme: introduce a helper function for controller deletionBart Van Assche1-4/+9
This patch does not change any functionality but makes the next patch in this series easier to read. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-02-20nvme: unexport nvme_delete_ctrl_sync()Bart Van Assche2-3/+1
Since nvme_delete_ctrl_sync() is not called from any other kernel module, unexport it. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-02-20nvme-pci: check kstrtoint() return value in queue_count_set()Bart Van Assche1-0/+2
This patch avoids that the compiler complains about 'ret' being set but not being used when building with W=1. Fixes: 3b6592f70ad7 ("nvme: utilize two queue maps, one for reads and one for writes") # v5.0-rc1 Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-02-20nvme-fabrics: document the poll function argumentBart Van Assche1-0/+1
This patch avoids that the kernel-doc tool reports a warning when building with W=1. Fixes: 26c682274e0a ("nvme-fabrics: allow nvmf_connect_io_queue to poll") # v5.0-rc1 Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-02-20nvmet: fix indentationBart Van Assche1-1/+1
This patch avoids that smatch complains about inconsistent indentation. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") # v4.10 Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-02-20nvme-multipath: round-robin I/O policyHannes Reinecke3-1/+100
Implement a simple round-robin I/O policy for multipathing. Path selection is done in two rounds, first iterating across all optimized paths, and if that doesn't return any valid paths, iterate over all optimized and non-optimized paths. If no paths are found, use the existing algorithm. Also add a sysfs attribute 'iopolicy' to switch between the current NUMA-aware I/O policy and the 'round-robin' I/O policy. Signed-off-by: Hannes Reinecke <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-02-19block: avoid to READ fields of null bioMing Lei1-1/+3
rq->bio can be NULL sometimes, such as flush request, so don't read bio->bi_seg_front_size until this 'bio' is checked as valid. Cc: Bart Van Assche <[email protected]> Reported-by: Bart Van Assche <[email protected]> Fixes: dcebd755926b0f39dd1e ("block: use bio_for_each_bvec() to compute multi-page bvec count") Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15Merge tag 'v5.0-rc6' into for-5.1/blockJens Axboe566-2185/+5560
Pull in 5.0-rc6 to avoid a dumb merge conflict with fs/iomap.c. This is needed since io_uring is now based on the block branch, to avoid a conflict between the multi-page bvecs and the bits of io_uring that touch the core block parts. * tag 'v5.0-rc6': (525 commits) Linux 5.0-rc6 x86/mm: Make set_pmd_at() paravirt aware MAINTAINERS: Update the ocores i2c bus driver maintainer, etc blk-mq: remove duplicated definition of blk_mq_freeze_queue Blk-iolatency: warn on negative inflight IO counter blk-iolatency: fix IO hang due to negative inflight counter MAINTAINERS: unify reference to xen-devel list x86/mm/cpa: Fix set_mce_nospec() futex: Handle early deadlock return correctly futex: Fix barrier comment net: dsa: b53: Fix for failure when irq is not defined in dt blktrace: Show requests without sector mips: cm: reprime error cause mips: loongson64: remove unreachable(), fix loongson_poweroff(). sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() geneve: should not call rt6_lookup() when ipv6 was disabled KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222) kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974) signal: Better detection of synchronous signals ...
2019-02-15block: kill BLK_MQ_F_SG_MERGEMing Lei10-11/+7
QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: kill QUEUE_FLAG_NO_SG_MERGEMing Lei5-43/+6
Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"), physical segment number is mainly figured out in blk_queue_split() for fast path, and the flag of BIO_SEG_VALID is set there too. Now only blk_recount_segments() and blk_recalc_rq_segments() use this flag. Basically blk_recount_segments() is bypassed in fast path given BIO_SEG_VALID is set in blk_queue_split(). For another user of blk_recalc_rq_segments(): - run in partial completion branch of blk_update_request, which is an unusual case - run in blk_cloned_rq_check_limits(), still not a big problem if the flag is killed since dm-rq is the only user. Multi-page bvec is enabled now, not doing S/G merging is rather pointless with the current setup of the I/O path, as it isn't going to save you a significant amount of cycles. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: document usage of bio iterator helpersMing Lei1-0/+25
Now multi-page bvec is supported, some helpers may return page by page, meantime some may return segment by segment, this patch documents the usage. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: always define BIO_MAX_PAGES as 256Ming Lei1-8/+0
Now multi-page bvec can cover CONFIG_THP_SWAP, so we don't need to increase BIO_MAX_PAGES for it. CONFIG_THP_SWAP needs to split one THP into normal pages and adds them all to one bio. With multipage-bvec, it just takes one bvec to hold them all. Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: enable multipage bvecsMing Lei4-12/+20
This patch pulls the trigger for multi-page bvecs. Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: allow bio_for_each_segment_all() to iterate over multi-page bvecMing Lei27-46/+127
This patch introduces one extra iterator variable to bio_for_each_segment_all(), then we can allow bio_for_each_segment_all() to iterate over multi-page bvec. Given it is just one mechannical & simple change on all bio_for_each_segment_all() users, this patch does tree-wide change in one single patch, so that we can avoid to use a temporary helper for this conversion. Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()Ming Lei1-1/+5
bch_bio_alloc_pages() is always called on one new bio, so it is safe to access the bvec table directly. Given it is the only kind of this case, open code the bvec table access since bio_for_each_segment_all() will be changed to support for iterating over multipage bvec. Acked-by: Coly Li <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: loop: pass multi-page bvec to iov_iterMing Lei1-10/+10
iov_iter is implemented on bvec itererator helpers, so it is safe to pass multi-page bvec to it, and this way is much more efficient than passing one page in each bvec. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15btrfs: use mp_bvec_last_segment to get bio's last pageMing Lei1-2/+3
Preparing for supporting multi-page bvec. Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15fs/buffer.c: use bvec iterator to truncate the bioMing Lei1-1/+4
Once multi-page bvec is enabled, the last bvec may include more than one page, this patch use mp_bvec_last_segment() to truncate the bio. Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: introduce mp_bvec_last_segment()Ming Lei1-0/+22
BTRFS and guard_bio_eod() need to get the last singlepage segment from one multipage bvec, so introduce this helper to make them happy. Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: use bio_for_each_bvec() to map sgMing Lei1-20/+50
It is more efficient to use bio_for_each_bvec() to map sg, meantime we have to consider splitting multipage bvec as done in blk_bio_segment_split(). Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: use bio_for_each_bvec() to compute multi-page bvec countMing Lei1-20/+83
First it is more efficient to use bio_for_each_bvec() in both blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how many multi-page bvecs there are in the bio. Secondly once bio_for_each_bvec() is used, the bvec may need to be splitted because its length can be very longer than max segment size, so we have to split the big bvec into several segments. Thirdly when splitting multi-page bvec into segments, the max segment limit may be reached, so the bio split need to be considered under this situation too. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: introduce bio_for_each_bvec() and rq_for_each_bvec()Ming Lei2-0/+14
bio_for_each_bvec() is used for iterating over multi-page bvec for bio split & merge code. rq_for_each_bvec() can be used for drivers which may handle the multi-page bvec directly, so far loop is one perfect use case. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: introduce multi-page bvec helpersMing Lei1-3/+27
This patch introduces helpers of 'mp_bvec_iter_*' for multi-page bvec support. The introduced helpers treate one bvec as real multi-page segment, which may include more than one pages. The existed helpers of bvec_iter_* are interfaces for supporting current bvec iterator which is thought as single-page by drivers, fs, dm and etc. These introduced helpers will build single-page bvec in flight, so this way won't break current bio/bvec users, which needn't any change. Follows some multi-page bvec background: - bvecs stored in bio->bi_io_vec is always multi-page style - bvec(struct bio_vec) represents one physically contiguous I/O buffer, now the buffer may include more than one page after multi-page bvec is supported, and all these pages represented by one bvec is physically contiguous. Before multi-page bvec support, at most one page is included in one bvec, we call it single-page bvec. - .bv_page of the bvec points to the 1st page in the multi-page bvec - .bv_offset of the bvec is the offset of the buffer in the bvec The effect on the current drivers/filesystem/dm/bcache/...: - almost everyone supposes that one bvec only includes one single page, so we keep the sp interface not changed, for example, bio_for_each_segment() still returns single-page bvec - bio_for_each_segment_all() will return single-page bvec too - during iterating, iterator variable(struct bvec_iter) is always updated in multi-page bvec style, and bvec_iter_advance() is kept not changed - returned(copied) single-page bvec is built in flight by bvec helpers from the stored multi-page bvec Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: remove bvec_iter_rewind()Ming Lei1-24/+0
Commit 7759eb23fd980 ("block: remove bio_rewind_iter()") removes bio_rewind_iter(), then no one uses bvec_iter_rewind() any more, so remove it. Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15block: don't use bio->bi_vcnt to figure out segment numberMing Lei1-7/+1
It is wrong to use bio->bi_vcnt to figure out how many segments there are in the bio even though CLONED flag isn't set on this bio, because this bio may be splitted or advanced. So always use bio_segments() in blk_recount_segments(), and it shouldn't cause any performance loss now because the physical segment number is figured out in blk_queue_split() and BIO_SEG_VALID is set meantime since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"). Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Fixes: 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max segments") Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-15btrfs: look at bi_size for repair decisionsChristoph Hellwig2-7/+1
bio_readpage_error currently uses bi_vcnt to decide if it is worth retrying an I/O. But the vector count is mostly an implementation artifact - it really should figure out if there is more than a single sector worth retrying. Use bi_size for that and shift by PAGE_SHIFT. This really should be blocks/sectors, but given that btrfs doesn't support a sector size different from the PAGE_SIZE using the page size keeps the changes to a minimum. Reviewed-by: Omar Sandoval <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-11block: avoid setting none scheduler if it's already noneAleksei Zakharov1-1/+4
There's no reason to freeze queue and remove scheduler if there's no scheduler already. Signed-off-by: Aleksei Zakharov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-11block: avoid setting wbt_lat_usec to current valueAleksei Zakharov1-0/+3
There's no reason to set wbt min lat and freeze request queue if current value is the same. Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Aleksei Zakharov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-11lightnvm: pblk: fix race condition on GCHeiner Litz6-7/+18
This patch fixes a race condition where a write is mapped to the last sectors of a line. The write is synced to the device but the L2P is not updated yet. When the line is garbage collected before the L2P update is performed, the sectors are ignored by the GC logic and the line is freed before all sectors are moved. When the L2P is finally updated, it contains a mapping to a freed line, subsequent reads of the corresponding LBAs fail. This patch introduces a per line counter specifying the number of sectors that are synced to the device but have not been updated in the L2P. Lines with a counter of greater than zero will not be selected for GC. Signed-off-by: Heiner Litz <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-11lightnvm: pblk: prevent stall due to wb thresholdJavier González3-10/+22
In order to respect mw_cuinits, pblk's write buffer maintains a backpointer to protect data not yet persisted; when writing to the write buffer, this backpointer defines a threshold that pblk's rate-limiter enforces. On small PU configurations, the following scenarios might take place: (i) the threshold is larger than the write buffer and (ii) the threshold is smaller than the write buffer, but larger than the maximun allowed split bio - 256KB at this moment (Note that writes are not always split - we only do this when we the size of the buffer is smaller than the buffer). In both cases, pblk's rate-limiter prevents the I/O to be written to the buffer, thus stalling. This patch fixes the original backpointer implementation by considering the threshold both on buffer creation and on the rate-limiters path, when bio_split is triggered (case (ii) above). Fixes: 766c8ceb16fc ("lightnvm: pblk: guarantee that backpointer is respected on writer stall") Signed-off-by: Javier González <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-11lightnvm: pblk: extend line wp balance checkHans Holmberg1-18/+38
pblk stripes writes of minimal write size across all non-offline chunks in a line, which means that the maximum write pointer delta should not exceed the minimal write size. Extend the line write pointer balance check to cover this case, and ignore the offline chunk wps. This will render us a warning during recovery if something unexpected has happened to the chunk write pointers (i.e. powerloss, a spurious chunk reset, ..). Reported-by: Zhoujie Wu <[email protected]> Tested-by: Zhoujie Wu <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Hans Holmberg <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-02-11lightnvm: pblk: fix TRACE_INCLUDE_PATHMasahiro Yamada1-1/+1
As the comment block in include/trace/define_trace.h says, TRACE_INCLUDE_PATH should be a relative path to the define_trace.h ../../drivers/lightnvm is the correct relative path. ../../../drivers/lightnvm is working by coincidence because the top Makefile adds -I$(srctree)/arch/$(SRCARCH)/include as a header search path, but we should not rely on it. Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>