Age | Commit message (Collapse) | Author | Files | Lines |
|
A crash happens when inject failed reconnection.
If reconnect failed after start io queues, the queues will be unquiesced
and new requests continue to be delivered. Reconnection error handling
process directly free queues without cancel suspend requests. The
suppend request will time out, and then crash due to use the queue
after free.
Add sync queues and cancel suppend requests for reconnection error
handling.
Signed-off-by: Chao Leng <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Add nvme_cancel_tagset and nvme_cancel_admin_tagset for tear down and
reconnection error handling.
Signed-off-by: Chao Leng <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Remove the extra space in the nvme_free_cels() when calling
xa_for_each loop which is not a common practice
(except drivers/infiniband/core/ not sure why).
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
When support for the NVMe ZNS commands was merged, tracing of these has
been omitted.
Add nvme_cmd_zone_mgmt_send, nvme_cmd_zone_mgmt_recv as well as
nvme_cmd_zone_append to the nvme driver's tracing facility.
Signed-off-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Add detailed parsing of format nvm admin command to make the
trace log more consistent and human-readable.
Signed-off-by: Michal Krakowiak <[email protected]>
Acked-by: Dan Williams <[email protected]>
Reviewed-by: Minwoo Im <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
All the updates are mentioned in the ratified NVMe 1.4 spec.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Max Gurtovoy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
In this preparation patch, we add helpers to convert lbas to sectors &
sectors to lba. This is needed to eliminate code duplication in the ZBD
backend.
Use these helpers in the block device backend.
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
We remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_ns() since req already has ns member that can be
reused, this also eliminates the explicit call to nvmet_put_namespace()
which is already present in the request completion path.
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
We remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_desclist() since req already has ns member that
can be reused, this also eliminates the explicit call to
nvmet_put_namespace() which is already present in the request
completion path.
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
We remove the extra local variable struct nvmet_ns in
nvmet_get_smart_log_nsid() since req already has ns member that can be
reused, this also eliminates the explicit call to nvmet_put_namespace()
which is already present in the request completion path.
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Just for current code in nvme_cleanup_cmd(), we don't have to get
namespace instance, but we need controller instance.
Controller instance can be retrieved by namespace instance, but it can
be directly accessed by nvme_request instance from request.
ctrl = nvme_req(req)->ctrl;
We don't have to go around namespace instance from request instance
through gendisk.
Signed-off-by: Minwoo Im <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
iov_iter uses the right helpers so we should be able
to pass in a multipage bvec. Right now the iov_iter is
initialized with more segments that it needs which doesn't
fail because the iov_iter is capped by byte count, but it
is better to use a full multipage bvec iter.
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
We might set the iov_iter direction wrong, which is harmless for this
use-case, but get it right. Also this makes the code slightly cleaner.
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
The controller can request a delay retrying a failed command by setting
the Command Retry Delay (CRD) field in the Completion Queue Entry.
Currentlty this features is only applied to commands on the I/O queue, but
not to commands on the admin queue. Retreive the nvme_ctrl from the
request so that no namespace is required and apply the feature to all
commands.
Signed-off-by: Minwoo Im <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
The only usage of these is to put their addresses in arrays of pointers
to const attribute_groups. Make them const to allow the compiler to put
them in read-only memory.
Signed-off-by: Rikard Falkeborn <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
searching assoc_list protected by rcu_read_lock if list not changed inline.
and according to the rcu list rules.
queue array embedded into nvmet_fc_tgt_assoc protected by rcu_read_lock
according to rcu dereference/assign rules.
queue and assoc object freed after grace period by call_rcu.
tgtport lock taken for changing assoc_list.
Reviewed-by: Eldad Zinger <[email protected]>
Reviewed-by: Elad Grupi <[email protected]>
Reviewed-by: James Smart <[email protected]>
Signed-off-by: Leonid Ravich <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Remove extra tab.
Signed-off-by: Israel Rukshin <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Remove code duplication.
Signed-off-by: Israel Rukshin <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Use semicolons and braces.
Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fix below warnings reported by coccicheck:
./drivers/block/rsxx/dma.c:948:3-8: WARNING: NULL check
before some freeing functions is not needed.
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
fixed the below warning:
/drivers/block/zram/zram_drv.c:534:2-8: WARNING: NULL check
before some freeing functions is not needed.
Signed-off-by: Tian Tao <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We can remove start_jif since it is not used by drbd_request_prepare,
then remove it from __drbd_make_request further.
Cc: Philipp Reisner <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: [email protected]
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Replace pci_read_config_word() with pcie_capability_read_word().
pcie_capability_read_word() takes care of a few special cases when reading
the PCIe capability. See 8c0d3a02c130 ("PCI: Add accessors for PCI Express
Capability").
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use PCI #defines for PCIe Device Control register values instead of
hard-coding bit positions. No functional change intended.
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently, loop device has only one global lock: loop_ctl_mutex.
This becomes hot in scenarios where many loop devices are used.
Scale it by introducing per-device lock: lo_mutex that protects
modifications of all fields in struct loop_device.
Keep loop_ctl_mutex to protect global data: loop_index_idr, loop_lookup,
loop_add.
The new lock ordering requirement is that loop_ctl_mutex must be taken
before lo_mutex.
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
Reviewed-by: Petr Vorel <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
blkdev_fallocate() tries to detect whether a discard raced with an
overlapping write by calling invalidate_inode_pages2_range(). However
this check can give both false negatives (when writing using direct IO
or when writeback already writes out the written pagecache range) and
false positives (when write is not actually overlapping but ends in the
same page when blocksize < pagesize). This actually causes issues for
qemu which is getting confused by EBUSY errors.
Fix the problem by removing this conflicting write detection since it is
inherently racy and thus of little use anyway.
Reported-by: Maxim Levitsky <[email protected]>
CC: "Darrick J. Wong" <[email protected]>
Link: https://lore.kernel.org/qemu-devel/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Cloned bios are can be used to on the same device, in which case we need
to inherit the BIO_REMAPPED flag to avoid a double partition remap. When
the cloned bios are used on another device, bio_set_dev will clear the flag.
Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio")
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
It's only used in the same file, mark is appropriately static.
Fixes: 71217df39dc6 ("block, bfq: make waker-queue detection more robust")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In the presence of many parallel I/O flows, the detection of waker
bfq_queues suffers from false positives. This commits addresses this
issue by making the filtering of actual wakers more selective. In more
detail, a candidate waker must be found to meet waker requirements
three times before being promoted to actual waker.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
To prevent injection information from being lost on bfq_queue merging,
also the amount of service that a bfq_queue receives must be saved and
restored when the bfq_queue is merged and split, respectively.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
To prevent weight-raising information from being lost on bfq_queue merging,
also the amount of service that a bfq_queue receives must be saved and
restored when the bfq_queue is merged and split, respectively.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
A bfq_queue may happen to be deemed as soft real-time while it is
still enjoying interactive weight-raising. If this happens because of
a false positive, then the bfq_queue is likely to loose its soft
real-time status soon. Upon losing such a status, the bfq_queue must
get back its interactive weight-raising, if its interactive period is
not over yet. But this case is not handled. This commit corrects this
error.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Upon an I/O-dispatch attempt, BFQ may detect that it was better to
plug I/O dispatch, and to wait for a new request to arrive for the
currently in-service queue. But the arrival of a new request for an
empty bfq_queue, and thus the switch from idle to busy of the
bfq_queue, may cause the scenario to change, and make plugging no
longer needed for service guarantees, or more convenient for
throughput. In this case, keeping I/O-dispatch plugged would certainly
lower throughput.
To address this issue, this commit makes such a check, and stops
plugging I/O if it is better to stop plugging I/O.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Some BFQ mechanisms make their decisions on a bfq_queue basing also on
whether the bfq_queue is I/O bound. In this respect, the current logic
for evaluating whether a bfq_queue is I/O bound is rather rough. This
commits replaces this logic with a more effective one.
The new logic measures the percentage of time during which a bfq_queue
is active, and marks the bfq_queue as I/O bound if the latter if this
percentage is above a fixed threshold.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When an already remapped bio is resubmitted (e.g. by blk_queue_split),
bio_check_eod will compare the remapped bi_sector against the size
of the partition, leading to spurious I/O failures.
Skip the EOD check in this case.
Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio")
Reported-by: Jens Axboe <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The block layer spends quite a while in blkdev_direct_IO() to copy and
initialise bio's bvec. However, if we've already got a bvec in the input
iterator it might be reused in some cases, i.e. when new
ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
performance boost, and it also reduces memory footprint.
Suggested-by: Matthew Wilcox <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add a helper function calculating the number of bvec segments we need to
allocate to construct a bio. It doesn't change anything functionally,
but will be used to not duplicate special cases in the future.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
iov_iter_advance() is heavily used, but implemented through generic
means. For bvecs there is a specifically crafted function for that, so
use bvec_iter_advance() instead, it's faster and slimmer.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This saves one memory allocation, and ensures the bvecs aren't freed
before the AIO completion. This will allow the lower level code to be
optimized so that it can avoid allocating another bvec array.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Direct IO does not operate on the current working set of pages managed
by the kernel, so it should not be accounted as memory stall to PSI
infrastructure.
The block layer and iomap direct IO use bio_iov_iter_get_pages()
to build bios, and they are the only users of it, so to avoid PSI
tracking for them clear out BIO_WORKINGSET flag. Do same for
dio_bio_submit() because fs/direct_io constructs bios by hand directly
calling bio_add_page().
Reported-by: Christoph Hellwig <[email protected]>
Suggested-by: Christoph Hellwig <[email protected]>
Suggested-by: Johannes Weiner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
zero-length bvec segments are allowed in general, but not handled by bio
and down the block layer so filtered out. This inconsistency may be
confusing and prevent from optimisations. As zero-length segments are
useless and places that were generating them are patched, declare them
not allowed.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
iter_file_splice_write() may spawn bvec segments with zero-length. In
preparation for prohibiting them, filter out by hand at splice level.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We can remove 'q' from blk_execute_rq as well after the previous change
in blk_execute_rq_nowait.
And more importantly it never really was needed to start with given
that we can trivial derive it from struct request.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Ulf Hansson <[email protected]> # for mmc
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The 'q' is not used since commit a1ce35fa4985 ("block: remove dead
elevator code"), also update the comment of the function.
And more importantly it never really was needed to start with given
that we can trivial derive it from struct request.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Free the request rq before returning error code.
Fixes: 972248e9111e ("scsi: bsg-lib: handle bidi requests without block layer help")
Signed-off-by: Pan Bian <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This bioset is just for allocating bio only from bio_next_split, and it
needn't bvecs, so remove the flag.
Cc: [email protected]
Cc: Coly Li <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Acked-by: Coly Li <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
bvec_alloc(), bvec_free() and bvec_nr_vecs() are only used inside block
layer core functions, no need to declare them in public header.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Pavel Begunkov <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|