aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-04-13blk-mq: don't run the hw_queue from blk_mq_request_bypass_insertChristoph Hellwig3-16/+15
blk_mq_request_bypass_insert takes a bool parameter to control how to run the queue at the end of the function. Move the blk_mq_run_hw_queue call to the callers that want it instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: don't run the hw_queue from blk_mq_insert_requestChristoph Hellwig1-24/+32
blk_mq_insert_request takes two bool parameters to control how to run the queue at the end of the function. Move the blk_mq_run_hw_queue call to the callers that want it instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: fold __blk_mq_try_issue_directly into its two callersChristoph Hellwig1-41/+31
Due to the wildly different behavior based on the bypass_insert argument, not a whole lot of code in __blk_mq_try_issue_directly is actually shared between blk_mq_try_issue_directly and blk_mq_request_issue_directly. Remove __blk_mq_try_issue_directly and fold the code into the two callers instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: factor out a blk_mq_get_budget_and_tag helperChristoph Hellwig1-10/+16
Factor out a helper from __blk_mq_try_issue_directly in preparation of folding that function into its two callers. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: refactor the DONTPREP/SOFTBARRIER andling in blk_mq_requeue_workChristoph Hellwig1-10/+11
Split the RQF_DONTPREP and RQF_SOFTBARRIER in separate branches to make the code more readable. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: refactor passthrough vs flush handling in blk_mq_insert_requestChristoph Hellwig1-32/+18
While both passthrough and flush requests call directly into blk_mq_request_bypass_insert, the parameters aren't the same. Split the handling into two separate conditionals and turn the whole function into an if/elif/elif/else flow instead of the gotos. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: remove blk_flush_queue_rqChristoph Hellwig1-7/+2
Just call blk_mq_add_to_requeue_list directly from the two callers. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: fold __blk_mq_insert_req_list into blk_mq_insert_requestChristoph Hellwig1-18/+7
Remove this very small helper and fold it into the only caller. Note that this moves the trace_block_rq_insert out of ctx->lock, matching the other calls to this tracepoint. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: fold __blk_mq_insert_request into blk_mq_insert_requestChristoph Hellwig2-14/+2
There is no good point in keeping the __blk_mq_insert_request around for two function calls and a singler caller. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: move blk_mq_sched_insert_request to blk-mq.cChristoph Hellwig4-83/+82
blk_mq_sched_insert_request is the main request insert helper and not directly I/O scheduler related. Move blk_mq_sched_insert_request to blk-mq.c, rename it to blk_mq_insert_request and mark it static. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: fold blk_mq_sched_insert_requests into blk_mq_dispatch_plug_listChristoph Hellwig5-34/+14
blk_mq_dispatch_plug_list is the only caller of blk_mq_sched_insert_requests, and it makes sense to just fold it there as blk_mq_sched_insert_requests isn't specific to I/O schedulers despite the name. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: move more logic into blk_mq_insert_requestsChristoph Hellwig3-20/+21
Move all logic related to the direct insert (including the call to blk_mq_run_hw_queue) into blk_mq_insert_requests to streamline the code flow up a bit, and to allow marking blk_mq_try_issue_list_directly static. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: include <linux/blk-mq.h> in block/blk-mq.hChristoph Hellwig15-14/+1
block/blk-mq.h needs various definitions from <linux/blk-mq.h>, include it there instead of relying on the source files to include both. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: remove blk-mq-tag.hChristoph Hellwig13-85/+60
blk-mq-tag.h is always included by blk-mq.h, and causes recursive inclusion hell with further changes. Just merge it into blk-mq.h instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-mq: don't plug for head insertions in blk_execute_rq_nowaitChristoph Hellwig1-1/+1
Plugs never insert at head, so don't plug for head insertions. Fixes: 1c2d2fff6dc0 ("block: wire-up support for passthrough plugging") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-throttle: only enable blk-stat when BLK_DEV_THROTTLING_LOWChengming Zhou1-1/+2
blk_throtl_register() will unconditionally enable blk-stat for gendisk when register, even when we have no BLK_DEV_THROTTLING_LOW config. Since the kernel always has only BLK_DEV_THROTTLING config and the BLK_DEV_THROTTLING_LOW config is still in EXPERIMENTAL state, we can just skip blk-stat when !BLK_DEV_THROTTLING_LOW. Signed-off-by: Chengming Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-stat: fix QUEUE_FLAG_STATS clearChengming Zhou1-2/+2
We need to set QUEUE_FLAG_STATS for two cases: 1. blk_stat_enable_accounting() 2. blk_stat_add_callback() So we should clear it only when ((q->stats->accounting == 0) && list_empty(&q->stats->callbacks)). blk_stat_disable_accounting() only check if q->stats->accounting is 0 before clear the flag, this patch fix it. Also add list_empty(&q->stats->callbacks)) check when enable, or the flag is already set. The bug can be reproduced on kernel without BLK_DEV_THROTTLING (since it unconditionally enable accounting, see the next patch). # cat /sys/block/sr0/queue/scheduler none mq-deadline [bfq] # cat /sys/kernel/debug/block/sr0/state SAME_COMP|IO_STAT|INIT_DONE|STATS|REGISTERED|NOWAIT|30 # echo none > /sys/block/sr0/queue/scheduler # cat /sys/kernel/debug/block/sr0/state SAME_COMP|IO_STAT|INIT_DONE|REGISTERED|NOWAIT # cat /sys/block/sr0/queue/wbt_lat_usec 75000 We can see that after changing elevator from "bfq" to "none", "STATS" flag is lost even though WBT callback still need it. Fixes: 68497092bde9 ("block: make queue stat accounting a reference") Cc: <[email protected]> # v5.17+ Signed-off-by: Chengming Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-iolatency: Make initialization lazyTejun Heo3-15/+28
Other rq_qos policies such as wbt and iocost are lazy-initialized when they are configured for the first time for the device but iolatency is initialized unconditionally from blkcg_init_disk() during gendisk init. Lazy init is beneficial because rq_qos policies add runtime overhead when initialized as every IO has to walk all registered rq_qos callbacks. This patch switches iolatency to lazy initialization too so that it only registered its rq_qos policy when it is first configured. Note that there is a known race condition between blkcg config file writes and del_gendisk() and this patch makes iolatency susceptible to it by exposing the init path to race against the deletion path. However, that problem already exists in iocost and is being worked on. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blk-iolatency: s/blkcg_rq_qos/iolat_rq_qos/Tejun Heo2-2/+2
The name was too generic given that there are multiple blkcg rq-qos policies. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blkcg: Restructure blkg_conf_prep() and friendsTejun Heo6-78/+127
We want to support lazy init of rq-qos policies so that iolatency is enabled lazily on configuration instead of gendisk initialization. The way blkg config helpers are structured now is a bit awkward for that. Let's restructure: * blkcg_conf_open_bdev() is renamed to blkg_conf_open_bdev(). The blkcg_ prefix was used because the bdev opening step is blkg-independent. However, the distinction is too subtle and confuses more than helps. Let's switch to blkg prefix so that it's consistent with the type and other helper names. * struct blkg_conf_ctx now remembers the original input string and is always initialized by the new blkg_conf_init(). * blkg_conf_open_bdev() is updated to take a pointer to blkg_conf_ctx like blkg_conf_prep() and can be called multiple times safely. Instead of modifying the double pointer to input string directly, blkg_conf_open_bdev() now sets blkg_conf_ctx->body. * blkg_conf_finish() is renamed to blkg_conf_exit() for symmetry and now must be called on all blkg_conf_ctx's which were initialized with blkg_conf_init(). Combined, this allows the users to either open the bdev first or do it altogether with blkg_conf_prep() which will help implementing lazy init of rq-qos policies. blkg_conf_init/exit() will also be used implement synchronization against device removal. This is necessary because iolat / iocost are configured through cgroupfs instead of one of the files under /sys/block/DEVICE. As cgroupfs operations aren't synchronized with block layer, the lazy init and other configuration operations may race against device removal. This patch makes blkg_conf_init/exit() used consistently for all cgroup-orginating configurations making them a good place to implement explicit synchronization. Users are updated accordingly. No behavior change is intended by this patch. v2: bfq wasn't updated in v1 causing a build error. Fixed. v3: Update the description to include future use of blkg_conf_init/exit() as synchronization points. Signed-off-by: Tejun Heo <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13blkcg: Drop unnecessary RCU read [un]locks from blkg_conf_prep/finish()Tejun Heo1-9/+4
Now that all RCU flavors have been combined either holding a spin lock, disabling irq or disabling preemption implies RCU read lock, so there's no need to use rcu_read_[un]lock() explicitly while holding queue_lock. This shouldn't cause any behavior changes. v2: Description updated. Leave __acquires/release on queue_lock alone. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-13nvme-fcloop: fix "inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage"Ming Lei1-21/+27
fcloop_fcp_op() could be called from flush request's ->end_io(flush_end_io) in which the spinlock of fq->mq_flush_lock is grabbed with irq saved/disabled. So fcloop_fcp_op() can't call spin_unlock_irq(&tfcp_req->reqlock) simply which enables irq unconditionally. Fixes the warning by switching to spin_lock_irqsave()/spin_unlock_irqrestore() Fixes: c38dbbfab1bc ("nvme-fcloop: fix inconsistent lock state warnings") Reported-by: Yi Zhang <[email protected]> Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Ewan D. Milne <[email protected]> Tested-by: Yi Zhang <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13blk-mq-rdma: remove queue mapping helper for rdma devicesSagi Grimberg5-66/+2
No rdma device exposes its irq vectors affinity today. So the only mapping that we have left, is the default blk_mq_map_queues, which we fallback to anyways. Also fixup the only consumer of this helper (nvme-rdma). Remove this now dead code. Signed-off-by: Sagi Grimberg <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvme-rdma: minor cleanup in nvme_rdma_create_cq()zhenwei pi1-8/+4
Before cleanup: enum ib_poll_context poll_ctx; if (nvme_rdma_poll_queue(queue)) { poll_ctx = IB_POLL_DIRECT; queue->ib_cq = ib_alloc_cq(ibdev, queue, queue->cq_size, comp_vector, poll_ctx); } else { poll_ctx = IB_POLL_SOFTIRQ; queue->ib_cq = ib_cq_pool_get(ibdev, queue->cq_size, comp_vector, poll_ctx); } After cleanup: if (nvme_rdma_poll_queue(queue)) queue->ib_cq = ib_alloc_cq(ibdev, queue, queue->cq_size, comp_vector, IB_POLL_DIRECT); else queue->ib_cq = ib_cq_pool_get(ibdev, queue->cq_size, comp_vector, IB_POLL_SOFTIRQ); IB_POLL_SOFTIRQ/IB_POLL_SOFTIRQ gets used directly in function, this seems more accessible. Signed-off-by: zhenwei pi <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvme: fix double blk_mq_complete_request for timeout request with low ↵Lei Yin1-2/+2
probability When nvme_cancel_tagset traverses all tagsets and executes nvme_cancel_request, this request may be executing blk_mq_free_request that is called by nvme_rdma_complete_timed_out/nvme_tcp_complete_timed_out. When blk_mq_free_request executes to WRITE_ONCE(rq->state, MQ_RQ_IDLE) and __blk_mq_free_request(rq), it will cause double blk_mq_complete_request for this request, and it will cause a null pointer error in the second execution of this function because rq->mq_hctx has set to NULL in first execution. Signed-off-by: Lei Yin <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvme: fix async event trace eventKeith Busch2-13/+7
Mixing AER Event Type and Event Info has masking clashes. Just print the event type, but also include the event info of the AER result in the trace. Fixes: 09bd1ff4b15143b ("nvme-core: add async event trace helper") Reported-by: Nate Thornton <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Minwoo Im <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvme-apple: return directly instead of elseChaitanya Kulkarni1-2/+2
There is no need for the else when direct return is used at the end of the function. Signed-off-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Eric Curtin <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvme-apple: return directly instead of elseChaitanya Kulkarni1-2/+2
There is no need for the else when direct return is used at the end of the function. Signed-off-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Eric Curtin <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvmet-tcp: validate idle poll modparam valueChaitanya Kulkarni1-2/+3
The module parameter idle_poll_period_usecs is passed to the function usecs_to_jiffies() which has following prototype and expect idle_poll_period_usecs arg type to be unsigned int:- unsigned long usecs_to_jiffies(const unsigned int u); Use similar module parameter validation callback as previous patch. Signed-off-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvmet-tcp: validate so_priority modparam valueChaitanya Kulkarni1-2/+27
The module parameter so_priority is passed to the function sock_set_priority() which has following prototype and expect priotity arg type to be u32:- void sock_set_priority(struct sock *sk, u32 priority); Add a module parameter validation callback to reject any negative values for the so_priority as it is defigned as int. Use this oppurtunity to update the module parameter description and print the default value. Signed-off-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvme-tcp: fence TCP socket on receive errorChris Leech1-0/+3
Ensure that no further socket reads occur after a receive processing error, either from io_work being re-scheduled or nvme_tcp_poll. Failing to do so can result in unrecognised PDU payloads or TCP stream garbage being processed as a C2H data PDU, and potentially start copying the payload to an invalid destination after looking up a request using a bogus command id. Signed-off-by: Chris Leech <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: John Meneghini <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvmet: remove nvmet_req_cns_error_completeChristoph Hellwig2-9/+4
Just fold it into the only caller. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2023-04-13nvmet: rename nvmet_execute_identify_cns_cs_nsChristoph Hellwig3-4/+4
nvmet_execute_identify_ns_zns is a more descriptive name for the function handling the "I/O Command Set Specific Identify Namespace Data Structure for the Zoned Namespace Command Set". Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Damien Le Moal <[email protected]>
2023-04-13nvmet: fix Identify Identification Descriptor List handlingChristoph Hellwig1-19/+1
The Identification Descriptor List CNS value does not check the CSI value, so remove the code trying to handle it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Damien Le Moal <[email protected]>
2023-04-13nvmet: cleanup nvmet_execute_identify()Damien Le Moal1-16/+19
Change the order of the cases in nvmet_execute_identify() main switch-case to match the NVMe 2.0 specification order as defined in table 273. This is also the increasing order of CNS values. While at it, for clarity, make it explicit that identify with cns set to NVME_ID_CNS_CS_NS does not support NVM command set specific data. No functional changes are introduced by this cleanup. Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Tested-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvmet: fix I/O Command Set specific Identify ControllerDamien Le Moal3-8/+18
For an identify command with cns set to NVME_ID_CNS_CS_CTRL, the NVMe 2.0 specification states that: If the I/O Command Set specified by the CSI field does not have an Identify Controller data structure, then the controller shall return a zero filled data structure. If the host requests a data structure for an I/O Command Set that the controller does not support, the controller shall abort the command with a status code of Invalid Field in Command. However, the current implementation of this identify command in nvmet_execute_identify() only handles the ZNS command set, returning an error for the NVM command set, which is not compliant with the specifications as we do support this command set. Fix this by: 1) Renaming nvmet_execute_identify_cns_cs_ctrl() to nvmet_execute_identify_ctrl_zns() to continue handling the ZNS command set as is. 2) Introduce a nvmet_execute_identify_ctrl_ns() helper to handle the NVM command set, returning a zero filled nvme_id_ctrl_nvm data structure. 3) Modify nvmet_execute_identify() to call these helpers based on the csi specified, returning an error for unsupported command sets. Fixes: aaf2e048af27 ("nvmet: add ZBD over ZNS backend support") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Tested-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvmet: fix Identify Active Namespace ID list handlingDamien Le Moal1-7/+2
The identify command with cns set to NVME_ID_CNS_NS_ACTIVE_LIST does not depend on the command set. The execution of this command should thus not look at the csi field specified in the command. Simplify nvmet_execute_identify() to directly call nvmet_execute_identify_nslist() without the csi switch-case. Fixes: ab5d0b38c047 ("nvmet: add Command Set Identifier support") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Tested-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvmet: fix Identify Controller handlingDamien Le Moal1-5/+2
The identify command with cns set to NVME_ID_CNS_CTRL does not depend on the command set. The execution of this command should thus not look at the csi specified in the command. Simplify nvmet_execute_identify() to directly call nvmet_execute_identify_ctrl() without the csi switch-case. Fixes: ab5d0b38c047 ("nvmet: add Command Set Identifier support") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Tested-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvmet: fix Identify Namespace handlingDamien Le Moal1-7/+2
The identify command with cns set to NVME_ID_CNS_NS does not directly depend on the command set. The NVMe specifications is rather confusing here as it appears that this command only applies to the NVM command set. However, footnote 8 of Figure 273 in the NVMe 2.0 base specifications clearly state that this command applies to NVM command sets that support logical blocks, that is, NVM and ZNS. Both the NVM and ZNS command set specifications also list this identify as mandatory. The command handling should thus not look at the csi field since it is defined as unused for this command. Given that we do not support the KV command set, simply remove the csi switch-case for that command handling and call directly nvmet_execute_identify_ns() in nvmet_execute_identify(). Fixes: ab5d0b38c047 ("nvmet: add Command Set Identifier support") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Tested-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvmet: fix error handling in nvmet_execute_identify_cns_cs_ns()Damien Le Moal1-7/+9
Nvme specifications state that: If the I/O Command Set associated with the namespace identified by the NSID field does not support the Identify Namespace data structure specified by the CSI field, the controller shall abort the command with a status code of Invalid Field in Command. In other words, if nvmet_execute_identify_cns_cs_ns() is called for a target with a block device that is not zoned, we should not return any data and set the status to NVME_SC_INVALID_FIELD. While at it, it is also better to revalidate the ns block devie *before* checking if the block device is zoned, to ensure that nvmet_execute_identify_cns_cs_ns() operates against updated device characteristics. Fixes: aaf2e048af27 ("nvmet: add ZBD over ZNS backend support") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-13nvme-pci: drop redundant pci_enable_pcie_error_reporting()Bjorn Helgaas1-5/+1
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-11s390/dasd: fix hanging blockdevice after request requeueStefan Haberland1-1/+1
The DASD driver does not kick the requeue list when requeuing IO requests to the blocklayer. This might lead to hanging blockdevice when there is no other trigger for this. Fix by automatically kick the requeue list when requeuing DASD requests to the blocklayer. Fixes: e443343e509a ("s390/dasd: blk-mq conversion") CC: [email protected] # 4.14+ Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Reviewed-by: Halil Pasic <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-11s390/dasd: add autoquiesce event for start IO errorStefan Haberland1-0/+2
Add a check for errors in the start_io function that signal a not working device. Trigger an autoquiesce event in that case. Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Reviewed-by: Halil Pasic <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-11s390/dasd: add aq_timeouts autoquiesce triggerStefan Haberland4-0/+60
Add a sysfs attribute aq_timeouts that controls after how many timeouts a autoquiesce event might be triggered. The default value is 32768 which is the maximum number of retries for the DASD device driver DASD_RETRIES_MAX. This means that the timeout trigger will never happen. The default value for DASD retries is 255. Setting the value to below 255 will trigger the timeout autoquiesce event before an IO error is generated. Also add the check for the configured amount of timeouts and trigger an autoquiesce event if exceeded. Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Reviewed-by: Halil Pasic <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-11s390/dasd: add aq_requeue sysfs attributeStefan Haberland1-0/+36
Add a sysfs attribute to control if all IO requests will be requeued to the blocklayer in case of an autoquiesce event or not. A value of 1 means that in case of an autoquiesce event all IO requests will be requeued to the blocklayer. A value of 0 means that the device will only be stopped. Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Reviewed-by: Halil Pasic <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-11s390/dasd: add aq_mask sysfs attributeStefan Haberland2-7/+59
Add sysfs attribute that controls the DASD autoquiesce feature. The autoquiesce is disabled when 0 is echoed to the attribute. A value greater than 0 will enable the feature. The aq_mask attribute will accept an unsigned integer and the value will be interpreted as bitmask defining the trigger events that will lead to an automatic quiesce. The following autoquiesce triggers will currently be available: DASD_EER_FATALERROR 1 - any final I/O error DASD_EER_NOPATH 2 - no remaining paths for the device DASD_EER_STATECHANGE 3 - a state change interrupt occurred DASD_EER_PPRCSUSPEND 4 - the device is PPRC suspended DASD_EER_NOSPC 5 - there is no space remaining on an ESE device DASD_EER_TIMEOUT 6 - a certain amount of timeouts occurred DASD_EER_STARTIO 7 - the IO start function encountered an error The currently supported maximum value is 255. Bit 31 is reserved for internal usage. Bit 0 is not used. Example: - deactivate autoquiesce $ echo 0 > /sys/bus/ccw/0.0.1234/aq_mask - enable autoquiesce for FATALERROR, NOPATH and TIMEOUT (0000 0000 0000 0000 0000 0000 0100 0110 => 70) $ echo 70 > /sys/bus/ccw/0.0.1234/aq_mask Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Reviewed-by: Halil Pasic <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-11s390/dasd: add autoquiesce featureStefan Haberland4-17/+48
Add the internal logic to check for autoquiesce triggers and handle them. Quiesce and resume are functions that tell Linux to stop/resume issuing I/Os to a specific DASD. The DASD driver allows a manual quiesce/resume via ioctl. Autoquiesce will define an amount of triggers that will lead to an automatic quiesce if a certain event occurs. There is no automatic resume. All events will be reported via DASD Extended Error Reporting (EER) if configured. Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Reviewed-by: Halil Pasic <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-11s390/dasd: remove unused DASD EER definesStefan Haberland1-11/+1
Remove definitions that have never been used. Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Reviewed-by: Halil Pasic <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-06blk-cgroup: delete cpd_init_fn of blkcg_policyChengming Zhou3-13/+2
blkcg_policy cpd_init_fn() is used to just initialize some default fields of policy data, which is enough to do in cpd_alloc_fn(). This patch delete the only user bfq_cpd_init(), and remove cpd_init_fn from blkcg_policy. Signed-off-by: Chengming Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-06blk-cgroup: delete cpd_bind_fn of blkcg_policyChengming Zhou2-22/+0
cpd_bind_fn is just used for update default weight when block subsys attached to a hierarchy. No any policy need it anymore. Signed-off-by: Chengming Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>