aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-06-01Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-4.18/blockJens Axboe11-193/+351
Pull NVMe changes from Christoph: "Below is another set of NVMe updates for 4.18. Besides the usual bug fixes this includes more feature completness in terms of AEN and log page handling on the target." * 'nvme-4.18' of git://git.infradead.org/nvme: nvme: use the changed namespaces list log to clear ns data changed AENs nvme: mark nvme_queue_scan static nvme: submit AEN event configuration on startup nvmet: mask pending AENs nvmet: add AEN configuration support nvmet: implement the changed namespaces log nvmet: split log page implementation nvmet: add a new nvmet_zero_sgl helper nvme.h: add AEN configuration symbols nvme.h: add the changed namespace list log nvme.h: untangle AEN notice definitions nvmet: fix error return code in nvmet_file_ns_enable() nvmet: fix a typo in nvmet_file_ns_enable() nvme-fabrics: allow internal passthrough command on deleting controllers nvme-loop: add support for multiple ports nvme-pci: simplify __nvme_submit_cmd nvme-pci: Rate limit the nvme timeout warnings nvme: allow duplicate controller if prior controller being deleted
2018-06-01block: split the blk-mq case from elevator_initChristoph Hellwig3-32/+48
There is almost no shared logic, which leads to a very confusing code flow. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-06-01block: move sysfs_lock into elevator_initChristoph Hellwig5-28/+8
Both callers take just around so function call, so move it in. Also remove the now pointless blk_mq_sched_init wrapper. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-06-01block: remove the always unused name argument to elevator_initChristoph Hellwig4-11/+5
Reported-by: Damien Le Moal <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-06-01block: unexport elevator_init/exitChristoph Hellwig3-4/+2
These are only used by the block core. Also move the declarations to block/blk.h. Reported-by: Damien Le Moal <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-06-01block: move initialization of elevator-related fields to blk_alloc_queue_nodeChristoph Hellwig2-5/+5
No point in doing this in elevator_init. Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: Damien Le Moal <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-06-01nvme: use the changed namespaces list log to clear ns data changed AENsChristoph Hellwig2-4/+49
Per section 5.2 we need to issue the corresponding log page to clear an AEN, so for a namespace data changed AEN we need to read the changed namespace list log. And once we read that log anyway we might as well use it to optimize the rescan. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2018-06-01nvme: mark nvme_queue_scan staticChristoph Hellwig2-11/+9
And move it toward the top of the file to avoid a forward declaration. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]>
2018-06-01nvme: submit AEN event configuration on startupHannes Reinecke2-0/+18
We should register for AEN events; some law-abiding targets might not be sending us AENs otherwise. Signed-off-by: Hannes Reinecke <[email protected]> [hch: slight cleanups] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2018-06-01nvmet: mask pending AENsChristoph Hellwig3-1/+10
Per section 5.2 of the NVMe 1.3 spec: "When the controller posts a completion queue entry for an outstanding Asynchronous Event Request command and thus reports an asynchronous event, subsequent events of that event type are automatically masked by the controller until the host clears that event. An event is cleared by reading the log page associated with that event using the Get Log Page command (see section 5.14)." Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2018-06-01nvmet: add AEN configuration supportChristoph Hellwig3-2/+32
AEN configuration via the 'Get Features' and 'Set Features' admin command is mandatory, so we should be implemeting handling for it. Signed-off-by: Hannes Reinecke <[email protected]> [hch: use WRITE_ONCE, check for invalid values] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Daniel Verkamp <[email protected]>
2018-06-01nvmet: implement the changed namespaces logChristoph Hellwig3-9/+76
Just keep a per-controller buffer of changed namespaces and copy it out in the get log page implementation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Daniel Verkamp <[email protected]>
2018-06-01nvmet: split log page implementationChristoph Hellwig1-63/+36
Remove the common code to allocate a buffer and copy it into the SGL. Instead the two no-op implementations just zero the SGL directly, and the smart log allocates a buffer on its own. This prepares for the more elaborate ANA log page. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]>
2018-06-01nvmet: add a new nvmet_zero_sgl helperChristoph Hellwig2-0/+8
Zeroes the SGL in the payload. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]>
2018-06-01nvme.h: add AEN configuration symbolsHannes Reinecke1-0/+5
Signed-off-by: Hannes Reinecke <[email protected]> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]>
2018-05-31nvme.h: add the changed namespace list logChristoph Hellwig1-0/+3
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]>
2018-05-31nvme.h: untangle AEN notice definitionsChristoph Hellwig2-14/+24
Stop including the event type in the definitions for the notice type. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]>
2018-05-31nvmet: fix error return code in nvmet_file_ns_enable()Wei Yongjun1-2/+6
Fix to return error code -ENOMEM from the memory alloc fail error handling case instead of 0, as done elsewhere in this function. Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Wei Yongjun <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-05-31nvmet: fix a typo in nvmet_file_ns_enable()Wei Yongjun1-1/+1
Fix a typo in nvmet_file_ns_enable(). Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Wei Yongjun <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-05-31nvme-fabrics: allow internal passthrough command on deleting controllersChristoph Hellwig1-48/+31
Without this we can't cleanly shut down. Based on analysis an an earlier patch from Hannes Reinecke. Fixes: bb06ec31452f ("nvme: expand nvmf_check_if_ready checks") Reported-by: Hannes Reinecke <[email protected]> Tested-by: Hannes Reinecke <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: James Smart <[email protected]>
2018-05-31block, bfq: prevent soft_rt_next_start from being stuck at infinityDavide Sapienza1-27/+2
BFQ can deem a bfq_queue as soft real-time only if the queue - periodically becomes completely idle, i.e., empty and with no still-outstanding I/O request; - after becoming idle, gets new I/O only after a special reference time soft_rt_next_start. In this respect, after commit "block, bfq: consider also past I/O in soft real-time detection", the value of soft_rt_next_start can never decrease. This causes a problem with the following special updating case for soft_rt_next_start: to prevent queues that are not completely idle to be wrongly detected as soft real-time (when they become non-empty again), soft_rt_next_start is temporarily set to infinity for empty queues with still outstanding I/O requests. But, if such an update is actually performed, then, because of the above commit, soft_rt_next_start will be stuck at infinity forever, and the queue will have no more chance to be considered soft real-time. On slow systems, this problem does cause actual soft real-time applications to be occasionally not detected as such. This commit addresses this issue by eliminating the pushing of soft_rt_next_start to infinity, and by changing the way non-empty queues are prevented from being wrongly detected as soft real-time. Simply, a queue that becomes non-empty again can now be detected as soft real-time only if it has no outstanding I/O request. Signed-off-by: Davide Sapienza <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-31block, bfq: increase weight-raising duration for interactive appsDavide Sapienza1-11/+15
The maximum possible duration of the weight-raising period for interactive applications is limited to 13 seconds, as this is the time needed to load the largest application that we considered when tuning weight raising. Unfortunately, in such an evaluation, we did not consider the case of very slow virtual machines. For example, on a QEMU/KVM virtual machine - running in a slow PC; - with a virtual disk stacked on a slow low-end 5400rpm HDD; - serving a heavy I/O workload, such as the sequential reading of several files; mplayer takes 23 seconds to start, if constantly weight-raised. To address this issue, this commit conservatively sets the upper limit for weight-raising duration to 25 seconds. Signed-off-by: Davide Sapienza <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-31block, bfq: remove slow-system classPaolo Valente2-105/+46
BFQ computes the duration of weight raising for interactive applications automatically, using some reference parameters. In particular, BFQ uses the best durations (see comments in the code for how these durations have been assessed) for two classes of systems: slow and fast ones. Examples of slow systems are old phones or systems using micro HDDs. Fast systems are all the remaining ones. Using these parameters, BFQ computes the actual duration of the weight raising, for the system at hand, as a function of the relative speed of the system w.r.t. the speed of a reference system, belonging to the same class of systems as the system at hand. This slow vs fast differentiation proved to be useful in the past, but happens to have little meaning with current hardware. Even worse, it does cause problems in virtual systems, where the speed of the system can vary frequently, and so widely to just confuse the class-detection mechanism, and, as we have verified experimentally, to cause BFQ to compute non-sensical weight-raising durations. This commit addresses this issue by removing the slow class and the class-detection mechanism. Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-31block, bfq: add description of weight-raising heuristicsPaolo Valente1-24/+56
A description of how weight raising works is missing in BFQ sources. In addition, the code for handling weight raising is scattered across a few functions. This makes it rather hard to understand the mechanism and its rationale. This commits adds such a description at the beginning of the main source file. Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-31block, bfq: remove the removal of 'next' rq in bfq_requests_mergedFilippo Muzzini1-7/+0
Since bfq_finish_request() is always called on the request 'next', after bfq_requests_merged() is finished, and bfq_finish_request() removes 'next' from its bfq_queue if needed, it isn't necessary to do such a removal in advance in bfq_merged_requests(). This commit removes such a useless 'next' removal. Signed-off-by: Filippo Muzzini <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-31block, bfq: remove wrong check in bfq_requests_mergedPaolo Valente1-6/+20
The request rq passed to the function bfq_requests_merged is always in a bfq_queue, so the check !RB_EMPTY_NODE(&rq->rb_node) at the beginning of bfq_requests_merged always succeeds, and the control flow systematically skips to the end of the function. This implies that the body of the function is never executed, i.e., the repositioning of rq is never performed. On the opposite end, a control is missing in the body of the function: 'next' must be removed only if it is inside a bfq_queue. This commit removes the wrong check on rq, and adds the missing check on 'next'. In addition, this commit adds comments on bfq_requests_merged. Signed-off-by: Filippo Muzzini <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-31block, bfq: remove wrong lock in bfq_requests_mergedFilippo Muzzini1-2/+0
In bfq_requests_merged(), there is a deadlock because the lock on bfqq->bfqd->lock is held by the calling function, but the code of this function tries to grab the lock again. This deadlock is currently hidden by another bug (fixed by next commit for this source file), which causes the body of bfq_requests_merged() to be never executed. This commit removes the deadlock by removing the lock/unlock pair. Signed-off-by: Filippo Muzzini <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30block: fixup bioset_integrity_create() callJens Axboe1-1/+1
Missed converting the bioset_integrity_create() bounce bio set call. Fixes: 338aa96d5661 ("block: convert bounce, q->bio_split to bioset_init()/mempool_init()") Signed-off-by: Jens Axboe <[email protected]>
2018-05-30block: Drop bioset_create()Kent Overstreet2-52/+15
All users have been converted to bioset_init(), kill off the old API. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30xfs: convert to bioset_init()/mempool_init()Kent Overstreet3-8/+7
Convert XFS to embedded bio sets. Acked-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30btrfs: convert to bioset_init()/mempool_init()Kent Overstreet1-14/+11
Convert btrfs to embedded bio sets. Acked-by: Chris Mason <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30fs: convert block_dev.c to bioset_init()Kent Overstreet1-6/+3
Convert block DIO code to embedded bio sets. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30target: convert to bioset_init()/mempool_init()Kent Overstreet2-9/+7
Convert the target code to embedded bio sets. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30dm: convert to bioset_init()/mempool_init()Kent Overstreet17-206/+197
Convert dm to embedded bio sets. Acked-by: Mike Snitzer <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30md: convert to bioset_init()/mempool_init()Kent Overstreet15-181/+159
Convert md to embedded bio sets. Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30bcache: convert to bioset_init()/mempool_init()Kent Overstreet7-52/+37
Convert bcache to embedded bio sets. Reviewed-by: Coly Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30lightnvm: convert to bioset_init()/mempool_init()Kent Overstreet6-65/+65
Convert lightnvm to embedded bio sets. Reviewed-by: Javier González <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30pktcdvd: convert to bioset_init()/mempool_init()Kent Overstreet2-26/+26
Convert pktcdvd to embedded bio sets. Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30drbd: convert to bioset_init()/mempool_init()Kent Overstreet6-59/+38
Convert drbd to embedded bio sets and mempools. Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30block: convert bounce, q->bio_split to bioset_init()/mempool_init()Kent Overstreet7-33/+41
Convert the core block functionality to embedded bio sets. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30blk-throttle: return proper bool type to caller instead of 0/1Chengguang Xu1-5/+5
Change to return true/false only for bool type return code. Signed-off-by: Chengguang Xu <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iterChristoph Hellwig5-21/+4
We already check for started commands in all callbacks, but we should also protect against already completed commands. Do this by taking the checks to common code. Acked-by: Josef Bacik <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30nbd: clear DISCONNECT_REQUESTED flag once disconnection occurs.Kevin Vigor1-6/+15
When a userspace client requests a NBD device be disconnected, the DISCONNECT_REQUESTED flag is set. While this flag is set, the driver will not inform userspace when a connection is closed. Unfortunately the flag was never cleared, so once a disconnect was requested the driver would thereafter never tell userspace about a closed connection. Thus when connections failed due to timeout, no attempt to reconnect was made and eventually the device would fail. Fix by clearing the DISCONNECT_REQUESTED flag (and setting the DISCONNECTED flag) once all connections are closed. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Kevin Vigor <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30blk-throttle: fix potential NULL pointer dereference in throtl_select_dispatchLiu Bo1-1/+2
tg in throtl_select_dispatch is used first and then do check. Since tg may be NULL, it has potential NULL pointer dereference risk. So fix it. Signed-off-by: Joseph Qi <[email protected]> Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30block: kyber: make kyber more friendly with mergingJianchao Wang1-32/+158
Currently, kyber is very unfriendly with merging. kyber depends on ctx rq_list to do merging, however, most of time, it will not leave any requests in ctx rq_list. This is because even if tokens of one domain is used up, kyber will try to dispatch requests from other domain and flush the rq_list there. To improve this, we setup kyber_ctx_queue (kcq) which is similar with ctx, but it has rq_lists for different domain and build same mapping between kcq and khd as the ctx & hctx. Then we could merge, insert and dispatch for different domains separately. At the same time, only flush the rq_list of kcq when get domain token successfully. Then if one domain token is used up, the requests could be left in the rq_list of that domain and maybe merged with following io. Following is my test result on machine with 8 cores and NVMe card INTEL SSDPEKKR128G7 fio size=256m ioengine=libaio iodepth=64 direct=1 numjobs=8 seq/random +------+---------------------------------------------------------------+ |patch?| bw(MB/s) | iops | slat(usec) | clat(usec) | merge | +----------------------------------------------------------------------+ | w/o | 606/612 | 151k/153k | 6.89/7.03 | 3349.21/3305.40 | 0/0 | +----------------------------------------------------------------------+ | w/ | 1083/616 | 277k/154k | 4.93/6.95 | 1830.62/3279.95 | 223k/3k | +----------------------------------------------------------------------+ When set numjobs to 16, the bw and iops could reach 1662MB/s and 425k on my platform. Signed-off-by: Jianchao Wang <[email protected]> Tested-by: Holger Hoffstätte <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-05-30blk-mq: abstract out blk-mq-sched rq list iteration bio merge helperJens Axboe2-11/+26
No functional changes in this patch, just a prep patch for utilizing this in an IO scheduler. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]>
2018-05-30nvme-loop: add support for multiple portsChristoph Hellwig2-19/+31
This is useful at least for multipath testing. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]>
2018-05-30nvme-pci: simplify __nvme_submit_cmdChristoph Hellwig1-23/+14
With recent CQ handling improvements we can now move the locking into __nvme_submit_cmd. Also remove the local tail variable to make the code more obvious, remove the __ prefix in the name, and fix the comments describing the function. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]>
2018-05-30nvme-pci: Rate limit the nvme timeout warningsKeith Busch1-1/+1
The block layer's timeout handling currently prevents drivers from completing commands outside the timeout callback once blk-mq decides they've expired. If a device breaks, this could potentially create many thousands of timed out commands. There's nothing of value to be gleaned from observing each of those messages, so this patch adds a rate limit on them. Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-05-30nvme: allow duplicate controller if prior controller being deletedJames Smart1-1/+3
The current checks for whether a new controller request "matches" an existing controller ignores controller state and checks identity strings. There are cases where an existing controller may be in its last steps of deletion when they are "matched" by a new connection. Change the behavior so that the new connection ignores controllers that are deleted. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>