Age  Commit message  Author  Files  Lines
2017-11-10nvme: track shared namespacesChristoph Hellwig3-50/+210
Introduce a new struct nvme_ns_head that holds information about an actual namespace, unlike struct nvme_ns, which only holds the per-controller namespace information. For private namespaces there is a 1:1 relation of the two, but for shared namespaces this lets us discover all the paths to it. For now only the identifiers are moved to the new structure, but most of the information in struct nvme_ns should eventually move over. To allow lockless path lookup the list of nvme_ns structures per nvme_ns_head is protected by SRCU, which requires freeing the nvme_ns structure through call_srcu. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Javier González <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
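The relationship described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the `*_sketch` types and both helper functions are hypothetical, the fields are not the real kernel layout, and the SRCU protection the commit mentions is left out entirely.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct ns_ids_sketch {
    unsigned char nguid[16];
};

struct ns_head_sketch;

/* Per-controller view of a namespace: one instance per path. */
struct ns_sketch {
    int ctrl_id;                  /* controller this path goes through */
    struct ns_head_sketch *head;  /* the actual namespace it belongs to */
    struct ns_sketch *next;       /* sibling paths of the same head */
};

/* One instance per actual namespace, shared by all of its paths. */
struct ns_head_sketch {
    unsigned int ns_id;
    struct ns_ids_sketch ids;     /* identifiers live here, not per path */
    struct ns_sketch *paths;      /* the kernel protects this list with SRCU */
};

/* Attach a path to its head (no locking in this sketch). */
static void head_add_path(struct ns_head_sketch *head, struct ns_sketch *ns)
{
    ns->head = head;
    ns->next = head->paths;
    head->paths = ns;
}

/* Count the paths through which a namespace is reachable. */
static size_t head_count_paths(const struct ns_head_sketch *head)
{
    size_t n = 0;

    for (const struct ns_sketch *ns = head->paths; ns; ns = ns->next)
        n++;
    return n;
}
```

A private namespace ends up with exactly one path on its head, while a shared namespace has one path per controller that exposes it.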
2017-11-10nvme: introduce a nvme_ns_ids structureChristoph Hellwig2-34/+49
This allows us to manage the various unique namespace identifiers together instead of needing various variables and arguments. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: track subsystemsChristoph Hellwig3-36/+205
This adds a new nvme_subsystem structure so that we can track multiple controllers that belong to a single subsystem. For now we only use it to store the NQN, and to check that we don't have duplicate NQNs unless the involved subsystems support multiple controllers. Includes code originally from Hannes Reinecke to expose the subsystems in sysfs. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10block, nvme: Introduce blk_mq_req_flags_tBart Van Assche8-21/+30
Several block layer and NVMe core functions accept a combination of BLK_MQ_REQ_* flags through the 'flags' argument but there is no verification at compile time whether the right type of block layer flags is passed. Make it possible for sparse to verify this. This patch does not change any functionality. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Oleksandr Natalenko <[email protected]> Cc: [email protected] Cc: Christoph Hellwig <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
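The general technique of a sparse-checked flag type can be sketched as follows. All names here (`sketch_bitwise`, `sketch_force`, `req_flags_sketch_t`, the `REQ_SKETCH_*` constants and `wants_nowait()`) are hypothetical and chosen for illustration; they are not the definitions added by the patch. The attributes only take effect when the code is run through sparse, which defines `__CHECKER__`; a normal compiler sees an ordinary unsigned int.

```c
/* Only sparse understands these attributes; compilers get empty macros. */
#ifdef __CHECKER__
#define sketch_bitwise __attribute__((bitwise))
#define sketch_force   __attribute__((force))
#else
#define sketch_bitwise
#define sketch_force
#endif

/* A distinct flag type: sparse warns if a plain integer is passed instead. */
typedef unsigned int sketch_bitwise req_flags_sketch_t;

/* Constants are cast once, with 'force', at their definition. */
#define REQ_SKETCH_NOWAIT   ((sketch_force req_flags_sketch_t)(1 << 0))
#define REQ_SKETCH_RESERVED ((sketch_force req_flags_sketch_t)(1 << 1))

/* Callers must hand in a req_flags_sketch_t, not an arbitrary int or gfp mask. */
static int wants_nowait(req_flags_sketch_t flags)
{
    return (flags & REQ_SKETCH_NOWAIT) != 0;
}
```

Because the macros expand to nothing outside sparse, the generated code is unchanged, which matches the commit's statement that no functionality changes.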
2017-11-10block, scsi: Make SCSI quiesce and resume work reliablyBart Van Assche6-25/+70
The contexts from which a SCSI device can be quiesced or resumed are:
* Writing into /sys/class/scsi_device/*/device/state.
* SCSI parallel (SPI) domain validation.
* The SCSI device power management methods. See also scsi_bus_pm_ops.

It is essential during suspend and resume that neither the filesystem state nor the filesystem metadata in RAM changes. This is why while the hibernation image is being written or restored that SCSI devices are quiesced. The SCSI core quiesces devices through scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE state execution of non-preempt requests is deferred. This is realized by returning BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI devices. Avoid that a full queue prevents power management requests to be submitted by deferring allocation of non-preempt requests for devices in the quiesced state. This patch has been tested by running the following commands and by verifying that after each resume the fio job was still running:

    for ((i=0; i<10; i++)); do
      (
        cd /sys/block/md0/md &&
        while true; do
          [ "$(<sync_action)" = "idle" ] && echo check > sync_action
          sleep 1
        done
      ) &
      pids=($!)
      for d in /sys/class/block/sd*[a-z]; do
        bdev=${d#/sys/class/block/}
        hcil=$(readlink "$d/device")
        hcil=${hcil#../../../}
        echo 4 > "$d/queue/nr_requests"
        echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
        fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
          --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \
          --iodepth_batch=1 --thread --loops=$((2**31)) &
        pids+=($!)
      done
      sleep 1
      echo "$(date) Hibernating ..." >>hibernate-test-log.txt
      systemctl hibernate
      sleep 10
      kill "${pids[@]}"
      echo idle > /sys/block/md0/md/sync_action
      wait
      echo "$(date) Done." >>hibernate-test-log.txt
    done

Reported-by: Oleksandr Natalenko <[email protected]> References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Martin Steigerwald <[email protected]> Tested-by: Oleksandr Natalenko <[email protected]> Cc: Martin K. Petersen <[email protected]> Cc: Ming Lei <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flagBart Van Assche3-0/+37
This flag will be used in the next patch to let the block layer core know whether or not a SCSI request queue has been quiesced. A quiesced SCSI queue namely only processes RQF_PREEMPT requests. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Martin Steigerwald <[email protected]> Tested-by: Oleksandr Natalenko <[email protected]> Cc: Ming Lei <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
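The gating idea behind a preempt-only queue can be sketched like this. The struct, the flag constant and `may_allocate()` are hypothetical names for illustration only. One deliberate simplification: the sketch returns a yes/no answer, whereas the series describes deferring allocation of non-preempt requests while the device is quiesced.

```c
#include <stdbool.h>

/* Hypothetical allocation flag; not the kernel's actual flag value. */
enum { ALLOC_SKETCH_PREEMPT = 1 << 0 };

/* Hypothetical stand-in for a request queue. */
struct preempt_queue_sketch {
    bool preempt_only;  /* set while the device is quiesced */
};

/*
 * Decide whether a request may be allocated right now. While the queue is
 * in preempt-only mode, only callers that asked for a preempt request
 * (for example power management) get through.
 */
static bool may_allocate(const struct preempt_queue_sketch *q,
                         unsigned int flags)
{
    if (!q->preempt_only)
        return true;
    return (flags & ALLOC_SKETCH_PREEMPT) != 0;
}
```

Checking at allocation time, rather than at dispatch time, is what keeps a full queue of ordinary requests from starving the power management requests.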
2017-11-10ide, scsi: Tell the block layer at request allocation time about preempt requestsBart Van Assche2-5/+5
Convert blk_get_request(q, op, __GFP_RECLAIM) into blk_get_request_flags(q, op, BLK_MQ_REQ_PREEMPT). This patch does not change any functionality. Signed-off-by: Bart Van Assche <[email protected]> Tested-by: Martin Steigerwald <[email protected]> Acked-by: David S. Miller <[email protected]> [ for IDE ] Acked-by: Martin K. Petersen <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Oleksandr Natalenko <[email protected]> Cc: Ming Lei <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10block: Introduce BLK_MQ_REQ_PREEMPTBart Van Assche3-1/+6
Set RQF_PREEMPT if BLK_MQ_REQ_PREEMPT is passed to blk_get_request_flags(). Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Martin Steigerwald <[email protected]> Tested-by: Oleksandr Natalenko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10block: Introduce blk_get_request_flags()Bart Van Assche2-15/+38
A side effect of this patch is that the GFP mask that is passed to several allocation functions in the legacy block layer is changed from GFP_KERNEL into __GFP_DIRECT_RECLAIM. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Martin Steigerwald <[email protected]> Tested-by: Oleksandr Natalenko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10block: Make q_usage_counter also track legacy requestsMing Lei2-8/+14
This patch makes it possible to pause request allocation for the legacy block layer by calling blk_mq_freeze_queue() and blk_mq_unfreeze_queue(). Signed-off-by: Ming Lei <[email protected]> [ bvanassche: Combined two patches into one, edited a comment and made sure REQ_NOWAIT is handled properly in blk_old_get_request() ] Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: Martin Steigerwald <[email protected]> Tested-by: Oleksandr Natalenko <[email protected]> Cc: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10blk-mq: fix issue with shared tag queue re-runningJens Axboe3-41/+50
This patch attempts to make the case of hctx re-running on driver tag failure more robust. Without this patch, it's pretty easy to trigger a stall condition with shared tags. An example is using null_blk like this:

    modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 submit_queues=1 hw_queue_depth=1

which sets up 4 devices, sharing the same tag set with a depth of 1. Running a fio job ala:

    [global]
    bs=4k
    rw=randread
    norandommap
    direct=1
    ioengine=libaio
    iodepth=4

    [nullb0]
    filename=/dev/nullb0
    [nullb1]
    filename=/dev/nullb1
    [nullb2]
    filename=/dev/nullb2
    [nullb3]
    filename=/dev/nullb3

will inevitably end with one or more threads being stuck waiting for a scheduler tag. That IO is then stuck forever, until someone else triggers a run of the queue. Ensure that we always re-run the hardware queue, if the driver tag we were waiting for got freed before we added our leftover request entries back on the dispatch list. Reviewed-by: Bart Van Assche <[email protected]> Tested-by: Bart Van Assche <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvmet: kill nvmet_inline_bio_initChristoph Hellwig1-14/+4
Much easier to just opencode this helper. Also use ARRAY_SIZE instead of passing the inline bvec array size manually. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvmet: better data length validationChristoph Hellwig5-25/+34
Currently the NVMe target stores the expected data length in req->data_len and uses that for data transfer decisions, but that does not take the actual transfer length in the SGLs into account. So this adds a new transfer_len field, into which the transport drivers store the actual transfer length. We then check that the two match before actually executing the command. The FC transport driver already had such a field, which is removed in favour of the common one. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
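A minimal sketch of the check described above, assuming a simplified request with only the two length fields. The struct and both helpers are hypothetical; only the field names data_len and transfer_len come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a target request. */
struct len_req_sketch {
    size_t data_len;      /* length the command is expected to transfer */
    size_t transfer_len;  /* length the transport found in the SGLs */
};

/* A transport would sum up what the SGL segments actually describe. */
static size_t sgl_total_len(const size_t *seg_len, size_t nsegs)
{
    size_t total = 0;

    for (size_t i = 0; i < nsegs; i++)
        total += seg_len[i];
    return total;
}

/* Only execute the command when both views of the length agree. */
static bool req_lengths_match(const struct len_req_sketch *req)
{
    return req->data_len == req->transfer_len;
}
```

Keeping the comparison in one common place is what lets each transport drop its private copy of the field, as the FC transport does in this commit.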
2017-11-10nvme-pci: avoid dereference of symbol from unloaded moduleMing Lei1-1/+2
'remove_work' may be scheduled to run after nvme_remove() returns, since we can't simply cancel it in nvme_remove() without risking a deadlock. Once nvme_remove() returns, this module (nvme) can be unloaded. On the other hand, nvme_put_ctrl() calls ctrl->ops->free_ctrl, which may point to nvme_pci_free_ctrl() in the unloaded module. This patch avoids the issue by queuing 'remove_work' via 'nvme_wq' and flushing this workqueue in nvme_exit(), as suggested by Sagi. Suggested-by: Sagi Grimberg <[email protected]> Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: send uevent for some asynchronous eventsKeith Busch3-0/+33
This will give udev a chance to observe and handle asynchronous event notifications and clear the log to unmask future events of the same type. The driver will create a change uevent of the asynchronous event result before submitting the next AEN request to the device if a completed AEN event is of type error, smart, command set or vendor specific. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Guan Junxiong <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: unexport starting async event workKeith Busch2-8/+1
Async event work is for core use only and should not be called directly from drivers. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Guan Junxiong <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: remove handling of multiple AEN requestsKeith Busch6-40/+11
The driver can handle tracking only one AEN request, so this patch removes handling for multiple ones. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: James Smart <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme-fc: remove unused "queue_size" fieldKeith Busch1-6/+3
This was being saved in a structure, but never used anywhere. The queue size is obtained through other means, so there's no reason to duplicate this without a user for it. Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Guan Junxiong <[email protected]> Reviewed-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: centralize AEN definesKeith Busch7-58/+30
All the transports were unnecessarily duplicating the AEN request accounting. This patch defines everything in one place. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Guan Junxiong <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvmet: remove redundant local variableSagi Grimberg1-9/+4
The status is either success or some status id, and we don't need a local variable for it. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvmet: remove redundant memset if failed to get_smart_log failedSagi Grimberg1-3/+1
We already allocated the buffer with kzalloc. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10blk-mq: Avoid that request queue removal can trigger list corruptionBart Van Assche1-0/+1
Avoid that removal of a request queue sporadically triggers the following warning:

    list_del corruption. next->prev should be ffff8807d649b970, but was 6b6b6b6b6b6b6b6b
    WARNING: CPU: 3 PID: 342 at lib/list_debug.c:56 __list_del_entry_valid+0x92/0xa0
    Call Trace:
     process_one_work+0x11b/0x660
     worker_thread+0x3d/0x3b0
     kthread+0x129/0x140
     ret_from_fork+0x27/0x40

Signed-off-by: Bart Van Assche <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: fix eui_show() print formatJavier González1-1/+1
Fix print formatting, but keep the original output to prevent user breakage as suggested by Joe Perches. Signed-off-by: Javier González <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: compare NQN string with right sizeJavier González1-1/+1
Copy subnqns using NVMF_NQN_SIZE as it is < 256 Signed-off-by: Javier González <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10blk-mq: put driver tag if dispatch budget can't be gotMing Lei1-1/+3
We have to put the driver tag if dispatch budget can't be got, otherwise it might cause IO deadlock, especially in case that size of tags is very small. Fixes: de1482974080(blk-mq: introduce .get_budget and .put_budget in blk_mq_ops) Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
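The rule stated above is an instance of a general pattern: when two resources must both be held and the second cannot be acquired, release the first before backing off. A minimal sketch, with hypothetical counting pools standing in for driver tags and dispatch budget (none of these names are blk-mq functions):

```c
#include <stdbool.h>

/* Hypothetical counting pool; stands in for a tag set or a budget. */
struct pool_sketch {
    int free;
};

static bool pool_get(struct pool_sketch *p)
{
    if (p->free == 0)
        return false;
    p->free--;
    return true;
}

static void pool_put(struct pool_sketch *p)
{
    p->free++;
}

/*
 * Acquire a tag, then budget. If the budget is unavailable, the tag is
 * returned so that other submitters are not starved of it.
 */
static bool try_dispatch(struct pool_sketch *tags, struct pool_sketch *budget)
{
    if (!pool_get(tags))
        return false;
    if (!pool_get(budget)) {
        pool_put(tags);  /* the step whose absence can deadlock small tag sets */
        return false;
    }
    return true;
}
```

With a tag pool of size one, leaking the tag on the failure path would block every later dispatch, which is the small-tag-set case the commit calls out.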
2017-11-10block: pass full fmode_t to blk_verify_commandChristoph Hellwig4-16/+14
Use the obvious calling convention. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10block: remove __bio_kmap_atomicChristoph Hellwig4-21/+8
This helper doesn't buy us much over calling kmap_atomic directly. In fact in the only caller it does a bit of useless work as the caller already has the bvec at hand, and said caller would even be buggy for a multi-segment bio due to the use of this helper. So just remove it. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10block: kill bio_kmap/kunmap_irq()Jens Axboe2-13/+2
There are no users of it anymore. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10Revert "blk-mq: don't handle TAG_SHARED in restart"Jens Axboe1-4/+74
This reverts commit 358a3a6bccb74da9d63a26b2dd5f09f1e9970e0b. We have cases that aren't covered 100% in the drivers, so for now we have to retain the shared tag restart loops. Signed-off-by: Jens Axboe <[email protected]>
2017-11-10kthread: zero the kthread data structureShaohua Li1-5/+1
kthread() could bail out early before we initialize blkcg_css (if the kthread is killed very early; please see the xchg() statement in kthread()), which confuses free_kthread_struct. Instead of moving the blkcg_css initialization earlier, we simply zero the whole 'self' data structure, which doesn't sound like much overhead. Reported-by: syzbot <[email protected]> Fixes: 05e3db95ebfc ("kthread: add a mechanism to store cgroup info") Cc: Andrew Morton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Dmitry Vyukov <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
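Why zeroing the whole structure helps can be sketched in userspace C, with calloc() standing in for a zeroing kernel allocation. The struct layout and both functions are hypothetical; only the field name blkcg_css is taken from the commit message.

```c
#include <stdlib.h>

/* Hypothetical, reduced version of the per-thread bookkeeping structure. */
struct kthread_sketch {
    unsigned long flags;
    void *data;
    void *blkcg_css;  /* only assigned later, after a successful start */
};

/* Zeroed allocation: every field, including blkcg_css, starts out as 0/NULL. */
static struct kthread_sketch *kthread_sketch_alloc(void)
{
    return calloc(1, sizeof(struct kthread_sketch));
}

/*
 * The free path inspects blkcg_css. With a zeroed structure an early
 * bail-out leaves it NULL, so the free path never acts on garbage.
 * Returns 1 if a cgroup reference would have been dropped, 0 otherwise.
 */
static int kthread_sketch_free(struct kthread_sketch *k)
{
    int dropped = 0;

    if (!k)
        return 0;
    if (k->blkcg_css)
        dropped = 1;  /* the kernel would release the reference here */
    free(k);
    return dropped;
}
```

The alternative of initializing each field before any early exit is more fragile, since every new field would need the same care.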
2017-11-10nvmet: fix comment typos in admin-cmd.cMinwoo Im1-2/+2
small typos fixed in admin-cmd.c Signed-off-by: Minwoo Im <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme-rdma: fix nvme_rdma_create_queue_ib error flowMax Gurtovoy1-1/+1
QP object is created using rdma_cm api, therefore the destruction should use the same api for symmetry. Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvmet-rdma: update queue list during ib_device removalIsrael Rukshin1-2/+4
A NULL deref happens when nvmet_rdma_remove_one() is called more than once (e.g. while connected via 2 ports). The first call frees the queues related to the first ib_device but doesn't remove them from the queue list. While calling nvmet_rdma_remove_one() for the second ib_device it goes over the full queue list again and we get the NULL deref. Fixes: f1d4ef7d ("nvmet-rdma: register ib_client to not deadlock in device removal") Signed-off-by: Israel Rukshin <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
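The fix described above amounts to unlinking each queue from the shared list at the moment it is torn down for a device. A sketch of that idea with a hypothetical singly linked list (the real driver uses the kernel's list primitives, and these names are illustrative only):

```c
#include <stddef.h>

/* Hypothetical queue entry tied to one device. */
struct rdma_queue_sketch {
    int dev_id;
    struct rdma_queue_sketch *next;
};

/*
 * Unlink every queue that belongs to dev_id and return how many were
 * unlinked. A later call for another device then never sees entries
 * that were already torn down.
 */
static int remove_queues_for_dev(struct rdma_queue_sketch **list, int dev_id)
{
    struct rdma_queue_sketch **link = list;
    int removed = 0;

    while (*link) {
        struct rdma_queue_sketch *q = *link;

        if (q->dev_id == dev_id) {
            *link = q->next;  /* unlink before the queue is released */
            q->next = NULL;
            removed++;
        } else {
            link = &q->next;
        }
    }
    return removed;
}
```

Calling the function twice for the same device is harmless in this sketch: the second call finds nothing left to remove.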
2017-11-10lpfc: tie in to new dev_loss_tmo interface in nvme transportJames Smart1-0/+5
This patch calls the new nvme transport routine for dev_loss_tmo whenever the SCSI fc transport calls the lldd to make a dynamic change to a remote port's dev_loss_tmo. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme-fc: decouple ns references from lldd referencesJames Smart1-6/+78
In the lldd api, a lldd may unregister a remoteport (loss of connectivity or driver unload) or localport (driver unload). The lldd must wait for the remoteport_delete or localport_delete before completing its actions post the unregister. The xxx_deletes currently occur only when the xxxport structure is fully freed after all references are removed. Thus the lldd may be held hostage until an app or in-kernel entity that has a namespace open finally closes so the namespace can be removed, the controller removed, thus the transport objects, thus the lldd. This patch decouples the transport and os-facing objects from the lldd and the remoteport and localport. There is a point in all deletions where the transport will no longer interact with the lldd on behalf of a controller. That point centers around the association established with the target/subsystem. It will access the lldd whenever it attempts to create an association and while the association is active. New associations may only be created if the remoteport is live (thus the localport is live). It will not access the lldd after deleting the association. Therefore, the patch tracks the count of active controllers - those with associations being created or that are active - on a remoteport. It also tracks the number of remoteports that have active controllers, on a localport. When a remoteport is unregistered, as soon as there are no active controllers, the lldd's remoteport_delete may be called and the lldd may continue. Similarly, when a localport is unregistered, as soon as there are no remoteports with active controllers, the localport_delete callback may be made. This significantly speeds up unregistration with the lldd. The transport objects continue in suspended status with reconnect timers running, and upon expiration, normal ref-counting will occur and the objects will be freed. The transport object may still be held hostage by the application/kernel module, but that is acceptable.
With this change, the lldd may be fully unloaded and reloaded, and if registrations occur prior to the timeouts, the nvme controller and namespaces will resume normally as if a link bounce. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
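The counting scheme described in this commit can be sketched for the remoteport side as follows. The struct and functions are hypothetical; a boolean stands in for invoking the lldd's remoteport_delete callback, and locking is omitted.

```c
#include <stdbool.h>

/* Hypothetical remoteport state; field names are illustrative only. */
struct rport_sketch {
    int act_ctrl_cnt;    /* controllers creating or holding an association */
    bool unregistered;   /* the lldd asked to unregister this port */
    bool delete_called;  /* stands in for calling remoteport_delete */
};

/* Signal the lldd once: port unregistered and no active controllers left. */
static void rport_maybe_delete(struct rport_sketch *r)
{
    if (r->unregistered && r->act_ctrl_cnt == 0 && !r->delete_called)
        r->delete_called = true;
}

/* A controller starts creating an association on this port. */
static void rport_ctrl_activate(struct rport_sketch *r)
{
    r->act_ctrl_cnt++;
}

/* A controller has deleted its association; it no longer touches the lldd. */
static void rport_ctrl_deactivate(struct rport_sketch *r)
{
    r->act_ctrl_cnt--;
    rport_maybe_delete(r);
}

/* The lldd unregisters the port; deletion may be signalled immediately. */
static void rport_unregister(struct rport_sketch *r)
{
    r->unregistered = true;
    rport_maybe_delete(r);
}
```

The point of the design is that the callback depends on the active-controller count rather than on the full object reference count, so an application holding a namespace open no longer delays the lldd.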
2017-11-10nvme-fc: fix localport resume using stale valuesJames Smart1-2/+10
The localport resume was not updating the lldd ops structure. If the lldd is unloaded and reloaded, the ops pointers will differ. Additionally, as there are device references taken by the localport, ensure that resume only resumes if the device matches as well. Signed-off-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: check admin passthru command effectsKeith Busch3-0/+126
The NVMe standard provides a command effects log page so the host may be aware of special requirements it may need to do for a particular command. For example, the command may need to run with IO quiesced to prevent timeouts or undefined behavior, or it may change the logical block formats that determine how the host needs to construct future commands. This patch saves the nvme command effects log page if the controller supports it, and performs appropriate actions before and after an admin passthrough command is completed. If the controller does not support the command effects log page, the driver will define the effects for known opcodes. The nvme format and sanitize are the only commands in this patch with known effects. Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
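The lookup-with-fallback logic described above can be sketched as below. The effect bits, the struct and the function name are hypothetical simplifications and do not reproduce the log page layout from the specification; the two opcode values are the Format NVM (80h) and Sanitize (84h) admin opcodes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical effect bits; the real log page uses a different layout. */
enum {
    EFFECT_SKETCH_SUPPORTED     = 1 << 0,
    EFFECT_SKETCH_LBA_CHANGE    = 1 << 1,  /* logical block format may change */
    EFFECT_SKETCH_NEEDS_QUIESCE = 1 << 2   /* run with I/O quiesced */
};

#define OPC_SKETCH_FORMAT   0x80  /* Format NVM admin opcode */
#define OPC_SKETCH_SANITIZE 0x84  /* Sanitize admin opcode */

struct effects_ctrl_sketch {
    const uint32_t *effects_log;  /* 256 entries indexed by opcode, or NULL */
};

/* Prefer the controller's log; otherwise fall back to built-in knowledge. */
static uint32_t admin_cmd_effects(const struct effects_ctrl_sketch *ctrl,
                                  uint8_t opcode)
{
    if (ctrl->effects_log)
        return ctrl->effects_log[opcode];

    switch (opcode) {
    case OPC_SKETCH_FORMAT:
        return EFFECT_SKETCH_SUPPORTED | EFFECT_SKETCH_LBA_CHANGE |
               EFFECT_SKETCH_NEEDS_QUIESCE;
    case OPC_SKETCH_SANITIZE:
        return EFFECT_SKETCH_NEEDS_QUIESCE;
    default:
        return 0;
    }
}
```

A caller would inspect the returned bits before submitting a passthrough command, for example quiescing I/O first, and again afterwards to decide whether namespaces need to be revalidated.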
2017-11-10nvme: factor get log into a helperKeith Busch1-6/+13
And fix the warning on a successful firmware log. Reviewed-by: Javier González <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: fix and clarify the check for missing metadataChristoph Hellwig1-13/+18
Update the check in nvme_setup_rw for missing metadata so that it is together with the other metadata handling, does not contain impossible to reach conditions and warns if we get an impossible request for a (non-PI) metadata-enabled namespace when CONFIG_BLK_DEV_INTEGRITY is not set. Also add a little helper that checks if a given metadata configuration contains protection information. Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: Javier González <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: split __nvme_revalidate_diskChristoph Hellwig1-23/+26
Split out the code that applies the calculated values to a given disk/queue into a new helper that can be reused by the multipath code. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: set the chunk size before freezing the queueChristoph Hellwig1-2/+3
We don't need a frozen queue to update the chunk_size, which just is a hint, and moving it a little earlier will allow for some better code reuse with the multipath code. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: don't pass struct nvme_ns to nvme_config_discardChristoph Hellwig1-16/+17
To allow reusing this function for the multipath node. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: don't pass struct nvme_ns to nvme_init_integrityChristoph Hellwig1-7/+7
To allow reusing this function for the multipath node. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: always unregister the integrity profile in __nvme_revalidate_diskChristoph Hellwig1-30/+10
This is safe because the queue is always frozen when we revalidate, and it simplifies both the existing code as well as the multipath implementation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10nvme: move the dying queue check from cancel to completionChristoph Hellwig1-6/+3
With multipath we don't want a hard DNR bit on a request that is cancelled by a controller reset, but instead want to be able to retry it on another path. To achieve this don't always set the DNR bit when the queue is dying in nvme_cancel_request, but defer that decision to nvme_req_needs_retry. Note that it applies to any command there and not just cancelled commands, but once the queue is dying that is the right thing to do anyway. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10blktrace: fix unlocked registration of tracepointsJens Axboe1-10/+22
We need to ensure that tracepoints are registered and unregistered with the users of them. The existing atomic count isn't enough for that. Add a lock around the tracepoints, so we serialize access to them. This fixes cases where we have multiple users setting up and tearing down tracepoints, like this: CPU: 0 PID: 2995 Comm: syzkaller857118 Not tainted 4.14.0-rc5-next-20171018+ #36 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1c4/0x1e0 kernel/panic.c:546 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177 do_trap_no_signal arch/x86/kernel/traps.c:211 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:260 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 RIP: 0010:tracepoint_add_func kernel/tracepoint.c:210 [inline] RIP: 0010:tracepoint_probe_register_prio+0x397/0x9a0 kernel/tracepoint.c:283 RSP: 0018:ffff8801d1d1f6c0 EFLAGS: 00010293 RAX: ffff8801d22e8540 RBX: 00000000ffffffef RCX: ffffffff81710f07 RDX: 0000000000000000 RSI: ffffffff85b679c0 RDI: ffff8801d5f19818 RBP: ffff8801d1d1f7c8 R08: ffffffff81710c10 R09: 0000000000000004 R10: ffff8801d1d1f6b0 R11: 0000000000000003 R12: ffffffff817597f0 R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8801d1d1f7a0 tracepoint_probe_register+0x2a/0x40 kernel/tracepoint.c:304 register_trace_block_rq_insert include/trace/events/block.h:191 [inline] blk_register_tracepoints+0x1e/0x2f0 kernel/trace/blktrace.c:1043 do_blk_trace_setup+0xa10/0xcf0 kernel/trace/blktrace.c:542 blk_trace_setup+0xbd/0x180 kernel/trace/blktrace.c:564 sg_ioctl+0xc71/0x2d90 drivers/scsi/sg.c:1089 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 
fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x444339 RSP: 002b:00007ffe05bb5b18 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000006d66c0 RCX: 0000000000444339 RDX: 000000002084cf90 RSI: 00000000c0481273 RDI: 0000000000000009 RBP: 0000000000000082 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff R13: 00000000c0481273 R14: 0000000000000000 R15: 0000000000000000 since we can now run these in parallel. Ensure that the exported helpers for doing this are grabbing the queue trace mutex. Reported-by: Steven Rostedt <[email protected]> Tested-by: Dmitry Vyukov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10blktrace: fix unlocked access to init/start-stop/teardownJens Axboe1-10/+48
sg.c calls into the blktrace functions without holding the proper queue mutex for doing setup, start/stop, or teardown. Add internal unlocked variants, and export the ones that do the proper locking. Fixes: 6da127ad0918 ("blktrace: Add blktrace ioctls to SCSI generic devices") Tested-by: Dmitry Vyukov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-10Merge tag 'ceph-for-4.14-rc9' of git://github.com/ceph/ceph-clientLinus Torvalds1-2/+2
Pull ceph fix from Ilya Dryomov: "Memory allocation flags fix, marked for stable" * tag 'ceph-for-4.14-rc9' of git://github.com/ceph/ceph-client: rbd: use GFP_NOIO for parent stat and data requests
2017-11-10Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds3-2/+4
Pull input layer updates from Dmitry Torokhov: - a new ACPI ID for Elan touchpad found in yet another Ideapad model - Synaptics RMI4 will allow binding to controllers reporting SMB version 3 (note that we are not adding any new ACPI IDs to the Synaptics PS/2 driver so unless user explicitly enables intertouch support there is no user-visible change) - a fixup to TSC 2004/5 touchscreen driver to mark input devices as "direct" to help userspace identify the type of device they are dealing with * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics-rmi4 - RMI4 can also use SMBUS version 3 Input: tsc200x-core - set INPUT_PROP_DIRECT Input: elan_i2c - add ELAN060C to the ACPI table
2017-11-10Merge remote-tracking branches 'spi/topic/sh-msiof', 'spi/topic/slave', 'spi/topic/spreadtrum' and 'spi/topic/tegra114' into spi-nextMark Brown7-15/+495