aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-05-06lightnvm: pblk: kick writer on write recovery pathIgor Konopko1-0/+1
In case of write recovery path, there is a chance that writer thread is not active, kick immediately instead of waiting for timer. Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Javier González <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: fix lock order in pblk_rb_tear_down_checkIgor Konopko1-1/+1
In pblk_rb_tear_down_check() the spinlock functions are not called in proper order. Fixes: a4bd217 ("lightnvm: physical block device (pblk) target") Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Javier González <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: prevent race condition on pblk removeMarcin Dziegielewski1-0/+2
When we trigger nvm target remove during device hot unplug, there is a probability to hit a general protection fault. This is caused by use of nvm_dev thay may be freed from another (hot unplug) thread (in the nvm_unregister function). Introduce lock in nvme_ioctl_dev_remove function to prevent this situation. Signed-off-by: Marcin Dziegielewski <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: set propper line as data_line after gcMarcin Dziegielewski1-0/+1
In current implementation of l2p recovery, when we are after gc and we have open line, we are not setting current data line properly (we set last line from the device instead of last line ordered by seq_nr) and in consequence, kernel panic and data corruption. Signed-off-by: Marcin Dziegielewski <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: fix bio leak when bio is splitChansol Kim1-28/+19
For large size io where blk_queue_split needs to be called inside pblk_rw_io, results in bio leak as bio_endio is not called on the newly allocated. One way to observe this is to mounting ext4 filesystem on the target and issuing 1MB io with dd, e.g., dd bs=1MB if=/dev/null of=/mount/myvolume. kmemleak reports: unreferenced object 0xffff88803d7d0100 (size 256): comm "kworker/u16:1", pid 68, jiffies 4294899333 (age 284.120s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 60 e8 31 81 88 ff ff .........`.1.... 01 40 00 00 06 06 00 00 00 00 00 00 05 00 00 00 .@.............. backtrace: [<000000001f5aa04f>] kmem_cache_alloc+0x204/0x3c0 [<0000000040945aab>] mempool_alloc_slab+0x1d/0x30 [<00000000b4959ab4>] mempool_alloc+0x83/0x220 [<00000000646bad9b>] bio_alloc_bioset+0x229/0x320 [<000000009264b251>] bio_clone_fast+0x26/0xc0 [<0000000008250252>] bio_split+0x41/0x110 [<00000000e365cad0>] blk_queue_split+0x349/0x930 [<00000000eb5426bc>] pblk_make_rq+0x1b5/0x1f0 [<00000000eea09cec>] generic_make_request+0x2f9/0x690 [<00000000ae6acede>] submit_bio+0x12e/0x1f0 [<00000000f9b8b82a>] ext4_io_submit+0x64/0x80 [<000000009e4f817d>] ext4_bio_write_page+0x32e/0x890 [<00000000cbd0d106>] mpage_submit_page+0x65/0xc0 [<000000000eec7359>] mpage_map_and_submit_buffers+0x171/0x330 [<000000009a7afcb6>] ext4_writepages+0xd5e/0x1650 [<000000004476b096>] do_writepages+0x39/0xc0 In case there is a need for a split, blk_queue_split returns the newly allocated bio to the caller by changing the value of pointer passed as a reference, while the original is passed to generic_make_requests. Although pblk_rw_io's local variable bio* has changed and passed to pblk_submit_read and pblk_write_to_cache, work is done on this new bio*, and pblk_rw_io returns NVM_IO_DONE, pblk_make_rq calls bio_endio on the old bio* because it passed bio pointer by value to pblk_rw_io. pblk_rw_io is unfolded into pblk_make_rq so that there is no copying of bio* and bio_endio is called on the correct bio*. Signed-off-by: Chansol Kim <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: Inherit mdts from the parent nvme deviceIgor Konopko3-2/+9
Current lightnvm and pblk implementation does not care about NVMe max data transfer size, which can be smaller than 64*K=256K. There are existing NVMe controllers which NVMe max data transfer size is lower that 256K (for example 128K, which happens for existing NVMe controllers which are NVMe spec compliant). Such a controllers are not able to handle command which contains 64 PPAs, since the the size of DMAed buffer will be above the capabilities of such a controller. Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: set proper read status in bioIgor Konopko1-6/+5
Currently in case of read errors, bi_status is not set properly which leads to returning inproper data to layers above. This patch fix that by setting proper status in case of read errors. Also remove unnecessary warn_once(), which does not make sense in that place, since user bio is not used for interation with drive and thus bi_status will not be set here. Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Javier González <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: cleanly fail when there is not enough memoryIgor Konopko1-2/+7
L2P table can be huge in many cases, since it typically requires 1GB of DRAM for 1TB of drive. When there is not enough memory available, OOM killer turns on and kills random processes, which can be very annoying for users. This patch changes the flag for L2P table allocation on order to handle this situation in more user friendly way. GFP_KERNEL and __GPF_HIGHMEM are default flags used in parameterless vmalloc() calls, so they are also keeped in that patch. Additionally __GFP_NOWARN flag is added in order to hide very long dmesg warn in case of the allocation failures. The most important flag introduced in that patch is __GFP_RETRY_MAYFAIL, which would cause allocator to try use free memory and if not available to drop caches, but not to run OOM killer. Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: ensure that erase is chunk alignedIgor Konopko1-0/+1
The sector bits in the erase command may be uninitialized are uninitialized, causing the erase LBA to be unaligned to the chunk size. This is unexpected situation, since erase shall always be chunk aligned based on OCSSD the 2.0 specification. Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Javier González <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: fix race during put lineIgor Konopko1-6/+10
In the pblk_put_line_back function, a race condition with __pblk_map_invalidate can make a line not part of any lists. Fix gc_list by resetting it to null fixes the above issue. Fixes: a4bd217 ("lightnvm: physical block device (pblk) target") Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Javier González <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: gracefully handle GC vmalloc failIgor Konopko1-10/+9
Currently when we fail on rq data allocation in gc, it skips moving active data and moves line straigt to its free state. Losing user data in the process. Move the data allocation to an earlier phase of GC, where we can still fail gracefully by moving line back to the closed state. Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Javier González <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: remove unused smeta_ssec fieldIgor Konopko2-2/+0
smeta_ssec field in pblk_line is never used after it was replaced by the function pblk_line_smeta_start(). Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: reduce L2P memory footprintIgor Konopko5-11/+9
Currently L2P map size is calculated based on the total number of available sectors, which is redundant, since it contains mapping for overprovisioning as well (11% by default). Change this size to the real capacity and thus reduce the memory footprint significantly - with default op value it is approx. 110MB of DRAM less for every 1TB of media. Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: rollback on error during gc readIgor Konopko1-1/+6
A line is left unsigned to the blocks lists in case pblk_gc_line returns an error. This moves the line back to be appropriate list, which can then be picked up by the garbage collector. Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-06lightnvm: pblk: line reference fix in GCIgor Konopko1-1/+4
Fixes the GC error case when moving a line back to closed state while releasing additional references. Signed-off-by: Igor Konopko <[email protected]> Reviewed-by: Hans Holmberg <[email protected]> Reviewed-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-04block: don't drain in-progress dispatch in blk_cleanup_queue()Ming Lei1-12/+0
Now freeing hw queue resource is moved to hctx's release handler, we don't need to worry about the race between blk_cleanup_queue and run queue any more. So don't drain in-progress dispatch in blk_cleanup_queue(). This is basically revert of c2856ae2f315 ("blk-mq: quiesce queue before freeing queue"). Cc: Dongli Zhang <[email protected]> Cc: James Smart <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: [email protected], Cc: Martin K . Petersen <[email protected]>, Cc: Christoph Hellwig <[email protected]>, Cc: James E . J . Bottomley <[email protected]>, Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: James Smart <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-04blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_releaseMing Lei2-8/+2
hctx is always released after requeue is freed. With holding queue's kobject refcount, it is safe for driver to run queue, so one run queue might be scheduled after blk_sync_queue() is done. So moving the cancel of hctx->run_work into blk_mq_hw_sysfs_release() for avoiding run released queue. Cc: Dongli Zhang <[email protected]> Cc: James Smart <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: [email protected], Cc: Martin K . Petersen <[email protected]>, Cc: Christoph Hellwig <[email protected]>, Cc: James E . J . Bottomley <[email protected]>, Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: James Smart <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-04blk-mq: always free hctx after request queue is freedMing Lei3-13/+42
In normal queue cleanup path, hctx is released after request queue is freed, see blk_mq_release(). However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because of hw queues shrinking. This way is easy to cause use-after-free, because: one implicit rule is that it is safe to call almost all block layer APIs if the request queue is alive; and one hctx may be retrieved by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues(); finally use-after-free is triggered. Fixes this issue by always freeing hctx after releasing request queue. If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce a per-queue list to hold them, then try to resuse these hctxs if numa node is matched. Cc: Dongli Zhang <[email protected]> Cc: James Smart <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: [email protected], Cc: Martin K . Petersen <[email protected]>, Cc: Christoph Hellwig <[email protected]>, Cc: James E . J . Bottomley <[email protected]>, Reviewed-by: Hannes Reinecke <[email protected]> Tested-by: James Smart <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-04blk-mq: split blk_mq_alloc_and_init_hctx into two partsMing Lei1-64/+75
Split blk_mq_alloc_and_init_hctx into two parts, and one is blk_mq_alloc_hctx() for allocating all hctx resources, another is blk_mq_init_hctx() for initializing hctx, which serves as counter-part of blk_mq_exit_hctx(). Cc: Dongli Zhang <[email protected]> Cc: James Smart <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: [email protected] Cc: Martin K . Petersen <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: James E . J . Bottomley <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: James Smart <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-04blk-mq: free hw queue's resource in hctx's release handlerMing Lei4-8/+10
Once blk_cleanup_queue() returns, tags shouldn't be used any more, because blk_mq_free_tag_set() may be called. Commit 45a9c9d909b2 ("blk-mq: Fix a use-after-free") fixes this issue exactly. However, that commit introduces another issue. Before 45a9c9d909b2, we are allowed to run queue during cleaning up queue if the queue's kobj refcount is held. After that commit, queue can't be run during queue cleaning up, otherwise oops can be triggered easily because some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue(). We have invented ways for addressing this kind of issue before, such as: 8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done") c2856ae2f315 ("blk-mq: quiesce queue before freeing queue") But still can't cover all cases, recently James reports another such kind of issue: https://marc.info/?l=linux-scsi&m=155389088124782&w=2 This issue can be quite hard to address by previous way, given scsi_run_queue() may run requeues for other LUNs. Fixes the above issue by freeing hctx's resources in its release handler, and this way is safe becasue tags isn't needed for freeing such hctx resource. This approach follows typical design pattern wrt. kobject's release handler. Cc: Dongli Zhang <[email protected]> Cc: James Smart <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: [email protected], Cc: Martin K . Petersen <[email protected]>, Cc: Christoph Hellwig <[email protected]>, Cc: James E . J . Bottomley <[email protected]>, Reported-by: James Smart <[email protected]> Fixes: 45a9c9d909b2 ("blk-mq: Fix a use-after-free") Cc: [email protected] Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: James Smart <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-04blk-mq: move cancel of requeue_work into blk_mq_releaseMing Lei2-1/+2
With holding queue's kobject refcount, it is safe for driver to schedule requeue. However, blk_mq_kick_requeue_list() may be called after blk_sync_queue() is done because of concurrent requeue activities, then requeue work may not be completed when freeing queue, and kernel oops is triggered. So moving the cancel of requeue_work into blk_mq_release() for avoiding race between requeue and freeing queue. Cc: Dongli Zhang <[email protected]> Cc: James Smart <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: [email protected], Cc: Martin K . Petersen <[email protected]>, Cc: Christoph Hellwig <[email protected]>, Cc: James E . J . Bottomley <[email protected]>, Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: James Smart <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-04blk-mq: grab .q_usage_counter when queuing request from plug code pathMing Lei1-1/+11
Just like aio/io_uring, we need to grab 2 refcount for queuing one request, one is for submission, another is for completion. If the request isn't queued from plug code path, the refcount grabbed in generic_make_request() serves for submission. In theroy, this refcount should have been released after the sumission(async run queue) is done. blk_freeze_queue() works with blk_sync_queue() together for avoiding race between cleanup queue and IO submission, given async run queue activities are canceled because hctx->run_work is scheduled with the refcount held, so it is fine to not hold the refcount when running the run queue work function for dispatch IO. However, if request is staggered into plug list, and finally queued from plug code path, the refcount in submission side is actually missed. And we may start to run queue after queue is removed because the queue's kobject refcount isn't guaranteed to be grabbed in flushing plug list context, then kernel oops is triggered, see the following race: blk_mq_flush_plug_list(): blk_mq_sched_insert_requests() insert requests to sw queue or scheduler queue blk_mq_run_hw_queue Because of concurrent run queue, all requests inserted above may be completed before calling the above blk_mq_run_hw_queue. Then queue can be freed during the above blk_mq_run_hw_queue(). Fixes the issue by grab .q_usage_counter before calling blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is safe because the queue is absolutely alive before inserting request. Cc: Dongli Zhang <[email protected]> Cc: James Smart <[email protected]> Cc: [email protected], Cc: Martin K . Petersen <[email protected]>, Cc: Christoph Hellwig <[email protected]>, Cc: James E . J . Bottomley <[email protected]>, Reviewed-by: Bart Van Assche <[email protected]> Tested-by: James Smart <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-02Merge branch 'nvme-5.2' of git://git.infradead.org/nvme into for-5.2/blockJens Axboe7-31/+59
Pull NVMe updates from Christoph. * 'nvme-5.2' of git://git.infradead.org/nvme: nvmet: protect discovery change log event list iteration nvme: mark nvme_core_init and nvme_core_exit static nvme: move command size checks to the core nvme-fabrics: check more command sizes nvme-pci: check more command sizes nvme-pci: remove an unneeded variable initialization nvme-pci: unquiesce admin queue on shutdown nvme-pci: shutdown on timeout during deletion nvme-pci: fix psdt field for single segment sgls nvme-multipath: don't print ANA group state by default nvme-multipath: split bios with the ns_head bio_set before submitting nvme-tcp: fix possible null deref on a timed out io queue connect
2019-05-02block: fix function name in commentRaul E Rangel1-1/+1
The comment was out of date. Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Raul E Rangel <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-05-01nvmet: protect discovery change log event list iterationSagi Grimberg1-0/+5
When we iterate on the discovery subsystem controllers we need to protect against concurrent mutations to it. Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Minwoo Im <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01nvme: mark nvme_core_init and nvme_core_exit staticChristoph Hellwig2-5/+2
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]>
2019-05-01nvme: move command size checks to the coreChristoph Hellwig2-28/+30
Most command aren't PCIe specific, so move the size checking for them to core.c Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]>
2019-05-01nvme-fabrics: check more command sizesMinwoo Im1-0/+1
struct common_command provides a common structure for NVMe-oF command format. It also needs to be checked for unintended size growth. Signed-off-by: Minwoo Im <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01nvme-pci: check more command sizesMinwoo Im1-0/+7
All the NVMe command has 64bytes fixed size so that it has been assured with BUILD_BUG_ON(). The remaining command structures in linux/nvme.h also need to be checked here. Signed-off-by: Minwoo Im <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01nvme-pci: remove an unneeded variable initializationMinwoo Im1-1/+1
Variable "n" will be assigned once kstrtoint() succeeds, otherwise it will not be referred because kstrtoint() will return an error which means go out from this function. Signed-off-by: Minwoo Im <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01nvme-pci: unquiesce admin queue on shutdownKeith Busch1-1/+4
Just like IO queues, the admin queue also will not be restarted after a controller shutdown. Unquiesce this queue so that we do not block request dispatch on a permanently disabled controller. Reported-by: Yufen Yu <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01nvme-pci: shutdown on timeout during deletionKeith Busch1-1/+4
We do not restart a controller in a deleting state for timeout errors. When in this state, unblock potential request dispatchers with failed completions by shutting down the controller on timeout detection. Reported-by: Yufen Yu <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01nvme-pci: fix psdt field for single segment sglsKlaus Birkelund Jensen1-0/+1
The shortcut for single segment SGL requests did not set the PSDT field to mark the request as using SGLs. Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs") Signed-off-by: Klaus Birkelund Jensen <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01nvme-multipath: don't print ANA group state by defaultHannes Reinecke1-1/+1
Signed-off-by: Hannes Reinecke <[email protected]> Signed-off-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01nvme-multipath: split bios with the ns_head bio_set before submittingHannes Reinecke1-0/+8
If the bio is moved to a different queue via blk_steal_bios() and the original queue is destroyed in nvme_remove_ns() we'll be ending with a crash in bio_endio() as the mempool for the split bio bvecs had already been destroyed. So split the bio using the original queue (which will remain during the lifetime of the bio) before sending it down to the underlying device. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01nvme-tcp: fix possible null deref on a timed out io queue connectSagi Grimberg1-1/+2
If I/O queue connect times out, we might have freed the queue socket already, so check for that on the error path in nvme_tcp_start_queue. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-05-01bcache: make is_discard_enabled() staticJens Axboe1-1/+1
It's not used outside this file. Fixes: 631207314d88 ("bcache: fix failure in journal relplay") Signed-off-by: Jens Axboe <[email protected]>
2019-04-30block: remove the unused blk_queue_dma_pad functionChristoph Hellwig2-17/+0
Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30block: add SPDX tags to block layer files missing licensing informationChristoph Hellwig31-0/+32
Various block layer files do not have any licensing information at all. Add SPDX tags for the default kernel GPLv2 license to those. Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30block: add a SPDX tag to blk-mq-rdma.hChristoph Hellwig1-0/+1
This file has no copyright notice, but was added as part of a commit adding another file using the default kernel GPLv2 license. Add a matching SPDX tag. Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30sed-opal.h: remove redundant licence boilerplateChristoph Hellwig1-9/+0
The file already has the correct SPDX header. Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30block: switch all files cleared marked as GPLv2 or later to SPDX tagsChristoph Hellwig10-130/+10
All these files have some form of the usual GPLv2 or later boilerplate. Switch them to use SPDX tags instead. Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30block: switch all files cleared marked as GPLv2 to SPDX tagsChristoph Hellwig19-208/+19
All these files have some form of the usual GPLv2 boilerplate. Switch them to use SPDX tags instead. Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30block: clean up __bio_add_pc_page a bitChristoph Hellwig1-6/+5
Share the bi_size update by moving the done label up, and duplicate the bv_len update in the two callers to get rid of the bvec_merge label. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30block: remove bogus comments in __bio_add_pc_pageChristoph Hellwig1-9/+0
We are never called with file system pages by defintions for the passthrough interface, and we also never undo any addition later these days. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30block: remove the __bio_add_pc_page exportChristoph Hellwig2-5/+1
The same page optimization is a rather odd corner case, which is not used outside bio.c and which really should not be used outside of bio.c either - we have better highlevel helpers like the rq/bio mapping helpers. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30block: remove the i argument to bio_for_each_segment_allChristoph Hellwig24-81/+47
We only have two callers that need the integer loop iterator, and they can easily maintain it themselves. Suggested-by: Matthew Wilcox <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Acked-by: David Sterba <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Coly Li <[email protected]> Reviewed-by: Matthew Wilcox <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30bcache: clean up do_btree_node_write a bitChristoph Hellwig1-4/+5
Use a variable containing the buffer address instead of the to be removed integer iterator from bio_for_each_segment_all. Suggested-by: Matthew Wilcox <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Coly Li <[email protected]> Reviewed-by: Matthew Wilcox <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-30bcache: remove redundant LIST_HEAD(journal) from run_cache_set()Coly Li1-1/+0
Commit 95f18c9d1310 ("bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC branch of run_cache_set") forgets to remove the original define of LIST_HEAD(journal), which makes the change no take effect. This patch removes redundant variable LIST_HEAD(journal) from run_cache_set(), to make Shenghui's fix working. Fixes: 95f18c9d1310 ("bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC branch of run_cache_set") Reported-by: Juha Aatrokoski <[email protected]> Cc: Shenghui Wang <[email protected]> Signed-off-by: Coly Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-04-26Merge branch 'nvme-5.2' of git://git.infradead.org/nvme into for-5.2/blockJens Axboe15-84/+77
Pull NVMe changes from Christoph. * 'nvme-5.2' of git://git.infradead.org/nvme: nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE nvme-rdma: fix typo in struct comment nvme-loop: kill timeout handler nvme-tcp: rename function to have nvme_tcp prefix nvme-rdma: fix a NULL deref when an admin connect times out nvme-tcp: fix a NULL deref when an admin connect times out nvmet-tcp: don't fail maxr2t greater than 1 nvmet-file: clamp-down file namespace lba_shift nvmet: include <linux/scatterlist.h> nvmet: return a specified error it subsys_alloc fails nvmet: rename nvme_completion instances from rsp to cqe nvmet-rdma: remove p2p_client initialization from fast-path