|
The RQF_PREEMPT flag is used for three purposes:
- In the SCSI core, for making sure that power management requests
are executed even if a device is in the "quiesced" state.
- For domain validation by SCSI drivers for the parallel SCSI (SPI) bus.
- In the IDE driver, for IDE preempt requests.
Rename "preempt-only" into "pm-only" because the primary purpose of
this mode is power management. Since the power management core may
but does not have to resume a runtime suspended device before
performing system-wide suspend and since a later patch will set
"pm-only" mode as long as a block device is runtime suspended, make
it possible to set "pm-only" mode from more than one context. Since
with this change scsi_device_quiesce() is no longer idempotent, make
that function return early if it is called for a quiesced queue.
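A minimal sketch of what the counter-based "pm-only" mode can look
like, modeled on the description above; the helper names and the
atomic_t q->pm_only member are assumptions of this sketch rather than
a quote of the final code:

    void blk_set_pm_only(struct request_queue *q)
    {
        atomic_inc(&q->pm_only);
    }

    void blk_clear_pm_only(struct request_queue *q)
    {
        int pm_only = atomic_dec_return(&q->pm_only);

        WARN_ON_ONCE(pm_only < 0);
        if (pm_only == 0)
            wake_up_all(&q->mq_freeze_wq);
    }

With a counter instead of a single flag, both the PM core and the
runtime-suspend path mentioned above can enter "pm-only" mode
independently.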
Signed-off-by: Bart Van Assche <[email protected]>
Acked-by: Martin K. Petersen <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Cc: Jianchao Wang <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: Alan Stern <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move the code for runtime power management from blk-core.c into the
new source file blk-pm.c. Move the corresponding declarations from
<linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave out
the declarations of the functions that are not used in that mode.
This patch not only reduces the number of #ifdefs in the block layer
core code but also reduces the size of header file <linux/blkdev.h>
and hence should help to reduce the build time of the Linux kernel
if CONFIG_PM is not defined.
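As an illustration, a sketch of how <linux/blk-pm.h> can expose the
runtime PM interface only when it is needed (the CONFIG_PM=n stub
shown here is an assumption of the sketch):

    #ifdef CONFIG_PM
    extern void blk_pm_runtime_init(struct request_queue *q, struct device *dev);
    extern int blk_pre_runtime_suspend(struct request_queue *q);
    extern void blk_post_runtime_suspend(struct request_queue *q, int err);
    extern void blk_pre_runtime_resume(struct request_queue *q);
    extern void blk_post_runtime_resume(struct request_queue *q, int err);
    extern void blk_set_runtime_active(struct request_queue *q);
    #else
    static inline void blk_pm_runtime_init(struct request_queue *q,
                                           struct device *dev) {}
    #endif /* CONFIG_PM */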
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Jianchao Wang <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: Alan Stern <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
No need to expose these helpers outside the block layer.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Keep it close to the actual users instead of exposing the function to all
drivers.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
No need to expose these to drivers.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
After merging the iolatency policy, we potentially now have 4 policies
being registered, but only support 3. This causes one of them to fail
loading. Takashi reports that BFQ no longer works for him, because it
fails to load due to policy registration failure.
Bump the limit to 5 policies, and also add a warning for when we have
exceeded the global limit. If we have to touch this again, we should switch
to a dynamic scheme instead.
Reported-by: Takashi Iwai <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>
Tested-by: Takashi Iwai <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 12f5b9314545 ("blk-mq: Remove generation seqeunce") removed the
only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but
did not remove the corresponding #include directives. Since these
include directives are no longer needed, remove them.
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Jianchao Wang <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This allows bio_integrity_bytes() to be called from drivers instead of
open coding it.
Acked-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Edwards <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
c11f0c0b5bb9 ("block/mm: make bdev_ops->rw_page() take a bool for
read/write") replaced @op with boolean @is_write, which limited the
amount of information going into ->rw_page() and more importantly
page_endio(), which removed the need to expose block internals to mm.
Unfortunately, we want to track discards separately and @is_write
isn't enough information. This patch updates bdev_ops->rw_page() to
take REQ_OP instead but leaves page_endio() to take bool @is_write.
This allows the block part of operations to have enough information
while not leaking it to mm.
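A hedged sketch of a driver ->rw_page() implementation after this
change; the mydev_* helpers are hypothetical:

    static int mydev_rw_page(struct block_device *bdev, sector_t sector,
                             struct page *page, unsigned int op)
    {
        struct mydev *dev = bdev->bd_disk->private_data;
        int err;

        if (op_is_write(op))
            err = mydev_write_page(dev, page, sector);
        else
            err = mydev_read_page(dev, page, sector);
        if (err)
            return err;

        /* page_endio() still takes a bool, as described above */
        page_endio(page, op_is_write(op), 0);
        return 0;
    }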
Signed-off-by: Tejun Heo <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Remove the blkdev_entry_to_request() macro, which has remained unused
throughout the observable history; note also that it duplicates the
list_entry_rq() macro verbatim.
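For reference, the macro that remains is (as defined in <linux/blkdev.h>):

    #define list_entry_rq(ptr)  list_entry((ptr), struct request, queuelist)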
Signed-off-by: Vladimir Zapolskiy <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
blkcg-qos is going to do essentially what wbt does, only on a cgroup
basis. Break out the common code that will be shared between blkcg-qos
and wbt into blk-rq-qos.* so they can both utilize the same
infrastructure.
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We have to remove synchronize_rcu() from blk_cleanup_queue(),
otherwise a long delay can be caused during LUN probing. To remove
it, we have to avoid iterating over set->tag_list in the I/O path,
e.g. in blk_mq_sched_restart().
This patch reverts 5b79413946d (Revert "blk-mq: don't handle
TAG_SHARED in restart"). Given we have fixed enough I/O hang issues,
there isn't any reason to restart all queues sharing one tag set any
more, for the following reasons:
1) The blk-mq core can deal with the shared-tags case well via
blk_mq_get_driver_tag(), which can wake up queues waiting for a driver tag.
2) SCSI is a bit special because it may return BLK_STS_RESOURCE if the
queue, target or host isn't ready, but the SCSI built-in restart covers
all of these well; see scsi_end_request(): the queue will be rerun after
any request initiated from this host/target is completed.
In my test on scsi_debug (8 LUNs), this patch may improve IOPS by 20% -
30% when running I/O on these 8 LUNs concurrently.
Fixes: 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Cc: Omar Sandoval <[email protected]>
Cc: Bart Van Assche <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: [email protected]
Reported-by: Andrew Jones <[email protected]>
Tested-by: Andrew Jones <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Exclude zoned block device members from struct request_queue for
CONFIG_BLK_DEV_ZONED == n. Avoid breaking the build by only building
the code that uses these struct request_queue members if
CONFIG_BLK_DEV_ZONED != n.
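A sketch of the result inside struct request_queue (other members
omitted; the zoned fields shown are the ones introduced earlier in
this series):

    #ifdef CONFIG_BLK_DEV_ZONED
        unsigned int   nr_zones;
        unsigned long *seq_zones_bitmap;
        unsigned long *seq_zones_wlock;
    #endif /* CONFIG_BLK_DEV_ZONED */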
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Cc: Matias Bjorling <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since the implementation of blk_queue_nr_zones() is trivial and since
it only has a single caller, inline this function.
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Cc: Matias Bjorling <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Remove this function since it has no callers. This function was
introduced in commit 6cc77e9cb080 ("block: introduce zoned block
devices zone write locking").
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Matias Bjorling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
A device may have boundary restrictions where the number of sectors
between boundaries exceeds its max transfer size. In this case, we need
to cap the max size to the smaller of the two limits.
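A sketch of the capping logic, modeled on blk_max_size_offset(); the
min() against max_sectors is the part this change adds:

    static inline unsigned int blk_max_size_offset(struct request_queue *q,
                                                   sector_t offset)
    {
        if (!q->limits.chunk_sectors)
            return q->limits.max_sectors;

        return min(q->limits.max_sectors,
                   (unsigned int)(q->limits.chunk_sectors -
                       (offset & (q->limits.chunk_sectors - 1))));
    }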
Reported-by: Jitendra Bhivare <[email protected]>
Tested-by: Jitendra Bhivare <[email protected]>
Cc: <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This function is entirely unused, so remove it and the tag_queue_busy
member of struct request_queue.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We can currently call the timeout handler again on a request that has
already been handed over to the timeout handler. Prevent that with a new
flag.
Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
Reported-by: Andrew Randrianasulu <[email protected]>
Tested-by: Andrew Randrianasulu <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the core block functionality to embedded bio sets.
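A minimal sketch of the embedded pattern (pool size and flags are
illustrative):

    struct bio_set bs;          /* embedded, was: struct bio_set *bs */

    if (bioset_init(&bs, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS))
        return -ENOMEM;
    /* ... allocate and split bios from &bs ... */
    bioset_exit(&bs);           /* replaces bioset_free(bs) */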
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
After the recent timeout handling changes, we have two holes in
the struct. Move the timeout near the deadline, killing both,
and moving related members closer together. On my config on
x86-64, this shrinks struct request from 312 to 304 bytes.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
BLK_EH_NOT_HANDLED implies that nothing happened, but very often that
is not the case - instead, the driver has already completed the
command. Fix the symbolic name to reflect that a little better.
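The value was renamed to BLK_EH_DONE. A sketch of a blk-mq driver
timeout handler that has already completed the command (the mydrv_*
helper is hypothetical):

    static enum blk_eh_timer_return mydrv_timeout(struct request *rq,
                                                  bool reserved)
    {
        mydrv_abort_and_complete(rq);   /* hypothetical driver cleanup */
        return BLK_EH_DONE;             /* formerly BLK_EH_NOT_HANDLED */
    }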
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch simplifies the timeout handling by relying on the request
reference counting to ensure the iterator is operating on an inflight
and truly timed out request. Since the reference counting prevents the
tag from being reallocated, the block layer no longer needs to prevent
drivers from completing their requests while the timeout handler is
operating on it: a driver completing a request is allowed to proceed to
the next state without additional synchronization with the block layer.
This also removes any need for generation sequence numbers, since a
request can no longer be recycled into a new instance while timeout
handling is operating on it.
To enable this, a refcount is added to struct request so that request
users can be sure they're operating on the same request without it
changing while they're processing it. The request's tag won't be
released for reuse until both the timeout handler and the completion
are done with it.
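A sketch of the resulting release path, modeled on the description
above; the wrapper name is illustrative:

    static void blk_mq_put_rq_ref(struct request *rq)
    {
        /* the tag is only released once the timeout handler and the
         * completion path have both dropped their reference */
        if (refcount_dec_and_test(&rq->ref))
            __blk_mq_free_request(rq);
    }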
Signed-off-by: Keith Busch <[email protected]>
[hch: slight cleanups, added back submission side hctx lock, use cmpxchg
for completions]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently, struct request has four timestamp fields:
- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq
These can all be consolidated into one start time and one I/O start
time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
request depending on the kernel config.
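A sketch of the consolidated fields in struct request after this
change (both in nanoseconds obtained from ktime_get_ns()):

    u64 start_time_ns;      /* set at get_request time, used for iostats */
    u64 io_start_time_ns;   /* set at start_request time, used for blk-stats */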
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
cfq and bfq have some internal fields that use sched_clock() which can
trivially use ktime_get_ns() instead. Their timestamp fields in struct
request can also use ktime_get_ns(), which resolves the 8 year old
comment added by commit 28f4197e5d47 ("block: disable preemption before
using sched_clock()").
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
struct blk_issue_stat squashes three things into one u64:
- The time the driver started working on a request
- The original size of the request (for the io.low controller)
- Flags for writeback throttling
It turns out that on x86_64, we have a 4 byte hole in struct request
which we can fill with the non-timestamp fields from blk_issue_stat,
simplifying things quite a bit.
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Eight bug fixes, one spelling update and one tracepoint addition.
The most serious is probably the mptsas write same fix because it
means anyone using these controllers sees errors when modern
filesystems try to issue discards"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: fix crash with iscsi target and dvd
scsi: sd_zbc: Avoid that resetting a zone fails sporadically
scsi: sd: Defer spinning up drive while SANITIZE is in progress
scsi: megaraid_sas: Do not log an error if FW successfully initializes.
scsi: ufs: add trace event for ufs upiu
scsi: core: remove reference to scsi_show_extd_sense()
scsi: mptsas: Disable WRITE SAME
scsi: fnic: fix spelling mistake in fnic stats "Abord" -> "Abort"
scsi: scsi_debug: IMMED related delay adjustments
scsi: iscsi: respond to netlink with unicast when appropriate
|
|
Since SCSI scanning occurs asynchronously, since sd_revalidate_disk() is
called from sd_probe_async() and since sd_revalidate_disk() calls
sd_zbc_read_zones(), it can happen that sd_zbc_read_zones() is called
concurrently with blkdev_report_zones() and/or blkdev_reset_zones(). That can
cause these functions to fail with -EIO because sd_zbc_read_zones() e.g. sets
q->nr_zones to zero before restoring it to the actual value, even if no drive
characteristics have changed. Prevent this from happening by making the
following changes:
- Protect the code that updates zone information with blk_queue_enter()
and blk_queue_exit().
- Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
these functions do not modify struct scsi_disk before all zone
information has been obtained.
Note: since commit 055f6e18e08f ("block: Make q_usage_counter also track
legacy requests"; kernel v4.15) the request queue freezing mechanism also
affects legacy request queues.
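A hedged sketch of the first change above (serializing the zone
information update against in-flight report/reset requests); error
handling and the actual update are elided:

    int ret = blk_queue_enter(q, 0);    /* blocks while q is frozen */

    if (ret)
        return ret;
    /* ... update q->nr_zones and the zone bitmaps here ... */
    blk_queue_exit(q);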
Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Damien Le Moal <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: [email protected] # v4.16
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
So we can check FUA support status from the iomap direct IO code.
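The helper this adds is a simple queue-flag test; a caller such as the
iomap direct I/O code can then do something like the following (the
usage line is illustrative):

    #define blk_queue_fua(q)    test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)

    use_fua = blk_queue_fua(bdev_get_queue(bdev));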
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
It often happens while I'm preparing a patch for a block driver that
I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
available to this driver? Do I have to introduce definitions of these
constants before I can use them? To avoid this confusion, move the
existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
<linux/blkdev.h> header file such that they become available to all
block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
header file conditional so that including that header file after
<linux/blkdev.h> does not cause the compiler to complain about a
SECTOR_SIZE redefinition.
Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
not been removed from uapi header files nor from NAND drivers in
which these constants are used for another purpose than converting
block layer offsets and sizes into a number of sectors.
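The shared definitions are small, and a typical conversion in a block
driver then looks like this (the variable names are illustrative):

    #define SECTOR_SHIFT 9
    #define SECTOR_SIZE (1 << SECTOR_SHIFT)

    sector_t nr_sects = len_in_bytes >> SECTOR_SHIFT;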
Cc: David S. Miller <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch helps prevent new code that manipulates queue flags without
holding the queue lock, when that lock should be held, from being
introduced into block drivers.
Cc: Christoph Hellwig <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Ming Lei <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since it is not safe to use queue_flag_(set|clear)_unlocked()
without holding the queue lock after the sysfs entries for a
queue have been created, complain if this happens.
Cc: Mike Snitzer <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Ming Lei <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Introduce functions that modify the queue flags and that protect
these modifications with the request queue lock. Except for moving
one wake_up_all() call from inside to outside a critical section,
this patch does not change any functionality.
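A sketch of one of the new helpers, modeled on the description above;
the unlocked primitive it wraps is the pre-existing queue_flag_set():

    void blk_queue_flag_set(unsigned int flag, struct request_queue *q)
    {
        unsigned long flags;

        spin_lock_irqsave(q->queue_lock, flags);
        queue_flag_set(flag, q);
        spin_unlock_irqrestore(q->queue_lock, flags);
    }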
Cc: Christoph Hellwig <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Ming Lei <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move the definition of queue_flag_clear_unlocked() up and move the
definition of queue_in_flight() down such that all queue flag
manipulation function definitions become contiguous.
This patch does not change any functionality.
Cc: Christoph Hellwig <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Ming Lei <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Philipp Reisner <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Update the comment typo _consisitent_ to _consistent_ introduced by the
following commit:
commit 0206319fdfee ("blk-mq: Fix poll_stat for new size-based bucketing.")
Cc: Jens Axboe <[email protected]>
Signed-off-by: Minwoo Im <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull block updates from Jens Axboe:
"This is the main pull request for block IO related changes for the
4.16 kernel. Nothing major in this pull request, but a good amount of
improvements and fixes all over the map. This contains:
- BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
Paolo.
- Support for SMR zones for deadline and mq-deadline from Damien and
Christoph.
- Set of fixes for bcache by way of Michael Lyle, including fixes
from himself, Kent, Rui, Tang, and Coly.
- Series from Matias for lightnvm with fixes from Hans Holmberg,
Javier, and Matias. Mostly centered around pblk, and removing
rrpc 1.2 in preparation for supporting 2.0.
- A couple of NVMe pull requests from Christoph. Nothing major in
here, just fixes and cleanups, and support for command tracing from
Johannes.
- Support for blk-throttle for tracking reads and writes separately.
From Joseph Qi. A few cleanups/fixes also for blk-throttle from
Weiping.
- Series from Mike Snitzer that enables dm to register its queue more
logically, something that's always been problematic on dm since
it's a stacked device.
- Series from Ming cleaning up some of the bio accessor use, in
preparation for supporting multipage bvecs.
- Various fixes from Ming closing up holes around queue mapping and
quiescing.
- BSD partition fix from Richard Narron, fixing a problem where we
can't mount newer (10/11) FreeBSD partitions.
- Series from Tejun reworking blk-mq timeout handling. The previous
scheme relied on atomic bits, but it had races where we would think
a request had timed out if it got reused at the wrong time.
- null_blk now supports faking timeouts, to enable us to better
exercise and test that functionality separately. From me.
- Kill the separate atomic poll bit in the request struct. After
this, we don't use the atomic bits on blk-mq anymore at all. From
me.
- sgl_alloc/free helpers from Bart.
- Heavily contended tag case scalability improvement from me.
- Various little fixes and cleanups from Arnd, Bart, Corentin,
Douglas, Eryu, Goldwyn, and myself"
* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
block: remove smart1,2.h
nvme: add tracepoint for nvme_complete_rq
nvme: add tracepoint for nvme_setup_cmd
nvme-pci: introduce RECONNECTING state to mark initializing procedure
nvme-rdma: remove redundant boolean for inline_data
nvme: don't free uuid pointer before printing it
nvme-pci: Suspend queues after deleting them
bsg: use pr_debug instead of hand crafted macros
blk-mq-debugfs: don't allow write on attributes with seq_operations set
nvme-pci: Fix queue double allocations
block: Set BIO_TRACE_COMPLETION on new bio during split
blk-throttle: use queue_is_rq_based
block: Remove kblockd_schedule_delayed_work{,_on}()
blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
lib/scatterlist: Fix chaining support in sgl_alloc_order()
blk-throttle: track read and write request individually
block: add bdev_read_only() checks to common helpers
block: fail op_is_write() requests to read-only partitions
blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
...
|
|
The previous patch removed all users of these two functions. Hence
also remove the functions themselves.
Reviewed-by: Mike Snitzer <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Move completion related items (like the call single data) near the
end of the struct, instead of mixing them in with the initial
queueing related fields.
Move queuelist below the bio structures. Then we have all
queueing related bits in the first cache line.
This yields a 1.5-2% increase in IOPS for a null_blk test, both for
sync and for high thread count access. Sync test goes from 975K to
992K, 32-thread case from 20.8M to 21.2M IOPS.
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We only have one atomic flag left. Instead of using an entire
unsigned long for that, steal the bottom bit of the deadline
field that we already reserved.
Remove ->atomic_flags, since it's now unused.
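A conceptual sketch of the accessors this implies; the names and the
exact bit choice are illustrative, not the merged identifiers:

    #define RQ_COMPLETE_BIT 0   /* bottom bit of rq->__deadline */

    static inline unsigned long blk_rq_deadline(struct request *rq)
    {
        return rq->__deadline & ~(1UL << RQ_COMPLETE_BIT);
    }

    static inline void blk_rq_set_deadline(struct request *rq,
                                           unsigned long time)
    {
        rq->__deadline = time & ~(1UL << RQ_COMPLETE_BIT);
    }

    static inline bool blk_rq_is_complete(struct request *rq)
    {
        return test_bit(RQ_COMPLETE_BIT, &rq->__deadline);
    }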
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We reduce the resolution of request expiry, but since we're already
using jiffies for this where resolution depends on the kernel
configuration and since the timeout resolution is coarse anyway,
that should be fine.
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We don't need this to be an atomic flag, it can be a regular
flag. We either end up on the same CPU for the polling, in which
case the state is sane, or we did the sleep which would imply
the needed barrier to ensure we see the right state.
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
After the recent updates to use generation number and state based
synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
to avoid firing the same timeout multiple times.
Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
times. This removes atomic bitops from hot paths too.
v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().
v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.
Signed-off-by: Tejun Heo <[email protected]>
Cc: "jianchao.wang" <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules. Unfortunately, it contains quite a few holes.
There's a complex dance around REQ_ATOM_STARTED and
REQ_ATOM_COMPLETE between the issue/completion and timeout paths; however,
they don't have a synchronization point across request recycle
instances and it isn't clear what the barriers add.
blk_mq_check_expired() can easily read STARTED from the (N-2)'th recycle
instance, the deadline from the (N-1)'th, and run blk_mark_rq_complete()
against the Nth instance.
In fact, it's pretty easy to make blk_mq_check_expired() terminate a
later instance of a request. If we induce 5 sec delay before
time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
2s, and issue back-to-back large IOs, blk-mq starts timing out
requests spuriously pretty quickly. Nothing actually timed out. It
just made the call on a recycle instance of a request and then
terminated a later instance long after the original instance finished.
The scenario isn't theoretical either.
This patch replaces the broken synchronization mechanism with a RCU
and generation number based one.
1. Each request has a u64 generation + state value, which can be
updated only by the request owner. Whenever a request becomes
in-flight, the generation number gets bumped up too. This provides
the basis for the timeout path to distinguish different recycle
instances of the request.
Also, marking a request in-flight and setting its deadline are
protected with a seqcount so that the timeout path can fetch both
values coherently.
2. The timeout path fetches the generation, state and deadline. If
the verdict is timeout, it records the generation into a dedicated
request abortion field and does RCU wait.
3. The completion path is also protected by RCU (from the previous
patch) and checks whether the current generation number and state
match the abortion field. If so, it skips completion.
4. The timeout path, after RCU wait, scans requests again and
terminates the ones whose generation and state still match the ones
requested for abortion.
By now, the timeout path knows that either the generation number
and state changed if it lost the race or the completion will yield
to it and can safely timeout the request.
While it's more lines of code, it's conceptually simpler, doesn't
depend on direct use of subtle memory ordering or coherence, and
hopefully doesn't terminate the wrong instance.
While this change makes REQ_ATOM_COMPLETE synchronization unnecessary
between issue/complete and timeout paths, REQ_ATOM_COMPLETE isn't
removed yet as it's still used in other places. Future patches will
move all state tracking to the new mechanism and remove all bitops in
the hot paths.
Note that this patch adds a comment explaining a race condition in
BLK_EH_RESET_TIMER path. The race has always been there and this
patch doesn't change it. It's just documenting the existing race.
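A conceptual sketch of step 3 above (the completion-side check); the
helper and field names here are illustrative rather than the exact
merged identifiers:

    static void complete_if_not_aborted(struct blk_mq_hw_ctx *hctx,
                                        struct request *rq)
    {
        int srcu_idx;

        hctx_lock(hctx, &srcu_idx);     /* RCU/SRCU from the previous patch */
        if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
            __blk_mq_complete_request(rq); /* timeout path did not claim it */
        hctx_unlock(hctx, srcu_idx);
    }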
v2: - Fixed BLK_EH_RESET_TIMER handling as pointed out by Jianchao.
- s/request->gstate_seqc/request->gstate_seq/ as suggested by Peter.
- READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter.
v3: - Fixed possible extended seqcount / u64_stats_sync read looping
spotted by Peter.
- MQ_RQ_IDLE was incorrectly being set in complete_request instead
of free_request. Fixed.
v4: - Rebased on top of hctx_lock() refactoring patch.
- Added comment explaining the use of hctx_lock() in completion path.
v5: - Added comments requested by Bart.
- Note the addition of BLK_EH_RESET_TIMER race condition in the
commit message.
Signed-off-by: Tejun Heo <[email protected]>
Cc: "jianchao.wang" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Components relying only on the request_queue structure for accessing
block devices (e.g. I/O schedulers) have limited knowledge of the
device characteristics. In particular, the device capacity cannot be
easily discovered, which for a zoned block device also results in the
inability to easily know the number of zones of the device (the zone
size is indicated by the chunk_sectors field of the queue limits).
Introduce the nr_zones field to the request_queue structure to simplify
access to this information. Also, add the bitmap seq_zones_bitmap which
indicates which zones of the device are sequential zones (write
preferred or write required) and the bitmap seq_zones_wlock which
indicates if a zone is write locked, that is, if a write request
targeting a zone was dispatched to the device. These fields are
initialized by the low level block device driver (sd.c for ZBC/ZAC
disks). They are not initialized by stacking drivers (device mappers)
handling zoned block devices (e.g. dm-linear).
Using this, I/O schedulers can introduce zone write locking to control
request dispatching to a zoned block device and avoid write request
reordering by limiting to at most a single write request per zone
outside of the scheduler at any time.
Based on previous patches from Damien Le Moal.
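Sketches of the helpers built on top of these fields (reconstructed
from the description; see <linux/blkdev.h> for the exact versions):

    static inline unsigned int blk_queue_zone_no(struct request_queue *q,
                                                 sector_t sector)
    {
        if (!blk_queue_is_zoned(q))
            return 0;
        return sector >> ilog2(q->limits.chunk_sectors);
    }

    static inline bool blk_queue_zone_is_seq(struct request_queue *q,
                                             sector_t sector)
    {
        if (!blk_queue_is_zoned(q) || !q->seq_zones_bitmap)
            return false;
        return test_bit(blk_queue_zone_no(q, sector), q->seq_zones_bitmap);
    }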
Signed-off-by: Christoph Hellwig <[email protected]>
[Damien]
* Fixed comments and indentation in blkdev.h
* Changed helper functions
* Fixed this commit message
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
A previous change blindly added massive alignment to the
call_single_data structure in struct request. This ballooned it in size
from 296 to 320 bytes on my setup, for no valid reason at all.
Use the unaligned struct __call_single_data variant instead.
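The member change amounts to a single line inside struct request
(sketch; surrounding members omitted):

    struct __call_single_data csd;  /* was: call_single_data_t csd (aligned) */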
Fixes: 966a967116e69 ("smp: Avoid using two cache lines for struct call_single_data")
Cc: [email protected] # v4.14
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit caa4b02476e3 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
moved blk_queue_bounce() into blk_rq_append_bio(), but didn't consider
the fact that the bounced bio becomes invisible to the caller since the
parameter type is 'struct bio *'. Make it a pointer to a pointer to
a bio, so the caller also sees the right bio after a bounce.
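After the change the prototype and a typical caller look like this
(sketch; the bio_get() placement follows the note below about handling
blk_rq_append_bio() failure):

    int blk_rq_append_bio(struct request *rq, struct bio **bio);

    ret = blk_rq_append_bio(rq, &bio);
    if (ret)
        return ret;
    bio_get(bio);   /* only after blk_rq_append_bio() succeeded */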
Fixes: caa4b02476e3 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
Cc: Christoph Hellwig <[email protected]>
Reported-by: Michele Ballabio <[email protected]>
(handling failure of blk_rq_append_bio(), only call bio_get() after
blk_rq_append_bio() returns OK)
Tested-by: Michele Ballabio <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit a8821f3f3 ("block: Improvements to bounce-buffer handling") tries
to make sure that the bio passed to .make_request_fn won't exceed
BIO_MAX_PAGES, but ignores that passthrough I/O can use blk_queue_bounce()
too. In particular, passthrough I/O may not be sector-aligned, and the check
'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may
become true even though the bvec count doesn't exceed BIO_MAX_PAGES,
which causes the bio to be split and the original passthrough bio to be
submitted to generic_make_request().
This patch fixes the issue by checking whether the bio is passthrough I/O
and using bio_kmalloc() to allocate the cloned passthrough bio.
Cc: NeilBrown <[email protected]>
Fixes: a8821f3f3("block: Improvements to bounce-buffer handling")
Tested-by: Michele Ballabio <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|