path: root/include/linux/blkdev.h
2010-08-07  block: don't allocate a payload for discard request  (Christoph Hellwig, 1 file, -0/+2)
Allocating a fixed payload for discard requests always was a horrible hack, and now it's coming to bite us when adding support for discard in DM/MD. So change the code to leave the allocation of a payload to the low-level driver. Unfortunately that means we'll need another hack, which allows us to update the various block layer length fields to indicate that we have a payload. Instead of hiding this in sd.c, as we already partially do for UNMAP support, add a documented helper for it in the core block layer. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Mike Snitzer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
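A minimal sketch of how a low-level driver might use the new helper; blk_add_request_payload and the BLKPREP_* codes are real blkdev.h names from this era, while the mydrv_* driver context and the 24-byte payload size are invented:

    #include <linux/blkdev.h>
    #include <linux/gfp.h>

    /* hypothetical prep hook in a SCSI-like driver's discard path */
    static int mydrv_prep_discard(struct request *rq)
    {
        struct page *page = alloc_page(GFP_ATOMIC | __GFP_ZERO);

        if (!page)
            return BLKPREP_DEFER;

        /* fill 'page' with the protocol payload here, e.g. a 24-byte
         * UNMAP parameter list, then account for it in the request's
         * block layer length fields: */
        blk_add_request_payload(rq, page, 24);
        return BLKPREP_OK;
    }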
2010-08-07  block: unify flags for struct bio and struct request  (Christoph Hellwig, 1 file, -65/+1)
Remove the current bio flags and reuse the request flags for the bio, too. This makes it easier to trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing from the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also, two request flags that had a superfluous RW in them have been renamed. Note that the flags are in bio.h despite having the REQ_ name; as blkdev.h includes bio.h, that is the only way to go for now. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
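With the unified namespace, the same flag name can be tested at both layers; a small sketch (REQ_SYNC exists after this change, the two helper functions are invented):

    #include <linux/blkdev.h>

    static bool bio_is_sync_io(struct bio *bio)
    {
        /* previously a BIO_RW_* flag with its own helper */
        return bio->bi_rw & REQ_SYNC;
    }

    static bool rq_is_sync_io(struct request *rq)
    {
        /* identical flag word and name at the request level */
        return rq->cmd_flags & REQ_SYNC;
    }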
2010-08-07  block: remove wrappers for request type/flags  (Christoph Hellwig, 1 file, -28/+13)
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in struct request. This allows much easier grepping for the different request types, instead of having to unwind through macros. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
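For example, the open-coded tests that replace the old wrappers look like this (the dispatch function and handlers are invented; the cmd_type values are the real ones):

    #include <linux/blkdev.h>

    static void mydrv_handle_fs(struct request *rq) { /* ... */ }
    static void mydrv_handle_pc(struct request *rq) { /* ... */ }

    static void mydrv_dispatch(struct request *rq)
    {
        if (rq->cmd_type == REQ_TYPE_FS)            /* was blk_fs_request(rq) */
            mydrv_handle_fs(rq);
        else if (rq->cmd_type == REQ_TYPE_BLOCK_PC) /* was blk_pc_request(rq) */
            mydrv_handle_pc(rq);
    }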
2010-08-07  block: kill ISA_DMA_THRESHOLD usage  (FUJITA Tomonori, 1 file, -1/+1)
block uses ISA_DMA_THRESHOLD for BLK_BOUNCE_ISA. Only SCSI uses ISA_DMA_THRESHOLD for ancient drivers with non-zero unchecked_isa_dma. Nowadays drivers (and subsystems) use dma_mask properly instead of ISA_DMA_THRESHOLD. Documentation/scsi/scsi_mid_low_api.txt says: unchecked_isa_dma - 1=>only use bottom 16 MB of ram (ISA DMA addressing restriction), 0=>can use full 32 bit (or better) DMA address space So block simply uses DMA_BIT_MASK(24) for BLK_BOUNCE_ISA for SCSI. Signed-off-by: FUJITA Tomonori <[email protected]> Acked-by: James Bottomley <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
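A legacy driver that can only DMA into the low 16 MB would request ISA bouncing like this (sketch; the driver context is invented, the function and constant are real):

    #include <linux/blkdev.h>

    /* hypothetical setup for an old ISA card limited to 24-bit DMA */
    static void isadrv_init_queue(struct request_queue *q)
    {
        /* BLK_BOUNCE_ISA is now simply DMA_BIT_MASK(24): buffers above
         * the 16 MB boundary get bounced */
        blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
    }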
2010-08-07  block: add sysfs knob for turning off disk entropy contributions  (Jens Axboe, 1 file, -1/+4)
There are two reasons for doing this:
- On SSD disks, the completion times aren't as random as they are for rotational drives, so it's questionable whether they should contribute to the random pool in the first place.
- Calling add_disk_randomness() has a lot of overhead.
This adds /sys/block/<dev>/queue/add_random, which allows you to switch this off on a per-device basis. The default setting is on, so there should be no functional change from this patch. Signed-off-by: Jens Axboe <[email protected]>
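A userspace sketch of flipping the new knob; the sysfs path comes from the commit, the sda device name is an assumption (substitute your own):

    #include <stdio.h>

    int main(void)
    {
        /* 0 = stop feeding this disk's completion timings to the
         * entropy pool; 1 (the default) = keep contributing */
        FILE *f = fopen("/sys/block/sda/queue/add_random", "w");

        if (!f) {
            perror("fopen");
            return 1;
        }
        fputs("0\n", f);
        fclose(f);
        return 0;
    }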
2010-06-01  block: disable preemption before using sched_clock()  (Jens Axboe, 1 file, -0/+9)
Commit 9195291e5f05e01d67f9a09c756b8aca8f009089 added calls to sched_clock() from preemptible code. sched_clock() is both the wrong interface AND cannot be called without preemption disabled. Apply a temporary fix to get rid of the warnings; a real patch is in the works. Signed-off-by: Jens Axboe <[email protected]>
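The gist of the fix, as a sketch; the helper name is invented, the point is the commit's: sched_clock() must run with preemption disabled:

    #include <linux/sched.h>
    #include <linux/preempt.h>

    static unsigned long long blk_safe_clock(void)  /* hypothetical name */
    {
        unsigned long long now;

        preempt_disable();
        now = sched_clock();   /* must not be called preemptibly */
        preempt_enable();
        return now;
    }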
2010-05-21  Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6  (Linus Torvalds, 1 file, -0/+2)
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (577 commits)
  Staging: ramzswap: Handler for swap slot free callback
  swap: Add swap slot free callback to block_device_operations
  swap: Add flag to identify block swap devices
  Staging: vt6655: use ETH_FRAME_LEN macro instead of custom one
  Staging: vt6655: use ETH_DATA_LEN macro instead of custom one
  Staging: vt6655: use ETH_FCS_LEN macro instead of custom one
  Staging: vt6656: use ETH_HLEN macro instead of custom one
  Staging: comedi: quatech_daqp_cs.c Replace eos semaphore with a completion.
  Staging: dt3155v4l: remove private memory allocator
  Staging: crystalhd: Remove typedefs from driver
  Staging: winbond: Fix for pointer name format issue in mds.c
  Staging: vt6656: removed custom UCHAR/USHORT/UINT/ULONG/ULONGLONG typedefs
  Staging: vt6656: removed custom CHAR/SHORT/INT/LONG typedefs
  Staging: comedi: Altered the way printk is used in 8255.c
  staging: iio: adis16350 and similar IMU driver
  Staging: iio: max1363 Fix two bugs in single_channel_from_ring
  Staging: iio: adis16220 extract bin_attribute structures from state
  Staging: iio: adis16220 vibration sensor driver
  Staging: comedi: Kconfig dependancy fixes
  Staging: comedi: fix up build error from last Kconfig changes
  ...
2010-05-21  block,ide: simplify bdops->set_capacity() to ->unlock_native_capacity()  (Tejun Heo, 1 file, -2/+1)
bdops->set_capacity() is unnecessarily generic. All that's required is a simple one-way notification to the lower-level driver telling it to try to unlock the native capacity. There's no reason to pass in a target capacity or return the new capacity: the former is always the inherent native capacity, and the latter can be handled via the usual device resize/revalidation path. In fact, the current API is always used that way. Replace ->set_capacity() with ->unlock_native_capacity(), which takes only @disk and doesn't return anything. IDE, the only current user of the API, is converted accordingly. Signed-off-by: Tejun Heo <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Bartlomiej Zolnierkiewicz <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
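A sketch of the new hook from a driver's point of view; the member name and @disk-only signature match the commit, the mydrv_* names are placeholders:

    #include <linux/blkdev.h>
    #include <linux/genhd.h>
    #include <linux/module.h>

    static void mydrv_unlock_native_capacity(struct gendisk *disk)
    {
        /* ask the hardware to expose its full native capacity; the new
         * size is picked up via the usual resize/revalidation path */
    }

    static const struct block_device_operations mydrv_fops = {
        .owner                  = THIS_MODULE,
        .unlock_native_capacity = mydrv_unlock_native_capacity,
    };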
2010-05-18  swap: Add swap slot free callback to block_device_operations  (Nitin Gupta, 1 file, -0/+2)
This callback is required when RAM-based devices are used as swap disks. One such device is ramzswap, which is used as a compressed in-memory swap disk. For such devices, we need a callback as soon as a swap slot is no longer used, to allow freeing the memory allocated for this slot. Without this callback, stale data can quickly accumulate in memory, defeating the whole purpose of such devices. Signed-off-by: Nitin Gupta <[email protected]> Acked-by: Linus Torvalds <[email protected]> Acked-by: Nigel Cunningham <[email protected]> Acked-by: Pekka Enberg <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
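A sketch of a RAM-backed swap device wiring up the callback; the member name is from the commit, the signature is my recollection of the era, and the ramdev_* context is invented:

    #include <linux/blkdev.h>
    #include <linux/module.h>

    /* hypothetical RAM-backed swap device */
    static void ramdev_swap_slot_free_notify(struct block_device *bdev,
                                             unsigned long index)
    {
        /* swap slot 'index' is no longer in use: free the memory
         * backing it now, instead of letting stale data pile up */
    }

    static const struct block_device_operations ramdev_ops = {
        .owner                 = THIS_MODULE,
        .swap_slot_free_notify = ramdev_swap_slot_free_notify,
    };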
2010-05-11  block: allow initialization of previously allocated request_queue  (Mike Snitzer, 1 file, -0/+5)
blk_init_queue() allocates the request_queue structure and then initializes it as needed (request_fn, elevator, etc). Split initialization out to blk_init_allocated_queue_node. Introduce blk_init_allocated_queue wrapper function to model existing blk_init_queue and blk_init_queue_node interfaces. Export elv_register_queue to allow a newly added elevator to be registered with sysfs. Export elv_unregister_queue for symmetry. These changes allow DM to initialize a device's request_queue with more precision. In particular, DM no longer unconditionally initializes a full request_queue (elevator et al). It only does so for a request-based DM device. Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
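Roughly how a driver like DM can now split the two steps (a sketch; the request_fn, lock, and function names are placeholders, the blk_* calls are the real interfaces):

    #include <linux/blkdev.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(mydrv_lock);          /* placeholder queue lock */

    static void mydrv_request_fn(struct request_queue *q)
    {
        /* placeholder request_fn */
    }

    static struct request_queue *mydrv_make_queue(void)
    {
        struct request_queue *q = blk_alloc_queue(GFP_KERNEL); /* allocate only */

        if (!q)
            return NULL;

        /* ...later, once the device turns out to be request-based,
         * set up the elevator and request_fn on the existing queue: */
        if (!blk_init_allocated_queue(q, mydrv_request_fn, &mydrv_lock))
            return NULL;
        return q;
    }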
2010-04-28  blkdev: add blkdev_issue_zeroout helper function  (Dmitry Monakhov, 1 file, -1/+2)
- Add a bio_batch helper primitive. This is a rather generic primitive for submitting and waiting on a complex request consisting of several bios.
- blkdev_issue_zeroout() generates the required number of zero-filled write bios (a usage sketch follows the next entry).
Signed-off-by: Dmitry Monakhov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-04-28  blkdev: generalize flags for blkdev_issue_fn functions  (Dmitry Monakhov, 1 file, -7/+11)
Convert all blkdev_issue_xxx functions to a common set of flags; wait/allocation semantics are preserved. Signed-off-by: Dmitry Monakhov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
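A sketch of both helpers under the unified flags; BLKDEV_IFL_WAIT and BLKDEV_IFL_BARRIER are my best recollection of the flag names introduced here, so treat them (and the invented function and sizes) as assumptions:

    #include <linux/blkdev.h>

    /* zero out, then discard, a 1 MiB region (2048 512-byte sectors),
     * blocking until each operation completes */
    static int zero_then_discard(struct block_device *bdev, sector_t sector)
    {
        int ret = blkdev_issue_zeroout(bdev, sector, 2048, GFP_KERNEL,
                                       BLKDEV_IFL_WAIT);
        if (ret)
            return ret;
        return blkdev_issue_discard(bdev, sector, 2048, GFP_KERNEL,
                                    BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
    }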
2010-04-21  blkio: Fix blkio crash during rq stat update  (Vivek Goyal, 1 file, -1/+2)
blkio + cfq was crashing even when two sequential readers were put in two separate cgroups (group_isolation=0). The reason is that a cfqq can migrate across groups depending on whether it is sync-noidle or not, so it can happen that at request insertion time the cfqq belonged to one cfqg and at request dispatch time it belonged to the root group. In this case per-cgroup request stats can go wrong, and it also runs into a BUG_ON(). This patch stashes a cfq group pointer in the rq instead of relying on the cfqq->cfqg pointer alone for rq stat accounting.
[ 65.163523] ------------[ cut here ]------------
[ 65.164301] kernel BUG at block/blk-cgroup.c:117!
[ 65.164301] invalid opcode: 0000 [#1] SMP
[ 65.164301] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:60:00.1/host9/rport-9:0-0/target9:0:0/9:0:0:2/block/sde/stat
[ 65.164301] CPU 1
[ 65.164301] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[ 65.164301]
[ 65.164301] Pid: 4505, comm: fio Not tainted 2.6.34-rc4-blk-for-35 #34 0A98h/HP xw8600 Workstation
[ 65.164301] RIP: 0010:[<ffffffff8121924f>] [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
[ 65.164301] RSP: 0018:ffff8800ba5a79e8 EFLAGS: 00010046
[ 65.164301] RAX: 0000000000000096 RBX: ffff8800bb268d60 RCX: 0000000000000000
[ 65.164301] RDX: ffff8800bb268eb8 RSI: 0000000000000000 RDI: ffff8800bb268e00
[ 65.164301] RBP: ffff8800ba5a7a08 R08: 0000000000000064 R09: 0000000000000001
[ 65.164301] R10: 0000000000079640 R11: ffff8800a0bd5bf0 R12: ffff8800bab4af01
[ 65.164301] R13: ffff8800bab4af00 R14: ffff8800bb1d8928 R15: 0000000000000000
[ 65.164301] FS: 00007f18f75056f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
[ 65.164301] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 65.164301] CR2: 000000000040e7f0 CR3: 00000000ba52b000 CR4: 00000000000006e0
[ 65.164301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 65.164301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 65.164301] Process fio (pid: 4505, threadinfo ffff8800ba5a6000, task ffff8800ba45ae80)
[ 65.164301] Stack:
[ 65.164301] ffff8800ba5a7a08 ffff8800ba722540 ffff8800bab4af68 ffff8800bab4af68
[ 65.164301] <0> ffff8800ba5a7a38 ffffffff8121d814 ffff8800ba722540 ffff8800bab4af68
[ 65.164301] <0> ffff8800ba722540 ffff8800a08f6800 ffff8800ba5a7a68 ffffffff8121d8ca
[ 65.164301] Call Trace:
[ 65.164301] [<ffffffff8121d814>] cfq_remove_request+0xe4/0x116
[ 65.164301] [<ffffffff8121d8ca>] cfq_dispatch_insert+0x84/0xe1
[ 65.164301] [<ffffffff8121e833>] cfq_dispatch_requests+0x767/0x8e8
[ 65.164301] [<ffffffff8120e524>] ? submit_bio+0xc3/0xcc
[ 65.164301] [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
[ 65.164301] [<ffffffff8120ea8d>] blk_peek_request+0x191/0x1a7
[ 65.164301] [<ffffffffa000109c>] ? dm_get_live_table+0x44/0x4f [dm_mod]
[ 65.164301] [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
[ 65.164301] [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
[ 65.164301] [<ffffffff8120f600>] __generic_unplug_device+0x32/0x37
[ 65.164301] [<ffffffff8120f8a0>] generic_unplug_device+0x2e/0x3c
[ 65.164301] [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
[ 65.164301] [<ffffffff8120b063>] blk_unplug+0x29/0x2d
[ 65.164301] [<ffffffff8120b079>] blk_backing_dev_unplug+0x12/0x14
[ 65.164301] [<ffffffff81108a82>] block_sync_page+0x35/0x39
[ 65.164301] [<ffffffff810ad64e>] sync_page+0x41/0x4a
[ 65.164301] [<ffffffff810ad665>] sync_page_killable+0xe/0x35
[ 65.164301] [<ffffffff81589027>] __wait_on_bit_lock+0x46/0x8f
[ 65.164301] [<ffffffff810ad52d>] __lock_page_killable+0x66/0x6d
[ 65.164301] [<ffffffff81055fd4>] ? wake_bit_function+0x0/0x33
[ 65.164301] [<ffffffff810ad560>] lock_page_killable+0x2c/0x2e
[ 65.164301] [<ffffffff810aebfd>] generic_file_aio_read+0x361/0x4f0
[ 65.164301] [<ffffffff810e906c>] do_sync_read+0xcb/0x108
[ 65.164301] [<ffffffff811e32a3>] ? security_file_permission+0x16/0x18
[ 65.164301] [<ffffffff810e96d3>] vfs_read+0xab/0x108
[ 65.164301] [<ffffffff810e97f0>] sys_read+0x4a/0x6e
[ 65.164301] [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
[ 65.164301] Code: 00 74 1c 48 8b 8b 60 01 00 00 48 85 c9 75 04 0f 0b eb fe 48 ff c9 48 89 8b 60 01 00 00 eb 1a 48 8b 8b 58 01 00 00 48 85 c9 75 04 <0f> 0b eb fe 48 ff c9 48 89 8b 58 01 00 00 45 84 e4 74 16 48 8b
[ 65.164301] RIP [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
[ 65.164301] RSP <ffff8800ba5a79e8>
[ 65.164301] ---[ end trace 1b2b828753032e68 ]---
Signed-off-by: Vivek Goyal <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-04-09  blkio: Changes to IO controller additional stats patches  (Divyesh Shah, 1 file, -0/+18)
These include some minor fixes and address all review comments. Changelog (mostly based on Vivek Goyal's comments):
o renamed blkiocg_reset_write to blkiocg_reset_stats
o more clarification in the documentation on io_service_time and io_wait_time
o Initialize blkg->stats_lock
o rename io_add_stat to blkio_add_stat and declare it static
o use bool for direction and sync
o derive direction and sync info from existing rq methods
o use 12 for major:minor string length
o define io_service_time better to cover the NCQ case
o add a separate reset_stats interface
o make the indexed stats a 2d array to simplify macro and function pointer code
o blkio.time now exports in jiffies as before
o Added stats description in patch description and Documentation/cgroup/blkio-controller.txt
o Prefix all stats functions with blkio and make them static as applicable
o replace IO_TYPE_MAX with IO_TYPE_TOTAL
o Moved #define constant to top of blk-cgroup.c
o Pass dev_t around instead of char *
o Add note to documentation file about resetting stats
o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef statements
o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has rq_direction() and rq_sync() functions which are used by CFQ; when using the io-controller at a higher level, bio_* functions can be added.
Signed-off-by: Divyesh Shah <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-04-02  blkio: Increment the blkio cgroup stats for real now  (Divyesh Shah, 1 file, -1/+19)
We also add start_time_ns and io_start_time_ns fields to struct request here, to record the time when a request is created and when it is dispatched to the device. We use ns units here, as ms and jiffies are not very useful for non-rotational media. Signed-off-by: Divyesh Shah <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-03-19  block: remove 16 bytes of padding from struct request on 64bits  (Richard Kennedy, 1 file, -5/+6)
Remove alignment padding to shrink struct request from 336 to 320 bytes, so it needs one fewer cacheline; this also removes 48 bytes from struct request_queue. Signed-off-by: Richard Kennedy <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-03-15  block: Finalize conversion of block limits functions  (Martin K. Petersen, 1 file, -24/+0)
Remove compatibility wrappers and update remaining drivers. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-02-26  block: Consolidate phys_segment and hw_segment limits  (Martin K. Petersen, 1 file, -11/+16)
Except for SCSI, no device drivers distinguish between physical and hardware segment limits. Consolidate the two into a single segment limit. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
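After the consolidation a driver sets one combined limit (sketch; blk_queue_max_segments is the consolidated setter as I recall it, the value and driver context are invented):

    #include <linux/blkdev.h>

    static void mydrv_set_segment_limit(struct request_queue *q)
    {
        /* one scatter-gather segment limit instead of separate
         * phys_segments and hw_segments settings */
        blk_queue_max_segments(q, 128);
    }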
2010-02-26  block: Rename blk_queue_max_sectors to blk_queue_max_hw_sectors  (Martin K. Petersen, 1 file, -1/+8)
The block layer calling convention is blk_queue_<limit name>. blk_queue_max_sectors predates this practice, leading to some confusion. Rename the function to reflect that its intended use is to set max_hw_sectors. Also introduce a temporary wrapper for backwards compatibility, which can be removed after the merge window is closed. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
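A sketch of the renamed call in a driver's queue setup (the value and function context are invented):

    #include <linux/blkdev.h>

    static void mydrv_set_sector_limit(struct request_queue *q)
    {
        /* hardware can take up to 512 KiB per request (1024 sectors
         * of 512 bytes); this was blk_queue_max_sectors() before */
        blk_queue_max_hw_sectors(q, 1024);
    }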
2010-02-26  block: Add BLK_ prefix to definitions  (Martin K. Petersen, 1 file, -3/+7)
Add a BLK_ prefix to block layer constants. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-02-26  block: Remove unused accessor function  (Martin K. Petersen, 1 file, -1/+0)
blk_queue_max_hw_sectors is no longer called by any subsystem and can be removed. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-02-25  Merge branch 'master' into for-2.6.34  (Jens Axboe, 1 file, -4/+2)
Conflicts: include/linux/blkdev.h Signed-off-by: Jens Axboe <[email protected]>
2010-02-23  Revert "block: improve queue_should_plug() by looking at IO depths"  (Jens Axboe, 1 file, -3/+1)
This reverts commit fb1e75389bd06fd5987e9cda1b4e0305c782f854. "Benjamin S." <[email protected]> reports that the patch in question causes a big drop in sequential throughput for him, from 200MB/sec down to only 70MB/sec. This needs to be investigated more fully; for now, let's just revert the offending commit. Conflicts: include/linux/blkdev.h Signed-off-by: Jens Axboe <[email protected]>
2010-01-29  block: Added in stricter no merge semantics for block I/O  (Alan D. Brunelle, 1 file, -0/+3)
Updated the 'nomerges' tunable to accept a value of '2', indicating that _no_ merges at all are to be attempted (not even the simple one-hit cache). The following table illustrates the additional benefit: 5-minute runs of a random I/O load were applied to a dozen devices on a 16-way x86_64 system.
nomerges   Throughput     %System       Improvement (tput / %sys)
--------   ------------   -----------   -------------------------
0          12.45 MB/sec   0.669365609
1          12.50 MB/sec   0.641519199   0.40% / 2.71%
2          12.52 MB/sec   0.639849750   0.56% / 2.96%
Signed-off-by: Alan D. Brunelle <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-01-11  block: Stop using byte offsets  (Martin K. Petersen, 1 file, -12/+5)
All callers of the stacking functions use 512-byte sector units rather than byte offsets. Simplify the code so the stacking functions take sectors when specifying data offsets. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2010-01-11  block: bdev_stack_limits wrapper  (Martin K. Petersen, 1 file, -0/+2)
DM does not want to know about partition offsets. Add a partition-aware wrapper that DM can use when stacking block devices. Signed-off-by: Martin K. Petersen <[email protected]> Acked-by: Mike Snitzer <[email protected]> Reviewed-by: Alasdair G Kergon <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
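A sketch of the DM-style call; bdev_stack_limits is the wrapper this adds, while the surrounding function, warning text, and sector-based start argument reflect my reading of the era's API:

    #include <linux/blkdev.h>
    #include <linux/kernel.h>

    /* fold one component device's limits into the stacked device's */
    static void mydrv_stack_one(struct queue_limits *limits,
                                struct block_device *bdev, sector_t start)
    {
        /* the wrapper accounts for the partition offset itself, so the
         * caller doesn't need to know where the partition begins */
        if (bdev_stack_limits(limits, bdev, start) < 0)
            printk(KERN_WARNING "mydrv: device is misaligned\n");
    }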
2010-01-11  block: Fix discard alignment calculation and printing  (Martin K. Petersen, 1 file, -2/+5)
Discard alignment reporting for partitions was incorrect. Update to match the algorithm used elsewhere. The alignment can be negative (misaligned). Fix format string accordingly. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2009-12-30  block: blk_rq_err_sectors cleanup  (Gui Jianfeng, 1 file, -6/+0)
blk_rq_err_sectors() seems useless; get rid of it. Signed-off-by: Gui Jianfeng <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2009-12-29  block: Fix incorrect alignment offset reporting and update documentation  (Martin K. Petersen, 1 file, -2/+9)
queue_sector_alignment_offset returned the wrong value, which caused partitions to report an incorrect alignment_offset. Since the offset alignment calculation is needed in several places, it has been split into a separate helper function. The topology stacking function has been updated accordingly. Furthermore, comments have been added to clarify how the stacking function works. Signed-off-by: Martin K. Petersen <[email protected]> Tested-by: Mike Snitzer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2009-12-03  block: Allow devices to indicate whether discarded blocks are zeroed  (Martin K. Petersen, 1 file, -0/+14)
The discard ioctl is used by mkfs utilities to clear a block device prior to putting metadata down. However, not all devices return zeroed blocks after a discard. Some drives return stale data, potentially containing old superblocks. It is therefore important to know whether discarded blocks are properly zeroed. Both ATA and SCSI drives have configuration bits that indicate whether zeroes are returned after a discard operation. Implement a block level interface that allows this information to be bubbled up the stack and queried via a new block device ioctl. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
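A userspace sketch querying the new block device ioctl; BLKDISCARDZEROES is my recollection of the ioctl name this change added, so treat it as an assumption:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    int main(int argc, char **argv)
    {
        unsigned int zeroes = 0;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <blockdev>\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0 || ioctl(fd, BLKDISCARDZEROES, &zeroes) < 0) {
            perror(argv[1]);
            return 1;
        }
        printf("discard zeroes data: %u\n", zeroes);  /* 1 = reads as zeroes */
        close(fd);
        return 0;
    }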
2009-11-26  block: add helpers to run flush_dcache_page() against a bio and a request's pages  (Ilya Loginov, 1 file, -0/+11)
The mtdblock driver doesn't call flush_dcache_page for pages in a request. This causes problems on architectures where the icache doesn't fill from the dcache, or with dcache aliases. The patch fixes this. The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid pointless empty cache-thrashing loops on architectures for which flush_dcache_page() is a no-op: the new helpers flush the pages on architectures where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is 1, and do nothing otherwise. See the "fix mtd_blkdevs problem with caches on some architectures" discussion on LKML for more information. Signed-off-by: Ilya Loginov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Peter Horton <[email protected]> Cc: "Ed L. Cashin" <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
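A sketch of the request-level helper in an mtd-style driver; rq_flush_dcache_pages is the helper this adds, the driver context is invented:

    #include <linux/blkdev.h>

    /* hypothetical mtd-style driver: after filling the request's pages
     * by CPU writes (e.g. copying from flash), make the data coherent
     * for icache/dcache-aliasing architectures */
    static void mtddrv_finish_read(struct request *rq)
    {
        /* compiles to nothing where flush_dcache_page() is a no-op */
        rq_flush_dcache_pages(rq);
    }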
2009-11-10  block: Expose discard granularity  (Martin K. Petersen, 1 file, -0/+18)
While SSDs track block usage on a per-sector basis, RAID arrays often have allocation blocks that are bigger. Allow the discard granularity and alignment to be set and teach the topology stacking logic how to handle them. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
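A sketch of a driver for such an array advertising its allocation block size; the two limit fields match the commit's description, the byte units, values, and driver context are my assumptions:

    #include <linux/blkdev.h>

    /* hypothetical RAID driver: space is reclaimed in 64 KiB blocks,
     * with data starting at a zero offset */
    static void raiddrv_set_discard_topology(struct request_queue *q)
    {
        q->limits.discard_granularity = 64 * 1024;
        q->limits.discard_alignment   = 0;
    }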
2009-10-29  block: move bdi/address_space unplug functions to backing-dev.h  (Jens Axboe, 1 file, -13/+0)
There's nothing block related about them, the backing device is used by things like NFS etc as well. This gets rid of the need to protect such calls by CONFIG_BLOCK. Signed-off-by: Jens Axboe <[email protected]>
2009-10-05  block: get rid of kblock_schedule_delayed_work()  (Jens Axboe, 1 file, -4/+0)
It was briefly introduced to allow CFQ to do delayed scheduling, but we ended up removing that feature again. So let's kill the function and its export, and just switch CFQ back to the normal work schedule, since it now passes in a '0' delay from all call sites. Signed-off-by: Jens Axboe <[email protected]>
2009-10-03  block: Topology ioctls  (Martin K. Petersen, 1 file, -5/+30)
Not all users of the topology information want to use libblkid. Provide the topology information through bdev ioctls. Also clarify sector size comments for existing BLK ioctls. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
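A userspace sketch in the same style as the BLKDISCARDZEROES example above; BLKIOMIN, BLKIOOPT and BLKALIGNOFF are my recollection of the topology ioctl names this introduced, so treat them as assumptions:

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    static void print_topology(int fd)
    {
        unsigned int io_min = 0, io_opt = 0;
        int align = 0;   /* alignment offset can be negative */

        ioctl(fd, BLKIOMIN, &io_min);
        ioctl(fd, BLKIOOPT, &io_opt);
        ioctl(fd, BLKALIGNOFF, &align);
        printf("io_min=%u io_opt=%u alignment_offset=%d\n",
               io_min, io_opt, align);
    }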
2009-10-03  cfq-iosched: implement slower async initiate and queue ramp up  (Jens Axboe, 1 file, -0/+4)
This slowly ramps up the async queue depth based on the time passed since the sync IO, and doesn't allow async at all until a sync slice period has passed. Signed-off-by: Jens Axboe <[email protected]>
2009-10-01  block: allow large discard requests  (Christoph Hellwig, 1 file, -0/+3)
Currently we set the bio size to the byte equivalent of the blocks to be trimmed when submitting the initial DISCARD ioctl. That means it is subject to the max_hw_sectors limitation of the HBA which is much lower than the size of a DISCARD request we can support. Add a separate max_discard_sectors tunable to limit the size for discard requests. We limit the max discard request size in bytes to 32bit as that is the limit for bio->bi_size. This could be much larger if we had a way to pass that information through the block layer. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
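A sketch of a driver opting into large discards via the new tunable; the function name is real as I recall it, the driver context and the choice of ceiling are assumptions:

    #include <linux/blkdev.h>

    /* hypothetical driver setup: allow discards up to the 32-bit
     * bio->bi_size ceiling (UINT_MAX bytes = UINT_MAX >> 9 sectors) */
    static void mydrv_set_discard_limit(struct request_queue *q)
    {
        blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
    }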
2009-10-01  block: use normal I/O path for discard requests  (Christoph Hellwig, 1 file, -4/+2)
prepare_discard_fn() was being called in a place where memory allocation was effectively impossible. This makes it inappropriate for all but the most trivial translations of Linux's DISCARD operation to the block command set. Additionally, adding a payload there makes the ownership of the bio's backing memory unclear, as it is then allocated by the device driver and not, as usual, by the submitter. It is replaced with QUEUE_FLAG_DISCARD, which indicates whether the queue supports discard operations or not. blkdev_issue_discard now allocates a one-page, sector-length payload, which is the right thing for the common ATA and SCSI implementations. The mtd implementation of prepare_discard_fn() is replaced with simply checking for the request being a discard. Largely based on a previous patch from Matthew Wilcox <[email protected]> which did the prepare_discard_fn part but not the different payload allocation yet. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
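A one-line sketch of advertising the new flag during driver init; the flag name is from the commit, the surrounding function is invented:

    #include <linux/blkdev.h>

    /* hypothetical driver init: tell the block layer this queue
     * supports discard operations */
    static void mydrv_enable_discard(struct request_queue *q)
    {
        queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
    }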
2009-09-14  block: use blkdev_issue_discard in blk_ioctl_discard  (Christoph Hellwig, 1 file, -3/+6)
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard; the only differences are that blkdev_issue_discard needs to send a barrier discard request where blk_ioctl_discard sends a non-barrier one, and that blk_ioctl_discard needs to wait on the request. To facilitate this, add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting functionality for other callers. Based on an earlier patch from Matthew Wilcox <[email protected]>. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2009-09-14  block: Optimal I/O limit wrapper  (Martin K. Petersen, 1 file, -0/+1)
Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper around it. DM needs this to avoid poking at the queue_limits directly. Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
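A sketch of setting I/O hints directly on a bare queue_limits, DM-style; blk_limits_io_opt is from this entry and its sibling blk_limits_io_min appears in the 2009-08-01 entry below, while the values and function context are invented:

    #include <linux/blkdev.h>

    /* no request_queue needed: poke the limits structure directly */
    static void mydrv_set_hints(struct queue_limits *limits)
    {
        blk_limits_io_min(limits, 4096);        /* e.g. RAID chunk size */
        blk_limits_io_opt(limits, 256 * 1024);  /* e.g. full stripe width */
    }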
2009-09-11  block: enable rq CPU completion affinity by default  (Jens Axboe, 1 file, -1/+2)
Test results here look good, and on big OLTP runs it has also been shown to significantly increase cycles attributed to the database and to deliver a performance boost. Signed-off-by: Jens Axboe <[email protected]>
2009-09-11  block: improve queue_should_plug() by looking at IO depths  (Jens Axboe, 1 file, -0/+2)
Instead of just checking whether this device uses block layer tagging, we can improve the detection by looking at the maximum queue depth it has reached. If that crosses 4, then deem it a queuing device. This is important on high IOPS devices, since plugging hurts the performance there (it can be as much as 10-15% of the sys time). Signed-off-by: Jens Axboe <[email protected]>
2009-09-11  bio: first step in sanitizing the bio->bi_rw flag testing  (Jens Axboe, 1 file, -1/+1)
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <[email protected]>
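A before/after flavour of the change in sketch form; bio_rw_flagged and BIO_RW_SYNCIO are the era's names as I recall them, the wrapper function is invented:

    #include <linux/bio.h>

    static bool mydrv_bio_is_sync(struct bio *bio)
    {
        /* the flag under test is now spelled out at the call site,
         * instead of being hidden behind a per-flag helper */
        return bio_rw_flagged(bio, BIO_RW_SYNCIO);
    }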
2009-09-11  block: implement mixed merge of different failfast requests  (Tejun Heo, 1 file, -5/+18)
Failfast has different characteristics from other attributes. When issuing, executing and successfully completing requests, failfast doesn't make any difference; it only affects how a request is handled on failure. Allowing requests with different failfast settings to be merged causes normal IOs to fail prematurely, while not allowing it has performance penalties, as failfast is used for readaheads which are likely to be located near in-flight or to-be-issued normal IOs. This patch introduces the concept of 'mixed merge'. A request is a mixed merge if it is a merge of segments which require different handling on failure. Currently the only mixable attributes are the failfast ones (or lack thereof). When a bio with different failfast settings is added to an existing request, or requests of different failfast settings are merged, the merged request is marked mixed. Each bio carries failfast settings, and the request always tracks the failfast state of the first bio. When the request fails, blk_rq_err_bytes() can be used to determine how many bytes can be safely failed without crossing into an area which requires further retries. This allows request merging regardless of failfast settings while keeping the failure handling correct. This patch only implements mixed merge but doesn't enable it; the next one will update SCSI to make use of it. Signed-off-by: Tejun Heo <[email protected]> Cc: Niel Lambrechts <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
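A sketch of how an error path might use the new helper; blk_rq_err_bytes and blk_end_request are real interfaces, the completion logic around them is illustrative:

    #include <linux/blkdev.h>

    /* fail only the leading bytes whose failfast settings match the
     * failing bio; anything beyond that boundary must be retried */
    static void mydrv_handle_error(struct request *rq)
    {
        if (blk_end_request(rq, -EIO, blk_rq_err_bytes(rq))) {
            /* bytes beyond the failfast boundary remain: retry them,
             * e.g. by requeueing the request with the queue lock held */
        }
    }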
2009-09-11  block: use the same failfast bits for bio and request  (Tejun Heo, 1 file, -0/+4)
bio and request use the same set of failfast bits. This patch makes the following changes to simplify things.
* enumify the BIO_RW* bits and reorder them such that the BIO_RW_FAILFAST_* bits coincide with the __REQ_FAILFAST_* bits.
* The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV, but the matching is useless anyway. init_request_from_bio() is responsible for setting the FAILFAST bits on FS requests, and non-FS requests never use BIO_RW_AHEAD. Drop the code and comment from blk_rq_bio_prep().
* Define REQ_FAILFAST_MASK, which is the OR of all FAILFAST bits, and simplify the FAILFAST flags handling in init_request_from_bio().
Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
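My reading of what the simplified init_request_from_bio() logic amounts to, as a sketch; the masked OR is inferred from the commit text (aligned bit positions plus REQ_FAILFAST_MASK), not copied from the tree:

    #include <linux/blkdev.h>

    /* with the bit positions aligned, one masked OR copies every
     * failfast flag from the bio to the request */
    static void mydrv_copy_failfast(struct request *req, struct bio *bio)
    {
        req->cmd_flags |= bio->bi_rw & REQ_FAILFAST_MASK;
    }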
2009-08-01  block: Add a wrapper for setting minimum request size without a queue  (Martin K. Petersen, 1 file, -0/+1)
Introduce blk_limits_io_min() and make blk_queue_io_min() call it. Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2009-07-11  Fix compile error due to congestion_wait() changes  (Trond Myklebust, 1 file, -5/+0)
Move the definition of BLK_RW_ASYNC/BLK_RW_SYNC into linux/backing-dev.h so that it is available to all callers of set/clear_bdi_congested(). This replaces commit 097041e576ee3a50d92dd643ee8ca65bf6a62e21 ("fuse: Fix build error"), which will be reverted. Signed-off-by: Trond Myklebust <[email protected]> Acked-by: Larry Finger <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Miklos Szeredi <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-07-10  block: fix sg SG_DXFER_TO_FROM_DEV regression  (FUJITA Tomonori, 1 file, -0/+1)
I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use the block layer mapping API (2.6.28). Douglas Gilbert explained SG_DXFER_TO_FROM_DEV: http://www.spinics.net/lists/linux-scsi/msg37135.html
The semantics of SG_DXFER_TO_FROM_DEV were:
- copy user space buffer to kernel (LLD) buffer
- do SCSI command which is assumed to be of the DATA_IN (data from device) variety. This would overwrite some or all of the kernel buffer
- copy kernel (LLD) buffer back to the user space
The idea was to detect short reads by filling the original user space buffer with some marker bytes ("0xec" it would seem in this report). The "resid" value is a better way of detecting short reads but that was only added this century and requires co-operation from the LLD. This patch changes the block layer mapping API to support these semantics. It simply adds another field to struct rq_map_data and enables __bio_copy_iov() to copy data from user space even with READ requests. It would be better to add a flags field and kill the null_mapped and new from_user fields in struct rq_map_data, but that approach makes it difficult to send this patch to stable trees, because the st and osst drivers use struct rq_map_data (they were converted to use the block layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer mapping API. zhou sf reported this regression and tested this patch: http://www.spinics.net/lists/linux-scsi/msg37128.html http://www.spinics.net/lists/linux-scsi/msg37168.html Reported-by: zhou sf <[email protected]> Tested-by: zhou sf <[email protected]> Cc: [email protected] Signed-off-by: FUJITA Tomonori <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2009-07-10  Fix congestion_wait() sync/async vs read/write confusion  (Jens Axboe, 1 file, -4/+4)
Commit 1faa16d22877f4839bd433547d770c676d1d964c accidentally broke the bdi congestion wait queue logic, causing us to wait on congestion for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead. Signed-off-by: Jens Axboe <[email protected]>
2009-07-01  block: get rid of queue-private command filter  (Jens Axboe, 1 file, -14/+1)
The initial patches to support this through sysfs export were broken and have been #if 0'ed out in every release. So let's just kill the code and reclaim some space in struct request_queue; if anyone would later like to fix up the sysfs bits, the git history can easily restore the removed bits. Signed-off-by: Jens Axboe <[email protected]>