path: root/drivers/md
Age | Commit message | Author | Files | Lines
2013-11-10 | bcache: PRECEDING_KEY() | Kent Overstreet | 2 | -7/+20
btree_insert_key() was open coding this; this is just refactoring. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: bch_(btree|extent)_ptr_invalid() | Kent Overstreet | 4 | -21/+49
Trying to treat btree pointers and leaf node pointers the same way was a mistake - going to start being more explicit about the type of key/pointer we're dealing with. This is the first part of that refactoring; this patch shouldn't change any actual behaviour. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Don't bother with bucket refcount for btree node allocations | Kent Overstreet | 4 | -27/+9
The bucket refcount (dropped with bkey_put()) is only needed to prevent the newly allocated bucket from being garbage collected until we've added a pointer to it somewhere. But for btree node allocations, the fact that we have btree nodes locked is enough to guard against races with garbage collection. Eventually the per bucket refcount is going to be replaced with something specific to bch_alloc_sectors(). Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Debug code improvements | Kent Overstreet | 11 | -186/+162
Couple of changes:
* Consolidate bch_check_keys() and bch_check_key_order(), and move the checks that only check_key_order() could do to bch_btree_iter_next().
* Get rid of CONFIG_BCACHE_EDEBUG - now, all that code is compiled in when CONFIG_BCACHE_DEBUG is enabled, and there's now a sysfs file to flip on the EDEBUG checks at runtime.
* Dropped an old, not terribly useful check in rw_unlock(), and refactored/improved some of the other debug code.
Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Fix bch_ptr_bad() | Kent Overstreet | 1 | -34/+33
Previously, bch_ptr_bad() could return false when there was a pointer to a nonexistent device... it only filtered out keys with PTR_CHECK_DEV pointers. This behaviour was intended for multiple cache device support; for that, just because the device for one of the pointers has gone away doesn't mean we want to filter out the rest of the pointers. But we don't yet explicitly filter/check individual pointers, so without that this behaviour was wrong - a corrupt bkey with a bad device pointer could cause us to deref a bad pointer. Doh. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Pull on disk data structures out into a separate header | Kent Overstreet | 9 | -340/+14
Now, the on disk data structures are in a header that can be exported to userspace - and having them all centralized is nice too. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Move sector allocator to alloc.c | Kent Overstreet | 4 | -186/+189
Just reorganizing things a bit. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Break up struct search | Kent Overstreet | 8 | -385/+370
With all the recent refactoring around struct btree_op, struct search has gotten rather large. But we can now easily break it up in a different way - we break out struct btree_insert_op, which is for inserting data into the cache, and that's now what the copying gc code uses - struct search is now specific to request.c. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Convert bch_btree_insert() to bch_btree_map_leaf_nodes() | Kent Overstreet | 5 | -52/+43
Last of the btree_map() conversions. The main visible effect is that bch_btree_insert() no longer takes a struct btree_op as an argument - there's no fancy state machine stuff going on, it's just a normal function. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Don't use op->insert_collision | Kent Overstreet | 5 | -7/+16
When we convert bch_btree_insert() to bch_btree_map_leaf_nodes(), we won't be passing struct btree_op to bch_btree_insert() anymore - so we need a different way of returning whether there was a collision (really, a replace collision). Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Kill op->replace | Kent Overstreet | 7 | -73/+71
This is prep work for converting bch_btree_insert to bch_btree_map_leaf_nodes() - we have to convert all its arguments to actual arguments. Bunch of churn, but should be straightforward. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Drop some closure stuff | Kent Overstreet | 3 | -250/+40
With the recent bcache refactoring, some of the closure code isn't needed anymore. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Kill op->cl | Kent Overstreet | 8 | -81/+63
This isn't used for waiting asynchronously anymore - so this is a fairly trivial refactoring. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Prune struct btree_op | Kent Overstreet | 11 | -171/+179
Eventual goal is for struct btree_op to contain only what is necessary for traversing the btree. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Clean up cache_lookup_fn | Kent Overstreet | 1 | -62/+46
There was some looping in submit_partial_cache_hit() and submit_partial_cache_miss() that isn't needed anymore - originally, we wouldn't necessarily process the full hit or miss all at once because when splitting the bio, we took into account the restrictions of the device we were sending it to. But, device bio size restrictions are now handled elsewhere, with a wrapper around generic_make_request() - so that looping has been unnecessary for a while now and we can now do quite a bit of cleanup. And if we trim the key we're reading from to match the subset we're actually reading, we don't have to explicitly calculate bi_sector anymore. Neat. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Convert bch_btree_read_async() to bch_btree_map_keys() | Kent Overstreet | 5 | -168/+125
This is a fairly straightforward conversion, mostly reshuffling - op->lookup_done goes away, replaced by MAP_DONE/MAP_CONTINUE. And the code for handling cache hits and misses wasn't really btree code, so it gets moved to request.c. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Move some stuff to btree.c | Kent Overstreet | 3 | -97/+96
With the new btree_map() functions, we don't need to export the stuff needed for traversing the btree anymore. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Add btree_map() functions | Kent Overstreet | 5 | -97/+186
Lots of stuff has been open coding its own btree traversal - which is generally pretty simple code, but there are a few subtleties. This adds two new functions, bch_btree_map_nodes() and bch_btree_map_keys(), which do the traversal for you. Everything that's open coding btree traversal now (with the exception of garbage collection) is slowly going to be converted to these two functions; being able to write other code at a higher level of abstraction is a big improvement w.r.t. overall code quality. Signed-off-by: Kent Overstreet <[email protected]>
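As a rough illustration of the map-style idea, here is a small userspace sketch: the traversal lives in one helper and callers supply a callback that says whether to keep going. All names, the simplified key type, and the signature are illustrative assumptions, not the actual bch_btree_map_keys() interface; only the MAP_DONE/MAP_CONTINUE idea is taken from the commits above.

#include <stdio.h>
#include <stddef.h>

enum map_ret { MAP_CONTINUE, MAP_DONE };

struct demo_key { unsigned inode; unsigned long long offset; };

typedef enum map_ret (*demo_map_fn)(void *ctx, const struct demo_key *k);

/* Hand every key to the callback until it says it is done. */
static enum map_ret demo_map_keys(const struct demo_key *keys, size_t n,
                                  void *ctx, demo_map_fn fn)
{
        for (size_t i = 0; i < n; i++)
                if (fn(ctx, &keys[i]) == MAP_DONE)
                        return MAP_DONE;
        return MAP_CONTINUE;
}

/* Example callback: stop at the first key at or past a target offset. */
static enum map_ret find_after(void *ctx, const struct demo_key *k)
{
        unsigned long long target = *(unsigned long long *)ctx;

        if (k->offset >= target) {
                printf("found key at inode %u offset %llu\n",
                       k->inode, k->offset);
                return MAP_DONE;
        }
        return MAP_CONTINUE;
}

int main(void)
{
        struct demo_key keys[] = { {1, 8}, {1, 16}, {1, 64}, {2, 4} };
        unsigned long long target = 20;

        demo_map_keys(keys, 4, &target, find_after);
        return 0;
}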
2013-11-10 | bcache: Convert writeback to a kthread | Kent Overstreet | 4 | -206/+203
This simplifies the writeback flow control quite a bit - previously, it was conceptually two coroutines, refill_dirty() and read_dirty(). This makes the code quite a bit more straightforward. Signed-off-by: Kent Overstreet <[email protected]>
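For reference, the standard kthread pattern being adopted looks roughly like the sketch below; this is a generic illustration, not the actual bcache writeback thread, and demo_has_dirty_data() is a made-up placeholder for the real dirty-data check.

#include <linux/kthread.h>
#include <linux/sched.h>

static bool demo_has_dirty_data(void)
{
        return false;   /* placeholder for the real dirty-data check */
}

static int demo_writeback_thread(void *arg)
{
        while (1) {
                set_current_state(TASK_INTERRUPTIBLE);
                if (kthread_should_stop()) {
                        __set_current_state(TASK_RUNNING);
                        break;
                }
                if (!demo_has_dirty_data()) {
                        /* Nothing to do: sleep until woken or stopped. */
                        schedule();
                        continue;
                }
                __set_current_state(TASK_RUNNING);
                /* ... read dirty extents and write them to the backing device ... */
        }
        return 0;
}

/* Started with kthread_run(demo_writeback_thread, NULL, "demo_writeback")
 * and torn down with kthread_stop() on the returned task_struct. */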
2013-11-10 | bcache: Convert gc to a kthread | Kent Overstreet | 8 | -60/+74
We needed a dedicated rescuer workqueue for gc anyways... and gc was conceptually a dedicated thread, just one that wasn't running all the time. Switch it to a dedicated thread to make the code a bit more straightforward. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Convert bucket_wait to wait_queue_head_t | Kent Overstreet | 6 | -67/+70
At one point we did do fancy asynchronous waiting stuff with bucket_wait, but that's all gone (and bucket_wait is used a lot less than it used to be). So use the standard primitives. Signed-off-by: Kent Overstreet <[email protected]>
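The standard primitives in question are wait_event()/wake_up() on a wait_queue_head_t. A minimal kernel-style sketch of that pattern follows; the names (demo_bucket_wait, demo_free_buckets) are illustrative, not the bcache code.

#include <linux/wait.h>
#include <linux/atomic.h>

static DECLARE_WAIT_QUEUE_HEAD(demo_bucket_wait);
static atomic_t demo_free_buckets = ATOMIC_INIT(0);

/* Consumer: block until at least one bucket is available.
 * Simplified - a real allocator would retry its allocation under the
 * appropriate lock after waking. */
static void demo_wait_for_bucket(void)
{
        wait_event(demo_bucket_wait, atomic_read(&demo_free_buckets) > 0);
}

/* Producer: release a bucket and wake any sleepers. */
static void demo_release_bucket(void)
{
        atomic_inc(&demo_free_buckets);
        wake_up(&demo_bucket_wait);
}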
2013-11-10 | bcache: Convert try_wait to wait_queue_head_t | Kent Overstreet | 4 | -99/+75
We never waited on c->try_wait asynchronously, so just use the standard primitives. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Move keylist out of btree_op | Kent Overstreet | 6 | -28/+36
Slowly working on pruning struct btree_op - the aim is for it to only contain things that are actually necessary for traversing the btree. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Refactor journalling flow control | Kent Overstreet | 7 | -179/+207
Making things less asynchronous that don't need to be - bch_journal() only has to block when the journal or journal entry is full, which is emphatically not a fast path. So make it a normal function that just returns when it finishes, to make the code and control flow easier to follow. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Refactor read request code a bit | Kent Overstreet | 1 | -36/+35
More refactoring, and renaming. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Refactor request_write() | Kent Overstreet | 2 | -187/+183
Try to improve some of the naming a bit to be more consistent, and also improve the flow of control in request_write() a bit. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Clean up keylist code | Kent Overstreet | 5 | -52/+57
More random refactoring. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Add explicit keylist arg to btree_insert() | Kent Overstreet | 5 | -16/+18
Some refactoring - better to explicitly pass stuff around instead of having it all in the "big bag of state", struct btree_op. Going to prune struct btree_op quite a bit over time. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Convert btree_insert_check_key() to btree_insert_node() | Kent Overstreet | 4 | -72/+79
This was the main point of all this refactoring - now, btree_insert_check_key() won't fail just because the leaf node happened to be full. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Insert multiple keys at a time | Kent Overstreet | 1 | -17/+16
We'll often end up with a list of adjacent keys to insert - because bch_data_insert() may have to fragment the data it writes. Originally, to simplify things and avoid having to deal with corner cases bch_btree_insert() would pass keys from this list one at a time to btree_insert_recurse() - mainly because the list of keys might span leaf nodes, so it was easier this way. With the btree_insert_node() refactoring, it's now a lot easier to just pass down the whole list and have btree_insert_recurse() iterate over leaf nodes until it's done. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Add btree_insert_node() | Kent Overstreet | 3 | -66/+105
The flow of control in the old btree insertion code was rather backwards; we'd recurse down the btree (in btree_insert_recurse()), and then if we needed to split, the keys to be inserted into the parent node would be effectively returned up to btree_insert_recurse(), which would notice there was more work to do and finish the insertion. The main problem with this was that the full logic for btree insertion could only be used by calling btree_insert_recurse; if you'd gotten to a btree leaf some other way and had a key to insert, if it turned out that node needed to be split you were SOL. This inverts the flow of control so btree_insert_node() does _full_ btree insertion, including splitting - and takes a (leaf) btree node to insert into as a parameter. This means we can now _correctly_ handle cache misses - for cache misses, we need to insert a fake "check" key into the btree when we discover we have a cache miss - while we still have the btree locked. Previously, if the btree node was full, inserting a cache miss would just fail. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Explicitly track btree node's parent | Kent Overstreet | 2 | -10/+20
This is prep work for the reworked btree insertion code. The way we set b->parent is ugly and hacky... the problem is, when btree_split() or garbage collection splits or rewrites a btree node, the parent changes for all its (potentially already cached) children. I may change this later and add some code to look through the btree node cache and find all our cached child nodes and change the parent pointer then... Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Remove unnecessary check in should_split() | Kent Overstreet | 1 | -3/+2
Checking i->seq was redundant, because since ages ago we have always initialized the new bset when advancing b->written. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Stripe size isn't necessarily a power of two | Kent Overstreet | 5 | -25/+27
Originally I got this right... except that the divides didn't use do_div(), which broke 32 bit kernels. When I went to fix that, I forgot that the raid stripe size usually isn't a power of two... doh Signed-off-by: Kent Overstreet <[email protected]>
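To make the two points concrete: a plain '/' or '%' on a u64 generates a libgcc helper call that 32-bit kernels don't provide, and shifting/masking only works when the divisor is a power of two. A kernel-style sketch of using do_div() instead (illustrative helpers, not the bcache code):

#include <linux/types.h>
#include <asm/div64.h>

/* Which stripe does this sector fall in, for an arbitrary stripe size?
 * do_div() modifies 'sector' in place to the quotient and returns the
 * remainder; here we only need the quotient. */
static u64 demo_sector_to_stripe(u64 sector, u32 stripe_sectors)
{
        do_div(sector, stripe_sectors);
        return sector;
}

/* Offset within the stripe, i.e. the remainder of the same division. */
static u32 demo_stripe_offset(u64 sector, u32 stripe_sectors)
{
        return do_div(sector, stripe_sectors);
}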
2013-11-10 | bcache: Add on error panic/unregister setting | Kent Overstreet | 4 | -5/+35
Works kind of like the ext4 setting, to panic or remount read only on errors. Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Use blkdev_issue_discard() | Kent Overstreet | 3 | -117/+11
The old asynchronous discard code was really a relic from when all the allocation code was asynchronous - now that allocation runs out of a dedicated thread there's no point in keeping around all that complicated machinery. Signed-off-by: Kent Overstreet <[email protected]>
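A minimal sketch of what issuing a discard through the block layer helper looks like, assuming the five-argument signature used in kernels of this era; the wrapper name and its parameters are illustrative, not the bcache code.

#include <linux/blkdev.h>

static int demo_discard_bucket(struct block_device *bdev,
                               sector_t bucket_start, sector_t bucket_sectors)
{
        /* Blocks until the discard completes (or fails). */
        return blkdev_issue_discard(bdev, bucket_start, bucket_sectors,
                                    GFP_KERNEL, 0);
}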
2013-11-10 | bcache: Fix a lockdep splat | Kent Overstreet | 1 | -1/+1
bch_keybuf_del() takes a spinlock that can't be taken in interrupt context - whoops. Fortunately, this code isn't enabled by default (you have to toggle a sysfs thing). Signed-off-by: Kent Overstreet <[email protected]>
2013-11-10 | bcache: Fix a journalling performance bug | Kent Overstreet | 2 | -22/+28
2013-11-10 | bcache: Fix dirty_data accounting | Kent Overstreet | 1 | -4/+7
Dirty data accounting wasn't quite right - firstly, we were adding the key we're inserting after it could have merged with another dirty key already in the btree, and secondly we could sometimes pass the wrong offset to bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is important when tracking dirty data by stripe. Signed-off-by: Kent Overstreet <[email protected]> Cc: linux-stable <[email protected]> # >= v3.10
2013-11-09 | dm cache: promotion optimisation for writes | Joe Thornber | 1 | -6/+87
If a write block triggers promotion and covers a whole block we can avoid a copy. Introduce dm_{hook,unhook}_bio to simplify saving and restoring bio fields (bi_private is now used by overwrite). Switch writethrough support over to using these helpers too. Signed-off-by: Joe Thornber <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
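A hedged sketch of the hook/unhook idea: stash the bio's completion fields, point them at our own handler, and restore them before handing the bio back. The helper names follow the commit message but the bodies here are illustrative, using the bi_end_io/bi_private fields as struct bio had them in kernels of this era.

#include <linux/bio.h>

struct demo_hook_info {
        bio_end_io_t *bi_end_io;
        void *bi_private;
};

/* Save the bio's completion fields and substitute our own. */
static void demo_hook_bio(struct demo_hook_info *h, struct bio *bio,
                          bio_end_io_t *end_io, void *private)
{
        h->bi_end_io = bio->bi_end_io;
        h->bi_private = bio->bi_private;
        bio->bi_end_io = end_io;
        bio->bi_private = private;
}

/* Restore the original completion fields. */
static void demo_unhook_bio(struct demo_hook_info *h, struct bio *bio)
{
        bio->bi_end_io = h->bi_end_io;
        bio->bi_private = h->bi_private;
}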
2013-11-09 | dm cache: be much more aggressive about promoting writes to discarded blocks | Joe Thornber | 1 | -21/+63
Previously these promotions only got priority if there were unused cache blocks. Now we give them priority if there are any clean blocks in the cache. The fio_soak_test in the device-mapper-test-suite now gives uniform performance across subvolumes (~16 seconds). Signed-off-by: Joe Thornber <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2013-11-09 | dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty() | Joe Thornber | 1 | -19/+128
There are now two multiqueues for in cache blocks: a clean one and a dirty one. writeback_work comes from the dirty one. Demotions come from the clean one. There are two benefits:
- Performance improvement, since demoting a clean block is a noop.
- The cache cleans itself when io load is light.
Signed-off-by: Joe Thornber <[email protected]> Signed-off-by: Heinz Mauelshagen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
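A toy sketch of that split, with made-up types rather than the mq policy's actual structures: writeback work is pulled from the dirty queue, demotion candidates from the clean queue (evicting a clean block needs no copy back to the origin).

#include <stdbool.h>
#include <stddef.h>

struct demo_block {
        struct demo_block *next;
        bool dirty;
};

struct demo_policy {
        struct demo_block *clean_q;   /* demotion candidates */
        struct demo_block *dirty_q;   /* writeback candidates */
};

static struct demo_block *demo_pop(struct demo_block **q)
{
        struct demo_block *b = *q;

        if (b)
                *q = b->next;
        return b;
}

/* Background writeback: always take from the dirty queue. */
static struct demo_block *demo_writeback_work(struct demo_policy *p)
{
        return demo_pop(&p->dirty_q);
}

/* Need room for a promotion: demote a clean block, which is copy-free. */
static struct demo_block *demo_demote(struct demo_policy *p)
{
        return demo_pop(&p->clean_q);
}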
2013-11-09 | dm cache: optimize commit_if_needed | Heinz Mauelshagen | 1 | -5/+7
Check the commit_requested flag _before_ making a superfluous call to dm_cache_changed_this_transaction(). Also, be sure to set last_commit_jiffies _after_ dm_cache_commit() completes. Signed-off-by: Heinz Mauelshagen <[email protected]> Signed-off-by: Joe Thornber <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
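A hedged sketch of the ordering described above; the struct, field names and stand-in calls are illustrative, not dm-cache's real ones. The cheap local flag is tested first so the metadata query is skipped when a commit was already requested, and the jiffies stamp is taken only after the commit has finished.

#include <linux/jiffies.h>
#include <linux/types.h>

struct demo_cache {
        bool commit_requested;
        unsigned long last_commit_jiffies;
};

/* Stand-ins for dm_cache_changed_this_transaction()/dm_cache_commit(). */
static bool demo_changed_this_transaction(struct demo_cache *c)
{
        return true;    /* pretend the metadata always has changes */
}

static void demo_commit(struct demo_cache *c)
{
        /* commit the metadata transaction here */
}

static void demo_commit_if_needed(struct demo_cache *cache)
{
        /* Cheap local flag first, so the metadata query is skipped
         * whenever a commit was already requested. */
        if (cache->commit_requested || demo_changed_this_transaction(cache)) {
                demo_commit(cache);
                cache->commit_requested = false;
                /* Stamp the time only once the commit has completed. */
                cache->last_commit_jiffies = jiffies;
        }
}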
2013-11-09 | dm space map disk: optimise sm_disk_dec_block | Joe Thornber | 1 | -17/+1
Don't waste time spotting blocks that have been allocated and then freed in the same transaction. The extra lookup is expensive, and I don't think it really gives us much. Signed-off-by: Joe Thornber <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2013-11-09 | dm: fix Kconfig menu indentation | Mikulas Patocka | 1 | -11/+11
The option DM_LOG_USERSPACE is sub-option of DM_MIRROR, so place it right after DM_MIRROR. Doing so fixes various other Device mapper targets/features to be properly nested under "Device mapper support". Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2013-11-09 | dm: allow remove to be deferred | Mikulas Patocka | 3 | -10/+86
This patch allows the removal of an open device to be deferred until it is closed. (Previously such a removal attempt would fail.) The deferred remove functionality is enabled by setting the flag DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or DM_REMOVE_ALL ioctl. On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if the device was removed immediately or flagged to be removed on close - if the flag is clear, the device was removed. On return from DM_DEV_STATUS and other ioctls, the flag DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on closure. A device that is scheduled to be deleted can be revived using the message "@cancel_deferred_remove". This message clears the DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close. Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Alasdair G Kergon <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
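A hedged userspace sketch of the flag handling the message describes: request removal with DM_DEFERRED_REMOVE set, then inspect the same flag on return to see whether the device went away immediately or is only scheduled to go on last close. The version/size setup is shown minimally and error handling is skeletal; libdevmapper/dmsetup is the canonical way to drive this interface.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/dm-ioctl.h>

/* Returns 1 if removal was deferred, 0 if removed immediately, -1 on error. */
static int request_deferred_remove(const char *name)
{
        struct dm_ioctl io;
        int fd = open("/dev/mapper/control", O_RDWR);

        if (fd < 0)
                return -1;

        memset(&io, 0, sizeof(io));
        io.version[0] = DM_VERSION_MAJOR;
        io.version[1] = DM_VERSION_MINOR;
        io.version[2] = DM_VERSION_PATCHLEVEL;
        io.data_size = sizeof(io);
        io.flags = DM_DEFERRED_REMOVE;
        strncpy(io.name, name, sizeof(io.name) - 1);

        if (ioctl(fd, DM_DEV_REMOVE, &io) < 0) {
                close(fd);
                return -1;
        }
        close(fd);

        /* Flag still set => removal deferred until last close. */
        return (io.flags & DM_DEFERRED_REMOVE) ? 1 : 0;
}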
2013-11-09 | dm table: print error on preresume failure | Mike Snitzer | 1 | -1/+4
If preresume fails it is worth logging an error, given that a device is left suspended due to the failure. This change was motivated by local preresume error logging that was added to the cache target ("preresume failed"). Elevating this to a target-agnostic message in DM core provides context for where the target-specific error occurred relative to the DM core's callouts. Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Joe Thornber <[email protected]>
2013-11-09 | dm crypt: add TCW IV mode for old CBC TCRYPT containers | Milan Broz | 1 | -2/+183
dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers in LRW or XTS block encryption mode. TCRYPT containers prior to version 4.1 use CBC mode with some additional tweaks; this patch adds support for these containers.

This new mode is implemented using a special IV generator named TCW (TrueCrypt IV with whitening). TCW IV only supports containers that are encrypted with one cipher (tested with AES, Twofish, Serpent, CAST5 and TripleDES). While this mode is legacy and is known to be vulnerable to some watermarking attacks (e.g. revealing of hidden disk existence), it can still be useful to activate old containers without using 3rd party software or for independent forensic analysis of such containers. (Both the userspace and kernel code is an independent implementation based on the format documentation and it completely avoids use of the original source code.)

The TCW IV generator uses two additional keys: Kw (whitening seed, size is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is always the IV size of the selected cipher). These keys are concatenated at the end of the main encryption key provided in the mapping table. While whitening is completely independent from the IV, it is implemented inside the IV generator for simplification.

The whitening value is always 16 bytes long and is calculated per sector from the provided Kw as initial seed, xored with the sector number and mixed with the CRC32 algorithm. The resulting value is xored with the ciphertext sector content. The IV is calculated from the provided Kiv as initial IV seed and xored with the sector number.

Detailed calculation can be found in the TrueCrypt documentation for version < 4.1 and will also be described on the dm-crypt site, see: http://code.google.com/p/cryptsetup/wiki/DMCrypt

The experimental support for activation of these containers is already present in the git devel branch of cryptsetup.

Signed-off-by: Milan Broz <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
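A very rough userspace illustration of the shape of that per-sector derivation, NOT the exact dm-crypt TCW algorithm: a 16-byte whitening value derived from the Kw seed and the sector number, and an IV derived from Kiv xored with the sector number. The real kernel code mixes in CRC32 in a specific way and must match TrueCrypt bit for bit; treat every detail below as an assumption.

#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define TCW_WHITENING_SIZE 16

/* Derive a per-sector whitening buffer from the whitening seed Kw. */
static void demo_whitening(uint8_t out[TCW_WHITENING_SIZE],
                           const uint8_t kw[TCW_WHITENING_SIZE],
                           uint64_t sector)
{
        memcpy(out, kw, TCW_WHITENING_SIZE);
        for (int i = 0; i < 8; i++)
                out[i] ^= (uint8_t)(sector >> (8 * i));
        /* The real implementation additionally mixes the buffer with CRC32. */
}

/* Derive the per-sector IV from the IV seed Kiv (iv_size bytes). */
static void demo_iv(uint8_t *iv, const uint8_t *kiv, size_t iv_size,
                    uint64_t sector)
{
        memcpy(iv, kiv, iv_size);
        for (size_t i = 0; i < 8 && i < iv_size; i++)
                iv[i] ^= (uint8_t)(sector >> (8 * i));
}

/* The whitening buffer is then xored over the ciphertext of the sector. */
static void demo_apply_whitening(uint8_t *sector_data, size_t len,
                                 const uint8_t w[TCW_WHITENING_SIZE])
{
        for (size_t i = 0; i < len; i++)
                sector_data[i] ^= w[i % TCW_WHITENING_SIZE];
}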
2013-11-09 | dm crypt: properly handle extra key string in initialization | Milan Broz | 1 | -11/+18
Some encryption modes use extra keys (e.g. loopAES has an IV seed) which are not used in block cipher initialization but are part of the key string in the table constructor. This patch adds an additional field which describes the length of the extra key(s) and subtracts it before the real encryption key is set.

The key_size always includes the size, in bytes, of the key provided in the mapping table. The key_parts describes how many parts (usually keys) are contained in the whole key buffer. And key_extra_size contains the size in bytes of the additional keys part (this number of bytes must be subtracted because it is processed by the IV generator).

| K1 | K2 | .... | K64 | Kiv |
|----------- key_size ----------------- |
|                       |-key_extra_size-|
| [64 keys]             | [1 key]        |  => key_parts = 65

Example where the key string contains main key K, whitening key Kw and IV seed Kiv:

| K       | Kiv     | Kw      |
|--------------- key_size --------------|
|         |-----key_extra_size------|
| [1 key] | [1 key] | [1 key] |  => key_parts = 3

Because key_extra_size is calculated during IV mode setting, key initialization is moved after this step. For now, this change has no effect on supported modes (thanks to ilog2 rounding) but it is required by the following patch.

Also, fix a sparse warning in crypt_iv_lmk_one().

Signed-off-by: Milan Broz <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
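A tiny worked example of that bookkeeping, using the loopAES-style layout from the first diagram; the struct, the 32-byte cipher keys and the 16-byte Kiv are illustrative assumptions, not dm-crypt's internal fields or actual key sizes.

#include <stdio.h>

struct demo_key_layout {
        unsigned key_size;        /* total bytes supplied in the table */
        unsigned key_parts;       /* number of keys packed into that buffer */
        unsigned key_extra_size;  /* bytes consumed by the IV generator */
};

int main(void)
{
        /* 64 x 32-byte block cipher keys plus one 16-byte Kiv seed. */
        struct demo_key_layout l = {
                .key_size = 64 * 32 + 16,
                .key_parts = 65,
                .key_extra_size = 16,
        };

        /* Only this much is handed to the block cipher itself. */
        unsigned cipher_key_size = l.key_size - l.key_extra_size;

        printf("cipher gets %u bytes, IV generator gets %u bytes\n",
               cipher_key_size, l.key_extra_size);
        return 0;
}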
2013-11-09 | dm cache: log error message if dm_kcopyd_copy() fails | Heinz Mauelshagen | 1 | -1/+3
A migration failure should be logged (albeit limited). Signed-off-by: Heinz Mauelshagen <[email protected]> Signed-off-by: Joe Thornber <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>