|
The bcache driver has always accepted arbitrarily large bios and split
them internally. Now that every driver must accept arbitrarily large
bios this code isn't necessary anymore.
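With the block layer doing the splitting, a driver's make_request
function just asks the core to split first; roughly (a sketch against
the block API of this era, with my_make_request standing in for the
real entry point):

  static void my_make_request(struct request_queue *q, struct bio *bio)
  {
          /*
           * Let the block core split bios that exceed the queue limits;
           * q->bio_split is the per-queue bio_set the core sets up.
           */
          blk_queue_split(q, &bio, q->bio_split);

          /* ... the driver can now assume the bio fits its limits ... */
  }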
Cc: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <[email protected]>
Signed-off-by: Ming Lin <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently we have two different ways to signal an I/O error on a BIO:
(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback
The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawbacks of not being
persistent when bios are queued up, and of not being passed along from
child to parent bio in the ever more popular chaining scenario. Having
both mechanisms available has the additional drawback of utterly
confusing driver authors and introducing bugs where various I/O
submitters only deal with one of them, and the others have to add
boilerplate code to deal with both kinds of error returns.
So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
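For driver authors the conversion looks roughly like this (a
before/after sketch; my_endio and handle_error are illustrative names,
not code from the series):

  /* Before: the error arrives as a callback argument (or via BIO_UPTODATE): */
  static void my_endio(struct bio *bio, int error)
  {
          if (error)
                  handle_error(error);
          bio_put(bio);
  }

  /* After: the errno lives in the bio itself and survives chaining: */
  static void my_endio(struct bio *bio)
  {
          if (bio->bi_error)
                  handle_error(bio->bi_error);
          bio_put(bio);
  }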
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
There were two issues here:
- writeback thread did not start until the device first became dirty
- writeback thread used uninterruptible sleep once running
Without this patch I see kernel warnings printed and a load average of
1.52 after booting my test VM. With this patch the warnings are gone and
the load average is near 0.00 as expected.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Needed to bring blk-mq up to date, since changes have been going into
the tree since for-3.14/core was established.
Fix up merge issues related to the immutable biovec changes.
Signed-off-by: Jens Axboe <[email protected]>
Conflicts:
block/blk-flush.c
fs/btrfs/check-integrity.c
fs/btrfs/extent_io.c
fs/btrfs/scrub.c
fs/logfs/dev_bdev.c
|
|
The old writeback PD controller could get into states where it had
throttled all the way down and took way too long to recover - it was too
complicated to really understand what it was doing.
This rewrites a good chunk of it to hopefully be simpler and make more
sense; it also pays more attention to units, which should make the
behaviour a bit easier to understand.
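The shape of the controller, with units spelled out (a heavily
simplified sketch, not the actual update function; the gain names are
illustrative):

  /* All quantities in sectors; writeback rate in sectors/second. */
  int64_t error      = dirty - target;     /* sectors */
  int64_t derivative = dirty - dirty_last; /* sectors per update interval */

  /* Scale each term by a tunable inverse gain and adjust the rate: */
  rate += error / p_term_inverse + derivative / d_term_inverse;
  if (rate < 1)
          rate = 1; /* never throttle all the way to a dead stop */
  dirty_last = dirty;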
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We're just waiting on kthread_should_stop(), nothing else, so
interruptible sleep was wrong here.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
at the beginning (schedule_timeout_interruptible) and others do this on
their own.
This prevents wrong load average calculation (load of 1 per thread)
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.
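In practice the rename means accesses change mechanically, e.g.
(sketch):

  /* Before: iterator state lived directly in struct bio: */
  sector_t sector = bio->bi_sector;
  unsigned bytes  = bio->bi_size;

  /* After: the same state lives in the embedded struct bvec_iter: */
  sector_t sector = bio->bi_iter.bi_sector;
  unsigned bytes  = bio->bi_iter.bi_size;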
Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "Ed L. Cashin" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Yehuda Sadeh <[email protected]>
Cc: Sage Weil <[email protected]>
Cc: Alex Elder <[email protected]>
Cc: [email protected]
Cc: Joshua Morris <[email protected]>
Cc: Philip Kelleher <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: [email protected]
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: [email protected]
Cc: Boaz Harrosh <[email protected]>
Cc: Benny Halevy <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Nicholas A. Bellinger" <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Steven Whitehouse <[email protected]>
Cc: Dave Kleikamp <[email protected]>
Cc: Joern Engel <[email protected]>
Cc: Prasad Joshi <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: KONISHI Ryusuke <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Ben Myers <[email protected]>
Cc: [email protected]
Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Herton Ronaldo Krzesinski <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Guo Chao <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Asai Thambi S P <[email protected]>
Cc: Selvan Mani <[email protected]>
Cc: Sam Bradshaw <[email protected]>
Cc: Wei Yongjun <[email protected]>
Cc: "Roger Pau Monné" <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: Sebastian Ott <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Jerome Marchand <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Peng Tao <[email protected]>
Cc: Andy Adamson <[email protected]>
Cc: fanchaoting <[email protected]>
Cc: Jie Liu <[email protected]>
Cc: Sunil Mushran <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Pankaj Kumar <[email protected]>
Cc: Dan Magenheimer <[email protected]>
Cc: Mel Gorman <[email protected]>
|
|
Whoops.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
The old scanning-by-stripe code burned too much CPU; this should be
better.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Last of the btree_map() conversions. The main visible effect is that
bch_btree_insert() no longer takes a struct btree_op as an argument -
there's no fancy state machine stuff going on; it's just a normal
function.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
When we convert bch_btree_insert() to bch_btree_map_leaf_nodes(), we
won't be passing struct btree_op to bch_btree_insert() anymore - so we
need a different way of returning whether there was a collision (really,
a replace collision).
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This is prep work for converting bch_btree_insert to
bch_btree_map_leaf_nodes() - we have to turn everything it used to pull
out of struct btree_op into actual function arguments. Bunch of churn,
but should be straightforward.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This isn't used for waiting asynchronously anymore - so this is a fairly
trivial refactoring.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Eventual goal is for struct btree_op to contain only what is necessary
for traversing the btree.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Lots of stuff has been open coding its own btree traversal - which is
generally pretty simple code, but there are a few subtleties.
This adds two new functions, bch_btree_map_nodes() and
bch_btree_map_keys(), which do the traversal for you. Everything that's
open coding btree traversal now (with the exception of garbage
collection) is slowly going to be converted to these two functions;
being able to write other code at a higher level of abstraction is a
big improvement w.r.t. overall code quality.
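Callers then look roughly like this (signatures approximated from the
bcache headers of this era - check btree.h for the real ones):

  static int count_dirty_fn(struct btree_op *op, struct btree *b,
                            struct bkey *k)
  {
          /* per-key work goes here */
          return MAP_CONTINUE; /* or MAP_DONE to stop the traversal */
  }

  /* Walk every key in the leaf nodes, starting from the beginning: */
  ret = bch_btree_map_keys(&op, c, &ZERO_KEY, count_dirty_fn, 0);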
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This simplifies the writeback flow control quite a bit - previously, it
was conceptually two coroutines, refill_dirty() and read_dirty(). This
makes the code quite a bit more straightforward.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Slowly working on pruning struct btree_op - the aim is for it to only
contain things that are actually necessary for traversing the btree.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Some refactoring - better to explicitly pass stuff around instead of
having it all in the "big bag of state", struct btree_op. Going to prune
struct btree_op quite a bit over time.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Originally I got this right... except that the divides didn't use
do_div(), which broke 32-bit kernels. When I went to fix that, I forgot
that the raid stripe size usually isn't a power of two... doh
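The idiom that handles both problems (illustrative sketch; key_offset
and stripe_size are placeholder names):

  uint64_t offset = key_offset; /* in sectors */

  /*
   * do_div() divides the 64-bit value in place and returns the
   * remainder, using only operations 32-bit kernels can link against -
   * and it makes no power-of-two assumption about stripe_size.
   */
  unsigned in_stripe = do_div(offset, stripe_size);
  /* offset now holds the quotient, i.e. the stripe index */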
Signed-off-by: Kent Overstreet <[email protected]>
|
|
schedule_timeout() != schedule_timeout_uninterruptible()
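That is (a sketch of the two call patterns):

  /*
   * schedule_timeout() doesn't touch the task state; if the task is
   * still TASK_RUNNING it comes straight back, and the loop spins.
   */
  set_current_state(TASK_UNINTERRUPTIBLE);
  schedule_timeout(delay);

  /* schedule_timeout_uninterruptible() sets the state for you: */
  schedule_timeout_uninterruptible(delay);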
Signed-off-by: Kent Overstreet <[email protected]>
Cc: linux-stable <[email protected]> # >= v3.10
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Background writeback works by scanning the btree for dirty data and
adding those keys into a fixed size buffer, then for each dirty key in
the keybuf writing it to the backing device.
When read_dirty() finishes and it's time to scan for more dirty data, we
need to wait for the outstanding writeback IO to finish - those writes
still take up slots in the keybuf (so that foreground writes can check
for them to avoid races). Without that wait, we'll continually rescan
when we'd be able to add at most a key or two to the keybuf, and that
takes locks that starve foreground IO. Doh.
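Conceptually the loop is (a hand-wavy sketch of the control flow; the
helper names are illustrative):

  while (!kthread_should_stop()) {
          refill_keybuf();                  /* scan the btree for dirty keys */
          write_keybuf_to_backing_device();

          /*
           * Wait for in-flight writeback to drain before rescanning:
           * only completed writes free their keybuf slots, so scanning
           * again any earlier would find room for just a key or two.
           */
          wait_for_outstanding_writes();
  }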
Signed-off-by: Kent Overstreet <[email protected]>
Cc: linux-stable <[email protected]> # >= v3.10
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Some of bcache's utility code has made it into the rest of the kernel,
so drop the bcache versions.
Bcache used to have a workaround for allocating from a bio set under
generic_make_request() (if you allocated more than once, the bios you
already allocated would get stuck on current->bio_list when you
submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT
when allocating bios under generic_make_request() so that allocation
could fail, and it could retry from a workqueue. But bio_alloc_bioset()
has a workaround now, so we can drop this hack and the associated error
handling.
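The hack being dropped looked roughly like this (illustrative;
punt_to_workqueue() stands in for bcache's actual retry path):

  /*
   * Under generic_make_request(): don't sleep in the bio_set allocator,
   * since bios from an earlier allocation may be parked on
   * current->bio_list and can't complete while we block.
   */
  bio = bio_alloc_bioset(GFP_NOIO & ~__GFP_WAIT, nr_vecs, bs);
  if (!bio) {
          punt_to_workqueue(); /* retry the allocation where sleeping is safe */
          return;
  }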
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Now that we're tracking dirty data per stripe, we can add two
optimizations for raid5/6:
* If a stripe is already dirty, force writes to that stripe to
writeback mode - to help build up full stripes of dirty data
* When flushing dirty data, preferentially write out full stripes first
if there are any.
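In rough terms (a sketch; the predicate and counter names are
illustrative):

  /* Foreground write path: */
  if (atomic_read(&stripe_sectors_dirty[stripe]))
          mode = WRITEBACK; /* keep piling dirty data onto this stripe */

  /* Writeback scan: */
  if (any_full_stripes())
          flush_full_stripes_first();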
Signed-off-by: Kent Overstreet <[email protected]>
|
|
To make background writeback aware of raid5/6 stripes, we first need to
track the amount of dirty data within each stripe - we do this by
breaking up the existing sectors_dirty into per-stripe atomic_ts.
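Roughly (a sketch of the accounting, with illustrative names):

  /* One counter per raid stripe instead of one device-wide count: */
  atomic_t *stripe_sectors_dirty; /* array of nr_stripes counters */

  static void sectors_dirty_add(uint64_t offset, int nr_sectors)
  {
          do_div(offset, stripe_size); /* offset becomes the stripe index */
          atomic_add(nr_sectors, stripe_sectors_dirty + offset);
  }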
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Previously, dirty_data wouldn't get initialized until the first garbage
collection... which was a bit of a problem for background writeback (as
the PD controller keys off of it) and also confusing for users.
This is also prep work for making background writeback aware of raid5/6
stripes.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
The tracepoints were reworked to be more sensible, and a null pointer
deref in one of them was fixed.
Converted some of the pr_debug()s to tracepoints - this is partly a
performance optimization; it used to be that without DEBUG or
CONFIG_DYNAMIC_DEBUG, pr_debug() was an empty macro, but at some point
it was changed to an empty inline function.
Some of the pr_debug() statements had rather expensive function calls as
part of the arguments, so this code was getting run unnecessarily even
on non-debug kernels - in some fast paths, too.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This code appears to have rotted... fix various bugs and do some
refactoring.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Does writethrough and writeback caching, handles unclean shutdown, and
has a bunch of other nifty features motivated by real world usage.
See the wiki at http://bcache.evilpiepirate.org for more.
Signed-off-by: Kent Overstreet <[email protected]>
|