aboutsummaryrefslogtreecommitdiff
path: root/fs/btrfs/scrub.c
AgeCommit message (Collapse)AuthorFilesLines
2015-01-14btrfs: expand btrfs_find_item if found_key is NULLDavid Sterba1-2/+6
If the found_key is NULL, then btrfs_find_item becomes a verbose wrapper for simple btrfs_search_slot. After we've removed all such callers, passing a NULL key is not valid anymore. Signed-off-by: David Sterba <[email protected]>
2015-01-14btrfs: cleanup, remove inode_item_info helperDavid Sterba1-1/+5
It's only a simple wrapper around btrfs_find_item, the locally defined key is not used. Signed-off-by: David Sterba <[email protected]>
2015-01-02Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()Dan Carpenter1-1/+1
The only way that "ret" is set is when we call scrub_pages_for_parity() so the skip to "if (ret) " test doesn't make sense and causes a static checker warning. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-12-02Merge branch 'raid56-scrub-replace' of git://github.com/miaoxie/linux-btrfs ↵Chris Mason1-32/+771
into for-linus
2014-12-03Btrfs, raid56: fix use-after-free problem in the final device replace ↵Miao Xie1-1/+1
procedure on raid56 The commit c404e0dc (Btrfs: fix use-after-free in the finishing procedure of the device replace) fixed a use-after-free problem which happened when removing the source device at the end of device replace, but at that time, btrfs didn't support device replace on raid56, so we didn't fix the problem on the raid56 profile. Currently, we implemented device replace for raid56, so we need kick that problem out before we enable that function for raid56. The fix method is very simple, we just increase the bio per-cpu counter before we submit a raid56 io, and decrease the counter when the raid56 io ends. Signed-off-by: Miao Xie <[email protected]>
2014-12-03Btrfs, replace: write raid56 parity into the replace target deviceMiao Xie1-1/+1
This function reused the code of parity scrub, and we just write the right parity or corrected parity into the target device before the parity scrub end. Signed-off-by: Miao Xie <[email protected]>
2014-12-03Btrfs, raid56: support parity scrub on raid56Miao Xie1-14/+595
The implementation is: - Read and check all the data with checksum in the same stripe. All the data which has checksum is COW data, and we are sure that it is not changed though we don't lock the stripe. because the space of that data just can be reclaimed after the current transction is committed, and then the fs can use it to store the other data, but when doing scrub, we hold the current transaction, that is that data can not be recovered, it is safe that read and check it out of the stripe lock. - Lock the stripe - Read out all the data without checksum and parity The data without checksum and the parity may be changed if we don't lock the stripe, so we need read it in the stripe lock context. - Check the parity - Re-calculate the new parity and write back it if the old parity is not right - Unlock the stripe If we can not read out the data or the data we read is corrupted, we will try to repair it. If the repair fails. we will mark the horizontal sub-stripe(pages on the same horizontal) as corrupted sub-stripe, and we will skip the parity check and repair of that horizontal sub-stripe. And in order to skip the horizontal sub-stripe that has no data, we introduce a bitmap. If there is some data on the horizontal sub-stripe, we will the relative bit to 1, and when we check and repair the parity, we will skip those horizontal sub-stripes that the relative bits is 0. Signed-off-by: Miao Xie <[email protected]>
2014-12-03Btrfs, scrub: repair the common data on RAID5/6 if it is corruptedMiao Xie1-18/+176
This patch implement the RAID5/6 common data repair function, the implementation is similar to the scrub on the other RAID such as RAID1, the differentia is that we don't read the data from the mirror, we use the data repair function of RAID5/6. Signed-off-by: Miao Xie <[email protected]>
2014-11-20btrfs: fix dead lock while running replace and defrag concurrentlyGui Hecheng1-30/+60
This can be reproduced by fstests: btrfs/070 The scenario is like the following: replace worker thread defrag thread --------------------- ------------- copy_nocow_pages_worker btrfs_defrag_file copy_nocow_pages_for_inode ... btrfs_writepages |A| lock_extent_bits extent_write_cache_pages |B| lock_page __extent_writepage ... writepage_delalloc find_lock_delalloc_range |B| lock_extent_bits find_or_create_page pagecache_get_page |A| lock_page This leads to an ABBA pattern deadlock. To fix it, o we just change it to an AABB pattern which means to @unlock_extent_bits() before we @lock_page(), and in this way the @extent_read_full_page_nolock() is no longer in an locked context, so change it back to @extent_read_full_page() to regain protection. o Since we @unlock_extent_bits() earlier, then before @write_page_nocow(), the extent may not really point at the physical block we want, so we have to check it before write. Signed-off-by: Gui Hecheng <[email protected]> Tested-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-10-02btrfs: remove unused members from struct scrub_warningDavid Sterba1-15/+2
Signed-off-by: David Sterba <[email protected]>
2014-09-17Btrfs: modify clean_io_failure and make it suit direct ioMiao Xie1-2/+1
We could not use clean_io_failure in the direct IO path because it got the filesystem information from the page structure, but the page in the direct IO bio didn't have the filesystem information in its structure. So we need modify it and pass all the information it need by parameters. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-09-17Btrfs: modify repair_io_failure and make it suit direct ioMiao Xie1-0/+1
The original code of repair_io_failure was just used for buffered read, because it got some filesystem data from page structure, it is safe for the page in the page cache. But when we do a direct read, the pages in bio are not in the page cache, that is there is no filesystem data in the page structure. In order to implement direct read data repair, we need modify repair_io_failure and pass all filesystem data it need by function parameters. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-09-17Btrfs: fix wrong disk size when writing super blocksMiao Xie1-1/+2
total_size will be changed when resizing a device, and disk_total_size will be changed if resizing is successful. Meanwhile, the on-disk super blocks of the previous transaction might not be updated. Considering the consistency of the metadata in the previous transaction, We should use the size in the previous transaction to check if the super block is beyond the boundary of the device. Fix it. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-09-17Btrfs: fix wrong generation check of super block on a seed deviceMiao Xie1-1/+5
The super block generation of the seed devices is not the same as the filesystem which sprouted from them because we don't update the super block on the seed devices when we change that new filesystem. So we should not use the generation of that new filesystem to check the super block generation on the seed devices, Fix it. Signed-off-by: Miao Xie <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-09-17Btrfs: fix wrong fsid check of scrubMiao Xie1-5/+13
All the metadata in the seed devices has the same fsid as the fsid of the seed filesystem which is on the seed device, so we should check them by the current filesystem. Fix it. Signed-off-by: Miao Xie <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-09-17btrfs: use nodesize everywhere, kill leafsizeDavid Sterba1-16/+1
The nodesize and leafsize were never of different values. Unify the usage and make nodesize the one. Cleanup the redundant checks and helpers. Shaves a few bytes from .text: text data bss dec hex filename 852418 24560 23112 900090 dbbfa btrfs.ko.before 851074 24584 23112 898770 db6d2 btrfs.ko.after Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-09-17btrfs: kill the key type accessor helpersDavid Sterba1-1/+1
btrfs_set_key_type and btrfs_key_type are used inconsistently along with open coded variants. Other members of btrfs_key are accessed directly without any helpers anyway. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-08-24Btrfs: fix task hang under heavy compressed writeLiu Bo1-6/+8
This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16. Btrfs now migrates to use kernel workqueue, but it introduces this hang problem. Btrfs has a kind of work queued as an ordered way, which means that its ordered_func() must be processed in the way of FIFO, so it usually looks like -- normal_work_helper(arg) work = container_of(arg, struct btrfs_work, normal_work); work->func() <---- (we name it work X) for ordered_work in wq->ordered_list ordered_work->ordered_func() ordered_work->ordered_free() The hang is a rare case, first when we find free space, we get an uncached block group, then we go to read its free space cache inode for free space information, so it will file a readahead request btrfs_readpages() for page that is not in page cache __do_readpage() submit_extent_page() btrfs_submit_bio_hook() btrfs_bio_wq_end_io() submit_bio() end_workqueue_bio() <--(ret by the 1st endio) queue a work(named work Y) for the 2nd also the real endio() So the hang occurs when work Y's work_struct and work X's work_struct happens to share the same address. A bit more explanation, A,B,C -- struct btrfs_work arg -- struct work_struct kthread: worker_thread() pick up a work_struct from @worklist process_one_work(arg) worker->current_work = arg; <-- arg is A->normal_work worker->current_func(arg) normal_work_helper(arg) A = container_of(arg, struct btrfs_work, normal_work); A->func() A->ordered_func() A->ordered_free() <-- A gets freed B->ordered_func() submit_compressed_extents() find_free_extent() load_free_space_inode() ... <-- (the above readhead stack) end_workqueue_bio() btrfs_queue_work(work C) B->ordered_free() As if work A has a high priority in wq->ordered_list and there are more ordered works queued after it, such as B->ordered_func(), its memory could have been freed before normal_work_helper() returns, which means that kernel workqueue code worker_thread() still has worker->current_work pointer to be work A->normal_work's, ie. arg's address. Meanwhile, work C is allocated after work A is freed, work C->normal_work and work A->normal_work are likely to share the same address(I confirmed this with ftrace output, so I'm not just guessing, it's rare though). When another kthread picks up work C->normal_work to process, and finds our kthread is processing it(see find_worker_executing_work()), it'll think work C as a collision and skip then, which ends up nobody processing work C. So the situation is that our kthread is waiting forever on work C. Besides, there're other cases that can lead to deadlock, but the real problem is that all btrfs workqueue shares one work->func, -- normal_work_helper, so this makes each workqueue to have its own helper function, but only a wraper pf normal_work_helper. With this patch, I no long hit the above hang. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-08-19Btrfs: don't write any data into a readonly device when scrubMiao Xie1-0/+11
We should not write data into a readonly device especially seed device when doing scrub, skip those devices. Signed-off-by: Miao Xie <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-06-19btrfs: Skip scrubbing removed chunks to avoid -ENOENT.Qu Wenruo1-10/+9
When run scrub with balance, sometimes -ENOENT will be returned, since in scrub_enumerate_chunks() will search dev_extent in *COMMIT_ROOT*, but btrfs_lookup_block_group() will search block group in *MEMORY*, so if a chunk is removed but not committed, -ENOENT will be returned. However, there is no need to stop scrubbing since other chunks may be scrubbed without problem. So this patch changes the behavior to skip removed chunks and continue to scrub the rest. Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-06-09Btrfs: fix scrub_print_warning to handle skinny metadata extentsLiu Bo1-2/+3
The skinny extents are intepreted incorrectly in scrub_print_warning(), and end up hitting the BUG() in btrfs_extent_inline_ref_size. Reported-by: Konstantinos Skarlatos <[email protected]> Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-06-09btrfs: Remove unnecessary check for NULLTobias Klauser1-2/+2
iput() already checks for the inode being NULL, thus it's unnecessary to check before calling. Signed-off-by: Tobias Klauser <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-04-11Merge branch 'for-linus' of ↵Linus Torvalds1-19/+89
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull second set of btrfs updates from Chris Mason: "The most important changes here are from Josef, fixing a btrfs regression in 3.14 that can cause corruptions in the extent allocation tree when snapshots are in use. Josef also fixed some deadlocks in send/recv and other assorted races when balance is running" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (23 commits) Btrfs: fix compile warnings on on avr32 platform btrfs: allow mounting btrfs subvolumes with different ro/rw options btrfs: export global block reserve size as space_info btrfs: fix crash in remount(thread_pool=) case Btrfs: abort the transaction when we don't find our extent ref Btrfs: fix EINVAL checks in btrfs_clone Btrfs: fix unlock in __start_delalloc_inodes() Btrfs: scrub raid56 stripes in the right way Btrfs: don't compress for a small write Btrfs: more efficient io tree navigation on wait_extent_bit Btrfs: send, build path string only once in send_hole btrfs: filter invalid arg for btrfs resize Btrfs: send, fix data corruption due to incorrect hole detection Btrfs: kmalloc() doesn't return an ERR_PTR Btrfs: fix snapshot vs nocow writting btrfs: Change the expanding write sequence to fix snapshot related bug. btrfs: make device scan less noisy btrfs: fix lockdep warning with reclaim lock inversion Btrfs: hold the commit_root_sem when getting the commit root during send Btrfs: remove transaction from send ...
2014-04-11Btrfs: fix compile warnings on on avr32 platformWang Shilong1-1/+1
fs/btrfs/scrub.c: In function 'get_raid56_logic_offset': fs/btrfs/scrub.c:2269: warning: comparison of distinct pointer types lacks a cast fs/btrfs/scrub.c:2269: warning: right shift count >= width of type fs/btrfs/scrub.c:2269: warning: passing argument 1 of '__div64_32' from incompatible pointer type Since @rot is an int type, we should not use do_div(), fix it. Reported-by: kbuild test robot <[email protected]> Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-04-07Btrfs: scrub raid56 stripes in the right wayWang Shilong1-19/+89
Steps to reproduce: # mkfs.btrfs -f /dev/sda[8-11] -m raid5 -d raid5 # mount /dev/sda8 /mnt # btrfs scrub start -BR /mnt # echo $? <--unverified errors make return value be 3 This is because we don't setup right mapping between physical and logical address for raid56, which makes checksum mismatch. But we will find everthing is fine later when rechecking using btrfs_map_block(). This patch fixed the problem by settuping right mappings and we only verify data stripes' checksums. Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-04-04Merge branch 'for-linus' of ↵Linus Torvalds1-35/+62
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs changes from Chris Mason: "This is a pretty long stream of bug fixes and performance fixes. Qu Wenruo has replaced the btrfs async threads with regular kernel workqueues. We'll keep an eye out for performance differences, but it's nice to be using more generic code for this. We still have some corruption fixes and other patches coming in for the merge window, but this batch is tested and ready to go" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (108 commits) Btrfs: fix a crash of clone with inline extents's split btrfs: fix uninit variable warning Btrfs: take into account total references when doing backref lookup Btrfs: part 2, fix incremental send's decision to delay a dir move/rename Btrfs: fix incremental send's decision to delay a dir move/rename Btrfs: remove unnecessary inode generation lookup in send Btrfs: fix race when updating existing ref head btrfs: Add trace for btrfs_workqueue alloc/destroy Btrfs: less fs tree lock contention when using autodefrag Btrfs: return EPERM when deleting a default subvolume Btrfs: add missing kfree in btrfs_destroy_workqueue Btrfs: cache extent states in defrag code path Btrfs: fix deadlock with nested trans handles Btrfs: fix possible empty list access when flushing the delalloc inodes Btrfs: split the global ordered extents mutex Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock Btrfs: reclaim delalloc metadata more aggressively Btrfs: remove unnecessary lock in may_commit_transaction() Btrfs: remove the unnecessary flush when preparing the pages Btrfs: just do dirty page flush for the inode with compression before direct IO ...
2014-03-10btrfs: Cleanup the "_struct" suffix in btrfs_workequeueQu Wenruo1-13/+10
Since the "_struct" suffix is mainly used for distinguish the differnt btrfs_work between the original and the newly created one, there is no need using the suffix since all btrfs_workers are changed into btrfs_workqueue. Also this patch fixed some codes whose code style is changed due to the too long "_struct" suffix. Signed-off-by: Qu Wenruo <[email protected]> Tested-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2014-03-10btrfs: Replace fs_info->scrub_* workqueue with btrfs_workqueue.Qu Wenruo1-43/+50
Replace the fs_info->scrub_* with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo <[email protected]> Tested-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2014-03-10Btrfs: wake up @scrub_pause_wait as much as we canWang Shilong1-0/+10
check if @scrubs_running=@scrubs_paused condition inside wait_event() is not an atomic operation which means we may inc/dec @scrub_running/ paused at any time. Let's wake up @scrub_pause_wait as much as we can to let commit transaction blocked less. An example below: Thread1 Thread2 |->scrub_blocked_if_needed() |->scrub_pending_trans_workers_inc |->increase @scrub_paused |->increase @scrub_running |->wake up scrub_pause_wait list |->scrub blocked |->increase @scrub_paused Thread3 is commiting transaction which is blocked at btrfs_scrub_pause(). So after Thread2 increase @scrub_paused, we meet the condition @scrub_paused=@scrub_running, but transaction will be still blocked until another calling to wake up @scrub_pause_wait. Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2014-03-10Btrfs: device_replace: fix deadlock for nocow caseWang Shilong1-2/+15
commit cb7ab02156e4 cause a following deadlock found by xfstests,btrfs/011: Thread1 is commiting transaction which is blocked at btrfs_scrub_pause(). Thread2 is calling btrfs_file_aio_write() which has held inode's @i_mutex and commit transaction(blocked because Thread1 is committing transaction). Thread3 is copy_nocow_page worker which will also try to hold inode @i_mutex, so thread3 will wait Thread1 finished. Thread4 is waiting pending workers finished which will wait Thread3 finished. So the problem is like this: Thread1--->Thread4--->Thread3--->Thread2---->Thread1 Deadlock happens! we fix it by letting Thread1 go firstly, which means we won't block transaction commit while we are waiting pending workers finished. Reported-by: Qu Wenruo <[email protected]> Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2014-01-30Merge branch 'for-linus' of ↵Linus Torvalds1-66/+68
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "This is a pretty big pull, and most of these changes have been floating in btrfs-next for a long time. Filipe's properties work is a cool building block for inheriting attributes like compression down on a per inode basis. Jeff Mahoney kicked in code to export filesystem info into sysfs. Otherwise, lots of performance improvements, cleanups and bug fixes. Looks like there are still a few other small pending incrementals, but I wanted to get the bulk of this in first" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits) Btrfs: fix spin_unlock in check_ref_cleanup Btrfs: setup inode location during btrfs_init_inode_locked Btrfs: don't use ram_bytes for uncompressed inline items Btrfs: fix btrfs_search_slot_for_read backwards iteration Btrfs: do not export ulist functions Btrfs: rework ulist with list+rb_tree Btrfs: fix memory leaks on walking backrefs failure Btrfs: fix send file hole detection leading to data corruption Btrfs: add a reschedule point in btrfs_find_all_roots() Btrfs: make send's file extent item search more efficient Btrfs: fix to catch all errors when resolving indirect ref Btrfs: fix protection between walking backrefs and root deletion btrfs: fix warning while merging two adjacent extents Btrfs: fix infinite path build loops in incremental send btrfs: undo sysfs when open_ctree() fails Btrfs: fix snprintf usage by send's gen_unique_name btrfs: fix defrag 32-bit integer overflow btrfs: sysfs: list the NO_HOLES feature btrfs: sysfs: don't show reserved incompat feature btrfs: call permission checks earlier in ioctls and return EPERM ...
2014-01-28Btrfs: fix to search previous metadata extent item since skinny metadataWang Shilong1-2/+1
There is a bug that using btrfs_previous_item() to search metadata extent item. This is because in btrfs_previous_item(), we need type match, however, since skinny metada was introduced by josef, we may mix this two types. So just use btrfs_previous_item() is not working right. To keep btrfs_previous_item() like normal tree search, i introduce another function btrfs_previous_extent_item(). Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-01-28Btrfs: fix missing skinny metadata check in scrub_stripe()Wang Shilong1-1/+4
Check if we support skinny metadata firstly and fix to use right type to search. Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-01-28Btrfs: convert printk to btrfs_ and fix BTRFS prefixFrank Holton1-23/+27
Convert all applicable cases of printk and pr_* to the btrfs_* macros. Fix all uses of the BTRFS prefix. Signed-off-by: Frank Holton <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-01-28Btrfs: wrap repeated code into scrub_blocked_if_needed()Wang Shilong1-26/+17
Just wrap same code into one function scrub_blocked_if_needed(). This make a change that we will move waiting (@workers_pending = 0) before we can wake up commiting transaction(atomic_inc(@scrub_paused)), we must take carefully to not deadlock here. Thread 1 Thread 2 |->btrfs_commit_transaction() |->set trans type(COMMIT_DOING) |->btrfs_scrub_paused()(blocked) |->join_transaction(blocked) Move btrfs_scrub_paused() before setting trans type which means we can still join a transaction when commiting_transaction is blocked. Signed-off-by: Wang Shilong <[email protected]> Suggested-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-01-28Btrfs: fix wrong super generation mismatch when scrubbing supersWang Shilong1-19/+26
We came a race condition when scrubbing superblocks, the story is: In commiting transaction, we will update @last_trans_commited after writting superblocks, if scrubber start after writting superblocks and before updating @last_trans_commited, generation mismatch happens! We fix this by checking @scrub_pause_req, and we won't start a srubber until commiting transaction is finished.(after btrfs_scrub_continue() finished.) Reported-by: Sebastian Ochmann <[email protected]> Signed-off-by: Wang Shilong <[email protected]> Reviewed-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2014-01-28btrfs: remove unused variable from scrub_fixup_nodatasumValentina Giusti1-2/+0
Signed-off-by: Valentina Giusti <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-12-05Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-29/+4
Pull block layer fixes from Jens Axboe: "A small collection of fixes for the current series. It contains: - A fix for a use-after-free of a request in blk-mq. From Ming Lei - A fix for a blk-mq bug that could attempt to dereference a NULL rq if allocation failed - Two xen-blkfront small fixes - Cleanup of submit_bio_wait() type uses in the kernel, unifying that. From Kent - A fix for 32-bit blkg_rwstat reading. I apologize for this one looking mangled in the shortlog, it's entirely my fault for missing an empty line between the description and body of the text" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: fix use-after-free of request blk-mq: fix dereference of rq->mq_ctx if allocation fails block: xen-blkfront: Fix possible NULL ptr dereference xen-blkfront: Silence pfn maybe-uninitialized warning block: submit_bio_wait() conversions Update of blkg_stat and blkg_rwstat may happen in bh context
2013-11-24block: submit_bio_wait() conversionsKent Overstreet1-29/+4
It was being open coded in a few places. Signed-off-by: Kent Overstreet <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joern Engel <[email protected]> Cc: Prasad Joshi <[email protected]> Cc: Neil Brown <[email protected]> Cc: Chris Mason <[email protected]> Acked-by: NeilBrown <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2013-11-23block: Abstract out bvec iteratorKent Overstreet1-6/+6
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: "Ed L. Cashin" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Lars Ellenberg <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Geoff Levand <[email protected]> Cc: Yehuda Sadeh <[email protected]> Cc: Sage Weil <[email protected]> Cc: Alex Elder <[email protected]> Cc: [email protected] Cc: Joshua Morris <[email protected]> Cc: Philip Kelleher <[email protected]> Cc: Rusty Russell <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Neil Brown <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: [email protected] Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: [email protected] Cc: Boaz Harrosh <[email protected]> Cc: Benny Halevy <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Nicholas A. Bellinger" <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Chris Mason <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Dave Kleikamp <[email protected]> Cc: Joern Engel <[email protected]> Cc: Prasad Joshi <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: KONISHI Ryusuke <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Ben Myers <[email protected]> Cc: [email protected] Cc: Steven Rostedt <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Len Brown <[email protected]> Cc: Pavel Machek <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Herton Ronaldo Krzesinski <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Guo Chao <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Wei Yongjun <[email protected]> Cc: "Roger Pau Monné" <[email protected]> Cc: Jan Beulich <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Ian Campbell <[email protected]> Cc: Sebastian Ott <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Jerome Marchand <[email protected]> Cc: Joe Perches <[email protected]> Cc: Peng Tao <[email protected]> Cc: Andy Adamson <[email protected]> Cc: fanchaoting <[email protected]> Cc: Jie Liu <[email protected]> Cc: Sunil Mushran <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Namjae Jeon <[email protected]> Cc: Pankaj Kumar <[email protected]> Cc: Dan Magenheimer <[email protected]> Cc: Mel Gorman <[email protected]>6
2013-11-23block: submit_bio_wait() conversionsKent Overstreet1-29/+4
It was being open coded in a few places. Signed-off-by: Kent Overstreet <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joern Engel <[email protected]> Cc: Prasad Joshi <[email protected]> Cc: Neil Brown <[email protected]> Cc: Chris Mason <[email protected]> Acked-by: NeilBrown <[email protected]>
2013-11-20Btrfs: do not inc uncorrectable_errors counter on ro scrubsIlya Dryomov1-2/+4
Currently if we discover an error when scrubbing in ro mode we a) blindly increment the uncorrectable_errors counter, and b) spam the dmesg with the 'unable to fixup (regular) error at ...' message, even though a) we haven't tried to determine if the error is correctable or not, and b) we haven't tried to fixup anything. Fix this. Cc: Stefan Behrens <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: avoid unnecessary scrub workers allocationWang Shilong1-13/+10
We only allocate scrub workers if we pass all the necessary checks, for example, there are no operation in progress. Besides, move mutex lock protection outside of scrub_workers_get() /scrub_workers_put(). Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: remove scrub_super_lock holding in btrfs_sync_log()Wang Shilong1-13/+5
Originally, we introduced scrub_super_lock to synchronize tree log code with scrubbing super. However we can replace scrub_super_lock with device_list_mutex, because writing super will hold this mutex, this will reduce an extra lock holding when writing supers in sync log code. Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: fix the dev-replace suspend sequenceIlya Dryomov1-2/+3
Replace progresses strictly from lower to higher offsets, and the progress is tracked in chunks, by storing the physical offset of the dev_extent which is being copied in the cursor_left field of btrfs_dev_replace_item. When we are done copying the chunk, left_cursor is updated to point one byte past the dev_extent, so that on resume we can skip the dev_extents that have already been copied. There is a major bug (which goes all the way back to the inception of dev-replace in 3.8) in the way left_cursor is bumped: the bump is done unconditionally, without any regard to the scrub_chunk return value. On suspend (and also on any kind of error) scrub_chunk returns early, i.e. without completing the copy. This leads to us skipping the chunk that hasn't been fully copied yet when resuming. Fix this by doing the cursor_left update only if scrub_chunk ret is 0. (On suspend scrub_chunk returns with -ECANCELED, so this fix covers both suspend and error cases.) Cc: Stefan Behrens <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Btrfs: improve replacing nocow extentsJosef Bacik1-14/+98
Various people have hit a deadlock when running btrfs/011. This is because when replacing nocow extents we will take the i_mutex to make sure nobody messes with the file while we are replacing the extent. The problem is we are already holding a transaction open, which is a locking inversion, so instead we need to save these inodes we find and then process them outside of the transaction. Further we can't just lock the inode and assume we are good to go. We need to lock the extent range and then read back the extent cache for the inode to make sure the extent really still points at the physical block we want. If it doesn't we don't have to copy it. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-01Btrf: cleanup: don't check for root_refs == 0 twiceStefan Behrens1-5/+0
btrfs_read_fs_root_no_name() already checks if btrfs_root_refs() is zero and returns ENOENT in this case. There is no need to do it again in three more places. Signed-off-by: Stefan Behrens <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-01Btrfs: Format mirror_num as intGeert Uytterhoeven1-3/+3
mirror_num is always "int", hence don't cast it to "unsigned long long" and format it as a 64-bit number. Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-01Btrfs: Format PAGE_SIZE as unsigned longGeert Uytterhoeven1-3/+2
PAGE_SIZE is "unsigned long" everywhere, so there's no need to cast it to "unsigned long long" and format it as a 64-bit number. Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-01Btrfs: Remove superfluous casts from u64 to unsigned long longGeert Uytterhoeven1-11/+5
u64 is "unsigned long long" on all architectures now, so there's no need to cast it when formatting it using the "ll" length modifier. Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>