Age  Commit message  Author  Files  Lines
2013-06-14btrfs: show compiled-in config features at module load timeDavid Sterba2-6/+13
We want to know whether debugging features are compiled in, since they may affect performance. The message is printed before the sanity checks. Also kill the version.h file, which serves no purpose; we don't use any version tag for the kernel module. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-06-14btrfs: move ifdef around sanity checks out of init_btrfs_fsDavid Sterba3-5/+3
Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-06-14btrfs: add prefix to sanity tests messagesDavid Sterba1-48/+49
And change the message level to KERN_INFO. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-06-14btrfs: add debug check for extent_io range alignmentDavid Sterba1-0/+27
The 'end' value must exactly cover the end of the interval, which means one byte less than the expected block alignment, or in case of a file smaller than one block, one byte less than the inode size. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
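The rule stated above can be sketched as a small predicate. This is a minimal user-space illustration, not the check added by the patch: the helper name `range_end_is_valid` and its parameters are hypothetical, and `end` is treated as the offset of the last byte in the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's implementation: returns true when
 * 'end' (the offset of the last byte of the range) is an acceptable range
 * end according to the rule in the commit message.
 */
static bool range_end_is_valid(uint64_t end, uint64_t blocksize, uint64_t isize)
{
	assert(blocksize != 0);

	/* Case 1: 'end' is one byte less than a block boundary. */
	if ((end + 1) % blocksize == 0)
		return true;

	/* Case 2: the file is smaller than one block and 'end' is its last byte. */
	if (isize < blocksize && end + 1 == isize)
		return true;

	return false;
}
```

With a 4096-byte block, an `end` of 4095 passes and 4096 does not; for a 100-byte file, 99 passes because it is one byte less than the inode size.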
2013-06-14Btrfs: fix check on same raid type flag twiceHenrik Nordvik1-1/+1
Code checked for raid 5 flag in two else-if branches, so code would never be reached. Probably a copy-paste bug. Signed-off-by: Henrik Nordvik <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
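The shape of this kind of copy-paste bug is easy to reproduce outside the kernel. The sketch below is only an illustration: the flag values and the `parity_stripes_*` functions are hypothetical and are not taken from the patched code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values; hypothetical, not the kernel's definitions. */
#define DEMO_RAID5 (1u << 0)
#define DEMO_RAID6 (1u << 1)

/* Buggy shape: the second branch repeats the RAID5 test and can never run. */
static int parity_stripes_buggy(uint32_t flags)
{
	if (flags & DEMO_RAID5)
		return 1;
	else if (flags & DEMO_RAID5)	/* copy-paste: should test DEMO_RAID6 */
		return 2;
	return 0;
}

/* Fixed shape: each branch tests a different flag. */
static int parity_stripes_fixed(uint32_t flags)
{
	if (flags & DEMO_RAID5)
		return 1;
	else if (flags & DEMO_RAID6)
		return 2;
	return 0;
}
```

In the buggy variant a RAID6 flag falls through to the default case, which is the "code would never be reached" symptom described above.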
2013-06-08Btrfs: stop all workers before cleaning up rootsJosef Bacik1-3/+3
Dave reported a panic because the extent_root->commit_root was NULL in the caching kthread. That is because we just unset it in free_root_pointers, which is not the correct thing to do: we have to either wait for the caching kthread to complete or hold the extent_commit_sem lock so we know the thread has exited. This patch makes the kthreads all stop first and then we do our cleanup. This should fix the race. Thanks, Reported-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-06-08Btrfs: fix use-after-free bug during umountLiu Bo1-2/+2
Commit be283b2e674a09457d4563729015adb637ce7cc1 (Btrfs: use helper to cleanup tree roots) introduced the following bug,

BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
IP: [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]
[...]
Pid: 2463, comm: btrfs-cache-1 Tainted: G O 3.9.0+ #4 innotek GmbH VirtualBox/VirtualBox
RIP: 0010:[<ffffffffa039368c>] [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]
Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730)
[...]
Call Trace:
 [<ffffffffa0398a99>] btrfs_search_slot+0x104/0x64d [btrfs]
 [<ffffffffa039aea4>] btrfs_next_old_leaf+0xa7/0x334 [btrfs]
 [<ffffffffa039b141>] btrfs_next_leaf+0x10/0x12 [btrfs]
 [<ffffffffa039ea13>] caching_thread+0x1a3/0x2e0 [btrfs]
 [<ffffffffa03d8811>] worker_loop+0x14b/0x48e [btrfs]
 [<ffffffffa03d86c6>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
 [<ffffffff81068d3d>] kthread+0x8d/0x95
 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43
 [<ffffffff8151e5ac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43
RIP [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]

We've free'ed commit_root before actually getting to free block groups where caching thread needs valid extent_root->commit_root.
Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-06-08Btrfs: init relocate extent_io_tree with a mappingJosef Bacik1-4/+5
Dave reported a NULL pointer deref. This is caused because he thought he'd be smart and add sanity checks to the extent_io bit operations, but he didn't expect a tree to have a NULL mapping. To fix this we just need to init the relocation's processed_blocks with the btree_inode->i_mapping. Thanks, Reported-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-06-08btrfs: Drop inode if inode root is NULLNaohiro Aota1-0/+3
There is a path where btrfs_drop_inode() is called while its inode's root is NULL: in btrfs_new_inode(), when btrfs_set_inode_index() fails, iput() is called. We should handle this case before taking a look at the root->root_item. Signed-off-by: Naohiro Aota <[email protected]> Reviewed-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-06-08Btrfs: don't delete fs_roots until after we cleanup the transactionJosef Bacik1-1/+1
We get a use after free if we had a transaction to clean up, since there could be delayed inodes which refer to their respective fs_root. Thanks Reported-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-05-17Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-nextChris Mason16-174/+181
2013-05-17Btrfs: use a btrfs bioset instead of abusing bio internalsChris Mason9-72/+120
Btrfs has been pointer tagging bi_private and using bi_bdev to store the stripe index and mirror number of failed IOs. As bios bubble back up through the call chain, we use these to decide if and how to retry our IOs. They are also used to count IO failures on a per device basis. Recently a bio tracepoint was added, which led to crashes because we were abusing bi_bdev. This commit adds a btrfs bioset, and creates explicit fields for the mirror number and stripe index. The plan is to extend this structure for all of the fields currently in struct btrfs_bio, which will mean one less kmalloc in our IO path. Signed-off-by: Chris Mason <[email protected]> Reported-by: Tejun Heo <[email protected]>
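The idea of explicit fields next to an embedded bio can be sketched in user space as follows. All names here (`demo_bio`, `demo_io_bio`, `demo_io_bio_from`) are hypothetical stand-ins chosen for illustration; the actual structure and helper introduced by the commit may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the block layer's bio; only what the sketch needs. */
struct demo_bio {
	void *bi_private;
};

/*
 * Hypothetical wrapper: the per-IO fields live next to the embedded bio,
 * so nothing has to be hidden in pointer tag bits or in bi_bdev.
 */
struct demo_io_bio {
	unsigned int mirror_num;
	unsigned int stripe_index;
	struct demo_bio bio;
};

/* Recover the wrapper from a pointer to its embedded bio (container_of style). */
static struct demo_io_bio *demo_io_bio_from(struct demo_bio *bio)
{
	return (struct demo_io_bio *)((char *)bio - offsetof(struct demo_io_bio, bio));
}
```

Code that only sees the plain bio can get back to the mirror number and stripe index through the wrapper, and generic code such as a tracepoint is free to read the bio's own fields.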
2013-05-17Btrfs: make sure roots are assigned before freeing their nodesJosef Bacik1-18/+21
If we fail to load the chunk tree we'll call free_root_pointers, except we may not have assigned the roots for the dev_root/extent_root/csum_root yet, so we could NULL pointer deref at this point. Just add checks to make sure these roots are set to keep us from panicking. Thanks, Signed-off-by: Josef Bacik <[email protected]>
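The guard described above amounts to checking each root pointer before touching what it points to. A minimal sketch, assuming a hypothetical `demo_root` type and helper rather than the kernel's free_root_pointers:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tree root holding two buffers. */
struct demo_root {
	void *node;
	void *commit_root;
};

/*
 * Free the buffers of a root if the root was ever assigned.
 * Returns 1 when something was released, 0 when the root was not set.
 */
static int demo_free_root_nodes(struct demo_root *root)
{
	if (!root)	/* root not assigned yet: nothing to free */
		return 0;

	free(root->node);
	root->node = NULL;
	free(root->commit_root);
	root->commit_root = NULL;
	return 1;
}
```

Calling the helper for a root that was never set up is then a harmless no-op instead of a NULL pointer dereference.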
2013-05-17Btrfs: explicitly use global_block_rsv for quota_treeStefan Behrens1-0/+2
The quota_tree was previously set up to use the empty_block_rsv, which would be problematic when the filesystem is filled up and ENOSPC happens during internal operations while the quota tree is updated and COWed (when the btrfs_qgroup_info_item items are written). In fact, use_block_rsv(), which is used in btrfs_cow_block(), falls back to the global_block_rsv in this case. But just in order to make it more clear what is happening, change it to explicitly use the global_block_rsv. Signed-off-by: Stefan Behrens <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17btrfs: do away with non-whole_page extent I/OAlexandre Oliva1-55/+30
end_bio_extent_readpage computes whole_page based on bv_offset and bv_len, without taking into account that blk_update_request may modify them when some of the blocks to be read into a page produce a read error. This would cause the read to unlock only part of the file range associated with the page, which would in turn leave the entire page locked, which would not only keep the process blocked instead of returning -EIO to it, but also prevent any further access to the file. It turns out that btrfs always issues whole-page reads and writes. The special handling of non-whole_page appears to be a mistake or a left-over from a time when this wasn't the case. Indeed, end_bio_extent_writepage distinguished between whole_page and non-whole_page writes but behaved identically in both cases! I've replaced the whole_page computations with warnings, just to be sure that we're not issuing partial page reads or writes. The warnings should probably just go away some time. Signed-off-by: Alexandre Oliva <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock contextMiao Xie1-0/+6
btrfs_invalidate_inodes() may sleep, so we should not invoke it in the spin lock context. Fix it. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix()Miao Xie1-1/+0
We have checked if ->node is NULL or not, so it is unnecessary to use BUG_ON() to check again. Remove it. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: pause the space balance when remounting to R/OMiao Xie1-0/+1
Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: fix unprotected root node of the subvolume's inode rb-treeMiao Xie1-4/+3
The root node of the rb-tree may be changed, so we should get it under the lock. Fix it. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: fix accessing a freed tree rootMiao Xie1-1/+1
inode_tree_del() will move the tree root into the dead root list, and then the tree will be destroyed by the cleaner. So if we remove the delayed node which is cached in the inode after inode_tree_del(), we may access a freed tree root. Fix it. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: return errno if possible when we fail to allocate memoryLiu Bo1-2/+6
We need to set return value explicitly, otherwise we'll lose the error value. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: update the global reserve if it is emptyMiao Xie1-1/+8
Before applying this patch, we reserved the space for the global reserve by the minimum unit if we found it was empty. That was unreasonable and inefficient, because if the global reserve space was depleted, it implied that the size of the global reserve was too small. In this case, we should update the global reserve and fill it. Cc: Tsutomu Itoh <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: don't steal the reserved space from the global reserve if their space type is differentMiao Xie1-2/+4
If the type of the space we need is different from that of the global reserve, we can not steal the space from the global reserve, because we can not allocate the space from the free space cache that the global reserve points to. Cc: Tsutomu Itoh <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: optimize the error handle of use_block_rsv()Miao Xie1-37/+28
cc: Tsutomu Itoh <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: don't use global block reservation for inode cache truncationMiao Xie5-22/+34
It is very likely that there are lots of subvolumes/snapshots in the filesystem, so if we use global block reservation to do inode cache truncation, we may hog all the free space that is reserved in global rsv. So it is better that we do the free space reservation for inode cache truncation by ourselves. Cc: Tsutomu Itoh <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: don't abort the current transaction if there is no enough space for inode cacheMiao Xie1-1/+2
The filesystem with inode cache was forced to be read-only when we umounted it.

Steps to reproduce:
 # mkfs.btrfs -f ${DEV}
 # mount -o inode_cache ${DEV} ${MNT}
 # dd if=/dev/zero of=${MNT}/file1 bs=1M count=8192
 # btrfs fi syn ${MNT}
 # dd if=${MNT}/file1 of=/dev/null bs=1M
 # rm -f ${MNT}/file1
 # btrfs fi syn ${MNT}
 # umount ${MNT}

This happened because there was not enough space to do inode cache truncation, and then we aborted the current transaction. But a no-space error is not a serious problem when we write out the inode cache, and it is safe to just skip this step if we hit this problem. So we need not abort the current transaction.
Reported-by: Tsutomu Itoh <[email protected]> Signed-off-by: Miao Xie <[email protected]> Tested-by: Tsutomu Itoh <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Correct allowed raid levels on balance.Andreas Philipp1-7/+6
Raid5 with 3 devices is well defined, while the old logic allowed raid5 only with a minimum of 4 devices when converting the block group profile via btrfs balance. Creating a raid5 with just three devices using mkfs.btrfs always worked as expected. This is now fixed and the whole logic is rewritten. Signed-off-by: Andreas Philipp <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: fix possible memory leak in replace_path()Stefan Behrens1-1/+1
In replace_path(), if read_tree_block() fails, we cannot return directly; we should free the allocated memory first, otherwise a memory leak happens. Similar to Wang's "Btrfs: fix possible memory leak in the find_parent_nodes()" patch, the current commit fixes an issue that is related to the "Btrfs: fix all callers of read_tree_block" commit. Signed-off-by: Stefan Behrens <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: fix possible memory leak in the find_parent_nodes()Wang Shilong1-1/+2
In find_parent_nodes(), if read_tree_block() fails, we cannot return directly; we should free the allocated memory first, otherwise a memory leak happens. Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
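Both leak fixes above follow the usual pattern of routing an error through a common cleanup label instead of returning directly. The sketch below shows that pattern in user space; `demo_walk`, `demo_read_block` and the allocation counter are hypothetical and only stand in for the real functions and allocations.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int demo_live_allocs;	/* number of allocations not yet freed */

static void *demo_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/* Stand-in for read_tree_block(): returns NULL when asked to fail. */
static void *demo_read_block(int fail)
{
	return fail ? NULL : demo_alloc(64);
}

static int demo_walk(int fail_read)
{
	int ret = 0;
	void *path = demo_alloc(32);
	void *eb;

	if (!path)
		return -ENOMEM;

	eb = demo_read_block(fail_read);
	if (!eb) {
		ret = -EIO;
		goto out;	/* a plain "return -EIO" here would leak 'path' */
	}
	demo_free(eb);
out:
	demo_free(path);
	return ret;
}
```

After either the success path or the failure path, `demo_live_allocs` is back at zero, which is the property the two patches restore.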
2013-05-17Btrfs: don't allow device replace on RAID5/RAID6Stefan Behrens1-0/+5
This is not yet supported and causes crashes. One sad user reported that it destroyed his filesystem. One failure is in __btrfs_map_block+0xc1f calling kmalloc(0).

0x5f21f is in __btrfs_map_block (fs/btrfs/volumes.c:4923).
4918            num_stripes = map->num_stripes;
4919            max_errors = nr_parity_stripes(map);
4920
4921            raid_map = kmalloc(sizeof(u64) * num_stripes,
4922                               GFP_NOFS);
4923            if (!raid_map) {
4924                    ret = -ENOMEM;
4925                    goto out;
4926            }
4927

There might be more issues. Until this is really tested, don't allow users to start the procedure on RAID5/RAID6 filesystems.
Signed-off-by: Stefan Behrens <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: handle running extent ops with skinny metadataJosef Bacik4-10/+12
Chris hit a bug where we weren't finding extent records when running extent ops. This is because we use the delayed_ref_head when running the extent op, which means we can't use the ->type checks to see if we are metadata. We also lose the level of the metadata we are working on. So to fix this we can just check the ->is_data section of the extent_op, and we can store the level of the buffer we were modifying in the extent_op. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: remove warn on in free space cache writeoutJosef Bacik1-3/+1
This catches block groups that are too large to properly cache. We deal with this case fine, so the warning just confuses users. Remove the warning. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: don't null pointer deref on abortJosef Bacik1-1/+1
I'm sorry, there's no excuse for this sort of work. We need to use root->leafsize since eb may be NULL. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2013-05-17btrfs: don't stop searching after encountering the wrong itemGabriel de Perthuis1-5/+5
The search ioctl skips items that are too large for a result buffer, but inline items of a certain size occurring before any search result is found would trigger an overflow and stop the search entirely. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641 Cc: [email protected] Signed-off-by: Gabriel de Perthuis <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17Btrfs: fix off-by-one in fiemapLiu Bo1-2/+2
lock_extent/unlock_extent expect an exclusive end. Tested-by: David Sterba <[email protected]> Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-17btrfs: annotate quota tree for lockdepDavid Sterba2-4/+4
Quota tree has been missing from lockdep annotations, though no warning has been seen in the wild. There's currently one entry that does not belong there, BTRFS_ORPHAN_OBJECTID. No such tree exists, it's probably a copy & paste mistake, the id is defined among tree ids. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-07Btrfs: allow superblock mismatch from older mkfsChris Mason1-0/+5
We've added new checks to make sure the super block crc is correct during mount. A fresh filesystem from an older mkfs won't have the crc set. This adds a warning when it finds a newly created filesystem but doesn't fail the mount. Signed-off-by: Chris Mason <[email protected]>
2013-05-07btrfs: enhance superblock checksDavid Sterba2-17/+71
The superblock checksum is not verified upon mount. <awkward silence> Add that check and also reorder existing checks to a more logical order. Current mkfs.btrfs does not calculate the correct checksum of super_block and thus a freshly created filesystem will fail to mount when this patch is applied. First transaction commit calculates correct superblock checksum and saves it to disk.

Reproducer:
 $ mkfs.btrfs /dev/sda
 $ mount /dev/sda /mnt
 $ btrfs scrub start /mnt
 $ sleep 5
 $ btrfs scrub status /mnt
 ... super:2 ...

Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
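A mount-time checksum check of this kind can be sketched in user space. The sketch assumes a CRC-32C checksum stored in the first bytes of the block and covering everything after it; the 4-byte checksum field, the native byte order and the `demo_*` names are simplifications for illustration, not the on-disk format or the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t demo_crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

#define DEMO_CSUM_SIZE 4	/* assumed width of the stored checksum */

/* Compute the checksum of the payload and store it at the start of the block. */
static void demo_block_set_csum(uint8_t *block, size_t len)
{
	uint32_t crc;

	assert(len > DEMO_CSUM_SIZE);
	crc = demo_crc32c(block + DEMO_CSUM_SIZE, len - DEMO_CSUM_SIZE);
	memcpy(block, &crc, sizeof(crc));
}

/* Return nonzero when the stored checksum matches the payload. */
static int demo_block_csum_ok(const uint8_t *block, size_t len)
{
	uint32_t stored;

	assert(len > DEMO_CSUM_SIZE);
	memcpy(&stored, block, sizeof(stored));
	return stored == demo_crc32c(block + DEMO_CSUM_SIZE, len - DEMO_CSUM_SIZE);
}
```

A block whose payload changes after the checksum was written fails `demo_block_csum_ok`, which corresponds to the mount refusing a superblock whose checksum was never computed correctly.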
2013-05-06btrfs: fix misleading variable name for flagsDavid Sterba2-19/+20
The variable was named 'data' in btrfs_reserve_extent and that's the only function that actually uses it to let btrfs_get_alloc_profile know what profile we want. Then it's passed down as u64 flags. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-06btrfs: use unsigned long type for extent state bitsDavid Sterba3-37/+40
Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-06Btrfs: improve the loop of scrub_stripeLiu Bo1-26/+57
1) Right now scrub_stripe() is looping in some unnecessary cases:
 * when the found extent item's objectid is already out of the dev extent's range but we haven't finished scanning all the range within the dev extent
 * when all the items have been processed but we haven't finished scanning all the range within the dev extent
In both cases, we can just finish the loop to save costs.
2) Besides, when the found extent item's length is larger than the stripe len (64k), we don't have to release the path and search again as it'll get at the same key used in the last loop; we can instead increase the logical cursor in place till all space of the extent is scanned.
3) And we use 0 as the key's offset to search the btree, then get to the previous item to find a smaller item, and again have to move to the next one to get the right item. Setting offset=-1 and previous_item() is the correct way.
4) As we won't find any checksum at offset unless this 'offset' is in a data extent, we can just find the checksum when we're really going to scrub an extent.
Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
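Point 3 above can be illustrated with a sorted array in place of the btree. The sketch assumes keys ordered by (objectid, offset) and a search that returns the insertion slot; `demo_key`, `demo_search_slot` and `demo_find_last_le` are hypothetical names, and no stored key is assumed to have the maximum offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified two-part key, ordered by objectid and then by offset. */
struct demo_key {
	uint64_t objectid;
	uint64_t offset;
};

static int demo_key_cmp(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Binary search: index of the first key that is not smaller than target. */
static size_t demo_search_slot(const struct demo_key *keys, size_t n,
			       const struct demo_key *target)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (demo_key_cmp(&keys[mid], target) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Search with the maximum offset (the "offset = -1" trick) and step back
 * one slot: this lands directly on the last key whose objectid is not
 * larger than the one asked for. Returns -1 when there is no such key.
 */
static long demo_find_last_le(const struct demo_key *keys, size_t n,
			      uint64_t objectid)
{
	struct demo_key target = { objectid, UINT64_MAX };

	return (long)demo_search_slot(keys, n, &target) - 1;
}
```

Searching with offset 0 instead returns the slot of the wanted key itself, so stepping back lands on a smaller item and a second step forward is needed, which is the detour the commit removes.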
2013-05-06btrfs: read entire device info under lockDavid Sterba1-1/+1
There's a theoretical possibility of reading stale (or even more theoretically, freed) data from DEV_INFO ioctl when the device would disappear between an early mutex unlock and data being copied from the device structure. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-06btrfs: remove unused gfp mask parameter from release_extent_buffer callchainDavid Sterba3-16/+7
It's unused since 0b32f4bbb423f02ac. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-06btrfs: handle errors returned from get_tree_block_keyDavid Sterba1-4/+8
Signed-off-by: David Sterba <[email protected]> Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-06btrfs: make static code static & remove dead codeEric Sandeen34-392/+135
Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout.

removed functions:
 btrfs_iref_to_path()
 __btrfs_lookup_delayed_deletion_item()
 __btrfs_search_delayed_insertion_item()
 __btrfs_search_delayed_deletion_item()
 find_eb_for_page()
 btrfs_find_block_group()
 range_straddles_pages()
 extent_range_uptodate()
 btrfs_file_extent_length()
 btrfs_scrub_cancel_devid()
 btrfs_start_transaction_lflush()

btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those.
Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-06Btrfs: deal with errors in write_dev_supersJosef Bacik1-1/+11
If you try to mount -o loop a restored file system it will panic if the file ends up being smaller than the original disk. This is because we go to try and get a block for a super that may be past the EOF which makes __getblk return NULL for a buffer head when we aren't expecting it to. Fix this by dealing with this case and just jacking up the errors count. With this patch we no longer panic when mounting a restored file system loopback. Thanks, Signed-off-by: Josef Bacik <[email protected]>
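The fix described above boils down to counting a missing buffer as an error and moving on to the next superblock copy. A user-space sketch under stated assumptions: `demo_get_block` stands in for __getblk returning NULL past EOF, the 4096-byte copy size is illustrative, and the rule that one successfully written copy is enough is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SUPER_SIZE 4096u	/* illustrative size of one superblock copy */

/* Stand-in for __getblk(): yields no buffer when the copy would lie past EOF. */
static int demo_get_block(uint64_t offset, uint64_t dev_size)
{
	return offset + DEMO_SUPER_SIZE <= dev_size;
}

/* Try to write every superblock copy; tolerate copies that cannot be reached. */
static int demo_write_supers(const uint64_t *offsets, int count, uint64_t dev_size)
{
	int errors = 0;

	assert(count > 0);
	for (int i = 0; i < count; i++) {
		if (!demo_get_block(offsets[i], dev_size)) {
			errors++;	/* count the failure instead of dereferencing NULL */
			continue;
		}
		/* A real implementation would fill in and submit the buffer here. */
	}

	/* Assumption: the write succeeds if at least one copy was written. */
	return errors < count ? 0 : -1;
}
```

On a backing file shorter than the original disk, the copies beyond EOF are simply counted as errors, and the call still succeeds as long as one copy fits.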
2013-05-06Btrfs: remove almost all of the BUG()'s from tree-log.cJosef Bacik1-53/+98
There were a whole bunch and I was doing it for other things. I haven't tested these error paths but at the very least this is better than panicking. I've only left 2 BUG_ON()'s since they are logic errors and I want to replace them with an ASSERT framework that we can compile out for production users. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2013-05-06Btrfs: deal with free space cache errors while replaying logJosef Bacik3-32/+59
So everybody who got hit by my fsync bug will still continue to hit this BUG_ON() in the free space cache, which is pretty heavy handed. So I took a file system that had this bug and fixed up all the BUG_ON()'s and leaks that popped up when I tried to mount a broken file system like this. With this patch we just fail to mount instead of panicking. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2013-05-06Btrfs: automatic rescan after "quota enable" commandJan Schmidt1-0/+11
When qgroup tracking is enabled, we do an automatic cycle of the new rescan mechanism. Signed-off-by: Jan Schmidt <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-05-06Btrfs: rescan for qgroupsJan Schmidt5-35/+400
If qgroup tracking is out of sync, a rescan operation can be started. It iterates the complete extent tree and recalculates all qgroup tracking data. This is an expensive operation and should not be used unless required. A filesystem under rescan can still be umounted. The rescan continues on the next mount. Status information is provided with a separate ioctl while a rescan operation is in progress. Signed-off-by: Jan Schmidt <[email protected]> Signed-off-by: Josef Bacik <[email protected]>