aboutsummaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)AuthorFilesLines
2015-02-02btrfs: cleanup, rename a few variables in btrfs_read_sys_arrayDavid Sterba1-15/+16
There's a pointer to buffer, integer offset and offset passed as pointer, try to find matching names for them. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-02-02btrfs: add checks for sys_chunk_array sizesDavid Sterba1-0/+19
Verify that possible minimum and maximum size is set, validity of contents is checked in btrfs_read_sys_array. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-02-02btrfs: more superblock checks, lower bounds on devices and sectorsize/nodesizeDavid Sterba1-0/+19
I received a few crafted images from Jiri, all got through the recently added superblock checks. The lower bounds checks for num_devices and sector/node -sizes were missing and caused a crash during mount. Tools for symbolic code execution were used to prepare the images contents. Reported-by: Jiri Slaby <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-02-02Btrfs: Add code to support file creation timechandan r3-2/+36
This patch adds a new member to the 'struct btrfs_inode' structure to hold the file creation time. Signed-off-by: chandan <[email protected]> [refreshed, removed btrfs_inode_otime] Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-02-02btrfs: kill btrfs_inode_*time helpersDavid Sterba5-69/+33
They just opencode taking address of the timespec member. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-30Merge branch 'for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "We have one more fix for btrfs in my for-linus branch - this was a bug in the new raid5/6 scrubbing support" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix raid56 scrub failed in xfstests btrfs/072
2015-01-27btrfs: fix raid56 scrub failed in xfstests btrfs/072Gui Hecheng1-0/+2
The xfstests btrfs/072 reports uncorrectable read errors in dmesg, because scrub forgets to use commit_root for parity scrub routine and scrub attempts to scrub those extents items whose contents are not fully on disk. To fix it, we just add the @search_commit_root flag back. Signed-off-by: Gui Hecheng <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-24Merge branch 'for-linus' of ↵Linus Torvalds6-6/+17
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We have a few fixes in my for-linus branch. Qu Wenruo's batch fix a regression between some our merge window pull and the inode_cache feature. The rest are smaller bugs" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. btrfs: Fix the bug that fs_info->pending_changes is never cleared. btrfs: fix state->private cast on 32 bit machines Btrfs: fix race deleting block group from space_info->ro_bgs list Btrfs: fix incorrect freeing in scrub_stripe btrfs: sync ioctl, handle errors after transaction start
2015-01-22Btrfs: insert_new_root: Fix lock type of the extent buffer.chandan1-1/+1
btrfs_alloc_tree_block() returns an extent buffer on which a blocked lock has been taken. Hence assign the appropriate value to path->locks[level]. Signed-off-by: Chandan Rajendra <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: fix unused members in struct btrfs_rootAnand Jain2-4/+0
There isn't any real use of following members of struct btrfs_root so delete them. struct kobject root_kobj; struct completion kobj_unregister; Signed-off-by: Anand Jain <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21btrfs: qgroup: move WARN_ON() to the correct location.Yang Dongsheng1-2/+1
In function qgroup_excl_accounting(), we need to WARN when qg->excl is less than what we want to free, same to child and parents. But currently, for parent qgroup, the WARN_ON() is located after freeing qg->excl. It will WARN out even we free it normally. This patch move this WARN_ON() before freeing qg->excl. Signed-off-by: Dongsheng Yang <[email protected]> Reviewed-by: Satoru Takeuchi <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: cleanup unused run_mostLiu Bo1-4/+1
"run_most" is not used anymore. Signed-off-by: Liu Bo <[email protected]> Reviewed-by: Satoru Takeuchi <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Rename all ref_count to refs in structZhao Lei1-13/+13
refs is better than ref_count to record a struct's ref count. Signed-off-by: Zhao Lei <[email protected]> Suggested-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Introduce BTRFS_BLOCK_GROUP_RAID56_MASK to check raid56 simplyZhao Lei4-28/+19
So we can check raid56 with: (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) instead of long: (map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Include map_type in raid_bioZhao Lei4-18/+22
Corrent code use many kinds of "clever" way to determine operation target's raid type, as: raid_map != NULL or raid_map[MAX_NR] == RAID[56]_Q_STRIPE To make code easy to maintenance, this patch put raid type into bbio, and we can always get raid type from bbio with a "stupid" way. Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Simplify scrub_setup_recheck_block()'s argumentZhao Lei1-16/+9
scrub_setup_recheck_block() have many arguments but most of them can be get from one of them, we can remove them to make code clean. Some other cleanup for that function also included in this patch. Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Combine per-page recover in dev-replace and scrubZhao Lei1-72/+48
The code are similar, combine them to make code clean and easy to maintenance. Some lost condition are also completed with benefit of this combination. Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Separate finding-right-mirror and writing-to-target's process in ↵Zhao Lei1-27/+17
scrub_handle_errored_block() In corrent code, code of finding-right-mirror and writing-to-target are mixed in logic, if we find a right mirror but failed in writing to target, it will treat as "hadn't found right block", and fill the target with sblock_bad. Actually, "failed in writing to target" does not mean "source block is wrong", this patch separate above two condition in logic, and do some cleanup to make code clean. Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Break loop when reach BTRFS_MAX_MIRRORS in scrub_setup_recheck_block()Zhao Lei1-1/+1
Use break instead of useless loop should be more suitable in this case. Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: btrfs_rm_dev_replace_blocked(): Use wait_event()Zhao Lei1-11/+2
Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Cleanup btrfs_bio_counter_inc_blocked()Zhao Lei1-6/+6
1: Remove no-need DEFINE_WAIT(wait) 2: Add likely() for BTRFS_FS_STATE_DEV_REPLACING condition 3: Use while loop instead of goto Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Remove noneed force_write in scrub_write_block_to_dev_replaceZhao Lei1-12/+7
It is always 1 in this place, because !1 case was already jumped out in previous code. Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Fix a jump typo of nodatasum_case to avoid wrong WARN_ON()Zhao Lei1-1/+2
if (sctx->is_dev_replace && !is_metadata && !have_csum) { ... goto nodatasum_case; } ... nodatasum_case: WARN_ON(sctx->is_dev_replace); In above code, nodatasum_case marker should be moved after WARN_ON(). Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: add ref_count and free function for btrfs_bioZhao Lei7-54/+57
1: ref_count is simple than current RBIO_HOLD_BBIO_MAP_BIT flag to keep btrfs_bio's memory in raid56 recovery implement. 2: free function for bbio will make code clean and flexible, plus forced data type checking in compile. Changelog v1->v2: Rename following by David Sterba's suggestion: put_btrfs_bio() -> btrfs_put_bio() get_btrfs_bio() -> btrfs_get_bio() bbio->ref_count -> bbio->refs Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: Make raid_map array be inlined in btrfs_bio structureZhao Lei5-125/+105
It can make code more simple and clear, we need not care about free bbio and raid_map together. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: sort raid_map before adding tgtdev stripesZhao Lei1-14/+8
It can avoid complex calculation of real stripes in sort, moreover, we can clean up code of sorting tgtdev_map because it will be in order initially. Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: fix a out-of-bound access of raid_mapZhao Lei1-2/+5
We add the number of stripes on target devices into bbio->num_stripes if we are under device replacement, and we just sort the raid_map of those stripes that not on the target devices, so if when we need real raid_map, we need skip the stripes on the target devices. Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: fix fsync log replay for inodes with a mix of regular refs and extrefsFilipe Manana2-5/+43
If we have an inode with a large number of hard links, some of which may be extrefs, turn a regular ref into an extref, fsync the inode and then replay the fsync log (after a crash/reboot), we can endup with an fsync log that makes the replay code always fail with -EOVERFLOW when processing the inode's references. This is easy to reproduce with the test case I made for xfstests. Its steps are the following: _scratch_mkfs "-O extref" >> $seqres.full 2>&1 _init_flakey _mount_flakey # Create a test file with 3001 hard links. This number is large enough to # make btrfs start using extrefs at some point even if the fs has the maximum # possible leaf/node size (64Kb). echo "hello world" > $SCRATCH_MNT/foo for i in `seq 1 3000`; do ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_`printf "%04d" $i` done # Make sure all metadata and data are durably persisted. sync # Now remove one link, add a new one with a new name, add another new one with # the same name as the one we just removed and fsync the inode. rm -f $SCRATCH_MNT/foo_link_0001 ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_3001 ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_0001 rm -f $SCRATCH_MNT/foo_link_0002 ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_3002 ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_3003 $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo # Simulate a crash/power loss. This makes sure the next mount # will see an fsync log and will replay that log. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # Check that the number of hard links is correct, we are able to remove all # the hard links and read the file's data. This is just to verify we don't # get stale file handle errors (due to dangling directory index entries that # point to inodes that no longer exist). echo "Link count: $(stat --format=%h $SCRATCH_MNT/foo)" [ -f $SCRATCH_MNT/foo ] || echo "Link foo is missing" for ((i = 1; i <= 3003; i++)); do name=foo_link_`printf "%04d" $i` if [ $i -eq 2 ]; then [ -f $SCRATCH_MNT/$name ] && echo "Link $name found" else [ -f $SCRATCH_MNT/$name ] || echo "Link $name is missing" fi done rm -f $SCRATCH_MNT/foo_link_* cat $SCRATCH_MNT/foo rm -f $SCRATCH_MNT/foo status=0 exit The fix is simply to correct the overflow condition when overwriting a reference item because it was wrong, trying to increase the item in the fs/subvol tree by an impossible amount. Also ensure that we don't insert one normal ref and one ext ref for the same dentry - this happened because processing a dir index entry from the parent in the log happened when the normal ref item was full, which made the logic insert an extref and later when the normal ref had enough room, it would be inserted again when processing the ref item from the child inode in the log. This issue has been present since the introduction of the extrefs feature (2012). A test case for xfstests follows soon. This test only passes if the previous patch titled "Btrfs: fix fsync when extend references are added to an inode" is applied too. Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: fix fsync when extend references are added to an inodeFilipe Manana1-11/+16
If we added an extended reference to an inode and fsync'ed it, the log replay code would make our inode have an incorrect link count, which was lower then the expected/correct count. This resulted in stale directory index entries after deleting some of the hard links, and any access to the dangling directory entries resulted in -ESTALE errors because the entries pointed to inode items that don't exist anymore. This is easy to reproduce with the test case I made for xfstests, and the bulk of that test is: _scratch_mkfs "-O extref" >> $seqres.full 2>&1 _init_flakey _mount_flakey # Create a test file with 3001 hard links. This number is large enough to # make btrfs start using extrefs at some point even if the fs has the maximum # possible leaf/node size (64Kb). echo "hello world" > $SCRATCH_MNT/foo for i in `seq 1 3000`; do ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_`printf "%04d" $i` done # Make sure all metadata and data are durably persisted. sync # Add one more link to the inode that ends up being a btrfs extref and fsync # the inode. ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_3001 $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo # Simulate a crash/power loss. This makes sure the next mount # will see an fsync log and will replay that log. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # Now after the fsync log replay btrfs left our inode with a wrong link count N, # which was smaller than the correct link count M (N < M). # So after removing N hard links, the remaining M - N directory entries were # still visible to user space but it was impossible to do anything with them # because they pointed to an inode that didn't exist anymore. This resulted in # stale file handle errors (-ESTALE) when accessing those dentries for example. # # So remove all hard links except the first one and then attempt to read the # file, to verify we don't get an -ESTALE error when accessing the inodel # # The btrfs fsck tool also detected the incorrect inode link count and it # reported an error message like the following: # # root 5 inode 257 errors 2001, no inode item, link count wrong # unresolved ref dir 256 index 2978 namelen 13 name foo_link_2976 filetype 1 errors 4, no inode ref # # The fstests framework automatically calls fsck after a test is run, so we # don't need to call fsck explicitly here. rm -f $SCRATCH_MNT/foo_link_* cat $SCRATCH_MNT/foo status=0 exit So make sure an fsync always flushes the delayed inode item, so that the fsync log contains it (needed in order to trigger the link count fixup code) and fix the extref counting function, which always return -ENOENT to its caller (and made it assume there were always 0 extrefs). This issue has been present since the introduction of the extrefs feature (2012). A test case for xfstests follows soon. Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: fix directory inconsistency after fsync log replayFilipe Manana1-2/+18
If we have an inode (file) with a link count greater than 1, remove one of its hard links, fsync the inode, power fail/crash and then replay the fsync log on the next mount, we end up getting the parent directory's metadata inconsistent - its i_size still reflects the deleted hard link and has dangling index entries (with no matching inode reference entries). This prevents the directory from ever being deletable, as its i_size can never decrease to BTRFS_EMPTY_DIR_SIZE even if all of its children inodes are deleted, and the dangling index entries can never be removed (as they point to an inode that does not exist anymore). This is easy to reproduce with the following excerpt from the test case for xfstests that I just made: _scratch_mkfs >> $seqres.full 2>&1 _init_flakey _mount_flakey # Create a test file with 2 hard links in the same directory. mkdir -p $SCRATCH_MNT/a/b echo "hello world" > $SCRATCH_MNT/a/b/foo ln $SCRATCH_MNT/a/b/foo $SCRATCH_MNT/a/b/bar # Make sure all metadata and data are durably persisted. sync # Now remove one of the hard links and fsync the inode. rm -f $SCRATCH_MNT/a/b/bar $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a/b/foo # Simulate a crash/power loss. This makes sure the next mount # will see an fsync log and will replay that log. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # Remove the last hard link of the file and attempt to remove its parent # directory - this failed in btrfs because the fsync log and replay code # didn't decrement the parent directory's i_size and left dangling directory # index entries - this made the btrfs rmdir implementation always fail with # the error -ENOTEMPTY. # # The dangling directory index entries were visible to user space, but it was # impossible to do anything on them (unlink, open, read, write, stat, etc) # because the inode they pointed to did not exist anymore. # # The parent directory's metadata inconsistency (stale index entries) was # also detected by btrfs' fsck tool, which is run automatically by the fstests # framework when the test finishes. The error message reported by fsck was: # # root 5 inode 259 errors 2001, no inode item, link count wrong # unresolved ref dir 258 index 3 namelen 3 name bar filetype 1 errors 4, no inode ref # rm -f $SCRATCH_MNT/a/b/* rmdir $SCRATCH_MNT/a/b rmdir $SCRATCH_MNT/a To fix this just make sure that after an unlink, if the inode is fsync'ed, he parent inode is fully logged in the fsync log. A test case for xfstests follows soon. Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: lookup for block group only if needed when freeing a tree blockFilipe Manana1-4/+6
Very often our extent buffer's header generation doesn't match the current transaction's id or it is also referenced by other trees (snapshots), so we don't need the corresponding block group cache object. Therefore only search for it if we are going to use it, so we avoid an unnecessary search in the block groups rbtree (and acquiring and releasing its spinlock). Freeing a tree block is performed when COWing or deleting a node/leaf, which implies we are holding the node/leaf's parent node lock, therefore reducing the amount of time spent when freeing a tree block helps reducing the amount of time we are holding the parent node's lock. For example, for a run of xfstests/generic/083, the block group cache object was needed only 682 times for a total of 226691 calls to free a tree block. Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21btrfs: remove a no-op unfreeze superbock callbackDavid Sterba1-6/+0
Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21btrfs: switch extent_state state to unsignedDavid Sterba4-55/+55
Currently there's a 4B hole in the structure between refs and state and there are only 16 bits used so we can make it unsigned. This will get a better packing and may save some stack space for local variables. The size of extent_state gets reduced by 8B and there are usually a lot of slab objects. struct extent_state { u64 start; /* 0 8 */ u64 end; /* 8 8 */ struct rb_node rb_node; /* 16 24 */ wait_queue_head_t wq; /* 40 24 */ /* --- cacheline 1 boundary (64 bytes) --- */ atomic_t refs; /* 64 4 */ /* XXX 4 bytes hole, try to pack */ long unsigned int state; /* 72 8 */ u64 private; /* 80 8 */ /* size: 88, cachelines: 2, members: 7 */ /* sum members: 84, holes: 1, sum holes: 4 */ /* last cacheline: 24 bytes */ }; Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21btrfs: set proper message level for skinny metadataDavid Sterba1-1/+1
This has been confusing people for too long, the message is really just informative. CC: <[email protected]> # 3.10+ Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21btrfs: update message levels after checksum errorsDavid Sterba2-2/+2
The errors are worth noting and might get missed with INFO level. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21btrfs: update message levels during failed mountDavid Sterba1-8/+8
All error conditions from open_ctree shall be ERR. Warning would suggest that something's wrong and we can continue. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21btrfs: update message levels for errorsDavid Sterba2-5/+6
Several messages that point to some internal problem, level INFO is wrong here. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: fix setup_leaf_for_split() to avoid leaf corruptionFilipe Manana1-2/+4
We were incorrectly detecting when the target key didn't exist anymore after releasing the path and re-searching the tree. This could make us split or duplicate (btrfs_split_item() and btrfs_duplicate_item() are its only callers at the moment) an item when we should not. For the case of duplicating an item, we currently only duplicate checksum items (csum tree) and file extent items (fs/subvol trees). For the checksum items we end up overriding the item completely, but for file extent items we update only some of their fields in the copy (done in __btrfs_drop_extents), which means we can end up having a logical corruption for some values. Also for the case where we duplicate a file extent item it will make us produce a leaf with a wrong key order, as btrfs_duplicate_item() advances us to the next slot and then its caller sets a smaller key on the new item at that slot (like in __btrfs_drop_extents() e.g.). Alternatively if the tree search in setup_leaf_for_split() leaves with path->slots[0] == btrfs_header_nritems(path->nodes[0]), we end up accessing beyond the leaf's end (when we check if the item's size has changed) and make our caller insert an item at the invalid slot btrfs_header_nritems(path->nodes[0]) + 1, causing an invalid memory access if the leaf is full or nearly full. This issue has been present since the introduction of this function in 2009: Btrfs: Add btrfs_duplicate_item commit ad48fd754676bfae4139be1a897b1ea58f9aaf21 Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Merge branch 'cleanup/blocksize-diet-part2' of ↵Chris Mason13-73/+84
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus
2015-01-21Merge branch 'fix/find-item-path-leak' of ↵Chris Mason7-49/+46
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus
2015-01-21Btrfs: track dirty block groups on their own listJosef Bacik5-124/+72
Currently any time we try to update the block groups on disk we will walk _all_ block groups and check for the ->dirty flag to see if it is set. This function can get called several times during a commit. So if you have several terabytes of data you will be a very sad panda as we will loop through _all_ of the block groups several times, which makes the commit take a while which slows down the rest of the file system operations. This patch introduces a dirty list for the block groups that we get added to when we dirty the block group for the first time. Then we simply update any block groups that have been dirtied since the last time we called btrfs_write_dirty_block_groups. This allows us to clean up how we write the free space cache out so it is much cleaner. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Btrfs: change how we track dirty rootsJosef Bacik3-5/+21
I've been overloading root->dirty_list to keep track of dirty roots and which roots need to have their commit roots switched at transaction commit time. This could cause us to lose an update to the root which could corrupt the file system. To fix this use a state bit to know if the root is dirty, and if it isn't set we go ahead and move the root to the dirty list. This way if we re-dirty the root after adding it to the switch_commit list we make sure to update it. This also makes it so that the extent root is always the last root on the dirty list to try and keep the amount of churn down at this point in the commit. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-21Merge branch 'for-mingo' of ↵Ingo Molnar1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates. Signed-off-by: Ingo Molnar <[email protected]>
2015-01-20btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.Qu Wenruo1-0/+10
Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending changes) will call btrfs_start_transaction() in sync_fs(), to handle some operations needed to be done in next transaction. However this can cause deadlock if the filesystem is frozen, with the following sys_r+w output: [ 143.255932] Call Trace: [ 143.255936] [<ffffffff816c0e09>] schedule+0x29/0x70 [ 143.255939] [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100 [ 143.255971] [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0 [btrfs] [ 143.255992] [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 143.256003] [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs] [ 143.256007] [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30 [ 143.256011] [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0 [ 143.256014] [<ffffffff811f7d75>] sys_sync+0x55/0x90 [ 143.256017] [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17 [ 143.256111] Call Trace: [ 143.256114] [<ffffffff816c0e09>] schedule+0x29/0x70 [ 143.256119] [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0 [ 143.256123] [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20 [ 143.256131] [<ffffffff811caae8>] thaw_super+0x28/0xc0 [ 143.256135] [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540 [ 143.256187] [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0 [ 143.256213] [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17 The reason is like the following: (Holding s_umount) VFS sync_fs staff: |- btrfs_sync_fs() |- btrfs_start_transaction() |- sb_start_intwrite() (Waiting thaw_fs to unfreeze) VFS thaw_fs staff: thaw_fs() (Waiting sync_fs to release s_umount) So deadlock happens. This can be easily triggered by fstest/generic/068 with inode_cache mount option. The fix is to check if the fs is frozen, if the fs is frozen, just return and waiting for the next transaction. Cc: David Sterba <[email protected]> Reported-by: Gui Hecheng <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> [enhanced comment, changed to SB_FREEZE_WRITE] Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-20btrfs: Fix the bug that fs_info->pending_changes is never cleared.Qu Wenruo1-1/+1
Fs_info->pending_changes is never cleared since the original code uses cmpxchg(&fs_info->pending_changes, 0, 0), which will only clear it if pending_changes is already 0. This will cause a lot of problem when mount it with inode_cache mount option. If the btrfs is mounted as inode_cache, pending_changes will always be 1, even when the fs is frozen. Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2015-01-20fs: remove default_backing_dev_infoChristoph Hellwig1-1/+1
Now that default_backing_dev_info is not used for writeback purposes we can git rid of it easily: - instead of using it's name for tracing unregistered bdi we just use "unknown" - btrfs and ceph can just assign the default read ahead window themselves like several other filesystems already do. - we can assign noop_backing_dev_info as the default one in alloc_super. All filesystems already either assigned their own or noop_backing_dev_info. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-01-20fs: remove mapping->backing_dev_infoChristoph Hellwig2-7/+0
Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Ryusuke Konishi <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-01-20fs: export inode_to_bdi and use it in favor of mapping->backing_dev_infoChristoph Hellwig1-1/+1
Now that we got rid of the bdi abuse on character devices we can always use sb->s_bdi to get at the backing_dev_info for a file, except for the block device special case. Export inode_to_bdi and replace uses of mapping->backing_dev_info with it to prepare for the removal of mapping->backing_dev_info. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-01-20fs: introduce f_op->mmap_capabilities for nommu mmap supportChristoph Hellwig1-2/+1
Since "BDI: Provide backing device capability information [try #3]" the backing_dev_info structure also provides flags for the kind of mmap operation available in a nommu environment, which is entirely unrelated to it's original purpose. Introduce a new nommu-only file operation to provide this information to the nommu mmap code instead. Splitting this from the backing_dev_info structure allows to remove lots of backing_dev_info instance that aren't otherwise needed, and entirely gets rid of the concept of providing a backing_dev_info for a character device. It also removes the need for the mtd_inodefs filesystem. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Acked-by: Brian Norris <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-01-19btrfs: fix state->private cast on 32 bit machinesSatoru Takeuchi1-1/+1
Suppress the following warning displayed on building 32bit (i686) kernel. =============================================================================== ... CC [M] fs/btrfs/extent_io.o fs/btrfs/extent_io.c: In function ‘btrfs_free_io_failure_record’: fs/btrfs/extent_io.c:2193:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] failrec = (struct io_failure_record *)state->private; ... =============================================================================== Signed-off-by: Satoru Takeuchi <[email protected]> Reported-by: Chris Murphy <[email protected]> Signed-off-by: Chris Mason <[email protected]>