path: root/fs/btrfs
Age | Commit message | Author | Files | Lines
2012-07-23 | Btrfs: fix a bug of writting free space cache during balance | Liu Bo | 1 | -3/+21
Here is the whole story:
1) A free space cache consists of two parts:
   o free space cache inode, which is special because it is stored in the root tree.
   o free space info, which is stored as the above inode's file data.
   But when we _clear and setup_ a free space cache, we only build a new inode and do not flush its free space info to disk, so the block group cache's cache_state remains DC_SETUP instead of DC_WRITTEN. Holding DC_SETUP means that we will not truncate this free space cache inode, so the disk offset of its file extent will remain _unchanged_ at least until the next transaction finishes committing.
2) We can set a block group readonly when we relocate it. However, if the readonly block group covers the disk offset where our free space cache inode is going to write, it will force the free space cache inode into cow_file_range(), which ends up hitting a BUG_ON.
3) Given the above analysis, we fix this bug by adding the missing dirty flag.
4) However, there is still another case: nospace_cache. With nospace_cache we do not want to set the dirty flag; instead we just truncate the free space cache inode and bail out after setting the cache state to DC_WRITTEN. This is beneficial since it saves another 'pre-allocation' pass, which is usually expensive.
Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2012-07-23 | Btrfs: do not abort transaction in prealloc case | Liu Bo | 1 | -1/+5
During disk balance, we prealloc new file extent for file data relocation, but we may fail in 'no available space' case, and it leads to flipping btrfs into readonly. It is not necessary to bail out and abort transaction since we do have several ways to rescue ourselves from ENOSPC case. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2012-07-23 | Btrfs: kill root from btrfs_is_free_space_inode | Liu Bo | 4 | -15/+16
Since root can be fetched via the BTRFS_I macro directly, we can drop an argument from btrfs_is_free_space_inode(). Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2012-07-23 | Btrfs: fix btrfs_is_free_space_inode to recognize btree inode | Liu Bo | 1 | -2/+4
For btree inode, its root is also 'tree root', so btree inode can be misunderstood as a free space inode. We should add one more check for btree inode. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
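The extra check described above can be sketched in userspace C. This is a simplified model, not the kernel code: the `model_*` structures, the function name, and the objectid constant are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the btree inode's object id. */
#define MODEL_BTREE_INODE_OBJECTID 1ULL

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_root {
    int id;
};

struct model_inode {
    const struct model_root *root; /* root this inode belongs to */
    uint64_t objectid;
};

/*
 * A free space cache inode lives in the tree root, but so does the
 * btree inode, so a root comparison alone is not enough.
 */
static bool model_is_free_space_inode(const struct model_inode *inode,
                                      const struct model_root *tree_root)
{
    if (inode->root != tree_root)
        return false;
    /* The additional check: exclude the btree inode. */
    return inode->objectid != MODEL_BTREE_INODE_OBJECTID;
}
```

With only the root comparison, the btree inode would be reported as a free space inode; the objectid test is what separates the two.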
2012-07-23 | Btrfs: avoid I/O repair BUG() from btree_read_extent_buffer_pages() | Stefan Behrens | 1 | -1/+1
From btree_read_extent_buffer_pages(), currently repair_io_failure() can be called with mirror_num being zero when submit_one_bio() returned an error before. This used to cause a BUG_ON(!mirror_num) in repair_io_failure() and indeed this is not a case that needs the I/O repair code to rewrite disk blocks. This commit prevents calling repair_io_failure() in this case and thus avoids the BUG_ON() and malfunction. Signed-off-by: Stefan Behrens <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2012-07-23 | Btrfs: rework shrink_delalloc | Josef Bacik | 1 | -57/+24
shrink_delalloc has grown all sorts of cruft over the years thanks to many reworkings of how we track enospc. What happens now as we fill up the disk is that we loop for freaking ever hoping to reclaim an arbitrary amount of metadata space; this dates from when everybody flushed at the same time. Now we only have people flushing one at a time. So instead of trying to reclaim a huge amount of space, just try to flush a decent chunk of space, and stop looping as soon as we have enough free space to satisfy our reservation. This makes xfstests 224 go much faster. Thanks, Signed-off-by: Josef Bacik <[email protected]>
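The "flush a chunk, stop as soon as the reservation fits" loop can be sketched as follows. This is a userspace model under strong assumptions: flushing delalloc is treated as directly producing free space, and every name (`model_space`, `model_flush_chunk`, `model_shrink_delalloc`) is hypothetical.

```c
#include <stdint.h>

/* Hypothetical, simplified accounting of a space pool. */
struct model_space {
    uint64_t free_bytes;     /* space available for reservations */
    uint64_t delalloc_bytes; /* space pinned by pending delalloc */
};

/* Flush up to `chunk` bytes of delalloc; returns the amount flushed. */
static uint64_t model_flush_chunk(struct model_space *s, uint64_t chunk)
{
    uint64_t n = s->delalloc_bytes < chunk ? s->delalloc_bytes : chunk;

    s->delalloc_bytes -= n;
    s->free_bytes += n;
    return n;
}

/*
 * Flush one chunk at a time and stop as soon as `needed` bytes are
 * free, nothing is left to flush, or `max_loops` is reached.
 * Returns the number of flush passes performed.
 */
static int model_shrink_delalloc(struct model_space *s, uint64_t needed,
                                 uint64_t chunk, int max_loops)
{
    int loops = 0;

    while (s->free_bytes < needed && loops < max_loops) {
        if (model_flush_chunk(s, chunk) == 0)
            break; /* nothing left to reclaim */
        loops++;
    }
    return loops;
}
```

The point of the early exit is that the loop's cost scales with the size of the reservation being satisfied rather than with the total amount of outstanding delalloc.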
2012-07-23 | Btrfs: do not set subvolume flags in readonly mode | Liu Bo | 1 | -14/+28
  $ mkfs.btrfs /dev/sdb7
  $ btrfstune -S1 /dev/sdb7
  $ mount /dev/sdb7 /mnt/btrfs
  mount: block device /dev/sdb7 is write-protected, mounting read-only
  $ btrfs dev add /dev/sdb8 /mnt/btrfs/
Now we have a btrfs in which the mnt flags say readonly but the sb flags do not. This is a problem for ioctls that only check the sb flags for MS_RDONLY. Setting subvolume flags is such an ioctl; we should use mnt_want_write_file() to check the RO flags. Signed-off-by: Liu Bo <[email protected]>
2012-07-23 | Btrfs: use mnt_want_write_file instead of mnt_want_write | Liu Bo | 1 | -2/+2
mnt_want_write_file is faster when file has been opened for write. Signed-off-by: Liu Bo <[email protected]>
2012-07-23 | Btrfs: remove redundant r/o check for superblock | Liu Bo | 1 | -7/+0
mnt_want_write() and mnt_want_write_file() will check sb->s_flags with MS_RDONLY, and we don't need to do it ourselves. Signed-off-by: Liu Bo <[email protected]>
2012-07-23 | Btrfs: check write access to mount earlier while creating snapshots | Liu Bo | 1 | -11/+11
Move check of write access to mount into upper functions so that we can use mnt_want_write_file instead, which is faster than mnt_want_write. Signed-off-by: Liu Bo <[email protected]>
2012-07-23 | Btrfs: fix typo in cow_file_range_async and async_cow_submit | Liu Bo | 1 | -2/+2
It should be 10 * 1024 * 1024. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2012-07-23 | Btrfs: change how we indicate we're adding csums | Josef Bacik | 4 | -15/+18
There is weird logic I had to put in place to make sure that when we were adding csums we used the delalloc block rsv instead of the global block rsv. Part of this meant that we had to free up our transaction reservation before we ran the delayed refs, since csum deletion happens during the delayed ref work. The problem with this is that when we release a reservation we will add it to the global reserve if it is not full, in order to keep us going along longer before we have to force a transaction commit. By releasing our reservation before we run delayed refs we don't get the opportunity to drain down the global reserve for the work we did, so we won't refill it as often. This isn't a problem per se, it just results in us possibly committing transactions more and more often, and in rare cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran out of space in our block rsv. This also helps us by holding onto space while the delayed refs run so we don't end up with as many people trying to do things at the same time, which again will help us not force commits or hit the use_block_rsv warnings. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2012-07-23 | Btrfs: return error of btrfs_update_inode() to caller | Tsutomu Itoh | 2 | -3/+3
We didn't check error of btrfs_update_inode(), but that error looks easy to bubble back up. Reviewed-by: David Sterba <[email protected]> Signed-off-by: Tsutomu Itoh <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2012-07-23 | Btrfs: fix error handling in __add_reloc_root() | Dan Carpenter | 1 | -1/+2
We dereferenced "node" in the error message after freeing it. Also btrfs_panic() can return so we should return an error code instead of continuing. Signed-off-by: Dan Carpenter <[email protected]>
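The two fixes named above (read the node before freeing it, and return an error instead of falling through) follow a general C pattern. The sketch below is a userspace illustration only: `model_node`, `model_add_node`, and the `insert_failed` switch that simulates a failed insertion are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the node being inserted. */
struct model_node {
    uint64_t bytenr;
};

/*
 * Allocates a node for `bytenr`. `insert_failed` simulates the
 * insertion finding an existing entry. On failure the message is
 * formatted while the node is still valid, the node is freed, and an
 * error code is returned instead of continuing.
 */
static int model_add_node(uint64_t bytenr, int insert_failed,
                          struct model_node **out,
                          char *msg, size_t msg_len)
{
    struct model_node *node = malloc(sizeof(*node));

    *out = NULL;
    if (!node)
        return -ENOMEM;
    node->bytenr = bytenr;

    if (insert_failed) {
        /* Use node->bytenr before free(), never after. */
        snprintf(msg, msg_len, "duplicate entry for bytenr %llu",
                 (unsigned long long)node->bytenr);
        free(node);
        return -EEXIST; /* do not fall through to the success path */
    }

    *out = node; /* caller owns the node on success */
    return 0;
}
```

Formatting the message first keeps the error path free of any access to freed memory, and the explicit return covers the case where the reporting function comes back to the caller.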
2012-07-23 | Btrfs: do not ignore errors from btrfs_cleanup_fs_roots() when mounting | Ilya Dryomov | 1 | -2/+2
There used to be a BUG_ON(ret) there before EH patch (79787eaa) went in. Bail out with EINVAL. Cc: David Sterba <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2012-07-23 | Btrfs: do not return EINVAL instead of ENOMEM from open_ctree() | Ilya Dryomov | 1 | -1/+1
When bailing from open_ctree() err is returned, not ret. Signed-off-by: Ilya Dryomov <[email protected]>
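The err-versus-ret mix-up is easy to reproduce in any function that keeps a default error in one variable and per-call results in another. The following is a minimal userspace sketch, assuming a hypothetical `model_open()` whose `alloc_fails` parameter simulates a failed allocation; it is not the body of open_ctree().

```c
#include <errno.h>

/*
 * The function returns `err` on failure, so a failure code stored
 * only in `ret` would be lost and the caller would see the default
 * -EINVAL instead.
 */
static int model_open(int alloc_fails)
{
    int ret;
    int err = -EINVAL; /* default error for the failure path */

    ret = alloc_fails ? -ENOMEM : 0; /* stands in for an allocation */
    if (ret) {
        err = ret; /* propagate the real cause into err */
        goto fail;
    }
    return 0;

fail:
    return err;
}
```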
2012-07-23 | Btrfs: add DEVICE_READY ioctl | Josef Bacik | 4 | -2/+18
This will be used in conjunction with btrfs device ready <dev>. This is needed for initrd's to have a nice and lightweight way to tell if all of the devices needed for a file system are in the cache currently. This keeps them from having to do mount+sleep loops waiting for devices to show up. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2012-07-23 | Btrfs: flush delayed inodes if we're short on space | Josef Bacik | 3 | -38/+83
Those crazy gentoo guys have been complaining about ENOSPC errors on their portage volumes. This is because doing things like untar tends to create lots of new files which will soak up all the reservation space in the delayed inodes. Usually this gets papered over by the fact that we will try and commit the transaction, however if this happens in the wrong spot or we choose not to commit the transaction you will be screwed. So add the ability to explicitly flush delayed inodes to free up space. Please test this out guys to make sure it works since as usual I cannot reproduce. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2012-07-23 | btrfs: join DEV_STATS ioctls to one | David Sterba | 4 | -15/+15
Commit c11d2c236cc260b36 (Btrfs: add ioctl to get and reset the device stats) introduced two ioctls doing almost the same thing, distinguished only by the ioctl number, which encodes "do reset after read". I have suggested http://www.mail-archive.com/[email protected]/msg16604.html to implement it via the ioctl args. This hasn't happened, and I think we should use a cleaner way to pass flags and should not waste ioctl numbers. CC: Stefan Behrens <[email protected]> Signed-off-by: David Sterba <[email protected]>
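Passing "reset after read" as a flag in the argument structure, rather than as a second ioctl number, can be sketched like this. The structure layout, the flag name, and the handler below are hypothetical and only model the idea; they are not the btrfs ioctl ABI.

```c
#include <stdint.h>

/* Hypothetical flag: reset the counter after reading it. */
#define MODEL_DEV_STATS_RESET (1ULL << 0)

/* Hypothetical argument block shared by the "get" and "get and reset" cases. */
struct model_get_dev_stats {
    uint64_t flags; /* in: MODEL_DEV_STATS_* bits */
    uint64_t value; /* out: counter value at the time of the call */
};

/* One handler serves both behaviors, selected by args->flags. */
static int model_ioctl_get_dev_stats(uint64_t *counter,
                                     struct model_get_dev_stats *args)
{
    args->value = *counter;
    if (args->flags & MODEL_DEV_STATS_RESET)
        *counter = 0;
    return 0;
}
```

A flags field also leaves room for future options without allocating further ioctl numbers.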
2012-07-23 | btrfs: ignore unfragmented file checks in defrag when compression enabled - rebased | Andrew Mahone | 1 | -3/+5
Rebased on btrfs-next and retested. Inform should_defrag_range if BTRFS_DEFRAG_RANGE_COMPRESS is set. If so, skip checks for adjacent extents and extent size when deciding whether to defrag, as these can prevent an uncompressed and unfragmented file from being compressed as requested. Signed-off-by: Andrew Mahone <[email protected]>
2012-07-23 | Btrfs: small naming cleanup in join_transaction() | Dan Carpenter | 1 | -2/+2
"root->fs_info" and "fs_info" are the same, but "fs_info" is preferred because it is shorter and that's what is used in the rest of the function. Signed-off-by: Dan Carpenter <[email protected]>
2012-07-23 | Btrfs: don't update atime on RO subvolumes | Alexander Block | 1 | -0/+5
Before the update_time inode operation was introduced, it was not possible to prevent updates of atime on RO subvolumes. VFS was only able to check for RO on the mount, but did not know anything about btrfs subvolumes. btrfs_update_time now checks whether the root is RO and skips updating the times. Signed-off-by: Alexander Block <[email protected]>
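The per-subvolume check can be modeled in a few lines of userspace C. Everything here is an assumption made for illustration: the `model_*` types, the function name, and the choice of -EROFS as the return value for the skipped update.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a subvolume root. */
struct model_root {
    bool readonly;
};

/* Hypothetical stand-in for an inode with an access time. */
struct model_inode {
    const struct model_root *root;
    int64_t atime;
};

/*
 * The mount may be writable while an individual subvolume is not,
 * so the read-only test is made against the inode's own root.
 */
static int model_update_time(struct model_inode *inode, int64_t now)
{
    if (inode->root->readonly)
        return -EROFS; /* assumed error code; times stay untouched */
    inode->atime = now;
    return 0;
}
```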
2012-07-23 | Btrfs: allow mount -o remount,compress=no | Arnd Hannemann | 1 | -1/+8
Btrfs allows compression to be turned on for a mounted and used filesystem by issuing mount -o remount,compress=lzo. This patch allows compression to be turned off again while the filesystem is mounted. As suggested by David Sterba, if the compress-force option was set, it is implicitly cleared when compression is turned off. Tested-by: David Sterba <[email protected]> Signed-off-by: Arnd Hannemann <[email protected]>
2012-07-23 | Btrfs: remove ->dirty_inode | Josef Bacik | 1 | -11/+0
We do all of our inode updating when we change it, and now that we do ->update_time we don't need ->dirty_inode for atime updates anymore, so just remove it. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2012-07-23 | Btrfs: reduce calls to wake_up on uncontended locks | Chris Mason | 1 | -5/+9
The btrfs locks were unconditionally calling wake_up as the locks were released. This led to extra thrashing on the waitqueue, especially for locks that were dominated by readers. Signed-off-by: Chris Mason <[email protected]>
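The idea of skipping the wakeup when nobody is waiting can be shown with a single-threaded model. This is only a sketch: `model_waitqueue`, `model_wake_up`, and `model_unlock` are hypothetical, and a real implementation also needs the appropriate memory ordering between releasing the lock and testing for waiters, which this model ignores.

```c
/* Hypothetical, single-threaded model of a wait queue. */
struct model_waitqueue {
    int waiters; /* tasks currently sleeping on the queue */
    int wakeups; /* how many times the queue was actually poked */
};

/* Stands in for the comparatively expensive wakeup call. */
static void model_wake_up(struct model_waitqueue *q)
{
    q->wakeups++;
}

/* On release, only touch the wait queue if someone is waiting. */
static void model_unlock(struct model_waitqueue *q)
{
    if (q->waiters > 0)
        model_wake_up(q);
}
```

For a lock that is mostly taken by readers and rarely contended, almost every release takes the cheap branch.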
2012-07-23 | Btrfs: don't wait around for new log writers on an SSD | Chris Mason | 1 | -1/+2
Waiting on spindles improves performance, but SSDs want all the IO as quickly as we can push it down. Signed-off-by: Chris Mason <[email protected]>
2012-07-23 | btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() | Al Viro | 1 | -2/+2
Signed-off-by: Al Viro <[email protected]>
2012-07-14 | VFS: Pass mount flags to sget() | David Howells | 1 | -2/+2
Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2012-07-14 | don't pass nameidata to ->create() | Al Viro | 1 | -1/+1
boolean "does it have to be exclusive?" flag is passed instead; Local filesystem should just ignore it - the object is guaranteed not to be there yet. Signed-off-by: Al Viro <[email protected]>
2012-07-14 | stop passing nameidata to ->lookup() | Al Viro | 1 | -1/+1
Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <[email protected]>
2012-07-14 | vfs: switch i_dentry/d_alias to hlist | Al Viro | 1 | -1/+1
Signed-off-by: Al Viro <[email protected]>
2012-07-12 | Btrfs: fix typo in convert_extent_bit | Liu Bo | 1 | -1/+2
It should be convert_extent_bit. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2012-07-12 | Btrfs: add qgroup inheritance | Arne Jansen | 4 | -18/+61
When creating a subvolume or snapshot, it is necessary to initialize the qgroup account with a copy of some other (tracking) qgroup. This patch adds parameters to the ioctls to pass the information from which qgroup to inherit. Signed-off-by: Arne Jansen <[email protected]>
2012-07-12 | Btrfs: add qgroup ioctls | Arne Jansen | 2 | -0/+212
Ioctls to control the qgroup feature like adding and removing qgroups and assigning qgroups. Signed-off-by: Arne Jansen <[email protected]>
2012-07-12 | Btrfs: hooks to reserve qgroup space | Arne Jansen | 3 | -0/+29
Like block reserves, reserve a small piece of space on each transaction start and for delalloc. These are the hooks that can actually return EDQUOT to the user. The amount of space reserved is tracked in the transaction handle. Signed-off-by: Arne Jansen <[email protected]>
2012-07-12 | Btrfs: hooks for qgroup to record delayed refs | Jan Schmidt | 3 | -6/+36
Hooks into qgroup code to record refs and into transaction commit. This is the main entry point for qgroup. Basically every change in extent backrefs gets accounted to the appropriate qgroups. Signed-off-by: Arne Jansen <[email protected]> Signed-off-by: Jan Schmidt <[email protected]>
2012-07-12 | Btrfs: quota tree support and startup | Arne Jansen | 2 | -6/+42
Init the quota tree along with the others on open_ctree and close_ctree. Add the quota tree to the list of well known trees in btrfs_read_fs_root_no_name. Signed-off-by: Arne Jansen <[email protected]>
2012-07-12 | Btrfs: call the qgroup accounting functions | Jan Schmidt | 2 | -0/+17
Signed-off-by: Jan Schmidt <[email protected]>
2012-07-12 | Btrfs: qgroup implementation and prototypes | Arne Jansen | 7 | -1/+1681
Signed-off-by: Arne Jansen <[email protected]> Signed-off-by: Jan Schmidt <[email protected]>
2012-07-10 | Btrfs: Test code to change the order of delayed-ref processing | Arne Jansen | 1 | -0/+49
Normally delayed refs get processed in ascending bytenr order. This correlates in most cases to the order added. To expose dependencies on this order, we start to process the tree in the middle instead of the beginning. This code is only effective when SCRAMBLE_DELAYED_REFS is defined. Signed-off-by: Arne Jansen <[email protected]>
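Starting the walk in the middle instead of at the beginning can be illustrated on a plain index range. The sketch below assumes that the walk wraps around to cover the first half afterwards; that wrap-around, and the function name, are assumptions for illustration and are not taken from the patch.

```c
#include <stddef.h>

/*
 * Fill `order` with a visiting order for `n` items that starts at the
 * middle item and wraps around, instead of going 0, 1, ..., n-1.
 * `order` must have room for `n` entries.
 */
static void model_scrambled_order(size_t n, size_t *order)
{
    size_t start = n / 2;

    for (size_t i = 0; i < n; i++)
        order[i] = (start + i) % n;
}
```

For five items the order becomes 2, 3, 4, 0, 1, so any code that silently relied on ascending processing is exercised with a different sequence.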
2012-07-10 | Btrfs: qgroup state and initialization | Arne Jansen | 2 | -0/+31
Add state to fs_info. Signed-off-by: Arne Jansen <[email protected]>
2012-07-10 | Btrfs: added helper to create new trees | Arne Jansen | 2 | -1/+83
This creates a brand new tree. Will be used to create the quota tree. Signed-off-by: Arne Jansen <[email protected]>
2012-07-10 | Btrfs: check the root passed to btrfs_end_transaction | Arne Jansen | 2 | -0/+12
This patch only adds a consistency check to validate that the same root is passed to start_transaction and end_transaction. Subvolume quota depends on this. Signed-off-by: Arne Jansen <[email protected]>
2012-07-10 | Btrfs: add helper for tree enumeration | Arne Jansen | 2 | -0/+75
Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: Arne Jansen <[email protected]>
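The "exact match, else the next lower or higher item" lookup can be sketched on a sorted array of keys. This is a userspace analogy, not the btree helper itself: `model_find_near` and its parameters are hypothetical, and a real btree search also has to cross leaf boundaries, which is where the corner cases mentioned above come from.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Search the ascending array `keys` of length `n` for `key`.
 * Returns the index of an exact match if there is one. Otherwise
 * returns the index of the next higher key (find_higher != 0) or the
 * next lower key (find_higher == 0), or -1 if no such item exists.
 */
static long model_find_near(const uint64_t *keys, size_t n,
                            uint64_t key, int find_higher)
{
    size_t lo = 0, hi = n;

    /* Binary search for the first index whose key is >= `key`. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }

    if (lo < n && keys[lo] == key)
        return (long)lo;
    if (find_higher)
        return lo < n ? (long)lo : -1;
    return lo > 0 ? (long)lo - 1 : -1;
}
```

Centralizing the fallback logic in one function is what removes the duplicated corner-case handling from the callers.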
2012-07-10 | Btrfs: qgroup on-disk format | Arne Jansen | 1 | -0/+136
Not all features are in use by the current version and thus may change in the future. Signed-off-by: Arne Jansen <[email protected]>
2012-07-10 | Btrfs: join tree mod log code with the code holding back delayed refs | Jan Schmidt | 9 | -219/+240
We've got two mechanisms both required for reliable backref resolving (tree mod log and holding back delayed refs). You cannot make use of one without the other. So instead of requiring the user of this mechanism to setup both correctly, we join them into a single interface. Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list as we did before, which was of no value. Signed-off-by: Jan Schmidt <[email protected]>
2012-07-10 | Btrfs: fix buffer leak in btrfs_next_old_leaf | Jan Schmidt | 1 | -0/+1
When calling btrfs_next_old_leaf, we were leaking an extent buffer in the rare case of using the deadlock avoidance code needed for the tree mod log. Signed-off-by: Jan Schmidt <[email protected]>
2012-07-05 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds | 13 | -201/+258
Pull btrfs updates from Chris Mason:
"I held off on my rc5 pull because I hit an oops during log recovery after a crash. I wanted to make sure it wasn't a regression because we have some logging fixes in here. It turns out that a commit during the merge window just made it much more likely to trigger directory logging instead of full commits, which exposed an old bug.
The new backref walking code got some additional fixes. This should be the final set of them.
Josef fixed up a corner where our O_DIRECT writes and buffered reads could expose old file contents (not stale, just not the most recent). He and Liu Bo fixed crashes during tree log recovery as well.
Ilya fixed errors while we resume disk balancing operations on readonly mounts."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: run delayed directory updates during log replay
  Btrfs: hold a ref on the inode during writepages
  Btrfs: fix tree log remove space corner case
  Btrfs: fix wrong check during log recovery
  Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
  Btrfs: resume balance on rw (re)mounts properly
  Btrfs: restore restriper state on all mounts
  Btrfs: fix dio write vs buffered read race
  Btrfs: don't count I/O statistic read errors for missing devices
  Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
  Btrfs: fix tree mod log rewind of ADD operations
  Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
  Btrfs: always put insert_ptr modifications into the tree mod log
  Btrfs: fix tree mod log for root replacements at leaf level
  Btrfs: support root level changes in __resolve_indirect_ref
  Btrfs: avoid waiting for delayed refs when we must not
2012-07-02 | Btrfs: run delayed directory updates during log replay | Chris Mason | 1 | -0/+6
While we are resolving directory modifications in the tree log, we are triggering delayed metadata updates to the filesystem btrees. This commit forces the delayed updates to run so the replay code can find any modifications done. It stops us from crashing because the directory deletion replay expects items to be removed immediately from the tree. Signed-off-by: Chris Mason <[email protected]> cc: [email protected]
2012-07-02 | Btrfs: hold a ref on the inode during writepages | Josef Bacik | 1 | -0/+14
We can race with unlink and not actually be able to do our igrab in btrfs_add_ordered_extent. This will result in all sorts of problems. Instead of doing the complicated work to try and handle returning an error properly from btrfs_add_ordered_extent, just hold a ref to the inode during writepages. If we cannot grab a ref we know we're freeing this inode anyway and can just drop the dirty pages on the floor, because screw them we're going to invalidate them anyway. Thanks, Signed-off-by: Josef Bacik <[email protected]>