aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2013-11-13writeback: do not sync data dirtied after sync startJan Kara3-18/+32
When there are processes heavily creating small files while sync(2) is running, it can easily happen that quite some new files are created between WB_SYNC_NONE and WB_SYNC_ALL pass of sync(2). That can happen especially if there are several busy filesystems (remember that sync traverses filesystems sequentially and waits in WB_SYNC_ALL phase on one fs before starting it on another fs). Because WB_SYNC_ALL pass is slow (e.g. causes a transaction commit and cache flush for each inode in ext3), resulting sync(2) times are rather large. The following script reproduces the problem: function run_writers { for (( i = 0; i < 10; i++ )); do mkdir $1/dir$i for (( j = 0; j < 40000; j++ )); do dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null done & done } for dir in "$@"; do run_writers $dir done sleep 40 time sync Fix the problem by disregarding inodes dirtied after sync(2) was called in the WB_SYNC_ALL pass. To allow for this, sync_inodes_sb() now takes a time stamp when sync has started which is used for setting up work for flusher threads. To give some numbers, when above script is run on two ext4 filesystems on simple SATA drive, the average sync time from 10 runs is 267.549 seconds with standard deviation 104.799426. With the patched kernel, the average sync time from 10 runs is 2.995 seconds with standard deviation 0.096. Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Fengguang Wu <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13/proc/pid/smaps: show VM_SOFTDIRTY flag in VmFlags lineNaoya Horiguchi1-0/+3
This flag shows that the VMA is "newly created" and thus represents "dirty" in the task's VM. You can clear it by "echo 4 > /proc/pid/clear_refs." Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: Pavel Emelyanov <[email protected]> Acked-by: Cyrill Gorcunov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13mm, mempolicy: make mpol_to_str robust and always succeedDavid Rientjes1-8/+6
mpol_to_str() should not fail. Currently, it either fails because the string buffer is too small or because a string hasn't been defined for a mempolicy mode. If a new mempolicy mode is introduced and no string is defined for it, just warn and return "unknown". If the buffer is too small, just truncate the string and return, the same behavior as snprintf(). This also fixes a bug where there was no NULL-byte termination when doing *p++ = '=' and *p++ ':' and maxlen has been reached. Signed-off-by: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Chen Gang <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Dave Jones <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13mm: use pgdat_end_pfn() to simplify the code in othersXishi Qiu1-2/+1
Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn + pgdat->node_spanned_pages". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: simplify ocfs2_invalidatepage() and ocfs2_releasepage()Jan Kara1-17/+2
Ocfs2 doesn't do data journalling. Thus its ->invalidatepage and ->releasepage functions never get called on buffers that have journal heads attached. So just use standard variants of functions from buffer.c. Signed-off-by: Jan Kara <[email protected]> Cc: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: convert use of typedef ctl_table to struct ctl_tableJoe Perches1-4/+4
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <[email protected]> Cc: Mark Fasheh <[email protected]> Acked-by: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: fix possible double free in ocfs2_write_begin_nolockXue jiufei1-2/+6
When ocfs2_write_cluster_by_desc() failed in ocfs2_write_begin_nolock() because of ENOSPC, it goes to out_quota, freeing data_ac(meta_ac). Then it calls ocfs2_try_to_free_truncate_log() to free space. If enough space freed, it will try to write again. Unfortunately, some error happenes before ocfs2_lock_allocators(), it goes to out and free data_ac(meta_ac) again. Signed-off-by: joyce <[email protected]> Reviewed-by: Jie Liu <[email protected]> Acked-by: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: add missing errno in ocfs2_ioctl_move_extents()Younger Liu1-2/+6
If the file is not regular or writeable, it should return errno(EPERM). This patch is based on 85a258b70d ("ocfs2: fix error handling in ocfs2_ioctl_move_extents()"). Signed-off-by: Younger Liu <[email protected]> Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Dan Carpenter <[email protected]> Reviewed-by: Jie Liu <[email protected]> Cc: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: do not call brelse() if group_bh is not initialized in ocfs2_group_add()Younger Liu1-3/+6
If group_bh is not initialized, there is no need to release. This problem does not cause anything wrong, but the patch would make the code more logical. Signed-off-by: Younger Liu <[email protected]> Cc: Mark Fasheh <[email protected]> Acked-by: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: rollback transaction in ocfs2_group_add()Younger Liu1-0/+3
If ocfs2_journal_access_di() fails, group->bg_next_group should rollback. Otherwise, there would be a inconsistency between group_bh and main_bm_bh. Signed-off-by: Younger Liu <[email protected]> Cc: Mark Fasheh <[email protected]> Acked-by: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: break useless while loopJunxiao Bi1-1/+3
Signed-off-by: Junxiao Bi <[email protected]> Signed-off-by: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: use find_last_bit()Akinobu Mita1-16/+2
We already have find_last_bit(). So just use it as described in the comment. Signed-off-by: Akinobu Mita <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: delay migration when the lockres is in migration stateXue jiufei1-0/+4
We trigger a bug in __dlm_lockres_reserve_ast() when we parallel umount 4 nodes. The situation is as follows: 1) Node A migrate all lockres it owned(eg. lockres A) to other nodes say node B when it umounts. 2) Receiving MIG_LOCKRES message from A, Node B masters the lockres A with DLM_LOCK_RES_MIGRATING state set. 3) Then we umount ocfs2 on node B. It also should migrate lockres A to another node, say node C. But now, DLM_LOCK_RES_MIGRATING state of lockers A is not cleared. Node B triggered the BUG on lockres with state DLM_LOCK_RES_MIGRATING. Signed-off-by: Xuejiufei <[email protected]> Signed-off-by: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Tariq Saeed <[email protected]> Cc: Srinivas Eeda <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: skip locks in the blocked listXue jiufei1-0/+7
A parallel umount on 4 nodes triggered a bug in dlm_process_recovery_date(). Here's the situation: Receiving MIG_LOCKRES message, A node processes the locks in migratable lockres. It copys lvb from migratable lockres when processing the first valid lock. If there is a lock in the blocked list with the EX level, it triggers the BUG. Since valid lvbs are set when locks are granted with EX or PR levels, locks in the blocked list cannot have valid lvbs. Therefore I think we should skip the locks in the blocked list. Signed-off-by: Xuejiufei <[email protected]> Signed-off-by: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: use bitmap_weight()Akinobu Mita1-15/+7
Use bitmap_weight() instead of reinventing the wheel. Signed-off-by: Akinobu Mita <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: don't spam on -EDQUOTJoel Becker1-1/+2
-EDQUOT is a user-visible error, not a logic problem. Teach mlog_errno() to ignore it like it ignores -ENOSPC, etc. Signed-off-by: Joel Becker <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reported-by: Marek Królikowski <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: add necessary check in case sb_getblk() failsRui Xiang2-0/+11
sb_getblk() may return an err, so add a check for bh. [[email protected]: also add a check after calling sb_getblk() in ocfs2_create_xattr_block()] Signed-off-by: Rui Xiang <[email protected]> Reviewed-by: Jie Liu <[email protected]> Reviewed-by: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13ocfs2: return ENOMEM when sb_getblk() failsRui Xiang9-16/+18
The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So return ENOMEM instead when it fails. [[email protected]: ocfs2_symlink_get_block() and ocfs2_read_blocks_sync() and ocfs2_read_blocks() need the same change] Signed-off-by: Rui Xiang <[email protected]> Reviewed-by: Jie Liu <[email protected]> Reviewed-by: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13fs/ocfs2/file.c: fix wrong commentJunxiao Bi1-1/+1
Unwritten extent only exists for file systems which support holes. But the comment said was opposite meaning and also the comment is not very clear, so rephase it. Signed-off-by: Junxiao Bi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13fs/ocfs2: remove unnecessary variable bits_wanted from ocfs2_calc_extend_creditsGoldwyn Rodrigues7-29/+16
Code cleanup to remove unnecessary variable passed but never used to ocfs2_calc_extend_credits. Signed-off-by: Goldwyn Rodrigues <[email protected]> Cc: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-12Merge tag 'devicetree-for-3.13' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates" * tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits) powerpc: add missing explicit OF includes for ppc dt/irq: add empty of_irq_count for !OF_IRQ dt: disable self-tests for !OF_IRQ of: irq: Fix interrupt-map entry matching MIPS: Netlogic: replace early_init_devtree() call of: Add Panasonic Corporation vendor prefix of: Add Chunghwa Picture Tubes Ltd. vendor prefix of: Add AU Optronics Corporation vendor prefix of/irq: Fix potential buffer overflow of/irq: Fix bug in interrupt parsing refactor. of: set dma_mask to point to coherent_dma_mask of: add vendor prefix for PHYTEC Messtechnik GmbH DT: sort vendor-prefixes.txt of: Add vendor prefix for Cadence of: Add empty for_each_available_child_of_node() macro definition arm/versatile: Fix versatile irq specifications. of/irq: create interrupts-extended property microblaze/pci: Drop PowerPC-ism from irq parsing of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code. of/irq: Use irq_of_parse_and_map() ...
2013-11-12Merge tag 'h8300-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull h8300 platform removal from Guenter Roeck: "The patch series has been in -next for more than one relase cycle. I did get a number of Acks, and no objections. H8/300 has been dead for several years, the kernel for it has not compiled for ages, and recent versions of gcc for it are broken. Remove support for it" * tag 'h8300-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: CREDITS: Add Yoshinori Sato for h8300 fs/minix: Drop dependency on H8300 Drop remaining references to H8/300 architecture Drop MAINTAINERS entry for H8/300 watchdog: Drop references to H8300 architecture net/ethernet: Drop H8/300 Ethernet driver net/ethernet: smsc9194: Drop conditional code for H8/300 ide: Drop H8/300 driver Drop support for Renesas H8/300 (h8300) architecture
2013-11-11ext4: add prototypes for macro-generated functionsAndreas Dilger1-0/+11
It isn't very easy to find the declarations for the functions created by EXT4_INODE_BIT_FNS() because the names are generated by macros: ext4_test_inode_flag, ext4_set_inode_flag, ext4_clear_inode_flag ext4_test_inode_state, ext4_set_inode_state, ext4_clear_inode_state Add explicit declarations for these functions so that grep and tags can find them. Signed-off-by: Andreas Dilger <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2013-11-11ext4: return non-zero st_blocks for inline dataAndreas Dilger1-3/+11
Return a non-zero st_blocks to userspace for statfs() and friends. Some versions of tar will assume that files with st_blocks == 0 do not contain any data and will skip reading them entirely. Signed-off-by: Andreas Dilger <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2013-11-11Btrfs: rename btrfs_start_all_delalloc_inodesMiao Xie7-9/+7
rename the function -- btrfs_start_all_delalloc_inodes(), and make its name be compatible to btrfs_wait_ordered_roots(), since they are always used at the same place. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: don't wait for the completion of all the ordered extentsMiao Xie8-18/+31
It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: don't wait for all the async delalloc when shrinking delallocMiao Xie1-2/+12
It was very likely that there were lots of async delalloc pages in the filesystem, if we waited until all the pages were flushed, we would be blocked for a long time, and the performance would also drop down. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: fix the confusion between delalloc bytes and metadata bytesMiao Xie1-0/+6
In shrink_delalloc(), what we need reclaim is the metadata space, so flushing pages by to_reclaim is not reasonable, it is very likely that the pages we flush are not enough. And then we had to invoke the flush function for several times, at the worst, we need call flush_space for several times. It wasted time. We improve this problem by converting the metadata space size we need reserve to the delalloc bytes, By this way, we can flush the pages by a reasonable number. (Now we use a fixed number to do conversion, it is not flexible, maybe we can find a good way to improve it in the future.) Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: pick up the code for the item number calculation in flush_space()Miao Xie1-9/+16
This patch picked up the code that was used to calculate the number of the items for which we need reserve space, and we will use it in the next patch. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: wait for the ordered extent only when we wantMiao Xie1-1/+2
Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc()Miao Xie1-4/+3
Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: avoid unnecessary scrub workers allocationWang Shilong1-13/+10
We only allocate scrub workers if we pass all the necessary checks, for example, there are no operation in progress. Besides, move mutex lock protection outside of scrub_workers_get() /scrub_workers_put(). Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: check file extent type before anything elseJosef Bacik1-5/+5
I hit this problem with my no holes patch and it made me realize what the problem was for bz 60834. If the first item in the leaf is an inline extent and we try to read anything starting from disk_bytenr onward we will read off the end of the leaf. So we need to check to see what it's type is, and if it's not REG we can just break out. This should fix this problem. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11btrfs: Remove useless variable in write_ctree_super()Rashika1-4/+1
The function write_ctree_super() in disk-io.c uses variable ret to return the result of function write_all_supers(). Since, this variable serves no purpose, hence the patch removes it and returns the call of the called function. Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Rashika Kheria <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11btrfs: Fix checkpatch.pl warning of spacing issuesDulshani Gunawardhana9-19/+19
Fix spacing issues detected via checkpatch.pl in accordance with the kernel style guidelines. Signed-off-by: Dulshani Gunawardhana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11btrfs: Replace kmalloc with kmalloc_arrayDulshani Gunawardhana4-5/+5
Replace kmalloc(size * nr, ) with kmalloc_array(nr, size), thus making it easier to check is that the calculation doesn't wrap or return a smaller allocation Signed-off-by: Dulshani Gunawardhana <[email protected]> Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11btrfs: Enclose macros with complex values within parenthesisDulshani Gunawardhana1-4/+4
Enclose macros with complex values within parenthesis in accordance to checkpatch.pl. Signed-off-by: Dulshani Gunawardhana <[email protected]> Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11btrfs: Use WARN_ON()'s return value in place of WARN_ON(1)Dulshani Gunawardhana14-70/+28
Use WARN_ON()'s return value in place of WARN_ON(1) for cleaner source code that outputs a more descriptive warnings. Also fix the styling warning of redundant braces that came up as a result of this fix. Signed-off-by: Dulshani Gunawardhana <[email protected]> Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11btrfs: Remove redundant local zero structureDulshani Gunawardhana1-3/+2
Remove redundant local zero structure, replacing it by the kernel's global ZERO_PAGE. Signed-off-by: Dulshani Gunawardhana <[email protected]> Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11btrfs: Pack struct btrfs_deviceDulshani Gunawardhana1-10/+10
Pack the structure btrfs_device in volumes.h to eliminate holes detected by pahole, thus reducing binary memory footprint. Signed-off-by: Dulshani Gunawardhana <[email protected]> Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11btrfs: Replace multiple atomic_inc() with atomic_add()Rashika1-4/+4
This patch replaces multiple atomic_inc() with atomic_add() in delayed-inode.c to reduce source code and have few instructions for compilation. Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Rashika Kheria <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11btrfs: Add helper function for free_root_pointers()Rashika1-41/+19
The function free_root_pointers() in disk-io.h contains redundant code. Therefore, this patch adds a helper function free_root_extent_buffers() to free_root_pointers() to eliminate redundancy. Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Rashika Kheria <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: fix a crash when running balance and defrag concurrentlyLiu Bo1-0/+3
Running balance and defrag concurrently can end up with a crash: kernel BUG at fs/btrfs/relocation.c:4528! RIP: 0010:[<ffffffffa01ac33b>] [<ffffffffa01ac33b>] btrfs_reloc_cow_block+ 0x1eb/0x230 [btrfs] Call Trace: [<ffffffffa01398c1>] ? update_ref_for_cow+0x241/0x380 [btrfs] [<ffffffffa0180bad>] ? copy_extent_buffer+0xad/0x110 [btrfs] [<ffffffffa0139da1>] __btrfs_cow_block+0x3a1/0x520 [btrfs] [<ffffffffa013a0b6>] btrfs_cow_block+0x116/0x1b0 [btrfs] [<ffffffffa013ddad>] btrfs_search_slot+0x43d/0x970 [btrfs] [<ffffffffa0153c57>] btrfs_lookup_file_extent+0x37/0x40 [btrfs] [<ffffffffa0172a5e>] __btrfs_drop_extents+0x11e/0xae0 [btrfs] [<ffffffffa013b3fd>] ? generic_bin_search.constprop.39+0x8d/0x1a0 [btrfs] [<ffffffff8117d14a>] ? kmem_cache_alloc+0x1da/0x200 [<ffffffffa0138e7a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [<ffffffffa0173ef0>] btrfs_drop_extents+0x60/0x90 [btrfs] [<ffffffffa016b24d>] relink_extent_backref+0x2ed/0x780 [btrfs] [<ffffffffa0162fe0>] ? btrfs_submit_bio_hook+0x1e0/0x1e0 [btrfs] [<ffffffffa01b8ed7>] ? iterate_inodes_from_logical+0x87/0xa0 [btrfs] [<ffffffffa016b909>] btrfs_finish_ordered_io+0x229/0xac0 [btrfs] [<ffffffffa016c3b5>] finish_ordered_fn+0x15/0x20 [btrfs] [<ffffffffa018cbe5>] worker_loop+0x125/0x4e0 [btrfs] [<ffffffffa018cac0>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [<ffffffff81075ea0>] kthread+0xc0/0xd0 [<ffffffff81075de0>] ? insert_kthread_work+0x40/0x40 [<ffffffff8164796c>] ret_from_fork+0x7c/0xb0 [<ffffffff81075de0>] ? insert_kthread_work+0x40/0x40 ---------------------------------------------------------------------- It turns out to be that balance operation will bump root's @last_snapshot, which enables snapshot-aware defrag path, and backref walking stuff will find data reloc tree as refs' parent, and hit the BUG_ON() during COW. As data reloc tree's data is just for relocation purpose, and will be deleted right after relocation is done, it's unnecessary to walk those refs belonged to data reloc tree, it'd be better to skip them. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: do not run snapshot-aware defragment on errorLiu Bo1-19/+28
If something wrong happens in write endio, running snapshot-aware defragment can end up with undefined results, maybe a crash, so we should avoid it. In order to share similar code, this also adds a helper to free the struct for snapshot-aware defrag. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: log recovery, don't unlink inode always on errorFilipe David Borba Manana1-1/+4
If we get any error while doing a dir index/item lookup in the log tree, we were always unlinking the corresponding inode in the subvolume. It makes sense to unlink only if the lookup failed to find the dir index/item, which corresponds to NULL or -ENOENT, and not when other errors happen (like a transient -ENOMEM or -EIO). Signed-off-by: Filipe David Borba Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: fix csum search offset/length calculation in log treeFilipe David Borba Manana1-7/+7
We were setting the csums search offset and length to the right values if the extent is compressed, but later on right before doing the csums lookup we were overriding these two parameters regardless of compression being set or not for the extent. Signed-off-by: Filipe David Borba Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: fix verification of dir_itemFilipe David Borba Manana1-2/+4
We were ignoring the name component of the dir_item. Both the name and data must fit within BTRFS_MAX_XATTR_SIZE(root). Signed-off-by: Filipe David Borba Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: remove scrub_super_lock holding in btrfs_sync_log()Wang Shilong5-20/+8
Originally, we introduced scrub_super_lock to synchronize tree log code with scrubbing super. However we can replace scrub_super_lock with device_list_mutex, because writing super will hold this mutex, this will reduce an extra lock holding when writing supers in sync log code. Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: use 'u64' rather than 'int' to get extent's generationWang Shilong1-1/+1
We define a 'int' to get extent's generation by mistake,fix it. Signed-off-by: Wang Shilong <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-11-11Btrfs: fix the free space write out failure when there is no data spaceMiao Xie1-3/+12
After running space balance on a new fs, the fs check program outputed the following warning message: free space inode generation (0) did not match free space cache generation (20) Steps to reproduce: # mkfs.btrfs -f <dev> # mount <dev> <mnt> # btrfs balance start <mnt> # umount <mnt> # btrfs check <dev> It was because there was no data space after the space balance, and the free space write out task didn't try to allocate a new data chunk for the free space inode when doing the reservation. So the data space reservation failed, and in order to tell the free space loader that this free space inode could not be trusted, the generation of the free space inode wasn't updated. Then the check program found this problem and outputed the above message. But in fact, it is safe that we try to allocate a new data chunk when we find the data space is not enough. The patch fixes the above problem by this way. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>