Age | Commit message | Author | Files | Lines
2013-02-20btrfs: remove unused "item" in btrfs_insert_delayed_item()Eric Sandeen1-2/+0
"item" was set but never used in this function. Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20btrfs: fix varargs in __btrfs_std_errorEric Sandeen1-7/+7
__btrfs_std_error didn't always properly call va_end, and might call va_start even if fmt was NULL. Move all the varargs handling into the block where we have fmt. Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
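The pattern described above can be sketched in userspace C. This is an illustration under assumptions, not the kernel function: `format_error` and its message layout are hypothetical. The point is only that `va_start` and `va_end` both live inside the branch where `fmt` is non-NULL, so they are always paired.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not the kernel code): writes "error <err>" into buf
 * and, if fmt is non-NULL, appends the formatted message.
 * Returns the number of characters written, excluding the terminator. */
int format_error(char *buf, size_t len, int err, const char *fmt, ...)
{
    int n = snprintf(buf, len, "error %d", err);

    if (fmt && n >= 0 && (size_t)n < len) {
        va_list args;
        int m;

        /* All varargs handling stays in this block: va_start is never
         * reached when fmt is NULL, and va_end always follows va_start. */
        va_start(args, fmt);
        m = vsnprintf(buf + n, len - (size_t)n, fmt, args);
        va_end(args);

        if (m > 0)
            n += m;
    }
    return n;
}
```

Calling `format_error(buf, sizeof(buf), -5, NULL)` takes the path without any varargs handling at all.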
2013-02-20btrfs: add missing break in btrfs_print_leaf()Eric Sandeen1-0/+1
I don't think that BTRFS_DEV_EXTENT_KEY is supposed to fall through to BTRFS_DEV_STATS_KEY ... Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20btrfs: annotate intentional switch case fallthroughsEric Sandeen2-0/+2
This keeps static checkers happy. Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
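The two switch-related fixes above (a missing break, and annotating fallthroughs that are intentional) can be illustrated with a small sketch. The key types and the bitmask return value are hypothetical and exist only to make the control flow visible.

```c
/* Hypothetical key types, for illustration only. */
enum key_type { KEY_A, KEY_B, KEY_C };

/* Returns a bitmask recording which case bodies ran for a key type. */
unsigned int handle_key(enum key_type type)
{
    unsigned int ran = 0;

    switch (type) {
    case KEY_A:
        ran |= 1u;
        break;  /* without this break, KEY_A would also run KEY_B's body */
    case KEY_B:
        ran |= 2u;
        /* fall through */
    case KEY_C:
        /* KEY_B shares this handling on purpose; the comment above
         * tells readers and static checkers that it is intentional. */
        ran |= 4u;
        break;
    }
    return ran;
}
```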
2013-02-20btrfs: handle null fs_info in btrfs_panic()Eric Sandeen2-4/+7
At least backref_tree_panic() can apparently pass in a null fs_info, so handle that in __btrfs_panic to get the message out on the console. The btrfs_panic macro also uses fs_info, but that's largely pointless; it's testing to see if BTRFS_MOUNT_PANIC_ON_FATAL_ERROR is not set. But if it *were* set, __btrfs_panic() would have, well, panicked and we wouldn't be here, testing it! So just BUG() at this point. And since we only use fs_info once now, just use it directly. Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20btrfs: remove unused fs_info from btrfs_decode_error()Eric Sandeen1-5/+4
Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20btrfs: list_entry can't return NULLEric Sandeen1-2/+0
No need to test the result, we can't get a null pointer from list_entry() Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
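Why the NULL test was dead code can be seen from a userspace re-creation of the macro. This is a sketch: the macro below mimics the kernel's container_of-style arithmetic, and `struct item` and `item_value` are hypothetical.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Userspace re-creation for illustration: step back from the embedded
 * list node to the start of the structure that contains it. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical structure with an embedded list node. */
struct item {
    int value;
    struct list_head link;
};

/* Returns the value of the item that embeds the given node. */
int item_value(struct list_head *node)
{
    struct item *it = list_entry(node, struct item, link);

    /* No NULL check: `it` is `node` minus a constant offset, so it is
     * non-NULL for any valid node pointer. */
    return it->value;
}
```

The result is pure pointer arithmetic, so an empty list has to be detected by examining the list itself rather than by testing the returned pointer.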
2013-02-20btrfs: remove unused fd in btrfs_ioctl_send()Eric Sandeen1-3/+0
All we do is set it to NULL and test it :) Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: do not overcommit if we don't have enough space for global rsvJosef Bacik1-2/+19
Because of how little we allocate chunks now we can get really tight on metadata space before we will allocate a new chunk. This resulted in being unable to add device extents when allocating a new metadata chunk as we did not have enough space. This is because we were allowed to overcommit too much metadata without actually making sure we had enough space to make allocations. The idea behind overcommit is that we are allowed to say "sure you can have that reservation" when most of the free space is occupied by reservations, not actual allocations. But in this case where a majority of the total space is in use by actual allocations we can screw ourselves by not being able to make real allocations when it matters. So make sure we have enough real space for our global reserve, and if not then don't allow overcommitting. Thanks, Reported-and-tested-by: Jim Schutt <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
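The rule in the last sentence above can be sketched as a simple predicate. This is an assumption-laden illustration rather than the patch itself: the function name, the parameters and the exact comparison are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: allow overcommit only if the space that is really
 * allocated still leaves room for the global reserve.
 * used_bytes counts actual allocations, not outstanding reservations. */
bool may_overcommit(uint64_t total_bytes, uint64_t used_bytes,
                    uint64_t global_rsv_size)
{
    if (used_bytes + global_rsv_size >= total_bytes)
        return false;   /* not enough real space left for the reserve */

    return true;        /* further overcommit limits would be applied here */
}
```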
2013-02-20Btrfs: remove extent mapping if we fail to add chunkJosef Bacik1-2/+12
I got a double free error when unmounting a file system that failed to add a chunk during its operation. This is because we will kfree the mapping that we created but leave the extent_map in the em_tree for chunks. So to fix this just remove the extent_map when we error out so we don't run into this problem. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: fix chunk allocation error handlingJosef Bacik1-10/+22
If we error out allocating a dev extent we will have already created the block group and such which will cause problems since the allocator may have tried to allocate out of the block group that no longer exists. This will cause BUG_ON()'s in the bio submission path. This also makes a failure to allocate a dev extent a non-abort error, we will just clean up the dev extents we did allocate and exit. Now if we fail to delete the dev extents we will abort since we can't have half of the dev extents hanging around, but this will make us much less likely to abort. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: use bit operation for ->fs_stateMiao Xie6-12/+14
There is no lock protecting fs_info->fs_state, which can cause problems: when several tasks modify it, one task's update may be overwritten by another. For example:

    Task0 - CPU0            Task1 - CPU1
    mov %fs_state rax
    or $0x1 rax
                            mov %fs_state rax
                            or $0x2 rax
    mov rax %fs_state
                            mov rax %fs_state

The expected value is 3, but in fact it is 2. Although this problem cannot happen now (there is currently only one flag), the code is error prone: if we add other flags, the problem above is certain to appear. Use bit operations on fs_state to fix this. This makes the code more robust and makes it easy to add new flags. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
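A userspace analogue of the fix, using C11 atomics in place of the kernel's bit operations. This is a sketch; `set_state_bit` and `test_state_bit` are hypothetical names. Because each update is a single atomic read-modify-write, the lost-update interleaving described above cannot occur.

```c
#include <stdatomic.h>

/* Atomically set one flag bit; concurrent callers cannot overwrite
 * each other's bits because the OR is a single atomic operation. */
void set_state_bit(atomic_ulong *state, unsigned int bit)
{
    atomic_fetch_or(state, 1UL << bit);
}

/* Returns 1 if the flag bit is set, 0 otherwise. */
int test_state_bit(atomic_ulong *state, unsigned int bit)
{
    return (int)((atomic_load(state) >> bit) & 1UL);
}
```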
2013-02-20Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bitsMiao Xie4-32/+49
There is no lock protecting fs_info->avail_{data, metadata, system}_alloc_bits, which can cause problems such as reading wrong profile information, so add a seqlock to protect them. Signed-off-by: Zhao Lei <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
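The idea behind a sequence lock can be sketched in userspace with C11 atomics. This is a simplified illustration, not the kernel's seqlock_t: all names are hypothetical, only two fields are shown, and writers are assumed to be serialized by the caller (the kernel's seqlock carries its own spinlock for that).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical pair of values that must be read consistently. */
struct profile_bits {
    atomic_uint seq;            /* even: stable, odd: write in progress */
    _Atomic uint64_t data;
    _Atomic uint64_t metadata;
};

void profile_init(struct profile_bits *p)
{
    atomic_init(&p->seq, 0);
    atomic_init(&p->data, 0);
    atomic_init(&p->metadata, 0);
}

/* Writer: callers must not run two writers at the same time. */
void profile_write(struct profile_bits *p, uint64_t data, uint64_t metadata)
{
    atomic_fetch_add(&p->seq, 1);       /* sequence becomes odd */
    atomic_store(&p->data, data);
    atomic_store(&p->metadata, metadata);
    atomic_fetch_add(&p->seq, 1);       /* sequence becomes even again */
}

/* Reader: takes no lock; retries if a write overlapped the read. */
void profile_read(struct profile_bits *p, uint64_t *data, uint64_t *metadata)
{
    unsigned int start;

    do {
        start = atomic_load(&p->seq);
        *data = atomic_load(&p->data);
        *metadata = atomic_load(&p->metadata);
    } while ((start & 1u) || atomic_load(&p->seq) != start);
}
```

Readers never block writers, which suits values that are read often and changed rarely.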
2013-02-20Btrfs: use the inode own lock to protect its delalloc_bytesMiao Xie3-13/+37
We need not use a global lock to protect the delalloc_bytes of the inode, just use its own lock. In this way, we can reduce the lock contention and ->delalloc_lock will just protect delalloc inode list. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: use percpu counter for fs_info->delalloc_bytesMiao Xie4-11/+26
fs_info->delalloc_bytes is accessed very frequently, so use a percpu counter instead of a plain u64 variable to reduce lock contention. This patch also fixes accesses to the variable that were made without lock protection. At worst, those accesses meant we would not flush the delalloc inodes and would just return an ENOSPC error while the fs still had some free space. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: use percpu counter for dirty metadata countMiao Xie3-40/+42
->dirty_metadata_bytes is accessed very frequently, so use a percpu counter instead of a plain u64 variable to reduce lock contention. This patch also fixes an access made without lock protection in __btrfs_btree_balance_dirty(), which could cause us to skip flushing the dirty pages. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
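The idea behind the two percpu counter conversions above can be modelled in a single-threaded sketch. Everything here is hypothetical (names, slot count, batch size): each slot stands in for one CPU's local delta, which is folded into the shared total only once it grows past a batch threshold, so the shared total (and its lock, in a real implementation) is touched rarely.

```c
#include <stdint.h>

#define NR_SLOTS 4      /* stand-in for the number of CPUs */
#define BATCH    32     /* fold local deltas once they reach this size */

struct sharded_counter {
    int64_t global;             /* shared total; lock-protected in real code */
    int64_t local[NR_SLOTS];    /* per-slot deltas, updated without the lock */
};

/* Add to one slot's local delta, folding into the total when it gets large. */
void counter_add(struct sharded_counter *c, int slot, int64_t amount)
{
    int64_t v = c->local[slot] + amount;

    if (v >= BATCH || v <= -BATCH) {
        c->global += v;         /* the only step that needs the shared lock */
        c->local[slot] = 0;
    } else {
        c->local[slot] = v;
    }
}

/* Cheap, approximate read: ignores deltas not yet folded in. */
int64_t counter_read(const struct sharded_counter *c)
{
    return c->global;
}

/* Exact but more expensive read: adds every slot's pending delta. */
int64_t counter_sum(const struct sharded_counter *c)
{
    int64_t sum = c->global;

    for (int i = 0; i < NR_SLOTS; i++)
        sum += c->local[i];
    return sum;
}
```

The trade-off is that the cheap read can lag behind the true value by up to roughly one batch per slot.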
2013-02-20Btrfs: protect fs_info->alloc_startMiao Xie2-0/+14
fs_info->alloc_start is a 64-bit variable that can be accessed by multiple tasks, but it is not strictly protected, so it can change while we are reading it. On a 32-bit machine we can get a wrong value because the access takes two instructions. (In fact, the same problem is possible on a 64-bit machine, because the compiler may split the 64-bit operation into two 32-bit operations.) For example, assume ->alloc_start is 0x0000 0000 0001 0000 at the beginning, and we then remount and set ->alloc_start to 0x0000 0100 0000 0000:

    Task0                   Task1
                            load high 32 bits
    set high 32 bits
    set low 32 bits
                            load low 32 bits

Task1 will get 0. This patch fixes the problem by using two locks to protect it:

    fs_info->chunk_mutex
    sb->s_umount

On the read side, we only need to hold one of these two locks; on the write side, we must hold both. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
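The torn read in the example above can be replayed deterministically in a single thread. This is only a simulation of the interleaving, with a hypothetical function name: the 64-bit value is modelled as two 32-bit halves, and the steps run in the order shown for Task0 and Task1.

```c
#include <stdint.h>

/* Replays the interleaving: the reader (Task1) loads the high half before
 * the writer (Task0) updates both halves, then loads the low half after. */
uint64_t torn_read_demo(uint64_t old_val, uint64_t new_val)
{
    /* The stored variable, split into the two halves a 32-bit CPU sees. */
    uint32_t lo = (uint32_t)old_val;
    uint32_t hi = (uint32_t)(old_val >> 32);

    uint32_t seen_hi = hi;                  /* Task1: load high 32 bits */
    hi = (uint32_t)(new_val >> 32);         /* Task0: set high 32 bits  */
    lo = (uint32_t)new_val;                 /* Task0: set low 32 bits   */
    uint32_t seen_lo = lo;                  /* Task1: load low 32 bits  */

    (void)hi;   /* the new high half is never observed by the reader */
    return ((uint64_t)seen_hi << 32) | seen_lo;
}
```

With the values from the example, the reader combines the old high half with the new low half and ends up with a value that was never stored.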
2013-02-20Btrfs: add a comment for fs_info->max_inlineMiao Xie1-0/+6
Although ->max_inline is a 64-bit variable and may be accessed by multiple tasks, it is only an advisory number, so we needn't add anything to protect fs_info->max_inline. Just add a comment to explain why we don't use a lock to protect it. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.hFilipe Brandenburger11-11/+19
The header file will then be installed under /usr/include/linux so that userspace applications can refer to Btrfs ioctls by name and use the same structs used internally in the kernel. Signed-off-by: Filipe Brandenburger <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: Check CAP_DAC_READ_SEARCH for BTRFS_IOC_INO_PATHSKusanagi Kouichi1-1/+1
CAP_DAC_READ_SEARCH overrides read and search permission check on file and directory. It seems fit for BTRFS_IOC_INO_PATHS. Signed-off-by: Kusanagi Kouichi <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Revert "Btrfs: fix permissions of empty files not affected by umask"Josef Bacik1-6/+0
This reverts commit 2794ed013b3551cbae887ea1b93c52aaacb7370d. Wasn't supposed to get used in btrfs_mknod, it was supposed to be in btrfs_create, which was done in commit 9185aa587b7425f8f4520da2e66792f5f3c2b815. Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: don't traverse the ordered operation list repeatedlyMiao Xie1-14/+5
btrfs_run_ordered_operations() needn't traverse the ordered operation list repeatedly, because the transaction committer will invoke it again when there is no other writer in this transaction, which ensures that no one can add new objects to the ordered operation list. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: traverse and flush the delalloc inodes onceMiao Xie1-8/+1
btrfs_start_delalloc_inodes() needn't traverse and flush the delalloc inodes repeatedly. Data that users write after the delalloc inode flush has started can be treated as if it had been written after the flush finished, and we can flush it next time. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: check the return value of btrfs_run_ordered_operations()Miao Xie1-2/+2
We forgot to check the return value of btrfs_run_ordered_operations() when flushing all the pending work. Fix it. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: check the return value of btrfs_start_delalloc_inodes()Miao Xie2-2/+8
We forgot to check the return value of btrfs_start_delalloc_inodes(). Fix it. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: make raid attr array more readableMiao Xie3-20/+59
The current raid attr array code is hard to understand, and it is easy to introduce problems when modifying the array. So I changed it and made it more readable. Cc: Liu Bo <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: record first logical byte in memoryLiu Bo3-1/+20
This'd save us a rbtree search which may become expensive in large filesystem. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: save us a read_lockLiu Bo1-3/+2
This does not change the logic of code, but can save us a read_lock. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: use token to avoid times mapping extent bufferLiu Bo1-28/+35
The API in the tree log code has undergone this sort of change, which proved that we can benefit from using a token, so do the same thing here. The function_graph tracer's timer shows that this takes nearly half the time it did before (39.788us -> 22.391us). Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: kill unused argument of btrfs_pin_extent_for_log_replayLiu Bo3-6/+3
Argument 'trans' is not used any more. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: kill unused argument of update_block_groupLiu Bo1-7/+5
Argument 'trans' is not used any more. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: kill unused arguments of cache_block_groupLiu Bo1-8/+5
Argument 'trans' and 'root' are not used any more. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: remove deprecated commentsLiu Bo1-6/+0
commit d53ba47484ed6245e640ee4bfe9d21e9bfc15765 (Btrfs: use commit root when loading free space cache) has removed the deadlock check, so the related comments can be removed as well. Signed-off-by: Liu Bo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: don't re-enter when allocating a chunkJosef Bacik3-0/+9
If we start running low on metadata space we will try to allocate a chunk, which could then try to allocate a chunk to add the device entry. The thing is we allocate a chunk before we try really hard to make the allocation, so we should be able to find space for the device entry. Add a flag to the trans handle so we know we're currently allocating a chunk so we can just bail out if we try to allocate another chunk. Thanks, Signed-off-by: Josef Bacik <[email protected]>
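The re-entry guard described above can be sketched as follows. This is a hypothetical model, not the patch: the structure, field names and the choice of -ENOSPC as the bail-out value are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transaction handle. */
struct trans_handle {
    bool allocating_chunk;      /* set while a chunk allocation is running */
    int chunks_allocated;
    int nested_attempts_refused;
};

int alloc_chunk(struct trans_handle *trans);

/* Adding the device entry may itself run low on metadata space and try to
 * allocate another chunk; with the guard in place that attempt is refused
 * and the entry uses the space that is already available. */
static int add_device_entry(struct trans_handle *trans)
{
    if (alloc_chunk(trans) == -ENOSPC)
        trans->nested_attempts_refused++;
    return 0;
}

int alloc_chunk(struct trans_handle *trans)
{
    if (trans->allocating_chunk)
        return -ENOSPC;         /* already allocating: bail out, don't recurse */

    trans->allocating_chunk = true;
    add_device_entry(trans);
    trans->chunks_allocated++;
    trans->allocating_chunk = false;
    return 0;
}
```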
2013-02-20Btrfs: wait on ordered extents at the last possible momentJosef Bacik7-9/+247
Since we don't actually copy the extent information from the source tree in the fast case we don't need to wait for ordered io to be completed in order to fsync, we just need to wait for the io to be completed. So when we're logging our file just attach all of the ordered extents to the log, and then when the log syncs just wait for IO_DONE on the ordered extents and then write the super. Thanks, Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: fix trivial error in btrfs_ioctl_resize()Miao Xie1-6/+7
This patch fixes the following problems:
- improper return value
- unnecessary read-only check
Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: use wrapper page_offsetMiao Xie4-18/+15
Use wrapper page_offset to get byte-offset into filesystem object for page. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: flush all dirty inodes if writeback can not startMiao Xie1-9/+31
We may try to flush some dirty pages when there is not enough space to reserve. But this operation may fail, so in order to get enough space to reserve successfully, we sync all the delalloc files. This operation is safe: we needn't worry about the filesystem going from r/w to r/o, because the filesystem guarantees that all dirty pages have been written to disk after it becomes read-only, so the sync operation does nothing if the filesystem is already read-only. Though it may waste a lot of time, this is a corner case, so we needn't care. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: make delayed ref lock logic more readableMiao Xie3-18/+38
Locking and unlocking delayed ref mutex are in the different functions, and the name of lock functions is not uniform, so the readability is not so good, this patch optimizes the lock logic and makes it more readable. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: fix lots of orphan inodes when the space is not enoughMiao Xie3-17/+85
We're running into having 50-100 orphans left over with xfstests 83 because of ENOSPC when trying to start the transaction for the inode update. But in fact, it makes no sense in updating the inode for the new size while we're deleting the stupid thing. This patch fixes this problem. Reported-by: Josef Bacik <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: cleanup similar code in delayed inodeMiao Xie1-46/+37
The delayed item commit code in several functions is similar, so clean it up. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]>
2013-02-20Btrfs: use common work instead of delayed workMiao Xie1-4/+4
Since we do not want to delay the async transaction commit, we should use common work, not delayed work. Signed-off-by: Miao Xie <[email protected]>
2013-02-20Btrfs: cleanup unnecessary clear when freeing a transaction or a trans handleMiao Xie1-2/+0
We clear the transaction object and the trans handle when they are about to be freed, which is unnecessary, so remove the clearing. Signed-off-by: Miao Xie <[email protected]>
2013-02-20Btrfs: use slabs for delayed reference allocationMiao Xie5-21/+115
The delayed reference allocation is in the fast path of the IO, so use slabs to speed up the allocation. In addition, slabs let us check for leaked objects when the module is removed. Signed-off-by: Miao Xie <[email protected]>
2013-02-18Linux 3.8Linus Torvalds1-1/+1
2013-02-18 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Linus Torvalds 3-6/+18
Pull input subsystem fixes from Dmitry Torokhov: "Two small driver fixups and a documentation update for managed input devices"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: wacom - fix wacom_set_report retry logic
  Input: document that unregistering managed devices is not necessary
  Input: lm8323 - fix checking PWM interrupt status
2013-02-18mm: fix pageblock bitmap allocationLinus Torvalds1-6/+9
Commit c060f943d092 ("mm: use aligned zone start for pfn_to_bitidx calculation") fixed our calculation of the index into the pageblock bitmap when a !SPARSEMEM zone was not aligned to pageblock_nr_pages. However, the _allocation_ of that bitmap had never taken this alignment requirement into account, so depending on the exact size and alignment of the zone, the use of that index could then access past the allocation, resulting in some very subtle memory corruption. This was reported (and bisected) by Ingo Molnar: one of his random config builds would hang with certain very specific kernel command line options. In the meantime, commit c060f943d092 has been marked for stable, so this fix needs to be back-ported to the stable kernels that backported the commit to use the right alignment. Bisected-and-tested-by: Ingo Molnar <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
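The sizing problem described above comes down to simple arithmetic, sketched here with hypothetical helper names (this is not the kernel's code). If the bitmap is indexed relative to the aligned start of the zone, a zone that begins partway into a pageblock touches one more pageblock than its size alone suggests.

```c
#include <stdint.h>

/* Naive count: assumes the zone starts on a pageblock boundary. */
uint64_t pageblocks_naive(uint64_t zone_pages, uint64_t pageblock_pages)
{
    return (zone_pages + pageblock_pages - 1) / pageblock_pages;
}

/* Alignment-aware count: includes the offset of the zone start within its
 * first pageblock. pageblock_pages is assumed to be a power of two. */
uint64_t pageblocks_needed(uint64_t zone_start_pfn, uint64_t zone_pages,
                           uint64_t pageblock_pages)
{
    uint64_t offset = zone_start_pfn & (pageblock_pages - 1);
    uint64_t span = offset + zone_pages;

    return (span + pageblock_pages - 1) / pageblock_pages;
}
```

For a zone of 1024 pages starting at pfn 512 with 1024-page pageblocks, the naive count is 1 while the zone actually spans 2 pageblocks, so a bitmap sized from the naive count would be indexed past its end.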
2013-02-15btrfs: access superblock via pagecache in scan_one_deviceDavid Sterba1-6/+64
btrfs_scan_one_device is calling set_blocksize() which can race with a concurrent process making dirty page cache pages. It can end up dropping dirty page cache pages on the floor, which isn't very nice when someone is just running btrfs dev scan to find filesystems on the box. Now that udev is registering btrfs devices as it discovers them, we can actually end up racing with our own mkfs program too. When this happens, we drop some of the important blocks written by mkfs. This commit changes scan_one_device to read the super out of the page cache instead of trying to use bread. This way we don't have to care about the blocksize of the device. This also drops the invalidate_bdev() call. It wasn't very polite to invalidate during the scan either. mkfs is putting the super into the page cache, there's no reason to invalidate at this point. Signed-off-by: David Sterba <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-02-15 Merge tag 'stable/for-linus-3.8-rc7-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Linus Torvalds 5-65/+33
Pull xen fixes from Konrad Rzeszutek Wilk:
"Two fixes:
- A simple bug-fix for redundant NULL check.
- CVE-2013-0228/XSA-42: x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS
and two reverts:
- Revert the PVonHVM kexec. The patch introduces a regression with older hypervisor stacks, such as Xen 4.1."
* tag 'stable/for-linus-3.8-rc7-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  Revert "xen PVonHVM: use E820_Reserved area for shared_info"
  Revert "xen/PVonHVM: fix compile warning in init_hvm_pv_info"
  xen: remove redundant NULL check before unregister_and_remove_pcpu().
  x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.
2013-02-15Revert "[media] dvb_frontend: return -ENOTTY for unimplement IOCTL"Mauro Carvalho Chehab1-3/+3
As reported by Klaus Schmidinger: "In VDR I use an ioctl() call with FE_READ_UNCORRECTED_BLOCKS on a device (using stb0899). After this call I check 'errno' for EOPNOTSUPP to determine whether this device supports this call. This used to work just fine, until a few months ago I noticed that my devices using stb0899 didn't display their signal quality in VDR's OSD any more. After further investigation I found that ioctl(FE_READ_UNCORRECTED_BLOCKS) no longer returns EOPNOTSUPP, but rather ENOTTY. And since I stop getting the signal quality in case any unknown errno value appears, this broke my signal quality query function." While the changes reflect what is described at http://comments.gmane.org/gmane.linux.kernel/1235728, they do cause a regression in userspace. So, revert the commit to stop the damage. This reverts commit 177ffe506cf8 ("[media] dvb_frontend: return -ENOTTY for unimplement IOCTL"). Reported-by: Klaus Schmidinger <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
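The regression above came from userspace matching a single errno value. As a defensive pattern (a sketch that is not part of the revert, with a hypothetical helper name), an application can treat both values mentioned in the report as "call not supported":

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if an errno value from a failed
 * ioctl() means the call is not supported by the device. Accepts both
 * EOPNOTSUPP and ENOTTY, the two values named in the report above. */
bool ioctl_unsupported(int err)
{
    return err == ENOTTY || err == EOPNOTSUPP;
}
```

A caller would pass the saved `errno` after `ioctl()` returns -1 and handle any other value as a real error.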