aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2013-10-22f2fs: use true and false for boolean valueHaicheng Li1-2/+2
Signed-off-by: Haicheng Li <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2013-10-22Merge tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggyLinus Torvalds1-2/+1
Pull jfs bugfix from David Kleikamp: "Just a patch to fix an oops in an error path" * tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggy: jfs: fix error path in ialloc
2013-10-21xfs: fold xfs_change_file_space into xfs_ioc_spaceChristoph Hellwig4-191/+126
Now that only one caller of xfs_change_file_space is left it can be merged into said caller. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-21xfs: simplify the fallocate pathChristoph Hellwig3-63/+56
Call xfs_alloc_file_space or xfs_free_file_space directly from xfs_file_fallocate instead of going through xfs_change_file_space. This simplified the code by removing the unessecary marshalling of the arguments into an xfs_flock64_t structure and allows removing checks that are already done in the VFS code. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-21xfs: always hold the iolock when calling xfs_change_file_spaceChristoph Hellwig4-49/+27
Currently fallocate always holds the iolock when calling into xfs_change_file_space, while the ioctl path lets some of the lower level functions take it, but leave it out in others. This patch makes sure the ioctl path also always holds the iolock and thus introduces consistent locking for the preallocation operations while simplifying the code and allowing to kill the now unused XFS_ATTR_NOLOCK flag. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-21xfs: remove the unused XFS_ATTR_NONBLOCK flagChristoph Hellwig2-4/+0
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-21xfs: always take the iolock around xfs_setattr_sizeChristoph Hellwig4-22/+24
There is no reason to conditionally take the iolock inside xfs_setattr_size when we can let the caller handle it unconditionally, which just incrases the lock hold time for the case where it was previously taken internally by a few instructions. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-20nfsd regression since delayed fput()Al Viro1-2/+2
Background: nfsd v[23] had throughput regression since delayed fput went in; every read or write ends up doing fput() and we get a pair of extra context switches out of that (plus quite a bit of work in queue_work itselfi, apparently). Use of schedule_delayed_work() gives it a chance to accumulate a bit before we do __fput() on all of them. I'm not too happy about that solution, but... on at least one real-world setup it reverts about 10% throughput loss we got from switch to delayed fput. Signed-off-by: Al Viro <[email protected]>
2013-10-19Merge 3.12-rc6 into driver-core-nextGreg Kroah-Hartman55-302/+473
We want these fixes here too. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-10-18Merge branch 'for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a regression in our initial rc1 pull. When doing nocow writes we were sometimes starting a transaction with locks held" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: release path before starting transaction in can_nocow_extent
2013-10-18udf: fix for pathetic mount times in case of invalid file systemPeter A. Felvegi1-9/+36
The UDF driver was not strict enough about checking the IDs in the VSDs when mounting, which resulted in reading through all the sectors of the block device in some unfortunate cases. Eg, trying to mount my uninitialized 200G SSD partition (all 0xFF bytes) took ~350 minutes to fail, because the code expected some of the valid IDs or a zero byte. During this, the mount couldn't be killed, sync from the cmdline blocked, and the machine froze into the shutdown. Valid filesystems (extX, btrfs, ntfs) were rejected by the mere accident of having a zero byte at just the right place in some of their sectors, close enough to the beginning not to generate excess I/O. The fix adds a hard limit on the VSD sector offset, adds the two missing VSD IDs, and stops scanning when encountering an invalid ID. Also replaced the magic number 32768 with a more meaningful #define, and supressed the bogus message about failing to read the first sector if no UDF fs was detected. Signed-off-by: Peter A. Felvegi <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2013-10-18Btrfs: release path before starting transaction in can_nocow_extentJosef Bacik1-0/+1
We can't be holding tree locks while we try to start a transaction, we will deadlock. Thanks, Reported-by: Sage Weil <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-10-17Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds8-21/+93
Pull CIFS fixes from Steve French: "Five small cifs fixes (includes fixes for: unmount hang, 2 security related, symlink, large file writes)" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: cifs: ntstatus_to_dos_map[] is not terminated cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods cifs: Fix inability to write files >2GB to SMB2/3 shares cifs: Avoid umount hangs with smb2 when server is unresponsive do not treat non-symlink reparse points as valid symlinks
2013-10-17ext4: add ratelimiting to ext4 messagesTheodore Ts'o2-58/+100
In the case of a storage device that suddenly disappears, or in the case of significant file system corruption, this can result in a huge flood of messages being sent to the console. This can overflow the file system containing /var/log/messages, or if a serial console is configured, this can slow down the system so much that a hardware watchdog can end up triggering forcing a system reboot. Google-Bug-Id: 7258357 Signed-off-by: "Theodore Ts'o" <[email protected]>
2013-10-18f2fs: avoid to write during the recoveryJaegeuk Kim2-7/+12
This patch enhances the recovery routine not to write any data/node/meta until its completion. If any writes are sent to the disk, it could contaminate the written history that will be used for further recovery. Signed-off-by: Jaegeuk Kim <[email protected]>
2013-10-18f2fs: avoid wait if IO end up when do_checkpoint for better performanceGu Zheng3-2/+14
Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back. Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting, and no additional wake up mechanism was introduced if IO ends up before regular period costed. Yuan Zhong found there is a situation that after the pages have been written back, but the checkpoint thread still wait for congestion_wait to exit. So here we store checkpoint task into f2fs_sb when doing checkpoint, it'll wait for IO completes if there's IO going on, and in the end IO path, wake up checkpoint task when IO ends up. Thanks to Yuan Zhong's pre work about this problem. Reported-by: Yuan Zhong <[email protected]> Signed-off-by: Gu Zheng <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2013-10-18f2fs: introduce function read_raw_super_block()Gu Zheng1-21/+34
Introduce function read_raw_super_block() to hide reading raw super block and the retry routine if the first sb is invalid. Signed-off-by: Gu Zheng <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2013-10-18f2fs: fix the starvation problem on cp_rwsemJaegeuk Kim2-14/+0
This patch removes the logic previously introduced to address the starvation on cp_rwsem. One potential there-in bug is that we should cover the wait.list with spin_lock, but the previous code broke this rule. And, actually current rwsem handles this starvation issue reasonably, so that we didn't need to do this before neither. Signed-off-by: Jaegeuk Kim <[email protected]>
2013-10-18f2fs: fix to store and retrieve i_rdev correctlyJaegeuk Kim1-17/+32
When storing i_rdev, we should check its file type. Signed-off-by: Jaegeuk Kim <[email protected]>
2013-10-17ext4: fix performance regression in ext4_writepagesMing Lei1-13/+13
Commit 4e7ea81db5(ext4: restructure writeback path) introduces another performance regression on random write: - one more page may be added to ext4 extent in mpage_prepare_extent_to_map, and will be submitted for I/O so nr_to_write will become -1 before 'done' is set - the worse thing is that dirty pages may still be retrieved from page cache after nr_to_write becomes negative, so lots of small chunks can be submitted to block device when page writeback is catching up with write path, and performance is hurted. On one arm A15 board with sata 3.0 SSD(CPU: 1.5GHz dura core, RAM: 2GB, SATA controller: 3.0Gbps), this patch can improve below test's result from 157MB/sec to 174MB/sec(>10%): dd if=/dev/zero of=./z.img bs=8K count=512K The above test is actually prototype of block write in bonnie++ utility. This patch makes sure no more pages than nr_to_write can be added to extent for mapping, so that nr_to_write won't become negative. Cc: [email protected] Acked-by: Jan Kara <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2013-10-17xfs: don't break from growfs ag update loop on errorEric Sandeen1-9/+13
When xfs_growfs_data_private() is updating backup superblocks, it bails out on the first error encountered, whether reading or writing: * If we get an error writing out the alternate superblocks, * just issue a warning and continue. The real work is * already done and committed. This can cause a problem later during repair, because repair looks at all superblocks, and picks the most prevalent one as correct. If we bail out early in the backup superblock loop, we can end up with more "bad" matching superblocks than good, and a post-growfs repair may revert the filesystem to the old geometry. With the combination of superblock verifiers and old bugs, we're more likely to encounter read errors due to verification. And perhaps even worse, we don't even properly write any of the newly-added superblocks in the new AGs. Even with this change, growfs will still say: xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning data blocks changed from 319815680 to 335216640 which might be confusing to the user, but it at least communicates that something has gone wrong, and dmesg will probably highlight the need for an xfs_repair. And this is still best-effort; if verifiers fail on more than half the backup supers, they may still "win" - but that's probably best left to repair to more gracefully handle by doing its own strict verification as part of the backup super "voting." Signed-off-by: Eric Sandeen <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Mark Tinguely <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-17xfs: don't emit corruption noise on fs probesEric Sandeen1-2/+3
If we get EWRONGFS due to probing of non-xfs filesystems, there's no need to issue the scary corruption error and backtrace. Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Mark Tinguely <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-17xfs: remove newlines from strings passed to __xfs_printkEric Sandeen10-20/+20
__xfs_printk adds its own "\n". Having it in the original string leads to unintentional blank lines from these messages. Most format strings have no newline, but a few do, leading to i.e.: [ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119911] [ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119919] Fix them all. Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Mark Tinguely <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-17xfs: prevent deadlock trying to cover an active logDave Chinner3-25/+47
Recent analysis of a deadlocked XFS filesystem from a kernel crash dump indicated that the filesystem was stuck waiting for log space. The short story of the hang on the RHEL6 kernel is this: - the tail of the log is pinned by an inode - the inode has been pushed by the xfsaild - the inode has been flushed to it's backing buffer and is currently flush locked and hence waiting for backing buffer IO to complete and remove it from the AIL - the backing buffer is marked for write - it is on the delayed write queue - the inode buffer has been modified directly and logged recently due to unlinked inode list modification - the backing buffer is pinned in memory as it is in the active CIL context. - the xfsbufd won't start buffer writeback because it is pinned - xfssyncd won't force the log because it sees the log as needing to be covered and hence wants to issue a dummy transaction to move the log covering state machine along. Hence there is no trigger to force the CIL to the log and hence unpin the inode buffer and therefore complete the inode IO, remove it from the AIL and hence move the tail of the log along, allowing transactions to start again. Mainline kernels also have the same deadlock, though the signature is slightly different - the inode buffer never reaches the delayed write lists because xfs_buf_item_push() sees that it is pinned and hence never adds it to the delayed write list that the xfsaild flushes. There are two possible solutions here. The first is to simply force the log before trying to cover the log and so ensure that the CIL is emptied before we try to reserve space for the dummy transaction in the xfs_log_worker(). While this might work most of the time, it is still racy and is no guarantee that we don't get stuck in xfs_trans_reserve waiting for log space to come free. Hence it's not the best way to solve the problem. The second solution is to modify xfs_log_need_covered() to be aware of the CIL. We only should be attempting to cover the log if there is no current activity in the log - covering the log is the process of ensuring that the head and tail in the log on disk are identical (i.e. the log is clean and at idle). Hence, by definition, if there are items in the CIL then the log is not at idle and so we don't need to attempt to cover it. When we don't need to cover the log because it is active or idle, we issue a log force from xfs_log_worker() - if the log is idle, then this does nothing. However, if the log is active due to there being items in the CIL, it will force the items in the CIL to the log and unpin them. In the case of the above deadlock scenario, instead of xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to cover the log, it will instead force the log, thereby unpinning the inode buffer, allowing IO to be issued and complete and hence removing the inode that was pinning the tail of the log from the AIL. At that point, everything will start moving along again. i.e. the xfs_log_worker turns back into a watchdog that can alleviate deadlocks based around pinned items that prevent the tail of the log from being moved... Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Eric Sandeen <[email protected]> Signed-off-by: Ben Myers <[email protected]>
2013-10-16Merge branch 'akpm' (fixes from Andrew Morton)Linus Torvalds3-6/+22
Merge misc fixes from Andrew Morton. * emailed patches from Andrew Morton <[email protected]>: (21 commits) mm: revert mremap pud_free anti-fix mm: fix BUG in __split_huge_page_pmd swap: fix set_blocksize race during swapon/swapoff procfs: call default get_unmapped_area on MMU-present architectures procfs: fix unintended truncation of returned mapped address writeback: fix negative bdi max pause percpu_refcount: export symbols fs: buffer: move allocation failure loop into the allocator mm: memcg: handle non-error OOM situations more gracefully tools/testing/selftests: fix uninitialized variable block/partitions/efi.c: treat size mismatch as a warning, not an error mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages mm/zswap: bugfix: memory leak when re-swapon mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages mm: migration: do not lose soft dirty bit if page is in migration state gcov: MAINTAINERS: Add an entry for gcov mm/hugetlb.c: correct missing private flag clearing mm/vmscan.c: don't forget to free shrinker->nr_deferred ipc/sem.c: synchronize semop and semctl with IPC_RMID ipc: update locking scheme comments ...
2013-10-16procfs: call default get_unmapped_area on MMU-present architecturesHATAYAMA Daisuke1-2/+6
Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added proc_reg_get_unmapped_area in proc_reg_file_ops and proc_reg_file_ops_no_compat, by which now mmap always returns EIO if get_unmapped_area method is not defined for the target procfs file, which causes regression of mmap on /proc/vmcore. To address this issue, like get_unmapped_area(), call default current->mm->get_unmapped_area on MMU-present architectures if pde->proc_fops->get_unmapped_area, i.e. the one in actual file operation in the procfs file, is not defined. Reported-by: Michael Holzheu <[email protected]> Signed-off-by: HATAYAMA Daisuke <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: David S. Miller <[email protected]> Tested-by: Michael Holzheu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-10-16procfs: fix unintended truncation of returned mapped addressHATAYAMA Daisuke1-1/+1
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the mapped virtual address returned from get_unmapped_area method in pde->proc_fops due to the variable rv of signed integer on x86_64. This is too small to have vitual address of unsigned long on x86_64 since on x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes. To fix this issue, use unsigned long instead. Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)"). Signed-off-by: HATAYAMA Daisuke <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: David S. Miller <[email protected]> Tested-by: Michael Holzheu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-10-16fs: buffer: move allocation failure loop into the allocatorJohannes Weiner1-2/+12
Buffer allocation has a very crude indefinite loop around waking the flusher threads and performing global NOFS direct reclaim because it can not handle allocation failures. The most immediate problem with this is that the allocation may fail due to a memory cgroup limit, where flushers + direct reclaim might not make any progress towards resolving the situation at all. Because unlike the global case, a memory cgroup may not have any cache at all, only anonymous pages but no swap. This situation will lead to a reclaim livelock with insane IO from waking the flushers and thrashing unrelated filesystem cache in a tight loop. Use __GFP_NOFAIL allocations for buffers for now. This makes sure that any looping happens in the page allocator, which knows how to orchestrate kswapd, direct reclaim, and the flushers sensibly. It also allows memory cgroups to detect allocations that can't handle failure and will allow them to ultimately bypass the limit if reclaim can not make progress. Reported-by: azurIt <[email protected]> Signed-off-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-10-16mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pagesCyrill Gorcunov1-1/+3
If a page we are inspecting is in swap we may occasionally report it as having soft dirty bit (even if it is clean). The pte_soft_dirty helper should be called on present pte only. Signed-off-by: Cyrill Gorcunov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Matt Mackall <[email protected]> Cc: Xiao Guangrong <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-10-16Merge branch 'for-linus' of ↵Linus Torvalds2-6/+4
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull tmpfile fix from Al Viro: "A fix for double iput() in ->tmpfile() on ext3 and ext4; I'd fucked it up, Miklos has caught it" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext[34]: fix double put in tmpfile
2013-10-16ecryptfs: Fix memory leakage in keystore.cGeyslan G. Bem1-1/+2
In 'decrypt_pki_encrypted_session_key' function: Initializes 'payload' pointer and releases it on exit. Signed-off-by: Geyslan G. Bem <[email protected]> Signed-off-by: Tyler Hicks <[email protected]> Cc: [email protected] # v2.6.28+
2013-10-16dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSYBart Van Assche1-3/+1
When dlm_release_lockspace(ls, 1) is invoked on a busy system immediately after the last dlm_unlock() AST has finished it can occur that lkb_idr_is_local() is invoked for the unlocked LKB since removal from ls_lkbidr only occurs after the AST has returned. If that happens dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing the lockspace. Fix this race condition by changing lkb_idr_is_local() such that it only returns true for LKB's that have not yet been unlocked. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: David Teigland <[email protected]>
2013-10-16ext3: Count journal as bsddf overhead in ext3_statfsEric Sandeen1-0/+4
ext4 counts journal space as bsddf overhead, but ext3 does not. For some reason when I patched ext4 I thought I should leave ext3 alone, but frankly it makes more sense to fix it, I think. Otherwise we get inconsistent behavior from ext3 under ext3.ko, and ext3 under ext4.ko, which is not at all desirable... This is testable by xfstests shared/289, though it will need modification because it currently special-cases ext3. Signed-off-by: Eric Sandeen <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2013-10-16ext4: fixup kerndoc annotation of mpage_map_and_submit_extent()Jan Kara1-0/+3
Document give_up_on_write argument of mpage_map_and_submit_extent(). Signed-off-by: Jan Kara <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2013-10-16ext4: fix assertion in ext4_add_complete_io()Jan Kara1-2/+3
It doesn't make sense to require io_end->handle when we are in nojournal mode. So update the assertion accordingly to avoid false warnings from ext4_add_complete_io(). Reported-by: Eric Whitney <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2013-10-15ext[34]: fix double put in tmpfileMiklos Szeredi2-6/+4
d_tmpfile() already swallowed the inode ref. Signed-off-by: Miklos Szeredi <[email protected]> Cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2013-10-15GFS2: Use lockref for glocksSteven Whitehouse4-49/+45
Currently glocks have an atomic reference count and also a spinlock which covers various internal fields, such as the state. This intent of this patch is to replace the spinlock and the atomic reference count with a lockref structure. This contains a spinlock which we can continue to use as before, and a reference counter which is used in conjuction with the spinlock to replace the previous atomic counter. As a result of this there are some new rules for reference counting on glocks. We need to distinguish between reference count changes under gl_spin (which are now just increment or decrement of the new counter, provided the count cannot hit zero) and those which are outside of gl_spin, but which now take gl_spin internally. The conversion is relatively straight forward. There is probably some further clean up which can be done, but the priority at this stage is to make the change in as simple a manner as possible. A consequence of this change is that the reference count is being decoupled from the lru list processing. This should allow future adoption of the lru_list code with glocks in due course. The reason for using the "dead" state and not just relying on 0 being the "invalid state" is so that in due course 0 ref counts can be allowable. The intent is to eventually be able to remove the ref count changes which are currently hidden away in state_change(). Signed-off-by: Steven Whitehouse <[email protected]>
2013-10-14cifs: ntstatus_to_dos_map[] is not terminatedTim Gardner1-1/+3
Functions that walk the ntstatus_to_dos_map[] array could run off the end. For example, ntstatus_to_dos() loops while ntstatus_to_dos_map[].ntstatus is not 0. Granted, this is mostly theoretical, but could be used as a DOS attack if the error code in the SMB header is bogus. [Might consider adding to stable, as this patch is low risk - Steve] Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Tim Gardner <[email protected]> Signed-off-by: Steve French <[email protected]>
2013-10-14sysfs/bin: Fix size handling overflow for bin_attributeBenjamin Herrenschmidt1-1/+1
While looking at the code, I noticed that bin_attribute read() and write() ops copy the inode size into an int for futher comparisons. Some bin_attributes can be fairly large. For example, pci creates some for BARs set to the BAR size and giant BARs are around the corner, so this is going to break something somewhere eventually. Let's use the right type. [adjust for seqfile conversions, only needed for bin_read() - gkh] Signed-off-by: Benjamin Herrenschmidt <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-10-14sysfs: make sysfs_file_ops() follow ignore_lockdep flagTejun Heo3-21/+20
375b611e60 ("sysfs: remove sysfs_buffer->ops") introduced sysfs_file_ops() which determines the associated file operation of a given sysfs_dirent. As file ops access should be protected by an active reference, the new function includes a lockdep assertion on the sysfs_dirent; unfortunately, I forgot to take attr->ignore_lockdep flag into account and the lockdep assertion trips spuriously for files which opt out from active reference lockdep checking. # cat /sys/devices/pci0000:00/0000:00:01.2/usb1/authorized ------------[ cut here ]------------ WARNING: CPU: 1 PID: 540 at /work/os/work/fs/sysfs/file.c:79 sysfs_file_ops+0x4e/0x60() Modules linked in: CPU: 1 PID: 540 Comm: cat Not tainted 3.11.0-work+ #3 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000009 ffff880016205c08 ffffffff81ca0131 0000000000000000 ffff880016205c40 ffffffff81096d0d ffff8800166cb898 ffff8800166f6f60 ffffffff8125a220 ffff880011ab1ec0 ffff88000aff0c78 ffff880016205c50 Call Trace: [<ffffffff81ca0131>] dump_stack+0x4e/0x82 [<ffffffff81096d0d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff81096dea>] warn_slowpath_null+0x1a/0x20 [<ffffffff8125994e>] sysfs_file_ops+0x4e/0x60 [<ffffffff8125a274>] sysfs_open_file+0x54/0x300 [<ffffffff811df612>] do_dentry_open.isra.17+0x182/0x280 [<ffffffff811df820>] finish_open+0x30/0x40 [<ffffffff811f0623>] do_last+0x503/0xd90 [<ffffffff811f0f6b>] path_openat+0xbb/0x6d0 [<ffffffff811f23ba>] do_filp_open+0x3a/0x90 [<ffffffff811e09a9>] do_sys_open+0x129/0x220 [<ffffffff811e0abe>] SyS_open+0x1e/0x20 [<ffffffff81caf3c2>] system_call_fastpath+0x16/0x1b ---[ end trace aa48096b111dafdb ]--- Rename fs/sysfs/dir.c::ignore_lockdep() to sysfs_ignore_lockdep() and move it to fs/sysfs/sysfs.h and make sysfs_file_ops() skip lockdep assertion if sysfs_ignore_lockdep() is true. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Yinghai Lu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-10-14Docs: Kconfig: `devlopers' -> `developers'Michael Witten1-1/+2
Signed-off-by: Michael Witten <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2013-10-12vfs: allow O_PATH file descriptors for fstatfs()Linus Torvalds1-1/+1
Olga reported that file descriptors opened with O_PATH do not work with fstatfs(), found during further development of ksh93's thread support. There is no reason to not allow O_PATH file descriptors here (fstatfs is very much a path operation), so use "fdget_raw()". See commit 55815f70147d ("vfs: make O_PATH file descriptors usable for 'fstat()'") for a very similar issue reported for fstat() by the same team. Reported-and-tested-by: ольга крыжановская <[email protected]> Acked-by: Al Viro <[email protected]> Cc: [email protected] # O_PATH introduced in 3.0+ Signed-off-by: Linus Torvalds <[email protected]>
2013-10-12Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "A bug fix and performance regression fix for ext4" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix memory leak in xattr ext4: fix performance regression in writeback of random writes
2013-10-12Merge branch 'for-linus' of ↵Linus Torvalds6-21/+25
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We've got more bug fixes in my for-linus branch: One of these fixes another corner of the compression oops from last time. Miao nailed down some problems with concurrent snapshot deletion and drive balancing. I kept out one of his patches for more testing, but these are all stable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix oops caused by the space balance and dead roots Btrfs: insert orphan roots into fs radix tree Btrfs: limit delalloc pages outside of find_delalloc_range Btrfs: use right root when checking for hash collision
2013-10-12ext4: fix memory leak in xattrDave Jones1-0/+2
If we take the 2nd retry path in ext4_expand_extra_isize_ea, we potentionally return from the function without having freed these allocations. If we don't do the return, we over-write the previous allocation pointers, so we leak either way. Spotted with Coverity. [ Fixed by tytso to set is and bs to NULL after freeing these pointers, in case in the retry loop we later end up triggering an error causing a jump to cleanup, at which point we could have a double free bug. -- Ted ] Signed-off-by: Dave Jones <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Reviewed-by: Eric Sandeen <[email protected]> Cc: [email protected]
2013-10-10aio: Fix a trinity splatKent Overstreet1-81/+48
aio kiocb refcounting was broken - it was relying on keeping track of the number of available ring buffer entries, which it needs to do anyways; then at shutdown time it'd wait for completions to be delivered until the # of available ring buffer entries equalled what it was initialized to. Problem with that is that the ring buffer is mapped writable into userspace, so userspace could futz with the head and tail pointers to cause the kernel to see extra completions, and cause free_ioctx() to return while there were still outstanding kiocbs. Which would be bad. Fix is just to directly refcount the kiocbs - which is more straightforward, and with the new percpu refcounting code doesn't cost us any cacheline bouncing which was the whole point of the original scheme. Also clean up ioctx_alloc()'s error path and fix a bug where it wasn't subtracting from aio_nr if ioctx_add_table() failed. Signed-off-by: Kent Overstreet <[email protected]>
2013-10-10Btrfs: fix oops caused by the space balance and dead rootsMiao Xie3-7/+17
When doing space balance and subvolume destroy at the same time, we met the following oops: kernel BUG at fs/btrfs/relocation.c:2247! RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs] Call Trace: [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs] [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs] [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs] [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs] [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs] [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs] [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs] [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs] [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs] [<ffffffff81122f53>] ? path_openat+0x234/0x4db [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391 [<ffffffff810f8ab6>] ? vma_link+0x74/0x94 [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39 [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2 [<ffffffff811259d4>] SyS_ioctl+0x57/0x83 [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10 [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b It is because we returned the error number if the reference of the root was 0 when doing space relocation. It was not right here, because though the root was dead(refs == 0), but the space it held still need be relocated, or we could not remove the block group. So in this case, we should return the root no matter it is dead or not. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-10-10Btrfs: insert orphan roots into fs radix treeMiao Xie1-5/+3
Now we don't drop all the deleted snapshots/subvolumes before the space balance. It means we have to relocate the space which is held by the dead snapshots/subvolumes. So we must into them into fs radix tree, or we would forget to commit the change of them when doing transaction commit, and it would corrupt the metadata. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-10-10Btrfs: limit delalloc pages outside of find_delalloc_rangeJosef Bacik1-8/+4
Liu fixed part of this problem and unfortunately I steered him in slightly the wrong direction and so didn't completely fix the problem. The problem is we limit the size of the delalloc range we are looking for to max bytes and then we try to lock that range. If we fail to lock the pages in that range we will shrink the max bytes to a single page and re loop. However if our first page is inside of the delalloc range then we will end up limiting the end of the range to a period before our first page. This is illustrated below [0 -------- delalloc range --------- 256mb] [page] So find_delalloc_range will return with delalloc_start as 0 and end as 128mb, and then we will notice that delalloc_start < *start and adjust it up, but not adjust delalloc_end up, so things go sideways. To fix this we need to not limit the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that way we don't end up with this confusion. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-10-10Btrfs: use right root when checking for hash collisionJosef Bacik1-1/+1
btrfs_rename was using the root of the old dir instead of the root of the new dir when checking for a hash collision, so if you tried to move a file into a subvol it would freak out because it would see the file you are trying to move in its current root. This fixes the bug where this would fail btrfs subvol create test1 btrfs subvol create test2 mv test1 test2. Thanks to Chris Murphy for catching this, Cc: [email protected] Reported-by: Chris Murphy <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>