Age  Commit message  Author  Files  Lines
2011-07-21vhost: handle wrap around in # of bufs mathShirley Ma1-3/+9
The math for calculating the # of outstanding buffers gives incorrect results when vq->upend_idx wraps around zero. Fix that. Signed-off-by: Shirley Ma <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
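The wrap-around problem described above can be sketched as follows. This is only an illustration of wrap-safe ring arithmetic, not the vhost code: `RING_SIZE`, `ring_outstanding()` and the `done_idx` parameter are hypothetical names, and both indices are assumed to stay within the ring.

```c
/* Hypothetical ring size, chosen only for illustration. */
#define RING_SIZE 1024u

/*
 * Number of entries between done_idx (oldest outstanding entry) and
 * upend_idx (next free slot). Both indices are assumed to lie in
 * [0, RING_SIZE). A plain "upend_idx - done_idx" goes wrong once
 * upend_idx has wrapped past zero; adding RING_SIZE before taking the
 * modulo keeps the result in range.
 */
static unsigned int ring_outstanding(unsigned int upend_idx,
                                     unsigned int done_idx)
{
    return (upend_idx + RING_SIZE - done_idx) % RING_SIZE;
}
```

With these assumptions, an `upend_idx` of 2 and a `done_idx` of 1020 describe six outstanding entries, the same as 10 and 4 without a wrap.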
2011-07-21mutex: Make mutex_destroy() an inline functionJean Delvare1-1/+1
The non-debug variant of mutex_destroy is a no-op, currently implemented as a macro which does nothing. This approach fails to check the type of the parameter, so an error would only show when debugging gets enabled. Using an inline function instead offers type checking, so such bugs are caught earlier. Signed-off-by: Jean Delvare <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
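A small sketch of why the inline form helps. The `struct mutex` here is a stand-in type and the `_macro`/`_inline` suffixes are hypothetical names used to show both variants side by side; this is not the kernel header.

```c
/* Stand-in type for illustration, not the kernel's struct mutex. */
struct mutex { int locked; };

/* Macro no-op: the argument is discarded, so any type compiles silently. */
#define mutex_destroy_macro(lock) do { } while (0)

/* Inline no-op: the compiler checks that a struct mutex * is passed. */
static inline void mutex_destroy_inline(struct mutex *lock)
{
    (void)lock;
}

static int sketch_demo(void)
{
    struct mutex m = { 0 };
    int not_a_mutex = 0;

    mutex_destroy_macro(&not_a_mutex); /* wrong type, yet no diagnostic */
    (void)not_a_mutex;                 /* silence the unused-variable warning */
    mutex_destroy_inline(&m);          /* &not_a_mutex here would be diagnosed */
    return m.locked;
}
```

Both variants compile to nothing; the difference is only in what the compiler is able to check at the call site.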
2011-07-21Merge branch 'tip/perf/core' of ↵Ingo Molnar26-1003/+1663
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2011-07-21Merge branch 'perf/urgent' into perf/coreIngo Molnar7-38/+148
Merge reason: pick up the latest fixes - they won't make v3.0. Signed-off-by: Ingo Molnar <[email protected]>
2011-07-21vhost-net: update used ring on backend changeMichael S. Tsirkin1-1/+5
On backend change, we flushed out outstanding skbs but forgot to update the used ring, so that done entries were left in the ubuf_info ring. As a result we lose heads or complete incorrect ones, crashing the guest or leaking memory. Fix by updating the used ring. Signed-off-by: Michael S. Tsirkin <[email protected]>
2011-07-21x86: Serialize EFI time accesses on rtc_lockJan Beulich1-6/+33
The EFI specification requires that callers of the time related runtime functions serialize with other CMOS accesses in the kernel, as the EFI time functions may choose to also use the legacy CMOS RTC. Besides fixing a latent bug, this is a prerequisite to safely enable the rtc-efi driver for x86, which ought to be preferred over rtc-cmos on all EFI platforms. Signed-off-by: Jan Beulich <[email protected]> Acked-by: Matthew Garrett <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> Cc: Matthew Garrett <[email protected]>
2011-07-21x86: Serialize SMP bootup CMOS accesses on rtc_lockJan Beulich1-0/+8
With CPU hotplug, there is a theoretical race between other CMOS (namely RTC) accesses and those done in the SMP secondary processor bringup path. I am unaware of the problem having been noticed by anyone in practice, but it would very likely be rather spurious and very hard to reproduce. So to be on the safe side, acquire rtc_lock around those accesses. Signed-off-by: Jan Beulich <[email protected]> Cc: John Stultz <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-07-21x86: Fix write lock scalability 64-bit issueJan Beulich6-28/+73
With the write lock path simply subtracting RW_LOCK_BIAS there is, on large systems, the theoretical possibility of overflowing the 32-bit value that was used so far (namely if 128 or more CPUs manage to do the subtraction, but don't get to do the inverse addition in the failure path quickly enough). A first measure is to modify RW_LOCK_BIAS itself - with the new value chosen, it is good for up to 2048 CPUs each allowed to nest over 2048 times on the read path without causing an issue. Quite possibly it would even be sufficient to adjust the bias a little further, assuming that allowing for significantly less nesting would suffice. However, as the original value chosen allowed for even more nesting levels, to support more than 2048 CPUs (possible currently only for 64-bit kernels) the lock itself gets widened to 64 bits. Signed-off-by: Jan Beulich <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-07-21x86: Unify rwsem assembly implementationJan Beulich5-100/+62
Rather than having two functionally identical implementations for 32- and 64-bit configurations, use the previously extended assembly abstractions to fold the two rwsem implementations into a shared one. Signed-off-by: Jan Beulich <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-07-21x86: Unify rwlock assembly implementationJan Beulich6-90/+56
Rather than having two functionally identical implementations for 32- and 64-bit configurations, extend the existing assembly abstractions enough to fold the two rwlock implementations into a shared one. Signed-off-by: Jan Beulich <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-07-20fs:update the NOTE of the file_operations structureWanlong Gao1-5/+0
The big kernel lock has been removed, and setlease now uses lock_flocks() to hold a special spin lock, file_lock_lock (a change made by Matthew). So just remove the out-of-date NOTE. Signed-off-by: Wanlong Gao <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-21CIFS: Fix wrong length in cifs_iovec_readPavel Shilovsky1-1/+1
Signed-off-by: Pavel Shilovsky <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
2011-07-20Remove dead code in dget_parent()Al Viro1-5/+0
->d_parent is never NULL... Signed-off-by: Al Viro <[email protected]>
2011-07-20AFS: Fix silly characters in a commentDavid Howells1-1/+1
Fix silly characters in a comment in AFS code (some weird characters replaced the word 'flag' at some point way back). Reported-by: [email protected] Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20switch d_add_ci() to d_splice_alias() in "found negative" case as wellAl Viro1-19/+5
Signed-off-by: Al Viro <[email protected]>
2011-07-20simplify gfs2_lookup()Al Viro1-11/+3
d_splice_alias() will DTRT when given NULL or ERR_PTR Signed-off-by: Al Viro <[email protected]>
2011-07-20jfs_lookup(): don't bother with . or ..Al Viro1-24/+15
they'll never be passed to ->lookup() Signed-off-by: Al Viro <[email protected]>
2011-07-20get rid of useless dget_parent() in btrfs rename() and link()Al Viro1-4/+2
->d_parent is locked and stable there... Signed-off-by: Al Viro <[email protected]>
2011-07-20get rid of useless dget_parent() in fs/btrfs/ioctl.cAl Viro1-12/+4
both callers there have dentry->d_parent stabilized by the fact that their caller had obtained dentry from lookup_one_len() and had not dropped ->i_mutex on parent since then. Signed-off-by: Al Viro <[email protected]>
2011-07-20fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlersJosef Bacik71-164/+462
Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness' sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20drivers: fix up various ->llseek() implementationsJosef Bacik4-0/+14
Fix up a few ->llseek() implementations that won't deal with SEEK_HOLE/SEEK_DATA properly. Make them future proof so that if we ever add new options they will return -EINVAL. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseekJosef Bacik7-12/+66
This converts everybody to handle SEEK_HOLE/SEEK_DATA properly. In some cases we just return -EINVAL, in others we do the normal generic thing, and in others we're simply making sure that the proper due diligence is done. For example in NFS/CIFS we need to make sure the file size is updated properly for the SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself that is all we have to do. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20Ext4: handle SEEK_HOLE/SEEK_DATA genericallyJosef Bacik1-0/+21
Since Ext4 has its own lseek we need to make sure it handles SEEK_HOLE/SEEK_DATA. For now just do the same thing that is done in the generic case, somebody else can come along and make it do fancy things later. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20Btrfs: implement our own ->llseekJosef Bacik2-1/+150
In order to handle SEEK_HOLE/SEEK_DATA we need to implement our own llseek. Basically for the normal SEEK_*'s we will just defer to the generic helper, and for SEEK_HOLE/SEEK_DATA we will use our fiemap helper to figure out the nearest hole or data. Currently this helper doesn't check for delalloc bytes for prealloc space, so for now treat prealloc as data until that is fixed. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20fs: add SEEK_HOLE and SEEK_DATA flagsJosef Bacik3-4/+54
This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. Turns out using fiemap in things like cp causes more problems than it solves, so let's try and give userspace an interface that doesn't suck. We need to match Solaris here, and the definitions are *o* If /whence/ is SEEK_HOLE, the offset of the start of the next hole greater than or equal to the supplied offset is returned. The definition of a hole is provided near the end of the DESCRIPTION. *o* If /whence/ is SEEK_DATA, the file pointer is set to the start of the next non-hole file region greater than or equal to the supplied offset. So in the generic case the entire file is data and there is a virtual hole at the end. That means we will just return i_size for SEEK_HOLE and will return the same offset for SEEK_DATA. This is how Solaris does it so we have to do it the same way. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Al Viro <[email protected]>
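The generic behaviour described above (whole file is data, one virtual hole at the end) can be sketched as a pure function. The `SK_*` constants and `sketch_llseek()` are hypothetical stand-ins so the example does not depend on libc exposing SEEK_HOLE; returning -ENXIO for offsets at or past the end of the file is an assumption of this sketch rather than something the message states.

```c
#include <errno.h>

/* Local stand-ins for the whence values; names are illustrative only. */
enum sketch_whence { SK_SEEK_SET, SK_SEEK_DATA, SK_SEEK_HOLE };

/*
 * Generic case: the whole file is data, followed by one virtual hole
 * at i_size. Returns the resulting offset or a negative errno value.
 */
static long long sketch_llseek(int whence, long long offset, long long i_size)
{
    switch (whence) {
    case SK_SEEK_SET:
        return offset < 0 ? -EINVAL : offset;
    case SK_SEEK_DATA:
        /* Assumption: nothing to find at or past EOF is -ENXIO. */
        if (offset < 0 || offset >= i_size)
            return -ENXIO;
        return offset;          /* every byte before EOF is data */
    case SK_SEEK_HOLE:
        if (offset < 0 || offset >= i_size)
            return -ENXIO;
        return i_size;          /* the only hole is the virtual one at EOF */
    default:
        return -EINVAL;         /* unknown whence values are rejected */
    }
}
```

The `default` branch is what makes an implementation future proof in the sense used by the llseek fix-ups in this series: a whence value added later is rejected instead of being misinterpreted.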
2011-07-20reiserfs: make reiserfs default to barrier=flushChristoph Hellwig1-0/+1
Change the default reiserfs mount option to barrier=flush. Based on a patch from Jeff Mahoney in the SuSE tree. Signed-off-by: Jeff Mahoney <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20ext3: make ext3 mount default to barrier=1Christoph Hellwig1-0/+2
This patch turns on barriers by default for ext3. mount -o barrier=0 will turn them off. Based on a patch from Chris Mason in the SuSE tree. Signed-off-by: Chris Mason <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Eric Sandeen <[email protected]> Acked-by: Jan Kara <[email protected]> Acked-by: Jeff Mahoney <[email protected]> Acked-by: Ted Ts'o <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20don't open-code parent_ino() in assorted ->readdir()Al Viro3-3/+3
Signed-off-by: Al Viro <[email protected]>
2011-07-20minix_getattr(): don't bother with ->d_parentAl Viro1-2/+1
we can find superblock easier, TYVM... Signed-off-by: Al Viro <[email protected]>
2011-07-20coda_venus_readdir(): use offsetof()Al Viro1-2/+1
Signed-off-by: Al Viro <[email protected]>
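As a rough illustration of the kind of cleanup this refers to, the sketch below compares an open-coded member offset with `offsetof()`. The struct layout and function names are hypothetical and are not taken from the coda sources.

```c
#include <stddef.h>

/* Hypothetical directory entry layout, for illustration only. */
struct sketch_dirent {
    unsigned long  d_fileno;
    unsigned short d_reclen;
    unsigned char  d_type;
    unsigned char  d_namlen;
    char           d_name[256];
};

/* Open-coded: pointer arithmetic on an existing object. */
static size_t name_offset_open_coded(const struct sketch_dirent *d)
{
    return (size_t)((const char *)d->d_name - (const char *)d);
}

/* Same value, computed at compile time without needing an object. */
static size_t name_offset(void)
{
    return offsetof(struct sketch_dirent, d_name);
}
```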
2011-07-20arm: don't create useless copies to pass into debugfs_create_dir()Al Viro2-12/+5
its first argument is const char * and it's really not modified... Signed-off-by: Al Viro <[email protected]>
2011-07-20switch assorted clock drivers to debugfs_remove_recursive()Al Viro6-41/+13
Signed-off-by: Al Viro <[email protected]>
2011-07-20fs: seq_file - add event counter to simplify poll() supportKay Sievers6-43/+20
Moving the event counter into the dynamically allocated 'struct seq_file' allows poll() support without the need to allocate its own tracking structure. All current users are switched over to use the new counter. Requested-by: Andrew Morton [email protected] Acked-by: NeilBrown <[email protected]> Tested-by: Lucas De Marchi [email protected] Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20fs: move inode_dio_done to the end_io handlerChristoph Hellwig4-3/+13
For filesystems that delay their end_io processing we should keep our i_dio_count until the processing is done. Enable this by moving the inode_dio_done call to the end_io handler if one exists. Note that the actual move to the workqueue for ext4 and XFS is not done in this patch yet, but left to the filesystem maintainers. At least for XFS it's not needed yet either as XFS has an internal equivalent to i_dio_count. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20fs: simplify the blockdev_direct_IO prototypeChristoph Hellwig10-29/+26
Simple filesystems always pass inode->i_sb->s_bdev as the block device argument, and never need an end_io handler. Let's simplify things for them and for my grepping activity by dropping these arguments. The only thing not falling into that scheme is ext4, which passes an end_io handler without needing special flags (yet), but given how messy the direct I/O code there is, use of __blockdev_direct_IO in one instead of two out of three cases isn't going to make a large difference anyway. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20fs: always maintain i_dio_countChristoph Hellwig3-24/+17
Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING. This allows these filesystems to also protect truncate against direct I/O requests by using common code. Right now the only non-DIO_LOCKING filesystem that appears to do so is XFS, which uses an opencoded variant of the i_dio_count scheme. Behaviour doesn't change for filesystems never calling inode_dio_wait. For ext4 behaviour changes when using the dioread_nonlock option, which previously was missing any protection between truncate and direct I/O reads. For ocfs2 the handcrafted i_dio_count manipulations are replaced with the common code now enabled. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20fs: move inode_dio_wait calls into ->setattrChristoph Hellwig12-3/+24
Let filesystems handle waiting for direct I/O requests themselves instead of doing it beforehand. This means filesystem-specific locks to prevent new dio references from appearing can be held. This is important to allow generalizing i_dio_count to non-DIO_LOCKING filesystems. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20rw_semaphore: remove up/down_read_non_ownerChristoph Hellwig2-26/+0
Now that the last user is gone these can be removed. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20fs: kill i_alloc_semChristoph Hellwig13-53/+78
i_alloc_sem is a rather special rw_semaphore. It's the last one that may be released by a non-owner, and its write side is always mirrored by real exclusion. Its intended use is to wait for all pending direct I/O requests to finish before starting a truncate. Replace it with a hand-grown construct: - exclusion for truncates is already guaranteed by i_mutex, so it can simply fall away - the reader side is replaced by an i_dio_count member in struct inode that counts the number of pending direct I/O requests. Truncate can't proceed as long as it's non-zero - when i_dio_count reaches zero we wake up a pending truncate using wake_up_bit on a new bit in i_flags - new references to i_dio_count can't appear while we are waiting for it to read zero because the direct I/O code always needs i_mutex (or an equivalent like XFS's i_iolock) for starting a new operation. This scheme is much simpler, and saves the space of a spinlock_t and a struct list_head in struct inode (typically 160 bits on a non-debug 64-bit system). Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
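The counting scheme can be sketched in userspace with C11 atomics. Everything here is a simplified assumption: `struct sketch_inode` and the `sketch_*` functions are hypothetical, the wake-up is reduced to a boolean return value, and the i_mutex serialisation that stops new requests from starting is not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct inode. */
struct sketch_inode {
    atomic_int dio_count;   /* number of pending direct I/O requests */
};

/* A direct I/O request starts (caller is assumed to hold i_mutex). */
static void sketch_dio_begin(struct sketch_inode *inode)
{
    atomic_fetch_add(&inode->dio_count, 1);
}

/*
 * A direct I/O request completes. Returns true when this call dropped
 * the count to zero, i.e. when a waiting truncate should be woken.
 */
static bool sketch_dio_done(struct sketch_inode *inode)
{
    return atomic_fetch_sub(&inode->dio_count, 1) == 1;
}

/* Truncate may only proceed once no direct I/O is pending. */
static bool sketch_truncate_may_proceed(struct sketch_inode *inode)
{
    return atomic_load(&inode->dio_count) == 0;
}
```

Because `atomic_fetch_sub` returns the previous value, exactly one completing request observes the transition to zero, which is the point at which the wake-up belongs.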
2011-07-20fs: simplify handling of zero sized reads in __blockdev_direct_IOChristoph Hellwig1-2/+5
Reject zero sized reads as soon as we know our I/O length, and don't bother with locks or allocations that might have to be cleaned up otherwise. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20ext4: Rewrite ext4_page_mkwrite() to use generic helpersJan Kara1-51/+55
Rewrite ext4_page_mkwrite() to use __block_page_mkwrite() helper. This removes the need of using i_alloc_sem to avoid races with truncate which seems to be the wrong locking order according to lock ordering documented in mm/rmap.c. Also calling ext4_da_write_begin() as used by the old code seems to be problematic because we can decide to flush delay-allocated blocks which will acquire s_umount semaphore - again creating unpleasant lock dependency if not directly a deadlock. Also add a check for frozen filesystem so that we don't busyloop in page fault when the filesystem is frozen. Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20fat: remove i_alloc_sem abuseChristoph Hellwig3-2/+7
Add a new rw_semaphore to protect bmap against truncate. Previously i_alloc_sem was abused for this, but it's going away in this series. Note that we can't simply use i_mutex, given that the swapon code calls ->bmap under it. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20VFS: Fixup kerneldoc for generic_permission()Tobias Klauser1-1/+0
The flags parameter went away in d749519b444db985e40b897f73ce1898b11f997e Signed-off-by: Tobias Klauser <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20anonfd: fix missing declarationTomasz Stanislawski1-0/+2
The forward declaration of struct file_operations is added to avoid compilation warnings. Signed-off-by: Tomasz Stanislawski <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20xfs: make use of new shrinker callout for the inode cacheDave Chinner3-56/+46
Convert the inode reclaim shrinker to use the new per-sb shrinker operations. This allows much bigger reclaim batches to be used, and allows the XFS inode cache to be shrunk in proportion with the VFS dentry and inode caches. This avoids the problem of the VFS caches being shrunk significantly before the XFS inode cache is shrunk resulting in imbalances in the caches during reclaim. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20vfs: increase shrinker batch sizeDave Chinner2-0/+7
Now that the per-sb shrinker is responsible for shrinking 2 or more caches, increase the batch size to keep economies of scale for shrinking each cache. Increase the shrinker batch size to 1024 objects. To allow for a large increase in batch size, add a conditional reschedule to prune_icache_sb() so that we don't hold the LRU spin lock for too long. This mirrors the behaviour of the __shrink_dcache_sb(), and allows us to increase the batch size without needing to worry about problems caused by long lock hold times. To ensure that filesystems using the per-sb shrinker callouts don't cause problems, document that the object freeing method must reschedule appropriately inside loops. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20superblock: add filesystem shrinker operationsDave Chinner3-12/+51
Now we have a per-superblock shrinker implementation, we can add a filesystem specific callout to it to allow filesystem internal caches to be shrunk by the superblock shrinker. Rather than perpetuate the multipurpose shrinker callback API (i.e. nr_to_scan == 0 meaning "tell me how many objects are freeable in the cache"), two operations will be added. The first will return the number of objects that are freeable, the second is the actual shrinker call. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Al Viro <[email protected]>
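The difference between the multipurpose callback and the two-operation form can be sketched like this. The types and names (`sketch_cache`, `sketch_shrink_ops`, `count_objects`, `free_objects`) are hypothetical and do not reproduce the kernel's actual operation names or signatures.

```c
/* Hypothetical cache with a simple count of freeable objects. */
struct sketch_cache { int nr_freeable; };

/* Old style: one callback; nr_to_scan == 0 means "only report the count". */
static int sketch_shrink_multipurpose(struct sketch_cache *c, int nr_to_scan)
{
    if (nr_to_scan > 0) {
        int freed = nr_to_scan < c->nr_freeable ? nr_to_scan : c->nr_freeable;
        c->nr_freeable -= freed;
    }
    return c->nr_freeable;
}

/* New style: two single-purpose operations. */
struct sketch_shrink_ops {
    int  (*count_objects)(struct sketch_cache *c);
    void (*free_objects)(struct sketch_cache *c, int nr_to_scan);
};

static int sketch_count(struct sketch_cache *c)
{
    return c->nr_freeable;
}

static void sketch_free(struct sketch_cache *c, int nr_to_scan)
{
    int freed = nr_to_scan < c->nr_freeable ? nr_to_scan : c->nr_freeable;
    c->nr_freeable -= freed;
}

static const struct sketch_shrink_ops sketch_ops = {
    .count_objects = sketch_count,
    .free_objects  = sketch_free,
};
```

Splitting the callback removes the special meaning of a zero argument, so each operation has one job and callers cannot confuse a query with a request to free.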
2011-07-20inode: remove iprune_semDave Chinner1-21/+0
Now that we have per-sb shrinkers with a lifecycle that is a subset of the superblock lifecycle and can reliably detect a filesystem being unmounted, there is no longer any race condition for the iprune_sem to protect against. Hence we can remove it. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20superblock: introduce per-sb cache shrinker infrastructureDave Chinner6-257/+121
With context based shrinkers, we can implement a per-superblock shrinker that shrinks the caches attached to the superblock. We currently have global shrinkers for the inode and dentry caches that split up into per-superblock operations via a coarse proportioning method that does not batch very well. The global shrinkers also have a dependency - dentries pin inodes - so we have to be very careful about how we register the global shrinkers so that the implicit call order is always correct. With a per-sb shrinker callout, we can encode this dependency directly into the per-sb shrinker, hence avoiding the need for strictly ordering shrinker registrations. We also have no need for any proportioning code, because the shrinker subsystem already provides this functionality across all shrinkers. Allowing the shrinker to operate on a single superblock at a time means that we do fewer superblock list traversals and less locking, and reclaim should batch more effectively. This should result in less CPU overhead for reclaim and potentially faster reclaim of items from each filesystem. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20xfs: add size update tracepoint to IO completionDave Chinner2-4/+9
For improving insight into IO completion behaviour. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Alex Elder <[email protected]>