aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2013-09-21Btrfs: drop dir i_size when adding new names on replayJosef Bacik1-0/+27
So if we have dir_index items in the log that means we also have the inode item as well, which means that the inode's i_size is correct. However when we process dir_index'es we call btrfs_add_link() which will increase the directory's i_size for the new entry. To fix this we need to just set the dir items i_size to 0, and then as we find dir_index items we adjust the i_size. btrfs_add_link() will do it for new entries, and if the entry already exists we can just add the name_len to the i_size ourselves. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Btrfs: replay dir_index items before other itemsJosef Bacik1-3/+12
A user reported a bug where his log would not replay because he was getting -EEXIST back. This was because he had a file moved into a directory that was logged. What happens is the file had a lower inode number, and so it is processed first when replaying the log, and so we add the inode ref in for the directory it was moved to. But then we process the directories DIR_INDEX item and try to add the inode ref for that inode and it fails because we already added it when we replayed the inode. To solve this problem we need to just process any DIR_INDEX items we have in the log first so this all is taken care of, and then we can replay the rest of the items. With this patch my reproducer can remount the file system properly instead of erroring out. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Btrfs: check roots last log commit when checking if an inode has been loggedJosef Bacik1-1/+4
Liu introduced a local copy of the last log commit for an inode to make sure we actually log an inode even if a log commit has already taken place. In order to make sure we didn't relog the same inode multiple times he set this local copy to the current trans when we log the inode, because usually we log the inode and then sync the log. The exception to this is during rename, we will relog an inode if the name changed and it is already in the log. The problem with this is then we go to sync the inode, and our check to see if the inode has already been logged is tripped and we don't sync the log. To fix this we need to _also_ check against the roots last log commit, because it could be less than what is in our local copy of the log commit. This fixes a bug where we rename a file into a directory and then fsync the directory and then on remount the directory is no longer there. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Btrfs: actually log directory we are fsync()'ingJosef Bacik1-1/+9
If you just create a directory and then fsync that directory and then pull the power plug you will come back up and the directory will not be there. That is because we won't actually create directories if we've logged files inside of them since they will be created on replay, but in this check we will set our logged_trans of our current directory if it happens to be a directory, making us think it doesn't need to be logged. Fix the logic to only do this to parent directories. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Btrfs: actually limit the size of delalloc rangeJosef Bacik1-3/+5
So forever we have had this thing to limit the amount of delalloc pages we'll setup to be written out to 128mb. This is because we have to lock all the pages in this range, so anything above this gets a bit unweildly, and also without a limit we'll happily allocate gigantic chunks of disk space. Turns out our check for this wasn't quite right, we wouldn't actually limit the chunk we wanted to write out, we'd just stop looking for more space after we went over the limit. So if you do a giant 20gb dd on my box with lots of ram I could get 2gig extents. This is fine normally, except when you go to relocate these extents and we can't find enough space to relocate these moster extents, since we have to be able to allocate exactly the same sized extent to move it around. So fix this by actually enforcing the limit. With this patch I'm no longer seeing giant 1.5gb extents. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Btrfs: allocate the free space by the existed max extent size when ENOSPCMiao Xie3-31/+74
By the current code, if the requested size is very large, and all the extents in the free space cache are small, we will waste lots of the cpu time to cut the requested size in half and search the cache again and again until it gets down to the size the allocator can return. In fact, we can know the max extent size in the cache after the first search, so we needn't cut the size in half repeatedly, and just use the max extent size directly. This way can save lots of cpu time and make the performance grow up when there are only fragments in the free space cache. According to my test, if there are only 4KB free space extents in the fs, and the total size of those extents are 256MB, we can reduce the execute time of the following test from 5.4s to 1.4s. dd if=/dev/zero of=<testfile> bs=1MB count=1 oflag=sync Changelog v2 -> v3: - fix the problem that we skip the block group with the space which is less than we need. Changelog v1 -> v2: - address the problem that we return a wrong start position when searching the free space in a bitmap. Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21btrfs: add lockdep and tracing annotations for uuid treeDavid Sterba1-0/+1
Signed-off-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21btrfs: show compiled-in config features at module load timeStefan Behrens1-0/+3
We want to know if there are debugging features compiled in, this may affect performance. The message is printed before the sanity checks. (This commit message is a copy of David Sterba's commit message when he introduced btrfs_print_info()). Signed-off-by: Stefan Behrens <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Btrfs: more efficient inode tree replace operationFilipe David Borba Manana1-5/+5
Instead of removing the current inode from the red black tree and then add the new one, just use the red black tree replace operation, which is more efficient. Signed-off-by: Filipe David Borba Manana <[email protected]> Reviewed-by: Zach Brown <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Btrfs: do not add replace target to the alloc_listIlya Dryomov1-1/+2
If replace was suspended by the umount, replace target device is added to the fs_devices->alloc_list during a later mount. This is obviously wrong. ->is_tgtdev_for_dev_replace is supposed to guard against that, but ->is_tgtdev_for_dev_replace is (and can only ever be) initialized *after* everything is opened and fs_devices lists are populated. Fix this by checking the devid instead: for replace targets it's always equal to BTRFS_DEV_REPLACE_DEVID. Cc: Stefan Behrens <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Stefan Behrens <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Btrfs: fixup error handling in btrfs_reloc_cowJosef Bacik3-22/+32
If we failed to actually allocate the correct size of the extent to relocate we will end up in an infinite loop because we won't return an error, we'll just move on to the next extent. So fix this up by returning an error, and then fix all the callers to return an error up the stack rather than BUG_ON()'ing. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2013-09-21Merge tag 'v3.11' into for-linusChris Mason4-49/+57
Linux 3.11
2013-09-20CacheFiles: Don't try to dump the index key if the cookie has been clearedDavid Howells1-1/+1
Don't try to dump the index key that distinguishes an object if netfs data in the cookie the object refers to has been cleared (ie. the cookie has passed most of the way through __fscache_relinquish_cookie()). Since the netfs holds the index key, we can't get at it once the ->def and ->netfs_data pointers have been cleared - and a NULL pointer exception will ensue, usually just after a: CacheFiles: Error: Unexpected object collision error is reported. Signed-off-by: David Howells <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-09-20CacheFiles: Fix memory leak in cachefiles_check_auxdata error pathsJosh Boyer1-14/+15
In cachefiles_check_auxdata(), we allocate auxbuf but fail to free it if we determine there's an error or that the data is stale. Further, assigning the output of vfs_getxattr() to auxbuf->len gives problems with checking for errors as auxbuf->len is a u16. We don't actually need to set auxbuf->len, so keep the length in a variable for now. We shouldn't need to check the upper limit of the buffer as an overflow there should be indicated by -ERANGE. While we're at it, fscache_check_aux() returns an enum value, not an int, so assign it to an appropriately typed variable rather than to ret. Signed-off-by: Josh Boyer <[email protected]> Signed-off-by: David Howells <[email protected]> cc: Hongyi Jia <[email protected]> cc: Milosz Tanski <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-09-19Merge branch 'for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "These fix several bugs with RBD from 3.11 that didn't get tested in time for the merge window: some error handling, a use-after-free, and a sequencing issue when unmapping and image races with a notify operation. There is also a patch fixing a problem with the new ceph + fscache code that just went in" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: fscache: check consistency does not decrement refcount rbd: fix error handling from rbd_snap_name() rbd: ignore unmapped snapshots that no longer exist rbd: fix use-after free of rbd_dev->disk rbd: make rbd_obj_notify_ack() synchronous rbd: complete notifies before cleaning up osd_client and rbd_dev libceph: add function to ensure notifies are complete
2013-09-18Merge branch 'for-linus' of ↵Linus Torvalds8-36/+55
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "atomic_open-related fixes (Miklos' series, with EEXIST-related parts replaced with fix in fs/namei.c:atomic_open() instead of messing with the instances) + race fix in autofs + leak on failure exit in 9p" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: don't forget to destroy inode cache if fscache registration fails atomic_open: take care of EEXIST in no-open case with O_CREAT|O_EXCL in fs/namei.c vfs: don't set FILE_CREATED before calling ->atomic_open() nfs: set FILE_CREATED gfs2: set FILE_CREATED cifs: fix filp leak in cifs_atomic_open() vfs: improve i_op->atomic_open() documentation autofs4: close the races around autofs4_notify_daemon()
2013-09-18Merge tag 'please-pull-pstore' of ↵Linus Torvalds1-6/+23
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore/compression fixes from Tony Luck: "Three pstore fixes related to compression: 1) Better adjustment of size of compression buffer (was too big for EFIVARS backend resulting in compression failure 2) Use zlib_inflateInit2 instead of zlib_inflateInit 3) Don't print messages about compression failure. They will waste space that may better be used to log console output leading to the crash" * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: pstore: Remove the messages related to compression failure pstore: Use zlib_inflateInit2 instead of zlib_inflateInit pstore: Adjust buffer size for compression for smaller registered buffers
2013-09-18cifs: stop trying to use virtual circuitsJeff Layton3-88/+1
Currently, we try to ensure that we use vcnum of 0 on the first established session on a connection and then try to use a different vcnum on each session after that. This is a little odd, since there's no real reason to use a different vcnum for each SMB session. I can only assume there was some confusion between SMB sessions and VCs. That's somewhat understandable since they both get created during SESSION_SETUP, but the documentation indicates that they are really orthogonal. The comment on max_vcs in particular looks quite misguided. An SMB session is already uniquely identified by the SMB UID value -- there's no need to again uniquely ID with a VC. Furthermore, a vcnum of 0 is a cue to the server that it should release any resources that were previously held by the client. This sounds like a good thing, until you consider that: a) it totally ignores the fact that other programs on the box (e.g. smbclient) might have connections established to the server. Using a vcnum of 0 causes them to get kicked off. b) it causes problems with NAT. If several clients are connected to the same server via the same NAT'ed address, whenever one connects to the server it kicks off all the others, which then reconnect and kick off the first one...ad nauseum. I don't see any reason to ignore the advice in "Implementing CIFS" which has a comprehensive treatment of virtual circuits. In there, it states "...and contrary to the specs the client should always use a VcNumber of one, never zero." Have the client just use a hardcoded vcnum of 1, and stop abusing the special behavior of vcnum 0. Reported-by: [email protected] <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Volker Lendecke <[email protected]> Signed-off-by: Steve French <[email protected]>
2013-09-18CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing themDavid Howells3-0/+28
In cifs_readpages(), we may decide we don't want to read a page after all - but the page may already have passed through fscache_read_or_alloc_pages() and thus have marks and reservations set. Thus we have to call fscache_readpages_cancel() or fscache_uncache_page() on the pages we're returning to clear the marks. NFS, AFS and 9P should be unaffected by this as they call read_cache_pages() which does the cleanup for you. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
2013-09-18fuse: fix fallocate vs. ftruncate raceMaxim Patlasov1-0/+7
A former patch introducing FUSE_I_SIZE_UNSTABLE flag provided detailed description of races between ftruncate and anyone who can extend i_size: > 1. As in the previous scenario fuse_dentry_revalidate() discovered that i_size > changed (due to our own fuse_do_setattr()) and is going to call > truncate_pagecache() for some 'new_size' it believes valid right now. But by > the time that particular truncate_pagecache() is called ... > 2. fuse_do_setattr() returns (either having called truncate_pagecache() or > not -- it doesn't matter). > 3. The file is extended either by write(2) or ftruncate(2) or fallocate(2). > 4. mmap-ed write makes a page in the extended region dirty. This patch adds necessary bits to fuse_file_fallocate() to protect from that race. Signed-off-by: Maxim Patlasov <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Cc: [email protected]
2013-09-18fuse: wait for writeback in fuse_file_fallocate()Maxim Patlasov1-6/+10
The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE): 1) An user makes a page dirty via mmap-ed write. 2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE and <offset, size> covering the page. 3) Before truncate_pagecache_range call from fuse_file_fallocate, the page goes to write-back. The page is fully processed by fuse_writepage (including end_page_writeback on the page), but fuse_flush_writepages did nothing because fi->writectr < 0. 4) truncate_pagecache_range is called and fuse_file_fallocate is finishing by calling fuse_release_nowrite. The latter triggers processing queued write-back request which will write stale data to the hole soon. Changed in v2 (thanks to Brian for suggestion): - Do not truncate page cache until FUSE_FALLOCATE succeeded. Otherwise, we can end up in returning -ENOTSUPP while user data is already punched from page cache. Use filemap_write_and_wait_range() instead. Changed in v3 (thanks to Miklos for suggestion): - fuse_wait_on_writeback() is prone to livelocks; use fuse_set_nowrite() instead. So far as we need a dirty-page barrier only, fuse_sync_writes() should be enough. - rebased to for-linus branch of fuse.git Signed-off-by: Maxim Patlasov <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Cc: [email protected]
2013-09-18GFS2: new function gfs2_rbm_incrBob Peterson1-3/+27
Since the previous patch eliminated bi in favor of bii, this follow-on patch needed to be adjusted accordingly. Here is the revised version. This patch adds a new function, gfs2_rbm_incr, which increments an rbm structure. This is more efficient than calling gfs2_rbm_to_block, incrementing, then calling gfs2_rbm_from_block. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2013-09-18GFS2: Introduce rbm field biiBob Peterson2-54/+65
This is a respin of the original patch. As Steve pointed out, the introduction of field bii makes it easy to eliminate bi itself. This revised patch does just that, replacing bi with bii. This patch adds a new field to the rbm structure, called bii, which is an index into the array of bitmaps for an rgrp. This replaces *bi which was a pointer to the bitmap. This is being done for further optimizations. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2013-09-179p: don't forget to destroy inode cache if fscache registration failsAl Viro1-3/+4
Signed-off-by: Al Viro <[email protected]>
2013-09-17atomic_open: take care of EEXIST in no-open case with O_CREAT|O_EXCL in ↵Al Viro2-21/+20
fs/namei.c Signed-off-by: Al Viro <[email protected]>
2013-09-17bio-integrity: Fix use of bs->bio_integrity_pool after freeBjorn Helgaas1-1/+1
This fixes a copy and paste error introduced by 9f060e2231 ("block: Convert integrity to bvec_alloc_bs()"). Found by Coverity (CID 1020654). Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Kent Overstreet <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2013-09-17jfs: fix error path in iallocDave Kleikamp1-2/+1
If insert_inode_locked() fails, we shouldn't be calling unlock_new_inode(). Signed-off-by: Dave Kleikamp <[email protected]> Tested-by: Michael L. Semon <[email protected]> Cc: [email protected]
2013-09-17GFS2: Do not reset flags on active reservationsBob Peterson1-1/+0
When we used try locks for rgrps on block allocations, it was important to clear the flags field so that we used a blocking hold on the glock. Now that we're not doing try locks, clearing flags is unnecessary, and a waste of time. In fact, it's probably doing the wrong thing because it clears the GL_SKIP bit that was set for the lvb tracking purposes. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2013-09-17GFS2: introduce bi_blocks for optimizationBob Peterson2-1/+6
This patch introduces a new field in the bitmap structure called bi_blocks. Its purpose is to save us from constantly multiplying bi_len by the constant GFS2_NBBY. It also paves the way for more optimization in a future patch. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2013-09-17GFS2: optimize rbm_from_block wrt bi_startBob Peterson1-1/+1
In function gfs2_rbm_from_block, it starts by checking if the block falls within the first bitmap. It does so by checking if the rbm's offset is less than (rbm->bi->bi_start + rbm->bi->bi_len) * GFS2_NBBY. However, the first bitmap will always have bi_start==0. Therefore this is an unnecessary calculation in a function that gets called billions of times. This patch removes the reference to bi_start. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2013-09-17GFS2: d_splice_alias() can't return errorMiklos Szeredi1-3/+1
unless it was given an IS_ERR(inode), which isn't the case here. So clean up the unnecessary error handling in gfs2_create_inode(). This paves the way for real fixes (hence the stable Cc). Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]> Cc: [email protected]
2013-09-16fs/minix: Drop dependency on H8300Guenter Roeck1-1/+1
The H8/300 architecture is gone. Drop dependencies on it. Cc: Yoshinori Sato <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Guenter Roeck <[email protected]>
2013-09-16vfs: don't set FILE_CREATED before calling ->atomic_open()Miklos Szeredi1-3/+8
If O_CREAT|O_EXCL are passed to open, then we know that either - the file is successfully created, or - the operation fails in some way. So previously we set FILE_CREATED before calling ->atomic_open() so the filesystem doesn't have to. This, however, led to bugs in the implementation that went unnoticed when the filesystem didn't check for existence, yet returned success. To prevent this kind of bug, require filesystems to always explicitly set FILE_CREATED on O_CREAT|O_EXCL and verify this in the VFS. Also added a couple more verifications for the result of atomic_open(): - Warn if filesystem set FILE_CREATED despite the lack of O_CREAT. - Warn if filesystem set FILE_CREATED but gave a negative dentry. Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-16nfs: set FILE_CREATEDMiklos Szeredi1-0/+3
Set FILE_CREATED on O_CREAT|O_EXCL. If the NFS server honored our request for exclusivity then this must be correct. Currently this is a no-op, since the VFS sets FILE_CREATED anyway. The next patch will, however, require this flag to be always set by filesystems. Signed-off-by: Miklos Szeredi <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-16gfs2: set FILE_CREATEDMiklos Szeredi1-1/+3
In gfs2_create_inode() set FILE_CREATED in *opened. Signed-off-by: Miklos Szeredi <[email protected]> Cc: Steven Whitehouse <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-16cifs: fix filp leak in cifs_atomic_open()Miklos Szeredi1-0/+1
If an error occurs after having called finish_open() then fput() needs to be called on the already opened file. Signed-off-by: Miklos Szeredi <[email protected]> Cc: Steve French <[email protected]> Cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2013-09-16vfs: improve i_op->atomic_open() documentationMiklos Szeredi1-3/+18
Fix documentation of ->atomic_open() and related functions: finish_open() and finish_no_open(). Also add details that seem to be unclear and a source of bugs (some of which are fixed in the following series). Cc-ing maintainers of all filesystems implementing ->atomic_open(). Signed-off-by: Miklos Szeredi <[email protected]> Cc: Eric Van Hensbergen <[email protected]> Cc: Sage Weil <[email protected]> Cc: Steve French <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Al Viro <[email protected]>
2013-09-16autofs4: close the races around autofs4_notify_daemon()Al Viro1-10/+3
Don't drop ->wq_mutex before calling autofs4_notify_daemon() only to regain it there. Besides being pointless, that opens a race window where autofs4_wait_release() could've come and freed wq->name.name. And do the debugging printk in the "reused an existing wq" case before dropping ->wq_mutex - the same reason... Signed-off-by: Al Viro <[email protected]> Acked-by: Ian Kent <[email protected]>
2013-09-16Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds1-5/+10
Pull CIFS fixes from Steve French: "Two minor cifs fixes and a minor documentation cleanup for cifs.txt" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: update cifs.txt and remove some outdated infos cifs: Avoid calling unlock_page() twice in cifs_readpage() when using fscache cifs: Do not take a reference to the page in cifs_readpage_worker()
2013-09-16Merge tag 'upstream-3.12-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds1-3/+4
Pull ubifs fix from Artem Bityutskiy: "Just one patch which fixes the power-cut recovery testing mode. I'll start using a single UBI/UBIFS tree instead of 2 trees from now on. So in the future you'll get 1 small pull request instead of 2 tiny ones" * tag 'upstream-3.12-rc1' of git://git.infradead.org/linux-ubifs: UBIFS: remove invalid warn msg with tst_recovery enabled
2013-09-16pstore: Remove the messages related to compression failureAruna Balakrishnaiah1-4/+0
Remove the messages indicating compression failure as it will add to the space during panic path. Reported-by: Seiji Aguchi <[email protected]> Tested-by: Seiji Aguchi <[email protected]> Signed-off-by: Aruna Balakrishnaiah <[email protected]> Signed-off-by: Tony Luck <[email protected]>
2013-09-16pstore: Use zlib_inflateInit2 instead of zlib_inflateInitAruna Balakrishnaiah1-1/+1
Since zlib_deflateInit2() is used for specifying window bit during compression, zlib_inflateInit2() is appropriate for decompression. Reported-by: Seiji Aguchi <[email protected]> Tested-by: Seiji Aguchi <[email protected]> Signed-off-by: Aruna Balakrishnaiah <[email protected]> Signed-off-by: Tony Luck <[email protected]>
2013-09-16pstore: Adjust buffer size for compression for smaller registered buffersAruna Balakrishnaiah1-1/+22
When backends (ex: efivars) have smaller registered buffers, the big_oops_buf is too big for them as number of repeated occurences in the text captured will be less. What happens is that pstore takes too big a bite from the dmesg log and then finds it cannot compress it enough to meet the backend block size. Patch takes care of adjusting the buffer size based on the registered buffer size. cmpr values have been arrived after doing experiments with plain text for buffers of size 1k - 4k (Smaller the buffer size repeated occurence will be less) and with sample crash log for buffers ranging from 4k - 10k. Reported-by: Seiji Aguchi <[email protected]> Tested-by: Seiji Aguchi <[email protected]> Signed-off-by: Aruna Balakrishnaiah <[email protected]> Signed-off-by: Tony Luck <[email protected]>
2013-09-16ext4: fix performance regression in writeback of random writesJan Kara1-1/+1
The Linux Kernel Performance project guys have reported that commit 4e7ea81db5 introduces a performance regression for the following fio workload: [global] direct=0 ioengine=mmap size=1500M bs=4k pre_read=1 numjobs=1 overwrite=1 loops=5 runtime=300 group_reporting invalidate=0 directory=/mnt/ file_service_type=random:36 file_service_type=random:36 [job0] startdelay=0 rw=randrw filename=data0/f1:data0/f2 [job1] startdelay=0 rw=randrw filename=data0/f2:data0/f1 ... [job7] startdelay=0 rw=randrw filename=data0/f2:data0/f1 The culprit of the problem is that after the commit ext4_writepages() are more aggressive in writing back pages. Thus we have less consecutive dirty pages resulting in more seeking. This increased aggressivity is caused by a bug in the condition terminating ext4_writepages(). We start writing from the beginning of the file even if we should have terminated ext4_writepages() because wbc->nr_to_write <= 0. After fixing the condition the throughput of the fio workload is about 20% better than before writeback reorganization. Reported-by: "Yan, Zheng" <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2013-09-15vfs: fix typo in comment in recent dentry workLinus Torvalds1-1/+1
Sedat points out that I transposed some letters in "LRU" and wrote "RLU" instead in one of the new comments explaining the flow. Let's just fix it. Reported-by: Sedat Dilek <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-09-13Merge tag 'writeback-fixes' of ↵Linus Torvalds2-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull writeback fix from Wu Fengguang: "A trivial writeback fix" * tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Do not sort b_io list only because of block device inode
2013-09-13vfs: fix dentry LRU list handling and nr_dentry_unused accountingLinus Torvalds1-27/+101
The LRU list changes interacted badly with our nr_dentry_unused accounting, and even worse with the new DCACHE_LRU_LIST bit logic. This introduces helper functions to make sure everything follows the proper dcache d_lru list rules: the dentry cache is complicated by the fact that some of the hotpaths don't even want to look at the LRU list at all, and the fact that we use the same list entry in the dentry for both the LRU list and for our temporary shrinking lists when removing things from the LRU. The helper functions temporarily have some extra sanity checking for the flag bits that have to match the current LRU state of the dentry. We'll remove that before the final 3.12 release, but considering how easy it is to get wrong, this first cleanup version has some very particular sanity checking. Acked-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-09-13cifs: Avoid calling unlock_page() twice in cifs_readpage() when using fscacheSachin Prabhu1-3/+7
When reading a single page with cifs_readpage(), we make a call to fscache_read_or_alloc_page() which once done, asynchronously calls the completion function cifs_readpage_from_fscache_complete(). This completion function unlocks the page once it has been populated from cache. The module then attempts to unlock the page a second time in cifs_readpage() which leads to warning messages. In case of a successful call to fscache_read_or_alloc_page() we should skip the second unlock_page() since this will be called by the cifs_readpage_from_fscache_complete() once the page has been populated by fscache. With the modifications to cifs_readpage_worker(), we will need to re-grab the page lock in cifs_write_begin(). The problem was first noticed when testing new fscache patches for cifs. https://bugzilla.redhat.com/show_bug.cgi?id=1005737 Signed-off-by: Sachin Prabhu <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
2013-09-13cifs: Do not take a reference to the page in cifs_readpage_worker()Sachin Prabhu1-2/+3
We do not need to take a reference to the pagecache in cifs_readpage_worker() since the calling function will have already taken one before passing the pointer to the page as an argument to the function. Signed-off-by: Sachin Prabhu <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
2013-09-13Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds7-269/+537
Pull aio changes from Ben LaHaise: "First off, sorry for this pull request being late in the merge window. Al had raised a couple of concerns about 2 items in the series below. I addressed the first issue (the race introduced by Gu's use of mm_populate()), but he has not provided any further details on how he wants to rework the anon_inode.c changes (which were sent out months ago but have yet to be commented on). The bulk of the changes have been sitting in the -next tree for a few months, with all the issues raised being addressed" * git://git.kvack.org/~bcrl/aio-next: (22 commits) aio: rcu_read_lock protection for new rcu_dereference calls aio: fix race in ring buffer page lookup introduced by page migration support aio: fix rcu sparse warnings introduced by ioctx table lookup patch aio: remove unnecessary debugging from aio_free_ring() aio: table lookup: verify ctx pointer staging/lustre: kiocb->ki_left is removed aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3" aio: be defensive to ensure request batching is non-zero instead of BUG_ON() aio: convert the ioctx list to table lookup v3 aio: double aio_max_nr in calculations aio: Kill ki_dtor aio: Kill ki_users aio: Kill unneeded kiocb members aio: Kill aio_rw_vect_retry() aio: Don't use ctx->tail unnecessarily aio: io_cancel() no longer returns the io_event aio: percpu ioctx refcount aio: percpu reqs_available aio: reqs_active -> reqs_available aio: fix build when migration is disabled ...