Age | Commit message (Collapse) | Author | Files | Lines |
|
Add an argument to the new smaps_pte_entry() function to let it account in
things other than PAGE_SIZE units. I changed all of the PAGE_SIZE sites,
even though not all of them can be reached for transparent huge pages,
just so this will continue to work without changes as THPs are improved.
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Eric B Munson <[email protected]>
Tested-by: Eric B Munson <[email protected]>
Cc: Michael J Wolf <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Matt Mackall <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We will use smaps_pte_entry() in a moment to handle both small and
transparent large pages. But, we must break it out of smaps_pte_range()
first.
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Eric B Munson <[email protected]>
Tested-by: Eric B Munson <[email protected]>
Cc: Michael J Wolf <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Matt Mackall <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Right now, if a mm_walk has either ->pte_entry or ->pmd_entry set, it will
unconditionally split any transparent huge pages it runs in to. In
practice, that means that anyone doing a
cat /proc/$pid/smaps
will unconditionally break down every huge page in the process and depend
on khugepaged to re-collapse it later. This is fairly suboptimal.
This patch changes that behavior. It teaches each ->pmd_entry handler
(there are five) that they must break down the THPs themselves. Also, the
_generic_ code will never break down a THP unless a ->pte_entry handler is
actually set.
This means that the ->pmd_entry handlers can now choose to deal with THPs
without breaking them down.
[[email protected]: coding-style fixes]
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Eric B Munson <[email protected]>
Tested-by: Eric B Munson <[email protected]>
Cc: Michael J Wolf <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matt Mackall <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch series changes remove_from_page_cache()'s page ref counting
rule. Page cache ref count is decreased in delete_from_page_cache(). So
we don't need to decrease the page reference in callers.
Signed-off-by: Minchan Kim <[email protected]>
Cc: William Irwin <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Johannes Weiner <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This function basically does:
remove_from_page_cache(old);
page_cache_release(old);
add_to_page_cache_locked(new);
Except it does this atomically, so there's no possibility for the "add" to
fail because of a race.
If memory cgroups are enabled, then the memory cgroup charge is also moved
from the old page to the new.
This function is currently used by fuse to move pages into the page cache
on read, instead of copying the page contents.
[[email protected]: add freepage() hook to replace_page_cache_page()]
Signed-off-by: Miklos Szeredi <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove pointless memset() in nfsacl_encode().
Thanks to Trond Myklebust <[email protected]> for pointing out
that it is not needed since posix_acl_init() will set everything
regardless..
Signed-off-by: Jesper Juhl <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
d_alloc_and_lookup() calls i_op->lookup method due to
rootfh changes his fsid.
During mount i_op of NFS root inode is set to
nfs_mountpoint_inode_operations, if rpc_ops->getroot()
and rpc_ops->getattr() return different fsid.
After that nfs_follow_remote_path() raised oops:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
stack trace:
d_alloc_and_lookup+0x4c/0x74
do_lookup+0x1e3/0x280
link_path_walk+0x12e/0xab0
nfs4_remote_get_sb+0x56/0x2c0 [nfs]
path_walk+0x67/0xe0
vfs_path_lookup+0x8e/0x100
nfs_follow_remote_path+0x16f/0x3e0 [nfs]
nfs4_try_mount+0x6f/0xd0 [nfs]
nfs_get_sb+0x269/0x400 [nfs]
vfs_kern_mount+0x8a/0x1f0
do_kern_mount+0x52/0x130
do_mount+0x20a/0x260
sys_mount+0x90/0xe0
system_call_fastpath+0x16/0x1b
So just refresh fsid, as RFC3530 doesn't specify behavior
in case of rootfh changes fsid.
Signed-off-by: Vitaliy Gusev <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
[net/9p]: Introduce basic flow-control for VirtIO transport.
9p: use the updated offset given by generic_write_checks
[net/9p] Don't re-pin pages on retrying virtqueue_add_buf().
[net/9p] Set the condition just before waking up.
[net/9p] unconditional wake_up to proc waiting for space on VirtIO ring
fs/9p: Add v9fs_dentry2v9ses
fs/9p: Attach writeback_fid on first open with WR flag
fs/9p: Open writeback fid in O_SYNC mode
fs/9p: Use truncate_setsize instead of vmtruncate
net/9p: Fix compile warning
net/9p: Convert the in the 9p rpc call path to GFP_NOFS
fs/9p: Fix race in initializing writeback fid
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: use watch/notify for changes in rbd header
libceph: add lingering request and watch/notify event framework
rbd: update email address in Documentation
ceph: rename dentry_release -> d_release, fix comment
ceph: add request to the tail of unsafe write list
ceph: remove request from unsafe list if it is canceled/timed out
ceph: move readahead default to fs/ceph from libceph
ceph: add ino32 mount option
ceph: update common header files
ceph: remove debugfs debug cruft
libceph: fix osd request queuing on osdmap updates
ceph: preserve I_COMPLETE across rename
libceph: Fix base64-decoding when input ends in newline.
|
|
pstore_dump() can be called with many different "reason" codes. Save
the name of the code in the persistent store record.
Also - only worthwhile calling pstore_mkfile for KMSG_DUMP_OOPS - that
is the only one where the kernel will continue running.
Reviewed-by: Seiji Aguchi <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
|
|
Bugzilla bug 31422 reports occasional "page allocation failure. order:4"
at Squashfs mount time. Fix this by making zlib workspace allocation
use vmalloc rather than kmalloc.
Reported-by: Mehmet Giritli <[email protected]>
Signed-off-by: Phillip Lougher <[email protected]>
|
|
Without this fix, even if a file is opened in O_APPEND mode, data will be
written at current file position instead of end of file.
Signed-off-by: M. Mohan Kumar <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
|
|
Add the new static inline and use the same
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Venkateswararao Jujjuri <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
|
|
We don't need writeback fid if we are only doing O_RDONLY open
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Venkateswararao Jujjuri <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
|
|
Older version of protocol don't support tsyncfs operation.
So for them force a O_SYNC flag on the server
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Venkateswararao Jujjuri <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
|
|
convert vmtruncate usage to truncate_setsize. We also writeback
all dirty pages before doing 9p operations and on success call truncate_setsize.
This ensure that we continue sanely on failed truncate on the server. The
disadvantage is that we are now going to write back the content that get
thrown away later as a part of truncate.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Venkateswararao Jujjuri <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
|
|
When two process open the same file we can end up with both of them
allocating the writeback_fid. Add a new mutex which can be used
for synchronizing v9fs_inode member values.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Venkateswararao Jujjuri <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: make fuse_dentry_revalidate() RCU aware
fuse: make fuse_permission() RCU aware
fuse: wakeup pollers on connection release/abort
fuse: reduce size of struct fuse_request
|
|
After the stack plugging introduction, these are called lockless.
Ensure that the counters are updated atomically.
Signed-off-by: Shaohua Li<[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
- Add more ext4 tracepoints.
- Change ext4 tracepoints to use dev_t field with MAJOR/MINOR macros
so that we can save 4 bytes in the ring buffer on some platforms.
- Add sync_mode to ext4_da_writepages, ext4_da_write_pages, and
ext4_da_writepages_result tracepoints. Also remove for_reclaim
field from ext4_da_writepages since it is usually not very useful.
Signed-off-by: Jiaying Zhang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
|
|
We can call kfree on uninitialized members of the s_group_info array
on an the error path. We can avoid this by kzalloc'ing the array.
This doesn't entirely solve the oops on mount if we fail down this
path; failed_mount4: frees the sbi, for one, which gets referenced
later in the failed mount paths - I haven't worked that out yet.
https://bugzilla.kernel.org/show_bug.cgi?id=30872
Reported-by: Eugene A. Shatokhin <[email protected]>
Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
|
|
When one of the two waits in nfs_commit_inode() is interrupted, it
returns a non-negative value, which causes nfs_wb_page() to think
that the operation was successful causing it to busy-loop rather
than exiting.
It also causes nfs_file_fsync() to incorrectly report the file as
being successfully committed to disk.
This patch fixes both problems by ensuring that we return an error
if the attempts to wait fail.
Signed-off-by: Trond Myklebust <[email protected]>
Cc: [email protected]
|
|
If we're only doing a single write, and there are no other unstable
writes being queued up, we might want to just flip to using a stable
write RPC call.
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
When we do performence-testing on ext4 filesystem, we observed a
warning like this:
EXT4-fs error (device sda7): ext4_mb_generate_buddy:718: group 259825901 blocks in bitmap, 26057 in gd
instead, it should be
"group 2598, 25901 blocks in bitmap, 26057 in gd"
Reviewed-by: Coly Li <[email protected]>
Cc: Tao Ma <[email protected]>
Signed-off-by: Robin Dong <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
|
|
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (23 commits)
xfs: don't name variables "panic"
xfs: factor agf counter updates into a helper
xfs: clean up the xfs_alloc_compute_aligned calling convention
xfs: kill support/debug.[ch]
xfs: Convert remaining cmn_err() callers to new API
xfs: convert the quota debug prints to new API
xfs: rename xfs_cmn_err_fsblock_zero()
xfs: convert xfs_fs_cmn_err to new error logging API
xfs: kill xfs_fs_mount_cmn_err() macro
xfs: kill xfs_fs_repair_cmn_err() macro
xfs: convert xfs_cmn_err to xfs_alert_tag
xfs: Convert xlog_warn to new logging interface
xfs: Convert linux-2.6/ files to new logging interface
xfs: introduce new logging API.
xfs: zero proper structure size for geometry calls
xfs: enable delaylog by default
xfs: more sensible inode refcounting for ialloc
xfs: stop using xfs_trans_iget in the RT allocator
xfs: check if device support discard in xfs_ioc_trim()
xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1
...
|
|
/sys/fs is a somewhat strange way to tweak what could more
obviously be tuned with a mount option.
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Just for consistency's sake. Fix obsolete comment too.
Signed-off-by: Sage Weil <[email protected]>
|
|
In sync_write_wait(), we assume that the newest request is at the
tail of unsafe write list. We should maintain the semantics here.
Signed-off-by: Henry C Chang <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
|
|
This fixes the list corruption warning like this:
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81()
Hardware name: X8DTU
list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130).
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan]
Pid: 10977, comm: smbd Tainted: G W 2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1
Call Trace:
[<ffffffff8105753c>] warn_slowpath_common+0x7c/0x94
[<ffffffff810575ab>] warn_slowpath_fmt+0x41/0x43
[<ffffffff812351a3>] __list_add+0x68/0x81
[<ffffffffa014799d>] ceph_aio_write+0x614/0x8a2 [ceph]
[<ffffffff8111d2a0>] do_sync_write+0xe8/0x125
[<ffffffff81075a1f>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811f21ec>] ? selinux_file_permission+0x5c/0xb3
[<ffffffff811e8521>] ? security_file_permission+0x16/0x18
[<ffffffff8111d864>] vfs_write+0xae/0x10b
[<ffffffff8111d91b>] sys_pwrite64+0x5a/0x76
[<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
---[ end trace 08573eb9f07ff6f4 ]---
Signed-off-by: Henry C Chang <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
|
|
Signed-off-by: Sage Weil <[email protected]>
|
|
The ino32 mount option forces the ceph fs to report 32 bit
ino values. This is useful for 64 bit kernels with 32 bit userspace.
Signed-off-by: Yehuda Sadeh <[email protected]>
|
|
Whoops!
Signed-off-by: Sage Weil <[email protected]>
|
|
lookup_mnt() is only used in the core fs routines now, so it doesn't need to
be globally declared anymore. It isn't exported to modules at the moment, so
nothing that can be modularised seems to be using it.
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Only bail out of fuse_dentry_revalidate() on LOOKUP_RCU when blocking
is actually necessary.
CC: Nick Piggin <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
Only bail out of fuse_permission() on IPERM_FLAG_RCU when blocking is
actually necessary.
CC: Nick Piggin <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
If a fuse dev connection is broken, wake up any
processes that are blocking, in a poll system call,
on one of the files in the now defunct filesystem.
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
Reduce the size of struct fuse_request by removing cuse_init_out from
the request structure and allocating it dinamically instead.
CC: Tejun Heo <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
The usage of find_first_zero_bit() in bfs_create() is wrong for two
reasons.
The bitmap size argument to find_first_zero_bit() is info->si_lasti but
the correct bitmap size is info->si_lasti + 1 as info->si_lasti is the
last valid index in info->si_imap bitmap.
Another problem is that it is impossible to detect that info->si_imap
bitmap is full because there is an off-by-one bug in the return value
check for find_first_zero_bit(). If no zero bits exist in info->si_imap,
find_first_zero_bit() returns info->si_lasti. But the check can't catch
it due to the off-by-one.
Signed-off-by: Akinobu Mita <[email protected]>
Acked-by: "Tigran A. Aivazian" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
dentry_open() requires callers to pass a valid vfsmount.
Signed-off-by: Tetsuo Handa <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
In this case nobody can open a slave point, so will be better return
from devpts_pty_new()
Now we should not check error code from d_find_alias() in
devpts_pty_kill(), because the dentry exists all times.
Signed-off-by: Andrey Vagin <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
These should be spin_unlock() instead of spin_lock(). It's a typo.
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Move kfree() of i_private out of ->unlink() and into ->evict_inode()
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
It is frequently useful to sync a single file system, instead of all
mounted file systems via sync(2):
- On machines with many mounts, it is not at all uncommon for some of
them to hang (e.g. unresponsive NFS server). sync(2) will get stuck on
those and may never get to the one you do care about (e.g., /).
- Some applications write lots of data to the file system and then
want to make sure it is flushed to disk. Calling fsync(2) on each
file introduces unnecessary ordering constraints that result in a large
amount of sub-optimal writeback/flush/commit behavior by the file
system.
There are currently two ways (that I know of) to sync a single super_block:
- BLKFLSBUF ioctl on the block device: That also invalidates the bdev
mapping, which isn't usually desirable, and doesn't work for non-block
file systems.
- 'mount -o remount,rw' will call sync_filesystem as an artifact of the
current implemention. Relying on this little-known side effect for
something like data safety sounds foolish.
Both of these approaches require root privileges, which some applications
do not have (nor should they need?) given that sync(2) is an unprivileged
operation.
This patch introduces a new system call syncfs(2) that takes an fd and
syncs only the file system it references. Maybe someday we can
$ sync /some/path
and not get
sync: ignoring all arguments
The syscall is motivated by comments by Al and Christoph at the last LSF.
syncfs(2) seems like an appropriate name given statfs(2).
A similar ioctl was also proposed a while back, see
http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2
Signed-off-by: Sage Weil <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Hi,
I was backporting the coredump over pipe feature and noticed this small typo,
I wish I would have something bigger to contribute...
>From 15d6080e0ed4267da103c706917a33b1015e8804 Mon Sep 17 00:00:00 2001
From: Holger Hans Peter Freyther <[email protected]>
Date: Thu, 24 Feb 2011 17:42:50 +0100
Subject: [PATCH] fs: Fix a small typo in the comment
The function is called umh_pipe_setup not uhm_pipe_setup.
Signed-off-by: Holger Hans Peter Freyther <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Fixed coding style issue.
Signed-off-by: David Jenni <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Remove the leftover from the commit 8ff3e8e85fa6 ("select:
switch select() and poll() over to hrtimers").
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Move declaration of 'inode' to beginning of the function. Since it
is referenced directly or indirectly (in case of FIFREEZE/FITHAW/
FS_IOC_FIEMAP) it's not harmful IMHO. And remove unnecessary casts
using 'argp' instead.
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
FITRIM isn't added in compat_ioctl. So a 32 bit program can't be executed
in a 64 bit platform. Add it in the compat_ioctl.
Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
|
|
Checking return code from ext4_journal_get_write_access() is important
with snapshots, because this function invokes COW, so may return new
errors, such as ENOSPC.
ext4_clear_blocks() now returns < 0 for fatal errors, in which case,
ext4_free_data() is aborted.
Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
|