aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2009-06-22ocfs2: Pin journal head before accessing jh->b_committed_dataSunil Mushran1-4/+24
This patch adds jbd_lock_bh_state() and jbd_unlock_bh_state() around accessses to jh->b_committed_data. Fixes oss bugzilla#1131 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1131 Signed-off-by: Sunil Mushran <[email protected]> Signed-off-by: Joel Becker <[email protected]>
2009-06-22ocfs2: Update atime in splice read if necessary.Tao Ma1-3/+3
We should call ocfs2_inode_lock_atime instead of ocfs2_inode_lock in ocfs2_file_splice_read like we do in ocfs2_file_aio_read so that we can update atime in splice read if necessary. Signed-off-by: Tao Ma <[email protected]> Signed-off-by: Joel Becker <[email protected]>
2009-06-22ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.Joel Becker5-9/+38
The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and the DLM cannot reconstruct its state. Clients of the DLM need to know this. ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses track of the state. This is not a standard behavior, but ocfs2 has always relied on it. Thus, an o2dlm LVB is always "valid". ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When fs/dlm loses track of an LVBs state, it sets a flag (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of the LVB may be garbage or merely stale. ocfs2 doesn't want to try to guess at the validity of the stale LVB. Instead, it should be checking the VALNOTVALID flag. As this is the 'standard' way of treating LVBs, we will promote this behavior. We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when the LVB is valid. o2dlm will always return valid, while fs/dlm will check VALNOTVALID. Signed-off-by: Joel Becker <[email protected]> Acked-by: Mark Fasheh <[email protected]>
2009-06-22Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsdLinus Torvalds17-597/+1142
* 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits) SUNRPC: Fix the TCP server's send buffer accounting nfsd41: Backchannel: minorversion support for the back channel nfsd41: Backchannel: cleanup nfs4.0 callback encode routines nfsd41: Remove ip address collision detection case nfsd: optimise the starting of zero threads when none are running. nfsd: don't take nfsd_mutex twice when setting number of threads. nfsd41: sanity check client drc maxreqs nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct NFS: kill off complicated macro 'PROC' sunrpc: potential memory leak in function rdma_read_xdr nfsd: minor nfsd_vfs_write cleanup nfsd: Pull write-gathering code out of nfsd_vfs_write nfsd: track last inode only in use_wgather case sunrpc: align cache_clean work's timer nfsd: Use write gathering only with NFSv2 NFSv4: kill off complicated macro 'PROC' NFSv4: do exact check about attribute specified knfsd: remove unreported filehandle stats counters knfsd: fix reply cache memory corruption knfsd: reply cache cleanups ...
2009-06-22Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds24-510/+3930
* 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (128 commits) nfs41: sunrpc: xprt_alloc_bc_request() should not use spin_lock_bh() nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res nfs: remove unnecessary NFS_INO_INVALID_ACL checks NFS: More "sloppy" parsing problems NFS: Invalid mount option values should always fail, even with "sloppy" NFS: Remove unused XDR decoder functions NFS: Update MNT and MNT3 reply decoding functions NFS: add XDR decoder for mountd version 3 auth-flavor lists NFS: add new file handle decoders to in-kernel mountd client NFS: Add separate mountd status code decoders for each mountd version NFS: remove unused function in fs/nfs/mount_clnt.c NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument NFS: Clean up MNT program definitions lockd: Don't bother with RPC ping for NSM upcalls lockd: Update NSM state from SM_MON replies NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available NFS: Return error code from nfs_callback_up() to user space NFS: Do not display the setting of the "intr" mount option NFS: add support for splice writes nfs41: Backchannel: CB_SEQUENCE validation ...
2009-06-22Making fs/minix/minix.h double including safeHitoshi Mitake1-0/+5
I happened to find that fs/minix/minix.h doesn't guard double include. Yes, I know this never cause something destructive because this is self-evidence that no source file includes minix.h twice, but I think fixing this is better than disregarding it. Signed-off-by: Hitoshi Mitake <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-20nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_resBenny Halevy1-1/+1
nfs4_open_recover_helper clears opendata->o_res before calling nfs4_init_opendata_res, thus causing NFSv4.0 OPEN operations to be sent rather than nfsv4.1. Signed-off-by: Benny Halevy <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-20Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds1-0/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) perfcounter: Handle some IO return values perf_counter: Push perf_sample_data through the swcounter code perf_counter tools: Define and use our own u64, s64 etc. definitions perf_counter: Close race in perf_lock_task_context() perf_counter, x86: Improve interactions with fast-gup perf_counter: Simplify and fix task migration counting perf_counter tools: Add a data file header perf_counter: Update userspace callchain sampling uses perf_counter: Make callchain samples extensible perf report: Filter to parent set by default perf_counter tools: Handle lost events perf_counter: Add event overlow handling fs: Provide empty .set_page_dirty() aop for anon inodes perf_counter: tools: Makefile tweaks for 64-bit powerpc perf_counter: powerpc: Add processor back-end for MPC7450 family perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels perf_counter: powerpc: Change how processor-specific back-ends get selected perf_counter: powerpc: Use unsigned long for register and constraint values perf_counter: powerpc: Enable use of software counters on 32-bit powerpc perf_counter tools: Add and use isprint() ...
2009-06-20fat: Fix the removal of opts->fs_dmaskOGAWA Hirofumi1-1/+1
(ce3b0f8d5c2203301fc87f3aaaed73e5819e2a48: New helper - current_umask()) is removing the opts->fs_dmask, probably it's a cut-and-paste miss or something. Signed-off-by: OGAWA Hirofumi <[email protected]>
2009-06-19Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notifyLinus Torvalds3-29/+8
* 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: inotify_destroy_mark_entry could get called twice
2009-06-19Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds8-12/+12
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: Fix kernel-doc parameter name typo in blk-settings.c: block: rename CONFIG_LBD to CONFIG_LBDAF block: Fix bounce_pfn setting hd: stop defining MAJOR_NR
2009-06-19inotify: inotify_destroy_mark_entry could get called twiceEric Paris3-29/+8
inotify_destroy_mark_entry could get called twice for the same mark since it is called directly in inotify_rm_watch and when the mark is being destroyed for another reason. As an example assume that the file being watched was just deleted so inotify_destroy_mark_entry would get called from the path fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() -> fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry(). If this happened at the same time as userspace tried to remove a watch via inotify_rm_watch we could attempt to remove the mark from the idr twice and could thus double dec the ref cnt and potentially could be in a use after free/double free situation. The fix is to have inotify_rm_watch use the generic recursive safe fsnotify_destroy_mark_by_entry() so we are sure the inotify_destroy_mark_entry() function can only be called one. This patch also renames the function to inotify_ingored_remove_idr() so it is clear what is actually going on in the function. Hopefully this fixes: [ 20.342058] idr_remove called for id=20 which is not allocated. [ 20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077 [ 20.353933] Call Trace: [ 20.356410] [<ffffffff811a82b7>] idr_remove+0x115/0x18f [ 20.361737] [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75 [ 20.367061] [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf [ 20.373771] [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf [ 20.380306] [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10 [ 20.386238] [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170 [ 20.393293] [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf [ 20.399829] [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6 [ 20.405850] [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b Reported-by: Peter Zijlstra <[email protected]> Signed-off-by: Eric Paris <[email protected]> Tested-by: Peter Ziljlstra <[email protected]>
2009-06-19block: rename CONFIG_LBD to CONFIG_LBDAFBartlomiej Zolnierkiewicz8-12/+12
Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2009-06-18nfsd41: Backchannel: minorversion support for the back channelAndy Adamson2-1/+3
Prepare to share backchannel code with NFSv4.1. Signed-off-by: Andy Adamson <[email protected]> Signed-off-by: Benny Halevy <[email protected]> Signed-off-by: Ricardo Labiaga <[email protected]> [nfsd41: use nfsd4_cb_sequence for callback minorversion] Signed-off-by: Benny Halevy <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2009-06-18nfsd41: Backchannel: cleanup nfs4.0 callback encode routinesAndy Adamson1-8/+16
Mimic the client and prepare to share the back channel xdr with NFSv4.1. Bump the number of operations in each encode routine, then backfill the number of operations. Signed-off-by: Andy Adamson <[email protected]> Signed-off-by: Benny Halevy <[email protected]> Signed-off-by: Ricardo Labiaga <[email protected]> Signed-off-by: Benny Halevy <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2009-06-18Merge branch 'devel-for-2.6.31' into for-2.6.31Trond Myklebust13-217/+626
Conflicts: fs/nfs/client.c fs/nfs/super.c
2009-06-18nfsd41: Remove ip address collision detection caseMike Sager1-12/+6
Verified that cthon and pynfs exchange id tests pass (except for the two expected fails: EID8 and EID50) Signed-off-by: Mike Sager <[email protected]> Signed-off-by: Benny Halevy <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2009-06-18Merge branch 'for_linus' of ↵Linus Torvalds19-324/+1714
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: clean up jbd2_journal_try_to_free_buffers() ext4: Don't update ctime for non-extent-mapped inodes ext4: Fix up whitespace issues in fs/ext4/inode.c ext4: Fix 64-bit block type problem on 32-bit platforms ext4: teach the inode allocator to use a goal inode number ext4: Use a hash of the topdir directory name for the Orlov parent group ext4: document the "abort" mount option ext4: move the abort flag from s_mount_opts to s_mount_flags ext4: update the s_last_mounted field in the superblock ext4: change s_mount_opt to be an unsigned int ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl ext4: avoid unnecessary spinlock in critical POSIX ACL path ext3: avoid unnecessary spinlock in critical POSIX ACL path ext4: convert instrumentation from markers to tracepoints jbd2: convert instrumentation from markers to tracepoints
2009-06-18seq_file: add function to write binary dataPeter Oberparleiter1-0/+20
seq_write() can be used to construct seq_files containing arbitrary data. Required by the gcov-profiling interface to synthesize binary profiling data files. Signed-off-by: Peter Oberparleiter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Huang Ying <[email protected]> Cc: Li Wei <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Rusty Russell <[email protected]> Cc: WANG Cong <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18elf_core_dump: use rcu_read_lock() to access ->real_parentOleg Nesterov2-4/+12
In theory it is not safe to dereference ->parent/real_parent without tasklist or rcu lock, we can race with re-parenting. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18reiserfs: fix warnings with gcc 4.4Jeff Mahoney2-7/+8
Several code paths in reiserfs have a construct like: if (is_direntry_le_ih(ih = B_N_PITEM_HEAD(src, item_num))) ... which, in addition to being ugly, end up causing compiler warnings with gcc 4.4.0. Previous compilers didn't issue a warning. fs/reiserfs/do_balan.c:1273: warning: operation on `aux_ih' may be undefined fs/reiserfs/lbalance.c:393: warning: operation on `ih' may be undefined fs/reiserfs/lbalance.c:421: warning: operation on `ih' may be undefined fs/reiserfs/lbalance.c:777: warning: operation on `ih' may be undefined I believe this is due to the ih being passed to macros which evaluate the argument more than once. This is old code and we haven't seen any problems with it, but this patch eliminates the warnings. It converts the multiple evaluation macros to static inlines and does a preassignment for the cases that were causing the warnings because that code is just ugly. Reported-by: Chris Mason <[email protected]> Signed-off-by: Jeff Mahoney <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18ufs: sector_t cannot be negativeRoel Kluin1-9/+1
unsigned i_block,fragment cannot be negative. Signed-off-by: Roel Kluin <[email protected]> Signed-off-by: Evgeniy Dushistov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18isofs: cleanup mount option processingJan Kara4-45/+40
Remove unused variables from isofs_sb_info (used to be some mount options), unify variables for option to use 0/1 (some options used 'y'/'n'), use bit fields for option flags in superblock. Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18isofs: fix setting of uid and gid to 0Jan Kara2-14/+14
isofs allows setting of default uid and gid of files but value 0 was used to indicate that user did not specify any uid/gid mount option. Since this option also overrides uid/gid set in Rock Ridge extension, it makes sense to allow forcing uid/gid 0. Fix option processing to allow this. Cc: <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18isofs: let mode and dmode mount options override rock ridge mode settingJan Kara2-12/+41
So far, permissions set via 'mode' and/or 'dmode' mount options were effective only if the medium had no rock ridge extensions (or was mounted without them). Add 'overriderockmode' mount option to indicate that these options should override permissions set in rock ridge extensions. Maybe this should be default but the current behavior is there since mount options were created so I think we should not change how they behave. Cc: <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18ext3: make sure inode is deleted from orphan list after truncateJan Kara1-9/+11
As Ted pointed out, it can happen that ext3_truncate() returns without removing inode from orphan list. This way we could in some rare cases (like when we get ENOMEM from an allocation in ext3_truncate called because of failed ext3_write_begin) leave the inode on orphan list and that triggers assertion failure on umount. So make ext3_truncate() always remove inode from in-memory orphan list. Cc: Theodore Ts'o <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18jbd: clean up journal_try_to_free_buffers()Hisashi Hifumi1-48/+0
I delete the following patch "commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91 Author: Mingming Cao <[email protected]> Date: Fri Jul 25 01:46:22 2008 -0700 jbd: fix race between free buffer and commit transaction This patch is no longer needed because if race between freeing buffer and committing transaction functionality occurs and dio gets error, currently dio falls back to buffered IO by the following patch. commit 6ccfa806a9cfbbf1cd43d5b6aa47ef2c0eb518fd Author: Hisashi Hifumi <[email protected]> Date: Tue Sep 2 14:35:40 2008 -0700 VFS: fix dio write returning EIO when try_to_release_page fails Signed-off-by: Hisashi Hifumi <[email protected]> Cc: Theodore Tso <[email protected]> Cc: Mingming Cao <[email protected]> Acked-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18ext3: fix chain verification in ext3_get_blocks()Jan Kara1-1/+1
Chain verification in ext3_get_blocks() has been hosed since it called verify_chain(chain, NULL) which always returns success. As a result readers could in theory race with truncate. On the other hand the race probably cannot happen with the current locking scheme, since by the time ext3_truncate() is called all the pages are already removed and hence get_block() shouldn't be called on such pages... Signed-off-by: Jan Kara <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18ext2: Do not update mtime of a moved directoryJan Kara3-5/+7
One of our users is complaining that his backup tool is upset on ext2 (while it's happy on ext3, xfs, ...) because of the mtime change. The problem is: mkdir foo mkdir bar mkdir foo/a Now under ext2: mv foo/a foo/b changes mtime of 'foo/a' (foo/b after the move). That does not really make sense and it does not happen under any other filesystem I've seen. More complicated is: mv foo/a bar/a This changes mtime of foo/a (bar/a after the move) and it makes some sense since we had to update parent directory pointer of foo/a. But again, no other filesystem does this. So after some thoughts I'd vote for consistency and change ext2 to behave the same as other filesystems. Do not update mtime of a moved directory. Specs don't say anything about it (neither that it should, nor that it should not be updated) and other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do it. So let's become more consistent. Spotted by [email protected], initial fix by Jörn Engel. Reported-by: <[email protected]> Cc: <[email protected]> Cc: Jörn Engel <[email protected]> Signed-off-by: Jan Kara <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18proc: vmcore - use kzalloc in get_new_element()Cyrill Gorcunov1-6/+1
Instead of kmalloc+memset better use straight kzalloc Signed-off-by: Cyrill Gorcunov <[email protected]> Reviewed-by: WANG Cong <[email protected]> Acked-by: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18procfs: remove sparse errors in proc_devtree.cMichal Simek1-5/+5
CHECK fs/proc/proc_devtree.c fs/proc/proc_devtree.c:197:14: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:203:34: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:210:14: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:223:26: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:226:14: warning: Using plain integer as NULL pointer Signed-off-by: Michal Simek <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18epoll: fix nested calls supportDavide Libenzi1-9/+12
This fixes a regression in 2.6.30. I unfortunately accepted a patch time ago, to drop the "current" usage from possible IRQ context, w/out proper thought over it. The patch switched to using the CPU id by bounding the nested call callback with a get_cpu()/put_cpu(). Unfortunately the ep_call_nested() function can be called with a callback that grabs sleepy locks (from own f_op->poll()), that results in epic fails. The following patch uses the proper "context" depending on the path where it is called, and on the kind of callback. This has been reported by Stefan Richter, that has also verified the patch is his previously failing environment. Signed-off-by: Davide Libenzi <[email protected]> Reported-by: Stefan Richter <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18proc: export statistics for softirq to /procKeika Kobayashi3-0/+60
Export statistics for softirq in /proc/softirqs and /proc/stat. 1. /proc/softirqs Implement /proc/softirqs which shows the number of softirq for each CPU like /proc/interrupts. 2. /proc/stat Add the "softirq" line to /proc/stat. This line shows the number of softirq for all cpu. The first column is the total of all softirqs and each subsequent column is the total for particular softirq. [[email protected]: remove redundant for_each_possible_cpu() loop] Signed-off-by: Keika Kobayashi <[email protected]> Reviewed-by: Hiroshi Shimamoto <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-18nfsd: optimise the starting of zero threads when none are running.NeilBrown1-2/+4
Currently, if we ask to set then number of nfsd threads to zero when there are none running, we set up all the sockets and register the service, and then tear it all down again. This is pointless. So detect that case and exit promptly. (also remove an assignment to 'error' which was never used. Signed-off-by: NeilBrown <[email protected]> Acked-by: Jeff Layton <[email protected]>
2009-06-18nfsd: don't take nfsd_mutex twice when setting number of threads.NeilBrown2-5/+15
Currently when we write a number to 'threads' in nfsdfs, we take the nfsd_mutex, update the number of threads, then take the mutex again to read the number of threads. Mostly this isn't a big deal. However if we are write '0', and portmap happens to be dead, then we can get unpredictable behaviour. If the nfsd threads all got killed quickly and the last thread is waiting for portmap to respond, then the second time we take the mutex we will block waiting for the last thread. However if the nfsd threads didn't die quite that fast, then there will be no contention when we try to take the mutex again. Unpredictability isn't fun, and waiting for the last thread to exit is pointless, so avoid taking the lock twice. To achieve this, get nfsd_svc return a non-negative number of active threads when not returning a negative error. Signed-off-by: NeilBrown <[email protected]>
2009-06-18fs: Provide empty .set_page_dirty() aop for anon inodesPeter Zijlstra1-0/+15
.set_page_dirty() is one of those a_ops that defaults to the buffer implementation when not set. Therefore provide a dummy function to make it do nothing. (Uncovered by perfcounters fd's which can now be writable-mmap-ed.) Signed-off-by: Peter Zijlstra <[email protected]> Cc: Al Viro <[email protected]> Cc: Davide Libenzi <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2009-06-18udf: Use device size when drive reported bogus number of written blocksJan Kara1-1/+6
Some drives report 0 as the number of written blocks when there are some blocks recorded. Use device size in such case so that we can automagically mount such media. Signed-off-by: Jan Kara <[email protected]>
2009-06-17nfs: remove unnecessary NFS_INO_INVALID_ACL checksJames Morris2-4/+0
Unless I'm mistaken, NFS_INO_INVALID_ACL is being checked twice during getacl calls (i.e. first via nfs_revalidate_inode() and then by each all site). Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: More "sloppy" parsing problemsChuck Lever1-34/+108
Specifying "port=-5" with the kernel's current mount option parser generates "unrecognized mount option". If "sloppy" is set, this causes the mount to succeed and use the default values; the desired behavior is that, since this is a valid option with an invalid value, the mount should fail, even with "sloppy." To properly handle "sloppy" parsing, we need to distinguish between correct options with invalid values, and incorrect options. We will need to parse integer values by hand, therefore, and not rely on match_token(). For instance, these must all fail with "invalid value": port=12345678 port=-5 port=samuel and not with "unrecognized option," as they do currently. Thus, for the sake of match_token() we need to treat the values for these options as strings, and do the conversion to integers using strict_strtol(). This is basically the same solution we used for the earlier "retry=" fix (commit ecbb3845), except in this case the kernel actually has to parse the value, rather than ignore it. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: Invalid mount option values should always fail, even with "sloppy"Chuck Lever1-96/+61
Ian Kent reports: "I've noticed a couple of other regressions with the options vers and proto option of mount.nfs(8). The commands: mount -t nfs -o vers=<invalid version> <server>:/<path> /<mountpoint> mount -t nfs -o proto=<invalid proto> <server>:/<path> /<mountpoint> both immediately fail. But if the "-s" option is also used they both succeed with the mount falling back to defaults (by the look of it). In the past these failed even when the sloppy option was given, as I think they should. I believe the sloppy option is meant to allow the mount command to still function for mount options (for example in shared autofs maps) that exist on other Unix implementations but aren't present in the Linux mount.nfs(8). So, an invalid value specified for a known mount option is different to an unknown mount option and should fail appropriately." See RH bugzilla 486266. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: Remove unused XDR decoder functionsChuck Lever1-29/+0
Clean up: Remove xdr_decode_fhstatus() and xdr_decode_fhstatus3(), now that they are unused. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: Update MNT and MNT3 reply decoding functionsChuck Lever4-14/+55
Solder xdr_stream-based XDR decoding functions into the in-kernel mountd client that are more careful about checking data types and watching for buffer overflows. The new MNT3 decoder includes support for auth-flavor list decoding. The "_sz" macro for MNT3 replies was missing the size of the file handle. I've added this back, and included the size of the auth flavor array. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: add XDR decoder for mountd version 3 auth-flavor listsChuck Lever2-0/+43
Introduce an xdr_stream-based XDR decoder that can unpack the auth- flavor list returned in a MNT3 reply. The nfs_mount() function's caller allocates an array, and passes the size and a pointer to it. The decoder decodes all the flavors it can into the array, and returns the number of decoded flavors. If the caller is not interested in the auth flavors, it can pass a value of zero as the size of the pre-allocated array. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: add new file handle decoders to in-kernel mountd clientChuck Lever1-0/+39
Introduce xdr_stream-based XDR file handle decoders to the in-kernel mountd client. These are more careful than the existing decoder functions about buffer overflows and data type and range checking. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: Add separate mountd status code decoders for each mountd versionChuck Lever1-0/+116
Introduce data structures and xdr_stream-based decoding functions for unmarshalling mountd status codes properly. Mountd version 3 uses specific standard error return codes that are not errno values and not NFS3ERR_ values. These have a well-defined standard mapping to local errno values. Introduce data structures and a decoder function that map these status codes to local errno values properly. This is new functionality (but not used yet). Version 1 mountd status values are defined by RFC 1094 as UNIX error values (errno values). Errno values on heterogeneous systems do not necessarily match each other. To avoid exposing possibly incorrect errno values to upper layers, the current XDR decoder converts all non-zero MNT version 1 status codes to -EACCES. The OpenGroup XNFS standard provides a mapping similar to but smaller than the version 3 error codes. Implement a decoder that uses the XNFS error codes, replacing the current decoder. For both mountd protocol versions, map unrecognized errors to -EACCES. Finally we introduce a replacement data structure for mnt_fhstatus at this time, which is used by the new XDR decoders. In addition to documenting that the status value returned by the XDR decoders is always an errno, this new structure will be expanded in subsequent patches. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: remove unused function in fs/nfs/mount_clnt.cChuck Lever1-8/+0
Clean up: remove xdr_encode_dirpath() now that it has been replaced. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argumentChuck Lever1-5/+44
Check the length of the supplied dirpath, and see that it fits properly in the RPC buffer. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17NFS: Clean up MNT program definitionsChuck Lever2-4/+31
Clean up: Relocate MNT program procedure number definitions to the only file that uses them. Relocate the version number definitions, which are shared, to nfs.h. Remove duplicate program number definitions. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17lockd: Don't bother with RPC ping for NSM upcallsChuck Lever1-0/+1
Cut NSM upcall RPC traffic in half -- don't do a NULL call first. The cases where a ping would be helpful are rare. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2009-06-17lockd: Update NSM state from SM_MON repliesChuck Lever2-7/+13
When rpc.statd starts up in user space at boot time, it attempts to write the latest NSM local state number into /proc/sys/fs/nfs/nsm_local_state. If lockd.ko isn't loaded yet (as is the case in most configurations), that file doesn't exist, thus the kernel's NSM state remains set to its initial value of zero during lockd operation. This is a problem because rpc.statd and lockd use the NSM state number to prevent repeated lock recovery on rebooted hosts. If lockd sends a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state number is received, there is no way for lockd or rpc.statd to distinguish that stale SM_NOTIFY from an actual reboot. Thus lock recovery could be performed after the rebooted host has already started reclaiming locks, and those locks will be lost. We could change /etc/init.d/nfslock so it always modprobes lockd.ko before starting rpc.statd. However, if lockd.ko is ever unloaded and reloaded, we are back at square one, since the NSM state is not preserved across an unload/reload cycle. This may happen frequently on clients that use automounter. A period of NFS inactivity causes lockd.ko to be unloaded, and the kernel loses its NSM state setting. Instead, let's use the fact that rpc.statd plants the local system's NSM state in every SM_MON (and SM_UNMON) reply. lockd performs a synchronous SM_MON upcall to the local rpc.statd _before_ sending its first NLM request to a new remote. This would permit rpc.statd to provide the current NSM state to lockd, even after lockd.ko had been unloaded and reloaded. Note that NLMPROC_LOCK arguments are constructed before the nsm_monitor() call, so we have to rearrange argument construction very slightly to make this all work out. And, the kernel appears to treat NSM state as a u32 (see struct nlm_args and nsm_res). Make nsm_local_state a u32 as well, to ensure we don't get bogus comparison results. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>