aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2012-09-21fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldocLiu Bo1-4/+0
Argument @only_this_sb has been removed. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2012-09-21btrfs: fix the commment for the action flags in delayed-ref.hWang Sheng-Hui1-1/+1
The action field has been merged into struct btrfs_delayed_ref_node, and no struct btrfs_delayed_ref is available now. Signed-off-by: Wang Sheng-Hui <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2012-09-20pstore: Avoid recursive spinlocks in the oops_in_progress caseChuansheng Liu1-1/+7
Like 8250 driver, when pstore is registered as a console, to avoid recursive spinlocks when panic happening, change the spin_lock_irqsave to spin_trylock_irqsave when oops_in_progress is true. Signed-off-by: liu chuansheng <[email protected]> Signed-off-by: Anton Vorontsov <[email protected]>
2012-09-20ext4: remove erroneous ext4_superblock_csum_set() in update_backups()Tao Ma1-2/+0
The update_backups() function is used to backup all the metadata blocks, so we should not take it for granted that 'data' is pointed to a super block and use ext4_superblock_csum_set to calculate the checksum there. In case where the data is a group descriptor block, it will corrupt the last group descriptor, and then e2fsck will complain about it it. As all the metadata checksums should already be OK when we do the backup, remove the wrong ext4_superblock_csum_set and it should be just fine. Reported-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Tao Ma <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: [email protected]
2012-09-20userns: Convert fat to use kuid/kgid where appropriateEric W. Biederman3-11/+17
Acked-by: OGAWA Hirofumi <[email protected]> Acked-by: Serge Hallyn <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2012-09-19ext4: fix potential deadlock in ext4_nonda_switch()Theodore Ts'o2-7/+11
In ext4_nonda_switch(), if the file system is getting full we used to call writeback_inodes_sb_if_idle(). The problem is that we can be holding i_mutex already, and this causes a potential deadlock when writeback_inodes_sb_if_idle() when it tries to take s_umount. (See lockdep output below). As it turns out we don't need need to hold s_umount; the fact that we are in the middle of the write(2) system call will keep the superblock pinned. Unfortunately writeback_inodes_sb() checks to make sure s_umount is taken, and the VFS uses a different mechanism for making sure the file system doesn't get unmounted out from under us. The simplest way of dealing with this is to just simply grab s_umount using a trylock, and skip kicking the writeback flusher thread in the very unlikely case that we can't take a read lock on s_umount without blocking. Also, we now check the cirteria for kicking the writeback thread before we decide to whether to fall back to non-delayed writeback, so if there are any outstanding delayed allocation writes, we try to get them resolved as soon as possible. [ INFO: possible circular locking dependency detected ] 3.6.0-rc1-00042-gce894ca #367 Not tainted ------------------------------------------------------- dd/8298 is trying to acquire lock: (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 but task is already holding lock: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 which lock already depends on the new lock. 2 locks held by dd/8298: #0: (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3 #1: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 stack backtrace: Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367 Call Trace: [<c015b79c>] ? console_unlock+0x345/0x372 [<c06d62a1>] print_circular_bug+0x190/0x19d [<c019906c>] __lock_acquire+0x86d/0xb6c [<c01999db>] ? mark_held_locks+0x5c/0x7b [<c0199724>] lock_acquire+0x66/0xb9 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c06db935>] down_read+0x28/0x58 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4 [<c0271ece>] ext4_da_write_begin+0x27/0x193 [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205 [<c01ddce7>] generic_file_aio_write+0x78/0xd3 [<c026d336>] ext4_file_write+0x480/0x4a6 [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c [<c0180944>] ? sched_clock_cpu+0x11a/0x13e [<c01967e9>] ? trace_hardirqs_off+0xb/0xd [<c018099f>] ? local_clock+0x37/0x4e [<c0209f2c>] do_sync_write+0x67/0x9d [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44 [<c020a7b9>] vfs_write+0x7b/0xe6 [<c020a9a6>] sys_write+0x3b/0x64 [<c06dd4bd>] syscall_call+0x7/0xb Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: [email protected]
2012-09-19ext4: speed up truncate/unlink by not using bforget() unless neededAndrey Sidorov1-2/+5
Do not iterate over data blocks scanning for bh's to forget as they're never exist. This improves time taken by unlink / truncate syscall. Tested by continuously truncating file that is being written by dd. Another test is rm -rf of linux tree while tar unpacks it. With ordered data mode condition unlikely(!tbh) was always met in ext4_free_blocks. With journal data mode tbh was found only few times, so optimisation is also possible. Unlinking fallocated 60G file after doing sync && echo 3 > /proc/sys/vm/drop_caches && time rm --help X86 before (linux 3.6-rc4): # time rm -f test1 real 0m2.710s user 0m0.000s sys 0m1.530s X86 after: # time rm -f test1 real 0m0.644s user 0m0.003s sys 0m0.060s MIPS before (linux 2.6.37): # time rm -f test1 real 0m 4.93s user 0m 0.00s sys 0m 4.61s MIPS after: # time rm -f test1 real 0m 0.16s user 0m 0.00s sys 0m 0.06s Signed-off-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrey Sidorov <[email protected]>
2012-09-19ext4: fix online resizing when the # of block groups is constantTheodore Ts'o1-22/+14
Commit 1c6bd7173d66b3 introduced a regression where an online resize operation which did not change the number of block groups would fail, i.e: mke2fs -t /dev/vdc 60000 mount /dev/vdc resize2fs /dev/vdc 60001 This was due to a bug in the logic regarding when to try converting the filesystem to use meta_bg. Also fix up a number of other minor issues with the online resizing code: (a) Fix a sparse warning; (b) only check to make sure the device is large enough once, instead of multiple times through the resize loop. Signed-off-by: "Theodore Ts'o" <[email protected]>
2012-09-18xfs: stop the sync worker before xfs_unmountfsBen Myers1-0/+1
Cancel work of the xfs_sync_worker before teardown of the log in xfs_unmountfs. This prevents occasional crashes on unmount like so: PID: 21602 TASK: ee9df060 CPU: 0 COMMAND: "kworker/0:3" #0 [c5377d28] crash_kexec at c0292c94 #1 [c5377d80] oops_end at c07090c2 #2 [c5377d98] no_context at c06f614e #3 [c5377dbc] __bad_area_nosemaphore at c06f6281 #4 [c5377df4] bad_area_nosemaphore at c06f629b #5 [c5377e00] do_page_fault at c070b0cb #6 [c5377e7c] error_code (via page_fault) at c070892c EAX: f300c6a8 EBX: f300c6a8 ECX: 000000c0 EDX: 000000c0 EBP: c5377ed0 DS: 007b ESI: 00000000 ES: 007b EDI: 00000001 GS: ffffad20 CS: 0060 EIP: c0481ad0 ERR: ffffffff EFLAGS: 00010246 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs] #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs] #10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs] #11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs] #12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs] #13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs] #14 [c5377f58] process_one_work at c024ee4c #15 [c5377f98] worker_thread at c024f43d #16 [c5377fbc] kthread at c025326b #17 [c5377fe8] kernel_thread_helper at c070e834 PID: 26653 TASK: e79143b0 CPU: 3 COMMAND: "umount" #0 [cde0fda0] __schedule at c0706595 #1 [cde0fe28] schedule at c0706b89 #2 [cde0fe30] schedule_timeout at c0705600 #3 [cde0fe94] __down_common at c0706098 #4 [cde0fec8] __down at c0706122 #5 [cde0fed0] down at c025936f #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs] #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs] #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs] #9 [cde0ff1c] generic_shutdown_super at c0333d7a #10 [cde0ff38] kill_block_super at c0333e0f #11 [cde0ff48] deactivate_locked_super at c0334218 #12 [cde0ff58] deactivate_super at c033495d #13 [cde0ff68] mntput_no_expire at c034bc13 #14 [cde0ff7c] sys_umount at c034cc69 #15 [cde0ffa0] sys_oldumount at c034ccd4 #16 [cde0ffb0] system_call at c0707e66 commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up at a later date. Signed-off-by: Ben Myers <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Mark Tinguely <[email protected]>
2012-09-18cifs: fix return value in cifsConvertToUTF16Jeff Layton1-1/+1
This function returns the wrong value, which causes the callers to get the length of the resulting pathname wrong when it contains non-ASCII characters. This seems to fix https://bugzilla.samba.org/show_bug.cgi?id=6767 Cc: <[email protected]> Reported-by: Baldvin Kovacs <[email protected]> Reported-and-Tested-by: Nicolas Lefebvre <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
2012-09-18vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()Miklos Szeredi1-2/+2
IBM reported a soft lockup after applying the fix for the rename_lock deadlock. Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was found to be the culprit. The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was killed. This flag can be set on non-killed dentries too, which results in infinite retries when trying to traverse the dentry tree. This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set in d_kill() and makes try_to_ascend() test only this flag. IBM reported successful test results with this patch. Signed-off-by: Miklos Szeredi <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2012-09-18ext4: make orphan functions be no-op in no-journal modeAnatol Pomozov1-4/+3
Instead of checking whether the handle is valid, we check if journal is enabled. This avoids taking the s_orphan_lock mutex in all cases when there is no journal in use, including the error paths where ext4_orphan_del() is called with a handle set to NULL. Signed-off-by: Anatol Pomozov <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2012-09-18ext4: re-enable -o discard functionality in no-journal modeTheodore Ts'o1-0/+2
This is a revert of commit b56ff9d397ce, which removed the call to ext4_issue_discard() to fix a BUG reported because ext4_issue_discard() was being called from inside a block group spinlock. As it turns out this bug had already been fixed by Lukas Czerner in commit 53fdcf992d61 by the simple expedient of moving when we call ext4_issue_discard() outside the spinlock. So it should be safe to re-enable this functionality, which I tested by putting an BUG_ON(in_atomic) just after the restored callsite to ext4_issue_discard(). Addresses-Google-Bug: #6750518 Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: Anatol Pomozov <[email protected]>
2012-09-18jfs: Fix do_div precision in commit b40c2e66Dave Kleikamp1-3/+4
In a hasty fix to replace a 64-bit division with do_div, I unintentionally assigned the divisor to a 32-bit variable. Signed-off-by: Dave Kleikamp <[email protected]> Cc: Tino Reichardt <[email protected]>
2012-09-18userns: Convert quotaEric W. Biederman1-15/+14
Now that the type changes are done, here is the final set of changes to make the quota code work when user namespaces are enabled. Small cleanups and fixes to make the code build when user namespaces are enabled. Cc: Jan Kara <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2012-09-18userns: Convert struct dquot_warnEric W. Biederman1-9/+7
Convert w_dq_id to be a struct kquid and remove the now unncessary w_dq_type. This is a simple conversion and enough other places have already been converted that this actually reduces the code complexity by a little bit, when removing now unnecessary type conversions. Cc: Jan Kara <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2012-09-18userns: Convert struct dquot dq_id to be a struct kqidEric W. Biederman8-80/+101
Change struct dquot dq_id to a struct kqid and remove the now unecessary dq_type. Make minimal changes to dquot, quota_tree, quota_v1, quota_v2, ext3, ext4, and ocfs2 to deal with the change in quota structures and signatures. The ocfs2 changes are larger than most because of the extensive tracing throughout the ocfs2 quota code that prints out dq_id. quota_tree.c:get_index is modified to take a struct kqid instead of a qid_t because all of it's callers pass in dquot->dq_id and it allows me to introduce only a single conversion. The rest of the changes are either just replacing dq_type with dq_id.type, adding conversions to deal with the change in type and occassionally adding qid_eq to allow quota id comparisons in a user namespace safe way. Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Jan Kara <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Theodore Tso <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2012-09-18userns: Modify dqget to take struct kqidEric W. Biederman3-14/+16
Modify dqget to take struct kqid instead of a type and an identifier pair. Modify the callers of dqget in ocfs2 and dquot to take generate a struct kqid so they can continue to call dqget. The conversion to create struct kqid should all be the final conversions that are needed in those code paths. Cc: Jan Kara <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2012-09-18userns: Convert quota netlink aka quota_send_warningEric W. Biederman4-12/+20
Modify quota_send_warning to take struct kqid instead a type and identifier pair. When sending netlink broadcasts always convert uids and quota identifiers into the intial user namespace. There is as yet no way to send a netlink broadcast message with different contents to receivers in different namespaces, so for the time being just map all of the identifiers into the initial user namespace which preserves the current behavior. Change the callers of quota_send_warning in gfs2, xfs and dquot to generate a struct kqid to pass to quota send warning. When all of the user namespaces convesions are complete a struct kqid values will be availbe without need for conversion, but a conversion is needed now to avoid needing to convert everything at once. Cc: Ben Myers <[email protected]> Cc: Alex Elder <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jan Kara <[email protected]> Cc: Steven Whitehouse <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2012-09-18userns: Convert qutoactlEric W. Biederman4-25/+43
Update the quotactl user space interface to successfull compile with user namespaces support enabled and to hand off quota identifiers to lower layers of the kernel in struct kqid instead of type and qid pairs. The quota on function is not converted because while it takes a quota type and an id. The id is the on disk quota format to use, which is something completely different. The signature of two struct quotactl_ops methods were changed to take struct kqid argumetns get_dqblk and set_dqblk. The dquot, xfs, and ocfs2 implementations of get_dqblk and set_dqblk are minimally changed so that the code continues to work with the change in parameter type. This is the first in a series of changes to always store quota identifiers in the kernel in struct kqid and only use raw type and qid values when interacting with on disk structures or userspace. Always using struct kqid internally makes it hard to miss places that need conversion to or from the kernel internal values. Cc: Jan Kara <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Ben Myers <[email protected]> Cc: Alex Elder <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2012-09-18userns: Implement struct kqidEric W. Biederman2-1/+133
Add the data type struct kqid which holds the kernel internal form of the owning identifier of a quota. struct kqid is a replacement for the implicit union of uid, gid and project id stored in an unsigned int and the quota type field that is was used in the quota data structures. Making the data type explicit allows the kuid_t and kgid_t type safety to propogate more thoroughly through the code, revealing more places where uid/gid conversions need be made. Along with the data type struct kqid comes the helper functions qid_eq, qid_lt, from_kqid, from_kqid_munged, qid_valid, make_kqid, make_kqid_invalid, make_kqid_uid, make_kqid_gid. Cc: Jan Kara <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2012-09-18userns: Add kprojid_t and associated infrastructure in projid.hEric W. Biederman1-0/+15
Implement kprojid_t a cousin of the kuid_t and kgid_t. The per user namespace mapping of project id values can be set with /proc/<pid>/projid_map. A full compliment of helpers is provided: make_kprojid, from_kprojid, from_kprojid_munged, kporjid_has_mapping, projid_valid, projid_eq, projid_eq, projid_lt. Project identifiers are part of the generic disk quota interface, although it appears only xfs implements project identifiers currently. The xfs code allows anyone who has permission to set the project identifier on a file to use any project identifier so when setting up the user namespace project identifier mappings I do not require a capability. Cc: Dave Chinner <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2012-09-18userns: Convert configfs to use kuid and kgid where appropriateEric W. Biederman1-2/+2
Cc: Joel Becker <[email protected]> Acked-by: Serge Hallyn <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2012-09-18userns: Convert extN to support kuids and kgids in posix aclsEric W. Biederman3-23/+60
Convert ext2, ext3, and ext4 to fully support the posix acl changes, using e_uid e_gid instead e_id. Enabled building with posix acls enabled, all filesystems supporting user namespaces, now also support posix acls when user namespaces are enabled. Cc: Theodore Tso <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Jan Kara <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2012-09-18userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattrEric W. Biederman16-47/+49
- Pass the user namespace the uid and gid values in the xattr are stored in into posix_acl_from_xattr. - Pass the user namespace kuid and kgid values should be converted into when storing uid and gid values in an xattr in posix_acl_to_xattr. - Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to pass in &init_user_ns. In the short term this change is not strictly needed but it makes the code clearer. In the longer term this change is necessary to be able to mount filesystems outside of the initial user namespace that natively store posix acls in the linux xattr format. Cc: Theodore Tso <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Jan Kara <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2012-09-18userns: Convert vfs posix_acl support to use kuids and kgidsEric W. Biederman3-20/+107
- In setxattr if we are setting a posix acl convert uids and gids from the current user namespace into the initial user namespace, before the xattrs are passed to the underlying filesystem. Untranslatable uids and gids are represented as -1 which posix_acl_from_xattr will represent as INVALID_UID or INVALID_GID. posix_acl_valid will fail if an acl from userspace has any INVALID_UID or INVALID_GID values. In net this guarantees that untranslatable posix acls will not be stored by filesystems. - In getxattr if we are reading a posix acl convert uids and gids from the initial user namespace into the current user namespace. Uids and gids that can not be tranlsated into the current user namespace will be represented as -1. - Replace e_id in struct posix_acl_entry with an anymouns union of e_uid and e_gid. For the short term retain the e_id field until all of the users are converted. - Don't set struct posix_acl.e_id in the cases where the acl type does not use e_id. Greatly reducing the use of ACL_UNDEFINED_ID. - Rework the ordering checks in posix_acl_valid so that I use kuid_t and kgid_t types throughout the code, and so that I don't need arithmetic on uid and gid types. Cc: Theodore Tso <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Jan Kara <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2012-09-17ext4: fix possible non-initialized variable in htree_dirblock_to_tree()Carlos Maiolino2-2/+4
htree_dirblock_to_tree() declares a non-initialized 'err' variable, which is passed as a reference to another functions expecting them to set this variable with their error codes. It's passed to ext4_bread(), which then passes it to ext4_getblk(). If ext4_map_blocks() returns 0 due to a lookup failure, leaving the ext4_getblk() buffer_head uninitialized, it will make ext4_getblk() return to ext4_bread() without initialize the 'err' variable, and ext4_bread() will return to htree_dirblock_to_tree() with this variable still uninitialized. htree_dirblock_to_tree() will pass this variable with garbage back to ext4_htree_fill_tree(), which expects a number of directory entries added to the rb-tree. which, in case, might return a fake non-zero value due the garbage left in the 'err' variable, leading the kernel to an Oops in ext4_dx_readdir(), once this is expecting a filled rb-tree node, when in turn it will have a NULL-ed one, causing an invalid page request when trying to get a fname struct from this NULL-ed rb-tree node in this line: fname = rb_entry(info->curr_node, struct fname, rb_hash); The patch itself initializes the err variable in htree_dirblock_to_tree() to avoid usage mistakes by the called functions, and also fix ext4_getblk() to return a initialized 'err' variable when ext4_map_blocks() fails a lookup. Signed-off-by: Carlos Maiolino <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2012-09-17ext4: do not enable delalloc by default for ext2Theodore Ts'o1-1/+1
Signed-off-by: Brian Foster <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>
2012-09-17userns: Convert the audit loginuid to be a kuidEric W. Biederman1-2/+10
Always store audit loginuids in type kuid_t. Print loginuids by converting them into uids in the appropriate user namespace, and then printing the resulting uid. Modify audit_get_loginuid to return a kuid_t. Modify audit_set_loginuid to take a kuid_t. Modify /proc/<pid>/loginuid on read to convert the loginuid into the user namespace of the opener of the file. Modify /proc/<pid>/loginud on write to convert the loginuid rom the user namespace of the opener of the file. Cc: Al Viro <[email protected]> Cc: Eric Paris <[email protected]> Cc: Paul Moore <[email protected]> ? Cc: David Miller <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2012-09-17fs/proc: fix potential unregister_sysctl_table hangFrancesco Ruggeri1-3/+2
The unregister_sysctl_table() function hangs if all references to its ctl_table_header structure are not dropped. This can happen sometimes because of a leak in proc_sys_lookup(): proc_sys_lookup() gets a reference to the table via lookup_entry(), but it does not release it when a subsequent call to sysctl_follow_link() fails. This patch fixes this leak by making sure the reference is always dropped on return. See also commit 076c3eed2c31 ("sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry") which reorganized this code in 3.4. Tested in Linux 3.4.4. Signed-off-by: Francesco Ruggeri <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2012-09-17xfs: stop the sync worker before xfs_unmountfsBen Myers1-0/+1
Cancel work of the xfs_sync_worker before teardown of the log in xfs_unmountfs. This prevents occasional crashes on unmount like so: PID: 21602 TASK: ee9df060 CPU: 0 COMMAND: "kworker/0:3" #0 [c5377d28] crash_kexec at c0292c94 #1 [c5377d80] oops_end at c07090c2 #2 [c5377d98] no_context at c06f614e #3 [c5377dbc] __bad_area_nosemaphore at c06f6281 #4 [c5377df4] bad_area_nosemaphore at c06f629b #5 [c5377e00] do_page_fault at c070b0cb #6 [c5377e7c] error_code (via page_fault) at c070892c EAX: f300c6a8 EBX: f300c6a8 ECX: 000000c0 EDX: 000000c0 EBP: c5377ed0 DS: 007b ESI: 00000000 ES: 007b EDI: 00000001 GS: ffffad20 CS: 0060 EIP: c0481ad0 ERR: ffffffff EFLAGS: 00010246 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs] #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs] #10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs] #11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs] #12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs] #13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs] #14 [c5377f58] process_one_work at c024ee4c #15 [c5377f98] worker_thread at c024f43d #16 [c5377fbc] kthread at c025326b #17 [c5377fe8] kernel_thread_helper at c070e834 PID: 26653 TASK: e79143b0 CPU: 3 COMMAND: "umount" #0 [cde0fda0] __schedule at c0706595 #1 [cde0fe28] schedule at c0706b89 #2 [cde0fe30] schedule_timeout at c0705600 #3 [cde0fe94] __down_common at c0706098 #4 [cde0fec8] __down at c0706122 #5 [cde0fed0] down at c025936f #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs] #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs] #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs] #9 [cde0ff1c] generic_shutdown_super at c0333d7a #10 [cde0ff38] kill_block_super at c0333e0f #11 [cde0ff48] deactivate_locked_super at c0334218 #12 [cde0ff58] deactivate_super at c033495d #13 [cde0ff68] mntput_no_expire at c034bc13 #14 [cde0ff7c] sys_umount at c034cc69 #15 [cde0ffa0] sys_oldumount at c034ccd4 #16 [cde0ffb0] system_call at c0707e66 commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up at a later date. Signed-off-by: Ben Myers <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Mark Tinguely <[email protected]>
2012-09-17JFS: use list_move instead of list_del/list_addWei Yongjun1-6/+3
Using list_move() instead of list_del() + list_add(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Dave Kleikamp <[email protected]>
2012-09-17fs/jfs: TRIM support for JFS FilesystemTino Reichardt9-21/+369
This patch adds support for the two linux interfaces of the discard/TRIM command for SSD devices and sparse/thinly-provisioned LUNs. JFS will support batched discard via FITRIM ioctl and online discard with the discard mount option. Signed-off-by: Tino Reichardt <[email protected]> Signed-off-by: Dave Kleikamp <[email protected]>
2012-09-16Merge 3.6-rc7 into driver-core-nextGreg Kroah-Hartman74-599/+743
This pulls in the fixes in that branch that are needed here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-09-16Merge branch 'for-linus' of ↵Linus Torvalds1-6/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull a btrfs revert from Chris Mason: "My for-linus branch has one revert in the new quota code. We're building up more fixes at etc for the next merge window, but I'm keeping them out unless they are bigger regressions or have a huge impact." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Revert "Btrfs: fix some error codes in btrfs_qgroup_inherit()"
2012-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller74-601/+749
Conflicts: net/netfilter/nfnetlink_log.c net/netfilter/xt_LOG.c Rather easy conflict resolution, the 'net' tree had bug fixes to make sure we checked if a socket is a time-wait one or not and elide the logging code if so. Whereas on the 'net-next' side we are calculating the UID and GID from the creds using different interfaces due to the user namespace changes from Eric Biederman. Signed-off-by: David S. Miller <[email protected]>
2012-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixesLinus Torvalds3-44/+61
Pull GFS2 fixes from Steven Whitehouse: "Here are three GFS2 fixes for the current kernel tree. These are all related to the block reservation code which was added at the merge window. That code will be getting an update at the forthcoming merge window too. In the mean time though there are a few smaller issues which should be fixed. The first patch resolves an issue with write sizes of greater than 32 bits with the size hinting code. The second ensures that the allocation data structure is initialised when using xattrs and the third takes into account allocations which may have been made by other nodes which affect a reservation on the local node." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: Take account of blockages when using reserved blocks GFS2: Fix missing allocation data for set/remove xattr GFS2: Make write size hinting code common
2012-09-14Merge tag 'ecryptfs-3.6-rc6-fixes' of ↵Linus Torvalds3-2/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: - Fixes a regression, introduced in 3.6-rc1, when a file is closed before its shared memory mapping is dirtied and unmapped. The lower file was being released when the eCryptfs file was closed and the dirtied pages could not be written out. - Adds a call to the lower filesystem's ->flush() from ecryptfs_flush(). - Fixes a regression, introduced in 2.6.39, when a file is renamed on top of another file. The target file's inode was not being evicted and the space taken by the file was not reclaimed until eCryptfs was unmounted. * tag 'ecryptfs-3.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: Copy up attributes of the lower target inode after rename eCryptfs: Call lower ->flush() from ecryptfs_flush() eCryptfs: Write out all dirty pages just before releasing the lower file
2012-09-14Revert "Btrfs: fix some error codes in btrfs_qgroup_inherit()"Chris Mason1-6/+2
This reverts commit 5986802c2fcc754040bb7ed95f30bb16c4a843b7. Both paths are not error paths but regular cases where non-qgroup subvols are involved. Signed-off-by: Chris Mason <[email protected]>
2012-09-14vfs: make O_PATH file descriptors usable for 'fstat()'Linus Torvalds1-1/+1
We already use them for openat() and friends, but fstat() also wants to be able to use O_PATH file descriptors. This should make it more directly comparable to the O_SEARCH of Solaris. Note that you could already do the same thing with "fstatat()" and an empty path, but just doing "fstat()" directly is simpler and faster, so there is no reason not to just allow it directly. See also commit 332a2e1244bd, which did the same thing for fchdir, for the same reasons. Reported-by: ольга крыжановская <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] # O_PATH introduced in 3.0+ Signed-off-by: Linus Torvalds <[email protected]>
2012-09-14eCryptfs: Copy up attributes of the lower target inode after renameTyler Hicks1-0/+5
After calling into the lower filesystem to do a rename, the lower target inode's attributes were not copied up to the eCryptfs target inode. This resulted in the eCryptfs target inode staying around, rather than being evicted, because i_nlink was not updated for the eCryptfs inode. This also meant that eCryptfs didn't do the final iput() on the lower target inode so it stayed around, as well. This would result in a failure to free up space occupied by the target file in the rename() operation. Both target inodes would eventually be evicted when the eCryptfs filesystem was unmounted. This patch calls fsstack_copy_attr_all() after the lower filesystem does its ->rename() so that important inode attributes, such as i_nlink, are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called and eCryptfs can drop its final reference on the lower inode. http://launchpad.net/bugs/561129 Signed-off-by: Tyler Hicks <[email protected]> Tested-by: Colin Ian King <[email protected]> Cc: <[email protected]> [2.6.39+]
2012-09-14eCryptfs: Call lower ->flush() from ecryptfs_flush()Tyler Hicks1-2/+8
Since eCryptfs only calls fput() on the lower file in ecryptfs_release(), eCryptfs should call the lower filesystem's ->flush() from ecryptfs_flush(). If the lower filesystem implements ->flush(), then eCryptfs should try to flush out any dirty pages prior to calling the lower ->flush(). If the lower filesystem does not implement ->flush(), then eCryptfs has no need to do anything in ecryptfs_flush() since dirty pages are now written out to the lower filesystem in ecryptfs_release(). Signed-off-by: Tyler Hicks <[email protected]>
2012-09-14fs: Build sys_stat64() and friends if __ARCH_WANT_COMPAT_STAT64Catalin Marinas1-2/+2
On AArch64 Linux, we want the sys_stat64() and related functions for compat support but do not need the generic struct stat64, enabled automatically if __ARCH_WANT_STAT64. Signed-off-by: Catalin Marinas <[email protected]> Acked-by: Arnd Bergmann <[email protected]>
2012-09-14eCryptfs: Write out all dirty pages just before releasing the lower fileTyler Hicks1-0/+1
Fixes a regression caused by: 821f749 eCryptfs: Revert to a writethrough cache model That patch reverted some code (specifically, 32001d6f) that was necessary to properly handle open() -> mmap() -> close() -> dirty pages -> munmap(), because the lower file could be closed before the dirty pages are written out. Rather than reapplying 32001d6f, this approach is a better way of ensuring that the lower file is still open in order to handle writing out the dirty pages. It is called from ecryptfs_release(), while we have a lock on the lower file pointer, just before the lower file gets the final fput() and we overwrite the pointer. https://launchpad.net/bugs/1047261 Signed-off-by: Tyler Hicks <[email protected]> Reported-by: Artemy Tregubenko <[email protected]> Tested-by: Artemy Tregubenko <[email protected]> Tested-by: Colin Ian King <[email protected]>
2012-09-13xattr: mark variable as uninitialized to make both gcc and smatch happyAristeu Rozanski1-1/+1
new_xattr in __simple_xattr_set() is only initialized with a valid pointer if value is not NULL, which only happens if this function is called directly with the intention to remove an existing extended attribute. Even being safe to be this way, smatch warns about possible NULL dereference. Dan Carpenter suggested using uninitialized_var() which will make both gcc and smatch happy. Cc: Fengguang Wu <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Aristeu Rozanski <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2012-09-13fs: add missing documentation to simple_xattr functionsAristeu Rozanski1-2/+16
v2: add function documentation instead of adding a separate file under Documentation/ tj: Updated comment a bit and rolled in Randy's suggestions. Cc: Li Zefan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Lennart Poettering <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Aristeu Rozanski <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2012-09-13ext4: advertise the fact that the kernel supports meta_bg resizingTheodore Ts'o1-0/+2
Signed-off-by: "Theodore Ts'o" <[email protected]>
2012-09-13ext4: log a resize update to the console every 10 secondsTheodore Ts'o1-0/+8
For very long online resizes, a periodic update to the console log is helpful for debugging and for progress reporting. Signed-off-by: "Theodore Ts'o" <[email protected]>
2012-09-13ext4: convert file system to meta_bg if needed during resizingTheodore Ts'o1-17/+133
If we have run out of reserved gdt blocks, then clear the resize_inode feature and enable the meta_bg feature, so that we can continue resizing the file system seamlessly. Signed-off-by: "Theodore Ts'o" <[email protected]>
2012-09-13GFS2: Take account of blockages when using reserved blocksSteven Whitehouse1-38/+28
The claim_reserved_blks() function was not taking account of the possibility of "blockages" while performing allocation. This can be caused by another node allocating something in the same extent which has been reserved locally. This patch tests for this condition and then skips the remainder of the reservation in this case. This is a relatively rare event, so that it should not affect the general performance improvement which the block reservations provide. The claim_reserved_blks() function also appears not to be able to deal with reservations which cross bitmap boundaries, but that can be dealt with in a future patch since we don't generate boundary crossing reservations currently. Signed-off-by: Steven Whitehouse <[email protected]> Reported-by: David Teigland <[email protected]> Cc: Bob Peterson <[email protected]>