aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2012-12-20sysv: drop vmtruncateMarco Stornelli2-7/+15
Removed vmtruncate Signed-off-by: Marco Stornelli <[email protected]> Signed-off-by: Al Viro <[email protected]>
2012-12-20ufs: drop vmtruncateMarco Stornelli1-5/+10
Removed vmtruncate Signed-off-by: Marco Stornelli <[email protected]> Signed-off-by: Al Viro <[email protected]>
2012-12-20fs: Fix imbalance in freeze protection in mark_files_ro()Jan Kara1-1/+1
File descriptors (even those for writing) do not hold freeze protection. Thus mark_files_ro() must call __mnt_drop_write() to only drop protection against remount read-only. Calling mnt_drop_write_file() as we do now results in: [ BUG: bad unlock balance detected! ] 3.7.0-rc6-00028-g88e75b6 #101 Not tainted ------------------------------------- kworker/1:2/79 is trying to release lock (sb_writers) at: [<ffffffff811b33b4>] mnt_drop_write+0x24/0x30 but there are no more locks to release! Reported-by: Zdenek Kabelac <[email protected]> CC: [email protected] Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Al Viro <[email protected]>
2012-12-20vfs: remove DCACHE_NEED_LOOKUPJeff Layton3-57/+3
The code that relied on that flag was ripped out of btrfs quite some time ago, and never added back. Josef indicated that he was going to take a different approach to the problem in btrfs, and that we could just eliminate this flag. Cc: Josef Bacik <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2012-12-20path_init(): make -ENOTDIR failure exits consistentAl Viro1-2/+2
Signed-off-by: Al Viro <[email protected]>
2012-12-20vfs: remove unneeded permission check from path_initJeff Layton1-6/+1
When path_init is called with a valid dfd, that code checks permissions on the open directory fd and returns an error if the check fails. This permission check is redundant, however. Both callers of path_init immediately call link_path_walk afterward. The first thing that link_path_walk does for pathnames that do not consist only of slashes is to check for exec permissions at the starting point of the path walk. And this check in path_init() is on the path taken only when *name != '/' && *name != '\0'. In most cases, these checks are very quick, but when the dfd is for a file on a NFS mount with the actimeo=0, each permission check goes out onto the wire. The result is 2 identical ACCESS calls. Given that these codepaths are fairly "hot", I think it makes sense to eliminate the permission check in path_init and simply assume that the caller will eventually check the permissions before proceeding. Reported-by: Dave Wysochanski <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Al Viro <[email protected]>
2012-12-20vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flagsMiao Xie1-1/+1
The compiler may optimize the while loop and make the check just be done once, so we should use ACCESS_ONCE() to guard access to ->mnt_flags Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Al Viro <[email protected]>
2012-12-19Merge tag 'for-linus-20121219' of git://git.infradead.org/linux-mtdLinus Torvalds1-2/+4
Pull MTD updates from David Woodhouse: - Various cleanups especially in NAND tests - Add support for NAND flash on BCMA bus - DT support for sh_flctl and denali NAND drivers - Kill obsolete/superceded drivers (fortunet, nomadik_nand) - Fix JFFS2 locking bug in ENOMEM failure path - New SPI flash chips, as usual - Support writing in 'reliable mode' for DiskOnChip G4 - Debugfs support in nandsim * tag 'for-linus-20121219' of git://git.infradead.org/linux-mtd: (96 commits) mtd: nand: typo in nand_id_has_period() comments mtd: nand/gpio: use io{read,write}*_rep accessors mtd: block2mtd: throttle writes by calling balance_dirty_pages_ratelimited. mtd: nand: gpmi: reset BCH earlier, too, to avoid NAND startup problems mtd: nand/docg4: fix and improve read of factory bbt mtd: nand/docg4: reserve bb marker area in ecclayout mtd: nand/docg4: add support for writing in reliable mode mtd: mxc_nand: reorder part_probes to let cmdline override other sources mtd: mxc_nand: fix unbalanced clk_disable() in error path mtd: nandsim: Introduce debugfs infrastructure mtd: physmap_of: error checking to prevent a NULL pointer dereference mtg: docg3: potential divide by zero in doc_write_oob() mtd: bcm47xxnflash: writing support mtd: tests/read: initialize buffer for whole next page mtd: at91: atmel_nand: return bit flips for the PMECC read_page() mtd: fix recovery after failed write-buffer operation in cfi_cmdset_0002.c mtd: nand: onfi need to be probed in 8 bits mode mtd: nand: add NAND_BUSWIDTH_AUTO to autodetect bus width mtd: nand: print flash size during detection mted: nand_wait_ready timeout fix ...
2012-12-18ceph: fix dentry reference leak in ceph_encode_fh()Cyril Roelandt1-1/+3
dput() was not called in the error path. Signed-off-by: Cyril Roelandt <[email protected]> Cc: Sage Weil <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-18Merge branch 'for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull (again) user namespace infrastructure changes from Eric Biederman: "Those bugs, those darn embarrasing bugs just want don't want to get fixed. Linus I just updated my mirror of your kernel.org tree and it appears you successfully pulled everything except the last 4 commits that fix those embarrasing bugs. When you get a chance can you please repull my branch" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Fix typo in description of the limitation of userns_install userns: Add a more complete capability subset test to commit_creds userns: Require CAP_SYS_ADMIN for most uses of setns. Fix cap_capable to only allow owners in the parent user namespace to have caps.
2012-12-18Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds1-7/+9
Pull exofs changes from Boaz Harrosh: "These are just 3 patches, the last two are bug fixes on the error paths in exofs. The important patch is the one to osd_uld which adds sysfs info to osd devices for use by user-mode clustering discovery software. I'm already sitting on this patch since before February this year, It is important for some of the big installation cluster systems, who's been compiling their own kernel just for that patch." Ugh. The osd_uld patch already went through the SCSI tree, so this was kind of pointless. But at least it has the two small error-path fixes.. * 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: don't leak io_state and pages on read error exofs: clean up the correct page collection on write error osduld: Add osdname & systemid sysfs at scsi_osd class
2012-12-18Merge branch 'for-linus' of ↵Linus Torvalds42-1745/+5255
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "A big set of fixes and features. In terms of line count, most of the code comes from Stefan, who added the ability to replace a single drive in place. This is different from how btrfs normally replaces drives, and is much much much faster. Josef is plowing through our synchronous write performance. This pull request does not include the DIO_OWN_WAITING patch that was discussed on the list, but it has a number of other improvements to cut down our latencies and CPU time during fsync/O_DIRECT writes. Miao Xie has a big series of fixes and is spreading out ordered operations over more CPUs. This improves performance and reduces contention. I've put in fixes for error handling around hash collisions. These are going back to individual stable kernels as I test against them. Otherwise we have a lot of fixes and cleanups, thanks everyone! raid5/6 is being rebased against the device replacement code. I'll have it posted this Friday along with a nice series of benchmarks." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (115 commits) Btrfs: fix a bug of per-file nocow Btrfs: fix hash overflow handling Btrfs: don't take inode delalloc mutex if we're a free space inode Btrfs: fix autodefrag and umount lockup Btrfs: fix permissions of empty files not affected by umask Btrfs: put raid properties into global table Btrfs: fix BUG() in scrub when first superblock reading gives EIO Btrfs: do not call file_update_time in aio_write Btrfs: only unlock and relock if we have to Btrfs: use tokens where we can in the tree log Btrfs: optimize leaf_space_used Btrfs: don't memset new tokens Btrfs: only clear dirty on the buffer if it is marked as dirty Btrfs: move checks in set_page_dirty under DEBUG Btrfs: log changed inodes based on the extent map tree Btrfs: add path->really_keep_locks Btrfs: do not mark ems as prealloc if we are writing to them Btrfs: keep track of the extents original block length Btrfs: inline csums if we're fsyncing Btrfs: don't bother copying if we're only logging the inode ...
2012-12-18Merge tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds36-973/+1115
Pull NFS client updates from Trond Myklebust: "Features include: - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code. Remove altogether where possible, and replace with WARN_ON_ONCE and appropriate error returns where not. - NFSv4.1 client adds session dynamic slot table management. There is matching server side code that has been submitted to Bruce for consideration. Together, this code allows the server to dynamically manage the amount of memory it allocates to the duplicate request cache for each client. It will constantly resize those caches to reserve more memory for clients that are hot while shrinking caches for those that are quiescent. In addition, there are assorted bugfixes for the generic NFS write code, fixes to deal with the drop_nlink() warnings, and yet another fix for NFSv4 getacl." * tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (106 commits) SUNRPC: continue run over clients list on PipeFS event instead of break NFS: Don't use SetPageError in the NFS writeback code SUNRPC: variable 'svsk' is unused in function bc_send_request SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket NFSv4.1: Deal effectively with interrupted RPC calls. NFSv4.1: Move the RPC timestamp out of the slot. NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED. NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0 NFS: Fix calls to drop_nlink() NFS: Ensure that we always drop inodes that have been marked as stale nfs: Remove unused list nfs4_clientid_list nfs: Remove duplicate function declaration in internal.h NFS: avoid NULL dereference in nfs_destroy_server SUNRPC handle EKEYEXPIRED in call_refreshresult SUNRPC set gss gc_expiry to full lifetime nfs: fix page dirtying in NFS DIO read codepath nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ NFSv4.1: Be conservative about the client highest slotid NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly nfs: don't extend writes to cover entire page if pagecache is invalid ...
2012-12-17Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds44-149/+523
Merge misc patches from Andrew Morton: "Incoming: - lots of misc stuff - backlight tree updates - lib/ updates - Oleg's percpu-rwsem changes - checkpatch - rtc - aoe - more checkpoint/restart support I still have a pile of MM stuff pending - Pekka should be merging later today after which that is good to go. A number of other things are twiddling thumbs awaiting maintainer merges." * emailed patches from Andrew Morton <[email protected]>: (180 commits) scatterlist: don't BUG when we can trivially return a proper error. docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output fs, fanotify: add @mflags field to fanotify output docs: add documentation about /proc/<pid>/fdinfo/<fd> output fs, notify: add procfs fdinfo helper fs, exportfs: add exportfs_encode_inode_fh() helper fs, exportfs: escape nil dereference if no s_export_op present fs, epoll: add procfs fdinfo helper fs, eventfd: add procfs fdinfo helper procfs: add ability to plug in auxiliary fdinfo providers tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test breakpoint selftests: print failure status instead of cause make error kcmp selftests: print fail status instead of cause make error kcmp selftests: make run_tests fix mem-hotplug selftests: print failure status instead of cause make error cpu-hotplug selftests: print failure status instead of cause make error mqueue selftests: print failure status instead of cause make error vm selftests: print failure status instead of cause make error ubifs: use prandom_bytes mtd: nandsim: use prandom_bytes ...
2012-12-17fs, fanotify: add @mflags field to fanotify outputCyrill Gorcunov1-5/+9
The kernel keeps FAN_MARK_IGNORED_SURV_MODIFY bit separately from fsnotify_mark::mask|ignored_mask thus put it in @mflags (mark flags) field so the user-space reader will be able to detect if such bit were used on mark creation procedure. | pos: 0 | flags: 04002 | fanotify flags:10 event-flags:0 | fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003 | fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4 Signed-off-by: Cyrill Gorcunov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Matthew Helsley <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fs, notify: add procfs fdinfo helperCyrill Gorcunov5-1/+207
This allow us to print out fsnotify details such as watchee inode, device, mask and optionally a file handle. For inotify objects if kernel compiled with exportfs support the output will be | pos: 0 | flags: 02000000 | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:11a1000020542153 | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:49b1060023552153 If kernel compiled without exportfs support, the file handle won't be provided but inode and device only. | pos: 0 | flags: 02000000 | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 For fanotify the output is like | pos: 0 | flags: 04002 | fanotify flags:10 event-flags:0 | fanotify mnt_id:12 mask:3b ignored_mask:0 | fanotify ino:50205 sdev:800013 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:05020500fb1d47e7 To minimize impact on general fsnotify code the new functionality is gathered in fs/notify/fdinfo.c file. Signed-off-by: Cyrill Gorcunov <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Matthew Helsley <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fs, exportfs: add exportfs_encode_inode_fh() helperCyrill Gorcunov1-5/+14
We will need this helper in the next patch to provide a file handle for inotify marks in /proc/pid/fdinfo output. The patch is rather providing the way to use inodes directly when dentry is not available (like in case of inotify system). Signed-off-by: Cyrill Gorcunov <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Matthew Helsley <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fs, exportfs: escape nil dereference if no s_export_op presentCyrill Gorcunov1-1/+1
This routine will be used to generate a file handle in fdinfo output for inotify subsystem, where if no s_export_op present the general export_encode_fh should be used. Thus add a test if s_export_op present inside exportfs_encode_fh itself. Signed-off-by: Cyrill Gorcunov <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Matthew Helsley <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fs, epoll: add procfs fdinfo helperCyrill Gorcunov3-1/+47
This allows us to print out eventpoll target file descriptor, events and data, the /proc/pid/fdinfo/fd consists of | pos: 0 | flags: 02 | tfd: 5 events: 1d data: ffffffffffffffff enabled: 1 [avagin@: fix for unitialized ret variable] Signed-off-by: Cyrill Gorcunov <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Matthew Helsley <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fs, eventfd: add procfs fdinfo helperCyrill Gorcunov1-0/+20
This allows us to print out raw counter value. The /proc/pid/fdinfo/fd output is | pos: 0 | flags: 04002 | eventfd-count: 5a Signed-off-by: Cyrill Gorcunov <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Matthew Helsley <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17procfs: add ability to plug in auxiliary fdinfo providersCyrill Gorcunov1-0/+2
This patch brings ability to print out auxiliary data associated with file in procfs interface /proc/pid/fdinfo/fd. In particular further patches make eventfd, evenpoll, signalfd and fsnotify to print additional information complete enough to restore these objects after checkpoint. To simplify the code we add show_fdinfo callback inside struct file_operations (as Al and Pavel are proposing). Signed-off-by: Cyrill Gorcunov <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Matthew Helsley <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17ubifs: use prandom_bytesAkinobu Mita1-5/+3
This also converts filling memory loop to use memset. Signed-off-by: Akinobu Mita <[email protected]> Cc: Artem Bityutskiy <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: David Laight <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Eilon Greenstein <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Robert Love <[email protected]> Cc: Valdis Kletnieks <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17exec: use -ELOOP for max recursion depthKees Cook4-15/+6
To avoid an explosion of request_module calls on a chain of abusive scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon as maximum recursion depth is hit, the error will fail all the way back up the chain, aborting immediately. This also has the side-effect of stopping the user's shell from attempting to reexecute the top-level file as a shell script. As seen in the dash source: if (cmd != path_bshell && errno == ENOEXEC) { *argv-- = cmd; *argv = cmd = path_bshell; goto repeat; } The above logic was designed for running scripts automatically that lacked the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC, things continue to behave as the shell expects. Additionally, when tracking recursion, the binfmt handlers should not be involved. The recursion being tracked is the depth of calls through search_binary_handler(), so that function should be exclusively responsible for tracking the depth. Signed-off-by: Kees Cook <[email protected]> Cc: halfdog <[email protected]> Cc: P J P <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17proc: pid/status: show all supplementary groupsArtem Bityutskiy1-1/+1
We display a list of supplementary group for each process in /proc/<pid>/status. However, we show only the first 32 groups, not all of them. Although this is rare, but sometimes processes do have more than 32 supplementary groups, and this kernel limitation breaks user-space apps that rely on the group list in /proc/<pid>/status. Number 32 comes from the internal NGROUPS_SMALL macro which defines the length for the internal kernel "small" groups buffer. There is no apparent reason to limit to this value. This patch removes the 32 groups printing limit. The Linux kernel limits the amount of supplementary groups by NGROUPS_MAX, which is currently set to 65536. And this is the maximum count of groups we may possibly print. Signed-off-by: Artem Bityutskiy <[email protected]> Acked-by: Serge E. Hallyn <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17/proc/pid/status: add "Seccomp" fieldKees Cook1-0/+8
It is currently impossible to examine the state of seccomp for a given process. While attaching with gdb and attempting "call prctl(PR_GET_SECCOMP,...)" will work with some situations, it is not reliable. If the process is in seccomp mode 1, this query will kill the process (prctl not allowed), if the process is in mode 2 with prctl not allowed, it will similarly be killed, and in weird cases, if prctl is filtered to return errno 0, it can look like seccomp is disabled. When reviewing the state of running processes, there should be a way to externally examine the seccomp mode. ("Did this build of Chrome end up using seccomp?" "Did my distro ship ssh with seccomp enabled?") This adds the "Seccomp" line to /proc/$pid/status. Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: James Morris <[email protected]> Acked-by: Serge E. Hallyn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17procfs: add VmFlags field in smaps outputCyrill Gorcunov1-0/+53
During c/r sessions we've found that there is no way at the moment to fetch some VMA associated flags, such as mlock() and madvise(). This leads us to a problem -- we don't know if we should call for mlock() and/or madvise() after restore on the vma area we're bringing back to life. This patch intorduces a new field into "smaps" output called VmFlags, where all set flags associated with the particular VMA is shown as two letter mnemonics. [ Strictly speaking for c/r we only need mlock/madvise bits but it has been said that providing just a few flags looks somehow inconsistent. So all flags are here now. ] This feature is made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as other applications may start to use these fields. The data is encoded in a somewhat awkward two letters mnemonic form, to encourage userspace to be prepared for fields being added or removed in the future. [[email protected]: props to use for_each_set_bit] [[email protected]: props to use array instead of struct] [[email protected]: overall redesign and simplification] [[email protected]: remove unneeded braces per sfr, avoid using bloaty for_each_set_bit()] Signed-off-by: Cyrill Gorcunov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17proc: don't show nonexistent capabilitiesAndrew Vagin1-0/+9
Without this patch it is really hard to interpret a bounding set, if CAP_LAST_CAP is unknown for a current kernel. Non-existant capabilities can not be deleted from a bounding set with help of prctl. E.g.: Here are two examples without/with this patch. CapBnd: ffffffe0fdecffff CapBnd: 00000000fdecffff I suggest to hide non-existent capabilities. Here is two reasons. * It's logically and easier for using. * It helps to checkpoint-restore capabilities of tasks, because tasks can be restored on another kernel, where CAP_LAST_CAP is bigger. Signed-off-by: Andrew Vagin <[email protected]> Cc: Andrew G. Morgan <[email protected]> Reviewed-by: Serge E. Hallyn <[email protected]> Cc: Pavel Emelyanov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fs/fat: strip "cp" prefix from codepage in displayDave Reisner1-1/+2
Option parsing code expects an unsigned integer for the codepage option, but prefixes and stores this option with "cp" before passing to load_nls(). This makes the displayed option in /proc an invalid one. Strip the prefix when printing so that the displayed option is valid for reuse. Signed-off-by: Dave Reisner <[email protected]> Acked-by: OGAWA Hirofumi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fat: ix mount option parsingJan Kara1-11/+11
parse_options() is supposed to return value < 0 on error however we returned 0 (success) in a lot of cases. This actually was not a problem in practice because match_token() used by parse_options() is clever and catches most of the problems for us. Signed-off-by: Jan Kara <[email protected]> Cc: OGAWA Hirofumi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fat: provide option for setting timezone offsetJan Kara3-9/+28
So far FAT either offsets time stamps by sys_tz.minuteswest or leaves them as they are (when tz=UTC mount option is used). However in some cases it is useful if one can specify time stamp offset on his own (e.g. when time zone of the camera connected is different from time zone of the computer, or when HW clock is in UTC and thus sys_tz.minuteswest == 0). So provide a mount option time_offset= which allows user to specify offset in minutes that should be applied to time stamps on the filesystem. akpm: this code would work incorrectly when used via `mount -o remount', because cached inodes would not be updated. But fatfs's fat_remount() is basically a no-op anyway. Signed-off-by: Jan Kara <[email protected]> Acked-by: OGAWA Hirofumi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fat: notify when discard is not supportedNamjae Jeon1-0/+9
Change fatfs so that a warning is emitted when an attempt is made to mount a filesystem with the unsupported `discard' option. ext4 aready does this: http://patchwork.ozlabs.org/patch/192668/ Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Amit Sahrawat <[email protected]> Acked-by: OGAWA Hirofumi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17binfmt_elf: fix corner case kfree of uninitialized dataAlan Cox1-1/+3
If elf_core_dump() is called and fill_note_info() fails in the kmalloc() then it returns 0 but has not yet initialised all the needed fields. As a result we do a kfree(randomness) after correctly skipping the thread data. [[email protected]: checkpatch fixes] Signed-off-by: Alan Cox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17procfs: use kbasename()Andy Shevchenko1-5/+1
[[email protected]: remove duplicated include] Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17fs/notify/inode_mark.c: make fsnotify_find_inode_mark_locked() staticTushar Behera1-2/+3
Fixes following sparse warning: fs/notify/inode_mark.c:127:22: warning: symbol 'fsnotify_find_inode_mark_locked' was not declared. Should it be static? Signed-off-by: Tushar Behera <[email protected]> Cc: Eric Paris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17lseek: the "whence" argument is called "whence"Andrew Morton21-94/+94
But the kernel decided to call it "origin" instead. Fix most of the sites. Acked-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-17Merge branch 'for-linus' of ↵Linus Torvalds25-305/+493
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "While small this set of changes is very significant with respect to containers in general and user namespaces in particular. The user space interface is now complete. This set of changes adds support for unprivileged users to create user namespaces and as a user namespace root to create other namespaces. The tyranny of supporting suid root preventing unprivileged users from using cool new kernel features is broken. This set of changes completes the work on setns, adding support for the pid, user, mount namespaces. This set of changes includes a bunch of basic pid namespace cleanups/simplifications. Of particular significance is the rework of the pid namespace cleanup so it no longer requires sending out tendrils into all kinds of unexpected cleanup paths for operation. At least one case of broken error handling is fixed by this cleanup. The files under /proc/<pid>/ns/ have been converted from regular files to magic symlinks which prevents incorrect caching by the VFS, ensuring the files always refer to the namespace the process is currently using and ensuring that the ptrace_mayaccess permission checks are always applied. The files under /proc/<pid>/ns/ have been given stable inode numbers so it is now possible to see if different processes share the same namespaces. Through the David Miller's net tree are changes to relax many of the permission checks in the networking stack to allowing the user namespace root to usefully use the networking stack. Similar changes for the mount namespace and the pid namespace are coming through my tree. Two small changes to add user namespace support were commited here adn in David Miller's -net tree so that I could complete the work on the /proc/<pid>/ns/ files in this tree. Work remains to make it safe to build user namespaces and 9p, afs, ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the Kconfig guard remains in place preventing that user namespaces from being built when any of those filesystems are enabled. Future design work remains to allow root users outside of the initial user namespace to mount more than just /proc and /sys." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits) proc: Usable inode numbers for the namespace file descriptors. proc: Fix the namespace inode permission checks. proc: Generalize proc inode allocation userns: Allow unprivilged mounts of proc and sysfs userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file procfs: Print task uids and gids in the userns that opened the proc file userns: Implement unshare of the user namespace userns: Implent proc namespace operations userns: Kill task_user_ns userns: Make create_new_namespaces take a user_ns parameter userns: Allow unprivileged use of setns. userns: Allow unprivileged users to create new namespaces userns: Allow setting a userns mapping to your current uid. userns: Allow chown and setgid preservation userns: Allow unprivileged users to create user namespaces. userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped userns: fix return value on mntns_install() failure vfs: Allow unprivileged manipulation of the mount namespace. vfs: Only support slave subtrees across different user namespaces vfs: Add a user namespace reference from struct mnt_namespace ...
2012-12-17Btrfs: fix a bug of per-file nocowLiu Bo2-3/+5
Users report a bug, the reproducer is: $ mkfs.btrfs /dev/loop0 $ mount /dev/loop0 /mnt/btrfs/ $ mkdir /mnt/btrfs/dir $ chattr +C /mnt/btrfs/dir/ $ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=10; $ lsattr /mnt/btrfs/dir/foo ---------------C- /mnt/btrfs/dir/foo $ filefrag /mnt/btrfs/dir/foo /mnt/btrfs/dir/foo: 1 extent found ---> an extent $ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=1 seek=5 conv=notrunc,nocreat; sync $ filefrag /mnt/btrfs/dir/foo /mnt/btrfs/dir/foo: 3 extents found ---> with nocow, btrfs breaks the extent into three parts The new created file should not only inherit the NODATACOW flag, but also honor NODATASUM flag, because we must do COW on a file extent with checksum. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-17Btrfs: fix hash overflow handlingChris Mason5-2/+95
The handling for directory crc hash overflows was fairly obscure, split_leaf returns EOVERFLOW when we try to extend the item and that is supposed to bubble up to userland. For a while it did so, but along the way we added better handling of errors and forced the FS readonly if we hit IO errors during the directory insertion. Along the way, we started testing only for EEXIST and the EOVERFLOW case was dropped. The end result is that we may force the FS readonly if we catch a directory hash bucket overflow. This fixes a few problem spots. First I add tests for EOVERFLOW in the places where we can safely just return the error up the chain. btrfs_rename is harder though, because it tries to insert the new directory item only after it has already unlinked anything the rename was going to overwrite. Rather than adding very complex logic, I added a helper to test for the hash overflow case early while it is still safe to bail out. Snapshot and subvolume creation had a similar problem, so they are using the new helper now too. Signed-off-by: Chris Mason <[email protected]> Reported-by: Pascal Junod <[email protected]>
2012-12-17Merge branch 'for_linus' of ↵Linus Torvalds4-10/+14
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3, udf, quota fixes from Jan Kara: "Some ext3 & quota cleanups and couple of udf fixes" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Use the pre-processor to compile out quotactl_cmd_write when !CONFIG_BLOCK ext3: drop if around WARN_ON ext3: get rid of the duplicate code on ext3_fill_super udf: remove un-needed variable from inode_getblk udf: don't increment lenExtents while writing to a hole udf: fix memory leak while allocating blocks during write
2012-12-16Btrfs: don't take inode delalloc mutex if we're a free space inodeJosef Bacik1-6/+19
This confuses and angers lockdep even though it's ok. We don't really need the lock for free space inodes since only the transaction committer will be reserving space. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: fix autodefrag and umount lockupJosef Bacik1-2/+17
This happens because writeback_inodes_sb_nr_if_idle does down_read. This doesn't work for us and it has not been fixed upstream yet, so do it ourselves and use that instead so we can stop having this stupid long standing lockup. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: fix permissions of empty files not affected by umaskFilipe Brandenburger1-0/+4
When a new file is created with btrfs_create(), the inode will initially be created with permissions 0666 and later on in btrfs_init_acl() it will be adapted to mask out the umask bits. The problem is that this change won't make it into the btrfs_inode unless there's another change to the inode (e.g. writing content changing the size or touching the file changing the mtime.) This fix adds a call to btrfs_update_inode() to btrfs_create() to make sure that the change will not get lost if the in-memory inode is flushed before other changes are made to the file. Signed-off-by: Filipe Brandenburger <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: put raid properties into global tableLiu Bo4-33/+29
Raid properties can be shared among raid calculation code, we can put them into a global table to keep it simple. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: fix BUG() in scrub when first superblock reading gives EIOStefan Behrens1-0/+11
This fixes a very special case that can be reproduced by just disconnecting a disk at runtime, and without unmounting the filesystem first, start scrub on the filesystem with the disconnected disk. All read and write EIOs are handled correctly, only the first superblock is an exception and gives a BUG() in a subfunction. The BUG() is correct, it would crash later otherwise. The subfunction must not be called for superblocks and this is what the fix changes. Reported-by: Joeri Vanthienen <[email protected]> Signed-off-by: Stefan Behrens <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: do not call file_update_time in aio_writeJosef Bacik2-29/+48
This starts a transaction and dirties the inode everytime we call it, which is super expensive if you have a write heavy workload. We will be updating the inode when the IO completes and we reserve the space for the inode update when we reserve space for the write, so there is no chance of loss of information or enospc issues. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: only unlock and relock if we have toJosef Bacik1-1/+4
I noticed while doing fsync tests that we were always dropping the path and re-searching when we first cow the log root even though we've already gotten the write lock on the root. That's because we don't take into account that there might not be a parent node, so fix the check to make sure there is actually a parent node before we undo all of this work for nothing. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: use tokens where we can in the tree logJosef Bacik1-54/+73
If we are syncing over and over the overhead of doing all those maps in fill_inode_item and log_changed_extents really starts to hurt, so use map tokens so we can avoid all the extra mapping. Since the token maps from our offset to the end of the page make sure to set the first thing in the item first so we really only do one map. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: optimize leaf_space_usedJosef Bacik1-2/+9
This gets called at least 4 times for every level while adding an object, and it involves 3 kmapping calls, which on my box take about 5us a piece. So instead use a token, which brings us down to 1 kmap call and makes this function take 1/3 of the time per call. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: don't memset new tokensJosef Bacik1-1/+1
Our token logic depends on token->kaddr being set, and if it is not it sets everything properly as needed. So instead of memsetting just set token->kaddr to NULL. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2012-12-16Btrfs: only clear dirty on the buffer if it is marked as dirtyJosef Bacik1-4/+4
No reason to set the path blocking or loop through all of the pages if the extent buffer isn't actually marked dirty. Thanks, Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>