aboutsummaryrefslogtreecommitdiff
path: root/fs/exec.c
AgeCommit message (Collapse)AuthorFilesLines
2008-05-26posix timers: discard SI_TIMER signals on execOleg Nesterov1-0/+1
Based on Roland's patch. This approach was suggested by Austin Clements from the very beginning, and then by Linus. As Austin pointed out, the execing task can be killed by SI_TIMER signal because exec flushes the signal handlers, but doesn't discard the pending signals generated by posix timers. Perhaps not a bug, but people find this surprising. See http://bugzilla.kernel.org/show_bug.cgi?id=10460 Signed-off-by: Oleg Nesterov <[email protected]> Cc: Austin Clements <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-05-16[PATCH] get rid of leak in compat_execve()Al Viro1-4/+8
Even though copy_compat_strings() doesn't cache the pages, copy_strings_kernel() and stuff indirectly called by e.g. ->load_binary() is doing that, so we need to drop the cache contents in the end. [found by WANG Cong <[email protected]>] Signed-off-by: Al Viro <[email protected]>
2008-05-13memcg: fix possible panic when CONFIG_MM_OWNER=yKOSAKI Motohiro1-1/+1
When mm destruction happens, we should pass mm_update_next_owner() the old mm. But unfortunately new mm is passed in exec_mmap(). Thus, kernel panic is possible when a multi-threaded process uses exec(). Also, the owner member comment description is wrong. mm->owner does not necessarily point to the thread group leader. [[email protected]: coding-style fixes] Signed-off-by: KOSAKI Motohiro <[email protected]> Acked-by: Balbir Singh <[email protected]> Cc: "Paul Menage" <[email protected]> Cc: "KAMEZAWA Hiroyuki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-05-01[PATCH] split linux/file.hAl Viro1-0/+1
Initial splitoff of the low-level stuff; taken to fdtable.h Signed-off-by: Al Viro <[email protected]>
2008-04-30document de_thread() with exit_notify() connectionOleg Nesterov1-1/+1
Add a couple of small comments, it is not easy to see what this code does. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-30signals: de_thread: simplify the ->child_reaper switchingOleg Nesterov1-13/+2
Now that we rely on SIGNAL_UNKILLABLE flag, de_thread() doesn't need the nasty hack to kill the old ->child_reaper during the mt-exec. This also means we can avoid taking tasklist_lock around zap_other_threads(). Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-29procfs task exe symlinkMatt Helsley1-0/+2
The kernel implements readlink of /proc/pid/exe by getting the file from the first executable VMA. Then the path to the file is reconstructed and reported as the result. Because of the VMA walk the code is slightly different on nommu systems. This patch avoids separate /proc/pid/exe code on nommu systems. Instead of walking the VMAs to find the first executable file-backed VMA we store a reference to the exec'd file in the mm_struct. That reference would prevent the filesystem holding the executable file from being unmounted even after unmapping the VMAs. So we track the number of VM_EXECUTABLE VMAs and drop the new reference when the last one is unmapped. This avoids pinning the mounted filesystem. [[email protected]: improve comments] [[email protected]: fix dup_mmap] Signed-off-by: Matt Helsley <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: David Howells <[email protected]> Cc:"Eric W. Biederman" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: YAMAMOTO Takashi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-29cgroups: add an owner to the mm_structBalbir Singh1-0/+1
Remove the mem_cgroup member from mm_struct and instead adds an owner. This approach was suggested by Paul Menage. The advantage of this approach is that, once the mm->owner is known, using the subsystem id, the cgroup can be determined. It also allows several control groups that are virtually grouped by mm_struct, to exist independent of the memory controller i.e., without adding mem_cgroup's for each controller, to mm_struct. A new config option CONFIG_MM_OWNER is added and the memory resource controller selects this config option. This patch also adds cgroup callbacks to notify subsystems when mm->owner changes. The mm_cgroup_changed callback is called with the task_lock() of the new task held and is called just prior to changing the mm->owner. I am indebted to Paul Menage for the several reviews of this patchset and helping me make it lighter and simpler. This patch was tested on a powerpc box, it was compiled with both the MM_OWNER config turned on and off. After the thread group leader exits, it's moved to init_css_state by cgroup_exit(), thus all future charges from runnings threads would be redirected to the init_css_set's subsystem. Signed-off-by: Balbir Singh <[email protected]> Cc: Pavel Emelianov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Sudhir Kumar <[email protected]> Cc: YAMAMOTO Takashi <[email protected]> Cc: Hirokazu Takahashi <[email protected]> Cc: David Rientjes <[email protected]>, Cc: Balbir Singh <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Pekka Enberg <[email protected]> Reviewed-by: Paul Menage <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-29exec: remove argv_len from struct linux_binprmTetsuo Handa1-3/+0
I noticed that 2.6.24.2 calculates bprm->argv_len at do_execve(). But it doesn't update bprm->argv_len after "remove_arg_zero() + copy_strings_kernel()" at load_script() etc. audit_bprm() is called from search_binary_handler() and search_binary_handler() is called from load_script() etc. Thus, I think the condition check if (bprm->argv_len > (audit_argv_kb << 10)) return -E2BIG; in audit_bprm() might return wrong result when strlen(removed_arg) != strlen(spliced_args). Why not update bprm->argv_len at load_script() etc. ? By the way, 2.6.25-rc3 seems to not doing the condition check. Is the field bprm->argv_len no longer needed? Signed-off-by: Tetsuo Handa <[email protected]> Cc: Ollie Wild <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-25[PATCH] sanitize unshare_files/reset_files_structAl Viro1-12/+6
* let unshare_files() give caller the displaced files_struct * don't bother with grabbing reference only to drop it in the caller if it hadn't been shared in the first place * in that form unshare_files() is trivially implemented via unshare_fd(), so we eliminate the duplicate logics in fork.c * reset_files_struct() is not just only called for current; it will break the system if somebody ever calls it for anything else (we can't modify ->files of somebody else). Lose the task_struct * argument. Signed-off-by: Al Viro <[email protected]>
2008-04-25[PATCH] sanitize handling of shared descriptor tables in failing execve()Al Viro1-16/+18
* unshare_files() can fail; doing it after irreversible actions is wrong and de_thread() is certainly irreversible. * since we do it unconditionally anyway, we might as well do it in do_execve() and save ourselves the PITA in binfmt handlers, etc. * while we are at it, binfmt_som actually leaked files_struct on failure. As a side benefit, unshare_files(), put_files_struct() and reset_files_struct() become unexported. Signed-off-by: Al Viro <[email protected]>
2008-03-03Allow ARG_MAX execve string space even with a small stack limitLinus Torvalds1-1/+9
The new code that removed the limitation on the execve string size (which was historically 32 pages) replaced it with a much softer limit based on RLIMIT_STACK which is usually much larger than the traditional limit. See commit b6a2fea39318e43fee84fa7b0b90d68bed92d2ba ("mm: variable length argument support") for details. However, if you have a small stack limit (perhaps because you need lots of stacks in a threaded environment), the new heuristic of allowing up to 1/4th of RLIMIT_STACK to be used for argument and environment strings could actually be smaller than the old limit. So just say that it's ok to have up to ARG_MAX strings regardless of the value of RLIMIT_STACK, and check the rlimit only when going over that traditional limit. (Of course, if you actually have a *really* small stack limit, the whole stack itself will be limited before you hit ARG_MAX, but that has always been true and is clearly the right behaviour anyway). Acked-by: Carlos O'Donell <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ollie Wild <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-14Introduce path_put()Jan Blunck1-2/+2
* Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [[email protected]: fix cifs] Signed-off-by: Jan Blunck <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Cc: <[email protected]> Cc: Al Viro <[email protected]> Cc: Steven French <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-14Embed a struct path into struct nameidata instead of nd->{dentry,mnt}Jan Blunck1-2/+2
This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 6894212 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [[email protected]: coding-style fixes] [[email protected]: fix cifs] [[email protected]: fix smack] Signed-off-by: Jan Blunck <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: Casey Schaufler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08Allow executables larger than 2GBAndi Kleen1-2/+3
This allows us to use executables >2GB. Based on a patch by Dave Anderson Signed-off-by: Andi Kleen <[email protected]> Cc: Dave Anderson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUTDavid Howells1-1/+1
Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set. Not all architectures support the A.OUT binfmt, so the ELF binfmt should not be permitted to go looking for A.OUT libraries to load in such a case. Not only that, but under such conditions A.OUT core dumps are not produced either. To make this work, this patch also does the following: (1) Makes the existence of the contents of linux/a.out.h contingent on CONFIG_ARCH_SUPPORTS_AOUT. (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT core dumping code. (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This is then included only where needed. This means that this bit of arch code will be stored in the appropriate A.OUT binfmt module rather than the core kernel. (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not needed) and FRV. This patch depends on the previous patch to move STACK_TOP[_MAX] out of asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT format is available. [[email protected]: uml: re-remove accidentally restored code] Signed-off-by: David Howells <[email protected]> Cc: <[email protected]> Signed-off-by: Jeff Dike <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08ITIMER_REAL: convert to use struct pidOleg Nesterov1-20/+2
signal_struct->tsk points to the ->group_leader and thus we have the nasty code in de_thread() which has to change it and restart ->real_timer if the leader is changed. Use "struct pid *leader_pid" instead. This also allows us to kill now unneeded send_group_sig_info(). Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Davide Libenzi <[email protected]> Cc: Pavel Emelyanov <[email protected]> Acked-by: Roland McGrath <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-05exec: rework the group exit and fix the race with killOleg Nesterov1-9/+4
As Roland pointed out, we have the very old problem with exec. de_thread() sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then clears signal->flags. All signals (even fatal ones) sent in this window (which is not too small) will be lost. With this patch exec doesn't abuse SIGNAL_GROUP_EXIT. signal_group_exit(), the new helper, should be used to detect exit_group() or exec() in progress. It can have more users, but this patch does only strictly necessary changes. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Davide Libenzi <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Robin Holt <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-05get_task_comm(): return the resultAndrew Morton1-1/+2
It was dumb to make get_task_comm() return void. Change it to return a pointer to the resulting output for caller convenience. Cc: Ulrich Drepper <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-11-28vfs: coredumping fixIngo Molnar1-0/+6
fix: http://bugzilla.kernel.org/show_bug.cgi?id=3043 only allow coredumping to the same uid that the coredumping task runs under. Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Alan Cox <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Acked-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-11-12core dump: remain dumpableRoland McGrath1-2/+4
The coredump code always calls set_dumpable(0) when it starts (even if RLIMIT_CORE prevents any core from being dumped). The effect of this (via task_dumpable) is to make /proc/pid/* files owned by root instead of the user, so the user can no longer examine his own process--in a case where there was never any privileged data to protect. This affects e.g. auxv, environ, fd; in Fedora (execshield) kernels, also maps. In practice, you can only notice this when a debugger has requested PTRACE_EVENT_EXIT tracing. set_dumpable was only used in do_coredump for synchronization and not intended for any security purpose. (It doesn't secure anything that wasn't already unsecured when a process dies by SIGTERM instead of SIGQUIT.) This changes do_coredump to check the core_waiters count as the means of synchronization, which is sufficient. Now we leave the "dumpable" bits alone. Signed-off-by: Roland McGrath <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19Isolate some explicit usage of task->tgidPavel Emelyanov1-2/+2
With pid namespaces this field is now dangerous to use explicitly, so hide it behind the helpers. Also the pid and pgrp fields o task_struct and signal_struct are to be deprecated. Unfortunately this patch cannot be sent right now as this leads to tons of warnings, so start isolating them, and deprecate later. Actually the p->tgid == pid has to be changed to has_group_leader_pid(), but Oleg pointed out that in case of posix cpu timers this is the same, and thread_group_leader() is more preferable. Signed-off-by: Pavel Emelyanov <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19pid namespaces: changes to show virtual ids to userPavel Emelyanov1-2/+2
This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [[email protected]: build fix] [[email protected]: nuther build fix] [[email protected]: yet nuther build fix] [[email protected]: remove unneeded casts] Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: Alexey Dobriyan <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul Menage <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19pid namespaces: use task_pid() to find leader's pidSukadev Bhattiprolu1-1/+1
Use task_pid() to get leader's 'struct pid' and avoid the find_pid(). Signed-off-by: Sukadev Bhattiprolu <[email protected]> Acked-by: Pavel Emelianov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Cedric Le Goater <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Herbert Poetzel <[email protected]> Cc: Kirill Korotaev <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19pid namespaces: rename child_reaper() functionSukadev Bhattiprolu1-1/+1
Rename the child_reaper() function to task_child_reaper() to be similar to other task_* functions and to distinguish the function from 'struct pid_namspace.child_reaper'. Signed-off-by: Sukadev Bhattiprolu <[email protected]> Cc: Pavel Emelianov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Cedric Le Goater <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Herbert Poetzel <[email protected]> Cc: Kirill Korotaev <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19pid namespaces: define and use task_active_pid_ns() wrapperSukadev Bhattiprolu1-1/+1
With multiple pid namespaces, a process is known by some pid_t in every ancestor pid namespace. Every time the process forks, the child process also gets a pid_t in every ancestor pid namespace. While a process is visible in >=1 pid namespaces, it can see pid_t's in only one pid namespace. We call this pid namespace it's "active pid namespace", and it is always the youngest pid namespace in which the process is known. This patch defines and uses a wrapper to find the active pid namespace of a process. The implementation of the wrapper will be changed in when support for multiple pid namespaces are added. Changelog: 2.6.22-rc4-mm2-pidns1: - [Pavel Emelianov, Alexey Dobriyan] Back out the change to use task_active_pid_ns() in child_reaper() since task->nsproxy can be NULL during task exit (so child_reaper() continues to use init_pid_ns). to implement child_reaper() since init_pid_ns.child_reaper to implement child_reaper() since tsk->nsproxy can be NULL during exit. 2.6.21-rc6-mm1: - Rename task_pid_ns() to task_active_pid_ns() to reflect that a process can have multiple pid namespaces. Signed-off-by: Sukadev Bhattiprolu <[email protected]> Acked-by: Pavel Emelianov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Cedric Le Goater <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Herbert Poetzel <[email protected]> Cc: Kirill Korotaev <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19setup vma->vm_page_prot by vm_get_page_prot()Coly Li1-1/+1
This patch uses vm_get_page_prot() to setup vma->vm_page_prot. Though inside vm_get_page_prot() the protection flags is AND with (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED), it does not hurt correct code. Signed-off-by: Coly Li <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17security/ cleanupsAdrian Bunk1-2/+0
This patch contains the following cleanups that are now possible: - remove the unused security_operations->inode_xattr_getsuffix - remove the no longer used security_operations->unregister_security - remove some no longer required exit code - remove a bunch of no longer used exports Signed-off-by: Adrian Bunk <[email protected]> Acked-by: James Morris <[email protected]> Cc: Chris Wright <[email protected]> Cc: Stephen Smalley <[email protected]> Cc: Serge Hallyn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17exec: RT sub-thread can livelock and monopolize CPU on execOleg Nesterov1-13/+15
de_thread() yields waiting for ->group_leader to be a zombie. This deadlocks if an rt-prio execer shares the same cpu with ->group_leader. Change the code to use ->group_exit_task/notify_count mechanics. This patch certainly uglifies the code, perhaps someone can suggest something better. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17exec: consolidate 2 fast-pathsOleg Nesterov1-9/+0
Now that we don't pre-allocate the new ->sighand, we can kill the first fast path, it doesn't make sense any longer. At best, it can save one "list_empty()" check but leads to the code duplication. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17exec: simplify the new ->sighand allocationOleg Nesterov1-15/+9
de_thread() pre-allocates newsighand to make sure that exec() can't fail after killing all sub-threads. Imho, this buys nothing, but complicates the code: - this is (mostly) needed to handle CLONE_SIGHAND without CLONE_THREAD tasks, this is very unlikely (if ever used) case - unless we already have some serious problems, GFP_KERNEL allocation should not fail - ENOMEM still can happen after de_thread(), ->sighand is not the last object we have to allocate Change the code to allocate the new ->sighand on demand. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17exec: simplify ->sighand switchingOleg Nesterov1-7/+1
There is no any reason to do recalc_sigpending() after changing ->sighand. To begin with, recalc_sigpending() does not take ->sighand into account. This means we don't need to take newsighand->siglock while changing sighands. rcu_assign_pointer() provides a necessary barrier, and if another process reads the new ->sighand it should either take tasklist_lock or it should use lock_task_sighand() which has a corresponding smp_read_barrier_depends(). Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17exec: remove unnecessary check for MNT_NOEXECMiklos Szeredi1-5/+1
vfs_permission(MAY_EXEC) checks if the filesystem is mounted with "noexec", so there's no need to repeat this check in sys_uselib() and open_exec(). Signed-off-by: Miklos Szeredi <[email protected]> Cc: Al Viro <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17core_pattern: fix up a few miscellaneous bugsNeil Horman1-2/+15
Fix do_coredump to detect a crash in the user mode helper process and abort the attempt to recursively dump core to another copy of the helper process, potentially ad-infinitum. [[email protected]: cleanups] Signed-off-by: Neil Horman <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17core_pattern: allow passing of arguments to user mode helper when ↵Neil Horman1-3/+22
core_pattern is a pipe A rewrite of my previous post for this enhancement. It uses jeremy's split_argv/free_argv library functions to translate core_pattern into an argv array to be passed to the user mode helper process. It also adds a translation to format_corename such that the origional value of RLIMIT_CORE can be passed to userspace as an argument. Signed-off-by: Neil Horman <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17core_pattern: ignore RLIMIT_CORE if core_pattern is a pipeNeil Horman1-4/+15
For some time /proc/sys/kernel/core_pattern has been able to set its output destination as a pipe, allowing a user space helper to receive and intellegently process a core. This infrastructure however has some shortcommings which can be enhanced. Specifically: 1) The coredump code in the kernel should ignore RLIMIT_CORE limitation when core_pattern is a pipe, since file system resources are not being consumed in this case, unless the user application wishes to save the core, at which point the app is restricted by usual file system limits and restrictions. 2) The core_pattern code should be able to parse and pass options to the user space helper as an argv array. The real core limit of the uid of the crashing proces should also be passable to the user space helper (since it is overridden to zero when called). 3) Some miscellaneous bugs need to be cleaned up (specifically the recognition of a recursive core dump, should the user mode helper itself crash. Also, the core dump code in the kernel should not wait for the user mode helper to exit, since the same context is responsible for writing to the pipe, and a read of the pipe by the user mode helper will result in a deadlock. This patch: Remove the check of RLIMIT_CORE if core_pattern is a pipe. In the event that core_pattern is a pipe, the entire core will be fed to the user mode helper. Signed-off-by: Neil Horman <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17Make unregister_binfmt() return voidAlexey Dobriyan1-2/+1
list_del() hardly can fail, so checking for return value is pointless (and current code always return 0). Nobody really cared that return value anyway. Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-17Use list_head in binfmt handlingAlexey Dobriyan1-28/+6
Switch single-linked binfmt formats list to usual list_head's. This leads to one-liners in register_binfmt() and unregister_binfmt(). The downside is one pointer more in struct linux_binfmt. This is not a problem, since the set of registered binfmts on typical box is very small -- (ELF + something distro enabled for you). Test-booted, played with executable .txt files, modprobe/rmmod binfmt_misc. Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-09-20signalfd simplificationDavide Libenzi1-3/+0
This simplifies signalfd code, by avoiding it to remain attached to the sighand during its lifetime. In this way, the signalfd remain attached to the sighand only during poll(2) (and select and epoll) and read(2). This also allows to remove all the custom "tsk == current" checks in kernel/signal.c, since dequeue_signal() will only be called by "current". I think this is also what Ben was suggesting time ago. The external effect of this, is that a thread can extract only its own private signals and the group ones. I think this is an acceptable behaviour, in that those are the signals the thread would be able to fetch w/out signalfd. Signed-off-by: Davide Libenzi <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-08-22exec: kill unsafe BUG_ON(sig->count) checksOleg Nesterov1-3/+0
de_thread: if (atomic_read(&oldsighand->count) <= 1) BUG_ON(atomic_read(&sig->count) != 1); This is not safe without the rmb() in between. The results of two correctly ordered __exit_signal()->atomic_dec_and_test()'s could be seen out of order on our CPU. The same is true for the "thread_group_empty()" case, __unhash_process()'s changes could be seen before atomic_dec_and_test(&sig->count). On some platforms (including i386) atomic_read() doesn't provide even the compiler barrier, in that case these checks are simply racy. Remove these BUG_ON()'s. Alternatively, we can do something like BUG_ON( ({ smp_rmb(); atomic_read(&sig->count) != 1; }) ); Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-08-22signalfd: make it group-wide, fix posix-timers schedulingOleg Nesterov1-7/+2
With this patch any thread can dequeue its own private signals via signalfd, even if it was created by another sub-thread. To do so, we pass "current" to dequeue_signal() if the caller is from the same thread group. This also fixes the scheduling of posix timers broken by the previous patch. If the caller doesn't belong to this thread group, we can't handle __SI_TIMER case properly anyway. Perhaps we should forbid the cross-process signalfd usage and convert ctx->tsk to ctx->sighand. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Davide Libenzi <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Roland McGrath <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-08-18Reset current->pdeath_signal on SUID binary executionMarcel Holtmann1-4/+9
This fixes a vulnerability in the "parent process death signal" implementation discoverd by Wojciech Purczynski of COSEINC PTE Ltd. and iSEC Security Research. http://marc.info/?l=bugtraq&m=118711306802632&w=2 Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-07-19coredump masking: reimplementation of dumpable using two flagsKawai, Hidehiro1-6/+56
This patch changes mm_struct.dumpable to a pair of bit flags. set_dumpable() converts three-value dumpable to two flags and stores it into lower two bits of mm_struct.flags instead of mm_struct.dumpable. get_dumpable() behaves in the opposite way. [[email protected]: export set_dumpable] Signed-off-by: Hidehiro Kawai <[email protected]> Cc: Alan Cox <[email protected]> Cc: David Howells <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Nick Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-07-19mm: variable length argument supportOllie Wild1-218/+396
Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from the old mm into the new mm. We create the new mm before the binfmt code runs, and place the new stack at the very top of the address space. Once the binfmt code runs and figures out where the stack should be, we move it downwards. It is a bit peculiar in that we have one task with two mm's, one of which is inactive. [[email protected]: limit stack size] Signed-off-by: Ollie Wild <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: <[email protected]> Cc: Hugh Dickins <[email protected]> [[email protected]: unexport bprm_mm_init] Signed-off-by: Adrian Bunk <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-07-19audit: rework execve auditPeter Zijlstra1-0/+3
The purpose of audit_bprm() is to log the argv array to a userspace daemon at the end of the execve system call. Since user-space hasn't had time to run, this array is still in pristine state on the process' stack; so no need to copy it, we can just grab it from there. In order to minimize the damage to audit_log_*() copy each string into a temporary kernel buffer first. Currently the audit code requires that the full argument vector fits in a single packet. So currently it does clip the argv size to a (sysctl) limit, but only when execve auditing is enabled. If the audit protocol gets extended to allow for multiple packets this check can be removed. Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Ollie Wild <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-05-23uselib: add missing MNT_NOEXEC checkChristoph Hellwig1-0/+3
We don't allow loading ELF shared library from noexec points so the same should apply to sys_uselib aswell. Signed-off-by: Christoph Hellwig <[email protected]> Cc: Ulrich Drepper <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-05-17make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename ↵Dan Aloni1-3/+1
size Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename size and change it to 128, so that extensive patterns such as '/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename generation. Signed-off-by: Dan Aloni <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-05-11Merge branch 'audit.b38' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b38' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] Abnormal End of Processes [PATCH] match audit name data [PATCH] complete message queue auditing [PATCH] audit inode for all xattr syscalls [PATCH] initialize name osid [PATCH] audit signal recipients [PATCH] add SIGNAL syscall class (v3) [PATCH] auditing ptrace
2007-05-11signal/timer/event: signalfd coreDavide Libenzi1-2/+9
This patch series implements the new signalfd() system call. I took part of the original Linus code (and you know how badly it can be broken :), and I added even more breakage ;) Signals are fetched from the same signal queue used by the process, so signalfd will compete with standard kernel delivery in dequeue_signal(). If you want to reliably fetch signals on the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This seems to be working fine on my Dual Opteron machine. I made a quick test program for it: http://www.xmailserver.org/signafd-test.c The signalfd() system call implements signal delivery into a file descriptor receiver. The signalfd file descriptor if created with the following API: int signalfd(int ufd, const sigset_t *mask, size_t masksize); The "ufd" parameter allows to change an existing signalfd sigmask, w/out going to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new signalfd file. The "mask" allows to specify the signal mask of signals that we are interested in. The "masksize" parameter is the size of "mask". The signalfd fd supports the poll(2) and read(2) system calls. The poll(2) will return POLLIN when signals are available to be dequeued. As a direct consequence of supporting the Linux poll subsystem, the signalfd fd can use used together with epoll(2) too. The read(2) system call will return a "struct signalfd_siginfo" structure in the userspace supplied buffer. The return value is the number of bytes copied in the supplied buffer, or -1 in case of error. The read(2) call can also return 0, in case the sighand structure to which the signalfd was attached, has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will return -EAGAIN in case no signal is available. If the size of the buffer passed to read(2) is lower than sizeof(struct signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also return -ERESTARTSYS in case a signal hits the process. The format of the struct signalfd_siginfo is, and the valid fields depends of the (->code & __SI_MASK) value, in the same way a struct siginfo would: struct signalfd_siginfo { __u32 signo; /* si_signo */ __s32 err; /* si_errno */ __s32 code; /* si_code */ __u32 pid; /* si_pid */ __u32 uid; /* si_uid */ __s32 fd; /* si_fd */ __u32 tid; /* si_fd */ __u32 band; /* si_band */ __u32 overrun; /* si_overrun */ __u32 trapno; /* si_trapno */ __s32 status; /* si_status */ __s32 svint; /* si_int */ __u64 svptr; /* si_ptr */ __u64 utime; /* si_utime */ __u64 stime; /* si_stime */ __u64 addr; /* si_addr */ }; [[email protected]: fix signalfd_copyinfo() on i386] Signed-off-by: Davide Libenzi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-05-11attach_pid() with struct pid parameterSukadev Bhattiprolu1-1/+1
attach_pid() currently takes a pid_t and then uses find_pid() to find the corresponding struct pid. Sometimes we already have the struct pid. We can then skip find_pid() if attach_pid() were to take a struct pid parameter. Signed-off-by: Sukadev Bhattiprolu <[email protected]> Cc: Cedric Le Goater <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>