blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2008-05-24	signals: fix sigqueue_free() vs __exit_signal() race	Oleg Nesterov	1	-1/+6
	__exit_signal() does flush_sigqueue(tsk->pending) outside of ->siglock. This can race with another thread doing sigqueue_free(), we can free the same SIGQUEUE_PREALLOC sigqueue twice or corrupt the pending->list. Note that even sys_exit_group() can trigger this race, not only sys_timer_delete(). Move the callsite of flush_sigqueue(tsk->pending) under ->siglock. This patch doesn't touch flush_sigqueue(->shared_pending) below, it is called when there are no other threads which can play with signals, and sigqueue_free() can't be used outside of our thread group. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-05-01	[PATCH] split linux/file.h	Al Viro	1	-0/+1
	Initial splitoff of the low-level stuff; taken to fdtable.h Signed-off-by: Al Viro <[email protected]>
2008-04-30	pids: __set_special_pids: use change_pid() helper	Oleg Nesterov	1	-4/+2
	Use change_pid() instead of detach_pid() + attach_pid() in __set_special_pids(). This way task_session() is not NULL in between. Signed-off-by: Oleg Nesterov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-30	ptrace: introduce ptrace_reparented() helper	Oleg Nesterov	1	-5/+4
	Add another trivial helper for the sake of grep. It also auto-documents the fact that ->parent != real_parent implies ->ptrace. No functional changes. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-30	document de_thread() with exit_notify() connection	Oleg Nesterov	1	-0/+1
	Add a couple of small comments, it is not easy to see what this code does. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-30	reparent_thread: use same_thread_group()	Oleg Nesterov	1	-1/+1
	Trivial, use same_thread_group() in reparent_thread(). Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-30	ptrace: introduce task_detached() helper	Oleg Nesterov	1	-20/+25
	exit.c has numerous "->exit_signal == -1" comparisons, this check is subtle and deserves a helper. Imho makes the code more parseable for humans. At least it's surely more greppable. Also, a couple of whitespace cleanups. No functional changes. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-30	signals: do_group_exit(): use signal_group_exit() more consistently	Oleg Nesterov	1	-3/+4
	do_group_exit() checks SIGNAL_GROUP_EXIT to avoid taking sighand->siglock. Since ed5d2cac114202fe2978a9cbcab8f5032796d538 exec() doesn't set this flag, we should use signal_group_exit(). This is not needed for correctness, but can speedup the multithreaded exec and makes the code more consistent. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-29	cgroups: add an owner to the mm_struct	Balbir Singh	1	-0/+83
	Remove the mem_cgroup member from mm_struct and instead adds an owner. This approach was suggested by Paul Menage. The advantage of this approach is that, once the mm->owner is known, using the subsystem id, the cgroup can be determined. It also allows several control groups that are virtually grouped by mm_struct, to exist independent of the memory controller i.e., without adding mem_cgroup's for each controller, to mm_struct. A new config option CONFIG_MM_OWNER is added and the memory resource controller selects this config option. This patch also adds cgroup callbacks to notify subsystems when mm->owner changes. The mm_cgroup_changed callback is called with the task_lock() of the new task held and is called just prior to changing the mm->owner. I am indebted to Paul Menage for the several reviews of this patchset and helping me make it lighter and simpler. This patch was tested on a powerpc box, it was compiled with both the MM_OWNER config turned on and off. After the thread group leader exits, it's moved to init_css_state by cgroup_exit(), thus all future charges from runnings threads would be redirected to the init_css_set's subsystem. Signed-off-by: Balbir Singh <[email protected]> Cc: Pavel Emelianov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Sudhir Kumar <[email protected]> Cc: YAMAMOTO Takashi <[email protected]> Cc: Hirokazu Takahashi <[email protected]> Cc: David Rientjes <[email protected]>, Cc: Balbir Singh <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Pekka Enberg <[email protected]> Reviewed-by: Paul Menage <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-28	mempolicy: rename mpol_free to mpol_put	Lee Schermerhorn	1	-1/+1
	This is a change that was requested some time ago by Mel Gorman. Makes sense to me, so here it is. Note: I retain the name "mpol_free_shared_policy()" because it actually does free the shared_policy, which is NOT a reference counted object. However, ... The mempolicy object[s] referenced by the shared_policy are reference counted, so mpol_put() is used to release the reference held by the shared_policy. The mempolicy might not be freed at this time, because some task attached to the shared object associated with the shared policy may be in the process of allocating a page based on the mempolicy. In that case, the task performing the allocation will hold a reference on the mempolicy, obtained via mpol_shared_policy_lookup(). The mempolicy will be freed when all tasks holding such a reference have called mpol_put() for the mempolicy. Signed-off-by: Lee Schermerhorn <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-25	[PATCH] sanitize unshare_files/reset_files_struct	Al Viro	1	-1/+2
	* let unshare_files() give caller the displaced files_struct * don't bother with grabbing reference only to drop it in the caller if it hadn't been shared in the first place * in that form unshare_files() is trivially implemented via unshare_fd(), so we eliminate the duplicate logics in fork.c * reset_files_struct() is not just only called for current; it will break the system if somebody ever calls it for anything else (we can't modify ->files of somebody else). Lose the task_struct * argument. Signed-off-by: Al Viro <[email protected]>
2008-04-25	[PATCH] sanitize handling of shared descriptor tables in failing execve()	Al Viro	1	-3/+0
	* unshare_files() can fail; doing it after irreversible actions is wrong and de_thread() is certainly irreversible. * since we do it unconditionally anyway, we might as well do it in do_execve() and save ourselves the PITA in binfmt handlers, etc. * while we are at it, binfmt_som actually leaked files_struct on failure. As a side benefit, unshare_files(), put_files_struct() and reset_files_struct() become unexported. Signed-off-by: Al Viro <[email protected]>
2008-04-22	[PATCH] get rid of __exit_files(), __exit_fs() and __put_fs_struct()	Al Viro	1	-21/+6
	The only reason to have separated __...() for those was to keep them inlined for local users in exit.c. Since Alexey removed the inline on those, there's no reason whatsoever to keep them around; just collapse with normal variants. Signed-off-by: Al Viro <[email protected]>
2008-04-10	asmlinkage_protect replaces prevent_tail_call	Roland McGrath	1	-2/+2
	The prevent_tail_call() macro works around the problem of the compiler clobbering argument words on the stack, which for asmlinkage functions is the caller's (user's) struct pt_regs. The tail/sibling-call optimization is not the only way that the compiler can decide to use stack argument words as scratch space, which we have to prevent. Other optimizations can do it too. Until we have new compiler support to make "asmlinkage" binding on the compiler's own use of the stack argument frame, we have work around all the manifestations of this issue that crop up. More cases seem to be prevented by also keeping the incoming argument variables live at the end of the function. This makes their original stack slots attractive places to leave those variables, so the compiler tends not clobber them for something else. It's still no guarantee, but it handles some observed cases that prevent_tail_call() did not. Signed-off-by: Roland McGrath <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-03-08	Fix waitid si_code regression	Roland McGrath	1	-1/+1
	In commit ee7c82da830ea860b1f9274f1f0cdf99f206e7c2 ("wait_task_stopped: simplify and fix races with SIGCONT/SIGKILL/untrace"), the magic (short) cast when storing si_code was lost in wait_task_stopped. This leaks the in-kernel CLD_* values that do not match what userland expects. Signed-off-by: Roland McGrath <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-03-03	exit_notify: fix kill_orphaned_pgrp() usage with mt exit	Oleg Nesterov	1	-3/+4
	1. exit_notify() always calls kill_orphaned_pgrp(). This is wrong, we should do this only when the whole process exits. 2. exit_notify() uses "current" as "ignored_task", obviously wrong. Use ->group_leader instead. Test case: void hup(int sig) { printf("HUP received\n"); } void tfunc(void arg) { sleep(2); printf("sub-thread exited\n"); return NULL; } int main(int argc, char *argv[]) { if (!fork()) { signal(SIGHUP, hup); kill(getpid(), SIGSTOP); exit(0); } pthread_t thr; pthread_create(&thr, NULL, tfunc, NULL); sleep(1); printf("main thread exited\n"); syscall(__NR_exit, 0); return 0; } output: main thread exited HUP received Hangup With this patch the output is: main thread exited sub-thread exited HUP received Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-03-03	will_become_orphaned_pgrp: partially fix insufficient ->exit_state check	Oleg Nesterov	1	-9/+8
	p->exit_state != 0 doesn't mean this process is dead, it may have sub-threads. Change the code to use "p->exit_state && thread_group_empty(p)" instead. Without this patch, ^Z doesn't deliver SIGTSTP to the foreground process if the main thread has exited. However, the new check is not perfect either. There is a window when exit_notify() drops tasklist and before release_task(). Suppose that the last (non-leader) thread exits. This means that entire group exits, but thread_group_empty() is not true yet. As Eric pointed out, is_global_init() is wrong as well, but I did not dare to do other changes. Just for the record, has_stopped_jobs() is absolutely wrong too. But we can't fix it now, we should first fix SIGNAL_STOP_STOPPED issues. Even with this patch ^Z doesn't play well with the dead main thread. The task is stopped correctly but do_wait(WSTOPPED) won't see it. This is another unrelated issue, will be (hopefully) fixed separately. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-03-03	introduce kill_orphaned_pgrp() helper	Oleg Nesterov	1	-39/+35
	Factor out the common code in reparent_thread() and exit_notify(). No functional changes. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-14	Use struct path in fs_struct	Jan Blunck	1	-8/+4
	* Use struct path in fs_struct. Signed-off-by: Andreas Gruenbacher <[email protected]> Signed-off-by: Jan Blunck <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	kernel: remove fastcall in kernel/*	Harvey Harrison	1	-2/+2
	[[email protected]: coding-style fixes] Signed-off-by: Harvey Harrison <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	Pidns: make full use of xxx_vnr() calls	Pavel Emelyanov	1	-3/+3
	Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were _all_ converted to operate on the current pid namespace. After this each call like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo) one. Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where appropriate. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	pid: sys_wait... fixes	Eric W. Biederman	1	-24/+56
	This modifies do_wait and eligible child to take a pair of enum pid_type and struct pid *pid to precisely specify what set of processes are eligible to be waited for, instead of the raw pid_t value from sys_wait4. This fixes a bug in sys_waitid where you could not wait for children in just process group 1. This fixes a pid namespace crossing case in eligible_child. Allowing us to wait for a processes in our current process group even if our current process group == 0. This allows the no child with this pid case to be optimized. This allows us to optimize the pid membership test in eligible child to be optimized. This even closes a theoretical pid wraparound race where in a threaded parent if two threads are waiting for the same child and one thread picks up the child and the pid numbers wrap around and generate another child with that same pid before the other thread is scheduled (teribly insanely unlikely) we could end up waiting on the second child with the same pid# and not discover that the specific child we were waiting for has exited. [[email protected]: coding-style fixes] Signed-off-by: Eric W. Biederman <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	move the related code from exit_notify() to exit_signals()	Oleg Nesterov	1	-18/+0
	The previous bugfix was not optimal, we shouldn't care about group stop when we are the only thread or the group stop is in progress. In that case nothing special is needed, just set PF_EXITING and return. Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify(). So, from the performance POV the only difference is that we don't trust !signal_pending() until we take ->siglock. But this in fact fixes another ___pure___ theoretical minor race. __group_complete_signal() finds the task without PF_EXITING and chooses it as the target for signal_wake_up(). But nothing prevents this task from exiting in between without noticing the pending signal and thus unpredictably delaying the actual delivery. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Davide Libenzi <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	fix group stop with exit race	Oleg Nesterov	1	-1/+1
	do_signal_stop() counts all sub-thread and sets ->group_stop_count accordingly. Every thread should decrement ->group_stop_count and stop, the last one should notify the parent. However a sub-thread can exit before it notices the signal_pending(), or it may be somewhere in do_exit() already. In that case the group stop never finishes properly. Note: this is a minimal fix, we can add some optimizations later. Say we can return quickly if thread_group_empty(). Also, we can move some signal related code from exit_notify() to exit_signals(). Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Davide Libenzi <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	move daemonized kernel threads into the swapper's session	Oleg Nesterov	1	-1/+1
	Daemonized kernel threads run in the init's session. This doesn't match the behaviour of kthread_create()'ed threads, and this is one of the 2 reasons why we need a special hack in sys_setsid(). Now that set_special_pids() was changed to use struct pid, not pid_t, we can use init_struct_pid and set 0,0 special pids. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	teach set_special_pids() to use struct pid	Oleg Nesterov	1	-15/+15
	Change set_special_pids() to work with struct pid, not pid_t from global name space. This again speedups and imho cleanups the code, also a preparation for the next patch. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	wait_task_zombie: remove ->exit_state/exit_signal checks for WNOWAIT	Oleg Nesterov	1	-4/+0
	The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense. The exit_state was EXIT_ZOMBIE when the function was called, and another thread can change it to EXIT_DEAD right after the check. The second condition is not possible, detached non-traced threads were already filtered out by eligible_child(), we didn't drop tasklist since then. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	wait_task_continued/zombie: don't use task_pid_nr_ns() lockless	Oleg Nesterov	1	-10/+5
	Surprise, the other two wait_task_*() functions also abuse the task_pid_nr_ns() function, and may cause read-after-free or report nr == 0 in wait_task_continued(). wait_task_zombie() doesn't have this problem, but it is still better to cache pid_t rather than call task_pid_nr_ns() three times on the saved pid_namespace. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	do_wait: fix security checks	Oleg Nesterov	1	-21/+18
	Imho, the current usage of security_task_wait() is not logical. Suppose we have the single child p, and security_task_wait(p) return -EANY. In that case waitpid(-1) returns this error. Why? Isn't it better to return ECHLD? We don't really have reapable children. Now suppose that child was stolen by gdb. In that case we find this child on ->ptrace_children and set flag = 1, but we don't check that the child was denied. So, do_wait(..., WNOHANG) returns 0, this doesn't match the behaviour above. Without WNOHANG do_wait() blocks only to return the error later, when the child will be untraced. Inho, really strange. I think eligible_child() should return the error only if the child's pid was requested explicitly, otherwise we should silently ignore the tasks which were nacked by security_task_wait(). Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Cc: Chris Wright <[email protected]> Cc: Eric Paris <[email protected]> Cc: James Morris <[email protected]> Cc: Stephen Smalley <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	do_wait: cleanup delay_group_leader() usage	Oleg Nesterov	1	-13/+4
	eligible_child() == 2 means delay_group_leader(). With the previous patch this only matters for EXIT_ZOMBIE task, we can move that special check to the only place it is really needed. Also, with this patch we don't skip security_task_wait() for the group leaders in a non-empty thread group. I don't really understand the exact semantics of security_task_wait(), but imho this change is a bugfix. Also rearrange the code a bit to kill an ugly "check_continued" backdoor. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Eric Paris <[email protected]> Cc: James Morris <[email protected]> Cc: Roland McGrath <[email protected]> Cc: Stephen Smalley <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	wait_task_stopped(): remove unneeded delay_group_leader check	Oleg Nesterov	1	-4/+3
	wait_task_stopped() doesn't need the "delay_group_leader" parameter. If the child is not traced it must be a group leader. With or without subthreads ->group_stop_count == 0 when the whole task is stopped. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Mika Penttila <[email protected]> Acked-by: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	do_wait: factor out "retval != 0" checks	Oleg Nesterov	1	-8/+4
	Every branch if the main "if" statement does the same code at the end. Move it down. Also, fix the indentation. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	wait_task_stopped: simplify and fix races with SIGCONT/SIGKILL/untrace	Oleg Nesterov	1	-62/+26
	wait_task_stopped() has multiple races with SIGCONT/SIGKILL. tasklist_lock does not pin the child in TASK_TRACED/TASK_STOPPED stated, almost all info reported (including exit_code) may be wrong. In fact, the code under write_lock_irq(tasklist_lock) is not safe. The child may be PTRACE_DETACH'ed at this time by another subthread, in that case it is possible we are no longer its ->parent. Change wait_task_stopped() to take ->siglock before inspecting the task. This guarantees that the child can't resume and (for example) clear its ->exit_code, so we don't need to use xchg(&p->exit_code) and re-check. The only exception is ptrace_stop() which changes ->state and ->exit_code without ->siglock held during abort. But this can only happen if both the tracer and the tracee are dying (coredump is in progress), we don't care. With this patch wait_task_stopped() doesn't move the child to the end of the ->parent list on success. This optimization could be restored, but in that case we have to take write_lock(tasklist) and do some nasty checks. Also change the do_wait() since we don't return EAGAIN any longer. [[email protected]: fix up after Willy renamed everything] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	kill my_ptrace_child()	Oleg Nesterov	1	-20/+3
	Now that my_ptrace_child() is trivial we can use the "p->ptrace & PT_PTRACED" inline and simplify the corresponding logic in do_wait: we can't find the child in TASK_TRACED state without PT_PTRACED flag set, ptrace_untrace() either sets TASK_STOPPED or wakes up the tracee. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-08	kill PT_ATTACHED	Oleg Nesterov	1	-12/+1
	Since the patch "Fix ptrace_attach()/ptrace_traceme()/de_thread() race" commit f5b40e363ad6041a96e3da32281d8faa191597b9 we set PT_ATTACHED and change child->parent "atomically" wrt task_list lock. This means we can remove the checks like "PT_ATTACHED && ->parent != ptracer" which were needed to catch the "ptrace attach is in progress" case. We can also remove the flag itself since nobody else uses it. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-06	do_wait: remove one "else if" branch	Oleg Nesterov	1	-3/+1
	Minor cleanup. We can remove one "else if" branch. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-05	exec: rework the group exit and fix the race with kill	Oleg Nesterov	1	-1/+2
	As Roland pointed out, we have the very old problem with exec. de_thread() sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then clears signal->flags. All signals (even fatal ones) sent in this window (which is not too small) will be lost. With this patch exec doesn't abuse SIGNAL_GROUP_EXIT. signal_group_exit(), the new helper, should be used to detect exit_group() or exec() in progress. It can have more users, but this patch does only strictly necessary changes. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Davide Libenzi <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Robin Holt <[email protected]> Cc: Roland McGrath <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-02-03	Remove one useless extern declaration	Pierre Peiffer	1	-2/+0
	The file exit.c contains one useless extern declaration of sem_exit(). Moreover, it refers to nothing. This trivial patch removes it. Signed-off-by: Pierre Peiffer <[email protected]> Signed-off-by: Adrian Bunk <[email protected]>
2007-12-06	exit: Use task_is_*	Matthew Wilcox	1	-49/+39
	Also restructure the loop in do_wait() Signed-off-by: Matthew Wilcox <[email protected]>
2007-11-29	wait_task_stopped(): pass correct exit_code to wait_noreap_copyout()	Scott James Remnant	1	-1/+1
	In wait_task_stopped() exit_code already contains the right value for the si_status member of siginfo, and this is simply set in the non WNOWAIT case. If you call waitid() with a stopped or traced process, you'll get the signal in siginfo.si_status as expected -- however if you call waitid(WNOWAIT) at the same time, you'll get the signal << 8 \| 0x7f Pass it unchanged to wait_noreap_copyout(); we would only need to shift it and add 0x7f if we were returning it in the user status field and that isn't used for any function that permits WNOWAIT. Signed-off-by: Scott James Remnant <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Cc: Roland McGrath <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-11-29	wait_task_stopped(): don't use task_pid_nr_ns() lockless	Oleg Nesterov	1	-5/+4
	wait_task_stopped(WNOWAIT) does task_pid_nr_ns() without tasklist/rcu lock, we can read an already freed memory. Use the cached pid_t value. Signed-off-by: Oleg Nesterov <[email protected]> Looks-good-to: Roland McGrath <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-11-15	wait_task_stopped: Check p->exit_state instead of TASK_TRACED	Roland McGrath	1	-2/+1
	The original meaning of the old test (p->state > TASK_STOPPED) was "not dead", since it was before TASK_TRACED existed and before the state/exit_state split. It was a wrong correction in commit 14bf01bb0599c89fc7f426d20353b76e12555308 to make this test for TASK_TRACED instead. It should have been changed when TASK_TRACED was introducted and again when exit_state was introduced. Signed-off-by: Roland McGrath <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Kees Cook <[email protected]> Acked-by: Scott James Remnant <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19	Uninline fork.c/exit.c	Alexey Dobriyan	1	-3/+3
	Save ~650 bytes here. add/remove: 4/0 grow/shrink: 0/7 up/down: 430/-1088 (-658) function old new delta __copy_fs_struct - 202 +202 __put_fs_struct - 112 +112 __exit_fs - 58 +58 __exit_files - 58 +58 exit_files 58 2 -56 put_fs_struct 112 5 -107 exit_fs 161 2 -159 sys_unshare 774 590 -184 copy_process 4031 3840 -191 do_exit 1791 1597 -194 copy_fs_struct 202 5 -197 No difference in lmbench lat_proc tests on 2-way Opteron 246. Smaaaal degradation on UP P4 (within errors). [[email protected]: coding-style fixes] Signed-off-by: Alexey Dobriyan <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19	Use helpers to obtain task pid in printks	Pavel Emelyanov	1	-1/+1
	The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [[email protected]: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <[email protected]> Cc: Dave Airlie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19	Isolate the explicit usage of signal->pgrp	Pavel Emelyanov	1	-1/+1
	The pgrp field is not used widely around the kernel so it is now marked as deprecated with appropriate comment. The initialization of INIT_SIGNALS is trimmed because a) they are set to 0 automatically; b) gcc cannot properly initialize two anonymous (the second one is the one with the session) unions. In this particular case to make it compile we'd have to add some field initialized right before the .pgrp. This is the same patch as the 1ec320afdc9552c92191d5f89fcd1ebe588334ca one (from Cedric), but for the pgrp field. Some progress report: We have to deprecate the pid, tgid, session and pgrp fields on struct task_struct and struct signal_struct. The session and pgrp are already deprecated. The tgid value is close to being such - the worst known usage in in fs/locks.c and audit code. The pid field deprecation is mainly blocked by numerous printk-s around the kernel that print the tsk->pid to log. Signed-off-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Cedric Le Goater <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Herbert Poetzl <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19	pid namespaces: changes to show virtual ids to user	Pavel Emelyanov	1	-11/+20
	This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [[email protected]: build fix] [[email protected]: nuther build fix] [[email protected]: yet nuther build fix] [[email protected]: remove unneeded casts] Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: Alexey Dobriyan <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul Menage <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19	pid namespaces: destroy pid namespace on init's death	Sukadev Bhattiprolu	1	-1/+26
	Terminate all processes in a namespace when the reaper of the namespace is exiting. We do this by walking the pidmap of the namespace and sending SIGKILL to all processes. Signed-off-by: Sukadev Bhattiprolu <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Paul Menage <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19	pid namespaces: prepare proc_flust_task() to flush entries from multiple ↵	Pavel Emelyanov	1	-1/+1
	proc trees The first part is trivial - we just make the proc_flush_task() to operate on arbitrary vfsmount with arbitrary ids and pass the pid and global proc_mnt to it. The other change is more tricky: I moved the proc_flush_task() call in release_task() higher to address the following problem. When flushing task from many proc trees we need to know the set of ids (not just one pid) to find the dentries' names to flush. Thus we need to pass the task's pid to proc_flush_task() as struct pid is the only object that can provide all the pid numbers. But after __exit_signal() task has detached all his pids and this information is lost. This creates a tiny gap for proc_pid_lookup() to bring some dentries back to tree and keep them in hash (since pids are still alive before __exit_signal()) till the next shrink, but since proc_flush_task() does not provide a 100% guarantee that the dentries will be flushed, this is OK to do so. Signed-off-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Paul Menage <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19	pid namespaces: move exit_task_namespaces()	Pavel Emelyanov	1	-1/+1
	Make task release its namespaces after it has reparented all his children to child_reaper, but before it notifies its parent about its death. The reason to release namespaces after reparenting is that when task exits it may send a signal to its parent (SIGCHLD), but if the parent has already exited its namespaces there will be no way to decide what pid to dever to him - parent can be from different namespace. The reason to release namespace before notifying the parent it that when task sends a SIGCHLD to parent it can call wait() on this taks and release it. But releasing the mnt namespace implies dropping of all the mounts in the mnt namespace and NFS expects the task to have valid sighand pointer. Thanks to Oleg for pointing out some races that can apear and helping with patches and fixes. Signed-off-by: Pavel Emelyanov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Paul Menage <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-19	pid namespaces: rework forget_original_parent()	Oleg Nesterov	1	-18/+21
	A pid namespace is a "view" of a particular set of tasks on the system. They work in a similar way to filesystem namespaces. A file (or a process) can be accessed in multiple namespaces, but it may have a different name in each. In a filesystem, this name might be /etc/passwd in one namespace, but /chroot/etc/passwd in another. For processes, a process may have pid 1234 in one namespace, but be pid 1 in another. This allows new pid namespaces to have basically arbitrary pids, and not have to worry about what pids exist in other namespaces. This is essential for checkpoint/restart where a restarted process's pid might collide with an existing process on the system's pid. In this particular implementation, pid namespaces have a parent-child relationship, just like processes. A process in a pid namespace may see all of the processes in the same namespace, as well as all of the processes in all of the namespaces which are children of its namespace. Processes may not, however, see others which are in their parent's namespace, but not in their own. The same goes for sibling namespaces. The know issue to be solved in the nearest future is signal handling in the namespace boundary. That is, currently the namespace's init is treated like an ordinary task that can be killed from within an namespace. Ideally, the signal handling by the namespace's init should have two sides: when signaling the init from its namespace, the init should look like a real init task, i.e. receive only those signals, that is explicitly wants to; when signaling the init from one of the parent namespaces, init should look like an ordinary task, i.e. receive any signal, only taking the general permissions into account. The pid namespace was developed by Pavel Emlyanov and Sukadev Bhattiprolu and we eventually came to almost the same implementation, which differed in some details. This set is based on Pavel's patches, but it includes comments and patches that from Sukadev. Many thanks to Oleg, who reviewed the patches, pointed out many BUGs and made valuable advises on how to make this set cleaner. This patch: We have to call exit_task_namespaces() only after the exiting task has reparented all his children and is sure that no other threads will reparent theirs for it. Why this is needed is explained in appropriate patch. This one only reworks the forget_original_parent() so that after calling this a task cannot be/become parent of any other task. We check PF_EXITING instead of ->exit_state while choosing the new parent. Note that tasklits_lock acts as a barrier, everyone who takes tasklist after us (when forget_original_parent() drops it) must see PF_EXITING. The other changes are just cleanups. They just move some code from exit_notify to forget_original_parent(). It is a bit silly to declare ptrace_dead in exit_notify(), take tasklist, pass ptrace_dead to forget_original_parent(), unlock-lock-unlock tasklist, and then use ptrace_dead. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Pavel Emelyanov <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Paul Menage <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>