| Age | Commit message (Collapse) | Author | Files | Lines |
|
pid_ns_for_children set by a task is known only to the task itself, and
it's impossible to identify it from outside.
It's a big problem for checkpoint/restore software like CRIU, because it
can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of
their work.
This patch solves the problem, and it exposes pid_ns_for_children to ns
directory in standard way with the name "pid_for_children":
~# ls /proc/5531/ns -l | grep pid
lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]
Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul Moore <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Serge Hallyn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Expose task pid_ns_for_children to userspace".
pid_ns_for_children set by a task is known only to the task itself, and
it's impossible to identify it from outside.
It's a big problem for checkpoint/restore software like CRIU, because it
can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of
their work. If they have a custom pid_ns_for_children before dump, they
must have the same ns after restore. Otherwise, restored task bumped
into enviroment it does not expect.
This patchset solves the problem. It exposes pid_ns_for_children to ns
directory in standard way with the name "pid_for_children":
~# ls /proc/5531/ns -l | grep pid
lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]
This patch (of 2):
Make possible to have link content prefix yyy different from the link
name xxx:
$ readlink /proc/[pid]/ns/xxx
yyy:[4026531838]
This will be used in next patch.
Link: http://lkml.kernel.org/r/149201120318.6007.7362655181033883000.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Acked-by: Andrei Vagin <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul Moore <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Serge Hallyn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships.
In a future we will use this interface to dump and restore nested
namespaces.
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
Return -EPERM if an owning user namespace is outside of a process
current user namespace.
v2: In a first version ns_get_owner returned ENOENT for init_user_ns.
This special cases was removed from this version. There is nothing
outside of init_user_ns, so we can return EPERM.
v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
Introduce the ability to create new cgroup namespace. The newly created
cgroup namespace remembers the cgroup of the process at the point
of creation of the cgroup namespace (referred as cgroupns-root).
The main purpose of cgroup namespace is to virtualize the contents
of /proc/self/cgroup file. Processes inside a cgroup namespace
are only able to see paths relative to their namespace root
(unless they are moved outside of their cgroupns-root, at which point
they will see a relative path from their cgroupns-root).
For a correctly setup container this enables container-tools
(like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
containers without leaking system level cgroup hierarchy to the task.
This patch only implements the 'unshare' part of the cgroupns.
Signed-off-by: Aditya Kali <[email protected]>
Signed-off-by: Serge Hallyn <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().
This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot. The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).
Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.
As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
from ns_get_path().
Signed-off-by: Al Viro <[email protected]>
|
|
a) make get_proc_ns() return a pointer to struct ns_common
b) mirror ns_ops in dentry->d_fsdata of ns dentries, so that
is_mnt_ns_file() could get away with fewer dereferences.
That way struct proc_ns becomes invisible outside of fs/proc/*.c
Signed-off-by: Al Viro <[email protected]>
|
|
take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum()
Signed-off-by: Al Viro <[email protected]>
|
|
We can do that now. And kill ->inum(), while we are at it - all instances
are identical.
Signed-off-by: Al Viro <[email protected]>
|
|
Split the proc namespace stuff out into linux/proc_ns.h.
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
cc: Serge E. Hallyn <[email protected]>
cc: Eric W. Biederman <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|