|
This is a slight change in the namespace cgroup subsystem api.
The change is that previously when cgroup_clone() was called (currently
only from the unshare path in ns_proxy cgroup), you'd get a new group named
"node_$pid", whereas now you'll get a group named after just your pid.
The only users who would notice it are those who are using the ns_proxy
cgroup subsystem to auto-create cgroups when namespaces are unshared -
something of an experimental feature, which I think really needs more
complete container/namespace support in order to be useful. I suspect the
only users are Cedric and Serge, or maybe a few others on
[email protected]. And in fact it would only be
noticed by the users who make the assumption about how the name is
generated, rather than getting it from the /proc/<pid>/cgroup file for
the process in question.
Whether the change is actually needed or not I'm fairly agnostic on, but I
guess it is more elegant to just use the pid as the new group name rather
than adding a fairly arbitrary "node_" prefix on the front.
[[email protected]: provided changelog]
Signed-off-by: Cedric Le Goater <[email protected]>
Cc: "Paul Menage" <[email protected]>
Cc: "Serge E. Hallyn" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a new BDI capability flag: BDI_CAP_NO_ACCT_WB. If this flag is
set, then don't update the per-bdi writeback stats from
test_set_page_writeback() and test_clear_page_writeback().
Misc cleanups:
- convert bdi_cap_writeback_dirty() and friends to static inline functions
- create a flag that includes all three dirty/writeback related flags,
since almost all users will want to have them together
Signed-off-by: Miklos Szeredi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove the mem_cgroup member from mm_struct and instead add an owner.
This approach was suggested by Paul Menage. The advantage of this approach
is that, once the mm->owner is known, using the subsystem id, the cgroup
can be determined. It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.
A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.
This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes. The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.
I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.
This patch was tested on a powerpc box; it was compiled with the
MM_OWNER config option both turned on and off.
After the thread group leader exits, it's moved to init_css_set by
cgroup_exit(), thus all future charges from running threads would be
redirected to the init_css_set's subsystem.
Signed-off-by: Balbir Singh <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Sudhir Kumar <[email protected]>
Cc: YAMAMOTO Takashi <[email protected]>
Cc: Hirokazu Takahashi <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Balbir Singh <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Reviewed-by: Paul Menage <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Introduce a read_seq() helper in cftype, which uses seq_file to print out
lists. Use it in the devices cgroup. Also split devices.allow into two
files, so now devices.deny and devices.allow are the ones to use to manipulate
the whitelist, while devices.list outputs the cgroup's current whitelist.
Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: Paul Menage <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now we can run through the hash table instead of running through the
linked-list.
Signed-off-by: Li Zefan <[email protected]>
Reviewed-by: Paul Menage <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We are at system boot and there is only 1 cgroup group (i.e., init_css_set), so
we don't need to run through the css_set linked list. Neither do we need to
run through the task list, since no processes have been created yet.
Also referring to a comment in cgroup.h:
struct css_set
{
        ...
        /*
         * Set of subsystem states, one for each subsystem. This array
         * is immutable after creation apart from the init_css_set
         * during subsystem registration (at boot time).
         */
        struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
}
Signed-off-by: Li Zefan <[email protected]>
Reviewed-by: Paul Menage <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When we attach a process to a different cgroup, the css_set linked-list will
be run through to find a suitable existing css_set to use. This patch
implements a hash table for better performance.
The following benchmarks have been tested:
For N in 1, 5, 10, 50, 100, 500, 1000, create N cgroups with one sleeping
task in each, and then move an additional task through each cgroup in
turn.
Here is a test result:
   N    Loop    orig - Time(s)    hash - Time(s)
------------------------------------------------
   1   10000     1.201231728       1.196311177
   5    2000     1.065743872       1.040566424
  10    1000     0.991054735       0.986876440
  50     200     0.976554203       0.969608733
 100     100     0.998504680       0.969218270
 500      20     1.157347764       0.962602963
1000      10     1.619521852       1.085140172
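The lookup idea can be sketched in user-space C roughly as follows. This is only an illustration, not the code from the patch: toy_css_set, SUBSYS_COUNT, HASH_BITS, hash_subsys(), css_set_insert() and css_set_find() are hypothetical names, and the hash function is a simple stand-in. Sets are keyed by their array of subsystem-state pointers, so a lookup only walks one bucket instead of the whole list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUBSYS_COUNT 4                 /* hypothetical number of subsystems */
#define HASH_BITS    7
#define HASH_SIZE    (1u << HASH_BITS)

struct toy_css_set {
    const void *subsys[SUBSYS_COUNT];  /* one state pointer per subsystem */
    struct toy_css_set *next;          /* chain within one hash bucket */
};

static struct toy_css_set *table[HASH_SIZE];

/* Fold the subsystem-state pointers into a bucket index. */
static unsigned int hash_subsys(const void *const *subsys)
{
    uintptr_t sum = 0;

    for (size_t i = 0; i < SUBSYS_COUNT; i++)
        sum += (uintptr_t)subsys[i];
    sum ^= sum >> 16;
    return (unsigned int)(sum & (HASH_SIZE - 1));
}

/* Link a set at the head of its bucket. */
static void css_set_insert(struct toy_css_set *cs)
{
    unsigned int h = hash_subsys(cs->subsys);

    cs->next = table[h];
    table[h] = cs;
}

/* Return the existing set with exactly these pointers, or NULL. */
static struct toy_css_set *css_set_find(const void *const *subsys)
{
    struct toy_css_set *cs;

    for (cs = table[hash_subsys(subsys)]; cs != NULL; cs = cs->next)
        if (memcmp(cs->subsys, subsys, sizeof(cs->subsys)) == 0)
            return cs;
    return NULL;
}
```

With N existing sets spread over the buckets, css_set_find() compares against only the sets that share a bucket, which is consistent with the gap between the two columns growing as N gets large.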
Signed-off-by: Li Zefan <[email protected]>
Reviewed-by: Paul Menage <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Trigger callback can be used to receive a kick-up from the user space. The
string written is ignored.
The cftype->private is used for multiplexing events.
Signed-off-by: Pavel Emelyanov <[email protected]>
Acked-by: Paul Menage <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Balbir Singh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is a race between create_proc_entry() and the assignment of file ops.
proc_create() is invented to fix it.
Signed-off-by: Li Zefan <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is called by cgroup_init() and cgroup_init_early() only, which are
annotated with __init.
Signed-off-by: Li Zefan <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
These patches add cgroups read_s64 and write_s64 control file methods (the
signed equivalent of read_u64/write_u64) and use them to implement the
cpu.rt_runtime_us control file in the CFS cgroup subsystem.
This patch:
These are the signed equivalents of the read_u64/write_u64 methods
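As a rough user-space illustration of what a signed control file has to do (this is not the kernel implementation; parse_s64() and format_s64() are hypothetical helpers built on the C library), the write side must accept a leading minus sign and the read side must print one:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a signed 64-bit value written by the user.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 */
static int parse_s64(const char *buf, int64_t *out)
{
    char *end;
    long long val;

    errno = 0;
    val = strtoll(buf, &end, 0);
    if (end == buf || *end != '\0')
        return -EINVAL;          /* no digits, or trailing junk */
    if (errno == ERANGE)
        return -ERANGE;
    *out = (int64_t)val;
    return 0;
}

/* Format a signed 64-bit value for the read side, one value per line. */
static int format_s64(char *buf, size_t len, int64_t val)
{
    return snprintf(buf, len, "%lld\n", (long long)val);
}
```

An unsigned handler would have to reject or misinterpret a negative value, which is why a separate pair of methods is useful for files that need them.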
Signed-off-by: Paul Menage <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The "releasable" control file provided by the cgroup framework exports the
state of a per-cgroup flag that's related to the notify-on-release feature.
This isn't really generally useful, unless you're trying to debug this
particular feature of cgroups.
This patch moves the "releasable" file to the cgroup_debug subsystem.
Signed-off-by: Paul Menage <[email protected]>
Cc: "Li Zefan" <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: "YAMAMOTO Takashi" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Adds a new type of supported control file representation, a map from strings
to u64 values.
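A minimal user-space sketch of such a map representation, with one "key value" pair per output line, could look like this. The names map_entry and format_map() are hypothetical and the kernel side uses its own callback interface; this only shows the output layout.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct map_entry {
    const char *key;
    uint64_t value;
};

/*
 * Write every entry as "key value\n" into buf.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if the buffer is too small.
 */
static int format_map(char *buf, size_t len,
                      const struct map_entry *entries, size_t n)
{
    size_t used = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, len - used, "%s %" PRIu64 "\n",
                         entries[i].key, entries[i].value);

        if (w < 0 || (size_t)w >= len - used)
            return -1;           /* output would have been truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```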
Each map entry is printed as a line in a similar format to /proc/vmstat, i.e.
"$key $value\n"
Signed-off-by: Paul Menage <[email protected]>
Cc: "Li Zefan" <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: "YAMAMOTO Takashi" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This removes the need for people to remember to pass the -n flag to echo when
writing values to cgroup control files.
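One plausible way to get this behaviour, sketched in user-space C, is to strip trailing whitespace before parsing. The helpers strip_trailing_space() and parse_u64() are hypothetical names and this is not the code from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Remove trailing whitespace, including the newline that plain echo appends. */
static void strip_trailing_space(char *buf)
{
    size_t len = strlen(buf);

    while (len > 0 && isspace((unsigned char)buf[len - 1]))
        buf[--len] = '\0';
}

/* Parse an unsigned 64-bit value; returns 0 on success, -EINVAL otherwise. */
static int parse_u64(char *buf, uint64_t *out)
{
    char *end;
    unsigned long long val;

    strip_trailing_space(buf);
    if (!isdigit((unsigned char)buf[0]))
        return -EINVAL;          /* rejects empty input and signs */
    errno = 0;
    val = strtoull(buf, &end, 0);
    if (errno == ERANGE || *end != '\0')
        return -EINVAL;
    *out = (uint64_t)val;
    return 0;
}
```

With this in place, "echo 42 > file" and "echo -n 42 > file" would be treated the same way.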
Signed-off-by: Paul Menage <[email protected]>
Cc: "Li Zefan" <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: "YAMAMOTO Takashi" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Several people have justifiably complained that the "_uint" suffix is
inappropriate for functions that handle u64 values, so this patch just renames
all these functions and their users to have the suffix _u64.
[[email protected]: build fix]
Signed-off-by: Paul Menage <[email protected]>
Cc: "Li Zefan" <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: "YAMAMOTO Takashi" <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix a code warning: symbol 'p' shadows an earlier one
This is a reincarnation of Harvey Harrison's patch:
cpuset: sparse warnings in cpuset.c
Independently, Cliff Wickman moved the affected code,
from kernel/cpuset.c to kernel/cgroup.c, in his patch:
cpusets: update_cpumask revision
Signed-off-by: Paul Jackson <[email protected]>
Cc: Harvey Harrison <[email protected]>
Cc: Cliff Wickman <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make the needlessly global cgroup_enable_task_cg_lists() static.
Signed-off-by: Adrian Bunk <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When I ran a test program to fork mass processes and at the same time
'cat /cgroup/tasks', I got the following oops:
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:72!
invalid opcode: 0000 [#1] SMP
Pid: 4178, comm: a.out Not tainted (2.6.25-rc9 #72)
...
Call Trace:
[<c044a5f9>] ? cgroup_exit+0x55/0x94
[<c0427acf>] ? do_exit+0x217/0x5ba
[<c0427ed7>] ? do_group_exit+0x65/0x7c
[<c0427efd>] ? sys_exit_group+0xf/0x11
[<c0404842>] ? syscall_call+0x7/0xb
[<c05e0000>] ? init_cyrix+0x2fa/0x479
...
EIP: [<c04df671>] list_del+0x35/0x53 SS:ESP 0068:ebc7df4
---[ end trace caffb7332252612b ]---
Fixing recursive fault but reboot is needed!
After digging into the code and debugging, I finally found a race
condition:
do_exit()
  ->cgroup_exit()
    ->if (!list_empty(&tsk->cg_list))
        list_del(&tsk->cg_list);
                                cgroup_iter_start()
                                  ->cgroup_enable_task_cg_list()
                                    ->list_add(&tsk->cg_list, ..);
In this case the task's list entry won't be deleted even though the process
has exited.
We got two bug reports in the past, which seem to be the same bug as
this one:
http://lkml.org/lkml/2008/3/5/332
http://lkml.org/lkml/2007/10/17/224
Sometimes I got an oops on list_del, sometimes on list_add.
I can also change my test program a bit to trigger other oopses.
The patch has been tested both on x86_32 and x86_64.
Signed-off-by: Li Zefan <[email protected]>
Acked-by: Paul Menage <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Extend the /proc/<pid>/cgroup file to include the appropriate hierarchy ID on
each line.
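For user-space consumers, a line of this file can be split into its fields roughly as below. This sketch assumes the three-field, colon-separated layout "hierarchy-ID:subsystem-list:path" described in cgroups(7); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cgroup_line {
    unsigned long hierarchy_id;
    char subsystems[64];   /* comma-separated subsystem names */
    char path[256];        /* cgroup path within the hierarchy */
};

/* Split "ID:subsystems:path\n" into its fields; returns 0 on success, -1 on error. */
static int parse_cgroup_line(const char *line, struct cgroup_line *out)
{
    char *end;
    const char *first, *second;
    size_t len;

    out->hierarchy_id = strtoul(line, &end, 10);
    if (end == line || *end != ':')
        return -1;               /* no numeric ID in front */

    first = end + 1;
    second = strchr(first, ':');
    if (second == NULL)
        return -1;               /* path field is missing */
    len = (size_t)(second - first);
    if (len >= sizeof(out->subsystems))
        return -1;
    memcpy(out->subsystems, first, len);
    out->subsystems[len] = '\0';

    len = strcspn(second + 1, "\n");
    if (len >= sizeof(out->path))
        return -1;
    memcpy(out->path, second + 1, len);
    out->path[len] = '\0';
    return 0;
}
```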
Currently this ID isn't really needed since a hierarchy can be completely
identified by the set of subsystems bound to it, but this is likely to change
in the near future in order to support stateless subsystems and
merging/rebinding of subsystems. Getting this change into 2.6.25 reduces the
need for an API change later.
Signed-off-by: Paul Menage <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The effects of cgroup_disable=foo are:
- foo isn't auto-mounted if you mount all cgroups in a single hierarchy
- foo isn't visible as an individually mountable subsystem
As a result there will only ever be one call to foo->create(), at init time;
all processes will stay in this group, and the group will never be mounted on
a visible hierarchy. Any additional effects (e.g. not allocating metadata)
are up to the foo subsystem.
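The matching step can be sketched in user-space C as follows. This is an illustration only: toy_subsys and disable_subsystems() are hypothetical, and accepting a comma-separated list of names is an assumption of the sketch rather than something stated in this changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_subsys {
    const char *name;
    int disabled;
};

/*
 * Mark every subsystem named in arg as disabled.
 * Returns the number of subsystems newly disabled; unknown names are ignored.
 */
static int disable_subsystems(const char *arg, struct toy_subsys *ss, size_t n)
{
    int matched = 0;

    while (*arg != '\0') {
        size_t len = strcspn(arg, ",");   /* length of the current token */

        for (size_t i = 0; i < n; i++) {
            if (len > 0 && strlen(ss[i].name) == len &&
                strncmp(ss[i].name, arg, len) == 0 && !ss[i].disabled) {
                ss[i].disabled = 1;
                matched++;
            }
        }
        arg += len;
        if (*arg == ',')
            arg++;                        /* skip the separator */
    }
    return matched;
}
```

Comparing the full token length keeps "cpu" from also disabling "cpuset".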
This doesn't handle early_init subsystems (their "disabled" bit isn't set),
but it could easily be extended to do so if any of the early_init systems
wanted it - I think it would just involve some nastier parameter processing
since it would occur before the command-line argument parser had been run.
Hugh said:
Ballpark figures, I'm trying to get this question out rather than
processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
to the affected paths, booting with cgroup_disable=memory cuts that back to
1% overhead (due to slightly bigger struct page).
I'm no expert on distros, they may have no interest whatever in
CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
without it, or apply the cgroup_disable=memory patches.
Unix bench's execl test result on x86_64 was
== just after boot without mounting any cgroup fs.==
mem_cgroup=off : Execl Throughput 43.0 3150.1 732.6
mem_cgroup=on : Execl Throughput 43.0 2932.6 682.0
==
[[email protected]: fix boot option parsing]
Signed-off-by: Balbir Singh <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Sudhir Kumar <[email protected]>
Cc: YAMAMOTO Takashi <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The documentation says the default value of notify_on_release of a child
cgroup is inherited from its parent, which is reasonable, but the
implementation just sets the flag disabled.
Signed-off-by: Li Zefan <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Li Zefan <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The list head res->tasks gets initialized twice in find_css_set().
Signed-off-by: Li Zefan <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Cgroup uses unsigned long for subsys bitops, not unsigned long long.
Signed-off-by: Li Zefan <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
opts.release_agent is not kfree()ed in all necessary places.
Signed-off-by: Li Zefan <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
fix:
- comments about need_forkexit_callback
- comments about release agent
- typo and comment style, etc.
Signed-off-by: Li Zefan <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There's one place that works with task pids - it's the "tasks" file in cgroups.
The read/write handlers assume that the pid values go to/come from the user
space and thus it is a virtual pid, i.e. the pid as it is seen from inside a
namespace.
Tune the code accordingly.
Signed-off-by: Pavel Emelyanov <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch corrects a situation that occurs when one disables all the cpus in
a cpuset.
Currently, the disabled (cpu-less) cpuset inherits the cpus of its parent,
which is incorrect because it may then overlap its cpu-exclusive sibling.
Tasks of an empty cpuset should be moved to the cpuset which is the parent of
their current cpuset. Or if the parent cpuset has no cpus, to its parent,
etc.
And the empty cpuset should be released (if it is flagged notify_on_release).
Depends on the cgroup_scan_tasks() function (proposed by David Rientjes) to
iterate through all tasks in the cpu-less cpuset. We are deliberately
avoiding a walk of the tasklist.
[[email protected]: coding-style fixes]
Signed-off-by: Cliff Wickman <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Provide cgroup_scan_tasks(), which iterates through every task in a cgroup,
calling a test function and a process function for each. And call the process
function without holding the css_set_lock lock.
The idea is David Rientjes', predicting that such a function will make it much
easier in the future to extend things that require access to each task in a
cgroup without holding the lock.
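The two-phase pattern can be sketched in user-space C as below. This is a simplified illustration, not the kernel implementation: all names are hypothetical, the "locked" flag merely stands in for css_set_lock, and tasks are picked up in fixed-size batches. Tasks are selected with the test function while the lock is held, then the process function runs on the selected tasks after the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>

#define SCAN_BATCH 16

struct toy_task {
    int pid;
    int visited;
};

struct toy_cgroup {
    struct toy_task *tasks;
    size_t ntasks;
    int locked;            /* stand-in for css_set_lock */
};

typedef int  (*test_fn)(const struct toy_task *t);
typedef void (*process_fn)(struct toy_cgroup *cg, struct toy_task *t);

/* Returns the number of tasks that were handed to process(). */
static size_t scan_tasks(struct toy_cgroup *cg, test_fn test, process_fn process)
{
    struct toy_task *batch[SCAN_BATCH];
    size_t start = 0, total = 0;

    while (start < cg->ntasks) {
        size_t n = 0;

        cg->locked = 1;                   /* phase 1: select under the lock */
        while (start < cg->ntasks && n < SCAN_BATCH) {
            struct toy_task *t = &cg->tasks[start++];

            if (test == NULL || test(t))
                batch[n++] = t;
        }
        cg->locked = 0;

        for (size_t i = 0; i < n; i++)    /* phase 2: process without the lock */
            process(cg, batch[i]);
        total += n;
    }
    return total;
}

/* Example callbacks: select even pids, mark each selected task. */
static int is_even_pid(const struct toy_task *t)
{
    return t->pid % 2 == 0;
}

static void mark_visited(struct toy_cgroup *cg, struct toy_task *t)
{
    assert(!cg->locked);   /* the process callback runs outside the lock */
    t->visited = 1;
}
```

Because the process function runs unlocked, it is free to take other locks or sleep, which is the property the changelog is after.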
[[email protected]: cleanup]
[[email protected]: coding-style fixes]
Signed-off-by: Cliff Wickman <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Paul Jackson <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a handler "pre_destroy" to cgroup_subsys. It is called before
cgroup_rmdir() checks all subsys's refcnt.
I think this is useful for subsys which have some extra refs even if there
are no tasks in cgroup. By adding pre_destroy(), the kernel keeps the rule
"destroy() against subsystem is called only when refcnt=0." and allows css
ref to be used by other objects than tasks.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
cgroup_is_releasable() and notify_on_release() should be static,
not global inline.
Signed-off-by: Adrian Bunk <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Move the calls to the cgroup subsystem destroy() methods from
cgroup_rmdir() to cgroup_diput(). This allows control file reads and
writes to access their subsystem state without having to be concerned with
locking against cgroup destruction - the control file dentry will keep the
cgroup and its subsystem state objects alive until the file is closed.
The documentation is updated to reflect the changed semantics of destroy();
additionally the locking comments for destroy() and some other methods were
clarified and decrustified.
Signed-off-by: Paul Menage <[email protected]>
Cc: Paul Jackson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Simplify the space stripping code in cgroup file write.
[[email protected]: s/BUG_ON/BUILD_BUG_ON/]
Signed-off-by: Paul Jackson <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Coding style fix - one line conditionals don't get braces.
Signed-off-by: Paul Jackson <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch removes dead code spotted by the Coverity checker
(look at the "(nbytes >= PATH_MAX)" check).
Signed-off-by: Adrian Bunk <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When I boot with the 'quiet' parameter, I see on the screen:
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 39.036026] Initializing cgroup subsys cpuacct
[ 39.036080] Initializing cgroup subsys debug
[ 39.036118] Initializing cgroup subsys ns
This patch lowers the priority of those messages, adds a "cgroup: " prefix
to another couple of printks and kills the useless reference to the source
file.
Signed-off-by: Diego Calleja <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Jeff Garzik <[email protected]>
|
|
Replace "cont" with "cgrp" and other misc renaming
This patch finishes some of the names that got missed in the great
"task containers" -> "control groups" rename. Primarily it renames
the local variable "cont" to "cgrp" in a number of places, and renames
the CONT_* enum members to CGRP_*.
This patch is not intended to have any effect on the generated code;
the output of "objdump -d kernel/cgroup.o" is unchanged.
Signed-off-by: Paul Menage <[email protected]>
Acked-by: Paul Jackson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are two places that do so - the cgroups subsystem and the autofs
code.
Signed-off-by: Pavel Emelyanov <[email protected]>
Cc: Ian Kent <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263. The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)
This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface. A new set of cgroup operations are
registered with commands and attributes. It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.
The current model for cgroupstats is a pull model; a push model (to post
statistics on interesting events) should be very easy to add. Currently
user space requests statistics by passing the cgroup file
descriptor. Statistics about the state of all the tasks in the cgroup
are returned to user space.
TODO's/NOTE:
This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.
Sample output
# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
If the approach looks good, I'll enhance and post the user space utility for
the same
Feedback, comments, test results are always welcome!
[[email protected]: build fix]
Signed-off-by: Balbir Singh <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Jay Lan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add the following files to the cgroup filesystem:
notify_on_release - configures/reports whether the cgroup subsystem should
attempt to run a release script when this cgroup becomes unused
release_agent - configures/reports the release agent to be used for this
hierarchy (top level in each hierarchy only)
releasable - reports whether this cgroup would have been auto-released if
notify_on_release was true and a release agent was configured (mainly useful
for debugging)
To avoid locking issues, invoking the userspace release agent is done via a
workqueue task; cgroups that need to have their release agents invoked by
the workqueue task are linked on to a list.
[[email protected]: Need to include kmod.h]
Signed-off-by: Paul Menage <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Cedric Le Goater <[email protected]>
Signed-off-by: Paul Jackson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Replace the struct css_set embedded in task_struct with a pointer; all tasks
that have the same set of memberships across all hierarchies will share a
css_set object, and will be linked via their css_sets field to the "tasks"
list_head in the css_set.
Assuming that many tasks share the same cgroup assignments, this reduces
overall space usage and keeps the size of the task_struct down (three pointers
added to task_struct compared to a non-cgroups kernel, no matter how many
subsystems are registered).
[[email protected]: fix a printk]
[[email protected]: build fix]
Signed-off-by: Paul Menage <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Cedric Le Goater <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add:
/proc/cgroups - general system info
/proc/*/cgroup - per-task cgroup membership info
[[email protected]: cgroups: bdi init hooks]
Signed-off-by: Paul Menage <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Cedric Le Goater <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add support for cgroup_clone(), a way to create new cgroups intended to
be used for systems such as namespace unsharing. A new subsystem callback,
post_clone(), is added to allow subsystems to automatically configure cloned
cgroups.
Signed-off-by: Paul Menage <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Cedric Le Goater <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This adds the necessary hooks to the fork() and exit() paths to ensure
that new children inherit their parent's cgroup assignments, and that
exiting processes release reference counts on their cgroups.
Signed-off-by: Paul Menage <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Cedric Le Goater <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add write_uint() helper method for cgroup subsystems
This helper is analogous to the read_uint() helper method for
reporting u64 values to userspace. It's designed to reduce the amount
of boilerplate required for creating new cgroup subsystems.
Signed-off-by: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add the per-directory "tasks" file for cgroupfs mounts; this allows the
user to determine which tasks are members of a cgroup by reading a
cgroup's "tasks", and to move a task into a cgroup by writing its pid to
its "tasks".
Signed-off-by: Paul Menage <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Cedric Le Goater <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Generic Process Control Groups
------------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others. These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
management systems is substantially reduced, since it doesn't need
to provide process grouping/containment, hence improving their
chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
Signed-off-by: Paul Menage <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Paul Jackson <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Cedric Le Goater <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|