| Age | Commit message (Collapse) | Author | Files | Lines |
|
Remove the mem_cgroup member from mm_struct and instead adds an owner.
This approach was suggested by Paul Menage. The advantage of this approach
is that, once the mm->owner is known, using the subsystem id, the cgroup
can be determined. It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.
A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.
This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes. The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.
I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.
This patch was tested on a powerpc box, it was compiled with both the
MM_OWNER config turned on and off.
After the thread group leader exits, it's moved to init_css_state by
cgroup_exit(), thus all future charges from runnings threads would be
redirected to the init_css_set's subsystem.
Signed-off-by: Balbir Singh <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Sudhir Kumar <[email protected]>
Cc: YAMAMOTO Takashi <[email protected]>
Cc: Hirokazu Takahashi <[email protected]>
Cc: David Rientjes <[email protected]>,
Cc: Balbir Singh <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Reviewed-by: Paul Menage <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Nothing uses mem_cgroup_uncharge apart from mem_cgroup_uncharge_page, (a
trivial wrapper around it) and mem_cgroup_end_migration (which does the same
as mem_cgroup_uncharge_page). And it often ends up having to lock just to let
its caller unlock. Remove it (but leave the silly locking until a later
patch).
Moved mem_cgroup_cache_charge next to mem_cgroup_charge in memcontrol.h.
Signed-off-by: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hirokazu Takahashi <[email protected]>
Cc: YAMAMOTO Takashi <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Replace free_hot_cold_page's VM_BUG_ON(page_get_page_cgroup(page)) by a "Bad
page state" and clear: most users don't have CONFIG_DEBUG_VM on, and if it
were set here, it'd likely cause corruption when the page is reused.
Don't use page_assign_page_cgroup to clear it: that should be private to
memcontrol.c, and always called with the lock taken; and memmap_init_zone
doesn't need it either - like page->mapping and other pointers throughout the
kernel, Linux assumes pointers in zeroed structures are NULL pointers.
Instead use page_reset_bad_cgroup, added to memcontrol.h for this only.
Signed-off-by: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hirokazu Takahashi <[email protected]>
Cc: YAMAMOTO Takashi <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Each caller of mem_cgroup_move_lists is having to use page_get_page_cgroup:
it's more convenient if it acts upon the page itself not the page_cgroup; and
in a later patch this becomes important to handle within memcontrol.c.
Signed-off-by: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hirokazu Takahashi <[email protected]>
Cc: YAMAMOTO Takashi <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
vm_match_cgroup is a perverse name for a macro to match mm with cgroup: rename
it mm_match_cgroup, matching mm_init_cgroup and mm_free_cgroup.
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hirokazu Takahashi <[email protected]>
Cc: YAMAMOTO Takashi <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rename Memory Controller to Memory Resource Controller. Reflect the same
changes in the CONFIG definition for the Memory Resource Controller. Group
together the config options for Resource Counters and Memory Resource
Controller.
Signed-off-by: Balbir Singh <[email protected]>
Cc: Paul Menage <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix build failure on sparc:
In file included from include/linux/mm.h:39,
from include/linux/memcontrol.h:24,
from include/linux/swap.h:8,
from include/linux/suspend.h:7,
from init/do_mounts.c:6:
include/asm/pgtable.h:344: warning: parameter names (without
types) in function declaration
include/asm/pgtable.h:345: warning: parameter names (without
types) in function declaration
include/asm/pgtable.h:346: error: expected '=', ',', ';', 'asm' or
'__attribute__' before '___f___swp_entry'
viro sayeth:
I've run allmodconfig builds on a bunch of target, FWIW (essentially the
same patch). Note that these includes are recent addition caused by added
inline function that had since then become a define. So while I agree with
your comments in general, in _this_ case it's pretty safe.
The commit that had done it is 3062fc67dad01b1d2a15d58c709eff946389eca4
("memcontrol: move mm_cgroup to header file") and the switch to #define
is in commit 60c12b1202a60eabb1c61317e5d2678fcea9893f ("memcontrol: add
vm_match_cgroup()") (BTW, that probably warranted mentioning in the
changelog of the latter).
Cc: Adrian Bunk <[email protected]>
Cc: Robert Reif <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
mm_cgroup() is exclusively used to test whether an mm's mem_cgroup pointer
is pointing to a specific cgroup. Instead of returning the pointer, we can
just do the test itself in a new macro:
vm_match_cgroup(mm, cgroup)
returns non-zero if the mm's mem_cgroup points to cgroup. Otherwise it
returns zero.
Signed-off-by: David Rientjes <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Adrian Bunk <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Based on the discussion at http://lkml.org/lkml/2007/12/20/383, it was felt
that control_type might not be a good thing to implement right away. We
can add this flexibility at a later point when required.
Signed-off-by: Balbir Singh <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
number of pages to be scanned per cgroup
Define function for calculating the number of scan target on each Zone/LRU.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
priority in memory cgroup
Functions to remember reclaim priority per cgroup (as zone->prev_priority)
[[email protected]: build fixes]
[[email protected]: more build fixes]
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
active/inactive imbalance per cgroup
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
mapper_ratio per cgroup
Define function for calculating mapped_ratio in memory cgroup.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
While using memory control cgroup, page-migration under it works as following.
==
1. uncharge all refs at try to unmap.
2. charge regs again remove_migration_ptes()
==
This is simple but has following problems.
==
The page is uncharged and charged back again if *mapped*.
- This means that cgroup before migration can be different from one after
migration
- If page is not mapped but charged as page cache, charge is just ignored
(because not mapped, it will not be uncharged before migration)
This is memory leak.
==
This patch tries to keep memory cgroup at page migration by increasing
one refcnt during it. 3 functions are added.
mem_cgroup_prepare_migration() --- increase refcnt of page->page_cgroup
mem_cgroup_end_migration() --- decrease refcnt of page->page_cgroup
mem_cgroup_page_migration() --- copy page->page_cgroup from old page to
new page.
During migration
- old page is under PG_locked.
- new page is under PG_locked, too.
- both old page and new page is not on LRU.
These 3 facts guarantee that page_cgroup() migration has no race.
Tested and worked well in x86_64/fake-NUMA box.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Creates a helper function to return non-zero if a task is a member of a
memory controller:
int task_in_mem_cgroup(const struct task_struct *task,
const struct mem_cgroup *mem);
When the OOM killer is constrained by the memory controller, the exclusion
of tasks that are not a member of that controller was previously misplaced
and appeared in the badness scoring function. It should be excluded
during the tasklist scan in select_bad_process() instead.
[[email protected]: build fix]
Cc: Christoph Lameter <[email protected]>
Cc: Balbir Singh <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Inline functions must preceed their use, so mm_cgroup() should be defined
in linux/memcontrol.h.
include/linux/memcontrol.h:48: warning: 'mm_cgroup' declared inline after
being called
include/linux/memcontrol.h:48: warning: previous declaration of
'mm_cgroup' was here
[[email protected]: build fix]
[[email protected]: nuther build fix]
Cc: Balbir Singh <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Nick Piggin pointed out that swap cache and page cache addition routines
could be called from non GFP_KERNEL contexts. This patch makes the
charging routine aware of the gfp context. Charging might fail if the
cgroup is over it's limit, in which case a suitable error is returned.
This patch was tested on a Powerpc box. I am still looking at being able
to test the path, through which allocations happen in non GFP_KERNEL
contexts.
[[email protected]: problem with ZONE_MOVABLE]
Signed-off-by: Balbir Singh <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make page_referenced() cgroup aware. Without this patch, page_referenced()
can cause a page to be skipped while reclaiming pages. This patch ensures
that other cgroups do not hold pages in a particular cgroup hostage. It
is required to ensure that shared pages are freed from a cgroup when they
are not actively referenced from the cgroup that brought them in
Signed-off-by: Balbir Singh <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Choose if we want cached pages to be accounted or not. By default both are
accounted for. A new set of tunables are added.
echo -n 1 > mem_control_type
switches the accounting to account for only mapped pages
echo -n 3 > mem_control_type
switches the behaviour back
[[email protected]: mm/memcontrol.c: clenups]
[[email protected]: fix sparc32 build]
Signed-off-by: Balbir Singh <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Adrian Bunk <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Out of memory handling for cgroups over their limit. A task from the
cgroup over limit is chosen using the existing OOM logic and killed.
TODO:
1. As discussed in the OLS BOF session, consider implementing a user
space policy for OOM handling.
[[email protected]: fix build due to oom-killer changes]
Signed-off-by: Pavel Emelianov <[email protected]>
Signed-off-by: Balbir Singh <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add the page_cgroup to the per cgroup LRU. The reclaim algorithm has
been modified to make the isolate_lru_pages() as a pluggable component. The
scan_control data structure now accepts the cgroup on behalf of which
reclaims are carried out. try_to_free_pages() has been extended to become
cgroup aware.
[[email protected]: fix warning]
[[email protected]: initialize all scan_control's isolate_pages member]
[[email protected]: make do_try_to_free_pages() static]
[[email protected]: memcgroup: fix try_to_free order]
[[email protected]: this unlock_page_cgroup() is unnecessary]
Signed-off-by: Pavel Emelianov <[email protected]>
Signed-off-by: Balbir Singh <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Lee Schermerhorn <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add the accounting hooks. The accounting is carried out for RSS and Page
Cache (unmapped) pages. There is now a common limit and accounting for both.
The RSS accounting is accounted at page_add_*_rmap() and page_remove_rmap()
time. Page cache is accounted at add_to_page_cache(),
__delete_from_page_cache(). Swap cache is also accounted for.
Each page's page_cgroup is protected with the last bit of the
page_cgroup pointer, this makes handling of race conditions involving
simultaneous mappings of a page easier. A reference count is kept in the
page_cgroup to deal with cases where a page might be unmapped from the RSS
of all tasks, but still lives in the page cache.
Credits go to Vaidyanathan Srinivasan for helping with reference counting work
of the page cgroup. Almost all of the page cache accounting code has help
from Vaidyanathan Srinivasan.
[[email protected]: fix swapoff breakage]
[[email protected]: fix locking]
Signed-off-by: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Balbir Singh <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Basic setup routines, the mm_struct has a pointer to the cgroup that
it belongs to and the the page has a page_cgroup associated with it.
Signed-off-by: Pavel Emelianov <[email protected]>
Signed-off-by: Balbir Singh <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Setup the memory cgroup and add basic hooks and controls to integrate
and work with the cgroup.
Signed-off-by: Balbir Singh <[email protected]>
Cc: Pavel Emelianov <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Kirill Korotaev <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|