The HUGETLBFS_SB macro is clear enough, so a single statement is clearer
than spreading it across three lines.
Also remove redundant return statements from functions returning void,
which saves a few lines, at least.
Signed-off-by: Chen Gang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
Hugh has pointed out that a compound_head() call can be unsafe in some
contexts. Here's one example:
CPU0                                    CPU1

isolate_migratepages_block()
  page_count()
    compound_head()
      !!PageTail() == true
                                        put_page()
                                          tail->first_page = NULL
      head = tail->first_page
                                        alloc_pages(__GFP_COMP)
                                          prep_compound_page()
                                            tail->first_page = head
                                            __SetPageTail(p);
      !!PageTail() == true
    <head == NULL dereferencing>
The race is purely theoretical. I don't think it's possible to trigger it
in practice. But who knows.
We can fix the race by changing how we encode PageTail() and
compound_head() within struct page, so that both can be updated in one shot.
The patch introduces page->compound_head into the third double-word block,
in front of compound_dtor and compound_order. Bit 0 encodes PageTail() and
the rest of the bits are a pointer to the head page if bit zero is set.
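Decoding is then a single read; roughly (a sketch of the encoding just
described, close to but not necessarily identical to the patch):

  static inline struct page *compound_head(struct page *page)
  {
          unsigned long head = READ_ONCE(page->compound_head);

          if (unlikely(head & 1))
                  return (struct page *)(head - 1);
          return page;
  }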
The patch moves page->pmd_huge_pte out of that word, just in case an
architecture defines pgtable_t as something that can have bit 0 set.
hugetlb_cgroup uses page->lru.next in the second tail page to store a
pointer to struct hugetlb_cgroup. The patch switches it to use
page->private in the second tail page instead. The space is free since
->first_page is removed from the union.
The patch also opens up the possibility of removing the
HUGETLB_CGROUP_MIN_ORDER limitation, since there's now space in the first
tail page to store the struct hugetlb_cgroup pointer. But that's out of
scope for this patch.
That means page->compound_head shares storage space with:
- page->lru.next;
- page->next;
- page->rcu_head.next;
That's too long a list to be absolutely sure, but it looks like nobody
uses bit 0 of that word.
page->rcu_head.next is guaranteed[1] to have bit 0 clear as long as we use
call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But a future
call_rcu_lazy() is not allowed, as it makes use of that bit and we could
get a false-positive PageTail().
[1] http://lkml.kernel.org/g/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
Replace cgroup_subsys->disabled tests in controllers with
cgroup_subsys_enabled(). cgroup_subsys_enabled() requires a literal
subsys name as its parameter and thus can't be used by cgroup core,
which iterates through controllers. For cgroup core, introduce and
use cgroup_ssid_enabled(), which uses the slower static_key_enabled()
test and can be indexed by subsys ID.
This leaves cgroup_subsys->disabled unused. Remove it.
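The literal-name requirement comes from token pasting:
cgroup_subsys_enabled() picks a per-subsys static key at compile time,
while cgroup_ssid_enabled() looks one up by ID at runtime. Roughly
(simplified; the exact key names and types may differ):

  #define cgroup_subsys_enabled(ss)                               \
          static_branch_likely(&ss ## _enabled_key)

  /* cgroup core, iterating by subsys ID, uses the array form */
  static bool cgroup_ssid_enabled(int ssid)
  {
          return static_key_enabled(cgroup_subsys_enabled_key[ssid]);
  }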
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Zefan Li <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
---
Abandon the spinlock-protected byte counters in favor of the unlocked
page counters in the hugetlb controller as well.
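A page counter charges with one atomic add and backs out on overshoot,
instead of serializing byte counts behind a spinlock. A minimal sketch of
the idea (illustrative types and names, not the kernel's page_counter):

  struct pc {
          atomic_long_t count;
          unsigned long limit;
  };

  static int pc_try_charge(struct pc *c, unsigned long nr_pages)
  {
          long new = atomic_long_add_return(nr_pages, &c->count);

          if ((unsigned long)new > c->limit) {
                  /* overshot the limit: undo the speculative charge */
                  atomic_long_sub(nr_pages, &c->count);
                  return -ENOMEM;
          }
          return 0;
  }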
Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Vladimir Davydov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
cgroup_subsys is a bit messier than it needs to be.
* The name of a subsys can be different from its internal identifier
defined in cgroup_subsys.h. Most subsystems use the matching name
but three - cpu, memory and perf_event - use different ones.
* cgroup_subsys_id enums are postfixed with _subsys_id and each
cgroup_subsys is postfixed with _subsys. cgroup.h is widely
included throughout various subsystems; it doesn't and shouldn't
have a claim on such generic names, which don't carry any qualifier
indicating that they belong to cgroup.
* cgroup_subsys->subsys_id should always equal the matching
cgroup_subsys_id enum; however, we require each controller to
initialize it and then BUG if they don't match, which is a bit
silly.
This patch cleans up cgroup_subsys names and initialization by doing
the following:
* cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
cgroup_subsys with _cgrp_subsys.
* With the above, renaming subsys identifiers to match the userland
visible names doesn't cause any naming conflicts. All non-matching
identifiers are renamed to match the official names.
cpu_cgroup -> cpu
mem_cgroup -> memory
perf -> perf_event
* controllers no longer need to initialize ->subsys_id and ->name.
They're generated in cgroup core and set automatically during boot
(a sketch follows below).
* Redundant cgroup_subsys declarations removed.
* While updating BUG_ON()s in cgroup_init_early(), convert them to
WARN()s. BUGging that early during boot is stupid - the kernel
can't print anything, even through the serial console, and the trap
handler doesn't even link the stack frame properly for back-tracing.
This patch doesn't introduce any behavior changes.
v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
classid handling into core").
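For reference, the generation mentioned above follows the usual SUBSYS()
stamping pattern; a simplified sketch (the exact code may differ):

  /* cgroup.h: one _cgrp_id per entry in cgroup_subsys.h */
  #define SUBSYS(_x) _x ## _cgrp_id,
  enum cgroup_subsys_id {
  #include <linux/cgroup_subsys.h>
          CGROUP_SUBSYS_COUNT,
  };
  #undef SUBSYS

  /* cgroup.c: the subsys table, indexed by the same IDs */
  #define SUBSYS(_x) [_x ## _cgrp_id] = &_x ## _cgrp_subsys,
  static struct cgroup_subsys *cgroup_subsys[] = {
  #include <linux/cgroup_subsys.h>
  };
  #undef SUBSYS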
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: "David S. Miller" <[email protected]>
Acked-by: "Rafael J. Wysocki" <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Aristeu Rozanski <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Thomas Graf <[email protected]>
---
Most of the VM_BUG_ON assertions are performed on a page. Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.
I've recently noticed, based on requests to add a small piece of code
that dumps the page to various VM_BUG_ON sites, that the page dump is
quite useful to people debugging issues in mm.
This patch adds VM_BUG_ON_PAGE(cond, page) which, beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.
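The macro boils down to something like this (dump_page()'s exact
signature and the !CONFIG_DEBUG_VM stub vary across kernel versions):

  #ifdef CONFIG_DEBUG_VM
  #define VM_BUG_ON_PAGE(cond, page)                              \
          do {                                                    \
                  if (unlikely(cond)) {                           \
                          dump_page(page); /* dump before dying */\
                          BUG();                                  \
                  }                                               \
          } while (0)
  #else
  #define VM_BUG_ON_PAGE(cond, page) do { } while (0)
  #endif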
[[email protected]: fix up includes]
Signed-off-by: Sasha Levin <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
Build the kernel with CONFIG_HUGETLBFS=y, CONFIG_HUGETLB_PAGE=y and
CONFIG_CGROUP_HUGETLB=y, then specify the hugepagesz=xx boot option, and
the system will fail to boot.
This failure is caused by the following code path:

  setup_hugepagesz
    hugetlb_add_hstate
      hugetlb_cgroup_file_init
        cgroup_add_cftypes
          kzalloc  <-- slab is *not available* yet
On this path, slab is not available yet, so the memory allocation will
fail and trigger the WARN_ON() in hugetlb_cgroup_file_init().
So I move hugetlb_cgroup_file_init() into hugetlb_init().
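With the move, the cftype allocation happens from an initcall, after slab
is up; sketched below (helper names abridged from the patch and possibly
differing):

  /* called from hugetlb_init(), i.e. once slab is available */
  void __init hugetlb_cgroup_file_init(void)
  {
          struct hstate *h;

          for_each_hstate(h)
                  __hugetlb_cgroup_file_init(hstate_index(h));
  }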
[[email protected]: tweak coding-style, remove pointless __init on inlined function]
[[email protected]: fix warning]
Signed-off-by: Jianguo Wu <[email protected]>
Signed-off-by: Jiang Liu <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
During hugepage migration, move the hugetlb cgroup info from the old page
to the new page. With HugeTLB pages, the hugetlb cgroup is uncharged in
the compound page destructor. Since we are holding a hugepage reference,
we can be sure that the old page won't get uncharged till the last
put_page().
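The move itself happens under hugetlb_lock; a simplified sketch (cgroup
refcounting details elided):

  void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
  {
          struct hugetlb_cgroup *h_cg;

          if (hugetlb_cgroup_disabled())
                  return;

          spin_lock(&hugetlb_lock);
          h_cg = hugetlb_cgroup_from_page(oldhpage);
          set_hugetlb_cgroup(oldhpage, NULL);

          /* move the h_cg details over to the new page */
          set_hugetlb_cgroup(newhpage, h_cg);
          spin_unlock(&hugetlb_lock);
  }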
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: David Rientjes <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
Add the control files for the hugetlb controller.
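Per hugepage size there are limit, usage, max-usage and failcnt files,
e.g. hugetlb.2MB.limit_in_bytes. A compressed sketch of the per-hstate
setup (helper names abridged; treat them as illustrative):

  static void __init __hugetlb_cgroup_file_init(int idx)
  {
          struct hstate *h = &hstates[idx];
          struct cftype *cft;
          char buf[32];

          mem_fmt(buf, sizeof(buf), huge_page_size(h)); /* "2MB", "1GB", ... */

          /* add the limit file */
          cft = &h->cgroup_files[0];
          snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf);
          /* ... usage_in_bytes, max_usage_in_bytes and failcnt likewise ... */
  }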
[[email protected]: s/CONFIG_CGROUP_HUGETLB_RES_CTLR/CONFIG_MEMCG_HUGETLB/g]
[[email protected]: s/CONFIG_MEMCG_HUGETLB/CONFIG_CGROUP_HUGETLB/]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: David Rientjes <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hillf Danton <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
Add the charge and uncharge routines for the hugetlb cgroup. We do cgroup
charging in page alloc and uncharge in the compound page destructor.
Assigning a page's hugetlb cgroup is protected by hugetlb_lock.
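A condensed sketch of the charge side (error handling and the css
refcount retry loop elided):

  int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
                                   struct hugetlb_cgroup **ptr)
  {
          unsigned long csize = nr_pages * PAGE_SIZE;
          struct hugetlb_cgroup *h_cg = NULL;
          struct res_counter *fail_res;
          int ret = 0;

          if (hugetlb_cgroup_disabled())
                  goto done;
          /* hugepages below the min order carry no cgroup pointer */
          if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
                  goto done;

          h_cg = hugetlb_cgroup_from_task(current);
          ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
  done:
          *ptr = h_cg;
          return ret;
  }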
[[email protected]: add huge_page_order check to avoid incorrect uncharge]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: David Rientjes <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
Add the hugetlb cgroup pointer to the 3rd page's lru.next. This limits
hugetlb cgroup usage to hugepages with 3 or more normal pages. I guess
that is an acceptable limitation.
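The accessors then look roughly like this (close to the patch; the
min-order guard reflects the limitation above):

  static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
  {
          VM_BUG_ON(!PageHuge(page));

          if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
                  return NULL;
          /* the pointer lives in the 3rd page of the compound page */
          return (struct hugetlb_cgroup *)page[2].lru.next;
  }

  static inline int set_hugetlb_cgroup(struct page *page,
                                       struct hugetlb_cgroup *h_cg)
  {
          VM_BUG_ON(!PageHuge(page));

          if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
                  return -1;
          page[2].lru.next = (void *)h_cg;
          return 0;
  }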
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: David Rientjes <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hillf Danton <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
Implement a new controller that allows us to control HugeTLB allocations.
The extension allows limiting HugeTLB usage per control group, and
enforces the controller limit during page faults. Since HugeTLB doesn't
support page reclaim, enforcing the limit at page fault time implies that
the application will get a SIGBUS signal if it tries to access HugeTLB
pages beyond its limit. This requires the application to know beforehand
how many HugeTLB pages it will need.
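At its core the controller is one resource counter per hugepage size,
hung off the cgroup; roughly as in the patch:

  struct hugetlb_cgroup {
          struct cgroup_subsys_state css;
          /* one counter per supported hugepage size (hstate) */
          struct res_counter hugepage[HUGE_MAX_HSTATE];
  };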
The charge/uncharge calls will be added to the HugeTLB code in a later
patch. Support for cgroup removal will be added in later patches.
[[email protected]: s/CONFIG_CGROUP_HUGETLB_RES_CTLR/CONFIG_MEMCG_HUGETLB/g]
[[email protected]: s/CONFIG_MEMCG_HUGETLB/CONFIG_CGROUP_HUGETLB/g]
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hillf Danton <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>