blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message	Author	Files	Lines
2014-08-06	mm/hwpoison-inject.c: remove unnecessary null test before debugfs_remove_recursive	Fabian Frederick	1	-2/+1
	Fix checkpatch warning: "WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required" Signed-off-by: Fabian Frederick <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: export NR_SHMEM via sysinfo(2) / si_meminfo() interfaces	Rafael Aquini	3	-3/+4
	Historically, we exported shared pages to userspace via sysinfo(2) sharedram and /proc/meminfo's "MemShared" fields. With the advent of tmpfs, from kernel v2.4 onward, that old way of accounting shared memory was deemed inaccurate and we started to export a hard-coded 0 for sysinfo.sharedram. Later on, during the 2.6 timeframe, "MemShared" got re-introduced to /proc/meminfo re-branded as "Shmem", but we're still reporting sysinfo.sharedram as that old hard-coded zero, which makes the "shared memory" report inconsistent across interfaces. This patch leverages the addition of explicit accounting for pages used by shmem/tmpfs -- "4b02108 mm: oom analysis: add shmem vmstat" -- in order to make the users of sysinfo(2) and si_meminfo*() friends aware of that vmstat entry and make them report it consistently across the interfaces, as well as to make the data returned by sysinfo(2) consistent with what our current API documentation states. Signed-off-by: Rafael Aquini <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
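A minimal userspace sketch of how the effect of the commit above could be observed, assuming a Linux system with glibc. It reads the `sharedram` field through sysinfo(2) so the result can be compared by hand with the "Shmem:" line in /proc/meminfo. The helper names `shared_ram_bytes()` and `print_shared_ram()` are made up for this illustration and are not part of the patch.

```c
#include <stdio.h>
#include <sys/sysinfo.h>

/*
 * Return the shared-memory total reported by sysinfo(2), in bytes,
 * or -1 if the call fails. sharedram is expressed in units of
 * mem_unit bytes, so the two fields are multiplied.
 */
long long shared_ram_bytes(void)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return -1;

	return (long long)si.sharedram * (long long)si.mem_unit;
}

/* Print the value in kB so it lines up with the units of /proc/meminfo. */
void print_shared_ram(void)
{
	long long bytes = shared_ram_bytes();

	if (bytes < 0)
		perror("sysinfo");
	else
		printf("sysinfo.sharedram: %lld kB\n", bytes / 1024);
}
```

On a kernel without this change the printed value is expected to be the hard-coded zero described in the message; with it, the value should track shmem/tmpfs usage.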
2014-08-06	mm: catch memory commitment underflow	Konstantin Khlebnikov	1	-0/+5
	Print a warning (if CONFIG_DEBUG_VM=y) when memory commitment becomes too negative. This shouldn't happen any more - the previous two patches fixed the committed_as underflow issues. [[email protected]: use VM_WARN_ONCE, per Dave] Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	shmem: update memory reservation on truncate	Konstantin Khlebnikov	1	-0/+17
	A shared anonymous mapping created without MAP_NORESERVE holds memory reservation for whole range of shmem segment. Usually there is no way to change its size, but /proc/<pid>/map_files/... (available if CONFIG_CHECKPOINT_RESTORE=y) allows that. This patch adjusts the memory reservation in shmem_setattr(). Signed-off-by: Konstantin Khlebnikov <[email protected]> Acked-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	shmem: fix double uncharge in __shmem_file_setup()	Konstantin Khlebnikov	1	-6/+6
	If __shmem_file_setup() fails on struct file allocation it uncharges memory commitment twice: first by shmem_unacct_size() and second time implicitly in shmem_evict_inode() when it kills the newly created inode. This patch removes shmem_unacct_size() from error path if the inode was already there. Signed-off-by: Konstantin Khlebnikov <[email protected]> Acked-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	include/linux/mmdebug.h: add VM_WARN_ONCE()	Andrew Morton	1	-0/+2
	It was missing... Cc: Konstantin Khlebnikov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm, vmalloc: constify allocation mask	David Rientjes	1	-4/+4
	tmp_mask in the __vmalloc_area_node() iteration never changes so it can be moved into function scope and marked with const. This causes the movl and orl to only be done once per call rather than area->nr_pages times. nested_gfp can also be marked const. Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm/vmalloc.c: add a schedule point to vmalloc()	Eric Dumazet	1	-0/+2
	It is not uncommon on busy servers to get stuck for hundreds of ms in vmalloc() calls (like file descriptor expansions). Add a cond_resched() to __vmalloc_area_node() to be gentle to other tasks. [[email protected]: only do it for __GFP_WAIT, per David] Signed-off-by: Eric Dumazet <[email protected]> Cc: Hugh Dickins <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: update the description for madvise_remove	Wang Sheng-Hui	1	-3/+0
	Currently, we have more filesystems supporting fallocate, e.g ext4/btrfs. Remove the outdated comment for madvise_remove. Signed-off-by: Wang Sheng-Hui <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm tracing: tell mm_migrate_pages event about numa_misplaced	Max Asbock	1	-0/+1
	The mm_migrate_pages trace event reports a reason for the migration, typically as a symbolic string. The exception is the reason MR_NUMA_MISPLACED for which it just displays the numeric value: mm_migrate_pages: nr_succeeded=1 nr_failed=0 mode=MIGRATE_ASYNC reason=0x5 This patch makes the output consistent by introducing a string value for MR_NUMA_MISPLACED. The event is then reported as: mm_migrate_pages: nr_succeeded=1 nr_failed=0 mode=MIGRATE_ASYNC reason=numa_misplaced Signed-off-by: Max Asbock <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Acked-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: vmscan: clean up struct scan_control	Johannes Weiner	1	-53/+46
	Reorder the members by input and output, then turn the individual integers for may_writepage, may_unmap, may_swap, compaction_ready, hibernation_mode into bit fields to save stack space:
	+72/-296 -224
	kswapd                             104     176     +72
	try_to_free_pages                   80      56     -24
	try_to_free_mem_cgroup_pages        80      56     -24
	shrink_all_memory                   88      64     -24
	reclaim_clean_pages_from_list      168     144     -24
	mem_cgroup_shrink_node_zone        104      80     -24
	__zone_reclaim                     176     152     -24
	balance_pgdat                      152       -    -152
	Signed-off-by: Johannes Weiner <[email protected]> Suggested-by: Mel Gorman <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Acked-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
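The space saving from turning integer flags into bit fields can be illustrated with a small standalone sketch. The two structs below are hypothetical stand-ins that only reuse the five flag names from the message; they are not the real struct scan_control, which has further members.

```c
#include <stddef.h>

/* Hypothetical "before" layout: one full int per boolean knob. */
struct flags_as_ints {
	int may_writepage;
	int may_unmap;
	int may_swap;
	int compaction_ready;
	int hibernation_mode;
};

/* Hypothetical "after" layout: the same knobs as 1-bit fields. */
struct flags_as_bits {
	unsigned int may_writepage:1;
	unsigned int may_unmap:1;
	unsigned int may_swap:1;
	unsigned int compaction_ready:1;
	unsigned int hibernation_mode:1;
};

/* Bytes saved per instance, e.g. per on-stack copy in a reclaim path. */
size_t bytes_saved(void)
{
	return sizeof(struct flags_as_ints) - sizeof(struct flags_as_bits);
}
```

The exact sizes depend on the ABI, so the sketch only relies on the bit-field struct being smaller. The trade-off is that bit fields cannot have their address taken and are updated with read-modify-write accesses.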
2014-08-06	mm: vmscan: move swappiness out of scan_control	Johannes Weiner	1	-14/+13
	Swappiness is determined for each scanned memcg individually in shrink_zone() and is not a parameter that applies throughout the reclaim scan. Move it out of struct scan_control to prevent accidental use of a stale value. Signed-off-by: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Acked-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: vmscan: remove all_unreclaimable()	Johannes Weiner	1	-25/+24
	Direct reclaim currently calls shrink_zones() to reclaim all members of a zonelist, and if that wasn't successful it does another pass through the same zonelist to check overall reclaimability. Just check reclaimability in shrink_zones() directly and propagate the result through the return value. Then remove all_unreclaimable(). Signed-off-by: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Acked-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: vmscan: rework compaction-ready signaling in direct reclaim	Johannes Weiner	1	-38/+32
	Page reclaim for a higher-order page runs until compaction is ready, then aborts and signals this situation through the return value of shrink_zones(). This is an oddly specific signal to encode in the return value of shrink_zones(), though, and can be quite confusing. Introduce sc->compaction_ready and signal the compactability of the zones out-of-band to free up the return value of shrink_zones() for actual zone reclaimability. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Michal Hocko <[email protected]> Acked-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: vmscan: remove remains of kswapd-managed zone->all_unreclaimable	Johannes Weiner	1	-8/+0
	shrink_zones() has a special branch to skip the all_unreclaimable() check during hibernation, because a frozen kswapd can't mark a zone unreclaimable. But ever since commit 6e543d5780e3 ("mm: vmscan: fix do_try_to_free_pages() livelock"), determining a zone to be unreclaimable is done by directly looking at its scan history and no longer relies on kswapd setting the per-zone flag. Remove this branch and let shrink_zones() check the reclaimability of the target zones regardless of hibernation state. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mem-hotplug: improve zone_movable_is_highmem logic	Wang Nan	1	-0/+2
	In the original code, zone_movable_is_highmem() assumes ZONE_MOVABLE is not highmem if CONFIG_HAVE_MEMBLOCK_NODE_MAP is not set. In online_pages(), it extracts pages from the zone preceding ZONE_MOVABLE. This is logically inconsistent: if HAVE_MEMBLOCK_NODE_MAP is turned off but HIGHMEM is on, zone_movable_is_highmem() treats the movable zone as not highmem, but online_pages() extracts pages from ZONE_HIGHMEM. This inconsistency doesn't cause a real problem currently, because all architectures that support online_pages() also have HAVE_MEMBLOCK_NODE_MAP. However, fixing it makes the code clearer and also helps further coding. Signed-off-by: Wang Nan <[email protected]> Cc: Zhang Zhen <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Li Zefan <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm/mem-hotplug: replace simple_strtoull() with kstrtoull()	Zhang Zhen	1	-1/+3
	Use the newer and more pleasant kstrtoull() to replace simple_strtoull(), because simple_strtoull() is marked for obsoletion. Signed-off-by: Zhang Zhen <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
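The practical difference behind this replacement is strictness: kstrtoull() returns an error code and rejects trailing garbage, while the older helper simply stops at the first non-digit. The sketch below is a userspace analog built on the C library's strtoull(), not the kernel function itself; the name `parse_u64_strict()` and its exact rules are assumptions made for illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Strictly parse an unsigned 64-bit value from s.
 * Returns 0 on success, -EINVAL for malformed input, -ERANGE on overflow.
 * *res is only written on success.
 */
int parse_u64_strict(const char *s, int base, unsigned long long *res)
{
	char *end;
	unsigned long long val;

	/* strtoull() alone would accept leading spaces and a '-' sign. */
	if (*s == '\0' || *s == '-' || isspace((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoull(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits consumed */
	if (errno == ERANGE)
		return -ERANGE;		/* value does not fit */

	/* Tolerate exactly one trailing newline, as sysfs writes often carry one. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage such as "12abc" */

	*res = val;
	return 0;
}
```

A lenient parser would turn "12abc" into 12 without complaint; here the caller gets -EINVAL and can reject the write instead of acting on a partial value.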
2014-08-06	mm: memcontrol: do not acquire page_cgroup lock for kmem pages	Johannes Weiner	1	-14/+7
	Kmem page charging and uncharging is serialized by means of exclusive access to the page. Do not take the page_cgroup lock and don't set pc->flags atomically. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: memcontrol: remove ordering between pc->mem_cgroup and PageCgroupUsed	Johannes Weiner	1	-9/+0
	There is a write barrier between setting pc->mem_cgroup and PageCgroupUsed, which was added to allow LRU operations to lookup the memcg LRU list of a page without acquiring the page_cgroup lock. But ever since commit 38c5d72f3ebe ("memcg: simplify LRU handling by new rule"), pages are ensured to be off-LRU while charging, so nobody else is changing LRU state while pc->mem_cgroup is being written, and there are no read barriers anymore. Remove the unnecessary write barrier. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: memcontrol: use root_mem_cgroup res_counter	Johannes Weiner	1	-108/+44
	Due to an old optimization to keep expensive res_counter changes at a minimum, the root_mem_cgroup res_counter is never charged; there is no limit at that level anyway, and any statistics can be generated on demand by summing up the counters of all other cgroups. However, with per-cpu charge caches, res_counter operations do not even show up in profiles anymore, so this optimization is no longer necessary. Remove it to simplify the code. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: memcontrol: catch root bypass in move precharge	Johannes Weiner	1	-1/+8
	When mem_cgroup_try_charge() returns -EINTR, it bypassed the charge to the root memcg. But move precharging does not catch this and treats this case as if no charge had happened, thus leaking a charge against root. Because of an old optimization, the root memcg's res_counter is not actually charged right now, but it's still an imbalance and subsequent patches will charge the root memcg again. Catch those bypasses to the root memcg and properly cancel them before giving up the move. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: memcontrol: simplify move precharge function	Johannes Weiner	1	-33/+15
	The move precharge function does some baroque things: it tries raw res_counter charging of the entire amount first, and then falls back to a loop of one-by-one charges, with checks for pending signals and cond_resched() batching. Just use mem_cgroup_try_charge() without __GFP_WAIT for the first bulk charge attempt. In the one-by-one loop, remove the signal check (this is already checked in try_charge), and simply call cond_resched() after every charge - it's not that expensive. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: memcontrol: remove explicit OOM parameter in charge path	Michal Hocko	1	-22/+10
	For the page allocator, __GFP_NORETRY implies that no OOM should be triggered, whereas memcg has an explicit parameter to disable OOM. The only callsites that want OOM disabled are THP charges and charge moving. THP already uses __GFP_NORETRY and charge moving can use it as well - one full reclaim cycle should be plenty. Switch it over, then remove the OOM parameter. Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: memcontrol: retry reclaim for oom-disabled and __GFP_NOFAIL charges	Johannes Weiner	1	-4/+4
	There is no reason why oom-disabled and __GFP_NOFAIL charges should try to reclaim only once when every other charge tries several times before giving up. Make them all retry the same number of times. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: huge_memory: use GFP_TRANSHUGE when charging huge pages	Johannes Weiner	1	-3/+3
	Transparent huge page charges prefer falling back to regular pages rather than spending a lot of time in direct reclaim. Desired reclaim behavior is usually declared in the gfp mask, but THP charges use GFP_KERNEL and then rely on the fact that OOM is disabled for THP charges, and that OOM-disabled charges don't retry reclaim. Needless to say, this is anything but obvious and quite error prone. Convert THP charges to use GFP_TRANSHUGE instead, which implies __GFP_NORETRY, to indicate the low-latency requirement. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: memcontrol: reclaim at least once for __GFP_NORETRY	Johannes Weiner	1	-3/+3
	Currently, __GFP_NORETRY tries charging once and gives up before even trying to reclaim. Bring the behavior on par with the page allocator and reclaim at least once before giving up. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: memcontrol: rearrange charging fast path	Johannes Weiner	1	-16/+17
	The charging path currently starts out with OOM condition checks when OOM is the rarest possible case. Rearrange this code to run OOM/task dying checks only after trying the percpu charge and the res_counter charge and bail out before entering reclaim. Attempting a charge does not hurt an (oom-)killed task as much as every charge attempt having to check OOM conditions. Also, only check __GFP_NOFAIL when the charge would actually fail. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: memcontrol: fold mem_cgroup_do_charge()	Johannes Weiner	1	-102/+64
	These patches rework memcg charge lifetime to integrate more naturally with the lifetime of user pages. This drastically simplifies the code and reduces charging and uncharging overhead. The most expensive part of charging and uncharging is the page_cgroup bit spinlock, which is removed entirely after this series. Here are the top-10 profile entries of a stress test that reads a 128G sparse file on a freshly booted box, without even a dedicated cgroup (i.e. executing in the root memcg).
	Before:
	15.36%  cat      [kernel.kallsyms]  [k] copy_user_generic_string
	13.31%  cat      [kernel.kallsyms]  [k] memset
	11.48%  cat      [kernel.kallsyms]  [k] do_mpage_readpage
	 4.23%  cat      [kernel.kallsyms]  [k] get_page_from_freelist
	 2.38%  cat      [kernel.kallsyms]  [k] put_page
	 2.32%  cat      [kernel.kallsyms]  [k] __mem_cgroup_commit_charge
	 2.18%  kswapd0  [kernel.kallsyms]  [k] __mem_cgroup_uncharge_common
	 1.92%  kswapd0  [kernel.kallsyms]  [k] shrink_page_list
	 1.86%  cat      [kernel.kallsyms]  [k] __radix_tree_lookup
	 1.62%  cat      [kernel.kallsyms]  [k] __pagevec_lru_add_fn
	After:
	15.67%  cat      [kernel.kallsyms]  [k] copy_user_generic_string
	13.48%  cat      [kernel.kallsyms]  [k] memset
	11.42%  cat      [kernel.kallsyms]  [k] do_mpage_readpage
	 3.98%  cat      [kernel.kallsyms]  [k] get_page_from_freelist
	 2.46%  cat      [kernel.kallsyms]  [k] put_page
	 2.13%  kswapd0  [kernel.kallsyms]  [k] shrink_page_list
	 1.88%  cat      [kernel.kallsyms]  [k] __radix_tree_lookup
	 1.67%  cat      [kernel.kallsyms]  [k] __pagevec_lru_add_fn
	 1.39%  kswapd0  [kernel.kallsyms]  [k] free_pcppages_bulk
	 1.30%  cat      [kernel.kallsyms]  [k] kfree
	As you can see, the memcg footprint has shrunk quite a bit.
	 text  data  bss    dec   hex  filename
	37970  9892  400  48262  bc86  mm/memcontrol.o.old
	35239  9892  400  45531  b1db  mm/memcontrol.o
	This patch (of 13): This function was split out because mem_cgroup_try_charge() got too big. But having essentially one sequence of operations arbitrarily split in half is not good for reworking the code. Fold it back in.
	Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: page-flags: clean up the page flag test, set, clear macros	Johannes Weiner	1	-7/+14
	- PAGEFLAG_FALSE only defines TEST, make it define SET and CLEAR as well, analogous to PAGEFLAG. - Define TESTSETFLAG_FALSE, analogous to TESTSETFLAG. - Define TESTSCFLAG_FALSE, analogous to TESTSCFLAG - Make PG_mlocked accessors the same on both MMU and !MMU setups Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm, thp: replace smp_mb after atomic_add by smp_mb__after_atomic	Waiman Long	1	-1/+1
	In some architectures like x86, atomic_add() is a full memory barrier. In that case, an additional smp_mb() is just a waste of time. This patch replaces that smp_mb() by smp_mb__after_atomic() which will avoid the redundant memory barrier in some architectures. With a 3.16-rc1 based kernel, this patch reduced the execution time of breaking 1000 transparent huge pages from 38,245us to 30,964us. A reduction of 19% which is quite sizeable. It also reduces the %cpu time of the __split_huge_page_refcount function in the perf profile from 2.18% to 1.15%. Signed-off-by: Waiman Long <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Scott J Norton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm, thp: move invariant bug check out of loop in __split_huge_page_map	Waiman Long	1	-2/+2
	In __split_huge_page_map(), the check for page_mapcount(page) is invariant within the for loop. Because the macro is implemented using atomic_read(), the compiler cannot optimize the redundant check away, which leads to unnecessary reads of the page structure. This patch moves the invariant bug check out of the loop so that it will be done only once. On a 3.16-rc1 based kernel, a microbenchmark that broke up 1000 transparent huge pages using munmap() took 38,245us with the patch and 38,548us without it. The performance gain is about 1%. Signed-off-by: Waiman Long <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Scott J Norton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm, CMA: clean-up log message	Joonsoo Kim	1	-2/+2
	We don't need explicit 'CMA:' prefix, since we already define prefix 'cma:' in pr_fmt. So remove it. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Reviewed-by: Zhang Yanfei <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Gleb Natapov <[email protected]> Acked-by: Marek Szyprowski <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm, CMA: change cma_declare_contiguous() to obey coding convention	Joonsoo Kim	4	-10/+11
	Conventionally, we put output param to the end of param list and put the 'base' ahead of 'size', but cma_declare_contiguous() doesn't look like that, so change it. Additionally, move down cma_areas reference code to the position where it is really needed. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Gleb Natapov <[email protected]> Acked-by: Marek Szyprowski <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm, CMA: clean-up CMA allocation error path	Joonsoo Kim	1	-3/+4
	We can remove one call site for clear_cma_bitmap() if we first call it before checking the error number. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Minchan Kim <[email protected]> Reviewed-by: Michal Nazarewicz <[email protected]> Reviewed-by: Zhang Yanfei <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Gleb Natapov <[email protected]> Acked-by: Marek Szyprowski <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	PPC, KVM, CMA: use general CMA reserved area management framework	Joonsoo Kim	5	-277/+14
	Now that we have a general CMA reserved area management framework, use it for future maintainability. There is no functional change. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Tested-by: Aneesh Kumar K.V <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Gleb Natapov <[email protected]> Acked-by: Marek Szyprowski <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	CMA: generalize CMA reserved area management functionality	Joonsoo Kim	8	-291/+383
	Currently, there are two users of CMA functionality: the DMA subsystem and KVM on powerpc. Each has its own code to manage the CMA reserved area, even though the two look really similar. My guess is that this is caused by differing needs in bitmap management. The KVM side wants to maintain a bitmap not per page but at a coarser granularity; eventually it uses a bitmap where one bit represents 64 pages. When I implement CMA related patches, I have to change both places to apply my change, which is painful. I want to change this situation and reduce future code management overhead through this patch. This change could also help developers who want to use CMA in new feature development, since they can use CMA easily without copying and pasting this reserved area management code. In previous patches, we have prepared some features to generalize CMA reserved area management and now it's time to do it. This patch moves core functions to mm/cma.c and changes the DMA APIs to use these functions. There is no functional change in DMA APIs. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Acked-by: Zhang Yanfei <[email protected]> Acked-by: Minchan Kim <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Gleb Natapov <[email protected]> Acked-by: Marek Szyprowski <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	DMA, CMA: support arbitrary bitmap granularity	Joonsoo Kim	1	-24/+53
	PPC KVM's CMA area management requires arbitrary bitmap granularity, since it wants to reserve a very large memory region and manage it with a bitmap in which one bit covers several pages, to reduce management overhead. So support arbitrary bitmap granularity in preparation for the following generalization. [[email protected]: s/1/1UL/] Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Acked-by: Zhang Yanfei <[email protected]> Acked-by: Minchan Kim <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Gleb Natapov <[email protected]> Acked-by: Marek Szyprowski <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
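The arithmetic behind a coarse-grained bitmap can be sketched as follows, assuming each bit covers a power-of-two number of pages. The function and parameter names (`bitmap_bits`, `pages_to_bits`, `page_to_bit`, `order_per_bit`) are chosen for this illustration and are not taken from the patch. The shift uses `1UL`, in the spirit of the s/1/1UL/ fixup noted in the message, so that it is done in unsigned long rather than int.

```c
#include <assert.h>
#include <limits.h>

/* Bits needed to cover `count` pages when one bit stands for 2^order_per_bit pages. */
unsigned long bitmap_bits(unsigned long count, unsigned int order_per_bit)
{
	assert(order_per_bit < sizeof(unsigned long) * CHAR_BIT);
	return count >> order_per_bit;
}

/* Bits occupied by an allocation of `pages` pages, rounded up to whole bits. */
unsigned long pages_to_bits(unsigned long pages, unsigned int order_per_bit)
{
	unsigned long per_bit;

	assert(order_per_bit < sizeof(unsigned long) * CHAR_BIT);
	per_bit = 1UL << order_per_bit;	/* 1UL: shift in unsigned long width */
	return (pages + per_bit - 1) >> order_per_bit;
}

/* Index of the bit that covers the page at `page_offset` within the area. */
unsigned long page_to_bit(unsigned long page_offset, unsigned int order_per_bit)
{
	assert(order_per_bit < sizeof(unsigned long) * CHAR_BIT);
	return page_offset >> order_per_bit;
}
```

With an order of 6, one bit covers 64 pages, so an area of 2^20 pages needs 16384 bits instead of 1048576. With an order of 0 the functions degenerate to the one-bit-per-page case.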
2014-08-06	DMA, CMA: support alignment constraint on CMA region	Joonsoo Kim	1	-8/+18
	PPC KVM's CMA area management needs alignment constraint on CMA region. So support it to prepare generalization of CMA area management functionality. Additionally, add some comments which tell us why alignment constraint is needed on CMA region. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Gleb Natapov <[email protected]> Acked-by: Marek Szyprowski <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	DMA, CMA: separate core CMA management codes from DMA APIs	Joonsoo Kim	1	-48/+77
	To prepare future generalization work on CMA area management code, we need to separate core CMA management codes from DMA APIs. We will extend these core functions to cover requirements of PPC KVM's CMA area management functionality in following patches. This separation helps us not to touch DMA APIs while extending core functions. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Acked-by: Marek Szyprowski <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm/internal.h: use nth_page	Fabian Frederick	1	-1/+1
	Use nth_page instead of pfn_to_page(page_to_pfn Signed-off-by: Fabian Frederick <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm: page_alloc: simplify drain_zone_pages by using min()	Michal Nazarewicz	1	-6/+2
	Instead of open-coding the minimum of two values, just use the min() macro; that is what it is there for. While changing the function, also change the type of the batch local variable to match the type of per_cpu_pages::batch (which is int). Signed-off-by: Michal Nazarewicz <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
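	A rough userspace sketch of this kind of simplification is shown below. `struct pcp_sketch`, its `count` field, and `min_int()` are stand-ins invented for the example (the kernel's min() is a type-checking macro, which is not reproduced here); only the `int` type of `batch` comes from the message above.

```c
/* Simplified stand-in for a per-CPU page list (hypothetical fields). */
struct pcp_sketch {
    int count;  /* pages currently held on the list (assumed field) */
    int batch;  /* chunk size for draining; int, as the message notes */
};

/* Plain-C replacement for the kernel's min() macro, for int only. */
static inline int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Open-coded form: pick the smaller of count and batch by hand. */
int to_drain_open_coded(const struct pcp_sketch *pcp)
{
    int to_drain;

    if (pcp->count >= pcp->batch)
        to_drain = pcp->batch;
    else
        to_drain = pcp->count;
    return to_drain;
}

/* Same result, expressed with a min helper. */
int to_drain_with_min(const struct pcp_sketch *pcp)
{
    return min_int(pcp->count, pcp->batch);
}
```

	Both functions return the same value for any input; the second simply states the intent directly.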
2014-08-06	mem-hotplug: introduce MMOP_OFFLINE to replace the hard coding -1	Tang Chen	3	-16/+20
	In store_mem_state(), we have:

	    ...
	    else if (!strncmp(buf, "offline", min_t(int, count, 7)))
	            online_type = -1;
	    ...
	    case -1:
	            ret = device_offline(&mem->dev);
	            break;
	    ...

	Here, "offline" is hard coded as -1. This patch does the following renaming:

	    ONLINE_KEEP    -> MMOP_ONLINE_KEEP
	    ONLINE_KERNEL  -> MMOP_ONLINE_KERNEL
	    ONLINE_MOVABLE -> MMOP_ONLINE_MOVABLE

	and introduces MMOP_OFFLINE = -1 to avoid hard coding.

	Signed-off-by: Tang Chen <[email protected]> Cc: Hu Tao <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Gu Zheng <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mem-hotplug: avoid illegal state prefixed with legal state when changing state of memory_block	Tang Chen	1	-4/+4
	We use the following command to online a memory_block:

	    echo online|online_kernel|online_movable > /sys/devices/system/memory/memoryXXX/state

	But, if we do the following:

	    echo online_fhsjkghfkd > /sys/devices/system/memory/memoryXXX/state

	the block will also be onlined.

	This is because the following code in store_mem_state() does not compare the whole string, but only the prefix of the string.

	    store_mem_state()
	    {
	            ......
	            if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))

	Here, only the first 13 letters of the string are compared. If we give "online_kernelXXXXXX", it will be recognized as online_kernel, which is incorrect.

	                    online_type = ONLINE_KERNEL;
	            else if (!strncmp(buf, "online_movable", min_t(int, count, 14)))

	We have the same problem here,

	                    online_type = ONLINE_MOVABLE;
	            else if (!strncmp(buf, "online", min_t(int, count, 6)))

	here,

	(This one is more problematic. If we give online_movalbe, which is a typo of online_movable, it will be recognized as online without the user being notified.)

	                    online_type = ONLINE_KEEP;
	            else if (!strncmp(buf, "offline", min_t(int, count, 7)))

	and here.

	                    online_type = -1;
	            else {
	                    ret = -EINVAL;
	                    goto err;
	            }
	            ......
	    }

	This patch fixes this problem by using sysfs_streq() to compare the whole string.

	Signed-off-by: Tang Chen <[email protected]> Reported-by: Hu Tao <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Gu Zheng <[email protected]> Acked-by: Toshi Kani <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
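	The difference between prefix matching and whole-string matching can be reproduced in userspace with a small sketch. Everything named here is an assumption made for illustration: the `mem_state` enum, both parse functions, and `streq_trailing_newline()`, which only approximates the idea of a sysfs-style comparison (equal strings, with one trailing newline tolerated) and is not the kernel's sysfs_streq() implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state codes for this sketch; not the kernel's constants. */
enum mem_state {
    STATE_INVALID = -2,
    STATE_OFFLINE = -1,
    STATE_ONLINE_KEEP = 0,
    STATE_ONLINE_KERNEL,
    STATE_ONLINE_MOVABLE
};

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Prefix-only matching, as in the code quoted in the message. */
enum mem_state parse_state_prefix(const char *buf, size_t count)
{
    if (!strncmp(buf, "online_kernel", min_size(count, 13)))
        return STATE_ONLINE_KERNEL;
    else if (!strncmp(buf, "online_movable", min_size(count, 14)))
        return STATE_ONLINE_MOVABLE;
    else if (!strncmp(buf, "online", min_size(count, 6)))
        return STATE_ONLINE_KEEP;
    else if (!strncmp(buf, "offline", min_size(count, 7)))
        return STATE_OFFLINE;
    return STATE_INVALID;
}

/* True if input equals keyword, ignoring one trailing newline on input. */
static bool streq_trailing_newline(const char *input, const char *keyword)
{
    size_t n = strlen(keyword);

    if (strncmp(input, keyword, n) != 0)
        return false;
    return input[n] == '\0' || (input[n] == '\n' && input[n + 1] == '\0');
}

/* Whole-string matching: trailing garbage is rejected. */
enum mem_state parse_state_strict(const char *buf)
{
    if (streq_trailing_newline(buf, "online_kernel"))
        return STATE_ONLINE_KERNEL;
    else if (streq_trailing_newline(buf, "online_movable"))
        return STATE_ONLINE_MOVABLE;
    else if (streq_trailing_newline(buf, "online"))
        return STATE_ONLINE_KEEP;
    else if (streq_trailing_newline(buf, "offline"))
        return STATE_OFFLINE;
    return STATE_INVALID;
}
```

	Feeding "online_fhsjkghfkd\n" to `parse_state_prefix()` yields `STATE_ONLINE_KEEP`, while `parse_state_strict()` rejects it with `STATE_INVALID`. The trailing newline is tolerated because `echo` appends one when writing to a sysfs file.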
2014-08-06	mm/memory.c: use entry = ACCESS_ONCE(*pte) in handle_pte_fault()	Hugh Dickins	1	-1/+1
	Use ACCESS_ONCE() in handle_pte_fault() when getting the entry or orig_pte upon which all subsequent decisions and pte_same() tests will be made. I have no evidence that its lack is responsible for the mm/filemap.c:202 BUG_ON(page_mapped(page)) in __delete_from_page_cache() found by trinity, and I am not optimistic that it will fix it. But I have found no other explanation, and ACCESS_ONCE() here will surely not hurt. If gcc does re-access the pte before passing it down, then that would be disastrous for correct page fault handling, and certainly could explain the page_mapped() BUGs seen (concurrent fault causing page to be mapped in a second time on top of itself: mapcount 2 for a single pte). Signed-off-by: Hugh Dickins <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
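	The single-read pattern described above can be sketched in standard C with a volatile-qualified access. The type `pte_sketch_t` and both functions are hypothetical names for this illustration; the kernel's ACCESS_ONCE() is a generic macro and handle_pte_fault() does far more than this. A volatile read only stops the compiler from re-loading or splitting the access; it is not a synchronization primitive.

```c
/* Hypothetical stand-in for a page table entry value. */
typedef unsigned long pte_sketch_t;

/* Read *p exactly once; the compiler may not re-load it later. */
static inline pte_sketch_t read_pte_once(const pte_sketch_t *p)
{
    return *(const volatile pte_sketch_t *)p;
}

/*
 * Take one snapshot of the entry and base every later decision on it.
 * Returns 0 if the snapshot is empty, 1 if the entry still equals the
 * snapshot when re-checked, and 2 if it changed in the meantime.
 */
int fault_sketch(const pte_sketch_t *ptep)
{
    pte_sketch_t entry = read_pte_once(ptep);  /* the only "decision" read */

    if (entry == 0)
        return 0;

    /* Later checks compare against the snapshot, never a fresh guess. */
    return read_pte_once(ptep) == entry ? 1 : 2;
}
```

	Without the snapshot, a compiler would be free to read the entry twice, and the two reads could disagree if another thread updated it in between.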
2014-08-06	vmalloc: use rcu list iterator to reduce vmap_area_lock contention	Joonsoo Kim	1	-3/+3
	Richard Yao reported a month ago that his system has trouble with vmap_area_lock contention during performance analysis using /proc/meminfo. Andrew asked why his analysis reads /proc/meminfo so heavily, but he didn't answer. https://lkml.org/lkml/2014/4/10/416

	Although I'm not sure whether this is the right usage, there is a solution that reduces vmap_area_lock contention with no side-effect. That is just to use the rcu list iterator in get_vmalloc_info().

	rcu can be used in this function because all RCU protocol is already respected by writers, since Nick Piggin's commit db64fe02258f1 ("mm: rewrite vmap layer") back in linux-2.6.28. Specifically: insertions use list_add_rcu(), deletions use list_del_rcu() and kfree_rcu(). Note the rb tree is not used from the rcu reader (it would not be safe); only the vmap_area_list has full RCU protection.

	Note that __purge_vmap_area_lazy() already uses this rcu protection.

	    rcu_read_lock();
	    list_for_each_entry_rcu(va, &vmap_area_list, list) {
	            if (va->flags & VM_LAZY_FREE) {
	                    if (va->va_start < start)
	                            start = va->va_start;
	                    if (va->va_end > end)
	                            end = va->va_end;
	                    nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
	                    list_add_tail(&va->purge_list, &valist);
	                    va->flags |= VM_LAZY_FREEING;
	                    va->flags &= ~VM_LAZY_FREE;
	            }
	    }
	    rcu_read_unlock();

	Peter:
	: While rcu list traversal over the vmap_area_list is safe, this may
	: arrive at different results than the spinlocked version. The rcu list
	: traversal version will not be a 'snapshot' of a single, valid instant
	: of the entire vmap_area_list, but rather a potential amalgam of
	: different list states.

	Joonsoo:
	: Yes, you are right, but I don't think that we should be strict here.
	: Meminfo is already not a 'snapshot' at specific time. While we try to get
	: certain stats, the other stats can change. And, although we may arrive at
	: different results than the spinlocked version, the difference would not be
	: large and would not make serious side-effect.

	[[email protected]: add more commit description] Signed-off-by: Joonsoo Kim <[email protected]> Reported-by: Richard Yao <[email protected]> Acked-by: Eric Dumazet <[email protected]> Cc: Peter Hurley <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	include/linux/memblock.h: add __init to memblock_set_bottom_up()	Fabian Frederick	1	-2/+2
	memblock_set_bottom_up() is only called by __init cmdline_parse_movable_node() and __init numa_init(). Signed-off-by: Fabian Frederick <[email protected]> Reviewed-by: Tang Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm/page_alloc.c: unexport alloc_pages_exact_nid()	Andrew Morton	1	-1/+0
	It is only called by mm/page_cgroup.c, which cannot be modular. Reported-by: David Rientjes <[email protected]> Cc: Fabian Frederick <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm/page_alloc.c: add __meminit to alloc_pages_exact_nid()	Fabian Frederick	2	-2/+2
	alloc_pages_exact_nid() is only called by __meminit alloc_page_cgroup() Signed-off-by: Fabian Frederick <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm/memory_hotplug.c: add __meminit to grow_zone_span/grow_pgdat_span	Fabian Frederick	1	-4/+4
	grow_zone_span and grow_pgdat_span are only called by __meminit __add_zone Signed-off-by: Fabian Frederick <[email protected]> Cc: Toshi Kani <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06	mm/readahead.c: remove unused file_ra_state from count_history_pages	Fabian Frederick	1	-2/+1
	count_history_pages only calls page_cache_prev_hole, in rcu_lock context, using the address_space mapping. There's no need to have file_ra_state here. Signed-off-by: Fabian Frederick <[email protected]> Acked-by: Fengguang Wu <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>