Age | Commit message (Collapse) | Author | Files | Lines |
|
possible
Several callers have constant 'start' and an 'nbits' that is a multiple
of 8, so we can turn them into calls to memset. We don't need the
entirety of 'start' and 'nbits' to be constant, we just need to know
whether they're divisible by 8.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We have eight users calling bitmap_clear for a single bit and seventeen
calling bitmap_set for a single bit. Rather than fix all of them to
call __clear_bit or __set_bit, turn bitmap_clear and bitmap_set into
inline functions and make this special case efficient.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Bitmap optimisations", v2.
These three bitmap patches use more efficient specialisations when the
compiler can figure out that it's safe to do so. Thanks to Rasmus's
eagle eyes, a nasty bug in v1 was avoided, and I've added a test case
which would have caught it.
This patch (of 4):
This version of the test is actually a no-op; the next patch will enable
it.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We poke at proc sysctl enough that really we should declare it
maintained. We'll just be Cc'd and sending updates / ACK'ing changes
through akpm's tree.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
'all_var' looks like a variable, but is actually a macro. Use
IS_ENABLED(CONFIG_KALLSYMS_ALL) for clarification.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
setgroups is not exactly a hot path, so we might as well use the library
function instead of open-coding the sorting. Saves ~150 bytes.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
1120 544 16 1680 690 kernel/ksysfs.o
File size After adding 'const':
text data bss dec hex filename
1160 480 16 1656 678 kernel/ksysfs.o
Link: http://lkml.kernel.org/r/aa224b3cc923fdbb3edd0c41b2c639c85408c9e8.1498737347.git.arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Russell King <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Petr Tesarik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The global variable 'rd_size' is declared as 'int' in source file
arch/arm/kernel/atags_parse.c and as 'unsigned long' in
drivers/block/brd.c. Fix this inconsistency.
Additionally, remove the declarations of rd_image_start, rd_prompt and
rd_doload from parse_tag_ramdisk() since these duplicate existing
declarations in <linux/initrd.h>.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Bart Van Assche <[email protected]>
Acked-by: Russell King <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jason Yan <[email protected]>
Cc: Zhaohongjiang <[email protected]>
Cc: Miao Xie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Including <linux/bug.h> pulls in a lot of bloat from <asm/bug.h> and
<asm-generic/bug.h> that is not needed to call the BUILD_BUG() family of
macros. Split them out into their own header, <linux/build_bug.h>.
Also correct some checkpatch.pl errors for the BUILD_BUG_ON_ZERO() and
BUILD_BUG_ON_NULL() macros by adding parentheses around the bitfield
widths that begin with a minus sign.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Abbott <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Correct these checkpatch.pl errors:
|ERROR: space required before that '-' (ctx:OxO)
|#37: FILE: include/linux/bug.h:37:
|+#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
|ERROR: space required before that '-' (ctx:OxO)
|#38: FILE: include/linux/bug.h:38:
|+#define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:-!!(e); }))
I decided to wrap the bitfield expressions that begin with minus signs
in parentheses rather than insert spaces before the minus signs.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Abbott <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Correct this checkpatch.pl error:
|ERROR: "(foo*)" should be "(foo *)"
|#19: FILE: include/linux/bug.h:19:
|+#define BUILD_BUG_ON_NULL(e) ((void*)0)
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Abbott <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Correct these checkpatch.pl warnings:
|WARNING: Block comments use * on subsequent lines
|#34: FILE: include/linux/bug.h:34:
|+/* Force a compilation error if condition is true, but also produce a
|+ result (of value 0 and type size_t), so the expression can be used
|WARNING: Block comments use a trailing */ on a separate line
|#36: FILE: include/linux/bug.h:36:
|+ aren't permitted). */
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Abbott <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This series of patches splits BUILD_BUG related macros out of
"include/linux/bug.h" into new file "include/linux/build_bug.h" (patch
5), and changes the pointer type checking in the `container_of()` macro
to deal with pointers of array type better (patch 6). Patches 1 to 4
are prerequisites.
Patches 2, 3, 4, and 5 have been inserted since the previous version of
this patch series. Patch 6 here corresponds to v3 and v4's patch 2.
Patch 1 was a prerequisite in v3 of this series to avoid a lot of
warnings when <linux/bug.h> was included by <linux/kernel.h>. That is
no longer relevant for v5 of the series, but I left it in because it was
acked by a Arnd Bergmann and Michal Nazarewicz.
Patches 2, 3, and 4 are some checkpatch clean-ups on
"include/linux/bug.h" before splitting out the BUILD_BUG stuff in patch
5.
Patch 5 splits the BUILD_BUG related macros out of "include/linux/bug.h"
into new file "include/linux/build_bug.h" because including
<linux/bug.h> in "include/linux/kernel.h" would result in build failures
due to circular dependencies.
Patch 6 changes the pointer type checking by `container_of()` to avoid
some incompatible pointer warnings when the dereferenced pointer has
array type.
1) asm-generic/bug.h: declare struct pt_regs; before function prototype
2) linux/bug.h: correct formatting of block comment
3) linux/bug.h: correct "(foo*)" should be "(foo *)"
4) linux/bug.h: correct "space required before that '-'"
5) bug: split BUILD_BUG stuff out into <linux/build_bug.h>
6) kernel.h: handle pointers to arrays better in container_of()
This patch (of 6):
The declaration of `__warn()` has `struct pt_regs *regs` as one of its
parameters. This can result in compiler warnings if `struct regs` is not
already declared. Add an empty declaration of `struct pt_regs` to avoid
the warnings.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Abbott <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The code can be much simplified by switching to ida_simple_get/remove.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Heiner Kallweit <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
FRV supports 64-bit cmpxchg, which is provided by the arch code as
__cmpxchg_64 and subsequently used to implement atomic64_cmpxchg.
This patch hooks up the generic cmpxchg64 API using the same function,
which also provides default definitions of the relaxed, acquire and
release variants. This fixes the build when COMPILE_TEST=y and
IOMMU_IO_PGTABLE_LPAE=y.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The arch uses a verbatim copy of the asm-generic version and does not
add any own implementations to the header, so use asm-generic/fb.h
instead of duplicating code.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tobias Klauser <[email protected]>
Reviewed-by: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
frv's asm/device.h is merely including asm-generic/device.h. Thus, the
arch specific header can be omitted and the generic header can be used
directly.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tobias Klauser <[email protected]>
Reviewed-by: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The helper function get_wild_bug_type() does not need to be in global
scope, so make it static.
Cleans up sparse warning:
"symbol 'get_wild_bug_type' was not declared. Should it be static?"
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
They return positive value, that is, true, if non-zero value is found.
Rename them to reduce confusion.
Link: http://lkml.kernel.org/r/20170516012350.GA16015@js1304-desktop
Signed-off-by: Joonsoo Kim <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
KASAN doesn't happen work with memory hotplug because hotplugged memory
doesn't have any shadow memory. So any access to hotplugged memory
would cause a crash on shadow check.
Use memory hotplug notifier to allocate and map shadow memory when the
hotplugged memory is going online and free shadow after the memory
offlined.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We used to read several bytes of the shadow memory in advance.
Therefore additional shadow memory mapped to prevent crash if
speculative load would happen near the end of the mapped shadow memory.
Now we don't have such speculative loads, so we no longer need to map
additional shadow memory.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We used to read several bytes of the shadow memory in advance.
Therefore additional shadow memory mapped to prevent crash if
speculative load would happen near the end of the mapped shadow memory.
Now we don't have such speculative loads, so we no longer need to map
additional shadow memory.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For some unaligned memory accesses we have to check additional byte of
the shadow memory. Currently we load that byte speculatively to have
only single load + branch on the optimistic fast path.
However, this approach has some downsides:
- It's unaligned access, so this prevents porting KASAN on
architectures which doesn't support unaligned accesses.
- We have to map additional shadow page to prevent crash if speculative
load happens near the end of the mapped memory. This would
significantly complicate upcoming memory hotplug support.
I wasn't able to notice any performance degradation with this patch. So
these speculative loads is just a pain with no gain, let's remove them.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is missing optimization in zero_p4d_populate() that can save some
memory when mapping zero shadow. Implement it like as others.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Andrey Ryabinin <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 40f9fb8cffc6 ("mm/zsmalloc: support allocating obj with size of
ZS_MAX_ALLOC_SIZE") fixes a size calculation error that prevented
zsmalloc to allocate an object of the maximal size (ZS_MAX_ALLOC_SIZE).
I think however the fix is unneededly complicated.
This patch replaces the dynamic calculation of zs_size_classes at init
time by a compile time calculation that uses the DIV_ROUND_UP() macro
already used in get_size_class_index().
[[email protected]: use min_t]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jerome Marchand <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Mahendran Ganesh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
8293 841 4 9138 23b2 drivers/block/zram/zram_drv.o
File size After adding 'const':
text data bss dec hex filename
8357 777 4 9138 23b2 drivers/block/zram/zram_drv.o
Link: http://lkml.kernel.org/r/65680c1c4d85818f7094cbfa31c91bf28185ba1b.1499061182.git.arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
early_pfn_to_nid will return node 0 if both HAVE_ARCH_EARLY_PFN_TO_NID
and HAVE_MEMBLOCK_NODE_MAP are disabled. It seems we are safe now
because all architectures which support NUMA define one of them (with an
exception of alpha which however has CONFIG_NUMA marked as broken) so
this works as expected. It can get silently and subtly broken too
easily, though. Make sure we fail the compilation if NUMA is enabled
and there is no proper implementation for this function. If that ever
happens we know that either the specific configuration is invalid and
the fix should either disable NUMA or enable one of the above configs.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Andrey reported a potential deadlock with the memory hotplug lock and
the cpu hotplug lock.
The reason is that memory hotplug takes the memory hotplug lock and then
calls stop_machine() which calls get_online_cpus(). That's the reverse
lock order to get_online_cpus(); get_online_mems(); in mm/slub_common.c
The problem has been there forever. The reason why this was never
reported is that the cpu hotplug locking had this homebrewn recursive
reader writer semaphore construct which due to the recursion evaded the
full lock dep coverage. The memory hotplug code copied that construct
verbatim and therefor has similar issues.
Three steps to fix this:
1) Convert the memory hotplug locking to a per cpu rwsem so the
potential issues get reported proper by lockdep.
2) Lock the online cpus in mem_hotplug_begin() before taking the memory
hotplug rwsem and use stop_machine_cpuslocked() in the page_alloc
code to avoid recursive locking.
3) The cpu hotpluck locking in #2 causes a recursive locking of the cpu
hotplug lock via __offline_pages() -> lru_add_drain_all(). Solve this
by invoking lru_add_drain_all_cpuslocked() instead.
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The rework of the cpu hotplug locking unearthed potential deadlocks with
the memory hotplug locking code.
The solution for these is to rework the memory hotplug locking code as
well and take the cpu hotplug lock before the memory hotplug lock in
mem_hotplug_begin(), but this will cause a recursive locking of the cpu
hotplug lock when the memory hotplug code calls lru_add_drain_all().
Split out the inner workings of lru_add_drain_all() into
lru_add_drain_all_cpuslocked() so this function can be invoked from the
memory hotplug code with the cpu hotplug lock held.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Reported-by: Andrey Ryabinin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use rlimit() helper instead of manually writing whole chain from current
task to rlim_cur.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Opasiak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
__list_lru_walk_one() acquires nlru spin lock (nlru->lock) for longer
duration if there are more number of items in the lru list. As per the
current code, it can hold the spin lock for upto maximum UINT_MAX
entries at a time. So if there are more number of items in the lru
list, then "BUG: spinlock lockup suspected" is observed in the below
path:
spin_bug+0x90
do_raw_spin_lock+0xfc
_raw_spin_lock+0x28
list_lru_add+0x28
dput+0x1c8
path_put+0x20
terminate_walk+0x3c
path_lookupat+0x100
filename_lookup+0x6c
user_path_at_empty+0x54
SyS_faccessat+0xd0
el0_svc_naked+0x24
This nlru->lock is acquired by another CPU in this path -
d_lru_shrink_move+0x34
dentry_lru_isolate_shrink+0x48
__list_lru_walk_one.isra.10+0x94
list_lru_walk_node+0x40
shrink_dcache_sb+0x60
do_remount_sb+0xbc
do_emergency_remount+0xb0
process_one_work+0x228
worker_thread+0x2e0
kthread+0xf4
ret_from_fork+0x10
Fix this lockup by reducing the number of entries to be shrinked from
the lru list to 1024 at once. Also, add cond_resched() before
processing the lru list again.
Link: http://marc.info/?t=149722864900001&r=1&w=2
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sahitya Tummala <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Suggested-by: Vladimir Davydov <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Cc: Alexander Polakov <[email protected]>
Cc: Al Viro <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
list_lru_count_node() iterates over all memcgs to get the total number of
entries on the node but it can race with memcg_drain_all_list_lrus(),
which migrates the entries from a dead cgroup to another. This can return
incorrect number of entries from list_lru_count_node().
Fix this by keeping track of entries per node and simply return it in
list_lru_count_node().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sahitya Tummala <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Alexander Polakov <[email protected]>
Cc: Al Viro <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
expand_stack(vma) fails if address < stack_guard_gap even if there is no
vma->vm_prev. I don't think this makes sense, and we didn't do this
before the recent commit 1be7107fbe18 ("mm: larger stack guard gap,
between vmas").
We do not need a gap in this case, any address is fine as long as
security_mmap_addr() doesn't object.
This also simplifies the code, we know that address >= prev->vm_end and
thus underflow is not possible.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Larry Woodman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") has
introduced a regression in some rust and Java environments which are
trying to implement their own stack guard page. They are punching a new
MAP_FIXED mapping inside the existing stack Vma.
This will confuse expand_{downwards,upwards} into thinking that the
stack expansion would in fact get us too close to an existing non-stack
vma which is a correct behavior wrt safety. It is a real regression on
the other hand.
Let's work around the problem by considering PROT_NONE mapping as a part
of the stack. This is a gros hack but overflowing to such a mapping
would trap anyway an we only can hope that usespace knows what it is
doing and handle it propely.
Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Debugged-by: Vlastimil Babka <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Willy Tarreau <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
presently pages in the balloon device have random value, and these pages
will be scanned by ksmd on the host. They usually cannot be merged.
Enqueue zero pages will resolve this problem.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: zhenwei.pi <[email protected]>
Cc: Gioh Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Rafael Aquini <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The align_offset parameter is used by bitmap_find_next_zero_area_off()
to represent the offset of map's base from the previous alignment
boundary; the function ensures that the returned index, plus the
align_offset, honors the specified align_mask.
The logic introduced by commit b5be83e308f7 ("mm: cma: align to physical
address, not CMA region position") has the cma driver calculate the
offset to the *next* alignment boundary. In most cases, the base
alignment is greater than that specified when making allocations,
resulting in a zero offset whether we align up or down. In the example
given with the commit, the base alignment (8MB) was half the requested
alignment (16MB) so the math also happened to work since the offset is
8MB in both directions. However, when requesting allocations with an
alignment greater than twice that of the base, the returned index would
not be correctly aligned.
Also, the align_order arguments of cma_bitmap_aligned_mask() and
cma_bitmap_aligned_offset() should not be negative so the argument type
was made unsigned.
Fixes: b5be83e308f7 ("mm: cma: align to physical address, not CMA region position")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Angus Clark <[email protected]>
Signed-off-by: Doug Berger <[email protected]>
Acked-by: Gregory Fong <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Angus Clark <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Shiraz Hashim <[email protected]>
Cc: Jaewon Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
__remove_zone() sets up up zone_type, but never uses it for anything.
This does not cause a warning, due to the (necessary) use of
-Wno-unused-but-set-variable. However, it's noise, so just delete it.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It seems that there are still people using 32b kernels which a lot of
memory and the IO tend to suck a lot for them by default. Mostly
because writers are throttled too when the lowmem is used. We have
highmem_is_dirtyable to work around that issue but it seems we never
bothered to document it. Let's do it now, finally.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Alkis Georgopoulos <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
wb_stat_sum() disables interrupts and calls __wb_stat_sum() which
eventually calls __percpu_counter_sum(). However, the percpu routine is
already irq-safe. Simplify the code a bit by making wb_stat_sum()
directly call percpu_counter_sum_positive() and not disable interrupts.
Also remove the now-uneeded __wb_stat_sum() which was just a wrapper
over percpu_counter_sum_positive().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nikolay Borisov <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently pg_data_t is just a struct which describes a NUMA node memory
layout. Let's keep the comment simple and remove ambiguity.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nikolay Borisov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
get_cpu_var() disables preemption and returns the per-CPU version of the
variable. Disabling preemption is useful to ensure atomic access to the
variable within the critical section.
In this case however, after the per-CPU version of the variable is
obtained the ->free_lock is acquired. For that reason it seems the raw
accessor could be used. It only seems that ->slots_ret should be
retested (because with disabled preemption this variable can not be set
to NULL otherwise).
This popped up during PREEMPT-RT testing because it tries to take
spinlocks in a preempt disabled section. In RT, spinlocks can sleep.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ying Huang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since current_order starts as MAX_ORDER-1 and is then only decremented,
the second half of the loop condition seems superfluous. However, if
order is 0, we may decrement current_order past 0, making it UINT_MAX.
This is obviously too subtle ([1], [2]).
Since we need to add some comment anyway, change the two variables to
signed, making the counting-down for loop look more familiar, and
apparently also making gcc generate slightly smaller code.
[1] https://lkml.org/lkml/2016/6/20/493
[2] https://lkml.org/lkml/2017/6/19/345
[[email protected]: fix up reject fixupping]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Rasmus Villemoes <[email protected]>
Reported-by: Hao Lee <[email protected]>
Acked-by: Wei Yang <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
we do not hide stack guard page in /proc/<pid>/maps
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vasily Averin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
__register_one_node() initializes local parameters "p_node" & "parent"
for register_node().
But, register_node() does not use them.
Remove the related code of "parent" node, cleanup __register_one_node()
and register_node().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dou Liyang <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
pagetypeinfo_showmixedcount_print is found to take a lot of time to
complete and it does this holding the zone lock and disabling
interrupts. In some cases it is found to take more than a second (On a
2.4GHz,8Gb RAM,arm64 cpu).
Avoid taking the zone lock similar to what is done by read_page_owner,
which means possibility of inaccurate results.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vinayak Menon <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: zhongjiang <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Sudip Mukherjee <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
new_page is yet another duplication of the migration callback which has
to handle hugetlb migration specially. We can safely use the generic
new_page_nodemask for the same purpose.
Please note that gigantic hugetlb pages do not need any special handling
because alloc_huge_page_nodemask will make sure to check pages in all
per node pools. The reason this was done previously was that
alloc_huge_page_node treated NO_NUMA_NODE and a specific node
differently and so alloc_huge_page_node(nid) would check on this
specific node.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reported-by: Vlastimil Babka <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Tested-by: Mike Kravetz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
alloc_huge_page_nodemask tries to allocate from any numa node in the
allowed node mask starting from lower numa nodes. This might lead to
filling up those low NUMA nodes while others are not used. We can
reduce this risk by introducing a concept of the preferred node similar
to what we have in the regular page allocator. We will start allocating
from the preferred nid and then iterate over all allowed nodes in the
zonelist order until we try them all.
This is mimicing the page allocator logic except it operates on per-node
mempools. dequeue_huge_page_vma already does this so distill the
zonelist logic into a more generic dequeue_huge_page_nodemask and use it
in alloc_huge_page_nodemask.
This will allow us to use proper per numa distance fallback also for
alloc_huge_page_node which can use alloc_huge_page_nodemask now and we
can get rid of alloc_huge_page_node helper which doesn't have any user
anymore.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Tested-by: Mike Kravetz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm, hugetlb: allow proper node fallback dequeue".
While working on a hugetlb migration issue addressed in a separate
patchset[1] I have noticed that the hugetlb allocations from the
preallocated pool are quite subotimal.
[1] //lkml.kernel.org/r/[email protected]
There is no fallback mechanism implemented and no notion of preferred
node. I have tried to work around it but Vlastimil was right to push
back for a more robust solution. It seems that such a solution is to
reuse zonelist approach we use for the page alloctor.
This series has 3 patches. The first one tries to make hugetlb
allocation layers more clear. The second one implements the zonelist
hugetlb pool allocation and introduces a preferred node semantic which
is used by the migration callbacks. The last patch is a clean up.
This patch (of 3):
Hugetlb allocation path for fresh huge pages is unnecessarily complex
and it mixes different interfaces between layers.
__alloc_buddy_huge_page is the central place to perform a new
allocation. It checks for the hugetlb overcommit and then relies on
__hugetlb_alloc_buddy_huge_page to invoke the page allocator. This is
all good except that __alloc_buddy_huge_page pushes vma and address down
the callchain and so __hugetlb_alloc_buddy_huge_page has to deal with
two different allocation modes - one for memory policy and other node
specific (or to make it more obscure node non-specific) requests.
This just screams for a reorganization.
This patch pulls out all the vma specific handling up to
__alloc_buddy_huge_page_with_mpol where it belongs.
__alloc_buddy_huge_page will get nodemask argument and
__hugetlb_alloc_buddy_huge_page will become a trivial wrapper over the
page allocator.
In short:
__alloc_buddy_huge_page_with_mpol - memory policy handling
__alloc_buddy_huge_page - overcommit handling and accounting
__hugetlb_alloc_buddy_huge_page - page allocator layer
Also note that __hugetlb_alloc_buddy_huge_page and its cpuset retry loop
is not really needed because the page allocator already handles the
cpusets update.
Finally __hugetlb_alloc_buddy_huge_page had a special case for node
specific allocations (when no policy is applied and there is a node
given). This has relied on __GFP_THISNODE to not fallback to a different
node. alloc_huge_page_node is the only caller which relies on this
behavior so move the __GFP_THISNODE there.
Not only does this remove quite some code it also should make those
layers easier to follow and clear wrt responsibilities.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Tested-by: Mike Kravetz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
During the debugging of the problem described in
https://lkml.org/lkml/2017/5/17/542 and fixed by Tetsuo Handa in
https://lkml.org/lkml/2017/5/19/383 , I've found that the existing debug
output is not really useful to understand issues related to the oom
reaper.
So, I assume, that adding some tracepoints might help with debugging of
similar issues.
Trace the following events:
1) a process is marked as an oom victim,
2) a process is added to the oom reaper list,
3) the oom reaper starts reaping process's mm,
4) the oom reaper finished reaping,
5) the oom reaper skips reaping.
How it works in practice? Below is an example which show how the problem
mentioned above can be found: one process is added twice to the
oom_reaper list:
$ cd /sys/kernel/debug/tracing
$ echo "oom:mark_victim" > set_event
$ echo "oom:wake_reaper" >> set_event
$ echo "oom:skip_task_reaping" >> set_event
$ echo "oom:start_task_reaping" >> set_event
$ echo "oom:finish_task_reaping" >> set_event
$ cat trace_pipe
allocate-502 [001] .... 91.836405: mark_victim: pid=502
allocate-502 [001] .N.. 91.837356: wake_reaper: pid=502
allocate-502 [000] .N.. 91.871149: wake_reaper: pid=502
oom_reaper-23 [000] .... 91.871177: start_task_reaping: pid=502
oom_reaper-23 [000] .N.. 91.879511: finish_task_reaping: pid=502
oom_reaper-23 [000] .... 91.879580: skip_task_reaping: pid=502
Link: http://lkml.kernel.org/r/20170530185231.GA13412@castle
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
MADV_FREE is identical to MADV_DONTNEED from the point of view of uffd
monitor. The monitor has to stop handling #PF events in the range being
freed. We are reusing userfaultfd_remove callback along with the logic
required to re-get and re-validate the VMA which may change or disappear
because userfaultfd_remove releases mmap_sem.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|