Age | Commit message (Collapse) | Author | Files | Lines |
|
Fix the following issues in test_meminit.c:
- |size| in fill_with_garbage_skip() should be signed so that it
doesn't overflow if it's not aligned on sizeof(*p);
- fill_with_garbage_skip() should actually skip |skip| bytes;
- do_kmem_cache_size() should deallocate memory in the RCU case.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 7e659650cbda ("lib: introduce test_meminit module")
Fixes: 94e8988d91c7 ("lib/test_meminit.c: fix -Wmaybe-uninitialized false positive")
Signed-off-by: Alexander Potapenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The conditional logic is too complicated for the compiler to fully
comprehend:
lib/test_meminit.c: In function 'test_meminit_init':
lib/test_meminit.c:236:5: error: 'buf_copy' may be used uninitialized in this function [-Werror=maybe-uninitialized]
kfree(buf_copy);
^~~~~~~~~~~~~~~
lib/test_meminit.c:201:14: note: 'buf_copy' was declared here
Simplify it by splitting out the non-rcu section.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: af734ee6ec85 ("lib: introduce test_meminit module")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Due to some sad limitations in how kerneldoc comments are parsed, the
documentation in lib/string_helpers.c generates these warnings:
lib/string_helpers.c:236: WARNING: Unexpected indentation.
lib/string_helpers.c:241: WARNING: Block quote ends without a blank line; unexpected unindent.
lib/string_helpers.c:446: WARNING: Unexpected indentation.
lib/string_helpers.c:451: WARNING: Block quote ends without a blank line; unexpected unindent.
lib/string_helpers.c:474: WARNING: Unexpected indentation.
Rework the comments to obtain something like the desired result.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Finish up what commit c2febafc6773 ("mm: convert generic code to 5-level
paging") started while levelling up P4D huge mapping support at par with
PUD and PMD. A new arch call back arch_ioremap_p4d_supported() is added
which just maintains status quo (P4D huge map not supported) on x86,
arm64 and powerpc.
When HAVE_ARCH_HUGE_VMAP is enabled its just a simple check from the
arch about the support, hence runtime effects are minimal.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Virtual address alignment is essential in ensuring correct clearing for
all intermediate level pgtable entries and freeing associated pgtable
pages. An unaligned address can end up randomly freeing pgtable page
that potentially still contains valid mappings. Hence also check it's
alignment along with existing phys_addr check.
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add tests for heap and pagealloc initialization. These can be used to
check init_on_alloc and init_on_free implementations as well as other
approaches to initialization.
Expected test output in the case the kernel provides heap initialization
(e.g. when running with either init_on_alloc=1 or init_on_free=1):
test_meminit: all 10 tests in test_pages passed
test_meminit: all 40 tests in test_kvmalloc passed
test_meminit: all 60 tests in test_kmemcache passed
test_meminit: all 10 tests in test_rcu_persistent passed
test_meminit: all 120 tests passed!
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Kostya Serebryany <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Sandeep Patil <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Marco Elver <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This adds __GFP_NOWARN to the kmalloc()-portions of the overflow test to
avoid tainting the kernel. Additionally fixes up the math on wrap size
to be architecture and page size agnostic.
Link: http://lkml.kernel.org/r/201905282012.0A8767E24@keescook
Fixes: ca90800a91ba ("test_overflow: Add memory allocation overflow tests")
Signed-off-by: Kees Cook <[email protected]>
Reported-by: Randy Dunlap <[email protected]>
Suggested-by: Rasmus Villemoes <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make sure that the trailing NUL is considered part of the string and can
be found.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Rosin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If a memsetXX implementation is completely broken and fails in the first
iteration, when i, j, and k are all zero, the failure is masked as zero
is returned. Failing in the first iteration is perhaps the most likely
failure, so this makes the tests pretty much useless. Avoid the
situation by always setting a random unused bit in the result on
failure.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 03270c13c5ff ("lib/string.c: add testcases for memset16/32/64")
Signed-off-by: Peter Rosin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "lib/string: search for NUL with strchr/strnchr".
I noticed an inconsistency where strchr and strnchr do not behave the
same with respect to the trailing NUL. strchr is standardised and the
kernel function conforms, and the kernel relies on the behavior. So,
naturally strchr stays as-is and strnchr is what I change.
While writing a few tests to verify that my new strnchr loop was sane, I
noticed that the tests for memset16/32/64 had a problem. Since it's all
about the lib/string.c file I made a short series of it all...
This patch (of 3):
strchr considers the terminating NUL to be part of the string, and NUL
can thus be searched for with that function. For consistency, do the
same with strnchr.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Rosin <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
list_del() poisoning can generate 2 64-bit immediate loads but it also can
generate one 64-bit immediate load and an addition:
48 b8 00 01 00 00 00 00 ad de movabs rax,0xdead000000000100
48 89 47 58 mov QWORD PTR [rdi+0x58],rax
48 05 00 01 00 00 <=====> add rax,0x100
48 89 47 60 mov QWORD PTR [rdi+0x60],rax
However on x86_64 not all constants are equal: those within [-128, 127]
range can be added with shorter "add r64, imm32" instruction:
48 b8 00 01 00 00 00 00 ad de movabs rax,0xdead000000000100
48 89 47 58 mov QWORD PTR [rdi+0x58],rax
48 83 c0 22 <======> add rax,0x22
48 89 47 60 mov QWORD PTR [rdi+0x60],rax
Patch saves 2 bytes per some LIST_POISON2 usage.
(Slightly disappointing) space savings on F29 x86_64 config:
add/remove: 0/0 grow/shrink: 0/2164 up/down: 0/-5184 (-5184)
Function old new delta
zstd_get_workspace 548 546 -2
...
mlx4_delete_all_resources_for_slave 4826 4804 -22
Total: Before=83304131, After=83298947, chg -0.01%
New constants are:
0xdead000000000100
0xdead000000000122
Note: LIST_POISON1 can't be changed to ...11 because something in page
allocator requires low bit unset.
Link: http://lkml.kernel.org/r/20190513191502.GA8492@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Cc: Vasiliy Kulikov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a command line switch --no-moderated to skip L: mailing lists marked
with 'moderated'.
Some people prefer not emailing moderated mailing lists as the
moderation time can be indeterminate and some emails can be
intentionally dropped by a moderator.
This can cause fragmentation of email threads when some are subscribed
to a moderated list but others are not and emails are dropped.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Joe Perches <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix this compilation warning on x86 by making flush_cache_vmap() inline.
lib/ioremap.c: In function 'ioremap_page_range':
lib/ioremap.c:214:16: warning: variable 'start' set but not used [-Wunused-but-set-variable]
unsigned long start;
^~~~~
While at it, convert all other similar functions to inline for
consistency.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Qian Cai <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
isa_page_to_bus() is deprecated and is no longer used anywhere. Remove
it entirely.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Kitt <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now that BIT() can be used from assembly code, we can safely replace
_BITUL() with equivalent BIT().
UAPI headers are still required to use _BITUL(), but there is no more
reason to use it in kernel headers. BIT() is shorter.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
BIT(), GENMASK(), etc. are useful to define register bits of hardware.
However, low-level code is often written in assembly, where they are
not available due to the hard-coded 1UL, 0UL.
In fact, in-kernel headers such as arch/arm64/include/asm/sysreg.h
use _BITUL() instead of BIT() so that the register bit macros are
available in assembly.
Using macros in include/uapi/linux/const.h have two reasons:
[1] For use in uapi headers
We should use underscore-prefixed variants for user-space.
[2] For use in assembly code
Since _BITUL() uses UL(1) instead of 1UL, it can be used as an
alternative of BIT().
For [2], it is pretty easy to change BIT() etc. for use in assembly.
This allows to replace _BUTUL() in kernel-space headers with BIT().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
fix lenght to length
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Weitao Hou <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Colin Ian King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
inodes.
Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
but this is not a correct behavior for proc. Since sysctl permission
check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more
sense to use these values in u_[ug]id of proc inodes. In other words:
although uid/gid in the inode is not read during test_perm, the inode
logically belongs to the root of the namespace. I have confirmed this
with Eric Biederman at LPC and in this thread:
https://lore.kernel.org/lkml/[email protected]
Consequences
============
Since the i_[ug]id values of proc nodes are not used for permissions
checks, this change usually makes no functional difference. However, it
causes an issue in a setup where:
* a namespace container is created without root user in container -
hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID
* container creator tries to configure it by writing /proc/sys files,
e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit
Kernel does not allow to open an inode for writing if its i_[ug]id are
invalid, making it impossible to write shmmax and thus - configure the
container.
Using a container with no root mapping is apparently rare, but we do use
this configuration at Google. Also, we use a generic tool to configure
the container limits, and the inability to write any of them causes a
failure.
History
=======
The invalid uids/gids in inodes first appeared due to 81754357770e (fs:
Update i_[ug]id_(read|write) to translate relative to s_user_ns).
However, AFAIK, this did not immediately cause any issues. The
inability to write to these "invalid" inodes was only caused by a later
commit 0bd23d09b874 (vfs: Don't modify inodes with a uid or gid unknown
to the vfs).
Tested: Used a repro program that creates a user namespace without any
mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
Before the change, it shows the overflow uid, with the change it's 0.
The overflow uid indicates that the uid in the inode is not correct and
thus it is not possible to open the file for writing.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 0bd23d09b874 ("vfs: Don't modify inodes with a uid or gid unknown to the vfs")
Signed-off-by: Radoslaw Burny <[email protected]>
Acked-by: Luis Chamberlain <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Seth Forshee <[email protected]>
Cc: John Sperbeck <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: <[email protected]> [4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I thought that /proc/sysvipc has the same bug as /proc/net
commit 1fde6f21d90f8ba5da3cb9c54ca991ed72696c43
proc: fix /proc/net/* after setns(2)
However, it doesn't! /proc/sysvipc files do
get_ipc_ns(current->nsproxy->ipc_ns);
in their open() hook and avoid the problem.
Keep the test, maybe /proc/sysvipc will become broken someday :-\
Link: http://lkml.kernel.org/r/20190706180146.GA21015@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Don't repeat function signatures twice.
This is a kind-of-precursor for "struct proc_ops".
Note:
typeof(pde->proc_fops->...) ...;
can't be used because ->proc_fops is "const struct file_operations *".
"const" prevents assignment down the code and it can't be deleted in the
type system.
Link: http://lkml.kernel.org/r/20190529191110.GB5703@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add typeof_member() macro so that types can be extracted without
introducing dummy variables.
Link: http://lkml.kernel.org/r/20190529190720.GA5703@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit 2724273e8fd0 ("vmcore: add API to collect hardware dump in
second kernel"), drivers are allowed to add device related dump data to
vmcore as they want by using the device dump API. This has a potential
issue, the data is stored in memory, drivers may append too much data
and use too much memory. The vmcore is typically used in a kdump kernel
which runs in a pre-reserved small chunk of memory. So as a result it
will make kdump unusable at all due to OOM issues.
So introduce new 'novmcoredd' command line option. User can disable
device dump to reduce memory usage. This is helpful if device dump is
using too much memory, disabling device dump could make sure a regular
vmcore without device dump data is still available.
[[email protected]: tweak documentation]
[[email protected]: vmcore.c needs moduleparam.h]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kairui Song <[email protected]>
Acked-by: Dave Young <[email protected]>
Reviewed-by: Bhupesh Sharma <[email protected]>
Cc: Rahul Lakkireddy <[email protected]>
Cc: "David S . Miller" <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
ffffffffff600000" dmesg spam
Test tries to access vsyscall page and if it doesn't exist gets SIGSEGV
which can spam into dmesg. However the segfault happens by design.
Handle it and carry information via exit code to parent.
Link: http://lkml.kernel.org/r/20190524181256.GA2260@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The whole header file deals with swap entries and PTEs, none of which
can exist for nommu builds. The current nommu ports have lots of stubs
to allow the inline functions in swapops.h to compile, but as none of
this functionality is actually used there is no point in even providing
it. This way we don't have to provide the stubs for the upcoming RISC-V
nommu port, and can eventually remove it from the existing ports.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Vladimir Murzin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Vladimir Murzin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We can't expose UAPI symbols differently based on CONFIG_ symbols, as
userspace won't have them available. Instead always define the flag,
but only respect it based on the config option.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Vladimir Murzin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The description of cma_declare_contiguous() indicates that if the
'fixed' argument is true the reserved contiguous area must be exactly at
the address of the 'base' argument.
However, the function currently allows the 'base', 'size', and 'limit'
arguments to be silently adjusted to meet alignment constraints. This
commit enforces the documented behavior through explicit checks that
return an error if the region does not fit within a specified region.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 5ea3b1b2f8ad ("cma: add placement specifier for "cma=" kernel parameter")
Signed-off-by: Doug Berger <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Cc: Yue Hu <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Peng Fan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
z3fold_page_migration() calls memcpy(new_zhdr, zhdr, PAGE_SIZE).
However, zhdr contains fields that can't be directly coppied over (ex:
list_head, a circular linked list). We only need to initialize the
linked lists in new_zhdr, as z3fold_isolate_page() already ensures that
these lists are empty
Additionally it is possible that zhdr->work has been placed in a
workqueue. In this case we shouldn't migrate the page, as zhdr->work
references zhdr as opposed to new_zhdr.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 1f862989b04ade61d3 ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Vitaly Vul <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Jonathan Adams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
z3fold_page_migrate() will never succeed because it attempts to acquire
a lock that has already been taken by migrate.c in __unmap_and_move().
__unmap_and_move() migrate.c
trylock_page(oldpage)
move_to_new_page(oldpage_newpage)
a_ops->migrate_page(oldpage, newpage)
z3fold_page_migrate(oldpage, newpage)
trylock_page(oldpage)
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Vitaly Vul <[email protected]>
Cc: Jonathan Adams <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Snild Dolkow <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Six sites are presently altering current->reclaim_state. There is a
risk that one function stomps on a caller's value. Use a helper
function to catch such errors.
Cc: Yafang Shao <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are six different reclaim paths by now:
- kswapd reclaim path
- node reclaim path
- hibernate preallocate memory reclaim path
- direct reclaim path
- memcg reclaim path
- memcg softlimit reclaim path
The slab caches reclaimed in these paths are only calculated in the
above three paths.
There're some drawbacks if we don't calculate the reclaimed slab caches.
- The sc->nr_reclaimed isn't correct if there're some slab caches
relcaimed in this path.
- The slab caches may be reclaimed thoroughly if there're lots of
reclaimable slab caches and few page caches.
Let's take an easy example for this case. If one memcg is full of
slab caches and the limit of it is 512M, in other words there're
approximately 512M slab caches in this memcg. Then the limit of the
memcg is reached and the memcg reclaim begins, and then in this memcg
reclaim path it will continuesly reclaim the slab caches until the
sc->priority drops to 0. After this reclaim stops, you will find
there're few slab caches left, which is less than 20M in my test
case. While after this patch applied the number is greater than 300M
and the sc->priority only drops to 3.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/vmscan: calculate reclaimed slab in all reclaim paths".
This patchset is to fix the issues in doing shrink slab.
There're six different reclaim paths by now,
- kswapd reclaim path
- node reclaim path
- hibernate preallocate memory reclaim path
- direct reclaim path
- memcg reclaim path
- memcg softlimit reclaim path
The slab caches reclaimed in these paths are only calculated in the
above three paths. The issues are detailed explained in patch #2. We
should calculate the reclaimed slab caches in every reclaim path. In
order to do it, the struct reclaim_state is placed into the struct
shrink_control.
In node reclaim path, there'is another issue about shrinking slab, which
is adressed in "mm/vmscan: shrink slab in node reclaim"
(https://lore.kernel.org/linux-mm/[email protected]/).
This patch (of 2):
The struct reclaim_state is used to record how many slab caches are
reclaimed in one reclaim path. The struct shrink_control is used to
control one reclaim path. So we'd better put reclaim_state into
shrink_control.
[[email protected]: remove reclaim_state assignment from __perform_reclaim()]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Kirill Tkhai <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After commit 815744d75152 ("mm: memcontrol: don't batch updates of local
VM stats and events"), the local VM counter are not in sync with the
hierarchical ones.
Below is one example in a leaf memcg on my server (with 8 CPUs):
inactive_file 3567570944
total_inactive_file 3568029696
We find that the deviation is very great because the 'val' in
__mod_memcg_state() is in pages while the effective value in
memcg_stat_show() is in bytes.
So the maximum of this deviation between local VM stats and total VM
stats can be (32 * number_of_cpu * PAGE_SIZE), that may be an
unacceptably great value.
We should keep the local VM stats in sync with the total stats. In
order to keep this behavior the same across counters, this patch updates
__mod_lruvec_state() and __count_memcg_events() as well.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Yafang Shao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
One of the gfp flags used to show that a page is movable is
__GFP_HIGHMEM. Currently z3fold_alloc() fails when __GFP_HIGHMEM is
passed. Now that z3fold pages are movable, we allow __GFP_HIGHMEM. We
strip the movability related flags from the call to kmem_cache_alloc()
for our slots since it is a kernel allocation.
[[email protected]: coding-style fixes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Henry Burns <[email protected]>
Acked-by: Vitaly Wool <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A comment referred to a non-existent function alloc_cma(), which should
have been cma_alloc().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryohei Suzuki <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Clang gets rather confused about two variables in the same special
section when one of them is not initialized, leading to an assembler
warning later:
/tmp/slab_common-18f869.s: Assembler messages:
/tmp/slab_common-18f869.s:7526: Warning: ignoring changed section attributes for .data..ro_after_init
Adding an initialization to kmalloc_caches is rather silly here
but does avoid the issue.
Link: https://bugs.llvm.org/show_bug.cgi?id=42570
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The mpi library contains some rather old inline assembly statements that
produce a lot of warnings for 32-bit x86, such as:
lib/mpi/mpih-div.c:76:16: error: invalid use of a cast in a inline asm context requiring an l-value: remove the cast or build with -fheinous-gnu-extensions
udiv_qrnnd(qp[i], n1, n1, np[i], d);
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
lib/mpi/longlong.h:423:20: note: expanded from macro 'udiv_qrnnd'
: "=a" ((USItype)(q)), \
~~~~~~~~~~^~
There is no point in doing a type cast for the output of an inline
assembler statement, so just remove the cast here, as we have done for
other architectures in the past.
See also dea632cadd12 ("lib/mpi: fix build with clang").
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Cc: Stefan Agner <[email protected]>
Cc: Dmitry Kasatkin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When CONFIG_SYSFS is disabled but CONFIG_TMPFS is enabled, we get a
warning about shmem_parse_huge() never being called:
mm/shmem.c:417:12: error: unused function 'shmem_parse_huge' [-Werror,-Wunused-function]
static int shmem_parse_huge(const char *str)
Change the #ifdef so we no longer build this function in that configuration.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 144df3b288c4 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Howells <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Vineeth Remanan Pillai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As reported by Henry Burns:
Running z3fold stress testing with address sanitization showed zhdr->slots
was being used after it was freed.
z3fold_free(z3fold_pool, handle)
free_handle(handle)
kmem_cache_free(pool->c_handle, zhdr->slots)
release_z3fold_page_locked_list(kref)
__release_z3fold_page(zhdr, true)
zhdr_to_pool(zhdr)
slots_to_pool(zhdr->slots) *BOOM*
To fix this, add pointer to the pool back to z3fold_header and modify
zhdr_to_pool to return zhdr->pool.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 7c2b8baa61fe ("mm/z3fold.c: add structure for buddy handles")
Signed-off-by: Vitaly Wool <[email protected]>
Reported-by: Henry Burns <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Jonathan Adams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Two small fixes.
This is just a fix for an unused value that Colin King sent me and a
related fix I added"
* tag 'for-linus-5.3-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: eliminate needless variable assignments
orangefs: remove redundant assignment to variable buffer_index
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Highlights:
- chunks that have been trimmed and unchanged since last mount are
tracked and skipped on repeated trims
- use hw assissed crc32c on more arches, speedups if native
instructions or optimized implementation is available
- the RAID56 incompat bit is automatically removed when the last
block group of that type is removed
Fixes:
- fsync fix for reflink on NODATACOW files that could lead to ENOSPC
- fix data loss after inode eviction, renaming it, and fsync it
- fix fsync not persisting dentry deletions due to inode evictions
- update ctime/mtime/iversion after hole punching
- fix compression type validation (reported by KASAN)
- send won't be allowed to start when relocation is in progress, this
can cause spurious errors or produce incorrect send stream
Core:
- new tracepoints for space update
- tree-checker: better check for end of extents for some tree items
- preparatory work for more checksum algorithms
- run delayed iput at unlink time and don't push the work to cleaner
thread where it's not properly throttled
- wrap block mapping to structures and helpers, base for further
refactoring
- split large files, part 1:
- space info handling
- block group reservations
- delayed refs
- delayed allocation
- other cleanups and refactoring"
* tag 'for-5.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (103 commits)
btrfs: fix memory leak of path on error return path
btrfs: move the subvolume reservation stuff out of extent-tree.c
btrfs: migrate the delalloc space stuff to it's own home
btrfs: migrate btrfs_trans_release_chunk_metadata
btrfs: migrate the delayed refs rsv code
btrfs: Evaluate io_tree in find_lock_delalloc_range()
btrfs: migrate the global_block_rsv helpers to block-rsv.c
btrfs: migrate the block-rsv code to block-rsv.c
btrfs: stop using block_rsv_release_bytes everywhere
btrfs: cleanup the target logic in __btrfs_block_rsv_release
btrfs: export __btrfs_block_rsv_release
btrfs: export btrfs_block_rsv_add_bytes
btrfs: move btrfs_block_rsv definitions into it's own header
btrfs: Simplify update of space_info in __reserve_metadata_bytes()
btrfs: unexport can_overcommit
btrfs: move reserve_metadata_bytes and supporting code to space-info.c
btrfs: move dump_space_info to space-info.c
btrfs: export block_rsv_use_bytes
btrfs: move btrfs_space_info_add_*_bytes to space-info.c
btrfs: move the space info update macro to space-info.h
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
- long due rewrite of do_page_fault
- refactoring of entry/exit code to utilize the double load/store
instructions
- hsdk platform updates
* tag 'arc-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [plat-hsdk]: Enable AXI DW DMAC in defconfig
ARC: [plat-hsdk]: enable DW SPI controller
ARC: hide unused function unw_hdr_alloc
ARC: [haps] Add Virtio support
ARCv2: entry: simplify return to Delay Slot via interrupt
ARC: entry: EV_Trap expects r10 (vs. r9) to have exception cause
ARCv2: entry: rewrite to enable use of double load/stores LDD/STD
ARCv2: entry: avoid a branch
ARCv2: entry: push out the Z flag unclobber from common EXCEPTION_PROLOGUE
ARCv2: entry: comments about hardware auto-save on taken interrupts
ARC: mm: do_page_fault refactor #8: release mmap_sem sooner
ARC: mm: do_page_fault refactor #7: fold the various error handling
ARC: mm: do_page_fault refactor #6: error handlers to use same pattern
ARC: mm: do_page_fault refactor #5: scoot no_context to end
ARC: mm: do_page_fault refactor #4: consolidate retry related logic
ARC: mm: do_page_fault refactor #3: tidyup vma access permission code
ARC: mm: do_page_fault refactor #2: remove short lived variable
ARC: mm: do_page_fault refactor #1: remove label @good_area
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull rst conversion of docs from Mauro Carvalho Chehab:
"As agreed with Jon, I'm sending this big series directly to you, c/c
him, as this series required a special care, in order to avoid
conflicts with other trees"
* tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits)
docs: kbuild: fix build with pdf and fix some minor issues
docs: block: fix pdf output
docs: arm: fix a breakage with pdf output
docs: don't use nested tables
docs: gpio: add sysfs interface to the admin-guide
docs: locking: add it to the main index
docs: add some directories to the main documentation index
docs: add SPDX tags to new index files
docs: add a memory-devices subdir to driver-api
docs: phy: place documentation under driver-api
docs: serial: move it to the driver-api
docs: driver-api: add remaining converted dirs to it
docs: driver-api: add xilinx driver API documentation
docs: driver-api: add a series of orphaned documents
docs: admin-guide: add a series of orphaned documents
docs: cgroup-v1: add it to the admin-guide book
docs: aoe: add it to the driver-api book
docs: add some documentation dirs to the driver-api book
docs: driver-model: move it to the driver-api book
docs: lp855x-driver.rst: add it to the driver-api book
...
|
|
Pull Xtensa updates from Max Filippov:
- clean up PCI support code
- add defconfig and DTS for the 'virt' board
- abstract 'entry' and 'retw' uses in xtensa assembly in preparation
for XEA3/NX pipeline support
- random small cleanups
* tag 'xtensa-20190715' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: virt: add defconfig and DTS
xtensa: abstract 'entry' and 'retw' in assembly code
xtensa: One function call less in bootmem_init()
xtensa: remove arch/xtensa/include/asm/types.h
xtensa: use generic pcibios_set_master and pcibios_enable_device
xtensa: drop dead PCI support code
xtensa/PCI: Remove unused variable
|
|
Pull safesetid updates from Micah Morton:
"These changes from Jann Horn fix a couple issues in the recently added
SafeSetID LSM:
- There was a simple logic bug in one of the hooks for the LSM where
the code was incorrectly returning early in some cases before all
security checks had been passed.
- There was a more high level issue with how this LSM gets configured
that could allow for a program to bypass the security restrictions
by switching to an allowed UID and then again to any other UID on
the system if the target UID of the first transition is
unconstrained on the system. Luckily this is an easy fix that we
now enforce at the time the LSM gets configured.
There are also some changes from Jann that make policy updates for
this LSM atomic. Kees Cook, Jann and myself have reviewed these
changes and they look good from our point of view"
* tag 'safesetid-5.3' of git://github.com/micah-morton/linux:
LSM: SafeSetID: fix use of literal -1 in capable hook
LSM: SafeSetID: verify transitive constrainedness
LSM: SafeSetID: add read handler
LSM: SafeSetID: rewrite userspace API to atomic updates
LSM: SafeSetID: fix userns handling in securityfs
LSM: SafeSetID: refactor policy parsing
LSM: SafeSetID: refactor safesetid_security_capable()
LSM: SafeSetID: refactor policy hash table
LSM: SafeSetID: fix check for setresuid(new1, new2, new3)
LSM: SafeSetID: fix pr_warn() to include newline
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd and clone3 fixes from Christian Brauner:
"This contains a bugfix for CLONE_PIDFD when used with the legacy clone
syscall, two fixes to ensure that syscall numbering and clone3
entrypoint implementations will stay consistent, and an update for the
maintainers file:
- The addition of clone3 broke CLONE_PIDFD for legacy clone on all
architectures that use do_fork() directly instead of calling the
clone syscall itself. (Fwiw, cleaning do_fork() up is on my todo.)
The reason this happened was that during conversion of _do_fork()
to use struct kernel_clone_args we missed that do_fork() is called
directly by various architectures. This is fixed by making sure
that the pidfd argument in struct kernel_clone_args is correctly
initialized with the parent_tidptr argument passed down from
do_fork(). Additionally, do_fork() missed a check to make
CLONE_PIDFD and CLONE_PARENT_SETTID mutually exclusive just a
clone() does. This is now fixed too.
- When clone3() was introduced we skipped architectures that require
special handling for fork-like syscalls. Their syscall tables did
not contain any mention of clone3().
To make sure that Arnd's work to make syscall numbers on all
architectures identical (minus alpha) was not for naught we are
placing a comment in all syscall tables that do not yet implement
clone3(). The comment makes it clear that 435 is reserved for
clone3 and should not be used.
- Also, this contains a patch to make the clone3() syscall definition
in asm-generic/unist.h conditional on __ARCH_WANT_SYS_CLONE3. This
lets us catch new architectures that implicitly make use of clone3
without setting __ARCH_WANT_SYS_CLONE3 which is a good indicator
that they did not check whether it needs special treatment or not.
- Finally, this contains a patch to add me as maintainer for pidfd
stuff so people can start blaming me (more)"
* tag 'for-linus-20190715' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
MAINTAINERS: add new entry for pidfd api
unistd: protect clone3 via __ARCH_WANT_SYS_CLONE3
arch: mark syscall number 435 reserved for clone3
clone: fix CLONE_PIDFD support
|
|
This fixes two problems reported with the cmdline simplification and
cleanup last year:
- the setproctitle() special cases didn't quite match the original
semantics, and it can be noticeable:
https://lore.kernel.org/lkml/[email protected]/
- it could leak an uninitialized byte from the temporary buffer under
the right (wrong) circustances:
https://lore.kernel.org/lkml/[email protected]/
It rewrites the logic entirely, splitting it into two separate commits
(and two separate functions) for the two different cases ("unedited
cmdline" vs "setproctitle() has been used to change the command line").
* proc-cmdline:
/proc/<pid>/cmdline: add back the setproctitle() special case
/proc/<pid>/cmdline: remove all the special cases
|
|
This makes the setproctitle() special case very explicit indeed, and
handles it with a separate helper function entirely. In the process, it
re-instates the original semantics of simply stopping at the first NUL
character when the original last NUL character is no longer there.
[ The original semantics can still be seen in mm/util.c: get_cmdline()
that is limited to a fixed-size buffer ]
This makes the logic about when we use the string lengths etc much more
obvious, and makes it easier to see what we do and what the two very
different cases are.
Note that even when we allow walking past the end of the argument array
(because the setproctitle() might have overwritten and overflowed the
original argv[] strings), we only allow it when it overflows into the
environment region if it is immediately adjacent.
[ Fixed for missing 'count' checks noted by Alexey Izbyshev ]
Link: https://lore.kernel.org/lkml/[email protected]/
Fixes: 5ab827189965 ("fs/proc: simplify and clarify get_mm_cmdline() function")
Cc: Jakub Jankowski <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Alexey Izbyshev <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Start off with a clean slate that only reads exactly from arg_start to
arg_end, without any oddities. This simplifies the code and in the
process removes the case that caused us to potentially leak an
uninitialized byte from the temporary kernel buffer.
Note that in order to start from scratch with an understandable base,
this simplifies things _too_ much, and removes all the legacy logic to
handle setproctitle() having changed the argument strings.
We'll add back those special cases very differently in the next commit.
Link: https://lore.kernel.org/lkml/[email protected]/
Fixes: f5b65348fd77 ("proc: fix missing final NUL in get_mm_cmdline() rewrite")
Cc: Alexey Izbyshev <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"New Functionality:
- Provide support for ACPI enumeration; gpio_backlight
Fix-ups:
- SPDX fixups; pwm_bl
- Fix linear brightness levels to include number available; pwm_bl"
* tag 'backlight-next-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: pwm_bl: Fix heuristic to determine number of brightness levels
backlight: gpio_backlight: Enable ACPI enumeration
backlight: pwm_bl: Convert to use SPDX identifier
|