Age | Commit message (Collapse) | Author | Files | Lines |
|
Starting from v4.19 commit 29ef680ae7c2 ("memcg, oom: move out_of_memory
back to the charge path") cgroup oom killer is no longer invoked only
from page faults. Now it implements the same semantics as global OOM
killer: allocation context invokes OOM killer and keeps retrying until
success.
[[email protected]: fixes per Randy]
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Randy Dunlap <[email protected]>
Link: http://lkml.kernel.org/r/158894738928.208854.5244393925922074518.stgit@buzz
Signed-off-by: Linus Torvalds <[email protected]>
|
|
flush_icache_range generally operates on kernel addresses, but for some
reason m68k needed a set_fs override. Move that into the m68k code
insted of keeping it in the module loader.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Jessica Yu <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
These obviously operate on user addresses.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
load_flat_file works on user addresses.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Cc: Alexander Viro <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
read_code operates on user addresses.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Only build read_code when binary formats that use it are built into the
kernel.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rename the current flush_icache_range to flush_icache_user_range as per
commit ae92ef8a4424 ("PATCH] flush icache in correct context") there
seems to be an assumption that it operates on user addresses. Add a
flush_icache_range around it that for now is a no-op.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
flush_icache_user_range will be the name for a generic primitive. Move
the arm name so that arm already has an implementation.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Russell King <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The Xtensa implementation of flush_icache_range seems to be able to cope
with user addresses. Just define flush_icache_user_range to
flush_icache_range.
[[email protected]: fix flush_icache_user_range in noMMU configs]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Max Filippov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Max Filippov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The SuperH implementation of flush_icache_range seems to be able to cope
with user addresses. Just define flush_icache_user_range to
flush_icache_range.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Define flush_icache_user_range to flush_icache_range unless the
architecture provides its own implementation.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The function currently known as flush_icache_user_range only operates on
a single page. Rename it to flush_icache_user_page as we'll need the
name flush_icache_user_range for something else soon.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
flush_icache_user_range is only used by <asm-generic/cacheflush.h>, so
remove it from the architectures that implement it, but don't use
<asm-generic/cacheflush.h>.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Russell King <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guan Xuetao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
RISC-V needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Also remove the pointless __KERNEL__ ifdef while we're at it.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Power needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Also remove the pointless __KERNEL__ ifdef while we're at it.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
OpenRISC needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Stafford Horne <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
m68knommu needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Microblaze needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Michal Simek <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
IA64 needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Hexagon needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Brian Cain <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
C6x needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Mark Salter <[email protected]>
Cc: Aurelien Jacquiot <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
ARM64 needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Alpha needs almost no cache flushing routines of its own. Rely on
asm-generic/cacheflush.h for the defaults.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is a magic ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE cpp symbol that
guards non-stub availability of flush_dcache_pagge. Use that to check
if flush_dcache_pagg is implemented.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This seems to lead to some crazy include loops when using
asm-generic/cacheflush.h on more architectures, so leave it to the arch
header for now.
[[email protected]: fix warning]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
cacheflush.h uses a somewhat to generic include guard name that clashes
with various arch files. Use a more specific one.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
flush_cache_user_range is an ARMism not used by any generic or unicore32
specific code.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Guan Xuetao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
flush_icache_user_range is only used by copy_to_user_page, which is only
used by core VM code.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
flush_icache_page is only used by mm/memory.c.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "sort out the flush_icache_range mess", v2.
flush_icache_range is mostly used for kernel address, except for the
following cases:
- the nommu brk and mmap implementations
- the read_code helper that is only used for binfmt_flat,
binfmt_elf_fdpic, and binfmt_aout including the broken
ia32 compat version
- binfmt_flat itself
none of which really are used by a typical MMU enabled kernel, as a.out
can only be build for alpha and m68k to start with.
But strangely enough commit ae92ef8a4424 ("PATCH] flush icache in
correct context") added a "set_fs(KERNEL_DS)" around the
flush_icache_range call in the module loader, because apparently m68k
assumed user pointers.
This series first cleans up the cacheflush implementations, largely by
switching as much as possible to the asm-generic version after a few
preparations, then moves the misnamed current flush_icache_user_range to
a new name, to finally introduce a real flush_icache_user_range to be
used for the above use cases to flush the instruction cache for a
userspace address range. The last patch then drops the set_fs in the
module code and moves it into the m68k implementation.
This patch (of 29):
The arguments passed look bogus, try to fix them to something that seems
to make sense.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Roman Zippel <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Aurelien Jacquiot <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This code was using get_user_pages*(), in approximately a "Case 5"
scenario (accessing the data within a page), using the categorization
from [1]. That means that it's time to convert the get_user_pages*() +
put_page() calls to pin_user_pages*() + unpin_user_pages() calls.
There is some helpful background in [2]: basically, this is a small part
of fixing a long-standing disconnect between pinning pages, and file
systems' use of those pages.
[1] Documentation/core-api/pin_user_pages.rst
[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Pankaj Gupta <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "vhost, docs: convert to pin_user_pages(), new "case 5""
It recently became clear to me that there are some get_user_pages*()
callers that don't fit neatly into any of the four cases that are so far
listed in pin_user_pages.rst. vhost.c is one of those.
Add a Case 5 to the documentation, and refer to that when converting
vhost.c.
Thanks to Jan Kara for helping me (again) in understanding the
interaction between get_user_pages() and page writeback [1].
This is based on today's mmotm, which has a nearby patch to
pin_user_pages.rst that rewords cases 3 and 4.
Note that I have only compile-tested the vhost.c patch, although that
does also include cross-compiling for a few other arches. Any run-time
testing would be greatly appreciated.
[1] https://lore.kernel.org/r/[email protected]
This patch (of 2):
There are four cases listed in pin_user_pages.rst. These are intended
to help developers figure out whether to use get_user_pages*(), or
pin_user_pages*(). However, the four cases do not cover all the
situations. For example, drivers/vhost/vhost.c has a "pin, write to
page, set page dirty, unpin" case.
Add a fifth case, to help explain that there is a general pattern that
requires pin_user_pages*() API calls.
[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
All of the pin_user_pages*() API calls will cause pages to be
dma-pinned. As such, they are all suitable for either DMA, RDMA, and/or
Direct IO.
The documentation should say so, but it was instead saying that three of
the API calls were only suitable for Direct IO. This was discovered
when a reviewer wondered why an API call that specifically recommended
against Case 2 (DMA/RDMA) was being used in a DMA situation [1].
Fix this by simply deleting those claims. The gup.c comments already
refer to the more extensive Documentation/core-api/pin_user_pages.rst,
which does have the correct guidance. So let's just write it once,
there.
[1] https://lore.kernel.org/r/20200529074658.GM30374@kadam
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Pankaj Gupta <[email protected]>
Acked-by: Souptick Joarder <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This code was using get_user_pages*(), and all of the callers so far
were in a "Case 2" scenario (DMA/RDMA), using the categorization from [1].
That means that it's time to convert the get_user_pages*() + put_page()
calls to pin_user_pages*() + unpin_user_pages() calls.
There is some helpful background in [2]: basically, this is a small part
of fixing a long-standing disconnect between pinning pages, and file
systems' use of those pages.
[1] Documentation/core-api/pin_user_pages.rst
[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Souptick Joarder <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/gup: introduce pin_user_pages_locked(), use it in frame_vector.c", v2.
This adds yet one more pin_user_pages*() variant, and uses that to
convert mm/frame_vector.c.
With this, along with maybe 20 or 30 other recent patches in various
trees, we are close to having the relevant gup call sites
converted--with the notable exception of the bio/block layer.
This patch (of 2):
Introduce pin_user_pages_locked(), which is nearly identical to
get_user_pages_locked() except that it sets FOLL_PIN and rejects
FOLL_GET.
As with other pairs of get_user_pages*() and pin_user_pages() API calls,
it's prudent to assert that FOLL_PIN is *not* set in the
get_user_pages*() call, so add that as part of this.
[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Pankaj Gupta <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Souptick Joarder <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update case 3 so that it covers the use of mmu notifiers, for hardware
that does, or does not have replayable page faults.
Also, elaborate case 4 slightly, as it was quite cryptic.
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
API __get_user_pages_fast() renamed to get_user_pages_fast_only() to
align with pin_user_pages_fast_only().
As part of this we will get rid of write parameter. Instead caller will
pass FOLL_WRITE to get_user_pages_fast_only(). This will not change any
existing functionality of the API.
All the callers are changed to pass FOLL_WRITE.
Also introduce get_user_page_fast_only(), and use it in a few places
that hard-code nr_pages to 1.
Updated the documentation of the API.
Signed-off-by: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Reviewed-by: Paul Mackerras <[email protected]> [arch/powerpc/kvm]
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Michal Suchanek <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Users with SYS_ADMIN capability can add arbitrary taint flags to the
running kernel by writing to /proc/sys/kernel/tainted or issuing the
command 'sysctl -w kernel.tainted=...'. This interface, however, is
open for any integer value and this might cause an invalid set of flags
being committed to the tainted_mask bitset.
This patch introduces a simple way for proc_taint() to ignore any
eventual invalid bit coming from the user input before committing those
bits to the kernel tainted_mask.
Signed-off-by: Rafael Aquini <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Usually when the kernel reaches an oops condition, it's a point of no
return; in case not enough debug information is available in the kernel
splat, one of the last resorts would be to collect a kernel crash dump
and analyze it. The problem with this approach is that in order to
collect the dump, a panic is required (to kexec-load the crash kernel).
When in an environment of multiple virtual machines, users may prefer to
try living with the oops, at least until being able to properly shutdown
their VMs / finish their important tasks.
This patch implements a way to collect a bit more debug details when an
oops event is reached, by printing all the CPUs backtraces through the
usage of NMIs (on architectures that support that). The sysctl added
(and documented) here was called "oops_all_cpu_backtrace", and when set
will (as the name suggests) dump all CPUs backtraces.
Far from ideal, this may be the last option though for users that for
some reason cannot panic on oops. Most of times oopses are clear enough
to indicate the kernel portion that must be investigated, but in virtual
environments it's possible to observe hypervisor/KVM issues that could
lead to oopses shown in other guests CPUs (like virtual APIC crashes).
This patch hence aims to help debug such complex issues without
resorting to kdump.
Signed-off-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
detected
Commit 401c636a0eeb ("kernel/hung_task.c: show all hung tasks before
panic") introduced a change in that we started to show all CPUs
backtraces when a hung task is detected _and_ the sysctl/kernel
parameter "hung_task_panic" is set. The idea is good, because usually
when observing deadlocks (that may lead to hung tasks), the culprit is
another task holding a lock and not necessarily the task detected as
hung.
The problem with this approach is that dumping backtraces is a slightly
expensive task, specially printing that on console (and specially in
many CPU machines, as servers commonly found nowadays). So, users that
plan to collect a kdump to investigate the hung tasks and narrow down
the deadlock definitely don't need the CPUs backtrace on dmesg/console,
which will delay the panic and pollute the log (crash tool would easily
grab all CPUs traces with 'bt -a' command).
Also, there's the reciprocal scenario: some users may be interested in
seeing the CPUs backtraces but not have the system panic when a hung
task is detected. The current approach hence is almost as embedding a
policy in the kernel, by forcing the CPUs backtraces' dump (only) on
hung_task_panic.
This patch decouples the panic event on hung task from the CPUs
backtraces dump, by creating (and documenting) a new sysctl called
"hung_task_all_cpu_backtrace", analog to the approach taken on soft/hard
lockups, that have both a panic and an "all_cpu_backtrace" sysctl to
allow individual control. The new mechanism for dumping the CPUs
backtraces on hung task detection respects "hung_task_warnings" by not
dumping the traces in case there's no warnings left.
Signed-off-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After a recent change introduced by Vlastimil's series [0], kernel is
able now to handle sysctl parameters on kernel command line; also, the
series introduced a simple infrastructure to convert legacy boot
parameters (that duplicate sysctls) into sysctl aliases.
This patch converts the watchdog parameters softlockup_panic and
{hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure.
It fixes the documentation too, since the alias only accepts values 0 or
1, not the full range of integers.
We also took the opportunity here to improve the documentation of the
previously converted hung_task_panic (see the patch series [0]) and put
the alias table in alphabetical order.
[0] http://lkml.kernel.org/r/[email protected]
Signed-off-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Testing is done by a new parameter debug.test_sysctl.boot_int which
defaults to 0 and it's expected that the tester passes a boot parameter
that sets it to 1. The test checks if it's set to 1.
To distinguish true failure from parameter not being set, the test
checks /proc/cmdline for the expected parameter, and whether test_sysctl
is built-in and not a module.
[[email protected]: skip the new test if boot_int sysctl is not present]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Guilherme G . Piccoli" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Ivan Teterevkov <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The testing script recommends CONFIG_TEST_SYSCTL=y, but actually only
works with CONFIG_TEST_SYSCTL=m. Testing of sysctl setting via boot
param however requires the test to be built-in, so make sure the test
script supports it.
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Luis Chamberlain <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Guilherme G . Piccoli" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Ivan Teterevkov <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We can now handle sysctl parameters on kernel command line and have
infrastructure to convert legacy command line options that duplicate
sysctl to become a sysctl alias.
This patch converts the hung_task_panic parameter. Note that the sysctl
handler is more strict and allows only 0 and 1, while the legacy
parameter allowed any non-zero value. But there is little reason anyone
would not be using 1.
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Guilherme G . Piccoli" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Ivan Teterevkov <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We can now handle sysctl parameters on kernel command line, but
historically some parameters introduced their own command line
equivalent, which we don't want to remove for compatibility reasons.
We can, however, convert them to the generic infrastructure with a table
translating the legacy command line parameters to their sysctl names,
and removing the one-off param handlers.
This patch adds the support and makes the first conversion to
demonstrate it, on the (deprecated) numa_zonelist_order parameter.
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Guilherme G . Piccoli" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Ivan Teterevkov <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "support setting sysctl parameters from kernel command line", v3.
This series adds support for something that seems like many people
always wanted but nobody added it yet, so here's the ability to set
sysctl parameters via kernel command line options in the form of
sysctl.vm.something=1
The important part is Patch 1. The second, not so important part is an
attempt to clean up legacy one-off parameters that do the same thing as
a sysctl. I don't want to remove them completely for compatibility
reasons, but with generic sysctl support the idea is to remove the
one-off param handlers and treat the parameters as aliases for the
sysctl variants.
I have identified several parameters that mention sysctl counterparts in
Documentation/admin-guide/kernel-parameters.txt but there might be more.
The conversion also has varying level of success:
- numa_zonelist_order is converted in Patch 2 together with adding the
necessary infrastructure. It's easy as it doesn't really do anything
but warn on deprecated value these days.
- hung_task_panic is converted in Patch 3, but there's a downside that
now it only accepts 0 and 1, while previously it was any integer
value
- nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic,
so there's no straighforward conversion possible
- traceoff_on_warning is a flag without value and it would be required
to handle that somehow in the conversion infractructure, which seems
pointless for a single flag
This patch (of 5):
A recently proposed patch to add vm_swappiness command line parameter in
addition to existing sysctl [1] made me wonder why we don't have a
general support for passing sysctl parameters via command line.
Googling found only somebody else wondering the same [2], but I haven't
found any prior discussion with reasons why not to do this.
Settings the vm_swappiness issue aside (the underlying issue might be
solved in a different way), quick search of kernel-parameters.txt shows
there are already some that exist as both sysctl and kernel parameter -
hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning.
A general mechanism would remove the need to add more of those one-offs
and might be handy in situations where configuration by e.g.
/etc/sysctl.d/ is impractical.
Hence, this patch adds a new parse_args() pass that looks for parameters
prefixed by 'sysctl.' and tries to interpret them as writes to the
corresponding sys/ files using an temporary in-kernel procfs mount.
This mechanism was suggested by Eric W. Biederman [3], as it handles
all dynamically registered sysctl tables, even though we don't handle
modular sysctls. Errors due to e.g. invalid parameter name or value
are reported in the kernel log.
The processing is hooked right before the init process is loaded, as
some handlers might be more complicated than simple setters and might
need some subsystems to be initialized. At the moment the init process
can be started and eventually execute a process writing to /proc/sys/
then it should be also fine to do that from the kernel.
Sysctls registered later on module load time are not set by this
mechanism - it's expected that in such scenarios, setting sysctl values
from userspace is practical enough.
[1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
[2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
[3] https://lore.kernel.org/r/[email protected]/
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Ivan Teterevkov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Guilherme G . Piccoli" <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Christian Brauner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
__xa_store() and xa_store() document that the functions can fail, and
that the return code can be an xa_err() encoded error code.
xa_store_bh() and xa_store_irq() do not document that the functions can
fail and that they can also return xa_err() encoded error codes.
Thus: Update the documentation.
Signed-off-by: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Analogously to the introduction of panic_on_warn, this patch introduces
a kernel option named panic_on_taint in order to provide a simple and
generic way to stop execution and catch a coredump when the kernel gets
tainted by any given flag.
This is useful for debugging sessions as it avoids having to rebuild the
kernel to explicitly add calls to panic() into the code sites that
introduce the taint flags of interest.
For instance, if one is interested in proceeding with a post-mortem
analysis at the point a given code path is hitting a bad page (i.e.
unaccount_page_cache_page(), or slab_bug()), a coredump can be collected
by rebooting the kernel with 'panic_on_taint=0x20' amended to the
command line.
Another, perhaps less frequent, use for this option would be as a means
for assuring a security policy case where only a subset of taints, or no
single taint (in paranoid mode), is allowed for the running system. The
optional switch 'nousertaint' is handy in this particular scenario, as
it will avoid userspace induced crashes by writes to sysctl interface
/proc/sys/kernel/tainted causing false positive hits for such policies.
[[email protected]: tweak kernel-parameters.txt wording]
Suggested-by: Qian Cai <[email protected]>
Signed-off-by: Rafael Aquini <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Adrian Bunk <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Jeff Mahoney <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Takashi Iwai <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Instead of enabling dynamic debug globally with CONFIG_DYNAMIC_DEBUG,
CONFIG_DYNAMIC_DEBUG_CORE will only enable core function of dynamic
debug. With the DYNAMIC_DEBUG_MODULE defined for any modules, dynamic
debug will be tied to them.
This is useful for people who only want to enable dynamic debug for
kernel modules without worrying about kernel image size and memory
consumption is increasing too much.
[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Orson Zhai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Petr Mladek <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Randy Dunlap <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
the reason is to avoid a delay caused by the synchronize_rcu() call in
kern_umount() when the mqueue mount is freed.
the code:
#define _GNU_SOURCE
#include <sched.h>
#include <error.h>
#include <errno.h>
#include <stdlib.h>
int main()
{
int i;
for (i = 0; i < 1000; i++)
if (unshare(CLONE_NEWIPC) < 0)
error(EXIT_FAILURE, errno, "unshare");
}
goes from
Command being timed: "./ipc-namespace"
User time (seconds): 0.00
System time (seconds): 0.06
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.05
to
Command being timed: "./ipc-namespace"
User time (seconds): 0.00
System time (seconds): 0.02
Percent of CPU this job got: 96%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Reviewed-by: Waiman Long <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|