Age | Commit message (Collapse) | Author | Files | Lines |
|
Using padata during deferred init has only been tested on x86, so for now
limit it to this architecture.
If another arch wants this, it can find the max thread limit that's best
for it and override deferred_page_init_max_threads().
Signed-off-by: Daniel Jordan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Josh Triplett <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Robert Elliott <[email protected]>
Cc: Shile Zhang <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sometimes the kernel doesn't take full advantage of system memory
bandwidth, leading to a single CPU spending excessive time in
initialization paths where the data scales with memory size.
Multithreading naturally addresses this problem.
Extend padata, a framework that handles many parallel yet singlethreaded
jobs, to also handle multithreaded jobs by adding support for splitting up
the work evenly, specifying a minimum amount of work that's appropriate
for one helper thread to do, load balancing between helpers, and
coordinating them.
This is inspired by work from Pavel Tatashin and Steve Sistare.
Signed-off-by: Daniel Jordan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Josh Triplett <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Robert Elliott <[email protected]>
Cc: Shile Zhang <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
padata allocates per-CPU, per-instance work structs for parallel jobs. A
do_parallel call assigns a job to a sequence number and hashes the number
to a CPU, where the job will eventually run using the corresponding work.
This approach fit with how padata used to bind a job to each CPU
round-robin, makes less sense after commit bfde23ce200e6 ("padata: unbind
parallel jobs from specific CPUs") because a work isn't bound to a
particular CPU anymore, and isn't needed at all for multithreaded jobs
because they don't have sequence numbers.
Replace the per-CPU works with a preallocated pool, which allows sharing
them between existing padata users and the upcoming multithreaded user.
The pool will also facilitate setting NUMA-aware concurrency limits with
later users.
The pool is sized according to the number of possible CPUs. With this
limit, MAX_OBJ_NUM no longer makes sense, so remove it.
If the global pool is exhausted, a parallel job is run in the current task
instead to throttle a system trying to do too much in parallel.
Signed-off-by: Daniel Jordan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Josh Triplett <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Robert Elliott <[email protected]>
Cc: Shile Zhang <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
padata will soon initialize the system's struct pages in parallel, so it
needs to be ready by page_alloc_init_late().
The error return from padata_driver_init() triggers an initcall warning,
so add a warning to padata_init() to avoid silent failure.
Signed-off-by: Daniel Jordan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Josh Triplett <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Robert Elliott <[email protected]>
Cc: Shile Zhang <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Initializing struct pages is a long task and keeping interrupts disabled
for the duration of this operation introduces a number of problems.
1. jiffies are not updated for long period of time, and thus incorrect time
is reported. See proposed solution and discussion here:
lkml/[email protected]
2. It prevents farther improving deferred page initialization by allowing
intra-node multi-threading.
We are keeping interrupts disabled to solve a rather theoretical problem
that was never observed in real world (See 3a2d7fa8a3d5).
Let's keep interrupts enabled. In case we ever encounter a scenario where
an interrupt thread wants to allocate large amount of memory this early in
boot we can deal with that by growing zone (see deferred_grow_zone()) by
the needed amount before starting deferred_init_memmap() threads.
Before:
[ 1.232459] node 0 initialised, 12058412 pages in 1ms
After:
[ 1.632580] node 0 initialised, 12051227 pages in 436ms
Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages")
Reported-by: Shile Zhang <[email protected]>
Signed-off-by: Pavel Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: James Morris <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Yiqian Wei <[email protected]>
Cc: <[email protected]> [4.17+]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Restrict elements in compound_page_dtors[] array per NR_COMPOUND_DTORS and
explicitly position them according to enum compound_dtor_id. This
improves protection against possible misalignment between
compound_page_dtors[] and enum compound_dtor_id later on.
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pageblock migrate type is encoded in GFP flags, just as zone_type and
zonelist.
Currently we use gfp_zone() and gfp_zonelist() to extract related
information, it would be proper to use the same naming convention for
migrate type.
Signed-off-by: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
classzone_idx is just different name for high_zoneidx now. So, integrate
them and add some comment to struct alloc_context in order to reduce
future confusion about the meaning of this variable.
The accessor, ac_classzone_idx() is also removed since it isn't needed
after integration.
In addition to integration, this patch also renames high_zoneidx to
highest_zoneidx since it represents more precise meaning.
Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ye Xiaolong <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit 397dc00e249ec64e10 ("mips: sgi-ip27: switch from DISCONTIGMEM
to SPARSEMEM"), the last caller of free_bootmem_with_active_regions() was
gone. Now no user calls it any more.
Let's remove it.
Signed-off-by: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
free_area_init_node() is only used by x86 to initialize a memory-less
nodes. Make its name reflect this and drop all the function parameters
except node ID as they are anyway zero.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Hoan Tran <[email protected]> [arm64]
Cc: Baoquan He <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Some architectures (e.g. ARC) have the ZONE_HIGHMEM zone below the
ZONE_NORMAL. Allowing free_area_init() parse max_zone_pfn array even it
is sorted in descending order allows using free_area_init() on such
architectures.
Add top -> down traversal of max_zone_pfn array in free_area_init() and
use the latter in ARC node/zone initialization.
[[email protected]: ARC fix]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: arc: free_area_init(): take into account PAE40 mode]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: declare arch_has_descending_max_zone_pfns()]
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Hoan Tran <[email protected]> [arm64]
Reviewed-by: Baoquan He <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Guenter Roeck <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
free_area_init() has effectively became a wrapper for
free_area_init_nodes() and there is no point of keeping it. Still
free_area_init() name is shorter and more general as it does not imply
necessity to initialize multiple nodes.
Rename free_area_init_nodes() to free_area_init(), update the callers and
drop old version of free_area_init().
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Hoan Tran <[email protected]> [arm64]
Reviewed-by: Baoquan He <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, architectures that use free_area_init() to initialize memory
map and node and zone structures need to calculate zone and hole sizes.
We can use free_area_init_nodes() instead and let it detect the zone
boundaries while the architectures will only have to supply the possible
limits for the zones.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Hoan Tran <[email protected]> [arm64]
Reviewed-by: Baoquan He <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization of
nodes and zones structures between the systems that have region to node
mapping in memblock and those that don't.
Currently all the NUMA architectures enable this option and for the
non-NUMA systems we can presume that all the memory belongs to node 0 and
therefore the compile time configuration option is not required.
The remaining few architectures that use DISCONTIGMEM without NUMA are
easily updated to use memblock_add_node() instead of memblock_add() and
thus have proper correspondence of memblock regions to NUMA nodes.
Still, free_area_init_node() must have a backward compatible version
because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP is
different. Once all the architectures will use the new semantics, the
entire compatibility layer can be dropped.
To avoid addition of extra run time memory to store node id for
architectures that keep memblock but have only a single node, the node id
field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
the corresponding accessors presume that in those cases it is always 0.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Hoan Tran <[email protected]> [arm64]
Acked-by: Catalin Marinas <[email protected]> [arm64]
Cc: Baoquan He <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
early_pfn_to_nid() and its helper __early_pfn_to_nid() are spread around
include/linux/mm.h, include/linux/mmzone.h and mm/page_alloc.c.
Drop unused stub for __early_pfn_to_nid() and move its actual generic
implementation close to its users.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Hoan Tran <[email protected]> [arm64]
Reviewed-by: Baoquan He <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It seems that the existing documentation is not explicit about the
expected usage and potential risks enough. While it is calls out that
users have to free memory when using this flag it is not really apparent
that users have to careful to not deplete memory reserves and that they
should implement some sort of throttling wrt. freeing process.
This is partly based on Neil's explanation [1].
Let's also call out that a pre allocated pool allocator should be
considered.
[1] http://lkml.kernel.org/r/[email protected]
[[email protected]: coding style fixes]
[[email protected]: update]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: John Hubbard <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The memcmp KASAN self-test fails on a kernel with both KASAN and
FORTIFY_SOURCE.
When FORTIFY_SOURCE is on, a number of functions are replaced with
fortified versions, which attempt to check the sizes of the operands.
However, these functions often directly invoke __builtin_foo() once they
have performed the fortify check. Using __builtins may bypass KASAN
checks if the compiler decides to inline it's own implementation as
sequence of instructions, rather than emit a function call that goes out
to a KASAN-instrumented implementation.
Why is only memcmp affected?
============================
Of the string and string-like functions that kasan_test tests, only memcmp
is replaced by an inline sequence of instructions in my testing on x86
with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2).
I believe this is due to compiler heuristics. For example, if I annotate
kmalloc calls with the alloc_size annotation (and disable some fortify
compile-time checking!), the compiler will replace every memset except the
one in kmalloc_uaf_memset with inline instructions. (I have some WIP
patches to add this annotation.)
Does this affect other functions in string.h?
=============================================
Yes. Anything that uses __builtin_* rather than __real_* could be
affected. This looks like:
- strncpy
- strcat
- strlen
- strlcpy maybe, under some circumstances?
- strncat under some circumstances
- memset
- memcpy
- memmove
- memcmp (as noted)
- memchr
- strcpy
Whether a function call is emitted always depends on the compiler. Most
bugs should get caught by FORTIFY_SOURCE, but the missed memcmp test shows
that this is not always the case.
Isn't FORTIFY_SOURCE disabled with KASAN?
========================================-
The string headers on all arches supporting KASAN disable fortify with
kasan, but only when address sanitisation is _also_ disabled. For example
from x86:
#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
/*
* For files that are not instrumented (e.g. mm/slub.c) we
* should use not instrumented version of mem* functions.
*/
#define memcpy(dst, src, len) __memcpy(dst, src, len)
#define memmove(dst, src, len) __memmove(dst, src, len)
#define memset(s, c, n) __memset(s, c, n)
#ifndef __NO_FORTIFY
#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
#endif
#endif
This comes from commit 6974f0c4555e ("include/linux/string.h: add the
option of fortified string.h functions"), and doesn't work when KASAN is
enabled and the file is supposed to be sanitised - as with test_kasan.c
I'm pretty sure this is not wrong, but not as expansive it should be:
* we shouldn't use __builtin_memcpy etc in files where we don't have
instrumentation - it could devolve into a function call to memcpy,
which will be instrumented. Rather, we should use __memcpy which
by convention is not instrumented.
* we also shouldn't be using __builtin_memcpy when we have a KASAN
instrumented file, because it could be replaced with inline asm
that will not be instrumented.
What is correct behaviour?
==========================
Firstly, there is some overlap between fortification and KASAN: both
provide some level of _runtime_ checking. Only fortify provides
compile-time checking.
KASAN and fortify can pick up different things at runtime:
- Some fortify functions, notably the string functions, could easily be
modified to consider sub-object sizes (e.g. members within a struct),
and I have some WIP patches to do this. KASAN cannot detect these
because it cannot insert poision between members of a struct.
- KASAN can detect many over-reads/over-writes when the sizes of both
operands are unknown, which fortify cannot.
So there are a couple of options:
1) Flip the test: disable fortify in santised files and enable it in
unsanitised files. This at least stops us missing KASAN checking, but
we lose the fortify checking.
2) Make the fortify code always call out to real versions. Do this only
for KASAN, for fear of losing the inlining opportunities we get from
__builtin_*.
(We can't use kasan_check_{read,write}: because the fortify functions are
_extern inline_, you can't include _static_ inline functions without a
compiler warning. kasan_check_{read,write} are static inline so we can't
use them even when they would otherwise be suitable.)
Take approach 2 and call out to real versions when KASAN is enabled.
Use __underlying_foo to distinguish from __real_foo: __real_foo always
refers to the kernel's implementation of foo, __underlying_foo could be
either the kernel implementation or the __builtin_foo implementation.
This is sometimes enough to make the memcmp test succeed with
FORTIFY_SOURCE enabled. It is at least enough to get the function call
into the module. One more fix is needed to make it reliable: see the next
patch.
Fixes: 6974f0c4555e ("include/linux/string.h: add the option of fortified string.h functions")
Signed-off-by: Daniel Axtens <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: David Gow <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Cc: Daniel Micay <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is the FOLL_PIN equivalent of __get_user_pages_fast(), except with a
more descriptive name, and gup_flags instead of a boolean "write" in the
argument list.
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: "Joonas Lahtinen" <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There were two nearly identical sets of code for gup_fast() style of
walking the page tables with interrupts disabled. This has lead to the
usual maintenance problems that arise from having duplicated code.
There is already a core internal routine in gup.c for gup_fast(), so just
enhance it very slightly: allow skipping the fall-back to "slow" (regular)
get_user_pages(), via the new FOLL_FAST_ONLY flag. Then, just call
internal_get_user_pages_fast() from __get_user_pages_fast(), and adjust
the API to match pre-existing API behavior.
There is a change in behavior from this refactoring: the nested form of
interrupt disabling is used in all gup_fast() variants now. That's
because there is only one place that interrupt disabling for page walking
is done, and so the safer form is required. This should, if anything,
eliminate possible (rare) bugs, because the non-nested form of enabling
interrupts was fragile at best.
[[email protected]: fixup]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: "Joonas Lahtinen" <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Highlights:
- speedup dead root detection during orphan cleanup, eg. when there
are many deleted subvolumes waiting to be cleaned, the trees are
now looked up in radix tree instead of a O(N^2) search
- snapshot creation with inherited qgroup will mark the qgroup
inconsistent, requires a rescan
- send will emit file capabilities after chown, this produces a
stream that does not need postprocessing to set the capabilities
again
- direct io ported to iomap infrastructure, cleaned up and simplified
code, notably removing last use of struct buffer_head in btrfs code
Core changes:
- factor out backreference iteration, to be used by ordinary
backreferences and relocation code
- improved global block reserve utilization
* better logic to serialize requests
* increased maximum available for unlink
* improved handling on large pages (64K)
- direct io cleanups and fixes
* simplify layering, where cloned bios were unnecessarily created
for some cases
* error handling fixes (submit, endio)
* remove repair worker thread, used to avoid deadlocks during
repair
- refactored block group reading code, preparatory work for new type
of block group storage that should improve mount time on large
filesystems
Cleanups:
- cleaned up (and slightly sped up) set/get helpers for metadata data
structure members
- root bit REF_COWS got renamed to SHAREABLE to reflect the that the
blocks of the tree get shared either among subvolumes or with the
relocation trees
Fixes:
- when subvolume deletion fails due to ENOSPC, the filesystem is not
turned read-only
- device scan deals with devices from other filesystems that changed
ownership due to overwrite (mkfs)
- fix a race between scrub and block group removal/allocation
- fix long standing bug of a runaway balance operation, printing the
same line to the syslog, caused by a stale status bit on a reloc
tree that prevented progress
- fix corrupt log due to concurrent fsync of inodes with shared
extents
- fix space underflow for NODATACOW and buffered writes when it for
some reason needs to fallback to COW mode"
* tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
btrfs: fix space_info bytes_may_use underflow during space cache writeout
btrfs: fix space_info bytes_may_use underflow after nocow buffered write
btrfs: fix wrong file range cleanup after an error filling dealloc range
btrfs: remove redundant local variable in read_block_for_search
btrfs: open code key_search
btrfs: split btrfs_direct_IO to read and write part
btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
fs: remove dio_end_io()
btrfs: switch to iomap_dio_rw() for dio
iomap: remove lockdep_assert_held()
iomap: add a filesystem hook for direct I/O bio submission
fs: export generic_file_buffered_read()
btrfs: turn space cache writeout failure messages into debug messages
btrfs: include error on messages about failure to write space/inode caches
btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
btrfs: make checksum item extension more efficient
btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
btrfs: unexport btrfs_compress_set_level()
btrfs: simplify iget helpers
...
|
|
Pull DAX updates part two from Darrick Wong:
"This time around, we're hoisting the DONTCACHE flag from XFS into the
VFS so that we can make the incore DAX mode changes become effective
sooner.
We can't change the file data access mode on a live inode because we
don't have a safe way to change the file ops pointers. The incore
state change becomes effective at inode loading time, which can happen
if the inode is evicted. Therefore, we're making it so that
filesystems can ask the VFS to evict the inode as soon as the last
holder drops.
The per-fs changes to make this call this will be in subsequent pull
requests from Ted and myself.
Summary:
- Introduce DONTCACHE flags for dentries and inodes. This hint will
cause the VFS to drop the associated objects immediately after the
last put, so that we can change the file access mode (DAX or page
cache) on the fly"
* tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: Introduce DCACHE_DONTCACHE
fs: Lift XFS_IDONTCACHE to the VFS layer
|
|
Pull DAX updates part one from Darrick Wong:
"After many years of LKML-wrangling about how to enable programs to
query and influence the file data access mode (DAX) when a filesystem
resides on storage devices such as persistent memory, Ira Weiny has
emerged with a proposed set of standard behaviors that has not been
shot down by anyone! We're more or less standardizing on the current
XFS behavior and adapting ext4 to do the same.
This is the first of a handful pull requests that will make ext4 and
XFS present a consistent interface for user programs that care about
DAX. We add a statx attribute that programs can check to see if DAX is
enabled on a particular file. Then, we update the DAX documentation to
spell out the user-visible behaviors that filesystems will guarantee
(until the next storage industry shakeup). The on-disk inode flag has
been in XFS for a few years now.
Summary:
- Clean up io_is_direct.
- Add a new statx flag to indicate when file data access is being
done via DAX (as opposed to the page cache).
- Update the documentation for how system administrators and
application programmers can take advantage of the (still
experimental DAX) feature"
Link: https://lore.kernel.org/lkml/[email protected]/
* tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
Documentation/dax: Update Usage section
fs/stat: Define DAX statx attribute
fs: Remove unneeded IS_DAX() check in io_is_direct()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull lockdown update from James Morris:
"An update for the security subsystem to allow unprivileged users
to see the status of the lockdown feature. From Jeremy Cline"
Also an added comment to describe CAP_SETFCAP.
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
capabilities: add description for CAP_SETFCAP
lockdown: Allow unprivileged users to see lockdown status
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Summary of the significant patches:
- Record information about binds/unbinds to the audit multicast
socket. This helps identify which processes have/had access to the
information in the audit stream.
- Cleanup and add some additional information to the netfilter
configuration events collected by audit.
- Fix some of the audit error handling code so we don't leak network
namespace references"
* tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: add subj creds to NETFILTER_CFG record to
audit: Replace zero-length array with flexible-array
audit: make symbol 'audit_nfcfgs' static
netfilter: add audit table unregister actions
audit: tidy and extend netfilter_cfg x_tables
audit: log audit netlink multicast bind and unbind
audit: fix a net reference leak in audit_list_rules_send()
audit: fix a net reference leak in audit_send_reply()
|
|
Document the purpose of CAP_SETFCAP. For some reason this capability
had no description while the others did.
Signed-off-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: James Morris <[email protected]>
|
|
Pull io_uring updates from Jens Axboe:
"A relatively quiet round, mostly just fixes and code improvements. In
particular:
- Make statx just use the generic statx handler, instead of open
coding it. We don't need that anymore, as we always call it async
safe (Bijan)
- Enable closing of the ring itself. Also fixes O_PATH closure (me)
- Properly name completion members (me)
- Batch reap of dead file registrations (me)
- Allow IORING_OP_POLL with double waitqueues (me)
- Add tee(2) support (Pavel)
- Remove double off read (Pavel)
- Fix overflow cancellations (Pavel)
- Improve CQ timeouts (Pavel)
- Async defer drain fixes (Pavel)
- Add support for enabling/disabling notifications on a registered
eventfd (Stefano)
- Remove dead state parameter (Xiaoguang)
- Disable SQPOLL submit on dying ctx (Xiaoguang)
- Various code cleanups"
* tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits)
io_uring: fix overflowed reqs cancellation
io_uring: off timeouts based only on completions
io_uring: move timeouts flushing to a helper
statx: hide interfaces no longer used by io_uring
io_uring: call statx directly
statx: allow system call to be invoked from io_uring
io_uring: add io_statx structure
io_uring: get rid of manual punting in io_close
io_uring: separate DRAIN flushing into a cold path
io_uring: don't re-read sqe->off in timeout_prep()
io_uring: simplify io_timeout locking
io_uring: fix flush req->refs underflow
io_uring: don't submit sqes when ctx->refs is dying
io_uring: async task poll trigger cleanup
io_uring: add tee(2) support
splice: export do_tee()
io_uring: don't repeat valid flag list
io_uring: rename io_file_put()
io_uring: remove req->needs_fixed_files
io_uring: cleanup io_poll_remove_one() logic
...
|
|
Pull block driver updates from Jens Axboe:
"On top of the core changes, here are the block driver changes for this
merge window:
- NVMe changes:
- NVMe over Fibre Channel protocol updates, which also reach
over to drivers/scsi/lpfc (James Smart)
- namespace revalidation support on the target (Anthony
Iliopoulos)
- gcc zero length array fix (Arnd Bergmann)
- nvmet cleanups (Chaitanya Kulkarni)
- misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
- use a SRQ per completion vector (Max Gurtovoy)
- fix handling of runtime changes to the queue count (Weiping
Zhang)
- t10 protection information support for nvme-rdma and
nvmet-rdma (Israel Rukshin and Max Gurtovoy)
- target side AEN improvements (Chaitanya Kulkarni)
- various fixes and minor improvements all over, icluding the
nvme part of the lpfc driver"
- Floppy code cleanup series (Willy, Denis)
- Floppy contention fix (Jiri)
- Loop CONFIGURE support (Martijn)
- bcache fixes/improvements (Coly, Joe, Colin)
- q->queuedata cleanups (Christoph)
- Get rid of ioctl_by_bdev (Christoph, Stefan)
- md/raid5 allocation fixes (Coly)
- zero length array fixes (Gustavo)
- swim3 task state fix (Xu)"
* tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
bcache: configure the asynchronous registertion to be experimental
bcache: asynchronous devices registration
bcache: fix refcount underflow in bcache_device_free()
bcache: Convert pr_<level> uses to a more typical style
bcache: remove redundant variables i and n
lpfc: Fix return value in __lpfc_nvme_ls_abort
lpfc: fix axchg pointer reference after free and double frees
lpfc: Fix pointer checks and comments in LS receive refactoring
nvme: set dma alignment to qword
nvmet: cleanups the loop in nvmet_async_events_process
nvmet: fix memory leak when removing namespaces and controllers concurrently
nvmet-rdma: add metadata/T10-PI support
nvmet: add metadata support for block devices
nvmet: add metadata/T10-PI support
nvme: add Metadata Capabilities enumerations
nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
nvmet: rename nvmet_rw_len to nvmet_rw_data_len
nvmet: add metadata characteristics for a namespace
nvme-rdma: add metadata/T10-PI support
nvme-rdma: introduce nvme_rdma_sgl structure
...
|
|
Pull block updates from Jens Axboe:
"Core block changes that have been queued up for this release:
- Remove dead blk-throttle and blk-wbt code (Guoqing)
- Include pid in blktrace note traces (Jan)
- Don't spew I/O errors on wouldblock termination (me)
- Zone append addition (Johannes, Keith, Damien)
- IO accounting improvements (Konstantin, Christoph)
- blk-mq hardware map update improvements (Ming)
- Scheduler dispatch improvement (Salman)
- Inline block encryption support (Satya)
- Request map fixes and improvements (Weiping)
- blk-iocost tweaks (Tejun)
- Fix for timeout failing with error injection (Keith)
- Queue re-run fixes (Douglas)
- CPU hotplug improvements (Christoph)
- Queue entry/exit improvements (Christoph)
- Move DMA drain handling to the few drivers that use it (Christoph)
- Partition handling cleanups (Christoph)"
* tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
block: mark bio_wouldblock_error() bio with BIO_QUIET
blk-wbt: rename __wbt_update_limits to wbt_update_limits
blk-wbt: remove wbt_update_limits
blk-throttle: remove tg_drain_bios
blk-throttle: remove blk_throtl_drain
null_blk: force complete for timeout request
blk-mq: drain I/O when all CPUs in a hctx are offline
blk-mq: add blk_mq_all_tag_iter
blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
blk-mq: use BLK_MQ_NO_TAG in more places
blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
blk-mq: move more request initialization to blk_mq_rq_ctx_init
blk-mq: simplify the blk_mq_get_request calling convention
blk-mq: remove the bio argument to ->prepare_request
nvme: force complete cancelled requests
blk-mq: blk-mq: provide forced completion method
block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
block: blk-crypto-fallback: remove redundant initialization of variable err
block: reduce part_stat_lock() scope
block: use __this_cpu_add() instead of access by smp_processor_id()
...
|
|
Pull drm updates from Dave Airlie:
"Highlights:
- Core DRM had a lot of refactoring around managed drm resources to
make drivers simpler.
- Intel Tigerlake support is on by default
- amdgpu now support p2p PCI buffer sharing and encrypted GPU memory
Details:
core:
- uapi: error out EBUSY when existing master
- uapi: rework SET/DROP MASTER permission handling
- remove drm_pci.h
- drm_pci* are now legacy
- introduced managed DRM resources
- subclassing support for drm_framebuffer
- simple encoder helper
- edid improvements
- vblank + writeback documentation improved
- drm/mm - optimise tree searches
- port drivers to use devm_drm_dev_alloc
dma-buf:
- add flag for p2p buffer support
mst:
- ACT timeout improvements
- remove drm_dp_mst_has_audio
- don't use 2nd TX slot - spec recommends against it
bridge:
- dw-hdmi various improvements
- chrontel ch7033 support
- fix stack issues with old gcc
hdmi:
- add unpack function for drm infoframe
fbdev:
- misc fbdev driver fixes
i915:
- uapi: global sseu pinning
- uapi: OA buffer polling
- uapi: remove generated perf code
- uapi: per-engine default property values in sysfs
- Tigerlake GEN12 enabled.
- Lots of gem refactoring
- Tigerlake enablement patches
- move to drm_device logging
- Icelake gamma HW readout
- push MST link retrain to hotplug work
- bandwidth atomic helpers
- ICL fixes
- RPS/GT refactoring
- Cherryview full-ppgtt support
- i915 locking guidelines documented
- require linear fb stride to be 512 multiple on gen9
- Tigerlake SAGV support
amdgpu:
- uapi: encrypted GPU memory handling
- uapi: add MEM_SYNC IB flag
- p2p dma-buf support
- export VRAM dma-bufs
- FRU chip access support
- RAS/SR-IOV updates
- Powerplay locking fixes
- VCN DPG (powergating) enablement
- GFX10 clockgating fixes
- DC fixes
- GPU reset fixes
- navi SDMA fix
- expose FP16 for modesetting
- DP 1.4 compliance fixes
- gfx10 soft recovery
- Improved Critical Thermal Faults handling
- resizable BAR on gmc10
amdkfd:
- uapi: GWS resource management
- track GPU memory per process
- report PCI domain in topology
radeon:
- safe reg list generator fixes
nouveau:
- HD audio fixes on recent systems
- vGPU detection (fail probe if we're on one, for now)
- Interlaced mode fixes (mostly avoidance on Turing, which doesn't support it)
- SVM improvements/fixes
- NVIDIA format modifier support
- Misc other fixes.
adv7511:
- HDMI SPDIF support
ast:
- allocate crtc state size
- fix double assignment
- fix suspend
bochs:
- drop connector register
cirrus:
- move to tiny drivers.
exynos:
- fix imported dma-buf mapping
- enable runtime PM
- fixes and cleanups
mediatek:
- DPI pin mode swap
- config mipi_tx current/impedance
lima:
- devfreq + cooling device support
- task handling improvements
- runtime PM support
pl111:
- vexpress init improvements
- fix module auto-load
rcar-du:
- DT bindings conversion to YAML
- Planes zpos sanity check and fix
- MAINTAINERS entry for LVDS panel driver
mcde:
- fix return value
mgag200:
- use managed config init
stm:
- read endpoints from DT
vboxvideo:
- use PCI managed functions
- drop WC mtrr
vkms:
- enable cursor by default
rockchip:
- afbc support
virtio:
- various cleanups
qxl:
- fix cursor notify port
hisilicon:
- 128-byte stride alignment fix
sun4i:
- improved format handling"
* tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drm: (1401 commits)
drm/amd/display: Fix potential integer wraparound resulting in a hang
drm/amd/display: drop cursor position check in atomic test
drm/amdgpu: fix device attribute node create failed with multi gpu
drm/nouveau: use correct conflicting framebuffer API
drm/vblank: Fix -Wformat compile warnings on some arches
drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode
drm/amd/display: Handle GPU reset for DC block
drm/amdgpu: add apu flags (v2)
drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven
drm/amdgpu: fix pm sysfs node handling (v2)
drm/amdgpu: move gpu_info parsing after common early init
drm/amdgpu: move discovery gfx config fetching
drm/nouveau/dispnv50: fix runtime pm imbalance on error
drm/nouveau: fix runtime pm imbalance on error
drm/nouveau: fix runtime pm imbalance on error
drm/nouveau/debugfs: fix runtime pm imbalance on error
drm/nouveau/nouveau/hmm: fix migrate zero page to GPU
drm/nouveau/nouveau/hmm: fix nouveau_dmem_chunk allocations
drm/nouveau/kms/nv50-: Share DP SST mode_valid() handling with MST
drm/nouveau/kms/nv50-: Move 8BPC limit for MST into nv50_mstc_get_modes()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
"This series adds a selftest for hmm_range_fault() and several of the
DEVICE_PRIVATE migration related actions, and another simplification
for hmm_range_fault()'s API.
- Simplify hmm_range_fault() with a simpler return code, no
HMM_PFN_SPECIAL, and no customizable output PFN format
- Add a selftest for hmm_range_fault() and DEVICE_PRIVATE related
functionality"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
MAINTAINERS: add HMM selftests
mm/hmm/test: add selftests for HMM
mm/hmm/test: add selftest driver for HMM
mm/hmm: remove the customizable pfn format from hmm_range_fault
mm/hmm: remove HMM_PFN_SPECIAL
drm/amdgpu: remove dead code after hmm_range_fault()
mm/hmm: make hmm_range_fault return 0 or -1
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream revision
20200430, fix several reference counting errors related to ACPI
tables, add _Exx / _Lxx support to the GED driver, add a new
acpi_evaluate_reg() helper, add new DPTF battery participant driver
and extend the DPFT power participant driver, improve the handling of
memory failures in the APEI code, add a blacklist entry to the
backlight driver, update the PMIC driver and the processor idle
driver, fix two kobject reference count leaks, and make a few janitory
changes.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20200430:
- Move acpi_gbl_next_cmd_num definition (Erik Kaneda).
- Ignore AE_ALREADY_EXISTS status in the disassembler when parsing
create operators (Erik Kaneda).
- Add status checks to the dispatcher (Erik Kaneda).
- Fix required parameters for _NIG and _NIH (Erik Kaneda).
- Make acpi_protocol_lengths static (Yue Haibing).
- Fix ACPI table reference counting errors in several places, mostly
in error code paths (Hanjun Guo).
- Extend the Generic Event Device (GED) driver to support _Exx and
_Lxx handler methods (Ard Biesheuvel).
- Add new acpi_evaluate_reg() helper and modify the ACPI PCI hotplug
code to use it (Hans de Goede).
- Add new DPTF battery participant driver and make the DPFT power
participant driver create more sysfs device attributes (Srinivas
Pandruvada).
- Improve the handling of memory failures in APEI (James Morse).
- Add new blacklist entry for Acer TravelMate 5735Z to the backlight
driver (Paul Menzel).
- Add i2c address for thermal control to the PMIC driver (Mauro
Carvalho Chehab).
- Allow the ACPI processor idle driver to work on platforms with only
one ACPI C-state present (Zhang Rui).
- Fix kobject reference count leaks in error code paths in two places
(Qiushi Wu).
- Delete unused proc filename macros and make some symbols static
(Pascal Terjan, Zheng Zengkai, Zou Wei)"
* tag 'acpi-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
ACPI: CPPC: Fix reference count leak in acpi_cppc_processor_probe()
ACPI: sysfs: Fix reference count leak in acpi_sysfs_add_hotplug_profile()
ACPI: GED: use correct trigger type field in _Exx / _Lxx handling
ACPI: DPTF: Add battery participant driver
ACPI: DPTF: Additional sysfs attributes for power participant driver
ACPI: video: Use native backlight on Acer TravelMate 5735Z
arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
ACPI: APEI: Kick the memory_failure() queue for synchronous errors
mm/memory-failure: Add memory_failure_queue_kick()
ACPI / PMIC: Add i2c address for thermal control
ACPI: GED: add support for _Exx / _Lxx handler methods
ACPI: Delete unused proc filename macros
ACPI: hotplug: PCI: Use the new acpi_evaluate_reg() helper
ACPI: utils: Add acpi_evaluate_reg() helper
ACPI: debug: Make two functions static
ACPI: sleep: Put the FACS table after using it
ACPI: scan: Put SPCR and STAO table after using it
ACPI: EC: Put the ACPI table after using it
ACPI: APEI: Put the HEST table for error path
ACPI: APEI: Put the error record serialization table for error path
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These rework the system-wide PM driver flags, make runtime switching
of cpuidle governors easier, improve the user space hibernation
interface code, add intel-speed-select interface documentation, add
more debug messages to the ACPI code handling suspend to idle, update
the cpufreq core and drivers, fix a minor issue in the cpuidle core
and update two cpuidle drivers, improve the PM-runtime framework,
update the Intel RAPL power capping driver, update devfreq core and
drivers, and clean up the cpupower utility.
Specifics:
- Rework the system-wide PM driver flags to make them easier to
understand and use and update their documentation (Rafael Wysocki,
Alan Stern).
- Allow cpuidle governors to be switched at run time regardless of
the kernel configuration and update the related documentation
accordingly (Hanjun Guo).
- Improve the resume device handling in the user space hibernarion
interface code (Domenico Andreoli).
- Document the intel-speed-select sysfs interface (Srinivas
Pandruvada).
- Make the ACPI code handing suspend to idle print more debug
messages to help diagnose issues with it (Rafael Wysocki).
- Fix a helper routine in the cpufreq core and correct a typo in the
struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang
Wenhu).
- Update cpufreq drivers:
- Make the intel_pstate driver start in the passive mode by
default on systems without HWP (Rafael Wysocki).
- Add i.MX7ULP support to the imx-cpufreq-dt driver and add
i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan).
- Convert the qoriq cpufreq driver to a platform one, make the
platform code create a suitable device object for it and add
platform dependencies to it (Mian Yousaf Kaukab, Geert
Uytterhoeven).
- Fix wrong compatible binding in the qcom driver (Ansuel Smith).
- Build the omap driver by default for ARCH_OMAP2PLUS (Anders
Roxell).
- Add r8a7742 SoC support to the dt cpufreq driver (Lad
Prabhakar).
- Update cpuidle core and drivers:
- Fix three reference count leaks in error code paths in the
cpuidle core (Qiushi Wu).
- Convert Qualcomm SPM to a generic cpuidle driver (Stephan
Gerhold).
- Fix up the execution order when entering a domain idle state in
the PSCI driver (Ulf Hansson).
- Fix a reference counting issue related to clock management and
clean up two oddities in the PM-runtime framework (Rafael Wysocki,
Andy Shevchenko).
- Add ElkhartLake support to the Intel RAPL power capping driver and
remove an unused local MSR definition from it (Jacob Pan, Sumeet
Pawnikar).
- Update devfreq core and drivers:
- Replace strncpy() with strscpy() in the devfreq core and use
lockdep asserts instead of manual checks for a locked mutex in
it (Dmitry Osipenko, Krzysztof Kozlowski).
- Add a generic imx bus scaling driver and make it register an
interconnect device (Leonard Crestez, Gustavo A. R. Silva).
- Make the cpufreq notifier in the tegra30 driver take boosting
into account and delete an unuseful error message from that
driver (Dmitry Osipenko, Markus Elfring).
- Remove unneeded semicolon from the cpupower code (Zou Wei)"
* tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits)
cpuidle: Fix three reference count leaks
PM: runtime: Replace pm_runtime_callbacks_present()
PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex
PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR
PM / devfreq: Replace strncpy with strscpy
PM / devfreq: imx: Register interconnect device
PM / devfreq: Add generic imx bus scaling driver
PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe()
PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting
PM: hibernate: Restrict writes to the resume device
PM: runtime: clk: Fix clk_pm_runtime_get() error path
cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver
ACPI: EC: PM: s2idle: Extend GPE dispatching debug message
ACPI: PM: s2idle: Print type of wakeup debug messages
powercap: RAPL: remove unused local MSR define
PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend()
Documentation: admin-guide: pm: Document intel-speed-select
PM: hibernate: Split off snapshot dev option
PM: hibernate: Incorporate concurrency handling
Documentation: ABI: make current_governer_ro as a candidate for removal
...
|
|
git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver updates from Andy Shevchenko:
- Add a support of the media keys on the ASUS laptop UX325JA/UX425JA
- ASUS WMI driver can now handle 2-in-1 models T100TA, T100CHI, T100HA,
T200TA
- Big refactoring of Intel SCU driver with Elkhart Lake support has
been added
- Slim Bootloarder firmware update signaling WMI driver has been added
- Thinkpad ACPI driver can handle dual fan configuration on new P and X
models
- Touchscreen DMI driver has been extended to support
- MP-man MPWIN895CL tablet
- ONDA V891 v5 tablet
- techBite Arc 11.6
- Trekstor Twin 10.1
- Trekstor Yourbook C11B
- Vinga J116
- Virtual Button driver got a few fixes to detect mode of 2-in-1 tablet
models
- Intel Speed Select tools update
- Plenty of small cleanups here and there
* tag 'platform-drivers-x86-v5.8-1' of git://git.infradead.org/linux-platform-drivers-x86: (89 commits)
platform/x86: dcdbas: Check SMBIOS for protected buffer address
platform/x86: asus_wmi: Reserve more space for struct bias_args
platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type
platform/x86: intel-hid: Add a quirk to support HP Spectre X2 (2015)
platform/x86: touchscreen_dmi: Update Trekstor Twin 10.1 entry
platform/x86: touchscreen_dmi: Add info for the Trekstor Yourbook C11B
platform/x86: hp-wmi: Introduce HPWMI_POWER_FW_OR_HW as convenient shortcut
platform/x86: hp-wmi: Convert simple_strtoul() to kstrtou32()
platform/x86: hp-wmi: Refactor postcode_store() to follow standard patterns
platform/x86: acerhdf: replace space by * in modalias
platform/x86: ISST: Increase timeout
tools/power/x86/intel-speed-select: Fix invalid core mask
tools/power/x86/intel-speed-select: Increase CPU count
tools/power/x86/intel-speed-select: Fix json perf-profile output output
platform/x86: dell-wmi: Ignore keyboard attached / detached events
platform/x86: dell-laptop: don't register micmute LED if there is no token
platform/x86: thinkpad_acpi: Replace custom approach by kstrtoint()
platform/x86: thinkpad_acpi: Use strndup_user() in dispatch_proc_write()
platform/x86: thinkpad_acpi: Replace next_cmd(&buf) with strsep(&buf, ",")
platform/x86: intel-vbtn: Detect switch position before registering the input-device
...
|
|
Pull MMC updates from Ulf Hansson:
"MMC core:
- Enable erase/discard/trim support for all (e)MMC/SD hosts
- Export information through sysfs about enhanced RPMB support (eMMC v5.1+)
- Align the initialization commands for SDIO cards
- Fix SDIO initialization to prevent memory leaks and NULL pointer errors
- Do not export undefined MMC_NAME/MODALIAS for SDIO cards
- Export device/vendor field from common CIS for SDIO cards
- Move SDIO IDs from functional drivers to the common SDIO header
- Introduce the ->request_atomic() host ops
MMC host:
- Improve support for HW busy signaling for several hosts
- Converting some DT bindings to the json-schema
- meson-mx-sdhc: Add driver and DT doc for the Amlogic Meson SDHC controller
- meson-mx-sdio: Run a soft reset to recover from timeout/CRC error
- mmci: Convert to use mmc_regulator_set_vqmmc()
- mmci_stm32_sdmmc: Fix a couple of DMA bugs
- mmci_stm32_sdmmc: Fix power on issue
- renesas,mmcif,sdhci: Document r8a7742 DT bindings
- renesas_sdhi: Add support for M3-W ES1.2 and 1.3 revisions
- renesas_sdhi: Improvements to the TAP selection
- renesas_sdhi/tmio: Further fixup runtime PM management at ->remove()
- sdhci: Introduce ops to dump vendor specific registers
- sdhci-cadence: Fix PHY write sequence
- sdhci-esdhc-imx: Improve tunings
- sdhci-esdhc-imx: Enable GPIO card detect as system wakeup
- sdhci-esdhc-imx: Add HS400 support for i.MX6SLL
- sdhci-esdhc-mcf: Add driver for the Coldfire/M5441X esdhc controller
- m68k: mcf5441x: Add platform data to enable esdhc mmc controller
- sdhci-msm: Improve HS400 tuning
- sdhci-msm: Dump vendor specific registers at error
- sdhci-msm: Add support for DLL/DDR properties provided from DT
- sdhci-msm: Add support for the sm8250 variant
- sdhci-msm: Add support for DVFS by converting to dev_pm_opp_set_rate()
- sdhci-of-arasan: Add support for Intel Keem Bay variant
- sdhci-of-arasan: Add support for Xilinx Versal SD variant
- sdhci-of-dwcmshc: Add support for system suspend/resume
- sdhci-of-dwcmshc: Fix UHS signaling support
- sdhci-of-esdhc: Fix tuning for eMMC HS400 mode
- sdhci-pci-gli: Add Genesys Logic GL9763E support
- sdhci-sprd: Add support for the ->request_atomic() ops
- sdhci-tegra: Avoid reading autocal timeout values when not applicable
MEMSTICK:
- Minor trivial update"
* tag 'mmc-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (127 commits)
dt-bindings: mmc: Convert sdhci-pxa to json-schema
mmc: sdhci-msm: Clear tuning done flag while hs400 tuning
mmc: core: Export device/vendor ids from Common CIS for SDIO cards
mmc: core: Do not export MMC_NAME= and MODALIAS=mmc:block for SDIO cards
mmc: sdhci-of-at91: fix CALCR register being rewritten
mmc: sdhci-esdhc-imx: disable the CMD CRC check for standard tuning
mmc: sdhci-esdhc-imx: fix the mask for tuning start point
mmc: host: sdhci-esdhc-imx: add wakeup feature for GPIO CD pin
mmc: mmci_sdmmc: fix DMA API warning max segment size
mmc: mmci_sdmmc: fix DMA API warning overlapping mappings
mmc: sdhci-of-arasan: Add support for Intel Keem Bay
dt-bindings: mmc: arasan: Add compatible strings for Intel Keem Bay
mmc: sdhci-cadence: fix PHY write
mmc: sdio: Sort all SDIO IDs in common include file
mmc: sdio: Fix Cypress SDIO IDs macros in common include file
mmc: sdio: Move SDIO IDs from b43-sdio driver to common include file
mmc: sdio: Move SDIO IDs from ath10k driver to common include file
mmc: sdio: Move SDIO IDs from ath6kl driver to common include file
mmc: sdio: Move SDIO IDs from smssdio driver to common include file
mmc: sdio: Move SDIO IDs from btmtksdio driver to common include file
...
|
|
Merge updates from Andrew Morton:
"A few little subsystems and a start of a lot of MM patches.
Subsystems affected by this patch series: squashfs, ocfs2, parisc,
vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
swap, memcg, pagemap, memory-failure, vmalloc, kasan"
* emailed patches from Andrew Morton <[email protected]>: (128 commits)
kasan: move kasan_report() into report.c
mm/mm_init.c: report kasan-tag information stored in page->flags
ubsan: entirely disable alignment checks under UBSAN_TRAP
kasan: fix clang compilation warning due to stack protector
x86/mm: remove vmalloc faulting
mm: remove vmalloc_sync_(un)mappings()
x86/mm/32: implement arch_sync_kernel_mappings()
x86/mm/64: implement arch_sync_kernel_mappings()
mm/ioremap: track which page-table levels were modified
mm/vmalloc: track which page-table levels were modified
mm: add functions to track page directory modifications
s390: use __vmalloc_node in stack_alloc
powerpc: use __vmalloc_node in alloc_vm_stack
arm64: use __vmalloc_node in arch_alloc_vmap_stack
mm: remove vmalloc_user_node_flags
mm: switch the test_vmalloc module to use __vmalloc_node
mm: remove __vmalloc_node_flags_caller
mm: remove both instances of __vmalloc_node_flags
mm: remove the prot argument to __vmalloc_node
mm: remove the pgprot argument to __vmalloc
...
|
|
These functions are not needed anymore because the vmalloc and ioremap
mappings are now synchronized when they are created or torn down.
Remove all callers and function definitions.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Track at which levels in the page-table entries were modified by
vmap/vunmap.
After the page-table has been modified, use that information do decide
whether the new arch_sync_kernel_mappings() needs to be called.
[[email protected]: map_kernel_range_noflush() needs the arch_sync_kernel_mappings() call]
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: Get rid of vmalloc_sync_(un)mappings()", v3.
After the recent issue with vmalloc and tracing code[1] on x86 and a
long history of previous issues related to the vmalloc_sync_mappings()
interface, I thought the time has come to remove it. Please see [2],
[3], and [4] for some other issues in the past.
The patches add tracking of page-table directory changes to the vmalloc
and ioremap code. Depending on which page-table levels changes have
been made, a new per-arch function is called:
arch_sync_kernel_mappings().
On x86-64 with 4-level paging, this function will not be called more
than 64 times in a systems runtime (because vmalloc-space takes 64 PGD
entries which are only populated, but never cleared).
As a side effect this also allows to get rid of vmalloc faults on x86,
making it safe to touch vmalloc'ed memory in the page-fault handler.
Note that this potentially includes per-cpu memory.
This patch (of 7):
Add page-table allocation functions which will keep track of changed
directory entries. They are needed for new PGD, P4D, PUD, and PMD
entries and will be used in vmalloc and ioremap code to decide whether
any changes in the kernel mappings need to be synchronized between
page-tables in the system.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Open code it in __bpf_map_area_alloc, which is the only caller. Also
clean up __bpf_map_area_alloc to have a single vmalloc call with slightly
different flags instead of the current two different calls.
For this to compile for the nommu case add a __vmalloc_node_range stub to
nommu.c.
[[email protected]: fix nommu.c build]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Just use __vmalloc_node instead which gets and extra argument. To be able
to to use __vmalloc_node in all caller make it available outside of
vmalloc and implement it in nommu.c.
[[email protected]: fix nommu build]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The real version just had a few callers that can open code it and remove
one layer of indirection. The nommu stub was public but only had a single
caller, so remove it and avoid a CONFIG_MMU ifdef in vmalloc.h.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Michael Kelley <[email protected]> [hyperv]
Acked-by: Gao Xiang <[email protected]> [erofs]
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Wei Liu <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
To help enforcing the W^X protection don't allow remapping existing pages
as executable.
x86 bits from Peter Zijlstra, arm64 bits from Mark Rutland.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Mark Rutland <[email protected]>.
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is always PAGE_KERNEL - for long term mappings with other properties
vmap should be used.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Switch all callers to map_kernel_range, which symmetric to the unmap side
(as well as the _noflush versions).
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rename the Kconfig variable to clarify the scope.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Switch the two remaining callers to use __get_vm_area_caller instead.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Kelley <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The page table entry is passed in the 'val' argument to note_page(),
however this was previously an "unsigned long" which is fine on 64-bit
platforms. But for 32 bit x86 it is not always big enough to contain a
page table entry which may be 64 bits.
Change the type to u64 to ensure that it is always big enough.
[[email protected]: fix riscv]
Reported-by: Jan Beulich <[email protected]>
Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Fix W+X debug feature on x86"
Jan alerted me[1] that the W+X detection debug feature was broken in x86
by my change[2] to switch x86 to use the generic ptdump infrastructure.
Fundamentally the approach of trying to move the calculation of
effective permissions into note_page() was broken because note_page() is
only called for 'leaf' entries and the effective permissions are passed
down via the internal nodes of the page tree. The solution I've taken
here is to create a new (optional) callback which is called for all
nodes of the page tree and therefore can calculate the effective
permissions.
Secondly on some configurations (32 bit with PAE) "unsigned long" is not
large enough to store the table entries. The fix here is simple - let's
just use a u64.
[1] https://lore.kernel.org/lkml/[email protected]/
[2] 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")
This patch (of 2):
By switching the x86 page table dump code to use the generic code the
effective permissions are no longer calculated correctly because the
note_page() function is only called for *leaf* entries. To calculate
the actual effective permissions it is necessary to observe the full
hierarchy of the page tree.
Introduce a new callback for ptdump which is called for every entry and
can therefore update the prot_levels array correctly. note_page() can
then simply access the appropriate element in the array.
[[email protected]: make the assignment conditional on val != 0]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")
Reported-by: Jan Beulich <[email protected]>
Signed-off-by: Steven Price <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a memory.swap.high knob, which can be used to protect the system
from SWAP exhaustion. The mechanism used for penalizing is similar to
memory.high penalty (sleep on return to user space).
That is not to say that the knob itself is equivalent to memory.high.
The objective is more to protect the system from potentially buggy tasks
consuming a lot of swap and impacting other tasks, or even bringing the
whole system to stand still with complete SWAP exhaustion. Hopefully
without the need to find per-task hard limits.
Slowing misbehaving tasks down gradually allows user space oom killers
or other protection mechanisms to react. oomd and earlyoom already do
killing based on swap exhaustion, and memory.swap.high protection will
help implement such userspace oom policies more reliably.
We can use one counter for number of pages allocated under pressure to
save struct task space and avoid two separate hierarchy walks on the hot
path. The exact overage is calculated on return to user space, anyway.
Take the new high limit into account when determining if swap is "full".
Borrowing the explanation from Johannes:
The idea behind "swap full" is that as long as the workload has plenty
of swap space available and it's not changing its memory contents, it
makes sense to generously hold on to copies of data in the swap device,
even after the swapin. A later reclaim cycle can drop the page without
any IO. Trading disk space for IO.
But the only two ways to reclaim a swap slot is when they're faulted
in and the references go away, or by scanning the virtual address space
like swapoff does - which is very expensive (one could argue it's too
expensive even for swapoff, it's often more practical to just reboot).
So at some point in the fill level, we have to start freeing up swap
slots on fault/swapin. Otherwise we could eventually run out of swap
slots while they're filled with copies of data that is also in RAM.
We don't want to OOM a workload because its available swap space is
filled with redundant cache.
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Chris Down <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|