|
A last-minute fixlet which I'd failed to merge at the appropriate time
had the predictable effect.
Fixes: f672e2c217e2d4b2 ("lib: untag user pointers in strn*_user")
Cc: Andrey Konovalov <[email protected]>
Cc: David Miller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
people, and until recently arm64 used these erroneously/pointlessly for
other levels of page table.
To make it incredibly clear that these only apply to the PTE level, and to
align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
to pgtable_pte_page_{ctor,dtor}().
These changes were generated with the following shell script:
----
git grep -lw 'pgtable_page_.tor' | while read FILE; do
sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
done
----
... with the documentation re-flowed to remain under 80 columns, and
whitespace fixed up in macros to keep backslashes aligned.
There should be no functional change as a result of this patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]> [m68k]
Cc: Anshuman Khandual <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
"likely(!IS_ERR(x))" is excessive. IS_ERR() already uses
unlikely() internally.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Denis Efremov <[email protected]>
Cc: Anton Altaparmakov <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses
unlikely() internally.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Denis Efremov <[email protected]>
Cc: Mike Marciniszyn <[email protected]>
Cc: Joe Perches <[email protected]>
Acked-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely()
internally.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Denis Efremov <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely()
internally.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Denis Efremov <[email protected]>
Cc: Inaky Perez-Gonzalez <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely()
internally.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Denis Efremov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
"unlikely(WARN(x))" is excessive. WARN() already uses unlikely()
internally.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Denis Efremov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Joe Perches <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
IS_ERR(), IS_ERR_OR_NULL(), IS_ERR_VALUE() and WARN*() already contain
unlikely() optimization internally. Thus, there is no point in calling
these functions and defines under likely()/unlikely().
This check is based on the coccinelle rule developed by Enrico Weigelt
https://lore.kernel.org/lkml/[email protected]/
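As a rough userspace illustration of why the extra hint is redundant, the sketch below re-creates simplified stand-ins for the kernel macros. The definitions are modelled on the kernel's but are not copied from it, and lookup_redundant() and lookup_plain() are hypothetical callers invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's branch-prediction hints. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Simplified model: the top MAX_ERRNO addresses encode -errno values. */
#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x)	unlikely((uintptr_t)(x) >= (uintptr_t)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* The unlikely() hint already lives inside IS_ERR_VALUE(). */
static inline bool IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE(ptr);
}

/* Hypothetical caller in the style the check would flag. */
static int lookup_redundant(const void *p)
{
	if (unlikely(IS_ERR(p)))	/* hint applied a second time */
		return -1;
	return 0;
}

/* Hypothetical caller in the preferred style. */
static int lookup_plain(const void *p)
{
	if (IS_ERR(p))
		return -1;
	return 0;
}
```

Both callers return the same result for every input; only the spelling differs, which is why the outer hint can be dropped.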
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Denis Efremov <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Anton Altaparmakov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Boris Pismenny <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Denis Efremov <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Inaky Perez-Gonzalez <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mike Marciniszyn <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Saeed Mahameed <[email protected]>
Cc: Sean Paul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
hexagon never reserves or initializes initrd and the only mention of it is
the empty free_initrd_mem() function.
As we have a generic implementation of free_initrd_mem(), there is no need
to define an empty stub for the hexagon implementation and it can be
dropped.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Richard Kuo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are many common parts between MADV_COLD and MADV_PAGEOUT.
This patch factors them out to reduce code duplication.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Minchan Kim <[email protected]>
Suggested-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Daniel Colascione <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: kbuild test robot <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Oleksandr Natalenko <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Sonny Rao <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tim Murray <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When a process expects no accesses to a certain memory range for a long
time, it could hint to the kernel that the pages can be reclaimed instantly but
data should be preserved for future use. This could reduce workingset
eviction so it ends up increasing performance.
This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall.
MADV_PAGEOUT can be used by a process to mark a memory range as not
expected to be used for a long time so that kernel reclaims *any LRU*
pages instantly. The hint can help kernel in deciding which pages to
evict proactively.
A note: It intentionally doesn't apply the SWAP_CLUSTER_MAX LRU page
isolation limit because it's automatically bounded by PMD size. If PMD
size (e.g., 256) causes trouble, we could fix it later by limiting it to
SWAP_CLUSTER_MAX[1].
- man-page material
MADV_PAGEOUT (since Linux x.x)
Do not expect access in the near future so pages in the specified
regions could be reclaimed instantly regardless of memory pressure.
Thus, access in the range after successful operation could cause
major page fault but never lose the up-to-date contents unlike
MADV_DONTNEED. Pages belonging to a shared mapping are only processed
if a write access is allowed for the calling process.
MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
VM_PFNMAP pages.
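A minimal userspace sketch of how a process might use the hint follows. The helper name pageout_roundtrip() is hypothetical, the fallback value 21 for MADV_PAGEOUT is taken from the Linux uapi headers for libc headers that predate it, and failures of madvise() are deliberately ignored because the call is only a hint.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Value from the Linux uapi headers; older libc headers may lack it. */
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/*
 * Hypothetical helper: fill an anonymous private mapping, ask the kernel
 * to reclaim it, then read it back.
 * Returns 0 if the contents survived, -1 on mmap failure or mismatch.
 */
static int pageout_roundtrip(size_t len)
{
	unsigned char *buf;
	size_t i;
	int ret = 0;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0x5a, len);

	/*
	 * The hint is advisory. Kernels without MADV_PAGEOUT reject it
	 * with EINVAL, so any failure is ignored in this sketch.
	 */
	(void)madvise(buf, len, MADV_PAGEOUT);

	/* Reading the range may fault pages back in; data must be intact. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0x5a) {
			ret = -1;
			break;
		}
	}

	munmap(buf, len);
	return ret;
}
```

Unlike MADV_DONTNEED, the read-back after the hint is expected to see the original bytes, at the possible cost of major page faults.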
[1] https://lore.kernel.org/lkml/[email protected]/
[[email protected]: clear PG_active on MADV_PAGEOUT]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Minchan Kim <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Daniel Colascione <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Oleksandr Natalenko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Sonny Rao <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tim Murray <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The local variable 'references' in shrink_page_list() defaults to
PAGEREF_RECLAIM_CLEAN. This prevents dirty pages from being reclaimed when
CMA tries to migrate pages. Strictly speaking, we don't need it because
CMA already disallows writeout via .may_writepage = 0 in
reclaim_clean_pages_from_list().
Moreover, it would prevent anonymous pages from being swapped out even
when force_reclaim = true in shrink_page_list() in the upcoming patch. So
this patch changes the default value of references to PAGEREF_RECLAIM and
renames force_reclaim to ignore_references to make it clearer.
This is preparatory work for the next patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Minchan Kim <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Daniel Colascione <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: kbuild test robot <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Oleksandr Natalenko <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Sonny Rao <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tim Murray <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.
- Background
The Android terminology used for forking a new process and starting an app
from scratch is a cold start, while resuming an existing app is a hot
start. While we continually try to improve the performance of cold
starts, hot starts will always be significantly less power hungry as well
as faster so we are trying to make hot start more likely than cold start.
To increase hot start, Android userspace manages the order that apps
should be killed in a process called ActivityManagerService.
ActivityManagerService tracks every Android app or service that the user
could be interacting with at any time and translates that into a ranked
list for lmkd (low memory killer daemon). They are likely to be killed by
lmkd if the system has to reclaim memory. In that sense they are similar
to entries in any other cache. Those apps are kept alive for
opportunistic performance improvements but those performance improvements
will vary based on the memory requirements of individual workloads.
- Problem
Naturally, cached apps were dominant consumers of memory on the system.
However, they were not significant consumers of swap even though they are
good candidates for swap. On investigation, swapping out only begins
once the low zone watermark is hit and kswapd wakes up, but the overall
allocation rate in the system might trip lmkd thresholds and cause a
cached process to be killed. (We measured the performance of swapping out
vs. zapping the memory by killing a process. Unsurprisingly, zapping is
10x faster even though we use zram, which is much faster than real
storage.) So a kill from lmkd will often satisfy the high zone watermark,
resulting in very few pages actually being moved to swap.
- Approach
The approach we chose was to use a new interface to allow userspace to
proactively reclaim entire processes by leveraging platform information.
This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
that are known to be cold from userspace and to avoid races with lmkd by
reclaiming apps as soon as they entered the cached state. Additionally,
it could provide many chances for platform to use much information to
optimize memory efficiency.
To achieve the goal, the patchset introduces two new options for madvise.
One is MADV_COLD which will deactivate activated pages and the other is
MADV_PAGEOUT which will reclaim private pages instantly. These new
options complement MADV_DONTNEED and MADV_FREE by adding non-destructive
ways to gain some free memory space. MADV_PAGEOUT is similar to
MADV_DONTNEED in a way that it hints the kernel that memory region is not
currently needed and should be reclaimed immediately; MADV_COLD is similar
to MADV_FREE in a way that it hints the kernel that memory region is not
currently needed and should be reclaimed when memory pressure rises.
This patch (of 5):
When a process expects no accesses to a certain memory range, it could
give a hint to kernel that the pages can be reclaimed when memory pressure
happens but data should be preserved for future use. This could reduce
workingset eviction so it ends up increasing performance.
This patch introduces the new MADV_COLD hint to madvise(2) syscall.
MADV_COLD can be used by a process to mark a memory range as not expected
to be used in the near future. The hint can help kernel in deciding which
pages to evict early during memory pressure.
It works for all LRU pages, like MADV_[DONTNEED|FREE]. IOW, it moves
active file page -> inactive file LRU
active anon page -> inactive anon LRU
Unlike MADV_FREE, it doesn't move active anonymous pages to the inactive
file LRU's head because MADV_COLD has slightly different semantics.
MADV_FREE means it's okay to discard the pages under memory pressure
because the content of the page is *garbage*, so freeing such pages has
almost zero overhead: we don't need to swap them out, and a later access
causes just a minor fault. Thus, it makes sense to put those freeable
pages in the inactive file LRU to compete with other used-once pages. It
makes sense from an implementation point of view, too, because the memory
is no longer swap-backed until it is re-dirtied. It even gives a bonus:
the pages can be reclaimed on a swapless system. However, MADV_COLD
doesn't mean garbage, so reclaiming those pages requires swap-out/in in
the end, which costs more. Since we have designed VM LRU aging based on a
cost model, anonymous cold pages are better placed on the inactive anon
LRU list, not the file LRU. Furthermore, it helps to avoid unnecessary
scanning if the system doesn't have a swap device. Let's start the simple
way without adding complexity at this moment. However, keep in mind the
caveat that workloads with a lot of page cache are likely to ignore
MADV_COLD on anonymous memory because we rarely age anonymous LRU lists.
* man-page material
MADV_COLD (since Linux x.x)
Pages in the specified regions will be treated as less-recently-accessed
compared to pages in the system with similar access frequencies. In
contrast to MADV_FREE, the contents of the region are preserved regardless
of subsequent writes to pages.
MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
pages.
[[email protected]: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Minchan Kim <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Daniel Colascione <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Oleksandr Natalenko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Sonny Rao <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tim Murray <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There isn't a good reason to differentiate between the user address space
layout modification syscalls and the other memory permission/attributes
ones (e.g. mprotect, madvise) w.r.t. the tagged address ABI. Untag the
user addresses on entry to these functions.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Andrey Konovalov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Szabolcs Nagy <[email protected]>
Cc: Kevin Brodsky <[email protected]>
Cc: Dave P Martin <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
vaddr_get_pfn() uses provided user pointers for vma lookups, which can
only be done with untagged pointers.
Untag user pointers in this function.
Link: http://lkml.kernel.org/r/87422b4d72116a975896f2b19b00f38acbd28f33.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
tee_shm_register()->optee_shm_unregister()->check_mem_type() uses provided
user pointers for vma lookups (via __check_mem_type()), which can only be
done with untagged pointers.
Untag user pointers in this function.
Link: http://lkml.kernel.org/r/4b993f33196b3566ac81285ff8453219e2079b45.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Jens Wiklander <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
videobuf_dma_contig_user_get() uses provided user pointers for vma
lookups, which can only be done with untagged pointers.
Untag the pointers in this function.
Link: http://lkml.kernel.org/r/100436d5f8e4349a78f27b0bbb27e4801fcb946b.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mauro Carvalho Chehab <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
In radeon_gem_userptr_ioctl() an MMU notifier is set up with a (tagged)
userspace pointer. The untagged address should be used so that MMU
notifiers for the untagged address get correctly matched up with the right
BO. This function also calls radeon_ttm_tt_pin_userptr(), which uses
provided user pointers for vma lookups, which can only be done with
untagged pointers.
This patch untags user pointers in radeon_gem_userptr_ioctl().
Link: http://lkml.kernel.org/r/c856babeb67195b35603b8d5ba386a2819cec5ff.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Suggested-by: Felix Kuehling <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
In amdgpu_gem_userptr_ioctl() and amdgpu_amdkfd_gpuvm.c/init_user_pages()
an MMU notifier is set up with a (tagged) userspace pointer. The untagged
address should be used so that MMU notifiers for the untagged address get
correctly matched up with the right BO. This patch untags user pointers in
amdgpu_gem_userptr_ioctl() for the GEM case and in
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu() for the KFD case. This also makes
sure that an untagged pointer is passed to amdgpu_ttm_tt_get_user_pages(),
which uses it for vma lookups.
Link: http://lkml.kernel.org/r/d684e1df08f2ecb6bc292e222b64fa9efbc26e69.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Suggested-by: Felix Kuehling <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
userfaultfd code uses provided user pointers for vma lookups, which can
only be done with untagged pointers.
Untag user pointers in validate_range().
Link: http://lkml.kernel.org/r/cdc59ddd7011012ca2e689bc88c3b65b1ea7e413.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
In copy_mount_options a user address is being subtracted from TASK_SIZE.
If the address is lower than TASK_SIZE, the size is calculated to not
allow the exact_copy_from_user() call to cross TASK_SIZE boundary.
However if the address is tagged, then the size will be calculated
incorrectly.
Untag the address before subtracting.
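The miscalculation can be sketched in userspace as follows. SKETCH_TASK_SIZE and SKETCH_PAGE_SIZE are illustrative constants rather than the real per-architecture values, and both helpers are hypothetical; the clamp to one page is an assumption about how the computed size is used.

```c
#include <stdint.h>

/* Illustrative constants; the real values are per-architecture. */
#define SKETCH_TASK_SIZE	(UINT64_C(1) << 48)
#define SKETCH_PAGE_SIZE	UINT64_C(4096)

/* Clear the top byte (assumes a user address, so bit 55 is 0). */
static uint64_t sketch_untag(uint64_t addr)
{
	return addr & ~(UINT64_C(0xff) << 56);
}

/*
 * Bytes that may be copied starting at addr without crossing
 * SKETCH_TASK_SIZE, capped at one page. The subtraction wraps around
 * when addr is numerically above SKETCH_TASK_SIZE, as a tagged
 * address is.
 */
static uint64_t sketch_copy_size(uint64_t addr)
{
	uint64_t size = SKETCH_TASK_SIZE - addr;

	return size > SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : size;
}
```

For an address 16 bytes below the boundary, the untagged calculation yields 16, while the same address with a non-zero tag wraps around and is clamped to a full page, which would let the copy run past the boundary.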
Link: http://lkml.kernel.org/r/1de225e4a54204bfd7f25dac2635e31aa4aa1d90.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
get_vaddr_frames uses provided user pointers for vma lookups, which can
only be done with untagged pointers. Instead of locating and changing all
callers of this function, perform untagging in it.
Link: http://lkml.kernel.org/r/28f05e49c92b2a69c4703323d6c12208f3d881fe.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
mm/gup.c provides a kernel interface that accepts user addresses and
manipulates user pages directly (for example get_user_pages, that is used
by the futex syscall). Since a user can provide tagged addresses, we
need to handle this case.
Add untagging to gup.c functions that use user addresses for vma lookups.
Link: http://lkml.kernel.org/r/4731bddba3c938658c10ff4ed55cc01c60f4c8f8.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.
This patch allows tagged pointers to be passed to the following memory
syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect,
mremap, msync, munlock, move_pages.
The mmap and mremap syscalls do not currently accept tagged addresses.
Architectures may interpret the tag as a background colour for the
corresponding vma.
Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "arm64: untag user pointers passed to the kernel", v19.
=== Overview
arm64 has a feature called Top Byte Ignore, which allows embedding pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be
passed to syscalls when they point to memory ranges obtained by anonymous
mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untagging user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok). The untagging is done only
when the pointer is being checked; the tag is preserved as the pointer
makes its way through the kernel and stays tagged when the kernel
dereferences the pointer when performing user memory accesses.
The mmap and mremap (only new_addr) syscalls do not currently accept
tagged addresses. Architectures may interpret the tag as a background
colour for the corresponding vma.
Other memory syscalls (mprotect, etc.) don't do user memory accesses but
rather deal with memory ranges, and untagged pointers are better suited to
describe memory ranges internally. Thus for memory syscalls we untag
pointers completely when they enter the kernel.
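As a rough sketch of what untagging means for a 64-bit address, the helpers below strip or set the top byte. The names sketch_untagged_addr() and sketch_set_tag() are hypothetical, and extending from bit 55 (so that kernel addresses keep an all-ones top byte) is an assumption of this example rather than something stated in this cover letter.

```c
#include <stdint.h>

#define SKETCH_TAG_SHIFT	56
#define SKETCH_TAG_MASK		(UINT64_C(0xff) << SKETCH_TAG_SHIFT)

/*
 * Replace the top byte with copies of bit 55: user addresses (bit 55
 * clear) end up with a zero top byte, kernel addresses (bit 55 set)
 * keep an all-ones top byte.
 */
static uint64_t sketch_untagged_addr(uint64_t addr)
{
	if (addr & (UINT64_C(1) << 55))
		return addr | SKETCH_TAG_MASK;
	return addr & ~SKETCH_TAG_MASK;
}

/* Put an 8-bit tag into the top byte, as a userspace tool might. */
static uint64_t sketch_set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~SKETCH_TAG_MASK) | ((uint64_t)tag << SKETCH_TAG_SHIFT);
}
```

A memory syscall would apply the untagging step once on entry and then work with the plain address range from there on.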
=== Other approaches
One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls. With this approach we would need a
custom wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscalls prologues
is to instead allow tagged pointers to be passed to find_vma() (and other
vma related functions) and untag them there. Unfortunately, a lot of
find_vma() callers then compare or subtract the returned vma start and end
fields against the pointer that was being searched. Thus this approach
would still require changing all find_vma() callers.
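A small sketch of why untagging inside the lookup is not enough. The
struct range type, lookup() and caller_check() are hypothetical stand-ins
for a vma, find_vma() and a typical caller; the top-byte tag layout is an
assumption as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vma: untagged half-open range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Assumption: the tag is the top byte of the address. */
static inline uint64_t untag(uint64_t addr)
{
	return addr & ~((uint64_t)0xff << 56);
}

/*
 * Lookup that untags internally. Like find_vma(), it returns the first
 * range whose end lies above the address; ranges must be sorted.
 */
static inline const struct range *lookup(const struct range *r, size_t n,
					 uint64_t addr)
{
	uint64_t a = untag(addr);

	for (size_t i = 0; i < n; i++)
		if (a < r[i].end)
			return &r[i];
	return NULL;
}

/* Typical caller-side follow-up: compare the address with the range. */
static inline bool caller_check(const struct range *r, uint64_t addr)
{
	return r && addr >= r->start && addr < r->end;
}
```

With a tagged address, lookup() finds the right range, but caller_check()
on the same tagged value fails because the tag makes the address compare
above the range end. Every such caller would need its own untag.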
=== Testing
The following testing approaches have been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call
find_vma() (and other similar functions) or directly compare against
vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare
user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the required patches have been added
to the patchset.
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [4].
This patchset has been merged into the Pixel 2 & 3 kernel trees and is
now being used to enable testing of Pixel phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292
[3] https://lkml.org/lkml/2019/6/12/745
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a
This patch (of 11):
This patch is part of a series that extends the kernel ABI to allow
passing tagged user pointers (with the top byte set to something other
than 0x00) as syscall arguments.
strncpy_from_user and strnlen_user accept user addresses as arguments, and
do not go through the same path as copy_from_user and others, so here we
need to handle the case of tagged user addresses separately.
Untag user pointers passed to these functions.
Note that this patch only temporarily untags the pointers to perform
validity checks, but then uses them as is to perform user memory accesses.
[[email protected]: fix sparc4 build]
Link: http://lkml.kernel.org/r/CAAeHK+yx4a-P0sDrXTUxMvO2H0CJZUFPffBrg_cU7oJOZyC7ew@mail.gmail.com
Link: http://lkml.kernel.org/r/c5a78bcad3e94d6cda71fcaa60a423231ae71e4c.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jens Wiklander <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix an unaligned access which breaks on platforms where this is not
permitted (e.g., Sparc).
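The message does not say how the access was fixed. As a generic
illustration only, a portable way to read or write a 32-bit value at an
arbitrary byte offset is to go through memcpy() instead of dereferencing a
cast pointer. The helper names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned location. */
static inline uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));	/* no alignment requirement on p */
	return v;
}

/* Write a 32-bit value to a possibly unaligned location. */
static inline void store_u32_unaligned(void *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}
```

Compilers typically turn these fixed-size copies into a single load or
store on architectures that allow unaligned accesses.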
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dave Rodgman <[email protected]>
Cc: Dave Rodgman <[email protected]>
Cc: Markus F.X.J. Oberhumer <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
CONFIG_PROVE_RCU_LIST requires list_for_each_entry_rcu() to pass a lockdep
expression if SRCU or locking is used for protection. It can only check
regular RCU protection; all other protection needs to be passed as a
lockdep expression.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Cc: Jonathan Derrick <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Lorenzo Pieralisi <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In a few error paths, null pointers were assigned to local variables only
so that the shared jump target “out” could be used, even though no
meaningful processing remained to be done there. Use an additional jump
target that calls dev_kfree_skb() directly.
Also return directly after an error condition is detected when this
function needs no extra clean-up.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Markus Elfring <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
dev_kfree_skb() performs input parameter validation, thus the test around
the call is not needed.
This issue was detected by using the Coccinelle software.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Markus Elfring <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The original clean up of "cut here" missed the WARN_ON() case (that does
not have a printk message), which was fixed recently by adding an explicit
printk of "cut here". This had the downside of adding a printk() to every
WARN_ON() caller, which reduces the utility of using an instruction
exception to streamline the resulting code. By making this a new BUGFLAG,
all of these can be removed and "cut here" can be handled by the exception
handler.
This was very pronounced on PowerPC, but the effect can be seen on x86 as
well. The resulting text size of a defconfig build shows some small
savings from this patch:
    text     data      bss      dec     hex filename
19691167  5134320  1646664 26472151 193eed7 vmlinux.before
19676362  5134260  1663048 26473670 193f4c6 vmlinux.after
This change also opens the door for creating something like BUG_MSG(),
where a custom printk() is issued before BUG(), without confusing the
"cut here" line.
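A minimal user-space sketch of the idea: the per-site entry carries a flag
saying whether the call site has already printed "cut here", and the
handler consults it. The flag names, their values and the struct layout
are assumptions of this sketch, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative flag bits; names and values are assumptions. */
#define SKETCH_FLAG_WARNING     (1u << 0)
#define SKETCH_FLAG_NO_CUT_HERE (1u << 3)

#define CUT_HERE "------------[ cut here ]------------\n"

/* Hypothetical per-call-site record consulted by the handler. */
struct sketch_bug_entry {
	const char *file;
	unsigned int line;
	unsigned int flags;
};

/* The handler prints the marker unless the call site already did. */
static inline bool handler_prints_cut_here(const struct sketch_bug_entry *e)
{
	return !(e->flags & SKETCH_FLAG_NO_CUT_HERE);
}

/* Handler side: one shared place that emits the marker and the report. */
static inline void sketch_report(const struct sketch_bug_entry *e, FILE *out)
{
	if (handler_prints_cut_here(e))
		fputs(CUT_HERE, out);
	fprintf(out, "WARNING: at %s:%u\n", e->file, e->line);
}
```

In this model a message-less warning site stores only the warning flag and
emits no printk of its own, which is where the text size saving would come
from.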
Link: http://lkml.kernel.org/r/201908200943.601DD59DCE@keescook
Fixes: 6b15f678fb7d ("include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures")
Signed-off-by: Kees Cook <[email protected]>
Reported-by: Christophe Leroy <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Drew Davenport <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: YueHaibing <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Instead of having separate tests for __WARN_FLAGS, merge the two #ifdef
blocks and replace the synonym WANT_WARN_ON_SLOWPATH macro.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Drew Davenport <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: YueHaibing <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In preparation for cleaning up "cut here" even more, this removes the
__WARN_*TAINT() helpers, as they limit the ability to add new BUGFLAG_*
flags to call sites. They are removed by expanding them into full
__WARN_FLAGS() calls.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Drew Davenport <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: YueHaibing <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In preparation for cleaning up "cut here", move the "cut here" logic up
out of __warn() and into callers that pass non-NULL args. For anyone
looking closely, there are two callers that pass NULL args: one already
explicitly prints "cut here". The remaining case is covered by how a WARN
is built, which will be cleaned up in the next patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Drew Davenport <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: YueHaibing <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Instead of having a separate helper for no printk output, just consolidate
the logic into warn_slowpath_fmt().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Drew Davenport <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: YueHaibing <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This just renames the helper to improve readability.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Drew Davenport <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: YueHaibing <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Clean up WARN() "cut here" handling", v2.
Christophe Leroy noticed that the fix for missing "cut here" in the WARN()
case was adding explicit printk() calls instead of teaching the exception
handler to add it. This refactors the bug/warn infrastructure to pass
this information as a new BUGFLAG.
Longer details repeated from the last patch in the series:
bug: move WARN_ON() "cut here" into exception handler
The original cleanup of "cut here" missed the WARN_ON() case (that does
not have a printk message), which was fixed recently by adding an explicit
printk of "cut here". This had the downside of adding a printk() to every
WARN_ON() caller, which reduces the utility of using an instruction
exception to streamline the resulting code. By making this a new BUGFLAG,
all of these can be removed and "cut here" can be handled by the exception
handler.
This was very pronounced on PowerPC, but the effect can be seen on x86 as
well. The resulting text size of a defconfig build shows some small
savings from this patch:
    text     data      bss      dec     hex filename
19691167  5134320  1646664 26472151 193eed7 vmlinux.before
19676362  5134260  1663048 26473670 193f4c6 vmlinux.after
This change also opens the door for creating something like BUG_MSG(),
where a custom printk() is issued before BUG(), without confusing the
"cut here" line.
This patch (of 7):
There's no reason to have specialized helpers for passing the warn taint
down to __warn(). Consolidate and refactor helper macros, removing
__WARN_printf() and warn_slowpath_fmt_taint().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Drew Davenport <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: YueHaibing <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Some systems (like Chrome OS) may use "split debug" for kernel modules.
That means that the debug symbols are in a different file than the main
elf file. Let's handle that by also searching for debug symbols that end
in ".ko.debug".
This is a packaging topic. You can take a normal elf file and split the
debug out of it using objcopy. Try "man objcopy" and then take a look at
the "--only-keep-debug" option. It'll give you a whole recipe for doing
splitdebug. The suffix used for the debug symbols is arbitrary. If
people have another suffix besides ".ko.debug" then we could
presumably support that too...
For portage (which is the packaging system used by Chrome OS) split debug
is supported by default (and the suffix is .ko.debug). ...and so in
Chrome OS we always get the installed elf files stripped and then the
symbols stashed away.
At the moment we don't actually use the normal portage magic to do this
for the kernel though since it affects our ability to get good stack dumps
in the kernel. We instead pass a script as "strip" [1].
[1] https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/refs/heads/master/eclass/cros-kernel/strip_splitdebug
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Stephen Boyd <[email protected]>
Reviewed-by: Jan Kiszka <[email protected]>
Cc: Kieran Bingham <[email protected]>
Cc: Jason Wessel <[email protected]>
Cc: Daniel Thompson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Right now kgdb/kdb hooks up to debug panics by registering for the panic
notifier. This works OK except that it means that kgdb/kdb gets called
_after_ the CPUs in the system are taken offline. That means that if
anything important was happening on those CPUs (like something that might
have contributed to the panic) you can't debug them.
Specifically I ran into a case where I got a panic because a task was
"blocked for more than 120 seconds" which was detected on CPU 2. I nicely
got shown stack traces in the kernel log for all CPUs including CPU 0,
which was running 'PID: 111 Comm: kworker/0:1H' and was in the middle of
__mmc_switch().
I then ended up at the kdb prompt where I switched over to kgdb to try to
look at local variables of the process on CPU 0. I found that I couldn't.
Digging more, I found that I had no info on any tasks running on CPUs
other than CPU 2 and that asking kdb for help showed me "Error: no saved
data for this cpu". This was because all the CPUs were offline.
Let's move the entry of kdb/kgdb to a direct call from panic() and stop
using the generic notifier. Putting a direct call in allows us to order
things more properly and it also doesn't seem like we're breaking any
abstractions by calling into the debugger from the panic function.
Daniel said:
: This patch changes the way kdump and kgdb interact with each other.
: However it would seem rather odd to have both tools simultaneously armed
: and, even if they were, the user still has the option to use panic_timeout
: to force a kdump to happen. Thus I think the change of order is
: acceptable.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Daniel Thompson <[email protected]>
Cc: Jason Wessel <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 9012d011660e ("compiler: allow all arches to enable
CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable this
option. A couple of build errors were reported by randconfig, but all of
them have been ironed out.
Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely (and it
will simplify the 'inline' macro in compiler_types.h), this commit changes
it to an always-on option. Going forward, the compiler will always be
allowed to not inline functions marked 'inline'.
This is not a problem for x86 since it has been long used by
arch/x86/configs/{x86_64,i386}_defconfig.
I am keeping the config option just in case any problem crops up for other
architectures.
The code clean-up will be done after confirming this is solid.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Nick Desaulniers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The usercopy implementation comments describe that callers of the
copy_*_user() family of functions must always have their return values
checked. This can be enforced at compile time with __must_check, so add
it where needed.
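As a rough user-space illustration, GCC and Clang provide the
warn_unused_result attribute, which makes the compiler warn when a caller
ignores a return value. copy_checked() below is a hypothetical stand-in
that follows the "returns the number of bytes not copied" convention; it
is not the kernel's copy_from_user().

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang attribute: warn when the return value is ignored. */
#define sketch_must_check __attribute__((warn_unused_result))

/*
 * Hypothetical copy helper: copies at most dst_len bytes and returns the
 * number of bytes that could NOT be copied (0 on full success).
 */
static inline sketch_must_check size_t
copy_checked(void *dst, size_t dst_len, const void *src, size_t n)
{
	size_t todo = n < dst_len ? n : dst_len;

	memcpy(dst, src, todo);
	return n - todo;
}
```

A bare call such as copy_checked(dst, len, src, n); with the result
dropped would then draw a compiler warning.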
Link: http://lkml.kernel.org/r/201908251609.ADAD5CAAC1@keescook
Signed-off-by: Kees Cook <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Carpenter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The arch_kexec_kernel_image_probe function declaration was removed by
commit 9ec4ecef0af7 ("kexec_file,x86,powerpc: factor out kexec_file_ops
functions"). Still, this function is overridden by a couple of
architectures, so a proper prototype declaration is important; bring it
back.
This fixes the following sparse warning on s390:
arch/s390/kernel/machine_kexec_file.c:333:5: warning: symbol
'arch_kexec_kernel_image_probe' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/patch.git-ff1c9045ebdc.your-ad-here.call-01564402297-ext-5690@work.hours
Signed-off-by: Vasily Gorbik <[email protected]>
Acked-by: Dave Young <[email protected]>
Reviewed-by: Bhupesh Sharma <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: AKASHI Takahiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
syzbot found that a thread can stall for minutes inside kexec_load() after
that thread was killed by SIGKILL [1]. It turned out that the reproducer
was trying to allocate 2408MB of memory using kimage_alloc_page() from
kimage_load_normal_segment(). Let's check for SIGKILL before doing memory
allocation.
[1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tetsuo Handa <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Eric Biederman <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Mask arguments can be swapped without changing anything. Make the argument
names reflect that:
#define for_each_cpu_and(cpu, mask1, mask2)
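The symmetry can be illustrated with a toy model in which a CPU mask is a
single 64-bit bitmap. and_bits() is a hypothetical helper written for this
sketch, not the kernel iterator.

```c
#include <stdint.h>

/*
 * Collect, in ascending order, every bit index set in both masks.
 * Returns the number of indices written to out[].
 */
static inline unsigned int and_bits(uint64_t mask1, uint64_t mask2,
				    int out[64])
{
	unsigned int n = 0;

	for (int cpu = 0; cpu < 64; cpu++)
		if ((mask1 >> cpu) & (mask2 >> cpu) & 1)
			out[n++] = cpu;
	return n;
}
```

Because bitwise AND is commutative, and_bits(a, b, out) and
and_bits(b, a, out) visit the same indices in the same order.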
Link: http://lkml.kernel.org/r/20190724183350.GA15041@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When a user process exits, the kernel cleans up the mm_struct of the user
process, and during cleanup check_mm() checks the page tables of the user
process for corruption (e.g. unexpected page flags set/cleared). For
corrupted page tables, the error message printed by check_mm() isn't very
clear, as it prints the loop index instead of the memory type (e.g.
resident file mapping pages vs. resident shared memory pages). The loop
index in check_mm() is used to index rss_stat[], which represents
individual memory type stats. Hence, instead of printing the index, print
the memory type, thereby improving the error message.
Without patch:
--------------
[ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467)
[ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2
[ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5
[ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
With patch:
-----------
[ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467)
[ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2
[ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5
[ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
that it matches the other print statement.
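One way to get from an index to a readable name is a string table built
with designated initializers, sketched here in user space. MM_FILEPAGES and
MM_ANONPAGES match the log lines above; the other two enumerators, the
macro name and format_bad_rss() are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Counter types; the last two entries are assumed for this sketch. */
enum {
	MM_FILEPAGES,
	MM_ANONPAGES,
	MM_SWAPENTS,
	MM_SHMEMPAGES,
	NR_MM_COUNTERS
};

/* Initialize slot x with the spelling of x itself. */
#define NAMED_ARRAY_INDEX(x) [x] = #x

static const char *const resident_page_types[NR_MM_COUNTERS] = {
	NAMED_ARRAY_INDEX(MM_FILEPAGES),
	NAMED_ARRAY_INDEX(MM_ANONPAGES),
	NAMED_ARRAY_INDEX(MM_SWAPENTS),
	NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
};

/* Format a report line that names the counter type instead of its index. */
static inline int format_bad_rss(char *buf, size_t len, int type, long val)
{
	const char *name = "unknown";

	if (type >= 0 && type < NR_MM_COUNTERS)
		name = resident_page_types[type];
	return snprintf(buf, len, "BUG: Bad rss-counter state type:%s val:%ld",
			name, val);
}
```

Keying each initializer by the enumerator keeps the table correct even if
the enum order changes later.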
Link: http://lkml.kernel.org/r/da75b5153f617f4c5739c08ee6ebeb3d19db0fbc.1565123758.git.sai.praneeth.prakhya@intel.com
Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Suggested-by: Dave Hansen <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
brelse() tests whether its argument is NULL and then returns immediately.
Thus the test around the call is not needed.
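The pattern being removed can be shown with a toy release function.
struct sketch_buf and the helpers are hypothetical and merely mimic a
release routine that already tolerates NULL.

```c
#include <stddef.h>

/* Hypothetical reference-counted buffer. */
struct sketch_buf {
	int refcount;
};

/* Release helper that already returns early on NULL. */
static inline void sketch_release(struct sketch_buf *b)
{
	if (!b)
		return;
	b->refcount--;
}

/* Before: the caller repeats the NULL test that the helper performs. */
static inline void caller_before(struct sketch_buf *b)
{
	if (b)
		sketch_release(b);
}

/* After: the redundant test is dropped; behaviour is unchanged. */
static inline void caller_after(struct sketch_buf *b)
{
	sketch_release(b);
}
```

Both callers behave identically for NULL and non-NULL arguments, which is
why the outer test can go.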
This issue was detected by using the Coccinelle software.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Markus Elfring <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix the following gcc warning:
fs/reiserfs/do_balan.c: In function balance_leaf_insert_right:
fs/reiserfs/do_balan.c:629:6: warning: variable ret set but not used
[-Wunused-but-set-variable]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jason Yan <[email protected]>
Cc: zhengbin <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix the following gcc warning:
fs/reiserfs/journal.c: In function flush_used_journal_lists:
fs/reiserfs/journal.c:1791:6: warning: variable ret set but not used
[-Wunused-but-set-variable]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jason Yan <[email protected]>
Cc: zhengbin <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
fs/reiserfs/do_balan.c: In function balance_leaf_when_delete:
fs/reiserfs/do_balan.c:245:20: warning: variable ih set but not used [-Wunused-but-set-variable]
fs/reiserfs/do_balan.c: In function balance_leaf_insert_left:
fs/reiserfs/do_balan.c:301:7: warning: variable version set but not used [-Wunused-but-set-variable]
fs/reiserfs/do_balan.c: In function balance_leaf_insert_right:
fs/reiserfs/do_balan.c:649:7: warning: variable version set but not used [-Wunused-but-set-variable]
fs/reiserfs/do_balan.c: In function balance_leaf_new_nodes_insert:
fs/reiserfs/do_balan.c:953:7: warning: variable version set but not used [-Wunused-but-set-variable]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: zhengbin <[email protected]>
Reported-by: Hulk Robot <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
fs/reiserfs/fix_node.c: In function get_num_ver:
fs/reiserfs/fix_node.c:379:6: warning: variable cur_free set but not used [-Wunused-but-set-variable]
fs/reiserfs/fix_node.c: In function dc_check_balance_internal:
fs/reiserfs/fix_node.c:1737:6: warning: variable maxsize set but not used [-Wunused-but-set-variable]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: zhengbin <[email protected]>
Reported-by: Hulk Robot <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|