Age | Commit message (Collapse) | Author | Files | Lines |
|
Patch series "sections: Unify kernel sections range check and use", v4.
There are three head files(kallsyms.h, kernel.h and sections.h) which
include the kernel sections range check, let's make some cleanup and unify
them.
1. cleanup arch specific text/data check and fix address boundary check
in kallsyms.h
2. make all the basic/core kernel range check function into sections.h
3. update all the callers, and use the helper in sections.h to simplify
the code
After this series, we have 5 APIs about kernel sections range check in
sections.h
* is_kernel_rodata() --- already in sections.h
* is_kernel_core_data() --- come from core_kernel_data() in kernel.h
* is_kernel_inittext() --- come from kernel.h and kallsyms.h
* __is_kernel_text() --- add new internal helper
* __is_kernel() --- add new internal helper
Note: For the last two helpers, people should not use directly, consider to
use corresponding function in kallsyms.h.
This patch (of 11):
Remove arch specific text and data check after commit 4ba66a976072 ("arch:
remove blackfin port"), no need arch-specific text/data check.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
"A -= B; A" is equivalent to "A -= B".
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") reverted back to using MAP_FIXED to map ELF LOAD
segments because it was found that the segments in some binaries overlap
and can cause MAP_FIXED_NOREPLACE to fail.
The original intent of MAP_FIXED_NOREPLACE in the ELF loader was to
prevent the silent clobbering of an existing mapping (e.g. stack) by
the ELF image, which could lead to exploitable conditions. Quoting
commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map"),
which originally introduced the use of MAP_FIXED_NOREPLACE in the
loader:
Both load_elf_interp and load_elf_binary rely on elf_map to map
segments [to a specific] address and they use MAP_FIXED to enforce
that. This is however [a] dangerous thing prone to silent data
corruption which can be even exploitable.
...
Let's take CVE-2017-1000253 as an example ... we could end up mapping
[the executable] over the existing stack ... The [stack layout] issue
has been fixed since then ... So we should be safe and any [similar]
attack should be impractical. On the other hand this is just too
subtle [an] assumption ... it can break quite easily and [be] hard to
spot.
...
Address this [weakness] by changing MAP_FIXED to the newly added
MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is
an existing mapping clashing with the requested one [instead of
silently] clobbering it.
Then processing ET_DYN binaries the loader already calculates a total
size for the image when the first segment is mapped, maps the entire
image, and then unmaps the remainder before the remaining segments are
then individually mapped.
To avoid the earlier problems (legitimate overlapping LOAD segments
specified in the ELF), apply the same logic to ET_EXEC binaries as well.
For both ET_EXEC and ET_DYN+INTERP use MAP_FIXED_NOREPLACE for the
initial total size mapping and then use MAP_FIXED to build the final
(possibly legitimately overlapping) mappings. For ET_DYN w/out INTERP,
continue to map at a system-selected address in the mmap region.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]
Co-developed-by: Anthony Yznaga <[email protected]>
Signed-off-by: Anthony Yznaga <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Cc: Russell King <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Chen Jingwen <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The standard location of dictionary.txt is under codespell's package, on
my machine atm (codespell 2.1, Artix Linux):
/usr/lib/python3.9/site-packages/codespell_lib/data/dictionary.txt
Since we enable the codespell by default for SOF I have constant:
No codespell typos will be found - file '/usr/share/codespell/dictionary.txt': No such file or directory
The patch proposes to try to fix up the path following the
recommendation found here:
https://github.com/codespell-project/codespell/issues/1540
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Ujfalusi <[email protected]>
Acked-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The EXPORT_SYMBOL test expects a single argument but definitions of
EXPORT_SYMBOL_NS have multiple arguments.
Update the test to extract only the first argument from any
EXPORT_SYMBOL related definition.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Joe Perches <[email protected]>
Reported-by: Ian Pilcher <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a couple of commonly used (>50 instances) sound ops structs that are
typically const.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rikard Falkeborn <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Liam Girdwood <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Takashi Iwai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
sg_miter_stop() checks for disabled preemption before unmapping a page
via kunmap_atomic(). The kernel doc mentions under context that
preemption must be disabled if SG_MITER_ATOMIC is set.
There is no active requirement for the caller to have preemption
disabled before invoking sg_mitter_stop(). The sg_mitter_*()
implementation itself has no such requirement.
In fact, preemption is disabled by kmap_atomic() as part of
sg_miter_next() and remains disabled as long as there is an active
SG_MITER_ATOMIC mapping. This is a consequence of kmap_atomic() and not
a requirement for sg_mitter_*() itself.
The user chooses SG_MITER_ATOMIC because it uses the API in a context
where blocking is not possible or blocking is possible but he chooses a
lower weight mapping which is not available on all CPUs and so it might
need less overhead to setup at a price that now preemption will be
disabled.
The kmap_atomic() implementation on PREEMPT_RT does not disable
preemption. It simply disables CPU migration to ensure that the task
remains on the same CPU while the caller remains preemptible. This in
turn triggers the warning in sg_miter_stop() because preemption is
allowed.
The PREEMPT_RT and !PREEMPT_RT implementation of kmap_atomic() disable
pagefaults as a requirement. It is sufficient to check for this instead
of disabled preemption.
Check for disabled pagefault handler in the SG_MITER_ATOMIC case.
Remove the "preemption disabled" part from the kernel doc as the
sg_milter*() implementation does not care.
[[email protected]: commit description]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Codegen become bloated again after simple_strntoull() introduction
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-224 (-224)
Function old new delta
simple_strtoul 5 2 -3
simple_strtol 23 20 -3
simple_strtoull 119 15 -104
simple_strtoll 155 41 -114
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Cc: Richard Fitzgerald <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
linux/string_helpers.h uses strlen(), so include the correponding header.
Otherwise we get a compilation error if it's not also included by whoever
included the helper.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
To print stack entries into a buffer, users of stackdepot, first get a
list of stack entries using stack_depot_fetch and then print this list
into a buffer using stack_trace_snprint. Provide a helper in stackdepot
for this purpose. Also change above mentioned users to use this helper.
[[email protected]: fix build error]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: export stack_depot_snprint() to modules]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Imran Khan <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Jani Nikula <[email protected]> [i915]
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
To print a stack entries, users of stackdepot, first use stack_depot_fetch
to get a list of stack entries and then use stack_trace_print to print
this list. Provide a helper in stackdepot to print stack entries based on
stackdepot handle. Also change above mentioned users to use this helper.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Imran Khan <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "lib, stackdepot: check stackdepot handle before accessing slabs", v2.
PATCH-1: Checks validity of a stackdepot handle before proceeding to
access stackdepot slab/objects.
PATCH-2: Adds a helper in stackdepot, to allow users to print stack
entries just by specifying the stackdepot handle. It also changes such
users to use this new interface.
PATCH-3: Adds a helper in stackdepot, to allow users to print stack
entries into buffers just by specifying the stackdepot handle and
destination buffer. It also changes such users to use this new interface.
This patch (of 3):
stack_depot_save allocates slabs that will be used for storing objects in
future.If this slab allocation fails we may get to a situation where space
allocation for a new stack_record fails, causing stack_depot_save to
return 0 as handle. If user of this handle ends up invoking
stack_depot_fetch with this handle value, current implementation of
stack_depot_fetch will end up using slab from wrong index. To avoid this
check handle value at the beginning.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Imran Khan <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit f9e784dcb63f ("dt-bindings: hwlock: add sun6i_hwspinlock") adds
Documentation/devicetree/bindings/hwlock/allwinner,sun6i-a31-hwspinlock.yaml,
but the related commit 3c881e05c814 ("hwspinlock: add sun6i hardware
spinlock support") adds a file reference to
allwinner,sun6i-hwspinlock.yaml instead.
Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:
warning: no file matches F: Documentation/devicetree/bindings/hwlock/allwinner,sun6i-hwspinlock.yaml
Rectify this file reference in ALLWINNER HARDWARE SPINLOCK SUPPORT.
Link: https://lkml.kernel.org/r/[email protected]
Reviewed-by: Wilken Gottwalt <[email protected]>
Signed-off-by: Lukas Bulwahn <[email protected]>
Cc: Anitha Chrisanthus <[email protected]>
Cc: Edmund Dea <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Nobuhiro Iwamatsu <[email protected]>
Cc: Punit Agrawal <[email protected]>
Cc: Ralf Ramsauer <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Yu Chen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit ed794057b052 ("drm/kmb: Build files for KeemBay Display driver")
refers to the non-existing file intel,kmb_display.yaml in
Documentation/devicetree/bindings/display/.
Commit 5a76b1ed73b9 ("dt-bindings: display: Add support for Intel
KeemBay Display") originating from the same patch series however adds
the file intel,keembay-display.yaml in that directory instead.
So, refer to intel,keembay-display.yaml in the INTEL KEEM BAY DRM DRIVER
section instead.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ed794057b052 ("drm/kmb: Build files for KeemBay Display driver")
Signed-off-by: Lukas Bulwahn <[email protected]>
Cc: Anitha Chrisanthus <[email protected]>
Cc: Edmund Dea <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Nobuhiro Iwamatsu <[email protected]>
Cc: Punit Agrawal <[email protected]>
Cc: Ralf Ramsauer <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Wilken Gottwalt <[email protected]>
Cc: Yu Chen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 7a6ff4c4cbc3 ("misc: hisi_hikey_usb: Driver to support onboard
USB gpio hub on Hikey960") refers to the non-existing file
Documentation/devicetree/bindings/misc/hisilicon-hikey-usb.yaml, but
this commit's patch series does not add any related devicetree binding
in misc.
So, just drop this file reference in HIKEY960 ONBOARD USB GPIO HUB DRIVER.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7a6ff4c4cbc3 ("misc: hisi_hikey_usb: Driver to support onboard USB gpio hub on Hikey960")
Signed-off-by: Lukas Bulwahn <[email protected]>
Cc: Anitha Chrisanthus <[email protected]>
Cc: Edmund Dea <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Nobuhiro Iwamatsu <[email protected]>
Cc: Punit Agrawal <[email protected]>
Cc: Ralf Ramsauer <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Wilken Gottwalt <[email protected]>
Cc: Yu Chen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Rectify file references for dt-bindings in MAINTAINERS", v5.
A patch series that cleans up some file references for dt-bindings in
MAINTAINERS.
This patch (of 4):
Commit 836863a08c99 ("MAINTAINERS: Add information for Toshiba Visconti
ARM SoCs") refers to the non-existing file toshiba,tmpv7700-pinctrl.yaml
in ./Documentation/devicetree/bindings/pinctrl/. Commit 1825c1fe0057
("pinctrl: Add DT bindings for Toshiba Visconti TMPV7700 SoC")
originating from the same patch series however adds the file
toshiba,visconti-pinctrl.yaml in that directory instead.
So, refer to toshiba,visconti-pinctrl.yaml in the ARM/TOSHIBA VISCONTI
ARCHITECTURE section instead.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 836863a08c99 ("MAINTAINERS: Add information for Toshiba Visconti ARM SoCs")
Signed-off-by: Lukas Bulwahn <[email protected]>
Acked-by: Nobuhiro Iwamatsu <[email protected]>
Reviewed-by: Nobuhiro Iwamatsu <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Punit Agrawal <[email protected]>
Cc: Anitha Chrisanthus <[email protected]>
Cc: Wilken Gottwalt <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Yu Chen <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Edmund Dea <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Ralf Ramsauer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I'd like more continuity of review for the exec and binfmt (and ELF)
stuff. Eric and I have been the most active lately, so list us as
reviewers.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: Eric Biederman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Colin King has moved to Intel to update gmail and Canonical email
addresses.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
_Static_assert() is evaluated already in the compiler's frontend, and
gives a somehat more to-the-point error, compared to the BUILD_BUG_ON
macro, which only fires after the optimizer has had a chance to
eliminate calls to functions marked with __attribute__((error)). In
theory, this might make builds a tiny bit faster.
There's also a little less gunk in the error message emitted:
lib/sort.c: In function `foo':
include/linux/build_bug.h:78:41: error: static assertion failed: "pointer type mismatch in container_of()"
78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
compared to
lib/sort.c: In function `foo':
include/linux/compiler_types.h:322:38: error: call to `__compiletime_assert_2' declared with attribute error: pointer type mismatch in container_of()
322 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
While at it, fix the copy-pasto in container_of_safe().
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]/T/
Signed-off-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Miguel Ojeda <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
bottom_half.h needs _THIS_IP_ to be standalone, so split that and
_RET_IP_ out from kernel.h into the new instruction_pointer.h. kernel.h
directly needs them, so include it there and replace the include of
kernel.h with this new file in bottom_half.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
inclusions
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
[[email protected]: include math.h for round_up()]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
[[email protected]: cxd2880_common.h needs bits.h for GENMASK()]
[[email protected]: delay.h: fix for removed kernel.h]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: include/linux/fwnode.h needs bits.h for BIT()]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Sakari Ailus <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt cleaning it up by splitting out container_of() and
typeof_member() macros.
For time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.
Note, there are _a lot_ of headers and modules that include kernel.h
solely for one of these macros and this allows to unburden compiler for
the twisted inclusion paths and to make new code cleaner in the future.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "kernel.h further split", v5.
kernel.h is a set of something which is not related to each other and
often used in non-crossed compilation units, especially when drivers
need only one or two macro definitions from it.
This patch (of 7):
There is no evidence we need kernel.h inclusion in certain headers.
Drop unneeded <linux/kernel.h> inclusion from other headers.
[[email protected]: bottom_half.h needs kernel]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Problem Description:
When running running ~128 parallel instances of
TZ=/etc/localtime ps -fe >/dev/null
on a 128CPU machine, the %sys utilization reaches 97%, and perf shows
the following code path as being responsible for heavy contention on the
d_lockref spinlock:
walk_component()
lookup_fast()
d_revalidate()
pid_revalidate() // returns -ECHILD
unlazy_child()
lockref_get_not_dead(&nd->path.dentry->d_lockref) <-- contention
The reason is that pid_revalidate() is triggering a drop from RCU to ref
path walk mode. All concurrent path lookups thus try to grab a
reference to the dentry for /proc/, before re-executing pid_revalidate()
and then stepping into the /proc/$pid directory. Thus there is huge
spinlock contention.
This patch allows pid_revalidate() to execute in RCU mode, meaning that
the path lookup can successfully enter the /proc/$pid directory while
still in RCU mode. Later on, the path lookup may still drop into ref
mode, but the contention will be much reduced at this point.
By applying this patch, %sys utilization falls to around 85% under the
same workload, and the number of ps processes executed per unit time
increases by 3x-4x. Although this particular workload is a bit
contrived, we have seen some large collections of eager monitoring
scripts which produced similarly high %sys time due to contention in the
/proc directory.
As a result this patch, Al noted that several procfs methods which were
only called in ref-walk mode could now be called from RCU mode. To
ensure that this patch is safe, I audited all the inode get_link and
permission() implementations, as well as dentry d_revalidate()
implementations, in fs/proc. The purpose here is to ensure that they
either are safe to call in RCU (i.e. don't sleep) or correctly bail out
of RCU mode if they don't support it. My analysis shows that all
at-risk procfs methods are safe to call under RCU, and thus this patch
is safe.
Procfs RCU-walk Analysis:
This analysis is up-to-date with 5.15-rc3. When called under RCU mode,
these functions have arguments as follows:
* get_link() receives a NULL dentry pointer when called in RCU mode.
* permission() receives MAY_NOT_BLOCK in the mode parameter when called
from RCU.
* d_revalidate() receives LOOKUP_RCU in flags.
For the following functions, either they are trivially RCU safe, or they
explicitly bail at the beginning of the function when they run:
proc_ns_get_link (bails out)
proc_get_link (RCU safe)
proc_pid_get_link (bails out)
map_files_d_revalidate (bails out)
map_misc_d_revalidate (bails out)
proc_net_d_revalidate (RCU safe)
proc_sys_revalidate (bails out, also not under /proc/$pid)
tid_fd_revalidate (bails out)
proc_sys_permission (not under /proc/$pid)
The remainder of the functions require a bit more detail:
* proc_fd_permission: RCU safe. All of the body of this function is
under rcu_read_lock(), except generic_permission() which declares
itself RCU safe in its documentation string.
* proc_self_get_link uses GFP_ATOMIC in the RCU case, so it is RCU aware
and otherwise looks safe. The same is true of proc_thread_self_get_link.
* proc_map_files_get_link: calls ns_capable, which calls capable(), and
thus calls into the audit code (see note #1 below). The remainder is
just a call to the trivially safe proc_pid_get_link().
* proc_pid_permission: calls ptrace_may_access(), which appears RCU
safe, although it does call into the "security_ptrace_access_check()"
hook, which looks safe under smack and selinux. Just the audit code is
of concern. Also uses get_task_struct() and put_task_struct(), see
note #2 below.
* proc_tid_comm_permission: Appears safe, though calls put_task_struct
(see note #2 below).
Note #1:
Most of the concern of RCU safety has centered around the audit code.
However, since b17ec22fb339 ("selinux: slow_avc_audit has become
non-blocking"), it's safe to call this code under RCU. So all of the
above are safe by my estimation.
Note #2: get_task_struct() and put_task_struct():
The majority of get_task_struct() is under RCU read lock, and in any
case it is a simple increment. But put_task_struct() is complex, given
that it could at some point free the task struct, and this process has
many steps which I couldn't manually verify. However, several other
places call put_task_struct() under RCU, so it appears safe to use
here too (see kernel/hung_task.c:165 or rcu/tree-stall.h:296)
Patch description:
pid_revalidate() drops from RCU into REF lookup mode. When many threads
are resolving paths within /proc in parallel, this can result in heavy
spinlock contention on d_lockref as each thread tries to grab a
reference to the /proc dentry (and drop it shortly thereafter).
Investigation indicates that it is not necessary to drop RCU in
pid_revalidate(), as no RCU data is modified and the function never
sleeps. So, remove the LOOKUP_RCU check.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Brennan <[email protected]>
Cc: Konrad Wilk <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Although virtio-mem currently supports reading unplugged memory in the
hypervisor, this will change in the future, indicated to the device via
a new feature flag.
We similarly sanitized /proc/kcore access recently. [1]
Let's register a vmcore callback, to allow vmcore code to check if a PFN
belonging to a virtio-mem device is either currently plugged and should
be dumped or is currently unplugged and should not be accessed, instead
mapping the shared zeropage or returning zeroes when reading.
This is important when not capturing /proc/vmcore via tools like
"makedumpfile" that can identify logically unplugged virtio-mem memory
via PG_offline in the memmap, but simply by e.g., copying the file.
Distributions that support virtio-mem+kdump have to make sure that the
virtio_mem module will be part of the kdump kernel or the kdump initrd;
dracut was recently [2] extended to include virtio-mem in the generated
initrd. As long as no special kdump kernels are used, this will
automatically make sure that virtio-mem will be around in the kdump
initrd and sanitize /proc/vmcore access -- with dracut.
With this series, we'll send one virtio-mem state request for every ~2
MiB chunk of virtio-mem memory indicated in the vmcore that we intend to
read/map.
In the future, we might want to allow building virtio-mem for kdump mode
only, even without CONFIG_MEMORY_HOTPLUG and friends: this way, we could
support special stripped-down kdump kernels that have many other config
options disabled; we'll tackle that once required. Further, we might
want to try sensing bigger blocks (e.g., memory sections) first before
falling back to device blocks on demand.
Tested with Fedora rawhide, which contains a recent kexec-tools version
(considering "System RAM (virtio_mem)" when creating the vmcore header)
and a recent dracut version (including the virtio_mem module in the
kdump initrd).
Link: https://lkml.kernel.org/r/[email protected] [1]
Link: https://github.com/dracutdevs/dracut/pull/1157 [2]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
virtio_mem_deinit_hotplug()
Let's prepare for a new virtio-mem kdump mode in which we don't actually
hot(un)plug any memory but only observe the state of device blocks.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
virtio_mem_init_hotplug()
Let's prepare for a new virtio-mem kdump mode in which we don't actually
hot(un)plug any memory but only observe the state of device blocks.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
virtio_mem_init_hotplug()
Let's prepare for a new virtio-mem kdump mode in which we don't actually
hot(un)plug any memory but only observe the state of device blocks.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's support multiple registered callbacks, making sure that
registering vmcore callbacks cannot fail. Make the callback return a
bool instead of an int, handling how to deal with errors internally.
Drop unused HAVE_OLDMEM_PFN_IS_RAM.
We soon want to make use of this infrastructure from other drivers:
virtio-mem, registering one callback for each virtio-mem device, to
prevent reading unplugged virtio-mem memory.
Handle it via a generic vmcore_cb structure, prepared for future
extensions: for example, once we support virtio-mem on s390x where the
vmcore is completely constructed in the second kernel, we want to detect
and add plugged virtio-mem memory ranges to the vmcore in order for them
to get dumped properly.
Handle corner cases that are unexpected and shouldn't happen in sane
setups: registering a callback after the vmcore has already been opened
(warn only) and unregistering a callback after the vmcore has already been
opened (warn and essentially read only zeroes from that point on).
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The callback should deal with errors internally, it doesn't make sense
to expose these via pfn_is_ram(). We'll rework the callbacks next.
Right now we consider errors as if "it's RAM"; no functional change.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
HVMOP_get_mem_type is not expected to fail, "This call failing is
indication of something going quite wrong and it would be good to know
about this." [1]
Let's add a pr_warn_once().
Link: https://lkml.kernel.org/r/[email protected] [1]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Suggested-by: Boris Ostrovsky <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's simplify return handling.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem,
this series tackles the last sane way how a VM could accidentially
access logically unplugged memory managed by a virtio-mem device:
/proc/vmcore
When dumping memory via "makedumpfile", PG_offline pages, used by
virtio-mem to flag logically unplugged memory, are already properly
excluded; however, especially when accessing/copying /proc/vmcore "the
usual way", we can still end up reading logically unplugged memory part
of a virtio-mem device.
Patch #1-#3 are cleanups. Patch #4 extends the existing
oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings
for patch #8, which implements the virtio-mem logic to query the state
of device blocks.
Patch #8:
"Although virtio-mem currently supports reading unplugged memory in the
hypervisor, this will change in the future, indicated to the device
via a new feature flag. We similarly sanitized /proc/kcore access
recently.
[...]
Distributions that support virtio-mem+kdump have to make sure that the
virtio_mem module will be part of the kdump kernel or the kdump
initrd; dracut was recently [2] extended to include virtio-mem in the
generated initrd. As long as no special kdump kernels are used, this
will automatically make sure that virtio-mem will be around in the
kdump initrd and sanitize /proc/vmcore access -- with dracut"
This is the last remaining bit to support
VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of
virtio-mem.
Note: this is best-effort. We'll never be able to control what runs
inside the second kernel, really, but we also don't have to care: we
only care about sane setups where we don't want our VM getting zapped
once we touch the wrong memory location while dumping. While we usually
expect sane setups to use "makedumfile", nothing really speaks against
just copying /proc/vmcore, especially in environments where HWpoisioning
isn't typically expected. Also, we really don't want to put all our
trust completely on the memmap, so sanitizing also makes sense when just
using "makedumpfile".
[1] https://lkml.kernel.org/r/[email protected]
[2] https://github.com/dracutdevs/dracut/pull/1157
[3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html
This patch (of 9):
The callback is only used for the vmcore nowadays.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If a task exits concurrently, task_pid_nr_ns may return 0.
[[email protected]: coding style tweaks]
[[email protected]: test that /proc/*/task doesn't contain "0"]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Florian Weimer <[email protected]>
Signed-off-by: Alexey Dobriyan <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Reviewed-by: Alexey Dobriyan <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 21a3c273f88c ("mm, hugetlb: add thread name and pid to
SHM_HUGETLB mlock rlimit warning") marked this as deprecated in 2012,
but it is not deleted yet.
Mike says he still sees that message in log files on occasion, so maybe we
should preserve this warning.
Also remove hugetlbfs related user_shm_unlock in ipc/shm.c and remove the
user_shm_unlock after out.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: zhangyiru <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Liu Zixian <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: wuxu.wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Historically (pre-2.5), the inode shrinker used to reclaim only empty
inodes and skip over those that still contained page cache. This caused
problems on highmem hosts: struct inode could put fill lowmem zones
before the cache was getting reclaimed in the highmem zones.
To address this, the inode shrinker started to strip page cache to
facilitate reclaiming lowmem. However, this comes with its own set of
problems: the shrinkers may drop actively used page cache just because
the inodes are not currently open or dirty - think working with a large
git tree. It further doesn't respect cgroup memory protection settings
and can cause priority inversions between containers.
Nowadays, the page cache also holds non-resident info for evicted cache
pages in order to detect refaults. We've come to rely heavily on this
data inside reclaim for protecting the cache workingset and driving swap
behavior. We also use it to quantify and report workload health through
psi. The latter in turn is used for fleet health monitoring, as well as
driving automated memory sizing of workloads and containers, proactive
reclaim and memory offloading schemes.
The consequences of dropping page cache prematurely is that we're seeing
subtle and not-so-subtle failures in all of the above-mentioned
scenarios, with the workload generally entering unexpected thrashing
states while losing the ability to reliably detect it.
To fix this on non-highmem systems at least, going back to rotating
inodes on the LRU isn't feasible. We've tried (commit a76cf1a474d7
("mm: don't reclaim inodes with many attached pages")) and failed
(commit 69056ee6a8a3 ("Revert "mm: don't reclaim inodes with many
attached pages"")).
The issue is mostly that shrinker pools attract pressure based on their
size, and when objects get skipped the shrinkers remember this as
deferred reclaim work. This accumulates excessive pressure on the
remaining inodes, and we can quickly eat into heavily used ones, or
dirty ones that require IO to reclaim, when there potentially is plenty
of cold, clean cache around still.
Instead, this patch keeps populated inodes off the inode LRU in the
first place - just like an open file or dirty state would. An otherwise
clean and unused inode then gets queued when the last cache entry
disappears. This solves the problem without reintroducing the reclaim
issues, and generally is a bit more scalable than having to wade through
potentially hundreds of thousands of busy inodes.
Locking is a bit tricky because the locks protecting the inode state
(i_lock) and the inode LRU (lru_list.lock) don't nest inside the
irq-safe page cache lock (i_pages.xa_lock). Page cache deletions are
serialized through i_lock, taken before the i_pages lock, to make sure
depopulated inodes are queued reliably. Additions may race with
deletions, but we'll check again in the shrinker. If additions race
with the shrinker itself, we're protected by the i_lock: if find_inode()
or iput() win, the shrinker will bail on the elevated i_count or
I_REFERENCED; if the shrinker wins and goes ahead with the inode, it
will set I_FREEING and inhibit further igets(), which will cause the
other side to create a new instance of the inode instead.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Dave Chinner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Fix-ups:
- Standardise *_exit() and *_remove() return values in ili9320 and
vgg2432a4
Bug Fixes:
- Do not override maximum brightness
- Propagate errors from get_brightness()"
* tag 'backlight-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
video: backlight: ili9320: Make ili9320_remove() return void
backlight: Propagate errors from get_brightness()
video: backlight: Drop maximum brightness override for brightness zero
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"Removed Drivers:
- Remove support for TI TPS80031/TPS80032 PMICs
New Device Support:
- Add support for Magnetic Reader to TI AM335x
- Add support for DA9063_EA to Dialog DA9063
- Add support for SC2730 PMIC to Spreadtrum SC27xx
- Add support for MacBookPro16,2 ICL-N UART Intel LPSS PCI
- Add support for lots of new PMICS in QCom SPMI PMIC
- Add support for ADC to Diolan DLN2
New Functionality:
- Add support for Power Off to Rockchip RK817
Fix-ups:
- Simplify Regmap passing to child devices in hi6421-spmi-pmic
- SPDX licensing updates in ti_am335x_tscadc
- Improve error handling in ti_am335x_tscadc
- Expedite clock search in ti_am335x_tscadc
- Generic simplifications in ti_am335x_tscadc
- Use generic macros/defines in ti_am335x_tscadc
- Remove unused code in ti_am335x_tscadc, cros_ec_dev
- Convert to GPIOD in wcd934x
- Add namespacing in ti_am335x_tscadc
- Restrict compilation to relevant arches in intel_pmt
- Provide better description/documentation in exynos_lpass
- Add SPI device ID table in altera-a10sr, motorola-cpcap,
sprd-sc27xx-spi
- Change IRQ handling in qcom-pm8xxx
- Split out I2C and SPI code in arizona
- Explicitly include used headers in altera-a10sr
- Convert sysfs show() function to in sysfs_emit
- Standardise *_exit() and *_remove() return values in mc13xxx,
stmpe, tps65912
- Trivial (style/spelling/whitespace) fixups in ti_am335x_tscadc,
qcom-spmi-pmic, max77686-private
- Device Tree fix-ups in ti,am3359-tscadc, samsung,s2mps11,
samsung,s2mpa01, samsung,s5m8767, brcm,misc, brcm,cru, syscon,
qcom,tcsr, xylon,logicvc, max77686, x-powers,ac100,
x-powers,axp152, x-powers,axp209-gpio, syscon, qcom,spmi-pmic
Bug Fixes:
- Balance refcounting (get/put) in ti_am335x_tscadc, mfd-core
- Fix IRQ trigger type in sec-irq, max77693, max14577
- Repair off-by-one in altera-sysmgr
- Add explicit 'select MFD_CORE' to MFD_SIMPLE_MFD_I2C"
* tag 'mfd-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (95 commits)
mfd: simple-mfd-i2c: Select MFD_CORE to fix build error
mfd: tps80031: Remove driver
mfd: max77686: Correct tab-based alignment of register addresses
mfd: wcd934x: Replace legacy gpio interface for gpiod
dt-bindings: mfd: qcom: pm8xxx: Add pm8018 compatible
mfd: dln2: Add cell for initializing DLN2 ADC
mfd: qcom-spmi-pmic: Add missing PMICs supported by socinfo
mfd: qcom-spmi-pmic: Document ten more PMICs in the binding
mfd: qcom-spmi-pmic: Sort compatibles in the driver
mfd: qcom-spmi-pmic: Sort the compatibles in the binding
mfd: janz-cmoio: Replace snprintf in show functions with sysfs_emit
mfd: altera-a10sr: Include linux/module.h
mfd: tps65912: Make tps65912_device_exit() return void
mfd: stmpe: Make stmpe_remove() return void
mfd: mc13xxx: Make mc13xxx_common_exit() return void
dt-bindings: mfd: syscon: Add samsung,exynosautov9-sysreg compatible
mfd: altera-sysmgr: Fix a mistake caused by resource_size conversion
dt-bindings: gpio: Convert X-Powers AXP209 GPIO binding to a schema
dt-bindings: mfd: syscon: Add rk3368 QoS register compatible
mfd: arizona: Split of_match table into I2C and SPI versions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"We have a single new driver, new features in others and some cleanups
all over the place.
Nothing really stands out and it is all relatively small.
- new driver: gpio-modepin (plus relevant change in zynqmp firmware)
- add interrupt support to gpio-virtio
- enable the 'gpio-line-names' property in the DT bindings for
gpio-rockchip
- use the subsystem helpers where applicable in gpio-uniphier instead
of accessing IRQ structures directly
- code shrink in gpio-xilinx
- add interrupt to gpio-mlxbf2 (and include the removal of custom
interrupt code from the mellanox ethernet driver)
- support multiple interrupts per bank in gpio-tegra186 (and force
one interrupt per bank in older models)
- fix GPIO line IRQ offset calculation in gpio-realtek-otto
- drop unneeded MODULE_ALIAS expansions in multiple drivers
- code cleanup in gpio-aggregator
- minor improvements in gpio-max730x and gpio-mc33880
- Kconfig cleanups"
* tag 'gpio-updates-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
virtio_gpio: drop packed attribute
gpio: virtio: Add IRQ support
gpio: realtek-otto: fix GPIO line IRQ offset
gpio: clean up Kconfig file
net: mellanox: mlxbf_gige: Replace non-standard interrupt handling
gpio: mlxbf2: Introduce IRQ support
gpio: mc33880: Drop if with an always false condition
gpio: max730x: Make __max730x_remove() return void
gpio: aggregator: Wrap access to gpiochip_fwd.tmp[]
gpio: modepin: Add driver support for modepin GPIO controller
dt-bindings: gpio: zynqmp: Add binding documentation for modepin
firmware: zynqmp: Add MMIO read and write support for PS_MODE pin
gpio: tps65218: drop unneeded MODULE_ALIAS
gpio: max77620: drop unneeded MODULE_ALIAS
gpio: xilinx: simplify getting .driver_data
gpio: tegra186: Support multiple interrupts per bank
gpio: tegra186: Force one interrupt per bank
gpio: uniphier: Use helper functions to get private data from IRQ data
gpio: uniphier: Use helper function to get IRQ hardware number
dt-bindings: gpio: add gpio-line-names to rockchip,gpio-bank.yaml
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl updates from Dan Williams:
"More preparation and plumbing work in the CXL subsystem.
From an end user perspective the highlight here is lighting up the CXL
Persistent Memory related commands (label read / write) with the
generic ioctl() front-end in LIBNVDIMM.
Otherwise, the ability to instantiate new persistent and volatile
memory regions is still on track for v5.17.
Summary:
- Fix support for platforms that do not enumerate every ACPI0016 (CXL
Host Bridge) in the CHBS (ACPI Host Bridge Structure).
- Introduce a common pci_find_dvsec_capability() helper, clean up
open coded implementations in various drivers.
- Add 'cxl_test' for regression testing CXL subsystem ABIs.
'cxl_test' is a module built from tools/testing/cxl/ that mocks up
a CXL topology to augment the nascent support for emulation of CXL
devices in QEMU.
- Convert libnvdimm to use the uuid API.
- Complete the definition of CXL namespace labels in libnvdimm.
- Tunnel libnvdimm label operations from nd_ioctl() back to the CXL
mailbox driver. Enable 'ndctl {read,write}-labels' for CXL.
- Continue to sort and refactor functionality into distinct driver
and core-infrastructure buckets. For example, mailbox handling is
now a generic core capability consumed by the PCI and cxl_test
drivers"
* tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits)
ocxl: Use pci core's DVSEC functionality
cxl/pci: Use pci core's DVSEC functionality
PCI: Add pci_find_dvsec_capability to find designated VSEC
cxl/pci: Split cxl_pci_setup_regs()
cxl/pci: Add @base to cxl_register_map
cxl/pci: Make more use of cxl_register_map
cxl/pci: Remove pci request/release regions
cxl/pci: Fix NULL vs ERR_PTR confusion
cxl/pci: Remove dev_dbg for unknown register blocks
cxl/pci: Convert register block identifiers to an enum
cxl/acpi: Do not fail cxl_acpi_probe() based on a missing CHBS
cxl/pci: Disambiguate cxl_pci further from cxl_mem
Documentation/cxl: Add bus internal docs
cxl/core: Split decoder setup into alloc + add
tools/testing/cxl: Introduce a mock memory device + driver
cxl/mbox: Move command definitions to common location
cxl/bus: Populate the target list at decoder create
tools/testing/cxl: Introduce a mocked-up CXL port hierarchy
cxl/pmem: Add support for multiple nvdimm-bridge objects
cxl/pmem: Translate NVDIMM label commands to CXL label commands
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
- big refactoring of the PASEMI driver to support the Apple M1
- huge improvements to the XIIC in terms of locking and SMP safety
- refactoring and clean ups for the i801 driver
... and the usual bunch of small driver updates
* 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (43 commits)
i2c: amd-mp2-plat: ACPI: Use ACPI_COMPANION() directly
i2c: i801: Add support for Intel Ice Lake PCH-N
i2c: virtio: update the maintainer to Conghui
i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()'
i2c: qup: move to use request_irq by IRQF_NO_AUTOEN flag
i2c: qup: fix a trivial typo
i2c: tegra: Ensure that device is suspended before driver is removed
i2c: i801: Fix incorrect and needless software PEC disabling
i2c: mediatek: Dump i2c/dma register when a timeout occurs
i2c: mediatek: Reset the handshake signal between i2c and dma
i2c: mlxcpld: Allow flexible polling time setting for I2C transactions
i2c: pasemi: Set enable bit for Apple variant
i2c: pasemi: Add Apple platform driver
i2c: pasemi: Refactor _probe to use devm_*
i2c: pasemi: Allow to configure bus frequency
i2c: pasemi: Move common reset code to own function
i2c: pasemi: Split pci driver to its own file
i2c: pasemi: Split off common probing code
i2c: pasemi: Remove usage of pci_dev
i2c: pasemi: Use dev_name instead of port number
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd updates from Miquel Raynal:
"Core:
- Remove obsolete macros only used by the old nand_ecclayout struct
- Don't remove debugfs directory if device is in use
- MAINTAINERS:
- Add entry for Qualcomm NAND controller driver
- Update the devicetree documentation path of hyperbus
MTD devices:
- block2mtd:
- Add support for an optional custom MTD label
- Minor refactor to avoid hard coded constant
- mtdswap: Remove redundant assignment of pointer eb
CFI:
- Fixup CFI on ixp4xx
Raw NAND controller drivers:
- Arasan:
- Prevent an unsupported configuration
- Xway, Socrates: plat_nand, Pasemi, Orion, mpc5121, GPIO, Au1550nd,
AMS-Delta:
- Keep the driver compatible with on-die ECC engines
- cs553x, lpc32xx_slc, ndfc, sharpsl, tmio, txx9ndfmc:
- Revert the commits: "Fix external use of SW Hamming ECC helper"
- And let callers use the bare Hamming helpers
- Fsmc: Fix use of SM ORDER
- Intel:
- Fix potential buffer overflow in probe
- xway, vf610, txx9ndfm, tegra, stm32, plat_nand, oxnas, omap, mtk,
hisi504, gpmi, gpio, denali, bcm6368, atmel:
- Make use of the helper function devm_platform_ioremap_resource{,byname}()
Onenand drivers:
- Samsung: Drop Exynos4 and describe driver in KConfig
Raw NAND chip drivers:
- Hynix: Add support for H27UCG8T2ETR-BC MLC NAND
SPI NOR core:
- Add spi-nor device tree binding under SPI NOR maintainers
SPI NOR manufacturer drivers:
- Enable locking for n25q128a13
SPI NOR controller drivers:
- Use devm_platform_ioremap_resource_byname()"
* tag 'mtd/for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (50 commits)
mtd: core: don't remove debugfs directory if device is in use
MAINTAINERS: Update the devicetree documentation path of hyperbus
mtd: block2mtd: add support for an optional custom MTD label
mtd: block2mtd: minor refactor to avoid hard coded constant
mtd: fixup CFI on ixp4xx
mtd: rawnand: arasan: Prevent an unsupported configuration
MAINTAINERS: Add entry for Qualcomm NAND controller driver
mtd: rawnand: hynix: Add support for H27UCG8T2ETR-BC MLC NAND
mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines
mtd: rawnand: socrates: Keep the driver compatible with on-die ECC engines
mtd: rawnand: plat_nand: Keep the driver compatible with on-die ECC engines
mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines
mtd: rawnand: orion: Keep the driver compatible with on-die ECC engines
mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines
mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines
mtd: rawnand: au1550nd: Keep the driver compatible with on-die ECC engines
mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines
Revert "mtd: rawnand: cs553x: Fix external use of SW Hamming ECC helper"
Revert "mtd: rawnand: lpc32xx_slc: Fix external use of SW Hamming ECC helper"
Revert "mtd: rawnand: ndfc: Fix external use of SW Hamming ECC helper"
...
|