Age | Commit message (Collapse) | Author | Files | Lines |
|
Patch series "mm/memory_hotplug: "auto-movable" online policy and memory groups", v3.
I. Goal
The goal of this series is improving in-kernel auto-online support. It
tackles the fundamental problems that:
1) We can create zone imbalances when onlining all memory blindly to
ZONE_MOVABLE, in the worst case crashing the system. We have to know
upfront how much memory we are going to hotplug such that we can
safely enable auto-onlining of all hotplugged memory to ZONE_MOVABLE
via "online_movable". This is far from practical and only applicable in
limited setups -- like inside VMs under the RHV/oVirt hypervisor which
will never hotplug more than 3 times the boot memory (and the
limitation is only in place due to the Linux limitation).
2) We see more setups that implement dynamic VM resizing, hot(un)plugging
memory to resize VM memory. In these setups, we might hotplug a lot of
memory, but it might happen in various small steps in both directions
(e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...). virtio-mem is the
primary driver of this upstream right now, performing such dynamic
resizing NUMA-aware via multiple virtio-mem devices.
Onlining all hotplugged memory to ZONE_NORMAL means we basically have
no hotunplug guarantees. Onlining all to ZONE_MOVABLE means we can
easily run into zone imbalances when growing a VM. We want a mixture,
and we want as much memory as reasonable/configured in ZONE_MOVABLE.
Details regarding zone imbalances can be found at [1].
3) Memory devices consist of 1..X memory block devices, however, the
kernel doesn't really track the relationship. Consequently, also user
space has no idea. We want to make per-device decisions.
As one example, for memory hotunplug it doesn't make sense to use a
mixture of zones within a single DIMM: we want all MOVABLE if
possible, otherwise all !MOVABLE, because any !MOVABLE part will easily
block the whole DIMM from getting hotunplugged.
As another example, virtio-mem operates on individual units that span
1..X memory blocks. Similar to a DIMM, we want a unit to either be all
MOVABLE or !MOVABLE. A "unit" can be thought of like a DIMM, however,
all units of a virtio-mem device logically belong together and are
managed (added/removed) by a single driver. We want as much memory of
a virtio-mem device to be MOVABLE as possible.
4) We want memory onlining to be done right from the kernel while adding
memory, not triggered by user space via udev rules; for example, this
is reqired for fast memory hotplug for drivers that add individual
memory blocks, like virito-mem. We want a way to configure a policy in
the kernel and avoid implementing advanced policies in user space.
The auto-onlining support we have in the kernel is not sufficient. All we
have is a) online everything MOVABLE (online_movable) b) online everything
!MOVABLE (online_kernel) c) keep zones contiguous (online). This series
allows configuring c) to mean instead "online movable if possible
according to the coniguration, driven by a maximum MOVABLE:KERNEL ratio"
-- a new onlining policy.
II. Approach
This series does 3 things:
1) Introduces the "auto-movable" online policy that initially operates on
individual memory blocks only. It uses a maximum MOVABLE:KERNEL ratio
to make a decision whether a memory block will be onlined to
ZONE_MOVABLE or not. However, in the basic form, hotplugged KERNEL
memory does not allow for more MOVABLE memory (details in the
patches). CMA memory is treated like MOVABLE memory.
2) Introduces static (e.g., DIMM) and dynamic (e.g., virtio-mem) memory
groups and uses group information to make decisions in the
"auto-movable" online policy across memory blocks of a single memory
device (modeled as memory group). More details can be found in patch
#3 or in the DIMM example below.
3) Maximizes ZONE_MOVABLE memory within dynamic memory groups, by
allowing ZONE_NORMAL memory within a dynamic memory group to allow for
more ZONE_MOVABLE memory within the same memory group. The target use
case is dynamic VM resizing using virtio-mem. See the virtio-mem
example below.
I remember that the basic idea of using a ratio to implement a policy in
the kernel was once mentioned by Vitaly Kuznetsov, but I might be wrong (I
lost the pointer to that discussion).
For me, the main use case is using it along with virtio-mem (and DIMMs /
ppc64 dlpar where necessary) for dynamic resizing of VMs, increasing the
amount of memory we can hotunplug reliably again if we might eventually
hotplug a lot of memory to a VM.
III. Target Usage
The target usage will be:
1) Linux boots with "mhp_default_online_type=offline"
2) User space (e.g., systemd unit) configures memory onlining (according
to a config file and system properties), for example:
* Setting memory_hotplug.online_policy=auto-movable
* Setting memory_hotplug.auto_movable_ratio=301
* Setting memory_hotplug.auto_movable_numa_aware=true
3) User space enabled auto onlining via "echo online >
/sys/devices/system/memory/auto_online_blocks"
4) User space triggers manual onlining of all already-offline memory
blocks (go over offline memory blocks and set them to "online")
IV. Example
For DIMMs, hotplugging 4 GiB DIMMs to a 4 GiB VM with a configured ratio of
301% results in the following layout:
Memory block 0-15: DMA32 (early)
Memory block 32-47: Normal (early)
Memory block 48-79: Movable (DIMM 0)
Memory block 80-111: Movable (DIMM 1)
Memory block 112-143: Movable (DIMM 2)
Memory block 144-275: Normal (DIMM 3)
Memory block 176-207: Normal (DIMM 4)
... all Normal
(-> hotplugged Normal memory does not allow for more Movable memory)
For virtio-mem, using a simple, single virtio-mem device with a 4 GiB VM
will result in the following layout:
Memory block 0-15: DMA32 (early)
Memory block 32-47: Normal (early)
Memory block 48-143: Movable (virtio-mem, first 12 GiB)
Memory block 144: Normal (virtio-mem, next 128 MiB)
Memory block 145-147: Movable (virtio-mem, next 384 MiB)
Memory block 148: Normal (virtio-mem, next 128 MiB)
Memory block 149-151: Movable (virtio-mem, next 384 MiB)
... Normal/Movable mixture as above
(-> hotplugged Normal memory allows for more Movable memory within
the same device)
Which gives us maximum flexibility when dynamically growing/shrinking a
VM in smaller steps.
V. Doc Update
I'll update the memory-hotplug.rst documentation, once the overhaul [1] is
usptream. Until then, details can be found in patch #2.
VI. Future Work
1) Use memory groups for ppc64 dlpar
2) Being able to specify a portion of (early) kernel memory that will be
excluded from the ratio. Like "128 MiB globally/per node" are excluded.
This might be helpful when starting VMs with extremely small memory
footprint (e.g., 128 MiB) and hotplugging memory later -- not wanting
the first hotplugged units getting onlined to ZONE_MOVABLE. One
alternative would be a trigger to not consider ZONE_DMA memory
in the ratio. We'll have to see if this is really rrequired.
3) Indicate to user space that MOVABLE might be a bad idea -- especially
relevant when memory ballooning without support for balloon compaction
is active.
This patch (of 9):
For implementing a new memory onlining policy, which determines when to
online memory blocks to ZONE_MOVABLE semi-automatically, we need the
number of present early (boot) pages -- present pages excluding hotplugged
pages. Let's track these pages per zone.
Pass a page instead of the zone to adjust_present_page_count(), similar as
adjust_managed_page_count() and derive the zone from the page.
It's worth noting that a memory block to be offlined/onlined is either
completely "early" or "not early". add_memory() and friends can only add
complete memory blocks and we only online/offline complete (individual)
memory blocks.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Marek Kedzierski <[email protected]>
Cc: Hui Zhu <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We allocate + initialize everything from scratch. In case enabling the
device fails, we free all memory resourcs.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pierre Morel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Scott Cheloha <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is only a single user remaining. We can simply lookup the nid only
used for node offlining purposes when walking our memory blocks. We don't
expect to remove multi-nid ranges; and if we'd ever do, we most probably
don't care about removing multi-nid ranges that actually result in empty
nodes.
If ever required, we can detect the "multi-nid" scenario and simply try
offlining all online nodes.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Scott Cheloha <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pierre Morel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The parameter is unused, let's remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Acked-by: Heiko Carstens <[email protected]> [s390]
Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Pierre Morel <[email protected]>
Cc: Jia He <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Len Brown <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Scott Cheloha <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory"
These are all cleanups and one fix previously sent as part of [1]:
[PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory
groups.
These patches make sense even without the other series, therefore I pulled
them out to make the other series easier to digest.
[1] https://lkml.kernel.org/r/[email protected]
This patch (of 4):
Checkpatch complained on a follow-up patch that we are using "unsigned"
here, which defaults to "unsigned int" and checkpatch is correct.
As we will search for a fitting zone using the wrong pfn, we might end
up onlining memory to one of the special kernel zones, such as ZONE_DMA,
which can end badly as the onlined memory does not satisfy properties of
these zones.
Use "unsigned long" instead, just as we do in other places when handling
PFNs. This can bite us once we have physical addresses in the range of
multiple TB.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e5e689302633 ("mm, memory_hotplug: display allowed zones in the preferred ordering")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pierre Morel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Scott Cheloha <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When test_pages_in_a_zone() used pfn_valid_within() is has some logic
surrounding pfn_valid_within() checks.
Since pfn_valid_within() is gone, this logic can be removed.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE".
After recent updates to freeing unused parts of the memory map, no
architecture can have holes in the memory map within a pageblock. This
makes pfn_valid_within() check and CONFIG_HOLES_IN_ZONE configuration
option redundant.
The first patch removes them both in a mechanical way and the second patch
simplifies memory_hotplug::test_pages_in_a_zone() that had
pfn_valid_within() surrounded by more logic than simple if.
This patch (of 2):
After recent changes in freeing of the unused parts of the memory map and
rework of pfn_valid() in arm and arm64 there are no architectures that can
have holes in the memory map within a pageblock and so nothing can enable
CONFIG_HOLES_IN_ZONE which guards non trivial implementation of
pfn_valid_within().
With that, pfn_valid_within() is always hardwired to 1 and can be
completely removed.
Remove calls to pfn_valid_within() and CONFIG_HOLES_IN_ZONE.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The memory hot(un)plug documentation is outdated and incomplete. Most of
the content dates back to 2007, so it's time for a major overhaul.
Let's rewrite, reorganize and update most parts of the documentation. In
addition to memory hot(un)plug, also add some details regarding
ZONE_MOVABLE, with memory hotunplug being one of its main consumers.
Drop the file history, that information can more reliably be had from the
git log.
The style of the document is also properly fixed that e.g., "restview"
renders it cleanly now.
In the future, we might add some more details about virt users like
virtio-mem, the XEN balloon, the Hyper-V balloon and ppc64 dlpar.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "memory-hotplug.rst: complete admin-guide overhaul", v3.
This patch (of 2):
We have the same content at Documentation/core-api/memory-hotplug.rst and
it doesn't fit into the admin-guide. The documentation was accidentially
duplicated when merging.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
No need to check for 64BIT. While at it, let's just select
ARCH_SUPPORTS_HUGETLBFS from arch/s390/Kconfig.
Signed-off-by: David Hildenbrand <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
|
|
syzbot is reporting page fault at vga16fb_fillrect() [1], for
vga16fb_check_var() is failing to detect multiplication overflow.
if (vxres * vyres > maxmem) {
vyres = maxmem / vxres;
if (vyres < yres)
return -ENOMEM;
}
Since no module would accept too huge resolutions where multiplication
overflow happens, let's reject in the common path.
Link: https://syzkaller.appspot.com/bug?extid=04168c8063cfdde1db5e [1]
Reported-by: syzbot <[email protected]>
Debugged-by: Randy Dunlap <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
When using gcc (SUSE Linux) 7.5.0 (on openSUSE 15.3), I see a build
warning:
kernel/trace/trace_osnoise.c: In function 'start_kthread':
kernel/trace/trace_osnoise.c:1461:8: warning: 'main' is usually a function [-Wmain]
void *main = osnoise_main;
^~~~
Quieten that warning by using "-Wno-main". It's OK to use "main" as a
declaration name in the kernel.
Build-tested on most ARCHes.
[ v2: only do it for gcc, since clang doesn't have that particular warning ]
Signed-off-by: Randy Dunlap <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Steven Rostedt <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.15
A collection of fixes that came in during the merge window, nothing too
remarkable but a reasonably large number of fixes.
|
|
* acpi-pm:
ACPI: PM: s2idle: Run both AMD and Microsoft methods if both are supported
* acpi-docs:
Documentation: ACPI: Align the SSDT overlays file with the code
|
|
* pm-opp:
dt-bindings: opp: Convert to DT schema
dt-bindings: Clean-up OPP binding node names in examples
ARM: dts: omap: Drop references to opp.txt
|
|
* pm-cpufreq:
Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
cpufreq: Remove ready() callback
cpufreq: sh: Remove sh_cpufreq_cpu_ready()
cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
cpufreq: scmi: Use .register_em() to register with energy model
cpufreq: vexpress: Use .register_em() to register with energy model
cpufreq: scpi: Use .register_em() to register with energy model
cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
cpufreq: omap: Use .register_em() to register with energy model
cpufreq: mediatek: Use .register_em() to register with energy model
cpufreq: imx6q: Use .register_em() to register with energy model
cpufreq: dt: Use .register_em() to register with energy model
cpufreq: Add callback to register with energy model
cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag
|
|
The address of ftrace_graph_caller is already virtual.
Using __va() to translate the address is wrong.
Reviewed-by: Alexander Gordeev <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Many comments above functions start with a kernel doc indicator, but
the comments are not using kernel doc style. Get rid of the warnings
by simply removing the indicator.
E.g.:
drivers/s390/crypto/zcrypt_msgtype6.c:111: warning:
This comment starts with '/**', but isn't a kernel-doc comment.
Reviewed-by: Harald Freudenberger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
A couple of function names don't match what the kernel doc comments
indicate.
Acked-by: Benjamin Block <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Add __nonstring annotation, since the missing string termination for
id member of sclp_trace_entry is intended. This way we get rid of this
warning:
drivers/s390/char/sclp.c:84:9: warning: ‘strncpy’ output truncated before terminating nul copying 4 bytes from a string of the same length [-Wstringop-truncation]
84 | strncpy(e.id, id, sizeof(e.id));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Peter Oberparleiter <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
This symbol is not used outside of trace.c, so marks it static.
Fix the following sparse warning:
drivers/infiniband/hw/hfi1/trace.c:491:23: warning: symbol 'hist' was not declared. Should it be static?
Link: https://lore.kernel.org/r/1630921723-21545-1-git-send-email-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: chongjiapeng <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
As noted in the "Deprecated Interfaces, Language Features, Attributes, and
Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead to
values wrapping around and a smaller allocation being made than the caller
was expecting. Using those allocations could lead to linear overflows of
heap memory and other misbehaviors.
In this case this is not actually dynamic sizes: both sides of the
multiplication are constant values. However it is best to refactor this
anyway, just to keep the open-coded math idiom out of code.
So, use the purpose specific kcalloc() function instead of the argument
size * count in the kzalloc() function.
Also, remove the unnecessary initialization of the sqp_tbl variable since
it is set a few lines later.
[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Len Baker <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
>> drivers/infiniband/hw/qib/qib_sysfs.c:411:1: warning: performing pointer subtraction with a null pointer has undefined behavior
+[-Wnull-pointer-subtraction]
QIB_DIAGC_ATTR(rc_resends);
^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/qib/qib_sysfs.c:408:51: note: expanded from macro 'QIB_DIAGC_ATTR'
.counter = &((struct qib_ibport *)0)->rvp.n_##N - (u64 *)0, \
Use offsetof and accomplish the type check using static_assert.
Fixes: 4a7aaf88c89f ("RDMA/qib: Use attributes for the port sysfs")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: kernel test robot <[email protected]>
Acked-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The XLT chunk alignment depends on ent_size not sizeof(ent_size) aka
sizeof(size_t). The incoming ent_size is either 8 or 16, so the
miscalculation when 16 is required is only an over-alignment and
functional harmless.
Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In commit 8010d74b9965b ("RDMA/mlx5: Split the WR setup out of
mlx5_ib_update_xlt()") the allocation logic was split out of
mlx5_ib_update_xlt() and the logic was changed to enable better OOM
handling. Sadly this change introduced a miscalculation of the number of
entries that were actually allocated when under memory pressure where it
can actually become 0 which on s390 lets dma_map_single() fail.
It can also lead to corruption of the free pages list when the wrong
number of entries is used in the calculation of sg->length which is used
as argument for free_pages().
Fix this by using the allocation size instead of misusing get_order(size).
Cc: [email protected]
Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Convert controller drivers to generic_handle_domain_irq() (Marc
Zyngier)
- Simplify VPD (Vital Product Data) access and search (Heiner
Kallweit)
- Update bnx2, bnx2x, bnxt, cxgb4, cxlflash, sfc, tg3 drivers to use
simplified VPD interfaces (Heiner Kallweit)
- Run Max Payload Size quirks before configuring MPS; work around
ASMedia ASM1062 SATA MPS issue (Marek Behún)
Resource management:
- Refactor pci_ioremap_bar() and pci_ioremap_wc_bar() (Krzysztof
Wilczyński)
- Optimize pci_resource_len() to reduce kernel size (Zhen Lei)
PCI device hotplug:
- Fix a double unmap in ibmphp (Vishal Aslot)
PCIe port driver:
- Enable Bandwidth Notification only if port supports it (Stuart
Hayes)
Sysfs/proc/syscalls:
- Add schedule point in proc_bus_pci_read() (Krzysztof Wilczyński)
- Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure (Krzysztof
Wilczyński)
- Return "int" from pciconfig_read() syscall (Krzysztof Wilczyński)
Virtualization:
- Extend "pci=noats" to also turn on Translation Blocking to protect
against some DMA attacks (Alex Williamson)
- Add sysfs mechanism to control the type of reset used between
device assignments to VMs (Amey Narkhede)
- Add support for ACPI _RST reset method (Shanker Donthineni)
- Add ACS quirks for Cavium multi-function devices (George Cherian)
- Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms (Wasim Khan)
- Allow HiSilicon AMBA devices that appear as fake PCI devices to use
PASID and SVA (Zhangfei Gao)
Endpoint framework:
- Add support for SR-IOV Endpoint devices (Kishon Vijay Abraham I)
- Zero-initialize endpoint test tool parameters so we don't use
random parameters (Shunyong Yang)
APM X-Gene PCIe controller driver:
- Remove redundant dev_err() call in xgene_msi_probe() (ErKun Yang)
Broadcom iProc PCIe controller driver:
- Don't fail devm_pci_alloc_host_bridge() on missing 'ranges' because
it's optional on BCMA devices (Rob Herring)
- Fix BCMA probe resource handling (Rob Herring)
Cadence PCIe driver:
- Work around J7200 Link training electrical issue by increasing
delays in LTSSM (Nadeem Athani)
Intel IXP4xx PCI controller driver:
- Depend on ARCH_IXP4XX to avoid useless config questions (Geert
Uytterhoeven)
Intel Keembay PCIe controller driver:
- Add Intel Keem Bay PCIe controller (Srikanth Thokala)
Marvell Aardvark PCIe controller driver:
- Work around config space completion handling issues (Evan Wang)
- Increase timeout for config access completions (Pali Rohár)
- Emulate CRS Software Visibility bit (Pali Rohár)
- Configure resources from DT 'ranges' property to fix I/O space
access (Pali Rohár)
- Serialize INTx mask/unmask (Pali Rohár)
MediaTek PCIe controller driver:
- Add MT7629 support in DT (Chuanjia Liu)
- Fix an MSI issue (Chuanjia Liu)
- Get syscon regmap ("mediatek,generic-pciecfg"), IRQ number
("pci_irq"), PCI domain ("linux,pci-domain") from DT properties if
present (Chuanjia Liu)
Microsoft Hyper-V host bridge driver:
- Add ARM64 support (Boqun Feng)
- Support "Create Interrupt v3" message (Sunil Muthuswamy)
NVIDIA Tegra PCIe controller driver:
- Use seq_puts(), move err_msg from stack to static, fix OF node leak
(Christophe JAILLET)
NVIDIA Tegra194 PCIe driver:
- Disable suspend when in Endpoint mode (Om Prakash Singh)
- Fix MSI-X address programming error (Om Prakash Singh)
- Disable interrupts during suspend to avoid spurious AER link down
(Om Prakash Singh)
Renesas R-Car PCIe controller driver:
- Work around hardware issue that prevents Link L1->L0 transition
(Marek Vasut)
- Fix runtime PM refcount leak (Dinghao Liu)
Rockchip DesignWare PCIe controller driver:
- Add Rockchip RK356X host controller driver (Simon Xue)
TI J721E PCIe driver:
- Add support for J7200 and AM64 (Kishon Vijay Abraham I)
Toshiba Visconti PCIe controller driver:
- Add Toshiba Visconti PCIe host controller driver (Nobuhiro
Iwamatsu)
Xilinx NWL PCIe controller driver:
- Enable PCIe reference clock via CCF (Hyun Kwon)
Miscellaneous:
- Convert sta2x11 from 'pci_' to 'dma_' API (Christophe JAILLET)
- Fix pci_dev_str_match_path() alloc while atomic bug (used for
kernel parameters that specify devices) (Dan Carpenter)
- Remove pointless Precision Time Management warning when PTM is
present but not enabled (Jakub Kicinski)
- Remove surplus "break" statements (Krzysztof Wilczyński)"
* tag 'pci-v5.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (132 commits)
PCI: ibmphp: Fix double unmap of io_mem
x86/PCI: sta2x11: switch from 'pci_' to 'dma_' API
PCI/VPD: Use unaligned access helpers
PCI/VPD: Clean up public VPD defines and inline functions
cxgb4: Use pci_vpd_find_id_string() to find VPD ID string
PCI/VPD: Add pci_vpd_find_id_string()
PCI/VPD: Include post-processing in pci_vpd_find_tag()
PCI/VPD: Stop exporting pci_vpd_find_info_keyword()
PCI/VPD: Stop exporting pci_vpd_find_tag()
PCI: Set dma-can-stall for HiSilicon chips
PCI: rockchip-dwc: Add Rockchip RK356X host controller driver
PCI: dwc: Remove surplus break statement after return
PCI: artpec6: Remove local code block from switch statement
PCI: artpec6: Remove surplus break statement after return
MAINTAINERS: Add entries for Toshiba Visconti PCIe controller
PCI: visconti: Add Toshiba Visconti PCIe host controller driver
PCI/portdrv: Enable Bandwidth Notification only if port supports it
PCI: Allow PASID on fake PCIe devices without TLP prefixes
PCI: mediatek: Use PCI domain to handle ports detection
PCI: mediatek: Add new method to get irq number
...
|
|
The cross-product of the kernel's supported toolchains, architectures,
and configuration options is large. So large, that it's generally
accepted to be infeasible to enumerate and build+test them all
(many compile-testers rely on randomly generated configs).
Without the possibility to enumerate all possible combinations of
toolchains, architectures, and configuration options, it is inevitable
that compiler warnings in this space exist.
With -Werror, this means that an innumerable set of kernels are now
broken, yet had been perfectly usable before (confused compilers, code
with warnings unused, or luck).
Distributors will necessarily pick a point in the toolchain X arch X
config space, and if unlucky, will have a broken build. Granted, those
will likely disable CONFIG_WERROR and move on.
The kernel's default configuration is unlikely to be suitable for all
users, but it's inappropriate to force many users to set CONFIG_WERROR=n.
This also holds for CI systems which are focused on runtime testing,
where the odd warning in some subsystem will disrupt testing of the rest
of the kernel. Many of those runtime-focused CI systems run tests or
fuzz the kernel using runtime debugging tools. Runtime testing of
different subsystems can proceed in parallel, and potentially uncover
serious bugs; halting runtime testing of the entire kernel because of
the odd warning (now error) in a subsystem or driver is simply
inappropriate.
Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n
as well.
The appropriate usecase for -Werror is therefore compile-test focused
builds (often done by developers or CI systems).
Reflect this in the Kconfig option by making the default value of WERROR
match COMPILE_TEST.
Signed-off-by: Marco Elver <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Reviwed-by: Mark Brown <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are two bugs in this code. First, if the kzalloc() fails it leads
to a NULL dereference of "ep" on the next line. Second, if the
alloc_event_probe() function returns an error then it leads to an
error pointer dereference in the caller.
Link: https://lkml.kernel.org/r/20210824115150.GI31143@kili
Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes and stragglers from Jakub Kicinski:
"Networking stragglers and fixes, including changes from netfilter,
wireless and can.
Current release - regressions:
- qrtr: revert check in qrtr_endpoint_post(), fixes audio and wifi
- ip_gre: validate csum_start only on pull
- bnxt_en: fix 64-bit doorbell operation on 32-bit kernels
- ionic: fix double use of queue-lock, fix a sleeping in atomic
- can: c_can: fix null-ptr-deref on ioctl()
- cs89x0: disable compile testing on powerpc
Current release - new code bugs:
- bridge: mcast: fix vlan port router deadlock, consistently disable
BH
Previous releases - regressions:
- dsa: tag_rtl4_a: fix egress tags, only port 0 was working
- mptcp: fix possible divide by zero
- netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex
- netfilter: socket: icmp6: fix use-after-scope
- stmmac: fix MAC not working when system resume back with WoL active
Previous releases - always broken:
- ip/ip6_gre: use the same logic as SIT interfaces when computing
v6LL address
- seg6: set fc_nlinfo in nh_create_ipv4, nh_create_ipv6
- mptcp: only send extra TCP acks in eligible socket states
- dsa: lantiq_gswip: fix maximum frame length
- stmmac: fix overall budget calculation for rxtx_napi
- bnxt_en: fix firmware version reporting via devlink
- renesas: sh_eth: add missing barrier to fix freeing wrong tx
descriptor
Stragglers:
- netfilter: conntrack: switch to siphash
- netfilter: refuse insertion if chain has grown too large
- ncsi: add get MAC address command to get Intel i210 MAC address"
* tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
ieee802154: Remove redundant initialization of variable ret
net: stmmac: fix MAC not working when system resume back with WoL active
net: phylink: add suspend/resume support
net: renesas: sh_eth: Fix freeing wrong tx descriptor
bonding: 3ad: pass parameter bond_params by reference
cxgb3: fix oops on module removal
can: c_can: fix null-ptr-deref on ioctl()
can: rcar_canfd: add __maybe_unused annotation to silence warning
net: wwan: iosm: Unify IO accessors used in the driver
net: wwan: iosm: Replace io.*64_lo_hi() with regular accessors
net: qcom/emac: Replace strlcpy with strscpy
ip6_gre: Revert "ip6_gre: add validation for csum_start"
net: hns3: make hclgevf_cmd_caps_bit_map0 and hclge_cmd_caps_bit_map0 static
selftests/bpf: Test XDP bonding nest and unwind
bonding: Fix negative jump label count on nested bonding
MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry
stmmac: dwmac-loongson:Fix missing return value
iwlwifi: fix printk format warnings in uefi.c
net: create netdev->dev_addr assignment helpers
bnxt_en: Fix possible unintended driver initiated error recovery
...
|
|
git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add Mediatek MT7986 & MT8195 wdt support
- add Maxim MAX63xx
- drop bd70528 support
- rewrite ixp4xx to watchdog framework
- constify static struct watchdog_ops for sl28cpld_wdt, mpc8xxx_wdt and
tqmx86
- introduce watchdog_dev_suspend/resume
- several fixes and improvements
* tag 'linux-watchdog-5.15-rc1' of git://www.linux-watchdog.org/linux-watchdog:
dt-bindings: watchdog: Add compatible for Mediatek MT7986
watchdog: ixp4xx: Rewrite driver to use core
watchdog: Start watchdog in watchdog_set_last_hw_keepalive only if appropriate
watchdog: max63xx_wdt: Add device tree probing
dt-bindings: watchdog: Add Maxim MAX63xx bindings
watchdog: mediatek: mt8195: add wdt support
dt-bindings: reset: mt8195: add toprgu reset-controller header file
watchdog: tqmx86: Constify static struct watchdog_ops
watchdog: mpc8xxx_wdt: Constify static struct watchdog_ops
watchdog: sl28cpld_wdt: Constify static struct watchdog_ops
watchdog: iTCO_wdt: Fix detection of SMI-off case
watchdog: bcm2835_wdt: consider system-power-controller property
watchdog: imx2_wdg: notify wdog core to stop ping worker on suspend
watchdog: introduce watchdog_dev_suspend/resume
watchdog: Fix NULL pointer dereference when releasing cdev
watchdog: only run driver set_pretimeout op if device supports it
watchdog: bd70528 drop bd70528 support
|
|
Pull KVM updates from Paolo Bonzini:
"ARM:
- Page ownership tracking between host EL1 and EL2
- Rely on userspace page tables to create large stage-2 mappings
- Fix incompatibility between pKVM and kmemleak
- Fix the PMU reset state, and improve the performance of the virtual
PMU
- Move over to the generic KVM entry code
- Address PSCI reset issues w.r.t. save/restore
- Preliminary rework for the upcoming pKVM fixed feature
- A bunch of MM cleanups
- a vGIC fix for timer spurious interrupts
- Various cleanups
s390:
- enable interpretation of specification exceptions
- fix a vcpu_idx vs vcpu_id mixup
x86:
- fast (lockless) page fault support for the new MMU
- new MMU now the default
- increased maximum allowed VCPU count
- allow inhibit IRQs on KVM_RUN while debugging guests
- let Hyper-V-enabled guests run with virtualized LAPIC as long as
they do not enable the Hyper-V "AutoEOI" feature
- fixes and optimizations for the toggling of AMD AVIC (virtualized
LAPIC)
- tuning for the case when two-dimensional paging (EPT/NPT) is
disabled
- bugfixes and cleanups, especially with respect to vCPU reset and
choosing a paging mode based on CR0/CR4/EFER
- support for 5-level page table on AMD processors
Generic:
- MMU notifier invalidation callbacks do not take mmu_lock unless
necessary
- improved caching of LRU kvm_memory_slot
- support for histogram statistics
- add statistics for halt polling and remote TLB flush requests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
KVM: Drop unused kvm_dirty_gfn_invalid()
KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
KVM: MMU: mark role_regs and role accessors as maybe unused
KVM: MIPS: Remove a "set but not used" variable
x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
KVM: stats: Add VM stat for remote tlb flush requests
KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
kvm: x86: Increase MAX_VCPUS to 1024
kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
KVM: s390: index kvm->arch.idle_mask by vcpu_idx
KVM: s390: Enable specification exception interpretation
KVM: arm64: Trim guest debug exception handling
KVM: SVM: Add 5-level page table support for SVM
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull dmi fix from Jean Delvare.
Unbreak some existing udev/hwdb modalias matches due to misplaced
product_sku field.
* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware: dmi: Move product_sku info to the end of the modalias
|
|
Pull NTB updates from Jon Mason:
"Bug fixes and clean-ups for Linux v5.15"
* tag 'ntb-5.15' of git://github.com/jonmason/ntb:
NTB: switch from 'pci_' to 'dma_' API
ntb: ntb_pingpong: remove redundant initialization of variables msg_data and spad_data
NTB: perf: Fix an error code in perf_setup_inbuf()
NTB: Fix an error code in ntb_msit_probe()
ntb: intel: remove invalid email address in header comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
- move the crash recovery worker to the freezable work queue to avoid
interaction with other drivers during suspend & resume
- fix a couple of typos in comments
- add support for handling the audio DSP on SDM660
- fix a race between the Qualcomm wireless subsystem driver and the
associated driver for the RF chip
* tag 'rproc-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
remoteproc: q6v5_pas: Add sdm660 ADSP PIL compatible
dt-bindings: remoteproc: qcom: adsp: Add SDM660 ADSP
remoteproc: use freezable workqueue for crash notifications
remoteproc: fix kernel doc for struct rproc_ops
remoteproc: fix an typo in fw_elf_get_class code comments
remoteproc: qcom: wcnss: Fix race with iris probe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Fix-ups:
- Improve bootloader/kernel device handover
Bug Fixes:
- Stabilise backlight in ktd253 driver"
* tag 'backlight-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: pwm_bl: Improve bootloader/kernel device handover
backlight: ktd253: Stabilize backlight
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"Core Frameworks:
- Add support for registering devices via MFD cells to Simple MFD (I2C)
New Drivers:
- Add support for Renesas Synchronization Management Unit (SMU)
New Device Support:
- Add support for N5010 to Intel M10 BMC
- Add support for Cannon Lake to Intel LPSS ACPI
- Add support for Samsung SSG{1,2} to ST-Ericsson's U8500 family
- Add support for TQMx110EB and TQMxE40x to TQ-Systems PLD TQMx86
New Functionality:
- Add support for GPIO to Intel LPC ICH
- Add support for Reset to Texas Instruments TPS65086
Fix-ups:
- Trivial, sorting, whitespace, renaming, etc; mt6360-core, db8500-prcmu-regs, tqmx86
- Device Tree fiddling; syscon, axp20x, qcom,pm8008, ti,tps65086, brcm,cru
- Use proper APIs for IRQ map resolution; ab8500-core, stmpe, tc3589x, wm8994-irq
- Pass 'supplied-from' property through axp288_fuel_gauge via swnode
- Remove unused file entry; MAINTAINERS
- Make interrupt line optional; tps65086
- Rename db8500-cpuidle driver symbol; db8500-prcmu
- Remove support for unused hardware; tqmx86
- Provide a standard LPC clock frequency for unknown boards; tqmx86
- Remove unused code; ti_am335x_tscadc
- Use of_iomap() instead of ioremap(); syscon
Bug Fixes:
- Clear GPIO IRQ resource flags when no IRQ is set; tqmx86
- Fix incorrect/misleading frequencies; db8500-prcmu
- Mitigate namespace clash with other GPIOBASE users"
* tag 'mfd-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (31 commits)
mfd: lpc_sch: Rename GPIOBASE to prevent build error
mfd: syscon: Use of_iomap() instead of ioremap()
dt-bindings: mfd: Add Broadcom CRU
mfd: ti_am335x_tscadc: Delete superfluous error message
mfd: tqmx86: Assume 24MHz LPC clock for unknown boards
mfd: tqmx86: Add support for TQ-Systems DMI IDs
mfd: tqmx86: Add support for TQMx110EB and TQMxE40x
mfd: tqmx86: Fix typo in "platform"
mfd: tqmx86: Remove incorrect TQMx90UC board ID
mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set
mfd: simple-mfd-i2c: Add support for registering devices via MFD cells
mfd/cpuidle: ux500: Rename driver symbol
mfd: tps65086: Add cell entry for reset driver
mfd: tps65086: Make interrupt line optional
dt-bindings: mfd: Convert tps65086.txt to YAML
MAINTAINERS: Adjust ARM/NOMADIK/Ux500 ARCHITECTURES to file renaming
mfd: db8500-prcmu: Handle missing FW variant
mfd: db8500-prcmu: Rename register header
mfd: axp20x: Add supplied-from property to axp288_fuel_gauge cell
mfd: Don't use irq_create_mapping() to resolve a mapping
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"We mostly have various improvements and refactoring all over the place
but also some interesting new features - like the virtio GPIO driver
that allows guest VMs to use host's GPIOs. We also have a new/old GPIO
driver for rockchip - this one has been split out of the pinctrl
driver.
Summary:
- new driver: gpio-virtio allowing a guest VM running linux to access
GPIO lines provided by the host
- split the GPIO driver out of the rockchip pin control driver
- add support for a new model to gpio-aspeed-sgpio, refactor the
driver and use generic device property interfaces, improve property
sanitization
- add ACPI support to gpio-tegra186
- improve the code setting the line names to support multiple GPIO
banks per device
- constify a bunch of OF functions in the core GPIO code and make the
declaration for one of the core OF functions we use consistent
within its header
- use software nodes in intel_quark_i2c_gpio
- add support for the gpio-line-names property in gpio-mt7621
- use the standard GPIO function for setting the GPIO names in
gpio-brcmstb
- fix a bunch of leaks and other bugs in gpio-mpc8xxx
- use generic pm callbacks in gpio-ml-ioh
- improve resource management and PM handling in gpio-mlxbf2
- modernize and improve the gpio-dwapb driver
- coding style improvements in gpio-rcar
- documentation fixes and improvements
- update the MAINTAINERS entry for gpio-zynq
- minor tweaks in several drivers"
* tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (35 commits)
gpio: mpc8xxx: Use 'devm_gpiochip_add_data()' to simplify the code and avoid a leak
gpio: mpc8xxx: Fix a potential double iounmap call in 'mpc8xxx_probe()'
gpio: mpc8xxx: Fix a resources leak in the error handling path of 'mpc8xxx_probe()'
gpio: viperboard: remove platform_set_drvdata() call in probe
gpio: virtio: Add missing mailings lists in MAINTAINERS entry
gpio: virtio: Fix sparse warnings
gpio: remove the obsolete MX35 3DS BOARD MC9S08DZ60 GPIO functions
gpio: max730x: Use the right include
gpio: Add virtio-gpio driver
gpio: mlxbf2: Use DEFINE_RES_MEM_NAMED() helper macro
gpio: mlxbf2: Use devm_platform_ioremap_resource()
gpio: mlxbf2: Drop wrong use of ACPI_PTR()
gpio: mlxbf2: Convert to device PM ops
gpio: dwapb: Get rid of legacy platform data
mfd: intel_quark_i2c_gpio: Convert GPIO to use software nodes
gpio: dwapb: Read GPIO base from gpio-base property
gpio: dwapb: Unify ACPI enumeration checks in get_irq() and configure_irqs()
gpiolib: Deduplicate forward declaration in the consumer.h header
MAINTAINERS: update gpio-zynq.yaml reference
gpio: tegra186: Add ACPI support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Allow mounting an active fuse device. Previously the fuse device
would always be mounted during initialization, and sharing a fuse
superblock was only possible through mount or namespace cloning
- Fix data flushing in syncfs (virtiofs only)
- Fix data flushing in copy_file_range()
- Fix a possible deadlock in atomic O_TRUNC
- Misc fixes and cleanups
* tag 'fuse-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: remove unused arg in fuse_write_file_get()
fuse: wait for writepages in syncfs
fuse: flush extending writes
fuse: truncate pagecache on atomic_o_trunc
fuse: allow sharing existing sb
fuse: move fget() to fuse_get_tree()
fuse: move option checking into fuse_fill_super()
fuse: name fs_context consistently
fuse: fix use after free in fuse_read_interrupt()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Changes for kgdb/kdb this cycle are dominated by a change from Sumit
that removes as small (256K) private heap from kdb. This is change
I've hoped for ever since I discovered how few users of this heap
remained in the kernel, so many thanks to Sumit for hunting these
down.
The other change is an incremental step towards SPDX headers"
* tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kernel: debug: Convert to SPDX identifier
kdb: Rename members of struct kdbtab_t
kdb: Simplify kdb_defcmd macro logic
kdb: Get rid of redundant kdb_register_flags()
kdb: Rename struct defcmd_set to struct kdb_macro
kdb: Get rid of custom debug heap allocator
|
|
Commit 0f06157e0135 ("cxl/core: Move register mapping infrastructure")
neglected to add a DOC header for the new drivers/core/regs.c file.
Reported-by: Ben Widawsky <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072206675.2250120.3527179192933919995.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
|
|
Commit 06737cd0d216 ("cxl/core: Move pmem functionality") neglected to
add a DOC header for the new drivers/cxl/core/pmem.c file.
Reported-by: Ben Widawsky <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072206163.2250120.11486436976516079516.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
|
|
Fix unused-const-variable warnings emitted by gcc when cxlmem.h is used
by pretty much all files except pci.c
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072205652.2250120.16833548560832424468.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
|
|
Indicator string for mbox and memdev register set to status
incorrectly in error message.
Cc: <[email protected]>
Fixes: 30af97296f48 ("cxl/pci: Map registers based on capabilities")
Signed-off-by: Li Qiang (Johnny Li) <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072205089.2250120.8103605864156687395.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
|
|
A proposed rework of security_locked_down() users identified that the
cxl_pci driver was passing the wrong lockdown_reason. Update
cxl_mem_raw_command_allowed() to fail raw command access when raw pci
access is also disabled.
Fixes: 13237183c735 ("cxl/mem: Add a "RAW" send command")
Cc: Ben Widawsky <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: <[email protected]>
Cc: Ondrej Mosnacek <[email protected]>
Cc: Paul Moore <[email protected]>
Link: https://lore.kernel.org/r/163072204525.2250120.16615792476976546735.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
|
|
During CXL ACPI probe, host bridge ports are discovered by scanning
the ACPI0017 root port for ACPI0016 host bridge devices. The scan
matches on the hardware id of "ACPI0016". An issue occurs when an
ACPI0016 device is defined in the DSDT yet disabled on the platform.
Attempts by the cxl_acpi driver to add host bridge ports using a
disabled device fails, and the entire cxl_acpi probe fails.
The DSDT table includes an _STA method that sets the status and the
ACPI subsystem has checks available to examine it. One such check is
in the acpi_pci_find_root() path. Move the call to acpi_pci_find_root()
to the matching function to prevent this issue when adding either
upstream or downstream ports.
Suggested-by: Dan Williams <[email protected]>
Signed-off-by: Alison Schofield <[email protected]>
Fixes: 7d4b5ca2e2cb ("cxl/acpi: Add downstream port data to cxl_port instances")
Cc: <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072203957.2250120.2178685721061002124.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
|
|
This reverts commit b655843444152c0a14b749308e4cb35d91cbcf0b.
Just like with the memcg lock accounting, the kernel test robot reports
a sizeable performance regression for this commit, and while it clearly
does the rigth thing in theory, we'll need to look at just how to avoid
or minimize the performance overhead of the memcg accounting.
People already have suggestions on how to do that, but it's "future
work".
So revert it for now.
[ Note: the first link below is for this same commit but a different
commit ID, because it's the kernel test robot ended up noticing it in
Andrew Morton's patch queue ]
Link: https://lore.kernel.org/lkml/20210905132732.GC15026@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
Acked-by: Jens Axboe <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This reverts commit 0f12156dff2862ac54235fc72703f18770769042.
The kernel test robot reports a sizeable performance regression for this
commit, and while it clearly does the rigth thing in theory, we'll need
to look at just how to avoid or minimize the performance overhead of the
memcg accounting.
People already have suggestions on how to do that, but it's "future
work".
So revert it for now.
Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
Acked-by: Jens Axboe <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e.
That commit was completely broken, and I should have caught on to it
earlier. But happily, the kernel test robot noticed the breakage fairly
quickly.
The breakage is because "try_get_page()" is about avoiding the page
reference count overflow case, but is otherwise the exact same as a
plain "get_page()".
In contrast, "try_get_compound_head()" is an entirely different beast,
and uses __page_cache_add_speculative() because it's not just about the
page reference count, but also about possibly racing with the underlying
page going away.
So all the commentary about how
"try_get_page() has fallen a little behind in terms of maintenance,
try_get_compound_head() handles speculative page references more
thoroughly"
was just completely wrong: yes, try_get_compound_head() handles
speculative page references, but the point is that try_get_page() does
not, and must not.
So there's no lack of maintainance - there are fundamentally different
semantics.
A speculative page reference would be entirely wrong in "get_page()",
and it's entirely wrong in "try_get_page()". It's not about
speculation, it's purely about "uhhuh, you can't get this page because
you've tried to increment the reference count too much already".
The reason the kernel test robot noticed this bug was that it hit the
VM_BUG_ON() in __page_cache_add_speculative(), which is all about
verifying that the context of any speculative page access is correct.
But since that isn't what try_get_page() is all about, the VM_BUG_ON()
tests things that are not correct to test for try_get_page().
Reported-by: kernel test robot <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull more ARM cpufreq changes for v5.15-rc1 from Viresh Kumar:
"This adds a new cpufreq driver for Mediatek, which had been going
through reviews since last one year."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
|
|
Revert commit d0e936adbd22 ("cpufreq: intel_pstate: Process HWP
Guaranteed change notification"), because it causes a NULL pointer
dereference to occur on Lenovo X1 gen9 laptops due to an HWP
guaranteed performance change interrupt arriving prematurely.
This feature will be revisited in the next cycle.
Reported-by: Jens Axboe <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|