Age | Commit message (Collapse) | Author | Files | Lines |
|
Use memory groups to improve our "auto-movable" onlining policy:
1. For static memory groups (e.g., a DIMM), online a memory block MOVABLE
only if all other memory blocks in the group are either MOVABLE or could
be onlined MOVABLE. A DIMM will either be MOVABLE or not, not a mixture.
2. For dynamic memory groups (e.g., a virtio-mem device), online a
memory block MOVABLE only if all other memory blocks inside the
current unit are either MOVABLE or could be onlined MOVABLE. For a
virtio-mem device with a device block size with 512 MiB, all 128 MiB
memory blocks wihin a 512 MiB unit will either be MOVABLE or not, not
a mixture.
We have to pass the memory group to zone_for_pfn_range() to take the
memory group into account.
Note: for now, there seems to be no compelling reason to make this
behavior configurable.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hui Zhu <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Marek Kedzierski <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's track all present pages in each memory group. Especially, track
memory present in ZONE_MOVABLE and memory present in one of the kernel
zones (which really only is ZONE_NORMAL right now as memory groups only
apply to hotplugged memory) separately within a memory group, to prepare
for making smart auto-online decision for individual memory blocks within
a memory group based on group statistics.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hui Zhu <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Marek Kedzierski <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In our "auto-movable" memory onlining policy, we want to make decisions
across memory blocks of a single memory device. Examples of memory
devices include ACPI memory devices (in the simplest case a single DIMM)
and virtio-mem. For now, we don't have a connection between a single
memory block device and the real memory device. Each memory device
consists of 1..X memory block devices.
Let's logically group memory blocks belonging to the same memory device in
"memory groups". Memory groups can span multiple physical ranges and a
memory group itself does not contain any information regarding physical
ranges, only properties (e.g., "max_pages") necessary for improved memory
onlining.
Introduce two memory group types:
1) Static memory group: E.g., a single ACPI memory device, consisting
of 1..X memory resources. A memory group consists of 1..Y memory
blocks. The whole group is added/removed in one go. If any part
cannot get offlined, the whole group cannot be removed.
2) Dynamic memory group: E.g., a single virtio-mem device. Memory is
dynamically added/removed in a fixed granularity, called a "unit",
consisting of 1..X memory blocks. A unit is added/removed in one go.
If any part of a unit cannot get offlined, the whole unit cannot be
removed.
In case of 1) we usually want either all memory managed by ZONE_MOVABLE or
none. In case of 2) we usually want to have as many units as possible
managed by ZONE_MOVABLE. We want a single unit to be of the same type.
For now, memory groups are an internal concept that is not exposed to user
space; we might want to change that in the future, though.
add_memory() users can specify a mgid instead of a nid when passing the
MHP_NID_IS_MGID flag.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hui Zhu <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Marek Kedzierski <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/memory_hotplug: "auto-movable" online policy and memory groups", v3.
I. Goal
The goal of this series is improving in-kernel auto-online support. It
tackles the fundamental problems that:
1) We can create zone imbalances when onlining all memory blindly to
ZONE_MOVABLE, in the worst case crashing the system. We have to know
upfront how much memory we are going to hotplug such that we can
safely enable auto-onlining of all hotplugged memory to ZONE_MOVABLE
via "online_movable". This is far from practical and only applicable in
limited setups -- like inside VMs under the RHV/oVirt hypervisor which
will never hotplug more than 3 times the boot memory (and the
limitation is only in place due to the Linux limitation).
2) We see more setups that implement dynamic VM resizing, hot(un)plugging
memory to resize VM memory. In these setups, we might hotplug a lot of
memory, but it might happen in various small steps in both directions
(e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...). virtio-mem is the
primary driver of this upstream right now, performing such dynamic
resizing NUMA-aware via multiple virtio-mem devices.
Onlining all hotplugged memory to ZONE_NORMAL means we basically have
no hotunplug guarantees. Onlining all to ZONE_MOVABLE means we can
easily run into zone imbalances when growing a VM. We want a mixture,
and we want as much memory as reasonable/configured in ZONE_MOVABLE.
Details regarding zone imbalances can be found at [1].
3) Memory devices consist of 1..X memory block devices, however, the
kernel doesn't really track the relationship. Consequently, also user
space has no idea. We want to make per-device decisions.
As one example, for memory hotunplug it doesn't make sense to use a
mixture of zones within a single DIMM: we want all MOVABLE if
possible, otherwise all !MOVABLE, because any !MOVABLE part will easily
block the whole DIMM from getting hotunplugged.
As another example, virtio-mem operates on individual units that span
1..X memory blocks. Similar to a DIMM, we want a unit to either be all
MOVABLE or !MOVABLE. A "unit" can be thought of like a DIMM, however,
all units of a virtio-mem device logically belong together and are
managed (added/removed) by a single driver. We want as much memory of
a virtio-mem device to be MOVABLE as possible.
4) We want memory onlining to be done right from the kernel while adding
memory, not triggered by user space via udev rules; for example, this
is reqired for fast memory hotplug for drivers that add individual
memory blocks, like virito-mem. We want a way to configure a policy in
the kernel and avoid implementing advanced policies in user space.
The auto-onlining support we have in the kernel is not sufficient. All we
have is a) online everything MOVABLE (online_movable) b) online everything
!MOVABLE (online_kernel) c) keep zones contiguous (online). This series
allows configuring c) to mean instead "online movable if possible
according to the coniguration, driven by a maximum MOVABLE:KERNEL ratio"
-- a new onlining policy.
II. Approach
This series does 3 things:
1) Introduces the "auto-movable" online policy that initially operates on
individual memory blocks only. It uses a maximum MOVABLE:KERNEL ratio
to make a decision whether a memory block will be onlined to
ZONE_MOVABLE or not. However, in the basic form, hotplugged KERNEL
memory does not allow for more MOVABLE memory (details in the
patches). CMA memory is treated like MOVABLE memory.
2) Introduces static (e.g., DIMM) and dynamic (e.g., virtio-mem) memory
groups and uses group information to make decisions in the
"auto-movable" online policy across memory blocks of a single memory
device (modeled as memory group). More details can be found in patch
#3 or in the DIMM example below.
3) Maximizes ZONE_MOVABLE memory within dynamic memory groups, by
allowing ZONE_NORMAL memory within a dynamic memory group to allow for
more ZONE_MOVABLE memory within the same memory group. The target use
case is dynamic VM resizing using virtio-mem. See the virtio-mem
example below.
I remember that the basic idea of using a ratio to implement a policy in
the kernel was once mentioned by Vitaly Kuznetsov, but I might be wrong (I
lost the pointer to that discussion).
For me, the main use case is using it along with virtio-mem (and DIMMs /
ppc64 dlpar where necessary) for dynamic resizing of VMs, increasing the
amount of memory we can hotunplug reliably again if we might eventually
hotplug a lot of memory to a VM.
III. Target Usage
The target usage will be:
1) Linux boots with "mhp_default_online_type=offline"
2) User space (e.g., systemd unit) configures memory onlining (according
to a config file and system properties), for example:
* Setting memory_hotplug.online_policy=auto-movable
* Setting memory_hotplug.auto_movable_ratio=301
* Setting memory_hotplug.auto_movable_numa_aware=true
3) User space enabled auto onlining via "echo online >
/sys/devices/system/memory/auto_online_blocks"
4) User space triggers manual onlining of all already-offline memory
blocks (go over offline memory blocks and set them to "online")
IV. Example
For DIMMs, hotplugging 4 GiB DIMMs to a 4 GiB VM with a configured ratio of
301% results in the following layout:
Memory block 0-15: DMA32 (early)
Memory block 32-47: Normal (early)
Memory block 48-79: Movable (DIMM 0)
Memory block 80-111: Movable (DIMM 1)
Memory block 112-143: Movable (DIMM 2)
Memory block 144-275: Normal (DIMM 3)
Memory block 176-207: Normal (DIMM 4)
... all Normal
(-> hotplugged Normal memory does not allow for more Movable memory)
For virtio-mem, using a simple, single virtio-mem device with a 4 GiB VM
will result in the following layout:
Memory block 0-15: DMA32 (early)
Memory block 32-47: Normal (early)
Memory block 48-143: Movable (virtio-mem, first 12 GiB)
Memory block 144: Normal (virtio-mem, next 128 MiB)
Memory block 145-147: Movable (virtio-mem, next 384 MiB)
Memory block 148: Normal (virtio-mem, next 128 MiB)
Memory block 149-151: Movable (virtio-mem, next 384 MiB)
... Normal/Movable mixture as above
(-> hotplugged Normal memory allows for more Movable memory within
the same device)
Which gives us maximum flexibility when dynamically growing/shrinking a
VM in smaller steps.
V. Doc Update
I'll update the memory-hotplug.rst documentation, once the overhaul [1] is
usptream. Until then, details can be found in patch #2.
VI. Future Work
1) Use memory groups for ppc64 dlpar
2) Being able to specify a portion of (early) kernel memory that will be
excluded from the ratio. Like "128 MiB globally/per node" are excluded.
This might be helpful when starting VMs with extremely small memory
footprint (e.g., 128 MiB) and hotplugging memory later -- not wanting
the first hotplugged units getting onlined to ZONE_MOVABLE. One
alternative would be a trigger to not consider ZONE_DMA memory
in the ratio. We'll have to see if this is really rrequired.
3) Indicate to user space that MOVABLE might be a bad idea -- especially
relevant when memory ballooning without support for balloon compaction
is active.
This patch (of 9):
For implementing a new memory onlining policy, which determines when to
online memory blocks to ZONE_MOVABLE semi-automatically, we need the
number of present early (boot) pages -- present pages excluding hotplugged
pages. Let's track these pages per zone.
Pass a page instead of the zone to adjust_present_page_count(), similar as
adjust_managed_page_count() and derive the zone from the page.
It's worth noting that a memory block to be offlined/onlined is either
completely "early" or "not early". add_memory() and friends can only add
complete memory blocks and we only online/offline complete (individual)
memory blocks.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Marek Kedzierski <[email protected]>
Cc: Hui Zhu <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is only a single user remaining. We can simply lookup the nid only
used for node offlining purposes when walking our memory blocks. We don't
expect to remove multi-nid ranges; and if we'd ever do, we most probably
don't care about removing multi-nid ranges that actually result in empty
nodes.
If ever required, we can detect the "multi-nid" scenario and simply try
offlining all online nodes.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Scott Cheloha <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pierre Morel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The parameter is unused, let's remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Acked-by: Heiko Carstens <[email protected]> [s390]
Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Pierre Morel <[email protected]>
Cc: Jia He <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Len Brown <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Scott Cheloha <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory"
These are all cleanups and one fix previously sent as part of [1]:
[PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory
groups.
These patches make sense even without the other series, therefore I pulled
them out to make the other series easier to digest.
[1] https://lkml.kernel.org/r/[email protected]
This patch (of 4):
Checkpatch complained on a follow-up patch that we are using "unsigned"
here, which defaults to "unsigned int" and checkpatch is correct.
As we will search for a fitting zone using the wrong pfn, we might end
up onlining memory to one of the special kernel zones, such as ZONE_DMA,
which can end badly as the onlined memory does not satisfy properties of
these zones.
Use "unsigned long" instead, just as we do in other places when handling
PFNs. This can bite us once we have physical addresses in the range of
multiple TB.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e5e689302633 ("mm, memory_hotplug: display allowed zones in the preferred ordering")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pierre Morel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Scott Cheloha <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE".
After recent updates to freeing unused parts of the memory map, no
architecture can have holes in the memory map within a pageblock. This
makes pfn_valid_within() check and CONFIG_HOLES_IN_ZONE configuration
option redundant.
The first patch removes them both in a mechanical way and the second patch
simplifies memory_hotplug::test_pages_in_a_zone() that had
pfn_valid_within() surrounded by more logic than simple if.
This patch (of 2):
After recent changes in freeing of the unused parts of the memory map and
rework of pfn_valid() in arm and arm64 there are no architectures that can
have holes in the memory map within a pageblock and so nothing can enable
CONFIG_HOLES_IN_ZONE which guards non trivial implementation of
pfn_valid_within().
With that, pfn_valid_within() is always hardwired to 1 and can be
completely removed.
Remove calls to pfn_valid_within() and CONFIG_HOLES_IN_ZONE.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
timespec64_ns() prevents multiplication overflows by comparing the seconds
value of the timespec to KTIME_SEC_MAX. If the value is greater or equal it
returns KTIME_MAX.
But that check casts the signed seconds value to unsigned which makes the
comparision true for all negative values and therefore return wrongly
KTIME_MAX.
Negative second values are perfectly valid and required in some places,
e.g. ptp_clock_adjtime().
Remove the cast and add a check for the negative boundary which is required
to prevent undefined behaviour due to multiplication underflow.
Fixes: cb47755725da ("time: Prevent undefined behaviour in timespec64_to_ns()")'
Signed-off-by: Lukas Hannen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/AM6PR01MB541637BD6F336B8FFB72AF80EEC69@AM6PR01MB5416.eurprd01.prod.exchangelabs.com
|
|
* pm-cpufreq:
Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
cpufreq: Remove ready() callback
cpufreq: sh: Remove sh_cpufreq_cpu_ready()
cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
cpufreq: scmi: Use .register_em() to register with energy model
cpufreq: vexpress: Use .register_em() to register with energy model
cpufreq: scpi: Use .register_em() to register with energy model
cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
cpufreq: omap: Use .register_em() to register with energy model
cpufreq: mediatek: Use .register_em() to register with energy model
cpufreq: imx6q: Use .register_em() to register with energy model
cpufreq: dt: Use .register_em() to register with energy model
cpufreq: Add callback to register with energy model
cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Convert controller drivers to generic_handle_domain_irq() (Marc
Zyngier)
- Simplify VPD (Vital Product Data) access and search (Heiner
Kallweit)
- Update bnx2, bnx2x, bnxt, cxgb4, cxlflash, sfc, tg3 drivers to use
simplified VPD interfaces (Heiner Kallweit)
- Run Max Payload Size quirks before configuring MPS; work around
ASMedia ASM1062 SATA MPS issue (Marek Behún)
Resource management:
- Refactor pci_ioremap_bar() and pci_ioremap_wc_bar() (Krzysztof
Wilczyński)
- Optimize pci_resource_len() to reduce kernel size (Zhen Lei)
PCI device hotplug:
- Fix a double unmap in ibmphp (Vishal Aslot)
PCIe port driver:
- Enable Bandwidth Notification only if port supports it (Stuart
Hayes)
Sysfs/proc/syscalls:
- Add schedule point in proc_bus_pci_read() (Krzysztof Wilczyński)
- Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure (Krzysztof
Wilczyński)
- Return "int" from pciconfig_read() syscall (Krzysztof Wilczyński)
Virtualization:
- Extend "pci=noats" to also turn on Translation Blocking to protect
against some DMA attacks (Alex Williamson)
- Add sysfs mechanism to control the type of reset used between
device assignments to VMs (Amey Narkhede)
- Add support for ACPI _RST reset method (Shanker Donthineni)
- Add ACS quirks for Cavium multi-function devices (George Cherian)
- Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms (Wasim Khan)
- Allow HiSilicon AMBA devices that appear as fake PCI devices to use
PASID and SVA (Zhangfei Gao)
Endpoint framework:
- Add support for SR-IOV Endpoint devices (Kishon Vijay Abraham I)
- Zero-initialize endpoint test tool parameters so we don't use
random parameters (Shunyong Yang)
APM X-Gene PCIe controller driver:
- Remove redundant dev_err() call in xgene_msi_probe() (ErKun Yang)
Broadcom iProc PCIe controller driver:
- Don't fail devm_pci_alloc_host_bridge() on missing 'ranges' because
it's optional on BCMA devices (Rob Herring)
- Fix BCMA probe resource handling (Rob Herring)
Cadence PCIe driver:
- Work around J7200 Link training electrical issue by increasing
delays in LTSSM (Nadeem Athani)
Intel IXP4xx PCI controller driver:
- Depend on ARCH_IXP4XX to avoid useless config questions (Geert
Uytterhoeven)
Intel Keembay PCIe controller driver:
- Add Intel Keem Bay PCIe controller (Srikanth Thokala)
Marvell Aardvark PCIe controller driver:
- Work around config space completion handling issues (Evan Wang)
- Increase timeout for config access completions (Pali Rohár)
- Emulate CRS Software Visibility bit (Pali Rohár)
- Configure resources from DT 'ranges' property to fix I/O space
access (Pali Rohár)
- Serialize INTx mask/unmask (Pali Rohár)
MediaTek PCIe controller driver:
- Add MT7629 support in DT (Chuanjia Liu)
- Fix an MSI issue (Chuanjia Liu)
- Get syscon regmap ("mediatek,generic-pciecfg"), IRQ number
("pci_irq"), PCI domain ("linux,pci-domain") from DT properties if
present (Chuanjia Liu)
Microsoft Hyper-V host bridge driver:
- Add ARM64 support (Boqun Feng)
- Support "Create Interrupt v3" message (Sunil Muthuswamy)
NVIDIA Tegra PCIe controller driver:
- Use seq_puts(), move err_msg from stack to static, fix OF node leak
(Christophe JAILLET)
NVIDIA Tegra194 PCIe driver:
- Disable suspend when in Endpoint mode (Om Prakash Singh)
- Fix MSI-X address programming error (Om Prakash Singh)
- Disable interrupts during suspend to avoid spurious AER link down
(Om Prakash Singh)
Renesas R-Car PCIe controller driver:
- Work around hardware issue that prevents Link L1->L0 transition
(Marek Vasut)
- Fix runtime PM refcount leak (Dinghao Liu)
Rockchip DesignWare PCIe controller driver:
- Add Rockchip RK356X host controller driver (Simon Xue)
TI J721E PCIe driver:
- Add support for J7200 and AM64 (Kishon Vijay Abraham I)
Toshiba Visconti PCIe controller driver:
- Add Toshiba Visconti PCIe host controller driver (Nobuhiro
Iwamatsu)
Xilinx NWL PCIe controller driver:
- Enable PCIe reference clock via CCF (Hyun Kwon)
Miscellaneous:
- Convert sta2x11 from 'pci_' to 'dma_' API (Christophe JAILLET)
- Fix pci_dev_str_match_path() alloc while atomic bug (used for
kernel parameters that specify devices) (Dan Carpenter)
- Remove pointless Precision Time Management warning when PTM is
present but not enabled (Jakub Kicinski)
- Remove surplus "break" statements (Krzysztof Wilczyński)"
* tag 'pci-v5.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (132 commits)
PCI: ibmphp: Fix double unmap of io_mem
x86/PCI: sta2x11: switch from 'pci_' to 'dma_' API
PCI/VPD: Use unaligned access helpers
PCI/VPD: Clean up public VPD defines and inline functions
cxgb4: Use pci_vpd_find_id_string() to find VPD ID string
PCI/VPD: Add pci_vpd_find_id_string()
PCI/VPD: Include post-processing in pci_vpd_find_tag()
PCI/VPD: Stop exporting pci_vpd_find_info_keyword()
PCI/VPD: Stop exporting pci_vpd_find_tag()
PCI: Set dma-can-stall for HiSilicon chips
PCI: rockchip-dwc: Add Rockchip RK356X host controller driver
PCI: dwc: Remove surplus break statement after return
PCI: artpec6: Remove local code block from switch statement
PCI: artpec6: Remove surplus break statement after return
MAINTAINERS: Add entries for Toshiba Visconti PCIe controller
PCI: visconti: Add Toshiba Visconti PCIe host controller driver
PCI/portdrv: Enable Bandwidth Notification only if port supports it
PCI: Allow PASID on fake PCIe devices without TLP prefixes
PCI: mediatek: Use PCI domain to handle ports detection
PCI: mediatek: Add new method to get irq number
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes and stragglers from Jakub Kicinski:
"Networking stragglers and fixes, including changes from netfilter,
wireless and can.
Current release - regressions:
- qrtr: revert check in qrtr_endpoint_post(), fixes audio and wifi
- ip_gre: validate csum_start only on pull
- bnxt_en: fix 64-bit doorbell operation on 32-bit kernels
- ionic: fix double use of queue-lock, fix a sleeping in atomic
- can: c_can: fix null-ptr-deref on ioctl()
- cs89x0: disable compile testing on powerpc
Current release - new code bugs:
- bridge: mcast: fix vlan port router deadlock, consistently disable
BH
Previous releases - regressions:
- dsa: tag_rtl4_a: fix egress tags, only port 0 was working
- mptcp: fix possible divide by zero
- netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex
- netfilter: socket: icmp6: fix use-after-scope
- stmmac: fix MAC not working when system resume back with WoL active
Previous releases - always broken:
- ip/ip6_gre: use the same logic as SIT interfaces when computing
v6LL address
- seg6: set fc_nlinfo in nh_create_ipv4, nh_create_ipv6
- mptcp: only send extra TCP acks in eligible socket states
- dsa: lantiq_gswip: fix maximum frame length
- stmmac: fix overall budget calculation for rxtx_napi
- bnxt_en: fix firmware version reporting via devlink
- renesas: sh_eth: add missing barrier to fix freeing wrong tx
descriptor
Stragglers:
- netfilter: conntrack: switch to siphash
- netfilter: refuse insertion if chain has grown too large
- ncsi: add get MAC address command to get Intel i210 MAC address"
* tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
ieee802154: Remove redundant initialization of variable ret
net: stmmac: fix MAC not working when system resume back with WoL active
net: phylink: add suspend/resume support
net: renesas: sh_eth: Fix freeing wrong tx descriptor
bonding: 3ad: pass parameter bond_params by reference
cxgb3: fix oops on module removal
can: c_can: fix null-ptr-deref on ioctl()
can: rcar_canfd: add __maybe_unused annotation to silence warning
net: wwan: iosm: Unify IO accessors used in the driver
net: wwan: iosm: Replace io.*64_lo_hi() with regular accessors
net: qcom/emac: Replace strlcpy with strscpy
ip6_gre: Revert "ip6_gre: add validation for csum_start"
net: hns3: make hclgevf_cmd_caps_bit_map0 and hclge_cmd_caps_bit_map0 static
selftests/bpf: Test XDP bonding nest and unwind
bonding: Fix negative jump label count on nested bonding
MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry
stmmac: dwmac-loongson:Fix missing return value
iwlwifi: fix printk format warnings in uefi.c
net: create netdev->dev_addr assignment helpers
bnxt_en: Fix possible unintended driver initiated error recovery
...
|
|
git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add Mediatek MT7986 & MT8195 wdt support
- add Maxim MAX63xx
- drop bd70528 support
- rewrite ixp4xx to watchdog framework
- constify static struct watchdog_ops for sl28cpld_wdt, mpc8xxx_wdt and
tqmx86
- introduce watchdog_dev_suspend/resume
- several fixes and improvements
* tag 'linux-watchdog-5.15-rc1' of git://www.linux-watchdog.org/linux-watchdog:
dt-bindings: watchdog: Add compatible for Mediatek MT7986
watchdog: ixp4xx: Rewrite driver to use core
watchdog: Start watchdog in watchdog_set_last_hw_keepalive only if appropriate
watchdog: max63xx_wdt: Add device tree probing
dt-bindings: watchdog: Add Maxim MAX63xx bindings
watchdog: mediatek: mt8195: add wdt support
dt-bindings: reset: mt8195: add toprgu reset-controller header file
watchdog: tqmx86: Constify static struct watchdog_ops
watchdog: mpc8xxx_wdt: Constify static struct watchdog_ops
watchdog: sl28cpld_wdt: Constify static struct watchdog_ops
watchdog: iTCO_wdt: Fix detection of SMI-off case
watchdog: bcm2835_wdt: consider system-power-controller property
watchdog: imx2_wdg: notify wdog core to stop ping worker on suspend
watchdog: introduce watchdog_dev_suspend/resume
watchdog: Fix NULL pointer dereference when releasing cdev
watchdog: only run driver set_pretimeout op if device supports it
watchdog: bd70528 drop bd70528 support
|
|
Pull KVM updates from Paolo Bonzini:
"ARM:
- Page ownership tracking between host EL1 and EL2
- Rely on userspace page tables to create large stage-2 mappings
- Fix incompatibility between pKVM and kmemleak
- Fix the PMU reset state, and improve the performance of the virtual
PMU
- Move over to the generic KVM entry code
- Address PSCI reset issues w.r.t. save/restore
- Preliminary rework for the upcoming pKVM fixed feature
- A bunch of MM cleanups
- a vGIC fix for timer spurious interrupts
- Various cleanups
s390:
- enable interpretation of specification exceptions
- fix a vcpu_idx vs vcpu_id mixup
x86:
- fast (lockless) page fault support for the new MMU
- new MMU now the default
- increased maximum allowed VCPU count
- allow inhibit IRQs on KVM_RUN while debugging guests
- let Hyper-V-enabled guests run with virtualized LAPIC as long as
they do not enable the Hyper-V "AutoEOI" feature
- fixes and optimizations for the toggling of AMD AVIC (virtualized
LAPIC)
- tuning for the case when two-dimensional paging (EPT/NPT) is
disabled
- bugfixes and cleanups, especially with respect to vCPU reset and
choosing a paging mode based on CR0/CR4/EFER
- support for 5-level page table on AMD processors
Generic:
- MMU notifier invalidation callbacks do not take mmu_lock unless
necessary
- improved caching of LRU kvm_memory_slot
- support for histogram statistics
- add statistics for halt polling and remote TLB flush requests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
KVM: Drop unused kvm_dirty_gfn_invalid()
KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
KVM: MMU: mark role_regs and role accessors as maybe unused
KVM: MIPS: Remove a "set but not used" variable
x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
KVM: stats: Add VM stat for remote tlb flush requests
KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
kvm: x86: Increase MAX_VCPUS to 1024
kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
KVM: s390: index kvm->arch.idle_mask by vcpu_idx
KVM: s390: Enable specification exception interpretation
KVM: arm64: Trim guest debug exception handling
KVM: SVM: Add 5-level page table support for SVM
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
- move the crash recovery worker to the freezable work queue to avoid
interaction with other drivers during suspend & resume
- fix a couple of typos in comments
- add support for handling the audio DSP on SDM660
- fix a race between the Qualcomm wireless subsystem driver and the
associated driver for the RF chip
* tag 'rproc-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
remoteproc: q6v5_pas: Add sdm660 ADSP PIL compatible
dt-bindings: remoteproc: qcom: adsp: Add SDM660 ADSP
remoteproc: use freezable workqueue for crash notifications
remoteproc: fix kernel doc for struct rproc_ops
remoteproc: fix an typo in fw_elf_get_class code comments
remoteproc: qcom: wcnss: Fix race with iris probe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"Core Frameworks:
- Add support for registering devices via MFD cells to Simple MFD (I2C)
New Drivers:
- Add support for Renesas Synchronization Management Unit (SMU)
New Device Support:
- Add support for N5010 to Intel M10 BMC
- Add support for Cannon Lake to Intel LPSS ACPI
- Add support for Samsung SSG{1,2} to ST-Ericsson's U8500 family
- Add support for TQMx110EB and TQMxE40x to TQ-Systems PLD TQMx86
New Functionality:
- Add support for GPIO to Intel LPC ICH
- Add support for Reset to Texas Instruments TPS65086
Fix-ups:
- Trivial, sorting, whitespace, renaming, etc; mt6360-core, db8500-prcmu-regs, tqmx86
- Device Tree fiddling; syscon, axp20x, qcom,pm8008, ti,tps65086, brcm,cru
- Use proper APIs for IRQ map resolution; ab8500-core, stmpe, tc3589x, wm8994-irq
- Pass 'supplied-from' property through axp288_fuel_gauge via swnode
- Remove unused file entry; MAINTAINERS
- Make interrupt line optional; tps65086
- Rename db8500-cpuidle driver symbol; db8500-prcmu
- Remove support for unused hardware; tqmx86
- Provide a standard LPC clock frequency for unknown boards; tqmx86
- Remove unused code; ti_am335x_tscadc
- Use of_iomap() instead of ioremap(); syscon
Bug Fixes:
- Clear GPIO IRQ resource flags when no IRQ is set; tqmx86
- Fix incorrect/misleading frequencies; db8500-prcmu
- Mitigate namespace clash with other GPIOBASE users"
* tag 'mfd-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (31 commits)
mfd: lpc_sch: Rename GPIOBASE to prevent build error
mfd: syscon: Use of_iomap() instead of ioremap()
dt-bindings: mfd: Add Broadcom CRU
mfd: ti_am335x_tscadc: Delete superfluous error message
mfd: tqmx86: Assume 24MHz LPC clock for unknown boards
mfd: tqmx86: Add support for TQ-Systems DMI IDs
mfd: tqmx86: Add support for TQMx110EB and TQMxE40x
mfd: tqmx86: Fix typo in "platform"
mfd: tqmx86: Remove incorrect TQMx90UC board ID
mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set
mfd: simple-mfd-i2c: Add support for registering devices via MFD cells
mfd/cpuidle: ux500: Rename driver symbol
mfd: tps65086: Add cell entry for reset driver
mfd: tps65086: Make interrupt line optional
dt-bindings: mfd: Convert tps65086.txt to YAML
MAINTAINERS: Adjust ARM/NOMADIK/Ux500 ARCHITECTURES to file renaming
mfd: db8500-prcmu: Handle missing FW variant
mfd: db8500-prcmu: Rename register header
mfd: axp20x: Add supplied-from property to axp288_fuel_gauge cell
mfd: Don't use irq_create_mapping() to resolve a mapping
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"We mostly have various improvements and refactoring all over the place
but also some interesting new features - like the virtio GPIO driver
that allows guest VMs to use host's GPIOs. We also have a new/old GPIO
driver for rockchip - this one has been split out of the pinctrl
driver.
Summary:
- new driver: gpio-virtio allowing a guest VM running linux to access
GPIO lines provided by the host
- split the GPIO driver out of the rockchip pin control driver
- add support for a new model to gpio-aspeed-sgpio, refactor the
driver and use generic device property interfaces, improve property
sanitization
- add ACPI support to gpio-tegra186
- improve the code setting the line names to support multiple GPIO
banks per device
- constify a bunch of OF functions in the core GPIO code and make the
declaration for one of the core OF functions we use consistent
within its header
- use software nodes in intel_quark_i2c_gpio
- add support for the gpio-line-names property in gpio-mt7621
- use the standard GPIO function for setting the GPIO names in
gpio-brcmstb
- fix a bunch of leaks and other bugs in gpio-mpc8xxx
- use generic pm callbacks in gpio-ml-ioh
- improve resource management and PM handling in gpio-mlxbf2
- modernize and improve the gpio-dwapb driver
- coding style improvements in gpio-rcar
- documentation fixes and improvements
- update the MAINTAINERS entry for gpio-zynq
- minor tweaks in several drivers"
* tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (35 commits)
gpio: mpc8xxx: Use 'devm_gpiochip_add_data()' to simplify the code and avoid a leak
gpio: mpc8xxx: Fix a potential double iounmap call in 'mpc8xxx_probe()'
gpio: mpc8xxx: Fix a resources leak in the error handling path of 'mpc8xxx_probe()'
gpio: viperboard: remove platform_set_drvdata() call in probe
gpio: virtio: Add missing mailings lists in MAINTAINERS entry
gpio: virtio: Fix sparse warnings
gpio: remove the obsolete MX35 3DS BOARD MC9S08DZ60 GPIO functions
gpio: max730x: Use the right include
gpio: Add virtio-gpio driver
gpio: mlxbf2: Use DEFINE_RES_MEM_NAMED() helper macro
gpio: mlxbf2: Use devm_platform_ioremap_resource()
gpio: mlxbf2: Drop wrong use of ACPI_PTR()
gpio: mlxbf2: Convert to device PM ops
gpio: dwapb: Get rid of legacy platform data
mfd: intel_quark_i2c_gpio: Convert GPIO to use software nodes
gpio: dwapb: Read GPIO base from gpio-base property
gpio: dwapb: Unify ACPI enumeration checks in get_irq() and configure_irqs()
gpiolib: Deduplicate forward declaration in the consumer.h header
MAINTAINERS: update gpio-zynq.yaml reference
gpio: tegra186: Add ACPI support
...
|
|
Fix the kernel-doc comments for the improved Energy Model documentation.
Signed-off-by: Lukasz Luba <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Changes for kgdb/kdb this cycle are dominated by a change from Sumit
that removes as small (256K) private heap from kdb. This is change
I've hoped for ever since I discovered how few users of this heap
remained in the kernel, so many thanks to Sumit for hunting these
down.
The other change is an incremental step towards SPDX headers"
* tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kernel: debug: Convert to SPDX identifier
kdb: Rename members of struct kdbtab_t
kdb: Simplify kdb_defcmd macro logic
kdb: Get rid of redundant kdb_register_flags()
kdb: Rename struct defcmd_set to struct kdb_macro
kdb: Get rid of custom debug heap allocator
|
|
This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e.
That commit was completely broken, and I should have caught on to it
earlier. But happily, the kernel test robot noticed the breakage fairly
quickly.
The breakage is because "try_get_page()" is about avoiding the page
reference count overflow case, but is otherwise the exact same as a
plain "get_page()".
In contrast, "try_get_compound_head()" is an entirely different beast,
and uses __page_cache_add_speculative() because it's not just about the
page reference count, but also about possibly racing with the underlying
page going away.
So all the commentary about how
"try_get_page() has fallen a little behind in terms of maintenance,
try_get_compound_head() handles speculative page references more
thoroughly"
was just completely wrong: yes, try_get_compound_head() handles
speculative page references, but the point is that try_get_page() does
not, and must not.
So there's no lack of maintainance - there are fundamentally different
semantics.
A speculative page reference would be entirely wrong in "get_page()",
and it's entirely wrong in "try_get_page()". It's not about
speculation, it's purely about "uhhuh, you can't get this page because
you've tried to increment the reference count too much already".
The reason the kernel test robot noticed this bug was that it hit the
VM_BUG_ON() in __page_cache_add_speculative(), which is all about
verifying that the context of any speculative page access is correct.
But since that isn't what try_get_page() is all about, the VM_BUG_ON()
tests things that are not correct to test for try_get_page().
Reported-by: kernel test robot <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull more ARM cpufreq changes for v5.15-rc1 from Viresh Kumar:
"This adds a new cpufreq driver for Mediatek, which had been going
through reviews since last one year."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
|
|
Joakim Zhang reports that Wake-on-Lan with the stmmac ethernet driver broke
when moving the incorrect handling of mac link state out of mac_config().
This reason this breaks is because the stmmac's WoL is handled by the MAC
rather than the PHY, and phylink doesn't cater for that scenario.
This patch adds the necessary phylink code to handle suspend/resume events
according to whether the MAC still needs a valid link or not. This is the
barest minimum for this support.
Reported-by: Joakim Zhang <[email protected]>
Tested-by: Joakim Zhang <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull libata fixes from Jens Axboe:
"Fixes for queued trim on certain Samsung SSDs, in conjunction with
certain ATI controllers"
* tag 'libata-5.15-2021-09-05' of git://git.kernel.dk/linux-block:
libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD.
libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs
|
|
This patch introduces an attribute for vDPA device to indicate
whether virtual address can be used. If vDPA device driver set
it, vhost-vdpa bus driver will not pin user page and transfer
userspace virtual address instead of physical address during
DMA mapping. And corresponding vma->vm_file and offset will be
also passed as an opaque pointer.
Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Add an opaque pointer for DMA mapping.
Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Add an opaque pointer for vhost IOTLB. And introduce
vhost_iotlb_add_range_ctx() to accept it.
Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
This adds a new callback to support device specific reset
behavior. The vdpa bus driver will call the reset function
instead of setting status to zero during resetting.
Signed-off-by: Xie Yongji <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Fix some code indent issues and following checkpatch warning:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
371: FILE: include/linux/vdpa.h:371:
+static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Export receive_fd() so that some modules can use
it to pass file descriptor between processes without
missing any security stuffs.
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 5.15
- Page ownership tracking between host EL1 and EL2
- Rely on userspace page tables to create large stage-2 mappings
- Fix incompatibility between pKVM and kmemleak
- Fix the PMU reset state, and improve the performance of the virtual PMU
- Move over to the generic KVM entry code
- Address PSCI reset issues w.r.t. save/restore
- Preliminary rework for the upcoming pKVM fixed feature
- A bunch of MM cleanups
- a vGIC fix for timer spurious interrupts
- Various cleanups
|
|
Add a new stat that counts the number of times a remote TLB flush is
requested, regardless of whether it kicks vCPUs out of guest mode. This
allows us to look at how often flushes are initiated.
Unlike remote_tlb_flush, this one applies to ARM's instruction-set-based
TLB flush implementation, so apply it there too.
Original-by: David Matlack <[email protected]>
Signed-off-by: Jing Zhang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add of_perf_domain_get_sharing_cpumask function to group cpu
to specific performance domain.
Signed-off-by: Hector.Yuan <[email protected]>
[ Viresh: create separate routine parse_perf_domain() and always set the
cpumask. ]
Signed-off-by: Viresh Kumar <[email protected]>
|
|
Recent work on converting address list to a tree made it obvious
we need an abstraction around writing netdev->dev_addr. Without
such abstraction updating the main device address is invisible
to the core.
Introduce a number of helpers which for now just wrap memcpy()
but in the future can make necessary changes to the address
tree.
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT
- bootconfig can now start histograms
- bootconfig supports group/all enabling
- histograms now can put values in linear size buckets
- execnames can be passed to synthetic events
- introduce "event probes" that attach to other events and can retrieve
data from pointers of fields, or record fields as different types (a
pointer to a string as a string instead of just a hex number)
- various fixes and clean ups
* tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits)
tracing/doc: Fix table format in histogram code
selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes
selftests/ftrace: Add selftest for testing eprobe events on synthetic events
selftests/ftrace: Add test case to test adding and removing of event probe
selftests/ftrace: Fix requirement check of README file
selftests/ftrace: Add clear_dynamic_events() to test cases
tracing: Add a probe that attaches to trace events
tracing/probes: Reject events which have the same name of existing one
tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs
tracing/probe: Change traceprobe_set_print_fmt() to take a type
tracing/probes: Use struct_size() instead of defining custom macros
tracing/probes: Allow for dot delimiter as well as slash for system names
tracing/probe: Have traceprobe_parse_probe_arg() take a const arg
tracing: Have dynamic events have a ref counter
tracing: Add DYNAMIC flag for dynamic events
tracing: Replace deprecated CPU-hotplug functions.
MAINTAINERS: Add an entry for os noise/latency
tracepoint: Fix kerneldoc comments
bootconfig/tracing/ktest: Update ktest example for boot-time tracing
tools/bootconfig: Use per-group/all enable option in ftrace2bconf script
...
|
|
Pull MAP_DENYWRITE removal from David Hildenbrand:
"Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove
VM_DENYWRITE.
There are some (minor) user-visible changes:
- We no longer deny write access to shared libaries loaded via legacy
uselib(); this behavior matches modern user space e.g. dlopen().
- We no longer deny write access to the elf interpreter after exec
completed, treating it just like shared libraries (which it often
is).
- We always deny write access to the file linked via /proc/pid/exe:
sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the
file cannot be denied, and write access to the file will remain
denied until the link is effectivel gone (exec, termination,
sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file.
Cross-compiled for a bunch of architectures (alpha, microblaze, i386,
s390x, ...) and verified via ltp that especially the relevant tests
(i.e., creat07 and execve04) continue working as expected"
* tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux:
fs: update documentation of get_write_access() and friends
mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
mm: remove VM_DENYWRITE
binfmt: remove in-tree usage of MAP_DENYWRITE
kernel/fork: always deny write access to current MM exe_file
kernel/fork: factor out replacing the current MM exe_file
binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
|
|
Pull NFS client updates from Anna Schumaker:
"New Features:
- Better client responsiveness when server isn't replying
- Use refcount_t in sunrpc rpc_client refcount tracking
- Add srcaddr and dst_port to the sunrpc sysfs info files
- Add basic support for connection sharing between servers with multiple NICs`
Bugfixes and Cleanups:
- Sunrpc tracepoint cleanups
- Disconnect after ib_post_send() errors to avoid deadlocks
- Fix for tearing down rpcrdma_reps
- Fix a potential pNFS layoutget livelock loop
- pNFS layout barrier fixes
- Fix a potential memory corruption in rpc_wake_up_queued_task_set_status()
- Fix reconnection locking
- Fix return value of get_srcport()
- Remove rpcrdma_post_sends()
- Remove pNFS dead code
- Remove copy size restriction for inter-server copies
- Overhaul the NFS callback service
- Clean up sunrpc TCP socket shutdowns
- Always provide aligned buffers to RPC read layers"
* tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (39 commits)
NFS: Always provide aligned buffers to the RPC read layers
NFSv4.1 add network transport when session trunking is detected
SUNRPC enforce creation of no more than max_connect xprts
NFSv4 introduce max_connect mount options
SUNRPC add xps_nunique_destaddr_xprts to xprt_switch_info in sysfs
SUNRPC keep track of number of transports to unique addresses
NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox
SUNRPC: Tweak TCP socket shutdown in the RPC client
SUNRPC: Simplify socket shutdown when not reusing TCP ports
NFSv4.2: remove restriction of copy size for inter-server copy.
NFS: Clean up the synopsis of callback process_op()
NFS: Extract the xdr_init_encode/decode() calls from decode_compound
NFS: Remove unused callback void decoder
NFS: Add a private local dispatcher for NFSv4 callback operations
SUNRPC: Eliminate the RQ_AUTHERR flag
SUNRPC: Set rq_auth_stat in the pg_authenticate() callout
SUNRPC: Add svc_rqst::rq_auth_stat
SUNRPC: Add dst_port to the sysfs xprt info file
SUNRPC: Add srcaddr as a file in sysfs
sunrpc: Fix return value of get_srcport()
...
|
|
Embed local_lock into struct kmem_cpu_slab and use the irq-safe versions of
local_lock instead of plain local_irq_save/restore. On !PREEMPT_RT that's
equivalent, with better lockdep visibility. On PREEMPT_RT that means better
preemption.
However, the cost on PREEMPT_RT is the loss of lockless fast paths which only
work with cpu freelist. Those are designed to detect and recover from being
preempted by other conflicting operations (both fast or slow path), but the
slow path operations assume they cannot be preempted by a fast path operation,
which is guaranteed naturally with disabled irqs. With local locks on
PREEMPT_RT, the fast paths now also need to take the local lock to avoid races.
In the allocation fastpath slab_alloc_node() we can just defer to the slowpath
__slab_alloc() which also works with cpu freelist, but under the local lock.
In the free fastpath do_slab_free() we have to add a new local lock protected
version of freeing to the cpu freelist, as the existing slowpath only works
with the page freelist.
Also update the comment about locking scheme in SLUB to reflect changes done
by this series.
[ Mike Galbraith <[email protected]>: use local_lock() without irq in PREEMPT_RT
scope; debugging of RT crashes resulting in put_cpu_partial() locking changes ]
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Remember how many bytes were truncated and reverted back. Because
not reexpanded iterators don't always work well with reverting, we may
need to know that to reexpand ourselves when needed.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Protect nft_ct template with global mutex, from Pavel Skripkin.
2) Two recent commits switched inet rt and nexthop exception hashes
from jhash to siphash. If those two spots are problematic then
conntrack is affected as well, so switch voer to siphash too.
While at it, add a hard upper limit on chain lengths and reject
insertion if this is hit. Patches from Florian Westphal.
3) Fix use-after-scope in nf_socket_ipv6 reported by KASAN,
from Benjamin Hesmans.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: socket: icmp6: fix use-after-scope
netfilter: refuse insertion if chain has grown too large
netfilter: conntrack: switch to siphash
netfilter: conntrack: sanitize table size default settings
netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
As another step of shortening irq disabled sections in ___slab_alloc(), delay
disabling irqs until we pass the initial checks if there is a cached percpu
slab and it's suitable for our allocation.
Now we have to recheck c->page after actually disabling irqs as an allocation
in irq handler might have replaced it.
Because we call pfmemalloc_match() as one of the checks, we might hit
VM_BUG_ON_PAGE(!PageSlab(page)) in PageSlabPfmemalloc in case we get
interrupted and the page is freed. Thus introduce a pfmemalloc_match_unsafe()
variant that lacks the PageSlab check.
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add -s option (strict mode) to merge_config.sh to make it fail when
any symbol is redefined.
- Show a warning if a different compiler is used for building external
modules.
- Infer --target from ARCH for CC=clang to let you cross-compile the
kernel without CROSS_COMPILE.
- Make the integrated assembler default (LLVM_IAS=1) for CC=clang.
- Add <linux/stdarg.h> to the kernel source instead of borrowing
<stdarg.h> from the compiler.
- Add Nick Desaulniers as a Kbuild reviewer.
- Drop stale cc-option tests.
- Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG
to handle symbols in inline assembly.
- Show a warning if 'FORCE' is missing for if_changed rules.
- Various cleanups
* tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
kbuild: redo fake deps at include/ksym/*.h
kbuild: clean up objtool_args slightly
modpost: get the *.mod file path more simply
checkkconfigsymbols.py: Fix the '--ignore' option
kbuild: merge vmlinux_link() between ARCH=um and other architectures
kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh
kbuild: merge vmlinux_link() between the ordinary link and Clang LTO
kbuild: remove stale *.symversions
kbuild: remove unused quiet_cmd_update_lto_symversions
gen_compile_commands: extract compiler command from a series of commands
x86: remove cc-option-yn test for -mtune=
arc: replace cc-option-yn uses with cc-option
s390: replace cc-option-yn uses with cc-option
ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild
sparc: move the install rule to arch/sparc/Makefile
security: remove unneeded subdir-$(CONFIG_...)
kbuild: sh: remove unused install script
kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
kbuild: Switch to 'f' variants of integrated assembler flag
kbuild: Shuffle blank line to improve comment meaning
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- New DART IOMMU driver for Apple Silicon M1 chips
- Optimizations for iommu_[map/unmap] performance
- Selective TLB flush support for the AMD IOMMU driver to make it more
efficient on emulated IOMMUs
- Rework IOVA setup and default domain type setting to move more code
out of IOMMU drivers and to support runtime switching between certain
types of default domains
- VT-d Updates from Lu Baolu:
- Update the virtual command related registers
- Enable Intel IOMMU scalable mode by default
- Preset A/D bits for user space DMA usage
- Allow devices to have more than 32 outstanding PRs
- Various cleanups
- ARM SMMU Updates from Will Deacon:
SMMUv3:
- Minor optimisation to avoid zeroing struct members on CMD submission
- Increased use of batched commands to reduce submission latency
- Refactoring in preparation for ECMDQ support
SMMUv2:
- Fix races when probing devices with identical StreamIDs
- Optimise walk cache flushing for Qualcomm implementations
- Allow deep sleep states for some Qualcomm SoCs with shared clocks
- Various smaller optimizations, cleanups, and fixes
* tag 'iommu-updates-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (85 commits)
iommu/io-pgtable: Abstract iommu_iotlb_gather access
iommu/arm-smmu: Fix missing unlock on error in arm_smmu_device_group()
iommu/vt-d: Add present bit check in pasid entry setup helpers
iommu/vt-d: Use pasid_pte_is_present() helper function
iommu/vt-d: Drop the kernel doc annotation
iommu/vt-d: Allow devices to have more than 32 outstanding PRs
iommu/vt-d: Preset A/D bits for user space DMA usage
iommu/vt-d: Enable Intel IOMMU scalable mode by default
iommu/vt-d: Refactor Kconfig a bit
iommu/vt-d: Remove unnecessary oom message
iommu/vt-d: Update the virtual command related registers
iommu: Allow enabling non-strict mode dynamically
iommu: Merge strictness and domain type configs
iommu: Only log strictness for DMA domains
iommu: Expose DMA domain strictness via sysfs
iommu: Express DMA strictness via the domain type
iommu/vt-d: Prepare for multiple DMA domain types
iommu/arm-smmu: Prepare for multiple DMA domain types
iommu/amd: Prepare for multiple DMA domain types
iommu: Introduce explicit type for non-strict DMA domains
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb updates from Konrad Rzeszutek Wilk:
"A new feature called restricted DMA pools. It allows SWIOTLB to
utilize per-device (or per-platform) allocated memory pools instead of
using the global one.
The first big user of this is ARM Confidential Computing where the
memory for DMA operations can be set per platform"
* 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: (23 commits)
swiotlb: use depends on for DMA_RESTRICTED_POOL
of: restricted dma: Don't fail device probe on rmem init failure
of: Move of_dma_set_restricted_buffer() into device.c
powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()
s390/pv: fix the forcing of the swiotlb
swiotlb: Free tbl memory in swiotlb_exit()
swiotlb: Emit diagnostic in swiotlb_exit()
swiotlb: Convert io_default_tlb_mem to static allocation
of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS
swiotlb: add overflow checks to swiotlb_bounce
swiotlb: fix implicit debugfs declarations
of: Add plumbing for restricted DMA pool
dt-bindings: of: Add restricted DMA pool
swiotlb: Add restricted DMA pool initialization
swiotlb: Add restricted DMA alloc/free support
swiotlb: Refactor swiotlb_tbl_unmap_single
swiotlb: Move alloc_size to swiotlb_find_slots
swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
swiotlb: Update is_swiotlb_active to add a struct device argument
swiotlb: Update is_swiotlb_buffer to add a struct device argument
...
|
|
Merge misc updates from Andrew Morton:
"173 patches.
Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
oom-kill, migration, ksm, percpu, vmstat, and madvise)"
* emailed patches from Andrew Morton <[email protected]>: (173 commits)
mm/madvise: add MADV_WILLNEED to process_madvise()
mm/vmstat: remove unneeded return value
mm/vmstat: simplify the array size calculation
mm/vmstat: correct some wrong comments
mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
selftests: vm: add COW time test for KSM pages
selftests: vm: add KSM merging time test
mm: KSM: fix data type
selftests: vm: add KSM merging across nodes test
selftests: vm: add KSM zero page merging test
selftests: vm: add KSM unmerge test
selftests: vm: add KSM merge test
mm/migrate: correct kernel-doc notation
mm: wire up syscall process_mrelease
mm: introduce process_mrelease system call
memblock: make memblock_find_in_range method private
mm/mempolicy.c: use in_task() in mempolicy_slab_node()
mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
mm/mempolicy: advertise new MPOL_PREFERRED_MANY
mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
...
|
|
Split off from prev patch in the series that implements the syscall.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jan Engelhardt <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tim Murray <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are a lot of uses of memblock_find_in_range() along with
memblock_reserve() from the times memblock allocation APIs did not exist.
memblock_find_in_range() is the very core of memblock allocations, so any
future changes to its internal behaviour would mandate updates of all the
users outside memblock.
Replace the calls to memblock_find_in_range() with an equivalent calls to
memblock_phys_alloc() and memblock_phys_alloc_range() and make
memblock_find_in_range() private method of memblock.
This simplifies the callers, ensures that (unlikely) errors in
memblock_reserve() are handled and improves maintainability of
memblock_find_in_range().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]> [arm64]
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]> [ACPI]
Acked-by: Russell King (Oracle) <[email protected]>
Acked-by: Nick Kossifidis <[email protected]> [riscv]
Tested-by: Guenter Roeck <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Implement the missing huge page allocation functionality while obeying the
preferred node semantics. This is similar to the implementation for
general page allocation, as it uses a fallback mechanism to try multiple
preferred nodes first, and then all other nodes.
To avoid adding too many "#ifdef CONFIG_NUMA" check, add a helper function
in mempolicy.h to check whether a mempolicy is MPOL_PREFERRED_MANY.
[[email protected]: fix compiling issue when merging with other hugetlb patch]
[Thanks to 0day bot for catching the !CONFIG_NUMA compiling issue]
[[email protected]: suggest to remove the #ifdef CONFIG_NUMA check]
[[email protected]: add helpers to avoid ifdefs]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: initialize page to NULL in alloc_buddy_huge_page_with_mpol()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ben Widawsky <[email protected]>
Signed-off-by: Feng Tang <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Co-developed-by: Feng Tang <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The proactive compaction[1] gets triggered for every 500msec and run
compaction on the node for COMPACTION_HPAGE_ORDER (usually order-9) pages
based on the value set to sysctl.compaction_proactiveness. Triggering the
compaction for every 500msec in search of COMPACTION_HPAGE_ORDER pages is
not needed for all applications, especially on the embedded system
usecases which may have few MB's of RAM. Enabling the proactive
compaction in its state will endup in running almost always on such
systems.
Other side, proactive compaction can still be very much useful for getting
a set of higher order pages in some controllable manner(controlled by
using the sysctl.compaction_proactiveness). So, on systems where enabling
the proactive compaction always may proove not required, can trigger the
same from user space on write to its sysctl interface. As an example, say
app launcher decide to launch the memory heavy application which can be
launched fast if it gets more higher order pages thus launcher can prepare
the system in advance by triggering the proactive compaction from
userspace.
This triggering of proactive compaction is done on a write to
sysctl.compaction_proactiveness by user.
[1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=facdaa917c4d5a376d09d25865f5a863f906234a
[[email protected]: tweak vm.rst, per Mike]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Charan Teja Reddy <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vinayak Menon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The return value of kswapd_run() is unused now. Clean it up.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We can get memcg directly form vmpr instead of vmpr->memcg->css->memcg, so
add a new func helper vmpressure_to_memcg(). And no code will use
vmpressure_to_css(), so delete it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hui Su <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Chris Down <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|