aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-09-08mm: track present early pages per zoneDavid Hildenbrand5-11/+29
Patch series "mm/memory_hotplug: "auto-movable" online policy and memory groups", v3. I. Goal The goal of this series is improving in-kernel auto-online support. It tackles the fundamental problems that: 1) We can create zone imbalances when onlining all memory blindly to ZONE_MOVABLE, in the worst case crashing the system. We have to know upfront how much memory we are going to hotplug such that we can safely enable auto-onlining of all hotplugged memory to ZONE_MOVABLE via "online_movable". This is far from practical and only applicable in limited setups -- like inside VMs under the RHV/oVirt hypervisor which will never hotplug more than 3 times the boot memory (and the limitation is only in place due to the Linux limitation). 2) We see more setups that implement dynamic VM resizing, hot(un)plugging memory to resize VM memory. In these setups, we might hotplug a lot of memory, but it might happen in various small steps in both directions (e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...). virtio-mem is the primary driver of this upstream right now, performing such dynamic resizing NUMA-aware via multiple virtio-mem devices. Onlining all hotplugged memory to ZONE_NORMAL means we basically have no hotunplug guarantees. Onlining all to ZONE_MOVABLE means we can easily run into zone imbalances when growing a VM. We want a mixture, and we want as much memory as reasonable/configured in ZONE_MOVABLE. Details regarding zone imbalances can be found at [1]. 3) Memory devices consist of 1..X memory block devices, however, the kernel doesn't really track the relationship. Consequently, also user space has no idea. We want to make per-device decisions. As one example, for memory hotunplug it doesn't make sense to use a mixture of zones within a single DIMM: we want all MOVABLE if possible, otherwise all !MOVABLE, because any !MOVABLE part will easily block the whole DIMM from getting hotunplugged. As another example, virtio-mem operates on individual units that span 1..X memory blocks. Similar to a DIMM, we want a unit to either be all MOVABLE or !MOVABLE. A "unit" can be thought of like a DIMM, however, all units of a virtio-mem device logically belong together and are managed (added/removed) by a single driver. We want as much memory of a virtio-mem device to be MOVABLE as possible. 4) We want memory onlining to be done right from the kernel while adding memory, not triggered by user space via udev rules; for example, this is reqired for fast memory hotplug for drivers that add individual memory blocks, like virito-mem. We want a way to configure a policy in the kernel and avoid implementing advanced policies in user space. The auto-onlining support we have in the kernel is not sufficient. All we have is a) online everything MOVABLE (online_movable) b) online everything !MOVABLE (online_kernel) c) keep zones contiguous (online). This series allows configuring c) to mean instead "online movable if possible according to the coniguration, driven by a maximum MOVABLE:KERNEL ratio" -- a new onlining policy. II. Approach This series does 3 things: 1) Introduces the "auto-movable" online policy that initially operates on individual memory blocks only. It uses a maximum MOVABLE:KERNEL ratio to make a decision whether a memory block will be onlined to ZONE_MOVABLE or not. However, in the basic form, hotplugged KERNEL memory does not allow for more MOVABLE memory (details in the patches). CMA memory is treated like MOVABLE memory. 2) Introduces static (e.g., DIMM) and dynamic (e.g., virtio-mem) memory groups and uses group information to make decisions in the "auto-movable" online policy across memory blocks of a single memory device (modeled as memory group). More details can be found in patch #3 or in the DIMM example below. 3) Maximizes ZONE_MOVABLE memory within dynamic memory groups, by allowing ZONE_NORMAL memory within a dynamic memory group to allow for more ZONE_MOVABLE memory within the same memory group. The target use case is dynamic VM resizing using virtio-mem. See the virtio-mem example below. I remember that the basic idea of using a ratio to implement a policy in the kernel was once mentioned by Vitaly Kuznetsov, but I might be wrong (I lost the pointer to that discussion). For me, the main use case is using it along with virtio-mem (and DIMMs / ppc64 dlpar where necessary) for dynamic resizing of VMs, increasing the amount of memory we can hotunplug reliably again if we might eventually hotplug a lot of memory to a VM. III. Target Usage The target usage will be: 1) Linux boots with "mhp_default_online_type=offline" 2) User space (e.g., systemd unit) configures memory onlining (according to a config file and system properties), for example: * Setting memory_hotplug.online_policy=auto-movable * Setting memory_hotplug.auto_movable_ratio=301 * Setting memory_hotplug.auto_movable_numa_aware=true 3) User space enabled auto onlining via "echo online > /sys/devices/system/memory/auto_online_blocks" 4) User space triggers manual onlining of all already-offline memory blocks (go over offline memory blocks and set them to "online") IV. Example For DIMMs, hotplugging 4 GiB DIMMs to a 4 GiB VM with a configured ratio of 301% results in the following layout: Memory block 0-15: DMA32 (early) Memory block 32-47: Normal (early) Memory block 48-79: Movable (DIMM 0) Memory block 80-111: Movable (DIMM 1) Memory block 112-143: Movable (DIMM 2) Memory block 144-275: Normal (DIMM 3) Memory block 176-207: Normal (DIMM 4) ... all Normal (-> hotplugged Normal memory does not allow for more Movable memory) For virtio-mem, using a simple, single virtio-mem device with a 4 GiB VM will result in the following layout: Memory block 0-15: DMA32 (early) Memory block 32-47: Normal (early) Memory block 48-143: Movable (virtio-mem, first 12 GiB) Memory block 144: Normal (virtio-mem, next 128 MiB) Memory block 145-147: Movable (virtio-mem, next 384 MiB) Memory block 148: Normal (virtio-mem, next 128 MiB) Memory block 149-151: Movable (virtio-mem, next 384 MiB) ... Normal/Movable mixture as above (-> hotplugged Normal memory allows for more Movable memory within the same device) Which gives us maximum flexibility when dynamically growing/shrinking a VM in smaller steps. V. Doc Update I'll update the memory-hotplug.rst documentation, once the overhaul [1] is usptream. Until then, details can be found in patch #2. VI. Future Work 1) Use memory groups for ppc64 dlpar 2) Being able to specify a portion of (early) kernel memory that will be excluded from the ratio. Like "128 MiB globally/per node" are excluded. This might be helpful when starting VMs with extremely small memory footprint (e.g., 128 MiB) and hotplugging memory later -- not wanting the first hotplugged units getting onlined to ZONE_MOVABLE. One alternative would be a trigger to not consider ZONE_DMA memory in the ratio. We'll have to see if this is really rrequired. 3) Indicate to user space that MOVABLE might be a bad idea -- especially relevant when memory ballooning without support for balloon compaction is active. This patch (of 9): For implementing a new memory onlining policy, which determines when to online memory blocks to ZONE_MOVABLE semi-automatically, we need the number of present early (boot) pages -- present pages excluding hotplugged pages. Let's track these pages per zone. Pass a page instead of the zone to adjust_present_page_count(), similar as adjust_managed_page_count() and derive the zone from the page. It's worth noting that a memory block to be offlined/onlined is either completely "early" or "not early". add_memory() and friends can only add complete memory blocks and we only online/offline complete (individual) memory blocks. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Marek Kedzierski <[email protected]> Cc: Hui Zhu <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Wei Yang <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Len Brown <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08ACPI: memhotplug: memory resources cannot be enabled yetDavid Hildenbrand1-4/+0
We allocate + initialize everything from scratch. In case enabling the device fails, we free all memory resourcs. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Baoquan He <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jia He <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Len Brown <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nathan Lynch <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Pierre Morel <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Rich Felker <[email protected]> Cc: Scott Cheloha <[email protected]> Cc: Sergei Trofimovich <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm/memory_hotplug: remove nid parameter from remove_memory() and friendsDavid Hildenbrand6-31/+30
There is only a single user remaining. We can simply lookup the nid only used for node offlining purposes when walking our memory blocks. We don't expect to remove multi-nid ranges; and if we'd ever do, we most probably don't care about removing multi-nid ranges that actually result in empty nodes. If ever required, we can detect the "multi-nid" scenario and simply try offlining all online nodes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Michael Ellerman <[email protected]> (powerpc) Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Len Brown <[email protected]> Cc: Dan Williams <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Dave Jiang <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Nathan Lynch <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Scott Cheloha <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jia He <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Pierre Morel <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Rich Felker <[email protected]> Cc: Sergei Trofimovich <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm/memory_hotplug: remove nid parameter from arch_remove_memory()David Hildenbrand10-22/+11
The parameter is unused, let's remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Michael Ellerman <[email protected]> [powerpc] Acked-by: Heiko Carstens <[email protected]> [s390] Reviewed-by: Pankaj Gupta <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Baoquan He <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Sergei Trofimovich <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Joe Perches <[email protected]> Cc: Pierre Morel <[email protected]> Cc: Jia He <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jason Wang <[email protected]> Cc: Len Brown <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Nathan Lynch <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Scott Cheloha <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()David Hildenbrand2-4/+4
Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory" These are all cleanups and one fix previously sent as part of [1]: [PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory groups. These patches make sense even without the other series, therefore I pulled them out to make the other series easier to digest. [1] https://lkml.kernel.org/r/[email protected] This patch (of 4): Checkpatch complained on a follow-up patch that we are using "unsigned" here, which defaults to "unsigned int" and checkpatch is correct. As we will search for a fitting zone using the wrong pfn, we might end up onlining memory to one of the special kernel zones, such as ZONE_DMA, which can end badly as the onlined memory does not satisfy properties of these zones. Use "unsigned long" instead, just as we do in other places when handling PFNs. This can bite us once we have physical addresses in the range of multiple TB. Link: https://lkml.kernel.org/r/[email protected] Fixes: e5e689302633 ("mm, memory_hotplug: display allowed zones in the preferred ordering") Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Reviewed-by: Muchun Song <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Wei Yang <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Len Brown <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: [email protected] Cc: Andy Lutomirski <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Baoquan He <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Jiang <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jia He <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Nathan Lynch <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Pierre Morel <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Rich Felker <[email protected]> Cc: Scott Cheloha <[email protected]> Cc: Sergei Trofimovich <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm: memory_hotplug: cleanup after removal of pfn_valid_within()Mike Rapoport1-6/+3
When test_pages_in_a_zone() used pfn_valid_within() is has some logic surrounding pfn_valid_within() checks. Since pfn_valid_within() is gone, this logic can be removed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONEMike Rapoport8-75/+11
Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE". After recent updates to freeing unused parts of the memory map, no architecture can have holes in the memory map within a pageblock. This makes pfn_valid_within() check and CONFIG_HOLES_IN_ZONE configuration option redundant. The first patch removes them both in a mechanical way and the second patch simplifies memory_hotplug::test_pages_in_a_zone() that had pfn_valid_within() surrounded by more logic than simple if. This patch (of 2): After recent changes in freeing of the unused parts of the memory map and rework of pfn_valid() in arm and arm64 there are no architectures that can have holes in the memory map within a pageblock and so nothing can enable CONFIG_HOLES_IN_ZONE which guards non trivial implementation of pfn_valid_within(). With that, pfn_valid_within() is always hardwired to 1 and can be completely removed. Remove calls to pfn_valid_within() and CONFIG_HOLES_IN_ZONE. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08memory-hotplug.rst: complete admin-guide overhaulDavid Hildenbrand1-306/+455
The memory hot(un)plug documentation is outdated and incomplete. Most of the content dates back to 2007, so it's time for a major overhaul. Let's rewrite, reorganize and update most parts of the documentation. In addition to memory hot(un)plug, also add some details regarding ZONE_MOVABLE, with memory hotunplug being one of its main consumers. Drop the file history, that information can more reliably be had from the git log. The style of the document is also properly fixed that e.g., "restview" renders it cleanly now. In the future, we might add some more details about virt users like virtio-mem, the XEN balloon, the Hyper-V balloon and ppc64 dlpar. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Muchun Song <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08memory-hotplug.rst: remove locking details from admin-guideDavid Hildenbrand1-39/+0
Patch series "memory-hotplug.rst: complete admin-guide overhaul", v3. This patch (of 2): We have the same content at Documentation/core-api/memory-hotplug.rst and it doesn't fit into the admin-guide. The documentation was accidentially duplicated when merging. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Mike Rapoport <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Muchun Song <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08hugetlbfs: s390 is always 64bitDavid Hildenbrand2-2/+2
No need to check for 64BIT. While at it, let's just select ARCH_SUPPORTS_HUGETLBFS from arch/s390/Kconfig. Signed-off-by: David Hildenbrand <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Heiko Carstens <[email protected]>
2021-09-08fbmem: don't allow too huge resolutionsTetsuo Handa1-0/+6
syzbot is reporting page fault at vga16fb_fillrect() [1], for vga16fb_check_var() is failing to detect multiplication overflow. if (vxres * vyres > maxmem) { vyres = maxmem / vxres; if (vyres < yres) return -ENOMEM; } Since no module would accept too huge resolutions where multiplication overflow happens, let's reject in the common path. Link: https://syzkaller.appspot.com/bug?extid=04168c8063cfdde1db5e [1] Reported-by: syzbot <[email protected]> Debugged-by: Randy Dunlap <[email protected]> Signed-off-by: Tetsuo Handa <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Cc: [email protected] Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-09-08Makefile: use -Wno-main in the full kernel treeRandy Dunlap1-0/+2
When using gcc (SUSE Linux) 7.5.0 (on openSUSE 15.3), I see a build warning: kernel/trace/trace_osnoise.c: In function 'start_kthread': kernel/trace/trace_osnoise.c:1461:8: warning: 'main' is usually a function [-Wmain] void *main = osnoise_main; ^~~~ Quieten that warning by using "-Wno-main". It's OK to use "main" as a declaration name in the kernel. Build-tested on most ARCHes. [ v2: only do it for gcc, since clang doesn't have that particular warning ] Signed-off-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Suggested-by: Steven Rostedt <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Michal Marek <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08Merge tag 'asoc-fix-v5.15-rc1' of ↵Takashi Iwai9-39/+47
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.15 A collection of fixes that came in during the merge window, nothing too remarkable but a reasonably large number of fixes.
2021-09-08Merge branches 'acpi-pm' and 'acpi-docs'Rafael J. Wysocki2-52/+64
* acpi-pm: ACPI: PM: s2idle: Run both AMD and Microsoft methods if both are supported * acpi-docs: Documentation: ACPI: Align the SSDT overlays file with the code
2021-09-08Merge branch 'pm-opp'Rafael J. Wysocki18-639/+757
* pm-opp: dt-bindings: opp: Convert to DT schema dt-bindings: Clean-up OPP binding node names in examples ARM: dts: omap: Drop references to opp.txt
2021-09-08Merge branch 'pm-cpufreq'Rafael J. Wysocki21-128/+684
* pm-cpufreq: Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification" cpufreq: mediatek-hw: Add support for CPUFREQ HW cpufreq: Add of_perf_domain_get_sharing_cpumask dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW cpufreq: Remove ready() callback cpufreq: sh: Remove sh_cpufreq_cpu_ready() cpufreq: acpi: Remove acpi_cpufreq_cpu_ready() cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support cpufreq: scmi: Use .register_em() to register with energy model cpufreq: vexpress: Use .register_em() to register with energy model cpufreq: scpi: Use .register_em() to register with energy model cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model cpufreq: omap: Use .register_em() to register with energy model cpufreq: mediatek: Use .register_em() to register with energy model cpufreq: imx6q: Use .register_em() to register with energy model cpufreq: dt: Use .register_em() to register with energy model cpufreq: Add callback to register with energy model cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag
2021-09-08s390/ftrace: remove incorrect __va usageHeiko Carstens1-2/+2
The address of ftrace_graph_caller is already virtual. Using __va() to translate the address is wrong. Reviewed-by: Alexander Gordeev <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2021-09-08s390/zcrypt: remove incorrect kernel doc indicatorsHeiko Carstens6-48/+48
Many comments above functions start with a kernel doc indicator, but the comments are not using kernel doc style. Get rid of the warnings by simply removing the indicator. E.g.: drivers/s390/crypto/zcrypt_msgtype6.c:111: warning: This comment starts with '/**', but isn't a kernel-doc comment. Reviewed-by: Harald Freudenberger <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2021-09-08scsi: zfcp: fix kernel doc commentsHeiko Carstens4-6/+6
A couple of function names don't match what the kernel doc comments indicate. Acked-by: Benjamin Block <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2021-09-08s390/sclp: add __nonstring annotationHeiko Carstens1-1/+1
Add __nonstring annotation, since the missing string termination for id member of sclp_trace_entry is intended. This way we get rid of this warning: drivers/s390/char/sclp.c:84:9: warning: ‘strncpy’ output truncated before terminating nul copying 4 bytes from a string of the same length [-Wstringop-truncation] 84 | strncpy(e.id, id, sizeof(e.id)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reported-by: kernel test robot <[email protected]> Reviewed-by: Peter Oberparleiter <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2021-09-08IB/hfi1: make hist staticchongjiapeng1-1/+1
This symbol is not used outside of trace.c, so marks it static. Fix the following sparse warning: drivers/infiniband/hw/hfi1/trace.c:491:23: warning: symbol 'hist' was not declared. Should it be static? Link: https://lore.kernel.org/r/1630921723-21545-1-git-send-email-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot <[email protected]> Signed-off-by: chongjiapeng <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-09-08RDMA/bnxt_re: Prefer kcalloc over open coded arithmeticLen Baker1-2/+2
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. In this case this is not actually dynamic sizes: both sides of the multiplication are constant values. However it is best to refactor this anyway, just to keep the open-coded math idiom out of code. So, use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function. Also, remove the unnecessary initialization of the sqp_tbl variable since it is set a few lines later. [1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Len Baker <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Acked-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-09-08IB/qib: Fix null pointer subtraction compiler warningJason Gunthorpe1-1/+3
>> drivers/infiniband/hw/qib/qib_sysfs.c:411:1: warning: performing pointer subtraction with a null pointer has undefined behavior +[-Wnull-pointer-subtraction] QIB_DIAGC_ATTR(rc_resends); ^~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/qib/qib_sysfs.c:408:51: note: expanded from macro 'QIB_DIAGC_ATTR' .counter = &((struct qib_ibport *)0)->rvp.n_##N - (u64 *)0, \ Use offsetof and accomplish the type check using static_assert. Fixes: 4a7aaf88c89f ("RDMA/qib: Use attributes for the port sysfs") Link: https://lore.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Acked-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-09-08RDMA/mlx5: Fix xlt_chunk_align calculationNiklas Schnelle1-1/+1
The XLT chunk alignment depends on ent_size not sizeof(ent_size) aka sizeof(size_t). The incoming ent_size is either 8 or 16, so the miscalculation when 16 is required is only an over-alignment and functional harmless. Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Niklas Schnelle <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-09-08RDMA/mlx5: Fix number of allocated XLT entriesNiklas Schnelle1-1/+1
In commit 8010d74b9965b ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()") the allocation logic was split out of mlx5_ib_update_xlt() and the logic was changed to enable better OOM handling. Sadly this change introduced a miscalculation of the number of entries that were actually allocated when under memory pressure where it can actually become 0 which on s390 lets dma_map_single() fail. It can also lead to corruption of the free pages list when the wrong number of entries is used in the calculation of sg->length which is used as argument for free_pages(). Fix this by using the allocation size instead of misusing get_order(size). Cc: [email protected] Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Niklas Schnelle <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-09-07Merge tag 'pci-v5.15-changes' of ↵Linus Torvalds100-1555/+3860
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Convert controller drivers to generic_handle_domain_irq() (Marc Zyngier) - Simplify VPD (Vital Product Data) access and search (Heiner Kallweit) - Update bnx2, bnx2x, bnxt, cxgb4, cxlflash, sfc, tg3 drivers to use simplified VPD interfaces (Heiner Kallweit) - Run Max Payload Size quirks before configuring MPS; work around ASMedia ASM1062 SATA MPS issue (Marek Behún) Resource management: - Refactor pci_ioremap_bar() and pci_ioremap_wc_bar() (Krzysztof Wilczyński) - Optimize pci_resource_len() to reduce kernel size (Zhen Lei) PCI device hotplug: - Fix a double unmap in ibmphp (Vishal Aslot) PCIe port driver: - Enable Bandwidth Notification only if port supports it (Stuart Hayes) Sysfs/proc/syscalls: - Add schedule point in proc_bus_pci_read() (Krzysztof Wilczyński) - Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure (Krzysztof Wilczyński) - Return "int" from pciconfig_read() syscall (Krzysztof Wilczyński) Virtualization: - Extend "pci=noats" to also turn on Translation Blocking to protect against some DMA attacks (Alex Williamson) - Add sysfs mechanism to control the type of reset used between device assignments to VMs (Amey Narkhede) - Add support for ACPI _RST reset method (Shanker Donthineni) - Add ACS quirks for Cavium multi-function devices (George Cherian) - Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms (Wasim Khan) - Allow HiSilicon AMBA devices that appear as fake PCI devices to use PASID and SVA (Zhangfei Gao) Endpoint framework: - Add support for SR-IOV Endpoint devices (Kishon Vijay Abraham I) - Zero-initialize endpoint test tool parameters so we don't use random parameters (Shunyong Yang) APM X-Gene PCIe controller driver: - Remove redundant dev_err() call in xgene_msi_probe() (ErKun Yang) Broadcom iProc PCIe controller driver: - Don't fail devm_pci_alloc_host_bridge() on missing 'ranges' because it's optional on BCMA devices (Rob Herring) - Fix BCMA probe resource handling (Rob Herring) Cadence PCIe driver: - Work around J7200 Link training electrical issue by increasing delays in LTSSM (Nadeem Athani) Intel IXP4xx PCI controller driver: - Depend on ARCH_IXP4XX to avoid useless config questions (Geert Uytterhoeven) Intel Keembay PCIe controller driver: - Add Intel Keem Bay PCIe controller (Srikanth Thokala) Marvell Aardvark PCIe controller driver: - Work around config space completion handling issues (Evan Wang) - Increase timeout for config access completions (Pali Rohár) - Emulate CRS Software Visibility bit (Pali Rohár) - Configure resources from DT 'ranges' property to fix I/O space access (Pali Rohár) - Serialize INTx mask/unmask (Pali Rohár) MediaTek PCIe controller driver: - Add MT7629 support in DT (Chuanjia Liu) - Fix an MSI issue (Chuanjia Liu) - Get syscon regmap ("mediatek,generic-pciecfg"), IRQ number ("pci_irq"), PCI domain ("linux,pci-domain") from DT properties if present (Chuanjia Liu) Microsoft Hyper-V host bridge driver: - Add ARM64 support (Boqun Feng) - Support "Create Interrupt v3" message (Sunil Muthuswamy) NVIDIA Tegra PCIe controller driver: - Use seq_puts(), move err_msg from stack to static, fix OF node leak (Christophe JAILLET) NVIDIA Tegra194 PCIe driver: - Disable suspend when in Endpoint mode (Om Prakash Singh) - Fix MSI-X address programming error (Om Prakash Singh) - Disable interrupts during suspend to avoid spurious AER link down (Om Prakash Singh) Renesas R-Car PCIe controller driver: - Work around hardware issue that prevents Link L1->L0 transition (Marek Vasut) - Fix runtime PM refcount leak (Dinghao Liu) Rockchip DesignWare PCIe controller driver: - Add Rockchip RK356X host controller driver (Simon Xue) TI J721E PCIe driver: - Add support for J7200 and AM64 (Kishon Vijay Abraham I) Toshiba Visconti PCIe controller driver: - Add Toshiba Visconti PCIe host controller driver (Nobuhiro Iwamatsu) Xilinx NWL PCIe controller driver: - Enable PCIe reference clock via CCF (Hyun Kwon) Miscellaneous: - Convert sta2x11 from 'pci_' to 'dma_' API (Christophe JAILLET) - Fix pci_dev_str_match_path() alloc while atomic bug (used for kernel parameters that specify devices) (Dan Carpenter) - Remove pointless Precision Time Management warning when PTM is present but not enabled (Jakub Kicinski) - Remove surplus "break" statements (Krzysztof Wilczyński)" * tag 'pci-v5.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (132 commits) PCI: ibmphp: Fix double unmap of io_mem x86/PCI: sta2x11: switch from 'pci_' to 'dma_' API PCI/VPD: Use unaligned access helpers PCI/VPD: Clean up public VPD defines and inline functions cxgb4: Use pci_vpd_find_id_string() to find VPD ID string PCI/VPD: Add pci_vpd_find_id_string() PCI/VPD: Include post-processing in pci_vpd_find_tag() PCI/VPD: Stop exporting pci_vpd_find_info_keyword() PCI/VPD: Stop exporting pci_vpd_find_tag() PCI: Set dma-can-stall for HiSilicon chips PCI: rockchip-dwc: Add Rockchip RK356X host controller driver PCI: dwc: Remove surplus break statement after return PCI: artpec6: Remove local code block from switch statement PCI: artpec6: Remove surplus break statement after return MAINTAINERS: Add entries for Toshiba Visconti PCIe controller PCI: visconti: Add Toshiba Visconti PCIe host controller driver PCI/portdrv: Enable Bandwidth Notification only if port supports it PCI: Allow PASID on fake PCIe devices without TLP prefixes PCI: mediatek: Use PCI domain to handle ports detection PCI: mediatek: Add new method to get irq number ...
2021-09-07kbuild: Only default to -Werror if COMPILE_TESTMarco Elver1-1/+1
The cross-product of the kernel's supported toolchains, architectures, and configuration options is large. So large, that it's generally accepted to be infeasible to enumerate and build+test them all (many compile-testers rely on randomly generated configs). Without the possibility to enumerate all possible combinations of toolchains, architectures, and configuration options, it is inevitable that compiler warnings in this space exist. With -Werror, this means that an innumerable set of kernels are now broken, yet had been perfectly usable before (confused compilers, code with warnings unused, or luck). Distributors will necessarily pick a point in the toolchain X arch X config space, and if unlucky, will have a broken build. Granted, those will likely disable CONFIG_WERROR and move on. The kernel's default configuration is unlikely to be suitable for all users, but it's inappropriate to force many users to set CONFIG_WERROR=n. This also holds for CI systems which are focused on runtime testing, where the odd warning in some subsystem will disrupt testing of the rest of the kernel. Many of those runtime-focused CI systems run tests or fuzz the kernel using runtime debugging tools. Runtime testing of different subsystems can proceed in parallel, and potentially uncover serious bugs; halting runtime testing of the entire kernel because of the odd warning (now error) in a subsystem or driver is simply inappropriate. Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n as well. The appropriate usecase for -Werror is therefore compile-test focused builds (often done by developers or CI systems). Reflect this in the Kconfig option by making the default value of WERROR match COMPILE_TEST. Signed-off-by: Marco Elver <[email protected]> Acked-by: Guenter Roeck <[email protected]> Acked-by: Randy Dunlap <[email protected]> Reviwed-by: Mark Brown <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-07tracing: Fix some alloc_event_probe() error handling bugsDan Carpenter1-2/+3
There are two bugs in this code. First, if the kzalloc() fails it leads to a NULL dereference of "ep" on the next line. Second, if the alloc_event_probe() function returns an error then it leads to an error pointer dereference in the caller. Link: https://lkml.kernel.org/r/20210824115150.GI31143@kili Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-07Merge tag 'net-5.15-rc1' of ↵Linus Torvalds89-389/+1066
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes and stragglers from Jakub Kicinski: "Networking stragglers and fixes, including changes from netfilter, wireless and can. Current release - regressions: - qrtr: revert check in qrtr_endpoint_post(), fixes audio and wifi - ip_gre: validate csum_start only on pull - bnxt_en: fix 64-bit doorbell operation on 32-bit kernels - ionic: fix double use of queue-lock, fix a sleeping in atomic - can: c_can: fix null-ptr-deref on ioctl() - cs89x0: disable compile testing on powerpc Current release - new code bugs: - bridge: mcast: fix vlan port router deadlock, consistently disable BH Previous releases - regressions: - dsa: tag_rtl4_a: fix egress tags, only port 0 was working - mptcp: fix possible divide by zero - netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex - netfilter: socket: icmp6: fix use-after-scope - stmmac: fix MAC not working when system resume back with WoL active Previous releases - always broken: - ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address - seg6: set fc_nlinfo in nh_create_ipv4, nh_create_ipv6 - mptcp: only send extra TCP acks in eligible socket states - dsa: lantiq_gswip: fix maximum frame length - stmmac: fix overall budget calculation for rxtx_napi - bnxt_en: fix firmware version reporting via devlink - renesas: sh_eth: add missing barrier to fix freeing wrong tx descriptor Stragglers: - netfilter: conntrack: switch to siphash - netfilter: refuse insertion if chain has grown too large - ncsi: add get MAC address command to get Intel i210 MAC address" * tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits) ieee802154: Remove redundant initialization of variable ret net: stmmac: fix MAC not working when system resume back with WoL active net: phylink: add suspend/resume support net: renesas: sh_eth: Fix freeing wrong tx descriptor bonding: 3ad: pass parameter bond_params by reference cxgb3: fix oops on module removal can: c_can: fix null-ptr-deref on ioctl() can: rcar_canfd: add __maybe_unused annotation to silence warning net: wwan: iosm: Unify IO accessors used in the driver net: wwan: iosm: Replace io.*64_lo_hi() with regular accessors net: qcom/emac: Replace strlcpy with strscpy ip6_gre: Revert "ip6_gre: add validation for csum_start" net: hns3: make hclgevf_cmd_caps_bit_map0 and hclge_cmd_caps_bit_map0 static selftests/bpf: Test XDP bonding nest and unwind bonding: Fix negative jump label count on nested bonding MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry stmmac: dwmac-loongson:Fix missing return value iwlwifi: fix printk format warnings in uefi.c net: create netdev->dev_addr assignment helpers bnxt_en: Fix possible unintended driver initiated error recovery ...
2021-09-07Merge tag 'linux-watchdog-5.15-rc1' of ↵Linus Torvalds17-318/+213
git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - add Mediatek MT7986 & MT8195 wdt support - add Maxim MAX63xx - drop bd70528 support - rewrite ixp4xx to watchdog framework - constify static struct watchdog_ops for sl28cpld_wdt, mpc8xxx_wdt and tqmx86 - introduce watchdog_dev_suspend/resume - several fixes and improvements * tag 'linux-watchdog-5.15-rc1' of git://www.linux-watchdog.org/linux-watchdog: dt-bindings: watchdog: Add compatible for Mediatek MT7986 watchdog: ixp4xx: Rewrite driver to use core watchdog: Start watchdog in watchdog_set_last_hw_keepalive only if appropriate watchdog: max63xx_wdt: Add device tree probing dt-bindings: watchdog: Add Maxim MAX63xx bindings watchdog: mediatek: mt8195: add wdt support dt-bindings: reset: mt8195: add toprgu reset-controller header file watchdog: tqmx86: Constify static struct watchdog_ops watchdog: mpc8xxx_wdt: Constify static struct watchdog_ops watchdog: sl28cpld_wdt: Constify static struct watchdog_ops watchdog: iTCO_wdt: Fix detection of SMI-off case watchdog: bcm2835_wdt: consider system-power-controller property watchdog: imx2_wdg: notify wdog core to stop ping worker on suspend watchdog: introduce watchdog_dev_suspend/resume watchdog: Fix NULL pointer dereference when releasing cdev watchdog: only run driver set_pretimeout op if device supports it watchdog: bd70528 drop bd70528 support
2021-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds120-1605/+2891
Pull KVM updates from Paolo Bonzini: "ARM: - Page ownership tracking between host EL1 and EL2 - Rely on userspace page tables to create large stage-2 mappings - Fix incompatibility between pKVM and kmemleak - Fix the PMU reset state, and improve the performance of the virtual PMU - Move over to the generic KVM entry code - Address PSCI reset issues w.r.t. save/restore - Preliminary rework for the upcoming pKVM fixed feature - A bunch of MM cleanups - a vGIC fix for timer spurious interrupts - Various cleanups s390: - enable interpretation of specification exceptions - fix a vcpu_idx vs vcpu_id mixup x86: - fast (lockless) page fault support for the new MMU - new MMU now the default - increased maximum allowed VCPU count - allow inhibit IRQs on KVM_RUN while debugging guests - let Hyper-V-enabled guests run with virtualized LAPIC as long as they do not enable the Hyper-V "AutoEOI" feature - fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC) - tuning for the case when two-dimensional paging (EPT/NPT) is disabled - bugfixes and cleanups, especially with respect to vCPU reset and choosing a paging mode based on CR0/CR4/EFER - support for 5-level page table on AMD processors Generic: - MMU notifier invalidation callbacks do not take mmu_lock unless necessary - improved caching of LRU kvm_memory_slot - support for histogram statistics - add statistics for halt polling and remote TLB flush requests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits) KVM: Drop unused kvm_dirty_gfn_invalid() KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted KVM: MMU: mark role_regs and role accessors as maybe unused KVM: MIPS: Remove a "set but not used" variable x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait KVM: stats: Add VM stat for remote tlb flush requests KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count() KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 kvm: x86: Increase MAX_VCPUS to 1024 kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host KVM: s390: index kvm->arch.idle_mask by vcpu_idx KVM: s390: Enable specification exception interpretation KVM: arm64: Trim guest debug exception handling KVM: SVM: Add 5-level page table support for SVM ...
2021-09-07Merge branch 'dmi-for-linus' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull dmi fix from Jean Delvare. Unbreak some existing udev/hwdb modalias matches due to misplaced product_sku field. * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: firmware: dmi: Move product_sku info to the end of the modalias
2021-09-07Merge tag 'ntb-5.15' of git://github.com/jonmason/ntbLinus Torvalds7-38/+11
Pull NTB updates from Jon Mason: "Bug fixes and clean-ups for Linux v5.15" * tag 'ntb-5.15' of git://github.com/jonmason/ntb: NTB: switch from 'pci_' to 'dma_' API ntb: ntb_pingpong: remove redundant initialization of variables msg_data and spad_data NTB: perf: Fix an error code in perf_setup_inbuf() NTB: Fix an error code in ntb_msit_probe() ntb: intel: remove invalid email address in header comment
2021-09-07Merge tag 'rproc-v5.15' of ↵Linus Torvalds8-90/+96
git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc Pull remoteproc updates from Bjorn Andersson: - move the crash recovery worker to the freezable work queue to avoid interaction with other drivers during suspend & resume - fix a couple of typos in comments - add support for handling the audio DSP on SDM660 - fix a race between the Qualcomm wireless subsystem driver and the associated driver for the RF chip * tag 'rproc-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: remoteproc: q6v5_pas: Add sdm660 ADSP PIL compatible dt-bindings: remoteproc: qcom: adsp: Add SDM660 ADSP remoteproc: use freezable workqueue for crash notifications remoteproc: fix kernel doc for struct rproc_ops remoteproc: fix an typo in fw_elf_get_class code comments remoteproc: qcom: wcnss: Fix race with iris probe
2021-09-07Merge tag 'backlight-next-5.15' of ↵Linus Torvalds2-46/+83
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight Pull backlight updates from Lee Jones: "Fix-ups: - Improve bootloader/kernel device handover Bug Fixes: - Stabilise backlight in ktd253 driver" * tag 'backlight-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: backlight: pwm_bl: Improve bootloader/kernel device handover backlight: ktd253: Stabilize backlight
2021-09-07Merge tag 'mfd-next-5.15' of ↵Linus Torvalds37-120/+1906
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "Core Frameworks: - Add support for registering devices via MFD cells to Simple MFD (I2C) New Drivers: - Add support for Renesas Synchronization Management Unit (SMU) New Device Support: - Add support for N5010 to Intel M10 BMC - Add support for Cannon Lake to Intel LPSS ACPI - Add support for Samsung SSG{1,2} to ST-Ericsson's U8500 family - Add support for TQMx110EB and TQMxE40x to TQ-Systems PLD TQMx86 New Functionality: - Add support for GPIO to Intel LPC ICH - Add support for Reset to Texas Instruments TPS65086 Fix-ups: - Trivial, sorting, whitespace, renaming, etc; mt6360-core, db8500-prcmu-regs, tqmx86 - Device Tree fiddling; syscon, axp20x, qcom,pm8008, ti,tps65086, brcm,cru - Use proper APIs for IRQ map resolution; ab8500-core, stmpe, tc3589x, wm8994-irq - Pass 'supplied-from' property through axp288_fuel_gauge via swnode - Remove unused file entry; MAINTAINERS - Make interrupt line optional; tps65086 - Rename db8500-cpuidle driver symbol; db8500-prcmu - Remove support for unused hardware; tqmx86 - Provide a standard LPC clock frequency for unknown boards; tqmx86 - Remove unused code; ti_am335x_tscadc - Use of_iomap() instead of ioremap(); syscon Bug Fixes: - Clear GPIO IRQ resource flags when no IRQ is set; tqmx86 - Fix incorrect/misleading frequencies; db8500-prcmu - Mitigate namespace clash with other GPIOBASE users" * tag 'mfd-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (31 commits) mfd: lpc_sch: Rename GPIOBASE to prevent build error mfd: syscon: Use of_iomap() instead of ioremap() dt-bindings: mfd: Add Broadcom CRU mfd: ti_am335x_tscadc: Delete superfluous error message mfd: tqmx86: Assume 24MHz LPC clock for unknown boards mfd: tqmx86: Add support for TQ-Systems DMI IDs mfd: tqmx86: Add support for TQMx110EB and TQMxE40x mfd: tqmx86: Fix typo in "platform" mfd: tqmx86: Remove incorrect TQMx90UC board ID mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set mfd: simple-mfd-i2c: Add support for registering devices via MFD cells mfd/cpuidle: ux500: Rename driver symbol mfd: tps65086: Add cell entry for reset driver mfd: tps65086: Make interrupt line optional dt-bindings: mfd: Convert tps65086.txt to YAML MAINTAINERS: Adjust ARM/NOMADIK/Ux500 ARCHITECTURES to file renaming mfd: db8500-prcmu: Handle missing FW variant mfd: db8500-prcmu: Rename register header mfd: axp20x: Add supplied-from property to axp288_fuel_gauge cell mfd: Don't use irq_create_mapping() to resolve a mapping ...
2021-09-07Merge tag 'gpio-updates-for-v5.15' of ↵Linus Torvalds29-447/+752
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio updates from Bartosz Golaszewski: "We mostly have various improvements and refactoring all over the place but also some interesting new features - like the virtio GPIO driver that allows guest VMs to use host's GPIOs. We also have a new/old GPIO driver for rockchip - this one has been split out of the pinctrl driver. Summary: - new driver: gpio-virtio allowing a guest VM running linux to access GPIO lines provided by the host - split the GPIO driver out of the rockchip pin control driver - add support for a new model to gpio-aspeed-sgpio, refactor the driver and use generic device property interfaces, improve property sanitization - add ACPI support to gpio-tegra186 - improve the code setting the line names to support multiple GPIO banks per device - constify a bunch of OF functions in the core GPIO code and make the declaration for one of the core OF functions we use consistent within its header - use software nodes in intel_quark_i2c_gpio - add support for the gpio-line-names property in gpio-mt7621 - use the standard GPIO function for setting the GPIO names in gpio-brcmstb - fix a bunch of leaks and other bugs in gpio-mpc8xxx - use generic pm callbacks in gpio-ml-ioh - improve resource management and PM handling in gpio-mlxbf2 - modernize and improve the gpio-dwapb driver - coding style improvements in gpio-rcar - documentation fixes and improvements - update the MAINTAINERS entry for gpio-zynq - minor tweaks in several drivers" * tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (35 commits) gpio: mpc8xxx: Use 'devm_gpiochip_add_data()' to simplify the code and avoid a leak gpio: mpc8xxx: Fix a potential double iounmap call in 'mpc8xxx_probe()' gpio: mpc8xxx: Fix a resources leak in the error handling path of 'mpc8xxx_probe()' gpio: viperboard: remove platform_set_drvdata() call in probe gpio: virtio: Add missing mailings lists in MAINTAINERS entry gpio: virtio: Fix sparse warnings gpio: remove the obsolete MX35 3DS BOARD MC9S08DZ60 GPIO functions gpio: max730x: Use the right include gpio: Add virtio-gpio driver gpio: mlxbf2: Use DEFINE_RES_MEM_NAMED() helper macro gpio: mlxbf2: Use devm_platform_ioremap_resource() gpio: mlxbf2: Drop wrong use of ACPI_PTR() gpio: mlxbf2: Convert to device PM ops gpio: dwapb: Get rid of legacy platform data mfd: intel_quark_i2c_gpio: Convert GPIO to use software nodes gpio: dwapb: Read GPIO base from gpio-base property gpio: dwapb: Unify ACPI enumeration checks in get_irq() and configure_irqs() gpiolib: Deduplicate forward declaration in the consumer.h header MAINTAINERS: update gpio-zynq.yaml reference gpio: tegra186: Add ACPI support ...
2021-09-07Merge tag 'fuse-update-5.15' of ↵Linus Torvalds6-80/+214
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Allow mounting an active fuse device. Previously the fuse device would always be mounted during initialization, and sharing a fuse superblock was only possible through mount or namespace cloning - Fix data flushing in syncfs (virtiofs only) - Fix data flushing in copy_file_range() - Fix a possible deadlock in atomic O_TRUNC - Misc fixes and cleanups * tag 'fuse-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: remove unused arg in fuse_write_file_get() fuse: wait for writepages in syncfs fuse: flush extending writes fuse: truncate pagecache on atomic_o_trunc fuse: allow sharing existing sb fuse: move fget() to fuse_get_tree() fuse: move option checking into fuse_fill_super() fuse: name fs_context consistently fuse: fix use after free in fuse_read_interrupt()
2021-09-07Merge tag 'kgdb-5.15-rc1' of ↵Linus Torvalds10-728/+387
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "Changes for kgdb/kdb this cycle are dominated by a change from Sumit that removes as small (256K) private heap from kdb. This is change I've hoped for ever since I discovered how few users of this heap remained in the kernel, so many thanks to Sumit for hunting these down. The other change is an incremental step towards SPDX headers" * tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kernel: debug: Convert to SPDX identifier kdb: Rename members of struct kdbtab_t kdb: Simplify kdb_defcmd macro logic kdb: Get rid of redundant kdb_register_flags() kdb: Rename struct defcmd_set to struct kdb_macro kdb: Get rid of custom debug heap allocator
2021-09-07cxl/registers: Fix Documentation warningDan Williams2-2/+15
Commit 0f06157e0135 ("cxl/core: Move register mapping infrastructure") neglected to add a DOC header for the new drivers/core/regs.c file. Reported-by: Ben Widawsky <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/163072206675.2250120.3527179192933919995.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]>
2021-09-07cxl/pmem: Fix Documentation warningDan Williams2-3/+29
Commit 06737cd0d216 ("cxl/core: Move pmem functionality") neglected to add a DOC header for the new drivers/cxl/core/pmem.c file. Reported-by: Ben Widawsky <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/163072206163.2250120.11486436976516079516.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]>
2021-09-07cxl/uapi: Fix defined but not used warningsBen Widawsky1-1/+1
Fix unused-const-variable warnings emitted by gcc when cxlmem.h is used by pretty much all files except pci.c Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/163072205652.2250120.16833548560832424468.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]>
2021-09-07cxl/pci: Fix debug message in cxl_probe_regs()Li Qiang (Johnny Li)1-2/+2
Indicator string for mbox and memdev register set to status incorrectly in error message. Cc: <[email protected]> Fixes: 30af97296f48 ("cxl/pci: Map registers based on capabilities") Signed-off-by: Li Qiang (Johnny Li) <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/163072205089.2250120.8103605864156687395.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]>
2021-09-07cxl/pci: Fix lockdown levelDan Williams1-1/+1
A proposed rework of security_locked_down() users identified that the cxl_pci driver was passing the wrong lockdown_reason. Update cxl_mem_raw_command_allowed() to fail raw command access when raw pci access is also disabled. Fixes: 13237183c735 ("cxl/mem: Add a "RAW" send command") Cc: Ben Widawsky <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: <[email protected]> Cc: Ondrej Mosnacek <[email protected]> Cc: Paul Moore <[email protected]> Link: https://lore.kernel.org/r/163072204525.2250120.16615792476976546735.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]>
2021-09-07cxl/acpi: Do not add DSDT disabled ACPI0016 host bridge portsAlison Schofield1-4/+8
During CXL ACPI probe, host bridge ports are discovered by scanning the ACPI0017 root port for ACPI0016 host bridge devices. The scan matches on the hardware id of "ACPI0016". An issue occurs when an ACPI0016 device is defined in the DSDT yet disabled on the platform. Attempts by the cxl_acpi driver to add host bridge ports using a disabled device fails, and the entire cxl_acpi probe fails. The DSDT table includes an _STA method that sets the status and the ACPI subsystem has checks available to examine it. One such check is in the acpi_pci_find_root() path. Move the call to acpi_pci_find_root() to the matching function to prevent this issue when adding either upstream or downstream ports. Suggested-by: Dan Williams <[email protected]> Signed-off-by: Alison Schofield <[email protected]> Fixes: 7d4b5ca2e2cb ("cxl/acpi: Add downstream port data to cxl_port instances") Cc: <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/163072203957.2250120.2178685721061002124.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]>
2021-09-07Revert "memcg: enable accounting for pollfd and select bits arrays"Linus Torvalds1-2/+2
This reverts commit b655843444152c0a14b749308e4cb35d91cbcf0b. Just like with the memcg lock accounting, the kernel test robot reports a sizeable performance regression for this commit, and while it clearly does the rigth thing in theory, we'll need to look at just how to avoid or minimize the performance overhead of the memcg accounting. People already have suggestions on how to do that, but it's "future work". So revert it for now. [ Note: the first link below is for this same commit but a different commit ID, because it's the kernel test robot ended up noticing it in Andrew Morton's patch queue ] Link: https://lore.kernel.org/lkml/20210905132732.GC15026@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/ Acked-by: Jens Axboe <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-07Revert "memcg: enable accounting for file lock caches"Linus Torvalds1-4/+2
This reverts commit 0f12156dff2862ac54235fc72703f18770769042. The kernel test robot reports a sizeable performance regression for this commit, and while it clearly does the rigth thing in theory, we'll need to look at just how to avoid or minimize the performance overhead of the memcg accounting. People already have suggestions on how to do that, but it's "future work". So revert it for now. Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/ Acked-by: Jens Axboe <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-07Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"Linus Torvalds4-20/+15
This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e. That commit was completely broken, and I should have caught on to it earlier. But happily, the kernel test robot noticed the breakage fairly quickly. The breakage is because "try_get_page()" is about avoiding the page reference count overflow case, but is otherwise the exact same as a plain "get_page()". In contrast, "try_get_compound_head()" is an entirely different beast, and uses __page_cache_add_speculative() because it's not just about the page reference count, but also about possibly racing with the underlying page going away. So all the commentary about how "try_get_page() has fallen a little behind in terms of maintenance, try_get_compound_head() handles speculative page references more thoroughly" was just completely wrong: yes, try_get_compound_head() handles speculative page references, but the point is that try_get_page() does not, and must not. So there's no lack of maintainance - there are fundamentally different semantics. A speculative page reference would be entirely wrong in "get_page()", and it's entirely wrong in "try_get_page()". It's not about speculation, it's purely about "uhhuh, you can't get this page because you've tried to increment the reference count too much already". The reason the kernel test robot noticed this bug was that it hit the VM_BUG_ON() in __page_cache_add_speculative(), which is all about verifying that the context of any speculative page access is correct. But since that isn't what try_get_page() is all about, the VM_BUG_ON() tests things that are not correct to test for try_get_page(). Reported-by: kernel test robot <[email protected]> Cc: John Hubbard <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-07Merge branch 'cpufreq/arm/linux-next' of ↵Rafael J. Wysocki5-1/+448
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull more ARM cpufreq changes for v5.15-rc1 from Viresh Kumar: "This adds a new cpufreq driver for Mediatek, which had been going through reviews since last one year." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: mediatek-hw: Add support for CPUFREQ HW cpufreq: Add of_perf_domain_get_sharing_cpumask dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
2021-09-07Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"Rafael J. Wysocki1-39/+0
Revert commit d0e936adbd22 ("cpufreq: intel_pstate: Process HWP Guaranteed change notification"), because it causes a NULL pointer dereference to occur on Lenovo X1 gen9 laptops due to an HWP guaranteed performance change interrupt arriving prematurely. This feature will be revisited in the next cycle. Reported-by: Jens Axboe <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>