|
We soon want to pass flags, e.g., to mark added System RAM resources as
mergeable. Prepare for that.
This patch is based on a similar patch by Oscar Salvador:
https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Juergen Gross <[email protected]> # Xen related part
Reviewed-by: Pankaj Gupta <[email protected]>
Acked-by: Wei Liu <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: "Oliver O'Halloran" <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Libor Pechacek <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Leonardo Bras <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Julien Grall <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Roger Pau Monné <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Wei Yang <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The conversion to request_mem_region() is broken because it assumes that
the range is marked busy prior to release. However, due to the way that
the kmem driver manipulates the IORESOURCE_BUSY flag (clears it to let
{add,remove}_memory() handle busy) it requires a manual release_resource()
to perform cleanup.
Given that the actual 'struct resource *' needs to be recalled, not just
the range, add that tracking to the kmem driver-data.
Fixes: 0513bd5bb114 ("device-dax/kmem: replace release_resource() with release_mem_region()")
Reported-by: David Hildenbrand <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Link: https://lkml.kernel.org/r/160272252925.3136502.17220638073995895400.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Break the requirement that device-dax instances are physically contiguous.
With this constraint removed, fragmented available capacity can be fully
allocated.
This capability is useful to mitigate the "noisy neighbor" problem with
memory-side-cache management for virtual machines, or any other scenario
where a platform address boundary also designates a performance boundary.
For example, a direct mapped memory side cache might rotate cache colors at
1GB boundaries. With discontiguous allocations, a device-dax instance
could be configured to contain only one cache color.
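As a rough illustration of the cache-color example, the sketch below lists
the 1GB chunks of an address range that share one color. The number of
colors (NCOLORS=4), the round-robin color mapping, and the helper name
ranges_for_color are assumptions made purely for illustration; a real
platform may map colors differently.

```sh
# Hypothetical model: colors rotate round-robin at 1 GiB boundaries.
GIB=$((1 << 30))
NCOLORS=4   # assumed number of cache colors, not taken from any platform

# ranges_for_color START END COLOR
# START and END (exclusive) are assumed to be 1 GiB aligned.
# Prints each 1 GiB chunk of [START, END) whose color equals COLOR.
ranges_for_color() {
    addr=$(( $1 ))
    end=$(( $2 ))
    while [ "$addr" -lt "$end" ]; do
        if [ $(( addr / GIB % NCOLORS )) -eq "$3" ]; then
            printf '%x-%x\n' "$addr" $(( addr + GIB - 1 ))
        fi
        addr=$(( addr + GIB ))
    done
}
```

For instance, `ranges_for_color 0x100000000 0x300000000 0` prints two
disjoint 1GB ranges, which is the kind of layout a single contiguous
device-dax instance could not cover.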
It also satisfies Joao's use case (see link) for partitioning memory for
exclusive guest access. It allows for a future potential mode where the
host kernel need not allocate 'struct page' capacity up-front.
Reported-by: Joao Martins <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Hulk Robot <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Yan <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Jia He <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lkml.kernel.org/r/159643104304.4062302.16561669534797528660.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106116875.30709.11456649969327399771.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In preparation for introducing seed devices the dax-bus core needs to be
able to intercept ->probe() and ->remove() operations. Towards that end
arrange for the bus and drivers to switch from raw 'struct device' driver
operations to 'struct dev_dax' typed operations.
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jason Yan <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/160106113357.30709.4541750544799737855.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Towards removing the mode specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, change
the kmem driver to use the idiomatic release_mem_region() to pair with the
initial request_mem_region(). This also eliminates the need to open code
the release of the resource allocated by request_mem_region().
As there are no more dax_kmem_res users, delete this struct member.
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Hulk Robot <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Yan <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/160106112239.30709.15909567572288425294.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Towards removing the mode specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, move
resource name tracking to driver data. The memory for the resource name
needs to have its own lifetime separate from the device bind lifetime for
cases where the driver is unbound, but the kmem range could not be
unplugged from the page allocator.
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Hulk Robot <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Yan <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/160106111639.30709.17624822766862009183.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Towards removing the mode specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, teach the
driver to calculate the hotplug range from the device range. The hotplug
range is the trivially calculated memory-block-size aligned version of the
device range.
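The alignment can be sketched in shell arithmetic as follows. This is not
the driver code; the 128MB block size and the helper names are assumptions
(on a running system the block size is typically reported by
/sys/devices/system/memory/block_size_bytes).

```sh
# Assumed memory block size (128 MiB); real systems may differ.
BLOCK_SIZE=$((0x8000000))

# align_up ADDR: round ADDR up to the next BLOCK_SIZE boundary.
align_up() {
    echo $(( ( $1 + BLOCK_SIZE - 1 ) / BLOCK_SIZE * BLOCK_SIZE ))
}

# align_down ADDR: round ADDR down to a BLOCK_SIZE boundary.
align_down() {
    echo $(( $1 / BLOCK_SIZE * BLOCK_SIZE ))
}

# hotplug_range START END: START and END are the inclusive bounds of the
# device range. Prints the inclusive, block-aligned hotplug range in hex,
# or returns non-zero if no full block fits inside the device range.
hotplug_range() {
    hp_start=$(align_up "$1")
    hp_end_excl=$(align_down $(( $2 + 1 )))
    hp_end=$(( hp_end_excl - 1 ))
    [ "$hp_start" -lt "$hp_end" ] || return 1
    printf '%x-%x\n' "$hp_start" "$hp_end"
}
```

With the dax0.0 range that appears in the /proc/iomem listings of this log,
`hotplug_range 0x148200000 0x33fffffff` prints 150000000-33fffffff, which
matches the System RAM range shown there after loading kmem.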
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Hulk Robot <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Yan <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/160106111109.30709.3173462396758431559.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The passed-in dev_pagemap is only required in the pmem case, as the
libnvdimm core may have reserved a vmem_altmap for devm_memremap_pages() to
place the memmap in pmem directly. In the hmem case there is no agent
reserving an altmap, so it can all be handled by a core internal default.
Pass the resource range via a new @range property of 'struct
dev_dax_data'.
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Hulk Robot <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Yan <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, when adding memory, we create entries in /sys/firmware/memmap/
as "System RAM". This leads kexec-tools to add that memory to the
fixed-up initial memmap for a kexec kernel (loaded via kexec_load()). The
memory will be considered initial System RAM by the kexec'd kernel and can
no longer be reconfigured. This is not what happens during a real reboot.
Let's add our memory via add_memory_driver_managed() now, so we won't
create entries in /sys/firmware/memmap/ and indicate the memory as "System
RAM (kmem)" in /proc/iomem. This allows everybody (especially
kexec-tools) to identify that this memory is special and has to be treated
differently than ordinary (hotplugged) System RAM.
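A tool that wants to pick out such memory could, for example, filter
/proc/iomem by the new resource name. The helper below is only a sketch;
the name kmem_ranges is made up here, and it assumes the "start-end : name"
line layout shown in the listings of this log.

```sh
# kmem_ranges [FILE]: print the address ranges of resources named
# "System RAM (kmem)". FILE defaults to /proc/iomem; reading real
# addresses from it usually requires root.
kmem_ranges() {
    sed -n 's/^ *\([0-9a-f]*-[0-9a-f]*\) : System RAM (kmem)$/\1/p' \
        "${1:-/proc/iomem}"
}
```

Plain "System RAM" entries do not match the pattern, so ordinary and
driver-managed memory stay distinguishable.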
Before configuring the namespace:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-33fffffff : namespace0.0
3280000000-32ffffffff : PCI Bus 0000:00
After configuring the namespace:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
148200000-33fffffff : dax0.0
3280000000-32ffffffff : PCI Bus 0000:00
After loading kmem before this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : dax0.0
150000000-33fffffff : System RAM
3280000000-32ffffffff : PCI Bus 0000:00
After loading kmem after this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : dax0.0
150000000-33fffffff : System RAM (kmem)
3280000000-32ffffffff : PCI Bus 0000:00
After a proper reboot:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
148200000-33fffffff : dax0.0
3280000000-32ffffffff : PCI Bus 0000:00
Within the kexec kernel before this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : System RAM
3280000000-32ffffffff : PCI Bus 0000:00
Within the kexec kernel after this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
148200000-33fffffff : dax0.0
3280000000-32ffffffff : PCI Bus 0000:00
/sys/firmware/memmap/ before this change:
0000000000000000-000000000009fc00 (System RAM)
000000000009fc00-00000000000a0000 (Reserved)
00000000000f0000-0000000000100000 (Reserved)
0000000000100000-00000000bffdf000 (System RAM)
00000000bffdf000-00000000c0000000 (Reserved)
00000000feffc000-00000000ff000000 (Reserved)
00000000fffc0000-0000000100000000 (Reserved)
0000000100000000-0000000140000000 (System RAM)
0000000150000000-0000000340000000 (System RAM)
/sys/firmware/memmap/ after a proper reboot:
0000000000000000-000000000009fc00 (System RAM)
000000000009fc00-00000000000a0000 (Reserved)
00000000000f0000-0000000000100000 (Reserved)
0000000000100000-00000000bffdf000 (System RAM)
00000000bffdf000-00000000c0000000 (Reserved)
00000000feffc000-00000000ff000000 (Reserved)
00000000fffc0000-0000000100000000 (Reserved)
0000000100000000-0000000140000000 (System RAM)
/sys/firmware/memmap/ after this change:
0000000000000000-000000000009fc00 (System RAM)
000000000009fc00-00000000000a0000 (Reserved)
00000000000f0000-0000000000100000 (Reserved)
0000000000100000-00000000bffdf000 (System RAM)
00000000bffdf000-00000000c0000000 (Reserved)
00000000feffc000-00000000ff000000 (Reserved)
00000000fffc0000-0000000100000000 (Reserved)
0000000100000000-0000000140000000 (System RAM)
kexec-tools already seems to basically ignore any System RAM that's not at
the top level, both when searching for areas to place kexec images and
when determining crash areas to dump via kdump. Changing the resource name
won't have an impact.
Properly handle unloading of the driver after memory hotremove failed, by
duplicating the string if necessary.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Pankaj Gupta <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Dan Williams <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Assume we have kmem configured and loaded:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : dax0.0
150000000-33fffffff : System RAM
Assume we try to unload kmem. This force-unloading will work, even if
memory cannot get removed from the system.
[root@localhost ~]# rmmod kmem
[ 86.380228] removing memory fails, because memory [0x0000000150000000-0x0000000157ffffff] is onlined
...
[ 86.431225] kmem dax0.0: DAX region [mem 0x150000000-0x33fffffff] cannot be hotremoved until the next reboot
Now, we can reconfigure the namespace:
[root@localhost ~]# ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax
[ 131.409351] nd_pmem namespace0.0: could not reserve region [mem 0x140000000-0x33fffffff]
[ 131.410147] nd_pmem: probe of namespace0.0 failed with error -16
...
This fails as expected due to the busy memory resource, and the memory
cannot be used. However, the dax0.0 device is removed, and along with it
its name.
The name of the memory resource now points at freed memory (name of the
device):
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : �_�^7_��/_��wR��WQ���^��� ...
150000000-33fffffff : System RAM
We have to make sure to duplicate the string. While at it, remove the
superfluous setting of the name and fixup a stale comment.
Fixes: 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used like normal RAM")
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: <[email protected]> [5.3]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is now allowed to use persistent memory like regular RAM, but
currently there is no way to remove this memory until the machine is
rebooted.
This work expands the functionality to also allow hotremoving
previously hotplugged persistent memory, and recovering the device for
use for other purposes.
To hotremove persistent memory, the management software must first
offline all memory blocks of the dax region, and then unbind it from the
device-dax/kmem driver. So, operations should look like this:
echo offline > /sys/devices/system/memory/memoryN/state
...
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
Note: if unbind is done without offlining memory beforehand, it won't be
possible to hotremove dax0.0, and the dax memory is going to be part of
System RAM until reboot.
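The ordering requirement can be wrapped in a small helper, sketched below.
The function name, the SYSFS_ROOT override (handy for dry runs against a
scratch directory), and passing the memory block numbers explicitly are
all assumptions of this sketch, not something the patch provides.

```sh
# Root of sysfs; override it to try the sequence on a scratch directory.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# hotremove_dax DEV BLOCK...
# Offline every listed memory block, verify each one really went offline,
# and only then unbind DEV from the kmem driver. Stops at the first block
# that cannot be offlined, so the unbind is never attempted too early.
hotremove_dax() {
    hr_dev=$1
    shift
    for hr_blk in "$@"; do
        hr_state="$SYSFS_ROOT/devices/system/memory/memory$hr_blk/state"
        echo offline > "$hr_state" || return 1
        [ "$(cat "$hr_state")" = "offline" ] || return 1
    done
    echo "$hr_dev" > "$SYSFS_ROOT/bus/dax/drivers/kmem/unbind"
}
```

A call such as `hotremove_dax dax0.0 42 43` (block numbers are
placeholders) refuses to unbind if any block stays online, which avoids
the "part of System RAM until reboot" situation described in the note.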
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: James Morris <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Yaowei Bai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series ""Hotremove" persistent memory", v6.
Recently, support for using persistent memory like regular RAM was
added to Linux. This work extends this functionality to also allow hot
removing persistent memory.
We (Microsoft) have an important use case for this functionality.
The requirement is for physical machines with a small amount of RAM (~8G)
to be able to reboot in a very short period of time (<1s). Yet, there
is a userland state that is expensive to recreate (~2G).
The solution is to boot machines with 2G preserved for persistent
memory.
Copy the state, and hotadd the persistent memory so the machine still has
all 8G available for runtime. Before reboot, offline and hotremove the
device-dax 2G, copy the memory that needs to be preserved to the pmem0
device, and reboot.
The series of operations looks like this:
1. After boot, restore /dev/pmem0 to ramdisk to be consumed by apps,
and free ramdisk.
2. Convert raw pmem0 to devdax
ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
3. Hotadd to System RAM
echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
echo online_movable > /sys/devices/system/memory/memoryXXX/state
4. Before reboot hotremove device-dax memory from System RAM
echo offline > /sys/devices/system/memory/memoryXXX/state
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
5. Create raw pmem0 device
ndctl create-namespace --mode raw -e namespace0.0 -f
6. Copy the state that was stored by apps to ramdisk to pmem device
7. Do a kexec reboot, or reboot through firmware if the firmware does not
zero memory in the pmem0 region (these machines have only regular
volatile memory). To get the pmem0 device, either the memmap kernel
parameter is used, or device nodes are specified in the dtb.
This patch (of 3):
When add_memory() fails, the resource and the memory should be freed.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM")
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: James Morris <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Yaowei Bai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is intended for use with NVDIMMs that are physically persistent
(physically like flash) so that they can be used as a cost-effective
RAM replacement. Intel Optane DC persistent memory is one
implementation of this kind of NVDIMM.
Currently, a persistent memory region is "owned" by a device driver,
either the "Device DAX" or "Filesystem DAX" drivers. These drivers
allow applications to explicitly use persistent memory, generally
by being modified to use special, new libraries. (DIMM-based
persistent memory hardware/software is described in great detail
here: Documentation/nvdimm/nvdimm.txt).
However, this limits persistent memory use to applications which
*have* been modified. To make it more broadly usable, this driver
"hotplugs" memory into the kernel, to be managed and used just like
normal RAM would be.
To make this work, management software must remove the device from
being controlled by the "Device DAX" infrastructure:
echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
and then tell the new driver that it can bind to the device:
echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
After this, there will be a number of new memory sections visible
in sysfs that can be onlined, or that may get onlined by existing
udev-initiated memory hotplug rules.
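The two writes above, plus a check for memory blocks that still wait to be
onlined, can be sketched as shell helpers. The function names and the
SYSFS_ROOT override are assumptions of this sketch; the sysfs paths are
the ones given above plus the usual /sys/devices/system/memory layout.

```sh
# Root of sysfs; override it to exercise the helpers on a scratch tree.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# rebind_to_kmem DEV: detach DEV from device_dax, then offer it to kmem.
# The second write only happens if the unbind succeeded.
rebind_to_kmem() {
    echo "$1" > "$SYSFS_ROOT/bus/dax/drivers/device_dax/unbind" || return 1
    echo "$1" > "$SYSFS_ROOT/bus/dax/drivers/kmem/new_id"
}

# list_offline_blocks: print the sysfs directory of every memory block
# whose state file currently reads "offline".
list_offline_blocks() {
    for lb_state in "$SYSFS_ROOT"/devices/system/memory/memory*/state; do
        [ -r "$lb_state" ] || continue
        [ "$(cat "$lb_state")" = "offline" ] && echo "${lb_state%/state}"
    done
    return 0
}
```

After `rebind_to_kmem dax0.0`, the output of `list_offline_blocks` shows
which blocks have not been onlined yet, for example because no udev rule
picked them up.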
This rebinding procedure is currently a one-way trip. Once memory
is bound to "kmem", it's there permanently and cannot be
unbound and assigned back to device_dax.
The kmem driver will never bind to a dax device unless the device
is *explicitly* bound to the driver. There are two reasons for
this: One, since it is a one-way trip, it cannot be undone if
bound incorrectly. Two, the kmem driver destroys data on the
device. Imagine you had good data on a pmem device: it
would be catastrophic if you compiled in "kmem" but left out
the "device_dax" driver. kmem would take over the device and
write volatile data all over your good data.
This inherits any existing NUMA information for the newly-added
memory from the persistent memory device that came from the
firmware. On Intel platforms, the firmware has guarantees that
require each socket's persistent memory to be in a separate
memory-only NUMA node. That means that this patch is not expected
to create NUMA nodes, but will simply hotplug memory into existing
nodes.
Because the firmware creates these NUMA nodes, the existing NUMA APIs
and tools are sufficient to create policies for applications or memory
areas to have affinity for or an aversion to using this memory.
There is currently some metadata at the beginning of pmem regions.
The section-size memory hotplug restrictions, plus this small
reserved area can cause the "loss" of a section or two of capacity.
This should be fixable in follow-on patches. But, as a first step,
losing 256MB of memory (worst case) out of hundreds of gigabytes
is a good tradeoff vs. the required code to fix this up precisely.
This calculation is also the reason we export
memory_block_size_bytes().
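To make the capacity loss concrete, the sketch below computes how many
bytes of a device range fall outside the block-aligned part. The helper
name is made up, the block size is passed in by the caller, and the range
is assumed to span at least one full block.

```sh
# lost_capacity START END BLOCK
# START and END are the inclusive bounds of the device range, BLOCK is
# the memory block size. Prints the number of bytes at the head and tail
# of the range that cannot be hotplugged because they are not aligned.
lost_capacity() {
    lc_start=$(( ( $1 + $3 - 1 ) / $3 * $3 ))   # start rounded up
    lc_end=$(( ( $2 + 1 ) / $3 * $3 ))          # exclusive end rounded down
    echo $(( ( lc_start - $1 ) + ( $2 + 1 - lc_end ) ))
}
```

For a range of 0x148200000-0x33fffffff and an assumed 128MB block size,
`lost_capacity 0x148200000 0x33fffffff 0x8000000` reports 132120576 bytes
(126MB), all of it at the start of the range. The loss stays below two
blocks, which is where the 256MB worst case comes from.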
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Huang Ying <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Yaowei Bai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Jerome Glisse <[email protected]>
Reviewed-by: Vishal Verma <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|