author | Dan Williams <[email protected]> | 2020-10-13 16:48:57 -0700 |
---|---|---|
committer | Linus Torvalds <[email protected]> | 2020-10-13 18:38:27 -0700 |
commit | 2dd57d3415f8623a5e9494c88978a202886041aa (patch) | |
tree | 7e2eac876a89897c983d7271549e3828e80aee10 | |
parent | 1abbef4f51724fb11f09adf0e75275f7cb422a8a (diff) |
x86/numa: cleanup configuration dependent command-line options
Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5.
The device-dax facility allows an address range to be directly mapped
through a chardev, or optionally hotplugged to the core kernel page
allocator as System-RAM. It is the mechanism for converting persistent
memory (pmem) into another volatile memory pool, i.e. the current
Memory Tiering hot topic on linux-mm.
In the case of pmem the nvdimm-namespace-label mechanism can sub-divide
it, but that labeling mechanism is not available / applicable to
soft-reserved ("EFI specific purpose") memory [3]. This series provides a
sysfs-mechanism for the daxctl utility to enable provisioning of
volatile-soft-reserved memory ranges.
The motivations for this facility are:
1/ Allow performance differentiated memory ranges to be split between
kernel-managed and directly-accessed use cases.
2/ Allow physical memory to be provisioned along performance relevant
address boundaries. For example, divide a memory-side cache [4] along
cache-color boundaries.
3/ Parcel out soft-reserved memory to VMs using device-dax as a security
/ permissions boundary [5]. Specifically I have seen people (ab)using
memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
device-dax interface on custom address ranges. A follow-on for the VM
use case is to teach device-dax to dynamically allocate 'struct page' at
runtime to reduce the duplication of 'struct page' space in both the
guest and the host kernel for the same physical pages.
[2]: http://lore.kernel.org/r/[email protected]
[3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
[4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
[5]: http://lore.kernel.org/r/[email protected]
This patch (of 23):
In preparation for adding a new numa= option clean up the existing ones to
avoid ifdefs in numa_setup(), and provide feedback when the numa=fake=
option is invalid due to kernel config. The same does not need
to be done for numa=noacpi, since the capability is already hard disabled
at compile-time.
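
The stub pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: DEMO_NUMA_EMU is a hypothetical stand-in for CONFIG_NUMA_EMU, and demo_emu_cmdline() / demo_numa_setup() are hypothetical names that mirror numa_emu_cmdline() / numa_setup() from the diff.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for CONFIG_NUMA_EMU. It is left undefined here to
 * model a kernel built without NUMA emulation; define it to model the
 * enabled case.
 */
/* #define DEMO_NUMA_EMU */

#ifdef DEMO_NUMA_EMU
static const char *demo_emu_str;

/* Enabled case: record the option string and report success. */
static int demo_emu_cmdline(const char *str)
{
	demo_emu_str = str;
	return 0;
}
#else
/* Disabled case: an inline stub that reports the option as invalid. */
static inline int demo_emu_cmdline(const char *str)
{
	(void)str;
	return -EINVAL;
}
#endif

static int demo_numa_off;

/* The parser needs no #ifdef: the stub above absorbs the config difference. */
static int demo_numa_setup(const char *opt)
{
	if (!opt)
		return -EINVAL;
	if (!strncmp(opt, "off", 3))
		demo_numa_off = 1;
	if (!strncmp(opt, "fake=", 5))
		return demo_emu_cmdline(opt + 5);
	return 0;
}
```

With DEMO_NUMA_EMU undefined, demo_numa_setup("fake=4") returns -EINVAL instead of silently ignoring the option, while demo_numa_setup("off") still returns 0.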
Suggested-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jia He <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Hulk Robot <[email protected]>
Cc: Jason Yan <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Vivek Goyal <[email protected]>
Link: https://lkml.kernel.org/r/160106109960.30709.7379926726669669398.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/159643094925.4062302.14979872973043772305.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
-rw-r--r-- | arch/x86/include/asm/numa.h | 8 |
-rw-r--r-- | arch/x86/mm/numa.c | 8 |
-rw-r--r-- | arch/x86/mm/numa_emulation.c | 3 |
-rw-r--r-- | arch/x86/xen/enlighten_pv.c | 2 |
-rw-r--r-- | drivers/acpi/numa/srat.c | 9 |
-rw-r--r-- | include/acpi/acpi_numa.h | 6 |
6 files changed, 24 insertions, 12 deletions
diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index bbfde3d2662f..0aecc0b629e0 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_NUMA_H
 
 #include <linux/nodemask.h>
+#include <linux/errno.h>
 
 #include <asm/topology.h>
 #include <asm/apicdef.h>
@@ -77,7 +78,12 @@ void debug_cpumask_set_cpu(int cpu, int node, bool enable);
 #ifdef CONFIG_NUMA_EMU
 #define FAKE_NODE_MIN_SIZE ((u64)32 << 20)
 #define FAKE_NODE_MIN_HASH_MASK (~(FAKE_NODE_MIN_SIZE - 1UL))
-void numa_emu_cmdline(char *);
+int numa_emu_cmdline(char *str);
+#else /* CONFIG_NUMA_EMU */
+static inline int numa_emu_cmdline(char *str)
+{
+	return -EINVAL;
+}
 #endif /* CONFIG_NUMA_EMU */
 
 #endif /* _ASM_X86_NUMA_H */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index aa76ec2d359b..87c52822cc44 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -37,14 +37,10 @@ static __init int numa_setup(char *opt)
 		return -EINVAL;
 	if (!strncmp(opt, "off", 3))
 		numa_off = 1;
-#ifdef CONFIG_NUMA_EMU
 	if (!strncmp(opt, "fake=", 5))
-		numa_emu_cmdline(opt + 5);
-#endif
-#ifdef CONFIG_ACPI_NUMA
+		return numa_emu_cmdline(opt + 5);
 	if (!strncmp(opt, "noacpi", 6))
-		acpi_numa = -1;
-#endif
+		disable_srat();
 	return 0;
 }
 early_param("numa", numa_setup);
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index 683cd12f4793..87d77cc52f86 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -13,9 +13,10 @@
 static int emu_nid_to_phys[MAX_NUMNODES];
 static char *emu_cmdline __initdata;
 
-void __init numa_emu_cmdline(char *str)
+int __init numa_emu_cmdline(char *str)
 {
 	emu_cmdline = str;
+	return 0;
 }
 
 static int __init emu_find_memblk_by_nid(int nid, const struct numa_meminfo *mi)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 41485a8a6dcf..b1418a6c0e90 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1300,7 +1300,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
 		 * any NUMA information the kernel tries to get from ACPI will
 		 * be meaningless. Prevent it from trying.
 		 */
-		acpi_numa = -1;
+		disable_srat();
 #endif
 		WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_pv, xen_cpu_dead_pv));
 
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 15bbaab8500b..1b0ae0a1959b 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -27,7 +27,12 @@ static int node_to_pxm_map[MAX_NUMNODES]
 			= { [0 ... MAX_NUMNODES - 1] = PXM_INVAL };
 
 unsigned char acpi_srat_revision __initdata;
-int acpi_numa __initdata;
+static int acpi_numa __initdata;
+
+void __init disable_srat(void)
+{
+	acpi_numa = -1;
+}
 
 int pxm_to_node(int pxm)
 {
@@ -163,7 +168,7 @@ static int __init slit_valid(struct acpi_table_slit *slit)
 void __init bad_srat(void)
 {
 	pr_err("SRAT: SRAT not used.\n");
-	acpi_numa = -1;
+	disable_srat();
 }
 
 int __init srat_disabled(void)
diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
index fdebcfc6c8df..8784183b2204 100644
--- a/include/acpi/acpi_numa.h
+++ b/include/acpi/acpi_numa.h
@@ -17,10 +17,14 @@ extern int pxm_to_node(int);
 extern int node_to_pxm(int);
 extern int acpi_map_pxm_to_node(int);
 extern unsigned char acpi_srat_revision;
-extern int acpi_numa __initdata;
+extern void disable_srat(void);
 
 extern void bad_srat(void);
 extern int srat_disabled(void);
 
+#else /* CONFIG_ACPI_NUMA */
+static inline void disable_srat(void)
+{
+}
 #endif /* CONFIG_ACPI_NUMA */
 #endif /* __ACP_NUMA_H */
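
The srat.c and acpi_numa.h hunks replace an exported variable with an accessor that compiles to nothing when the feature is configured out. A minimal userspace sketch of that shape follows. The names are hypothetical: DEMO_ACPI_NUMA stands in for CONFIG_ACPI_NUMA, and the demo_* functions only loosely model disable_srat() and srat_disabled(). The disabled-case query stub is an assumption made for this demo and is not something the patch adds.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_ACPI_NUMA; set to 0 to model a disabled config. */
#define DEMO_ACPI_NUMA 1

#if DEMO_ACPI_NUMA
/* State is file-local; callers cannot assign to it directly. */
static int demo_acpi_numa;

/* The only way to mark SRAT as unusable. */
static void demo_disable_srat(void)
{
	demo_acpi_numa = -1;
}

/* Query helper: a negative state means SRAT is disabled. */
static bool demo_srat_disabled(void)
{
	return demo_acpi_numa < 0;
}
#else
/* Feature compiled out: callers still build, and the call is a no-op. */
static inline void demo_disable_srat(void)
{
}

/* Assumption for this demo only: report "disabled" when compiled out. */
static inline bool demo_srat_disabled(void)
{
	return true;
}
#endif
```

Because callers such as the numa=noacpi branch only ever call the accessor, they need no #ifdef of their own in either configuration.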