aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/e820.c
AgeCommit message (Collapse)AuthorFilesLines
2017-01-28x86/boot/e820: Rename e820_any_mapped()/e820_all_mapped() to ↵Ingo Molnar1-3/+3
e820__mapped_any()/e820__mapped_all() The 'any' and 'all' are modified to the 'mapped' concept, so move them last in the name. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename sanitize_e820_table() to e820__update_table()Ingo Molnar1-8/+8
sanitize_e820_table() is a minor misnomer in that it suggests that the E820 table requires sanitizing - which implies that it will only do anything if the E820 table is irregular (not sane). That is wrong, because sanitize_e820_table() also does a very regular sorting of the E820 table, which is a necessity in the basic append-only flow of E820 updates the kernel is allowed to perform to it. So rename it to e820__update_table() to include that purpose as well. This also lines up all the table-update functions into a coherent naming family: int e820__update_table(struct e820_entry *biosmap, int max_nr_map, u32 *pnr_map); void e820__update_table_print(void); void e820__update_table_firmware(void); No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename update_e820() to e820__update_table()Ingo Molnar1-3/+3
update_e820() should have 'e820' as a prefix as most of the other E820 functions have - but it's also a bit unclear about its purpose, as it's unclear what is updated - the whole table, or an entry? Also, the name does not express that it's a trivial wrapper around sanitize_e820_table() that also prints out the resulting table. So rename it to e820__update_table_print(). This also makes it harmonize with the e820__update_table_firmware() function which has a very similar purpose. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename early_reserve_e820() to e820__memblock_alloc() and ↵Ingo Molnar1-3/+8
document it early_reserve_e820() is an early hack for kexec that does a limited fixup of the mptable and passes it to the kexec kernel as if it was the real thing. For this it needs to allocate memory - but no memory allocator is available yet beyond the memblock allocator, so early_reserve_e820() is really a wrapper around memblock_alloc() plus a hack to update the e820_table_firmware entries. The name 'reserve' is really a bit of a misnomer, as 'reserved' memory typically means memory completely inaccessible to the kernel - while here what we want to do is a special RAM allocation for our own purposes and insert that as RAM_RESERVED. Rename the function to e820__memblock_alloc_reserved() to better signal this dual purpose, plus document it better, which was omitted when it was merged. The barely comprehensible and cryptic comment: /* * pre allocated 4k and reserved it in memblock and e820_table_firmware */ u64 __init e820__memblock_alloc_reserved(u64 size, u64 align) ... does not count as documentation, replace it with: /* * Allocate the requested number of bytes with the requsted alignment * and return (the physical address) to the caller. Also register this * range in the 'firmware' E820 table. * * This allows kexec to fake a new mptable, as if it came from the real * system. */ u64 __init e820__memblock_alloc_reserved(u64 size, u64 align) No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Clarify the role of finish_e820_parsing() and rename it to ↵Ingo Molnar1-1/+6
e820__finish_early_params() finish_e820_parsing() is closely related to parse_early_params(), but the name does not tell us this clearly, so rename it to e820__finish_early_params(). Also add a few comments to explain what the function does. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Move e820_reserve_setup_data() to e820.cIngo Molnar1-0/+22
The e820_reserve_setup_data() is local to arch/x86/kernel/setup.c, but it is E820 functionality - so move it to e820.c to better isolate E820 functionality. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename parse_e820_ext() to e820__memory_setup_extended()Ingo Molnar1-1/+1
parse_e820_ext() is very similar to e820__memory_setup_default(), both are taking bootloader provided data, add it to the E820 table and then pass it sanitize_e820_table(). Rename it to e820__memory_setup_extended() to better signal their similar role. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Move the memblock_find_dma_reserve() function and rename it ↵Ingo Molnar1-31/+0
to memblock_set_dma_reserve() We introduced memblock_find_dma_reserve() in this commit: 6f2a75369e75 x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve But there's several problems with it: - The changelog is full of typos and is incomprehensible in general, and the comments in the code are not much better either. - The function was inexplicably placed into e820.c, while it has very little connection to the E820 table: when we call memblock_find_dma_reserve() then memblock is already set up and we are not using the E820 table anymore. - The function is a wrapper around set_dma_reserve(), but changed the 'set' name to 'find' - actively misleading about its primary purpose, which is still to set the DMA-reserve value. - The function is limited to 64-bit systems, but neither the changelog nor the comments explain why. The change would appear to be relevant to 32-bit systems as well, as the ISA DMA zone is the first 16 MB of RAM. So address some of these problems: - Move it into arch/x86/mm/init.c, next to the other zone setup related functions. - Clean up the code flow and names of local variables a bit. - Rename it to memblock_set_dma_reserve() - Improve the comments. No change in functionality. Enabling it for 32-bit systems is left for a separate patch. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Convert printk(KERN_* ...) to pr_*()Ingo Molnar1-25/+25
No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Consolidate 'struct e820_entry *entry' local variable namesIngo Molnar1-55/+55
So the E820 code has a lot of cases of: struct e820_entry *ei; ... but the 'ei' name makes very little sense if you think about it, it's not an abbreviation of anything obviously related to E820 table entries. This results in weird looking lines such as: if (type && ei->type != type) where you might have to double check what 'ei' really means, plus weird looking secondary variable names, such as: u64 ei_end; The 'ei' name was introduced in a single function over a decade ago, and then mindlessly cargo-copied over into other functions - with usage growing to over 60 uses altogether (!). ( My best guess is that it might have been originally meant as abbreviation of 'entry interval'. ) Anyway, rename these to the much more obvious: struct e820_entry *entry; No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename memblock_x86_fill() to e820__memblock_setup() and ↵Ingo Molnar1-5/+9
improve the explanations So memblock_x86_fill() is another E820 code misnomer: - nothing in its name tells us that it's part of the E820 subsystem ... - The 'fill' wording is ambiguous and doesn't tell us whether it's a single entry or some process - while the _real_ purpose of the function is hidden, which is to do a complete setup of the (platform independent) memblock regions. So rename it accordingly, to e820__memblock_setup(). Also translate this incomprehensible and misleading comment: /* * EFI may have more than 128 entries * We are safe to enable resizing, beause memblock_x86_fill() * is rather later for x86 */ memblock_allow_resize(); The worst aspect of this comment isn't even the sloppy typos, but that it casually mentions a '128' number with no explanation, which makes one lead to the assumption that this is related to the well-known limit of a maximum of 128 E820 entries passed via legacy bootloaders. But no, the _real_ meaning of 128 here is that of the memblock subsystem, which too happens to have a 128 entries limit for very early memblock regions (which is unrelated to E820), via INIT_MEMBLOCK_REGIONS ... So change the comment to a more comprehensible version: /* * The bootstrap memblock region count maximum is 128 entries * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries * than that - so allow memblock resizing. * * This is safe, because this call happens pretty late during x86 setup, * so we know about reserved memory regions already. (This is important * so that memblock resizing does no stomp over reserved areas.) */ memblock_allow_resize(); No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Basic cleanup of e820.cIngo Molnar1-240/+207
Over the last decade or so e820.c has become an ureadable mess of tinkerware. Perform some very basic cleanups before doing more intricate cleanups, so that my eyes don't start bleeding when I look at it. Here's some of the excesses: - Total disregard of countless aspects of Documentation/CodingStyle. - Totally inconsistent hodge-podge of various coding styles and practices. - Gems like: (unsigned long long) e820_table->entries[i].addr ... which is a completely unnecessary type conversion of an u64 value. - Incomprehensible comments while there are major functions with absolutely no explanation - plus an armada of typos and grammar mistakes. - Mindless checkpatch artifacts such as: if (append_e820_table(boot_params.e820_table, boot_params.e820_entries) < 0) { for_each_free_mem_range(u, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) { - Actively misleading comments: /* In case someone cares... */ return who; ( The usage site of the return value just a few lines further down makes it clear that we very much care about the return value, we use it to print out the e820 map... ) - Colorfully inconsistent capitalization and punctuation throughout. - etc. This patch fixes only the worst excesses - there's more to fix. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename e820_table_saved to e820_table_firmware and improve ↵Ingo Molnar1-29/+49
the description So the 'e820_table_saved' is a bit of a misnomer that hides its real purpose. At first sight the name suggests that it's some sort save/restore mechanism, as this is how we typically name such facilities in the kernel. But that is not so, e820_table_saved is the original firmware version of the e820 table, not modified by the kernel. This table is displayed in the /sys/firmware/memmap file, and it's also used by the hibernation code to calculate a physical memory layout MD5 fingerprint checksum which is invariant of the kernel. So rename it to 'e820_table_firmware' and update all the comments to better describe the main e820 data strutures. Also rename: 'initial_e820_table_saved' => 'e820_table_firmware_init' 'e820_update_range_saved' => 'e820_update_range_firmware' ... to better match the new nomenclature. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename default_machine_specific_memory_setup() to ↵Ingo Molnar1-2/+7
e820__memory_setup_default() The default_machine_specific_memory_setup() is a mouthful and despite the many words it doesn't actually tell us clearly what it does. The function is the x86 legacy memory layout setup code, based on E820-formatted memory layout information passed by the bootloader via the boot_params. Rename it to e820__memory_setup_default() to better signal its purpose. Also rename the related higher level function to be consistent with this new naming: setup_memory_map() => e820__memory_setup() No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Harmonize the 'struct e820_table' fieldsIngo Molnar1-58/+56
So the e820_table->map and e820_table->nr_map names are a bit confusing, because it's not clear what a 'map' really means (it could be a bitmap, or some other data structure), nor is it clear what nr_map means (is it a current index, or some other count). Rename the fields from: e820_table->map => e820_table->entries e820_table->nr_map => e820_table->nr_entries which makes it abundantly clear that these are entries of the table, and that the size of the table is ->nr_entries. Propagate the changes to all affected files. Where necessary, adjust local variable names to better reflect the new field names. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename everything to e820_tableIngo Molnar1-76/+76
No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename 'e820_map' variables to 'e820_array'Ingo Molnar1-74/+74
In line with the rename to 'struct e820_array', harmonize the naming of common e820 table variable names as well: e820 => e820_array e820_saved => e820_array_saved e820_map => e820_array initial_e820 => e820_array_init This makes the variable names more consistent and easier to grep for. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Remove e820_mark_nosave_regions() definition ugliesIngo Molnar1-3/+0
The e820_mark_nosave_regions definition has a number of ugly #ifdef conditions that unnecessarily uglify both the header and the e820.c file. Make this function unconditional: most distro kernels have hibernation enabled. If LTO functionality is added in the future it will be able to eliminate unused functions without uglifying the source code. No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Rename the basic e820 data types to 'struct e820_entry' and ↵Ingo Molnar1-32/+32
'struct e820_array' The 'e820entry' and 'e820map' names have various annoyances: - the missing underscore departs from the usual kernel style and makes the code look weird, - in the past I kept confusing the 'map' with the 'entry', because a 'map' is ambiguous in that regard, - it's not really clear from the 'e820map' that this is a regular C array. Rename them to 'struct e820_entry' and 'struct e820_array' accordingly. ( Leave the legacy UAPI header alone but do the rename in the bootparam.h and e820/types.h file - outside tools relying on these defines should either adjust their code, or should use the legacy header, or should create their private copies for the definitions. ) No change in functionality. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-28x86/boot/e820: Move asm/e820.h to asm/e820/api.hIngo Molnar1-1/+1
In line with asm/e820/types.h, move the e820 API declarations to asm/e820/api.h and update all usage sites. This is just a mechanical, obviously correct move & replace patch, there will be subsequent changes to clean up the code and to make better use of the new header organization. Cc: Alex Thorlton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang, Ying <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Yang <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-01-12x86/e820/32: Fix e820_search_gap() error handling on x86-32Arnd Bergmann1-2/+4
GCC correctly points out that on 32-bit kernels, e820_search_gap() not finding a start now leads to pci_mem_start ('gapstart') being set to an uninitialized value: arch/x86/kernel/e820.c: In function 'e820_setup_gap': arch/x86/kernel/e820.c:641:16: error: 'gapstart' may be used uninitialized in this function [-Werror=maybe-uninitialized] This restores the behavior from before this cleanup: b4ed1d15b453 ("x86/e820: Make e820_search_gap() static and remove unused variables") ... defaulting to address 0x10000000 if nothing was found. Signed-off-by: Arnd Bergmann <[email protected]> Cc: Dan Williams <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Wei Yang <[email protected]> Fixes: b4ed1d15b453 ("x86/e820: Make e820_search_gap() static and remove unused variables") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-12-28x86/e820: Make e820_search_gap() static and remove unused variablesWei Yang1-11/+5
e820_search_gap() is just used locally now and the 'start_addr' and 'end_addr' parameters are fixed values. Also, 'gapstart' is not checked in this function anymore. So make the function static and remove those unused variables. Signed-off-by: Wei Yang <[email protected]> Acked-by: Yinghai Lu <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-10-16x86/e820: Don't merge consecutive E820_PRAM rangesDan Williams1-1/+1
Commit: 917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation") ... fixed up the broken manipulations of max_pfn in the presence of E820_PRAM ranges. However, it also broke the sanitize_e820_map() support for not merging E820_PRAM ranges. Re-introduce the enabling to keep resource boundaries between consecutive defined ranges. Otherwise, for example, an environment that boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0 device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size. Reported-by: Dave Chinner <[email protected]> Signed-off-by: Dan Williams <[email protected]> Cc: <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Zhang Yi <[email protected]> Cc: [email protected] Fixes: 917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation") Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <[email protected]>
2016-09-22x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulationDan Williams1-9/+5
In commit: ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type") Christoph references the original patch I wrote implementing pmem support. The intent of the 'max_pfn' changes in that commit were to enable persistent memory ranges to be covered by the struct page memmap by default. However, that approach was abandoned when Christoph ported the patches [1], and that functionality has since been replaced by devm_memremap_pages(). In the meantime, this max_pfn manipulation is confusing kdump [2] that assumes that everything covered by the max_pfn is "System RAM". This results in kdump hanging or crashing. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-March/000348.html [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1351098 So fix it. Reported-by: Zhang Yi <[email protected]> Reported-by: Jeff Moyer <[email protected]> Tested-by: Zhang Yi <[email protected]> Signed-off-by: Dan Williams <[email protected]> Reviewed-by: Jeff Moyer <[email protected]> Cc: <[email protected]> # v4.1 and later kernels Cc: Andrew Morton <[email protected]> Cc: Boaz Harrosh <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type") Link: http://lkml.kernel.org/r/147448744538.34910.11287693517367139607.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <[email protected]>
2016-09-21x86/e820: Use much less memory for e820/e820_saved, save up to 120kDenys Vlasenko1-4/+4
The maximum size of e820 map array for EFI systems is defined as E820_X_MAX (E820MAX + 3 * MAX_NUMNODES). In x86_64 defconfig, this ends up with E820_X_MAX = 320, e820 and e820_saved are 6404 bytes each. With larger configs, for example Fedora kernels, E820_X_MAX = 3200, e820 and e820_saved are 64004 bytes each. Most of this space is wasted. Typical machines have some 20-30 e820 areas at most. After previous patch, e820 and e820_saved are pointers to e280 maps. Change them to initially point to maps which are __initdata. At the very end of kernel init, just before __init[data] sections are freed in free_initmem(), allocate smaller blocks, copy maps there, and change pointers. The late switch makes sure that all functions which can be used to change e820 maps are no longer accessible (they are all __init functions). Run-tested. Signed-off-by: Denys Vlasenko <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-09-21x86/e820: Prepare e280 code for switch to dynamic storageDenys Vlasenko1-48/+77
This patch turns e820 and e820_saved into pointers to e820 tables, of the same size as before. Signed-off-by: Denys Vlasenko <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-09-21x86/e820: Mark some static functions __initDenys Vlasenko1-6/+6
They are all called only from other __init functions in e820.c Signed-off-by: Denys Vlasenko <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-09-08x86/e820: Fix very large 'size' handling boundary conditionWei Yang1-2/+2
The (start, size) tuple represents a range [start, start + size - 1], which means "start" and "start + size - 1" should be compared to see whether the range overflows. For example, a range with (start, size): (0xffffffff fffffff0, 0x00000000 00000010) represents [0xffffffff fffffff0, 0xffffffff ffffffff] ... would be judged overflow in the original code, while actually it is not. This patch fixes this and makes sure it still works when size is zero. Signed-off-by: Wei Yang <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-15Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "This is another big update. Main changes are: - lots of x86 system call (and other traps/exceptions) entry code enhancements. In particular the complex parts of the 64-bit entry code have been migrated to C code as well, and a number of dusty corners have been refreshed. (Andy Lutomirski) - vDSO special mapping robustification and general cleanups (Andy Lutomirski) - cpufeature refactoring, cleanups and speedups (Borislav Petkov) - lots of other changes ..." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/cpufeature: Enable new AVX-512 features x86/entry/traps: Show unhandled signal for i386 in do_trap() x86/entry: Call enter_from_user_mode() with IRQs off x86/entry/32: Change INT80 to be an interrupt gate x86/entry: Improve system call entry comments x86/entry: Remove TIF_SINGLESTEP entry work x86/entry/32: Add and check a stack canary for the SYSENTER stack x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed x86/entry: Vastly simplify SYSENTER TF (single-step) handling x86/entry/traps: Clear DR6 early in do_debug() and improve the comment x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions x86/entry/32: Restore FLAGS on SYSEXIT x86/entry/32: Filter NT and speed up AC filtering in SYSENTER x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test selftests/x86: In syscall_nt, test NT|TF as well x86/asm-offsets: Remove PARAVIRT_enabled x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled uprobes: __create_xol_area() must nullify xol_mapping.fault x86/cpufeature: Create a new synthetic cpu capability for machine check recovery ...
2016-01-30x86/cpufeature: Carve out X86_FEATURE_*Borislav Petkov1-0/+1
Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-01-30x86/e820: Set System RAM type and descriptorToshi Kani1-1/+37
Change e820_reserve_resources() to set 'flags' and 'desc' from e820 types. Set E820_RESERVED_KERN and E820_RAM's (System RAM) io resource type to IORESOURCE_SYSTEM_RAM. Do the same for "Kernel data", "Kernel code", and "Kernel bss", which are child nodes of System RAM. I/O resource descriptor is set to 'desc' for entries that are (and will be) target ranges of walk_iomem_res() and region_intersects(). Signed-off-by: Toshi Kani <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Young <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luis R. Rodriguez <[email protected]> Cc: Mark Salter <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Toshi Kani <[email protected]> Cc: WANG Chao <[email protected]> Cc: [email protected] Cc: linux-mm <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-09-30x86/e820: Deinline e820_type_to_string, save 126 bytesDenys Vlasenko1-1/+1
This function compiles to 102 bytes of machine code. It has two callsites. Signed-off-by: Denys Vlasenko <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-29Merge tag 'libnvdimm-for-4.2' of ↵Linus Torvalds1-4/+24
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm Pull libnvdimm subsystem from Dan Williams: "The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 4 drivers / enabling modules: NFIT: Instantiates an "nvdimm bus" with the core and registers memory devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface table). After registering NVDIMMs the NFIT driver then registers "region" devices. A libnvdimm-region defines an access mode and the boundaries of persistent memory media. A region may span multiple NVDIMMs that are interleaved by the hardware memory controller. In turn, a libnvdimm-region can be carved into a "namespace" device and bound to the PMEM or BLK driver which will attach a Linux block device (disk) interface to the memory. PMEM: Initially merged in v4.1 this driver for contiguous spans of persistent memory address ranges is re-worked to drive PMEM-namespaces emitted by the libnvdimm-core. In this update the PMEM driver, on x86, gains the ability to assert that writes to persistent memory have been flushed all the way through the caches and buffers in the platform to persistent media. See memcpy_to_pmem() and wmb_pmem(). BLK: This new driver enables access to persistent memory media through "Block Data Windows" as defined by the NFIT. The primary difference of this driver to PMEM is that only a small window of persistent memory is mapped into system address space at any given point in time. Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access different portions of the media. BLK-mode, by definition, does not support DAX. BTT: This is a library, optionally consumed by either PMEM or BLK, that converts a byte-accessible namespace into a disk with atomic sector update semantics (prevents sector tearing on crash or power loss). The sinister aspect of sector tearing is that most applications do not know they have a atomic sector dependency. At least today's disk's rarely ever tear sectors and if they do one almost certainly gets a CRC error on access. NVDIMMs will always tear and always silently. Until an application is audited to be robust in the presence of sector-tearing the usage of BTT is recommended. Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig, Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox, Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael Wysocki, and Bob Moore" * tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits) arch, x86: pmem api for ensuring durability of persistent memory updates libnvdimm: Add sysfs numa_node to NVDIMM devices libnvdimm: Set numa_node to NVDIMM devices acpi: Add acpi_map_pxm_to_online_node() libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only pmem: flag pmem block devices as non-rotational libnvdimm: enable iostat pmem: make_request cleanups libnvdimm, pmem: fix up max_hw_sectors libnvdimm, blk: add support for blk integrity libnvdimm, btt: add support for blk integrity fs/block_dev.c: skip rw_page if bdev has integrity libnvdimm: Non-Volatile Devices tools/testing/nvdimm: libnvdimm unit test infrastructure libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory nd_btt: atomic sector updates libnvdimm: infrastructure for btt devices libnvdimm: write blk label set libnvdimm: write pmem label set libnvdimm: blk labels and namespace instantiation ...
2015-06-24mm/memblock: add extra "flags" to memblock to allow selection of memory ↵Tony Luck1-1/+2
based on attribute Some high end Intel Xeon systems report uncorrectable memory errors as a recoverable machine check. Linux has included code for some time to process these and just signal the affected processes (or even recover completely if the error was in a read only page that can be replaced by reading from disk). But we have no recovery path for errors encountered during kernel code execution. Except for some very specific cases were are unlikely to ever be able to recover. Enter memory mirroring. Actually 3rd generation of memory mirroing. Gen1: All memory is mirrored Pro: No s/w enabling - h/w just gets good data from other side of the mirror Con: Halves effective memory capacity available to OS/applications Gen2: Partial memory mirror - just mirror memory begind some memory controllers Pro: Keep more of the capacity Con: Nightmare to enable. Have to choose between allocating from mirrored memory for safety vs. NUMA local memory for performance Gen3: Address range partial memory mirror - some mirror on each memory controller Pro: Can tune the amount of mirror and keep NUMA performance Con: I have to write memory management code to implement The current plan is just to use mirrored memory for kernel allocations. This has been broken into two phases: 1) This patch series - find the mirrored memory, use it for boot time allocations 2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the unused mirrored memory from mm/memblock.c and only give it out to select kernel allocations (this is still being scoped because page_alloc.c is scary). This patch (of 3): Add extra "flags" to memblock to allow selection of memory based on attribute. No functional changes Signed-off-by: Tony Luck <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Hanjun Guo <[email protected]> Cc: Xiexiuqi <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-05-27e820, efi: add ACPI 6.0 persistent memory typesDan Williams1-4/+24
ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory. Mark it "reserved" and allow it to be claimed by a persistent memory device driver. This definition is in addition to the Linux kernel's existing type-12 definition that was recently added in support of shipping platforms with NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as OEM reserved). Note, /proc/iomem can be consulted for differentiating legacy "Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory" E820_PMEM. Cc: Boaz Harrosh <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Thomas Gleixner <[email protected]> Acked-by: Jeff Moyer <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Tested-by: Toshi Kani <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2015-04-18Merge branch 'x86-pmem-for-linus' of ↵Linus Torvalds1-6/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull PMEM driver from Ingo Molnar: "This is the initial support for the pmem block device driver: persistent non-volatile memory space mapped into the system's physical memory space as large physical memory regions. The driver is based on Intel code, written by Ross Zwisler, with fixes by Boaz Harrosh, integrated with x86 e820 memory resource management and tidied up by Christoph Hellwig. Note that there were two other separate pmem driver submissions to lkml: but apparently all parties (Ross Zwisler, Boaz Harrosh) are reasonably happy with this initial version. This version enables minimal support that enables persistent memory devices out in the wild to work as block devices, identified through a magic (non-standard) e820 flag and auto-discovered if CONFIG_X86_PMEM_LEGACY=y, or added explicitly through manipulating the memory maps via the "memmap=..." boot option with the new, special '!' modifier character. Limitations: this is a regular block device, and since the pmem areas are not struct page backed, they are invisible to the rest of the system (other than the block IO device), so direct IO to/from pmem areas, direct mmap() or XIP is not possible yet. The page cache will also shadow and double buffer pmem contents, etc. Initial support is for x86" * 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: drivers/block/pmem: Fix 32-bit build warning in pmem_alloc() drivers/block/pmem: Add a driver for persistent memory x86/mm: Add support for the non-standard protected e820 type
2015-04-01x86/mm: Add support for the non-standard protected e820 typeChristoph Hellwig1-6/+20
Various recent BIOSes support NVDIMMs or ADR using a non-standard e820 memory type, and Intel supplied reference Linux code using this type to various vendors. Wire this e820 table type up to export platform devices for the pmem driver so that we can use it in Linux. Based on earlier work from: Dave Jiang <[email protected]> Dan Williams <[email protected]> Includes fixes for NUMA regions from Boaz Harrosh. Tested-by: Ross Zwisler <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Dan Williams <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Boaz Harrosh <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Keith Busch <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ Minor cleanups. ] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-24x86/mm: Use early_memunmap() instead of early_iounmap()Juergen Gross1-1/+1
Memory mapped via early_memremap() should be unmapped with early_memunmap() instead of early_iounmap(). Signed-off-by: Juergen Gross <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-01-23x86, e820: Clean up sanitize_e820_map() usersWANG Chao1-18/+8
The argument 3 of sanitize_e820_map() will only be updated upon a successful sanitization. Some of the callers have extra conditionals for the same purpose. Clean them up. default_machine_specific_memory_setup() must keep the extra conditional because boot_params.e820_entries is an u8 and not an u32, so the direct update would overwrite other fields in boot_params. [ tglx: Massaged changelog ] Signed-off-by: WANG Chao <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Lee Chun-Yi <[email protected]> Cc: Xishi Qiu <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2014-12-11x86/mm: Use min() instead of min_t() in the e820 printout codeXishi Qiu1-2/+2
The type of "MAX_DMA_PFN" and "xXx_pfn" are both unsigned long now, so use min() instead of min_t(). Suggested-by: Andrew Morton <[email protected]> Signed-off-by: Xishi Qiu <[email protected]> Cc: Linux MM <[email protected]> Cc: <[email protected]> Cc: Rik van Riel <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-09-16x86/mm, hibernate: Do not assume the first e820 area to be RAMLee, Chun-Yi1-4/+3
In arch/x86/kernel/setup.c::trim_bios_range(), the codes introduced by 1b5576e6 (base on d8a9e6a5), it updates the first 4Kb of memory to be E820_RESERVED region. That's because it's a BIOS owned area but generally not listed in the E820 table: e820: BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x0000000000096fff] usable BIOS-e820: [mem 0x0000000000097000-0x0000000000097fff] reserved ... e820: update [mem 0x00000000-0x00000fff] usable ==> reserved e820: remove [mem 0x000a0000-0x000fffff] usable But the region of first 4Kb didn't register to nosave memory: PM: Registered nosave memory: [mem 0x00097000-0x00097fff] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff] The code in e820_mark_nosave_regions() assumes the first e820 area to be RAM, so it causes the first 4Kb E820_RESERVED region ignored when register to nosave. This patch removed assumption of the first e820 area. Signed-off-by: Lee, Chun-Yi <[email protected]> Acked-by: Pavel Machek <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Len Brown <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Takashi Iwai <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-01-21x86/mm: memblock: switch to use NUMA_NO_NODEGrygorii Strashko1-1/+1
Update X86 code to use NUMA_NO_NODE instead of MAX_NUMNODES while calling memblock APIs, because memblock API will be changed to use NUMA_NO_NODE and will produce warning during boot otherwise. See: https://lkml.org/lkml/2013/12/9/898 Signed-off-by: Grygorii Strashko <[email protected]> Cc: Santosh Shilimkar <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-08-13x86: avoid remapping data in parse_setup_data()Linn Crosetto1-1/+4
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size larger than early_memremap() is able to handle, which is currently limited to 256kB. If this occurs it leads to a NULL dereference in parse_setup_data(). To avoid this, remap the setup_data header and allow parsing functions for individual types to handle their own data remapping. Signed-off-by: Linn Crosetto <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Acked-by: Yinghai Lu <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2012-11-17x86, mm: Let "memmap=" take more entries one timeYinghai Lu1-1/+15
Current "memmap=" only can take one entry every time. when we have more entries, we have to use memmap= for each of them. For pxe booting, we have command line length limitation, those extra "memmap=" would waste too much space. This patch make memmap= could take several entries one time, and those entries will be split with ',' Signed-off-by: Yinghai Lu <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2012-10-24x86, mm: Trim memory in memblock to be page alignedYinghai Lu1-0/+3
We will not map partial pages, so need to make sure memblock allocation will not allocate those bytes out. Also we will use for_each_mem_pfn_range() to loop to map memory range to keep them consistent. Signed-off-by: Yinghai Lu <[email protected]> Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com Acked-by: Jacob Shin <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: <[email protected]>
2012-07-30firmware_map: make firmware_map_add_early() argument consistent with ↵Yasuaki Ishimatsu1-1/+1
firmware_map_add_hotplug() There are two ways to create /sys/firmware/memmap/X sysfs: - firmware_map_add_early When the system starts, it is calledd from e820_reserve_resources() - firmware_map_add_hotplug When the memory is hot plugged, it is called from add_memory() But these functions are called without unifying value of end argument as below: - end argument of firmware_map_add_early() : start + size - 1 - end argument of firmware_map_add_hogplug() : start + size The patch unifies them to "start + size". Even if applying the patch, /sys/firmware/memmap/X/end file content does not change. [[email protected]: clarify comments] Signed-off-by: Yasuaki Ishimatsu <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-05-29x86: print e820 physical addresses consistently with other parts of kernelBjorn Helgaas1-26/+27
Print physical address info in a style consistent with the %pR style used elsewhere in the kernel. For example: -BIOS-provided physical RAM map: +e820: BIOS-provided physical RAM map: - BIOS-e820: 0000000000000100 - 000000000009e000 (usable) +BIOS-e820: [mem 0x0000000000000100-0x000000000009dfff] usable -Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000) +e820: [mem 0x90000000-0xfed1bfff] available for PCI devices -reserve RAM buffer: 000000000009e000 - 000000000009ffff +e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff] Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-01-18Merge branch 'release' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux This includes initial support for the recently published ACPI 5.0 spec. In particular, support for the "hardware-reduced" bit that eliminates the dependency on legacy hardware. APEI has patches resulting from testing on real hardware. Plus other random fixes. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits) acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec intel_idle: Split up and provide per CPU initialization func ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2 ACPI processor: Remove unneeded cpuidle_unregister_driver call intel idle: Make idle driver more robust intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle ACPI: kernel-parameters.txt : Add intel_idle.max_cstate intel_idle: remove redundant local_irq_disable() call ACPI processor: Fix error path, also remove sysdev link ACPI: processor: fix acpi_get_cpuid for UP processor intel_idle: fix API misuse ACPI APEI: Convert atomicio routines ACPI: Export interfaces for ioremapping/iounmapping ACPI registers ACPI: Fix possible alignment issues with GAS 'address' references ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64) ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64) ACPI: Store SRAT table revision ACPI, APEI, Resolve false conflict between ACPI NVS and APEI ACPI, Record ACPI NVS regions ACPI, APEI, EINJ, Refine the fix of resource conflict ...
2012-01-17ACPI, Record ACPI NVS regionsHuang Ying1-2/+2
Some firmware will access memory in ACPI NVS region via APEI. That is, instructions in APEI ERST/EINJ table will read/write ACPI NVS region. The original resource conflict checking in APEI code will check memory/ioport accessed by APEI via general resource management mechanism. But ACPI NVS region is marked as busy already, so that the false resource conflict will prevent APEI ERST/EINJ to work. To fix this, this patch record ACPI NVS regions, so that we can avoid request resources for memory region inside it. Signed-off-by: Huang Ying <[email protected]> Signed-off-by: Len Brown <[email protected]>
2012-01-11Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-35/+24
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/numa: Add constraints check for nid parameters mm, x86: Remove debug_pagealloc_enabled x86/mm: Initialize high mem before free_all_bootmem() arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map() x86: Fix mmap random address range x86, mm: Unify zone_sizes_init() x86, mm: Prepare zone_sizes_init() for unification x86, mm: Use max_low_pfn for ZONE_NORMAL on 64-bit x86, mm: Wrap ZONE_DMA32 with CONFIG_ZONE_DMA32 x86, mm: Use max_pfn instead of highend_pfn x86, mm: Move zone init from paging_init() on 64-bit x86, mm: Use MAX_DMA_PFN for ZONE_DMA on 32-bit