path: root/kernel/kexec_core.c
2021-12-13  exit: Move oops specific logic from do_exit into make_task_dead  [Eric W. Biederman, 1 file, -1/+1]
The beginning of do_exit has become cluttered and difficult to read as it is filled with checks to handle things that can only happen when the kernel is operating improperly. Now that we have a dedicated function for cleaning up a task when the kernel is operating improperly move the checks there. Signed-off-by: "Eric W. Biederman" <[email protected]>
2021-08-30  Merge branch 'rework/printk_safe-removal' into for-linus  [Petr Mladek, 1 file, -1/+0]
2021-07-26  printk: remove safe buffers  [John Ogness, 1 file, -1/+0]
With @logbuf_lock removed, the high level printk functions for storing messages are lockless. Messages can be stored from any context, so there is no need for the NMI and safe buffers anymore. Remove the NMI and safe buffers. Although the safe buffers are removed, the NMI and safe context tracking is still in place. In these contexts, store the message immediately but still use irq_work to defer the console printing. Since printk recursion tracking is in place, safe context tracking for most of printk is not needed. Remove it. Only safe context tracking relating to the console and console_owner locks is left in place. This is because the console and console_owner locks are needed for the actual printing. Signed-off-by: John Ogness <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-07-01  kernel.h: split out panic and oops helpers  [Andy Shevchenko, 1 file, -0/+1]
kernel.h has been used as a dump for all kinds of stuff for a long time. This is an attempt to start cleaning it up by splitting out the panic and oops helpers. There are several purposes in doing this:
- drop the dependency in bug.h
- drop a loop by moving panic_notifier.h out
- unload kernel.h of something that has its own domain
At the same time, convert users tree-wide to use the new headers, although for the time being include the new header back into kernel.h to avoid twisted indirect includes for existing users. [[email protected]: thread_info.h needs limits.h] [[email protected]: ia64 fix] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Bjorn Andersson <[email protected]> Co-developed-by: Andrew Morton <[email protected]> Acked-by: Mike Rapoport <[email protected]> Acked-by: Corey Minyard <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Sebastian Reichel <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Acked-by: Stephen Boyd <[email protected]> Acked-by: Thomas Bogendoerfer <[email protected]> Acked-by: Helge Deller <[email protected]> # parisc Signed-off-by: Linus Torvalds <[email protected]>
2021-05-07  kexec: dump kmessage before machine_kexec  [Pavel Tatashin, 1 file, -0/+2]
kmsg_dump(KMSG_DUMP_SHUTDOWN) is called before machine_restart(), machine_halt(), and machine_power_off(). The only one that is missing is machine_kexec(). The dmesg output that it contains can be used to study the shutdown performance of both kernel and systemd during kexec reboot. Here is an example of dmesg data collected after kexec:
root@dplat-cp22:~# cat /sys/fs/pstore/dmesg-ramoops-0 | tail
...
[ 70.914592] psci: CPU3 killed (polled 0 ms)
[ 70.915705] CPU4: shutdown
[ 70.916643] psci: CPU4 killed (polled 4 ms)
[ 70.917715] CPU5: shutdown
[ 70.918725] psci: CPU5 killed (polled 0 ms)
[ 70.919704] CPU6: shutdown
[ 70.920726] psci: CPU6 killed (polled 4 ms)
[ 70.921642] CPU7: shutdown
[ 70.922650] psci: CPU7 killed (polled 0 ms)
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Reviewed-by: Bhupesh Sharma <[email protected]> Acked-by: Baoquan He <[email protected]> Reviewed-by: Tyler Hicks <[email protected]> Cc: James Morris <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Anton Vorontsov <[email protected]> Cc: Colin Cross <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-07  kexec: Add kexec reboot string  [Joe LeVeque, 1 file, -1/+1]
The purpose is to notify the kernel module for fast reboot. Upstream a patch from the SONiC network operating system [1]. [1]: https://github.com/Azure/sonic-linux-kernel/pull/46 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joe LeVeque <[email protected]> Signed-off-by: Paul Menzel <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Guohan Lu <[email protected]> Cc: Joe LeVeque <[email protected]> Cc: Paul Menzel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-21  Merge branch 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  [Linus Torvalds, 1 file, -1/+1]
Pull ELF compat updates from Al Viro: "Sanitizing ELF compat support, especially for triarch architectures:
- X32 handling cleaned up
- MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now
- Kconfig side of things regularized
Eventually I hope to have compat_binfmt_elf.c killed, with both native and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed by kbuild, but that's a separate story - not included here"
* 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get rid of COMPAT_ELF_EXEC_PAGESIZE
  compat_binfmt_elf: don't bother with undef of ELF_ARCH
  Kconfig: regularize selection of CONFIG_BINFMT_ELF
  mips compat: switch to compat_binfmt_elf.c
  mips: don't bother with ELF_CORE_EFLAGS
  mips compat: don't bother with ELF_ET_DYN_BASE
  mips: KVM_GUEST makes no sense for 64bit builds...
  mips: kill unused definitions in binfmt_elf[on]32.c
  mips binfmt_elf*32.c: use elfcore-compat.h
  x32: make X32, !IA32_EMULATION setups able to execute x32 binaries
  [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
  elf_prstatus: collect the common part (everything before pr_reg) into a struct
  binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
2021-01-25  kernel: kexec: remove the lock operation of system_transition_mutex  [Baoquan He, 1 file, -2/+0]
Function kernel_kexec() is called with the lock system_transition_mutex held in the reboot system call. While inside kernel_kexec(), it will acquire system_transition_mutex again. This will lead to a deadlock. The deadlock should be easy to trigger; it hasn't caused any failure reports only because the 'kexec jump' feature is almost never used, as far as I know. An inquiry can be made about who is using 'kexec jump' and where it's used. Before that, let's simply remove the lock operation inside the CONFIG_KEXEC_JUMP ifdeffery scope. Fixes: 55f2503c3b69 ("PM / reboot: Eliminate race between reboot and suspend") Signed-off-by: Baoquan He <[email protected]> Reported-by: Dan Carpenter <[email protected]> Reviewed-by: Pingfan Liu <[email protected]> Cc: 4.19+ <[email protected]> # 4.19+ Signed-off-by: Rafael J. Wysocki <[email protected]>
2021-01-06  elf_prstatus: collect the common part (everything before pr_reg) into a struct  [Al Viro, 1 file, -1/+1]
Preparation for doing i386 compat elf_prstatus sanely - rather than duplicating the beginning of compat_elf_prstatus, take these fields into a separate structure (compat_elf_prstatus_common), so that it can be reused. Due to the incestuous relationship between binfmt_elf.c and compat_binfmt_elf.c we need the same shape change done to the native struct elf_prstatus, gathering the fields prior to pr_reg into a new structure (struct elf_prstatus_common). Fortunately, the offset of pr_reg is always a multiple of 16 with no padding right before it, so it's possible to turn all the stuff prior to it into a single member without disturbing the layout. [build fix from Geert Uytterhoeven folded in] Signed-off-by: Al Viro <[email protected]>
2020-11-20  crypto: sha - split sha.h into sha1.h and sha2.h  [Eric Biggers, 1 file, -1/+0]
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2, and <crypto/sha3.h> contains declarations for SHA-3. This organization is inconsistent, but more importantly SHA-1 is no longer considered to be cryptographically secure. So to the extent possible, SHA-1 shouldn't be grouped together with any of the other SHA versions, and usage of it should be phased out. Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and <crypto/sha2.h>, and make everyone explicitly specify whether they want the declarations for SHA-1, SHA-2, or both. This avoids making the SHA-1 declarations visible to files that don't want anything to do with SHA-1. It also prepares for potentially moving sha1.h into a new insecure/ or dangerous/ directory. Signed-off-by: Eric Biggers <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Acked-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2020-10-16  kernel/: fix repeated words in comments  [Randy Dunlap, 1 file, -1/+1]
Fix multiple occurrences of duplicated words in kernel/. Fix one typo/spello on the same line as a duplicate word. Change one instance of "the the" to "that the". Otherwise just drop one of the repeated words. Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-09-10  objtool: Rename frame.h -> objtool.h  [Julien Thierry, 1 file, -1/+1]
Header frame.h is getting more code annotations to help objtool analyze object files. Rename the file to objtool.h. [ jpoimboe: add objtool.h to MAINTAINERS ] Signed-off-by: Julien Thierry <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]>
2020-01-08  kexec: add machine_kexec_post_load()  [Pavel Tatashin, 1 file, -0/+6]
It is the same as machine_kexec_prepare(), but is called after segments are loaded. This way, architectures can do processing work with the already-loaded relocation segments. One such example is arm64: it has to have segments loaded in order to create a page table, but it cannot do that at kexec time, because by then allocations are no longer possible. Signed-off-by: Pavel Tatashin <[email protected]> Acked-by: Dave Young <[email protected]> Signed-off-by: Will Deacon <[email protected]>
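A minimal sketch of the shape of such a hook (simplified; the call site, error label and variable names here are illustrative, not the exact upstream code):

	int __weak machine_kexec_post_load(struct kimage *image)
	{
		return 0;	/* default: nothing to do once segments are loaded */
	}

		/* in the kexec load path, after all segments have been loaded */
		ret = machine_kexec_post_load(image);
		if (ret)
			goto out;	/* illustrative error handling */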
2020-01-08  kexec: quiet down kexec reboot  [Pavel Tatashin, 1 file, -1/+1]
Here is a regular kexec command sequence and output:
=====
$ kexec --reuse-cmdline -i --load Image
$ kexec -e
[ 161.342002] kexec_core: Starting new kernel
Welcome to Buildroot
buildroot login:
=====
Even when the "quiet" kernel parameter is specified, "kexec_core: Starting new kernel" is printed. This message has KERN_EMERG level, but there is no emergency; it is a normal kexec operation, so quiet it down to the appropriate KERN_NOTICE. Machines that have a slow console baud rate benefit from less output. Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Dave Young <[email protected]> Signed-off-by: Will Deacon <[email protected]>
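The change itself is essentially a one-line log-level downgrade, roughly:

	pr_notice("Starting new kernel\n");	/* was pr_emerg() before this change */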
2019-09-25  kexec: bail out upon SIGKILL when allocating memory.  [Tetsuo Handa, 1 file, -0/+2]
syzbot found that a thread can stall for minutes inside kexec_load() after that thread was killed by SIGKILL [1]. It turned out that the reproducer was trying to allocate 2408MB of memory using kimage_alloc_page() from kimage_load_normal_segment(). Let's check for SIGKILL before doing memory allocation. [1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tetsuo Handa <[email protected]> Reported-by: syzbot <[email protected]> Cc: Eric Biederman <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
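A minimal sketch of the kind of check this adds in the page-allocation loop (placement, error value and variable names are illustrative):

	/* inside the kimage segment-loading / page-allocation loop */
	if (fatal_signal_pending(current)) {
		result = -EINTR;	/* illustrative error value */
		goto out;
	}
	page = kimage_alloc_page(image, GFP_HIGHUSER, maddr);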
2019-06-19  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230  [Thomas Gleixner, 1 file, -3/+1]
Based on 2 normalized pattern(s): this source code is licensed under the gnu general public license version 2 see the file copying for more details this source code is licensed under general public license version 2 see extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 52 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Enrico Weigelt <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-05-03  power/suspend: Add function to disable secondaries for suspend  [Nicholas Piggin, 1 file, -2/+2]
This adds a function to disable secondary CPUs for suspend that are not necessarily non-zero / non-boot CPUs. Platforms will be able to use this to suspend using non-zero CPUs. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-12-28  mm: convert totalram_pages and totalhigh_pages variables to atomic  [Arun KS, 1 file, -1/+1]
totalram_pages and totalhigh_pages are made static inline functions. The main motivation was that managed_page_count_lock handling was complicating things. It was discussed at length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seems better to remove the lock and convert the variables to atomic, with preventing potential store-to-read tearing as a bonus. [[email protected]: coding style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arun KS <[email protected]> Suggested-by: Michal Hocko <[email protected]> Suggested-by: Vlastimil Babka <[email protected]> Reviewed-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-12-28  mm: reference totalram_pages and managed_pages once per function  [Arun KS, 1 file, -2/+3]
Patch series "mm: convert totalram_pages, totalhigh_pages and managed pages to atomic", v5. This series converts totalram_pages, totalhigh_pages and zone->managed_pages to atomic variables. totalram_pages, zone->managed_pages and totalhigh_pages updates are protected by managed_page_count_lock, but readers never care about it. Convert these variables to atomic to avoid readers potentially seeing a store tear. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 It seemes better to remove the lock and convert variables to atomic. With the change, preventing poteintial store-to-read tearing comes as a bonus. This patch (of 4): This is in preparation to a later patch which converts totalram_pages and zone->managed_pages to atomic variables. Please note that re-reading the value might lead to a different value and as such it could lead to unexpected behavior. There are no known bugs as a result of the current code but it is better to prevent from them in principle. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arun KS <[email protected]> Reviewed-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-06  kexec: Allocate decrypted control pages for kdump if SME is enabled  [Lianbo Jiang, 1 file, -0/+6]
When SME is enabled in the first kernel, it needs to allocate decrypted pages for kdump because when the kdump kernel boots, these pages need to be accessed decrypted in the initial boot stage, before SME is enabled. [ bp: clean up text. ] Signed-off-by: Lianbo Jiang <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-06-15  kexec: yield to scheduler when loading kimage segments  [Jarrett Farnitano, 1 file, -0/+4]
Without yielding while loading kimage segments, a large initrd will block all other work on the CPU performing the load until it is completed. For example loading an initrd of 200MB on a low power single core system will lock up the system for a few seconds. To increase system responsiveness to other tasks at that time, call cond_resched() in both the crash kernel and normal kernel segment loading loops. I did run into a practical problem. Hardware watchdogs on embedded systems can have short timers on the order of seconds. If the system is locked up for a few seconds with only a single core available, the watchdog may not be pet in a timely fashion. If this happens, the hardware watchdog will fire and reset the system. This really only becomes a problem when you are working with a single core, a decently sized initrd, and have a constrained hardware watchdog. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jarrett Farnitano <[email protected]> Reviewed-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
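The fix itself is small; a sketch of its shape, placed as the text above describes:

	/* at the end of each iteration of the segment-loading loops in
	 * kimage_load_normal_segment() and kimage_load_crash_segment() */
	cond_resched();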
2017-07-18  x86/mm, kexec: Allow kexec to be used with SME  [Tom Lendacky, 1 file, -1/+11]
Provide support so that kexec can be used to boot a kernel when SME is enabled. Support is needed to allocate pages for kexec without encryption. This is needed in order to be able to reboot in the kernel in the same manner as originally booted. Additionally, when shutting down all of the CPUs we need to be sure to flush the caches and then halt. This is needed when booting from a state where SME was not active into a state where SME is active (or vice-versa). Without these steps, it is possible for cache lines to exist for the same physical location but tagged both with and without the encryption bit. This can cause random memory corruption when caches are flushed depending on which cacheline is written last. Signed-off-by: Tom Lendacky <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: Dave Young <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Larry Woodman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Toshimitsu Kani <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/b95ff075db3e7cd545313f2fb609a49619a09625.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <[email protected]>
2017-07-12  kdump: protect vmcoreinfo data under the crash memory  [Xunlei Pang, 1 file, -0/+39]
Currently vmcoreinfo data is updated at boot time in subsys_initcall(); it risks being modified by erroneous code while the system is running. As a result, the dumped vmcore may contain the wrong vmcoreinfo. Later on, when using "crash", "makedumpfile", or other utilities to parse this vmcore, we will probably get a "Segmentation fault" or other unexpected errors. E.g.:
1) wrong code overwrites vmcoreinfo_data;
2) something further crashes the system;
3) kdump is triggered, and we obviously fail to recognize the crash context correctly due to the corrupted vmcoreinfo.
Now, except for vmcoreinfo, all the crash data is well protected (including the cpu note, which is fully updated in the crash path, so its correctness is guaranteed). Given that vmcoreinfo data is a large chunk prepared for kdump, we had better protect it as well. To solve this, we relocate and copy vmcoreinfo_data to the crash memory when kdump is loaded via the kexec syscalls. Because the whole crash memory is protected by the existing arch_kexec_protect_crashkres() mechanism, we naturally protect vmcoreinfo_data from write (even read) access under the kernel direct mapping after kdump is loaded. Since kdump is usually loaded at a very early stage after boot, we can trust the correctness of the copied vmcoreinfo data. On the other hand, we still need to operate on the vmcoreinfo safe copy when a crash happens to generate vmcoreinfo_note again; we rely on vmap() to map out a new kernel virtual address and switch to that new one in the following crash_save_vmcoreinfo(). BTW, we do not touch vmcoreinfo_note, because it is fully updated using the protected vmcoreinfo_data after the crash, which is surely correct, just like the cpu crash note. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Xunlei Pang <[email protected]> Tested-by: Michael Holzheu <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Mahesh Salgaonkar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
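A heavily simplified sketch of the idea described above (the field and buffer names here are illustrative, not the exact upstream symbols):

	/* at kexec load time: copy vmcoreinfo into a page inside the crash memory,
	 * which arch_kexec_protect_crashkres() will then make read-only */
	void *safecopy = page_address(image->vmcoreinfo_page);	/* illustrative field */
	memcpy(safecopy, vmcoreinfo_data, VMCOREINFO_BYTES);

	/* at crash time: the direct mapping of that page may be protected,
	 * so vmap() it elsewhere before regenerating the note */
	vmcoreinfo_data = vmap(&image->vmcoreinfo_page, 1, VM_MAP, PAGE_KERNEL);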
2017-06-30  objtool, x86: Add several functions and files to the objtool whitelist  [Josh Poimboeuf, 1 file, -1/+3]
In preparation for an objtool rewrite which will have broader checks, whitelist functions and files which cause problems because they do unusual things with the stack. These whitelists serve as a TODO list for which functions and files don't yet have undwarf unwinder coverage. Eventually most of the whitelists can be removed in favor of manual CFI hint annotations or objtool improvements. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2017-05-08  ia64: reuse append_elf_note() and final_note() functions  [Hari Bathini, 1 file, -28/+0]
Get rid of multiple definitions of append_elf_note() & final_note() functions. Reuse these functions compiled under CONFIG_CRASH_CORE Also, define Elf_Word and use it instead of generic u32 or the more specific Elf64_Word. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Hari Bathini <[email protected]> Acked-by: Dave Young <[email protected]> Acked-by: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Mahesh Salgaonkar <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-08  crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE  [Hari Bathini, 1 file, -403/+0]
Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and reuse crashkernel parameter for fadump", v4. Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. This patchset removes dependency with CONFIG_KEXEC for crashkernel parameter and vmcoreinfo related code as it can be reused without kexec support. Also, crashkernel parameter is reused instead of fadump_reserve_mem to reserve memory for fadump. The first patch moves crashkernel parameter parsing and vmcoreinfo related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The second patch reuses the definitions of append_elf_note() & final_note() functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump) in powerpc. The next patch reuses crashkernel parameter for reserving memory for fadump, instead of the fadump_reserve_mem parameter. This has the advantage of using all syntaxes crashkernel parameter supports, for fadump as well. The last patch updates fadump kernel documentation about use of crashkernel parameter. This patch (of 5): Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. But currently, code related to vmcoreinfo and parsing of crashkernel parameter is built under CONFIG_KEXEC_CORE. This patch introduces CONFIG_CRASH_CORE and moves the above mentioned code under this config, allowing code reuse without dependency on CONFIG_KEXEC. There is no functional change with this patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Hari Bathini <[email protected]> Acked-by: Dave Young <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Tony Luck <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Mahesh Salgaonkar <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk  [Linus Torvalds, 1 file, -1/+1]
Pull printk updates from Petr Mladek:
- Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven Rostedt as the printk reviewer. This idea came up after the discussion about printk issues at Kernel Summit. It was formulated and discussed at lkml[1].
- Extend a lock-less NMI per-cpu buffers idea to handle recursive printk() calls by Sergey Senozhatsky[2]. It is the first step in sanitizing printk as discussed at Kernel Summit. The change allows to see messages that would normally get ignored or would cause a deadlock. Also it allows to enable lockdep in printk(). This already paid off. The testing in linux-next helped to discover two old problems that were hidden before[3][4].
- Remove unused parameter by Sergey Senozhatsky. Clean up after a past change.
[1] http://lkml.kernel.org/r/[email protected]
[2] http://lkml.kernel.org/r/[email protected]
[3] http://lkml.kernel.org/r/[email protected]
[4] http://lkml.kernel.org/r/[email protected]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  printk: drop call_console_drivers() unused param
  printk: convert the rest to printk-safe
  printk: remove zap_locks() function
  printk: use printk_safe buffers in printk
  printk: report lost messages in printk safe/nmi contexts
  printk: always use deferred printk when flush printk_safe lines
  printk: introduce per-cpu safe_print seq buffer
  printk: rename nmi.c and exported api
  printk: use vprintk_func in vprintk()
  MAINTAINERS: Add printk maintainers
2017-02-08  printk: rename nmi.c and exported api  [Sergey Senozhatsky, 1 file, -1/+1]
A preparation patch for printk_safe work. No functional change. - rename nmi.c to print_safe.c - add `printk_safe' prefix to some (which used both by printk-safe and printk-nmi) of the exported functions. Link: http://lkml.kernel.org/r/[email protected] Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jan Kara <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Calvin Owens <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Hurley <[email protected]> Cc: [email protected] Signed-off-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Petr Mladek <[email protected]>
2017-01-11  kexec: Switch to __pa_symbol  [Laura Abbott, 1 file, -1/+1]
__pa_symbol is the correct api to get the physical address of kernel symbols. Switch to it to allow for better debug checking. Reviewed-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Laura Abbott <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2016-12-14  kexec: add cond_resched into kimage_alloc_crash_control_pages  [zhong jiang, 1 file, -0/+2]
A soft lockup will occur when I run trinity in the kexec_load syscall. The corresponding stack information is as follows:
BUG: soft lockup - CPU#6 stuck for 22s! [trinity-c6:13859]
Kernel panic - not syncing: softlockup: hung tasks
CPU: 6 PID: 13859 Comm: trinity-c6 Tainted: G O L ----V------- 3.10.0-327.28.3.35.zhongjiang.x86_64 #1
Hardware name: Huawei Technologies Co., Ltd. Tecal BH622 V2/BC01SRSA0, BIOS RMIBV386 06/30/2014
Call Trace:
<IRQ>
dump_stack+0x19/0x1b
panic+0xd8/0x214
watchdog_timer_fn+0x1cc/0x1e0
__hrtimer_run_queues+0xd2/0x260
hrtimer_interrupt+0xb0/0x1e0
? call_softirq+0x1c/0x30
local_apic_timer_interrupt+0x37/0x60
smp_apic_timer_interrupt+0x3f/0x60
apic_timer_interrupt+0x6d/0x80
<EOI>
? kimage_alloc_control_pages+0x80/0x270
? kmem_cache_alloc_trace+0x1ce/0x1f0
? do_kimage_alloc_init+0x1f/0x90
kimage_alloc_init+0x12a/0x180
SyS_kexec_load+0x20a/0x260
system_call_fastpath+0x16/0x1b
The first-time allocation of control pages may take too long because crash_res.end can be set to a high value. We need to add cond_resched() to avoid the issue. The patch has been tested and the above issue no longer appears. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: zhong jiang <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Xunlei Pang <[email protected]> Cc: Dave Young <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-14  kexec: export the value of phys_base instead of symbol address  [Baoquan He, 1 file, -3/+0]
Currently on x86_64, the symbol address of phys_base is exported to vmcoreinfo. Dave Anderson complained this is really useless for his Crash implementation, because in the user-space utilities Crash and Makedumpfile, which the exported vmcore information is mainly used for, the value of phys_base is needed to convert the virtual address of an exported kernel symbol to a physical address. Take init_level4_pgt in particular: if we want to access and walk the page table to look up the PA corresponding to a VA, we first need to calculate
page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base;
Right now in Crash and Makedumpfile, we have to analyze the vmcore ELF program headers to get the value of phys_base. As Dave said, it would be preferable if it were readily available in vmcoreinfo rather than depending upon the PT_LOAD semantics. Hence this patch changes to export the value of phys_base instead of its virtual address. People also complained that the KERNEL_IMAGE_SIZE export is x86_64-only and should be moved into the arch-dependent function arch_crash_save_vmcoreinfo. Do that move in this patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Cc: Thomas Garnier <[email protected]> Cc: Baoquan He <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H . Peter Anvin" <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Xunlei Pang <[email protected]> Cc: HATAYAMA Daisuke <[email protected]> Cc: Kees Cook <[email protected]> Cc: Eugene Surovegin <[email protected]> Cc: Dave Young <[email protected]> Cc: AKASHI Takahiro <[email protected]> Cc: Atsushi Kumagai <[email protected]> Cc: Dave Anderson <[email protected]> Cc: Pratyush Anand <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
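In vmcoreinfo terms the change amounts to exporting a number rather than a symbol address; roughly (the exact upstream hunk lives in the x86 arch code and may differ in form):

	/* before: only the address of the variable was exported */
	VMCOREINFO_SYMBOL(phys_base);
	/* after: export its runtime value, which is what crash/makedumpfile need */
	VMCOREINFO_NUMBER(phys_base);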
2016-08-02  kexec: add restriction on kexec_load() segment sizes  [zhong jiang, 1 file, -0/+17]
I hit the following issue when running trinity on my system. The kernel is version 3.4, but mainline has the same issue. The root cause is that the segment size is too large, so the kernel spends too long trying to allocate a page. Other cases will block until the test case quits. Also, OOM conditions will occur.
Call Trace:
__alloc_pages_nodemask+0x14c/0x8f0
alloc_pages_current+0xaf/0x120
kimage_alloc_pages+0x10/0x60
kimage_alloc_control_pages+0x5d/0x270
machine_kexec_prepare+0xe5/0x6c0
? kimage_free_page_list+0x52/0x70
sys_kexec_load+0x141/0x600
? vfs_write+0x100/0x180
system_call_fastpath+0x16/0x1b
The patch changes sanity_check_segment_list() to verify that the usage by all segments does not exceed half of memory. [[email protected]: fix for kexec-return-error-number-directly.patch, update comment] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: zhong jiang <[email protected]> Suggested-by: Eric W. Biederman <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
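A sketch of the kind of check described above, added to sanity_check_segment_list() (the page-count helper and threshold follow the commit text; details may differ from the exact upstream code):

	unsigned long nr_pages = 0;

	for (i = 0; i < nr_segments; i++)
		nr_pages += PAGE_COUNT(image->segment[i].memsz);

	/* refuse loads whose segments would consume more than half of RAM */
	if (nr_pages > totalram_pages / 2)
		return -EINVAL;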
2016-08-02  kexec: add a kexec_crash_loaded() function  [Petr Tesarik, 1 file, -0/+6]
Provide a wrapper function to be used by kernel code to check whether a crash kernel is loaded. It returns the same value that can be seen in /sys/kernel/kexec_crash_loaded by userspace programs. I'm exporting the function, because it will be used by Xen, and it is possible to compile Xen modules separately to enable the use of PV drivers with unmodified bare-metal kernels. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Tesarik <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Eric Biederman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dave Young <[email protected]> Cc: David Vrabel <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
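The wrapper is essentially a one-liner over the existing crash image pointer; a sketch:

	int kexec_crash_loaded(void)
	{
		return !!kexec_crash_image;	/* non-zero once a crash kernel is loaded */
	}
	EXPORT_SYMBOL_GPL(kexec_crash_loaded);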
2016-08-02  kexec: allow architectures to override boot mapping  [Russell King, 1 file, -13/+13]
kexec physical addresses are the boot-time view of the system. For certain ARM systems (such as Keystone 2), the boot view of the system does not match the kernel's view of the system: the boot view uses a special alias in the lower 4GB of the physical address space. To cater for these kinds of setups, we need to translate between the boot view physical addresses and the normal kernel view physical addresses. This patch extracts the current translation points into linux/kexec.h, and allows an architecture to override the functions. Due to the translations required, we unfortunately end up with six translation functions, which are reduced down to four that the architecture can override. [[email protected]: kexec.h needs asm/io.h for phys_to_virt()] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Russell King <[email protected]> Cc: Keerthy <[email protected]> Cc: Pratyush Anand <[email protected]> Cc: Vitaly Andrianov <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Simon Horman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
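A sketch of how such overridable helpers are typically declared in linux/kexec.h: a default identity translation that an architecture can replace (helper names follow the commit's description, but are not guaranteed verbatim):

	#ifndef phys_to_boot_phys
	static inline unsigned long phys_to_boot_phys(phys_addr_t phys)
	{
		return phys;		/* default: boot view == kernel view */
	}
	#endif

	#ifndef boot_phys_to_phys
	static inline phys_addr_t boot_phys_to_phys(unsigned long boot_phys)
	{
		return boot_phys;
	}
	#endif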
2016-08-02  kdump: arrange for paddr_vmcoreinfo_note() to return phys_addr_t  [Russell King, 1 file, -1/+1]
On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB physical on 32-bit architectures, so we need a wider type than "unsigned long" here. Arrange for paddr_vmcoreinfo_note() to return a phys_addr_t, thereby allowing it to be located above 4GB. This makes no difference for kexec-tools, as they already assume a 64-bit type when reading from this file. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Russell King <[email protected]> Reviewed-by: Pratyush Anand <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Keerthy <[email protected]> Cc: Vitaly Andrianov <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Dave Young <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Simon Horman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-02  kexec: ensure user memory sizes do not wrap  [Russell King, 1 file, -0/+2]
Ensure that user memory sizes do not wrap around when validating the user input, which can lead to the following input validation working incorrectly. [[email protected]: fix it for kexec-return-error-number-directly.patch] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Russell King <[email protected]> Reviewed-by: Pratyush Anand <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Keerthy <[email protected]> Cc: Vitaly Andrianov <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Dave Young <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Simon Horman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
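A sketch of the sort of wrap check this refers to, inside the per-segment validation (illustrative, not the exact upstream hunk):

	mstart = image->segment[i].mem;
	mend   = mstart + image->segment[i].memsz;
	/* a segment whose end wraps past the top of the address space is bogus */
	if (mend < mstart)
		return -EADDRNOTAVAIL;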
2016-08-02  kexec: return error number directly  [Minfei Huang, 1 file, -10/+6]
This is a cleanup patch to make kexec clearer by returning the error number directly. The variable result is useless, because no other function's return value is assigned to it. So remove it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minfei Huang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Xunlei Pang <[email protected]> Cc: Atsushi Kumagai <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-05-23  s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()  [Xunlei Pang, 1 file, -9/+2]
Commit 3f625002581b ("kexec: introduce a protection mechanism for the crashkernel reserved memory") is a mechanism for protecting the crash kernel reserved memory similar to the previous crash_map/unmap_reserved_pages() implementation; the new one is more generic in name and cleaner in code (besides, some arches may not be allowed to unmap the pgtable). Therefore, this patch consolidates them, and uses the new arch_kexec_protect(unprotect)_crashkres() to replace the former crash_map/unmap_reserved_pages(), which by now has only been used by S390. The consolidation work needs the crash memory to be mapped initially; this is done in machine_kdump_pm_init(), which runs after reserve_crashkernel(). Once the kdump kernel is loaded, the new arch_kexec_protect_crashkres() implemented for S390 will actually unmap the pgtable like before. Signed-off-by: Xunlei Pang <[email protected]> Signed-off-by: Michael Holzheu <[email protected]> Acked-by: Michael Holzheu <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Minfei Huang <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-05-23  kexec: introduce a protection mechanism for the crashkernel reserved memory  [Xunlei Pang, 1 file, -0/+6]
For cases where some kernel (or module) path stamps on the crash reserved memory (already mapped by the kernel) into which the second kernel's data has been loaded, the kdump kernel will probably fail to boot when a panic happens (or even when it doesn't), leaving the culprit at large. This is unacceptable. The patch introduces a mechanism for detecting such cases:
1) After each crash kexec load, it simply marks the reserved memory regions read-only, since we no longer access them after that. When someone stamps on the region, the first kernel will panic and trigger kdump. The weak arch_kexec_protect_crashkres() is introduced to do the actual protection.
2) To allow multiple loads, once 1) is done we also need to re-mark the reserved memory read-write each time a kdump-related system call is made. The weak arch_kexec_unprotect_crashkres() is introduced to do the actual un-protection.
The architecture can provide its specific implementation by overriding arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres(). Signed-off-by: Xunlei Pang <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Dave Young <[email protected]> Cc: Minfei Huang <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
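The generic side is just a pair of weak no-op hooks that architectures may override; a sketch:

	/* default: no protection; architectures supply real implementations */
	void __weak arch_kexec_protect_crashkres(void)
	{
	}

	void __weak arch_kexec_unprotect_crashkres(void)
	{
	}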
2016-05-20  printk/nmi: flush NMI messages on the system panic  [Petr Mladek, 1 file, -0/+1]
In NMI context, printk() messages are stored into per-CPU buffers to avoid a possible deadlock. They are normally flushed to the main ring buffer via an IRQ work. But the work is never called when the system calls panic() in the very same NMI handler. This patch tries to flush NMI buffers before the crash dump is generated. In this case it does not risk a double release and bails out when the logbuf_lock is already taken. The aim is to get the messages into the main ring buffer when possible. It makes them better accessible in the vmcore. Then the patch tries to flush the buffers second time when other CPUs are down. It might be more aggressive and reset logbuf_lock. The aim is to get the messages available for the consequent kmsg_dump() and console_flush_on_panic() calls. The patch causes vprintk_emit() to be called even in NMI context again. But it is done via printk_deferred() so that the console handling is skipped. Consoles use internal locks and we could not prevent a deadlock easily. They are explicitly called later when the crash dump is not generated, see console_flush_on_panic(). Signed-off-by: Petr Mladek <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Daniel Thompson <[email protected]> Cc: David Miller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Russell King <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-05-19  mm: rename _count, field of the struct page, to _refcount  [Joonsoo Kim, 1 file, -1/+1]
Many developers already know that field for reference count of the struct page is _count and atomic type. They would try to handle it directly and this could break the purpose of page reference count tracepoint. To prevent direct _count modification, this patch rename it to _refcount and add warning message on the code. After that, developer who need to handle reference count will find that field should not be accessed directly. [[email protected]: fix comments, per Vlastimil] [[email protected]: Documentation/vm/transhuge.txt too] [[email protected]: sync ethernet driver changes] Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Berg <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Sunil Goutham <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Manish Chopra <[email protected]> Cc: Yuval Mintz <[email protected]> Cc: Tariq Toukan <[email protected]> Cc: Saeed Mahameed <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-04-28  kexec: export OFFSET(page.compound_head) to find out compound tail page  [Atsushi Kumagai, 1 file, -0/+1]
PageAnon() always looks at the head page to check PAGE_MAPPING_ANON, and a tail page's page->mapping contains just poisoned data since commit 1c290f642101 ("mm: sanitize page->mapping for tail pages"). If makedumpfile checks page->mapping of a compound tail page to distinguish anonymous pages as usual, it must fail on newer kernels. So it's necessary to export OFFSET(page.compound_head) to avoid checking compound tail pages. The problem is that unnecessary hugepages won't be removed from a dump file in kernels 4.5.x and later. This means that extra disk space would be consumed. It's a problem, but not critical. Signed-off-by: Atsushi Kumagai <[email protected]> Acked-by: Dave Young <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-04-28  kexec: update VMCOREINFO for compound_order/dtor  [Atsushi Kumagai, 1 file, -2/+4]
makedumpfile refers page.lru.next to get the order of compound pages for page filtering. However, now the order is stored in page.compound_order, hence VMCOREINFO should be updated to export the offset of page.compound_order. The fact is, page.compound_order was introduced already in kernel 4.0, but the offset of it was the same as page.lru.next until kernel 4.3, so this was not actual problem. The above can be said also for page.lru.prev and page.compound_dtor, it's necessary to detect hugetlbfs pages. Further, the content was changed from direct address to the ID which means dtor. The problem is that unnecessary hugepages won't be removed from a dump file in kernels 4.4.x and later. This means that extra disk space would be consumed. It's a problem, but not critical. Signed-off-by: Atsushi Kumagai <[email protected]> Acked-by: Dave Young <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
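This entry and the previous one boil down to exporting a few struct page field offsets in the VMCOREINFO note, roughly:

	VMCOREINFO_OFFSET(page, compound_head);		/* previous entry */
	VMCOREINFO_OFFSET(page, compound_dtor);
	VMCOREINFO_OFFSET(page, compound_order);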
2016-01-30  kexec: Set IORESOURCE_SYSTEM_RAM for System RAM  [Toshi Kani, 1 file, -3/+5]
Set proper ioresource flags and types for crash kernel reservation areas. Signed-off-by: Toshi Kani <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Dave Young <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: HATAYAMA Daisuke <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luis R. Rodriguez <[email protected]> Cc: Minfei Huang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: [email protected] Cc: [email protected] Cc: linux-mm <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-01-20  kernel/kexec_core.c: use list_for_each_entry_safe in kimage_free_page_list  [Geliang Tang, 1 file, -5/+2]
Use list_for_each_entry_safe() instead of list_for_each_safe() to simplify the code. Signed-off-by: Geliang Tang <[email protected]> Cc: Dave Young <[email protected]> Cc: Vivek Goyal <[email protected]> Acked-by: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-12-19  kexec: Fix race between panic() and crash_kexec()  [Hidehiro Kawai, 1 file, -1/+29]
Currently, panic() and crash_kexec() can be called at the same time. For example (x86 case):
CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
      nmi_shootdown_cpus() // stop other CPUs
CPU 1:
  panic()
    crash_kexec()
      mutex_trylock() // failed to acquire
    smp_send_stop() // stop other CPUs
    infinite loop
If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump fails. In another case:
CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
      <NMI>
      io_check_error()
        panic()
          crash_kexec()
            mutex_trylock() // failed to acquire
          infinite loop
Clearly, this is an undesirable result. To fix this problem, this patch changes crash_kexec() to exclude others by using the panic_cpu atomic. Signed-off-by: Hidehiro Kawai <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Baoquan He <[email protected]> Cc: Dave Young <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: HATAYAMA Daisuke <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Martin Schwidefsky <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Minfei Huang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: x86-ml <[email protected]> Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrs Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
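A sketch of the exclusion idea (close in spirit to the eventual crash_kexec() wrapper, though simplified here):

	void crash_kexec(struct pt_regs *regs)
	{
		int old_cpu, this_cpu;

		/* only one CPU is allowed to execute the crash/panic path */
		this_cpu = raw_smp_processor_id();
		old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
		if (old_cpu == PANIC_CPU_INVALID) {
			__crash_kexec(regs);		/* this CPU owns the crash path */
			atomic_set(&panic_cpu, PANIC_CPU_INVALID);
		}
		/* otherwise another CPU is already panicking or crash-kexec'ing */
	}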
2015-11-06  kexec: use file name as the output message prefix  [Minfei Huang, 1 file, -2/+2]
The kexec output messages lost the "kexec" prefix when Dave Young split the kexec code. Now, we use the file name as the output message prefix. Currently, the format of the output messages is:
[ 140.290795] SYSC_kexec_load: hello, world
[ 140.291534] kexec: sanity_check_segment_list: hello, world
Ideally, the format of the output messages would be:
[ 30.791503] kexec: SYSC_kexec_load, Hello, world
[ 79.182752] kexec_core: sanity_check_segment_list, Hello, world
Remove the custom "kexec" prefix in the output messages. Signed-off-by: Minfei Huang <[email protected]> Acked-by: Dave Young <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
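One common way to get a per-file prefix like that is a pr_fmt() define at the top of the source file; a sketch (the exact define used upstream may differ):

	/* must be defined before any include that pulls in printk.h */
	#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt

		pr_err("sanity_check_segment_list: bad address\n");
		/* prints: kexec_core: sanity_check_segment_list: bad address */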
2015-10-21  kexec/crash: Say which char is the unrecognized  [Borislav Petkov, 1 file, -3/+3]
It is helpful when the crashkernel cmdline parsing routines actually say which character is the unrecognized one. Make them do so. Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Dave Young <[email protected]> Reviewed-by: Joerg Roedel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Baoquan He <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Salter <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: WANG Chao <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-09-10  kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo  [Baoquan He, 1 file, -0/+3]
In x86_64, since v2.6.26 the KERNEL_IMAGE_SIZE is changed to 512M, and accordingly the MODULES_VADDR is changed to 0xffffffffa0000000. However, in v3.12 Kees Cook introduced kaslr to randomise the location of kernel. And the kernel text mapping addr space is enlarged from 512M to 1G. That means now KERNEL_IMAGE_SIZE is variable, its value is 512M when kaslr support is not compiled in and 1G when kaslr support is compiled in. Accordingly the MODULES_VADDR is changed too to be: #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) So when kaslr is compiled in and enabled, the kernel text mapping addr space and modules vaddr space need be adjusted. Otherwise makedumpfile will collapse since the addr for some symbols is not correct. Hence KERNEL_IMAGE_SIZE need be exported to vmcoreinfo and got in makedumpfile to help calculate MODULES_VADDR. Signed-off-by: Baoquan He <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-10  kexec: align crash_notes allocation to make it be inside one physical page  [Baoquan He, 1 file, -1/+22]
People reported that crash_notes in /proc/vmcore were corrupted and this caused kdump failure. With code debugging and logs we found the root cause: the percpu variable crash_notes is allocated across 2 vmalloc pages. Currently percpu is based on vmalloc by default, and vmalloc can't guarantee that 2 contiguous vmalloc pages are also on 2 contiguous physical pages. So when the 1st kernel exports the starting address and size of crash_notes through sysfs like below:
/sys/devices/system/cpu/cpux/crash_notes
/sys/devices/system/cpu/cpux/crash_notes_size
the kdump kernel uses them to get the content of crash_notes. However, the 2nd part may not be in the next neighbouring physical page as expected if crash_notes is allocated across 2 vmalloc pages. That's why nhdr_ptr->n_namesz or nhdr_ptr->n_descsz could be huge in update_note_header_size_elf64() and cause note header merging failures or warnings. In this patch, change to call __alloc_percpu() and pass in an align value obtained by rounding crash_notes_size up to the nearest power of two. This makes sure crash_notes is allocated inside one physical page, since sizeof(note_buf_t) on all arches is smaller than PAGE_SIZE. Meanwhile, add a BUILD_BUG_ON to break the compile if the size is bigger than PAGE_SIZE, since crash_notes would then definitely span 2 pages. That needs to be avoided, and needs to be reported if it's unavoidable. [[email protected]: use correct comment layout] Signed-off-by: Baoquan He <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Cc: Lisa Mitchell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
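A sketch of the allocation described above (close to crash_notes_memory_init() in spirit; the exact upstream code may differ):

	static int __init crash_notes_memory_init(void)
	{
		size_t size = sizeof(note_buf_t);
		size_t align = min_t(size_t, roundup_pow_of_two(size), PAGE_SIZE);

		/* a power-of-two alignment >= size keeps the buffer inside one page */
		BUILD_BUG_ON(sizeof(note_buf_t) > PAGE_SIZE);
		crash_notes = __alloc_percpu(size, align);
		if (!crash_notes) {
			pr_warn("Memory allocation for saving cpu register states failed\n");
			return -ENOMEM;
		}
		return 0;
	}
	subsys_initcall(crash_notes_memory_init);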