|
totalram_pages and totalhigh_pages are made static inline functions.
The main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemed
better to remove the lock and convert the variables to atomic, with the
prevention of potential store-to-read tearing as a bonus.
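A minimal sketch of the accessor pattern this describes (identifiers
follow the conversion described above; treat the details as
illustrative rather than the exact upstream hunk):
#include <linux/atomic.h>
extern atomic_long_t _totalram_pages;
/* readers: one atomic load, no managed_page_count_lock, and no
 * chance of observing a torn store */
static inline unsigned long totalram_pages(void)
{
        return (unsigned long)atomic_long_read(&_totalram_pages);
}
/* writers: update atomically instead of taking the lock */
static inline void totalram_pages_add(long count)
{
        atomic_long_add(count, &_totalram_pages);
}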
[[email protected]: coding style fixes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arun KS <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Reviewed-by: Konstantin Khlebnikov <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: convert totalram_pages, totalhigh_pages and managed
pages to atomic", v5.
This series converts totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.
totalram_pages, zone->managed_pages and totalhigh_pages updates are
protected by managed_page_count_lock, but readers never care about it.
Convert these variables to atomic to avoid readers potentially seeing a
store tear.
The main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 It seems better
to remove the lock and convert the variables to atomic. With the
change, preventing potential store-to-read tearing comes as a bonus.
This patch (of 4):
This is in preparation for a later patch which converts totalram_pages and
zone->managed_pages to atomic variables. Please note that re-reading the
value might lead to a different value and as such it could lead to
unexpected behavior. There are no known bugs as a result of the current
code but it is better to prevent them in principle.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arun KS <[email protected]>
Reviewed-by: Konstantin Khlebnikov <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When SME is enabled in the first kernel, it needs to allocate decrypted
pages for kdump because when the kdump kernel boots, these pages need to
be accessed decrypted in the initial boot stage, before SME is enabled.
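A hedged sketch of the idea: after allocating kexec/kdump control
pages, clear the encryption bit on their kernel mapping so they are
accessed decrypted (set_memory_decrypted() is the x86 helper; the hook
name below is illustrative, not the exact upstream function):
static int kexec_mark_pages_decrypted(struct page *page, int pages)
{
        unsigned long addr = (unsigned long)page_address(page);

        /* x86/SME: clears the encryption bit in the direct mapping */
        return set_memory_decrypted(addr, pages);
}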
[ bp: clean up text. ]
Signed-off-by: Lianbo Jiang <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Without yielding while loading kimage segments, a large initrd will
block all other work on the CPU performing the load until it is
completed. For example, loading a 200MB initrd on a low-power
single-core system will lock up the system for a few seconds.
To increase system responsiveness to other tasks at that time, call
cond_resched() in both the crash kernel and normal kernel segment
loading loops.
I did run into a practical problem. Hardware watchdogs on embedded
systems can have short timers on the order of seconds. If the system is
locked up for a few seconds with only a single core available, the
watchdog may not be petted in a timely fashion. If this happens, the
hardware watchdog will fire and reset the system.
This really only becomes a problem when you are working with a single
core, a decently sized initrd, and have a constrained hardware watchdog.
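In sketch form (a simplified stand-in for the kimage segment copy
loops, not the exact upstream hunk):
static int kimage_load_segment_sketch(struct kexec_segment *seg)
{
        size_t done = 0;

        while (done < seg->memsz) {
                /* ... copy one page of the segment ... */
                done += PAGE_SIZE;

                /* yield between pages so a large initrd cannot
                 * monopolize the CPU and starve the watchdog */
                cond_resched();
        }
        return 0;
}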
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jarrett Farnitano <[email protected]>
Reviewed-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Provide support so that kexec can be used to boot a kernel when SME is
enabled.
Support is needed to allocate pages for kexec without encryption. This
is needed in order to be able to reboot into the kernel in the same
manner as it was originally booted.
Additionally, when shutting down all of the CPUs we need to be sure to
flush the caches and then halt. This is needed when booting from a state
where SME was not active into a state where SME is active (or vice-versa).
Without these steps, it is possible for cache lines to exist for the same
physical location but tagged both with and without the encryption bit. This
can cause random memory corruption when caches are flushed depending on
which cacheline is written last.
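A sketch of the CPU-offline sequence described above (simplified; the
real x86 stop path is more involved):
static void stop_this_cpu_sketch(void)
{
        local_irq_disable();

        /* write back and invalidate caches so no stale lines survive
         * with the wrong encryption-bit tagging across the kexec */
        native_wbinvd();

        for (;;)
                native_halt();
}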
Signed-off-by: Tom Lendacky <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Toshimitsu Kani <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/b95ff075db3e7cd545313f2fb609a49619a09625.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Currently vmcoreinfo data is updated at boot time via subsys_initcall(),
so it risks being modified by buggy code while the system is running.
As a result, the dumped vmcore may contain wrong vmcoreinfo. Later on,
when using the "crash", "makedumpfile", etc. utilities to parse this
vmcore, we will probably get a "Segmentation fault" or other unexpected
errors.
E.g. 1) wrong code overwrites vmcoreinfo_data; 2) the system further
crashes; 3) kdump is triggered, and then we will obviously fail to
recognize the crash context correctly due to the corrupted vmcoreinfo.
Now, except for vmcoreinfo, all the crash data is well protected
(including the cpu note, which is fully updated in the crash path, so
its correctness is guaranteed). Given that vmcoreinfo data is a large
chunk prepared for kdump, we had better protect it as well.
To solve this, we relocate and copy vmcoreinfo_data to the crash memory
when kdump is loaded via the kexec syscalls. Because the whole crash
memory is protected by the existing arch_kexec_protect_crashkres()
mechanism, we naturally protect vmcoreinfo_data from write (even read)
access via the kernel direct mapping after kdump is loaded.
Since kdump is usually loaded at a very early stage after boot, we can
trust the correctness of the copied vmcoreinfo data.
On the other hand, we still need to access the safe copy of vmcoreinfo
when a crash happens in order to regenerate vmcoreinfo_note; we rely on
vmap() to map a new kernel virtual address for it and switch to that
new address in the following crash_save_vmcoreinfo().
Note that we do not protect vmcoreinfo_note itself, because it is fully
updated from the protected vmcoreinfo_data after the crash, so it is
surely correct, just like the cpu crash note.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Xunlei Pang <[email protected]>
Tested-by: Michael Holzheu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In preparation for an objtool rewrite which will have broader checks,
whitelist functions and files which cause problems because they do
unusual things with the stack.
These whitelists serve as a TODO list for which functions and files
don't yet have undwarf unwinder coverage. Eventually most of the
whitelists can be removed in favor of manual CFI hint annotations or
objtool improvements.
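Per-function whitelisting uses the STACK_FRAME_NON_STANDARD()
annotation, sketched below (the function name is illustrative); the
per-file form is an OBJECT_FILES_NON_STANDARD_foo.o := y line in the
Makefile:
#include <linux/frame.h>

static void my_unusual_stack_func(void) { /* ... */ }
/* tell objtool not to validate this function's stack usage */
STACK_FRAME_NON_STANDARD(my_unusual_stack_func);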
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Get rid of multiple definitions of append_elf_note() & final_note()
functions. Reuse these functions compiled under CONFIG_CRASH_CORE.
Also, define Elf_Word and use it instead of the generic u32 or the more
specific Elf64_Word.
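The shared prototypes end up looking roughly like this (a hedged
sketch; note that Elf64_Word is also 32 bits wide, which is why a
common Elf_Word works for both ELF classes):
/* in a common header, compiled once under CONFIG_CRASH_CORE */
Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
                          void *data, size_t data_len);
void final_note(Elf_Word *buf);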
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Hari Bathini <[email protected]>
Acked-by: Dave Young <[email protected]>
Acked-by: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Mahesh Salgaonkar <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and
reuse crashkernel parameter for fadump", v4.
Traditionally, kdump is used to save vmcore in case of a crash. Some
architectures like powerpc can save vmcore using architecture specific
support instead of kexec/kdump mechanism. Such architecture specific
support also needs to reserve memory, to be used by dump capture kernel.
The crashkernel parameter can be reused for memory reservation by such
architecture-specific infrastructure.
This patchset removes the dependency on CONFIG_KEXEC for the
crashkernel parameter and vmcoreinfo-related code, as they can be
reused without kexec support. Also, the crashkernel parameter is reused
instead of fadump_reserve_mem to reserve memory for fadump.
The first patch moves crashkernel parameter parsing and vmcoreinfo
related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The
second patch reuses the definitions of append_elf_note() & final_note()
functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch
removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
in powerpc. The next patch reuses the crashkernel parameter for
reserving memory for fadump, instead of the fadump_reserve_mem
parameter. This has the advantage of making all the syntaxes the
crashkernel parameter supports available to fadump as well. The last
patch updates the fadump kernel documentation about the use of the
crashkernel parameter.
This patch (of 5):
Traditionally, kdump is used to save vmcore in case of a crash. Some
architectures like powerpc can save vmcore using architecture specific
support instead of kexec/kdump mechanism. Such architecture specific
support also needs to reserve memory, to be used by dump capture kernel.
The crashkernel parameter can be reused for memory reservation by such
architecture-specific infrastructure.
But currently, code related to vmcoreinfo and parsing of crashkernel
parameter is built under CONFIG_KEXEC_CORE. This patch introduces
CONFIG_CRASH_CORE and moves the above mentioned code under this config,
allowing code reuse without dependency on CONFIG_KEXEC. There is no
functional change with this patch.
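Sketched as a preprocessor view (illustrative, not the exact hunks):
the relocated declarations now sit behind the new option, and
KEXEC_CORE selects CRASH_CORE so existing configs keep working.
#ifdef CONFIG_CRASH_CORE
/* crashkernel= parsing and vmcoreinfo, usable without kexec */
int parse_crashkernel(char *cmdline, unsigned long long system_ram,
                      unsigned long long *crash_size,
                      unsigned long long *crash_base);
void crash_save_vmcoreinfo(void);
#endif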
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Hari Bathini <[email protected]>
Acked-by: Dave Young <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Mahesh Salgaonkar <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Add Petr Mladek and Sergey Senozhatsky as printk maintainers, and Steven
Rostedt as the printk reviewer. This idea came up after the
discussion about printk issues at Kernel Summit. It was formulated
and discussed at lkml[1].
- Extend a lock-less NMI per-cpu buffers idea to handle recursive
printk() calls by Sergey Senozhatsky[2]. It is the first step in
sanitizing printk as discussed at Kernel Summit.
The change makes it possible to see messages that would normally get
ignored or would cause a deadlock.
It also allows enabling lockdep in printk(). This has already paid off.
The testing in linux-next helped to discover two old problems that
were hidden before[3][4].
- Remove unused parameter by Sergey Senozhatsky. Clean up after a past
change.
[1] http://lkml.kernel.org/r/[email protected]
[2] http://lkml.kernel.org/r/[email protected]
[3] http://lkml.kernel.org/r/[email protected]
[4] http://lkml.kernel.org/r/[email protected]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: drop call_console_drivers() unused param
printk: convert the rest to printk-safe
printk: remove zap_locks() function
printk: use printk_safe buffers in printk
printk: report lost messages in printk safe/nmi contexts
printk: always use deferred printk when flush printk_safe lines
printk: introduce per-cpu safe_print seq buffer
printk: rename nmi.c and exported api
printk: use vprintk_func in vprintk()
MAINTAINERS: Add printk maintainers
|
|
A preparation patch for printk_safe work. No functional change.
- rename nmi.c to printk_safe.c
- add a `printk_safe' prefix to some of the exported functions (those
used by both printk-safe and printk-nmi).
Link: http://lkml.kernel.org/r/[email protected]
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Calvin Owens <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Hurley <[email protected]>
Cc: [email protected]
Signed-off-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
|
|
__pa_symbol is the correct API to get the physical address of kernel
symbols. Switch to it to allow for better debug checking.
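For example (illustrative; _text marks the start of kernel text):
static phys_addr_t kernel_text_pa(void)
{
        /* kernel-image symbol: __pa_symbol() is the right conversion
         * and enables the debug checks; plain __pa() is meant for
         * linear-map addresses */
        return __pa_symbol(_text);
}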
Reviewed-by: Mark Rutland <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
A soft lockup occurs when I run trinity on the kexec_load syscall. The
corresponding stack trace is as follows.
BUG: soft lockup - CPU#6 stuck for 22s! [trinity-c6:13859]
Kernel panic - not syncing: softlockup: hung tasks
CPU: 6 PID: 13859 Comm: trinity-c6 Tainted: G O L ----V------- 3.10.0-327.28.3.35.zhongjiang.x86_64 #1
Hardware name: Huawei Technologies Co., Ltd. Tecal BH622 V2/BC01SRSA0, BIOS RMIBV386 06/30/2014
Call Trace:
<IRQ> dump_stack+0x19/0x1b
panic+0xd8/0x214
watchdog_timer_fn+0x1cc/0x1e0
__hrtimer_run_queues+0xd2/0x260
hrtimer_interrupt+0xb0/0x1e0
? call_softirq+0x1c/0x30
local_apic_timer_interrupt+0x37/0x60
smp_apic_timer_interrupt+0x3f/0x60
apic_timer_interrupt+0x6d/0x80
<EOI> ? kimage_alloc_control_pages+0x80/0x270
? kmem_cache_alloc_trace+0x1ce/0x1f0
? do_kimage_alloc_init+0x1f/0x90
kimage_alloc_init+0x12a/0x180
SyS_kexec_load+0x20a/0x260
system_call_fastpath+0x16/0x1b
The first-time allocation of control pages may take too long because
crash_res.end can be set to a high value. We need to add cond_resched()
to avoid the issue.
The patch has been tested and the above issue no longer appears.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: zhong jiang <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Cc: Xunlei Pang <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently in x86_64, the symbol address of phys_base is exported to
vmcoreinfo. Dave Anderson complained that this is really useless for
his Crash implementation, because in the user-space utilities Crash and
Makedumpfile, which the exported vmcore information is mainly used for,
the value of phys_base is needed to convert the virtual address of an
exported kernel symbol to a physical address. Take init_level4_pgt
especially: if we want to access and walk the page table to look up the
PA corresponding to a VA, we first need to calculate
page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base;
Now in Crash and Makedumpfile, we have to analyze the vmcore ELF
program headers to get the value of phys_base. As Dave said, it would
be preferable if it were readily available in vmcoreinfo rather than
depending upon PT_LOAD semantics.
Hence this patch changes to exporting the value of phys_base instead of
its virtual address.
People also complained that the KERNEL_IMAGE_SIZE export is x86_64 only
and should be moved into the arch-dependent function
arch_crash_save_vmcoreinfo(). Do that moving in this patch too.
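A sketch of the resulting export and of how a dump tool would use it
(the VMCOREINFO_NUMBER() form and its placement are illustrative of the
mechanism, not a quote of the exact hunk):
/* x86_64 arch_crash_save_vmcoreinfo(): export the value, not the
 * symbol address */
VMCOREINFO_NUMBER(phys_base);

/* consumer side (crash/makedumpfile), for kernel-text symbols: */
pa = va - __START_KERNEL_map + phys_base;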
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Xunlei Pang <[email protected]>
Cc: HATAYAMA Daisuke <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Eugene Surovegin <[email protected]>
Cc: Dave Young <[email protected]>
Cc: AKASHI Takahiro <[email protected]>
Cc: Atsushi Kumagai <[email protected]>
Cc: Dave Anderson <[email protected]>
Cc: Pratyush Anand <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I hit the following issue when running trinity on my system. The
kernel is version 3.4, but mainline has the same issue.
The root cause is that the segment size is too large, so the kernel
spends too long trying to allocate pages. Other tasks will block until
the test case quits. Also, OOM conditions will occur.
Call Trace:
__alloc_pages_nodemask+0x14c/0x8f0
alloc_pages_current+0xaf/0x120
kimage_alloc_pages+0x10/0x60
kimage_alloc_control_pages+0x5d/0x270
machine_kexec_prepare+0xe5/0x6c0
? kimage_free_page_list+0x52/0x70
sys_kexec_load+0x141/0x600
? vfs_write+0x100/0x180
system_call_fastpath+0x16/0x1b
The patch changes sanity_check_segment_list() to verify that the usage by
all segments does not exceed half of memory.
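The added check, in sketch form (the helper name and exact placement in
sanity_check_segment_list() are illustrative; totalram_pages was still
a plain global at the time):
static int segments_fit_in_memory(struct kimage *image,
                                  unsigned long nr_segments)
{
        unsigned long nr_pages = 0, i;

        for (i = 0; i < nr_segments; i++) {
                unsigned long mstart = image->segment[i].mem;
                unsigned long mend = mstart + image->segment[i].memsz;

                nr_pages += (mend - mstart) >> PAGE_SHIFT;
        }

        /* refuse loads whose segments would consume over half of RAM */
        return nr_pages > totalram_pages / 2 ? -EINVAL : 0;
}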
[[email protected]: fix for kexec-return-error-number-directly.patch, update comment]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: zhong jiang <[email protected]>
Suggested-by: Eric W. Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Provide a wrapper function to be used by kernel code to check whether a
crash kernel is loaded. It returns the same value that can be seen in
/sys/kernel/kexec_crash_loaded by userspace programs.
I'm exporting the function, because it will be used by Xen, and it is
possible to compile Xen modules separately to enable the use of PV
drivers with unmodified bare-metal kernels.
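The wrapper itself is tiny; roughly:
int kexec_crash_loaded(void)
{
        return !!kexec_crash_image;
}
EXPORT_SYMBOL_GPL(kexec_crash_loaded);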
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Tesarik <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Dave Young <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
kexec physical addresses are the boot-time view of the system. For
certain ARM systems (such as Keystone 2), the boot view of the system
does not match the kernel's view of the system: the boot view uses a
special alias in the lower 4GB of the physical address space.
To cater for these kinds of setups, we need to translate between the
boot view physical addresses and the normal kernel view physical
addresses. This patch extracts the current translation points into
linux/kexec.h, and allows an architecture to override the functions.
Due to the translations required, we unfortunately end up with six
translation functions, which are reduced down to four that the
architecture can override.
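Two of the overridable defaults, sketched (on most systems the boot
view equals the kernel view, so the defaults are identity conversions;
Keystone 2-style platforms override them):
#ifndef phys_to_boot_phys
static inline unsigned long phys_to_boot_phys(phys_addr_t phys)
{
        return phys;
}
#endif

#ifndef boot_phys_to_phys
static inline phys_addr_t boot_phys_to_phys(unsigned long boot_phys)
{
        return boot_phys;
}
#endif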
[[email protected]: kexec.h needs asm/io.h for phys_to_virt()]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Russell King <[email protected]>
Cc: Keerthy <[email protected]>
Cc: Pratyush Anand <[email protected]>
Cc: Vitaly Andrianov <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Simon Horman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
physical on 32-bit architectures, so we need a wider type than "unsigned
long" here. Arrange for paddr_vmcoreinfo_note() to return a
phys_addr_t, thereby allowing it to be located above 4GB.
This makes no difference for kexec-tools, as they already assume a
64-bit type when reading from this file.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Pratyush Anand <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Keerthy <[email protected]>
Cc: Vitaly Andrianov <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Simon Horman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Ensure that user memory sizes do not wrap around when validating the
user input; otherwise the subsequent input validation can work
incorrectly.
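The shape of the check (a sketch; helper name illustrative): reject any
segment whose end address, computed as start + size, wraps back below
its start.
static int segment_does_not_wrap(const struct kexec_segment *seg)
{
        unsigned long mstart = seg->mem;
        unsigned long mend = mstart + seg->memsz;

        /* mend < mstart means start + size wrapped the address space */
        return mend >= mstart ? 0 : -EADDRNOTAVAIL;
}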
[[email protected]: fix it for kexec-return-error-number-directly.patch]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Pratyush Anand <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Keerthy <[email protected]>
Cc: Vitaly Andrianov <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Simon Horman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is a cleanup patch to make kexec clearer by returning the error
number directly. The variable result is useless, because no other
function's return value is assigned to it. So remove it.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Minfei Huang <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Xunlei Pang <[email protected]>
Cc: Atsushi Kumagai <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 3f625002581b ("kexec: introduce a protection mechanism for the
crashkernel reserved memory") added a mechanism for protecting the
crash kernel reserved memory similar to the previous
crash_map/unmap_reserved_pages() implementation; the new one is more
generic in name and cleaner in code (besides, some arches may not be
allowed to unmap the pgtable).
Therefore, this patch consolidates them, and uses the new
arch_kexec_protect(unprotect)_crashkres() to replace the former
crash_map/unmap_reserved_pages(), which by now is only used by s390.
The consolidation work needs the crash memory to be mapped initially;
this is done in machine_kdump_pm_init(), which runs after
reserve_crashkernel(). Once the kdump kernel is loaded, the new
arch_kexec_protect_crashkres() implemented for s390 will actually
unmap the pgtable like before.
Signed-off-by: Xunlei Pang <[email protected]>
Signed-off-by: Michael Holzheu <[email protected]>
Acked-by: Michael Holzheu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Minfei Huang <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For the cases where some kernel (module) path stamps on the crash
reserved memory (already mapped by the kernel) into which the second
kernel's data has been loaded, the kdump kernel will probably fail to
boot when a panic happens (or even before one happens), leaving the
culprit at large. This is unacceptable.
The patch introduces a mechanism for detecting such cases:
1) After each crash kexec loading, it simply marks the reserved memory
regions readonly since we no longer access it after that. When someone
stamps the region, the first kernel will panic and trigger the kdump.
The weak arch_kexec_protect_crashkres() is introduced to do the actual
protection.
2) To allow multiple loading, once 1) was done we also need to remark
the reserved memory to readwrite each time a system call related to
kdump is made. The weak arch_kexec_unprotect_crashkres() is introduced
to undo the protection.
The architecture can make its specific implementation by overriding
arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres().
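The weak defaults are no-ops, so architectures without an
implementation are unaffected:
void __weak arch_kexec_protect_crashkres(void)
{
}

void __weak arch_kexec_unprotect_crashkres(void)
{
}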
Signed-off-by: Xunlei Pang <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Minfei Huang <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In NMI context, printk() messages are stored into per-CPU buffers to
avoid a possible deadlock. They are normally flushed to the main ring
buffer via an IRQ work. But the work is never called when the system
calls panic() in the very same NMI handler.
This patch tries to flush NMI buffers before the crash dump is
generated. In this case it does not risk a double release and bails out
when the logbuf_lock is already taken. The aim is to get the messages
into the main ring buffer when possible. It makes them better
accessible in the vmcore.
Then the patch tries to flush the buffers a second time when other CPUs
are down. It might be more aggressive and reset logbuf_lock. The aim
is to get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.
The patch causes vprintk_emit() to be called even in NMI context again.
But it is done via printk_deferred() so that the console handling is
skipped. Consoles use internal locks and we could not prevent a
deadlock easily. They are explicitly called later when the crash dump
is not generated, see console_flush_on_panic().
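Roughly, the careful first flush looks like this (names follow the
printk-nmi code of that era; treat it as a sketch):
void printk_nmi_flush_on_panic(void)
{
        /* flushing needs logbuf_lock; if it is already held we would
         * risk a deadlock or a double release, so bail out */
        if (raw_spin_is_locked(&logbuf_lock))
                return;

        printk_nmi_flush();
}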
Signed-off-by: Petr Mladek <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: David Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Many developers already know that the field for the reference count of
struct page is _count and that it is an atomic type. They would try to
handle it directly, and this could break the purpose of the page
reference count tracepoint. To prevent direct _count modification, this
patch renames it to _refcount and adds a warning message in the code.
After that, developers who need to handle the reference count will find
that the field should not be accessed directly.
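Instead of touching the field, callers go through the page_ref_*
helpers (page_ref_count(), page_ref_inc(), page_ref_dec(), ...), which
keep the tracepoints working. An illustrative sketch:
static bool page_is_pinned(struct page *page)
{
        /* read the refcount only through the helper, never via
         * page->_refcount directly */
        return page_ref_count(page) > 1;
}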
[[email protected]: fix comments, per Vlastimil]
[[email protected]: Documentation/vm/transhuge.txt too]
[[email protected]: sync ethernet driver changes]
Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Sunil Goutham <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Manish Chopra <[email protected]>
Cc: Yuval Mintz <[email protected]>
Cc: Tariq Toukan <[email protected]>
Cc: Saeed Mahameed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
PageAnon() always looks at the head page to check PAGE_MAPPING_ANON,
and a tail page's page->mapping holds just poisoned data since commit
1c290f642101 ("mm: sanitize page->mapping for tail pages").
If makedumpfile checks page->mapping of a compound tail page to
distinguish anonymous pages as usual, it must fail on newer kernels.
So it's necessary to export OFFSET(page.compound_head) to avoid
checking compound tail pages.
The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.5.x and later. This means that extra disk space would
be consumed. It's a problem, but not critical.
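The export itself is a single line in crash_save_vmcoreinfo_init()
(placement illustrative):
/* lets makedumpfile recognize compound tail pages and skip them */
VMCOREINFO_OFFSET(page, compound_head);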
Signed-off-by: Atsushi Kumagai <[email protected]>
Acked-by: Dave Young <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
makedumpfile refers to page.lru.next to get the order of compound pages
for page filtering.
However, now the order is stored in page.compound_order, hence
VMCOREINFO should be updated to export the offset of
page.compound_order.
In fact, page.compound_order was already introduced in kernel 4.0, but
its offset was the same as page.lru.next until kernel 4.3, so this was
not an actual problem.
The same applies to page.lru.prev and page.compound_dtor, which are
necessary to detect hugetlbfs pages. Further, the content of
compound_dtor was changed from a direct address to an ID identifying
the dtor.
The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.4.x and later. This means that extra disk space would
be consumed. It's a problem, but not critical.
Signed-off-by: Atsushi Kumagai <[email protected]>
Acked-by: Dave Young <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Set proper ioresource flags and types for crash kernel
reservation areas.
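Roughly what the reservation ends up described as (field spellings per
the resource API; treat this as a sketch):
struct resource crashk_res = {
        .name  = "Crash kernel",
        .start = 0,
        .end   = 0,
        .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
        .desc  = IORES_DESC_CRASH_KERNEL
};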
Signed-off-by: Toshi Kani <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Dave Young <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: HATAYAMA Daisuke <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Minfei Huang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: linux-mm <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use list_for_each_entry_safe() instead of list_for_each_safe() to
simplify the code.
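Before and after, sketched over kimage_free_page_list()-style code
(function names illustrative):
/* before: iterate raw list_heads, then list_entry() by hand */
struct list_head *pos, *next;

list_for_each_safe(pos, next, list) {
        struct page *page = list_entry(pos, struct page, lru);

        list_del(&page->lru);
        kimage_free_pages(page);
}

/* after: the entry variant hides the list_entry() boilerplate */
struct page *page, *npage;

list_for_each_entry_safe(page, npage, list, lru) {
        list_del(&page->lru);
        kimage_free_pages(page);
}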
Signed-off-by: Geliang Tang <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Acked-by: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):
CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
nmi_shootdown_cpus() // stop other CPUs
CPU 1:
panic()
crash_kexec()
mutex_trylock() // failed to acquire
smp_send_stop() // stop other CPUs
infinite loop
If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.
In another case:
CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
<NMI>
io_check_error()
panic()
crash_kexec()
mutex_trylock() // failed to acquire
infinite loop
Clearly, this is an undesirable result.
To fix this problem, this patch changes crash_kexec() to exclude others
by using the panic_cpu atomic.
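Sketch of the exclusion (this follows the mechanism the patch
describes): only the CPU that wins the atomic_cmpxchg() on panic_cpu
proceeds into the real crash kexec.
void crash_kexec(struct pt_regs *regs)
{
        int old_cpu, this_cpu;

        /* only one CPU is allowed to execute the crash_kexec() path */
        this_cpu = raw_smp_processor_id();
        old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
        if (old_cpu == PANIC_CPU_INVALID) {
                __crash_kexec(regs);
                atomic_set(&panic_cpu, PANIC_CPU_INVALID);
        }
}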
Signed-off-by: Hidehiro Kawai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: HATAYAMA Daisuke <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Martin Schwidefsky <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Minfei Huang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrs
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The kexec output messages miss the prefix "kexec" since Dave Young
split the kexec code. Now, we use the file name as the output message
prefix.
Currently, the format of output message:
[ 140.290795] SYSC_kexec_load: hello, world
[ 140.291534] kexec: sanity_check_segment_list: hello, world
Ideally, the format of output message:
[ 30.791503] kexec: SYSC_kexec_load, Hello, world
[ 79.182752] kexec_core: sanity_check_segment_list, Hello, world
Remove the custom prefix "kexec" from the output messages.
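The usual idiom for such a per-file prefix, sketched (whether kexec
uses KBUILD_MODNAME or a literal string here is version-dependent):
/* at the top of the .c file, before any printk-using include */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* pr_info("%s: hello\n", __func__) then prints e.g.
 * "kexec_core: sanity_check_segment_list: hello" */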
Signed-off-by: Minfei Huang <[email protected]>
Acked-by: Dave Young <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is helpful when the crashkernel cmdline parsing routines
actually say which character is the unrecognized one. Make them
do so.
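Roughly the kind of change involved (a sketch; the
parse_crashkernel_mem()-style check and message text are illustrative):
/* on bad input, name the offending character instead of failing
 * silently */
if (*cur != ' ' && *cur != '@') {
        pr_warn("crashkernel: unrecognized char: %c\n", *cur);
        return;
}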
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Dave Young <[email protected]>
Reviewed-by: Joerg Roedel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: WANG Chao <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In x86_64, since v2.6.26 KERNEL_IMAGE_SIZE was changed to 512M, and
accordingly MODULES_VADDR was changed to 0xffffffffa0000000. However,
in v3.12 Kees Cook introduced kaslr to randomise the location of the
kernel, and the kernel text mapping address space was enlarged from
512M to 1G. That means KERNEL_IMAGE_SIZE is now variable: its value is
512M when kaslr support is not compiled in and 1G when it is.
Accordingly the MODULES_VADDR is changed too to be:
#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
So when kaslr is compiled in and enabled, the kernel text mapping
address space and the modules vaddr space need to be adjusted.
Otherwise makedumpfile will fail, since the address of some symbols is
not correct.
Hence KERNEL_IMAGE_SIZE needs to be exported to vmcoreinfo and read by
makedumpfile to help calculate MODULES_VADDR.
Signed-off-by: Baoquan He <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
People reported that crash_notes in /proc/vmcore were corrupted and
this caused kdump failure. With code debugging and logs we found the
root cause: the percpu variable crash_notes can be allocated across 2
vmalloc pages. Currently percpu is based on vmalloc by default, and
vmalloc can't guarantee that 2 contiguous vmalloc pages are also on 2
contiguous physical pages. So when the 1st kernel exports the starting
address and size of crash_notes through sysfs like below:
/sys/devices/system/cpu/cpux/crash_notes
/sys/devices/system/cpu/cpux/crash_notes_size
the kdump kernel uses them to get the content of crash_notes. However,
the 2nd part may not be in the next neighbouring physical page as we
expect if crash_notes is allocated across 2 vmalloc pages. That's why
nhdr_ptr->n_namesz or nhdr_ptr->n_descsz could be huge in
update_note_header_size_elf64() and cause note header merging failures
or warnings.
This patch changes the allocation to __alloc_percpu(), passing in an
align value obtained by rounding crash_notes_size up to the nearest
power of two. This makes sure crash_notes is allocated inside one
physical page, since sizeof(note_buf_t) on all arches is smaller than
PAGE_SIZE. Meanwhile, add a BUILD_BUG_ON to break the compile if the
size is bigger than PAGE_SIZE, since crash_notes would then definitely
span 2 pages. That needs to be avoided, and reported if it's
unavoidable.
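Sketched, the allocation becomes (this mirrors the description above):
/* align to the next power of two >= size so the notes buffer can
 * never straddle a page boundary */
size_t size = sizeof(note_buf_t);
size_t align = min_t(size_t, roundup_pow_of_two(size), PAGE_SIZE);

/* if note_buf_t ever outgrows a page, fail the build loudly */
BUILD_BUG_ON(sizeof(note_buf_t) > PAGE_SIZE);

crash_notes = __alloc_percpu(size, align);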
[[email protected]: use correct comment layout]
Signed-off-by: Baoquan He <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Lisa Mitchell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Transforming a PFN (Page Frame Number) to a struct page never fails, so
we can simplify the code to do the image->control_page assignment
directly in the loop and remove the unnecessary conditional.
Signed-off-by: Minfei Huang <[email protected]>
Acked-by: Dave Young <[email protected]>
Acked-by: Vivek Goyal <[email protected]>
Cc: Simon Horman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are two kexec load syscalls, kexec_load and kexec_file_load.
kexec_file_load has been split out into kernel/kexec_file.c. In this
patch I split the kexec_load syscall code out into kernel/kexec.c.
Also add a new kconfig option KEXEC_CORE, so we can disable kexec_load
and use kexec_file_load only, or vice versa.
The original requirement is from Ted Ts'o: he wants the kexec kernel
signature to be checked with CONFIG_KEXEC_VERIFY_SIG enabled. But
kexec-tools, using the kexec_load syscall, can bypass the check.
Vivek Goyal proposed to create a common kconfig option so users can
compile in only one syscall for loading the kexec kernel. KEXEC and
KEXEC_FILE select KEXEC_CORE so that old config files still work.
Because some generic code needs CONFIG_KEXEC_CORE, I updated all the
architecture Kconfigs with the new option KEXEC_CORE, and let KEXEC
select KEXEC_CORE in the arch Kconfig. Also update the general kernel
code accordingly for the kexec_load syscall.
[[email protected]: coding-style fixes]
Signed-off-by: Dave Young <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Petr Tesarik <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Josh Boyer <[email protected]>
Cc: David Howells <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|