Age | Commit message (Collapse) | Author | Files | Lines |
|
Use the fwnode to create a named domain so diagnosis works, but only when
the the ioapic is not device tree based.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Use the fwnode to create a named domain so diagnosis works.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Add the missing name, so debugging will work proper.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
tsc_clocksource_reliable is initialized in check_system_tsc_reliable(), but
it is checked in unsynchronized_tsc() which is called before the
initialization.
In practice that's not an issue because systems which mark the TSC
reliable have X86_FEATURE_CONSTANT_TSC set as well, which is evaluated
in unsynchronized_tsc() before tsc_clocksource_reliable.
Reorder the calls so initialization happens before usage.
[ tglx: Massaged changelog ]
Signed-off-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/b1532ef7-cd9f-45f7-9f49-48dd2a5c2495@default
|
|
It was found that SMI_TRESHOLD of 50000 is not enough for Hyper-V
guests in nested environment and falling back to counting jiffies
is not an option for Gen2 guests as they don't have PIT. As Hyper-V
provides TSC frequency in a synthetic MSR we can just use this information
instead of doing a error prone calibration.
Reported-and-tested-by: Ladi Prosek <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Jork Loeser <[email protected]>
Cc: [email protected]
Cc: "K. Y. Srinivasan" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Hyper-V TLFS specifies two bits which should be checked before accessing
frequency MSRs:
- AccessFrequencyMsrs (BIT(11) in EAX) which indicates if we have access to
frequency MSRs.
- FrequencyMsrsAvailable (BIT(8) in EDX) which indicates is these MSRs are
present.
Rename and specify these bits accordingly.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Ladi Prosek <[email protected]>
Cc: Jork Loeser <[email protected]>
Cc: [email protected]
Cc: "K. Y. Srinivasan" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The following change in 2013:
0212f9159694 ("x86: Add Crash kernel low reservation")
... introduced reserve_crashkernel_low(). This function is used to
reserve crash kernel memory either if crashkernel=size,low is given
on the command line or if the region reserved by reserve_crashkernel
is entirely above 4G.
reserve_crashkernel_low() tries to find a block of 'low_size' bytes.
But there seems to be no good reason to restrict the lower bound
of the range to 'low_size'.
Make memblock_find_in_range() search from the start of memory.
Signed-off-by: Jiri Bohac <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The only call site also calls idle_task_exit(), and idle_task_exit()
puts us into a clean state by explicitly switching to init_mm.
Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/3acc7ad02a2ec060d2321a1e0f6de1cb90069517.1498022414.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
These two functions are only called by arch_early_irq_init(), which
is an __init function, so mark them __init as well.
Signed-off-by: Dou Liyang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This function is only called by arch_early_irq_init(), which is an
__init function, so mark the child function __init as well.
In addition mark it inline for the !CONFIG_X86_IO_APIC case.
Signed-off-by: Dou Liyang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When running under Xen as dom0, /dev/mcelog is being provided by Xen
instead of the normal mcelog character device of the MCE core. Convert
an error message being issued by the MCE core in this case to an
informative message that Xen has registered the device.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Put __startup_64() and fixup_pointer() into .head.text section to make
sure it's always near startup_64() and always callable.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Normally, when the initrd is gone, we can't search it for microcode
blobs to apply anymore. For that we need to stash away the patch in our
own storage.
And save_microcode_in_initrd_intel() looks like the proper place to
do that from. So in order for early loading to work, invalidate the
intel_ucode_patch pointer to the patch *before* scanning the initrd one
last time.
If the scanning code finds a microcode patch, it will assign that
pointer again, this time with our own storage's address.
This way, early microcode application during resume-from-RAM works too,
even after the initrd is long gone.
Tested-by: Dominik Brodowski <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Early during boot, the BSP finds the ramdisk's position from boot_params
but by the time the APs get to boot, the BSP has continued in the mean
time and has potentially managed to relocate that ramdisk.
And in that case, the APs need to find the ramdisk at its new position,
in *physical* memory as they're running before paging has been enabled.
Thus, get the updated physical location of the ramdisk which is in the
relocated_ramdisk variable.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
We're supposed to exit the loop with "timeout" set to zero.
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Don Zickus <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: 99e8b9ca90d6 ("x86, NMI: Add NMI IPI selftest")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Conflicts:
kernel/sched/Makefile
Pick up the waitqueue related renames - it didn't get much feedback,
so it appears to be uncontroversial. Famous last words? ;-)
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When hpet=force is supplied on the kernel command line and the HPET
supports the Legacy Replacement Interrupt Route option (HPET_ID_LEGSUP),
the legacy interrupts init code uses the boot CPU's mask initially by
calling smp_processor_id() assuming that it is running on the BSP.
It does run on the BSP but the code region is preemptible and the
preemption check fires.
Simply use the BSP's id directly to avoid the warning.
Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <[email protected]>
Original-patch-by: Michal Hocko <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I made a mistake in commit bfd20f1. We should skip the force on with the
option enabled instead of vice versa. Not sure why this passed our
performance test, sorry.
Fixes: bfd20f1cc850 ('x86, iommu/vt-d: Add an option to disable Intel IOMMU force on')
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
|
|
The bootlog option is only disabled by default on AMD Fam10h and older
systems.
Update bootlog description to say this. Change the family value to hex
to avoid confusion.
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
AMD systems have non-core, shared MCA banks within a die. These banks
are controlled by a master CPU per die. If this CPU is offlined then all
the shared banks are disabled in addition to the CPU's core banks.
Also, Fam17h systems may have SMT enabled. The MCA_CTL register is shared
between SMT thread siblings. If a CPU is offlined then all its sibling's
MCA banks are also disabled.
Extend the existing vendor check to AMD too.
Signed-off-by: Yazen Ghannam <[email protected]>
[ Fix up comment. ]
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Populate the MCE injection struct before doing initial injection so that
values which don't change have sane defaults.
Tested-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Yazen Ghannam <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Not really needed.
Tested-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Yazen Ghannam <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Make the mcelog call a notifier which lands in the injector module and
does the injection. This allows for mce-inject to be a normal kernel
module now.
Tested-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Yazen Ghannam <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Reuse mce_amd_inj's debugfs interface so that mce-inject can
benefit from it too. The old functionality is still preserved under
CONFIG_X86_MCELOG_LEGACY.
Tested-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Yazen Ghannam <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In the amd_threshold_interrupt() handler, we loop through every possible
block in each bank and rediscover the block's address and if it's valid,
e.g. valid, counter present and not locked.
However, we already have the address saved in the threshold blocks list
for each CPU and bank. The list only contains blocks that have passed
all the valid checks.
Besides the redundancy, there's also a smp_call_function* in
get_block_address() which causes a warning when servicing the interrupt:
WARNING: CPU: 0 PID: 0 at kernel/smp.c:281 smp_call_function_single+0xdd/0xf0
...
Call Trace:
<IRQ>
rdmsr_safe_on_cpu()
get_block_address.isra.2()
amd_threshold_interrupt()
smp_threshold_interrupt()
threshold_interrupt()
because we do get called in an interrupt handler *with* interrupts
disabled, which can result in a deadlock.
Drop the redundant valid checks and move the overflow check, logging and
block reset into a separate function.
Check the first block then iterate over the rest. This procedure is
needed since the first block is used as the head of the list.
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The value of MCA_STATUS is used as the MSR when clearing MCA_STATUS.
This may cause the following warning:
unchecked MSR access error: WRMSR to 0x11b (tried to write 0x0000000000000000)
Call Trace:
<IRQ>
smp_threshold_interrupt()
threshold_interrupt()
Use msr_stat instead which has the MSR address.
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Fixes: 37d43acfd79f ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch adds support for 5-level paging during early boot.
It generalizes boot for 4- and 5-level paging on 64-bit systems with
compile-time switch between them.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
With CONFIG_X86_5LEVEL=y, level 4 is no longer top level of page tables.
Let's give these variable more generic names: init_top_pgt and
early_top_pgt.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The patch write most of startup_64 logic in C.
This is preparation for 5-level paging enabling.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The kernel has several code paths that read CR3. Most of them assume that
CR3 contains the PGD's physical address, whereas some of them awkwardly
use PHYSICAL_PAGE_MASK to mask off low bits.
Add explicit mask macros for CR3 and convert all of the CR3 readers.
This will keep them from breaking when PCID is enabled.
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: xen-devel <[email protected]>
Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Andy will need the following scheduler fix for the PCID series:
252d2a4117bc: sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
So do a cross-merge.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This function isn't used outside of time.c, so let's mark it static.
Signed-off-by: Dou Liyang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Hans managed to trigger a WARN very early in the boot which killed his
(Virtual) box.
The reason is that the recent rework of WARN() to use UD0 forgot to add the
fixup_bug() call to early_fixup_exception(). As a result the kernel does
not handle the WARN_ON injected UD0 exception and panics.
Add the missing fixup call, so early UD's injected by WARN() get handled.
Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0")
Reported-and-tested-by: Hans de Goede <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Frank Mehnert <[email protected]>
Cc: Hans de Goede <[email protected]>
Cc: Michael Thayer <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Pull KVM fixes from Paolo Bonzini:
"Bug fixes (ARM, s390, x86)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: async_pf: avoid async pf injection when in guest mode
KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
arm: KVM: Allow unaligned accesses at HYP
arm64: KVM: Allow unaligned accesses at EL2
arm64: KVM: Preserve RES1 bits in SCTLR_EL2
KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
KVM: nVMX: Fix exception injection
kvm: async_pf: fix rcu_irq_enter() with irqs enabled
KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
KVM: s390: fix ais handling vs cpu model
KVM: arm/arm64: Fix isues with GICv2 on GICv3 migration
|
|
During early boot, load_ucode_intel_ap() uses __load_ucode_intel()
to obtain a pointer to the relevant microcode patch (embedded in the
initrd), and stores this value in 'intel_ucode_patch' to speed up the
microcode patch application for subsequent CPUs.
On resuming from suspend-to-RAM, however, load_ucode_ap() calls
load_ucode_intel_ap() for each non-boot-CPU. By then the initramfs is
long gone so the pointer stored in 'intel_ucode_patch' no longer points to
a valid microcode patch.
Clear that pointer so that we effectively fall back to the CPU hotplug
notifier callbacks to update the microcode.
Signed-off-by: Dominik Brodowski <[email protected]>
[ Edit and massage commit message. ]
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]> # 4.10..
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
... because this is exactly what it is: the number of entries in the
LDT. Calling it "size" is simply confusing and it is actually begging
to be called "nr_entries" or somesuch, especially if you see constructs
like:
alloc_size = size * LDT_ENTRY_SIZE;
since LDT_ENTRY_SIZE is the size of a single entry.
There should be no functionality change resulting from this patch, as
the before/after output from tools/testing/selftests/x86/ldt_gdt.c
shows.
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Renamed 'n_entries' to 'nr_entries' ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
native_safe_halt enables interrupts, and you just shouldn't
call rcu_irq_enter() with interrupts enabled. Reorder the
call with the following local_irq_disable() to respect the
invariant.
Reported-by: Ross Zwisler <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Tested-by: Wanpeng Li <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Lazy TLB state is currently managed in a rather baroque manner.
AFAICT, there are three possible states:
- Non-lazy. This means that we're running a user thread or a
kernel thread that has called use_mm(). current->mm ==
current->active_mm == cpu_tlbstate.active_mm and
cpu_tlbstate.state == TLBSTATE_OK.
- Lazy with user mm. We're running a kernel thread without an mm
and we're borrowing an mm_struct. We have current->mm == NULL,
current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
!= TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0). The current cpu is set
in mm_cpumask(current->active_mm). CR3 points to
current->active_mm->pgd. The TLB is up to date.
- Lazy with init_mm. This happens when we call leave_mm(). We
have current->mm == NULL, current->active_mm ==
cpu_tlbstate.active_mm, but that mm is only relelvant insofar as
the scheduler is tracking it for refcounting. cpu_tlbstate.state
!= TLBSTATE_OK. The current cpu is clear in
mm_cpumask(current->active_mm). CR3 points to swapper_pg_dir,
i.e. init_mm->pgd.
This patch simplifies the situation. Other than perf, x86 stops
caring about current->active_mm at all. We have
cpu_tlbstate.loaded_mm pointing to the mm that CR3 references. The
TLB is always up to date for that mm. leave_mm() just switches us
to init_mm. There are no longer any special cases for mm_cpumask,
and switch_mm() switches mms without worrying about laziness.
After this patch, cpu_tlbstate.state serves only to tell the TLB
flush code whether it may switch to init_mm instead of doing a
normal flush.
This makes fairly extensive changes to xen_exit_mmap(), which used
to look a bit like black magic.
Perf is unchanged. With or without this change, perf may behave a bit
erratically if it tries to read user memory in kernel thread context.
We should build on this patch to teach perf to never look at user
memory when cpu_tlbstate.loaded_mm != current->mm.
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
A SoC variant of Geode GX1, notably NSC branded SC1100, seems to
report an inverted Device ID in its DIR0 configuration register,
specifically 0xb instead of the expected 0x4.
Catch this presumably quirky version so it's properly recognized
as GX1 and has its cache switched to write-back mode, which provides
a significant performance boost in most workloads.
SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip",
states in section 1.1.7.1 "Device ID" that device identification
values are specified in SC1100's device errata. These, however,
seem to not have been publicly released.
Wading through a number of boot logs and /proc/cpuinfo dumps found on
pastebin and blogs, this patch should mostly be relevant for a number
of now admittedly aging Soekris NET4801 and PC Engines WRAP devices,
the latter being the platform this issue was discovered on.
Performance impact was verified using "openssl speed", with
write-back caching scaling throughput between -3% and +41%.
Signed-off-by: Christian Sünkenberg <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that all affected platforms have a microcode update; and we check
this and disable TSC_DEADLINE and print a microcode revision update
error if its too old, we can remove the TSC_ADJUST clamp.
This should help with systems where the second socket runs ahead of
the first socket and needs a negative adjustment. In this case we'd
hit the 0 clamp and give up for not achieving synchronization.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Due to errata it is possible for the TSC_DEADLINE timer to misbehave
after using TSC_ADJUST. A microcode update is available to fix this
situation.
Avoid using the TSC_DEADLINE timer if it is affected by this issue and
report the required microcode version.
[ tglx: Renamed function to apic_check_deadline_errata() ]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
So that we can more easily see in what mode the lapic timer operates.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
DEBUG_PREEMPT warning
... to raw_smp_processor_id() to not trip the
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
check. The reasoning behind it is that __warn() already uses the raw_
variants but the show_regs() path on 32-bit doesn't.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
preemptibility bug
With CONFIG_DEBUG_PREEMPT enabled, I get:
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is debug_smp_processor_id
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc2+ #2
Call Trace:
dump_stack
check_preemption_disabled
debug_smp_processor_id
save_microcode_in_initrd_amd
? microcode_init
save_microcode_in_initrd
...
because, well, it says it above, we're using smp_processor_id() in
preemptible code.
But passing the CPU number is not really needed. It is only used to
determine whether we're on the BSP, and, if so, to save the microcode
patch for early loading.
[ We don't absolutely need to do it on the BSP but we do that
customarily there. ]
Instead, convert that function parameter to a boolean which denotes
whether the patch should be saved or not, thereby avoiding the use of
smp_processor_id() in preemptible code.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A series of fixes for X86:
- The final fix for the end-of-stack issue in the unwinder
- Handle non PAT systems gracefully
- Prevent access to uninitiliazed memory
- Move early delay calaibration after basic init
- Fix Kconfig help text
- Fix a cross compile issue
- Unbreak older make versions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/timers: Move simple_udelay_calibration past init_hypervisor_platform
x86/alternatives: Prevent uninitialized stack byte read in apply_alternatives()
x86/PAT: Fix Xorg regression on CPUs that don't support PAT
x86/watchdog: Fix Kconfig help text file path reference to lockup watchdog documentation
x86/build: Permit building with old make versions
x86/unwind: Add end-of-stack check for ftrace handlers
Revert "x86/entry: Fix the end of the stack for newly forked tasks"
x86/boot: Use CROSS_COMPILE prefix for readelf
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fixes from Thomas Gleixner:
"Two fixlets for RAS:
- Export memory_error() so the NFIT module can utilize it
- Handle memory errors in NFIT correctly"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
acpi, nfit: Fix the memory error check in nfit_handle_mce()
x86/MCE: Export memory_error()
|
|
ftrace use module_alloc() to allocate trampoline pages. The mapping of
module_alloc() is RWX, which makes sense as the memory is written to right
after allocation. But nothing makes these pages RO after writing to them.
Add proper set_memory_rw/ro() calls to protect the trampolines after
modification.
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705251056410.1862@nanos
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|