path: root/arch/x86

2020-11-06  x86/mm/highmem: Use generic kmap atomic implementation  (Thomas Gleixner; 9 files, -158/+28)

Convert X86 to the generic kmap atomic implementation and make the iomap_atomic() naming convention consistent while at it.

Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-11-06  x86/mce: Correct the detection of invalid notifier priorities  (Zhen Lei; 2 files, -2/+4)

Commit c9c6d216ed28 ("x86/mce: Rename "first" function as "early"") changed the enumeration of MCE notifier priorities. Correct the check for notifier priorities to cover the new range.

[ bp: Rewrite commit message, remove superfluous brackets in conditional. ]

Fixes: c9c6d216ed28 ("x86/mce: Rename "first" function as "early"")
Signed-off-by: Zhen Lei <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-11-06  ftrace: Add recording of functions that caused recursion  (Steven Rostedt (VMware); 1 file, -1/+1)

This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file "recursed_functions" all the functions that caused recursion while a callback to the function tracer was running.

Link: https://lkml.kernel.org/r/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Anton Vorontsov <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Joe Lawrence <[email protected]>
Cc: Kamalesh Babulal <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

2020-11-06  kprobes/ftrace: Add recursion protection to the ftrace callback  (Steven Rostedt (VMware); 1 file, -2/+10)

If a ftrace callback does not supply its own recursion protection and does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will make a helper trampoline to do so before calling the callback instead of just calling the callback directly.

The default for ftrace_ops is going to change. It will expect that handlers provide their own recursion protection, unless its ftrace_ops states otherwise.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
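
As a rough illustration of the pattern the new default expects, a callback can guard itself with the trylock/unlock helpers added by this series; the callback name and body below are hypothetical, and the helper signatures have varied between kernel versions:

    /* hypothetical ftrace callback showing the recursion guard */
    static void my_trace_callback(unsigned long ip, unsigned long parent_ip,
                                  struct ftrace_ops *op, struct pt_regs *regs)
    {
            int bit;

            bit = ftrace_test_recursion_trylock(ip, parent_ip);
            if (bit < 0)
                    return;         /* recursion detected, bail out */

            /* ... the actual tracing work goes here ... */

            ftrace_test_recursion_unlock(bit);
    }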

2020-11-06  x86/mce: Assign boolean values to a bool variable  (Kaixu Xia; 1 file, -2/+2)

Fix the following coccinelle warnings:

  ./arch/x86/kernel/cpu/mce/core.c:1765:3-20: WARNING: Assignment of 0/1 to bool variable
  ./arch/x86/kernel/cpu/mce/core.c:1584:2-9: WARNING: Assignment of 0/1 to bool variable

[ bp: Massage commit message. ]

Reported-by: Tosk Robot <[email protected]>
Signed-off-by: Kaixu Xia <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
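
For illustration, the kind of change the warning asks for (the flag name here is hypothetical):

    bool error_seen;        /* hypothetical bool variable */

    error_seen = 0;         /* flagged: 0/1 assigned to a bool */
    error_seen = false;     /* preferred */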

2020-11-06  ima: generalize x86/EFI arch glue for other EFI architectures  (Chester Lin; 3 files, -96/+3)

Move the x86 IMA arch code into security/integrity/ima/ima_efi.c, so that we will be able to wire it up for arm64 in a future patch.

Co-developed-by: Chester Lin <[email protected]>
Signed-off-by: Chester Lin <[email protected]>
Acked-by: Mimi Zohar <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>

2020-11-05  x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP  (Anand K Mistry; 1 file, -18/+33)

On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED. At the same time, IBPB can be set to conditional.

However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED condition leads to a return before the task flag is set. Similarly, ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to conditional.

More generally, the following cases are possible:

  1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
  2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON

The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.

At a high level, this change does one thing. If either STIBP or IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one.

[ bp: Massage commit message and comment; space out statements for better readability. ]

Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Signed-off-by: Anand K Mistry <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Tom Lendacky <[email protected]>
Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
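
From userspace, a task opts into the mitigation through the speculation-control prctl; a minimal sketch, with error handling reduced to a perror():

    #include <stdio.h>
    #include <sys/prctl.h>  /* PR_SPEC_* constants come from linux/prctl.h */

    int main(void)
    {
            /* request IBPB/STIBP protection for this task; with this fix
             * the request also succeeds when one of the two mitigations
             * is in conditional mode and the other is always-on */
            if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                      PR_SPEC_DISABLE, 0, 0))
                    perror("prctl");
            return 0;
    }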

2020-11-05  Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux  (Linus Torvalds; 1 file, -5/+9)

Pull hyperv fixes from Wei Liu:

 - clarify a comment (Michael Kelley)

 - change a pr_warn() to pr_info() (Olaf Hering)

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Clarify comment on x2apic mode
  hv_balloon: disable warning when floor reached

2020-11-04  efi: generalize efi_get_secureboot  (Chester Lin; 1 file, -1/+1)

Generalize the efi_get_secureboot() function so not only efistub but also other subsystems can use it.

Note that the MokSbState handling is not factored out: the variable is boot time only, and so it cannot be parameterized as easily. Also, the IMA code will switch to this version in a future patch, and it does not incorporate the MokSbState exception in the first place.

Note that the new efi_get_secureboot_mode() helper treats any failures to read SetupMode as setup mode being disabled.

Co-developed-by: Chester Lin <[email protected]>
Signed-off-by: Chester Lin <[email protected]>
Acked-by: Mimi Zohar <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
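
A hedged sketch of how a caller might use the new helper; the enum value name is an assumption based on the helper described above:

    enum efi_secureboot_mode mode;

    /* any failure to read SetupMode is treated as setup mode disabled */
    mode = efi_get_secureboot_mode(efi.get_variable);
    if (mode == efi_secureboot_mode_enabled)
            pr_info("Secure Boot is enabled\n");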

2020-11-04  x86/entry: Move nmi entry/exit into common code  (Thomas Gleixner; 5 files, -49/+13)

Lockdep state handling on NMI enter and exit is nothing specific to X86. It's not any different on other architectures. Also the extra state type is not necessary, irqentry_state_t can carry the necessary information as well.

Move it to common code and extend irqentry_state_t to carry lockdep state.

[ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive between the IRQ and NMI exceptions, and add kernel documentation for struct irqentry_state_t ]

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
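
The resulting type looks roughly like this sketch; the union reflects that exit_rcu is only meaningful on the IRQ path and lockdep only on the NMI path:

    typedef struct irqentry_state {
            union {
                    bool    exit_rcu;
                    bool    lockdep;
            };
    } irqentry_state_t;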

2020-11-04  Merge branch 'core/urgent' into core/entry  (Thomas Gleixner; 333 files, -12851/+18420)

Pick up the entry fix before further modifications.

2020-11-04  x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S  (Fangrui Song; 3 files, -9/+3)

Commit 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions") added .weak directives to arch/x86/lib/mem*_64.S instead of changing the existing ENTRY macros to WEAK. This can lead to the assembly snippet

  .weak memcpy
  ...
  .globl memcpy

which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy with LLVM's integrated assembler before LLVM 12. LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding.

Commit ef1e03152cb0 ("x86/asm: Make some functions local") changed ENTRY in arch/x86/lib/memcpy_64.S to SYM_FUNC_START_LOCAL, which was ineffective due to the preceding .weak directive.

Use the appropriate SYM_FUNC_START_WEAK instead.

Fixes: 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions")
Fixes: ef1e03152cb0 ("x86/asm: Make some functions local")
Reported-by: Sami Tolvanen <[email protected]>
Signed-off-by: Fangrui Song <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
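
A sketch of the fixed form: the single annotation emits the weak binding and the symbol together, so no later .globl can demote it (the function body is elided here):

    SYM_FUNC_START_WEAK(memcpy)
            /* ... function body unchanged ... */
    SYM_FUNC_END(memcpy)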

2020-11-04  x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index  (David Woodhouse; 1 file, -2/+2)

In commit b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain") the I/O-APIC code was changed to find its parent irqdomain using irq_find_matching_fwspec(), but the key used for the lookup was wrong. It shouldn't use 'ioapic' which is the index into its own ioapics[] array. It should use the actual arbitration ID of the I/O-APIC in question, which is mpc_ioapic_id(ioapic).

Fixes: b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain")
Reported-by: lkp <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
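
The corrected lookup, approximately (a sketch following the commit text; 'fn' stands for this I/O-APIC's fwnode handle):

    struct irq_fwspec fwspec;

    fwspec.fwnode = fn;                         /* this I/O-APIC's fwnode */
    fwspec.param_count = 1;
    fwspec.param[0] = mpc_ioapic_id(ioapic);    /* arbitration ID, not the
                                                   ioapics[] index */

    parent = irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);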

2020-11-04  x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it  (Dexuan Cui; 2 files, -0/+36)

When a Linux VM runs on Hyper-V, if the VM has CPUs with >255 APIC IDs, the CPUs can't be the destination of IOAPIC interrupts, because the IOAPIC RTE's Dest Field has only 8 bits. Currently the hackery driver drivers/iommu/hyperv-iommu.c is used to ensure IOAPIC interrupts are only routed to CPUs that don't have >255 APIC IDs.

However, there is an issue with kdump, because the kdump kernel can run on any CPU, and hence IOAPIC interrupts can't work if the kdump kernel runs on a CPU with a >255 APIC ID.

The kdump issue can be fixed by the Extended Dest ID, which was recently introduced by David Woodhouse (for IOAPIC, see the field virt_destid_8_14 in struct IO_APIC_route_entry). Of course, the Extended Dest ID needs the support of the underlying hypervisor. The latest Hyper-V has added the support recently: with this commit, on such a Hyper-V host, the Linux VM does not use hyperv-iommu.c because hyperv_prepare_irq_remapping() returns -ENODEV; instead, the Linux kernel's generic support of Extended Dest ID from David is used, meaning that the Linux VM is able to support up to 32K CPUs, and IOAPIC interrupts can be routed to all the CPUs.

On an old Hyper-V host that doesn't support the Extended Dest ID, nothing changes with this commit: the Linux VM is still able to bring up the CPUs with >255 APIC IDs with the help of hyperv-iommu.c, but IOAPIC interrupts still cannot go to such CPUs, and the kdump kernel still cannot work properly on such CPUs.

[ tglx: Updated comment as suggested by David ]

Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: David Woodhouse <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-11-03  Merge tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 8 files, -8/+167)

Pull x86 SEV-ES fixes from Borislav Petkov:

 "A couple of changes to the SEV-ES code to perform more stringent hypervisor checks before enabling encryption (Joerg Roedel)"

* tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Do not support MMIO to/from encrypted memory
  x86/head/64: Check SEV encryption before switching to kernel page-table
  x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path
  x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler
  x86/boot/compressed/64: Introduce sev_status

2020-11-02  x86/mtrr: Fix a kernel-doc markup  (Mauro Carvalho Chehab; 1 file, -1/+2)

Kernel-doc markup should use this format:

  identifier - description

Fix it.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/2217cd4ae9e561da2825485eb97de77c65741489.1603469755.git.mchehab+huawei@kernel.org
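
For example, a conforming kernel-doc comment looks like this (the function and parameter names are hypothetical):

    /**
     * mtrr_example_lookup - look up the MTRR type for a region
     * @start: start address of the region
     * @end:   end address of the region
     *
     * Return: the effective memory type for the range.
     */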

2020-11-02  x86/mce: Enable additional error logging on certain Intel CPUs  (Tony Luck; 2 files, -0/+21)

The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an optional additional error logging mode which is enabled by an MSR.

Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu, but userspace should not be poking at MSRs. So move the enabling into the kernel.

[ bp: Correct the explanation why this is done. ]

Suggested-by: Boris Petkov <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-11-01  Merge tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -13/+30)

Pull x86 fixes from Thomas Gleixner:

 "Three fixes all related to #DB:

  - Handle the BTF bit correctly so it doesn't get lost due to a kernel #DB

  - Only clear and set the virtual DR6 value used by ptrace on user space triggered #DB. A kernel #DB must leave it alone to ensure data consistency for ptrace.

  - Make the bitmasking of the virtual DR6 storage correct so it does not lose DR_STEP"

* tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6)
  x86/debug: Only clear/set ->virtual_dr6 for userspace #DB
  x86/debug: Fix BTF handling

2020-11-01  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds; 8 files, -32/+31)

Pull kvm fixes from Paolo Bonzini:

 "ARM:
  - selftest fix
  - force PTE mapping on device pages provided via VFIO
  - fix detection of cacheable mapping at S2
  - fallback to PMD/PTE mappings for composite huge pages
  - fix accounting of Stage-2 PGD allocation
  - fix AArch32 handling of some of the debug registers
  - simplify host HYP entry
  - fix stray pointer conversion on nVHE TLB invalidation
  - fix initialization of the nVHE code
  - simplify handling of capabilities exposed to HYP
  - nuke VCPUs caught using a forbidden AArch32 EL0

  x86:
  - new nested virtualization selftest
  - miscellaneous fixes
  - make W=1 fixes
  - reserve new CPUID bit in the KVM leaves"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: vmx: remove unused variable
  KVM: selftests: Don't require THP to run tests
  KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again
  KVM: selftests: test behavior of unmapped L2 APIC-access address
  KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()
  KVM: x86: replace static const variables with macros
  KVM: arm64: Handle Asymmetric AArch32 systems
  arm64: cpufeature: upgrade hyp caps to final
  arm64: cpufeature: reorder cpus_have_{const, final}_cap()
  KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()
  KVM: arm64: Force PTE mapping on fault resulting in a device mapping
  KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes
  KVM: arm64: Fix masks in stage2_pte_cacheable()
  KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
  KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT
  KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition
  KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation
  KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call
  x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID

2020-10-31  KVM: vmx: remove unused variable  (Paolo Bonzini; 1 file, -2/+0)

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

2020-10-31  KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again  (Vitaly Kuznetsov; 3 files, -5/+5)

It was noticed that evmcs_sanitize_exec_ctrls() is not being executed nowadays despite the code checking the 'enable_evmcs' static key looking correct.

Turns out, static key magic doesn't work in the '__init' section (and it is unclear when things changed), but setup_vmcs_config() is called only once per CPU, so we don't really need the static key there. Switch to checking 'enlightened_vmcs' instead; it is supposed to be in sync with 'enable_evmcs'.

Opportunistically make evmcs_sanitize_exec_ctrls() '__init' and drop an unneeded extra newline from it.

Reported-by: Yang Weijiang <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
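
The shape of the fix, as a sketch: test the plain flag rather than the static key in code that runs from the __init section:

    /* in setup_vmcs_config(), which is __init: static-key patching
     * cannot be relied upon here, so check the plain variable */
    if (enlightened_vmcs)
            evmcs_sanitize_exec_ctrls(vmcs_conf);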

2020-10-30  timekeeping: default GENERIC_CLOCKEVENTS to enabled  (Arnd Bergmann; 1 file, -1/+0)

Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to require each one to select that symbol manually.

Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as a simplification. It should be possible to select both GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now and decide at runtime between the two.

For the clockevents arch-support.txt file, this means that additional architectures are marked as TODO when they have at least one machine that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when at least one machine has been converted. This means that both m68k and arm (for riscpc) revert to TODO.

At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS rather than leaving it off when not needed. I built an m68k defconfig kernel (using gcc-10.1.0) and found that this would add around 5.5KB in kernel image size:

       text    data     bss     dec     hex filename
    3861936 1092236  196656 5150828  4e986c obj-m68k/vmlinux-no-clockevent
    3866201 1093832  196184 5156217  4ead79 obj-m68k/vmlinux-clockevent

On Arm (MACH_RPC), that difference appears to be twice as large, around 11KB on top of a 6MB vmlinux.

Reviewed-by: Geert Uytterhoeven <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

2020-10-30  KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()  (Takashi Iwai; 1 file, -4/+4)

The newly introduced kvm_msr_ignored_check() tries to print error or debug messages via vcpu_*() macros, but those may cause an Oops when a NULL vcpu is passed for the KVM_GET_MSRS ioctl. Fix it by replacing the print calls with kvm_*() macros.

(Note that this will leave the vcpu argument completely unused in the function, but I didn't touch it to make the fix as small as possible. A clean up may be applied later.)

Fixes: 12bc2132b15e ("KVM: X86: Do the same ignore_msrs check for feature msrs")
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178280
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

2020-10-30  KVM: x86: replace static const variables with macros  (Paolo Bonzini; 3 files, -21/+21)

Even though the compiler is able to replace static const variables with their value, it will warn about them being unused when Linux is built with W=1. Use good old macros instead, this is not C++.

Signed-off-by: Paolo Bonzini <[email protected]>
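
Illustratively (the identifier and value are hypothetical):

    /* before: W=1 warns if a translation unit never uses it */
    static const u32 msr_index_max = 16;

    /* after: */
    #define MSR_INDEX_MAX 16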

2020-10-30  crypto: hash - Use memzero_explicit() for clearing state  (Arvind Sankar; 1 file, -1/+1)

Without the barrier_data() inside memzero_explicit(), the compiler may optimize away the state-clearing if it can tell that the state is not used afterwards.

Signed-off-by: Arvind Sankar <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
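
A sketch of the pattern; the surrounding state type and function are hypothetical:

    #include <linux/string.h>
    #include <linux/types.h>

    struct my_hash_state { u8 buffer[64]; };    /* hypothetical state */

    static void my_hash_final(struct my_hash_state *state)
    {
            /* a plain memset() here is a dead store the compiler may
             * elide; memzero_explicit() contains barrier_data(), so
             * the clearing cannot be optimized away */
            memzero_explicit(state, sizeof(*state));
    }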

2020-10-30  crypto: x86/aes - remove unused file aes_glue.c  (Eric Biggers; 1 file, -1/+0)

Commit 1d2c3279311e ("crypto: x86/aes - drop scalar assembler implementations") was meant to remove aes_glue.c, but it actually left it as an unused one-line file. Remove this unused file.

Cc: Ard Biesheuvel <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

2020-10-29  x86/build: Fix vmlinux size check on 64-bit  (Arvind Sankar; 5 files, -35/+29)

Commit b4e0409a36f4 ("x86: check vmlinux limits, 64-bit") added a check that the size of the 64-bit kernel is less than KERNEL_IMAGE_SIZE.

The check uses (_end - _text), but this is not enough. The initial PMD used in startup_64() (level2_kernel_pgt) can only map up to KERNEL_IMAGE_SIZE from __START_KERNEL_map, not from _text, and the modules area (MODULES_VADDR) starts at KERNEL_IMAGE_SIZE.

The correct check is what is currently done for 32-bit, since LOAD_OFFSET is defined appropriately for the two architectures. Just check (_end - LOAD_OFFSET) against KERNEL_IMAGE_SIZE unconditionally.

Note that on 32-bit, the limit is not strict: KERNEL_IMAGE_SIZE is not really used by the main kernel. The higher the kernel is located, the less the space available for the vmalloc area. However, it is used by KASLR in the compressed stub to limit the maximum address of the kernel to a safe value.

Clean up various comments to clarify that despite the name, KERNEL_IMAGE_SIZE is not a limit on the size of the kernel image, but a limit on the maximum virtual address that the image can occupy.

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
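
The unified assertion, roughly as it reads in the linker script after this change (a sketch of vmlinux.lds.S):

    . = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
               "kernel image bigger than KERNEL_IMAGE_SIZE");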

2020-10-29  x86/sev-es: Do not support MMIO to/from encrypted memory  (Joerg Roedel; 1 file, -7/+13)

MMIO memory is usually not mapped encrypted, so there is no reason to support emulated MMIO when it is mapped encrypted.

Prevent a possible hypervisor attack where a RAM page is mapped as an MMIO page in the nested page-table, so that any guest access to it will trigger a #VC exception and leak the data on that page to the hypervisor via the GHCB (like with valid MMIO). On the read side this attack would allow the HV to inject data into the guest.

Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  x86/head/64: Check SEV encryption before switching to kernel page-table  (Joerg Roedel; 2 files, -0/+17)

When SEV is enabled, the kernel requests the C-bit position again from the hypervisor to build its own page-table. Since the hypervisor is an untrusted source, the C-bit position needs to be verified before the kernel page-table is used.

Call sev_verify_cbit() before writing the CR3.

[ bp: Massage. ]

Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path  (Joerg Roedel; 4 files, -0/+96)

Check whether the hypervisor reported the correct C-bit when running as an SEV guest. Using a wrong C-bit position could be used to leak sensitive data from the guest to the hypervisor.

The check function is in a separate file:

  arch/x86/kernel/sev_verify_cbit.S

so that it can be re-used in the running kernel image.

[ bp: Massage. ]

Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler  (Joerg Roedel; 1 file, -0/+26)

The early #VC handler, which doesn't have a GHCB, can only handle CPUID exit codes. It is needed by the early boot code to handle #VC exceptions raised in verify_cpu() and to get the position of the C-bit.

But the CPUID information comes from the hypervisor, which is untrusted and might return results which trick the guest into the no-SEV boot path with no C-bit set in the page-tables. All data written to memory would then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the early #VC handler to make sure the hypervisor cannot pretend that SEV is disabled.

[ bp: Massage a bit. ]

Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  x86: Wire up TIF_NOTIFY_SIGNAL  (Jens Axboe; 1 file, -0/+2)

The generic entry code has support for TIF_NOTIFY_SIGNAL already. Just provide the TIF bit.

[ tglx: Adopted to other TIF changes in x86 ]

Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-10-29  perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY  (Kan Liang; 1 file, -1/+2)

The event CYCLE_ACTIVITY.STALLS_MEM_ANY (0x14a3) should be available on all 8 GP counters on ICL, but it's only scheduled on the first four counters due to the current ICL constraint table.

Add a line for the CYCLE_ACTIVITY.STALLS_MEM_ANY event in the ICL constraint table. Correct the comments for the CYCLE_ACTIVITY.CYCLES_MEM_ANY event.

Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support")
Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
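
The added table line presumably looks like this sketch (macro as used elsewhere in the Intel constraint tables; treat the exact entry as an assumption):

    INTEL_UEVENT_CONSTRAINT(0x14a3, 0xff),  /* CYCLE_ACTIVITY.STALLS_MEM_ANY */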

2020-10-29  perf/x86/intel/uncore: Add Rocket Lake support  (Kan Liang; 2 files, -1/+25)

For Rocket Lake, the MSR uncore units, e.g., CBOX, ARB and CLOCKBOX, are the same as on Tiger Lake. Share the perf code with it.

For Rocket Lake and Tiger Lake, the 8th CBOX is not mapped into a different MSR space anymore. Add rkl_uncore_msr_init_box() to replace skl_uncore_msr_init_box().

The IMC uncore is similar to Ice Lake. Add new PCI IDs of IMC for Rocket Lake.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  perf/x86/msr: Add Rocket Lake CPU support  (Kan Liang; 1 file, -0/+1)

As on Ice Lake and Tiger Lake, the PPERF and SMI_COUNT MSRs are also supported by Rocket Lake.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  perf/x86/cstate: Add Rocket Lake CPU support  (Kan Liang; 1 file, -9/+10)

From the perspective of Intel cstate residency counters, Rocket Lake is the same as Ice Lake and Tiger Lake. Share the code with them. Update the comments for Rocket Lake.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  perf/x86/intel: Add Rocket Lake CPU support  (Kan Liang; 1 file, -0/+1)

From the perspective of the Intel PMU, Rocket Lake is the same as Ice Lake and Tiger Lake. Share the perf code with them.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE  (Stephane Eranian; 1 file, -1/+1)

When studying code layout, it is useful to capture the page size of the sampled code address.

Add a new sample type for code page size. The new sample type requires collecting the ip. The code page size can be calculated from the NMI-safe perf_get_page_size().

For large PEBS, it's very unlikely that the mapping is gone for the earlier PEBS records. Enable the feature for the large PEBS. The worst case is that page-size '0' is returned.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
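
From userspace, requesting the new sample type could look like this fragment (attr fields per uapi/linux/perf_event.h; the event choice is arbitrary):

    struct perf_event_attr attr = {};

    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    /* the code page size is derived from the sampled ip */
    attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE;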

2020-10-29  perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE  (Kan Liang; 1 file, -3/+8)

The new sample type, PERF_SAMPLE_DATA_PAGE_SIZE, requires the virtual address. Update data->addr if the sample type is set.

Large PEBS is disabled with the sample type, because perf doesn't support munmap tracking yet. The PEBS buffer for large PEBS cannot be flushed for each munmap, so a wrong page size may be calculated. Large PEBS can be enabled later separately when munmap tracking is supported.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  x86/boot/compressed/64: Introduce sev_status  (Joerg Roedel; 1 file, -1/+15)

Introduce sev_status and initialize it together with sme_me_mask to have an indicator which SEV features are enabled.

Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

2020-10-29  entry: Add support for TIF_NOTIFY_SIGNAL  (Jens Axboe; 1 file, -2/+2)

Add TIF_NOTIFY_SIGNAL handling in the generic entry code, which if set, will return true if signal_pending() is used in a wait loop. That causes an exit of the loop so that notify_signal tracehooks can be run. If the wait loop is currently inside a system call, the system call is restarted once task_work has been processed.

In preparation for only having arch_do_signal() handle syscall restarts if _TIF_SIGPENDING isn't set, rename it to arch_do_signal_or_restart(). Pass in a boolean that tells the architecture specific signal handler if it should attempt to get a signal, or just process a potential syscall restart.

For !CONFIG_GENERIC_ENTRY archs, add the TIF_NOTIFY_SIGNAL handling to get_signal(). This is done to minimize the needed architecture changes to support this feature.

Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
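
The renamed hook's shape, per the description above (a declaration sketch):

    /* the flag tells the arch handler whether to try to fetch a signal
     * or only process a potential syscall restart */
    void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal);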

2020-10-28  x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected  (David Woodhouse; 1 file, -0/+6)

This allows the host to indicate that MSI emulation supports 15-bit destination IDs, allowing up to 32768 CPUs without interrupt remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-10-28  x86/apic: Support 15 bits of APIC ID in MSI where available  (David Woodhouse; 4 files, -3/+29)

Some hypervisors can allow the guest to use the Extended Destination ID field in the MSI address to address up to 32768 CPUs. This applies to all downstream devices which generate MSI cycles, including HPET, I/O-APIC and PCI MSI.

HPET and PCI MSI use the same __irq_msi_compose_msg() function, while I/O-APIC generates its own and had support for the extended bits added in a previous commit.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
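
The destination-ID handling in __irq_msi_compose_msg() then splits roughly as in this sketch; the bitfield names follow the cleaned-up msi_msg format and should be treated as assumptions:

    if (dmar)           /* interrupt remapping: full 32-bit destination */
            msg->arch_addr_hi.destid_8_31 = cfg->dest_apicid >> 8;
    else if (virt_ext_dest_id && cfg->dest_apicid < 0x8000)
            msg->arch_addr_lo.virt_destid_8_14 = cfg->dest_apicid >> 8;
    else
            WARN_ON_ONCE(cfg->dest_apicid > 0xff);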

2020-10-28  x86/ioapic: Handle Extended Destination ID field in RTE  (David Woodhouse; 2 files, -6/+17)

Bits 63-48 of the I/O-APIC Redirection Table Entry map directly to bits 19-4 of the address used in the resulting MSI cycle.

Historically, the x86 MSI format only used the top 8 of those 16 bits as the destination APIC ID, and the "Extended Destination ID" in the lower 8 bits was unused.

With interrupt remapping, the lowest bit of the Extended Destination ID (bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable format MSI.

A hypervisor can use the other 7 bits of the Extended Destination ID to permit guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs before having to expose a vIOMMU and interrupt remapping to the guest.

No behavioural change in this patch, since nothing yet permits APIC IDs above 255 to be used with the non-IR I/O-APIC domain.

[ tglx: Converted it to the cleaned up entry/msi_msg format and added commentary ]

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-10-28  x86: Kill all traces of irq_remapping_get_irq_domain()  (David Woodhouse; 2 files, -11/+0)

All users are converted to use the fwspec based parent domain lookup.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-10-28  x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain  (David Woodhouse; 1 file, -12/+13)

All possible parent domains have a select method now. Make use of it.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-10-28  x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain  (David Woodhouse; 1 file, -10/+14)

All possible parent domains have a select method now. Make use of it.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-10-28  x86/apic: Add select() method on vector irqdomain  (David Woodhouse; 2 files, -0/+46)

This will be used to select the irqdomain for I/O-APIC and HPET.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-10-28  x86/ioapic: Generate RTE directly from parent irqchip's MSI message  (David Woodhouse; 2 files, -37/+77)

The I/O-APIC generates an MSI cycle with address/data bits taken from its Redirection Table Entry in some combination which used to make sense, but now is just a bunch of bits which get passed through in some seemingly arbitrary order.

Instead of making IRQ remapping drivers directly frob the I/O-APIC RTE, let them just do their job and generate an MSI message. The bit swizzling to turn that MSI message into the I/O-APIC's RTE is the same in all cases, since it's a function of the I/O-APIC hardware. The IRQ remappers have no real need to get involved with that.

The only slight caveat is that the I/O-APIC is interpreting some of those fields too, and it does want the 'vector' field to be unique to make EOI work. The AMD IOMMU happens to put its IRTE index in the bits that the I/O-APIC thinks are the vector field, and accommodates this requirement by reserving the first 32 indices for the I/O-APIC. The Intel IOMMU doesn't actually use the bits that the I/O-APIC thinks are the vector field, so it fills in the 'pin' value there instead.

[ tglx: Replaced the unreadable macro maze with the cleaned up RTE/msi_msg bitfields and added commentary to explain the mapping magic ]

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2020-10-28  x86/ioapic: Cleanup IO/APIC route entry structs  (Thomas Gleixner; 2 files, -129/+93)

Having two separate structs for the I/O-APIC RTE entries (non-remapped and DMAR remapped) requires type casts and makes it hard to map. Combine them in IO_APIC_route_entry by defining a union of two 64-bit bitfields. Use naming which reflects which bits are shared and which bits are actually different for the operating modes.

[dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for register access into the same union and eliminating a few more places where bits were accessed through masks and shifts.]

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
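
An abbreviated sketch of the combined entry; the full field list is elided and the exact bit widths should be taken from the commit itself:

    struct IO_APIC_route_entry {
            union {
                    struct {        /* non-remapped format */
                            u64 vector              :  8,
                                delivery_mode       :  3,
                                dest_mode_logical   :  1,
                                /* ... shared status/mask bits ... */
                                destid_0_7          :  8;
                    };
                    struct {        /* DMAR remapped format */
                            u64 ir_shared_0         :  8,
                                /* ... */
                                ir_index_0_14       : 15;
                    };
                    struct {        /* raw 32-bit register access */
                            u64 w1 : 32,
                                w2 : 32;
                    };
            };
    };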