path: root/arch/x86
2018-09-23  Merge tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip  (Greg Kroah-Hartman; 1 file, -1/+1)
Juergen writes: "xen: Two small fixes for xen drivers."

* tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: issue warning message when out of grant maptrack entries
  xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code
2018-09-23  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Greg Kroah-Hartman; 15 files, -39/+234)
Thomas writes: "A set of fixes for x86:

 - Resolve the kvmclock regression on AMD systems with memory encryption enabled. The rework of the kvmclock memory allocation during early boot results in encrypted storage, which is not shareable with the hypervisor. Create a new section for this data which is mapped unencrypted and take care that the later allocations for shared kvmclock memory are unencrypted as well.

 - Fix the build regression in the paravirt code introduced by the recent spectre v2 updates.

 - Ensure that the initial static page tables cover the fixmap space correctly so the early console always works. This worked so far by chance, but recent modifications to the fixmap layout can, depending on kernel configuration, move the relevant entries to a different place which is not covered by the initial static page tables.

 - Address the regressions and issues which got introduced with the recent extensions to the Intel Resource Director Technology code.

 - Update maintainer entries to document reality"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Expand static page table for fixmap space
  MAINTAINERS: Add X86 MM entry
  x86/intel_rdt: Add Reinette as co-maintainer for RDT
  MAINTAINERS: Add Borislav to the x86 maintainers
  x86/paravirt: Fix some warning messages
  x86/intel_rdt: Fix incorrect loop end condition
  x86/intel_rdt: Fix exclusive mode handling of MBA resource
  x86/intel_rdt: Fix incorrect loop end condition
  x86/intel_rdt: Do not allow pseudo-locking of MBA resource
  x86/intel_rdt: Fix unchecked MSR access
  x86/intel_rdt: Fix invalid mode warning when multiple resources are managed
  x86/intel_rdt: Global closid helper to support future fixes
  x86/intel_rdt: Fix size reporting of MBA resource
  x86/intel_rdt: Fix data type in parsing callbacks
  x86/kvm: Use __bss_decrypted attribute in shared variables
  x86/mm: Add .bss..decrypted section to hold shared variables
2018-09-22  x86/CPU: Change query logic so CPUID is enabled before testing  (Matthew Whitehead; 1 file, -1/+3)
Presently we check first if CPUID is enabled. If it is not already enabled, then we next call identify_cpu_without_cpuid() and clear X86_FEATURE_CPUID. Unfortunately, identify_cpu_without_cpuid() is the function where CPUID becomes _enabled_ on Cyrix 6x86/6x86L CPUs. Reverse the calling sequence so that CPUID is first enabled, and then check a second time to see if the feature has now been activated. [ bp: Massage commit message and remove trailing whitespace. ] Suggested-by: Andy Lutomirski <[email protected]> Signed-off-by: Matthew Whitehead <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Cc: David Woodhouse <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2018-09-22  x86/CPU: Use correct macros for Cyrix calls  (Matthew Whitehead; 1 file, -1/+1)
There are comments in processor-cyrix.h advising you to _not_ make calls using the deprecated macros in this style:

  setCx86_old(CX86_CCR4, getCx86_old(CX86_CCR4) | 0x80);

This is because it expands the macro into a non-functioning calling sequence. The calling order must be:

  outb(CX86_CCR2, 0x22);
  inb(0x23);

From the comments:

 * When using the old macros a line like
 *   setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
 * gets expanded to:
 *  do {
 *    outb((CX86_CCR2), 0x22);
 *    outb((({
 *      outb((CX86_CCR2), 0x22);
 *      inb(0x23);
 *    }) | 0x88), 0x23);
 *  } while (0);

The new macros fix this problem, so use them instead. Signed-off-by: Matthew Whitehead <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jia Zhang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Philippe Ombredanne <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
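For reference, the corrected accessors are roughly the following inline functions (a sketch based on the processor-cyrix.h comments above, not a verbatim copy):

  /*
   * Sketch of the fixed Cyrix configuration-register accessors: the
   * register index is written to port 0x22, data moves through port 0x23.
   */
  static inline u8 getCx86(u8 reg)
  {
          outb(reg, 0x22);
          return inb(0x23);
  }

  static inline void setCx86(u8 reg, u8 data)
  {
          outb(reg, 0x22);
          outb(data, 0x23);
  }

Because these are real functions rather than textual macros, the argument expression is fully evaluated before the outb()/inb() sequence begins, so a nested getCx86() call can no longer interleave its port accesses with those of the enclosing setCx86().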
2018-09-21  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Greg Kroah-Hartman; 9 files, -93/+214)
Paolo writes: "It's mostly small bugfixes and cleanups, mostly around x86 nested virtualization. One important change, not related to nested virtualization, is that the ability for the guest kernel to trap CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is now masked by default. This is because the feature is detected through an MSR; a very bad idea that Intel seems to like more and more. Some applications choke if the other fields of that MSR are not initialized as on real hardware, hence we have to disable the whole MSR by default, as was the case before Linux 4.12."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
  KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
  kvm: selftests: Add platform_info_test
  KVM: x86: Control guest reads of MSR_PLATFORM_INFO
  KVM: x86: Turbo bits in MSR_PLATFORM_INFO
  nVMX x86: Check VPID value on vmentry of L2 guests
  nVMX x86: check posted-interrupt descriptor address on vmentry of L2
  KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
  KVM: VMX: check nested state and CR4.VMXE against SMM
  kvm: x86: make kvm_{load|put}_guest_fpu() static
  x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
  KVM: VMX: use preemption timer to force immediate VMExit
  KVM: VMX: modify preemption timer bit only when arming timer
  KVM: VMX: immediately mark preemption timer expired only for zero value
  KVM: SVM: Switch to bitmap_zalloc()
  KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
  kvm: selftests: use -pthread instead of -lpthread
  KVM: x86: don't reset root in kvm_mmu_setup()
  kvm: mmu: Don't read PDPTEs when paging is not enabled
  x86/kvm/lapic: always disable MMIO interface in x2APIC mode
  KVM: s390: Make huge pages unavailable in ucontrol VMs
  ...
2018-09-21  signal/x86: Use force_sig_fault where appropriate  (Eric W. Biederman; 4 files, -32/+9)
Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86: Pass pkey by value  (Eric W. Biederman; 1 file, -7/+7)
Now that si_code == SEGV_PKUERR is the flag indicating that a pkey is present, there is no longer a need to pass a pointer to a local pkey value; instead, pkey can be passed more efficiently by value. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86: Replace force_sig_info_fault with force_sig_fault  (Eric W. Biederman; 1 file, -19/+4)
Now that the pkey handling has been removed, force_sig_info_fault and force_sig_fault perform identical work; just the type of the address parameter is different. So replace calls to force_sig_info_fault with calls to force_sig_fault, and remove force_sig_info_fault. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86: Call force_sig_pkuerr from __bad_area_nosemaphore  (Eric W. Biederman; 1 file, -52/+24)
There is only one code path that can generate a pkuerr signal. That code path calls __bad_area_nosemaphore and can be detected by testing if si_code == SEGV_PKUERR. It can be seen from inspection that all of the other tests in fill_sig_info_pkey are unnecessary. Therefore call force_sig_pkuerr directly from __bad_area_nosemaphore and remove fill_sig_info_pkey. At the same time move the comment above fill_sig_info_pkey into bad_area_access_error, so that the documentation about pkey generation races is not lost. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86: Pass pkey not vma into __bad_area  (Eric W. Biederman; 1 file, -12/+8)
There is only one caller of __bad_area that passes in PKUERR and thus will generate a siginfo with si_pkey set. Therefore simplify the logic and hoist reading of vma_pkey up into that caller, and just pass *pkey into __bad_area. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86: Don't compute pkey in __do_page_fault  (Eric W. Biederman; 1 file, -4/+0)
There are no more users of the computed pkey value in __do_page_fault so stop computing the value. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86: Remove pkey parameter from mm_fault_error  (Eric W. Biederman; 1 file, -2/+2)
After the previous cleanups to do_sigbus and bad_area_nosemaphore, mm_fault_error no longer uses its pkey parameter. Therefore remove the unused parameter. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86: Remove the pkey parameter from do_sigbus  (Eric W. Biederman; 1 file, -3/+3)
The function do_sigbus never sets si_code to PKUERR so it can never return a pkey to userspace. Therefore remove the unusable pkey parameter from do_sigbus. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86: Remove pkey parameter from bad_area_nosemaphore  (Eric W. Biederman; 1 file, -7/+7)
The function bad_area_nosemaphore always sets si_code to SEGV_MAPERR and as such can never return a pkey parameter. Therefore remove the unusable pkey parameter from bad_area_nosemaphore. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86/traps: Simplify trap generation  (Eric W. Biederman; 1 file, -61/+24)
Update the DO_ERROR macro to take si_code and si_addr values for a siginfo, removing the need for the fill_trap_info function. Update do_trap to also take the si_code and si_addr values for a siginfo, and modify the code to call force_sig when no si_code is passed in and to call force_sig_fault when all of the information is present. This makes for a more obvious, simpler and less error-prone construction. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
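The reworked macro ends up looking roughly like this (a sketch following the existing traps.c conventions; the IP helper expands to the faulting user address):

  #define IP ((void __user *)uprobe_get_trap_addr(regs))
  #define DO_ERROR(trapnr, signr, sicode, addr, str, name)                \
  dotraplinkage void do_##name(struct pt_regs *regs, long error_code)     \
  {                                                                       \
          do_error_trap(regs, error_code, str, trapnr, signr, sicode,    \
                        addr);                                            \
  }

  /* e.g. divide error: SIGFPE with FPE_INTDIV at the trapping IP */
  DO_ERROR(X86_TRAP_DE, SIGFPE, FPE_INTDIV, IP, "divide error", divide_error)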
2018-09-21  signal/x86/traps: Use force_sig instead of open coding it.  (Eric W. Biederman; 1 file, -1/+1)
The function "force_sig(sig, tsk)" is equivalent to "force_sig_info(sig, SEND_SIG_PRIV, tsk)". Using the siginfo variants can be error prone, so use the simpler old-fashioned force_sig variant, and with luck the force_sig_info variant can go away. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86/traps: Use force_sig_bnderr  (Eric W. Biederman; 1 file, -10/+9)
Instead of generating the siginfo in x86-specific code, use the new helper function force_sig_bnderr to separate the concerns of collecting the information and generating a proper siginfo. This makes the code easier to understand and maintain. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  signal/x86/traps: Move more code into do_trap_no_signal so it can be reused  (Eric W. Biederman; 1 file, -16/+14)
The function do_trap_no_signal embodies almost all of the work of the function do_trap. The exceptions are setting of thread.error_code and thread.trap_nr in the case when the signal will be sent, and reporting which signal will be sent with show_signal.

Filling in struct siginfo and then calling do_trap is problematic, as filling in struct siginfo is a fiddly process that, through inattention, has resulted in fields being left uninitialized and the wrong fields being filled in. To avoid this error-prone situation I am replacing force_sig_info with a set of functions that take as arguments the information needed to send a specific kind of signal.

The function do_trap is called in the context of several different kinds of signals today. Having a solid do_trap_no_signal that can be reused allows call sites that send different kinds of signals to reuse all of the code in do_trap_no_signal.

Modify do_trap_no_signal to have a single exit where signals will be sent (i.e., returning -1) to allow more of the signal-sending path to be moved from do_trap to do_trap_no_signal.

Move setting thread.trap_nr and thread.error_code into do_trap_no_signal so the code does not need to be duplicated.

Make the type of the string that is passed into do_trap_no_signal const. The only user of that str is die, and it already takes a const string, so this just makes it explicit that the string won't change.

All of this prepares the way for using do_trap_no_signal outside of do_trap. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-21  x86/mce-inject: Reset injection struct after injection  (Borislav Petkov; 1 file, -0/+6)
Clear the MCE struct which is used for collecting the injection details after injection. Also, populate it with more details from the machine. Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-09-21  Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Herbert Xu; 5 files, -5/+0)
Merge crypto-2.6 to resolve caam conflict with skcipher conversion.
2018-09-20  x86/mm: Expand static page table for fixmap space  (Feng Tang; 6 files, -8/+42)
We met a kernel panic when enabling earlycon, because the fixmap address of earlycon is not statically set up. Currently the static fixmap setup in head_64.S only covers 2M of virtual address space, while it actually could be in a 4M space with different kernel configurations, e.g. when VSYSCALL emulation is disabled. So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build-time check to ensure that the fixmap is covered by the initial static page tables. Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Feng Tang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: kernel test robot <[email protected]> Reviewed-by: Juergen Gross <[email protected]> (Xen parts) Cc: H Peter Anvin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andy Lutomirsky <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
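The build-time check boils down to something like the following sketch (the wrapper function name is illustrative; BUILD_BUG_ON() must live inside a function, and the actual check sits with the static page-table definitions):

  /* Two static PMDs cover 2 * 512 PTEs = 4M of fixmap space. */
  static void __init check_fixmap_coverage(void)        /* hypothetical name */
  {
          BUILD_BUG_ON(__end_of_fixed_addresses >
                       FIXMAP_PMD_NUM * PTRS_PER_PTE);
  }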
2018-09-20  KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs  (Liran Alon; 1 file, -8/+14)
The handlers of IOCTLs in kvm_arch_vcpu_ioctl() are expected to set their return value in the "r" local var and break out of the switch block when they encounter an error. This is because vcpu_load() is called before the switch block, which has a proper cleanup of vcpu_put() afterwards. However, the KVM_{GET,SET}_NESTED_STATE IOCTL handlers just return immediately on error without performing the above-mentioned cleanup. Thus, change these handlers to behave as expected. Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") Reviewed-by: Mark Kanda <[email protected]> Reviewed-by: Patrick Colp <[email protected]> Signed-off-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
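The expected shape of the function is roughly this (a trimmed sketch, not the full handler):

  long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl,
                           unsigned long arg)
  {
          struct kvm_vcpu *vcpu = filp->private_data;
          long r;

          vcpu_load(vcpu);
          switch (ioctl) {
          case KVM_GET_NESTED_STATE:
                  r = -EINVAL;
                  if (!kvm_x86_ops->get_nested_state)
                          break;  /* break, not return: vcpu_put() must run */
                  /* ... copy the nested state out, set r ... */
                  break;
          /* ... other cases ... */
          }
          vcpu_put(vcpu);         /* the buggy early returns skipped this */
          return r;
  }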
2018-09-20  dma-mapping: merge direct and noncoherent ops  (Christoph Hellwig; 1 file, -3/+3)
All the cache maintenance is already stubbed out when not enabled, but merging the two allows us to nicely handle the case where cache maintenance is required for some devices, but not others. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Paul Burton <[email protected]> # MIPS parts
2018-09-20  KVM: x86: Control guest reads of MSR_PLATFORM_INFO  (Drew Schmitt; 2 files, -0/+12)
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access to reads of MSR_PLATFORM_INFO. Disabling access to reads of this MSR gives userspace the control to "expose" this platform-dependent information to guests in a clear way. As it exists today, guests that read this MSR would get unpopulated information if userspace hadn't already set it (and prior to this patch series, only the CPUID faulting information could have been populated). This existing interface could be confusing if guests don't handle the potential for incorrect/incomplete information gracefully (e.g. zero reported for base frequency). Signed-off-by: Drew Schmitt <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-09-20  KVM: x86: Turbo bits in MSR_PLATFORM_INFO  (Drew Schmitt; 1 file, -1/+0)
Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only the CPUID faulting bit was settable. But now any bit in MSR_PLATFORM_INFO would be settable. This can be used, for example, to convey frequency information about the platform on which the guest is running. Signed-off-by: Drew Schmitt <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-09-20  nVMX x86: Check VPID value on vmentry of L2 guests  (Krish Sadhukhan; 1 file, -0/+3)
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check needs to be enforced on vmentry of L2 guests: If the 'enable VPID' VM-execution control is 1, the value of the VPID VM-execution control field must not be 0000H. Signed-off-by: Krish Sadhukhan <[email protected]> Reviewed-by: Mark Kanda <[email protected]> Reviewed-by: Liran Alon <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
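A sketch of the added check, using the existing nVMX helper naming style (exact error plumbing may differ):

  /* SDM: if "enable VPID" is 1, the VPID field must not be 0000H. */
  if (nested_cpu_has_vpid(vmcs12) && vmcs12->virtual_processor_id == 0)
          return VMXERR_ENTRY_INVALID_CONTROL_FIELD;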
2018-09-20  nVMX x86: check posted-interrupt descriptor address on vmentry of L2  (Krish Sadhukhan; 1 file, -1/+5)
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following checks need to be enforced on vmentry of L2 guests:

  - Bits 5:0 of the posted-interrupt descriptor address are all 0.
  - The posted-interrupt descriptor address does not set any bits beyond the processor's physical-address width.

Signed-off-by: Krish Sadhukhan <[email protected]> Reviewed-by: Mark Kanda <[email protected]> Reviewed-by: Liran Alon <[email protected]> Reviewed-by: Darren Kenny <[email protected]> Reviewed-by: Karl Heubaum <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
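A sketch of the corresponding check (helper names follow existing nVMX code; the exact error plumbing may differ):

  if (nested_cpu_has_posted_intr(vmcs12) &&
      ((vmcs12->posted_intr_desc_addr & 0x3f) ||        /* bits 5:0 must be 0 */
       (vmcs12->posted_intr_desc_addr >> cpuid_maxphyaddr(vcpu))))
          return VMXERR_ENTRY_INVALID_CONTROL_FIELD;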
2018-09-20  KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv  (Liran Alon; 3 files, -1/+32)
In case L1 does not intercept L2 HLT or enters L2 in HLT activity-state, it is possible for a vCPU to be blocked while it is in guest-mode. According to Intel SDM 26.6.5 Interrupt-Window Exiting and Virtual-Interrupt Delivery: "These events wake the logical processor if it just entered the HLT state because of a VM entry". Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU should be woken from the HLT state and injected with the interrupt.

In addition, if, while the vCPU is blocked (while it is in guest-mode), it receives a nested posted-interrupt, then the vCPU should also be woken and injected with the posted interrupt.

To handle these cases, this patch enhances kvm_vcpu_has_events() to also check if there is a pending interrupt in L2 virtual APICv provided by L1. That is, it evaluates if there is a pending virtual interrupt for L2 by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1 Evaluation of Pending Interrupts.

Note that this also handles the case of nested posted-interrupt by the fact RVI is updated in vmx_complete_nested_posted_interrupt() which is called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_running() -> vmx_check_nested_events() -> vmx_complete_nested_posted_interrupt(). Reviewed-by: Nikita Leshenko <[email protected]> Reviewed-by: Darren Kenny <[email protected]> Signed-off-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
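The evaluation itself reduces to comparing the two priority nibbles, roughly as in this condensed sketch of the new helper (variable handling simplified):

  u32 vppr;
  int rvi;
  void *vapic_page;

  /* Pending virtual interrupt for L2 iff RVI[7:4] > VPPR[7:4]
   * (Intel SDM 29.2.1, Evaluation of Pending Virtual Interrupts). */
  rvi = vmcs_read16(GUEST_INTR_STATUS) & 0xff;
  vapic_page = kmap(vmx->nested.virtual_apic_page);
  vppr = *((u32 *)(vapic_page + APIC_PROCPRI));
  kunmap(vmx->nested.virtual_apic_page);

  return (rvi & 0xf0) > (vppr & 0xf0);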
2018-09-20  KVM: VMX: check nested state and CR4.VMXE against SMM  (Paolo Bonzini; 1 file, -2/+11)
VMX cannot be enabled under SMM, check it when CR4 is set and when nested virtualization state is restored. This should fix some WARNs reported by syzkaller, mostly around alloc_shadow_vmcs. Signed-off-by: Paolo Bonzini <[email protected]>
2018-09-20  kvm: x86: make kvm_{load|put}_guest_fpu() static  (Sebastian Andrzej Siewior; 1 file, -23/+23)
The functions kvm_load_guest_fpu() and kvm_put_guest_fpu() are only used locally, so make them static. This also requires moving both functions, because they are used before their implementation. Those functions were exported (via EXPORT_SYMBOL) before commit e5bb40251a920 ("KVM: Drop kvm_{load,put}_guest_fpu() exports"). Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-09-20  x86/hyper-v: rename ipi_arg_{ex,non_ex} structures  (Vitaly Kuznetsov; 2 files, -11/+13)
These structures are going to be used from KVM code so let's make their names reflect their Hyper-V origin. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Roman Kagan <[email protected]> Acked-by: K. Y. Srinivasan <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-09-20  KVM: VMX: use preemption timer to force immediate VMExit  (Sean Christopherson; 4 files, -2/+31)
A VMX preemption timer value of '0' is guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. Use the preemption timer (if it's supported) to trigger immediate VMExit in place of the current method of sending a self-IPI. This ensures that pending VMExit injection to L1 occurs prior to executing any instructions in the guest (regardless of nesting level). When deferring VMExit injection, KVM generates an immediate VMExit from the (possibly nested) guest by sending itself an IPI. Because hardware interrupts are blocked prior to VMEnter and are unblocked (in hardware) after VMEnter, this results in taking a VMExit(INTR) before any guest instruction is executed. But, as this approach relies on the IPI being received before VMEnter executes, it only works as intended when KVM is running as L0. Because there are no architectural guarantees regarding when IPIs are delivered, when running nested the INTR may "arrive" long after L2 is running e.g. L0 KVM doesn't force an immediate switch to L1 to deliver an INTR. For the most part, this unintended delay is not an issue since the events being injected to L1 also do not have architectural guarantees regarding their timing. The notable exception is the VMX preemption timer[1], which is architecturally guaranteed to cause a VMExit prior to executing any instructions in the guest if the timer value is '0' at VMEnter. Specifically, the delay in injecting the VMExit causes the preemption timer KVM unit test to fail when run in a nested guest. Note: this approach is viable even on CPUs with a broken preemption timer, as broken in this context only means the timer counts at the wrong rate. There are no known errata affecting timer value of '0'. [1] I/O SMIs also have guarantees on when they arrive, but I have no idea if/how those are emulated in KVM. Signed-off-by: Sean Christopherson <[email protected]> [Use a hook for SVM instead of leaving the default in x86.c - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
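Mechanically, forcing the exit amounts to arming the hardware timer with zero, roughly as in this sketch (assuming the req_immediate_exit flag introduced by this series):

  static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
  {
          to_vmx(vcpu)->req_immediate_exit = true;
  }

  /* ... and on the VMEnter path: */
  if (vmx->req_immediate_exit)
          /* '0' guarantees a VMExit before any guest instruction runs */
          vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, 0);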
2018-09-20  KVM: VMX: modify preemption timer bit only when arming timer  (Sean Christopherson; 1 file, -29/+32)
Provide a singular location where the VMX preemption timer bit is set/cleared so that future usages of the preemption timer can ensure the VMCS bit is up-to-date without having to modify unrelated code paths. For example, the preemption timer can be used to force an immediate VMExit. Cache the status of the timer to avoid redundant VMREAD and VMWRITE, e.g. if the timer stays armed across multiple VMEnters/VMExits. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-09-20  KVM: VMX: immediately mark preemption timer expired only for zero value  (Sean Christopherson; 1 file, -6/+8)
A VMX preemption timer value of '0' at the time of VMEnter is architecturally guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. This architectural definition is in place to ensure that a previously expired timer is correctly recognized by the CPU, as it is possible for the timer to reach zero and not trigger a VMExit due to a higher-priority VMExit being signalled instead, e.g. a pending #DB that morphs into a VMExit.

Whether by design or coincidence, commit f4124500c2c1 ("KVM: nVMX: Fully emulate preemption timer") special cased timer values of '0' and '1' to ensure prompt delivery of the VMExit. Unlike '0', a timer value of '1' has no architectural guarantees regarding when it is delivered.

Modify the timer emulation to trigger immediate VMExit if and only if the timer value is '0', and document precisely why '0' is special. Do this even if calibration of the virtual TSC failed, i.e. VMExit will occur immediately regardless of the frequency of the timer.

Making only '0' a special case gives KVM leeway to be more aggressive in ensuring the VMExit is injected prior to executing instructions in the nested guest, and also eliminates any ambiguity as to why '1' is a special case, e.g. why wasn't the threshold for a "short timeout" set to 10, 100, 1000, etc... Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-09-20  KVM: SVM: Switch to bitmap_zalloc()  (Andy Shevchenko; 1 file, -3/+2)
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that, it returns a pointer of bitmap type instead of an opaque void *. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
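The change has roughly this before/after shape (sketched from the SVM SEV ASID bitmap; details may differ):

  /* before: hand-computed size, opaque void * result */
  sev_asid_bitmap = kcalloc(BITS_TO_LONGS(max_sev_asid),
                            sizeof(long), GFP_KERNEL);

  /* after: explicit intent, typed as unsigned long * */
  sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);

  /* ... and freed with the matching helper */
  bitmap_free(sev_asid_bitmap);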
2018-09-20  KVM/MMU: Fix comment in walk_shadow_page_lockless_end()  (Tianyu Lan; 1 file, -1/+1)
kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page(). This patch fixes the comment accordingly. Signed-off-by: Lan Tianyu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-09-20  KVM: x86: don't reset root in kvm_mmu_setup()  (Wei Yang; 1 file, -1/+6)
Here is the code path which shows that kvm_mmu_setup() is invoked after kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path, this means the root_hpa and prev_roots are guaranteed to be invalid, and it is not necessary to reset them again.

  kvm_vm_ioctl_create_vcpu()
    kvm_arch_vcpu_create()
      vmx_create_vcpu()
        kvm_vcpu_init()
          kvm_arch_vcpu_init()
            kvm_mmu_create()
    kvm_arch_vcpu_setup()
      kvm_mmu_setup()
        kvm_init_mmu()

This patch sets reset_roots to false in kvm_mmu_setup(). Fixes: 50c28f21d045dde8c52548f8482d456b3f0956f5 Signed-off-by: Wei Yang <[email protected]> Reviewed-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-09-20  kvm: mmu: Don't read PDPTEs when paging is not enabled  (Junaid Shahid; 1 file, -2/+2)
kvm should not attempt to read guest PDPTEs when CR0.PG = 0 and CR4.PAE = 1. Signed-off-by: Junaid Shahid <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
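In other words, the PDPTE load needs a guard along these lines (a sketch using existing KVM predicates; the actual call sites differ):

  /* PDPTEs only exist under PAE paging: CR0.PG=1, CR4.PAE=1, not long mode */
  if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu))
          load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));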
2018-09-20  x86/kvm/lapic: always disable MMIO interface in x2APIC mode  (Vitaly Kuznetsov; 2 files, -3/+20)
When VMX is used with flexpriority disabled (because of no support or if disabled with a module parameter) the MMIO interface to the lAPIC is still available in x2APIC mode while it shouldn't be (kvm-unit-tests):

  PASS: apic_disable: Local apic enabled in x2APIC mode
  PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
  FAIL: apic_disable: *0xfee00030: 50014

The issue appears because we basically do nothing while switching to x2APIC mode when the APIC access page is not used. apic_mmio_{read,write} only check if the lAPIC is disabled before proceeding to the actual write. When APIC access is virtualized we correctly manipulate the VMX controls in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes in x2APIC mode, so there's no issue.

Disabling the MMIO interface seems to be easy. The question is: what do we do with these reads and writes? If we add an apic_x2apic_mode() check to apic_mmio_in_range() and return -EOPNOTSUPP, these reads and writes will go to userspace. When the lAPIC is in kernel, Qemu uses this interface to inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This somehow works with a disabled lAPIC, but when we're in xAPIC mode we will get a real injected MSI from every write to the lAPIC. Not good.

The simplest solution seems to be to just ignore writes to the region and return ~0 for all reads when we're in x2APIC mode. This is what this patch does. However, this approach is inconsistent with what currently happens when flexpriority is enabled: we allocate an APIC access page and create a KVM memory region, so in x2APIC mode all reads and writes go to this pre-allocated page, which is, btw, the same for all vCPUs. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
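The read side of that behavior looks roughly like this inside the MMIO read handler (a simplified sketch; the real code also consults a quirk knob for old userspace):

  /* In x2APIC mode the MMIO window must not respond. */
  if (!kvm_apic_hw_enabled(apic) || apic_x2apic_mode(apic)) {
          memset(data, 0xff, len);        /* reads return all-ones */
          return 0;
  }
  /* (the write handler similarly returns early, dropping the data) */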
2018-09-19  xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code  (Boris Ostrovsky; 1 file, -1/+1)
Otherwise we may leak kernel stack for events that sample user registers. Reported-by: Mark Rutland <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]> Cc: [email protected]
2018-09-19  signal/x86/traps: Factor out show_signal  (Eric W. Biederman; 1 file, -18/+19)
The code for conditionally printing unhandled signals is duplicated in two places in arch/x86/kernel/traps.c. Factor it out into its own subroutine called show_signal to make the code clearer and easier to maintain. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-19  signal/x86: Move mpx siginfo generation into do_bounds  (Eric W. Biederman; 3 files, -29/+32)
This separates the logic of generating the signal from the logic of gathering the information about the bounds violation. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-19  signal/x86: In trace_mpx_bounds_register_exception add __user annotations  (Eric W. Biederman; 1 file, -2/+2)
The value passed in to addr_referenced is of type void __user *, so update the addr_referenced parameter in trace_mpx_bounds_register_exception to match. Also update the addr_referenced parameter in TP_STRUCT__entry, as it again holds the same value. I don't know why this was missed earlier, but sparse was complaining when testing a test branch, so fix this now. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-19  signal/x86: Use send_sig_mceerr as appropriate  (Eric W. Biederman; 1 file, -10/+1)
This simplifies the code, making it clearer what is going on. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-19  signal/x86: Move MCE error reporting out of force_sig_info_fault  (Eric W. Biederman; 1 file, -13/+13)
Only the call from do_sigbus will send SIGBUS due to a memory machine check error. Consolidate all of the machine check signal generation code in do_sigbus and remove the now unnecessary fault parameter from force_sig_info_fault. Explicitly use the now-constant si_code BUS_ADRERR in the call to force_sig_info_fault from do_sigbus. This makes the code in arch/x86/mm/fault.c easier to follow and simpler to maintain. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-19  signal/x86: Inline fill_sigtrap_info in its only caller send_sigtrap  (Eric W. Biederman; 1 file, -15/+7)
The function fill_sigtrap_info now only has one caller, so remove it and put its contents in its caller. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-09-19  signal: Simplify tracehook_report_syscall_exit  (Eric W. Biederman; 2 files, -7/+6)
Replace user_single_step_siginfo with user_single_step_report, which allocates the siginfo structure on the stack and sends it. This allows tracehook_report_syscall_exit to become a simple if statement that calls user_single_step_report or ptrace_report_syscall depending on the value of step.

Update the default helper function, now called user_single_step_report, to explicitly set si_code to SI_USER and to set si_uid and si_pid to 0. The default helper has always been doing this (using memset) but it was far from obvious.

The powerpc helper can now just call force_sig_fault. The x86 helper can now just call send_sigtrap.

Unfortunately the default implementation of user_single_step_report can not use force_sig_fault, as it does not use a SIGTRAP si_code. So it has to carefully set up the siginfo and use force_sig_info. The net result is code that is easier to understand and simpler to maintain. Ref: 85ec7fd9f8e5 ("ptrace: introduce user_single_step_siginfo() helper") Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
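The resulting default helper is roughly as follows (a sketch; the siginfo type naming matches this era of the kernel):

  static inline void user_single_step_report(struct pt_regs *regs)
  {
          siginfo_t info;

          clear_siginfo(&info);           /* no stale padding leaks */
          info.si_signo = SIGTRAP;
          info.si_errno = 0;
          info.si_code  = SI_USER;        /* not a TRAP_* code, so no force_sig_fault() */
          info.si_pid   = 0;
          info.si_uid   = 0;
          force_sig_info(info.si_signo, &info, current);
  }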
2018-09-19  x86/paravirt: Fix some warning messages  (Dan Carpenter; 1 file, -2/+2)
The first argument to WARN_ONCE() is a condition. Fixes: 5800dc5c19f3 ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Alok Kataria <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/20180919103553.GD9238@mwanda
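That is, the bug class looks like this (illustrative lines, not the exact ones from paravirt.c):

  /* wrong: the message string lands in the condition slot, so the
   * "condition" is always true and nothing useful is printed */
  WARN_ONCE("Failed to patch\n");

  /* right: the first argument is the condition */
  WARN_ONCE(1, "Failed to patch\n");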
2018-09-19  Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Greg Kroah-Hartman; 5 files, -5/+0)
Crypto stuff from Herbert: "This push fixes a potential boot hang in ccp and an incorrect CPU capability check in aegis/morus on x86."

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2
  crypto: ccp - add timeout support in the SEV command
2018-09-18  x86/intel_rdt: Fix incorrect loop end condition  (Reinette Chatre; 1 file, -1/+1)
In order to determine a sane default cache allocation for a new CAT/CDP resource group, all resource groups are checked to determine which cache portions are available to share. At this time all possible CLOSIDs that can be supported by the resource are checked. This is problematic if the resource supports more CLOSIDs than another CAT/CDP resource: in that case, the number of CLOSIDs that could be allocated is smaller than the number of CLOSIDs that can be supported by the resource.

Limit the CLOSID check to what is supported by the system, based on the minimum across all resources. Fixes: 95f0b77ef ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Fenghua Yu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: "H Peter Anvin" <[email protected]> Cc: "Tony Luck" <[email protected]> Cc: "Xiaochen Shen" <[email protected]> Cc: "Chen Yu" <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
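With the global helper introduced elsewhere in this series, the scan is bounded like this (a sketch; closids_supported() returns the minimum CLOSID count across all enabled resources):

  for (i = 0; i < closids_supported(); i++) {
          if (!closid_allocated(i))
                  continue;
          /* ... accumulate the cache portions used by resource group i ... */
  }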