|
sm4_aesni_avx_crypt8() sets up the frame pointer (which includes pushing
RBP) before doing a conditional sibling call to sm4_aesni_avx_crypt4(),
which sets up an additional frame pointer. Things will not go well when
sm4_aesni_avx_crypt4() pops only the innermost single frame pointer and
then tries to return to the outermost frame pointer.
Sibling calls need to occur with an empty stack frame. Do the
conditional sibling call *before* setting up the stack frame.
This fixes the following warning:
arch/x86/crypto/sm4-aesni-avx-asm_64.o: warning: objtool: sm4_aesni_avx_crypt8()+0x8: sibling call from callable instruction with modified stack frame
Fixes: a7ee22ee1445 ("crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation")
Reported-by: kernel test robot <[email protected]>
Reported-by: Arnd Bergmann <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Tianjia Zhang <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
This reverts commit 437b38c51162f8b87beb28a833c4d5dc85fa864e.
The memory semantics added in commit 437b38c51162 cause SystemMemory
Operation regions whose address ranges are not described in the EFI
memory map to be mapped as NormalNC memory on arm64 platforms (through
acpi_os_map_memory() in acpi_ex_system_memory_space_handler()).
This triggers the following abort on an ARM64 Ampere eMAG machine,
because presumably the physical address range area backing the Opregion
does not support NormalNC memory attributes driven on the bus.
Internal error: synchronous external abort: 96000410 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0+ #462
Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 0.14 02/22/2019
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[...snip...]
Call trace:
acpi_ex_system_memory_space_handler+0x26c/0x2c8
acpi_ev_address_space_dispatch+0x228/0x2c4
acpi_ex_access_region+0x114/0x268
acpi_ex_field_datum_io+0x128/0x1b8
acpi_ex_extract_from_field+0x14c/0x2ac
acpi_ex_read_data_from_field+0x190/0x1b8
acpi_ex_resolve_node_to_value+0x1ec/0x288
acpi_ex_resolve_to_value+0x250/0x274
acpi_ds_evaluate_name_path+0xac/0x124
acpi_ds_exec_end_op+0x90/0x410
acpi_ps_parse_loop+0x4ac/0x5d8
acpi_ps_parse_aml+0xe0/0x2c8
acpi_ps_execute_method+0x19c/0x1ac
acpi_ns_evaluate+0x1f8/0x26c
acpi_ns_init_one_device+0x104/0x140
acpi_ns_walk_namespace+0x158/0x1d0
acpi_ns_initialize_devices+0x194/0x218
acpi_initialize_objects+0x48/0x50
acpi_init+0xe0/0x498
If the Opregion address range is not present in the EFI memory map
there is no way for us to determine the memory attributes to use when
mapping it. Defaulting to NormalNC does not work (and it is not correct
on a memory region that may have read side-effects), therefore commit
437b38c51162 should be reverted. This means reverting back to the
original behavior, whereby address ranges that are mapped using
acpi_os_map_memory() default to the safe Device-nGnRnE attributes on
ARM64 if the mapped address range is not defined in the EFI memory map.
Fixes: 437b38c51162 ("ACPI: Add memory semantics to acpi_os_map_memory()")
Signed-off-by: Jia He <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Pull rseq fixes from Paolo Bonzini:
"A fix for a bug with restartable sequences and KVM.
KVM's handling of TIF_NOTIFY_RESUME, e.g. for task migration, clears
the flag without informing rseq and leads to stale data in userspace's
rseq struct"
* tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Remove __NR_userfaultfd syscall fallback
KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
tools: Move x86 syscall number fallbacks to .../uapi/
entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
Previous releases - regressions:
- introduce a shutdown method to mdio device drivers, and make DSA
switch drivers compatible with masters disappearing on shutdown;
preventing infinite reference wait
- fix issues in mdiobus users related to ->shutdown vs ->remove
- virtio-net: fix pages leaking when building skb in big mode
- xen-netback: correct success/error reporting for the
SKB-with-fraglist
- dsa: tear down devlink port regions when tearing down the devlink
port on error
- nexthop: fix division by zero while replacing a resilient group
- hns3: check queue, vf, vlan ids range before using
Previous releases - always broken:
- napi: fix race against netpoll causing NAPI getting stuck
- mlx4_en: ensure link operstate is updated even if link comes up
before netdev registration
- bnxt_en: fix TX timeout when TX ring size is set to the smallest
- enetc: fix illegal access when reading affinity_hint; prevent oops
on sysfs access
- mtk_eth_soc: avoid creating duplicate offload entries
Misc:
- core: correct the sock::sk_lock.owned lockdep annotations"
* tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
atlantic: Fix issue in the pm resume flow.
net/mlx4_en: Don't allow aRFS for encapsulated packets
net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
nfc: st-nci: Add SPI ID matching DT compatible
MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
nexthop: Fix memory leaks in nexthop notification chain listeners
mptcp: ensure tx skbs always have the MPTCP ext
qed: rdma - don't wait for resources under hw error recovery flow
s390/qeth: fix deadlock during failing recovery
s390/qeth: Fix deadlock in remove_discipline
s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
net: dsa: realtek: register the MDIO bus under devres
net: dsa: don't allocate the slave_mii_bus using devres
Doc: networking: Fox a typo in ice.rst
net: dsa: fix dsa_tree_setup error path
net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
net/smc: add missing error check in smc_clc_prfx_set()
net: hns3: fix a return value error in hclge_get_reset_status()
net: hns3: check vlan id before using it
...
|
|
If a gpte is changed from non-present to present, the guest doesn't
need to flush the TLB per the SDM. So the host must synchronize the
shadow page (sp) before linking it. Otherwise the guest might use a
wrong mapping.
For example: the guest first changes a level-1 pagetable, and then
links its parent to a new place where the original gpte is non-present.
Finally the guest can access the remapped area without flushing the
TLB. The guest's behavior should be allowed per the SDM, but the host
KVM MMU gets it wrong.
Fixes: 4731d4c7a077 ("KVM: MMU: out of sync shadow core")
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
A noted side-effect of commit 0c6c2d3615ef ("arm64: Generate cpucaps.h")
is that cpucaps are now sorted, changing the enumeration order. This
assumed no dependencies between cpucaps, which turned out not to be true
in one case. UNMAP_KERNEL_AT_EL0 currently needs to be processed after
WORKAROUND_CAVIUM_27456. ThunderX systems are incompatible with KPTI, so
unmap_kernel_at_el0() bails if WORKAROUND_CAVIUM_27456 is set. But because
of the sorting, WORKAROUND_CAVIUM_27456 will not yet have been considered
when unmap_kernel_at_el0() checks for it, so the kernel tries to
run w/ KPTI - and quickly falls over.
Because all ThunderX implementations have homogeneous CPUs, we can remove
this dependency by just checking the current CPU for the erratum.
Fixes: 0c6c2d3615ef ("arm64: Generate cpucaps.h")
Cc: <[email protected]> # 5.13.x
Signed-off-by: dann frazier <[email protected]>
Suggested-by: Suzuki K Poulose <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
|
|
When kvm->tlbs_dirty > 0, some rmaps might have been deleted without
flushing the TLB remotely after kvm_sync_page(). If @gfn was writable
before and its rmaps were deleted in kvm_sync_page(), and if the TLB
entry is still in a remote running vCPU, the @gfn is not safely
protected.
To fix this, make kvm_sync_page() do the remote flush when needed.
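A sketch of the idea only (not the verbatim patch), using the existing
kvm->tlbs_dirty counter and kvm_flush_remote_tlbs():
  /* After a sync that may have zapped sptes without a remote flush
   * (tracked by kvm->tlbs_dirty), flush remote TLBs before relying
   * on the page being write-protected.
   */
  if (READ_ONCE(kvm->tlbs_dirty))
          kvm_flush_remote_tlbs(kvm);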
Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
These fields correspond to features that we don't expose yet to L2.
While currently there are no CVE-worthy features in this field, if AMD
adds more features to this field, that could allow guest escapes
similar to CVE-2021-3653 and CVE-2021-3656.
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The GP SVM errata workaround made the #GP handler always emulate the
SVM instructions. However, these instructions #GP when the operand is
not 4K aligned, and the workaround code didn't check this, so we ended
up emulating them anyway.
This is only an emulation accuracy bug, as there is no harm in KVM
reading/writing unaligned vmcb images.
Fixes: 82a11e9c6fa2 ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions")
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In svm_clear_vintr we try to restore the virtual interrupt
injection that might be pending, but we fail to restore
the interrupt vector.
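A minimal sketch of the missing piece (field layout as in the nested
SVM control structures; treat as illustrative, not the verbatim diff):
  /* Restore the vector along with the V_IRQ request itself. */
  svm->vmcb->control.int_vector = svm->nested.ctl.int_vector;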
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When building under GCC 4.9 and 5.5:
arch/x86/include/asm/special_insns.h: Assembler messages:
arch/x86/include/asm/special_insns.h:286: Error: operand size mismatch for `setz'
Change the type to "bool" for condition code arguments, as documented.
Fixes: 7f5933f81bd8 ("x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction")
Co-developed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The Intel PMU MSRs are in msrs_to_save_all[], so add the AMD PMU MSRs
as well, to get consistent behavior between Intel and AMD when using
KVM_GET_MSRS, KVM_SET_MSRS or KVM_GET_MSR_INDEX_LIST.
We have to add legacy and new MSRs to handle guests running without
X86_FEATURE_PERFCTR_CORE.
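The shape of the change is roughly the following (illustrative subset
only; the full patch adds all legacy and PERFCTR_CORE counter MSRs):
  static const u32 msrs_to_save_all[] = {
          /* ... existing entries, including the Intel PMU MSRs ... */
          MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
          MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
          MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTR0, /* ... */
  };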
Signed-off-by: Fares Mehanna <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
If L1 had invalid state on VM entry (can happen on SMM transactions
when we enter from real mode, straight to nested guest),
then after we load 'host' state from VMCS12, the state has to become
valid again, but since we load the segment registers with
__vmx_set_segment we weren't always updating emulation_required.
Update emulation_required explicitly at end of load_vmcs12_host_state.
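Sketched below, assuming the existing vmx_guest_state_valid() helper;
treat as illustrative rather than the verbatim diff:
  /* At the end of load_vmcs12_host_state(): the segments just loaded
   * may have turned previously-invalid guest state valid.
   */
  vmx->emulation_required = !vmx_guest_state_valid(vcpu);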
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
It is possible that when non-root mode is entered via a special entry
(!from_vmentry), that is, from SMM or from loading the nested state,
the L2 state could be invalid with regard to non-unrestricted guest
mode, but later become valid (for example when RSM emulation restores
segment registers from SMRAM).
Thus delay the check to VM entry, where we will check this and fail.
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Since no actual VM entry happened, the VM exit information is stale.
To avoid this, synthesize an invalid VM guest state VM exit.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Use return statements instead of nested ifs, and fix the error path to
free all the maps that were allocated.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Currently KVM_REQ_GET_NESTED_STATE_PAGES on SVM only reloads the PDPTRs
and the MSR bitmap. The former is not really needed for SMM, as the SMM
exit code reloads them again from SMRAM's CR3, and the latter happens
to work since the MSR bitmap isn't modified while in SMM.
Still, it is better to be consistent with VMX.
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When exiting SMM, the PDPTRs are loaded again from guest memory.
This fixes a theoretical bug where exit from SMM triggers entry to the
nested guest, which re-uses some of the migration code that uses this
flag as a workaround for a legacy userspace.
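The fix itself is a one-liner; sketched here (field name per the flag
discussed above):
  /* The PDPTRs were just reloaded from guest memory, not userspace. */
  vcpu->arch.pdptrs_from_userspace = false;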
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Otherwise the guest entry code might see incorrect L1 state (e.g.
paging state).
Fixes: 37be407b2ce8 ("KVM: nSVM: Fix L1 state corruption upon return from SMM")
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Windows Server 2022 with Hyper-V role enabled failed to boot on KVM when
enlightened VMCS is advertised. Debugging revealed there are two exposed
secondary controls it is not happy with: SECONDARY_EXEC_ENABLE_VMFUNC and
SECONDARY_EXEC_SHADOW_VMCS. These controls are known to be unsupported,
as there are no corresponding fields in eVMCSv1 (see the comment above
EVMCS1_UNSUPPORTED_2NDEXEC definition).
Previously, commit 31de3d2500e4 ("x86/kvm/hyper-v: move VMX controls
sanitization out of nested_enable_evmcs()") introduced the required
filtering mechanism for VMX MSRs, but for some reason covered only the
controls known to be problematic (and not the full EVMCS1_UNSUPPORTED_*
lists).
Note, Windows Server 2022 seems to have gained some sanity checking of
VMX MSRs: it doesn't even try to launch a guest when there's something
it doesn't like, so the nested_evmcs_check_controls() mechanism can't
catch the problem.
Let's be bold this time and instead of playing whack-a-mole just filter out
all unsupported controls from VMX MSRs.
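Conceptually the filtering becomes (sketch; 'ctl_high' is an
illustrative local for the high half of the MSR value):
  /* E.g. for MSR_IA32_VMX_PROCBASED_CTLS2, drop everything that
   * eVMCSv1 cannot express, not just individual known offenders.
   */
  ctl_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;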
Fixes: 31de3d2500e4 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
KASAN reports the following issue:
BUG: KASAN: stack-out-of-bounds in kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
Read of size 8 at addr ffffc9001364f638 by task qemu-kvm/4798
CPU: 0 PID: 4798 Comm: qemu-kvm Tainted: G X --------- ---
Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RYM0081C 07/13/2020
Call Trace:
dump_stack+0xa5/0xe6
print_address_description.constprop.0+0x18/0x130
? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
__kasan_report.cold+0x7f/0x114
? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
kasan_report+0x38/0x50
kasan_check_range+0xf5/0x1d0
kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
kvm_make_scan_ioapic_request_mask+0x84/0xc0 [kvm]
? kvm_arch_exit+0x110/0x110 [kvm]
? sched_clock+0x5/0x10
ioapic_write_indirect+0x59f/0x9e0 [kvm]
? static_obj+0xc0/0xc0
? __lock_acquired+0x1d2/0x8c0
? kvm_ioapic_eoi_inject_work+0x120/0x120 [kvm]
The problem appears to be that 'vcpu_bitmap' is allocated as a single
long on the stack when it should really be KVM_MAX_VCPUS bits long. We
also seem to clear only the lower 16 bits of it with bitmap_zero() for
no particular reason (my guess would be that the 'bitmap' and
'vcpu_bitmap' variables in kvm_bitmap_or_dest_vcpus() caused the
confusion: while the former is indeed 16 bits long, the latter should
accommodate all possible vCPUs).
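The fix is then to size and clear the on-stack bitmap for all possible
vCPUs, along these lines (sketch, in ioapic_write_indirect()):
  DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
  /* Zero the whole bitmap, not just 16 bits of it. */
  bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);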
Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Fixes: 9a2ae9f6b6bb ("KVM: x86: Zero the IOAPIC scan request dest vCPUs bitmap")
Reported-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
A mirrored SEV-ES VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to
set up its vCPUs, have them measured, and their VMSAs encrypted.
Without this change, it is impossible to have mirror VMs as part of
SEV-ES VMs.
Also allow the guest status check and debugging commands since they do
not change any guest state.
Signed-off-by: Peter Gonda <[email protected]>
Cc: Marc Orr <[email protected]>
Cc: Nathan Tempelman <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steve Rutherford <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21)
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
For mirroring SEV-ES, the mirror VM will need more than just the ASID.
The FD and the handle are required to allow the mirror to call PSP
commands. The mirror VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to
set up its vCPUs' VMSAs for SEV-ES.
Signed-off-by: Peter Gonda <[email protected]>
Cc: Marc Orr <[email protected]>
Cc: Nathan Tempelman <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steve Rutherford <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21)
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Nested bus lock VM exits are not supported yet. If L2 triggers a bus
lock VM exit, it will be directed to the L1 VMM, which would cause
unexpected behavior. Therefore, handle L2's bus lock VM exits in L0
directly.
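A sketch of the idea, inside the exit-reflection switch of
nested_vmx_l0_wants_exit() (illustrative, not the verbatim diff):
  case EXIT_REASON_BUS_LOCK:
          /* Bus lock exits are not yet exposed to L1; keep them in L0. */
          return true;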
Fixes: fe6b6bc802b4 ("KVM: VMX: Enable bus lock VM exit")
Signed-off-by: Chenyi Qiang <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Use vcpu_idx to identify vCPU0 when updating HyperV's TSC page, which is
shared by all vCPUs and "owned" by vCPU0 (because vCPU0 is the only vCPU
that's guaranteed to exist). Using kvm_get_vcpu() to find vCPU works,
but it's a rather odd and suboptimal method to check the index of a given
vCPU.
No functional change intended.
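For reference, the check boils down to something like this sketch:
  /* vCPU0 "owns" the shared TSC page. */
  if (vcpu->vcpu_idx == 0)   /* instead of: kvm_get_vcpu(kvm, 0) == vcpu */
          kvm_hv_setup_tsc_page(vcpu->kvm, &vcpu->arch.hv_clock);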
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Read vcpu->vcpu_idx directly instead of bouncing through the one-line
wrapper, kvm_vcpu_get_idx(), and drop the wrapper. The wrapper is a
remnant of the original implementation and serves no purpose; remove it
before it gains more users.
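The conversion at each call site is mechanical:
  /* Before: */
  int idx = kvm_vcpu_get_idx(vcpu);
  /* After: */
  int idx = vcpu->vcpu_idx;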
Back when kvm_vcpu_get_idx() was added by commit 497d72d80a78 ("KVM: Add
kvm_vcpu_get_idx to get vcpu index in kvm->vcpus"), the implementation
was more than just a simple wrapper as vcpu->vcpu_idx did not exist and
retrieving the index meant walking over the vCPU array to find the given
vCPU.
When vcpu_idx was introduced by commit 8750e72a79dd ("KVM: remember
position in kvm->vcpus array"), the helper was left behind, likely to
avoid extra thrash (but even then there were only two users, the original
arm usage having been removed at some point in the past).
No functional change intended.
Suggested-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
According to Intel's SDM Vol. 2 and AMD's APM Vol. 3, when CR4.TSD is
set, using the rdtsc/rdtscp instructions at a privilege level other
than 0 should trigger a #GP.
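The resulting intercept check looks roughly like the following sketch
(emulator-context style, as in the existing check_rdtsc()):
  /* RDTSC(P) with CR4.TSD set is only legal at CPL0. */
  if ((ctxt->ops->get_cr(ctxt, 4) & X86_CR4_TSD) && ctxt->ops->cpl(ctxt))
          return emulate_gp(ctxt, 0);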
Fixes: d7eb82030699e ("KVM: SVM: Add intercept checks for remaining group7 instructions")
Signed-off-by: Hou Wenlong <[email protected]>
Message-Id: <1297c0dd3f1bb47a6d089f850b629c7aa0247040.1629257115.git.houwenlong93@linux.alibaba.com>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Require the target guest page to be writable when pinning memory for
RECEIVE_UPDATE_DATA. Per the SEV API, the PSP writes to guest memory:
The result is then encrypted with GCTX.VEK and written to the memory
pointed to by GUEST_PADDR field.
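Sketch of the change when pinning (argument names follow the
surrounding SEV code; illustrative only):
  /* Pin for write: the PSP writes the re-encrypted data into the page. */
  guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
                              PAGE_SIZE, &n, 1 /* write */);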
Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Cc: [email protected]
Cc: Peter Gonda <[email protected]>
Cc: Marc Orr <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Brijesh Singh <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Brijesh Singh <[email protected]>
Reviewed-by: Peter Gonda <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
DECOMMISSION the current SEV context if binding an ASID fails after
RECEIVE_START. Per AMD's SEV API, RECEIVE_START generates a new guest
context and thus needs to be paired with DECOMMISSION:
The RECEIVE_START command is the only command other than the LAUNCH_START
command that generates a new guest context and guest handle.
The missing DECOMMISSION can result in subsequent SEV launch failures,
as the firmware leaks memory and might not be able to allocate more SEV
guest contexts in the future.
Note, LAUNCH_START suffered the same bug, but was previously fixed by
commit 934002cd660b ("KVM: SVM: Call SEV Guest Decommission if ASID
binding fails").
Cc: Alper Gun <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: David Rienjes <[email protected]>
Cc: Marc Orr <[email protected]>
Cc: John Allen <[email protected]>
Cc: Peter Gonda <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vipin Sharma <[email protected]>
Cc: [email protected]
Reviewed-by: Marc Orr <[email protected]>
Acked-by: Brijesh Singh <[email protected]>
Fixes: af43cbbf954b ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command")
Signed-off-by: Mingwei Zhang <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The update-VMSA ioctl touches data stored in struct kvm_vcpu, and
therefore should not be performed concurrently with any VCPU ioctl
that might cause KVM or the processor to use the same data.
Add a vcpu mutex guard to the VMSA updating code, and refactor out a
__sev_launch_update_vmsa() function to deal with the per-vCPU parts of
sev_launch_update_vmsa().
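The resulting loop looks roughly like this sketch:
  struct kvm_vcpu *vcpu;
  unsigned long i;
  int ret;

  kvm_for_each_vcpu(i, vcpu, kvm) {
          ret = mutex_lock_killable(&vcpu->mutex);
          if (ret)
                  return ret;
          /* Per-vCPU VMSA update, done under the vCPU's own mutex. */
          ret = __sev_launch_update_vmsa(kvm, vcpu, &argp->error);
          mutex_unlock(&vcpu->mutex);
          if (ret)
                  return ret;
  }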
Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Signed-off-by: Peter Gonda <[email protected]>
Cc: Marc Orr <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
"VMXON pointer" is saved in vmx->nested.vmxon_ptr since
commit 3573e22cfeca ("KVM: nVMX: additional checks on
vmxon region"). Also, handle_vmptrld() & handle_vmclear()
now have logic to check the VMCS pointer against the VMXON
pointer.
So just remove the obsolete comments of handle_vmon().
Signed-off-by: Yu Zhang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Check the return of init_srcu_struct(), which can fail due to OOM, when
initializing the page track mechanism. Lack of checking leads to a NULL
pointer deref found by a modified syzkaller.
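Sketch of the fix (structure fields per the page-track code;
illustrative):
  int kvm_page_track_init(struct kvm *kvm)
  {
          struct kvm_page_track_notifier_head *head;

          head = &kvm->arch.track_notifier_head;
          /* init_srcu_struct() can fail under memory pressure. */
          return init_srcu_struct(&head->track_srcu);
  }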
Reported-by: TCS Robot <[email protected]>
Signed-off-by: Haimin Zhang <[email protected]>
Message-Id: <[email protected]>
[Move the call towards the beginning of kvm_arch_init_vm. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove vcpu_vmx.nr_active_uret_msrs and its associated comment, which are
both defunct now that KVM keeps the list constant and instead explicitly
tracks which entries need to be loaded into hardware.
No functional change intended.
Fixes: ee9d22e08d13 ("KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list")
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Explicitly zero the guest's CR3 and mark it available+dirty at RESET/INIT.
Per Intel's SDM and AMD's APM, CR3 is zeroed at both RESET and INIT.
For RESET, this is a nop as the vCPU is zero-allocated. For INIT, the
bug has likely escaped notice because no firmware/kernel puts its page
table root at PA=0, let alone relies on INIT to get the desired CR3 for
such page tables.
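The change amounts to (sketch, in kvm_vcpu_reset()):
  vcpu->arch.cr3 = 0;
  kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);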
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Mark all registers as available and dirty at vCPU creation, as the vCPU has
obviously not been loaded into hardware, let alone been given the chance to
be modified in hardware. On SVM, reading from "uninitialized" hardware is
a non-issue as VMCBs are zero allocated (thus not truly uninitialized) and
hardware does not allow for arbitrary field encoding schemes.
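Concretely, the change is along these lines (sketch, at vCPU creation):
  /* Nothing has been loaded into hardware yet; everything KVM has
   * cached is, by definition, both available and dirty.
   */
  vcpu->arch.regs_avail = ~0;
  vcpu->arch.regs_dirty = ~0;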
On VMX, backing memory for VMCSes is also zero allocated, but true
initialization of the VMCS _technically_ requires VMWRITEs, as the VMX
architectural specification technically allows CPU implementations to
encode fields with arbitrary schemes. E.g. a CPU could theoretically
store the inverted value of every field, which would result in VMREAD
to a zero-allocated field returning all ones.
In practice, only the AR_BYTES fields are known to be manipulated by
hardware during VMREAD/VMWRITE; no known hardware or VMM (for nested
VMX) does fancy encoding of cacheable field values (CR0, CR3, CR4,
etc...). In other words, this is technically a bug fix, but practically
speaking it's a glorified nop.
Failure to mark registers as available has been a lurking bug for quite
some time. The original register caching supported only GPRs (+RIP, which
is kinda sorta a GPR), with the masks initialized at ->vcpu_reset(). That
worked because the two cacheable registers, RIP and RSP, are generally
speaking not read as side effects in other flows.
Arguably, commit aff48baa34c0 ("KVM: Fetch guest cr3 from hardware on
demand") was the first instance of failure to mark regs available. While
_just_ marking CR3 available during vCPU creation wouldn't have fixed the
VMREAD from an uninitialized VMCS bug because ept_update_paging_mode_cr0()
unconditionally read vmcs.GUEST_CR3, marking CR3 _and_ intentionally not
reading GUEST_CR3 when it's available would have avoided VMREAD to a
technically-uninitialized VMCS.
Fixes: aff48baa34c0 ("KVM: Fetch guest cr3 from hardware on demand")
Fixes: 6de4f3ada40b ("KVM: Cache pdptrs")
Fixes: 6de12732c42c ("KVM: VMX: Optimize vmx_get_rflags()")
Fixes: 2fb92db1ec08 ("KVM: VMX: Cache vmcs segment fields")
Fixes: bd31fe495d0d ("KVM: VMX: Add proper cache tracking for CR0")
Fixes: f98c1e77127d ("KVM: VMX: Add proper cache tracking for CR4")
Fixes: 5addc235199f ("KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags")
Fixes: 8791585837f6 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags")
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
that the two functions are always called back-to-back by architectures
that have rseq. The rseq helper is stubbed out for architectures that
don't support rseq, i.e. this is a nop across the board.
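Sketch of the resulting pairing (body elided to the relevant line):
  static inline void tracehook_notify_resume(struct pt_regs *regs)
  {
          /* ... existing notify-resume work ... */
          rseq_handle_notify_resume(NULL, regs);
  }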
Note, tracehook_notify_resume() is horribly named and arguably does not
belong in tracehook.h as literally every line of code in it has nothing
to do with tracing. But, that's been true since commit a42c6ded827d
("move key_repace_session_keyring() into tracehook_notify_resume()")
first usurped tracehook_notify_resume() back in 2012. Punt cleaning that
mess up to future patches.
No functional change intended.
Acked-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Commit 0b9902c1fcc5 ("s390/qeth: fix deadlock during recovery") removed
taking discipline_mutex inside qeth_do_reset(), fixing potential
deadlocks. An error path was missed, though, that still takes
discipline_mutex and thus retains the original deadlock potential.
Intermittent deadlocks were seen when a qeth channel path is configured
offline, causing a race between qeth_do_reset and ccwgroup_remove.
Fix this by calling qeth_set_offline() directly in the qeth_do_reset()
error case, followed by a new variant of ccwgroup_set_offline() that
does not take discipline_mutex.
Fixes: b41b554c1ee7 ("s390/qeth: fix locking for discipline setup / removal")
Signed-off-by: Alexandra Winter <[email protected]>
Reviewed-by: Julian Wiedmann <[email protected]>
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 eBPF fixes from Vasily Gorbik:
"Johan Almbladh has implemented a number of new testcases for eBPF [1],
which uncovered three miscompilation issues in the s390 eBPF JIT"
Link: https://lore.kernel.org/bpf/[email protected]/ [1]
* tag 's390-5.15-ebpf-jit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/bpf: Fix optimizing out zero-extensions
s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
s390/bpf: Fix branch shortening during codegen pass
|
|
As with strlen(), the patches importing the updated str{n}cmp()
implementations were originally developed and tested before the
advent of CONFIG_KASAN_HW_TAGS, and have subsequently turned out
not to be MTE-safe. Since in-kernel MTE is still a rather niche
case, let them temporarily fall back to the generic C versions for
correctness until we can figure out the best fix.
Fixes: 758602c04409 ("arm64: Import latest version of Cortex Strings' strcmp")
Fixes: 020b199bc70d ("arm64: Import latest version of Cortex Strings' strncmp")
Cc: <[email protected]> # 5.14.x
Reported-by: Branislav Rankov <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/34dc4d12eec0adae49b0ac927df642ed10089d40.1631890770.git.robin.murphy@arm.com
Signed-off-by: Catalin Marinas <[email protected]>
|
|
This lets us avoid doing unnecessary work on hardware that does not
support MTE, and will allow us to freely use MTE instructions in the
code called by mte_thread_switch().
Since this would mean that we do a redundant check in
mte_check_tfsr_el1(), remove it and add two checks now required in its
callers. This also avoids an unnecessary DSB+ISB sequence on the syscall
exit path for hardware not supporting MTE.
Fixes: 65812c6921cc ("arm64: mte: Enable async tag check fault")
Cc: <[email protected]> # 5.13.x
Signed-off-by: Peter Collingbourne <[email protected]>
Link: https://linux-review.googlesource.com/id/I02fd000d1ef2c86c7d2952a7f099b254ec227a5d
Link: https://lore.kernel.org/r/[email protected]
[[email protected]: adjust the commit log slightly]
Signed-off-by: Catalin Marinas <[email protected]>
|
|
Commit in Fixes introduced early_reserve_memory() to do all needed
initial memblock_reserve() calls in one function. Unfortunately, the call
of early_reserve_memory() is done too late for Xen dom0, as in some
cases a Xen hook called by e820__memory_setup() will need those memory
reservations to have happened already.
Move the call of early_reserve_memory() before the call of
e820__memory_setup() in order to avoid such problems.
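Sketch of the resulting ordering in setup_arch():
  /* Reserve memory before e820__memory_setup(), whose Xen hook may
   * already depend on those reservations.
   */
  early_reserve_memory();
  e820__memory_setup();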
Fixes: a799c2bd29d1 ("x86/setup: Consolidate early memory reservations")
Reported-by: Marek Marczykowski-Górecki <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Marek Marczykowski-Górecki <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The initial observation was that in PV mode under Xen, 32-bit user
space didn't work anymore. Attempts at system calls ended in
#GP(0x402). All of a sudden the vector 0x80 handler was not in place
anymore. As it
turns out up to 5.13 redundant initialization did occur: Once from
cpu_initialize_context() (through its VCPUOP_initialise hypercall) and a
2nd time while each CPU was brought fully up. This 2nd initialization is
now gone, uncovering that the 1st one was flawed: Unlike for the
set_trap_table hypercall, a full virtual IDT needs to be specified here;
the "vector" fields of the individual entries are of no interest. With
many (kernel) IDT entries still(?) (i.e. at that point at least) empty,
the syscall vector 0x80 ended up in slot 0x20 of the virtual IDT, thus
becoming the domain's handler for vector 0x20.
Make xen_convert_trap_info() fit for either purpose, leveraging the fact
that on the xen_copy_trap_info() path the table starts out zero-filled.
This includes moving out the writing of the sentinel, which would also
have led to a buffer overrun in the xen_copy_trap_info() case if all
(kernel) IDT entries were populated. Convert the writing of the sentinel
to clearing of the entire table entry rather than just the address
field.
(I didn't bother trying to identify the commit which uncovered the issue
in 5.14; the commit named below is the one which actually introduced the
bad code.)
Fixes: f87e4cac4f4e ("xen: SMP guest support")
Cc: [email protected]
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
The function __bad_area_nosemaphore() calls kernelmode_fixup_or_oops()
with the parameter @signal actually carrying @pkey, which results in a
signal being sent whose number is the value in @pkey.
This bug can be triggered when the kernel fails to access user-given
memory pages that are protected by a pkey, so it can go down the
do_user_addr_fault() path and pass the !user_mode() check in
__bad_area_nosemaphore().
Most cases will simply run the kernel fixup code to return -EFAULT. But
when the additional condition current->thread.sig_on_uaccess_err is
met, which is only used to emulate vsyscall, the kernel will generate
the wrong signal.
Add a new parameter @pkey to kernelmode_fixup_or_oops() to fix this.
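Sketch of the corrected signature (parameter order illustrative):
  static void kernelmode_fixup_or_oops(struct pt_regs *regs,
                                       unsigned long error_code,
                                       unsigned long address,
                                       int signal, int si_code,
                                       u32 pkey);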
[ bp: Massage commit message, fix build error as reported by the 0day
bot: https://lkml.kernel.org/r/[email protected] ]
Fixes: 5042d40a264c ("x86/fault: Bypass no_context() for implicit kernel faults from usermode")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Jiashuo Liang <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Guenter reported [1] that the pci_iounmap() changes remain problematic,
with sparc64 allnoconfig and tinyconfig still not building due to the
header file changes and confusion with the arch-specific pci_iounmap()
implementation.
I'm pretty convinced that sparc should just use GENERIC_IOMAP instead of
doing its own thing, since it turns out that the sparc64 version of
pci_iounmap() is somewhat buggy (see [2]). But in the meantime, this
just fixes the build by avoiding the trivial re-definition of the empty
case.
Link: https://lore.kernel.org/lkml/[email protected]/ [1]
Link: https://lore.kernel.org/lkml/CAHk-=wgheheFx9myQyy5osh79BAazvmvYURAtub2gQtMvLrhqQ@mail.gmail.com/ [2]
Reported-by: Guenter Roeck <[email protected]>
Cc: David Miller <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The code is unreachable for HVM or PVH, and it also makes little sense
in auto-translated environments. On Arm, with
xen_{create,destroy}_contiguous_region() both being stubs, I have a hard
time seeing what good the Xen specific variant does - the generic one
ought to be fine for all purposes there. Still, Arm code explicitly
references symbols here, so the code will continue to be included there.
Instead of making PCI_XEN's "select" conditional, simply drop it -
SWIOTLB_XEN will be available unconditionally in the PV case anyway, and
is - as explained above - dead code in non-PV environments.
This in turn allows dropping the stubs for
xen_{create,destroy}_contiguous_region(), the former of which was broken
anyway - it failed to set the DMA handle output.
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
xen_swiotlb and pci_xen_swiotlb_init() are only used within the file
defining them, so make them static and remove the stubs. On the other
hand, pci_xen_swiotlb_detect() has a user (as a function pointer) in
the main pci-swiotlb.c file, so convert its stub to a #define to NULL.
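Sketch of the stub conversion (config symbol assumed to be
CONFIG_SWIOTLB_XEN):
  #ifdef CONFIG_SWIOTLB_XEN
  int pci_xen_swiotlb_detect(void);
  #else
  #define pci_xen_swiotlb_detect NULL
  #endif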
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Just after having obtained the pointer from kzalloc() there's no reason
at all to set part of the area to all zero yet another time. Similarly
there's no point explicitly clearing "ldt_ents".
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Russell reported that since 5.13, KVM's probing of the PMU has
started to fail on his HW. As it turns out, there is an implicit
ordering dependency between the architectural PMU probing code and
KVM's own probing. If, due to probe ordering reasons, KVM probes before
the PMU driver, it will fail to detect the PMU and prevent it from
being advertised to guests as well as to the VMM.
Obviously, this is one probing too many, and we should be able to
deal with any ordering.
Add a callback from the PMU code into KVM to advertise the registration
of a host CPU PMU, allowing for any probing order.
Fixes: 5421db1be3b1 ("KVM: arm64: Divorce the perf code from oprofile helpers")
Reported-by: "Russell King (Oracle)" <[email protected]>
Tested-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
|
|
Add FORCE so that if_changed can detect the command line change.
We'll otherwise see a compilation warning since commit e1f86d7b4b2a
("kbuild: warn if FORCE is missing for if_changed(_dep,_rule) and
filechk").
arch/arm64/kvm/hyp/nvhe/Makefile:58: FORCE prerequisite is missing
Cc: David Brazdil <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Signed-off-by: Zenghui Yu <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Prevent an infinite loop in the MCE recovery on return to user space,
which was caused by a second MCE queueing work for the same page and
thereby creating a circular work list.
- Make kern_addr_valid() handle existing PMD entries, which are marked
not present in the higher level page table, correctly instead of
blindly dereferencing them.
- Pass a valid address to sanitize_phys(). This was caused by the
mixture of inclusive and exclusive ranges. memtype_reserve() expects
'end' to be exclusive, but sanitize_phys() wants it inclusive. This
worked so far, but with 'end' being the end of the physical address
space, the failure is exposed.
- Increase the maximum supported GPIO number for 64-bit. Newer SoCs
exceed the previous maximum.
* tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Avoid infinite loop for copy from user recovery
x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
x86/platform: Increase maximum GPIO number for X86_64
x86/pat: Pass valid address to sanitize_phys()
|