aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-09-22ksmbd: check protocol id in ksmbd_verify_smb_message()Namjae Jeon3-5/+11
When second smb2 pdu has invalid protocol id, ksmbd doesn't detect it and allow to process smb2 request. This patch add the check it in ksmbd_verify_smb_message() and don't use protocol id of smb2 request as protocol id of response. Reviewed-by: Ronnie Sahlberg <[email protected]> Reviewed-by: Ralph Böhme <[email protected]> Reported-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-22init: Revert accidental changes to print irqs_disabled()Geert Uytterhoeven1-3/+3
Commit f8ade8dddb16 ("xsurf100: drop include of lib8390.c") accidentally changed init/main.c. Revert that part. Fixes: f8ade8dddb16 ("xsurf100: drop include of lib8390.c") Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-22MAINTAINERS: Update Xen-[PCI,SWIOTLB,Block] maintainershipKonrad Rzeszutek Wilk1-3/+3
Konrad's new job role is putting a serious cramp on him being a responsive maintainer and as such he is handing off the reins to Juergen, Roger, and Stefano. Thank you! Acked-by: Juergen Gross <[email protected]> Acked-by: Roger Pau Monné <[email protected]> Acked-by: Stefano Stabellini <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-22MAINTAINERS: Update SWIOTLB maintainershipKonrad Rzeszutek Wilk1-2/+3
Konrad's new job role is putting a serious cramp on him being a responsive maintainer and as such he is handing off the reins to Christoph Hellwig. Thank you! Acked-by: Christoph Hellwig <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-22MAINTAINERS: update entry for NIOS2Dinh Nguyen1-2/+2
Ley Foon has left Intel and will no longer be able to maintain NIOS2. Update the MAINTAINER's entry to Dinh Nguyen. Acked-by: Ley Foon Tan <[email protected]> Signed-off-by: Dinh Nguyen <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-22Merge tag 'spi-fix-v5.15-rc1' of ↵Linus Torvalds1-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi modalias fix from Mark Brown: "Fix modalias issues As reported by Russell King the change to use OF style modaliases for DT enumerated broke at least the spi-nor driver, the patch here reverts that change to fix the regression. Sadly this will mean that anything that started loading since the change to OF modaliases will run into issues, there doesn't seem to be any approach which doesn't cause some problems and thi seems like the least bad approach - gory details are in the commit log for the change. I'm currently working through the SPI drivers to add ID tables and missing IDs to tables which should address things from the other end, this seems more straightforward and robust than any other options" * tag 'spi-fix-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: Revert modalias changes
2021-09-22Merge branch 'md-fixes' of ↵Jens Axboe1-5/+0
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.15 Pull MD fix from Song. * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fix a lock order reversal in md_alloc
2021-09-22fs-verity: fix signed integer overflow with i_size near S64_MAXEric Biggers2-2/+2
If the file size is almost S64_MAX, the calculated number of Merkle tree levels exceeds FS_VERITY_MAX_LEVELS, causing FS_IOC_ENABLE_VERITY to fail. This is unintentional, since as the comment above the definition of FS_VERITY_MAX_LEVELS states, it is enough for over U64_MAX bytes of data using SHA-256 and 4K blocks. (Specifically, 4096*128**8 >= 2**64.) The bug is actually that when the number of blocks in the first level is calculated from i_size, there is a signed integer overflow due to i_size being signed. Fix this by treating i_size as unsigned. This was found by the new test "generic: test fs-verity EFBIG scenarios" (https://lkml.kernel.org/r/b1d116cd4d0ea74b9cd86f349c672021e005a75c.1631558495.git.boris@bur.io). This didn't affect ext4 or f2fs since those have a smaller maximum file size, but it did affect btrfs which allows files up to S64_MAX bytes. Reported-by: Boris Burkov <[email protected]> Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl") Fixes: fd2d1acfcadf ("fs-verity: add the hook for file ->open()") Cc: <[email protected]> # v5.4+ Reviewed-by: Boris Burkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2021-09-22x86/asm: Fix SETZ size enqcmds() build failureKees Cook1-1/+1
When building under GCC 4.9 and 5.5: arch/x86/include/asm/special_insns.h: Assembler messages: arch/x86/include/asm/special_insns.h:286: Error: operand size mismatch for `setz' Change the type to "bool" for condition code arguments, as documented. Fixes: 7f5933f81bd8 ("x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction") Co-developed-by: Arnd Bergmann <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-09-22RDMA/cma: Do not change route.addr.src_addr.ss_familyJason Gunthorpe1-2/+6
If the state is not idle then rdma_bind_addr() will immediately fail and no change to global state should happen. For instance if the state is already RDMA_CM_LISTEN then this will corrupt the src_addr and would cause the test in cma_cancel_operation(): if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev) To view a mangled src_addr, eg with a IPv6 loopback address but an IPv4 family, failing the test. This would manifest as this trace from syzkaller: BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26 Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204 CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 __list_add_valid+0x93/0xa0 lib/list_debug.c:26 __list_add include/linux/list.h:67 [inline] list_add_tail include/linux/list.h:100 [inline] cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline] rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751 ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102 ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732 vfs_write+0x28e/0xa30 fs/read_write.c:603 ksys_write+0x1ee/0x250 fs/read_write.c:658 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Which is indicating that an rdma_id_private was destroyed without doing cma_cancel_listens(). Instead of trying to re-use the src_addr memory to indirectly create an any address build one explicitly on the stack and bind to that as any other normal flow would do. Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] Fixes: 732d41c545bb ("RDMA/cma: Make the locking for automatic state transition more clear") Reported-by: [email protected] Tested-by: Hao Sun <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-09-22Merge tag 'nfsd-5.15-2' of ↵Linus Torvalds2-14/+15
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Critical bug fixes: - Fix crash in NLM TEST procedure - NFSv4.1+ backchannel not restored after PATH_DOWN" * tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN NLM: Fix svcxdr_encode_owner()
2021-09-22Merge tag 'platform-drivers-x86-v5.15-2' of ↵Linus Torvalds7-15/+77
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "The first round of bug-fixes for platform-drivers-x86 for 5.15, highlights: - amd-pmc fix for some suspend/resume issues - intel-hid fix to avoid false-positive SW_TABLET_MODE=1 reporting - some build error/warning fixes - various DMI quirk additions" * tag 'platform-drivers-x86-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: gigabyte-wmi: add support for B550I Aorus Pro AX platform/x86/intel: hid: Add DMI switches allow list platform/x86: dell: fix DELL_WMI_PRIVACY dependencies & build error platform/x86: amd-pmc: Increase the response register timeout platform/x86: touchscreen_dmi: Update info for the Chuwi Hi10 Plus (CWI527) tablet platform/x86: touchscreen_dmi: Add info for the Chuwi HiBook (CWI514) tablet lg-laptop: Correctly handle dmi_get_system_info() returning NULL platform/x86/intel: punit_ipc: Drop wrong use of ACPI_PTR()
2021-09-22MAINTAINERS: ARM/VT8500, remove defunct e-mailJiri Slaby1-2/+1
[email protected] is defunct: 4.1.2 <[email protected]>: Recipient address rejected: Domain not found Remove it from MAINTAINERS and mark the ARM/VT8500 entry orphan. Signed-off-by: Jiri Slaby <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-22md: fix a lock order reversal in md_allocChristoph Hellwig1-5/+0
Commit b0140891a8cea3 ("md: Fix race when creating a new md device.") not only moved assigning mddev->gendisk before calling add_disk, which fixes the races described in the commit log, but also added a mddev->open_mutex critical section over add_disk and creation of the md kobj. Adding a kobject after add_disk is racy vs deleting the gendisk right after adding it, but md already prevents against that by holding a mddev->active reference. On the other hand taking this lock added a lock order reversal with what is not disk->open_mutex (used to be bdev->bd_mutex when the commit was added) for partition devices, which need that lock for the internal open for the partition scan, and a recent commit also takes it for non-partitioned devices, leading to further lockdep splatter. Fixes: b0140891a8ce ("md: Fix race when creating a new md device.") Fixes: d62633873590 ("block: support delayed holder registration") Reported-by: [email protected] Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: [email protected] Reviewed-by: NeilBrown <[email protected]> Signed-off-by: Song Liu <[email protected]>
2021-09-22kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]Fares Mehanna1-0/+7
Intel PMU MSRs is in msrs_to_save_all[], so add AMD PMU MSRs to have a consistent behavior between Intel and AMD when using KVM_GET_MSRS, KVM_SET_MSRS or KVM_GET_MSR_INDEX_LIST. We have to add legacy and new MSRs to handle guests running without X86_FEATURE_PERFCTR_CORE. Signed-off-by: Fares Mehanna <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: nVMX: re-evaluate emulation_required on nested VM exitMaxim Levitsky3-4/+7
If L1 had invalid state on VM entry (can happen on SMM transactions when we enter from real mode, straight to nested guest), then after we load 'host' state from VMCS12, the state has to become valid again, but since we load the segment registers with __vmx_set_segment we weren't always updating emulation_required. Update emulation_required explicitly at end of load_vmcs12_host_state. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if ↵Maxim Levitsky2-2/+10
!from_vmentry It is possible that when non root mode is entered via special entry (!from_vmentry), that is from SMM or from loading the nested state, the L2 state could be invalid in regard to non unrestricted guest mode, but later it can become valid. (for example when RSM emulation restores segment registers from SMRAM) Thus delay the check to VM entry, where we will check this and fail. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest stateMaxim Levitsky1-3/+14
Since no actual VM entry happened, the VM exit information is stale. To avoid this, synthesize an invalid VM guest state VM exit. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smmMaxim Levitsky1-66/+69
Use return statements instead of nested if, and fix error path to free all the maps that were allocated. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM modeMaxim Levitsky3-5/+9
Currently the KVM_REQ_GET_NESTED_STATE_PAGES on SVM only reloads PDPTRs, and MSR bitmap, with former not really needed for SMM as SMM exit code reloads them again from SMRAM'S CR3, and later happens to work since MSR bitmap isn't modified while in SMM. Still it is better to be consistient with VMX. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: reset pdptrs_from_userspace when exiting smmMaxim Levitsky1-0/+7
When exiting SMM, pdpts are loaded again from the guest memory. This fixes a theoretical bug, when exit from SMM triggers entry to the nested guest which re-uses some of the migration code which uses this flag as a workaround for a legacy userspace. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on ↵Maxim Levitsky1-5/+7
SMM exit Otherwise guest entry code might see incorrect L1 state (e.g paging state). Fixes: 37be407b2ce8 ("KVM: nSVM: Fix L1 state corruption upon return from SMM") Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: nVMX: Filter out all unsupported controls when eVMCS was activatedVitaly Kuznetsov2-7/+14
Windows Server 2022 with Hyper-V role enabled failed to boot on KVM when enlightened VMCS is advertised. Debugging revealed there are two exposed secondary controls it is not happy with: SECONDARY_EXEC_ENABLE_VMFUNC and SECONDARY_EXEC_SHADOW_VMCS. These controls are known to be unsupported, as there are no corresponding fields in eVMCSv1 (see the comment above EVMCS1_UNSUPPORTED_2NDEXEC definition). Previously, commit 31de3d2500e4 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()") introduced the required filtering mechanism for VMX MSRs but for some reason put only known to be problematic (and not full EVMCS1_UNSUPPORTED_* lists) controls there. Note, Windows Server 2022 seems to have gained some sanity check for VMX MSRs: it doesn't even try to launch a guest when there's something it doesn't like, nested_evmcs_check_controls() mechanism can't catch the problem. Let's be bold this time and instead of playing whack-a-mole just filter out all unsupported controls from VMX MSRs. Fixes: 31de3d2500e4 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()") Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUsSean Christopherson1-3/+15
Check for a NULL cpumask_var_t when kicking multiple vCPUs via cpumask_available(), which performs a !NULL check if and only if cpumasks are configured to be allocated off-stack. This is a meaningless optimization, e.g. avoids a TEST+Jcc and TEST+CMOV on x86, but more importantly helps document that the NULL check is necessary even though all callers pass in a local variable. No functional change intended. Cc: Lai Jiangshan <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: Clean up benign vcpu->cpu data races when kicking vCPUsSean Christopherson1-8/+28
Fix a benign data race reported by syzbot+KCSAN[*] by ensuring vcpu->cpu is read exactly once, and by ensuring the vCPU is booted from guest mode if kvm_arch_vcpu_should_kick() returns true. Fix a similar race in kvm_make_vcpus_request_mask() by ensuring the vCPU is interrupted if kvm_request_needs_ipi() returns true. Reading vcpu->cpu before vcpu->mode (via kvm_arch_vcpu_should_kick() or kvm_request_needs_ipi()) means the target vCPU could get migrated (change vcpu->cpu) and enter !OUTSIDE_GUEST_MODE between reading vcpu->cpud and reading vcpu->mode. If that happens, the kick/IPI will be sent to the old pCPU, not the new pCPU that is now running the vCPU or reading SPTEs. Although failing to kick the vCPU is not exactly ideal, practically speaking it cannot cause a functional issue unless there is also a bug in the caller, and any such bug would exist regardless of kvm_vcpu_kick()'s behavior. The purpose of sending an IPI is purely to get a vCPU into the host (or out of reading SPTEs) so that the vCPU can recognize a change in state, e.g. a KVM_REQ_* request. If vCPU's handling of the state change is required for correctness, KVM must ensure either the vCPU sees the change before entering the guest, or that the sender sees the vCPU as running in guest mode. All architectures handle this by (a) sending the request before calling kvm_vcpu_kick() and (b) checking for requests _after_ setting vcpu->mode. x86's READING_SHADOW_PAGE_TABLES has similar requirements; KVM needs to ensure it kicks and waits for vCPUs that started reading SPTEs _before_ MMU changes were finalized, but any vCPU that starts reading after MMU changes were finalized will see the new state and can continue on uninterrupted. For uses of kvm_vcpu_kick() that are not paired with a KVM_REQ_*, e.g. x86's kvm_arch_sync_dirty_log(), the order of the kick must not be relied upon for functional correctness, e.g. in the dirty log case, userspace cannot assume it has a 100% complete log if vCPUs are still running. All that said, eliminate the benign race since the cost of doing so is an "extra" atomic cmpxchg() in the case where the target vCPU is loaded by the current pCPU or is not loaded at all. I.e. the kick will be skipped due to kvm_vcpu_exiting_guest_mode() seeing a compatible vcpu->mode as opposed to the kick being skipped because of the cpu checks. Keep the "cpu != me" checks even though they appear useless/impossible at first glance. x86 processes guest IPI writes in a fast path that runs in IN_GUEST_MODE, i.e. can call kvm_vcpu_kick() from IN_GUEST_MODE. And calling kvm_vm_bugged()->kvm_make_vcpus_request_mask() from IN_GUEST or READING_SHADOW_PAGE_TABLES is perfectly reasonable. Note, a race with the cpu_online() check in kvm_vcpu_kick() likely persists, e.g. the vCPU could exit guest mode and get offlined between the cpu_online() check and the sending of smp_send_reschedule(). But, the online check appears to exist only to avoid a WARN in x86's native_smp_send_reschedule() that fires if the target CPU is not online. The reschedule WARN exists because CPU offlining takes the CPU out of the scheduling pool, i.e. the WARN is intended to detect the case where the kernel attempts to schedule a task on an offline CPU. The actual sending of the IPI is a non-issue as at worst it will simpy be dropped on the floor. In other words, KVM's usurping of the reschedule IPI could theoretically trigger a WARN if the stars align, but there will be no loss of functionality. [*] https://syzkaller.appspot.com/bug?extid=cd4154e502f43f10808a Cc: Venkatesh Srinivas <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Fixes: 97222cc83163 ("KVM: Emulate local APIC in kernel") Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: Fix stack-out-of-bounds memory access from ioapic_write_indirect()Vitaly Kuznetsov1-5/+5
KASAN reports the following issue: BUG: KASAN: stack-out-of-bounds in kvm_make_vcpus_request_mask+0x174/0x440 [kvm] Read of size 8 at addr ffffc9001364f638 by task qemu-kvm/4798 CPU: 0 PID: 4798 Comm: qemu-kvm Tainted: G X --------- --- Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RYM0081C 07/13/2020 Call Trace: dump_stack+0xa5/0xe6 print_address_description.constprop.0+0x18/0x130 ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm] __kasan_report.cold+0x7f/0x114 ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm] kasan_report+0x38/0x50 kasan_check_range+0xf5/0x1d0 kvm_make_vcpus_request_mask+0x174/0x440 [kvm] kvm_make_scan_ioapic_request_mask+0x84/0xc0 [kvm] ? kvm_arch_exit+0x110/0x110 [kvm] ? sched_clock+0x5/0x10 ioapic_write_indirect+0x59f/0x9e0 [kvm] ? static_obj+0xc0/0xc0 ? __lock_acquired+0x1d2/0x8c0 ? kvm_ioapic_eoi_inject_work+0x120/0x120 [kvm] The problem appears to be that 'vcpu_bitmap' is allocated as a single long on stack and it should really be KVM_MAX_VCPUS long. We also seem to clear the lower 16 bits of it with bitmap_zero() for no particular reason (my guess would be that 'bitmap' and 'vcpu_bitmap' variables in kvm_bitmap_or_dest_vcpus() caused the confusion: while the later is indeed 16-bit long, the later should accommodate all possible vCPUs). Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Fixes: 9a2ae9f6b6bb ("KVM: x86: Zero the IOAPIC scan request dest vCPUs bitmap") Reported-by: Dr. David Alan Gilbert <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: selftests: Create a separate dirty bitmap per slotDavid Matlack1-15/+39
The calculation to get the per-slot dirty bitmap was incorrect leading to a buffer overrun. Fix it by splitting out the dirty bitmap into a separate bitmap per slot. Fixes: 609e6202ea5f ("KVM: selftests: Support multiple slots in dirty_log_perf_test") Signed-off-by: David Matlack <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: selftests: Refactor help message for -s backing_srcDavid Matlack6-22/+25
All selftests that support the backing_src option were printing their own description of the flag and then calling backing_src_help() to dump the list of available backing sources. Consolidate the flag printing in backing_src_help() to align indentation, reduce duplicated strings, and improve consistency across tests. Note: Passing "-s" to backing_src_help is unnecessary since every test uses the same flag. However I decided to keep it for code readability at the call sites. While here this opportunistically fixes the incorrectly interleaved printing -x help message and list of backing source types in dirty_log_perf_test. Fixes: 609e6202ea5f ("KVM: selftests: Support multiple slots in dirty_log_perf_test") Reviewed-by: Ben Gardon <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: selftests: Change backing_src flag to -s in demand_paging_testDavid Matlack1-5/+5
Every other KVM selftest uses -s for the backing_src, so switch demand_paging_test to match. Reviewed-by: Ben Gardon <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: SEV: Allow some commands for mirror VMPeter Gonda1-2/+17
A mirrored SEV-ES VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to setup its vCPUs and have them measured, and their VMSAs encrypted. Without this change, it is impossible to have mirror VMs as part of SEV-ES VMs. Also allow the guest status check and debugging commands since they do not change any guest state. Signed-off-by: Peter Gonda <[email protected]> Cc: Marc Orr <[email protected]> Cc: Nathan Tempelman <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steve Rutherford <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21) Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: SEV: Update svm_vm_copy_asid_from for SEV-ESPeter Gonda1-4/+12
For mirroring SEV-ES the mirror VM will need more then just the ASID. The FD and the handle are required to all the mirror to call psp commands. The mirror VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to setup its vCPUs' VMSAs for SEV-ES. Signed-off-by: Peter Gonda <[email protected]> Cc: Marc Orr <[email protected]> Cc: Nathan Tempelman <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steve Rutherford <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21) Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: nVMX: Fix nested bus lock VM exitChenyi Qiang1-0/+6
Nested bus lock VM exits are not supported yet. If L2 triggers bus lock VM exit, it will be directed to L1 VMM, which would cause unexpected behavior. Therefore, handle L2's bus lock VM exits in L0 directly. Fixes: fe6b6bc802b4 ("KVM: VMX: Enable bus lock VM exit") Signed-off-by: Chenyi Qiang <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: Identify vCPU0 by its vcpu_idx instead of its vCPUs array entrySean Christopherson1-1/+1
Use vcpu_idx to identify vCPU0 when updating HyperV's TSC page, which is shared by all vCPUs and "owned" by vCPU0 (because vCPU0 is the only vCPU that's guaranteed to exist). Using kvm_get_vcpu() to find vCPU works, but it's a rather odd and suboptimal method to check the index of a given vCPU. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: Query vcpu->vcpu_idx directly and drop its accessorSean Christopherson6-14/+8
Read vcpu->vcpu_idx directly instead of bouncing through the one-line wrapper, kvm_vcpu_get_idx(), and drop the wrapper. The wrapper is a remnant of the original implementation and serves no purpose; remove it before it gains more users. Back when kvm_vcpu_get_idx() was added by commit 497d72d80a78 ("KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus"), the implementation was more than just a simple wrapper as vcpu->vcpu_idx did not exist and retrieving the index meant walking over the vCPU array to find the given vCPU. When vcpu_idx was introduced by commit 8750e72a79dd ("KVM: remember position in kvm->vcpus array"), the helper was left behind, likely to avoid extra thrash (but even then there were only two users, the original arm usage having been removed at some point in the past). No functional change intended. Suggested-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22kvm: fix wrong exception emulation in check_rdtscHou Wenlong1-1/+1
According to Intel's SDM Vol2 and AMD's APM Vol3, when CR4.TSD is set, use rdtsc/rdtscp instruction above privilege level 0 should trigger a #GP. Fixes: d7eb82030699e ("KVM: SVM: Add intercept checks for remaining group7 instructions") Signed-off-by: Hou Wenlong <[email protected]> Message-Id: <1297c0dd3f1bb47a6d089f850b629c7aa0247040.1629257115.git.houwenlong93@linux.alibaba.com> Reviewed-by: Sean Christopherson <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: SEV: Pin guest memory for write for RECEIVE_UPDATE_DATASean Christopherson1-1/+1
Require the target guest page to be writable when pinning memory for RECEIVE_UPDATE_DATA. Per the SEV API, the PSP writes to guest memory: The result is then encrypted with GCTX.VEK and written to the memory pointed to by GUEST_PADDR field. Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command") Cc: [email protected] Cc: Peter Gonda <[email protected]> Cc: Marc Orr <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Brijesh Singh <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Brijesh Singh <[email protected]> Reviewed-by: Peter Gonda <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: SVM: fix missing sev_decommission in sev_receive_startMingwei Zhang1-1/+3
DECOMMISSION the current SEV context if binding an ASID fails after RECEIVE_START. Per AMD's SEV API, RECEIVE_START generates a new guest context and thus needs to be paired with DECOMMISSION: The RECEIVE_START command is the only command other than the LAUNCH_START command that generates a new guest context and guest handle. The missing DECOMMISSION can result in subsequent SEV launch failures, as the firmware leaks memory and might not able to allocate more SEV guest contexts in the future. Note, LAUNCH_START suffered the same bug, but was previously fixed by commit 934002cd660b ("KVM: SVM: Call SEV Guest Decommission if ASID binding fails"). Cc: Alper Gun <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: David Rienjes <[email protected]> Cc: Marc Orr <[email protected]> Cc: John Allen <[email protected]> Cc: Peter Gonda <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Vipin Sharma <[email protected]> Cc: [email protected] Reviewed-by: Marc Orr <[email protected]> Acked-by: Brijesh Singh <[email protected]> Fixes: af43cbbf954b ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command") Signed-off-by: Mingwei Zhang <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: SEV: Acquire vcpu mutex when updating VMSAPeter Gonda1-22/+29
The update-VMSA ioctl touches data stored in struct kvm_vcpu, and therefore should not be performed concurrently with any VCPU ioctl that might cause KVM or the processor to use the same data. Adds vcpu mutex guard to the VMSA updating code. Refactors out __sev_launch_update_vmsa() function to deal with per vCPU parts of sev_launch_update_vmsa(). Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Signed-off-by: Peter Gonda <[email protected]> Cc: Marc Orr <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: do not shrink halt_poll_ns below grow_startSergey Senozhatsky1-1/+5
grow_halt_poll_ns() ignores values between 0 and halt_poll_ns_grow_start (10000 by default). However, when we shrink halt_poll_ns we may fall way below halt_poll_ns_grow_start and endup with halt_poll_ns values that don't make a lot of sense: like 1 or 9, or 19. VCPU1 trace (halt_poll_ns_shrink equals 2): VCPU1 grow 10000 VCPU1 shrink 5000 VCPU1 shrink 2500 VCPU1 shrink 1250 VCPU1 shrink 625 VCPU1 shrink 312 VCPU1 shrink 156 VCPU1 shrink 78 VCPU1 shrink 39 VCPU1 shrink 19 VCPU1 shrink 9 VCPU1 shrink 4 Mirror what grow_halt_poll_ns() does and set halt_poll_ns to 0 as soon as new shrink-ed halt_poll_ns value falls below halt_poll_ns_grow_start. Signed-off-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: nVMX: fix comments of handle_vmon()Yu Zhang1-8/+1
"VMXON pointer" is saved in vmx->nested.vmxon_ptr since commit 3573e22cfeca ("KVM: nVMX: additional checks on vmxon region"). Also, handle_vmptrld() & handle_vmclear() now have logic to check the VMCS pointer against the VMXON pointer. So just remove the obsolete comments of handle_vmon(). Signed-off-by: Yu Zhang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: Handle SRCU initialization failure during page track initHaimin Zhang3-4/+9
Check the return of init_srcu_struct(), which can fail due to OOM, when initializing the page track mechanism. Lack of checking leads to a NULL pointer deref found by a modified syzkaller. Reported-by: TCS Robot <[email protected]> Signed-off-by: Haimin Zhang <[email protected]> Message-Id: <[email protected]> [Move the call towards the beginning of kvm_arch_init_vm. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: VMX: Remove defunct "nr_active_uret_msrs" fieldSean Christopherson1-4/+0
Remove vcpu_vmx.nr_active_uret_msrs and its associated comment, which are both defunct now that KVM keeps the list constant and instead explicitly tracks which entries need to be loaded into hardware. No functional change intended. Fixes: ee9d22e08d13 ("KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list") Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22selftests: KVM: Align SMCCC call with the spec in steal_timeOliver Upton1-2/+2
The SMC64 calling convention passes a function identifier in w0 and its parameters in x1-x17. Given this, there are two deviations in the SMC64 call performed by the steal_time test: the function identifier is assigned to a 64 bit register and the parameter is only 32 bits wide. Align the call with the SMCCC by using a 32 bit register to handle the function identifier and increasing the parameter width to 64 bits. Suggested-by: Andrew Jones <[email protected]> Signed-off-by: Oliver Upton <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22selftests: KVM: Fix check for !POLLIN in demand_paging_testOliver Upton1-1/+1
The logical not operator applies only to the left hand side of a bitwise operator. As such, the check for POLLIN not being set in revents wrong. Fix it by adding parentheses around the bitwise expression. Fixes: 4f72180eb4da ("KVM: selftests: Add demand paging content to the demand paging test") Reviewed-by: Andrew Jones <[email protected]> Signed-off-by: Oliver Upton <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: Clear KVM's cached guest CR3 at RESET/INITSean Christopherson1-0/+3
Explicitly zero the guest's CR3 and mark it available+dirty at RESET/INIT. Per Intel's SDM and AMD's APM, CR3 is zeroed at both RESET and INIT. For RESET, this is a nop as vcpu is zero-allocated. For INIT, the bug has likely escaped notice because no firmware/kernel puts its page tables root at PA=0, let alone relies on INIT to get the desired CR3 for such page tables. Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: x86: Mark all registers as avail/dirty at vCPU creationSean Christopherson1-0/+2
Mark all registers as available and dirty at vCPU creation, as the vCPU has obviously not been loaded into hardware, let alone been given the chance to be modified in hardware. On SVM, reading from "uninitialized" hardware is a non-issue as VMCBs are zero allocated (thus not truly uninitialized) and hardware does not allow for arbitrary field encoding schemes. On VMX, backing memory for VMCSes is also zero allocated, but true initialization of the VMCS _technically_ requires VMWRITEs, as the VMX architectural specification technically allows CPU implementations to encode fields with arbitrary schemes. E.g. a CPU could theoretically store the inverted value of every field, which would result in VMREAD to a zero-allocated field returns all ones. In practice, only the AR_BYTES fields are known to be manipulated by hardware during VMREAD/VMREAD; no known hardware or VMM (for nested VMX) does fancy encoding of cacheable field values (CR0, CR3, CR4, etc...). In other words, this is technically a bug fix, but practically speakings it's a glorified nop. Failure to mark registers as available has been a lurking bug for quite some time. The original register caching supported only GPRs (+RIP, which is kinda sorta a GPR), with the masks initialized at ->vcpu_reset(). That worked because the two cacheable registers, RIP and RSP, are generally speaking not read as side effects in other flows. Arguably, commit aff48baa34c0 ("KVM: Fetch guest cr3 from hardware on demand") was the first instance of failure to mark regs available. While _just_ marking CR3 available during vCPU creation wouldn't have fixed the VMREAD from an uninitialized VMCS bug because ept_update_paging_mode_cr0() unconditionally read vmcs.GUEST_CR3, marking CR3 _and_ intentionally not reading GUEST_CR3 when it's available would have avoided VMREAD to a technically-uninitialized VMCS. Fixes: aff48baa34c0 ("KVM: Fetch guest cr3 from hardware on demand") Fixes: 6de4f3ada40b ("KVM: Cache pdptrs") Fixes: 6de12732c42c ("KVM: VMX: Optimize vmx_get_rflags()") Fixes: 2fb92db1ec08 ("KVM: VMX: Cache vmcs segment fields") Fixes: bd31fe495d0d ("KVM: VMX: Add proper cache tracking for CR0") Fixes: f98c1e77127d ("KVM: VMX: Add proper cache tracking for CR4") Fixes: 5addc235199f ("KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags") Fixes: 8791585837f6 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags") Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: selftests: Remove __NR_userfaultfd syscall fallbackSean Christopherson1-3/+0
Revert the __NR_userfaultfd syscall fallback added for KVM selftests now that x86's unistd_{32,63}.h overrides are under uapi/ and thus not in KVM selftests' search path, i.e. now that KVM gets x86 syscall numbers from the installed kernel headers. No functional change intended. Reviewed-by: Ben Gardon <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugsSean Christopherson3-0/+240
Add a test to verify an rseq's CPU ID is updated correctly if the task is migrated while the kernel is handling KVM_RUN. This is a regression test for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM without updating rseq, leading to a stale CPU ID and other badness. Signed-off-by: Sean Christopherson <[email protected]> Acked-by: Mathieu Desnoyers <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22tools: Move x86 syscall number fallbacks to .../uapi/Sean Christopherson2-0/+0
Move unistd_{32,64}.h from x86/include/asm to x86/include/uapi/asm so that tools/selftests that install kernel headers, e.g. KVM selftests, can include non-uapi tools headers, e.g. to get 'struct list_head', without effectively overriding the installed non-tool uapi headers. Swapping KVM's search order, e.g. to search the kernel headers before tool headers, is not a viable option as doing results in linux/type.h and other core headers getting pulled from the kernel headers, which do not have the kernel-internal typedefs that are used through tools, including many files outside of selftests/kvm's control. Prior to commit cec07f53c398 ("perf tools: Move syscall number fallbacks from perf-sys.h to tools/arch/x86/include/asm/"), the handcoded numbers were actual fallbacks, i.e. overriding unistd_{32,64}.h from the kernel headers was unintentional. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-09-22entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()Sean Christopherson8-19/+8
Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now that the two function are always called back-to-back by architectures that have rseq. The rseq helper is stubbed out for architectures that don't support rseq, i.e. this is a nop across the board. Note, tracehook_notify_resume() is horribly named and arguably does not belong in tracehook.h as literally every line of code in it has nothing to do with tracing. But, that's been true since commit a42c6ded827d ("move key_repace_session_keyring() into tracehook_notify_resume()") first usurped tracehook_notify_resume() back in 2012. Punt cleaning that mess up to future patches. No functional change intended. Acked-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>