aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-08-31KVM: x86/mmu: Delete rmap_printk() and all its usageSean Christopherson2-14/+0
Delete rmap_printk() so that MMU_WARN_ON() and MMU_DEBUG can be morphed into something that can be regularly enabled for debug kernels. The information provided by rmap_printk() isn't all that useful now that the rmap and unsync code is mature, as the prints are simultaneously too verbose (_lots_ of message) and yet not verbose enough to be helpful for debug (most instances print just the SPTE pointer/value, which is rarely sufficient to root cause anything but trivial bugs). Alternatively, rmap_printk() could be reworked to into tracepoints, but it's not clear there is a real need as rmap bugs rarely escape initial development, and when bugs do escape to production, they are often edge cases and/or reside in code that isn't directly related to the rmaps. In other words, the problems with rmap_printk() being unhelpful also apply to tracepoints. And deleting rmap_printk() doesn't preclude adding tracepoints in the future. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-08-31KVM: x86/mmu: Delete pgprintk() and all its usageSean Christopherson4-28/+0
Delete KVM's pgprintk() and all its usage, as the code is very prone to bitrot due to being buried behind MMU_DEBUG, and the functionality has been rendered almost entirely obsolete by the tracepoints KVM has gained over the years. And for the situations where the information provided by KVM's tracepoints is insufficient, pgprintk() rarely fills in the gaps, and is almost always far too noisy, i.e. developers end up implementing custom prints anyways. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-08-31KVM: x86/mmu: Guard against collision with KVM-defined PFERR_IMPLICIT_ACCESSSean Christopherson1-0/+11
Add an assertion in kvm_mmu_page_fault() to ensure the error code provided by hardware doesn't conflict with KVM's software-defined IMPLICIT_ACCESS flag. In the unlikely scenario that future hardware starts using bit 48 for a hardware-defined flag, preserving the bit could result in KVM incorrectly interpreting the unknown flag as KVM's IMPLICIT_ACCESS flag. WARN so that any such conflict can be surfaced to KVM developers and resolved, but otherwise ignore the bit as KVM can't possibly rely on a flag it knows nothing about. Fixes: 4f4aa80e3b88 ("KVM: X86: Handle implicit supervisor access with SMAP") Acked-by: Kai Huang <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-08-31KVM: x86/mmu: Move the lockdep_assert of mmu_lock to inside ↵Like Xu1-1/+2
clear_dirty_pt_masked() Move the lockdep_assert_held_write(&kvm->mmu_lock) from the only one caller kvm_tdp_mmu_clear_dirty_pt_masked() to inside clear_dirty_pt_masked(). This change makes it more obvious why it's safe for clear_dirty_pt_masked() to use the non-atomic (for non-volatile SPTEs) tdp_mmu_clear_spte_bits() helper. for_each_tdp_mmu_root() does its own lockdep, so the only "loss" in lockdep coverage is if the list is completely empty. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Like Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-08-31Merge tag 'kvm-x86-misc-6.6' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini28-445/+490
KVM x86 changes for 6.6: - Misc cleanups - Retry APIC optimized recalculation if a vCPU is added/enabled - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR can diverge from the default iff TSC scaling is enabled, and clean up related code - Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID
2023-08-31Merge tag 'kvm-x86-svm-6.6' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini7-136/+252
KVM: x86: SVM changes for 6.6: - Add support for SEV-ES DebugSwap, i.e. allow SEV-ES guests to use debug registers and generate/handle #DBs - Clean up LBR virtualization code - Fix a bug where KVM fails to set the target pCPU during an IRTE update - Fix fatal bugs in SEV-ES intrahost migration - Fix a bug where the recent (architecturally correct) change to reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
2023-08-31Merge tag 'kvm-x86-vmx-6.6' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-21/+17
KVM: x86: VMX changes for 6.6: - Misc cleanups - Fix a bug where KVM reads a stale vmcs.IDT_VECTORING_INFO_FIELD when trying to handle NMI VM-Exits
2023-08-31Merge tag 'kvm-x86-pmu-6.6' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2-29/+56
KVM x86 PMU changes for 6.6: - Clean up KVM's handling of Intel architectural events
2023-08-31Merge tag 'kvm-riscv-6.6-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini21-1085/+2547
KVM/riscv changes for 6.6 - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for Guest/VM - Added ONE_REG interface for SATP mode - Added ONE_REG interface to enable/disable multiple ISA extensions - Improved error codes returned by ONE_REG interfaces - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V - Added get-reg-list selftest for KVM RISC-V
2023-08-31Merge tag 'kvm-s390-next-6.6-1' of ↵Paolo Bonzini14-91/+482
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya)
2023-08-31Merge tag 'kvm-x86-selftests-6.6' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini56-468/+1389
KVM: x86: Selftests changes for 6.6: - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs - Add support for printf() in guest code and covert all guest asserts to use printf-based reporting - Clean up the PMU event filter test and add new testcases - Include x86 selftests in the KVM x86 MAINTAINERS entry
2023-08-31Merge tag 'kvm-x86-generic-6.6' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini7-21/+22
Common KVM changes for 6.6: - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers. - Drop unused function declarations
2023-08-31Merge tag 'kvmarm-6.6' of ↵Paolo Bonzini46-328/+2993
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 6.6 - Add support for TLB range invalidation of Stage-2 page tables, avoiding unnecessary invalidations. Systems that do not implement range invalidation still rely on a full invalidation when dealing with large ranges. - Add infrastructure for forwarding traps taken from a L2 guest to the L1 guest, with L0 acting as the dispatcher, another baby step towards the full nested support. - Simplify the way we deal with the (long deprecated) 'CPU target', resulting in a much needed cleanup. - Fix another set of PMU bugs, both on the guest and host sides, as we seem to never have any shortage of those... - Relax the alignment requirements of EL2 VA allocations for non-stack allocations, as we were otherwise wasting a lot of that precious VA space. - The usual set of non-functional cleanups, although I note the lack of spelling fixes...
2023-08-28KVM: VMX: Refresh available regs and IDT vectoring info before NMI handlingSean Christopherson1-10/+11
Reset the mask of available "registers" and refresh the IDT vectoring info snapshot in vmx_vcpu_enter_exit(), before KVM potentially handles a an NMI VM-Exit. One of the "registers" that KVM VMX lazily loads is the vmcs.VM_EXIT_INTR_INFO field, which is holds the vector+type on "exception or NMI" VM-Exits, i.e. is needed to identify NMIs. Clearing the available registers bitmask after handling NMIs results in KVM querying info from the last VM-Exit that read vmcs.VM_EXIT_INTR_INFO, and leads to both missed NMIs and spurious NMIs in the host. Opportunistically grab vmcs.IDT_VECTORING_INFO_FIELD early in the VM-Exit path too, e.g. to guard against similar consumption of stale data. The field is read on every "normal" VM-Exit, and there's no point in delaying the inevitable. Reported-by: Like Xu <[email protected]> Fixes: 11df586d774f ("KVM: VMX: Handle NMI VM-Exits in noinstr region") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-28KVM: s390: pv: Allow AP-instructions for pv-guestsSteffen Eiden2-3/+15
Introduces new feature bits and enablement flags for AP and AP IRQ support. Signed-off-by: Steffen Eiden <[email protected]> Reviewed-by: Janosch Frank <[email protected]> Reviewed-by: Michael Mueller <[email protected]> Signed-off-by: Janosch Frank <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]>
2023-08-28KVM: s390: Add UV feature negotiationSteffen Eiden3-0/+91
Add a uv_feature list for pv-guests to the KVM cpu-model. The feature bits 'AP-interpretation for secure guests' and 'AP-interrupt for secure guests' are available. Signed-off-by: Steffen Eiden <[email protected]> Reviewed-by: Janosch Frank <[email protected]> Reviewed-by: Michael Mueller <[email protected]> Signed-off-by: Janosch Frank <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]>
2023-08-28s390/uv: UV feature check utilitySteffen Eiden4-3/+10
Introduces a function to check the existence of an UV feature. Refactor feature bit checks to use the new function. Signed-off-by: Steffen Eiden <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Reviewed-by: Janosch Frank <[email protected]> Signed-off-by: Janosch Frank <[email protected]> Reviewed-by: Michael Mueller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-Id: <[email protected]>
2023-08-28KVM: s390: pv: relax WARN_ONCE condition for destroy fastViktor Mihajlovski1-1/+2
Destroy configuration fast may return with RC 0x104 if there are still bound APQNs in the configuration. The final cleanup will occur with the standard destroy configuration UVC as at this point in time all APQNs have been reset and thus unbound. Therefore, don't warn if RC 0x104 is reported. Signed-off-by: Viktor Mihajlovski <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Reviewed-by: Janosch Frank <[email protected]> Reviewed-by: Steffen Eiden <[email protected]> Reviewed-by: Michael Mueller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Janosch Frank <[email protected]> Message-ID: <[email protected]>
2023-08-28Merge remote-tracking branch 'vfio-ap' into nextJanosch Frank7-73/+135
The Secure Execution AP support makes it possible for SE VMs to securely use APQNs without a third party being able to snoop IO. VMs first bind to an APQN to securely attach it and granting protected key crypto function access. Afterwards they can associate the APQN which grants them clear key crypto function access. Once bound the APQNs are not accessible to the host until a reset is performed. The vfio-ap patches being merged here provide the base hypervisor Secure Execution / Protected Virtualization AP support. This includes proper handling of APQNs that are securely attached to a SE/PV guest especially regarding resets.
2023-08-28KVM: s390: selftests: Add selftest for single-steppingIlya Leoshkevich2-0/+161
Test different variations of single-stepping into interrupts: - SVC and PGM interrupts; - Interrupts generated by ISKE; - Interrupts generated by instructions emulated by KVM; - Interrupts generated by instructions emulated by userspace. Reviewed-by: Claudio Imbrenda <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Message-ID: <[email protected]> Signed-off-by: Claudio Imbrenda <[email protected]> [[email protected]: s/ASSERT_EQ/TEST_ASSERT_EQ/ because function was renamed in the selftest printf series] Signed-off-by: Janosch Frank <[email protected]>
2023-08-28KVM: s390: interrupt: Fix single-stepping keyless mode exitsIlya Leoshkevich1-2/+2
kvm_s390_skey_check_enable() does not emulate any instructions, rather, it clears CPUSTAT_KSS and arranges the instruction that caused the exit (e.g., ISKE, SSKE, RRBE or LPSWE with a keyed PSW) to run again. Therefore, skip the PER check and let the instruction execution happen. Otherwise, a debugger will see two single-step events on the same instruction. Reviewed-by: Christian Borntraeger <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Message-ID: <[email protected]> Signed-off-by: Claudio Imbrenda <[email protected]> Signed-off-by: Janosch Frank <[email protected]>
2023-08-28KVM: s390: interrupt: Fix single-stepping userspace-emulated instructionsIlya Leoshkevich1-3/+20
Single-stepping a userspace-emulated instruction that generates an interrupt causes GDB to land on the instruction following it instead of the respective interrupt handler. The reason is that after arranging a KVM_EXIT_S390_SIEIC exit, kvm_handle_sie_intercept() calls kvm_s390_handle_per_ifetch_icpt(), which sets KVM_GUESTDBG_EXIT_PENDING. This bit, however, is not processed immediately, but rather persists until the next ioctl(), causing a spurious single-step exit. Fix by clearing this bit in ioctl(). Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Message-ID: <[email protected]> Signed-off-by: Claudio Imbrenda <[email protected]> Signed-off-by: Janosch Frank <[email protected]>
2023-08-28KVM: s390: interrupt: Fix single-stepping kernel-emulated instructionsIlya Leoshkevich1-3/+14
Single-stepping a kernel-emulated instruction that generates an interrupt causes GDB to land on the instruction following it instead of the respective interrupt handler. The reason is that kvm_handle_sie_intercept(), after injecting the interrupt, also processes the PER event and arranges a KVM_SINGLESTEP exit. The interrupt is not yet delivered, however, so the userspace sees the next instruction. Fix by avoiding the KVM_SINGLESTEP exit when there is a pending interrupt. The next __vcpu_run() loop iteration will arrange a KVM_SINGLESTEP exit after delivering the interrupt. Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Message-ID: <[email protected]> Signed-off-by: Claudio Imbrenda <[email protected]> Signed-off-by: Janosch Frank <[email protected]>
2023-08-28KVM: s390: interrupt: Fix single-stepping into program interrupt handlersIlya Leoshkevich1-1/+16
Currently, after single-stepping an instruction that generates a specification exception, GDB ends up on the instruction immediately following it. The reason is that vcpu_post_run() injects the interrupt and sets KVM_GUESTDBG_EXIT_PENDING, causing a KVM_SINGLESTEP exit. The interrupt is not delivered, however, therefore userspace sees the address of the next instruction. Fix by letting the __vcpu_run() loop go into the next iteration, where vcpu_pre_run() delivers the interrupt and sets KVM_GUESTDBG_EXIT_PENDING. Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Message-ID: <[email protected]> Signed-off-by: Claudio Imbrenda <[email protected]> Signed-off-by: Janosch Frank <[email protected]>
2023-08-28KVM: s390: interrupt: Fix single-stepping into interrupt handlersIlya Leoshkevich2-2/+16
After single-stepping an instruction that generates an interrupt, GDB ends up on the second instruction of the respective interrupt handler. The reason is that vcpu_pre_run() manually delivers the interrupt, and then __vcpu_run() runs the first handler instruction using the CPUSTAT_P flag. This causes a KVM_SINGLESTEP exit on the second handler instruction. Fix by delaying the KVM_SINGLESTEP exit until after the manual interrupt delivery. Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Message-ID: <[email protected]> Signed-off-by: Claudio Imbrenda <[email protected]> Signed-off-by: Janosch Frank <[email protected]>
2023-08-28Merge tag 'kvm-x86-selftests-immutable-6.6' into nextJanosch Frank55-468/+1372
Provide an immutable point in kvm-x86/selftests so that the guest printf() support can be merged into other architectures' trees.
2023-08-28Merge branch kvm-arm64/6.6/misc into kvmarm-master/nextMarc Zyngier9-96/+139
* kvm-arm64/6.6/misc: : . : Misc KVM/arm64 updates for 6.6: : : - Don't unnecessary align non-stack allocations in the EL2 VA space : : - Drop HCR_VIRT_EXCP_MASK, which was never used... : : - Don't use smp_processor_id() in kvm_arch_vcpu_load(), : but the cpu parameter instead : : - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort() : : - Remove prototypes without implementations : . KVM: arm64: Remove size-order align in the nVHE hyp private VA range KVM: arm64: Remove unused declarations KVM: arm64: Remove redundant kvm_set_pfn_accessed() from user_mem_abort() KVM: arm64: Drop HCR_VIRT_EXCP_MASK KVM: arm64: Use the known cpu id instead of smp_processor_id() Signed-off-by: Marc Zyngier <[email protected]>
2023-08-28Merge branch kvm-arm64/6.6/pmu-fixes into kvmarm-master/nextMarc Zyngier7-12/+55
* kvm-arm64/6.6/pmu-fixes: : . : Another set of PMU fixes, coutrtesy of Reiji Watanabe. : From the cover letter: : : "This series fixes a couple of PMUver related handling of : vPMU support. : : On systems where the PMUVer is not uniform across all PEs, : KVM currently does not advertise PMUv3 to the guest, : even if userspace successfully runs KVM_ARM_VCPU_INIT with : KVM_ARM_VCPU_PMU_V3." : : Additionally, a fix for an obscure counter oversubscription : issue happening when the hsot profines the guest's EL0. : . KVM: arm64: pmu: Guard PMU emulation definitions with CONFIG_KVM KVM: arm64: pmu: Resync EL0 state on counter rotation KVM: arm64: PMU: Don't advertise STALL_SLOT_{FRONTEND,BACKEND} KVM: arm64: PMU: Don't advertise the STALL_SLOT event KVM: arm64: PMU: Avoid inappropriate use of host's PMUVer KVM: arm64: PMU: Disallow vPMU on non-uniform PMUVer Signed-off-by: Marc Zyngier <[email protected]>
2023-08-28Merge branch kvm-arm64/tlbi-range into kvmarm-master/nextMarc Zyngier21-132/+286
* kvm-arm64/tlbi-range: : . : FEAT_TLBIRANGE support, courtesy of Raghavendra Rao Ananta. : From the cover letter: : : "In certain code paths, KVM/ARM currently invalidates the entire VM's : page-tables instead of just invalidating a necessary range. For example, : when collapsing a table PTE to a block PTE, instead of iterating over : each PTE and flushing them, KVM uses 'vmalls12e1is' TLBI operation to : flush all the entries. This is inefficient since the guest would have : to refill the TLBs again, even for the addresses that aren't covered : by the table entry. The performance impact would scale poorly if many : addresses in the VM is going through this remapping. : : For architectures that implement FEAT_TLBIRANGE, KVM can replace such : inefficient paths by performing the invalidations only on the range of : addresses that are in scope. This series tries to achieve the same in : the areas of stage-2 map, unmap and write-protecting the pages." : . KVM: arm64: Use TLBI range-based instructions for unmap KVM: arm64: Invalidate the table entries upon a range KVM: arm64: Flush only the memslot after write-protect KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range() KVM: arm64: Define kvm_tlb_flush_vmid_range() KVM: arm64: Implement __kvm_tlb_flush_vmid_range() arm64: tlb: Implement __flush_s2_tlb_range_op() arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code KVM: Allow range-based TLB invalidation from common code KVM: Remove CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL KVM: arm64: Use kvm_arch_flush_remote_tlbs() KVM: Declare kvm_arch_flush_remote_tlbs() globally KVM: Rename kvm_arch_flush_remote_tlb() to kvm_arch_flush_remote_tlbs() Signed-off-by: Marc Zyngier <[email protected]>
2023-08-28Merge branch kvm-arm64/nv-trap-forwarding into kvmarm-master/nextMarc Zyngier15-41/+2488
* kvm-arm64/nv-trap-forwarding: (30 commits) : . : This implements the so called "trap forwarding" infrastructure, which : gets used when we take a trap from an L2 guest and that the L1 guest : wants to see the trap for itself. : . KVM: arm64: nv: Add trap description for SPSR_EL2 and ELR_EL2 KVM: arm64: nv: Select XARRAY_MULTI to fix build error KVM: arm64: nv: Add support for HCRX_EL2 KVM: arm64: Move HCRX_EL2 switch to load/put on VHE systems KVM: arm64: nv: Expose FGT to nested guests KVM: arm64: nv: Add switching support for HFGxTR/HDFGxTR KVM: arm64: nv: Expand ERET trap forwarding to handle FGT KVM: arm64: nv: Add SVC trap forwarding KVM: arm64: nv: Add trap forwarding for HDFGxTR_EL2 KVM: arm64: nv: Add trap forwarding for HFGITR_EL2 KVM: arm64: nv: Add trap forwarding for HFGxTR_EL2 KVM: arm64: nv: Add fine grained trap forwarding infrastructure KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2 KVM: arm64: nv: Add trap forwarding for MDCR_EL2 KVM: arm64: nv: Expose FEAT_EVT to nested guests KVM: arm64: nv: Add trap forwarding for HCR_EL2 KVM: arm64: nv: Add trap forwarding infrastructure KVM: arm64: Restructure FGT register switching KVM: arm64: nv: Add FGT registers KVM: arm64: Add missing HCR_EL2 trap bits ... Signed-off-by: Marc Zyngier <[email protected]>
2023-08-27Linux 6.5Linus Torvalds1-1/+1
2023-08-27Merge tag 'scsi-fixes' of ↵Linus Torvalds5-57/+6
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three small driver fixes and one larger unused function set removal in the raid class (so no external impact)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: snic: Fix double free in snic_tgt_create() scsi: core: raid_class: Remove raid_component_add() scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5 scsi: ufs: mcq: Fix the search/wrap around logic
2023-08-26Merge tag 'x86-urgent-2023-08-26' of ↵Linus Torvalds3-3/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix an FPU invalidation bug on exec(), and fix a performance regression due to a missing setting of X86_FEATURE_OSXSAVE" * tag 'x86-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4 x86/fpu: Invalidate FPU state correctly on exec()
2023-08-26Merge tag 'irq-urgent-2023-08-26' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A last minute fix for a regression introduced in the v6.5 merge window. The conversion of the software based interrupt resend mechanism to hlist missed to add a check whether the descriptor is already enqueued and dropped the interrupt descriptor lookup for nested interrupts. The missing check whether the descriptor is already queued causes hlist corruption and can be observed in the wild. The dropped parent descriptor lookup has not yet caused problems, but it would result in stale interrupt line in the worst case. Add the missing enqueued check and bring the descriptor lookup back to cure this" * tag 'irq-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Fix software resend lockup and nested resend
2023-08-26Merge tag 'loongarch-fixes-6.5-2' of ↵Linus Torvalds22-38/+42
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Fix a ptrace bug, a hw_breakpoint bug, some build errors/warnings and some trivial cleanups" * tag 'loongarch-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Fix hw_breakpoint_control() for watchpoints LoongArch: Ensure FP/SIMD registers in the core dump file is up to date LoongArch: Put the body of play_dead() into arch_cpu_idle_dead() LoongArch: Add identifier names to arguments of die() declaration LoongArch: Return earlier in die() if notify_die() returns NOTIFY_STOP LoongArch: Do not kill the task in die() if notify_die() returns NOTIFY_STOP LoongArch: Remove <asm/export.h> LoongArch: Replace #include <asm/export.h> with #include <linux/export.h> LoongArch: Remove unneeded #include <asm/export.h> LoongArch: Replace -ffreestanding with finer-grained -fno-builtin's LoongArch: Remove redundant "source drivers/firmware/Kconfig"
2023-08-26genirq: Fix software resend lockup and nested resendJohan Hovold1-1/+6
The switch to using hlist for managing software resend of interrupts broke resend in at least two ways: First, unconditionally adding interrupt descriptors to the resend list can corrupt the list when the descriptor in question has already been added. This causes the resend tasklet to loop indefinitely with interrupts disabled as was recently reported with the Lenovo ThinkPad X13s after threaded NAPI was disabled in the ath11k WiFi driver. This bug is easily fixed by restoring the old semantics of irq_sw_resend() so that it can be called also for descriptors that have already been marked for resend. Second, the offending commit also broke software resend of nested interrupts by simply discarding the code that made sure that such interrupts are retriggered using the parent interrupt. Add back the corresponding code that adds the parent descriptor to the resend list. Fixes: bc06a9e08742 ("genirq: Use hlist for managing resend handlers") Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/r/[email protected]
2023-08-26LoongArch: Fix hw_breakpoint_control() for watchpointsHuacai Chen1-2/+1
In hw_breakpoint_control(), encode_ctrl_reg() has already encoded the MWPnCFG3_LoadEn/MWPnCFG3_StoreEn bits in info->ctrl. We don't need to add (1 << MWPnCFG3_LoadEn | 1 << MWPnCFG3_StoreEn) unconditionally. Otherwise we can't set read watchpoint and write watchpoint separately. Cc: [email protected] Signed-off-by: Huacai Chen <[email protected]>
2023-08-26LoongArch: Ensure FP/SIMD registers in the core dump file is up to dateHuacai Chen2-4/+22
This is a port of commit 379eb01c21795edb4c ("riscv: Ensure the value of FP registers in the core dump file is up to date"). The values of FP/SIMD registers in the core dump file come from the thread.fpu. However, kernel saves the FP/SIMD registers only before scheduling out the process. If no process switch happens during the exception handling, kernel will not have a chance to save the latest values of FP/SIMD registers. So it may cause their values in the core dump file incorrect. To solve this problem, force fpr_get()/simd_get() to save the FP/SIMD registers into the thread.fpu if the target task equals the current task. Cc: [email protected] Signed-off-by: Huacai Chen <[email protected]>
2023-08-26KVM: arm64: Remove size-order align in the nVHE hyp private VA rangeVincent Donnefort6-85/+138
commit f922c13e778d ("KVM: arm64: Introduce pkvm_alloc_private_va_range()") and commit 92abe0f81e13 ("KVM: arm64: Introduce hyp_alloc_private_va_range()") added an alignment for the start address of any allocation into the nVHE hypervisor private VA range. This alignment (order of the size of the allocation) intends to enable efficient stack verification (if the PAGE_SHIFT bit is zero, the stack pointer is on the guard page and a stack overflow occurred). But this is only necessary for stack allocation and can waste a lot of VA space. So instead make stack-specific functions, handling the guard page requirements, while other users (e.g. fixmap) will only get page alignment. Reviewed-by: Kalesh Singh <[email protected]> Signed-off-by: Vincent Donnefort <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-25Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds3-48/+51
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "One clk driver fix and two clk framework fixes: - Fix an OOB access when devm_get_clk_from_child() is used and devm_clk_release() casts the void pointer to the wrong type - Move clk_rate_exclusive_{get,put}() within the correct ifdefs in clk.h so that the stubs are used when CONFIG_COMMON_CLK=n - Register the proper clk provider function depending on the value of #clock-cells in the TI keystone driver" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: Fix slab-out-of-bounds error in devm_clk_release() clk: Fix undefined reference to `clk_rate_exclusive_{get,put}' clk: keystone: syscon-clk: Fix audio refclk
2023-08-25lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernelsHelge Deller1-26/+6
The gcc compiler translates on some architectures the 64-bit __builtin_clzll() function to a call to the libgcc function __clzdi2(), which should take a 64-bit parameter on 32- and 64-bit platforms. But in the current kernel code, the built-in __clzdi2() function is defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG == 32, thus the return values on 32-bit kernels are in the range from [0..31] instead of the expected [0..63] range. This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to take a 64-bit parameter on 32-bit kernels as well, thus it makes the functions identical for 32- and 64-bit kernels. This bug went unnoticed since kernel 3.11 for over 10 years, and here are some possible reasons for that: a) Some architectures have assembly instructions to count the bits and which are used instead of calling __clzdi2(), e.g. on x86 the bsr instruction and on ppc cntlz is used. On such architectures the wrong __clzdi2() implementation isn't used and as such the bug has no effect and won't be noticed. b) Some architectures link to libgcc.a, and the in-kernel weak functions get replaced by the correct 64-bit variants from libgcc.a. c) __builtin_clzll() and __clzdi2() doesn't seem to be used in many places in the kernel, and most likely only in uncritical functions, e.g. when printing hex values via seq_put_hex_ll(). The wrong return value will still print the correct number, but just in a wrong formatting (e.g. with too many leading zeroes). d) 32-bit kernels aren't used that much any longer, so they are less tested. A trivial testcase to verify if the currently running 32-bit kernel is affected by the bug is to look at the output of /proc/self/maps: Here the kernel uses a correct implementation of __clzdi2(): root@debian:~# cat /proc/self/maps 00010000-00019000 r-xp 00000000 08:05 787324 /usr/bin/cat 00019000-0001a000 rwxp 00009000 08:05 787324 /usr/bin/cat 0001a000-0003b000 rwxp 00000000 00:00 0 [heap] f7551000-f770d000 r-xp 00000000 08:05 794765 /usr/lib/hppa-linux-gnu/libc.so.6 ... and this kernel uses the broken implementation of __clzdi2(): root@debian:~# cat /proc/self/maps 0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324 /usr/bin/cat 0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324 /usr/bin/cat 000000001a000-000000003b000 rwxp 00000000 00:00 0 [heap] 00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765 /usr/lib/hppa-linux-gnu/libc.so.6 ... Signed-off-by: Helge Deller <[email protected]> Fixes: 4df87bb7b6a22 ("lib: add weak clz/ctz functions") Cc: Chanho Min <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: [email protected] # v3.11+ Signed-off-by: Linus Torvalds <[email protected]>
2023-08-25Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of ↵Linus Torvalds32-73/+279
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "18 hotfixes. 13 are cc:stable and the remainder pertain to post-6.4 issues or aren't considered suitable for a -stable backport" * tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: shmem: fix smaps BUG sleeping while atomic selftests: cachestat: catch failing fsync test on tmpfs selftests: cachestat: test for cachestat availability maple_tree: disable mas_wr_append() when other readers are possible madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check mm: multi-gen LRU: don't spin during memcg release mm: memory-failure: fix unexpected return value in soft_offline_page() radix tree: remove unused variable mm: add a call to flush_cache_vmap() in vmap_pfn() selftests/mm: FOLL_LONGTERM need to be updated to 0x100 nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers() mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast selftests: cgroup: fix test_kmem_basic less than error mm: enable page walking API to lock vmas during the walk smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd() mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
2023-08-25Merge tag 'riscv-for-linus-6.5-rc8' of ↵Linus Torvalds5-75/+7
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "This is obviously not ideal, particularly for something this late in the cycle. Unfortunately we found some uABI issues in the vector support while reviewing the GDB port, which has triggered a revert -- probably a good sign we should have reviewed GDB before merging this, I guess I just dropped the ball because I was so worried about the context extension and libc suff I forgot. Hence the late revert. There's some risk here as we're still exposing the vector context for signal handlers, but changing that would have meant reverting all of the vector support. The issues we've found so far have been fixed already and they weren't absolute showstoppers, so we're essentially just playing it safe by holding ptrace support for another release (or until we get through a proper userspace code review). Summary: - The vector ucontext extension has been extended with vlenb - The vector registers ELF core dump note type has been changed to avoid aliasing with the CSR type used in embedded systems - Support for accessing vector registers via ptrace() has been reverted - Another build fix for the ISA spec changes around Zifencei/Zicsr that manifests on some systems built with binutils-2.37 and gcc-11.2" * tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix build errors using binutils2.37 toolchains RISC-V: vector: export VLENB csr in __sc_riscv_v_state RISC-V: Remove ptrace support for vectors
2023-08-25Merge tag 'gpio-fixes-for-v6.5' of ↵Linus Torvalds1-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix an irq mapping leak in gpio-sim - associate the GPIO device's software node with the irq domain in gpio-sim * tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: sim: pass the GPIO device's software node to irq domain gpio: sim: dispose of irq mappings before destroying the irq_sim domain
2023-08-25Merge tag 'pinctrl-v6.5-4' of ↵Linus Torvalds4-7/+68
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Here are some Renesas and AMD driver fixes, the AMD fix affects important laptops in the wild so this one is pretty important. It seems a bit tough to get this right. - Fix DT parsing and related locking in the Renesas driver. - Fix wakeup IRQs in the AMD driver once again. Really tricky this one" * tag 'pinctrl-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: Mask wake bits on probe again pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function} pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map() pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
2023-08-25KVM: VMX: Delete ancient pr_warn() about KVM_SET_TSS_ADDR not being setSean Christopherson1-7/+0
Delete KVM's printk about KVM_SET_TSS_ADDR not being called. When the printk was added by commit 776e58ea3d37 ("KVM: unbreak userspace that does not sets tss address"), KVM also stuffed a "hopefully safe" value, i.e. the message wasn't purely informational. For reasons unknown, ostensibly to try and help people running outdated qemu-kvm versions, the message got left behind when KVM's stuffing was removed by commit 4918c6ca6838 ("KVM: VMX: Require KVM_SET_TSS_ADDR being called prior to running a VCPU"). Today, the message is completely nonsensical, as it has been over a decade since KVM supported userspace running a Real Mode guest, on a CPU without unrestricted guest support, without doing KVM_SET_TSS_ADDR before KVM_RUN. I.e. KVM's ABI has required KVM_SET_TSS_ADDR for 10+ years. To make matters worse, the message is prone to false positives as it triggers when simply *creating* a vCPU due to RESET putting vCPUs into Real Mode, even when the user has no intention of ever *running* the vCPU in a Real Mode. E.g. KVM selftests stuff 64-bit mode and never touch Real Mode, but trigger the message even though they run just fine without doing KVM_SET_TSS_ADDR. Creating "dummy" vCPUs, e.g. to probe features, can also trigger the message. In both scenarios, the message confuses users and falsely implies that they've done something wrong. Reported-by: Thorsten Glaser <[email protected]> Closes: https://lkml.kernel.org/r/f1afa6c0-cde2-ab8b-ea71-bfa62a45b956%40tarent.de Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-25KVM: x86: Update MAINTAINTERS to include selftestsSean Christopherson1-0/+2
Give KVM x86 the same treatment as all other KVM architectures, and officially take ownership of x86 specific KVM selftests (changes have been routed through kvm and/or kvm-x86 for quite some time). Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-25KVM: selftests: Explicit set #UD when *potentially* injecting exceptionSean Christopherson1-0/+3
Explicitly set the exception vector to #UD when potentially injecting an exception in sync_regs_test's subtests that try to detect TOCTOU bugs in KVM's handling of exceptions injected by userspace. A side effect of the original KVM bug was that KVM would clear the vector, but relying on KVM to clear the vector (i.e. make it #DE) makes it less likely that the test would ever find *new* KVM bugs, e.g. because only the first iteration would run with a legal vector to start. Explicitly inject #UD for race_events_inj_pen() as well, e.g. so that it doesn't inherit the illegal 255 vector from race_events_exc(), which currently runs first. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-25KVM: selftests: Reload "good" vCPU state if vCPU hits shutdownSean Christopherson1-1/+13
Reload known good vCPU state if the vCPU triple faults in any of the race_sync_regs() subtests, e.g. if KVM successfully injects an exception (the vCPU isn't configured to handle exceptions). On Intel, the VMCS is preserved even after shutdown, but AMD's APM states that the VMCB is undefined after a shutdown and so KVM synthesizes an INIT to sanitize vCPU/VMCB state, e.g. to guard against running with a garbage VMCB. The synthetic INIT results in the vCPU never exiting to userspace, as it gets put into Real Mode at the reset vector, which is full of zeros (as is GPA 0 and beyond), and so executes ADD for a very, very long time. Fixes: 60c4063b4752 ("KVM: selftests: Extend x86's sync_regs_test to check for event vector races") Cc: Michal Luczaj <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-25KVM: SVM: Require nrips support for SEV guests (and beyond)Sean Christopherson3-8/+6
Disallow SEV (and beyond) if nrips is disabled via module param, as KVM can't read guest memory to partially emulate and skip an instruction. All CPUs that support SEV support NRIPS, i.e. this is purely stopping the user from shooting themselves in the foot. Cc: Tom Lendacky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>