blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2024-05-07	KVM: x86/mmu: check for invalid async page faults involving private memory	Paolo Bonzini	2	-7/+12
	Right now the error code is not used when an async page fault is completed. This is not a problem in the current code, but it is untidy. For protected VMs, we will also need to check that the page attributes match the current state of the page, because asynchronous page faults can only occur on shared pages (private pages go through kvm_faultin_pfn_private() instead of __gfn_to_pfn_memslot()). Start by piping the error code from kvm_arch_setup_async_pf() to kvm_arch_async_page_ready() via the architecture-specific async page fault data. For now, it can be used to assert that there are no async page faults on private memory. Extracted from a patch by Isaku Yamahata. Signed-off-by: Paolo Bonzini <[email protected]>
2024-05-07	KVM: x86/mmu: Use synthetic page fault error code to indicate private faults	Sean Christopherson	3	-2/+21
	Add and use a synthetic, KVM-defined page fault error code to indicate whether a fault is to private vs. shared memory. TDX and SNP have different mechanisms for reporting private vs. shared, and KVM's software-protected VMs have no mechanism at all. Usurp an error code flag to avoid having to plumb another parameter to kvm_mmu_page_fault() and friends. Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it for TDX and software-protected VMs as appropriate, but that would require clearing the flag for SEV and SEV-ES VMs, which support encrypted memory at the hardware layer, but don't utilize private memory at the KVM layer. Opportunistically add a comment to call out that the logic for software- protected VMs is (and was before this commit) broken for nested MMUs, i.e. for nested TDP, as the GPA is an L2 GPA. Punt on trying to play nice with nested MMUs as there is a _lot_ of functionality that simply doesn't work for software-protected VMs, e.g. all of the paths where KVM accesses guest memory need to be updated to be aware of private vs. shared memory. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-05-07	KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero	Sean Christopherson	1	-0/+7
	WARN if bits 63:32 are non-zero when handling an intercepted legacy #PF, as the error code for #PF is limited to 32 bits (and in practice, 16 bits on Intel CPUS). This behavior is architectural, is part of KVM's ABI (see kvm_vcpu_events.error_code), and is explicitly documented as being preserved for intecerpted #PF in both the APM: The error code saved in EXITINFO1 is the same as would be pushed onto the stack by a non-intercepted #PF exception in protected mode. and even more explicitly in the SDM as VMCS.VM_EXIT_INTR_ERROR_CODE is a 32-bit field. Simply drop the upper bits if hardware provides garbage, as spurious information should do no harm (though in all likelihood hardware is buggy and the kernel is doomed). Handling all upper 32 bits in the #PF path will allow moving the sanity check on synthetic checks from kvm_mmu_page_fault() to npf_interception(), which in turn will allow deriving PFERR_PRIVATE_ACCESS from AMD's PFERR_GUEST_ENC_MASK without running afoul of the sanity check. Note, this is also why Intel uses bit 15 for SGX (highest bit on Intel CPUs) and AMD uses bit 31 for RMP (highest bit on AMD CPUs); using the highest bit minimizes the probability of a collision with the "other" vendor, without needing to plumb more bits through microcode. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Kai Huang <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-05-07	KVM: x86/mmu: Pass full 64-bit error code when handling page faults	Isaku Yamahata	3	-5/+4
	Plumb the full 64-bit error code throughout the page fault handling code so that KVM can use the upper 32 bits, e.g. SNP's PFERR_GUEST_ENC_MASK will be used to determine whether or not a fault is private vs. shared. Note, passing the 64-bit error code to FNAME(walk_addr)() does NOT change the behavior of permission_fault() when invoked in the page fault path, as KVM explicitly clears PFERR_IMPLICIT_ACCESS in kvm_mmu_page_fault(). Continue passing '0' from the async #PF worker, as guest_memfd and thus private memory doesn't support async page faults. Signed-off-by: Isaku Yamahata <[email protected]> [mdr: drop references/changes on rebase, update commit message] Signed-off-by: Michael Roth <[email protected]> [sean: drop truncation in call to FNAME(walk_addr)(), rewrite changelog] Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-05-07	KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler	Sean Christopherson	3	-11/+18
	Move the sanity check that hardware never sets bits that collide with KVM- define synthetic bits from kvm_mmu_page_fault() to npf_interception(), i.e. make the sanity check #NPF specific. The legacy #PF path already WARNs if _any_ of bits 63:32 are set, and the error code that comes from VMX's EPT Violatation and Misconfig is 100% synthesized (KVM morphs VMX's EXIT_QUALIFICATION into error code flags). Add a compile-time assert in the legacy #PF handler to make sure that KVM- define flags are covered by its existing sanity check on the upper bits. Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we are removing the comment that defined it. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Kai Huang <[email protected]> Reviewed-by: Binbin Wu <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-05-07	KVM: x86: Define more SEV+ page fault error bits/flags for #NPF	Sean Christopherson	1	-0/+4
	Define more #NPF error code flags that are relevant to SEV+ (mostly SNP) guests, as specified by the APM: * Bit 31 (RMP): Set to 1 if the fault was caused due to an RMP check or a VMPL check failure, 0 otherwise. * Bit 34 (ENC): Set to 1 if the guest’s effective C-bit was 1, 0 otherwise. * Bit 35 (SIZEM): Set to 1 if the fault was caused by a size mismatch between PVALIDATE or RMPADJUST and the RMP, 0 otherwise. * Bit 36 (VMPL): Set to 1 if the fault was caused by a VMPL permission check failure, 0 otherwise. Note, the APM is extremely misleading, and strongly implies that the above flags can _only_ be set for #NPF exits from SNP guests. That is a lie, as bit 34 (C-bit=1, i.e. was encrypted) can be set when running _any_ flavor of SEV guest on SNP capable hardware. Signed-off-by: Sean Christopherson <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-05-07	KVM: x86: Remove separate "bit" defines for page fault error code masks	Sean Christopherson	2	-25/+12
	Open code the bit number directly in the PFERR_* masks and drop the intermediate PFERR_*_BIT defines, as having to bounce through two macros just to see which flag corresponds to which bit is quite annoying, as is having to define two macros just to add recognition of a new flag. Use ternary operator to derive the bit in permission_fault(), the one function that actually needs the bit number as part of clever shifting to avoid conditional branches. Generally the compiler is able to turn it into a conditional move, and if not it's not really a big deal. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-05-07	KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation	Sean Christopherson	2	-8/+19
	Exit to userspace with -EFAULT / KVM_EXIT_MEMORY_FAULT if a private fault triggers emulation of any kind, as KVM doesn't currently support emulating access to guest private memory. Practically speaking, private faults and emulation are already mutually exclusive, but there are many flow that can result in KVM returning RET_PF_EMULATE, and adding one last check to harden against weird, unexpected combinations and/or KVM bugs is inexpensive. Suggested-by: Yan Zhao <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-05-06	LoongArch: KVM: Add mmio trace events support	Bibo Mao	2	-6/+22
	Add mmio trace events support, currently generic mmio events KVM_TRACE_MMIO_WRITE/xxx_READ/xx_READ_UNSATISFIED are added here. Also vcpu id field is added for all kvm trace events, since perf KVM tool parses vcpu id information for kvm entry event. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-05-06	LoongArch: KVM: Add software breakpoint support	Bibo Mao	7	-3/+40
	When VM runs in kvm mode, system will not exit to host mode when executing a general software breakpoint instruction such as INSN_BREAK, trap exception happens in guest mode rather than host mode. In order to debug guest kernel on host side, one mechanism should be used to let VM exit to host mode. Here a hypercall instruction with a special code is used for software breakpoint usage. VM exits to host mode and kvm hypervisor identifies the special hypercall code and sets exit_reason with KVM_EXIT_DEBUG. And then let qemu handle it. Idea comes from ppc kvm, one api KVM_REG_LOONGARCH_DEBUG_INST is added to get the hypercall code. VMM needs get sw breakpoint instruction with this api and set the corresponding sw break point for guest kernel. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-05-06	LoongArch: KVM: Add PV IPI support on guest side	Bibo Mao	8	-2/+197
	PARAVIRT config option and PV IPI is added for the guest side, function pv_ipi_init() is used to add IPI sending and IPI receiving hooks. This function firstly checks whether system runs in VM mode, and if kernel runs in VM mode, it will call function kvm_para_available() to detect the current hypervirsor type (now only KVM type detection is supported). The paravirt functions can work only if current hypervisor type is KVM, since there is only KVM supported on LoongArch now. PV IPI uses virtual IPI sender and virtual IPI receiver functions. With virtual IPI sender, IPI message is stored in memory rather than emulated HW. IPI multicast is also supported, and 128 vcpus can received IPIs at the same time like X86 KVM method. Hypercall method is used for IPI sending. With virtual IPI receiver, HW SWI0 is used rather than real IPI HW. Since VCPU has separate HW SWI0 like HW timer, there is no trap in IPI interrupt acknowledge. Since IPI message is stored in memory, there is no trap in getting IPI message. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-05-06	LoongArch: KVM: Add PV IPI support on host side	Bibo Mao	6	-2/+211
	On LoongArch system, IPI hw uses iocsr registers. There are one iocsr register access on IPI sending, and two iocsr access on IPI receiving for the IPI interrupt handler. In VM mode all iocsr accessing will cause VM to trap into hypervisor. So with one IPI hw notification there will be three times of trap. In this patch PV IPI is added for VM, hypercall instruction is used for IPI sender, and hypervisor will inject an SWI to the destination vcpu. During the SWI interrupt handler, only CSR.ESTAT register is written to clear irq. CSR.ESTAT register access will not trap into hypervisor, so with PV IPI supported, there is one trap with IPI sender, and no trap with IPI receiver, there is only one trap with IPI notification. Also this patch adds IPI multicast support, the method is similar with x86. With IPI multicast support, IPI notification can be sent to at most 128 vcpus at one time. It greatly reduces the times of trapping into hypervisor. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-05-06	LoongArch: KVM: Add vcpu mapping from physical cpuid	Bibo Mao	4	-0/+129
	Physical CPUID is used for interrupt routing for irqchips such as ipi, msgint and eiointc interrupt controllers. Physical CPUID is stored at the CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu is created and the physical CPUIDs of two vcpus cannot be the same. Different irqchips have different size declaration about physical CPUID, the max CPUID value for CSR LOONGARCH_CSR_CPUID on Loongson-3A5000 is 512, the max CPUID supported by IPI hardware is 1024, while for eiointc irqchip is 256, and for msgint irqchip is 65536. The smallest value from all interrupt controllers is selected now, and the max cpuid size is defines as 256 by KVM which comes from the eiointc irqchip. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-05-06	LoongArch: KVM: Add cpucfg area for kvm hypervisor	Bibo Mao	3	-17/+50
	Instruction cpucfg can be used to get processor features. And there is a trap exception when it is executed in VM mode, and also it can be used to provide cpu features to VM. On real hardware cpucfg area 0 - 20 is used by now. Here one specified area 0x40000000 -- 0x400000ff is used for KVM hypervisor to provide PV features, and the area can be extended for other hypervisors in future. This area will never be used for real HW, it is only used by software. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-05-06	LoongArch: KVM: Add hypercall instruction emulation	Bibo Mao	3	-1/+38
	On LoongArch system, there is a hypercall instruction special for virtualization. When system executes this instruction on host side, there is an illegal instruction exception reported, however it will trap into host when it is executed in VM mode. When hypercall is emulated, A0 register is set with value KVM_HCALL_INVALID_CODE, rather than inject EXCCODE_INE invalid instruction exception. So VM can continue to executing the next code. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-05-06	LoongArch/smp: Refine some ipi functions on LoongArch platform	Bibo Mao	7	-71/+63
	Refine the ipi handling on LoongArch platform, there are three modifications: 1. Add generic function get_percpu_irq(), replacing some percpu irq functions such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with get_percpu_irq(). 2. Change definition about parameter action called by function loongson_send_ipi_single() and loongson_send_ipi_mask(), and it is defined as decimal encoding format at ipi sender side. Normal decimal encoding is used rather than binary bitmap encoding for ipi action, ipi hw sender uses decimal encoding code, and ipi receiver will get binary bitmap encoding, the ipi hw will convert it into bitmap in ipi message buffer. 3. Add a structure smp_ops on LoongArch platform so that pv ipi can be used later. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-05-05	Linux 6.9-rc7	Linus Torvalds	1	-1/+1

2024-05-05	epoll: be better about file lifetimes	Linus Torvalds	1	-1/+37
	epoll can call out to vfs_poll() with a file pointer that may race with the last 'fput()'. That would make f_count go down to zero, and while the ep->mtx locking means that the resulting file pointer tear-down will be blocked until the poll returns, it means that f_count is already dead, and any use of it won't actually get a reference to the file any more: it's dead regardless. Make sure we have a valid ref on the file pointer before we call down to vfs_poll() from the epoll routines. Link: https://lore.kernel.org/lkml/[email protected]/ Reported-by: [email protected] Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-05-05	Merge tag 'edac_urgent_for_v6.9_rc7' of ↵	Linus Torvalds	1	-6/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Fix error logging and check user-supplied data when injecting an error in the versal EDAC driver * tag 'edac_urgent_for_v6.9_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/versal: Do not log total error counts EDAC/versal: Check user-supplied data before injecting an error EDAC/versal: Do not register for NOC errors
2024-05-05	Merge tag 'powerpc-6.9-4' of ↵	Linus Torvalds	3	-8/+15
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix incorrect delay handling in the plpks (keystore) code - Fix a panic when an LPAR boots with a frozen PE Thanks to Andrew Donnellan, Gaurav Batra, Nageswara R Sastry, and Nayna Jain. * tag 'powerpc-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE powerpc/pseries: make max polling consistent for longer H_CALLs
2024-05-05	Merge tag 'x86-urgent-2024-05-05' of ↵	Linus Torvalds	9	-67/+64
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Remove the broken vsyscall emulation code from the page fault code - Fix kexec crash triggered by certain SEV RMP table layouts - Fix unchecked MSR access error when disabling the x2APIC via iommu=off * tag 'x86-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Remove broken vsyscall emulation code from the page fault code x86/apic: Don't access the APIC when disabling x2APIC x86/sev: Add callback to apply RMP table fixups for kexec x86/e820: Add a new e820 table update helper
2024-05-05	Merge tag 'irq-urgent-2024-05-05' of ↵	Linus Torvalds	1	-4/+8
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix suspicious RCU usage in __do_softirq()" * tag 'irq-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: softirq: Fix suspicious RCU usage in __do_softirq()
2024-05-05	Merge tag 'char-misc-6.9-rc7' of ↵	Linus Torvalds	13	-25/+118
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char/misc/other driver fixes and new device ids for 6.9-rc7 that resolve some reported problems. Included in here are: - iio driver fixes - mei driver fix and new device ids - dyndbg bugfix - pvpanic-pci driver bugfix - slimbus driver bugfix - fpga new device id All have been in linux-next with no reported problems" * tag 'char-misc-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: slimbus: qcom-ngd-ctrl: Add timeout for wait operation dyndbg: fix old BUG_ON in >control parser misc/pvpanic-pci: register attributes via pci_driver fpga: dfl-pci: add PCI subdevice ID for Intel D5005 card mei: me: add lunar lake point M DID mei: pxp: match against PCI_CLASS_DISPLAY_OTHER iio:imu: adis16475: Fix sync mode setting iio: accel: mxc4005: Reset chip on probe() and resume() iio: accel: mxc4005: Interrupt handling fixes dt-bindings: iio: health: maxim,max30102: fix compatible check iio: pressure: Fixes SPI support for BMP3xx devices iio: pressure: Fixes BME280 SPI driver data
2024-05-05	Merge tag 'usb-6.9-rc7' of ↵	Linus Torvalds	15	-79/+147
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some small USB driver fixes for reported problems for 6.9-rc7. Included in here are: - usb core fixes for found issues - typec driver fixes for reported problems - usb gadget driver fixes for reported problems - xhci build fixes - dwc3 driver fixes for reported issues All of these have been in linux-next this past week with no reported problems" * tag 'usb-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: tcpm: Check for port partner validity before consuming it usb: typec: tcpm: enforce ready state when queueing alt mode vdm usb: typec: tcpm: unregister existing source caps before re-registration usb: typec: tcpm: clear pd_event queue in PORT_RESET usb: typec: tcpm: queue correct sop type in tcpm_queue_vdm_unlocked usb: Fix regression caused by invalid ep0 maxpacket in virtual SuperSpeed device usb: ohci: Prevent missed ohci interrupts usb: typec: qcom-pmic: fix pdphy start() error handling usb: typec: qcom-pmic: fix use-after-free on late probe errors usb: gadget: f_fs: Fix a race condition when processing setup packets. USB: core: Fix access violation during port device removal usb: dwc3: core: Prevent phy suspend during init usb: xhci-plat: Don't include xhci.h usb: gadget: uvc: use correct buffer size when parsing configfs lists usb: gadget: composite: fix OS descriptors w_value logic usb: gadget: f_fs: Fix race between aio_cancel() and AIO request complete
2024-05-05	Merge tag 'input-for-v6.9-rc6' of ↵	Linus Torvalds	2	-1/+9
	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - a new ID for ASUS ROG RAIKIRI controllers added to xpad driver - amimouse driver structure annotated with __refdata to prevent section mismatch warnings. * tag 'input-for-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: amimouse - mark driver struct with __refdata to prevent section mismatch Input: xpad - add support for ASUS ROG RAIKIRI
2024-05-05	Merge tag 'probes-fixes-v6.9-rc6' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: - probe-events: Fix memory leak in parsing probe argument. There is a memory leak (forget to free an allocated buffer) in a memory allocation failure path. Fix it to jump to the correct error handling code. * tag 'probes-fixes-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/probes: Fix memory leak in traceprobe_parse_probe_arg_body()
2024-05-05	Merge tag 'trace-v6.9-rc6-2' of ↵	Linus Torvalds	5	-59/+210
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing and tracefs fixes from Steven Rostedt: - Fix RCU callback of freeing an eventfs_inode. The freeing of the eventfs_inode from the kref going to zero freed the contents of the eventfs_inode and then used kfree_rcu() to free the inode itself. But the contents should also be protected by RCU. Switch to a call_rcu() that calls a function to free all of the eventfs_inode after the RCU synchronization. - The tracing subsystem maps its own descriptor to a file represented by eventfs. The freeing of this descriptor needs to know when the last reference of an eventfs_inode is released, but currently there is no interface for that. Add a "release" callback to the eventfs_inode entry array that allows for freeing of data that can be referenced by the eventfs_inode being opened. Then increment the ref counter for this descriptor when the eventfs_inode file is created, and decrement/free it when the last reference to the eventfs_inode is released and the file is removed. This prevents races between freeing the descriptor and the opening of the eventfs file. - Fix the permission processing of eventfs. The change to make the permissions of eventfs default to the mount point but keep track of when changes were made had a side effect that could cause security concerns. When the tracefs is remounted with a given gid or uid, all the files within it should inherit that gid or uid. But if the admin had changed the permission of some file within the tracefs file system, it would not get updated by the remount. This caused the kselftest of file permissions to fail the second time it is run. The first time, all changes would look fine, but the second time, because the changes were "saved", the remount did not reset them. Create a link list of all existing tracefs inodes, and clear the saved flags on them on a remount if the remount changes the corresponding gid or uid fields. This also simplifies the code by removing the distinction between the toplevel eventfs and an instance eventfs. They should both act the same. They were different because of a misconception due to the remount not resetting the flags. Now that remount resets all the files and directories to default to the root node if a uid/gid is specified, it makes the logic simpler to implement. * tag 'trace-v6.9-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Have "events" directory get permissions from its parent eventfs: Do not treat events directory different than other directories eventfs: Do not differentiate the toplevel events directory tracefs: Still use mount point as default permissions for instances tracefs: Reset permissions on remount if permissions are options eventfs: Free all of the eventfs_inode after RCU eventfs/tracing: Add callback for release of an eventfs_inode
2024-05-05	Merge tag 'dma-mapping-6.9-2024-05-04' of ↵	Linus Torvalds	1	-0/+1
	git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fix from Christoph Hellwig: - fix the combination of restricted pools and dynamic swiotlb (Will Deacon) * tag 'dma-mapping-6.9-2024-05-04' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: initialise restricted pool list_head when SWIOTLB_DYNAMIC=y
2024-05-05	Merge tag 'clk-fixes-for-linus' of ↵	Linus Torvalds	7	-8/+60
	git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A handful of clk driver fixes: - Avoid a deadlock in the Qualcomm clk driver by making the regulator which supplies the GDSC optional - Restore RPM clks on Qualcomm msm8976 by setting num_clks - Fix Allwinner H6 CPU rate changing logic to avoid system crashes by temporarily reparenting the CPU clk to something that isn't being changed - Set a MIPI PLL min/max rate on Allwinner A64 to fix blank screens on some devices - Revert back to of_match_device() in the Samsung clkout driver to get the match data based on the parent device's compatible string" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: samsung: Revert "clk: Use device_get_match_data()" clk: sunxi-ng: a64: Set minimum and maximum rate for PLL-MIPI clk: sunxi-ng: common: Support minimum and maximum rate clk: sunxi-ng: h6: Reparent CPUX during PLL CPUX rate change clk: qcom: smd-rpm: Restore msm8976 num_clk clk: qcom: gdsc: treat optional supplies as optional
2024-05-04	eventfs: Have "events" directory get permissions from its parent	Steven Rostedt (Google)	1	-6/+24
	The events directory gets its permissions from the root inode. But this can cause an inconsistency if the instances directory changes its permissions, as the permissions of the created directories under it should inherit the permissions of the instances directory when directories under it are created. Currently the behavior is: # cd /sys/kernel/tracing # chgrp 1002 instances # mkdir instances/foo # ls -l instances/foo [..] -r--r----- 1 root lkp 0 May 1 18:55 buffer_total_size_kb -rw-r----- 1 root lkp 0 May 1 18:55 current_tracer -rw-r----- 1 root lkp 0 May 1 18:55 error_log drwxr-xr-x 1 root root 0 May 1 18:55 events --w------- 1 root lkp 0 May 1 18:55 free_buffer drwxr-x--- 2 root lkp 0 May 1 18:55 options drwxr-x--- 10 root lkp 0 May 1 18:55 per_cpu -rw-r----- 1 root lkp 0 May 1 18:55 set_event All the files and directories under "foo" has the "lkp" group except the "events" directory. That's because its getting its default value from the mount point instead of its parent. Have the "events" directory make its default value based on its parent's permissions. That now gives: # ls -l instances/foo [..] -rw-r----- 1 root lkp 0 May 1 21:16 buffer_subbuf_size_kb -r--r----- 1 root lkp 0 May 1 21:16 buffer_total_size_kb -rw-r----- 1 root lkp 0 May 1 21:16 current_tracer -rw-r----- 1 root lkp 0 May 1 21:16 error_log drwxr-xr-x 1 root lkp 0 May 1 21:16 events --w------- 1 root lkp 0 May 1 21:16 free_buffer drwxr-x--- 2 root lkp 0 May 1 21:16 options drwxr-x--- 10 root lkp 0 May 1 21:16 per_cpu -rw-r----- 1 root lkp 0 May 1 21:16 set_event Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04	eventfs: Do not treat events directory different than other directories	Steven Rostedt (Google)	1	-15/+1
	Treat the events directory the same as other directories when it comes to permissions. The events directory was considered different because it's dentry is persistent, whereas the other directory dentries are created when accessed. But the way tracefs now does its ownership by using the root dentry's permissions as the default permissions, the events directory can get out of sync when a remount is performed setting the group and user permissions. Remove the special case for the events directory on setting the attributes. This allows the updates caused by remount to work properly as well as simplifies the code. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04	eventfs: Do not differentiate the toplevel events directory	Steven Rostedt (Google)	2	-25/+11
	The toplevel events directory is really no different than the events directory of instances. Having the two be different caused inconsistencies and made it harder to fix the permissions bugs. Make all events directories act the same. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04	tracefs: Still use mount point as default permissions for instances	Steven Rostedt (Google)	1	-2/+25
	If the instances directory's permissions were never change, then have it and its children use the mount point permissions as the default. Currently, the permissions of instance directories are determined by the instance directory's permissions itself. But if the tracefs file system is remounted and changes the permissions, the instance directory and its children should use the new permission. But because both the instance directory and its children use the instance directory's inode for permissions, it misses the update. To demonstrate this: # cd /sys/kernel/tracing/ # mkdir instances/foo # ls -ld instances/foo drwxr-x--- 5 root root 0 May 1 19:07 instances/foo # ls -ld instances drwxr-x--- 3 root root 0 May 1 18:57 instances # ls -ld current_tracer -rw-r----- 1 root root 0 May 1 18:57 current_tracer # mount -o remount,gid=1002 . # ls -ld instances drwxr-x--- 3 root root 0 May 1 18:57 instances # ls -ld instances/foo/ drwxr-x--- 5 root root 0 May 1 19:07 instances/foo/ # ls -ld current_tracer -rw-r----- 1 root lkp 0 May 1 18:57 current_tracer Notice that changing the group id to that of "lkp" did not affect the instances directory nor its children. It should have been: # ls -ld current_tracer -rw-r----- 1 root root 0 May 1 19:19 current_tracer # ls -ld instances/foo/ drwxr-x--- 5 root root 0 May 1 19:25 instances/foo/ # ls -ld instances drwxr-x--- 3 root root 0 May 1 19:19 instances # mount -o remount,gid=1002 . # ls -ld current_tracer -rw-r----- 1 root lkp 0 May 1 19:19 current_tracer # ls -ld instances drwxr-x--- 3 root lkp 0 May 1 19:19 instances # ls -ld instances/foo/ drwxr-x--- 5 root lkp 0 May 1 19:25 instances/foo/ Where all files were updated by the remount gid update. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04	tracefs: Reset permissions on remount if permissions are options	Steven Rostedt (Google)	3	-2/+99
	There's an inconsistency with the way permissions are handled in tracefs. Because the permissions are generated when accessed, they default to the root inode's permission if they were never set by the user. If the user sets the permissions, then a flag is set and the permissions are saved via the inode (for tracefs files) or an internal attribute field (for eventfs). But if a remount happens that specify the permissions, all the files that were not changed by the user gets updated, but the ones that were are not. If the user were to remount the file system with a given permission, then all files and directories within that file system should be updated. This can cause security issues if a file's permission was updated but the admin forgot about it. They could incorrectly think that remounting with permissions set would update all files, but miss some. For example: # cd /sys/kernel/tracing # chgrp 1002 current_tracer # ls -l [..] -rw-r----- 1 root root 0 May 1 21:25 buffer_size_kb -rw-r----- 1 root root 0 May 1 21:25 buffer_subbuf_size_kb -r--r----- 1 root root 0 May 1 21:25 buffer_total_size_kb -rw-r----- 1 root lkp 0 May 1 21:25 current_tracer -rw-r----- 1 root root 0 May 1 21:25 dynamic_events -r--r----- 1 root root 0 May 1 21:25 dyn_ftrace_total_info -r--r----- 1 root root 0 May 1 21:25 enabled_functions Where current_tracer now has group "lkp". # mount -o remount,gid=1001 . # ls -l -rw-r----- 1 root tracing 0 May 1 21:25 buffer_size_kb -rw-r----- 1 root tracing 0 May 1 21:25 buffer_subbuf_size_kb -r--r----- 1 root tracing 0 May 1 21:25 buffer_total_size_kb -rw-r----- 1 root lkp 0 May 1 21:25 current_tracer -rw-r----- 1 root tracing 0 May 1 21:25 dynamic_events -r--r----- 1 root tracing 0 May 1 21:25 dyn_ftrace_total_info -r--r----- 1 root tracing 0 May 1 21:25 enabled_functions Everything changed but the "current_tracer". Add a new link list that keeps track of all the tracefs_inodes which has the permission flags that tell if the file/dir should use the root inode's permission or not. Then on remount, clear all the flags so that the default behavior of using the root inode's permission is done for all files and directories. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04	eventfs: Free all of the eventfs_inode after RCU	Steven Rostedt (Google)	1	-9/+16
	The freeing of eventfs_inode via a kfree_rcu() callback. But the content of the eventfs_inode was being freed after the last kref. This is dangerous, as changes are being made that can access the content of an eventfs_inode from an RCU loop. Instead of using kfree_rcu() use call_rcu() that calls a function to do all the freeing of the eventfs_inode after a RCU grace period has expired. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 43aa6f97c2d03 ("eventfs: Get rid of dentry pointers without refcounts") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04	eventfs/tracing: Add callback for release of an eventfs_inode	Steven Rostedt (Google)	3	-2/+36
	Synthetic events create and destroy tracefs files when they are created and removed. The tracing subsystem has its own file descriptor representing the state of the events attached to the tracefs files. There's a race between the eventfs files and this file descriptor of the tracing system where the following can cause an issue: With two scripts 'A' and 'B' doing: Script 'A': echo "hello int aaa" > /sys/kernel/tracing/synthetic_events while : do echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable done Script 'B': echo > /sys/kernel/tracing/synthetic_events Script 'A' creates a synthetic event "hello" and then just writes zero into its enable file. Script 'B' removes all synthetic events (including the newly created "hello" event). What happens is that the opening of the "enable" file has: { struct trace_event_file *file = inode->i_private; int ret; ret = tracing_check_open_get_tr(file->tr); [..] But deleting the events frees the "file" descriptor, and a "use after free" happens with the dereference at "file->tr". The file descriptor does have a reference counter, but there needs to be a way to decrement it from the eventfs when the eventfs_inode is removed that represents this file descriptor. Add an optional "release" callback to the eventfs_entry array structure, that gets called when the eventfs file is about to be removed. This allows for the creating on the eventfs file to increment the tracing file descriptor ref counter. When the eventfs file is deleted, it can call the release function that will call the put function for the tracing file descriptor. This will protect the tracing file from being freed while a eventfs file that references it is being opened. Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Tze-nan wu <[email protected]> Tested-by: Tze-nan Wu (吳澤南) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-03	Merge tag 'cxl-fixes-6.9-rc7' of ↵	Linus Torvalds	2	-1/+21
	git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fix from Dave Jiang: "Add missing RCH support for endpoint access_coordinate calculation. A late bug was reported by Robert Richter that the Restricted CXL Host (RCH) support was missing in the CXL endpoint access_coordinate calculation. The missing support causes the topology iterator to stumble over a NULL pointer and triggers a kernel OOPS on a platform with CXL 1.1 support. The fix bypasses RCH topology as the access_coordinate calculation is not necessary since RCH does not support hotplug and the memory region exported should be covered by the HMAT table already. A unit test is also added to cxl_test to check against future regressions on the topology iterator" * tag 'cxl-fixes-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCH
2024-05-03	KVM: fix documentation for KVM_CREATE_GUEST_MEMFD	Carlos López	1	-1/+1
	The KVM_CREATE_GUEST_MEMFD ioctl returns a file descriptor, and is documented as such in the description. However, the "Returns" field in the documentation states that the ioctl returns 0 on success. Update this to match the description. Signed-off-by: Carlos López <[email protected]> Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-05-03	Merge tag 'for-linus-6.9a-rc7-tag' of ↵	Linus Torvalds	2	-3/+12
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two fixes when running as Xen PV guests for issues introduced in the 6.9 merge window, both related to apic id handling" * tag 'for-linus-6.9a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: return a sane initial apic id when running as PV guest x86/xen/smp_pv: Register the boot CPU APIC properly
2024-05-03	Merge tag 'efi-urgent-for-v6.9-1' of ↵	Linus Torvalds	1	-0/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: "This works around a shortcoming in the memory acceptation API, which may apparently hog the CPU for long enough to trigger the softlockup watchdog. Note that this only affects confidential VMs running under the Intel TDX hypervisor, which is why I accepted this for now, but this should obviously be fixed properly in the future" * tag 'efi-urgent-for-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/unaccepted: touch soft lockup during memory accept
2024-05-03	Merge tag 'block-6.9-20240503' of git://git.kernel.dk/linux	Linus Torvalds	11	-39/+67
	Pull block fixes from Jens Axboe: "Nothing major in here - an nvme pull request with mostly auth/tcp fixes, and a single fix for ublk not setting segment count and size limits" * tag 'block-6.9-20240503' of git://git.kernel.dk/linux: nvme-tcp: strict pdu pacing to avoid send stalls on TLS nvmet: fix nvme status code when namespace is disabled nvmet-tcp: fix possible memory leak when tearing down a controller nvme: cancel pending I/O if nvme controller is in terminal state nvmet-auth: replace pr_debug() with pr_err() to report an error. nvmet-auth: return the error code to the nvmet_auth_host_hash() callers nvme: find numa distance only if controller has valid numa id ublk: remove segment count and size limits nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
2024-05-03	Merge tag 'sound-6.9-rc7' of ↵	Linus Torvalds	44	-241/+704
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "As usual in a late stage, we received a fair amount of fixes for ASoC, and it became bigger than wished. But all fixes are rather device- specific, and they look pretty safe to apply. A major par of changes are series of fixes for ASoC meson and SOF drivers as well as for Realtek and Cirrus codecs. In addition, recent emu10k1 regression fixes and usual HD-audio quirks are included" * tag 'sound-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (46 commits) ALSA: hda/realtek: Fix build error without CONFIG_PM ALSA: hda/realtek: Fix conflicting PCI SSID 17aa:386f for Lenovo Legion models ALSA: hda/realtek - Set GPIO3 to default at S4 state for Thinkpad with ALC1318 ALSA: hda: intel-sdw-acpi: fix usage of device_get_named_child_node() ALSA: hda: intel-dsp-config: harden I2C/I2S codec detection ASoC: cs35l56: fix usages of device_get_named_child_node() ASoC: da7219-aad: fix usage of device_get_named_child_node() ASoC: meson: cards: select SND_DYNAMIC_MINORS ASoC: meson: axg-tdm: add continuous clock support ASoC: meson: axg-tdm-interface: manage formatters in trigger ASoC: meson: axg-card: make links nonatomic ASoC: meson: axg-fifo: use threaded irq to check periods ALSA: hda/realtek: Fix mute led of HP Laptop 15-da3001TU ALSA: emu10k1: make E-MU FPGA writes potentially more reliable ALSA: emu10k1: fix E-MU dock initialization ALSA: emu10k1: use mutex for E-MU FPGA access locking ALSA: emu10k1: move the whole GPIO event handling to the workqueue ALSA: emu10k1: factor out snd_emu1010_load_dock_firmware() ALSA: emu10k1: fix E-MU card dock presence monitoring ASoC: rt715-sdca: volume step modification ...
2024-05-03	Merge tag 'drm-fixes-2024-05-03' of https://gitlab.freedesktop.org/drm/kernel	Linus Torvalds	26	-101/+223
	Pull drm fixes from Dave Airlie: "Weekly fixes, mostly made up from amdgpu and some panel changes. Otherwise xe, nouveau, vmwgfx and a couple of others, all seems pretty on track. amdgpu: - Fix VRAM memory accounting - DCN 3.1 fixes - DCN 2.0 fix - DCN 3.1.5 fix - DCN 3.5 fix - DCN 3.2.1 fix - DP fixes - Seamless boot fix - Fix call order in amdgpu_ttm_move() - Fix doorbell regression - Disable panel replay temporarily amdkfd: - Flush wq before creating kfd process xe: - Fix UAF on rebind worker - Fix ADL-N display integration imagination: - fix page-count macro nouveau: - avoid page-table allocation failures - fix firmware memory allocation panel: - ili9341: avoid OF for device properties; respect deferred probe; fix usage of errno codes ttm: - fix status output vmwgfx: - fix legacy display unit - fix read length in fence signalling" * tag 'drm-fixes-2024-05-03' of https://gitlab.freedesktop.org/drm/kernel: (25 commits) drm/xe/display: Fix ADL-N detection drm/panel: ili9341: Use predefined error codes drm/panel: ili9341: Respect deferred probe drm/panel: ili9341: Correct use of device property APIs drm/xe/vm: prevent UAF in rebind_work_func() drm/amd/display: Disable panel replay by default for now drm/amdgpu: fix doorbell regression drm/amdkfd: Flush the process wq before creating a kfd_process drm/amd/display: Disable seamless boot on 128b/132b encoding drm/amd/display: Fix DC mode screen flickering on DCN321 drm/amd/display: Add VCO speed parameter for DCN31 FPU drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2 drm/amd/display: Allocate zero bw after bw alloc enable drm/amd/display: Fix incorrect DSC instance for MST drm/amd/display: Atom Integrated System Info v2_2 for DCN35 drm/amd/display: Add dtbclk access to dcn315 drm/amd/display: Ensure that dmcub support flag is set for DCN20 drm/amd/display: Handle Y carry-over in VCP X.Y calculation drm/amdgpu: Fix VRAM memory accounting drm/vmwgfx: Fix invalid reads in fence signaled events ...
2024-05-03	Merge tag 'spi-fix-v6.9-rc6' of ↵	Linus Torvalds	3	-3/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few small fixes for v6.9, The core fix is for issues with reuse of a spi_message in the case where we've got queued messages (a relatively rare occurrence with modern code so it wasn't noticed in testing). We also avoid an issue with the Kunpeng driver by simply removing the debug interface that could trigger it, and address issues with confusing and corrupted output when printing the IP version of the AXI SPI engine" * tag 'spi-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: fix null pointer dereference within spi_sync spi: hisi-kunpeng: Delete the dump interface of data registers in debugfs spi: axi-spi-engine: fix version format string
2024-05-03	Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/next	Marc Zyngier	33	-352/+521
	* kvm-arm64/pkvm-6.10: (25 commits) : . : At last, a bunch of pKVM patches, courtesy of Fuad Tabba. : From the cover letter: : : "This series is a bit of a bombay-mix of patches we've been : carrying. There's no one overarching theme, but they do improve : the code by fixing existing bugs in pKVM, refactoring code to : make it more readable and easier to re-use for pKVM, or adding : functionality to the existing pKVM code upstream." : . KVM: arm64: Force injection of a data abort on NISV MMIO exit KVM: arm64: Restrict supported capabilities for protected VMs KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap() KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst KVM: arm64: Rename firmware pseudo-register documentation file KVM: arm64: Reformat/beautify PTP hypercall documentation KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit KVM: arm64: Introduce and use predicates that check for protected VMs KVM: arm64: Add is_pkvm_initialized() helper KVM: arm64: Simplify vgic-v3 hypercalls KVM: arm64: Move setting the page as dirty out of the critical section KVM: arm64: Change kvm_handle_mmio_return() return polarity KVM: arm64: Fix comment for __pkvm_vcpu_init_traps() KVM: arm64: Prevent kmemleak from accessing .hyp.data KVM: arm64: Do not map the host fpsimd state to hyp in pKVM KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE KVM: arm64: Support TLB invalidation in guest context KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE KVM: arm64: Check for PTE validity when checking for executable/cacheable KVM: arm64: Avoid BUG-ing from the host abort path ... Signed-off-by: Marc Zyngier <[email protected]>
2024-05-03	Merge branch kvm-arm64/lpi-xa-cache into kvmarm-master/next	Marc Zyngier	26	-418/+1572
	* kvm-arm64/lpi-xa-cache: : . : New and improved LPI translation cache from Oliver Upton. : : From the cover letter: : : "As discussed [*], here is the new take on the LPI translation cache, : migrating to an xarray indexed by (devid, eventid) per ITS. : : The end result is quite satisfying, as it becomes possible to rip out : other nasties such as the lpi_list_lock. To that end, patches 2-6 aren't : _directly_ related to the translation cache cleanup, but instead are : done to enable the cleanups at the end of the series. : : I changed out my test machine from the last time so the baseline has : moved a bit, but here are the results from the vgic_lpi_stress test: : : +----------------------------+------------+-------------------+ : \| Configuration \| v6.8-rc1 \| v6.8-rc1 + series \| : +----------------------------+------------+-------------------+ : \| -v 1 -d 1 -e 1 -i 1000000 \| 2063296.81 \| 1362602.35 \| : \| -v 16 -d 16 -e 16 -i 10000 \| 610678.33 \| 5200910.01 \| : \| -v 16 -d 16 -e 17 -i 10000 \| 678361.53 \| 5890675.51 \| : \| -v 32 -d 32 -e 1 -i 100000 \| 580918.96 \| 8304552.67 \| : \| -v 1 -d 1 -e 17 -i 1000 \| 1512443.94 \| 1425953.8 \| : +----------------------------+------------+-------------------+ : : Unlike last time, no dramatic regressions at any performance point. The : regression on a single interrupt stream is to be expected, as the : overheads of SRCU and two tree traversals (kvm_io_bus_get_dev(), : translation cache xarray) are likely greater than that of a linked-list : with a single node." : . KVM: selftests: Add stress test for LPI injection KVM: selftests: Use MPIDR_HWID_BITMASK from cputype.h KVM: selftests: Add helper for enabling LPIs on a redistributor KVM: selftests: Add a minimal library for interacting with an ITS KVM: selftests: Add quadword MMIO accessors KVM: selftests: Standardise layout of GIC frames KVM: selftests: Align with kernel's GIC definitions KVM: arm64: vgic-its: Get rid of the lpi_list_lock KVM: arm64: vgic-its: Rip out the global translation cache KVM: arm64: vgic-its: Use the per-ITS translation cache for injection KVM: arm64: vgic-its: Spin off helper for finding ITS by doorbell addr KVM: arm64: vgic-its: Maintain a translation cache per ITS KVM: arm64: vgic-its: Scope translation cache invalidations to an ITS KVM: arm64: vgic-its: Get rid of vgic_copy_lpi_list() KVM: arm64: vgic-debug: Use an xarray mark for debug iterator KVM: arm64: vgic-its: Walk LPI xarray in vgic_its_cmd_handle_movall() KVM: arm64: vgic-its: Walk LPI xarray in vgic_its_invall() KVM: arm64: vgic-its: Walk LPI xarray in its_sync_lpi_pending_table() KVM: Treat the device list as an rculist Signed-off-by: Marc Zyngier <[email protected]>
2024-05-03	Merge branch kvm-arm64/nv-eret-pauth into kvmarm-master/next	Marc Zyngier	15	-121/+524
	* kvm-arm64/nv-eret-pauth: : . : Add NV support for the ERETAA/ERETAB instructions. From the cover letter: : : "Although the current upstream NV support has some support for : correctly emulating ERET, that support is only partial as it doesn't : support the ERETAA and ERETAB variants. : : Supporting these instructions was cast aside for a long time as it : involves implementing some form of PAuth emulation, something I wasn't : overly keen on. But I have reached a point where enough of the : infrastructure is there that it actually makes sense. So here it is!" : . KVM: arm64: nv: Work around lack of pauth support in old toolchains KVM: arm64: Drop trapping of PAuth instructions/keys KVM: arm64: nv: Advertise support for PAuth KVM: arm64: nv: Handle ERETA[AB] instructions KVM: arm64: nv: Add emulation for ERETAx instructions KVM: arm64: nv: Add kvm_has_pauth() helper KVM: arm64: nv: Reinject PAC exceptions caused by HCR_EL2.API==0 KVM: arm64: nv: Handle HCR_EL2.{API,APK} independently KVM: arm64: nv: Honor HFGITR_EL2.ERET being set KVM: arm64: nv: Fast-track 'InHost' exception returns KVM: arm64: nv: Add trap forwarding for ERET and SMC KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2 KVM: arm64: nv: Drop VCPU_HYP_CONTEXT flag KVM: arm64: Constraint PAuth support to consistent implementations KVM: arm64: Add helpers for ESR_ELx_ERET_ISS_ERET* KVM: arm64: Harden __ctxt_sys_reg() against out-of-range values Signed-off-by: Marc Zyngier <[email protected]>
2024-05-03	Merge branch kvm-arm64/host_data into kvmarm-master/next	Marc Zyngier	14	-75/+106
	* kvm-arm64/host_data: : . : Rationalise the host-specific data to live as part of the per-CPU state. : : From the cover letter: : : "It appears that over the years, we have accumulated a lot of cruft in : the kvm_vcpu_arch structure. Part of the gunk is data that is strictly : host CPU specific, and this result in two main problems: : : - the structure itself is stupidly large, over 8kB. With the : arch-agnostic kvm_vcpu, we're above 10kB, which is insane. This has : some ripple effects, as we need physically contiguous allocation to : be able to map it at EL2 for !VHE. There is more to it though, as : some data structures, although per-vcpu, could be allocated : separately. : : - We lose track of the life-cycle of this data, because we're : guaranteed that it will be around forever and we start relying on : wrong assumptions. This is becoming a maintenance burden. : : This series rectifies some of these things, starting with the two main : offenders: debug and FP, a lot of which gets pushed out to the per-CPU : host structure. Indeed, their lifetime really isn't that of the vcpu, : but tied to the physical CPU the vpcu runs on. : : This results in a small reduction of the vcpu size, but mainly a much : clearer understanding of the life-cycle of these structures." : . KVM: arm64: Move management of __hyp_running_vcpu to load/put on VHE KVM: arm64: Exclude FP ownership from kvm_vcpu_arch KVM: arm64: Exclude host_fpsimd_state pointer from kvm_vcpu_arch KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_arch KVM: arm64: Exclude host_debug_data from vcpu_arch KVM: arm64: Add accessor for per-CPU state Signed-off-by: Marc Zyngier <[email protected]>
2024-05-03	KVM: arm64: Move management of __hyp_running_vcpu to load/put on VHE	Marc Zyngier	1	-1/+4
	The per-CPU host context structure contains a __hyp_running_vcpu that serves as a replacement for kvm_get_current_vcpu() in contexts where we cannot make direct use of it (such as in the nVHE hypervisor). Since there is a lot of common code between nVHE and VHE, the latter also populates this field even if kvm_get_running_vcpu() always works. We currently pretty inconsistent when populating __hyp_running_vcpu to point to the currently running vcpu: - on {n,h}VHE, we set __hyp_running_vcpu on entry to __kvm_vcpu_run and clear it on exit. - on VHE, we set __hyp_running_vcpu on entry to __kvm_vcpu_run_vhe and never clear it, effectively leaving a dangling pointer... VHE is obviously the odd one here. Although we could make it behave just like nVHE, this wouldn't match the behaviour of KVM with VHE, where the load phase is where most of the context-switch gets done. So move all the __hyp_running_vcpu management to the VHE-specific load/put phases, giving us a bit more sanity and matching the behaviour of kvm_get_running_vcpu(). Reviewed-by: Oliver Upton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-05-03	KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather()	Marc Zyngier	1	-13/+3
	Linux 6.9 has introduced new bitmap manipulation helpers, with bitmap_gather() being of special interest, as it does exactly what kvm_mpidr_index() is already doing. Make the latter a wrapper around the former. Reviewed-by: Oliver Upton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>