Age | Commit message (Collapse) | Author | Files | Lines |
|
Hyperv exposes GHCB page via SEV ES GHCB MSR for SNP guest
to communicate with hypervisor. Map GHCB page for all
cpus to read/write MSR register and submit hvcall request
via ghcb page.
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Tianyu Lan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Replace uuid.h with types.h in a header (Andy Shevchenko)
- Avoid sleeping in atomic context in PCI driver (Long Li)
- Avoid sending IPI to self when it shouldn't (Vitaly Kuznetsov)
* tag 'hyperv-fixes-signed-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Avoid erroneously sending IPI to 'self'
hyper-v: Replace uuid.h with types.h
PCI: hv: Fix sleep while in non-sleep context when removing child devices from the bus
|
|
__send_ipi_mask_ex() uses an optimization: when the target CPU mask is
equal to 'cpu_present_mask' it uses 'HV_GENERIC_SET_ALL' format to avoid
converting the specified cpumask to VP_SET. This case was overlooked when
'exclude_self' parameter was added. As the result, a spurious IPI to
'self' can be send.
Reported-by: Thomas Gleixner <[email protected]>
Fixes: dfb5c1e12c28 ("x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix kernel crash caused by uio driver (Vitaly Kuznetsov)
- Remove on-stack cpumask from HV APIC code (Wei Liu)
* tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself
asm-generic/hyperv: provide cpumask_to_vpset_noself
Drivers: hv: vmbus: Fix kernel crash upon unbinding a device from uio_hv_generic driver
|
|
It is not a good practice to allocate a cpumask on stack, given it may
consume up to 1 kilobytes of stack space if the kernel is configured to
have 8192 cpus.
The internal helper functions __send_ipi_mask{,_ex} need to loop over
the provided mask anyway, so it is not too difficult to skip `self'
there. We can thus do away with the on-stack cpumask in
hv_send_ipi_mask_allbutself.
Adjust call sites of __send_ipi_mask as needed.
Reported-by: Linus Torvalds <[email protected]>
Suggested-by: Michael Kelley <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Fixes: 68bb7bfb7985d ("X86/Hyper-V: Enable IPI enlightenments")
Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
For root partition the VP assist pages are pre-determined by the
hypervisor. The root kernel is not allowed to change them to
different locations. And thus, we are getting below stack as in
current implementation root is trying to perform write to specific
MSR.
[ 2.778197] unchecked MSR access error: WRMSR to 0x40000073 (tried to write 0x0000000145ac5001) at rIP: 0xffffffff810c1084 (native_write_msr+0x4/0x30)
[ 2.784867] Call Trace:
[ 2.791507] hv_cpu_init+0xf1/0x1c0
[ 2.798144] ? hyperv_report_panic+0xd0/0xd0
[ 2.804806] cpuhp_invoke_callback+0x11a/0x440
[ 2.811465] ? hv_resume+0x90/0x90
[ 2.818137] cpuhp_issue_call+0x126/0x130
[ 2.824782] __cpuhp_setup_state_cpuslocked+0x102/0x2b0
[ 2.831427] ? hyperv_report_panic+0xd0/0xd0
[ 2.838075] ? hyperv_report_panic+0xd0/0xd0
[ 2.844723] ? hv_resume+0x90/0x90
[ 2.851375] __cpuhp_setup_state+0x3d/0x90
[ 2.858030] hyperv_init+0x14e/0x410
[ 2.864689] ? enable_IR_x2apic+0x190/0x1a0
[ 2.871349] apic_intr_mode_init+0x8b/0x100
[ 2.878017] x86_late_time_init+0x20/0x30
[ 2.884675] start_kernel+0x459/0x4fb
[ 2.891329] secondary_startup_64_no_verify+0xb0/0xbb
Since the hypervisor already provides the VP assist pages for root
partition, we need to memremap the memory from hypervisor for root
kernel to use. The mapping is done in hv_cpu_init during bringup and is
unmapped in hv_cpu_die during teardown.
Signed-off-by: Praveen Kumar <[email protected]>
Reviewed-by: Sunil Muthuswamy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The check for whether hibernation is possible, and the enabling of
Hyper-V panic notification during kexec, are both architecture neutral.
Move the code from under arch/x86 and into drivers/hv/hv_common.c where
it can also be used for ARM64.
No functional change.
Signed-off-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Architecture independent Hyper-V code calls various arch-specific handlers
when needed. To aid in supporting multiple architectures, provide weak
defaults that can be overridden by arch-specific implementations where
appropriate. But when arch-specific overrides aren't needed or haven't
been implemented yet for a particular architecture, these stubs reduce
the amount of clutter under arch/.
No functional change.
Signed-off-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The code to allocate and initialize the hv_vp_index array is
architecture neutral. Similarly, the code to allocate and
populate the hypercall input and output arg pages is architecture
neutral. Move both sets of code out from arch/x86 and into
utility functions in drivers/hv/hv_common.c that can be shared
by Hyper-V initialization on ARM64.
No functional changes. However, the allocation of the hypercall
input and output arg pages is done differently so that the
size is always the Hyper-V page size, even if not the same as
the guest page size (such as with ARM64's 64K page size).
Signed-off-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The extended capability query code is currently under arch/x86, but it
is architecture neutral, and is used by arch neutral code in the Hyper-V
balloon driver. Hence the balloon driver fails to build on other
architectures.
Fix by moving the ext cap code out from arch/x86. Because it is also
called from built-in architecture specific code, it can't be in a module,
so the Makefile treats as built-in even when CONFIG_HYPERV is "m". Also
drivers/Makefile is tweaked because this is the first occurrence of a
Hyper-V file that is built-in even when CONFIG_HYPERV is "m".
While here, update the hypercall status check to use the new helper
function instead of open coding. No functional change.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Sunil Muthuswamy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tlb updates from Ingo Molnar:
"The x86 MM changes in this cycle were:
- Implement concurrent TLB flushes, which overlaps the local TLB
flush with the remote TLB flush.
In testing this improved sysbench performance measurably by a
couple of percentage points, especially if TLB-heavy security
mitigations are active.
- Further micro-optimizations to improve the performance of TLB
flushes"
* tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Micro-optimize smp_call_function_many_cond()
smp: Inline on_each_cpu_cond() and on_each_cpu()
x86/mm/tlb: Remove unnecessary uses of the inline keyword
cpumask: Mark functions as pure
x86/mm/tlb: Do not make is_lazy dirty for no reason
x86/mm/tlb: Privatize cpu_tlbstate
x86/mm/tlb: Flush remote and local TLBs concurrently
x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
smp: Run functions concurrently in smp_call_function_many_cond()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Wei Liu:
- VMBus enhancement
- Free page reporting support for Hyper-V balloon driver
- Some patches for running Linux as Arm64 Hyper-V guest
- A few misc clean-up patches
* tag 'hyperv-next-signed-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (30 commits)
drivers: hv: Create a consistent pattern for checking Hyper-V hypercall status
x86/hyperv: Move hv_do_rep_hypercall to asm-generic
video: hyperv_fb: Add ratelimit on error message
Drivers: hv: vmbus: Increase wait time for VMbus unload
Drivers: hv: vmbus: Initialize unload_event statically
Drivers: hv: vmbus: Check for pending channel interrupts before taking a CPU offline
Drivers: hv: vmbus: Drivers: hv: vmbus: Introduce CHANNELMSG_MODIFYCHANNEL_RESPONSE
Drivers: hv: vmbus: Introduce and negotiate VMBus protocol version 5.3
Drivers: hv: vmbus: Use after free in __vmbus_open()
Drivers: hv: vmbus: remove unused function
Drivers: hv: vmbus: Remove unused linux/version.h header
x86/hyperv: remove unused linux/version.h header
x86/Hyper-V: Support for free page reporting
x86/hyperv: Fix unused variable 'hi' warning in hv_apic_read
x86/hyperv: Fix unused variable 'msr_val' warning in hv_qlock_wait
hv: hyperv.h: a few mundane typo fixes
drivers: hv: Fix EXPORT_SYMBOL and tab spaces issue
Drivers: hv: vmbus: Drop error message when 'No request id available'
asm-generic/hyperv: Add missing function prototypes per -W1 warnings
clocksource/drivers/hyper-v: Move handling of STIMER0 interrupts
...
|
|
There is not a consistent pattern for checking Hyper-V hypercall status.
Existing code uses a number of variants. The variants work, but a consistent
pattern would improve the readability of the code, and be more conformant
to what the Hyper-V TLFS says about hypercall status.
Implemented new helper functions hv_result(), hv_result_success(), and
hv_repcomp(). Changed the places where hv_do_hypercall() and related variants
are used to use the helper functions.
Signed-off-by: Joseph Salisbury <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/1618620183-9967-2-git-send-email-joseph.salisbury@linux.microsoft.com
Signed-off-by: Wei Liu <[email protected]>
|
|
That header is not needed in hv_proc.c.
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Yongjun Zheng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Linux has support for free page reporting now (36e66c554b5c) for
virtualized environment. On Hyper-V when virtually backed VMs are
configured, Hyper-V will advertise cold memory discard capability,
when supported. This patch adds the support to hook into the free
page reporting infrastructure and leverage the Hyper-V cold memory
discard hint hypercall to report/free these pages back to the host.
Signed-off-by: Sunil Muthuswamy <[email protected]>
Tested-by: Matheus Castello <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/SN4PR2101MB0880121FA4E2FEC67F35C1DCC0649@SN4PR2101MB0880.namprd21.prod.outlook.com
Signed-off-by: Wei Liu <[email protected]>
|
|
Fixes the following W=1 kernel build warning(s):
arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used [-Wunused-but-set-variable]
Compiled with CONFIG_HYPERV enabled:
make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-
make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-
HV_X64_MSR_EOI occupies bit 31:0 and HV_X64_MSR_TPR occupies bit 7:0,
which means the higher 32 bits are not really used. Cast the variable hi
to void to silence this warning.
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Xu Yihang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Fixes the following W=1 kernel build warning(s):
arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable ‘msr_val’ set but not used [-Wunused-but-set-variable]
unsigned long msr_val;
As Hypervisor Top-Level Functional Specification states in chapter 7.5
Virtual Processor Idle Sleep State, "A partition which possesses the
AccessGuestIdleMsr privilege (refer to section 4.2.2) may trigger entry
into the virtual processor idle sleep state through a read to the
hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE".
That means only a read of the MSR is necessary. The returned value
msr_val is not used. Cast it to void to silence this warning.
Reference:
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Xu Yihang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Fix ~144 single-word typos in arch/x86/ code comments.
Doing this in a single commit should reduce the churn.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: [email protected]
|
|
STIMER0 interrupts are most naturally modeled as per-cpu IRQs. But
because x86/x64 doesn't have per-cpu IRQs, the core STIMER0 interrupt
handling machinery is done in code under arch/x86 and Linux IRQs are
not used. Adding support for ARM64 means adding equivalent code
using per-cpu IRQs under arch/arm64.
A better model is to treat per-cpu IRQs as the normal path (which it is
for modern architectures), and the x86/x64 path as the exception. Do this
by incorporating standard Linux per-cpu IRQ allocation into the main
SITMER0 driver code, and bypass it in the x86/x64 exception case. For
x86/x64, special case code is retained under arch/x86, but no STIMER0
interrupt handling code is needed under arch/arm64.
No functional change.
Signed-off-by: Michael Kelley <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
With the new Hyper-V MSR set function, hyperv_report_panic_msg() can be
architecture neutral, so move it out from under arch/x86 and merge into
hv_kmsg_dump(). This move also avoids needing a separate implementation
under arch/arm64.
No functional change.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Current code defines a separate get and set macro for each Hyper-V
synthetic MSR used by the VMbus driver. Furthermore, the get macro
can't be converted to a standard function because the second argument
is modified in place, which is somewhat bad form.
Redo this by providing a single get and a single set function that
take a parameter specifying the MSR to be operated on. Fixup usage
of the get function. Calling locations are no more complex than before,
but the code under arch/x86 and the upcoming code under arch/arm64
is significantly simplified.
Also standardize the names of Hyper-V synthetic MSRs that are
architecture neutral. But keep the old x86-specific names as aliases
that can be removed later when all references (particularly in KVM
code) have been cleaned up in a separate patch series.
No functional change.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
The Hyper-V page allocator functions are implemented in an architecture
neutral way. Move them into the architecture neutral VMbus module so
a separate implementation for ARM64 is not needed.
No functional change.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. Introduce
paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
and hyper-v are only compile-tested).
While the updated smp infrastructure is capable of running a function on
a single local core, it is not optimized for this case. The multiple
function calls and the indirect branch introduce some overhead, and
might make local TLB flushes slower than they were before the recent
changes.
Before calling the SMP infrastructure, check if only a local TLB flush
is needed to restore the lost performance in this common case. This
requires to check mm_cpumask() one more time, but unless this mask is
updated very frequently, this should impact performance negatively.
Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Michael Kelley <[email protected]> # Hyper-v parts
Reviewed-by: Juergen Gross <[email protected]> # Xen and paravirt parts
Reviewed-by: Dave Hansen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
Hypervisor when Linux runs as the root partition. Implement an IRQ
domain to handle mapping and unmapping of IO-APIC interrupts.
Signed-off-by: Wei Liu <[email protected]>
Acked-by: Joerg Roedel <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When Linux runs as the root partition on Microsoft Hypervisor, its
interrupts are remapped. Linux will need to explicitly map and unmap
interrupts for hardware.
Implement an MSI domain to issue the correct hypercalls. And initialize
this irq domain as the default MSI irq domain.
Signed-off-by: Sunil Muthuswamy <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
They are used to deposit pages into Microsoft Hypervisor and bring up
logical and virtual processors.
Signed-off-by: Lillian Grassin-Drake <[email protected]>
Signed-off-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Nuno Das Neves <[email protected]>
Co-Developed-by: Lillian Grassin-Drake <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Co-Developed-by: Nuno Das Neves <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When Linux is running as the root partition, the hypercall page will
have already been setup by Hyper-V. Copy the content over to the
allocated page.
Add checks to hv_suspend & co to bail early because they are not
supported in this setup yet.
Signed-off-by: Lillian Grassin-Drake <[email protected]>
Signed-off-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Nuno Das Neves <[email protected]>
Co-Developed-by: Lillian Grassin-Drake <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Co-Developed-by: Nuno Das Neves <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
We will need the partition ID for executing some hypercalls later.
Signed-off-by: Lillian Grassin-Drake <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When Linux runs as the root partition, it will need to make hypercalls
which return data from the hypervisor.
Allocate pages for storing results when Linux runs as the root
partition.
Signed-off-by: Lillian Grassin-Drake <[email protected]>
Co-Developed-by: Lillian Grassin-Drake <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
If bit 22 of Group B Features is set, the guest has access to the
Isolation Configuration CPUID leaf. On x86, the first four bits
of EAX in this leaf provide the isolation type of the partition;
we entail three isolation types: 'SNP' (hardware-based isolation),
'VBS' (software-based isolation), and 'NONE' (no isolation).
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
|
|
With commit 4df4cb9e99f8, the Hyper-V direct-mode STIMER is actually
initialized before LAPIC is initialized: see
apic_intr_mode_init()
x86_platform.apic_post_init()
hyperv_init()
hv_stimer_alloc()
apic_bsp_setup()
setup_local_APIC()
setup_local_APIC() temporarily disables LAPIC, initializes it and
re-eanble it. The direct-mode STIMER depends on LAPIC, and when it's
registered, it can be programmed immediately and the timer can fire
very soon:
hv_stimer_init
clockevents_config_and_register
clockevents_register_device
tick_check_new_device
tick_setup_device
tick_setup_periodic(), tick_setup_oneshot()
clockevents_program_event
When the timer fires in the hypervisor, if the LAPIC is in the
disabled state, new versions of Hyper-V ignore the event and don't inject
the timer interrupt into the VM, and hence the VM hangs when it boots.
Note: when the VM starts/reboots, the LAPIC is pre-enabled by the
firmware, so the window of LAPIC being temporarily disabled is pretty
small, and the issue can only happen once out of 100~200 reboots for
a 40-vCPU VM on one dev host, and on another host the issue doesn't
reproduce after 2000 reboots.
The issue is more noticeable for kdump/kexec, because the LAPIC is
disabled by the first kernel, and stays disabled until the kdump/kexec
kernel enables it. This is especially an issue to a Generation-2 VM
(for which Hyper-V doesn't emulate the PIT timer) when CONFIG_HZ=1000
(rather than CONFIG_HZ=250) is used.
Fix the issue by moving hv_stimer_alloc() to a later place where the
LAPIC timer is initialized.
Fixes: 4df4cb9e99f8 ("x86/hyperv: Initialize clockevents earlier in CPU onlining")
Signed-off-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
We've observed crashes due to an empty cpu mask in
hyperv_flush_tlb_others. Obviously the cpu mask in question is changed
between the cpumask_empty call at the beginning of the function and when
it is actually used later.
One theory is that an interrupt comes in between and a code path ends up
changing the mask. Move the check after interrupt has been disabled to
see if it fixes the issue.
Signed-off-by: Wei Liu <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Michael Kelley <[email protected]>
|
|
Currently the kexec kernel can panic or hang due to 2 causes:
1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
old VP Assist Pages when the kexec kernel runs. The same issue is fixed
for hibernation in commit 421f090c819d ("x86/hyperv: Suspend/resume the
VP assist page for hibernation"). Now fix it for kexec.
2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
can still try to access the hypercall page and cause panic. The workaround
"hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliabe. Move
hyperv_cleanup() to a better place.
Signed-off-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- clarify a comment (Michael Kelley)
- change a pr_warn() to pr_info() (Olaf Hering)
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Clarify comment on x2apic mode
hv_balloon: disable warning when floor reached
|
|
The comment about Hyper-V accessors is unclear regarding their
potential use in x2apic mode, as is the associated commit message
in e211288b72f1. Clarify that while the architectural and
synthetic MSRs are equivalent in x2apic mode, the full set of xapic
accessors cannot be used because of register layout differences.
Fixes: e211288b72f1 ("x86/hyperv: Make vapic support x2apic mode")
Signed-off-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b
removed the "X64" in the symbol names so they would make sense for both x86 and
ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h
so that existing x86 code would continue to compile.
As a cleanup, update the x86 code to use the symbols without the "X64", then remove
the aliases. There's no functional change.
Signed-off-by: Joseph Salisbury <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Fix the recently added new __vmalloc_node_range callers to pass the
correct values as the owner for display in /proc/vmallocinfo.
Fixes: 800e26b81311 ("x86/hyperv: allocate the hypercall page with only read and execute bits")
Fixes: 10d5e97c1bf8 ("arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page")
Fixes: 7a0e27b2a0ce ("mm: remove vmalloc_exec")
Reported-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "fix a hyperv W^X violation and remove vmalloc_exec"
Dexuan reported a W^X violation due to the fact that the hyper hypercall
page due switching it to be allocated using vmalloc_exec.
The problem is that PAGE_KERNEL_EXEC as used by vmalloc_exec actually
sets writable permissions in the pte. This series fixes the issue by
switching to the low-level __vmalloc_node_range interface that allows
specifing more detailed permissions instead. It then also open codes
the other two callers and removes the somewhat confusing vmalloc_exec
interface.
Peter noted that the hyper hypercall page allocation also has another
long standing issue in that it shouldn't use the full vmalloc but just
the module space. This issue is so far theoretical as the allocation is
done early in the boot process. I plan to fix it with another bigger
series for 5.9.
This patch (of 3):
Avoid a W^X violation cause by the fact that PAGE_KERNEL_EXEC includes
the writable bit.
For this resurrect the removed PAGE_KERNEL_RX definition, but as
PAGE_KERNEL_ROX to match arm64 and powerpc.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 78bb17f76edc ("x86/hyperv: use vmalloc_exec for the hypercall page")
Signed-off-by: Christoph Hellwig <[email protected]>
Reported-by: Dexuan Cui <[email protected]>
Tested-by: Vitaly Kuznetsov <[email protected]>
Acked-by: Wei Liu <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Convert various hypervisor vectors to IDTENTRY_SYSVEC:
- Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
- Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
- Remove the ASM idtentries in 64-bit
- Remove the BUILD_INTERRUPT entries in 32-bit
- Remove the old prototypes
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Michael Kelley <[email protected]> [hyperv]
Acked-by: Gao Xiang <[email protected]> [erofs]
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Wei Liu <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "decruft the vmalloc API", v2.
Peter noticed that with some dumb luck you can toast the kernel address
space with exported vmalloc symbols.
I used this as an opportunity to decruft the vmalloc.c API and make it
much more systematic. This also removes any chance to create vmalloc
mappings outside the designated areas or using executable permissions
from modules. Besides that it removes more than 300 lines of code.
This patch (of 29):
Use the designated helper for allocating executable kernel memory, and
remove the now unused PAGE_KERNEL_RX define.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Acked-by: Wei Liu <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Errors during hibernation with reenlightenment notifications enabled were
reported:
[ 51.730435] PM: hibernation entry
[ 51.737435] PM: Syncing filesystems ...
...
[ 54.102216] Disabling non-boot CPUs ...
[ 54.106633] smpboot: CPU 1 is now offline
[ 54.110006] unchecked MSR access error: WRMSR to 0x40000106 (tried to
write 0x47c72780000100ee) at rIP: 0xffffffff90062f24
native_write_msr+0x4/0x20)
[ 54.110006] Call Trace:
[ 54.110006] hv_cpu_die+0xd9/0xf0
...
Normally, hv_cpu_die() just reassigns reenlightenment notifications to some
other CPU when the CPU receiving them goes offline. Upon hibernation, there
is no other CPU which is still online so cpumask_any_but(cpu_online_mask)
returns >= nr_cpu_ids and using it as hv_vp_index index is incorrect.
Disable the feature when cpumask_any_but() fails.
Also, as we now disable reenlightenment notifications upon hibernation we
need to restore them on resume. Check if hv_reenlightenment_cb was
previously set and restore from hv_resume().
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
Unlike the other CPUs, CPU0 is never offlined during hibernation, so in the
resume path, the "new" kernel's VP assist page is not suspended (i.e. not
disabled), and later when we jump to the "old" kernel, the page is not
properly re-enabled for CPU0 with the allocated page from the old kernel.
So far, the VP assist page is used by hv_apic_eoi_write(), and is also
used in the case of nested virtualization (running KVM atop Hyper-V).
For hv_apic_eoi_write(), when the page is not properly re-enabled,
hvp->apic_assist is always 0, so the HV_X64_MSR_EOI MSR is always written.
This is not ideal with respect to performance, but Hyper-V can still
correctly handle this according to the Hyper-V spec; nevertheless, Linux
still must update the Hyper-V hypervisor with the correct VP assist page
to prevent Hyper-V from writing to the stale page, which causes guest
memory corruption and consequently may have caused the hangs and triple
faults seen during non-boot CPUs resume.
Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops.
Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500.
With the fix, hibernation can pass a long-haul test of 2000 runs.
In the case of nested virtualization, disabling/reenabling the assist
page upon hibernation may be unsafe if there are active L2 guests.
It looks KVM should be enhanced to abort the hibernation request if
there is any active L2 guest.
Fixes: 05bd330a7fd8 ("x86/hyperv: Suspend/resume the hypercall page for hibernation")
Cc: [email protected]
Signed-off-by: Dexuan Cui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
When oops happens with panic_on_oops unset, the oops
thread is killed by die() and system continues to run.
In such case, guest should not report crash register
data to host since system still runs. Check panic_on_oops
and return directly in hyperv_report_panic() when the function
is called in the die() and panic_on_oops is unset. Fix it.
Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful")
Signed-off-by: Tianyu Lan <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
For hibernation the hypercall page must be disabled before the hibernation
image is created so that subsequent hypercall operations fail safely. On
resume the hypercall page has to be restored and reenabled to ensure proper
operation of the resumed kernel.
Implement the necessary suspend/resume callbacks.
[ tglx: Decrypted changelog ]
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Sasha Levin:
- support for new VMBus protocols (Andrea Parri)
- hibernation support (Dexuan Cui)
- latency testing framework (Branden Bonaby)
- decoupling Hyper-V page size from guest page size (Himadri Pandya)
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic
drivers/hv: Replace binary semaphore with mutex
drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
HID: hyperv: Add the support of hibernation
hv_balloon: Add the support of hibernation
x86/hyperv: Implement hv_is_hibernation_supported()
Drivers: hv: balloon: Remove dependencies on guest page size
Drivers: hv: vmbus: Remove dependencies on guest page size
x86: hv: Add function to allocate zeroed page for Hyper-V
Drivers: hv: util: Specify ring buffer size using Hyper-V page size
Drivers: hv: Specify receive buffer size using Hyper-V page size
tools: hv: add vmbus testing tool
drivers: hv: vmbus: Introduce latency testing
video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
hv_netvsc: Add the support of hibernation
hv_sock: Add the support of hibernation
video: hyperv_fb: Add the support of hibernation
scsi: storvsc: Add the support of hibernation
Drivers: hv: vmbus: Add module parameter to cap the VMBus version
...
|
|
The API will be used by the hv_balloon and hv_vmbus drivers.
Balloon up/down and hot-add of memory must not be active if the user
wants the Linux VM to support hibernation, because they are incompatible
with hibernation according to Hyper-V team, e.g. upon suspend the
balloon VSP doesn't save any info about the ballooned-out pages (if any);
so, after Linux resumes, Linux balloon VSC expects that the VSP will
return the pages if Linux is under memory pressure, but the VSP will
never do that, since the VSP thinks it never stole the pages from the VM.
So, if the user wants Linux VM to support hibernation, Linux must forbid
balloon up/down and hot-add, and the only functionality of the balloon VSC
driver is reporting the VM's memory pressure to the host.
Ideally, when Linux detects that the user wants it to support hibernation,
the balloon VSC should tell the VSP that it does not support ballooning
and hot-add. However, the current version of the VSP requires the VSC
should support these capabilities, otherwise the capability negotiation
fails and the VSC can not load at all, so with the later changes to the
VSC driver, Linux VM still reports to the VSP that the VSC supports these
capabilities, but the VSC ignores the VSP's requests of balloon up/down
and hot add, and reports an error to the VSP, when applicable. BTW, in
the future the balloon VSP driver will allow the VSC to not support the
capabilities of balloon up/down and hot add.
The ACPI S4 state is not a must for hibernation to work, because Linux is
able to hibernate as long as the system can shut down. However in practice
we decide to artificially use the presence of the virtual ACPI S4 state as
an indicator of the user's intent of using hibernation, because Linux VM
must find a way to know if the user wants to use the hibernation feature
or not.
By default, Hyper-V does not enable the virtual ACPI S4 state; on recent
Hyper-V hosts (e.g. RS5, 19H1), the administrator is able to enable the
state for a VM by WMI commands.
Once all the vmbus and VSC patches for the hibernation feature are
accepted, an extra patch will be submitted to forbid hibernation if the
virtual ACPI S4 state is absent, i.e. hv_is_hibernation_supported() is
false.
Signed-off-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
|
|
Hyper-V assumes page size to be 4K. While this assumption holds true on
x86 architecture, it might not be true for ARM64 architecture. Hence
define hyper-v specific function to allocate a zeroed page which can
have a different implementation on ARM64 architecture to handle the
conflict between hyper-v's assumed page size and actual guest page size.
Signed-off-by: Himadri Pandya <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
|
|
Hyper-V has historically initialized stimer-based clockevents late in the
process of onlining a CPU because clockevents depend on stimer
interrupts. In the original Hyper-V design, stimer interrupts generate a
VMbus message, so the VMbus machinery must be running first, and VMbus
can't be initialized until relatively late. On x86/64, LAPIC timer based
clockevents are used during early initialization before VMbus and
stimer-based clockevents are ready, and again during CPU offlining after
the stimer clockevents have been shut down.
Unfortunately, this design creates problems when offlining CPUs for
hibernation or other purposes. stimer-based clockevents are shut down
relatively early in the offlining process, so clockevents_unbind_device()
must be used to fallback to the LAPIC-based clockevents for the remainder
of the offlining process. Furthermore, the late initialization and early
shutdown of stimer-based clockevents doesn't work well on ARM64 since there
is no other timer like the LAPIC to fallback to. So CPU onlining and
offlining doesn't work properly.
Fix this by recognizing that stimer Direct Mode is the normal path for
newer versions of Hyper-V on x86/64, and the only path on other
architectures. With stimer Direct Mode, stimer interrupts don't require any
VMbus machinery. stimer clockevents can be initialized and shut down
consistent with how it is done for other clockevent devices. While the old
VMbus-based stimer interrupts must still be supported for backward
compatibility on x86, that mode of operation can be treated as legacy.
So add a new Hyper-V stimer entry in the CPU hotplug state list, and use
that new state when in Direct Mode. Update the Hyper-V clocksource driver
to allocate and initialize stimer clockevents earlier during boot. Update
Hyper-V initialization and the VMbus driver to use this new design. As a
result, the LAPIC timer is no longer used during boot or CPU
onlining/offlining and clockevents_unbind_device() is not called. But
retain the old design as a legacy implementation for older versions of
Hyper-V that don't support Direct Mode.
Signed-off-by: Michael Kelley <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Dexuan Cui <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Pick up upstream fixes to avoid conflicts.
|