Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Resent spec says that for 0f (20|21|22|23) the 2 bits in the mod field
are ignored. Interestingly enough older spec says that 11 is only valid
encoding.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
It is undefined and should generate #UD.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
mov r/m, sreg generates #UD ins sreg is incorrect.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Eliminate the need to call back into KVM to get it from emulator.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Use (get|set)_cr callback to emulate lmsw inside emulator.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Use this callback instead of directly call kvm function. Also rename
realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
to do with real mode.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Make use of bool as return values, and remove some useless
bool value converting. Thanks Avi to point this out.
Signed-off-by: Gui Jianfeng <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Mov reg, cr instruction doesn't change flags in any meaningful way, so
no need to update rflags after instruction execution.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Check return value against correct define instead of open code
the value.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
During rep emulation access length to RCX depends on current address
mode.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Set correct operation length. Add RAX (64bit) handling.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:
"The processor may create entries in paging-structure caches for
translations required for prefetches and for accesses that are a
result of speculative execution that would never actually occur
in the executed code path."
And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.
Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter. If a race occured, do not install the pte.
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
The update_pte() path currently uses a nontrapping spte when a nonpresent
(or nonaccessed) gpte is written. This is fine since at present it is only
used on sync pages. However, on an unsync page this will cause an endless
fault loop as the guest is under no obligation to invlpg a gpte that
transitions from nonpresent to present.
Needed for the next patch which reinstates update_pte() on invlpg.
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Currently emulated atomic operations are immediately followed by a non-atomic
operation, so that kvm_mmu_pte_write() can be invoked. This updates the mmu
but undoes the whole point of doing things atomically.
Fix by only performing the atomic operation and the mmu update, and avoiding
the non-atomic write.
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Once upon a time, locked operations were emulated while holding the mmu mutex.
Since mmu pages were write protected, it was safe to emulate the writes in
a non-atomic manner, since there could be no other writer, either in the
guest or in the kernel.
These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page
to writes by the guest. This may cause corruption of guest page tables.
Fix by using an atomic cmpxchg for these operations.
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
kvm_mmu_pte_write() reads guest ptes in two different occasions, both to
allow a 32-bit pae guest to update a pte with 4-byte writes. Consolidate
these into a single read, which also allows us to consolidate another read
from an invlpg speculating a gpte into the shadow page table.
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
If no irq chip in kernel, ioctl KVM_IRQ_LINE will return -EFAULT.
But I see in other place such as KVM_[GET|SET]IRQCHIP, -ENXIO is
return. So this patch used -ENXIO instead of -EFAULT.
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
This patch use generic linux function native_store_idt()
instead of kvm_get_idt(), and also removed the useless
function kvm_get_idt().
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Often an exception can help point out where things start to go wrong.
Signed-off-by: Avi Kivity <[email protected]>
|
|
Reading rip is expensive on vmx, so move it inside the tracepoint so we only
incur the cost if tracing is enabled.
Signed-off-by: Avi Kivity <[email protected]>
|
|
The prep_new_page() in page allocator calls set_page_private(page, 0).
So we don't need to reinitialize private of page.
Signed-off-by: Minchan Kim <[email protected]>
Cc: Avi Kivity<[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
This patch does:
- no need call tracepoint_synchronize_unregister() when kvm module
is unloaded since ftrace can handle it
- cleanup ftrace's macro
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
LMSW is present in both group tables. It was marked privileged only in
one of them. Intel analog of VMMCALL is already marked privileged.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
These bits are ignored by the hardware too. Implement this
for nested svm too.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
This patch adds the correct handling of the nested io
permission bitmap. Old behavior was to not lookup the port
in the iopm but only reinject an io intercept to the guest.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
There is a generic function now to calculate msrpm offsets.
Use that function in nested_svm_exit_handled_msr() remove
the duplicate logic (which had a bug anyway).
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
This patch optimizes the way the msrpm of the host and the
guest are merged. The old code merged the 2 msrpm pages
completly. This code needed to touch 24kb of memory for that
operation. The optimized variant this patch introduces
merges only the parts where the host msrpm may contain zero
bits. This reduces the amount of memory which is touched to
48 bytes.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
This patch introduces a list with all msrs a guest might
have direct access to and changes the svm_vcpu_init_msrpm
function to use this list.
It also adds a check to set_msr_interception which triggers
a warning if a developer changes a msr intercept that is not
in the list.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
The algorithm to find the offset in the msrpm for a given
msr is needed at other places too. Move that logic to its
own function.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
The nested_svm_exit_handled_msr() returned an bool which is
a bug. I worked by accident because the exected integer
return values match with the true and false values. This
patch changes the return value to int and let the function
return the correct values.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
arch/x86/kvm/kvm_timer.h:13: ERROR: code indent should use tabs where possible
Signed-off-by: Andrea Gelmini <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Moorestown does not have BIOS provided MP tables, we can save some time
by avoiding scaning of these tables. e.g.
[ 0.000000] Scan SMP from c0000000 for 1024 bytes.
[ 0.000000] Scan SMP from c009fc00 for 1024 bytes.
[ 0.000000] Scan SMP from c00f0000 for 65536 bytes.
[ 0.000000] Scan SMP from c00bfff0 for 1024 bytes.
Searching EBDA with the base at 0x40E will also result in random pointer
deferencing within 1MB. This can be a problem in Lincroft if the pointer
hits VGA area and VGA mode is not enabled.
Signed-off-by: Jacob Pan <[email protected]>
LKML-Reference: <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Moorestown PCI code has special handling of devices with fixed BARs. In
case of BAR sizing writes, we need to update the fake PCI MMCFG space with real
size decode value.
When a BAR is not present, we need to return 0 instead of ~0. ~0 will be
treated as device error per bugzilla 12006.
Signed-off-by: Jacob Pan <[email protected]>
LKML-Reference: <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Jaswinder reported this #GP:
|
| Message from syslogd@ht at May 14 09:39:32 ...
| kernel:[ 314.908612] EIP: [<c100ccca>]
| x86_perf_event_set_period+0x19d/0x1b2 SS:ESP 0068:edac3d70
|
Ming has narrowed it down to a comparision issue
between arguments with different sizes and
signs. As result event index reached a wrong
value which in turn led to a GP fault.
At the same time it was found that p4_next_cntr
has broken logic and should return the counter
index only if it was not yet borrowed for
another event.
Reported-by: Jaswinder Singh Rajput <[email protected]>
Reported-by: Lin Ming <[email protected]>
Bisected-by: Lin Ming <[email protected]>
Tested-by: Jaswinder Singh Rajput <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Frederic Weisbecker <[email protected]>
LKML-Reference: <20100514190815.GG13509@lenovo>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mrst: Don't blindly access extended config space
|
|
Do not blindly access extended configuration space unless we actively
know we're on a Moorestown platform. The fixed-size BAR capability
lives in the extended configuration space, and thus is not applicable
if the configuration space isn't appropriately sized.
This fixes booting certain VMware configurations with CONFIG_MRST=y.
Moorestown will add a fake PCI-X 266 capability to advertise the
presence of extended configuration space.
Reported-and-tested-by: Petr Vandrovec <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Acked-by: Jacob Pan <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
LKML-Reference: <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
x86, k8: Fix build error when K8_NB is disabled
x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
x86: Fix fake apicid to node mapping for numa emulation
|
|
environments
When running a quest kernel on xen we get:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 0, comm: swapper Tainted: G W 2.6.34-rc3 #1 /HVM domU
RIP: 0010:[<ffffffff8142f2fb>] [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x
2ca/0x3df
RSP: 0018:ffff880002203e08 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
Stack:
ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
<0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
<0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
Call Trace:
<IRQ>
[<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
[<ffffffff81059140>] ? mod_timer+0x23/0x25
[<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
[<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
[<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
[<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
[<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
[<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
[<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
[<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
[<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
<EOI>
[<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
[<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
[<ffffffff810114eb>] ? default_idle+0x36/0x53
[<ffffffff81008c22>] cpu_idle+0xaa/0xe4
[<ffffffff81423a9a>] rest_init+0x7e/0x80
[<ffffffff81b10dd2>] start_kernel+0x40e/0x419
[<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
[<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
RIP [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
RSP <ffff880002203e08>
CR2: 0000000000000038
---[ end trace a7919e7f17c0a726 ]---
The L3 cache index disable feature of AMD CPUs has to be disabled if the
kernel is running as guest on top of a hypervisor because northbridge
devices are not available to the guest. Currently, this fixes a boot
crash on top of Xen. In the future this will become an issue on KVM as
well.
Check if northbridge devices are present and do not enable the feature
if there are none.
[ hpa: backported to 2.6.34 ]
Signed-off-by: Frank Arnold <[email protected]>
LKML-Reference: <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: <[email protected]>
|
|
K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail
at the final linking stage due to missing exported num_k8_northbridges.
Add a header stub for that.
Signed-off-by: Borislav Petkov <[email protected]>
LKML-Reference: <20100503183036.GJ26107@aftab>
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: <[email protected]>
|
|
Signed-off-by: Andreas Dilger <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
The newer assemblers support the .cfi_sections directive so we can put
the CFI from .S files into the .debug_frame section that is preserved
in unstripped vmlinux and in separate debuginfo, rather than the
.eh_frame section that is now discarded by vmlinux.lds.S.
Signed-off-by: Roland McGrath <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
If host CPU is exposed to a guest the OSVW MSRs are not guaranteed
to be present and a GP fault occurs. Thus checking the feature flag is
essential.
Cc: <[email protected]> # .32.x .33.x
Signed-off-by: Andreas Herrmann <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Linear search over all p4 MSRs should be fine if only
we would not use it in events scheduling routine which
is pretty time critical. Lets use hashes. It should speed
scheduling up significantly.
v2: Steven proposed to use more gentle approach than issue
BUG on error, so we use WARN_ONCE now
Signed-off-by: Cyrill Gorcunov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Lin Ming <[email protected]>
LKML-Reference: <20100512174242.GA5190@lenovo>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
As the processor may not consider GUEST_INTR_STATE_STI as a reason for
blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
when we asked for it. But as we consider this state as NMI-blocking, we
can run into an endless loop.
Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
active (originally suggested by Gleb). Intel confirmed that this is
safe, the processor will never complain about NMI injection in this
state.
Signed-off-by: Jan Kiszka <[email protected]>
KVM-Stable-Tag
Acked-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
cpuid_update may operate VMCS, so vcpu_load() and vcpu_put()
should be called to ensure correctness.
Signed-off-by: Dongxiao Xu <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
This patch makes KVM on 32 bit SVM working again by
correcting the masks used for iret interception. With the
wrong masks the upper 32 bits of the intercepts are masked
out which leaves vmrun unintercepted. This is not legal on
svm and the vmrun fails.
Bug was introduced by commits 95ba827313 and 3cfc3092.
Cc: Jan Kiszka <[email protected]>
Cc: Gleb Natapov <[email protected]>
Cc: [email protected]
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
The ACPI spec tells us that the firmware will reenable SCI_EN on resume.
Reality disagrees in some cases. The ACPI spec tells us that the only way
to set SCI_EN is via an SMM call.
https://bugzilla.kernel.org/show_bug.cgi?id=13745 shows us that doing so
may break machines. Tracing the ACPI calls made by Windows shows that it
unconditionally sets SCI_EN on resume with a direct register write, and
therefore the overwhelming probability is that everything is fine with
this behaviour.
Signed-off-by: Matthew Garrett <[email protected]>
Tested-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Len Brown <[email protected]>
|
|
use_xsave() is now just a special case of static_cpu_has(), so use
static_cpu_has().
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: Suresh Siddha <[email protected]>
LKML-Reference: <[email protected]>
|