path: root/arch/x86
2013-08-09  x86, ticketlock: Add slowpath logic  [Jeremy Fitzhardinge, 5 files, -25/+74]

Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention).

In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker:

    Unlocker                    Locker
                                test for lock pickup
                                        -> fail
    unlock
    test slowpath
            -> false
                                set slowpath flags
                                block

Whereas this works in any ordering:

    Unlocker                    Locker
                                set slowpath flags
                                test for lock pickup
                                        -> fail
                                block
    unlock
    test slowpath
            -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag.

The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier so that it's safe to subsequently read back the slowpath flag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read might just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked "add" is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't, the generated code isn't too bad, but it's definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in. Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com Signed-off-by: Srivatsa Vaddagiri <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Cc: Stephan Diestelhorst <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
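A rough sketch of the unlock path this message describes, in C (identifiers such as add_smp(), __ticket_unlock_slowpath() and paravirt_ticketlocks_enabled follow this series as far as I recall them, but treat the exact names and layout as illustrative, not as the committed code):

    static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
    {
            if (TICKET_SLOWPATH_FLAG &&
                static_key_false(&paravirt_ticketlocks_enabled)) {
                    arch_spinlock_t prev = *lock;

                    /* Locked add: also a full barrier, so the head update is
                     * globally visible before the slowpath flag is read back. */
                    add_smp(&lock->tickets.head, TICKET_LOCK_INC);

                    if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
                            /* kick the waiter, or clear a stale flag */
                            __ticket_unlock_slowpath(lock, prev);
            } else {
                    /* PV ticketlocks not enabled: the old non-locked add. */
                    __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
            }
    }

The locked add in the PV branch is exactly the cost the message apologizes for; the else branch keeps the historical unlocked add when paravirt spinlocks are compiled out.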
2013-08-09  x86, pvticketlock: When paravirtualizing ticket locks, increment by 2  [Jeremy Fitzhardinge, 2 files, -6/+14]
Increment ticket head/tails by 2 rather than 1 to leave the LSB free to store a "is in slowpath state" bit. This halves the number of possible CPUs for a given ticket size, but this shouldn't matter in practice - kernels built for 32k+ CPU systems are probably specially built for the hardware rather than a generic distro kernel. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: Attilio Rao <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
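The effect on the ticket constants can be sketched as follows (the macro names match this series to the best of my knowledge; the exact header arrangement is illustrative):

    #ifdef CONFIG_PARAVIRT_SPINLOCKS
    #define __TICKET_LOCK_INC       2       /* LSB of head/tail reserved for the slowpath flag */
    #define TICKET_SLOWPATH_FLAG    ((__ticket_t)1)
    #else
    #define __TICKET_LOCK_INC       1       /* no PV: every ticket value is usable */
    #define TICKET_SLOWPATH_FLAG    ((__ticket_t)0)
    #endif

    #define TICKET_LOCK_INC         ((__ticket_t)__TICKET_LOCK_INC)

With an increment of 2 the low bit of head and tail never carries ticket information, which is what frees it up as the "somebody is in the slowpath" flag.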
2013-08-09  x86, pvticketlock: Use callee-save for lock_spinning  [Jeremy Fitzhardinge, 4 files, -4/+5]
Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: Attilio Rao <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
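Schematically, the change turns the plain function pointer into the pvops callee-save form; PV_CALLEE_SAVE() and PV_CALLEE_SAVE_REGS_THUNK() are the existing pvops macros, while the surrounding struct and the _sketch function below are illustrative of the shape rather than the exact committed code:

    struct pv_lock_ops {
            /* was: void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket); */
            struct paravirt_callee_save lock_spinning;
            void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
    };

    /* The backend provides a register-saving assembler thunk and then
     * registers its implementation through it. */
    PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);

    static __init void pv_lock_init_sketch(void)    /* hypothetical helper name */
    {
            pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
    }

Because the save/restore work moves into the thunk, the inlined fastpath that merely tests whether to call lock_spinning no longer pays for it.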
2013-08-09  xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks  [Jeremy Fitzhardinge, 1 file, -0/+14]
Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-7-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
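A minimal sketch of how such a boot parameter is typically wired up; early_param() is the standard kernel mechanism, while the variable and parser names here (xen_pvspin, xen_parse_nopvspin) are illustrative and may differ from the actual patch:

    static bool xen_pvspin __initdata = true;

    static __init int xen_parse_nopvspin(char *arg)
    {
            xen_pvspin = false;     /* fall back to the plain ticket locks */
            return 0;
    }
    early_param("xen_nopvspin", xen_parse_nopvspin);

    /* The spinlock init path then skips installing the pv lock ops
     * (lock_spinning / unlock_kick) when xen_pvspin is false. */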
2013-08-09  xen, pvticketlock: Xen implementation for PV ticket locks  [Jeremy Fitzhardinge, 1 file, -269/+79]

Replace the old Xen implementation of PV spinlocks with an implementation of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting, adds itself to the waiting_cpus set, and blocks on an event channel until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one waiting on this lock with the next ticket, if any. If found, it kicks it by making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the contents of the per-cpu lock_waiting values, otherwise an interrupt handler could come in, try to take some other lock, block, and overwrite our values.

Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-6-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> [ Raghavendra: use function + enum instead of macro, cmpxchg for zero status reset; reintroduce break since we know the exact vCPU to send IPI as suggested by Konrad.] Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
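In outline, the per-cpu bookkeeping and the kick side look roughly like this; the struct layout and the exact kick call are schematic (xen_send_IPI_one() exists in the Xen SMP code, but the vector name and its use here should be read as illustrative):

    struct xen_lock_waiting {
            struct arch_spinlock *lock;     /* lock this cpu is blocked on */
            __ticket_t want;                /* ticket it is waiting for */
    };

    static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
    static cpumask_t waiting_cpus;

    static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
    {
            int cpu;

            for_each_cpu(cpu, &waiting_cpus) {
                    const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);

                    /* Kick only the cpu holding the next ticket for this lock. */
                    if (w->lock == lock && w->want == next) {
                            xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
                            break;
                    }
            }
    }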
2013-08-09  xen: Defer spinlock setup until boot CPU setup  [Jeremy Fitzhardinge, 1 file, -1/+1]
There's no need to do it at very early init, and doing it there makes it impossible to use the jump_label machinery. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-5-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, ticketlock: Collapse a layer of functions  [Jeremy Fitzhardinge, 1 file, -30/+5]
Now that the paravirtualization layer doesn't exist at the spinlock level any more, we can collapse the __ticket_ functions into the arch_ functions. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-4-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: Attilio Rao <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks  [Raghavendra K T, 1 file, -0/+1]

The code size expands somewhat, and it's better to just call a function rather than inline it. Thanks to Jeremy for the original version of the ARCH_NOINLINE_SPIN_UNLOCK config patch, which has been simplified here. Suggested-by: Linus Torvalds <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-3-git-send-email-raghavendra.kt@linux.vnet.ibm.com Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, spinlock: Replace pv spinlocks with pv ticketlocks  [Jeremy Fitzhardinge, 6 files, -61/+65]

Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow path (long spin on lock, unlocking a contended lock).

Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around", http://www.xen.org/files/xensummitboston08/LHP.pdf, for more details.)

To address this, we add two hooks:

- __ticket_spin_lock, which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder.

- __ticket_spin_unlock, which on releasing a contended lock (there are more cpus with tail tickets) looks to see if the next cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away.

Results:
=======
setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12

dbench (Throughput in MB/sec; higher is better)
+------------------+-------------------+--------+
|  base (stdev %)  | patched (stdev %) | %gain  |
+------------------+-------------------+--------+
| 15035.3   (0.3)  | 15150.0   (0.6)   |   0.8  |
|  1470.0   (2.2)  |  1713.7   (1.9)   |  16.6  |
|   848.6   (4.3)  |   967.8   (4.3)   |  14.0  |
|   652.9   (3.5)  |   685.3   (3.7)   |   5.0  |
+------------------+-------------------+--------+

pvspinlock shows benefits for overcommit ratio > 1 in PLE enabled cases, and undercommit results are flat.

Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: Attilio Rao <[email protected]> [ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t] Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
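The lock fastpath that results from this has roughly the following shape (SPIN_THRESHOLD and __ticket_lock_spinning are the names this series introduces; the body is a sketch, not the exact committed code):

    static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
    {
            register struct __raw_tickets inc = { .tail = 1 };

            inc = xadd(&lock->tickets, inc);        /* take a ticket */

            for (;;) {
                    unsigned count = SPIN_THRESHOLD;

                    do {
                            if (inc.head == inc.tail)
                                    goto out;       /* our turn */
                            cpu_relax();
                            inc.head = ACCESS_ONCE(lock->tickets.head);
                    } while (--count);

                    /* Spun too long: let the pvop block us until the holder
                     * kicks this ticket; compiles to a no-op stub without
                     * CONFIG_PARAVIRT_SPINLOCKS. */
                    __ticket_lock_spinning(lock, inc.tail);
            }
    out:
            barrier();      /* nothing may creep in before the lock is taken */
    }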
2013-08-09  Merge branch 'master' into virtio-next  [Rusty Russell, 85 files, -1197/+399]
The next commit gets conflicts because it relies on patches which were cc:stable and thus had to be merged into Linus' tree before the coming merge window. So pull in master now. Signed-off-by: Rusty Russell <[email protected]>
2013-08-07  x86, relocs: Move ELF relocation handling to C  [Kees Cook, 8 files, -40/+97]
Moves the relocation handling into C, after decompression. This requires that the decompressed size is passed to the decompression routine as well so that relocations can be found. Only kernels that need relocation support will use the code (currently just x86_32), but this is laying the ground work for 64-bit using it in support of KASLR. Based on work by Neill Clift and Michael Davidson. Signed-off-by: Kees Cook <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Acked-by: Zhang Yanfei <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-07  Merge branch 'pm-cpufreq-ondemand' into pm-cpufreq  [Rafael J. Wysocki, 1 file, -29/+0]

* pm-cpufreq:
  cpufreq: Remove unused function __cpufreq_driver_getavg()
  cpufreq: Remove unused APERF/MPERF support
  cpufreq: ondemand: Change the calculation of target frequency
2013-08-07  KVM: nVMX: Advertise IA32_PAT in VM exit control  [Arthur Chunqi Li, 1 file, -3/+5]
Advertise VM_EXIT_SAVE_IA32_PAT and VM_EXIT_LOAD_IA32_PAT. Signed-off-by: Arthur Chunqi Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  KVM: nVMX: Fix up VM_ENTRY_IA32E_MODE control feature reporting  [Jan Kiszka, 1 file, -1/+5]
Do not report that we can enter the guest in 64-bit mode if the host is 32-bit only. This is not supported by KVM. Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  KVM: nEPT: Advertise WB type EPTP  [Jan Kiszka, 1 file, -2/+2]
At least WB must be possible. Signed-off-by: Jan Kiszka <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nVMX: Keep arch.pat in sync on L1-L2 switches  [Jan Kiszka, 1 file, -3/+6]
When asking vmx to load the PAT MSR for us while switching from L1 to L2 or vice versa, we have to update arch.pat as well as it may later be used again to load or read out the MSR content. Reviewed-by: Gleb Natapov <[email protected]> Tested-by: Arthur Chunqi Li <[email protected]> Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Miscellaneous cleanups  [Nadav Har'El, 1 file, -4/+2]
Some trivial code cleanups not really related to nested EPT. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Some additional comments  [Nadav Har'El, 1 file, -0/+13]
Some additional comments to preexisting code: Explain who (L0 or L1) handles EPT violation and misconfiguration exits. Don't mention "shadow on either EPT or shadow" as the only two options. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  Advertise the support of EPT to the L1 guest, through the appropriate MSR.  [Nadav Har'El, 1 file, -2/+18]
This is the last patch of the basic Nested EPT feature, so as to allow bisection through this patch series: The guest will not see EPT support until this last patch, and will not attempt to use the half-applied feature. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Nested INVEPT  [Nadav Har'El, 4 files, -0/+77]

If we let L1 use EPT, we should probably also support the INVEPT instruction. In our current nested EPT implementation, when L1 changes its EPT table for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in the course of this modification already calls INVEPT. But if the last level of the shadow page is unsync, not all of L1's changes to EPT12 are intercepted, which means the roots need to be synced when L1 calls INVEPT. Global INVEPT should not be different, since roots are synced by kvm_mmu_load() each time EPTP02 changes. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: MMU context for nested EPT  [Nadav Har'El, 3 files, -1/+69]
KVM's existing shadow MMU code already supports nested TDP. To use it, we need to set up a new "MMU context" for nested EPT, and create a few callbacks for it (nested_ept_*()). This context should also use the EPT versions of the page table access functions (defined in the previous patch). Then, we need to switch back and forth between this nested context and the regular MMU context when switching between L1 and L2 (when L1 runs this L2 with EPT). Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Add nEPT violation/misconfiguration support  [Yang Zhang, 4 files, -14/+95]

Inject nEPT faults to the L1 guest. This patch originates from Xinhao. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: correctly check if remote tlb flush is needed for shadowed EPT tables  [Gleb Natapov, 1 file, -4/+4]

need_remote_flush() assumes that the shadow page is in PT64 format, but with the addition of nested EPT this is no longer always true. Fix it by using bit definitions that depend on the host shadow page type. Reported-by: Xiao Guangrong <[email protected]> Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Redefine EPT-specific link_shadow_page()  [Yang Zhang, 2 files, -5/+11]

Since nEPT doesn't support A/D bits, we should not set those bits when building the shadow page table. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Add EPT tables support to paging_tmpl.h  [Nadav Har'El, 2 files, -1/+41]

This is the first patch in a series which adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set its own cr3 and take its own page faults without either of L0 or L1 getting involved. This often significantly improves L2's performance over the previous two alternatives (shadow page tables over EPT, and shadow page tables over shadow page tables).

This patch adds EPT support to paging_tmpl.h. paging_tmpl.h contains the code for reading and writing page tables. The code for 32-bit and 64-bit tables is very similar, but not identical, so paging_tmpl.h is #include'd twice in mmu.c, once with PTTYPE=32 and once with PTTYPE=64, and this generates the two sets of similar functions.

There are subtle but important differences between the format of EPT tables and that of ordinary x86 64-bit page tables, so for nested EPT we need a third set of functions to read the guest EPT table and to write the shadow EPT table. So this patch adds a third PTTYPE, PTTYPE_EPT, which creates functions (prefixed with "EPT") which correctly read and write EPT tables.

Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
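The include pattern this adds to mmu.c looks roughly as follows; the numeric value given to PTTYPE_EPT here is illustrative, what matters is that it is distinct from 32 and 64 so paging_tmpl.h can generate an EPT-prefixed set of functions:

    /* arch/x86/kvm/mmu.c */
    #define PTTYPE_EPT 18           /* arbitrary tag, just not 32 or 64 */

    #define PTTYPE PTTYPE_EPT
    #include "paging_tmpl.h"
    #undef PTTYPE

    #define PTTYPE 64
    #include "paging_tmpl.h"
    #undef PTTYPE

    #define PTTYPE 32
    #include "paging_tmpl.h"
    #undef PTTYPE

Inside paging_tmpl.h, a check such as #if PTTYPE == PTTYPE_EPT then selects the EPT-specific bit layout and names the generated walker functions with the "EPT" prefix.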
2013-08-07  nEPT: Support shadow paging for guest paging without A/D bits  [Gleb Natapov, 1 file, -3/+13]
Some guest paging modes do not support A/D bits. Add support for such modes in shadow page code. For such modes PT_GUEST_DIRTY_MASK, PT_GUEST_ACCESSED_MASK, PT_GUEST_DIRTY_SHIFT and PT_GUEST_ACCESSED_SHIFT should be set to zero. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
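Concretely, a guest paging mode without A/D bits defines the guest masks to zero so the generic shadow code compiles the A/D handling away. A sketch, using the EPT case (added later in this series) as the mode without A/D bits; the macro names are the ones listed above, the values on the else branch follow the ordinary x86 definitions:

    #if PTTYPE == PTTYPE_EPT
    /* this guest page-table format has no A/D bits */
    #define PT_GUEST_ACCESSED_MASK  0
    #define PT_GUEST_DIRTY_MASK     0
    #define PT_GUEST_ACCESSED_SHIFT 0
    #define PT_GUEST_DIRTY_SHIFT    0
    #else
    #define PT_GUEST_ACCESSED_MASK  PT_ACCESSED_MASK
    #define PT_GUEST_DIRTY_MASK     PT_DIRTY_MASK
    #define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
    #define PT_GUEST_DIRTY_SHIFT    PT_DIRTY_SHIFT
    #endif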
2013-08-07  nEPT: make guest's A/D bits depend on guest's paging mode  [Gleb Natapov, 1 file, -8/+22]

This patch makes the guest A/D bit definitions depend on the paging mode, so that when EPT support is added they can be defined differently. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Move common code to paging_tmpl.h  [Nadav Har'El, 2 files, -67/+69]

For preparation, we just move gpte_access(), prefetch_invalid_gpte(), is_rsvd_bits_set(), protect_clean_gpte() and is_dirty_gpte() from mmu.c to paging_tmpl.h. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Fix wrong test in kvm_set_cr3  [Nadav Har'El, 1 file, -11/+0]
kvm_set_cr3() attempts to check if the new cr3 is a valid guest physical address. The problem is that with nested EPT, cr3 is an *L2* physical address, not an L1 physical address as this test expects. As the comment above this test explains, it isn't necessary, and doesn't correspond to anything a real processor would do. So this patch removes it. Note that this wrong test could have also theoretically caused problems in nested NPT, not just in nested EPT. However, in practice, the problem was avoided: nested_svm_vmexit()/vmrun() do not call kvm_set_cr3 in the nested NPT case, and instead set the vmcb (and arch.cr3) directly, thus circumventing the problem. Additional potential calls to the buggy function are avoided in that we don't trap cr3 modifications when nested NPT is enabled. However, because in nested VMX we did want to use kvm_set_cr3() (as requested in Avi Kivity's review of the original nested VMX patches), we can't avoid this problem and need to fix it. Reviewed-by: Orit Wasserman <[email protected]> Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Fix cr3 handling in nested exit and entry  [Nadav Har'El, 1 file, -0/+26]
The existing code for handling cr3 and related VMCS fields during nested exit and entry wasn't correct in all cases: If L2 is allowed to control cr3 (and this is indeed the case in nested EPT), during nested exit we must copy the modified cr3 from vmcs02 to vmcs12, and we forgot to do so. This patch adds this copy. If L0 isn't controlling cr3 when running L2 (i.e., L0 is using EPT), and whoever does control cr3 (L1 or L2) is using PAE, the processor might have saved PDPTEs and we should also save them in vmcs12 (and restore later). Reviewed-by: Xiao Guangrong <[email protected]> Reviewed-by: Orit Wasserman <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Support LOAD_IA32_EFER entry/exit controls for L1  [Nadav Har'El, 1 file, -7/+16]
Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577 switch the EFER MSR when EPT is used and the host and guest have different NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2) and want to be able to run recent KVM as L1, we need to allow L1 to use this EFER switching feature. To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former (the latter is still unsupported). Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state, respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do in this patch is to properly advertise this feature to L1. Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using vmx_set_efer (which itself sets one of several vmcs02 fields), so we always support this feature, regardless of whether the host supports it. Reviewed-by: Orit Wasserman <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  KVM: MMU: fix the check of reserved bits on the gpte of L2  [Xiao Guangrong, 1 file, -2/+1]

Current code always uses arch.mmu to check the reserved bits on the guest gpte, which is valid only for the L1 guest; we should use arch.nested_mmu instead when we translate gva to gpa for the L2 guest. Fix it by using @mmu instead, since it is adapted to the current mmu mode automatically. The bug can be triggered when nested npt is used and the L1 guest and L2 guest use different mmu modes. Reported-by: Jan Kiszka <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  KVM: nVMX: correctly set tr base on nested vmexit emulation  [Gleb Natapov, 1 file, -1/+1]
After commit 21feb4eb64e21f8dc91136b91ee886b978ce6421 tr base is zeroed during vmexit. Set it to L1's HOST_TR_BASE. This should fix https://bugzilla.kernel.org/show_bug.cgi?id=60679 Reported-by: Yongjie Ren <[email protected]> Reviewed-by: Arthur Chunqi Li <[email protected]> Tested-by: Yongjie Ren <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-06  x86/jump-label: Show where and what was wrong on errors  [Steven Rostedt, 1 file, -3/+18]

When modifying text sections for jump labels, a paranoid check is performed. If the check fails, the system "bugs", but why it failed is not shown. The BUG_ON()s in the jump label update code are replaced with bug_at(ip). This is a function that will show what pointer failed, and what was at the location of the failure that made jump label panic. Signed-off-by: Steven Rostedt <[email protected]>
2013-08-06  x86/jump-label: Add safety checks to jump label conversions  [Steven Rostedt, 1 file, -4/+28]
As with all modifying of kernel text, we need to be very paranoid. When converting the jump label locations to and from nops to jumps a check has been added to make sure what we are replacing is what we expect, otherwise we bug. Cc: H. Peter Anvin <[email protected]> Cc: Jason Baron <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
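In rough form, the added paranoia is a byte compare of the patch site against what it is expected to contain before anything is rewritten; ideal_nops[NOP_ATOMIC5] and JUMP_LABEL_NOP_SIZE are existing kernel symbols, while the helper below and its name are purely illustrative:

    /* hypothetical helper sketching the check; not the committed code */
    static void __jump_label_check_sketch(struct jump_entry *entry,
                                          const void *expect)
    {
            /* The 5 bytes at the site must be exactly the nop (when enabling)
             * or the jump (when disabling) we believe we wrote earlier;
             * anything else means the text is not what we think it is. */
            if (memcmp((void *)entry->code, expect, JUMP_LABEL_NOP_SIZE))
                    BUG();
    }

    /* e.g. before turning a nop into a jump:
     *   __jump_label_check_sketch(entry, ideal_nops[NOP_ATOMIC5]); */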
2013-08-06  x86/jump-label: Do not bother updating nops if they are correct  [Steven Rostedt, 1 file, -1/+24]
On boot up, the jump label init function scans all the jump label locations and converts them to the best nop for the machine. If the nop is already the ideal nop, do not bother with changing it. Cc: Jason Baron <[email protected]> Cc: H. Peter Anvin <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2013-08-06  x86/jump-label: Use best default nops for initial jump label calls  [Steven Rostedt, 1 file, -2/+7]

As specified by H. Peter Anvin, the best nops for x86 without knowing the running computer are:

32bit: 0x3e, 0x8d, 0x74, 0x26, 0x00  (also known as GENERIC_NOP5_ATOMIC)
64bit: 0x0f, 0x1f, 0x44, 0x00, 0x00  (also known as P6_NOP5_ATOMIC)

Currently the default nop used by jump label is 0xe9 0x00 0x00 0x00 0x00, which is really a 5-byte jump to the next position. It's better to use a real nop than a jmp. Cc: H. Peter Anvin <[email protected]> Cc: Jason Baron <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
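For reference, the byte sequences named above written out as C initializers; the bytes come straight from the message, only the array names are for illustration:

    /* 32-bit ideal default: GENERIC_NOP5_ATOMIC */
    static const unsigned char default_nop_32[] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
    /* 64-bit ideal default: P6_NOP5_ATOMIC */
    static const unsigned char default_nop_64[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
    /* old default: a jmp with zero displacement, i.e. a jump to the next instruction */
    static const unsigned char old_jmp_nop[]    = { 0xe9, 0x00, 0x00, 0x00, 0x00 };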
2013-08-06  x86, asmlinkage, vdso: Mark vdso variables __visible  [Andi Kleen, 1 file, -1/+1]
Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage, power: Make various symbols used by the suspend asm code visible  [Andi Kleen, 2 files, -10/+10]

Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make 64bit checksum functions visible  [Andi Kleen, 1 file, -1/+1]
They are implemented in assembler. Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops  [Andi Kleen, 3 files, -11/+12]
Cc: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage, apm: Make APM data structure used from assembler visible  [Andi Kleen, 1 file, -1/+1]
Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make syscall tables visible  [Andi Kleen, 2 files, -2/+2]
They are referenced from entry*.S. Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make several variables used from assembler/linker script visible  [Andi Kleen, 10 files, -13/+14]

Plus one function, load_gs_index(). Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make kprobes code visible and fix assembler code  [Andi Kleen, 3 files, -10/+7]

- Make all the external assembler template symbols __visible
- Move the templates' inline assembler code into a top-level assembler statement, not inside a function. This avoids it being optimized away or cloned.

Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make various syscalls asmlinkage  [Andi Kleen, 2 files, -5/+5]
FWIW I suspect sys_rt_sigreturn/sys_sigreturn should use standard SYSCALL wrappers. But I didn't do that change in this patch. Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make 32bit/64bit __switch_to visible  [Andi Kleen, 3 files, -4/+4]
This function is called from inline assembler, so has to be visible. Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make _*_start_kernel visible  [Andi Kleen, 3 files, -5/+7]
Obviously these functions have to be visible, otherwise the whole kernel could be optimized away. Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible  [Andi Kleen, 6 files, -81/+77]

These handlers are all referenced from assembler stubs, so they need to be visible. The handlers without arguments become asmlinkage, the others __visible, to not force regparm(0) on x86-32. I put it all into a single patch; please let me know if you want it split up. Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Change dotraplinkage into __visible on 32bit  [Andi Kleen, 1 file, -5/+1]
Mark 32bit dotraplinkage functions as __visible for LTO. 64bit already is using asmlinkage which includes it. v2: Clean up (M.Marek) Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
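The change described above amounts to roughly the following in traps.h; this is a sketch based on the description (64-bit keeps asmlinkage, which already carries __visible after this series), and do_divide_error is just one example of a handler declared with the macro:

    #ifdef CONFIG_X86_64
    #define dotraplinkage asmlinkage        /* asmlinkage already implies __visible */
    #else
    #define dotraplinkage __visible         /* previously expanded to nothing on 32-bit */
    #endif

    dotraplinkage void do_divide_error(struct pt_regs *regs, long error_code);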