path: root/arch/x86
2013-08-13  KVM: x86: Update symbolic exit codes  (Jan Kiszka; 1 file, -2/+3)
Add decoding for INVEPT and reorder the list according to the reason numbers. Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-13  mm: Remove unused variable idx0 in __early_ioremap()  (Jianguo Wu; 1 file, -3/+2)
After commit 8827247ffcc ("x86: don't define __this_fixmap_does_not_exist()") variable idx0 is no longer needed, so just remove it. Signed-off-by: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Hanjun Guo <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-08-12  Merge tag 'please-pull-mce-f-bit' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras  (Ingo Molnar; 3 files, -4/+14)
Pull MCE-uncorrected-error fix from Tony Luck: "Bit 12 may or may not be set in MCi_STATUS.MCACOD when an uncorrected error is reported. Ignore it when checking error signatures." Signed-off-by: Ingo Molnar <[email protected]>
2013-08-12  x86, microcode, AMD: Fix early microcode loading  (Torsten Kaiser; 3 files, -29/+27)
load_microcode_amd() (and the helper it is using) should not have a cpu parameter. The microcode loading does not depend on the CPU wrt the patches loaded, since they will end up in a global list for all CPUs anyway. The change from cpu to x86family in load_microcode_amd() now allows dropping the code messing with cpu_data(cpu) from collect_cpu_info_amd_early(), which is wrong anyway because at that point the per-cpu cpu_info is not yet set up (these values would later be overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()). Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(), because it is only used in one place and, without the cpuinfo_x86 accesses, not much of it was left. Signed-off-by: Torsten Kaiser <[email protected]> [ Fengguang: build fix ] Signed-off-by: Fengguang Wu <[email protected]> [ Boris: adapt it to current tree. ] Signed-off-by: Borislav Petkov <[email protected]>
2013-08-12  x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86  (Torsten Kaiser; 1 file, -15/+5)
cpu_has_amd_erratum() is buggy because it uses the per-cpu cpu_info before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info(). If early microcode loading is enabled, its collect_cpu_info_amd_early() will fill ->x86, so the fallback to boot_cpu_data is not used. But ->x86_vendor was not filled and is still X86_VENDOR_INTEL, resulting in no errata fixes getting applied, and my system hangs on boot. Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only caller, init_amd(), takes a struct cpuinfo_x86 as a parameter, and the set_cpu_bug() call that cpu_has_amd_erratum() controls also only uses that struct. So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum(), and the broken fallback can be dropped. [ Boris: Drop WARN_ON() since we're called only from init_amd() ] Signed-off-by: Torsten Kaiser <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
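For illustration, a minimal userspace sketch of the pattern this fix adopts: the erratum check is keyed off a cpuinfo struct handed in by the caller, rather than a global per-cpu structure that may not be initialized yet at early boot. Types, the erratum encoding, and the range values are all hypothetical simplifications, not the kernel's actual definitions.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical, much-reduced stand-ins for the kernel's types. */
    struct cpuinfo_x86 { int x86_vendor; int x86; int x86_model; };
    #define X86_VENDOR_AMD 2

    struct amd_erratum { int family, model_start, model_end; };

    /* The fix: take the cpuinfo struct from the caller (init_amd()
     * passes the one it was given) instead of reading a global that
     * is not yet filled in at early boot. */
    static bool cpu_has_amd_erratum(const struct cpuinfo_x86 *c,
                                    const struct amd_erratum *e)
    {
        if (c->x86_vendor != X86_VENDOR_AMD)
            return false;
        return c->x86 == e->family &&
               c->x86_model >= e->model_start &&
               c->x86_model <= e->model_end;
    }

    int main(void)
    {
        struct cpuinfo_x86 c = { X86_VENDOR_AMD, 0x15, 0x02 };
        struct amd_erratum err = { 0x15, 0x00, 0xff };  /* made-up range */
        printf("affected: %d\n", cpu_has_amd_erratum(&c, &err));
        return 0;
    }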
2013-08-12  Merge branch 'x86/mce' into x86/ras  (Ingo Molnar; 6 files, -10/+73)
Pursue a single RAS/MCE topic branch on x86. Signed-off-by: Ingo Molnar <[email protected]>
2013-08-12  PCI: remove ARCH_SUPPORTS_MSI kconfig option  (Thomas Petazzoni; 1 file, -1/+0)
Now that we have weak versions for each of the PCI MSI architecture functions, we can actually build the MSI support for all platforms, regardless of whether or not they provide architecture-specific versions of those functions. For this reason, the ARCH_SUPPORTS_MSI hidden kconfig boolean becomes useless, and this patch gets rid of it. Signed-off-by: Thomas Petazzoni <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Acked-by: Benjamin Herrenschmidt <[email protected]> Tested-by: Daniel Price <[email protected]> Tested-by: Thierry Reding <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: [email protected] Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: [email protected] Cc: Russell King <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: [email protected] Cc: Ralf Baechle <[email protected]> Cc: [email protected] Cc: David S. Miller <[email protected]> Cc: [email protected] Cc: Chris Metcalf <[email protected]> Signed-off-by: Jason Cooper <[email protected]>
2013-08-12  PCI: use weak functions for MSI arch-specific functions  (Thomas Petazzoni; 2 files, -30/+24)
Until now, the MSI architecture-specific functions could be overloaded using a fairly complex set of #define and compile-time conditionals. In order to prepare for the introduction of the msi_chip infrastructure, it is desirable to switch all those functions to use the 'weak' mechanism. This commit converts all the architectures that were overriding those MSI functions to use the new strategy. Note that we keep two separate, non-weak, functions default_teardown_msi_irqs() and default_restore_msi_irqs() for the default behavior of the arch_teardown_msi_irqs() and arch_restore_msi_irqs(), as the default behavior is needed by x86 PCI code. Signed-off-by: Thomas Petazzoni <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Acked-by: Benjamin Herrenschmidt <[email protected]> Tested-by: Daniel Price <[email protected]> Tested-by: Thierry Reding <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: [email protected] Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: [email protected] Cc: Russell King <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: [email protected] Cc: Ralf Baechle <[email protected]> Cc: [email protected] Cc: David S. Miller <[email protected]> Cc: [email protected] Cc: Chris Metcalf <[email protected]> Signed-off-by: Jason Cooper <[email protected]>
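The 'weak' mechanism itself is plain GCC/Clang linker behavior, illustrated by the compilable sketch below. The function name mirrors the MSI hooks discussed above, but the body and signature are illustrative only.

    #include <stdio.h>

    /* Weak default: used unless some other object file (e.g. an
     * architecture's MSI code) links in a strong definition of the
     * same symbol, which then silently wins at link time. */
    __attribute__((weak)) int arch_setup_msi_irqs(int nvec)
    {
        printf("generic MSI setup for %d vector(s)\n", nvec);
        return 0;
    }

    int main(void)
    {
        /* With no strong override linked in, the weak default runs. */
        return arch_setup_msi_irqs(1);
    }

This replaces the old scheme where each architecture had to define a preprocessor symbol to signal that it was overriding the function.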
2013-08-12  x86, amd_nb: Clarify F15h, model 30h GART and L3 support  (Aravind Gopalakrishnan; 1 file, -2/+11)
F15h, models 0x30 and later don't have a GART. Note that. Also check CPUID leaf 0x80000006 for L3 presence, because there are models which don't sport an L3 cache. Signed-off-by: Aravind Gopalakrishnan <[email protected]> [ Boris: rewrite commit message, cleanup comments. ] Signed-off-by: Borislav Petkov <[email protected]>
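The L3 check can be exercised from userspace too. A small runnable sketch, assuming an AMD CPU where CPUID leaf 0x80000006 reports the L3 size in EDX bits 31:18 in 512 KB units (zero meaning no L3):

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(0x80000006, &eax, &ebx, &ecx, &edx)) {
            puts("leaf 0x80000006 not supported");
            return 1;
        }
        /* EDX[31:18]: L3 size in 512 KB units; 0 means no L3. */
        if (edx >> 18)
            printf("L3 present: %u KB\n", (edx >> 18) * 512);
        else
            puts("no L3 cache");
        return 0;
    }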
2013-08-12  perf/x86: Add Haswell ULT model number used in MacBook Air and other systems  (Andi Kleen; 1 file, -0/+1)
This one was missed earlier. Signed-off-by: Andi Kleen <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-08-09  x86: Don't clear olpc_ofw_header when sentinel is detected  (Daniel Drake; 1 file, -2/+2)
OpenFirmware wasn't quite following the protocol described in boot.txt and the kernel has detected this through use of the sentinel value in boot_params. OFW does zero out almost all of the stuff that it should do, but not the sentinel. This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC support. OpenFirmware has now been fixed. However, it would be nice if we could maintain Linux compatibility with old firmware versions. To do that, we just have to avoid zeroing out olpc_ofw_header. OFW does not write to any other parts of the header that are being zapped by the sentinel-detection code, and all users of olpc_ofw_header are somewhat protected through checking for the OLPC_OFW_SIG magic value before using it. So this should not cause any problems for anyone. Signed-off-by: Daniel Drake <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Acked-by: Yinghai Lu <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: <[email protected]> # v3.9+
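The shape of the fix is easy to picture: when the sentinel indicates a non-conforming loader, zap only the fields a conforming loader would have zeroed, and deliberately skip olpc_ofw_header. A hedged sketch with a hypothetical, much-reduced boot_params layout (the real struct and field offsets differ):

    #include <string.h>
    #include <stdint.h>

    /* Hypothetical, simplified layout; not the real boot_params. */
    struct boot_params {
        uint8_t  sentinel;         /* nonzero if loader didn't clear us */
        uint8_t  _pad[7];
        uint64_t olpc_ofw_header;  /* must survive sanitizing (old OFW) */
        uint8_t  reserved_a[16];   /* fields a conforming loader zeroes */
        uint8_t  reserved_b[16];
    };

    static void sanitize_boot_params(struct boot_params *bp)
    {
        if (!bp->sentinel)
            return;  /* loader followed boot.txt; nothing to do */

        /* Zap only the reserved regions, skipping olpc_ofw_header
         * so old OpenFirmware keeps working. */
        memset(bp->reserved_a, 0, sizeof(bp->reserved_a));
        memset(bp->reserved_b, 0, sizeof(bp->reserved_b));
    }

    int main(void)
    {
        struct boot_params bp = { .sentinel = 0xff, .olpc_ofw_header = 42 };
        sanitize_boot_params(&bp);
        return bp.olpc_ofw_header == 42 ? 0 : 1;  /* header preserved */
    }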
2013-08-09  xen/p2m: avoid unnecessary TLB flush in m2p_remove_override()  (David Vrabel; 1 file, -1/+0)
In m2p_remove_override() when removing the grant map from the kernel mapping and replacing with a mapping to the original page, the grant unmap will already have flushed the TLB and it is not necessary to do it again after updating the mapping. Signed-off-by: David Vrabel <[email protected]> Cc: Stefano Stabellini <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]>
2013-08-09  xen: Support 64-bit PV guest receiving NMIs  (Konrad Rzeszutek Wilk; 4 files, -7/+26)
This is based on a patch that Zhenzhong Duan had sent - which was missing some of the remaining pieces. The kernel has the logic to handle Xen-type-exceptions using the paravirt interface in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME - pv_irq_ops.adjust_exception_frame and INTERRUPT_RETURN - pv_cpu_ops.iret). That means the NMI handler (and other exception handlers) use the hypervisor iret. The other changes that would be necessary for this would be to translate the NMI_VECTOR to one of the entries on the ipi_vector and make xen_send_IPI_mask_allbutself use different events. Fortunately for us commit 1db01b4903639fcfaec213701a494fe3fb2c490b (xen: Clean up apic ipi interface) implemented this and we piggyback on the cleanup such that the apic IPI interface will pass the right vector value for NMI. With this patch we can trigger NMIs within a PV guest (only tested on x86_64). For this to work with normal PV guests (not the initial domain) we need the domain to be able to use the APIC ops - they are already implemented to use the Xen event channels. For that to be turned on in a PV domU we need to remove the masking of X86_FEATURE_APIC. Incidentally that means kgdb will also now work within a PV guest without using the 'nokgdbroundup' workaround. Note that the 32-bit version is different and this patch does not enable that. CC: Lisa Nguyen <[email protected]> CC: Ben Guthro <[email protected]> CC: Zhenzhong Duan <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> [v1: Fixed up per David Vrabel comments] Reviewed-by: Ben Guthro <[email protected]> Reviewed-by: David Vrabel <[email protected]>
2013-08-09  kvm guest: Add configuration support to enable debug information for KVM Guests  (Srivatsa Vaddagiri; 1 file, -0/+9)
Signed-off-by: Srivatsa Vaddagiri <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-14-git-send-email-raghavendra.kt@linux.vnet.ibm.com Signed-off-by: Suzuki Poulose <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Gleb Natapov <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi  (Raghavendra K T; 1 file, -0/+1)
These are needed by both guest and host. Originally-from: Srivatsa Vaddagiri <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-13-git-send-email-raghavendra.kt@linux.vnet.ibm.com Acked-by: Gleb Natapov <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  xen, pvticketlock: Allow interrupts to be enabled while blocking  (Jeremy Fitzhardinge; 1 file, -6/+40)
If interrupts were enabled when taking the spinlock, we can leave them enabled while blocking to get the lock. If we can enable interrupts while waiting for the lock to become available, and we take an interrupt before entering the poll, and the handler takes a spinlock which ends up going into the slow state (invalidating the per-cpu "lock" and "want" values), then when the interrupt handler returns the event channel will remain pending so the poll will return immediately, causing it to return out to the main spinlock loop. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-12-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, ticketlock: Add slowpath logic  (Jeremy Fitzhardinge; 5 files, -25/+74)
Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention). In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker:

    Unlocker                 Locker
                             test for lock pickup
                                 -> fail
    unlock
    test slowpath
        -> false
                             set slowpath flags
                             block

Whereas this works in any ordering:

    Unlocker                 Locker
    set slowpath flags
                             test for lock pickup
                                 -> fail
                             block
    unlock
    test slowpath
        -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag. The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier so that it is safe to subsequently read back the slowpath flag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock. Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked "add" is the only unlocking code. Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't, the generated code isn't too bad, but it's definitely suboptimal. Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in. Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com Signed-off-by: Srivatsa Vaddagiri <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Cc: Stephan Diestelhorst <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
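The unlock-side logic can be sketched in portable C11 atomics: a seq_cst read-modify-write plays the role of the x86 "lock add" full barrier, after which the slowpath flag can safely be read back. This is an approximate toy model (the kernel's actual code differs in types, helpers, and the exact stale-flag-clearing protocol):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TICKET_LOCK_INC      2   /* tickets step by 2 ...          */
    #define TICKET_SLOWPATH_FLAG 1   /* ... so the LSB of tail is free */

    struct ticketlock { _Atomic uint16_t head, tail; };

    static void kick_waiter(struct ticketlock *lk, uint16_t next)
    {
        (void)lk;
        printf("kick holder of ticket %u\n", next); /* stand-in for pvop */
    }

    static void ticket_unlock(struct ticketlock *lk)
    {
        /* seq_cst RMW ~ x86 "lock add": a full barrier, so the flag
         * read below cannot be satisfied early from the store buffer. */
        uint16_t head = atomic_fetch_add(&lk->head, TICKET_LOCK_INC)
                        + TICKET_LOCK_INC;
        uint16_t tail = atomic_load(&lk->tail);

        if (tail & TICKET_SLOWPATH_FLAG) {
            if ((tail & ~TICKET_SLOWPATH_FLAG) == head)
                /* uncontended: just clear the stale flag */
                atomic_fetch_and(&lk->tail,
                                 (uint16_t)~TICKET_SLOWPATH_FLAG);
            else
                kick_waiter(lk, head);  /* wake the next ticket holder */
        }
    }

    int main(void)
    {
        /* head=0; tail=4 with the slowpath flag: a waiter holds ticket 2. */
        struct ticketlock lk = { 0, 4 | TICKET_SLOWPATH_FLAG };
        ticket_unlock(&lk);  /* prints: kick holder of ticket 2 */
        return 0;
    }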
2013-08-09  x86, pvticketlock: When paravirtualizing ticket locks, increment by 2  (Jeremy Fitzhardinge; 2 files, -6/+14)
Increment ticket head/tails by 2 rather than 1 to leave the LSB free to store a "is in slowpath state" bit. This halves the number of possible CPUs for a given ticket size, but this shouldn't matter in practice - kernels built for 32k+ CPU systems are probably specially built for the hardware rather than a generic distro kernel. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: Attilio Rao <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
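The constants involved, and the halving of the ticket space, fit in a few lines. A runnable sketch (constant names modeled on the series, concrete values illustrative):

    #include <stdio.h>
    #include <stdint.h>

    #define CONFIG_PARAVIRT_SPINLOCKS 1  /* flip to see non-PV values */

    #ifdef CONFIG_PARAVIRT_SPINLOCKS
    #define TICKET_LOCK_INC      2       /* LSB reserved for slowpath flag */
    #define TICKET_SLOWPATH_FLAG 1
    #else
    #define TICKET_LOCK_INC      1
    #define TICKET_SLOWPATH_FLAG 0
    #endif

    typedef uint16_t ticket_t;

    int main(void)
    {
        /* Stepping by 2 halves the ticket space: 32k CPUs with a
         * 16-bit ticket instead of 64k. */
        printf("max cpus: %lu\n",
               (unsigned long)(((ticket_t)-1 / TICKET_LOCK_INC) + 1));
        return 0;
    }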
2013-08-09  x86, pvticketlock: Use callee-save for lock_spinning  (Jeremy Fitzhardinge; 4 files, -4/+5)
Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: Attilio Rao <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks  (Jeremy Fitzhardinge; 1 file, -0/+14)
Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-7-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  xen, pvticketlock: Xen implementation for PV ticket locks  (Jeremy Fitzhardinge; 1 file, -269/+79)
Replace the old Xen implementation of PV spinlocks with an implementation of xen_lock_spinning and xen_unlock_kick. xen_lock_spinning simply registers the cpu in its entry in lock_waiting, adds itself to the waiting_cpus set, and blocks on an event channel until the channel becomes pending. xen_unlock_kick searches the cpus in waiting_cpus looking for the one waiting on this lock with the next ticket, if any. If found, it kicks it by making its event channel pending, which wakes it up. We need to make sure interrupts are disabled while we're relying on the contents of the per-cpu lock_waiting values, otherwise an interrupt handler could come in, try to take some other lock, block, and overwrite our values. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-6-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> [ Raghavendra: use function + enum instead of macro, cmpxchg for zero status reset. Reintroduce break since we know the exact vCPU to send IPI, as suggested by Konrad. ] Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
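A stubbed, compilable skeleton of that register-then-block / search-then-kick shape. Everything Xen-specific here is a stub; only the names xen_lock_spinning, xen_unlock_kick, lock_waiting, and waiting_cpus come from the log, and the data layout is a guess for illustration:

    #include <stddef.h>
    #include <stdbool.h>

    #define NR_CPUS 4

    struct arch_spinlock;   /* opaque here */
    struct waiting { struct arch_spinlock *lock; unsigned want; };

    static struct waiting lock_waiting[NR_CPUS];  /* per-cpu slot */
    static bool waiting_cpus[NR_CPUS];            /* stand-in for a cpumask */
    static int  lock_kick_evtchn[NR_CPUS];

    /* Stubs for the primitives the sketch assumes. */
    static int  smp_processor_id(void)   { return 0; }
    static void xen_poll_irq(int evtchn) { (void)evtchn; /* block until pending */ }
    static void xen_kick_cpu(int cpu)    { (void)cpu; /* make channel pending */ }

    static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want)
    {
        int cpu = smp_processor_id();

        /* Advertise which lock/ticket we wait for, then block.
         * (Interrupts must be off while these values are live.) */
        lock_waiting[cpu].lock = lock;
        lock_waiting[cpu].want = want;
        waiting_cpus[cpu] = true;

        xen_poll_irq(lock_kick_evtchn[cpu]);  /* sleep until kicked */

        waiting_cpus[cpu] = false;
        lock_waiting[cpu].lock = NULL;
    }

    static void xen_unlock_kick(struct arch_spinlock *lock, unsigned next)
    {
        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
            if (waiting_cpus[cpu] &&
                lock_waiting[cpu].lock == lock &&
                lock_waiting[cpu].want == next) {
                xen_kick_cpu(cpu);  /* wake exactly the next ticket holder */
                break;              /* at most one cpu holds this ticket */
            }
        }
    }

    int main(void)
    {
        xen_lock_spinning(NULL, 0);  /* no-op with stubbed poll */
        xen_unlock_kick(NULL, 0);
        return 0;
    }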
2013-08-09  xen: Defer spinlock setup until boot CPU setup  (Jeremy Fitzhardinge; 1 file, -1/+1)
There's no need to do it at very early init, and doing it there makes it impossible to use the jump_label machinery. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-5-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, ticketlock: Collapse a layer of functions  (Jeremy Fitzhardinge; 1 file, -30/+5)
Now that the paravirtualization layer doesn't exist at the spinlock level any more, we can collapse the __ticket_ functions into the arch_ functions. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-4-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: Attilio Rao <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks  (Raghavendra K T; 1 file, -0/+1)
The code size expands somewhat, and it's better to just call a function rather than inline it. Thanks to Jeremy for the original version of the ARCH_NOINLINE_SPIN_UNLOCK config patch, which has been simplified. Suggested-by: Linus Torvalds <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-3-git-send-email-raghavendra.kt@linux.vnet.ibm.com Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, spinlock: Replace pv spinlocks with pv ticketlocks  (Jeremy Fitzhardinge; 6 files, -61/+65)
Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow path (long spin on lock, unlocking a contended lock). Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) To address this, we add two hooks (see the sketch after this entry):

 - __ticket_spin_lock, which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder.

 - __ticket_spin_unlock, which on releasing a contended lock (there are more cpus with tail tickets) looks to see if the next cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away.

Results:
=======
setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM; base = 3.11-rc; patched = base + pvspinlock V12

dbench (Throughput in MB/sec. Higher is better):

    +-----------------+------------------+--------+
    | base (stdev %)  | patched (stdev%) | %gain  |
    +-----------------+------------------+--------+
    | 15035.3 (0.3)   | 15150.0 (0.6)    |   0.8  |
    |  1470.0 (2.2)   |  1713.7 (1.9)    |  16.6  |
    |   848.6 (4.3)   |   967.8 (4.3)    |  14.0  |
    |   652.9 (3.5)   |   685.3 (3.7)    |   5.0  |
    +-----------------+------------------+--------+

pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases, and undercommit results are flat. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: Attilio Rao <[email protected]> [ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t ] Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
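A compilable toy model of the two-hook shape described above. The SPIN_THRESHOLD value, the pv_ function names, and the single-threaded main are all illustrative; this is not the kernel's actual code:

    #include <stdatomic.h>
    #include <stdint.h>

    #define SPIN_THRESHOLD  (1u << 15)  /* spins before the slow path */
    #define TICKET_LOCK_INC 2

    struct ticketlock { _Atomic uint16_t head, tail; };

    /* The two pvop hooks; with CONFIG_PARAVIRT_SPINLOCKS=n these
     * collapse to empty stubs and only the fast path remains. */
    static void pv_lock_spinning(struct ticketlock *lk, uint16_t t) { (void)lk; (void)t; }
    static void pv_unlock_kick(struct ticketlock *lk, uint16_t t)   { (void)lk; (void)t; }

    static void ticket_lock(struct ticketlock *lk)
    {
        uint16_t me = atomic_fetch_add(&lk->tail, TICKET_LOCK_INC);

        for (;;) {
            /* Fast path: spin a bounded number of times on our turn. */
            for (unsigned count = SPIN_THRESHOLD; count; count--)
                if (atomic_load(&lk->head) == me)
                    return;
            /* Holder presumably descheduled: block until kicked. */
            pv_lock_spinning(lk, me);
        }
    }

    static void ticket_unlock(struct ticketlock *lk)
    {
        uint16_t next = atomic_fetch_add(&lk->head, TICKET_LOCK_INC)
                        + TICKET_LOCK_INC;
        if (atomic_load(&lk->tail) != next)
            pv_unlock_kick(lk, next);  /* contended: wake next waiter */
    }

    int main(void)
    {
        struct ticketlock lk = { 0, 0 };
        ticket_lock(&lk);
        ticket_unlock(&lk);
        return 0;
    }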
2013-08-09  Merge branch 'master' into virtio-next  (Rusty Russell; 85 files, -1197/+399)
The next commit gets conflicts because it relies on patches which were cc:stable and thus had to be merged into Linus' tree before the coming merge window. So pull in master now. Signed-off-by: Rusty Russell <[email protected]>
2013-08-07  x86, relocs: Move ELF relocation handling to C  (Kees Cook; 8 files, -40/+97)
Moves the relocation handling into C, after decompression. This requires that the decompressed size is passed to the decompression routine as well so that relocations can be found. Only kernels that need relocation support will use the code (currently just x86_32), but this is laying the ground work for 64-bit using it in support of KASLR. Based on work by Neill Clift and Michael Davidson. Signed-off-by: Kees Cook <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Acked-by: Zhang Yanfei <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
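The core operation is simple once it is in C: walk a table of 32-bit offsets appended to the image and add the load displacement to each site. A hedged sketch; the function name and table layout are illustrative, not the exact boot-format details:

    #include <stdint.h>
    #include <stddef.h>

    /* Apply appended 32-bit relocations after decompression, assuming
     * a flat table of nr_relocs offsets into the image. */
    static void handle_relocations(uint8_t *image, size_t image_len,
                                   const uint32_t *reloc_table,
                                   size_t nr_relocs, uint32_t map_delta)
    {
        (void)image_len;
        for (size_t i = 0; i < nr_relocs; i++) {
            uint32_t *site = (uint32_t *)(image + reloc_table[i]);
            *site += map_delta;  /* fix up an absolute 32-bit reference */
        }
    }

    int main(void)
    {
        uint8_t  image[8]  = {0};
        uint32_t relocs[1] = {0};  /* one site, at offset 0 */
        handle_relocations(image, sizeof image, relocs, 1, 0x100000);
        return 0;
    }

Passing the decompressed size to the decompressor is what makes finding this appended table possible at all.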
2013-08-07  Merge branch 'pm-cpufreq-ondemand' into pm-cpufreq  (Rafael J. Wysocki; 1 file, -29/+0)
* pm-cpufreq:
  cpufreq: Remove unused function __cpufreq_driver_getavg()
  cpufreq: Remove unused APERF/MPERF support
  cpufreq: ondemand: Change the calculation of target frequency
2013-08-07  KVM: nVMX: Advertise IA32_PAT in VM exit control  (Arthur Chunqi Li; 1 file, -3/+5)
Advertise VM_EXIT_SAVE_IA32_PAT and VM_EXIT_LOAD_IA32_PAT. Signed-off-by: Arthur Chunqi Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  KVM: nVMX: Fix up VM_ENTRY_IA32E_MODE control feature reporting  (Jan Kiszka; 1 file, -1/+5)
Do not report that we can enter the guest in 64-bit mode if the host is 32-bit only. This is not supported by KVM. Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  KVM: nEPT: Advertise WB type EPTP  (Jan Kiszka; 1 file, -2/+2)
At least WB must be possible. Signed-off-by: Jan Kiszka <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nVMX: Keep arch.pat in sync on L1-L2 switches  (Jan Kiszka; 1 file, -3/+6)
When asking vmx to load the PAT MSR for us while switching from L1 to L2 or vice versa, we have to update arch.pat as well, as it may later be used again to load or read out the MSR content. Reviewed-by: Gleb Natapov <[email protected]> Tested-by: Arthur Chunqi Li <[email protected]> Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Miscellaneous cleanups  (Nadav Har'El; 1 file, -4/+2)
Some trivial code cleanups not really related to nested EPT. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Some additional comments  (Nadav Har'El; 1 file, -0/+13)
Some additional comments to preexisting code: Explain who (L0 or L1) handles EPT violation and misconfiguration exits. Don't mention "shadow on either EPT or shadow" as the only two options. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  Advertise the support of EPT to the L1 guest, through the appropriate MSR.  (Nadav Har'El; 1 file, -2/+18)
This is the last patch of the basic Nested EPT feature, so as to allow bisection through this patch series: The guest will not see EPT support until this last patch, and will not attempt to use the half-applied feature. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Nested INVEPT  (Nadav Har'El; 4 files, -0/+77)
If we let L1 use EPT, we should probably also support the INVEPT instruction. In our current nested EPT implementation, when L1 changes its EPT table for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in the course of this modification already calls INVEPT. But if the last level of the shadow page is unsync, not all of L1's changes to EPT12 are intercepted, which means roots need to be synced when L1 calls INVEPT. Global INVEPT should not be different, since roots are synced by kvm_mmu_load() each time EPTP02 changes. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: MMU context for nested EPT  (Nadav Har'El; 3 files, -1/+69)
KVM's existing shadow MMU code already supports nested TDP. To use it, we need to set up a new "MMU context" for nested EPT, and create a few callbacks for it (nested_ept_*()). This context should also use the EPT versions of the page table access functions (defined in the previous patch). Then, we need to switch back and forth between this nested context and the regular MMU context when switching between L1 and L2 (when L1 runs this L2 with EPT). Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Add nEPT violation/misconfiguration support  (Yang Zhang; 4 files, -14/+95)
Inject nEPT faults into the L1 guest. This patch originally comes from Xinhao. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: correctly check if remote tlb flush is needed for shadowed EPT tables  (Gleb Natapov; 1 file, -4/+4)
need_remote_flush() assumes that the shadow page is in PT64 format, but with the addition of nested EPT this is no longer always true. Fix it by using bit definitions that depend on the host shadow page type. Reported-by: Xiao Guangrong <[email protected]> Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Redefine EPT-specific link_shadow_page()  (Yang Zhang; 2 files, -5/+11)
Since nEPT doesn't support A/D bits, we should not set those bits when building the shadow page table. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Add EPT tables support to paging_tmpl.h  (Nadav Har'El; 2 files, -1/+41)
This is the first patch in a series which adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set its own cr3 and take its own page faults without either of L0 or L1 getting involved. This often significantly improves L2's performance over the previous two alternatives (shadow page tables over EPT, and shadow page tables over shadow page tables). This patch adds EPT support to paging_tmpl.h. paging_tmpl.h contains the code for reading and writing page tables. The code for 32-bit and 64-bit tables is very similar, but not identical, so paging_tmpl.h is #include'd twice in mmu.c, once with PTTYPE=32 and once with PTTYPE=64, and this generates the two sets of similar functions. There are subtle but important differences between the format of EPT tables and that of ordinary x86 64-bit page tables, so for nested EPT we need a third set of functions to read the guest EPT table and to write the shadow EPT table. So this patch adds a third PTTYPE, PTTYPE_EPT, which creates functions (prefixed with "EPT") which correctly read and write EPT tables. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
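The multi-include "template" trick is standard C preprocessing. In the kernel the template is a real file (paging_tmpl.h) included once per PTTYPE with an FNAME() macro pasting a prefix onto every function; a single-file macro stands in for the file here, and the dirty-bit check is a deliberately tiny example of a per-PTTYPE difference:

    #include <stdio.h>
    #include <stdint.h>

    /* Stand-in for #include "paging_tmpl.h" with a given PTTYPE:
     * each expansion generates one prefixed set of functions. */
    #define DEFINE_WALKER(PREFIX, DIRTY_MASK)                          \
        static int PREFIX##_is_dirty(uint64_t pte)                     \
        {                                                              \
            return (pte & (DIRTY_MASK)) != 0;                          \
        }

    DEFINE_WALKER(paging64, 1ull << 6)  /* x86 PTE dirty bit */
    DEFINE_WALKER(ept,      0)          /* nEPT model: no A/D bits */

    int main(void)
    {
        printf("%d %d\n",
               paging64_is_dirty(1ull << 6),  /* 1 */
               ept_is_dirty(~0ull));          /* 0: mask is zero */
        return 0;
    }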
2013-08-07  nEPT: Support shadow paging for guest paging without A/D bits  (Gleb Natapov; 1 file, -3/+13)
Some guest paging modes do not support A/D bits. Add support for such modes in shadow page code. For such modes PT_GUEST_DIRTY_MASK, PT_GUEST_ACCESSED_MASK, PT_GUEST_DIRTY_SHIFT and PT_GUEST_ACCESSED_SHIFT should be set to zero. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
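A short sketch of why zeroed masks are enough to support A/D-less modes: with the guest masks parameterized, the common A/D-update path collapses to a no-op when the masks are zero. Function and parameter names are illustrative, not the kernel's:

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t update_accessed_dirty(uint64_t gpte,
                                          uint64_t accessed_mask,
                                          uint64_t dirty_mask,
                                          int write)
    {
        if (!accessed_mask)      /* A/D-less mode: nothing to maintain */
            return gpte;
        gpte |= accessed_mask;
        if (write)
            gpte |= dirty_mask;
        return gpte;
    }

    int main(void)
    {
        /* Zero masks: the gpte comes back unchanged. */
        printf("%llx\n", (unsigned long long)
               update_accessed_dirty(0x1, 0, 0, 1));
        /* Normal x86 masks: A and D bits get set. */
        printf("%llx\n", (unsigned long long)
               update_accessed_dirty(0x1, 1ull << 5, 1ull << 6, 1));
        return 0;
    }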
2013-08-07  nEPT: make guest's A/D bits depend on guest's paging mode  (Gleb Natapov; 1 file, -8/+22)
This patch makes the guest A/D bit definitions depend on the paging mode, so that when EPT support is added they can be defined differently. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Move common code to paging_tmpl.h  (Nadav Har'El; 2 files, -67/+69)
For preparation, we just move gpte_access(), prefetch_invalid_gpte(), is_rsvd_bits_set(), protect_clean_gpte() and is_dirty_gpte() from mmu.c to paging_tmpl.h. Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Fix wrong test in kvm_set_cr3  (Nadav Har'El; 1 file, -11/+0)
kvm_set_cr3() attempts to check if the new cr3 is a valid guest physical address. The problem is that with nested EPT, cr3 is an *L2* physical address, not an L1 physical address as this test expects. As the comment above this test explains, it isn't necessary, and doesn't correspond to anything a real processor would do. So this patch removes it. Note that this wrong test could have also theoretically caused problems in nested NPT, not just in nested EPT. However, in practice, the problem was avoided: nested_svm_vmexit()/vmrun() do not call kvm_set_cr3 in the nested NPT case, and instead set the vmcb (and arch.cr3) directly, thus circumventing the problem. Additional potential calls to the buggy function are avoided in that we don't trap cr3 modifications when nested NPT is enabled. However, because in nested VMX we did want to use kvm_set_cr3() (as requested in Avi Kivity's review of the original nested VMX patches), we can't avoid this problem and need to fix it. Reviewed-by: Orit Wasserman <[email protected]> Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Fix cr3 handling in nested exit and entry  (Nadav Har'El; 1 file, -0/+26)
The existing code for handling cr3 and related VMCS fields during nested exit and entry wasn't correct in all cases: If L2 is allowed to control cr3 (and this is indeed the case in nested EPT), during nested exit we must copy the modified cr3 from vmcs02 to vmcs12, and we forgot to do so. This patch adds this copy. If L0 isn't controlling cr3 when running L2 (i.e., L0 is using EPT), and whoever does control cr3 (L1 or L2) is using PAE, the processor might have saved PDPTEs and we should also save them in vmcs12 (and restore later). Reviewed-by: Xiao Guangrong <[email protected]> Reviewed-by: Orit Wasserman <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Support LOAD_IA32_EFER entry/exit controls for L1  (Nadav Har'El; 1 file, -7/+16)
Recent KVM, since http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577, switches the EFER MSR when EPT is used and the host and guest have different NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2) and want to be able to run recent KVM as L1, we need to allow L1 to use this EFER switching feature. To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available, and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former (the latter is still unsupported). Nested entry and exit emulation (prepare_vmcs_02 and load_vmcs12_host_state, respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do in this patch is to properly advertise this feature to L1. Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using vmx_set_efer (which itself sets one of several vmcs02 fields), so we always support this feature, regardless of whether the host supports it. Reviewed-by: Orit Wasserman <[email protected]> Signed-off-by: Nadav Har'El <[email protected]> Signed-off-by: Jun Nakajima <[email protected]> Signed-off-by: Xinhao Xu <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  KVM: MMU: fix check the reserved bits on the gpte of L2  (Xiao Guangrong; 1 file, -2/+1)
Current code always uses arch.mmu to check the reserved bits on the guest gpte, which is valid only for the L1 guest; we should use arch.nested_mmu instead when we translate gva to gpa for the L2 guest. Fix it by using @mmu instead, since it is adapted to the current mmu mode automatically. The bug can be triggered when nested npt is used and the L1 guest and L2 guest use different mmu modes. Reported-by: Jan Kiszka <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  KVM: nVMX: correctly set tr base on nested vmexit emulation  (Gleb Natapov; 1 file, -1/+1)
After commit 21feb4eb64e21f8dc91136b91ee886b978ce6421 tr base is zeroed during vmexit. Set it to L1's HOST_TR_BASE. This should fix https://bugzilla.kernel.org/show_bug.cgi?id=60679 Reported-by: Yongjie Ren <[email protected]> Reviewed-by: Arthur Chunqi Li <[email protected]> Tested-by: Yongjie Ren <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-06  x86/jump-label: Show where and what was wrong on errors  (Steven Rostedt; 1 file, -3/+18)
When modifying text sections for jump labels, a paranoid check is performed. If the check fails, the system "bugs". But why it failed is not shown. The BUG_ON()s in the jump label update code are replaced with bug_at(ip). This is a function that shows which pointer failed, and what was at the location of the failure that made the jump label code panic. Signed-off-by: Steven Rostedt <[email protected]>
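A userspace rendering of the idea: instead of a bare assertion, dump the failing address and the bytes found there before dying, so the panic message says what the patching code actually saw. The message format and the 5-byte window are illustrative, not the kernel's exact output:

    #include <stdio.h>
    #include <stdlib.h>

    static void bug_at(const unsigned char *ip, int line)
    {
        /* Show the address, the bytes at the failure site, and the
         * source line that tripped, then crash. */
        fprintf(stderr,
                "Unexpected op at %p (%02x %02x %02x %02x %02x), line %d\n",
                (const void *)ip, ip[0], ip[1], ip[2], ip[3], ip[4], line);
        abort();
    }

    int main(void)
    {
        unsigned char insn[5] = { 0x90, 0x90, 0x90, 0x90, 0x90 }; /* NOPs */
        if (insn[0] != 0xe9)   /* expected a 5-byte jmp? show why not */
            bug_at(insn, __LINE__);
        return 0;
    }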