Age | Commit message | Author | Files | Lines
2011-03-17 | KVM: x86: release kvmclock page on reset | Glauber Costa | 1 | -8/+12
Currently, when a vcpu is reset, the kvmclock page keeps being written to. This is wrong and inconsistent: a cpu reset should take it to its initial state. Signed-off-by: Glauber Costa <[email protected]> CC: Jan Kiszka <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | mm: remove is_hwpoison_address | Huang Ying | 2 | -40/+0
Unused. Signed-off-by: Huang Ying <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: Replace is_hwpoison_address with __get_user_pages | Huang Ying | 1 | -1/+10
is_hwpoison_address only checks whether the page table entry is hwpoisoned, regardless of the memory page mapped, while __get_user_pages checks both. QEMU will clear the poisoned page table entry (via unmap/map) to make it possible to allocate a new memory page for the virtual address across guest reboots. But it is also possible that the underlying memory page stays poisoned even after the corresponding page table entry is cleared, that is, a new memory page cannot be allocated. __get_user_pages can catch these situations. Signed-off-by: Huang Ying <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | mm: make __get_user_pages return -EHWPOISON for HWPOISON page optionally | Huang Ying | 7 | -3/+21
Make __get_user_pages return -EHWPOISON for a HWPOISON page only if FOLL_HWPOISON is specified. With this patch, interested callers can distinguish HWPOISON pages from general FAULT pages, while other callers still get -EFAULT for all these pages, so the user space interface need not be changed. This feature is needed by KVM, where a UCR MCE should be relayed to the guest for a HWPOISON page, while instruction emulation and MMIO will be tried for a general FAULT page. The idea comes from Andrew Morton. Signed-off-by: Huang Ying <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
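The opt-in behavior described in this message can be illustrated with a small user-space sketch. It is not the kernel code: the flag value `SKETCH_FOLL_HWPOISON` and the function `sketch_fault_error` are hypothetical names introduced here, and the fallback definition of `EHWPOISON` assumes the common Linux value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit for this sketch; not the kernel's definition. */
#define SKETCH_FOLL_HWPOISON 0x100u

#ifndef EHWPOISON
#define EHWPOISON 133 /* assumed Linux value, used only if libc lacks it */
#endif

/* Callers can only tell the two cases apart if the codes differ. */
static_assert(EHWPOISON != EFAULT, "error codes must be distinguishable");

/*
 * Error returned when a page cannot be faulted in.
 * A poisoned page is reported as -EHWPOISON only to callers that
 * opted in with the flag; everyone else keeps seeing -EFAULT.
 */
int sketch_fault_error(bool page_is_hwpoisoned, unsigned int foll_flags)
{
    if (page_is_hwpoisoned && (foll_flags & SKETCH_FOLL_HWPOISON))
        return -EHWPOISON;
    return -EFAULT;
}
```

A caller in the position of KVM would pass the flag and branch on the result, while existing callers pass no flag and see no change in behavior.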
2011-03-17 | mm: export __get_user_pages | Huang Ying | 5 | -11/+60
In most cases, get_user_pages and get_user_pages_fast should be used to pin user pages in memory. But sometimes flags other than FOLL_GET, FOLL_WRITE and FOLL_FORCE are needed; for example, in the following patch, KVM needs FOLL_HWPOISON. To support these users, __get_user_pages is exported directly. There are some symbol name conflicts in the infiniband driver; fix them too. Signed-off-by: Huang Ying <[email protected]> CC: Andrew Morton <[email protected]> CC: Michel Lespinasse <[email protected]> CC: Roland Dreier <[email protected]> CC: Ralph Campbell <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: x86: handle guest access to BBL_CR_CTL3 MSR | john cooper | 2 | -0/+20
A correction to Intel cpu model CPUID data (patch queued) caused winxp to BSOD when booted with a Penryn model. This was traced to the CPUID "model" field correction from 6 -> 23 (as is proper for a Penryn class of cpu). Only in this case does the problem surface. The cause for this failure is winxp accessing the BBL_CR_CTL3 MSR which is unsupported by current kvm, appears to be a legacy MSR not fully characterized yet existing in current silicon, and is apparently carried forward in MSR space to accommodate vintage code as here. It is not yet conclusive whether this MSR implements any of its legacy functionality or is just an ornamental dud for compatibility. While I found no silicon version specific documentation link to this MSR, a general description exists in Intel's developer's reference which agrees with the functional behavior of other bootloader/kernel code I've examined accessing BBL_CR_CTL3. Regrettably winxp appears to be setting bit #19 called out as "reserved" in the above document. So to minimally accommodate this MSR, kvm msr get will provide the equivalent mock data and kvm msr write will simply toss the guest passed data without interpretation. While this treatment of BBL_CR_CTL3 addresses the immediate problem, the approach may be modified pending clarification from Intel. Signed-off-by: john cooper <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
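The "mock data on read, discard on write" treatment described above might look roughly like the following stand-alone sketch. The function names, the MSR index constant and the mock value are assumptions made for illustration; in particular, the mock value is a placeholder and not necessarily what the patch returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed MSR index and placeholder mock value for this sketch. */
#define SKETCH_MSR_BBL_CR_CTL3  0x11eu
#define SKETCH_BBL_CR_CTL3_MOCK 0x12345678ull

/* Read path: returns 0 if handled, 1 if the MSR is unknown to the sketch. */
int sketch_get_msr(uint32_t index, uint64_t *data)
{
    assert(data != NULL);
    switch (index) {
    case SKETCH_MSR_BBL_CR_CTL3:
        *data = SKETCH_BBL_CR_CTL3_MOCK; /* fixed mock contents */
        return 0;
    default:
        return 1;
    }
}

/* Write path: accepts the write but throws the guest value away. */
int sketch_set_msr(uint32_t index, uint64_t data)
{
    switch (index) {
    case SKETCH_MSR_BBL_CR_CTL3:
        (void)data; /* discarded without interpretation */
        return 0;
    default:
        return 1;
    }
}
```

Because the write is ignored, a guest that sets a reserved bit (such as bit 19 in the winxp case) reads back the same mock value afterwards.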
2011-03-17 | KVM: make make_all_cpus_request() lockless | Xiao Guangrong | 2 | -12/+3
Now that we have 'vcpu->mode' to judge whether an IPI needs to be sent to other cpus, and this check is exact, checking the request bit is unnecessary, so the spinlock that went with it can be dropped as well. Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2011-03-17 | KVM: Add "exiting guest mode" state | Xiao Guangrong | 4 | -12/+35
Currently we keep track of only two states: guest mode and host mode. This patch adds an "exiting guest mode" state that tells us that an IPI will happen soon, so unless we need to wait for the IPI, we can avoid it completely.

Also:
1. There is no need to read/write ->mode atomically in the vcpu's own thread.
2. Reorganize struct kvm_vcpu so that ->mode and ->requests are explicitly in the same cache line.

Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
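A minimal user-space sketch of the three-state idea, using C11 atomics, is given below. It only models the case where the sender does not need to wait for the IPI; the type and function names (`sketch_vcpu`, `sketch_kick_needs_ipi`) are hypothetical and the kernel's own helpers and memory-ordering details are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_mode {
    SKETCH_OUTSIDE_GUEST,
    SKETCH_IN_GUEST,
    SKETCH_EXITING_GUEST,
};

struct sketch_vcpu {
    _Atomic int mode;
};

/*
 * Returns true if the caller has to send an IPI to kick the vcpu.
 * Only the IN_GUEST -> EXITING_GUEST transition needs one: if the vcpu
 * is already exiting, an IPI is on its way, and if it is outside guest
 * mode there is nothing to interrupt.
 */
bool sketch_kick_needs_ipi(struct sketch_vcpu *vcpu)
{
    int expected = SKETCH_IN_GUEST;

    assert(vcpu != NULL);
    return atomic_compare_exchange_strong(&vcpu->mode, &expected,
                                          SKETCH_EXITING_GUEST);
}
```

Two back-to-back kicks on a vcpu that is in guest mode therefore result in a single IPI: the first moves it to the exiting state, the second sees that state and does nothing.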
2011-03-17 | KVM: fix build warning within __kvm_set_memory_region() on s390 | Heiko Carstens | 1 | -0/+2
Get rid of this warning:

    CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
    arch/s390/kvm/../../../virt/kvm/kvm_main.c:596:12: warning: 'kvm_create_dirty_bitmap' defined but not used

The only caller of the function is within a !CONFIG_S390 section, so add the same ifdef around kvm_create_dirty_bitmap() as well. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: x86: Remove user space triggerable MCE error message | Jan Kiszka | 1 | -3/+0
This case is a pure user space error we do not need to record. Moreover, it can be misused to flood the kernel log. Remove it. Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: fix rcu usage warning in kvm_arch_vcpu_ioctl_set_sregs() | Xiao Guangrong | 1 | -1/+4
Fix:

    [ 1001.499596] ===================================================
    [ 1001.499599] [ INFO: suspicious rcu_dereference_check() usage. ]
    [ 1001.499601] ---------------------------------------------------
    [ 1001.499604] include/linux/kvm_host.h:301 invoked rcu_dereference_check() without protection!
    ......
    [ 1001.499636] Pid: 6035, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #62
    [ 1001.499638] Call Trace:
    [ 1001.499644] [] lockdep_rcu_dereference+0x9d/0xa5
    [ 1001.499653] [] gfn_to_memslot+0x8d/0xc8 [kvm]
    [ 1001.499661] [] gfn_to_hva+0x16/0x3f [kvm]
    [ 1001.499669] [] kvm_read_guest_page+0x1e/0x5e [kvm]
    [ 1001.499681] [] kvm_read_guest_page_mmu+0x53/0x5e [kvm]
    [ 1001.499699] [] load_pdptrs+0x3f/0x9c [kvm]
    [ 1001.499705] [] ? vmx_set_cr0+0x507/0x517 [kvm_intel]
    [ 1001.499717] [] kvm_arch_vcpu_ioctl_set_sregs+0x1f3/0x3c0 [kvm]
    [ 1001.499727] [] kvm_vcpu_ioctl+0x6a5/0xbc5 [kvm]

Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: VMX: Avoid atomic operation in vmx_vcpu_run | Avi Kivity | 1 | -2/+5
Instead of exchanging the guest and host rcx, have separate storage for each. This allows us to avoid using the xchg instruction, which is a little slower than normal operations. Signed-off-by: Avi Kivity <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: VMX: Simplify saving guest rcx in vmx_vcpu_run | Avi Kivity | 1 | -2/+2
Change

    push top-of-stack
    pop guest-rcx
    pop dummy

to

    pop guest-rcx

which is the same thing, only simpler. Signed-off-by: Avi Kivity <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: VMX: increase ple_gap default to 128 | Rik van Riel | 1 | -2/+2
On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger PLE exits, even with the minimalistic PLE test from kvm-unit-tests.

http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545

For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in order to get pause loop exits:

    # modprobe kvm_intel ple_gap=47
    # taskset 1 /usr/local/bin/qemu-system-x86_64 \
        -device testdev,chardev=log -chardev stdio,id=log \
        -kernel x86/vmexit.flat -append ple-round-robin -smp 2
    VNC server running on `::1:5900'
    enabling apic
    enabling apic
    ple-round-robin 58298446
    # rmmod kvm_intel
    # modprobe kvm_intel ple_gap=48
    # taskset 1 /usr/local/bin/qemu-system-x86_64 \
        -device testdev,chardev=log -chardev stdio,id=log \
        -kernel x86/vmexit.flat -append ple-round-robin -smp 2
    VNC server running on `::1:5900'
    enabling apic
    enabling apic
    ple-round-robin 36616

Increase the ple_gap to 128 to be on the safe side. Signed-off-by: Rik van Riel <[email protected]> Acked-by: Zhai, Edwin <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2011-03-17 | KVM: SVM: Add support for perf-kvm | Joerg Roedel | 1 | -2/+10
This patch adds the necessary code to run perf-kvm on AMD machines. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2011-03-17 | KVM: VMX: Avoid leaking fake realmode state to userspace | Avi Kivity | 1 | -7/+36
When emulating real mode, we fake some state:
- tr.base points to a fake vm86 tss
- segment registers are made to conform to vm86 restrictions

Change vmx_get_segment() not to expose this fake state to userspace; instead, return the original state. Signed-off-by: Avi Kivity <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: VMX: Save and restore tr selector across mode switches | Avi Kivity | 1 | -0/+2
When emulating real mode we play with tr hidden state, but leave tr.selector alone. That works well, except for save/restore, since loading TR writes it to the hidden state in vmx->rmode. Fix by also saving and restoring the tr selector; this makes things more consistent and allows migration to work during the early boot stages of Windows XP. Signed-off-by: Avi Kivity <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: PPC: Fix SPRG get/set for Book3S and BookE | Peter Tyser | 2 | -12/+16
Previously SPRGs 4-7 were improperly read and written in kvm_arch_vcpu_ioctl_get_regs() and kvm_arch_vcpu_ioctl_set_regs(). Signed-off-by: Alexander Graf <[email protected]> Signed-off-by: Peter Tyser <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM guest: Fix section mismatch derived from kvm_guest_cpu_online() | Sedat Dilek | 1 | -1/+1
WARNING: arch/x86/built-in.o(.text+0x1bb74): Section mismatch in reference from the function kvm_guest_cpu_online() to the function .cpuinit.text:kvm_guest_cpu_init() The function kvm_guest_cpu_online() references the function __cpuinit kvm_guest_cpu_init(). This is often because kvm_guest_cpu_online lacks a __cpuinit annotation or the annotation of kvm_guest_cpu_init is wrong. This patch fixes the warning. Tested with linux-next (next-20101231) Signed-off-by: Sedat Dilek <[email protected]> Acked-by: Rik van Riel <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | KVM: MMU: Don't flush shadow when enabling dirty tracking | Avi Kivity | 2 | -10/+9
Instead, drop large mappings, which were the reason we dropped shadow. Signed-off-by: Avi Kivity <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2011-03-17 | Merge branch 'for_next' into for_linus | Jan Kara | 6 | -13/+23
2011-03-17 | genirq: Fix incorrect unlock in __setup_irq() | Dan Carpenter | 1 | -1/+1
goto out_thread is called before we take the lock. It causes a gcc warning: "kernel/irq/manage.c:858: warning: ‘flags’ may be used uninitialized in this function" [ tglx: Moved unlock before free_cpumask_var() ] Signed-off-by: Dan Carpenter <[email protected]> LKML-Reference: <20110317114307.GJ2008@bicker> Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-17 | cris: Use generic show_interrupts() | Thomas Gleixner | 2 | -39/+1
Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jesper Nilsson <[email protected]>
2011-03-17 | genirq: show_interrupts: Check desc->name before printing it blindly | Thomas Gleixner | 1 | -1/+2
desc->name is not required and not used by all architectures. Signed-off-by: Thomas Gleixner <[email protected]>
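The guard this message describes can be shown with a small user-space sketch. The struct and function names (`sketch_irq_desc`, `sketch_format_irq`) and the output format are made up for illustration and do not mirror the kernel's show_interrupts() output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down descriptor: name may legitimately be NULL. */
struct sketch_irq_desc {
    const char *chip_name;
    const char *name;
};

/*
 * Formats "<chip>" or "<chip>-<name>" into buf and returns the number
 * of characters written. The optional name is only printed when set,
 * so a NULL pointer is never handed to the %s conversion.
 */
int sketch_format_irq(char *buf, size_t len, const struct sketch_irq_desc *desc)
{
    int n;

    assert(buf != NULL && desc != NULL && len > 0);
    n = snprintf(buf, len, "%s", desc->chip_name);
    if (desc->name && n >= 0 && (size_t)n < len)
        n += snprintf(buf + n, len - (size_t)n, "-%s", desc->name);
    return n;
}
```

On an architecture that never fills in the name, the line simply ends after the chip name instead of dereferencing a NULL pointer.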
2011-03-17 | cris: Use accessor functions to set IRQ_PER_CPU flag | Thomas Gleixner | 1 | -3/+3
Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-17 | cris: Fix irq conversion fallout | Thomas Gleixner | 2 | -2/+2
    arch/cris/arch-v10/kernel/irq.c: In function 'init_IRQ':
    arch/cris/arch-v10/kernel/irq.c:202:3: error: implicit declaration of function 'set_irq_desc_and_handler'

Should have been set_irq_chip_and_handler(). Fix it and convert to the new function names while at it. Reported-by: Peter Zijlstra <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-17 | amd64_edac: Fix decode_syndrome types | Borislav Petkov | 1 | -4/+4
Those should all be unsigned. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Fix DCT argument type | Borislav Petkov | 1 | -5/+4
Fix the argument order of amd64_debug_display_dimm_sizes() per convention (pvt is always first). Also, the argument that is now second denotes the DCT, so adjust its type. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Fix ranges signedness | Borislav Petkov | 1 | -4/+5
The dram ranges make sense only as an unsigned type. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Drop local variable | Borislav Petkov | 1 | -3/+2
Use the macro directly instead. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Fix PCI config addressing types | Borislav Petkov | 1 | -5/+5
Adjust argument types to the PCI config API's types. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Fix DRAM base macros | Borislav Petkov | 2 | -6/+5
Return unsigned u8 values only. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Fix node id signedness | Borislav Petkov | 2 | -9/+11
A node id can never be negative since we use it as an index into the DRAM ranges array. This also makes one of the BUG_ON conditions redundant. Signed-off-by: Borislav Petkov <[email protected]>
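Why an unsigned index makes a lower-bound check redundant can be seen in a short sketch. The array size `SKETCH_DRAM_RANGES`, the struct and the lookup function are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DRAM_RANGES 8u /* assumed array size for this sketch */

struct sketch_range {
    uint64_t base;
    uint64_t limit;
};

/*
 * With a signed id, both "nid < 0" and "nid >= SKETCH_DRAM_RANGES"
 * would need checking. With an unsigned id, a value that was negative
 * before conversion wraps to a huge number and is caught by the single
 * upper-bound test.
 */
const struct sketch_range *sketch_lookup(const struct sketch_range *ranges,
                                         unsigned int nid)
{
    assert(ranges != NULL);
    if (nid >= SKETCH_DRAM_RANGES)
        return NULL;
    return &ranges[nid];
}
```

This is the sense in which one of two bound checks can become redundant once the type is unsigned.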
2011-03-17 | amd64_edac: Drop redundant declarations | Borislav Petkov | 1 | -8/+0
Those were moved to the mce_amd.h header. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Enable driver on F15h | Borislav Petkov | 3 | -15/+29
Add the PCI device ids required for driver registration. Remove pvt->ctl_name and use the family descriptor directly, instead. Then, bump driver version and fixup its format. Finally, enable DRAM ECC decoding on F15h. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Adjust ECC symbol size to F15h | Borislav Petkov | 2 | -18/+16
F15h has the same ECC symbol size options as F10h revD and later so adjust checks to that. Simplify code a bit. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Simplify scrubrate setting | Borislav Petkov | 2 | -13/+5
Drop per-instance variable and compute min scrubrate dynamically. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | PCI: Rename CPU PCI id define | Borislav Petkov | 2 | -2/+2
With increasing number of PCI function ids, add the PCI function id in the define name instead of its symbolic name in the BKDG for more clarity. Acked-by: Ingo Molnar <[email protected]> Acked-by: Jesse Barnes <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Improve DRAM address mapping | Borislav Petkov | 2 | -70/+83
Drop static tables which map the bits in F2x80 to a chip select size in favor of functions doing the mapping with some bit fiddling. Also, add F15 support. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Sanitize ->read_dram_ctl_register | Borislav Petkov | 2 | -9/+7
This function is relevant for F10h and higher, and it has only one callsite so drop its function pointer from the low_ops struct. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Adjust sys_addr to chip select conversion routine to F15h | Borislav Petkov | 1 | -14/+15
F15h sys_addr to chip select mapping is almost identical to F10h's so reuse that. Rename functions on that path accordingly. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Beef up early exit reporting | Borislav Petkov | 1 | -1/+12
Add paranoid checks for the sys address before going off and decoding it. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Revamp online spare handling | Borislav Petkov | 2 | -21/+15
Replace per-DCT macros with smarter ones, drop hack and look for the spare rank on all chip selects on a channel. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Fix channel interleave removal | Borislav Petkov | 1 | -9/+17
Remove the channel interleave select bit properly. See F2x110[DctSelIntLvAddr] for details. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Correct node interleaving removal | Borislav Petkov | 1 | -4/+4
When node interleaving is enabled, a subset of the addr[14:12] bits has to be removed in order to get the normalized DCT address of the DRAM channel. The actual number of bits to remove is determined by F1x[1, 0][7C:40][IntlvEn]. Do this correctly. Signed-off-by: Borislav Petkov <[email protected]>
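The bit removal described here can be sketched as follows. This is an illustration under two stated assumptions, not the driver's code: that the number of interleave bits equals the number of set bits in IntlvEn (for example 0x1, 0x3, 0x7 for one, two, three bits), and that the removed bits start at addr[12]. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits in an 8-bit value (stand-in for a hweight8-style helper). */
unsigned int sketch_popcount8(uint8_t v)
{
    unsigned int n = 0;

    for (; v; v &= (uint8_t)(v - 1)) /* clear the lowest set bit each round */
        n++;
    return n;
}

/*
 * Drop the node interleave bits that sit directly above addr[11:0]:
 * keep the low 12 bits, shift everything above the interleave bits
 * down so it lands at bit 12 again.
 */
uint64_t sketch_remove_node_intlv(uint64_t addr, uint8_t intlv_en)
{
    unsigned int bits = sketch_popcount8(intlv_en);

    assert(bits <= 3); /* only addr[14:12] can be interleave bits here */
    return ((addr >> (12 + bits)) << 12) | (addr & 0xfff);
}
```

With IntlvEn equal to 0x3, for instance, address 0x7abc loses bits 13:12 and becomes 0x1abc, while an IntlvEn of 0 leaves the address unchanged.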
2011-03-17 | amd64_edac: Add support for interleaved region swapping | Borislav Petkov | 2 | -0/+40
On revC3 and revE Fam10h machines and later, non-interleaved graphics framebuffer memory under the 16G mark can be swapped with a region located at the bottom of memory so that the GPU can use the interleaved region and thus two channels. Add support for that. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Unify get_error_address | Borislav Petkov | 2 | -15/+13
The address bits from MC4_STATUS differ only between K8 and the rest so no need for a per-family method. No functional change. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Simplify decoding path | Borislav Petkov | 3 | -65/+35
Use the struct mce directly instead of copying from it into a custom struct err_regs. No functionality change. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Adjust channel counting to F15h | Borislav Petkov | 1 | -7/+6
The only difference is that F10h used to sport ganged DCTs and F15h doesn't, so adjust the F10h routine and reuse it. Signed-off-by: Borislav Petkov <[email protected]>
2011-03-17 | amd64_edac: Cleanup old defines cruft | Borislav Petkov | 2 | -76/+22
Remove unused defines, drop family names from define names. Signed-off-by: Borislav Petkov <[email protected]>