aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/paravirt.c
AgeCommit message (Collapse)AuthorFilesLines
2015-11-25x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_deferJuergen Gross1-3/+0
pte_update_defer can be removed as it is always set to the same function as pte_update. So any usage of pte_update_defer() can be replaced by pte_update(). pmd_update and pmd_update_defer are always set to paravirt_nop, so they can just be nuked. Signed-off-by: Juergen Gross <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-11-23x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV opBoris Ostrovsky1-5/+0
As result of commit "x86/xen: Avoid fast syscall path for Xen PV guests", usergs_sysret32 pv op is not called by Xen PV guests anymore and since they were the only ones who used it we can safely remove it. Signed-off-by: Boris Ostrovsky <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-11-23x86/paravirt: Remove the unused irq_enable_sysexit pv opBoris Ostrovsky1-7/+0
As result of commit "x86/xen: Avoid fast syscall path for Xen PV guests", the irq_enable_sysexit pv op is not called by Xen PV guests anymore and since they were the only ones who used it we can safely remove it. Signed-off-by: Boris Ostrovsky <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-11-19x86/paravirt: Remove unused pv_apic_ops structureJuergen Gross1-8/+0
The only member of that structure is startup_ipi_hook which is always set to paravirt_nop. Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: David Vrabel <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-11-07x86/paravirt: Kill some unused patching functionsBorislav Petkov1-12/+1
paravirt_patch_ignore() is completely unused and paravirt_patch_nop() doesn't do a whole lot. Remove them both. Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Chris Wright <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Rusty Russell <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-09-22x86/paravirt: Replace the paravirt nop with a bona fide empty functionAndy Lutomirski1-4/+12
PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an example, trimmed for readability): ff 15 00 00 00 00 callq *0x0(%rip) # 2796 <nmi+0x6> 2792: R_X86_64_PC32 pv_irq_ops+0x2c That's a call through a function pointer to regular C function that does nothing on native boots, but that function isn't protected against kprobes, isn't marked notrace, and is certainly not guaranteed to preserve any registers if the compiler is feeling perverse. This is bad news for a CLBR_NONE operation. Of course, if everything works correctly, once paravirt ops are patched, it gets nopped out, but what if we hit this code before paravirt ops are patched in? This can potentially cause breakage that is very difficult to debug. A more subtle failure is possible here, too: if _paravirt_nop uses the stack at all (even just to push RBP), it will overwrite the "NMI executing" variable if it's called in the NMI prologue. The Xen case, perhaps surprisingly, is fine, because it's already written in asm. Fix all of the cases that default to paravirt_nop (including adjust_exception_frame) with a big hammer: replace paravirt_nop with an asm function that is just a ret instruction. The Xen case may have other problems, so document them. This is part of a fix for some random crashes that Sasha saw. Reported-and-tested-by: Sasha Levin <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org Signed-off-by: Thomas Gleixner <[email protected]>
2015-07-06x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooksAndy Lutomirski1-2/+0
We've had ->read_tsc() and ->read_tscp() paravirt hooks since the very beginning of paravirt, i.e., d3561b7fa0fb ("[PATCH] paravirt: header and stubs for paravirtualisation"). AFAICT, the only paravirt guest implementation that ever replaced these calls was vmware, and it's gone. Arguably even vmware shouldn't have hooked RDTSC -- we fully support systems that don't have a TSC at all, so there's no point for a paravirt implementation to pretend that we have a TSC but to replace it. I also doubt that these hooks actually worked. Calls to rdtscl() and rdtscll(), which respected the hooks, were used seemingly interchangeably with native_read_tsc(), which did not. Just remove them. If anyone ever needs them again, they can try to make a case for why they need them. Before, on a paravirt config: text data bss dec hex filename 12618257 1816384 1093632 15528273 ecf151 vmlinux After: text data bss dec hex filename 12617207 1816384 1093632 15527223 eced37 vmlinux Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Huang Rui <[email protected]> Cc: John Stultz <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: kvm ML <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/d08a2600fb298af163681e5efd8e599d889a5b97.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-05-08Merge branch 'linus' into x86/asm, before applying dependent patchIngo Molnar1-3/+3
Signed-off-by: Ingo Molnar <[email protected]>
2015-04-22x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvopAndy Lutomirski1-1/+3
We don't use irq_enable_sysexit on 64-bit kernels any more. Remove all the paravirt and Xen machinery to support it on 64-bit kernels. Tested-by: Boris Ostrovsky <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-04-14x86: expose number of page table levels on Kconfig levelKirill A. Shutemov1-3/+3
We would want to use number of page table level to define mm_struct. Let's expose it as CONFIG_PGTABLE_LEVELS. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-24kprobes, x86: Prohibit probing on native_set_debugreg()/load_idt()Masami Hiramatsu1-1/+3
Since the kprobes uses do_debug for single stepping, functions called from do_debug() before notify_die() must not be probed. And also native_load_idt() is called from paranoid_exit when returning int3, this also must not be probed. Signed-off-by: Masami Hiramatsu <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Cc: Alok Kataria <[email protected]> Cc: Chris Wright <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Rusty Russell <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-04-24kprobes: Introduce NOKPROBE_SYMBOL() macro to maintain kprobes blacklistMasami Hiramatsu1-0/+4
Introduce NOKPROBE_SYMBOL() macro which builds a kprobes blacklist at kernel build time. The usage of this macro is similar to EXPORT_SYMBOL(), placed after the function definition: NOKPROBE_SYMBOL(function); Since this macro will inhibit inlining of static/inline functions, this patch also introduces a nokprobe_inline macro for static/inline functions. In this case, we must use NOKPROBE_SYMBOL() for the inline function caller. When CONFIG_KPROBES=y, the macro stores the given function address in the "_kprobe_blacklist" section. Since the data structures are not fully initialized by the macro (because there is no "size" information), those are re-initialized at boot time by using kallsyms. Signed-off-by: Masami Hiramatsu <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: Alok Kataria <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christopher Li <[email protected]> Cc: Chris Wright <[email protected]> Cc: David S. Miller <[email protected]> Cc: Jan-Simon Möller <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Rusty Russell <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-09-04x86, paravirt: Remove duplicate definition for DEF_NATIVEH. Peter Anvin1-5/+0
DEF_NATIVE() is defined in paravirt_types.h, remove duplicate definition in paravirt.c Reported-by: Linus Torvalds <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Link: http://lkml.kernel.org/r/CA%2B55aFxVv==DC0JdS87V%2BcPr-twN%[email protected]
2013-08-06x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt opsAndi Kleen1-2/+2
Cc: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-04-30Merge branch 'x86-paravirt-for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt update from Ingo Molnar: "Various paravirtualization related changes - the biggest one makes guest support optional via CONFIG_HYPERVISOR_GUEST" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, wakeup, sleep: Use pvops functions for changing GDT entries x86, xen, gdt: Remove the pvops variant of store_gdt. x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed. x86: Make Linux guest support optional x86, Kconfig: Move PARAVIRT_DEBUG into the paravirt menu
2013-04-11x86, xen, gdt: Remove the pvops variant of store_gdt.Konrad Rzeszutek Wilk1-1/+0
The two use-cases where we needed to store the GDT were during ACPI S3 suspend and resume. As the patches: x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed. have demonstrated - there are other mechanism by which the GDT is saved and reloaded during early resume path. Hence we do not need to worry about the pvops call-chain for saving the GDT and can and can eliminate it. The other areas where the store_gdt is used are never going to be hit when running under the pvops platforms. Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-04-10x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metalBoris Ostrovsky1-12/+13
Invoking arch_flush_lazy_mmu_mode() results in calls to preempt_enable()/disable() which may have performance impact. Since lazy MMU is not used on bare metal we can patch away arch_flush_lazy_mmu_mode() so that it is never called in such environment. [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU updates" may cause a minor performance regression on bare metal. This patch resolves that performance regression. It is somewhat unclear to me if this is a good -stable candidate. ] Signed-off-by: Boris Ostrovsky <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Tested-by: Josh Boyer <[email protected]> Tested-by: Konrad Rzeszutek Wilk <[email protected]> Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: <[email protected]> SEE NOTE ABOVE
2012-06-07x86, pvops: Remove hooks for {rd,wr}msr_safe_regsAndre Przywara1-2/+0
There were paravirt_ops hooks for the full register set variant of {rd,wr}msr_safe which are actually not used by anyone anymore. Remove them to make the code cleaner and avoid silent breakages when the pvops members were uninitialized. This has been boot-tested natively and under Xen with PVOPS enabled and disabled on one machine. Signed-off-by: Andre Przywara <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Acked-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2012-05-14x86: replace percpu_xxx funcs with this_cpu_xxxAlex Shi1-6/+6
Since percpu_xxx() serial functions are duplicated with this_cpu_xxx(). Removing percpu_xxx() definition and replacing them by this_cpu_xxx() in code. There is no function change in this patch, just preparation for later percpu_xxx serial function removing. On x86 machine the this_cpu_xxx() serial functions are same as __this_cpu_xxx() without no unnecessary premmpt enable/disable. Thanks for Stephen Rothwell, he found and fixed a i386 build error in the patch. Also thanks for Andrew Morton, he kept updating the patchset in Linus' tree. Signed-off-by: Alex Shi <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2012-03-28Merge tag 'split-asm_system_h-for-linus-20120328' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...
2012-03-28Disintegrate asm/system.h for X86David Howells1-0/+1
Disintegrate asm/system.h for X86. Signed-off-by: David Howells <[email protected]> Acked-by: H. Peter Anvin <[email protected]> cc: [email protected]
2012-03-24Merge tag 'bug-for-3.4' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/bug.h> cleanup from Paul Gortmaker: "The changes shown here are to unify linux's BUG support under the one <linux/bug.h> file. Due to historical reasons, we have some BUG code in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h, but old code in kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h was including <asm/bug.h> to pseudo link them. This has caused confusion[1] and general yuck/WTF[2] reactions. Here is an example that violates the principle of least surprise: CC lib/string.o lib/string.c: In function 'strlcat': lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON' make[2]: *** [lib/string.o] Error 1 $ $ grep linux/bug.h lib/string.c #include <linux/bug.h> $ We've included <linux/bug.h> for the BUG infrastructure and yet we still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel development. With the above in mind, the goals of this changeset are: 1) find and fix any include/*.h files that were relying on the implicit presence of BUG code. 2) find and fix any C files that were consuming kernel.h and hence relying on implicitly getting some/all BUG code. 3) Move the BUG related code living in kernel.h to <linux/bug.h> 4) remove the asm/bug.h from kernel.h to finally break the chain. During development, the order was more like 3-4, build-test, 1-2. But to ensure that git history for bisect doesn't get needless build failures introduced, the commits have been reorderd to fix the problem areas in advance. [1] https://lkml.org/lkml/2012/1/3/90 [2] https://lkml.org/lkml/2012/1/17/414" Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul and linux-next. * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: kernel.h: doesn't explicitly use bug.h, so don't include it. bug: consolidate BUILD_BUG_ON with other bug code BUG: headers with BUG/BUG_ON etc. need linux/bug.h bug.h: add include of it to various implicit C users lib: fix implicit users of kernel.h for TAINT_WARN spinlock: macroize assert_spin_locked to avoid bug.h dependency x86: relocate get/set debugreg fcns to include/asm/debugreg.
2012-02-29bug.h: add include of it to various implicit C usersPaul Gortmaker1-0/+1
With bug.h currently living right in linux/kernel.h there are files that use BUG_ON and friends but are not including the header explicitly. Fix them up so we can remove the presence in kernel.h file. Signed-off-by: Paul Gortmaker <[email protected]>
2012-02-24static keys: Introduce 'struct static_key', static_key_true()/false() and ↵Ingo Molnar1-2/+2
static_key_slow_[inc|dec]() So here's a boot tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels. Typical usage scenarios: #include <linux/static_key.h> struct static_key key = STATIC_KEY_INIT_TRUE; if (static_key_false(&key)) do unlikely code else do likely code Or: if (static_key_true(&key)) do likely code else do unlikely code The static key is modified via: static_key_slow_inc(&key); ... static_key_slow_dec(&key); The 'slow' prefix makes it abundantly clear that this is an expensive operation. I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit. On non-jump-label enabled architectures static keys default to likely()/unlikely() branches. Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Jason Baron <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-08-12Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip: x86-64: Rework vsyscall emulation and add vsyscall= parameter x86-64: Wire up getcpu syscall x86: Remove unnecessary compile flag tweaks for vsyscall code x86-64: Add vsyscall:emulate_vsyscall trace event x86-64: Add user_64bit_mode paravirt op x86-64, xen: Enable the vvar mapping x86-64: Work around gold bug 13023 x86-64: Move the "user" vsyscall segment out of the data segment. x86-64: Pad vDSO to a page boundary
2011-08-04x86-64: Add user_64bit_mode paravirt opAndy Lutomirski1-0/+4
Three places in the kernel assume that the only long mode CPL 3 selector is __USER_CS. This is not true on Xen -- Xen's sysretq changes cs to the magic value 0xe033. Two of the places are corner cases, but as of "x86-64: Improve vsyscall emulation CS and RIP handling" (c9712944b2a12373cb6ff8059afcfb7e826a6c54), vsyscalls will segfault if called with Xen's extra CS selector. This causes a panic when older init builds die. It seems impossible to make Xen use __USER_CS reliably without taking a performance hit on every system call, so this fixes the tests instead with a new paravirt op. It's a little ugly because ptrace.h can't include paravirt.h. Signed-off-by: Andy Lutomirski <[email protected]> Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu Reported-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2011-07-14KVM guest: Add a pv_ops stub for steal timeGlauber Costa1-0/+9
This patch adds a function pointer in one of the many paravirt_ops structs, to allow guests to register a steal time function. Besides a steal time function, we also declare two jump_labels. They will be used to allow the steal time code to be easily bypassed when not in use. Signed-off-by: Glauber Costa <[email protected]> Acked-by: Rik van Riel <[email protected]> Tested-by: Eric B Munson <[email protected]> CC: Jeremy Fitzhardinge <[email protected]> CC: Peter Zijlstra <[email protected]> CC: Anthony Liguori <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2011-01-13thp: add pmd paravirt opsAndrea Arcangeli1-0/+3
Paravirt ops pmd_update/pmd_update_defer/pmd_set_at. Not all might be necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs pmd_update_defer), but this is to keep full simmetry with pte paravirt ops, which looks cleaner and simpler from a common code POV. Signed-off-by: Andrea Arcangeli <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-08-23x86, paravirt: Remove alloc_pmd_clone hook, only used by VMIAlok Kataria1-1/+0
VMI was the only user of the alloc_pmd_clone hook, given that VMI is now removed we can also remove this hook. Signed-off-by: Alok N Kataria <[email protected]> LKML-Reference: <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-02-27x86, paravirt: Remove kmap_atomic_pte paravirt op.Ian Campbell1-4/+0
Now that both Xen and VMI disable allocations of PTE pages from high memory this paravirt op serves no further purpose. This effectively reverts ce6234b5 "add kmap_atomic_pte for mapping highpte pages". Signed-off-by: Ian Campbell <[email protected]> LKML-Reference: <[email protected]> Acked-by: Alok Kataria <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2009-09-18Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds1-35/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits) x86: Move get/set_wallclock to x86_platform_ops x86: platform: Fix section annotations x86: apic namespace cleanup x86: Distangle ioapic and i8259 x86: Add Moorestown early detection x86: Add hardware_subarch ID for Moorestown x86: Add early platform detection x86: Move tsc_init to late_time_init x86: Move tsc_calibration to x86_init_ops x86: Replace the now identical time_32/64.c by time.c x86: time_32/64.c unify profile_pc x86: Move calibrate_cpu to tsc.c x86: Make timer setup and global variables the same in time_32/64.c x86: Remove mca bus ifdef from timer interrupt x86: Simplify timer_ack magic in time_32.c x86: Prepare unification of time_32/64.c x86: Remove do_timer hook x86: Add timer_init to x86_init_ops x86: Move percpu clockevents setup to x86_init_ops x86: Move xen_post_allocator_init into xen_pagetable_setup_done ... Fix up conflicts in arch/x86/include/asm/io_apic.h
2009-09-16x86: Move get/set_wallclock to x86_platform_opsFeng Tang1-2/+0
get/set_wallclock() have already a set of platform dependent implementations (default, EFI, paravirt). MRST will add another variant. Moving them to platform ops simplifies the existing code and minimizes the effort to integrate new variants. Signed-off-by: Feng Tang <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Thomas Gleixner <[email protected]>
2009-08-31x86, msr: Rewrite AMD rd/wrmsr variantsBorislav Petkov1-1/+0
Switch them to native_{rd,wr}msr_safe_regs and remove pv_cpu_ops.read_msr_amd. Signed-off-by: Borislav Petkov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2009-08-31x86, msr: Add rd/wrmsr interfaces with preset registersBorislav Petkov1-0/+2
native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow presetting of a subset of eight x86 GPRs before executing the rd/wrmsr instructions. This is needed at least on AMD K8 for accessing an erratum workaround MSR. Originally based on an idea by H. Peter Anvin. Signed-off-by: Borislav Petkov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2009-08-31x86: Move tsc_calibration to x86_init_opsThomas Gleixner1-1/+0
TSC calibration is modified by the vmware hypervisor and paravirt by separate means. Moorestown wants to add its own calibration routine as well. So make calibrate_tsc a proper x86_init_ops function and override it by paravirt or by the early setup of the vmware hypervisor. Signed-off-by: Thomas Gleixner <[email protected]>
2009-08-31x86: Add timer_init to x86_init_opsThomas Gleixner1-1/+0
The timer init code is convoluted with several quirks and the paravirt timer chooser. Figuring out which code path is actually taken is not for the faint hearted. Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and replace the paravirt time chooser and the remaining x86 quirk with a simple x86_init_ops function. Signed-off-by: Thomas Gleixner <[email protected]>
2009-08-31x86: Move percpu clockevents setup to x86_init_opsThomas Gleixner1-2/+0
paravirt overrides the setup of the default apic timers as per cpu timers. Moorestown needs to override that as well. Move it to x86_init_ops setup and create a separate x86_cpuinit struct which holds the function for the secondary evtl. hotplugabble CPUs. Signed-off-by: Thomas Gleixner <[email protected]>
2009-08-31x86: Move paravirt pagetable_setup to x86_init_opsThomas Gleixner1-7/+0
Replace more paravirt hackery by proper x86_init_ops. Signed-off-by: Thomas Gleixner <[email protected]>
2009-08-31x86: Move paravirt banner printout to x86_init_opsThomas Gleixner1-9/+1
Replace another obscure paravirt magic and move it to x86_init_ops. Such a hook is also useful for embedded and special hardware. Signed-off-by: Thomas Gleixner <[email protected]>
2009-08-31x86: Replace ARCH_SETUP by a proper x86_init_opsThomas Gleixner1-1/+0
ARCH_SETUP is a horrible leftover from the old arch/i386 mach support code. It still has a lonely user in xen. Move it to x86_init_ops. Signed-off-by: Thomas Gleixner <[email protected]>
2009-08-31x86: Move irq_init to x86_init_opsThomas Gleixner1-6/+0
irq_init is overridden by x86_quirks and by paravirts. Unify the whole mess and make it an unconditional x86_init_ops function which defaults to the standard function and can be overridden by the early platform code. Signed-off-by: Thomas Gleixner <[email protected]>
2009-08-27x86: Move memory_setup to x86_init_opsThomas Gleixner1-6/+0
memory_setup is overridden by x86_quirks and by paravirts with weak functions and quirks. Unify the whole mess and make it an unconditional x86_init_ops function which defaults to the standard function and can be overridden by the early platform code. Signed-off-by: Thomas Gleixner <[email protected]>
2009-06-10Merge branch 'x86-xen-for-linus' of ↵Linus Torvalds1-30/+26
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits) xen: cache cr0 value to avoid trap'n'emulate for read_cr0 xen/x86-64: clean up warnings about IST-using traps xen/x86-64: fix breakpoints and hardware watchpoints xen: reserve Xen start_info rather than e820 reserving xen: add FIX_TEXT_POKE to fixmap lguest: update lazy mmu changes to match lguest's use of kvm hypercalls xen: honour VCPU availability on boot xen: add "capabilities" file xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet xen/sys/hypervisor: change writable_pt to features xen: add /sys/hypervisor support xen/xenbus: export xenbus_dev_changed xen: use device model for suspending xenbus devices xen: remove suspend_cancel hook xen/dev-evtchn: clean up locking in evtchn xen: export ioctl headers to userspace xen: add /dev/xen/evtchn driver xen: add irq_from_evtchn xen: clean up gate trap/interrupt constants xen: set _PAGE_NX in __supported_pte_mask before pagetable construction ...
2009-05-15x86: Fix performance regression caused by paravirt_ops on native kernelsJeremy Fitzhardinge1-0/+2
Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by: | commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8 | Date: Mon Jul 7 12:07:51 2008 -0700 | | paravirt: introduce a "lock-byte" spinlock implementation The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed). If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown). Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline: nopv Pv-nospin Pv-spin CPU cycles 100.00% 99.89% 102.18% instructions 100.00% 100.10% 100.15% CPI 100.00% 99.79% 102.03% cache ref 100.00% 100.84% 100.28% cache miss 100.00% 90.47% 88.56% cache miss rate 100.00% 89.72% 88.31% branches 100.00% 99.93% 100.04% branch miss 100.00% 103.66% 107.72% branch miss rt 100.00% 103.73% 107.67% wallclock 100.00% 99.90% 102.20% The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time. (The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...) So, what's the fix? Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq *0xffffffff805a5b30 <_spin_lock+22>: retq The indirect call will get patched to: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq <__ticket_spin_lock> <_spin_lock+20>: nop; nop /* or whatever 2-byte nop */ <_spin_lock+22>: retq One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides arel 1) it will replicate the preempt_disable/enable code at eack lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial. The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still it is a reasonable short-term workaround. [ Impact: fix pvops performance regression when running native ] Analysed-by: "Xin Xiaohui" <[email protected]> Analysed-by: "Li Xin" <[email protected]> Analysed-by: "Nakajima Jun" <[email protected]> Signed-off-by: Jeremy Fitzhardinge <[email protected]> Acked-by: H. Peter Anvin <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Xen-devel <[email protected]> LKML-Reference: <[email protected]> [ fixed the help text ] Signed-off-by: Ingo Molnar <[email protected]>
2009-04-07Merge commit 'origin/master' into for-linus/xen/masterJeremy Fitzhardinge1-1/+0
* commit 'origin/master': (4825 commits) Fix build errors due to CONFIG_BRANCH_TRACER=y parport: Use the PCI IRQ if offered tty: jsm cleanups Adjust path to gpio headers KGDB_SERIAL_CONSOLE check for module Change KCONFIG name tty: Blackin CTS/RTS Change hardware flow control from poll to interrupt driven Add support for the MAX3100 SPI UART. lanana: assign a device name and numbering for MAX3100 serqt: initial clean up pass for tty side tty: Use the generic RS485 ioctl on CRIS tty: Correct inline types for tty_driver_kref_get() splice: fix deadlock in splicing to file nilfs2: support nanosecond timestamp nilfs2: introduce secondary super block nilfs2: simplify handling of active state of segments nilfs2: mark minor flag for checkpoint created by internal operation nilfs2: clean up sketch file nilfs2: super block operations fix endian bug ... Conflicts: arch/x86/include/asm/thread_info.h arch/x86/lguest/boot.c drivers/xen/manage.c
2009-03-29x86/paravirt: use percpu_ rather than __get_cpu_varJeremy Fitzhardinge1-5/+5
Impact: minor optimisation percpu_read/write is a slightly more direct way of getting to percpu data. Signed-off-by: Jeremy Fitzhardinge <[email protected]>
2009-03-29x86/paravirt: allow preemption with lazy mmu modeJeremy Fitzhardinge1-3/+4
Impact: remove obsolete checks, simplification Lift restrictions on preemption with lazy mmu mode, as it is now allowed. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Acked-by: Peter Zijlstra <[email protected]>
2009-03-29x86/paravirt: finish change from lazy cpu to context switch start/endJeremy Fitzhardinge1-8/+6
Impact: fix lazy context switch API Pass the previous and next tasks into the context switch start end calls, so that the called functions can properly access the task state (esp in end_context_switch, in which the next task is not yet completely current). Signed-off-by: Jeremy Fitzhardinge <[email protected]> Acked-by: Peter Zijlstra <[email protected]>
2009-03-29x86/paravirt: flush pending mmu updates on context switchJeremy Fitzhardinge1-3/+10
Impact: allow preemption during lazy mmu updates If we're in lazy mmu mode when context switching, leave lazy mmu mode, but remember the task's state in TIF_LAZY_MMU_UPDATES. When we resume the task, check this flag and re-enter lazy mmu mode if its set. This sets things up for allowing lazy mmu mode while preemptible, though that won't actually be active until the next change. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Acked-by: Peter Zijlstra <[email protected]>
2009-03-29x86/pvops: replace arch_enter_lazy_cpu_mode with arch_start_context_switchJeremy Fitzhardinge1-13/+0
Impact: simplification, prepare for later changes Make lazy cpu mode more specific to context switching, so that it makes sense to do more context-switch specific things in the callbacks. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Acked-by: Peter Zijlstra <[email protected]>