path: root/arch/x86/include/asm
Age | Commit message | Author | Files | Lines
2013-04-28 | Merge git://github.com/agraf/linux-2.6.git kvm-ppc-next into queue | Gleb Natapov | 1 | -0/+2
2013-04-28 | KVM: x86: Rework request for immediate exit | Jan Kiszka | 1 | -1/+1
The VMX implementation of enable_irq_window raised KVM_REQ_IMMEDIATE_EXIT after we checked it in vcpu_enter_guest. This caused infinite loops on vmentry. Fix it by letting enable_irq_window signal the need for an immediate exit via its return value and drop KVM_REQ_IMMEDIATE_EXIT. This issue only affects nested VMX scenarios. Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
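The return-value approach described above can be sketched as follows. This is a minimal illustration only: struct vcpu_sketch, its fields, and both function names are hypothetical stand-ins, not the kernel's actual structures or the patch's code.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-vcpu state. */
struct vcpu_sketch {
    bool nested_exit_pending;   /* assumption: the window cannot be opened right now */
    bool irq_window_requested;
};

/*
 * Returns 0 when the IRQ window was enabled, nonzero when the caller
 * must arrange an immediate exit instead.
 */
static int enable_irq_window_sketch(struct vcpu_sketch *v)
{
    if (v->nested_exit_pending)
        return -1;
    v->irq_window_requested = true;
    return 0;
}

/*
 * The caller derives the immediate-exit decision from the return value,
 * so nothing has to raise a request bit after request bits were checked.
 */
static bool prepare_entry_sketch(struct vcpu_sketch *v)
{
    bool req_immediate_exit = enable_irq_window_sketch(v) != 0;
    return req_immediate_exit;
}
```

The point of the design is ordering: a flag raised after the request check is only seen on the next iteration, whereas a return value is available to the caller immediately.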
2013-04-28 | Merge branch 'pm-cpufreq' | Rafael J. Wysocki | 1 | -0/+1
* pm-cpufreq: (57 commits)
  cpufreq: MAINTAINERS: Add co-maintainer
  cpufreq: pxa2xx: initialize variables
  ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y
  cpufreq: cpu0: Put cpu parent node after using it
  cpufreq: ARM big LITTLE: Adapt to latest cpufreq updates
  cpufreq: ARM big LITTLE: put DT nodes after using them
  cpufreq: Don't call __cpufreq_governor() for drivers without target()
  cpufreq: exynos5440: Protect OPP search calls with RCU lock
  cpufreq: dbx500: Round to closest available freq
  cpufreq: Call __cpufreq_governor() with correct policy->cpus mask
  cpufreq / intel_pstate: Optimize intel_pstate_set_policy
  cpufreq: OMAP: instantiate omap-cpufreq as a platform_driver
  arm: exynos: Enable OPP library support for exynos5440
  cpufreq: exynos: Remove error return even if no soc is found
  cpufreq: exynos: Add cpufreq driver for exynos5440
  cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor
  cpufreq: ondemand: allow custom powersave_bias_target handler to be registered
  cpufreq: convert cpufreq_driver to using RCU
  cpufreq: powerpc/platforms/cell: move cpufreq driver to drivers/cpufreq
  cpufreq: sparc: move cpufreq driver to drivers/cpufreq
  ...

Conflicts:
  MAINTAINERS (with commit a8e39c3 from pm-cpuidle)
  drivers/cpufreq/cpufreq_governor.h (with commit beb0ff3)
2013-04-26 | KVM: Add KVM_IRQCHIP_NUM_PINS in addition to KVM_IOAPIC_NUM_PINS | Alexander Graf | 1 | -0/+2
The concept of routing interrupt lines to an irqchip is nothing that is IOAPIC specific. Every irqchip has a maximum number of pins that can be linked to irq lines. So let's add a new define that allows us to reuse generic code for non-IOAPIC platforms. Signed-off-by: Alexander Graf <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]>
2013-04-26 | perf/x86/intel/P4: Robistify P4 PMU types | Ingo Molnar | 1 | -31/+31
Linus found, while extending integer type extension checks in the sparse static code checker, various fragile patterns of mixed signed/unsigned 64-bit/32-bit integer use in perf_events_p4.c. The relevant hardware register ABI is 64 bit wide on 32-bit kernels as well, so clean it all up a bit, remove unnecessary casts, and make sure we use 64-bit unsigned integers in these places. [ Unfortunately this patch was not tested on real P4 hardware, those are pretty rare already. If this patch causes any problems on P4 hardware then please holler ... ] Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: David Miller <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
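The kind of fragility that mixed signed/unsigned 32-bit/64-bit use creates can be illustrated generically. The sketch below is not taken from perf_events_p4.c; both helper names are hypothetical and only show why 64-bit unsigned types are the safer choice for a 64-bit register value.

```c
#include <stdint.h>

/*
 * Fragile: a value that lives in a signed 32-bit type with bit 31 set is
 * negative, and it is sign-extended when converted to a 64-bit type.
 */
static uint64_t mask_fragile(void)
{
    int32_t bit31 = INT32_MIN;      /* bit pattern 0x80000000 */
    return (uint64_t)bit31;         /* upper 32 bits become all ones */
}

/* Robust: build the mask in a 64-bit unsigned type from the start. */
static uint64_t mask_robust(void)
{
    return UINT64_C(1) << 31;       /* upper 32 bits stay zero */
}
```

With a 64-bit register layout, the fragile variant silently sets every bit above bit 31, which is the sort of pattern the cleanup aims to rule out.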
2013-04-25 | crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher | Jussi Kivilinna | 1 | -0/+19
Patch adds AVX2/AES-NI/x86-64 implementation of Camellia cipher, requiring 32 parallel blocks for input (512 bytes). Compared to AVX implementation, this version is extended to use the 256-bit wide YMM registers. For AES-NI instructions data is split to two 128-bit registers and merged afterwards. Even with this additional handling, performance should be higher compared to the AES-NI/AVX implementation. Signed-off-by: Jussi Kivilinna <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2013-04-25 | crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher | Jussi Kivilinna | 1 | -0/+24
Patch adds AVX2/x86-64 implementation of Serpent cipher, requiring 16 parallel blocks for input (256 bytes). Implementation is based on the AVX implementation and extends to use the 256-bit wide YMM registers. Since serpent does not use table look-ups, this implementation should be close to two times faster than the AVX implementation. Signed-off-by: Jussi Kivilinna <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2013-04-25 | crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher | Jussi Kivilinna | 1 | -0/+18
Patch adds AVX2/x86-64 implementation of Twofish cipher, requiring 16 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Implementation also uses 256-bit wide YMM registers, which should give additional speed up compared to the AVX implementation. Signed-off-by: Jussi Kivilinna <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2013-04-25 | crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher | Jussi Kivilinna | 2 | -0/+44
Patch adds AVX2/x86-64 implementation of Blowfish cipher, requiring 32 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Signed-off-by: Jussi Kivilinna <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
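The input sizes quoted in the four AVX2 cipher commits follow from the number of parallel blocks times the cipher block size (128 bits for Camellia, Serpent and Twofish; 64 bits for Blowfish). A small sketch of that arithmetic, with batch_bytes being a hypothetical helper name:

```c
#include <stddef.h>

/* Bytes consumed per call: parallel blocks times block size in bytes. */
static size_t batch_bytes(size_t parallel_blocks, size_t block_bits)
{
    return parallel_blocks * (block_bits / 8);
}
```

For example, batch_bytes(32, 128) gives the 512 bytes quoted for Camellia, while batch_bytes(32, 64) gives the 256 bytes quoted for Blowfish.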
2013-04-25 | crypto: x86 - add more optimized XTS-mode for serpent-avx | Jussi Kivilinna | 2 | -0/+29
This patch adds AVX optimized XTS-mode helper functions/macros and converts serpent-avx to use the new facilities. Benefits are slightly improved speed and reduced stack usage as use of temporary IV-array is avoided.

tcrypt results, with Intel i5-2450M:

        enc     dec
16B     1.00x   1.00x
64B     1.00x   1.00x
256B    1.04x   1.06x
1024B   1.09x   1.09x
8192B   1.10x   1.09x

Signed-off-by: Jussi Kivilinna <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2013-04-25 | x86 cmpxchg.h: fix wrong comment | Li Zhong | 1 | -1/+1
Signed-off-by: Li Zhong <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2013-04-24 | Merge branch 'linus' into timers/core | Thomas Gleixner | 8 | -8/+21
Reason: Get upstream fixes before adding conflicting code. Signed-off-by: Thomas Gleixner <[email protected]>
2013-04-22 | KVM: nVMX: Validate EFER values for VM_ENTRY/EXIT_LOAD_IA32_EFER | Jan Kiszka | 1 | -0/+1
As we may emulate the loading of EFER on VM-entry and VM-exit, implement the checks that VMX performs on the guest and host values on vmlaunch/ vmresume. Factor out kvm_valid_efer for this purpose which checks for set reserved bits. Signed-off-by: Jan Kiszka <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
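A check for set reserved bits of the kind factored out here can be sketched as below. The bit positions for SCE, LME, LMA and NXE are the architectural ones; the fixed reserved mask and the function name are assumptions for illustration, since the real kvm_valid_efer is not shown in this log and may depend on CPU features.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_SCE (UINT64_C(1) << 0)   /* SYSCALL enable */
#define EFER_LME (UINT64_C(1) << 8)   /* long mode enable */
#define EFER_LMA (UINT64_C(1) << 10)  /* long mode active */
#define EFER_NX  (UINT64_C(1) << 11)  /* no-execute enable */

/* Assumption: every bit outside the four above is treated as reserved. */
static const uint64_t efer_reserved_sketch =
    ~(EFER_SCE | EFER_LME | EFER_LMA | EFER_NX);

/* True when no reserved bit is set in the candidate EFER value. */
static bool efer_has_no_reserved_bits(uint64_t efer)
{
    return (efer & efer_reserved_sketch) == 0;
}
```

The same predicate can then be applied to both the guest and the host value before the emulated load.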
2013-04-22 | KVM: nVMX: Shadow-vmcs control fields/bits | Abel Gordon | 1 | -0/+3
Add definitions for all the vmcs control fields/bits required to enable vmcs-shadowing Signed-off-by: Abel Gordon <[email protected]> Reviewed-by: Orit Wasserman <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
2013-04-22 | lguest: map Switcher below fixmap. | Rusty Russell | 1 | -6/+0
Now we've adjusted all the code, we can simply set switcher_addr to wherever it needs to go below the fixmaps, rather than asserting that it should be so. With large NR_CPUS and PAE, people were hitting the "mapping switcher would thwack fixmap" message. Reported-by: Paul Bolle <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2013-04-22 | lguest: assume Switcher text is a single page. | Rusty Russell | 1 | -5/+2
ie. SHARED_SWITCHER_PAGES == 1. It is well under a page, and it's a minor simplification: it's nice to have *one* simplification in a patch series! Signed-off-by: Rusty Russell <[email protected]>
2013-04-22 | lguest: prepare to make SWITCHER_ADDR a variable. | Rusty Russell | 1 | -0/+2
We currently use the whole top PGD entry for the switcher, but that's hitting the fixmap in some configurations (mainly, large NR_CPUS). Introduce a variable, currently set to the constant. Signed-off-by: Rusty Russell <[email protected]>
2013-04-21 | perf/x86/amd: Add support for AMD NB and L2I "uncore" counters | Jacob Shin | 1 | -0/+2
Add support for AMD Family 15h [and above] northbridge performance counters. MSRs 0xc0010240 ~ 0xc0010247 are shared across all cores that share a common northbridge. Add support for AMD Family 16h L2 performance counters. MSRs 0xc0010230 ~ 0xc0010237 are shared across all cores that share a common L2 cache. We do not enable counter overflow interrupts. Sampling mode and per-thread events are not supported. Signed-off-by: Jacob Shin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/20130419213428.GA8229@jshin-Toonie Signed-off-by: Ingo Molnar <[email protected]>
2013-04-21 | Merge branch 'perf/urgent' into perf/core | Ingo Molnar | 4 | -4/+9
Conflicts: arch/x86/kernel/cpu/perf_event_intel.c Merge in the latest fixes before applying new patches, resolve the conflict. Signed-off-by: Ingo Molnar <[email protected]>
2013-04-20 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -0/+7
Pull x86 fixes from Peter Anvin:

 "Three groups of fixes:

  1. Make sure we don't execute the early microcode patching if family < 6, since it would touch MSRs which don't exist on those families, causing crashes.

  2. The Xen partial emulation of HyperV can be dealt with more gracefully than just disabling the driver.

  3. More EFI variable space magic. In particular, variables hidden from runtime code need to be taken into account too."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Verify the family before dispatching microcode patching
  x86, hyperv: Handle Xen emulation of Hyper-V more gracefully
  x86,efi: Implement efi_no_storage_paranoia parameter
  efi: Export efi_query_variable_store() for efivars.ko
  x86/Kconfig: Make EFI select UCS2_STRING
  efi: Distinguish between "remaining space" and actually used space
  efi: Pass boot services variable info to runtime code
  Move utf16 functions to kernel core and rename
  x86,efi: Check max_size only if it is non-zero.
  x86, efivars: firmware bug workarounds should be in platform code
2013-04-19 | Merge remote-tracking branch 'efi/urgent' into x86/urgent | H. Peter Anvin | 1 | -0/+7
Matt Fleming (1):
  x86, efivars: firmware bug workarounds should be in platform code

Matthew Garrett (3):
  Move utf16 functions to kernel core and rename
  efi: Pass boot services variable info to runtime code
  efi: Distinguish between "remaining space" and actually used space

Richard Weinberger (2):
  x86,efi: Check max_size only if it is non-zero.
  x86,efi: Implement efi_no_storage_paranoia parameter

Sergey Vlasov (2):
  x86/Kconfig: Make EFI select UCS2_STRING
  efi: Export efi_query_variable_store() for efivars.ko

Signed-off-by: H. Peter Anvin <[email protected]>
2013-04-19 | iommu: Fix compile warnings with forward declarations | Joerg Roedel | 1 | -0/+7
The irq_remapping.h file for x86 does not include all necessary forward declarations for the data structures used. This causes compile warnings, so fix it. Signed-off-by: Joerg Roedel <[email protected]>
2013-04-19 | Merge tag 'edac_amd_f16h' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras | Ingo Molnar | 4 | -4/+9
Pull AMD F16h support for amd64_edac from Borislav Petkov. Signed-off-by: Ingo Molnar <[email protected]>
2013-04-18 | iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets | Neil Horman | 1 | -0/+2
A few years back Intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf for the 5520 and 5500 chipsets, which contained an erratum (specifically erratum 53) noting that these chipsets can't properly do interrupt remapping, and as a result they recommend that interrupt remapping be disabled in the BIOS. While many vendors have a BIOS update to do exactly that, not all do, and of course not all users update their BIOS to a level that corrects the problem. As a result, occasionally interrupts can arrive at a CPU even after affinity for that interrupt has been moved, leading to lost or spurious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)). There have been several incidents recently of people seeing this error, and investigation has shown that they have systems whose BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnerable to this problem. For details of those that reported the problem, please see: https://bugzilla.redhat.com/show_bug.cgi?id=887006 [ Joerg: Removed CONFIG_IRQ_REMAP ifdef from early-quirks.c ] Signed-off-by: Neil Horman <[email protected]> CC: Prarit Bhargava <[email protected]> CC: Don Zickus <[email protected]> CC: Don Dutile <[email protected]> CC: Bjorn Helgaas <[email protected]> CC: Asit Mallick <[email protected]> CC: David Woodhouse <[email protected]> CC: [email protected] CC: Joerg Roedel <[email protected]> CC: Konrad Rzeszutek Wilk <[email protected]> CC: Arkadiusz Miśkiewicz <[email protected]> Signed-off-by: Joerg Roedel <[email protected]>
2013-04-16 | KVM: VMX: Add the deliver posted interrupt algorithm | Yang Zhang | 1 | -0/+2
Only deliver the posted interrupt when target vcpu is running and there is no previous interrupt pending in pir. Signed-off-by: Yang Zhang <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
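The delivery rule stated above (notify only when the target vcpu is running and nothing was already pending) can be sketched with C11 atomics. The struct layout, the "on" flag used to represent "a previous interrupt is already pending", and the function name are assumptions for illustration; this is not the kernel's descriptor or code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical posted-interrupt state: 256 request bits plus a pending flag. */
struct pi_desc_sketch {
    _Atomic uint32_t pir[8];   /* one bit per vector, 8 x 32 = 256 */
    atomic_bool on;            /* set while a notification is outstanding */
};

/*
 * Record the vector, then report whether a notification IPI should be
 * sent. vector must be in the range 0..255.
 */
static bool post_interrupt_sketch(struct pi_desc_sketch *pi,
                                  unsigned int vector, bool vcpu_running)
{
    atomic_fetch_or(&pi->pir[vector / 32], UINT32_C(1) << (vector % 32));

    /* atomic_exchange returns the previous value of the flag. */
    bool already_pending = atomic_exchange(&pi->on, true);

    return vcpu_running && !already_pending;
}
```

Using an atomic exchange for the flag means that two concurrent senders cannot both decide to send the notification.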
2013-04-16 | KVM: VMX: Check the posted interrupt capability | Yang Zhang | 1 | -0/+4
Detect the posted interrupt feature. If it exists, then set it in vmcs_config. Signed-off-by: Yang Zhang <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2013-04-16 | KVM: VMX: Register a new IPI for posted interrupt | Yang Zhang | 4 | -0/+13
The Posted Interrupt feature requires a special IPI to deliver posted interrupts to the guest. It should have a high priority so the interrupt will not be blocked by others. Normally, the posted interrupt will be consumed by the vcpu if the target vcpu is running, and is transparent to the OS. But in some cases, the interrupt will arrive when the target vcpu is scheduled out, and the host will see it. So we need to register a dummy handler to handle it. Signed-off-by: Yang Zhang <[email protected]> Acked-by: Ingo Molnar <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2013-04-16 | KVM: VMX: Enable acknowledge interupt on vmexit | Yang Zhang | 1 | -0/+1
The "acknowledge interrupt on exit" feature controls processor behavior for external interrupt acknowledgement. When this control is set, the processor acknowledges the interrupt controller to acquire the interrupt vector on VM exit. After enabling this feature, an interrupt which arrives while the target cpu is running in vmx non-root mode will be handled by the vmx handler instead of the handler in the idt. Currently, the vmx handler only fakes an interrupt stack and jumps to the idt table to let the real handler handle it. Further, we will recognize the interrupt and only deliver through the idt table the interrupts which do not belong to the current vcpu. Interrupts which belong to the current vcpu will be handled inside the vmx handler. This will reduce the interrupt handling cost of KVM. Also, the interrupt enable logic is changed if this feature is turned on: before this patch, the hypervisor calls local_irq_enable() to enable interrupts directly. Now the IF bit is set on the interrupt stack frame, and interrupts will be enabled on return from the interrupt handler if an external interrupt exists. If there is no external interrupt, we still call local_irq_enable() to enable them. Refer to Intel SDM volume 3, chapter 33.2. Signed-off-by: Yang Zhang <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2013-04-16 | Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core | Ingo Molnar | 1 | -0/+1
Pull uprobes updates from Oleg Nesterov:

 - "uretprobes" - an optimization to uprobes, like kretprobes are an optimization to kprobes. "perf probe -x file sym%return" now works like kretprobes.

 - PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes.

Signed-off-by: Ingo Molnar <[email protected]>
2013-04-15 | efi: Pass boot services variable info to runtime code | Matthew Garrett | 1 | -0/+7
EFI variables can be flagged as being accessible only within boot services. This makes it awkward for us to figure out how much space they use at runtime. In theory we could figure this out by simply comparing the results from QueryVariableInfo() to the space used by all of our variables, but that fails if the platform doesn't garbage collect on every boot. Thankfully, calling QueryVariableInfo() while still inside boot services gives a more reliable answer. This patch passes that information from the EFI boot stub up to the efi platform code. Signed-off-by: Matthew Garrett <[email protected]> Signed-off-by: Matt Fleming <[email protected]>
2013-04-14 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -1/+6
Pull x86 fixes from Ingo Molnar:

 "Misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set
  x86/mm/cpa/selftest: Fix false positive in CPA self test
  x86/mm/cpa: Convert noop to functional fix
  x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
2013-04-14 | KVM: VMX: do not try to reexecute failed instruction while emulating invalid guest state | Gleb Natapov | 1 | -0/+1
During invalid guest state emulation vcpu cannot enter guest mode to try to reexecute instruction that emulator failed to emulate, so emulation will happen again and again. Prevent that by telling the emulator that instruction reexecution should not be attempted. Signed-off-by: Gleb Natapov <[email protected]>
2013-04-13 | uretprobes/x86: Hijack return address | Anton Arapov | 1 | -0/+1
Hijack the return address and replace it with a trampoline address. Signed-off-by: Anton Arapov <[email protected]> Acked-by: Srikar Dronamraju <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]>
2013-04-12 | x86-32: Fix possible incomplete TLB invalidate with PAE pagetables | Dave Hansen | 1 | -1/+1
This patch attempts to fix: https://bugzilla.kernel.org/show_bug.cgi?id=56461

The symptom is a crash and messages like this:

  chrome: Corrupted page table at address 34a03000
  *pdpt = 0000000000000000 *pde = 0000000000000000
  Bad pagetable: 000f [#1] PREEMPT SMP

Ingo guesses this got introduced by commit 611ae8e3f520 ("x86/tlb: enable tlb flush range support for x86") since that code started to free unused pagetables.

On x86-32 PAE kernels, that new code has the potential to free an entire PMD page and will clear one of the four page-directory-pointer-table (aka pgd_t entries). The hardware aggressively "caches" these top-level entries and invlpg does not actually affect the CPU's copy. If we clear one we *HAVE* to do a full TLB flush, otherwise we might continue using a freed pmd page. (note, we do this properly on the population side in pud_populate()).

This patch tracks whenever we clear one of these entries in the 'struct mmu_gather', and ensures that we follow up with a full tlb flush.

BTW, I disassembled and checked that:

  if (tlb->fullmm == 0)

and

  if (!tlb->fullmm && !tlb->need_flush_all)

generate essentially the same code, so there should be zero impact there to the !PAE case.

Signed-off-by: Dave Hansen <[email protected]> Cc: Peter Anvin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Artem S Tashkinov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
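The flush decision described in this commit can be sketched in isolation. The two field names come from the commit message; struct mmu_gather_sketch, the enum, and both helper functions are hypothetical and stand in for the real 'struct mmu_gather' and flush code, which are not shown here.

```c
#include <stdbool.h>

/* Reduced stand-in for 'struct mmu_gather' with only the two flags. */
struct mmu_gather_sketch {
    bool fullmm;          /* whole address space is being torn down */
    bool need_flush_all;  /* a top-level (PDPT) entry was cleared */
};

enum flush_kind { FLUSH_RANGE, FLUSH_ALL };

/* A ranged flush is only safe when neither flag demands a full flush. */
static enum flush_kind pick_flush(const struct mmu_gather_sketch *tlb)
{
    if (!tlb->fullmm && !tlb->need_flush_all)
        return FLUSH_RANGE;
    return FLUSH_ALL;
}

/* Called wherever a top-level entry is cleared: remember to flush all. */
static void clear_top_level_entry(struct mmu_gather_sketch *tlb)
{
    tlb->need_flush_all = true;
}
```

Once clear_top_level_entry() has run, pick_flush() can no longer return FLUSH_RANGE for that gather, which is the property the fix relies on.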
2013-04-12 | x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER | Paul Bolle | 1 | -3/+0
The last users of FIX_CYCLONE_TIMER were removed in v2.6.18. We can remove this unneeded constant. Signed-off-by: Paul Bolle <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-04-11 | x86, xen, gdt: Remove the pvops variant of store_gdt. | Konrad Rzeszutek Wilk | 2 | -5/+1
The two use-cases where we needed to store the GDT were during ACPI S3 suspend and resume. As the patches:

  x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
  x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.

have demonstrated - there are other mechanisms by which the GDT is saved and reloaded during the early resume path. Hence we do not need to worry about the pvops call-chain for saving the GDT and can eliminate it. The other areas where store_gdt is used are never going to be hit when running under the pvops platforms. Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-04-11 | x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed | Konrad Rzeszutek Wilk | 1 | -1/+0
During the ACPI S3 suspend, we store the GDT in the wakeup_header (see wakeup_asm.s) field called 'pmode_gdt'. This is then used during the resume path and has the same exact value as what the store/load_gdt do with the saved_context (which is saved/restored via save/restore_processor_state()).

The flow during resume from ACPI S3 is simpler than the 64-bit counterpart. We only use the early bootstrap once (wakeup_gdt) and do various checks in real mode. After the checks are completed, we load the saved GDT ('pmode_gdt') and continue on with the resume (by heading to startup_32 in trampoline_32.S) - which quickly jumps to what was saved in 'pmode_entry' aka 'wakeup_pmode_return'. The 'wakeup_pmode_return' restores the GDT (saved_gdt) again (which was saved in do_suspend_lowlevel initially). After that it ends up calling the 'ret_point' which calls 'restore_processor_state()'.

We have two opportunities to remove code where we restore the same GDT twice. Here is the call chain:

  wakeup_start
    |- lgdtl wakeup_gdt [the work-around broken BIOSes]
    |
    |- lgdtl pmode_gdt [the real one]
    |
    \-- startup_32 (in trampoline_32.S)
          \-- wakeup_pmode_return (in wakeup_32.S)
                |- lgdtl saved_gdt [the real one]
                \-- ret_point
                      |..
                      |- call restore_processor_state

The hibernate path is much simpler. During the saving of the hibernation image we call save_processor_state() and save the contents of that along with the rest of the kernel in the hibernation image destination. We save the EIP of 'restore_registers' (restore_jump_address) and cr3 (restore_cr3).

During hibernate resume, the 'restore_registers' (via the 'restore_jump_address') in hibernate_asm_32.S is invoked, which restores the contents of most registers. Naturally the resume path benefits from already being in 32-bit mode, so it does not have to reload the GDT. It only reloads the cr3 (from restore_cr3) and continues on. Note that the restoration of the restore image page-tables is done prior to this.

After the 'restore_registers' it returns and we end up calling restore_processor_state() - where we reload the GDT. The reload of the GDT is not needed as the bootup kernel has already loaded the GDT, which is at the same physical location as in the restored kernel.

Note that the hibernation path assumes the GDT is correct during its 'restore_registers'. The assumption in the code is that the restored image is the same as the saved one - meaning we are not trying to restore a different kernel in the virtual address space of a new kernel.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-04-11 | x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed. | Konrad Rzeszutek Wilk | 1 | -3/+0
During the ACPI S3 resume path the trampoline code handles it already.

During the ACPI S3 suspend phase (acpi_suspend_lowlevel) we set:

  early_gdt_descr.address = (..)get_cpu_gdt_table(smp_processor_id());

which is then used during the resume path and has the same exact value as what the store/load_gdt do with the saved_context (which is saved/restored via save/restore_processor_state()).

The flow during resume is complex and for 64-bit kernels we use three GDTs - one early bootstrap GDT (wakeup_igdt) that we load to work around broken BIOSes, an early Protected Mode to Long Mode transition one (tr_gdt), and the final one - early_gdt_descr (which points to the real GDT).

The early ('wakeup_gdt') is loaded in 'trampoline_start' for working around broken BIOSes, and then when we end up in Protected Mode in the startup_32 (in trampoline_64.s, not head_32.s) we use the 'tr_gdt' (still in trampoline_64.s). This 'tr_gdt' has a 32-bit code segment, a 64-bit code segment with L=1, and a 32-bit data segment.

Once we have transitioned from Protected Mode to Long Mode we then set the GDT to 'early_gdt_descr' and then via an iretq emerge in wakeup_long64 (set via 'initial_code' variable in acpi_suspend_lowlevel).

In the wakeup_long64 we end up restoring the %rip (which is set to 'resume_point') and jump there.

In 'resume_point' we call 'restore_processor_state' which does the load_gdt on the saved context. This load_gdt is redundant as the GDT loaded via early_gdt_descr is the same.

Here is the call-chain:

  wakeup_start
    |- lgdtl wakeup_gdt [the work-around broken BIOSes]
    |
    \-- trampoline_start (trampoline_64.S)
          |- lgdtl tr_gdt
          |
          \-- startup_32 (trampoline_64.S)
                |
                \-- startup_64 (trampoline_64.S)
                      |
                      \-- secondary_startup_64
                            |- lgdtl early_gdt_descr
                            | ...
                            |- movq initial_code(%rip), %eax
                            |-.. lretq
                            \-- wakeup_64
                                  |-- other registers are reloaded
                                  |-- call restore_processor_state

The hibernate path is much simpler. During the saving of the hibernation image we call save_processor_state() and save the contents of that along with the rest of the kernel in the hibernation image destination. We save the EIP of 'restore_registers' (restore_jump_address) and cr3 (restore_cr3).

During hibernate resume, the 'restore_registers' (via the 'restore_jump_address') in hibernate_asm_64.S is invoked, which restores the contents of most registers. Naturally the resume path benefits from already being in 64-bit mode, so it does not have to load the GDT. It only reloads the cr3 (from restore_cr3) and continues on. Note that the restoration of the restore image page-tables is done prior to this.

After the 'restore_registers' it returns and we end up calling restore_processor_state() - where we reload the GDT. The reload of the GDT is not needed as the bootup kernel has already loaded the GDT, which is at the same physical location as in the restored kernel.

Note that the hibernation path assumes the GDT is correct during its 'restore_registers'. The assumption in the code is that the restored image is the same as the saved one - meaning we are not trying to restore a different kernel in the virtual address space of a new kernel.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-04-11 | x86: Use a read-only IDT alias on all CPUs | Kees Cook | 1 | -3/+1
Make a copy of the IDT (as seen via the "sidt" instruction) read-only. This primarily removes the IDT from being a target for arbitrary memory write attacks, and has the added benefit of also not leaking the kernel base offset, if it has been relocated. We already did this on vendor == Intel and family == 5 because of the F0 0F bug -- regardless of if a particular CPU had the F0 0F bug or not. Since the workaround was so cheap, there simply was no reason to be very specific. This patch extends the readonly alias to all CPUs, but does not activate the #PF to #UD conversion code needed to deliver the proper exception in the F0 0F case except on Intel family 5 processors. Signed-off-by: Kees Cook <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: Eric Northup <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-04-10 | x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal | Boris Ostrovsky | 2 | -1/+6
Invoking arch_flush_lazy_mmu_mode() results in calls to preempt_enable()/disable() which may have performance impact. Since lazy MMU is not used on bare metal we can patch away arch_flush_lazy_mmu_mode() so that it is never called in such environment. [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU updates" may cause a minor performance regression on bare metal. This patch resolves that performance regression. It is somewhat unclear to me if this is a good -stable candidate. ] Signed-off-by: Boris Ostrovsky <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Tested-by: Josh Boyer <[email protected]> Tested-by: Konrad Rzeszutek Wilk <[email protected]> Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: <[email protected]> SEE NOTE ABOVE
2013-04-10 | x86/mm/cpa: Cleanup split_large_page() and its callee | Borislav Petkov | 1 | -1/+0
So basically we're generating the pte_t * from a struct page and we're handing it down to the __split_large_page() internal version which then goes and gets back struct page * from it because it needs it. Change the caller to hand down struct page * directly and the callee can compute the pte_t itself. Net save is one virt_to_page() call and simpler code. While at it, make __split_large_page() static. Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-04-10 | cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor | Jacob Shin | 1 | -0/+1
Future AMD processors, starting with Family 16h, can provide software with feedback on how the workload may respond to frequency change -- memory-bound workloads will not benefit from higher frequency, whereas compute-bound workloads will. This patch enables this "frequency sensitivity feedback" to aid the ondemand governor to make better frequency change decisions by hooking into the powersave bias. Signed-off-by: Jacob Shin <[email protected]> Acked-by: Thomas Renninger <[email protected]> Acked-by: Borislav Petkov <[email protected]> Acked-by: Viresh Kumar <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2013-04-08 | arch: Consolidate tsk_is_polling() | Thomas Gleixner | 1 | -2/+0
Move it to a common place. Preparatory patch for implementing set/clear for the idle need_resched poll implementation. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Paul McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Reviewed-by: Cc: Srivatsa S. Bhat <[email protected]> Cc: Magnus Damm <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2013-04-08 | KVM: Move kvm_rebooting declaration out of x86 | Geoff Levand | 1 | -1/+0
The variable kvm_rebooting is a common kvm variable, so move its declaration from arch/x86/include/asm/kvm_host.h to include/linux/kvm_host.h. Fixes this sparse warning when building on arm64: virt/kvm/kvm_main.c: warning: symbol 'kvm_rebooting' was not declared. Should it be static? Signed-off-by: Geoff Levand <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
2013-04-08KVM: Move vm_list kvm_lock declarations out of x86Geoff Levand1-3/+0
The variables vm_list and kvm_lock are common to all architectures, so move the declarations from arch/x86/include/asm/kvm_host.h to include/linux/kvm_host.h. Fixes sparse warnings like these when building for arm64: virt/kvm/kvm_main.c: warning: symbol 'kvm_lock' was not declared. Should it be static? virt/kvm/kvm_main.c: warning: symbol 'vm_list' was not declared. Should it be static? Signed-off-by: Geoff Levand <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
2013-04-02x86, msr: Unify variable namesBorislav Petkov1-7/+7
Make sure all MSR-accessing primitives which split MSR values in two 32-bit parts have their variables called 'low' and 'high' for consistency with the rest of the code and for ease of staring. Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
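As an illustration of the 'low'/'high' naming, here is a minimal user-space sketch of splitting a 64-bit MSR value into two 32-bit halves and joining them again. The helper names msr_split(), msr_join() and msr_roundtrip_check() are hypothetical and are not the kernel's MSR primitives; the only hardware fact assumed is that RDMSR returns the value as EDX:EAX, with the low half in EAX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the kernel's API. */

/* Split a 64-bit MSR value: 'low' is bits 31:0, 'high' is bits 63:32. */
static void msr_split(uint64_t val, uint32_t *low, uint32_t *high)
{
	*low  = (uint32_t)val;
	*high = (uint32_t)(val >> 32);
}

/* Rebuild the 64-bit value from its two 32-bit halves. */
static uint64_t msr_join(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | low;
}

/* Splitting and then joining must give back the original value. */
static void msr_roundtrip_check(uint64_t val)
{
	uint32_t low, high;

	msr_split(val, &low, &high);
	assert(msr_join(low, high) == val);
}
```

Using the same two names everywhere means a reader never has to check which of a pair of variables holds the upper half.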
2013-04-02x86: Drop KERNEL_IMAGE_STARTBorislav Petkov1-1/+0
We have KERNEL_IMAGE_START and __START_KERNEL_map which both contain the start of the kernel text mapping's virtual address. Remove the former, which is used far less often around the tree. No functionality change. Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-04-02x86: remove the x32 syscall bitmask from syscall_get_nr()Paul Moore1-2/+2
Commit fca460f95e928bae373daa8295877b6905bc62b8 simplified the x32 implementation by creating a syscall bitmask, equal to 0x40000000, that could be applied to x32 syscalls such that the masked syscall number would be the same as an x86_64 syscall. While that patch was a nice way to simplify the code, it went a bit too far by adding the mask to syscall_get_nr(); returning the masked syscall numbers can cause confusion with callers that expect syscall numbers matching the x32 ABI, i.e. unmasked syscall numbers. This patch fixes this by simply removing the mask from syscall_get_nr() while preserving the other changes from the original commit. While there are several syscall_get_nr() callers in the kernel, most simply check that the syscall number is greater than zero; for those callers this patch has no effect. The remaining callers appear to be few, namely seccomp and ftrace. In my testing of seccomp without this patch, the original commit definitely breaks things: the seccomp filter does not correctly filter the syscalls due to the difference between the syscall numbers in the BPF filter and the value from syscall_get_nr(). Applying this patch restores the seccomp BPF filter functionality on x32. I've tested this patch with the seccomp BPF filters as well as ftrace and everything looks reasonable to me; needless to say general usage seemed fine as well. Signed-off-by: Paul Moore <[email protected]> Link: http://lkml.kernel.org/r/20130215172143.12549.10292.stgit@localhost Cc: <[email protected]> Cc: Will Drewry <[email protected]> Cc: H. Peter Anvin <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
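The mismatch described above can be sketched in user-space C. This is a simplified model, not the kernel code: the function names, the filter_matches() comparison and the example syscall number 1 are assumptions made for illustration; only the bitmask value 0x40000000 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Bitmask value quoted in the commit message. */
#define X32_SYSCALL_BIT 0x40000000

/* Model of the behaviour before this patch: the x32 bit is stripped. */
static int syscall_nr_masked(int orig_ax)
{
	return orig_ax & ~X32_SYSCALL_BIT;
}

/* Model of the behaviour after this patch: the number is returned as-is. */
static int syscall_nr_unmasked(int orig_ax)
{
	return orig_ax;
}

/*
 * Hypothetical stand-in for a filter that compares the reported syscall
 * number with the x32 ABI number the filter author wrote down.
 */
static bool filter_matches(int reported_nr, int filter_nr)
{
	return reported_nr == filter_nr;
}

/* An x32 syscall number only matches the filter when it is left unmasked. */
static void x32_filter_selfcheck(void)
{
	int x32_nr = X32_SYSCALL_BIT | 1;	/* illustrative x32 syscall */

	assert(!filter_matches(syscall_nr_masked(x32_nr), x32_nr));
	assert(filter_matches(syscall_nr_unmasked(x32_nr), x32_nr));
}
```

In this model a filter written against x32 ABI numbers never matches the masked value, which is the kind of failure the commit message reports for seccomp.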
2013-04-02x86/mce: Rework cmci_rediscover() to play well with CPU hotplugSrivatsa S. Bhat1-2/+2
Dave Jones reports that offlining a CPU leads to this trace:
  numa_remove_cpu cpu 1 node 0: mask now 0,2-3
  smpboot: CPU 1 is now offline
  BUG: using smp_processor_id() in preemptible [00000000] code: cpu-offline.sh/10591
  caller is cmci_rediscover+0x6a/0xe0
  Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
  Call Trace:
   [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
   [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
   [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
   [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
   [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
   [<ffffffff8104c2e3>] cpu_notify+0x23/0x50
   [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
   [<ffffffff815ef082>] _cpu_down+0x302/0x350
   [<ffffffff815ef106>] cpu_down+0x36/0x50
   [<ffffffff815f1c9d>] store_online+0x8d/0xd0
   [<ffffffff813edc48>] dev_attr_store+0x18/0x30
   [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
   [<ffffffff811adfb2>] vfs_write+0xa2/0x170
   [<ffffffff811ae16c>] sys_write+0x4c/0xa0
   [<ffffffff81613019>] system_call_fastpath+0x16/0x1b
However, a look at cmci_rediscover shows that it can be simplified quite a bit, apart from solving the above issue. It invokes functions that take spin locks with interrupts disabled, and hence it can run in atomic context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU is already dead and out of the cpu_online_mask. So take these points into account and simplify the code, and thereby also fix the above issue.
Reported-by: Dave Jones <[email protected]> Signed-off-by: Srivatsa S. Bhat <[email protected]> Signed-off-by: Tony Luck <[email protected]>
2013-04-02x86, cpu: Convert AMD Erratum 400Borislav Petkov2-19/+1
Convert AMD erratum 400 to the bug infrastructure. Then, retract all exports for modules since they're not needed now and make the AMD erratum checking machinery local to amd.c. Use forward declarations to avoid shuffling too much code around needlessly. Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>