path: root/arch/x86/include
2013-08-12  PCI: use weak functions for MSI arch-specific functions  (Thomas Petazzoni; 1 file, -30/+0)
Until now, the MSI architecture-specific functions could be overloaded using a fairly complex set of #define and compile-time conditionals. In order to prepare for the introduction of the msi_chip infrastructure, it is desirable to switch all those functions to use the 'weak' mechanism. This commit converts all the architectures that were overriding those MSI functions to use the new strategy.

Note that we keep two separate, non-weak functions, default_teardown_msi_irqs() and default_restore_msi_irqs(), for the default behavior of arch_teardown_msi_irqs() and arch_restore_msi_irqs(), as the default behavior is needed by x86 PCI code.

Signed-off-by: Thomas Petazzoni <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
Tested-by: Daniel Price <[email protected]>
Tested-by: Thierry Reding <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: [email protected]
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: [email protected]
Cc: Russell King <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: [email protected]
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: David S. Miller <[email protected]>
Cc: [email protected]
Cc: Chris Metcalf <[email protected]>
Signed-off-by: Jason Cooper <[email protected]>
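A minimal userspace sketch of the 'weak' override mechanism the commit switches to (the function names match the commit; the bodies here are illustrative assumptions): a default implementation is called unless an architecture provides a strong definition of the same symbol.

    /* weak_demo.c - build with: cc -o weak_demo weak_demo.c */
    #include <stdio.h>

    /* Default behavior kept as a separate, non-weak helper, mirroring
     * default_teardown_msi_irqs() in the commit. */
    void default_teardown_msi_irqs(void)
    {
            printf("default teardown\n");
    }

    /* Weak hook: an architecture may override this by providing a
     * strong definition of the same symbol, with no #define games. */
    __attribute__((weak)) void arch_teardown_msi_irqs(void)
    {
            default_teardown_msi_irqs();
    }

    int main(void)
    {
            arch_teardown_msi_irqs();  /* default unless overridden */
            return 0;
    }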
2013-08-09  x86: Don't clear olpc_ofw_header when sentinel is detected  (Daniel Drake; 1 file, -2/+2)
OpenFirmware wasn't quite following the protocol described in boot.txt and the kernel has detected this through use of the sentinel value in boot_params. OFW does zero out almost all of the stuff that it should do, but not the sentinel. This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC support.

OpenFirmware has now been fixed. However, it would be nice if we could maintain Linux compatibility with old firmware versions. To do that, we just have to avoid zeroing out olpc_ofw_header.

OFW does not write to any other parts of the header that are being zapped by the sentinel-detection code, and all users of olpc_ofw_header are somewhat protected through checking for the OLPC_OFW_SIG magic value before using it. So this should not cause any problems for anyone.

Signed-off-by: Daniel Drake <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Acked-by: Yinghai Lu <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: <[email protected]> # v3.9+
2013-08-09  xen: Support 64-bit PV guest receiving NMIs  (Konrad Rzeszutek Wilk; 1 file, -0/+1)
This is based on a patch that Zhenzhong Duan had sent - which was missing some of the remaining pieces. The kernel has the logic to handle Xen-type exceptions using the paravirt interface in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME - pv_irq_ops.adjust_exception_frame and INTERRUPT_RETURN - pv_cpu_ops.iret). That means the NMI handler (and other exception handlers) use the hypervisor iret.

The other changes that would be necessary for this would be to translate the NMI_VECTOR to one of the entries on the ipi_vector and make xen_send_IPI_mask_allbutself use different events.

Fortunately for us commit 1db01b4903639fcfaec213701a494fe3fb2c490b (xen: Clean up apic ipi interface) implemented this and we piggyback on the cleanup such that the apic IPI interface will pass the right vector value for NMI.

With this patch we can trigger NMIs within a PV guest (only tested x86_64).

For this to work with normal PV guests (not the initial domain) we need the domain to be able to use the APIC ops - they are already implemented to use the Xen event channels. For that to be turned on in a PV domU we need to remove the masking of X86_FEATURE_APIC. Incidentally that means kgdb will also now work within a PV guest without using the 'nokgdbroundup' workaround.

Note that the 32-bit version is different and this patch does not enable that.

CC: Lisa Nguyen <[email protected]>
CC: Ben Guthro <[email protected]>
CC: Zhenzhong Duan <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
[v1: Fixed up per David Vrabel comments]
Reviewed-by: Ben Guthro <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
2013-08-09  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi  (Raghavendra K T; 1 file, -0/+1)
These are needed by both guest and host.

Originally-from: Srivatsa Vaddagiri <[email protected]>
Signed-off-by: Raghavendra K T <[email protected]>
Link: http://lkml.kernel.org/r/1376058122-8248-13-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Acked-by: Gleb Natapov <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, ticketlock: Add slowpath logic  (Jeremy Fitzhardinge; 3 files, -25/+65)
Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (i.e., no contention).

In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker:

    Unlocker                        Locker
                                    test for lock pickup
                                            -> fail
    unlock
    test slowpath
            -> false
                                    set slowpath flags
                                    block

Whereas this works in any ordering:

    Unlocker                        Locker
                                    set slowpath flags
                                    test for lock pickup
                                            -> fail
                                    block
    unlock
    test slowpath
            -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (i.e., head == tail, so nobody is waiting), then it clears the slowpath flag.

The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier so that it's safe to subsequently read back the slowflag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked "add" is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't, the generated code isn't too bad, but it's definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in. Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Srivatsa Vaddagiri <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Cc: Stephan Diestelhorst <[email protected]>
Signed-off-by: Raghavendra K T <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
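A compilable userspace model of the flag-in-tail scheme and the unlock barrier described above (the kick hook and the spin body are stubbed assumptions; the real kernel code lives in asm/spinlock.h):

    #include <stdatomic.h>
    #include <stdio.h>

    #define TICKET_LOCK_INC       2u  /* tickets advance by 2 ...      */
    #define TICKET_SLOWPATH_FLAG  1u  /* ... leaving the LSB as a flag */

    struct ticketlock { _Atomic unsigned head, tail; };

    static void kick_waiter(void) { puts("kick"); }  /* pvop stand-in */

    static void lock(struct ticketlock *l)
    {
            unsigned me = atomic_fetch_add(&l->tail, TICKET_LOCK_INC)
                          & ~TICKET_SLOWPATH_FLAG;

            while (atomic_load(&l->head) != me)
                    ;  /* real code: after spinning too long, set the
                        * slowpath flag in l->tail, re-test, then block */
    }

    static void unlock(struct ticketlock *l)
    {
            /* fetch_add is a locked RMW, i.e. a full barrier: the flag
             * read below cannot be forwarded early from a store buffer. */
            atomic_fetch_add(&l->head, TICKET_LOCK_INC);

            if (atomic_load(&l->tail) & TICKET_SLOWPATH_FLAG)
                    kick_waiter();  /* or clear the flag if uncontended */
    }

    int main(void)
    {
            struct ticketlock l = { 0, 0 };

            lock(&l);
            unlock(&l);  /* fastpath: flag never set, no kick */
            return 0;
    }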
2013-08-09  x86, pvticketlock: When paravirtualizing ticket locks, increment by 2  (Jeremy Fitzhardinge; 2 files, -6/+14)
Increment ticket head/tails by 2 rather than 1 to leave the LSB free to store a "is in slowpath state" bit. This halves the number of possible CPUs for a given ticket size, but this shouldn't matter in practice - kernels built for 32k+ CPU systems are probably specially built for the hardware rather than a generic distro kernel.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Link: http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Tested-by: Attilio Rao <[email protected]>
Signed-off-by: Raghavendra K T <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, pvticketlock: Use callee-save for lock_spinning  (Jeremy Fitzhardinge; 2 files, -2/+2)
Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Tested-by: Attilio Rao <[email protected]>
Signed-off-by: Raghavendra K T <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, ticketlock: Collapse a layer of functions  (Jeremy Fitzhardinge; 1 file, -30/+5)
Now that the paravirtualization layer doesn't exist at the spinlock level any more, we can collapse the __ticket_ functions into the arch_ functions.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Link: http://lkml.kernel.org/r/1376058122-8248-4-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Tested-by: Attilio Rao <[email protected]>
Signed-off-by: Raghavendra K T <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-09  x86, spinlock: Replace pv spinlocks with pv ticketlocks  (Jeremy Fitzhardinge; 4 files, -46/+57)
Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow path (long spin on lock, unlocking a contended lock).

Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:

 - __ticket_spin_lock, which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder.

 - __ticket_spin_unlock, which on releasing a contended lock (there are more cpus with tail tickets), looks to see if the next cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away.

Results:
=======
setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12

+-----------------+----------------+--------+
 dbench (Throughput in MB/sec. Higher is better)
+-----------------+----------------+--------+
|  base (stdev %) |patched(stdev%) |  %gain |
+-----------------+----------------+--------+
| 15035.3   (0.3) | 15150.0  (0.6) |   0.8  |
|  1470.0   (2.2) |  1713.7  (1.9) |  16.6  |
|   848.6   (4.3) |   967.8  (4.3) |  14.0  |
|   652.9   (3.5) |   685.3  (3.7) |   5.0  |
+-----------------+----------------+--------+

pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases, and undercommit results are flat.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Tested-by: Attilio Rao <[email protected]>
[ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t]
Signed-off-by: Raghavendra K T <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
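A sketch of the hooked slowpath the commit describes: spin up to a threshold, then invoke the lock_spinning pvop. The SPIN_THRESHOLD value and the pvop plumbing are simplified assumptions; the no-op stubs mirror the CONFIG_PARAVIRT_SPINLOCKS=n case.

    #include <stdatomic.h>

    #define SPIN_THRESHOLD (1 << 15)   /* spins before the slowpath */

    struct tickets { _Atomic unsigned short head, tail; };

    /* pvops hooks; with CONFIG_PARAVIRT_SPINLOCKS=n these stubs make
     * all the extra code compile away. */
    static void lock_spinning(struct tickets *l, unsigned short t)
    { (void)l; (void)t; }
    static void unlock_kick(struct tickets *l, unsigned short t)
    { (void)l; (void)t; }

    static void ticket_spin_lock(struct tickets *lock)
    {
            unsigned short me = atomic_fetch_add(&lock->tail, 1);

            for (;;) {
                    unsigned count = SPIN_THRESHOLD;

                    do {
                            if (atomic_load(&lock->head) == me)
                                    return;           /* lock taken */
                    } while (--count);

                    /* Spun too long; the holder was probably
                     * descheduled. Block until kicked. */
                    lock_spinning(lock, me);
            }
    }

    static void ticket_spin_unlock(struct tickets *lock)
    {
            unsigned short next = atomic_fetch_add(&lock->head, 1) + 1;

            if (atomic_load(&lock->tail) != next)  /* still contended? */
                    unlock_kick(lock, next);       /* wake next waiter */
    }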
2013-08-07  x86, relocs: Move ELF relocation handling to C  (Kees Cook; 3 files, -5/+7)
Moves the relocation handling into C, after decompression. This requires that the decompressed size is passed to the decompression routine as well so that relocations can be found. Only kernels that need relocation support will use the code (currently just x86_32), but this is laying the ground work for 64-bit using it in support of KASLR.

Based on work by Neill Clift and Michael Davidson.

Signed-off-by: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Acked-by: Zhang Yanfei <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-07  nEPT: Nested INVEPT  (Nadav Har'El; 2 files, -0/+3)
If we let L1 use EPT, we should probably also support the INVEPT instruction.

In our current nested EPT implementation, when L1 changes its EPT table for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in the course of this modification already calls INVEPT. But if the last level of the shadow page is unsynced, not all of L1's changes to EPT12 are intercepted, which means roots need to be synced when L1 calls INVEPT. Global INVEPT should not be different since roots are synced by kvm_mmu_load() each time EPTP02 changes.

Reviewed-by: Xiao Guangrong <[email protected]>
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Jun Nakajima <[email protected]>
Signed-off-by: Xinhao Xu <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-07  nEPT: Add nEPT violation/misconfiguration support  (Yang Zhang; 1 file, -0/+4)
Inject the nEPT fault into the L1 guest. This patch originally comes from Xinhao.

Reviewed-by: Xiao Guangrong <[email protected]>
Signed-off-by: Jun Nakajima <[email protected]>
Signed-off-by: Xinhao Xu <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2013-08-06  x86/jump-label: Use best default nops for initial jump label calls  (Steven Rostedt; 1 file, -2/+7)
As specified by H. Peter Anvin, the best nops for x86 without knowing the running computer are:

32-bit:
    0x3e, 0x8d, 0x74, 0x26, 0x00
    also known as GENERIC_NOP5_ATOMIC

64-bit:
    0x0f, 0x1f, 0x44, 0x00, 0x00
    also known as P6_NOP5_ATOMIC

Currently the default nop that is used by jump label is:
    0xe9 0x00 0x00 0x00 0x00
which is really a 5-byte jump to the next position. It's better to use a real nop than a jmp.

Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
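The byte sequences above as they might be spelled in C; the array names here are illustrative, while the kernel's canonical definitions live in arch/x86/include/asm/nops.h.

    /* 5-byte atomic NOPs quoted in the commit message. */
    static const unsigned char nop5_32bit[5] =
            { 0x3e, 0x8d, 0x74, 0x26, 0x00 }; /* GENERIC_NOP5_ATOMIC:
                                                 ds-prefixed lea */
    static const unsigned char nop5_64bit[5] =
            { 0x0f, 0x1f, 0x44, 0x00, 0x00 }; /* P6_NOP5_ATOMIC:
                                                 nopl 0x0(%rax,%rax,1) */
    static const unsigned char jmp5_old_default[5] =
            { 0xe9, 0x00, 0x00, 0x00, 0x00 }; /* jmp +0: the old "nop" */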
2013-08-06  x86, asmlinkage, vdso: Mark vdso variables __visible  (Andi Kleen; 1 file, -1/+1)
Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make 64bit checksum functions visible  (Andi Kleen; 1 file, -1/+1)
They are implemented in assembler.

Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops  (Andi Kleen; 1 file, -1/+2)
Cc: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make several variables used from assembler/linker script visible  (Andi Kleen; 3 files, -3/+4)
Plus one function, load_gs_index().

Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make kprobes code visible and fix assembler code  (Andi Kleen; 1 file, -5/+5)
- Make all the external assembler template symbols __visible
- Move the templates' inline assembler code into a top level assembler statement, not inside a function. This avoids it being optimized away or cloned.

Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make various syscalls asmlinkage  (Andi Kleen; 1 file, -3/+3)
FWIW I suspect sys_rt_sigreturn/sys_sigreturn should use standard SYSCALL wrappers. But I didn't do that change in this patch.

Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make 32bit/64bit __switch_to visible  (Andi Kleen; 1 file, -2/+2)
This function is called from inline assembler, so it has to be visible.

Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make _*_start_kernel visible  (Andi Kleen; 1 file, -3/+5)
Obviously these functions have to be visible, otherwise the whole kernel could be optimized away.

Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible  (Andi Kleen; 2 files, -63/+59)
These handlers are all referenced from assembler stubs, so they need to be visible. The handlers without arguments become asmlinkage, the others __visible to not force regparms(0) on x86-32.

I put it all into a single patch, please let me know if you want it split up.

Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86, asmlinkage: Change dotraplinkage into __visible on 32bit  (Andi Kleen; 1 file, -5/+1)
Mark 32bit dotraplinkage functions as __visible for LTO. 64bit already is using asmlinkage which includes it.

v2: Clean up (M.Marek)

Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-06  x86: Fix sys_call_table type in asm/syscall.h  (Andi Kleen; 1 file, -1/+2)
Make the sys_call_table type defined in asm/syscall.h match the definition in syscall_64.c.

v2: include asm/syscall.h in syscall_64.c too. I left uml alone because it doesn't have a syscall.h of its own and including the native one leads to other errors.

Signed-off-by: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: Richard Weinberger <[email protected]>
2013-08-05  x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors  (Tony Luck; 1 file, -2/+11)
The 0x1000 bit of the MCACOD field of machine check MCi_STATUS registers is only defined for corrected errors (where it means that hardware may be filtering errors; see SDM section 15.9.2.1). For uncorrected errors it may, or may not, be set - so we should mask it out when checking for the architecturally defined recoverable error signatures (see SDM sections 15.9.3.1 and 15.9.3.2).

Acked-by: Naveen N. Rao <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
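A sketch of the masking idea: drop bit 12 (0x1000, the 'F' bit) from MCACOD before comparing against a recoverable-error signature. The mask follows directly from the text; the signature constant and macro spellings are assumptions drawn from the SDM, not necessarily the commit's exact definitions.

    #include <stdint.h>
    #include <stdbool.h>

    /* MCACOD is bits 15:0 of MCi_STATUS, but bit 12 is the 'F'
     * (filtering) bit, defined only for corrected errors - mask it
     * out before matching architectural signatures. */
    #define MCACOD        0xefffu  /* 0xffff with the 'F' bit dropped */
    #define MCACOD_DATA   0x0134u  /* data load (SRAR), per the SDM */

    static bool is_data_load_srar(uint64_t mci_status)
    {
            return (mci_status & MCACOD) == MCACOD_DATA;
    }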
2013-08-05  x86: Correctly detect hypervisor  (Jason Wang; 1 file, -1/+1)
We try to handle the hypervisor compatibility mode by detecting hypervisors in a specific order. This is not robust, since hypervisors may implement each other's features.

This patch tries to handle this situation by always choosing the last one in the CPUID leaves. This is done by letting .detect() return a priority instead of true/false and just re-using the CPUID leaf where the signature was found as the priority (or 1 if it was found by DMI). Then we can just pick the hypervisor with the highest priority. Other sophisticated detection methods could also be implemented on top.

Suggested by H. Peter Anvin and Paolo Bonzini.

Acked-by: K. Y. Srinivasan <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
Cc: Doug Covelli <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Hecht <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Gleb Natapov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
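A sketch of the priority-based selection described above; the struct shape and detector wiring are simplified assumptions, with each .detect() returning the CPUID base where its signature was found (0 if absent).

    #include <stdint.h>
    #include <stddef.h>

    struct hypervisor_x86 {
            const char *name;
            uint32_t (*detect)(void);  /* priority; 0 = not present */
    };

    static const struct hypervisor_x86 *
    detect_hypervisor_vendor(const struct hypervisor_x86 *hv, size_t n)
    {
            const struct hypervisor_x86 *best = NULL;
            uint32_t max_pri = 0;

            for (size_t i = 0; i < n; i++) {
                    uint32_t pri = hv[i].detect();

                    /* Higher CPUID leaves win, so a hypervisor that
                     * also emulates another's interface is chosen. */
                    if (pri > max_pri) {
                            max_pri = pri;
                            best = &hv[i];
                    }
            }
            return best;
    }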
2013-08-05  x86, kvm: Switch to use hypervisor_cpuid_base()  (Jason Wang; 1 file, -15/+9)
Switch to use hypervisor_cpuid_base() to detect KVM.

Cc: Gleb Natapov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-05  xen: Switch to use hypervisor_cpuid_base()  (Jason Wang; 1 file, -15/+1)
Switch to use hypervisor_cpuid_base() to detect Xen.

Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-05  x86: Introduce hypervisor_cpuid_base()  (Jason Wang; 1 file, -0/+15)
This patch introduces hypervisor_cpuid_base(), which loops over the hypervisor CPUID leaves until the signature matches, and checks the number of leaves if required. This can be used by Xen/KVM guests to detect the existence of a hypervisor.

Cc: Paolo Bonzini <[email protected]>
Cc: Gleb Natapov <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
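The approximate shape of the helper, rendered for userspace with GCC's <cpuid.h> (the kernel version uses its own cpuid() wrapper): scan the 0x40000000..0x4000ffff range in 0x100 steps for a 12-byte signature, optionally requiring a minimum number of leaves.

    #include <cpuid.h>
    #include <stdint.h>
    #include <string.h>

    static uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
    {
            for (uint32_t base = 0x40000000; base < 0x40010000;
                 base += 0x100) {
                    uint32_t eax, signature[3];

                    __cpuid(base, eax, signature[0], signature[1],
                            signature[2]);

                    /* signature lives in EBX/ECX/EDX; EAX reports the
                     * highest leaf in this hypervisor range */
                    if (!memcmp(sig, signature, 12) &&
                        (leaves == 0 || eax - base >= leaves))
                            return base;
            }
            return 0;
    }

A caller would then detect, say, KVM with something like hypervisor_cpuid_base("KVMKVMKVM\0\0\0", 0), the returned base doubling as the detection priority from the previous commit.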
2013-08-02  x86: sysfb: move EFI quirks from efifb to sysfb  (David Herrmann; 1 file, -0/+57)
The EFI FB quirks from efifb.c are useful for simple-framebuffer devices as well. Apply them by default so we can convert efifb.c to use efi-framebuffer platform devices.

Signed-off-by: David Herrmann <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-02  x86: provide platform-devices for boot-framebuffers  (David Herrmann; 1 file, -0/+41)
The current situation regarding boot-framebuffers (VGA, VESA/VBE, EFI) on x86 causes troubles when loading multiple fbdev drivers. The global "struct screen_info" does not provide any state-tracking about which drivers use the FBs. request_mem_region() theoretically works, but unfortunately vesafb/efifb ignore it due to quirks for broken boards.

Avoid this by creating a platform framebuffer device with a pointer to the "struct screen_info" as platform-data. Drivers can now create platform-drivers and the driver-core will refuse multiple drivers being active simultaneously. We keep the screen_info available for backwards-compatibility. Drivers can be converted in follow-up patches.

Different devices are created for VGA/VESA/EFI FBs to allow multiple drivers to be loaded on distro kernels. We create:

 - "vesa-framebuffer" for VBE/VESA graphics FBs
 - "efi-framebuffer" for EFI FBs
 - "platform-framebuffer" for everything else

This allows to load vesafb, efifb and others simultaneously and each picks up only the supported FB types.

Apart from platform-framebuffer devices, this also introduces a compatibility option for "simple-framebuffer" drivers which recently got introduced for OF based systems. If CONFIG_X86_SYSFB is selected, we try to match the screen_info against a simple-framebuffer supported format. If we succeed, we create a "simple-framebuffer" device instead of a platform-framebuffer. This allows to reuse the simplefb.c driver across architectures and also to introduce a SimpleDRM driver. There is no need to have vesafb.c, efifb.c, simplefb.c and more just to have architecture specific quirks in their setup-routines. Instead, we now move the architecture specific quirks into x86-setup and provide a generic simple-framebuffer. For backwards-compatibility (if strange formats are used), we still allow vesafb/efifb to be loaded simultaneously and pick up all remaining devices.

Signed-off-by: David Herrmann <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Tested-by: Stephen Warren <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-01  sched/x86: Optimize switch_mm() for multi-threaded workloads  (Rik van Riel; 1 file, -7/+13)
Dick Fowles, Don Zickus and Joe Mario have been working on improvements to perf, and noticed heavy cache line contention on the mm_cpumask, running linpack on a 60 core / 120 thread system.

The cause turned out to be unnecessary atomic accesses to the mm_cpumask. When in lazy TLB mode, the CPU is only removed from the mm_cpumask if there is a TLB flush event. Most of the time, no such TLB flush happens, and the kernel skips the TLB reload. It can also skip the atomic memory set & test.

Here is a summary of Joe's test results:

 * The __schedule function dropped from 24% of all program cycles down to 5.5%.
 * The cacheline contention/hotness for accesses to that bitmask went from being the 1st/2nd hottest - down to the 84th hottest (0.3% of all shared misses which is now quite cold)
 * The average load latency for the bit-test-n-set instruction in __schedule dropped from 10k-15k cycles down to an average of 600 cycles.
 * The linpack program results improved from 133 GFlops to 144 GFlops. Peak GFlops rose from 133 to 153.

Reported-by: Don Zickus <[email protected]>
Reported-by: Joe Mario <[email protected]>
Tested-by: Joe Mario <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Reviewed-by: Paul Turner <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Made the comments consistent around the modified code. ]
Signed-off-by: Ingo Molnar <[email protected]>
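A compilable model of the optimization (not the verbatim kernel diff): the cheap plain load usually hits a shared cache line, and the atomic read-modify-write, which forces exclusive cache-line ownership, runs only when the bit is genuinely missing.

    #include <stdatomic.h>

    /* Stand-in for one word of mm_cpumask. */
    static _Atomic unsigned long cpumask;

    static void mark_cpu_active(unsigned cpu)
    {
            unsigned long bit = 1ul << cpu;

            /* Fast path: a relaxed load keeps the line shared. */
            if (!(atomic_load_explicit(&cpumask, memory_order_relaxed)
                  & bit))
                    atomic_fetch_or(&cpumask, bit);  /* rare slow path */
    }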
2013-07-29  x86 / cpu topology: remove the stale macro arch_provides_topology_pointers  (Hanjun Guo; 1 file, -3/+0)
Macro arch_provides_topology_pointers is pointless now, remove it.

Signed-off-by: Hanjun Guo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-07-29  KVM: x86: rename EMULATE_DO_MMIO  (Paolo Bonzini; 1 file, -2/+2)
The next patch will reuse it for other userspace exits than MMIO, namely debug events.

Signed-off-by: Paolo Bonzini <[email protected]>
2013-07-26  cpufreq: Remove unused APERF/MPERF support  (Stratos Karafotis; 1 file, -29/+0)
The target frequency calculation method in the ondemand governor has changed and it is now independent of the measured average frequency. Consequently, the APERF/MPERF support in cpufreq is not used any more, so drop it.

[rjw: Changelog]
Signed-off-by: Stratos Karafotis <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
2013-07-23  perf/x86: Add ability to calculate TSC from perf sample timestamps  (Adrian Hunter; 1 file, -0/+1)
For modern CPUs, perf clock is directly related to TSC. TSC can be calculated from perf clock and vice versa using a simple calculation. Two of the three components of that calculation are already exported in struct perf_event_mmap_page. This patch exports the third.

Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
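A sketch of the conversion in both directions using the three exported fields (time_mult, time_shift, and the newly exported time_zero); the split multiply avoids overflowing the 64-bit intermediate product. Treat this as an illustration of the scheme rather than the exact kernel arithmetic.

    #include <stdint.h>

    static uint64_t tsc_to_perf_time(uint64_t cyc, uint32_t time_mult,
                                     uint16_t time_shift,
                                     uint64_t time_zero)
    {
            uint64_t quot = cyc >> time_shift;
            uint64_t rem  = cyc & (((uint64_t)1 << time_shift) - 1);

            return time_zero + quot * time_mult +
                   ((rem * time_mult) >> time_shift);
    }

    static uint64_t perf_time_to_tsc(uint64_t ns, uint32_t time_mult,
                                     uint16_t time_shift,
                                     uint64_t time_zero)
    {
            uint64_t t    = ns - time_zero;
            uint64_t quot = t / time_mult;
            uint64_t rem  = t % time_mult;

            return (quot << time_shift) +
                   (rem << time_shift) / time_mult;
    }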
2013-07-23  kprobes/x86: Call out into INT3 handler directly instead of using notifier  (Jiri Kosina; 1 file, -0/+2)
In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction patching"), the mechanism that was introduced for notifying alternatives code from int3 exception handler that and exception occured was die_notifier. This is however problematic, as early code might be using jump labels even before the notifier registration has been performed, which will then lead to an oops due to unhandled exception. One of such occurences has been encountered by Fengguang: int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8 task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000 RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225 RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006 RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40 RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002 R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0 R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8 FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0 Stack: ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48 ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68 ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98 Call Trace: <IRQ> [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79 [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84 [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0 [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41 [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80 <EOI> [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1 [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16 [<ffffffff81015f10>] default_idle+0x147/0x282 [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84 [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5 [...] Fix this by directly calling poke_int3_handler() from the int3 exception handler (analogically to what ftrace has been doing already), instead of relying on notifier, registration of which might not have yet been finalized by the time of the first trap. Reported-and-tested-by: Fengguang Wu <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-07-19  perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5  (Andi Kleen; 1 file, -0/+1)
[KVM maintainers: The underlying support for this is in perf/core now. So please merge this patch into the KVM tree.]

This is not arch perfmon, but older CPUs will just ignore it. This makes it possible to do at least some TSX measurements from a KVM guest.

v2: Various fixes to address review feedback
v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits.
v4: Use reserved bits for #GP
v5: Remove obsolete argument

Acked-by: Gleb Natapov <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2013-07-19  kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions  (Masami Hiramatsu; 1 file, -11/+0)
Since introducing the text_poke_bp() for all text_poke_smp*() callers, text_poke_smp*() are now unused. This patch basically reverts:

  3d55cc8a058e ("x86: Add text_poke_smp for SMP cross modifying code")
  7deb18dcf047 ("x86: Introduce text_poke_smp_batch() for batch-code modifying")

and related commits.

This patch also fixes a Kconfig dependency issue on STOP_MACHINE in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD.

Signed-off-by: Masami Hiramatsu <[email protected]>
Reviewed-by: Jiri Kosina <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: [email protected]
Cc: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/20130718114753.26675.18714.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <[email protected]>
2013-07-19  Merge branch 'x86/jumplabel' into perf/core  (Ingo Molnar; 1 file, -0/+1)
Upcoming kprobes patches rely on the int3 code-patching machinery introduced by:

  fd4363fff3d9 ("x86: Introduce int3 (breakpoint)-based instruction patching")

Signed-off-by: Ingo Molnar <[email protected]>
2013-07-18  remove sched notifier for cross-cpu migrations  (Marcelo Tosatti; 1 file, -1/+0)
Linux as a guest on the KVM hypervisor, the only user of the pvclock vsyscall interface, does not require notification on task migration because:

 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
 2. per-CPU pvclock time info is updated if the underlying CPU changes.
 3. that version is increased whenever the underlying CPU changes.

Which is sufficient to guarantee the nanoseconds counter is calculated properly.

Signed-off-by: Marcelo Tosatti <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
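Why point 3 makes a migration notifier unnecessary, as a sketch: pvclock readers retry until the version field is stable and even, so a CPU change that bumps the version is caught by the read loop itself. The struct layout follows the pvclock ABI; the truncated 64-bit scaling here is a simplification of the kernel's 128-bit math.

    #include <stdint.h>

    struct pvclock_vcpu_time_info {
            uint32_t version;          /* odd while being updated */
            uint32_t pad0;
            uint64_t tsc_timestamp;
            uint64_t system_time;
            uint32_t tsc_to_system_mul;
            int8_t   tsc_shift;
            uint8_t  flags;
            uint8_t  pad[2];
    };

    static uint64_t
    pvclock_read(const volatile struct pvclock_vcpu_time_info *ti,
                 uint64_t (*rdtsc_fn)(void))
    {
            uint32_t version;
            uint64_t ns;

            do {
                    version = ti->version;
                    __atomic_thread_fence(__ATOMIC_ACQUIRE);

                    uint64_t delta = rdtsc_fn() - ti->tsc_timestamp;
                    if (ti->tsc_shift >= 0)
                            delta <<= ti->tsc_shift;
                    else
                            delta >>= -ti->tsc_shift;
                    ns = ti->system_time +
                         ((delta * ti->tsc_to_system_mul) >> 32);

                    __atomic_thread_fence(__ATOMIC_ACQUIRE);
            } while ((ti->version & 1) || version != ti->version);

            return ns;
    }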
2013-07-16  x86: Introduce int3 (breakpoint)-based instruction patching  (Jiri Kosina; 1 file, -0/+1)
Introduce a method for run-time instruction patching on a live SMP kernel based on the int3 breakpoint, completely avoiding the need for stop_machine().

The way this is achieved:

 - add an int3 trap to the address that will be patched
 - sync cores
 - update all but the first byte of the patched range
 - sync cores
 - replace the first byte (int3) by the first byte of the replacing opcode
 - sync cores

According to http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html synchronization after replacing "all but first" instructions should not be necessary (on Intel hardware), as the syncing after the subsequent patching of the first byte provides enough safety. But there's not only Intel HW out there, and we'd rather be on the safe side.

If any CPU instruction execution would collide with the patching, it'd be trapped by the int3 breakpoint and redirected to the provided "handler" (which would typically mean just skipping over the patched region, acting as if a "nop" had been there, in case we are doing nop -> jump and jump -> nop transitions).

Ftrace has been using this very technique since 08d636b ("ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine") for ages already, and jump labels are another obvious potential user of this.

Based on activities of Masami Hiramatsu <[email protected]> a few years ago.

Reviewed-by: Steven Rostedt <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
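A skeleton of the step list above as code. The text_poke()/on_each_cpu() helpers are real kernel interfaces, declared here so the sketch stands alone, but this condensed body is an illustration, not the verbatim implementation.

    #include <stdbool.h>
    #include <stddef.h>

    extern void *text_poke(void *addr, const void *opcode, size_t len);
    extern void on_each_cpu(void (*func)(void *), void *info, int wait);
    extern void do_sync_core(void *info);

    static void *bp_int3_handler;   /* where trapped CPUs resume */
    static void *bp_int3_addr;
    static bool  bp_patching_in_progress;

    static void text_poke_bp_sketch(void *addr, const void *opcode,
                                    size_t len, void *handler)
    {
            unsigned char int3 = 0xcc;

            bp_int3_handler = handler;
            bp_int3_addr = (char *)addr + 1;
            bp_patching_in_progress = true;

            text_poke(addr, &int3, 1);            /* add the int3 trap */
            on_each_cpu(do_sync_core, NULL, 1);   /* sync cores        */

            if (len > 1) {                        /* all but first byte */
                    text_poke((char *)addr + 1,
                              (const char *)opcode + 1, len - 1);
                    on_each_cpu(do_sync_core, NULL, 1);
            }

            text_poke(addr, opcode, 1);           /* replace the int3  */
            on_each_cpu(do_sync_core, NULL, 1);   /* final sync        */

            bp_patching_in_progress = false;
    }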
2013-07-16  x86, bitops: Change bitops to be native operand size  (H. Peter Anvin; 2 files, -31/+39)
Change the bitops operations to be naturally "long", i.e. 63 bits on the 64-bit kernel. Additional bugs are likely to crop up in the future.

We already have bugs on machines with > 16 TiB of memory in a single node, as can happen if memory is interleaved. The x86 bitop operations take a signed index, so using an unsigned type is not an option.

Jim Kukunas measured the effect of this patch on kernel size: it adds 2779 bytes to the allyesconfig kernel. Some of that probably could be elided by replacing the inline functions with macros which select the 32-bit type if the index is a 32-bit value. In that case we could also use "Jr" constraints for the 64-bit version. However, this would more than double the amount of code for a relatively small gain.

Note that we can't use ilog2() for _BITOPS_LONG_SHIFT, as that causes a recursive header inclusion problem.

The change to constant_test_bit() should both generate better code and give the correct result for negative bit indices. As previously written, the compiler had to generate extra code to create the proper wrong result for negative values.

Signed-off-by: H. Peter Anvin <[email protected]>
Cc: Jim Kukunas <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
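A sketch of the native-width constant_test_bit() form the message refers to, assuming the 64-bit case: a signed long index plus an arithmetic right shift (as on the usual implementations) yields a well-defined word offset even for negative bit numbers.

    #include <stdbool.h>

    #define BITS_PER_LONG        64  /* assuming the 64-bit kernel */
    #define _BITOPS_LONG_SHIFT    6  /* log2(BITS_PER_LONG)        */

    static bool constant_test_bit(long nr,
                                  const volatile unsigned long *addr)
    {
            return ((1UL << (nr & (BITS_PER_LONG - 1))) &
                    addr[nr >> _BITOPS_LONG_SHIFT]) != 0;
    }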
2013-07-14  x86: delete __cpuinit usage from all x86 files  (Paul Gortmaker; 10 files, -16/+16)
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) and are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Acked-by: Ingo Molnar <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: H. Peter Anvin <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
2013-07-11  Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux  (Linus Torvalds; 1 file, -0/+7)
Pull thermal management updates from Zhang Rui:
 "There are not too many changes this time, except two new platform thermal drivers, ti-soc-thermal driver and x86_pkg_temp_thermal driver, and a couple of small fixes.

  Highlights:

   - move the ti-soc-thermal driver out of the staging tree to the thermal tree.

   - introduce the x86_pkg_temp_thermal driver. This driver registers CPU digital temperature package level sensor as a thermal zone.

   - small fixes/cleanups including removing redundant use of platform_set_drvdata() and of_match_ptr for all platform thermal drivers"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (34 commits)
  thermal: cpu_cooling: fix stub function
  thermal: ti-soc-thermal: use standard GPIO DT bindings
  thermal: MAINTAINERS: Add git tree path for SoC specific updates
  thermal: fix x86_pkg_temp_thermal.c build and Kconfig
  Thermal: Documentation for x86 package temperature thermal driver
  Thermal: CPU Package temperature thermal
  thermal: consider emul_temperature while computing trend
  thermal: ti-soc-thermal: add DT example for DRA752 chip
  thermal: ti-soc-thermal: add dra752 chip to device table
  thermal: ti-soc-thermal: add thermal data for DRA752 chips
  thermal: ti-soc-thermal: remove usage of IS_ERR_OR_NULL
  thermal: ti-soc-thermal: freeze FSM while computing trend
  thermal: ti-soc-thermal: remove external heat while extrapolating hotspot
  thermal: ti-soc-thermal: update DT reference for OMAP5430
  x86, mcheck, therm_throt: Process package thresholds
  thermal: cpu_cooling: fix 'descend' check in get_property()
  Thermal: spear: Remove redundant use of of_match_ptr
  Thermal: kirkwood: Remove redundant use of of_match_ptr
  Thermal: dove: Remove redundant use of of_match_ptr
  Thermal: armada: Remove redundant use of of_match_ptr
  ...
2013-07-09  Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux  (Linus Torvalds; 2 files, -1/+16)
Pull drm updates from Dave Airlie:
 "Okay this is the big one, I was stalled on the fbdev pull req as I stupidly let fbdev guys merge a patch I required to fix a warning with some patches I had; they ended up merging the patch from the wrong place, but the warning should be fixed. In future I'll just take the patch myself!

  Outside drm: there are some snd changes for the HDMI audio interactions on haswell; they've been acked for inclusion via my tree. This relies on the wound/wait tree from Ingo which is already merged.

  Major changes:

  AMD finally released the dynamic power management code for all their GPUs from r600 -> present day. This is great, off by default for now but also a huge amount of code; in fact it is most of this pull request. Since it landed there has been a lot of community testing and Alex has sent a lot of fixes for any bugs found so far. I suspect radeon might now be the biggest kernel driver ever :-P

  p.s. radeon.dpm=1 to enable dynamic powermanagement for anyone.

  New drivers:

  Renesas r-car display unit.

  Other highlights:

   - core: GEM CMA prime support, use new w/w mutexes for TTM reservations, cursor hotspot, doc updates
   - dvo chips: chrontel 7010B support
   - i915: Haswell (fbc, ips, vecs, watermarks, audio powerwell), Valleyview (enabled by default, rc6), lots of pll reworking, 30bpp support (this time for sure)
   - nouveau: async buffer object deletion, context/register init updates, kernel vp2 engine support, GF117 support, GK110 accel support (with external nvidia ucode), context cleanups
   - exynos: memory leak fixes, Add S3C64XX SoC series support, device tree updates, common clock framework support
   - qxl: cursor hotspot support, multi-monitor support, suspend/resume support
   - mgag200: hw cursor support, g200 mode limiting
   - shmobile: prime support
   - tegra: fixes mostly

  I've been banging on this quite a lot due to the size of it, and it seems to be okay on everything I've tested it on."

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (811 commits)
  drm/radeon/dpm: implement vblank_too_short callback for si
  drm/radeon/dpm: implement vblank_too_short callback for cayman
  drm/radeon/dpm: implement vblank_too_short callback for btc
  drm/radeon/dpm: implement vblank_too_short callback for evergreen
  drm/radeon/dpm: implement vblank_too_short callback for 7xx
  drm/radeon/dpm: add checks against vblank time
  drm/radeon/dpm: add helper to calculate vblank time
  drm/radeon: remove stray line in old pm code
  drm/radeon/dpm: fix display_gap programming on rv7xx
  drm/nvc0/gr: fix gpc firmware regression
  drm/nouveau: fix minor thinko causing bo moves to not be async on kepler
  drm/radeon/dpm: implement force performance level for TN
  drm/radeon/dpm: implement force performance level for ON/LN
  drm/radeon/dpm: implement force performance level for SI
  drm/radeon/dpm: implement force performance level for cayman
  drm/radeon/dpm: implement force performance levels for 7xx/eg/btc
  drm/radeon/dpm: add infrastructure to force performance levels
  drm/radeon: fix surface setup on r1xx
  drm/radeon: add support for 3d perf states on older asics
  drm/radeon: set default clocks for SI when DPM is disabled
  ...
2013-07-09  reboot: move arch/x86 reboot= handling to generic kernel  (Robin Holt; 1 file, -12/+0)
Merge together the unicore32, arm, and x86 reboot= command line parameter handling.

Signed-off-by: Robin Holt <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Russell King <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Robin Holt <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Acked-by: Guan Xuetao <[email protected]>
Acked-by: Russell King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2013-07-08  mce: acpi/apei: Add a boot option to disable ff mode for corrected errors  (Naveen N. Rao; 1 file, -0/+2)
Add a boot option to disable firmware first mode for corrected errors.

Signed-off-by: Naveen N. Rao <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
2013-07-08  mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC  (Naveen N. Rao; 1 file, -0/+3)
The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first. In this scenario, the OS is expected to not monitor for corrected errors (through CMCI/polling). Instead, the firmware notifies the OS on corrected error events through GHES.

Linux already has support for GHES. This patch adds support for parsing the CMC structure and disabling CMCI/polling if the firmware first flag is set. Further, the list of machine check bank structures at the end of the CMC is used to determine which MCA banks function in FF mode, so that we continue to monitor error events on the other banks.

Signed-off-by: Naveen N. Rao <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
2013-07-06  Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 3 files, -6/+8)
Pull timer core updates from Thomas Gleixner:
 "The timer changes contain:

   - posix timer code consolidation and fixes for odd corner cases
   - sched_clock implementation moved from ARM to core code to avoid duplication by other architectures
   - alarm timer updates
   - clocksource and clockevents unregistration facilities
   - clocksource/events support for new hardware
   - precise nanoseconds RTC readout (Xen feature)
   - generic support for Xen suspend/resume oddities
   - the usual lot of fixes and cleanups all over the place

  The parts which touch other areas (ARM/XEN) have been coordinated with the relevant maintainers. Though this results in a handful of trivial to solve merge conflicts, which we preferred over nasty cross tree merge dependencies.

  The patches which have been committed in the last few days are bug fixes plus the posix timer lot. The latter was in akpm's queue and next for quite some time; they just got forgotten and Frederic collected them last minute."

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
  hrtimer: Remove unused variable
  hrtimers: Move SMP function call to thread context
  clocksource: Reselect clocksource when watchdog validated high-res capability
  posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  posix_timers: fix racy timer delta caching on task exit
  posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
  selftests: add basic posix timers selftests
  posix_cpu_timers: consolidate expired timers check
  posix_cpu_timers: consolidate timer list cleanups
  posix_cpu_timer: consolidate expiry time type
  tick: Sanitize broadcast control logic
  tick: Prevent uncontrolled switch to oneshot mode
  tick: Make oneshot broadcast robust vs. CPU offlining
  x86: xen: Sync the CMOS RTC as well as the Xen wallclock
  x86: xen: Sync the wallclock when the system time is set
  timekeeping: Indicate that clock was set in the pvclock gtod notifier
  timekeeping: Pass flags instead of multiple bools to timekeeping_update()
  xen: Remove clock_was_set() call in the resume path
  hrtimers: Support resuming with two or more CPUs online (but stopped)
  timer: Fix jiffies wrap behavior of round_jiffies_common()
  ...