path: root/arch/x86
Age  Commit message  Author  Files  Lines
2016-03-04KVM: MMU: check kvm_mmu_pages and mmu_page_path indicesXiao Guangrong1-1/+6
Give a special invalid index to the root of the walk, so that we can check the consistency of kvm_mmu_pages and mmu_page_path. Signed-off-by: Xiao Guangrong <[email protected]> [Extracted from a bigger patch proposed by Guangrong. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: MMU: Fix ubsan warningsPaolo Bonzini1-24/+33
kvm_mmu_pages_init is doing some really yucky stuff. It is setting up a sentinel for mmu_pages_clear_parents; however, because of a) the way levels are numbered starting from 1 and b) the way mmu_page_path sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be out of bounds. This is harmless because the code overwrites up to the first two elements of parents->idx and these are initialized, and because the sentinel is not needed in this case: mmu_pages_clear_parents exits anyway when it gets to the end of the array. However ubsan complains, and everyone else should too. This fix does three things. First it makes the mmu_page_path arrays PT64_ROOT_LEVEL elements in size, so that we can write to them without checking the level in advance. Second it splits kvm_mmu_pages_init between mmu_unsync_walk (to reset the struct kvm_mmu_pages) and for_each_sp (to place the NULL sentinel at the end of the current path). This is okay because the mmu_page_path is only used in mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within a for_each_sp iterator, and hence always after a call to mmu_pages_next. Third it changes mmu_pages_clear_parents to just use the sentinel to stop iteration, without checking the bounds on level. Reported-by: Sasha Levin <[email protected]> Reported-by: Mike Krinkin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
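The sizing and sentinel idea described in this message can be sketched in plain userspace C. This is a simplified illustration under stated assumptions, not the kernel code: `ROOT_LEVEL`, `struct page`, `struct page_path`, `path_set_root()` and `path_count_parents()` are hypothetical stand-ins for PT64_ROOT_LEVEL, kvm_mmu_page, mmu_page_path and the walk helpers.

```c
#include <assert.h>
#include <stddef.h>

#define ROOT_LEVEL 4   /* hypothetical stand-in for PT64_ROOT_LEVEL */

struct page { int level; };   /* hypothetical, heavily simplified */

struct page_path {
    /* ROOT_LEVEL slots: index (level - 1) is valid even for the root level.
     * With only ROOT_LEVEL - 1 slots that index would be out of bounds. */
    struct page *parent[ROOT_LEVEL];
};

/* Record the root of the walk and place the NULL sentinel right above it. */
static void path_set_root(struct page_path *path, struct page *root)
{
    assert(root->level >= 2 && root->level <= ROOT_LEVEL);
    path->parent[root->level - 2] = root;
    path->parent[root->level - 1] = NULL;   /* in bounds thanks to the sizing */
}

/* Walk upward from slot `from` until the sentinel; no bound check on level. */
static int path_count_parents(const struct page_path *path, int from)
{
    int n = 0;
    for (int i = from; path->parent[i] != NULL; i++)
        n++;
    return n;
}
```

Because the sentinel always fits in the array, the loop can rely on it alone to terminate, which is the property the third step of the fix depends on.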
2016-03-04KVM: MMU: cleanup handle_abnormal_pfnPaolo Bonzini1-6/+2
The goto and temporary variable are unnecessary, just use return statements. Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: VMX: use vmcs_clear/set_bits for debug register exitsPaolo Bonzini1-11/+3
Reviewed-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04Merge tag 'v4.5-rc6' into core/resources, to resolve conflictIngo Molnar39-204/+451
Signed-off-by: Ingo Molnar <[email protected]>
2016-03-04KVM: i8254: turn kvm_kpit_state.reinject into atomic_tRadim Krčmář2-5/+5
Document possible races between readers and concurrent update to the ioctl. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: move PIT timer function initializationRadim Krčmář1-2/+1
We can do it just once. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: don't assume layout of kvm_kpit_stateRadim Krčmář1-4/+8
channels has offset 0 and correct size now, but that can change. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: remove pointless dereference of PITRadim Krčmář1-2/+2
PIT is known at that point. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: remove pit and kvm from kvm_kpit_stateRadim Krčmář2-7/+9
kvm isn't ever used and pit can be accessed with container_of. If you *really* need kvm, pit_state_to_pit(ps)->kvm. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
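The container_of approach mentioned here can be sketched in userspace C. The struct layouts below are heavily simplified assumptions (the real structs carry many more fields), and the macro is a plain userspace equivalent of the kernel's container_of() without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), without type checks. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct kvm;   /* opaque in this sketch */

struct kvm_kpit_state {   /* simplified: no back-pointers to pit or kvm */
    int reinject;
};

struct kvm_pit {          /* simplified: assumed to embed the state by value */
    struct kvm *kvm;
    struct kvm_kpit_state pit_state;
};

/* Recover the enclosing kvm_pit from a pointer to its embedded state. */
static struct kvm_pit *pit_state_to_pit(struct kvm_kpit_state *ps)
{
    return container_of(ps, struct kvm_pit, pit_state);
}
```

Since the state is embedded in the PIT, the back-pointer costs nothing at runtime beyond a constant subtraction, and `pit_state_to_pit(ps)->kvm` remains available for the rare caller that needs it.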
2016-03-04KVM: i8254: refactor kvm_free_pitRadim Krčmář1-13/+11
Could be easier to read, but git history will become deeper. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: refactor kvm_create_pitRadim Krčmář1-15/+12
Locks are gone, so we don't need to duplicate error paths. Use goto everywhere. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: remove notifiers from PIT discard policyRadim Krčmář3-13/+38
Discard policy doesn't rely on information from notifiers, so we don't need to register notifiers unconditionally. We kept correct counts in case userspace switched between policies during runtime, but that can be avoided by resetting the state. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: remove unnecessary uses of PIT state lockRadim Krčmář3-14/+8
- kvm_create_pit had to lock only because it exposed kvm->arch.vpit very early, but initialization doesn't use kvm->arch.vpit since the last patch, so we can drop locking.
- kvm_free_pit is only run after there are no users of KVM and therefore is the sole actor.
- Locking in kvm_vm_ioctl_reinject doesn't do anything, because reinject is only protected at that place.
- kvm_pit_reset isn't used anywhere and its locking can be dropped if we hide it.
Removing useless locking makes it possible to see what is actually being protected by the PIT state lock (values accessible from the guest). Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: pass struct kvm_pit instead of kvm in PITRadim Krčmář3-72/+70
This patch passes struct kvm_pit into internal PIT functions. Those functions used to get PIT through kvm->arch.vpit, even though most of them never used *kvm for other purposes. Another benefit is that we don't need to set kvm->arch.vpit during initialization. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: tone down WARN_ON pit.state_lockRadim Krčmář1-14/+3
If the guest could hit this, it would hang the host kernel because of the sheer number of those reports. Internal callers have to be sensible anyway, so we now only check for it in an API function. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: use atomic_t instead of pit.inject_lockRadim Krčmář2-35/+24
The lock was overkill; the same can be done with atomics. A mb() was added in kvm_pit_ack_irq, to pair with the implicit barrier between pit_timer_fn and pit_do_work. The mb() prevents a race that could happen if pending == 0 and irq_ack == 0:

  kvm_pit_ack_irq:                | pit_timer_fn:
   p = atomic_read(&ps->pending); |
                                  |  atomic_inc(&ps->pending);
                                  |  queue_work(pit_do_work);
                                  | pit_do_work:
                                  |  atomic_xchg(&ps->irq_ack, 0);
                                  |  return;
   atomic_set(&ps->irq_ack, 1);   |
   if (p == 0) return;            |

where the interrupt would not be delivered in this tick of pit_timer_fn. PIT would have eventually delivered the interrupt, but we sacrifice performance to make sure that interrupts are not needlessly delayed. sfence isn't enough: atomic_dec_if_positive does atomic_read first and x86 can reorder loads before stores. lfence isn't enough: store can pass lfence, turning it into a nop. A compiler barrier would be more than enough as CPU needs to stall for unbelievably long to use fences. This patch doesn't do anything in kvm_pit_reset_reinject, because any order of resets can race, but the result differs by at most one interrupt, which is ok, because it's the same result as if the reset happened at a slightly different time. (Original code didn't protect the reset path with a proper lock, so users have to be robust.) Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
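The message notes that atomic_dec_if_positive reads first and only then stores. A minimal userspace sketch of that read-then-conditionally-store shape, using C11 atomics with the default sequentially consistent ordering, might look as follows. The name `dec_if_positive` is hypothetical and this is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Decrement *v only if it is positive.
 * Returns the old value minus one; a negative result means no decrement. */
static int dec_if_positive(atomic_int *v)
{
    int c = atomic_load(v);   /* the initial read the message refers to */

    /* On CAS failure, c is refreshed with the current value and rechecked. */
    while (c > 0 && !atomic_compare_exchange_weak(v, &c, c - 1))
        ;
    return c - 1;
}
```

The initial load is an ordinary read that is not ordered against earlier stores by a store-only fence, which is the reason the message gives for sfence being insufficient.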
2016-03-04KVM: i8254: add kvm_pit_reset_reinjectRadim Krčmář1-8/+10
pit_state.pending and pit_state.irq_ack are always reset at the same time. Create a function for them. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: simplify atomics in kvm_pit_ack_irqRadim Krčmář1-11/+1
We already have a helper that does the same thing. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: change PIT discard tick policyRadim Krčmář1-5/+7
Discard policy uses ack_notifiers to prevent injection of PIT interrupts before EOI from the last one. This patch changes the policy to always try to deliver the interrupt, which makes a difference when its vector is in ISR. Old implementation would drop the interrupt, but proposed one injects to IRR, like real hardware would. The old policy breaks legacy NMI watchdogs, where PIT is used through virtual wire (LVT0): PIT never sends an interrupt before receiving EOI, thus a guest deadlock with disabled interrupts will stop NMIs. Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt through IOAPIC. (KVM's PIT is deeply rotten and luckily not used much in modern systems.) Even though there is a chance of regressions, I think we can fix the LVT0 NMI bug without introducing a new tick policy. Cc: <[email protected]> Reported-by: Yuki Shibuya <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03x86/tsc: Always Running Timer (ART) correlated clocksourceChristopher S. Hall3-1/+62
On modern Intel systems TSC is derived from the new Always Running Timer (ART). ART can be captured simultaneously with the capture of audio and network device clocks, allowing a correlation between timebases to be constructed. Upon capture, the driver converts the captured ART value to the appropriate system clock using the correlated clocksource mechanism. On systems that support ART a new CPUID leaf (0x15) returns parameters “m” and “n” such that: TSC_value = (ART_value * m) / n + k [n >= 1] [k is an offset that can be adjusted by a privileged agent. The IA32_TSC_ADJUST MSR is an example of an interface to adjust k. See 17.14.4 of the Intel SDM for more details] Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Christopher S. Hall <[email protected]> [jstultz: Tweaked to fix build issue, also reworked math for 64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build fixes] Signed-off-by: John Stultz <[email protected]>
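The conversion formula above can be evaluated without overflowing 64 bits in the intermediate product by splitting ART into quotient and remainder first. The following is a userspace sketch under that assumption; `art_to_tsc` and its parameter names are hypothetical and the kernel code may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* TSC = (ART * m) / n + k, with m and n as reported by CPUID leaf 0x15.
 * Writing ART = quot * n + rem gives ART * m / n = quot * m + (rem * m) / n,
 * and rem * m always fits in 64 bits because both factors are below 2^32. */
static uint64_t art_to_tsc(uint64_t art, uint32_t m, uint32_t n, uint64_t k)
{
    assert(n >= 1);

    uint64_t quot = art / n;
    uint64_t rem = art % n;

    /* quot * m can still wrap if the true result exceeds 64 bits. */
    return quot * m + (rem * m) / n + k;
}
```

For example, with m = 3, n = 2 and k = 5, an ART value of 1000 maps to 1000 * 3 / 2 + 5 = 1505.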
2016-03-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-10/+13
Pull KVM fixes from Paolo Bonzini:
- ARM/MIPS: Fixes for ioctls when copy_from_user returns nonzero
- x86: Small fix for Skylake TSC scaling
- x86: Improved fix for last week's missed hardware breakpoint bug
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: Update tsc multiplier on change.
  mips/kvm: fix ioctl error handling
  arm/arm64: KVM: Fix ioctl error handling
  KVM: x86: fix root cause for missed hardware breakpoints
2016-03-03xen/x86: Drop mode-selecting ifdefs in startup_xen()Boris Ostrovsky1-7/+3
Use asm/asm.h macros instead. Signed-off-by: Boris Ostrovsky <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2016-03-03xen/x86: Zero out .bss for PV guestsBoris Ostrovsky1-0/+9
ELF spec is unclear about whether .bss must be cleared by the loader. Currently the domain builder does it when loading the guest but because it is not (or rather may not be) guaranteed we should zero it out explicitly. Signed-off-by: Boris Ostrovsky <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2016-03-03x86/mm/pkeys: Fix access_error() denial of writes to write-only VMADave Hansen1-18/+0
Andrey Wagin reported that a simple test case was broken by: 62b5f7d013fc ("mm/core, x86/mm/pkeys: Add execute-only protection keys support") This test case creates an unreadable VMA and my patch assumed that all writes must be to readable VMAs. The simplest fix for this is to remove the pkey-related bits in access_error(). For execute-only support, I believe the existing version is sufficient because the permissions we are trying to enforce are entirely expressed in vma->vm_flags. We just depend on pkeys to get *an* exception, it does not matter that PF_PK was set, or even what state PKRU is in. I will re-add the necessary bits with the full pkeys implementation that includes the new syscalls. The three cases that matter are: 1. If a write to an execute-only VMA occurs, we will see PF_WRITE set, but !VM_WRITE on the VMA, and return 1. All execute-only VMAs have VM_WRITE clear by definition. 2. If a read occurs on a present PTE, we will fall into the "read, present" case and return 1. 3. If a read occurs to a non-present PTE, we will miss the "read, not present" case, because the execute-only VMA will have VM_EXEC set, and we will properly return 0 allowing the PTE to be populated. Test program: int main() { int *p; p = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); p[0] = 1; return 0; } Reported-by: Andrey Wagin <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: 62b5f7d013fc ("mm/core, x86/mm/pkeys: Add execute-only protection keys support") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
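The test program quoted in this message omits its includes and error handling. A self-contained variant, wrapped in a function for reuse, could look like the sketch below. The function name `store_to_write_only_mapping` is hypothetical, and the sketch assumes a Linux or other POSIX system that provides MAP_ANONYMOUS.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map one write-only anonymous page and store to it.
 * Returns 0 if the store completes, -1 if the mapping cannot be created.
 * On a kernel with the regression, the store faults and kills the process. */
static int store_to_write_only_mapping(void)
{
    int *p = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    p[0] = 1;            /* write fault on a VMA without VM_READ */
    munmap(p, 4096);
    return 0;
}
```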
2016-03-03x86/asm/decoder: Use explicitly signed charsJosh Poimboeuf1-3/+3
When running objtool on a ppc64le host to analyze x86 binaries, it reports a lot of false warnings like: ipc/compat_mq.o: warning: objtool: compat_SyS_mq_open()+0x91: can't find jump dest instruction at .text+0x3a5 The warnings are caused by the x86 instruction decoder setting the wrong value for the jump instruction's immediate field because it assumes that "char == signed char", which isn't true for all architectures. When converting char to int, gcc sign-extends on x86 but doesn't sign-extend on ppc64le. According to the gcc man page, that's a feature, not a bug: > Each kind of machine has a default for what "char" should be. It is > either like "unsigned char" by default or like "signed char" by > default. > > Ideally, a portable program should always use "signed char" or > "unsigned char" when it depends on the signedness of an object. Conform to the "standards" by changing the "char" casts to "signed char". This results in no actual changes to the object code on x86. Note: the x86 decoder now lives in three different locations in the kernel tree, which are all kept in sync via makefile checks and warnings: in-kernel, perf, and objtool. This fixes all three locations. Eventually we should probably try to at least converge the two separate "tools" locations into a single shared location. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/9dd4161719b20e6def9564646d68bfbe498c549f.1456962210.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
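The signedness problem can be shown in a few lines of C. This is an illustrative sketch, not the decoder code: `imm8_to_int` and `imm8_via_plain_char` are hypothetical helper names.

```c
#include <assert.h>

/* Sign-extend an 8-bit immediate regardless of the platform's default char.
 * Converting an out-of-range value to signed char is implementation-defined
 * in C; gcc and clang wrap it modulo 256, which is the behavior relied on. */
static int imm8_to_int(unsigned char byte)
{
    return (signed char)byte;
}

/* The non-portable variant: the result depends on whether plain char is
 * signed (as on x86) or unsigned (as on ppc64le). */
static int imm8_via_plain_char(unsigned char byte)
{
    return (char)byte;
}
```

For the byte 0xf0, `imm8_to_int` yields -16 on gcc and clang for any target, while `imm8_via_plain_char` yields -16 where plain char is signed and 240 where it is unsigned, which is the kind of wrong jump offset behind the objtool warnings.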
2016-03-03KVM: MMU: apply page track notifierXiao Guangrong3-6/+22
Register the notifier to receive write track events so that we can update our shadow page table. This makes kvm_mmu_pte_write() the callback of the notifier; no functional change. Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: simplify mmu_need_write_protectXiao Guangrong1-22/+7
Now that all non-leaf shadow pages are page tracked, an untracked gfn cannot have a non-leaf shadow page, so we can directly mark the shadow page of that gfn as unsync. Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: use page track for non-leaf shadow pagesXiao Guangrong1-5/+21
Non-leaf shadow pages are always write protected, so they can be a user of page tracking. Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: page track: add notifier supportXiao Guangrong4-0/+114
A notifier list is introduced so that any node that wants to receive track events can register on the list. Two APIs are introduced here:
- kvm_page_track_register_notifier(): register the notifier to receive track events
- kvm_page_track_unregister_notifier(): stop receiving track events by unregistering the notifier
The callback, node->track_write(), is called when a write access to a write-tracked page happens. Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
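The register/unregister/callback pattern described here can be sketched as a singly linked list in userspace C. All names below (`track_notifier_node`, `track_register`, `track_unregister`, `track_write_event`, `count_write`) are hypothetical, the callback signature is an assumption, and the sketch omits the locking a real implementation needs for concurrent readers.

```c
#include <assert.h>
#include <stddef.h>

struct track_notifier_node {
    struct track_notifier_node *next;
    /* Called for every write to a write-tracked page. */
    void (*track_write)(unsigned long gpa, struct track_notifier_node *node);
    int hits;   /* bookkeeping for this example only */
};

struct track_notifier_head {
    struct track_notifier_node *first;
};

static void track_register(struct track_notifier_head *head,
                           struct track_notifier_node *node)
{
    node->next = head->first;
    head->first = node;
}

static void track_unregister(struct track_notifier_head *head,
                             struct track_notifier_node *node)
{
    struct track_notifier_node **pp = &head->first;

    while (*pp && *pp != node)
        pp = &(*pp)->next;
    if (*pp)
        *pp = node->next;   /* unlink; no-op if the node was not registered */
}

/* Deliver one write event to every registered node. */
static void track_write_event(struct track_notifier_head *head,
                              unsigned long gpa)
{
    for (struct track_notifier_node *n = head->first; n; n = n->next)
        if (n->track_write)
            n->track_write(gpa, n);
}

/* Example callback: count the events this node has seen. */
static void count_write(unsigned long gpa, struct track_notifier_node *node)
{
    (void)gpa;
    node->hits++;
}
```

A node only sees events delivered between its registration and unregistration, which is the contract the two APIs describe.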
2016-03-03KVM: MMU: clear write-flooding on the fast path of tracked pageXiao Guangrong3-4/+24
If the page fault is caused by a write access to a write-tracked page, the real shadow page walk is skipped, so we lose the chance to clear write flooding for the page structure the current vcpu is using. Fix it by locklessly walking the shadow page table to clear write flooding on the shadow page structure outside of mmu-lock. To allow that, the count is changed to atomic_t. Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: let page fault handler be aware tracked pageXiao Guangrong4-7/+57
A page fault caused by a write access to a write-tracked page cannot be fixed; it always needs to be emulated. page_fault_handle_page_track() is the fast path introduced here to skip taking mmu-lock and walking the shadow page table. However, if the page table is not present, it is worth making the page table entry present and read-only so that read accesses succeed. mmu_need_write_protect() needs to be adjusted to avoid the page becoming writable when making the page table present or when syncing/prefetching shadow page table entries. Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: page track: introduce kvm_slot_page_track_{add,remove}_pageXiao Guangrong2-0/+92
These two functions are the user APIs:
- kvm_slot_page_track_add_page(): add the page to the tracking pool; after that, the specified access to that page will be tracked
- kvm_slot_page_track_remove_page(): remove the page from the tracking pool; the specified access to the page is no longer tracked once the last user is gone
Both are called under the protection of both mmu-lock and either kvm->srcu or kvm->slots_lock. Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: page track: add the framework of guest page trackingXiao Guangrong5-1/+82
The array gfn_track[mode][gfn] is introduced in the memory slot for every guest page; it holds the tracking count for the guest page in each mode. Tracking a page increases the count, and the page is no longer tracked once the count reaches zero. We use 'unsigned short' as the tracking count, which should be enough because the shadow page table can use at most 2^14 (2^3 for level, 2^1 for cr4_pae, 2^2 for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp, 2^1 for smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm), so there is enough room for other trackers. Two callbacks, kvm_page_track_create_memslot() and kvm_page_track_free_memslot(), are implemented in this patch; they are used internally to initialize and reclaim the memory of the array. Currently, only write track mode is supported. Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
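The sizing argument and the counting scheme can be checked with a small userspace sketch. The array here covers a single mode and a tiny hypothetical slot of `NR_GFNS` pages; `track_add`, `track_remove` and `is_tracked` are illustrative names, not the kernel functions.

```c
#include <assert.h>
#include <limits.h>

/* Number of varying role bits, summed from the exponents in the message. */
enum {
    ROLE_BITS = 3 /* level */ + 1 /* cr4_pae */ + 2 /* quadrant */ +
                3 /* access */ + 1 /* nxe */ + 1 /* cr0_wp */ +
                1 /* smep_andnot_wp */ + 1 /* smap_andnot_wp */ + 1 /* smm */
};

#define NR_GFNS 8   /* hypothetical tiny memory slot */

/* One counter per guest page; write-track mode only in this sketch. */
static unsigned short gfn_track[NR_GFNS];

static void track_add(unsigned int gfn)
{
    assert(gfn < NR_GFNS && gfn_track[gfn] < USHRT_MAX);
    gfn_track[gfn]++;
}

static void track_remove(unsigned int gfn)
{
    assert(gfn < NR_GFNS && gfn_track[gfn] > 0);
    gfn_track[gfn]--;
}

/* A page stays tracked until its last user has removed it. */
static int is_tracked(unsigned int gfn)
{
    return gfn_track[gfn] != 0;
}
```

The exponents sum to 14, so shadow pages can contribute at most 2^14 = 16384 users per page, well below the 65535 an unsigned short can hold.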
2016-03-03KVM: MMU: introduce kvm_mmu_slot_gfn_write_protectXiao Guangrong2-5/+13
Split rmap_write_protect() and introduce this function to abstract write protection based on the slot. This function will be used in a later patch. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpageXiao Guangrong2-13/+28
Abstract the common operations from account_shadowed() and unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage() and kvm_mmu_gfn_allow_lpage(). These two functions will be used by page tracking in a later patch. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowedXiao Guangrong3-19/+22
kvm_lpage_info->write_count is used to detect whether a large page mapping for the gfn on the specified level is allowed. Rename it to disallow_lpage to reflect its purpose, and rename has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the code clearer. Later we will extend this mechanism for page tracking: if the gfn is tracked then a large mapping for that gfn on any level is not allowed. The new name is more straightforward. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03kvm: x86: Check dest_map->vector to match eoi signals for rtcJoerg Roedel1-3/+14
Using the vector stored at interrupt delivery makes the eoi matching safe against irq migration in the ioapic. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03kvm: x86: Track irq vectors in ioapic->rtc_status.dest_mapJoerg Roedel2-1/+10
This allows backtracking later in case the rtc irq has been moved to another vcpu/vector. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03kvm: x86: Convert ioapic->rtc_status.dest_map to a structJoerg Roedel5-16/+26
Currently this is a bitmap which tracks which CPUs we expect an EOI from. Move this bitmap to a struct so that we can track additional information there. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03perf/x86/uncore: Fix build on UP-IOAPIC configsIngo Molnar1-0/+2
Commit: cf6d445f6897 ("perf/x86/uncore: Track packages, not per CPU data") reorganized the uncore code to track packages, and introduced a dependency on MAX_LOCAL_APIC. This constant is not available on UP-IOAPIC builds: arch/x86/events/intel/uncore.c:1350:44: error: 'MAX_LOCAL_APIC' undeclared here (not in a function) Include asm/apicdef.h explicitly to pick it up. Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Harish Chegondi <[email protected]> Cc: Jacob Pan <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-03PM / sleep / x86: Fix crash on graph trace through x86 suspendTodd E Brandt1-0/+7
Pause/unpause graph tracing around do_suspend_lowlevel as it has inconsistent call/return info after it jumps to the wakeup vector. The graph trace buffer will otherwise become misaligned and may eventually crash and hang on suspend. To reproduce the issue and test the fix: Run a function_graph trace over suspend/resume and set the graph function to suspend_devices_and_enter. This consistently hangs the system without this fix. Signed-off-by: Todd Brandt <[email protected]> Cc: All applicable <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-03-02kvm: x86: Update tsc multiplier on change.Owen Hofmann1-5/+9
vmx.c writes the TSC_MULTIPLIER field in vmx_vcpu_load, but only when a vcpu has migrated physical cpus. Record the last value written and update in vmx_vcpu_load on any change, otherwise a cpu migration must occur for TSC frequency scaling to take effect. Cc: [email protected] Fixes: ff2c3a1803775cc72dc6f624b59554956396b0ee Signed-off-by: Owen Hofmann <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
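The "record the last value written and update on any change" idea can be sketched in userspace C. The struct, its field names and the simulated write helper below are hypothetical; a real implementation would write the VMCS field rather than a counter.

```c
#include <assert.h>
#include <stdint.h>

struct vcpu_sketch {
    uint64_t tsc_scaling_ratio;   /* multiplier the vcpu wants */
    uint64_t current_tsc_ratio;   /* last value actually written */
    unsigned int writes;          /* counts simulated field writes */
};

/* Stand-in for the hardware field write; records the value as current. */
static void write_tsc_multiplier(struct vcpu_sketch *v, uint64_t val)
{
    v->current_tsc_ratio = val;
    v->writes++;
}

/* Called on every vcpu load: write only when the desired value differs
 * from the last one written, independent of any cpu migration. */
static void vcpu_load(struct vcpu_sketch *v)
{
    if (v->tsc_scaling_ratio != v->current_tsc_ratio)
        write_tsc_multiplier(v, v->tsc_scaling_ratio);
}
```

Comparing against the cached value keeps the common path free of redundant writes while still picking up a changed ratio on the very next load.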
2016-03-01arch/hotplug: Call into idle with a proper stateThomas Gleixner2-2/+2
Let the non boot cpus call into idle with the corresponding hotplug state, so the hotplug core can handle the further bringup. That's a first step to convert the boot side of the hotplugged cpus to do all the synchronization with the other side through the state machine. For now it'll only start the hotplug thread and kick the full bringup of the cpu. Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Rafael Wysocki <[email protected]> Cc: "Srivatsa S. Bhat" <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Sebastian Siewior <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul McKenney <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Turner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-02-29Merge tag 'v4.5-rc6' into locking/core, to pick up fixesIngo Molnar24-93/+208
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changesIngo Molnar25-95/+211
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29perf/x86/intel/rapl: Convert it to a per package facilityThomas Gleixner1-108/+86
RAPL is a per package facility and we already have a mechanism for a dedicated per package reader. So there is no point in having multiple CPUs doing the same. The current implementation actually starts two timers on two CPUs if one does: perf stat -C1,2 -e power/energy-pkg .... which makes the whole concept of 1 reader per package moot. What's worse is that the above returns double the actual energy consumption, but that's a different problem to address and cannot be solved by removing the pointless per cpuness of that mechanism. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Harish Chegondi <[email protected]> Cc: Jacob Pan <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29perf/x86/intel/rapl: Utilize event->pmu_privateThomas Gleixner1-4/+12
Store the PMU pointer in event->pmu_private and use it instead of the per CPU data. Preparatory step to get rid of the per CPU allocations. The usage sites are the perf fast path, so we keep that even after the conversion to per package storage as a CPU to package lookup involves 3 loads versus 1 with the pmu_private pointer. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Harish Chegondi <[email protected]> Cc: Jacob Pan <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29perf/x86/intel/rapl: Make PMU lock rawThomas Gleixner1-10/+10
This lock is taken in hard interrupt context even on Preempt-RT. Make it raw so RT does not have to patch it. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Harish Chegondi <[email protected]> Cc: Jacob Pan <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29perf/x86/intel/rapl: Refactor the code some moreThomas Gleixner1-30/+31
Split out code from init into separate functions. Tidy up the code and get rid of pointless comments. I wish there would be comments for code which is not obvious.... Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Harish Chegondi <[email protected]> Cc: Jacob Pan <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>