aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2016-03-04Merge tag 'v4.5-rc6' into core/resources, to resolve conflictIngo Molnar219-905/+1669
Signed-off-by: Ingo Molnar <[email protected]>
2016-03-04KVM: i8254: turn kvm_kpit_state.reinject into atomic_tRadim Krčmář2-5/+5
Document possible races between readers and concurrent update to the ioctl. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: move PIT timer function initializationRadim Krčmář1-2/+1
We can do it just once. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: don't assume layout of kvm_kpit_stateRadim Krčmář1-4/+8
channels has offset 0 and correct size now, but that can change. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: remove pointless dereference of PITRadim Krčmář1-2/+2
PIT is known at that point. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: remove pit and kvm from kvm_kpit_stateRadim Krčmář2-7/+9
kvm isn't ever used and pit can be accessed with container_of. If you *really* need kvm, pit_state_to_pit(ps)->kvm. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: refactor kvm_free_pitRadim Krčmář1-13/+11
Could be easier to read, but git history will become deeper. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: refactor kvm_create_pitRadim Krčmář1-15/+12
Locks are gone, so we don't need to duplicate error paths. Use goto everywhere. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: remove notifiers from PIT discard policyRadim Krčmář3-13/+38
Discard policy doesn't rely on information from notifiers, so we don't need to register notifiers unconditionally. We kept correct counts in case userspace switched between policies during runtime, but that can be avoided by reseting the state. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: remove unnecessary uses of PIT state lockRadim Krčmář3-14/+8
- kvm_create_pit had to lock only because it exposed kvm->arch.vpit very early, but initialization doesn't use kvm->arch.vpit since the last patch, so we can drop locking. - kvm_free_pit is only run after there are no users of KVM and therefore is the sole actor. - Locking in kvm_vm_ioctl_reinject doesn't do anything, because reinject is only protected at that place. - kvm_pit_reset isn't used anywhere and its locking can be dropped if we hide it. Removing useless locking allows to see what actually is being protected by PIT state lock (values accessible from the guest). Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: pass struct kvm_pit instead of kvm in PITRadim Krčmář3-72/+70
This patch passes struct kvm_pit into internal PIT functions. Those functions used to get PIT through kvm->arch.vpit, even though most of them never used *kvm for other purposes. Another benefit is that we don't need to set kvm->arch.vpit during initialization. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: tone down WARN_ON pit.state_lockRadim Krčmář1-14/+3
If the guest could hit this, it would hang the host kernel, bacause of sheer number of those reports. Internal callers have to be sensible anyway, so we now only check for it in an API function. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: use atomic_t instead of pit.inject_lockRadim Krčmář2-35/+24
The lock was an overkill, the same can be done with atomics. A mb() was added in kvm_pit_ack_irq, to pair with implicit barrier between pit_timer_fn and pit_do_work. The mb() prevents a race that could happen if pending == 0 and irq_ack == 0: kvm_pit_ack_irq: | pit_timer_fn: p = atomic_read(&ps->pending); | | atomic_inc(&ps->pending); | queue_work(pit_do_work); | pit_do_work: | atomic_xchg(&ps->irq_ack, 0); | return; atomic_set(&ps->irq_ack, 1); | if (p == 0) return; | where the interrupt would not be delivered in this tick of pit_timer_fn. PIT would have eventually delivered the interrupt, but we sacrifice perofmance to make sure that interrupts are not needlessly delayed. sfence isn't enough: atomic_dec_if_positive does atomic_read first and x86 can reorder loads before stores. lfence isn't enough: store can pass lfence, turning it into a nop. A compiler barrier would be more than enough as CPU needs to stall for unbelievably long to use fences. This patch doesn't do anything in kvm_pit_reset_reinject, because any order of resets can race, but the result differs by at most one interrupt, which is ok, because it's the same result as if the reset happened at a slightly different time. (Original code didn't protect the reset path with a proper lock, so users have to be robust.) Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: add kvm_pit_reset_reinjectRadim Krčmář1-8/+10
pit_state.pending and pit_state.irq_ack are always reset at the same time. Create a function for them. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: simplify atomics in kvm_pit_ack_irqRadim Krčmář1-11/+1
We already have a helper that does the same thing. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-04KVM: i8254: change PIT discard tick policyRadim Krčmář1-5/+7
Discard policy uses ack_notifiers to prevent injection of PIT interrupts before EOI from the last one. This patch changes the policy to always try to deliver the interrupt, which makes a difference when its vector is in ISR. Old implementation would drop the interrupt, but proposed one injects to IRR, like real hardware would. The old policy breaks legacy NMI watchdogs, where PIT is used through virtual wire (LVT0): PIT never sends an interrupt before receiving EOI, thus a guest deadlock with disabled interrupts will stop NMIs. Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt through IOAPIC. (KVM's PIT is deeply rotten and luckily not used much in modern systems.) Even though there is a chance of regressions, I think we can fix the LVT0 NMI bug without introducing a new tick policy. Cc: <[email protected]> Reported-by: Yuki Shibuya <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03powerpc/fsl-book3e: Avoid lbarx on e5500Scott Wood1-0/+13
lbarx/stbcx. are implemented on e6500, but not on e5500. Likewise, SMT is on e6500, but not on e5500. So, avoid executing an unimplemented instruction by only locking when needed (i.e. in the presence of SMT). Signed-off-by: Scott Wood <[email protected]>
2016-03-03x86/tsc: Always Running Timer (ART) correlated clocksourceChristopher S. Hall3-1/+62
On modern Intel systems TSC is derived from the new Always Running Timer (ART). ART can be captured simultaneous to the capture of audio and network device clocks, allowing a correlation between timebases to be constructed. Upon capture, the driver converts the captured ART value to the appropriate system clock using the correlated clocksource mechanism. On systems that support ART a new CPUID leaf (0x15) returns parameters “m” and “n” such that: TSC_value = (ART_value * m) / n + k [n >= 1] [k is an offset that can adjusted by a privileged agent. The IA32_TSC_ADJUST MSR is an example of an interface to adjust k. See 17.14.4 of the Intel SDM for more details] Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Christopher S. Hall <[email protected]> [jstultz: Tweaked to fix build issue, also reworked math for 64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build fixes] Signed-off-by: John Stultz <[email protected]>
2016-03-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-14/+17
Pull KVM fixes from Paolo Bonzini: - ARM/MIPS: Fixes for ioctls when copy_from_user returns nonzero - x86: Small fix for Skylake TSC scaling - x86: Improved fix for last week's missed hardware breakpoint bug * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: Update tsc multiplier on change. mips/kvm: fix ioctl error handling arm/arm64: KVM: Fix ioctl error handling KVM: x86: fix root cause for missed hardware breakpoints
2016-03-03arm64: enable CONFIG_DEBUG_RODATA by defaultArd Biesheuvel1-3/+3
In spite of its name, CONFIG_DEBUG_RODATA is an important hardening feature for production kernels, and distros all enable it by default in their kernel configs. However, since enabling it used to result in more granular, and thus less efficient kernel mappings, it is not enabled by default for performance reasons. However, since commit 2f39b5f91eb4 ("arm64: mm: Mark .rodata as RO"), the various kernel segments (.text, .rodata, .init and .data) are already mapped individually, and the only effect of setting CONFIG_DEBUG_RODATA is that the existing .text and .rodata mappings are updated late in the boot sequence to have their read-only attributes set, which means that any performance concerns related to enabling CONFIG_DEBUG_RODATA are no longer valid. So from now on, make CONFIG_DEBUG_RODATA default to 'y' Signed-off-by: Ard Biesheuvel <[email protected]> Acked-by: Mark Rutland <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2016-03-03xen/x86: Drop mode-selecting ifdefs in startup_xen()Boris Ostrovsky1-7/+3
Use asm/asm.h macros instead. Signed-off-by: Boris Ostrovsky <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2016-03-03xen/x86: Zero out .bss for PV guestsBoris Ostrovsky1-0/+9
ELF spec is unclear about whether .bss must me cleared by the loader. Currently the domain builder does it when loading the guest but because it is not (or rather may not be) guaranteed we should zero it out explicitly. Signed-off-by: Boris Ostrovsky <[email protected]> Signed-off-by: David Vrabel <[email protected]>
2016-03-03x86/mm/pkeys: Fix access_error() denial of writes to write-only VMADave Hansen1-18/+0
Andrey Wagin reported that a simple test case was broken by: 2b5f7d013fc ("mm/core, x86/mm/pkeys: Add execute-only protection keys support") This test case creates an unreadable VMA and my patch assumed that all writes must be to readable VMAs. The simplest fix for this is to remove the pkey-related bits in access_error(). For execute-only support, I believe the existing version is sufficient because the permissions we are trying to enforce are entirely expressed in vma->vm_flags. We just depend on pkeys to get *an* exception, it does not matter that PF_PK was set, or even what state PKRU is in. I will re-add the necessary bits with the full pkeys implementation that includes the new syscalls. The three cases that matter are: 1. If a write to an execute-only VMA occurs, we will see PF_WRITE set, but !VM_WRITE on the VMA, and return 1. All execute-only VMAs have VM_WRITE clear by definition. 2. If a read occurs on a present PTE, we will fall in to the "read, present" case and return 1. 3. If a read occurs to a non-present PTE, we will miss the "read, not present" case, because the execute-only VMA will have VM_EXEC set, and we will properly return 0 allowing the PTE to be populated. Test program: int main() { int *p; p = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); p[0] = 1; return 0; } Reported-by: Andrey Wagin <[email protected]>, Signed-off-by: Dave Hansen <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: 62b5f7d013fc ("mm/core, x86/mm/pkeys: Add execute-only protection keys support") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-03x86/asm/decoder: Use explicitly signed charsJosh Poimboeuf1-3/+3
When running objtool on a ppc64le host to analyze x86 binaries, it reports a lot of false warnings like: ipc/compat_mq.o: warning: objtool: compat_SyS_mq_open()+0x91: can't find jump dest instruction at .text+0x3a5 The warnings are caused by the x86 instruction decoder setting the wrong value for the jump instruction's immediate field because it assumes that "char == signed char", which isn't true for all architectures. When converting char to int, gcc sign-extends on x86 but doesn't sign-extend on ppc64le. According to the gcc man page, that's a feature, not a bug: > Each kind of machine has a default for what "char" should be. It is > either like "unsigned char" by default or like "signed char" by > default. > > Ideally, a portable program should always use "signed char" or > "unsigned char" when it depends on the signedness of an object. Conform to the "standards" by changing the "char" casts to "signed char". This results in no actual changes to the object code on x86. Note: the x86 decoder now lives in three different locations in the kernel tree, which are all kept in sync via makefile checks and warnings: in-kernel, perf, and objtool. This fixes all three locations. Eventually we should probably try to at least converge the two separate "tools" locations into a single shared location. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/9dd4161719b20e6def9564646d68bfbe498c549f.1456962210.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2016-03-03KVM: MMU: apply page track notifierXiao Guangrong3-6/+22
Register the notifier to receive write track event so that we can update our shadow page table It makes kvm_mmu_pte_write() be the callback of the notifier, no function is changed Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: simplify mmu_need_write_protectXiao Guangrong1-22/+7
Now, all non-leaf shadow page are page tracked, if gfn is not tracked there is no non-leaf shadow page of gfn is existed, we can directly make the shadow page of gfn to unsync Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: use page track for non-leaf shadow pagesXiao Guangrong1-5/+21
non-leaf shadow pages are always write protected, it can be the user of page track Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: page track: add notifier supportXiao Guangrong4-0/+114
Notifier list is introduced so that any node wants to receive the track event can register to the list Two APIs are introduced here: - kvm_page_track_register_notifier(): register the notifier to receive track event - kvm_page_track_unregister_notifier(): stop receiving track event by unregister the notifier The callback, node->track_write() is called when a write access on the write tracked page happens Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: clear write-flooding on the fast path of tracked pageXiao Guangrong3-4/+24
If the page fault is caused by write access on write tracked page, the real shadow page walking is skipped, we lost the chance to clear write flooding for the page structure current vcpu is using Fix it by locklessly waking shadow page table to clear write flooding on the shadow page structure out of mmu-lock. So that we change the count to atomic_t Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: let page fault handler be aware tracked pageXiao Guangrong4-7/+57
The page fault caused by write access on the write tracked page can not be fixed, it always need to be emulated. page_fault_handle_page_track() is the fast path we introduce here to skip holding mmu-lock and shadow page table walking However, if the page table is not present, it is worth making the page table entry present and readonly to make the read access happy mmu_need_write_protect() need to be cooked to avoid page becoming writable when making page table present or sync/prefetch shadow page table entries Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: page track: introduce kvm_slot_page_track_{add,remove}_pageXiao Guangrong2-0/+92
These two functions are the user APIs: - kvm_slot_page_track_add_page(): add the page to the tracking pool after that later specified access on that page will be tracked - kvm_slot_page_track_remove_page(): remove the page from the tracking pool, the specified access on the page is not tracked after the last user is gone Both of these are called under the protection both of mmu-lock and kvm->srcu or kvm->slots_lock Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: page track: add the framework of guest page trackingXiao Guangrong5-1/+82
The array, gfn_track[mode][gfn], is introduced in memory slot for every guest page, this is the tracking count for the gust page on different modes. If the page is tracked then the count is increased, the page is not tracked after the count reaches zero We use 'unsigned short' as the tracking count which should be enough as shadow page table only can use 2^14 (2^3 for level, 2^1 for cr4_pae, 2^2 for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp, 2^1 for smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm) at most, there is enough room for other trackers Two callbacks, kvm_page_track_create_memslot() and kvm_page_track_free_memslot() are implemented in this patch, they are internally used to initialize and reclaim the memory of the array Currently, only write track mode is supported Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: introduce kvm_mmu_slot_gfn_write_protectXiao Guangrong2-5/+13
Split rmap_write_protect() and introduce the function to abstract the write protection based on the slot This function will be used in the later patch Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpageXiao Guangrong2-13/+28
Abstract the common operations from account_shadowed() and unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage() and kvm_mmu_gfn_allow_lpage() These two functions will be used by page tracking in the later patch Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowedXiao Guangrong3-19/+22
kvm_lpage_info->write_count is used to detect if the large page mapping for the gfn on the specified level is allowed, rename it to disallow_lpage to reflect its purpose, also we rename has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the code more clearer Later we will extend this mechanism for page tracking: if the gfn is tracked then large mapping for that gfn on any level is not allowed. The new name is more straightforward Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Xiao Guangrong <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03kvm: x86: Check dest_map->vector to match eoi signals for rtcJoerg Roedel1-3/+14
Using the vector stored at interrupt delivery makes the eoi matching safe agains irq migration in the ioapic. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03kvm: x86: Track irq vectors in ioapic->rtc_status.dest_mapJoerg Roedel2-1/+10
This allows backtracking later in case the rtc irq has been moved to another vcpu/vector. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03kvm: x86: Convert ioapic->rtc_status.dest_map to a structJoerg Roedel5-16/+26
Currently this is a bitmap which tracks which CPUs we expect an EOI from. Move this bitmap to a struct so that we can track additional information there. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini21-88/+945
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The highlights are: * Enable VFIO device on PowerPC, from David Gibson * Optimizations to speed up IPIs between vcpus in HV KVM, from Suresh Warrier (who is also Suresh E. Warrier) * In-kernel handling of IOMMU hypercalls, and support for dynamic DMA windows (DDW), from Alexey Kardashevskiy. Signed-off-by: Paolo Bonzini <[email protected]>
2016-03-03powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint eventRavi Bangoria1-1/+2
When destroying a hw_breakpoint event, the kernel oopses as follows: Unable to handle kernel paging request for data at address 0x00000c07 NIP [c0000000000291d0] arch_unregister_hw_breakpoint+0x40/0x60 LR [c00000000020b6b4] release_bp_slot+0x44/0x80 Call chain: hw_breakpoint_event_init() bp->destroy = bp_perf_event_destroy; do_exit() perf_event_exit_task() perf_event_exit_task_context() WRITE_ONCE(child_ctx->task, TASK_TOMBSTONE); perf_event_exit_event() free_event() _free_event() bp_perf_event_destroy() // event->destroy(event); release_bp_slot() arch_unregister_hw_breakpoint() perf_event_exit_task_context() sets child_ctx->task as TASK_TOMBSTONE which is (void *)-1. arch_unregister_hw_breakpoint() tries to fetch 'thread' attribute of 'task' resulting in oops. Peterz points out that the code shouldn't be using bp->ctx anyway, but fixing that will require a decent amount of rework. So for now to fix the oops, check if bp->ctx->task has been set to (void *)-1, before dereferencing it. We don't use TASK_TOMBSTONE, because that would require exporting it and it's supposed to be an internal detail. Fixes: 63b6da39bb38 ("perf: Fix perf_event_exit_task() race") Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-03powerpc/mm: Move hash64 tlbflush code into a new headerAneesh Kumar K.V2-91/+95
No code changes. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-03powerpc/mm: Move hash related mmu-*.h headers to book3s/Aneesh Kumar K.V13-12/+12
No code changes. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-03powerpc/mm: add _PAGE_HASHPTE similar to 4K hashAneesh Kumar K.V1-1/+1
We don't need to update linux page table entry with _PAGE_HASHPTE early in hash pte fault. A parallel pte update will loop via _PAGE_BUSY and look at _PAGE_HASHPTE for a required hpte flush only if _PAGE_BUSY is cleared. That ensures a pte update will wait for a parallel hpte insert to finish before looking at _PAGE_HASHPTE bit. To avoid further confusion drop setting _PAGE_HASHPTE in cmpxchg in __hash_page_4K. commit 41743a4e34f0 ("powerpc: Free a PTE bit on ppc64 with 64K pages") did similar change for 64K config Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-03powerp/mm: Update code commentsAneesh Kumar K.V2-3/+3
We are updating pte in those functions. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-03mm: Some arch may want to use HPAGE_PMD related values as variablesKirill A. Shutemov1-0/+7
With next generation power processor, we are having a new mmu model [1] that require us to maintain a different linux page table format. Inorder to support both current and future ppc64 systems with a single kernel we need to make sure kernel can select between different page table format at runtime. With the new MMU (radix MMU) added, we will have two different pmd hugepage size 16MB for hash model and 2MB for Radix model. Hence make HPAGE_PMD related values as a variable. Actual conversion of HPAGE_PMD to a variable for ppc64 happens in a followup patch. [1] http://ibm.biz/power-isa3 (Needs registration). Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-03powerpc/mm: Switch book3s 64 with 64K page size to 4 level page tableAneesh Kumar K.V8-64/+102
This is needed so that we can support both hash and radix page table using single kernel. Radix kernel uses a 4 level table. We now use physical address in upper page table tree levels. Even though they are aligned to their size, for the masked bits we use the bit positions as per PowerISA 3.0. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-03perf/x86/uncore: Fix build on UP-IOAPIC configsIngo Molnar1-0/+2
Commit: cf6d445f6897 ("perf/x86/uncore: Track packages, not per CPU data") reorganized the uncore code to track packages, and introduced a dependency on MAX_APIC_ID. This constant is not available on UP-IOAPIC builds: arch/x86/events/intel/uncore.c:1350:44: error: 'MAX_LOCAL_APIC' undeclared here (not in a function) Include asm/apicdef.h explicitly to pick it up. Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Harish Chegondi <[email protected]> Cc: Jacob Pan <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-03powerpc/mm: Don't have conditional defines for real_pte_tAneesh Kumar K.V2-22/+9
We remove real_pte_t out of STRICT_MM_TYPESCHECK. Reviewed-by: Paul Mackerras <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-03powerpc/mm: Split pgtable types to separate headerAneesh Kumar K.V2-103/+109
We move the page table accessors into a separate header. We will later add a big endian variant of the table which is needed for radix. No functionality change only code movement. Reviewed-by: Paul Mackerras <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-03PM / sleep / x86: Fix crash on graph trace through x86 suspendTodd E Brandt1-0/+7
Pause/unpause graph tracing around do_suspend_lowlevel as it has inconsistent call/return info after it jumps to the wakeup vector. The graph trace buffer will otherwise become misaligned and may eventually crash and hang on suspend. To reproduce the issue and test the fix: Run a function_graph trace over suspend/resume and set the graph function to suspend_devices_and_enter. This consistently hangs the system without this fix. Signed-off-by: Todd Brandt <[email protected]> Cc: All applicable <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>