aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2013-07-01powerpc/nvram64: Need return the related error code on failure occursChen Gang1-6/+14
When error occurs, need return the related error code to let upper caller know about it. ppc_md.nvram_size() can return the error code (e.g. core99_nvram_size() in 'arch/powerpc/platforms/powermac/nvram.c'). Also set ret value when only need it, so can save structions for normal cases. Signed-off-by: Chen Gang <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc: Set cpu sibling mask before online cpuLi Zhong1-3/+5
It seems following race is possible: cpu0 cpux smp_init->cpu_up->_cpu_up __cpu_up kick_cpu(1) ------------------------------------------------------------------------- waiting online ... ... notify CPU_STARTING set cpux active set cpux online ------------------------------------------------------------------------- finish waiting online ... sched_init_smp init_sched_domains(cpu_active_mask) build_sched_domains set cpux sibling info ------------------------------------------------------------------------- Execution of cpu0 and cpux could be concurrent between two separator lines. So if the cpux sibling information was set too late (normally impossible, but could be triggered by adding some delay in start_secondary, after setting cpu online), build_sched_domains() running on cpu0 might see cpux active, with an empty sibling mask, then cause some bad address accessing like following: [ 0.099855] Unable to handle kernel paging request for data at address 0xc00000038518078f [ 0.099868] Faulting instruction address: 0xc0000000000b7a64 [ 0.099883] Oops: Kernel access of bad area, sig: 11 [#1] [ 0.099895] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries [ 0.099922] Modules linked in: [ 0.099940] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-rc1-00120-gb973425-dirty #16 [ 0.099956] task: c0000001fed80000 ti: c0000001fed7c000 task.ti: c0000001fed7c000 [ 0.099971] NIP: c0000000000b7a64 LR: c0000000000b7a40 CTR: c0000000000b4934 [ 0.099985] REGS: c0000001fed7f760 TRAP: 0300 Not tainted (3.10.0-rc1-00120-gb973425-dirty) [ 0.099997] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 24272828 XER: 20000003 [ 0.100045] SOFTE: 1 [ 0.100053] CFAR: c000000000445ee8 [ 0.100064] DAR: c00000038518078f, DSISR: 40000000 [ 0.100073] GPR00: 0000000000000080 c0000001fed7f9e0 c000000000c84d48 0000000000000010 GPR04: 0000000000000010 0000000000000000 c0000001fc55e090 0000000000000000 GPR08: ffffffffffffffff c000000000b80b30 c000000000c962d8 00000003845ffc5f GPR12: 0000000000000000 c00000000f33d000 c00000000000b9e4 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000000000 GPR20: c000000000ccf750 0000000000000000 c000000000c94d48 c0000001fc504000 GPR24: c0000001fc504000 c0000001fecef848 c000000000c94d48 c000000000ccf000 GPR28: c0000001fc522090 0000000000000010 c0000001fecef848 c0000001fed7fae0 [ 0.100293] NIP [c0000000000b7a64] .get_group+0x84/0xc4 [ 0.100307] LR [c0000000000b7a40] .get_group+0x60/0xc4 [ 0.100318] Call Trace: [ 0.100332] [c0000001fed7f9e0] [c0000000000dbce4] .lock_is_held+0xa8/0xd0 (unreliable) [ 0.100354] [c0000001fed7fa70] [c0000000000bf62c] .build_sched_domains+0x728/0xd14 [ 0.100375] [c0000001fed7fbe0] [c000000000af67bc] .sched_init_smp+0x4fc/0x654 [ 0.100394] [c0000001fed7fce0] [c000000000adce24] .kernel_init_freeable+0x17c/0x30c [ 0.100413] [c0000001fed7fdb0] [c00000000000ba08] .kernel_init+0x24/0x12c [ 0.100431] [c0000001fed7fe30] [c000000000009f74] .ret_from_kernel_thread+0x5c/0x68 [ 0.100445] Instruction dump: [ 0.100456] 38800010 38a00000 4838e3f5 60000000 7c6307b4 2fbf0000 419e0040 3d220001 [ 0.100496] 78601f24 39491590 e93e0008 7d6a002a <7d69582a> f97f0000 7d4a002a e93e0010 [ 0.100559] ---[ end trace 31fd0ba7d8756001 ]--- This patch tries to move the sibling maps updating before notify_cpu_starting() and cpu online, and a write barrier there to make sure sibling maps are updated before active and online mask. Signed-off-by: Li Zhong <[email protected]> Reviewed-by: Srivatsa S. Bhat <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01mac: Make cuda_init_via() __initGeert Uytterhoeven1-1/+1
cuda_init_via() is called from find_via_cuda() only, which is __init. Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc: Delete __cpuinit usage from all usersPaul Gortmaker19-50/+54
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the powerpc uses of the __cpuinit macros. There are no __CPUINIT users in assembly files in powerpc. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Josh Boyer <[email protected]> Cc: Matt Porter <[email protected]> Cc: Kumar Gala <[email protected]> Cc: [email protected] Signed-off-by: Paul Gortmaker <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01macintosh: Convert use of typedef ctl_table to struct ctl_tableJoe Perches1-4/+4
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/idle: Convert use of typedef ctl_table to struct ctl_tableJoe Perches1-2/+2
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/iommu: Remove unused pci_iommu_init() and pci_direct_iommu_init()Bjorn Helgaas1-7/+0
pci_iommu_init() and pci_direct_iommu_init() are not referenced anywhere, so remove them. Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc: Don't flush/invalidate the d/icache for an unknown relocation typeKevin Hao1-1/+2
For an unknown relocation type since the value of r4 is just the 8bit relocation type, the sum of r4 and r7 may yield an invalid memory address. For example: In normal case: r4 = c00xxxxx r7 = 40000000 r4 + r7 = 000xxxxx For an unknown relocation type: r4 = 000000xx r7 = 40000000 r4 + r7 = 400000xx 400000xx is an invalid memory address for a board which has just 512M memory. And for operations such as dcbst or icbi may cause bus error for an invalid memory address on some platforms and then cause the board reset. So we should skip the flush/invalidate the d/icache for an unknown relocation type. Signed-off-by: Kevin Hao <[email protected]> Acked-by: Suzuki K. Poulose <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/windfarm: Fix overtemperature clearingAaro Koskinen3-3/+15
With pm81/pm91/pm121, when the overtemperature state is entered, and when it remains on after skipped ticks, the driver will try to leave it too soon (immediately on the next tick). This is because the active FAILURE_OVERTEMP state is not visible in "new_failure" variable of the current tick. Furthermore, the driver will keep trying to clear condition in subsequent ticks as FAILURE_OVERTEMP remains set in the "last_failure" variable. These will start to trigger WARNINGS from windfarm core: [ 100.082735] windfarm: Clamping CPU frequency to minimum ! [ 100.108132] windfarm: Overtemp condition detected ! [ 101.952908] windfarm: Overtemp condition cleared ! [...] [ 102.980388] WARNING: at drivers/macintosh/windfarm_core.c:463 [...] [ 103.982227] WARNING: at drivers/macintosh/windfarm_core.c:463 [...] [ 105.030494] WARNING: at drivers/macintosh/windfarm_core.c:463 [...] [ 105.973666] WARNING: at drivers/macintosh/windfarm_core.c:463 [...] [ 106.977913] WARNING: at drivers/macintosh/windfarm_core.c:463 Fix by adding a helper global variable. We leave the overtemp state only after all failure bits have been cleared. I saw this error on iMac G5 iSight (pm121). Also pm81/pm91 are fixed based on the observation that these are almost identical/copy-pasted code. Signed-off-by: Aaro Koskinen <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/powernv: Use dev-node in PCI config accessorsGavin Shan3-89/+79
Currently, we're using the combo (PCI bus + devfn) in the PCI config accessors and PCI config accessors in EEH depends on them. However, it's not safe to refer the PCI bus which might have been removed during hotplug. So we're using device node in the PCI config accessors and the corresponding backends just reuse them. The patch also fix one potential risk: We possiblly have frozen PE during the early PCI probe time, but we haven't setup the PE mapping yet. So the errors should be counted to PE#0. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/eeh: Avoid build warningsGavin Shan3-4/+4
The patch is for avoiding following build warnings: The function .pnv_pci_ioda_fixup() references the function __init .eeh_init(). This is often because .pnv_pci_ioda_fixup lacks a __init The function .pnv_pci_ioda_fixup() references the function __init .eeh_addr_cache_build(). This is often because .pnv_pci_ioda_fixup lacks a __init Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/eeh: Refactor the output messageGavin Shan3-16/+41
We needn't the the whole backtrace other than one-line message in the error reporting interrupt handler. For errors triggered by access PCI config space or MMIO, we replace "WARN(1, ...)" with pr_err() and dump_stack(). The patch also adds more output messages to indicate what EEH core is doing. Besides, some printk() are replaced with pr_warning(). Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/eeh: Fix address catch for PowerNVGavin Shan2-1/+2
On the PowerNV platform, the EEH address cache isn't built correctly because we skipped the EEH devices without binding PE. The patch fixes that. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/powernv: Replace variables with flagsGavin Shan3-8/+11
We have 2 fields in "struct pnv_phb" to trace the states. The patch replace the fields with one and introduces flags for that. The patch doesn't impact the logic. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/eeh: Check PCIe link after resetGavin Shan1-13/+144
After reset (e.g. complete reset) in order to bring the fenced PHB back, the PCIe link might not be ready yet. The patch intends to make sure the PCIe link is ready before accessing its subordinate PCI devices. The patch also fixes that wrong values restored to PCI_COMMAND register for PCI bridges. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-07-01powerpc/eeh: Don't collect PCI-CFG data on PHBGavin Shan1-9/+23
When the PHB is fenced or dead, it's pointless to collect the data from PCI config space of subordinate PCI devices since it should return 0xFF's. The patch also fixes overwritten buffer while getting PCI config data. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-30powerpc/tm: Clear MSR RI in non-recoverable TM codeMichael Neuling1-2/+16
When we treclaim and trecheckpoint there's an unavoidable period when r1 will not be a valid kernel stack pointer. This patch clears the MSR recoverable interrupt (RI) bit over these regions to indicate we have an invalid kernel stack pointer. For treclaim, the region over which we clear MSR RI is larger than required to avoid the need for an extra costly mtmsrd. Thanks to Paulus for suggesting this change. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-30powerpc: Fix string instr. emulation for 32-bit processes on ppc64James Yang1-0/+4
String instruction emulation would erroneously result in a segfault if the upper bits of the EA are set and is so high that it fails access check. Truncate the EA to 32 bits if the process is 32-bit. Signed-off-by: James Yang <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-30trivial: powerpc: Fix typo in ioei_interrupt() descriptionSebastien Bessiere1-1/+1
Signed-off-by: Sebastien Bessiere <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-26mm/thp: define HPAGE_PMD_* constants as BUILD_BUG() if !THPKirill A. Shutemov1-1/+5
Currently, HPAGE_PMD_* constans rely on PMD_SHIFT regardless of CONFIG_TRANSPARENT_HUGEPAGE. PMD_SHIFT is not defined everywhere (e.g. arm nommu case). It means we can't use anything like this in generic code: if (PageTransHuge(page)) zero_huge_user(page, 0, HPAGE_PMD_SIZE); else clear_highpage(page); For !THP case, PageTransHuge() is 0 and compiler can eliminate zero_huge_user() call. But it still need to be valid C expression, means HPAGE_PMD_SIZE has to expand to something compiler can understand. Previously, HPAGE_PMD_* were defined to BUILD_BUG() for !THP. Let's come back to it. Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25powerpc/eeh: Use interruptible sleep in keehdGavin Shan1-1/+2
To replace down() with down_interrutible() to avoid following warning: [c00000007ba7b710] [c000000000014410] .__switch_to+0x1b0/0x380 [c00000007ba7b7c0] [c0000000007b408c] .__schedule+0x3ec/0x970 [c00000007ba7ba50] [c0000000007b1f24] .schedule_timeout+0x1a4/0x2b0 [c00000007ba7bb30] [c0000000007b34a4] .__down+0xa4/0x104 [c00000007ba7bbf0] [c0000000000b9230] .down+0x60/0x70 [c00000007ba7bc80] [c0000000000336d0] .eeh_event_handler+0x70/0x190 [c00000007ba7bd30] [c0000000000b1a58] .kthread+0xe8/0xf0 [c00000007ba7be30] [c00000000000a05c] .ret_from_kernel_thread+0x5c/0x8 This also avoids keeping the load average up while doing nothing. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25powerpc/eeh: Remove eeh_mutexGavin Shan3-46/+1
Originally, eeh_mutex was introduced to protect the PE hierarchy tree and the attached EEH devices because EEH core was possiblly running with multiple threads to access the PE hierarchy tree. However, we now have only one kthread in EEH core. So we needn't the eeh_mutex and just remove it. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25powerpc/mm: Fix build warnings with CONFIG_TRANSPARENT_HUGEPAGE disabledNathan Fontenot1-0/+1
Building with CONFIG_TRANSPARENT_HUGEPAGE disabled causes the following build wearnings; powerpc/arch/powerpc/include/asm/mmu-hash64.h: In function ‘__hash_page_thp’: powerpc/arch/powerpc/include/asm/mmu-hash64.h:354: warning: no return statement in function returning non-void This patch adds a return -1 to the static inline for __hash_page_thp() to correct the warnings. Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25powerpc/pseries: Enable PSTORE in pseries_defconfigAruna Balakrishnaiah1-0/+1
Since now we have pstore support for nvram in pseries, enable it in the default config. With this config option enabled, pstore infra-structure will be used to read/write the messages from/to nvram. Signed-off-by: Aruna Balakrishnaiah <[email protected]> Acked-by: Michael Neuling <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25powerpc/hw_brk: Fix clearing of extraneous IRQMichael Neuling1-0/+1
In 9422de3 "powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers" we changed the way we mark extraneous irqs with this: - info->extraneous_interrupt = !((bp->attr.bp_addr <= dar) && - (dar - bp->attr.bp_addr < bp->attr.bp_len)); + if (!((bp->attr.bp_addr <= dar) && + (dar - bp->attr.bp_addr < bp->attr.bp_len))) + info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; Unfortunately this is bogus as it never clears extraneous IRQ if it's already set. This correctly clears extraneous IRQ before possibly setting it. Signed-off-by: Michael Neuling <[email protected]> Reported-by: Edjunior Barbosa Machado <[email protected]> Cc: [email protected] Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25powerpc/hw_brk: Fix setting of length for exact mode breakpointsMichael Neuling1-1/+3
The smallest match region for both the DABR and DAWR is 8 bytes, so the kernel needs to filter matches when users want to look at regions smaller than this. Currently we set the length of PPC_BREAKPOINT_MODE_EXACT breakpoints to 8. This is wrong as in exact mode we should only match on 1 address, hence the length should be 1. This ensures that the kernel will filter out any exact mode hardware breakpoint matches on any addresses other than the requested one. Signed-off-by: Michael Neuling <[email protected]> Reported-by: Edjunior Barbosa Machado <[email protected]> Cc: [email protected] Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-25macintosh/adb: Replace __WAITQUEUE_INITIALIZER with more standard ↵Robert P. J. Day1-1/+1
DECLARE_WAITQUEUE. Signed-off-by: Robert P. J. Day <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: Optimize hugepage invalidateAneesh Kumar K.V4-9/+201
Hugepage invalidate involves invalidating multiple hpte entries. Optimize the operation using H_BULK_REMOVE on lpar platforms. On native, reduce the number of tlb flush. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc/THP: Enable THP on PPC64Aneesh Kumar K.V2-2/+30
We enable only if the we support 16MB page size. Reviewed-by: David Gibson <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: split hugepage when using subpage protectionAneesh Kumar K.V1-0/+48
We find all the overlapping vma and mark them such that we don't allocate hugepage in that range. Also we split existing huge page so that the normal page hash can be invalidated and new page faulted in with new protection bits. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: disable assert_pte_locked for collapse_huge_pageAneesh Kumar K.V1-0/+8
With THP we set pmd to none, before we do pte_clear. Hence we can't walk page table to get the pte lock ptr and verify whether it is locked. THP do take pte lock before calling pte_clear. So we don't change the locking rules here. It is that we can't use page table walking to check whether pte locks are held with THP. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: Prevent gcc to re-read the pagetablesAneesh Kumar K.V2-5/+5
GCC is very likely to read the pagetables just once and cache them in the local stack or in a register, but it is can also decide to re-read the pagetables. The problem is that the pagetable in those places can change from under gcc. With THP/hugetlbfs the pmd (and pgd for hugetlbfs giga pages) can change under gup_fast. The pages won't be freed untill we finish gup fast because we have irq disabled and we free these pages via rcu callback. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: Make linux pagetable walk safe with THP enabledAneesh Kumar K.V4-38/+68
We need to have irqs disabled to handle all the possible parallel update for linux page table without holding locks. Events that we are intersted in while walking page tables are 1) Page fault 2) umap 3) THP split 4) THP collapse A) local_irq_disabled: ------------------------ 1) page fault: A none to valid transition via page fault is not an issue because we would either see a none or valid. If it is none, we would error out the page table walk. We may need to use on stack values when checking for type of page table elements, because if we do if (!is_hugepd()) { if (!pmd_none() { if (pmd_bad() { We could take that bad condition because the pmd got converted to a hugepd after the !is_hugepd check via a hugetlb fault. The right way would be to check for pmd_none higher up or use on stack value. 2) A valid to none conversion via unmap: We can safely walk the upper level table, because we don't remove the the page table entries until rcu grace period. So even if we followed a wrong pointer we still have the pointer valid till the grace period. A PTE pointer returned need to be atomically checked for _PAGE_PRESENT and _PAGE_BUSY. A valid pointer returned could becoming none later. To prevent pte_clear we take _PAGE_BUSY. 3) THP split: A valid transparent hugepage is converted to nomal page. Before we split we do pmd_splitting_flush, which sets the hugepage PTE to _PAGE_SPLITTING So when walking page table we need to check for pmd_trans_splitting and handle that. The pte returned should also need to be checked for _PAGE_SPLITTING before setting _PAGE_BUSY similar to _PAGE_PRESENT. We save the value of PTE on stack and check for the flag in the local pte value. If we don't have the value set we can safely operate on the local pte value and we atomicaly set _PAGE_BUSY. 4) THP collapse: A normal page gets converted to hugepage. In the collapse path, we mark the pmd none early (pmdp_clear_flush). With irq disabled, if we are aleady walking page table we would see the pmd_none and won't continue. If we see a valid PMD, we should still check for _PAGE_PRESENT before setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a Huge PTE. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc/THP: Add code to handle HPTE faults for hugepagesAneesh Kumar K.V4-4/+203
The deposted PTE page in the second half of the PMD table is used to track the state on hash PTEs. After updating the HPTE, we mark the coresponding slot in the deposted PTE page valid. Reviewed-by: David Gibson <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: Update gup_pmd_range to handle transparent hugepagesAneesh Kumar K.V1-2/+8
Reviewed-by: David Gibson <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc/kvm: Handle transparent hugepage in KVMAneesh Kumar K.V3-34/+44
We can find pte that are splitting while walking page tables. Return None pte in that case. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: Replace find_linux_pte with find_linux_pte_or_hugepteAneesh Kumar K.V7-33/+36
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly document why we don't need to handle transparent hugepages at callsites. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepagesAneesh Kumar K.V1-6/+26
Reviewed-by: David Gibson <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common codeAneesh Kumar K.V5-138/+138
We will use this in the later patch for handling THP pages Reviewed-by: David Gibson <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc/THP: Implement transparent hugepages for ppc64Aneesh Kumar K.V6-2/+625
We now have pmd entries covering 16MB range and the PMD table double its original size. We use the second half of the PMD table to deposit the pgtable (PTE page). The depoisted PTE page is further used to track the HPTE information. The information include [ secondary group | 3 bit hidx | valid ]. We use one byte per each HPTE entry. With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need 4096 entries. Both will fit in a 4K PTE page. On hugepage invalidate we need to walk the PTE page and invalidate all valid HPTEs. This patch implements necessary arch specific functions for THP support and also hugepage invalidate logic. These PMD related functions are intentionally kept similar to their PTE counter-part. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc/THP: Double the PMD table size for THPAneesh Kumar K.V4-8/+16
THP code does PTE page allocation along with large page request and deposit them for later use. This is to ensure that we won't have any failures when we split hugepages to regular pages. On powerpc we want to use the deposited PTE page for storing hash pte slot and secondary bit information for the HPTEs. We use the second half of the pmd table to save the deposted PTE page. Reviewed-by: David Gibson <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc/mm: handle hugepage size correctly when invalidating hpte entriesAneesh Kumar K.V9-107/+95
If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change update the patch 0608d692463598c1d6e826d9dd7283381b4f246c "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc/eeh: Debugfs for error injectionGavin Shan1-0/+31
The patch creates debugfs entries (powerpc/PCIxxxx/err_injct) for injecting EEH errors for testing purpose. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc/powernv: Debugfs directory for PHBGavin Shan2-0/+27
The patch creates one debugfs directory ("powerpc/PCIxxxx") for each PHB so that we can hook EEH error injection debugfs entry there in proceeding patch. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powerpc/eeh: Register OPAL notifier for PCI errorGavin Shan1-1/+40
The patch registers OPAL event notifier and process the PCI errors from firmware. If we have pending PCI errors, special EEH event (without binding PE) will be sent to EEH core for processing. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powernv/opal: Disable OPAL notifier upon poweroffGavin Shan1-0/+4
While we're restarting or powering off the system, we needn't the OPAL notifier any more. So just to disable that. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-21powernv/opal: Notifier for OPAL eventsGavin Shan2-1/+73
This patch implements a notifier to receive a notification on OPAL event mask changes. The notifier is only called as a result of an OPAL interrupt, which will happen upon reception of FSP messages or PCI errors. Any event mask change detected as a result of opal_poll_events() will not result in a notifier call. [benh: changelog] Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Allow to check fenced PHB proactivelyGavin Shan1-0/+60
It's meaningless to handle frozen PE if we already had fenced PHB. The patch intends to check the PHB state before checking PE. If the PHB has been put into fenced state, we need take care of that firstly. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Enable EEH check for config accessGavin Shan1-1/+39
The patch enables EEH check and let EEH core to process the EEH errors for PowerNV platform while accessing config space. Originally, the implementation already had mechanism to check EEH errors and tried to recover from them. However, we never let EEH core to handle the EEH errors. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-06-20powerpc/eeh: Initialization for PowerNVGavin Shan2-5/+17
The patch initializes EEH for PowerNV platform. Because the OPAL APIs requires HUB ID, we need trace that through struct pnv_phb. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>