aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/platforms/powernv
AgeCommit message (Collapse)AuthorFilesLines
2018-07-24powerpc/powernv: opal-kmsg standardise OPAL_BUSY handlingNicholas Piggin1-8/+16
OPAL_CONSOLE_FLUSH is documented as being able to return OPAL_BUSY, so implement the standard OPAL_BUSY handling for it. Reviewed-by: Russell Currey <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-24powerpc/powernv: Fix OPAL console driver OPAL_BUSY loopsNicholas Piggin1-15/+23
The OPAL console driver does not delay in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware. It can't yet be made to sleep because it is called under spinlock, but it can be changed to the standard OPAL_BUSY loop form, and a delay added to keep it from hitting the firmware too frequently. Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-24powerpc/powernv: opal_put_chars partial write fixNicholas Piggin1-1/+1
The intention here is to consume and discard the remaining buffer upon error. This works if there has not been a previous partial write. If there has been, then total_len is no longer total number of bytes to copy. total_len is always "bytes left to copy", so it should be added to written bytes. This code may not be exercised any more if partial writes will not be hit, but this is a small bugfix before a larger change. Reviewed-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-24powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt ↵Mukesh Ojha1-6/+3
handler Fixes: 8034f715f ("powernv/opal-dump: Convert to irq domain") Converts all the return explicit number to a more proper IRQ_HANDLED, which looks proper incase of interrupt handler returning case. Here, It also removes error message like "nobody cared" which was getting unveiled while returning -1 or 0 from handler. Signed-off-by: Mukesh Ojha <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-24powerpc/powernv/opal-dump : Handles opal_dump_info properlyMukesh Ojha1-3/+6
Moves the return value check of 'opal_dump_info' to a proper place which was previously unnecessarily filling all the dump info even on failure. Signed-off-by: Mukesh Ojha <[email protected]> Acked-by: Stewart Smith <[email protected]> Acked-by: Jeremy Kerr <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-19powerpc/powernv/npu: Add a debugfs setting to change ATSD thresholdAlistair Popple1-2/+10
The threshold at which it becomes more efficient to coalesce a range of ATSDs into a single per-PID ATSD is currently not well understood due to a lack of real-world work loads. This patch adds a debugfs parameter allowing the threshold to be altered at runtime in order to aid future development and refinement of the value. Signed-off-by: Alistair Popple <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-19Merge branch 'topic/ppc-kvm' into nextMichael Ellerman7-388/+454
Merge in some commits we're sharing with the KVM tree. I manually propagated the change from commit d3d4ffaae439 ("powerpc/powernv/ioda2: Reduce upper limit for DMA window size") into pci-ioda-tce.c. Conflicts: arch/powerpc/include/asm/cputable.h arch/powerpc/platforms/powernv/pci-ioda.c arch/powerpc/platforms/powernv/pci.h
2018-07-16powerpc/powernv/ioda: Allocate indirect TCE levels on demandAlexey Kardashevskiy3-22/+65
At the moment we allocate the entire TCE table, twice (hardware part and userspace translation cache). This normally works as we normally have contigous memory and the guest will map entire RAM for 64bit DMA. However if we have sparse RAM (one example is a memory device), then we will allocate TCEs which will never be used as the guest only maps actual memory for DMA. If it is a single level TCE table, there is nothing we can really do but if it a multilevel table, we can skip allocating TCEs we know we won't need. This adds ability to allocate only first level, saving memory. This changes iommu_table::free() to avoid allocating of an extra level; iommu_table::set() will do this when needed. This adds @alloc parameter to iommu_table::exchange() to tell the callback if it can allocate an extra level; the flag is set to "false" for the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns H_TOO_HARD. This still requires the entire table to be counted in mm::locked_vm. To be conservative, this only does on-demand allocation when the usespace cache table is requested which is the case of VFIO. The example math for a system replicating a powernv setup with NVLink2 in a guest: 16GB RAM mapped at 0x0 128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000 the table to cover that all with 64K pages takes: (((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB If we allocate only necessary TCE levels, we will only need: (((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect levels). Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-16powerpc/powernv: Rework TCE level allocationAlexey Kardashevskiy1-11/+19
This moves actual pages allocation to a separate function which is going to be reused later in on-demand TCE allocation. While we are at it, remove unnecessary level size round up as the caller does this already. Reviewed-by: David Gibson <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-16powerpc/powernv: Add indirect levels to it_userspaceAlexey Kardashevskiy3-21/+70
We want to support sparse memory and therefore huge chunks of DMA windows do not need to be mapped. If a DMA window big enough to require 2 or more indirect levels, and a DMA window is used to map all RAM (which is a default case for 64bit window), we can actually save some memory by not allocation TCE for regions which we are not going to map anyway. The hardware tables alreary support indirect levels but we also keep host-physical-to-userspace translation array which is allocated by vmalloc() and is a flat array which might use quite some memory. This converts it_userspace from vmalloc'ed array to a multi level table. As the format becomes platform dependend, this replaces the direct access to it_usespace with a iommu_table_ops::useraddrptr hook which returns a pointer to the userspace copy of a TCE; future extension will return NULL if the level was not allocated. This should not change non-KVM handling of TCE tables and it_userspace will not be allocated for non-KVM tables. Reviewed-by: David Gibson <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-16powerpc/powernv: Move TCE manupulation code to its own fileAlexey Kardashevskiy5-320/+340
Right now we have allocation code in pci-ioda.c and traversing code in pci.c, let's keep them toghether. However both files are big enough already so let's move this business to a new file. While we at it, move the code which links IOMMU table groups to IOMMU tables as it is not specific to any PNV PHB model. These puts exported symbols from the new file together. This fixes several warnings from checkpatch.pl like this: "WARNING: Prefer 'unsigned int' to bare use of 'unsigned'". As this is almost cut-n-paste, there should be no behavioral change. Reviewed-by: David Gibson <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-16powerpc/powernv: Remove useless wrapperAlexey Kardashevskiy1-6/+1
This gets rid of a useless wrapper around pnv_pci_ioda2_table_free_pages(). Reviewed-by: David Gibson <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-16powerpc/64s: Remove POWER9 DD1 supportNicholas Piggin2-52/+3
POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Michael Ellerman <[email protected]> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: Michael Ellerman <[email protected]>
2018-07-04powerpc/powernv/ioda2: Add 256M IOMMU page size to the default POWER8 caseAlexey Kardashevskiy1-1/+1
The sketchy bypass uses 256M pages so add this page size as well. This should cause no behavioral change but will be used later. Fixes: 477afd6ea6 "powerpc/ioda: Use ibm,supported-tce-sizes for IOMMU page size mask" Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-02Revert "powerpc/powernv: Add support for the cxl kernel api on the real phb"Alastair D'Silva3-145/+1
Remove abandonned capi support for the Mellanox CX4. This reverts commit 4361b03430d685610e5feea3ec7846e8b9ae795f. Signed-off-by: Alastair D'Silva <[email protected]> Acked-by: Andrew Donnellan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-02Revert "cxl: Add support for interrupts on the Mellanox CX4"Alastair D'Silva3-90/+0
Remove abandonned capi support for the Mellanox CX4. This reverts commit a2f67d5ee8d950caaa7a6144cf0bfb256500b73e. Signed-off-by: Alastair D'Silva <[email protected]> Acked-by: Andrew Donnellan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-02powerpc/powernv/memtrace: Remove memtrace mmap()Michael Neuling1-29/+0
debugfs doesn't support mmap(), so this code is never used. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-02powerpc/powernv/ioda2: Reduce upper limit for DMA window sizeAlexey Kardashevskiy1-1/+1
We use PHB in mode1 which uses bit 59 to select a correct DMA window. However there is mode2 which uses bits 59:55 and allows up to 32 DMA windows per a PE. Even though documentation does not clearly specify that, it seems that the actual hardware does not support bits 59:55 even in mode1, in other words we can create a window as big as 1<<58 but DMA simply won't work. This reduces the upper limit from 59 to 55 bits to let the userspace know about the hardware limits. Fixes: 7aafac11e3 "powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested" Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-07-02powerpc/eeh: Avoid misleading message "EEH: no capable adapters found"Mauro S. M. Rodrigues1-1/+11
Due to recent refactoring in EEH in: commit b9fde58db7e5 ("powerpc/powernv: Rework EEH initialization on powernv") a misleading message was seen in the kernel message buffer: [ 0.108431] EEH: PowerNV platform initialized [ 0.589979] EEH: No capable adapters found This happened due to the removal of the initialization delay for powernv platform. Even though the EEH infrastructure for the devices is eventually initialized and still works just fine the eeh device probe step is postponed in order to assure the PEs are created. Later pnv_eeh_post_init does the probe devices job but at that point the message was already shown right after eeh_init flow. This patch introduces a new flag EEH_POSTPONED_PROBE to represent that temporary state and avoid the message mentioned above and showing the follow one instead: [ 0.107724] EEH: PowerNV platform initialized [ 4.844825] EEH: PCI Enhanced I/O Error Handling Enabled Signed-off-by: Mauro S. M. Rodrigues <[email protected]> Acked-by: Russell Currey <[email protected]> Tested-by:Venkat Rao B <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook1-4/+4
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <[email protected]>
2018-06-07Merge tag 'powerpc-4.18-1' of ↵Linus Torvalds16-99/+195
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Support for split PMD page table lock on 64-bit Book3S (Power8/9). - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support live patching again. - Add support for patching barrier_nospec in copy_from_user() and syscall entry. - A couple of fixes for our data breakpoints on Book3S. - A series from Nick optimising TLB/mm handling with the Radix MMU. - Numerous small cleanups to squash sparse/gcc warnings from Mathieu Malaterre. - Several series optimising various parts of the 32-bit code from Christophe Leroy. - Removal of support for two old machines, "SBC834xE" and "C2K" ("GEFanuc,C2K"), which is why the diffstat has so many deletions. And many other small improvements & fixes. There's a few out-of-area changes. Some minor ftrace changes OK'ed by Steve, and a fix to our powernv cpuidle driver. Then there's a series touching mm, x86 and fs/proc/task_mmu.c, which cleans up some details around pkey support. It was ack'ed/reviewed by Ingo & Dave and has been in next for several weeks. Thanks to: Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Balbir Singh, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Dave Hansen, Fabio Estevam, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Haren Myneni, Hari Bathini, Ingo Molnar, Jonathan Neuschäfer, Josh Poimboeuf, Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu Malaterre, Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao, Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica Gupta, Ravi Bangoria, Russell Currey, Sam Bobroff, Samuel Mendoza-Jonas, Segher Boessenkool, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stewart Smith, Thiago Jung Bauermann, Torsten Duwe, Vaibhav Jain, Wei Yongjun, Wolfram Sang, Yisheng Xie, YueHaibing" * tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (251 commits) powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap cpuidle: powernv: Fix promotion from snooze if next state disabled powerpc: fix build failure by disabling attribute-alias warning in pci_32 ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait() powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted" powerpc: fix spelling mistake: "Usupported" -> "Unsupported" powerpc/pkeys: Detach execute_only key on !PROT_EXEC powerpc/powernv: copy/paste - Mask SO bit in CR powerpc: Remove core support for Marvell mv64x60 hostbridges powerpc/boot: Remove core support for Marvell mv64x60 hostbridges powerpc/boot: Remove support for Marvell mv64x60 i2c controller powerpc/boot: Remove support for Marvell MPSC serial controller powerpc/embedded6xx: Remove C2K board support powerpc/lib: optimise PPC32 memcmp powerpc/lib: optimise 32 bits __clear_user() powerpc/time: inline arch_vtime_task_switch() powerpc/Makefile: set -mcpu=860 flag for the 8xx powerpc: Implement csum_ipv6_magic in assembly powerpc/32: Optimise __csum_partial() powerpc/lib: Adjust .balign inside string functions for PPC32 ...
2018-06-05powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted"Colin Ian King1-1/+1
Trivial fix to spelling mistake in hmi_error_types text Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Stewart Smith <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-04powerpc/powernv: copy/paste - Mask SO bit in CRHaren Myneni1-1/+2
NX can set the 3rd bit in CR register for XER[SO] (Summary overflow) which is not related to paste request. The current paste function returns failure for a successful request when this bit is set. So mask this bit and check the proper return status. Fixes: 2392c8c8c045 ("powerpc/powernv/vas: Define copy/paste interfaces") Cc: [email protected] # v4.14+ Signed-off-by: Haren Myneni <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03powerpc/64s: Enable barrier_nospec based on firmware settingsMichal Suchanek1-0/+1
Check what firmware told us and enable/disable the barrier_nospec as appropriate. We err on the side of enabling the barrier, as it's no-op on older systems, see the comment for more detail. Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03powerpc/perf: Unregister thread-imc if core-imc not supportedAnju T Sudhakar1-0/+9
Since thread-imc internally use the core-imc hardware infrastructure and is depended on it, having thread-imc in the kernel in the absence of core-imc is trivial. Patch disables thread-imc, if core-imc is not registered. Signed-off-by: Anju T Sudhakar <[email protected]> Reviewed-by: Madhavan Srinivasan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03powerpc/perf: Rearrange memory freeing in imc initAnju T Sudhakar1-3/+10
When any of the IMC (In-Memory Collection counter) devices fail to initialize, imc_common_mem_free() frees set of memory. In doing so, pmu_ptr pointer is also freed. But pmu_ptr pointer is used in subsequent function (imc_common_cpuhp_mem_free()) which is wrong. Patch here reorders the code to avoid such access. Also free the memory which is dynamically allocated during imc initialization, wherever required. Signed-off-by: Anju T Sudhakar <[email protected]> Reviewed-by: Madhavan Srinivasan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03powerpc: use time64_t in read_persistent_clockArnd Bergmann1-3/+2
Looking through the remaining users of the deprecated mktime() function, I found the powerpc rtc handlers, which use it in place of rtc_tm_to_time64(). To clean this up, I'm changing over the read_persistent_clock() function to the read_persistent_clock64() variant, and change all the platform specific handlers along with it. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's actionAlastair D'Silva1-2/+2
The function removes the process element from NPU cache. Signed-off-by: Alastair D'Silva <[email protected]> Acked-by: Frederic Barrat <[email protected]> Acked-by: Andrew Donnellan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03powerpc/powernv: process all OPAL event interrupts with kopaldNicholas Piggin3-61/+52
Using irq_work for processing OPAL event interrupts is not necessary. irq_work is typically used to schedule work from NMI context, a softirq may be more appropriate. However OPAL events are not particularly performance or latency critical, so they can all be invoked by kopald. This patch removes the irq_work queueing, and instead wakes up kopald when there is an event to be processed. kopald processes interrupts individually, enabling irqs and calling cond_resched between each one to minimise latencies. Event handlers themselves should still use threaded handlers, workqueues, etc. as necessary to avoid high interrupts-off latencies within any single interrupt. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESETNicholas Piggin2-1/+17
Although it is often possible to recover a CPU that was interrupted from OPAL with a system reset NMI, it's undesirable to interrupt them for a few reasons. Firstly because dump/debug code itself needs to call firmware, so it could hang on a lock or possibly corrupt a per-cpu data structure if it or another CPU was interrupted from OPAL. Secondly, the kexec crash dump code will not return from interrupt to unwind the OPAL call. Call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to another CPU, which wait for it to leave firmware (or time out) to avoid this problem in normal conditions. Firmware bugs may still result in a timeout and interrupting OPAL, but that is the best option (stops the CPU, and possibly allows firmware to be debugged). Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03powerpc/powernv/ioda2: Remove redundant free of TCE pagesAlexey Kardashevskiy1-1/+0
When IODA2 creates a PE, it creates an IOMMU table with it_ops::free set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages(). Since iommu_tce_table_put() calls it_ops::free when the last reference to the table is released, explicit call to pnv_pci_ioda2_table_free_pages() is not needed so let's remove it. This should fix double free in the case of PCI hotuplug as pnv_pci_ioda2_table_free_pages() does not reset neither iommu_table::it_base nor ::it_size. This was not exposed by SRIOV as it uses different code path via pnv_pcibios_sriov_disable(). IODA1 does not inialize it_ops::free so it does not have this issue. Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE") Cc: [email protected] # v4.8+ Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03Merge branch 'fixes' into nextMichael Ellerman1-2/+12
We ended up with an ugly conflict between fixes and next in ftrace.h involving multiple nested ifdefs, and the automatic resolution is wrong. So merge fixes into next so we can fix it up.
2018-06-03Merge branch 'topic/ppc-kvm' into nextMichael Ellerman1-2/+1
Merge in some commits we're sharing with the kvm-ppc tree.
2018-05-28powerpc/powernv/cpuidle: Init all present cpus for deep statesAkshay Adiga1-2/+2
Init all present cpus for deep states instead of "all possible" cpus. Init fails if a possible cpu is guarded. Resulting in making only non-deep states available for cpuidle/hotplug. Stewart says, this means that for single threaded workloads, if you guard out a CPU core you'll not get WoF (Workload Optimised Frequency), which means that performance goes down when you wouldn't expect it to. Fixes: 77b54e9f213f ("powernv/powerpc: Add winkle support for offline cpus") Cc: [email protected] # v3.19+ Signed-off-by: Akshay Adiga <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-05-24powerpc/reg: Add TEXASR related macrosSimon Guo1-2/+1
This patches add some macros for CR0/TEXASR bits so that PR KVM TM logic (tbegin./treclaim./tabort.) can make use of them later. Signed-off-by: Simon Guo <[email protected]> Reviewed-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-05-21powerpc/64s: Add support for a store forwarding barrier at kernel entry/exitNicholas Piggin1-0/+1
On some CPUs we can prevent a vulnerability related to store-to-load forwarding by preventing store forwarding between privilege domains, by inserting a barrier in kernel entry and exit paths. This is known to be the case on at least Power7, Power8 and Power9 powerpc CPUs. Barriers must be inserted generally before the first load after moving to a higher privilege, and after the last store before moving to a lower privilege, HV and PR privilege transitions must be protected. Barriers are added as patch sections, with all kernel/hypervisor entry points patched, and the exit points to lower privilge levels patched similarly to the RFI flush patching. Firmware advertisement is not implemented yet, so CPU flush types are hard coded. Thanks to Michal Suchánek for bug fixes and review. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michal Suchánek <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-05-21powernv: opal-sensor: Add support to read 64bit sensor valuesShilpasri G Bhat2-0/+54
This patch adds support to read 64-bit sensor values. This method is used to read energy sensors and counters which are of type u64. Signed-off-by: Shilpasri G Bhat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-05-18powerpc/powernv: Use __raw_[rm_]writeq_be() in npu-dma.cMichael Ellerman1-3/+2
This allows us to squash some sparse warnings and also avoids having to do explicity endian conversions in the code. Signed-off-by: Michael Ellerman <[email protected]> Reviewed-by: Samuel Mendoza-Jonas <[email protected]>
2018-05-18powerpc/powernv: Use __raw_[rm_]writeq_be() in pci-ioda.cMichael Ellerman1-7/+8
This allows us to squash some sparse warnings and also avoids having to do explicity endian conversions in the code. Signed-off-by: Michael Ellerman <[email protected]> Reviewed-by: Samuel Mendoza-Jonas <[email protected]>
2018-05-18powerpc/powernv: Fix NVRAM sleep in invalid context when crashingNicholas Piggin1-2/+12
Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special case the delay to avoid sleeping in invalid context. Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops") Cc: [email protected] # v3.2 Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-05-18powerpc/powernv: Fix opal_event_shutdown() called with interrupts disabledNicholas Piggin1-1/+1
A kernel crash in process context that calls emergency_restart from panic will end up calling opal_event_shutdown with interrupts disabled but not in interrupt. This causes a sleeping function to be called which gives the following warning with sysrq+c: Rebooting in 10 seconds.. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 in_atomic(): 0, irqs_disabled(): 1, pid: 7669, name: bash CPU: 20 PID: 7669 Comm: bash Tainted: G D W 4.17.0-rc5+ #3 Call Trace: dump_stack+0xb0/0xf4 (unreliable) ___might_sleep+0x174/0x1a0 mutex_lock+0x38/0xb0 __free_irq+0x68/0x460 free_irq+0x70/0xc0 opal_event_shutdown+0xb4/0xf0 opal_shutdown+0x24/0xa0 pnv_shutdown+0x28/0x40 machine_shutdown+0x44/0x60 machine_restart+0x28/0x80 emergency_restart+0x30/0x50 panic+0x2a0/0x328 oops_end+0x1ec/0x1f0 bad_page_fault+0xe8/0x154 handle_page_fault+0x34/0x38 --- interrupt: 300 at sysrq_handle_crash+0x44/0x60 LR = __handle_sysrq+0xfc/0x260 flag_spec.62335+0x12b844/0x1e8db4 (unreliable) __handle_sysrq+0xfc/0x260 write_sysrq_trigger+0xa8/0xb0 proc_reg_write+0xac/0x110 __vfs_write+0x6c/0x240 vfs_write+0xd0/0x240 ksys_write+0x6c/0x110 Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events") Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-05-14powerpc/ioda: Use ibm, supported-tce-sizes for IOMMU page size maskAlexey Kardashevskiy1-1/+29
At the moment we assume that IODA2 and newer PHBs can always do 4K/64K/16M IOMMU pages, however this is not the case for POWER9 and now skiboot advertises the supported sizes via the device so we use that instead of hard coding the mask. Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-05-14powerpc/powernv: Fix memtrace build when NUMA=nMichael Ellerman1-1/+1
Currently memtrace doesn't build if NUMA=n: In function ‘memtrace_alloc_node’: arch/powerpc/platforms/powernv/memtrace.c:134:6: error: the address of ‘contig_page_data’ will always evaluate as ‘true’ if (!NODE_DATA(nid) || !node_spanned_pages(nid)) ^ This is because for NUMA=n NODE_DATA(nid) points to an always allocated structure, contig_page_data. But even in the NUMA=y case memtrace_alloc_node() is only called for online nodes, and we should always have a NODE_DATA() allocated for an online node. So remove the (hopefully) overly paranoid check, which also means we can build when NUMA=n. Signed-off-by: Michael Ellerman <[email protected]>
2018-05-07Revert "powerpc/powernv: Increase memory block size to 1GB on radix"Balbir Singh1-9/+1
This commit was a stop-gap to prevent crashes on hotunplug, caused by the mismatch between the 1G mappings used for the linear mapping and the memory block size. Those issues are now resolved because we split the linear mapping at hotunplug time if necessary, as implemented in commit 4dd5f8a99e79 ("powerpc/mm/radix: Split linear mapping on hot-unplug"). Signed-off-by: Balbir Singh <[email protected]> Signed-off-by: Michael Neuling <[email protected]> Tested-by: Rashmica Gupta <[email protected]> Tested-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-25rtc: opal: Fix OPAL RTC driver OPAL_BUSY loopsNicholas Piggin1-3/+5
The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies, up to 50 seconds have been observed here when RTC stops responding (BMC reboot can do it). Fix this by converting it to the standard form OPAL_BUSY loop that sleeps. Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Cc: [email protected] # v3.2+ Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Alexandre Belloni <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-24powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large ↵Alistair Popple1-4/+19
address range The NPU has a limited number of address translation shootdown (ATSD) registers and the GPU has limited bandwidth to process ATSDs. This can result in contention of ATSD registers leading to soft lockups on some threads, particularly when invalidating a large address range in pnv_npu2_mn_invalidate_range(). At some threshold it becomes more efficient to flush the entire GPU TLB for the given MM context (PID) than individually flushing each address in the range. This patch will result in ranges greater than 2MB being converted from 32+ ATSDs into a single ATSD which will flush the TLB for the given PID on each GPU. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: [email protected] # v4.12+ Signed-off-by: Alistair Popple <[email protected]> Acked-by: Balbir Singh <[email protected]> Tested-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-24powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback ↵Alistair Popple1-3/+13
parameters There is a single npu context per set of callback parameters. Callers should be prevented from overwriting existing callback values so instead return an error if different parameters are passed. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: [email protected] # v4.12+ Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Mark Hairgrove <[email protected]> Tested-by: Mark Hairgrove <[email protected]> Reviewed-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-24powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroyAlistair Popple1-9/+42
The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions are used to allocate/free contexts to allow address translation and shootdown by the NPU on a particular GPU. Context initialisation is implicitly safe as it is protected by the requirement mmap_sem be held in write mode, however pnv_npu2_destroy_context() does not require mmap_sem to be held and it is not safe to call with a concurrent initialisation for a different GPU. It was assumed the driver would ensure destruction was not called concurrently with initialisation. However the driver may be simplified by allowing concurrent initialisation and destruction for different GPUs. As npu context creation/destruction is not a performance critical path and the critical section is not large a single spinlock is used for simplicity. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: [email protected] # v4.12+ Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Mark Hairgrove <[email protected]> Tested-by: Mark Hairgrove <[email protected]> Reviewed-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-24powerpc/powernv/memtrace: Let the arch hotunplug code flush cacheBalbir Singh1-17/+0
Don't do this via custom code, instead now that we have support in the arch hotplug/hotunplug code, rely on those routines to do the right thing. The existing flush doesn't work because it uses ppc64_caches.l1d.size instead of ppc64_caches.l1d.line_size. Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Signed-off-by: Balbir Singh <[email protected]> Reviewed-by: Rashmica Gupta <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-15Merge tag 'powerpc-4.17-2' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix crashes when loading modules built with a different CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic. - Fix busy loops in the OPAL NVRAM driver if we get certain error conditions from firmware. - Remove tlbie trace points from KVM code that's called in real mode, because it causes crashes. - Fix checkstops caused by invalid tlbiel on Power9 Radix. - Ensure the set of CPU features we "know" are always enabled is actually the minimal set when we build with support for firmware supplied CPU features. Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin. * tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features powerpc/mm/radix: Fix checkstops caused by invalid tlbiel KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode powerpc/8xx: Fix build with hugetlbfs enabled powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops powerpc/fscr: Enable interrupts earlier before calling get_user() powerpc/64s: Fix section mismatch warnings from setup_rfi_flush() powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic