path: root/arch
Age | Commit message | Author | Files | Lines
2016-07-05 | powerpc/pseries: Fix error return value in cmm_mem_going_offline() | Rasmus Villemoes | 1 | -1/+1
cmm_mem_going_offline() is (only) called from cmm_memory_cb(), which sends the return value through notifier_from_errno(). The latter expects 0 or -errno (notifier_to_errno(notifier_from_errno(x)) is 0 for any x >= 0, so passing a positive value cannot make sense). Hence negate ENOMEM.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 | powerpc/rtas: Fix array overrun in ppc_rtas() syscall | Andrew Donnellan | 1 | -1/+1
If ppc_rtas() is called with args.nargs == 16 and args.nret == 0, args.rets is set to point to &args.args[16], which is beyond the end of the args.args array. This results in a minor read overrun of the array when we check the first return code (which, per PAPR, is a required output of all RTAS calls) to see if there's been a hardware error. Change the nargs/nret check to ensure nargs is <= 15, allowing room for the status code. Users shouldn't be calling with nret == 0, but there's no real harm if they do, so we don't stop them.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 | powerpc: Send SIGBUS on unaligned copy and paste | Chris Smart | 2 | -0/+18
Calling the ISA 3.0 instructions copy, copy_first, paste and paste_last on data that is not 128-byte aligned generates an alignment fault. We catch this and send SIGBUS to the userspace process that caused it. We do not emulate these instructions because paste may contain additional metadata when pasting to a co-processor, and paste_last is the synchronisation point for preceding copy/paste sequences.
Thanks to Michael Neuling <mikey@neuling.org> for his help.
Signed-off-by: Chris Smart <chris@distroguy.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 | powerpc/perf: Export Power9 generic and cache events to sysfs | Madhavan Srinivasan | 1 | -0/+59
Export the generic hardware and cache perf events for Power9 to sysfs, so users can determine the PMU event monitored.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 | powerpc/perf: Power9 PMU support | Madhavan Srinivasan | 2 | -1/+272
This patch adds base enablement for the Power9 PMU.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 | powerpc/perf: Add power9 event list macros for generic and cache events | Madhavan Srinivasan | 1 | -0/+55
Add macros for the generic and cache events on Power9.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 | powerpc/perf: factor out power8 __init_pmu code | Madhavan Srinivasan | 1 | -1/+17
Factor out the Power8 PMU init functions so they can be shared with Power9. The Monitor Mode Control Register S (MMCRS) and Monitor Mode Control Register H (MMCRH) are dropped in Power9, so their setup is moved to a new function which is included only in the Power8 init.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 | powerpc/perf: factor out power8 pmu functions | Madhavan Srinivasan | 4 | -254/+273
Factor out some of the Power8 PMU functions into a new file, "isa207-common.c", to share them with the Power9 PMU code. Only code movement, no logic change.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 | powerpc/perf: factor out power8 pmu macros and defines | Madhavan Srinivasan | 2 | -217/+234
Factor out some of the Power8 PMU macros into a new header file to share them with the Power9 PMU code. Just code movement, no logic change.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 | powerpc/fadump: Fix build error introduced by recent cleanup | Michael Ellerman | 1 | -1/+1
We spent so much time bike-shedding the printk() that we missed that the next line was missing a semicolon. And it seems none of our defconfigs turn on CONFIG_FA_DUMP.
Fixes: 4a03749f140c ("powerpc/fadump: Trivial fix of spelling mistake, clean up message")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-30 | powerpc: Initialise pci_io_base as early as possible | Darren Stevens | 4 | -1/+10
Commit d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for radix") turned kernel memory and IO addresses from #defined constants to variables initialised at runtime. On PA6T (pasemi) systems the setup_arch() machine call initialises the onboard PCI-e root-ports and uses pci_io_base to do this, which now happens before its value has been set, resulting in a panic early in boot before console IO is initialised.
Move the pci_io_base initialisation to the same place as the vmalloc ranges are set (hash__early_init_mmu()/radix__early_init_mmu()); this is the earliest possible place we can initialise it.
Fixes: d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for radix")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Add #ifdef CONFIG_PCI, massage change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29 | powerpc/powernv: Add driver for operator panel on FSP machines | Suraj Jitindar Singh | 4 | -0/+9
Implement a new character device driver to allow access from user space to the operator panel display present on IBM Power Systems machines with FSPs. This will allow status information to be presented on the display, which is visible to a user. The driver implements a character buffer which a user can read/write by accessing the device (/dev/op_panel). This buffer is then displayed on the operator panel display. Any attempt to write past the last character position will have no effect, and writes of more characters than the size of the display will be truncated. The device may only be accessed by a single process at a time.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29 | powerpc/opal: Add inline function to get rc from an ASYNC_COMP opal_msg | Suraj Jitindar Singh | 3 | -3/+11
An opal_msg of type OPAL_MSG_ASYNC_COMP contains the return code in the params[1] struct member. However, this isn't intuitive or obvious when reading the code, and requires a user to look at the skiboot documentation or opal-api.h to verify it. Add an inline function to get the return code from an opal_msg and update call sites accordingly.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29 | powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 | Michael Neuling | 1 | -17/+44
Currently we have 2 segments that are bolted for the kernel linear mapping (i.e. 0xc000... addresses). These cover 0 to 1TB and the kernel stacks. Anything accessed outside of these regions may need to be faulted in. (In practice, machines with TM always have 1T segments.)
If a machine has < 2TB of memory, we never fault on the kernel linear mapping, as these two segments cover all physical memory. If a machine has > 2TB of memory, there may be structures outside of these two segments that need to be faulted in. This faulting can occur when running as a guest, as the hypervisor may remove any SLB entry that's not bolted.
When we treclaim and trecheckpoint, we have a window where we need to run with the userspace GPRs. This means that we no longer have a valid stack pointer in r1. For this window we therefore clear MSR RI to indicate that any exceptions taken at this point won't be able to be handled. This means that we can't take segment misses in this RI=0 window.
In this RI=0 region, we currently access the thread_struct for the process being context switched to or from. This thread_struct access may cause a segment fault since it's not guaranteed to be covered by the two bolted segment entries described above.
We've seen this with a crash when running as a guest with > 2TB of memory on PowerVM:
  Unrecoverable exception 4100 at c00000000004f138
  Oops: Unrecoverable exception, sig: 6 [#1]
  SMP NR_CPUS=2048 NUMA pSeries
  CPU: 1280 PID: 7755 Comm: kworker/1280:1 Tainted: G X 4.4.13-46-default #1
  task: c000189001df4210 ti: c000189001d5c000 task.ti: c000189001d5c000
  NIP: c00000000004f138 LR: 0000000010003a24 CTR: 0000000010001b20
  REGS: c000189001d5f730 TRAP: 4100 Tainted: G X (4.4.13-46-default)
  MSR: 8000000100001031 <SF,ME,IR,DR,LE> CR: 24000048 XER: 00000000
  CFAR: c00000000004ed18 SOFTE: 0
  GPR00: ffffffffc58d7b60 c000189001d5f9b0 00000000100d7d00 000000003a738288
  GPR04: 0000000000002781 0000000000000006 0000000000000000 c0000d1f4d889620
  GPR08: 000000000000c350 00000000000008ab 00000000000008ab 00000000100d7af0
  GPR12: 00000000100d7ae8 00003ffe787e67a0 0000000000000000 0000000000000211
  GPR16: 0000000010001b20 0000000000000000 0000000000800000 00003ffe787df110
  GPR20: 0000000000000001 00000000100d1e10 0000000000000000 00003ffe787df050
  GPR24: 0000000000000003 0000000000010000 0000000000000000 00003fffe79e2e30
  GPR28: 00003fffe79e2e68 00000000003d0f00 00003ffe787e67a0 00003ffe787de680
  NIP [c00000000004f138] restore_gprs+0xd0/0x16c
  LR [0000000010003a24] 0x10003a24
  Call Trace:
  [c000189001d5f9b0] [c000189001d5f9f0] 0xc000189001d5f9f0 (unreliable)
  [c000189001d5fb90] [c00000000001583c] tm_recheckpoint+0x6c/0xa0
  [c000189001d5fbd0] [c000000000015c40] __switch_to+0x2c0/0x350
  [c000189001d5fc30] [c0000000007e647c] __schedule+0x32c/0x9c0
  [c000189001d5fcb0] [c0000000007e6b58] schedule+0x48/0xc0
  [c000189001d5fce0] [c0000000000deabc] worker_thread+0x22c/0x5b0
  [c000189001d5fd80] [c0000000000e7000] kthread+0x110/0x130
  [c000189001d5fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
  Instruction dump:
  7cb103a6 7cc0e3a6 7ca222a6 78a58402 38c00800 7cc62838 08860000 7cc000a6
  38a00006 78c60022 7cc62838 0b060000 <e8c701a0> 7ccff120 e8270078 e8a70098
  ---[ end trace 602126d0a1dedd54 ]---
This fixes the problem by copying the required data from the thread_struct to the stack before we clear MSR RI. Then once we clear RI, we only access the stack, guaranteeing there's no segment miss.
We also tighten the region over which we set RI=0 on the treclaim() path. This may have a slight performance impact since we're adding an mtmsr instruction.
Fixes: 090b9284d725 ("powerpc/tm: Clear MSR RI in non-recoverable TM code")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-28 | powerpc/eeh: Fix wrong argument passed to eeh_rmv_device() | Gavin Shan | 1 | -1/+1
When calling eeh_rmv_device() from eeh_reset_device() in the partial hotplug case, the proper argument is @rmv_data, not its address. Otherwise the stack frame is corrupted when eeh_rmv_device() writes to @rmv_data (actually its address), resulting in the observed kernel crash. This fixes the issue by passing @rmv_data, not its address, to eeh_rmv_device() in eeh_reset_device().
Fixes: 67086e32b564 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE")
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-28 | powerpc/powernv: Fix spelling mistake "Retrived" -> "Retrieved" | Colin Ian King | 1 | -1/+1
Trivial fix to a spelling mistake in a pr_debug() message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-28 | powerpc/fadump: Trivial fix of spelling mistake, clean up message | Colin Ian King | 1 | -3/+2
Fix trivial spelling mistake "rgistration". Also use pr_err() instead of printk() and unsplit the string to keep it all on one line.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
[mpe: Keep rc on the same line, splitting it doesn't help]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-27 | powerpc/tm: Always reclaim in start_thread() for exec() class syscalls | Cyril Bur | 1 | -0/+10
Userspace can quite legitimately perform an exec() syscall with a suspended transaction. exec() does not return to the old process; rather, it loads a new one and starts that, so the expectation is that the new process does not start in a transaction. Currently exec() is not treated any differently from any other syscall, which creates problems.
Firstly, it could allow a new process to start with a suspended transaction for a binary that no longer exists. This means that the checkpointed state won't be valid, and if the suspended transaction were ever to be resumed and subsequently aborted (a possibility which is exceedingly likely, as exec()ing will likely doom the transaction), the new process will jump to an invalid state.
Secondly, the incorrect attempt to keep the transactional state while still zeroing state for the new process creates at least two TM Bad Things. The first triggers on the rfid to return to userspace, as start_thread() has given the new process a 'clean' MSR but the suspend will still be set in the hardware MSR. The second TM Bad Thing triggers in __switch_to(), as the processor is still transactionally suspended but __switch_to() wants to zero the TM SPRs for the new process.
This is an example of the outcome of calling exec() with a suspended transaction. Note that the first 700 is likely the first TM Bad Thing described earlier, only the kernel can't report it as we've loaded userspace registers.
c000000000009980 is the rfid in fast_exception_return().
  Bad kernel stack pointer 3fffcfa1a370 at c000000000009980
  Oops: Bad kernel stack pointer, sig: 6 [#1]
  CPU: 0 PID: 2006 Comm: tm-execed Not tainted
  NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000
  REGS: c00000003ffefd40 TRAP: 0700 Not tainted
  MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 00000000 XER: 00000000
  CFAR: c0000000000098b4 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000
  GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000
  NIP [c000000000009980] fast_exception_return+0xb0/0xb8
  LR [0000000000000000] (null)
  Call Trace:
  Instruction dump:
  f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070
  e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b
  Kernel BUG at c000000000043e80 [verbose debug info unavailable]
  Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033)
  Oops: Unrecoverable exception, sig: 6 [#2]
  CPU: 0 PID: 2006 Comm: tm-execed Tainted: G D
  task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000
  NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000
  REGS: c00000003ffef7e0 TRAP: 0700 Tainted: G D
  MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]> CR: 28002828 XER: 00000000
  CFAR: c000000000015a20 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000
  GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000
  GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004
  GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000
  GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000
  GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80
  NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c
  LR [c000000000015a24] __switch_to+0x1f4/0x420
  Call Trace:
  Instruction dump:
  7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
  4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
This fixes CVE-2016-5828.
Fixes: bc2a9408fa65 ("powerpc: Hook in new transactional memory code")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 | powerpc/pci: Reduce log level of PCI I/O space warning | Benjamin Herrenschmidt | 1 | -3/+3
If a PHB has no I/O space, there's no need to make it look like something bad happened; a pr_debug() is plenty, since this is the case for all our modern POWER chips.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 | powerpc/ebpf/jit: Implement JIT compiler for extended BPF | Naveen N. Rao | 8 | -3/+1315
PPC64 eBPF JIT compiler.
Enable with:
  echo 1 > /proc/sys/net/core/bpf_jit_enable
or
  echo 2 > /proc/sys/net/core/bpf_jit_enable
... to see the generated JIT code. This can further be processed with tools/net/bpf_jit_disasm.
With CONFIG_TEST_BPF=m and 'modprobe test_bpf':
  test_bpf: Summary: 305 PASSED, 0 FAILED, [297/297 JIT'ed]
... on both ppc64 BE and LE.
The details of the approach are documented through various comments in the code.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 | powerpc/bpf/jit: Isolate classic BPF JIT specifics into a separate header | Naveen N. Rao | 4 | -121/+143
Break out classic BPF JIT specifics into a separate header in preparation for eBPF JIT implementation. Note that ppc32 will still need the classic BPF JIT.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 | powerpc/bpf/jit: A few cleanups | Naveen N. Rao | 2 | -10/+11
1. Per the ISA, ADDIS actually uses RT, rather than RS. Though the result is the same, make the usage clear.
2. The multiply instruction used is a 32-bit multiply. Rename PPC_MUL() to PPC_MULW() to make this clear.
3. Per the ISA, PPC_STW[U] take the entire 16-bit immediate value and do not require word alignment. Change the macros to use IMM_L().
4. A few white-space cleanups to satisfy checkpatch.pl.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 | powerpc/bpf/jit: Introduce rotate immediate instructions | Naveen N. Rao | 2 | -9/+13
Since we will be using the rotate immediate instructions for the extended BPF JIT, introduce macros for them. And since the shift immediate operations use the rotate immediate instructions, redo those macros to use the newly introduced ones.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 | powerpc/bpf/jit: Optimize 64-bit Immediate loads | Naveen N. Rao | 1 | -6/+11
Similar to the LI32() optimization, if the value can be represented in 32 bits, use LI32(). Also handle loading a few specific forms of immediate values in an optimal manner.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 | powerpc/bpf/jit: Fix/enhance 32-bit Load Immediate implementation | Naveen N. Rao | 1 | -3/+10
The existing LI32() macro can sometimes result in a sign-extended 32-bit load that does not clear the top 32 bits properly. As an example, loading 0x7fffffff results in the register containing 0xffffffff7fffffff. While this does not impact the classic BPF JIT implementation (since that only uses the lower word for all operations), we would like to share this macro between the classic BPF JIT and the extended BPF JIT, wherein the entire 64-bit value in the register matters. Fix this by first doing a shifted LI (lis) followed by ORI.
An additional optimization applies to values between -32768 and -1, which now need only a single LI. The new implementation generates the same number of instructions or fewer.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-23 | powerpc/powernv: set power_save func after the idle states are initialized | Shreyas B. Prabhu | 2 | -1/+4
pnv_init_idle_states() discovers supported idle states from the device tree and does the required initialization. Set the power_save function pointer only after this initialization is done. Otherwise, on machines which don't support nap (e.g. Power9), the kernel will crash when it tries to nap.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-23 | powerpc/bpf/jit: Disable classic BPF JIT on ppc64le | Naveen N. Rao | 1 | -1/+1
Classic BPF JIT was never ported completely to work on little endian powerpc. However, it can be enabled and will crash the system when used. As such, disable use of BPF JIT on ppc64le.
Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-23 | powerpc: Fix faults caused by radix patching of SLB miss handler | Michael Ellerman | 1 | -3/+4
As part of the Radix MMU support we added some feature sections in the SLB miss handler. These are intended to catch the case that we incorrectly take an SLB miss when Radix is enabled, and instead of crashing weirdly they bail out to a well-defined exit path and trigger an oops. However, the way they were written meant the bailout case was enabled by default until we did CPU feature patching.
On powermacs the early debug prints in setup_system() can cause an SLB miss, which happens before code patching, and so the SLB miss handler would incorrectly bail out and crash during boot.
Fix it by inverting the sense of the feature section, so that the code which is in place at boot is correct for the hash case. Only once we determine we are using Radix (which will never happen on a powermac) do we patch in the bailout case, which unconditionally jumps.
Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Tested-by: Denis Kirjanov <kda@linux-powerpc.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Print correct PHB type names | Gavin Shan | 1 | -1/+3
We're reporting "IODA1" and "IODA2" PHBs, though they are actually IODA2 and NPU PHBs, as the kernel log below indicates:
  Initializing IODA1 OPAL PHB /pciex@3fffe40700000
  Initializing IODA2 OPAL PHB /pciex@3fff000400000
This fixes the PHB names. After it's applied, we get:
  Initializing IODA2 PHB (/pciex@3fffe40700000)
  Initializing NPU PHB (/pciex@3fff000400000)
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Functions to get/set PCI slot state | Gavin Shan | 5 | -1/+115
This exports 4 functions, based on the corresponding OPAL APIs, to get/set PCI slot status. Those functions are going to be used by the PowerNV PCI hotplug driver:
  pnv_pci_get_device_tree()       opal_get_device_tree()
  pnv_pci_get_presence_state()    opal_pci_get_presence_state()
  pnv_pci_get_power_state()       opal_pci_get_power_state()
  pnv_pci_set_power_state()       opal_pci_set_power_state()
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Introduce pnv_pci_get_slot_id() | Gavin Shan | 2 | -0/+40
This introduces pnv_pci_get_slot_id() to get the hotpluggable PCI slot ID from the corresponding device node. It will be used by the hotplug driver.
Requested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Use PCI slot reset infrastructure | Gavin Shan | 1 | -1/+40
The (OPAL) firmware might provide a PCI slot reset capability, identified by the property "ibm,reset-by-firmware" on the device node associated with the PCI slot. This routes the reset request to firmware if "ibm,reset-by-firmware" exists in the PCI slot's device node. Otherwise, the reset is done inside the kernel as before.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Support PCI slot ID | Gavin Shan | 3 | -6/+10
The reset and poll functionality from (OPAL) firmware supports both PHBs and PCI slots, which are identified by ID. This adds PCI slot ID support by:
  * Renaming the argument of opal_pci_reset() and opal_pci_poll() accordingly.
  * Renaming pnv_eeh_phb_poll() to pnv_eeh_poll() and adjusting its argument name.
  * Adding a macro to produce a PCI slot ID.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/pci: Delay populating pdn | Gavin Shan | 9 | -59/+69
The pdn (struct pci_dn) instances are allocated from memblock or bootmem when creating PCI controllers (hoses) in setup_arch(). PCI hotplug, which will be supported by subsequent patches, releases PCI device nodes and their corresponding pdn on unplug events. The memory chunks for pdn instances allocated from memblock or bootmem are hard to reuse after being released.
This delays the creation of pdn (done by pci_devs_phb_init()) from setup_arch() to core_initcall() so that the instances are allocated from slab. The memory consumed by pdn can then be released to the system without problems at PCI unplug time. This means pci_dn is unavailable in setup_arch(), and fixups on pdn (like AGP's) can't be carried out at that time. We have to do them in pcibios_root_bridge_prepare() on maple/pasemi/powermac platforms, where/when the pdn is available. pcibios_root_bridge_prepare() is called from subsys_initcall(), which is executed after core_initcall(), so the code flow does not change.
Meanwhile, the EEH device is created when the pdn is populated, meaning the pdn and EEH device have the same life cycle. In turn, we no longer need to call eeh_dev_init() to create the EEH device explicitly.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/pci: Update bridge windows on PCI plug | Gavin Shan | 1 | -2/+6
On a PCI plug event, the PCI slot's subordinate devices are scanned and their (IO and MMIO) resources are assigned. Platform-dependent resources (PE#, IO/MMIO/DMA windows) are allocated or created when the windows of the slot's upstream bridge are updated.
This updates the windows of the hot plugged slot's upstream bridge in pcibios_finish_adding_to_bus() so that the platform resources (PE#, IO/MMIO/DMA segments) are allocated or created accordingly.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Dynamically release PE | Gavin Shan | 2 | -0/+175
This supports releasing PEs dynamically. A reference count is introduced to the PE, representing the number of PCI devices associated with the PE. The reference count is increased when a PCI device joins the PE and decreased when a PCI device leaves the PE in pnv_pci_release_device(). When the count becomes zero, the PE and its consumed resources are released. Note that the count is accessed concurrently, so a counter of "int" type is enough here.
In order to release the resources consumed by the PE, a couple of helper functions are introduced:
  * pnv_pci_ioda1_unset_window() - Unset IODA1 DMA32 window
  * pnv_pci_ioda1_release_dma_pe() - Release IODA1 DMA32 segments
  * pnv_pci_ioda2_release_dma_pe() - Release IODA2 DMA resource
  * pnv_ioda_release_pe_seg() - Unmap IO/M32/M64 segments
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Make pnv_ioda_deconfigure_pe() visible | Gavin Shan | 1 | -2/+4
pnv_ioda_deconfigure_pe() is visible only when CONFIG_PCI_IOV is enabled. The function will be used to tear down a PE's associated mappings in the PCI hotplug path, which doesn't depend on CONFIG_PCI_IOV. This makes pnv_ioda_deconfigure_pe() visible regardless of CONFIG_PCI_IOV.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Extend PCI bridge resources | Gavin Shan | 1 | -0/+61
The PCI slots are associated with the root port or the downstream ports of the PCIe switch connected to the root port. When an adapter is hot added to a PCI slot, it usually requests more IO or memory resources from the directly connected parent bridge (port) and updates the bridge's windows accordingly. The resource windows of upstream bridges can't be updated automatically. This can lead to unbalanced resources across the bridges: the window of a downstream bridge overruns that of its upstream bridge, and the IO or MMIO path won't work.
This resolves the above issue by extending the bridge windows of the root port, and of the upstream port of the PCIe switch connected to the root port, to the PHB's windows. The windows of the root port and the bridge behind it are extended to the PHB's windows to accommodate future PCI hotplug.
The PHB's 64KB 32-bit MSI region is included in the bridge's M32 windows (in hardware), though it's excluded from the corresponding resource, as the bridge's M32 windows have 1MB as their minimum alignment. We observed EEH errors during system boot when the MSI region was included in the bridge's M32 window. This excludes the top 1MB region (including the 64KB 32-bit MSI region) from the bridge's M32 windows when extending them.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Setup PE for root bus | Gavin Shan | 2 | -10/+41
There is no parent bridge for the root bus, meaning pcibios_setup_bridge() isn't invoked for the root bus. The PE for the root bus is the ancestor of other PEs in PELTV, so it must be populated before all others. This populates the PE for the root bus in the pcibios_setup_bridge() path if it's not populated yet. The PE number next to the reserved one is used as the PE# to avoid holes in the continuous M64 space.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 | powerpc/powernv: Create PEs in pcibios_setup_bridge() | Gavin Shan | 1 | -115/+69
Currently, the PEs and their associated resources are assigned in ppc_md.pcibios_fixup(), except those used by SRIOV VFs. The function is called once, after PCI probing and resource assignment have completed, so it's obviously not hotplug friendly. This creates PEs dynamically in pcibios_setup_bridge(), which is called during system boot and PCI hotplug to update a PCI bridge's windows after resource assignment/reassignment is done. In the partial hotplug case, where not all PCI devices belonging to a particular PE are unplugged and plugged again, we just need to unbind/bind the hot added PCI devices from/to the corresponding PE without creating a new one.
The change is applied to IODA1 and IODA2 PHBs only. The behaviour on NPU PHBs isn't changed. There are no PCI bridges on NPU PHBs, meaning pcibios_setup_bridge() won't be invoked there, so we have to use the old path (pnv_pci_ioda_fixup()) to set up PEs on NPU PHBs.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 powerpc/powernv: Allocate PE# in reverse order (Gavin Shan; 1 file, -8/+6)
The PE number for a particular PE can be allocated dynamically or reserved according to the M64 (64-bit prefetchable) segments the PE consumes. An M64 segment can't be remapped to an arbitrary PE, meaning the PE number is determined by the index of the consumed M64 segment. As the figure below shows, M64 resources grow from the low end to the high end, so the PE numbers reserved according to M64 segments grow from low to high as well, and so do the dynamically allocated PE numbers. This leads to a conflict: a PE number (M64 segment) taken by dynamic allocation may be required by a hot-added PCI adapter at a later point, and the PCI hotplug then fails because the PE number can't be reserved based on the index of the consumed M64 segment.

    +---+---+---+---+---+--------------------------------+-----+
    | 0 | 1 | 2 | 3 | 4 | .......                        | 255 |
    +---+---+---+---+---+--------------------------------+-----+

    PE number for dynamic allocation ----------------->
    PE number reserved for M64 segment ----------------->

To resolve the above conflict, this forces PE numbers to be allocated dynamically in reverse order. With this patch applied, PE numbers are reserved in ascending order but allocated dynamically in descending order. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
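A minimal sketch of the two allocation directions, assuming 256 PEs and a plain boolean map instead of the kernel's bitmap helpers. `struct pe_map`, `reserve_pe()` and `alloc_pe()` are hypothetical names; `IODA_INVALID_PE` mirrors the kernel macro, with its value assumed here.

```c
#include <stdbool.h>

#define TOTAL_PE 256
#define IODA_INVALID_PE ((unsigned int)(-1))	/* assumed definition */

/* Hypothetical PE allocation map: true means the PE number is in use. */
struct pe_map {
	bool used[TOTAL_PE];
};

/* Reserve the PE whose number is fixed by a consumed M64 segment index. */
static bool reserve_pe(struct pe_map *m, unsigned int pe)
{
	if (pe >= TOTAL_PE || m->used[pe])
		return false;	/* conflict: hotplug would fail here */
	m->used[pe] = true;
	return true;
}

/* Dynamic allocation scans from the top down, away from the M64 side. */
static unsigned int alloc_pe(struct pe_map *m)
{
	for (int pe = TOTAL_PE - 1; pe >= 0; pe--) {
		if (!m->used[pe]) {
			m->used[pe] = true;
			return (unsigned int)pe;
		}
	}
	return IODA_INVALID_PE;
}
```

Because the two directions only meet once the PE space is nearly exhausted, a low-numbered PE needed later by a hot-added adapter's M64 segment is unlikely to have been taken already.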
2016-06-21 powerpc/powernv: Increase PE# capacity (Gavin Shan; 2 files, -6/+7)
Each PHB maintains an array that translates a 2-byte Request ID (RID) to a PE#, on the assumption that a PE# fits in one byte, meaning we can't have more than 256 PEs. However, pci_dn->pe_number already uses 4 bytes for the PE#. This extends the PE# capacity for every PHB: the PE number is now represented by a 4-byte value, so we can reuse IODA_INVALID_PE to check whether the PE# in phb->pe_rmap[] is valid. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
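A sketch of the widened reverse map, assuming a RID is formed as (bus << 8) | devfn. `struct phb_rmap`, `make_rid()` and `rid_has_pe()` are hypothetical; `IODA_INVALID_PE` mirrors the kernel macro named above, with its definition as `(unsigned int)(-1)` assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define IODA_INVALID_PE ((unsigned int)(-1))	/* assumed definition */
#define NUM_RIDS 0x10000			/* 16-bit Request ID space */

/* Hypothetical PHB model: a 4-byte PE# per RID instead of one byte. */
struct phb_rmap {
	unsigned int pe_rmap[NUM_RIDS];
};

/* Mark every RID as not belonging to any PE yet. */
static void rmap_init(struct phb_rmap *p)
{
	for (unsigned int i = 0; i < NUM_RIDS; i++)
		p->pe_rmap[i] = IODA_INVALID_PE;
}

/* RID = bus number in the high byte, device/function in the low byte. */
static uint16_t make_rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* With 4-byte entries, IODA_INVALID_PE can never collide with a real PE#. */
static bool rid_has_pe(const struct phb_rmap *p, uint16_t rid)
{
	return p->pe_rmap[rid] != IODA_INVALID_PE;
}
```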
2016-06-21 powerpc/powernv: Move pnv_pci_ioda_setup_opal_tce_kill() around (Gavin Shan; 1 file, -2/+3)
pnv_pci_ioda_setup_opal_tce_kill() is called by pnv_ioda_setup_dma() to remap the TCE kill register. What's done in pnv_ioda_setup_dma() will be covered in pcibios_setup_bridge(), which is invoked on each PCI bridge, so we could end up remapping the TCE kill register multiple times, which is unnecessary. This moves pnv_pci_ioda_setup_opal_tce_kill() to where the PHB is initialized (pnv_pci_init_ioda_phb()) to avoid the above issue. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 powerpc/powernv: Remove PCI_RESET_DELAY_US (Gavin Shan; 1 file, -3/+0)
The PCI_RESET_DELAY_US macro defined in arch/powerpc/platforms/powernv/pci.c isn't used anywhere. Just remove it. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 powerpc/pci: Override pcibios_setup_bridge() (Gavin Shan; 2 files, -0/+10)
This overrides pcibios_setup_bridge(), which is called to update PCI bridge windows once PCI resource assignment is complete, so that subsequent patches can use it to assign PEs and set up the various (resource) mappings for them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 powerpc: export cpu_to_core_id() (Mauricio Faria de Oliveira; 1 file, -0/+1)
Export cpu_to_core_id(). This will be used by the lpfc driver. This enables topology_core_id() from <linux/topology.h> (defined as cpu_to_core_id() in arch/powerpc/include/asm/topology.h) to be used by (non-builtin) modules. That interface is arch-neutral and already used by, e.g., drivers/base/topology.c, but that file is builtin (obj-y in the Makefile) and thus didn't need the export. Since the module uses topology_core_id(), which is defined as cpu_to_core_id(), it needs the export; otherwise the build fails with: ERROR: "cpu_to_core_id" [drivers/scsi/lpfc/lpfc.ko] undefined! Tested on next-20160601. Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 powerpc: Load Monitor Register Support (Jack Miller; 4 files, -0/+34)
This enables new registers, LMRR and LMSER, that can trigger an EBB in userspace code when a monitored load (via the new ldmx instruction) loads memory from a monitored space. This facility is controlled by a new FSCR bit, LM. This patch clears the FSCR LM control bit on task init and sets it when the task takes a load monitor facility unavailable exception by using the facility. On context switch, this bit then determines whether the two relevant registers are saved and restored. This is done lazily for performance reasons. Signed-off-by: Jack Miller <jack@codezen.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
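A user-space model of the lazy scheme described above. The task struct, the `hw_lmrr`/`hw_lmser` variables standing in for the real SPRs, and the FSCR_LM bit position are all assumptions for illustration, not the kernel's definitions.

```c
#include <stdint.h>

#define FSCR_LM (1ULL << 11)	/* assumed position of the FSCR LM bit */

/* Hypothetical per-task state; not the kernel's struct thread_struct. */
struct task_model {
	uint64_t fscr;
	uint64_t lmrr, lmser;	/* saved Load Monitor registers */
};

/* Stand-ins for the live hardware registers. */
static uint64_t hw_lmrr, hw_lmser;

/* New tasks start with the facility disabled. */
static void task_init(struct task_model *t)
{
	t->fscr &= ~FSCR_LM;
}

/* Facility unavailable exception: enable LM for this task from now on. */
static void lm_unavailable(struct task_model *t)
{
	t->fscr |= FSCR_LM;
}

/* Context switch: only tasks that have used LM pay for save/restore. */
static void switch_lm(struct task_model *prev, const struct task_model *next)
{
	if (prev->fscr & FSCR_LM) {
		prev->lmrr = hw_lmrr;
		prev->lmser = hw_lmser;
	}
	if (next->fscr & FSCR_LM) {
		hw_lmrr = next->lmrr;
		hw_lmser = next->lmser;
	}
}
```

Tasks that never execute ldmx keep the LM bit clear, so switching between them never touches the two registers.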
2016-06-21 powerpc: Improve FSCR init and context switching (Michael Neuling; 3 files, -9/+7)
This fixes a few issues with FSCR init and switching. In commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") we moved the setting of the FSCR register from inside a CPU_FTR_ARCH_207S section to inside just a CPU_FTR_ARCH_DSCR section. Hence we are setting FSCR on POWER6/7, where the FSCR doesn't exist. This is harmless but we shouldn't do it. Also, we can simplify the FSCR context switch: we don't need to go through the calculation involving dscr_inherit and can just restore what we saved last time. We also set an initial value in INIT_THREAD, so that pid 1, which is cloned from it, gets a sane value. Based on a patch by Jack Miller. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
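A simplified model of the three changes: FSCR accesses gated on a CPU feature, plain save/restore with no dscr_inherit logic, and an initial value inherited from the init thread. `cpu_has_fscr`, `hw_fscr`, `struct thread_model` and the `INIT_FSCR` value are hypothetical placeholders, not kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder initial FSCR value; not the kernel's INIT_THREAD value. */
#define INIT_FSCR 0x3ULL

/* FSCR only exists on ISA 2.07 (POWER8) and later CPUs. */
static bool cpu_has_fscr;
static uint64_t hw_fscr;	/* stand-in for the SPR */

struct thread_model {
	uint64_t fscr;
};

/* Init task template: pid 1 is cloned from it and inherits a sane FSCR. */
static const struct thread_model init_thread = { .fscr = INIT_FSCR };

static void save_fscr(struct thread_model *t)
{
	if (cpu_has_fscr)	/* never touch FSCR on POWER6/7 */
		t->fscr = hw_fscr;
}

/* No dscr_inherit calculation: restore exactly what was saved. */
static void restore_fscr(const struct thread_model *t)
{
	if (cpu_has_fscr)
		hw_fscr = t->fscr;
}
```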
2016-06-21 powerpc: Fix misleading comment in early_setup_secondary() (Madhavan Srinivasan; 1 file, -1/+1)
The current comment in early_setup_secondary() for the paca->soft_enabled update is misleading: it should say to mark interrupts "disabled" instead of "enabled". Fix the typo. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 powerpc/kprobes: Remove kretprobe_trampoline_holder. (Thiago Jung Bauermann; 1 file, -6/+5)
Fixes the following testsuite failure:

  $ sudo ./perf test -v kallsyms
   1: vmlinux symtab matches kallsyms :
  --- start ---
  test child forked, pid 12489
  Using /proc/kcore for kernel object code
  Looking at the vmlinux_path (8 entries long)
  Using /boot/vmlinux for symbols
  0xc00000000003d300: diff name v: .kretprobe_trampoline_holder k: kretprobe_trampoline
  Maps only in vmlinux:
   c00000000086ca38-c000000000879b6c 87ca38 [kernel].text.unlikely
   c000000000879b6c-c000000000bf0000 889b6c [kernel].meminit.text
   c000000000bf0000-c000000000c53264 c00000 [kernel].init.text
   c000000000c53264-d000000004250000 c63264 [kernel].exit.text
   d000000004250000-d000000004450000 0 [libcrc32c]
   d000000004450000-d000000004620000 0 [xfs]
   d000000004620000-d000000004680000 0 [autofs4]
   d000000004680000-d0000000046e0000 0 [x_tables]
   d0000000046e0000-d000000004780000 0 [ip_tables]
   d000000004780000-d0000000047e0000 0 [rng_core]
   d0000000047e0000-ffffffffffffffff 0 [pseries_rng]
  Maps in vmlinux with a different name in kallsyms:
  Maps only in kallsyms:
   d000000000000000-f000000000000000 1000000000010000 [kernel.kallsyms]
   f000000000000000-ffffffffffffffff 3000000000010000 [kernel.kallsyms]
  test child finished with -1
  ---- end ----
  vmlinux symtab matches kallsyms: FAILED!

The problem is that the kretprobe_trampoline symbol looks like this:

  $ eu-readelf -s /boot/vmlinux G kretprobe_trampoline
    2431: c000000001302368     24 NOTYPE  LOCAL  DEFAULT       37 kretprobe_trampoline_holder
    2432: c00000000003d300      8 FUNC    LOCAL  DEFAULT        1 .kretprobe_trampoline_holder
   97543: c00000000003d300      0 NOTYPE  GLOBAL DEFAULT        1 kretprobe_trampoline

Its type is NOTYPE, and its size is 0, and this is a problem because symbol-elf.c:dso__load_sym skips function symbols that are not STT_FUNC or STT_GNU_IFUNC (this is determined by elf_sym__is_function).
Even if the type is changed to STT_FUNC, when dso__load_sym calls symbols__fixup_duplicate, the kretprobe_trampoline symbol is dropped in favour of .kretprobe_trampoline_holder because the latter has a non-zero size (as determined by choose_best_symbol). With this patch, all vmlinux symbols match /proc/kallsyms and the testcase passes. Commit c1c355ce14c0 ("x86/kprobes: Get rid of kretprobe_trampoline_holder()") gets rid of kretprobe_trampoline_holder altogether on x86. This commit does the same on powerpc. This change introduces no regressions in the perf and ftracetest testsuite results. Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
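A simplified sketch of the two perf filters described above: the function-type check and the non-zero-size preference the text attributes to choose_best_symbol(). `struct sym`, `is_function()` and `pick_duplicate()` are hypothetical stand-ins, not perf's own types; perf's remaining tie-breakers are omitted.

```c
#include <stdbool.h>

/* Hypothetical, simplified symbol record; not perf's struct symbol. */
enum sym_type { SYM_NOTYPE, SYM_FUNC, SYM_GNU_IFUNC };

struct sym {
	const char *name;
	unsigned long long addr;
	unsigned long long size;
	enum sym_type type;
};

/* Type check as described for elf_sym__is_function(). */
static bool is_function(const struct sym *s)
{
	return s->type == SYM_FUNC || s->type == SYM_GNU_IFUNC;
}

/*
 * For two symbols at the same address, prefer the one with a non-zero
 * size. On a tie this sketch simply keeps 'a'.
 */
static const struct sym *pick_duplicate(const struct sym *a, const struct sym *b)
{
	if (a->size == 0 && b->size > 0)
		return b;
	return a;
}
```

With the symbols from the readelf output above, kretprobe_trampoline fails the type check as NOTYPE, and even as a function it loses to the 8-byte .kretprobe_trampoline_holder, which is why removing the holder fixes the test.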