blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2018-10-08	macintosh/adb: Rework printk output again	Finn Thain	2	-35/+26
	Avoid the KERN_CONT problem by avoiding message fragments. The problem arises during async ADB bus probing, when ADB messages may get mixed up with other messages. See also, commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines"). Remove a number of printk() continuation lines by logging handler changes in adb_try_handler_change() instead. This patch addresses the problematic use of "\n" at the beginning of pr_cont() messages, which got overlooked in commit f2be6295684b ("macintosh/adb: Properly mark continued kernel messages"). That commit also changed printk(KERN_DEBUG ...) to pr_debug(...), which hinders work on low-level ADB driver bugs. Revert that change. Cc: Andreas Schwab <[email protected]> Tested-by: Stan Johnson <[email protected]> Signed-off-by: Finn Thain <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-08	macintosh: Use common code to access RTC	Finn Thain	6	-171/+106
	Now that the 68k Mac port has adopted the via-pmu driver, the same RTC code can be shared between m68k and powerpc. Replace duplicated code in arch/powerpc and arch/m68k with common RTC accessors for Cuda and PMU. Drop the problematic WARN_ON which was introduced in commit 22db552b50fa ("powerpc/powermac: Fix rtc read/write functions"). Tested-by: Stan Johnson <[email protected]> Signed-off-by: Finn Thain <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Arnd Bergmann <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-05	powerpc/numa: Skip onlining a offline node in kdump path	Srikar Dronamraju	1	-2/+3
	With commit 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot"), kdump kernel on shared LPAR may crash. The necessary conditions are - Shared LPAR with at least 2 nodes having memory and CPUs. - Memory requirement for kdump kernel must be met by the first N-1 nodes where there are at least N nodes with memory and CPUs. Example numactl of such a machine. $ numactl -H available: 5 nodes (0,2,5-7) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: node 2 size: 255 MB node 2 free: 189 MB node 5 cpus: 24 25 26 27 28 29 30 31 node 5 size: 4095 MB node 5 free: 4024 MB node 6 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23 node 6 size: 6353 MB node 6 free: 5998 MB node 7 cpus: 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39 node 7 size: 7640 MB node 7 free: 7164 MB node distances: node 0 2 5 6 7 0: 10 40 40 40 40 2: 40 10 40 40 40 5: 40 40 10 40 40 6: 40 40 40 10 20 7: 40 40 40 20 10 Steps to reproduce. 1. Load / start kdump service. 2. Trigger a kdump (for example : echo c > /proc/sysrq-trigger) When booting a kdump kernel with 2048M: kexec: Starting switchover sequence. I'm in purgatory Using 1TB segments hash-mmu: Initializing hash mmu with SLB Linux version 4.19.0-rc5-master+ (srikar@linux-xxu6) (gcc version 4.8.5 (SUSE Linux)) #1 SMP Thu Sep 27 19:45:00 IST 2018 Found initrd at 0xc000000009e70000:0xc00000000ae554b4 Using pSeries machine description ----------------------------------------------------- ppc64_pft_size = 0x1e phys_mem_size = 0x88000000 dcache_bsize = 0x80 icache_bsize = 0x80 cpu_features = 0x000000ff8f5d91a7 possible = 0x0000fbffcf5fb1a7 always = 0x0000006f8b5c91a1 cpu_user_features = 0xdc0065c2 0xef000000 mmu_features = 0x7c006001 firmware_features = 0x00000007c45bfc57 htab_hash_mask = 0x7fffff physical_start = 0x8000000 ----------------------------------------------------- numa: NODE_DATA [mem 0x87d5e300-0x87d67fff] numa: NODE_DATA(0) on node 6 numa: NODE_DATA [mem 0x87d54600-0x87d5e2ff] Top of RAM: 0x88000000, Total RAM: 0x88000000 Memory hole size: 0MB Zone ranges: DMA [mem 0x0000000000000000-0x0000000087ffffff] DMA32 empty Normal empty Movable zone start for each node Early memory node ranges node 6: [mem 0x0000000000000000-0x0000000087ffffff] Could not find start_pfn for node 0 Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000] On node 0 totalpages: 0 Initmem setup node 6 [mem 0x0000000000000000-0x0000000087ffffff] On node 6 totalpages: 34816 Unable to handle kernel paging request for data at address 0x00000060 Faulting instruction address: 0xc000000008703a54 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 11 PID: 1 Comm: swapper/11 Not tainted 4.19.0-rc5-master+ #1 NIP: c000000008703a54 LR: c000000008703a38 CTR: 0000000000000000 REGS: c00000000b673440 TRAP: 0380 Not tainted (4.19.0-rc5-master+) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24022022 XER: 20000002 CFAR: c0000000086fc238 IRQMASK: 0 GPR00: c000000008703a38 c00000000b6736c0 c000000009281900 0000000000000000 GPR04: 0000000000000000 0000000000000000 fffffffffffff001 c00000000b660080 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000220 GPR12: 0000000000002200 c000000009e51400 0000000000000000 0000000000000008 GPR16: 0000000000000000 c000000008c152e8 c000000008c152a8 0000000000000000 GPR20: c000000009422fd8 c000000009412fd8 c000000009426040 0000000000000008 GPR24: 0000000000000000 0000000000000000 c000000009168bc8 c000000009168c78 GPR28: c00000000b126410 0000000000000000 c00000000916a0b8 c00000000b126400 NIP [c000000008703a54] bus_add_device+0x84/0x1e0 LR [c000000008703a38] bus_add_device+0x68/0x1e0 Call Trace: [c00000000b6736c0] [c000000008703a38] bus_add_device+0x68/0x1e0 (unreliable) [c00000000b673740] [c000000008700194] device_add+0x454/0x7c0 [c00000000b673800] [c00000000872e660] __register_one_node+0xb0/0x240 [c00000000b673860] [c00000000839a6bc] __try_online_node+0x12c/0x180 [c00000000b673900] [c00000000839b978] try_online_node+0x58/0x90 [c00000000b673930] [c0000000080846d8] find_and_online_cpu_nid+0x158/0x190 [c00000000b673a10] [c0000000080848a0] numa_update_cpu_topology+0x190/0x580 [c00000000b673c00] [c000000008d3f2e4] smp_cpus_done+0x94/0x108 [c00000000b673c70] [c000000008d5c00c] smp_init+0x174/0x19c [c00000000b673d00] [c000000008d346b8] kernel_init_freeable+0x1e0/0x450 [c00000000b673dc0] [c0000000080102e8] kernel_init+0x28/0x160 [c00000000b673e30] [c00000000800b65c] ret_from_kernel_thread+0x5c/0x80 Instruction dump: 60000000 60000000 e89e0020 7fe3fb78 4bff87d5 60000000 7c7d1b79 4082008c e8bf0050 e93e0098 3b9f0010 2fa50000 <e8690060> 38630018 419e0114 7f84e378 ---[ end trace 593577668c2daa65 ]--- However a regular kernel with 4096M (2048 gets reserved for crash kernel) boots properly. Unlike regular kernels, which mark all available nodes as online, kdump kernel only marks just enough nodes as online and marks the rest as offline at boot. However kdump kernel boots with all available CPUs. With Commit 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot"), all CPUs are onlined on their respective nodes at boot time. try_online_node() tries to online the offline nodes but fails as all needed subsystems are not yet initialized. As part of fix, detect and skip early onlining of a offline node. Fixes: 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot") Reported-by: Pavithra Prakash <[email protected]> Signed-off-by: Srikar Dronamraju <[email protected]> Tested-by: Hari Bathini <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-05	powerpc: Don't print kernel instructions in show_user_instructions()	Michael Ellerman	1	-0/+10
	Recently we implemented show_user_instructions() which dumps the code around the NIP when a user space process dies with an unhandled signal. This was modelled on the x86 code, and we even went so far as to implement the exact same bug, namely that if the user process crashed with its NIP pointing into the kernel we will dump kernel text to dmesg. eg: bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1 bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6 bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048 This was discovered on x86 by Jann Horn and fixed in commit 342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode RIP"). Fix it by checking the adjusted NIP value (pc) and number of instructions against USER_DS, and bail if we fail the check, eg: bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1 bad-bctr[2969]: Bad NIP, not dumping instructions. Fixes: 88b0fe175735 ("powerpc: Add show_user_instructions()") Signed-off-by: Michael Ellerman <[email protected]>
2018-10-04	powerpc/64s/radix: Explicitly flush ERAT with local LPID invalidation	Nicholas Piggin	1	-0/+1
	Local radix TLB flush operations that operate on congruence classes have explicit ERAT flushes for POWER9. The process scoped LPID flush did not have a flush, so add it. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-04	powerpc/64s/hash: Do not use PPC_INVALIDATE_ERAT on CPUs before POWER9	Nicholas Piggin	2	-2/+9
	PPC_INVALIDATE_ERAT is slbia IH=7 which is a new variant introduced with POWER9, and the result is undefined on earlier CPUs. Commits 7b9f71f974 ("powerpc/64s: POWER9 machine check handler") and d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") caused POWER7/8 code to use this instruction. Remove it. An ERAT flush can be made by invalidatig the SLB, but before POWER9 that requires a flush and rebolt. Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler") Fixes: d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Cc: [email protected] # v4.11+ Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-04	powerpc/time: Add set_state_oneshot_stopped decrementer callback	Anton Blanchard	1	-0/+1
	If CONFIG_PPC_WATCHDOG is enabled we always cap the decrementer to 0x7fffffff: if (IS_ENABLED(CONFIG_PPC_WATCHDOG)) set_dec(0x7fffffff); else set_dec(decrementer_max); If there are no future events, we don't reprogram the decrementer after this and we end up with 0x7fffffff even on a large decrementer capable system. As suggested by Nick, add a set_state_oneshot_stopped callback so we program the decrementer with decrementer_max if there are no future events. Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-04	powerpc/time: Use clockevents_register_device(), fixing an issue with large ↵	Anton Blanchard	1	-14/+3
	decrementer We currently cap the decrementer clockevent at 4 seconds, even on systems with large decrementer support. Fix this by converting the code to use clockevents_register_device() which calculates the upper bound based on the max_delta passed in. Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-04	powerpc/powernv/npu: Remove atsd_threshold debugfs setting	Mark Hairgrove	1	-14/+0
	This threshold is no longer used now that all invalidates issue a single ATSD to each active NPU. Signed-off-by: Mark Hairgrove <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-04	powerpc/powernv/npu: Use size-based ATSD invalidates	Mark Hairgrove	1	-48/+55
	Prior to this change only two types of ATSDs were issued to the NPU: invalidates targeting a single page and invalidates targeting the whole address space. The crossover point happened at the configurable atsd_threshold which defaulted to 2M. Invalidates that size or smaller would issue per-page invalidates for the whole range. The NPU supports more invalidation sizes however: 64K, 2M, 1G, and all. These invalidates target addresses aligned to their size. 2M is a common invalidation size for GPU-enabled applications because that is a GPU page size, so reducing the number of invalidates by 32x in that case is a clear improvement. ATSD latency is high in general so now we always issue a single invalidate rather than multiple. This will over-invalidate in some cases, but for any invalidation size over 2M it matches or improves the prior behavior. There's also an improvement for single-page invalidates since the prior version issued two invalidates for that case instead of one. With this change all issued ATSDs now perform a flush, so the flush parameter has been removed from all the helpers. To show the benefit here are some performance numbers from a microbenchmark which creates a 1G allocation then uses mprotect with PROT_NONE to trigger invalidates in strides across the allocation. One NPU (1 GPU): mprotect rate (GB/s) Stride Before After Speedup 64K 5.3 5.6 5% 1M 39.3 57.4 46% 2M 49.7 82.6 66% 4M 286.6 285.7 0% Two NPUs (6 GPUs): mprotect rate (GB/s) Stride Before After Speedup 64K 6.5 7.4 13% 1M 33.4 67.9 103% 2M 38.7 93.1 141% 4M 356.7 354.6 -1% Anything over 2M is roughly the same as before since both cases issue a single ATSD. Signed-off-by: Mark Hairgrove <[email protected]> Reviewed-By: Alistair Popple <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-04	powerpc/powernv/npu: Reduce eieio usage when issuing ATSD invalidates	Mark Hairgrove	1	-51/+48
	There are two types of ATSDs issued to the NPU: invalidates targeting a specific virtual address and invalidates targeting the whole address space. In both cases prior to this change, the sequence was: for each NPU - Write the target address to the XTS_ATSD_AVA register - EIEIO - Write the launch value to issue the ATSD First, a target address is not required when invalidating the whole address space, so that write and the EIEIO have been removed. The AP (size) field in the launch is not needed either. Second, for per-address invalidates the above sequence is inefficient in the common case of multiple NPUs because an EIEIO is issued per NPU. This unnecessarily forces the launches of later ATSDs to be ordered with the launches of earlier ones. The new sequence only issues a single EIEIO: for each NPU - Write the target address to the XTS_ATSD_AVA register EIEIO for each NPU - Write the launch value to issue the ATSD Performance results were gathered using a microbenchmark which creates a 1G allocation then uses mprotect with PROT_NONE to trigger invalidates in strides across the allocation. With only a single NPU active (one GPU) the difference is in the noise for both types of invalidates (+/-1%). With two NPUs active (on a 6-GPU system) the effect is more noticeable: mprotect rate (GB/s) Stride Before After Speedup 64K 5.9 6.5 10% 1M 31.2 33.4 7% 2M 36.3 38.7 7% 4M 322.6 356.7 11% Signed-off-by: Mark Hairgrove <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-04	powerpc: remove leftover code of old GCC version checks	Masahiro Yamada	1	-8/+0
	Clean up the leftover of commit f2910f0e6835 ("powerpc: remove old GCC version checks"). Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-04	powerpc/nohash: fix undefined behaviour when testing page size support	Daniel Axtens	1	-0/+3
	When enumerating page size definitions to check hardware support, we construct a constant which is (1U << (def->shift - 10)). However, the array of page size definitions is only initalised for various MMU_PAGE_* constants, so it contains a number of 0-initialised elements with def->shift == 0. This means we end up shifting by a very large number, which gives the following UBSan splat: ================================================================================ UBSAN: Undefined behaviour in /home/dja/dev/linux/linux/arch/powerpc/mm/tlb_nohash.c:506:21 shift exponent 4294967286 is too large for 32-bit type 'unsigned int' CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-00045-ga604f927b012-dirty #6 Call Trace: [c00000000101bc20] [c000000000a13d54] .dump_stack+0xa8/0xec (unreliable) [c00000000101bcb0] [c0000000004f20a8] .ubsan_epilogue+0x18/0x64 [c00000000101bd30] [c0000000004f2b10] .__ubsan_handle_shift_out_of_bounds+0x110/0x1a4 [c00000000101be20] [c000000000d21760] .early_init_mmu+0x1b4/0x5a0 [c00000000101bf10] [c000000000d1ba28] .early_setup+0x100/0x130 [c00000000101bf90] [c000000000000528] start_here_multiplatform+0x68/0x80 ================================================================================ Fix this by first checking if the element exists (shift != 0) before constructing the constant. Signed-off-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc: Wire up memtest	Christophe Leroy	2	-1/+4
	Add call to early_memtest() so that kernel compiled with CONFIG_MEMTEST really perform memtest at startup when requested via 'memtest' boot parameter. Tested-by: Daniel Axtens <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/mm: Don't report hugepage tables as memory leaks when using kmemleak	Christophe Leroy	1	-0/+3
	When a process allocates a hugepage, the following leak is reported by kmemleak. This is a false positive which is due to the pointer to the table being stored in the PGD as physical memory address and not virtual memory pointer. unreferenced object 0xc30f8200 (size 512): comm "mmap", pid 374, jiffies 4872494 (age 627.630s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<e32b68da>] huge_pte_alloc+0xdc/0x1f8 [<9e0df1e1>] hugetlb_fault+0x560/0x8f8 [<7938ec6c>] follow_hugetlb_page+0x14c/0x44c [<afbdb405>] __get_user_pages+0x1c4/0x3dc [<b8fd7cd9>] __mm_populate+0xac/0x140 [<3215421e>] vm_mmap_pgoff+0xb4/0xb8 [<c148db69>] ksys_mmap_pgoff+0xcc/0x1fc [<4fcd760f>] ret_from_syscall+0x0/0x38 See commit a984506c542e2 ("powerpc/mm: Don't report PUDs as memory leaks when using kmemleak") for detailed explanation. To fix that, this patch tells kmemleak to ignore the allocated hugepage table. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/tm: Reformat comments	Michael Neuling	1	-28/+39
	The comments in this file don't conform to the coding style so take them to "Comment Formatting Re-Education Camp". Suggested-by: Michael "Camp Drill Sergeant" Ellerman <[email protected]> Signed-off-by: Michael Neuling <[email protected]> [mpe: Reflow some comments and add full stops, fix spelling of Sergeant.] Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/config: Enable CONFIG_PRINTK_TIME	Petr Vorel	6	-0/+6
	for 64bit configs which use for CONFIG_LOG_BUF_SHIFT the same or higher value than the default (currently 17). Signed-off-by: Petr Vorel <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc: Remove duplicated include from pci_32.c	YueHaibing	1	-1/+0
	Remove duplicated include. Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Stephen Rothwell <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/64s: consolidate MCE counter increment.	Michal Suchanek	2	-5/+1
	The code in machine_check_exception excludes 64s hvmode when incrementing the MCE counter only to call opal_machine_check to increment it specifically for this case. Remove the exclusion and special case. Fixes: a43c1590426c ("powerpc/pseries: Flush SLB contents on SLB MCE errors.") Signed-off-by: Michal Suchanek <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/tm: Print 64-bits MSR	Breno Leitao	1	-1/+1
	On a kernel TM Bad thing program exception, the Machine State Register (MSR) is not being properly displayed. The exception code dumps a 32-bits value but MSR is a 64 bits register for all platforms that have HTM enabled. This patch dumps the MSR value as a 64-bits value instead of 32 bits. In order to do so, the 'reason' variable could not be used, since it trimmed MSR to 32-bits (int). Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/tm: Remove msr_tm_active()	Breno Leitao	2	-13/+15
	Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set. This function is not necessary, since MSR_TM_ACTIVE() just do the same and could be used, removing the dualism and simplifying the code. This patchset remove every instance of msr_tm_active() and replaced it by MSR_TM_ACTIVE(). Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/powernv: Mark function as __noreturn	Breno Leitao	1	-1/+1
	There is a mismatch between function pnv_platform_error_reboot() definition and declaration regarding function modifiers. In the declaration part, it contains the function attribute __noreturn, while function definition itself lacks it. This was reported by sparse tool as an error: arch/powerpc/platforms/powernv/opal.c:538:6: error: symbol 'pnv_platform_error_reboot' redeclared with different type (originally declared at arch/powerpc/platforms/powernv/powernv.h:11) - different modifiers I checked and the function is already being considered as being 'noreturn' by the compiler, thus, I understand this patch does not change any code being generated. Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	selftests/powerpc: New PTRACE_SYSEMU test	Breno Leitao	2	-1/+229
	This patch adds a new test for the new PTRACE_SYSEMU ptrace request. This test also relies on PTRACE_GETREGS and PTRACE_SETREGS requests to run properly, since the trace instruction (gettid() syscall) is being modified at run-time (by PTRACE_SETREGS) and re-executed three times. PTRACE_GETREGS is being used to check that the registers are still sane. This test basically creates a child process that executes syscalls and the parent process check if it is being traced appropriately. The parent process guarantees that the SYSCALLs are being traced, with PTRACE_SYSEMU, and ptrace stops the child application before a syscall is executed. The way the tests validates it, is by guaranteeing that the system calls arguments, as argv[0] (r3) which is the same register that will have the syscall return value on powerpc, are not being corrupted on PTRACE_SYSEMU with a return value, i.e, it continues to have the current arguments instead, meaning that the registers where not clobbered. This test is basically the same test for x86 located at tools/testing/selftests/x86/ptrace_syscall.c, limited to test PTRACE_SYSEMU request, and ported to PowerPC. Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/ptrace: Add support for PTRACE_SYSEMU	Breno Leitao	3	-1/+18
	This is a patch that adds support for PTRACE_SYSEMU ptrace request in PowerPC architecture. When ptrace(PTRACE_SYSEMU, ...) request is called, it will be handled by the arch independent function ptrace_resume(), which will tag the task with the TIF_SYSCALL_EMU flag. This flag needs to be handled from a platform dependent point of view, which is what this patch does. This patch adds this task's flag as part of the _TIF_SYSCALL_DOTRACE, which is the MACRO that is used to trace syscalls at entrance/exit. Since TIF_SYSCALL_EMU is now part of _TIF_SYSCALL_DOTRACE, if the task has _TIF_SYSCALL_DOTRACE set, it will hit do_syscall_trace_enter() at syscall entrance and do_syscall_trace_leave() at syscall leave. do_syscall_trace_enter() needs to handle the TIF_SYSCALL_EMU flag properly, which will interrupt the syscall executing if TIF_SYSCALL_EMU is set. The output values should not be changed, i.e. the return value (r3) should contain the original syscall argument on exit. With this flag set, the syscall is not executed fundamentally, because do_syscall_trace_enter() is returning -1 which is bigger than NR_syscall, thus, skipping the syscall execution and exiting userspace. Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc: Redefine TIF_32BITS thread flag	Breno Leitao	2	-2/+2
	Moving TIF_32BIT to use bit 20 instead of 4 in the task flag field. This change is making room for an upcoming new task macro (_TIF_SYSCALL_EMU) which is preferred to set a bit in the lower 16-bits part of the word. This upcoming flag macro will take part in a composed macro (_TIF_SYSCALL_DOTRACE) which will contain other flags as well, and it is preferred that the whole _TIF_SYSCALL_DOTRACE macro only sets the lower 16 bits of a word, so, it could be handled using immediate operations (as load immediate, add immediate, ...) where the immediate operand (SI) is limited to 16-bits. Another possible solution would be using the LOAD_REG_IMMEDIATE() macro to load a full 64-bits word immediate, but it takes 5 operations instead of one. Having TIF_32BITS being redefined to use an upper bit is not a problem since there is only one place in the assembly code where TIF_32BIT is being used, and it could be replaced with an operation with right shift (addis), since it is used alone, i.e. not being part of a composed macro, which has different bits set, and would require LOAD_REG_IMMEDIATE(). Tested on a 64 bits Big Endian machine running a 32 bits task. Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/64: add stack protector support	Christophe Leroy	6	-1/+23
	On PPC64, as register r13 points to the paca_struct at all time, this patch adds a copy of the canary there, which is copied at task_switch. That new canary is then used by using the following GCC options: -mstack-protector-guard=tls -mstack-protector-guard-reg=r13 -mstack-protector-guard-offset=offsetof(struct paca_struct, canary)) Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/32: add stack protector support	Christophe Leroy	6	-0/+51
	This functionality was tentatively added in the past (commit 6533b7c16ee5 ("powerpc: Initial stack protector (-fstack-protector) support")) but had to be reverted (commit f2574030b0e3 ("powerpc: Revert the initial stack protector support") because of GCC implementing it differently whether it had been built with libc support or not. Now, GCC offers the possibility to manually set the stack-protector mode (global or tls) regardless of libc support. This time, the patch selects HAVE_STACKPROTECTOR only if -mstack-protector-guard=tls is supported by GCC. On PPC32, as register r2 points to current task_struct at all time, the stack_canary located inside task_struct can be used directly by using the following GCC options: -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 -mstack-protector-guard-offset=offsetof(struct task_struct, stack_canary)) The protector is disabled for prom_init and bootx_init as it is too early to handle it properly. $ echo CORRUPT_STACK > /sys/kernel/debug/provoke-crash/DIRECT [ 134.943666] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: lkdtm_CORRUPT_STACK+0x64/0x64 [ 134.943666] [ 134.955414] CPU: 0 PID: 283 Comm: sh Not tainted 4.18.0-s3k-dev-12143-ga3272be41209 #835 [ 134.963380] Call Trace: [ 134.965860] [c6615d60] [c001f76c] panic+0x118/0x260 (unreliable) [ 134.971775] [c6615dc0] [c001f654] panic+0x0/0x260 [ 134.976435] [c6615dd0] [c032c368] lkdtm_CORRUPT_STACK_STRONG+0x0/0x64 [ 134.982769] [c6615e00] [ffffffff] 0xffffffff Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/xive: Move a dereference below a NULL test	zhong jiang	1	-3/+4
	Move the dereference of xc below the NULL test. Signed-off-by: zhong jiang <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/pseries: Fix how we iterate over the DTL entries	Naveen N. Rao	1	-1/+1
	When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we look up dtl_idx in the lppaca to determine the number of entries in the buffer. Since lppaca is in big endian, we need to do an endian conversion before using this in our calculation to determine the number of entries in the buffer. Without this, we do not iterate over the existing entries in the DTL buffer properly. Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.") Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/pseries: Fix DTL buffer registration	Naveen N. Rao	1	-1/+1
	When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we register the DTL buffer for a cpu when the associated file under powerpc/dtl in debugfs is opened. When doing so, we need to set the size of the buffer being registered in the second u32 word of the buffer. This needs to be in big endian, but we are not doing the conversion resulting in the below error showing up in dmesg: dtl_start: DTL registration for cpu 0 (hw 0) failed with -4 Fix this in the obvious manner. Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.") Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/traps: merge unrecoverable_exception() and nonrecoverable_exception()	Christophe Leroy	3	-13/+4
	PPC32 uses nonrecoverable_exception() while PPC64 uses unrecoverable_exception(). Both functions are doing almost the same thing. This patch removes nonrecoverable_exception() Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc: Convert to using %pOFn instead of device_node.name	Rob Herring	9	-40/+40
	In preparation to remove the node name pointer from struct device_node, convert printf users to use the %pOFn format specifier. Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: [email protected] Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	macintosh: Convert to using %pOFn instead of device_node.name	Rob Herring	3	-6/+14
	In preparation to remove the node name pointer from struct device_node, convert printf users to use the %pOFn format specifier. Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/pseries: Use of_irq_get helper() in request_event_sources_irqs()	Rob Herring	1	-27/+13
	Instead of calling both of_irq_parse_one() and irq_create_of_mapping(), call of_irq_get() instead which does essentially the same thing. of_irq_get() also calls irq_find_host() for deferred probe support, but this should be fine as irq_create_of_mapping() also calls that internally. This gets us closer to making the former 2 functions static. In the process of simplifying request_event_sources_irqs(), combine the the pr_err() and WARN_ON() calls to just a WARN(). Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/cell: Use irq_of_parse_and_map() helper	Rob Herring	1	-17/+4
	Instead of calling both of_irq_parse_one() and irq_create_of_mapping(), call of_irq_parse_and_map() instead which does the same thing. This gets us closer to making the former 2 functions static. Signed-off-by: Rob Herring <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/mm:book3s: Enable THP migration support	Aneesh Kumar K.V	2	-0/+9
	Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/mm/thp: update pmd_trans_huge to check for pmd_present	Aneesh Kumar K.V	2	-0/+21
	We need to make sure pmd_trans_huge returns false for a pmd migration entry. We mark the migration entry by clearing the _PAGE_PRESENT bit. We keep the _PAGE_PTE bit set to indicate a leaf page table entry. Hence we need to make sure we check for pmd_present() so that pmd_trans_huge won't return true on pmd migration entry. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	arch/powerpc/mm/hash: validate the pte entries before handling the hash fault	Aneesh Kumar K.V	2	-0/+10
	Make sure we are operating on THP and hugetlb entries in the respective hash fault handling routines. No functional change in this patch. If we walked the table wrongly before, we will retry the access. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/mm/book3s: Check for pmd_large instead of pmd_trans_huge	Aneesh Kumar K.V	3	-4/+8
	Update few code paths to check for pmd_large. set_pmd_at: We want to use this to store swap pte at pmd level. For swap ptes we don't want to set H_PAGE_THP_HUGE. Hence check for pmd_large in set_pmd_at. This remove the false WARN_ON when using this with swap pmd entry. pmd_page: We don't really use them on pmd migration entries. But they can also work with migration entries and we don't differentiate at the pte level. Hence update pmd_page to work with pmd migration entries too __find_linux_pte: lockless page table walk need to handle pmd migration entries. pmd_trans_huge check will return false on them. We don't set thp = 1 for such entries, but update hpage_shift correctly. Without this we will walk pmd migration entries as a pte page pointer which is wrong. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/mm/hugetlb/book3s: add _PAGE_PRESENT to hugepd pointer.	Aneesh Kumar K.V	3	-2/+5
	This make hugetlb directory pointer similar to other page able entries. A hugepd entry is identified by lack of _PAGE_PTE bit set and directory size stored in HUGEPD_SHIFT_MASK. We update that to also look at _PAGE_PRESENT Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit	Aneesh Kumar K.V	5	-12/+28
	With this patch we use 0x8000000000000000UL (_PAGE_PRESENT) to indicate a valid pgd/pud/pmd entry. We also switch the p**_present() to look at this bit. With pmd_present, we have a special case. We need to make sure we consider a pmd marked invalid during THP split as present. Right now we clear the _PAGE_PRESENT bit during a pmdp_invalidate. Inorder to consider this special case we add a new pte bit _PAGE_INVALID (mapped to _RPAGE_SW0). This bit is only used with _PAGE_PRESENT cleared. Hence we are not really losing a pte bit for this special case. pmd_present is also updated to look at _PAGE_INVALID. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/powernv: Make possible for user to force a full ipl cec reboot	Vaibhav Jain	2	-6/+31
	Ever since fast reboot is enabled by default in opal, opal_cec_reboot() will use fast-reset instead of full IPL to perform system reboot. This leaves the user with no direct way to force a full IPL reboot except changing an nvram setting that persistently disables fast-reset for all subsequent reboots. This patch provides a more direct way for the user to force a one-shot full IPL reboot by passing the command line argument 'full' to the reboot command. So the user will be able to tweak the reboot behavior via: $ sudo reboot full # Force a full ipl reboot skipping fast-reset or $ sudo reboot # default reboot path (usually fast-reset) The reboot command passes the un-parsed command argument to the kernel via the 'Reboot' syscall which is then passed on to the arch function pnv_restart(). The patch updates pnv_restart() to handle this cmd-arg and issues opal_cec_reboot2 with OPAL_REBOOT_FULL_IPL to force a full IPL reset. Signed-off-by: Vaibhav Jain <[email protected]> Acked-by: Andrew Donnellan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-03	powerpc/perf: Add missing break in power7_marked_instr_event()	Michael Ellerman	1	-0/+1
	In power7_marked_instr_event() there is a switch case that is missing a break or an explicit fallthrough, it's not immediately clear which it should be. The function determines based on the PMU event code, whether the event is a "marked" event (which then requires us to configure the PMU in a certain way). On Power7 there is no specific bit(s) in the event to tell us that, we just have to know. Rather than having a full list of every event and whether they are marked, we pull apart the event code and for events with certain values of certain fields we can say that those are all marked events. We take the psel (bits 0-7) of the event, and look at bits 4-7. For a value of 6 we say that if the entire psel == 0x64 then if the pmc == 3 the event is marked, else not, and otherwise we continue. It is then that we fallthrough to the 8 case, where we return true if the unit == 0xd. The question is should the 6 case also fallthrough and check for unit == 0xd, or should it return. Looking at the full list of events we see that there are zero events where (psel >> 4) == 0x6 and unit == 0xd. So the answer is it doesn't really matter, there are no valid event codes that will return a different result whether we fallthrough or break. But equally, testing the 6 case events against unit == 0xd is slightly bogus, as there are no such events. So to make the code clearer, and avoid any future confusion, have the 6 case break rather than falling through. Signed-off-by: Michael Ellerman <[email protected]> Reviewed-by: Madhavan Srinivasan <[email protected]>
2018-10-03	Revert "convert SLB miss handlers to C" and subsequent commits	Michael Ellerman	19	-457/+766
	This reverts commits: 5e46e29e6a97 ("powerpc/64s/hash: convert SLB miss handlers to C") 8fed04d0f6ae ("powerpc/64s/hash: remove user SLB data from the paca") 655deecf67b2 ("powerpc/64s/hash: SLB allocation status bitmaps") 2e1626744e8d ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup") 89ca4e126a3f ("powerpc/64s/hash: Add a SLB preload cache") This series had a few bugs, and the fixes are not all trivial. So revert most of it for now. Signed-off-by: Michael Ellerman <[email protected]>
2018-10-02	powerpc/lib: fix book3s/32 boot failure due to code patching	Christophe Leroy	1	-8/+12
	Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") accesses 'init_mem_is_free' flag too early, before the kernel is relocated. This provokes early boot failure (before the console is active). As it is not necessary to do this verification that early, this patch moves the test into patch_instruction() instead of __patch_instruction(). This modification also has the advantage of avoiding unnecessary remappings. Fixes: 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") Cc: [email protected] # 4.13+ Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-10-02	lib/xz: Put CRC32_POLY_LE in xz_private.h	Joel Stanley	2	-1/+4
	This fixes a regression introduced by faa16bc404d72a5 ("lib: Use existing define with polynomial"). The cleanup added a dependency on include/linux, which broke the PowerPC boot wrapper/decompresser when KERNEL_XZ is enabled: BOOTCC arch/powerpc/boot/decompress.o In file included from arch/powerpc/boot/../../../lib/decompress_unxz.c:233, from arch/powerpc/boot/decompress.c:42: arch/powerpc/boot/../../../lib/xz/xz_crc32.c:18:10: fatal error: linux/crc32poly.h: No such file or directory #include <linux/crc32poly.h> ^~~~~~~~~~~~~~~~~~~ The powerpc decompresser is a hairy corner of the kernel. Even while building a 64-bit kernel it needs to build a 32-bit binary and therefore avoid including files from include/linux. This allows users of the xz library to avoid including headers from 'include/linux/' while still achieving the cleanup of the magic number. Fixes: faa16bc404d72a5 ("lib: Use existing define with polynomial") Reported-by: Meelis Roos <[email protected]> Reported-by: kbuild test robot <[email protected]> Suggested-by: Christophe LEROY <[email protected]> Signed-off-by: Joel Stanley <[email protected]> Tested-by: Meelis Roos <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-09-28	selftests/powerpc: Fix Makefiles for headers_install change	Michael Ellerman	17	-0/+17
	Commit b2d35fa5fc80 ("selftests: add headers_install to lib.mk") introduced a requirement that Makefiles more than one level below the selftests directory need to define top_srcdir, but it didn't update any of the powerpc Makefiles. This broke building all the powerpc selftests with eg: make[1]: Entering directory '/src/linux/tools/testing/selftests/powerpc' BUILD_TARGET=/src/linux/tools/testing/selftests/powerpc/alignment; mkdir -p $BUILD_TARGET; make OUTPUT=$BUILD_TARGET -k -C alignment all make[2]: Entering directory '/src/linux/tools/testing/selftests/powerpc/alignment' ../../lib.mk:20: ../../../../scripts/subarch.include: No such file or directory make[2]: *** No rule to make target '../../../../scripts/subarch.include'. make[2]: Failed to remake makefile '../../../../scripts/subarch.include'. Makefile:38: recipe for target 'alignment' failed Fix it by setting top_srcdir in the affected Makefiles. Fixes: b2d35fa5fc80 ("selftests: add headers_install to lib.mk") Signed-off-by: Michael Ellerman <[email protected]>
2018-09-25	powerpc/numa: Use associativity if VPHN hcall is successful	Srikar Dronamraju	1	-1/+3
	Currently associativity is used to lookup node-id even if the preceding VPHN hcall failed. However this can cause CPU to be made part of the wrong node, (most likely to be node 0). This is because VPHN is not enabled on KVM guests. With 2ea6263 ("powerpc/topology: Get topology for shared processors at boot"), associativity is used to set to the wrong node. Hence KVM guest topology is broken. For example : A 4 node KVM guest before would have reported. [root@localhost ~]# numactl -H available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 node 0 size: 1746 MB node 0 free: 1604 MB node 1 cpus: 4 5 6 7 node 1 size: 2044 MB node 1 free: 1765 MB node 2 cpus: 8 9 10 11 node 2 size: 2044 MB node 2 free: 1837 MB node 3 cpus: 12 13 14 15 node 3 size: 2044 MB node 3 free: 1903 MB node distances: node 0 1 2 3 0: 10 40 40 40 1: 40 10 40 40 2: 40 40 10 40 3: 40 40 40 10 Would now report: [root@localhost ~]# numactl -H available: 4 nodes (0-3) node 0 cpus: 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 0 size: 1746 MB node 0 free: 1244 MB node 1 cpus: node 1 size: 2044 MB node 1 free: 2032 MB node 2 cpus: 1 node 2 size: 2044 MB node 2 free: 2028 MB node 3 cpus: node 3 size: 2044 MB node 3 free: 2032 MB node distances: node 0 1 2 3 0: 10 40 40 40 1: 40 10 40 40 2: 40 40 10 40 3: 40 40 40 10 Fix this by skipping associativity lookup if the VPHN hcall failed. Fixes: 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot") Signed-off-by: Srikar Dronamraju <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-09-25	powerpc/tm: Avoid possible userspace r1 corruption on reclaim	Michael Neuling	1	-1/+8
	Current we store the userspace r1 to PACATMSCRATCH before finally saving it to the thread struct. In theory an exception could be taken here (like a machine check or SLB miss) that could write PACATMSCRATCH and hence corrupt the userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but others do. We've never actually seen this happen but it's theoretically possible. Either way, the code is fragile as it is. This patch saves r1 to the kernel stack (which can't fault) before we turn MSR[RI] back on. PACATMSCRATCH is still used but only with MSR[RI] off. We then copy r1 from the kernel stack to the thread struct once we have MSR[RI] back on. Suggested-by: Breno Leitao <[email protected]> Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-09-25	powerpc/tm: Fix userspace r13 corruption	Michael Neuling	1	-2/+9
	When we treclaim we store the userspace checkpointed r13 to a scratch SPR and then later save the scratch SPR to the user thread struct. Unfortunately, this doesn't work as accessing the user thread struct can take an SLB fault and the SLB fault handler will write the same scratch SPRG that now contains the userspace r13. To fix this, we store r13 to the kernel stack (which can't fault) before we access the user thread struct. Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen as a random userspace segfault with r13 looking like a kernel address. Signed-off-by: Michael Neuling <[email protected]> Reviewed-by: Breno Leitao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>