path: root/arch/powerpc/kernel/smp.c
2020-11-02  powerpc/smp: Call rcu_cpu_starting() earlier  (Qian Cai, 1 file, -1/+2)

The call to rcu_cpu_starting() in start_secondary() is not early enough
in the CPU-hotplug onlining process, which results in lockdep splats as
follows (with CONFIG_PROVE_RCU_LIST=y):

    WARNING: suspicious RCU usage
    -----------------------------
    kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    RCU used illegally from offline CPU!
    rcu_scheduler_active = 1, debug_locks = 1
    no locks held by swapper/1/0.

    Call Trace:
    dump_stack+0xec/0x144 (unreliable)
    lockdep_rcu_suspicious+0x128/0x14c
    __lock_acquire+0x1060/0x1c60
    lock_acquire+0x140/0x5f0
    _raw_spin_lock_irqsave+0x64/0xb0
    clockevents_register_device+0x74/0x270
    register_decrementer_clockevent+0x94/0x110
    start_secondary+0x134/0x800
    start_secondary_prolog+0x10/0x14

This is avoided by adding a call to rcu_cpu_starting() near the
beginning of the start_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.

It's safe to call rcu_cpu_starting() in the arch code as well as later
in generic code, as explained by Paul:

    It uses a per-CPU variable so that RCU pays attention only to the
    first call to rcu_cpu_starting() if there is more than one of them.
    This is even intentional, due to there being a generic
    arch-independent call to rcu_cpu_starting() in
    notify_cpu_starting().

    So multiple calls to rcu_cpu_starting() are fine by design.

Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
Signed-off-by: Qian Cai <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
[mpe: Add Fixes tag, reword slightly & expand change log]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
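A rough C sketch of the placement described above; the surrounding
bringup code is assumed, not copied from the tree:

    void start_secondary(void *unused)
    {
            /* raw_ variant: the lockdep-instrumented smp_processor_id()
             * must not be called before RCU is watching this CPU */
            unsigned int cpu = raw_smp_processor_id();

            /* must run before anything that takes lockdep-tracked
             * locks, e.g. register_decrementer_clockevent() */
            rcu_cpu_starting(cpu);

            /* ... rest of secondary CPU bringup ... */
    }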
2020-10-19powerpc/smp: Use GFP_ATOMIC while allocating tmp maskSrikar Dronamraju1-26/+31
Qian Cai reported a regression where CPU Hotplug fails with the latest powerpc/next BUG: sleeping function called from invalid context at mm/slab.h:494 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88 no locks held by swapper/88/0. irq event stamp: 18074448 hardirqs last enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110 hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0 do_idle at kernel/sched/idle.c:253 (discriminator 1) softirqs last enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0 softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0 CPU: 88 PID: 0 Comm: swapper/88 Tainted: G W 5.9.0-rc8-next-20201007 #1 Call Trace: [c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable) [c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310 [c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190 [c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0 slab_alloc_node at mm/slub.c:2817 (inlined by) __kmalloc_node at mm/slub.c:4013 [c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80 kmalloc_node at include/linux/slab.h:577 (inlined by) alloc_cpumask_var_node at lib/cpumask.c:116 [c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800 update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267 (inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387 (inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420 [c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14 Allocating a temporary mask while performing a CPU Hotplug operation with CONFIG_CPUMASK_OFFSTACK enabled, leads to calling a sleepable function from a atomic context. Fix this by allocating the temporary mask with GFP_ATOMIC flag. Also instead of having to allocate twice, allocate the mask in the caller so that we only have to allocate once. If the allocation fails, assume the mask to be same as sibling mask, which will make the scheduler to drop this domain for this CPU. Fixes: 70a94089d7f7 ("powerpc/smp: Optimize update_coregroup_mask") Fixes: 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2") Reported-by: Qian Cai <[email protected]> Signed-off-by: Srikar Dronamraju <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
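A minimal sketch of the caller-side shape described above (function
signatures assumed from the change log, not verified against the
tree):

    static void add_cpu_to_masks(int cpu)
    {
            cpumask_var_t mask;

            /* GFP_ATOMIC: CPU bringup runs with interrupts disabled,
             * so a sleeping GFP_KERNEL allocation is not allowed.
             * Allocate once here instead of once per callee. */
            alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu));

            /* callees check the mask and, on allocation failure, fall
             * back to the sibling mask, letting the scheduler drop the
             * corresponding domain for this CPU */
            update_mask_by_l2(cpu, &mask);          /* shapes assumed */
            update_coregroup_mask(cpu, &mask);

            free_cpumask_var(mask);                 /* kfree(NULL) is a no-op */
    }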
2020-10-19  powerpc/smp: Remove unnecessary variable  (Srikar Dronamraju, 1 file, -9/+4)

Commit 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2")
introduced submask_fn in update_mask_by_l2 to track the right submask.
However commit f6606cfdfbcd ("powerpc/smp: Dont assume l2-cache to be
superset of sibling") introduced sibling_mask in update_mask_by_l2 to
track the same submask. Remove sibling_mask in favour of submask_fn.

Signed-off-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-10-06  powerpc/smp: Optimize update_coregroup_mask  (Srikar Dronamraju, 1 file, -8/+22)

All threads of an SMT4/SMT8 core are either part of a CPU's coregroup
mask or entirely outside the coregroup. Use this relation to reduce
the number of iterations needed to find all the CPUs that share the
same coregroup.

Use a temporary mask to iterate through the CPUs that may share the
coregroup mask. Also, instead of setting one CPU at a time into
cpu_coregroup_mask, copy the SMT4/SMT8/submask in one shot.

Signed-off-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-10-06  powerpc/smp: Move coregroup mask updation to a new function  (Srikar Dronamraju, 1 file, -13/+19)

Move the logic for updating the coregroup mask of a CPU to its own
function. This will help in reworking the coregroup mask update in a
subsequent patch.

Signed-off-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-10-06  powerpc/smp: Optimize update_mask_by_l2  (Srikar Dronamraju, 1 file, -6/+45)

All threads of an SMT4 core either share this CPU's l2-cache mask or
are entirely unrelated to it. Use this relation to reduce the number
of iterations needed to find all the CPUs that share the same
l2-cache.

Use a temporary mask to iterate through the CPUs that may share the
l2_cache mask. Also, instead of setting one CPU at a time into
cpu_l2_cache_mask, copy the SMT4 submask in one shot.

Signed-off-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
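A sketch of the "whole submask in one shot" pattern that this and the
update_coregroup_mask change describe (predicate and variable names
assumed; submask_fn would be cpu_sibling_mask, or cpu_smallcore_mask
on big-core systems):

    /* scratch mask: candidates not yet classified */
    cpumask_and(mask, cpu_online_mask, cpu_cpu_mask(cpu));
    for_each_cpu(i, mask) {
            if (shares_l2_cache(cpu, i)) {      /* hypothetical test */
                    /* pull in i's whole thread group at once ... */
                    cpumask_or(cpu_l2_cache_mask(cpu),
                               cpu_l2_cache_mask(cpu), submask_fn(i));
                    /* ... and never visit those CPUs again */
                    cpumask_andnot(mask, mask, submask_fn(i));
            } else {
                    /* unrelated: skip i's whole thread group too */
                    cpumask_andnot(mask, mask, cpu_sibling_mask(i));
            }
    }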
2020-10-06  powerpc/smp: Check for duplicate topologies and consolidate  (Srikar Dronamraju, 1 file, -0/+26)

CACHE and COREGROUP domains are now part of the default topology.
However, on systems that don't support CACHE or COREGROUP, these
domains will eventually be degenerated. The degeneration happens per
CPU. Do note the current fixup_topology() logic ensures that the mask
of a domain that is not supported on the current platform is set to
the previous domain's mask.

Instead of waiting for the scheduler to degenerate the duplicates, try
to consolidate them based on their masks and sd_flags. This is done
just before setting the scheduler topology.

Signed-off-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
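One way the consolidation could look, as a sketch only (the array
layout and the exact comparison are assumed from the change log):

    static void __init fixup_topology(void)
    {
            int i, j;

            for (i = 1; powerpc_topology[i].mask; i++) {
                    /* a level duplicating its predecessor's mask and
                     * flags adds nothing for any CPU on this platform */
                    if (powerpc_topology[i].mask != powerpc_topology[i - 1].mask ||
                        powerpc_topology[i].sd_flags != powerpc_topology[i - 1].sd_flags)
                            continue;

                    /* drop it once, globally, instead of letting the
                     * scheduler degenerate it per CPU */
                    for (j = i; powerpc_topology[j].mask; j++)
                            powerpc_topology[j] = powerpc_topology[j + 1];
                    i--;    /* re-check the entry shifted into slot i */
            }
    }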
2020-10-06  powerpc/smp: Depend on cpu_l1_cache_map when adding CPUs  (Srikar Dronamraju, 1 file, -4/+3)

Currently on hotplug/hotunplug, a CPU iterates through all the CPUs in
its core to find threads in its thread group. However, this info is
already captured in cpu_l1_cache_map. Hence reduce the iterations and
clean up the add_cpu_to_smallcore_masks() function.

Signed-off-by: Srikar Dronamraju <[email protected]>
Tested-by: Satheesh Rajendran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-10-06  powerpc/smp: Stop passing mask to update_mask_by_l2  (Srikar Dronamraju, 1 file, -4/+4)

update_mask_by_l2 is called only once, and it is always passed
cpu_l2_cache_mask as the parameter. Instead of passing
cpu_l2_cache_mask, use it directly in update_mask_by_l2.

Signed-off-by: Srikar Dronamraju <[email protected]>
Tested-by: Satheesh Rajendran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-10-06  powerpc/smp: Limit CPUs traversed to within a node.  (Srikar Dronamraju, 1 file, -1/+1)

All the arch-specific topology cpumasks are within a node/DIE.
However, when setting these per-CPU cpumasks, the system traverses
through all the online CPUs. This is redundant. Reduce the traversal
to only the CPUs that are online in the node to which the CPU belongs.

Signed-off-by: Srikar Dronamraju <[email protected]>
Tested-by: Satheesh Rajendran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-10-06  powerpc/smp: Optimize remove_cpu_from_masks  (Srikar Dronamraju, 1 file, -2/+9)

While offlining a CPU, the system currently iterates through all the
CPUs in the DIE to clear the sibling, l2_cache and smallcore maps.
However, if there are more cores in a DIE, the system can end up
spending more time iterating through CPUs that are completely
unrelated.

Optimize this by only iterating through a smaller but relevant
cpumask. If shared_caches is set, cpu_l2_cache_map is the relevant
cpumask; otherwise cpu_sibling_map is.

Signed-off-by: Srikar Dronamraju <[email protected]>
Tested-by: Satheesh Rajendran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
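A sketch of that shape (set_cpus_unrelated() is assumed to be a helper
that clears each CPU from the other's mask):

    static void remove_cpu_from_masks(int cpu)
    {
            const struct cpumask *mask;
            int i;

            /* only CPUs in this mask can be related to us: with a
             * shared cache, relations can span the L2 mask; otherwise
             * siblings are the widest relation */
            mask = shared_caches ? cpu_l2_cache_mask(cpu)
                                 : cpu_sibling_mask(cpu);

            for_each_cpu(i, mask) {
                    set_cpus_unrelated(cpu, i, cpu_l2_cache_mask);
                    set_cpus_unrelated(cpu, i, cpu_sibling_mask);
                    set_cpus_unrelated(cpu, i, cpu_smallcore_mask);
            }
    }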
2020-10-06  powerpc/smp: Remove get_physical_package_id  (Srikar Dronamraju, 1 file, -20/+0)

Now that cpu_core_mask has been removed and topology_core_cpumask has
been updated to use cpu_cpu_mask, we no longer need
get_physical_package_id.

Signed-off-by: Srikar Dronamraju <[email protected]>
Tested-by: Satheesh Rajendran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-10-06  powerpc/smp: Stop updating cpu_core_mask  (Srikar Dronamraju, 1 file, -26/+7)

Anton Blanchard reported that his 4096-vcpu KVM guest took around 30
minutes to boot. He traced it to the time taken to iterate while
setting the cpu_core_mask.

Further analysis shows that cpu_core_mask and cpu_cpu_mask for any CPU
would be equal on Power. However, updating cpu_core_mask took forever,
as it is a per-CPU cpumask variable, whereas cpu_cpu_mask is a
per-node/per-DIE cpumask shared by all the respective CPUs.

Also, cpu_cpu_mask is needed from a scheduler perspective. However,
cpu_core_map is an exported symbol. Hence stop updating cpu_core_map
and make it point to cpu_cpu_mask.

Signed-off-by: Srikar Dronamraju <[email protected]>
Tested-by: Satheesh Rajendran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
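The end state, sketched (accessor shape assumed): the expensive
per-CPU copy disappears and the core mask simply aliases the shared
node-wide mask.

    /* no per-CPU updates on hotplug any more: the core mask is the
     * node-wide mask, one shared cpumask per node/DIE */
    static inline const struct cpumask *cpu_core_mask(int cpu)
    {
            return cpu_cpu_mask(cpu);   /* cpumask_of_node(cpu_to_node(cpu)) */
    }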
2020-09-18  powerpc/smp: Move ppc_md.cpu_die() to smp_ops.cpu_offline_self()  (Michael Ellerman, 1 file, -2/+2)

We have smp_ops->cpu_die() and ppc_md.cpu_die(). One of them offlines
the current CPU and one offlines another CPU, can you guess which is
which? Also one is in smp_ops and one is in ppc_md?

So rename ppc_md.cpu_die() to cpu_offline_self(), because that's what
it does. And move it into smp_ops where it belongs.

Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-18  powerpc/smp: Fold cpu_die() into its only caller  (Michael Ellerman, 1 file, -4/+0)

Avoid the eternal confusion between cpu_die() and __cpu_die() by
removing the former, folding it into its only caller.

Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-18  powerpc: Move arch_cpu_idle_dead() into smp.c  (Michael Ellerman, 1 file, -0/+6)

arch_cpu_idle_dead() is in idle.c, which makes sense, but it's inside
a CONFIG_HOTPLUG_CPU block. It would be more at home in smp.c, inside
the existing CONFIG_HOTPLUG_CPU block. Note that CONFIG_HOTPLUG_CPU
depends on CONFIG_SMP so even though smp.c is not built for SMP=n
builds, that's fine.

Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-16  powerpc/smp: Create coregroup domain  (Srikar Dronamraju, 1 file, -1/+53)

Add percpu coregroup maps and masks to create a coregroup domain. If a
coregroup doesn't exist, the coregroup domain will be degenerated in
favour of the SMT/CACHE domain. Do note this patch only creates stubs
for cpu_to_coregroup_id. The actual cpu_to_coregroup_id implementation
will be in a subsequent patch.

Signed-off-by: Srikar Dronamraju <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
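A sketch of how such a level could plug into the topology table
(callback names here are assumed, not verified against this series;
flag callbacks are elided):

    static struct sched_domain_topology_level powerpc_topology[] = {
    #ifdef CONFIG_SCHED_SMT
            { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
    #endif
            { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
            { cpu_mc_mask, SD_INIT_NAME(MC) },   /* new coregroup level */
            { cpu_cpu_mask, SD_INIT_NAME(DIE) },
            { NULL, },
    };

If cpu_to_coregroup_id() cannot distinguish groups, the MC mask ends
up identical to a neighbouring level and the scheduler degenerates it,
which is the fallback behaviour the change log describes.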
2020-09-16  powerpc/smp: Allocate cpumask only after searching thread group  (Srikar Dronamraju, 1 file, -4/+3)

If allocated earlier and the search fails, then the cpu_l1_cache_map
cpumask is unnecessarily cleared. However, cpu_l1_cache_map can be
allocated / cleared after we search the thread group.

Please note CONFIG_CPUMASK_OFFSTACK is not set on powerpc. Hence a
cpumask allocated by zalloc_cpumask_var_node is never freed.

Signed-off-by: Srikar Dronamraju <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-16  powerpc/numa: Detect support for coregroup  (Srikar Dronamraju, 1 file, -0/+1)

Add support for grouping cores based on the device-tree
classification.

- The last domain in the associativity domains always refers to the
  core.
- If the primary reference domain happens to be the penultimate domain
  in the associativity domains device-tree property, then there are no
  coregroups. However, if it is not the penultimate domain, then there
  are coregroups. There can be more than one coregroup. For now we are
  interested in the last, i.e. the smallest, coregroups: one sub-group
  per DIE.

Currently there are no firmwares that expose this grouping. Hence
allow the basis for grouping to be abstract. Once firmware starts
using this grouping, code will be added to detect the type of grouping
and adjust the sd domain flags accordingly.

Signed-off-by: Srikar Dronamraju <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-16  powerpc/smp: Optimize start_secondary  (Srikar Dronamraju, 1 file, -6/+11)

In start_secondary, even if shared_caches is already set, the system
does a redundant cpumask match. This redundant check can be removed by
first testing whether shared_caches is already set.

While here, localize the sibling_mask variable to within the if
condition.

Signed-off-by: Srikar Dronamraju <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-16  powerpc/smp: Dont assume l2-cache to be superset of sibling  (Srikar Dronamraju, 1 file, -14/+29)

Current code assumes that the cpumask of CPUs sharing an l2-cache will
always be a superset of cpu_sibling_mask. Let's drop that assumption:
cpu_l2_cache_mask is a superset of cpu_sibling_mask if and only if
shared_caches is set.

Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-16  powerpc/smp: Move topology fixups into a new function  (Srikar Dronamraju, 1 file, -6/+11)

Move the topology fixup based on the platform attributes into its own
function, which is called just before set_sched_topology.

Signed-off-by: Srikar Dronamraju <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-16  powerpc/smp: Move powerpc_topology above  (Srikar Dronamraju, 1 file, -52/+52)

Move the powerpc_topology description higher up in the file. This will
help in using functions defined in this file without needing forward
declarations. No other functional changes.

Signed-off-by: Srikar Dronamraju <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-16  powerpc/smp: Merge Power9 topology with Power topology  (Srikar Dronamraju, 1 file, -22/+3)

A new sched_domain_topology_level was added just for Power9. However,
the same can be achieved by merging powerpc_topology with
power9_topology, which makes the code simpler, especially when adding
a new sched domain.

Signed-off-by: Srikar Dronamraju <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-09-16  powerpc/smp: Fix a warning under !NEED_MULTIPLE_NODES  (Srikar Dronamraju, 1 file, -0/+2)

Fix a build warning in a non-CONFIG_NEED_MULTIPLE_NODES build:

    "error: _numa_cpu_lookup_table_ undeclared"

Signed-off-by: Srikar Dronamraju <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-07-20  powerpc/book3s64/kuap: Move UAMOR setup to key init function  (Aneesh Kumar K.V, 1 file, -0/+1)

UAMOR values are not application-specific. The kernel initializes its
value based on different reserved keys. Remove the thread-specific
UAMOR value and don't switch the UAMOR on context switch.

Move UAMOR initialization to the key initialization code and remove
thread_struct.uamor because it is not used anymore.

Before commit 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation
must not change pkey registers") we used to update uamor based on key
allocation and free.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-06-09  mm: reorder includes after introduction of linux/pgtable.h  (Mike Rapoport, 1 file, -1/+1)

The replacement of <asm/pgtable.h> with <linux/pgtable.h> made the
include of the latter sit in the middle of asm includes. Fix this up
with the aid of the below script and manual adjustments here and
there.

    import sys
    import re

    if len(sys.argv) is not 3:
        print "USAGE: %s <file> <header>" % (sys.argv[0])
        sys.exit(1)

    hdr_to_move = "#include <linux/%s>" % sys.argv[2]
    moved = False
    in_hdrs = False

    with open(sys.argv[1], "r") as f:
        lines = f.readlines()
        for _line in lines:
            line = _line.rstrip('\n')
            if line == hdr_to_move:
                continue
            if line.startswith("#include <linux/"):
                in_hdrs = True
            elif not moved and in_hdrs:
                moved = True
                print hdr_to_move
            print line

Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
2020-06-09  mm: introduce include/linux/pgtable.h  (Mike Rapoport, 1 file, -1/+1)

The include/linux/pgtable.h is going to be the home of generic page
table manipulation functions. Start with moving asm-generic/pgtable.h
to include/linux/pgtable.h and make the latter include asm/pgtable.h.

Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02  powerpc: Fix misleading small cores print  (Michael Neuling, 1 file, -1/+1)

Currently when we boot on a big core system, we get this print:

    [    0.040500] Using small cores at SMT level

This is misleading, as we've actually detected big cores. This patch
clears up the print to say we've detected big cores but are using
small cores for scheduling.

Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-03-27  powerpc/smp: Use IS_ENABLED() to avoid #ifdef  (Michael Ellerman, 1 file, -3/+2)

We can avoid the #ifdef by using IS_ENABLED() in the existing
condition check.

Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
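An illustrative shape of the pattern, not the exact hunk (the guarded
body here is assumed):

    /* before: the body disappears entirely when SCHED_SMT is off */
    #ifdef CONFIG_SCHED_SMT
            if (has_big_cores) {
                    pr_info("Big cores detected but using small core scheduling\n");
                    powerpc_topology[0].mask = smallcore_smt_mask;
            }
    #endif

    /* after: IS_ENABLED() evaluates to 0 or 1 at compile time, so the
     * dead branch is still parsed and type-checked, then discarded */
            if (IS_ENABLED(CONFIG_SCHED_SMT) && has_big_cores) {
                    pr_info("Big cores detected but using small core scheduling\n");
                    powerpc_topology[0].mask = smallcore_smt_mask;
            }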
2020-03-27  powerpc/smp: Drop superfluous NULL check  (Michael Ellerman, 1 file, -5/+2)

We don't need the NULL check of np, the result is the same because the
OF helpers cope with NULL, of_node_to_nid(NULL) == NUMA_NO_NODE (-1).

Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-03-04  powerpc/numa: Remove late request for home node associativity  (Srikar Dronamraju, 1 file, -5/+0)

With the commit ("powerpc/numa: Early request for home node
associativity"), commit 2ea626306810 ("powerpc/topology: Get topology
for shared processors at boot"), which was requesting home node
associativity, becomes redundant. Hence remove the late request for
home node associativity.

Signed-off-by: Srikar Dronamraju <[email protected]>
Reported-by: Abdul Haleem <[email protected]>
Reviewed-by: Nathan Lynch <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-03-04  powerpc/smp: Use nid as fallback for package_id  (Srikar Dronamraju, 1 file, -3/+27)

package_id is used to match cores that are part of the same chip. On
PowerNV machines, package_id defaults to chip_id. However, the
ibm,chip_id property is not present in the device-tree of PowerVM
LPARs. Hence lscpu output shows one core per socket and multiple
sockets. To overcome this, use nid as the package_id on PowerVM LPARs.

Before the patch:

    Architecture:        ppc64le
    Byte Order:          Little Endian
    CPU(s):              128
    On-line CPU(s) list: 0-127
    Thread(s) per core:  8
    Core(s) per socket:  1    <----------------------
    Socket(s):           16   <----------------------
    NUMA node(s):        2
    Model:               2.2 (pvr 004e 0202)
    Model name:          POWER9 (architected), altivec supported
    Hypervisor vendor:   pHyp
    Virtualization type: para
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            512K
    L3 cache:            10240K
    NUMA node0 CPU(s):   0-63
    NUMA node1 CPU(s):   64-127

    # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
    -1

After the patch:

    Architecture:        ppc64le
    Byte Order:          Little Endian
    CPU(s):              128
    On-line CPU(s) list: 0-127
    Thread(s) per core:  8    <---------------------
    Core(s) per socket:  8    <---------------------
    Socket(s):           2
    NUMA node(s):        2
    Model:               2.2 (pvr 004e 0202)
    Model name:          POWER9 (architected), altivec supported
    Hypervisor vendor:   pHyp
    Virtualization type: para
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            512K
    L3 cache:            10240K
    NUMA node0 CPU(s):   0-63
    NUMA node1 CPU(s):   64-127

    # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
    0

Now the lscpu output is more in line with the system configuration.

Signed-off-by: Srikar Dronamraju <[email protected]>
[mpe: Use pkg_id instead of ppid, tweak change log and comment]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
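A sketch of the fallback described above (the exact guard is assumed
from the change log):

    int get_physical_package_id(int cpu)
    {
            int pkg_id = cpu_to_chip_id(cpu);

            /* PowerVM LPARs expose no chip id in the device tree, so
             * cpu_to_chip_id() returns -1 there; fall back to the NUMA
             * node so cores group into sensible sockets in lscpu */
            if (pkg_id == -1 && IS_ENABLED(CONFIG_PPC_SPLPAR) &&
                firmware_has_feature(FW_FEATURE_LPAR))
                    pkg_id = cpu_to_node(cpu);

            return pkg_id;
    }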
2019-05-30  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152  (Thomas Gleixner, 1 file, -5/+1)

Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Allison Randal <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-02-23  powerpc: 'current_set' is now a table of task_struct pointers  (Christophe Leroy, 1 file, -6/+4)

The table of pointers 'current_set' has been used for retrieving the
stack and current. They used to be thread_info pointers, as they were
pointing to the stack, and current was taken from the 'task' field of
the thread_info.

Now, the pointers of the 'current_set' table are both pointers to
task_struct and pointers to thread_info. As they are used to get
current, and the stack pointer is retrieved from current's stack
field, this patch changes their type to task_struct and renames
secondary_ti to secondary_current.

Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2019-02-23  powerpc: Activate CONFIG_THREAD_INFO_IN_TASK  (Christophe Leroy, 1 file, -1/+1)

This patch activates CONFIG_THREAD_INFO_IN_TASK, which moves the
thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
  overflows.
- Its address is harder to determine if stack addresses are leaked,
  making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when
  CONFIG_SMP is active.
- thread_info doesn't have the 'task' field anymore.

This patch:
- Removes all recopy of the thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current, without
  including linux/sched.h (to avoid circular inclusion) and without
  including asm/asm-offsets.h (to avoid symbol name duplication
  between ASM constants and C constants).
- Modifies klp_init_thread_info() to take a task_struct pointer
  argument.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Add task_stack.h to livepatch.h to fix build fails]
Signed-off-by: Michael Ellerman <[email protected]>
2019-02-23  powerpc/64: Use task_stack_page() to initialise paca->kstack  (Christophe Leroy, 1 file, -1/+3)

Rather than using the thread_info, use task_stack_page() to initialise
paca->kstack; that way it will work with THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>
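Sketched, with the powerpc stack-layout constants and 'idle' as the
idle task being set up for the secondary CPU (surrounding code
assumed):

    /* the stack page now comes from the task itself, not from a
     * separately allocated thread_info */
    paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
                             THREAD_SIZE - STACK_FRAME_OVERHEAD;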
2019-02-22  powerpc/smp: Make __smp_send_nmi_ipi() static  (Nicholas Piggin, 1 file, -1/+2)

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2019-02-22  powerpc/smp: Fix NMI IPI xmon timeout  (Nicholas Piggin, 1 file, -64/+29)

The xmon debugger IPI handlers wait in the callback function while
xmon is still active. This means they don't complete the IPI, and the
initiator always times out waiting for them. Things manage to work
after the timeout because there is some fallback logic to keep the NMI
IPI state sane in case of the timeout, but this is a bit ugly.

This patch changes NMI IPI back to half-asynchronous (i.e., wait for
everyone to call in, do not wait for the IPI function to complete),
but the complexity is avoided by going one step further and allowing
new IPIs to be issued before the IPI functions have all completed.

If synchronization against that is required, it is left up to the
caller, but current callers don't require it. In fact, with the
timeout handling, callers must already be able to cope with this.

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Cc: [email protected] # v4.19+
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2019-02-22  powerpc/smp: Fix NMI IPI timeout  (Nicholas Piggin, 1 file, -2/+3)

The NMI IPI timeout logic is broken: if __smp_send_nmi_ipi() times out
on the first condition, delay_us will be zero, which sends it into the
second spin loop with no timeout, so it will spin forever.

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Cc: [email protected] # v4.19+
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
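A sketch of the corrected countdown shape (loop condition assumed;
the same guard would apply in both spin loops):

    while (!cpumask_empty(&nmi_ipi_pending_mask)) {
            udelay(1);
            if (delay_us) {
                    delay_us--;
                    if (!delay_us)
                            break;          /* timed out */
            }
            /* delay_us == 0 on entry means "wait forever", and the
             * guard keeps it from being treated as an expired timer */
    }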
2018-10-21  powerpc: Fix stack protector crashes on CPU hotplug  (Michael Ellerman, 1 file, -7/+3)

Recently in commit 7241d26e8175 ("powerpc/64: properly initialise the
stackprotector canary on SMP.") we fixed a crash with stack protector
on SMP by initialising the stack canary in cpu_idle_thread_init().

But this can also cause crashes when a CPU comes back online after
being offline:

    Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pnv_smp_cpu_kill_self+0x2a0/0x2b0
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc3-gcc-7.3.1-00168-g4ffe713b7587 #94
    Call Trace:
    dump_stack+0xb0/0xf4 (unreliable)
    panic+0x144/0x328
    __stack_chk_fail+0x2c/0x30
    pnv_smp_cpu_kill_self+0x2a0/0x2b0
    cpu_die+0x48/0x70
    arch_cpu_idle_dead+0x20/0x40
    do_idle+0x274/0x390
    cpu_startup_entry+0x38/0x50
    start_secondary+0x5e4/0x600
    start_secondary_prolog+0x10/0x14

Looking at the stack we see that the canary value in the stack frame
doesn't match the canary in the task/paca. That is because we have
reinitialised the task/paca value, but then the CPU coming online has
returned into a function using the old canary value. That causes the
comparison to fail.

Instead we can call boot_init_stack_canary() from start_secondary(),
which never returns. This is essentially what the generic code does in
cpu_startup_entry() under #ifdef X86; we should make that non-x86
specific in a future patch.

Fixes: 7241d26e8175 ("powerpc/64: properly initialise the stackprotector canary on SMP.")
Reported-by: Joel Stanley <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
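The key property, sketched (surrounding code assumed): the canary is
seeded from a function that never returns, so no live caller frame was
built with the old canary value.

    void start_secondary(void *unused)
    {
            /* ... early bringup ... */

            /* safe here precisely because start_secondary() does not
             * return: no frame above us holds a pre-reseed canary */
            boot_init_stack_canary();

            /* ... eventually enters the idle loop, never returns ... */
    }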
2018-10-13  powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcores  (Gautham R. Shenoy, 1 file, -1/+18)

POWER9 SMT8 cores consist of two groups of threads, where the threads
in each group share an L1 cache. The scheduler is not aware of this
distinction as the current sched-domain hierarchy has all the threads
of the core defined at the SMT domain:

    SMT  [Thread siblings of the SMT8 core]
    DIE  [CPUs in the same die]
    NUMA [All the CPUs in the system]

Due to this, we can observe run-to-run variance when we run a
multi-threaded benchmark bound to a single core, based on how the
scheduler spreads the software threads across the two groups in the
core.

We fix this in this patch by defining each group of threads which
share an L1 cache to be the SMT level. The group of threads in the
SMT8 core is defined to be the CACHE level. The sched-domain hierarchy
after this patch will be:

    SMT   [Thread siblings in the core that share L1 cache]
    CACHE [Thread siblings that are in the SMT8 core]
    DIE   [CPUs in the same die]
    NUMA  [All the CPUs in the system]

Signed-off-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-10-13  powerpc: Detect the presence of big-cores via "ibm, thread-groups"  (Gautham R. Shenoy, 1 file, -0/+222)

On IBM POWER9, the device tree exposes a property array identified by
"ibm,thread-groups" which indicates which groups of threads share a
particular set of resources. As of today we only have one form of
grouping, identifying the group of threads in the core that share the
L1 cache, translation cache and instruction data flow.

This patch adds helper functions to parse the contents of
"ibm,thread-groups" and populate a per-cpu variable to cache
information about siblings of each CPU that share the L1, translation
cache and instruction data-flow.

It also defines a new global variable named "has_big_cores" which
indicates if the cores on this configuration have multiple groups of
threads that share L1 cache.

For each online CPU, it maintains a cpu_smallcore_mask, which
indicates the online siblings which share the L1-cache with it.

Signed-off-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-10-13  powerpc/64: properly initialise the stackprotector canary on SMP.  (Christophe Leroy, 1 file, -0/+8)

commit 06ec27aea9fc ("powerpc/64: add stack protector support")
doesn't initialise the stack canary on SMP secondary CPUs' paca,
leading to the following false positive report from the stack
protector:

    smp: Bringing up secondary CPUs ...
    Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __schedule+0x978/0xa80
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc7-next-20181010-autotest-autotest #1
    Call Trace:
    [c000001fed5b3bf0] [c000000000a0ef3c] dump_stack+0xb0/0xf4 (unreliable)
    [c000001fed5b3c30] [c0000000000f9d68] panic+0x140/0x308
    [c000001fed5b3cc0] [c0000000000f9844] __stack_chk_fail+0x24/0x30
    [c000001fed5b3d20] [c000000000a2c3a8] __schedule+0x978/0xa80
    [c000001fed5b3e00] [c000000000a2c9b4] schedule_idle+0x34/0x60
    [c000001fed5b3e30] [c00000000013d344] do_idle+0x224/0x3d0
    [c000001fed5b3ec0] [c00000000013d6e0] cpu_startup_entry+0x30/0x50
    [c000001fed5b3ef0] [c000000000047f34] start_secondary+0x4d4/0x520
    [c000001fed5b3f90] [c00000000000b370] start_secondary_prolog+0x10/0x14

This patch properly initialises the stack_canary of the secondary idle
tasks.

Reported-by: Abdul Haleem <[email protected]>
Fixes: 06ec27aea9fc ("powerpc/64: add stack protector support")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-08-21  powerpc/topology: Get topology for shared processors at boot  (Srikar Dronamraju, 1 file, -0/+5)

On a shared LPAR, Phyp will not update the CPU associativity at boot
time. Just after boot, the system recognizes itself as a shared LPAR
and triggers a request for the correct CPU associativity. But by then
the scheduler would have already created/destroyed its sched domains.

This causes:
- Broken load balance across nodes, causing islands of cores.
- Performance degradation, especially if the system is lightly loaded.
- dmesg to wrongly report all CPUs to be in Node 0.
- Messages in dmesg saying "borken" topology.
- With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity
  node sched domain"), rcu stalls at boot up.

The sched_domains_numa_masks table, which is used to generate
cpumasks, is only created at boot time just before creating sched
domains and is never updated. Hence, it's better to get the topology
correct before the sched domains are created.

For example, on a 64 core Power 8 shared LPAR, dmesg reports:

    Brought up 512 CPUs
    Node 0 CPUs: 0-511
    Node 1 CPUs:
    Node 2 CPUs:
    Node 3 CPUs:
    Node 4 CPUs:
    Node 5 CPUs:
    Node 6 CPUs:
    Node 7 CPUs:
    Node 8 CPUs:
    Node 9 CPUs:
    Node 10 CPUs:
    Node 11 CPUs:
    ...
    BUG: arch topology borken
         the DIE domain not a subset of the NUMA domain
    BUG: arch topology borken
         the DIE domain not a subset of the NUMA domain

numactl/lscpu output will still be correct, with cores spreading
across all nodes:

    Socket(s):           64
    NUMA node(s):        12
    Model:               2.0 (pvr 004d 0200)
    Model name:          POWER8 (architected), altivec supported
    Hypervisor vendor:   pHyp
    Virtualization type: para
    L1d cache:           64K
    L1i cache:           32K
    NUMA node0 CPU(s):   0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
    NUMA node1 CPU(s):   8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
    NUMA node2 CPU(s):   16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
    NUMA node3 CPU(s):   24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
    NUMA node4 CPU(s):   208-215,304-311,400-407,496-503
    NUMA node5 CPU(s):   168-175,264-271,360-367,456-463
    NUMA node6 CPU(s):   128-135,224-231,320-327,416-423
    NUMA node7 CPU(s):   136-143,232-239,328-335,424-431
    NUMA node8 CPU(s):   216-223,312-319,408-415,504-511
    NUMA node9 CPU(s):   144-151,240-247,336-343,432-439
    NUMA node10 CPU(s):  152-159,248-255,344-351,440-447
    NUMA node11 CPU(s):  160-167,256-263,352-359,448-455

Currently on this LPAR, the scheduler detects 2 levels of NUMA and
creates NUMA sched domains for all CPUs, but it finds a single DIE
domain consisting of all CPUs. Hence it deletes all NUMA sched
domains.

To address this, detect the shared processor and update the topology
soon after CPUs are set up, so that the correct topology is in place
just before the scheduler creates the sched domains.

With the fix, dmesg reports:

    numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471
    numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479
    numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487
    numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495
    numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
    numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
    numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
    numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
    numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
    numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
    numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
    numa: Node 11 CPUs: 160-167 256-263 352-359 448-455

and lscpu also reports:

    Socket(s):           64
    NUMA node(s):        12
    Model:               2.0 (pvr 004d 0200)
    Model name:          POWER8 (architected), altivec supported
    Hypervisor vendor:   pHyp
    Virtualization type: para
    L1d cache:           64K
    L1i cache:           32K
    NUMA node0 CPU(s):   0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
    NUMA node1 CPU(s):   8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
    NUMA node2 CPU(s):   16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
    NUMA node3 CPU(s):   24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
    NUMA node4 CPU(s):   208-215,304-311,400-407,496-503
    NUMA node5 CPU(s):   168-175,264-271,360-367,456-463
    NUMA node6 CPU(s):   128-135,224-231,320-327,416-423
    NUMA node7 CPU(s):   136-143,232-239,328-335,424-431
    NUMA node8 CPU(s):   216-223,312-319,408-415,504-511
    NUMA node9 CPU(s):   144-151,240-247,336-343,432-439
    NUMA node10 CPU(s):  152-159,248-255,344-351,440-447
    NUMA node11 CPU(s):  160-167,256-263,352-359,448-455

Reported-by: Manjunatha H R <[email protected]>
Signed-off-by: Srikar Dronamraju <[email protected]>
[mpe: Trim / format change log]
Tested-by: Michael Ellerman <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-07-24  powerpc: NMI IPI make NMI IPIs fully sychronous  (Nicholas Piggin, 1 file, -23/+27)

There is an asynchronous aspect to smp_send_nmi_ipi. The caller waits
for all CPUs to call in to the handler, but it does not wait for
completion of the handler. This is a needless complication, so remove
it and always wait synchronously.

The synchronous wait allows the caller to easily time out and clear
the wait for completion (zero nmi_ipi_busy_count) in the case of badly
behaved handlers. This would have prevented the recent smp_send_stop
NMI IPI bug from causing the system to hang.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-06-19  powerpc: smp_send_stop do not offline stopped CPUs  (Nicholas Piggin, 1 file, -6/+0)

Marking CPUs stopped by smp_send_stop as offline can cause warnings
due to cross-CPU wakeups. This trace was noticed on a busy system
running a sysrq+c crash test, after the injected crash:

    WARNING: CPU: 51 PID: 1546 at kernel/sched/core.c:1179 set_task_cpu+0x22c/0x240
    CPU: 51 PID: 1546 Comm: kworker/u352:1 Tainted: G      D
    Workqueue: mlx5e mlx5e_update_stats_work [mlx5_core]
    [...]
    NIP [c00000000017c21c] set_task_cpu+0x22c/0x240
    LR [c00000000017d580] try_to_wake_up+0x230/0x720
    Call Trace:
    [c000000001017700] runqueues+0x0/0xb00 (unreliable)
    [c00000000017d580] try_to_wake_up+0x230/0x720
    [c00000000015a214] insert_work+0x104/0x140
    [c00000000015adb0] __queue_work+0x230/0x690
    [c000003fc5007910] [c00000000015b26c] queue_work_on+0x5c/0x90
    [c0080000135fc8f8] mlx5_cmd_exec+0x538/0xcb0 [mlx5_core]
    [c008000013608fd0] mlx5_core_access_reg+0x140/0x1d0 [mlx5_core]
    [c00800001362777c] mlx5e_update_pport_counters.constprop.59+0x6c/0x90 [mlx5_core]
    [c008000013628868] mlx5e_update_ndo_stats+0x28/0x90 [mlx5_core]
    [c008000013625558] mlx5e_update_stats_work+0x68/0xb0 [mlx5_core]
    [c00000000015bcec] process_one_work+0x1bc/0x5f0
    [c00000000015ecac] worker_thread+0xac/0x6b0
    [c000000000168338] kthread+0x168/0x1b0
    [c00000000000b628] ret_from_kernel_thread+0x5c/0xb4

This happens because, firstly, the CPU is not really offline in the
usual sense: processes and interrupts have not been migrated away.
Secondly, smp_send_stop does not happen atomically on all CPUs, so one
CPU can have marked itself offline while another CPU is still running
processes or interrupts which can affect the first CPU.

Fix this by just not marking the CPU as offline. It's more like frozen
in time, so offline does not really reflect its state properly anyway.
There should be nothing in the crash/panic path that walks online CPUs
and synchronously waits for them, so this change should not introduce
new hangs.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03  powerpc/nmi: Add an API for sending "safe" NMIs  (Michael Ellerman, 1 file, -5/+15)

Currently the options we have for sending NMIs are not necessarily
safe; that is, they can potentially interrupt a CPU in a
non-recoverable region of code, meaning the kernel must then panic().

But we'd like to use smp_send_nmi_ipi() to do cross-CPU calls in
situations where we don't want to risk a panic(), because it doesn't
have the requirement that interrupts must be enabled like
smp_call_function().

So add an API for the caller to indicate that it wants to use the NMI
infrastructure but doesn't want to do anything "unsafe".

Currently that is implemented by not actually calling cause_nmi_ipi(),
instead falling back to an IPI. In future we can pass the safe
parameter down to cause_nmi_ipi() and the individual backends can
potentially take it into account before deciding what to do.

Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
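The API shape this describes, sketched (signature assumed from the
change log rather than copied from the header):

    /* like smp_send_nmi_ipi(), but guaranteed not to do anything
     * "unsafe": if a true NMI cannot be delivered safely, the
     * implementation falls back to an ordinary IPI */
    int smp_send_safe_nmi_ipi(int cpu, void (*fn)(struct pt_regs *),
                              u64 delay_us);

A caller such as a watchdog can then ask another CPU to dump its state
without accepting the risk of an unrecoverable NMI and the resulting
panic().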
2018-06-03  powerpc: move a stray NMI IPI case under NMI_IPI ifdef  (Nicholas Piggin, 1 file, -0/+2)

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
2018-06-03  powerpc: move timer broadcast code under GENERIC_CLOCKEVENTS_BROADCAST ifdef  (Nicholas Piggin, 1 file, -0/+8)

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>