author     Uros Bizjak <[email protected]>    2023-10-15 22:24:39 +0200
committer  Ingo Molnar <[email protected]>     2023-10-16 12:51:58 +0200
commit     a048d3abae7c33f0a3f4575fab15ac5504d443f7
tree       a6a75c6d29d408aee9a7b4f17f5228e31ca02fcd
parent     e29aad08b1da7772b362537be32335c0394e65fe
x86/percpu: Rewrite arch_raw_cpu_ptr() to be easier for compilers to optimize
Implement arch_raw_cpu_ptr() as a load from this_cpu_off and then
add the ptr value to the base. This way, the compiler can propagate
the addend to the following instruction and simplify address calculation.
E.g., address calculation in amd_pmu_enable_virt() improves from:
48 c7 c0 00 00 00 00 mov $0x0,%rax
87b7: R_X86_64_32S cpu_hw_events
65 48 03 05 00 00 00 add %gs:0x0(%rip),%rax
00
87bf: R_X86_64_PC32 this_cpu_off-0x4
48 c7 80 28 13 00 00 movq $0x0,0x1328(%rax)
00 00 00 00
to:
65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax
00
8798: R_X86_64_PC32 this_cpu_off-0x4
48 c7 80 00 00 00 00 movq $0x0,0x0(%rax)
00 00 00 00
87a6: R_X86_64_32S cpu_hw_events+0x1328
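(Illustration only, not part of the patch: a minimal userspace sketch of the
same transformation. A plain global "base" stands in for the %gs-based
this_cpu_off, and the hypothetical cpu_ptr_old()/cpu_ptr_new() macros mimic
the old and new arch_raw_cpu_ptr(). Exact code generation depends on the
compiler and code model; the displacement folding shown above assumes a
non-PIE, small-code-model build.)

	#include <stdio.h>

	/* plain global standing in for the %gs-based this_cpu_off */
	static unsigned long base;

	/* stand-in for the cpu_hw_events percpu variable; 'enabled' sits at
	 * offset 0x1328, matching the displacement in the dumps above */
	static struct events {
		long pad[0x1328 / sizeof(long)];
		long enabled;
	} cpu_hw_events;

	/* old pattern: the asm computes base + ptr, so the final pointer is
	 * opaque to the compiler and the symbol address has to be
	 * materialized in a register before the add */
	#define cpu_ptr_old(ptr) ({					\
		unsigned long p__;					\
		asm ("add %1, %0"					\
		     : "=r" (p__)					\
		     : "m" (base), "0" ((unsigned long)(ptr)));		\
		(typeof(ptr))p__;					\
	})

	/* new pattern: the asm only loads the base; the "+ ptr" is ordinary
	 * C arithmetic, so the compiler may fold the symbol and the field
	 * offset into the displacement of the final store, as in the second
	 * dump above */
	#define cpu_ptr_new(ptr) ({					\
		unsigned long p__;					\
		asm ("mov %1, %0"					\
		     : "=r" (p__)					\
		     : "m" (base));					\
		p__ += (unsigned long)(ptr);				\
		(typeof(ptr))p__;					\
	})

	int main(void)
	{
		cpu_ptr_old(&cpu_hw_events)->enabled = 0;
		cpu_ptr_new(&cpu_hw_events)->enabled = 0;
		printf("%ld\n", cpu_hw_events.enabled);
		return 0;
	}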
The compiler also eliminates redundant loads from this_cpu_off,
reducing the number of percpu offset reads from 1668 to 1646
on a test build, a 1.3% reduction.
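(Again illustration only, extending the sketch above with a hypothetical
second variable: the loads can be merged because every expansion of the new
form is the same non-volatile "mov" asm with the same "m" (this_cpu_off)
input, which the compiler is free to CSE, whereas the old form also fed the
distinct ptr into each asm, so no two expansions were identical.)

	/* hypothetical second variable, alongside cpu_hw_events above */
	static struct { long val; } other_var;

	void touch_two(void)
	{
		/* both statements expand to an identical non-volatile asm
		 * reading only "base", so the compiler may emit that load
		 * once and reuse the register for both stores */
		cpu_ptr_new(&cpu_hw_events)->enabled = 1;
		cpu_ptr_new(&other_var)->val = 2;
	}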
Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Uros Bizjak <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
-rw-r--r--  arch/x86/include/asm/percpu.h | 6
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 60ea7755c0fe..915675f3ad60 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -56,9 +56,11 @@
 #define arch_raw_cpu_ptr(ptr)					\
 ({								\
 	unsigned long tcp_ptr__;				\
-	asm ("add " __percpu_arg(1) ", %0"			\
+	asm ("mov " __percpu_arg(1) ", %0"			\
 	     : "=r" (tcp_ptr__)					\
-	     : "m" (__my_cpu_var(this_cpu_off)), "0" (ptr));	\
+	     : "m" (__my_cpu_var(this_cpu_off)));		\
+								\
+	tcp_ptr__ += (unsigned long)(ptr);			\
 	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
 })
 #else /* CONFIG_SMP */