| author | Jie Meng <[email protected]> | 2022-10-07 13:23:48 -0700 | 
|---|---|---|
| committer | Alexei Starovoitov <[email protected]> | 2022-10-19 16:53:51 -0700 | 
| commit | 77d8f5d47bfbb5f0a8630102838ffb22cd70d6f5 | |
| tree | fed0be0238b5a5069eef777cc041716a45dc5a4c /tools/perf/scripts/python/bin/compaction-times-report | |
| parent | 81b35e7cad790eecf9f359662804bb26055ac7e8 | |
bpf,x64: use shrx/sarx/shlx when available
BMI2 provides three shift instructions (shrx, sarx and shlx) that use VEX
encoding but target general-purpose registers [1]. They take the shift
count from any general-purpose register and have the same performance as
the non-BMI2 shift instructions [2].
Instead of shr/sar/shl, which implicitly use %cl (the lowest 8 bits of %rcx),
emit their more flexible BMI2 alternatives when advantageous; keep using
the non-BMI2 instructions when the shift count is already in
BPF_REG_4/%rcx, as the non-BMI2 instructions are shorter.
To summarize, when BMI2 is available:
-------------------------------------------------
            |   arbitrary dst
=================================================
src == ecx  |   shl dst, cl
-------------------------------------------------
src != ecx  |   shlx dst, dst, src
-------------------------------------------------
And no additional register shuffling is needed.
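The selection rule in the table above can be sketched as a small decision
function. This is only an illustration: the enum, the function name and the
has_bmi2 flag are hypothetical and are not the identifiers used by the x86-64
BPF JIT. The sketch also spells out the fallback case (no BMI2, count not in
%rcx), which the table does not cover.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the kernel's identifiers. */
enum shift_form {
	SHIFT_LEGACY_CL,      /* shl/shr/sar with the count already in %cl */
	SHIFT_LEGACY_VIA_RCX, /* save %rcx, move count into it, shift, restore */
	SHIFT_BMI2            /* shlx/shrx/sarx with the count in any register */
};

enum { REG_RCX = 1 };         /* x86-64 hardware register number of %rcx */

static enum shift_form pick_shift_form(bool has_bmi2, int src_reg)
{
	assert(src_reg >= 0 && src_reg < 16);

	/* Count already sits in %rcx: the legacy form is shorter, keep it. */
	if (src_reg == REG_RCX)
		return SHIFT_LEGACY_CL;

	/* BMI2 shifts read the count from any general-purpose register. */
	if (has_bmi2)
		return SHIFT_BMI2;

	/* No BMI2: the count has to be shuffled through %rcx. */
	return SHIFT_LEGACY_VIA_RCX;
}
```

For example, pick_shift_form(true, 7) (count in %rdi) selects the BMI2 form,
while pick_shift_form(true, REG_RCX) keeps the legacy %cl form.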
A concrete comparison of non-BMI2 and BMI2 codegen, shifting %rsi left by
the count in %rdi:
Without BMI2:
 ef3:   push   %rcx
        51
 ef4:   mov    %rdi,%rcx
        48 89 f9
 ef7:   shl    %cl,%rsi
        48 d3 e6
 efa:   pop    %rcx
        59
With BMI2:
 f0b:   shlx   %rdi,%rsi,%rsi
        c4 e2 c1 f7 f6
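To see where the five bytes above come from, here is a minimal sketch of a
VEX encoder for the 64-bit, register-only form of shlx. The function name and
signature are hypothetical and do not come from the JIT source; the field
layout follows the VEX.LZ.66.0F38.W1 F7 /r encoding of shlx.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode "shlx %count, %src, %dst" (AT&T order) with 64-bit operands.
 * Registers are x86-64 hardware numbers (0..15). Hypothetical helper for
 * illustration only. Returns the number of bytes written (always 5).
 */
static int encode_shlx64(uint8_t out[5], int dst, int src, int count)
{
	assert(dst >= 0 && dst < 16);
	assert(src >= 0 && src < 16);
	assert(count >= 0 && count < 16);

	out[0] = 0xC4;                          /* 3-byte VEX prefix */
	/* Inverted R, X, B extension bits, then opcode map 0x02 (0F 38). */
	out[1] = (uint8_t)((dst < 8 ? 0x80 : 0) | 0x40 |
			   (src < 8 ? 0x20 : 0) | 0x02);
	/* W=1 (64-bit), vvvv = inverted count register, L=0, pp=01 (66). */
	out[2] = (uint8_t)(0x80 | ((~count & 0xF) << 3) | 0x01);
	out[3] = 0xF7;                          /* opcode */
	/* ModRM: mod=11 (register), reg=dst, rm=src. */
	out[4] = (uint8_t)(0xC0 | ((dst & 7) << 3) | (src & 7));
	return 5;
}
```

With %rsi = 6 and %rdi = 7, encode_shlx64(buf, 6, 6, 7) yields
c4 e2 c1 f7 f6, matching the listing above: 5 bytes, against 8 bytes for the
push/mov/shl/pop sequence.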
[1] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
[2] https://www.agner.org/optimize/instruction_tables.pdf
Signed-off-by: Jie Meng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>