aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/scripts/python/libxed.py
diff options
context:
space:
mode:
authorJie Meng <[email protected]>2022-10-07 13:23:48 -0700
committerAlexei Starovoitov <[email protected]>2022-10-19 16:53:51 -0700
commit77d8f5d47bfbb5f0a8630102838ffb22cd70d6f5 (patch)
treefed0be0238b5a5069eef777cc041716a45dc5a4c /tools/perf/scripts/python/libxed.py
parent81b35e7cad790eecf9f359662804bb26055ac7e8 (diff)
bpf,x64: use shrx/sarx/shlx when available
BMI2 provides 3 shift instructions (shrx, sarx and shlx) that use VEX encoding but target general purpose registers [1]. They allow the shift count in any general purpose register and have the same performance as non BMI2 shift instructions [2]. Instead of shr/sar/shl that implicitly use %cl (lowest 8 bit of %rcx), emit their more flexible alternatives provided in BMI2 when advantageous; keep using the non BMI2 instructions when shift count is already in BPF_REG_4/%rcx as non BMI2 instructions are shorter. To summarize, when BMI2 is available: ------------------------------------------------- | arbitrary dst ================================================= src == ecx | shl dst, cl ------------------------------------------------- src != ecx | shlx dst, dst, src ------------------------------------------------- And no additional register shuffling is needed. A concrete example between non BMI2 and BMI2 codegen. To shift %rsi by %rdi: Without BMI2: ef3: push %rcx 51 ef4: mov %rdi,%rcx 48 89 f9 ef7: shl %cl,%rsi 48 d3 e6 efa: pop %rcx 59 With BMI2: f0b: shlx %rdi,%rsi,%rsi c4 e2 c1 f7 f6 [1] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set [2] https://www.agner.org/optimize/instruction_tables.pdf Signed-off-by: Jie Meng <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
Diffstat (limited to 'tools/perf/scripts/python/libxed.py')
0 files changed, 0 insertions, 0 deletions