| author | Vincent Mailhol <[email protected]> | 2022-09-07 18:09:34 +0900 | 
|---|---|---|
| committer | Borislav Petkov <[email protected]> | 2022-09-20 15:31:17 +0200 | 
| commit | 146034fed6ee75ec09cf8f996165e2296ceae0bb (patch) | |
| tree | 9a92ecbc9049ef5fa9effeca6f921b7a8cfe12c5 /tools/perf/scripts/python/libxed.py | |
| parent | 521a547ced6477c54b4b0cc206000406c221b4d6 (diff) | |
x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions
For x86_64, the current ffs() implementation does not produce optimized
code when called with a constant expression. By contrast, the
__builtin_ffs() functions of both GCC and clang are able to fold the
expression into a single instruction.
** Example **
Consider two dummy functions foo() and bar() as below:
  #include <linux/bitops.h>
  #define CONST 0x01000000
  unsigned int foo(void)
  {
  	return ffs(CONST);
  }
  unsigned int bar(void)
  {
  	return __builtin_ffs(CONST);
  }
GCC would produce the assembly code below:
  0000000000000000 <foo>:
     0:	ba 00 00 00 01       	mov    $0x1000000,%edx
     5:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     a:	0f bc c2             	bsf    %edx,%eax
     d:	83 c0 01             	add    $0x1,%eax
    10:	c3                   	ret
  <Instructions after ret and before next function were redacted>
  0000000000000020 <bar>:
    20:	b8 19 00 00 00       	mov    $0x19,%eax
    25:	c3                   	ret
And clang would produce:
  0000000000000000 <foo>:
     0:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
     5:	0f bc 05 00 00 00 00 	bsf    0x0(%rip),%eax        # c <foo+0xc>
     c:	83 c0 01             	add    $0x1,%eax
     f:	c3                   	ret
  0000000000000010 <bar>:
    10:	b8 19 00 00 00       	mov    $0x19,%eax
    15:	c3                   	ret
Both examples clearly demonstrate the benefit of using __builtin_ffs()
instead of the kernel's asm implementation for constant expressions.
However, for non-constant expressions, the kernel's ffs() asm version
remains better for x86_64 because, unlike GCC, it doesn't emit the
CMOV assembly instruction, cf. [1] (notably, clang is able to optimize
out the CMOV instruction).
Use __builtin_constant_p() to select between the kernel's ffs() and
the __builtin_ffs() depending on whether the argument is constant or
not.
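The selection described above can be sketched in C as follows. This is a minimal user-space sketch, not the patch itself: the names variable_ffs(), sketch_ffs() and sketch_ffs_selfcheck() are hypothetical, and variable_ffs() uses a portable loop as a stand-in for the kernel's BSF-based inline asm so that the example builds on any architecture. It assumes GCC or clang, which both provide __builtin_constant_p() and __builtin_ffs(). The macro is called sketch_ffs() rather than ffs() only to avoid clashing with the libc ffs() declared in <strings.h>.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's BSF-based asm ffs(): a portable
 * loop returning the 1-based index of the least significant set bit,
 * or 0 when no bit is set.
 */
static inline int variable_ffs(int x)
{
	unsigned int v = (unsigned int)x;
	int pos = 1;

	if (v == 0)
		return 0;	/* ffs(0) is defined to be 0 */
	while (!(v & 1u)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Use the compiler builtin when the argument is a compile-time constant,
 * so that GCC and clang can fold the whole call into an immediate;
 * otherwise fall back to the variable (asm-like) version.
 */
#define sketch_ffs(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x))

/* Exercises both arms of the selection; they must agree. */
static void sketch_ffs_selfcheck(int runtime_value)
{
	/* Constant argument: takes the __builtin_ffs() arm (0x19 == 25). */
	assert(sketch_ffs(0x01000000) == 25);
	/* Non-constant argument: typically takes the variable_ffs() arm. */
	assert(sketch_ffs(runtime_value) == __builtin_ffs(runtime_value));
}
```

Because __builtin_constant_p() does not evaluate its argument and only one arm of the conditional runs, the macro still evaluates x exactly once, so it is safe to use with arguments that have side effects.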
As a side benefit, replacing the ffs() function declaration with a macro
also removes the -Wshadow warning below:
  ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
    283 | static __always_inline int ffs(int x)
** Statistics **
On an allyesconfig, before...:
  $ objdump -d vmlinux.o | grep bsf | wc -l
  1081
...and after:
  $ objdump -d vmlinux.o | grep bsf | wc -l
  792
So, roughly 26.7% of the calls to ffs() used constant expressions
and could be optimized out.
(tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)
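The percentage follows directly from the two bsf counts above:

```latex
\frac{1081 - 792}{1081} = \frac{289}{1081} \approx 0.267 = 26.7\%
```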
[1] commit ca3d30cc02f7 ("x86_64, asm: Optimise fls(), ffs() and fls64()")
  [ bp: Massage commit message. ]
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Yury Norov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Diffstat (limited to 'tools/perf/scripts/python/libxed.py')
0 files changed, 0 insertions, 0 deletions