path: root/scripts/gdb/linux/timerlist.py
author: Eric Dumazet <[email protected]> 2021-11-12 08:19:50 -0800
committer: Borislav Petkov <[email protected]> 2021-12-08 11:26:09 +0100
commit: 3411506550b1f714a52b5db087666c08658d2698
tree: e96cd40dcbad1e8346aa181e51537bcab2ff2a41 /scripts/gdb/linux/timerlist.py
parent: 0fcfb00b28c0b7884635dacf38e46d60bf3d4eb1
x86/csum: Rewrite/optimize csum_partial()
With more NICs supporting CHECKSUM_COMPLETE, and IPv6 being widely used
csum_partial() is heavily used with small amount of bytes, and is
consuming many cycles.

IPv6 header size, for instance, is 40 bytes.

Another thing to consider is that NET_IP_ALIGN is 0 on x86, meaning
that network headers are not word-aligned, unless the driver forces
this.

This means that csum_partial() fetches one u16 to 'align the buffer',
then performs three u64 additions with carry in a loop, then a
remaining u32, then a remaining u16.

With this new version, it performs a loop only for the 64 bytes blocks,
then the remaining is bisected.

Testing on various CPUs, all of them show a big reduction in
csum_partial() cost (by 50 to 80 %)

Before:
    4.16%  [kernel]  [k] csum_partial
After:
    0.83%  [kernel]  [k] csum_partial

If run in a loop 1,000,000 times:

Before:
    26,922,913  cycles         # 3846130.429 GHz
    80,302,961  instructions   # 2.98 insn per cycle
    21,059,816  branches       # 3008545142.857 M/sec
         2,896  branch-misses  # 0.01% of all branches
After:
    17,960,709  cycles         # 3592141.800 GHz
    41,292,805  instructions   # 2.30 insn per cycle
    11,058,119  branches       # 2211623800.000 M/sec
         2,997  branch-misses  # 0.03% of all branches

[ bp: Massage, merge in subsequent fixes into a single patch:
  - um compilation error due to missing load_unaligned_zeropad():
    - Reported-by: kernel test robot <[email protected]>
    - Link: https://lkml.kernel.org/r/[email protected]
  - Fix initial seed for odd buffers
    - Reported-by: Noah Goldstein <[email protected]>
    - Link: https://lkml.kernel.org/r/[email protected]
]

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
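
The "loop only for the 64 bytes blocks, then the remaining is bisected" structure described in the message can be illustrated with a portable userspace sketch. This is not the kernel's csum_partial(): the names csum_sketch(), csum_ref(), add64_carry(), load64() and fold16() are hypothetical, the code uses plain C rather than add-with-carry instructions or load_unaligned_zeropad(), it returns a folded 16-bit sum, and it does not model the seed argument or the handling of odd buffers mentioned in the bp note. It assumes that "bisected" means testing the 32, 16 and 8 bits of the remaining length in turn.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One's-complement addition: add b to a and wrap the carry back in. */
static uint64_t add64_carry(uint64_t a, uint64_t b)
{
    a += b;
    return a + (a < b);
}

/* Unaligned-safe 64-bit load in native byte order. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Fold a 64-bit one's-complement accumulator down to 16 bits. */
static uint16_t fold16(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical sketch: loop over 64-byte blocks, then bisect the rest. */
uint16_t csum_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum = 0;
    int i;

    /* Main loop: only whole 64-byte blocks. */
    while (len >= 64) {
        for (i = 0; i < 8; i++)
            sum = add64_carry(sum, load64(p + 8 * i));
        p += 64;
        len -= 64;
    }

    /* Fewer than 64 bytes remain: test each power of two once. */
    if (len & 32) {
        for (i = 0; i < 4; i++)
            sum = add64_carry(sum, load64(p + 8 * i));
        p += 32;
    }
    if (len & 16) {
        sum = add64_carry(sum, load64(p));
        sum = add64_carry(sum, load64(p + 8));
        p += 16;
    }
    if (len & 8) {
        sum = add64_carry(sum, load64(p));
        p += 8;
    }
    if (len & 7) {
        /* Final 1..7 bytes, zero-padded to a full word. */
        uint64_t tail = 0;
        memcpy(&tail, p, len & 7);
        sum = add64_carry(sum, tail);
    }
    return fold16(sum);
}

/* Straightforward 16-bit-at-a-time reference, for comparison only. */
uint16_t csum_ref(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum = 0;
    uint16_t w;

    for (; len >= 2; p += 2, len -= 2) {
        memcpy(&w, p, 2);
        sum += w;
    }
    if (len) {
        w = 0;
        memcpy(&w, p, 1);
        sum += w;
    }
    return fold16(sum);
}
```

For a 40-byte IPv6 header, the sketch skips the main loop entirely and takes only the `len & 32` and `len & 8` branches, which is the small-buffer case the message is concerned with.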
Diffstat (limited to 'scripts/gdb/linux/timerlist.py')
0 files changed, 0 insertions, 0 deletions