| author | Michael Ellerman <[email protected]> | 2021-11-25 11:23:59 +1100 |
|---|---|---|
| committer | Michael Ellerman <[email protected]> | 2021-11-25 11:23:59 +1100 |
| commit | ff0d6be4bf9ad4daba024ba0157b97750c7ad1fb (patch) | |
| tree | 720d06d1fad98677d031f55d4e44a8738201c1fa /tools/perf/scripts/python | |
| parent | 136057256686de39cc3a07c2e39ef6bc43003ff6 (diff) | |
| parent | 9c5a432a558105d6145b058fad78eb6fcf3d4c38 (diff) | |
Merge branch 'topic/ppc-kvm' into next
This merges Nick's big P9 KVM series; the original cover letter follows:
KVM: PPC: Book3S HV P9: entry/exit optimisations
This reduces radix guest full entry/exit latency on POWER9 and POWER10
by 2x.
Nested HV guests should see smaller improvements in their L1 entry/exit,
but most of the L0 speedups also apply to nested entry, so the gains
combine. An nginx localhost throughput test in an SMP nested guest improves
by about 10% when both L0 and L1 are patched (in a direct guest it doesn't
change much because it uses XIVE for IPIs).
It does this in several main ways:
- Rearrange code to optimise SPR accesses. Mainly, avoid scoreboard
stalls.
- Test SPR values to avoid mtSPRs where possible. mtSPRs are expensive.
- Reduce mftb. mftb is expensive.
- Demand fault certain facilities to avoid saving and/or restoring them
(at the cost of a fault when they are used, but this is mitigated over
a number of entries, as with the same facilities when context switching
processes). PM, TM, and EBB so far.
- Defer some sequences that are made just in case a guest is interrupted
in the middle of a critical section to the case where the guest is
scheduled on a different CPU, rather than every time (at the cost of
an extra IPI in this case). Namely the tlbsync sequence for radix with
GTSE, which is very expensive.
- Reduce locking, barriers, atomics related to the vcpus-per-vcore > 1
handling that the P9 path does not require.
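As an illustration of the "test SPR values to avoid mtSPRs" item above, here is a minimal user-space sketch in C. It is not code from the series: `struct fake_spr`, `spr_write()` and `spr_write_if_changed()` are hypothetical names, and the register is simulated by a struct whose `value` field stands in for a software copy of the current SPR contents. It assumes such a copy is available, so the comparison itself needs no register read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one SPR. Real kernel code would issue mtspr. */
struct fake_spr {
    uint64_t value;        /* software copy of the current contents */
    unsigned long writes;  /* number of simulated mtSPR operations */
};

/* Unconditional write: always pays the (simulated) mtSPR cost. */
void spr_write(struct fake_spr *spr, uint64_t val)
{
    assert(spr != NULL);
    spr->value = val;
    spr->writes++;
}

/*
 * Conditional write: compare against the known current value first and
 * skip the write when nothing would change. Returns 1 if a write was
 * issued, 0 if it was skipped.
 */
int spr_write_if_changed(struct fake_spr *spr, uint64_t val)
{
    assert(spr != NULL);
    if (spr->value == val)
        return 0;
    spr_write(spr, val);
    return 1;
}
```

When host and guest values for a register are usually identical, repeated entry/exit through `spr_write_if_changed()` issues no writes after the first, which is the saving the bullet describes; the trade-off is an extra compare and branch on every path.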
Diffstat (limited to 'tools/perf/scripts/python')
0 files changed, 0 insertions, 0 deletions