diff options
Diffstat (limited to 'Documentation/x86/tlb.rst')
-rw-r--r-- | Documentation/x86/tlb.rst | 83 |
1 files changed, 0 insertions, 83 deletions
diff --git a/Documentation/x86/tlb.rst b/Documentation/x86/tlb.rst deleted file mode 100644 index 82ec58ae63a8..000000000000 --- a/Documentation/x86/tlb.rst +++ /dev/null @@ -1,83 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -======= -The TLB -======= - -When the kernel unmaps or modified the attributes of a range of -memory, it has two choices: - - 1. Flush the entire TLB with a two-instruction sequence. This is - a quick operation, but it causes collateral damage: TLB entries - from areas other than the one we are trying to flush will be - destroyed and must be refilled later, at some cost. - 2. Use the invlpg instruction to invalidate a single page at a - time. This could potentially cost many more instructions, but - it is a much more precise operation, causing no collateral - damage to other TLB entries. - -Which method to do depends on a few things: - - 1. The size of the flush being performed. A flush of the entire - address space is obviously better performed by flushing the - entire TLB than doing 2^48/PAGE_SIZE individual flushes. - 2. The contents of the TLB. If the TLB is empty, then there will - be no collateral damage caused by doing the global flush, and - all of the individual flush will have ended up being wasted - work. - 3. The size of the TLB. The larger the TLB, the more collateral - damage we do with a full flush. So, the larger the TLB, the - more attractive an individual flush looks. Data and - instructions have separate TLBs, as do different page sizes. - 4. The microarchitecture. The TLB has become a multi-level - cache on modern CPUs, and the global flushes have become more - expensive relative to single-page flushes. - -There is obviously no way the kernel can know all these things, -especially the contents of the TLB during a given flush. The -sizes of the flush will vary greatly depending on the workload as -well. There is essentially no "right" point to choose. - -You may be doing too many individual invalidations if you see the -invlpg instruction (or instructions _near_ it) show up high in -profiles. If you believe that individual invalidations being -called too often, you can lower the tunable:: - - /sys/kernel/debug/x86/tlb_single_page_flush_ceiling - -This will cause us to do the global flush for more cases. -Lowering it to 0 will disable the use of the individual flushes. -Setting it to 1 is a very conservative setting and it should -never need to be 0 under normal circumstances. - -Despite the fact that a single individual flush on x86 is -guaranteed to flush a full 2MB [1]_, hugetlbfs always uses the full -flushes. THP is treated exactly the same as normal memory. - -You might see invlpg inside of flush_tlb_mm_range() show up in -profiles, or you can use the trace_tlb_flush() tracepoints. to -determine how long the flush operations are taking. - -Essentially, you are balancing the cycles you spend doing invlpg -with the cycles that you spend refilling the TLB later. - -You can measure how expensive TLB refills are by using -performance counters and 'perf stat', like this:: - - perf stat -e - cpu/event=0x8,umask=0x84,name=dtlb_load_misses_walk_duration/, - cpu/event=0x8,umask=0x82,name=dtlb_load_misses_walk_completed/, - cpu/event=0x49,umask=0x4,name=dtlb_store_misses_walk_duration/, - cpu/event=0x49,umask=0x2,name=dtlb_store_misses_walk_completed/, - cpu/event=0x85,umask=0x4,name=itlb_misses_walk_duration/, - cpu/event=0x85,umask=0x2,name=itlb_misses_walk_completed/ - -That works on an IvyBridge-era CPU (i5-3320M). Different CPUs -may have differently-named counters, but they should at least -be there in some form. You can use pmu-tools 'ocperf list' -(https://github.com/andikleen/pmu-tools) to find the right -counters for a given CPU. - -.. [1] A footnote in Intel's SDM "4.10.4.2 Recommended Invalidation" - says: "One execution of INVLPG is sufficient even for a page - with size greater than 4 KBytes." |