diff options
author | Andy Lutomirski <[email protected]> | 2017-10-09 09:50:49 -0700 |
---|---|---|
committer | Ingo Molnar <[email protected]> | 2017-10-14 09:21:24 +0200 |
commit | b956575bed91ecfb136a8300742ecbbf451471ab (patch) | |
tree | a32c32334f11a31573ffa9b0e335e65948cfb4cc /net/lapb/lapb_iface.c | |
parent | 616dd5872e52493863b0202632703eebd51243dc (diff) |
x86/mm: Flush more aggressively in lazy TLB mode
Since commit:
94b1b03b519b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
x86's lazy TLB mode has been all the way lazy: when running a kernel thread
(including the idle thread), the kernel keeps using the last user mm's
page tables without attempting to maintain user TLB coherence at all.
From a pure semantic perspective, this is fine -- kernel threads won't
attempt to access user pages, so having stale TLB entries doesn't matter.
Unfortunately, I forgot about a subtlety. By skipping TLB flushes,
we also allow any paging-structure caches that may exist on the CPU
to become incoherent. This means that we can have a
paging-structure cache entry that references a freed page table, and
the CPU is within its rights to do a speculative page walk starting
at the freed page table.
I can imagine this causing two different problems:
- A speculative page walk starting from a bogus page table could read
IO addresses. I haven't seen any reports of this causing problems.
- A speculative page walk that involves a bogus page table can install
garbage in the TLB. Such garbage would always be at a user VA, but
some AMD CPUs have logic that triggers a machine check when it notices
these bogus entries. I've seen a couple reports of this.
Boris further explains the failure mode:
> It is actually more of an optimization which assumes that paging-structure
> entries are in WB DRAM:
>
> "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
> performance optimization that assumes PML4, PDP, PDE, and PTE entries
> are in cacheable WB-DRAM; memory type checks may be bypassed, and
> addresses outside of WB-DRAM may result in undefined behavior or NB
> protocol errors. 1=Disables performance optimization and allows PML4,
> PDP, PDE and PTE entries to be in any memory type. Operating systems
> that maintain page tables in memory types other than WB- DRAM must set
> TlbCacheDis to insure proper operation."
>
> The MCE generated is an NB protocol error to signal that
>
> "Link: A specific coherent-only packet from a CPU was issued to an
> IO link. This may be caused by software which addresses page table
> structures in a memory type other than cacheable WB-DRAM without
> properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
> example, when page table structure addresses are above top of memory. In
> such cases, the NB will generate an MCE if it sees a mismatch between
> the memory operation generated by the core and the link type."
>
> I'm assuming coherent-only packets don't go out on IO links, thus the
> error.
To fix this, reinstate TLB coherence in lazy mode. With this patch
applied, we do it in one of two ways:
- If we have PCID, we simply switch back to init_mm's page tables
when we enter a kernel thread -- this seems to be quite cheap
except for the cost of serializing the CPU.
- If we don't have PCID, then we set a flag and switch to init_mm
the first time we would otherwise need to flush the TLB.
The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
to override the default mode for benchmarking.
In theory, we could optimize this better by only flushing the TLB in
lazy CPUs when a page table is freed. Doing that would require
auditing the mm code to make sure that all page table freeing goes
through tlb_remove_page() as well as reworking some data structures
to implement the improved flush logic.
Reported-by: Markus Trippelsdorf <[email protected]>
Reported-by: Adam Borowski <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Johannes Hirte <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Roman Kagan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 94b1b03b519b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Diffstat (limited to 'net/lapb/lapb_iface.c')
0 files changed, 0 insertions, 0 deletions