Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry updates from Thomas Gleixner:
"The x86 entry, exception and interrupt code rework
This all started about 6 month ago with the attempt to move the Posix
CPU timer heavy lifting out of the timer interrupt code and just have
lockless quick checks in that code path. Trivial 5 patches.
This unearthed an inconsistency in the KVM handling of task work and
the review requested to move all of this into generic code so other
architectures can share.
Valid request and solved with another 25 patches but those unearthed
inconsistencies vs. RCU and instrumentation.
Digging into this made it obvious that there are quite some
inconsistencies vs. instrumentation in general. The int3 text poke
handling in particular was completely unprotected and with the batched
update of trace events even more likely to expose to endless int3
recursion.
In parallel the RCU implications of instrumenting fragile entry code
came up in several discussions.
The conclusion of the x86 maintainer team was to go all the way and
make the protection against any form of instrumentation of fragile and
dangerous code pathes enforcable and verifiable by tooling.
A first batch of preparatory work hit mainline with commit
d5f744f9a2ac ("Pull x86 entry code updates from Thomas Gleixner")
That (almost) full solution introduced a new code section
'.noinstr.text' into which all code which needs to be protected from
instrumentation of all sorts goes into. Any call into instrumentable
code out of this section has to be annotated. objtool has support to
validate this.
Kprobes now excludes this section fully which also prevents BPF from
fiddling with it and all 'noinstr' annotated functions also keep
ftrace off. The section, kprobes and objtool changes are already
merged.
The major changes coming with this are:
- Preparatory cleanups
- Annotating of relevant functions to move them into the
noinstr.text section or enforcing inlining by marking them
__always_inline so the compiler cannot misplace or instrument
them.
- Splitting and simplifying the idtentry macro maze so that it is
now clearly separated into simple exception entries and the more
interesting ones which use interrupt stacks and have the paranoid
handling vs. CR3 and GS.
- Move quite some of the low level ASM functionality into C code:
- enter_from and exit to user space handling. The ASM code now
calls into C after doing the really necessary ASM handling and
the return path goes back out without bells and whistels in
ASM.
- exception entry/exit got the equivivalent treatment
- move all IRQ tracepoints from ASM to C so they can be placed as
appropriate which is especially important for the int3
recursion issue.
- Consolidate the declaration and definition of entry points between
32 and 64 bit. They share a common header and macros now.
- Remove the extra device interrupt entry maze and just use the
regular exception entry code.
- All ASM entry points except NMI are now generated from the shared
header file and the corresponding macros in the 32 and 64 bit
entry ASM.
- The C code entry points are consolidated as well with the help of
DEFINE_IDTENTRY*() macros. This allows to ensure at one central
point that all corresponding entry points share the same
semantics. The actual function body for most entry points is in an
instrumentable and sane state.
There are special macros for the more sensitive entry points, e.g.
INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
They allow to put the whole entry instrumentation and RCU handling
into safe places instead of the previous pray that it is correct
approach.
- The INT3 text poke handling is now completely isolated and the
recursion issue banned. Aside of the entry rework this required
other isolation work, e.g. the ability to force inline bsearch.
- Prevent #DB on fragile entry code, entry relevant memory and
disable it on NMI, #MC entry, which allowed to get rid of the
nested #DB IST stack shifting hackery.
- A few other cleanups and enhancements which have been made
possible through this and already merged changes, e.g.
consolidating and further restricting the IDT code so the IDT
table becomes RO after init which removes yet another popular
attack vector
- About 680 lines of ASM maze are gone.
There are a few open issues:
- An escape out of the noinstr section in the MCE handler which needs
some more thought but under the aspect that MCE is a complete
trainwreck by design and the propability to survive it is low, this
was not high on the priority list.
- Paravirtualization
When PV is enabled then objtool complains about a bunch of indirect
calls out of the noinstr section. There are a few straight forward
ways to fix this, but the other issues vs. general correctness were
more pressing than parawitz.
- KVM
KVM is inconsistent as well. Patches have been posted, but they
have not yet been commented on or picked up by the KVM folks.
- IDLE
Pretty much the same problems can be found in the low level idle
code especially the parts where RCU stopped watching. This was
beyond the scope of the more obvious and exposable problems and is
on the todo list.
The lesson learned from this brain melting exercise to morph the
evolved code base into something which can be validated and understood
is that once again the violation of the most important engineering
principle "correctness first" has caused quite a few people to spend
valuable time on problems which could have been avoided in the first
place. The "features first" tinkering mindset really has to stop.
With that I want to say thanks to everyone involved in contributing to
this effort. Special thanks go to the following people (alphabetical
order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian
Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai
Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra,
Vitaly Kuznetsov, and Will Deacon"
* tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits)
x86/entry: Force rcu_irq_enter() when in idle task
x86/entry: Make NMI use IDTENTRY_RAW
x86/entry: Treat BUG/WARN as NMI-like entries
x86/entry: Unbreak __irqentry_text_start/end magic
x86/entry: __always_inline CR2 for noinstr
lockdep: __always_inline more for noinstr
x86/entry: Re-order #DB handler to avoid *SAN instrumentation
x86/entry: __always_inline arch_atomic_* for noinstr
x86/entry: __always_inline irqflags for noinstr
x86/entry: __always_inline debugreg for noinstr
x86/idt: Consolidate idt functionality
x86/idt: Cleanup trap_init()
x86/idt: Use proper constants for table size
x86/idt: Add comments about early #PF handling
x86/idt: Mark init only functions __init
x86/entry: Rename trace_hardirqs_off_prepare()
x86/entry: Clarify irq_{enter,exit}_rcu()
x86/entry: Remove DBn stacks
x86/entry: Remove debug IDT frobbing
x86/entry: Optimize local_db_save() for virt
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull the Kernel Concurrency Sanitizer from Thomas Gleixner:
"The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector,
which relies on compile-time instrumentation, and uses a
watchpoint-based sampling approach to detect races.
The feature was under development for quite some time and has already
found legitimate bugs.
Unfortunately it comes with a limitation, which was only understood
late in the development cycle:
It requires an up to date CLANG-11 compiler
CLANG-11 is not yet released (scheduled for June), but it's the only
compiler today which handles the kernel requirements and especially
the annotations of functions to exclude them from KCSAN
instrumentation correctly.
These annotations really need to work so that low level entry code and
especially int3 text poke handling can be completely isolated.
A detailed discussion of the requirements and compiler issues can be
found here:
https://lore.kernel.org/lkml/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com/
We came to the conclusion that trying to work around compiler
limitations and bugs again would end up in a major trainwreck, so
requiring a working compiler seemed to be the best choice.
For Continous Integration purposes the compiler restriction is
manageable and that's where most xxSAN reports come from.
For a change this limitation might make GCC people actually look at
their bugs. Some issues with CSAN in GCC are 7 years old and one has
been 'fixed' 3 years ago with a half baken solution which 'solved' the
reported issue but not the underlying problem.
The KCSAN developers also ponder to use a GCC plugin to become
independent, but that's not something which will show up in a few
days.
Blocking KCSAN until wide spread compiler support is available is not
a really good alternative because the continuous growth of lockless
optimizations in the kernel demands proper tooling support"
* tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining
compiler.h: Move function attributes to compiler_types.h
compiler.h: Avoid nested statement expression in data_race()
compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE()
kcsan: Update Documentation to change supported compilers
kcsan: Remove 'noinline' from __no_kcsan_or_inline
kcsan: Pass option tsan-instrument-read-before-write to Clang
kcsan: Support distinguishing volatile accesses
kcsan: Restrict supported compilers
kcsan: Avoid inserting __tsan_func_entry/exit if possible
ubsan, kcsan: Don't combine sanitizer with kcov on clang
objtool, kcsan: Add kcsan_disable_current() and kcsan_enable_current_nowarn()
kcsan: Add __kcsan_{enable,disable}_current() variants
checkpatch: Warn about data_race() without comment
kcsan: Use GFP_ATOMIC under spin lock
Improve KCSAN documentation a bit
kcsan: Make reporting aware of KCSAN tests
kcsan: Fix function matching in report
kcsan: Change data_race() to no longer require marking racing accesses
kcsan: Move kcsan_{disable,enable}_current() to kcsan-checks.h
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 updates from Thomas Gleixner:
"A set of fixes and updates for x86:
- Unbreak paravirt VDSO clocks.
While the VDSO code was moved into lib for sharing a subtle check
for the validity of paravirt clocks got replaced. While the
replacement works perfectly fine for bare metal as the update of
the VDSO clock mode is synchronous, it fails for paravirt clocks
because the hypervisor can invalidate them asynchronously.
Bring it back as an optional function so it does not inflict this
on architectures which are free of PV damage.
- Fix the jiffies to jiffies64 mapping on 64bit so it does not
trigger an ODR violation on newer compilers
- Three fixes for the SSBD and *IB* speculation mitigation maze to
ensure consistency, not disabling of some *IB* variants wrongly and
to prevent a rogue cross process shutdown of SSBD. All marked for
stable.
- Add yet more CPU models to the splitlock detection capable list
!@#%$!
- Bring the pr_info() back which tells that TSC deadline timer is
enabled.
- Reboot quirk for MacBook6,1"
* tag 'x86-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Unbreak paravirt VDSO clocks
lib/vdso: Provide sanity check for cycles (again)
clocksource: Remove obsolete ifdef
x86_64: Fix jiffies ODR violation
x86/speculation: PR_SPEC_FORCE_DISABLE enforcement for indirect branches.
x86/speculation: Prevent rogue cross-process SSBD shutdown
x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.
x86/cpu: Add Sapphire Rapids CPU model number
x86/split_lock: Add Icelake microserver and Tigerlake CPU models
x86/apic: Make TSC deadline timer detection message visible
x86/reboot/quirks: Add MacBook6,1 reboot quirk
|
|
Merge the state of the locking kcsan branch before the read/write_once()
and the atomics modifications got merged.
Squash the fallout of the rebase on top of the read/write once and atomic
fallback work into the merge. The history of the original branch is
preserved in tag locking-kcsan-2020-06-02.
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The kbuild test robot reported this warning:
arch/x86/kernel/cpu/mce/dev-mcelog.c: In function 'dev_mcelog_init_device':
arch/x86/kernel/cpu/mce/dev-mcelog.c:346:2: warning: 'strncpy' output \
truncated before terminating nul copying 12 bytes from a string of the \
same length [-Wstringop-truncation]
This is accurate, but I don't care that the trailing NUL character isn't
copied. The string being copied is just a magic number signature so that
crash dump tools can be sure they are decoding the right blob of memory.
Use memcpy() instead of strncpy().
Fixes: d8ecca4043f2 ("x86/mce/dev-mcelog: Dynamically allocate space for machine check records")
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
An interesting thing happened when a guest Linux instance took a machine
check. The VMM unmapped the bad page from guest physical space and
passed the machine check to the guest.
Linux took all the normal actions to offline the page from the process
that was using it. But then guest Linux crashed because it said there
was a second machine check inside the kernel with this stack trace:
do_memory_failure
set_mce_nospec
set_memory_uc
_set_memory_uc
change_page_attr_set_clr
cpa_flush
clflush_cache_range_opt
This was odd, because a CLFLUSH instruction shouldn't raise a machine
check (it isn't consuming the data). Further investigation showed that
the VMM had passed in another machine check because is appeared that the
guest was accessing the bad page.
Fix is to check the scope of the poison by checking the MCi_MISC register.
If the entire page is affected, then unmap the page. If only part of the
page is affected, then mark the page as uncacheable.
This assumes that VMMs will do the logical thing and pass in the "whole
page scope" via the MCi_MISC register (since they unmapped the entire
page).
[ bp: Adjust to x86/entry changes. ]
Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Reported-by: Jue Wang <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Jue Wang <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
to fixup conflicts in arch/x86/kernel/cpu/mce/core.c so MCE specific follow
up patches can be applied without creating a horrible merge conflict
afterwards.
|
|
The typical pattern for trace_hardirqs_off_prepare() is:
ENTRY
lockdep_hardirqs_off(); // because hardware
... do entry magic
instrumentation_begin();
trace_hardirqs_off_prepare();
... do actual work
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare();
instrumentation_end();
... do exit magic
lockdep_hardirqs_on();
which shows that it's named wrong, rename it to
trace_hardirqs_off_finish(), as it concludes the hardirq_off transition.
Also, given that the above is the only correct order, make the traditional
all-in-one trace_hardirqs_off() follow suit.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
This is all unused now.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
#MC is fragile as heck, don't tempt fate.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The last step to remove the irq tracing cruft from ASM. Ignore #DF as the
maschine is going to die anyway.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Convert various hypervisor vectors to IDTENTRY_SYSVEC:
- Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
- Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
- Remove the ASM idtentries in 64-bit
- Remove the BUILD_INTERRUPT entries in 32-bit
- Remove the old prototypes
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Convert various system vectors to IDTENTRY_SYSVEC:
- Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
- Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
- Remove the ASM idtentries in 64-bit
- Remove the BUILD_INTERRUPT entries in 32-bit
- Remove the old prototypes
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Switch all idtentry_enter/exit() users over to the new conditional RCU
handling scheme and make the user mode entries in #DB, #INT3 and #MCE use
the user mode idtentry functions.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Mark the relevant functions noinstr, use the plain non-instrumented MSR
accessors. The only odd part is the instrumentation_begin()/end() pair around the
indirect machine_check_vector() call as objtool can't figure that out. The
possible invoked functions are annotated correctly.
Also use notrace variant of nmi_enter/exit(). If MCEs happen then hardware
latency tracing is the least of the worries.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The MCE entry point uses the same mechanism as the IST entry point for
now. For #DB split the inner workings and just keep the nmi_enter/exit()
magic in the IST variant. Fixup the ASM code to emit the proper
noist_##cfunc call.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Mark all functions in the fragile code parts noinstr or force inlining so
they can't be instrumented.
Also make the hardware latency tracer invocation explicit outside of
non-instrumentable section.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
mce_check_crashing_cpu() is called right at the entry of the MCE
handler. It uses mce_rdmsr() and mce_wrmsr() which are wrappers around
rdmsr() and wrmsr() to handle the MCE error injection mechanism, which is
pointless in this context, i.e. when the MCE hits an offline CPU or the
system is already marked crashing.
The MSR access can also be traced, so use the untraceable variants. This
is also safe vs. XEN paravirt as these MSRs are not affected by XEN PV
modifications.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Convert #MC to IDTENTRY_MCE:
- Implement the C entry points with DEFINE_IDTENTRY_MCE
- Emit the ASM stub with DECLARE_IDTENTRY_MCE
- Remove the ASM idtentry in 64bit
- Remove the open coded ASM entry code in 32bit
- Fixup the XEN/PV code
- Remove the old prototypes
- Remove the error code from *machine_check_vector() as
it is always 0 and not used by any of the functions
it can point to. Fixup all the functions as well.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
There is no reason to have nmi_enter/exit() in the actual MCE
handlers. Move it to the entry point. This also covers the until now
uncovered initial handler which only prints.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Merge even more updates from Andrew Morton:
- a kernel-wide sweep of show_stack()
- pagetable cleanups
- abstract out accesses to mmap_sem - prep for mmap_sem scalability work
- hch's user acess work
Subsystems affected by this patch series: debug, mm/pagemap, mm/maccess,
mm/documentation.
* emailed patches from Andrew Morton <[email protected]>: (93 commits)
include/linux/cache.h: expand documentation over __read_mostly
maccess: return -ERANGE when probe_kernel_read() fails
x86: use non-set_fs based maccess routines
maccess: allow architectures to provide kernel probing directly
maccess: move user access routines together
maccess: always use strict semantics for probe_kernel_read
maccess: remove strncpy_from_unsafe
tracing/kprobes: handle mixed kernel/userspace probes better
bpf: rework the compat kernel probe handling
bpf:bpf_seq_printf(): handle potentially unsafe format string better
bpf: handle the compat string in bpf_trace_copy_string better
bpf: factor out a bpf_trace_copy_string helper
maccess: unify the probe kernel arch hooks
maccess: remove probe_read_common and probe_write_common
maccess: rename strnlen_unsafe_user to strnlen_user_nofault
maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
maccess: update the top of file comment
maccess: clarify kerneldoc comments
maccess: remove duplicate kerneldoc comments
...
|
|
Convert comments that reference mmap_sem to reference mmap_lock instead.
[[email protected]: fix up linux-next leftovers]
[[email protected]: s/lockaphore/lock/, per Vlastimil]
[[email protected]: more linux-next fixups, per Michel]
Signed-off-by: Michel Lespinasse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ying Han <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes. Fix this up with the aid of
the below script and manual adjustments here and there.
import sys
import re
if len(sys.argv) is not 3:
print "USAGE: %s <file> <header>" % (sys.argv[0])
sys.exit(1)
hdr_to_move="#include <linux/%s>" % sys.argv[2]
moved = False
in_hdrs = False
with open(sys.argv[1], "r") as f:
lines = f.readlines()
for _line in lines:
line = _line.rstrip('
')
if line == hdr_to_move:
continue
if line.startswith("#include <linux/"):
in_hdrs = True
elif not moved and in_hdrs:
moved = True
print hdr_to_move
print line
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.
Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 srbds fixes from Thomas Gleixner:
"The 9th episode of the dime novel "The performance killer" with the
subtitle "Slow Randomizing Boosts Denial of Service".
SRBDS is an MDS-like speculative side channel that can leak bits from
the random number generator (RNG) across cores and threads. New
microcode serializes the processor access during the execution of
RDRAND and RDSEED. This ensures that the shared buffer is overwritten
before it is released for reuse. This is equivalent to a full bus
lock, which means that many threads running the RNG instructions in
parallel have the same effect as the same amount of threads issuing a
locked instruction targeting an address which requires locking of two
cachelines at once.
The mitigation support comes with the usual pile of unpleasant
ingredients:
- command line options
- sysfs file
- microcode checks
- a list of vulnerable CPUs identified by model and stepping this
time which requires stepping match support for the cpu match logic.
- the inevitable slowdown of affected CPUs"
* branch 'x86/srbds' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Add Ivy Bridge to affected list
x86/speculation: Add SRBDS vulnerability and mitigation documentation
x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation
x86/cpu: Add 'table' argument to cpu_matches()
|
|
Currently, it is possible to enable indirect branch speculation even after
it was force-disabled using the PR_SPEC_FORCE_DISABLE option. Moreover, the
PR_GET_SPECULATION_CTRL command gives afterwards an incorrect result
(force-disabled when it is in fact enabled). This also is inconsistent
vs. STIBP and the documention which cleary states that
PR_SPEC_FORCE_DISABLE cannot be undone.
Fix this by actually enforcing force-disabled indirect branch
speculation. PR_SPEC_ENABLE called after PR_SPEC_FORCE_DISABLE now fails
with -EPERM as described in the documentation.
Fixes: 9137bb27e60e ("x86/speculation: Add prctl() control for indirect branch speculation")
Signed-off-by: Anthony Steinhauser <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
|
|
When STIBP is unavailable or enhanced IBRS is available, Linux
force-disables the IBPB mitigation of Spectre-BTB even when simultaneous
multithreading is disabled. While attempts to enable IBPB using
prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, ...) fail with
EPERM, the seccomp syscall (or its prctl(PR_SET_SECCOMP, ...) equivalent)
which are used e.g. by Chromium or OpenSSH succeed with no errors but the
application remains silently vulnerable to cross-process Spectre v2 attacks
(classical BTB poisoning). At the same time the SYSFS reporting
(/sys/devices/system/cpu/vulnerabilities/spectre_v2) displays that IBPB is
conditionally enabled when in fact it is unconditionally disabled.
STIBP is useful only when SMT is enabled. When SMT is disabled and STIBP is
unavailable, it makes no sense to force-disable also IBPB, because IBPB
protects against cross-process Spectre-BTB attacks regardless of the SMT
state. At the same time since missing STIBP was only observed on AMD CPUs,
AMD does not recommend using STIBP, but recommends using IBPB, so disabling
IBPB because of missing STIBP goes directly against AMD's advice:
https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf
Similarly, enhanced IBRS is designed to protect cross-core BTB poisoning
and BTB-poisoning attacks from user space against kernel (and
BTB-poisoning attacks from guest against hypervisor), it is not designed
to prevent cross-process (or cross-VM) BTB poisoning between processes (or
VMs) running on the same core. Therefore, even with enhanced IBRS it is
necessary to flush the BTB during context-switches, so there is no reason
to force disable IBPB when enhanced IBRS is available.
Enable the prctl control of IBPB even when STIBP is unavailable or enhanced
IBRS is available.
Fixes: 7cc765a67d8e ("x86/speculation: Enable prctl mode for spectre_v2_user")
Signed-off-by: Anthony Steinhauser <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
"Misc changes:
- Unexport various PAT primitives
- Unexport per-CPU tlbstate and uninline TLB helpers"
* tag 'x86-mm-2020-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/tlb/uv: Add a forward declaration for struct flush_tlb_info
x86/cpu: Export native_write_cr4() only when CONFIG_LKTDM=m
x86/tlb: Restrict access to tlbstate
xen/privcmd: Remove unneeded asm/tlb.h include
x86/tlb: Move PCID helpers where they are used
x86/tlb: Uninline nmi_uaccess_okay()
x86/tlb: Move cr4_set_bits_and_update_boot() to the usage site
x86/tlb: Move paravirt_tlb_remove_table() to the usage site
x86/tlb: Move __flush_tlb_all() out of line
x86/tlb: Move flush_tlb_others() out of line
x86/tlb: Move __flush_tlb_one_kernel() out of line
x86/tlb: Move __flush_tlb_one_user() out of line
x86/tlb: Move __flush_tlb_global() out of line
x86/tlb: Move __flush_tlb() out of line
x86/alternatives: Move temporary_mm helpers into C
x86/cr4: Sanitize CR4.PCE update
x86/cpu: Uninline CR4 accessors
x86/tlb: Uninline __get_current_cr3_fast()
x86/mm: Use pgprotval_t in protval_4k_2_large() and protval_large_2_4k()
x86/mm: Unexport __cachemode2pte_tbl
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
"Misc updates:
- Extend the x86 family/model macros with a steppings dimension,
because x86 life isn't complex enough and Intel uses steppings to
differentiate between different CPUs. :-/
- Convert the TSC deadline timer quirks to the steppings macros.
- Clean up asm mnemonics.
- Fix the handling of an AMD erratum, or in other words, fix a kernel
erratum"
* tag 'x86-cpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Use RDRAND and RDSEED mnemonics in archrandom.h
x86/cpu: Use INVPCID mnemonic in invpcid.h
x86/cpu/amd: Make erratum #1054 a legacy erratum
x86/apic: Convert the TSC deadline timer matching to steppings macro
x86/cpu: Add a X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro
x86/cpu: Add a steppings field to struct x86_cpu_id
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Add AMD Fam17h RAPL support
- Introduce CAP_PERFMON to kernel and user space
- Add Zhaoxin CPU support
- Misc fixes and cleanups
Tooling changes:
- perf record:
Introduce '--switch-output-event' to use arbitrary events to be
setup and read from a side band thread and, when they take place a
signal be sent to the main 'perf record' thread, reusing the core
for '--switch-output' to take perf.data snapshots from the ring
buffer used for '--overwrite', e.g.:
# perf record --overwrite -e sched:* \
--switch-output-event syscalls:*connect* \
workload
will take perf.data.YYYYMMDDHHMMSS snapshots up to around the
connect syscalls.
Add '--num-synthesize-threads' option to control degree of
parallelism of the synthesize_mmap() code which is scanning
/proc/PID/task/PID/maps and can be time consuming. This mimics
pre-existing behaviour in 'perf top'.
- perf bench:
Add a multi-threaded synthesize benchmark and kallsyms parsing
benchmark.
- Intel PT support:
Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.
Allow using Intel PT to synthesize callchains for regular events.
Add support for synthesizing branch stacks for regular events
(cycles, instructions, etc) from Intel PT data.
Misc changes:
- Updated perf vendor events for power9 and Coresight.
- Add flamegraph.py script via 'perf flamegraph'
- Misc other changes, fixes and cleanups - see the Git log for details
Also, since over the last couple of years perf tooling has matured and
decoupled from the kernel perf changes to a large degree, going
forward Arnaldo is going to send perf tooling changes via direct pull
requests"
* tag 'perf-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (163 commits)
perf/x86/rapl: Add AMD Fam17h RAPL support
perf/x86/rapl: Make perf_probe_msr() more robust and flexible
perf/x86/rapl: Flip logic on default events visibility
perf/x86/rapl: Refactor to share the RAPL code between Intel and AMD CPUs
perf/x86/rapl: Move RAPL support to common x86 code
perf/core: Replace zero-length array with flexible-array
perf/x86: Replace zero-length array with flexible-array
perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel Tremont
perf/x86/rapl: Add Ice Lake RAPL support
perf flamegraph: Use /bin/bash for report and record scripts
perf cs-etm: Move definition of 'traceid_list' global variable from header file
libsymbols kallsyms: Move hex2u64 out of header
libsymbols kallsyms: Parse using io api
perf bench: Add kallsyms parsing
perf: cs-etm: Update to build with latest opencsd version.
perf symbol: Fix kernel symbol address display
perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The RCU updates for this cycle were:
- RCU-tasks update, including addition of RCU Tasks Trace for BPF use
and TASKS_RUDE_RCU
- kfree_rcu() updates.
- Remove scheduler locking restriction
- RCU CPU stall warning updates.
- Torture-test updates.
- Miscellaneous fixes and other updates"
* tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
rcu: Allow for smp_call_function() running callbacks from idle
rcu: Provide rcu_irq_exit_check_preempt()
rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
rcu: Provide __rcu_is_watching()
rcu: Provide rcu_irq_exit_preempt()
rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
rcu/tree: Mark the idle relevant functions noinstr
x86: Replace ist_enter() with nmi_enter()
x86/mce: Send #MC singal from task work
x86/entry: Get rid of ist_begin/end_non_atomic()
sched,rcu,tracing: Avoid tracing before in_nmi() is correct
sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
lockdep: Always inline lockdep_{off,on}()
hardirq/nmi: Allow nested nmi_enter()
arm64: Prepare arch_nmi_enter() for recursion
printk: Disallow instrumenting print_nmi_enter()
printk: Prepare for nested printk_nmi_enter()
rcutorture: Convert ULONG_CMP_LT() to time_before()
torture: Add a --kasan argument
torture: Save a few lines by using config_override_param initially
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource control updates from Borislav Petkov:
"Add support for wider Memory Bandwidth Monitoring counters by querying
their width from CPUID.
As a prerequsite for that, streamline and unify the CPUID detection of
the respective resource control attributes.
By Reinette Chatre"
* tag 'x86_cache_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Support wider MBM counters
x86/resctrl: Support CPUID enumeration of MBM counter width
x86/resctrl: Maintain MBM counter width per resource
x86/resctrl: Query LLC monitoring properties once during boot
x86/resctrl: Remove unnecessary RMID checks
x86/cpu: Move resctrl CPUID code to resctrl/
x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode update from Borislav Petkov:
"A single fix for late microcode loading to handle the correct return
value from stop_machine(), from Mihai Carabas"
* tag 'x86_microcode_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Fix return value for microcode late loading
|
|
Icelake microserver CPU supports split lock detection while it doesn't
have the split lock enumeration bit in IA32_CORE_CAPABILITIES. Tigerlake
CPUs do enumerate the MSR.
[ bp: Merge the two model-adding patches into one. ]
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
A few exceptions (like #DB and #BP) can happen at any location in the code,
this then means that tracers should treat events from these exceptions as
NMI-like. The interrupted context could be holding locks with interrupts
disabled for instance.
Similarly, #MC is an actual NMI-like exception.
All of them use ist_enter() which only concerns itself with RCU, but does
not do any of the other setup that NMIs need. This means things like:
printk()
raw_spin_lock_irq(&logbuf_lock);
<#DB/#BP/#MC>
printk()
raw_spin_lock_irq(&logbuf_lock);
are entirely possible (well, not really since printk tries hard to
play nice, but the concept stands).
So replace ist_enter() with nmi_enter(). Also observe that any nmi_enter()
caller must be both notrace and NOKPROBE, or in the noinstr text section.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Convert #MC over to using task_work_add(); it will run the same code
slightly later, on the return to user path of the same exception.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
This is completely overengineered and definitely not an interface which
should be made available to anything else than this particular MCE case.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Commit
21b5ee59ef18 ("x86/cpu/amd: Enable the fixed Instructions Retired
counter IRPERF")
mistakenly added erratum #1054 as an OS Visible Workaround (OSVW) ID 0.
Erratum #1054 is not OSVW ID 0 [1], so make it a legacy erratum.
There would never have been a false positive on older hardware that
has OSVW bit 0 set, since the IRPERF feature was not available.
However, save a couple of RDMSR executions per thread, on modern
system configurations that correctly set non-zero values in their
OSVW_ID_Length MSRs.
[1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors. The
revision guide is available from the bugzilla link below.
Fixes: 21b5ee59ef18 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF")
Reported-by: Andrew Cooper <[email protected]>
Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
|
|
The original Memory Bandwidth Monitoring (MBM) architectural
definition defines counters of up to 62 bits in the
IA32_QM_CTR MSR while the first-generation MBM implementation
uses statically defined 24 bit counters.
The MBM CPUID enumeration properties have been expanded to include
the MBM counter width, encoded as an offset from 24 bits.
While eight bits are available for the counter width offset IA32_QM_CTR
MSR only supports 62 bit counters. Add a sanity check, with warning
printed when encountered, to ensure counters cannot exceed the 62 bit
limit.
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/69d52abd5b14794d3a0f05ba7c755ed1f4c0d5ed.1588715690.git.reinette.chatre@intel.com
|
|
The original Memory Bandwidth Monitoring (MBM) architectural
definition defines counters of up to 62 bits in the
IA32_QM_CTR MSR while the first-generation MBM implementation
uses statically defined 24 bit counters.
Expand the MBM CPUID enumeration properties to include the MBM
counter width. The previously undefined EAX output register contains,
in bits [7:0], the MBM counter width encoded as an offset from
24 bits. Enumerating this property is only specified for Intel
CPUs.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
|
|
The original Memory Bandwidth Monitoring (MBM) architectural
definition defines counters of up to 62 bits in the IA32_QM_CTR MSR,
and the first-generation MBM implementation uses 24 bit counters.
Software is required to poll at 1 second or faster to ensure that
data is retrieved before a counter rollover occurs more than once
under worst conditions.
As system bandwidths scale the software requirement is maintained with
the introduction of a per-resource enumerable MBM counter width.
In preparation for supporting hardware with an enumerable MBM counter
width the current globally static MBM counter width is moved to a
per-resource MBM counter width. Currently initialized to 24 always
to result in no functional change.
In essence there is one function, mbm_overflow_count() that needs to
know the counter width to handle rollovers. The static value
used within mbm_overflow_count() will be replaced with a value
discovered from the hardware. Support for learning the MBM counter
width from hardware is added in the change that follows.
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/e36743b9800f16ce600f86b89127391f61261f23.1588715690.git.reinette.chatre@intel.com
|
|
Cache and memory bandwidth monitoring are features that are part of
x86 CPU resource control that is supported by the resctrl subsystem.
The monitoring properties are obtained via CPUID from every CPU
and only used within the resctrl subsystem where the properties are
only read from boot_cpu_data.
Obtain the monitoring properties once, placed in boot_cpu_data, via the
->c_bsp_init() helpers of the vendors that support X86_FEATURE_CQM_LLC.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/6d74a6ac3e69f4b7a8b4115835f9455faf0f468d.1588715690.git.reinette.chatre@intel.com
|
|
The cache and memory bandwidth monitoring properties are read using
CPUID on every CPU. After the information is read from the system a
sanity check is run to
(1) ensure that the RMID data is initialized for the boot CPU in case
the information was not available on the boot CPU and
(2) the boot CPU's RMID is set to the minimum of RMID obtained
from all CPUs.
Every known platform that supports resctrl has the same maximum RMID
on all CPUs. Both sanity checks found in x86_init_cache_qos() can thus
safely be removed.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/c9a3b60d34091840c8b0bd1c6fab15e5ba92cb17.1588715690.git.reinette.chatre@intel.com
|
|
The function determining a platform's support and properties of cache
occupancy and memory bandwidth monitoring (properties of
X86_FEATURE_CQM_LLC) can be found among the common CPU code. After
the feature's properties is populated in the per-CPU data the resctrl
subsystem is the only consumer (via boot_cpu_data).
Move the function that obtains the CPU information used by resctrl to
the resctrl subsystem and rename it from init_cqm() to
resctrl_cpu_detect(). The function continues to be called from the
common CPU code. This move is done in preparation of the addition of some
vendor specific code.
No functional change.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/38433b99f9d16c8f4ee796f8cc42b871531fa203.1588715690.git.reinette.chatre@intel.com
|
|
asm/resctrl_sched.h is dedicated to the code used for configuration
of the CPU resource control state when a task is scheduled.
Rename resctrl_sched.h to resctrl.h in preparation of additions that
will no longer make this file dedicated to work done during scheduling.
No functional change.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/6914e0ef880b539a82a6d889f9423496d471ad1d.1588715690.git.reinette.chatre@intel.com
|
|
A 32-bit version of mcelog issuing ioctls on /dev/mcelog causes errors
like the following:
MCE_GET_RECORD_LEN: Inappropriate ioctl for device
This is due to a missing compat_ioctl callback.
Assign to it compat_ptr_ioctl() as a generic implementation of the
.compat_ioctl file operation to ioctl functions that either ignore the
argument or pass a pointer to a compatible data type.
[ bp: Massage commit message. ]
Signed-off-by: He Zhe <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Zhaoxin CPU has provided facilities for monitoring performance
via PMU (Performance Monitor Unit), but the functionality is unused so far.
Therefore, add support for zhaoxin pmu to make performance related
hardware events available.
The PMU is mostly an Intel Architectural PerfMon-v2 with a novel
errata for the ZXC line. It supports the following events:
-----------------------------------------------------------------------------------------------------------------------------------
Event | Event | Umask | Description
| Select | |
-----------------------------------------------------------------------------------------------------------------------------------
cpu-cycles | 82h | 00h | unhalt core clock
instructions | 00h | 00h | number of instructions at retirement.
cache-references | 15h | 05h | number of fillq pushs at the current cycle.
cache-misses | 1ah | 05h | number of l2 miss pushed by fillq.
branch-instructions | 28h | 00h | counts the number of branch instructions retired.
branch-misses | 29h | 00h | mispredicted branch instructions at retirement.
bus-cycles | 83h | 00h | unhalt bus clock
stalled-cycles-frontend | 01h | 01h | Increments each cycle the # of Uops issued by the RAT to RS.
stalled-cycles-backend | 0fh | 04h | RS0/1/2/3/45 empty
L1-dcache-loads | 68h | 05h | number of retire/commit load.
L1-dcache-load-misses | 4bh | 05h | retired load uops whose data source followed an L1 miss.
L1-dcache-stores | 69h | 06h | number of retire/commit Store,no LEA
L1-dcache-store-misses | 62h | 05h | cache lines in M state evicted out of L1D due to Snoop HitM or dirty line replacement.
L1-icache-loads | 00h | 03h | number of l1i cache access for valid normal fetch,including un-cacheable access.
L1-icache-load-misses | 01h | 03h | number of l1i cache miss for valid normal fetch,including un-cacheable miss.
L1-icache-prefetches | 0ah | 03h | number of prefetch.
L1-icache-prefetch-misses | 0bh | 03h | number of prefetch miss.
dTLB-loads | 68h | 05h | number of retire/commit load
dTLB-load-misses | 2ch | 05h | number of load operations miss all level tlbs and cause a tablewalk.
dTLB-stores | 69h | 06h | number of retire/commit Store,no LEA
dTLB-store-misses | 30h | 05h | number of store operations miss all level tlbs and cause a tablewalk.
dTLB-prefetches | 64h | 05h | number of hardware pte prefetch requests dispatched out of the prefetch FIFO.
dTLB-prefetch-misses | 65h | 05h | number of hardware pte prefetch requests miss the l1d data cache.
iTLB-load | 00h | 00h | actually counter instructions.
iTLB-load-misses | 34h | 05h | number of code operations miss all level tlbs and cause a tablewalk.
-----------------------------------------------------------------------------------------------------------------------------------
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: CodyYao-oc <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Modules have no business poking into this but fixing this is for later.
[ bp: Carve out from an earlier patch. ]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.
As a first step, move __flush_tlb() out of line and hide the native
function. The latter can be static when CONFIG_PARAVIRT is disabled.
Consolidate the namespace while at it and remove the pointless extra
wrapper in the paravirt code.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.
The various CR4 accessors require cpu_tlbstate as the CR4 shadow cache
is located there.
In preparation for unexporting cpu_tlbstate, create a builtin function
for manipulating CR4 and rework the various helpers to use it.
No functional change.
[ bp: push the export of native_write_cr4() only when CONFIG_LKTDM=m to
the last patch in the series. ]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|