|
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a bug in the implementation of the x86 accelerated version of
poly1305"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/poly1305 - fix overflow during partial reduction
|
|
Family 17h differs from prior families in the following ways:
- It does not support an L2 cache miss event.
- It has re-enumerated PMC counters for:
  - L2 cache references
  - front-end and back-end stalled cycles
So add a new amd_f17h_perfmon_event_map[] so that the generic
perf event names resolve to the correct hardware events on
family 17h and later processors.
Reference sections 2.1.13.3.3 (stalls) and 2.1.13.3.6 (L2):
https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
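For illustration, a minimal sketch of such a map. The PERF_COUNT_HW_* keys are the generic perf events; the event encodings below are placeholders for the family-17h values documented in the PPR sections above and may not match the in-tree table:

  static u64 amd_f17h_perfmon_event_map[PERF_COUNT_HW_MAX] =
  {
          [PERF_COUNT_HW_CPU_CYCLES]              = 0x0076,
          [PERF_COUNT_HW_INSTRUCTIONS]            = 0x00c0,
          /* re-enumerated L2 cache reference event */
          [PERF_COUNT_HW_CACHE_REFERENCES]        = 0xff60,
          /* no L2 cache miss event on family 17h, so no CACHE_MISSES entry */
          [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]     = 0x00c2,
          [PERF_COUNT_HW_BRANCH_MISSES]           = 0x00c3,
          /* re-enumerated front-end/back-end stall counters */
          [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x0287,
          [PERF_COUNT_HW_STALLED_CYCLES_BACKEND]  = 0x0187,
  };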
Signed-off-by: Kim Phillips <[email protected]>
Cc: <[email protected]> # v4.9+
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Janakarajan Natarajan <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Martin Liška <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors")
[ Improved the formatting a bit. ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add MDS to the new 'mitigations=' cmdline option.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
the maximum amount of system RAM supported. The size of the direct
mapping section is obtained from the smaller of the following two
values:
(actual system RAM size + padding size) vs (max system RAM size supported)
This calculation is wrong since commit
b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").
That commit changed __PHYSICAL_MASK_SHIFT to always be 52, regardless of
whether the kernel uses 4-level or 5-level page tables. As a result, the
calculation always uses 4 PB as the maximum amount of system RAM, even in
4-level paging mode where it should actually be 64 TB.
Thus, the size of the direct mapping section is always the sum of the
actual system RAM size plus the padding size.
Even when the amount of system RAM is 64 TB, the following layout will
still be used, which obviously weakens KASLR significantly:
|____|_______actual RAM_______|_padding_|______the rest_______|
0 64TB ~120TB
Instead, it should be like this:
|____|_______actual RAM_______|_________the rest______________|
0 64TB ~120TB
The size of the padding region is controlled by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.
The above issue only exists when
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.
Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
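A simplified sketch of the capping logic, not the in-kernel code: with 4-level paging MAX_PHYSMEM_BITS is 46 (64 TB), while __PHYSICAL_MASK_SHIFT is now always 52 (4 PB), so only the former can serve as the cap:

  /* TB_SHIFT is 40, so 1UL << (bits - TB_SHIFT) is the limit in TB. */
  static unsigned long direct_map_size_tb(unsigned long actual_ram_tb,
                                          unsigned long padding_tb)
  {
          unsigned long max_tb = 1UL << (MAX_PHYSMEM_BITS - TB_SHIFT);
          unsigned long needed_tb = actual_ram_tb + padding_tb;

          /*
           * Using __PHYSICAL_MASK_SHIFT here would make max_tb 4096 TB
           * (4 PB), so the comparison below would never cap the region
           * on 4-level kernels.
           */
          return needed_tb < max_tb ? needed_tb : max_tb;
  }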
[ bp: Massage commit message. ]
Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: Baoquan He <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Thomas Garnier <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: x86-ml <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
|
|
$(call if_changed,...) must have FORCE as a prerequisite.
Signed-off-by: Masahiro Yamada <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Currently, when a new resource group is created, the allocation values
of the MBA resource are not initialized and contain meaningless stale data.
For example:
mkdir /sys/fs/resctrl/p1
cat /sys/fs/resctrl/p1/schemata
MB:0=100;1=100
echo "MB:0=10;1=20" > /sys/fs/resctrl/p1/schemata
cat /sys/fs/resctrl/p1/schemata
MB:0= 10;1= 20
rmdir /sys/fs/resctrl/p1
mkdir /sys/fs/resctrl/p2
cat /sys/fs/resctrl/p2/schemata
MB:0= 10;1= 20
Therefore, when a new group is created, initialize the MBA resource with
default values.
Initialize the MBA and cache resources in separate functions.
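A hedged sketch of the kind of helper this introduces; member names such as new_ctrl/have_new_ctrl are assumptions and the in-tree function differs in detail:

  static void rdtgroup_init_mba(struct rdt_resource *r)
  {
          struct rdt_domain *d;

          /*
           * Reset every domain of the MB resource to its default value,
           * e.g. 100% memory bandwidth, before a schemata write is applied.
           */
          list_for_each_entry(d, &r->domains, list) {
                  d->new_ctrl = r->default_ctrl;
                  d->have_new_ctrl = true;
          }
  }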
[ bp: Add newlines between code blocks for better readability. ]
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Carve out per rdt_domain initialization code from rdtgroup_init_alloc()
into a separate function.
No functional change; this makes the code more readable and saves at least
two indentation levels.
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Reinette Chatre <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Merge it to pick up dependent urgent changes before applying more
resctrl stuff.
Signed-off-by: Borislav Petkov <[email protected]>
|
|
In pcibios_irq_init(), the PCI IRQ routing table 'pirq_table' is first
found through pirq_find_routing_table(). If the table is not found and
CONFIG_PCI_BIOS is defined, the table is then allocated in
pcibios_get_irq_routing_table() using kmalloc(). Later, if the I/O APIC is
in use, this table is not actually needed; in that case, the allocated table
is not freed, which is a memory leak.
Free the allocated table if it is not used.
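A heavily simplified sketch of the idea behind the fix; the real pcibios_irq_init() does much more in between, and the local variable name follows the note below:

  struct irq_routing_table *irq_routing_table = NULL;

  pirq_table = pirq_find_routing_table();
  #ifdef CONFIG_PCI_BIOS
  if (!pirq_table) {
          irq_routing_table = pcibios_get_irq_routing_table(); /* kmalloc()ed */
          pirq_table = irq_routing_table;
  }
  #endif

  if (io_apic_assign_pci_irqs) {
          kfree(irq_routing_table);       /* table is unused with the I/O APIC */
          pirq_table = NULL;
  }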
Signed-off-by: Wenwen Wang <[email protected]>
[bhelgaas: added Ingo's reviewed-by, since the only change since v1 was to
use the irq_routing_table local variable name he suggested]
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
Pull in the command line updates from the tip tree so the MDS parts can be
added.
|
|
Configure x86 runtime CPU speculation bug mitigations in accordance with
the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2,
Speculative Store Bypass, and L1TF.
The default behavior is unchanged.
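As an illustration of the plumbing, a bug-specific command line parser can treat the generic switch as another way to request "off". This is a simplified sketch; the real selection logic handles more cases:

  static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
  {
          /*
           * Either the bug-specific knob or the generic mitigations=off
           * switch disables the mitigation; otherwise keep auto selection.
           */
          if (cmdline_find_option_bool(boot_command_line, "nospectre_v2") ||
              cpu_mitigations_off())
                  return SPECTRE_V2_CMD_NONE;

          return SPECTRE_V2_CMD_AUTO;
  }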
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Jiri Kosina <[email protected]> (on x86)
Reviewed-by: Jiri Kosina <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Jon Masters <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: [email protected]
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: [email protected]
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Phil Auld <[email protected]>
Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoimboe@redhat.com
|
|
This code is only for CPUs which are affected by MSBDS, but are *not*
affected by the other two MDS issues.
For such CPUs, enabling the mds_idle_clear mitigation is enough to
mitigate SMT.
However, if the user boots with 'mds=off' and still has SMT enabled, we should
not report that SMT is mitigated:
$ cat /sys/devices/system/cpu/vulnerabilities/mds
Vulnerable; SMT mitigated
But rather:
Vulnerable; SMT vulnerable
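A hedged sketch of the corresponding sysfs reporting logic; names like mds_strings are assumptions and the in-tree function differs in detail:

  static ssize_t mds_show_state(char *buf)
  {
          /* MSBDS-only parts: SMT is only mitigated if mds=off was not given. */
          if (boot_cpu_has(X86_FEATURE_MSBDS_ONLY)) {
                  return sprintf(buf, "%s; SMT %s\n",
                                 mds_strings[mds_mitigation],
                                 (mds_mitigation == MDS_MITIGATION_OFF ?
                                  "vulnerable" :
                                  sched_smt_active() ? "mitigated" : "disabled"));
          }

          /* Fully affected parts: SMT itself remains vulnerable. */
          return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
                         sched_smt_active() ? "vulnerable" : "disabled");
  }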
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
s/L1TF/MDS/
Signed-off-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
|
|
All stack types on 64-bit x86 have guard pages now, so there is no point in
executing probabilistic overflow checks: the guard pages provide accurate and
reliable overflow prevention.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The IRQ stack lives in percpu space, so an IRQ handler that overflows it
will overwrite other data structures.
Use vmap() to remap the IRQ stack so that it will have the usual guard
pages that vmap()/vmalloc() allocations have. With this, the kernel will
panic immediately on an IRQ stack overflow.
[ tglx: Move the map code to a proper place and invoke it only when a CPU
is about to be brought online. No point in installing the map at
early boot for all possible CPUs. Fail the CPU bringup if the vmap()
fails as done for all other preparatory stages in CPU hotplug. ]
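A hedged sketch of what remapping a per-CPU backing store through vmap() looks like; irq_stack_backing_store and hardirq_stack_ptr are assumed names:

  static int map_irq_stack(unsigned int cpu)
  {
          char *stack = (char *)per_cpu_ptr(&irq_stack_backing_store, cpu);
          struct page *pages[IRQ_STACK_SIZE / PAGE_SIZE];
          void *va;
          int i;

          for (i = 0; i < IRQ_STACK_SIZE / PAGE_SIZE; i++) {
                  phys_addr_t pa = per_cpu_ptr_to_phys(stack + (i << PAGE_SHIFT));

                  pages[i] = pfn_to_page(pa >> PAGE_SHIFT);
          }

          /* The vmap()ed alias gets the usual vmalloc guard page behind it. */
          va = vmap(pages, IRQ_STACK_SIZE / PAGE_SIZE, VM_MAP, PAGE_KERNEL);
          if (!va)
                  return -ENOMEM; /* fail CPU bringup instead of crashing later */

          per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
          return 0;
  }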
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Currently, the IRQ stack is hardcoded as the first page of the percpu
area, and the stack canary lives on the IRQ stack. The former gets in
the way of adding an IRQ stack guard page, and the latter is a potential
weakness in the stack canary mechanism.
Split the IRQ stack into its own private percpu pages.
[ tglx: Make 64 and 32 bit share struct irq_stack ]
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: "Chang S. Bae" <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jordan Borgner <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Maran Wilson <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: "Rafael Ávila de Espíndola" <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: x86-ml <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Preparatory change for disentangling the irq stack union as a
prerequisite for irq stacks with guard pages.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "Chang S. Bae" <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Cc: Yi Wang <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
irq_ctx_init() crashes hard on page allocation failures. While that's ok
during early boot, it's just wrong in the CPU hotplug bringup code.
Check for page allocation failure, return -ENOMEM and handle it at the
call sites. During early boot the only way out is to BUG(), but on CPU hotplug
there is no reason to crash, so just abort the operation.
Rename the function to something more sensible while at it.
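A minimal sketch of the renamed 32-bit allocation path, assuming a name like irq_init_percpu_irqstack() and per-CPU stack pointer variables (the in-tree details differ):

  int irq_init_percpu_irqstack(unsigned int cpu)
  {
          int node = cpu_to_node(cpu);
          struct page *ph, *ps;

          if (per_cpu(hardirq_stack_ptr, cpu))
                  return 0;       /* already set up, e.g. the boot CPU */

          ph = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
          if (!ph)
                  return -ENOMEM;
          ps = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
          if (!ps) {
                  __free_pages(ph, THREAD_SIZE_ORDER);
                  return -ENOMEM;
          }

          per_cpu(hardirq_stack_ptr, cpu) = page_address(ph);
          per_cpu(softirq_stack_ptr, cpu) = page_address(ps);
          return 0;
  }

Early boot can then BUG() on a non-zero return, while the CPU hotplug prepare stage simply propagates the error and aborts the bringup.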
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Alison Schofield <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Shaokun Zhang <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: x86-ml <[email protected]>
Cc: [email protected]
Cc: Yazen Ghannam <[email protected]>
Cc: Yi Wang <[email protected]>
Cc: Zhenzhong Duan <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
irq_ctx_init() is invoked from native_init_IRQ() or from xen_init_IRQ()
code. There is no reason to have this split. The interrupt stacks must be
allocated no matter what.
Invoke it from init_IRQ() before invoking the native or XEN init
implementation.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Abraham <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: x86-ml <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Preparatory patch to share code with 32-bit.
No functional changes.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "Chang S. Bae" <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The percpu storage holds a pointer to the stack, not the stack
itself. Rename it before sharing struct irq_stack with 64-bit.
No functional changes.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
There is no reason to have a u32 array in struct irq_stack. The only
purpose of the array is to size the struct properly.
Preparatory change for sharing struct irq_stack with 64-bit.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
On 32-bit IRQ_STACK_SIZE is the same as THREAD_SIZE.
To allow sharing struct irq_stack with 32-bit, define IRQ_STACK_SIZE for
32-bit and use it for struct irq_stack.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The current implementation of in_exception_stack() iterates over the
exception stacks array. Most of the time this is a useless exercise, but
even for the actual use cases (perf and ftrace) it takes at least two
iterations to get to the NMI stack.
As the exception stacks and the guard pages are page aligned, the loop can
be avoided completely.
Add an initial check whether the stack pointer is inside the full exception
stack area and leave early if not.
Create a lookup table which describes the stack area. The table index is
the page offset from the beginning of the exception stacks. So for any
given stack pointer the page offset is computed and a lookup in the
description table is performed. If it is inside a guard page, return. If
not, use the descriptor to fill in the info structure.
The table is filled at compile time and for the !KASAN case the interesting
page descriptors exactly fit into a single cache line. Just the last guard
page descriptor is in the next cacheline, but that should not be accessed
in the regular case.
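A simplified sketch of the lookup-table idea; the descriptor layout and names are illustrative, not the in-tree ones:

  struct estack_page_desc {
          u32 offs;       /* offset of the owning stack from the area start */
          u16 size;       /* size of that stack, 0 for a guard page         */
          u16 type;       /* STACK_TYPE_* value to report                   */
  };

  static bool sp_on_exception_stack(unsigned long sp, unsigned long area_begin,
                                    unsigned long area_size,
                                    const struct estack_page_desc *table,
                                    struct stack_info *info)
  {
          const struct estack_page_desc *ep;

          /* Early exit: not inside the exception stack area at all. */
          if (sp < area_begin || sp >= area_begin + area_size)
                  return false;

          /* One descriptor per page, indexed by the page offset. */
          ep = &table[(sp - area_begin) >> PAGE_SHIFT];
          if (!ep->size)
                  return false;   /* guard page */

          info->type  = ep->type;
          info->begin = (unsigned long *)(area_begin + ep->offs);
          info->end   = (unsigned long *)(area_begin + ep->offs + ep->size);
          return true;
  }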
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The debug IST stack is actually two separate debug stacks to handle #DB
recursion. This is required because the CPU always starts at the top of the
stack on exception entry, which means that on #DB recursion the second #DB
would overwrite the stack of the first.
The low level entry code therefore adjusts the top of stack on entry so a
secondary #DB starts from a different stack page. But the stack pages are
adjacent without a guard page between them.
Split the debug stack into 3 stacks which are separated by guard pages. The
3rd stack is never mapped into the cpu_entry_area and is only there to
catch triple #DB nesting:
--- top of DB_stack <- Initial stack
--- end of DB_stack
guard page
--- top of DB1_stack <- Top of stack after entering first #DB
--- end of DB1_stack
guard page
--- top of DB2_stack <- Top of stack after entering second #DB
--- end of DB2_stack
guard page
If DB2 did not act as the final guard hole, a second #DB would point the
top of the #DB stack to the stack below DB1, which would be valid and would
not catch the undesired triple nesting.
The backing store does not allocate any memory for DB2 and its guard page
as it is not going to be mapped into the cpu_entry_area.
- Adjust the low level entry code so it adjusts top of #DB with the offset
between the stacks instead of exception stack size.
- Make the dumpstack code aware of the new stacks.
- Adjust the in_debug_stack() implementation and move it into the NMI code
where it belongs. As this is NMI hotpath code, it just checks the full
area between the top of DB_stack and the bottom of DB1_stack without checking
for the guard page. That's correct because the NMI cannot hit a
stack pointer pointing to the guard page between the DB and DB1 stacks. Even
if it did, the NMI operation would still be unaffected, but the resume
of the debug exception on the topmost DB stack would crash by touching
the guard page.
[ bp: Make exception_stack_names static const char * const ]
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: "Chang S. Bae" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: [email protected]
Cc: Masahiro Yamada <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
All usage sites which expected that the exception stacks in the CPU entry
area are mapped linearly are fixed up. Enable guard pages between the
IST stacks.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The entry order of the TSS.IST array and the order of the stack
storage/mapping are not required to be the same.
With the upcoming split of the debug stack this is going to fall apart as
the number of TSS.IST array entries stays the same while the actual stacks
are increasing.
Make them separate so that code like dumpstack can just utilize the mapping
order. The IST index is solely required for the actual TSS.IST array
initialization.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: "Chang S. Bae" <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Dou Liyang <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
All users gone.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "Chang S. Bae" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Convert the TSS.IST setup code to use the cpu entry area information
directly instead of assuming a linear mapping of the IST stacks.
The store to orig_ist[] is no longer required as there are no users
anymore.
This is the last preparatory step towards IST guard pages.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "Chang S. Bae" <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.
There is no longer any point in doing so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.
No functional change. Preparation for removing orig_ist.
Cc: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.
There is no longer any point in doing so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.
No functional change. Preparation for removing orig_ist.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.
There is no longer any point in doing so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.
No functional change. Preparation for removing orig_ist.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Store a pointer to the per cpu entry area exception stack mappings to allow
fast retrieval.
Required for converting various places from using the shadow IST array to
directly doing address calculations on the actual mapping address.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
To allow guard pages between the IST stacks each stack needs to be
mapped individually.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
At the moment everything assumes a full linear mapping of the various
exception stacks. Adding guard pages to the cpu entry area mapping of the
exception stacks will break that assumption.
As a preparatory step, convert both the real storage and the effective
mapping in the cpu entry area from character arrays to structures.
To ensure that both structures have the same ordering and the same
individual stack sizes, fill in the members with a macro. The guard size is
the only difference between the two resulting structures. For now, both have
a guard size of 0 until the preparation of all usage sites is done.
Provide a couple of helper macros which are used in the following
conversions.
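A hedged sketch of the macro trick; member and macro names are illustrative:

  #define ESTACK_MEMBERS(guard_size)                      \
          char    DF_stack_guard[guard_size];             \
          char    DF_stack[EXCEPTION_STKSZ];              \
          char    NMI_stack_guard[guard_size];            \
          char    NMI_stack[EXCEPTION_STKSZ];             \
          char    DB_stack_guard[guard_size];             \
          char    DB_stack[EXCEPTION_STKSZ];              \
          char    MCE_stack_guard[guard_size];            \
          char    MCE_stack[EXCEPTION_STKSZ]

  /* The real backing store: guard size 0 for now. */
  struct exception_stacks {
          ESTACK_MEMBERS(0);
  };

  /* The effective cpu_entry_area mapping: also guard size 0 for now. */
  struct cea_exception_stacks {
          ESTACK_MEMBERS(0);
  };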
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "Chang S. Bae" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
No point in retrieving the entry area pointer over and over. Do it once
and use unsigned int for 'cpu' everywhere.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The defines for the exception stack (IST) array in the TSS use the
SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
array indices related to IST. That's confusing at best and does not provide
any value.
Make the indices zero-based and fix up the usage sites. The only code which
needs to adjust the zero-based index is the interrupt descriptor setup, which
needs to add 1 now.
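For illustration (names assumed), the indices become plain array indices and only the IDT setup re-adds the architectural offset:

  /* Zero-based software indices into the TSS.IST array. */
  enum {
          IST_INDEX_DF = 0,
          IST_INDEX_NMI,
          IST_INDEX_DB,
          IST_INDEX_MCE,
  };

  /* The IDT descriptor field is 1-based (IST1..IST7), so add 1 only here. */
  #define IST_TO_IDT(idx)        ((idx) + 1)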
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: "Chang S. Bae" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Dou Liyang <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: [email protected]
Cc: Nicolai Stange <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Nothing requires those for 32-bit builds.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Nothing uses that and before people get the wrong ideas, get rid of it.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Commit
d8ba61ba58c8 ("x86/entry/64: Don't use IST entry for #BP stack")
removed the last user but left the macro around.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Dou Liyang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
On x86, stacks grow from top to bottom, but the stack overflow check works
the other way round, which is just confusing. Clean it up and sanitize
the warning string a bit.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
stack_overflow_check() is using both irq_stack_ptr and irq_stack_union
to find the IRQ stack. That's going to break when vmapped irq stacks are
introduced.
Change it to just use irq_stack_ptr.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The get_stack_info() function is off-by-one when checking whether an
address is on an IRQ stack or an IST stack. This prevents an overflowed
IRQ or IST stack from being dumped properly.
[ tglx: Do the same for 32-bit ]
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Commit
37fe6a42b343 ("x86: Check stack overflow in detail")
added a broad check for the full exception stack area, i.e. it considers
the full exception stack area as valid.
That's wrong in two aspects:
1) It does not check the individual areas one by one.
2) #DF, NMI and #MCE do not enable interrupts, which means that a
regular device interrupt cannot happen in their context. In fact, if a
device interrupt hits one of those IST stacks, that's a bug because some
code path enabled interrupts while handling the exception.
Limit the check to the #DB stack and consider all other IST stacks as
'overflow' or invalid.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Mitsuo Hayasaka <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Pull KVM fixes from Paolo Bonzini:
"5.1 keeps its reputation as a big bugfix release for KVM x86.
- Fix for a memory leak introduced during the merge window
- Fixes for nested VMX with ept=0
- Fixes for AMD (APIC virtualization, NMI injection)
- Fixes for Hyper-V under KVM and KVM under Hyper-V
- Fixes for 32-bit SMM and tests for SMM virtualization
- More array_index_nospec peppering"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
KVM: fix spectrev1 gadgets
KVM: x86: fix warning Using plain integer as NULL pointer
selftests: kvm: add a selftest for SMM
selftests: kvm: fix for compilers that do not support -no-pie
selftests: kvm/evmcs_test: complete I/O before migrating guest state
KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
KVM: x86: clear SMM flags before loading state while leaving SMM
KVM: x86: Open code kvm_set_hflags
KVM: x86: Load SMRAM in a single shot when leaving SMM
KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU
KVM: x86: Raise #GP when guest vCPU do not support PMU
x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
KVM: x86: svm: make sure NMI is injected after nmi_singlestep
svm/avic: Fix invalidate logical APIC id entry
Revert "svm: Fix AVIC incomplete IPI emulation"
kvm: mmu: Fix overflow on kvm mmu page limit calculation
KVM: nVMX: always use early vmcs check when EPT is disabled
KVM: nVMX: allow tests to use bad virtual-APIC page address
...
|
|
All architectures except MIPS were defining it in the same way,
and memory slots are handled entirely by common code, so there
is no point in keeping the definition per-architecture.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
EFER.LME and EFER.NX are considered reserved if their respective feature
bits are not advertised to the guest.
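A hedged sketch of the check; the helper name and structure are illustrative, the in-tree code lives in the EFER validation path:

  static bool kvm_efer_cpuid_ok(struct kvm_vcpu *vcpu, u64 efer)
  {
          /* LME is reserved unless long mode is advertised to the guest. */
          if ((efer & EFER_LME) && !guest_cpuid_has(vcpu, X86_FEATURE_LM))
                  return false;

          /* NX is reserved unless the NX feature bit is advertised. */
          if ((efer & EFER_NX) && !guest_cpuid_has(vcpu, X86_FEATURE_NX))
                  return false;

          return true;
  }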
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
KVM allows userspace to violate consistency checks related to the
guest's CPUID model to some degree. Generally speaking, userspace has
carte blanche when it comes to guest state so long as jamming invalid
state won't negatively affect the host.
Currently this seems to be a non-issue as most of the interesting
EFER checks are missing, e.g. NX and LME, but those will be added
shortly. Proactively exempt userspace from the CPUID checks so as not
to break userspace.
Note, the efer_reserved_bits check still applies to userspace writes as
that mask reflects the host's capabilities, e.g. KVM shouldn't allow a
guest to run with NX=1 if it has been disabled in the host.
Fixes: d80174745ba39 ("KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Most, but not all, helpers that are related to emulating consistency
checks for nested VM-Entry return -EINVAL when a check fails. Convert
the holdouts to have consistency throughout and to make it clear that
the functions are signaling pass/fail as opposed to "resume guest" vs.
"exit to userspace".
Opportunistically fix bad indentation in nested_vmx_check_guest_state().
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Convert all top-level nested VM-Enter consistency check functions to
return 0/-EINVAL instead of failure codes, since now they can only
ever return one failure code.
This also avoids giving the false impression that failure information is
always consumed and/or relevant, e.g. vmx_set_nested_state() only
cares whether or not the checks were successful.
nested_check_host_control_regs() can also now be inlined into its caller,
nested_vmx_check_host_state(), since the two have effectively become the
same function.
Based on a patch by Sean Christopherson.
Signed-off-by: Paolo Bonzini <[email protected]>
|