aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-10-03x86: kmsan: disable instrumentation of unsupported codeAlexander Potapenko2-0/+3
Instrumenting some files with KMSAN will result in kernel being unable to link, boot or crashing at runtime for various reasons (e.g. infinite recursion caused by instrumentation hooks calling instrumented code again). Completely omit KMSAN instrumentation in the following places: - arch/x86/boot and arch/x86/realmode/rm, as KMSAN doesn't work for i386; - arch/x86/entry/vdso, which isn't linked with KMSAN runtime; - three files in arch/x86/kernel - boot problems; - arch/x86/mm/cpu_entry_area.c - recursion. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-02Merge tag 'x86_urgent_for_v6.0' of ↵Linus Torvalds1-22/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add the respective UP last level cache mask accessors in order not to cause segfaults when lscpu accesses their representation in sysfs - Fix for a race in the alternatives batch patching machinery when kprobes are set * tag 'x86_urgent_for_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cacheinfo: Add a cpu_llc_shared_mask() UP variant x86/alternative: Fix race in try_get_desc()
2022-10-02kbuild: use obj-y instead extra-y for objects placed at the headMasahiro Yamada1-5/+5
The objects placed at the head of vmlinux need special treatments: - arch/$(SRCARCH)/Makefile adds them to head-y in order to place them before other archives in the linker command line. - arch/$(SRCARCH)/kernel/Makefile adds them to extra-y instead of obj-y to avoid them going into built-in.a. This commit gets rid of the latter. Create vmlinux.a to collect all the objects that are unconditionally linked to vmlinux. The objects listed in head-y are moved to the head of vmlinux.a by using 'ar m'. With this, arch/$(SRCARCH)/kernel/Makefile can consistently use obj-y for builtin objects. There is no *.o that is directly linked to vmlinux. Drop unneeded code in scripts/clang-tools/gen_compile_commands.py. $(AR) mPi needs 'T' to workaround the llvm-ar bug. The fix was suggested by Nathan Chancellor [1]. [1]: https://lore.kernel.org/llvm/[email protected]/ Signed-off-by: Masahiro Yamada <[email protected]> Tested-by: Nick Desaulniers <[email protected]> Reviewed-by: Nicolas Schier <[email protected]>
2022-09-29Merge branch 'v6.0-rc7'Peter Zijlstra5-26/+65
Merge upstream to get RAPTORLAKE_S Signed-off-by: Peter Zijlstra <[email protected]>
2022-09-27x86/alternative: Fix race in try_get_desc()Nadav Amit1-22/+23
I encountered some occasional crashes of poke_int3_handler() when kprobes are set, while accessing desc->vec. The text poke mechanism claims to have an RCU-like behavior, but it does not appear that there is any quiescent state to ensure that nobody holds reference to desc. As a result, the following race appears to be possible, which can lead to memory corruption. CPU0 CPU1 ---- ---- text_poke_bp_batch() -> smp_store_release(&bp_desc, &desc) [ notice that desc is on the stack ] poke_int3_handler() [ int3 might be kprobe's so sync events are do not help ] -> try_get_desc(descp=&bp_desc) desc = __READ_ONCE(bp_desc) if (!desc) [false, success] WRITE_ONCE(bp_desc, NULL); atomic_dec_and_test(&desc.refs) [ success, desc space on the stack is being reused and might have non-zero value. ] arch_atomic_inc_not_zero(&desc->refs) [ might succeed since desc points to stack memory that was freed and might be reused. ] Fix this issue with small backportable patch. Instead of trying to make RCU-like behavior for bp_desc, just eliminate the unnecessary level of indirection of bp_desc, and hold the whole descriptor as a global. Anyhow, there is only a single descriptor at any given moment. Fixes: 1f676247f36a4 ("x86/alternatives: Implement a better poke_int3_handler() completion scheme") Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2022-09-27x86: kprobes: Remove unused macro stack_addrChen Zhongjin1-2/+0
An unused macro reported by [-Wunused-macros]. This macro is used to access the sp in pt_regs because at that time x86_32 can only get sp by kernel_stack_pointer(regs). '3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")' This commit have unified the pt_regs and from them we can get sp from pt_regs with regs->sp easily. Nowhere is using this macro anymore. Refrencing pt_regs directly is more clear. Remove this macro for code cleaning. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chen Zhongjin <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-09-26mm: remove rb tree.Liam R. Howlett1-1/+0
Remove the RB tree and start using the maple tree for vm_area_struct tracking. Drop validate_mm() calls in expand_upwards() and expand_downwards() as the lock is not held. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26mm: start tracking VMAs with maple treeLiam R. Howlett1-0/+1
Start tracking the VMAs with the new maple tree structure in parallel with the rb_tree. Add debug and trace events for maple tree operations and duplicate the rb_tree that is created on forks into the maple tree. The maple tree is added to the mm_struct including the mm_init struct, added support in required mm/mmap functions, added tracking in kernel/fork for process forking, and used to find the unmapped_area and checked against what the rbtree finds. This also moves the mmap_lock() in exit_mmap() since the oom reaper call does walk the VMAs. Otherwise lockdep will be unhappy if oom happens. When splitting a vma fails due to allocations of the maple tree nodes, the error path in __split_vma() calls new->vm_ops->close(new). The page accounting for hugetlb is actually in the close() operation, so it accounts for the removal of 1/2 of the VMA which was not adjusted. This results in a negative exit value. To avoid the negative charge, set vm_start = vm_end and vm_pgoff = 0. There is also a potential accounting issue in special mappings from insert_vm_struct() failing to allocate, so reverse the charge there in the failure scenario. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26Merge tag 'x86_urgent_for_v6.0-rc8' of ↵Linus Torvalds2-7/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Dave Hansen: - A performance fix for recent large AMD systems that avoids an ancient cpu idle hardware workaround - A new Intel model number. Folks like these upstream as soon as possible so that each developer doing feature development doesn't need to carry their own #define - SGX fixes for a userspace crash and a rare kernel warning * tag 'x86_urgent_for_v6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ACPI: processor idle: Practically limit "Dummy wait" workaround to old Intel systems x86/sgx: Handle VA page allocation failure for EAUG on PF. x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd x86/cpu: Add CPU model numbers for Meteor Lake
2022-09-26x86: Add support for CONFIG_CFI_CLANGSami Tolvanen3-1/+91
With CONFIG_CFI_CLANG, the compiler injects a type preamble immediately before each function and a check to validate the target function type before indirect calls: ; type preamble __cfi_function: mov <id>, %eax function: ... ; indirect call check mov -<id>,%r10d add -0x4(%r11),%r10d je .Ltmp1 ud2 .Ltmp1: call __x86_indirect_thunk_r11 Add error handling code for the ud2 traps emitted for the checks, and allow CONFIG_CFI_CLANG to be selected on x86_64. This produces the following oops on CFI failure (generated using lkdtm): [ 21.441706] CFI failure at lkdtm_indirect_call+0x16/0x20 [lkdtm] (target: lkdtm_increment_int+0x0/0x10 [lkdtm]; expected type: 0x7e0c52a) [ 21.444579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 21.445296] CPU: 0 PID: 132 Comm: sh Not tainted 5.19.0-rc8-00020-g9f27360e674c #1 [ 21.445296] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 21.445296] RIP: 0010:lkdtm_indirect_call+0x16/0x20 [lkdtm] [ 21.445296] Code: 52 1c c0 48 c7 c1 c5 50 1c c0 e9 25 48 2a cc 0f 1f 44 00 00 49 89 fb 48 c7 c7 50 b4 1c c0 41 ba 5b ad f3 81 45 03 53 f8 [ 21.445296] RSP: 0018:ffffa9f9c02ffdc0 EFLAGS: 00000292 [ 21.445296] RAX: 0000000000000027 RBX: ffffffffc01cb300 RCX: 385cbbd2e070a700 [ 21.445296] RDX: 0000000000000000 RSI: c0000000ffffdfff RDI: ffffffffc01cb450 [ 21.445296] RBP: 0000000000000006 R08: 0000000000000000 R09: ffffffff8d081610 [ 21.445296] R10: 00000000bcc90825 R11: ffffffffc01c2fc0 R12: 0000000000000000 [ 21.445296] R13: ffffa31b827a6000 R14: 0000000000000000 R15: 0000000000000002 [ 21.445296] FS: 00007f08b42216a0(0000) GS:ffffa31b9f400000(0000) knlGS:0000000000000000 [ 21.445296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 21.445296] CR2: 0000000000c76678 CR3: 0000000001940000 CR4: 00000000000006f0 [ 21.445296] Call Trace: [ 21.445296] <TASK> [ 21.445296] lkdtm_CFI_FORWARD_PROTO+0x30/0x50 [lkdtm] [ 21.445296] direct_entry+0x12d/0x140 [lkdtm] [ 21.445296] full_proxy_write+0x5d/0xb0 [ 21.445296] vfs_write+0x144/0x460 [ 21.445296] ? __x64_sys_wait4+0x5a/0xc0 [ 21.445296] ksys_write+0x69/0xd0 [ 21.445296] do_syscall_64+0x51/0xa0 [ 21.445296] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 21.445296] RIP: 0033:0x7f08b41a6fe1 [ 21.445296] Code: be 07 00 00 00 41 89 c0 e8 7e ff ff ff 44 89 c7 89 04 24 e8 91 c6 02 00 8b 04 24 48 83 c4 68 c3 48 63 ff b8 01 00 00 03 [ 21.445296] RSP: 002b:00007ffcdf65c2e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 21.445296] RAX: ffffffffffffffda RBX: 00007f08b4221690 RCX: 00007f08b41a6fe1 [ 21.445296] RDX: 0000000000000012 RSI: 0000000000c738f0 RDI: 0000000000000001 [ 21.445296] RBP: 0000000000000001 R08: fefefefefefefeff R09: fefefefeffc5ff4e [ 21.445296] R10: 00007f08b42222b0 R11: 0000000000000246 R12: 0000000000c738f0 [ 21.445296] R13: 0000000000000012 R14: 00007ffcdf65c401 R15: 0000000000c70450 [ 21.445296] </TASK> [ 21.445296] Modules linked in: lkdtm [ 21.445296] Dumping ftrace buffer: [ 21.445296] (ftrace buffer empty) [ 21.471442] ---[ end trace 0000000000000000 ]--- [ 21.471811] RIP: 0010:lkdtm_indirect_call+0x16/0x20 [lkdtm] [ 21.472467] Code: 52 1c c0 48 c7 c1 c5 50 1c c0 e9 25 48 2a cc 0f 1f 44 00 00 49 89 fb 48 c7 c7 50 b4 1c c0 41 ba 5b ad f3 81 45 03 53 f8 [ 21.474400] RSP: 0018:ffffa9f9c02ffdc0 EFLAGS: 00000292 [ 21.474735] RAX: 0000000000000027 RBX: ffffffffc01cb300 RCX: 385cbbd2e070a700 [ 21.475664] RDX: 0000000000000000 RSI: c0000000ffffdfff RDI: ffffffffc01cb450 [ 21.476471] RBP: 0000000000000006 R08: 0000000000000000 R09: ffffffff8d081610 [ 21.477127] R10: 00000000bcc90825 R11: ffffffffc01c2fc0 R12: 0000000000000000 [ 21.477959] R13: ffffa31b827a6000 R14: 0000000000000000 R15: 0000000000000002 [ 21.478657] FS: 00007f08b42216a0(0000) GS:ffffa31b9f400000(0000) knlGS:0000000000000000 [ 21.479577] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 21.480307] CR2: 0000000000c76678 CR3: 0000000001940000 CR4: 00000000000006f0 [ 21.481460] Kernel panic - not syncing: Fatal exception Signed-off-by: Sami Tolvanen <[email protected]> Reviewed-by: Kees Cook <[email protected]> Tested-by: Kees Cook <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-26x86/cpu: Include the header of init_ia32_feat_ctl()'s prototypeLuciano Leão1-1/+1
Include the header containing the prototype of init_ia32_feat_ctl(), solving the following warning: $ make W=1 arch/x86/kernel/cpu/feat_ctl.o arch/x86/kernel/cpu/feat_ctl.c:112:6: warning: no previous prototype for ‘init_ia32_feat_ctl’ [-Wmissing-prototypes] 112 | void init_ia32_feat_ctl(struct cpuinfo_x86 *c) This warning appeared after commit 5d5103595e9e5 ("x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup") had moved the function init_ia32_feat_ctl()'s prototype from arch/x86/kernel/cpu/cpu.h to arch/x86/include/asm/cpu.h. Note that, before the commit mentioned above, the header include "cpu.h" (arch/x86/kernel/cpu/cpu.h) was added by commit 0e79ad863df43 ("x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()") solely to fix init_ia32_feat_ctl()'s missing prototype. So, the header include "cpu.h" is no longer necessary. [ bp: Massage commit message. ] Fixes: 5d5103595e9e5 ("x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup") Signed-off-by: Luciano Leão <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Nícolas F. R. A. Prado <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Make resctrl_arch_rmid_read() return values in bytesJames Morse3-19/+15
resctrl_arch_rmid_read() returns a value in chunks, as read from the hardware. This needs scaling to bytes by mon_scale, as provided by the architecture code. Now that resctrl_arch_rmid_read() performs the overflow and corrections itself, it may as well return a value in bytes directly. This allows the accesses to the architecture specific 'hw' structure to be removed. Move the mon_scale conversion into resctrl_arch_rmid_read(). mbm_bw_count() is updated to calculate bandwidth from bytes. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_dataJames Morse2-3/+8
resctrl_rmid_realloc_threshold can be set by user-space. The maximum value is specified by the architecture. Currently max_threshold_occ_write() reads the maximum value from boot_cpu_data.x86_cache_size, which is not portable to another architecture. Add resctrl_rmid_realloc_limit to describe the maximum size in bytes that user-space can set the threshold to. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Rename and change the units of resctrl_cqm_thresholdJames Morse3-25/+28
resctrl_cqm_threshold is stored in a hardware specific chunk size, but exposed to user-space as bytes. This means the filesystem parts of resctrl need to know how the hardware counts, to convert the user provided byte value to chunks. The interface between the architecture's resctrl code and the filesystem ought to treat everything as bytes. Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read() still returns its value in chunks, so this needs converting to bytes. As all the users have been touched, rename the variable to resctrl_rmid_realloc_threshold, which describes what the value is for. Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power of 2, so the existing code introduces a rounding error from resctrl's theoretical fraction of the cache usage. This behaviour is kept as it ensures the user visible value matches the value read from hardware when the rmid will be reallocated. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()James Morse2-6/+6
resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. When reading a bandwidth counter, get_corrected_mbm_count() must be used to correct the value read. get_corrected_mbm_count() is architecture specific, this work should be done in resctrl_arch_rmid_read(). Move the function calls. This allows the resctrl filesystems's chunks value to be removed in favour of the architecture private version. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()James Morse2-17/+20
resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. When reading a bandwidth counter, mbm_overflow_count() must be used to correct for any possible overflow. mbm_overflow_count() is architecture specific, its behaviour should be part of resctrl_arch_rmid_read(). Move the mbm_overflow_count() calls into resctrl_arch_rmid_read(). This allows the resctrl filesystems's prev_msr to be removed in favour of the architecture private version. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()James Morse1-14/+17
resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a hardware register. Currently the function returns the MBM values in chunks directly from hardware. To convert this to bytes, some correction and overflow calculations are needed. These depend on the resource and domain structures. Overflow detection requires the old chunks value. None of this is available to resctrl_arch_rmid_read(). MPAM requires the resource and domain structures to find the MMIO device that holds the registers. Pass the resource and domain to resctrl_arch_rmid_read(). This makes rmid_dirty() too big. Instead merge it with its only caller, and the name is kept as a local variable. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/resctrl: Abstract __rmid_read()James Morse3-25/+42
__rmid_read() selects the specified eventid and returns the counter value from the MSR. The error handling is architecture specific, and handled by the callers, rdtgroup_mondata_show() and __mon_event_count(). Error handling should be handled by architecture specific code, as a different architecture may have different requirements. MPAM's counters can report that they are 'not ready', requiring a second read after a short delay. This should be hidden from resctrl. Make __rmid_read() the architecture specific function for reading a counter. Rename it resctrl_arch_rmid_read() and move the error handling into it. A read from a counter that hardware supports but resctrl does not now returns -EINVAL instead of -EIO from the default case in __mon_event_count(). It isn't possible for user-space to see this change as resctrl doesn't expose counters it doesn't support. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-23x86/microcode/AMD: Track patch allocation size explicitlyKees Cook1-1/+2
In preparation for reducing the use of ksize(), record the actual allocation size for later memcpy(). This avoids copying extra (uninitialized!) bytes into the patch buffer when the requested allocation size isn't exactly the size of a kmalloc bucket. Additionally, fix potential future issues where runtime bounds checking will notice that the buffer was allocated to a smaller value than returned by ksize(). Fixes: 757885e94a22 ("x86, microcode, amd: Early microcode patch loading support for AMD") Suggested-by: Daniel Micay <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/lkml/CA+DvKQ+bp7Y7gmaVhacjv9uF6Ar-o4tet872h4Q8RPYPJjcJQA@mail.gmail.com/
2022-09-23x86/resctrl: Allow per-rmid arch private storage to be resetJames Morse2-14/+39
To abstract the rmid counters into a helper that returns the number of bytes counted, architecture specific per-rmid state is needed. It needs to be possible to reset this hidden state, as the values may outlive the life of an rmid, or the mount time of the filesystem. mon_event_read() is called with first = true when an rmid is first allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid() and call it from __mon_event_count()'s rr->first check. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Add per-rmid arch private storage for overflow and chunksJames Morse2-0/+49
A renamed __rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. For bandwidth counters the resctrl filesystem uses this to calculate the number of bytes ever seen. MPAM's scaling of counters can be changed at runtime, reducing the resolution but increasing the range. When this is changed the prev_msr values need to be converted by the architecture code. Add an array for per-rmid private storage. The prev_msr and chunks values will move here to allow resctrl_arch_rmid_read() to always return the number of bytes read by this counter without assistance from the filesystem. The values are moved in later patches when the overflow and correction calls are moved into __rmid_read(). Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunksJames Morse2-11/+18
mbm_bw_count() is only called by the mbm_handle_overflow() worker once a second. It reads the hardware register, calculates the bandwidth and updates m->prev_bw_msr which is used to hold the previous hardware register value. Operating directly on hardware register values makes it difficult to make this code architecture independent, so that it can be moved to /fs/, making the mba_sc feature something resctrl supports with no additional support from the architecture. Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware register using __mon_event_count(). Change mbm_bw_count() to use the current chunks value most recently saved by __mon_event_count(). This removes an extra call to __rmid_read(). Instead of using m->prev_msr to calculate the number of chunks seen, use the rr->val that was updated by __mon_event_count(). This removes an extra call to mbm_overflow_count() and get_corrected_mbm_count(). Calculating bandwidth like this means mbm_bw_count() no longer operates on hardware register values directly. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Allow update_mba_bw() to update controls directlyJames Morse4-11/+26
update_mba_bw() calculates a new control value for the MBA resource based on the user provided mbps_val and the current measured bandwidth. Some control values need remapping by delay_bw_map(). It does this by calling wrmsrl() directly. This needs splitting up to be done by an architecture specific helper, so that the remainder can eventually be moved to /fs/. Add resctrl_arch_update_one() to apply one configuration value to the provided resource and domain. This avoids the staging and cross-calling that is only needed with changes made by user-space. delay_bw_map() moves to be part of the arch code, to maintain the 'percentage control' view of MBA resources in resctrl. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Remove architecture copy of mbps_valJames Morse3-22/+5
The resctrl arch code provides a second configuration array mbps_val[] for the MBA software controller. Since resctrl switched over to allocating and freeing its own array when needed, nothing uses the arch code version. Remove it. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Switch over to the resctrl mbps_val listJames Morse3-29/+52
Updates to resctrl's software controller follow the same path as other configuration updates, but they don't modify the hardware state. rdtgroup_schemata_write() uses parse_line() and the resource's parse_ctrlval() function to stage the configuration. resctrl_arch_update_domains() then updates the mbps_val[] array instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update() call that would update hardware. This complicates the interface between resctrl's filesystem parts and architecture specific code. It should be possible for mba_sc to be completely implemented by the filesystem parts of resctrl. This would allow it to work on a second architecture with no additional code. resctrl_arch_update_domains() using the mbps_val[] array prevents this. Change parse_bw() to write the configuration value directly to the mbps_val[] array in the domain structure. Change rdtgroup_schemata_write() to skip the call to resctrl_arch_update_domains(), meaning all the mba_sc specific code in resctrl_arch_update_domains() can be removed. On the read-side, show_doms() and update_mba_bw() are changed to read the mbps_val[] array from the domain structure. With this, resctrl_arch_get_config() no longer needs to consider mba_sc resources. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Create mba_sc configuration in the rdt_domainJames Morse2-1/+39
To support resctrl's MBA software controller, the architecture must provide a second configuration array to hold the mbps_val[] from user-space. This complicates the interface between the architecture specific code and the filesystem portions of resctrl that will move to /fs/, to allow multiple architectures to support resctrl. Make the filesystem parts of resctrl create an array for the mba_sc values. The software controller can be changed to use this, allowing the architecture code to only consider the values configured in hardware. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Abstract and use supports_mba_mbps()James Morse1-5/+14
To determine whether the mba_MBps option to resctrl should be supported, resctrl tests the boot CPUs' x86_vendor. This isn't portable, and needs abstracting behind a helper so this check can be part of the filesystem code that moves to /fs/. Re-use the tests set_mba_sc() does to determine if the mba_sc is supported on this system. An 'alloc_capable' test is added so that support for the controls isn't implied by the 'delay_linear' property, which is always true for MPAM. Because mbm_update() only update mba_sc if the mbm_local counters are enabled, supports_mba_mbps() checks is_mbm_local_enabled(). (instead of using is_mbm_enabled(), which checks both). Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Remove set_mba_sc()s control array re-initialisationJames Morse1-7/+3
set_mba_sc() enables the 'software controller' to regulate the bandwidth based on the byte counters. This can be managed entirely in the parts of resctrl that move to /fs/, without any extra support from the architecture specific code. set_mba_sc() is called by rdt_enable_ctx() during mount and unmount. It currently resets the arch code's ctrl_val[] and mbps_val[] arrays. The ctrl_val[] was already reset when the domain was created, and by reset_all_ctrls() when the filesystem was last unmounted. Doing the work in set_mba_sc() is not necessary as the values are already at their defaults due to the creation of the domain, or were previously reset during umount(), or are about to reset during umount(). Add a reset of the mbps_val[] in reset_all_ctrls(), allowing the code in set_mba_sc() that reaches in to the architecture specific structures to be removed. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Add domain offline callback for resctrl workJames Morse3-30/+43
Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to free the memory. Move the monitor subdir removal and the cancelling of the mbm/limbo works into a new resctrl_offline_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Group struct rdt_hw_domain cleanupJames Morse1-7/+10
domain_add_cpu() and domain_remove_cpu() need to kfree() the child arrays that were allocated by domain_setup_ctrlval(). As this memory is moved around, and new arrays are created, adjusting the error handling cleanup code becomes noisier. To simplify this, move all the kfree() calls into a domain_free() helper. This depends on struct rdt_hw_domain being kzalloc()d, allowing it to unconditionally kfree() all the child arrays. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Add domain online callback for resctrl workJames Morse3-54/+66
Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to allocate the memory. Move domain_setup_mon_state(), the monitor subdir creation call and the mbm/limbo workers into a new resctrl_online_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Merge mon_capable and mon_enabledJames Morse3-9/+4
mon_enabled and mon_capable are always set as a pair by rdt_get_mon_l3_config(). There is no point having two values. Merge them together. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-22x86/resctrl: Kill off alloc_enabledJames Morse4-12/+4
rdt_resources_all[] used to have extra entries for L2CODE/L2DATA. These were hidden from resctrl by the alloc_enabled value. Now that the L2/L2CODE/L2DATA resources have been merged together, alloc_enabled doesn't mean anything, it always has the same value as alloc_capable which indicates allocation is supported by this resource. Remove alloc_enabled and its helpers. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Jamie Iles <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xin Hao <[email protected]> Tested-by: Shaopeng Tan <[email protected]> Tested-by: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-20x86/dumpstack: Don't mention RIP in "Code: "Jiri Slaby1-1/+1
Commit 238c91115cd0 ("x86/dumpstack: Fix misleading instruction pointer error message") changed the "Code:" line in bug reports when RIP is an invalid pointer. In particular, the report currently says (for example): BUG: kernel NULL pointer dereference, address: 0000000000000000 ... RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. That Unable to access opcode bytes at RIP 0xffffffffffffffd6. is quite confusing as RIP value is 0, not -42. That -42 comes from "regs->ip - PROLOGUE_SIZE", because Code is dumped with some prologue (and epilogue). So do not mention "RIP" on this line in this context. Signed-off-by: Jiri Slaby <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-19smp: add set_nr_cpu_ids()Yury Norov1-2/+2
In preparation to support compile-time nr_cpu_ids, add a setter for the variable. This is a no-op for all arches. Signed-off-by: Yury Norov <[email protected]>
2022-09-15x86,retpoline: Be sure to emit INT3 after JMP *%\regPeter Zijlstra1-0/+9
Both AMD and Intel recommend using INT3 after an indirect JMP. Make sure to emit one when rewriting the retpoline JMP irrespective of compiler SLS options or even CONFIG_SLS. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-09-08x86/sgx: Handle VA page allocation failure for EAUG on PF.Haitao Huang1-1/+4
VM_FAULT_NOPAGE is expected behaviour for -EBUSY failure path, when augmenting a page, as this means that the reclaimer thread has been triggered, and the intention is just to round-trip in ring-3, and retry with a new page fault. Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave") Signed-off-by: Haitao Huang <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Vijay Dhanraj <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2022-09-08x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxdJarkko Sakkinen1-6/+9
Unsanitized pages trigger WARN_ON() unconditionally, which can panic the whole computer, if /proc/sys/kernel/panic_on_warn is set. In sgx_init(), if misc_register() fails or misc_register() succeeds but neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be prematurely stopped. This may leave unsanitized pages, which will result a false warning. Refine __sgx_sanitize_pages() to return: 1. Zero when the sanitization process is complete or ksgxd has been requested to stop. 2. The number of unsanitized pages otherwise. Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list") Reported-by: Paul Menzel <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/linux-sgx/[email protected]/T/#u Link: https://lkml.kernel.org/r/[email protected]
2022-09-05asm-generic: Conditionally enable do_softirq_own_stack() via Kconfig.Sebastian Andrzej Siewior1-1/+1
Remove the CONFIG_PREEMPT_RT symbol from the ifdef around do_softirq_own_stack() and move it to Kconfig instead. Enable softirq stacks based on SOFTIRQ_ON_OWN_STACK which depends on HAVE_SOFTIRQ_ON_OWN_STACK and its default value is set to !PREEMPT_RT. This ensures that softirq stacks are not used on PREEMPT_RT and avoids a 'select' statement on an option which has a 'depends' statement. Link: https://lore.kernel.org/YvN5E%[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2022-09-02x86/microcode: Print previous version of microcode after reloadAshok Raj1-2/+3
Print both old and new versions of microcode after a reload is complete because knowing the previous microcode version is sometimes important from a debugging perspective. [ bp: Massage commit message. ] Signed-off-by: Ashok Raj <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Tony Luck <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-01sgx: use ->f_mapping...Al Viro1-2/+1
Reviewed-by: Jarkko Sakkinen <[email protected]> Reviewed-by: Christian Brauner (Microsoft) <[email protected]> Signed-off-by: Al Viro <[email protected]>
2022-08-31x86/apic: Don't disable x2APIC if lockedDaniel Sneddon1-4/+40
The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC (or x2APIC). X2APIC mode is mostly compatible with legacy APIC, but it disables the memory-mapped APIC interface in favor of one that uses MSRs. The APIC mode is controlled by the EXT bit in the APIC MSR. The MMIO/xAPIC interface has some problems, most notably the APIC LEAK [1]. This bug allows an attacker to use the APIC MMIO interface to extract data from the SGX enclave. Introduce support for a new feature that will allow the BIOS to lock the APIC in x2APIC mode. If the APIC is locked in x2APIC mode and the kernel tries to disable the APIC or revert to legacy APIC mode a GP fault will occur. Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by preventing the kernel from trying to disable the x2APIC. On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS. If legacy APIC is required, then it SGX and TDX need to be disabled in the BIOS. [1]: https://aepicleak.com/aepicleak.pdf Signed-off-by: Daniel Sneddon <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Dave Hansen <[email protected]> Tested-by: Neelima Krishnan <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-08-31x86/resctrl: Fix to restore to original value when re-enabling hardware ↵Kohei Tarumizu1-3/+9
prefetch register The current pseudo_lock.c code overwrites the value of the MSR_MISC_FEATURE_CONTROL to 0 even if the original value is not 0. Therefore, modify it to save and restore the original values. Fixes: 018961ae5579 ("x86/intel_rdt: Pseudo-lock region creation/removal core") Fixes: 443810fe6160 ("x86/intel_rdt: Create debugfs files for pseudo-locking testing") Fixes: 8a2fc0e1bc0c ("x86/intel_rdt: More precise L2 hit/miss measurements") Signed-off-by: Kohei Tarumizu <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Reinette Chatre <[email protected]> Link: https://lkml.kernel.org/r/eb660f3c2010b79a792c573c02d01e8e841206ad.1661358182.git.reinette.chatre@intel.com
2022-08-29x86/earlyprintk: Clean up pciserialPeter Zijlstra1-7/+7
While working on a GRUB patch to support PCI-serial, a number of cleanups were suggested that apply to the code I took inspiration from. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> # pci_ids.h Reviewed-by: Greg Kroah-Hartman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-08-29x86/mce: Retrieve poison range from hardwareJane Chu1-1/+12
When memory poison consumption machine checks fire, MCE notifier handlers like nfit_handle_mce() record the impacted physical address range which is reported by the hardware in the MCi_MISC MSR. The error information includes data about blast radius, i.e. how many cachelines did the hardware determine are impacted. A recent change 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison granularity") updated nfit_handle_mce() to stop hard coding the blast radius value of 1 cacheline, and instead rely on the blast radius reported in 'struct mce' which can be up to 4K (64 cachelines). It turns out that apei_mce_report_mem_error() had a similar problem in that it hard coded a blast radius of 4K rather than reading the blast radius from the error information. Fix apei_mce_report_mem_error() to convey the proper poison granularity. Signed-off-by: Jane Chu <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected]
2022-08-28Merge tag 'x86-urgent-2022-08-28' of ↵Linus Torvalds4-25/+64
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed * tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI: Mention retbleed vulnerability info file for sysfs x86/sev: Mark snp_abort() noreturn x86/sev: Don't use cc_platform_has() for early SEV-SNP calls x86/boot: Don't propagate uninitialized boot_params->cc_blob_address x86/cpu: Add new Raptor Lake CPU model number x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry x86/nospec: Fix i386 RSB stuffing x86/nospec: Unwreck the RSB stuffing x86/bugs: Add "unknown" reporting for MMIO Stale Data x86/entry: Fix entry_INT80_compat for Xen PV guests x86/PAT: Have pat_enabled() properly reflect state when running on Xen
2022-08-27x86/cpufeatures: Add LbrExtV2 feature bitSandipan Das1-0/+1
CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance monitoring features for AMD processors. Bit 1 of EAX indicates support for Last Branch Record Extension Version 2 (LbrExtV2) features. If found to be set during PMU initialization, the EBX bits of the same leaf can be used to determine the number of available LBR entries. For better utilization of feature words, LbrExtV2 is added as a scattered feature bit. [peterz: Rename to AMD_LBR_V2] Signed-off-by: Sandipan Das <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/172d2b0df39306ed77221c45ee1aa62e8ae0548d.1660211399.git.sandipan.das@amd.com
2022-08-26x86/microcode: Remove ->request_microcode_user()Borislav Petkov2-24/+0
181b6f40e9ea ("x86/microcode: Rip out the OLD_INTERFACE") removed the old microcode loading interface but forgot to remove the related ->request_microcode_user() functionality which it uses. Rip it out now too. Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-25x86/sev: Mark snp_abort() noreturnBorislav Petkov1-1/+1
Mark both the function prototype and definition as noreturn in order to prevent the compiler from doing transformations which confuse objtool like so: vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction This triggers with gcc-12. Add it and sev_es_terminate() to the objtool noreturn tracking array too. Sort it while at it. Suggested-by: Michael Matz <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-24x86/sev: Don't use cc_platform_has() for early SEV-SNP callsTom Lendacky1-2/+14
When running identity-mapped and depending on the kernel configuration, it is possible that the compiler uses jump tables when generating code for cc_platform_has(). This causes a boot failure because the jump table uses un-mapped kernel virtual addresses, not identity-mapped addresses. This has been seen with CONFIG_RETPOLINE=n. Similar to sme_encrypt_kernel(), use an open-coded direct check for the status of SNP rather than trying to eliminate the jump table. This preserves any code optimization in cc_platform_has() that can be useful post boot. It also limits the changes to SEV-specific files so that future compiler features won't necessarily require possible build changes just because they are not compatible with running identity-mapped. [ bp: Massage commit message. ] Fixes: 5e5ccff60a29 ("x86/sev: Add helper for validating pages in early enc attribute changes") Reported-by: Sean Christopherson <[email protected]> Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: <[email protected]> # 5.19.x Link: https://lore.kernel.org/all/[email protected]/