aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2022-10-23Merge tag 'x86_urgent_for_v6.0_rc2' of ↵Linus Torvalds9-65/+69
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "As usually the case, right after a major release, the tip urgent branches accumulate a couple more fixes than normal. And here is the x86, a bit bigger, urgent pile. - Use the correct CPU capability clearing function on the error path in Intel perf LBR - A CFI fix to ftrace along with a simplification - Adjust handling of zero capacity bit mask for resctrl cache allocation on AMD - A fix to the AMD microcode loader to attempt patch application on every logical thread - A couple of topology fixes to handle CPUID leaf 0x1f enumeration info properly - Drop a -mabi=ms compiler option check as both compilers support it now anyway - A couple of fixes to how the initial, statically allocated FPU buffer state is setup and its interaction with dynamic states at runtime" * tag 'x86_urgent_for_v6.0_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly perf/x86/intel/lbr: Use setup_clear_cpu_cap() instead of clear_cpu_cap() ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph() x86/ftrace: Remove ftrace_epilogue() x86/resctrl: Fix min_cbm_bits for AMD x86/microcode/AMD: Apply the patch early on every logical thread x86/topology: Fix duplicated core ID within a package x86/topology: Fix multiple packages shown on a single-package system hwmon/coretemp: Handle large core ID value x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB x86/fpu: Exclude dynamic states from init_fpstate x86/fpu: Fix the init_fpstate size check with the actual size x86/fpu: Configure init_fpstate attributes orderly
2022-10-23arm64: dts: imx8mm: Enable CPLD_Dn pull down resistor on MX8MenloMarek Vasut1-8/+8
Enable CPLD_Dn pull down resistor instead of pull up to avoid intefering with CPLD power off functionality. Fixes: 510c527b4ff57 ("arm64: dts: imx8mm: Add i.MX8M Mini Toradex Verdin based Menlo board") Signed-off-by: Marek Vasut <[email protected]> Reviewed-by: Fabio Estevam <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
2022-10-22KVM: x86: Mask off reserved bits in CPUID.8000001AHJim Mattson1-0/+3
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. In the case of CPUID.8000001AH, only three bits are currently defined. The 125 reserved bits should be masked off. Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Signed-off-by: Jim Mattson <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2022-10-22KVM: x86: Mask off reserved bits in CPUID.80000008HJim Mattson1-0/+1
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. The following ranges of CPUID.80000008H are reserved and should be masked off: ECX[31:18] ECX[11:8] In addition, the PerfTscSize field at ECX[17:16] should also be zero because KVM does not set the PERFTSC bit at CPUID.80000001H.ECX[27]. Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Signed-off-by: Jim Mattson <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2022-10-22KVM: x86: Mask off reserved bits in CPUID.80000006HJim Mattson1-1/+2
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. CPUID.80000006H:EDX[17:16] are reserved bits and should be masked off. Fixes: 43d05de2bee7 ("KVM: pass through CPUID(0x80000006)") Signed-off-by: Jim Mattson <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2022-10-22KVM: x86: Mask off reserved bits in CPUID.80000001HJim Mattson1-0/+1
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. CPUID.80000001:EBX[27:16] are reserved bits and should be masked off. Fixes: 0771671749b5 ("KVM: Enhance guest cpuid management") Signed-off-by: Jim Mattson <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2022-10-22KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTERAlexander Graf1-0/+56
The KVM_X86_SET_MSR_FILTER ioctls contains a pointer in the passed in struct which means it has a different struct size depending on whether it gets called from 32bit or 64bit code. This patch introduces compat code that converts from the 32bit struct to its 64bit counterpart which then gets used going forward internally. With this applied, 32bit QEMU can successfully set MSR bitmaps when running on 64bit kernels. Reported-by: Andrew Randrianasulu <[email protected]> Fixes: 1a155254ff937 ("KVM: x86: Introduce MSR filtering") Signed-off-by: Alexander Graf <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2022-10-22KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()Alexander Graf1-14/+17
In the next patch we want to introduce a second caller to set_msr_filter() which constructs its own filter list on the stack. Refactor the original function so it takes it as argument instead of reading it through copy_from_user(). Signed-off-by: Alexander Graf <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2022-10-22Merge tag 'kvm-riscv-fixes-6.1-1' of https://github.com/kvm-riscv/linux into ↵Paolo Bonzini6-51/+57
HEAD KVM/riscv fixes for 6.1, take #1 - Fix compilation without RISCV_ISA_ZICBOM - Fix kvm_riscv_vcpu_timer_pending() for Sstc
2022-10-22Merge tag 'kvmarm-fixes-6.1-2' of ↵Paolo Bonzini2-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.1, take #2 - Fix a bug preventing restoring an ITS containing mappings for very large and very sparse device topology - Work around a relocation handling error when compiling the nVHE object with profile optimisation
2022-10-22Merge tag 'kvmarm-fixes-6.1-1' of ↵Paolo Bonzini5-30/+25
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.1, take #1 - Fix for stage-2 invalidation holding the VM MMU lock for too long by limiting the walk to the largest block mapping size - Enable stack protection and branch profiling for VHE - Two selftest fixes
2022-10-21x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctlyChang S. Bae1-0/+9
When an extended state component is not present in fpstate, but in init state, the function copies from init_fpstate via copy_feature(). But, dynamic states are not present in init_fpstate because of all-zeros init states. Then retrieving them from init_fpstate will explode like this: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... RIP: 0010:memcpy_erms+0x6/0x10 ? __copy_xstate_to_uabi_buf+0x381/0x870 fpu_copy_guest_fpstate_to_uabi+0x28/0x80 kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm] ? __this_cpu_preempt_check+0x13/0x20 ? vmx_vcpu_put+0x2e/0x260 [kvm_intel] kvm_vcpu_ioctl+0xea/0x6b0 [kvm] ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm] ? __fget_light+0xd4/0x130 __x64_sys_ioctl+0xe3/0x910 ? debug_smp_processor_id+0x17/0x20 ? fpregs_assert_state_consistent+0x27/0x50 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Adjust the 'mask' to zero out the userspace buffer for the features that are not available both from fpstate and from init_fpstate. The dynamic features depend on the compacted XSAVE format. Ensure it is enabled before reading XCOMP_BV in init_fpstate. Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode") Reported-by: Yuan Yao <[email protected]> Suggested-by: Dave Hansen <[email protected]> Signed-off-by: Chang S. Bae <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Tested-by: Yuan Yao <[email protected]> Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/ Link: https://lkml.kernel.org/r/[email protected]
2022-10-21x86/unwind/orc: Fix unreliable stack dump with gcovChen Zhongjin1-1/+1
When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder, causing the stack trace to show all text addresses as unreliable: # echo l > /proc/sysrq-trigger [ 477.521031] sysrq: Show backtrace of all active CPUs [ 477.523813] NMI backtrace for cpu 0 [ 477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65 [ 477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 [ 477.526439] Call Trace: [ 477.526854] <TASK> [ 477.527216] ? dump_stack_lvl+0xc7/0x114 [ 477.527801] ? dump_stack+0x13/0x1f [ 477.528331] ? nmi_cpu_backtrace.cold+0xb5/0x10d [ 477.528998] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 477.529641] ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0 [ 477.530393] ? arch_trigger_cpumask_backtrace+0x1d/0x30 [ 477.531136] ? sysrq_handle_showallcpus+0x1b/0x30 [ 477.531818] ? __handle_sysrq.cold+0x4e/0x1ae [ 477.532451] ? write_sysrq_trigger+0x63/0x80 [ 477.533080] ? proc_reg_write+0x92/0x110 [ 477.533663] ? vfs_write+0x174/0x530 [ 477.534265] ? handle_mm_fault+0x16f/0x500 [ 477.534940] ? ksys_write+0x7b/0x170 [ 477.535543] ? __x64_sys_write+0x1d/0x30 [ 477.536191] ? do_syscall_64+0x6b/0x100 [ 477.536809] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 477.537609] </TASK> This happens when the compiled code for show_stack() has a single word on the stack, and doesn't use a tail call to show_stack_log_lvl(). (CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.) Then the __unwind_start() skip logic hits an off-by-one bug and fails to unwind all the way to the intended starting frame. Fix it by reverting the following commit: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") The original justification for that commit no longer exists. That original issue was later fixed in a different way, with the following commit: f2ac57a4c49d ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels") Fixes: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Signed-off-by: Chen Zhongjin <[email protected]> [jpoimboe: rewrite commit log] Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]>
2022-10-21crypto: x86/polyval - Fix crashes when keys are not 16-byte alignedNathan Huckleberry1-5/+14
crypto_tfm::__crt_ctx is not guaranteed to be 16-byte aligned on x86-64. This causes crashes due to movaps instructions in clmul_polyval_update. Add logic to align polyval_tfm_ctx to 16 bytes. Cc: <[email protected]> Fixes: 34f7f6c30112 ("crypto: x86/polyval - Add PCLMULQDQ accelerated implementation of POLYVAL") Reported-by: Bruno Goncalves <[email protected]> Signed-off-by: Nathan Huckleberry <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2022-10-21iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check()Charlotte Tan1-1/+3
arch_rmrr_sanity_check() warns if the RMRR is not covered by an ACPI Reserved region, but it seems like it should accept an NVS region as well. The ACPI spec https://uefi.org/specs/ACPI/6.5/15_System_Address_Map_Interfaces.html uses similar wording for "Reserved" and "NVS" region types; for NVS regions it says "This range of addresses is in use or reserved by the system and must not be used by the operating system." There is an old comment on this mailing list that also suggests NVS regions should pass the arch_rmrr_sanity_check() test: The warnings come from arch_rmrr_sanity_check() since it checks whether the region is E820_TYPE_RESERVED. However, if the purpose of the check is to detect RMRR has regions that may be used by OS as free memory, isn't E820_TYPE_NVS safe, too? This patch overlaps with another proposed patch that would add the region type to the log since sometimes the bug reporter sees this log on the console but doesn't know to include the kernel log: https://lore.kernel.org/lkml/[email protected]/ Here's an example of the "Firmware Bug" apparent false positive (wrapped for line length): DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x000000006f760000-0x000000006f762fff], contact BIOS vendor for fixes DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x000000006f760000-0x000000006f762fff] This is the snippet from the e820 table: BIOS-e820: [mem 0x0000000068bff000-0x000000006ebfefff] reserved BIOS-e820: [mem 0x000000006ebff000-0x000000006f9fefff] ACPI NVS BIOS-e820: [mem 0x000000006f9ff000-0x000000006fffefff] ACPI data Fixes: f036c7fa0ab6 ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved") Cc: Will Mortensen <[email protected]> Link: https://lore.kernel.org/linux-iommu/[email protected]/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216443 Signed-off-by: Charlotte Tan <[email protected]> Reviewed-by: Aaron Tomlin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lu Baolu <[email protected]> Signed-off-by: Joerg Roedel <[email protected]>
2022-10-21parisc: Use signed char for hardware path in pdc.hHelge Deller1-23/+13
Clean up the struct for hardware_path and drop the struct device_path with a proper assignment of bc[] and mod members as signed chars. This patch prepares for the kbuild change from Jason A. Donenfeld to treat char as always unsigned. Signed-off-by: Helge Deller <[email protected]> Cc: Jason A. Donenfeld <[email protected]>
2022-10-21RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for SstcAnup Patel3-2/+19
The kvm_riscv_vcpu_timer_pending() checks per-VCPU next_cycles and per-VCPU software injected VS timer interrupt. This function returns incorrect value when Sstc is available because the per-VCPU next_cycles are only updated by kvm_riscv_vcpu_timer_save() called from kvm_arch_vcpu_put(). As a result, when Sstc is available the VCPU does not block properly upon WFI traps. To fix the above issue, we introduce kvm_riscv_vcpu_timer_sync() which will update per-VCPU next_cycles upon every VM exit instead of kvm_riscv_vcpu_timer_save(). Fixes: 8f5cb44b1bae ("RISC-V: KVM: Support sstc extension") Signed-off-by: Anup Patel <[email protected]> Reviewed-by: Atish Patra <[email protected]> Signed-off-by: Anup Patel <[email protected]>
2022-10-21RISC-V: Fix compilation without RISCV_ISA_ZICBOMAndrew Jones3-49/+38
riscv_cbom_block_size and riscv_init_cbom_blocksize() should always be available and riscv_init_cbom_blocksize() should always be invoked, even when compiling without RISCV_ISA_ZICBOM enabled. This is because disabling RISCV_ISA_ZICBOM means "don't use zicbom instructions in the kernel" not "pretend there isn't zicbom, even when there is". When zicbom is available, whether the kernel enables its use with RISCV_ISA_ZICBOM or not, KVM will offer it to guests. Ensure we can build KVM and that the block size is initialized even when compiling without RISCV_ISA_ZICBOM. Fixes: 8f7e001e0325 ("RISC-V: Clean up the Zicbom block size probing") Reported-by: kernel test robot <[email protected]> Signed-off-by: Andrew Jones <[email protected]> Signed-off-by: Anup Patel <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Reviewed-by: Heiko Stuebner <[email protected]> Tested-by: Heiko Stuebner <[email protected]> Signed-off-by: Anup Patel <[email protected]>
2022-10-20bpf: Fix dispatcher patchable function entry to 5 bytes nopJiri Olsa1-0/+13
The patchable_function_entry(5) might output 5 single nop instructions (depends on toolchain), which will clash with bpf_arch_text_poke check for 5 bytes nop instruction. Adding early init call for dispatcher that checks and change the patchable entry into expected 5 nop instruction if needed. There's no need to take text_mutex, because we are using it in early init call which is called at pre-smp time. Fixes: ceea991a019c ("bpf: Move bpf_dispatcher function out of ftrace locations") Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-10-20perf/x86/intel/lbr: Use setup_clear_cpu_cap() instead of clear_cpu_cap()Maxim Levitsky1-1/+1
clear_cpu_cap(&boot_cpu_data) is very similar to setup_clear_cpu_cap() except that the latter also sets a bit in 'cpu_caps_cleared' which later clears the same cap in secondary cpus, which is likely what is meant here. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Signed-off-by: Maxim Levitsky <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kan Liang <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-10-20ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()Peter Zijlstra2-9/+15
Different function signatures means they needs to be different functions; otherwise CFI gets upset. As triggered by the ftrace boot tests: [] CFI failure at ftrace_return_to_handler+0xac/0x16c (target: ftrace_stub+0x0/0x14; expected type: 0x0a5d5347) Fixes: 3c516f89e17e ("x86: Add support for CONFIG_CFI_CLANG") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-10-20x86/ftrace: Remove ftrace_epilogue()Peter Zijlstra1-15/+6
Remove the weird jumps to RET and simply use RET. This then promotes ftrace_stub() to a real function; which becomes important for kcfi. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Peter Zijlstra <[email protected]>
2022-10-18x86/resctrl: Fix min_cbm_bits for AMDBabu Moger1-6/+2
AMD systems support zero CBM (capacity bit mask) for cache allocation. That is reflected in rdt_init_res_defs_amd() by: r->cache.arch_has_empty_bitmaps = true; However given the unified code in cbm_validate(), checking for: val == 0 && !arch_has_empty_bitmaps is not enough because of another check in cbm_validate(): if ((zero_bit - first_bit) < r->cache.min_cbm_bits) The default value of r->cache.min_cbm_bits = 1. Leading to: $ cd /sys/fs/resctrl $ mkdir foo $ cd foo $ echo L3:0=0 > schemata -bash: echo: write error: Invalid argument $ cat /sys/fs/resctrl/info/last_cmd_status Need at least 1 bits in the mask Initialize the min_cbm_bits to 0 for AMD. Also, remove the default setting of min_cbm_bits and initialize it separately. After the fix: $ cd /sys/fs/resctrl $ mkdir foo $ cd foo $ echo L3:0=0 > schemata $ cat /sys/fs/resctrl/info/last_cmd_status ok Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps") Co-developed-by: Stephane Eranian <[email protected]> Signed-off-by: Stephane Eranian <[email protected]> Signed-off-by: Babu Moger <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Reviewed-by: James Morse <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]
2022-10-18powerpc/64s/interrupt: Perf NMI should not take normal exit pathNicholas Piggin2-7/+21
NMI interrupts should exit with EXCEPTION_RESTORE_REGS not with interrupt_return_srr, which is what the perf NMI handler currently does. This breaks if a PMI hits after interrupt_exit_user_prepare_main() has switched the context tracking to user mode, then the CT_WARN_ON() in interrupt_exit_kernel_prepare() fires because it returns to kernel with context set to user. This could possibly be solved by soft-disabling PMIs in the exit path, but that reduces our ability to profile that code. The warning could be removed, but it's potentially useful. All other NMIs and soft-NMIs return using EXCEPTION_RESTORE_REGS, so this makes perf interrupts consistent with that and seems like the best fix. Signed-off-by: Nicholas Piggin <[email protected]> [mpe: Squash in fixups from Nick] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18powerpc/64/interrupt: Prevent NMI PMI causing a dangerous warningNicholas Piggin2-3/+16
NMI PMIs really should not return using the normal interrupt_return function. If such a PMI hits in code returning to user with the context switched to user mode, this warning can fire. This was enough to cause crashes when reproducing on 64s, because another perf interrupt would hit while reporting bug, and that would cause another bug, and so on until smashing the stack. Work around that particular crash for now by just disabling that context warning for PMIs. This is a hack and not a complete fix, there could be other such problems lurking in corners. But it does fix the known crash. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18KVM: PPC: BookS PR-KVM and BookE do not support context trackingNicholas Piggin1-0/+4
The context tracking code in PR-KVM and BookE implementations is not complete, and can cause host crashes if context tracking is enabled. Make these implementations depend on !CONTEXT_TRACKING_USER. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18powerpc: Fix reschedule bug in KUAP-unlocked user copyNicholas Piggin1-1/+11
schedule must not be explicitly called while KUAP is unlocked, because the AMR register will not be saved across the context switch on 64s (preemption is allowed because that is driven by interrupts which do save the AMR). exit_vmx_usercopy() runs inside an unlocked user access region, and it calls preempt_enable() which will call schedule() if need_resched() was set while non-preemptible. This can cause tasks to run unprotected when the should not, and can cause the user copy to be improperly blocked when scheduling back to it. Fix this by avoiding the explicit resched for preempt kernels by generating an interrupt to reschedule the context if need_resched() got set. Reported-by: Samuel Holland <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18powerpc/64s: Fix hash__change_memory_range preemption warningNicholas Piggin1-3/+5
stop_machine_cpuslocked takes a mutex so it must be called in a preemptible context, so it can't simply be fixed by disabling preemption. This is not a bug, because CPU hotplug is locked, so this processor will call in to the stop machine function. So raw_smp_processor_id() could be used. This leaves a small chance that this thread will be migrated to another CPU, so the master work would be done by a CPU from a different context. Better for test coverage to make that a common case by just having the first CPU to call in become the master. Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18powerpc/64s: Disable preemption in hash lazy mmu modeNicholas Piggin1-0/+6
apply_to_page_range on kernel pages does not disable preemption, which is a requirement for hash's lazy mmu mode, which keeps track of the TLBs to flush with a per-cpu array. Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18powerpc/64s: make linear_map_hash_lock a raw spinlockNicholas Piggin1-6/+6
This lock is taken while the raw kfence_freelist_lock is held, so it must also be a raw spinlock, as reported by lockdep when raw lock nesting checking is enabled. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18powerpc/64s: make HPTE lock and native_tlbie_lock irq-safeNicholas Piggin1-2/+25
With kfence enabled, there are several cases where HPTE and TLBIE locks are called from softirq context, for example: WARNING: inconsistent lock state 6.0.0-11845-g0cbbc95b12ac #1 Tainted: G N -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes: c000000002734de8 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600 {IN-SOFTIRQ-W} state was registered at: .lock_acquire+0x20c/0x520 ._raw_spin_lock+0x4c/0x70 .native_hpte_invalidate+0x62c/0x840 .hash__kernel_map_pages+0x450/0x640 .kfence_protect+0x58/0xc0 .kfence_guarded_free+0x374/0x5a0 .__slab_free+0x3d0/0x630 .put_cred_rcu+0xcc/0x120 .rcu_core+0x3c4/0x14e0 .__do_softirq+0x1dc/0x7dc .do_softirq_own_stack+0x40/0x60 Fix this by consistently disabling irqs while taking either of these locks. Don't just disable bh because several of the more common cases already disable irqs, so this just makes the locks always irq-safe. Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18powerpc/64s: Add lockdep for HPTE lockNicholas Piggin1-7/+35
Add lockdep annotation for the HPTE bit-spinlock. Modern systems don't take the tlbie lock, so this shows up some of the same lockdep warnings that were being reported by the ppc970. And they're not taken in exactly the same places so this is nice to have in its own right. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPUHaren Myneni3-14/+45
The hypervisor assigns VAS (Virtual Accelerator Switchboard) windows depends on cores configured in LPAR. The kernel uses OF reconfig notifier to reconfig VAS windows for DLPAR CPU event. In the case of shared CPU mode partition, the hypervisor assigns VAS windows depends on CPU entitled capacity, not based on vcpus. When the user changes CPU entitled capacity for the partition, drmgr uses /proc/ppc64/lparcfg interface to notify the kernel. This patch adds the following changes to update VAS resources for shared mode: - Call vas reconfig windows from lparcfg_write() - Ignore reconfig changes in the VAS notifier Signed-off-by: Haren Myneni <[email protected]> [mpe: Rework error handling, report any errors as EIO] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18powerpc/pseries/vas: Add VAS IRQ primary handlerHaren Myneni2-7/+34
irq_default_primary_handler() can be used only with IRQF_ONESHOT flag, but the flag disables IRQ before executing the thread handler and enables it after the interrupt is handled. But this IRQ disable sets the VAS IRQ OFF state in the hypervisor. In case if NX faults during this window, the hypervisor will not deliver the fault interrupt to the partition and the user space may wait continuously for the CSB update. So use VAS specific IRQ handler instead of calling the default primary handler. Increment pending_faults counter in IRQ handler and the bottom thread handler will process all faults based on this counter. In case if the another interrupt is received while the thread is running, it will be processed using this counter. The synchronization of top and bottom handlers will be done with IRQTF_RUNTHREAD flag and will re-enter to bottom half if this flag is set. Signed-off-by: Haren Myneni <[email protected]> Reviewed-by: Frederic Barrat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-18x86/microcode/AMD: Apply the patch early on every logical threadBorislav Petkov1-3/+13
Currently, the patch application logic checks whether the revision needs to be applied on each logical CPU (SMT thread). Therefore, on SMT designs where the microcode engine is shared between the two threads, the application happens only on one of them as that is enough to update the shared microcode engine. However, there are microcode patches which do per-thread modification, see Link tag below. Therefore, drop the revision check and try applying on each thread. This is what the BIOS does too so this method is very much tested. Btw, change only the early paths. On the late loading paths, there's no point in doing per-thread modification because if is it some case like in the bugzilla below - removing a CPUID flag - the kernel cannot go and un-use features it has detected are there early. For that, one should use early loading anyway. [ bp: Fixes does not contain the oldest commit which did check for equality but that is good enough. ] Fixes: 8801b3fcb574 ("x86/microcode/AMD: Rework container parsing") Reported-by: Ștefan Talpalaru <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Tested-by: Ștefan Talpalaru <[email protected]> Cc: <[email protected]> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
2022-10-17ARC: mm: fix leakage of memory allocated for PTEPavel Kozlov1-1/+1
Since commit d9820ff ("ARC: mm: switch pgtable_t back to struct page *") a memory leakage problem occurs. Memory allocated for page table entries not released during process termination. This issue can be reproduced by a small program that allocates a large amount of memory. After several runs, you'll see that the amount of free memory has reduced and will continue to reduce after each run. All ARC CPUs are effected by this issue. The issue was introduced since the kernel stable release v5.15-rc1. As described in commit d9820ff after switch pgtable_t back to struct page *, a pointer to "struct page" and appropriate functions are used to allocate and free a memory page for PTEs, but the pmd_pgtable macro hasn't changed and returns the direct virtual address from the PMD (PGD) entry. Than this address used as a parameter in the __pte_free() and as a result this function couldn't release memory page allocated for PTEs. Fix this issue by changing the pmd_pgtable macro and returning pointer to struct page. Fixes: d9820ff76f95 ("ARC: mm: switch pgtable_t back to struct page *") Cc: Mike Rapoport <[email protected]> Cc: <[email protected]> # 5.15.x Signed-off-by: Pavel Kozlov <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2022-10-17arc: update config filesLukas Bulwahn13-33/+2
Clean up config files by: - removing configs that were deleted in the past - removing configs not in tree and without recently pending patches - adding new configs that are replacements for old configs in the file For some detailed information, see Link. Link: https://lore.kernel.org/kernel-janitors/[email protected]/ Signed-off-by: Lukas Bulwahn <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2022-10-17arc: iounmap() arg is volatileRandy Dunlap2-2/+2
Add 'volatile' to iounmap()'s argument to prevent build warnings. This make it the same as other major architectures. Placates these warnings: (12 such warnings) ../drivers/video/fbdev/riva/fbdev.c: In function 'rivafb_probe': ../drivers/video/fbdev/riva/fbdev.c:2067:42: error: passing argument 1 of 'iounmap' discards 'volatile' qualifier from pointer target type [-Werror=discarded-qualifiers] 2067 | iounmap(default_par->riva.PRAMIN); Fixes: 1162b0701b14b ("ARC: I/O and DMA Mappings") Signed-off-by: Randy Dunlap <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: [email protected] Cc: Arnd Bergmann <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2022-10-17arc: dts: Harmonize EHCI/OHCI DT nodes nameSerge Semin5-9/+9
In accordance with the Generic EHCI/OHCI bindings the corresponding node name is suppose to comply with the Generic USB HCD DT schema, which requires the USB nodes to have the name acceptable by the regexp: "^usb(@.*)?" . Make sure the "generic-ehci" and "generic-ohci"-compatible nodes are correctly named. Signed-off-by: Serge Semin <[email protected]> Acked-by: Alexey Brodkin <[email protected]> Acked-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2022-10-17ARC: bitops: Change __fls to return unsigned longAmadeusz Sławiński1-2/+2
As per asm-generic definition and other architectures __fls should return unsigned long. No functional change is expected as return value should fit in unsigned long. Reviewed-by: Cezary Rojewski <[email protected]> Signed-off-by: Amadeusz Sławiński <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2022-10-17ARC: Fix comment typoZhang Jiaming1-1/+1
Change 'seperate' to 'separate'. Signed-off-by: Zhang Jiaming <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2022-10-17ARC: Fix comment typoJilin Yuan2-3/+3
- Remove one of the repeated 'call' in comment line 396. - Delete the redundant word 'to', 'since' Signed-off-by: Jilin Yuan <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2022-10-17x86/topology: Fix duplicated core ID within a packageZhang Rui1-1/+1
Today, core ID is assumed to be unique within each package. But an AlderLake-N platform adds a Module level between core and package, Linux excludes the unknown modules bits from the core ID, resulting in duplicate core ID's. To keep core ID unique within a package, Linux must include all APIC-ID bits for known or unknown levels above the core and below the package in the core ID. It is important to understand that core ID's have always come directly from the APIC-ID encoding, which comes from the BIOS. Thus there is no guarantee that they start at 0, or that they are contiguous. As such, naively using them for array indexes can be problematic. [ dhansen: un-known -> unknown ] Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support") Suggested-by: Len Brown <[email protected]> Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Len Brown <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2022-10-17x86/topology: Fix multiple packages shown on a single-package systemZhang Rui1-4/+10
CPUID.1F/B does not enumerate Package level explicitly, instead, all the APIC-ID bits above the enumerated levels are assumed to be package ID bits. Current code gets package ID by shifting out all the APIC-ID bits that Linux supports, rather than shifting out all the APIC-ID bits that CPUID.1F enumerates. This introduces problems when CPUID.1F enumerates a level that Linux does not support. For example, on a single package AlderLake-N, there are 2 Ecore Modules with 4 atom cores in each module. Linux does not support the Module level and interprets the Module ID bits as package ID and erroneously reports a multi module system as a multi-package system. Fix this by using APIC-ID bits above all the CPUID.1F enumerated levels as package ID. [ dhansen: spelling fix ] Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support") Suggested-by: Len Brown <[email protected]> Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Len Brown <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2022-10-17x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUBNathan Chancellor1-1/+0
A recent change in LLVM made CONFIG_EFI_STUB unselectable because it no longer pretends to support -mabi=ms, breaking the dependency in Kconfig. Lack of CONFIG_EFI_STUB can prevent kernels from booting via EFI in certain circumstances. This check was added by 8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi") to ensure that __attribute__((ms_abi)) was available, as -mabi=ms is not actually used in any cflags. According to the GCC documentation, this attribute has been supported since GCC 4.4.7. The kernel currently requires GCC 5.1 so this check is not necessary; even when that change landed in 5.6, the kernel required GCC 4.9 so it was unnecessary then as well. Clang supports __attribute__((ms_abi)) for all versions that are supported for building the kernel so no additional check is needed. Remove the 'depends on' line altogether to allow CONFIG_EFI_STUB to be selected when CONFIG_EFI is enabled, regardless of compiler. Fixes: 8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi") Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Cc: [email protected] Link: https://github.com/llvm/llvm-project/commit/d1ad006a8f64bdc17f618deffa9e7c91d82c444d
2022-10-17x86/fpu: Exclude dynamic states from init_fpstateChang S. Bae1-3/+6
== Background == The XSTATE init code initializes all enabled and supported components. Then, the init states are saved in the init_fpstate buffer that is statically allocated in about one page. The AMX TILE_DATA state is large (8KB) but its init state is zero. And the feature comes only with the compacted format with these established dependencies: AMX->XFD->XSAVES. So this state is excludable from init_fpstate. == Problem == But the buffer is formatted to include that large state. Then, this can be the cause of a noisy splat like the below. This came from XRSTORS for the task with init_fpstate in its XSAVE buffer. It is reproducible on AMX systems when the running kernel is built with CONFIG_DEBUG_PAGEALLOC=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y: Bad FPU state detected at restore_fpregs_from_fpstate+0x57/0xd0, reinitializing FPU registers. ... RIP: 0010:restore_fpregs_from_fpstate+0x57/0xd0 ? restore_fpregs_from_fpstate+0x45/0xd0 switch_fpu_return+0x4e/0xe0 exit_to_user_mode_prepare+0x17b/0x1b0 syscall_exit_to_user_mode+0x29/0x40 do_syscall_64+0x67/0x80 ? do_syscall_64+0x67/0x80 ? exc_page_fault+0x86/0x180 entry_SYSCALL_64_after_hwframe+0x63/0xcd == Solution == Adjust init_fpstate to exclude dynamic states. XRSTORS from init_fpstate still initializes those states when their bits are set in the requested-feature bitmap. Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode") Reported-by: Lin X Wang <[email protected]> Signed-off-by: Chang S. Bae <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Lin X Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-17x86/fpu: Fix the init_fpstate size check with the actual sizeChang S. Bae1-18/+6
The init_fpstate buffer is statically allocated. Thus, the sanity test was established to check whether the pre-allocated buffer is enough for the calculated size or not. The currently measured size is not strictly relevant. Fix to validate the calculated init_fpstate size with the pre-allocated area. Also, replace the sanity check function with open code for clarity. The abstraction itself and the function naming do not tend to represent simply what it does. Fixes: 2ae996e0c1a3 ("x86/fpu: Calculate the default sizes independently") Signed-off-by: Chang S. Bae <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-17x86/fpu: Configure init_fpstate attributes orderlyChang S. Bae2-9/+5
The init_fpstate setup code is spread out and out of order. The init image is recorded before its scoped features and the buffer size are determined. Determine the scope of init_fpstate components and its size before recording the init state. Also move the relevant code together. Signed-off-by: Chang S. Bae <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: [email protected] Link: https://lore.kernel.org/r/[email protected]
2022-10-16Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds23-26/+26
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
2022-10-15Merge tag 'for-linus' of https://github.com/openrisc/linuxLinus Torvalds1-8/+8
Pull OpenRISC updates from Stafford Horne: "I have relocated to London so not much work from me while I get settled. Still, OpenRISC picked up two patches in this window: - Fix for kernel page table walking from Jann Horn - MAINTAINER entry cleanup from Palmer Dabbelt" * tag 'for-linus' of https://github.com/openrisc/linux: MAINTAINERS: git://github -> https://github.com for openrisc openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached