aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/apic/apic.c
AgeCommit message (Collapse)AuthorFilesLines
2023-04-07x86/apic: Fix atomic update of offset in reserve_eilvt_offset()Uros Bizjak1-3/+2
The detection of atomic update failure in reserve_eilvt_offset() is not correct. The value returned by atomic_cmpxchg() should be compared to the old value from the location to be updated. If these two are the same, then atomic update succeeded and "eilvt_offsets[offset]" location is updated to "new" in an atomic way. Otherwise, the atomic update failed and it should be retried with the value from "eilvt_offsets[offset]" - exactly what atomic_try_cmpxchg() does in a correct and more optimal way. Fixes: a68c439b1966c ("apic, x86: Check if EILVT APIC registers are available (AMD only)") Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-02x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOSMateusz Jończyk1-5/+8
A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on platforms that have x2APIC already enabled in the BIOS before starting the kernel. The kernel was supposed to panic with an approprite error message in validate_x2apic() due to the missing X2APIC support. However, validate_x2apic() was run too late in the boot cycle, and the kernel tried to initialize the APIC nonetheless. This resulted in an earlier panic in setup_local_APIC() because the APIC was not registered. In my experiments, a panic message in setup_local_APIC() was not visible in the graphical console, which resulted in a hang with no indication what has gone wrong. Instead of calling panic(), disable the APIC, which results in a somewhat working system with the PIC only (and no SMP). This way the user is able to diagnose the problem more easily. Disabling X2APIC mode is not an option because it's impossible on systems with locked x2APIC. The proper place to disable the APIC in this case is in check_x2apic(), which is called early from setup_arch(). Doing this in __apic_intr_mode_select() is too late. Make check_x2apic() unconditionally available and remove the empty stub. Reported-by: Paul Menzel <[email protected]> Reported-by: Robert Elliott (Servers) <[email protected]> Signed-off-by: Mateusz Jończyk <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Link: https://lore.kernel.org/lkml/[email protected] Link: https://lore.kernel.org/all/[email protected]
2022-08-31x86/apic: Don't disable x2APIC if lockedDaniel Sneddon1-4/+40
The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC (or x2APIC). X2APIC mode is mostly compatible with legacy APIC, but it disables the memory-mapped APIC interface in favor of one that uses MSRs. The APIC mode is controlled by the EXT bit in the APIC MSR. The MMIO/xAPIC interface has some problems, most notably the APIC LEAK [1]. This bug allows an attacker to use the APIC MMIO interface to extract data from the SGX enclave. Introduce support for a new feature that will allow the BIOS to lock the APIC in x2APIC mode. If the APIC is locked in x2APIC mode and the kernel tries to disable the APIC or revert to legacy APIC mode a GP fault will occur. Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by preventing the kernel from trying to disable the x2APIC. On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS. If legacy APIC is required, then it SGX and TDX need to be disabled in the BIOS. [1]: https://aepicleak.com/aepicleak.pdf Signed-off-by: Daniel Sneddon <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Dave Hansen <[email protected]> Tested-by: Neelima Krishnan <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-08-07Merge tag 'mm-nonmm-stable-2022-08-06-2' of ↵Linus Torvalds1-5/+0
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc updates from Andrew Morton: "Updates to various subsystems which I help look after. lib, ocfs2, fatfs, autofs, squashfs, procfs, etc. A relatively small amount of material this time" * tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits) scripts/gdb: ensure the absolute path is generated on initial source MAINTAINERS: kunit: add David Gow as a maintainer of KUnit mailmap: add linux.dev alias for Brendan Higgins mailmap: update Kirill's email profile: setup_profiling_timer() is moslty not implemented ocfs2: fix a typo in a comment ocfs2: use the bitmap API to simplify code ocfs2: remove some useless functions lib/mpi: fix typo 'the the' in comment proc: add some (hopefully) insightful comments bdi: remove enum wb_congested_state kernel/hung_task: fix address space of proc_dohung_task_timeout_secs lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t() squashfs: support reading fragments in readahead call squashfs: implement readahead squashfs: always build "file direct" version of page actor Revert "squashfs: provide backing_dev_info in order to disable read-ahead" fs/ocfs2: Fix spelling typo in comment ia64: old_rr4 added under CONFIG_HUGETLB_PAGE proc: fix test for "vsyscall=xonly" boot option ...
2022-07-29profile: setup_profiling_timer() is moslty not implementedBen Dooks1-5/+0
The setup_profiling_timer() is mostly un-implemented by many architectures. In many places it isn't guarded by CONFIG_PROFILE which is needed for it to be used. Make it a weak symbol in kernel/profile.c and remove the 'return -EINVAL' implementations from the kenrel. There are a couple of architectures which do return 0 from the setup_profiling_timer() function but they don't seem to do anything else with it. To keep the /proc compatibility for now, leave these for a future update or removal. On ARM, this fixes the following sparse warning: arch/arm/kernel/smp.c:793:5: warning: symbol 'setup_profiling_timer' was not declared. Should it be static? Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ben Dooks <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-06-24KVM: x86: lapic: Rename [GET/SET]_APIC_DEST_FIELD to [GET/SET]_XAPIC_DEST_FIELDSuravee Suthikulpanit1-1/+1
To signify that the macros only support 8-bit xAPIC destination ID. Suggested-by: Maxim Levitsky <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Signed-off-by: Suravee Suthikulpanit <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-05-23Merge tag 'x86_misc_for_v5.19_rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Borislav Petkov: "A variety of fixes which don't fit any other tip bucket: - Remove unnecessary function export - Correct asm constraint - Fix __setup handlers retval" * tag 'x86_misc_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Cleanup the control_va_addr_alignment() __setup handler x86: Fix return value of __setup handlers x86/delay: Fix the wrong asm constraint in delay_loop() x86/amd_nb: Unexport amd_cache_northbridges()
2022-05-23Merge tag 'x86_apic_for_v5.19_rc1' of ↵Linus Torvalds1-4/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Borislav Petkov: - Always do default APIC routing setup so that cpumasks are properly allocated and are present when later accessed ("nosmp" and x2APIC) - Clarify the bit overlap between an old APIC and a modern, integrated one * tag 'x86_apic_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Do apic driver probe for "nosmp" use case x86/apic: Clarify i82489DX bit overlap in APIC_LVT0
2022-05-04x86: Fix return value of __setup handlersRandy Dunlap1-1/+1
__setup() handlers should return 1 to obsolete_checksetup() in init/main.c to indicate that the boot option has been handled. A return of 0 causes the boot option/value to be listed as an Unknown kernel parameter and added to init's (limited) argument (no '=') or environment (with '=') strings. So return 1 from these x86 __setup handlers. Examples: Unknown kernel command line parameters "apicpmtimer BOOT_IMAGE=/boot/bzImage-517rc8 vdso=1 ring3mwait=disable", will be passed to user space. Run /sbin/init as init process with arguments: /sbin/init apicpmtimer with environment: HOME=/ TERM=linux BOOT_IMAGE=/boot/bzImage-517rc8 vdso=1 ring3mwait=disable Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Fixes: 77b52b4c5c66 ("x86: add "debugpat" boot option") Fixes: e16fd002afe2 ("x86/cpufeature: Enable RING3MWAIT for Knights Landing") Fixes: b8ce33590687 ("x86_64: convert to clock events") Reported-by: Igor Zhbanov <[email protected]> Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected]
2022-05-01x86/apic: Do apic driver probe for "nosmp" use caseKuppuswamy Sathyanarayanan1-3/+2
For the "nosmp" use case, the APIC initialization code selects "APIC_SYMMETRIC_IO_NO_ROUTING" as the default interrupt mode and avoids probing APIC drivers. This works well for the default APIC modes, but for the x2APIC case the probe function is required to allocate the cluster_hotplug mask. So in the APIC_SYMMETRIC_IO_NO_ROUTING case when the x2APIC is initialized it dereferences a NULL pointer and the kernel crashes. This was observed on a TDX platform where x2APIC is enabled and "nosmp" command line option is allowed. To fix this issue, probe APIC drivers via default_setup_apic_routing() for the APIC_SYMMETRIC_IO_NO_ROUTING interrupt mode too. Suggested-by: Kirill A. Shutemov <[email protected]> Suggested-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Tony Luck <[email protected]> Link: https://lore.kernel.org/r/a64f864e1114bcd63593286aaf61142cfce384ea.1650076869.git.sathyanarayanan.kuppuswamy@intel.com
2022-04-13x86/apic: Clarify i82489DX bit overlap in APIC_LVT0Thomas Gleixner1-1/+10
Daniel stumbled over the bit overlap of the i82498DX external APIC and the TSC deadline timer configuration bit in modern APICs, which is neither documented in the code nor in the current SDM. Maciej provided links to the original i82489DX/486 documentation. See Link. Remove the i82489DX macro maze, use a i82489DX specific define in the apic code and document the overlap in a comment. Reported-by: Daniel Vacek <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Maciej W. Rozycki <[email protected]> Link: https://lore.kernel.org/r/87ee22f3ci.ffs@tglx
2022-04-07x86/acpi/x86/boot: Add multiprocessor wake-up supportKuppuswamy Sathyanarayanan1-0/+10
Secondary CPU startup is currently performed with something called the "INIT/SIPI protocol". This protocol requires assistance from VMMs to boot guests. As should be a familiar story by now, that support can not be provded to TDX guests because TDX VMMs are not trusted by guests. To remedy this situation a new[1] "Multiprocessor Wakeup Structure" has been added to to an existing ACPI table (MADT). This structure provides the physical address of a "mailbox". A write to the mailbox then steers the secondary CPU to the boot code. Add ACPI MADT wake structure parsing support and wake support. Use this support to wake CPUs whenever it is present instead of INIT/SIPI. While this structure can theoretically be used on 32-bit kernels, there are no 32-bit TDX guest kernels. It has not been tested and can not practically *be* tested on 32-bit. Make it 64-bit only. 1. Details about the new structure can be found in ACPI v6.4, in the "Multiprocessor Wakeup Structure" section. Co-developed-by: Sean Christopherson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Reviewed-by: Rafael J. Wysocki <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-10x86: Add support for 0x22/0x23 port I/O configuration spaceMaciej W. Rozycki1-6/+3
Define macros and accessors for the configuration space addressed indirectly with an index register and a data register at the port I/O locations of 0x22 and 0x23 respectively. This space is defined by the Intel MultiProcessor Specification for the IMCR register used to switch between the PIC and the APIC mode[1], by Cyrix processors for their configuration[2][3], and also some chipsets. Given the lack of atomicity with the indirect addressing a spinlock is required to protect accesses, although for Cyrix processors it is enough if accesses are executed with interrupts locally disabled, because the registers are local to the accessing CPU, and IMCR is only ever poked at by the BSP and early enough for interrupts not to have been configured yet. Therefore existing code does not have to change or use the new spinlock and neither it does. Put the spinlock in a library file then, so that it does not get pulled unnecessarily for configurations that do not refer it. Convert Cyrix accessors to wrappers so as to retain the brevity and clarity of the `getCx86' and `setCx86' calls. References: [1] "MultiProcessor Specification", Version 1.4, Intel Corporation, Order Number: 242016-006, May 1997, Section 3.6.2.1 "PIC Mode", pp. 3-7, 3-8 [2] "5x86 Microprocessor", Cyrix Corporation, Order Number: 94192-00, July 1995, Section 2.3.2.4 "Configuration Registers", p. 2-23 [3] "6x86 Processor", Cyrix Corporation, Order Number: 94175-01, March 1996, Section 2.4.4 "6x86 Configuration Registers", p. 2-23 Signed-off-by: Maciej W. Rozycki <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-05-29x86/apic: Mark _all_ legacy interrupts when IO/APIC is missingThomas Gleixner1-0/+1
PIC interrupts do not support affinity setting and they can end up on any online CPU. Therefore, it's required to mark the associated vectors as system-wide reserved. Otherwise, the corresponding irq descriptors are copied to the secondary CPUs but the vectors are not marked as assigned or reserved. This works correctly for the IO/APIC case. When the IO/APIC is disabled via config, kernel command line or lack of enumeration then all legacy interrupts are routed through the PIC, but nothing marks them as system-wide reserved vectors. As a consequence, a subsequent allocation on a secondary CPU can result in allocating one of these vectors, which triggers the BUG() in apic_update_vector() because the interrupt descriptor slot is not empty. Imran tried to work around that by marking those interrupts as allocated when a CPU comes online. But that's wrong in case that the IO/APIC is available and one of the legacy interrupts, e.g. IRQ0, has been switched to PIC mode because then marking them as allocated will fail as they are already marked as system vectors. Stay consistent and update the legacy vectors after attempting IO/APIC initialization and mark them as system vectors in case that no IO/APIC is available. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Imran Khan <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2021-03-21Merge branch 'linus' into x86/cleanups, to resolve conflictIngo Molnar1-0/+5
Conflicts: arch/x86/kernel/kprobes/ftrace.c Signed-off-by: Ingo Molnar <[email protected]>
2021-03-19x86/apic/of: Fix CPU devicetree-node lookupsJohan Hovold1-0/+5
Architectures that describe the CPU topology in devicetree and do not have an identity mapping between physical and logical CPU ids must override the default implementation of arch_match_cpu_phys_id(). Failing to do so breaks CPU devicetree-node lookups using of_get_cpu_node() and of_cpu_device_node_get() which several drivers rely on. It also causes the CPU struct devices exported through sysfs to point to the wrong devicetree nodes. On x86, CPUs are described in devicetree using their APIC ids and those do not generally coincide with the logical ids, even if CPU0 typically uses APIC id 0. Add the missing implementation of arch_match_cpu_phys_id() so that CPU-node lookups work also with SMP. Apart from fixing the broken sysfs devicetree-node links this likely does not affect current users of mainline kernels on x86. Fixes: 4e07db9c8db8 ("x86/devicetree: Use CPU description from Device Tree") Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-03-18x86: Fix various typos in commentsIngo Molnar1-5/+5
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: [email protected]
2021-02-24Merge tag 'x86-entry-2021-02-24' of ↵Linus Torvalds1-12/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq entry updates from Thomas Gleixner: "The irq stack switching was moved out of the ASM entry code in course of the entry code consolidation. It ended up being suboptimal in various ways. This reworks the X86 irq stack handling: - Make the stack switching inline so the stackpointer manipulation is not longer at an easy to find place. - Get rid of the unnecessary indirect call. - Avoid the double stack switching in interrupt return and reuse the interrupt stack for softirq handling. - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused about the stack pointer manipulation" * tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix stack-swizzle for FRAME_POINTER=y um: Enforce the usage of asm-generic/softirq_stack.h x86/softirq/64: Inline do_softirq_own_stack() softirq: Move do_softirq_own_stack() to generic asm header softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK x86/softirq: Remove indirection in do_softirq_own_stack() x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall x86/entry: Convert device interrupts to inline stack switching x86/entry: Convert system vectors to irq stack macro x86/irq: Provide macro for inlining irq stack switching x86/apic: Split out spurious handling code x86/irq/64: Adjust the per CPU irq stack pointer by 8 x86/irq: Sanitize irq stack tracking x86/entry: Fix instrumentation annotation
2021-02-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull KVM updates from Paolo Bonzini: "x86: - Support for userspace to emulate Xen hypercalls - Raise the maximum number of user memslots - Scalability improvements for the new MMU. Instead of the complex "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent, but the code that can run against page faults is limited. Right now only page faults take the lock for reading; in the future this will be extended to some cases of page table destruction. I hope to switch the default MMU around 5.12-rc3 (some testing was delayed due to Chinese New Year). - Cleanups for MAXPHYADDR checks - Use static calls for vendor-specific callbacks - On AMD, use VMLOAD/VMSAVE to save and restore host state - Stop using deprecated jump label APIs - Workaround for AMD erratum that made nested virtualization unreliable - Support for LBR emulation in the guest - Support for communicating bus lock vmexits to userspace - Add support for SEV attestation command - Miscellaneous cleanups PPC: - Support for second data watchpoint on POWER10 - Remove some complex workarounds for buggy early versions of POWER9 - Guest entry/exit fixes ARM64: - Make the nVHE EL2 object relocatable - Cleanups for concurrent translation faults hitting the same page - Support for the standard TRNG hypervisor call - A bunch of small PMU/Debug fixes - Simplification of the early init hypercall handling Non-KVM changes (with acks): - Detection of contended rwlocks (implemented only for qrwlocks, because KVM only needs it for x86) - Allow __DISABLE_EXPORTS from assembly code - Provide a saner follow_pfn replacements for modules" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits) KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes KVM: selftests: Don't bother mapping GVA for Xen shinfo test KVM: selftests: Fix hex vs. decimal snafu in Xen test KVM: selftests: Fix size of memslots created by Xen tests KVM: selftests: Ignore recently added Xen tests' build output KVM: selftests: Add missing header file needed by xAPIC IPI tests KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static locking/arch: Move qrwlock.h include after qspinlock.h KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2 KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path KVM: PPC: remove unneeded semicolon KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest KVM: PPC: Book3S HV: Fix radix guest SLB side channel KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR ...
2021-02-12Merge branch 'x86/paravirt' into x86/entryIngo Molnar1-0/+4
Merge in the recent paravirt changes to resolve conflicts caused by objtool annotations. Conflicts: arch/x86/xen/xen-asm.S Signed-off-by: Ingo Molnar <[email protected]>
2021-02-10x86/apic: Split out spurious handling codeThomas Gleixner1-12/+19
sysvec_spurious_apic_interrupt() calls into the handling body of __spurious_interrupt() which is not obvious as that function is declared inside the DEFINE_IDTENTRY_IRQ(spurious_interrupt) macro. As __spurious_interrupt() is currently always inlined this ends up with two copies of the same code for no reason. Split the handling function out and invoke it from both entry points. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-02-04x86/apic: Add extra serialization for non-serializing MSRsDave Hansen1-0/+4
Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for MFENCE; LFENCE. Short summary: we have special MSRs that have weaker ordering than all the rest. Add fencing consistent with current SDM recommendations. This is not known to cause any issues in practice, only in theory. Longer story below: The reason the kernel uses a different semantic is that the SDM changed (roughly in late 2017). The SDM changed because folks at Intel were auditing all of the recommended fences in the SDM and realized that the x2apic fences were insufficient. Why was the pain MFENCE judged insufficient? WRMSR itself is normally a serializing instruction. No fences are needed because the instruction itself serializes everything. But, there are explicit exceptions for this serializing behavior written into the WRMSR instruction documentation for two classes of MSRs: IA32_TSC_DEADLINE and the X2APIC MSRs. Back to x2apic: WRMSR is *not* serializing in this specific case. But why is MFENCE insufficient? MFENCE makes writes visible, but only affects load/store instructions. WRMSR is unfortunately not a load/store instruction and is unaffected by MFENCE. This means that a non-serializing WRMSR could be reordered by the CPU to execute before the writes made visible by the MFENCE have even occurred in the first place. This means that an x2apic IPI could theoretically be triggered before there is any (visible) data to process. Does this affect anything in practice? I honestly don't know. It seems quite possible that by the time an interrupt gets to consume the (not yet) MFENCE'd data, it has become visible, mostly by accident. To be safe, add the SDM-recommended fences for all x2apic WRMSRs. This also leaves open the question of the _other_ weakly-ordered WRMSR: MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as the x2APIC MSRs, it seems substantially less likely to be a problem in practice. While writes to the in-memory Local Vector Table (LVT) might theoretically be reordered with respect to a weakly-ordered WRMSR like TSC_DEADLINE, the SDM has this to say: In x2APIC mode, the WRMSR instruction is used to write to the LVT entry. The processor ensures the ordering of this write and any subsequent WRMSR to the deadline; no fencing is required. But, that might still leave xAPIC exposed. The safest thing to do for now is to add the extra, recommended LFENCE. [ bp: Massage commit message, fix typos, drop accidentally added newline to tools/arch/x86/include/asm/barrier.h. ] Reported-by: Jan Kiszka <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-02-04x86/apic: Export x2apic_mode for use by KVM in "warm" pathSean Christopherson1-0/+1
Export x2apic_mode so that KVM can query whether x2APIC is active without having to incur the RDMSR in x2apic_enabled(). When Posted Interrupts are in use for a guest with an assigned device, KVM ends up checking for x2APIC at least once every time a vCPU halts. KVM could obviously snapshot x2apic_enabled() to avoid the RDMSR, but that's rather silly given that x2apic_mode holds the exact info needed by KVM. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-28x86/apic: Support 15 bits of APIC ID in MSI where availableDavid Woodhouse1-2/+24
Some hypervisors can allow the guest to use the Extended Destination ID field in the MSI address to address up to 32768 CPUs. This applies to all downstream devices which generate MSI cycles, including HPET, I/O-APIC and PCI MSI. HPET and PCI MSI use the same __irq_msi_compose_msg() function, while I/O-APIC generates its own and had support for the extended bits added in a previous commit. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/msi: Provide msi message shadow structsThomas Gleixner1-16/+19
Create shadow structs with named bitfields for msi_msg data, address_lo and address_hi and use them in the MSI message composer. Provide a function to retrieve the destination ID. This could be inline, but that'd create a circular header dependency. [dwmw2: fix bitfields not all to be a union] Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/apic: Always provide irq_compose_msi_msg() method for vector domainDavid Woodhouse1-0/+32
This shouldn't be dependent on PCI_MSI. HPET and I/O-APIC can deliver interrupts through MSI without having any PCI in the system at all. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/apic: Cleanup destination modeThomas Gleixner1-1/+1
apic::irq_dest_mode is actually a boolean, but defined as u32 and named in a way which does not explain what it means. Make it a boolean and rename it to 'dest_mode_logical' Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/apic: Get rid of apic:: Dest_logicalThomas Gleixner1-1/+1
struct apic has two members which store information about the destination mode: dest_logical and irq_dest_mode. dest_logical contains a mask which was historically used to set the destination mode in IPI messages. Over time the usage was reduced and the logical/physical functions were seperated. There are only a few places which still use 'dest_logical' but they can use 'irq_dest_mode' instead. irq_dest_mode is actually a boolean where 0 means physical destination mode and 1 means logical destination mode. Of course the name does not reflect the functionality. This will be cleaned up in a subsequent change. Remove apic::dest_logical and fixup the remaining users. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/apic: Fix x2apic enablement without interrupt remappingDavid Woodhouse1-6/+8
Currently, Linux as a hypervisor guest will enable x2apic only if there are no CPUs present at boot time with an APIC ID above 255. Hotplugging a CPU later with a higher APIC ID would result in a CPU which cannot be targeted by external interrupts. Add a filter in x2apic_apic_id_valid() which can be used to prevent such CPUs from coming online, and allow x2apic to be enabled even if they are present at boot time. Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM") Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-16x86/irq: Move apic_post_init() invocation to one placeThomas Gleixner1-0/+3
No point to call it from both 32bit and 64bit implementations of default_setup_apic_routing(). Move it to the caller. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-08-10Merge tag 'locking-urgent-2020-08-10' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "A set of locking fixes and updates: - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generally useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers" * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/seqlock, headers: Untangle the spaghetti monster locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header x86/headers: Remove APIC headers from <asm/smp.h> seqcount: More consistent seqprop names seqcount: Compress SEQCNT_LOCKNAME_ZERO() seqlock: Fold seqcount_LOCKNAME_init() definition seqlock: Fold seqcount_LOCKNAME_t definition seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g hrtimer: Use sequence counter with associated raw spinlock kvm/eventfd: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock vfs: Use sequence counter with associated spinlock timekeeping: Use sequence counter with associated raw spinlock xfrm: policy: Use sequence counters with associated lock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock netfilter: conntrack: Use sequence counter with associated spinlock sched: tasks: Use sequence counter with associated spinlock ...
2020-08-07mm: remove unneeded includes of <asm/pgalloc.h>Mike Rapoport1-1/+0
Patch series "mm: cleanup usage of <asm/pgalloc.h>" Most architectures have very similar versions of pXd_alloc_one() and pXd_free_one() for intermediate levels of page table. These patches add generic versions of these functions in <asm-generic/pgalloc.h> and enable use of the generic functions where appropriate. In addition, functions declared and defined in <asm/pgalloc.h> headers are used mostly by core mm and early mm initialization in arch and there is no actual reason to have the <asm/pgalloc.h> included all over the place. The first patch in this series removes unneeded includes of <asm/pgalloc.h> In the end it didn't work out as neatly as I hoped and moving pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require unnecessary changes to arches that have custom page table allocations, so I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local to mm/. This patch (of 8): In most cases <asm/pgalloc.h> header is required only for allocations of page table memory. Most of the .c files that include that header do not use symbols declared in <asm/pgalloc.h> and do not require that header. As for the other header files that used to include <asm/pgalloc.h>, it is possible to move that include into the .c file that actually uses symbols from <asm/pgalloc.h> and drop the include from the header file. The process was somewhat automated using sed -i -E '/[<"]asm\/pgalloc\.h/d' \ $(grep -L -w -f /tmp/xx \ $(git grep -E -l '[<"]asm/pgalloc\.h')) where /tmp/xx contains all the symbols defined in arch/*/include/asm/pgalloc.h. [[email protected]: fix powerpc warning] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> [m68k] Cc: Abdul Haleem <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Max Filippov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Satheesh Rajendran <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Matthew Wilcox <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-06x86/headers: Remove APIC headers from <asm/smp.h>Ingo Molnar1-0/+1
The APIC headers are relatively complex and bring in additional header dependencies - while smp.h is a relatively simple header included from high level headers. Remove the dependency and add in the missing #include's in .c files where they gained it indirectly before. Signed-off-by: Ingo Molnar <[email protected]>
2020-06-13Merge tag 'x86-entry-2020-06-12' of ↵Linus Torvalds1-20/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry updates from Thomas Gleixner: "The x86 entry, exception and interrupt code rework This all started about 6 month ago with the attempt to move the Posix CPU timer heavy lifting out of the timer interrupt code and just have lockless quick checks in that code path. Trivial 5 patches. This unearthed an inconsistency in the KVM handling of task work and the review requested to move all of this into generic code so other architectures can share. Valid request and solved with another 25 patches but those unearthed inconsistencies vs. RCU and instrumentation. Digging into this made it obvious that there are quite some inconsistencies vs. instrumentation in general. The int3 text poke handling in particular was completely unprotected and with the batched update of trace events even more likely to expose to endless int3 recursion. In parallel the RCU implications of instrumenting fragile entry code came up in several discussions. The conclusion of the x86 maintainer team was to go all the way and make the protection against any form of instrumentation of fragile and dangerous code pathes enforcable and verifiable by tooling. A first batch of preparatory work hit mainline with commit d5f744f9a2ac ("Pull x86 entry code updates from Thomas Gleixner") That (almost) full solution introduced a new code section '.noinstr.text' into which all code which needs to be protected from instrumentation of all sorts goes into. Any call into instrumentable code out of this section has to be annotated. objtool has support to validate this. Kprobes now excludes this section fully which also prevents BPF from fiddling with it and all 'noinstr' annotated functions also keep ftrace off. The section, kprobes and objtool changes are already merged. The major changes coming with this are: - Preparatory cleanups - Annotating of relevant functions to move them into the noinstr.text section or enforcing inlining by marking them __always_inline so the compiler cannot misplace or instrument them. - Splitting and simplifying the idtentry macro maze so that it is now clearly separated into simple exception entries and the more interesting ones which use interrupt stacks and have the paranoid handling vs. CR3 and GS. - Move quite some of the low level ASM functionality into C code: - enter_from and exit to user space handling. The ASM code now calls into C after doing the really necessary ASM handling and the return path goes back out without bells and whistels in ASM. - exception entry/exit got the equivivalent treatment - move all IRQ tracepoints from ASM to C so they can be placed as appropriate which is especially important for the int3 recursion issue. - Consolidate the declaration and definition of entry points between 32 and 64 bit. They share a common header and macros now. - Remove the extra device interrupt entry maze and just use the regular exception entry code. - All ASM entry points except NMI are now generated from the shared header file and the corresponding macros in the 32 and 64 bit entry ASM. - The C code entry points are consolidated as well with the help of DEFINE_IDTENTRY*() macros. This allows to ensure at one central point that all corresponding entry points share the same semantics. The actual function body for most entry points is in an instrumentable and sane state. There are special macros for the more sensitive entry points, e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF. They allow to put the whole entry instrumentation and RCU handling into safe places instead of the previous pray that it is correct approach. - The INT3 text poke handling is now completely isolated and the recursion issue banned. Aside of the entry rework this required other isolation work, e.g. the ability to force inline bsearch. - Prevent #DB on fragile entry code, entry relevant memory and disable it on NMI, #MC entry, which allowed to get rid of the nested #DB IST stack shifting hackery. - A few other cleanups and enhancements which have been made possible through this and already merged changes, e.g. consolidating and further restricting the IDT code so the IDT table becomes RO after init which removes yet another popular attack vector - About 680 lines of ASM maze are gone. There are a few open issues: - An escape out of the noinstr section in the MCE handler which needs some more thought but under the aspect that MCE is a complete trainwreck by design and the propability to survive it is low, this was not high on the priority list. - Paravirtualization When PV is enabled then objtool complains about a bunch of indirect calls out of the noinstr section. There are a few straight forward ways to fix this, but the other issues vs. general correctness were more pressing than parawitz. - KVM KVM is inconsistent as well. Patches have been posted, but they have not yet been commented on or picked up by the KVM folks. - IDLE Pretty much the same problems can be found in the low level idle code especially the parts where RCU stopped watching. This was beyond the scope of the more obvious and exposable problems and is on the todo list. The lesson learned from this brain melting exercise to morph the evolved code base into something which can be validated and understood is that once again the violation of the most important engineering principle "correctness first" has caused quite a few people to spend valuable time on problems which could have been avoided in the first place. The "features first" tinkering mindset really has to stop. With that I want to say thanks to everyone involved in contributing to this effort. Special thanks go to the following people (alphabetical order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra, Vitaly Kuznetsov, and Will Deacon" * tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits) x86/entry: Force rcu_irq_enter() when in idle task x86/entry: Make NMI use IDTENTRY_RAW x86/entry: Treat BUG/WARN as NMI-like entries x86/entry: Unbreak __irqentry_text_start/end magic x86/entry: __always_inline CR2 for noinstr lockdep: __always_inline more for noinstr x86/entry: Re-order #DB handler to avoid *SAN instrumentation x86/entry: __always_inline arch_atomic_* for noinstr x86/entry: __always_inline irqflags for noinstr x86/entry: __always_inline debugreg for noinstr x86/idt: Consolidate idt functionality x86/idt: Cleanup trap_init() x86/idt: Use proper constants for table size x86/idt: Add comments about early #PF handling x86/idt: Mark init only functions __init x86/entry: Rename trace_hardirqs_off_prepare() x86/entry: Clarify irq_{enter,exit}_rcu() x86/entry: Remove DBn stacks x86/entry: Remove debug IDT frobbing x86/entry: Optimize local_db_save() for virt ...
2020-06-11Merge tag 'x86-urgent-2020-06-11' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x86 updates from Thomas Gleixner: "A set of fixes and updates for x86: - Unbreak paravirt VDSO clocks. While the VDSO code was moved into lib for sharing a subtle check for the validity of paravirt clocks got replaced. While the replacement works perfectly fine for bare metal as the update of the VDSO clock mode is synchronous, it fails for paravirt clocks because the hypervisor can invalidate them asynchronously. Bring it back as an optional function so it does not inflict this on architectures which are free of PV damage. - Fix the jiffies to jiffies64 mapping on 64bit so it does not trigger an ODR violation on newer compilers - Three fixes for the SSBD and *IB* speculation mitigation maze to ensure consistency, not disabling of some *IB* variants wrongly and to prevent a rogue cross process shutdown of SSBD. All marked for stable. - Add yet more CPU models to the splitlock detection capable list !@#%$! - Bring the pr_info() back which tells that TSC deadline timer is enabled. - Reboot quirk for MacBook6,1" * tag 'x86-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Unbreak paravirt VDSO clocks lib/vdso: Provide sanity check for cycles (again) clocksource: Remove obsolete ifdef x86_64: Fix jiffies ODR violation x86/speculation: PR_SPEC_FORCE_DISABLE enforcement for indirect branches. x86/speculation: Prevent rogue cross-process SSBD shutdown x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS. x86/cpu: Add Sapphire Rapids CPU model number x86/split_lock: Add Icelake microserver and Tigerlake CPU models x86/apic: Make TSC deadline timer detection message visible x86/reboot/quirks: Add MacBook6,1 reboot quirk
2020-06-11x86/entry: Convert APIC interrupts to IDTENTRY_SYSVECThomas Gleixner1-18/+5
Convert APIC interrupts to IDTENTRY_SYSVEC: - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC - Remove the ASM idtentries in 64-bit - Remove the BUILD_INTERRUPT entries in 32-bit - Remove the old prototypes No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-11x86/entry: Use idtentry for interruptsThomas Gleixner1-15/+8
Replace the extra interrupt handling code and reuse the existing idtentry machinery. This moves the irq stack switching on 64-bit from ASM to C code; 32-bit already does the stack switching in C. This requires to remove HAVE_IRQ_EXIT_ON_IRQ_STACK as the stack switch is not longer in the low level entry code. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-06-11x86/irq: Convey vector as argument and not in ptregsThomas Gleixner1-6/+25
Device interrupts which go through do_IRQ() or the spurious interrupt handler have their separate entry code on 64 bit for no good reason. Both 32 and 64 bit transport the vector number through ORIG_[RE]AX in pt_regs. Further the vector number is forced to fit into an u8 and is complemented and offset by 0x80 so it's in the signed character range. Otherwise GAS would expand the pushq to a 5 byte instruction for any vector > 0x7F. Treat the vector number like an error code and hand it to the C function as argument. This allows to get rid of the extra entry code in a later step. Simplify the error code push magic by implementing the pushq imm8 via a '.byte 0x6a, vector' sequence so GAS is not able to screw it up. As the pushq imm8 is sign extending the resulting error code needs to be truncated to 8 bits in C code. Originally-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-05-26x86/apic: Make TSC deadline timer detection message visibleBorislav Petkov1-1/+1
The commit c84cb3735fd5 ("x86/apic: Move TSC deadline timer debug printk") removed the message which said that the deadline timer was enabled. It added a pr_debug() message which is issued when deadline timer validation succeeds. Well, issued only when CONFIG_DYNAMIC_DEBUG is enabled - otherwise pr_debug() calls get optimized away if DEBUG is not defined in the compilation unit. Therefore, make the above message pr_info() so that it is visible in dmesg. Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-05-07x86/apic: Convert the TSC deadline timer matching to steppings macroBorislav Petkov1-45/+12
... and get rid of the function pointers which would spit out the microcode revision based on the CPU stepping. Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Mark Gross <mgross.linux.intel.com> Cc: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-05-01x86/apic: Move TSC deadline timer debug printkThomas Gleixner1-13/+14
Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a lockdep splat due to a lock order violation between hrtimer_base::lock and console_sem, when the 'once' condition is reset via /sys/kernel/debug/clear_warn_once after boot. The initial printk cannot trigger this because that happens during boot when the local APIC timer is set up on the boot CPU. Prevent it by moving the printk to a place which is guaranteed to be only called once during boot. Mark the deadline timer check related functions and data __init while at it. Reported-by: Leon Romanovsky <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-03-24x86/kernel: Convert to new CPU match macrosThomas Gleixner1-19/+13
The new macro set has a consistent namespace and uses C99 initializers instead of the grufty C89 ones. Get rid the of the local macro wrappers for consistency. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-02-07x86/apic: Mask IOAPIC entries when disabling the local APICTony W Wang-oc1-0/+7
When a system suspends, the local APIC is disabled in the suspend sequence, but the IOAPIC is left in the current state. This means unmasked interrupt lines stay unmasked. This is usually the case for IOAPIC pin 9 to which the ACPI interrupt is connected. That means that in suspended state the IOAPIC can respond to an external interrupt, e.g. the wakeup via keyboard/RTC/ACPI, but the interrupt message cannot be handled by the disabled local APIC. As a consequence the Remote IRR bit is set, but the local APIC does not send an EOI to acknowledge it. This causes the affected interrupt line to become stale and the stale Remote IRR bit will cause a hang when __synchronize_hardirq() is invoked for that interrupt line. To prevent this, mask all IOAPIC entries before disabling the local APIC. The resume code already has the unmask operation inside. [ tglx: Massaged changelog ] Signed-off-by: Tony W Wang-oc <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-29x86/timer: Don't skip PIT setup when APIC is disabled or in legacy modeThomas Gleixner1-5/+18
Tony reported a boot regression caused by the recent workaround for systems which have a disabled (clock gate off) PIT. On his machine the kernel fails to initialize the PIT because apic_needs_pit() does not take into account whether the local APIC interrupt delivery mode will actually allow to setup and use the local APIC timer. This should be easy to reproduce with acpi=off on the command line which also disables HPET. Due to the way the PIT/HPET and APIC setup ordering works (APIC setup can require working PIT/HPET) the information is not available at the point where apic_needs_pit() makes this decision. To address this, split out the interrupt mode selection from apic_intr_mode_init(), invoke the selection before making the decision whether PIT is required or not, and add the missing checks into apic_needs_pit(). Fixes: c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets") Reported-by: Anthony Buckley <[email protected]> Tested-by: Anthony Buckley <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Daniel Drake <[email protected]> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206125 Link: https://lore.kernel.org/r/[email protected]
2019-11-26Merge branches 'core-objtool-for-linus', 'x86-cleanups-for-linus' and ↵Linus Torvalds1-1/+1
'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 objtool, cleanup, and apic updates from Ingo Molnar: "Objtool: - Fix a gawk 5.0 incompatibility in gen-insn-attr-x86.awk. Most distros are still on gawk 4.2.x. Cleanup: - Misc cleanups, plus the removal of obsolete code such as Calgary IOMMU support, which code hasn't seen any real testing in a long time and there's no known users left. apic: - Two changes: a cleanup and a fix for an (old) race for oneshot threaded IRQ handlers" * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Fix awk regexp warnings * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove unused asm/rio.h x86: Fix typos in comments x86/pci: Remove #ifdef __KERNEL__ guard from <asm/pci.h> x86/pci: Remove pci_64.h x86: Remove the calgary IOMMU driver x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments x86/kdump: Remove the unused crash_copy_backup_region() x86/nmi: Remove stale EDAC include leftover * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Rename misnamed functions x86/ioapic: Prevent inconsistent state when moving an interrupt
2019-11-25Merge tag 'printk-for-5.5' of ↵Linus Torvalds1-22/+19
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow to print symbolic error names via new %pe modifier. - Use pr_warn() instead of the remaining pr_warning() calls. Fix formatting of the related lines. - Add VSPRINTF entry to MAINTAINERS. * tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits) checkpatch: don't warn about new vsprintf pointer extension '%pe' MAINTAINERS: Add VSPRINTF tools lib api: Renaming pr_warning to pr_warn ASoC: samsung: Use pr_warn instead of pr_warning lib: cpu_rmap: Use pr_warn instead of pr_warning trace: Use pr_warn instead of pr_warning dma-debug: Use pr_warn instead of pr_warning vgacon: Use pr_warn instead of pr_warning fs: afs: Use pr_warn instead of pr_warning sh/intc: Use pr_warn instead of pr_warning scsi: Use pr_warn instead of pr_warning platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning platform/x86: asus-laptop: Use pr_warn instead of pr_warning platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning oprofile: Use pr_warn instead of pr_warning of: Use pr_warn instead of pr_warning macintosh: Use pr_warn instead of pr_warning idsn: Use pr_warn instead of pr_warning ide: Use pr_warn instead of pr_warning crypto: n2: Use pr_warn instead of pr_warning ...
2019-11-05x86/apic/32: Avoid bogus LDR warningsJan Beulich1-13/+15
The removal of the LDR initialization in the bigsmp_32 APIC code unearthed a problem in setup_local_APIC(). The code checks unconditionally for a mismatch of the logical APIC id by comparing the early APIC id which was initialized in get_smp_config() with the actual LDR value in the APIC. Due to the removal of the bogus LDR initialization the check now can trigger on bigsmp_32 APIC systems emitting a warning for every booting CPU. This is of course a false positive because the APIC is not using logical destination mode. Restrict the check and the possibly resulting fixup to systems which are actually using the APIC in logical destination mode. [ tglx: Massaged changelog and added Cc stable ] Fixes: bae3a8d3308 ("x86/apic: Do not initialize LDR and DFR for bigsmp") Signed-off-by: Jan Beulich <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-10-27x86/apic, x86/uprobes: Correct parameter names in kernel-doc commentsYi Wang1-1/+1
Rename parameter names to the correct ones used in the function. No functional changes. [ bp: Merge two patches into a single one. ] Signed-off-by: Yi Wang <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: x86-ml <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-10-18x86: Use pr_warn instead of pr_warningKefeng Wang1-22/+19
As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of pr_warning"), removing pr_warning so all logging messages use a consistent <prefix>_warn style. Let's do it. Link: http://lkml.kernel.org/r/[email protected] To: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Robert Richter <[email protected]> Cc: Darren Hart <[email protected]> Cc: Andy Shevchenko <[email protected]> Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Petr Mladek <[email protected]>
2019-09-17Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds1-73/+115
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Thomas Gleixner: - Cleanup the apic IPI implementation by removing duplicated code and consolidating the functions into the APIC core. - Implement a safe variant of the IPI broadcast mode. Contrary to earlier attempts this uses the core tracking of which CPUs have been brought online at least once so that a broadcast does not end up in some dead end in BIOS/SMM code when the CPU is still waiting for init. Once all CPUs have been brought up once, IPI broadcasting is enabled. Before that regular one by one IPIs are issued. - Drop the paravirt CR8 related functions as they have no user anymore - Initialize the APIC TPR to block interrupt 16-31 as they are reserved for CPU exceptions and should never be raised by any well behaving device. - Emit a warning when vector space exhaustion breaks the admin set affinity of an interrupt. - Make sure to use the NMI fallback when shutdown via reboot vector IPI fails. The original code had conditions which prevent the code path to be reached. - Annotate various APIC config variables as RO after init. [ The ipi broadcase change came in earlier through the cpu hotplug branch, but I left the explanation in the commit message since it was shared between the two different branches - Linus ] * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) x86/apic/vector: Warn when vector space exhaustion breaks affinity x86/apic: Annotate global config variables as "read-only after init" x86/apic/x2apic: Implement IPI shorthands support x86/apic/flat64: Remove the IPI shorthand decision logic x86/apic: Share common IPI helpers x86/apic: Remove the shorthand decision logic x86/smp: Enhance native_send_call_func_ipi() x86/smp: Move smp_function_call implementations into IPI code x86/apic: Provide and use helper for send_IPI_allbutself() x86/apic: Add static key to Control IPI shorthands x86/apic: Move no_ipi_broadcast() out of 32bit x86/apic: Add NMI_VECTOR wait to IPI shorthand x86/apic: Remove dest argument from __default_send_IPI_shortcut() x86/hotplug: Silence APIC and NMI when CPU is dead x86/cpu: Move arch_smt_update() to a neutral place x86/apic/uv: Make x2apic_extra_bits static x86/apic: Consolidate the apic local headers x86/apic: Move apic_flat_64 header into apic directory x86/apic: Move ipi header into apic directory x86/apic: Cleanup the include maze ...