aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/apic/io_apic.c
AgeCommit message (Collapse)AuthorFilesLines
2024-08-07x86/ioapic: Cleanup remaining coding style issuesThomas Gleixner1-11/+12
Add missing new lines and reorder variable definitions. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Cleanup line breaksThomas Gleixner1-37/+18
80 character limit is history. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Cleanup bracket usageThomas Gleixner1-16/+18
Add brackets around if/for constructs as required by coding style or remove pointless line breaks to make it true single line statements which do not require brackets. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Cleanup commentsThomas Gleixner1-49/+37
Use proper comment styles and shrink comments to their scope where applicable. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Move replace_pin_at_irq_node() to the call siteThomas Gleixner1-22/+18
It's only used by check_timer(). Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Cleanup guarded debug printk()sThomas Gleixner1-38/+29
Cleanup the APIC printk()s which are inside of a apic verbosity guarded region by using apic_dbg() for the KERN_DEBUG level prints. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Cleanup apic_printk()sThomas Gleixner1-103/+73
Replace apic_printk($LEVEL) with the corresponding apic_pr_*() helpers and use pr_info() for APIC_QUIET as that is always printed so the indirection is pointless noise. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Use guard() for locking where applicableThomas Gleixner1-128/+64
KISS rules! Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Cleanup structsThomas Gleixner1-18/+14
Make them conforming to the TIP coding style guide. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Mark mp_alloc_timer_irq() __initThomas Gleixner1-2/+2
Only invoked from check_timer() which is __init too. Cleanup the variable declaration while at it. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Reviewed-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07x86/ioapic: Handle allocation failures gracefullyThomas Gleixner1-24/+22
Breno observed panics when using failslab under certain conditions during runtime: can not alloc irq_pin_list (-1,0,20) Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed panic+0x4e9/0x590 mp_irqdomain_alloc+0x9ab/0xa80 irq_domain_alloc_irqs_locked+0x25d/0x8d0 __irq_domain_alloc_irqs+0x80/0x110 mp_map_pin_to_irq+0x645/0x890 acpi_register_gsi_ioapic+0xe6/0x150 hpet_open+0x313/0x480 That's a pointless panic which is a leftover of the historic IO/APIC code which panic'ed during early boot when the interrupt allocation failed. The only place which might justify panic is the PIT/HPET timer_check() code which tries to figure out whether the timer interrupt is delivered through the IO/APIC. But that code does not require to handle interrupt allocation failures. If the interrupt cannot be allocated then timer delivery fails and it either panics due to that or falls back to legacy mode. Cure this by removing the panic wrapper around __add_pin_to_irq_node() and making mp_irqdomain_alloc() aware of the failure condition and handle it as any other failure in this function gracefully. Reported-by: Breno Leitao <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Breno Leitao <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/all/[email protected]
2024-03-11Merge tag 'x86-apic-2024-03-10' of ↵Linus Torvalds1-56/+36
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Thomas Gleixner: "Rework of APIC enumeration and topology evaluation. The current implementation has a couple of shortcomings: - It fails to handle hybrid systems correctly. - The APIC registration code which handles CPU number assignents is in the middle of the APIC code and detached from the topology evaluation. - The various mechanisms which enumerate APICs, ACPI, MPPARSE and guest specific ones, tweak global variables as they see fit or in case of XENPV just hack around the generic mechanisms completely. - The CPUID topology evaluation code is sprinkled all over the vendor code and reevaluates global variables on every hotplug operation. - There is no way to analyze topology on the boot CPU before bringing up the APs. This causes problems for infrastructure like PERF which needs to size certain aspects upfront or could be simplified if that would be possible. - The APIC admission and CPU number association logic is incomprehensible and overly complex and needs to be kept around after boot instead of completing this right after the APIC enumeration. This update addresses these shortcomings with the following changes: - Rework the CPUID evaluation code so it is common for all vendors and provides information about the APIC ID segments in a uniform way independent of the number of segments (Thread, Core, Module, ..., Die, Package) so that this information can be computed instead of rewriting global variables of dubious value over and over. - A few cleanups and simplifcations of the APIC, IO/APIC and related interfaces to prepare for the topology evaluation changes. - Seperation of the parser stages so the early evaluation which tries to find the APIC address can be seperately overridden from the late evaluation which enumerates and registers the local APIC as further preparation for sanitizing the topology evaluation. - A new registration and admission logic which - encapsulates the inner workings so that parsers and guest logic cannot longer fiddle in it - uses the APIC ID segments to build topology bitmaps at registration time - provides a sane admission logic - allows to detect the crash kernel case, where CPU0 does not run on the real BSP, automatically. This is required to prevent sending INIT/SIPI sequences to the real BSP which would reset the whole machine. This was so far handled by a tedious command line parameter, which does not even work in nested crash scenarios. - Associates CPU number after the enumeration completed and prevents the late registration of APICs, which was somehow tolerated before. - Converting all parsers and guest enumeration mechanisms over to the new interfaces. This allows to get rid of all global variable tweaking from the parsers and enumeration mechanisms and sanitizes the XEN[PV] handling so it can use CPUID evaluation for the first time. - Mopping up existing sins by taking the information from the APIC ID segment bitmaps. This evaluates hybrid systems correctly on the boot CPU and allows for cleanups and fixes in the related drivers, e.g. PERF. The series has been extensively tested and the minimal late fallout due to a broken ACPI/MADT table has been addressed by tightening the admission logic further" * tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits) x86/topology: Ignore non-present APIC IDs in a present package x86/apic: Build the x86 topology enumeration functions on UP APIC builds too smp: Provide 'setup_max_cpus' definition on UP too smp: Avoid 'setup_max_cpus' namespace collision/shadowing x86/bugs: Use fixed addressing for VERW operand x86/cpu/topology: Get rid of cpuinfo::x86_max_cores x86/cpu/topology: Provide __num_[cores|threads]_per_package x86/cpu/topology: Rename topology_max_die_per_package() x86/cpu/topology: Rename smp_num_siblings x86/cpu/topology: Retrieve cores per package from topology bitmaps x86/cpu/topology: Use topology logical mapping mechanism x86/cpu/topology: Provide logical pkg/die mapping x86/cpu/topology: Simplify cpu_mark_primary_thread() x86/cpu/topology: Mop up primary thread mask handling x86/cpu/topology: Use topology bitmaps for sizing x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT x86/xen/smp_pv: Count number of vCPUs early x86/cpu/topology: Assign hotpluggable CPUIDs during init x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug x86/topology: Add a mechanism to track topology via APIC IDs ...
2024-02-25x86/apic/msi: Use DOMAIN_BUS_GENERIC_MSI for HPET/IO-APIC domain searchThomas Gleixner1-1/+1
The recent restriction to invoke irqdomain_ops::select() only when the domain bus token is not DOMAIN_BUS_ANY breaks the search for the parent MSI domain of HPET and IO-APIC. The latter causes a full boot fail. The restriction itself makes sense to avoid adding DOMAIN_BUS_ANY matches into the various ARM specific select() callbacks. Reverting this change would obviously break ARM platforms again and require DOMAIN_BUS_ANY matches added to various places. A simpler solution is to use the DOMAIN_BUS_GENERIC_MSI token for the HPET and IO-APIC parent domain search. This works out of the box because the affected parent domains check only for the firmware specification content and not for the bus token. Fixes: 5aa3c0cf5bba ("genirq/irqdomain: Don't call ops->select for DOMAIN_BUS_ANY tokens") Reported-by: Borislav Petkov (AMD) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/878r38cy8n.ffs@tglx
2024-02-15x86/mpparse: Remove the physid_t bitmap wrapperThomas Gleixner1-12/+12
physid_t is a wrapper around bitmap. Just remove the onion layer and use bitmap functionality directly. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15x86/ioapic: Simplify setup_ioapic_ids_from_mpc_nocheck()Thomas Gleixner1-3/+2
No need to go through APIC callbacks. It's already established that this is an ancient APIC. So just copy the present mask and use the direct physid* functions all over the place. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15x86/ioapic: Make io_apic_get_unique_id() simplerThomas Gleixner1-17/+5
No need to go through APIC callbacks. It's already established that this is an ancient APIC. So just copy the present mask and use the direct physid* functions all over the place. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15x86/apic: Get rid of get_physical_broadcast()Thomas Gleixner1-27/+22
There is no point for this function. The only case where this is used is when there is no XAPIC available, which means the broadcast address is 0xF. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15x86/ioapic: Replace some more set bit nonsenseThomas Gleixner1-4/+2
Yet another set_bit() operation wrapped in oring a mask. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15x86/platform/ce4100: Dont override x86_init.mpparse.setup_ioapic_idsThomas Gleixner1-1/+1
There is no point to do that. The ATOMs have an XAPIC for which this function is a pointless exercise. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-23x86/ioapic: Remove unfinished sentence from commentAdrian Huang1-1/+1
[ mingo: Refine changelog. ] Signed-off-by: Adrian Huang <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: [email protected]
2023-08-09x86/apic: Nuke ack_APIC_irq()Dave Hansen1-2/+2
Yet another wrapper of a wrapper gone along with the outdated comment that this compiles to a single instruction. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Wei Liu <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
2023-08-09x86/ioapic/32: Decrapify phys_id_present_map operationThomas Gleixner1-4/+1
The operation to set the IOAPIC ID in phys_id_present_map is as convoluted as it can be. 1) Allocate a bitmap of 32byte size on the stack 2) Zero the bitmap and set the IOAPIC ID bit 3) Or the temporary bitmap over phys_id_present_map The same functionality can be achieved by setting the IOAPIC ID bit directly in the phys_id_present_map. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
2023-08-09x86/apic: Nuke apic::apicid_to_cpu_present()Thomas Gleixner1-6/+5
This is only used on 32bit and is a wrapper around physid_set_mask_of_physid() in all 32bit APIC drivers. Remove the callback and use physid_set_mask_of_physid() in the code directly, Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
2023-08-09x86/apic: Get rid of hard_smp_processor_id()Thomas Gleixner1-1/+1
No point in having a wrapper around read_apic_id(). Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
2023-08-09x86/apic/ioapic: Rename skip_ioapic_setupThomas Gleixner1-6/+6
Another variable name which is confusing at best. Convert to bool. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
2023-04-25Merge tag 'x86-apic-2023-04-24' of ↵Linus Torvalds1-5/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Thomas Gleixner: - Fix the incorrect handling of atomic offset updates in reserve_eilvt_offset() The check for the return value of atomic_cmpxchg() is not compared against the old value, it is compared against the new value, which makes it two round on success. Convert it to atomic_try_cmpxchg() which does the right thing. - Handle IO/APIC less systems correctly When IO/APIC is not advertised by ACPI then the computation of the lower bound for dynamically allocated interrupts like MSI goes wrong. This lower bound is used to exclude the IO/APIC legacy GSI space as that must stay reserved for the legacy interrupts. In case that the system, e.g. VM, does not advertise an IO/APIC the lower bound stays at 0. 0 is an invalid interrupt number except for the legacy timer interrupt on x86. The return value is unchecked in the core code, so it ends up to allocate interrupt number 0 which is subsequently considered to be invalid by the caller, e.g. the MSI allocation code. A similar problem was already cured for device tree based systems years ago, but that missed - or did not envision - the zero IO/APIC case. Consolidate the zero check and return the provided "from" argument to the core code call site, which is guaranteed to be greater than 0. - Simplify the X2APIC cluster CPU mask logic for CPU hotplug Per cluster CPU masks are required for X2APIC in cluster mode to determine the correct cluster for a target CPU when calculating the destination for IPIs These masks are established when CPUs are borught up. The first CPU in a cluster must allocate a new cluster CPU mask. As this happens during the early startup of a CPU, where memory allocations cannot be done, the mask has to be allocated by the control CPU. The current implementation allocates a clustermask just in case and if the to be brought up CPU is the first in a cluster the CPU takes over this allocation from a global pointer. This works nicely in the fully serialized CPU bringup scenario which is used today, but would fail completely for parallel bringup of CPUs. The cluster association of a CPU can be computed from the APIC ID which is enumerated by ACPI/MADT. So the cluster CPU masks can be preallocated and associated upfront and the upcoming CPUs just need to set their corresponding bit. Aside of preparing for parallel bringup this is a valuable simplification on its own. - Remove global variables which control the early startup of secondary CPUs on 64-bit The only information which is needed by a starting CPU is the Linux CPU number. The CPU number allows it to retrieve the rest of the required data from already existing per CPU storage. So instead of initial_stack, early_gdt_desciptor and initial_gs provide a new variable smpboot_control which contains the Linux CPU number for now. The starting CPU can retrieve and compute all required information for startup from there. Aside of being a cleanup, this is also preparing for parallel CPU bringup, where starting CPUs will look up their Linux CPU number via the APIC ID, when smpboot_control has the corresponding control bit set. - Make cc_vendor globally accesible Subsequent parallel bringup changes require access to cc_vendor because confidental computing platforms need special treatment in the early startup phase vs. CPUID and APCI ID readouts. The change makes cc_vendor global and provides stub accessors in case that CONFIG_ARCH_HAS_CC_PLATFORM is not set. This was merged from the x86/cc branch in anticipation of further parallel bringup commits which require access to cc_vendor. Due to late discoveries of fundamental issue with those patches these commits never happened. The merge commit is unfortunately in the middle of the APIC commits so unraveling it would have required a rebase or revert. As the parallel bringup seems to be well on its way for 6.5 this would be just pointless churn. As the commit does not contain any functional change it's not a risk to keep it. * tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Don't return 0 from arch_dynirq_lower_bound() x86/apic: Fix atomic update of offset in reserve_eilvt_offset() x86/coco: Export cc_vendor x86/smpboot: Reference count on smpboot_setup_warm_reset_vector() x86/smpboot: Remove initial_gs x86/smpboot: Remove early_gdt_descr on 64-bit x86/smpboot: Remove initial_stack on 64-bit x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
2023-04-12x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()Saurabh Sengar1-5/+9
arch_dynirq_lower_bound() is invoked by the core interrupt code to retrieve the lowest possible Linux interrupt number for dynamically allocated interrupts like MSI. The x86 implementation uses this to exclude the IO/APIC GSI space. This works correctly as long as there is an IO/APIC registered, but returns 0 if not. This has been observed in VMs where the BIOS does not advertise an IO/APIC. 0 is an invalid interrupt number except for the legacy timer interrupt on x86. The return value is unchecked in the core code, so it ends up to allocate interrupt number 0 which is subsequently considered to be invalid by the caller, e.g. the MSI allocation code. The function has already a check for 0 in the case that an IO/APIC is registered, as ioapic_dynirq_base is 0 in case of device tree setups. Consolidate this and zero check for both ioapic_dynirq_base and gsi_top, which is used in the case that no IO/APIC is registered. Fixes: 3e5bedc2c258 ("x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines") Signed-off-by: Saurabh Sengar <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-03-26x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VMMichael Kelley1-2/+8
Current code always maps MMIO devices as shared (decrypted) in a confidential computing VM. But Hyper-V guest VMs on AMD SEV-SNP with vTOM use a paravisor running in VMPL0 to emulate some devices, such as the IO-APIC and TPM. In such a case, the device must be accessed as private (encrypted) because the paravisor emulates the device at an address below vTOM, where all accesses are encrypted. Add a new hypervisor callback to determine if an MMIO address should be mapped private. The callback allows hypervisor-specific code to handle any quirks, the use of a paravisor, etc. in determining whether a mapping must be private. If the callback is not used by a hypervisor, default to returning "false", which is consistent with normal coco VM behavior. Use this callback as another special case to check for when doing ioremap(). Just checking the starting address is sufficient as an ioremap range must be all private or all shared. Also make the callback in early boot IO-APIC mapping code that uses the fixmap. [ bp: Touchups. ] Signed-off-by: Michael Kelley <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-13x86/ioapic: Use irq_domain_create_hierarchy()Johan Hovold1-5/+2
Use the irq_domain_create_hierarchy() helper to create the hierarchical domain, which both serves as documentation and avoids poking at irqdomain internals. Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Tested-by: Hsin-Yi Wang <[email protected]> Tested-by: Mark-PK Tsai <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-04-07x86/tdx/ioapic: Add shared bit for IOAPIC base addressIsaku Yamahata1-2/+16
The kernel interacts with each bare-metal IOAPIC with a special MMIO page. When running under KVM, the guest's IOAPICs are emulated by KVM. When running as a TDX guest, the guest needs to mark each IOAPIC mapping as "shared" with the host. This ensures that TDX private protections are not applied to the page, which allows the TDX host emulation to work. ioremap()-created mappings such as virtio will be marked as shared by default. However, the IOAPIC code does not use ioremap() and instead uses the fixmap mechanism. Introduce a special fixmap helper just for the IOAPIC code. Ensure that it marks IOAPIC pages as "shared". This replaces set_fixmap_nocache() with __set_fixmap() since __set_fixmap() allows custom 'prot' values. AMD SEV gets IOAPIC pages shared because FIXMAP_PAGE_NOCACHE has _ENC bit clear. TDX has to set bit to share the page with the host. Signed-off-by: Isaku Yamahata <[email protected]> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Reviewed-by: Tony Luck <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-30Merge tag 'x86-irq-2021-08-30' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PIRQ updates from Thomas Gleixner: "A set of updates to support port 0x22/0x23 based PCI configuration space which can be found on various ALi chipsets and is also available on older Intel systems which expose a PIRQ router. While the Intel support is more or less nostalgia, the ALi chips are still in use on popular embedded boards used for routers" * tag 'x86-irq-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix typo s/ECLR/ELCR/ for the PIC register x86: Avoid magic number with ELCR register accesses x86/PCI: Add support for the Intel 82426EX PIRQ router x86/PCI: Add support for the Intel 82374EB/82374SB (ESC) PIRQ router x86/PCI: Add support for the ALi M1487 (IBC) PIRQ router x86: Add support for 0x22/0x23 port I/O configuration space
2021-08-10x86: Avoid magic number with ELCR register accessesMaciej W. Rozycki1-1/+1
Define PIC_ELCR1 and PIC_ELCR2 macros for accesses to the ELCR registers implemented by many chipsets in their embedded 8259A PIC cores, avoiding magic numbers that are difficult to handle, and complementing the macros we already have for registers originally defined with discrete 8259A PIC implementations. No functional change. Signed-off-by: Maciej W. Rozycki <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-10x86/ioapic: Force affinity setup before startupThomas Gleixner1-2/+4
The IO/APIC cannot handle interrupt affinity changes safely after startup other than from an interrupt handler. The startup sequence in the generic interrupt code violates that assumption. Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Marc Zyngier <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2021-03-21Merge branch 'linus' into x86/cleanups, to resolve conflictIngo Molnar1-0/+10
Conflicts: arch/x86/kernel/kprobes/ftrace.c Signed-off-by: Ingo Molnar <[email protected]>
2021-03-19x86/ioapic: Ignore IRQ2 againThomas Gleixner1-0/+10
Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where the matrix allocator claimed to be out of vectors. He analyzed it down to the point that IRQ2, the PIC cascade interrupt, which is supposed to be not ever routed to the IO/APIC ended up having an interrupt vector assigned which got moved during unplug of CPU0. The underlying issue is that IRQ2 for various reasons (see commit af174783b925 ("x86: I/O APIC: Never configure IRQ2" for details) is treated as a reserved system vector by the vector core code and is not accounted as a regular vector. The Amazon BIOS has an routing entry of pin2 to IRQ2 which causes the IO/APIC setup to claim that interrupt which is granted by the vector domain because there is no sanity check. As a consequence the allocation counter of CPU0 underflows which causes a subsequent unplug to fail with: [ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU There is another sanity check missing in the matrix allocator, but the underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic during the conversion to irqdomains. For almost 6 years nobody complained about this wreckage, which might indicate that this requirement could be lifted, but for any system which actually has a PIC IRQ2 is unusable by design so any routing entry has no effect and the interrupt cannot be connected to a device anyway. Due to that and due to history biased paranoia reasons restore the IRQ2 ignore logic and treat it as non existent despite a routing entry claiming otherwise. Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Reported-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Vitaly Kuznetsov <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2021-03-18x86: Fix various typos in commentsIngo Molnar1-4/+4
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: [email protected]
2021-02-15sfi: Remove framework for deprecated firmwareAndy Shevchenko1-2/+2
SFI-based platforms are gone. So does this framework. This removes mention of SFI through the drivers and other code as well. Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Acked-by: Linus Walleij <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2020-12-10x86/ioapic: Cleanup the timer_works() irqflags messThomas Gleixner1-16/+6
Mark tripped over the creative irqflags handling in the IO-APIC timer delivery check which ends up doing: local_irq_save(flags); local_irq_enable(); local_irq_restore(flags); which triggered a new consistency check he's working on required for replacing the POPF based restore with a conditional STI. That code is a historical mess and none of this is needed. Make it straightforward use local_irq_disable()/enable() as that's all what is required. It is invoked from interrupt enabled code nowadays. Reported-by: Mark Rutland <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-10x86/ioapic: Correct the PCI/ISA trigger type selectionThomas Gleixner1-2/+2
PCI's default trigger type is level and ISA's is edge. The recent refactoring made it the other way round, which went unnoticed as it seems only to cause havoc on some AMD systems. Make the comment and code do the right thing again. Fixes: a27dca645d2c ("x86/io_apic: Cleanup trigger/polarity helpers") Reported-by: Tom Lendacky <[email protected]> Reported-by: Borislav Petkov <[email protected]> Reported-by: Qian Cai <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Tom Lendacky <[email protected]> Cc: David Woodhouse <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-04x86/ioapic: Use I/O-APIC ID for finding irqdomain, not indexDavid Woodhouse1-2/+2
In commit b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain") the I/O-APIC code was changed to find its parent irqdomain using irq_find_matching_fwspec(), but the key used for the lookup was wrong. It shouldn't use 'ioapic' which is the index into its own ioapics[] array. It should use the actual arbitration ID of the I/O-APIC in question, which is mpc_ioapic_id(ioapic). Fixes: b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain") Reported-by: lkp <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/ioapic: Handle Extended Destination ID field in RTEDavid Woodhouse1-5/+15
Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to bits 19-4 of the address used in the resulting MSI cycle. Historically, the x86 MSI format only used the top 8 of those 16 bits as the destination APIC ID, and the "Extended Destination ID" in the lower 8 bits was unused. With interrupt remapping, the lowest bit of the Extended Destination ID (bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable format MSI. A hypervisor can use the other 7 bits of the Extended Destination ID to permit guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs before having to expose a vIOMMU and interrupt remapping to the guest. No behavioural change in this patch, since nothing yet permits APIC IDs above 255 to be used with the non-IR I/OAPIC domain. [ tglx: Converted it to the cleaned up entry/msi_msg format and added commentry ] Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomainDavid Woodhouse1-12/+13
All possible parent domains have a select method now. Make use of it. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/ioapic: Generate RTE directly from parent irqchip's MSI messageDavid Woodhouse1-31/+72
The I/O-APIC generates an MSI cycle with address/data bits taken from its Redirection Table Entry in some combination which used to make sense, but now is just a bunch of bits which get passed through in some seemingly arbitrary order. Instead of making IRQ remapping drivers directly frob the I/OA-PIC RTE, let them just do their job and generate an MSI message. The bit swizzling to turn that MSI message into the I/O-APIC's RTE is the same in all cases, since it's a function of the I/O-APIC hardware. The IRQ remappers have no real need to get involved with that. The only slight caveat is that the I/OAPIC is interpreting some of those fields too, and it does want the 'vector' field to be unique to make EOI work. The AMD IOMMU happens to put its IRTE index in the bits that the I/O-APIC thinks are the vector field, and accommodates this requirement by reserving the first 32 indices for the I/O-APIC. The Intel IOMMU doesn't actually use the bits that the I/O-APIC thinks are the vector field, so it fills in the 'pin' value there instead. [ tglx: Replaced the unreadably macro maze with the cleaned up RTE/msi_msg bitfields and added commentry to explain the mapping magic ] Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/ioapic: Cleanup IO/APIC route entry structsThomas Gleixner1-79/+65
Having two seperate structs for the I/O-APIC RTE entries (non-remapped and DMAR remapped) requires type casts and makes it hard to map. Combine them in IO_APIC_routing_entry by defining a union of two 64bit bitfields. Use naming which reflects which bits are shared and which bits are actually different for the operating modes. [dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for register access into the same union and eliminating a few more places where bits were accessed through masks and shifts.] Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/io_apic: Cleanup trigger/polarity helpersThomas Gleixner1-129/+115
'trigger' and 'polarity' are used throughout the I/O-APIC code for handling the trigger type (edge/level) and the active low/high configuration. While there are defines for initializing these variables and struct members, they are not used consequently and the meaning of 'trigger' and 'polarity' is opaque and confusing at best. Rename them to 'is_level' and 'active_low' and make them boolean in various structs so it's entirely clear what the meaning is. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/apic: Cleanup destination modeThomas Gleixner1-1/+1
apic::irq_dest_mode is actually a boolean, but defined as u32 and named in a way which does not explain what it means. Make it a boolean and rename it to 'dest_mode_logical' Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/apic: Cleanup delivery mode definesThomas Gleixner1-5/+6
The enum ioapic_irq_destination_types and the enumerated constants starting with 'dest_' are gross misnomers because they describe the delivery mode. Rename then enum and the constants so they actually make sense. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-12Merge tag 'x86-irq-2020-10-12' of ↵Linus Torvalds1-37/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq updates from Thomas Gleixner: "Surgery of the MSI interrupt handling to prepare the support of upcoming devices which require non-PCI based MSI handling: - Cleanup historical leftovers all over the place - Rework the code to utilize more core functionality - Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain assignment to PCI devices possible. - Assign irqdomains to PCI devices at initialization time which allows to utilize the full functionality of hierarchical irqdomains. - Remove arch_.*_msi_irq() functions from X86 and utilize the irqdomain which is assigned to the device for interrupt management. - Make the arch_.*_msi_irq() support conditional on a config switch and let the last few users select it" * tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS x86/apic/msi: Unbreak DMAR and HPET MSI iommu/amd: Remove domain search for PCI/MSI iommu/vt-d: Remove domain search for PCI/MSI[X] x86/irq: Make most MSI ops XEN private x86/irq: Cleanup the arch_*_msi_irqs() leftovers PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable x86/pci: Set default irq domain in pcibios_add_device() iommm/amd: Store irq domain in struct device iommm/vt-d: Store irq domain in struct device x86/xen: Wrap XEN MSI management into irqdomain irqdomain/msi: Allow to override msi_domain_alloc/free_irqs() x86/xen: Consolidate XEN-MSI init x86/xen: Rework MSI teardown x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init() PCI/MSI: Provide pci_dev_has_special_msi_domain() helper PCI_vmd_Mark_VMD_irqdomain_with_DOMAIN_BUS_VMD_MSI irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI x86/irq: Initialize PCI/MSI domain at PCI init time x86/pci: Reducde #ifdeffery in PCI init code ...
2020-09-23x86/ioapic: Unbreak check_timer()Thomas Gleixner1-0/+1
Several people reported in the kernel bugzilla that between v4.12 and v4.13 the magic which works around broken hardware and BIOSes to find the proper timer interrupt delivery mode stopped working for some older affected platforms which need to fall back to ExtINT delivery mode. The reason is that the core code changed to keep track of the masked and disabled state of an interrupt line more accurately to avoid the expensive hardware operations. That broke an assumption in i8259_make_irq() which invokes disable_irq_nosync(); irq_set_chip_and_handler(); enable_irq(); Up to v4.12 this worked because enable_irq() unconditionally unmasked the interrupt line, but after the state tracking improvements this is not longer the case because the IO/APIC uses lazy disabling. So the line state is unmasked which means that enable_irq() does not call into the new irq chip to unmask it. In principle this is a shortcoming of the core code, but it's more than unclear whether the core code should try to reset state. At least this cannot be done unconditionally as that would break other existing use cases where the chip type is changed, e.g. when changing the trigger type, but the callers expect the state to be preserved. As the way how check_timer() is switching the delivery modes is truly unique, the obvious fix is to simply unmask the i8259 manually after changing the mode to ExtINT delivery and switching the irq chip to the legacy PIC. Note, that the fixes tag is not really precise, but identifies the commit which broke the assumptions in the IO/APIC and i8259 code and that's the kernel version to which this needs to be backported. Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls") Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: [email protected] Tested-by: [email protected] Cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=197769
2020-09-16x86_ioapic_Consolidate_IOAPIC_allocationThomas Gleixner1-35/+35
Move the IOAPIC specific fields into their own struct and reuse the common devid. Get rid of the #ifdeffery as it does not matter at all whether the alloc info is a couple of bytes longer or not. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Joerg Roedel <[email protected]> Link: https://lore.kernel.org/r/[email protected]