Age | Commit message (Collapse) | Author | Files | Lines |
|
Add missing new lines and reorder variable definitions.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
80 character limit is history.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Add brackets around if/for constructs as required by coding style or remove
pointless line breaks to make it true single line statements which do not
require brackets.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Use proper comment styles and shrink comments to their scope where
applicable.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
It's only used by check_timer().
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Cleanup the APIC printk()s which are inside of a apic verbosity guarded
region by using apic_dbg() for the KERN_DEBUG level prints.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Replace apic_printk($LEVEL) with the corresponding apic_pr_*() helpers and
use pr_info() for APIC_QUIET as that is always printed so the indirection
is pointless noise.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
KISS rules!
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Make them conforming to the TIP coding style guide.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Only invoked from check_timer() which is __init too. Cleanup the variable
declaration while at it.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Reviewed-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Breno observed panics when using failslab under certain conditions during
runtime:
can not alloc irq_pin_list (-1,0,20)
Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed
panic+0x4e9/0x590
mp_irqdomain_alloc+0x9ab/0xa80
irq_domain_alloc_irqs_locked+0x25d/0x8d0
__irq_domain_alloc_irqs+0x80/0x110
mp_map_pin_to_irq+0x645/0x890
acpi_register_gsi_ioapic+0xe6/0x150
hpet_open+0x313/0x480
That's a pointless panic which is a leftover of the historic IO/APIC code
which panic'ed during early boot when the interrupt allocation failed.
The only place which might justify panic is the PIT/HPET timer_check() code
which tries to figure out whether the timer interrupt is delivered through
the IO/APIC. But that code does not require to handle interrupt allocation
failures. If the interrupt cannot be allocated then timer delivery fails
and it either panics due to that or falls back to legacy mode.
Cure this by removing the panic wrapper around __add_pin_to_irq_node() and
making mp_irqdomain_alloc() aware of the failure condition and handle it as
any other failure in this function gracefully.
Reported-by: Breno Leitao <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Tested-by: Qiuxu Zhuo <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
"Rework of APIC enumeration and topology evaluation.
The current implementation has a couple of shortcomings:
- It fails to handle hybrid systems correctly.
- The APIC registration code which handles CPU number assignents is
in the middle of the APIC code and detached from the topology
evaluation.
- The various mechanisms which enumerate APICs, ACPI, MPPARSE and
guest specific ones, tweak global variables as they see fit or in
case of XENPV just hack around the generic mechanisms completely.
- The CPUID topology evaluation code is sprinkled all over the vendor
code and reevaluates global variables on every hotplug operation.
- There is no way to analyze topology on the boot CPU before bringing
up the APs. This causes problems for infrastructure like PERF which
needs to size certain aspects upfront or could be simplified if
that would be possible.
- The APIC admission and CPU number association logic is
incomprehensible and overly complex and needs to be kept around
after boot instead of completing this right after the APIC
enumeration.
This update addresses these shortcomings with the following changes:
- Rework the CPUID evaluation code so it is common for all vendors
and provides information about the APIC ID segments in a uniform
way independent of the number of segments (Thread, Core, Module,
..., Die, Package) so that this information can be computed instead
of rewriting global variables of dubious value over and over.
- A few cleanups and simplifcations of the APIC, IO/APIC and related
interfaces to prepare for the topology evaluation changes.
- Seperation of the parser stages so the early evaluation which tries
to find the APIC address can be seperately overridden from the late
evaluation which enumerates and registers the local APIC as further
preparation for sanitizing the topology evaluation.
- A new registration and admission logic which
- encapsulates the inner workings so that parsers and guest logic
cannot longer fiddle in it
- uses the APIC ID segments to build topology bitmaps at
registration time
- provides a sane admission logic
- allows to detect the crash kernel case, where CPU0 does not run
on the real BSP, automatically. This is required to prevent
sending INIT/SIPI sequences to the real BSP which would reset
the whole machine. This was so far handled by a tedious command
line parameter, which does not even work in nested crash
scenarios.
- Associates CPU number after the enumeration completed and
prevents the late registration of APICs, which was somehow
tolerated before.
- Converting all parsers and guest enumeration mechanisms over to the
new interfaces.
This allows to get rid of all global variable tweaking from the
parsers and enumeration mechanisms and sanitizes the XEN[PV]
handling so it can use CPUID evaluation for the first time.
- Mopping up existing sins by taking the information from the APIC ID
segment bitmaps.
This evaluates hybrid systems correctly on the boot CPU and allows
for cleanups and fixes in the related drivers, e.g. PERF.
The series has been extensively tested and the minimal late fallout
due to a broken ACPI/MADT table has been addressed by tightening the
admission logic further"
* tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
x86/topology: Ignore non-present APIC IDs in a present package
x86/apic: Build the x86 topology enumeration functions on UP APIC builds too
smp: Provide 'setup_max_cpus' definition on UP too
smp: Avoid 'setup_max_cpus' namespace collision/shadowing
x86/bugs: Use fixed addressing for VERW operand
x86/cpu/topology: Get rid of cpuinfo::x86_max_cores
x86/cpu/topology: Provide __num_[cores|threads]_per_package
x86/cpu/topology: Rename topology_max_die_per_package()
x86/cpu/topology: Rename smp_num_siblings
x86/cpu/topology: Retrieve cores per package from topology bitmaps
x86/cpu/topology: Use topology logical mapping mechanism
x86/cpu/topology: Provide logical pkg/die mapping
x86/cpu/topology: Simplify cpu_mark_primary_thread()
x86/cpu/topology: Mop up primary thread mask handling
x86/cpu/topology: Use topology bitmaps for sizing
x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT
x86/xen/smp_pv: Count number of vCPUs early
x86/cpu/topology: Assign hotpluggable CPUIDs during init
x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug
x86/topology: Add a mechanism to track topology via APIC IDs
...
|
|
The recent restriction to invoke irqdomain_ops::select() only when the
domain bus token is not DOMAIN_BUS_ANY breaks the search for the parent MSI
domain of HPET and IO-APIC. The latter causes a full boot fail.
The restriction itself makes sense to avoid adding DOMAIN_BUS_ANY matches
into the various ARM specific select() callbacks. Reverting this change
would obviously break ARM platforms again and require DOMAIN_BUS_ANY
matches added to various places.
A simpler solution is to use the DOMAIN_BUS_GENERIC_MSI token for the HPET
and IO-APIC parent domain search. This works out of the box because the
affected parent domains check only for the firmware specification content
and not for the bus token.
Fixes: 5aa3c0cf5bba ("genirq/irqdomain: Don't call ops->select for DOMAIN_BUS_ANY tokens")
Reported-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/878r38cy8n.ffs@tglx
|
|
physid_t is a wrapper around bitmap. Just remove the onion layer and use
bitmap functionality directly.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
No need to go through APIC callbacks. It's already established that this is
an ancient APIC. So just copy the present mask and use the direct physid*
functions all over the place.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
No need to go through APIC callbacks. It's already established that this is
an ancient APIC. So just copy the present mask and use the direct physid*
functions all over the place.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
There is no point for this function. The only case where this is used is
when there is no XAPIC available, which means the broadcast address is 0xF.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Yet another set_bit() operation wrapped in oring a mask.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
There is no point to do that. The ATOMs have an XAPIC for which this
function is a pointless exercise.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
[ mingo: Refine changelog. ]
Signed-off-by: Adrian Huang <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: [email protected]
|
|
Yet another wrapper of a wrapper gone along with the outdated comment
that this compiles to a single instruction.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
|
|
The operation to set the IOAPIC ID in phys_id_present_map is as convoluted
as it can be.
1) Allocate a bitmap of 32byte size on the stack
2) Zero the bitmap and set the IOAPIC ID bit
3) Or the temporary bitmap over phys_id_present_map
The same functionality can be achieved by setting the IOAPIC ID bit
directly in the phys_id_present_map.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
|
|
This is only used on 32bit and is a wrapper around
physid_set_mask_of_physid() in all 32bit APIC drivers.
Remove the callback and use physid_set_mask_of_physid() in the code
directly,
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
|
|
No point in having a wrapper around read_apic_id().
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
|
|
Another variable name which is confusing at best. Convert to bool.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
- Fix the incorrect handling of atomic offset updates in
reserve_eilvt_offset()
The check for the return value of atomic_cmpxchg() is not compared
against the old value, it is compared against the new value, which
makes it two round on success.
Convert it to atomic_try_cmpxchg() which does the right thing.
- Handle IO/APIC less systems correctly
When IO/APIC is not advertised by ACPI then the computation of the
lower bound for dynamically allocated interrupts like MSI goes wrong.
This lower bound is used to exclude the IO/APIC legacy GSI space as
that must stay reserved for the legacy interrupts.
In case that the system, e.g. VM, does not advertise an IO/APIC the
lower bound stays at 0.
0 is an invalid interrupt number except for the legacy timer
interrupt on x86. The return value is unchecked in the core code, so
it ends up to allocate interrupt number 0 which is subsequently
considered to be invalid by the caller, e.g. the MSI allocation code.
A similar problem was already cured for device tree based systems
years ago, but that missed - or did not envision - the zero IO/APIC
case.
Consolidate the zero check and return the provided "from" argument to
the core code call site, which is guaranteed to be greater than 0.
- Simplify the X2APIC cluster CPU mask logic for CPU hotplug
Per cluster CPU masks are required for X2APIC in cluster mode to
determine the correct cluster for a target CPU when calculating the
destination for IPIs
These masks are established when CPUs are borught up. The first CPU
in a cluster must allocate a new cluster CPU mask. As this happens
during the early startup of a CPU, where memory allocations cannot be
done, the mask has to be allocated by the control CPU.
The current implementation allocates a clustermask just in case and
if the to be brought up CPU is the first in a cluster the CPU takes
over this allocation from a global pointer.
This works nicely in the fully serialized CPU bringup scenario which
is used today, but would fail completely for parallel bringup of
CPUs.
The cluster association of a CPU can be computed from the APIC ID
which is enumerated by ACPI/MADT.
So the cluster CPU masks can be preallocated and associated upfront
and the upcoming CPUs just need to set their corresponding bit.
Aside of preparing for parallel bringup this is a valuable
simplification on its own.
- Remove global variables which control the early startup of secondary
CPUs on 64-bit
The only information which is needed by a starting CPU is the Linux
CPU number. The CPU number allows it to retrieve the rest of the
required data from already existing per CPU storage.
So instead of initial_stack, early_gdt_desciptor and initial_gs
provide a new variable smpboot_control which contains the Linux CPU
number for now. The starting CPU can retrieve and compute all
required information for startup from there.
Aside of being a cleanup, this is also preparing for parallel CPU
bringup, where starting CPUs will look up their Linux CPU number via
the APIC ID, when smpboot_control has the corresponding control bit
set.
- Make cc_vendor globally accesible
Subsequent parallel bringup changes require access to cc_vendor
because confidental computing platforms need special treatment in the
early startup phase vs. CPUID and APCI ID readouts.
The change makes cc_vendor global and provides stub accessors in case
that CONFIG_ARCH_HAS_CC_PLATFORM is not set.
This was merged from the x86/cc branch in anticipation of further
parallel bringup commits which require access to cc_vendor. Due to
late discoveries of fundamental issue with those patches these
commits never happened.
The merge commit is unfortunately in the middle of the APIC commits
so unraveling it would have required a rebase or revert. As the
parallel bringup seems to be well on its way for 6.5 this would be
just pointless churn. As the commit does not contain any functional
change it's not a risk to keep it.
* tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()
x86/apic: Fix atomic update of offset in reserve_eilvt_offset()
x86/coco: Export cc_vendor
x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
x86/smpboot: Remove initial_gs
x86/smpboot: Remove early_gdt_descr on 64-bit
x86/smpboot: Remove initial_stack on 64-bit
x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
|
|
arch_dynirq_lower_bound() is invoked by the core interrupt code to
retrieve the lowest possible Linux interrupt number for dynamically
allocated interrupts like MSI.
The x86 implementation uses this to exclude the IO/APIC GSI space.
This works correctly as long as there is an IO/APIC registered, but
returns 0 if not. This has been observed in VMs where the BIOS does
not advertise an IO/APIC.
0 is an invalid interrupt number except for the legacy timer interrupt
on x86. The return value is unchecked in the core code, so it ends up
to allocate interrupt number 0 which is subsequently considered to be
invalid by the caller, e.g. the MSI allocation code.
The function has already a check for 0 in the case that an IO/APIC is
registered, as ioapic_dynirq_base is 0 in case of device tree setups.
Consolidate this and zero check for both ioapic_dynirq_base and gsi_top,
which is used in the case that no IO/APIC is registered.
Fixes: 3e5bedc2c258 ("x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines")
Signed-off-by: Saurabh Sengar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Current code always maps MMIO devices as shared (decrypted) in a
confidential computing VM. But Hyper-V guest VMs on AMD SEV-SNP with vTOM
use a paravisor running in VMPL0 to emulate some devices, such as the
IO-APIC and TPM. In such a case, the device must be accessed as private
(encrypted) because the paravisor emulates the device at an address below
vTOM, where all accesses are encrypted.
Add a new hypervisor callback to determine if an MMIO address should
be mapped private. The callback allows hypervisor-specific code to handle
any quirks, the use of a paravisor, etc. in determining whether a mapping
must be private. If the callback is not used by a hypervisor, default
to returning "false", which is consistent with normal coco VM behavior.
Use this callback as another special case to check for when doing
ioremap(). Just checking the starting address is sufficient as an
ioremap range must be all private or all shared.
Also make the callback in early boot IO-APIC mapping code that uses the
fixmap.
[ bp: Touchups. ]
Signed-off-by: Michael Kelley <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Use the irq_domain_create_hierarchy() helper to create the hierarchical
domain, which both serves as documentation and avoids poking at
irqdomain internals.
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Hsin-Yi Wang <[email protected]>
Tested-by: Mark-PK Tsai <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The kernel interacts with each bare-metal IOAPIC with a special
MMIO page. When running under KVM, the guest's IOAPICs are
emulated by KVM.
When running as a TDX guest, the guest needs to mark each IOAPIC
mapping as "shared" with the host. This ensures that TDX private
protections are not applied to the page, which allows the TDX host
emulation to work.
ioremap()-created mappings such as virtio will be marked as
shared by default. However, the IOAPIC code does not use ioremap() and
instead uses the fixmap mechanism.
Introduce a special fixmap helper just for the IOAPIC code. Ensure
that it marks IOAPIC pages as "shared". This replaces
set_fixmap_nocache() with __set_fixmap() since __set_fixmap()
allows custom 'prot' values.
AMD SEV gets IOAPIC pages shared because FIXMAP_PAGE_NOCACHE has _ENC
bit clear. TDX has to set bit to share the page with the host.
Signed-off-by: Isaku Yamahata <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PIRQ updates from Thomas Gleixner:
"A set of updates to support port 0x22/0x23 based PCI configuration
space which can be found on various ALi chipsets and is also available
on older Intel systems which expose a PIRQ router.
While the Intel support is more or less nostalgia, the ALi chips are
still in use on popular embedded boards used for routers"
* tag 'x86-irq-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix typo s/ECLR/ELCR/ for the PIC register
x86: Avoid magic number with ELCR register accesses
x86/PCI: Add support for the Intel 82426EX PIRQ router
x86/PCI: Add support for the Intel 82374EB/82374SB (ESC) PIRQ router
x86/PCI: Add support for the ALi M1487 (IBC) PIRQ router
x86: Add support for 0x22/0x23 port I/O configuration space
|
|
Define PIC_ELCR1 and PIC_ELCR2 macros for accesses to the ELCR registers
implemented by many chipsets in their embedded 8259A PIC cores, avoiding
magic numbers that are difficult to handle, and complementing the macros
we already have for registers originally defined with discrete 8259A PIC
implementations. No functional change.
Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The IO/APIC cannot handle interrupt affinity changes safely after startup
other than from an interrupt handler. The startup sequence in the generic
interrupt code violates that assumption.
Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
the default interrupt setting happens before the interrupt is started up
for the first time.
Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)")
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
Conflicts:
arch/x86/kernel/kprobes/ftrace.c
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where
the matrix allocator claimed to be out of vectors. He analyzed it down to
the point that IRQ2, the PIC cascade interrupt, which is supposed to be not
ever routed to the IO/APIC ended up having an interrupt vector assigned
which got moved during unplug of CPU0.
The underlying issue is that IRQ2 for various reasons (see commit
af174783b925 ("x86: I/O APIC: Never configure IRQ2" for details) is treated
as a reserved system vector by the vector core code and is not accounted as
a regular vector. The Amazon BIOS has an routing entry of pin2 to IRQ2
which causes the IO/APIC setup to claim that interrupt which is granted by
the vector domain because there is no sanity check. As a consequence the
allocation counter of CPU0 underflows which causes a subsequent unplug to
fail with:
[ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU
There is another sanity check missing in the matrix allocator, but the
underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic
during the conversion to irqdomains.
For almost 6 years nobody complained about this wreckage, which might
indicate that this requirement could be lifted, but for any system which
actually has a PIC IRQ2 is unusable by design so any routing entry has no
effect and the interrupt cannot be connected to a device anyway.
Due to that and due to history biased paranoia reasons restore the IRQ2
ignore logic and treat it as non existent despite a routing entry claiming
otherwise.
Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
Fix ~144 single-word typos in arch/x86/ code comments.
Doing this in a single commit should reduce the churn.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: [email protected]
|
|
SFI-based platforms are gone. So does this framework.
This removes mention of SFI through the drivers and other code as well.
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Acked-by: Linus Walleij <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Mark tripped over the creative irqflags handling in the IO-APIC timer
delivery check which ends up doing:
local_irq_save(flags);
local_irq_enable();
local_irq_restore(flags);
which triggered a new consistency check he's working on required for
replacing the POPF based restore with a conditional STI.
That code is a historical mess and none of this is needed. Make it
straightforward use local_irq_disable()/enable() as that's all what is
required. It is invoked from interrupt enabled code nowadays.
Reported-by: Mark Rutland <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
PCI's default trigger type is level and ISA's is edge. The recent
refactoring made it the other way round, which went unnoticed as it seems
only to cause havoc on some AMD systems.
Make the comment and code do the right thing again.
Fixes: a27dca645d2c ("x86/io_apic: Cleanup trigger/polarity helpers")
Reported-by: Tom Lendacky <[email protected]>
Reported-by: Borislav Petkov <[email protected]>
Reported-by: Qian Cai <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Tom Lendacky <[email protected]>
Cc: David Woodhouse <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In commit b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to
find remapping irqdomain") the I/O-APIC code was changed to find its
parent irqdomain using irq_find_matching_fwspec(), but the key used
for the lookup was wrong. It shouldn't use 'ioapic' which is the index
into its own ioapics[] array. It should use the actual arbitration
ID of the I/O-APIC in question, which is mpc_ioapic_id(ioapic).
Fixes: b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain")
Reported-by: lkp <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to bits 19-4
of the address used in the resulting MSI cycle.
Historically, the x86 MSI format only used the top 8 of those 16 bits as
the destination APIC ID, and the "Extended Destination ID" in the lower 8
bits was unused.
With interrupt remapping, the lowest bit of the Extended Destination ID
(bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable
format MSI.
A hypervisor can use the other 7 bits of the Extended Destination ID to
permit guests to address up to 15 bits of APIC IDs, thus allowing 32768
vCPUs before having to expose a vIOMMU and interrupt remapping to the
guest.
No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR I/OAPIC domain.
[ tglx: Converted it to the cleaned up entry/msi_msg format and added
commentry ]
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
All possible parent domains have a select method now. Make use of it.
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The I/O-APIC generates an MSI cycle with address/data bits taken from its
Redirection Table Entry in some combination which used to make sense, but
now is just a bunch of bits which get passed through in some seemingly
arbitrary order.
Instead of making IRQ remapping drivers directly frob the I/OA-PIC RTE, let
them just do their job and generate an MSI message. The bit swizzling to
turn that MSI message into the I/O-APIC's RTE is the same in all cases,
since it's a function of the I/O-APIC hardware. The IRQ remappers have no
real need to get involved with that.
The only slight caveat is that the I/OAPIC is interpreting some of those
fields too, and it does want the 'vector' field to be unique to make EOI
work. The AMD IOMMU happens to put its IRTE index in the bits that the
I/O-APIC thinks are the vector field, and accommodates this requirement by
reserving the first 32 indices for the I/O-APIC. The Intel IOMMU doesn't
actually use the bits that the I/O-APIC thinks are the vector field, so it
fills in the 'pin' value there instead.
[ tglx: Replaced the unreadably macro maze with the cleaned up RTE/msi_msg
bitfields and added commentry to explain the mapping magic ]
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Having two seperate structs for the I/O-APIC RTE entries (non-remapped and
DMAR remapped) requires type casts and makes it hard to map.
Combine them in IO_APIC_routing_entry by defining a union of two 64bit
bitfields. Use naming which reflects which bits are shared and which bits
are actually different for the operating modes.
[dwmw2: Fix it up and finish the job, pulling the 32-bit w1,w2 words for
register access into the same union and eliminating a few more
places where bits were accessed through masks and shifts.]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
'trigger' and 'polarity' are used throughout the I/O-APIC code for handling
the trigger type (edge/level) and the active low/high configuration. While
there are defines for initializing these variables and struct members, they
are not used consequently and the meaning of 'trigger' and 'polarity' is
opaque and confusing at best.
Rename them to 'is_level' and 'active_low' and make them boolean in various
structs so it's entirely clear what the meaning is.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
apic::irq_dest_mode is actually a boolean, but defined as u32 and named in
a way which does not explain what it means.
Make it a boolean and rename it to 'dest_mode_logical'
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The enum ioapic_irq_destination_types and the enumerated constants starting
with 'dest_' are gross misnomers because they describe the delivery mode.
Rename then enum and the constants so they actually make sense.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq updates from Thomas Gleixner:
"Surgery of the MSI interrupt handling to prepare the support of
upcoming devices which require non-PCI based MSI handling:
- Cleanup historical leftovers all over the place
- Rework the code to utilize more core functionality
- Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain
assignment to PCI devices possible.
- Assign irqdomains to PCI devices at initialization time which
allows to utilize the full functionality of hierarchical
irqdomains.
- Remove arch_.*_msi_irq() functions from X86 and utilize the
irqdomain which is assigned to the device for interrupt management.
- Make the arch_.*_msi_irq() support conditional on a config switch
and let the last few users select it"
* tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS
x86/apic/msi: Unbreak DMAR and HPET MSI
iommu/amd: Remove domain search for PCI/MSI
iommu/vt-d: Remove domain search for PCI/MSI[X]
x86/irq: Make most MSI ops XEN private
x86/irq: Cleanup the arch_*_msi_irqs() leftovers
PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
x86/pci: Set default irq domain in pcibios_add_device()
iommm/amd: Store irq domain in struct device
iommm/vt-d: Store irq domain in struct device
x86/xen: Wrap XEN MSI management into irqdomain
irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
x86/xen: Consolidate XEN-MSI init
x86/xen: Rework MSI teardown
x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
PCI_vmd_Mark_VMD_irqdomain_with_DOMAIN_BUS_VMD_MSI
irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
x86/irq: Initialize PCI/MSI domain at PCI init time
x86/pci: Reducde #ifdeffery in PCI init code
...
|
|
Several people reported in the kernel bugzilla that between v4.12 and v4.13
the magic which works around broken hardware and BIOSes to find the proper
timer interrupt delivery mode stopped working for some older affected
platforms which need to fall back to ExtINT delivery mode.
The reason is that the core code changed to keep track of the masked and
disabled state of an interrupt line more accurately to avoid the expensive
hardware operations.
That broke an assumption in i8259_make_irq() which invokes
disable_irq_nosync();
irq_set_chip_and_handler();
enable_irq();
Up to v4.12 this worked because enable_irq() unconditionally unmasked the
interrupt line, but after the state tracking improvements this is not
longer the case because the IO/APIC uses lazy disabling. So the line state
is unmasked which means that enable_irq() does not call into the new irq
chip to unmask it.
In principle this is a shortcoming of the core code, but it's more than
unclear whether the core code should try to reset state. At least this
cannot be done unconditionally as that would break other existing use cases
where the chip type is changed, e.g. when changing the trigger type, but
the callers expect the state to be preserved.
As the way how check_timer() is switching the delivery modes is truly
unique, the obvious fix is to simply unmask the i8259 manually after
changing the mode to ExtINT delivery and switching the irq chip to the
legacy PIC.
Note, that the fixes tag is not really precise, but identifies the commit
which broke the assumptions in the IO/APIC and i8259 code and that's the
kernel version to which this needs to be backported.
Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls")
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: [email protected]
Tested-by: [email protected]
Cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197769
|
|
Move the IOAPIC specific fields into their own struct and reuse the common
devid. Get rid of the #ifdeffery as it does not matter at all whether the
alloc info is a couple of bytes longer or not.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Wei Liu <[email protected]>
Acked-by: Joerg Roedel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|