|
Set the new CAN_RESERVE flag when the initial reservation for an interrupt
happens. The flag is used in a subsequent patch to disable reservation mode
for a certain class of MSI devices.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Alexandru Chirvasitu <[email protected]>
Tested-by: Andy Shevchenko <[email protected]>
Cc: Dou Liyang <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Mikael Pettersson <[email protected]>
Cc: Josh Poulson <[email protected]>
Cc: Mihai Costache <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: [email protected]
Cc: Haiyang Zhang <[email protected]>
Cc: Dexuan Cui <[email protected]>
Cc: Simon Xiao <[email protected]>
Cc: Saeed Mahameed <[email protected]>
Cc: Jork Loeser <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: [email protected]
Cc: KY Srinivasan <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: [email protected]
|
|
Functions x86_vector_debug_show(), uv_handle_nmi() and uv_nmi_setup_common()
are local to the source and do not need to be in global scope, so make them
static.
Fixes up various sparse warnings.
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Mike Travis <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
There are no in-tree callers of ht_create_irq(), the driver interface for
HyperTransport interrupts, left. Remove the unused entry point and all the
supporting code.
See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt
support").
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: Benjamin Herrenschmidt <[email protected]>
Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
|
|
free_moved_vector() accesses the per cpu vector array with this_cpu_write()
to clear the vector. The function has two call sites:
1) The vector cleanup IPI
2) The force_complete_move() code path
For #1 this_cpu_write() is correct as it runs on the CPU on which the
vector needs to be freed.
For #2 this_cpu_write() is wrong because the function is called from an
outgoing CPU which is not necessarily the CPU on which the previous vector
needs to be freed. As a result it sets the vector on the outgoing CPU to
NULL, which is pointless as that CPU does not handle interrupts
anymore. What's worse is that it leaves the vector on the previous target
CPU in place which later on triggers the BUG_ON(vector) in the vector
allocation code when the vector gets reused. That's possible because the
bitmap allocator entry of that CPU is freed correctly.
Always use the CPU to which the vector was associated and clear the vector
entry on that CPU. Fix up the tracepoint as well so it tracks on which CPU
the vector gets removed.
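A minimal sketch of the intended behaviour (field names follow the vector code, but treat this as illustrative, not the actual patch):
/* Illustrative sketch only, not the actual patch. */
static void free_moved_vector(struct apic_chip_data *apicd)
{
    unsigned int vector = apicd->prev_vector;   /* vector to be freed */
    unsigned int cpu = apicd->prev_cpu;         /* CPU it was assigned to */

    /*
     * Wrong on the force_complete_move() path: this clears the entry on
     * whichever CPU happens to run the code, i.e. the outgoing CPU:
     *
     *     this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
     *
     * Correct: address the per-CPU array of the CPU the vector was
     * associated with, independent of the executing CPU.
     */
    per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
    irq_matrix_free(vector_matrix, cpu, vector, apicd->is_managed);
}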
Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Petri Latvala <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Yu Chen <[email protected]>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710161614430.1973@nanos
|
|
The core interrupt code can call the affinity setter for inactive
interrupts under certain circumstances.
For inactive interrupts which use managed or reservation mode this is a
pointless exercise as the activation will assign a vector which fits the
destination mask.
Check for this and return w/o going through the vector assignment.
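A rough sketch of the check in the affinity setter (simplified, illustrative only):
/* Sketch of the early return; not the exact patch. */
static int apic_set_affinity(struct irq_data *irqd,
                             const struct cpumask *dest, bool force)
{
    struct apic_chip_data *apicd = irqd->chip_data;

    /*
     * Not yet activated and using managed or reservation mode:
     * activation will pick a fitting vector anyway, so skip the
     * pointless vector assignment here.
     */
    if (!irqd_is_activated(irqd) &&
        (apicd->is_managed || apicd->can_reserve))
        return IRQ_SET_MASK_OK;

    /* ... regular vector assignment path ... */
    return IRQ_SET_MASK_OK;
}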
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The interrupt descriptor has a preset affinity mask at allocation
time, which is usually the default affinity mask.
The current code does not respect that mask and places the vector at some
random CPU, which gets corrected later by a set_affinity() call. That's
silly because the vector allocation can respect the mask upfront and place
the interrupt on a CPU which is in the mask. If that fails, then the
affinity is broken and an interrupt is assigned to any online CPU.
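The intended order, roughly (sketch; helper names are illustrative):
/* Sketch: honour the preset affinity first, fall back to any online CPU. */
static int assign_irq_vector_policy(struct irq_data *irqd)
{
    const struct cpumask *affmsk = irq_data_get_affinity_mask(irqd);
    int ret;

    /* Place the vector on a CPU which is in the preset affinity mask. */
    ret = assign_vector_locked(irqd, affmsk);
    if (!ret)
        return 0;

    /* Affinity is broken: any online CPU will do. */
    return assign_vector_locked(irqd, cpu_online_mask);
}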
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Before a CPU is taken offline the number of active interrupt vectors on the
outgoing CPU and the number of vectors which are available on the other
online CPUs are counted and compared. If the active vectors are more than
the available vectors on the other CPUs then the CPU hot-unplug operation
is aborted. This again uses a loop-based search and is inaccurate.
The bitmap matrix allocator has accurate accounting information and can
tell exactly whether the vector space is sufficient or not.
Emit a message when the number of globally reserved (unallocated) vectors is
larger than the number of available vectors after offlining a CPU because
after that point request_irq() might fail.
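A sketch of how the matrix accounting can be used for that decision (simplified, not the literal patch):
/* Sketch: decide whether the vectors of the outgoing CPU fit elsewhere. */
int lapic_can_unplug_cpu(void)
{
    unsigned int rsvd, avl, tomove;
    int ret = 0;

    raw_spin_lock(&vector_lock);
    tomove = irq_matrix_allocated(vector_matrix);       /* active on this CPU */
    avl = irq_matrix_available(vector_matrix, true);    /* available w/o this CPU */
    if (avl < tomove) {
        ret = -ENOSPC;                                  /* abort the unplug */
        goto out;
    }
    /* Warn when reservations exceed what remains after offlining. */
    rsvd = irq_matrix_reserved(vector_matrix);
    if (rsvd > avl)
        pr_warn("Reserved vectors %u > available %u. IRQ request may fail\n",
                rsvd, avl);
out:
    raw_spin_unlock(&vector_lock);
    return ret;
}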
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
IOAPICs install and allocate vectors for inactive interrupts. This results
in problems on CPU offline and wastes vector resources for nothing.
Handle inactive IOAPIC interrupts in the same way as inactive MSI
interrupts and switch them to the global reservation mode.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Devices with many queues allocate a huge number of interrupts and get
assigned a vector for each of them, even if the queues are not active and
the interrupts never requested. This causes problems with the decision
whether the global vector space is sufficient for CPU hot unplug
operations.
Change it to a reservation scheme, which allows overcommitment.
When the interrupt is allocated and initialized the vector assignment
merely updates the reservation request counter in the matrix
allocator. This counter is used to emit warnings when the reservation
exceeds the available vector space, but does not affect CPU offline
operations. Like the managed interrupts the corresponding MSI/DMAR/IOAPIC
entries are directed to the special shutdown vector.
When the interrupt is requested, then the activation code tries to assign a
real vector. If that succeeds the interrupt is started up and functional.
If that fails, then subsequently request_irq() fails with -ENOSPC.
This allows a clear separation of inactive and active modes and simplifies
the final decisions whether the global vector space is sufficient for CPU
offline operations.
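In matrix allocator terms the scheme looks roughly like this (sketch; helper and field names are illustrative):
/* Allocation/initialization: only account a reservation, no real vector. */
static void reserve_irq_vector(struct apic_chip_data *apicd)
{
    irq_matrix_reserve(vector_matrix);      /* bump the reservation counter */
    apicd->has_reserved = true;
    /* The MSI/DMAR/IOAPIC entry is pointed at the shutdown vector here. */
}

/* Activation (triggered by request_irq()): turn the reservation into a vector. */
static int activate_reserved(struct irq_data *irqd)
{
    struct apic_chip_data *apicd = irqd->chip_data;
    int ret;

    ret = assign_irq_vector_any_locked(irqd);   /* -ENOSPC when exhausted */
    if (!ret) {
        apicd->has_reserved = false;
        irq_matrix_remove_reserved(vector_matrix);
    }
    return ret;
}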
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Managed interrupts need to reserve interrupt vectors permanently, but as
long as the interrupt is deactivated, the vector should not be active.
Reserve a new system vector, which can be used to initially initialize
MSI/DMAR/IOAPIC entries. In that situation the interrupts are disabled in
the corresponding MSI/DMAR/IOAPIC devices. So the vector should never be
sent to any CPU.
When the managed interrupt is started up, a real vector is assigned from
the managed vector space and configured in MSI/DMAR/IOAPIC.
This allows a clear separation of inactive and active modes and simplifies
the final decisions whether the global vector space is sufficient for CPU
offline operations.
The vector space can be reserved even on offline CPUs and will survive CPU
offline/online operations.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The vector management state is not required to live in irq_cfg. irq_cfg is
only relevant for the depending irq domains (IOAPIC, DMAR, MSI ...).
The separation of the vector management state allows directing a shut down
interrupt to a special shutdown vector w/o confusing the internal state of
the vector management.
Preparatory change for the rework of managed interrupts and the global
vector reservation scheme.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
No point in compiling this for UP.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Replace the magic vector allocation code by a simple bitmap matrix
allocator. This avoids loops and hoops over CPUs and vector arrays, so in
case of densely used vector spaces it's way faster.
This also gets rid of the magic 'spread the vectors across priority
levels' heuristics in the current allocator:
The comment in __assign_irq_vector says:
* NOTE! The local APIC isn't very good at handling
* multiple interrupts at the same interrupt level.
* As the interrupt level is determined by taking the
* vector number and shifting that right by 4, we
* want to spread these out a bit so that they don't
* all fall in the same interrupt level.
After doing some palaeontological research, the following was found in the
PPro Developer Manual Volume 3:
"7.4.2. Valid Interrupts
The local and I/O APICs support 240 distinct vectors in the range of 16
to 255. Interrupt priority is implied by its vector, according to the
following relationship: priority = vector / 16
One is the lowest priority and 15 is the highest. Vectors 16 through
31 are reserved for exclusive use by the processor. The remaining
vectors are for general use. The processor's local APIC includes an
in-service entry and a holding entry for each priority level. To avoid
losing interrupts, software should allocate no more than 2 interrupt
vectors per priority."
The current SDM tells nothing about that, instead it states:
"If more than one interrupt is generated with the same vector number,
the local APIC can set the bit for the vector both in the IRR and the
ISR. This means that for the Pentium 4 and Intel Xeon processors, the
IRR and ISR can queue two interrupts for each interrupt vector: one
in the IRR and one in the ISR. Any additional interrupts issued for
the same interrupt vector are collapsed into the single bit in the
IRR.
For the P6 family and Pentium processors, the IRR and ISR registers
can queue no more than two interrupts per interrupt vector and will
reject other interrupts that are received within the same vector."
Which means that on P6/Pentium the APIC will reject a new message and
tell the sender to retry, which increases the load on the APIC bus and
nothing more.
There is no affirmative answer from Intel on that, but it's a sane approach
to remove that for the following reasons:
1) No other (relevant Open Source) operating system bothers to
implement this or mentions it at all.
2) The current allocator has no enforcement for this and especially the
legacy interrupts, which are the main source of interrupts on these
P6 and older systems, are allocated linearly in the same priority
level and just work.
3) The current machines have no problem with that at all as verified
with some experiments.
4) AMD at least confirmed that such an issue is unknown.
5) P6 and older are dinosaurs almost 20 years EOL, so there is really
no reason to worry about that too much.
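For reference, the driver-facing shape of an allocation against the bitmap matrix allocator (illustrative sketch; irq_matrix_alloc() is the allocator's entry point in kernel/irq/matrix.c):
/* Sketch: allocate a vector for a destination mask via the matrix allocator. */
static int example_alloc_vector(const struct cpumask *dest, unsigned int *cpu)
{
    /*
     * Returns a free vector on the best suited CPU in @dest (the one
     * with the most available vectors), or a negative error code.
     * Vectors are handed out linearly, no priority-level spreading.
     */
    return irq_matrix_alloc(vector_matrix, dest, false, cpu);
}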
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add tracepoints for analysing the new vector management.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add the debug callback for the vector domain, which gives detailed
information about vector usage if invoked for the domain by using the
matrix allocator debug function, and vector/target information when invoked
for a particular interrupt.
Extra information for the Vector domain:
Online bitmaps:       32
Global available:   6352
Global reserved:       5
Total allocated:      20
System: 41: 0-19,32,50,128,238-255
 | CPU | avl | man | act | vectors
     0   183     4    19  33-48,51-53
     1   199     4     1  33
     2   199     4     0
Extra information for interrupts:
    Vector:    42
    Target:     4
This allows a detailed analysis of the vector usage and the association to
interrupts and devices.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Initialize the matrix allocator and add the proper accounting points to the
code.
No functional change, just preparation.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Move the helper functions to a different place as they would end up in the
middle of management functions.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The info pointer checks in assign_irq_vector_policy() are pointless because
the pointer cannot be NULL, otherwise the calling code would already crash.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Now that the legacy PIC takeover by the IOAPIC is marked accordingly, the
early boot allocation of APIC data is no longer necessary. Use the regular
allocation mechanism as it is used by non-legacy interrupts and fill in the
known information (vector and affinity) so the allocator reuses the vector.
This is important as the timer check might move the timer interrupt 0 back
to the PIC in case the delivery through the IOAPIC fails.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The vector move cleanup needs to walk the vector space and do a lot of
sanity checks to find a vector to cleanup.
With single CPU affinities this can be simplified and made more robust by
queueing the vector configuration which needs to be cleaned up in a hlist
on the CPU which was the previous target.
That removes all the race conditions because the cleanup either finds a
valid list entry or not. The latter happens when the interrupt was torn
down before the cleanup handler was able to run.
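A sketch of the hlist based queueing (simplified; names are illustrative):
/* Sketch: queue the outgoing vector config on its previous target CPU. */
static DEFINE_PER_CPU(struct hlist_head, cleanup_list);

static void queue_for_cleanup(struct apic_chip_data *apicd)
{
    hlist_add_head(&apicd->clist, per_cpu_ptr(&cleanup_list, apicd->prev_cpu));
    apic->send_IPI(apicd->prev_cpu, IRQ_MOVE_CLEANUP_VECTOR);
}

/* Cleanup IPI handler: whatever is on the list is valid, no guesswork. */
static void vector_cleanup(void)
{
    struct hlist_head *clhead = this_cpu_ptr(&cleanup_list);
    struct apic_chip_data *apicd;
    struct hlist_node *tmp;

    hlist_for_each_entry_safe(apicd, tmp, clhead, clist) {
        hlist_del_init(&apicd->clist);
        free_moved_vector(apicd);
    }
}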
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Now that the interrupt affinities are targeted at single CPUs storing them
in a cpumask is overkill. Store them in a dedicated variable.
This does not yet remove the domain cpumasks because the current allocator
relies on them. Preparatory change for the allocator rework.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The naming convention of variables with the types irq_data and
apic_chip_data are inconsistent and confusing.
Before reworking the whole vector management make them consistent so
irq_data pointers are named 'irqd' and apic_chip_data are named 'apicd' all
over the place.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
With single CPU affinities it's no longer required to scan all interrupts
for potential destination masks which contain the newly booting CPU.
Reduce it to install the active legacy PIC vectors on the newly booting CPU
as those cannot be affinity controlled by the kernel and potentially end up
at any CPU in the system.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Setting the interrupt affinity of a single interrupt to multiple CPUs has a
dubious value.
1) This only works on machines where the APIC uses logical destination
mode. If the APIC uses physical destination mode then it is already
restricted to a single CPU
2) Experiments have shown that the benefit of multi CPU affinity is close
to zero and in some tests even worse than setting the affinity to a
single CPU.
The reason for this is that the delivery targets the APIC with the
lowest ID first and only if that APIC is busy (servicing an interrupt,
i.e. ISR is not empty) it hands it over to the next APIC. In the
conducted tests the vast majority of interrupts ends up on the APIC
with the lowest ID anyway, so there is no natural spreading of the
interrupts possible.
Supporting multi CPU affinities adds a lot of complexity to the code, which
can turn the allocation search into a worst case of
nr_vectors * nr_online_cpus * nr_bits_in_target_mask
As a first step disable it by restricting the vector search to a single
CPU.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
used_vectors is a misnomer as it only marks the system vectors, which are
excluded from the regular vector allocation. It's not, as the name
suggests, storage for the actually used vectors.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The target_cpus() callback of the apic struct is not really useful. Some
APICs return cpu_online_mask and others cpus_all_mask. The latter is bogus
as it does not take holes in the cpus_possible_mask into account.
Replace it with cpu_online_mask which makes the most sense and remove the
callback.
The usage sites will be removed in a later step anyway, so get rid of it
now to have incremental changes.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Yu Chen <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rui Zhang <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
This variable is beyond pointless. Nothing allocates a vector via
alloc_gate() below FIRST_SYSTEM_VECTOR. So nothing can change
first_system_vector.
If there is a need for a gate below FIRST_SYSTEM_VECTOR then it can be
added to the vector defines and FIRST_SYSTEM_VECTOR can be adjusted
accordingly.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The irq department delivers:
- Expand the generic infrastructure handling the irq migration on CPU
hotplug and convert X86 over to it. (Thomas Gleixner)
Aside of consolidating code this is a preparatory change for:
- Finalizing the affinity management for multi-queue devices. The
main change here is to shut down interrupts which are affine to an
outgoing CPU and re-enable them when the CPU comes online again.
That avoids moving interrupts pointlessly around and breaking and
reestablishing affinities for no value. (Christoph Hellwig)
Note: This contains also the BLOCK-MQ and NVME changes which depend
on the rework of the irq core infrastructure. Jens acked them and
agreed that they should go with the irq changes.
- Consolidation of irq domain code (Marc Zyngier)
- State tracking consolidation in the core code (Jeffy Chen)
- Add debug infrastructure for hierarchical irq domains (Thomas
Gleixner)
- Infrastructure enhancement for managing generic interrupt chips via
devmem (Bartosz Golaszewski)
- Constification work all over the place (Tobias Klauser)
- Two new interrupt controller drivers for MVEBU (Thomas Petazzoni)
- The usual set of fixes, updates and enhancements all over the
place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
irqchip/or1k-pic: Fix interrupt acknowledgement
irqchip/irq-mvebu-gicp: Allocate enough memory for spi_bitmap
irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
nvme: Allocate queues for all possible CPUs
blk-mq: Create hctx for each present CPU
blk-mq: Include all present CPUs in the default queue mapping
genirq: Avoid unnecessary low level irq function calls
genirq: Set irq masked state when initializing irq_desc
genirq/timings: Add infrastructure for estimating the next interrupt arrival time
genirq/timings: Add infrastructure to track the interrupt timings
genirq/debugfs: Remove pointless NULL pointer check
irqchip/gic-v3-its: Don't assume GICv3 hardware supports 16bit INTID
irqchip/gic-v3-its: Add ACPI NUMA node mapping
irqchip/gic-v3-its-platform-msi: Make of_device_ids const
irqchip/gic-v3-its: Make of_device_ids const
irqchip/irq-mvebu-icu: Add new driver for Marvell ICU
irqchip/irq-mvebu-gicp: Add new driver for Marvell GICP
dt-bindings/interrupt-controller: Add DT binding for the Marvell ICU
genirq/irqdomain: Remove auto-recursive hierarchy support
irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access
...
|
|
If the interrupt destination mode of the APIC is physical then the
effective affinity is restricted to a single CPU.
Mark the interrupt accordingly in the domain allocation code, so the core
code can avoid pointless affinity setting attempts.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The decision to which CPUs an interrupt is effectively routed happens in
the various apic->cpu_mask_to_apicid() implementations.
To support effective affinity masks this information needs to be updated in
irq_data. Add a pointer to irq_data to the callbacks and feed it through
the call chain.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
All implementations of apic->cpu_mask_to_apicid_and() AND the two incoming
cpumasks to search for the target.
Move that AND operation to the call site and rename the callback to
cpu_mask_to_apicid().
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
All implementations of apic->cpu_mask_to_apicid_and() mask out the offline
cpus. The callsite already has a mask available, which has the offline CPUs
removed. Use that and remove the extra bits.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Use the fwnode to create a named domain so diagnosis works.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Add the missing name, so debugging will work properly.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
This function is only called by arch_early_irq_init(), which is an
__init function, so mark the child function __init as well.
In addition mark it inline for the !CONFIG_X86_IO_APIC case.
Signed-off-by: Dou Liyang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch adds the __irq_entry annotation to the default x86
platform IRQ handlers. ftrace's function_graph tracer uses the
__irq_entry annotation to notify the entry and return of IRQ
handlers.
For example, before the patch:
354549.667252 | 3) d..1 | default_idle_call() {
354549.667252 | 3) d..1 | arch_cpu_idle() {
354549.667253 | 3) d..1 | default_idle() {
354549.696886 | 3) d..1 | smp_trace_reschedule_interrupt() {
354549.696886 | 3) d..1 | irq_enter() {
354549.696886 | 3) d..1 | rcu_irq_enter() {
After the patch:
366416.254476 | 3) d..1 | arch_cpu_idle() {
366416.254476 | 3) d..1 | default_idle() {
366416.261566 | 3) d..1 ==========> |
366416.261566 | 3) d..1 | smp_trace_reschedule_interrupt() {
366416.261566 | 3) d..1 | irq_enter() {
366416.261566 | 3) d..1 | rcu_irq_enter() {
KASAN also uses this annotation. The smp_apic_timer_interrupt()
was already annotated.
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Cc: Aaron Lu <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Claudio Fontana <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Dou Liyang <[email protected]>
Cc: Gu Zheng <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/059fdf437c2f0c09b13c18c8fe4e69999d3ffe69.1483528431.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
affinities related to the CPU in question. The same thing is also done when
the system is suspended to S-states like S3 (mem).
For each IRQ we try to complete any on-going move regardless of whether the
IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
its chip_data, assume it is of type struct apic_chip_data and manipulate it
by clearing the old_domain mask etc. irq_chips that are not part of the
x86_vector_domain, like those created by various GPIO drivers, will find
their chip_data being changed unexpectedly.
Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
corrupted after resume:
# cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
gpio-511 ( |sysfs ) in hi
# rtcwake -s10 -mmem
<10 seconds passes>
# cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
gpio-511 ( |sysfs ) in ?
Note '?' in the output. It means the struct gpio_chip ->get function is
NULL whereas before suspend it was there.
Fix this by first checking that the IRQ belongs to x86_vector_domain before
we try to use the chip_data as struct apic_chip_data.
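The shape of the fix, roughly (sketch; the exact placement in the move completion code is simplified):
/* Sketch: only treat chip_data as apic_chip_data for our own domain. */
void irq_force_complete_move(struct irq_desc *desc)
{
    struct apic_chip_data *data;
    struct irq_data *irqd;

    /*
     * Check that the descriptor actually belongs to x86_vector_domain.
     * GPIO and other irq_chips store their own chip_data, which must not
     * be interpreted as struct apic_chip_data.
     */
    irqd = irq_domain_get_irq_data(x86_vector_domain, irq_desc_get_irq(desc));
    if (!irqd)
        return;

    data = irqd->chip_data;
    /* ... proceed with the pending move cleanup using data ... */
}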
Reported-and-tested-by: Sakari Ailus <[email protected]>
Signed-off-by: Mika Westerberg <[email protected]>
Cc: [email protected] # 4.4+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The use of config_enabled() against config options is ambiguous. In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED(). Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
clearer.
This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.
I noticed two cases where config_enabled() is used against a tristate
option:
- config_enabled(CONFIG_HWMON)
[ drivers/net/wireless/ath/ath10k/thermal.c ]
- config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
[ drivers/gpu/drm/gma500/opregion.c ]
I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.
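For a bool option the conversion is mechanical, e.g. (CONFIG_FOO and do_foo() are hypothetical):
/* Before: reads ambiguously, behaves like IS_BUILTIN() */
if (config_enabled(CONFIG_FOO))
    do_foo();

/* After: the intention is explicit for a bool option */
if (IS_ENABLED(CONFIG_FOO))
    do_foo();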
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Stas Sergeev <[email protected]>
Cc: Matt Redfearn <[email protected]>
Cc: Joshua Kinard <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Markos Chandras <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: yu-cheng yu <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Will Drewry <[email protected]>
Cc: Nikolay Martynov <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Leonid Yegoshin <[email protected]>
Cc: Rafal Milecki <[email protected]>
Cc: James Cowgill <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Alex Smith <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Mikko Rapeli <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Brian Norris <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: "Luis R. Rodriguez" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Tony Wu <[email protected]>
Cc: Huaitong Han <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Andrea Gelmini <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Rabin Vincent <[email protected]>
Cc: "Maciej W. Rozycki" <[email protected]>
Cc: David Daney <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
the already allocated vectors. This subsequently calls clear_vector_irq().
The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
clear_vector_irq().
We cannot suppress the call to x86_vector_free_irqs() for the failed
interrupt, because the other data related to this irq must be cleaned up as
well. So calling clear_vector_irq() with vector == 0 is legitimate.
Remove the BUG_ON and return if the vector is zero.
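The shape of the change, roughly (sketch, not the literal diff):
/* Sketch: tolerate a never-assigned vector instead of BUG()ing. */
static void clear_vector_irq(int irq, struct apic_chip_data *data)
{
    int vector = data->cfg.vector;

    /* Previously: BUG_ON(!vector); -- fires in the allocation error path. */
    if (!vector)
        return;

    /* ... normal per-CPU vector teardown ... */
}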
[ tglx: Massaged changelog ]
Fixes: b5dc8e6c21e7 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Signed-off-by: Keith Busch <[email protected]>
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Harry reported, that he's able to trigger a system freeze with cpu hot
unplug. The freeze turned out to be a live lock caused by recent changes in
irq_force_complete_move().
When fixup_irqs() and from there irq_force_complete_move() is called on the
dying cpu, then all other cpus are in stop machine and wait for the dying cpu
to complete the teardown. If there is a move of an interrupt pending then
irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
mask and waits for them to clear the mask. That's obviously impossible as
those cpus are firmly stuck in stop machine with interrupts disabled.
I should have known that, but I completely overlooked it, being concentrated
on the locking issues around the vectors. And the existence of the call to
__irq_complete_move() in the code, which actually sends the cleanup IPI, made
it reasonable to wait for that cleanup to complete. That call was bogus even
before the recent changes as it was just a pointless distraction.
We have to look at two cases:
1) The move_in_progress flag of the interrupt is set
This means the ioapic has been updated with the new vector, but it has not
fired yet. In theory there is a race:
set_ioapic(new_vector) <-- Interrupt is raised before update is effective,
i.e. it's raised on the old vector.
So if the target cpu cannot handle that interrupt before the old vector is
cleaned up, we get a spurious interrupt and in the worst case the ioapic
irq line becomes stale, but my experiments so far have only resulted in
spurious interrupts.
But in case of cpu hotplug this should be a non-issue because if the
affinity update happens right before all cpus rendezvous in stop machine,
there is no way that the interrupt can be blocked on the target cpu because
all cpus loop first with interrupts enabled in stop machine, so the old
vector is not yet cleaned up when the interrupt fires.
So the only way to run into this issue is if the delivery of the interrupt
on the apic/system bus would be delayed beyond the point where the target
cpu disables interrupts in stop machine. I doubt that it can happen, but at
least there is a theoretical chance. Virtualization might be able to
expose this, but AFAICT the IOAPIC emulation is not as stupid as the real
hardware.
I've spent quite some time over the weekend to enforce that situation,
though I was not able to trigger the delayed case.
2) The move_in_progress flag is not set and the old_domain cpu mask is not
empty.
That means that an interrupt was delivered after the change and the
cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
responded to it yet.
In both cases we can assume that the next interrupt will arrive on the new
vector, so we can cleanup the old vectors on the cpus in the old_domain cpu
mask.
Fixes: 98229aa36caa "x86/irq: Plug vector cleanup race"
Reported-by: Harry Junior <[email protected]>
Tested-by: Tony Luck <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Joe Lawrence <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603140931430.3657@nanos
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
We still can end up with a stale vector due to the following:
CPU0                          CPU1                      CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
                              set_affinity()
                              assign_irq_vector()
                              lock_vector()             handle_IPI
                              move_in_progress = 1      lock_vector()
                              unlock_vector()
                                                        move_in_progress == 1
So we need to serialize the vector assignment against a pending cleanup. The
solution is rather simple now. We not only check for the move_in_progress flag
in assign_irq_vector(), we also check whether there is still a cleanup pending
in the old_domain cpumask. If so, we return -EBUSY to the caller and let him
deal with it. Though we have to be careful in the cpu unplug case. If the
cleanout has not yet completed then the following setaffinity() call would
return -EBUSY. Add code which prevents this.
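In assign_irq_vector() terms the check looks roughly like this (sketch):
/* Sketch: refuse a new assignment while a cleanup is still pending. */
static int __assign_irq_vector(int irq, struct apic_chip_data *d,
                               const struct cpumask *mask)
{
    /*
     * A move is in progress or the cleanup IPI has not been processed by
     * all CPUs in old_domain yet: let the caller retry later.
     */
    if (d->move_in_progress || !cpumask_empty(d->old_domain))
        return -EBUSY;

    /* ... regular vector search and assignment ... */
    return 0;
}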
Full context is here: http://lkml.kernel.org/r/[email protected]
Reported-and-tested-by: Joe Lawrence <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jeremiah Mahler <[email protected]>
Cc: [email protected]
Cc: Guenter Roeck <[email protected]>
Cc: [email protected] #4.3+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
First of all there is no point in looking up the irq descriptor again, but we
also need the descriptor for the final cleanup race fix in the next
patch. Make that change separate. No functional difference.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Tested-by: Joe Lawrence <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jeremiah Mahler <[email protected]>
Cc: [email protected]
Cc: Guenter Roeck <[email protected]>
Cc: [email protected] #4.3+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
We want to synchronize new vector assignments with a pending cleanup. Remove a
dying cpu from a pending cleanup mask.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Tested-by: Joe Lawrence <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jeremiah Mahler <[email protected]>
Cc: [email protected]
Cc: Guenter Roeck <[email protected]>
Cc: [email protected] #4.3+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
There is no need to allocate a new cpumask for sending the cleanup vector. The
old_domain mask is now protected by the vector_lock, so we can safely remove
the offline cpus from it and send the IPI with the resulting mask.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Tested-by: Joe Lawrence <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jeremiah Mahler <[email protected]>
Cc: [email protected]
Cc: Guenter Roeck <[email protected]>
Cc: [email protected] #4.3+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
send_cleanup_vector() fiddles with the old_domain mask unprotected because it
relies on the protection by the move_in_progress flag. But this is fatal, as
the flag is reset after the IPI has been sent. So a cpu which receives the IPI
can still see the flag set and therefore ignores the cleanup request. If no
other cleanup request happens then the vector stays stale on that cpu and in
case of an irq removal the vector still persists. That can lead to use after
free when the next cleanup IPI happens.
Protect the code with vector_lock and clear move_in_progress before sending
the IPI.
This does not plug the race which Joe reported because:
CPU0                          CPU1                      CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
                              set_affinity()
                              assign_irq_vector()
                              lock_vector()             handle_IPI
                              move_in_progress = 1      lock_vector()
                              unlock_vector()
                                                        move_in_progress == 1
The full fix comes with a later patch.
Reported-and-tested-by: Joe Lawrence <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jeremiah Mahler <[email protected]>
Cc: [email protected]
Cc: Guenter Roeck <[email protected]>
Cc: [email protected] #4.3+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
No point of keeping offline cpus in the cleanup mask.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Tested-by: Joe Lawrence <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jeremiah Mahler <[email protected]>
Cc: [email protected]
Cc: Guenter Roeck <[email protected]>
Cc: [email protected] #4.3+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Reusing an existing vector and assigning a new vector have duplicated
code. Consolidate it.
This is also a preparatory patch for finally plugging the cleanup race.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Tested-by: Joe Lawrence <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jeremiah Mahler <[email protected]>
Cc: [email protected]
Cc: Guenter Roeck <[email protected]>
Cc: [email protected] #4.3+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
In the case that the new vector mask is a subset of the existing mask there is
no point in doing an AND operation of currentmask & newmask. The result is
newmask. So we can simply copy the new mask to the current mask and be done
with it. Preparatory patch for further consolidation.
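I.e., roughly (hypothetical helper, illustrative only):
/* Sketch: update the domain mask when the new mask is a subset. */
static void update_domain_mask(struct apic_chip_data *d,
                               const struct cpumask *newmask)
{
    if (cpumask_subset(newmask, d->domain)) {
        /* currentmask & newmask == newmask, a plain copy suffices. */
        cpumask_copy(d->domain, newmask);
    } else {
        cpumask_and(d->domain, d->domain, newmask);
    }
}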
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov <[email protected]>
Tested-by: Joe Lawrence <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jeremiah Mahler <[email protected]>
Cc: [email protected]
Cc: Guenter Roeck <[email protected]>
Cc: [email protected] #4.3+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|