Age | Commit message (Collapse) | Author | Files | Lines |
|
exceptions simultaneously
qemu-system-x86-8600 [004] d..1 7205.687530: kvm_entry: vcpu 2
qemu-system-x86-8600 [004] .... 7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
qemu-system-x86-8600 [004] .... 7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
qemu-system-x86-8600 [004] .... 7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
qemu-system-x86-8600 [004] .N.. 7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
kworker/4:2-7814 [004] .... 7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
qemu-system-x86-8600 [004] .... 7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
qemu-system-x86-8600 [004] d..1 7205.687711: kvm_entry: vcpu 2
After running some memory intensive workload in guest, I catch the kworker
which completes the GUP too quickly, and queues an "Page Ready" #PF exception
after the "Page not Present" exception before the next vmentry as the above
trace which will result in #DF injected to guest.
This patch fixes it by clearing the queue for "Page not Present" if "Page Ready"
occurs before the next vmentry since the GUP has already got the required page
and shadow page table has already been fixed by "Page Ready" handler.
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Fixes: 7c90705bf2a3 ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
[Changed indentation and added clearing of injected. - Radim]
Signed-off-by: Radim Krčmář <[email protected]>
|
|
This patch adds initial support for the STM32F7 I2C controller.
Signed-off-by: M'boumba Cedric Madianga <[email protected]>
Signed-off-by: Pierre-Yves MORDRET <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
This patch uses a more generic definition of speed enum for i2c-stm32f4
driver.
Signed-off-by: M'boumba Cedric Madianga <[email protected]>
Signed-off-by: Pierre-Yves MORDRET <[email protected]>
Reviewed-by: Ludovic BARRE <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
This patch adds the documentation of device tree bindings for STM32F7 I2C
Signed-off-by: M'boumba Cedric Madianga <[email protected]>
Signed-off-by: Pierre-Yves MORDRET <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
Bug fixes for stable.
|
|
Don't block vCPU if there is pending exception.
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
SVM AVIC hardware accelerates guest write to APIC_EOI register
(for edge-trigger interrupt), which means it does not trap to KVM.
So, only enable SVM AVIC only in split irqchip mode.
(e.g. launching qemu w/ option '-machine kernel_irqchip=split').
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Fixes: 44a95dae1d22 ("KVM: x86: Detect and Initialize AVIC support")
[Removed pr_debug - Radim.]
Signed-off-by: Radim Krčmář <[email protected]>
|
|
... and __initconst if applicable.
Based on similar work for an older kernel in the Grsecurity patch.
[JD: fix toshiba-wmi build]
[JD: add htcpen]
[JD: move __initconst where checkscript wants it]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>
|
|
All watchdog thread related functions are delegated to the smpboot thread
infrastructure, which handles serialization against CPU hotplug correctly.
The sysctl interface is completely decoupled from anything which requires
CPU hotplug protection.
No need to protect the sysctl writes against cpu hotplug anymore. Remove it
and add the now required protection to the powerpc arch_nmi_watchdog
implementation.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that all functionality is properly serialized against CPU hotplug,
remove the extra per cpu storage which holds the disabled events for
cleanup. The core makes sure that cleanup happens before new events are
created.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Get rid of the hodgepodge which tries to be smart about perf being
unavailable and error printout rate limiting.
That's all not required simply because this is never invoked when the perf
NMI watchdog is not functional.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
watchdog_nmi_enable() is an unparseable mess, Provide a clean perf specific
implementation, which will be used when the existing setup/teardown mess is
replaced.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use the init time detection of the perf NMI watchdog to determine whether
the perf NMI watchdog is functional. If not disable it permanentely. It
won't come back magically at runtime.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The watchdog tries to create perf events even after it figured out that
perf is not functional or the requested event is not supported.
That's braindead as this can be done once at init time and if not supported
the NMI watchdog can be turned off unconditonally.
Implement the perf hardlockup detector functionality for that. This creates
a new event create function, which will replace the unholy mess of the
existing one in later patches.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Letting user space poke directly at variables which are used at run time is
stupid and causes a lot of race conditions and other issues.
Seperate the user variables and on change invoke the reconfiguration, which
then stops the watchdogs, reevaluates the new user value and restarts the
watchdogs with the new parameters.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
need to be done in two steps.
1) Stop all NMIs
2) Read the new parameters and start NMIs
Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
clean reconfiguration add a 'run' argument and split the functionality in
powerpc.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Reflect that these variables are user interface related and remove the
whitespace damage in the sysctl table while at it.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The sysctl of the nmi_watchdog file prevents writes by setting:
min = max = 0
if none of the users is enabled. That involves ifdeffery and is competely
non obvious.
If none of the facilities is enabeld, then the file can simply be made read
only. Move the ifdeffery into the header and use a constant for file
permissions.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Having the same #ifdef in various places does not make it more
readable. Collect stuff into one place.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use a single function to update sysctl changes. This is not a high
frequency user space interface and it's root only.
Preparatory patch to cleanup the sysctl variable handling.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when its enabled.
That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force parked threads in place even across cpu hotplug.
Aside of that the code implements the park/unpark facility of smp hotplug
threads on its own, which is even more pointless. We have functionality in
the smpboot thread code to do so.
Use the new thread management functions and get rid of the unholy mess.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when its enabled.
That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force parked threads in place even across cpu hotplug.
Another horrible mechanism are the open coded park/unpark loops which are
used for reconfiguration of the watchdog. The smpboot infrastructure allows
exactly the same via smpboot_update_cpumask_thread_percpu(), which is cpu
hotplug safe. Using that instead of the open coded loops allows to get rid
of the hotplug locking mess in the watchdog code.
Implement a clean infrastructure which allows to replace the open coded
nonsense.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
smpboot_update_cpumask_threads_percpu() allocates a temporary cpumask at
runtime. This is suboptimal because the call site needs more code size for
proper error handling than a statically allocated temporary mask requires
data size.
Add static temporary cpumask. The function is globaly serialized, so no
further protection required.
Remove the half baken error handling in the watchdog code and get rid of
the export as there are no in tree modular users of that function.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Split the write part of the cpumask proc handler out into a separate helper
to avoid deep indentation. This also reduces the patch complexity in the
following cleanups.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The #ifdef maze in this file is horrible, group stuff at least a bit so one
can figure out what belongs to what.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Having stub functions which take a full page is not helping the
readablility of code.
Condense them and move the doubled #ifdef variant into the SYSFS section.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit:
b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded system")
tries to fix the following issue:
proc_write()
set_sample_period() <--- New sample period becoms visible
<----- Broken starts
proc_watchdog_update()
watchdog_enable_all_cpus() watchdog_hrtimer_fn()
update_watchdog_all_cpus() restart_timer(sample_period)
watchdog_park_threads()
thread->park()
disable_nmi()
<----- Broken ends
The reason why this is broken is that the update of the watchdog threshold
becomes immediately effective and visible for the hrtimer function which
uses that value to rearm the timer. But the NMI/perf side still uses the
old value up to the point where it is disabled. If the rate has been
lowered then the NMI can run fast enough to 'detect' a hard lockup because
the timer has not fired due to the longer period.
The patch 'fixed' this by adding a variable:
proc_write()
set_sample_period()
<----- Broken starts
proc_watchdog_update()
watchdog_enable_all_cpus() watchdog_hrtimer_fn()
update_watchdog_all_cpus() restart_timer(sample_period)
watchdog_park_threads()
park_in_progress = 1
<----- Broken ends
nmi_watchdog()
if (park_in_progress)
return;
The only effect of this variable was to make the window where the breakage
can hit small enough that it was not longer observable in testing. From a
correctness point of view it is a pointless bandaid which merily papers
over the root cause: the unsychronized update of the variable.
Looking deeper into the related code pathes unearthed similar problems in
the watchdog_start()/stop() functions.
watchdog_start()
perf_nmi_event_start()
hrtimer_start()
watchdog_stop()
hrtimer_cancel()
perf_nmi_event_stop()
In both cases the call order is wrong because if the tasks gets preempted
or the VM gets scheduled out long enough after the first call, then there is
a chance that the next NMI will see a stale hrtimer interrupt count and
trigger a false positive hard lockup splat.
Get rid of park_in_progress so the code can be gradually deobfuscated and
pruned from several layers of duct tape papering over the root cause,
which has been either ignored or not understood at all.
Once this is removed the underlying problem will be fixed by rewriting the
proc interface to do a proper synchronized update.
Address the start/stop() ordering problem as well by reverting the call
order, so this part is at least correct now.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The following deadlock is possible in the watchdog hotplug code:
cpus_write_lock()
...
takedown_cpu()
smpboot_park_threads()
smpboot_park_thread()
kthread_park()
->park() := watchdog_disable()
watchdog_nmi_disable()
perf_event_release_kernel();
put_event()
_free_event()
->destroy() := hw_perf_event_destroy()
x86_release_hardware()
release_ds_buffers()
get_online_cpus()
when a per cpu watchdog perf event is destroyed which drops the last
reference to the PMU hardware. The cleanup code there invokes
get_online_cpus() which instantly deadlocks because the hotplug percpu
rwsem is write locked.
To solve this add a deferring mechanism:
cpus_write_lock()
kthread_park()
watchdog_nmi_disable(deferred)
perf_event_disable(event);
move_event_to_deferred(event);
....
cpus_write_unlock()
cleaup_deferred_events()
perf_event_release_kernel()
This is still properly serialized against concurrent hotplug via the
cpu_add_remove_lock, which is held by the task which initiated the hotplug
event.
This is also used to handle event destruction when the watchdog threads are
parked via other mechanisms than CPU hotplug.
Analyzed-by: Peter Zijlstra <[email protected]>
Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The self disabling feature is broken vs. CPU hotplug locking:
CPU 0 CPU 1
cpus_write_lock();
cpu_up(1)
wait_for_completion()
....
unpark_watchdog()
->unpark()
perf_event_create() <- fails
watchdog_enable &= ~NMI_WATCHDOG;
....
cpus_write_unlock();
CPU 2
cpus_write_lock()
cpu_down(2)
wait_for_completion()
wakeup(watchdog);
watchdog()
if (!(watchdog_enable & NMI_WATCHDOG))
watchdog_nmi_disable()
perf_event_disable()
....
cpus_read_lock();
stop_smpboot_threads()
park_watchdog();
wait_for_completion(watchdog->parked);
Result: End of hotplug and instantaneous full lockup of the machine.
There is a similar problem with disabling the watchdog via the user space
interface as the sysctl function fiddles with watchdog_enable directly.
It's very debatable whether this is required at all. If the watchdog works
nicely on N CPUs and it fails to enable on the N + 1 CPU either during
hotplug or because the user space interface disabled it via sysctl cpumask
and then some perf user grabbed the counter which is then unavailable for
the watchdog when the sysctl cpumask gets changed back.
There is no real justification for this.
One of the reasons WHY this is done is the utter stupidity of the init code
of the perf NMI watchdog. Instead of checking upfront at boot whether PERF
is available and functional at all, it just does this check at run time
over and over when user space fiddles with the sysctl. That's broken beyond
repair along with the idiotic error code dependent warn level printks and
the even more silly printk rate limiting.
If the init code checks whether perf works at boot time, then this mess can
be more or less avoided completely. Perf does not come magically into life
at runtime. Brain usage while coding is overrated.
Remove the cruft and add a temporary safe guard which gets removed later.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The function is only used by the KVM init code. Mark it __init to prevent
creative abuse.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Following patches will use the mutex for other purposes as well. Rename it
as it is not longer a proc specific thing.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The watchdog proc interface causes extensive recursive locking of the CPU
hotplug percpu rwsem, which is deadlock prone.
Replace the get/put_online_cpus() pairs with cpu_hotplug_disable()/enable()
calls for now. Later patches will remove that requirement completely.
Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This interface has several issues:
- It's causing recursive locking of the hotplug lock.
- It's complete overkill to teardown all threads and then recreate them
The same can be achieved with the simple hardlockup_detector_perf_stop /
restart() interfaces. The abuse from the busy looping poweroff() loop of
PARISC has been solved as well.
Remove the cruft.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The broken lockup_detector_suspend/resume() interface is going away. Use
the new lockup_detector_soft_poweroff() interface to stop the watchdog from
the busy looping power off routine.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
PARISC has a a busy looping power off routine. If the watchdog is enabled
the watchdog timer will still fire, but the thread is not running, which
causes the softlockup watchdog to trigger.
Provide a interface which allows to turn the watchdog off.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The lockup_detector_suspend/resume() interface is broken in several ways
especially as it results in recursive locking of the CPU hotplug lock.
Use the new stop/restart interface in the perf NMI watchdog to temporarily
disable and reenable the already active watchdog events. That's enough to
handle it.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Provide an interface to stop and restart perf NMI watchdog events on all
CPUs. This is only usable during init and especially for handling the perf
HT bug on Intel machines. It's safe to use it this way as nothing can
start/stop the NMI watchdog in parallel.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The page_owner stacktrace always begin as follows:
[<ffffff987bfd48f4>] save_stack+0x40/0xc8
[<ffffff987bfd4da8>] __set_page_owner+0x3c/0x6c
These two entries do not provide any useful information and limits the
available stacktrace depth. The page_owner stacktrace was skipping
caller function from stack entries but this was missed with commit
f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace")
Example page_owner entry after the patch:
Page allocated via order 0, mask 0x8(ffffff80085fb714)
PFN 654411 type Movable Block 639 type CMA Flags 0x0(ffffffbe5c7f12c0)
[<ffffff9b64989c14>] post_alloc_hook+0x70/0x80
...
[<ffffff9b651216e8>] msm_comm_try_state+0x5f8/0x14f4
[<ffffff9b6512486c>] msm_vidc_open+0x5e4/0x7d0
[<ffffff9b65113674>] msm_v4l2_open+0xa8/0x224
Link: http://lkml.kernel.org/r/[email protected]
Fixes: f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace")
Signed-off-by: Prakash Gupta <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Russell King <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The stacktraces always begin as follows:
[<c00117b4>] save_stack_trace_tsk+0x0/0x98
[<c0011870>] save_stack_trace+0x24/0x28
...
This is because the stack trace code includes the stack frames for
itself. This is incorrect behaviour, and also leads to "skip" doing the
wrong thing (which is the number of stack frames to avoid recording.)
Perversely, it does the right thing when passed a non-current thread.
Fix this by ensuring that we have a known constant number of frames
above the main stack trace function, and always skip these.
This was fixed for arch arm by commit 3683f44c42e9 ("ARM: stacktrace:
avoid listing stacktrace functions in stacktrace")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Prakash Gupta <[email protected]>
Cc: Russell King <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.
The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.
I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.
I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.
I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.
This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.
[1] http://lkml.kernel.org/r/[email protected]
[[email protected]: fix typo]
[[email protected]: coding-style fixes]
[[email protected]: drm/i915: fix up]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
gcc-7 points out that a negative port_num value would overflow the
string buffer:
drivers/infiniband/hw/mlx4/sysfs.c: In function 'mlx4_ib_device_register_sysfs':
drivers/infiniband/hw/mlx4/sysfs.c:251:16: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
drivers/infiniband/hw/mlx4/sysfs.c:251:2: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10
drivers/infiniband/hw/mlx4/sysfs.c:303:17: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
drivers/infiniband/hw/mlx4/sysfs.c:303:3: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10
While we should be able to assume that port_num is positive here, making
the buffer one byte longer has no downsides and avoids the warning.
Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
gcc points out a minor bug in the handling of unknown cookie types,
which could result in a string overflow when the integer is copied into
a 3-byte string:
fs/fscache/object-list.c: In function 'fscache_objlist_show':
fs/fscache/object-list.c:265:19: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
sprintf(_type, "%02u", cookie->def->type);
^~~~~~
fs/fscache/object-list.c:265:4: note: 'sprintf' output between 3 and 4 bytes into a destination of size 3
This is currently harmless as no code sets a type other than 0 or 1, but
it makes sense to use snprintf() here to avoid overflowing the array if
that changes.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With gcc 4.1.2:
lib/test_bitmap.c:189: warning: integer constant is too large for `long' type
lib/test_bitmap.c:190: warning: integer constant is too large for `long' type
lib/test_bitmap.c:194: warning: integer constant is too large for `long' type
lib/test_bitmap.c:195: warning: integer constant is too large for `long' type
Add the missing "ULL" suffix to fix this.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 60ef690018b262dd ("bitmap: introduce BITMAP_FROM_U64()")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Yury Norov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In NOMMU configurations, we get a warning about a variable that has become
unused:
fs/proc/task_nommu.c: In function 'nommu_vma_show':
fs/proc/task_nommu.c:148:28: error: unused variable 'priv' [-Werror=unused-variable]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 1240ea0dc3bb ("fs, proc: remove priv argument from is_stack")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
gcc-4.4.4 has issues with initialization of anonymous unions:
drivers/media/cec/cec-adap.c: In function 'cec_queue_msg_fh':
drivers/media/cec/cec-adap.c:184: error: unknown field 'lost_msgs' specified in initializer
work around this.
Fixes: 6b2bbb08747a5 ("media: cec: rework the cec event handling")
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
IDR only supports non-negative IDs. There used to be a 'WARN_ON_ONCE(id <
0)' in idr_replace(), but it was intentionally removed by commit
2e1c9b286765 ("idr: remove WARN_ON_ONCE() on negative IDs").
Then it was added back by commit 0a835c4f090a ("Reimplement IDR and IDA
using the radix tree"). However it seems that adding it back was a
mistake, given that some users such as drm_gem_handle_delete()
(DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
allowing the WARN_ON_ONCE to be triggered. drm_gem_handle_delete()
actually just wants idr_replace() to return an error code if the ID is
not allocated, including in the case where the ID is invalid (negative).
So once again remove the bogus WARN_ON_ONCE().
This bug was found by syzkaller, which encountered the following
warning:
WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
Kernel panic - not syncing: panic_on_warn set ...
CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
entry_SYSCALL_64_fastpath+0x1f/0xbe
Here is a C reproducer:
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>
int main(void)
{
int cardfd = open("/dev/dri/card0", O_RDONLY);
ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
&(struct drm_gem_close) { .handle = -1 } );
}
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Signed-off-by: Eric Biggers <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]> [v4.11+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This code causes a static checker warning because Smatch doesn't trust
anything that comes from skb->data. I've reviewed this code and I do
think skb->data can be controlled by the user here.
The sctp_event_subscribe struct has 13 __u8 fields and we want to see
if ours is non-zero. sn_type can be any value in the 0-USHRT_MAX range.
We're subtracting SCTP_SN_TYPE_BASE which is 1 << 15 so we could read
either before the start of the struct or after the end.
This is a very old bug and it's surprising that it would go undetected
for so long but my theory is that it just doesn't have a big impact so
it would be hard to notice.
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add driver support for the Altera I2C Controller. The I2C
controller is soft IP for use in FPGAs.
Signed-off-by: Thor Thayer <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
Add the documentation to support the Altera synthesizable
logic I2C Controller in FPGA.
Signed-off-by: Thor Thayer <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
When adding myself as a reviewer for the Renesas Ethernet drivers
I somehow forgot about the bindings -- I want to review them as well.
Fixes: 8e6569af3a1b ("MAINTAINERS: add myself as Renesas Ethernet drivers reviewer")
Signed-off-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|