aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2017-06-22genirq/cpuhotplug: Use effective affinity maskThomas Gleixner1-3/+11
If the architecture supports the effective affinity mask, migrating interrupts away which are not targeted by the effective mask is pointless. They can stay in the user or system supplied affinity mask, but won't be targetted at any given point as the affinity setter functions need to validate against the online cpu mask anyway. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Introduce effective affinity maskThomas Gleixner4-7/+105
There is currently no way to evaluate the effective affinity mask of a given interrupt. Many irq chips allow only a single target CPU or a subset of CPUs in the affinity mask. Updating the mask at the time of setting the affinity to the subset would be counterproductive because information for cpu hotplug about assigned interrupt affinities gets lost. On CPU hotplug it's also pointless to force migrate an interrupt, which is not targeted at the CPU effectively. But currently the information is not available. Provide a seperate mask to be updated by the irq_chip->irq_set_affinity() implementations. Implement the read only proc files so the user can see the effective mask as well w/o trying to deduce it from /proc/interrupts. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/proc: Replace ever repeating type castThomas Gleixner1-5/+5
The proc file setup repeats the same ugly type cast for the irq number over and over. Do it once and hand in the local void pointer. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Remove pointless gfp argumentThomas Gleixner1-8/+7
All callers hand in GPF_KERNEL. No point to have an extra argument for that. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Remove pointless arg from show_irq_affinityThomas Gleixner1-3/+3
The third argument of the internal helper function is unused. Remove it. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Move irq_fixup_move_pending() to coreThomas Gleixner1-0/+5
Now that x86 uses the generic code, the function declaration and inline stub can move to the core internal header. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Set force affinity flag on hotplug migrationThomas Gleixner1-1/+1
Set the force migration flag when migrating interrupts away from an outgoing CPU. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Add support for conditional maskingThomas Gleixner1-1/+10
Interrupts which cannot be migrated in process context, need to be masked before the affinity is changed forcefully. Add support for that. Will be compiled out for architectures which do not have this x86 specific issue. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Add support for cleaning up move in progressThomas Gleixner2-3/+35
In order to move x86 to the generic hotplug migration code, add support for cleaning up move in progress bits. On architectures which have this x86 specific (mis)feature not enabled, this is optimized out by the compiler. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Do not migrated shutdown irqsThomas Gleixner1-3/+8
Interrupts, which are shut down are tried to be migrated as well. That's pointless because the interrupt cannot fire and the next startup will move it to the proper place anyway. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Reorder check logicThomas Gleixner1-16/+20
Move the checks for a valid irq chip and the irq_set_affinity() callback right in front of the whole migration logic. No point in doing a gazillion of other things when the interrupt cannot be migrated at all. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Dont claim success on errorThomas Gleixner1-1/+4
In case the affinity of an interrupt was broken, a printk is emitted. But if the affinity cannot be set at all due to a missing irq_set_affinity() callback or due to a failing callback, the message is still printed preceeded by a warning/error. That makes no sense whatsoever. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Remove irq disabling logicThomas Gleixner1-8/+4
This is called from stop_machine() with interrupts disabled. No point in disabling them some more. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Move pending helpers to internal.hChristoph Hellwig2-28/+38
So that the affinity code can reuse them. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Move initial affinity setup to irq_startup()Thomas Gleixner2-9/+8
The startup vs. setaffinity ordering of interrupts depends on the IRQF_NOAUTOEN flag. Chained interrupts are not getting any affinity assignment at all. A regular interrupt is started up and then the affinity is set. A IRQF_NOAUTOEN marked interrupt is not started up, but the affinity is set nevertheless. Move the affinity setup to startup_irq() so the ordering is always the same and chained interrupts get the proper default affinity assigned as well. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Rename setup_affinity() to irq_setup_affinity()Thomas Gleixner2-6/+7
Rename it with a proper irq_ prefix and make it available for other files in the core code. Preparatory patch for moving the irq affinity setup around. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Remove mask argument from setup_affinity()Thomas Gleixner3-34/+29
No point to have this alloc/free dance of cpumasks. Provide a static mask for setup_affinity() and protect it proper. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Provide irq_fixup_move_pending()Thomas Gleixner1-0/+30
If an CPU goes offline, the interrupts are migrated away, but a eventually pending interrupt move, which has not yet been made effective is kept pending even if the outgoing CPU is the sole target of the pending affinity mask. What's worse is, that the pending affinity mask is discarded even if it would contain a valid subset of the online CPUs. Implement a helper function which allows to avoid these issues. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/debugfs: Add proper debugfs interfaceThomas Gleixner7-1/+337
Debugging (hierarchical) interupt domains is tedious as there is no information about the hierarchy and no information about states of interrupts in the various domain levels. Add a debugfs directory 'irq' and subdirectories 'domains' and 'irqs'. The domains directory contains the domain files. The content is information about the domain. If the domain is part of a hierarchy then the parent domains are printed as well. # ls /sys/kernel/debug/irq/domains/ default INTEL-IR-2 INTEL-IR-MSI-2 IO-APIC-IR-2 PCI-MSI DMAR-MSI INTEL-IR-3 INTEL-IR-MSI-3 IO-APIC-IR-3 unknown-1 INTEL-IR-0 INTEL-IR-MSI-0 IO-APIC-IR-0 IO-APIC-IR-4 VECTOR INTEL-IR-1 INTEL-IR-MSI-1 IO-APIC-IR-1 PCI-HT # cat /sys/kernel/debug/irq/domains/VECTOR name: VECTOR size: 0 mapped: 216 flags: 0x00000041 # cat /sys/kernel/debug/irq/domains/IO-APIC-IR-0 name: IO-APIC-IR-0 size: 24 mapped: 19 flags: 0x00000041 parent: INTEL-IR-3 name: INTEL-IR-3 size: 65536 mapped: 167 flags: 0x00000041 parent: VECTOR name: VECTOR size: 0 mapped: 216 flags: 0x00000041 Unfortunately there is no per cpu information about the VECTOR domain (yet). The irqs directory contains detailed information about mapped interrupts. # cat /sys/kernel/debug/irq/irqs/3 handler: handle_edge_irq status: 0x00004000 istate: 0x00000000 ddepth: 1 wdepth: 0 dstate: 0x01018000 IRQD_IRQ_DISABLED IRQD_SINGLE_TARGET IRQD_MOVE_PCNTXT node: 0 affinity: 0-143 effectiv: 0 pending: domain: IO-APIC-IR-0 hwirq: 0x3 chip: IR-IO-APIC flags: 0x10 IRQCHIP_SKIP_SET_WAKE parent: domain: INTEL-IR-3 hwirq: 0x20000 chip: INTEL-IR flags: 0x0 parent: domain: VECTOR hwirq: 0x3 chip: APIC flags: 0x0 This was developed to simplify the debugging of the managed affinity changes. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Marc Zyngier <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-06-22genirq/irqdomain: Add map counterThomas Gleixner1-0/+4
Add a map counter instead of counting radix tree entries for diagnosis. That also gives correct information for linear domains. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Marc Zyngier <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Allow fwnode to carry name information onlyThomas Gleixner1-13/+92
In order to provide proper debug interface it's required to have domain names available when the domain is added. Non fwnode based architectures like x86 have no way to do so. It's not possible to use domain ops or host data for this as domain ops might be the same for several instances, but the names have to be unique. Extend the irqchip fwnode to allow transporting the domain name. If no node is supplied, create a 'unknown-N' placeholder. Warn if an invalid node is supplied and treat it like no node. This happens e.g. with i2 devices on x86 which hand in an ACPI type node which has no interface for retrieving the name. [ Folded a fix from Marc to make DT name parsing work ] Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Marc Zyngier <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/msi: Prevent overwriting domain nameThomas Gleixner1-1/+2
Prevent overwriting an already assigned domain name. Remove the extra check for chip->name, because if domain->name is NULL overwriting it with NULL is not a problem. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Marc Zyngier <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-21irq/generic-chip: Provide devm_irq_setup_generic_chip()Bartosz Golaszewski1-0/+52
Provide a resource managed variant of irq_setup_generic_chip(). Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Marc Zyngier <[email protected]> Cc: [email protected] Cc: Jonathan Corbet <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-21irq/generic-chip: Provide devm_irq_alloc_generic_chip()Bartosz Golaszewski1-0/+34
Provide a resource managed variant of irq_alloc_generic_chip(). Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Marc Zyngier <[email protected]> Cc: [email protected] Cc: Jonathan Corbet <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-21irq/generic-chip: Export irq_init_generic_chip() locallyBartosz Golaszewski2-4/+14
This function will be used in the devres variant of irq_alloc_generic_chip(). Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Marc Zyngier <[email protected]> Cc: [email protected] Cc: Jonathan Corbet <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-20Merge branch 'linus' into irq/coreThomas Gleixner29-154/+175
Get upstream changes so pending patches won't conflict.
2017-06-18Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds3-6/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Three fixlets for timers: - Two hot-fixes for the alarmtimer based posix timers, which prevent a nasty DOS by self rescheduling timers. The proper cleanup of that mess is queued for 4.13 - Make a function static" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/broadcast: Make tick_broadcast_setup_oneshot() static alarmtimer: Rate limit periodic intervals alarmtimer: Prevent overflow of relative timers
2017-06-18Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "Two small fixes for the schedulre core: - Use the proper switch_mm() variant in idle_task_exit() because that code is not called with interrupts disabled. - Fix a confusing typo in a printk" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off() sched/fair: Fix typo in printk message
2017-06-18Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "Add a missing resource release to an error path" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Release resources in __setup_irq() error path
2017-06-15Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-devfreq'Rafael J. Wysocki136-3847/+8988
* pm-cpufreq: cpufreq: conservative: Allow down_threshold to take values from 1 to 10 Revert "cpufreq: schedutil: Reduce frequencies slower" * pm-cpuidle: cpuidle: dt: Add missing 'of_node_put()' * pm-devfreq: PM / devfreq: exynos-ppmu: Staticize event list PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
2017-06-13genirq: Release resources in __setup_irq() error pathHeiner Kallweit1-1/+3
In case __irq_set_trigger() fails the resources requested via irq_request_resources() are not released. Add the missing release call into the error handling path. Fixes: c1bacbae8192 ("genirq: Provide irq_request/release_resources chip callbacks") Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2017-06-12tick/broadcast: Make tick_broadcast_setup_oneshot() staticStephen Boyd2-3/+3
This function isn't used outside of tick-broadcast.c, so let's mark it static. Signed-off-by: Stephen Boyd <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-06-12Revert "cpufreq: schedutil: Reduce frequencies slower"Rafael J. Wysocki1-3/+0
Revert commit 39b64aa1c007 (cpufreq: schedutil: Reduce frequencies slower) that introduced unintentional changes in behavior leading to adverse effects on some systems. Reported-by: Viresh Kumar <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2017-06-11sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()Andy Lutomirski1-1/+1
idle_task_exit() can be called with IRQs on x86 on and therefore should use switch_mm(), not switch_mm_irqs_off(). This doesn't seem to cause any problems right now, but it will confuse my upcoming TLB flush changes. Nonetheless, I think it should be backported because it's trivial. There won't be any meaningful performance impact because idle_task_exit() is only used when offlining a CPU. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler") Link: http://lkml.kernel.org/r/ca3d1a9fa93a0b49f5a8ff729eda3640fb6abdf9.1497034141.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2017-06-11sched/fair: Fix typo in printk messageMarcin Nowakowski1-1/+1
'schedstats' kernel parameter should be set to enable/disable, so correct the printk hint saying that it should be set to 'enable' rather than 'enabled' to enable scheduler tracepoints. Signed-off-by: Marcin Nowakowski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-10Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Ingo Molnar: "An error handling corner case fix" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Drop the device lock on error
2017-06-10Merge branch 'rcu-urgent-for-linus' of ↵Linus Torvalds3-9/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fixes from Ingo Molnar: "Fix an SRCU bug affecting KVM IRQ injection" * 'rcu-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: srcu: Allow use of Classic SRCU from both process and interrupt context srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context
2017-06-10Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-0/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This is mostly tooling fixes, plus an instruction pointer filtering fix. It's more fixes than usual - Arnaldo got back from a longer vacation and there was a backlog" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) perf symbols: Kill dso__build_id_is_kmod() perf symbols: Keep DSO->symtab_type after decompress perf tests: Decompress kernel module before objdump perf tools: Consolidate error path in __open_dso() perf tools: Decompress kernel module when reading DSO data perf annotate: Use dso__decompress_kmodule_path() perf tools: Introduce dso__decompress_kmodule_{fd,path} perf tools: Fix a memory leak in __open_dso() perf annotate: Fix symbolic link of build-id cache perf/core: Drop kernel samples even though :u is specified perf script python: Remove dups in documentation examples perf script python: Updated trace_unhandled() signature perf script python: Fix wrong code snippets in documentation perf script: Fix documentation errors perf script: Fix outdated comment for perf-trace-python perf probe: Fix examples section of documentation perf report: Ensure the perf DSO mapping matches what libdw sees perf report: Include partial stacks unwound with libdw perf annotate: Add missing powerpc triplet perf test: Disable breakpoint signal tests for powerpc ...
2017-06-09Merge branch 'rcu/urgent' of ↵Ingo Molnar3-9/+8
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into rcu/urgent Pull RCU fix from Paul E. McKenney: " This series enables srcu_read_lock() and srcu_read_unlock() to be used from interrupt handlers, which fixes a bug in KVM's use of SRCU in delivery of interrupts to guest OSes. " Signed-off-by: Ingo Molnar <[email protected]>
2017-06-08Merge tag 'pm-4.12-rc5' of ↵Linus Torvalds2-26/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These revert one problematic commit related to system sleep and fix one recent intel_pstate regression. Specifics: - Revert a recent commit that attempted to avoid spurious wakeups from suspend-to-idle via ACPI SCI, but introduced regressions on some systems (Rafael Wysocki). We will get back to the problem it tried to address in the next cycle. - Fix a possible division by 0 during intel_pstate initialization due to a missing check (Rafael Wysocki)" * tag 'pm-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle" cpufreq: intel_pstate: Avoid division by 0 in min_perf_pct_min()
2017-06-09Merge branches 'intel_pstate' and 'pm-sleep'Rafael J. Wysocki2-26/+5
* intel_pstate: cpufreq: intel_pstate: Avoid division by 0 in min_perf_pct_min() * pm-sleep: Revert "ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle"
2017-06-08Merge branch 'for-linus' of ↵Linus Torvalds1-36/+10
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk fix from Petr Mladek: "This reverts a fix added into 4.12-rc1. It caused the kernel log to be printed on another console when two consoles of the same type were defined, e.g. console=ttyS0 console=ttyS1. This configuration was never supported by kernel itself, but it started to make sense with systemd. In other words, the commit broke userspace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: Revert "printk: fix double printing with earlycon"
2017-06-08srcu: Allow use of Classic SRCU from both process and interrupt contextPaolo Bonzini1-3/+2
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Cc: [email protected] Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <[email protected]> Suggested-by: Linu Cherian <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]> Cc: Linus Torvalds <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2017-06-08srcu: Allow use of Tiny/Tree SRCU from both process and interrupt contextPaolo Bonzini2-6/+6
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on a single variable. Therefore, as Peter Zijlstra pointed out, Tiny SRCU's implementation already supports mixed-context use of srcu_read_lock() and srcu_read_unlock(), at least as long as uses of srcu_read_lock() and srcu_read_unlock() in each handler are nested and paired properly. In other words, it is still illegal to (say) invoke srcu_read_lock() in an interrupt handler and to invoke the matching srcu_read_unlock() in a softirq handler. Therefore, the only change required for Tiny SRCU is to its comments. Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <[email protected]> Suggested-by: Linu Cherian <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]> Cc: Linus Torvalds <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Paolo Bonzini <[email protected]>
2017-06-08Revert "printk: fix double printing with earlycon"Petr Mladek1-36/+10
This reverts commit cf39bf58afdaabc0b86f141630fb3fd18190294e. The commit regression to users that define both console=ttyS1 and console=ttyS0 on the command line, see https://lkml.kernel.org/r/[email protected] The kernel log messages always appeared only on one serial port. It is even documented in Documentation/admin-guide/serial-console.rst: "Note that you can only define one console per device type (serial, video)." The above mentioned commit changed the order in which the command line parameters are searched. As a result, the kernel log messages go to the last mentioned ttyS* instead of the first one. We long thought that using two console=ttyS* on the command line did not make sense. But then we realized that console= parameters were handled also by systemd, see http://0pointer.de/blog/projects/serial-console.html "By default systemd will instantiate one [email protected] on the main kernel console, if it is not a virtual terminal." where "[4] If multiple kernel consoles are used simultaneously, the main console is the one listed first in /sys/class/tty/console/active, which is the last one listed on the kernel command line." This puts the original report into another light. The system is running in qemu. The first serial port is used to store the messages into a file. The second one is used to login to the system via a socket. It depends on systemd and the historic kernel behavior. By other words, systemd causes that it makes sense to define both console=ttyS1 console=ttyS0 on the command line. The kernel fix caused regression related to userspace (systemd) and need to be reverted. In addition, it went out that the fix helped only partially. The messages still were duplicated when the boot console was removed early by late_initcall(printk_late_init). Then the entire log was replayed when the same console was registered as a normal one. Link: [email protected] Cc: Aleksey Makarov <[email protected]> Cc: Sabrina Dubroca <[email protected]> Cc: Sudeep Holla <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Peter Hurley <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Robin Murphy <[email protected]>, Cc: Steven Rostedt <[email protected]> Cc: "Nair, Jayachandran" <[email protected]> Cc: [email protected] Cc: [email protected] Reported-by: Sabrina Dubroca <[email protected]> Acked-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Petr Mladek <[email protected]>
2017-06-08perf/core: Drop kernel samples even though :u is specifiedJin Yao1-0/+21
When doing sampling, for example: perf record -e cycles:u ... On workloads that do a lot of kernel entry/exits we see kernel samples, even though :u is specified. This is due to skid existing. This might be a security issue because it can leak kernel addresses even though kernel sampling support is disabled. The patch drops the kernel samples if exclude_kernel is specified. For example, test on Haswell desktop: perf record -e cycles:u <mgen> perf report --stdio Before patch applied: 99.77% mgen mgen [.] buf_read 0.20% mgen mgen [.] rand_buf_init 0.01% mgen [kernel.vmlinux] [k] apic_timer_interrupt 0.00% mgen mgen [.] last_free_elem 0.00% mgen libc-2.23.so [.] __random_r 0.00% mgen libc-2.23.so [.] _int_malloc 0.00% mgen mgen [.] rand_array_init 0.00% mgen [kernel.vmlinux] [k] page_fault 0.00% mgen libc-2.23.so [.] __random 0.00% mgen libc-2.23.so [.] __strcasestr 0.00% mgen ld-2.23.so [.] strcmp 0.00% mgen ld-2.23.so [.] _dl_start 0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4 0.00% mgen ld-2.23.so [.] _start We can see kernel symbols apic_timer_interrupt and page_fault. After patch applied: 99.79% mgen mgen [.] buf_read 0.19% mgen mgen [.] rand_buf_init 0.00% mgen libc-2.23.so [.] __random_r 0.00% mgen mgen [.] rand_array_init 0.00% mgen mgen [.] last_free_elem 0.00% mgen libc-2.23.so [.] vfprintf 0.00% mgen libc-2.23.so [.] rand 0.00% mgen libc-2.23.so [.] __random 0.00% mgen libc-2.23.so [.] _int_malloc 0.00% mgen libc-2.23.so [.] _IO_doallocbuf 0.00% mgen ld-2.23.so [.] do_lookup_x 0.00% mgen ld-2.23.so [.] open_verify.constprop.7 0.00% mgen ld-2.23.so [.] _dl_important_hwcaps 0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4 0.00% mgen ld-2.23.so [.] _start There are only userspace symbols. Signed-off-by: Jin Yao <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-07Revert "ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle"Rafael J. Wysocki2-26/+5
Revert commit eed4d47efe95 (ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle) as it turned out to be premature and triggered a number of different issues on various systems. That includes, but is not limited to, premature suspend-to-RAM aborts on Dell XPS 13 (9343) reported by Dominik. The issue the commit in question attempted to address is real and will need to be taken care of going forward, but evidently more work is needed for this purpose. Reported-by: Dominik Brodowski <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2017-06-05Merge branch 'for-4.12-fixes' of ↵Linus Torvalds2-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two cgroup fixes. One to address RCU delay of cpuset removal affecting userland visible behaviors. The other fixes a race condition between controller disable and cgroup removal" * 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: consider dying css as offline cgroup: Prevent kill_css() from being called more than once
2017-06-04alarmtimer: Rate limit periodic intervalsThomas Gleixner1-0/+8
The alarmtimer code has another source of potentially rearming itself too fast. Interval timers with a very samll interval have a similar CPU hog effect as the previously fixed overflow issue. The reason is that alarmtimers do not implement the normal protection against this kind of problem which the other posix timer use: timer expires -> queue signal -> deliver signal -> rearm timer This scheme brings the rearming under scheduler control and prevents permanently firing timers which hog the CPU. Bringing this scheme to the alarm timer code is a major overhaul because it lacks all the necessary mechanisms completely. So for a quick fix limit the interval to one jiffie. This is not problematic in practice as alarmtimers are usually backed by an RTC for suspend which have 1 second resolution. It could be therefor argued that the resolution of this clock should be set to 1 second in general, but that's outside the scope of this fix. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Kostya Serebryany <[email protected]> Cc: syzkaller <[email protected]> Cc: John Stultz <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2017-06-04alarmtimer: Prevent overflow of relative timersThomas Gleixner1-3/+3
Andrey reported a alartimer related RCU stall while fuzzing the kernel with syzkaller. The reason for this is an overflow in ktime_add() which brings the resulting time into negative space and causes immediate expiry of the timer. The following rearm with a small interval does not bring the timer back into positive space due to the same issue. This results in a permanent firing alarmtimer which hogs the CPU. Use ktime_add_safe() instead which detects the overflow and clamps the result to KTIME_SEC_MAX. Reported-by: Andrey Konovalov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Kostya Serebryany <[email protected]> Cc: syzkaller <[email protected]> Cc: John Stultz <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]