aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2016-07-15rcu: Convert rcutree to hotplug state machineThomas Gleixner2-53/+66
Straight forward conversion to the state machine. Though the question arises whether this needs really all these state transitions to work. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-15smp/cfd: Convert core to hotplug state machineRichard Weinberger2-47/+41
Install the callbacks via the state machine. They are installed at runtime so smpcfd_prepare_cpu() needs to be invoked by the boot-CPU. Signed-off-by: Richard Weinberger <[email protected]> [ Added the dropped CPU dying case back in. ] Signed-off-by: Richard Cochran <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Cc: Davidlohr Bueso <dave@stgolabs> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-15profile: Convert to hotplug state machineSebastian Andrzej Siewior1-116/+65
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. A lot of code is removed because the for-loop is used and create_hash_tables() is removed since its purpose is covered by the startup / teardown hooks. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-15timers/core: Convert to hotplug state machineRichard Cochran2-23/+7
When tearing down, call timers_dead_cpu() before notify_dead(). There is a hidden dependency between: - timers - block multiqueue - rcutree If timers_dead_cpu() comes later than blk_mq_queue_reinit_notify() that latter function causes a RCU stall. Signed-off-by: Richard Cochran <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-15hrtimer: Convert to hotplug state machineThomas Gleixner2-35/+10
Split out the clockevents callbacks instead of piggybacking them on hrtimers. This gets rid of a POST_DEAD user. See commit: 54e88fad223c ("sched: Make sure timers have migrated before killing the migration_thread") We just move the callback state to the proper place in the state machine. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Rusty Russell <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge misc fixes from Andrew Morton: "20 fixes" * emailed patches from Andrew Morton <[email protected]>: m32r: fix build warning about putc mm: workingset: printk missing log level, use pr_info() mm: thp: refix false positive BUG in page_move_anon_rmap() mm: rmap: call page_check_address() with sync enabled to avoid racy check mm: thp: move pmd check inside ptl for freeze_page() vmlinux.lds: account for destructor sections gcov: add support for gcc version >= 6 mm, meminit: ensure node is online before checking whether pages are uninitialised mm, meminit: always return a valid node from early_pfn_to_nid kasan/quarantine: fix bugs on qlist_move_cache() uapi: export lirc.h header madvise_free, thp: fix madvise_free_huge_pmd return value after splitting Revert "scripts/gdb: add documentation example for radix tree" Revert "scripts/gdb: add a Radix Tree Parser" scripts/gdb: Perform path expansion to lx-symbol's arguments scripts/gdb: add constants.py to .gitignore scripts/gdb: rebuild constants.py on dependancy change scripts/gdb: silence 'nothing to do' message kasan: add newline to messages mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
2016-07-15Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds3-7/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a CPU hotplug related corruption of the load average that got introduced in this merge window" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Correct off by one bug in load migration calculation
2016-07-15gcov: add support for gcc version >= 6Florian Meier1-1/+1
Link: http://lkml.kernel.org/r/20160701130914.GA23225@styxhp Signed-off-by: Florian Meier <[email protected]> Reviewed-by: Peter Oberparleiter <[email protected]> Tested-by: Peter Oberparleiter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-07-14audit: fix whitespace in CWD recordSteve Grubb1-1/+1
Fix the whitespace in the CWD record Signed-off-by: Steve Grubb <[email protected]> [PM: fixed subject line] Signed-off-by: Paul Moore <[email protected]>
2016-07-14sched/cputime: Drop local_irq_save/restore from irqtime_account_irq()Rik van Riel1-4/+0
Paolo pointed out that irqs are already blocked when irqtime_account_irq() is called. That means there is no reason to call local_irq_save/restore() again. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-14sched/cputime: Clean up the old vtime gen irqtime accounting completelyFrederic Weisbecker1-23/+10
Vtime generic irqtime accounting has been removed but there are a few remnants to clean up: * The vtime_accounting_cpu_enabled() check in irq entry was only used by CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can safely remove it. * Without the vtime_accounting_cpu_enabled(), we no longer need to have a vtime_common_account_irq_enter() indirect function. * Move vtime_account_irq_enter() implementation under CONFIG_VIRT_CPU_ACCOUNTING_NATIVE which is the last user. * The vtime_account_user() call was only used on irq entry for CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can remove that too. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-14sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING codeRik van Riel1-13/+3
The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not appear to currently work right. On CPUs without nohz_full=, only tick based irq time sampling is done, which breaks down when dealing with a nohz_idle CPU. On firewalls and similar systems, no ticks may happen on a CPU for a while, and the irq time spent may never get accounted properly. This can cause issues with capacity planning and power saving, which use the CPU statistics as inputs in decision making. Remove the VTIME_GEN vtime irq time code, and replace it with the IRQ_TIME_ACCOUNTING code, when selected as a config option by the user. Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-14sched/cputime: Count actually elapsed irq & softirq timeRik van Riel1-47/+77
Currently, if there was any irq or softirq time during 'ticks' jiffies, the entire period will be accounted as irq or softirq time. This is inaccurate if only a subset of the time was actually spent handling irqs, and could conceivably mis-count all of the ticks during a period as irq time, when there was some irq and some softirq time. This can actually happen when irqtime_account_process_tick is called from account_idle_ticks, which can pass a larger number of ticks down all at once. Fix this by changing irqtime_account_hi_update(), irqtime_account_si_update(), and steal_account_process_ticks() to work with cputime_t time units, and return the amount of time spent in each mode. Rename steal_account_process_ticks() to steal_account_process_time(), to reflect that time is now accounted in cputime_t, instead of ticks. Additionally, have irqtime_account_process_tick() take into account how much time was spent in each of steal, irq, and softirq time. The latter could help improve the accuracy of cputime accounting when returning from idle on a NO_HZ_IDLE CPU. Properly accounting how much time was spent in hardirq and softirq time will also allow the NO_HZ_FULL code to re-use these same functions for hardirq and softirq accounting. Signed-off-by: Rik van Riel <[email protected]> [ Make nsecs_to_cputime64() actually return cputime64_t. ] Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-14Merge branch 'sched/core' into timers/nohz, to avoid conflicts in upcoming ↵Ingo Molnar8-186/+412
patches Signed-off-by: Ingo Molnar <[email protected]>
2016-07-14workqueue: Convert to state machine callbacksThomas Gleixner2-65/+53
Get rid of the prio ordering of the separate notifiers and use a proper state callback pair. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Nicolas Iooss <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Rusty Russell <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-14perf/core: Convert to hotplug state machineThomas Gleixner2-46/+21
Actually a nice symmetric startup/teardown pair which fits properly into the state machine concept. In the long run we should be able to invoke the startup callback for the boot CPU via the state machine and get rid of the init function which invokes it on the boot CPU. Note: This comes actually before the perf hardware callbacks. In the notifier model the hardware callbacks have a higher priority than the core callback. But that's solely for CPU offline so that hardware migration of events happens before the core is notified about the outgoing CPU. With the symetric state array model we have the following ordering: UP: core -> hardware DOWN: hardware -> core Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Reviewed-by: Sebastian Siewior <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-14cpu/hotplug: Handle early registration gracefullyThomas Gleixner1-0/+7
We switched the hotplug machinery to smpboot threads. Early registration of hotplug callbacks, i.e. from do_pre_smp_initcalls(), happens before the threads are initialized. Instead of moving the thread init, we simply handle it in the hotplug code itself and invoke the function directly. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-14Merge tag 'iio-for-4.8c' of ↵Greg Kroah-Hartman1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: Third set of IIO new device support, features and cleanups for the 4.8 cycle. New core features - Selection of the clock source for IIO timestamps. This is done per device as it makes little sense to have events in one timebase and data timestamped on another. Biggest reason for this is that we currently use a clock source which is non monotonic which can result in 'interesting' data sets. (Includes export for get_monotonic_corse64 which Thomas Gleixner didn't mind in an earlier version.) - MAINTAINERS add the git tree to the list for IIO. New device support + a kind of indirect staging graduation. * Broadcom iproc-static-adc - new driver * mcp4531 - support for MCP454x, MCP456x, MCP464x and MCP466x potentiometers * mpu6050 - support the IC20608 6 axis motion tracking device * st-sensors - support the lis3l02dq + drop the lis3l02dq driver from staging. The general purpose driver is missing event support, but good to get rid of this driver which was rather long in the tooth. New driver features * ak8975 - Add vid regulator support and refactor handling in general. - Allow a delay after enabling regulators. - Runtime and system PM. * bmg160 - filter frequency control support. * bmp280 - SPI device support. - EOC interrupt support for the BMP085 - power management support. - supply regulator support. - reset gpio support - dt bindings for reset gpio and regulators. - of table to support device tree registration * max1363 - Device tree bindings. * mcp4531 - Device tree bindings. * st-pressure - temperature channels as part of triggered buffer (previously not due probably to alignment issues - see below). - lps22hb open drain interrupt support. - lps22hb temperature channel support Cleanups and reworkings. * numerous ADC drivers - ensure the iio_dev->dev.of_node is set to the parent dev.of_node so as to allow client bindings to find the device. * ak8975 - Fix incorrect handling of missing regulator - make sure power is down and remove. * bmp280 - read the calibration data only once as it doesn't change. * isl29125 - Use a few macros to make code a touch more readable. * mma8452 - fix a memory leak on error. - drop an unecessary bit of return value handling. * potentiometer kconfig - typo fix. * st-pressure - drop some uninformative default assignments of elements of the channel array structure (aids readability). * st-sensors - Harden interrupt handling considerably. These are actually all using level interrupts, but at least two known boards have them wired to edge only interrupt chips. Hence a slightly interesting bit of handling is needed in which we first allow for the easy option (level triggered) and secondly check the status registers before reenabling edge interrupts and fall back to a tight loop in the thread until we successfully clear the interrupt. No harm is done if we never succeed in doing so. It's an odd patch that has been through a lot of revisions to reach a consensus on how to handle what is basically broken hardware (which the previous defaults allowed to kind of work). - Fix alignment to defined storagebytes boundaries. - Ensure alignment of power of 2 byte boundaries. This has always in theory been part of the ABI of IIO, but we missed a few that snuck in that need fixing. The effect was minor as they were only followed by timestamp channels which were correctly aligned, - Add some docs to explain the gain calculations.
2016-07-14Merge branches 'perf-urgent-for-linus' and 'timers-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf and timer fixes from Ingo Molnar: "A fix for a posix CPU timers bug, and a perf printk message fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix bogus kernel printk, again * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix_cpu_timer: Exit early when process has been reaped
2016-07-13Merge branch 'core/urgent' into smp/hotplug to pick up dependenciesThomas Gleixner1-0/+2
2016-07-13Merge branch 'core/rcu' into smp/hotplug to pick up dependenciesThomas Gleixner9-736/+844
2016-07-13Merge branch 'timers/core' into smp/hotplug to pick up dependenciesThomas Gleixner12-537/+749
2016-07-13sched/core: Correct off by one bug in load migration calculationThomas Gleixner3-7/+9
The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into account that the function is now called from a thread running on the outgoing CPU. As a result a cpu unplug leakes a load of 1 into the global load accounting mechanism. Fix it by adjusting for the currently running thread which calls calc_load_migrate(). Reported-by: Anton Blanchard <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Vaidyanathan Srinivasan <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: e9cd8fa4fcfd: ("sched/migration: Move calc_load_migrate() into CPU_DYING") Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607121744350.4083@nanos Signed-off-by: Ingo Molnar <[email protected]>
2016-07-13cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds ↵Thomas Gleixner1-0/+2
scribble Xiaolong Ye reported lock debug warnings triggered by the following commit: 8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine") The bug is the following: the cpuhp_bp_states[] array is cut short when CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless and happily scribble outside of the array bounds... We need to store them in case that the state is unregistered so we can invoke the teardown function. That's independent of CONFIG_SMP. Make sure the array is large enough. Reported-by: kernel test robot <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Adam Borowski <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anna-Maria Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: cff7d378d3fd "cpu/hotplug: Convert to a state machine for the control processor" Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanos Signed-off-by: Ingo Molnar <[email protected]>
2016-07-11bpf: make inode code explicitly non-modularPaul Gortmaker1-3/+1
The Kconfig currently controlling compilation of this code is: init/Kconfig:config BPF_SYSCALL init/Kconfig: bool "Enable bpf() system call" ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modular infrastructure use, so that when reading the driver there is no doubt it is builtin-only. Note that MODULE_ALIAS is a no-op for non-modular code. We replace module.h with init.h since the file does use __init. Cc: Alexei Starovoitov <[email protected]> Cc: [email protected] Signed-off-by: Paul Gortmaker <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-07-11irqdomain: Fix irq_domain_alloc_irqs_recursive() error handlingAlexander Popov1-2/+4
If an irq_domain is auto-recursive and irq_domain_alloc_irqs_recursive() for its parent has returned an error, then do return and avoid calling irq_domain_free_irqs_recursive() uselessly, because: - if domain->ops->alloc() had failed for an auto-recursive irq_domain, then irq_domain_free_irqs_recursive() had already been called; - if domain->ops->alloc() had failed for a not auto-recursive irq_domain, then there is nothing to free at all. Signed-off-by: Alexander Popov <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-07-11posix_cpu_timer: Exit early when process has been reapedAlexey Dobriyan1-0/+1
Variable "now" seems to be genuinely used unintialized if branch if (CPUCLOCK_PERTHREAD(timer->it_clock)) { is not taken and branch if (unlikely(sighand == NULL)) { is taken. In this case the process has been reaped and the timer is marked as disarmed anyway. So none of the postprocessing of the sample is required. Return right away. Signed-off-by: Alexey Dobriyan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-07-10Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86"Ingo Molnar1-7/+0
This reverts commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a. Stephane reported the following regression: > Since Andi added: > > commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a > Author: Andi Kleen <[email protected]> > Date: Thu Jun 9 06:14:38 2016 -0700 > > perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86 > > $ perf stat -e ref-cycles ls > <not counted> .... > > fails systematically because the ref-cycles is now used by the > watchdog and given this is a system-wide pinned event, it monopolizes > the fixed counter 2 which is the only counter able to measure this event. Since the next merge window is near, fix the regression for now by reverting the commit. Reported-by: Stephane Eranian <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Vince Weaver <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-10sched/core: Panic on scheduling while atomic bugs if kernel.panic_on_warn is setDaniel Bristot de Oliveira1-0/+3
Currently, a schedule while atomic error prints the stack trace to the kernel log and the system continue running. Although it is possible to collect the kernel log messages and analyze it, often more information are needed. Furthermore, keep the system running is not always the best choice. For example, when the preempt count underflows the system will not stop to complain about scheduling while atomic, so the kernel log can wrap around overwriting the first stack trace, tuning the analysis even more challenging. This patch uses the kernel.panic_on_warn sysctl to help out on these more complex situations. When kernel.panic_on_warn is set to 1, the kernel will panic() in the schedule while atomic detection. The default value of the sysctl is 0, maintaining the current behavior. Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Reviewed-by: Luis Claudio R. Goncalves <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luis Claudio R. Goncalves <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/e8f7b80f353aa22c63bd8557208163989af8493d.1464983675.git.bristot@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2016-07-10PM / hibernate: Image data protection during restorationRafael J. Wysocki3-0/+52
Make it possible to protect all pages holding image data during hibernate image restoration by setting them read-only (so as to catch attempts to write to those pages after image data have been stored in them). This adds overhead to image restoration code (it may cause large page mappings to be split as a result of page flags changes) and the errors it protects against should never happen in theory, so the feature is only active after passing hibernate=protect_image to the command line of the restore kernel. Also it only is built if CONFIG_DEBUG_RODATA is set. Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-07-10PM / hibernate: Add missing braces in __register_nosave_region()Rafael J. Wysocki1-1/+2
One branch of an if/else statement in __register_nosave_region() is formatted against the kernel coding style which causes the code to look slightly odd. To fix that, add missing braces to it. No functional changes. Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-07-10PM / hibernate: Clean up comments in snapshot.cRafael J. Wysocki1-306/+330
Many comments in kernel/power/snapshot.c do not follow the general comment formatting rules. They look odd, some of them are outdated too, some are hard to parse and generally difficult to understand. Clean them up to make them easier to comprehend. No functional changes. Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-07-10PM / hibernate: Clean up function headers in snapshot.cRafael J. Wysocki1-51/+42
The formatting of some function headers in kernel/power/snapshot.c is not consistent with the general kernel coding style and with the formatting of some other function headers in the same file. Make all of them follow the same formatting convention. No functional changes. Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-07-10PM / hibernate: Add missing braces in hibernate_setup()Rafael J. Wysocki1-3/+3
Make hibernate_setup() follow the coding style more closely by adding some missing braces to the if () statement in it. Signed-off-by: Rafael J. Wysocki <[email protected]>
2016-07-09sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats togetherZhao Lei1-0/+40
In current code, we can get cpuacct data from several files, but each file has various limitations. For example: - We can get CPU usage in user and kernel mode via cpuacct.stat, but we can't get detailed data about each CPU. - We can get each CPU's kernel mode usage in cpuacct.usage_percpu_sys, but we can't get user mode usage data at the same time. This patch introduces cpuacct.usage_all, to show all detailed CPU accounting data together: # cat cpuacct.usage_all cpu user system 0 3809760299 5807968992 1 3250329855 454612211 .. Signed-off-by: Zhao Lei <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/7744460969edd7caaf0e903592ee52353ed9bdd6.1466415271.git.zhaolei@cn.fujitsu.com Signed-off-by: Ingo Molnar <[email protected]>
2016-07-09sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show()Zhao Lei1-15/+14
In cpuacct_stats_show() we currently we have copies of similar code, for each cpustat(system/user) variant. Use a loop instead to consolidate the code. This will also work better if we extend the CPUACCT_STAT_NSTATS type. Signed-off-by: Zhao Lei <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/b0597d4224655e9f333f1a6224ed9654c7d7d36a.1466415271.git.zhaolei@cn.fujitsu.com Signed-off-by: Ingo Molnar <[email protected]>
2016-07-09sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enumsZhao Lei1-27/+20
These two types have similar function, no need to separate them. Signed-off-by: Zhao Lei <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/436748885270d64363c7dc67167507d486c2057a.1466415271.git.zhaolei@cn.fujitsu.com Signed-off-by: Ingo Molnar <[email protected]>
2016-07-09bpf: introduce bpf_get_current_task() helperAlexei Starovoitov1-0/+13
over time there were multiple requests to access different data structures and fields of task_struct current, so finally add the helper to access 'current' as-is. Tracing bpf programs will do the rest of walking the pointers via bpf_probe_read(). Note that current can be null and bpf program has to deal it with, but even dumb passing null into bpf_probe_read() is still safe. Suggested-by: Brendan Gregg <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-07-08Merge back earlier suspend/hibernation changes for v4.8.Rafael J. Wysocki10-95/+149
2016-07-08Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-22/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two load-balancing fixes for cgroups-intense workloads" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion sched/fair: Fix effective_load() to consistently use smoothed load
2016-07-08Merge branch 'x86/mm' into x86/boot, to pick up dependenciesIngo Molnar19-176/+277
Signed-off-by: Ingo Molnar <[email protected]>
2016-07-07Merge branch 'timers/fast-wheel' into timers/coreIngo Molnar29-692/+1077
2016-07-07timers: Implement optimization for same expiry time in mod_timer()Anna-Maria Gleixner1-16/+35
The existing optimization for same expiry time in mod_timer() checks whether the timer expiry time is the same as the new requested expiry time. In the old timer wheel implementation this does not take the slack batching into account, neither does the new implementation evaluate whether the new expiry time will requeue the timer to the same bucket. To optimize that, we can calculate the resulting bucket and check if the new expiry time is different from the current expiry time. This calculation happens outside the base lock held region. If the resulting bucket is the same we can avoid taking the base lock and requeueing the timer. If the timer needs to be requeued then we have to check under the base lock whether the base time has changed between the lockless calculation and taking the lock. If it has changed we need to recalculate under the lock. This optimization takes effect for timers which are enqueued into the less granular wheel levels (1 and above). With a simple test case the functionality has been verified: Before After Match: 5.5% 86.6% Requeue: 94.5% 13.4% Recalc: <0.01% In the non optimized case the timer is requeued in 94.5% of the cases. With the index optimization in place the requeue rate drops to 13.4%. The case where the lockless index calculation has to be redone is less than 0.01%. With a real world test case (networking) we observed the following changes: Before After Match: 97.8% 99.7% Requeue: 2.2% 0.3% Recalc: <0.001% That means two percent fewer lock/requeue/unlock operations done in one of the hot path use cases of timers. Signed-off-by: Anna-Maria Gleixner <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Chris Mason <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: George Spelvin <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-07timers: Split out index calculationAnna-Maria Gleixner1-15/+32
For further optimizations we need to seperate index calculation from queueing. No functional change. Signed-off-by: Anna-Maria Gleixner <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Chris Mason <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: George Spelvin <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-07timers: Only wake softirq if necessaryThomas Gleixner1-0/+11
With the wheel forwading in place and with the HZ=1000 4ms folding we can avoid running the softirq at all. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Chris Mason <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: George Spelvin <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-07timers: Forward the wheel clock whenever possibleThomas Gleixner3-21/+120
The wheel clock is stale when a CPU goes into a long idle sleep. This has the side effect that timers which are queued end up in the outer wheel levels. That results in coarser granularity. To solve this, we keep track of the idle state and forward the wheel clock whenever possible. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Chris Mason <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: George Spelvin <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-07timers/nohz: Remove pointless tick_nohz_kick_tick() functionThomas Gleixner1-32/+1
This was a failed attempt to optimize the timer expiry in idle, which was disabled and never revisited. Remove the cruft. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Chris Mason <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: George Spelvin <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-07timers: Optimize collect_expired_timers() for NOHZAnna-Maria Gleixner1-8/+41
After a NOHZ idle sleep the timer wheel must be forwarded to current jiffies. There might be expired timers so the current code loops and checks the expired buckets for timers. This can take quite some time for long NOHZ idle periods. The pending bitmask in the timer base allows us to do a quick search for the next expiring timer and therefore a fast forward of the base time which prevents pointless long lasting loops. For a 3 seconds idle sleep this reduces the catchup time from ~1ms to 5us. Signed-off-by: Anna-Maria Gleixner <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Chris Mason <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: George Spelvin <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-07timers: Move __run_timers() functionAnna-Maria Gleixner1-26/+26
Move __run_timers() below __next_timer_interrupt() and next_pending_bucket() in preparation for __run_timers() NOHZ optimization. No functional change. Signed-off-by: Anna-Maria Gleixner <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Chris Mason <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: George Spelvin <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-07timers: Remove set_timer_slack() leftoversThomas Gleixner1-19/+0
We now have implicit batching in the timer wheel. The slack API is no longer used, so remove it. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Alan Stern <[email protected]> Cc: Andrew F. Davis <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Chris Mason <[email protected]> Cc: David S. Miller <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Dmitry Eremin-Solenikov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: George Spelvin <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jaehoon Chung <[email protected]> Cc: Jens Axboe <[email protected]> Cc: John Stultz <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mathias Nyman <[email protected]> Cc: Pali Rohár <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>