blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message	Author	Files	Lines
2020-11-16	hrtimer: Fix kernel-doc markups	Mauro Carvalho Chehab	1	-1/+5
	The hrtimer_get_remaining() markup is documenting, instead, __hrtimer_get_remaining(), as it is placed at the C file. In order to properly document it, a kernel-doc markup is needed together with the function prototype. So, add a new one, while preserving the existing one, just fixing the function name. The hrtimer_is_queued prototype has a typo: it is using '=' instead of '-' to split: identifier - description as required by kernel-doc markup. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/9dc87808c2fd07b7e050bafcd033c5ef05808fea.1605521731.git.mchehab+huawei@kernel.org
2020-08-06	locking/seqlock, headers: Untangle the spaghetti monster	Peter Zijlstra	1	-0/+1
	By using lockdep_assert_*() from seqlock.h, the spaghetti monster attacked.
	Attack back by reducing seqlock.h dependencies from two key high level headers:
	 - <linux/seqlock.h>: -Remove <linux/ww_mutex.h>
	 - <linux/time.h>: -Remove <linux/seqlock.h>
	 - <linux/sched.h>: +Add <linux/seqlock.h>
	The price was to add it to sched.h ...
	Core header fallout, we add direct header dependencies instead of gaining them parasitically from higher level headers:
	 - <linux/dynamic_queue_limits.h>: +Add <asm/bug.h>
	 - <linux/hrtimer.h>: +Add <linux/seqlock.h>
	 - <linux/ktime.h>: +Add <asm/bug.h>
	 - <linux/lockdep.h>: +Add <linux/smp.h>
	 - <linux/sched.h>: +Add <linux/seqlock.h>
	 - <linux/videodev2.h>: +Add <linux/kernel.h>
	Arch headers fallout:
	 - PARISC: <asm/timex.h>: +Add <asm/special_insns.h>
	 - SH: <asm/io.h>: +Add <asm/page.h>
	 - SPARC: <asm/timer_64.h>: +Add <uapi/asm/asi.h>
	 - SPARC: <asm/vvar.h>: +Add <asm/processor.h>, <asm/barrier.h> -Remove <linux/seqlock.h>
	 - X86: <asm/fixmap.h>: +Add <asm/pgtable_types.h> -Remove <asm/acpi.h>
	There's also a bunch of parasitic header dependency fallout in .c files, not listed separately.
	[ mingo: Extended the changelog, split up & fixed the original patch. ]
	Co-developed-by: Ingo Molnar <[email protected]>
	Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
	Signed-off-by: Ingo Molnar <[email protected]>
	Link: https://lore.kernel.org/r/[email protected]
2020-07-29	hrtimer: Use sequence counter with associated raw spinlock	Ahmed S. Darwish	1	-1/+1
	A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_raw_spinlock_t data type, which allows to associate a raw spinlock with the sequence counter. This enables lockdep to verify that the raw spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
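The idea of tying a sequence counter to the lock that serializes its writers can be sketched in userspace C11. This is only an analogue under stated assumptions: `demo_seqcount_locked` and the `demo_*` helpers are hypothetical names, an `atomic_bool` stands in for the raw spinlock, and a plain `assert()` plays the role that lockdep plays in the kernel. It is not the kernel's `seqcount_raw_spinlock_t` implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace analogue: a sequence counter bundled with the
 * lock that writers must hold. Not the kernel implementation. */
struct demo_seqcount_locked {
    atomic_bool lock;       /* stand-in for the associated raw spinlock */
    atomic_uint seq;        /* odd while a write is in progress */
    _Atomic int64_t a, b;   /* data protected by the sequence counter */
};

static void demo_lock(struct demo_seqcount_locked *s)
{
    /* Spin until we are the one flipping the flag from false to true. */
    while (atomic_exchange_explicit(&s->lock, true, memory_order_acquire))
        ;
}

static void demo_unlock(struct demo_seqcount_locked *s)
{
    atomic_store_explicit(&s->lock, false, memory_order_release);
}

static void demo_write(struct demo_seqcount_locked *s, int64_t a, int64_t b)
{
    /* Plays the role of the lockdep check: the writer lock must be held. */
    assert(atomic_load_explicit(&s->lock, memory_order_relaxed));

    unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->a, a, memory_order_relaxed);
    atomic_store_explicit(&s->b, b, memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release); /* even */
}

static void demo_read(struct demo_seqcount_locked *s, int64_t *a, int64_t *b)
{
    unsigned int begin, end;

    /* Retry while a write is in progress or one completed in between. */
    do {
        begin = atomic_load_explicit(&s->seq, memory_order_acquire);
        *a = atomic_load_explicit(&s->a, memory_order_relaxed);
        *b = atomic_load_explicit(&s->b, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((begin & 1u) || begin != end);
}
```

A writer calls `demo_lock()`, `demo_write()`, `demo_unlock()`; calling `demo_write()` without the lock trips the assertion, which is the kind of mistake the lock association is meant to catch.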
2020-01-14	hrtimers: Prepare hrtimer_nanosleep() for time namespaces	Andrei Vagin	1	-2/+1
	clock_nanosleep() accepts absolute values of expiration time when TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace, and has to be converted to the host's time. There is timens_ktime_to_host() helper for converting time, but it accepts ktime argument. As a preparation, make hrtimer_nanosleep() accept a clock value in ktime instead of timespec64. Co-developed-by: Dmitry Safonov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
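From userspace, the absolute-expiry case this commit prepares for looks roughly like the sketch below: read the clock, compute a deadline, and pass it to `clock_nanosleep()` with `TIMER_ABSTIME`. The helper names `demo_timespec_add_ms` and `demo_sleep_until_ms_from_now` are hypothetical; only `clock_gettime()` and `clock_nanosleep()` are real POSIX calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Add ms milliseconds to *ts and keep tv_nsec within [0, 1e9). */
static void demo_timespec_add_ms(struct timespec *ts, long ms)
{
    assert(ms >= 0);
    ts->tv_sec += ms / 1000;
    ts->tv_nsec += (ms % 1000) * 1000000L;
    if (ts->tv_nsec >= 1000000000L) {
        ts->tv_sec += 1;
        ts->tv_nsec -= 1000000000L;
    }
}

/* Sleep until an absolute CLOCK_MONOTONIC deadline that lies ms
 * milliseconds in the future. Returns 0 on success or an error number. */
static int demo_sleep_until_ms_from_now(long ms)
{
    struct timespec deadline;
    int rc;

    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return errno;
    demo_timespec_add_ms(&deadline, ms);

    /* With TIMER_ABSTIME the deadline is unchanged by interruptions,
     * so restarting after EINTR does not lengthen the sleep. */
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    } while (rc == EINTR);

    return rc;
}
```

Inside a time namespace the deadline passed here is in the task's view of the clock, which is why the kernel side has to translate it to host time before arming the timer.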
2019-11-06	hrtimer: Annotate lockless access to timer->state	Eric Dumazet	1	-4/+10
	syzbot reported various data-race caused by hrtimer_is_queued() reading timer->state. A READ_ONCE() is required there to silence the warning.
	Also add the corresponding WRITE_ONCE() when timer->state is set.
	In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid loading timer->state twice.
	KCSAN reported these cases:
	BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check
	write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
	 __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
	 __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
	 __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
	 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
	 __do_softirq+0x115/0x33f kernel/softirq.c:292
	 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
	 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
	 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
	 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
	read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
	 tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
	 tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
	 tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
	 tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
	 tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
	 tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
	 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
	 sk_backlog_rcv include/net/sock.h:945 [inline]
	 __release_sock+0x135/0x1e0 net/core/sock.c:2435
	 release_sock+0x61/0x160 net/core/sock.c:2951
	 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
	 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
	 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
	 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
	 sock_sendmsg_nosec net/socket.c:637 [inline]
	 sock_sendmsg+0x9f/0xc0 net/socket.c:657
	BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check
	write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
	 __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
	 __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
	 __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
	 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
	 __do_softirq+0x115/0x33f kernel/softirq.c:292
	 invoke_softirq kernel/softirq.c:373 [inline]
	 irq_exit+0xbb/0xe0 kernel/softirq.c:413
	 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
	 smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
	 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
	read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
	 __tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
	 tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
	 tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
	 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
	 sk_backlog_rcv include/net/sock.h:945 [inline]
	 __release_sock+0x135/0x1e0 net/core/sock.c:2435
	 release_sock+0x61/0x160 net/core/sock.c:2951
	 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
	 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
	 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
	 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
	 sock_sendmsg_nosec net/socket.c:637 [inline]
	 sock_sendmsg+0x9f/0xc0 net/socket.c:657
	 __sys_sendto+0x21f/0x320 net/socket.c:1952
	 __do_sys_sendto net/socket.c:1964 [inline]
	 __se_sys_sendto net/socket.c:1960 [inline]
	 __x64_sys_sendto+0x89/0xb0 net/socket.c:1960
	 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
	Reported by Kernel Concurrency Sanitizer on:
	CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
	Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
	[ tglx: Added comments ]
	Reported-by: syzbot <[email protected]>
	Signed-off-by: Eric Dumazet <[email protected]>
	Signed-off-by: Thomas Gleixner <[email protected]>
	Link: https://lkml.kernel.org/r/[email protected]
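The annotation pattern described in this commit can be sketched outside the kernel as follows. The macros below are simplified volatile-cast stand-ins, written on the assumption of a GCC or Clang compiler (`__typeof__` is an extension); they are not the kernel's definitions, and `demo_timer`, `demo_timer_is_queued` and `demo_timer_remove` are hypothetical names. Portable userspace code would more likely reach for `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins: force exactly one load or store of x. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define DEMO_STATE_INACTIVE 0x00
#define DEMO_STATE_ENQUEUED 0x01

struct demo_timer {
    uint8_t state;  /* read locklessly by other contexts */
};

/* Lockless reader: a single annotated load of the state byte. */
static bool demo_timer_is_queued(const struct demo_timer *t)
{
    return (READ_ONCE(t->state) & DEMO_STATE_ENQUEUED) != 0;
}

/* The queued check is open coded so the state is loaded only once,
 * and the update uses the matching annotated store. */
static bool demo_timer_remove(struct demo_timer *t)
{
    uint8_t state = READ_ONCE(t->state);

    if (!(state & DEMO_STATE_ENQUEUED))
        return false;

    WRITE_ONCE(t->state, DEMO_STATE_INACTIVE);
    return true;
}
```

The sketch shows only the access annotations; it makes no attempt to reproduce the locking that the real removal path relies on.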
2019-08-28	hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD	Sebastian Andrzej Siewior	1	-0/+2
	Add kernel doc annotation for HRTIMER_MODE_HARD. Fixes: ae6683d815895 ("hrtimer: Introduce HARD expiry mode") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-08-01	hrtimer: Prepare support for PREEMPT_RT	Anna-Maria Gleixner	1	-0/+16
	When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling hrtimer_cancel() can lead to two issues:
	 - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbound priority inversion.
	 - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end.
	To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If hrtimer_cancel() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress. This addresses both the priority inversion and the live lock issues.
	The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism.
	[ tglx: Refactored it for mainline ]
	Signed-off-by: Anna-Maria Gleixner <[email protected]>
	Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
	Signed-off-by: Thomas Gleixner <[email protected]>
	Acked-by: Peter Zijlstra (Intel) <[email protected]>
	Link: https://lkml.kernel.org/r/[email protected]
2019-08-01	hrtimer: Make enqueue mode check work on RT	Thomas Gleixner	1	-0/+3
	hrtimer_start_range_ns() has a WARN_ONCE() which verifies that a timer which is marked for softirq expiry is not queued in the hard interrupt base and vice versa. When PREEMPT_RT is enabled, timers which are not explicitly marked to expire in hard interrupt context are deferred to the soft interrupt. So the regular check would trigger. Change the check so that, when PREEMPT_RT is enabled, it verifies that timers marked for hard interrupt expiry are not queued for soft interrupt expiry, and that unmarked or softirq-marked timers are not queued for expiry in hard interrupt context. Signed-off-by: Thomas Gleixner <[email protected]>
2019-08-01	hrtimer: Introduce HARD expiry mode	Sebastian Andrzej Siewior	1	-0/+6
	On PREEMPT_RT not all hrtimers can be expired in hard interrupt context even if that is perfectly fine on a PREEMPT_RT=n kernel, e.g. because they take regular spinlocks. Also for latency reasons PREEMPT_RT tries to defer most hrtimers' expiry into soft interrupt context.
	But there are hrtimers which must be expired in hard interrupt context even when PREEMPT_RT is enabled:
	 - hrtimers which must expire in hard interrupt context, e.g. scheduler, perf, watchdog related hrtimers
	 - latency critical hrtimers, e.g. nanosleep, ..., kvm lapic timer
	Add a new mode flag HRTIMER_MODE_HARD which allows to mark these timers so PREEMPT_RT will not move them into softirq expiry mode.
	[ tglx: Split out of a larger combo patch. Added changelog ]
	Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
	Signed-off-by: Thomas Gleixner <[email protected]>
	Acked-by: Peter Zijlstra (Intel) <[email protected]>
	Link: https://lkml.kernel.org/r/[email protected]
2019-08-01	hrtimer: Provide hrtimer_sleeper_start_expires()	Thomas Gleixner	1	-0/+3
	hrtimer_sleepers will gain a scheduling class dependent treatment on PREEMPT_RT. Create a wrapper around hrtimer_start_expires() to make that possible. Signed-off-by: Thomas Gleixner <[email protected]>
2019-08-01	hrtimer: Consolidate hrtimer_init() + hrtimer_init_sleeper() calls	Sebastian Andrzej Siewior	1	-3/+14
	hrtimer_init_sleeper() calls require prior initialisation of the hrtimer object which is embedded into the hrtimer_sleeper. Combine the initialization and spare a function call. Fixup all call sites. This is also a preparatory change for PREEMPT_RT to do hrtimer sleeper specific initializations of the embedded hrtimer without modifying any of the call sites. No functional change. [ anna-maria: Minor cleanups ] [ tglx: Adopted to the removal of the task argument of hrtimer_init_sleeper() and trivial polishing. Folded a fix from Stephen Rothwell for the vsoc code ] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-07-30	hrtimer: Remove task argument from hrtimer_init_sleeper()	Thomas Gleixner	1	-2/+1
	All callers hand in 'current' and that's the only task pointer which actually makes sense. Remove the task argument and set current in the function. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22	hrtimer: Split out hrtimer defines into separate header	Vincenzo Frascino	1	-15/+1
	To avoid include dependency hell split out the hrtimer defines which are required in the upcoming VDSO library into a separate header file. [ tglx: Split out from the VDSO library patch and included ktime.h as the new header depends on it. ] Signed-off-by: Vincenzo Frascino <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Shijith Thotton <[email protected]> Tested-by: Andre Przywara <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Russell King <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paul Burton <[email protected]> Cc: Daniel Lezcano <[email protected]> Cc: Mark Salyzyn <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Huw Davies <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-11-23	hrtimers/tick/clockevents: Remove sloppy license references	Thomas Gleixner	1	-2/+0
	"For licencing details see kernel-base/COPYING" and similar license references have no value over the SPDX identifier. Remove them. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Ingo Molnar <[email protected]> Acked-by: John Stultz <[email protected]> Acked-by: Corey Minyard <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Kate Stewart <[email protected]> Cc: Philippe Ombredanne <[email protected]> Cc: Peter Anvin <[email protected]> Cc: Russell King <[email protected]> Cc: Richard Cochran <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: David Riley <[email protected]> Cc: Colin Cross <[email protected]> Cc: Mark Brown <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-11-23	time: Add SPDX license identifiers	Thomas Gleixner	1	-0/+1
	Update the time(r) core files with the correct SPDX license identifier based on the license text in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This work is based on a script and data from Philippe Ombredanne, Kate Stewart and myself. The data has been created with two independent license scanners and manual inspection. The following files do not contain any direct license information and have been omitted from the big initial SPDX changes: timeconst.bc: The .bc files were not touched time.c, timer.c, timekeeping.c: Licence was deduced from EXPORT_SYMBOL_GPL As those files do not contain direct license references they fall under the project license, i.e. GPL V2 only. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Ingo Molnar <[email protected]> Acked-by: John Stultz <[email protected]> Acked-by: Corey Minyard <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Kate Stewart <[email protected]> Cc: Philippe Ombredanne <[email protected]> Cc: Russell King <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: David Riley <[email protected]> Cc: Colin Cross <[email protected]> Cc: Mark Brown <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Paul E. McKenney <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-11-23	time: Remove useless filenames in top level comments	Thomas Gleixner	1	-2/+0
	Remove the pointless filenames in the top level comments. They have no value at all and just occupy space. While at it tidy up some of the comments and remove a stale one. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Nicolas Pitre <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Ingo Molnar <[email protected]> Acked-by: John Stultz <[email protected]> Acked-by: Corey Minyard <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Kate Stewart <[email protected]> Cc: Philippe Ombredanne <[email protected]> Cc: Peter Anvin <[email protected]> Cc: Russell King <[email protected]> Cc: Richard Cochran <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: David Riley <[email protected]> Cc: Colin Cross <[email protected]> Cc: Mark Brown <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-04-26	Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME	Thomas Gleixner	1	-0/+2
	Revert commits
	  92af4dcb4e1c ("tracing: Unify the "boot" and "mono" tracing clocks")
	  127bfa5f4342 ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
	  7250a4047aa6 ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
	  d6c7270e913d ("timekeeping: Remove boot time specific code")
	  f2d6fdbfd238 ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
	  d6ed449afdb3 ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
	  72199320d49d ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")
	As stated in the pull request for the unification of CLOCK_MONOTONIC and CLOCK_BOOTTIME, it was clear that we might have to revert the change.
	As reported by several folks systemd and other applications rely on the documented behaviour of CLOCK_MONOTONIC on Linux and break with the above changes. After resume daemons time out and other timeout related issues are observed. Rafael compiled this list:
	* systemd kills daemons on resume, after >WatchdogSec seconds of suspending (Genki Sky). [Verified that that's because systemd uses CLOCK_MONOTONIC and expects it to not include the suspend time.]
	* systemd-journald misbehaves after resume: systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal corrupted or uncleanly shut down, renaming and replacing. (Mike Galbraith).
	* NetworkManager reports "networking disabled" and networking is broken after resume 50% of the time (Pavel). [May be because of systemd.]
	* MATE desktop dims the display and starts the screensaver right after system resume (Pavel).
	* Full system hang during resume (me). [May be due to systemd or NM or both.]
	That happens on Debian and openSUSE systems.
	It's sad that these problems were caught neither in -next nor by those folks who expressed interest in this change.
	Reported-by: Rafael J. Wysocki <[email protected]>
	Reported-by: Genki Sky <[email protected]>
	Reported-by: Pavel Machek <[email protected]>
	Signed-off-by: Thomas Gleixner <[email protected]>
	Cc: Dmitry Torokhov <[email protected]>
	Cc: John Stultz <[email protected]>
	Cc: Jonathan Corbet <[email protected]>
	Cc: Kevin Easton <[email protected]>
	Cc: Linus Torvalds <[email protected]>
	Cc: Mark Salyzyn <[email protected]>
	Cc: Michael Kerrisk <[email protected]>
	Cc: Peter Zijlstra <[email protected]>
	Cc: Petr Mladek <[email protected]>
	Cc: Prarit Bhargava <[email protected]>
	Cc: Sergey Senozhatsky <[email protected]>
	Cc: Steven Rostedt <[email protected]>
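The behaviour that userspace relies on here, CLOCK_MONOTONIC not advancing across suspend while CLOCK_BOOTTIME does, can be observed with a small Linux-only sketch. `demo_clock_ns` and `demo_boot_minus_mono_ns` are hypothetical helper names; `CLOCK_BOOTTIME` is a Linux-specific clock ID. On a machine that has never been suspended since boot, the difference is expected to be close to zero.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a clock and return its value in nanoseconds. */
static int64_t demo_clock_ns(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    /* Both clocks used below exist on Linux, so treat failure as a bug. */
    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Rough estimate of time spent suspended since boot: BOOTTIME counts
 * suspend, MONOTONIC does not. The two reads are not atomic, so the
 * result also contains the small delay between them. */
static int64_t demo_boot_minus_mono_ns(void)
{
    int64_t mono = demo_clock_ns(CLOCK_MONOTONIC);
    int64_t boot = demo_clock_ns(CLOCK_BOOTTIME);

    return boot - mono;
}
```

A watchdog that measures its timeout with CLOCK_MONOTONIC therefore does not see the suspended interval, which is the expectation the reverted changes broke.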
2018-04-11	Merge branches 'pm-cpuidle' and 'pm-qos'	Rafael J. Wysocki	1	-0/+1
	* pm-cpuidle:
	  tick-sched: avoid a maybe-uninitialized warning
	  cpuidle: Add definition of residency to sysfs documentation
	  time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
	  nohz: Avoid duplication of code related to got_idle_tick
	  nohz: Gather tick_sched booleans under a common flag field
	  cpuidle: menu: Avoid selecting shallow states with stopped tick
	  cpuidle: menu: Refine idle state selection for running tick
	  sched: idle: Select idle state before stopping the tick
	  time: hrtimer: Introduce hrtimer_next_event_without()
	  time: tick-sched: Split tick_nohz_stop_sched_tick()
	  cpuidle: Return nohz hint from cpuidle_select()
	  jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
	  sched: idle: Do not stop the tick before cpuidle_idle_call()
	  sched: idle: Do not stop the tick upfront in the idle loop
	  time: tick-sched: Reorganize idle tick management code
	* pm-qos:
	  PM / QoS: mark expected switch fall-throughs
2018-04-07	time: hrtimer: Introduce hrtimer_next_event_without()	Rafael J. Wysocki	1	-0/+1
	The next set of changes will need to compute the time to the next hrtimer event over all hrtimers except for the scheduler tick one. To that end introduce a new helper function, hrtimer_next_event_without(), for computing the time until the next hrtimer event over all timers except for one and modify the underlying code in __hrtimer_next_event_base() to prepare it for being called by that new function. No intentional changes in functionality. Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]>
2018-03-13	hrtimer: Unify MONOTONIC and BOOTTIME clock behavior	Thomas Gleixner	1	-2/+0
	Now that the MONOTONIC and BOOTTIME clocks are identical remove all the special casing. The user space visible interfaces still support both clocks, but their behavior is identical. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: John Stultz <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kevin Easton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Salyzyn <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Implement support for softirq based hrtimers	Anna-Maria Gleixner	1	-5/+16
	hrtimer callbacks are always invoked in hard interrupt context. Several users in tree require soft interrupt context for their callbacks and achieve this by combining a hrtimer with a tasklet. The hrtimer schedules the tasklet in hard interrupt context and the tasklet callback gets invoked in softirq context later. That's suboptimal and aside of that the real-time patch moves most of the hrtimers into softirq context. So adding native support for hrtimers expiring in softirq context is a valuable extension for both mainline and the RT patch set. Each valid hrtimer clock id has two associated hrtimer clock bases: one for timers expiring in hardirq context and one for timers expiring in softirq context. Implement the functionality to associate a hrtimer with the hard or softirq related clock bases and update the relevant functions to take them into account when the next expiry time needs to be evaluated. Add a check into the hard interrupt context handler functions to check whether the first expiring softirq based timer has expired. If it's expired the softirq is raised and the accounting of softirq based timers to evaluate the next expiry time for programming the timer hardware is skipped until the softirq processing has finished. At the end of the softirq processing the regular processing is resumed. Suggested-by: Thomas Gleixner <[email protected]> Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
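The statement that each clock ID has one hard and one soft clock base can be pictured as an array whose second half mirrors the first. The sketch below is illustrative only: the `demo_*` names are hypothetical, and the ordering of the bases (all hardirq bases first, then the softirq bases in the same order) is an assumption made for the example rather than something this log states.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout: hardirq bases first, softirq bases second,
 * both halves in the same clock order. */
enum demo_base_type {
    DEMO_BASE_MONOTONIC,
    DEMO_BASE_REALTIME,
    DEMO_BASE_BOOTTIME,
    DEMO_BASE_TAI,
    DEMO_BASE_MONOTONIC_SOFT,
    DEMO_BASE_REALTIME_SOFT,
    DEMO_BASE_BOOTTIME_SOFT,
    DEMO_BASE_TAI_SOFT,
    DEMO_MAX_CLOCK_BASES
};

/* Pick the clock base for a timer: the clock selects the slot within
 * a half, the expiry context selects which half. */
static int demo_base_index(enum demo_base_type hard_base, bool soft)
{
    assert(hard_base < DEMO_MAX_CLOCK_BASES / 2);
    int offset = soft ? DEMO_MAX_CLOCK_BASES / 2 : 0;

    return offset + (int)hard_base;
}
```

With such a layout, code that scans for the next expiry can restrict itself to one half of the array depending on whether softirq processing is currently pending.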
2018-01-16	hrtimer: Add clock bases and hrtimer mode for softirq context	Anna-Maria Gleixner	1	-0/+14
	Currently hrtimer callback functions are always executed in hard interrupt context. Users of hrtimers, which need their timer function to be executed in soft interrupt context, make use of tasklets to get the proper context. Add additional hrtimer clock bases for timers which must expire in softirq context, so the detour via the tasklet can be avoided. This is also required for RT, where the majority of hrtimer is moved into softirq hrtimer context. The selection of the expiry mode happens via a mode bit. Introduce HRTIMER_MODE_SOFT and the matching combinations with the ABS/REL/PINNED bits and update the decoding of hrtimer_mode in tracepoints. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Make hrtimer_reprogramm() unconditional	Anna-Maria Gleixner	1	-3/+3
	hrtimer_reprogram() needs to be available unconditionally for softirq based hrtimers. Move the function and all required struct members out of the CONFIG_HIGH_RES_TIMERS #ifdef. There is no functional change because hrtimer_reprogram() is only invoked when hrtimer_cpu_base.hres_active is true. Making it unconditional increases the text size for the CONFIG_HIGH_RES_TIMERS=n case, but avoids replication of that code for the upcoming softirq based hrtimers support. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional	Anna-Maria Gleixner	1	-2/+2
	hrtimer_cpu_base.next_timer stores the pointer to the next expiring timer in a CPU base. This pointer cannot be dereferenced and is solely used to check whether a hrtimer which is removed is the hrtimer which is the first to expire in the CPU base. If this is the case, then the timer hardware needs to be reprogrammed to avoid an extra interrupt for nothing. Again, this is conditional functionality, but there is no compelling reason to make this conditional. As a preparation, hrtimer_cpu_base.next_timer needs to be available unconditionally. Aside of that the upcoming support for softirq based hrtimers requires access to this pointer unconditionally as well, so our motivation is not entirely simplicity based. Make the update of hrtimer_cpu_base.next_timer unconditional and remove the #ifdef cruft. The impact on CONFIG_HIGH_RES_TIMERS=n && CONFIG_NOHZ=n is marginal as it's just a store on an already dirtied cacheline. No functional change. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Make the remote enqueue check unconditional	Anna-Maria Gleixner	1	-3/+3
	hrtimer_cpu_base.expires_next is used to cache the next event armed in the timer hardware. The value is used to check whether an hrtimer can be enqueued remotely. If the new hrtimer is expiring before expires_next, then remote enqueue is not possible as the remote hrtimer hardware cannot be accessed for reprogramming to an earlier expiry time. The remote enqueue check is currently conditional on CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active. There is no compelling reason to make this conditional. Move hrtimer_cpu_base.expires_next out of the CONFIG_HIGH_RES_TIMERS=y guarded area and remove the conditionals in hrtimer_check_target(). The check is currently a NOOP for the CONFIG_HIGH_RES_TIMERS=n and the !hrtimer_cpu_base.hres_active case because in these cases nothing updates hrtimer_cpu_base.expires_next yet. This will be changed with later patches which further reduce the #ifdef zoo in this code. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Make the hrtimer_cpu_base::hres_active field unconditional, to simplify the code	Anna-Maria Gleixner	1	-12/+8
	The hrtimer_cpu_base::hres_active_member field depends on CONFIG_HIGH_RES_TIMERS=y currently, and all related functions to this member are conditional as well. To simplify the code make it unconditional and set it to zero during initialization. (This will also help with the upcoming softirq based hrtimers code.) The conditional code sections can be avoided by adding IS_ENABLED(HIGHRES) conditionals into common functions, which ensures dead code elimination. There is no functional change. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Make room in 'struct hrtimer_cpu_base'	Anna-Maria Gleixner	1	-2/+2
	The upcoming softirq based hrtimers support requires an additional field in the hrtimer_cpu_base struct, which would grow the struct size beyond a cache line. The hrtimer_cpu_base::nr_retries and ::nr_hangs members are solely used for diagnostic output and have no requirement to be 'unsigned int'. Make them 'unsigned short' to create room for the new struct member. No functional change. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
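The space trade described here can be checked with a toy layout. The structs below are hypothetical and contain only the counters, using fixed-width types so the sizes are predictable on common ABIs; `new_member` is a placeholder for whatever field needs the room, and the neighbouring `nr_events` and `max_hang_time` members are included as assumed context rather than taken from this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Diagnostic counters as full-width integers (toy version). */
struct demo_stats_before {
    uint32_t nr_events;
    uint32_t nr_retries;
    uint32_t nr_hangs;
    uint32_t max_hang_time;
};

/* Same counters with the two diagnostic-only members narrowed,
 * plus a placeholder for the newly needed member. */
struct demo_stats_after {
    uint32_t nr_events;
    uint16_t nr_retries;
    uint16_t nr_hangs;
    uint32_t max_hang_time;
    uint32_t new_member;    /* hypothetical: fits in the freed 4 bytes */
};

/* The extra member costs nothing: both layouts have the same size. */
static_assert(sizeof(struct demo_stats_after) ==
              sizeof(struct demo_stats_before),
              "narrowing two counters frees room for one 32-bit member");
```

The price is a smaller range for the two narrowed counters, which the commit message argues is acceptable because they are used only for diagnostic output.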
2018-01-16	hrtimer: Store running timer in hrtimer_clock_base	Anna-Maria Gleixner	1	-11/+9
	The pointer to the currently running timer is stored in hrtimer_cpu_base before the base lock is dropped and the callback is invoked. This results in two levels of indirections and the upcoming support for softirq based hrtimer requires splitting the "running" storage into soft and hard IRQ context expiry. Storing both in the cpu base would require conditionals in all code paths accessing that information. It's possible to have a per clock base sequence count and running pointer without changing the semantics of the related mechanisms because the timer base pointer cannot be changed while a timer is running the callback. Unfortunately this makes cpu_clock base larger than 32 bytes on 32-bit kernels. Instead of having huge gaps due to alignment, remove the alignment and let the compiler pack CPU base for 32-bit kernels. The resulting cache access patterns are fortunately not really different from the current behaviour. On 64-bit kernels the 64-byte alignment stays and the behaviour is unchanged. This was determined by analyzing the resulting layout and looking at the number of cache lines involved for the frequently used clocks. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Clean up 'enum hrtimer_mode'	Anna-Maria Gleixner	1	-5/+11
	It's not obvious that the HRTIMER_MODE variants are bit combinations, because all modes are hard coded constants currently. Change it so the bit meanings are clear; and use the symbols for creating modes which combine bits. While at it get rid of the ugly tail comments as well. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
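A minimal sketch of the idea described above: single-bit flags are defined once and the combined modes are built from those symbols instead of hard-coded numbers. All `DEMO_`/`demo_` names and the specific bit values are assumptions chosen for illustration; consult the header itself for the real `enum hrtimer_mode`.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative mode bits (DEMO_ prefix: this is a sketch, not the kernel enum). */
enum demo_mode {
	DEMO_MODE_ABS    = 0x00,	/* absolute expiry: no bit set */
	DEMO_MODE_REL    = 0x01,	/* expiry is relative to "now" */
	DEMO_MODE_PINNED = 0x02,	/* timer is bound to the current CPU */

	/* Combined modes are spelled with the symbols, not with literals. */
	DEMO_MODE_ABS_PINNED = DEMO_MODE_ABS | DEMO_MODE_PINNED,
	DEMO_MODE_REL_PINNED = DEMO_MODE_REL | DEMO_MODE_PINNED,
};

/* True if the relative bit is set in the mode. */
static bool demo_mode_is_relative(enum demo_mode mode)
{
	return (mode & DEMO_MODE_REL) != 0;
}

/* True if the pinned bit is set in the mode. */
static bool demo_mode_is_pinned(enum demo_mode mode)
{
	return (mode & DEMO_MODE_PINNED) != 0;
}

/* Clears the pinned bit and keeps the ABS/REL part of the mode. */
static enum demo_mode demo_mode_strip_pinned(enum demo_mode mode)
{
	/* Only the two known bits may be set. */
	assert((mode & ~(DEMO_MODE_REL | DEMO_MODE_PINNED)) == 0);
	return (enum demo_mode)(mode & ~DEMO_MODE_PINNED);
}
```

Because each property is one bit, a caller can test or clear it with a mask instead of comparing against every combined constant.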
2018-01-16	hrtimer: Fix hrtimer_start[_range_ns]() function descriptions	Anna-Maria Gleixner	1	-3/+3
	The hrtimer_start[_range_ns]() functions start a timer reliably on this CPU only when HRTIMER_MODE_PINNED is set. Furthermore the HRTIMER_MODE_PINNED mode is not considered when a hrtimer is initialized. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Clean up the 'int clock' parameter of schedule_hrtimeout_range_clock()	Anna-Maria Gleixner	1	-1/+1
	schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the clock ID, instead of the customary predefined "clockid_t" type. In hrtimer coding style the canonical variable name for the clock ID is 'clock_id', therefore change the name of the parameter here as well to make it all consistent. While at it, clean up the description for the 'clock_id' and 'mode' function parameters. The clock modes and the clock IDs are not restricted as the comment suggests. Fix the mode description as well for the callers of schedule_hrtimeout_range_clock(). No functional changes intended. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Fix kerneldoc syntax for 'struct hrtimer_cpu_base'	Anna-Maria Gleixner	1	-4/+4
	The '/**' sequence marks the start of a structure description. Add the missing second asterisk. While at it adapt the ordering of the struct members to the struct definition and document the purpose of expires_next more precisely. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16	hrtimer: Optimize the hrtimer code by using static keys for migration_enable/nohz_active	Thomas Gleixner	1	-4/+0
	The hrtimer_cpu_base::migration_enable and ::nohz_active fields were originally introduced to avoid accessing global variables for these decisions. Still that results in a (cache hot) load and conditional branch, which can be avoided by using static keys. Implement it with static keys and optimize for the most critical case of high performance networking which tends to disable the timer migration functionality. No change in functionality. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-30	nanosleep: Use get_timespec64() and put_timespec64()	Deepa Dinamani	1	-1/+1
	Usage of these apis and their compat versions makes the syscalls: clock_nanosleep and nanosleep and their compat implementations simpler. This is a preparatory patch to isolate data conversions to struct timespec64 at userspace boundaries. This helps contain the changes needed to transition to new y2038 safe types. Signed-off-by: Deepa Dinamani <[email protected]> Signed-off-by: Al Viro <[email protected]>
2017-06-14	posix-timers: Make nanosleep timespec argument const	Thomas Gleixner	1	-1/+1
	No nanosleep implementation modifies the rqtp argument. Mark it const. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Al Viro <[email protected]> Cc: John Stultz <[email protected]> Cc: Peter Zijlstra <[email protected]>
2017-06-14	posix-timers: Kill ->nsleep_restart()	Al Viro	1	-1/+0
	No more users. Signed-off-by: Al Viro <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: John Stultz <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-14	hrtimers/posix-timers: Merge nanosleep timespec copyout logics into a new helper	Al Viro	1	-0/+2
	Signed-off-by: Al Viro <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: John Stultz <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-14	hrtimer_nanosleep(): Pass rmtp in restart_block	Al Viro	1	-1/+0
	Store the pointer to the timespec which gets updated with the remaining time in the restart block and remove the function argument. [ tglx: Added changelog ] Signed-off-by: Al Viro <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: John Stultz <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-04-14	time: Change k_clock nsleep() to use timespec64	Deepa Dinamani	1	-1/+1
	struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. The syscall interfaces themselves will be changed in a separate series. Note that the restart_block parameter for nanosleep has also been left unchanged and will be part of syscall series noted above. Signed-off-by: Deepa Dinamani <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-03-17	hrtimer: Remove hrtimer_peek_ahead_timers() leftovers	Stephen Boyd	1	-4/+0
	This function was removed in commit c6eb3f70d448 (hrtimer: Get rid of hrtimer softirq, 2015-04-14) but the prototype wasn't ever deleted. Delete it now. Signed-off-by: Stephen Boyd <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-03-03	sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>	Ingo Molnar	1	-1/+0
	In our quest to simplify <linux/sched.h>'s header dependencies, remove the <linux/wait.h> inclusion from <linux/hrtimer.h> - which does not appear to be necessary, as hrtimer.h does not use waitqueues. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2017-02-10	time: Remove CONFIG_TIMER_STATS	Kees Cook	1	-11/+0
	Currently CONFIG_TIMER_STATS exposes process information across namespaces: kernel/time/timer_list.c print_timer(): SEQ_printf(m, ", %s/%d", tmp, timer->start_pid); /proc/timer_list: #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570 Given that the tracer can give the same information, this patch entirely removes CONFIG_TIMER_STATS. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Kees Cook <[email protected]> Acked-by: John Stultz <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: [email protected] Cc: Lai Jiangshan <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Xing Gao <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Jessica Frazelle <[email protected]> Cc: [email protected] Cc: Nicolas Iooss <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Michal Marek <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Olof Johansson <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Cc: Arjan van de Ven <[email protected]> Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast Signed-off-by: Thomas Gleixner <[email protected]>
2016-12-25	ktime: Get rid of the union	Thomas Gleixner	1	-6/+6
	ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64-bit machines and in an endianness-optimized timespec variant on 32-bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but became completely pointless. Get rid of the union and just keep ktime_t as a simple typedef of type s64. The conversion was done with coccinelle and some manual mopping up. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]>
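To make the "simple typedef of type s64" concrete, here is a hedged userspace sketch of a scalar-nanosecond time type. The names `demo_ktime_t`, `demo_ktime_set()`, `demo_ktime_add()` and `demo_ktime_before()` are hypothetical stand-ins, not the kernel's helpers; the point is only that once the value is a plain signed 64-bit integer, construction, addition and comparison become ordinary integer arithmetic with no union member access.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Scalar nanoseconds in a signed 64-bit integer (sketch of the post-union shape). */
typedef int64_t demo_ktime_t;

/* Build a time value from seconds and nanoseconds. */
static demo_ktime_t demo_ktime_set(int64_t secs, long nsecs)
{
	assert(nsecs >= 0 && nsecs < DEMO_NSEC_PER_SEC);
	return secs * DEMO_NSEC_PER_SEC + nsecs;
}

/* Adding two time values is plain integer addition. */
static demo_ktime_t demo_ktime_add(demo_ktime_t a, demo_ktime_t b)
{
	return a + b;
}

/* Returns nonzero if a is earlier than b. */
static int demo_ktime_before(demo_ktime_t a, demo_ktime_t b)
{
	return a < b;
}
```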
2016-07-15	hrtimer: Convert to hotplug state machine	Thomas Gleixner	1	-0/+7
	Split out the clockevents callbacks instead of piggybacking them on hrtimers. This gets rid of a POST_DEAD user. See commit: 54e88fad223c ("sched: Make sure timers have migrated before killing the migration_thread") We just move the callback state to the proper place in the state machine. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Rusty Russell <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-17	timer: convert timer_slack_ns from unsigned long to u64	John Stultz	1	-5/+7
	This patchset introduces a /proc/<pid>/timerslack_ns interface which would allow controlling processes to be able to set the timerslack value on other processes in order to save power by avoiding wakeups (something Android currently does via out-of-tree patches). The first patch tries to fix the internal timer_slack_ns usage which was defined as a long, which limits the slack range to ~4 seconds on 32bit systems. It converts it to a u64, which provides the same basically unlimited slack (500 years) on both 32bit and 64bit machines. The second patch introduces the /proc/<pid>/timerslack_ns interface which allows the full 64bit slack range for a task to be read or set on both 32bit and 64bit machines. With these two patches, on a 32bit machine, after setting the slack on bash to 10 seconds: $ time sleep 1 real 0m10.747s user 0m0.001s sys 0m0.005s The first patch is a little ugly, since I had to chase the slack delta arguments through a number of functions converting them to u64s. Let me know if it makes sense to break that up more or not. Other than that things are fairly straightforward. This patch (of 2): The timer_slack_ns value in the task struct is currently an unsigned long. This means that on 32bit applications, the maximum slack is just over 4 seconds. However, on 64bit machines, it's much, much larger (~500 years). This disparity could make application development a little confusing, so this patch converts the timer_slack_ns value (as well as the default_slack) to a u64. This means both 32bit and 64bit systems have the same effective internal slack range. Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specifies the interface as an unsigned long, so we preserve that limitation on 32bit systems, where SET_TIMERSLACK can only set the slack to an unsigned long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is actually larger than what can be stored by an unsigned long. This patch also modifies hrtimer functions which specified the slack delta as an unsigned long. Signed-off-by: John Stultz <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Oren Laadan <[email protected]> Cc: Ruchi Kandoi <[email protected]> Cc: Rom Lemarchand <[email protected]> Cc: Kees Cook <[email protected]> Cc: Android Kernel Team <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
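The getter behaviour described in the message above (return the maximum representable value when the stored 64-bit slack does not fit) can be sketched as follows. This is an illustration under stated assumptions, not the kernel's prctl code: `demo_ulong32_t` is a hypothetical stand-in for `unsigned long` on a 32-bit system, and `demo_get_timerslack32()` is an invented name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for "unsigned long" on a 32-bit system (assumption of this sketch). */
typedef uint32_t demo_ulong32_t;

/*
 * The slack is stored internally as 64-bit nanoseconds, but the legacy
 * getter can only report a 32-bit value, so larger slacks are clamped.
 */
static demo_ulong32_t demo_get_timerslack32(uint64_t slack_ns)
{
	if (slack_ns > UINT32_MAX)
		return UINT32_MAX;	/* plays the role of ULONG_MAX */
	return (demo_ulong32_t)slack_ns;
}

/* Largest whole number of seconds a 32-bit nanosecond value can hold. */
static unsigned int demo_max_slack_seconds32(void)
{
	return (unsigned int)(UINT32_MAX / 1000000000u);
}
```

The second helper reflects the "just over 4 seconds" figure from the message: 2^32 nanoseconds is roughly 4.29 seconds, so a 10 second slack can only be reported as the clamped maximum through a 32-bit interface.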
2016-01-17	hrtimer: Handle remaining time proper for TIME_LOW_RES	Thomas Gleixner	1	-3/+31
	If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to prevent short sleeps, but we do not account for that in interfaces which retrieve the remaining time. Helge observed that timerfd can return a remaining time larger than the relative timeout. That's not expected and breaks userland test programs. Store the information that the timer was armed relative and provide functions to adjust the remaining time. To avoid bloating the hrtimer struct make state a u8, which as a bonus results in better code on x86 at least. Reported-and-tested-by: Helge Deller <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: John Stultz <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
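A hedged sketch of the fix described above: remember that the timer was armed relative, and subtract the extra tick again when the remaining time is reported. The struct, the function names and the explicit `resolution_ns` parameter are assumptions made for a self-contained userspace example; the kernel helpers have different names and signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for a timer: expiry time plus an "armed relative" flag. */
struct demo_timer {
	int64_t expires_ns;
	bool is_rel;
};

/*
 * Arm a relative timer on a low-resolution clock: one tick (resolution_ns)
 * is added so the sleep can never be shorter than requested.
 */
static void demo_start_relative(struct demo_timer *t, int64_t now_ns,
				int64_t delay_ns, int64_t resolution_ns)
{
	assert(delay_ns >= 0 && resolution_ns >= 0);
	t->expires_ns = now_ns + delay_ns + resolution_ns;
	t->is_rel = true;
}

/*
 * Report the remaining time. For relative timers the extra tick is taken
 * out again, so the result never exceeds the timeout the caller asked for.
 */
static int64_t demo_remaining_adjusted(const struct demo_timer *t,
				       int64_t now_ns, int64_t resolution_ns)
{
	int64_t rem = t->expires_ns - now_ns;

	if (t->is_rel)
		rem -= resolution_ns;
	return rem;
}
```

For example, with a 4 ms tick and a 1 ms relative timeout, the unadjusted remaining time right after arming would be 5 ms, which is the kind of surprising value the message reports for timerfd; the adjusted value is 1 ms.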
2015-06-19	timer: Minimize nohz off overhead	Thomas Gleixner	1	-0/+2
	If nohz is disabled on the kernel command line the [hr]timer code still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty pointless exercise. Cache nohz_active in [hr]timer per cpu bases and avoid the overhead. Before: 48.10% hog [.] main 15.25% [kernel] [k] _raw_spin_lock_irqsave 9.76% [kernel] [k] _raw_spin_unlock_irqrestore 6.50% [kernel] [k] mod_timer 6.44% [kernel] [k] lock_timer_base.isra.38 3.87% [kernel] [k] detach_if_pending 3.80% [kernel] [k] del_timer 2.67% [kernel] [k] internal_add_timer 1.33% [kernel] [k] __internal_add_timer 0.73% [kernel] [k] timerfn 0.54% [kernel] [k] wake_up_nohz_cpu After: 48.73% hog [.] main 15.36% [kernel] [k] _raw_spin_lock_irqsave 9.77% [kernel] [k] _raw_spin_unlock_irqrestore 6.61% [kernel] [k] lock_timer_base.isra.38 6.42% [kernel] [k] mod_timer 3.90% [kernel] [k] detach_if_pending 3.76% [kernel] [k] del_timer 2.41% [kernel] [k] internal_add_timer 1.39% [kernel] [k] __internal_add_timer 0.76% [kernel] [k] timerfn We probably should have a cached value for nohz full in the per cpu bases as well to avoid the cpumask check. The base cache line is hot already, the cpumask not necessarily. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: John Stultz <[email protected]> Cc: Joonwoo Park <[email protected]> Cc: Wenbo Wang <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-19	timer: Reduce timer migration overhead if disabled	Thomas Gleixner	1	-0/+2
	Eric reported that the timer_migration sysctl is not really nice performance wise as it needs to check at every timer insertion whether the feature is enabled or not. Further the check does not live in the timer code, so we have an extra function call which checks an extra cache line to figure out that it is disabled. We can do better and store that information in the per cpu (hr)timer bases. I pondered to use a static key, but that's a nightmare to update from the nohz code and the timer base cache line is hot anyway when we select a timer base. The old logic enabled the timer migration unconditionally if CONFIG_NO_HZ was set even if nohz was disabled on the kernel command line. With this modification, we start off with migration disabled. The user visible sysctl is still set to enabled. If the kernel switches to NOHZ migration is enabled, if the user did not disable it via the sysctl prior to the switch. If nohz=off is on the kernel command line, migration stays disabled no matter what. Before: 47.76% hog [.] main 14.84% [kernel] [k] _raw_spin_lock_irqsave 9.55% [kernel] [k] _raw_spin_unlock_irqrestore 6.71% [kernel] [k] mod_timer 6.24% [kernel] [k] lock_timer_base.isra.38 3.76% [kernel] [k] detach_if_pending 3.71% [kernel] [k] del_timer 2.50% [kernel] [k] internal_add_timer 1.51% [kernel] [k] get_nohz_timer_target 1.28% [kernel] [k] __internal_add_timer 0.78% [kernel] [k] timerfn 0.48% [kernel] [k] wake_up_nohz_cpu After: 48.10% hog [.] main 15.25% [kernel] [k] _raw_spin_lock_irqsave 9.76% [kernel] [k] _raw_spin_unlock_irqrestore 6.50% [kernel] [k] mod_timer 6.44% [kernel] [k] lock_timer_base.isra.38 3.87% [kernel] [k] detach_if_pending 3.80% [kernel] [k] del_timer 2.67% [kernel] [k] internal_add_timer 1.33% [kernel] [k] __internal_add_timer 0.73% [kernel] [k] timerfn 0.54% [kernel] [k] wake_up_nohz_cpu Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: John Stultz <[email protected]> Cc: Joonwoo Park <[email protected]> Cc: Wenbo Wang <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-19	hrtimer: Allow hrtimer::function() to free the timer	Peter Zijlstra	1	-25/+16
	Currently an hrtimer callback function cannot free its own timer because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK after it. Freeing the timer would result in a clear use-after-free. Solve this by using a scheme similar to regular timers; track the current running timer in hrtimer_clock_base::running. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Al Viro <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul McKenney <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
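The scheme described above can be sketched in userspace as follows. This is a simplified, single-threaded illustration, not the kernel implementation: `demo_base`, `demo_timer`, `demo_run_timer()` and `demo_free_self()` are hypothetical names, and the locking and sequence counting that a real implementation needs are left out. The key point is that the dispatcher records the running timer in the base and never dereferences the timer after the callback returns, so the callback is free to release it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_timer;
typedef void (*demo_fn_t)(struct demo_timer *);

/* A timer only carries its callback in this sketch. */
struct demo_timer {
	demo_fn_t function;
};

/* The base tracks which timer's callback is currently executing. */
struct demo_base {
	struct demo_timer *running;
	int callbacks_run;
};

/* Run one expired timer without touching it after the callback. */
static void demo_run_timer(struct demo_base *base, struct demo_timer *timer)
{
	demo_fn_t fn;

	assert(base->running == NULL);
	fn = timer->function;	/* read everything needed before the call */
	base->running = timer;	/* "running" state lives in the base */

	fn(timer);		/* the callback may free the timer */

	/* From here on *timer must not be dereferenced. */
	base->running = NULL;
	base->callbacks_run++;
}

/* Example callback that frees its own timer. */
static void demo_free_self(struct demo_timer *timer)
{
	free(timer);
}
```

Keeping the "callback in progress" information in the base rather than in a state bit inside the timer is what removes the write to the timer after the callback.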
2015-06-19	hrtimer: Remove HRTIMER_STATE_MIGRATE	Oleg Nesterov	1	-5/+1
	I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally confused it looks buggy and simply unneeded. migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this is not enough: this can fool, say, hrtimer_is_queued() in dequeue_signal(). Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED? This fixes the race and we can kill STATE_MIGRATE. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>