Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"In this pile: pathname resolution rewrite.
- recursion in link_path_walk() is gone.
- nesting limits on symlinks are gone (the only limit remaining is
that the total amount of symlinks is no more than 40, no matter how
nested).
- "fast" (inline) symlinks are handled without leaving rcuwalk mode.
- stack footprint (independent of the nesting) is below kilobyte now,
about on par with what it used to be with one level of nested
symlinks and ~2.8 times lower than it used to be in the worst case.
- struct nameidata is entirely private to fs/namei.c now (not even
opaque pointers are being passed around).
- ->follow_link() and ->put_link() calling conventions had been
changed; all in-tree filesystems converted, out-of-tree should be
able to follow reasonably easily.
For out-of-tree conversions, see Documentation/filesystems/porting
for details (and in-tree filesystems for examples of conversion).
That has sat in -next since mid-May, seems to survive all testing
without regressions and merges clean with v4.1"
* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits)
turn user_{path_at,path,lpath,path_dir}() into static inlines
namei: move saved_nd pointer into struct nameidata
inline user_path_create()
inline user_path_parent()
namei: trim do_last() arguments
namei: stash dfd and name into nameidata
namei: fold path_cleanup() into terminate_walk()
namei: saner calling conventions for filename_parentat()
namei: saner calling conventions for filename_create()
namei: shift nameidata down into filename_parentat()
namei: make filename_lookup() reject ERR_PTR() passed as name
namei: shift nameidata inside filename_lookup()
namei: move putname() call into filename_lookup()
namei: pass the struct path to store the result down into path_lookupat()
namei: uninline set_root{,_rcu}()
namei: be careful with mountpoint crossings in follow_dotdot_rcu()
Documentation: remove outdated information from automount-support.txt
get rid of assorted nameidata-related debris
lustre: kill unused helper
lustre: kill unused macro (LOOKUP_CONTINUE)
...
|
|
Currently if we build a single target like:
$ touch util/map.c && make util/map.o
It will not rebuild util/map.o if it already exists and util/map.c is
modified.
The reason is that the top-level 'Makefile' processes util/map.o as an
implicit rule and if util/map.o exists make considers the 'util/map.o'
target as done and will not nest into Makefile.perf.
Adding FORCE for '%', because that's what we want to nest into
Makefile.perf for any target.
Adding Makefile into phony targets, because make tries to rebuild it and
it's also resolved as '%' target.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Lukas Wunner <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Lukas Wunner reported issue (and fix[1]) with 'make install prefix=...'.
Adding automated test for this, so it wouldn't happen again.
[1]: 75e84ab906ef ("perf tools: Fix build breakage if prefix= is specified")
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Lukas Wunner <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently we test only builds through top level Makefile, but seems like
there's a bunch of users using Makefile.perf directly.
Changing the make suite to be run for Makefile.perf as well. It takes
now considerable amount of time, but hopefully we catch more issues.
Also fixing the output indentation for make_kernelsrc and
make_kernelsrc_tools tests.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Lukas Wunner <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Current 'f' key action to enable/disable events won't work if there're
more than one event since perf_evsel_menu__run() doesn't return the key.
So move it to the hists browser loop so that it can be processed as like
other key action, and it's more natural to handle it there IMHO.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
x86/core, to merge last updates
Signed-off-by: Ingo Molnar <[email protected]>
|
|
|
|
This driver leaks out into arch/parisc builds that don't have
CONFIG_GENERIC_CLOCKEVENTS, leading to the following (truncated)
wreckage:
CC drivers/clocksource/timer-stm32.o
drivers/clocksource/timer-stm32.c:38:28: error: field 'evtdev' has incomplete type
drivers/clocksource/timer-stm32.c:44:19: warning: 'enum clock_event_mode' declared inside parameter list
drivers/clocksource/timer-stm32.c:44:19: warning: its scope is only this definition or declaration, which is probably not what you want
drivers/clocksource/timer-stm32.c:43:62: error: parameter 1 ('mode') has incomplete type
drivers/clocksource/timer-stm32.c:43:13: error: function declaration isn't a prototype
drivers/clocksource/timer-stm32.c: In function 'stm32_clock_event_set_mode':
drivers/clocksource/timer-stm32.c:47:3: error: type defaults to 'int' in declaration of '__mptr'
drivers/clocksource/timer-stm32.c:47:3: warning: initialization from incompatible pointer type
drivers/clocksource/timer-stm32.c:51:7: error: 'CLOCK_EVT_MODE_PERIODIC' undeclared (first use in this function)
drivers/clocksource/timer-stm32.c:51:7: note: each undeclared identifier is reported only once for each function it appears in
drivers/clocksource/timer-stm32.c:56:7: error: 'CLOCK_EVT_MODE_ONESHOT' undeclared (first use in this function)
Tighten up the dependencies to limit where it gets built by copying
the style of the Kconfig line for CLKSRC_EFM32 a few lines above.
Signed-off-by: Paul Gortmaker <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Chanwoo Choi <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
hpet_assign_irq() is called with hpet_device->num as "hardware
interrupt number", but hpet_device->num is initialized after the
interrupt has been assigned, so it's always 0. As a consequence only
the first MSI allocation succeeds, the following ones fail because the
"hardware interrupt number" already exists.
Move the initialization of dev->num and other fields before the call
to hpet_assign_irq(), which is the ordering before the offending
commit which introduced that regression.
Fixes: "3cb96f0c9733 x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains"
Reported-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1506211635010.4107@nanos
Cc: Jiang Liu <[email protected]>
Cc: Borislav Petkov <[email protected]>
|
|
Pull scsi target fixes from Nicholas Bellinger:
"Apologies for the late pull request.
Here are the outstanding target-pending fixes for v4.1 code.
The series contains three patches from Sagi + Co that address a few
iser-target issues that have been uncovered during recent testing at
Mellanox.
Patch #1 has a v3.16+ stable tag, and #2-3 have v3.10+ stable tags"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iser-target: Fix possible use-after-free
iser-target: release stale iser connections
iser-target: Fix variable-length response error completion
|
|
Pull drm fixes from Dave Airlie:
"A smattering of fixes,
mgag200:
don't accept modes that aren't aligned properly as hw can't do it
i915:
two regression fixes
radeon:
one query to allow userspace fixes
one oops fixer for older hw with new options enabled"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: don't probe MST on hw we don't support it on
drm/radeon: Add RADEON_INFO_VA_UNMAP_WORKING query
drm/mgag200: Reject non-character-cell-aligned mode widths
Revert "drm/i915: Don't skip request retirement if the active list is empty"
drm/i915: Always reset vma->ggtt_view.pages cache on unbinding
|
|
Get the infrastructure patches which are required for x86/apic into core
|
|
If an interrupt is marked with the no balancing flag, we still allow
setting the affinity for such an interrupt from the kernel itself, but
for interrupts which move the affinity from interrupt context via
irq_move_mask_irq() this runs into a check for the no balancing flag,
which in turn ends up with an endless storm of stack dumps because the
move pending flag is not reset.
Allow the move for interrupts which have the no balancing flag set and
clear the move pending bit before checking for interrupts with the per
cpu flag set.
Reported-by: Sergey Senozhatsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiang Liu <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1506201002570.4107@nanos
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
irq == 0 is not a valid irq for a irqdomain MSI allocation, but hpet
code checks only for negative return values.
Reported-by: Sergey Senozhatsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Replace CTRL+z with 'f' as hotkey for enable/disable events (Arnaldo Carvalho de Melo)
- Do not exit when 'f' is pressed in 'report' mode (Arnaldo Carvalho de Melo)
- Tell the user how to unfreeze events after pressing 'f' in 'perf top' (Arnaldo Carvalho de Melo)
- React to unassigned hotkey pressing in 'top/report' (Arnaldo Carvalho de Melo)
- Display total number of samples with --show-total-period in 'annotate' (Martin Liška)
- Add timeout to make procfs mmap processing more robust (Kan Liang)
- Fix sort__sym_cmp to also compare end of symbol (Yannick Brosseau)
Infrastructure changes:
- Ensure thread-stack is flushed (Adrian Hunter)
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fix from Arnaldo Carvalho de Melo:
- Fix build breakage if prefix= is specified, this introduced
a regression for a build idiom used by the Debian "linux-tools"
package, that does:
MAKE_PERF := $(MAKE) prefix=/usr V=1 ARCH=$(KERNEL_ARCH_PERF) ...
Fix it. (Lukas Wunner)
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The time out to limit the individual proc map processing was hard code
to 500ms. This patch introduce a new option --proc-map-timeout to make
the time limit configurable.
Signed-off-by: Kan Liang <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ying Huang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
System wide sampling like 'perf top' or 'perf record -a' read all
threads /proc/xxx/maps before sampling. If there are any threads which
generating a keeping growing huge maps, perf will do infinite loop
during synthesizing. Nothing will be sampled.
This patch fixes this issue by adding per-thread timeout to force stop
this kind of endless proc map processing.
PERF_RECORD_MISC_PROC_MAP_PARSE_TIME_OUT is introduced to indicate that
the mmap record are truncated by time out. User will get warning
notification when truncated mmap records are detected.
Reported-by: Ying Huang <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ying Huang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When using a map file from a JIT, due to memory reuse, we can obtain
multiple symbols with the same start address but a different length.
The symbols__find does check for the end so not doing it in
sort__sym_cmp was causing the hist_entry in the annotate part of a
report to match to the wrong entry, causing a fatal error.
Signed-off-by: Yannick Brosseau <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When that happens we were just ignoring the key press, now this
message is presented in the bottom line (the help line):
"Press '?' for help on key bindings"
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When the user presses 'f' to disable events the visual cues are, well,
the percentages not changing and the number of events freezing.
Be more explicit by changing the help line at the bottom of the screen
to show the following messages when 'f' is pressed:
"Press 'f' again to re-enable the events"
And then, when 'f' is pressed again:
"Press 'f' to disable the events or 'h'
Suggested-by: Ingo Molnar <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The hists_browser was replacing whatever helpline provided by 'top' or
'report' with a static "Press '?' for help on key bindings", fix it.
Now the message passed by top appears at the bottom of the screen:
"For a higher level overview, try: perf top --sort comm,dso"
As well the message that will be added when the user presses 'f' to
disable the events, something along the lines of "press f again to
re-enable...".
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The 'f' hotkey is only used when in 'top', dynamic mode, to
enable/disable events, currently not making sense in the 'report',
static mode, where we can't go from showing the histogram entries
created from a perf.data file to adding more events after recreating the
evlist created from the perf.data file, albeit possible, this is not
implemented right now.
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
I.e. 'freeze'/'unfreeze', this is because CTRL+z has a well known
action, i.e. suspend the app, perf needs to follow that convention, that
will be done on a separate patch, tho.
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Invoking Makefile.perf with prefix= breaks the build since Makefile.perf
hands that variable down to Makefile.build where it overrides
prefix := $(subst ./,,$(OUTPUT)$(dir)/)
leading to errors like this:
No rule to make target '/usrabspath.o', needed by '/usrlibperf-in.o'
Signed-off-by: Lukas Wunner <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Fixes: c819e2cf2eb6f65d3208d195d7a0edef6108d5
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To better reflect the purpose of this struct, that is to hold
info about samples, its total number and is percentage.
Cc: Martin Liska <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To compare two records on an instruction base, with --show-total-period
option provided, display total number of samples that belong to a line
in assembly language.
New hot key 't' is introduced for 'perf annotate' TUI.
Signed-off-by: Martin Liska <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The thread-stack represents a thread's current stack. When a thread
exits there can still be many functions on the stack e.g. exit() can be
called many levels deep, so all the callers will never return. To get
that information output, the thread-stack must be flushed.
Previously it was assumed the thread-stack would be flushed when the
struct thread was deleted. With thread ref-counting it is no longer
clear when that will be, if ever. So instead explicitly flush all the
thread-stacks at the end of a session.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Michael Turquette:
"Very late clk regression fixes for the ARM-based AT91 platform.
These went unnoticed by me until recently, hence the late pull
request"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: at91: fix h32mx prototype inclusion in pmc header
clk: at91: trivial: typo in peripheral clock description
clk: at91: fix PERIPHERAL_MAX_SHIFT definition
clk: at91: pll: fix input range validity check
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Nothing looks scary, just a few usual HD-audio regression fixes and
fixup, in addition to a minor Kconfig dependency fix for the old MIPS
drivers"
* tag 'sound-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix unused label skip_i915
ALSA: hda - Fix noisy outputs on Dell XPS13 (2015 model)
ALSA: mips: let SND_SGI_O2 select SND_PCM
ALSA: hda - Fix audio crackles on Dell Latitude E7x40
ALSA: hda - adding a DAC/pin preference map for a HP Envy TS machine
|
|
https://github.com/bbrezillon/linux-at91 into clk-fixes
|
|
When building the kernel with 32-bit binutils built with support
only for the i386 target, we get the following warning:
arch/x86/kernel/head_32.S:66: Warning: shift count out of range (32 is not between 0 and 31)
The problem is that in that case, binutils' internal type
representation is 32-bit wide and the shift range overflows.
In order to fix this, manipulate the shift expression which
creates the 4GiB constant to not overflow the shift count.
Suggested-by: Michael Matz <[email protected]>
Reported-and-tested-by: Enrico Mioso <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Trivial fix that prevents to compile this pmc clock driver if h32mx clock is
present but smd clock isn't.
Signed-off-by: Nicolas Ferre <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Acked-by: Alexandre Belloni <[email protected]>
Fixes: bcc5fd49a0fd ("clk: at91: add a driver for the h32mx clock")
Cc: <[email protected]> # 3.18+
|
|
Signed-off-by: Nicolas Ferre <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
|
|
If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.
Before:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
After:
48.73% hog [.] main
15.36% [kernel] [k] _raw_spin_lock_irqsave
9.77% [kernel] [k] _raw_spin_unlock_irqrestore
6.61% [kernel] [k] lock_timer_base.isra.38
6.42% [kernel] [k] mod_timer
3.90% [kernel] [k] detach_if_pending
3.76% [kernel] [k] del_timer
2.41% [kernel] [k] internal_add_timer
1.39% [kernel] [k] __internal_add_timer
0.76% [kernel] [k] timerfn
We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Joonwoo Park <[email protected]>
Cc: Wenbo Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.
We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.
The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.
With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.
Before:
47.76% hog [.] main
14.84% [kernel] [k] _raw_spin_lock_irqsave
9.55% [kernel] [k] _raw_spin_unlock_irqrestore
6.71% [kernel] [k] mod_timer
6.24% [kernel] [k] lock_timer_base.isra.38
3.76% [kernel] [k] detach_if_pending
3.71% [kernel] [k] del_timer
2.50% [kernel] [k] internal_add_timer
1.51% [kernel] [k] get_nohz_timer_target
1.28% [kernel] [k] __internal_add_timer
0.78% [kernel] [k] timerfn
0.48% [kernel] [k] wake_up_nohz_cpu
After:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Joonwoo Park <[email protected]>
Cc: Wenbo Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Simplify the handling of the flag storage for the timer statistics. No
intermediate storage anymore. Just hand over the flags field.
I left the printout of 'deferrable' for now because changing this
would be an ABI update and I have no idea how strong people feel about
that. OTOH, I wonder whether we should kill the whole timer stats
stuff because all of that information can be retrieved via ftrace/perf
as well.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Joonwoo Park <[email protected]>
Cc: Wenbo Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Instead of storing a pointer to the per cpu tvec_base we can simply
cache a CPU index in the timer_list and use that to get hold of the
correct per cpu tvec_base. This is only used in lock_timer_base() and
the slightly larger code is peanuts versus the spinlock operation and
the d-cache foot print of the timer wheel.
Aside of that this allows to get rid of following nuisances:
- boot_tvec_base
That statically allocated 4k bss data is just kept around so the
timer has a home when it gets statically initialized. It serves no
other purpose.
With the CPU index we assign the timer to CPU0 at static
initialization time and therefor can avoid the whole boot_tvec_base
dance. That also simplifies the init code, which just can use the
per cpu base.
Before:
text data bss dec hex filename
17491 9201 4160 30852 7884 ../build/kernel/time/timer.o
After:
text data bss dec hex filename
17440 9193 0 26633 6809 ../build/kernel/time/timer.o
- Overloading the base pointer with various flags
The CPU index has enough space to hold the flags (deferrable,
irqsafe) so we can get rid of the extra masking and bit fiddling
with the base pointer.
As a benefit we reduce the size of struct timer_list on 64 bit
machines. 4 - 8 bytes, a size reduction up to 15% per struct timer_list,
which is a real win as we have tons of them embedded in other structs.
This changes also the newly added deferrable printout of the timer
start trace point to capture and print all timer->flags, which allows
us to decode the target cpu of the timer as well.
We might have used bitfields for this, but that would change the
static initializers and the init function for no value to accomodate
big endian bitfields.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Joonwoo Park <[email protected]>
Cc: Wenbo Wang <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Badhri Jagan Sridharan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
This reduces the size of struct tvec_base by 50% and results in
slightly smaller code as well.
Before:
struct tvec_base: size: 8256, cachelines: 129
text data bss dec hex filename
17698 13297 8256 39251 9953 ../build/kernel/time/timer.o
After:
struct tvec_base: 4160, cachelines: 65
text data bss dec hex filename
17491 9201 4160 30852 7884 ../build/kernel/time/timer.o
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Viresh Kumar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Joonwoo Park <[email protected]>
Cc: Wenbo Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The FIFO guarantee is only there if two timers are queued into the
same bucket at the same jiffie on the same cpu:
- The slack value depends on the delta between expiry and enqueue
time, so the resulting expiry time can be different for timers
which are queued in different jiffies.
- Timers which are queued into the secondary array end up after a
later queued timer which was queued into the primary array due to
cascading.
- Timers can end up on different cpus due to the NOHZ target moving
around. Obviously there is no guarantee of expiry ordering between
cpus.
So anything which relies on FIFO behaviour of the timer wheel is
broken already.
This is a preparatory patch for converting the timer wheel to hlist
which reduces the memory foot print of the wheel by 50%.
It's a seperate patch so any (unlikely to happen) regression caused by
this can be identified clearly.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Viresh Kumar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Joonwoo Park <[email protected]>
Cc: Wenbo Wang <[email protected]>
Cc: George Spelvin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
catchup_timer_jiffies() has been applied blindly to several functions
without looking for possible better ways to do it.
1) internal_add_timer()
Move the update to base->all_timers before we actually insert the
timer into the wheel.
2) detach_if_pending()
Again the update to base->all_timers allows us to explicitely do
the timer_jiffies update in place, if this was the last timer which
got removed.
3) __run_timers()
We only check on entry, which is silly, because base->timer_jiffies
can be behind - especially on NOHZ kernels - and if there is a
single deferrable timer somewhere between base->timer_jiffies and
jiffies we expire it and then loop until base->timer_jiffies ==
jiffies.
Move it into the loop.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Joonwoo Park <[email protected]>
Cc: Wenbo Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Fix the PERIPHERAL_MAX_SHIFT definition (3 instead of 4) and adapt the
round_rate and set_rate logic accordingly.
Signed-off-by: Boris Brezillon <[email protected]>
Reported-by: "Wu, Songjun" <[email protected]>
|
|
The PLL impose a certain input range to work correctly, but it appears that
this input range does not apply on the input clock (or parent clock) but
on the input clock after it has passed the PLL divisor.
Fix the implementation accordingly.
Cc: <[email protected]> # v3.14+
Signed-off-by: Boris Brezillon <[email protected]>
Reported-by: Jonas Andersson <[email protected]>
|
|
Sine commit 269ad8015a6b ("sched/deadline: Avoid double-accounting in
case of missed deadlines), parameter 'rq' is no longer used, so
remove it.
Signed-off-by: Zhiqiang Zhang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Resetting the p->dl_throttled flag in rt_mutex_setprio() (for a task that is going
to be boosted) is superfluous, as the natural place to do so is in
replenish_dl_entity().
If the task was on the runqueue and it is boosted by a DL task, it will be enqueued
back with ENQUEUE_REPLENISH flag set, which can guarantee that dl_throttled is
reset in replenish_dl_entity().
This patch drops the resetting of throttled status in function rt_mutex_setprio().
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
There are two init_sched_dl_class() declarations, this patch drops
the duplicate.
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
non-feasible target
This patch adds a check that prevents futile attempts to move DL tasks
to a CPU with active tasks of equal or earlier deadline. The same
behavior as commit 80e3d87b2c55 ("sched/rt: Reduce rq lock contention
by eliminating locking of non-feasible target") for rt class.
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
It's a bootstrap function, make init_sched_dl_class() __init.
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
pull_dl_task() uses pick_next_earliest_dl_task() to select a migration
candidate; this is sub-optimal since the next earliest task -- as per
the regular runqueue -- might not be migratable at all. This could
result in iterating the entire runqueue looking for a task.
Instead iterate the pushable queue -- this queue only contains tasks
that have at least 2 cpus set in their cpus_allowed mask.
Signed-off-by: Wanpeng Li <[email protected]>
[ Improved the changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Avoid touching the curr->preempt_notifier cacheline when not needed.
Provides a small improvement on pipe-bench:
taskset 01 perf stat --repeat 10 -- perf bench sched pipe
before:
Performance counter stats for 'perf bench sched pipe' (10 runs):
12385.016204 task-clock (msec) # 1.001 CPUs utilized ( +- 0.34% )
2,000,023 context-switches # 0.161 M/sec ( +- 0.00% )
0 cpu-migrations # 0.000 K/sec
175 page-faults # 0.014 K/sec ( +- 0.26% )
41,376,162,250 cycles # 3.341 GHz ( +- 0.11% )
17,389,139,321 stalled-cycles-frontend # 42.03% frontend cycles idle ( +- 0.25% )
<not supported> stalled-cycles-backend
68,788,588,003 instructions # 1.66 insns per cycle
# 0.25 stalled cycles per insn ( +- 0.02% )
13,449,387,620 branches # 1085.940 M/sec ( +- 0.02% )
20,880,690 branch-misses # 0.16% of all branches ( +- 0.98% )
12.372646094 seconds time elapsed ( +- 0.34% )
after:
Performance counter stats for 'perf bench sched pipe' (10 runs):
12180.936528 task-clock (msec) # 1.001 CPUs utilized ( +- 0.33% )
2,000,077 context-switches # 0.164 M/sec ( +- 0.00% )
0 cpu-migrations # 0.000 K/sec
174 page-faults # 0.014 K/sec ( +- 0.27% )
40,691,545,577 cycles # 3.341 GHz ( +- 0.06% )
16,446,333,371 stalled-cycles-frontend # 40.42% frontend cycles idle ( +- 0.18% )
<not supported> stalled-cycles-backend
68,570,100,387 instructions # 1.69 insns per cycle
# 0.24 stalled cycles per insn ( +- 0.01% )
13,389,740,014 branches # 1099.237 M/sec ( +- 0.01% )
20,175,440 branch-misses # 0.15% of all branches ( +- 0.52% )
12.169253010 seconds time elapsed ( +- 0.33% )
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|