aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2017-06-27tracing: Add support for display of tgid in trace outputJoel Fernandes2-14/+31
Earlier patches introduced ability to record the tgid using the 'record-tgid' option. Here we read the tgid and output it if the option is enabled. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Tested-by: Michael Sartain <[email protected]> Signed-off-by: Joel Fernandes <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-06-27tracing: Add support for recording tgid of tasksJoel Fernandes4-25/+201
Inorder to support recording of tgid, the following changes are made: * Introduce a new API (tracing_record_taskinfo) to additionally record the tgid along with the task's comm at the same time. This has has the benefit of not setting trace_cmdline_save before all the information for a task is saved. * Add a new API tracing_record_taskinfo_sched_switch to record task information for 2 tasks at a time (previous and next) and use it from sched_switch probe. * Preserve the old API (tracing_record_cmdline) and create it as a wrapper around the new one so that existing callers aren't affected. * Reuse the existing sched_switch and sched_wakeup probes to record tgid information and add a new option 'record-tgid' to enable recording of tgid When record-tgid option isn't enabled to being with, we take care to make sure that there's isn't memory or runtime overhead. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Tested-by: Michael Sartain <[email protected]> Signed-off-by: Joel Fernandes <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-06-27ftrace: Decrement count for dyn_ftrace_total_info fileSteven Rostedt (VMware)1-0/+1
The dyn_ftrace_total_info file is used to show how many functions have been converted into nops and can be used by ftrace. The problem is that it does not get decremented when functions are removed (init boot code being freed, and modules being freed). That means the number is very inaccurate everytime functions are removed from the ftrace tables. Decrement it when functions are removed. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-06-27ftrace: Remove unused function ftrace_arch_read_dyn_info()Steven Rostedt (VMware)1-18/+3
ftrace_arch_read_dyn_info() was used so that archs could add its own debug information into the dyn_ftrace_total_info in the tracefs file system. That file is for debugging usage of dynamic ftrace. No arch uses that function anymore, so just get rid of it. This also allows for tracing_read_dyn_info() to be cleaned up a bit. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-06-27PM / hibernate: Drop redundant parameter of swsusp_alloc()BaoJun Luo1-3/+2
The first parameter of swsusp_alloc is not used, so drop it. Signed-off-by: BaoJun Luo <[email protected]> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2017-06-27PM / hibernate: Use CONFIG_HAVE_SET_MEMORY for include conditionBalbir Singh1-3/+3
Kbuild reported a build failure when CONFIG_STRICT_KERNEL_RWX was enabled on powerpc. We don't yet have ARCH_HAS_SET_MEMORY and ppc32 saw a build failure. I've only done a basic compile test with a config that has hibernation enabled. Fixes: 50327ddfbc92 (kernel/power/snapshot.c: use set_memory.h header) Reported-by: Christophe Leroy <[email protected]> Signed-off-by: Balbir Singh <[email protected]> Acked-by: Pavel Machek <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2017-06-26seccomp: Switch from atomic_t to recount_tKees Cook1-5/+5
This switches the seccomp usage tracking from atomic_t to refcount_t to gain refcount overflow protections. Cc: Elena Reshetova <[email protected]> Cc: David Windsor <[email protected]> Cc: Hans Liljestrand <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2017-06-26seccomp: Clean up core dump logicKees Cook1-3/+3
This just cleans up the core dumping logic to avoid the braces around the RET_KILL case. Signed-off-by: Kees Cook <[email protected]>
2017-06-26ftrace: Have cached module filters be an active filterSteven Rostedt (VMware)2-6/+21
When a module filter is added to set_ftrace_filter, if the module is not loaded, it is cached. This should be considered an active filter, and function tracing should be filtered by this. That is, if a cached module filter is the only filter set, then no function tracing should be happening, as all the functions available will be filtered out. This makes sense, as the reason to add a cached module filter, is to trace the module when you load it. There shouldn't be any other tracing happening until then. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-06-26ftrace: Implement cached modules tracing on module loadSteven Rostedt (VMware)1-0/+93
If a module is cached in the set_ftrace_filter, and that module is loaded, then enable tracing on that module as if the cached module text was written into set_ftrace_filter just as the module is loaded. # echo ":mod:kvm_intel" > # cat /sys/kernel/tracing/set_ftrace_filter #### all functions enabled #### :mod:kvm_intel # modprobe kvm_intel # cat /sys/kernel/tracing/set_ftrace_filter vmx_get_rflags [kvm_intel] vmx_get_pkru [kvm_intel] vmx_get_interrupt_shadow [kvm_intel] vmx_rdtscp_supported [kvm_intel] vmx_invpcid_supported [kvm_intel] [..] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-06-26ftrace: Have the cached module list show in set_ftrace_filterSteven Rostedt (VMware)1-12/+100
When writing in a module filter into set_ftrace_filter for a module that is not yet loaded, it it cached, and will be executed when the module is loaded (although that is not implemented yet at this commit). Display the list of cached modules to be traced. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-06-26ftrace: Add :mod: caching infrastructure to trace_arraySteven Rostedt (VMware)2-6/+148
This is the start of the infrastructure work to allow for tracing module functions before it is loaded. Currently the following command: # echo :mod:some-mod > set_ftrace_filter will enable tracing of all functions within the module "some-mod" if it is loaded. What we want, is if the module is not loaded, that line will be saved. When the module is loaded, then the "some-mod" will have that line executed on it, so that the functions within it starts being traced. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-06-26kernel/module.c: suppress warning about unused nowarn variableCorentin Labbe1-10/+20
This patch fix the following warning: kernel/module.c: In function 'add_usage_links': kernel/module.c:1653:6: warning: variable 'nowarn' set but not used [-Wunused-but-set-variable] [jeyu: folded in first patch since it only swapped the function order so that del_usage_links can be called from add_usage_links] Signed-off-by: Corentin Labbe <[email protected]> Signed-off-by: Jessica Yu <[email protected]>
2017-06-26genirq: Avoid unnecessary low level irq function callsJeffy Chen1-20/+33
Check irq state in enable/disable/unmask/mask_irq to avoid unnecessary low level irq function calls. This has two advantages: - Conditionals are faster than hardware access - Solves issues with the underlying refcounting of the pinctrl infrastructure Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Jeffy Chen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2017-06-26genirq: Set irq masked state when initializing irq_descJeffy Chen1-0/+1
The irq default state is set to disabled when allocating irq desc, but the masked state flag is not set. This is inconsistent vs. the state tracking logic which is used to prevent unnecessary calls to hardware level irq chip functions. Set the masked state flag as well. Signed-off-by: Jeffy Chen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2017-06-25posix-stubs: Conditionally include COMPAT_SYS_NI definesDeepa Dinamani1-6/+7
These apis only need to be defined if CONFIG_COMPAT is enabled. Signed-off-by: Deepa Dinamani <[email protected]> Signed-off-by: Al Viro <[email protected]>
2017-06-25time: introduce {get,put}_itimerspec64Deepa Dinamani2-0/+51
As we change the user space type for the timerfd and posix timer functions to newer data types, we need some form of conversion helpers to avoid duplicating that logic. Suggested-by: Arnd Bergmann <[email protected]> Signed-off-by: Deepa Dinamani <[email protected]> Signed-off-by: Al Viro <[email protected]>
2017-06-25time: add get_timespec64 and put_timespec64Deepa Dinamani2-0/+72
Add helper functions to convert between struct timespec64 and struct timespec at userspace boundaries. This is a preparatory patch to use timespec64 as the basic type internally in the kernel as timespec is not y2038 safe on 32 bit systems. The patch helps the cause by containing all data conversions at the userspace boundaries within these functions. Suggested-by: Arnd Bergmann <[email protected]> Signed-off-by: Deepa Dinamani <[email protected]> Signed-off-by: Al Viro <[email protected]>
2017-06-25Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds1-25/+46
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A few fixes for timekeeping and timers: - Plug a subtle race due to a missing READ_ONCE() in the timekeeping code where reloading of a pointer results in an inconsistent callback argument being supplied to the clocksource->read function. - Correct the CLOCK_MONOTONIC_RAW sub-nanosecond accounting in the time keeping core code, to prevent a possible discontuity. - Apply a similar fix to the arm64 vdso clock_gettime() implementation - Add missing includes to clocksource drivers, which relied on indirect includes which fails in certain configs. - Use the proper iomem pointer for read/iounmap in a probe function" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting time: Fix clock->read(clock) race around clocksource changes clocksource: Explicitly include linux/clocksource.h when needed clocksource/drivers/arm_arch_timer: Fix read and iounmap of incorrect variable
2017-06-25Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Three fixlets for perf: - Return the proper error code if aux buffers for a event are not supported. - Calculate the probe offset for inlined functions correctly - Update the Skylake DTLB load/store miss event so it can count 1G TLB entries as well" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf probe: Fix probe definition for inlined functions perf/x86/intel: Add 1G DTLB load/store miss support for SKL perf/aux: Correct return code of rb_alloc_aux() if !has_aux(ev)
2017-06-24genirq/timings: Add infrastructure for estimating the next interrupt arrival ↵Daniel Lezcano2-0/+358
time An interrupt behaves with a burst of activity with periodic interval of time followed by one or two peaks of longer interval. As the time intervals are periodic, statistically speaking they follow a normal distribution and each interrupts can be tracked individually. Add a mechanism to compute the statistics on all interrupts, except the timers which are deterministic from a prediction point of view, as their expiry time is known. The goal is to extract the periodicity for each interrupt, with the last timestamp and sum them, so the next event can be predicted to a certain extent. Taking the earliest prediction gives the expected wakeup on the system (assuming a timer won't expire before). Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: "Rafael J . Wysocki" <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Bjorn Helgaas <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-24genirq/timings: Add infrastructure to track the interrupt timingsDaniel Lezcano6-0/+129
The interrupt framework gives a lot of information about each interrupt. It does not keep track of when those interrupts occur though, which is a prerequisite for estimating the next interrupt arrival for power management purposes. Add a mechanism to record the timestamp for each interrupt occurrences in a per-CPU circular buffer to help with the prediction of the next occurrence using a statistical model. Each CPU can store up to IRQ_TIMINGS_SIZE events <irq, timestamp>, the current value of IRQ_TIMINGS_SIZE is 32. Each event is encoded into a single u64, where the high 48 bits are used for the timestamp and the low 16 bits are for the irq number. A static key is introduced so when the irq prediction is switched off at runtime, the overhead is near to zero. It results in most of the code in internals.h for inline reasons and a very few in the new file timings.c. The latter will contain more in the next patch which will provide the statistical model for the next event prediction. Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Nicolas Pitre <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: "Rafael J . Wysocki" <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Bjorn Helgaas <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-24genirq/debugfs: Remove pointless NULL pointer checkThomas Gleixner2-8/+6
debugfs_remove() has it's own NULL pointer check. Remove the conditional and make irq_remove_debugfs_entry() an inline helper Reported-by: kbuild test robot <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2017-06-24Merge branch 'for-linus' of ↵Linus Torvalds1-6/+14
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull timer fix from Eric Biederman: "This fixes an issue of confusing injected signals with the signals from posix timers that has existed since posix timers have been in the kernel. This patch is slightly simpler than my earlier version of this patch as I discovered in testing that I had misspelled "#ifdef CONFIG_POSIX_TIMERS". So I deleted that unnecessary test and made setting of resched_timer uncondtional. I have tested this and verified that without this patch there is a nasty hang that is easy to trigger, and with this patch everything works properly" Thomas Gleixner dixit: "It fixes the problem at hand and covers the ptrace case as well, which I missed. Reviewed-and-tested-by: Thomas Gleixner <[email protected]>" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Only reschedule timers on signals timers have sent
2017-06-24sched/fair: Remove effective_load()Rik van Riel1-123/+1
The effective_load() function was only used by the NUMA balancing code, and not by the regular load balancing code. Now that the NUMA balancing code no longer uses it either, get rid of it. Signed-off-by: Rik van Riel <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-24sched/numa: Implement NUMA node level wake_affine()Rik van Riel1-59/+71
Since select_idle_sibling() can place a task anywhere on a socket, comparing loads between individual CPU cores makes no real sense for deciding whether to do an affine wakeup across sockets, either. Instead, compare the load between the sockets in a similar way the load balancer and the numa balancing code do. Signed-off-by: Rik van Riel <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-24sched/fair: Simplify wake_affine() for the single socket caseRik van Riel1-1/+12
Then 'this_cpu' and 'prev_cpu' are in the same socket, select_idle_sibling() will do its thing regardless of the return value of wake_affine(). Just return true and don't look at all the other things. Signed-off-by: Rik van Riel <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-24sched/numa: Override part of migrate_degrades_locality() when idle balancingRik van Riel1-0/+4
Several tests in the NAS benchmark seem to run a lot slower with NUMA balancing enabled, than with NUMA balancing disabled. The slower run time corresponds with increased idle time. Overriding the final test of migrate_degrades_locality (but still doing the other NUMA tests first) seems to improve performance of those benchmarks. Reported-by: Jirka Hladky <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-24Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar2-7/+37
Signed-off-by: Ingo Molnar <[email protected]>
2017-06-23bpf: possibly avoid extra masking for narrower load in verifierYonghong Song2-14/+32
Commit 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") permits narrower load for certain ctx fields. The commit however will already generate a masking even if the prog-specific ctx conversion produces the result with narrower size. For example, for __sk_buff->protocol, the ctx conversion loads the data into register with 2-byte load. A narrower 2-byte load should not generate masking. For __sk_buff->vlan_present, the conversion function set the result as either 0 or 1, essentially a byte. The narrower 2-byte or 1-byte load should not generate masking. To avoid unnecessary masking, prog-specific *_is_valid_access now passes converted_op_size back to verifier, which indicates the valid data width after perceived future conversion. Based on this information, verifier is able to avoid unnecessary marking. Since we want more information back from prog-specific *_is_valid_access checking, all of them are packed into one data structure for more clarity. Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-23sched/rt: Move RT related code from sched/core.c to sched/rt.cNicolas Pitre3-315/+315
This helps making sched/core.c smaller and hopefully easier to understand and maintain. Signed-off-by: Nicolas Pitre <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-23sched/deadline: Move DL related code from sched/core.c to sched/deadline.cNicolas Pitre3-340/+364
This helps making sched/core.c smaller and hopefully easier to understand and maintain. Signed-off-by: Nicolas Pitre <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-23sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabledNicolas Pitre1-4/+3
Make CONFIG_CPUSETS=y depend on SMP as this feature makes no sense on UP. This allows for configuring out cpuset_cpumask_can_shrink() and task_can_attach() entirely, which shrinks the kernel a bit. Signed-off-by: Nicolas Pitre <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-22tracing: Show address when function names are not foundSteven Rostedt (VMware)1-4/+14
Currently, when a function is not found in kallsyms, instead of simply showing the function address, it shows nothing at all: # echo ':mod:kvm_intel' > /sys/kernel/tracing/set_ftrace_filter # echo function > /sys/kernel/tracing/set_ftrace_filter # qemu -enable-kvm /home/my-qemu-image <Ctrl-C> # rmmod kvm_intel # cat /sys/kernel/tracing/trace qemu-system-x86-2408 [001] d..2 135.013238: <-kvm_arch_hardware_enable qemu-system-x86-2408 [001] .... 135.014574: <-kvm_arch_vm_ioctl qemu-system-x86-2408 [001] .... 135.015420: <-kvm_vm_ioctl_check_extension qemu-system-x86-2408 [001] .... 135.045411: <-__do_cpuid_ent qemu-system-x86-2408 [001] .... 135.045412: <-__do_cpuid_ent qemu-system-x86-2408 [001] .... 135.045412: <-__do_cpuid_ent qemu-system-x86-2408 [001] .... 135.045412: <-__do_cpuid_ent qemu-system-x86-2408 [001] ...1 135.045413: <-__do_cpuid_ent qemu-system-x86-2408 [001] .... 135.045413: <-__do_cpuid_ent When it should show: qemu-system-x86-2408 [001] d..2 135.013238: 0xffffffffa02a39f0 <-kvm_arch_hardware_enable qemu-system-x86-2408 [001] .... 135.014574: 0xffffffffa02a2ba0 <-kvm_arch_vm_ioctl qemu-system-x86-2408 [001] .... 135.015420: 0xffffffffa029e4e0 <-kvm_vm_ioctl_check_extension qemu-system-x86-2408 [001] .... 135.045411: 0xffffffffa02a1380 <-__do_cpuid_ent qemu-system-x86-2408 [001] .... 135.045412: 0xffffffffa029e160 <-__do_cpuid_ent qemu-system-x86-2408 [001] .... 135.045412: 0xffffffffa029e180 <-__do_cpuid_ent qemu-system-x86-2408 [001] .... 135.045412: 0xffffffffa029e520 <-__do_cpuid_ent qemu-system-x86-2408 [001] ...1 135.045413: 0xffffffffa02a13b0 <-__do_cpuid_ent qemu-system-x86-2408 [001] .... 135.045413: 0xffffffffa02a1380 <-__do_cpuid_ent instead. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-06-22genirq/irqdomain: Remove auto-recursive hierarchy supportMarc Zyngier2-43/+14
It did seem like a good idea at the time, but it never really caught on, and auto-recursive domains remain unused 3 years after having been introduced. Oh well, time for a late spring cleanup. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2017-06-22genirq/irqdomain: Add irq_domain_update_bus_token helperMarc Zyngier1-0/+31
We can have irq domains that are identified by the same fwnode (because they are serviced by the same HW), and yet have different functionnality (because they serve different busses, for example). This is what we use the bus_token field. Since we don't use this field when generating the domain name, all the aliasing domains will get the same name, and the debugfs file creation fails. Also, bus_token is updated by individual drivers, and the core code is unaware of that update. In order to sort this mess, let's introduce a helper that takes care of updating bus_token, and regenerate the debugfs file. A separate patch will update all the individual users. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2017-06-22genirq/affinity: Assign vectors to all present CPUsChristoph Hellwig1-13/+63
Currently the irq vector spread algorithm is restricted to online CPUs, which ties the IRQ mapping to the currently online devices and doesn't deal nicely with the fact that CPUs could come and go rapidly due to e.g. power management. Instead assign vectors to all present CPUs to avoid this churn. Build a map of all possible CPUs for a given node, as the architectures only provide a map of all onlines CPUs. Do this dynamically on each call for the vector assingments, which is a bit suboptimal and could be optimized in the future by provinding a mapping from the arch code. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: [email protected] Cc: Sagi Grimberg <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: [email protected] Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Avoid irq affinity setting for single targetsThomas Gleixner1-2/+10
Avoid trying to add a newly online CPU to the effective affinity mask of an started up interrupt. That interrupt will either stay on the already online CPU or move around for no value. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Introduce IRQD_SINGLE_TARGET flagThomas Gleixner1-0/+1
Many interrupt chips allow only a single CPU as interrupt target. The core code has no knowledge about that. That's unfortunate as it could avoid trying to readd a newly online CPU to the effective affinity mask. Add the status flag and the necessary accessors. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Handle managed IRQs on CPU hotplugThomas Gleixner2-0/+50
If a CPU goes offline, interrupts affine to the CPU are moved away. If the outgoing CPU is the last CPU in the affinity mask the migration code breaks the affinity and sets it it all online cpus. This is a problem for affinity managed interrupts as CPU hotplug is often used for power management purposes. If the affinity is broken, the interrupt is not longer affine to the CPUs to which it was allocated. The affinity spreading allows to lay out multi queue devices in a way that they are assigned to a single CPU or a group of CPUs. If the last CPU goes offline, then the queue is not longer used, so the interrupt can be shutdown gracefully and parked until one of the assigned CPUs comes online again. Add a graceful shutdown mechanism into the irq affinity breaking code path, mark the irq as MANAGED_SHUTDOWN and leave the affinity mask unmodified. In the online path, scan the active interrupts for managed interrupts and if the interrupt is functional and the newly online CPU is part of the affinity mask, restart the interrupt if it is marked MANAGED_SHUTDOWN or if the interrupts is started up, try to add the CPU back to the effective affinity mask. Originally-by: Christoph Hellwig <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Handle managed irqs gracefully in irq_startup()Thomas Gleixner1-3/+61
Affinity managed interrupts should keep their assigned affinity accross CPU hotplug. To avoid magic hackery in device drivers, the core code shall manage them transparently and set these interrupts into a managed shutdown state when the last CPU of the assigned affinity mask goes offline. The interrupt will be restarted when one of the CPUs in the assigned affinity mask comes back online. Add the necessary logic to irq_startup(). If an interrupt is requested and started up, the code checks whether it is affinity managed and if so, it checks whether a CPU in the interrupts affinity mask is online. If not, it puts the interrupt into managed shutdown state. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Add force argument to irq_startup()Thomas Gleixner4-7/+14
In order to handle managed interrupts gracefully on irq_startup() so they won't lose their assigned affinity, it's necessary to allow startups which keep the interrupts in managed shutdown state, if none of the assigend CPUs is online. This allows drivers to request interrupts w/o the CPUs being online, which avoid online/offline churn in drivers. Add a force argument which can override that decision and let only request_irq() and enable_irq() allow the managed shutdown handling. enable_irq() is required, because the interrupt might be requested with IRQF_NOAUTOEN and enable_irq() invokes irq_startup() which would then wreckage the assignment again. All other callers force startup and potentially break the assigned affinity. No functional change as this only adds the function argument. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Split out irq_startup() codeThomas Gleixner1-11/+18
Split out the inner workings of irq_startup() so it can be reused to handle managed interrupts gracefully. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Introduce IRQD_MANAGED_SHUTDOWNThomas Gleixner1-0/+10
Affinity managed interrupts should keep their assigned affinity accross CPU hotplug. To avoid magic hackery in device drivers, the core code shall manage them transparently. This will set these interrupts into a managed shutdown state when the last CPU of the assigned affinity mask goes offline. The interrupt will be restarted when one of the CPUs in the assigned affinity mask comes back online. Introduce the necessary state flag and the accessor functions. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/cpuhotplug: Use effective affinity maskThomas Gleixner1-3/+11
If the architecture supports the effective affinity mask, migrating interrupts away which are not targeted by the effective mask is pointless. They can stay in the user or system supplied affinity mask, but won't be targetted at any given point as the affinity setter functions need to validate against the online cpu mask anyway. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Introduce effective affinity maskThomas Gleixner4-7/+105
There is currently no way to evaluate the effective affinity mask of a given interrupt. Many irq chips allow only a single target CPU or a subset of CPUs in the affinity mask. Updating the mask at the time of setting the affinity to the subset would be counterproductive because information for cpu hotplug about assigned interrupt affinities gets lost. On CPU hotplug it's also pointless to force migrate an interrupt, which is not targeted at the CPU effectively. But currently the information is not available. Provide a seperate mask to be updated by the irq_chip->irq_set_affinity() implementations. Implement the read only proc files so the user can see the effective mask as well w/o trying to deduce it from /proc/interrupts. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq/proc: Replace ever repeating type castThomas Gleixner1-5/+5
The proc file setup repeats the same ugly type cast for the irq number over and over. Do it once and hand in the local void pointer. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Remove pointless gfp argumentThomas Gleixner1-8/+7
All callers hand in GPF_KERNEL. No point to have an extra argument for that. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Remove pointless arg from show_irq_affinityThomas Gleixner1-3/+3
The third argument of the internal helper function is unused. Remove it. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22genirq: Move irq_fixup_move_pending() to coreThomas Gleixner1-0/+5
Now that x86 uses the generic code, the function declaration and inline stub can move to the core internal header. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]