aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2019-09-04dma-mapping: remove CONFIG_ARCH_NO_COHERENT_DMA_MMAPChristoph Hellwig1-7/+5
CONFIG_ARCH_NO_COHERENT_DMA_MMAP is now functionally identical to !CONFIG_MMU, so remove the separate symbol. The only difference is that arm did not set it for !CONFIG_MMU, but arm uses a separate dma mapping implementation including its own mmap method, which is handled by moving the CONFIG_MMU check in dma_can_mmap so that is only applies to the dma-direct case, just as the other ifdefs for it. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> # m68k
2019-09-04dma-mapping: add a dma_can_mmap helperChristoph Hellwig1-0/+23
Add a helper to check if DMA allocations for a specific device can be mapped to userspace using dma_mmap_*. Signed-off-by: Christoph Hellwig <[email protected]>
2019-09-04dma-mapping: explicitly wire up ->mmap and ->get_sgtableChristoph Hellwig1-8/+12
While the default ->mmap and ->get_sgtable implementations work for the majority of our dma_map_ops impementations they are inherently safe for others that don't use the page allocator or CMA and/or use their own way of remapping not covered by the common code. So remove the defaults if these methods are not wired up, but instead wire up the default implementations for all safe instances. Fixes: e1c7e324539a ("dma-mapping: always provide the dma_map_ops based implementation") Signed-off-by: Christoph Hellwig <[email protected]>
2019-09-04dma-mapping: move the dma_get_sgtable API comments from arm to common codeChristoph Hellwig1-0/+11
The comments are spot on and should be near the central API, not just near a single implementation. Signed-off-by: Christoph Hellwig <[email protected]>
2019-09-03kgdb: fix comment regarding static functionNadav Amit1-4/+1
The comment that says that module_event() is not static is clearly wrong. Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Daniel Thompson <[email protected]>
2019-09-03kdb: Replace strncmp with str_has_prefixChuhong Yuan1-1/+1
strncmp(str, const, len) is error-prone. We had better use newly introduced str_has_prefix() instead of it. Signed-off-by: Chuhong Yuan <[email protected]> Signed-off-by: Daniel Thompson <[email protected]>
2019-09-03cpuidle: play_idle: Increase the resolution to usecDaniel Lezcano1-3/+4
The play_idle resolution is 1ms. The intel_powerclamp bases the idle duration on jiffies. The idle injection API is also using msec based duration but has no user yet. Unfortunately, msec based time does not fit well when we want to inject idle cycle precisely with shallow idle state. In order to set the scene for the incoming idle injection user, move the precision up to usec when calling play_idle. Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2019-09-03irqdomain: Add the missing assignment of domain->fwnode for named fwnodeDexuan Cui1-0/+1
Recently device pass-through stops working for Linux VM running on Hyper-V. git-bisect shows the regression is caused by the recent commit 467a3bb97432 ("PCI: hv: Allocate a named fwnode ..."), but the root cause is that the commit d59f6617eef0 forgets to set the domain->fwnode for IRQCHIP_FWNODE_NAMED*, and as a result: 1. The domain->fwnode remains to be NULL. 2. irq_find_matching_fwspec() returns NULL since "h->fwnode == fwnode" is false, and pci_set_bus_msi_domain() sets the Hyper-V PCI root bus's msi_domain to NULL. 3. When the device is added onto the root bus, the device's dev->msi_domain is set to NULL in pci_set_msi_domain(). 4. When a device driver tries to enable MSI-X, pci_msi_setup_msi_irqs() calls arch_setup_msi_irqs(), which uses the native MSI chip (i.e. arch/x86/kernel/apic/msi.c: pci_msi_controller) to set up the irqs, but actually pci_msi_setup_msi_irqs() is supposed to call msi_domain_alloc_irqs() with the hbus->irq_domain, which is created in hv_pcie_init_irq_domain() and is associated with the Hyper-V chip hv_msi_irq_chip. Consequently, the irq line is not properly set up, and the device driver can not receive any interrupt. Fixes: d59f6617eef0 ("genirq: Allow fwnode to carry name information only") Fixes: 467a3bb97432 ("PCI: hv: Allocate a named fwnode instead of an address-based one") Reported-by: Lili Deng <[email protected]> Signed-off-by: Dexuan Cui <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/PU1P153MB01694D9AF625AC335C600C5FBFBE0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
2019-09-03sched/uclamp: Always use 'enum uclamp_id' for clamp_id valuesPatrick Bellasi2-20/+20
The supported clamp indexes are defined in 'enum clamp_id', however, because of the code logic in some of the first utilization clamping series version, sometimes we needed to use 'unsigned int' to represent indices. This is not more required since the final version of the uclamp_* APIs can always use the proper enum uclamp_id type. Fix it with a bulk rename now that we have all the bits merged. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Michal Koutny <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-03sched/uclamp: Update CPU's refcount on TG's clamp changesPatrick Bellasi1-1/+54
On updates of task group (TG) clamp values, ensure that these new values are enforced on all RUNNABLE tasks of the task group, i.e. all RUNNABLE tasks are immediately boosted and/or capped as requested. Do that each time we update effective clamps from cpu_util_update_eff(). Use the *cgroup_subsys_state (css) to walk the list of tasks in each affected TG and update their RUNNABLE tasks. Update each task by using the same mechanism used for cpu affinity masks updates, i.e. by taking the rq lock. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Michal Koutny <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-03sched/uclamp: Use TG's clamps to restrict TASK's clampsPatrick Bellasi1-1/+27
When a task specific clamp value is configured via sched_setattr(2), this value is accounted in the corresponding clamp bucket every time the task is {en,de}qeued. However, when cgroups are also in use, the task specific clamp values could be restricted by the task_group (TG) clamp values. Update uclamp_cpu_inc() to aggregate task and TG clamp values. Every time a task is enqueued, it's accounted in the clamp bucket tracking the smaller clamp between the task specific value and its TG effective value. This allows to: 1. ensure cgroup clamps are always used to restrict task specific requests, i.e. boosted not more than its TG effective protection and capped at least as its TG effective limit. 2. implement a "nice-like" policy, where tasks are still allowed to request less than what enforced by their TG effective limits and protections Do this by exploiting the concept of "effective" clamp, which is already used by a TG to track parent enforced restrictions. Apply task group clamp restrictions only to tasks belonging to a child group. While, for tasks in the root group or in an autogroup, system defaults are still enforced. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Michal Koutny <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-03sched/uclamp: Propagate system defaults to the root groupPatrick Bellasi1-2/+29
The clamp values are not tunable at the level of the root task group. That's for two main reasons: - the root group represents "system resources" which are always entirely available from the cgroup standpoint. - when tuning/restricting "system resources" makes sense, tuning must be done using a system wide API which should also be available when control groups are not. When a system wide restriction is available, cgroups should be aware of its value in order to know exactly how much "system resources" are available for the subgroups. Utilization clamping supports already the concepts of: - system defaults: which define the maximum possible clamp values usable by tasks. - effective clamps: which allows a parent cgroup to constraint (maybe temporarily) its descendants without losing the information related to the values "requested" from them. Exploit these two concepts and bind them together in such a way that, whenever system default are tuned, the new values are propagated to (possibly) restrict or relax the "effective" value of nested cgroups. When cgroups are in use, force an update of all the RUNNABLE tasks. Otherwise, keep things simple and do just a lazy update next time each task will be enqueued. Do that since we assume a more strict resource control is required when cgroups are in use. This allows also to keep "effective" clamp values updated in case we need to expose them to user-space. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Michal Koutny <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-03sched/uclamp: Propagate parent clampsPatrick Bellasi2-0/+46
In order to properly support hierarchical resources control, the cgroup delegation model requires that attribute writes from a child group never fail but still are locally consistent and constrained based on parent's assigned resources. This requires to properly propagate and aggregate parent attributes down to its descendants. Implement this mechanism by adding a new "effective" clamp value for each task group. The effective clamp value is defined as the smaller value between the clamp value of a group and the effective clamp value of its parent. This is the actual clamp value enforced on tasks in a task group. Since it's possible for a cpu.uclamp.min value to be bigger than the cpu.uclamp.max value, ensure local consistency by restricting each "protection" (i.e. min utilization) with the corresponding "limit" (i.e. max utilization). Do that at effective clamps propagation to ensure all user-space write never fails while still always tracking the most restrictive values. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Michal Koutny <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-03sched/uclamp: Extend CPU's cgroup controllerPatrick Bellasi2-4/+197
The cgroup CPU bandwidth controller allows to assign a specified (maximum) bandwidth to the tasks of a group. However this bandwidth is defined and enforced only on a temporal base, without considering the actual frequency a CPU is running on. Thus, the amount of computation completed by a task within an allocated bandwidth can be very different depending on the actual frequency the CPU is running that task. The amount of computation can be affected also by the specific CPU a task is running on, especially when running on asymmetric capacity systems like Arm's big.LITTLE. With the availability of schedutil, the scheduler is now able to drive frequency selections based on actual task utilization. Moreover, the utilization clamping support provides a mechanism to bias the frequency selection operated by schedutil depending on constraints assigned to the tasks currently RUNNABLE on a CPU. Giving the mechanisms described above, it is now possible to extend the cpu controller to specify the minimum (or maximum) utilization which should be considered for tasks RUNNABLE on a cpu. This makes it possible to better defined the actual computational power assigned to task groups, thus improving the cgroup CPU bandwidth controller which is currently based just on time constraints. Extend the CPU controller with a couple of new attributes uclamp.{min,max} which allow to enforce utilization boosting and capping for all the tasks in a group. Specifically: - uclamp.min: defines the minimum utilization which should be considered i.e. the RUNNABLE tasks of this group will run at least at a minimum frequency which corresponds to the uclamp.min utilization - uclamp.max: defines the maximum utilization which should be considered i.e. the RUNNABLE tasks of this group will run up to a maximum frequency which corresponds to the uclamp.max utilization These attributes: a) are available only for non-root nodes, both on default and legacy hierarchies, while system wide clamps are defined by a generic interface which does not depends on cgroups. This system wide interface enforces constraints on tasks in the root node. b) enforce effective constraints at each level of the hierarchy which are a restriction of the group requests considering its parent's effective constraints. Root group effective constraints are defined by the system wide interface. This mechanism allows each (non-root) level of the hierarchy to: - request whatever clamp values it would like to get - effectively get only up to the maximum amount allowed by its parent c) have higher priority than task-specific clamps, defined via sched_setattr(), thus allowing to control and restrict task requests. Add two new attributes to the cpu controller to collect "requested" clamp values. Allow that at each non-root level of the hierarchy. Keep it simple by not caring now about "effective" values computation and propagation along the hierarchy. Update sysctl_sched_uclamp_handler() to use the newly introduced uclamp_mutex so that we serialize system default updates with cgroup relate updates. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Michal Koutny <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-03sched/topology: Improve load balancing on AMD EPYC systemsMatt Fleming1-1/+2
SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init() for any sched domains with a NUMA distance greater than 2 hops (RECLAIM_DISTANCE). The idea being that it's expensive to balance across domains that far apart. However, as is rather unfortunately explained in: commit 32e45ff43eaf ("mm: increase RECLAIM_DISTANCE to 30") the value for RECLAIM_DISTANCE is based on node distance tables from 2011-era hardware. Current AMD EPYC machines have the following NUMA node distances: node distances: node 0 1 2 3 4 5 6 7 0: 10 16 16 16 32 32 32 32 1: 16 10 16 16 32 32 32 32 2: 16 16 10 16 32 32 32 32 3: 16 16 16 10 32 32 32 32 4: 32 32 32 32 10 16 16 16 5: 32 32 32 32 16 10 16 16 6: 32 32 32 32 16 16 10 16 7: 32 32 32 32 16 16 16 10 where 2 hops is 32. The result is that the scheduler fails to load balance properly across NUMA nodes on different sockets -- 2 hops apart. For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4 (CPUs 32-39) like so, $ numactl -C 0-7,32-39 ./spinner 16 causes all threads to fork and remain on node 0 until the active balancer kicks in after a few seconds and forcibly moves some threads to node 4. Override node_reclaim_distance for AMD Zen. Signed-off-by: Matt Fleming <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-03sched/fair: Don't assign runtime for throttled cfs_rqLiangyan1-0/+5
do_sched_cfs_period_timer() will refill cfs_b runtime and call distribute_cfs_runtime to unthrottle cfs_rq, sometimes cfs_b->runtime will allocate all quota to one cfs_rq incorrectly, then other cfs_rqs attached to this cfs_b can't get runtime and will be throttled. We find that one throttled cfs_rq has non-negative cfs_rq->runtime_remaining and cause an unexpetced cast from s64 to u64 in snippet: distribute_cfs_runtime() { runtime = -cfs_rq->runtime_remaining + 1; } The runtime here will change to a large number and consume all cfs_b->runtime in this cfs_b period. According to Ben Segall, the throttled cfs_rq can have account_cfs_rq_runtime called on it because it is throttled before idle_balance, and the idle_balance calls update_rq_clock to add time that is accounted to the task. This commit prevents cfs_rq to be assgined new runtime if it has been throttled until that distribute_cfs_runtime is called. Signed-off-by: Liangyan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Reviewed-by: Ben Segall <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: d3d9dc330236 ("sched: Throttle entities exceeding their allowed bandwidth") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-03dma-mapping: introduce dma_get_merge_boundary()Yoshihiro Shimoda1-0/+11
This patch adds a new DMA API "dma_get_merge_boundary". This function returns the DMA merge boundary if the DMA layer can merge the segments. This patch also adds the implementation for a new dma_map_ops pointer. Signed-off-by: Yoshihiro Shimoda <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-09-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller6-18/+44
r8152 conflicts are the NAPI fixes in 'net' overlapping with some tasklet stuff in net-next Signed-off-by: David S. Miller <[email protected]>
2019-09-02Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar7-32/+61
Signed-off-by: Ingo Molnar <[email protected]>
2019-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds1-2/+6
Pull networking fixes from David Miller: 1) Fix some length checks during OGM processing in batman-adv, from Sven Eckelmann. 2) Fix regression that caused netfilter conntrack sysctls to not be per-netns any more. From Florian Westphal. 3) Use after free in netpoll, from Feng Sun. 4) Guard destruction of pfifo_fast per-cpu qdisc stats with qdisc_is_percpu_stats(), from Davide Caratti. Similar bug is fixed in pfifo_fast_enqueue(). 5) Fix memory leak in mld_del_delrec(), from Eric Dumazet. 6) Handle neigh events on internal ports correctly in nfp, from John Hurley. 7) Clear SKB timestamp in NF flow table code so that it does not confuse fq scheduler. From Florian Westphal. 8) taprio destroy can crash if it is invoked in a failure path of taprio_init(), because the list head isn't setup properly yet and the list del is unconditional. Perform the list add earlier to address this. From Vladimir Oltean. 9) Make sure to reapply vlan filters on device up, in aquantia driver. From Dmitry Bogdanov. 10) sgiseeq driver releases DMA memory using free_page() instead of dma_free_attrs(). From Christophe JAILLET. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits) net: seeq: Fix the function used to release some memory in an error handling path enetc: Add missing call to 'pci_free_irq_vectors()' in probe and remove functions net: bcmgenet: use ethtool_op_get_ts_info() tc-testing: don't hardcode 'ip' in nsPlugin.py net: dsa: microchip: add KSZ8563 compatibility string dt-bindings: net: dsa: document additional Microchip KSZ8563 switch net: aquantia: fix out of memory condition on rx side net: aquantia: linkstate irq should be oneshot net: aquantia: reapply vlan filters on up net: aquantia: fix limit of vlan filters net: aquantia: fix removal of vlan 0 net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte taprio: Fix kernel panic in taprio_destroy net: dsa: microchip: fill regmap_config name rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2] net: stmmac: dwmac-rk: Don't fail if phy regulator is absent amd-xgbe: Fix error path in xgbe_mod_init() netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorder mac80211: Correctly set noencrypt for PAE frames ...
2019-08-31tracing: Rename tracing_reset() to tracing_reset_cpu()Steven Rostedt (VMware)2-4/+3
The name tracing_reset() was a misnomer, as it really only reset a single CPU buffer. Rename it to tracing_reset_cpu() and also make it static and remove the prototype from trace.h, as it is only used in a single function. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing: Document the stack trace algorithm in the commentsSteven Rostedt (VMware)1-0/+98
As the max stack tracer algorithm is not that easy to understand from the code, add comments that explain the algorithm and mentions how ARCH_FTRACE_SHIFT_STACK_TRACER affects it. Link: http://lkml.kernel.org/r/[email protected] Suggested-by: Joel Fernandes <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/arm64: Have max stack tracer handle the case of return address after ↵Steven Rostedt (VMware)1-0/+14
data Most archs (well at least x86) store the function call return address on the stack before storing the local variables for the function. The max stack tracer depends on this in its algorithm to display the stack size of each function it finds in the back trace. Some archs (arm64), may store the return address (from its link register) just before calling a nested function. There's no reason to save the link register on leaf functions, as it wont be updated. This breaks the algorithm of the max stack tracer. Add a new define ARCH_FTRACE_SHIFT_STACK_TRACER that an architecture may set if it stores the return address (link register) after it stores the function's local variables, and have the stack trace shift the values of the mapped stack size to the appropriate functions. Link: [email protected] Reported-by: Jiping Ma <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/probe: Add immediate string parameter supportMasami Hiramatsu5-15/+51
Add immediate string parameter (\"string") support to probe events. This allows you to specify an immediate (or dummy) parameter instead of fetching a string from memory. This feature looks odd, but imagine that you put a probe on a code to trace some string data. If the code is compiled into 2 instructions and 1 instruction has a string on memory but other has no string since it is optimized out. In that case, you can not fold those into one event, even if ftrace supported multiple probes on one event. With this feature, you can set a dummy string like foo=\"(optimized)":string instead of something like foo=+0(+0(%bp)):string. Link: http://lkml.kernel.org/r/156095691687.28024.13372712423865047991.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/probe: Add immediate parameter supportMasami Hiramatsu3-1/+20
Add immediate value parameter (\1234) support to probe events. This allows you to specify an immediate (or dummy) parameter instead of fetching from memory or register. This feature looks odd, but imagine when you put a probe on a code to trace some data. If the code is compiled into 2 instructions and 1 instruction has a value but other has nothing since it is optimized out. In that case, you can not fold those into one event, even if ftrace supported multiple probes on one event. With this feature, you can set a dummy value like foo=\deadbeef instead of something like foo=%di. Link: http://lkml.kernel.org/r/156095690733.28024.13258186548822649469.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/uprobe: Add per-probe delete from eventMasami Hiramatsu1-1/+30
Add per-probe delete method from one event passing the head of definition. In other words, the events which match the head N parameters are deleted. Link: http://lkml.kernel.org/r/156095689811.28024.221706761151739433.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/kprobe: Add per-probe delete from eventMasami Hiramatsu3-1/+44
Allow user to delete a probe from event. This is done by head match. For example, if we have 2 probes on an event $ cat kprobe_events p:kprobes/testprobe _do_fork r1=%ax r2=%dx p:kprobes/testprobe idle_fork r1=%ax r2=%cx Then you can remove one of them by passing the head of definition which identify the probe. $ echo "-:kprobes/testprobe idle_fork" >> kprobe_events Link: http://lkml.kernel.org/r/156095688848.28024.15798690082378432435.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/uprobe: Add multi-probe per uprobe event supportMasami Hiramatsu2-19/+43
Allow user to define several probes on one uprobe event. Note that this only support appending method. So deleting event will delete all probes on the event. Link: http://lkml.kernel.org/r/156095687876.28024.13840331032234992863.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/kprobe: Add multi-probe per event supportMasami Hiramatsu4-18/+111
Add multi-probe per one event support to kprobe events. User can define several different probes on one trace event if those events have same "event signature", e.g. # echo p:testevent _do_fork > kprobe_events # echo p:testevent fork_idle >> kprobe_events # kprobe_events p:kprobes/testevent _do_fork p:kprobes/testevent fork_idle The event signature is defined by kprobe type (retprobe or not), the number of args, argument names, and argument types. Note that this only support appending method. Delete event operation will delete all probes on the event. Link: http://lkml.kernel.org/r/156095686913.28024.9357292202316540742.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/dynevent: Pass extra arguments to match operationMasami Hiramatsu5-10/+13
Pass extra arguments to match operation for checking exact match. If the event doesn't support exact match, it will be ignored. Link: http://lkml.kernel.org/r/156095685930.28024.10405547027475590975.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/dynevent: Delete all matched eventsMasami Hiramatsu1-3/+5
When user gives an event name to delete, delete all matched events instead of the first one. This means if there are several events which have same name but different group (subsystem) name, those are removed if user passed only the event name, e.g. # cat kprobe_events p:group1/testevent _do_fork p:group2/testevent fork_idle # echo -:testevent >> kprobe_events # cat kprobe_events # Link: http://lkml.kernel.org/r/156095684958.28024.16597826267117453638.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/probe: Split trace_event related data from trace_probeMasami Hiramatsu4-113/+311
Split the trace_event related data from trace_probe data structure and introduce trace_probe_event data structure for its folder. This trace_probe_event data structure can have multiple trace_probe. Link: http://lkml.kernel.org/r/156095683995.28024.7552150340561557873.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31kprobes: Allow kprobes coexist with livepatchMasami Hiramatsu1-16/+40
Allow kprobes which do not modify regs->ip, coexist with livepatch by dropping FTRACE_OPS_FL_IPMODIFY from ftrace_ops. User who wants to modify regs->ip (e.g. function fault injection) must set a dummy post_handler to its kprobes when registering. However, if such regs->ip modifying kprobes is set on a function, that function can not be livepatched. Link: http://lkml.kernel.org/r/156403587671.30117.5233558741694155985.stgit@devnote2 Acked-by: Joe Lawrence <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31Merge tag 'trace-v5.3-rc6' of ↵Linus Torvalds4-14/+34
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Small fixes and minor cleanups for tracing: - Make exported ftrace function not static - Fix NULL pointer dereference in reading probes as they are created - Fix NULL pointer dereference in k/uprobe clean up path - Various documentation fixes" * tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Correct kdoc formats ftrace/x86: Remove mcount() declaration tracing/probe: Fix null pointer dereference tracing: Make exported ftrace_set_clr_event non-static ftrace: Check for successful allocation of hash ftrace: Check for empty hash and comment the race with registering probes ftrace: Fix NULL pointer dereference in t_probe_next()
2019-08-31tracing: Correct kdoc formatsJakub Kicinski1-12/+14
Fix the following kdoc warnings: kernel/trace/trace.c:1579: warning: Function parameter or member 'tr' not described in 'update_max_tr_single' kernel/trace/trace.c:1579: warning: Function parameter or member 'tsk' not described in 'update_max_tr_single' kernel/trace/trace.c:1579: warning: Function parameter or member 'cpu' not described in 'update_max_tr_single' kernel/trace/trace.c:1776: warning: Function parameter or member 'type' not described in 'register_tracer' kernel/trace/trace.c:2239: warning: Function parameter or member 'task' not described in 'tracing_record_taskinfo' kernel/trace/trace.c:2239: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo' kernel/trace/trace.c:2269: warning: Function parameter or member 'prev' not described in 'tracing_record_taskinfo_sched_switch' kernel/trace/trace.c:2269: warning: Function parameter or member 'next' not described in 'tracing_record_taskinfo_sched_switch' kernel/trace/trace.c:2269: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo_sched_switch' kernel/trace/trace.c:3078: warning: Function parameter or member 'ip' not described in 'trace_vbprintk' kernel/trace/trace.c:3078: warning: Function parameter or member 'fmt' not described in 'trace_vbprintk' kernel/trace/trace.c:3078: warning: Function parameter or member 'args' not described in 'trace_vbprintk' Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing/probe: Fix null pointer dereferenceXinpeng Liu1-1/+2
BUG: KASAN: null-ptr-deref in trace_probe_cleanup+0x8d/0xd0 Read of size 8 at addr 0000000000000000 by task syz-executor.0/9746 trace_probe_cleanup+0x8d/0xd0 free_trace_kprobe.part.14+0x15/0x50 alloc_trace_kprobe+0x23e/0x250 Link: http://lkml.kernel.org/r/[email protected] Fixes: e3dc9f898ef9c ("tracing/probe: Add trace_event_call accesses APIs") Signed-off-by: Xinpeng Liu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-31tracing: Make exported ftrace_set_clr_event non-staticDenis Efremov1-1/+1
The function ftrace_set_clr_event is declared static and marked EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the function was decided to be a part of API, this commit removes the static attribute and adds the declaration to the header. Link: http://lkml.kernel.org/r/[email protected] Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances") Reviewed-by: Joe Jin <[email protected]> Signed-off-by: Denis Efremov <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-2/+6
Daniel Borkmann says: ==================== pull-request: bpf 2019-08-31 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix 32-bit zero-extension during constant blinding which has been causing a regression on ppc64, from Naveen. 2) Fix a latency bug in nfp driver when updating stack index register, from Jiong. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-08-30ftrace: Check for successful allocation of hashNaveen N. Rao1-0/+5
In register_ftrace_function_probe(), we are not checking the return value of alloc_and_copy_ftrace_hash(). The subsequent call to ftrace_match_records() may end up dereferencing the same. Add a check to ensure this doesn't happen. Link: http://lkml.kernel.org/r/26e92574f25ad23e7cafa3cf5f7a819de1832cbe.1562249521.git.naveen.n.rao@linux.vnet.ibm.com Cc: [email protected] Fixes: 1ec3a81a0cf42 ("ftrace: Have each function probe use its own ftrace_ops") Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-30ftrace: Check for empty hash and comment the race with registering probesSteven Rostedt (VMware)1-1/+9
The race between adding a function probe and reading the probes that exist is very subtle. It needs a comment. Also, the issue can also happen if the probe has has the EMPTY_HASH as its func_hash. Cc: [email protected] Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-30ftrace: Fix NULL pointer dereference in t_probe_next()Naveen N. Rao1-0/+4
LTP testsuite on powerpc results in the below crash: Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc00000000029d800 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV ... CPU: 68 PID: 96584 Comm: cat Kdump: loaded Tainted: G W NIP: c00000000029d800 LR: c00000000029dac4 CTR: c0000000001e6ad0 REGS: c0002017fae8ba10 TRAP: 0300 Tainted: G W MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022422 XER: 20040000 CFAR: c00000000029d90c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 ... NIP [c00000000029d800] t_probe_next+0x60/0x180 LR [c00000000029dac4] t_mod_start+0x1a4/0x1f0 Call Trace: [c0002017fae8bc90] [c000000000cdbc40] _cond_resched+0x10/0xb0 (unreliable) [c0002017fae8bce0] [c0000000002a15b0] t_start+0xf0/0x1c0 [c0002017fae8bd30] [c0000000004ec2b4] seq_read+0x184/0x640 [c0002017fae8bdd0] [c0000000004a57bc] sys_read+0x10c/0x300 [c0002017fae8be30] [c00000000000b388] system_call+0x5c/0x70 The test (ftrace_set_ftrace_filter.sh) is part of ftrace stress tests and the crash happens when the test does 'cat $TRACING_PATH/set_ftrace_filter'. The address points to the second line below, in t_probe_next(), where filter_hash is dereferenced: hash = iter->probe->ops.func_hash->filter_hash; size = 1 << hash->size_bits; This happens due to a race with register_ftrace_function_probe(). A new ftrace_func_probe is created and added into the func_probes list in trace_array under ftrace_lock. However, before initializing the filter, we drop ftrace_lock, and re-acquire it after acquiring regex_lock. If another process is trying to read set_ftrace_filter, it will be able to acquire ftrace_lock during this window and it will end up seeing a NULL filter_hash. Fix this by just checking for a NULL filter_hash in t_probe_next(). If the filter_hash is NULL, then this probe is just being added and we can simply return from here. Link: http://lkml.kernel.org/r/05e021f757625cbbb006fad41380323dbe4e3b43.1562249521.git.naveen.n.rao@linux.vnet.ibm.com Cc: [email protected] Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array") Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-08-30Merge branch 'for-next/atomics' into for-next/coreWill Deacon1-1/+3
* for-next/atomics: (10 commits) Rework LSE instruction selection to use static keys instead of alternatives
2019-08-30Merge branch 'topic/mem-encrypt' into nextMichael Ellerman2-10/+1
This branch has some cross-arch patches that are a prequisite for the SVM work. They're in a topic branch in case any of the other arch maintainers want to merge them to resolve conflicts.
2019-08-29dma-mapping: make dma_atomic_pool_init self-containedChristoph Hellwig1-3/+14
The memory allocated for the atomic pool needs to have the same mapping attributes that we use for remapping, so use pgprot_dmacoherent instead of open coding it. Also deduct a suitable zone to allocate the memory from based on the presence of the DMA zones. Signed-off-by: Christoph Hellwig <[email protected]>
2019-08-29dma-mapping: remove arch_dma_mmap_pgprotChristoph Hellwig2-6/+14
arch_dma_mmap_pgprot is used for two things: 1) to override the "normal" uncached page attributes for mapping memory coherent to devices that can't snoop the CPU caches 2) to provide the special DMA_ATTR_WRITE_COMBINE semantics on older arm systems and some mips platforms Replace one with the pgprot_dmacoherent macro that is already provided by arm and much simpler to use, and lift the DMA_ATTR_WRITE_COMBINE handling to common code with an explicit arch opt-in. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> # m68k Acked-by: Paul Burton <[email protected]> # mips
2019-08-29jump_label: Don't warn on __exit jump entriesAndrew Murray1-1/+3
On architectures that discard .exit.* sections at runtime, a warning is printed for each jump label that is used within an in-kernel __exit annotated function: can't patch jump_label at ehci_hcd_cleanup+0x8/0x3c WARNING: CPU: 0 PID: 1 at kernel/jump_label.c:410 __jump_label_update+0x12c/0x138 As these functions will never get executed (they are free'd along with the rest of initmem) - we do not need to patch them and should not display any warnings. The warning is displayed because the test required to satisfy jump_entry_is_init is based on init_section_contains (__init_begin to __init_end) whereas the test in __jump_label_update is based on init_kernel_text (_sinittext to _einittext) via kernel_text_address). Fixes: 19483677684b ("jump_label: Annotate entries that operate on __init code earlier") Signed-off-by: Andrew Murray <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mark Rutland <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2019-08-29posix-cpu-timers: Make expiry_active check actually work correctlyThomas Gleixner1-4/+5
The state tracking changes broke the expiry active check by not writing to it and instead sitting timers_active, which is already set. That's not a big issue as the actual expiry is protected by sighand lock, so concurrent handling is not possible. That means that the second task which invokes that function executes the expiry code for nothing. Write to the proper flag. Also add a check whether the flag is set into check_process_timers(). That check had been missing in the code before the rework already. The check for another task handling the expiry of process wide timers was only done in the fastpath check. If the fastpath check returns true because a per task timer expired, then the checking of process wide timers was done in parallel which is as explained above just a waste of cycles. Fixes: 244d49e30653 ("posix-cpu-timers: Move state tracking to struct posix_cputimers") Signed-off-by: Thomas Gleixner <[email protected]> Cc: Frederic Weisbecker <[email protected]>
2019-08-28Merge tag 'arm64-fixes' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Hot on the heels of our last set of fixes are a few more for -rc7. Two of them are fixing issues with our virtual interrupt controller implementation in KVM/arm, while the other is a longstanding but straightforward kallsyms fix which was been acked by Masami and resolves an initialisation failure in kprobes observed on arm64. - Fix GICv2 emulation bug (KVM) - Fix deadlock in virtual GIC interrupt injection code (KVM) - Fix kprobes blacklist init failure due to broken kallsyms lookup" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is long kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol
2019-08-28tick: Mark sched_timer to expire in hard interrupt contextSebastian Andrzej Siewior1-1/+1
sched_timer must be initialized with the _HARD mode suffix to ensure expiry in hard interrupt context on RT. The previous conversion to HARD expiry mode missed on one instance in tick_nohz_switch_to_nohz(). Fix it up. Fixes: 902a9f9c50905 ("tick: Mark tick related hrtimers to expiry in hard interrupt context") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-08-28genirq/affinity: Remove const qualifier from node_to_cpumask argumentMing Lei1-1/+1
When CONFIG_CPUMASK_OFFSTACK isn't enabled, 'cpumask_var_t' is as 'typedef struct cpumask cpumask_var_t[1]', so the argument 'node_to_cpumask' alloc_nodes_vectors() can't be declared as 'const cpumask_var_t *' Fixes the following warning: kernel/irq/affinity.c: In function '__irq_build_affinity_masks': alloc_nodes_vectors(numvecs, node_to_cpumask, cpu_mask, ^ kernel/irq/affinity.c:128:13: note: expected 'const struct cpumask (*)[1]' but argument is of type 'struct cpumask (*)[1]' static void alloc_nodes_vectors(unsigned int numvecs, ^ Fixes: b1a5a73e64e9 ("genirq/affinity: Spread vectors on node according to nr_cpu ratio") Reported-by: kbuild test robot <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]