aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-02-13irqdomain: Drop revmap mutexJohan Hovold1-7/+6
The revmap mutex is essentially only used to maintain the integrity of the radix tree during updates (lookups use RCU). As the global irq_domain_mutex is now held in all paths that update the revmap structures there is strictly no longer any need for the dedicated mutex, which can be removed. Drop the revmap mutex and add lockdep assertions to the revmap helpers to make sure that the global lock is always held when updating the revmap. Tested-by: Hsin-Yi Wang <[email protected]> Tested-by: Mark-PK Tsai <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-13irqdomain: Fix domain registration raceMarc Zyngier1-19/+43
Hierarchical domains created using irq_domain_create_hierarchy() are currently added to the domain list before having been fully initialised. This specifically means that a racing allocation request might fail to allocate irq data for the inner domains of a hierarchy in case the parent domain pointer has not yet been set up. Note that this is not really any issue for irqchip drivers that are registered early (e.g. via IRQCHIP_DECLARE() or IRQCHIP_ACPI_DECLARE()) but could potentially cause trouble with drivers that are registered later (e.g. modular drivers using IRQCHIP_PLATFORM_DRIVER_BEGIN(), gpiochip drivers, etc.). Fixes: afb7da83b9f4 ("irqdomain: Introduce helper function irq_domain_add_hierarchy()") Cc: [email protected] # 3.19 Signed-off-by: Marc Zyngier <[email protected]> [ johan: add commit message ] Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-13irqdomain: Fix mapping-creation raceJohan Hovold1-18/+46
Parallel probing of devices that share interrupts (e.g. when a driver uses asynchronous probing) can currently result in two mappings for the same hardware interrupt to be created due to missing serialisation. Make sure to hold the irq_domain_mutex when creating mappings so that looking for an existing mapping before creating a new one is done atomically. Fixes: 765230b5f084 ("driver-core: add asynchronous probing support for drivers") Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings") Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] # 4.8 Cc: Dmitry Torokhov <[email protected]> Cc: Jon Hunter <[email protected]> Tested-by: Hsin-Yi Wang <[email protected]> Tested-by: Mark-PK Tsai <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-13irqdomain: Refactor __irq_domain_alloc_irqs()Johan Hovold1-40/+48
Refactor __irq_domain_alloc_irqs() so that it can be called internally while holding the irq_domain_mutex. This will be used to fix a shared-interrupt mapping race, hence the Fixes tag. Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings") Cc: [email protected] # 4.8 Tested-by: Hsin-Yi Wang <[email protected]> Tested-by: Mark-PK Tsai <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-13irqdomain: Look for existing mapping only onceJohan Hovold1-27/+33
Avoid looking for an existing mapping twice when creating a new mapping using irq_create_fwspec_mapping() by factoring out the actual allocation which is shared with irq_create_mapping_affinity(). The new helper function will also be used to fix a shared-interrupt mapping race, hence the Fixes tag. Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings") Cc: [email protected] # 4.8 Tested-by: Hsin-Yi Wang <[email protected]> Tested-by: Mark-PK Tsai <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-13irqdomain: Drop bogus fwspec-mapping error handlingJohan Hovold1-6/+1
In case a newly allocated IRQ ever ends up not having any associated struct irq_data it would not even be possible to dispose the mapping. Replace the bogus disposal with a WARN_ON(). This will also be used to fix a shared-interrupt mapping race, hence the CC-stable tag. Fixes: 1e2a7d78499e ("irqdomain: Don't set type when mapping an IRQ") Cc: [email protected] # 4.8 Tested-by: Hsin-Yi Wang <[email protected]> Tested-by: Mark-PK Tsai <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-13irqdomain: Fix disassociation raceJohan Hovold1-0/+5
The global irq_domain_mutex is held when mapping interrupts from non-hierarchical domains but currently not when disposing them. This specifically means that updates of the domain mapcount is racy (currently only used for statistics in debugfs). Make sure to hold the global irq_domain_mutex also when disposing mappings from non-hierarchical domains. Fixes: 9dc6be3d4193 ("genirq/irqdomain: Add map counter") Cc: [email protected] # 4.13 Tested-by: Hsin-Yi Wang <[email protected]> Tested-by: Mark-PK Tsai <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-13irqdomain: Fix association raceJohan Hovold1-5/+14
The sanity check for an already mapped virq is done outside of the irq_domain_mutex-protected section which means that an (unlikely) racing association may not be detected. Fix this by factoring out the association implementation, which will also be used in a follow-on change to fix a shared-interrupt mapping race. Fixes: ddaf144c61da ("irqdomain: Refactor irq_domain_associate_many()") Cc: [email protected] # 3.11 Tested-by: Hsin-Yi Wang <[email protected]> Tested-by: Mark-PK Tsai <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-13Merge tag 'clocksource.2023.02.06b' of ↵Thomas Gleixner2-22/+56
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into timers/core Pull clocksource watchdog changes from Paul McKenney: o Improvements to clocksource-watchdog console messages. o Loosening of the clocksource-watchdog skew criteria to match those of NTP (500 parts per million, relaxed from 400 parts per million). If it is good enough for NTP, it is good enough for the clocksource watchdog. o Suspend clocksource-watchdog checking temporarily when high memory latencies are detected. This avoids the false-positive clock-skew events that have been seen on production systems running memory-intensive workloads. o On systems where the TSC is deemed trustworthy, use it as the watchdog timesource, but only when specifically requested using the tsc=watchdog kernel boot parameter. This permits clock-skew events to be detected, but avoids forcing workloads to use the slow HPET and ACPI PM timers. These last two timers are slow enough to cause systems to be needlessly marked bad on the one hand, and real skew does sometimes happen on production systems running production workloads on the other. And sometimes it is the fault of the TSC, or at least of the firmware that told the kernel to program the TSC with the wrong frequency. o Add a tsc=revalidate kernel boot parameter to allow the kernel to diagnose cases where the TSC hardware works fine, but was told by firmware to tick at the wrong frequency. Such cases are rare, but they really have happened on production systems. Link: https://lore.kernel.org/r/20230210193640.GA3325193@paulmck-ThinkPad-P17-Gen-1
2023-02-13sched/core: Fix a missed update of user_cpus_ptrWaiman Long1-1/+4
Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask"), a successful call to sched_setaffinity() should always save the user requested cpu affinity mask in a task's user_cpus_ptr. However, when the given cpu mask is the same as the current one, user_cpus_ptr is not updated. Fix this by saving the user mask in this case too. Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask") Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-02-13freezer,umh: Fix call_usermode_helper_exec() vs SIGKILLPeter Zijlstra1-7/+13
Tetsuo-San noted that commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic") broke call_usermodehelper_exec() for the KILLABLE case. Specifically it was missed that the second, unconditional, wait_for_completion() was not optional and ensures the on-stack completion is unused before going out-of-scope. Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic") Reported-by: [email protected] Reported-by: Tetsuo Handa <[email protected]> Debugged-by: Tetsuo Handa <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-02-12tracing: Make trace_define_field_ext() staticSteven Rostedt (Google)1-1/+1
trace_define_field_ext() is not used outside of trace_events.c, it should be static. Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: b6c7abd1c28a ("tracing: Fix TASK_COMM_LEN in trace event format file") Reported-by: Reported-by: kernel test robot <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-12Merge tag 'trace-v6.2-rc7' of ↵Linus Torvalds3-10/+33
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: "Fix showing of TASK_COMM_LEN instead of its value The TASK_COMM_LEN was converted from a macro into an enum so that BTF would have access to it. But this unfortunately caused TASK_COMM_LEN to display in the format fields of trace events, as they are created by the TRACE_EVENT() macro and such, macros convert to their values, where as enums do not. To handle this, instead of using the field itself to be display, save the value of the array size as another field in the trace_event_fields structure, and use that instead. Not only does this fix the issue, but also converts the other trace events that have this same problem (but were not breaking tooling). With this change, the original work around b3bc8547d3be6 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well") could be reverted (but that should be done in the merge window)" * tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix TASK_COMM_LEN in trace event format file
2023-02-12tracing: Fix TASK_COMM_LEN in trace event format fileYafang Shao3-10/+33
After commit 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN"), the content of the format file under /sys/kernel/tracing/events/task/task_newtask was changed from field:char comm[16]; offset:12; size:16; signed:0; to field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0; John reported that this change breaks older versions of perfetto. Then Mathieu pointed out that this behavioral change was caused by the use of __stringify(_len), which happens to work on macros, but not on enum labels. And he also gave the suggestion on how to fix it: :One possible solution to make this more robust would be to extend :struct trace_event_fields with one more field that indicates the length :of an array as an actual integer, without storing it in its stringified :form in the type, and do the formatting in f_show where it belongs. The result as follows after this change, $ cat /sys/kernel/tracing/events/task/task_newtask/format field:char comm[16]; offset:12; size:16; signed:0; Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Alexei Starovoitov <[email protected]> Cc: Kajetan Puchalski <[email protected]> CC: Qais Yousef <[email protected]> Fixes: 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN") Reported-by: John Stultz <[email protected]> Debugged-by: Mathieu Desnoyers <[email protected]> Suggested-by: Mathieu Desnoyers <[email protected]> Suggested-by: Steven Rostedt <[email protected]> Signed-off-by: Yafang Shao <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-11Merge tag 'locking-urgent-2023-02-11' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix an rtmutex missed-wakeup bug" * tag 'locking-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rtmutex: Ensure that the top waiter is always woken up
2023-02-11sched/rt: pick_next_rt_entity(): check list_entryPietro Borrello1-1/+4
Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") removed any path which could make pick_next_rt_entity() return NULL. However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of pick_next_rt_entity()) still checks the error condition, which can never happen, since list_entry() never returns NULL. Remove the BUG_ON check, and instead emit a warning in the only possible error condition here: the queue being empty which should never happen. Fixes: 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") Signed-off-by: Pietro Borrello <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Phil Auld <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Link: https://lore.kernel.org/r/20230128-list-entry-null-check-sched-v3-1-b1a71bd1ac6b@diag.uniroma1.it
2023-02-11sched/deadline: Add more reschedule cases to prio_changed_dl()Valentin Schneider1-15/+27
I've been tracking down an issue on a ~5.17ish kernel where: CPUx CPUy <DL task p0 owns an rtmutex M> <p0 depletes its runtime, gets throttled> <rq switches to the idle task> <DL task p1 blocks on M, boost/replenish p0> <No call to resched_curr() happens here> [idle task keeps running here until *something* accidentally sets TIF_NEED_RESCHED] On that kernel, it is quite easy to trigger using rt-tests's deadline_test [1] with the test running on isolated CPUs (this reduces the chance of something unrelated setting TIF_NEED_RESCHED on the idle tasks, making the issue even more obvious as the hung task detector chimes in). I haven't been able to reproduce this using a mainline kernel, even if I revert 2972e3050e35 ("tracing: Make trace_marker{,_raw} stream-like") which gets rid of the lock involved in the above test, *but* I cannot convince myself the issue isn't there from looking at the code. Make prio_changed_dl() issue a reschedule if the current task isn't a deadline one. While at it, ensure a reschedule is emitted when a queued-but-not-current task gets boosted with an earlier deadline that current's. [1]: https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git Signed-off-by: Valentin Schneider <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Juri Lelli <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-11sched/fair: sanitize vruntime of entity being placedZhang Qiao1-2/+13
When a scheduling entity is placed onto cfs_rq, its vruntime is pulled to the base level (around cfs_rq->min_vruntime), so that the entity doesn't gain extra boost when placed backwards. However, if the entity being placed wasn't executed for a long time, its vruntime may get too far behind (e.g. while cfs_rq was executing a low-weight hog), which can inverse the vruntime comparison due to s64 overflow. This results in the entity being placed with its original vruntime way forwards, so that it will effectively never get to the cpu. To prevent that, ignore the vruntime of the entity being placed if it didn't execute for much longer than the characteristic sheduler time scale. [rkagan: formatted, adjusted commit log, comments, cutoff value] Signed-off-by: Zhang Qiao <[email protected]> Co-developed-by: Roman Kagan <[email protected]> Signed-off-by: Roman Kagan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-02-11sched/fair: Remove capacity inversion detectionVincent Guittot2-98/+5
Remove the capacity inversion detection which is now handled by util_fits_cpu() returning -1 when we need to continue to look for a potential CPU with better performance. This ends up almost reverting patches below except for some comments: commit da07d2f9c153 ("sched/fair: Fixes for capacity inversion detection") commit aa69c36f31aa ("sched/fair: Consider capacity inversion in util_fits_cpu()") commit 44c7b80bffc3 ("sched/fair: Detect capacity inversion") Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-11sched/fair: unlink misfit task from cpu overutilizedVincent Guittot1-23/+82
By taking into account uclamp_min, the 1:1 relation between task misfit and cpu overutilized is no more true as a task with a small util_avg may not fit a high capacity cpu because of uclamp_min constraint. Add a new state in util_fits_cpu() to reflect the case that task would fit a CPU except for the uclamp_min hint which is a performance requirement. Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we can use this new value to take additional action to select the best CPU that doesn't match uclamp_min hint. When util_fits_cpu() returns -1, we will continue to look for a possible CPU with better performance, which replaces Capacity Inversion detection with capacity_orig_of() - thermal_load_avg to detect a capacity inversion. Signed-off-by: Vincent Guittot <[email protected]> Reviewed-and-tested-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Dietmar Eggemann <[email protected]> Tested-by: Kajetan Puchalski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-10bpf: allow to disable bpf prog memory accountingYafang Shao1-6/+7
We can simply disable the bpf prog memory accouting by not setting the GFP_ACCOUNT. Signed-off-by: Yafang Shao <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Roman Gushchin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-10bpf: allow to disable bpf map memory accountingYafang Shao2-3/+5
We can simply set root memcg as the map's memcg to disable bpf memory accounting. bpf_map_area_alloc is a little special as it gets the memcg from current rather than from the map, so we need to disable GFP_ACCOUNT specifically for it. Signed-off-by: Yafang Shao <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Roman Gushchin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-10bpf: use bpf_map_kvcalloc in bpf_local_storageYafang Shao2-2/+17
Introduce new helper bpf_map_kvcalloc() for the memory allocation in bpf_local_storage(). Then the allocation will charge the memory from the map instead of from current, though currently they are the same thing as it is only used in map creation path now. By charging map's memory into the memcg from the map, it will be more clear. Signed-off-by: Yafang Shao <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Roman Gushchin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-10Daniel Borkmann says:Jakub Kicinski14-85/+530
==================== pull-request: bpf-next 2023-02-11 We've added 96 non-merge commits during the last 14 day(s) which contain a total of 152 files changed, 4884 insertions(+), 962 deletions(-). There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c between commit 5b246e533d01 ("ice: split probe into smaller functions") from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev() is otherwise there a 2nd time, and add XDP features to the existing ice_cfg_netdev() one: [...] ice_set_netdev_features(netdev); netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); [...] Stephen's merge conflict mail: https://lore.kernel.org/bpf/[email protected]/ The main changes are: 1) Add support for BPF trampoline on s390x which finally allows to remove many test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich. 2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski. 3) Add capability to export the XDP features supported by the NIC. Along with that, add a XDP compliance test tool, from Lorenzo Bianconi & Marek Majtyka. 4) Add __bpf_kfunc tag for marking kernel functions as kfuncs, from David Vernet. 5) Add a deep dive documentation about the verifier's register liveness tracking algorithm, from Eduard Zingerman. 6) Fix and follow-up cleanups for resolve_btfids to be compiled as a host program to avoid cross compile issues, from Jiri Olsa & Ian Rogers. 7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted when testing on different NICs, from Jesper Dangaard Brouer. 8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang. 9) Extend libbpf to add an option for when the perf buffer should wake up, from Jon Doron. 10) Follow-up fix on xdp_metadata selftest to just consume on TX completion, from Stanislav Fomichev. 11) Extend the kfuncs.rst document with description on kfunc lifecycle & stability expectations, from David Vernet. 12) Fix bpftool prog profile to skip attaching to offline CPUs, from Tonghao Zhang. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-10net: skbuff: drop the word head from skb cacheJakub Kicinski1-1/+1
skbuff_head_cache is misnamed (perhaps for historical reasons?) because it does not hold heads. Head is the buffer which skb->data points to, and also where shinfo lives. struct sk_buff is a metadata structure, not the head. Eric recently added skb_small_head_cache (which allocates actual head buffers), let that serve as an excuse to finally clean this up :) Leave the user-space visible name intact, it could possibly be uAPI. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-02-09hung_task: print message when hung_task_warnings gets down to zero.fuyuanli1-0/+2
It's useful to report it when hung_task_warnings gets down to zero, so that we can know if kernel log was lost or there is no hung task was detected. Link: https://lkml.kernel.org/r/20230201135416.GA6560@didi-ThinkCentre-M920t-N000 Signed-off-by: fuyuanli <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan5-7/+7
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [[email protected]: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Sebastian Reichel <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arjun Roy <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Punit Agrawal <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-09mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASKSuren Baghdasaryan1-1/+1
To simplify the usage of VM_LOCKED_CLEAR_MASK in vm_flags_clear(), replace it with VM_LOCKED_MASK bitmask and convert all users. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arjun Roy <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Punit Agrawal <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-09kernel/fork: convert vma assignment to a memcpySuren Baghdasaryan1-1/+1
Patch series "introduce vm_flags modifier functions", v4. This patchset was originally published as a part of per-VMA locking [1] and was split after suggestion that it's viable on its own and to facilitate the review process. It is now a preprequisite for the next version of per-VMA lock patchset, which reuses vm_flags modifier functions to lock the VMA when vm_flags are being updated. VMA vm_flags modifications are usually done under exclusive mmap_lock protection because this attrubute affects other decisions like VMA merging or splitting and races should be prevented. Introduce vm_flags modifier functions to enforce correct locking. This patch (of 7): Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler errors when we add a const modifier to vma->vm_flags. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arjun Roy <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Punit Agrawal <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Sebastian Reichel <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-09mm/mmap: remove __vma_adjust()Liam R. Howlett1-1/+1
Inline the work of __vma_adjust() into vma_merge(). This reduces code size and has the added benefits of the comments for the cases being located with the code. Change the comments referencing vma_adjust() accordingly. [[email protected]: fix vma_merge() offset when expanding the next vma] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-09sched: convert to vma iteratorLiam R. Howlett1-7/+7
Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-09kernel/fork: convert forking to using the vmi iteratorLiam R. Howlett1-11/+8
Avoid using the maple tree interface directly. This gains type safety. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-35/+54
net/devlink/leftover.c / net/core/devlink.c: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global") f05bd8ebeb69 ("devlink: move code to a dedicated directory") 687125b5799c ("devlink: split out core code") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-09PM: EM: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-4/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-02-09time/debug: Fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-1/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-08kernel/fail_function: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-4/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: Andrew Morton <[email protected]> Reviewed-by: Yang Yingliang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-02-08kernel/power/energy_model.c: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-4/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: "Rafael J. Wysocki" <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Len Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-02-08kernel/time/test_udelay.c: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-1/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-02-07sched/topology: Introduce sched_numa_hop_mask()Valentin Schneider1-0/+33
Tariq has pointed out that drivers allocating IRQ vectors would benefit from having smarter NUMA-awareness - cpumask_local_spread() only knows about the local node and everything outside is in the same bucket. sched_domains_numa_masks is pretty much what we want to hand out (a cpumask of CPUs reachable within a given distance budget), introduce sched_numa_hop_mask() to export those cpumasks. Link: http://lore.kernel.org/r/[email protected] Signed-off-by: Valentin Schneider <[email protected]> Reviewed-by: Yury Norov <[email protected]> Signed-off-by: Yury Norov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-07sched: add sched_numa_find_nth_cpu()Yury Norov1-0/+57
The function finds Nth set CPU in a given cpumask starting from a given node. Leveraging the fact that each hop in sched_domains_numa_masks includes the same or greater number of CPUs than the previous one, we can use binary search on hops instead of linear walk, which makes the overall complexity of O(log n) in terms of number of cpumask_weight() calls. Signed-off-by: Yury Norov <[email protected]> Acked-by: Tariq Toukan <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Peter Lafreniere <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-07tracing: Allow boot instances to have snapshot buffersSteven Rostedt (Google)1-7/+72
Add to ftrace_boot_snapshot, "=<instance>" name, where the instance will get a snapshot buffer, and will take a snapshot at the end of boot (which will save the boot traces). Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing: Add trace_array_puts() to write into instanceSteven Rostedt (Google)1-10/+17
Add a generic trace_array_puts() that can be used to "trace_puts()" into an allocated trace_array instance. This is just another variant of trace_array_printk(). Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing: Add enabling of events to boot instancesSteven Rostedt (Google)3-5/+10
Add the format of: trace_instance=foo,sched:sched_switch,irq_handler_entry,initcall That will create the "foo" instance and enable the sched_switch event (here were the "sched" system is explicitly specified), the irq_handler_entry event, and all events under the system initcall. Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing: Add creation of instances at boot command lineSteven Rostedt (Google)1-0/+50
Add kernel command line to add tracing instances. This only creates instances at boot but still does not enable any events to them. Later changes will extend this command line to add enabling of events, filters, and triggers. As well as possibly redirecting trace_printk()! Link: https://lkml.kernel.org/r/[email protected] Cc: Randy Dunlap <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing: Fix trace_event_raw_event_synth() if else statementSteven Rostedt (Google)1-2/+2
The test to check if the field is a stack is to be done if it is not a string. But the code had: } if (event->fields[i]->is_stack) { and not } else if (event->fields[i]->is_stack) { which would cause it to always be tested. Worse yet, this also included an "else" statement that was only to be called if the field was not a string and a stack, but this code allows it to be called if it was a string (and not a stack). Also fixed some whitespace issues. Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Tom Zanussi <[email protected]> Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces") Reported-by: kernel test robot <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]>
2023-02-07tracing: Acquire buffer from temparary trace sequenceLinyu Yuan1-0/+23
there is one dwc3 trace event declare as below, DECLARE_EVENT_CLASS(dwc3_log_event, TP_PROTO(u32 event, struct dwc3 *dwc), TP_ARGS(event, dwc), TP_STRUCT__entry( __field(u32, event) __field(u32, ep0state) __dynamic_array(char, str, DWC3_MSG_MAX) ), TP_fast_assign( __entry->event = event; __entry->ep0state = dwc->ep0state; ), TP_printk("event (%08x): %s", __entry->event, dwc3_decode_event(__get_str(str), DWC3_MSG_MAX, __entry->event, __entry->ep0state)) ); the problem is when trace function called, it will allocate up to DWC3_MSG_MAX bytes from trace event buffer, but never fill the buffer during fast assignment, it only fill the buffer when output function are called, so this means if output function are not called, the buffer will never used. add __get_buf(len) which acquiree buffer from iter->tmp_seq when trace output function called, it allow user write string to acquired buffer. the mentioned dwc3 trace event will changed as below, DECLARE_EVENT_CLASS(dwc3_log_event, TP_PROTO(u32 event, struct dwc3 *dwc), TP_ARGS(event, dwc), TP_STRUCT__entry( __field(u32, event) __field(u32, ep0state) ), TP_fast_assign( __entry->event = event; __entry->ep0state = dwc->ep0state; ), TP_printk("event (%08x): %s", __entry->event, dwc3_decode_event(__get_buf(DWC3_MSG_MAX), DWC3_MSG_MAX, __entry->event, __entry->ep0state)) );. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Linyu Yuan <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing/osnoise: No need for schedule_hrtimeout rangeDavidlohr Bueso1-1/+1
No slack time is being passed, just use schedule_hrtimeout(). Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07Merge tag 'trace-v6.2-rc6' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: "Fix regression in poll() and select() With the fix that made poll() and select() block if read would block caused a slight regression in rasdaemon, as it needed that kind of behavior. Add a way to make that behavior come back by writing zero into the 'buffer_percentage', which means to never block on read" * tag 'trace-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
2023-02-07fanotify,audit: Allow audit to use the full permission event responseRichard Guy Briggs1-3/+15
This patch passes the full response so that the audit function can use all of it. The audit function was updated to log the additional information in the AUDIT_FANOTIFY record. Currently the only type of fanotify info that is defined is an audit rule number, but convert it to hex encoding to future-proof the field. Hex encoding suggested by Paul Moore <[email protected]>. The {subj,obj}_trust values are {0,1,2}, corresponding to no, yes, unknown. Sample records: type=FANOTIFY msg=audit(1600385147.372:590): resp=2 fan_type=1 fan_info=3137 subj_trust=3 obj_trust=5 type=FANOTIFY msg=audit(1659730979.839:284): resp=1 fan_type=0 fan_info=0 subj_trust=2 obj_trust=2 Suggested-by: Steve Grubb <[email protected]> Link: https://lore.kernel.org/r/3075502.aeNJFYEL58@x2 Tested-by: Steve Grubb <[email protected]> Acked-by: Steve Grubb <[email protected]> Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <bcb6d552e517b8751ece153e516d8b073459069c.1675373475.git.rgb@redhat.com>
2023-02-07fanotify: Ensure consistent variable type for responseRichard Guy Briggs1-1/+1
The user space API for the response variable is __u32. This patch makes sure that the whole path through the kernel uses u32 so that there is no sign extension or truncation of the user space response. Suggested-by: Steve Grubb <[email protected]> Link: https://lore.kernel.org/r/12617626.uLZWGnKmhe@x2 Signed-off-by: Richard Guy Briggs <[email protected]> Acked-by: Paul Moore <[email protected]> Tested-by: Steve Grubb <[email protected]> Acked-by: Steve Grubb <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <3778cb0b3501bc4e686ba7770b20eb9ab0506cf4.1675373475.git.rgb@redhat.com>