blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message	Author	Files	Lines
2022-05-02	genirq: Use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()	Minghao Chi	1	-9/+4
	pm_runtime_resume_and_get() achieves the same and simplifies the code. [ tglx: Simplify it further by presetting retval ] Reported-by: Zeal Robot <[email protected]> Signed-off-by: Minghao Chi <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-02	timekeeping: Consolidate fast timekeeper	Thomas Gleixner	1	-10/+10
	Provide an inline function which replaces the copy & pasta. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-02	timekeeping: Annotate ktime_get_boot_fast_ns() with data_race()	Thomas Gleixner	1	-1/+1
	Accessing timekeeper::offset_boot in ktime_get_boot_fast_ns() is an intended data race as the reader side cannot synchronize with a writer and there is no space in struct tk_read_base of the NMI safe timekeeper. Mark it so. Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-01	mm: Fix PASID use-after-free issue	Fenghua Yu	1	-1/+1
	The PASID is being freed too early. It needs to stay around until after device drivers that might be using it have had a chance to clear it out of the hardware. The relevant refcounts are: mmget() /mmput() refcount the mm's address space mmgrab()/mmdrop() refcount the mm itself The PASID is currently tied to the life of the mm's address space and freed in __mmput(). This makes logical sense because the PASID can't be used once the address space is gone. But, this misses an important point: even after the address space is gone, the PASID will still be programmed into a device. Device drivers might, for instance, still need to flush operations that are outstanding and need to use that PASID. They do this at file->release() time. Device drivers call the IOMMU driver to hold a reference on the mm itself and drop it at file->release() time. But, the IOMMU driver holds a reference on the mm itself, not the address space. The address space (and the PASID) is long gone by the time the driver tries to clean up. This is effectively a use-after-free bug on the PASID. To fix this, move the PASID free operation from __mmput() to __mmdrop(). This ensures that the IOMMU driver's existing mmgrab() keeps the PASID allocated until it drops its mm reference. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Reported-by: Zhangfei Gao <[email protected]> Suggested-by: Jean-Philippe Brucker <[email protected]> Suggested-by: Jacob Pan <[email protected]> Signed-off-by: Fenghua Yu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Zhangfei Gao <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
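The two reference counts described above can be modelled in a few lines. This is a minimal userspace sketch in Python, not kernel code: the `Mm` class, the `free_pasid_in_mmdrop` switch, and the PASID value 42 are hypothetical names and values chosen for illustration. It assumes only what the commit message states, namely that the address-space count is dropped by mmput(), the count on the mm itself by mmdrop(), and that the IOMMU driver holds an mmgrab() reference until file->release().

```python
class Mm:
    """Toy model of an mm with its two reference counts (illustrative only)."""

    def __init__(self, free_pasid_in_mmdrop):
        self.mm_users = 1   # address-space references: mmget()/mmput()
        self.mm_count = 1   # references on the mm itself: mmgrab()/mmdrop()
        self.pasid = 42     # hypothetical PASID value
        self.free_pasid_in_mmdrop = free_pasid_in_mmdrop

    def mmgrab(self):
        self.mm_count += 1

    def mmdrop(self):
        self.mm_count -= 1
        if self.mm_count == 0 and self.free_pasid_in_mmdrop:
            self.pasid = None          # new behaviour: free with the mm itself

    def mmput(self):
        self.mm_users -= 1
        if self.mm_users == 0:
            if not self.free_pasid_in_mmdrop:
                self.pasid = None      # old behaviour: free with the address space
            self.mmdrop()              # the address space held one mm reference


def pasid_at_release(free_pasid_in_mmdrop):
    """Return (PASID seen by the driver at release time, PASID afterwards)."""
    mm = Mm(free_pasid_in_mmdrop)
    mm.mmgrab()          # IOMMU driver takes a reference on the mm itself
    mm.mmput()           # process exits: the address space goes away
    seen = mm.pasid      # driver cleanup at file->release() time
    mm.mmdrop()          # driver drops its reference
    return seen, mm.pasid
```

With the old placement the driver finds the PASID already gone at release time; with the free moved to the mmdrop side, the PASID survives until the driver drops its own reference.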
2022-05-01	smp: Make softirq handling RT safe in flush_smp_call_function_queue()	Sebastian Andrzej Siewior	2	-1/+17
	flush_smp_call_function_queue() invokes do_softirq() which is not available on PREEMPT_RT. flush_smp_call_function_queue() is invoked from the idle task and the migration task with preemption or interrupts disabled. So RT kernels cannot process soft interrupts in that context as that has to acquire 'sleeping spinlocks' which is not possible with preemption or interrupts disabled and forbidden from the idle task anyway. The currently known SMP function call which raises a soft interrupt is in the block layer, but this functionality is not enabled on RT kernels due to latency and performance reasons. RT could wake up ksoftirqd unconditionally, but this wants to be avoided if there were soft interrupts pending already when this is invoked in the context of the migration task. The migration task might have preempted a threaded interrupt handler which raised a soft interrupt, but did not reach the local_bh_enable() to process it. The "running" ksoftirqd might prevent the handling in the interrupt thread context which is causing latency issues. Add a new function which handles this case explicitly for RT and falls back to do_softirq() on !RT kernels. In the RT case this warns when one of the flushed SMP function calls raised a soft interrupt so this can be investigated. [ tglx: Moved the RT part out of SMP code ] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected]
2022-05-01	smp: Rename flush_smp_call_function_from_idle()	Thomas Gleixner	4	-11/+24
	This is invoked from the stopper thread too, which is definitely not idle. Rename it to flush_smp_call_function_queue() and fixup the callers. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-01	sched: Fix missing prototype warnings	Thomas Gleixner	8	-10/+15
	A W=1 build emits more than a dozen missing prototype warnings related to scheduler and scheduler specific includes. Reported-by: kernel test robot <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-04-30	task_work: allow TWA_SIGNAL without a rescheduling IPI	Jens Axboe	1	-6/+19
	Some use cases don't always need an IPI when sending a TWA_SIGNAL notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL, except it doesn't send an IPI to the target task. It merely sets TIF_NOTIFY_SIGNAL and wakes up the task. This can be useful in avoiding a forceful transition to the kernel if the task is running in userspace. Depending on the task_work in question, it may be quite fine waiting for the next reschedule or kernel enter anyway, or the use case may even have other mechanisms for hinting to the task that a transition may be useful. This can drive more cooperative scheduling of task_work. Reviewed-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-04-29	kernel: make taskstats available from all net namespaces	xu xin	1	-0/+1
	If getdelays runs in a non-init network namespace, it will fail in getting delayacct stats even if it has privilege of root user, which seems to be not very reasonable. We can simply reproduce this by executing commands: unshare -n getdelays -d -p <pid> I don't think net namespace should be an obstacle to the normal execution of getdelay function. So let's make it available from all net namespaces. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: xu xin <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Yang Yang <[email protected]> Cc: "Dr. Thomas Orgis" <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Ismael Luceno <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-04-29	taskstats: version 12 with thread group and exe info	Dr. Thomas Orgis	2	-2/+31
	The task exit struct needs some crucial information to be able to provide an enhanced version of process and thread accounting. This change provides: 1. ac_tgid in addition to ac_pid 2. thread group execution walltime in ac_tgetime 3. flag AGROUP in ac_flag to indicate the last task in a thread group / process 4. device ID and inode of task's /proc/self/exe in ac_exe_dev and ac_exe_inode 5. tools/accounting/procacct as demonstrator When a task exits, taskstats are reported to userspace including the task's pid and ppid, but without the id of the thread group this task is part of. Without the tgid, the stats of single tasks cannot be correlated to each other as a thread group (process). The taskstats documentation suggests that on process exit a data set consisting of accumulated stats for the whole group is produced. But such an additional set of stats is only produced for actually multithreaded processes, not groups that had only one thread, and also those stats only contain data about delay accounting and not the more basic information about CPU and memory resource usage. Adding the AGROUP flag to be set when the last task of a group exited enables determination of process end also for single-threaded processes. My application basically does enhanced process accounting with summed cputime, biggest maxrss, tasks per process. The data is not available with the traditional BSD process accounting (which is not designed to be extensible) and the taskstats interface allows more efficient on-the-fly grouping and summing of the stats, anyway, without intermediate disk writes. Furthermore, I do carry statistics on which exact program binary is used how often with associated resources, getting a picture on how important which parts of a collection of installed scientific software in different versions are, and how well they put load on the machine. This is enabled by providing information on /proc/self/exe for each task. I assume the two 64-bit fields for device ID and inode are more appropriate than the possibly large resolved path to keep the data volume down. Add the tgid to the stats to complete task identification, the flag AGROUP to mark the last task of a group, the group wallclock time, and inode-based identification of the associated executable file. Add tools/accounting/procacct.c as a simplified fork of getdelays.c to demonstrate process and thread accounting. [[email protected]: fix version number in comment] Link: https://lkml.kernel.org/r/20220405003601.7a5f6008@plasteblaster Link: https://lkml.kernel.org/r/20220331004106.64e5616b@plasteblaster Signed-off-by: Dr. Thomas Orgis <[email protected]> Reviewed-by: Ismael Luceno <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: xu xin <[email protected]> Cc: Yang Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
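The per-process accounting this message describes (summed CPU time, largest maxrss, task count, with the AGROUP flag marking the end of a process) might look roughly like the following. This is a hedged Python sketch, not the procacct tool: the `ExitRecord` type and its `cpu_time_us` and `max_rss_kb` fields are hypothetical stand-ins for the real taskstats fields, and the numeric value of `AGROUP` is an assumption. Only `ac_pid`, `ac_tgid`, `ac_flag` and `AGROUP` are names taken from the text.

```python
from dataclasses import dataclass

AGROUP = 0x20  # assumed flag value; check the kernel's acct.h before relying on it


@dataclass
class ExitRecord:
    ac_pid: int        # id of the exiting task
    ac_tgid: int       # id of the thread group (process) it belongs to
    ac_flag: int       # AGROUP is set on the last task of the group
    cpu_time_us: int   # hypothetical field: CPU time used by this task
    max_rss_kb: int    # hypothetical field: peak RSS seen by this task


def account(records):
    """Fold per-task exit records into one summary per finished process."""
    pending = {}    # tgid -> running totals for processes still exiting
    finished = []   # summaries, in the order processes completed
    for rec in records:
        acc = pending.setdefault(
            rec.ac_tgid,
            {"tgid": rec.ac_tgid, "tasks": 0, "cpu_time_us": 0, "max_rss_kb": 0},
        )
        acc["tasks"] += 1
        acc["cpu_time_us"] += rec.cpu_time_us
        acc["max_rss_kb"] = max(acc["max_rss_kb"], rec.max_rss_kb)
        if rec.ac_flag & AGROUP:
            # Last task of the group: the process is complete, even if it
            # only ever had a single thread.
            finished.append(pending.pop(rec.ac_tgid))
    return finished
```

Grouping on `ac_tgid` and closing a group on `AGROUP` is what lets single-threaded and multithreaded processes be handled by the same path, without any intermediate writes to disk.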
2022-04-29	kexec: remove redundant assignments	Michal Orzel	1	-2/+0
	Get rid of redundant assignments which end up in values not being read either because they are overwritten or the function ends. Reported by clang-tidy [deadcode.DeadStores] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Orzel <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Michal Orzel <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-04-29	ptrace: remove redudant check of #ifdef PTRACE_SINGLESTEP	Tiezhu Yang	1	-6/+0
	Patch series "ptrace: do some cleanup". This patch (of 3): PTRACE_SINGLESTEP is always defined as 9 in include/uapi/linux/ptrace.h, so remove the redundant #ifdef PTRACE_SINGLESTEP check. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tiezhu Yang <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-04-29	lib/Kconfig.debug: remove more CONFIG_..._VALUE indirections	Rasmus Villemoes	2	-3/+3
	As in "kernel/panic.c: remove CONFIG_PANIC_ON_OOPS_VALUE indirection", use the IS_ENABLED() helper rather than having a hidden config option. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-04-29	kernel: pid_namespace: use NULL instead of using plain integer as pointer	Haowen Bai	1	-1/+1
	This fixes the following sparse warnings: kernel/pid_namespace.c:55:77: warning: Using plain integer as NULL pointer Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Haowen Bai <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-04-29	seccomp: Use FIFO semantics to order notifications	Sargun Dhillon	1	-1/+1
	Previously, the seccomp notifier used LIFO semantics, where each notification would be added on top of the stack, and notifications were popped off the top of the stack. This could result in one process that generates a large number of notifications preventing other notifications from being handled. This patch moves from LIFO (stack) semantics to FIFO (queue) semantics. Signed-off-by: Sargun Dhillon <[email protected]> Reviewed-by: Christian Brauner (Microsoft) <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
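The starvation effect of LIFO ordering is easy to reproduce outside the kernel. The following is a toy Python simulation, not the seccomp implementation: the function name, the "quiet" and "noisy" labels, and the one-new-notification-per-step workload are all assumptions made for the illustration.

```python
from collections import deque


def steps_until_handled(fifo, max_steps=1000):
    """Count handler steps until the single 'quiet' notification is served.

    One 'quiet' notification is pending at the start. Before every handler
    step a noisy process queues one more notification. Returns the step at
    which 'quiet' is handled, or None if it is still waiting after max_steps.
    """
    pending = deque(["quiet"])
    for step in range(1, max_steps + 1):
        pending.append("noisy")
        # FIFO takes the oldest entry, LIFO takes the newest one.
        item = pending.popleft() if fifo else pending.pop()
        if item == "quiet":
            return step
    return None
```

Under this workload the queue serves the older notification on the first step, while the stack keeps serving the newest entry and never reaches it.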
2022-04-29	ftrace: cleanup ftrace_graph_caller enable and disable	Chengming Zhou	1	-0/+18
	The ftrace_[enable,disable]_ftrace_graph_caller() are used to do special hooks for graph tracer, which are not needed on some ARCHs that use graph_ops:func function to install return_hooker. So introduce the weak version in ftrace core code to cleanup in x86. Signed-off-by: Chengming Zhou <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-29	sched/fair: Remove cfs_rq_tg_path()	Dietmar Eggemann	1	-19/+0
	cfs_rq_tg_path() is used by a tracepoint-to traceevent (tp-2-te) converter to format the path of a taskgroup or autogroup respectively. It doesn't have any in-kernel users after the removal of the sched_trace_cfs_rq_path() helper function. cfs_rq_tg_path() can be coded in a tp-2-te converter. Remove it from kernel/sched/fair.c. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-04-29	sched/fair: Remove sched_trace_*() helper functions	Dietmar Eggemann	1	-98/+0
	We no longer need them as we can use DWARF debug info or BTF + pahole to re-generate the required structs to compile against them for a given kernel. This moves the burden of maintaining these helper functions to the module. https://github.com/qais-yousef/sched_tp Note that at least pahole v1.15 is required for using DWARF. For BTF, v1.23, which is not yet released, will be required. There's an alignment problem that will lead to crashes in earlier versions when used with BTF. We should have enough infrastructure to make these helper functions now obsolete, so remove them. [Rewrote commit message to reflect the new alternative] Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-04-29	sched/fair: Refactor cpu_util_without()	Dietmar Eggemann	1	-100/+57
	Except the 'task has no contribution or is new' condition at the beginning of cpu_util_without(), which it shares with the load and runnable counterpart functions, a cpu_util_next(..., dst_cpu = -1) call can replace the rest of it. The UTIL_EST specific check that task util_est has to be subtracted from the CPU one in case of an enqueued (or current (to cater for the wakeup - lb race)) task has to be moved to cpu_util_next(). This was initially introduced by commit c469933e7721 ("sched/fair: Fix cpu_util_wake() for 'execl' type workloads"). UnixBench's `execl` throughput tests were run on the dual socket 40 CPUs Intel E5-2690 v2 to make sure it doesn't regress again. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-04-29	timekeeping: Mark NMI safe time accessors as notrace	Kurt Kanzenbach	1	-2/+2
	Mark the CLOCK_MONOTONIC fast time accessors as notrace. These functions are used in tracing to retrieve timestamps, so they should not recurse. Fixes: 4498e7467e9e ("time: Parametrize all tk_fast_mono users") Fixes: f09cb9a1808e ("time: Introduce tk_fast_raw") Reported-by: Steven Rostedt <[email protected]> Signed-off-by: Kurt Kanzenbach <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]/ Link: https://lore.kernel.org/r/[email protected]
2022-04-28	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	6	-14/+17
	include/linux/netdevice.h net/core/dev.c 6510ea973d8d ("net: Use this_cpu_inc() to increment net->core_stats") 794c24e9921f ("net-core: rx_otherhost_dropped to core_stats") https://lore.kernel.org/all/[email protected]/ drivers/net/wan/cosa.c d48fea8401cf ("net: cosa: fix error check return value of register_chrdev()") 89fbca3307d4 ("net: wan: remove support for COSA and SRP synchronous serial boards") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-04-28	Merge tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds	1	-1/+1
	Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf and netfilter. Current release - new code bugs: - bridge: switchdev: check br_vlan_group() return value - use this_cpu_inc() to increment net->core_stats, fix preempt-rt Previous releases - regressions: - eth: stmmac: fix write to sgmii_adapter_base Previous releases - always broken: - netfilter: nf_conntrack_tcp: re-init for syn packets only, resolving issues with TCP fastopen - tcp: md5: fix incorrect tcp_header_len for incoming connections - tcp: fix F-RTO may not work correctly when receiving DSACK - tcp: ensure use of most recently sent skb when filling rate samples - tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT - virtio_net: fix wrong buf address calculation when using xdp - xsk: fix forwarding when combining copy mode with busy poll - xsk: fix possible crash when multiple sockets are created - bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook - sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event - wireguard: device: check for metadata_dst with skb_valid_dst() - netfilter: update ip6_route_me_harder to consider L3 domain - gre: make o_seqno start from 0 in native mode - gre: switch o_seqno to atomic to prevent races in collect_md mode Misc: - add Eric Dumazet to networking maintainers - dt: dsa: realtek: remove realtek,rtl8367s string - netfilter: flowtable: Remove the empty file" * tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits) tcp: fix F-RTO may not work correctly when receiving DSACK Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits" net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK ixgbe: ensure IPsec VF<->PF compatibility MAINTAINERS: Update BNXT entry with firmware files netfilter: nft_socket: only do sk lookups when indev is available net: fec: add missing of_node_put() in fec_enet_init_stop_mode() bnx2x: fix napi API usage sequence tls: Skip tls_append_frag on zero copy size Add Eric Dumazet to networking maintainers netfilter: conntrack: fix udp offload timeout sysctl netfilter: nf_conntrack_tcp: re-init for syn packets only net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK net: Use this_cpu_inc() to increment net->core_stats Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted Bluetooth: hci_event: Fix creating hci_conn object on error status Bluetooth: hci_event: Fix checking for invalid handle on error status ice: fix use-after-free when deinitializing mailbox snapshot ice: wait 5 s for EMP reset after firmware flash ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg() ...
2022-04-27	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	Jakub Kicinski	25	-456/+1533
	Daniel Borkmann says: ==================== pull-request: bpf-next 2022-04-27 We've added 85 non-merge commits during the last 18 day(s) which contain a total of 163 files changed, 4499 insertions(+), 1521 deletions(-). The main changes are: 1) Teach libbpf to enhance BPF verifier log with human-readable and relevant information about failed CO-RE relocations, from Andrii Nakryiko. 2) Add typed pointer support in BPF maps and enable it for unreferenced pointers (via probe read) and referenced ones that can be passed to in-kernel helpers, from Kumar Kartikeya Dwivedi. 3) Improve xsk to break NAPI loop when rx queue gets full to allow for forward progress to consume descriptors, from Maciej Fijalkowski & Björn Töpel. 4) Fix a small RCU read-side race in BPF_PROG_RUN routines which dereferenced the effective prog array before the rcu_read_lock, from Stanislav Fomichev. 5) Implement BPF atomic operations for RV64 JIT, and add libbpf parsing logic for USDT arguments under riscv{32,64}, from Pu Lehui. 6) Implement libbpf parsing of USDT arguments under aarch64, from Alan Maguire. 7) Enable bpftool build for musl and remove nftw with FTW_ACTIONRETVAL usage so it can be shipped under Alpine which is musl-based, from Dominique Martinet. 8) Clean up {sk,task,inode} local storage trace RCU handling as they do not need to use call_rcu_tasks_trace() barrier, from KP Singh. 9) Improve libbpf API documentation and fix error return handling of various API functions, from Grant Seltzer. 10) Enlarge offset check for bpf_skb_{load,store}_bytes() helpers given data length of frags + frag_list may surpass old offset limit, from Liu Jian. 11) Various improvements to prog_tests in area of logging, test execution and by-name subtest selection, from Mykola Lysenko. 12) Simplify map_btf_id generation for all map types by moving this process to build time with help of resolve_btfids infra, from Menglong Dong. 
13) Fix a libbpf bug in probing when falling back to legacy bpf_probe_read() helpers; the probing always caused the old helpers to be used, from Runqing Yang. 14) Add support for ARCompact and ARCv2 platforms for libbpf's PT_REGS tracing macros, from Vladimir Isaev. 15) Cleanup BPF selftests to remove old & unneeded rlimit code given kernel switched to memcg-based memory accounting a while ago, from Yafang Shao. 16) Refactor of BPF sysctl handlers to move them to BPF core, from Yan Zhu. 17) Fix BPF selftests on two occasions to work around regressions caused by latest LLVM to unblock CI until their fixes are worked out, from Yonghong Song. 18) Misc cleanups all over the place, from various others. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Add libbpf's log fixup logic selftests libbpf: Fix up verifier log for unguarded failed CO-RE relos libbpf: Simplify bpf_core_parse_spec() signature libbpf: Refactor CO-RE relo human description formatting routine libbpf: Record subprog-resolved CO-RE relocations unconditionally selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests libbpf: Avoid joining .BTF.ext data with BPF programs by section name libbpf: Fix logic for finding matching program for CO-RE relocation libbpf: Drop unhelpful "program too large" guess libbpf: Fix anonymous type check in CO-RE logic bpf: Compute map_btf_id during build time selftests/bpf: Add test for strict BTF type check selftests/bpf: Add verifier tests for kptr selftests/bpf: Add C tests for kptr libbpf: Add kptr type tag macros to bpf_helpers.h bpf: Make BTF type match stricter for release arguments bpf: Teach verifier about kptr_get kfunc helpers bpf: Wire up freeing of referenced kptr bpf: Populate pairs of btf_id and destructor kfunc in btf bpf: Adapt copy_map_value for multiple offset case ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-04-27	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Jakub Kicinski	1	-1/+1
	Daniel Borkmann says: ==================== pull-request: bpf 2022-04-27 We've added 5 non-merge commits during the last 20 day(s) which contain a total of 6 files changed, 34 insertions(+), 12 deletions(-). The main changes are: 1) Fix xsk sockets when rx and tx are separately bound to the same umem, also fix xsk copy mode combined with busy poll, from Maciej Fijalkowski. 2) Fix BPF tunnel/collect_md helpers with bpf_xmit lwt hook usage which triggered a crash due to invalid metadata_dst access, from Eyal Birger. 3) Fix release of page pool in XDP live packet mode, from Toke Høiland-Jørgensen. 4) Fix potential NULL pointer dereference in kretprobes, from Adam Zabrocki. (Masami & Steven preferred this small fix to be routed via bpf tree given it's follow-up fix to Masami's rethook work that went via bpf earlier, too.) * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: xsk: Fix possible crash when multiple sockets are created kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook bpf: Fix release of page_pool in BPF_PROG_RUN in test runner xsk: Fix l2fwd for copy mode + busy poll combo ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-04-27	tracing: Remove check of list iterator against head past the loop body	Jakob Koschel	3	-20/+26
	When list_for_each_entry() completes the iteration over the whole list without breaking the loop, the iterator value will be a bogus pointer computed based on the head element. While it is safe to use the pointer to determine if it was computed based on the head element, either with list_entry_is_head() or &pos->member == head, using the iterator variable after the loop should be avoided. In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1]. Link: https://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ Signed-off-by: Jakob Koschel <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-27	tracing: Replace usage of found with dedicated list iterator variable	Jakob Koschel	2	-21/+18
	To move the list iterator variable into the list_for_each_entry_() macro in the future, using the list iterator variable after the loop body should be avoided. To *never* use the list iterator variable after the loop, it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need for a found variable: simply checking whether the variable was set determines if the break/goto was hit. Link: https://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-27	tracing: Remove usage of list iterator variable after the loop	Jakob Koschel	1	-8/+7
	In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1]. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Signed-off-by: Jakob Koschel <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-27	tracing: Remove usage of list iterator after the loop body	Jakob Koschel	1	-4/+9
	In preparation to limit the scope of the list iterator variable to the traversal loop, use a dedicated pointer to point to the found element [1]. Before, the code implicitly used the head when no element was found when using &pos->list. Since the new variable is only set if an element was found, the head needs to be used explicitly if the variable is NULL. Link: https://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-27	tracing: Introduce trace clock tai	Kurt Kanzenbach	1	-0/+1
	A fast/NMI safe accessor for CLOCK_TAI has been introduced. Use it for adding the additional trace clock "tai". Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kurt Kanzenbach <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-27	ring-buffer: Have 32 bit time stamps use all 64 bits	Steven Rostedt (Google)	1	-10/+18
	When the new logic was made to handle deltas of events from interrupts that interrupted other events, it required 64 bit local atomics. Unfortunately, 64 bit local atomics are expensive on 32 bit architectures. Thus, commit 10464b4aa605e ("ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit") created a type of seq lock timer for 32 bits. It used two 32 bit local atomics, but required 2 bits from them each for synchronization, making it only 60 bits. Add a new "msb" field to hold the extra 4 bits that are cut off. Link: https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
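The bit accounting in this message (two 32-bit words that each give up 2 bits for synchronization, leaving 60, plus a new field for the missing 4) can be checked with a small Python sketch. It is illustrative only and is not the ring buffer code: the function names and the exact layout, with a 2-bit counter in the top bits of each word, are assumptions derived from the description.

```python
SHIFT = 30                      # payload bits per 32-bit word (32 minus 2 sync bits)
VAL_MASK = (1 << SHIFT) - 1     # mask for the 30 payload bits
U64_MASK = (1 << 64) - 1


def split_time(ts, cnt):
    """Split a 64-bit timestamp into (top, bottom, msb).

    top and bottom each fit in 32 bits: 30 payload bits plus a 2-bit
    counter used for synchronization. msb carries the 4 bits that do
    not fit into the two payloads.
    """
    ts &= U64_MASK
    sync = (cnt & 3) << SHIFT
    bottom = (ts & VAL_MASK) | sync
    top = ((ts >> SHIFT) & VAL_MASK) | sync
    msb = ts >> (2 * SHIFT)     # bits 60..63
    return top, bottom, msb


def join_time(top, bottom, msb=0):
    """Rebuild the timestamp; without msb only the low 60 bits survive."""
    return ((msb & 0xF) << (2 * SHIFT)) | ((top & VAL_MASK) << SHIFT) | (bottom & VAL_MASK)
```

Calling `join_time()` without the `msb` argument reproduces the limitation the message describes: any timestamp that uses its top 4 bits comes back truncated to 60 bits.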
2022-04-27	ring-buffer: Have absolute time stamps handle large numbers	Steven Rostedt (Google)	1	-5/+44
	There's an absolute timestamp event in the ring buffer, but this only saves 59 bits of the timestamp, as the 5 MSB is used for meta data (stating it is an absolute time stamp). This was never an issue as all the clocks currently in use never used those 5 MSB. But now there's a new clock (TAI) that does. To handle this case, when reading an absolute timestamp, a previous full timestamp is passed in, and the 5 MSB of that timestamp is OR'd to the absolute timestamp (if any of the 5 MSB are set), and then to test for overflow, if the new result is smaller than the passed in previous timestamp, then 1 << 59 is added to it. All the extra processing is done on the reader "slow" path, with the exception of the "too big delta" check, and the reading of timestamps for histograms. Note, libtraceevent will need to be updated to handle this case as well. But this is not a user space regression, as user space was never able to handle any timestamps that used more than 59 bits. Link: https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Cc: Tom Zanussi <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Kurt Kanzenbach <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
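The reconstruction rule described above (OR in the 5 MSB of the previous full timestamp, then add 1 << 59 if the result went backwards) can be written out as follows. This is a Python sketch of the rule as the message states it, under the assumption that timestamps are unsigned 64-bit values; the function and constant names are illustrative, not the kernel's.

```python
TS_BITS = 59                              # bits stored in an absolute timestamp event
TS_MSB = 0x1F << TS_BITS                  # the 5 most significant bits of a 64-bit value
U64_MASK = (1 << 64) - 1


def store_abs_ts(ts):
    """What the event keeps: the low 59 bits of the timestamp."""
    return ts & ~TS_MSB & U64_MASK


def fix_abs_ts(abs_ts, prev_ts):
    """Rebuild a full timestamp from a stored one and a previous full one."""
    if prev_ts & TS_MSB:
        # Borrow the upper 5 bits from the previous full timestamp.
        abs_ts |= prev_ts & TS_MSB
        # If that made time go backwards, the low 59 bits wrapped around
        # between the two timestamps, so carry into bit 59.
        if abs_ts < prev_ts:
            abs_ts = (abs_ts + (1 << TS_BITS)) & U64_MASK
    return abs_ts
```

The scheme relies on the previous full timestamp being close enough to the new one that at most one wrap of the low 59 bits lies between them.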
2022-04-27	x86/split_lock: Make life miserable for split lockers	Tony Luck	1	-0/+5
	In https://lore.kernel.org/all/87y22uujkm.ffs@tglx/ Thomas said:
	  It's simply wishful thinking that stuff gets fixed because of a WARN_ONCE(). This has never worked. The only thing which works is to make stuff fail hard or slow it down in a way which makes it annoying enough to users to complain.
	He was talking about WBINVD. But it made me think about how we use the split lock detection feature in Linux. Existing code has three options for applications:
	1) Don't enable split lock detection (allow arbitrary split locks)
	2) Warn once when a process uses split lock, but let the process keep running with split lock detection disabled
	3) Kill processes that use split locks
	Option 2 falls into the "wishful thinking" territory that Thomas warns does nothing. But option 3 might not be viable in a situation with legacy applications that need to run. Hence make option 2 much stricter to "slow it down in a way which makes it annoying".
	The primary reason for this change is to provide better quality of service to the rest of the applications running on the system. Internal testing shows that even with many processes splitting locks, the rest of the system is much more responsive.
	The new "warn" mode operates like this. When an application tries to execute a bus lock, the #AC handler:
	1) Delays (interruptibly) 10 ms before moving to the next step.
	2) Blocks (interruptibly) until it can get the semaphore. If interrupted, just return. Assume the signal will either kill the task, or direct execution away from the instruction that is trying to get the bus lock.
	3) Disables split lock detection for the current core
	4) Schedules a work queue to re-enable split lock detect in 2 jiffies
	5) Returns
	The work queue that re-enables split lock detection also releases the semaphore.
	There is a corner case where a CPU may be taken offline while split lock detection is disabled. A CPU hotplug handler handles this case.
	Old behaviour was to only print the split lock warning on the first occurrence of a split lock from a task. Preserve that by adding a flag to the task structure that suppresses subsequent split lock messages from that task.
	Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
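The steps of the new "warn" mode can be sketched as a small single-threaded model in C. This is an illustration only and not the kernel code: `struct sld_model`, `sld_model_trap()` and `sld_model_reenable()` are hypothetical names, the 10 ms delay is only counted rather than slept, and a held semaphore is reported as `SLD_WOULD_BLOCK` where the real handler would sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names come from the kernel. */
struct sld_model {
	bool sem_held;        /* the single semaphore is currently taken */
	bool detect_enabled;  /* split lock detection on the current core */
	bool task_warned;     /* per-task flag suppressing repeat warnings */
	int warnings;         /* number of warnings "printed" so far */
	int delayed_ms;       /* total delay imposed so far (counted, not slept) */
};

enum sld_result { SLD_PROCEED, SLD_INTERRUPTED, SLD_WOULD_BLOCK };

/* Models what happens when a task trips over a split lock. */
enum sld_result sld_model_trap(struct sld_model *m, bool signal_pending)
{
	/* Warn only on the first occurrence for this task. */
	if (!m->task_warned) {
		m->warnings++;
		m->task_warned = true;
	}

	/* Step 1: interruptible 10 ms delay. */
	m->delayed_ms += 10;
	if (signal_pending)
		return SLD_INTERRUPTED;

	/* Step 2: take the semaphore; the real handler would sleep here. */
	if (m->sem_held)
		return SLD_WOULD_BLOCK;
	m->sem_held = true;

	/* Step 3: disable detection so the task can make progress. */
	m->detect_enabled = false;

	/* Step 4 (deferred re-enable) is modelled by sld_model_reenable(). */
	return SLD_PROCEED; /* Step 5 */
}

/* Models the deferred work: re-enable detection, release the semaphore. */
void sld_model_reenable(struct sld_model *m)
{
	m->detect_enabled = true;
	m->sem_held = false;
}

/* Small usage example for the model. */
void sld_model_demo(void)
{
	struct sld_model m = { .detect_enabled = true };

	assert(sld_model_trap(&m, false) == SLD_PROCEED);
	assert(!m.detect_enabled);
	sld_model_reenable(&m);
	assert(m.detect_enabled);
}
```

In this model only one trap at a time gets past the semaphore, which is the property the commit message relies on to limit the impact of split lockers on the rest of the system.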
2022-04-26	tracing: make tracer_init_tracefs initcall asynchronous	Mark-PK Tsai	1	-10/+22
	Move trace_eval_init() to subsys_initcall so that it starts earlier. To avoid tracer_init_tracefs being blocked by trace_event_sem, which trace_eval_init() holds [1], queue tracer_init_tracefs() on eval_map_wq so that the two work items are executed sequentially. Making tracer_init_tracefs asynchronous speeds up kernel initialization: on my arm64 platform, it saves ~20 ms of the 125 ms that do_initcalls takes in total. Link: https://lkml.kernel.org/r/[email protected] [1]: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark-PK Tsai <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Avoid adding tracer option before update_tracer_options	Mark-PK Tsai	1	-0/+7
	To prepare for supporting an asynchronous tracer_init_tracefs initcall, avoid calling create_trace_option_files before __update_tracer_options. Otherwise, create_trace_option_files will emit a warning because some tracers in the trace_types list are already in tr->topts. For example, hwlat_tracer calls register_tracer in late_initcall; since global_trace.dir has already been created in tracing_init_dentry, hwlat_tracer is put into tr->topts. If __update_tracer_options is then executed after hwlat_tracer has registered, create_trace_option_files finds that hwlat_tracer is already in tr->topts. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/20220322133339.GA32582@xsang-OptiPlex-9020/ Reported-by: kernel test robot <[email protected]> Signed-off-by: Mark-PK Tsai <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	ring-buffer: Simplify if-if to if-else	Wan Jiabing	1	-2/+2
	Use if and else instead of if(A) and if (!A). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Wan Jiabing <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Use WARN instead of printk and WARN_ON	Guo Zhengkui	1	-9/+3
	Use `WARN(cond, ...)` instead of `if (cond)` + `printk(...)` + `WARN_ON(1)`. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Steven Rostedt <[email protected]> Signed-off-by: Guo Zhengkui <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
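The shape of this cleanup can be illustrated in userspace C. The sketch below is not kernel code: `warn_model()` is a hypothetical stand-in for the kernel's `WARN()` macro (it prints the message when the condition is true and returns the condition), and `check_before()` / `check_after()` are made-up callers showing the pattern before and after the change.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

int warn_count; /* counts how many warnings fired, for demonstration */

/* Hypothetical stand-in for WARN(cond, fmt, ...): warn and return cond. */
bool warn_model(bool cond, const char *fmt, ...)
{
	if (cond) {
		va_list ap;

		va_start(ap, fmt);
		vfprintf(stderr, fmt, ap);
		va_end(ap);
		warn_count++;
	}
	return cond;
}

/* Before: separate condition check, message, and unconditional warning. */
int check_before(int len)
{
	if (len < 0) {
		fprintf(stderr, "bad length %d\n", len);
		warn_count++; /* stands in for WARN_ON(1) */
		return -1;
	}
	return 0;
}

/* After: a single call carries both the condition and the message. */
int check_after(int len)
{
	if (warn_model(len < 0, "bad length %d\n", len))
		return -1;
	return 0;
}

/* Both variants behave the same for good and bad input. */
void warn_demo(void)
{
	assert(check_before(-1) == check_after(-1));
	assert(check_before(4) == check_after(4));
}
```

The second form is shorter and keeps the condition and its message in one place, which is the motivation stated in the commit.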
2022-04-26	tracing: Fix sleeping function called from invalid context on RT kernel	Jun Miao	1	-3/+3
	When booting with "trace_event=initcall:initcall_start tp_printk=1" on the kernel command line, output_printk() is called and takes spin_lock_irqsave() in an atomic, IRQ-disabled context. On a PREEMPT_RT kernel, these locks are replaced with sleepable rt-spinlocks, so the BUG splat below is triggered. Fix it by using raw_spin_lock_irqsave() when PREEMPT_RT and "trace_event=initcall:initcall_start tp_printk=1" are enabled.
	BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
	in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
	preempt_count: 2, expected: 0
	RCU nest depth: 0, expected: 0
	Preemption disabled at:
	[<ffffffff8992303e>] try_to_wake_up+0x7e/0xba0
	CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt17+ #19 34c5812404187a875f32bee7977f7367f9679ea7
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
	Call Trace:
	<TASK>
	dump_stack_lvl+0x60/0x8c
	dump_stack+0x10/0x12
	__might_resched.cold+0x11d/0x155
	rt_spin_lock+0x40/0x70
	trace_event_buffer_commit+0x2fa/0x4c0
	? map_vsyscall+0x93/0x93
	trace_event_raw_event_initcall_start+0xbe/0x110
	? perf_trace_initcall_finish+0x210/0x210
	? probe_sched_wakeup+0x34/0x40
	? ttwu_do_wakeup+0xda/0x310
	? trace_hardirqs_on+0x35/0x170
	? map_vsyscall+0x93/0x93
	do_one_initcall+0x217/0x3c0
	? trace_event_raw_event_initcall_level+0x170/0x170
	? push_cpu_stop+0x400/0x400
	? cblist_init_generic+0x241/0x290
	kernel_init_freeable+0x1ac/0x347
	? _raw_spin_unlock_irq+0x65/0x80
	? rest_init+0xf0/0xf0
	kernel_init+0x1e/0x150
	ret_from_fork+0x22/0x30
	</TASK>
	Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jun Miao <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Change `if (strlen(glob))` to `if (glob[0])`	Ammar Faizi	1	-1/+1
	No need to traverse to the end of string. If the first byte is not a NUL char, it's guaranteed `if (strlen(glob))` is true. Link: https://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Cc: GNU/Weeb Mailing List <[email protected]> Signed-off-by: Ammar Faizi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
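The equivalence behind this change can be shown with a minimal C sketch. The function names `nonempty_strlen()` and `nonempty_first_byte()` are hypothetical and exist only for illustration; both assume a non-NULL, NUL-terminated string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Walks the whole string just to learn whether it is empty. */
bool nonempty_strlen(const char *s)
{
	return strlen(s) != 0;
}

/* Looks at the first byte only; same answer in constant time. */
bool nonempty_first_byte(const char *s)
{
	return s[0] != '\0';
}

/* The two checks agree for empty and non-empty strings alike. */
void nonempty_demo(void)
{
	assert(nonempty_strlen("") == nonempty_first_byte(""));
	assert(nonempty_strlen("hist") == nonempty_first_byte("hist"));
}
```

The first-byte test avoids scanning the string, which matters little for short globs but costs nothing to get right.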
2022-04-26	tracing: Return -EINVAL if WARN_ON(!glob) triggered in event_hist_trigger_parse()	Ammar Faizi	1	-1/+2
	If `WARN_ON(!glob)` is ever triggered, execution still continues with the following lines. This leads to a more serious problem, a NULL pointer dereference bug. Just return -EINVAL if @glob is NULL. Link: https://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Cc: GNU/Weeb Mailing List <[email protected]> Signed-off-by: Ammar Faizi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
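The idea of bailing out instead of only warning can be sketched in userspace C. `parse_glob_model()` is a hypothetical function, not the kernel's parser; in the kernel the NULL check is wrapped in `WARN_ON()`, which is replaced here by a plain test.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical parser: returns -EINVAL for a NULL glob, otherwise 1 if the
 * glob is non-empty and 0 if it is empty.
 */
int parse_glob_model(const char *glob)
{
	/* Return early; warning alone would fall through to glob[0] below. */
	if (glob == NULL)
		return -EINVAL;

	return glob[0] ? 1 : 0;
}

/* A NULL argument is rejected instead of being dereferenced. */
void parse_glob_demo(void)
{
	assert(parse_glob_model(NULL) == -EINVAL);
	assert(parse_glob_model("hist") == 1);
}
```

Without the early return, the `glob[0]` access would dereference NULL, which is the more serious failure the commit message describes.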
2022-04-26	tracing: Make tp_printk work on syscall tracepoints	Jeff Xie	1	-24/+11
	Currently the tp_printk option has no effect on syscall tracepoints. With the tp_printk kernel parameter set, run: echo 1 > /sys/kernel/debug/tracing/events/syscalls/enable Then, when running any application, no trace information is printed on the terminal. Add printk support for syscall tracepoints. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jeff Xie <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Fix tracing_map_sort_entries() kernel-doc comment	Yang Li	1	-1/+2
	Add the description of @n_sort_keys and make @sort_key -> @sort_keys in tracing_map_sort_entries() kernel-doc comment to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. kernel/trace/tracing_map.c:1073: warning: Function parameter or member 'sort_keys' not described in 'tracing_map_sort_entries' kernel/trace/tracing_map.c:1073: warning: Function parameter or member 'n_sort_keys' not described in 'tracing_map_sort_entries' kernel/trace/tracing_map.c:1073: warning: Excess function parameter 'sort_key' description in 'tracing_map_sort_entries' Link: https://lkml.kernel.org/r/[email protected] Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Fix kernel-doc	Jiapeng Chong	1	-1/+1
	Fix the following W=1 kernel warnings: kernel/trace/trace.c:1181: warning: expecting prototype for tracing_snapshot_cond_data(). Prototype was for tracing_cond_snapshot_data() instead. Link: https://lkml.kernel.org/r/[email protected] Reported-by: Abaci Robot <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Fix inconsistent style of mini-HOWTO	Oscar Shiang	1	-2/+2
	Each description should start with a hyphen and a space. Insert spaces to fix it. Link: https://lkml.kernel.org/r/TYCP286MB19130AA4A9C6FC5A8793DED2A1359@TYCP286MB1913.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Oscar Shiang <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Separate hist state updates from hist registration	Tom Zanussi	1	-18/+48
	hist_register_trigger() handles both new hist registration as well as existing hist registration through event_command.reg(). Adding a new function, existing_hist_update_only(), that checks and updates existing histograms and exits after doing so allows the confusing logic in event_hist_trigger_parse() to be simplified. Link: https://lkml.kernel.org/r/211b2cd3e3d7e00f4f8ad45ef8b33063da6a7e05.1644010576.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Have existing event_command.parse() implementations use helpers	Tom Zanussi	4	-151/+69
	Simplify the existing event_command.parse() implementations by having them make use of the helper functions previously introduced. Link: https://lkml.kernel.org/r/b353e3427a81f9d3adafd98fd7d73e78a8209f43.1644010576.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Remove redundant trigger_ops params	Tom Zanussi	4	-60/+36
	Since event_trigger_data contains the .ops trigger_ops field, there's no reason to pass the trigger_ops separately. Remove it as a param from functions whenever event_trigger_data is passed. Link: https://lkml.kernel.org/r/9856c9bc81bde57077f5b8d6f8faa47156c6354a.1644010575.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Remove logic for registering multiple event triggers at a time	Tom Zanussi	3	-77/+45
	Code for registering triggers assumes it's possible to register more than one trigger at a time. In fact, it's unimplemented and there doesn't seem to be a reason to do that. Remove the n_registered param from event_trigger_register() and fix up callers. Doing so simplifies the logic in event_trigger_register to the point that it just becomes a wrapper calling event_command.reg(). It also removes the problematic call to event_command.unreg() in case of failure. A new function, event_trigger_unregister() is also added for callers to call themselves. The changes to trace_events_hist.c simply allow compilation; a separate patch follows which updates the hist triggers to work correctly with the new changes. Link: https://lkml.kernel.org/r/6149fec7a139d93e84fa4535672fb5bef88006b0.1644010575.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	tracing: Cleanup double word in comment	Tom Rix	1	-2/+2
	Remove the second 'is' and 'to'. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tom Rix <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-04-26	bpf: Compute map_btf_id during build time	Menglong Dong	15	-115/+52
	For now, the field 'map_btf_id' in 'struct bpf_map_ops' is computed for all map types during vmlinux-btf init: btf_parse_vmlinux() -> btf_vmlinux_map_ids_init() It looks up the btf_type according to the 'map_btf_name' field in 'struct bpf_map_ops'. This lookup can be done at build time instead, thanks to Jiri's resolve_btfids. The map_ptr selftest passes: $96 map_ptr:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Reported-by: kernel test robot <[email protected]> Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2022-04-26	kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set	Adam Zabrocki	1	-1/+1
	The recent kernel change 73f9b911faa7 ("kprobes: Use rethook for kretprobe if possible") introduced a potential NULL pointer dereference bug in the KRETPROBE mechanism. The official Kprobes documentation states that "Any or all handlers can be NULL". Unfortunately, the return handler is not checked for NULL as this requires, which can result in a NULL pointer dereference. This patch adds the check to the kretprobe_rethook_handler() function. Fixes: 73f9b911faa7 ("kprobes: Use rethook for kretprobe if possible") Signed-off-by: Adam Zabrocki <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Anil S. Keshavamurthy <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
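The kind of check this fix adds can be sketched in userspace C. Everything below is a simplified model under assumptions: `struct rp_model`, `run_return_handler()` and `double_regs()` are hypothetical names, and the real kretprobe structures and handler signature differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe with an optional return handler. */
struct rp_model {
	int (*handler)(int regs); /* may legitimately be NULL */
	int hits;                 /* how often the handler actually ran */
};

/* Invoke the return handler only if both the probe and handler exist. */
int run_return_handler(struct rp_model *rp, int regs)
{
	if (rp == NULL || rp->handler == NULL)
		return 0; /* nothing to call; do not dereference NULL */

	rp->hits++;
	return rp->handler(regs);
}

/* Example handler used by the demo. */
int double_regs(int regs)
{
	return regs * 2;
}

/* A probe without a handler is skipped; one with a handler is called. */
void rp_demo(void)
{
	struct rp_model none = { .handler = NULL };
	struct rp_model some = { .handler = double_regs };

	assert(run_return_handler(&none, 21) == 0);
	assert(run_return_handler(&some, 21) == 42);
}
```

Checking the handler pointer before the call is what lets a probe be registered with no return handler, as the Kprobes documentation quoted above permits.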