blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2022-09-12	blk-throttle: fix io hung due to configuration updates	Yu Kuai	2	-6/+61
	If new configuration is submitted while a bio is throttled, then new waiting time is recalculated regardless that the bio might already wait for some time: tg_conf_updated throtl_start_new_slice tg_update_disptime throtl_schedule_next_dispatch Then io hung can be triggered by always submmiting new configuration before the throttled bio is dispatched. Fix the problem by respecting the time that throttled bio already waited. In order to do that, add new fields to record how many bytes/io are waited, and use it to calculate wait time for throttled bio under new configuration. Some simple test: 1) cd /sys/fs/cgroup/blkio/ echo $$ > cgroup.procs echo "8:0 2048" > blkio.throttle.write_bps_device { sleep 2 echo "8:0 1024" > blkio.throttle.write_bps_device } & dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct 2) cd /sys/fs/cgroup/blkio/ echo $$ > cgroup.procs echo "8:0 1024" > blkio.throttle.write_bps_device { sleep 4 echo "8:0 2048" > blkio.throttle.write_bps_device } & dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct test results: io finish time before this patch with this patch 1) 10s 6s 2) 8s 6s Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Michal Koutný <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-12	blk-throttle: factor out code to calculate ios/bytes_allowed	Yu Kuai	1	-24/+35
	No functional changes, new apis will be used in later patches to calculate wait time for throttled bios when new configuration is submitted. Noted this patch also rename tg_with_in_iops/bps_limit() to tg_within_iops/bps_limit(). Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-12	blk-throttle: prevent overflow while calculating wait time	Yu Kuai	1	-5/+3
	There is a problem found by code review in tg_with_in_bps_limit() that 'bps_limit * jiffy_elapsed_rnd' might overflow. Fix the problem by calling mul_u64_u64_div_u64() instead. Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-12	blk-throttle: fix that io throttle can only work for single bio	Yu Kuai	5	-19/+9
	Test scripts: cd /sys/fs/cgroup/blkio/ echo "8:0 1024" > blkio.throttle.write_bps_device echo $$ > cgroup.procs dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & Test result: 10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s 10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s The problem is that the second bio is finished after 10s instead of 20s. Root cause: 1) second bio will be flagged: __blk_throtl_bio while (true) { ... if (sq->nr_queued[rw]) -> some bio is throttled already break }; bio_set_flag(bio, BIO_THROTTLED); -> flag the bio 2) flagged bio will be dispatched without waiting: throtl_dispatch_tg tg_may_dispatch tg_with_in_bps_limit if (bps_limit == U64_MAX \|\| bio_flagged(bio, BIO_THROTTLED)) *wait = 0; -> wait time is zero return true; commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") support to count split bios for iops limit, thus it adds flagged bio checking in tg_with_in_bps_limit() so that split bios will only count once for bps limit, however, it introduce a new problem that io throttle won't work if multiple bios are throttled. In order to fix the problem, handle iops/bps limit in different ways: 1) for iops limit, there is no flag to record if the bio is throttled, and iops is always applied. 2) for bps limit, original bio will be flagged with BIO_BPS_THROTTLED, and io throttle will ignore bio with the flag. Noted this patch also remove the code to set flag in __bio_clone(), it's introduced in commit 111be8839817 ("block-throttle: avoid double charge"), and author thinks split bio can be resubmited and throttled again, which is wrong because split bio will continue to dispatch from caller. Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") Cc: <[email protected]> Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-12	sbitmap: fix batched wait_cnt accounting	Keith Busch	3	-16/+26
	Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-08	sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch	Uros Bizjak	1	-5/+5
	Use atomic_long_try_cmpxchg instead of atomic_long_cmpxchg (ptr, old, new) == old in __sbitmap_queue_get_batch. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, atomic_long_cmpxchg implicitly assigns old ptr value to "old" when cmpxchg fails, enabling further code simplifications, e.g. an extra memory read can be avoided in the loop. No functional change intended. Cc: Jens Axboe <[email protected]> Signed-off-by: Uros Bizjak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-08	nbd: Fix hung when signal interrupts nbd_start_device_ioctl()	Shigeru Yoshida	1	-2/+4
	syzbot reported hung task [1]. The following program is a simplified version of the reproducer: int main(void) { int sv[2], fd; if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return 1; if ((fd = open("/dev/nbd0", 0)) < 0) return 1; if (ioctl(fd, NBD_SET_SIZE_BLOCKS, 0x81) < 0) return 1; if (ioctl(fd, NBD_SET_SOCK, sv[0]) < 0) return 1; if (ioctl(fd, NBD_DO_IT) < 0) return 1; return 0; } When signal interrupt nbd_start_device_ioctl() waiting the condition atomic_read(&config->recv_threads) == 0, the task can hung because it waits the completion of the inflight IOs. This patch fixes the issue by clearing queue, not just shutdown, when signal interrupt nbd_start_device_ioctl(). Link: https://syzkaller.appspot.com/bug?id=7d89a3ffacd2b83fdd39549bc4d8e0a89ef21239 [1] Reported-by: [email protected] Signed-off-by: Shigeru Yoshida <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-08	sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()	Jan Kara	1	-3/+15
	When __sbq_wake_up() decrements wait_cnt to 0 but races with someone else waking the waiter on the waitqueue (so the waitqueue becomes empty), it exits without reseting wait_cnt to wake_batch number. Once wait_cnt is 0, nobody will ever reset the wait_cnt or wake the new waiters resulting in possible deadlocks or busyloops. Fix the problem by making sure we reset wait_cnt even if we didn't wake up anybody in the end. Fixes: 040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup") Reported-by: Keith Busch <[email protected]> Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-05	rnbd-srv: remove redundant setting of blk_open_flags	Guoqing Jiang	1	-1/+0
	It is not necessary since it is set later just before function return success. Signed-off-by: Guoqing Jiang <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Md Haris Iqbal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-05	rnbd-srv: make process_msg_close returns void	Guoqing Jiang	1	-4/+3
	Change the return type to void given it always returns 0. Signed-off-by: Guoqing Jiang <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Md Haris Iqbal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-05	rnbd-srv: add comment in rnbd_srv_rdma_ev	Guoqing Jiang	1	-0/+5
	Let's add some explanations here given the err handling is not obvious. Signed-off-by: Guoqing Jiang <[email protected]> Acked-by: Md Haris Iqbal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-05	block: remove unneeded return value of bio_check_ro()	Miaohe Lin	1	-7/+3
	bio_check_ro() always return false now. Remove this unneeded return value and cleanup the sole caller. No functional change intended. Signed-off-by: Miaohe Lin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-05	blk-mq: remove unneeded needs_restart check	Miaohe Lin	1	-1/+1
	If code reaches here, needs_restart must be true. Remove this unneeded needs_restart check. No functional change intended. Signed-off-by: Miaohe Lin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-05	block/blk-map: Remove set but unused variable 'added'	Jiapeng Chong	1	-2/+1
	The variable added is not effectively used in the function, so delete it. block/blk-map.c:273:16: warning: variable 'added' set but not used. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2049 Reported-by: Abaci Robot <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-04	blk-throttle: clean up codes that can't be reached	Yu Kuai	1	-34/+56
	While doing code coverage testing while CONFIG_BLK_DEV_THROTTLING_LOW is disabled, we found that there are many codes can never be reached. This patch move such codes inside "#ifdef CONFIG_BLK_DEV_THROTTLING_LOW". Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-04	Revert "sbitmap: fix batched wait_cnt accounting"	Jens Axboe	3	-20/+16
	This reverts commit 16ede66973c84f890c03584f79158dd5b2d725f5. This is causing issues with CPU stalls on my test box, revert it for now until we understand what is going on. It looks like infinite looping off sbitmap_queue_wake_up(), but hard to tell with a lot of CPUs hitting this issue and the console scrolling infinitely. Link: https://lore.kernel.org/linux-block/[email protected]/ Signed-off-by: Jens Axboe <[email protected]>
2022-09-02	block: enable per-cpu bio caching for the fs bio set	Jens Axboe	1	-1/+2
	This is useful for polled IO on a file, or for polled IO with the io_uring passthrough mechanism. If bio allocations are done with REQ_POLLED for those cases, then initializing the bio set with BIOSET_PERCPU_CACHE enables the local per-cpu cache which eliminates allocations (and frees) of bio structs when possible. Reviewed-by: Kanchan Joshi <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2022-09-01	sbitmap: fix batched wait_cnt accounting	Keith Busch	3	-16/+20
	Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-26	sbitmap: remove unnecessary code in __sbitmap_queue_get_batch	Liu Song	1	-3/+2
	If "nr + nr_tags <= map_depth", then the value of nr_tags will not be greater than map_depth, so no additional comparison is required. Signed-off-by: Liu Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-24	block/rnbd-clt: Remove the unneeded result variable	ye xingchen	1	-3/+1
	Return the value from rtrs_clt_rdma_cq_direct() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <[email protected]> Signed-off-by: ye xingchen <[email protected]> Acked-by: Jack Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-23	sbitmap: fix possible io hung due to lost wakeup	Yu Kuai	1	-22/+33
	There are two problems can lead to lost wakeup: 1) invalid wakeup on the wrong waitqueue: For example, 2 * wake_batch tags are put, while only wake_batch threads are woken: __sbq_wake_up atomic_cmpxchg -> reset wait_cnt __sbq_wake_up -> decrease wait_cnt ... __sbq_wake_up -> wait_cnt is decreased to 0 again atomic_cmpxchg sbq_index_atomic_inc -> increase wake_index wake_up_nr -> wake up and waitqueue might be empty sbq_index_atomic_inc -> increase again, one waitqueue is skipped wake_up_nr -> invalid wake up because old wakequeue might be empty To fix the problem, increasing 'wake_index' before resetting 'wait_cnt'. 2) 'wait_cnt' can be decreased while waitqueue is empty As pointed out by Jan Kara, following race is possible: CPU1 CPU2 __sbq_wake_up __sbq_wake_up sbq_wake_ptr() sbq_wake_ptr() -> the same wait_cnt = atomic_dec_return() /* decreased to 0 / sbq_index_atomic_inc() / move to next waitqueue / atomic_set() / reset wait_cnt / wake_up_nr() / wake up on the old waitqueue / wait_cnt = atomic_dec_return() / * decrease wait_cnt in the old * waitqueue, while it can be * empty. */ Fix the problem by waking up before updating 'wake_index' and 'wait_cnt'. With this patch, noted that 'wait_cnt' is still decreased in the old empty waitqueue, however, the wakeup is redirected to a active waitqueue, and the extra decrement on the old empty waitqueue is not handled. Fixes: 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap library") Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	block: use on-stack page vec for <= UIO_FASTIOV	Jens Axboe	1	-3/+11
	Avoid a kmalloc+kfree for each page array, if we only have a few pages that are mapped. An alloc+free for each IO is quite expensive, and it's pretty pointless if we're only dealing with 1 or a few vecs. Use UIO_FASTIOV like we do in other spots to set a sane limit for how big of an IO we want to avoid allocations for. Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	block: enable bio caching use for passthru IO	Jens Axboe	1	-8/+25
	bdev based polled O_DIRECT is currently quite a bit faster than passthru on the same device, and one of the reaons is that we're not able to use the bio caching for passthru IO. If REQ_POLLED is set on the request, use the fs bio set for grabbing a bio from the caches, if available. This saves 5-6% of CPU over head for polled passthru IO. Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	block: shrink rq_map_data a bit	Jens Axboe	2	-5/+5
	We don't need full ints for several of these members. Change the page_order and nr_entries to unsigned shorts, and the true/false from_user and null_mapped to booleans. This shrinks the struct from 32 to 24 bytes on 64-bit archs. Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	block, bfq: remove useless parameter for bfq_add/del_bfqq_busy()	Yu Kuai	3	-10/+12
	'bfqd' can be accessed through 'bfqq->bfqd', there is no need to pass it as a parameter separately. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	block, bfq: remove useless checking in bfq_put_queue()	Yu Kuai	1	-4/+2
	'bfqq->bfqd' is ensured to set in bfq_init_queue(), and it will never change afterwards. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	block, bfq: remove unused functions	Yu Kuai	2	-10/+8
	While doing code coverage testing(CONFIG_BFQ_CGROUP_DEBUG is disabled), we found that some functions doesn't have caller, thus remove them. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	block: Change the return type of blk_mq_map_queues() into void	Bart Van Assche	30	-94/+55
	Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <[email protected]> Cc: Jason Wang <[email protected]> Cc: Keith Busch <[email protected]> Cc: Martin K. Petersen <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: John Garry <[email protected]> Acked-by: Md Haris Iqbal <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Link: https://lore.kernel.org/r/[email protected] [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	null_blk: Modify the behavior of null_map_queues()	Bart Van Assche	1	-1/+3
	Instead of returning -EINVAL if an internal inconsistency is detected, fall back to a single submission queue. This patch prepares for changing the return value of the .map_queues() callbacks into void. Cc: Christoph Hellwig <[email protected]> Cc: Keith Busch <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	block/rnbd-srv: Add event tracing support	Santosh Pradhan	4	-7/+241
	Add event tracing mechanism for following routines: - create_sess() - destroy_sess() - process_rdma() - process_msg_sess_info() - process_msg_open() - process_msg_close() How to use: 1. Load the rnbd_server module 2. cd /sys/kernel/debug/tracing 3. If all the events need to be enabled: echo 1 > events/rnbd_srv/enable 4. OR only speific routine/event needs to be enabled e.g. echo 1 > events/rnbd_srv/create_sess/enable 5. cat trace 5. Run some workload which can trigger create_sess() routine/event Signed-off-by: Santosh Pradhan <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-22	block: sed-opal: Add ioctl to return device status	[email protected]	4	-12/+96
	Provide a mechanism to retrieve basic status information about the device, including the "supported" flag indicating whether SED-OPAL is supported. The information returned is from the various feature descriptors received during the discovery0 step, and so this ioctl does nothing more than perform the discovery0 step and then save the information received. See "struct opal_status" and OPAL_FL_* bits for the status information currently returned. This is necessary to be able to check whether a device is OPAL enabled, set up, locked or unlocked from userspace programs like systemd-cryptsetup and libcryptsetup. Right now we just have to assume the user 'knows' or blindly attempt setup/lock/unlock operations. Signed-off-by: Douglas Miller <[email protected]> Tested-by: Luca Boccassi <[email protected]> Reviewed-by: Scott Bauer <[email protected]> Acked-by: Christian Brauner (Microsoft) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-21	Linux 6.0-rc2	Linus Torvalds	1	-1/+1

2022-08-21	Merge tag 'irq-urgent-2022-08-21' of ↵	Linus Torvalds	7	-32/+34
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "Misc irqchip fixes: LoongArch driver fixes and a Hyper-V IOMMU fix" * tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/loongson-liointc: Fix an error handling path in liointc_init() irqchip/loongarch: Fix irq_domain_alloc_fwnode() abuse irqchip/loongson-pch-pic: Move find_pch_pic() into CONFIG_ACPI irqchip/loongson-eiointc: Fix a build warning irqchip/loongson-eiointc: Fix irq affinity setting iommu/hyper-v: Use helper instead of directly accessing affinity
2022-08-21	Merge tag 'perf-urgent-2022-08-21' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kprobes fix from Ingo Molnar: "Fix a kprobes bug in JNG/JNLE emulation when a kprobe is installed at such instructions, possibly resulting in incorrect execution (the wrong branch taken)" * tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Fix JNG/JNLE emulation
2022-08-21	Merge tag 'trace-v6.0-rc1-2' of ↵	Linus Torvalds	5	-21/+119
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various fixes for tracing: - Fix a return value of traceprobe_parse_event_name() - Fix NULL pointer dereference from failed ftrace enabling - Fix NULL pointer dereference when asking for registers from eprobes - Make eprobes consistent with kprobes/uprobes, filters and histograms" * tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have filter accept "common_cpu" to be consistent tracing/probes: Have kprobes and uprobes use $COMM too tracing/eprobes: Have event probes be consistent with kprobes and uprobes tracing/eprobes: Fix reading of string fields tracing/eprobes: Do not hardcode $comm as a string tracing/eprobes: Do not allow eprobes to use $stack, or % for regs ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead tracing/perf: Fix double put of trace event when init fails tracing: React to error return from traceprobe_parse_event_name()
2022-08-21	tracing: Have filter accept "common_cpu" to be consistent	Steven Rostedt (Google)	1	-0/+1
	Make filtering consistent with histograms. As "cpu" can be a field of an event, allow for "common_cpu" to keep it from being confused with the "cpu" field of the event. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 1e3bac71c5053 ("tracing/histogram: Rename "cpu" to "common_cpu"") Suggested-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21	tracing/probes: Have kprobes and uprobes use $COMM too	Steven Rostedt (Google)	1	-2/+3
	Both $comm and $COMM can be used to get current->comm in eprobes and the filtering and histogram logic. Make kprobes and uprobes consistent in this regard and allow both $comm and $COMM as well. Currently kprobes and uprobes only handle $comm, which is inconsistent with the other utilities, and can be confusing to users. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 533059281ee5 ("tracing: probeevent: Introduce new argument fetching code") Suggested-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21	tracing/eprobes: Have event probes be consistent with kprobes and uprobes	Steven Rostedt (Google)	1	-6/+64
	Currently, if a symbol "@" is attempted to be used with an event probe (eprobes), it will cause a NULL pointer dereference crash. Both kprobes and uprobes can reference data other than the main registers. Such as immediate address, symbols and the current task name. Have eprobes do the same thing. For "comm", if "comm" is used and the event being attached to does not have the "comm" field, then make it the "$comm" that kprobes has. This is consistent to the way histograms and filters work. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21	tracing/eprobes: Fix reading of string fields	Steven Rostedt (Google)	1	-0/+21
	Currently when an event probe (eprobe) hooks to a string field, it does not display it as a string, but instead as a number. This makes the field rather useless. Handle the different kinds of strings, dynamic, static, relational/dynamic etc. Now when a string field is used, the ":string" type can be used to display it: echo "e:sw sched/sched_switch comm=$next_comm:string" > dynamic_events Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21	tracing/eprobes: Do not hardcode $comm as a string	Steven Rostedt (Google)	1	-2/+3
	The variable $comm is hard coded as a string, which is true for both kprobes and uprobes, but for event probes (eprobes) it is a field name. In most cases the "comm" field would be a string, but there's no guarantee of that fact. Do not assume that comm is a string. Not to mention, it currently forces comm fields to fault, as string processing for event probes is currently broken. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21	tracing/eprobes: Do not allow eprobes to use $stack, or % for regs	Steven Rostedt (Google)	1	-8/+13
	While playing with event probes (eprobes), I tried to see what would happen if I attempted to retrieve the instruction pointer (%rip) knowing that event probes do not use pt_regs. The result was: BUG: kernel NULL pointer dereference, address: 0000000000000024 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ #309 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:get_event_field.isra.0+0x0/0x50 Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8 50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24 8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74 RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086 RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000 RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8 R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8 R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff916c9ea40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0 Call Trace: <TASK> get_eprobe_size+0xb4/0x640 ? __mod_node_page_state+0x72/0xc0 __eprobe_trace_func+0x59/0x1a0 ? __mod_lruvec_page_state+0xaa/0x1b0 ? page_remove_file_rmap+0x14/0x230 ? page_remove_rmap+0xda/0x170 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x18f/0x240 trace_event_raw_event_sched_wakeup_template+0x7a/0xb0 try_to_wake_up+0x260/0x4c0 __wake_up_common+0x80/0x180 __wake_up_common_lock+0x7c/0xc0 do_notify_parent+0x1c9/0x2a0 exit_notify+0x1a9/0x220 do_exit+0x2ba/0x450 do_group_exit+0x2d/0x90 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Obviously this is not the desired result. Move the testing for TPARG_FL_TPOINT which is only used for event probes to the top of the "$" variable check, as all the other variables are not used for event probes. Also add a check in the register parsing "%" to fail if an event probe is used. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21	ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead	Yang Jihong	1	-0/+10
	ftrace_startup does not remove ops from ftrace_ops_list when ftrace_startup_enable fails: register_ftrace_function ftrace_startup __register_ftrace_function ... add_ftrace_ops(&ftrace_ops_list, ops) ... ... ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1 ... return 0 // ops is in the ftrace_ops_list. When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything: unregister_ftrace_function ftrace_shutdown if (unlikely(ftrace_disabled)) return -ENODEV; // return here, __unregister_ftrace_function is not executed, // as a result, ops is still in the ftrace_ops_list __unregister_ftrace_function ... If ops is dynamically allocated, it will be free later, in this case, is_ftrace_trampoline accesses NULL pointer: is_ftrace_trampoline ftrace_ops_trampoline do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL! Syzkaller reports as follows: [ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b [ 1203.508039] #PF: supervisor read access in kernel mode [ 1203.508798] #PF: error_code(0x0000) - not-present page [ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0 [ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI [ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G B W 5.10.0 #8 [ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0 [ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00 [ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246 [ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866 [ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b [ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07 [ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399 [ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008 [ 1203.525634] FS: 00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000 [ 1203.526801] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0 [ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Therefore, when ftrace_startup_enable fails, we need to rollback registration process and remove ops from ftrace_ops_list. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Steven Rostedt <[email protected]> Signed-off-by: Yang Jihong <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21	tracing/perf: Fix double put of trace event when init fails	Steven Rostedt (Google)	1	-3/+4
	If in perf_trace_event_init(), the perf_trace_event_open() fails, then it will call perf_trace_event_unreg() which will not only unregister the perf trace event, but will also call the put() function of the tp_event. The problem here is that the trace_event_try_get_ref() is called by the caller of perf_trace_event_init() and if perf_trace_event_init() returns a failure, it will then call trace_event_put(). But since the perf_trace_event_unreg() already called the trace_event_put() function, it triggers a WARN_ON(). WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20 If perf_trace_event_reg() does not call the trace_event_try_get_ref() then the perf_trace_event_unreg() should not be calling trace_event_put(). This breaks symmetry and causes bugs like these. Pull out the trace_event_put() from perf_trace_event_unreg() and call it in the locations that perf_trace_event_unreg() is called. This not only fixes this bug, but also brings back the proper symmetry of the reg/unreg vs get/put logic. Link: https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 1d18538e6a092 ("tracing: Have dynamic events have a ref counter") Reported-by: Krister Johansen <[email protected]> Reviewed-by: Krister Johansen <[email protected]> Tested-by: Krister Johansen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21	tracing: React to error return from traceprobe_parse_event_name()	Lukas Bulwahn	1	-1/+1
	The function traceprobe_parse_event_name() may set the first two function arguments to a non-null value and still return -EINVAL to indicate an unsuccessful completion of the function. Hence, it is not sufficient to just check the result of the two function arguments for being not null, but the return value also needs to be checked. Commit 95c104c378dc ("tracing: Auto generate event name when creating a group of events") changed the error-return-value checking of the second traceprobe_parse_event_name() invocation in __trace_eprobe_create() and removed checking the return value to jump to the error handling case. Reinstate using the return value in the error-return-value checking. Link: https://lkml.kernel.org/r/[email protected] Fixes: 95c104c378dc ("tracing: Auto generate event name when creating a group of events") Acked-by: Linyu Yuan <[email protected]> Signed-off-by: Lukas Bulwahn <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21	Merge tag 'i2c-for-6.0-rc2' of ↵	Linus Torvalds	2	-11/+18
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "A revert to fix a regression introduced this merge window and a fix for proper error handling in the remove path of the iMX driver" * tag 'i2c-for-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx: Make sure to unregister adapter on remove() Revert "i2c: scmi: Replace open coded device_get_match_data()"
2022-08-21	Merge tag '6.0-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds	14	-52/+45
	Pull cifs client fixes from Steve French: - memory leak fix - two small cleanups - trivial strlcpy removal - update missing entry for cifs headers in MAINTAINERS file * tag '6.0-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: move from strlcpy with unused retval to strscpy cifs: Fix memory leak on the deferred close cifs: remove useless parameter 'is_fsctl' from SMB2_ioctl() cifs: remove unused server parameter from calc_smb_size() cifs: missing directory in MAINTAINERS file
2022-08-21	asm goto: eradicate CC_HAS_ASM_GOTO	Nick Desaulniers	10	-89/+7
	GCC has supported asm goto since 4.5, and Clang has since version 9.0.0. The minimum supported versions of these tools for the build according to Documentation/process/changes.rst are 5.1 and 11.0.0 respectively. Remove the feature detection script, Kconfig option, and clean up some fallback code that is no longer supported. The removed script was also testing for a GCC specific bug that was fixed in the 4.7 release. Also remove workarounds for bpftrace using clang older than 9.0.0, since other BPF backend fixes are required at this point. Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/ Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637 Acked-by: Borislav Petkov <[email protected]> Suggested-by: Masahiro Yamada <[email protected]> Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Reviewed-by: Alexandre Belloni <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-08-21	i2c: imx: Make sure to unregister adapter on remove()	Uwe Kleine-König	1	-9/+11
	If for whatever reasons pm_runtime_resume_and_get() fails and .remove() is exited early, the i2c adapter stays around and the irq still calls its handler, while the driver data and the register mapping go away. So if later the i2c adapter is accessed or the irq triggers this results in havoc accessing freed memory and unmapped registers. So unregister the software resources even if resume failed, and only skip the hardware access in that case. Fixes: 588eb93ea49f ("i2c: imx: add runtime pm support to improve the performance") Signed-off-by: Uwe Kleine-König <[email protected]> Acked-by: Oleksij Rempel <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2022-08-21	Revert "i2c: scmi: Replace open coded device_get_match_data()"	Wolfram Sang	1	-2/+7
	This reverts commit 9ae551ded5ba55f96a83cd0811f7ef8c2f329d0c. We got a regression report, so ensure this machine boots again. We will come back with a better version hopefully. Reported-by: Josef Johansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wolfram Sang <[email protected]>
2022-08-20	Merge tag 'kbuild-fixes-v6.0' of ↵	Linus Torvalds	5	-9/+5
	git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix module versioning broken on some architectures - Make dummy-tools enable CONFIG_PPC_LONG_DOUBLE_128 - Remove -Wformat-zero-length, which has no warning instance - Fix the order between drivers and libs in modules.order - Fix false-positive warnings in clang-analyzer * tag 'kbuild-fixes-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: scripts/clang-tools: Remove DeprecatedOrUnsafeBufferHandling check kbuild: fix the modules order between drivers and libs scripts/Makefile.extrawarn: Do not disable clang's -Wformat-zero-length kbuild: dummy-tools: pretend we understand __LONG_DOUBLE_128__ modpost: fix module versioning when a symbol lacks valid CRC