blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-07-03	vhost, vhost_net: add helper to check if vq has work	Mike Christie	3	-5/+5
	In the next patches each vq might have different workers so one could have work but others do not. For net, we only want to check specific vqs, so this adds a helper to check if a vq has work pending and converts vhost-net to use it. Signed-off-by: Mike Christie <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	vhost: add vhost_worker pointer to vhost_virtqueue	Mike Christie	2	-7/+15
	This patchset allows userspace to map vqs to different workers. This patch adds a worker pointer to the vq so in later patches in this set we can queue/flush specific vqs and their workers. Signed-off-by: Mike Christie <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	vhost: dynamically allocate vhost_worker	Mike Christie	2	-25/+45
	This patchset allows us to allocate multiple workers, so this has us move from the vhost_worker that's embedded in the vhost_dev to dynamically allocating it. Signed-off-by: Mike Christie <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	vhost: create worker at end of vhost_dev_set_owner	Mike Christie	1	-6/+13
	vsock can start queueing work after VHOST_VSOCK_SET_GUEST_CID, so after we have called vhost_worker_create it can be calling vhost_work_queue and trying to access the vhost worker/task. If vhost_dev_alloc_iovecs fails, then vhost_worker_free could free the worker/task from under vsock. This moves vhost_worker_create to the end of vhost_dev_set_owner where we know we can no longer fail in that path. If it fails after the VHOST_SET_OWNER and userspace closes the device, then the normal vsock release handling will do the right thing. Signed-off-by: Mike Christie <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	virtio_bt: call scheduler when we free unused buffs	Xianting Tian	1	-0/+1
	For virtio-net we were getting CPU stall warnings, and fixed it by calling the scheduler: see f8bb51043945 ("virtio_net: suppress cpu stall when free_unused_bufs"). This driver is similar so theoretically the same logic applies. Signed-off-by: Xianting Tian <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	virtio-console: call scheduler when we free unused buffs	Xianting Tian	1	-0/+1
	For virtio-net we were getting CPU stall warnings, and fixed it by calling the scheduler: see f8bb51043945 ("virtio_net: suppress cpu stall when free_unused_bufs"). This driver is similar so theoretically the same logic applies. Signed-off-by: Xianting Tian <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	virtio-crypto: call scheduler when we free unused buffs	Xianting Tian	1	-0/+1
	For virtio-net we were getting CPU stall warnings, and fixed it by calling the scheduler: see f8bb51043945 ("virtio_net: suppress cpu stall when free_unused_bufs"). This driver is similar so theoretically the same logic applies. Signed-off-by: Xianting Tian <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	vDPA/ifcvf: implement new accessors for vq_state	Zhu Lingshan	3	-33/+17
	This commit implements a better layout of the live migration bar, therefore the accessors for virtqueue state have been refactored. This commit also add a comment to the probing-ids list, indicating this driver drives F2000X-PL virtio-net Signed-off-by: Zhu Lingshan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	vDPA/ifcvf: detect and report max allowed vq size	Zhu Lingshan	3	-2/+35
	Rather than a hardcode, this commit detects and reports the max value of allowed size of the virtqueues Signed-off-by: Zhu Lingshan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	vDPA/ifcvf: dynamic allocate vq data stores	Zhu Lingshan	3	-1/+6
	This commit dynamically allocates the data stores for the virtqueues based on virtio_pci_common_cfg.num_queues. Signed-off-by: Zhu Lingshan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-07-03	docs: remove the tips on how to submit patches from MAINTAINERS	Jakub Kicinski	2	-78/+9
	Having "how to submit patches" in MAINTAINTERS seems out of place. We have a whole section of documentation about it, duplication is harmful and a lot of the text looks really out of date. Sections 1, 2 and 4 look really, really old and not applicable to the modern process. Section 3 is obvious but also we have build bots now. Section 5 is a bit outdated (diff -u?!). But I like the part about factoring out shared code, so add that to process docs. Section 6 is unnecessary? Section 7 is covered by more appropriate docs. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]> Message-ID: <[email protected]>
2023-07-03	docs: fix typo in zh_TW and zh_CN translation	Xueshi Hu	8	-8/+8
	In zh_TW and zh_CN translation, "http://lwn.net/Articles" is incorrectly written as "http://lwn.net/Articles". This patch is generated by the following script: rg -l "lwn.net/Articles" \| xargs sed -i 's/lwn.net\/articles/lwn.net\/Articles/g' Signed-off-by: Xueshi Hu <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]> Message-ID: <mr4mjneo2eghtpm2z6envih3kzjdjpptqcot2fm2wp5crljxag@oianggqjllbl>
2023-07-03	ovl: move all parameter handling into params.{c,h}	Christian Brauner	4	-564/+581
	While initially I thought that we couldn't move all new mount api handling into params.{c,h} it turns out it is possible. So this just moves a good chunk of code out of super.c and into params.{c,h}. Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Amir Goldstein <[email protected]>
2023-07-03	spi: rzv2m-csi: Fix SoC product name	Geert Uytterhoeven	1	-1/+1
	The SoC product name is "RZ/V2M". Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Fabrizio Castro <[email protected]> Link: https://lore.kernel.org/r/89e9870a2c510387e4d7a863025f4d3639d4a261.1688375020.git.geert+renesas@glider.be Signed-off-by: Mark Brown <[email protected]>
2023-07-03	s390/entry: remove mcck clock	Sven Schnelle	3	-4/+2
	In the past machine checks where accounted as irq time. With the conversion to generic entry, it was decided to account machine checks to the current context. The stckf at the beginning of the machine check handler and the lowcore member is no longer required, therefore remove it. Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	s390: fix various typos	Heiko Carstens	36	-50/+50
	Fix various typos found with codespell. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	s390/zcrypt: remove ZCRYPT_MULTIDEVNODES kernel config option	Harald Freudenberger	2	-29/+0
	Remove ZCRYPT_MULTIDEVNODES kernel config option and make the dependent code always build. The last years showed, that this option is enabled on all distros and exploited by some features (for example CEX plugin for kubernetes). So remove this choice as it was never used to switch off the multiple devices support for the zcrypt device driver. Signed-off-by: Harald Freudenberger <[email protected]> Reviewed-by: Holger Dengler <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	s390/zcrypt: do not retry administrative requests	Harald Freudenberger	1	-0/+6
	All kind of administrative requests should not been retried. Some card firmware detects this and assumes a replay attack. This patch checks on failure if the low level functions indicate a retry (EAGAIN) and checks for the ADMIN flag set on the request message. If this both are true, the response code for this message is changed to EIO to make sure the zcrypt API layer does not attempt to retry the request. As of now the ADMIN flag is set for a request message when - for EP11 the field 'flags' of the EP11 CPRB struct has the leftmost bit set. - for CCA when the CPRB minor version is 'T3', 'T5', 'T6' or 'T7'. Please note that the do-not-retry only applies to a request which has been sent to the card (= has been successfully enqueued) but the reply indicates some kind of failure and by default it would be replied. It is totally fine to retry a request if a previous attempt to enqueue the msg into the firmware queue had some kind of failure and thus the card has never seen this request. Reported-by: Frank Uhlig <[email protected]> Signed-off-by: Harald Freudenberger <[email protected]> Reviewed-by: Holger Dengler <[email protected]> Cc: [email protected] Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	s390/zcrypt: cleanup some debug code	Harald Freudenberger	6	-140/+0
	This patch removes most of the debug code which is build in when CONFIG_ZCRYPT_DEBUG is enabled. There is no real exploiter for this code any more and at least one ioctl fails with this code enabled. The CONFIG_ZCRYPT_DEBUG kernel config option still makes sense as some debug sysfs entries can get enabled with this and maybe long term a new better designed debug and error injection way will get introduced. This patch only removes code surrounded by the named kernel config option. This option should by default always be off anyway. The structs and defines removed by the patch have been used only by code surrounded by a CONFIG_ZCRYPT_DEBUG ifdef and thus can be removed also. In the end this patch removes all the failure-injection possibilities which had been available when the kernel had been build with CONFIG_ZCRYPT_DEBUG. It has never been used that much and was too unflexible anyway. Signed-off-by: Harald Freudenberger <[email protected]> Reviewed-by: Holger Dengler <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	s390/entry: rework entering DAT-on mode on CPU restart	Alexander Gordeev	1	-3/+8
	Instead of enforcing PSW_MASK_DAT bit on previously stored in lowcore restart_psw.mask use the PSW_KERNEL_BITS mask (which contains PSW_MASK_DAT) directly. As result, the PSW mask stored in lowcore is only used to enter the CPU restart routine, while PSW_KERNEL_BITS is used to enter the kernel code - similarily to commit 64ea2977add2 ("s390/mm: start kernel with DAT enabled"). Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	s390/mm: fence off VM macros from asm and linker	Alexander Gordeev	1	-2/+2
	Prevent assembler and linker scripts compilation errors by fencing it off with __ASSEMBLY__ define. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	s390: include linux/io.h instead of asm/io.h	Heiko Carstens	19	-26/+25
	Include linux/io.h instead of asm/io.h everywhere. linux/io.h includes asm/io.h, so this shouldn't cause any problems. Instead this might help for some randconfig build errors which were reported due to some undefined io related functions. Also move the changed include so it stays grouped together with other includes from the same directory. For ctcm_mpc.c also remove not needed comments (actually questions). Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	s390/ptrace: make all psw related defines also available for asm	Heiko Carstens	2	-85/+84
	Use the _AC() macro to make all psw related defines also available for assembler files. Acked-by: Alexander Gordeev <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	s390/ptrace: remove PSW_DEFAULT_KEY from uapi	Heiko Carstens	3	-5/+3
	Move PSW_DEFAULT_KEY from uapi/asm/ptrace.h to asm/ptrace.h. This is possible, since it depends on PAGE_DEFAULT_ACC which is not part of uapi. Or in other words: this define cannot be used without error. Therefore remove it from uapi. Acked-by: Alexander Gordeev <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2023-07-03	kdb: move kdb_send_sig() declaration to a better header file	Daniel Thompson	2	-1/+2
	kdb_send_sig() is defined in the signal code and called from kdb, but the declaration is part of the kdb internal code. Move the declaration to the shared header to avoid the warning: kernel/signal.c:4789:6: error: no previous prototype for 'kdb_send_sig' [-Werror=missing-prototypes] Reported-by: Arnd Bergmann <[email protected]> Closes: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Daniel Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-07-03	Documentation: ABI: sysfs-class-net-qmi: pass_through contact update	Subash Abhinov Kasiviswanathan	1	-1/+1
	Switch to the quicinc.com id. Fixes: bd1af6b5fffd ("Documentation: ABI: sysfs-class-net-qmi: document pass-through file") Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-03	tcp: annotate data races in __tcp_oow_rate_limited()	Eric Dumazet	1	-3/+9
	request sockets are lockless, __tcp_oow_rate_limited() could be called on the same object from different cpus. This is harmless. Add READ_ONCE()/WRITE_ONCE() annotations to avoid a KCSAN report. Fixes: 4ce7e93cb3fe ("tcp: rate limit ACK sent by SYN_RECV request sockets") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-03	Merge branch 'wireguard-fixes'	David S. Miller	7	-30/+54
	Jason A. Donenfeld says: ==================== wireguard fixes for 6.4.2/6.5-rc1 Sorry to send these patches during the merge window, but they're net fixes, not netdev enhancements, and while I'd ordinarily wait anyway, I just got a first bug report for one of these fixes, which I originally had thought was mostly unlikely. So please apply the following three patches to net: 1) Make proper use of nr_cpu_ids with cpumask_next(), rather than awkwardly using modulo, to handle dynamic CPU topology changes. Linus noticed this a while ago and pointed it out, and today a user actually got hit by it. 2) Respect persistent keepalive and other staged packets when setting the private key after the interface is already up. 3) Use timer_delete_sync() instead of del_timer_sync(), per the documentation. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-07-03	wireguard: timers: move to using timer_delete_sync	Jason A. Donenfeld	1	-5/+5
	The documentation says that del_timer_sync is obsolete, and code should use the equivalent timer_delete_sync instead, so switch to it. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-03	wireguard: netlink: send staged packets when setting initial private key	Jason A. Donenfeld	2	-9/+35
	Packets bound for peers can queue up prior to the device private key being set. For example, if persistent keepalive is set, a packet is queued up to be sent as soon as the device comes up. However, if the private key hasn't been set yet, the handshake message never sends, and no timer is armed to retry, since that would be pointless. But, if a user later sets a private key, the expectation is that those queued packets, such as a persistent keepalive, are actually sent. So adjust the configuration logic to account for this edge case, and add a test case to make sure this works. Maxim noticed this with a wg-quick(8) config to the tune of: [Interface] PostUp = wg set %i private-key somefile [Peer] PublicKey = ... Endpoint = ... PersistentKeepalive = 25 Here, the private key gets set after the device comes up using a PostUp script, triggering the bug. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: [email protected] Reported-by: Maxim Cournoyer <[email protected]> Tested-by: Maxim Cournoyer <[email protected]> Link: https://lore.kernel.org/wireguard/[email protected]/ Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-03	wireguard: queueing: use saner cpu selection wrapping	Jason A. Donenfeld	4	-16/+14
	Using `% nr_cpumask_bits` is slow and complicated, and not totally robust toward dynamic changes to CPU topologies. Rather than storing the next CPU in the round-robin, just store the last one, and also return that value. This simplifies the loop drastically into a much more common pattern. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: [email protected] Reported-by: Linus Torvalds <[email protected]> Tested-by: Manuel Leiner <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-03	samples: pktgen: fix append mode failed issue	J.J. Martzki	9	-6/+37
	Each sample script sources functions.sh before parameters.sh which makes $APPEND undefined when trapping EXIT no matter in append mode or not. Due to this when sample scripts finished they always do "pgctrl reset" which resets pktgen config. So move trap to each script after sourcing parameters.sh and trap EXIT explicitly. Signed-off-by: J.J. Martzki <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-03	selftests/net: Add xt_policy config for xfrm_policy test	Daniel Díaz	1	-0/+1
	When running Kselftests with the current selftests/net/config the following problem can be seen with the net:xfrm_policy.sh selftest: # selftests: net: xfrm_policy.sh [ 41.076721] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready [ 41.094787] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready [ 41.107635] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready # modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36 # iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?) # Perhaps iptables or your kernel needs to be upgraded. # modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36 # iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?) # Perhaps iptables or your kernel needs to be upgraded. # SKIP: Could not insert iptables rule ok 1 selftests: net: xfrm_policy.sh # SKIP This is because IPsec "policy" match support is not available to the kernel. This patch adds CONFIG_NETFILTER_XT_MATCH_POLICY as a module to the selftests/net/config file, so that `make kselftest-merge` can take this into consideration. Signed-off-by: Daniel Díaz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-03	net: fix net_dev_start_xmit trace event vs skb_transport_offset()	Eric Dumazet	1	-1/+2
	After blamed commit, we must be more careful about using skb_transport_offset(), as reminded us by syzbot: WARNING: CPU: 0 PID: 10 at include/linux/skbuff.h:2868 skb_transport_offset include/linux/skbuff.h:2977 [inline] WARNING: CPU: 0 PID: 10 at include/linux/skbuff.h:2868 perf_trace_net_dev_start_xmit+0x89a/0xce0 include/trace/events/net.h:14 Modules linked in: CPU: 0 PID: 10 Comm: kworker/u4:1 Not tainted 6.1.30-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet RIP: 0010:skb_transport_header include/linux/skbuff.h:2868 [inline] RIP: 0010:skb_transport_offset include/linux/skbuff.h:2977 [inline] RIP: 0010:perf_trace_net_dev_start_xmit+0x89a/0xce0 include/trace/events/net.h:14 Code: 8b 04 25 28 00 00 00 48 3b 84 24 c0 00 00 00 0f 85 4e 04 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc e8 56 22 01 fd <0f> 0b e9 f6 fc ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 86 f9 ff RSP: 0018:ffffc900002bf700 EFLAGS: 00010293 RAX: ffffffff8485d8ca RBX: 000000000000ffff RCX: ffff888100914280 RDX: 0000000000000000 RSI: 000000000000ffff RDI: 000000000000ffff RBP: ffffc900002bf818 R08: ffffffff8485d5b6 R09: fffffbfff0f8fb5e R10: 0000000000000000 R11: dffffc0000000001 R12: 1ffff110217d8f67 R13: ffff88810bec7b3a R14: dffffc0000000000 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8881f6a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f96cf6d52f0 CR3: 000000012224c000 CR4: 0000000000350ef0 Call Trace: <TASK> [<ffffffff84715e35>] trace_net_dev_start_xmit include/trace/events/net.h:14 [inline] [<ffffffff84715e35>] xmit_one net/core/dev.c:3643 [inline] [<ffffffff84715e35>] dev_hard_start_xmit+0x705/0x980 net/core/dev.c:3660 [<ffffffff8471a232>] __dev_queue_xmit+0x16b2/0x3370 net/core/dev.c:4324 [<ffffffff85416493>] dev_queue_xmit include/linux/netdevice.h:3030 [inline] [<ffffffff85416493>] batadv_send_skb_packet+0x3f3/0x680 net/batman-adv/send.c:108 [<ffffffff85416744>] batadv_send_broadcast_skb+0x24/0x30 net/batman-adv/send.c:127 [<ffffffff853bc52a>] batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:393 [inline] [<ffffffff853bc52a>] batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:421 [inline] [<ffffffff853bc52a>] batadv_iv_send_outstanding_bat_ogm_packet+0x69a/0x840 net/batman-adv/bat_iv_ogm.c:1701 [<ffffffff8151023c>] process_one_work+0x8ac/0x1170 kernel/workqueue.c:2289 [<ffffffff81511938>] worker_thread+0xaa8/0x12d0 kernel/workqueue.c:2436 Fixes: 66e4c8d95008 ("net: warn if transport header was not set") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-03	net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge mode	Vladimir Oltean	1	-3/+6
	There was a regression introduced by the blamed commit, where pinging to a VLAN-unaware bridge would fail with the repeated message "Couldn't decode source port" coming from the tagging protocol driver. When receiving packets with a bridge_vid as determined by dsa_tag_8021q_bridge_join(), dsa_8021q_rcv() will decode: - source_port = 0 (which isn't really valid, more like "don't know") - switch_id = 0 (which isn't really valid, more like "don't know") - vbid = value in range 1-7 Since the blamed patch has reversed the order of the checks, we are now going to believe that source_port != -1 and switch_id != -1, so they're valid, but they aren't. The minimal solution to the problem is to only populate source_port and switch_id with what dsa_8021q_rcv() came up with, if the vbid is zero, i.e. the source port information is trustworthy. Fixes: c1ae02d87689 ("net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-03	net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode	Vladimir Oltean	1	-2/+3
	According to the synchronization rules for .ndo_get_stats() as seen in Documentation/networking/netdevices.rst, acquiring a plain spin_lock() should not be illegal, but the bridge driver implementation makes it so. After running these commands, I am being faced with the following lockdep splat: $ ip link add link swp0 name macsec0 type macsec encrypt on && ip link set swp0 up $ ip link add dev br0 type bridge vlan_filtering 1 && ip link set br0 up $ ip link set macsec0 master br0 && ip link set macsec0 up ======================================================== WARNING: possible irq lock inversion dependency detected 6.4.0-04295-g31b577b4bd4a #603 Not tainted -------------------------------------------------------- swapper/1/0 just changed the state of lock: ffff6bd348724cd8 (&br->lock){+.-.}-{3:3}, at: br_forward_delay_timer_expired+0x34/0x198 but this lock took another, SOFTIRQ-unsafe lock in the past: (&ocelot->stats_lock){+.+.}-{3:3} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Chain exists of: &br->lock --> &br->hash_lock --> &ocelot->stats_lock Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ocelot->stats_lock); local_irq_disable(); lock(&br->lock); lock(&br->hash_lock); <Interrupt> lock(&br->lock); * DEADLOCK * (details about the 3 locks skipped) swp0 is instantiated by drivers/net/dsa/ocelot/felix.c, and this only matters to the extent that its .ndo_get_stats64() method calls spin_lock(&ocelot->stats_lock). Documentation/locking/lockdep-design.rst says: \| A lock is irq-safe means it was ever used in an irq context, while a lock \| is irq-unsafe means it was ever acquired with irq enabled. (...) \| Furthermore, the following usage based lock dependencies are not allowed \| between any two lock-classes:: \| \| <hardirq-safe> -> <hardirq-unsafe> \| <softirq-safe> -> <softirq-unsafe> Lockdep marks br->hash_lock as softirq-safe, because it is sometimes taken in softirq context (for example br_fdb_update() which runs in NET_RX softirq), and when it's not in softirq context it blocks softirqs by using spin_lock_bh(). Lockdep marks ocelot->stats_lock as softirq-unsafe, because it never blocks softirqs from running, and it is never taken from softirq context. So it can always be interrupted by softirqs. There is a call path through which a function that holds br->hash_lock: fdb_add_hw_addr() will call a function that acquires ocelot->stats_lock: ocelot_port_get_stats64(). This can be seen below: ocelot_port_get_stats64+0x3c/0x1e0 felix_get_stats64+0x20/0x38 dsa_slave_get_stats64+0x3c/0x60 dev_get_stats+0x74/0x2c8 rtnl_fill_stats+0x4c/0x150 rtnl_fill_ifinfo+0x5cc/0x7b8 rtmsg_ifinfo_build_skb+0xe4/0x150 rtmsg_ifinfo+0x5c/0xb0 __dev_notify_flags+0x58/0x200 __dev_set_promiscuity+0xa0/0x1f8 dev_set_promiscuity+0x30/0x70 macsec_dev_change_rx_flags+0x68/0x88 __dev_set_promiscuity+0x1a8/0x1f8 __dev_set_rx_mode+0x74/0xa8 dev_uc_add+0x74/0xa0 fdb_add_hw_addr+0x68/0xd8 fdb_add_local+0xc4/0x110 br_fdb_add_local+0x54/0x88 br_add_if+0x338/0x4a0 br_add_slave+0x20/0x38 do_setlink+0x3a4/0xcb8 rtnl_newlink+0x758/0x9d0 rtnetlink_rcv_msg+0x2f0/0x550 netlink_rcv_skb+0x128/0x148 rtnetlink_rcv+0x24/0x38 the plain English explanation for it is: The macsec0 bridge port is created without p->flags & BR_PROMISC, because it is what br_manage_promisc() decides for a VLAN filtering bridge with a single auto port. As part of the br_add_if() procedure, br_fdb_add_local() is called for the MAC address of the device, and this results in a call to dev_uc_add() for macsec0 while the softirq-safe br->hash_lock is taken. Because macsec0 does not have IFF_UNICAST_FLT, dev_uc_add() ends up calling __dev_set_promiscuity() for macsec0, which is propagated by its implementation, macsec_dev_change_rx_flags(), to the lower device: swp0. This triggers the call path: dev_set_promiscuity(swp0) -> rtmsg_ifinfo() -> dev_get_stats() -> ocelot_port_get_stats64() with a calling context that lockdep doesn't like (br->hash_lock held). Normally we don't see this, because even though many drivers that can be bridge ports don't support IFF_UNICAST_FLT, we need a driver that (a) doesn't support IFF_UNICAST_FLT, and (b) it forwards the IFF_PROMISC flag to another driver, and (c) that driver implements ndo_get_stats64() using a softirq-unsafe spinlock. Condition (b) is necessary because the first __dev_set_rx_mode() calls __dev_set_promiscuity() with "bool notify=false", and thus, the rtmsg_ifinfo() code path won't be entered. The same criteria also hold true for DSA switches which don't report IFF_UNICAST_FLT. When the DSA master uses a spin_lock() in its ndo_get_stats64() method, the same lockdep splat can be seen. I think the deadlock possibility is real, even though I didn't reproduce it, and I'm thinking of the following situation to support that claim: fdb_add_hw_addr() runs on a CPU A, in a context with softirqs locally disabled and br->hash_lock held, and may end up attempting to acquire ocelot->stats_lock. In parallel, ocelot->stats_lock is currently held by a thread B (say, ocelot_check_stats_work()), which is interrupted while holding it by a softirq which attempts to lock br->hash_lock. Thread B cannot make progress because br->hash_lock is held by A. Whereas thread A cannot make progress because ocelot->stats_lock is held by B. When taking the issue at face value, the bridge can avoid that problem by simply making the ports promiscuous from a code path with a saner calling context (br->hash_lock not held). A bridge port without IFF_UNICAST_FLT is going to become promiscuous as soon as we call dev_uc_add() on it (which we do unconditionally), so why not be preemptive and make it promiscuous right from the beginning, so as to not be taken by surprise. With this, we've broken the links between code that holds br->hash_lock or br->lock and code that calls into the ndo_change_rx_flags() or ndo_get_stats64() ops of the bridge port. Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-02	Merge tag 'iomap-6.5-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux	Linus Torvalds	2	-2/+1
	Pull iomap updates from Darrick Wong: - Fix a type signature mismatch - Drop Christoph as maintainer * tag 'iomap-6.5-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: drop me [hch] from MAINTAINERS for iomap fs: iomap: Change the type of blocksize from 'int' to 'unsigned int' in iomap_file_buffered_write_punch_delalloc
2023-07-02	Merge tag 'v6.5/vfs.fixes' of ↵	Linus Torvalds	1	-4/+10
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fix from Christian Brauner: "A fix for the backing file work from this cycle. When init_file() failed it would call file_free_rcu() on the file allocated by the caller of init_file(). It naively assumed that the correct cleanup operation would be called depending on whether it is a regular file or a backing file. However, that presupposes that the FMODE_BACKING flag would already be set which it won't be as that is done in the caller of init_file(). Fix that bug by moving the cleanup of the allocated file into the caller where it belongs in the first place. There's no good reason for init_file() to consume resources it didn't allocate. This is a mainline only fix and was reported by syzbot. The fix was validated by syzbot against the provided reproducer" * tag 'v6.5/vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: move cleanup from init_file() into its callers
2023-07-02	Merge tag 'i2c-for-6.5-rc1' of ↵	Linus Torvalds	103	-727/+585
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: - I2C has now a co-maintainer taking care of the host drivers. Welcome Andi Shyti and have fun! - platform remove callback converted to return void in drivers - simplify drivers by using devm_clk_get_enabled() - introduce i2c_get_match_data() to avoid more boilerplate code (especially since the core stopped delivering an i2c_device_id) - and the usual bunch of driver updates * tag 'i2c-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits) i2c: uniphier: Use devm_clk_get_enabled() i2c: uniphier-f: Use devm_clk_get_enabled() i2c: owl: Use devm_clk_get_enabled() i2c: lpc2k: Use devm_clk_get_enabled() i2c: hix5hd2: Use devm_clk_get_enabled() i2c: sun6i-p2wi: Use devm_clk_get_enabled() i2c: pasemi-platform: Use devm_clk_get_enabled() i2c: mt7621: Use devm_clk_get_enabled() i2c: xiic: Use devm_clk_get_enabled() i2c: davinci: Use platform table macro over module_alias i2c: ocores: use devm_ managed clks i2c: nomadik: Use dev_err_probe() whenever possible i2c: nomadik: Use devm_clk_get_enabled() i2c: nomadik: Remove unnecessary goto label usb: typec: ucsi: Mark dGPUs as DEVICE scope i2c: wmt: Use devm_platform_get_and_ioremap_resource() i2c: versatile: Use devm_platform_get_and_ioremap_resource() i2c: hix5hd2: Add I2C_M_STOP flag support for i2c-hix5hd2 driver. i2c: mpc: Use of_property_read_reg() to parse "reg" i2c: imx-lpi2c: Don't open-code DIV_ROUND_UP ...
2023-07-02	Merge tag 'parisc-for-6.5-rc1' of ↵	Linus Torvalds	37	-252/+485
	git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - Add missing cacheflush() syscall - Fix STI console on 64-bit-only machines - Move kernel debug options to Kconfig.debug - Lots of warning fixes in arch/parisc/ and drivers/parisc/ when compiled with W=1 - Enable some more graphics drivers in refreshed defconfigs * tag 'parisc-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (29 commits) parisc: Refresh defconfigs parisc: irq: Add irq-related function declarations parisc: Move init function declarations into header file parisc: dino: Make dino_init() returning void parisc: lba_pci: Mark two variables __maybe_unused parisc: unaligned: Include header file to avoid missing prototype warnings parisc: signal: Mark do_notify_resume() and sys_rt_sigreturn() asmlinkage parisc: unwind: Mark start and stop variables __maybe_unused parisc: init: Drop unused variable end_paddr parisc: traps: Mark functions static parisc: processor: Fix kdoc for init_cpu_profiler() parisc: sys_parisc: parisc_personality() is called from asm code parisc: ccio-dma: Fix kdoc and compiler warnings parisc: pdc_stable: Fix kdoc and compiler warnings parisc: pci-dma: Make pcxl_alloc_range() static parisc: Mark image_size __maybe_unused in perf_write() parisc: module: Mark symindex __maybe_unused parisc: pdc_chassis: Fix kdoc warnings parisc: firmware: Fix kdoc warnings parisc: drivers: Fix kdoc warnings ...
2023-07-02	xfs: fix the calculation for "end" and "length"	Shiyang Ruan	1	-4/+5
	The value of "end" should be "start + length - 1". Signed-off-by: Shiyang Ruan <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2023-07-02	xfs: fix xfs_btree_query_range callers to initialize btree rec fully	Darrick J. Wong	3	-20/+13
	Use struct initializers to ensure that the xfs_btree_irecs passed into the query_range function are completely initialized. No functional changes, just closing some sloppy hygiene. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2023-07-02	xfs: validate fsmap offsets specified in the query keys	Darrick J. Wong	1	-11/+19
	Improve the validation of the fsmap offset fields in the query keys and move the validation to the top of the function now that we have pushed the low key adjustment code downwards. Also fix some indenting issues that aren't worth a separate patch. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2023-07-02	xfs: fix logdev fsmap query result filtering	Darrick J. Wong	1	-22/+8
	The external log device fsmap backend doesn't have an rmapbt to query, so it's wasteful to spend time initializing the rmap_irec objects. Worse yet, the log could (someday) be longer than 2^32 fsblocks, so using the rmap irec structure will result in integer overflows. Fix this mess by computing the start address that we want from keys[0] directly, and use the daddr-based record filtering algorithm that we also use for rtbitmap queries. Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2023-07-02	xfs: clean up the rtbitmap fsmap backend	Darrick J. Wong	2	-53/+34
	The rtbitmap fsmap backend doesn't query the rmapbt, so it's wasteful to spend time initializing the rmap_irec objects. Worse yet, the logic to query the rtbitmap is spread across three separate functions, which is unnecessarily difficult to follow. Compute the start rtextent that we want from keys[0] directly and combine the functions to avoid passing parameters around everywhere, and consolidate all the logic into a single function. At one point many years ago I intended to use __xfs_getfsmap_rtdev as the launching point for realtime rmapbt queries, but this hasn't been the case for a long time. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2023-07-02	xfs: fix getfsmap reporting past the last rt extent	Darrick J. Wong	1	-1/+1
	The realtime section ends at the last rt extent. If the user configures the rt geometry with an extent size that is not an integer factor of the number of rt blocks, it's possible for there to be rt blocks past the end of the last rt extent. These tail blocks cannot ever be allocated and will cause corruption reports if the last extent coincides with the end of an rt bitmap block, so do not report consider them for the GETFSMAP output. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2023-07-02	xfs: fix integer overflows in the fsmap rtbitmap and logdev backends	Darrick J. Wong	1	-26/+64
	It's not correct to use the rmap irec structure to hold query key information to query the rtbitmap because the realtime volume can be longer than 2^32 fsblocks in length. Because the rt volume doesn't have allocation groups, introduce a daddr-based record filtering algorithm and compute the rtextent values using 64-bit variables. The same problem exists in the external log device fsmap implementation, so use the same solution to fix it too. After this patch, all the code that touches info->low and info->high under xfs_getfsmap_logdev and __xfs_getfsmap_rtdev are unnecessary. Cleaning this up will be done in subsequent patches. Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2023-07-02	xfs: fix interval filtering in multi-step fsmap queries	Darrick J. Wong	1	-19/+48
	I noticed a bug in ranged GETFSMAP queries: # xfs_io -c 'fsmap -vvvv' /opt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 8:80 [0..7]: static fs metadata 0 (0..7) 8 <snip> 9: 8:80 [192..223]: 137 0..31 0 (192..223) 32 # xfs_io -c 'fsmap -vvvv -d 208 208' /opt # That's not right -- we asked what block maps block 208, and we should've received a mapping for inode 137 offset 16. Instead, we get nothing. The root cause of this problem is a mis-interaction between the fsmap code and how btree ranged queries work. xfs_btree_query_range returns any btree record that overlaps with the query interval, even if the record starts before or ends after the interval. Similarly, GETFSMAP is supposed to return a recordset containing all records that overlap the range queried. However, it's possible that the recordset is larger than the buffer that the caller provided to convey mappings to userspace. In /that/ case, userspace is supposed to copy the last record returned to fmh_keys[0] and call GETFSMAP again. In this case, we do not want to return mappings that we have already supplied to the caller. The call to xfs_btree_query_range is the same, but now we ignore any records that start before fmh_keys[0]. Unfortunately, we didn't implement the filtering predicate correctly. The predicate should only be called when we're calling back for more records. Accomplish this by setting info->low.rm_blockcount to a nonzero value and ensuring that it is cleared as necessary. As a result, we no longer want to adjust dkeys[0] in the main setup function because that's confusing. This patch doesn't touch the logdev/rtbitmap backends because they have bigger problems that will be addressed by subsequent patches. Found via xfs/556 with parent pointers enabled. Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2023-07-02	Merge branch 'octeontx2-af-fixes'	David S. Miller	8	-9/+99
	Hariprasad Kelam says: ==================== octeontx2-af: MAC block fixes for CN10KB This patch set contains fixes for the issues encountered in testing CN10KB MAC block RPM_USX. Patch1: firmware to kernel communication is not working due to wrong interrupt configuration. CSR addresses are corrected. Patch2: NIX to RVU PF mapping errors encountered due to wrong firmware config. Corrects this mapping error. Patch3: Driver is trying to access non exist cgx/lmac which is resulting in kernel panic. Address this issue by adding proper checks. Patch4: MAC features are not getting reset on FLR. Fix the issue by resetting the stale config. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-07-02	octeontx2-af: Reset MAC features in FLR	Hariprasad Kelam	8	-6/+77
	AF driver configures MAC features like internal loopback and PFC upon receiving the request from PF and its VF netdev. But these features are not getting reset in FLR. This patch fixes the issue by resetting the same. Fixes: 23999b30ae67 ("octeontx2-af: Enable or disable CGX internal loopback") Fixes: 1121f6b02e7a ("octeontx2-af: Priority flow control configuration support") Signed-off-by: Hariprasad Kelam <[email protected]> Signed-off-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>