aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-06-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski37-195/+417
Merge in late fixes to prepare for the 6.5 net-next PR. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27gup: add warning if some caller would seem to want stack expansionLinus Torvalds1-2/+10
It feels very unlikely that anybody would want to do a GUP in an unmapped area under the stack pointer, but real users sometimes do some really strange things. So add a (temporary) warning for the case where a GUP fails and expanding the stack might have made it work. It's trivial to do the expansion in the caller as part of getting the mm lock in the first place - see __access_remote_vm() for ptrace, for example - it's just that it's unnecessarily painful to do it deep in the guts of the GUP lookup when we might have to drop and re-take the lock. I doubt anybody actually does anything quite this strange, but let's be proactive: adding these warnings is simple, and will make debugging it much easier if they trigger. Signed-off-by: Linus Torvalds <[email protected]>
2023-06-27mm: always expand the stack with the mmap write lock heldLinus Torvalds17-116/+169
This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again. For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that. It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures. As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended. Tested-by: Vegard Nossum <[email protected]> Tested-by: John Paul Adrian Glaubitz <[email protected]> # ia64 Tested-by: Frank Scheiner <[email protected]> # ia64 Signed-off-by: Linus Torvalds <[email protected]>
2023-06-27netlink: Add __sock_i_ino() for __netlink_diag_dump().Kuniyuki Iwashima3-4/+16
syzbot reported a warning in __local_bh_enable_ip(). [0] Commit 8d61f926d420 ("netlink: fix potential deadlock in netlink_set_err()") converted read_lock(&nl_table_lock) to read_lock_irqsave() in __netlink_diag_dump() to prevent a deadlock. However, __netlink_diag_dump() calls sock_i_ino() that uses read_lock_bh() and read_unlock_bh(). If CONFIG_TRACE_IRQFLAGS=y, read_unlock_bh() finally enables IRQ even though it should stay disabled until the following read_unlock_irqrestore(). Using read_lock() in sock_i_ino() would trigger a lockdep splat in another place that was fixed in commit f064af1e500a ("net: fix a lockdep splat"), so let's add __sock_i_ino() that would be safe to use under BH disabled. [0]: WARNING: CPU: 0 PID: 5012 at kernel/softirq.c:376 __local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376 Modules linked in: CPU: 0 PID: 5012 Comm: syz-executor487 Not tainted 6.4.0-rc7-syzkaller-00202-g6f68fc395f49 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 RIP: 0010:__local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376 Code: 45 bf 01 00 00 00 e8 91 5b 0a 00 e8 3c 15 3d 00 fb 65 8b 05 ec e9 b5 7e 85 c0 74 58 5b 5d c3 65 8b 05 b2 b6 b4 7e 85 c0 75 a2 <0f> 0b eb 9e e8 89 15 3d 00 eb 9f 48 89 ef e8 6f 49 18 00 eb a8 0f RSP: 0018:ffffc90003a1f3d0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000201 RCX: 1ffffffff1cf5996 RDX: 0000000000000000 RSI: 0000000000000201 RDI: ffffffff8805c6f3 RBP: ffffffff8805c6f3 R08: 0000000000000001 R09: ffff8880152b03a3 R10: ffffed1002a56074 R11: 0000000000000005 R12: 00000000000073e4 R13: dffffc0000000000 R14: 0000000000000002 R15: 0000000000000000 FS: 0000555556726300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000045ad50 CR3: 000000007c646000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sock_i_ino+0x83/0xa0 net/core/sock.c:2559 __netlink_diag_dump+0x45c/0x790 net/netlink/diag.c:171 netlink_diag_dump+0xd6/0x230 net/netlink/diag.c:207 netlink_dump+0x570/0xc50 net/netlink/af_netlink.c:2269 __netlink_dump_start+0x64b/0x910 net/netlink/af_netlink.c:2374 netlink_dump_start include/linux/netlink.h:329 [inline] netlink_diag_handler_dump+0x1ae/0x250 net/netlink/diag.c:238 __sock_diag_cmd net/core/sock_diag.c:238 [inline] sock_diag_rcv_msg+0x31e/0x440 net/core/sock_diag.c:269 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2547 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0xde/0x190 net/socket.c:747 ____sys_sendmsg+0x71c/0x900 net/socket.c:2503 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2557 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2586 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f5303aaabb9 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc7506e548 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5303aaabb9 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00007f5303a6ed60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5303a6edf0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Fixes: 8d61f926d420 ("netlink: fix potential deadlock in netlink_set_err()") Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?extid=5da61cf6a9bc1902d422 Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addressesVladimir Oltean5-31/+74
When using the felix driver (the only one which supports UC filtering and MC filtering) as a DSA master for a random other DSA switch, one can see the following stack trace when the downstream switch ports join a VLAN-aware bridge: ============================= WARNING: suspicious RCU usage ----------------------------- net/8021q/vlan_core.c:238 suspicious rcu_dereference_protected() usage! stack backtrace: Workqueue: dsa_ordered dsa_slave_switchdev_event_work Call trace: lockdep_rcu_suspicious+0x170/0x210 vlan_for_each+0x8c/0x188 dsa_slave_sync_uc+0x128/0x178 __hw_addr_sync_dev+0x138/0x158 dsa_slave_set_rx_mode+0x58/0x70 __dev_set_rx_mode+0x88/0xa8 dev_uc_add+0x74/0xa0 dsa_port_bridge_host_fdb_add+0xec/0x180 dsa_slave_switchdev_event_work+0x7c/0x1c8 process_one_work+0x290/0x568 What it's saying is that vlan_for_each() expects rtnl_lock() context and it's not getting it, when it's called from the DSA master's ndo_set_rx_mode(). The caller of that - dsa_slave_set_rx_mode() - is the slave DSA interface's dsa_port_bridge_host_fdb_add() which comes from the deferred dsa_slave_switchdev_event_work(). We went to great lengths to avoid the rtnl_lock() context in that call path in commit 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work"), and calling rtnl_lock() is simply not an option due to the possibility of deadlocking when calling dsa_flush_workqueue() from the call paths that do hold rtnl_lock() - basically all of them. So, when the DSA master calls vlan_for_each() from its ndo_set_rx_mode(), the state of the 8021q driver on this device is really not protected from concurrent access by anything. Looking at net/8021q/, I don't think that vlan_info->vid_list was particularly designed with RCU traversal in mind, so introducing an RCU read-side form of vlan_for_each() - vlan_for_each_rcu() - won't be so easy, and it also wouldn't be exactly what we need anyway. In general I believe that the solution isn't in net/8021q/ anyway; vlan_for_each() is not cut out for this task. DSA doesn't need rtnl_lock() to be held per se - since it's not a netdev state change that we're blocking, but rather, just concurrent additions/removals to a VLAN list. We don't even need sleepable context - the callback of vlan_for_each() just schedules deferred work. The proposed escape is to remove the dependency on vlan_for_each() and to open-code a non-sleepable, rtnl-free alternative to that, based on copies of the VLAN list modified from .ndo_vlan_rx_add_vid() and .ndo_vlan_rx_kill_vid(). Fixes: 64fdc5f341db ("net: dsa: sync unicast and multicast addresses for VLAN filters too") Signed-off-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27Revert "af_unix: Call scm_recv() only after scm_set_cred()."Kuniyuki Iwashima1-1/+1
This reverts commit 3f5f118bb657f94641ea383c7c1b8c09a5d46ea2. Konrad reported that desktop environment below cannot be reached after commit 3f5f118bb657 ("af_unix: Call scm_recv() only after scm_set_cred().") - postmarketOS (Alpine Linux w/ musl 1.2.4) - busybox 1.36.1 - GNOME 44.1 - networkmanager 1.42.6 - openrc 0.47 Regarding to the warning of SO_PASSPIDFD, I'll post another patch to suppress it by skipping SCM_PIDFD if scm->pid == NULL in scm_pidfd_recv(). Reported-by: Konrad Dybcio <[email protected]> Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Tested-by: Ido Schimmel <[email protected]> Tested-by: Gal Pressman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-28kbuild: revive "Entering directory" for Make >= 4.4.1Masahiro Yamada1-16/+20
With commit 9da0763bdd82 ("kbuild: Use relative path when building in a subdir of the source tree"), compiler messages in out-of-tree builds include relative paths, which are relative to the build directory, not the directory where make was started. To help IDEs/editors find the source files, Kbuild lets GNU Make print "Entering directory ..." when it changes the working directory. It has been working fine for a long time, but David reported it is broken with the latest GNU Make. The behavior was changed by GNU Make commit 8f9e7722ff0f ("[SV 63537] Fix setting -w in makefiles"). Previously, setting --no-print-directory to MAKEFLAGS only affected child makes, but it is now interpreted in the current make as soon as it is set. [test code] $ cat /tmp/Makefile ifneq ($(SUBMAKE),1) MAKEFLAGS += --no-print-directory all: ; $(MAKE) SUBMAKE=1 else all: ; : endif [before 8f9e7722ff0f] $ make -C /tmp make: Entering directory '/tmp' make SUBMAKE=1 : make: Leaving directory '/tmp' [after 8f9e7722ff0f] $ make -C /tmp make SUBMAKE=1 : Previously, the effect of --no-print-directory was delayed until Kbuild started the directory descending, but it is no longer true with GNU Make 4.4.1. This commit adds one more recursion to cater to GNU Make >= 4.4.1. When Kbuild needs to change the working directory, __submake will be executed twice. __submake without --no-print-directory --> show "Entering directory ..." __submake with --no-print-directory --> parse the rest of Makefile We end up with one more recursion than needed for GNU Make < 4.4.1, but I do not want to complicate the version check. Reported-by: David Howells <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Masahiro Yamada <[email protected]> Tested-by: Nicolas Schier <[email protected]>
2023-06-28kbuild: set correct abs_srctree and abs_objtree for package buildsMasahiro Yamada1-6/+4
When you run 'make rpm-pkg', the rpmbuild tool builds the kernel in rpmbuild/BUILD, but $(abs_srctree) and $(abs_objtree) point to the directory path where make was started, not the kernel is actually being built. The same applies to 'make snap-pkg'. Fix it. Signed-off-by: Masahiro Yamada <[email protected]>
2023-06-27phylink: ReST-ify the phylink_pcs_neg_mode() kdocJakub Kicinski1-4/+6
Stephen reports warnings when rendering phylink kdocs as HTML: include/linux/phylink.h:110: ERROR: Unexpected indentation. include/linux/phylink.h:111: WARNING: Block quote ends without a blank line; unexpected unindent. include/linux/phylink.h:614: WARNING: Inline literal start-string without end-string. include/linux/phylink.h:644: WARNING: Inline literal start-string without end-string. Make phylink_pcs_neg_mode() use a proper list format to fix the first two warnings. The last two warnings, AFAICT, come from the use of shorthand like phylink_mode_*(). Perhaps those should be special-cased at the Sphinx level. Reported-by: Stephen Rothwell <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27libceph: Partially revert changes to support MSG_SPLICE_PAGESDavid Howells2-39/+107
Fix the mishandling of MSG_DONTWAIT and also reinstates the per-page checking of the source pages (which might have come from a DIO write by userspace) by partially reverting the changes to support MSG_SPLICE_PAGES and doing things a little differently. In messenger_v1: (1) The ceph_tcp_sendpage() is resurrected and the callers reverted to use that. (2) The callers now pass MSG_MORE unconditionally. Previously, they were passing in MSG_MORE|MSG_SENDPAGE_NOTLAST and then degrading that to just MSG_MORE on the last call to ->sendpage(). (3) Make ceph_tcp_sendpage() a wrapper around sendmsg() rather than sendpage(), setting MSG_SPLICE_PAGES if sendpage_ok() returns true on the page. In messenger_v2: (4) Bring back do_try_sendpage() and make the callers use that. (5) Make do_try_sendpage() use sendmsg() for both cases and set MSG_SPLICE_PAGES if sendpage_ok() is set. Fixes: 40a8c17aa770 ("ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage") Fixes: fa094ccae1e7 ("ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()") Reported-by: Ilya Dryomov <[email protected]> Link: https://lore.kernel.org/r/CAOi1vP9vjLfk3W+AJFeexC93jqPaPUn2dD_4NrzxwoZTbYfOnw@mail.gmail.com/ Link: https://lore.kernel.org/r/CAOi1vP_Bn918j24S94MuGyn+Gxk212btw7yWeDrRcW1U8pc_BA@mail.gmail.com/ Signed-off-by: David Howells <[email protected]> cc: Xiubo Li <[email protected]> cc: Jeff Layton <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ # v1 Link: https://lore.kernel.org/r/[email protected]/ # v2 Reviewed-by: Ilya Dryomov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27net: phy: mscc: fix packet loss due to RGMII delaysVladimir Oltean1-2/+2
Two deadly typos break RX and TX traffic on the VSC8502 PHY using RGMII if phy-mode = "rgmii-id" or "rgmii-txid", and no "tx-internal-delay-ps" override exists. The negative error code from phy_get_internal_delay() does not get overridden with the delay deduced from the phy-mode, and later gets committed to hardware. Also, the rx_delay gets overridden by what should have been the tx_delay. Fixes: dbb050d2bfc8 ("phy: mscc: Add support for RGMII delay configuration") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Harini Katakam <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27Merge branch 'use-vmalloc_array-and-vcalloc'Jakub Kicinski6-9/+9
Julia Lawall says: ==================== use vmalloc_array and vcalloc The functions vmalloc_array and vcalloc were introduced in commit a8749a35c399 ("mm: vmalloc: introduce array allocation functions") but are not used much yet. This series introduces uses of these functions, to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch. @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) v2: This series uses vmalloc_array and vcalloc instead of array_size. It also leaves a multiplication of a constant by a sizeof as is. Two patches are thus dropped from the series. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27net: mana: use vmalloc_array and vcallocJulia Lawall1-1/+1
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27net: enetc: use vmalloc_array and vcallocJulia Lawall1-2/+2
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27ionic: use vmalloc_array and vcallocJulia Lawall1-2/+2
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27pds_core: use vmalloc_array and vcallocJulia Lawall1-2/+2
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27gve: use vmalloc_array and vcallocJulia Lawall1-1/+1
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27octeon_ep: use vmalloc_array and vcallocJulia Lawall1-1/+1
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-27nfsd: Fix creation time serialization orderTavian Barnes1-5/+5
In nfsd4_encode_fattr(), TIME_CREATE was being written out after all other times. However, they should be written out in an order that matches the bit flags in bmval1, which in this case are #define FATTR4_WORD1_TIME_ACCESS (1UL << 15) #define FATTR4_WORD1_TIME_CREATE (1UL << 18) #define FATTR4_WORD1_TIME_DELTA (1UL << 19) #define FATTR4_WORD1_TIME_METADATA (1UL << 20) #define FATTR4_WORD1_TIME_MODIFY (1UL << 21) so TIME_CREATE should come second. I noticed this on a FreeBSD NFSv4.2 client, which supports creation times. On this client, file times were weirdly permuted. With this patch applied on the server, times looked normal on the client. Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute") Link: https://unix.stackexchange.com/q/749605/56202 Signed-off-by: Tavian Barnes <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2023-06-27nvme-mpath: fix I/O failure with EAGAIN when failing over I/OSagi Grimberg1-0/+8
It is possible that the next available path we failover to, happens to be frozen (for example if it is during connection establishment). If the original I/O was set with NOWAIT, this cause the I/O to unnecessarily fail because the request queue cannot be entered, hence the I/O fails with EAGAIN. The NOWAIT restriction that was originally set for the I/O is no longer relevant or needed because this is the nvme requeue context. Hence we clear the REQ_NOWAIT flag when failing over I/O. This fix a simple test case of nvme controller reset during I/O when the multipath device that has only a single path and I/O fails with "Resource temporarily unavailable" errno. Note that this reproduces with io_uring which by default sets IOCB_NOWAIT by default. Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-06-27nvme: host: fix command name spellingDamien Le Moal1-1/+1
Correctly spell "Zeroes" in nvme_cmd_write_zeroes command name. Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-06-27ACPI: EC: Fix acpi_ec_dispatch_gpe()Rafael J. Wysocki1-6/+12
Commit 896e97bf99ec ("ACPI: EC: Clear GPE on interrupt handling only") broke suspend-to-idle at least on Dell XPS13 9360 and 9380. The problem is that acpi_ec_dispatch_gpe() must clear the EC GPE, because the EC GPE handler never runs when the system is in the suspend-to-idle state and if the EC GPE is not cleared by the suspend- to-idle loop, it is never cleared at all which leads to a GPE storm. This causes suspend-to-idle to burn energy instead of saving it which is potentially dangerous (the affected machines heat up rather badly when that happens). Addess this by making acpi_ec_dispatch_gpe() clear the EC GPE as it did before. Fixes: 896e97bf99ec ("ACPI: EC: Clear GPE on interrupt handling only") Tested-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-06-27vdpa/mlx5: Support interrupt bypassingEli Cohen2-9/+171
Add support for generation of interrupts from the device directly to the VM to the VCPU thus avoiding the overhead on the host CPU. When supported, the driver will attempt to allocate vectors for each data virtqueue. If a vector for a virtqueue cannot be provided it will use the QP mode where notifications go through the driver. In addition, we add a shutdown callback to make sure allocated interrupts are released in case of shutdown to allow clean shutdown. Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27virtio: Add missing documentation for structure fieldsSimon Horman1-1/+4
Add missing documentation for the vqs_list_lock field of struct virtio_device, and the validate field of struct virtio_driver. ./scripts/kernel-doc says: .../virtio.h:131: warning: Function parameter or member 'vqs_list_lock' not described in 'virtio_device' .../virtio.h:192: warning: Function parameter or member 'validate' not described in 'virtio_driver' 2 warnings as Errors No functional changes intended. Signed-off-by: Simon Horman <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]>
2023-06-27pds_vdpa: pds_vdps.rst and KconfigShannon Nelson4-0/+100
Add the documentation and Kconfig entry for pds_vdpa driver. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27pds_vdpa: subscribe to the pds_core eventsShannon Nelson2-1/+59
Register for the pds_core's notification events, primarily to find out when the FW has been reset so we can pass this on back up the chain. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27pds_vdpa: add support for vdpa and vdpamgmt interfacesShannon Nelson6-2/+892
This is the vDPA device support, where we advertise that we can support the virtio queues and deal with the configuration work through the pds_core's adminq. Signed-off-by: Shannon Nelson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2023-06-27pds_vdpa: add vdpa config client commandsShannon Nelson4-1/+236
These are the adminq commands that will be needed for setting up and using the vDPA device. There are a number of commands defined in the FW's API, but by making use of the FW's virtio BAR we only need a few of these commands for vDPA support. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27pds_vdpa: virtio bar setup for vdpaShannon Nelson2-0/+28
Prep and use the "modern" virtio bar utilities to get our virtio config space ready. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27pds_vdpa: get vdpa management infoShannon Nelson6-1/+150
Find the vDPA management information from the DSC in order to advertise it to the vdpa subsystem. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27pds_vdpa: new adminq entriesShannon Nelson1-0/+226
Add new adminq definitions in support for vDPA operations. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27pds_vdpa: move enum from common to adminq headerShannon Nelson2-21/+21
The pds_core_logical_qtype enum and IFNAMSIZ are not needed in the common PDS header, only needed when working with the adminq, so move them to the adminq header. Note: This patch might conflict with pds_vfio patches that are in review, depending on which patchset gets pulled first. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27pds_vdpa: Add new vDPA driver for AMD/Pensando DSCShannon Nelson7-0/+146
This is the initial auxiliary driver framework for a new vDPA device driver, an auxiliary_bus client of the pds_core driver. The pds_core driver supplies the PCI services for the VF device and for accessing the adminq in the PF device. This patch adds the very basics of registering for the auxiliary device and setting up debugfs entries. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27virtio: allow caller to override device DMA mask in vp_modernShannon Nelson2-1/+5
To add a bit of vendor flexibility with various virtio based devices, allow the caller to specify a different DMA mask. This adds a dma_mask field to struct virtio_pci_modern_device. If defined by the driver, this mask will be used in a call to dma_set_mask_and_coherent() instead of the traditional DMA_BIT_MASK(64). This allows limiting the DMA space on vendor devices with address limitations. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27virtio: allow caller to override device id in vp_modernShannon Nelson2-11/+22
To add a bit of vendor flexibility with various virtio based devices, allow the caller to check for a different device id. This adds a function pointer field to struct virtio_pci_modern_device to specify an override device id check. If defined by the driver, this function will be called to check that the PCI device is the vendor's expected device, and will return the found device id to be stored in mdev->id.device. This allows vendors with alternative vendor device ids to use this library on their own device BAR. Note: A lot of the diff in this is simply indenting the existing code into an else block. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27virtio_pci: Optimize virtio_pci_device structure sizeFeng Liu1-3/+4
Improve the size of the virtio_pci_device structure, which is commonly used to represent a virtio PCI device. A given virtio PCI device can either of legacy type or modern type, with the struct virtio_pci_legacy_device occupying 32 bytes and the struct virtio_pci_modern_device occupying 88 bytes. Make them a union, thereby save 32 bytes of memory as shown by the pahole tool. This improvement is particularly beneficial when dealing with numerous devices, as it helps conserve memory resources. Before the modification, pahole tool reported the following: struct virtio_pci_device { [...] struct virtio_pci_legacy_device ldev; /* 824 32 */ /* --- cacheline 13 boundary (832 bytes) was 24 bytes ago --- */ struct virtio_pci_modern_device mdev; /* 856 88 */ /* XXX last struct has 4 bytes of padding */ [...] /* size: 1056, cachelines: 17, members: 19 */ [...] }; After the modification, pahole tool reported the following: struct virtio_pci_device { [...] union { struct virtio_pci_legacy_device ldev; /* 824 32 */ struct virtio_pci_modern_device mdev; /* 824 88 */ }; /* 824 88 */ [...] /* size: 1024, cachelines: 16, members: 18 */ [...] }; Signed-off-by: Feng Liu <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Xuan Zhuo <[email protected]> Acked-by: Jason Wang <[email protected]>
2023-06-27tools/virtio: fix build break for aarch64Peng Fan1-1/+12
"-mfunction-return=thunk -mindirect-branch-register" are only valid for x86. So introduce compiler operation check to avoid such issues Fixes: 0d0ed4006127 ("tools/virtio: enable to build with retpoline") Signed-off-by: Peng Fan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27virtio-vdpa: Fix unchecked call to NULL set_vq_affinityDragos Tatulea1-1/+3
The referenced patch calls set_vq_affinity without checking if the op is valid. This patch adds the check. Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading mechanism") Reviewed-by: Gal Pressman <[email protected]> Signed-off-by: Dragos Tatulea <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Feng Liu <[email protected]>
2023-06-27vdpa/snet: implement the resume vDPA callbackAlvaro Karsz3-0/+22
The callback sends a resume command to the DPU through the control mechanism. Signed-off-by: Alvaro Karsz <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2023-06-27vdpa: solidrun: constify pointers to hwmon_channel_infoKrzysztof Kozlowski1-1/+1
Statically allocated array of pointers to hwmon_channel_info can be made const for safety. Acked-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Alvaro Karsz <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2023-06-27vDPA/ifcvf: a vendor driver should not set _CONFIG_S_FAILEDZhu Lingshan1-3/+1
VIRTIO_CONFIG_S_FAILED indicates the guest driver has given up the device due to fatal errors. So it is the guest decision, the vendor driver should not set this status to the device. Signed-off-by: Zhu Lingshan <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27vDPA/ifcvf: synchronize irqs in the reset routineZhu Lingshan3-52/+40
This commit synchronize irqs of the virtqueues and config space in the reset routine. Thus ifcvf_stop() and reset() are refactored as well. This commit renames ifcvf_stop_hw() to ifcvf_stop() Signed-off-by: Zhu Lingshan <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27vDPA/ifcvf: retire ifcvf_start_datapath and ifcvf_add_statusZhu Lingshan3-43/+0
Rather than former lazy-initialization mechanism, now the virtqueue operations and driver_features related ops access the virtio registers directly to take immediate actions. So ifcvf_start_datapath() should retire. ifcvf_add_status() is retierd because we should not change device status by a vendor driver's decision, this driver should only set device status which is from virito drivers upon vdpa_ops.set_status() Signed-off-by: Zhu Lingshan <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27vDPA/ifcvf: get_driver_features from virtio registersZhu Lingshan3-23/+29
This commit implements a new function ifcvf_get_driver_feature() which read driver_features from virtio registers. To be less ambiguous, ifcvf_set_features() is renamed to ifcvf_set_driver_features(), and ifcvf_get_features() is renamed to ifcvf_get_dev_features() which returns the provisioned vDPA device features. Signed-off-by: Zhu Lingshan <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27vDPA/ifcvf: virt queue ops take immediate actionsZhu Lingshan3-39/+45
In this commit, virtqueue operations including: set_vq_num(), set_vq_address(), set_vq_ready() and get_vq_ready() access PCI registers directly to take immediate actions. Signed-off-by: Zhu Lingshan <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-06-27dt-bindings: interrupt-controller: add Ralink SoCs interrupt controllerSergio Paracuellos1-0/+54
Add YAML doc for the interrupt controller which is present on Ralink SoCs. Reviewed-by: Conor Dooley <[email protected]> Signed-off-by: Sergio Paracuellos <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-06-27dt-bindings: PCI: dwc: rockchip: Update for RK3588Sebastian Reichel1-3/+13
The PCIe 2.0 controllers on RK3588 need one additional clock, one additional reset line and one for ranges entry. Signed-off-by: Sebastian Reichel <[email protected]> Reviewed-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-06-27net: usb: qmi_wwan: add u-blox 0x1312 compositionDavide Tronchin1-0/+1
Add RmNet support for LARA-R6 01B. The new LARA-R6 product variant identified by the "01B" string can be configured (by AT interface) in three different USB modes: * Default mode (Vendor ID: 0x1546 Product ID: 0x1311) with 4 serial interfaces * RmNet mode (Vendor ID: 0x1546 Product ID: 0x1312) with 4 serial interfaces and 1 RmNet virtual network interface * CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1313) with 4 serial interface and 1 CDC-ECM virtual network interface The first 4 interfaces of all the 3 configurations (default, RmNet, ECM) are the same. In RmNet mode LARA-R6 01B exposes the following interfaces: If 0: Diagnostic If 1: AT parser If 2: AT parser If 3: AT parset/alternative functions If 4: RMNET interface Signed-off-by: Davide Tronchin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-06-27perf trace: fix MSG_SPLICE_PAGES build errorMatthieu Baerts1-0/+3
Our MPTCP CI and Stephen got this error: In file included from builtin-trace.c:907: trace/beauty/msg_flags.c: In function 'syscall_arg__scnprintf_msg_flags': trace/beauty/msg_flags.c:28:21: error: 'MSG_SPLICE_PAGES' undeclared (first use in this function) 28 | if (flags & MSG_##n) { | ^~~~ trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG' 50 | P_MSG_FLAG(SPLICE_PAGES); | ^~~~~~~~~~ trace/beauty/msg_flags.c:28:21: note: each undeclared identifier is reported only once for each function it appears in 28 | if (flags & MSG_##n) { | ^~~~ trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG' 50 | P_MSG_FLAG(SPLICE_PAGES); | ^~~~~~~~~~ The fix is similar to what was done with MSG_FASTOPEN: the new macro is defined if it is not defined in the system headers. Fixes: b848b26c6672 ("net: Kill MSG_SENDPAGE_NOTLAST") Reported-by: Stephen Rothwell <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Matthieu Baerts <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-06-27Merge tag 'nf-23-06-27' of ↵Paolo Abeni5-8/+60
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Reset shift on Boyer-Moore string match for each block, from Jeremy Sowden. 2) Fix acccess to non-linear area in DCCP conntrack helper, from Florian Westphal. 3) Fix kernel-doc warnings, by Randy Dunlap. 4) Bail out if expires= does not show in SIP helper message, or make ct_sip_parse_numerical_param() tristate and report error if expires= cannot be parsed. 5) Unbind non-anonymous set in case rule construction fails. 6) Fix underflow in chain reference counter in case set element already exists or it cannot be created. netfilter pull request 23-06-27 ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>