aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2024-10-03NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()Yanjun Zhang1-0/+1
On the node of an NFS client, some files saved in the mountpoint of the NFS server were copied to another location of the same NFS server. Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added by the field copies in handle_async_copy(), and we found a waiting copy process with the stack as: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference was due to nfs42_complete_copies() listed the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state(). When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED and copies are not deleted in nfs_server->ss_copies, the source state may be passed to the nfs42_complete_copies() process earlier, resulting in this crash scene finally. To solve this issue, we add a list_head nfs_server->ss_src_copies for a server-to-server copy specially. Fixes: 0e65a32c8a56 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-10-03Merge tag 'net-6.12-rc2' of ↵Linus Torvalds4-3/+23
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from ieee802154, bluetooth and netfilter. Current release - regressions: - eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc - eth: am65-cpsw: fix forever loop in cleanup code Current release - new code bugs: - eth: mlx5: HWS, fixed double-free in error flow of creating SQ Previous releases - regressions: - core: avoid potential underflow in qdisc_pkt_len_init() with UFO - core: test for not too small csum_start in virtio_net_hdr_to_skb() - vrf: revert "vrf: remove unnecessary RCU-bh critical section" - bluetooth: - fix uaf in l2cap_connect - fix possible crash on mgmt_index_removed - dsa: improve shutdown sequence - eth: mlx5e: SHAMPO, fix overflow of hd_per_wq - eth: ip_gre: fix drops of small packets in ipgre_xmit Previous releases - always broken: - core: fix gso_features_check to check for both dev->gso_{ipv4_,}max_size - core: fix tcp fraglist segmentation after pull from frag_list - netfilter: nf_tables: prevent nf_skb_duplicated corruption - sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start - mac802154: fix potential RCU dereference issue in mac802154_scan_worker - eth: fec: restart PPS after link state change" * tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits) sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems doc: net: napi: Update documentation for napi_schedule_irqoff net/ncsi: Disable the ncsi work before freeing the associated structure net: phy: qt2025: Fix warning: unused import DeviceId gso: fix udp gso fraglist segmentation after pull from frag_list bridge: mcast: Fail MDB get request on empty entry vrf: revert "vrf: Remove unnecessary RCU-bh critical section" net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code net: phy: realtek: Check the index value in led_hw_control_get ppp: do not assume bh is held in ppp_channel_bridge_input() selftests: rds: move include.sh to TEST_FILES net: test for not too small csum_start in virtio_net_hdr_to_skb() net: gso: fix tcp fraglist segmentation after pull from frag_list ipv4: ip_gre: Fix drops of small packets in ipgre_xmit net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check net: add more sanity checks to qdisc_pkt_len_init() net: avoid potential underflow in qdisc_pkt_len_init() with UFO net: ethernet: ti: cpsw_ale: Fix warning on some platforms net: microchip: Make FDMA config symbol invisible ...
2024-10-03Merge tag 'vfs-6.12-rc2.fixes.2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "vfs: - Ensure that iter_folioq_get_pages() advances to the next slot otherwise it will end up using the same folio with an out-of-bound offset. iomap: - Dont unshare delalloc extents which can't be reflinked, and thus can't be shared. - Constrain the file range passed to iomap_file_unshare() directly in iomap instead of requiring the callers to do it. netfs: - Use folioq_count instead of folioq_nr_slot to prevent an unitialized value warning in netfs_clear_buffer(). - Fix missing wakeup after issuing writes by scheduling the write collector only if all the subrequest queues are empty and thus no writes are pending. - Fix two minor documentation bugs" * tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: constrain the file range passed to iomap_file_unshare iomap: don't bother unsharing delalloc extents netfs: Fix missing wakeup after issuing writes Documentation: add missing folio_queue entry folio_queue: fix documentation netfs: Fix a KMSAN uninit-value error in netfs_clear_buffer iov_iter: fix advancing slot in iter_folioq_get_pages()
2024-10-03Merge tag 'nf-24-10-02' of ↵Paolo Abeni1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix incorrect documentation in uapi/linux/netfilter/nf_tables.h regarding flowtable hooks, from Phil Sutter. 2) Fix nft_audit.sh selftests with newer nft binaries, due to different (valid) audit output, also from Phil. 3) Disable BH when duplicating packets via nf_dup infrastructure, otherwise race on nf_skb_duplicated for locally generated traffic. From Eric. 4) Missing return in callback of selftest C program, from zhang jiao. netfilter pull request 24-10-02 * tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: Add missing return value netfilter: nf_tables: prevent nf_skb_duplicated corruption selftests: netfilter: Fix nft_audit.sh for newer nft binaries netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED ==================== Link: https://patch.msgid.link/20241002202421.1281311-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-02net: test for not too small csum_start in virtio_net_hdr_to_skb()Eric Dumazet1-1/+3
syzbot was able to trigger this warning [1], after injecting a malicious packet through af_packet, setting skb->csum_start and thus the transport header to an incorrect value. We can at least make sure the transport header is after the end of the network header (with a estimated minimal size). [1] [ 67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0 mac=(-1,-1) mac_len=0 net=(16,-6) trans=10 shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0)) csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0 priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) [ 67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9 [ 67.877764] sk family=17 type=3 proto=0 [ 67.878279] skb linear: 00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00 [ 67.879128] skb frag: 00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02 [ 67.879877] skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.880647] skb frag: 00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00 [ 67.881156] skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.881753] skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882173] skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882790] skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883171] skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883733] skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.884206] skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e [ 67.884704] skb frag: 000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00 [ 67.885139] skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.885677] skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886042] skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886408] skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887020] skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887384] skb frag: 00000100: 00 00 [ 67.887878] ------------[ cut here ]------------ [ 67.887908] offset (-6) >= skb_headlen() (14) [ 67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs [ 67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme #1011 [ 67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891043] Call Trace: [ 67.891173] <TASK> [ 67.891274] ? __warn (kernel/panic.c:741) [ 67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 67.891348] ? handle_bug (arch/x86/kernel/traps.c:239) [ 67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) [ 67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1)) [ 67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113) [ 67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200) [ 67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13)) [ 67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1)) [ 67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670) [ 67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227) [ 67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604) [ 67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25)) [ 67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1)) [ 67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4)) [ 67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657) [ 67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3)) [ 67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1)) [ 67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4)) [ 67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1)) [ 67.891734] ? do_sock_setsockopt (net/socket.c:2335) [ 67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355) [ 67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1)) [ 67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fixes: 9181d6f8a2bb ("net: add more sanity check in virtio_net_hdr_to_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02Merge tag 'mlx5-fixes-2024-09-25' of ↵Jakub Kicinski1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2024-09-25 * tag 'mlx5-fixes-2024-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix crash caused by calling __xfrm_state_delete() twice net/mlx5e: SHAMPO, Fix overflow of hd_per_wq net/mlx5: HWS, changed E2BIG error to a negative return code net/mlx5: HWS, fixed double-free in error flow of creating SQ net/mlx5: Fix wrong reserved field in hca_cap_2 in mlx5_ifc net/mlx5e: Fix NULL deref in mlx5e_tir_builder_alloc() net/mlx5: Added cond_resched() to crdump collection net/mlx5: Fix error path in multi-packet WQE transmit ==================== Link: https://patch.msgid.link/20240925202013.45374-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02Merge tag 'pull-work.unaligned' of ↵Linus Torvalds30-32/+31
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull generic unaligned.h cleanups from Al Viro: "Get rid of architecture-specific <asm/unaligned.h> includes, replacing them with a single generic <linux/unaligned.h> header file. It's the second largest (after asm/io.h) class of asm/* includes, and all but two architectures actually end up using exact same file. Massage the remaining two (arc and parisc) to do the same and just move the thing to from asm-generic/unaligned.h to linux/unaligned.h" [ This is one of those things that we're better off doing outside the merge window, and would only cause extra conflict noise if it was in linux-next for the next release due to all the trivial #include line updates. Rip off the band-aid. - Linus ] * tag 'pull-work.unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: move asm/unaligned.h to linux/unaligned.h arc: get rid of private asm/unaligned.h parisc: get rid of private asm/unaligned.h
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro30-32/+31
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02Merge tag 'asoc-fix-v6.12-rc1' of ↵Takashi Iwai604-5243/+13457
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.12 A bunch of fixes here that came in during the merge window and the first week of release, plus some new quirks and device IDs. There's nothing major here, it's a bit bigger than it might've been due to there being no fixes sent during the merge window due to your vacation.
2024-10-02inotify: Fix possible deadlock in fsnotify_destroy_markLizhi Xu1-7/+3
[Syzbot reported] WARNING: possible circular locking dependency detected 6.11.0-rc4-syzkaller-00019-gb311c1b497e5 #0 Not tainted ------------------------------------------------------ kswapd0/78 is trying to acquire lock: ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 but task is already holding lock: ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6841 [inline] ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xbb4/0x35a0 mm/vmscan.c:7223 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: ... kmem_cache_alloc_noprof+0x3d/0x2a0 mm/slub.c:4044 inotify_new_watch fs/notify/inotify/inotify_user.c:599 [inline] inotify_update_watch fs/notify/inotify/inotify_user.c:647 [inline] __do_sys_inotify_add_watch fs/notify/inotify/inotify_user.c:786 [inline] __se_sys_inotify_add_watch+0x72e/0x1070 fs/notify/inotify/inotify_user.c:729 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&group->mark_mutex){+.+.}-{3:3}: ... __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 fsnotify_destroy_marks+0x14a/0x660 fs/notify/mark.c:934 fsnotify_inoderemove include/linux/fsnotify.h:264 [inline] dentry_unlink_inode+0x2e0/0x430 fs/dcache.c:403 __dentry_kill+0x20d/0x630 fs/dcache.c:610 shrink_kill+0xa9/0x2c0 fs/dcache.c:1055 shrink_dentry_list+0x2c0/0x5b0 fs/dcache.c:1082 prune_dcache_sb+0x10f/0x180 fs/dcache.c:1163 super_cache_scan+0x34f/0x4b0 fs/super.c:221 do_shrink_slab+0x701/0x1160 mm/shrinker.c:435 shrink_slab+0x1093/0x14d0 mm/shrinker.c:662 shrink_one+0x43b/0x850 mm/vmscan.c:4815 shrink_many mm/vmscan.c:4876 [inline] lru_gen_shrink_node mm/vmscan.c:4954 [inline] shrink_node+0x3799/0x3de0 mm/vmscan.c:5934 kswapd_shrink_node mm/vmscan.c:6762 [inline] balance_pgdat mm/vmscan.c:6954 [inline] kswapd+0x1bcd/0x35a0 mm/vmscan.c:7223 [Analysis] The problem is that inotify_new_watch() is using GFP_KERNEL to allocate new watches under group->mark_mutex, however if dentry reclaim races with unlinking of an inode, it can end up dropping the last dentry reference for an unlinked inode resulting in removal of fsnotify mark from reclaim context which wants to acquire group->mark_mutex as well. This scenario shows that all notification groups are in principle prone to this kind of a deadlock (previously, we considered only fanotify and dnotify to be problematic for other reasons) so make sure all allocations under group->mark_mutex happen with GFP_NOFS. Reported-and-tested-by: syzbot+c679f13773f295d2da53@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c679f13773f295d2da53 Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240927143642.2369508-1-lizhi.xu@windriver.com
2024-10-02ALSA: hda: fix trigger_tstamp_latchedJaroslav Kysela1-1/+1
When the trigger_tstamp_latched flag is set, the PCM core code assumes that the low-level driver handles the trigger timestamping itself. Ensure that runtime->trigger_tstamp is always updated. Buglink: https://github.com/alsa-project/alsa-lib/issues/387 Reported-by: Zeno Endemann <zeno.endemann@mailbox.org> Signed-off-by: Jaroslav Kysela <perex@perex.cz> Link: https://patch.msgid.link/20241002081306.1788405-1-perex@perex.cz Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-10-01bpf: Make sure internal and UAPI bpf_redirect flags don't overlapToke Høiland-Jørgensen1-8/+5
The bpf_redirect_info is shared between the SKB and XDP redirect paths, and the two paths use the same numeric flag values in the ri->flags field (specifically, BPF_F_BROADCAST == BPF_F_NEXTHOP). This means that if skb bpf_redirect_neigh() is used with a non-NULL params argument and, subsequently, an XDP redirect is performed using the same bpf_redirect_info struct, the XDP path will get confused and end up crashing, which syzbot managed to trigger. With the stack-allocated bpf_redirect_info, the structure is no longer shared between the SKB and XDP paths, so the crash doesn't happen anymore. However, different code paths using identically-numbered flag values in the same struct field still seems like a bit of a mess, so this patch cleans that up by moving the flag definitions together and redefining the three flags in BPF_F_REDIRECT_INTERNAL to not overlap with the flags used for XDP. It also adds a BUILD_BUG_ON() check to make sure the overlap is not re-introduced by mistake. Fixes: e624d4ed4aa8 ("xdp: Extend xdp_redirect_map with broadcast support") Reported-by: syzbot+cca39e6e84a367a7e6f6@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Closes: https://syzkaller.appspot.com/bug?extid=cca39e6e84a367a7e6f6 Link: https://lore.kernel.org/bpf/20240920125625.59465-1-toke@redhat.com
2024-10-01cpufreq: Avoid a bad reference count on CPU nodeMiquel Sabaté Solà1-5/+1
In the parse_perf_domain function, if the call to of_parse_phandle_with_args returns an error, then the reference to the CPU device node that was acquired at the start of the function would not be properly decremented. Address this by declaring the variable with the __free(device_node) cleanup attribute. Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20240917134246.584026-1-mikisabate@gmail.com Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-01btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent ↵Filipe Manana1-1/+1
event class While running checkpatch.pl against a patch that modifies the btrfs_qgroup_extent event class, it complained about using a comma instead of a semicolon: $ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch WARNING: Possible comma where semicolon could be used #215: FILE: include/trace/events/btrfs.h:1720: + __entry->bytenr = bytenr, __entry->num_bytes = rec->num_bytes; total: 0 errors, 1 warnings, 184 lines checked So replace the comma with a semicolon to silence checkpatch and possibly other tools. It also makes the code consistent with the rest. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01folio_queue: fix documentationChristian Brauner1-1/+1
s/folioq_count/folioq_full/ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20241001134729.3f65ae78@canb.auug.org.au Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_sizeDaniel Borkmann1-0/+9
Commit 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") added a dev->gso_max_size test to gso_features_check() in order to fall back to GSO when needed. This was added as it was noticed that some drivers could misbehave if TSO packets get too big. However, the check doesn't respect dev->gso_ipv4_max_size limit. For instance, a device could be configured with BIG TCP for IPv4, but not IPv6. Therefore, add a netif_get_gso_max_size() equivalent to netif_get_gro_max_size() and use the helper to respect both limits before falling back to GSO engine. Fixes: 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240923212242.15669-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: Add netif_get_gro_max_size helper for GRODaniel Borkmann1-0/+9
Add a small netif_get_gro_max_size() helper which returns the maximum IPv4 or IPv6 GRO size of the netdevice. We later add a netif_get_gso_max_size() equivalent as well for GSO, so that these helpers can be used consistently instead of open-coded checks. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240923212242.15669-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01Merge tag 'drm-misc-fixes-2024-09-26' of ↵Dave Airlie2-1/+10
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: atomic: - Use correct type when reading damage rectangles display: - Fix kernel docs dp-mst: - Fix DSC decompression detection hdmi: - Fix infoframe size panthor: - Fix locking sched: - Update maintainers - Fix race condition whne queueing up jobs sysfb: - Disable sysfb if framebuffer parent device is unknown vbox: - Fix VLA handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240926121045.GA561653@localhost.localdomain
2024-09-30Merge tag 'vfs-6.12-rc2.fixes' of ↵Linus Torvalds2-1/+170
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "afs: - Fix setting of the server responding flag - Remove unused struct afs_address_list and afs_put_address_list() function - Fix infinite loop because of unresponsive servers - Ensure that afs_retry_request() function is correctly added to the afs_req_ops netfs operations table netfs: - Fix netfs_folio tracepoint handling to handle NULL mappings - Add a missing folio_queue API documentation - Ensure that netfs_write_folio() correctly advances the iterator via iov_iter_advance() - Fix a dentry leak during concurrent cull and cookie lookup operations in cachefiles pidfs: - Correctly handle accessing another task's pid namespace" * tag 'vfs-6.12-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: netfs: Fix the netfs_folio tracepoint to handle NULL mapping netfs: Add folio_queue API documentation netfs: Advance iterator correctly rather than jumping it afs: Fix the setting of the server responding flag afs: Remove unused struct and function prototype afs: Fix possible infinite loop with unresponsive servers pidfs: check for valid pid namespace afs: Fix missing wire-up of afs_retry_request() cachefiles: fix dentry leak in cachefiles_open_file()
2024-09-30netfs: Fix the netfs_folio tracepoint to handle NULL mappingDavid Howells1-1/+2
Fix the netfs_folio tracepoint to handle folios that have a NULL mapping pointer. In such a case, just substitute a zero inode number. Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2917423.1727697556@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-30netfs: Add folio_queue API documentationDavid Howells1-0/+168
Add API documentation for folio_queue. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2912369.1727691281@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-doc@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-29close_range(): fix the logics in descriptor table trimmingAl Viro1-4/+4
Cloning a descriptor table picks the size that would cover all currently opened files. That's fine for clone() and unshare(), but for close_range() there's an additional twist - we clone before we close, and it would be a shame to have close_range(3, ~0U, CLOSE_RANGE_UNSHARE) leave us with a huge descriptor table when we are not going to keep anything past stderr, just because some large file descriptor used to be open before our call has taken it out. Unfortunately, it had been dealt with in an inherently racy way - sane_fdtable_size() gets a "don't copy anything past that" argument (passed via unshare_fd() and dup_fd()), close_range() decides how much should be trimmed and passes that to unshare_fd(). The problem is, a range that used to extend to the end of descriptor table back when close_range() had looked at it might very well have stuff grown after it by the time dup_fd() has allocated a new files_struct and started to figure out the capacity of fdtable to be attached to that. That leads to interesting pathological cases; at the very least it's a QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes a snapshot of descriptor table one might have observed at some point. Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination of unshare(CLONE_FILES) with plain close_range(), ending up with a weird state that would never occur with unshare(2) is confusing, to put it mildly. It's not hard to get rid of - all it takes is passing both ends of the range down to sane_fdtable_size(). There we are under ->files_lock, so the race is trivially avoided. So we do the following: * switch close_files() from calling unshare_fd() to calling dup_fd(). * undo the calling convention change done to unshare_fd() in 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" * introduce struct fd_range, pass a pointer to that to dup_fd() and sane_fdtable_size() instead of "trim everything past that point" they are currently getting. NULL means "we are not going to be punching any holes"; NR_OPEN_MAX is gone. * make sane_fdtable_size() use find_last_bit() instead of open-coding it; it's easier to follow that way. * while we are at it, have dup_fd() report errors by returning ERR_PTR(), no need to use a separate int *errorp argument. Fixes: 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-09-29Merge tag 'dma-mapping-6.12-2024-09-29' of ↵Linus Torvalds1-18/+19
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fix from Christoph Hellwig: - handle chained SGLs in the new tracing code (Christoph Hellwig) * tag 'dma-mapping-6.12-2024-09-29' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: fix DMA API tracing for chained scatterlists
2024-09-29Merge tag 'locking-urgent-2024-09-29' of ↵Linus Torvalds1-0/+136
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "lockdep: - Fix potential deadlock between lockdep and RCU (Zhiguo Niu) - Use str_plural() to address Coccinelle warning (Thorsten Blum) - Add debuggability enhancement (Luis Claudio R. Goncalves) static keys & calls: - Fix static_key_slow_dec() yet again (Peter Zijlstra) - Handle module init failure correctly in static_call_del_module() (Thomas Gleixner) - Replace pointless WARN_ON() in static_call_module_notify() (Thomas Gleixner) <linux/cleanup.h>: - Add usage and style documentation (Dan Williams) rwsems: - Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS (Waiman Long) atomic ops, x86: - Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak) - Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros Bizjak)" Signed-off-by: Ingo Molnar <mingo@kernel.org> * tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS jump_label: Fix static_key_slow_dec() yet again static_call: Replace pointless WARN_ON() in static_call_module_notify() static_call: Handle module init failure correctly in static_call_del_module() locking/lockdep: Simplify character output in seq_line() lockdep: fix deadlock issue between lockdep and rcu lockdep: Use str_plural() to fix Coccinelle warning cleanup: Add usage and style documentation lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8
2024-09-29Merge branch 'locking/core' into locking/urgent, to pick up pending commitsIngo Molnar1-0/+136
Merge all pending locking commits into a single branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-09-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-2/+16
Pull x86 kvm updates from Paolo Bonzini: "x86: - KVM currently invalidates the entirety of the page tables, not just those for the memslot being touched, when a memslot is moved or deleted. This does not traditionally have particularly noticeable overhead, but Intel's TDX will require the guest to re-accept private pages if they are dropped from the secure EPT, which is a non starter. Actually, the only reason why this is not already being done is a bug which was never fully investigated and caused VM instability with assigned GeForce GPUs, so allow userspace to opt into the new behavior. - Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon) - Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work) - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value at the ICR offset - Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler - Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit - Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the nested guest) - Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU - Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for adding multi generation LRU support in KVM - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled, i.e. when the CPU has already flushed the RSB - Trace the per-CPU host save area as a VMCB pointer to improve readability and cleanup the retrieval of the SEV-ES host save area - Remove unnecessary accounting of temporary nested VMCB related allocations - Set FINAL/PAGE in the page fault error code for EPT violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2 - Add a lockdep assertion to detect unsafe accesses of vmcs12 structures - Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible) - Minor SGX fix and a cleanup - Misc cleanups Generic: - Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed - Enable virtualization when KVM is loaded, not right before the first VM is created Together with the previous change, this simplifies a lot the logic of the callbacks, because their very existence implies virtualization is enabled - Fix a bug that results in KVM prematurely exiting to userspace for coalesced MMIO/PIO in many cases, clean up the related code, and add a testcase - Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_ the gpa+len crosses a page boundary, which thankfully is guaranteed to not happen in the current code base. Add WARNs in more helpers that read/write guest memory to detect similar bugs Selftests: - Fix a goof that caused some Hyper-V tests to be skipped when run on bare metal, i.e. NOT in a VM - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest - Explicitly include one-off assets in .gitignore. Past Sean was completely wrong about not being able to detect missing .gitignore entries - Verify userspace single-stepping works when KVM happens to handle a VM-Exit in its fastpath - Misc cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) Documentation: KVM: fix warning in "make htmldocs" s390: Enable KVM_S390_UCONTROL config in debug_defconfig selftests: kvm: s390: Add VM run test case KVM: SVM: let alternatives handle the cases when RSB filling is required KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range() KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps() KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range() KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure() KVM: x86: Update retry protection fields when forcing retry on emulation failure KVM: x86: Apply retry protection to "unprotect on failure" path KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn ...
2024-09-28Merge tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds1-2/+0
Pull ceph updates from Ilya Dryomov: "Three CephFS fixes from Xiubo and Luis and a bunch of assorted cleanups" * tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client: ceph: remove the incorrect Fw reference check when dirtying pages ceph: Remove empty definition in header file ceph: Fix typo in the comment ceph: fix a memory leak on cap_auths in MDS client ceph: flush all caps releases when syncing the whole filesystem ceph: rename ceph_flush_cap_releases() to ceph_flush_session_cap_releases() libceph: use min() to simplify code in ceph_dns_resolve_name() ceph: Convert to use jiffies macro ceph: Remove unused declarations
2024-09-27Merge tag 'bitmap-for-6.12' of https://github.com/norov/linuxLinus Torvalds7-232/+291
Pull bitmap updates from Yury Norov: - switch all bitmamp APIs from inline to __always_inline (Brian Norris) The __always_inline series improves on code generation, and now with the latest compiler versions is required to avoid compilation warnings. It spent enough in my backlog, and I'm thankful to Brian Norris for taking over and moving it forward. - introduce GENMASK_U128() macro (Anshuman Khandual) GENMASK_U128() is a prerequisite needed for arm64 development * tag 'bitmap-for-6.12' of https://github.com/norov/linux: lib/test_bits.c: Add tests for GENMASK_U128() uapi: Define GENMASK_U128 nodemask: Switch from inline to __always_inline cpumask: Switch from inline to __always_inline bitmap: Switch from inline to __always_inline find: Switch from inline to __always_inline
2024-09-27Merge tag 'cxl-for-6.12' of ↵Linus Torvalds3-0/+28
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull compute express link (cxl) updates from Dave Jiang: "Major changes address HDM decoder initialization from DVSEC ranges, refactoring the code related to cxl mailboxes to be independent of the memory devices, and adding support for shared upstream link access_coordinate calculation, as well as a change to remove locking from memory notifier callback. In addition, a number of misc cleanups and refactoring of the code are also included. Address HDM decoder initialization from DVSEC ranges: - Only register non-zero DVSEC ranges - Remove duplicate implementation of waiting for memory_info_valid - Simplify the checking of mem_enabled in cxl_hdm_decode_init() Refactor the code related to cxl mailboxes to be independent of the memory devices: - Move cxl headers in include/linux/ to include/cxl - Move all mailbox related data to 'struct cxl_mailbox' - Refactor mailbox APIs with 'struct cxl_mailbox' as input instead of memory device state Add support for shared upstream link access_coordinate calculation for configurations that have multiple targets under a switch or a root port where the aggregated bandwidth can be greater than the upstream link of the switch/RP upstream link: - Preserve the CDAT access_coordinate from an endpoint - Add the support for shared upstream link access_coordinate calculation - Add documentation to explain how the calculations are done Remove locking from memory notifier callback. Misc cleanups: - Convert devm_cxl_add_root() to return using ERR_CAST() - cxl_test use dev_is_platform() instead of open coding - Remove duplicate include of header core.h in core/cdat.c - use scoped resource management to drop put_device() for cxl_port - Use scoped_guard to drop device_lock() for cxl_port - Refactor __devm_cxl_add_port() to drop gotos - Rename cxl_setup_parent_dport to cxl_dport_init_aer and cxl_dport_map_regs() to cxl_dport_map_ras() - Refactor cxl_dport_init_aer() to be more concise - Remove duplicate host_bridge->native_aer checking in cxl_dport_init_ras_reporting() - Fix comment for cxl_query_cmd()" * tag 'cxl-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits) cxl: Add documentation to explain the shared link bandwidth calculation cxl: Calculate region bandwidth of targets with shared upstream link cxl: Preserve the CDAT access_coordinate for an endpoint cxl: Fix comment regarding cxl_query_cmd() return data cxl: Convert cxl_internal_send_cmd() to use 'struct cxl_mailbox' as input cxl: Move mailbox related bits to the same context cxl: move cxl headers to new include/cxl/ directory cxl/region: Remove lock from memory notifier callback cxl/pci: simplify the check of mem_enabled in cxl_hdm_decode_init() cxl/pci: Check Mem_info_valid bit for each applicable DVSEC cxl/pci: Remove duplicated implementation of waiting for memory_info_valid cxl/pci: Fix to record only non-zero ranges cxl/pci: Remove duplicate host_bridge->native_aer checking cxl/pci: cxl_dport_map_rch_aer() cleanup cxl/pci: Rename cxl_setup_parent_dport() and cxl_dport_map_regs() cxl/port: Refactor __devm_cxl_add_port() to drop goto pattern cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port cxl/port: Use __free() to drop put_device() for cxl_port cxl: Remove duplicate included header file core.h tools/testing/cxl: Use dev_is_platform() ...
2024-09-27Merge tag 'mm-hotfixes-stable-2024-09-27-09-45' of ↵Linus Torvalds2-1/+11
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "19 hotfixes. 13 are cc:stable. There's a focus on fixes for the memfd_pin_folios() work which was added into 6.11. Apart from that, the usual shower of singleton fixes" * tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: ocfs2: fix uninit-value in ocfs2_get_block() zram: don't free statically defined names memory tiers: use default_dram_perf_ref_source in log message Revert "list: test: fix tests for list_cut_position()" kselftests: mm: fix wrong __NR_userfaultfd value compiler.h: specify correct attribute for .rodata..c_jump_table mm/damon/Kconfig: update DAMON doc URL mm: kfence: fix elapsed time for allocated/freed track ocfs2: fix deadlock in ocfs2_get_system_file_inode ocfs2: reserve space for inline xattr before attaching reflink tree mm: migrate: annotate data-race in migrate_folio_unmap() mm/hugetlb: simplify refs in memfd_alloc_folio mm/gup: fix memfd_pin_folios alloc race panic mm/gup: fix memfd_pin_folios hugetlb page allocation mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak mm/hugetlb: fix memfd_pin_folios free_huge_pages leak mm/filemap: fix filemap_get_folios_contig THP panic mm: make SPLIT_PTE_PTLOCKS depend on SMP tools: fix shared radix-tree build
2024-09-27Merge tag 'for-linus-6.12-rc1a-tag' of ↵Linus Torvalds6-5/+146
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: "A second round of Xen related changes and features: - a small fix of the xen-pciback driver for a warning issued by sparse - support PCI passthrough when using a PVH dom0 - enable loading the kernel in PVH mode at arbitrary addresses, avoiding conflicts with the memory map when running as a Xen dom0 using the host memory layout" * tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/pvh: Add 64bit relocation page tables x86/kernel: Move page table macros to header x86/pvh: Set phys_base when calling xen_prepare_pvh() x86/pvh: Make PVH entrypoint PIC for x86-64 xen: sync elfnote.h from xen tree xen/pciback: fix cast to restricted pci_ers_result_t and pci_power_t xen/privcmd: Add new syscall to get gsi from dev xen/pvh: Setup gsi for passthrough device xen/pci: Add a function to reset device for xen
2024-09-27Merge tag 'for-6.12/dm-changes' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mikulas Patocka: - Misc VDO fixes - Remove unused declarations dm_get_rq_mapinfo() and dm_zone_map_bio() - Dm-delay: Improve kernel documentation - Dm-crypt: Allow to specify the integrity key size as an option - Dm-bufio: Remove pointless NULL check - Small code cleanups: Use ERR_CAST; remove unlikely() around IS_ERR; use __assign_bit - Dm-integrity: Fix gcc 5 warning; convert comma to semicolon; fix smatch warning - Dm-integrity: Support recalculation in the 'I' mode - Revert "dm: requeue IO if mapping table not yet available" - Dm-crypt: Small refactoring to make the code more readable - Dm-cache: Remove pointless error check - Dm: Fix spelling errors - Dm-verity: Restart or panic on an I/O error if restart or panic was requested - Dm-verity: Fallback to platform keyring also if key in trusted keyring is rejected * tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits) dm verity: fallback to platform keyring also if key in trusted keyring is rejected dm-verity: restart or panic on an I/O error dm: fix spelling errors dm-cache: remove pointless error check dm vdo: handle unaligned discards correctly dm vdo indexer: Convert comma to semicolon dm-crypt: Use common error handling code in crypt_set_keyring_key() dm-crypt: Use up_read() together with key_put() only once in crypt_set_keyring_key() Revert "dm: requeue IO if mapping table not yet available" dm-integrity: check mac_size against HASH_MAX_DIGESTSIZE in sb_mac() dm-integrity: support recalculation in the 'I' mode dm integrity: Convert comma to semicolon dm integrity: fix gcc 5 warning dm: Make use of __assign_bit() API dm integrity: Remove extra unlikely helper dm: Convert to use ERR_CAST() dm bufio: Remove NULL check of list_entry() dm-crypt: Allow to specify the integrity key size as option dm: Remove unused declaration and empty definition "dm_zone_map_bio" dm delay: enhance kernel documentation ...
2024-09-27Merge tag 'driver-core-6.12-rc1' of ↵Linus Torvalds7-13/+9
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of patches for the driver core code for 6.12-rc1. This set is the one that caused the most delay on my side, due to lots of last-minute reports of problems in the async shutdown feature that was added. In the end, I've reverted all of the patches in that series so we are back to "normal" and the patch set is being reworked for the next merge window. Other than the async shutdown patches that were reverted, included in here are: - minor driver core cleanups - minor driver core bus and class api cleanups and simplifications for some callbacks - some const markings of structures - other even more minor cleanups All of these, including the last minute reverts, have been in linux-next, but all of the reports of problems in linux-next were before the reverts happened. After the reverts, all is good" * tag 'driver-core-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits) Revert "driver core: don't always lock parent in shutdown" Revert "driver core: separate function to shutdown one device" Revert "driver core: shut down devices asynchronously" Revert "nvme-pci: Make driver prefer asynchronous shutdown" Revert "driver core: fix async device shutdown hang" driver core: fix async device shutdown hang driver core: attribute_container: Remove unused functions driver core: Trivially simplify ((struct device_private *)curr)->device->p to @curr devres: Correclty strip percpu address space of devm_free_percpu() argument driver core: Make parameter check consistent for API cluster device_(for_each|find)_child() bus: fsl-mc: make fsl_mc_bus_type const nvme-pci: Make driver prefer asynchronous shutdown driver core: shut down devices asynchronously driver core: separate function to shutdown one device driver core: don't always lock parent in shutdown platform: Make platform_bus_type constant driver core: class: Check namespace relevant parameters in class_register() driver:base:core: Adding a "Return:" line in comment for device_link_add() drivers/base: Introduce device_match_t for device finding APIs firmware_loader: Block path traversal ...
2024-09-27[tree-wide] finally take no_llseek outAl Viro2-2/+0
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-26compiler.h: specify correct attribute for .rodata..c_jump_tableTiezhu Yang1-1/+1
Currently, there is an assembler message when generating kernel/bpf/core.o under CONFIG_OBJTOOL with LoongArch compiler toolchain: Warning: setting incorrect section attributes for .rodata..c_jump_table This is because the section ".rodata..c_jump_table" should be readonly, but there is a "W" (writable) part of the flags: $ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c" [34] .rodata..c_j[...] PROGBITS 0000000000000000 0000d2e0 0000000000000800 0000000000000000 WA 0 0 8 There is no above issue on x86 due to the generated section flag is only "A" (allocatable). In order to silence the warning on LoongArch, specify the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly, then the section attribute of ".rodata..c_jump_table" must be readonly in the kernel/bpf/core.o file. Before: $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c" 21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3 CONTENTS, ALLOC, LOAD, RELOC, DATA After: $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c" 21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3 CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA By the way, AFAICT, maybe the root cause is related with the different compiler behavior of various archs, so to some extent this change is a workaround for LoongArch, and also there is no effect for x86 which is the only port supported by objtool before LoongArch with this patch. Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> [6.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-26mm/hugetlb: fix memfd_pin_folios resv_huge_pages leakSteve Sistare1-0/+10
memfd_pin_folios followed by unpin_folios leaves resv_huge_pages elevated if the pages were not already faulted in. During a normal page fault, resv_huge_pages is consumed here: hugetlb_fault() alloc_hugetlb_folio() dequeue_hugetlb_folio_vma() dequeue_hugetlb_folio_nodemask() dequeue_hugetlb_folio_node_exact() free_huge_pages-- resv_huge_pages-- During memfd_pin_folios, the page is created by calling alloc_hugetlb_folio_nodemask instead of alloc_hugetlb_folio, and resv_huge_pages is not modified: memfd_alloc_folio() alloc_hugetlb_folio_nodemask() dequeue_hugetlb_folio_nodemask() dequeue_hugetlb_folio_node_exact() free_huge_pages-- alloc_hugetlb_folio_nodemask has other callers that must not modify resv_huge_pages. Therefore, to fix, define an alternate version of alloc_hugetlb_folio_nodemask for this call site that adjusts resv_huge_pages. Link: https://lkml.kernel.org/r/1725373521-451395-4-git-send-email-steven.sistare@oracle.com Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-26Merge tag 'soc-ep93xx-dt-6.12' of ↵Linus Torvalds6-174/+70
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC update from Arnd Bergmann: "Convert ep93xx to devicetree This concludes a long journey towards replacing the old board files with devictree description on the Cirrus Logic EP93xx platform. Nikita Shubin has been working on this for a long time, for details see the last post on https://lore.kernel.org/lkml/20240909-ep93xx-v12-0-e86ab2423d4b@maquefel.me/" * tag 'soc-ep93xx-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits) dt-bindings: gpio: ep9301: Add missing "#interrupt-cells" to examples MAINTAINERS: Update EP93XX ARM ARCHITECTURE maintainer soc: ep93xx: drop reference to removed EP93XX_SOC_COMMON config net: cirrus: use u8 for addr to calm down sparse dmaengine: cirrus: use snprintf() to calm down gcc 13.3.0 dmaengine: ep93xx: Fix a NULL vs IS_ERR() check in probe() pinctrl: ep93xx: Fix raster pins typo spi: ep93xx: update kerneldoc comments for ep93xx_spi clk: ep93xx: Fix off by one in ep93xx_div_recalc_rate() clk: ep93xx: add module license dmaengine: cirrus: remove platform code ASoC: cirrus: edb93xx: Delete driver ARM: ep93xx: soc: drop defines ARM: ep93xx: delete all boardfiles ata: pata_ep93xx: remove legacy pinctrl use pwm: ep93xx: drop legacy pinctrl ARM: ep93xx: DT for the Cirrus ep93xx SoC platforms ARM: dts: ep93xx: Add EDB9302 DT ARM: dts: ep93xx: add ts7250 board ARM: dts: add Cirrus EP93XX SoC .dtsi ...
2024-09-26Merge tag 'asm-generic-6.12' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "These are only two small patches, one cleanup for arch/alpha and a preparation patch cleaning up the handling of runtime constants in the linker scripts" * tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: runtime constants: move list of constants to vmlinux.lds.h alpha: no need to include asm/xchg.h twice
2024-09-26Merge tag 'efi-next-for-v6.12' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "Not a lot happening in EFI land this cycle. - Prevent kexec from crashing on a corrupted TPM log by using a memory type that is reserved by default - Log correctable errors reported via CPER - A couple of cosmetic fixes" * tag 'efi-next-for-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: Remove redundant null pointer checks in efi_debugfs_init() efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption efi/cper: Print correctable AER information efi: Remove unused declaration efi_initialize_iomem_resources()
2024-09-26Revert "binfmt_elf, coredump: Log the reason of the failed core dumps"Linus Torvalds1-6/+2
This reverts commit fb97d2eb542faf19a8725afbd75cbc2518903210. The logging was questionable to begin with, but it seems to actively deadlock on the task lock. "On second thought, let's not log core dump failures. 'Tis a silly place" because if you can't tell your core dump is truncated, maybe you should just fix your debugger instead of adding bugs to the kernel. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Link: https://lore.kernel.org/all/d122ece6-3606-49de-ae4d-8da88846bef2@oracle.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-26dma-mapping: fix DMA API tracing for chained scatterlistsChristoph Hellwig1-18/+19
scatterlist allocations can be chained, and thus all iterations need to use the chain-aware iterators. Switch the newly added tracing to use the proper iterators so that they work with chained scatterlists. Fixes: 038eb433dc14 ("dma-mapping: add tracing for dma-mapping API calls") Reported-by: syzbot+95e4ef83a3024384ec7a@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Tested-by: syzbot+95e4ef83a3024384ec7a@syzkaller.appspotmail.com
2024-09-26Merge tag 'net-6.12-rc1' of ↵Linus Torvalds3-6/+34
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. It looks like that most people are still traveling: both the ML volume and the processing capacity are low. Previous releases - regressions: - netfilter: - nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put() - nf_tables: keep deleted flowtable hooks until after RCU - tcp: check skb is non-NULL in tcp_rto_delta_us() - phy: aquantia: fix -ETIMEDOUT PHY probe failure when firmware not present - eth: virtio_net: fix mismatched buf address when unmapping for small packets - eth: stmmac: fix zero-division error when disabling tc cbs - eth: bonding: fix unnecessary warnings and logs from bond_xdp_get_xmit_slave() Previous releases - always broken: - netfilter: - fix clash resolution for bidirectional flows - fix allocation with no memcg accounting - eth: r8169: add tally counter fields added with RTL8125 - eth: ravb: fix rx and tx frame size limit" * tag 'net-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits) selftests: netfilter: Avoid hanging ipvs.sh kselftest: add test for nfqueue induced conntrack race netfilter: nfnetlink_queue: remove old clash resolution logic netfilter: nf_tables: missing objects with no memcg accounting netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n netfilter: nf_tables: Keep deleted flowtable hooks until after RCU docs: tproxy: ignore non-transparent sockets in iptables netfilter: ctnetlink: Guard possible unused functions selftests: netfilter: nft_tproxy.sh: add tcp tests selftests: netfilter: add reverse-clash resolution test case netfilter: conntrack: add clash resolution for reverse collisions netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash selftests/net: packetdrill: increase timing tolerance in debug mode usbnet: fix cyclical race on disconnect with work queue net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled virtio_net: Fix mismatched buf address when unmapping for small packets bonding: Fix unnecessary warnings and logs from bond_xdp_get_xmit_slave() r8169: add missing MODULE_FIRMWARE entry for RTL8126A rev.b ...
2024-09-26Merge tag 'char-misc-6.12-rc1' of ↵Linus Torvalds10-60/+363
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver updates from Greg KH: "Here is the "big" set of char/misc and other driver subsystem changes for 6.12-rc1. Lots of changes in here, primarily dominated by the usual IIO driver updates and additions, but there are also small driver subsystem updates all over the place. Included in here are: - lots and lots of new IIO drivers and updates to existing ones - interconnect subsystem updates and new drivers - nvmem subsystem updates and new drivers - mhi driver updates - power supply subsystem updates - kobj_type const work for many different small subsystems - comedi driver fix - coresight subsystem and driver updates - fpga subsystem improvements - slimbus fixups - binder new feature addition for "frozen" notifications - lots and lots of other small driver updates and cleanups All of these have been in linux-next for a long time with no reported problems" * tag 'char-misc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (354 commits) greybus: gb-beagleplay: Add firmware upload API arm64: dts: ti: k3-am625-beagleplay: Add bootloader-backdoor-gpios to cc1352p7 dt-bindings: net: ti,cc1352p7: Add bootloader-backdoor-gpios MAINTAINERS: Update path for U-Boot environment variables YAML nvmem: layouts: add U-Boot env layout comedi: ni_routing: tools: Check when the file could not be opened ocxl: Remove the unused declarations in headr file hpet: Fix the wrong format specifier uio: Constify struct kobj_type cxl: Constify struct kobj_type binder: modify the comment for binder_proc_unlock iio: adc: axp20x_adc: add support for AXP717 ADC dt-bindings: iio: adc: Add AXP717 compatible iio: adc: axp20x_adc: Add adc_en1 and adc_en2 to axp_data w1: ds2482: Drop explicit initialization of struct i2c_device_id::driver_data to 0 tools: iio: rm .*.cmd when make clean iio: adc: standardize on formatting for id match tables iio: proximity: aw96103: Add support for aw96103/aw96105 proximity sensor bus: mhi: host: pci_generic: Enable EDL trigger for Foxconn modems bus: mhi: host: pci_generic: Update EDL firmware path for Foxconn modems ...
2024-09-26Merge tag 'tty-6.12-rc1' of ↵Linus Torvalds3-11/+24
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver updates from Greg KH: "Here is the "big" set of tty/serial driver updates for 6.12-rc1. Nothing major in here, just nice forward progress in the slow cleanup of the serial apis, and lots of other driver updates and fixes. Included in here are: - serial api updates from Jiri to make things more uniform and sane - 8250_platform driver cleanups - samsung serial driver fixes and updates - qcom-geni serial driver fixes from Johan for the bizarre UART engine that that chip seems to have. Hopefully it's in a better state now, but hardware designers still seem to come up with more ways to make broken UARTS 40+ years after this all should have finished. - sc16is7xx driver updates - omap 8250 driver updates - 8250_bcm2835aux driver updates - a few new serial driver bindings added - other serial minor driver updates All of these have been in linux-next for a long time with no reported problems" * tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits) tty: serial: samsung: Fix serial rx on Apple A7-A9 tty: serial: samsung: Fix A7-A11 serial earlycon SError tty: serial: samsung: Use bit manipulation macros for APPLE_S5L_* tty: rp2: Fix reset with non forgiving PCIe host bridges serial: 8250_aspeed_vuart: Enable module autoloading serial: qcom-geni: fix polled console corruption serial: qcom-geni: disable interrupts during console writes serial: qcom-geni: fix console corruption serial: qcom-geni: introduce qcom_geni_serial_poll_bitfield() serial: qcom-geni: fix arg types for qcom_geni_serial_poll_bit() soc: qcom: geni-se: add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers serial: qcom-geni: fix false console tx restart serial: qcom-geni: fix fifo polling timeout tty: hvc: convert comma to semicolon mxser: convert comma to semicolon serial: 8250_bcm2835aux: Fix clock imbalance in PM resume serial: sc16is7xx: convert bitmask definitions to use BIT() macro serial: sc16is7xx: fix copy-paste errors in EFR_SWFLOWx_BIT constants serial: sc16is7xx: remove SC16IS7XX_MSR_DELTA_MASK serial: xilinx_uartps: Make cdns_rs485_supported static ...
2024-09-26Merge tag 'usb-6.12-rc1' of ↵Linus Torvalds11-46/+261
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/Thunderbolt updates from Greg KH: "Here is the large set of USB and Thunderbolt changes for 6.12-rc1. Nothing "major" in here, except for a new 9p network gadget that has been worked on for a long time (all of the needed acks are here) Other than that, it's the usual set of: - Thunderbolt / USB4 driver updates and additions for new hardware - dwc3 driver updates and new features added - xhci driver updates - typec driver updates - USB gadget updates and api additions to make some gadgets more configurable by userspace - dwc2 driver updates - usb phy driver updates - usbip feature additions - other minor USB driver updates All of these have been in linux-next for a long time with no reported issues" * tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (145 commits) sub: cdns3: Use predefined PCI vendor ID constant sub: cdns2: Use predefined PCI vendor ID constant USB: misc: yurex: fix race between read and write USB: misc: cypress_cy7c63: check for short transfer USB: appledisplay: close race between probe and completion handler USB: class: CDC-ACM: fix race between get_serial and set_serial usb: r8a66597-hcd: make read-only const arrays static usb: typec: ucsi: Fix busy loop on ASUS VivoBooks usb: dwc3: rtk: Clean up error code in __get_dwc3_maximum_speed() usb: storage: ene_ub6250: Fix right shift warnings usb: roles: Improve the fix for a false positive recursive locking complaint locking/mutex: Introduce mutex_init_with_key() locking/mutex: Define mutex_init() once net/9p/usbg: fix CONFIG_USB_GADGET dependency usb: xhci: fix loss of data on Cadence xHC usb: xHCI: add XHCI_RESET_ON_RESUME quirk for Phytium xHCI host usb: dwc3: imx8mp: disable SS_CON and U3 wakeup for system sleep usb: dwc3: imx8mp: add 2 software managed quirk properties for host mode usb: host: xhci-plat: Parse xhci-missing_cas_quirk and apply quirk usb: misc: onboard_usb_dev: add Microchip usb5744 SMBus programming support ...
2024-09-26Merge tag 'probes-v6.12' of ↵Linus Torvalds2-9/+20
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: - uprobes: make trace_uprobe->nhit counter a per-CPU one This makes uprobe event's hit counter per-CPU for improving scalability on multi-core environment - kprobes: Remove obsoleted declaration for init_test_probes Remove unused init_test_probes() from header - Raw tracepoint probe supports raw tracepoint events on modules: - add a function for iterating over all tracepoints in all modules - add a function for iterating over tracepoints in a module - support raw tracepoint events on modules - support raw tracepoints on future loaded modules - add a test for tracepoint events on modules" * tag 'probes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: sefltests/tracing: Add a test for tracepoint events on modules tracing/fprobe: Support raw tracepoints on future loaded modules tracing/fprobe: Support raw tracepoint events on modules tracepoint: Support iterating tracepoints in a loading module tracepoint: Support iterating over tracepoints on modules kprobes: Remove obsoleted declaration for init_test_probes uprobes: turn trace_uprobe's nhit counter to be per-CPU one
2024-09-26Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds3-2/+24
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-balloon supports new stats - vdpa supports setting mac address - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster - virtio_fs supports new sysfs entries for queue info - virtio/vsock performance has been improved And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vsock/virtio: avoid queuing packets when intermediate queue is empty vsock/virtio: refactor virtio_transport_send_pkt_work fw_cfg: Constify struct kobj_type vdpa/mlx5: Postpone MR deletion vdpa/mlx5: Introduce init/destroy for MR resources vdpa/mlx5: Rename mr_mtx -> lock vdpa/mlx5: Extract mr members in own resource struct vdpa/mlx5: Rename function vdpa/mlx5: Delete direct MKEYs in parallel vdpa/mlx5: Create direct MKEYs in parallel MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section virtio_fs: add sysfs entries for queue information virtio_fs: introduce virtio_fs_put_locked helper vdpa: Remove unused declarations vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command vdpa/mlx5: Small improvement for change_num_qps() vdpa/mlx5: Keep notifiers during suspend but ignore vdpa/mlx5: Parallelize device resume vdpa/mlx5: Parallelize device suspend vdpa/mlx5: Use async API for vq modify commands ...
2024-09-26netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTEDPhil Sutter1-1/+1
Fix the comment which incorrectly defines it as NLA_U32. Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26Merge tag 'nf-24-09-26' of ↵Paolo Abeni1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net v2: with kdoc fixes per Paolo Abeni. The following patchset contains Netfilter fixes for net: Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP packets to one another, two packets of the same flow in each direction handled by different CPUs that result in two conntrack objects in NEW state, where reply packet loses race. Then, patch #3 adds a testcase for this scenario. Series from Florian Westphal. 1) NAT engine can falsely detect a port collision if it happens to pick up a reply packet as NEW rather than ESTABLISHED. Add extra code to detect this and suppress port reallocation in this case. 2) To complete the clash resolution in the reply direction, extend conntrack logic to detect clashing conntrack in the reply direction to existing entry. 3) Adds a test case. Then, an assorted list of fixes follow: 4) Add a selftest for tproxy, from Antonio Ojea. 5) Guard ctnetlink_*_size() functions under #if defined(CONFIG_NETFILTER_NETLINK_GLUE_CT) || defined(CONFIG_NF_CONNTRACK_EVENTS) From Andy Shevchenko. 6) Use -m socket --transparent in iptables tproxy documentation. From XIE Zhibang. 7) Call kfree_rcu() when releasing flowtable hooks to address race with netlink dump path, from Phil Sutter. 8) Fix compilation warning in nf_reject with CONFIG_BRIDGE_NETFILTER=n. From Simon Horman. 9) Guard ctnetlink_label_size() under CONFIG_NF_CONNTRACK_EVENTS which is its only user, to address a compilation warning. From Simon Horman. 10) Use rcu-protected list iteration over basechain hooks from netlink dump path. 11) Fix memcg for nf_tables, use GFP_KERNEL_ACCOUNT is not complete. 12) Remove old nfqueue conntrack clash resolution. Instead trying to use same destination address consistently which requires double DNAT, use the existing clash resolution which allows clashing packets go through with different destination. Antonio Ojea originally reported an issue from the postrouting chain, I proposed a fix: https://lore.kernel.org/netfilter-devel/ZuwSwAqKgCB2a51-@calendula/T/ which he reported it did not work for him. 13) Adds a selftest for patch 12. 14) Fixes ipvs.sh selftest. netfilter pull request 24-09-26 * tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: Avoid hanging ipvs.sh kselftest: add test for nfqueue induced conntrack race netfilter: nfnetlink_queue: remove old clash resolution logic netfilter: nf_tables: missing objects with no memcg accounting netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n netfilter: nf_tables: Keep deleted flowtable hooks until after RCU docs: tproxy: ignore non-transparent sockets in iptables netfilter: ctnetlink: Guard possible unused functions selftests: netfilter: nft_tproxy.sh: add tcp tests selftests: netfilter: add reverse-clash resolution test case netfilter: conntrack: add clash resolution for reverse collisions netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash ==================== Link: https://patch.msgid.link/20240926110717.102194-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-26netfilter: nfnetlink_queue: remove old clash resolution logicFlorian Westphal1-4/+0
For historical reasons there are two clash resolution spots in netfilter, one in nfnetlink_queue and one in conntrack core. nfnetlink_queue one was added first: If a colliding entry is found, NAT NAT transformation is reversed by calling nat engine again with altered tuple. See commit 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks") for details. One problem is that nf_reroute() won't take an action if the queueing doesn't occur in the OUTPUT hook, i.e. when queueing in forward or postrouting, packet will be sent via the wrong path. Another problem is that the scenario addressed (2nd UDP packet sent with identical addresses while first packet is still being processed) can also occur without any nfqueue involvement due to threaded resolvers doing A and AAAA requests back-to-back. This lead us to add clash resolution logic to the conntrack core, see commit 6a757c07e51f ("netfilter: conntrack: allow insertion of clashing entries"). Instead of fixing the nfqueue based logic, lets remove it and let conntrack core handle this instead. Retain the ->update hook for sake of nfqueue based conntrack helpers. We could axe this hook completely but we'd have to split confirm and helper logic again, see commit ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again"). This SHOULD NOT be backported to kernels earlier than v5.6; they lack adequate clash resolution handling. Patch was originally written by Pablo Neira Ayuso. Reported-by: Antonio Ojea <aojea@google.com> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1766 Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Antonio Ojea <aojea@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>