aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-04-05zsmalloc: document freeable statsSergey Senozhatsky1-0/+2
When freeable class stat was added to classes file (back in 2016) we forgot to update zsmalloc documentation. Fix that. Link: https://lkml.kernel.org/r/[email protected] Fixes: 1120ed548394 ("mm/zsmalloc: add `freeable' column to pool stat") Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Minchan Kim <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05zsmalloc: document new fullness groupingSergey Senozhatsky1-59/+74
Patch series "zsmalloc: minor documentation updates". Two minor patches that bring zsmalloc documentation up to date. This patch (of 2): Update documentation and reflect new zspages fullness grouping (we don't use almost_empty and almost_full anymore). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sergey Senozhatsky <[email protected]> Fixes: 67e157eb3639 ("zsmalloc: show per fullness group class stats") Cc: Minchan Kim <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05fsdax: force clear dirty mark if CoWShiyang Ruan1-0/+37
XFS allows CoW on non-shared extents to combat fragmentation[1]. The old non-shared extent could be mwrited before, its dax entry is marked dirty. This results in a WARNing: [ 28.512349] ------------[ cut here ]------------ [ 28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390 [ 28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables [ 28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 6.3.0-rc1-00001-g85e1481e19c1-dirty #117 [ 28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014 [ 28.516307] RIP: 0010:dax_insert_entry+0x342/0x390 [ 28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48 [ 28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086 [ 28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 000000000018824b [ 28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 00000000ffffffff [ 28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: ffffc9000845f9c8 [ 28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 0000000000000155 [ 28.519301] R13: 0000000000000000 R14: 000000000018824b R15: ffff88810cfa76b8 [ 28.519703] FS: 00007f14a0c94740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000 [ 28.520148] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 0000000000770ee0 [ 28.520863] PKRU: 55555554 [ 28.521043] Call Trace: [ 28.521219] <TASK> [ 28.521368] dax_fault_iter+0x196/0x390 [ 28.521595] dax_iomap_pte_fault+0x19b/0x3d0 [ 28.521852] __xfs_filemap_fault+0x234/0x2b0 [ 28.522116] __do_fault+0x30/0x130 [ 28.522334] do_fault+0x193/0x340 [ 28.522586] __handle_mm_fault+0x2d3/0x690 [ 28.522975] handle_mm_fault+0xe6/0x2c0 [ 28.523259] do_user_addr_fault+0x1bc/0x6f0 [ 28.523521] exc_page_fault+0x60/0x140 [ 28.523763] asm_exc_page_fault+0x22/0x30 [ 28.524001] RIP: 0033:0x7f14a0b589ca [ 28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 [ 28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202 [ 28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 0000000000006046 [ 28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 00007f14a0c8d000 [ 28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 000000000014a000 [ 28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 028f5c28f5c28f5c [ 28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 00007f14a0c946c0 [ 28.527449] </TASK> [ 28.527600] ---[ end trace 0000000000000000 ]--- To be able to delete this entry, clear its dirty mark before invalidate_inode_pages2_range(). [1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/ Link: https://lkml.kernel.org/r/[email protected] Fixes: f80e1668888f3 ("fsdax: invalidate pages when CoW") Signed-off-by: Shiyang Ruan <[email protected]> Cc: Dan Williams <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Jan Kara <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05mm/hugetlb: fix uffd wr-protection for CoW optimization pathPeter Xu1-2/+12
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be writable even with uffd-wp bit set. It only happens with hugetlb private mappings, when someone firstly wr-protects a missing pte (which will install a pte marker), then a write to the same page without any prior access to the page. Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before reaching hugetlb_wp() to avoid taking more locks that userfault won't need. However there's one CoW optimization path that can trigger hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap. This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit is detected. The new path will only trigger in the CoW optimization path because generic hugetlb_fault() (e.g. when a present pte was wr-protected) will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE won't be affected and can still be resolved, IOW only skip CoW not CoR. This patch will be needed for v5.19+ hence copy stable. [[email protected]: v2] Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n [[email protected]: v3] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu <[email protected]> Reported-by: Muhammad Usama Anjum <[email protected]> Tested-by: Muhammad Usama Anjum <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nadav Amit <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05mm: enable maple tree RCU mode by defaultLiam R. Howlett3-2/+7
Use the maple tree in RCU mode for VMA tracking. The maple tree tracks the stack and is able to update the pivot (lower/upper boundary) in-place to allow the page fault handler to write to the tree while holding just the mmap read lock. This is safe as the writes to the stack have a guard VMA which ensures there will always be a NULL in the direction of the growth and thus will only update a pivot. It is possible, but not recommended, to have VMAs that grow up/down without guard VMAs. syzbot has constructed a testcase which sets up a VMA to grow and consume the empty space. Overwriting the entire NULL entry causes the tree to be altered in a way that is not safe for concurrent readers; the readers may see a node being rewritten or one that does not match the maple state they are using. Enabling RCU mode allows the concurrent readers to see a stable node and will return the expected result. [[email protected]: we don't need to free the nodes with RCU[ Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree") Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Reported-by: [email protected] Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05maple_tree: add RCU lock checking to rcu callback functionsLiam R. Howlett1-92/+96
Dereferencing RCU objects within the RCU callback without the RCU check has caused lockdep to complain. Fix the RCU dereferencing by using the RCU callback lock to ensure the operation is safe. Also stop creating a new lock to use for dereferencing during destruction of the tree or subtree. Instead, pass through a pointer to the tree that has the lock that is held for RCU dereferencing checking. It also does not make sense to use the maple state in the freeing scenario as the tree walk is a special case where the tree no longer has the normal encodings and parent pointers. Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <[email protected]> Reported-by: Suren Baghdasaryan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05maple_tree: add smp_rmb() to dead node detectionLiam R. Howlett1-2/+6
Add an smp_rmb() before reading the parent pointer to ensure that anything read from the node prior to the parent pointer hasn't been reordered ahead of this check. The is necessary for RCU mode. Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05maple_tree: fix write memory barrier of nodes once dead for RCU modeLiam R. Howlett2-2/+21
During the development of the maple tree, the strategy of freeing multiple nodes changed and, in the process, the pivots were reused to store pointers to dead nodes. To ensure the readers see accurate pivots, the writers need to mark the nodes as dead and call smp_wmb() to ensure any readers can identify the node as dead before using the pivot values. There were two places where the old method of marking the node as dead without smp_wmb() were being used, which resulted in RCU readers seeing the wrong pivot value before seeing the node was dead. Fix this race condition by using mte_set_node_dead() which has the smp_wmb() call to ensure the race is closed. Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed are marked as dead to ensure there are no other call paths besides the two updated paths. This is necessary for the RCU mode of the maple tree. Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05maple_tree: remove extra smp_wmb() from mas_dead_leaves()Liam Howlett1-1/+0
The call to mte_set_dead_node() before the smp_wmb() already calls smp_wmb() so this is not needed. This is an optimization for the RCU mode of the maple tree. Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05maple_tree: fix freeing of nodes in rcu modeLiam Howlett1-11/+62
The walk to destroy the nodes was not always setting the node type and would result in a destroy method potentially using the values as nodes. Avoid this by setting the correct node types. This is necessary for the RCU mode of the maple tree. Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05maple_tree: detect dead nodes in mas_start()Liam Howlett1-0/+4
When initially starting a search, the root node may already be in the process of being replaced in RCU mode. Detect and restart the walk if this is the case. This is necessary for RCU mode of the maple tree. Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05maple_tree: be more cautious about dead nodesLiam Howlett1-9/+43
Patch series "Fix VMA tree modification under mmap read lock". Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an inconsistency between threads walking the VMA maple tree. The inconsistency is caused by the page fault handler modifying the maple tree while holding the mmap_lock for read. This only happens for stack VMAs. We had thought this was safe as it only modifies a single pivot in the tree. Unfortunately, syzbot constructed a test case where the stack had no guard page and grew the stack to abut the next VMA. This causes us to delete the NULL entry between the two VMAs and rewrite the node. We considered several options for fixing this, including dropping the mmap_lock, then reacquiring it for write; and relaxing the definition of the tree to permit a zero-length NULL entry in the node. We decided the best option was to backport some of the RCU patches from -next, which solve the problem by allocating a new node and RCU-freeing the old node. Since the problem exists in 6.1, we preferred a solution which is similar to the one we intended to merge next merge window. These patches have been in -next since next-20230301, and have received intensive testing in Android as part of the RCU page fault patchset. They were also sent as part of the "Per-VMA locks" v4 patch series. Patches 1 to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode for the tree. Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through mmtests showed there was a 15-20% performance decrease in will-it-scale/brk1-processes. This tests creating and inserting a single VMA repeatedly through the brk interface and isn't representative of any real world applications. This patch (of 8): ma_pivots() and ma_data_end() may be called with a dead node. Ensure to that the node isn't dead before using the returned values. This is necessary for RCU mode of the maple tree. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam Howlett <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arjun Roy <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Li <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: freak07 <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Punit Agrawal <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05Merge tag 'wireless-2023-04-05' of ↵Jakub Kicinski13-110/+97
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.3 mt76 has a fix for leaking cleartext frames on a certain scenario and two firmware file handling related fixes. For brcmfmac we have a fix for an older SDIO suspend regression and for ath11k avoiding a kernel crash during hibernation with SUSE kernels. * tag 'wireless-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mt76: ignore key disable commands wifi: ath11k: reduce the MHI timeout to 20s wifi: mt76: mt7921: fix fw used for offload check for mt7922 wifi: mt76: mt7921: Fix use-after-free in fw features query. wifi: brcmfmac: Fix SDIO suspend/resume regression ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-05Merge tag 'linux-can-fixes-for-6.3-20230405' of ↵Jakub Kicinski2-27/+52
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2023-04-05 The first patch is by Oleksij Rempel and fixes a out-of-bounds memory access in the j1939 protocol. The remaining 3 patches target the ISOTP protocol. Oliver Hartkopp fixes the ISOTP protocol to pass information about dropped PDUs to the user space via control messages. Michal Sojka's patch fixes poll() to not forward false EPOLLOUT events. And Oliver Hartkopp fixes a race condition between isotp_sendsmg() and isotp_release(). * tag 'linux-can-fixes-for-6.3-20230405' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: isotp: fix race between isotp_sendsmg() and isotp_release() can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-05Merge branch '100GbE' of ↵Jakub Kicinski1-3/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-04-04 (ice) This series contains updates to ice driver only. Simei adjusts error path on adding VF Flow Director filters that were not releasing all resources. Lingyu adds setting/resetting of VF Flow Director filters counters during initialization. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Reset FDIR counter in FDIR init stage ice: fix wrong fallback logic for FDIR ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-05Merge tag 'drm-misc-fixes-2023-04-05' of ↵Daniel Vetter5-29/+56
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * ivpu: DMA fence and suspend fixes * nouveau: Color-depth fixes * panfrost: Fix mmap error handling Signed-off-by: Daniel Vetter <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/20230405182855.GA1551@linux-uq9g
2023-04-05ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530Hans de Goede1-0/+14
The Lenovo ThinkPad W530 uses a nvidia k1000m GPU. When this gets used together with one of the older nvidia binary driver series (the latest series does not support it), then backlight control does not work. This is caused by commit 3dbc80a3e4c5 ("ACPI: video: Make backlight class device registration a separate step (v2)") combined with commit 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default"). After these changes the acpi_video# backlight device is only registered when requested by a GPU driver calling acpi_video_register_backlight() which the nvidia binary driver does not do. I realize that using the nvidia binary driver is not a supported use-case and users can workaround this by adding acpi_backlight=video on the kernel commandline, but the ThinkPad W530 is a popular model under Linux users, so it seems worthwhile to add a quirk for this. I will also email Nvidia asking them to make the driver call acpi_video_register_backlight() when an internal LCD panel is detected. So maybe the next maintenance release of the drivers will fix this... Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-04-05ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2Hans de Goede1-0/+23
On the Apple iMac14,1 and iMac14,2 all-in-ones (monitors with builtin "PC") the connection between the GPU and the panel is seen by the GPU driver as regular DP instead of eDP, causing the GPU driver to never call acpi_video_register_backlight(). (GPU drivers only call acpi_video_register_backlight() when an internal panel is detected, to avoid non working acpi_video# devices getting registered on desktops which unfortunately is a real issue.) Fix the missing acpi_video# backlight device on these all-in-ones by adding a acpi_backlight=video DMI quirk, so that video.ko will immediately register the backlight device instead of waiting for an acpi_video_register_backlight() call. Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-04-05ACPI: video: Make acpi_backlight=video work independent from GPU driverHans de Goede1-2/+13
Commit 3dbc80a3e4c5 ("ACPI: video: Make backlight class device registration a separate step (v2)") combined with commit 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Means that the video.ko code now fully depends on the GPU driver calling acpi_video_register_backlight() for the acpi_video# backlight class devices to get registered. This means that if the GPU driver does not do this, acpi_backlight=video on the cmdline, or DMI quirks for selecting acpi_video# will not work. This is a problem on for example Apple iMac14,1 all-in-ones where the monitor's LCD panel shows up as a regular DP connection instead of eDP so the GPU driver will not call acpi_video_register_backlight() [1]. Fix this by making video.ko directly register the acpi_video# devices when these have been explicitly requested either on the cmdline or through DMI quirks (rather then auto-detection being used). [1] GPU drivers only call acpi_video_register_backlight() when an internal panel is detected, to avoid non working acpi_video# devices getting registered on desktops which unfortunately is a real issue. Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-04-05ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()Hans de Goede2-15/+21
Allow callers of __acpi_video_get_backlight_type() to pass a pointer to a bool which will get set to false if the backlight-type comes from the cmdline or a DMI quirk and set to true if auto-detection was used. And make __acpi_video_get_backlight_type() non static so that it can be called directly outside of video_detect.c . While at it turn the acpi_video_get_backlight_type() and acpi_video_backlight_use_native() wrappers into static inline functions in include/acpi/video.h, so that we need to export one less symbol. Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-04-05usb: cdnsp: Fixes error: uninitialized symbol 'len'Pawel Laszczak1-2/+1
The patch 5bc38d33a5a1: "usb: cdnsp: Fixes issue with redundant Status Stage" leads to the following Smatch static checker warning: drivers/usb/cdns3/cdnsp-ep0.c:470 cdnsp_setup_analyze() error: uninitialized symbol 'len'. cc: <[email protected]> Fixes: 5bc38d33a5a1 ("usb: cdnsp: Fixes issue with redundant Status Stage") Signed-off-by: Pawel Laszczak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05cifs: sanitize paths in cifs_update_super_prepath.Thiago Rafael Becker3-7/+11
After a server reboot, clients are failing to move files with ENOENT. This is caused by DFS referrals containing multiple separators, which the server move call doesn't recognize. v1: Initial patch. v2: Move prototype to header. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182472 Fixes: a31080899d5f ("cifs: sanitize multiple delimiters in prepath") Actually-Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Reviewed-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Thiago Rafael Becker <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-04-05usb: gadgetfs: Fix ep_read_iter to handle ITER_UBUFSandeep Dhavale1-1/+1
iov_iter for ep_read_iter can be ITER_UBUF with io_uring. In that case dup_iter() does not have to allocate iov and it can return NULL. Fix the assumption by checking for iter_is_ubuf() other wise ep_read_iter can treat this as failure and return -ENOMEM. Fixes: 1e23db450cff ("io_uring: use iter_ubuf for single range imports") Signed-off-by: Sandeep Dhavale <[email protected]> Acked-by: Jens Axboe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05usb: gadget: f_fs: Fix ffs_epfile_read_iter to handle ITER_UBUFSandeep Dhavale1-1/+1
iov_iter for ffs_epfile_read_iter can be ITER_UBUF with io_uring. In that case dup_iter() does not have to allocate anything and it can return NULL. ffs_epfile_read_iter treats this as a failure and returns -ENOMEM. Fix it by checking if iter_is_ubuf(). Fixes: 1e23db450cff ("io_uring: use iter_ubuf for single range imports") Signed-off-by: Sandeep Dhavale <[email protected]> Acked-by: Jens Axboe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05usb: typec: altmodes/displayport: Fix configure initial pin assignmentRD Babiera1-1/+5
While determining the initial pin assignment to be sent in the configure message, using the DP_PIN_ASSIGN_DP_ONLY_MASK mask causes the DFP_U to send both Pin Assignment C and E when both are supported by the DFP_U and UFP_U. The spec (Table 5-7 DFP_U Pin Assignment Selection Mandates, VESA DisplayPort Alt Mode Standard v2.0) indicates that the DFP_U never selects Pin Assignment E when Pin Assignment C is offered. Update the DP_PIN_ASSIGN_DP_ONLY_MASK conditional to intially select only Pin Assignment C if it is available. Fixes: 0e3bb7d6894d ("usb: typec: Add driver for DisplayPort alternate mode") Cc: [email protected] Signed-off-by: RD Babiera <[email protected]> Reviewed-by: Heikki Krogerus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05usb: dwc3: pci: add support for the Intel Meteor Lake-SHeikki Krogerus1-0/+4
This patch adds the necessary PCI ID for Intel Meteor Lake-S devices. Signed-off-by: Heikki Krogerus <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05arm64: compat: Work around uninitialized variable warningArd Biesheuvel1-18/+14
Dan reports that smatch complains about a potential uninitialized variable being used in the compat alignment fixup code. The logic is not wrong per se, but we do end up using an uninitialized variable if reading the instruction that triggered the alignment fault from user space faults, even if the fault ensures that the uninitialized value doesn't propagate any further. Given that we just give up and return 1 if any fault occurs when reading the instruction, let's get rid of the 'success handling' pattern that captures the fault in a variable and aborts later, and instead, just return 1 immediately if any of the get_user() calls result in an exception. Fixes: 3fc24ef32d3b ("arm64: compat: Implement misalignment fixups for multiword loads") Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2023-04-05Merge tag 'trace-v6.3-rc5' of ↵Linus Torvalds6-24/+63
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix timerlat notification, as it was not triggering the notify to users when a new max latency was hit. - Do not trigger max latency if the tracing is off. When tracing is off, the ring buffer is not updated, it does not make sense to notify when there's a new max latency detected by the tracer, as why that latency happened is not available. The tracing logic still runs when the ring buffer is disabled, but it should not be triggering notifications. - Fix race on freeing the synthetic event "last_cmd" variable by adding a mutex around it. - Fix race between reader and writer of the ring buffer by adding memory barriers. When the writer is still on the reader page it must have its content visible on the buffer before it moves the commit index that the reader uses to know how much content is on the page. - Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_ and _RET_IP_, which gets broken if it is not inlined. - Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build. The field formats of trace events are calculated by using sizeof(type) and other means by what is passed into the structure macros like __field(). The __field() macro is only meant for atom types like int, long, short, pointer, etc. It is not meant for arrays. The code will currently compile with arrays, but then the format produced will be inaccurate, and user space parsing tools will break. Two bugs have already been fixed, now add code that will make the kernel fail to build if another trace event includes this buggy field format. - Fix boot up snapshot code: Boot snapshots were triggering when not even asked for on the kernel command line. This was caused by two bugs: 1) It would trigger a snapshot on any instance if one was created from the kernel command line. 2) The error handling would only affect the top level instance. So the fact that a snapshot was done on a instance that didn't allocate a buffer triggered a warning written into the top level buffer, and worse yet, disabled the top level buffer. - Fix memory leak that was caused when an error was logged in a trace buffer instance, and then the buffer instance was removed. The allocated error log messages still needed to be freed. * tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Free error logs of tracing instances tracing: Fix ftrace_boot_snapshot command line logic tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance tracing: Error if a trace event has an array for a __field() tracing/osnoise: Fix notify new tracing_max_latency tracing/timerlat: Notify new max thread latency ftrace: Mark get_lock_parent_ip() __always_inline ring-buffer: Fix race while reader and writer are on the same page tracing/synthetic: Fix races on freeing last_cmd
2023-04-05nvme: fix discard support without oncsKeith Busch1-3/+3
The device can report discard support without setting the ONCS DSM bit. When not set, the driver clears max_discard_size expecting it to be set later. We don't know the size until we have the namespace format, though, so setting it is deferred until configuring one, but the driver was abandoning the discard settings due to that initial clearing. Move the max_discard_size calculation above the check for a '0' discard size. Fixes: 1a86924e4f46475 ("nvme: fix interpretation of DMRSL") Reported-by: Laurence Oberman <[email protected]> Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Niklas Cassel <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Tested-by: Laurence Oberman <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-04-05tracing: Free error logs of tracing instancesSteven Rostedt (Google)1-0/+1
When a tracing instance is removed, the error messages that hold errors that occurred in the instance needs to be freed. The following reports a memory leak: # cd /sys/kernel/tracing # mkdir instances/foo # echo 'hist:keys=x' > instances/foo/events/sched/sched_switch/trigger # cat instances/foo/error_log [ 117.404795] hist:sched:sched_switch: error: Couldn't find field Command: hist:keys=x ^ # rmdir instances/foo Then check for memory leaks: # echo scan > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff88810d8ec700 (size 192): comm "bash", pid 869, jiffies 4294950577 (age 215.752s) hex dump (first 32 bytes): 60 dd 68 61 81 88 ff ff 60 dd 68 61 81 88 ff ff `.ha....`.ha.... a0 30 8c 83 ff ff ff ff 26 00 0a 00 00 00 00 00 .0......&....... backtrace: [<00000000dae26536>] kmalloc_trace+0x2a/0xa0 [<00000000b2938940>] tracing_log_err+0x277/0x2e0 [<000000004a0e1b07>] parse_atom+0x966/0xb40 [<0000000023b24337>] parse_expr+0x5f3/0xdb0 [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560 [<00000000293a9645>] trigger_process_regex+0x135/0x1a0 [<000000005c22b4f2>] event_trigger_write+0x87/0xf0 [<000000002cadc509>] vfs_write+0x162/0x670 [<0000000059c3b9be>] ksys_write+0xca/0x170 [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0 [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc unreferenced object 0xffff888170c35a00 (size 32): comm "bash", pid 869, jiffies 4294950577 (age 215.752s) hex dump (first 32 bytes): 0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 68 69 73 74 . Command: hist 3a 6b 65 79 73 3d 78 0a 00 00 00 00 00 00 00 00 :keys=x......... backtrace: [<000000006a747de5>] __kmalloc+0x4d/0x160 [<000000000039df5f>] tracing_log_err+0x29b/0x2e0 [<000000004a0e1b07>] parse_atom+0x966/0xb40 [<0000000023b24337>] parse_expr+0x5f3/0xdb0 [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560 [<00000000293a9645>] trigger_process_regex+0x135/0x1a0 [<000000005c22b4f2>] event_trigger_write+0x87/0xf0 [<000000002cadc509>] vfs_write+0x162/0x670 [<0000000059c3b9be>] ksys_write+0xca/0x170 [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0 [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc The problem is that the error log needs to be freed when the instance is removed. Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Eric Biggers <[email protected]> Fixes: 2f754e771b1a6 ("tracing: Have the error logs show up in the proper instances") Reported-by: Mirsad Goran Todorovac <[email protected]> Tested-by: Mirsad Todorovac <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-04-05x86/cpu: Add model number for Intel Arrow Lake processorTony Luck1-0/+2
Successor to Lunar Lake. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-05can: isotp: fix race between isotp_sendsmg() and isotp_release()Oliver Hartkopp1-24/+31
As discussed with Dae R. Jeong and Hillf Danton here [1] the sendmsg() function in isotp.c might get into a race condition when restoring the former tx.state from the old_state. Remove the old_state concept and implement proper locking for the ISOTP_IDLE transitions in isotp_sendmsg(), inspired by a simplification idea from Hillf Danton. Introduce a new tx.state ISOTP_SHUTDOWN and use the same locking mechanism from isotp_release() which resolves a potential race between isotp_sendsmg() and isotp_release(). [1] https://lore.kernel.org/linux-can/ZB%2F93xJxq%2FBUqAgG@dragonet v1: https://lore.kernel.org/all/[email protected] v2: https://lore.kernel.org/all/[email protected] take care of signal interrupts for wait_event_interruptible() in isotp_release() v3: https://lore.kernel.org/all/[email protected] take care of signal interrupts for wait_event_interruptible() in isotp_sendmsg() in the wait_tx_done case v4: https://lore.kernel.org/all/[email protected] take care of signal interrupts for wait_event_interruptible() in isotp_sendmsg() in ALL cases Cc: Dae R. Jeong <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Oliver Hartkopp <[email protected]> Fixes: 4f027cba8216 ("can: isotp: split tx timer into transmission and timeout") Link: https://lore.kernel.org/all/[email protected] Cc: [email protected] [mkl: rephrase commit message] Signed-off-by: Marc Kleine-Budde <[email protected]>
2023-04-05Merge tag 'drm-intel-fixes-2023-04-05' of ↵Daniel Vetter6-14/+25
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v6.3-rc6: - Fix DP MST DSC M/N calculation to use compressed bpp - Fix racy use-after-free in perf ioctl - Fix context runtime accounting - Fix handling of GT reset during HuC loading - Fix use of unsigned vm_fault_t for error values Signed-off-by: Daniel Vetter <[email protected]> From: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-04-05can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT eventsMichal Sojka1-1/+16
When using select()/poll()/epoll() with a non-blocking ISOTP socket to wait for when non-blocking write is possible, a false EPOLLOUT event is sometimes returned. This can happen at least after sending a message which must be split to multiple CAN frames. The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is not equal to ISOTP_IDLE and this behavior is not reflected in datagram_poll(), which is used in isotp_ops. This is fixed by introducing ISOTP-specific poll function, which suppresses the EPOLLOUT events in that case. v2: https://lore.kernel.org/all/[email protected] v1: https://lore.kernel.org/all/[email protected] https://lore.kernel.org/all/[email protected] Signed-off-by: Michal Sojka <[email protected]> Reported-by: Jakub Jira <[email protected]> Tested-by: Oliver Hartkopp <[email protected]> Acked-by: Oliver Hartkopp <[email protected]> Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/[email protected] Cc: [email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>
2023-04-05can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infosOliver Hartkopp1-1/+1
isotp.c was still using sock_recv_timestamp() which does not provide control messages to detect dropped PDUs in the receive path. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Signed-off-by: Oliver Hartkopp <[email protected]> Link: https://lore.kernel.org/all/[email protected] Cc: [email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>
2023-04-05can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory accessOleksij Rempel1-1/+4
In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access could occur during the memcpy() operation if the size of skb->cb is larger than the size of struct j1939_sk_buff_cb. This is because the memcpy() operation uses the size of skb->cb, leading to a read beyond the struct j1939_sk_buff_cb. Updated the memcpy() operation to use the size of struct j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the memcpy() operation only reads the memory within the bounds of struct j1939_sk_buff_cb, preventing out-of-bounds memory access. Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb is greater than or equal to the size of struct j1939_sk_buff_cb. This ensures that the skb->cb buffer is large enough to hold the j1939_sk_buff_cb structure. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Reported-by: Shuangpeng Bai <[email protected]> Tested-by: Shuangpeng Bai <[email protected]> Signed-off-by: Oleksij Rempel <[email protected]> Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ Link: https://lore.kernel.org/all/[email protected] Cc: [email protected] [mkl: rephrase commit message] Signed-off-by: Marc Kleine-Budde <[email protected]>
2023-04-05perf/core: Fix the same task check in perf_event_set_outputKan Liang1-1/+1
The same task check in perf_event_set_output has some potential issues for some usages. For the current perf code, there is a problem if using of perf_event_open() to have multiple samples getting into the same mmap’d memory when they are both attached to the same process. https://lore.kernel.org/all/[email protected]/ Because the event->ctx is not ready when the perf_event_set_output() is invoked in the perf_event_open(). Besides the above issue, before the commit bd2756811766 ("perf: Rewrite core context handling"), perf record can errors out when sampling with a hardware event and a software event as below. $ perf record -e cycles,dummy --per-thread ls failed to mmap with 22 (Invalid argument) That's because that prior to the commit a hardware event and a software event are from different task context. The problem should be a long time issue since commit c3f00c70276d ("perk: Separate find_get_context() from event initialization"). The task struct is stored in the event->hw.target for each per-thread event. It is a more reliable way to determine whether two events are attached to the same task. The event->hw.target was also introduced several years ago by the commit 50f16a8bf9d7 ("perf: Remove type specific target pointers"). It can not only be used to fix the issue with the current code, but also back port to fix the issues with an older kernel. Note: The event->hw.target was introduced later than commit c3f00c70276d. The patch may cannot be applied between the commit c3f00c70276d and commit 50f16a8bf9d7. Anybody that wants to back-port this at that period may have to find other solutions. Fixes: c3f00c70276d ("perf: Separate find_get_context() from event initialization") Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Zhengjun Xing <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-04-05perf: Optimize perf_pmu_migrate_context()Peter Zijlstra1-5/+7
Thomas reported that offlining CPUs spends a lot of time in synchronize_rcu() as called from perf_pmu_migrate_context() even though he's not actually using uncore events. Turns out, the thing is unconditionally waiting for RCU, even if there's no actual events to migrate. Fixes: 0cda4c023132 ("perf: Introduce perf_pmu_migrate_context()") Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-04-05accel/ivpu: Fix S3 system suspend when not idleJacek Lawrynowicz1-15/+11
Wait for VPU to be idle in ivpu_pm_suspend_cb() before powering off the device, so jobs are not lost and TDRs are not triggered after resume. Fixes: 852be13f3bd3 ("accel/ivpu: Add PM support") Signed-off-by: Stanislaw Gruszka <[email protected]> Reviewed-by: Jeffrey Hugo <[email protected]> Signed-off-by: Jacek Lawrynowicz <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-04-05accel/ivpu: Add dma fence to command buffers onlyKarol Wachowski1-11/+7
Currently job->done_fence is added to every BO handle within a job. If job handle (command buffer) is shared between multiple submits, KMD will add the fence in each of them. Then bo_wait_ioctl() executed on command buffer will exit only when all jobs containing that handle are done. This creates deadlock scenario for user mode driver in case when job handle is added as dependency of another job, because bo_wait_ioctl() of first job will wait until second job finishes, and second job can not finish before first one. Having fences added only to job buffer handle allows user space to execute bo_wait_ioctl() on the job even if it's handle is submitted with other job. Fixes: cd7272215c44 ("accel/ivpu: Add command buffer submission logic") Signed-off-by: Karol Wachowski <[email protected]> Signed-off-by: Stanislaw Gruszka <[email protected]> Reviewed-by: Jeffrey Hugo <[email protected]> Signed-off-by: Jacek Lawrynowicz <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-04-04tracing: Fix ftrace_boot_snapshot command line logicSteven Rostedt (Google)1-6/+7
The kernel command line ftrace_boot_snapshot by itself is supposed to trigger a snapshot at the end of boot up of the main top level trace buffer. A ftrace_boot_snapshot=foo will do the same for an instance called foo that was created by trace_instance=foo,... The logic was broken where if ftrace_boot_snapshot was by itself, it would trigger a snapshot for all instances that had tracing enabled, regardless if it asked for a snapshot or not. When a snapshot is requested for a buffer, the buffer's tr->allocated_snapshot is set to true. Use that to know if a trace buffer wants a snapshot at boot up or not. Since the top level buffer is part of the ftrace_trace_arrays list, there's no reason to treat it differently than the other buffers. Just iterate the list if ftrace_boot_snapshot was specified. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ross Zwisler <[email protected]> Fixes: 9c1c251d670bc ("tracing: Allow boot instances to have snapshot buffers") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-04-04tracing: Have tracing_snapshot_instance_cond() write errors to the ↵Steven Rostedt (Google)1-7/+7
appropriate instance If a trace instance has a failure with its snapshot code, the error message is to be written to that instance's buffer. But currently, the message is written to the top level buffer. Worse yet, it may also disable the top level buffer and not the instance that had the issue. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ross Zwisler <[email protected]> Fixes: 2824f50332486 ("tracing: Make the snapshot trigger work with instances") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-04-04gve: Secure enough bytes in the first TX desc for all TCP pktsShailend Chand2-7/+7
Non-GSO TCP packets whose SKBs' linear portion did not include the entire TCP header were not populating the first Tx descriptor with as many bytes as the vNIC expected. This change ensures that all TCP packets populate the first descriptor with the correct number of bytes. Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC") Signed-off-by: Shailend Chand <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-04netlink: annotate lockless accesses to nlk->max_recvmsg_lenEric Dumazet1-6/+9
syzbot reported a data-race in data-race in netlink_recvmsg() [1] Indeed, netlink_recvmsg() can be run concurrently, and netlink_dump() also needs protection. [1] BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0: netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988 sock_recvmsg_nosec net/socket.c:1017 [inline] sock_recvmsg net/socket.c:1038 [inline] __sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194 __do_sys_recvfrom net/socket.c:2212 [inline] __se_sys_recvfrom net/socket.c:2208 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2208 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1: netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989 sock_recvmsg_nosec net/socket.c:1017 [inline] sock_recvmsg net/socket.c:1038 [inline] ____sys_recvmsg+0x156/0x310 net/socket.c:2720 ___sys_recvmsg net/socket.c:2762 [inline] do_recvmmsg+0x2e5/0x710 net/socket.c:2856 __sys_recvmmsg net/socket.c:2935 [inline] __do_sys_recvmmsg net/socket.c:2958 [inline] __se_sys_recvmmsg net/socket.c:2951 [inline] __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x0000000000000000 -> 0x0000000000001000 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 Fixes: 9063e21fb026 ("netlink: autosize skb lengthes") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-04ethtool: reset #lanes when lanes is omittedAndy Roulin1-3/+4
If the number of lanes was forced and then subsequently the user omits this parameter, the ksettings->lanes is reset. The driver should then reset the number of lanes to the device's default for the specified speed. However, although the ksettings->lanes is set to 0, the mod variable is not set to true to indicate the driver and userspace should be notified of the changes. The consequence is that the same ethtool operation will produce different results based on the initial state. If the initial state is: $ ethtool swp1 | grep -A 3 'Speed: ' Speed: 500000Mb/s Lanes: 2 Duplex: Full Auto-negotiation: on then executing 'ethtool -s swp1 speed 50000 autoneg off' will yield: $ ethtool swp1 | grep -A 3 'Speed: ' Speed: 500000Mb/s Lanes: 2 Duplex: Full Auto-negotiation: off While if the initial state is: $ ethtool swp1 | grep -A 3 'Speed: ' Speed: 500000Mb/s Lanes: 1 Duplex: Full Auto-negotiation: off executing the same 'ethtool -s swp1 speed 50000 autoneg off' results in: $ ethtool swp1 | grep -A 3 'Speed: ' Speed: 500000Mb/s Lanes: 1 Duplex: Full Auto-negotiation: off This patch fixes this behavior. Omitting lanes will always results in the driver choosing the default lane width for the chosen speed. In this scenario, regardless of the initial state, the end state will be, e.g., $ ethtool swp1 | grep -A 3 'Speed: ' Speed: 500000Mb/s Lanes: 2 Duplex: Full Auto-negotiation: off Fixes: 012ce4dd3102 ("ethtool: Extend link modes settings uAPI with lanes") Signed-off-by: Andy Roulin <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-04Merge branch 'raw-ping-fix-locking-in-proc-net-raw-icmp'Jakub Kicinski5-35/+33
Kuniyuki Iwashima says: ==================== raw/ping: Fix locking in /proc/net/{raw,icmp}. The first patch fixes a NULL deref for /proc/net/raw and second one fixes the same issue for ping sockets. The first patch also converts hlist_nulls to hlist, but this is because the current code uses sk_nulls_for_each() for lockless readers, instead of sk_nulls_for_each_rcu() which adds memory barrier, but raw sockets does not use the nulls marker nor SLAB_TYPESAFE_BY_RCU in the first place. OTOH, the ping sockets already uses sk_nulls_for_each_rcu(), and such conversion can be posted later for net-next. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-04ping: Fix potentail NULL deref for /proc/net/icmp.Kuniyuki Iwashima1-4/+4
After commit dbca1596bbb0 ("ping: convert to RCU lookups, get rid of rwlock"), we use RCU for ping sockets, but we should use spinlock for /proc/net/icmp to avoid a potential NULL deref mentioned in the previous patch. Let's go back to using spinlock there. Note we can convert ping sockets to use hlist instead of hlist_nulls because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets. Fixes: dbca1596bbb0 ("ping: convert to RCU lookups, get rid of rwlock") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-04raw: Fix NULL deref in raw_get_next().Kuniyuki Iwashima4-31/+29
Dae R. Jeong reported a NULL deref in raw_get_next() [0]. It seems that the repro was running these sequences in parallel so that one thread was iterating on a socket that was being freed in another netns. unshare(0x40060200) r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00') socket$inet_icmp_raw(0x2, 0x3, 0x1) pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f) After commit 0daf07e52709 ("raw: convert raw sockets to RCU"), we use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW sockets. However, we should use spinlock for slow paths to avoid the NULL deref. Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object is not reused during iteration in the grace period. In fact, the lockless readers do not check the nulls marker with get_nulls_value(). So, SOCK_RAW should use hlist instead of hlist_nulls. Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(), let's convert hlist_nulls to hlist and use sk_for_each_rcu() for fast paths and sk_for_each() and spinlock for /proc/net/raw. [0]: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline] RIP: 0010:sock_net include/net/sock.h:649 [inline] RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline] RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline] RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995 Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206 RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000 RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338 RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9 R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78 R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030 FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225 seq_read+0x224/0x320 fs/seq_file.c:162 pde_read fs/proc/inode.c:316 [inline] proc_reg_read+0x23f/0x330 fs/proc/inode.c:328 vfs_read+0x31e/0xd30 fs/read_write.c:468 ksys_pread64 fs/read_write.c:665 [inline] __do_sys_pread64 fs/read_write.c:675 [inline] __se_sys_pread64 fs/read_write.c:672 [inline] __x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x478d29 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011 RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29 RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000 R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740 R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline] RIP: 0010:sock_net include/net/sock.h:649 [inline] RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline] RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline] RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995 Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206 RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000 RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338 RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9 R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78 R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030 FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 0daf07e52709 ("raw: convert raw sockets to RCU") Reported-by: syzbot <[email protected]> Reported-by: Dae R. Jeong <[email protected]> Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-04Merge branch 'for-6.3/cxl-doe-fixes' into for-6.3/cxlDan Williams483-4267/+4547
Pick up the fixes (first 6 patches) from the DOE rework series from Lukas for v6.3-rc. Link: https://lore.kernel.org/all/[email protected]/
2023-04-04cxl/hdm: Extend DVSEC range register emulation for region enumerationDan Williams1-5/+22
One motivation for mapping range registers to decoder objects is to use those settings for region autodiscovery. The need to map a region for devices programmed to use range registers is especially urgent now that the kernel no longer routes "Soft Reserved" ranges in the memory map to device-dax by default. The CXL memory range loses all access mechanisms. Complete the implementation by marking the DPA reservation and setting the endpoint-decoder state to signal autodiscovery. Note that the default settings of ways=1 and granularity=4096 set in cxl_decode_init() do not need to be updated. Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions") Tested-by: Dave Jiang <[email protected]> Tested-by: Gregory Price <[email protected]> Link: https://lore.kernel.org/r/168012575521.221280.14177293493678527326.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Dave Jiang <[email protected]> Signed-off-by: Dan Williams <[email protected]>