aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-06-12net: phylink: report correct max speed for QUSGMIIMaxime Chevallier1-1/+1
Q-USGMII is the quad port version of USGMII, and supports a max speed of 1Gbps on each line. Make so that phylink_interface_max_speed() reports this information correctly. Fixes: ae0e4bb2a0e0 ("net: phylink: Adjust link settings based on rate matching") Signed-off-by: Maxime Chevallier <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-13btrfs: do not ASSERT() on duplicated global rootsQu Wenruo1-2/+8
[BUG] Syzbot reports a reproducible ASSERT() when using rescue=usebackuproot mount option on a corrupted fs. The full report can be found here: https://syzkaller.appspot.com/bug?extid=c4614eae20a166c25bf0 BTRFS error (device loop0: state C): failed to load root csum assertion failed: !tmp, in fs/btrfs/disk-io.c:1103 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3664! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 3608 Comm: syz-executor356 Not tainted 6.0.0-rc7-syzkaller-00029-g3800a713b607 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 RIP: 0010:assertfail+0x1a/0x1c fs/btrfs/ctree.h:3663 RSP: 0018:ffffc90003aaf250 EFLAGS: 00010246 RAX: 0000000000000032 RBX: 0000000000000000 RCX: f21c13f886638400 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffff888021c640a0 R08: ffffffff816bd38d R09: ffffed10173667f1 R10: ffffed10173667f1 R11: 1ffff110173667f0 R12: dffffc0000000000 R13: ffff8880229c21f7 R14: ffff888021c64060 R15: ffff8880226c0000 FS: 0000555556a73300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a2637d7a00 CR3: 00000000709c4000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> btrfs_global_root_insert+0x1a7/0x1b0 fs/btrfs/disk-io.c:1103 load_global_roots_objectid+0x482/0x8c0 fs/btrfs/disk-io.c:2467 load_global_roots fs/btrfs/disk-io.c:2501 [inline] btrfs_read_roots fs/btrfs/disk-io.c:2528 [inline] init_tree_roots+0xccb/0x203c fs/btrfs/disk-io.c:2939 open_ctree+0x1e53/0x33df fs/btrfs/disk-io.c:3574 btrfs_fill_super+0x1c6/0x2d0 fs/btrfs/super.c:1456 btrfs_mount_root+0x885/0x9a0 fs/btrfs/super.c:1824 legacy_get_tree+0xea/0x180 fs/fs_context.c:610 vfs_get_tree+0x88/0x270 fs/super.c:1530 fc_mount fs/namespace.c:1043 [inline] vfs_kern_mount+0xc9/0x160 fs/namespace.c:1073 btrfs_mount+0x3d3/0xbb0 fs/btrfs/super.c:1884 [CAUSE] Since the introduction of global roots, we handle csum/extent/free-space-tree roots as global roots, even if no extent-tree-v2 feature is enabled. So for regular csum/extent/fst roots, we load them into fs_info::global_root_tree rb tree. And we should not expect any conflicts in that rb tree, thus we have an ASSERT() inside btrfs_global_root_insert(). But rescue=usebackuproot can break the assumption, as we will try to load those trees again and again as long as we have bad roots and have backup roots slot remaining. So in that case we can have conflicting roots in the rb tree, and triggering the ASSERT() crash. [FIX] We can safely remove that ASSERT(), as the caller will properly put the offending root. To make further debugging easier, also add two explicit error messages: - Error message for conflicting global roots - Error message when using backup roots slot Reported-by: [email protected] Fixes: abed4aaae4f7 ("btrfs: track the csum, extent, and free space trees in a rb tree") CC: [email protected] # 6.1+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-06-12Merge tag 'mm-hotfixes-stable-2023-06-12-12-22' of ↵Linus Torvalds21-29/+161
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "19 hotfixes. 14 are cc:stable and the remainder address issues which were introduced during this development cycle or which were considered inappropriate for a backport" * tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: zswap: do not shrink if cgroup may not zswap page cache: fix page_cache_next/prev_miss off by one ocfs2: check new file size on fallocate call mailmap: add entry for John Keeping mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp() epoll: ep_autoremove_wake_function should use list_del_init_careful mm/gup_test: fix ioctl fail for compat task nilfs2: reject devices with insufficient block count ocfs2: fix use-after-free when unmounting read-only filesystem lib/test_vmalloc.c: avoid garbage in page array nilfs2: fix possible out-of-bounds segment allocation in resize ioctl riscv/purgatory: remove PGO flags powerpc/purgatory: remove PGO flags x86/purgatory: remove PGO flags kexec: support purgatories with .text.hot sections mm/uffd: allow vma to merge as much as possible mm/uffd: fix vma operation where start addr cuts part of vma radix-tree: move declarations to header nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
2023-06-13btrfs: can_nocow_file_extent should pass down args->strict from callersChris Mason1-1/+1
Commit 619104ba453ad0 ("btrfs: move common NOCOW checks against a file extent into a helper") changed our call to btrfs_cross_ref_exist() to always pass false for the 'strict' parameter. We're passing this down through the stack so that we can do a full check for cross references during swapfile activation. With strict always false, this test fails: btrfs subvol create swappy chattr +C swappy fallocate -l1G swappy/swapfile chmod 600 swappy/swapfile mkswap swappy/swapfile btrfs subvol snap swappy swapsnap btrfs subvol del -C swapsnap btrfs fi sync / sync;sync;sync swapon swappy/swapfile The fix is to just use args->strict, and everyone except swapfile activation is passing false. Fixes: 619104ba453ad0 ("btrfs: move common NOCOW checks against a file extent into a helper") CC: [email protected] # 6.1+ Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Chris Mason <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-06-13btrfs: fix iomap_begin length for nocow writesChristoph Hellwig1-6/+12
can_nocow_extent can reduce the len passed in, which needs to be propagated to btrfs_dio_iomap_begin so that iomap does not submit more data then is mapped. This problems exists since the btrfs_get_blocks_direct helper was added in commit c5794e51784a ("btrfs: Factor out write portion of btrfs_get_blocks_direct"), but the ordered_extent splitting added in commit b73a6fd1b1ef ("btrfs: split partial dio bios before submit") added a WARN_ON that made a syzkaller test fail. Reported-by: [email protected] Fixes: c5794e51784a ("btrfs: Factor out write portion of btrfs_get_blocks_direct") CC: [email protected] # 6.1+ Tested-by: [email protected] Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-06-12igb: fix nvm.ops.read() error handlingAleksandr Loktionov1-0/+3
Add error handling into igb_set_eeprom() function, in case nvm.ops.read() fails just quit with error code asap. Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver") Signed-off-by: Aleksandr Loktionov <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-06-12igc: Fix possible system crash when loading moduleVinicius Costa Gomes1-0/+3
Guarantee that when probe() is run again, PTM and PCI busmaster will be in the same state as it was if the driver was never loaded. Avoid an i225/i226 hardware issue that PTM requests can be made even though PCI bus mastering is not enabled. These unexpected PTM requests can crash some systems. So, "force" disable PTM and busmastering before removing the driver, so they can be re-enabled in the right order during probe(). This is more like a workaround and should be applicable for i225 and i226, in any platform. Fixes: 1b5d73fb8624 ("igc: Enable PCIe PTM") Signed-off-by: Vinicius Costa Gomes <[email protected]> Reviewed-by: Muhammad Husaini Zulkifli <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-06-12igc: Clean the TX buffer and TX descriptor ringMuhammad Husaini Zulkifli1-1/+8
There could be a race condition during link down where interrupt being generated and igc_clean_tx_irq() been called to perform the TX completion. Properly clear the TX buffer/descriptor ring and disable the TX Queue ring in igc_free_tx_resources() to avoid that. Kernel trace: [ 108.237177] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLIFUI1.R00.4204.A00.2105270302 05/27/2021 [ 108.237178] RIP: 0010:refcount_warn_saturate+0x55/0x110 [ 108.242143] RSP: 0018:ffff9e7980003db0 EFLAGS: 00010286 [ 108.245555] Code: 84 bc 00 00 00 c3 cc cc cc cc 85 f6 74 46 80 3d 20 8c 4d 01 00 75 ee 48 c7 c7 88 f4 03 ab c6 05 10 8c 4d 01 01 e8 0b 10 96 ff <0f> 0b c3 cc cc cc cc 80 3d fc 8b 4d 01 00 75 cb 48 c7 c7 b0 f4 03 [ 108.250434] [ 108.250434] RSP: 0018:ffff9e798125f910 EFLAGS: 00010286 [ 108.254358] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 108.259325] [ 108.259325] RAX: 0000000000000000 RBX: ffff8ddb935b8000 RCX: 0000000000000027 [ 108.261868] RDX: ffff8de250a28800 RSI: ffff8de250a1c580 RDI: ffff8de250a1c580 [ 108.265538] RDX: 0000000000000027 RSI: 0000000000000002 RDI: ffff8de250a9c588 [ 108.265539] RBP: ffff8ddb935b8000 R08: ffffffffab2655a0 R09: ffff9e798125f898 [ 108.267914] RBP: ffff8ddb8a5b8d80 R08: 0000005648eba354 R09: 0000000000000000 [ 108.270196] R10: 0000000000000001 R11: 000000002d2d2d2d R12: ffff9e798125f948 [ 108.270197] R13: ffff9e798125fa1c R14: ffff8ddb8a5b8d80 R15: 7fffffffffffffff [ 108.273001] R10: 000000002d2d2d2d R11: 000000002d2d2d2d R12: ffff8ddb8a5b8ed4 [ 108.276410] FS: 00007f605851b740(0000) GS:ffff8de250a80000(0000) knlGS:0000000000000000 [ 108.280597] R13: 00000000000002ac R14: 00000000ffffff99 R15: ffff8ddb92561b80 [ 108.282966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 108.282967] CR2: 00007f053c039248 CR3: 0000000185850003 CR4: 0000000000f70ee0 [ 108.286206] FS: 0000000000000000(0000) GS:ffff8de250a00000(0000) knlGS:0000000000000000 [ 108.289701] PKRU: 55555554 [ 108.289702] Call Trace: [ 108.289704] <TASK> [ 108.293977] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 108.297562] sock_alloc_send_pskb+0x20c/0x240 [ 108.301494] CR2: 00007f053c03a168 CR3: 0000000184394002 CR4: 0000000000f70ef0 [ 108.301495] PKRU: 55555554 [ 108.306464] __ip_append_data.isra.0+0x96f/0x1040 [ 108.309441] Call Trace: [ 108.309443] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 108.314927] <IRQ> [ 108.314928] sock_wfree+0x1c7/0x1d0 [ 108.318078] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 108.320276] skb_release_head_state+0x32/0x90 [ 108.324812] ip_make_skb+0xf6/0x130 [ 108.327188] skb_release_all+0x16/0x40 [ 108.330775] ? udp_sendmsg+0x9f3/0xcb0 [ 108.332626] napi_consume_skb+0x48/0xf0 [ 108.334134] ? xfrm_lookup_route+0x23/0xb0 [ 108.344285] igc_poll+0x787/0x1620 [igc] [ 108.346659] udp_sendmsg+0x9f3/0xcb0 [ 108.360010] ? ttwu_do_activate+0x40/0x220 [ 108.365237] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 108.366744] ? try_to_wake_up+0x289/0x5e0 [ 108.376987] ? sock_sendmsg+0x81/0x90 [ 108.395698] ? __pfx_process_timeout+0x10/0x10 [ 108.395701] sock_sendmsg+0x81/0x90 [ 108.409052] __napi_poll+0x29/0x1c0 [ 108.414279] ____sys_sendmsg+0x284/0x310 [ 108.419507] net_rx_action+0x257/0x2d0 [ 108.438216] ___sys_sendmsg+0x7c/0xc0 [ 108.439723] __do_softirq+0xc1/0x2a8 [ 108.444950] ? finish_task_switch+0xb4/0x2f0 [ 108.452077] irq_exit_rcu+0xa9/0xd0 [ 108.453584] ? __schedule+0x372/0xd00 [ 108.460713] common_interrupt+0x84/0xa0 [ 108.467840] ? clockevents_program_event+0x95/0x100 [ 108.474968] </IRQ> [ 108.482096] ? do_nanosleep+0x88/0x130 [ 108.489224] <TASK> [ 108.489225] asm_common_interrupt+0x26/0x40 [ 108.496353] ? __rseq_handle_notify_resume+0xa9/0x4f0 [ 108.503478] RIP: 0010:cpu_idle_poll+0x2c/0x100 [ 108.510607] __sys_sendmsg+0x5d/0xb0 [ 108.518687] Code: 05 e1 d9 c8 00 65 8b 15 de 64 85 55 85 c0 7f 57 e8 b9 ef ff ff fb 65 48 8b 1c 25 00 cc 02 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 77 63 cd 00 85 c0 75 ed e8 ce ec ff ff [ 108.525817] do_syscall_64+0x44/0xa0 [ 108.531563] RSP: 0018:ffffffffab203e70 EFLAGS: 00000202 [ 108.538693] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 108.546775] [ 108.546777] RIP: 0033:0x7f605862b7f7 [ 108.549495] RAX: 0000000000000001 RBX: ffffffffab20c940 RCX: 000000000000003b [ 108.551955] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 108.554068] RDX: 4000000000000000 RSI: 000000002da97f6a RDI: 00000000002b8ff4 [ 108.559816] RSP: 002b:00007ffc99264058 EFLAGS: 00000246 [ 108.564178] RBP: 0000000000000000 R08: 00000000002b8ff4 R09: ffff8ddb01554c80 [ 108.571302] ORIG_RAX: 000000000000002e [ 108.571303] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f605862b7f7 [ 108.574023] R10: 000000000000015b R11: 000000000000000f R12: ffffffffab20c940 [ 108.574024] R13: 0000000000000000 R14: ffff8de26fbeef40 R15: ffffffffab20c940 [ 108.578727] RDX: 0000000000000000 RSI: 00007ffc992640a0 RDI: 0000000000000003 [ 108.578728] RBP: 00007ffc99264110 R08: 0000000000000000 R09: 175f48ad1c3a9c00 [ 108.581187] do_idle+0x62/0x230 [ 108.585890] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc992642d8 [ 108.585891] R13: 00005577814ab2ba R14: 00005577814addf0 R15: 00007f605876d000 [ 108.587920] cpu_startup_entry+0x1d/0x20 [ 108.591422] </TASK> [ 108.596127] rest_init+0xc5/0xd0 [ 108.600490] ---[ end trace 0000000000000000 ]--- Test Setup: DUT: - Change mac address on DUT Side. Ensure NIC not having same MAC Address - Running udp_tai on DUT side. Let udp_tai running throughout the test Example: ./udp_tai -i enp170s0 -P 100000 -p 90 -c 1 -t 0 -u 30004 Host: - Perform link up/down every 5 second. Result: Kernel panic will happen on DUT Side. Fixes: 13b5b7fd6a4a ("igc: Add support for Tx/Rx rings") Signed-off-by: Muhammad Husaini Zulkifli <[email protected]> Tested-by: Naama Meir <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-06-12clk: mediatek: mt8365: Fix index issueAlexandre Mergnat1-0/+12
Before the patch [1], the clock probe was done directly in the clk-mt8365 driver. In this probe function, the array which stores the data clocks is sized using the higher defined numbers (*_NR_CLOCK) in the clock lists [2]. Currently, with the patch [1], the specific clk-mt8365 probe function is replaced by the mtk generic one [3], which size the clock data array by adding all the clock descriptor array size provided by the clk-mt8365 driver. Actually, all clock indexes come from the header file [2], that mean, if there are more clock (then more index) in the header file [2] than the number of clock declared in the clock descriptor arrays (which is the case currently), the clock data array will be undersized and then the generic probe function will overflow when it will try to write in "clk_data[CLK_INDEX]". Actually, instead of crashing at boot, the probe function returns an error in the log which looks like: "of_clk_hw_onecell_get: invalid index 135", then this clock isn't enabled. Solve this issue by adding in the driver the missing clocks declared in the header clock file [2]. [1]: Commit ffe91cb28f6a ("clk: mediatek: mt8365: Convert to mtk_clk_simple_{probe,remove}()") [2]: include/dt-bindings/clock/mediatek,mt8365-clk.h [3]: drivers/clk/mediatek/clk-mtk.c Fixes: ffe91cb28f6a ("clk: mediatek: mt8365: Convert to mtk_clk_simple_{probe,remove}()") Signed-off-by: Alexandre Mergnat <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Markus Schneider-Pargmann <[email protected]> Reviewed-by: Markus Schneider-Pargmann <[email protected]> Signed-off-by: Stephen Boyd <[email protected]>
2023-06-12zswap: do not shrink if cgroup may not zswapNhat Pham1-2/+9
Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy. If this limit is reached or exceeded, then zswap shrinking is triggered and short-circuits the store attempt. However, since the zswap's LRU is not memcg-aware, this can create the following pathological behavior: the cgroup whose zswap limit is 0 will evict pages from other cgroups continually, without lowering its own zswap usage. This means the shrinking will continue until the need for swap ceases or the pool becomes empty. As a result of this, we observe a disproportionate amount of zswap writeback and a perpetually small zswap pool in our experiments, even though the pool limit is never hit. More generally, a cgroup might unnecessarily evict pages from other cgroups before we drive the memcg back below its limit. This patch fixes the issue by rejecting zswap store attempt without shrinking the pool when obj_cgroup_may_zswap() returns false. [[email protected]: fix return of unintialized value] [[email protected]: s/ENOSPC/ENOMEM/] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: f4840ccfca25 ("zswap: memcg accounting") Signed-off-by: Nhat Pham <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Domenico Cerasuolo <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12page cache: fix page_cache_next/prev_miss off by oneMike Kravetz1-10/+16
Ackerley Tng reported an issue with hugetlbfs fallocate here[1]. The issue showed up after the conversion of hugetlb page cache lookup code to use page_cache_next_miss. Code in hugetlb fallocate, userfaultfd and GUP is now using page_cache_next_miss to determine if a page is present the page cache. The following statement is used. present = page_cache_next_miss(mapping, index, 1) != index; There are two issues with page_cache_next_miss when used in this way. 1) If the passed value for index is equal to the 'wrap-around' value, the same index will always be returned. This wrap-around value is 0, so 0 will be returned even if page is present at index 0. 2) If there is no gap in the range passed, the last index in the range will be returned. When passed a range of 1 as above, the passed index value will be returned even if the page is present. The end result is the statement above will NEVER indicate a page is present in the cache, even if it is. As noted by Ackerley in [1], users can see this by hugetlb fallocate incorrectly returning EEXIST if pages are already present in the file. In addition, hugetlb pages will not be included in core dumps if they need to be brought in via GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages already present in the cache. It may try to allocate a new page and potentially return ENOMEM as opposed to EEXIST. Both page_cache_next_miss and page_cache_prev_miss have similar issues. Fix by: - Check for index equal to 'wrap-around' value and do not exit early. - If no gap is found in range, return index outside range. - Update function description to say 'wrap-around' value could be returned if passed as index. [1] https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()") Signed-off-by: Mike Kravetz <[email protected]> Reported-by: Ackerley Tng <[email protected]> Reviewed-by: Ackerley Tng <[email protected]> Tested-by: Ackerley Tng <[email protected]> Cc: Erdem Aktas <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Vishal Annapurve <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12ocfs2: check new file size on fallocate callLuís Henriques1-1/+7
When changing a file size with fallocate() the new size isn't being checked. In particular, the FSIZE ulimit isn't being checked, which makes fstest generic/228 fail. Simply adding a call to inode_newsize_ok() fixes this issue. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luís Henriques <[email protected]> Reviewed-by: Mark Fasheh <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12mailmap: add entry for John KeepingJohn Keeping1-0/+1
Map my corporate address to my personal one, as I am leaving the company. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Keeping <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()Kefeng Wang1-0/+2
If 'aggr_interval' is smaller than 'sample_interval', max_nr_accesses in damon_nr_accesses_to_accesses_bp() becomes zero which leads to divide error, let's validate the values of them in damon_set_attrs() to fix it, which similar to others attrs check. Link: https://lkml.kernel.org/r/[email protected] Fixes: 2f5bef5a590b ("mm/damon/core: update monitoring results for new monitoring attributes") Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=841a46899768ec7bec67 Link: https://lore.kernel.org/damon/[email protected]/ Reviewed-by: SeongJae Park <[email protected]> Signed-off-by: Kefeng Wang <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12epoll: ep_autoremove_wake_function should use list_del_init_carefulBenjamin Segall1-1/+5
autoremove_wake_function uses list_del_init_careful, so should epoll's more aggressive variant. It only doesn't because it was copied from an older wait.c rather than the most recent. [[email protected]: add comment] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: a16ceb139610 ("epoll: autoremove wakers even more aggressively") Signed-off-by: Ben Segall <[email protected]> Cc: Al Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12mm/gup_test: fix ioctl fail for compat taskHaibo Li1-0/+1
When tools/testing/selftests/mm/gup_test.c is compiled as 32bit, then run on arm64 kernel, it reports "ioctl: Inappropriate ioctl for device". Fix it by filling compat_ioctl in gup_test_fops Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Haibo Li <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: AngeloGioacchino Del Regno <[email protected]> Cc: Matthias Brugger <[email protected]> Cc: John Hubbard <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12nilfs2: reject devices with insufficient block countRyusuke Konishi1-1/+42
The current sanity check for nilfs2 geometry information lacks checks for the number of segments stored in superblocks, so even for device images that have been destructively truncated or have an unusually high number of segments, the mount operation may succeed. This causes out-of-bounds block I/O on file system block reads or log writes to the segments, the latter in particular causing "a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to hang. Fix this issue by checking the number of segments stored in the superblock and avoiding mounting devices that can cause out-of-bounds accesses. To eliminate the possibility of overflow when calculating the number of blocks required for the device from the number of segments, this also adds a helper function to calculate the upper bound on the number of segments and inserts a check using it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ryusuke Konishi <[email protected]> Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?extid=7d50f1e54a12ba3aeae2 Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12ocfs2: fix use-after-free when unmounting read-only filesystemLuís Henriques1-2/+4
It's trivial to trigger a use-after-free bug in the ocfs2 quotas code using fstest generic/452. After a read-only remount, quotas are suspended and ocfs2_mem_dqinfo is freed through ->ocfs2_local_free_info(). When unmounting the filesystem, an UAF access to the oinfo will eventually cause a crash. BUG: KASAN: slab-use-after-free in timer_delete+0x54/0xc0 Read of size 8 at addr ffff8880389a8208 by task umount/669 ... Call Trace: <TASK> ... timer_delete+0x54/0xc0 try_to_grab_pending+0x31/0x230 __cancel_work_timer+0x6c/0x270 ocfs2_disable_quotas.isra.0+0x3e/0xf0 [ocfs2] ocfs2_dismount_volume+0xdd/0x450 [ocfs2] generic_shutdown_super+0xaa/0x280 kill_block_super+0x46/0x70 deactivate_locked_super+0x4d/0xb0 cleanup_mnt+0x135/0x1f0 ... </TASK> Allocated by task 632: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x8b/0x90 ocfs2_local_read_info+0xe3/0x9a0 [ocfs2] dquot_load_quota_sb+0x34b/0x680 dquot_load_quota_inode+0xfe/0x1a0 ocfs2_enable_quotas+0x190/0x2f0 [ocfs2] ocfs2_fill_super+0x14ef/0x2120 [ocfs2] mount_bdev+0x1be/0x200 legacy_get_tree+0x6c/0xb0 vfs_get_tree+0x3e/0x110 path_mount+0xa90/0xe10 __x64_sys_mount+0x16f/0x1a0 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 650: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x50 __kasan_slab_free+0xf9/0x150 __kmem_cache_free+0x89/0x180 ocfs2_local_free_info+0x2ba/0x3f0 [ocfs2] dquot_disable+0x35f/0xa70 ocfs2_susp_quotas.isra.0+0x159/0x1a0 [ocfs2] ocfs2_remount+0x150/0x580 [ocfs2] reconfigure_super+0x1a5/0x3a0 path_mount+0xc8a/0xe10 __x64_sys_mount+0x16f/0x1a0 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luís Henriques <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Tested-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12lib/test_vmalloc.c: avoid garbage in page arrayLorenzo Stoakes1-1/+1
It turns out that alloc_pages_bulk_array() does not treat the page_array parameter as an output parameter, but rather reads the array and skips any entries that have already been allocated. This is somewhat unexpected and breaks this test, as we allocate the pages array uninitialised on the assumption it will be overwritten. As a result, the test was referencing uninitialised data and causing the PFN to not be valid and thus a WARN_ON() followed by a null pointer deref and panic. In addition, this is an array of pointers not of struct page objects, so we need only allocate an array with elements of pointer size. We solve both problems by simply using kcalloc() and referencing sizeof(struct page *) rather than sizeof(struct page). Link: https://lkml.kernel.org/r/[email protected] Fixes: 869cb29a61a1 ("lib/test_vmalloc.c: add vm_map_ram()/vm_unmap_ram() test case") Signed-off-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Baoquan He <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12nilfs2: fix possible out-of-bounds segment allocation in resize ioctlRyusuke Konishi1-0/+9
Syzbot reports that in its stress test for resize ioctl, the log writing function nilfs_segctor_do_construct hits a WARN_ON in nilfs_segctor_truncate_segments(). It turned out that there is a problem with the current implementation of the resize ioctl, which changes the writable range on the device (the range of allocatable segments) at the end of the resize process. This order is necessary for file system expansion to avoid corrupting the superblock at trailing edge. However, in the case of a file system shrink, if log writes occur after truncating out-of-bounds trailing segments and before the resize is complete, segments may be allocated from the truncated space. The userspace resize tool was fine as it limits the range of allocatable segments before performing the resize, but it can run into this issue if the resize ioctl is called alone. Fix this issue by changing nilfs_sufile_resize() to update the range of allocatable segments immediately after successful truncation of segment space in case of file system shrink. Link: https://lkml.kernel.org/r/[email protected] Fixes: 4e33f9eab07e ("nilfs2: implement resize ioctl") Signed-off-by: Ryusuke Konishi <[email protected]> Reported-by: [email protected] Closes: https://lkml.kernel.org/r/[email protected] Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12riscv/purgatory: remove PGO flagsRicardo Ribalda1-0/+5
If profile-guided optimization is enabled, the purgatory ends up with multiple .text sections. This is not supported by kexec and crashes the system. Link: https://lkml.kernel.org/r/[email protected] Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <[email protected]> Acked-by: Palmer Dabbelt <[email protected]> Cc: <[email protected]> Cc: Albert Ou <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Simon Horman <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Rix <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12powerpc/purgatory: remove PGO flagsRicardo Ribalda1-0/+5
If profile-guided optimization is enabled, the purgatory ends up with multiple .text sections. This is not supported by kexec and crashes the system. Link: https://lkml.kernel.org/r/[email protected] Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: <[email protected]> Cc: Albert Ou <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Simon Horman <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Rix <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12x86/purgatory: remove PGO flagsRicardo Ribalda1-0/+5
If profile-guided optimization is enabled, the purgatory ends up with multiple .text sections. This is not supported by kexec and crashes the system. Link: https://lkml.kernel.org/r/[email protected] Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <[email protected]> Cc: <[email protected]> Cc: Albert Ou <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Simon Horman <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Rix <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12kexec: support purgatories with .text.hot sectionsRicardo Ribalda1-1/+13
Patch series "kexec: Fix kexec_file_load for llvm16 with PGO", v7. When upreving llvm I realised that kexec stopped working on my test platform. The reason seems to be that due to PGO there are multiple .text sections on the purgatory, and kexec does not supports that. This patch (of 4): Clang16 links the purgatory text in two sections when PGO is in use: [ 1] .text PROGBITS 0000000000000000 00000040 00000000000011a1 0000000000000000 AX 0 0 16 [ 2] .rela.text RELA 0000000000000000 00003498 0000000000000648 0000000000000018 I 24 1 8 ... [17] .text.hot. PROGBITS 0000000000000000 00003220 000000000000020b 0000000000000000 AX 0 0 1 [18] .rela.text.hot. RELA 0000000000000000 00004428 0000000000000078 0000000000000018 I 24 17 8 And both of them have their range [sh_addr ... sh_addr+sh_size] on the area pointed by `e_entry`. This causes that image->start is calculated twice, once for .text and another time for .text.hot. The second calculation leaves image->start in a random location. Because of this, the system crashes immediately after: kexec_core: Starting new kernel Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Philipp Rudo <[email protected]> Cc: Albert Ou <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Young <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Simon Horman <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Rix <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12mm/uffd: allow vma to merge as much as possiblePeter Xu1-2/+6
We used to not pass in the pgoff correctly when register/unregister uffd regions, it caused incorrect behavior on vma merging and can cause mergeable vmas being separate after ioctls return. For example, when we have: vma1(range 0-9, with uffd), vma2(range 10-19, no uffd) Then someone unregisters uffd on range (5-9), it should logically become: vma1(range 0-4, with uffd), vma2(range 5-19, no uffd) But with current code we'll have: vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd) This patch allows such merge to happen correctly before ioctl returns. This behavior seems to have existed since the 1st day of uffd. Since pgoff for vma_merge() is only used to identify the possibility of vma merging, meanwhile here what we did was always passing in a pgoff smaller than what we should, so there should have no other side effect besides not merging it. Let's still tentatively copy stable for this, even though I don't see anything will go wrong besides vma being split (which is mostly not user visible). Link: https://lkml.kernel.org/r/[email protected] Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Peter Xu <[email protected]> Reported-by: Lorenzo Stoakes <[email protected]> Acked-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12mm/uffd: fix vma operation where start addr cuts part of vmaPeter Xu1-0/+5
Patch series "mm/uffd: Fix vma merge/split", v2. This series contains two patches that fix vma merge/split for userfaultfd on two separate issues. Patch 1 fixes a regression since 6.1+ due to something we overlooked when converting to maple tree apis. The plan is we use patch 1 to replace the commit "2f628010799e (mm: userfaultfd: avoid passing an invalid range to vma_merge())" in mm-hostfixes-unstable tree if possible, so as to bring uffd vma operations back aligned with the rest code again. Patch 2 fixes a long standing issue that vma can be left unmerged even if we can for either uffd register or unregister. Many thanks to Lorenzo on either noticing this issue from the assert movement patch, looking at this problem, and also provided a reproducer on the unmerged vma issue [1]. [1] https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e This patch (of 2): It seems vma merging with uffd paths is broken with either register/unregister, where right now we can feed wrong parameters to vma_merge() and it's found by recent patch which moved asserts upwards in vma_merge() by Lorenzo Stoakes: https://lore.kernel.org/all/[email protected]/ It's possible that "start" is contained within vma but not clamped to its start. We need to convert this into either "cannot merge" case or "can merge" case 4 which permits subdivision of prev by assigning vma to prev. As we loop, each subsequent VMA will be clamped to the start. This patch will eliminate the report and make sure vma_merge() calls will become legal again. One thing to mention is that the "Fixes: 29417d292bd0" below is there only to help explain where the warning can start to trigger, the real commit to fix should be 69dbe6daf104. Commit 29417d292bd0 helps us to identify the issue, but unfortunately we may want to keep it in Fixes too just to ease kernel backporters for easier tracking. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs") Signed-off-by: Peter Xu <[email protected]> Reported-by: Mark Rutland <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Cc: Lorenzo Stoakes <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12radix-tree: move declarations to headerArnd Bergmann4-6/+15
The xarray.c file contains the only call to radix_tree_node_rcu_free(), and it comes with its own extern declaration for it. This means the function definition causes a missing-prototype warning: lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes] Instead, move the declaration for this function to a new header that can be included by both, and do the same for the radix_tree_node_cachep variable that has the same underlying problem but does not cause a warning with gcc. [[email protected]: fix building radix tree test suite] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Peng Zhang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()Ryusuke Konishi1-2/+10
A syzbot fault injection test reported that nilfs_btnode_create_block, a helper function that allocates a new node block for b-trees, causes a kernel BUG for disk images where the file system block size is smaller than the page size. This was due to unexpected flags on the newly allocated buffer head, and it turned out to be because the buffer flags were not cleared by nilfs_btnode_abort_change_key() after an error occurred during a b-tree update operation and the buffer was later reused in that state. Fix this issue by using nilfs_btnode_delete() to abandon the unused preallocated buffer in nilfs_btnode_abort_change_key(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ryusuke Konishi <[email protected]> Reported-by: [email protected] Closes: https://lkml.kernel.org/r/[email protected] Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-12io_uring/io-wq: don't clear PF_IO_WORKER on exitJens Axboe1-3/+0
A recent commit gated the core dumping task exit logic on current->flags remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time. This exposed a problem with the io-wq handling of that, which explicitly clears PF_IO_WORKER before calling do_exit(). The reasons for this manual clear of PF_IO_WORKER is historical, where io-wq used to potentially trigger a sleep on exit. As the io-wq thread is exiting, it should not participate any further accounting. But these days we don't need to rely on current->flags anymore, so we can safely remove the PF_IO_WORKER clearing. Reported-by: Zorro Lang <[email protected]> Reported-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2023-06-12Merge tag 'for-6.4-rc6-tag' of ↵Linus Torvalds3-11/+30
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A more fixes and regression fixes: - in subpage mode, fix crash when repairing metadata at the end of a stripe - properly enable async discard when remounting from read-only to read-write - scrub regression fixes: - respect read-only scrub when attempting to do a repair - fix reporting of found errors, the stats don't get properly accounted after a stripe repair" * tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: scrub: also report errors hit during the initial read btrfs: scrub: respect the read-only flag during repair btrfs: properly enable async discard when switching from RO->RW btrfs: subpage: fix a crash in metadata repair path
2023-06-12sctp: fix an error code in sctp_sf_eat_auth()Dan Carpenter1-1/+1
The sctp_sf_eat_auth() function is supposed to enum sctp_disposition values and returning a kernel error code will cause issues in the caller. Change -ENOMEM to SCTP_DISPOSITION_NOMEM. Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-12sctp: handle invalid error codes without calling BUG()Dan Carpenter1-1/+4
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition values but if the call to sctp_ulpevent_make_authkey() fails, it returns -ENOMEM. This results in calling BUG() inside the sctp_side_effects() function. Calling BUG() is an over reaction and not helpful. Call WARN_ON_ONCE() instead. This code predates git. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-12ipvlan: fix bound dev checking for IPv6 l3s modeHangbin Liu1-0/+4
The commit 59a0b022aa24 ("ipvlan: Make skb->skb_iif track skb->dev for l3s mode") fixed ipvlan bonded dev checking by updating skb skb_iif. This fix works for IPv4, as in raw_v4_input() the dif is from inet_iif(skb), which is skb->skb_iif when there is no route. But for IPv6, the fix is not enough, because in ipv6_raw_deliver() -> raw_v6_match(), the dif is inet6_iif(skb), which is returns IP6CB(skb)->iif instead of skb->skb_iif if it's not a l3_slave. To fix the IPv6 part issue. Let's set IP6CB(skb)->iif to correct ifindex. BTW, ipvlan handles NS/NA specifically. Since it works fine, I will not reset IP6CB(skb)->iif when addr->atype is IPVL_ICMPV6. Fixes: c675e06a98a4 ("ipvlan: decouple l3s mode dependencies from other modes") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2196710 Signed-off-by: Hangbin Liu <[email protected]> Reviewed-by: Larysa Zaremba <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-12wifi: mac80211: fragment per STA profile correctlyBenjamin Berg3-5/+6
When fragmenting the ML per STA profile, the element ID should be IEEE80211_MLE_SUBELEM_PER_STA_PROFILE rather than WLAN_EID_FRAGMENT. Change the helper function to take the to be used element ID and pass the appropriate value for each of the fragmentation levels. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230611121219.9b5c793d904b.I7dad952bea8e555e2f3139fbd415d0cd2b3a08c3@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-12net: ethtool: correct MAX attribute value for statsJakub Kicinski1-1/+1
When compiling YNL generated code compiler complains about array-initializer-out-of-bounds. Turns out the MAX value for STATS_GRP uses the value for STATS. This may lead to random corruptions in user space (kernel itself doesn't use this value as it never parses stats). Fixes: f09ea6fb1272 ("ethtool: add a new command for reading standard stats") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-11cifs: fix max_credits implementationShyam Prasad N2-4/+30
The current implementation of max_credits on the client does not work because the CreditRequest logic for several commands does not take max_credits into account. Still, we can end up asking the server for more credits, depending on the number of credits in flight. For this, we need to limit the credits while parsing the responses too. Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-06-11cifs: fix sockaddr comparison in iface_cmpShyam Prasad N4-37/+88
iface_cmp used to simply do a memcmp of the two provided struct sockaddrs. The comparison needs to do more based on the address family. Similar logic was already present in cifs_match_ipaddr. Doing something similar now. Signed-off-by: Shyam Prasad N <[email protected]> Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-06-11smb/client: print "Unknown" instead of bogus link speed valueEnzo Matsumiya1-1/+46
The virtio driver for Linux guests will not set a link speed to its paravirtualized NICs. This will be seen as -1 in the ethernet layer, and when some servers (e.g. samba) fetches it, it's converted to an unsigned value (and multiplied by 1000 * 1000), so in client side we end up with: 1) Speed: 4294967295000000 bps in DebugData. This patch introduces a helper that returns a speed string (in Mbps or Gbps) if interface speed is valid (>= SPEED_10 and <= SPEED_800000), or "Unknown" otherwise. The reason to not change the value in iface->speed is because we don't know the real speed of the HW backing the server NIC, so let's keep considering these as the fastest NICs available. Also print "Capabilities: None" when the interface doesn't support any. Signed-off-by: Enzo Matsumiya <[email protected]> Reviewed-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-06-11cifs: print all credit counters in DebugDataShyam Prasad N1-3/+8
Output of /proc/fs/cifs/DebugData shows only the per-connection counter for the number of credits of regular type. i.e. the credits reserved for echo and oplocks are not displayed. There have been situations recently where having this info would have been useful. This change prints the credit counters of all three types: regular, echo, oplocks. Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-06-11cifs: fix status checks in cifs_tree_connectShyam Prasad N2-8/+10
The ordering of status checks at the beginning of cifs_tree_connect is wrong. As a result, a tcon which is good may stay marked as needing reconnect infinitely. Fixes: 2f0e4f034220 ("cifs: check only tcon status on tcon related functions") Cc: [email protected] # 6.3 Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-06-11smb: remove obsolete comment鑫华1-1/+1
Because do_gettimeofday has been removed and replaced by ktime_get_real_ts64, So just remove the comment as it's not needed now. Signed-off-by: 鑫华 <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-06-11blk-cgroup: Flush stats before releasing blkcg_gqMing Lei1-9/+31
As noted by Michal, the blkg_iostat_set's in the lockless list hold reference to blkg's to protect against their removal. Those blkg's hold reference to blkcg. When a cgroup is being destroyed, cgroup_rstat_flush() is only called at css_release_work_fn() which is called when the blkcg reference count reaches 0. This circular dependency will prevent blkcg and some blkgs from being freed after they are made offline. It is less a problem if the cgroup to be destroyed also has other controllers like memory that will call cgroup_rstat_flush() which will clean up the reference count. If block is the only controller that uses rstat, these offline blkcg and blkgs may never be freed leaking more and more memory over time. To prevent this potential memory leak: - flush blkcg per-cpu stats list in __blkg_release(), when no new stat can be added - add global blkg_stat_lock for covering concurrent parent blkg stat update - don't grab bio->bi_blkg reference when adding the stats into blkcg's per-cpu stat list since all stats are guaranteed to be consumed before releasing blkg instance, and grabbing blkg reference for stats was the most fragile part of original patch Based on Waiman's patch: https://lore.kernel.org/linux-block/[email protected]/ Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") Cc: [email protected] Reported-by: Jay Shin <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Waiman Long <[email protected]> Cc: [email protected] Cc: Yosry Ahmed <[email protected]> Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-11Linux 6.4-rc6Linus Torvalds1-1/+1
2023-06-11IB/isert: Fix incorrect release of isert connectionSaravanan Vajravel1-2/+0
The ib_isert module is releasing the isert connection both in isert_wait_conn() handler as well as isert_free_conn() handler. In isert_wait_conn() handler, it is expected to wait for iSCSI session logout operation to complete. It should free the isert connection only in isert_free_conn() handler. When a bunch of iSER target is cleared, this issue can lead to use-after-free memory issue as isert conn is twice released Fixes: b02efbfc9a05 ("iser-target: Fix implicit termination of connections") Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Saravanan Vajravel <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-06-11IB/isert: Fix possible list corruption in CMA handlerSaravanan Vajravel1-0/+4
When ib_isert module receives connection error event, it is releasing the isert session and removes corresponding list node but it doesn't take appropriate mutex lock to remove the list node. This can lead to linked list corruption Fixes: bd3792205aae ("iser-target: Fix pending connections handling in target stack shutdown sequnce") Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Saravanan Vajravel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-06-11IB/isert: Fix dead lock in ib_isertSaravanan Vajravel1-2/+8
- When a iSER session is released, ib_isert module is taking a mutex lock and releasing all pending connections. As part of this, ib_isert is destroying rdma cm_id. To destroy cm_id, rdma_cm module is sending CM events to CMA handler of ib_isert. This handler is taking same mutex lock. Hence it leads to deadlock between ib_isert & rdma_cm modules. - For fix, created local list of pending connections and release the connection outside of mutex lock. Calltrace: --------- [ 1229.791410] INFO: task kworker/10:1:642 blocked for more than 120 seconds. [ 1229.791416] Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1 [ 1229.791418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1229.791419] task:kworker/10:1 state:D stack: 0 pid: 642 ppid: 2 flags:0x80004000 [ 1229.791424] Workqueue: ib_cm cm_work_handler [ib_cm] [ 1229.791436] Call Trace: [ 1229.791438] __schedule+0x2d1/0x830 [ 1229.791445] ? select_idle_sibling+0x23/0x6f0 [ 1229.791449] schedule+0x35/0xa0 [ 1229.791451] schedule_preempt_disabled+0xa/0x10 [ 1229.791453] __mutex_lock.isra.7+0x310/0x420 [ 1229.791456] ? select_task_rq_fair+0x351/0x990 [ 1229.791459] isert_cma_handler+0x224/0x330 [ib_isert] [ 1229.791463] ? ttwu_queue_wakelist+0x159/0x170 [ 1229.791466] cma_cm_event_handler+0x25/0xd0 [rdma_cm] [ 1229.791474] cma_ib_handler+0xa7/0x2e0 [rdma_cm] [ 1229.791478] cm_process_work+0x22/0xf0 [ib_cm] [ 1229.791483] cm_work_handler+0xf4/0xf30 [ib_cm] [ 1229.791487] ? move_linked_works+0x6e/0xa0 [ 1229.791490] process_one_work+0x1a7/0x360 [ 1229.791491] ? create_worker+0x1a0/0x1a0 [ 1229.791493] worker_thread+0x30/0x390 [ 1229.791494] ? create_worker+0x1a0/0x1a0 [ 1229.791495] kthread+0x10a/0x120 [ 1229.791497] ? set_kthread_struct+0x40/0x40 [ 1229.791499] ret_from_fork+0x1f/0x40 [ 1229.791739] INFO: task targetcli:28666 blocked for more than 120 seconds. [ 1229.791740] Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1 [ 1229.791741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1229.791742] task:targetcli state:D stack: 0 pid:28666 ppid: 5510 flags:0x00004080 [ 1229.791743] Call Trace: [ 1229.791744] __schedule+0x2d1/0x830 [ 1229.791746] schedule+0x35/0xa0 [ 1229.791748] schedule_preempt_disabled+0xa/0x10 [ 1229.791749] __mutex_lock.isra.7+0x310/0x420 [ 1229.791751] rdma_destroy_id+0x15/0x20 [rdma_cm] [ 1229.791755] isert_connect_release+0x115/0x130 [ib_isert] [ 1229.791757] isert_free_np+0x87/0x140 [ib_isert] [ 1229.791761] iscsit_del_np+0x74/0x120 [iscsi_target_mod] [ 1229.791776] lio_target_np_driver_store+0xe9/0x140 [iscsi_target_mod] [ 1229.791784] configfs_write_file+0xb2/0x110 [ 1229.791788] vfs_write+0xa5/0x1a0 [ 1229.791792] ksys_write+0x4f/0xb0 [ 1229.791794] do_syscall_64+0x5b/0x1a0 [ 1229.791798] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: bd3792205aae ("iser-target: Fix pending connections handling in target stack shutdown sequnce") Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Saravanan Vajravel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-06-11Merge tag 'x86_urgent_for_v6.4_rc6' of ↵Linus Torvalds1-9/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: - Set up the kernel CS earlier in the boot process in case EFI boots the kernel after bypassing the decompressor and the CS descriptor used ends up being the EFI one which is not mapped in the identity page table, leading to early SEV/SNP guest communication exceptions resulting in the guest crashing * tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed
2023-06-11Merge tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds6-56/+62
Pull smb server fixes from Steve French: "Five smb3 server fixes, all also for stable: - Fix four slab out of bounds warnings: improve checks for protocol id, and for small packet length, and for create context parsing, and for negotiate context parsing - Fix for incorrect dereferencing POSIX ACLs" * tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: validate smb request protocol id ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loop ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR() ksmbd: fix out-of-bound read in parse_lease_state() ksmbd: fix out-of-bound read in deassemble_neg_contexts()
2023-06-11RDMA/mlx5: Fix affinity assignmentMark Bloch4-12/+18
The cited commit aimed to ensure that Virtual Functions (VFs) assign a queue affinity to a Queue Pair (QP) to distribute traffic when the LAG master creates a hardware LAG. If the affinity was set while the hardware was not in LAG, the firmware would ignore the affinity value. However, this commit unintentionally assigned an affinity to QPs on the LAG master's VPORT even if the RDMA device was not marked as LAG-enabled. In most cases, this was not an issue because when the hardware entered hardware LAG configuration, the RDMA device of the LAG master would be destroyed and a new one would be created, marked as LAG-enabled. The problem arises when a user configures Equal-Cost Multipath (ECMP). In ECMP mode, traffic can be directed to different physical ports based on the queue affinity, which is intended for use by VPORTS other than the E-Switch manager. ECMP mode is supported only if both E-Switch managers are in switchdev mode and the appropriate route is configured via IP. In this configuration, the RDMA device is not destroyed, and we retain the RDMA device that is not marked as LAG-enabled. To ensure correct behavior, Send Queues (SQs) opened by the E-Switch manager through verbs should be assigned strict affinity. This means they will only be able to communicate through the native physical port associated with the E-Switch manager. This will prevent the firmware from assigning affinity and will not allow the SQs to be remapped in case of failover. Fixes: 802dcc7fc5ec ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode") Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Mark Bloch <[email protected]> Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-06-11IB/uverbs: Fix to consider event queue closing also upon non-blocking modeYishai Hadas1-7/+5
Fix ib_uverbs_event_read() to consider event queue closing also upon non-blocking mode. Once the queue is closed (e.g. hot-plug flow) all the existing events are cleaned-up as part of ib_uverbs_free_event_queue(). An application that uses the non-blocking FD mode should get -EIO in that case to let it knows that the device was removed already. Otherwise, it can loose the indication that the device was removed and won't recover. As part of that, refactor the code to have a single flow with regards to 'is_closed' for both blocking and non-blocking modes. Fixes: 14e23bd6d221 ("RDMA/core: Fix locking in ib_uverbs_event_read") Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Yishai Hadas <[email protected]> Link: https://lore.kernel.org/r/97b00116a1e1e13f8dc4ec38a5ea81cf8c030210.1685960567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>