aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-05-24selftests/mm: fix build warnings on ppc64Michael Ellerman2-0/+2
Fix warnings like: In file included from uffd-unit-tests.c:8: uffd-unit-tests.c: In function `uffd_poison_handle_fault': uffd-common.h:45:33: warning: format `%llu' expects argument of type `long long unsigned int', but argument 3 has type `__u64' {aka `long unsigned int'} [-Wformat=] By switching to unsigned long long for u64 for ppc64 builds. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Michael Ellerman <[email protected]> Cc: Shuah Khan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24arm64: patching: fix handling of execmem addressesWill Deacon1-1/+1
Klara Modin reported warnings for a kernel configured with BPF_JIT but without MODULES: [ 44.131296] Trying to vfree() bad address (000000004a17c299) [ 44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) [ 44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G D W 6.9.0-01786-g2c9e5d4a0082 #25 [ 44.158229] Hardware name: Raspberry Pi 3 Model B (DT) [ 44.164433] Workqueue: events bpf_prog_free_deferred [ 44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) [ 44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) [ 44.188772] sp : ffff800082a13c70 [ 44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000 [ 44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000 [ 44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880 [ 44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006 [ 44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002 [ 44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000 [ 44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370 [ 44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370 [ 44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240 [ 44.275703] Call trace: [ 44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) [ 44.283858] vfree (mm/vmalloc.c:3322) [ 44.287835] execmem_free (mm/execmem.c:70) [ 44.292347] bpf_jit_free_exec+0x10/0x1c [ 44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006) [ 44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195) [ 44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474) [ 44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785) [ 44.317785] process_one_work (kernel/workqueue.c:3273) [ 44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2)) [ 44.327292] kthread (kernel/kthread.c:388) [ 44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861) The problem is because bpf_arch_text_copy() silently fails to write to the read-only area as a result of patch_map() faulting and the resulting -EFAULT being chucked away. Update patch_map() to use CONFIG_EXECMEM instead of CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses. Link: https://lkml.kernel.org/r/[email protected] Fixes: 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of") Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Mike Rapoport (IBM) <[email protected]> Reported-by: Klara Modin <[email protected]> Closes: https://lore.kernel.org/all/[email protected] Tested-by: Klara Modin <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Luis Chamberlain <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24selftests/mm: compaction_test: fix bogus test success and reduce probability ↵Dev Jain1-22/+49
of OOM-killer invocation Reset nr_hugepages to zero before the start of the test. If a non-zero number of hugepages is already set before the start of the test, the following problems arise: - The probability of the test getting OOM-killed increases. Proof: The test wants to run on 80% of available memory to prevent OOM-killing (see original code comments). Let the value of mem_free at the start of the test, when nr_hugepages = 0, be x. In the other case, when nr_hugepages > 0, let the memory consumed by hugepages be y. In the former case, the test operates on 0.8 * x of memory. In the latter, the test operates on 0.8 * (x - y) of memory, with y already filled, hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 * x. Q.E.D - The probability of a bogus test success increases. Proof: Let the memory consumed by hugepages be greater than 25% of x, with x and y defined as above. The definition of compaction_index is c_index = (x - y)/z where z is the memory consumed by hugepages after trying to increase them again. In check_compaction(), we set the number of hugepages to zero, and then increase them back; the probability that they will be set back to consume at least y amount of memory again is very high (since there is not much delay between the two attempts of changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption). Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3 hence, c_index can always be forced to be less than 3, thereby the test succeeding always. Q.E.D Link: https://lkml.kernel.org/r/[email protected] Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <[email protected]> Cc: <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Sri Jayaramappa <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepagesDev Jain1-0/+2
Currently, the test tries to set nr_hugepages to zero, but that is not actually done because the file offset is not reset after read(). Fix that using lseek(). Link: https://lkml.kernel.org/r/[email protected] Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <[email protected]> Cc: <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Sri Jayaramappa <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24selftests/mm: compaction_test: fix bogus test success on Aarch64Dev Jain1-7/+13
Patch series "Fixes for compaction_test", v2. The compaction_test memory selftest introduces fragmentation in memory and then tries to allocate as many hugepages as possible. This series addresses some problems. On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since compaction_index becomes 0, which is less than 3, due to no division by zero exception being raised. We fix that by checking for division by zero. Secondly, correctly set the number of hugepages to zero before trying to set a large number of them. Now, consider a situation in which, at the start of the test, a non-zero number of hugepages have been already set (while running the entire selftests/mm suite, or manually by the admin). The test operates on 80% of memory to avoid OOM-killer invocation, and because some memory is already blocked by hugepages, it would increase the chance of OOM-killing. Also, since mem_free used in check_compaction() is the value before we set nr_hugepages to zero, the chance that the compaction_index will be small is very high if the preset nr_hugepages was high, leading to a bogus test success. This patch (of 3): Currently, if at runtime we are not able to allocate a huge page, the test will trivially pass on Aarch64 due to no exception being raised on division by zero while computing compaction_index. Fix that by checking for nr_hugepages == 0. Anyways, in general, avoid a division by zero by exiting the program beforehand. While at it, fix a typo, and handle the case where the number of hugepages may overflow an integer. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Sri Jayaramappa <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24mailmap: update email address for Satya PriyaSatya Priya Kakitapalli1-1/+1
Update mailmap with my latest email ID, [email protected] is no longer active. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Satya Priya Kakitapalli <[email protected]> Cc: Ajit Pandey <[email protected]> Cc: Bjorn Andersson <[email protected]> Cc: Imran Shaik <[email protected]> Cc: Jagadeesh Kona <[email protected]> Cc: Konrad Dybcio <[email protected]> Cc: Taniya Das <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24mm/huge_memory: don't unpoison huge_zero_folioMiaohe Lin1-0/+7
When I did memory failure tests recently, below panic occurs: kernel BUG at include/linux/mm.h:1135! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00 FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0 Call Trace: <TASK> do_shrink_slab+0x14f/0x6a0 shrink_slab+0xca/0x8c0 shrink_node+0x2d0/0x7d0 balance_pgdat+0x33a/0x720 kswapd+0x1f3/0x410 kthread+0xd5/0x100 ret_from_fork+0x2f/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: mce_inject hwpoison_inject ---[ end trace 0000000000000000 ]--- RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00 FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0 The root cause is that HWPoison flag will be set for huge_zero_folio without increasing the folio refcnt. But then unpoison_memory() will decrease the folio refcnt unexpectedly as it appears like a successfully hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when releasing huge_zero_folio. Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue. We're not prepared to unpoison huge_zero_folio yet. Link: https://lkml.kernel.org/r/[email protected] Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page") Signed-off-by: Miaohe Lin <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Yang Shi <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Xu Yu <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24kasan, fortify: properly rename memintrinsicsAndrey Konovalov1-4/+18
After commit 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled, even though the compiler instruments meminstrinsics by generating calls to __asan/__hwasan_ prefixed functions, FORTIFY_SOURCE still uses uninstrumented memset/memmove/memcpy as the underlying functions. As a result, KASAN cannot detect bad accesses in memset/memmove/memcpy. This also makes KASAN tests corrupt kernel memory and cause crashes. To fix this, use __asan_/__hwasan_memset/memmove/memcpy as the underlying functions whenever appropriate. Do this only for the instrumented code (as indicated by __SANITIZE_ADDRESS__). Link: https://lkml.kernel.org/r/[email protected] Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") Fixes: 51287dcb00cc ("kasan: emit different calls for instrumentable memintrinsics") Fixes: 36be5cba99f6 ("kasan: treat meminstrinsic as builtins in uninstrumented files") Signed-off-by: Andrey Konovalov <[email protected]> Reported-by: Erhard Furtner <[email protected]> Reported-by: Nico Pache <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Reviewed-by: Marco Elver <[email protected]> Tested-by: Nico Pache <[email protected]> Acked-by: Nico Pache <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Daniel Axtens <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24lib: add version into /proc/allocinfo outputSuren Baghdasaryan2-17/+35
Add version string and a header at the beginning of /proc/allocinfo to allow later format changes. Example output: > head /proc/allocinfo allocinfo - version: 1.0 # <size> <calls> <tag info> 0 0 init/main.c:1314 func:do_initcalls 0 0 init/do_mounts.c:353 func:mount_nodev_root 0 0 init/do_mounts.c:187 func:mount_root_generic 0 0 init/do_mounts.c:158 func:do_mount_root 0 0 init/initramfs.c:493 func:unpack_to_rootfs 0 0 init/initramfs.c:492 func:unpack_to_rootfs 0 0 init/initramfs.c:491 func:unpack_to_rootfs 512 1 arch/x86/events/rapl.c:681 func:init_rapl_pmus 128 1 arch/x86/events/rapl.c:571 func:rapl_cpu_online [[email protected]: remove stray newline from struct allocinfo_private] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Reviewed-by: Pasha Tatashin <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAILHailong.Liu1-3/+2
commit a421ef303008 ("mm: allow !GFP_KERNEL allocations for kvmalloc") includes support for __GFP_NOFAIL, but it presents a conflict with commit dd544141b9eb ("vmalloc: back off when the current task is OOM-killed"). A possible scenario is as follows: process-a __vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL) __vmalloc_area_node() vm_area_alloc_pages() --> oom-killer send SIGKILL to process-a if (fatal_signal_pending(current)) break; --> return NULL; To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages() if __GFP_NOFAIL set. This issue occurred during OPLUS KASAN TEST. Below is part of the log -> oom-killer sends signal to process [65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198 [65731.259685] [T32454] Call trace: [65731.259698] [T32454] dump_backtrace+0xf4/0x118 [65731.259734] [T32454] show_stack+0x18/0x24 [65731.259756] [T32454] dump_stack_lvl+0x60/0x7c [65731.259781] [T32454] dump_stack+0x18/0x38 [65731.259800] [T32454] mrdump_common_die+0x250/0x39c [mrdump] [65731.259936] [T32454] ipanic_die+0x20/0x34 [mrdump] [65731.260019] [T32454] atomic_notifier_call_chain+0xb4/0xfc [65731.260047] [T32454] notify_die+0x114/0x198 [65731.260073] [T32454] die+0xf4/0x5b4 [65731.260098] [T32454] die_kernel_fault+0x80/0x98 [65731.260124] [T32454] __do_kernel_fault+0x160/0x2a8 [65731.260146] [T32454] do_bad_area+0x68/0x148 [65731.260174] [T32454] do_mem_abort+0x151c/0x1b34 [65731.260204] [T32454] el1_abort+0x3c/0x5c [65731.260227] [T32454] el1h_64_sync_handler+0x54/0x90 [65731.260248] [T32454] el1h_64_sync+0x68/0x6c [65731.260269] [T32454] z_erofs_decompress_queue+0x7f0/0x2258 --> be->decompressed_pages = kvcalloc(be->nr_pages, sizeof(struct page *), GFP_KERNEL | __GFP_NOFAIL); kernel panic by NULL pointer dereference. erofs assume kvmalloc with __GFP_NOFAIL never return NULL. [65731.260293] [T32454] z_erofs_runqueue+0xf30/0x104c [65731.260314] [T32454] z_erofs_readahead+0x4f0/0x968 [65731.260339] [T32454] read_pages+0x170/0xadc [65731.260364] [T32454] page_cache_ra_unbounded+0x874/0xf30 [65731.260388] [T32454] page_cache_ra_order+0x24c/0x714 [65731.260411] [T32454] filemap_fault+0xbf0/0x1a74 [65731.260437] [T32454] __do_fault+0xd0/0x33c [65731.260462] [T32454] handle_mm_fault+0xf74/0x3fe0 [65731.260486] [T32454] do_mem_abort+0x54c/0x1b34 [65731.260509] [T32454] el0_da+0x44/0x94 [65731.260531] [T32454] el0t_64_sync_handler+0x98/0xb4 [65731.260553] [T32454] el0t_64_sync+0x198/0x19c Link: https://lkml.kernel.org/r/[email protected] Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL") Signed-off-by: Hailong.Liu <[email protected]> Acked-by: Michal Hocko <[email protected]> Suggested-by: Barry Song <[email protected]> Reported-by: Oven <[email protected]> Reviewed-by: Barry Song <[email protected]> Reviewed-by: Uladzislau Rezki (Sony) <[email protected]> Cc: Chao Yu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-24Merge tag 'riscv-for-linus-6.10-mw2' of ↵Linus Torvalds27-184/+395
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - The compression format used for boot images is now configurable at build time, and these formats are shown in `make help` - access_ok() has been optimized - A pair of performance bugs have been fixed in the uaccess handlers - Various fixes and cleanups, including one for the IMSIC build failure and one for the early-boot ftrace illegal NOPs bug * tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix early ftrace nop patching irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict riscv: selftests: Add signal handling vector tests riscv: mm: accelerate pagefault when badaccess riscv: uaccess: Relax the threshold for fast path riscv: uaccess: Allow the last potential unrolled copy riscv: typo in comment for get_f64_reg Use bool value in set_cpu_online() riscv: selftests: Add hwprobe binaries to .gitignore riscv: stacktrace: fixed walk_stackframe() ftrace: riscv: move from REGS to ARGS riscv: do not select MODULE_SECTIONS by default riscv: show help string for riscv-specific targets riscv: make image compression configurable riscv: cpufeature: Fix extension subset checking riscv: cpufeature: Fix thead vector hwcap removal riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled riscv: Define TASK_SIZE_MAX for __access_ok() riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
2024-05-24Merge tag 'for-linus-6.10a-rc1-tag' of ↵Linus Torvalds4-27/+67
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - a small cleanup in the drivers/xen/xenbus Makefile - a fix of the Xen xenstore driver to improve connecting to a late started Xenstore - an enhancement for better support of ballooning in PVH guests - a cleanup using try_cmpxchg() instead of open coding it * tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: drivers/xen: Improve the late XenStore init protocol xen/xenbus: Use *-y instead of *-objs in Makefile xen/x86: add extra pages to unpopulated-alloc if available locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()
2024-05-24Merge tag 'for-6.10-tag' of ↵Linus Torvalds5-12/+45
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull more btrfs updates from David Sterba: "A few more updates, mostly stability fixes or user visible changes: - fix race in zoned mode during device replace that can lead to use-after-free - update return codes and lower message levels for quota rescan where it's causing false alerts - fix unexpected qgroup id reuse under some conditions - fix condition when looking up extent refs - add option norecovery (removed in 6.8), the intended replacements haven't been used and some aplications still rely on the old one - build warning fixes" * tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: re-introduce 'norecovery' mount option btrfs: fix end of tree detection when searching for data extent ref btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning btrfs: zoned: fix use-after-free due to race with dev replace btrfs: qgroup: fix qgroup id collision across mounts btrfs: qgroup: update rescan message levels and error codes
2024-05-24Merge tag 'erofs-for-6.10-rc1-2' of ↵Linus Torvalds8-88/+65
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull more erofs updates from Gao Xiang: "The main ones are metadata API conversion to byte offsets by Al Viro. Another patch gets rid of unnecessary memory allocation out of DEFLATE decompressor. The remaining one is a trivial cleanup. - Convert metadata APIs to byte offsets - Avoid allocating DEFLATE streams unnecessarily - Some erofs_show_options() cleanup" * tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: avoid allocating DEFLATE streams before mounting z_erofs_pcluster_begin(): don't bother with rounding position down erofs: don't round offset down for erofs_read_metabuf() erofs: don't align offset for erofs_read_metabuf() (simple cases) erofs: mechanically convert erofs_read_metabuf() to offsets erofs: clean up erofs_show_options()
2024-05-24Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefsLinus Torvalds15-57/+96
Pull bcachefs fixes from Kent Overstreet: "Nothing exciting, just syzbot fixes (except for the one FMODE_CAN_ODIRECT patch). Looks like syzbot reports have slowed down; this is all catch up from two weeks of conferences. Next hardening project is using Thomas's error injection tooling to torture test repair" * tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix race path in bch2_inode_insert() bcachefs: Ensure we're RW before journalling bcachefs: Fix shutdown ordering bcachefs: Fix unsafety in bch2_dirent_name_bytes() bcachefs: Fix stack oob in __bch2_encrypt_bio() bcachefs: Fix btree_trans leak in bch2_readahead() bcachefs: Fix bogus verify_replicas_entry() assert bcachefs: Check for subvolues with bogus snapshot/inode fields bcachefs: bch2_checksum() returns 0 for unknown checksum type bcachefs: Fix bch2_alloc_ciphers() bcachefs: Add missing guard in bch2_snapshot_has_children() bcachefs: Fix missing parens in drop_locks_do() bcachefs: Improve bch2_assert_pos_locked() bcachefs: Fix shift overflows in replicas.c bcachefs: Fix shift overflow in btree_lost_data() bcachefs: Fix ref in trans_mark_dev_sbs() error path bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method bcachefs: Fix rcu splat in check_fix_ptrs()
2024-05-24Merge tag 'input-for-v6.10-rc0' of ↵Linus Torvalds82-168/+328
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - a change to input core to trim amount of keys data in modalias string in case when a device declares too many keys and they do not fit in uevent buffer instead of reporting an error which results in uevent not being generated at all - support for Machenike G5 Pro Controller added to xpad driver - support for FocalTech FT5452 and FT8719 added to edt-ft5x06 - support for new SPMI vibrator added to pm8xxx-vibrator driver - missing locking added to cyapa touchpad driver - removal of unused fields in various driver structures - explicit initialization of i2c_device_id::driver_data to 0 dropped from input drivers - other assorted fixes and cleanups. * tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (24 commits) Input: edt-ft5x06 - add support for FocalTech FT5452 and FT8719 dt-bindings: input: touchscreen: edt-ft5x06: Document FT5452 and FT8719 support Input: xpad - add support for Machenike G5 Pro Controller Input: try trimming too long modalias strings Input: drop explicit initialization of struct i2c_device_id::driver_data to 0 Input: zet6223 - remove an unused field in struct zet6223_ts Input: chipone_icn8505 - remove an unused field in struct icn8505_data Input: cros_ec_keyb - remove an unused field in struct cros_ec_keyb Input: lpc32xx-keys - remove an unused field in struct lpc32xx_kscan_drv Input: matrix_keypad - remove an unused field in struct matrix_keypad Input: tca6416-keypad - remove unused struct tca6416_drv_data Input: tca6416-keypad - remove an unused field in struct tca6416_keypad_chip Input: da7280 - remove an unused field in struct da7280_haptic Input: ff-core - prefer struct_size over open coded arithmetic Input: cyapa - add missing input core locking to suspend/resume functions input: pm8xxx-vibrator: add new SPMI vibrator support dt-bindings: input: qcom,pm8xxx-vib: add new SPMI vibrator module input: pm8xxx-vibrator: refactor to support new SPMI vibrator Input: pm8xxx-vibrator - correct VIB_MAX_LEVELS calculation Input: sur40 - convert le16 to cpu before use ...
2024-05-24nvme: adjust multiples of NVME_CTRL_PAGE_SIZE in offsetKundan Kumar1-1/+2
bio_vec start offset may be relatively large particularly when large folio gets added to the bio. A bigger offset will result in avoiding the single-segment mapping optimization and end up using expensive mempool_alloc further. Rather than using absolute value, adjust bv_offset by NVME_CTRL_PAGE_SIZE while checking if segment can be fitted into one/two PRP entries. Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Kundan Kumar <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2024-05-24nvme: remove sgs and swsKanchan Joshi1-2/+0
sgs/sws are unused, so remove these from nvme_ns_head structure. Signed-off-by: Kanchan Joshi <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2024-05-24Merge tag 'sound-fix-6.10-rc1' of ↵Linus Torvalds13-106/+86
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes for 6.10-rc1. Most of changes are various device-specific fixes and quirks, while there are a few small changes in ALSA core timer and module / built-in fixes" * tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 440/460 G11. ALSA: core: Enable proc module when CONFIG_MODULES=y ALSA: core: Fix NULL module pointer assignment at card init ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897 ASoC: dt-bindings: stm32: Ensure compatible pattern matches whole string ASoC: tas2781: Fix wrong loading calibrated data sequence ASoC: tas2552: Add TX path for capturing AUDIO-OUT data ALSA: usb-audio: Fix for sampling rates support for Mbox3 Documentation: sound: Fix trailing whitespaces ALSA: timer: Set lower bound of start tick time ASoC: codecs: ES8326: solve hp and button detect issue ASoC: rt5645: mic-in detection threshold modification ASoC: Intel: sof_sdw_rt_sdca_jack_common: Use name_prefix for `-sdca` detection
2024-05-24Merge tag 'char-misc-6.10-rc1-fix' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fix from Greg KH: "Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final merge window, and has been sitting in my tree and linux-next for quite a while now, but wasn't sent to you (my fault, travels...) It is a bugfix to resolve an error in the speakup code that could overflow a buffer. It has been in linux-next for a while with no reported problems" * tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: speakup: Fix sizeof() vs ARRAY_SIZE() bug
2024-05-24Merge tag 'tty-6.10-rc1-fixes' of ↵Linus Torvalds4-91/+181
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some small TTY and Serial driver fixes that missed the 6.9-final merge window, but have been in my tree for weeks (my fault, travel caused me to miss this) These fixes include: - more n_gsm fixes for reported problems - 8520_mtk driver fix - 8250_bcm7271 driver fix - sc16is7xx driver fix All of these have been in linux-next for weeks without any reported problems" * tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler serial: 8250_bcm7271: use default_mux_rate if possible serial: 8520_mtk: Set RTS on shutdown for Rx in-band wakeup tty: n_gsm: fix missing receive state reset after mode switch tty: n_gsm: fix possible out-of-bounds in gsm0_receive()
2024-05-24Merge tag 'hardening-v6.10-rc1-fixes' of ↵Linus Torvalds3-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression (Stephen Boyd) - ubsan: Restore dependency on ARCH_HAS_UBSAN - kunit/fortify: Fix memcmp() test to be amplitude agnostic * tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kunit/fortify: Fix memcmp() test to be amplitude agnostic ubsan: Restore dependency on ARCH_HAS_UBSAN loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression
2024-05-24Merge tag 'trace-tracefs-v6.10' of ↵Linus Torvalds2-155/+116
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracefs/eventfs updates from Steven Rostedt: "Bug fixes: - The eventfs directories need to have unique inode numbers. Make sure that they do not get the default file inode number. - Update the inode uid and gid fields on remount. When a remount happens where a uid and/or gid is specified, all the tracefs files and directories should get the specified uid and/or gid. But this can be sporadic when some uids were assigned already. There's already a list of inodes that are allocated. Just update their uid and gid fields at the time of remount. - Update the eventfs_inodes on remount from the top level "events" descriptor. There was a bug where not all the eventfs files or directories where getting updated on remount. One fix was to clear the SAVED_UID/GID flags from the inode list during the iteration of the inodes during the remount. But because the eventfs inodes can be freed when the last referenced is released, not all the eventfs_inodes were being updated. This lead to the ownership selftest to fail if it was run a second time (the first time would leave eventfs_inodes with no corresponding tracefs_inode). Instead, for eventfs_inodes, only process the "events" eventfs_inode from the list iteration, as it is guaranteed to have a tracefs_inode (it's never freed while the "events" directory exists). As it has a list of its children, and the children have a list of their children, just iterate all the eventfs_inodes from the "events" descriptor and it is guaranteed to get all of them. - Clear the EVENT_INODE flag from the tracefs_drop_inode() callback. Currently the EVENTFS_INODE FLAG is cleared in the tracefs_d_iput() callback. But this is the wrong location. The iput() callback is called when the last reference to the dentry inode is hit. There could be a case where two dentry's have the same inode, and the flag will be cleared prematurely. The flag needs to be cleared when the last reference of the inode is dropped and that happens in the inode's drop_inode() callback handler. Cleanups: - Consolidate the creation of a tracefs_inode for an eventfs_inode A tracefs_inode is created for both files and directories of the eventfs system. It is open coded. Instead, consolidate it into a single eventfs_get_inode() function call. - Remove the eventfs getattr and permission callbacks. The permissions for the eventfs files and directories are updated when the inodes are created, on remount, and when the user sets them (via setattr). The inodes hold the current permissions so there is no need to have custom getattr or permissions callbacks as they will more likely cause them to be incorrect. The inode's permissions are updated when they should be updated. Remove the getattr and permissions inode callbacks. - Do not update eventfs_inode attributes on creation of inodes. The eventfs_inodes attribute field is used to store the permissions of the directories and files for when their corresponding inodes are freed and are created again. But when the creation of the inodes happen, the eventfs_inode attributes are recalculated. The recalculation should only happen when the permissions change for a given file or directory. Currently, the attribute changes are just being set to their current files so this is not a bug, but it's unnecessary and error prone. Stop doing that. - The events directory inode is created once when the events directory is created and deleted when it is deleted. It is now updated on remount and when the user changes the permissions. There's no need to use the eventfs_inode of the events directory to store the events directory permissions. But using it to store the default permissions for the files within the directory that have not been updated by the user can simplify the code" * tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Do not use attributes for events directory eventfs: Cleanup permissions in creation of inodes eventfs: Remove getattr and permission callbacks eventfs: Consolidate the eventfs_inode update in eventfs_get_inode() tracefs: Clear EVENT_INODE flag in tracefs_drop_inode() eventfs: Update all the eventfs_inodes from the events descriptor tracefs: Update inode permissions on remount eventfs: Keep the directories from having the same inode number as files
2024-05-24bpf: Fix potential integer overflow in resolve_btfidsFriedrich Vock1-1/+1
err is a 32-bit integer, but elf_update returns an off_t, which is 64-bit at least on 64-bit platforms. If symbols_patch is called on a binary between 2-4GB in size, the result will be negative when cast to a 32-bit integer, which the code assumes means an error occurred. This can wrongly trigger build failures when building very large kernel images. Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object") Signed-off-by: Friedrich Vock <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-05-24dma-buf/sw-sync: don't enable IRQ from sync_print_obj()Tetsuo Handa1-2/+2
Since commit a6aa8fca4d79 ("dma-buf/sw-sync: Reduce irqsave/irqrestore from known context") by error replaced spin_unlock_irqrestore() with spin_unlock_irq() for both sync_debugfs_show() and sync_print_obj() despite sync_print_obj() is called from sync_debugfs_show(), lockdep complains inconsistent lock state warning. Use plain spin_{lock,unlock}() for sync_print_obj(), for sync_debugfs_show() is already using spin_{lock,unlock}_irq(). Reported-by: syzbot <[email protected]> Closes: https://syzkaller.appspot.com/bug?extid=a225ee3df7e7f9372dbe Fixes: a6aa8fca4d79 ("dma-buf/sw-sync: Reduce irqsave/irqrestore from known context") Signed-off-by: Tetsuo Handa <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
2024-05-24Merge branch 'mlx5-fixes'David S. Miller9-25/+43
Tariq Toukan says: ==================== mlx5 fixes 24-05-22 This patchset provides bug fixes to mlx5 core and Eth drivers. Series generated against: commit 9c91c7fadb17 ("net: mana: Fix the extra HZ in mana_hwc_send_request") ==================== Signed-off-by: David S. Miller <[email protected]>
2024-05-24net/mlx5e: Fix UDP GSO for encapsulated packetsGal Pressman2-2/+12
When the skb is encapsulated, adjust the inner UDP header instead of the outer one, and account for UDP header (instead of TCP) in the inline header size calculation. Fixes: 689adf0d4892 ("net/mlx5e: Add UDP GSO support") Reported-by: Jason Baron <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Gal Pressman <[email protected]> Reviewed-by: Dragos Tatulea <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-24net/mlx5e: Use rx_missed_errors instead of rx_dropped for reporting buffer ↵Carolina Jubran1-1/+1
exhaustion Previously, the driver incorrectly used rx_dropped to report device buffer exhaustion. According to the documentation, rx_dropped should not be used to count packets dropped due to buffer exhaustion, which is the purpose of rx_missed_errors. Use rx_missed_errors as intended for counting packets dropped due to buffer exhaustion. Fixes: 269e6b3af3bf ("net/mlx5e: Report additional error statistics in get stats ndo") Signed-off-by: Carolina Jubran <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-24net/mlx5e: Do not use ptp structure for tx ts stats when not initializedRahul Rameshbabu1-0/+4
The ptp channel instance is only initialized when ptp traffic is first processed by the driver. This means that there is a window in between when port timestamping is enabled and ptp traffic is sent where the ptp channel instance is not initialized. Accessing statistics during this window will lead to an access violation (NULL + member offset). Check the validity of the instance before attempting to query statistics. BUG: unable to handle page fault for address: 0000000000003524 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 109dfc067 P4D 109dfc067 PUD 1064ef067 PMD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 420 Comm: ethtool Not tainted 6.9.0-rc2-rrameshbabu+ #245 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/204 RIP: 0010:mlx5e_stats_ts_get+0x4c/0x130 <snip> Call Trace: <TASK> ? show_regs+0x60/0x70 ? __die+0x24/0x70 ? page_fault_oops+0x15f/0x430 ? do_user_addr_fault+0x2c9/0x5c0 ? exc_page_fault+0x63/0x110 ? asm_exc_page_fault+0x27/0x30 ? mlx5e_stats_ts_get+0x4c/0x130 ? mlx5e_stats_ts_get+0x20/0x130 mlx5e_get_ts_stats+0x15/0x20 <snip> Fixes: 3579032c08c1 ("net/mlx5e: Implement ethtool hardware timestamping statistics") Signed-off-by: Rahul Rameshbabu <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-24net/mlx5e: Fix IPsec tunnel mode offload feature checkRahul Rameshbabu1-12/+5
Remove faulty check disabling checksum offload and GSO for offload of simple IPsec tunnel L4 traffic. Comment previously describing the deleted code incorrectly claimed the check prevented double tunnel (or three layers of ip headers). Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload") Signed-off-by: Rahul Rameshbabu <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-24net/mlx5: Use mlx5_ipsec_rx_status_destroy to correctly delete status rulesRahul Rameshbabu1-2/+1
rx_create no longer allocates a modify_hdr instance that needs to be cleaned up. The mlx5_modify_header_dealloc call will lead to a NULL pointer dereference. A leak in the rules also previously occurred since there are now two rules populated related to status. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 109907067 P4D 109907067 PUD 116890067 PMD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 484 Comm: ip Not tainted 6.9.0-rc2-rrameshbabu+ #254 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:mlx5_modify_header_dealloc+0xd/0x70 <snip> Call Trace: <TASK> ? show_regs+0x60/0x70 ? __die+0x24/0x70 ? page_fault_oops+0x15f/0x430 ? free_to_partial_list.constprop.0+0x79/0x150 ? do_user_addr_fault+0x2c9/0x5c0 ? exc_page_fault+0x63/0x110 ? asm_exc_page_fault+0x27/0x30 ? mlx5_modify_header_dealloc+0xd/0x70 rx_create+0x374/0x590 rx_add_rule+0x3ad/0x500 ? rx_add_rule+0x3ad/0x500 ? mlx5_cmd_exec+0x2c/0x40 ? mlx5_create_ipsec_obj+0xd6/0x200 mlx5e_accel_ipsec_fs_add_rule+0x31/0xf0 mlx5e_xfrm_add_state+0x426/0xc00 <snip> Fixes: 94af50c0a9bb ("net/mlx5e: Unify esw and normal IPsec status table creation/destruction") Signed-off-by: Rahul Rameshbabu <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-24net/mlx5: Fix MTMP register capability offset in MCAM registerGal Pressman1-2/+2
The MTMP register (0x900a) capability offset is off-by-one, move it to the right place. Fixes: 1f507e80c700 ("net/mlx5: Expose NIC temperature via hardware monitoring kernel API") Signed-off-by: Gal Pressman <[email protected]> Reviewed-by: Cosmin Ratiu <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-24net/mlx5: Do not query MPIR on embedded CPU functionTariq Toukan1-4/+8
A proper query to MPIR needs to set the correct value in the depth field. On embedded CPU this value is not necessarily zero. As there is no real use case for multi-PF netdev on the embedded CPU of the smart NIC, block this option. This fixes the following failure: ACCESS_REG(0x805) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x685f19), err(-5) Fixes: 678eb448055a ("net/mlx5: SD, Implement basic query and instantiation") Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-24net/mlx5: Lag, do bond only if slaves agree on roce stateMaher Sanalla1-2/+10
Currently, the driver does not enforce that lag bond slaves must have matching roce capabilities. Yet, in mlx5_do_bond(), the driver attempts to enable roce on all vports of the bond slaves, causing the following syndrome when one slave has no roce fw support: mlx5_cmd_out_err:809:(pid 25427): MODIFY_NIC_VPORT_CONTEXT(0×755) op_mod(0×0) failed, status bad parameter(0×3), syndrome (0xc1f678), err(-22) Thus, create HW lag only if bond's slaves agree on roce state, either all slaves have roce support resulting in a roce lag bond, or none do, resulting in a raw eth bond. Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature") Signed-off-by: Maher Sanalla <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-24swap: yield device immediatelyChristian Brauner1-1/+1
Otherwise we can cause spurious EBUSY issues when trying to mount the rootfs later on. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218845 Reported-by: Petri Kaukasoina <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-05-24netfs: Fix setting of BDP_ASYNC from iocb flagsDavid Howells1-1/+1
Fix netfs_perform_write() to set BDP_ASYNC if IOCB_NOWAIT is set rather than if IOCB_SYNC is not set. It reflects asynchronicity in the sense of not waiting rather than synchronicity in the sense of not returning until the op is complete. Without this, generic/590 fails on cifs in strict caching mode with a complaint that one of the writes fails with EAGAIN. The test can be distilled down to: mount -t cifs /my/share /mnt -ostuff xfs_io -i -c 'falloc 0 8191M -c fsync -f /mnt/file xfs_io -i -c 'pwrite -b 1M -W 0 8191M' /mnt/file Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write") Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jens Axboe <[email protected]> cc: Jeff Layton <[email protected]> cc: Enzo Matsumiya <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-05-24signalfd: drop an obsolete commentFedor Pchelkin1-4/+0
Commit fbe38120eb1d ("signalfd: convert to ->read_iter()") removed the call to anon_inode_getfd() by splitting fd setup into two parts. Drop the comment referencing the internal details of that function. Signed-off-by: Fedor Pchelkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-05-24signalfd: fix error return codeFedor Pchelkin1-1/+1
If anon_inode_getfile() fails, return appropriate error code. This looks like a single typo: the similar code changes in timerfd and userfaultfd are okay. Found by Linux Verification Center (linuxtesting.org). Fixes: fbe38120eb1d ("signalfd: convert to ->read_iter()") Signed-off-by: Fedor Pchelkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-05-24iomap: fault in smaller chunks for non-large folio mappingsXu Yang1-1/+1
Since commit (5d8edfb900d5 "iomap: Copy larger chunks from userspace"), iomap will try to copy in larger chunks than PAGE_SIZE. However, if the mapping doesn't support large folio, only one page of maximum 4KB will be created and 4KB data will be writen to pagecache each time. Then, next 4KB will be handled in next iteration. This will cause potential write performance problem. If chunk is 2MB, total 512 pages need to be handled finally. During this period, fault_in_iov_iter_readable() is called to check iov_iter readable validity. Since only 4KB will be handled each time, below address space will be checked over and over again: start end - buf, buf+2MB buf+4KB, buf+2MB buf+8KB, buf+2MB ... buf+2044KB buf+2MB Obviously the checking size is wrong since only 4KB will be handled each time. So this will get a correct chunk to let iomap work well in non-large folio case. With this change, the write speed will be stable. Tested on ARM64 device. Before: - dd if=/dev/zero of=/dev/sda bs=400K count=10485 (334 MB/s) - dd if=/dev/zero of=/dev/sda bs=800K count=5242 (278 MB/s) - dd if=/dev/zero of=/dev/sda bs=1600K count=2621 (204 MB/s) - dd if=/dev/zero of=/dev/sda bs=2200K count=1906 (170 MB/s) - dd if=/dev/zero of=/dev/sda bs=3000K count=1398 (150 MB/s) - dd if=/dev/zero of=/dev/sda bs=4500K count=932 (139 MB/s) After: - dd if=/dev/zero of=/dev/sda bs=400K count=10485 (339 MB/s) - dd if=/dev/zero of=/dev/sda bs=800K count=5242 (330 MB/s) - dd if=/dev/zero of=/dev/sda bs=1600K count=2621 (332 MB/s) - dd if=/dev/zero of=/dev/sda bs=2200K count=1906 (333 MB/s) - dd if=/dev/zero of=/dev/sda bs=3000K count=1398 (333 MB/s) - dd if=/dev/zero of=/dev/sda bs=4500K count=932 (333 MB/s) Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace") Cc: [email protected] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Xu Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-05-24filemap: add helper mapping_max_folio_size()Xu Yang1-13/+21
Add mapping_max_folio_size() to get the maximum folio size for this pagecache mapping. Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace") Cc: [email protected] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Xu Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Ritesh Harjani (IBM) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-05-24netfs: Fix AIO error handling when doing write-throughDavid Howells1-1/+6
If an error occurs whilst we're doing an AIO write in write-through mode, we may end up calling ->ki_complete() *and* returning an error from ->write_iter(). This can result in either a UAF (the ->ki_complete() func pointer may get overwritten, for example) or a refcount underflow in io_submit() as ->ki_complete is called twice. Fix this by making netfs_end_writethrough() - and thus netfs_perform_write() - unconditionally return -EIOCBQUEUED if we're doing an AIO write and wait for completion if we're not. Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: Enzo Matsumiya <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-05-24netfs: Fix io_uring based write-throughDavid Howells3-5/+6
This can be triggered by mounting a cifs filesystem with a cache=strict mount option and then, using the fsx program from xfstests, doing: ltp/fsx -A -d -N 1000 -S 11463 -P /tmp /cifs-mount/foo \ --replay-ops=gen112-fsxops Where gen112-fsxops holds: fallocate 0x6be7 0x8fc5 0x377d3 copy_range 0x9c71 0x77e8 0x2edaf 0x377d3 write 0x2776d 0x8f65 0x377d3 The problem is that netfs_io_request::len is being used for two purposes and ends up getting set to the amount of data we transferred, not the amount of data the caller asked to be transferred (for various reasons, such as mmap'd writes, we might end up rounding out the data written to the server to include the entire folio at each end). Fix this by keeping the amount we were asked to write in ->len and using ->submitted to track what we issued ops for. Then, when we come to calling ->ki_complete(), ->len is the right size. This also required netfs_cleanup_dio_write() to change since we're no longer advancing wreq->len. Use wreq->transferred instead as we might have done a short read. With this, the generic/112 xfstest passes if cifs is forced to put all non-DIO opens into write-through mode. Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] cc: Jeff Layton <[email protected]> cc: Steve French <[email protected]> cc: Enzo Matsumiya <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-05-24net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8061Mathieu Othacehe1-0/+1
Following a similar reinstate for the KSZ8081 and KSZ9031. Older kernels would use the genphy_soft_reset if the PHY did not implement a .soft_reset. The KSZ8061 errata described here: https://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8061-Errata-DS80000688B.pdf and worked around with 232ba3a51c ("net: phy: Micrel KSZ8061: link failure after cable connect") is back again without this soft reset. Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") Tested-by: Karim Ben Houcine <[email protected]> Signed-off-by: Mathieu Othacehe <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-24genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()dicken.ding1-1/+4
irq_find_at_or_after() dereferences the interrupt descriptor which is returned by mt_find() while neither holding sparse_irq_lock nor RCU read lock, which means the descriptor can be freed between mt_find() and the dereference: CPU0 CPU1 desc = mt_find() delayed_free_desc(desc) irq_desc_get_irq(desc) The use-after-free is reported by KASAN: Call trace: irq_get_next_irq+0x58/0x84 show_stat+0x638/0x824 seq_read_iter+0x158/0x4ec proc_reg_read_iter+0x94/0x12c vfs_read+0x1e0/0x2c8 Freed by task 4471: slab_free_freelist_hook+0x174/0x1e0 __kmem_cache_free+0xa4/0x1dc kfree+0x64/0x128 irq_kobj_release+0x28/0x3c kobject_put+0xcc/0x1e0 delayed_free_desc+0x14/0x2c rcu_do_batch+0x214/0x720 Guard the access with a RCU read lock section. Fixes: 721255b9826b ("genirq: Use a maple tree for interrupt descriptor management") Signed-off-by: dicken.ding <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2024-05-24fs/ntfs3: Break dir enumeration if directory contents errorKonstantin Komarov1-0/+1
If we somehow attempt to read beyond the directory size, an error is supposed to be returned. However, in some cases, read requests do not stop and instead enter into a loop. To avoid this, we set the position in the directory to the end. Signed-off-by: Konstantin Komarov <[email protected]> Cc: [email protected]
2024-05-24fs/ntfs3: Fix case when index is reused during tree transformationKonstantin Komarov1-0/+6
In most cases when adding a cluster to the directory index, they are placed at the end, and in the bitmap, this cluster corresponds to the last bit. The new directory size is calculated as follows: data_size = (u64)(bit + 1) << indx->index_bits; In the case of reusing a non-final cluster from the index, data_size is calculated incorrectly, resulting in the directory size differing from the actual size. A check for cluster reuse has been added, and the size update is skipped. Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <[email protected]> Cc: [email protected]
2024-05-24connector: Fix invalid conversion in cn_proc.hMatt Jan1-2/+1
The implicit conversion from unsigned int to enum proc_cn_event is invalid, so explicitly cast it for compilation in a C++ compiler. /usr/include/linux/cn_proc.h: In function 'proc_cn_event valid_event(proc_cn_event)': /usr/include/linux/cn_proc.h:72:17: error: invalid conversion from 'unsigned int' to 'proc_cn_event' [-fpermissive] 72 | ev_type &= PROC_EVENT_ALL; | ^ | | | unsigned int Signed-off-by: Matt Jan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-23selftest mm/mseal read-only elf memory segmentJeff Xu4-36/+275
Sealing read-only of elf mapping so it can't be changed by mprotect. [[email protected]: style change] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix linker error for inline function] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix compile warning] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix arm build] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jeff Xu <[email protected]> Signed-off-by: Amer Al Shanawany <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Jorge Lucangeli Obes <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Pedro Falcato <[email protected]> Cc: Stephen Röttger <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Amer Al Shanawany <[email protected]> Cc: Javier Carrasco <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-23mseal: add documentationJeff Xu2-0/+200
Add documentation for mseal(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jeff Xu <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Jorge Lucangeli Obes <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Pedro Falcato <[email protected]> Cc: Stephen Röttger <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Amer Al Shanawany <[email protected]> Cc: Javier Carrasco <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-05-23selftest mm/mseal memory sealingJeff Xu3-0/+1838
selftest for memory sealing change in mmap() and mseal(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jeff Xu <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Jorge Lucangeli Obes <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Pedro Falcato <[email protected]> Cc: Stephen Röttger <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Amer Al Shanawany <[email protected]> Cc: Javier Carrasco <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>