Age  Commit message  Author  Files  Lines
2022-11-09KVM: SVM: do not allocate struct svm_cpu_data dynamicallyPaolo Bonzini3-29/+18
The svm_data percpu variable is a pointer, but it is allocated via svm_hardware_setup() when KVM is loaded. Unlike hardware_enable() this means that it is never NULL for the whole lifetime of KVM, and static allocation does not waste any memory compared to the status quo. It is also more efficient and more easily handled from assembly code, so do it and don't look back. Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09KVM: SVM: remove dead field from struct svm_cpu_dataPaolo Bonzini2-3/+0
The "cpu" field of struct svm_cpu_data has been write-only since commit 4b656b120249 ("KVM: SVM: force new asid on vcpu migration", 2009-08-05). Remove it. Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09KVM: SVM: remove unused field from struct vcpu_svmPaolo Bonzini1-1/+0
The pointer to svm_cpu_data in struct vcpu_svm looks interesting from the point of view of accessing it after vmexit, when the GSBASE is still containing the guest value. However, despite existing since the very first commit of drivers/kvm/svm.c (commit 6aa8b732ca01, "[PATCH] kvm: userspace interface", 2006-12-10), it was never set to anything. Ignore the opportunity to fix a 16 year old "bug" and delete it; doing things the "harder" way makes it possible to remove more old cruft. Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09KVM: SVM: retrieve VMCB from assemblyPaolo Bonzini4-15/+16
Continue moving accesses to struct vcpu_svm to vmenter.S. Reducing the number of arguments limits the chance of mistakes due to different registers used for argument passing in 32- and 64-bit ABIs; pushing the VMCB argument and almost immediately popping it into a different register looks pretty weird. 32-bit ABI is not a concern for __svm_sev_es_vcpu_run() which is 64-bit only; however, it will soon need @svm to save/restore SPEC_CTRL so stay consistent with __svm_vcpu_run() and let them share the same prototype. No functional change intended. Cc: [email protected] Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09KVM: SVM: adjust register allocation for __svm_vcpu_run()Paolo Bonzini1-19/+19
32-bit ABI uses RAX/RCX/RDX as its argument registers, so they are in the way of instructions that hardcode their operands such as RDMSR/WRMSR or VMLOAD/VMRUN/VMSAVE. In preparation for moving vmload/vmsave to __svm_vcpu_run(), keep the pointer to the struct vcpu_svm in %rdi. In particular, it is now possible to load svm->vmcb01.pa in %rax without clobbering the struct vcpu_svm pointer. No functional change intended. Cc: [email protected] Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svmPaolo Bonzini5-20/+30
Since registers are reachable through vcpu_svm, and we will need to access more fields of that struct, pass it instead of the regs[] array. No functional change intended. Cc: [email protected] Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09KVM: x86: use a separate asm-offsets.c filePaolo Bonzini5-7/+30
This already removes an ugly #include "" from asm-offsets.c, but especially it avoids a future error when trying to define asm-offsets for KVM's svm/svm.h header. This would not work for kernel/asm-offsets.c, because svm/svm.h includes kvm_cache_regs.h which is not in the include path when compiling asm-offsets.c. The problem is not there if the .c file is in arch/x86/kvm. Suggested-by: Sean Christopherson <[email protected]> Cc: [email protected] Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller3-7/+11
Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Fix deadlock in nfnetlink due to missing mutex release in error path, from Ziyang Xuan. 2) Clean up pending autoload module list from nf_tables_exit_net() path, from Shigeru Yoshida. 3) Fixes for the netfilter's reverse path selftest, from Phil Sutter. All of these bugs have been around for several releases. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-11-09drm: rcar-du: Fix Kconfig dependency between RCAR_DU and RCAR_MIPI_DSILaurent Pinchart1-4/+9
When the R-Car MIPI DSI driver was added, it was a standalone encoder driver without any dependency to or from the R-Car DU driver. Commit 957fe62d7d15 ("drm: rcar-du: Fix DSI enable & disable sequence") then added a direct call from the DU driver to the MIPI DSI driver, without updating Kconfig to take the new dependency into account. Fix it the same way that the LVDS encoder is handled. Fixes: 957fe62d7d15 ("drm: rcar-du: Fix DSI enable & disable sequence") Reported-by: kernel test robot <[email protected]> Reviewed-by: Tomi Valkeinen <[email protected]> Signed-off-by: Laurent Pinchart <[email protected]>
2022-11-09drm/panfrost: Split io-pgtable requests properlyRobin Murphy1-1/+10
Although we don't use 1GB block mappings, we still need to split map/unmap requests at 1GB boundaries to match what io-pgtable expects. Fix that, and add some explanation to make sense of it all. Fixes: 3740b081795a ("drm/panfrost: Update io-pgtable API") Reported-by: Dmitry Osipenko <[email protected]> Signed-off-by: Robin Murphy <[email protected]> Tested-by: Dmitry Osipenko <[email protected]> Reviewed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/49e54bb4019cd06e01549b106d7ac37c3d182cd3.1667927179.git.robin.murphy@arm.com
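The splitting rule described above can be sketched in plain userspace C. This is a minimal illustration, not the driver's code: `dist_to_boundary()`, `chunk_len()` and `count_chunks()` are hypothetical helper names, and only the 1GB boundary from the commit message is modeled.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G ((uint64_t)1 << 30)

/* Distance from addr to the next boundary of a power-of-two block size.
 * Returns the block size itself when addr is already aligned. */
static uint64_t dist_to_boundary(uint64_t addr, uint64_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    /* In unsigned arithmetic, (0 - addr) masked by block - 1 equals
     * (block - addr % block) % block. */
    uint64_t off = (0 - addr) & (block - 1);
    return off ? off : block;
}

/* Largest prefix of [addr, addr + size) that does not cross a 1GB boundary. */
static uint64_t chunk_len(uint64_t addr, uint64_t size)
{
    uint64_t room = dist_to_boundary(addr, SZ_1G);
    return size < room ? size : room;
}

/* Number of pieces a map/unmap request is split into (hypothetical helper). */
static unsigned count_chunks(uint64_t addr, uint64_t size)
{
    unsigned n = 0;

    while (size) {
        uint64_t len = chunk_len(addr, size);

        addr += len;
        size -= len;
        n++;
    }
    return n;
}
```

For instance, an 8KB request starting 4KB below a 1GB boundary would be issued as two 4KB pieces under this sketch, one on each side of the boundary.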
2022-11-09Merge branch 'wwan-iosm-fixes'David S. Miller6-4/+27
M Chetan Kumar says: ==================== net: wwan: iosm: fixes This patch series contains iosm fixes. PATCH1: Fix memory leak in ipc_pcie_read_bios_cfg. PATCH2: Fix driver not working with INTEL_IOMMU disabled config. PATCH3: Fix invalid mux header type. PATCH4: Fix kernel build robot reported errors. Please refer to individual commit message for details. -- v2: * PATCH1: No Change * PATCH2: Kconfig change - Add dependency on PCI to resolve kernel build robot errors. * PATCH3: No Change * PATCH4: New (Fix kernel build robot errors) ==================== Signed-off-by: David S. Miller <[email protected]>
2022-11-09net: wwan: iosm: fix kernel test robot reported errorsM Chetan Kumar2-0/+2
Include linux/vmalloc.h in iosm_ipc_coredump.c & iosm_ipc_devlink.c to resolve kernel test robot errors. Reported-by: kernel test robot <[email protected]> Signed-off-by: M Chetan Kumar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-09net: wwan: iosm: fix invalid mux header typeM Chetan Kumar2-0/+9
A data stall is seen during the peak DL throughput test, and packets are dropped by the mux layer due to an invalid header type in the datagram. During initialization, the Mux aggregation protocol is set to the default UL/DL size and TD count of the Mux lite protocol. This configuration mismatch between device and driver results in data stalls and packet drops. Override the UL/DL size and TD count for the Mux aggregation protocol. Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support") Signed-off-by: M Chetan Kumar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-09net: wwan: iosm: fix driver not working with INTEL_IOMMU disabledM Chetan Kumar2-1/+8
With INTEL_IOMMU disabled in the config, or with intel_iommu=off forced from grub, some features of the IOSM driver such as browsing, flashing and coredump collection do not work. When the driver calls the DMA API dma_map_single() for tx transfers, the call results in a DMA mapping error. Set the device DMA addressing capabilities using dma_set_mask() and remove the INTEL_IOMMU dependency in Kconfig so that the driver follows the platform config, whether INTEL_IOMMU is enabled or disabled. Fixes: f7af616c632e ("net: iosm: infrastructure") Signed-off-by: M Chetan Kumar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-09net: wwan: iosm: fix memory leak in ipc_pcie_read_bios_cfgM Chetan Kumar1-3/+8
ipc_pcie_read_bios_cfg() uses acpi_evaluate_dsm() to obtain the wwan power state configuration from BIOS but does not free the acpi_object. The acpi_object returned by acpi_evaluate_dsm() has to be freed by the caller. Free the acpi_object after use. Fixes: 7e98d785ae61 ("net: iosm: entry point") Signed-off-by: M Chetan Kumar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-09nvmet: fix a memory leakSagi Grimberg1-0/+1
We need to also free the dhchap_ctrl_secret when releasing nvmet_host. kmemleak complaint: -- unreferenced object 0xffff99b1cbca5140 (size 64): comm "check", pid 4864, jiffies 4305092436 (age 2913.583s) hex dump (first 32 bytes): 44 48 48 43 2d 31 3a 30 30 3a 65 36 2b 41 63 44 DHHC-1:00:e6+AcD 39 76 47 4d 52 57 59 78 67 54 47 44 51 59 47 78 9vGMRWYxgTGDQYGx backtrace: [<00000000c07d369d>] kstrdup+0x2e/0x60 [<000000001372171c>] 0xffffffffc0cceec6 [<0000000010dbf50b>] 0xffffffffc0cc6783 [<000000007465e93c>] configfs_write_iter+0xb1/0x120 [<0000000039c23f62>] vfs_write+0x2be/0x3c0 [<000000002da4351c>] ksys_write+0x5f/0xe0 [<00000000d5011e32>] do_syscall_64+0x38/0x90 [<00000000503870cf>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication") Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-11-09nvmet: fix memory leak in nvmet_subsys_attr_model_store_lockedAleksandr Miloserdov1-2/+5
Since model_number may already have been allocated, it needs to be freed before it is overwritten with the result of kmemdup_nul(). Reviewed-by: Konstantin Shelekhin <[email protected]> Reviewed-by: Dmitriy Bogdanov <[email protected]> Signed-off-by: Aleksandr Miloserdov <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
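The leak pattern can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the nvmet code: `struct subsys`, `dup_nul()` and `set_model()` are hypothetical names, and `malloc()`/`free()` stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the structure that owns the string. */
struct subsys {
    char *model_number;
};

/* Userspace stand-in for a "duplicate and NUL-terminate" helper:
 * copies len bytes from s and appends a terminating NUL. */
static char *dup_nul(const char *s, size_t len)
{
    char *p = malloc(len + 1);

    if (!p)
        return NULL;
    memcpy(p, s, len);
    p[len] = '\0';
    return p;
}

/* Replace the stored string without leaking the previous one. */
static int set_model(struct subsys *s, const char *page, size_t len)
{
    char *val = dup_nul(page, len);

    if (!val)
        return -1;            /* old value stays intact on failure */
    free(s->model_number);    /* release the previous allocation first */
    s->model_number = val;
    return 0;
}
```

Duplicating into a temporary before freeing is a design choice of this sketch: it keeps the old string valid if the allocation fails.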
2022-11-09nvme: quiet user passthrough command errorsKeith Busch2-4/+1
The driver is spamming the kernel logs for entirely harmless errors from user space submitting unsupported commands. Just silence the errors. The application has direct access to command status, so there's no need to log these. And since every passthrough command now uses the quiet flag, move the setting to the common initializer. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Alan Adamson <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Kanchan Joshi <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Daniel Wagner <[email protected]> Tested-by: Alan Adamson <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-11-09mmc: sdhci-esdhc-imx: use the correct host caps for MMC_CAP_8_BIT_DATAHaibo Chen1-2/+2
MMC_CAP_8_BIT_DATA belongs to struct mmc_host, not struct sdhci_host. So correct it here. Fixes: 1ed5c3b22fc7 ("mmc: sdhci-esdhc-imx: Propagate ESDHC_FLAG_HS400* only on 8bit bus") Signed-off-by: Haibo Chen <[email protected]> Cc: [email protected] Acked-by: Adrian Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]>
2022-11-09udf: Fix a slab-out-of-bounds write bug in udf_find_entry()ZhangPeng1-1/+1
Syzbot reported a slab-out-of-bounds Write bug: loop0: detected capacity change from 0 to 2048 ================================================================== BUG: KASAN: slab-out-of-bounds in udf_find_entry+0x8a5/0x14f0 fs/udf/namei.c:253 Write of size 105 at addr ffff8880123ff896 by task syz-executor323/3610 CPU: 0 PID: 3610 Comm: syz-executor323 Not tainted 6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:284 print_report+0x107/0x1f0 mm/kasan/report.c:395 kasan_report+0xcd/0x100 mm/kasan/report.c:495 kasan_check_range+0x2a7/0x2e0 mm/kasan/generic.c:189 memcpy+0x3c/0x60 mm/kasan/shadow.c:66 udf_find_entry+0x8a5/0x14f0 fs/udf/namei.c:253 udf_lookup+0xef/0x340 fs/udf/namei.c:309 lookup_open fs/namei.c:3391 [inline] open_last_lookups fs/namei.c:3481 [inline] path_openat+0x10e6/0x2df0 fs/namei.c:3710 do_filp_open+0x264/0x4f0 fs/namei.c:3740 do_sys_openat2+0x124/0x4e0 fs/open.c:1310 do_sys_open fs/open.c:1326 [inline] __do_sys_creat fs/open.c:1402 [inline] __se_sys_creat fs/open.c:1396 [inline] __x64_sys_creat+0x11f/0x160 fs/open.c:1396 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7ffab0d164d9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe1a7e6bb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffab0d164d9 RDX: 00007ffab0d164d9 RSI: 0000000000000000 RDI: 0000000020000180 RBP: 00007ffab0cd5a10 R08: 0000000000000000 R09: 0000000000000000 R10: 00005555573552c0 R11: 0000000000000246 R12: 00007ffab0cd5aa0 
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Allocated by task 3610: kasan_save_stack mm/kasan/common.c:45 [inline] kasan_set_track+0x3d/0x60 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:371 [inline] __kasan_kmalloc+0x97/0xb0 mm/kasan/common.c:380 kmalloc include/linux/slab.h:576 [inline] udf_find_entry+0x7b6/0x14f0 fs/udf/namei.c:243 udf_lookup+0xef/0x340 fs/udf/namei.c:309 lookup_open fs/namei.c:3391 [inline] open_last_lookups fs/namei.c:3481 [inline] path_openat+0x10e6/0x2df0 fs/namei.c:3710 do_filp_open+0x264/0x4f0 fs/namei.c:3740 do_sys_openat2+0x124/0x4e0 fs/open.c:1310 do_sys_open fs/open.c:1326 [inline] __do_sys_creat fs/open.c:1402 [inline] __se_sys_creat fs/open.c:1396 [inline] __x64_sys_creat+0x11f/0x160 fs/open.c:1396 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff8880123ff800 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 150 bytes inside of 256-byte region [ffff8880123ff800, ffff8880123ff900) The buggy address belongs to the physical page: page:ffffea000048ff80 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x123fe head:ffffea000048ff80 order:1 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 ffffea00004b8500 dead000000000003 ffff888012041b40 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x0(), pid 1, tgid 1 (swapper/0), ts 1841222404, free_ts 0 create_dummy_stack mm/page_owner.c:67 [inline] register_early_stack+0x77/0xd0 mm/page_owner.c:83 init_page_owner+0x3a/0x731 mm/page_owner.c:93 kernel_init_freeable+0x41c/0x5d5 init/main.c:1629 kernel_init+0x19/0x2b0 
init/main.c:1519 page_owner free stack trace missing Memory state around the buggy address: ffff8880123ff780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8880123ff800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880123ff880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06 ^ ffff8880123ff900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8880123ff980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Fix this by changing the memory size allocated for copy_name from UDF_NAME_LEN(254) to UDF_NAME_LEN_CS0(255), because the total length (lfi) of subsequent memcpy can be up to 255. CC: [email protected] Reported-by: [email protected] Fixes: 066b9cded00b ("udf: Use separate buffer for copying split names") Signed-off-by: ZhangPeng <[email protected]> Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected]
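The size arithmetic behind the fix can be checked with a short C sketch. The two constants come from the commit message; `copy_fits()` is a hypothetical helper, and treating lfi as an 8-bit length is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UDF_NAME_LEN     254   /* old buffer size, per the commit message */
#define UDF_NAME_LEN_CS0 255   /* new buffer size, per the commit message */

/* Hypothetical helper: does a copy of lfi bytes stay inside a buffer
 * of buf_size bytes? */
static int copy_fits(size_t buf_size, uint8_t lfi)
{
    return (size_t)lfi <= buf_size;
}
```

With these values, a 255-byte copy overflows a 254-byte buffer by one byte, while a 255-byte buffer holds it exactly.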
2022-11-09arm64/syscall: Include asm/ptrace.h in syscall_wrapper header.Kuniyuki Iwashima1-1/+1
Add the same change for ARM64 as done in the commit 9440c4294160 ("x86/syscall: Include asm/ptrace.h in syscall_wrapper header") to make sure all syscalls see 'struct pt_regs' definition and resulted BTF for '__arm64_sys_*(struct pt_regs *regs)' functions point to actual struct. Without this patch, the BPF verifier refuses to load a tracing prog which accesses pt_regs. bpf(BPF_PROG_LOAD, {prog_type=0x1a, ...}, 128) = -1 EACCES With this patch, we can see the correct error, which saves us time in debugging the prog. bpf(BPF_PROG_LOAD, {prog_type=0x1a, ...}, 128) = 4 bpf(BPF_RAW_TRACEPOINT_OPEN, {raw_tracepoint={name=NULL, prog_fd=4}}, 128) = -1 ENOTSUPP Signed-off-by: Kuniyuki Iwashima <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-11-09arm64: Fix bit-shifting UB in the MIDR_CPU_MODEL() macroD Scott Phillips1-1/+1
CONFIG_UBSAN_SHIFT with gcc-5 complains that the shifting of ARM_CPU_IMP_AMPERE (0xC0) into bits [31:24] by MIDR_CPU_MODEL() is undefined behavior. Well, sort of, it actually spells the error as: arch/arm64/kernel/proton-pack.c: In function 'spectre_bhb_loop_affected': arch/arm64/include/asm/cputype.h:44:2: error: initializer element is not constant (((imp) << MIDR_IMPLEMENTOR_SHIFT) | \ ^ This isn't an issue for other Implementor codes, as all the other codes have zero in the top bit and so are representable as a signed int. Cast the implementor code to unsigned in MIDR_CPU_MODEL to remove the undefined behavior. Fixes: 0e5d5ae837c8 ("arm64: Add AMPERE1 to the Spectre-BHB affected list") Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: D Scott Phillips <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
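A minimal sketch of the cast, assuming a 32-bit int and a shift of 24 to reach bits [31:24]. `IMP_FIELD()` is a hypothetical macro for illustration, not the kernel's MIDR_CPU_MODEL().

```c
#include <assert.h>
#include <stdint.h>

#define MIDR_IMPLEMENTOR_SHIFT 24
#define ARM_CPU_IMP_AMPERE     0xC0

/* 0xC0 has type int, and 0xC0 << 24 does not fit in a signed 32-bit int,
 * so the uncast shift is undefined behavior. Casting to an unsigned
 * 32-bit type first makes the shift well defined. */
#define IMP_FIELD(imp) (((uint32_t)(imp)) << MIDR_IMPLEMENTOR_SHIFT)

/* With the cast, the expression is also a valid constant initializer. */
static const uint32_t ampere_field = IMP_FIELD(ARM_CPU_IMP_AMPERE);
```

Implementer codes below 0x80 do not set bit 31 after the shift, which is why, as the message notes, only the Ampere code triggered the error.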
2022-11-09selftests: netfilter: Fix and review rpath.shPhil Sutter1-6/+8
Address a few problems with the initial test script version: * On systems with ip6tables but no ip6tables-legacy, testing for ip6tables was disabled by accident. * Firewall setup phase did not respect possibly unavailable tools. * Consistently call nft via '$nft'. Fixes: 6e31ce831c63b ("selftests: netfilter: Test reverse path filtering") Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-11-09ALSA: usb-audio: Add DSD support for Accuphase DAC-60Jussi Laako1-0/+1
Accuphase DAC-60 option card supports native DSD up to DSD256, but doesn't have support for auto-detection. Explicitly enable DSD support for the correct altsetting. Signed-off-by: Jussi Laako <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2022-11-08ibmveth: Reduce default tx queues to 8Nick Child2-1/+3
Previously, the default number of transmit queues was 16. Due to resource concerns, set to 8 queues instead. Still allow the user to set more queues (max 16) if they like. Since the driver is virtualized away from the physical NIC, the purpose of multiple queues is purely to allow for parallel calls to the hypervisor. Therefore, there is no noticeable effect on performance by reducing queue count to 8. Fixes: d926793c1de9 ("ibmveth: Implement multi queue on xmit") Reported-by: Dave Taht <[email protected]> Signed-off-by: Nick Child <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-08net: nixge: disable napi when enable interrupts failed in nixge_open()Zhengchao Shao1-0/+1
When enabling interrupts fails in nixge_open() while opening the device, napi isn't disabled. The next time the nixge device is opened, an invalid opcode issue is reported. Fix it. Compile-tested only. Fixes: 492caffa8a1a ("net: ethernet: nixge: Add support for National Instruments XGE netdev") Signed-off-by: Zhengchao Shao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-08net: tun: call napi_schedule_prep() to ensure we own a napiEric Dumazet1-6/+13
A recent patch exposed another issue in napi_get_frags() caught by syzbot [1] Before feeding packets to GRO, and calling napi_complete() we must first grab NAPI_STATE_SCHED. [1] WARNING: CPU: 0 PID: 3612 at net/core/dev.c:6076 napi_complete_done+0x45b/0x880 net/core/dev.c:6076 Modules linked in: CPU: 0 PID: 3612 Comm: syz-executor408 Not tainted 6.1.0-rc3-syzkaller-00175-g1118b2049d77 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 RIP: 0010:napi_complete_done+0x45b/0x880 net/core/dev.c:6076 Code: c1 ea 03 0f b6 14 02 4c 89 f0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 24 04 00 00 41 89 5d 1c e9 73 fc ff ff e8 b5 53 22 fa <0f> 0b e9 82 fe ff ff e8 a9 53 22 fa 48 8b 5c 24 08 31 ff 48 89 de RSP: 0018:ffffc90003c4f920 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000030 RCX: 0000000000000000 RDX: ffff8880251c0000 RSI: ffffffff875a58db RDI: 0000000000000007 RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888072d02628 R13: ffff888072d02618 R14: ffff888072d02634 R15: 0000000000000000 FS: 0000555555f13300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055c44d3892b8 CR3: 00000000172d2000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> napi_complete include/linux/netdevice.h:510 [inline] tun_get_user+0x206d/0x3a60 drivers/net/tun.c:1980 tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2027 call_write_iter include/linux/fs.h:2191 [inline] do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735 do_iter_write+0x182/0x700 fs/read_write.c:861 vfs_writev+0x1aa/0x630 fs/read_write.c:934 do_writev+0x133/0x2f0 fs/read_write.c:977 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 
0033:0x7f37021a3c19 Fixes: 1118b2049d77 ("net: tun: Fix memory leaks of napi_get_frags") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Wang Yufen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
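The ownership rule stated above (grab the scheduled state before feeding packets and completing) can be modeled with C11 atomics. This is a simplified toy model, not the kernel's NAPI implementation: `struct poller`, `STATE_SCHED` and all three functions are hypothetical names, and the other checks done by the real napi_schedule_prep() are omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STATE_SCHED 0x1u

/* Toy stand-in for a NAPI context. */
struct poller {
    atomic_uint state;
    unsigned processed;
};

/* Try to take ownership: succeeds only if the SCHED bit was clear. */
static bool poller_schedule_prep(struct poller *p)
{
    unsigned old = atomic_fetch_or(&p->state, STATE_SCHED);

    return !(old & STATE_SCHED);
}

/* Release ownership; completing without owning the poller is a bug. */
static void poller_complete(struct poller *p)
{
    unsigned old = atomic_fetch_and(&p->state, ~STATE_SCHED);

    assert(old & STATE_SCHED);
    (void)old;
}

/* Process one packet only when ownership could be taken. */
static bool submit(struct poller *p)
{
    if (!poller_schedule_prep(p))
        return false;     /* someone else owns it: do not touch it */
    p->processed++;
    poller_complete(p);
    return true;
}
```

In this model, a caller that skips `poller_schedule_prep()` and calls `poller_complete()` directly trips the assertion, which loosely mirrors the warning in the report above.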
2022-11-08net: marvell: prestera: fix memory leak in prestera_rxtx_switch_init()Zhengchao Shao1-1/+6
When prestera_sdma_switch_init() fails, the memory pointed to by sw->rxtx isn't released. Fix it. Compile-tested only. Fixes: 501ef3066c89 ("net: marvell: prestera: Add driver for Prestera family ASIC devices") Signed-off-by: Zhengchao Shao <[email protected]> Reviewed-by: Vadym Kochan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-08docs: kmsan: fix formatting of "Example report"Alexander Potapenko1-0/+1
Add a blank line to make the sentence before the list render as a separate paragraph, not a definition. Link: https://lkml.kernel.org/r/[email protected] Fixes: 93858ae70cf4 ("kmsan: add ReST documentation") Signed-off-by: Alexander Potapenko <[email protected]> Suggested-by: Bagas Sanjaya <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm/damon/dbgfs: check if rm_contexts input is for a real contextSeongJae Park1-0/+7
A user could write the name of a file under the 'damon/' debugfs directory, which is not a user-created context, to the 'rm_contexts' file. In that case, 'dbgfs_rm_context()' just assumes it is a valid DAMON context directory as long as a file of that name exists. As a result, invalid memory access could happen as below. Fix the bug by checking if the given input is for a directory. This check can filter out non-context inputs because directories under the 'damon/' debugfs directory can be created only via the 'mk_contexts' file. This bug was found by syzbot[1]. [1] https://lore.kernel.org/damon/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts") Signed-off-by: SeongJae Park <[email protected]> Reported-by: [email protected] Cc: <[email protected]> [5.15.x] Signed-off-by: Andrew Morton <[email protected]>
2022-11-08maple_tree: don't set a new maximum on the node when not reusing nodesLiam Howlett1-2/+1
In RCU mode, the node limits were being updated to the last pivot which may not be correct and would cause the metadata to be set when it shouldn't. Fix this by not setting a new limit in this case. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08maple_tree: fix depth tracking in maple_stateLiam Howlett1-1/+2
It is possible to confuse the depth tracking in the maple state by searching the same node for values. Fix the depth tracking by moving where the depth is incremented closer to where the node changes level. Also change the initial depth setting when using the root node. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level pagingNaoya Horiguchi1-0/+4
The following bug is reported to be triggered when starting X on x86-32 system with i915: [ 225.777375] kernel BUG at mm/memory.c:2664! [ 225.777391] invalid opcode: 0000 [#1] PREEMPT SMP [ 225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86 [ 225.777415] Hardware name: /8I865G775-G, BIOS F1 08/29/2006 [ 225.777421] EIP: __apply_to_page_range+0x24d/0x31c [ 225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76 [ 225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00 [ 225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80 [ 225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202 [ 225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0 [ 225.777479] Call Trace: [ 225.777486] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.777684] apply_to_page_range+0x21/0x27 [ 225.777694] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.777870] remap_io_mapping+0x49/0x75 [i915] [ 225.778046] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.778220] ? mutex_unlock+0xb/0xd [ 225.778231] ? i915_vma_pin_fence+0x6d/0xf7 [i915] [ 225.778420] vm_fault_gtt+0x2a9/0x8f1 [i915] [ 225.778644] ? lock_is_held_type+0x56/0xe7 [ 225.778655] ? lock_is_held_type+0x7a/0xe7 [ 225.778663] ? 0xc1000000 [ 225.778670] __do_fault+0x21/0x6a [ 225.778679] handle_mm_fault+0x708/0xb21 [ 225.778686] ? mt_find+0x21e/0x5ae [ 225.778696] exc_page_fault+0x185/0x705 [ 225.778704] ? doublefault_shim+0x127/0x127 [ 225.778715] handle_exception+0x130/0x130 [ 225.778723] EIP: 0xb700468a Recently pud_huge() got aware of non-present entry by commit 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") to handle some special states of gigantic page. 
However, it's overlooked that pud_none() always returns false when running with 2-level paging, and as a result pud_huge() can return true pointlessly. Introduce "#if CONFIG_PGTABLE_LEVELS > 2" to pud_huge() to deal with this. Link: https://lkml.kernel.org/r/[email protected] Fixes: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") Signed-off-by: Naoya Horiguchi <[email protected]> Reported-by: Ville Syrjälä <[email protected]> Tested-by: Ville Syrjälä <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Liu Shixin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Yang Shi <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08fs: fix leaked psi pressure stateJohannes Weiner2-13/+19
When psi annotations were added to btrfs compression reads, the psi state tracking over add_ra_bio_pages and btrfs_submit_compressed_read was faulty. A pressure state, once entered, is never left. This results in incorrectly elevated pressure, which triggers OOM kills. pflags record the *previous* memstall state when we enter a new one. The code tried to initialize pflags to 1, and then optimize the leave call when we either didn't enter a memstall, or were already inside a nested stall. However, there can be multiple PageWorkingset pages in the bio, at which point it's that path itself that enters repeatedly and overwrites pflags. This causes us to miss the exit. Enter the stall only once if needed, then unwind correctly. erofs has the same problem, fix that up too. And move the memstall exit past submit_bio() to restore submit accounting originally added by b8e24a9300b0 ("block: annotate refault stalls from IO submission"). Link: https://lkml.kernel.org/r/[email protected] Fixes: 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads") Fixes: 99486c511f68 ("erofs: add manual PSI accounting for the compressed address space") Fixes: 118f3663fbc6 ("block: remove PSI accounting from the bio layer") Link: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Thorsten Leemhuis <[email protected]> Tested-by: Thorsten Leemhuis <[email protected]> Cc: Chao Yu <[email protected]> Cc: Chris Mason <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Sterba <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
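The missed exit can be reproduced with a toy model in userspace C. Everything here is an assumption for illustration: `in_memstall`, `memstall_enter()` and `memstall_leave()` only mimic the "flags record the previous state" behavior described above, and the two `read_pages_*()` functions are hypothetical stand-ins for the read path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-task state: are we currently inside a memory stall? */
static bool in_memstall;

/* Enter a stall; *flags records the PREVIOUS state. */
static void memstall_enter(unsigned long *flags)
{
    *flags = in_memstall;
    if (*flags)
        return;              /* nested: already stalled */
    in_memstall = true;
}

/* Leave a stall, unless the matching enter was nested. */
static void memstall_leave(unsigned long *flags)
{
    if (*flags)
        return;
    in_memstall = false;
}

/* Buggy shape: enters once per workingset page, so the second enter
 * overwrites flags with 1 and the final leave becomes a no-op. */
static void read_pages_buggy(const bool *workingset, int n)
{
    unsigned long pflags = 0;
    bool entered = false;

    for (int i = 0; i < n; i++) {
        if (workingset[i]) {
            memstall_enter(&pflags);
            entered = true;
        }
    }
    if (entered)
        memstall_leave(&pflags);
}

/* Fixed shape: enter at most once, then unwind. */
static void read_pages_fixed(const bool *workingset, int n)
{
    unsigned long pflags = 0;
    bool stalled = false;

    for (int i = 0; i < n; i++) {
        if (workingset[i] && !stalled) {
            memstall_enter(&pflags);
            stalled = true;
        }
    }
    if (stalled)
        memstall_leave(&pflags);
}
```

With two workingset pages in one request, the buggy shape leaves `in_memstall` set after it returns, while the fixed shape clears it.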
2022-11-08nilfs2: fix use-after-free bug of ns_writer on remountRyusuke Konishi2-9/+8
If a nilfs2 filesystem is downgraded to read-only due to metadata corruption on disk and is remounted read/write, or if emergency read-only remount is performed, detaching a log writer and synchronizing the filesystem can be done at the same time. In these cases, use-after-free of the log writer (hereinafter nilfs->ns_writer) can happen as shown in the scenario below: Task1 Task2 -------------------------------- ------------------------------ nilfs_construct_segment nilfs_segctor_sync init_wait init_waitqueue_entry add_wait_queue schedule nilfs_remount (R/W remount case) nilfs_attach_log_writer nilfs_detach_log_writer nilfs_segctor_destroy kfree finish_wait _raw_spin_lock_irqsave __raw_spin_lock_irqsave do_raw_spin_lock debug_spin_lock_before <-- use-after-free While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1 wakes up, it accesses nilfs->ns_writer, which has already been freed. This scenario diagram is based on Shigeru Yoshida's post [1]. This patch fixes the issue by not detaching nilfs->ns_writer on remount so that this UAF race doesn't happen. Along with this change, this patch also inserts a few necessary read-only checks with superblock instance where only the ns_writer pointer was used to check if the filesystem is read-only. Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df8b Link: https://lkml.kernel.org/r/[email protected] [1] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ryusuke Konishi <[email protected]> Reported-by: [email protected] Reported-by: Shigeru Yoshida <[email protected]> Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08x86/traps: avoid KMSAN bugs originating from handle_bug()Alexander Potapenko1-0/+7
There is a case in exc_invalid_op handler that is executed outside the irqentry_enter()/irqentry_exit() region when an UD2 instruction is used to encode a call to __warn(). In that case the `struct pt_regs` passed to the interrupt handler is never unpoisoned by KMSAN (this is normally done in irqentry_enter()), which leads to false positives inside handle_bug(). Use kmsan_unpoison_entry_regs() to explicitly unpoison those registers before using them. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Kees Cook <[email protected]> Cc: Marco Elver <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08kmsan: make sure PREEMPT_RT is offAlexander Potapenko1-0/+1
As pointed out by Peter Zijlstra, __msan_poison_alloca() does not play well with IRQ code when PREEMPT_RT is on, because in that mode even GFP_ATOMIC allocations cannot be performed. Fixing this would require making stackdepot completely lockless, which is quite challenging and may be excessive for the time being. Instead, make sure KMSAN is incompatible with PREEMPT_RT, like other debug configs are. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Alexander Potapenko <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kees Cook <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARNAlexander Potapenko1-1/+1
As pointed out by Masahiro Yamada, Kconfig picks up the first default entry which has true 'if' condition. Hence, the previously added check for KMSAN was never used, because it followed the checks for 64BIT and !64BIT. Put KMSAN check before others to ensure it is always applied. Link: https://lkml.kernel.org/r/[email protected] Link: https://github.com/google/kmsan/issues/89 Link: https://lore.kernel.org/linux-mm/[email protected]/ Fixes: 921757bc9b61 ("Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default") Signed-off-by: Alexander Potapenko <[email protected]> Cc: Kees Cook <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marco Elver <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
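The "first default entry with a true condition wins" rule can be modelled in a few lines of C. This is a sketch, not Kconfig itself: the struct, the helper names, and the numeric values (1024, 2048, 0) are illustrative assumptions chosen to show why an entry placed after both the 64BIT and !64BIT checks can never be selected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one "default <value> if <cond>" line. */
struct default_entry {
	int value;
	bool cond;
};

/* Return the value of the first entry whose condition holds. */
static int first_true_default(const struct default_entry *e, size_t n,
			      int fallback)
{
	for (size_t i = 0; i < n; i++)
		if (e[i].cond)
			return e[i].value;
	return fallback;
}

/* Old ordering: the KMSAN entry follows the 64BIT and !64BIT entries. */
int frame_warn_old(bool is_64bit, bool kmsan)
{
	const struct default_entry e[] = {
		{ 1024, !is_64bit },
		{ 2048, is_64bit },
		{ 0, kmsan },	/* unreachable: one of the two above is always true */
	};

	return first_true_default(e, sizeof(e) / sizeof(e[0]), 0);
}

/* New ordering: the KMSAN entry is checked before the others. */
int frame_warn_new(bool is_64bit, bool kmsan)
{
	const struct default_entry e[] = {
		{ 0, kmsan },
		{ 1024, !is_64bit },
		{ 2048, is_64bit },
	};

	return first_true_default(e, sizeof(e) / sizeof(e[0]), 0);
}
```

With the old ordering the KMSAN flag has no effect on the result; with the new ordering it takes priority, and non-KMSAN configurations are unchanged.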
2022-11-08x86/uaccess: instrument copy_from_user_nmi()Alexander Potapenko1-0/+3
Make sure usercopy hooks from linux/instrumented.h are invoked for copy_from_user_nmi(). This fixes KMSAN false positives reported when dumping opcodes for a stack trace. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Marco Elver <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08kmsan: core: kmsan_in_runtime() should return true in NMI contextAlexander Potapenko1-0/+2
Without that, every call to __msan_poison_alloca() in NMI may end up allocating memory, which is NMI-unsafe. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Alexander Potapenko <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kees Cook <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm: hugetlb_vmemmap: include missing linux/moduleparam.hVasily Gorbik1-0/+1
The kernel test robot reported build failures with a 'randconfig' on s390: >> mm/hugetlb_vmemmap.c:421:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0); ^ Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/patch.git-296b83ca939b.your-ad-here.call-01667411912-ext-5073@work.hours Fixes: 30152245c63b ("mm: hugetlb_vmemmap: replace early_param() with core_param()") Signed-off-by: Vasily Gorbik <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm/shmem: use page_mapping() to detect page cache for uffd continuePeter Xu1-1/+1
mfill_atomic_install_pte() checks page->mapping to detect whether a page is used in the page cache. However, as pointed out by Matthew, the page can logically be a tail page rather than always the head in the case of uffd minor mode with UFFDIO_CONTINUE. This means we could wrongly install a pte for a shmem thp tail page while assuming it is an anonymous page.

The check is not that clear even for anonymous pages, since anonymous pages normally also have page->mapping set up with the anon vma. It is safe here only because the only such caller of mfill_atomic_install_pte() always passes in a newly allocated page (mcopy_atomic_pte()), whose page->mapping is not yet set up. However, that is not obvious either.

For both reasons, use page_mapping() instead.

Link: https://lkml.kernel.org/r/Y2K+y7wnhC4vbnP2@x1n
Fixes: 153132571f02 ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Matthew Wilcox <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
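The difference between reading page->mapping directly and going through a helper can be illustrated with a toy model. Everything here is a simplifying assumption: the toy_* types and functions are hypothetical, a tail page is modelled as having a NULL mapping of its own, and an "anon" flag stands in for the way anonymous pages tag their mapping field. The real page_mapping() handles more cases than this.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_mapping {
	int id;
};

struct toy_page {
	struct toy_page *head;		/* NULL for a head or order-0 page */
	struct toy_mapping *mapping;	/* only meaningful on the head page */
	bool anon;			/* mapping field refers to an anon vma */
};

static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head ? page->head : page;
}

/* Model of the helper: resolve the head first, hide anon mappings. */
struct toy_mapping *toy_page_mapping(struct toy_page *page)
{
	struct toy_page *head = toy_compound_head(page);

	return head->anon ? NULL : head->mapping;
}

/* Old-style check: look at the raw field of whatever page was passed. */
bool raw_check_in_page_cache(struct toy_page *page)
{
	return page->mapping != NULL;
}

/* New-style check: ask the helper. */
bool fixed_check_in_page_cache(struct toy_page *page)
{
	return toy_page_mapping(page) != NULL;
}
```

In this model the raw check gets both cases from the message wrong: it misses a shmem tail page, and it would report an anonymous page with an established mapping as page cache.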
2022-11-08mm/memremap.c: map FS_DAX device memory as decryptedPankaj Gupta1-0/+1
virtio_pmem uses devm_memremap_pages() to map the device memory. By default this memory is mapped as encrypted with SEV. A guest reboot changes the current encryption key, and the guest can no longer properly decrypt the FSDAX device metadata.

Mark the corresponding device memory region for FSDAX devices (mapped with memremap_pages) as decrypted to retain the persistent memory property.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b7b3c01b19159 ("mm/memremap_pages: support multiple ranges per invocation")
Signed-off-by: Pankaj Gupta <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2022-11-08Partly revert "mm/thp: carry over dirty bit when thp splits on pmd"Peter Xu1-3/+6
Anatoly Pugachev reported sparc64 breakage on the patch:

https://lore.kernel.org/r/[email protected]

The sparc64 implementation of pte_mkdirty() is slightly special in that it leverages a code patching mechanism for sun4u/sun4v on relevant pgtable entry operations.

Until it is understood why sparc64 is special and why the patch caused processes to SIGSEGV, revert the patch for now. The swap path of dirty bit inheritance is kept because it uses the shared swap code, so we assume it is not affected.

Link: https://lkml.kernel.org/r/Y1Wbi4yyVvDtg4zN@x1n
Fixes: 0ccf7f168e17 ("mm/thp: carry over dirty bit when thp splits on pmd")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Anatoly Pugachev <[email protected]>
Tested-by: Anatoly Pugachev <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2022-11-08nilfs2: fix deadlock in nilfs_count_free_blocks()Ryusuke Konishi1-2/+0
A semaphore deadlock can occur if nilfs_get_block() detects metadata corruption while locating data blocks and a superblock writeback occurs at the same time:

task 1                               task 2
------                               ------
* A file operation *
nilfs_truncate()
  nilfs_get_block()
    down_read(rwsem A) <--
    nilfs_bmap_lookup_contig()
      ...                            generic_shutdown_super()
                                       nilfs_put_super()
                                         * Prepare to write superblock *
                                         down_write(rwsem B) <--
                                         nilfs_cleanup_super()
      * Detect b-tree corruption *         nilfs_set_log_cursor()
      nilfs_bmap_convert_error()             nilfs_count_free_blocks()
        __nilfs_error()                        down_read(rwsem A) <--
          nilfs_set_error()
            down_write(rwsem B) <--

                           *** DEADLOCK ***

Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem) and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata corruption, __nilfs_error() is called from nilfs_bmap_convert_error() inside the lock section.

Since __nilfs_error() calls nilfs_set_error() unless the filesystem is read-only and nilfs_set_error() attempts to writelock rwsem B (= nilfs->ns_sem) to write back superblock exclusively, hierarchical lock acquisition occurs in the order rwsem A -> rwsem B.

Now, if another task starts updating the superblock, it may writelock rwsem B during the lock sequence above, and can deadlock trying to readlock rwsem A in nilfs_count_free_blocks().

However, there is actually no need to take rwsem A in nilfs_count_free_blocks() because, within the lock section, it only reads a single integer from a shared struct with nilfs_sufile_get_ncleansegs(). This has been the case since commit aa474a220180 ("nilfs2: add local variable to cache the number of clean segments"), that is, even before this bug was introduced.

So, this resolves the deadlock problem by just not taking the semaphore in nilfs_count_free_blocks().

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e828949e5b42 ("nilfs2: call nilfs_error inside bmap routines")
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]> [2.6.38+]
Signed-off-by: Andrew Morton <[email protected]>
2022-11-08mm/mmap: fix memory leak in mmap_region()Li Zetao1-1/+5
There is a memory leak reported by kmemleak:

  unreferenced object 0xffff88817231ce40 (size 224):
    comm "mount.cifs", pid 19308, jiffies 4295917571 (age 405.880s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      60 c0 b2 00 81 88 ff ff 98 83 01 42 81 88 ff ff  `..........B....
    backtrace:
      [<ffffffff81936171>] __alloc_file+0x21/0x250
      [<ffffffff81937051>] alloc_empty_file+0x41/0xf0
      [<ffffffff81937159>] alloc_file+0x59/0x710
      [<ffffffff81937964>] alloc_file_pseudo+0x154/0x210
      [<ffffffff81741dbf>] __shmem_file_setup+0xff/0x2a0
      [<ffffffff817502cd>] shmem_zero_setup+0x8d/0x160
      [<ffffffff817cc1d5>] mmap_region+0x1075/0x19d0
      [<ffffffff817cd257>] do_mmap+0x727/0x1110
      [<ffffffff817518b2>] vm_mmap_pgoff+0x112/0x1e0
      [<ffffffff83adf955>] do_syscall_64+0x35/0x80
      [<ffffffff83c0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

The root cause was traced to an error handling path in mmap_region() when arch_validate_flags() or mas_preallocate() fails. In the shared anonymous mapping case, the vma is set up and mapped with a new shared anonymous file via shmem_zero_setup(), so the file resource needs to be released on failure.

Fix it by calling fput(vma->vm_file) and unmap_region() when arch_validate_flags() or mas_preallocate() returns an error in the shared anonymous mapping case.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Fixes: c462ac288f2c ("mm: Introduce arch_validate_flags()")
Signed-off-by: Li Zetao <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
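The shape of the leak, a reference taken during setup and not dropped on a later failure, can be sketched with a toy refcount. The toy_* names, the validate_ok flag, and the release_on_error switch are hypothetical and exist only to contrast the two behaviours; the unmap_region() half of the fix is not modelled here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_file {
	int refcount;
};

struct toy_vma {
	struct toy_file *file;
};

/* Model: setting up the shared anonymous mapping takes a file reference. */
static void toy_shmem_zero_setup(struct toy_vma *vma, struct toy_file *backing)
{
	backing->refcount++;
	vma->file = backing;
}

/* Model: drop one reference. */
static void toy_fput(struct toy_file *file)
{
	file->refcount--;
}

/*
 * Model of the mapping path. validate_ok stands in for the later checks
 * succeeding; release_on_error selects the fixed behaviour.
 */
int toy_mmap_region(struct toy_file *backing, bool validate_ok,
		    bool release_on_error)
{
	struct toy_vma vma = { NULL };

	toy_shmem_zero_setup(&vma, backing);

	if (!validate_ok) {
		if (release_on_error && vma.file) {
			toy_fput(vma.file);	/* the release the fix adds */
			vma.file = NULL;
		}
		return -EINVAL;
	}

	/* Success: the mapping keeps its reference. */
	return 0;
}
```

Without the release, the failing call leaves the count at one with no owner left to drop it, which is the kind of orphaned object kmemleak reports.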
2022-11-08hugetlbfs: don't delete error page from pagecacheJames Houghton3-8/+14
This change is very similar to the change that was made for shmem [1], and it solves the same problem but for HugeTLBFS instead. Currently, when poison is found in a HugeTLB page, the page is removed from the page cache. That means that attempting to map or read that hugepage in the future will result in a new hugepage being allocated instead of notifying the user that the page was poisoned. As [1] states, this is effectively memory corruption. The fix is to leave the page in the page cache. If the user attempts to use a poisoned HugeTLB page with a syscall, the syscall will fail with EIO, the same error code that shmem uses. For attempts to map the page, the thread will get a BUS_MCEERR_AR SIGBUS. [1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: James Houghton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Tested-by: Naoya Horiguchi <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: James Houghton <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Muchun Song <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08maple_tree: reorganize testing to restore module testingLiam Howlett10-35745/+36031
Along the development cycle, the testing code support for module/in-kernel compiles was removed. Restore this functionality by moving any internal API tests to the userspace side, as well as threading tests. Fix the lockdep issues and add a way to reduce memory usage so the tests can complete with KASAN + memleak detection. Make the tests work on 32 bit hosts where possible and detect 32 bit hosts in the radix test suite. [[email protected]: fix module export] [[email protected]: fix it some more] [[email protected]: fix compile warnings on 32bit build in check_find()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08maple_tree: mas_anode_descend() clang-analyzer cleanupLiam Howlett1-6/+4
clang-analyzer reported some Dead Stores in mas_anode_descend(). Upon inspection, there were a few clean ups that would make the code cleaner: The count variable was set from the mt_slots array and then updated but never used again. Just use the array reference directly. Also stop updating the type since it isn't used after the update. Stop setting the gaps pointer to NULL at the start since it is always set before the loop begins. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Suggested-by: Lukas Bulwahn <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08maple_tree: remove pointer to pointer use in mas_alloc_nodes()Liam Howlett1-3/+1
There is a more direct and cleaner way of implementing the same functional code. Remove the confusing and unnecessary use of pointers here. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Suggested-by: Lukas Bulwahn <[email protected]> Signed-off-by: Andrew Morton <[email protected]>