aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-09-24mm/memory_failure: fix the missing pte_unmap() callQi Zheng1-5/+5
The paired pte_unmap() call is missing before the dev_pagemap_mapping_shift() returns. So fix it. David says: "I guess this code never runs on 32bit / highmem, that's why we didn't notice so far". [[email protected]: cleanup] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qi Zheng <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24kasan: always respect CONFIG_KASAN_STACKNathan Chancellor1-1/+2
Currently, the asan-stack parameter is only passed along if CFLAGS_KASAN_SHADOW is not empty, which requires KASAN_SHADOW_OFFSET to be defined in Kconfig so that the value can be checked. In RISC-V's case, KASAN_SHADOW_OFFSET is not defined in Kconfig, which means that asan-stack does not get disabled with clang even when CONFIG_KASAN_STACK is disabled, resulting in large stack warnings with allmodconfig: drivers/video/fbdev/omap2/omapfb/displays/panel-lgphilips-lb035q02.c:117:12: error: stack frame size (14400) exceeds limit (2048) in function 'lb035q02_connect' [-Werror,-Wframe-larger-than] static int lb035q02_connect(struct omap_dss_device *dssdev) ^ 1 error generated. Ensure that the value of CONFIG_KASAN_STACK is always passed along to the compiler so that these warnings do not happen when CONFIG_KASAN_STACK is disabled. Link: https://github.com/ClangBuiltLinux/linux/issues/1453 References: 6baec880d7a5 ("kasan: turn off asan-stack for clang-8 and earlier") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24sh: pgtable-3level: fix cast to pointer from integer of different sizeGeert Uytterhoeven1-1/+1
If X2TLB=y (CPU_SHX2=y or CPU_SHX3=y, e.g. migor_defconfig), pgd_t.pgd is "unsigned long long", causing: In file included from arch/sh/include/asm/pgtable.h:13, from include/linux/pgtable.h:6, from include/linux/mm.h:33, from arch/sh/kernel/asm-offsets.c:14: arch/sh/include/asm/pgtable-3level.h: In function `pud_pgtable': arch/sh/include/asm/pgtable-3level.h:37:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 37 | return (pmd_t *)pud_val(pud); | ^ Fix this by adding an intermediate cast to "unsigned long", which is basically what the old code did before. Link: https://lkml.kernel.org/r/2c2eef3c9a2f57e5609100a4864715ccf253d30f.1631713483.git.geert+renesas@glider.be Fixes: 9cf6fa2458443118 ("mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *") Signed-off-by: Geert Uytterhoeven <[email protected]> Tested-by: Daniel Palmer <[email protected]> Acked-by: Rob Landley <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Jacopo Mondi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24mm/debug: sync up latest migrate_reason to migrate_reason_namesWeizhao Ouyang2-1/+6
Sync up MR_DEMOTION to migrate_reason_names and add a synch prompt. Link: https://lkml.kernel.org/r/[email protected] Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim") Signed-off-by: Weizhao Ouyang <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Mina Almasry <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Wei Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PINWeizhao Ouyang1-1/+2
Sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN to migrate_reason_names. Link: https://lkml.kernel.org/r/[email protected] Fixes: 310253514bbf ("mm/migrate: rename migration reason MR_CMA to MR_CONTIG_RANGE") Fixes: d1e153fea2a8 ("mm/gup: migrate pinned pages out of movable zone") Signed-off-by: Weizhao Ouyang <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Mina Almasry <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Wei Xu <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24mm: fs: invalidate bh_lrus for only cold pathMinchan Kim3-7/+24
The kernel test robot reported the regression of fio.write_iops[1] with commit 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration"). Since lru_add_drain is called frequently, invalidate bh_lrus there could increase bh_lrus cache miss ratio, which needs more IO in the end. This patch moves the bh_lrus invalidation from the hot path( e.g., zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all, lru_cache_disable). Zhengjun Xing confirmed "I test the patch, the regression reduced to -2.9%" [1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/ [2] 8cc621d2f45d, mm: fs: invalidate BH LRU during page migration Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Chris Goldsworthy <[email protected]> Tested-by: "Xing, Zhengjun" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24lib/zlib_inflate/inffast: check config in C to avoid unused function warningPaul Menzel1-7/+6
Building Linux for ppc64le with Ubuntu clang version 12.0.0-3ubuntu1~21.04.1 shows the warning below. arch/powerpc/boot/inffast.c:20:1: warning: unused function 'get_unaligned16' [-Wunused-function] get_unaligned16(const unsigned short *p) ^ 1 warning generated. Fix it by moving the check from the preprocessor to C, so the compiler sees the use. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Paul Menzel <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Zhen Lei <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24tools/vm/page-types: remove dependency on opt_file for idle page trackingChangbin Du1-1/+1
Idle page tracking can also be used for process address space, not only file mappings. Without this change, using with '-i' option for process address space encounters below errors reported. $ sudo ./page-types -p $(pidof bash) -i mark page idle: Bad file descriptor mark page idle: Bad file descriptor mark page idle: Bad file descriptor mark page idle: Bad file descriptor ... Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Changbin Du <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' errorMiles Chen1-0/+4
Fix the following build failure reported in [1] by adding a conditional definition of EM_RISCV in order to allow cross-compilation on machines which do not have EM_RISCV definition in their host. scripts/sorttable.c:352:7: error: use of undeclared identifier 'EM_RISCV' EM_RISCV was added to <elf.h> in glibc 2.24 so builds on systems with glibc headers < 2.24 should show this error. [[email protected]: changelog addition] Link: https://lore.kernel.org/lkml/[email protected]/ [1] Link: https://lkml.kernel.org/r/[email protected] Fixes: 54fed35fd393 ("riscv: Enable BUILDTIME_TABLE_SORT") Signed-off-by: Miles Chen <[email protected]> Reported-by: Stefan Wahren <[email protected]> Tested-by: Stefan Wahren <[email protected]> Reviewed-by: Jisheng Zhang <[email protected]> Cc: Michal Kubecek <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Markus Mayer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24ocfs2: drop acl cache for directories tooWengang Wang1-1/+2
ocfs2_data_convert_worker() is currently dropping any cached acl info for FILE before down-converting meta lock. It should also drop for DIRECTORY. Otherwise the second acl lookup returns the cached one (from VFS layer) which could be already stale. The problem we are seeing is that the acl changes on one node doesn't get refreshed on other nodes in the following case: Node 1 Node 2 -------------- ---------------- getfacl dir1 getfacl dir1 <-- this is OK setfacl -m u:user1:rwX dir1 getfacl dir1 <-- see the change for user1 getfacl dir1 <-- can't see change for user1 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Wengang Wang <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24mm/shmem.c: fix judgment error in shmem_is_huge()Liu Yuntao1-2/+2
In the case of SHMEM_HUGE_WITHIN_SIZE, the page index is not rounded up correctly. When the page index points to the first page in a huge page, round_up() cannot bring it to the end of the huge page, but to the end of the previous one. An example: HPAGE_PMD_NR on my machine is 512(2 MB huge page size). After allcoating a 3000 KB buffer, I access it at location 2050 KB. In shmem_is_huge(), the corresponding index happens to be 512. After rounded up by HPAGE_PMD_NR, it will still be 512 which is smaller than i_size, and shmem_is_huge() will return true. As a result, my buffer takes an additional huge page, and that shouldn't happen when shmem_enabled is set to within_size. Link: https://lkml.kernel.org/r/[email protected] Fixes: f3f0e1d2150b2b ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Liu Yuntao <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: wuxu.wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24xtensa: increase size of gcc stack frame checkGuenter Roeck1-1/+1
xtensa frame size is larger than the frame size for almost all other architectures. This results in more than 50 "the frame size of <n> is larger than 1024 bytes" errors when trying to build xtensa:allmodconfig. Increase frame size for xtensa to 1536 bytes to avoid compile errors due to frame size limits. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Reviewed-by: Max Filippov <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Laight <[email protected]> Cc: Masahiro Yamada <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24mm/damon: don't use strnlen() with known-bogus source lengthAdam Borowski1-8/+8
gcc knows the true length too, and rightfully complains. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Adam Borowski <[email protected]> Cc: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESSMarco Elver1-0/+2
In the main KASAN config option CC_HAS_WORKING_NOSANITIZE_ADDRESS is checked for instrumentation-based modes. However, if HAVE_ARCH_KASAN_HW_TAGS is true all modes may still be selected. To fix, also make the software modes depend on CC_HAS_WORKING_NOSANITIZE_ADDRESS. Link: https://lkml.kernel.org/r/[email protected] Fixes: 6a63a63ff1ac ("kasan: introduce CONFIG_KASAN_HW_TAGS") Signed-off-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Aleksandr Nogikh <[email protected]> Cc: Taras Madan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()Naoya Horiguchi1-1/+1
Commit fcc00621d88b ("mm/hwpoison: retry with shake_page() for unhandlable pages") changed the return value of __get_hwpoison_page() to retry for transiently unhandlable cases. However, __get_hwpoison_page() currently fails to properly judge buddy pages as handlable, so hard/soft offline for buddy pages always fail as "unhandlable page". This is totally regrettable. So let's add is_free_buddy_page() in HWPoisonHandlable(), so that __get_hwpoison_page() returns different return values between buddy pages and unhandlable pages as intended. Link: https://lkml.kernel.org/r/[email protected] Fixes: fcc00621d88b ("mm/hwpoison: retry with shake_page() for unhandlable pages") Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Tony Luck <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-24io_uring: make OP_CLOSE consistent with direct openPavel Begunkov1-1/+51
From recently open/accept are now able to manipulate fixed file table, but it's inconsistent that close can't. Close the gap, keep API same as with open/accept, i.e. via sqe->file_slot. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-24Merge tag 'gpio-fixes-for-v5.15-rc3' of ↵Linus Torvalds4-7/+9
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix a regression in GPIO ACPI on HP ElitePad 1000 G2 where the gpio_set_debounce_timeout() now returns a fatal error if the specific debounce period is not supported by the driver instead of just emitting a warning - fix return values of irq_mask/unmask() callbacks in gpio-uniphier - fix hwirq calculation in gpio-aspeed-sgpio - fix two issues in gpio-rockchip: only make the extended debounce support available for v2 and remove a redundant BIT() usage * tag 'gpio-fixes-for-v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio/rockchip: fix get_direction value handling gpio/rockchip: extended debounce support is only available on v2 gpio: gpio-aspeed-sgpio: Fix wrong hwirq in irq handler. gpio: uniphier: Fix void functions to remove return value gpiolib: acpi: Make set-debounce-timeout failures non fatal
2021-09-24Merge tag 'devprop-5.15-rc3' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework fix from Rafael Wysocki: "Fix software node refcount imbalance on device removal (Laurentiu Tudor)" * tag 'devprop-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: software node: balance refcount for managed software nodes
2021-09-24Merge tag 'acpi-5.15-rc3' of ↵Linus Torvalds4-43/+10
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Revert a recent commit related to memory management that turned out to be problematic (Jia He)" * tag 'acpi-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPI: Add memory semantics to acpi_os_map_memory()"
2021-09-24Merge tag 'arm64-fixes' of ↵Linus Torvalds8-12/+30
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - It turns out that the optimised string routines merged in 5.14 are not safe with in-kernel MTE (KASAN_HW_TAGS) because of reading beyond the end of a string (strcmp, strncmp). Such reading may go across a 16 byte tag granule and cause a tag check fault. When KASAN_HW_TAGS is enabled, use the generic strcmp/strncmp C implementation. - An errata workaround for ThunderX relied on the CPU capabilities being enabled in a specific order. This disappeared with the automatic generation of the cpucaps.h file (sorted alphabetically). Fix it by checking the current CPU only rather than the system-wide capability. - Add system_supports_mte() checks on the kernel entry/exit path and thread switching to avoid unnecessary barriers and function calls on systems where MTE is not supported. - kselftests: skip arm64 tests if the required features are missing. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Restore forced disabling of KPTI on ThunderX kselftest/arm64: signal: Skip tests if required features are missing arm64: Mitigate MTE issues with str{n}cmp() arm64: add MTE supported check to thread switching and syscall entry/exit
2021-09-24Merge tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-clientLinus Torvalds1-2/+2
Pull ceph fix from Ilya Dryomov: "A fix for a potential array out of bounds access from Dan" * tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-client: ceph: fix off by one bugs in unsafe_request_wait()
2021-09-24Merge tag 'fixes_for_v5.15-rc3' of ↵Linus Torvalds2-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc filesystem fixes from Jan Kara: "A for ext2 sleep in atomic context in case of some fs problems and a cleanup of an invalidate_lock initialization" * tag 'fixes_for_v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: fix sleeping in atomic bugs on error mm: Fully initialize invalidate_lock, amend lock class later
2021-09-24Merge branch 'work.init' of ↵Linus Torvalds1-14/+16
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Followups to nodev root stuff from this merge window" * 'work.init' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: init: don't panic if mount_nodev_root failed init/do_mounts.c: Harden split_fs_names() against buffer overflow
2021-09-24Merge tag 'drm-fixes-2021-09-24' of git://anongit.freedesktop.org/drm/drmLinus Torvalds10-32/+57
Pull drm fixes from Dave Airlie: "Quiet week this week, just some i915 and amd fixes, just getting ready for my all nighter maintainer summit! Summary: i915: - Fix ADL-P memory bandwidth parameters - Fix memory corruption due to a double free - Fix memory leak in DMC firmware handling amdgpu: - Update MAINTAINERS entry for powerplay - Fix empty macros - SI DPM fix amdkfd: - SVM fixes - DMA mapping fix" * tag 'drm-fixes-2021-09-24' of git://anongit.freedesktop.org/drm/drm: drm/amdkfd: fix svm_migrate_fini warning drm/amdkfd: handle svm migrate init error drm/amd/pm: Update intermediate power state for SI drm/amdkfd: fix dma mapping leaking warning drm/amdkfd: SVM map to gpus check vma boundary MAINTAINERS: fix up entry for AMD Powerplay drm/amd/display: fix empty debug macros drm/i915: Free all DMC payloads drm/i915: Move __i915_gem_free_object to ttm_bo_destroy drm/i915: Update memory bandwidth parameters
2021-09-24block: hold ->invalidate_lock in blkdev_fallocateMing Lei1-11/+10
When running ->fallocate(), blkdev_fallocate() should hold mapping->invalidate_lock to prevent page cache from being accessed, otherwise stale data may be read in page cache. Without this patch, blktests block/009 fails sometimes. With this patch, block/009 can pass always. Also as Jan pointed out, no pages can be created in the discarded area while you are holding the invalidate_lock, so remove the 2nd truncate_bdev_range(). Cc: Jan Kara <[email protected]> Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24blktrace: Fix uaf in blk_trace access after removing by sysfsZhihao Cheng1-0/+8
There is an use-after-free problem triggered by following process: P1(sda) P2(sdb) echo 0 > /sys/block/sdb/trace/enable blk_trace_remove_queue synchronize_rcu blk_trace_free relay_close rcu_read_lock __blk_add_trace trace_note_tsk (Iterate running_trace_list) relay_close_buf relay_destroy_buf kfree(buf) trace_note(sdb's bt) relay_reserve buf->offset <- nullptr deference (use-after-free) !!! rcu_read_unlock [ 502.714379] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 502.715260] #PF: supervisor read access in kernel mode [ 502.715903] #PF: error_code(0x0000) - not-present page [ 502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0 [ 502.717252] Oops: 0000 [#1] SMP [ 502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360 [ 502.732872] Call Trace: [ 502.733193] __blk_add_trace.cold+0x137/0x1a3 [ 502.733734] blk_add_trace_rq+0x7b/0xd0 [ 502.734207] blk_add_trace_rq_issue+0x54/0xa0 [ 502.734755] blk_mq_start_request+0xde/0x1b0 [ 502.735287] scsi_queue_rq+0x528/0x1140 ... [ 502.742704] sg_new_write.isra.0+0x16e/0x3e0 [ 502.747501] sg_ioctl+0x466/0x1100 Reproduce method: ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127]) ioctl(/dev/sda, BLKTRACESTART) ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127]) ioctl(/dev/sdb, BLKTRACESTART) echo 0 > /sys/block/sdb/trace/enable & // Add delay(mdelay/msleep) before kernel enters blk_trace_free() ioctl$SG_IO(/dev/sda, SG_IO, ...) // Enters trace_note_tsk() after blk_trace_free() returned // Use mdelay in rcu region rather than msleep(which may schedule out) Remove blk_trace from running_list before calling blk_trace_free() by sysfs if blk_trace is at Blktrace_running state. Fixes: c71a896154119f ("blktrace: add ftrace plugin") Signed-off-by: Zhihao Cheng <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24block: don't call rq_qos_ops->done_bio if the bio isn't trackedMing Lei1-1/+1
rq_qos framework is only applied on request based driver, so: 1) rq_qos_done_bio() needn't to be called for bio based driver 2) rq_qos_done_bio() needn't to be called for bio which isn't tracked, such as bios ended from error handling code. Especially in bio_endio(): 1) request queue is referred via bio->bi_bdev->bd_disk->queue, which may be gone since request queue refcount may not be held in above two cases 2) q->rq_qos may be freed in blk_cleanup_queue() when calling into __rq_qos_done_bio() Fix the potential kernel panic by not calling rq_qos_ops->done_bio if the bio isn't tracked. This way is safe because both ioc_rqos_done_bio() and blkcg_iolatency_done_bio() are nop if the bio isn't tracked. Reported-by: Yu Kuai <[email protected]> Cc: [email protected] Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: kill extra checks in io_write()Pavel Begunkov1-3/+0
We don't retry short writes and so we would never get to async setup in io_write() in that case. Thus ret2 > 0 is always false and iov_iter_advance() is never used. Apparently, the same is found by Coverity, which complains on the code. Fixes: cd65869512ab ("io_uring: use iov_iter state save/restore helpers") Reported-by: Dave Jones <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/5b33e61034748ef1022766efc0fb8854cfcf749c.1632500058.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: don't punt files update to io-wq unconditionallyJens Axboe1-5/+2
There's no reason to punt it unconditionally, we just need to ensure that the submit lock grabbing is conditional. Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: put provided buffer meta data under memcg accountingJens Axboe1-1/+1
For each provided buffer, we allocate a struct io_buffer to hold the data associated with it. As a large number of buffers can be provided, account that data with memcg. Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: allow conditional reschedule for intensive iteratorsJens Axboe1-2/+6
If we have a lot of threads and rings, the tctx list can get quite big. This is especially true if we keep creating new threads and rings. Likewise for the provided buffers list. Be nice and insert a conditional reschedule point while iterating the nodes for deletion. Link: https://lore.kernel.org/io-uring/[email protected]/ Reported-by: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: fix potential req refcount underflowHao Xu1-2/+7
For multishot mode, there may be cases like: iowq original context io_poll_add _arm_poll() mask = vfs_poll() is not 0 if mask (2) io_poll_complete() compl_unlock (interruption happens tw queued to original context) io_poll_task_func() compl_lock (3) done = io_poll_complete() is true compl_unlock put req ref (1) if (poll->flags & EPOLLONESHOT) put req ref EPOLLONESHOT flag in (1) may be from (2) or (3), so there are multiple combinations that can cause ref underfow. Let's address it by: - check the return value in (2) as done - change (1) to if (done) in this way, we only do ref put in (1) if 'oneshot flag' is from (2) - do poll.done check in io_poll_task_func(), so that we won't put ref for the second time. Signed-off-by: Hao Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: fix missing set of EPOLLONESHOT for CQ ring overflowHao Xu1-1/+3
We should set EPOLLONESHOT if cqring_fill_event() returns false since io_poll_add() decides to put req or not by it. Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow") Signed-off-by: Hao Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io_uring: fix race between poll completion and cancel_hash insertionHao Xu1-3/+3
If poll arming and poll completion runs in parallel, there maybe races. For instance, run io_poll_add in iowq and io_poll_task_func in original context, then: iowq original context io_poll_add vfs_poll (interruption happens tw queued to original context) io_poll_task_func generate cqe del from cancel_hash[] if !poll.done insert to cancel_hash[] The entry left in cancel_hash[], similar case for fast poll. Fix it by set poll.done = true when del from cancel_hash[]. Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow") Signed-off-by: Hao Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-24io-wq: ensure we exit if thread group is exitingJens Axboe1-1/+2
Dave reports that a coredumping workload gets stuck in 5.15-rc2, and identified the culprit in the Fixes line below. The problem is that relying solely on fatal_signal_pending() to gate whether to exit or not fails miserably if a process gets eg SIGILL sent. Don't exclusively rely on fatal signals, also check if the thread group is exiting. Fixes: 15e20db2e0ce ("io-wq: only exit on fatal signals") Reported-by: Dave Chinner <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-24vfio/ap_ops: Add missed vfio_uninit_group_dev()Jason Gunthorpe1-1/+3
Without this call an xarray entry is leaked when the vfio_ap device is unprobed. It was missed when the below patch was rebased across the dev_set patch. Keep the remove function in the same order as the error unwind in probe. Fixes: eb0feefd4c02 ("vfio/ap_ops: Convert to use vfio_register_group_dev()") Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: Tony Krowiak <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Tony Krowiak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2021-09-24RDMA/usnic: Lock VF with mutex instead of spinlockLeon Romanovsky3-10/+10
Usnic VF doesn't need lock in atomic context to create QPs, so it is safe to use mutex instead of spinlock. Such change fixes the following smatch error. Smatch static checker warning: lib/kobject.c:289 kobject_set_name_vargs() warn: sleeping in atomic context Fixes: 514aee660df4 ("RDMA: Globally allocate and release QP memory") Link: https://lore.kernel.org/r/2a0e295786c127e518ebee8bb7cafcb819a625f6.1631520231.git.leonro@nvidia.com Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Håkon Bugge <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-09-24NIOS2: fix kconfig unmet dependency warning for SERIAL_CORE_CONSOLERandy Dunlap1-1/+2
SERIAL_CORE_CONSOLE depends on TTY so EARLY_PRINTK should also depend on TTY so that it does not select SERIAL_CORE_CONSOLE inadvertently. WARNING: unmet direct dependencies detected for SERIAL_CORE_CONSOLE Depends on [n]: TTY [=n] && HAS_IOMEM [=y] Selected by [y]: - EARLY_PRINTK [=y] Fixes: e8bf5bc776ed ("nios2: add early printk support") Signed-off-by: Randy Dunlap <[email protected]> Cc: Dinh Nguyen <[email protected]> Signed-off-by: Dinh Nguyen <[email protected]>
2021-09-24drivers: net: mhi: fix error path in mhi_net_newlinkDaniele Palmas1-5/+1
Fix double free_netdev when mhi_prepare_for_transfer fails. Fixes: 3ffec6a14f24 ("net: Add mhi-net driver") Signed-off-by: Daniele Palmas <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Reviewed-by: Loic Poulain <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-24smsc95xx: fix stalled rx after link changeAaro Koskinen1-0/+3
After commit 05b35e7eb9a1 ("smsc95xx: add phylib support"), link changes are no longer propagated to usbnet. As a result, rx URB allocation won't happen until there is a packet sent out first (this might never happen, e.g. running just ssh server with a static IP). Fix by triggering usbnet EVENT_LINK_CHANGE. Fixes: 05b35e7eb9a1 ("smsc95xx: add phylib support") Signed-off-by: Aaro Koskinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-24Merge tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme into block-5.15Jens Axboe3-28/+36
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.15: - keep ctrl->namespaces ordered (me) - fix incorrect h2cdata pdu offset accounting in nvme-tcp (Sagi Grimberg) - handled updated hw_queues in nvme-fc more carefully (Daniel Wagner, James Smart)" * tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme: nvme: keep ctrl->namespaces ordered nvme-tcp: fix incorrect h2cdata pdu offset accounting nvme-fc: remove freeze/unfreeze around update_nr_hw_queues nvme-fc: avoid race between time out and tear down nvme-fc: update hardware queues before using them
2021-09-24net: ipv4: Fix rtnexthop len when RTA_FLOW is presentXiao Liang4-11/+14
Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop() to get the length of rtnexthop correct. Fixes: b0f60193632e ("ipv4: Refactor nexthop attributes in fib_dump_info") Signed-off-by: Xiao Liang <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-24net: enetc: fix the incorrect clearing of IF_MODE bitsVladimir Oltean1-2/+1
The enetc phylink .mac_config handler intends to clear the IFMODE field (bits 1:0) of the PM0_IF_MODE register, but incorrectly clears all the other fields instead. For normal operation, the bug was inconsequential, due to the fact that we write the PM0_IF_MODE register in two stages, first in phylink .mac_config (which incorrectly cleared out a bunch of stuff), then we update the speed and duplex to the correct values in phylink .mac_link_up. Judging by the code (not tested), it looks like maybe loopback mode was broken, since this is one of the settings in PM0_IF_MODE which is incorrectly cleared. Fixes: c76a97218dcb ("net: enetc: force the RGMII speed and duplex instead of operating in inband mode") Reported-by: Pavel Machek (CIP) <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-24Merge tag 'irqchip-fixes-5.15-1' of ↵Thomas Gleixner9-17/+69
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Work around a bad GIC integration on a Renesas platform, where the interconnect cannot deal with byte-sized MMIO accesses - Cleanup another Renesas driver abusing the comma operator - Fix a potential GICv4 memory leak on an error path - Make the type of 'size' consistent with the rest of the code in __irq_domain_add() - Fix a regression in the Armada 370-XP IPI path - Fix the build for the obviously unloved goldfish-pic - Some documentation fixes Link: https://lore.kernel.org/r/[email protected]
2021-09-24hwmon: (ltc2947) Properly handle errors when looking for the external clockUwe Kleine-König1-2/+6
The return value of devm_clk_get should in general be propagated to upper layer. In this case the clk is optional, use the appropriate wrapper instead of interpreting all errors as "The optional clk is not available". Signed-off-by: Uwe Kleine-König <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2021-09-24hwmon: (tmp421) fix rounding for negative valuesPaul Fertser1-16/+8
Old code produces -24999 for 0b1110011100000000 input in standard format due to always rounding up rather than "away from zero". Use the common macro for division, unify and simplify the conversion code along the way. Fixes: 9410700b881f ("hwmon: Add driver for Texas Instruments TMP421/422/423 sensor chips") Signed-off-by: Paul Fertser <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2021-09-24hwmon: (tmp421) report /PVLD condition as faultPaul Fertser1-6/+3
For both local and remote sensors all the supported ICs can report an "undervoltage lockout" condition which means the conversion wasn't properly performed due to insufficient power supply voltage and so the measurement results can't be trusted. Fixes: 9410700b881f ("hwmon: Add driver for Texas Instruments TMP421/422/423 sensor chips") Signed-off-by: Paul Fertser <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2021-09-24hwmon: (tmp421) handle I2C errorsPaul Fertser1-10/+28
Function i2c_smbus_read_byte_data() can return a negative error number instead of the data read if I2C transaction failed for whatever reason. Lack of error checking can lead to serious issues on production hardware, e.g. errors treated as temperatures produce spurious critical temperature-crossed-threshold errors in BMC logs for OCP server hardware. The patch was tested with Mellanox OCP Mezzanine card emulating TMP421 protocol for temperature sensing which sometimes leads to I2C protocol error during early boot up stage. Fixes: 9410700b881f ("hwmon: Add driver for Texas Instruments TMP421/422/423 sensor chips") Cc: [email protected] Signed-off-by: Paul Fertser <[email protected]> Link: https://lore.kernel.org/r/[email protected] [groeck: dropped unnecessary line breaks] Signed-off-by: Guenter Roeck <[email protected]>
2021-09-24RDMA/hns: Work around broken constant propagation in gcc 8Jason Gunthorpe1-5/+6
gcc 8.3 and 5.4 throw this: In function 'modify_qp_init_to_rtr', ././include/linux/compiler_types.h:322:38: error: call to '__compiletime_assert_1859' declared with attribute error: FIELD_PREP: value too large for the field _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) [..] drivers/infiniband/hw/hns/hns_roce_common.h:91:52: note: in expansion of macro 'FIELD_PREP' *((__le32 *)ptr + (field_h) / 32) |= cpu_to_le32(FIELD_PREP( \ ^~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write' #define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val) ^~~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_hw_v2.c:4412:2: note: in expansion of macro 'hr_reg_write' hr_reg_write(context, QPC_LP_PKTN_INI, lp_pktn_ini); Because gcc has miscalculated the constantness of lp_pktn_ini: mtu = ib_mtu_enum_to_int(ib_mtu); if (WARN_ON(mtu < 0)) [..] lp_pktn_ini = ilog2(MAX_LP_MSG_LEN / mtu); Since mtu is limited to {256,512,1024,2048,4096} lp_pktn_ini is between 4 and 8 which is compatible with the 4 bit field in the FIELD_PREP. Work around this broken compiler by adding a 'can never be true' constraint on lp_pktn_ini's value which clears out the problem. Fixes: f0cb411aad23 ("RDMA/hns: Use new interface to modify QP context") Link: https://lore.kernel.org/r/[email protected] Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-09-24m68k: Remove set_fs()Christoph Hellwig22-117/+46
Add a m68k-only set_fc helper to set the SFC and DFC registers for the few places that need to override it for special MM operations, but disconnect that from the deprecated kernel-wide set_fs() API. Note that the SFC/DFC registers are context switched, so there is no need to disable preemption. Partially based on an earlier patch from Linus Torvalds <[email protected]>. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Michael Schmitz <[email protected]> Tested-by: Michael Schmitz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Geert Uytterhoeven <[email protected]>