aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-06-04x86/fault: Don't send SIGSEGV twice on SEGV_PKUERRJiashuo Liang1-2/+2
__bad_area_nosemaphore() calls both force_sig_pkuerr() and force_sig_fault() when handling SEGV_PKUERR. This does not cause problems because the second signal is filtered by the legacy_queue() check in __send_signal() because in both cases, the signal is SIGSEGV, the second one seeing that the first one is already pending. This causes the kernel to do unnecessary work so send the signal only once for SEGV_PKUERR. [ bp: Massage commit message. ] Fixes: 9db812dbb29d ("signal/x86: Call force_sig_pkuerr from __bad_area_nosemaphore") Suggested-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Jiashuo Liang <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-06-03x86/setup: Always reserve the first 1M of RAMMike Rapoport3-20/+41
There are BIOSes that are known to corrupt the memory under 1M, or more precisely under 640K because the memory above 640K is anyway reserved for the EGA/VGA frame buffer and BIOS. To prevent usage of the memory that will be potentially clobbered by the kernel, the beginning of the memory is always reserved. The exact size of the reserved area is determined by CONFIG_X86_RESERVE_LOW build time and the "reservelow=" command line option. The reserved range may be from 4K to 640K with the default of 64K. There are also configurations that reserve the entire 1M range, like machines with SandyBridge graphic devices or systems that enable crash kernel. In addition to the potentially clobbered memory, EBDA of unknown size may be as low as 128K and the memory above that EBDA start is also reserved early. It would have been possible to reserve the entire range under 1M unless for the real mode trampoline that must reside in that area. To accommodate placement of the real mode trampoline and keep the memory safe from being clobbered by BIOS, reserve the first 64K of RAM before memory allocations are possible and then, after the real mode trampoline is allocated, reserve the entire range from 0 to 1M. Update trim_snb_memory() and reserve_real_mode() to avoid redundant reservations of the same memory range. Also make sure the memory under 1M is not getting freed by efi_free_boot_services(). [ bp: Massage commit message and comments. ] Fixes: a799c2bd29d1 ("x86/setup: Consolidate early memory reservations") Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Tested-by: Hugh Dickins <[email protected]> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213177 Link: https://lkml.kernel.org/r/[email protected]
2021-06-03x86/alternative: Optimize single-byte NOPs at an arbitrary positionBorislav Petkov1-18/+46
Up until now the assumption was that an alternative patching site would have some instructions at the beginning and trailing single-byte NOPs (0x90) padding. Therefore, the patching machinery would go and optimize those single-byte NOPs into longer ones. However, this assumption is broken on 32-bit when code like hv_do_hypercall() in hyperv_init() would use the ratpoline speculation killer CALL_NOSPEC. The 32-bit version of that macro would align certain insns to 16 bytes, leading to the compiler issuing a one or more single-byte NOPs, depending on the holes it needs to fill for alignment. That would lead to the warning in optimize_nops() to fire: ------------[ cut here ]------------ Not a NOP at 0xc27fb598 WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops.isra.13 due to that function verifying whether all of the following bytes really are single-byte NOPs. Therefore, carve out the NOP padding into a separate function and call it for each NOP range beginning with a single-byte NOP. Fixes: 23c1ad538f4f ("x86/alternatives: Optimize optimize_nops()") Reported-by: Richard Narron <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213301 Link: https://lkml.kernel.org/r/[email protected]
2021-06-03x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()Thomas Gleixner4-74/+3
While digesting the XSAVE-related horrors which got introduced with the supervisor/user split, the recent addition of ENQCMD-related functionality got on the radar and turned out to be similarly broken. update_pasid(), which is only required when X86_FEATURE_ENQCMD is available, is invoked from two places: 1) From switch_to() for the incoming task 2) Via a SMP function call from the IOMMU/SMV code #1 is half-ways correct as it hacks around the brokenness of get_xsave_addr() by enforcing the state to be 'present', but all the conditionals in that code are completely pointless for that. Also the invocation is just useless overhead because at that point it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task and all of this can be handled at return to user space. #2 is broken beyond repair. The comment in the code claims that it is safe to invoke this in an IPI, but that's just wishful thinking. FPU state of a running task is protected by fregs_lock() which is nothing else than a local_bh_disable(). As BH-disabled regions run usually with interrupts enabled the IPI can hit a code section which modifies FPU state and there is absolutely no guarantee that any of the assumptions which are made for the IPI case is true. Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is invoked with a NULL pointer argument, so it can hit a completely unrelated task and unconditionally force an update for nothing. Worse, it can hit a kernel thread which operates on a user space address space and set a random PASID for it. The offending commit does not cleanly revert, but it's sufficient to force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid() code to make this dysfunctional all over the place. Anything more complex would require more surgery and none of the related functions outside of the x86 core code are blatantly wrong, so removing those would be overkill. As nothing enables the PASID bit in the IA32_XSS MSR yet, which is required to make this actually work, this cannot result in a regression except for related out of tree train-wrecks, but they are broken already today. Fixes: 20f0afd1fb3d ("x86/mmu: Allocate/free a PASID") Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2021-06-03dmaengine: idxd: Use cpu_feature_enabled()Borislav Petkov1-2/+2
When testing x86 feature bits, use cpu_feature_enabled() so that build-disabled features can remain off, regardless of what CPUID says. Fixes: 8e50d392652f ("dmaengine: idxd: Add shared workqueue support") Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-By: Vinod Koul <[email protected]> Cc: <[email protected]>
2021-05-31x86/thermal: Fix LVT thermal setup for SMI delivery modeBorislav Petkov3-5/+23
There are machines out there with added value crap^WBIOS which provide an SMI handler for the local APIC thermal sensor interrupt. Out of reset, the BSP on those machines has something like 0x200 in that APIC register (timestamps left in because this whole issue is timing sensitive): [ 0.033858] read lvtthmr: 0x330, val: 0x200 which means: - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled - bits [10:8] have 010b which means SMI delivery mode. Now, later during boot, when the kernel programs the local APIC, it soft-disables it temporarily through the spurious vector register: setup_local_APIC: ... /* * If this comes from kexec/kcrash the APIC might be enabled in * SPIV. Soft disable it before doing further initialization. */ value = apic_read(APIC_SPIV); value &= ~APIC_SPIV_APIC_ENABLED; apic_write(APIC_SPIV, value); which means (from the SDM): "10.4.7.2 Local APIC State After It Has Been Software Disabled ... * The mask bits for all the LVT entries are set. Attempts to reset these bits will be ignored." And this happens too: [ 0.124111] APIC: Switch to symmetric I/O mode setup [ 0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0 [ 0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0 This results in CPU 0 soft lockups depending on the placement in time when the APIC soft-disable happens. Those soft lockups are not 100% reproducible and the reason for that can only be speculated as no one tells you what SMM does. Likely, it confuses the SMM code that the APIC is disabled and the thermal interrupt doesn't doesn't fire at all, leading to CPU 0 stuck in SMM forever... Now, before 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()") due to how the APIC_LVTTHMR was read before APIC initialization in mcheck_intel_therm_init(), it would read the value with the mask bit 16 clear and then intel_init_thermal() would replicate it onto the APs and all would be peachy - the thermal interrupt would remain enabled. But that commit moved that reading to a later moment in intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too late and with its interrupt mask bit set. Thus, revert back to the old behavior of reading the thermal LVT register before the APIC gets initialized. Fixes: 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()") Reported-by: James Feeney <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: <[email protected]> Cc: Zhang Rui <[email protected]> Cc: Srinivas Pandruvada <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-05-29x86/apic: Mark _all_ legacy interrupts when IO/APIC is missingThomas Gleixner3-0/+22
PIC interrupts do not support affinity setting and they can end up on any online CPU. Therefore, it's required to mark the associated vectors as system-wide reserved. Otherwise, the corresponding irq descriptors are copied to the secondary CPUs but the vectors are not marked as assigned or reserved. This works correctly for the IO/APIC case. When the IO/APIC is disabled via config, kernel command line or lack of enumeration then all legacy interrupts are routed through the PIC, but nothing marks them as system-wide reserved vectors. As a consequence, a subsequent allocation on a secondary CPU can result in allocating one of these vectors, which triggers the BUG() in apic_update_vector() because the interrupt descriptor slot is not empty. Imran tried to work around that by marking those interrupts as allocated when a CPU comes online. But that's wrong in case that the IO/APIC is available and one of the legacy interrupts, e.g. IRQ0, has been switched to PIC mode because then marking them as allocated will fail as they are already marked as system vectors. Stay consistent and update the legacy vectors after attempting IO/APIC initialization and mark them as system vectors in case that no IO/APIC is available. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Imran Khan <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2021-05-23Linux 5.13-rc3Linus Torvalds1-1/+1
2021-05-23Merge tag 'perf-urgent-2021-05-23' of ↵Linus Torvalds4-9/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Two perf fixes: - Do not check the LBR_TOS MSR when setting up unrelated LBR MSRs as this can cause malfunction when TOS is not supported - Allocate the LBR XSAVE buffers along with the DS buffers upfront because allocating them when adding an event can deadlock" * tag 'perf-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
2021-05-23Merge tag 'locking-urgent-2021-05-23' of ↵Linus Torvalds5-13/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Two locking fixes: - Invoke the lockdep tracepoints in the correct place so the ordering is correct again - Don't leave the mutex WAITER bit stale when the last waiter is dropping out early due to a signal as that forces all subsequent lock operations needlessly into the slowpath until it's cleaned up again" * tag 'locking-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal locking/lockdep: Correct calling tracepoints
2021-05-23Merge tag 'irq-urgent-2021-05-23' of ↵Linus Torvalds5-11/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A few fixes for irqchip drivers: - Allocate interrupt descriptors correctly on Mainstone PXA when SPARSE_IRQ is enabled; otherwise the interrupt association fails - Make the APPLE AIC chip driver depend on APPLE - Remove redundant error output on devm_ioremap_resource() failure" * tag 'irq-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: Remove redundant error printing irqchip/apple-aic: APPLE_AIC should depend on ARCH_APPLE ARM: PXA: Fix cplds irqdesc allocation when using legacy mode
2021-05-23Merge tag 'x86_urgent_for_v5.13_rc3' of ↵Linus Torvalds3-57/+92
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Fix how SEV handles MMIO accesses by forwarding potential page faults instead of killing the machine and by using the accessors with the exact functionality needed when accessing memory. - Fix a confusion with Clang LTO compiler switches passed to the it - Handle the case gracefully when VMGEXIT has been executed in userspace * tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev-es: Use __put_user()/__get_user() for data accesses x86/sev-es: Forward page-faults which happen during emulation x86/sev-es: Don't return NULL from sev_es_get_ghcb() x86/build: Fix location of '-plugin-opt=' flags x86/sev-es: Invalidate the GHCB after completing VMGEXIT x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
2021-05-23Merge tag 'powerpc-5.13-4' of ↵Linus Torvalds5-46/+82
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix breakage of strace (and other ptracers etc.) when using the new scv ABI (Power9 or later with glibc >= 2.33). - Fix early_ioremap() on 64-bit, which broke booting on some machines. Thanks to Dmitry V. Levin, Nicholas Piggin, Alexey Kardashevskiy, and Christophe Leroy. * tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls powerpc: Fix early setup to make early_ioremap() work
2021-05-22Merge tag 'kbuild-fixes-v5.13' of ↵Linus Torvalds4-28/+32
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix short log indentation for tools builds - Fix dummy-tools to adjust to the latest stackprotector check * tag 'kbuild-fixes-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: dummy-tools: adjust to stricter stackprotector check scripts/jobserver-exec: Fix a typo ("envirnoment") tools build: Fix quiet cmd indentation
2021-05-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds18-75/+74
Merge misc fixes from Andrew Morton: "10 patches. Subsystems affected by this patch series: mm (pagealloc, gup, kasan, and userfaultfd), ipc, selftests, watchdog, bitmap, procfs, and lib" * emailed patches from Andrew Morton <[email protected]>: userfaultfd: hugetlbfs: fix new flag usage in error path lib: kunit: suppress a compilation warning of frame size proc: remove Alexey from MAINTAINERS linux/bits.h: fix compilation error with GENMASK watchdog: reliable handling of timestamps kasan: slab: always reset the tag in get_freepointer_safe() tools/testing/selftests/exec: fix link error ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry Revert "mm/gup: check page posion status for coredump." mm/shuffle: fix section mismatch warning
2021-05-22userfaultfd: hugetlbfs: fix new flag usage in error pathMike Kravetz2-15/+15
In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") the use of PagePrivate to indicate a reservation count should be restored at free time was changed to the hugetlb specific flag HPageRestoreReserve. Changes to a userfaultfd error path as well as a VM_BUG_ON() in remove_inode_hugepages() were overlooked. Users could see incorrect hugetlb reserve counts if they experience an error with a UFFDIO_COPY operation. Specifically, this would be the result of an unlikely copy_huge_page_from_user error. There is not an increased chance of hitting the VM_BUG_ON. Link: https://lkml.kernel.org/r/[email protected] Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Mina Almasry <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Mina Almasry <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22lib: kunit: suppress a compilation warning of frame sizeZhen Lei1-0/+1
lib/bitfield_kunit.c: In function `test_bitfields_constants': lib/bitfield_kunit.c:93:1: warning: the frame size of 7456 bytes is larger than 2048 bytes [-Wframe-larger-than=] } ^ As the description of BITFIELD_KUNIT in lib/Kconfig.debug, it "Only useful for kernel devs running the KUnit test harness, and not intended for inclusion into a production build". Therefore, it is not worth modifying variable 'test_bitfields_constants' to clear this warning. Just suppress it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Vitor Massaru Iha <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22proc: remove Alexey from MAINTAINERSAlexey Dobriyan1-1/+0
People Cc me and I don't have time. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22linux/bits.h: fix compilation error with GENMASKRikard Falkeborn5-10/+20
GENMASK() has an input check which uses __builtin_choose_expr() to enable a compile time sanity check of its inputs if they are known at compile time. However, it turns out that __builtin_constant_p() does not always return a compile time constant [0]. It was thought this problem was fixed with gcc 4.9 [1], but apparently this is not the case [2]. Switch to use __is_constexpr() instead which always returns a compile time constant, regardless of its inputs. Link: https://lore.kernel.org/lkml/[email protected] [0] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19449 [1] Link: https://lore.kernel.org/lkml/[email protected] [2] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rikard Falkeborn <[email protected]> Reported-by: Tetsuo Handa <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Yury Norov <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22watchdog: reliable handling of timestampsPetr Mladek1-14/+20
Commit 9bf3bc949f8a ("watchdog: cleanup handling of false positives") tried to handle a virtual host stopped by the host a more straightforward and cleaner way. But it introduced a risk of false softlockup reports. The virtual host might be stopped at any time, for example between kvm_check_and_clear_guest_paused() and is_softlockup(). As a result, is_softlockup() might read the updated jiffies and detects a softlockup. A solution might be to put back kvm_check_and_clear_guest_paused() after is_softlockup() and detect it. But it would put back the cycle that complicates the logic. In fact, the handling of all the timestamps is not reliable. The code does not guarantee when and how many times the timestamps are read. For example, "period_ts" might be touched anytime also from NMI and re-read in is_softlockup(). It works just by chance. Fix all the problems by making the code even more explicit. 1. Make sure that "now" and "period_ts" timestamps are read only once. They might be changed at anytime by NMI or when the virtual guest is stopped by the host. Note that "now" timestamp does this implicitly because "jiffies" is marked volatile. 2. "now" time must be read first. The state of "period_ts" will decide whether it will be used or the period will get restarted. 3. kvm_check_and_clear_guest_paused() must be called before reading "period_ts". It touches the variable when the guest was stopped. As a result, "now" timestamp is used only when the watchdog was not touched and the guest not stopped in the meantime. "period_ts" is restarted in all other situations. Link: https://lkml.kernel.org/r/YKT55gw+RZfyoFf7@alley Fixes: 9bf3bc949f8aeefeacea4b ("watchdog: cleanup handling of false positives") Signed-off-by: Petr Mladek <[email protected]> Reported-by: Sergey Senozhatsky <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22kasan: slab: always reset the tag in get_freepointer_safe()Alexander Potapenko1-0/+1
With CONFIG_DEBUG_PAGEALLOC enabled, the kernel should also untag the object pointer, as done in get_freepointer(). Failing to do so reportedly leads to SLUB freelist corruptions that manifest as boot-time crashes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Cc: Marco Elver <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Elliot Berman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22tools/testing/selftests/exec: fix link errorYang Yingliang1-3/+3
Fix the link error by adding '-static': gcc -Wall -Wl,-z,max-page-size=0x1000 -pie load_address.c -o /home/yang/linux/tools/testing/selftests/exec/load_address_4096 /usr/bin/ld: /tmp/ccopEGun.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `stderr@@GLIBC_2.17' which may bind externally can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: /tmp/ccopEGun.o(.text+0x158): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `stderr@@GLIBC_2.17' /usr/bin/ld: final link failed: bad value collect2: error: ld returned 1 exit status make: *** [Makefile:25: tools/testing/selftests/exec/load_address_4096] Error 1 Link: https://lkml.kernel.org/r/[email protected] Fixes: 206e22f01941 ("tools/testing/selftests: add self-test for verifying load alignment") Signed-off-by: Yang Yingliang <[email protected]> Cc: Chris Kennelly <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiryVarad Gautam3-6/+12
do_mq_timedreceive calls wq_sleep with a stack local address. The sender (do_mq_timedsend) uses this address to later call pipelined_send. This leads to a very hard to trigger race where a do_mq_timedreceive call might return and leave do_mq_timedsend to rely on an invalid address, causing the following crash: RIP: 0010:wake_q_add_safe+0x13/0x60 Call Trace: __x64_sys_mq_timedsend+0x2a9/0x490 do_syscall_64+0x80/0x680 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f5928e40343 The race occurs as: 1. do_mq_timedreceive calls wq_sleep with the address of `struct ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it holds a valid `struct ext_wait_queue *` as long as the stack has not been overwritten. 2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call __pipelined_op. 3. Sender calls __pipelined_op::smp_store_release(&this->state, STATE_READY). Here is where the race window begins. (`this` is `ewq_addr`.) 4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it will see `state == STATE_READY` and break. 5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's stack. (Although the address may not get overwritten until another function happens to touch it, which means it can persist around for an indefinite time.) 6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a `struct ext_wait_queue *`, and uses it to find a task_struct to pass to the wake_q_add_safe call. In the lucky case where nothing has overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct. In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a bogus address as the receiver's task_struct causing the crash. do_mq_timedsend::__pipelined_op() should not dereference `this` after setting STATE_READY, as the receiver counterpart is now free to return. Change __pipelined_op to call wake_q_add_safe on the receiver's task_struct returned by get_task_struct, instead of dereferencing `this` which sits on the receiver's stack. As Manfred pointed out, the race potentially also exists in ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare. Fix those in the same way. Link: https://lkml.kernel.org/r/[email protected] Fixes: c5b2cbdbdac563 ("ipc/mqueue.c: update/document memory barriers") Fixes: 8116b54e7e23ef ("ipc/sem.c: document and update memory barriers") Fixes: 0d97a82ba830d8 ("ipc/msg.c: update and document memory barriers") Signed-off-by: Varad Gautam <[email protected]> Reported-by: Matthias von Faber <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Acked-by: Manfred Spraul <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22Revert "mm/gup: check page posion status for coredump."Michal Hocko2-24/+0
While reviewing [1] I came across commit d3378e86d182 ("mm/gup: check page posion status for coredump.") and noticed that this patch is broken in two ways. First it doesn't really prevent hwpoison pages from being dumped because hwpoison pages can be marked asynchornously at any time after the check. Secondly, and more importantly, the patch introduces a ref count leak because get_dump_page takes a reference on the page which is not released. It also seems that the patch was merged incorrectly because there were follow up changes not included as well as discussions on how to address the underlying problem [2] Therefore revert the original patch. Link: http://lkml.kernel.org/r/[email protected] [1] Link: http://lkml.kernel.org/r/[email protected] [2] Link: https://lkml.kernel.org/r/[email protected] Fixes: d3378e86d182 ("mm/gup: check page posion status for coredump.") Signed-off-by: Michal Hocko <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Aili Yao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22mm/shuffle: fix section mismatch warningArnd Bergmann1-2/+2
clang sometimes decides not to inline shuffle_zone(), but it calls a __meminit function. Without the extra __meminit annotation we get this warning: WARNING: modpost: vmlinux.o(.text+0x2a86d4): Section mismatch in reference from the function shuffle_zone() to the function .meminit.text:__shuffle_zone() The function shuffle_zone() references the function __meminit __shuffle_zone(). This is often because shuffle_zone lacks a __meminit annotation or the annotation of __shuffle_zone is wrong. shuffle_free_memory() did not show the same problem in my tests, but it could happen in theory as well, so mark both as __meminit. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Wei Yang <[email protected]> Cc: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-22Merge tag 'block-5.13-2021-05-22' of git://git.kernel.dk/linux-blockLinus Torvalds7-26/+28
Pull block fixes from Jens Axboe: - Fix BLKRRPART and deletion race (Gulam, Christoph) - NVMe pull request (Christoph): - nvme-tcp corruption and timeout fixes (Sagi Grimberg, Keith Busch) - nvme-fc teardown fix (James Smart) - nvmet/nvme-loop memory leak fixes (Wu Bo)" * tag 'block-5.13-2021-05-22' of git://git.kernel.dk/linux-block: block: fix a race between del_gendisk and BLKRRPART block: prevent block device lookups at the beginning of del_gendisk nvme-fc: clear q_live at beginning of association teardown nvme-tcp: rerun io_work if req_list is not empty nvme-tcp: fix possible use-after-completion nvme-loop: fix memory leak in nvme_loop_create_ctrl() nvmet: fix memory leak in nvmet_alloc_ctrl()
2021-05-22Merge tag 'io_uring-5.13-2021-05-22' of git://git.kernel.dk/linux-blockLinus Torvalds1-7/+7
Pull io_uring fixes from Jens Axboe: "One fix for a regression with poll in this merge window, and another just hardens the io-wq exit path a bit" * tag 'io_uring-5.13-2021-05-22' of git://git.kernel.dk/linux-block: io_uring: fortify tctx/io_wq cleanup io_uring: don't modify req->poll for rw
2021-05-22Merge tag 'for-linus-5.13b-rc3-tag' of ↵Linus Torvalds3-15/+29
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a fix for a boot regression when running as PV guest on hardware without NX support - a small series fixing a bug in the Xen pciback driver when configuring a PCI card with multiple virtual functions * tag 'for-linus-5.13b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: reconfigure also from backend watch handler xen-pciback: redo VF placement in the virtual topology x86/Xen: swap NX determination and GDT setup on BSP
2021-05-21Merge tag 'xfs-5.13-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds3-28/+78
Pull xfs fixes from Darrick Wong: - Fix some math errors in the realtime allocator when extent size hints are applied. - Fix unnecessary short writes to realtime files when free space is fragmented. - Fix a crash when using scrub tracepoints. - Restore ioctl uapi definitions that were accidentally removed in 5.13-rc1. * tag 'xfs-5.13-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: restore old ioctl definitions xfs: fix deadlock retry tracepoint arguments xfs: retry allocations when locality-based search fails xfs: adjust rt allocation minlen when extszhint > rtextsize
2021-05-21Merge tag 'for-5.13-rc2-tag' of ↵Linus Torvalds7-13/+49
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes: - fix unaligned compressed writes in zoned mode - fix false positive lockdep warning when cloning inline extent - remove wrong BUG_ON in tree-log error handling" * tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix parallel compressed writes btrfs: zoned: pass start block to btrfs_use_zone_append btrfs: do not BUG_ON in link_to_fixup_dir btrfs: release path before starting transaction when cloning inline extent
2021-05-21Merge tag '5.13-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds7-32/+55
Pull cifs fixes from Steve French: "Seven smb3 fixes: one for stable, three others fix problems found in testing handle leases, and a compounded request fix" * tag '5.13-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6: Fix KASAN identified use-after-free issue. Defer close only when lease is enabled. Fix kernel oops when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. cifs: Fix inconsistent indenting cifs: fix memory leak in smb2_copychunk_range SMB3: incorrect file id in requests compounded with open cifs: remove deadstore in cifs_close_all_deferred_files()
2021-05-21Merge tag 'gpio-fixes-for-v5.13-rc3' of ↵Linus Torvalds3-12/+2
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - add missing MODULE_DEVICE_TABLE in gpio-cadence - fix a kernel doc validator error in gpio-xilinx - don't set parent IRQ affinity in gpio-tegra186 * tag 'gpio-fixes-for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: tegra186: Don't set parent IRQ affinity gpio: xilinx: Correct kernel doc for xgpio_probe() gpio: cadence: Add missing MODULE_DEVICE_TABLE
2021-05-21Merge tag 'mmc-v5.13-rc1' of ↵Linus Torvalds2-3/+11
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC host fixes from Ulf Hansson: - Fix SD-card detection on Intel NUC10i3FNK4 (GL9755) - Replace WARN_ONCE with dev_warn_once for scatterlist offsets - Extend check of scatterlist size alignment with SD_IO_RW_EXTENDED * tag 'mmc-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-pci-gli: increase 1.8V regulator wait mmc: meson-gx: also check SD_IO_RW_EXTENDED for scatterlist size alignment mmc: meson-gx: make replace WARN_ONCE with dev_warn_once about scatterlist offset alignment
2021-05-21Merge tag 'devicetree-fixes-for-5.13-2' of ↵Linus Torvalds11-25/+13
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Another batch of removing unneeded type references in schemas - Fix some out of date filename references - Convert renesas,drif schema to use DT graph schema * tag 'devicetree-fixes-for-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: More removals of type references on common properties dt-bindings: media: renesas,drif: Use graph schema leds: Fix reference file name of documentation dt-bindings: phy: cadence-torrent: update reference file of docs
2021-05-21Merge branch 'for-v5.13-rc3' of ↵Linus Torvalds12-73/+79
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo fix from Eric Biederman: "During the merge window an issue with si_perf and the siginfo ABI came up. The alpha and sparc siginfo structure layout had changed with the addition of SIGTRAP TRAP_PERF and the new field si_perf. The reason only alpha and sparc were affected is that they are the only architectures that use si_trapno. Looking deeper it was discovered that si_trapno is used for only a few select signals on alpha and sparc, and that none of the other _sigfault fields past si_addr are used at all. Which means technically no regression on alpha and sparc. While the alignment concerns might be dismissed the abuse of si_errno by SIGTRAP TRAP_PERF does have the potential to cause regressions in existing userspace. While we still have time before userspace starts using and depending on the new definition siginfo for SIGTRAP TRAP_PERF this set of changes cleans up siginfo_t. - The si_trapno field is demoted from magic alpha and sparc status and made an ordinary union member of the _sigfault member of siginfo_t. Without moving it of course. - si_perf is replaced with si_perf_data and si_perf_type ending the abuse of si_errno. - Unnecessary additions to signalfd_siginfo are removed" * 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo signal: Deliver all of the siginfo perf data in _perf signal: Factor force_sig_perf out of perf_sigtrap signal: Implement SIL_FAULT_TRAPNO siginfo: Move si_trapno inside the union inside _si_fault
2021-05-21Merge tag 'modules-for-v5.13-rc3' of ↵Linus Torvalds1-6/+11
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module fix from Jessica Yu: "When CONFIG_MODULE_UNLOAD=n, module exit sections get sorted into the init region of the module in order to satisfy the requirements of jump_labels and static_calls. Previously, the exit section check was done in module_init_section(), but the solution there is not completely arch-indepedent as ARM is a special case and supplies its own module_init_section() function. Instead of pushing this logic further to the arch-specific code, switch to an arch-independent solution to check for module exit sections in the core module loader code in layout_sections() instead" * tag 'modules-for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: check for exit sections in layout_sections() instead of module_init_section()
2021-05-21Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds3-5/+12
Pull OpenRISC fixes from Stafford Horne: "A few fixes that came in around the time of the merge window" * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: Define memory barrier mb openrisc: mm/init.c: remove unused variable 'end' in paging_init() openrisc: mm/init.c: remove unused memblock_region variable in map_ram() openrisc: Fix a memory leak
2021-05-21xen-pciback: reconfigure also from backend watch handlerJan Beulich1-5/+17
When multiple PCI devices get assigned to a guest right at boot, libxl incrementally populates the backend tree. The writes for the first of the devices trigger the backend watch. In turn xen_pcibk_setup_backend() will set the XenBus state to Initialised, at which point no further reconfigures would happen unless a device got hotplugged. Arrange for reconfigure to also get triggered from the backend watch handler. Signed-off-by: Jan Beulich <[email protected]> Cc: [email protected] Reviewed-by: Boris Ostrovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2021-05-21xen-pciback: redo VF placement in the virtual topologyJan Beulich1-6/+8
The commit referenced below was incomplete: It merely affected what would get written to the vdev-<N> xenstore node. The guest would still find the function at the original function number as long as __xen_pcibk_get_pci_dev() wouldn't be in sync. The same goes for AER wrt __xen_pcibk_get_pcifront_dev(). Undo overriding the function to zero and instead make sure that VFs at function zero remain alone in their slot. This has the added benefit of improving overall capacity, considering that there's only a total of 32 slots available right now (PCI segment and bus can both only ever be zero at present). Fixes: 8a5248fe10b1 ("xen PV passthru: assign SR-IOV virtual functions to separate virtual slots") Signed-off-by: Jan Beulich <[email protected]> Cc: [email protected] Reviewed-by: Boris Ostrovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2021-05-21x86/Xen: swap NX determination and GDT setup on BSPJan Beulich1-4/+4
xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables. For this to work when NX is not available, x86_configure_nx() needs to be called first. [jgross] Note that this is a revert of 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established"), which is possible now that we no longer support running as PV guest in 32-bit mode. Cc: <stable.vger.kernel.org> # 5.9 Fixes: 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established") Reported-by: Olaf Hering <[email protected]> Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2021-05-20Merge tag 'drm-fixes-2021-05-21-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds29-192/+191
Pull drm fixes from Dave Airlie: "Usual collection, mostly amdgpu and some i915 regression fixes. I nearly managed to hose my build/sign machine this week, but I recovered it just in time, and I even got clang12 built. dma-buf: - WARN fix amdgpu: - Fix downscaling ratio on DCN3.x - Fix for non-4K pages - PCO/RV compute hang fix - Dongle fix - Aldebaran codec query support - Refcount leak fix - Use after free fix - Navi12 golden settings updates - GPU reset fixes radeon: - Fix for imported BO handling i915: - Pin the L-shape quirked object as unshrinkable to fix crashes - Disable HiZ Raw Stall Optimization on broken gen7 to fix glitches, gfx corruption - GVT: Move mdev attribute groups into kvmgt module to fix kconfig deps issue exynos: - Correct kerneldoc of fimd_shadow_protect_win function - Drop redundant error messages" * tag 'drm-fixes-2021-05-21-1' of git://anongit.freedesktop.org/drm/drm: dma-buf: fix unintended pin/unpin warnings drm/amdgpu: stop touching sched.ready in the backend drm/amd/amdgpu: fix a potential deadlock in gpu reset drm/amdgpu: update sdma golden setting for Navi12 drm/amdgpu: update gc golden setting for Navi12 drm/amdgpu: Fix a use-after-free drm/amdgpu: add video_codecs query support for aldebaran drm/amd/amdgpu: fix refcount leak drm/amd/display: Disconnect non-DP with no EDID drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang drm/amdgpu: Fix GPU TLB update error when PAGE_SIZE > AMDGPU_PAGE_SIZE drm/radeon: use the dummy page for GART if needed drm/amd/display: Use the correct max downscaling value for DCN3.x family drm/i915/gt: Disable HiZ Raw Stall Optimization on broken gen7 drm/i915/gem: Pin the L-shape quirked object as unshrinkable drm/exynos/decon5433: Remove redundant error printing in exynos5433_decon_probe() drm/exynos: Remove redundant error printing in exynos_dsi_probe() drm/exynos: correct exynos_drm_fimd kerneldoc drm/i915/gvt: Move mdev attribute groups into kvmgt module
2021-05-21Merge tag 'amd-drm-fixes-5.13-2021-05-19' of ↵Dave Airlie16-35/+54
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.13-2021-05-19: amdgpu: - Fix downscaling ratio on DCN3.x - Fix for non-4K pages - PCO/RV compute hang fix - Dongle fix - Aldebaran codec query support - Refcount leak fix - Use after free fix - Navi12 golden settings updates - GPU reset fixes radeon: - Fix for imported BO handling Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-21Merge tag 'drm-intel-fixes-2021-05-20' of ↵Dave Airlie9-145/+129
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.13-rc3: - Pin the L-shape quirked object as unshrinkable to fix crashes - Disable HiZ Raw Stall Optimization on broken gen7 to fix glitches, gfx corruption - GVT: Move mdev attribute groups into kvmgt module to fix kconfig deps issue Signed-off-by: Dave Airlie <[email protected]> From: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-21Merge tag 'drm-misc-fixes-2021-05-20' of ↵Dave Airlie1-5/+5
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Just a single fix for a dma-buf related WARN Signed-off-by: Dave Airlie <[email protected]> From: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/20210520140808.ds6bk6i3oarmiea6@gilmour
2021-05-21Merge tag 'exynos-drm-fixes-for-v5.13-rc3' of ↵Dave Airlie3-7/+3
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes Fixup - Correct kerneldoc of fimd_shadow_protect_win function. Cleanup - Drop redundant error messages. Signed-off-by: Dave Airlie <[email protected]> From: Inki Dae <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-20Merge tag 'arm-soc-fixes-5.13-1' of ↵Linus Torvalds24-29/+196
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Only a small number of fixes so far, including some that I had applied during the merge window, so this is based on the original merge of the other branches. - The largest change is a fix for a reference counting bug in the AMD TEE driver. - Neil Armstrong now co-maintains Amlogic SoC support - Two build warning fixes for renesas device tree files - A sign expansion bug for optee - A DT binding fix for a mismerge" * tag 'arm-soc-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: npcm: wpcm450: select interrupt controller driver MAINTAINERS: ARM/Amlogic SoCs: add Neil as primary maintainer tee: amdtee: unload TA only when its refcount becomes 0 dt-bindings: nvmem: mediatek: remove duplicate mt8192 line firmware: arm_scmi: Remove duplicate declaration of struct scmi_protocol_handle firmware: arm_scpi: Prevent the ternary sign expansion bug arm64: dts: renesas: Add port@0 node for all CSI-2 nodes to dtsi arm64: dts: renesas: aistarvision-mipi-adapter-2.1: Fix CSI40 ports
2021-05-20Merge branch 'urgent.2021.05.20a' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull kcsan fix from Paul McKenney: "Fix for a regression introduced in this merge window by commit e36299efe7d7 ("kcsan, debugfs: Move debugfs file creation out of early init"). The regression is not easy to trigger, requiring a KCSAN build using clang with CONFIG_LTO_CLANG=y. The fix is to simply make the kcsan_debugfs_init() function's type initcall-compatible. This has been posted to the relevant mailing lists:" * 'urgent.2021.05.20a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: kcsan: Fix debugfs initcall return type
2021-05-20Merge tag 'scsi-fixes' of ↵Linus Torvalds10-19/+36
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Eight small fixes, all in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: pm80xx: Fix drives missing during rmmod/insmod loop scsi: qla2xxx: Fix error return code in qla82xx_write_flash_dword() scsi: qedf: Add pointer checks in qedf_update_link_speed() scsi: ufs: core: Increase the usable queue depth scsi: BusLogic: Fix 64-bit system enumeration error for Buslogic scsi: ufs: ufs-mediatek: Fix power down spec violation
2021-05-20Merge tag 'for-5.13/dm-fixes' of ↵Linus Torvalds2-48/+39
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix a couple DM snapshot target crashes exposed by user-error. - Fix DM integrity target to not use discard optimization, introduced during 5.13 merge, when recalulating. - Fix some sparse warnings in DM integrity target. * tag 'for-5.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm integrity: fix sparse warnings dm integrity: revert to not using discard filler when recalulating dm snapshot: fix crash with transient storage and zero chunk size dm snapshot: fix a crash when an origin has no snapshots
2021-05-20Fix KASAN identified use-after-free issue.Rohith Surabattula2-9/+23
[ 612.157429] ================================================================== [ 612.158275] BUG: KASAN: use-after-free in process_one_work+0x90/0x9b0 [ 612.158801] Read of size 8 at addr ffff88810a31ca60 by task kworker/2:9/2382 [ 612.159611] CPU: 2 PID: 2382 Comm: kworker/2:9 Tainted: G OE 5.13.0-rc2+ #98 [ 612.159623] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 [ 612.159640] Workqueue: 0x0 (deferredclose) [ 612.159669] Call Trace: [ 612.159685] dump_stack+0xbb/0x107 [ 612.159711] print_address_description.constprop.0+0x18/0x140 [ 612.159733] ? process_one_work+0x90/0x9b0 [ 612.159743] ? process_one_work+0x90/0x9b0 [ 612.159754] kasan_report.cold+0x7c/0xd8 [ 612.159778] ? lock_is_held_type+0x80/0x130 [ 612.159789] ? process_one_work+0x90/0x9b0 [ 612.159812] kasan_check_range+0x145/0x1a0 [ 612.159834] process_one_work+0x90/0x9b0 [ 612.159877] ? pwq_dec_nr_in_flight+0x110/0x110 [ 612.159914] ? spin_bug+0x90/0x90 [ 612.159967] worker_thread+0x3b6/0x6c0 [ 612.160023] ? process_one_work+0x9b0/0x9b0 [ 612.160038] kthread+0x1dc/0x200 [ 612.160051] ? kthread_create_worker_on_cpu+0xd0/0xd0 [ 612.160092] ret_from_fork+0x1f/0x30 [ 612.160399] Allocated by task 2358: [ 612.160757] kasan_save_stack+0x1b/0x40 [ 612.160768] __kasan_kmalloc+0x9b/0xd0 [ 612.160778] cifs_new_fileinfo+0xb0/0x960 [cifs] [ 612.161170] cifs_open+0xadf/0xf20 [cifs] [ 612.161421] do_dentry_open+0x2aa/0x6b0 [ 612.161432] path_openat+0xbd9/0xfa0 [ 612.161441] do_filp_open+0x11d/0x230 [ 612.161450] do_sys_openat2+0x115/0x240 [ 612.161460] __x64_sys_openat+0xce/0x140 When mod_delayed_work is called to modify the delay of pending work, it might return false and queue a new work when pending work is already scheduled or when try to grab pending work failed. So, Increase the reference count when new work is scheduled to avoid use-after-free. Signed-off-by: Rohith Surabattula <[email protected]> Signed-off-by: Steve French <[email protected]>