aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-03-07xen/pvcalls: use alloc/free_pages_exact()Juergen Gross1-4/+4
Instead of __get_free_pages() and free_pages() use alloc_pages_exact() and free_pages_exact(). This is in preparation of a change of gnttab_end_foreign_access() which will prohibit use of high-order pages. This is part of CVE-2022-23041 / XSA-396. Reported-by: Simon Gaiser <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- V4: - new patch
2022-03-07xen/9p: use alloc/free_pages_exact()Juergen Gross1-8/+6
Instead of __get_free_pages() and free_pages() use alloc_pages_exact() and free_pages_exact(). This is in preparation of a change of gnttab_end_foreign_access() which will prohibit use of high-order pages. By using the local variable "order" instead of ring->intf->ring_order in the error path of xen_9pfs_front_alloc_dataring() another bug is fixed, as the error path can be entered before ring->intf->ring_order is being set. By using alloc_pages_exact() the size in bytes is specified for the allocation, which fixes another bug for the case of order < (PAGE_SHIFT - XEN_PAGE_SHIFT). This is part of CVE-2022-23041 / XSA-396. Reported-by: Simon Gaiser <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- V4: - new patch
2022-03-07xen/usb: don't use gnttab_end_foreign_access() in xenhcd_gnttab_done()Juergen Gross1-8/+18
The usage of gnttab_end_foreign_access() in xenhcd_gnttab_done() is not safe against a malicious backend, as the backend could keep the I/O page mapped and modify it even after the granted memory page is being used for completely other purposes in the local system. So replace that use case with gnttab_try_end_foreign_access() and disable the PV host adapter in case the backend didn't stop using the granted page. In xenhcd_urb_request_done() immediately return in case of setting the device state to "error" instead of looking into further backend responses. Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- V2: - use gnttab_try_end_foreign_access()
2022-03-07xen: remove gnttab_query_foreign_access()Juergen Gross2-27/+0
Remove gnttab_query_foreign_access(), as it is unused and unsafe to use. All previous use cases assumed a grant would not be in use after gnttab_query_foreign_access() returned 0. This information is useless in best case, as it only refers to a situation in the past, which could have changed already. Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]>
2022-03-07xen/gntalloc: don't use gnttab_query_foreign_access()Juergen Gross1-18/+7
Using gnttab_query_foreign_access() is unsafe, as it is racy by design. The use case in the gntalloc driver is not needed at all. While at it replace the call of gnttab_end_foreign_access_ref() with a call of gnttab_end_foreign_access(), which is what is really wanted there. In case the grant wasn't used due to an allocation failure, just free the grant via gnttab_free_grant_reference(). This is CVE-2022-23039 / part of XSA-396. Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- V3: - fix __del_gref() (Jan Beulich)
2022-03-07xen/scsifront: don't use gnttab_query_foreign_access() for mapped statusJuergen Gross1-2/+1
It isn't enough to check whether a grant is still being in use by calling gnttab_query_foreign_access(), as a mapping could be realized by the other side just after having called that function. In case the call was done in preparation of revoking a grant it is better to do so via gnttab_try_end_foreign_access() and check the success of that operation instead. This is CVE-2022-23038 / part of XSA-396. Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- V2: - use gnttab_try_end_foreign_access()
2022-03-07xen/netfront: don't use gnttab_query_foreign_access() for mapped statusJuergen Gross1-4/+2
It isn't enough to check whether a grant is still being in use by calling gnttab_query_foreign_access(), as a mapping could be realized by the other side just after having called that function. In case the call was done in preparation of revoking a grant it is better to do so via gnttab_end_foreign_access_ref() and check the success of that operation instead. This is CVE-2022-23037 / part of XSA-396. Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- V2: - use gnttab_try_end_foreign_access() V3: - don't use gnttab_try_end_foreign_access()
2022-03-07xen/blkfront: don't use gnttab_query_foreign_access() for mapped statusJuergen Gross1-26/+37
It isn't enough to check whether a grant is still being in use by calling gnttab_query_foreign_access(), as a mapping could be realized by the other side just after having called that function. In case the call was done in preparation of revoking a grant it is better to do so via gnttab_end_foreign_access_ref() and check the success of that operation instead. For the ring allocation use alloc_pages_exact() in order to avoid high order pages in case of a multi-page ring. If a grant wasn't unmapped by the backend without persistent grants being used, set the device state to "error". This is CVE-2022-23036 / part of XSA-396. Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Roger Pau Monné <[email protected]> --- V2: - use gnttab_try_end_foreign_access() V4: - use alloc_pages_exact() and free_pages_exact() - set state to error if backend didn't unmap (Roger Pau Monné)
2022-03-07xen/grant-table: add gnttab_try_end_foreign_access()Juergen Gross2-2/+24
Add a new grant table function gnttab_try_end_foreign_access(), which will remove and free a grant if it is not in use. Its main use case is to either free a grant if it is no longer in use, or to take some other action if it is still in use. This other action can be an error exit, or (e.g. in the case of blkfront persistent grant feature) some special handling. This is CVE-2022-23036, CVE-2022-23038 / part of XSA-396. Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> --- V2: - new patch V4: - add comments to header (Jan Beulich)
2022-03-07xen/xenbus: don't let xenbus_grant_ring() remove grants in error caseJuergen Gross1-13/+11
Letting xenbus_grant_ring() tear down grants in the error case is problematic, as the other side could already have used these grants. Calling gnttab_end_foreign_access_ref() without checking success is resulting in an unclear situation for any caller of xenbus_grant_ring() as in the error case the memory pages of the ring page might be partially mapped. Freeing them would risk unwanted foreign access to them, while not freeing them would leak memory. In order to remove the need to undo any gnttab_grant_foreign_access() calls, use gnttab_alloc_grant_references() to make sure no further error can occur in the loop granting access to the ring pages. It should be noted that this way of handling removes leaking of grant entries in the error case, too. This is CVE-2022-23040 / part of XSA-396. Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]>
2022-03-07powerpc: Fix STACKTRACE=n buildMichael Ellerman1-1/+1
Our skiroot_defconfig doesn't enable FTRACE, and so doesn't get STACKTRACE enabled either. That leads to a build failure since commit 1614b2b11fab ("arch: Make ARCH_STACKWALK independent of STACKTRACE") made stacktrace.c build even when STACKTRACE=n. arch/powerpc/kernel/stacktrace.c: In function ‘handle_backtrace_ipi’: arch/powerpc/kernel/stacktrace.c:171:2: error: implicit declaration of function ‘nmi_cpu_backtrace’ 171 | nmi_cpu_backtrace(regs); | ^~~~~~~~~~~~~~~~~ arch/powerpc/kernel/stacktrace.c: In function ‘arch_trigger_cpumask_backtrace’: arch/powerpc/kernel/stacktrace.c:226:2: error: implicit declaration of function ‘nmi_trigger_cpumask_backtrace’ 226 | nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This happens because our headers haven't defined arch_trigger_cpumask_backtrace, which causes lib/nmi_backtrace.c not to build nmi_cpu_backtrace(). The code in question doesn't actually depend on STACKTRACE=y, that was just added because arch_trigger_cpumask_backtrace() lived in stacktrace.c for convenience. So drop the dependency on CONFIG_STACKTRACE, that causes lib/nmi_backtrace.c to build nmi_cpu_backtrace() etc. and fixes the build. Fixes: 1614b2b11fab ("arch: Make ARCH_STACKWALK independent of STACKTRACE") [mpe: Cherry pick of 5a72345e6a78 from next into fixes] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-03-06Linux 5.17-rc7Linus Torvalds1-1/+1
2022-03-06Merge tag 'for-5.17-rc6-tag' of ↵Linus Torvalds13-28/+230
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes for various problems that have user visible effects or seem to be urgent: - fix corruption when combining DIO and non-blocking io_uring over multiple extents (seen on MariaDB) - fix relocation crash due to premature return from commit - fix quota deadlock between rescan and qgroup removal - fix item data bounds checks in tree-checker (found on a fuzzed image) - fix fsync of prealloc extents after EOF - add missing run of delayed items after unlink during log replay - don't start relocation until snapshot drop is finished - fix reversed condition for subpage writers locking - fix warning on page error" * tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fallback to blocking mode when doing async dio over multiple extents btrfs: add missing run of delayed items after unlink during log replay btrfs: qgroup: fix deadlock between rescan worker and remove qgroup btrfs: fix relocation crash due to premature return from btrfs_commit_transaction() btrfs: do not start relocation until in progress drops are done btrfs: tree-checker: use u64 for item data end to avoid overflow btrfs: do not WARN_ON() if we have PageError set btrfs: fix lost prealloc extents beyond eof after full fsync btrfs: subpage: fix a wrong check on subpage->writers
2022-03-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-14/+20
Pull kvm fixes from Paolo Bonzini: "x86 guest: - Tweaks to the paravirtualization code, to avoid using them when they're pointless or harmful x86 host: - Fix for SRCU lockdep splat - Brown paper bag fix for the propagation of errno" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots() KVM: x86: Yield to IPI target vCPU only if it is busy x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64 x86/kvm: Don't waste memory if kvmclock is disabled x86/kvm: Don't use PV TLB/yield when mwait is advertised
2022-03-06Merge tag 'powerpc-5.17-5' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set. Thanks to Murilo Opsfelder Araujo, and Erhard F" * tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set
2022-03-06Merge tag 'trace-v5.17-rc5' of ↵Linus Torvalds3-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Fix sorting on old "cpu" value in histograms - Fix return value of __setup() boot parameter handlers * tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix return value of __setup handlers tracing/histogram: Fix sorting on old "cpu" value
2022-03-06tools/virtio: handle fallout from folio workMichael S. Tsirkin1-0/+3
just add a stub Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-03-06tools/virtio: fix virtio_test executionStefano Garzarella1-0/+1
virtio_test hangs on __vring_new_virtqueue() because `vqs_list_lock` is not initialized. Let's initialize it in vdev_info_init(). Signed-off-by: Stefano Garzarella <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2022-03-06vhost: remove avail_event arg from vhost_update_avail_event()Stefano Garzarella1-2/+2
In vhost_update_avail_event() we never used the `avail_event` argument, since its introduction in commit 2723feaa8ec6 ("vhost: set log when updating used flags or avail event"). Let's remove it to clean up the code. Signed-off-by: Stefano Garzarella <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-03-06virtio: drop default for virtio-memMichael S. Tsirkin1-1/+0
There's no special reason why virtio-mem needs a default that's different from what kconfig provides, any more than e.g. virtio blk. Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: David Hildenbrand <[email protected]>
2022-03-06vdpa: fix use-after-free on vp_vdpa_removeZhang Min1-1/+1
When vp_vdpa driver is unbind, vp_vdpa is freed in vdpa_unregister_device and then vp_vdpa->mdev.pci_dev is dereferenced in vp_modern_remove, triggering use-after-free. Call Trace of unbinding driver free vp_vdpa : do_syscall_64 vfs_write kernfs_fop_write_iter device_release_driver_internal pci_device_remove vp_vdpa_remove vdpa_unregister_device kobject_release device_release kfree Call Trace of dereference vp_vdpa->mdev.pci_dev: vp_modern_remove pci_release_selected_regions pci_release_region pci_resource_len pci_resource_end (dev)->resource[(bar)].end Signed-off-by: Zhang Min <[email protected]> Signed-off-by: Yi Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Fixes: 64b9f64f80a6 ("vdpa: introduce virtio pci driver") Reviewed-by: Stefano Garzarella <[email protected]>
2022-03-06virtio-blk: Remove BUG_ON() in virtio_queue_rq()Xie Yongji1-10/+2
Currently we have a BUG_ON() to make sure the number of sg list does not exceed queue_max_segments() in virtio_queue_rq(). However, the block layer uses queue_max_discard_segments() instead of queue_max_segments() to limit the sg list for discard requests. So the BUG_ON() might be triggered if virtio-blk device reports a larger value for max discard segment than queue_max_segments(). To fix it, let's simply remove the BUG_ON() which has become unnecessary after commit 02746e26c39e("virtio-blk: avoid preallocating big SGL for data"). And the unused vblk->sg_elems can also be removed together. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Xie Yongji <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-03-06virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zeroXie Yongji1-2/+8
Currently the value of max_discard_segment will be set to MAX_DISCARD_SEGMENTS (256) with no basis in hardware if device set 0 to max_discard_seg in configuration space. It's incorrect since the device might not be able to handle such large descriptors. To fix it, let's follow max_segments restrictions in this case. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Signed-off-by: Xie Yongji <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-03-06vhost: fix hung thread due to erroneous iotlb entriesAnirudh Rayabharam2-0/+16
In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when start is 0 and last is ULONG_MAX. One instance where it can happen is when userspace sends an IOTLB message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an entry with size = 0, start = 0, last = ULONG_MAX ends up in the iotlb. Next time a packet is sent, iotlb_access_ok() loops indefinitely due to that erroneous entry. Call Trace: <TASK> iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340 vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366 vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104 vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Reported by syzbot at: https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87 To fix this, do two things: 1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map a range with size 0. 2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX] by splitting it into two entries. Fixes: 0bbe30668d89e ("vhost: factor out IOTLB") Reported-by: [email protected] Tested-by: [email protected] Signed-off-by: Anirudh Rayabharam <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-03-06net: dsa: unlock the rtnl_mutex when dsa_master_setup() failsVladimir Oltean1-3/+3
After the blamed commit, dsa_tree_setup_master() may exit without calling rtnl_unlock(), fix that. Fixes: c146f9bc195a ("net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-03-06Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"Kai Lueke1-18/+3
This reverts commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 because ID 0 was meant to be used for configuring the policy/state without matching for a specific interface (e.g., Cilium is affected, see https://github.com/cilium/cilium/pull/18789 and https://github.com/cilium/cilium/pull/19019). Signed-off-by: Kai Lueke <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-03-05Merge branch 'for-linus' of ↵Linus Torvalds6-61/+51
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - a fixup for Goodix touchscreen driver allowing it to work on certain Cherry Trail devices - a fix for imbalanced enable/disable regulator in Elam touchpad driver that became apparent when used with Asus TF103C 2-in-1 dock - a couple new input keycodes used on newer keyboards * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: HID: add mapping for KEY_ALL_APPLICATIONS HID: add mapping for KEY_DICTATE Input: elan_i2c - fix regulator enable count imbalance after suspend/resume Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power() Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource Input: goodix - use the new soc_intel_is_byt() helper Input: samsung-keypad - properly state IOMEM dependency
2022-03-05Input: mt6779-keypad - add MediaTek keypad driverfengping.yu3-0/+234
This patch adds matrix keypad support for Mediatek SoCs. Signed-off-by: fengping.yu <[email protected]> Reviewed-by: Marco Felsch <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Mattijs Korpershoek <[email protected]> Signed-off-by: Mattijs Korpershoek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Torokhov <[email protected]>
2022-03-05dt-bindings: input: Add bindings for Mediatek matrix keypadfengping.yu1-0/+77
This patch add devicetree bindings for Mediatek matrix keypad driver. Signed-off-by: fengping.yu <[email protected]> Reviewed-by: Marco Felsch <[email protected]> Signed-off-by: Mattijs Korpershoek <[email protected]> Reviewed-by: Rob Herring <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Torokhov <[email protected]>
2022-03-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds18-137/+194
Merge misc fixes from Andrew Morton: "8 patches. Subsystems affected by this patch series: mm (hugetlb, pagemap, and userfaultfd), memfd, selftests, and kconfig" * emailed patches from Andrew Morton <[email protected]>: configs/debug: set CONFIG_DEBUG_INFO=y properly proc: fix documentation and description of pagemap kselftest/vm: fix tests build with old libc memfd: fix F_SEAL_WRITE after shmem huge page allocated mm: fix use-after-free when anon vma name is used after vma is freed mm: prevent vm_area_struct::anon_name refcount saturation mm: refactor vm_area_struct::anon_vma_name usage code selftests/vm: cleanup hugetlb file after mremap test
2022-03-05Merge tag 's390-5.17-5' of ↵Linus Torvalds6-7/+62
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix HAVE_DYNAMIC_FTRACE_WITH_ARGS implementation by providing correct switching between ftrace_caller/ftrace_regs_caller and supplying pt_regs only when ftrace_regs_caller is activated. - Fix exception table sorting. - Fix breakage of kdump tooling by preserving metadata it cannot function without. * tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/extable: fix exception table sorting s390/ftrace: fix arch_ftrace_get_regs implementation s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
2022-03-05configs/debug: set CONFIG_DEBUG_INFO=y properlyQian Cai1-1/+1
CONFIG_DEBUG_INFO can't be set by user directly, so set CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y instead. Otherwise, we end up with no debuginfo in vmlinux which is a big no-no for kernel debugging. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-05proc: fix documentation and description of pagemapYun Zhou2-2/+3
Since bit 57 was exported for uffd-wp write-protected (commit fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"), fixing it can reduce some unnecessary confusion. Link: https://lkml.kernel.org/r/[email protected] Fixes: fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information") Signed-off-by: Yun Zhou <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Tiberiu A Georgescu <[email protected]> Cc: Florian Schmidt <[email protected]> Cc: Ivan Teterevkov <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Yang Shi <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Colin Cross <[email protected]> Cc: Alistair Popple <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-05kselftest/vm: fix tests build with old libcChengming Zhou1-0/+1
The error message when I build vm tests on debian10 (GLIBC 2.28): userfaultfd.c: In function `userfaultfd_pagemap_test': userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use in this function); did you mean `MADV_RANDOM'? if (madvise(area_dst, test_pgsize, MADV_PAGEOUT)) ^~~~~~~~~~~~ MADV_RANDOM This patch includes these newer definitions from UAPI linux/mman.h, is useful to fix tests build on systems without these definitions in glibc sys/mman.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chengming Zhou <[email protected]> Reviewed-by: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-05memfd: fix F_SEAL_WRITE after shmem huge page allocatedHugh Dickins1-12/+28
Wangyong reports: after enabling tmpfs filesystem to support transparent hugepage with the following command: echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled the docker program tries to add F_SEAL_WRITE through the following command, but it fails unexpectedly with errno EBUSY: fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1. That is because memfd_tag_pins() and memfd_wait_for_pins() were never updated for shmem huge pages: checking page_mapcount() against page_count() is hopeless on THP subpages - they need to check total_mapcount() against page_count() on THP heads only. Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins() (compared != 1): either can be justified, but given the non-atomic total_mapcount() calculation, it is better now to be strict. Bear in mind that total_mapcount() itself scans all of the THP subpages, when choosing to take an XA_CHECK_SCHED latency break. Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a page has been swapped out since memfd_tag_pins(), then its refcount must have fallen, and so it can safely be untagged. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reported-by: Zeal Robot <[email protected]> Reported-by: wangyong <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: CGEL ZTE <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Song Liu <[email protected]> Cc: Yang Yang <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-05mm: fix use-after-free when anon vma name is used after vma is freedSuren Baghdasaryan1-1/+7
When adjacent vmas are being merged it can result in the vma that was originally passed to madvise_update_vma being destroyed. In the current implementation, the name parameter passed to madvise_update_vma points directly to vma->anon_name and it is used after the call to vma_merge. In the cases when vma_merge merges the original vma and destroys it, this might result in UAF. For that the original vma would have to hold the anon_vma_name with the last reference. The following vma would need to contain a different anon_vma_name object with the same string. Such scenario is shown below: madvise_vma_behavior(vma) madvise_update_vma(vma, ..., anon_name == vma->anon_name) vma_merge(vma) __vma_adjust(vma) <-- merges vma with adjacent one vm_area_free(vma) <-- frees the original vma replace_vma_anon_name(anon_name) <-- UAF of vma->anon_name Fix this by raising the name refcount and stabilizing it. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 9a10064f5625 ("mm: add a field to store names for private anonymous memory") Signed-off-by: Suren Baghdasaryan <[email protected]> Reported-by: [email protected] Acked-by: Michal Hocko <[email protected]> Cc: Alexey Gladkov <[email protected]> Cc: Chris Hyser <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Colin Cross <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kees Cook <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xiaofeng Cao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-05mm: prevent vm_area_struct::anon_name refcount saturationSuren Baghdasaryan2-6/+15
A deep process chain with many vmas could grow really high. With default sysctl_max_map_count (64k) and default pid_max (32k) the max number of vmas in the system is 2147450880 and the refcounter has headroom of 1073774592 before it reaches REFCOUNT_SATURATED (3221225472). Therefore it's unlikely that an anonymous name refcounter will overflow with these defaults. Currently the max for pid_max is PID_MAX_LIMIT (4194304) and for sysctl_max_map_count it's INT_MAX (2147483647). In this configuration anon_vma_name refcount overflow becomes theoretically possible (that still require heavy sharing of that anon_vma_name between processes). kref refcounting interface used in anon_vma_name structure will detect a counter overflow when it reaches REFCOUNT_SATURATED value but will only generate a warning and freeze the ref counter. This would lead to the refcounted object never being freed. A determined attacker could leak memory like that but it would be rather expensive and inefficient way to do so. To ensure anon_vma_name refcount does not overflow, stop anon_vma_name sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still leaves INT_MAX/2 (1073741823) values before the counter reaches REFCOUNT_SATURATED. This should provide enough headroom for raising the refcounts temporarily. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Gladkov <[email protected]> Cc: Chris Hyser <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Colin Cross <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kees Cook <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xiaofeng Cao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-05mm: refactor vm_area_struct::anon_vma_name usage codeSuren Baghdasaryan12-114/+125
Avoid mixing strings and their anon_vma_name referenced pointers by using struct anon_vma_name whenever possible. This simplifies the code and allows easier sharing of anon_vma_name structures when they represent the same name. [[email protected]: fix comment] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Colin Cross <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Alexey Gladkov <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Chris Hyser <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Xiaofeng Cao <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-05selftests/vm: cleanup hugetlb file after mremap testMike Kravetz2-8/+21
The hugepage-mremap test will create a file in a hugetlb filesystem. In a default 'run_vmtests' run, the file will contain all the hugetlb pages. After the test, the file remains and there are no free hugetlb pages for subsequent tests. This causes those hugetlb tests to fail. Change hugepage-mremap to take the name of the hugetlb file as an argument. Unlink the file within the test, and just to be sure remove the file in the run_vmtests script. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Shuah Khan <[email protected]> Acked-by: Yosry Ahmed <[email protected]> Reviewed-by: Muchun Song <[email protected]> Reviewed-by: Mina Almasry <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-05mISDN: Fix memory leak in dsp_pipeline_build()Alexey Khoroshilov1-3/+3
dsp_pipeline_build() allocates dup pointer by kstrdup(cfg), but then it updates dup variable by strsep(&dup, "|"). As a result when it calls kfree(dup), the dup variable contains NULL. Found by Linux Driver Verification project (linuxtesting.org) with SVACE. Signed-off-by: Alexey Khoroshilov <[email protected]> Fixes: 960366cf8dbb ("Add mISDN DSP") Signed-off-by: David S. Miller <[email protected]>
2022-03-05ARM: Spectre-BHB workaroundRussell King (Oracle)9-9/+254
Workaround the Spectre BHB issues for Cortex-A15, Cortex-A57, Cortex-A72, Cortex-A73 and Cortex-A75. We also include Brahma B15 as well to be safe, which is affected by Spectre V2 in the same ways as Cortex-A15. Reviewed-by: Catalin Marinas <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2022-03-05ARM: use LOADADDR() to get load address of sectionsRussell King (Oracle)1-7/+12
Use the linker's LOADADDR() macro to get the load address of the sections, and provide a macro to set the start and end symbols. Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2022-03-05ARM: early traps initialisationRussell King (Oracle)1-6/+21
Provide a couple of helpers to copy the vectors and stubs, and also to flush the copied vectors and stubs. Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2022-03-05ARM: report Spectre v2 status through sysfsRussell King (Oracle)5-39/+187
As per other architectures, add support for reporting the Spectre vulnerability status via sysfs CPU. Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2022-03-05powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not setMurilo Opsfelder Araujo2-2/+2
The following build failure occurs when CONFIG_PPC_64S_HASH_MMU is not set: arch/powerpc/kernel/setup_64.c: In function ‘setup_per_cpu_areas’: arch/powerpc/kernel/setup_64.c:811:21: error: ‘mmu_linear_psize’ undeclared (first use in this function); did you mean ‘mmu_virtual_psize’? 811 | if (mmu_linear_psize == MMU_PAGE_4K) | ^~~~~~~~~~~~~~~~ | mmu_virtual_psize arch/powerpc/kernel/setup_64.c:811:21: note: each undeclared identifier is reported only once for each function it appears in Move the declaration of mmu_linear_psize outside of CONFIG_PPC_64S_HASH_MMU ifdef. After the above is fixed, it fails later with the following error: ld: arch/powerpc/kexec/file_load_64.o: in function `.arch_kexec_kernel_image_probe': file_load_64.c:(.text+0x1c1c): undefined reference to `.add_htab_mem_range' Fix that, too, by conditioning add_htab_mem_range() symbol to CONFIG_PPC_64S_HASH_MMU. Fixes: 387e220a2e5e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU") Reported-by: Erhard F. <[email protected]> Signed-off-by: Murilo Opsfelder Araujo <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215567 Link: https://lore.kernel.org/r/[email protected]
2022-03-05x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMTJosh Poimboeuf1-2/+25
The commit 44a3918c8245 ("x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting") added a warning for the "eIBRS + unprivileged eBPF" combination, which has been shown to be vulnerable against Spectre v2 BHB-based attacks. However, there's no warning about the "eIBRS + LFENCE retpoline + unprivileged eBPF" combo. The LFENCE adds more protection by shortening the speculation window after a mispredicted branch. That makes an attack significantly more difficult, even with unprivileged eBPF. So at least for now the logic doesn't warn about that combination. But if you then add SMT into the mix, the SMT attack angle weakens the effectiveness of the LFENCE considerably. So extend the "eIBRS + unprivileged eBPF" warning to also include the "eIBRS + LFENCE + unprivileged eBPF + SMT" case. [ bp: Massage commit message. ] Suggested-by: Alyssa Milburn <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-03-05x86/speculation: Warn about Spectre v2 LFENCE mitigationJosh Poimboeuf1-0/+5
With: f8a66d608a3e ("x86,bugs: Unconditionally allow spectre_v2=retpoline,amd") it became possible to enable the LFENCE "retpoline" on Intel. However, Intel doesn't recommend it, as it has some weaknesses compared to retpoline. Now AMD doesn't recommend it either. It can still be left available as a cmdline option. It's faster than retpoline but is weaker in certain scenarios -- particularly SMT, but even non-SMT may be vulnerable in some cases. So just unconditionally warn if the user requests it on the cmdline. [ bp: Massage commit message. ] Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-03-04net: phy: meson-gxl: fix interrupt handling in forced modeHeiner Kallweit1-10/+13
This PHY doesn't support a link-up interrupt source. If aneg is enabled we use the "aneg complete" interrupt for this purpose, but if aneg is disabled link-up isn't signaled currently. According to a vendor driver there's an additional "energy detect" interrupt source that can be used to signal link-up if aneg is disabled. We can safely ignore this interrupt source if aneg is enabled. This patch was tested on a TX3 Mini TV box with S905W (even though boot message says it's a S905D). This issue has been existing longer, but due to changes in phylib and the driver the patch applies only from the commit marked as fixed. Fixes: 84c8f773d2dc ("net: phy: meson-gxl: remove the use of .ack_callback()") Signed-off-by: Heiner Kallweit <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-03-04Merge tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-blockLinus Torvalds1-8/+18
Pull block fix from Jens Axboe: "Just a small UAF fix for blktrace" * tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block: blktrace: fix use after free for struct blk_trace
2022-03-04Merge tag 'riscv-for-linus-5.17-rc7' of ↵Linus Torvalds7-9/+14
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - Fixes for a handful of KASAN-related crashes. - A fix to avoid a crash during boot for SPARSEMEM && !SPARSEMEM_VMEMMAP configurations. - A fix to stop reporting some incorrect errors under DEBUG_VIRTUAL. - A fix for the K210's device tree to properly populate the interrupt map, so hart1 will get interrupts again. * tag 'riscv-for-linus-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: dts: k210: fix broken IRQs on hart1 riscv: Fix kasan pud population riscv: Move high_memory initialization to setup_bootmem riscv: Fix config KASAN && DEBUG_VIRTUAL riscv: Fix DEBUG_VIRTUAL false warnings riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP riscv: Fix is_linear_mapping with recent move of KASAN region