aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-03-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds13-86/+168
Merge yet more updates from Andrew Morton: "This is the material which was staged after willystuff in linux-next. Subsystems affected by this patch series: mm (debug, selftests, pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise), and selftests" * emailed patches from Andrew Morton <[email protected]>: (113 commits) selftests: kselftest framework: provide "finished" helper mm: madvise: MADV_DONTNEED_LOCKED mm: fix race between MADV_FREE reclaim and blkdev direct IO read mm: generalize ARCH_HAS_FILTER_PGPROT mm: unmap_mapping_range_tree() with i_mmap_rwsem shared mm: warn on deleting redirtied only if accounted mm/huge_memory: remove stale locking logic from __split_huge_pmd() mm/huge_memory: remove stale page_trans_huge_mapcount() mm/swapfile: remove stale reuse_swap_page() mm/khugepaged: remove reuse_swap_page() usage mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() mm: streamline COW logic in do_swap_page() mm: slightly clarify KSM logic in do_swap_page() mm: optimize do_wp_page() for fresh pages in local LRU pagevecs mm: optimize do_wp_page() for exclusive pages in the swapcache mm/huge_memory: make is_transparent_hugepage() static userfaultfd/selftests: enable hugetlb remap and remove event testing selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test mm: enable MADV_DONTNEED for hugetlb mappings kasan: disable LOCKDEP when printing reports ...
2022-03-25Merge tag 'riscv-for-linus-5.18-mw0' of ↵Linus Torvalds3-0/+121
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for Sv57-based virtual memory. - Various improvements for the MicroChip PolarFire SOC and the associated Icicle dev board, which should allow upstream kernels to boot without any additional modifications. - An improved memmove() implementation. - Support for the new Ssconfpmf and SBI PMU extensions, which allows for a much more useful perf implementation on RISC-V systems. - Support for restartable sequences. * tag 'riscv-for-linus-5.18-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (36 commits) rseq/selftests: Add support for RISC-V RISC-V: Add support for restartable sequence MAINTAINERS: Add entry for RISC-V PMU drivers Documentation: riscv: Remove the old documentation RISC-V: Add sscofpmf extension support RISC-V: Add perf platform driver based on SBI PMU extension RISC-V: Add RISC-V SBI PMU extension definitions RISC-V: Add a simple platform driver for RISC-V legacy perf RISC-V: Add a perf core library for pmu drivers RISC-V: Add CSR encodings for all HPMCOUNTERS RISC-V: Remove the current perf implementation RISC-V: Improve /proc/cpuinfo output for ISA extensions RISC-V: Do no continue isa string parsing without correct XLEN RISC-V: Implement multi-letter ISA extension probing framework RISC-V: Extract multi-letter extension names from "riscv, isa" RISC-V: Minimal parser for "riscv, isa" strings RISC-V: Correctly print supported extensions riscv: Fixed misaligned memory access. Fixed pointer comparison. MAINTAINERS: update riscv/microchip entry riscv: dts: microchip: add new peripherals to icicle kit device tree ...
2022-03-25Merge tag 'powerpc-5.18-1' of ↵Linus Torvalds2-2/+15
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Livepatch support for 32-bit is probably the standout new feature, otherwise mostly just lots of bits and pieces all over the board. There's a series of commits cleaning up function descriptor handling, which touches a few other arches as well as LKDTM. It has acks from Arnd, Kees and Helge. Summary: - Enforce kernel RO, and implement STRICT_MODULE_RWX for 603. - Add support for livepatch to 32-bit. - Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. - Merge vdso64 and vdso32 into a single directory. - Fix build errors with newer binutils. - Add support for UADDR64 relocations, which are emitted by some toolchains. This allows powerpc to build with the latest lld. - Fix (another) potential userspace r13 corruption in transactional memory handling. - Cleanups of function descriptor handling & related fixes to LKDTM. Thanks to Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh Kumar K.V, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chen Jingwen, Christophe JAILLET, Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel Henrique Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu Hua, Haren Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason Wang, Jeremy Kerr, Joachim Wiberg, Jordan Niethe, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mamatha Inamdar, Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal Suchanek, Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nour-eddine Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy Dunlap, Ritesh Harjani, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain, Thierry Reding, Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean, Wedson Almeida Filho, and YueHaibing" * tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits) powerpc/pseries: Fix use after free in remove_phb_dynamic() powerpc/time: improve decrementer clockevent processing powerpc/time: Fix KVM host re-arming a timer beyond decrementer range powerpc/tm: Fix more userspace r13 corruption powerpc/xive: fix return value of __setup handler powerpc/64: Add UADDR64 relocation support powerpc: 8xx: fix a return value error in mpc8xx_pic_init powerpc/ps3: remove unneeded semicolons powerpc/64: Force inlining of prevent_user_access() and set_kuap() powerpc/bitops: Force inlining of fls() powerpc: declare unmodified attribute_group usages const powerpc/spufs: Fix build warning when CONFIG_PROC_FS=n powerpc/secvar: fix refcount leak in format_show() powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E powerpc: Move C prototypes out of asm-prototypes.h powerpc/kexec: Declare kexec_paca static powerpc/smp: Declare current_set static powerpc: Cleanup asm-prototypes.c powerpc/ftrace: Use STK_GOT in ftrace_mprofile.S powerpc/ftrace: Regroup PPC64 specific operations in ftrace_mprofile.S ...
2022-03-25drm/i915/uapi: Document DRM_I915_QUERY_HWCONFIG_BLOBTvrtko Ursulin1-0/+13
UAPI with absolutely no documentation should not have been added - clarify blob format and content will be described externally. Fixes: 78e1fb3112c0 ("drm/i915/uapi: Add query for hwconfig blob") Signed-off-by: Tvrtko Ursulin <[email protected]> Co-developed-by: Jordan Justen <[email protected]> Cc: Jon Bloomfield <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: John Harrison <[email protected]> Cc: Jon Ewins <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Rodrigo Vivi <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] [tursulin: Fixed spelling s/meading/meaning/.]
2022-03-25dma-buf: add dma_fence_unwrap v2Christian König3-0/+99
Add a general purpose helper to deep dive into dma_fence_chain/dma_fence_array structures and iterate over all the fences in them. This is useful when we need to flatten out all fences in those structures. v2: some selftests cleanup, improved function naming and documentation Signed-off-by: Christian König <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-03-25dma-buf: Add dma_fence_array_for_each (v2)Christian König1-0/+17
Add a helper to iterate over all fences in a dma_fence_array object. v2 (Jason Ekstrand) - Return NULL from dma_fence_array_first if head == NULL. This matches the iterator behavior of dma_fence_chain_for_each in that it iterates zero times if head == NULL. - Return NULL from dma_fence_array_next if index > array->num_fences. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Christian König <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
2022-03-24Merge tag 'iommu-updates-v5.18' of ↵Linus Torvalds5-352/+70
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - IOMMU Core changes: - Removal of aux domain related code as it is basically dead and will be replaced by iommu-fd framework - Split of iommu_ops to carry domain-specific call-backs separatly - Cleanup to remove useless ops->capable implementations - Improve 32-bit free space estimate in iova allocator - Intel VT-d updates: - Various cleanups of the driver - Support for ATS of SoC-integrated devices listed in ACPI/SATC table - ARM SMMU updates: - Fix SMMUv3 soft lockup during continuous stream of events - Fix error path for Qualcomm SMMU probe() - Rework SMMU IRQ setup to prepare the ground for PMU support - Minor cleanups and refactoring - AMD IOMMU driver: - Some minor cleanups and error-handling fixes - Rockchip IOMMU driver: - Use standard driver registration - MSM IOMMU driver: - Minor cleanup and change to standard driver registration - Mediatek IOMMU driver: - Fixes for IOTLB flushing logic * tag 'iommu-updates-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (47 commits) iommu/amd: Improve amd_iommu_v2_exit() iommu/amd: Remove unused struct fault.devid iommu/amd: Clean up function declarations iommu/amd: Call memunmap in error path iommu/arm-smmu: Account for PMU interrupts iommu/vt-d: Enable ATS for the devices in SATC table iommu/vt-d: Remove unused function intel_svm_capable() iommu/vt-d: Add missing "__init" for rmrr_sanity_check() iommu/vt-d: Move intel_iommu_ops to header file iommu/vt-d: Fix indentation of goto labels iommu/vt-d: Remove unnecessary prototypes iommu/vt-d: Remove unnecessary includes iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO iommu/vt-d: Remove domain and devinfo mempool iommu/vt-d: Remove iova_cache_get/put() iommu/vt-d: Remove finding domain in dmar_insert_one_dev_info() iommu/vt-d: Remove intel_iommu::domains iommu/mediatek: Always tlb_flush_all when each PM resume iommu/mediatek: Add tlb_lock in tlb_flush_all iommu/mediatek: Remove the power status checking in tlb flush all ...
2022-03-24Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds19-136/+127
Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (qla2xxx, pm8001, libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates and bug fixes. The high blast radius core update is the removal of write same, which affects block and several non-SCSI devices. The other big change, which is more local, is the removal of the SCSI pointer" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits) scsi: scsi_ioctl: Drop needless assignment in sg_io() scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn() scsi: lpfc: Copyright updates for 14.2.0.0 patches scsi: lpfc: Update lpfc version to 14.2.0.0 scsi: lpfc: SLI path split: Refactor BSG paths scsi: lpfc: SLI path split: Refactor Abort paths scsi: lpfc: SLI path split: Refactor SCSI paths scsi: lpfc: SLI path split: Refactor CT paths scsi: lpfc: SLI path split: Refactor misc ELS paths scsi: lpfc: SLI path split: Refactor VMID paths scsi: lpfc: SLI path split: Refactor FDISC paths scsi: lpfc: SLI path split: Refactor LS_RJT paths scsi: lpfc: SLI path split: Refactor LS_ACC paths scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4 scsi: lpfc: SLI path split: Refactor lpfc_iocbq scsi: lpfc: Use kcalloc() ...
2022-03-24Merge tag 'for-5.18/dm-changes' of ↵Linus Torvalds3-3/+12
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Significant refactoring and fixing of how DM core does bio-based IO accounting with focus on fixing wildly inaccurate IO stats for dm-crypt (and other DM targets that defer bio submission in their own workqueues). End result is proper IO accounting, made possible by targets being updated to use the new dm_submit_bio_remap() interface. - Add hipri bio polling support (REQ_POLLED) to bio-based DM. - Reduce dm_io and dm_target_io structs so that a single dm_io (which contains dm_target_io and first clone bio) weighs in at 256 bytes. For reference the bio struct is 128 bytes. - Various other small cleanups, fixes or improvements in DM core and targets. - Update MAINTAINERS with my kernel.org email address to allow distinction between my "upstream" and "Red" Hats. * tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (46 commits) dm: consolidate spinlocks in dm_io struct dm: reduce size of dm_io and dm_target_io structs dm: switch dm_target_io booleans over to proper flags dm: switch dm_io booleans over to proper flags dm: update email address in MAINTAINERS dm: return void from __send_empty_flush dm: factor out dm_io_complete dm cache: use dm_submit_bio_remap dm: simplify dm_sumbit_bio_remap interface dm thin: use dm_submit_bio_remap dm: add WARN_ON_ONCE to dm_submit_bio_remap dm: support bio polling block: add ->poll_bio to block_device_operations dm mpath: use DMINFO instead of printk with KERN_INFO dm: stop using bdevname dm-zoned: remove the ->name field in struct dmz_dev dm: remove unnecessary local variables in __bind dm: requeue IO if mapping table not yet available dm io: remove stale comment block for dm_io() dm thin metadata: remove unused dm_thin_remove_block and __remove ...
2022-03-24mm: madvise: MADV_DONTNEED_LOCKEDJohannes Weiner1-0/+2
MADV_DONTNEED historically rejects mlocked ranges, but with MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating, there are valid use cases for depopulating locked ranges as well. Users mlock memory to protect secrets. There are allocators for secure buffers that want in-use memory generally mlocked, but cleared and invalidated memory to give up the physical pages. This could be done with explicit munlock -> mlock calls on free -> alloc of course, but that adds two unnecessary syscalls, heavy mmap_sem write locks, vma splits and re-merges - only to get rid of the backing pages. Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are okay with on-demand initial population. It seems valid to selectively free some memory during the lifetime of such a process, without having to mess with its overall policy. Why add a separate flag? Isn't this a pretty niche usecase? - MADV_DONTNEED has been bailing on locked vmas forever. It's at least conceivable that someone, somewhere is relying on mlock to protect data from perhaps broader invalidation calls. Changing this behavior now could lead to quiet data corruption. - It also clarifies expectations around MADV_FREE and maybe MADV_REMOVE. It avoids the situation where one quietly behaves different than the others. MADV_FREE_LOCKED can be added later. - The combination of mlock() and madvise() in the first place is probably niche. But where it happens, I'd say that dropping pages from a locked region once they don't contain secrets or won't page anymore is much saner than relying on mlock to protect memory from speculative or errant invalidation calls. It's just that we can't change the default behavior because of the two previous points. Given that, an explicit new flag seems to make the most sense. [[email protected]: fix mips build] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Nadav Amit <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dr. David Alan Gilbert <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24mm: warn on deleting redirtied only if accountedHugh Dickins1-2/+1
filemap_unaccount_folio() has a WARN_ON_ONCE(folio_test_dirty(folio)). It is good to warn of late dirtying on a persistent filesystem, but late dirtying on tmpfs can only lose data which is expected to be thrown away; and it's a pity if that warning comes ONCE on tmpfs, then hides others which really matter. Make it conditional on mapping_cap_writeback(). Cleanup: then folio_account_cleaned() no longer needs to check that for itself, and so no longer needs to know the mapping. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Jan Kara <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24mm/huge_memory: remove stale page_trans_huge_mapcount()David Hildenbrand1-5/+0
All users are gone, let's remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Rientjes <[email protected]> Cc: Don Dutile <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Liang Zhang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24mm/swapfile: remove stale reuse_swap_page()David Hildenbrand1-4/+0
All users are gone, let's remove it. We'll let SWP_STABLE_WRITES stick around for now, as it might come in handy in the near future. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Rientjes <[email protected]> Cc: Don Dutile <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Liang Zhang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24mm/khugepaged: remove reuse_swap_page() usageDavid Hildenbrand1-1/+0
reuse_swap_page() currently indicates if we can write to an anon page without COW. A COW is required if the page is shared by multiple processes (either already mapped or via swap entries) or if there is concurrent writeback that cannot tolerate concurrent page modifications. However, in the context of khugepaged we're not actually going to write to a read-only mapped page, we'll copy the page content to our newly allocated THP and map that THP writable. All we have to make sure is that the read-only mapped page we're about to copy won't get reused by another process sharing the page, otherwise, page content would get modified. But that is already guaranteed via multiple mechanisms (e.g., holding a reference, holding the page lock, removing the rmap after copying the page). The swapcache handling was introduced in commit 10359213d05a ("mm: incorporate read-only pages into transparent huge pages") and it sounds like it merely wanted to mimic what do_swap_page() would do when trying to map a page obtained via the swapcache writable. As that logic is unnecessary, let's just remove it, removing the last user of reuse_swap_page(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Rientjes <[email protected]> Cc: Don Dutile <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Liang Zhang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24mm/huge_memory: make is_transparent_hugepage() staticMiaohe Lin1-6/+0
It's only used inside the huge_memory.c now. Don't export it and make it static. We can thus reduce the size of huge_memory.o a bit. Without this patch: text data bss dec hex filename 32319 2965 4 35288 89d8 mm/huge_memory.o With this patch: text data bss dec hex filename 32042 2957 4 35003 88bb mm/huge_memory.o Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: William Kucharski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan: move and hide kasan_save_enable/restore_multi_shotAndrey Konovalov1-4/+0
- Move kasan_save_enable/restore_multi_shot() declarations to mm/kasan/kasan.h, as there is no need for them to be visible outside of KASAN implementation. - Only define and export these functions when KASAN tests are enabled. - Move their definitions closer to other test-related code in report.c. Link: https://lkml.kernel.org/r/6ba637333b78447f027d775f2d55ab1a40f63c99.1646237226.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan: test: support async (again) and asymm modes for HW_TAGSAndrey Konovalov1-5/+0
Async mode support has already been implemented in commit e80a76aa1a91 ("kasan, arm64: tests supports for HW_TAGS async mode") but then got accidentally broken in commit 99734b535d9b ("kasan: detect false-positives in tests"). Restore the changes removed by the latter patch and adapt them for asymm mode: add a sync_fault flag to kunit_kasan_expectation that only get set if the MTE fault was synchronous, and reenable MTE on such faults in tests. Also rename kunit_kasan_expectation to kunit_kasan_status and move its definition to mm/kasan/kasan.h from include/linux/kasan.h, as this structure is only internally used by KASAN. Also put the structure definition under IS_ENABLED(CONFIG_KUNIT). Link: https://lkml.kernel.org/r/133970562ccacc93ba19d754012c562351d4a8c8.1645033139.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Catalin Marinas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, vmalloc: only tag normal vmalloc allocationsAndrey Konovalov1-3/+4
The kernel can use to allocate executable memory. The only supported way to do that is via __vmalloc_node_range() with the executable bit set in the prot argument. (vmap() resets the bit via pgprot_nx()). Once tag-based KASAN modes start tagging vmalloc allocations, executing code from such allocations will lead to the PC register getting a tag, which is not tolerated by the kernel. Only tag the allocations for normal kernel pages. [[email protected]: pass KASAN_VMALLOC_PROT_NORMAL to kasan_unpoison_vmalloc()] Link: https://lkml.kernel.org/r/9230ca3d3e40ffca041c133a524191fd71969a8d.1646233925.git.andreyknvl@google.com [[email protected]: support tagged vmalloc mappings] Link: https://lkml.kernel.org/r/2f6605e3a358cf64d73a05710cb3da356886ad29.1646233925.git.andreyknvl@google.com [[email protected]: don't unintentionally disabled poisoning] Link: https://lkml.kernel.org/r/de4587d6a719232e83c760113e46ed2d4d8da61e.1646757322.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/fbfd9939a4dc375923c9a5c6b9e7ab05c26b8c6b.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, vmalloc: add vmalloc tagging for HW_TAGSAndrey Konovalov1-4/+32
Add vmalloc tagging support to HW_TAGS KASAN. The key difference between HW_TAGS and the other two KASAN modes when it comes to vmalloc: HW_TAGS KASAN can only assign tags to physical memory. The other two modes have shadow memory covering every mapped virtual memory region. Make __kasan_unpoison_vmalloc() for HW_TAGS KASAN: - Skip non-VM_ALLOC mappings as HW_TAGS KASAN can only tag a single mapping of normal physical memory; see the comment in the function. - Generate a random tag, tag the returned pointer and the allocation, and initialize the allocation at the same time. - Propagate the tag into the page stucts to allow accesses through page_address(vmalloc_to_page()). The rest of vmalloc-related KASAN hooks are not needed: - The shadow-related ones are fully skipped. - __kasan_poison_vmalloc() is kept as a no-op with a comment. Poisoning and zeroing of physical pages that are backing vmalloc() allocations are skipped via __GFP_SKIP_KASAN_UNPOISON and __GFP_SKIP_ZERO: __kasan_unpoison_vmalloc() does that instead. Enabling CONFIG_KASAN_VMALLOC with HW_TAGS is not yet allowed. Link: https://lkml.kernel.org/r/d19b2e9e59a9abc59d05b72dea8429dcaea739c6.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Co-developed-by: Vincenzo Frascino <[email protected]> Signed-off-by: Vincenzo Frascino <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, page_alloc: allow skipping memory init for HW_TAGSAndrey Konovalov2-7/+12
Add a new GFP flag __GFP_SKIP_ZERO that allows to skip memory initialization. The flag is only effective with HW_TAGS KASAN. This flag will be used by vmalloc code for page_alloc allocations backing vmalloc() mappings in a following patch. The reason to skip memory initialization for these pages in page_alloc is because vmalloc code will be initializing them instead. With the current implementation, when __GFP_SKIP_ZERO is provided, __GFP_ZEROTAGS is ignored. This doesn't matter, as these two flags are never provided at the same time. However, if this is changed in the future, this particular implementation detail can be changed as well. Link: https://lkml.kernel.org/r/0d53efeff345de7d708e0baa0d8829167772521e.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, page_alloc: allow skipping unpoisoning for HW_TAGSAndrey Konovalov2-10/+16
Add a new GFP flag __GFP_SKIP_KASAN_UNPOISON that allows skipping KASAN poisoning for page_alloc allocations. The flag is only effective with HW_TAGS KASAN. This flag will be used by vmalloc code for page_alloc allocations backing vmalloc() mappings in a following patch. The reason to skip KASAN poisoning for these pages in page_alloc is because vmalloc code will be poisoning them instead. Also reword the comment for __GFP_SKIP_KASAN_POISON. Link: https://lkml.kernel.org/r/35c97d77a704f6ff971dd3bfe4be95855744108e.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGSAndrey Konovalov2-4/+16
Only define the ___GFP_SKIP_KASAN_POISON flag when CONFIG_KASAN_HW_TAGS is enabled. This patch it not useful by itself, but it prepares the code for additions of new KASAN-specific GFP patches. Link: https://lkml.kernel.org/r/44e5738a584c11801b2b8f1231898918efc8634a.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, vmalloc, arm64: mark vmalloc mappings as pgprot_taggedAndrey Konovalov1-0/+7
HW_TAGS KASAN relies on ARM Memory Tagging Extension (MTE). With MTE, a memory region must be mapped as MT_NORMAL_TAGGED to allow setting memory tags via MTE-specific instructions. Add proper protection bits to vmalloc() allocations. These allocations are always backed by page_alloc pages, so the tags will actually be getting set on the corresponding physical memory. Link: https://lkml.kernel.org/r/983fc33542db2f6b1e77b34ca23448d4640bbb9e.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Co-developed-by: Vincenzo Frascino <[email protected]> Signed-off-by: Vincenzo Frascino <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, vmalloc: add vmalloc tagging for SW_TAGSAndrey Konovalov1-6/+10
Add vmalloc tagging support to SW_TAGS KASAN. - __kasan_unpoison_vmalloc() now assigns a random pointer tag, poisons the virtual mapping accordingly, and embeds the tag into the returned pointer. - __get_vm_area_node() (used by vmalloc() and vmap()) and pcpu_get_vm_areas() save the tagged pointer into vm_struct->addr (note: not into vmap_area->addr). This requires putting kasan_unpoison_vmalloc() after setup_vmalloc_vm[_locked](); otherwise the latter will overwrite the tagged pointer. The tagged pointer then is naturally propagateed to vmalloc() and vmap(). - vm_map_ram() returns the tagged pointer directly. As a result of this change, vm_struct->addr is now tagged. Enabling KASAN_VMALLOC with SW_TAGS is not yet allowed. Link: https://lkml.kernel.org/r/4a78f3c064ce905e9070c29733aca1dd254a74f1.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan: add wrappers for vmalloc hooksAndrey Konovalov1-2/+15
Add wrappers around functions that [un]poison memory for vmalloc allocations. These functions will be used by HW_TAGS KASAN and therefore need to be disabled when kasan=off command line argument is provided. This patch does no functional changes for software KASAN modes. Link: https://lkml.kernel.org/r/3b8728eac438c55389fb0f9a8a2145d71dd77487.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan: reorder vmalloc hooksAndrey Konovalov1-11/+9
Group functions that [de]populate shadow memory for vmalloc. Group functions that [un]poison memory for vmalloc. This patch does no functional changes but prepares KASAN code for adding vmalloc support to HW_TAGS KASAN. Link: https://lkml.kernel.org/r/aeef49eb249c206c4c9acce2437728068da74c28.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, vmalloc: drop outdated VM_KASAN commentAndrey Konovalov1-11/+0
The comment about VM_KASAN in include/linux/vmalloc.c is outdated. VM_KASAN is currently only used to mark vm_areas allocated for kernel modules when CONFIG_KASAN_VMALLOC is disabled. Drop the comment. Link: https://lkml.kernel.org/r/780395afea83a147b3b5acc36cf2e38f7f8479f9.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, x86, arm64, s390: rename functions for modules shadowAndrey Konovalov1-7/+7
Rename kasan_free_shadow to kasan_free_module_shadow and kasan_module_alloc to kasan_alloc_module_shadow. These functions are used to allocate/free shadow memory for kernel modules when KASAN_VMALLOC is not enabled. The new names better reflect their purpose. Also reword the comment next to their declaration to improve clarity. Link: https://lkml.kernel.org/r/36db32bde765d5d0b856f77d2d806e838513fe84.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hookAndrey Konovalov1-9/+0
Currently, the code responsible for initializing and poisoning memory in post_alloc_hook() is scattered across two locations: kasan_alloc_pages() hook for HW_TAGS KASAN and post_alloc_hook() itself. This is confusing. This and a few following patches combine the code from these two locations. Along the way, these patches do a step-by-step restructure the many performed checks to make them easier to follow. Replace the only caller of kasan_alloc_pages() with its implementation. As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is enabled, moving the code does no functional changes. Also move init and init_tags variables definitions out of kasan_has_integrated_init() clause in post_alloc_hook(), as they have the same values regardless of what the if condition evaluates to. This patch is not useful by itself but makes the simplifications in the following patches easier to follow. Link: https://lkml.kernel.org/r/5ac7e0b30f5cbb177ec363ddd7878a3141289592.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24mm: clarify __GFP_ZEROTAGS commentAndrey Konovalov1-2/+4
__GFP_ZEROTAGS is intended as an optimization: if memory is zeroed during allocation, it's possible to set memory tags at the same time with little performance impact. Clarify this intention of __GFP_ZEROTAGS in the comment. Link: https://lkml.kernel.org/r/cdffde013973c5634a447513e10ec0d21e8eee29.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24kasan, page_alloc: merge kasan_free_pages into free_pages_prepareAndrey Konovalov1-8/+0
Currently, the code responsible for initializing and poisoning memory in free_pages_prepare() is scattered across two locations: kasan_free_pages() for HW_TAGS KASAN and free_pages_prepare() itself. This is confusing. This and a few following patches combine the code from these two locations. Along the way, these patches also simplify the performed checks to make them easier to follow. Replaces the only caller of kasan_free_pages() with its implementation. As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is enabled, moving the code does no functional changes. This patch is not useful by itself but makes the simplifications in the following patches easier to follow. Link: https://lkml.kernel.org/r/303498d15840bb71905852955c6e2390ecc87139.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24mm/migration: add trace events for base page and HugeTLB migrationsAnshuman Khandual1-0/+31
This adds two trace events for base page and HugeTLB page migrations. These events, closely follow the implementation details like setting and removing of PTE migration entries, which are essential operations for migration. The new CREATE_TRACE_POINTS in <mm/rmap.c> covers both <events/migration.h> and <events/tlb.h> based trace events. Hence drop redundant CREATE_TRACE_POINTS from other places which could have otherwise conflicted during build. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reported-by: kernel test robot <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Zi Yan <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: John Hubbard <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24mm/migration: add trace events for THP migrationsAnshuman Khandual1-0/+27
Patch series "mm/migration: Add trace events", v3. This adds trace events for all migration scenarios including base page, THP and HugeTLB. This patch (of 3): This adds two trace events for PMD based THP migration without split. These events closely follow the implementation details like setting and removing of PMD migration entries, which are essential operations for THP migration. This moves CREATE_TRACE_POINTS into generic THP from powerpc for these new trace events to be available on other platforms as well. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Zi Yan <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: John Hubbard <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24mm: delete __ClearPageWaiters()Hugh Dickins1-1/+1
The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and vmscan.c's free_unref_page_list() callers rely on that not to generate bad_page() alerts. So __page_cache_release(), put_pages_list() and release_pages() (and presumably copy-and-pasted free_zone_device_page()) are redundant and misleading to make a special point of clearing it (as the "__" implies, it could only safely be used on the freeing path). Delete __ClearPageWaiters(). Remark on this in one of the "possible" comments in folio_wake_bit(), and delete the superfluous comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Tested-by: Yu Zhao <[email protected]> Reviewed-by: Yang Shi <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24Merge tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds2-2/+4
Pull ceph updates from Ilya Dryomov: "The highlights are: - several changes to how snap context and snap realms are tracked (Xiubo Li). In particular, this should resolve a long-standing issue of high kworker CPU usage and various stalls caused by needless iteration over all inodes in the snap realm. - async create fixes to address hangs in some edge cases (Jeff Layton) - support for getvxattr MDS op for querying server-side xattrs, such as file/directory layouts and ephemeral pins (Milind Changire) - average latency is now maintained for all metrics (Venky Shankar) - some tweaks around handling inline data to make it fit better with netfs helper library (David Howells) Also a couple of memory leaks got plugged along with a few assorted fixups. Last but not least, Xiubo has stepped up to serve as a CephFS co-maintainer" * tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client: (27 commits) ceph: fix memory leak in ceph_readdir when note_last_dentry returns error ceph: uninitialized variable in debug output ceph: use tracked average r/w/m latencies to display metrics in debugfs ceph: include average/stdev r/w/m latency in mds metrics ceph: track average r/w/m latency ceph: use ktime_to_timespec64() rather than jiffies_to_timespec64() ceph: assign the ci only when the inode isn't NULL ceph: fix inode reference leakage in ceph_get_snapdir() ceph: misc fix for code style and logs ceph: allocate capsnap memory outside of ceph_queue_cap_snap() ceph: do not release the global snaprealm until unmounting ceph: remove incorrect and unused CEPH_INO_DOTDOT macro MAINTAINERS: add Xiubo Li as cephfs co-maintainer ceph: eliminate the recursion when rebuilding the snap context ceph: do not update snapshot context when there is no new snapshot ceph: zero the dir_entries memory when allocating it ceph: move to a dedicated slabcache for ceph_cap_snap ceph: add getvxattr op libceph: drop else branches in prepare_read_data{,_cont} ceph: fix comments mentioning i_mutex ...
2022-03-24Merge tag 'drm-next-2022-03-24' of git://anongit.freedesktop.org/drm/drmLinus Torvalds54-178/+1289
Pull drm updates from Dave Airlie: "Lots of work all over, Intel improving DG2 support, amdkfd CRIU support, msm new hw support, and faster fbdev support. dma-buf: - rename dma-buf-map to iosys-map core: - move buddy allocator to core - add pci/platform init macros - improve EDID parser deep color handling - EDID timing type 7 support - add GPD Win Max quirk - add yes/no helpers to string_helpers - flatten syncobj chains - add nomodeset support to lots of drivers - improve fb-helper clipping support - add default property value interface fbdev: - improve fbdev ops speed ttm: - add a backpointer from ttm bo->ttm resource dp: - move displayport headers - add a dp helper module bridge: - anx7625 atomic support, HDCP support panel: - split out panel-lvds and lvds bindings - find panels in OF subnodes privacy: - add chromeos privacy screen support fb: - hot unplug fw fb on forced removal simpledrm: - request region instead of marking ioresource busy - add panel oreintation property udmabuf: - fix oops with 0 pages amdgpu: - power management code cleanup - Enable freesync video mode by default - RAS code cleanup - Improve VRAM access for debug using SDMA - SR-IOV rework special register access and fixes - profiling power state request ioctl - expose IP discovery via sysfs - Cyan skillfish updates - GC 10.3.7, SDMA 5.2.7, DCN 3.1.6 updates - expose benchmark tests via debugfs - add module param to disable XGMI for testing - GPU reset debugfs register dumping support amdkfd: - CRIU support - SDMA queue fixes radeon: - UVD suspend fix - iMac backlight fix i915: - minimal parallel submission for execlists - DG2-G12 subplatform added - DG2 programming workarounds - DG2 accelerated migration support - flat CCS and CCS engine support for XeHP - initial small BAR support - drop fake LMEM support - ADL-N PCH support - bigjoiner updates - introduce VMA resources and async unbinding - register definitions cleanups - multi-FBC refactoring - DG1 OPROM over SPI support - ADL-N platform enabling - opregion mailbox #5 support - DP MST ESI improvements - drm device based logging - async flip optimisation for DG2 - CPU arch abstraction fixes - improve GuC ADS init to work on aarch64 - tweak TTM LRU priority hint - GuC 69.0.3 support - remove short term execbuf pins nouveau: - higher DP/eDP bitrates - backlight fixes msm: - dpu + dp support for sc8180x - dp support for sm8350 - dpu + dsi support for qcm2290 - 10nm dsi phy tuning support - bridge support for dp encoder - gpu support for additional 7c3 SKUs ingenic: - HDMI support for JZ4780 - aux channel EDID support ast: - AST2600 support - add wide screen support - create DP/DVI connectors omapdrm: - fix implicit dma_buf fencing vc4: - add CSC + full range support - better display firmware handoff panfrost: - add initial dual-core GPU support stm: - new revision support - fb handover support mediatek: - transfer display binding document to yaml format. - add mt8195 display device binding. - allow commands to be sent during video mode. - add wait_for_event for crtc disable by cmdq. tegra: - YUV format support rcar-du: - LVDS support for M3-W+ (R8A77961) exynos: - BGR pixel format for FIMD device" * tag 'drm-next-2022-03-24' of git://anongit.freedesktop.org/drm/drm: (1529 commits) drm/i915/display: Do not re-enable PSR after it was marked as not reliable drm/i915/display: Fix HPD short pulse handling for eDP drm/amdgpu: Use drm_mode_copy() drm/radeon: Use drm_mode_copy() drm/amdgpu: Use ternary operator in `vcn_v1_0_start()` drm/amdgpu: Remove pointless on stack mode copies drm/amd/pm: fix indenting in __smu_cmn_reg_print_error() drm/amdgpu/dc: fix typos in comments drm/amdgpu: fix typos in comments drm/amd/pm: fix typos in comments drm/amdgpu: Add stolen reserved memory for MI25 SRIOV. drm/amdgpu: Merge get_reserved_allocation to get_vbios_allocations. drm/amdkfd: evict svm bo worker handle error drm/amdgpu/vcn: fix vcn ring test failure in igt reload test drm/amdgpu: only allow secure submission on rings which support that drm/amdgpu: fixed the warnings reported by kernel test robot drm/amd/display: 3.2.177 drm/amd/display: [FW Promotion] Release 0.0.108.0 drm/amd/display: Add save/restore PANEL_PWRSEQ_REF_DIV2 drm/amd/display: Wait for hubp read line for Pollock ...
2022-03-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds7-20/+25
Merge more updates from Andrew Morton: "Various misc subsystems, before getting into the post-linux-next material. 41 patches. Subsystems affected by this patch series: procfs, misc, core-kernel, lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump, taskstats, panic, kcov, resource, and ubsan" * emailed patches from Andrew Morton <[email protected]>: (41 commits) Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang" kernel/resource: fix kfree() of bootmem memory again kcov: properly handle subsequent mmap calls kcov: split ioctl handling into locked and unlocked parts panic: move panic_print before kmsg dumpers panic: add option to dump all CPUs backtraces in panic_print docs: sysctl/kernel: add missing bit to panic_print taskstats: remove unneeded dead assignment kasan: no need to unset panic_on_warn in end_report() ubsan: no need to unset panic_on_warn in ubsan_epilogue() panic: unset panic_on_warn inside panic() docs: kdump: add scp example to write out the dump file docs: kdump: update description about sysfs file system support arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible cgroup: use irqsave in cgroup_rstat_flush_locked(). fat: use pointer to simple type in put_user() minix: fix bug when opening a file with O_DIRECT ...
2022-03-24Merge tag 'net-next-5.18' of ↵Linus Torvalds162-825/+3346
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...
2022-03-24Merge tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds8-213/+917
Pull VFIO updates from Alex Williamson: - Introduce new device migration uAPI and implement device specific mlx5 vfio-pci variant driver supporting new protocol (Jason Gunthorpe, Yishai Hadas, Leon Romanovsky) - New HiSilicon acc vfio-pci variant driver, also supporting migration interface (Shameer Kolothum, Longfang Liu) - D3hot fixes for vfio-pci-core (Abhishek Sahu) - Document new vfio-pci variant driver acceptance criteria (Alex Williamson) - Fix UML build unresolved ioport_{un}map() functions (Alex Williamson) - Fix MAINTAINERS due to header movement (Lukas Bulwahn) * tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio: (31 commits) vfio-pci: Provide reviewers and acceptance criteria for variant drivers MAINTAINERS: adjust entry for header movement in hisilicon qm driver hisi_acc_vfio_pci: Use its own PCI reset_done error handler hisi_acc_vfio_pci: Add support for VFIO live migration crypto: hisilicon/qm: Set the VF QM state register hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices hisi_acc_qm: Move VF PCI device IDs to common header crypto: hisilicon/qm: Move few definitions to common header crypto: hisilicon/qm: Move the QM header to include/linux vfio/mlx5: Fix to not use 0 as NULL pointer PCI/IOV: Fix wrong kernel-doc identifier vfio/mlx5: Use its own PCI reset_done error handler vfio/pci: Expose vfio_pci_core_aer_err_detected() vfio/mlx5: Implement vfio_pci driver for mlx5 devices vfio/mlx5: Expose migration commands over mlx5 device vfio: Remove migration protocol v1 documentation vfio: Extend the device migration protocol with RUNNING_P2P vfio: Define device migration protocol v2 ...
2022-03-24Merge tag 'hyperv-next-signed-20220322' of ↵Linus Torvalds1-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: "Minor patches from various people" * tag 'hyperv-next-signed-20220322' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Output host build info as normal Windows version number hv_balloon: rate-limit "Unhandled message" warning drivers: hv: log when enabling crash_kexec_post_notifiers hv_utils: Add comment about max VMbus packet size in VSS driver Drivers: hv: Compare cpumasks and not their weights in init_vp_index() Drivers: hv: Rename 'alloced' to 'allocated' Drivers: hv: vmbus: Use struct_size() helper in kmalloc()
2022-03-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-12/+39
Pull kvm updates from Paolo Bonzini: "ARM: - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes RISC-V: - Prevent KVM_COMPAT from being selected - Optimize __kvm_riscv_switch_to() implementation - RISC-V SBI v0.3 support s390: - memop selftest - fix SCK locking - adapter interruptions virtualization for secure guests - add Claudio Imbrenda as maintainer - first step to do proper storage key checking x86: - Continue switching kvm_x86_ops to static_call(); introduce static_call_cond() and __static_call_ret0 when applicable. - Cleanup unused arguments in several functions - Synthesize AMD 0x80000021 leaf - Fixes and optimization for Hyper-V sparse-bank hypercalls - Implement Hyper-V's enlightened MSR bitmap for nested SVM - Remove MMU auditing - Eager splitting of page tables (new aka "TDP" MMU only) when dirty page tracking is enabled - Cleanup the implementation of the guest PGD cache - Preparation for the implementation of Intel IPI virtualization - Fix some segment descriptor checks in the emulator - Allow AMD AVIC support on systems with physical APIC ID above 255 - Better API to disable virtualization quirks - Fixes and optimizations for the zapping of page tables: - Zap roots in two passes, avoiding RCU read-side critical sections that last too long for very large guests backed by 4 KiB SPTEs. - Zap invalid and defunct roots asynchronously via concurrency-managed work queue. - Allowing yielding when zapping TDP MMU roots in response to the root's last reference being put. - Batch more TLB flushes with an RCU trick. Whoever frees the paging structure now holds RCU as a proxy for all vCPUs running in the guest, i.e. to prolongs the grace period on their behalf. It then kicks the the vCPUs out of guest mode before doing rcu_read_unlock(). Generic: - Introduce __vcalloc and use it for very large allocations that need memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) KVM: use kvcalloc for array allocations KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 kvm: x86: Require const tsc for RT KVM: x86: synthesize CPUID leaf 0x80000021h if useful KVM: x86: add support for CPUID leaf 0x80000021 KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()" kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU KVM: arm64: fix typos in comments KVM: arm64: Generalise VM features into a set of flags KVM: s390: selftests: Add error memop tests KVM: s390: selftests: Add more copy memop tests KVM: s390: selftests: Add named stages for memop test KVM: s390: selftests: Add macro as abstraction for MEM_OP KVM: s390: selftests: Split memop tests KVM: s390x: fix SCK locking RISC-V: KVM: Implement SBI HSM suspend call RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function RISC-V: Add SBI HSM suspend related defines RISC-V: KVM: Implement SBI v0.3 SRST extension ...
2022-03-24Merge tag 'flexible-array-transformations-5.18-rc1' of ↵Linus Torvalds10-15/+15
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull flexible-array transformations from Gustavo Silva: "Treewide patch that replaces zero-length arrays with flexible-array members. This has been baking in linux-next for a whole development cycle" * tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: treewide: Replace zero-length arrays with flexible-array members
2022-03-24Merge tag 'prlimit-tasklist_lock-for-v5.18' of ↵Linus Torvalds2-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull tasklist_lock optimizations from Eric Biederman: "prlimit and getpriority tasklist_lock optimizations The tasklist_lock popped up as a scalability bottleneck on some testing workloads. The readlocks in do_prlimit and set/getpriority are not necessary in all cases. Based on a cycles profile, it looked like ~87% of the time was spent in the kernel, ~42% of which was just trying to get *some* spinlock (queued_spin_lock_slowpath, not necessarily the tasklist_lock). The big offenders (with rough percentages in cycles of the overall trace): - do_wait 11% - setpriority 8% (done previously in commit 7f8ca0edfe07) - kill 8% - do_exit 5% - clone 3% - prlimit64 2% (this patchset) - getrlimit 1% (this patchset) I can't easily test this patchset on the original workload for various reasons. Instead, I used the microbenchmark below to at least verify there was some improvement. This patchset had a 28% speedup (12% from baseline to set/getprio, then another 14% for prlimit). This series used to do the setpriority case, but an almost identical change was merged as commit 7f8ca0edfe07 ("kernel/sys.c: only take tasklist_lock for get/setpriority(PRIO_PGRP)") so that has been dropped from here. One interesting thing is that my libc's getrlimit() was calling prlimit64, so hoisting the read_lock(tasklist_lock) into sys_prlimit64 had no effect - it essentially optimized the older syscalls only. I didn't do that in this patchset, but figured I'd mention it since it was an option from the previous patch's discussion" micobenchmark.c: --------------- int main(int argc, char **argv) { pid_t child; struct rlimit rlim[1]; fork(); fork(); fork(); fork(); fork(); fork(); for (int i = 0; i < 5000; i++) { child = fork(); if (child < 0) exit(1); if (child > 0) { usleep(1000); kill(child, SIGTERM); waitpid(child, NULL, 0); } else { for (;;) { setpriority(PRIO_PROCESS, 0, getpriority(PRIO_PROCESS, 0)); getrlimit(RLIMIT_CPU, rlim); } } } return 0; } Link: https://lore.kernel.org/lkml/[email protected]/ [v1] Link: https://lore.kernel.org/lkml/[email protected]/ [v2] Link: https://lore.kernel.org/lkml/[email protected]/ [v3] * tag 'prlimit-tasklist_lock-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: prlimit: do not grab the tasklist_lock prlimit: make do_prlimit() static
2022-03-24netfilter: egress: Report interface as outgoingPhil Sutter1-1/+1
Otherwise packets in egress chains seem like they are being received by the interface, not sent out via it. Fixes: 42df6e1d221dd ("netfilter: Introduce egress hook") Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2022-03-24io_uring: remove IORING_CQE_F_MSGJens Axboe1-2/+0
This was introduced with the message ring opcode, but isn't strictly required for the request itself. The sender can encode what is needed in user_data, which is passed to the receiver. It's unclear if having a separate flag that essentially says "This CQE did not originate from an SQE on this ring" provides any real utility to applications. While we can always re-introduce a flag to provide this information, we cannot take it away at a later point in time. Remove the flag while we still can, before it's in a released kernel. Signed-off-by: Jens Axboe <[email protected]>
2022-03-24dma-buf: finally make the dma_resv_list private v2Christian König1-25/+1
Drivers should never touch this directly. v2: drop kerneldoc for now internal handling Signed-off-by: Christian König <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-03-24dma-buf: add dma_resv_replace_fences v2Christian König1-0/+2
This function allows to replace fences from the shared fence list when we can gurantee that the operation represented by the original fence has finished or no accesses to the resources protected by the dma_resv object any more when the new fence finishes. Then use this function in the amdkfd code when BOs are unmapped from the process. v2: add an example when this is usefull. Signed-off-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-03-23kexec: make crashk_res, crashk_low_res and crash_notes symbols always visibleJisheng Zhang1-6/+6
Patch series "kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef", v2. Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and increase compile coverage. I only modified x86, arm, arm64 and riscv, other architectures such as sh, powerpc and s390 are better to be kept kexec code as-is so they are not touched. This patch (of 5): Make the forward declarations of crashk_res, crashk_low_res and crash_notes always visible. Code referring to these symbols can then just check for IS_ENABLED(CONFIG_KEXEC_CORE), instead of requiring conditional compilation using an #ifdef, thus preparing to increase compile coverage and simplify the code. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jisheng Zhang <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Russell King <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Russell King (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-23init.h: improve __setup and early_param documentationRandy Dunlap1-2/+9
Igor noted in [1] that there are quite a few __setup() handling functions that return incorrect values. Doing this can be harmless, but it can also cause strings to be added to init's argument or environment list, polluting them. Since __setup() handling and return values are not documented, first add documentation for that. Also add more documentation for early_param() handling and return values. For __setup() functions, returning 0 (not handled) has questionable value if it is just a malformed option value, as in rodata=junk since returning 0 would just cause "rodata=junk" to be added to init's environment unnecessarily: Run /sbin/init as init process with arguments: /sbin/init with environment: HOME=/ TERM=linux splash=native rodata=junk Also, there are no recommendations on whether to print a warning when an unknown parameter value is seen. I am not addressing that here. [1] lore.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Reported-by: Igor Zhbanov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-23bitfield: add explicit inclusions to the exampleAndy Shevchenko1-0/+3
It's not obvious that bitfield.h doesn't guarantee the bits.h inclusion and the example in the former is confusing. Some developers think that it's okay to just include bitfield.h to get it working. Change example to explicitly include necessary headers in order to avoid confusion. Link: https://lkml.kernel.org/r/[email protected] Fixes: 3e9b3112ec74 ("add basic register-field manipulation macros") Depends-on: 8bd9cb51daac ("locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a new <linux/bits.h> file") Signed-off-by: Andy Shevchenko <[email protected]> Reported-by: Jan Dąbroś <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>