path: root/include/linux
Age | Commit message | Author | Files | Lines
2023-04-05 | iov_iter: add copy_page_to_iter_nofault() | Lorenzo Stoakes | 1 | -0/+2
Provide a means to copy a page to user space from an iterator, aborting if a page fault would occur. This supports compound pages, but may be passed a tail page with an offset extending further into the compound page, so we cannot pass a folio. This allows for this function to be called from atomic context and _try_ to use pages if they are faulted in, aborting if not. The function does not use _copy_to_iter() in order to not specify might_fault(); this is similar to copy_page_from_iter_atomic(). This is being added in order that an iterable form of vread() can be implemented while holding spinlocks. Link: https://lkml.kernel.org/r/19734729defb0f498a76bdec1bef3ac48a3af3e8.1679511146.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Baoquan He <[email protected]> Cc: Alexander Viro <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Liu Shixin <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm/page_alloc: make deferred page init free pages in MAX_ORDER blocks | Kirill A. Shutemov | 1 | -0/+2
Normal page init path frees pages during the boot in MAX_ORDER chunks, but deferred page init path does it in pageblock blocks. Change deferred page init path to work in MAX_ORDER blocks. For cases when MAX_ORDER is larger than pageblock, set migrate type to MIGRATE_MOVABLE for all pageblocks covered by the page. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: remove vmf_insert_pfn_xxx_prot() for huge page-table entries | Lorenzo Stoakes | 1 | -37/+2
This functionality's sole user, the drm ttm module, removed support for it in commit 0d979509539e ("drm/ttm: remove ttm_bo_vm_insert_huge()") as the whole approach is currently unworkable without a PMD/PUD special bit and updates to GUP. Link: https://lkml.kernel.org/r/604c2ad79659d4b8a6e3e1611c6219d5d3233988.1678661628.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Cc: Christian König <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Hellström <[email protected]> Cc: Aaron Tomlin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Peter Xu <[email protected]> Cc: "Russell King (Oracle)" <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: remove unused vmf_insert_mixed_prot() | Lorenzo Stoakes | 2 | -8/+1
Patch series "Remove drm/ttm-specific mm changes". Functionality was added specifically for the DRM TTM driver to support mapping memory for VM_MIXEDMAP VMAs with customised protection flags, however this has now been rolled back as issues were found with this approach. This series removes the mm changes too, retaining some of the useful comments. This patch (of 3): The sole user of vmf_insert_mixed_prot(), the drm ttm module, stopped using this in commit f91142c62161 ("drm/ttm: nuke VM_MIXEDMAP on BO mappings v3") citing use of VM_MIXEDMAP in this case being terribly broken. Remove this now-dead code and references to it, but retain the useful description of the prot != vma->vm_page_prot case, moving it to vmf_insert_pfn_prot() instead. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/a069644388e6f1593a7020d15840e6fc9f39bcaf.1678661628.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Cc: Christian König <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Hellström <[email protected]> Cc: Aaron Tomlin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Peter Xu <[email protected]> Cc: "Russell King (Oracle)" <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm/memtest: add results of early memtest to /proc/meminfo | Tomas Mudrunka | 1 | -0/+2
Until now, the memtest results were only presented in dmesg. When running a large fleet of devices without ECC RAM, it's currently not easy to do bulk monitoring for memory corruption. You have to parse dmesg, but that's a ring buffer, so the error might disappear after some time. In general I do not consider dmesg to be a great API to query RAM status. In several companies I've seen such errors remain undetected and cause issues for way too long. So I think it makes sense to provide a monitoring API, so that we can safely detect and act upon them. This adds a /proc/meminfo entry which can be easily used by scripts. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tomas Mudrunka <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
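As a rough illustration of how a monitoring tool might consume such an entry, the sketch below extracts one "Name: <n> kB" field from a /proc/meminfo-style buffer. The helper name meminfo_field_kb() is hypothetical, and the field name used in the usage note (EarlyMemtestBad) is an assumption, since the commit message above does not spell it out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value (in kB) of a "Name:   <n> kB"
 * field from a /proc/meminfo-style text buffer, or -1 if it is absent.
 */
long meminfo_field_kb(const char *buf, const char *name)
{
	size_t len;
	const char *line = buf;

	assert(buf && name);
	len = strlen(name);

	while (line && *line) {
		long val;

		/* Match the whole field name, followed directly by ':' */
		if (strncmp(line, name, len) == 0 && line[len] == ':' &&
		    sscanf(line + len + 1, "%ld", &val) == 1)
			return val;

		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}
```

A caller would read /proc/meminfo into a buffer and check, for example, whether meminfo_field_kb(buf, "EarlyMemtestBad") is greater than zero; a return of -1 means the running kernel does not expose the field.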
2023-04-05 | mm: move vmalloc_init() declaration to mm/internal.h | Mike Rapoport (IBM) | 1 | -4/+0
vmalloc_init() is called only from mm_core_init(), there is no need to declare it in include/linux/vmalloc.h Move vmalloc_init() declaration to mm/internal.h Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: move kmem_cache_init() declaration to mm/slab.h | Mike Rapoport (IBM) | 1 | -1/+0
kmem_cache_init() is called only from mm_core_init(), there is no need to declare it in include/linux/slab.h Move kmem_cache_init() declaration to mm/slab.h Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: move mem_init_print_info() to mm_init.c | Mike Rapoport (IBM) | 1 | -1/+0
mem_init_print_info() is only called from mm_core_init(). Move it close to the caller and make it static. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | init,mm: fold late call to page_ext_init() to page_alloc_init_late() | Mike Rapoport (IBM) | 1 | -2/+0
When deferred initialization of struct pages is enabled, page_ext_init() must be called after all the deferred initialization is done, but there is no point to keep it a separate call from kernel_init_freeable() right after page_alloc_init_late(). Fold the call to page_ext_init() into page_alloc_init_late() and localize deferred_struct_pages variable. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: move init_mem_debugging_and_hardening() to mm/mm_init.c | Mike Rapoport (IBM) | 1 | -1/+0
init_mem_debugging_and_hardening() is only called from mm_core_init(). Move it close to the caller, make it static and rename it to mem_debugging_and_hardening_init() for consistency with surrounding convention. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: call {ptlock,pgtable}_cache_init() directly from mm_core_init() | Mike Rapoport (IBM) | 1 | -6/+0
and drop pgtable_init() as it has no real value and its name is misleading. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Sergei Shtylyov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | init,mm: move mm_init() to mm/mm_init.c and rename it to mm_core_init() | Mike Rapoport (IBM) | 1 | -0/+1
Make mm_init() a part of mm/ codebase. mm_core_init() better describes what the function does and does not clash with mm_init() in kernel/fork.c Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm/page_alloc: rename page_alloc_init() to page_alloc_init_cpuhp() | Mike Rapoport (IBM) | 1 | -1/+1
The page_alloc_init() name is really misleading because all this function does is sets up CPU hotplug callbacks for the page allocator. Rename it to page_alloc_init_cpuhp() so that name will reflect what the function does. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: move most of core MM initialization to mm/mm_init.c | Mike Rapoport (IBM) | 1 | -5/+0
The bulk of memory management initialization code is spread all over mm/page_alloc.c and makes navigating through page allocator functionality difficult. Move most of the functions marked __init and __meminit to mm/mm_init.c to make it better localized and allow some more spare room before mm/page_alloc.c reaches 10k lines. No functional changes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Doug Berger <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: move get_page_from_free_area() to mm/page_alloc.c | Mike Rapoport (IBM) | 1 | -7/+0
The get_page_from_free_area() helper is only used in mm/page_alloc.c so move it there to reduce noise in include/linux/mmzone.h Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: userfaultfd: add UFFDIO_CONTINUE_MODE_WP to install WP PTEs | Axel Rasmussen | 1 | -1/+2
UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new PTE to resolve a missing fault, one can install a write-protected one. This is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in combination. This was motivated by testing HugeTLB HGM [1], and in particular its interaction with userfaultfd features. Existing userfaultfd code supports using WP and MINOR modes together (i.e. you can register an area with both enabled), but without this CONTINUE flag the combination is in practice unusable. So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing as UFFDIO_COPY_MODE_WP, but for *minor* faults. Update the selftest to do some very basic exercising of the new flag. Update Documentation/ to describe how these flags are used (neither the COPY nor the new CONTINUE versions of this mode flag were described there before). [1]: https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Acked-by: Peter Xu <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Cc: Al Viro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: userfaultfd: combine 'mode' and 'wp_copy' arguments | Axel Rasmussen | 3 | -24/+37
Many userfaultfd ioctl functions take both a 'mode' and a 'wp_copy' argument. In future commits we plan to plumb the flags through to more places, so we'd be proliferating the very long argument list even further. Let's take the time to simplify the argument list. Combine the two arguments into one - and generalize, so when we add more flags in the future, it doesn't imply more function arguments. Since the modes (copy, zeropage, continue) are mutually exclusive, store them as an integer value (0, 1, 2) in the low bits. Place combine-able flag bits in the high bits. This is quite similar to an earlier patch proposed by Nadav Amit ("userfaultfd: introduce uffd_flags" [1]). The main difference is that patch only handled flags, whereas this patch *also* combines the "mode" argument into the same type to shorten the argument list. [1]: https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Acked-by: James Houghton <[email protected]> Acked-by: Peter Xu <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Cc: Al Viro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
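The packing scheme described above (a mutually exclusive mode stored as a small integer in the low bits, combinable flags in the bits above it) might be sketched as follows. All identifiers here are hypothetical stand-ins chosen for illustration; the names and bit widths used by the kernel itself may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the kernel's own identifiers may differ. */
typedef unsigned int fill_flags_t;

enum fill_mode {		/* mutually exclusive, stored as an integer */
	FILL_MODE_COPY = 0,
	FILL_MODE_ZEROPAGE = 1,
	FILL_MODE_CONTINUE = 2,
};

#define FILL_MODE_BITS	2u				/* enough for values 0..2 */
#define FILL_MODE_MASK	((1u << FILL_MODE_BITS) - 1u)
#define FILL_FLAG(nr)	(1u << (FILL_MODE_BITS + (nr)))	/* combinable bits */
#define FILL_FLAG_WP	FILL_FLAG(0)

/* Replace the mode stored in the low bits, leaving the flag bits alone. */
static inline fill_flags_t fill_flags_set_mode(fill_flags_t flags,
					       enum fill_mode mode)
{
	assert((unsigned int)mode <= FILL_MODE_MASK);
	return (flags & ~FILL_MODE_MASK) | (fill_flags_t)mode;
}

/* Compare only the low mode bits, ignoring any flag bits that are set. */
static inline bool fill_flags_mode_is(fill_flags_t flags, enum fill_mode mode)
{
	return (flags & FILL_MODE_MASK) == (fill_flags_t)mode;
}
```

With this layout, adding another flag only means defining FILL_FLAG(1); no function signature has to grow, which is the point the commit message makes.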
2023-04-05 | mm: userfaultfd: don't pass around both mm and vma | Axel Rasmussen | 3 | -7/+6
Quite a few userfaultfd functions took both mm and vma pointers as arguments. Since the mm is trivially accessible via vma->vm_mm, there's no reason to pass both; it just needlessly extends the already long argument list. Get rid of the mm pointer, where possible, to shorten the argument list. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Acked-by: Peter Xu <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Cc: Al Viro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: James Houghton <[email protected]> Cc: Jan Kara <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: userfaultfd: rename functions for clarity + consistency | Axel Rasmussen | 2 | -24/+24
Patch series "mm: userfaultfd: refactor and add UFFDIO_CONTINUE_MODE_WP", v5. - Commits 1-3 refactor userfaultfd ioctl code without behavior changes, with the main goal of improving consistency and reducing the number of function args. - Commit 4 adds UFFDIO_CONTINUE_MODE_WP. This patch (of 4): The basic problem is, over time we've added new userfaultfd ioctls, and we've refactored the code so functions which used to handle only one case are now re-used to deal with several cases. While this happened, we didn't bother to rename the functions. Similarly, as we added new functions, we cargo-culted pieces of the now-inconsistent naming scheme, so those functions too ended up with names that don't make a lot of sense. A key point here is, "copy" in most userfaultfd code refers specifically to UFFDIO_COPY, where we allocate a new page and copy its contents from userspace. There are many functions with "copy" in the name that don't actually do this (at least in some cases). So, rename things into a consistent scheme. The high level idea is that the call stack for userfaultfd ioctls becomes: userfaultfd_ioctl -> userfaultfd_(particular ioctl) -> mfill_atomic_(particular kind of fill operation) -> mfill_atomic /* loops over pages in range */ -> mfill_atomic_pte /* deals with single pages */ -> mfill_atomic_pte_(particular kind of fill operation) -> mfill_atomic_install_pte There are of course some special cases (shmem, hugetlb), but this is the general structure which all function names now adhere to. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Acked-by: Peter Xu <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Cc: Al Viro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: James Houghton <[email protected]> Cc: Jan Kara <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm, treewide: redefine MAX_ORDER sanely | Kirill A. Shutemov | 4 | -11/+11
MAX_ORDER is currently defined as the number of orders the page allocator supports: a user can ask the buddy allocator for a page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and has led to a number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders a user can ask from the buddy allocator is now 0..MAX_ORDER. [[email protected]: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [[email protected]: fix another min_t warning] [[email protected]: fixups per Zi Yan] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Michael Ellerman <[email protected]> [powerpc] Cc: "Kirill A. Shutemov" <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
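To make the inclusive convention concrete, here is a small userspace sketch. The value 10 is only an example (the real value depends on configuration), and NR_ORDERS, order_is_valid() and total_free_pages() are hypothetical helpers, not kernel API.

```c
#include <assert.h>

#define MAX_ORDER	10			/* example value; configuration-dependent */
#define NR_ORDERS	(MAX_ORDER + 1)		/* hypothetical: count of valid orders */

/* With the inclusive definition, order == MAX_ORDER is a valid request. */
static inline int order_is_valid(unsigned int order)
{
	return order <= MAX_ORDER;
}

/*
 * Per-order arrays need MAX_ORDER + 1 slots, and loops over all orders
 * use "<=" rather than "<". nr_free[order] counts free blocks of
 * 2^order pages each.
 */
unsigned long total_free_pages(const unsigned long nr_free[NR_ORDERS])
{
	unsigned long pages = 0;
	unsigned int order;

	assert(nr_free);
	for (order = 0; order <= MAX_ORDER; order++)
		pages += nr_free[order] << order;
	return pages;
}
```

Under the old exclusive definition the same loop would have been written with "order < MAX_ORDER" and the array sized MAX_ORDER; mixing the two conventions is the kind of off-by-one the commit message refers to.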
2023-04-05 | mm/uffd: UFFD_FEATURE_WP_UNPOPULATED | Peter Xu | 2 | -0/+29
Patch series "mm/uffd: Add feature bit UFFD_FEATURE_WP_UNPOPULATED", v4. The new feature bit makes anonymous memory act the same as file memory on userfaultfd-wp in that it'll also wr-protect none ptes. It can be useful in two cases: (1) A uffd-wp app that needs to wr-protect none ptes, like QEMU snapshot, so pre-fault can be replaced by enabling this flag to speed up protections. (2) It helps to implement the async uffd-wp mode that Muhammad is working on [1]. It's debatable whether this is the ideal solution because with the new feature bit set, wr-protecting a none pte needs to pre-populate the pgtables to the last level (PAGE_SIZE). But it seems fine so far to service either purpose above, so we can leave optimizations for later. The series brings pte markers to anonymous memory too. There's some change in the common mm code path in the 1st patch; it would be great to have some eyes looking at it, but hopefully the changes are still relatively straightforward. This patch (of 2): This is a new feature that controls how uffd-wp handles none ptes. When it's set, the kernel will handle anonymous memory the same way as file memory, by allowing the user to wr-protect unpopulated ptes. File memory handles none ptes consistently by allowing wr-protecting of none ptes, because it is unknown whether the page cache exists or not. For anonymous memory it was not as consistent because we used to assume that we don't need protections on none ptes or known zero pages. One use case of such a feature bit was VM live snapshot, where without wr-protecting empty ptes the snapshot can contain random rubbish in the holes of the anonymous memory, which can cause the guest to misbehave when the guest OS assumes the pages should be all zeros. QEMU worked around it by pre-populating the section with reads to fill in zero page entries before starting the whole snapshot process [1]. Recently there's another need raised on using userfaultfd wr-protect for detecting dirty pages (to replace soft-dirty in some cases) [2].
In that case, without being able to wr-protect none ptes by default, the dirty info can get lost, since we cannot treat every none pte as dirty (the current design identifies a page as dirty based on the uffd-wp bit being cleared). In general, we want to be able to wr-protect empty ptes too, even for anonymous memory. This patch implements UFFD_FEATURE_WP_UNPOPULATED so that uffd-wp handling of none ptes is consistent no matter what the memory type is underneath. It doesn't have any impact on file memory so far because we already have pte markers taking care of that. So it only affects anonymous memory. The feature bit is off by default, so the old behavior will be maintained. Sometimes the old behavior may be wanted because wr-protecting none ptes adds overhead not only during UFFDIO_WRITEPROTECT (by applying pte markers to anonymous memory), but also when creating the pgtables to store the pte markers. So there's potentially less chance of using thp on the first fault for a none pmd or larger than a pmd. The major implementation part is teaching the whole kernel to understand pte markers even for anonymously mapped ranges, while allowing the UFFDIO_WRITEPROTECT ioctl to apply pte markers for anonymous memory too when the new feature bit is set. Note that even if the patch subject starts with mm/uffd, there are a few small refactors to the major mm path of handling anonymous page faults. But they should be straightforward. With WP_UNPOPULATED, an application like QEMU can avoid pre-read faulting all the memory before wr-protecting it when taking a live snapshot.
Quoting from Muhammad's test result here [3], based on a simple program [4]:
(1) With huge page disabled
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
./uffd_wp_perf
Test DEFAULT: 4
Test PRE-READ: 1111453 (pre-fault 1101011)
Test MADVISE: 278276 (pre-fault 266378)
Test WP-UNPOPULATE: 11712
(2) With huge page enabled
echo always > /sys/kernel/mm/transparent_hugepage/enabled
./uffd_wp_perf
Test DEFAULT: 4
Test PRE-READ: 22521 (pre-fault 22348)
Test MADVISE: 4909 (pre-fault 4743)
Test WP-UNPOPULATE: 14448
There'll be a great perf boost for the no-thp case. For thp enabled, in the extreme case of all-thp-zero, WP_UNPOPULATED can be slower than MADVISE, but that's a low possibility in reality; also, the overhead was not reduced but postponed until a follow-up write on any huge zero thp, so potentially it is faster by making the follow-up writes slower. [1] https://lore.kernel.org/all/[email protected]/ [2] https://lore.kernel.org/all/Y+v2HJ8+3i%2FKzDBu@x1n/ [3] https://lore.kernel.org/all/[email protected]/ [4] https://github.com/xzpeter/clibs/blob/master/uffd-test/uffd-wp-perf.c [[email protected]: comment changes, oneliner fix to khugepaged] Link: https://lkml.kernel.org/r/ZB2/8jPhD3fpx5U8@x1n Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Paul Gofman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: return an ERR_PTR from __filemap_get_folio | Christoph Hellwig | 1 | -5/+6
Instead of returning NULL for all errors, distinguish between: - no entry found and not asked to allocate (-ENOENT) - failed to allocate memory (-ENOMEM) - would block (-EAGAIN) so that callers don't have to guess the error based on the passed in flags. Also pass the error through the direct callers: filemap_get_folio, filemap_lock_folio, filemap_grab_folio and filemap_get_incore_folio. [[email protected]: fix null-pointer deref] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Ryusuke Konishi <[email protected]> [nilfs2] Cc: Andreas Gruenbacher <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
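The error-pointer convention that this change adopts can be illustrated in userspace as below. The three helpers are simplified re-implementations modelled on the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(), written here as an assumption about their behaviour rather than copied from the kernel; struct slot and slot_get() are entirely hypothetical and only stand in for a lookup that can fail for different reasons.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins: encode a negative errno in the pointer value. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for error codes. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct slot { int used; int key; };	/* hypothetical table entry */

/* Hypothetical lookup that reports why it failed instead of returning NULL. */
struct slot *slot_get(struct slot *table, size_t n, int key, int create)
{
	size_t i;

	assert(table);
	for (i = 0; i < n; i++)
		if (table[i].used && table[i].key == key)
			return &table[i];
	if (!create)
		return ERR_PTR(-ENOENT);	/* not found, not asked to allocate */
	for (i = 0; i < n; i++) {
		if (!table[i].used) {
			table[i].used = 1;
			table[i].key = key;
			return &table[i];
		}
	}
	return ERR_PTR(-ENOMEM);		/* no room left to allocate */
}
```

Callers then test the result with IS_ERR() rather than comparing against NULL, and can branch on PTR_ERR() to tell "not present" apart from "allocation failed".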
2023-04-05 | mm: remove FGP_ENTRY | Christoph Hellwig | 1 | -2/+1
FGP_ENTRY is unused now, so remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | mm: make mapping_get_entry available outside of filemap.c | Christoph Hellwig | 1 | -0/+1
mapping_get_entry is useful for page cache API users that need to know about xa_value internals. Rename it and make it available in pagemap.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | net: stmmac: add support for platform specific reset | Shenwei Wang | 1 | -0/+1
This patch adds support for platform-specific reset logic in the stmmac driver. Some SoCs require a different reset mechanism than the standard dwmac IP reset. To support these platforms, a new function pointer 'fix_soc_reset' is added to the plat_stmmacenet_data structure. The stmmac_reset in hwif.h is modified to call the 'fix_soc_reset' function if it exists. This enables the driver to use the platform-specific reset logic when necessary. Signed-off-by: Shenwei Wang <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
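The optional-hook pattern described above could look roughly like the following. Only the member name fix_soc_reset comes from the commit message; its signature, struct plat_data, default_ip_reset(), example_soc_reset() and do_reset() are simplified, hypothetical stand-ins and not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the platform data; only the hook matters here. */
struct plat_data {
	/* Optional platform-specific reset; NULL means "use the default". */
	int (*fix_soc_reset)(void *priv, void *ioaddr);
};

/* Stand-in for the standard IP reset path. */
static int default_ip_reset(void *ioaddr)
{
	(void)ioaddr;
	return 0;
}

/* Example hook for a hypothetical SoC; returns 1 so the path is visible. */
static int example_soc_reset(void *priv, void *ioaddr)
{
	(void)priv;
	(void)ioaddr;
	return 1;
}

/* Prefer the platform hook when one is provided, else use the default. */
int do_reset(struct plat_data *plat, void *ioaddr)
{
	assert(plat);
	if (plat->fix_soc_reset)
		return plat->fix_soc_reset(plat, ioaddr);
	return default_ip_reset(ioaddr);
}
```

Platforms that need nothing special leave the pointer NULL and keep the existing behaviour, so the hook adds no cost for them beyond one pointer check.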
2023-04-05 | mm: enable maple tree RCU mode by default | Liam R. Howlett | 1 | -1/+2
Use the maple tree in RCU mode for VMA tracking. The maple tree tracks the stack and is able to update the pivot (lower/upper boundary) in-place to allow the page fault handler to write to the tree while holding just the mmap read lock. This is safe as the writes to the stack have a guard VMA which ensures there will always be a NULL in the direction of the growth and thus will only update a pivot. It is possible, but not recommended, to have VMAs that grow up/down without guard VMAs. syzbot has constructed a testcase which sets up a VMA to grow and consume the empty space. Overwriting the entire NULL entry causes the tree to be altered in a way that is not safe for concurrent readers; the readers may see a node being rewritten or one that does not match the maple state they are using. Enabling RCU mode allows the concurrent readers to see a stable node and will return the expected result. [[email protected]: we don't need to free the nodes with RCU] Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree") Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Reported-by: [email protected] Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05 | Migrate the PCIe-IDIO-24 and WS16C48 GPIO drivers | Mark Brown | 1 | -2/+4
Merge series from William Breathitt Gray <[email protected]>: The regmap API supports IO port accessors so we can take advantage of regmap abstractions rather than handling access to the device registers directly in the driver. A patch to pass irq_drv_data as a parameter for struct regmap_irq_chip set_type_config() is included. This is needed by the idio_24_set_type_config() and ws16c48_set_type_config() callbacks in order to update the type configuration on their respective devices.
2023-04-05 | PCI: Make pci_bus_for_each_resource() index optional | Andy Shevchenko | 1 | -5/+19
Refactor pci_bus_for_each_resource() in the same way as pci_dev_for_each_resource(). This allows the index to be hidden inside the implementation so the caller can omit it when it's not used otherwise. No functional changes intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Krzysztof Wilczyński <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
2023-04-05 | clk: Remove mmask and nmask fields in struct clk_fractional_divider | Christophe JAILLET | 1 | -2/+0
All users of these fields have been removed. They are now computed when needed with [mn]shift and [mn]width. This shrinks the size of struct clk_fractional_divider from 72 to 56 bytes. Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/680357e5acb338433bfc94114b65b4a4ce2c99e2.1680423909.git.christophe.jaillet@wanadoo.fr Reviewed-by: Heiko Stuebner <[email protected]> Signed-off-by: Stephen Boyd <[email protected]>
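Computing a field mask on demand from its shift and width, instead of storing it, might look like the sketch below. The member names follow the "[mn]shift and [mn]width" wording of the commit message; struct frac_div and the helper functions are hypothetical, and a 32-bit register is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trimmed-down descriptor: only shift and width are stored. */
struct frac_div {
	uint8_t mshift, mwidth;
	uint8_t nshift, nwidth;
};

/* Build the field mask on demand from its shift and width. */
static inline uint32_t field_mask(uint8_t shift, uint8_t width)
{
	assert(width >= 1 && shift + width <= 32);
	/* Use 64-bit arithmetic so that width == 32 does not overflow. */
	return (uint32_t)(((UINT64_C(1) << width) - 1) << shift);
}

/* Extract the numerator (m) field from a register value. */
static inline uint32_t frac_div_get_m(const struct frac_div *fd, uint32_t reg)
{
	return (reg & field_mask(fd->mshift, fd->mwidth)) >> fd->mshift;
}

/* Extract the denominator (n) field from a register value. */
static inline uint32_t frac_div_get_n(const struct frac_div *fd, uint32_t reg)
{
	return (reg & field_mask(fd->nshift, fd->nwidth)) >> fd->nshift;
}
```

The trade-off is a couple of shifts per access in exchange for a smaller structure, which is the size reduction the commit message reports.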
2023-04-05 | nvmem: Add macro to register nvmem layout drivers | Miquel Raynal | 1 | -0/+6
Provide a module_nvmem_layout_driver() macro at the end of the nvmem-provider.h header to reduce the boilerplate when registering nvmem layout drivers. Suggested-by: Srinivas Kandagatla <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Acked-by: Rafał Miłecki <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: core: support specifying both: cell raw data & post read lengthsRafał Miłecki1-0/+2
Callback .read_post_process() is designed to modify raw cell content before providing it to the consumer. So far we were dealing with modifications that didn't affect cell size (length). In some cases, however, cell content needs to be reformatted and resized. This is required e.g. to provide a properly formatted MAC address in case it's stored in a non-binary format (e.g. using ASCII). There were a few discussions on how to handle that optimally. The following possible solutions were considered: 1. Allow .read_post_process() to realloc (resize) content buffer 2. Allow .read_post_process() to adjust (decrease) just buffer length 3. Register NVMEM cells using post-read sizes The preferred solution was the last one. The problem is that simply adjusting "bytes" in NVMEM providers would result in core code NOT passing whole raw data to .read_post_process() callbacks. It means callback functions couldn't do their job without somehow manually reading original cell content on their own. This patch deals with that by registering NVMEM cells with both lengths: raw content one and post read one. It allows: 1. Core code to read whole raw cell content 2. Callbacks to return content they want Signed-off-by: Rafał Miłecki <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
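To make the two lengths concrete, here is a userspace sketch of a post-read step that turns a 17-byte ASCII MAC address into its 6-byte binary form. It does not use the real NVMEM callback signature: `demo_mac_post_process()`, `DEMO_MAC_RAW_LEN` and `DEMO_MAC_LEN` are hypothetical names, and the "aa:bb:cc:dd:ee:ff" storage format is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAC_RAW_LEN 17	/* "aa:bb:cc:dd:ee:ff" as stored, no NUL */
#define DEMO_MAC_LEN	 6	/* binary MAC handed to the consumer */

/* Convert one hex character to its value, or return -1 if it is not hex. */
static int demo_hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical post-read step: needs the whole raw cell (raw_len bytes)
 * as input and produces a shorter result (out_len bytes).
 * Returns 0 on success, -1 on malformed input or unexpected lengths.
 */
int demo_mac_post_process(const char *raw, size_t raw_len,
			  uint8_t *out, size_t out_len)
{
	size_t i;

	if (raw_len != DEMO_MAC_RAW_LEN || out_len != DEMO_MAC_LEN)
		return -1;

	for (i = 0; i < DEMO_MAC_LEN; i++) {
		int hi = demo_hex_digit(raw[3 * i]);
		int lo = demo_hex_digit(raw[3 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		/* Every byte except the last is followed by a ':' separator. */
		if (i < DEMO_MAC_LEN - 1 && raw[3 * i + 2] != ':')
			return -1;
		out[i] = (uint8_t)((hi << 4) | lo);
	}
	return 0;
}
```

The conversion only works when all 17 raw bytes are available, which is why registering the cell with just the 6-byte post-read size would not be enough.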
2023-04-05nvmem: core: provide own priv pointer in post process callbackMichael Walle1-1/+4
It no longer makes sense to have an opaque pointer set up by the nvmem device. Usually, the layout isn't associated with a particular nvmem device. Instead, let the caller who set the post process callback provide the priv pointer. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: cell: drop global cell_post_processMichael Walle1-2/+0
There are no users left for the global cell_post_process callback. New users should use proper nvmem layouts. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: core: allow to modify a cell before adding itMichael Walle1-0/+5
Provide a way to modify a cell before it is added. This is useful to attach a custom post processing hook via a layout. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: core: add per-cell post processingMichael Walle1-0/+3
Instead of relying on the name the consumer is using for the cell, like it is done for the nvmem .cell_post_process configuration parameter, provide a per-cell post processing hook. This can then be populated by the NVMEM provider (or the NVMEM layout) when adding the cell. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: core: introduce NVMEM layoutsMichael Walle2-0/+58
NVMEM layouts are used to generate NVMEM cells during runtime. Think of an EEPROM with a well-defined content. For now, the content can be described by a device tree or a board file. But this only works if the offsets and lengths are static and don't change. One could also argue that putting the layout of the EEPROM in the device tree is the wrong place. Instead, the device tree should just have a specific compatible string. Right now there are two use cases: (1) The NVMEM cell needs special processing. E.g. if it only specifies a base MAC address offset and you need to add an offset, or it needs to parse a MAC from ASCII format or some proprietary format. (Post processing of cells is added in a later commit). (2) u-boot environment parsing. The cells don't have a particular offset; the content needs to be parsed to determine the offsets and lengths. Co-developed-by: Miquel Raynal <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05of: device: Kill of_device_request_module()Miquel Raynal1-6/+0
A new helper has been introduced, of_request_module(). Users have been converted, this helper can now be deleted. Signed-off-by: Miquel Raynal <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05of: Move the request module helper logic to module.cMiquel Raynal1-0/+6
Depending on device.c for pure OF handling is considered backwards. Let's extract the content of of_device_request_module() to have the real logic under module.c. The next step will be to convert users of of_device_request_module() to use the new helper. Signed-off-by: Miquel Raynal <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05of: Move of_modalias() to module.cMiquel Raynal1-0/+9
Create a specific .c file for OF related module handling. Move of_modalias() inside as a first step. The helper is exposed through of.h even though it is only used by core files because the users from device.c will soon be split into an OF-only helper in module.c as well as a device-oriented inline helper in of_device.h. Putting this helper in of_private.h would require to include of_private.h from of_device.h, which is not acceptable. Suggested-by: Rob Herring <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05of: Rename of_modalias_node()Miquel Raynal1-1/+2
This helper does not produce a real modalias, but tries to get the "product" compatible part of the "vendor,product" compatibles only. It is far from creating a purely useful modalias string and does not seem to be used like that directly anyway, so let's try to give this helper a more meaningful name before moving there a real modalias helper (already existing under of/device.c). Also update the various documentations to refer to the strings as "aliases" rather than "modaliases" which has a real meaning in the Linux kernel. There is no functional change. Cc: Rafael J. Wysocki <[email protected]> Cc: Len Brown <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Wolfram Sang <[email protected]> Cc: Mark Brown <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Reviewed-by: Rob Herring <[email protected]> Acked-by: Mark Brown <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Acked-by: Sebastian Reichel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05regmap: Pass irq_drv_data as a parameter for set_type_config()William Breathitt Gray1-2/+4
Allow the struct regmap_irq_chip set_type_config() callback to access irq_drv_data by passing it as a parameter. Signed-off-by: William Breathitt Gray <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/20e15cd3afae80922b7e0577c7741df86b3390c5.1680708357.git.william.gray@linaro.org Signed-off-by: Mark Brown <[email protected]>
2023-04-05Merge tag 'trace-v6.3-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix timerlat notification, as it was not triggering the notify to users when a new max latency was hit. - Do not trigger max latency if the tracing is off. When tracing is off, the ring buffer is not updated, it does not make sense to notify when there's a new max latency detected by the tracer, as why that latency happened is not available. The tracing logic still runs when the ring buffer is disabled, but it should not be triggering notifications. - Fix race on freeing the synthetic event "last_cmd" variable by adding a mutex around it. - Fix race between reader and writer of the ring buffer by adding memory barriers. When the writer is still on the reader page it must have its content visible on the buffer before it moves the commit index that the reader uses to know how much content is on the page. - Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_ and _RET_IP_, which gets broken if it is not inlined. - Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build. The field formats of trace events are calculated by using sizeof(type) and other means by what is passed into the structure macros like __field(). The __field() macro is only meant for atom types like int, long, short, pointer, etc. It is not meant for arrays. The code will currently compile with arrays, but then the format produced will be inaccurate, and user space parsing tools will break. Two bugs have already been fixed, now add code that will make the kernel fail to build if another trace event includes this buggy field format. - Fix boot up snapshot code: Boot snapshots were triggering when not even asked for on the kernel command line. This was caused by two bugs: 1) It would trigger a snapshot on any instance if one was created from the kernel command line. 2) The error handling would only affect the top level instance. 
So the fact that a snapshot was done on an instance that didn't allocate a buffer triggered a warning written into the top level buffer, and worse yet, disabled the top level buffer. - Fix memory leak that was caused when an error was logged in a trace buffer instance, and then the buffer instance was removed. The allocated error log messages still needed to be freed. * tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Free error logs of tracing instances tracing: Fix ftrace_boot_snapshot command line logic tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance tracing: Error if a trace event has an array for a __field() tracing/osnoise: Fix notify new tracing_max_latency tracing/timerlat: Notify new max thread latency ftrace: Mark get_lock_parent_ip() __always_inline ring-buffer: Fix race while reader and writer are on the same page tracing/synthetic: Fix races on freeing last_cmd
2023-04-05Merge branches 'rcu/staging-core', 'rcu/staging-docs' and ↵Joel Fernandes (Google)5-36/+111
'rcu/staging-kfree', remote-tracking branches 'paul/srcu-cf.2023.04.04a', 'fbq/rcu/lockdep.2023.03.27a' and 'fbq/rcu/rcutorture.2023.03.20a' into rcu/staging
2023-04-05tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystemJoel Fernandes (Google)1-0/+2
For CONFIG_NO_HZ_FULL systems, the tick_do_timer_cpu cannot be offlined. However, cpu_is_hotpluggable() still returns true for those CPUs. This causes torture tests that do offlining to end up trying to offline this CPU causing test failures. Such failure happens on all architectures. Fix the repeated error messages thrown by this (even if the hotplug errors are harmless) by asking the opinion of the nohz subsystem on whether the CPU can be hotplugged. [ Apply Frederic Weisbecker feedback on refactoring tick_nohz_cpu_down(). ] For drivers/base/ portion: Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Zhouyi Zhou <[email protected]> Cc: Will Deacon <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: rcu <[email protected]> Cc: [email protected] Fixes: 2987557f52b9 ("driver-core/cpu: Expose hotpluggability to the rest of the kernel") Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]>
2023-04-05srcu: Add comments for srcu_size_statePingfan Liu1-10/+23
The SRCU_SIZE_* names are not self-explanatory, so this commit adds comments to the definitions. Signed-off-by: Pingfan Liu <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Zhang, Qiang1" <[email protected]> To: [email protected] Reviewed-by: Paul E. McKenney <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]>
2023-04-05sed-opal: Add command to read locking range parameters.Ondrej Kozina1-0/+1
It returns the following attributes: locking range start, locking range length, read lock enabled, write lock enabled, and lock state (RW, RO or LK). It can be retrieved by a user authority provided the authority was added to the locking range via a prior IOC_OPAL_ADD_USR_TO_LR ioctl command. The command was extended to add the user to the ACE that allows reading the attributes listed above. Signed-off-by: Ondrej Kozina <[email protected]> Tested-by: Luca Boccassi <[email protected]> Tested-by: Milan Broz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-05interconnect: drop unused icc_link_destroy() interfaceJohan Hovold1-6/+0
Now that the link array is deallocated when destroying nodes and the explicit link removal has been dropped from the exynos driver there are no further users of and no need for the icc_link_destroy() interface. Signed-off-by: Johan Hovold <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Georgi Djakov <[email protected]>
2023-04-05sched/psi: Allow unprivileged polling of N*2s periodDomenico Cerasuolo2-1/+8
PSI offers 2 mechanisms to get information about a specific resource pressure. One is reading from /proc/pressure/<resource>, which gives average pressures aggregated every 2s. The other is creating a pollable fd for a specific resource and cgroup. The trigger creation requires CAP_SYS_RESOURCE, and gives the possibility to pick a specific time window and threshold, spawning an RT thread to aggregate the data. Systemd would like to provide containers the option to monitor pressure on their own cgroup and sub-cgroups. For example, if systemd launches a container that itself then launches services, the container should have the ability to poll() for pressure in individual services. But neither the container nor the services are privileged. This patch implements a mechanism to allow unprivileged users to create pressure triggers. The difference from privileged trigger creation is that unprivileged ones must have a time window that's a multiple of 2s. This is so that we can avoid unrestricted spawning of RT threads, and use instead the same aggregation mechanism done for the averages, which runs independently of any triggers. Suggested-by: Johannes Weiner <[email protected]> Signed-off-by: Domenico Cerasuolo <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Johannes Weiner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
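The window rule described above can be sketched as a small check. This is an illustration only, not the kernel's code: `demo_psi_window_allowed()` and `DEMO_PSI_PERIOD_US` are hypothetical names, the window is assumed to be given in microseconds, and any other limits the kernel may place on the window are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 2s averaging period mentioned in the commit message, in microseconds. */
#define DEMO_PSI_PERIOD_US 2000000ULL

/*
 * Hypothetical check: privileged callers may pick any non-zero window,
 * unprivileged callers only windows that are a multiple of the 2s period,
 * so their triggers can reuse the existing periodic aggregation.
 */
bool demo_psi_window_allowed(uint64_t window_us, bool privileged)
{
	if (window_us == 0)
		return false;
	if (privileged)
		return true;
	return window_us % DEMO_PSI_PERIOD_US == 0;
}
```

Under these assumptions an unprivileged caller could request a 4s window but not a 1s window, while a privileged caller could request either.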
2023-04-05sched/psi: Rename existing poll members in preparationDomenico Cerasuolo1-18/+18
Rename members in the PSI implementation to make a clear distinction between the privileged and unprivileged trigger code, the latter to be implemented in the next patch. Suggested-by: Johannes Weiner <[email protected]> Signed-off-by: Domenico Cerasuolo <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Johannes Weiner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-04bpf: Refactor btf_nested_type_is_trusted().Alexei Starovoitov1-3/+4
btf_nested_type_is_trusted() tries to find a struct member at corresponding offset. It works for flat structures and falls apart in more complex structs with nested structs. The offset->member search is already performed by btf_struct_walk() including nested structs. Reuse this work and pass {field name, field btf id} into btf_nested_type_is_trusted() instead of offset to make BTF_TYPE_SAFE*() logic more robust. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]