aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2023-06-09maple_tree: add format option to mt_dump()Liam R. Howlett1-2/+7
Allow different formatting strings to be used when dumping the tree. Currently supports hex and decimal. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Cc: David Binderman <[email protected]> Cc: Peng Zhang <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Vernon Yang <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm: convert migrate_pages() to work on foliosMatthew Wilcox (Oracle)1-8/+8
Almost all of the callers & implementors of migrate_pages() were already converted to use folios. compaction_alloc() & compaction_free() are trivial to convert a part of this patch and not worth splitting out. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm/gup: remove vmas array from internal GUP functionsLorenzo Stoakes1-6/+4
Now we have eliminated all callers to GUP APIs which use the vmas parameter, eliminate it altogether. This eliminates a class of bugs where vmas might have been kept around longer than the mmap_lock and thus we need not be concerned about locks being dropped during this operation leaving behind dangling pointers. This simplifies the GUP API and makes it considerably clearer as to its purpose - follow flags are applied and if pinning, an array of pages is returned. Link: https://lkml.kernel.org/r/6811b4b2b4b3baf3dd07f422bb18853bb2cd09fb.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian König <[email protected]> Cc: Dennis Dalessandro <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Janosch Frank <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Sean Christopherson <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm/gup: remove vmas parameter from pin_user_pages()Lorenzo Stoakes1-2/+1
We are now in a position where no caller of pin_user_pages() requires the vmas parameter at all, so eliminate this parameter from the function and all callers. This clears the way to removing the vmas parameter from GUP altogether. Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Dennis Dalessandro <[email protected]> [qib] Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Sakari Ailus <[email protected]> [drivers/media] Cc: Catalin Marinas <[email protected]> Cc: Christian König <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Janosch Frank <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Sean Christopherson <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm/gup: remove vmas parameter from get_user_pages_remote()Lorenzo Stoakes1-3/+31
The only instances of get_user_pages_remote() invocations which used the vmas parameter were for a single page which can instead simply look up the VMA directly. In particular:- - __update_ref_ctr() looked up the VMA but did nothing with it so we simply remove it. - __access_remote_vm() was already using vma_lookup() when the original lookup failed so by doing the lookup directly this also de-duplicates the code. We are able to perform these VMA operations as we already hold the mmap_lock in order to be able to call get_user_pages_remote(). As part of this work we add get_user_page_vma_remote() which abstracts the VMA lookup, error handling and decrementing the page reference count should the VMA lookup fail. This forms part of a broader set of patches intended to eliminate the vmas parameter altogether. [[email protected]: avoid passing NULL to PTR_ERR] Link: https://lkml.kernel.org/r/d20128c849ecdbf4dd01cc828fcec32127ed939a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> (for arm64) Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Janosch Frank <[email protected]> (for s390) Reviewed-by: Christoph Hellwig <[email protected]> Cc: Christian König <[email protected]> Cc: Dennis Dalessandro <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Sean Christopherson <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm/gup: remove unused vmas parameter from pin_user_pages_remote()Lorenzo Stoakes1-1/+1
No invocation of pin_user_pages_remote() uses the vmas parameter, so remove it. This forms part of a larger patch set eliminating the use of the vmas parameters altogether. Link: https://lkml.kernel.org/r/28f000beb81e45bf538a2aaa77c90f5482b67a32.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian König <[email protected]> Cc: Dennis Dalessandro <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Janosch Frank <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Sean Christopherson <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm/gup: remove unused vmas parameter from get_user_pages()Lorenzo Stoakes1-2/+1
Patch series "remove the vmas parameter from GUP APIs", v6. (pin_/get)_user_pages[_remote]() each provide an optional output parameter for an array of VMA objects associated with each page in the input range. These provide the means for VMAs to be returned, as long as mm->mmap_lock is never released during the GUP operation (i.e. the internal flag FOLL_UNLOCKABLE is not specified). In addition, these VMAs can only be accessed with the mmap_lock held and become invalidated the moment it is released. The vast majority of invocations do not use this functionality and of those that do, all but one case retrieve a single VMA to perform checks upon. It is not egregious in the single VMA cases to simply replace the operation with a vma_lookup(). In these cases we duplicate the (fast) lookup on a slow path already under the mmap_lock, abstracted to a new get_user_page_vma_remote() inline helper function which also performs error checking and reference count maintenance. The special case is io_uring, where io_pin_pages() specifically needs to assert that the VMAs underlying the range do not result in broken long-term GUP file-backed mappings. As GUP now internally asserts that FOLL_LONGTERM mappings are not file-backed in a broken fashion (i.e. requiring dirty tracking) - as implemented in "mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings" - this logic is no longer required and so we can simply remove it altogether from io_uring. Eliminating the vmas parameter eliminates an entire class of danging pointer errors that might have occured should the lock have been incorrectly released. In addition, the API is simplified and now clearly expresses what it is intended for - applying the specified GUP flags and (if pinning) returning pinned pages. This change additionally opens the door to further potential improvements in GUP and the possible marrying of disparate code paths. I have run this series against gup_test with no issues. Thanks to Matthew Wilcox for suggesting this refactoring! This patch (of 6): No invocation of get_user_pages() use the vmas parameter, so remove it. The GUP API is confusing and caveated. Recent changes have done much to improve that, however there is more we can do. Exporting vmas is a prime target as the caller has to be extremely careful to preclude their use after the mmap_lock has expired or otherwise be left with dangling pointers. Removing the vmas parameter focuses the GUP functions upon their primary purpose - pinning (and outputting) pages as well as performing the actions implied by the input flags. This is part of a patch series aiming to remove the vmas parameter altogether. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/589e0c64794668ffc799651e8d85e703262b1e9d.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Suggested-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Acked-by: Christian König <[email protected]> (for radeon parts) Acked-by: Jarkko Sakkinen <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Sean Christopherson <[email protected]> (KVM) Cc: Catalin Marinas <[email protected]> Cc: Dennis Dalessandro <[email protected]> Cc: Janosch Frank <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Sakari Ailus <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm/hugetlb: remove hugetlb_page_subpool()Sidhartha Kumar1-13/+0
All users of hugetlb_page_subpool() have been converted to use the folio equivalent. This function can be safely removed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sidhartha Kumar <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm: page_alloc: move sysctls into it own filsKefeng Wang2-32/+0
This moves all page alloc related sysctls to its own file, as part of the kernel/sysctl.c spring cleaning, also move some functions declarations from mm.h into internal.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Len Brown <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm: page_alloc: move pm_* function into powerKefeng Wang2-11/+10
pm_restrict_gfp_mask()/pm_restore_gfp_mask() only used in power, let's move them out of page_alloc.c. Adding a general gfp_has_io_fs() function which return true if gfp with both __GFP_IO and __GFP_FS flags, then use it inside of pm_suspended_storage(), also the pm_suspended_storage() is moved into suspend.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Len Brown <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm: page_alloc: move mark_free_page() into snapshot.cKefeng Wang1-3/+0
The mark_free_page() is only used in kernel/power/snapshot.c, move it out to reduce a bit of page_alloc.c Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Len Brown <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm: page_alloc: split out DEBUG_PAGEALLOCKefeng Wang1-27/+49
Move DEBUG_PAGEALLOC related functions into a single file to reduce a bit of page_alloc.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Len Brown <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm: page_alloc: split out FAIL_PAGE_ALLOCKefeng Wang1-0/+9
... to a single file to reduce a bit of page_alloc.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Len Brown <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm: page_alloc: move set_zone_contiguous() into mm_init.cKefeng Wang1-3/+0
set_zone_contiguous() is only used in mm init/hotplug, and clear_zone_contiguous() only used in hotplug, move them from page_alloc.c to the more appropriate file. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Len Brown <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09mm: memory_failure: move memory_failure_attr_group under MEMORY_FAILUREKefeng Wang1-5/+4
The memory_failure_attr_group is only called if MEMORY_FAILURE enabled, move it under this configuration. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09kasan: use internal prototypes matching gcc-13 builtinsArnd Bergmann1-1/+1
gcc-13 warns about function definitions for builtin interfaces that have a different prototype, e.g.: In file included from kasan_test.c:31: kasan.h:574:6: error: conflicting types for built-in function '__asan_register_globals'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch] 574 | void __asan_register_globals(struct kasan_global *globals, size_t size); kasan.h:577:6: error: conflicting types for built-in function '__asan_alloca_poison'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch] 577 | void __asan_alloca_poison(unsigned long addr, size_t size); kasan.h:580:6: error: conflicting types for built-in function '__asan_load1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch] 580 | void __asan_load1(unsigned long addr); kasan.h:581:6: error: conflicting types for built-in function '__asan_store1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch] 581 | void __asan_store1(unsigned long addr); kasan.h:643:6: error: conflicting types for built-in function '__hwasan_tag_memory'; expected 'void(void *, unsigned char, long int)' [-Werror=builtin-declaration-mismatch] 643 | void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size); The two problems are: - Addresses are passes as 'unsigned long' in the kernel, but gcc-13 expects a 'void *'. - sizes meant to use a signed ssize_t rather than size_t. Change all the prototypes to match these. Using 'void *' consistently for addresses gets rid of a couple of type casts, so push that down to the leaf functions where possible. This now passes all randconfig builds on arm, arm64 and x86, but I have not tested it on the other architectures that support kasan, since they tend to fail randconfig builds in other ways. This might fail if any of the 32-bit architectures expect a 'long' instead of 'int' for the size argument. The __asan_allocas_unpoison() function prototype is somewhat weird, since it uses a pointer for 'stack_top' and an size_t for 'stack_bottom'. This looks like it is meant to be 'addr' and 'size' like the others, but the implementation clearly treats them as 'top' and 'bottom'. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09filemap: remove page_endio()Pankaj Raghav1-2/+0
page_endio() is not used anymore. Remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pankaj Raghav <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Luis Chamberlain <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09cachestat: implement cachestat syscallNhat Pham3-1/+23
There is currently no good way to query the page cache state of large file sets and directory trees. There is mincore(), but it scales poorly: the kernel writes out a lot of bitmap data that userspace has to aggregate, when the user really doesn not care about per-page information in that case. The user also needs to mmap and unmap each file as it goes along, which can be quite slow as well. Some use cases where this information could come in handy: * Allowing database to decide whether to perform an index scan or direct table queries based on the in-memory cache state of the index. * Visibility into the writeback algorithm, for performance issues diagnostic. * Workload-aware writeback pacing: estimating IO fulfilled by page cache (and IO to be done) within a range of a file, allowing for more frequent syncing when and where there is IO capacity, and batching when there is not. * Computing memory usage of large files/directory trees, analogous to the du tool for disk usage. More information about these use cases could be found in the following thread: https://lore.kernel.org/lkml/[email protected]/ This patch implements a new syscall that queries cache state of a file and summarizes the number of cached pages, number of dirty pages, number of pages marked for writeback, number of (recently) evicted pages, etc. in a given range. Currently, the syscall is only wired in for x86 architecture. NAME cachestat - query the page cache statistics of a file. SYNOPSIS #include <sys/mman.h> struct cachestat_range { __u64 off; __u64 len; }; struct cachestat { __u64 nr_cache; __u64 nr_dirty; __u64 nr_writeback; __u64 nr_evicted; __u64 nr_recently_evicted; }; int cachestat(unsigned int fd, struct cachestat_range *cstat_range, struct cachestat *cstat, unsigned int flags); DESCRIPTION cachestat() queries the number of cached pages, number of dirty pages, number of pages marked for writeback, number of evicted pages, number of recently evicted pages, in the bytes range given by `off` and `len`. An evicted page is a page that is previously in the page cache but has been evicted since. A page is recently evicted if its last eviction was recent enough that its reentry to the cache would indicate that it is actively being used by the system, and that there is memory pressure on the system. These values are returned in a cachestat struct, whose address is given by the `cstat` argument. The `off` and `len` arguments must be non-negative integers. If `len` > 0, the queried range is [`off`, `off` + `len`]. If `len` == 0, we will query in the range from `off` to the end of the file. The `flags` argument is unused for now, but is included for future extensibility. User should pass 0 (i.e no flag specified). Currently, hugetlbfs is not supported. Because the status of a page can change after cachestat() checks it but before it returns to the application, the returned values may contain stale information. RETURN VALUE On success, cachestat returns 0. On error, -1 is returned, and errno is set to indicate the error. ERRORS EFAULT cstat or cstat_args points to an invalid address. EINVAL invalid flags. EBADF invalid file descriptor. EOPNOTSUPP file descriptor is of a hugetlbfs file [[email protected]: replace rounddown logic with the existing helper] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nhat Pham <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Brian Foster <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michael Kerrisk <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09workingset: refactor LRU refault to expose refault recency checkNhat Pham1-0/+1
Patch series "cachestat: a new syscall for page cache state of files", v13. There is currently no good way to query the page cache statistics of large files and directory trees. There is mincore(), but it scales poorly: the kernel writes out a lot of bitmap data that userspace has to aggregate, when the user really does not care about per-page information in that case. The user also needs to mmap and unmap each file as it goes along, which can be quite slow as well. Some use cases where this information could come in handy: * Allowing database to decide whether to perform an index scan or direct table queries based on the in-memory cache state of the index. * Visibility into the writeback algorithm, for performance issues diagnostic. * Workload-aware writeback pacing: estimating IO fulfilled by page cache (and IO to be done) within a range of a file, allowing for more frequent syncing when and where there is IO capacity, and batching when there is not. * Computing memory usage of large files/directory trees, analogous to the du tool for disk usage. More information about these use cases could be found in this thread: https://lore.kernel.org/lkml/[email protected]/ This series of patches introduces a new system call, cachestat, that summarizes the page cache statistics (number of cached pages, dirty pages, pages marked for writeback, evicted pages etc.) of a file, in a specified range of bytes. It also include a selftest suite that tests some typical usage. Currently, the syscall is only wired in for x86 architecture. This interface is inspired by past discussion and concerns with fincore, which has a similar design (and as a result, issues) as mincore. Relevant links: https://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04207.html https://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04209.html I have also developed a small tool that computes the memory usage of files and directories, analogous to the du utility. User can choose between mincore or cachestat (with cachestat exporting more information than mincore). To compare the performance of these two options, I benchmarked the tool on the root directory of a Meta's server machine, each for five runs: Using cachestat real -- Median: 33.377s, Average: 33.475s, Standard Deviation: 0.3602 user -- Median: 4.08s, Average: 4.1078s, Standard Deviation: 0.0742 sys -- Median: 28.823s, Average: 28.8866s, Standard Deviation: 0.2689 Using mincore: real -- Median: 102.352s, Average: 102.3442s, Standard Deviation: 0.2059 user -- Median: 10.149s, Average: 10.1482s, Standard Deviation: 0.0162 sys -- Median: 91.186s, Average: 91.2084s, Standard Deviation: 0.2046 I also ran both syscalls on a 2TB sparse file: Using cachestat: real 0m0.009s user 0m0.000s sys 0m0.009s Using mincore: real 0m37.510s user 0m2.934s sys 0m34.558s Very large files like this are the pathological case for mincore. In fact, to compute the stats for a single 2TB file, mincore takes as long as cachestat takes to compute the stats for the entire tree! This could easily happen inadvertently when we run it on subdirectories. Mincore is clearly not suitable for a general-purpose command line tool. Regarding security concerns, cachestat() should not pose any additional issues. The caller already has read permission to the file itself (since they need an fd to that file to call cachestat). This means that the caller can access the underlying data in its entirety, which is a much greater source of information (and as a result, a much greater security risk) than the cache status itself. The latest API change (in v13 of the patch series) is suggested by Jens Axboe. It allows for 64-bit length argument, even on 32-bit architecture (which is previously not possible due to the limit on the number of syscall arguments). Furthermore, it eliminates the need for compatibility handling - every user can use the same ABI. This patch (of 4): In preparation for computing recently evicted pages in cachestat, refactor workingset_refault and lru_gen_refault to expose a helper function that would test if an evicted page is recently evicted. [[email protected]: add missing rcu_read_unlock() in lru_gen_refault()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nhat Pham <[email protected]> Signed-off-by: Tetsuo Handa <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Brian Foster <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09cgroup: remove cgroup_rstat_flush_atomic()Yosry Ahmed1-1/+0
Previous patches removed the only caller of cgroup_rstat_flush_atomic(). Remove the function and simplify the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09memcg: remove mem_cgroup_flush_stats_atomic()Yosry Ahmed1-5/+0
Previous patches removed all callers of mem_cgroup_flush_stats_atomic(). Remove the function and simplify the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Shakeel Butt <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-09drm/amdgpu: add the accelerator PCIe classShiwu Zhang1-0/+3
Add the accelerator PCIe class and match the class in amdgpu for 0x1002 devices of that class. From PCI spec: "PCI Code and ID Assignment, r1.9, sec 1, 1.19" Signed-off-by: Shiwu Zhang <[email protected]> Acked-by: Lijo Lazar <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> # pci_ids.h Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/connector: Allow drivers to pass list of supported colorspacesHarry Wentland1-2/+5
Drivers might not support all colorspaces defined in dp_colorspaces and hdmi_colorspaces. This results in undefined behavior when userspace is setting an unsupported colorspace. Allow drivers to pass the list of supported colorspaces when creating the colorspace property. v2: - Use 0 to indicate support for all colorspaces (Jani) - Print drm_dbg_kms message when drivers pass 0 to signal that drivers should specify supported colorspaecs explicity (Jani) v3: - Move changes to create a common colorspace_names array to separate patch v6: - Avoid magic when passing 0 for supported_colorspaces; be explicit in treating it as "all DP/HDMI" Signed-off-by: Harry Wentland <[email protected]> Reviewed-by: Sebastian Wick <[email protected]> Reviewed-by: Joshua Ashton <[email protected]> Reviewed-by: Simon Ser <[email protected]> Cc: Pekka Paalanen <[email protected]> Cc: Sebastian Wick <[email protected]> Cc: [email protected] Cc: Uma Shankar <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Joshua Ashton <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Simon Ser <[email protected]> Cc: Melissa Wen <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/connector: Print connector colorspace in state debugfsHarry Wentland1-0/+1
v3: Fix kerneldocs (kernel test robot) v4: Avoid returning NULL from drm_get_colorspace_name Signed-off-by: Harry Wentland <[email protected]> Reviewed-by: Sebastian Wick <[email protected]> Reviewed-by: Joshua Ashton <[email protected]> Reviewed-by: Simon Ser <[email protected]> Cc: Pekka Paalanen <[email protected]> Cc: Sebastian Wick <[email protected]> Cc: [email protected] Cc: Uma Shankar <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Joshua Ashton <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Simon Ser <[email protected]> Cc: Melissa Wen <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/connector: Use common colorspace_names arrayHarry Wentland1-0/+2
We an use bitfields to track the support ones for HDMI and DP. This allows us to print colorspaces in a consistent manner without needing to know whether we're dealing with DP or HDMI. v4: - Rename _MAX to _COUNT and leave comment to indicate it's not a valid value - Fix misplaced function doc v6: - Drop magic in drm_mode_create_colorspace_property for dealing with "0" supported_colorspaces. Expect the caller to always provide a non-zero supported_colorspaces. - Improve error checking and logging Signed-off-by: Harry Wentland <[email protected]> Reviewed-by: Sebastian Wick <[email protected]> Reviewed-by: Joshua Ashton <[email protected]> Reviewed-by: Simon Ser <[email protected]> Cc: Pekka Paalanen <[email protected]> Cc: Sebastian Wick <[email protected]> Cc: [email protected] Cc: Uma Shankar <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Joshua Ashton <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Simon Ser <[email protected]> Cc: Melissa Wen <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/connector: Add enum documentation to drm_colorspaceJoshua Ashton1-2/+68
To match the other enums, and add more information about these values. v2: - Specify where an enum entry comes from - Clarify DEFAULT and NO_DATA behavior - BT.2020 CYCC is "constant luminance" - correct type for BT.601 v4: - drop DP/HDMI clarifications that might create more questions than answers v5: - Add note on YCC and RGB variants Signed-off-by: Joshua Ashton <[email protected]> Signed-off-by: Harry Wentland <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Reviewed-by: Sebastian Wick <[email protected]> Acked-by: Pekka Paalanen <[email protected]> Reviewed-by: Simon Ser <[email protected]> Cc: Pekka Paalanen <[email protected]> Cc: Sebastian Wick <[email protected]> Cc: [email protected] Cc: Uma Shankar <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Joshua Ashton <[email protected]> Cc: Simon Ser <[email protected]> Cc: Melissa Wen <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/connector: Convert DRM_MODE_COLORIMETRY to enumHarry Wentland2-25/+26
This allows us to use strongly typed arguments. v2: - Bring NO_DATA back - Provide explicit enum values v3: - Drop unnecessary '&' from kerneldoc (emersion) v4: - Fix Normal Colorimetry comment Signed-off-by: Harry Wentland <[email protected]> Reviewed-by: Simon Ser <[email protected]> Reviewed-by: Sebastian Wick <[email protected]> Reviewed-by: Pekka Paalanen <[email protected]> Reviewed-by: Joshua Ashton <[email protected]> Cc: Pekka Paalanen <[email protected]> Cc: Sebastian Wick <[email protected]> Cc: [email protected] Cc: Uma Shankar <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Joshua Ashton <[email protected]> Cc: Simon Ser <[email protected]> Cc: Melissa Wen <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: bump kfd ioctl minor version for debug api availabilityJonathan Kim1-1/+2
Bump the minor version to declare debugging capability is now available. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: display debug capabilitiesJonathan Kim1-0/+15
Expose debug capabilities in the KFD topology node's HSA capabilities and debug properties flags. Ensure correct capabilities are exposed based on firmware support. Flag definitions can be referenced in uapi/linux/kfd_sysfs.h. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: add debug and runtime enable interfaceJonathan Kim1-1/+667
Introduce the GPU debug operations interface. For ROCm-GDB to extend the GNU Debugger's ability to inspect the AMD GPU instruction set, provide the necessary interface to allow the debugger to HW debug-mode set and query exceptions per HSA queue, process or device. The runtime_enable interface coordinates exception handling with the HSA runtime. Usage is available in the kern docs at uapi/linux/kfd_ioctl.h. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09amd/amdkfd: drop unused KFD_IOCTL_SVM_FLAG_UNCACHED flagAlex Deucher1-2/+0
Was leftover from GC 9.4.3 bring up and is currently unused. Drop it for now. Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09arm64: tegra: Add Tegra234 thermal supportThierry Reding1-0/+19
Add device tree node for the BPMP thermal node on Tegra234 and add thermal zone definitions. Acked-by: Jon Hunter <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
2023-06-09i2c: Add i2c_get_match_data()Biju Das1-0/+2
Add i2c_get_match_data() to get match data for I2C, ACPI and DT-based matching, so that we can optimize the driver code. Suggested-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Biju Das <[email protected]> [wsa: simplified var initialization] Signed-off-by: Wolfram Sang <[email protected]>
2023-06-09media: Add NV15_4L4 pixel formatBenjamin Gaignard1-0/+1
NV15_4L4 is the 10-bits per component 4x4 tiled format. Signed-off-by: Benjamin Gaignard <[email protected]> Signed-off-by: Nicolas Dufresne <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2023-06-09media: v4l2-common: Add support for fractional bppNicolas Dufresne1-0/+2
Fraction bytes-per-pixel exist for some packed format. You will find notably on Rockhip platform that 10bit data is stored fully packed, meaning that there is 1.25 pixels per bytes. This can be represented with the fraction 5/4 and can be used to scale the width into a bytesperline. Signed-off-by: Nicolas Dufresne <[email protected]> Signed-off-by: Benjamin Gaignard <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2023-06-09media: Add AV1 uAPIDaniel Almeida3-0/+739
This patch adds the AOMedia Video 1 (AV1) kernel uAPI. This design is based on currently available AV1 API implementations and aims to support the development of AV1 stateless video codecs on Linux. Signed-off-by: Daniel Almeida <[email protected]> Co-developed-by: Nicolas Dufresne <[email protected]> Signed-off-by: Nicolas Dufresne <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2023-06-09soc: mediatek: remove DDP_DOMPONENT_DITHER from enumJason-JH.Lin1-2/+1
After mmsys and drm change DITHER enum to DDP_COMPONENT_DITHER0, mmsys header can remove the useless DDP_COMPONENT_DITHER enum. Signed-off-by: Jason-JH.Lin <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Reviewed-by: Rex-BC Chen <[email protected]> Acked-by: Matthias Brugger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Matthias Brugger <[email protected]>
2023-06-09drm/ttm: add NUMA node id to the poolRajneesh Bhardwaj1-1/+3
This allows backing ttm_tt structure with pages from different NUMA pools. Tested-by: Graham Sider <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Update coherence settings for svm rangesRajneesh Bhardwaj1-0/+2
Recently introduced commit "drm/amdgpu: Set cache coherency for GC 9.4.3" did not update the settings applicable for svm ranges. Add the coherence settings for svm ranges for GFX IP 9.4.3. Reviewed-by: Amber Lin <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/ttm: Helper function to get TTM mem limitMukul Joshi1-1/+1
Add a helper function to get TTM memory limit. This is needed by KFD to set its own internal memory limits. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add new flag to AMDGPU_CTX_QUERY2Pierre-Eric Pelloux-Prayer1-0/+2
OpenGL EXT_robustness extension expects the driver to stop reporting GUILTY_CONTEXT_RESET when the reset has completed and the GPU is ready to accept submission again. This commit adds a AMDGPU_CTX_QUERY2_FLAGS_RESET_IN_PROGRESS flag, that let the UMD know that the reset is still not finished. Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290 Reviewed-by: Christian König <[email protected]> Reviewed-by: André Almeida <[email protected]> Signed-off-by: Pierre-Eric Pelloux-Prayer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09Merge tag 'renesas-drivers-for-v6.5-tag2' of ↵Arnd Bergmann1-5/+19
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/drivers Renesas driver updates for v6.5 (take two) - Convert the R-Mobile SYSC driver to readl_poll_timeout_atomic(). * tag 'renesas-drivers-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel: soc: renesas: rmobile-sysc: Convert to readl_poll_timeout_atomic() iopoll: Do not use timekeeping in read_poll_timeout_atomic() iopoll: Call cpu_relax() in busy loops Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2023-06-09net: phy: broadcom: Add support for setting LED brightnessFlorian Fainelli1-0/+3
Broadcom PHYs have two LEDs selector registers which allow us to control the LED assignment, including how to turn them on/off. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-09net: phy: broadcom: Rename LED registersFlorian Fainelli1-3/+3
These registers are common to most PHYs and are not specific to the BCM5482, renamed the constants accordingly, no functional change. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-09Merge tag 'v6.4-rc5' into media_stageMauro Carvalho Chehab54-121/+203
Linux 6.4-rc5 * tag 'v6.4-rc5': (919 commits) Linux 6.4-rc5 leds: qcom-lpg: Fix PWM period limits selftests/ftrace: Choose target function for filter test from samples KVM: selftests: Add test for race in kvm_recalculate_apic_map() KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds KVM: x86: Account fastpath-only VM-Exits in vCPU stats KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker tpm, tpm_tis: correct tpm_tis_flags enumeration values Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits" media: uvcvideo: Don't expose unsupported formats to userspace media: v4l2-subdev: Fix missing kerneldoc for client_caps media: staging: media: imx: initialize hs_settle to avoid warning media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad() riscv: Implement missing huge_ptep_get riscv: Fix huge_ptep_set_wrprotect when PTE is a NAPOT module/decompress: Fix error checking on zstd decompression fork, vhost: Use CLONE_THREAD to fix freezer/ps regression dt-bindings: serial: 8250_omap: add rs485-rts-active-high selinux: don't use make's grouped targets feature yet ...
2023-06-09dt-bindings: pinctrl: Drop k3Nishanth Menon1-60/+0
For convenience (less code duplication), the pin controller pin configuration register values were defined in the bindings header. These are not some IDs or other abstraction layer but raw numbers used in the registers. These constants do not fit the purpose of bindings. They do not provide any abstraction, any hardware and driver independent ID. In fact, the Linux pinctrl-single driver actually do not use the bindings header at all. Commit f2de003e1426 ("dt-bindings: pinctrl: k3: Deprecate header with register constants") already moved users to the local header, so, drop the binding header. See background discussion in [1]. While at it, clean up the MAINTAINERS file which is the only reference left. [1]: https://lore.kernel.org/linux-arm-kernel/[email protected]/ Suggested-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Nishanth Menon <[email protected]> Acked-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>
2023-06-09Merge tag 'drm-intel-gt-next-2023-06-08' of ↵Dave Airlie1-1/+43
git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake. Driver Changes: Fixes/improvements/new stuff: - Use large rings for compute contexts (Chris Wilson) - Better logging/debug of unexpected GuC communication issues (Michal Wajdeczko) - Clear out entire reports after reading if not power of 2 size (Ashutosh Dixit) - Limit lmem allocation size to succeed on SmallBars (Andrzej Hajda) - perf/OA capture robustness improvements on DG2 (Umesh Nerlige Ramappa) - Fix error code in intel_gsc_uc_heci_cmd_submit_nonpriv() (Dan Carpenter) Future platform enablement: - Add workaround 14016712196 (Tejas Upadhyay) - HuC loading for MTL (Daniele Ceraolo Spurio) - Allow user to set cache at BO creation (Fei Yang) Miscellaneous: - Use system include style for drm headers (Jani Nikula) - Drop legacy CTB definitions (Michal Wajdeczko) - Turn off the timer to sample frequencies when GT is parked (Ashutosh Dixit) - Make PMU sample array two-dimensional (Ashutosh Dixit) - Use the correct error value when kernel_context() fails (Andi Shyti) - Fix second parameter type of pre-gen8 pte_encode callbacks (Nathan Chancellor) - Fix parameter in gmch_ggtt_insert_{entries, page}() (Nathan Chancellor) - Fix size_t format specifier in gsccs_send_message() (Nathan Chancellor) - Use the fdinfo helper (Tvrtko Ursulin) - Add some missing error propagation (Tvrtko Ursulin) - Reduce I915_MAX_GT to 2 (Matt Atwood) - Rename I915_PMU_MAX_GTS to I915_PMU_MAX_GT (Matt Atwood) - Remove some obsolete definitions (John Harrison) Merges: - Merge drm/drm-next into drm-intel-gt-next (Tvrtko Ursulin) Signed-off-by: Dave Airlie <[email protected]> From: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/ZIH09fqe5v5yArsu@tursulin-desk
2023-06-08ipv4, ipv6: Use splice_eof() to flushDavid Howells3-0/+3
Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. For UDP, a pending packet will not be emitted if the socket is closed before it is flushed; with this change, it be flushed by ->splice_eof(). For TCP, it's not clear that MSG_MORE is actually effective. Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <[email protected]> cc: Kuniyuki Iwashima <[email protected]> cc: Willem de Bruijn <[email protected]> cc: David Ahern <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-08splice, net: Add a splice_eof op to file-ops and socket-opsDavid Howells4-0/+4
Add an optional method, ->splice_eof(), to allow splice to indicate the premature termination of a splice to struct file_operations and struct proto_ops. This is called if sendfile() or splice() encounters all of the following conditions inside splice_direct_to_actor(): (1) the user did not set SPLICE_F_MORE (splice only), and (2) an EOF condition occurred (->splice_read() returned 0), and (3) we haven't read enough to fulfill the request (ie. len > 0 still), and (4) we have already spliced at least one byte. A further patch will modify the behaviour of SPLICE_F_MORE to always be passed to the actor if either the user set it or we haven't yet read sufficient data to fulfill the request. Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> cc: Jens Axboe <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Jan Kara <[email protected]> cc: Jeff Layton <[email protected]> cc: David Hildenbrand <[email protected]> cc: Christian Brauner <[email protected]> cc: Chuck Lever <[email protected]> cc: Boris Pismenny <[email protected]> cc: John Fastabend <[email protected]> cc: [email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-08splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()David Howells2-2/+2
Replace generic_splice_sendpage() + splice_from_pipe + pipe_to_sendpage() with a net-specific handler, splice_to_socket(), that calls sendmsg() with MSG_SPLICE_PAGES set instead of calling ->sendpage(). MSG_MORE is used to indicate if the sendmsg() is expected to be followed with more data. This allows multiple pipe-buffer pages to be passed in a single call in a BVEC iterator, allowing the processing to be pushed down to a loop in the protocol driver. This helps pave the way for passing multipage folios down too. Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should just ignore it and do a normal sendmsg() for now - although that may be a bit slower as it may copy everything. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>