| Age | Commit message (Collapse) | Author | Files | Lines |
|
Allow different formatting strings to be used when dumping the tree.
Currently supports hex and decimal.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Cc: David Binderman <[email protected]>
Cc: Peng Zhang <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vernon Yang <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Almost all of the callers & implementors of migrate_pages() were already
converted to use folios. compaction_alloc() & compaction_free() are
trivial to convert a part of this patch and not worth splitting out.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Now we have eliminated all callers to GUP APIs which use the vmas
parameter, eliminate it altogether.
This eliminates a class of bugs where vmas might have been kept around
longer than the mmap_lock and thus we need not be concerned about locks
being dropped during this operation leaving behind dangling pointers.
This simplifies the GUP API and makes it considerably clearer as to its
purpose - follow flags are applied and if pinning, an array of pages is
returned.
Link: https://lkml.kernel.org/r/6811b4b2b4b3baf3dd07f422bb18853bb2cd09fb.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Sean Christopherson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We are now in a position where no caller of pin_user_pages() requires the
vmas parameter at all, so eliminate this parameter from the function and
all callers.
This clears the way to removing the vmas parameter from GUP altogether.
Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Dennis Dalessandro <[email protected]> [qib]
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Sakari Ailus <[email protected]> [drivers/media]
Cc: Catalin Marinas <[email protected]>
Cc: Christian König <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Sean Christopherson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The only instances of get_user_pages_remote() invocations which used the
vmas parameter were for a single page which can instead simply look up the
VMA directly. In particular:-
- __update_ref_ctr() looked up the VMA but did nothing with it so we simply
remove it.
- __access_remote_vm() was already using vma_lookup() when the original
lookup failed so by doing the lookup directly this also de-duplicates the
code.
We are able to perform these VMA operations as we already hold the
mmap_lock in order to be able to call get_user_pages_remote().
As part of this work we add get_user_page_vma_remote() which abstracts the
VMA lookup, error handling and decrementing the page reference count should
the VMA lookup fail.
This forms part of a broader set of patches intended to eliminate the vmas
parameter altogether.
[[email protected]: avoid passing NULL to PTR_ERR]
Link: https://lkml.kernel.org/r/d20128c849ecdbf4dd01cc828fcec32127ed939a.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]> (for arm64)
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Janosch Frank <[email protected]> (for s390)
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Sean Christopherson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
No invocation of pin_user_pages_remote() uses the vmas parameter, so
remove it. This forms part of a larger patch set eliminating the use of
the vmas parameters altogether.
Link: https://lkml.kernel.org/r/28f000beb81e45bf538a2aaa77c90f5482b67a32.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Sakari Ailus <[email protected]>
Cc: Sean Christopherson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "remove the vmas parameter from GUP APIs", v6.
(pin_/get)_user_pages[_remote]() each provide an optional output parameter
for an array of VMA objects associated with each page in the input range.
These provide the means for VMAs to be returned, as long as mm->mmap_lock
is never released during the GUP operation (i.e. the internal flag
FOLL_UNLOCKABLE is not specified).
In addition, these VMAs can only be accessed with the mmap_lock held and
become invalidated the moment it is released.
The vast majority of invocations do not use this functionality and of
those that do, all but one case retrieve a single VMA to perform checks
upon.
It is not egregious in the single VMA cases to simply replace the
operation with a vma_lookup(). In these cases we duplicate the (fast)
lookup on a slow path already under the mmap_lock, abstracted to a new
get_user_page_vma_remote() inline helper function which also performs
error checking and reference count maintenance.
The special case is io_uring, where io_pin_pages() specifically needs to
assert that the VMAs underlying the range do not result in broken
long-term GUP file-backed mappings.
As GUP now internally asserts that FOLL_LONGTERM mappings are not
file-backed in a broken fashion (i.e. requiring dirty tracking) - as
implemented in "mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to
file-backed mappings" - this logic is no longer required and so we can
simply remove it altogether from io_uring.
Eliminating the vmas parameter eliminates an entire class of danging
pointer errors that might have occured should the lock have been
incorrectly released.
In addition, the API is simplified and now clearly expresses what it is
intended for - applying the specified GUP flags and (if pinning) returning
pinned pages.
This change additionally opens the door to further potential improvements
in GUP and the possible marrying of disparate code paths.
I have run this series against gup_test with no issues.
Thanks to Matthew Wilcox for suggesting this refactoring!
This patch (of 6):
No invocation of get_user_pages() use the vmas parameter, so remove it.
The GUP API is confusing and caveated. Recent changes have done much to
improve that, however there is more we can do. Exporting vmas is a prime
target as the caller has to be extremely careful to preclude their use
after the mmap_lock has expired or otherwise be left with dangling
pointers.
Removing the vmas parameter focuses the GUP functions upon their primary
purpose - pinning (and outputting) pages as well as performing the actions
implied by the input flags.
This is part of a patch series aiming to remove the vmas parameter
altogether.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/589e0c64794668ffc799651e8d85e703262b1e9d.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Acked-by: Christian König <[email protected]> (for radeon parts)
Acked-by: Jarkko Sakkinen <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Sean Christopherson <[email protected]> (KVM)
Cc: Catalin Marinas <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Sakari Ailus <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
All users of hugetlb_page_subpool() have been converted to use the folio
equivalent. This function can be safely removed.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This moves all page alloc related sysctls to its own file, as part of the
kernel/sysctl.c spring cleaning, also move some functions declarations
from mm.h into internal.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
pm_restrict_gfp_mask()/pm_restore_gfp_mask() only used in power, let's
move them out of page_alloc.c.
Adding a general gfp_has_io_fs() function which return true if gfp with
both __GFP_IO and __GFP_FS flags, then use it inside of
pm_suspended_storage(), also the pm_suspended_storage() is moved into
suspend.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The mark_free_page() is only used in kernel/power/snapshot.c, move it out
to reduce a bit of page_alloc.c
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Move DEBUG_PAGEALLOC related functions into a single file to reduce a bit
of page_alloc.c.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
... to a single file to reduce a bit of page_alloc.c.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
set_zone_contiguous() is only used in mm init/hotplug, and
clear_zone_contiguous() only used in hotplug, move them from page_alloc.c
to the more appropriate file.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The memory_failure_attr_group is only called if MEMORY_FAILURE enabled,
move it under this configuration.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
gcc-13 warns about function definitions for builtin interfaces that have a
different prototype, e.g.:
In file included from kasan_test.c:31:
kasan.h:574:6: error: conflicting types for built-in function '__asan_register_globals'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
574 | void __asan_register_globals(struct kasan_global *globals, size_t size);
kasan.h:577:6: error: conflicting types for built-in function '__asan_alloca_poison'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
577 | void __asan_alloca_poison(unsigned long addr, size_t size);
kasan.h:580:6: error: conflicting types for built-in function '__asan_load1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
580 | void __asan_load1(unsigned long addr);
kasan.h:581:6: error: conflicting types for built-in function '__asan_store1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
581 | void __asan_store1(unsigned long addr);
kasan.h:643:6: error: conflicting types for built-in function '__hwasan_tag_memory'; expected 'void(void *, unsigned char, long int)' [-Werror=builtin-declaration-mismatch]
643 | void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size);
The two problems are:
- Addresses are passes as 'unsigned long' in the kernel, but gcc-13
expects a 'void *'.
- sizes meant to use a signed ssize_t rather than size_t.
Change all the prototypes to match these. Using 'void *' consistently for
addresses gets rid of a couple of type casts, so push that down to the
leaf functions where possible.
This now passes all randconfig builds on arm, arm64 and x86, but I have
not tested it on the other architectures that support kasan, since they
tend to fail randconfig builds in other ways. This might fail if any of
the 32-bit architectures expect a 'long' instead of 'int' for the size
argument.
The __asan_allocas_unpoison() function prototype is somewhat weird, since
it uses a pointer for 'stack_top' and an size_t for 'stack_bottom'. This
looks like it is meant to be 'addr' and 'size' like the others, but the
implementation clearly treats them as 'top' and 'bottom'.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
page_endio() is not used anymore. Remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Pankaj Raghav <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There is currently no good way to query the page cache state of large file
sets and directory trees. There is mincore(), but it scales poorly: the
kernel writes out a lot of bitmap data that userspace has to aggregate,
when the user really doesn not care about per-page information in that
case. The user also needs to mmap and unmap each file as it goes along,
which can be quite slow as well.
Some use cases where this information could come in handy:
* Allowing database to decide whether to perform an index scan or
direct table queries based on the in-memory cache state of the
index.
* Visibility into the writeback algorithm, for performance issues
diagnostic.
* Workload-aware writeback pacing: estimating IO fulfilled by page
cache (and IO to be done) within a range of a file, allowing for
more frequent syncing when and where there is IO capacity, and
batching when there is not.
* Computing memory usage of large files/directory trees, analogous to
the du tool for disk usage.
More information about these use cases could be found in the following
thread:
https://lore.kernel.org/lkml/[email protected]/
This patch implements a new syscall that queries cache state of a file and
summarizes the number of cached pages, number of dirty pages, number of
pages marked for writeback, number of (recently) evicted pages, etc. in a
given range. Currently, the syscall is only wired in for x86
architecture.
NAME
cachestat - query the page cache statistics of a file.
SYNOPSIS
#include <sys/mman.h>
struct cachestat_range {
__u64 off;
__u64 len;
};
struct cachestat {
__u64 nr_cache;
__u64 nr_dirty;
__u64 nr_writeback;
__u64 nr_evicted;
__u64 nr_recently_evicted;
};
int cachestat(unsigned int fd, struct cachestat_range *cstat_range,
struct cachestat *cstat, unsigned int flags);
DESCRIPTION
cachestat() queries the number of cached pages, number of dirty
pages, number of pages marked for writeback, number of evicted
pages, number of recently evicted pages, in the bytes range given by
`off` and `len`.
An evicted page is a page that is previously in the page cache but
has been evicted since. A page is recently evicted if its last
eviction was recent enough that its reentry to the cache would
indicate that it is actively being used by the system, and that
there is memory pressure on the system.
These values are returned in a cachestat struct, whose address is
given by the `cstat` argument.
The `off` and `len` arguments must be non-negative integers. If
`len` > 0, the queried range is [`off`, `off` + `len`]. If `len` ==
0, we will query in the range from `off` to the end of the file.
The `flags` argument is unused for now, but is included for future
extensibility. User should pass 0 (i.e no flag specified).
Currently, hugetlbfs is not supported.
Because the status of a page can change after cachestat() checks it
but before it returns to the application, the returned values may
contain stale information.
RETURN VALUE
On success, cachestat returns 0. On error, -1 is returned, and errno
is set to indicate the error.
ERRORS
EFAULT cstat or cstat_args points to an invalid address.
EINVAL invalid flags.
EBADF invalid file descriptor.
EOPNOTSUPP file descriptor is of a hugetlbfs file
[[email protected]: replace rounddown logic with the existing helper]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nhat Pham <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "cachestat: a new syscall for page cache state of files",
v13.
There is currently no good way to query the page cache statistics of large
files and directory trees. There is mincore(), but it scales poorly: the
kernel writes out a lot of bitmap data that userspace has to aggregate,
when the user really does not care about per-page information in that
case. The user also needs to mmap and unmap each file as it goes along,
which can be quite slow as well.
Some use cases where this information could come in handy:
* Allowing database to decide whether to perform an index scan or direct
table queries based on the in-memory cache state of the index.
* Visibility into the writeback algorithm, for performance issues
diagnostic.
* Workload-aware writeback pacing: estimating IO fulfilled by page cache
(and IO to be done) within a range of a file, allowing for more
frequent syncing when and where there is IO capacity, and batching
when there is not.
* Computing memory usage of large files/directory trees, analogous to
the du tool for disk usage.
More information about these use cases could be found in this thread:
https://lore.kernel.org/lkml/[email protected]/
This series of patches introduces a new system call, cachestat, that
summarizes the page cache statistics (number of cached pages, dirty pages,
pages marked for writeback, evicted pages etc.) of a file, in a specified
range of bytes. It also include a selftest suite that tests some typical
usage. Currently, the syscall is only wired in for x86 architecture.
This interface is inspired by past discussion and concerns with fincore,
which has a similar design (and as a result, issues) as mincore. Relevant
links:
https://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04207.html
https://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04209.html
I have also developed a small tool that computes the memory usage of files
and directories, analogous to the du utility. User can choose between
mincore or cachestat (with cachestat exporting more information than
mincore). To compare the performance of these two options, I benchmarked
the tool on the root directory of a Meta's server machine, each for five
runs:
Using cachestat
real -- Median: 33.377s, Average: 33.475s, Standard Deviation: 0.3602
user -- Median: 4.08s, Average: 4.1078s, Standard Deviation: 0.0742
sys -- Median: 28.823s, Average: 28.8866s, Standard Deviation: 0.2689
Using mincore:
real -- Median: 102.352s, Average: 102.3442s, Standard Deviation: 0.2059
user -- Median: 10.149s, Average: 10.1482s, Standard Deviation: 0.0162
sys -- Median: 91.186s, Average: 91.2084s, Standard Deviation: 0.2046
I also ran both syscalls on a 2TB sparse file:
Using cachestat:
real 0m0.009s
user 0m0.000s
sys 0m0.009s
Using mincore:
real 0m37.510s
user 0m2.934s
sys 0m34.558s
Very large files like this are the pathological case for mincore. In
fact, to compute the stats for a single 2TB file, mincore takes as long as
cachestat takes to compute the stats for the entire tree! This could
easily happen inadvertently when we run it on subdirectories. Mincore is
clearly not suitable for a general-purpose command line tool.
Regarding security concerns, cachestat() should not pose any additional
issues. The caller already has read permission to the file itself (since
they need an fd to that file to call cachestat). This means that the
caller can access the underlying data in its entirety, which is a much
greater source of information (and as a result, a much greater security
risk) than the cache status itself.
The latest API change (in v13 of the patch series) is suggested by Jens
Axboe. It allows for 64-bit length argument, even on 32-bit architecture
(which is previously not possible due to the limit on the number of
syscall arguments). Furthermore, it eliminates the need for compatibility
handling - every user can use the same ABI.
This patch (of 4):
In preparation for computing recently evicted pages in cachestat, refactor
workingset_refault and lru_gen_refault to expose a helper function that
would test if an evicted page is recently evicted.
[[email protected]: add missing rcu_read_unlock() in lru_gen_refault()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nhat Pham <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Previous patches removed the only caller of cgroup_rstat_flush_atomic().
Remove the function and simplify the code.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Previous patches removed all callers of mem_cgroup_flush_stats_atomic().
Remove the function and simplify the code.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add the accelerator PCIe class and match the
class in amdgpu for 0x1002 devices of that class.
From PCI spec:
"PCI Code and ID Assignment, r1.9, sec 1, 1.19"
Signed-off-by: Shiwu Zhang <[email protected]>
Acked-by: Lijo Lazar <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]> # pci_ids.h
Signed-off-by: Alex Deucher <[email protected]>
|
|
Drivers might not support all colorspaces defined in
dp_colorspaces and hdmi_colorspaces. This results in
undefined behavior when userspace is setting an
unsupported colorspace.
Allow drivers to pass the list of supported colorspaces
when creating the colorspace property.
v2:
- Use 0 to indicate support for all colorspaces (Jani)
- Print drm_dbg_kms message when drivers pass 0
to signal that drivers should specify supported
colorspaecs explicity (Jani)
v3:
- Move changes to create a common colorspace_names array
to separate patch
v6:
- Avoid magic when passing 0 for supported_colorspaces;
be explicit in treating it as "all DP/HDMI"
Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Sebastian Wick <[email protected]>
Reviewed-by: Joshua Ashton <[email protected]>
Reviewed-by: Simon Ser <[email protected]>
Cc: Pekka Paalanen <[email protected]>
Cc: Sebastian Wick <[email protected]>
Cc: [email protected]
Cc: Uma Shankar <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Joshua Ashton <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: Melissa Wen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
v3: Fix kerneldocs (kernel test robot)
v4: Avoid returning NULL from drm_get_colorspace_name
Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Sebastian Wick <[email protected]>
Reviewed-by: Joshua Ashton <[email protected]>
Reviewed-by: Simon Ser <[email protected]>
Cc: Pekka Paalanen <[email protected]>
Cc: Sebastian Wick <[email protected]>
Cc: [email protected]
Cc: Uma Shankar <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Joshua Ashton <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: Melissa Wen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
We an use bitfields to track the support ones for HDMI
and DP. This allows us to print colorspaces in a consistent
manner without needing to know whether we're dealing with
DP or HDMI.
v4:
- Rename _MAX to _COUNT and leave comment to indicate
it's not a valid value
- Fix misplaced function doc
v6:
- Drop magic in drm_mode_create_colorspace_property for
dealing with "0" supported_colorspaces. Expect the caller
to always provide a non-zero supported_colorspaces.
- Improve error checking and logging
Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Sebastian Wick <[email protected]>
Reviewed-by: Joshua Ashton <[email protected]>
Reviewed-by: Simon Ser <[email protected]>
Cc: Pekka Paalanen <[email protected]>
Cc: Sebastian Wick <[email protected]>
Cc: [email protected]
Cc: Uma Shankar <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Joshua Ashton <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: Melissa Wen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
To match the other enums, and add more information about these values.
v2:
- Specify where an enum entry comes from
- Clarify DEFAULT and NO_DATA behavior
- BT.2020 CYCC is "constant luminance"
- correct type for BT.601
v4:
- drop DP/HDMI clarifications that might create
more questions than answers
v5:
- Add note on YCC and RGB variants
Signed-off-by: Joshua Ashton <[email protected]>
Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Reviewed-by: Sebastian Wick <[email protected]>
Acked-by: Pekka Paalanen <[email protected]>
Reviewed-by: Simon Ser <[email protected]>
Cc: Pekka Paalanen <[email protected]>
Cc: Sebastian Wick <[email protected]>
Cc: [email protected]
Cc: Uma Shankar <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Joshua Ashton <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: Melissa Wen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
This allows us to use strongly typed arguments.
v2:
- Bring NO_DATA back
- Provide explicit enum values
v3:
- Drop unnecessary '&' from kerneldoc (emersion)
v4:
- Fix Normal Colorimetry comment
Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Simon Ser <[email protected]>
Reviewed-by: Sebastian Wick <[email protected]>
Reviewed-by: Pekka Paalanen <[email protected]>
Reviewed-by: Joshua Ashton <[email protected]>
Cc: Pekka Paalanen <[email protected]>
Cc: Sebastian Wick <[email protected]>
Cc: [email protected]
Cc: Uma Shankar <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Joshua Ashton <[email protected]>
Cc: Simon Ser <[email protected]>
Cc: Melissa Wen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
Bump the minor version to declare debugging capability is now
available.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Expose debug capabilities in the KFD topology node's HSA capabilities and
debug properties flags.
Ensure correct capabilities are exposed based on firmware support.
Flag definitions can be referenced in uapi/linux/kfd_sysfs.h.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Introduce the GPU debug operations interface.
For ROCm-GDB to extend the GNU Debugger's ability to inspect the AMD GPU
instruction set, provide the necessary interface to allow the debugger
to HW debug-mode set and query exceptions per HSA queue, process or
device.
The runtime_enable interface coordinates exception handling with the
HSA runtime.
Usage is available in the kern docs at uapi/linux/kfd_ioctl.h.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Was leftover from GC 9.4.3 bring up and is currently
unused. Drop it for now.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add device tree node for the BPMP thermal node on Tegra234 and add
thermal zone definitions.
Acked-by: Jon Hunter <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Add i2c_get_match_data() to get match data for I2C, ACPI and
DT-based matching, so that we can optimize the driver code.
Suggested-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Biju Das <[email protected]>
[wsa: simplified var initialization]
Signed-off-by: Wolfram Sang <[email protected]>
|
|
NV15_4L4 is the 10-bits per component 4x4 tiled format.
Signed-off-by: Benjamin Gaignard <[email protected]>
Signed-off-by: Nicolas Dufresne <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
Fraction bytes-per-pixel exist for some packed format. You will find
notably on Rockhip platform that 10bit data is stored fully packed,
meaning that there is 1.25 pixels per bytes. This can be represented
with the fraction 5/4 and can be used to scale the width into a
bytesperline.
Signed-off-by: Nicolas Dufresne <[email protected]>
Signed-off-by: Benjamin Gaignard <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
This patch adds the AOMedia Video 1 (AV1) kernel uAPI.
This design is based on currently available AV1 API implementations and
aims to support the development of AV1 stateless video codecs
on Linux.
Signed-off-by: Daniel Almeida <[email protected]>
Co-developed-by: Nicolas Dufresne <[email protected]>
Signed-off-by: Nicolas Dufresne <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
After mmsys and drm change DITHER enum to DDP_COMPONENT_DITHER0,
mmsys header can remove the useless DDP_COMPONENT_DITHER enum.
Signed-off-by: Jason-JH.Lin <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Reviewed-by: Rex-BC Chen <[email protected]>
Acked-by: Matthias Brugger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Matthias Brugger <[email protected]>
|
|
This allows backing ttm_tt structure with pages from different NUMA
pools.
Tested-by: Graham Sider <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Christian König <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Recently introduced commit "drm/amdgpu: Set cache coherency
for GC 9.4.3" did not update the settings applicable for svm ranges.
Add the coherence settings for svm ranges for GFX IP 9.4.3.
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a helper function to get TTM memory limit. This is
needed by KFD to set its own internal memory limits.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
OpenGL EXT_robustness extension expects the driver to stop reporting
GUILTY_CONTEXT_RESET when the reset has completed and the GPU is ready
to accept submission again.
This commit adds a AMDGPU_CTX_QUERY2_FLAGS_RESET_IN_PROGRESS flag,
that let the UMD know that the reset is still not finished.
Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290
Reviewed-by: Christian König <[email protected]>
Reviewed-by: André Almeida <[email protected]>
Signed-off-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/drivers
Renesas driver updates for v6.5 (take two)
- Convert the R-Mobile SYSC driver to readl_poll_timeout_atomic().
* tag 'renesas-drivers-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
soc: renesas: rmobile-sysc: Convert to readl_poll_timeout_atomic()
iopoll: Do not use timekeeping in read_poll_timeout_atomic()
iopoll: Call cpu_relax() in busy loops
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
Broadcom PHYs have two LEDs selector registers which allow us to control
the LED assignment, including how to turn them on/off.
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
These registers are common to most PHYs and are not specific to the
BCM5482, renamed the constants accordingly, no functional change.
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Linux 6.4-rc5
* tag 'v6.4-rc5': (919 commits)
Linux 6.4-rc5
leds: qcom-lpg: Fix PWM period limits
selftests/ftrace: Choose target function for filter test from samples
KVM: selftests: Add test for race in kvm_recalculate_apic_map()
KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
KVM: x86: Account fastpath-only VM-Exits in vCPU stats
KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker
tpm, tpm_tis: correct tpm_tis_flags enumeration values
Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"
media: uvcvideo: Don't expose unsupported formats to userspace
media: v4l2-subdev: Fix missing kerneldoc for client_caps
media: staging: media: imx: initialize hs_settle to avoid warning
media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad()
riscv: Implement missing huge_ptep_get
riscv: Fix huge_ptep_set_wrprotect when PTE is a NAPOT
module/decompress: Fix error checking on zstd decompression
fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
dt-bindings: serial: 8250_omap: add rs485-rts-active-high
selinux: don't use make's grouped targets feature yet
...
|
|
For convenience (less code duplication), the pin controller pin
configuration register values were defined in the bindings header.
These are not some IDs or other abstraction layer but raw numbers used
in the registers.
These constants do not fit the purpose of bindings. They do not
provide any abstraction, any hardware and driver independent ID. In
fact, the Linux pinctrl-single driver actually do not use the bindings
header at all.
Commit f2de003e1426 ("dt-bindings: pinctrl: k3: Deprecate header with
register constants") already moved users to the local header, so, drop
the binding header. See background discussion in [1].
While at it, clean up the MAINTAINERS file which is the only reference
left.
[1]: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Suggested-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Nishanth Menon <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-next
UAPI Changes:
- I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake.
Driver Changes:
Fixes/improvements/new stuff:
- Use large rings for compute contexts (Chris Wilson)
- Better logging/debug of unexpected GuC communication issues (Michal Wajdeczko)
- Clear out entire reports after reading if not power of 2 size (Ashutosh Dixit)
- Limit lmem allocation size to succeed on SmallBars (Andrzej Hajda)
- perf/OA capture robustness improvements on DG2 (Umesh Nerlige Ramappa)
- Fix error code in intel_gsc_uc_heci_cmd_submit_nonpriv() (Dan Carpenter)
Future platform enablement:
- Add workaround 14016712196 (Tejas Upadhyay)
- HuC loading for MTL (Daniele Ceraolo Spurio)
- Allow user to set cache at BO creation (Fei Yang)
Miscellaneous:
- Use system include style for drm headers (Jani Nikula)
- Drop legacy CTB definitions (Michal Wajdeczko)
- Turn off the timer to sample frequencies when GT is parked (Ashutosh Dixit)
- Make PMU sample array two-dimensional (Ashutosh Dixit)
- Use the correct error value when kernel_context() fails (Andi Shyti)
- Fix second parameter type of pre-gen8 pte_encode callbacks (Nathan Chancellor)
- Fix parameter in gmch_ggtt_insert_{entries, page}() (Nathan Chancellor)
- Fix size_t format specifier in gsccs_send_message() (Nathan Chancellor)
- Use the fdinfo helper (Tvrtko Ursulin)
- Add some missing error propagation (Tvrtko Ursulin)
- Reduce I915_MAX_GT to 2 (Matt Atwood)
- Rename I915_PMU_MAX_GTS to I915_PMU_MAX_GT (Matt Atwood)
- Remove some obsolete definitions (John Harrison)
Merges:
- Merge drm/drm-next into drm-intel-gt-next (Tvrtko Ursulin)
Signed-off-by: Dave Airlie <[email protected]>
From: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/ZIH09fqe5v5yArsu@tursulin-desk
|
|
Allow splice to undo the effects of MSG_MORE after prematurely ending a
splice/sendfile due to getting an EOF condition (->splice_read() returned
0) after splice had called sendmsg() with MSG_MORE set when the user didn't
set MSG_MORE.
For UDP, a pending packet will not be emitted if the socket is closed
before it is flushed; with this change, it be flushed by ->splice_eof().
For TCP, it's not clear that MSG_MORE is actually effective.
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <[email protected]>
cc: Kuniyuki Iwashima <[email protected]>
cc: Willem de Bruijn <[email protected]>
cc: David Ahern <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add an optional method, ->splice_eof(), to allow splice to indicate the
premature termination of a splice to struct file_operations and struct
proto_ops.
This is called if sendfile() or splice() encounters all of the following
conditions inside splice_direct_to_actor():
(1) the user did not set SPLICE_F_MORE (splice only), and
(2) an EOF condition occurred (->splice_read() returned 0), and
(3) we haven't read enough to fulfill the request (ie. len > 0 still), and
(4) we have already spliced at least one byte.
A further patch will modify the behaviour of SPLICE_F_MORE to always be
passed to the actor if either the user set it or we haven't yet read
sufficient data to fulfill the request.
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Al Viro <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Jan Kara <[email protected]>
cc: Jeff Layton <[email protected]>
cc: David Hildenbrand <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: [email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Replace generic_splice_sendpage() + splice_from_pipe + pipe_to_sendpage()
with a net-specific handler, splice_to_socket(), that calls sendmsg() with
MSG_SPLICE_PAGES set instead of calling ->sendpage().
MSG_MORE is used to indicate if the sendmsg() is expected to be followed
with more data.
This allows multiple pipe-buffer pages to be passed in a single call in a
BVEC iterator, allowing the processing to be pushed down to a loop in the
protocol driver. This helps pave the way for passing multipage folios down
too.
Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should
just ignore it and do a normal sendmsg() for now - although that may be a
bit slower as it may copy everything.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|