Age | Commit message (Collapse) | Author | Files | Lines |
|
For huge pages (and in fact, any compound page), the GUP_PIN_COUNTING_BIAS
scheme tends to overflow too easily, each tail page increments the head
page->_refcount by GUP_PIN_COUNTING_BIAS (1024). That limits the number
of huge pages that can be pinned.
This patch removes that limitation, by using an exact form of pin counting
for compound pages of order > 1. The "order > 1" is required because this
approach uses the 3rd struct page in the compound page, and order 1
compound pages only have two pages, so that won't work there.
A new struct page field, hpage_pinned_refcount, has been added, replacing
a padding field in the union (so no new space is used).
This enhancement also has a useful side effect: huge pages and compound
pages (of order > 1) do not suffer from the "potential false positives"
problem that is discussed in the page_dma_pinned() comment block. That is
because these compound pages have extra space for tracking things, so they
get exact pin counts instead of overloading page->_refcount.
Documentation/core-api/pin_user_pages.rst is updated accordingly.
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add tracking of pages that were pinned via FOLL_PIN. This tracking is
implemented via overloading of page->_refcount: pins are added by adding
GUP_PIN_COUNTING_BIAS (1024) to the refcount. This provides a fuzzy
indication of pinning, and it can have false positives (and that's OK).
Please see the pre-existing Documentation/core-api/pin_user_pages.rst for
details.
As mentioned in pin_user_pages.rst, callers who effectively set FOLL_PIN
(typically via pin_user_pages*()) are required to ultimately free such
pages via unpin_user_page().
Please also note the limitation, discussed in pin_user_pages.rst under the
"TODO: for 1GB and larger huge pages" section. (That limitation will be
removed in a following patch.)
The effect of a FOLL_PIN flag is similar to that of FOLL_GET, and may be
thought of as "FOLL_GET for DIO and/or RDMA use".
Pages that have been pinned via FOLL_PIN are identifiable via a new
function call:
bool page_maybe_dma_pinned(struct page *page);
What to do in response to encountering such a page, is left to later
patchsets. There is discussion about this in [1], [2], [3], and [4].
This also changes a BUG_ON(), to a WARN_ON(), in follow_page_mask().
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages():
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
[[email protected]: add kerneldoc]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: if pin fails, we need to unpin, a simple put_page will not be enough]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix put_compound_head defined but not used]
Suggested-by: Jan Kara <[email protected]>
Suggested-by: Jérôme Glisse <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Internal to mm/gup.c, require that get_user_pages_fast() and
__get_user_pages_fast() identify themselves, by setting FOLL_GET. This is
required in order to be able to make decisions based on "FOLL_PIN, or
FOLL_GET, or both or neither are set", in upcoming patches.
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In preparation for an upcoming patch, send gup flags args to two more
routines: put_compound_head(), and undo_dev_pagemap().
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
An upcoming patch requires subtracting a large chunk of refcounts from a
page, and checking what the resulting refcount is. This is a little
different than the usual "check for zero refcount" that many of the page
ref functions already do. However, it is similar to a few other routines
that (like this one) are generally useful for things such as 1-based
refcounting.
Add page_ref_sub_return(), that subtracts a chunk of refcounts atomically,
and returns an atomic snapshot of the result.
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A subsequent patch requires access to gup flags, so pass the flags
argument through to the __gup_device_* functions.
Also placate checkpatch.pl by shortening a nearby line.
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/gup: track FOLL_PIN pages", v6.
This activates tracking of FOLL_PIN pages. This is in support of fixing
the get_user_pages()+DMA problem described in [1]-[4].
FOLL_PIN support is now in the main linux tree. However, the patch to use
FOLL_PIN to track pages was *not* submitted, because Leon saw an RDMA test
suite failure that involved (I think) page refcount overflows when huge
pages were used.
This patch definitively solves that kind of overflow problem, by adding an
exact pincount, for compound pages (of order > 1), in the 3rd struct page
of a compound page. If available, that form of pincounting is used,
instead of the GUP_PIN_COUNTING_BIAS approach. Thanks again to Jan Kara
for that idea.
Other interesting changes:
* dump_page(): added one, or two new things to report for compound
pages: head refcount (for all compound pages), and map_pincount (for
compound pages of order > 1).
* Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
huge page refcount upper limit problems, and added notes about how it
works now. Also added a note about the dump_page() enhancements.
* Added some comments in gup.c and mm.h, to explain that there are two
ways to count pinned pages: exact (for compound pages of order > 1) and
fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
============================================================
General notes about the tracking patch:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], [4] and in a remarkable number of email threads since about
2017. :)
In contrast to earlier approaches, the page tracking can be incrementally
applied to the kernel call sites that, until now, have been simply calling
get_user_pages() ("gup"). In other words, opt-in by changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Future steps:
* Convert more subsystems from get_user_pages() to pin_user_pages().
The first probably needs to be bio/biovecs, because any filesystem
testing is too difficult without those in place.
* Change VFS and filesystems to respond appropriately when encountering
dma-pinned pages.
* Work with Ira and others to connect this all up with file system
leases.
[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/
[4] LWN kernel index: get_user_pages()
https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
This patch (of 12):
An upcoming patch requires reusing the implementation of
get_user_pages_remote(). Split up get_user_pages_remote() into an outer
routine that checks flags, and an implementation routine that will be
reused. This makes subsequent changes much easier to understand.
There should be no change in behavior due to this patch.
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
- These were never called PCG flags; they've been called FGP flags since
their introduction in 2014.
- The FGP_FOR_MMAP flag was misleadingly documented as if it was an
alternative to FGP_CREAT instead of an option to it.
- Rename the 'offset' parameter to 'index'.
- Capitalisation, formatting, rewording.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
No in-tree users (proc, madvise, memcg, mincore) can be built as a module.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Dumping the page information in this circumstance helps for debugging.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This isn't just a random struct page, it's known to be a head page, and
calling it head makes the function better self-documenting. The pgoff_t
is less confusing if it's named index instead of offset. Also add a
couple of comments to explain why we're doing various things.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Pankaj Gupta <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use VM_FAULT_OOM instead of indirecting through vmf_error(-ENOMEM).
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The first argument of shrink_readahead_size_eio() is not used. Hence
remove it from the function definition and from all the callers.
Signed-off-by: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Mount failure issue happens under the scenario: Application forked dozens
of threads to mount the same number of cramfs images separately in docker,
but several mounts failed with high probability. Mount failed due to the
checking result of the page(read from the superblock of loop dev) is not
uptodate after wait_on_page_locked(page) returned in function cramfs_read:
wait_on_page_locked(page);
if (!PageUptodate(page)) {
...
}
The reason of the checking result of the page not uptodate: systemd-udevd
read the loopX dev before mount, because the status of loopX is Lo_unbound
at this time, so loop_make_request directly trigger the calling of io_end
handler end_buffer_async_read, which called SetPageError(page). So It
caused the page can't be set to uptodate in function
end_buffer_async_read:
if(page_uptodate && !PageError(page)) {
SetPageUptodate(page);
}
Then mount operation is performed, it used the same page which is just
accessed by systemd-udevd above, Because this page is not uptodate, it
will launch a actual read via submit_bh, then wait on this page by calling
wait_on_page_locked(page). When the I/O of the page done, io_end handler
end_buffer_async_read is called, because no one cleared the page
error(during the whole read path of mount), which is caused by
systemd-udevd reading, so this page is still in "PageError" status, which
can't be set to uptodate in function end_buffer_async_read, then caused
mount failure.
But sometimes mount succeed even through systemd-udeved read loopX dev
just before, The reason is systemd-udevd launched other loopX read just
between step 3.1 and 3.2, the steps as below:
1, loopX dev default status is Lo_unbound;
2, systemd-udved read loopX dev (page is set to PageError);
3, mount operation
1) set loopX status to Lo_bound;
==>systemd-udevd read loopX dev<==
2) read loopX dev(page has no error)
3) mount succeed
As the loopX dev status is set to Lo_bound after step 3.1, so the other
loopX dev read by systemd-udevd will go through the whole I/O stack, part
of the call trace as below:
SYS_read
vfs_read
do_sync_read
blkdev_aio_read
generic_file_aio_read
do_generic_file_read:
ClearPageError(page);
mapping->a_ops->readpage(filp, page);
here, mapping->a_ops->readpage() is blkdev_readpage. In latest kernel,
some function name changed, the call trace as below:
blkdev_read_iter
generic_file_read_iter
generic_file_buffered_read:
/*
* A previous I/O error may have been due to temporary
* failures, eg. mutipath errors.
* Pg_error will be set again if readpage fails.
*/
ClearPageError(page);
/* Start the actual read. The read will unlock the page*/
error=mapping->a_ops->readpage(flip, page);
We can see ClearPageError(page) is called before the actual read,
then the read in step 3.2 succeed.
This patch is to add the calling of ClearPageError just before the actual
read of read path of cramfs mount. Without the patch, the call trace as
below when performing cramfs mount:
do_mount
cramfs_read
cramfs_blkdev_read
read_cache_page
do_read_cache_page:
filler(data, page);
or
mapping->a_ops->readpage(data, page);
With the patch, the call trace as below when performing mount:
do_mount
cramfs_read
cramfs_blkdev_read
read_cache_page:
do_read_cache_page:
ClearPageError(page); <== new add
filler(data, page);
or
mapping->a_ops->readpage(data, page);
With the patch, mount operation trigger the calling of
ClearPageError(page) before the actual read, the page has no error if no
additional page error happen when I/O done.
Signed-off-by: Xianting Tian <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There used to be a 'retry' label in between the two (identical) checks
when first introduced in commit f446daaea9d4 ("mm: implement writeback
livelock avoidance using page tagging"), and later modified/updated in
commit 6e6938b6d313 ("writeback: introduce .tagged_writepages for the
WB_SYNC_NONE sync stage").
The label has been removed in commit 64081362e8ff ("mm/page-writeback.c:
fix range_cyclic writeback vs writepages deadlock"), and the (identical)
checks are now present / performed immediately one after another.
So, remove/deduplicate the latter check, moving tag_pages_for_writeback()
into the former check before the 'tag' variable assignment, so it's clear
that it's not used in this (similarly-named) function call but only later
in pagevec_lookup_range_tag().
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Jan Kara <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When handling a page fault, we drop mmap_sem to start async readahead so
that we don't block on IO submission with mmap_sem held. However there's
no point to drop mmap_sem in case readahead is disabled. Handle that case
to avoid pointless dropping of mmap_sem and retrying the fault. This was
actually reported to block mlockall(MCL_CURRENT) indefinitely.
Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Reported-by: Minchan Kim <[email protected]>
Reported-by: Robert Stupp <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Kmemleak could scan task stacks while plain writes happens to those stack
variables which could results in data races. For example, in
sys_rt_sigaction and do_sigaction(), it could have plain writes in a
32-byte size. Since the kmemleak does not care about the actual values of
a non-pointer and all do_sigaction() call sites only copy to stack
variables, just disable KCSAN for kmemleak to avoid annotating anything
outside Kmemleak just because Kmemleak scans everything.
Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Marco Elver <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Clang warns:
mm/kmemleak.c:1955:28: warning: array comparison always evaluates to a constant [-Wtautological-compare]
if (__start_ro_after_init < _sdata || __end_ro_after_init > _edata)
^
mm/kmemleak.c:1955:60: warning: array comparison always evaluates to a constant [-Wtautological-compare]
if (__start_ro_after_init < _sdata || __end_ro_after_init > _edata)
These are not true arrays, they are linker defined symbols, which are just
addresses. Using the address of operator silences the warning and does
not change the resulting assembly with either clang/ld.lld or gcc/ld
(tested with diff + objdump -Dr).
Suggested-by: Nick Desaulniers <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Link: https://github.com/ClangBuiltLinux/linux/issues/895
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
fallback node"
This reverts commit ad2c8144418c6a81cefe65379fd47bbe8344cef2.
The function node_to_mem_node() was introduced by that commit for use in SLUB
on systems with memoryless nodes, but it turned out to be unreliable on some
architectures/configurations and a simpler solution exists than fixing it up.
Thus commit 0715e6c516f1 ("mm, slub: prevent kmalloc_node crashes and
memory leaks") removed the only user of node_to_mem_node() and we can
revert the commit that introduced the function.
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Nathan Lynch <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: PUVICHAKRAVARTHY RAMACHANDRAN <[email protected]>
Cc: Sachin Sant <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In a recent discussion[1] with Vitaly Nikolenko and Silvio Cesare, it
became clear that moving the freelist pointer away from the edge of
allocations would likely improve the overall defensive posture of the
inline freelist pointer. My benchmarks show no meaningful change to
performance (they seem to show it being faster), so this looks like a
reasonable change to make.
Instead of having the freelist pointer at the very beginning of an
allocation (offset 0) or at the very end of an allocation (effectively
offset -sizeof(void *) from the next allocation), move it away from the
edges of the allocation and into the middle. This provides some
protection against small-sized neighboring overflows (or underflows), for
which the freelist pointer is commonly the target. (Large or well
controlled overwrites are much more likely to attack live object contents,
instead of attempting freelist corruption.)
The vaunted kernel build benchmark, across 5 runs. Before:
Mean: 250.05
Std Dev: 1.85
and after, which appears mysteriously faster:
Mean: 247.13
Std Dev: 0.76
Attempts at running "sysbench --test=memory" show the change to be well in
the noise (sysbench seems to be pretty unstable here -- it's not really
measuring allocation).
Hackbench is more allocation-heavy, and while the std dev is above the
difference, it looks like may manifest as an improvement as well:
20 runs of "hackbench -g 20 -l 1000", before:
Mean: 36.322
Std Dev: 0.577
and after:
Mean: 36.056
Std Dev: 0.598
[1] https://twitter.com/vnik5287/status/1235113523098685440
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Vitaly Nikolenko <[email protected]>
Cc: Silvio Cesare <[email protected]>
Cc: Christoph Lameter <[email protected]>Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Link: http://lkml.kernel.org/r/202003051624.AAAC9AECC@keescook
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Under CONFIG_SLAB_FREELIST_HARDENED=y, the obfuscation was relatively weak
in that the ptr and ptr address were usually so close that the first XOR
would result in an almost entirely 0-byte value[1], leaving most of the
"secret" number ultimately being stored after the third XOR. A single
blind memory content exposure of the freelist was generally sufficient to
learn the secret.
Add a swab() call to mix bits a little more. This is a cheap way (1
cycle) to make attacks need more than a single exposure to learn the
secret (or to know _where_ the exposure is in memory).
kmalloc-32 freelist walk, before:
ptr ptr_addr stored value secret
ffff90c22e019020@ffff90c22e019000 is 86528eb656b3b5bd (86528eb656b3b59d)
ffff90c22e019040@ffff90c22e019020 is 86528eb656b3b5fd (86528eb656b3b59d)
ffff90c22e019060@ffff90c22e019040 is 86528eb656b3b5bd (86528eb656b3b59d)
ffff90c22e019080@ffff90c22e019060 is 86528eb656b3b57d (86528eb656b3b59d)
ffff90c22e0190a0@ffff90c22e019080 is 86528eb656b3b5bd (86528eb656b3b59d)
...
after:
ptr ptr_addr stored value secret
ffff9eed6e019020@ffff9eed6e019000 is 793d1135d52cda42 (86528eb656b3b59d)
ffff9eed6e019040@ffff9eed6e019020 is 593d1135d52cda22 (86528eb656b3b59d)
ffff9eed6e019060@ffff9eed6e019040 is 393d1135d52cda02 (86528eb656b3b59d)
ffff9eed6e019080@ffff9eed6e019060 is 193d1135d52cdae2 (86528eb656b3b59d)
ffff9eed6e0190a0@ffff9eed6e019080 is f93d1135d52cdac2 (86528eb656b3b59d)
[1] https://blog.infosectcbr.com.au/2020/03/weaknesses-in-linux-kernel-heap.html
Fixes: 2482ddec670f ("mm: add SLUB free list pointer obfuscation")
Reported-by: Silvio Cesare <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/202003051623.AF4F8CB@keescook
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are slub_cpu_partial() and slub_set_cpu_partial() APIs to wrap
kmem_cache->cpu_partial. This patch will use the two APIs to replace
kmem_cache->cpu_partial in slub code.
Signed-off-by: chenqiwu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are slub_percpu_partial() and slub_set_percpu_partial() APIs to wrap
kmem_cache->cpu_partial. This patch will use the two to replace
cpu_slab->partial in slub code.
Signed-off-by: chenqiwu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This notice fills my boot logs with scary-looking asterisks but doesn't
really tell me anything. Let's just remove it; validation errors are
already reported separately, so this is just a redundant list of
filesystems.
$ dmesg | grep VALIDATE
[ 0.306256] *** VALIDATE tmpfs ***
[ 0.307422] *** VALIDATE proc ***
[ 0.308355] *** VALIDATE cgroup ***
[ 0.308741] *** VALIDATE cgroup2 ***
[ 0.813256] *** VALIDATE bpf ***
[ 0.815272] *** VALIDATE ramfs ***
[ 0.815665] *** VALIDATE hugetlbfs ***
[ 0.876970] *** VALIDATE nfs ***
[ 0.877383] *** VALIDATE nfs4 ***
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Seth Arnold <[email protected]>
Cc: Alexander Viro <[email protected]>
Link: http://lkml.kernel.org/r/202003061617.A8835CAAF@keescook
Signed-off-by: Linus Torvalds <[email protected]>
|
|
OCFS2 doesn't mind if memory reclaim makes I/Os happen; it just cares that
it won't be reentered, so it can use memalloc_nofs_save() instead of
memalloc_noio_save().
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since snprintf() returns the would-be-output size instead of the actual
output size, the succeeding calls may go beyond the given buffer limit.
Fix it by replacing with scnprintf().
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
an error occurs
Under some conditions, the directory cannot be deleted. The specific
scenarios are as follows: (for example, /mnt/ocfs2 is the mount point)
1. Create the /mnt/ocfs2/p_dir directory. At this time, the i_nlink
corresponding to the inode of the /mnt/ocfs2/p_dir directory is equal
to 2.
2. During the process of creating the /mnt/ocfs2/p_dir/s_dir
directory, if the call to the inc_nlink function in ocfs2_mknod
succeeds, the functions such as ocfs2_init_acl,
ocfs2_init_security_set, and ocfs2_dentry_attach_lock fail. At this
time, the i_nlink corresponding to the inode of the /mnt/ocfs2/p_dir
directory is equal to 3, but /mnt/ocfs2/p_dir/s_dir is not added to the
/mnt/ocfs2/p_dir directory entry.
3. Delete the /mnt/ocfs2/p_dir directory (rm -rf /mnt/ocfs2/p_dir).
At this time, it is found that the i_nlink corresponding to the inode
corresponding to the /mnt/ocfs2/p_dir directory is equal to 3.
Therefore, the /mnt/ocfs2/p_dir directory cannot be deleted.
Signed-off-by: Jian wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jun Piao <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://urldefense.com/v3/__https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html__;!!GqivPVa7Brio!OKPotRhYhHbCG2kibo8Q6_6CuKaa28d_74h1svxyR6rbshrK2L_BdrQpNbvJWBWb40QCkg$
[2] https://urldefense.com/v3/__https://github.com/KSPP/linux/issues/21__;!!GqivPVa7Brio!OKPotRhYhHbCG2kibo8Q6_6CuKaa28d_74h1svxyR6rbshrK2L_BdrQpNbvJWBUhNn9M6g$
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/20200309202155.GA8432@embeddedor
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://urldefense.com/v3/__https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html__;!!GqivPVa7Brio!OVOYL_CouISa5L1Lw-20EEFQntw6cKMx-j8UdY4z78uYgzKBUFcfpn50GaurvbV5v7YiUA$
[2] https://urldefense.com/v3/__https://github.com/KSPP/linux/issues/21__;!!GqivPVa7Brio!OVOYL_CouISa5L1Lw-20EEFQntw6cKMx-j8UdY4z78uYgzKBUFcfpn50GaurvbXs8Eh8eg$
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/20200309202016.GA8210@embeddedor
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://urldefense.com/v3/__https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html__;!!GqivPVa7Brio!NzMr-YRl2zy-K3lwLVVatz7x0uD2z7-ykQag4GrGigxmfWU8TWzDy6xrkTiW3hYl00czlw$
[2] https://urldefense.com/v3/__https://github.com/KSPP/linux/issues/21__;!!GqivPVa7Brio!NzMr-YRl2zy-K3lwLVVatz7x0uD2z7-ykQag4GrGigxmfWU8TWzDy6xrkTiW3hYHG1nAnw$
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/20200309201907.GA8005@embeddedor
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/20200213160244.GA6088@embeddedor
Signed-off-by: Linus Torvalds <[email protected]>
|
|
ocfs2_refcount_cache_unlock()
Sparse reports warnings at ocfs2_refcount_cache_lock()
and ocfs2_refcount_cache_unlock()
warning: context imbalance in ocfs2_refcount_cache_lock()
- wrong count at exit
warning: context imbalance in ocfs2_refcount_cache_unlock()
- unexpected unlock
The root cause is the missing annotation at ocfs2_refcount_cache_lock()
and at ocfs2_refcount_cache_unlock()
Add the missing __acquires(&rf->rf_lock) annotation to
ocfs2_refcount_cache_lock()
Add the missing __releases(&rf->rf_lock) annotation to
ocfs2_refcount_cache_unlock()
Signed-off-by: Jules Irenge <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We don't need 'err' in these 2 places, better to remove them.
Signed-off-by: Alex Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: ChenGang <[email protected]>
Cc: Richard Fontana <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Correct annotation from "l_next_rec" to "l_next_free_rec"
Signed-off-by: Yan Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jun Piao <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is no need to log twice in several functions.
Signed-off-by: Yan Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jun Piao <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This macro has been unused since it was introduced.
Signed-off-by: Alex Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This macro should be used.
Signed-off-by: Alex Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
O2HB_DEFAULT_BLOCK_BITS/DLM_THREAD_MAX_ASTS/DLM_MIGRATION_RETRY_MS and
OCFS2_MAX_RESV_WINDOW_BITS/OCFS2_MIN_RESV_WINDOW_BITS have been unused
since commit 66effd3c6812 ("ocfs2/dlm: Do not migrate resource to a node
that is leaving the domain").
Signed-off-by: Alex Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: ChenGang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Richard Fontana <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This macro is unused since commit ab09203e302b ("sysctl fs: Remove dead
binary sysctl support").
Signed-off-by: Alex Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Here are some of the more common spelling mistakes and typos that I've
found while fixing up spelling mistakes in the kernel since November 2019
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Joe Perches <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are a few cases in the tree where "sysfs" is misspelled as "syfs".
Signed-off-by: Jonathan Neuschäfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Xiong <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Chris Paterson <[email protected]>
Cc: Luca Ceresoli <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Change a header to mandatory-y if both of the following are met:
[1] At least one architecture (except um) specifies it as generic-y in
arch/*/include/asm/Kbuild
[2] Every architecture (except um) either has its own implementation
(arch/*/include/asm/*.h) or specifies it as generic-y in
arch/*/include/asm/Kbuild
This commit was generated by the following shell script.
----------------------------------->8-----------------------------------
arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d')
tmpfile=$(mktemp)
grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile
find arch -path 'arch/*/include/asm/Kbuild' |
xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u |
while read header
do
mandatory=yes
for arch in $arches
do
if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild &&
! [ -f arch/$arch/include/asm/$header ]; then
mandatory=no
break
fi
done
if [ "$mandatory" = yes ]; then
echo "mandatory-y += $header" >> $tmpfile
for arch in $arches
do
sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild
done
fi
done
sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild
LANG=C sort $tmpfile >> include/asm-generic/Kbuild
----------------------------------->8-----------------------------------
One obvious benefit is the diff stat:
25 files changed, 52 insertions(+), 557 deletions(-)
It is tedious to list generic-y for each arch that needs it.
So, mandatory-y works like a fallback default (by just wrapping
asm-generic one) when arch does not have a specific header
implementation.
See the following commits:
def3f7cefe4e81c296090e1722a76551142c227c
a1b39bae16a62ce4aae02d958224f19316d98b24
It is tedious to convert headers one by one, so I processed by a shell
script.
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The timer used by delayed kthread works are IRQ safe because the used
kthread_delayed_work_timer_fn() is IRQ safe.
It is properly marked when initialized by KTHREAD_DELAYED_WORK_INIT().
But TIMER_IRQSAFE flag is missing when initialized by
kthread_init_delayed_work().
The missing flag might trigger invalid warning from del_timer_sync() when
kthread_mod_delayed_work() is called with interrupts disabled.
This patch is result of a discussion about using the API, see
https://lkml.kernel.org/r/[email protected]
Reported-by: Grygorii Strashko <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Grygorii Strashko <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A recent change to the netlink code: 6e237d099fac ("netlink: Relax attr
validation for fixed length types") logs a warning when programs send
messages with invalid attributes (e.g., wrong length for a u32). Yafang
reported this error message for tools/accounting/getdelays.c.
send_cmd() is wrongly adding 1 to the attribute length. As noted in
include/uapi/linux/netlink.h nla_len should be NLA_HDRLEN + payload
length, so drop the +1.
Fixes: 9e06d3f9f6b1 ("per task delay accounting taskstats interface: documentation fix")
Reported-by: Yafang Shao <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Yafang Shao <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Shailabh Nagar <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Every remaining user just has the error case returning -EFAULT.
In fact, the exception was __get_user_asm_nozero(), which was removed in
commit 4b842e4e25b1 ("x86: get rid of small constant size cases in
raw_copy_{to,from}_user()"), and the other __get_user_xyz() macros just
followed suit for consistency.
Fix up some macro whitespace while at it.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The last user was removed by commit 4b842e4e25b1 ("x86: get rid of small
constant size cases in raw_copy_{to,from}_user()"). Get rid of the
left-overs before somebody tries to use it again.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pull networking updates from David Miller:
"Highlights:
1) Fix the iwlwifi regression, from Johannes Berg.
2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.
3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.
4) Add TTL decrement action to openvswitch, from Matteo Croce.
5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.
6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.
7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.
8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.
9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.
10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.
11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.
12) Support bcmgenet on ACPI, from Jeremy Linton.
13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.
14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.
15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.
16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.
17) Offload FIFO qdisc in mlxsw, from Petr Machata.
18) Support UDP sockets in sockmap, from Lorenz Bauer.
19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.
20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.
21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.
22) Add support for hw offload of MACSEC, from Antoine Tenart.
23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.
24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.
25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...
|
|
Pull NTB updates from Jon Mason:
"Bug fixes for a few printing issues, link status detection bug on AMD
hardware, and a DMA address issue with ntb_perf.
Also, large series of AMD NTB patches"
* tag 'ntb-5.7' of git://github.com/jonmason/ntb: (21 commits)
NTB: add pci shutdown handler for AMD NTB
NTB: send DB event when driver is loaded or un-loaded
NTB: remove redundant setting of DB valid mask
NTB: return link up status correctly for PRI and SEC
NTB: add helper functions to set and clear sideinfo
NTB: move ntb_ctrl handling to init and deinit
NTB: handle link up, D0 and D3 events correctly
NTB: handle link down event correctly
NTB: remove handling of peer_sta from amd_link_is_up
NTB: set peer_sta within event handler itself
NTB: return the side info status from amd_poll_link
NTB: define a new function to get link status
NTB: Enable link up and down event notification
NTB: clear interrupt status register
NTB: Fix access to link status and control register
MAINTAINERS: update maintainer list for AMD NTB driver
NTB: ntb_transport: Use scnprintf() for avoiding potential buffer overflow
ntb_hw_switchtec: Fix ntb_mw_clear_trans error if size == 0
ntb_tool: Fix printk format
NTB: ntb_perf: Fix address err in perf_copy_chunk
...
|
|
git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver updates from Andy Shevchenko:
- Fix for improper handling of fan_boost_mode in sysfs for ASUS
laptops.
- On newer ASUS laptops the 1st battery is named differently, here is a
fix.
- Fix Lex 2I385SW to allow both network cards to be used.
- The power integrated circuit driver for Surface 3 has been added.
- Refactor and clean up of Intel PMC driver and enable it on Intel
Jasper Lake.
- Clean up of Dell RBU driver.
- Big update for Intel Speed Select technology support tool and driver.
* tag 'platform-drivers-x86-v5.7-1' of git://git.infradead.org/linux-platform-drivers-x86: (75 commits)
platform/x86: surface3_power: Fix always true condition in mshw0011_space_handler()
platform/x86: surface3_power: Fix Kconfig section ordering
platform/x86: surface3_power: Add missed headers
platform/x86: surface3_power: Reformat GUID assignment
platform/x86: surface3_power: Drop useless macro ACPI_PTR()
platform/x86: surface3_power: Prefix POLL_INTERVAL with SURFACE_3
platform/x86: surface3_power: Simplify mshw0011_adp_psr() to one liner
platform/x86: surface3_power: Use dev_err() instead of pr_err()
platform/x86: surface3_power: Drop unused structure definition
platform/x86: surface3_power: MSHW0011 rev-eng implementation
platform/x86: intel_pmc_core: Make pmc_core_substate_res_show() generic
platform/x86: intel_pmc_core: Make pmc_core_lpm_display() generic for platforms that support sub-states
tools/power/x86/intel-speed-select: Fix a typo in error message
tools/power/x86/intel-speed-select: Update version
tools/power/x86/intel-speed-select: Avoid duplicate Package strings for json
tools/power/x86/intel-speed-select: Add display for enabled cpus count
tools/power/x86/intel-speed-select: Print friendly warning for bad command line
tools/power/x86/intel-speed-select: Fix avx options for turbo-freq feature
tools/power/x86/intel-speed-select: Improve CLX commands
tools/power/x86/intel-speed-select: Show error for invalid CPUs in the options
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the big set of TTY / Serial patches for 5.7-rc1
Lots of console fixups and reworking in here, serial core tweaks
(doesn't that ever get old, why are we still creating new serial
devices?), serial driver updates, line-protocol driver updates, and
some vt cleanups and fixes included in here as well.
All have been in linux-next with no reported issues"
* tag 'tty-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (161 commits)
serial: 8250: Optimize irq enable after console write
serial: 8250: Fix rs485 delay after console write
vt: vt_ioctl: fix use-after-free in vt_in_use()
vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console
tty: serial: make SERIAL_SPRD depend on COMMON_CLK
tty: serial: fsl_lpuart: fix return value checking
tty: serial: fsl_lpuart: move dma_request_chan()
ARM: dts: tango4: Make /serial compatible with ns16550a
ARM: dts: mmp*: Make the serial ports compatible with xscale-uart
ARM: dts: mmp*: Fix serial port names
ARM: dts: mmp2-brownstone: Don't redeclare phandle references
ARM: dts: pxa*: Make the serial ports compatible with xscale-uart
ARM: dts: pxa*: Fix serial port names
ARM: dts: pxa*: Don't redeclare phandle references
serial: omap: drop unused dt-bindings header
serial: 8250: 8250_omap: Add DMA support for UARTs on K3 SoCs
serial: 8250: 8250_omap: Work around errata causing spurious IRQs with DMA
serial: 8250: 8250_omap: Extend driver data to pass FIFO trigger info
serial: 8250: 8250_omap: Move locking out from __dma_rx_do_complete()
serial: 8250: 8250_omap: Account for data in flight during DMA teardown
...
|