For quite a while, I was doing a quick hack to gup_test.c (previously,
gup_benchmark.c) whenever I wanted to try out my changes to dump_page().
This makes that hack unnecessary, and instead allows anyone to easily get
the same coverage from a user space program. That saves a lot of time
because you don't have to change the kernel, in order to test different
pages and options.
The new sub-test takes advantage of the existing gup_test infrastructure,
which already provides a simple user space program, some allocated user
space pages, an ioctl call, pinning of those pages (via either
get_user_pages or pin_user_pages) and a corresponding kernel-side test
invocation. There's not much more required, mainly just a couple of
inputs from the user.
In fact, the new test re-uses the existing command line options in order
to get various helpful combinations (THP or normal, _fast or slow gup, gup
vs. pup, and more).
New command line options are: which pages to dump, and what type of
"get/pin" to use.
In order to figure out which pages to dump, the logic is:
* If the user doesn't specify anything, page 0 (the first page in
the address range that the program sets up for testing) is dumped.
* Or, the user can type up to 8 page indices anywhere on the command
line. If you type more than 8, then it uses the first 8 and ignores the
remaining items.
For example:
./gup_test -ct -F 1 0 19 0x1000
Meaning:
-c: dump pages sub-test
-t: use THP pages
-F 1: use pin_user_pages() instead of get_user_pages()
0 19 0x1000: dump pages 0, 19, and 4096
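The page-selection rules above can be sketched in user space as follows. This is a minimal illustration, not the actual gup_test code: the names collect_page_indices and MAX_DUMP_PAGES are hypothetical, and the use of strtoul() with base 0 (so that both "19" and "0x1000" are accepted) is an assumption.

```c
#include <stdlib.h>

#define MAX_DUMP_PAGES 8 /* limit taken from the description above */

/*
 * Collect up to MAX_DUMP_PAGES page indices from args[0..nargs-1].
 * Extra arguments are ignored. If none are given, default to page 0.
 * Returns the number of indices stored in out[].
 */
unsigned int collect_page_indices(char *const args[], int nargs,
                                  unsigned long out[MAX_DUMP_PAGES])
{
    unsigned int count = 0;

    for (int i = 0; i < nargs && count < MAX_DUMP_PAGES; i++) {
        /* Base 0 accepts decimal ("19") and hex ("0x1000") alike. */
        out[count++] = strtoul(args[i], NULL, 0);
    }

    if (count == 0)
        out[count++] = 0; /* nothing specified: dump page 0 */

    return count;
}
```

Called with the three trailing arguments of the example command, this sketch would store 0, 19 and 4096.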
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Therefore, some minor cleanup and improvements are in order:
1. Rename the other items appropriately.
2. Stop reporting timing information on the non-benchmark items. It's
still being recorded and is available, but there's no point in
cluttering up the report with data that no one reasonably needs to
check.
3. Don't do iterations, for non-benchmark items.
4. Print out a shorter, more appropriate report for the non-benchmark
tests.
5. Add the command that was run, to the report. This really helps, as
there are quite a lot of options now.
6. Use a larger integer type for cmd, now that it's being compared.
Otherwise it doesn't work, because in this case cmd is about 3 billion,
which is the perfect size for problems with signed vs unsigned int.
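One way the signed vs unsigned problem in item 6 can show up is sketched below. EXAMPLE_CMD and both function names are made up for illustration and are not taken from gup_test; the sketch also assumes the usual wrap-around when an out-of-range value is converted to a 32-bit signed type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command number above INT32_MAX (about 3.2 billion). */
#define EXAMPLE_CMD 0xC0000001u

/* cmd kept in a 32-bit signed variable, then compared as a 64-bit value. */
bool matches_with_signed_32bit(void)
{
    /* Wraps to a negative value on common compilers. */
    int32_t cmd = (int32_t)EXAMPLE_CMD;

    /* Widening sign-extends, so the upper 32 bits become all ones. */
    return (uint64_t)(int64_t)cmd == (uint64_t)EXAMPLE_CMD;
}

/* cmd kept in a type wide enough to hold the value unchanged. */
bool matches_with_wide_type(void)
{
    uint64_t cmd = EXAMPLE_CMD;

    return cmd == (uint64_t)EXAMPLE_CMD;
}
```

In this sketch the narrow variant fails to match while the wide variant matches, which is the kind of mismatch a larger integer type avoids.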
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A few cleanups that don't deserve separate patches, but that also should
not clutter up other functional changes:
1. Remove an unnecessary #include <prctl.h>
2. Restore the sorted order of TEST_GEN_FILES.
3. Add -lpthread to the common LDLIBS, as it is harmless and several
tests use it. This gets rid of one special rule already.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rename to *.sh, in order to match the conventions of all of the other
items in selftests/vm.
The only reason not to use a .sh suffix on a shell script like this might
be to make it look more like a normal program, but that's not an issue here.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Avoid the need to copy-paste the gup_test ioctl commands and the struct
gup_test definition, between the kernel and the user space application, by
providing a new header file for these. This allows easier and safer
adding of new ioctl calls, as well as reducing the overall line count.
Details: The header file has to be able to compile independently, because
of the arguably unfortunate way that the Makefile is written: the Makefile
tries to build all of its prerequisites, when really it should be only
building the .c files, and leaving the other prerequisites (LOCAL_HDRS) as
pure dependencies.
That Makefile limitation is probably not worth fixing, but it explains why
one of the includes had to be moved into the new header file.
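As a rough user-space sketch of the idea, a shared header can define the argument struct and the ioctl numbers once and pull in everything it needs to compile on its own. Every name here (EXAMPLE_TEST_H, struct example_test_args, EXAMPLE_TEST_A, EXAMPLE_TEST_B, the 'x' magic) is hypothetical and does not reproduce the real gup_test.h.

```c
#ifndef EXAMPLE_TEST_H
#define EXAMPLE_TEST_H

/* Included here so the header compiles independently of its users. */
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical argument block shared by both sides of the ioctl. */
struct example_test_args {
    uint64_t size;     /* bytes of user memory to operate on */
    uint32_t nr_pages; /* pages handled per call */
    uint32_t flags;    /* test-specific flags */
};

/* Hypothetical command numbers, defined in exactly one place. */
#define EXAMPLE_TEST_A _IOWR('x', 1, struct example_test_args)
#define EXAMPLE_TEST_B _IOWR('x', 2, struct example_test_args)

#endif /* EXAMPLE_TEST_H */
```

With a single definition, adding a command means touching one file instead of keeping two copies in sync.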
Also: simplify the ioctl struct (struct gup_test), by deleting the unused
__expansion[10] field. This sort of thing is what you might see in a
stable ABI, but this low-level, kernel-developer-oriented selftests/vm
system is very much not subject to ABI stability. So "expansion" and
"reserved" fields are unnecessary here.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v3.
Summary: This series provides two main things, and a number of smaller
supporting goodies. The two main points are:
1) Add a new sub-test to gup_test, which in turn is a renamed version
of gup_benchmark. This sub-test allows nicer testing of dump_pages(),
at least on user-space pages.
For quite a while, I was doing a quick hack to gup_test.c whenever I
wanted to try out changes to dump_page(). Then Matthew Wilcox asked me
what I meant when I said "I used my dump_page() unit test", and I
realized that it might be nice to check in a polished up version of
that.
Details about how it works and how to use it are in the commit
description for patch #6 ("selftests/vm: gup_test: introduce the
dump_pages() sub-test").
2) Fixes a limitation of hmm-tests: these tests are incredibly useful,
but only if people actually build and run them. And it turns out that
libhugetlbfs is a little too effective at throwing a wrench in the
works, there. So I've added a little configuration check that removes
just two of the 21 hmm-tests, if libhugetlbfs is not available.
Further details in the commit description of patch #8
("selftests/vm: hmm-tests: remove the libhugetlbfs dependency").
Other smaller things that this series does:
a) Remove code duplication by creating gup_test.h.
b) Clear up the sub-test organization, and their invocation within
run_vmtests.sh.
c) Other minor assorted improvements.
[1] v2 is here:
https://lore.kernel.org/linux-doc/[email protected]/
[2] https://lore.kernel.org/r/CAHk-=wgh-TMPHLY3jueHX7Y2fWh3D+nMBqVS__AZm6-oorquWA@mail.gmail.com
This patch (of 9):
Rename nearly every "gup_benchmark" reference and file name to "gup_test".
The one exception is for the actual gup benchmark test itself.
The current code already does a *little* bit more than benchmarking, and
definitely covers more than get_user_pages_fast(). More importantly,
however, subsequent patches are about to add some functionality that is
non-benchmark related.
Closely related changes:
* Kconfig: in addition to renaming the options from GUP_BENCHMARK to
GUP_TEST, update the help text to reflect that it's no longer a
benchmark-only test.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The `else' is not useful after a `return' in __lock_page_or_retry().
[[email protected]: coding style fixes]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hailong Liu <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix the following kernel-doc markup warnings:
mm/truncate.c:646: warning: Function parameter or member 'mapping' not described in 'invalidate_mapping_pagevec'
mm/truncate.c:646: warning: Function parameter or member 'start' not described in 'invalidate_mapping_pagevec'
mm/truncate.c:646: warning: Function parameter or member 'end' not described in 'invalidate_mapping_pagevec'
mm/truncate.c:646: warning: Function parameter or member 'nr_pagevec' not described in 'invalidate_mapping_pagevec'
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Convert generic_file_buffered_read() to get pages to read from in batches,
and then copy data to userspace from many pages at once - in particular,
we now don't touch any cachelines that might be contended while we're in
the loop to copy data to userspace.
This is a performance improvement on workloads that do buffered reads
with large blocksizes, and a very large performance improvement if that
file is also being accessed concurrently by different threads.
On smaller reads (512 bytes), there's a very small performance improvement
(1%, within the margin of error).
akpm: kernel test robot found a 32% speedup on one test:
https://lkml.kernel.org/r/20201030081456.GY31092@shao2-debian
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: kernel test robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "generic_file_buffered_read() improvements", v2.
generic_file_buffered_read() has turned into a real monstrosity to work
with. This series is also a major performance improvement, for both small
random and large sequential reads. On my test box, 4k buffered random
reads go from ~150k to ~250k iops, and the improvements to big sequential
reads are even bigger.
This also incorporates the fix for IOCB_WAITQ handling that Jens just
posted, and factors out lock_page_for_iocb() to improve handling of the
various iocb flags.
This patch (of 2):
This is prep work for changing generic_file_buffered_read() to use
find_get_pages_contig() to batch up all the pagecache lookups.
This patch should be functionally identical to the existing code and
changes as little of the flow control as possible. More refactoring
could be done; this patch is intended to be relatively minimal.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kent Overstreet <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Collect the time for each allocation recorded in page owner so that
allocation "surges" can be measured.
Record the pid for each allocation recorded in page owner so that the
source of allocation "surges" can be better identified.
The above is very useful when doing memory analysis. On a crash for
example, we can get this information from kdump (or ramdump) and parse it
to figure out memory allocation problems.
Please note that on x86_64 this increases the size of struct page_owner
from 16 bytes to 32.
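The size change can be modeled with two stand-in structs. The field names and types below are assumptions chosen to add up to the sizes quoted above, not a copy of the kernel definition; the point is that a 64-bit timestamp plus a 32-bit pid adds 12 bytes, and alignment padding rounds the total up to 32 on a typical 64-bit ABI.

```c
#include <stdint.h>

/* Stand-in for the record before the change: 2 + 2 + 4 + 4 + 4 = 16 bytes. */
struct owner_before {
    unsigned short order;
    short last_migrate_reason;
    uint32_t gfp_mask;
    uint32_t handle;
    uint32_t free_handle;
};

/* Same record plus a 64-bit timestamp and a 32-bit pid. */
struct owner_after {
    unsigned short order;
    short last_migrate_reason;
    uint32_t gfp_mask;
    uint32_t handle;
    uint32_t free_handle;
    uint64_t ts_nsec; /* allocation time */
    int32_t pid;      /* allocating task */
    /* 4 bytes of tail padding keep the struct a multiple of 8 bytes */
};
```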
Vlastimil: it's not a functionality intended for production, so unless
somebody says they need to enable page_owner for debugging and this
increase prevents them from fitting into available memory, let's not
complicate things with making this optional.
[[email protected]: v3]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam Mark <[email protected]>
Signed-off-by: Georgi Djakov <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Page owner information for pages used by page owner itself is missing on
arm32 targets. The reason is that dummy_handle and failure_handle are not
initialized correctly. The buddy allocator is used to initialize these two
handles. However, the buddy allocator is not ready when page owner calls
it. This change fixes that by initializing page owner after buddy
initialization.
The work flows before and after this change are:
original logic:
1. allocate memory for page_ext (using memblock).
2. invoke the init callback of page_ext_ops like page_owner (using buddy
allocator).
3. initialize buddy.
after this change:
1. allocate memory for page_ext (using memblock).
2. initialize buddy.
3. invoke the init callback of page_ext_ops like page_owner (using buddy
allocator).
With the change, failure/dummy_handle get their correct values, and page
owner output now includes, for example, the entry for page owner itself:
Page allocated via order 2, mask 0x6202c0(GFP_USER|__GFP_NOWARN), pid 1006, ts 67278156558 ns
PFN 543776 type Unmovable Block 531 type Unmovable Flags 0x0()
init_page_owner+0x28/0x2f8
invoke_init_callbacks_flatmem+0x24/0x34
start_kernel+0x33c/0x5d8
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zhenhua Huang <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Linus notes the kernel has had a nice helper for the 'size of struct with
variable array member at the end' operation for a couple of years now; use
it.
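For readers unfamiliar with this kind of helper, here is a user-space sketch of the operation. It is not the kernel helper itself: struct sample and sample_size() are hypothetical, and the overflow checks rely on the GCC/Clang builtins __builtin_mul_overflow and __builtin_add_overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure ending in a flexible array member. */
struct sample {
    size_t count;
    uint64_t items[];
};

/*
 * Bytes needed for a struct sample holding n items,
 * or SIZE_MAX if the arithmetic would overflow.
 */
size_t sample_size(size_t n)
{
    size_t bytes;

    if (__builtin_mul_overflow(n, sizeof(uint64_t), &bytes) ||
        __builtin_add_overflow(bytes, sizeof(struct sample), &bytes))
        return SIZE_MAX;

    return bytes;
}
```

Returning SIZE_MAX on overflow means a following allocation request fails instead of silently getting a buffer that is too small.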
Link: http://lore.kernel.org/r/CAHk-=wgNTLbvAD8mNTvh+GQyapNWeX20PXhU_+frqEvVq4298w@mail.gmail.com
Link: https://lkml.kernel.org/r/160288261564.3242821.6055291930923876456.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The page order of the slab that gets chosen for a given slab cache depends
on the number of objects that can be fit in the slab while meeting other
requirements. We start with a value of minimum objects based on
nr_cpu_ids that is driven by possible number of CPUs and hence could be
higher than the actual number of CPUs present in the system. This leads
to calculate_order() choosing a page order that is on the higher side
leading to increased slab memory consumption on systems that have bigger
page sizes.
Hence rely on the number of online CPUs when determining the minimum
objects, thereby increasing the chances of choosing a lower conservative
page order for the slab.
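To give a feel for the effect, the sketch below assumes a starting heuristic of the form 4 * (fls(ncpus) + 1). That formula is an assumption made for illustration and is not stated in this message; fls_sketch() and min_objects_for() are hypothetical names.

```c
/* Position of the most significant set bit, 1-based; 0 for x == 0. */
unsigned int fls_sketch(unsigned int x)
{
    unsigned int pos = 0;

    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* Assumed heuristic: the object count grows with log2 of the CPU count. */
unsigned int min_objects_for(unsigned int ncpus)
{
    return 4 * (fls_sketch(ncpus) + 1);
}
```

Under that assumption, a machine with 1024 possible CPUs but only 16 online would start from 24 objects instead of 48, which leaves more room for a lower page order.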
Vlastimil said:
"Ideally, we would react to hotplug events and update existing caches
accordingly. But for that, recalculation of order for existing caches
would have to be made safe, while not affecting hot paths. We have
removed the sysfs interface with 32a6f409b693 ("mm, slub: remove
runtime allocation order changes") as it didn't seem easy and worth
the trouble.
In case somebody wants to start with a large order right from the
boot because they know they will hotplug lots of cpus later, they can
use slub_min_objects= boot param to override this heuristic. So in
case this change regresses somebody's performance, there's a way
around it and thus the risk is low IMHO"
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bharata B Rao <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 9cf7a1118365 ("mm/slub: make add_full() condition more explicit")
replaced an unnecessarily generic kmem_cache_debug(s) check with an
explicit check of SLAB_STORE_USER and #ifdef CONFIG_SLUB_DEBUG.
We can achieve the same specific check with the recently added
kmem_cache_debug_flags() which removes the #ifdef and restores the
no-branch-overhead benefit of static key check when slub debugging is not
enabled.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Abel Wu <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Liu Xiang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently in CONFIG_SLAB init_on_free happens too late, and heap objects
go to the heap quarantine without being erased.
Let's move init_on_free clearing before calling kasan_slab_free(). In that
case heap quarantine will store erased objects, similarly to CONFIG_SLUB=y
behavior.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Popov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The page allocator expects that page->mapping is NULL for a page being
freed. SLAB and SLUB use the slab_cache field which is in union with
mapping, but before freeing the page, the field is referenced with the
"mapping" name when set to NULL.
It's IMHO more correct (albeit functionally the same) to use the
slab_cache name as that's the field we use in SL*B, and document why we
clear it in a comment (we don't clear fields such as s_mem or freelist, as
page allocator doesn't care about those). While using the 'mapping' name
would automagically keep the code correct if the unions in struct page
changed, such changes should be done consciously and needed changes
evaluated - the comment should help with that.
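The aliasing in question can be illustrated with a reduced stand-in. struct page_sketch and release_slab_page() are hypothetical and far simpler than the real struct page; the sketch only shows that clearing the field under one union member name also clears it under the other.

```c
#include <stddef.h>

/* Reduced stand-in for a page descriptor: two names for the same storage. */
struct page_sketch {
    union {
        void *mapping;    /* name the page allocator checks */
        void *slab_cache; /* name the slab allocators use */
    };
};

/* Clear the field under the slab-side name before handing the page back. */
void release_slab_page(struct page_sketch *page)
{
    /* The allocator expects ->mapping == NULL; slab_cache aliases it. */
    page->slab_cache = NULL;
}
```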
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Christian König <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Takashi Iwai <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When allocating an array of elements, users should check for
multiplication overflow or preferably use one of the provided helpers
like: kmalloc_array().
There's no krealloc_array() counterpart but there are many users who use
regular krealloc() to reallocate arrays. Let's provide an actual
krealloc_array() implementation.
While at it: add some documentation regarding krealloc.
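A user-space analogue of the idea, built on realloc(), might look like the sketch below. realloc_array_sketch() is a hypothetical name, the kernel version additionally takes allocation flags, and the overflow check uses the GCC/Clang builtin __builtin_mul_overflow.

```c
#include <stdlib.h>

/*
 * Resize p to hold new_n elements of new_size bytes each.
 * Returns NULL without touching p if new_n * new_size overflows.
 */
void *realloc_array_sketch(void *p, size_t new_n, size_t new_size)
{
    size_t bytes;

    if (__builtin_mul_overflow(new_n, new_size, &bytes))
        return NULL;

    return realloc(p, bytes);
}
```

A caller that previously wrote realloc(p, n * sizeof(*p)) would pass n and sizeof(*p) as separate arguments, so the multiplication is checked in one place.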
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "slab: provide and use krealloc_array()", v3.
Andy brought to my attention the fact that users allocating an array of
equally sized elements should check if the size multiplication doesn't
overflow. This is why we have helpers like kmalloc_array().
However, we don't have a krealloc_array() equivalent and there are many users
who do their own multiplication when calling krealloc() for arrays.
This series provides krealloc_array() and uses it in a couple places.
A separate series will follow adding devm_krealloc_array() which is needed
in the xilinx adc driver.
This patch (of 9):
__GFP_ZERO is ignored by krealloc() (unless we fall back to the kmalloc()
path, in which case it's honored). Point that out in the kerneldoc.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Christian König <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: James Morse <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
dump_unreclaimable_slab() acquires the slab_mutex first, and it won't
remove any slab_caches list entry when iterating the slab_caches list.
Thus we do not need list_for_each_entry_safe() here, which exists to
guard against removal of the list entry.
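The difference between the two iteration styles can be sketched with a minimal singly linked list. This is a user-space illustration, not the kernel's list.h; struct node and both helpers are hypothetical names.

```c
#include <stddef.h>

struct node {
	int val;
	struct node *next;
};

/* Plain walk: fine when no entry is removed during the iteration. */
static int sum_list(const struct node *head)
{
	int sum = 0;

	for (const struct node *n = head; n; n = n->next)
		sum += n->val;
	return sum;
}

/*
 * "Safe" walk: the successor is saved before the body runs, because the
 * body may unlink the current entry and clear its next pointer.
 */
static struct node *drop_negative(struct node *head)
{
	struct node **link = &head;
	struct node *n, *next;

	for (n = head; n; n = next) {
		next = n->next;		/* saved first: n may be unlinked */
		if (n->val < 0) {
			*link = next;	/* unlink n from the list */
			n->next = NULL;
		} else {
			link = &n->next;
		}
	}
	return head;
}
```

A read-only walk such as sum_list() has no need for the saved successor, which is the situation the commit describes.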
Link: https://lkml.kernel.org/r/20200926043440.GA180545@rlk
Signed-off-by: Hui Su <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are a few spelling mistakes in the Kconfig comments and help text.
Fix these.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Running stress-ng on ocfs2 completely fills the kernel log with 'max
lookup times reached, filesystem may have nested directories.'
Let's ratelimit this message as done with others in the code.
Test-case:
# mkfs.ocfs2 --mount local $DEV
# mount $DEV $MNT
# cd $MNT
# dmesg -C
# stress-ng --dirdeep 1 --dirdeep-ops 1000
# dmesg | grep -c 'max lookup times reached'
Before:
# dmesg -C
# stress-ng --dirdeep 1 --dirdeep-ops 1000
...
stress-ng: info: [11116] successful run completed in 3.03s
# dmesg | grep -c 'max lookup times reached'
967
After:
# dmesg -C
# stress-ng --dirdeep 1 --dirdeep-ops 1000
...
stress-ng: info: [739] successful run completed in 0.96s
# dmesg | grep -c 'max lookup times reached'
10
# dmesg
[ 259.086086] ocfs2_check_if_ancestor: 1990 callbacks suppressed
[ 259.086092] (stress-ng-dirde,740,1):ocfs2_check_if_ancestor:1091 max lookup times reached, filesystem may have nested directories, src inode: 18007, dest inode: 17940.
...
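The effect of rate limiting on the message count can be sketched as below. This is a simplified user-space model with an explicit clock argument, not the kernel's ratelimit implementation; struct ratelimit and ratelimit_ok() are hypothetical names, and the interval and burst values are left to the caller.

```c
/* Simplified model: at most 'burst' messages per 'interval' time units. */
struct ratelimit {
	long interval;	/* length of one window */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages let through in this window */
	int missed;	/* messages suppressed in this window */
};

/* Returns 1 if the caller may print now, 0 if the message is suppressed. */
static int ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	rs->missed++;
	return 0;
}
```

With a burst of 10, a thousand calls inside one window let 10 messages through and count 990 as suppressed, which is the same shape as the "callbacks suppressed" line in the log above.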
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A break is not needed if it is preceded by a goto.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tom Rix <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This variable isn't used anymore, remove it to silence the W=1 warning:
fs/ntfs/inode.c:2350:6: warning: variable `attr_len' set but not used [-Wunused-but-set-variable]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Anton Altaparmakov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We actually don't use these variables, so remove them to avoid gcc warnings:
fs/ntfs/file.c:326:14: warning: variable `base_ni' set but not used [-Wunused-but-set-variable]
fs/ntfs/logfile.c:481:21: warning: variable `log_page_mask' set but not used [-Wunused-but-set-variable]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Anton Altaparmakov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In the discussion about preempt count consistency across kernel
configurations:
https://lore.kernel.org/r/[email protected]/
it was concluded that the usage of in_interrupt() and related context
checks should be removed from non-core code.
Both BUG_ON()s in ide-probe.c were introduced in commit
4015c949fb465 ("[PATCH] update ide core")
when ide_unregister() was extended with semaphore based locking. Neither
check complains about disabled preemption, which is also wrong.
The might_sleep() in today's mutex_lock() will complain about such
misuse.
Remove the BUG_ON() statements.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
falconide_get_lock() is called by ide_lock_host() and its caller
(ide_issue_rq()) already has a might_sleep() check.
stdma_lock() has wait_event() which also has a might_sleep() check.
Remove the in_interrupt() check.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
and include <linux/const.h> in UAPI headers instead of <linux/kernel.h>.
The reason is to avoid indirect <linux/sysinfo.h> include when using
some network headers: <linux/netlink.h> or others -> <linux/kernel.h>
-> <linux/sysinfo.h>.
This indirect include causes a redefinition of struct sysinfo on musl when
both <sys/sysinfo.h> and some of the UAPI headers are included:
In file included from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/kernel.h:5,
from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/netlink.h:5,
from ../include/tst_netlink.h:14,
from tst_crypto.c:13:
x86_64-buildroot-linux-musl/sysroot/usr/include/linux/sysinfo.h:8:8: error: redefinition of `struct sysinfo'
struct sysinfo {
^~~~~~~
In file included from ../include/tst_safe_macros.h:15,
from ../include/tst_test.h:93,
from tst_crypto.c:11:
x86_64-buildroot-linux-musl/sysroot/usr/include/sys/sysinfo.h:10:8: note: originally defined here
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Vorel <[email protected]>
Suggested-by: Rich Felker <[email protected]>
Acked-by: Rich Felker <[email protected]>
Cc: Peter Korsgaard <[email protected]>
Cc: Baruch Siach <[email protected]>
Cc: Florian Weimer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kthread worker API is simple. In short, it allows creating, using, and
destroying workers. kthread_create_worker_on_cpu() just allows binding a
newly created worker to a given CPU.
It is up to the API user how to handle CPU hotplug. They have to decide
how to handle pending work items, prevent queuing new ones, and restore
the functionality when the CPU goes off and on. There are a few catches:
+ The CPU affinity gets lost when it is scheduled on an offline CPU.
+ The worker might not exist when the CPU was off when the user
created the workers.
A good practice is to implement two CPU hotplug callbacks and
destroy/create the worker when the CPU goes down/up.
Mention this in the function description.
[[email protected]: grammar tweaks]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Reported-by: Zhang Qiang <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
While migrating some code from wq to kthread_worker, I found that I missed
the execute_start/end tracepoints. So add similar tracepoints for
kthread_work. And for completeness, queue_work tracepoint (although this
one differs slightly from the matching workqueue tracepoint).
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rob Clark <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Phil Auld <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Thara Gopinath <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ilias Stamatis <[email protected]>
Cc: Liang Chen <[email protected]>
Cc: Ben Dooks <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates for 5.11 from Marc Zyngier:
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxied by another device
- Remove the fasteoi IPI flow which has been proved useless
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups
Link: https://lore.kernel.org/r/[email protected]
|
|
The || condition in hdev->fd_active_type != HCLGE_FD_ARFS_ACTIVE ||
hdev->fd_active_type != HCLGE_FD_RULE_NONE will always be true because
hdev->fd_active_type cannot be equal to two different values at the same
time. The expression is always true which is not correct. Fix this by
replacing || with && to correct the logic in the expression.
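The logic error can be reproduced in isolation. The sketch below uses a hypothetical enum in place of the driver's fd_active_type values and compares the two forms of the condition.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's fd_active_type values. */
enum fd_type { FD_RULE_NONE, FD_ARFS_ACTIVE, FD_TC_FLOWER_ACTIVE };

/* Buggy form: true for every input, since t cannot equal both values. */
static bool other_type_active_buggy(enum fd_type t)
{
	return t != FD_ARFS_ACTIVE || t != FD_RULE_NONE;
}

/* Fixed form: true only when t is neither of the two values. */
static bool other_type_active_fixed(enum fd_type t)
{
	return t != FD_ARFS_ACTIVE && t != FD_RULE_NONE;
}
```

Only the && form distinguishes the third type from the two values named in the condition.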
Addresses-Coverity: ("Constant expression result")
Fixes: 0205ec041ec6 ("net: hns3: add support for hw tc offload of tc flower")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Huazhong Tan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
proc_fs was used, in af_packet, without a surrounding #ifdef,
although there is no hard dependency on proc_fs.
That caused the initialization of the af_packet module to fail
when CONFIG_PROC_FS=n.
Specifically, proc_create_net() was used in af_packet.c,
and when it fails, packet_net_init() returns -ENOMEM.
It will always fail when the kernel is compiled without proc_fs,
because proc_create_net(), for example, always returns NULL.
The calling order that starts in af_packet.c is as follows:
packet_init()
register_pernet_subsys()
register_pernet_operations()
__register_pernet_operations()
ops_init()
ops->init() (packet_net_ops.init=packet_net_init())
proc_create_net()
It worked in the past because register_pernet_subsys()'s return value
wasn't checked before commit 36096f2f4fa0 ("packet: Fix error path in
packet_init.").
It always returned an error, but was not checked before, so everything
was working even when CONFIG_PROC_FS=n.
The fix here is simply to add the necessary #ifdef.
This also fixes a similar error in tls_proc.c, that was found by Jakub
Kicinski.
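The failure mode and the shape of the fix can be modelled in user space. In this sketch CONFIG_PROC_FS is deliberately left undefined to stand for a CONFIG_PROC_FS=n build; create_entry() and both init functions are hypothetical stand-ins, not the af_packet code.

```c
#include <errno.h>
#include <stddef.h>

/* CONFIG_PROC_FS is left undefined here to model CONFIG_PROC_FS=n. */

struct proc_entry {
	int unused;
};

#ifdef CONFIG_PROC_FS
static struct proc_entry the_entry;
static struct proc_entry *create_entry(void) { return &the_entry; }
#else
/* Without procfs the creation helper is a stub that always returns NULL. */
static struct proc_entry *create_entry(void) { return NULL; }
#endif

/* Unguarded init: treats the stub's NULL as an allocation failure. */
static int net_init_unguarded(void)
{
	if (!create_entry())
		return -ENOMEM;
	return 0;
}

/* Guarded init: creates and checks the entry only when procfs exists. */
static int net_init_guarded(void)
{
#ifdef CONFIG_PROC_FS
	if (!create_entry())
		return -ENOMEM;
#endif
	return 0;
}
```

In this model the unguarded variant always reports -ENOMEM, while the guarded variant succeeds because the proc entry is simply skipped.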
Fixes: d26b698dd3cd ("net/tls: add skeleton of MIB statistics")
Fixes: 36096f2f4fa0 ("packet: Fix error path in packet_init")
Signed-off-by: Yonatan Linik <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Andra Paraschiv says:
====================
vsock: Add flags field in the vsock address
vsock enables communication between virtual machines and the host they are
running on. Nested VMs can be set up to use vsock channels, as multi-transport
support has been available in mainline since the v5.5 Linux kernel.
Implicitly, if no host->guest vsock transport is loaded, all the vsock packets
are forwarded to the host. This behavior can be used to set up communication
channels between sibling VMs that are running on the same host. One example can
be the vsock channels that can be established within AWS Nitro Enclaves
(see Documentation/virt/ne_overview.rst).
To be able to explicitly mark a connection as being used for a certain use case,
add a flags field in the vsock address data structure. The value of the flags
field is taken into consideration when the vsock transport is assigned. This
way, one can distinguish between different use cases, such as nested VMs /
local communication and sibling VMs.
The flags field can be set in the user space application connect logic. On the
listen path, the field can be set in the kernel space logic.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The vsock flags field can be set in the connect path (user space app)
and the (listen) receive path (kernel space logic).
When the vsock transport is assigned, the remote CID is used to
distinguish between types of connection.
Use the vsock flags value (in addition to the CID) from the remote
address to decide which vsock transport to assign. For the sibling VMs
use case, all the vsock packets need to be forwarded to the host, so
always assign the guest->host transport if the VMADDR_FLAG_TO_HOST flag
is set. For the other use cases, the vsock transport assignment logic is
not changed.
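The assignment rule described above can be sketched as a pure function. This is a simplified model that ignores the local (loopback) transport and socket types; pick_transport() and the enum are hypothetical names, and the constant values are assumed to match the uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

#define VMADDR_CID_HOST      2u		/* well-known CID of the host */
#define VMADDR_FLAG_TO_HOST  0x01u	/* value assumed from the uapi header */

enum transport { TRANSPORT_G2H, TRANSPORT_H2G };

/*
 * Choose guest->host when the remote CID is the host (or lower), when no
 * host->guest transport is loaded, or when the flag asks for forwarding
 * to the host. Otherwise choose host->guest.
 */
static enum transport pick_transport(uint32_t remote_cid, uint8_t remote_flags,
				     bool h2g_loaded)
{
	if (remote_cid <= VMADDR_CID_HOST || !h2g_loaded ||
	    (remote_flags & VMADDR_FLAG_TO_HOST))
		return TRANSPORT_G2H;
	return TRANSPORT_H2G;
}
```

The flag is tested with a bitwise AND and the result is used directly as a truth value, in line with the v2 -> v3 changelog entry.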
Changelog
v3 -> v4
* Update the "remote_flags" local variable type to reflect the change of
the "svm_flags" field to be 1 byte in size.
v2 -> v3
* Update bitwise check logic to not compare result to the flag value.
v1 -> v2
* Use bitwise operator to check the vsock flag.
* Use the updated "VMADDR_FLAG_TO_HOST" flag naming.
* Merge the checks for the g2h transport assignment in one "if" block.
Signed-off-by: Andra Paraschiv <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The vsock flags can be set during the connect() setup logic, when
initializing the vsock address data structure variable. Then the vsock
transport is assigned, also considering this flags field.
The vsock transport is also assigned on the (listen) receive path. The
flags field needs to be set considering the use case.
Set the value of the vsock flags of the remote address to the one
targeted for packets forwarding to the host, if the following conditions
are met:
* The source CID of the packet is higher than VMADDR_CID_HOST.
* The destination CID of the packet is higher than VMADDR_CID_HOST.
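The two conditions above can be sketched as follows. remote_flags_on_receive() is a hypothetical helper that returns the updated flags value, and the constant values are assumed to match the uapi header.

```c
#include <stdint.h>

#define VMADDR_CID_HOST      2u		/* well-known CID of the host */
#define VMADDR_FLAG_TO_HOST  0x01u	/* value assumed from the uapi header */

/*
 * On the receive path, mark the remote address for forwarding to the host
 * only when both the source and the destination CID are above the host CID.
 */
static uint8_t remote_flags_on_receive(uint32_t src_cid, uint32_t dst_cid,
				       uint8_t flags)
{
	if (src_cid > VMADDR_CID_HOST && dst_cid > VMADDR_CID_HOST)
		flags |= VMADDR_FLAG_TO_HOST;
	return flags;
}
```

If either CID is at or below VMADDR_CID_HOST, the flags are returned unchanged.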
Changelog
v3 -> v4
* No changes.
v2 -> v3
* No changes.
v1 -> v2
* Set the vsock flag on the receive path in the vsock transport
assignment logic.
* Use bitwise operator for the vsock flag setup.
* Use the updated "VMADDR_FLAG_TO_HOST" flag naming.
Signed-off-by: Andra Paraschiv <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Check if the provided flags value from the vsock address data structure
includes the supported flags in the corresponding kernel version.
The first byte of the "svm_zero" field is used as "svm_flags", so add
the flags check instead.
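A flags check of this kind can be sketched as rejecting any bit outside the supported set. check_vsock_flags() is a hypothetical name, and the sketch assumes VMADDR_FLAG_TO_HOST is the only supported flag.

```c
#include <errno.h>
#include <stdint.h>

#define VMADDR_FLAG_TO_HOST  0x01u	/* value assumed from the uapi header */

/* Return 0 if only supported flag bits are set, -EINVAL otherwise. */
static int check_vsock_flags(uint8_t svm_flags)
{
	const uint8_t valid = VMADDR_FLAG_TO_HOST;

	/* Any bit set outside the supported mask makes the address invalid. */
	if (svm_flags & (uint8_t)~valid)
		return -EINVAL;
	return 0;
}
```

Since the byte was previously part of the zeroed "svm_zero" padding, an address with all flags clear still passes this check.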
Changelog
v3 -> v4
* New patch in v4.
Signed-off-by: Andra Paraschiv <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add the VMADDR_FLAG_TO_HOST vsock flag, which is used to set up a vsock
connection where all the packets are forwarded to the host.
Then, using this type of vsock channel, vsock communication between
sibling VMs can be built on top of it.
Changelog
v3 -> v4
* Update the "VMADDR_FLAG_TO_HOST" value, as the size of the field has
been updated to 1 byte.
v2 -> v3
* Update comments to mention when the flag is set in the connect and
listen paths.
v1 -> v2
* New patch in v2, it was split from the first patch in the series.
* Remove the default value for the vsock flags field.
* Update the naming for the vsock flag to "VMADDR_FLAG_TO_HOST".
Signed-off-by: Andra Paraschiv <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
vsock enables communication between virtual machines and the host they
are running on. With the multi transport support (guest->host and
host->guest), nested VMs can also use vsock channels for communication.
In addition to this, by default, all the vsock packets are forwarded to
the host, if no host->guest transport is loaded. This behavior can be
implicitly used for enabling vsock communication between sibling VMs.
Add a flags field in the vsock address data structure that can be used
to explicitly mark the vsock connection as being targeted for a certain
type of communication. This way, one can distinguish between different use
cases such as nested VMs and sibling VMs.
This field can be set when initializing the vsock address variable used
for the connect() call.
Changelog
v3 -> v4
* Update the size of "svm_flags" field to be 1 byte instead of 2 bytes.
v2 -> v3
* Add "svm_flags" as a new field, not reusing "svm_reserved1".
v1 -> v2
* Update the field name to "svm_flags".
* Split the current patch in 2 patches.
Signed-off-by: Andra Paraschiv <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
With NETIF_F_HW_TLS_TX packets are encrypted in HW. This cannot be
logically done when HW_CSUM offload is off.
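The dependency can be expressed as a feature fixup. In the sketch below the bit positions are hypothetical and do not match the kernel's NETIF_F_* values; fix_features() is likewise an illustrative name.

```c
#include <stdint.h>

/* Hypothetical bit positions; the kernel's NETIF_F_* values differ. */
#define F_HW_CSUM    (1ull << 0)
#define F_HW_TLS_TX  (1ull << 1)

/* Drop TLS TX offload whenever checksum offload is not enabled. */
static uint64_t fix_features(uint64_t features)
{
	if ((features & F_HW_TLS_TX) && !(features & F_HW_CSUM))
		features &= ~F_HW_TLS_TX;
	return features;
}
```

Feature sets that enable both offloads, or neither, pass through unchanged.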
Fixes: 2342a8512a1e ("net: Add TLS TX offload features")
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There are cases where a fastopen SYN may trigger either an ICMP_TOOBIG
message in the case of IPv6 or a fragmentation request in the case of
IPv4. This results in the socket stalling for a second or more as it does
not respond to the message by retransmitting the SYN frame.
Normally a SYN frame should not be able to trigger an ICMP_TOOBIG or
ICMP_FRAG_NEEDED; however, in the case of fastopen we can have a frame that
makes use of the entire MSS. An additional complication in the fastopen
case is that the retransmit queue doesn't contain the
original frames. As a result when tcp_simple_retransmit is called and
walks the list of frames in the queue it may not mark the frames as lost
because both the SYN and the data packet each individually are smaller than
the MSS size after the adjustment. This results in the socket being stalled
until the retransmit timer kicks in and forces the SYN frame out again
without the data attached.
In order to resolve this we can reduce the MSS the packets are compared
to in tcp_simple_retransmit to -1 for cases where we are still in the
TCP_SYN_SENT state for a fastopen socket. Doing this we will mark all of
the packets related to the fastopen SYN as lost.
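The effect of comparing against -1 can be sketched with a plain array of segment lengths standing in for the retransmit queue. count_lost() is a hypothetical helper, not tcp_simple_retransmit() itself.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Count the segments that would be marked lost. A segment is lost when its
 * length exceeds the MSS it is compared to. For a fastopen socket still in
 * SYN_SENT the comparison value is -1, so every segment qualifies, even a
 * SYN that carries no payload.
 */
static size_t count_lost(const int *seg_len, size_t n, int current_mss,
			 bool fastopen_syn_sent)
{
	int mss = fastopen_syn_sent ? -1 : current_mss;
	size_t lost = 0;

	for (size_t i = 0; i < n; i++)
		if (seg_len[i] > mss)
			lost++;
	return lost;
}
```

With a zero-length SYN and a data segment that exactly fills the MSS, the normal comparison marks nothing as lost, while the -1 comparison marks both.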
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Link: https://lore.kernel.org/r/160780498125.3272.15437756269539236825.stgit@localhost.localdomain
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
context
Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has
a very nice ocelot_mact_wait_for_completion at the end. Introduced in
commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be
wall time not attempts"), this function uses readx_poll_timeout which
triggers a lot of lockdep warnings and is also dangerous to use from
atomic context, potentially leading to lockups and panics.
Steen Hegelund added a poll timeout of 100 ms for checking the MAC
table, a duration which is clearly absurd to poll in atomic context.
So we need to defer the MAC table access to process context, which we do
via a dynamically allocated work item which contains all there is to
know about the MAC table operation it has to do.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|