aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-12-15selftests/vm: gup_test: introduce the dump_pages() sub-testJohn Hubbard4-4/+113
For quite a while, I was doing a quick hack to gup_test.c (previously, gup_benchmark.c) whenever I wanted to try out my changes to dump_page(). This makes that hack unnecessary, and instead allows anyone to easily get the same coverage from a user space program. That saves a lot of time because you don't have to change the kernel, in order to test different pages and options. The new sub-test takes advantage of the existing gup_test infrastructure, which already provides a simple user space program, some allocated user space pages, an ioctl call, pinning of those pages (via either get_user_pages or pin_user_pages) and a corresponding kernel-side test invocation. There's not much more required, mainly just a couple of inputs from the user. In fact, the new test re-uses the existing command line options in order to get various helpful combinations (THP or normal, _fast or slow gup, gup vs. pup, and more). New command line options are: which pages to dump, and what type of "get/pin" to use. In order to figure out which pages to dump, the logic is: * If the user doesn't specify anything, the page 0 (the first page in the address range that the program sets up for testing) is dumped. * Or, the user can type up to 8 page indices anywhere on the command line. If you type more than 8, then it uses the first 8 and ignores the remaining items. For example: ./gup_test -ct -F 1 0 19 0x1000 Meaning: -c: dump pages sub-test -t: use THP pages -F 1: use pin_user_pages() instead of get_user_pages() 0 19 0x1000: dump pages 0, 19, and 4096 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15selftests/vm: only some gup_test items are really benchmarksJohn Hubbard4-20/+51
Therefore, some minor cleanup and improvements are in order: 1. Rename the other items appropriately. 2. Stop reporting timing information on the non-benchmark items. It's still being recorded and is available, but there's no point in cluttering up the report with data that no one reasonably needs to check. 3. Don't do iterations, for non-benchmark items. 4. Print out a shorter, more appropriate report for the non-benchmark tests. 5. Add the command that was run, to the report. This really helps, as there are quite a lot of options now. 6. Use a larger integer type for cmd, now that it's being compared Otherwise it doesn't work, because in this case cmd is about 3 billion, which is the perfect size for problems with signed vs unsigned int. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15selftests/vm: minor cleanup: Makefile and gup_test.cJohn Hubbard2-7/+4
A few cleanups that don't deserve separate patches, but that also should not clutter up other functional changes: 1. Remove an unnecessary #include <prctl.h> 2. Restore the sorted order of TEST_GEN_FILES. 3. Add -lpthread to the common LDLIBS, as it is harmless and several tests use it. This gets rid of one special rule already. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15selftests/vm: rename run_vmtests --> run_vmtests.shJohn Hubbard1-1/+1
Rename to *.sh, in order to match the conventions of all of the other items in selftest/vm. The only reason not to use a .sh suffix a shell script like this, might be to make it look more like a normal program, but that's not an issue here. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15selftests/vm: use a common gup_test.hJohn Hubbard4-37/+26
Avoid the need to copy-paste the gup_test ioctl commands and the struct gup_test definition, between the kernel and the user space application, by providing a new header file for these. This allows easier and safer adding of new ioctl calls, as well as reducing the overall line count. Details: The header file has to be able to compile independently, because of the arguably unfortunate way that the Makefile is written: the Makefile tries to build all of its prerequisites, when really it should be only building the .c files, and leaving the other prerequisites (LOCAL_HDRS) as pure dependencies. That Makefile limitation is probably not worth fixing, but it explains why one of the includes had to be moved into the new header file. Also: simplify the ioctl struct (struct gup_test), by deleting the unused __expansion[10] field. This sort of thing is what you might see in a stable ABI, but this low-level, kernel-developer-oriented selftests/vm system is very much not subject to ABI stability. So "expansion" and "reserved" fields are unnecessary here. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm/gup_benchmark: rename to mm/gup_testJohn Hubbard11-44/+49
Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v3. Summary: This series provides two main things, and a number of smaller supporting goodies. The two main points are: 1) Add a new sub-test to gup_test, which in turn is a renamed version of gup_benchmark. This sub-test allows nicer testing of dump_pages(), at least on user-space pages. For quite a while, I was doing a quick hack to gup_test.c whenever I wanted to try out changes to dump_page(). Then Matthew Wilcox asked me what I meant when I said "I used my dump_page() unit test", and I realized that it might be nice to check in a polished up version of that. Details about how it works and how to use it are in the commit description for patch #6 ("selftests/vm: gup_test: introduce the dump_pages() sub-test"). 2) Fixes a limitation of hmm-tests: these tests are incredibly useful, but only if people actually build and run them. And it turns out that libhugetlbfs is a little too effective at throwing a wrench in the works, there. So I've added a little configuration check that removes just two of the 21 hmm-tests, if libhugetlbfs is not available. Further details in the commit description of patch #8 ("selftests/vm: hmm-tests: remove the libhugetlbfs dependency"). Other smaller things that this series does: a) Remove code duplication by creating gup_test.h. b) Clear up the sub-test organization, and their invocation within run_vmtests.sh. c) Other minor assorted improvements. [1] v2 is here: https://lore.kernel.org/linux-doc/[email protected]/ [2] https://lore.kernel.org/r/CAHk-=wgh-TMPHLY3jueHX7Y2fWh3D+nMBqVS__AZm6-oorquWA@mail.gmail.com This patch (of 9): Rename nearly every "gup_benchmark" reference and file name to "gup_test". The one exception is for the actual gup benchmark test itself. The current code already does a *little* bit more than benchmarking, and definitely covers more than get_user_pages_fast(). More importantly, however, subsequent patches are about to add some functionality that is non-benchmark related. Closely related changes: * Kconfig: in addition to renaming the options from GUP_BENCHMARK to GUP_TEST, update the help text to reflect that it's no longer a benchmark-only test. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm/filemap.c: remove else after a returnHailong Liu1-11/+12
The `else' is not useful after a `return' in __lock_page_or_retry(). [[email protected]: coding style fixes] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hailong Liu<[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm/truncate: add parameter explanation for invalidate_mapping_pagevecAlex Shi1-3/+9
To fix a kernel-doc markups issue: mm/truncate.c:646: warning: Function parameter or member 'mapping' not described in 'invalidate_mapping_pagevec' mm/truncate.c:646: warning: Function parameter or member 'start' not described in 'invalidate_mapping_pagevec' mm/truncate.c:646: warning: Function parameter or member 'end' not described in 'invalidate_mapping_pagevec' mm/truncate.c:646: warning: Function parameter or member 'nr_pagevec' not described in 'invalidate_mapping_pagevec' Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Shi <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contigKent Overstreet1-138/+175
Convert generic_file_buffered_read() to get pages to read from in batches, and then copy data to userspace from many pages at once - in particular, we now don't touch any cachelines that might be contended while we're in the loop to copy data to userspace. This is is a performance improvement on workloads that do buffered reads with large blocksizes, and a very large performance improvement if that file is also being accessed concurrently by different threads. On smaller reads (512 bytes), there's a very small performance improvement (1%, within the margin of error). akpm: kernel test robot found a 32% speedup on one test: https://lkml.kernel.org/r/20201030081456.GY31092@shao2-debian Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kent Overstreet <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: kernel test robot <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm/filemap/c: break generic_file_buffered_read up into multiple functionsKent Overstreet1-222/+261
Patch series "generic_file_buffered_read() improvements", v2. generic_file_buffered_read() has turned into a real monstrosity to work with. And it's a major performance improvement, for both small random and large sequential reads. On my test box, 4k buffered random reads go from ~150k to ~250k iops, and the improvements to big sequential reads are even bigger. This incorporates the fix for IOCB_WAITQ handling that Jens just posted as well, also factors out lock_page_for_iocb() to improve handling of the various iocb flags. This patch (of 2): This is prep work for changing generic_file_buffered_read() to use find_get_pages_contig() to batch up all the pagecache lookups. This patch should be functionally identical to the existing code and changes as little as of the flow control as possible. More refactoring could be done, this patch is intended to be relatively minimal. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kent Overstreet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm/page_owner: record timestamp and pidLiam Mark2-10/+19
Collect the time for each allocation recorded in page owner so that allocation "surges" can be measured. Record the pid for each allocation recorded in page owner so that the source of allocation "surges" can be better identified. The above is very useful when doing memory analysis. On a crash for example, we can get this information from kdump (or ramdump) and parse it to figure out memory allocation problems. Please note that on x86_64 this increases the size of struct page_owner from 16 bytes to 32. Vlastimil: it's not a functionality intended for production, so unless somebody says they need to enable page_owner for debugging and this increase prevents them from fitting into available memory, let's not complicate things with making this optional. [[email protected]: v3] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam Mark <[email protected]> Signed-off-by: Georgi Djakov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm: fix page_owner initializing issue for arm32Zhenhua Huang3-2/+18
Page owner of pages used by page owner itself used is missing on arm32 targets. The reason is dummy_handle and failure_handle is not initialized correctly. Buddy allocator is used to initialize these two handles. However, buddy allocator is not ready when page owner calls it. This change fixed that by initializing page owner after buddy initialization. The working flow before and after this change are: original logic: 1. allocated memory for page_ext(using memblock). 2. invoke the init callback of page_ext_ops like page_owner(using buddy allocator). 3. initialize buddy. after this change: 1. allocated memory for page_ext(using memblock). 2. initialize buddy. 3. invoke the init callback of page_ext_ops like page_owner(using buddy allocator). with the change, failure/dummy_handle can get its correct value and page owner output for example has the one for page owner itself: Page allocated via order 2, mask 0x6202c0(GFP_USER|__GFP_NOWARN), pid 1006, ts 67278156558 ns PFN 543776 type Unmovable Block 531 type Unmovable Flags 0x0() init_page_owner+0x28/0x2f8 invoke_init_callbacks_flatmem+0x24/0x34 start_kernel+0x33c/0x5d8 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhenhua Huang <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15device-dax/kmem: use struct_size()Dan Williams1-1/+1
Linus notes the kernel has had a nice helper for the 'size of struct with variable array member at the end' operation for a couple years now, use it. Link: http://lore.kernel.org/r/CAHk-=wgNTLbvAD8mNTvh+GQyapNWeX20PXhU_+frqEvVq4298w@mail.gmail.com Link: https://lkml.kernel.org/r/160288261564.3242821.6055291930923876456.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm/slub: let number of online CPUs determine the slub page orderBharata B Rao1-1/+1
The page order of the slab that gets chosen for a given slab cache depends on the number of objects that can be fit in the slab while meeting other requirements. We start with a value of minimum objects based on nr_cpu_ids that is driven by possible number of CPUs and hence could be higher than the actual number of CPUs present in the system. This leads to calculate_order() chosing a page order that is on the higher side leading to increased slab memory consumption on systems that have bigger page sizes. Hence rely on the number of online CPUs when determining the mininum objects, thereby increasing the chances of chosing a lower conservative page order for the slab. Vlastimil said: "Ideally, we would react to hotplug events and update existing caches accordingly. But for that, recalculation of order for existing caches would have to be made safe, while not affecting hot paths. We have removed the sysfs interface with 32a6f409b693 ("mm, slub: remove runtime allocation order changes") as it didn't seem easy and worth the trouble. In case somebody wants to start with a large order right from the boot because they know they will hotplug lots of cpus later, they can use slub_min_objects= boot param to override this heuristic. So in case this change regresses somebody's performance, there's a way around it and thus the risk is low IMHO" Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bharata B Rao <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm, slub: use kmem_cache_debug_flags() in deactivate_slab()Vlastimil Babka1-3/+1
Commit 9cf7a1118365 ("mm/slub: make add_full() condition more explicit") replaced an unnecessarily generic kmem_cache_debug(s) check with an explicit check of SLAB_STORE_USER and #ifdef CONFIG_SLUB_DEBUG. We can achieve the same specific check with the recently added kmem_cache_debug_flags() which removes the #ifdef and restores the no-branch-overhead benefit of static key check when slub debugging is not enabled. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Cc: Abel Wu <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Liu Xiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm/slab: rerform init_on_free earlierAlexander Popov1-2/+3
Currently in CONFIG_SLAB init_on_free happens too late, and heap objects go to the heap quarantine not being erased. Lets move init_on_free clearing before calling kasan_slab_free(). In that case heap quarantine will store erased objects, similarly to CONFIG_SLUB=y behavior. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Popov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm, slab, slub: clear the slab_cache field when freeing pageVlastimil Babka2-3/+4
The page allocator expects that page->mapping is NULL for a page being freed. SLAB and SLUB use the slab_cache field which is in union with mapping, but before freeing the page, the field is referenced with the "mapping" name when set to NULL. It's IMHO more correct (albeit functionally the same) to use the slab_cache name as that's the field we use in SL*B, and document why we clear it in a comment (we don't clear fields such as s_mem or freelist, as page allocator doesn't care about those). While using the 'mapping' name would automagically keep the code correct if the unions in struct page changed, such changes should be done consciously and needed changes evaluated - the comment should help with that. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15dma-buf: use krealloc_array()Bartosz Golaszewski1-2/+1
Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Acked-by: Christian König <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gustavo Padovan <[email protected]> Cc: James Morse <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Jason Wang <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: "Michael S . Tsirkin" <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15hwtracing: intel: use krealloc_array()Bartosz Golaszewski1-1/+1
Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Knig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gustavo Padovan <[email protected]> Cc: James Morse <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Jason Wang <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: "Michael S . Tsirkin" <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15drm: atomic: use krealloc_array()Bartosz Golaszewski1-1/+2
Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Knig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gustavo Padovan <[email protected]> Cc: James Morse <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Jason Wang <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: "Michael S . Tsirkin" <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15edac: ghes: use krealloc_array()Bartosz Golaszewski1-2/+2
Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Acked-by: Borislav Petkov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Knig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gustavo Padovan <[email protected]> Cc: James Morse <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Jason Wang <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: "Michael S . Tsirkin" <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15pinctrl: use krealloc_array()Bartosz Golaszewski1-1/+1
Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Knig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gustavo Padovan <[email protected]> Cc: James Morse <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Jason Wang <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: "Michael S . Tsirkin" <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15vhost: vringh: use krealloc_array()Bartosz Golaszewski1-1/+2
Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Knig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gustavo Padovan <[email protected]> Cc: James Morse <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Jason Wang <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15ALSA: pcm: use krealloc_array()Bartosz Golaszewski1-2/+2
Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Reviewed-by: Takashi Iwai <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Knig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gustavo Padovan <[email protected]> Cc: James Morse <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Jason Wang <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: "Michael S . Tsirkin" <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm: slab: provide krealloc_array()Bartosz Golaszewski2-0/+22
When allocating an array of elements, users should check for multiplication overflow or preferably use one of the provided helpers like: kmalloc_array(). There's no krealloc_array() counterpart but there are many users who use regular krealloc() to reallocate arrays. Let's provide an actual krealloc_array() implementation. While at it: add some documentation regarding krealloc. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christian Knig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: David Rientjes <[email protected]> Cc: Gustavo Padovan <[email protected]> Cc: James Morse <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Jason Wang <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: "Michael S . Tsirkin" <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm: slab: clarify krealloc()'s behavior with __GFP_ZEROBartosz Golaszewski1-3/+3
Patch series "slab: provide and use krealloc_array()", v3. Andy brought to my attention the fact that users allocating an array of equally sized elements should check if the size multiplication doesn't overflow. This is why we have helpers like kmalloc_array(). However we don't have krealloc_array() equivalent and there are many users who do their own multiplication when calling krealloc() for arrays. This series provides krealloc_array() and uses it in a couple places. A separate series will follow adding devm_krealloc_array() which is needed in the xilinx adc driver. This patch (of 9): __GFP_ZERO is ignored by krealloc() (unless we fall-back to kmalloc() path, in which case it's honored). Point that out in the kerneldoc. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: Gustavo Padovan <[email protected]> Cc: Christian Knig <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Tony Luck <[email protected]> Cc: James Morse <[email protected]> Cc: Robert Richter <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Linus Walleij <[email protected]> Cc: "Michael S . Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Jaroslav Kysela <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15mm/slab_common.c: use list_for_each_entry in dump_unreclaimable_slab()Hui Su1-2/+2
dump_unreclaimable_slab() acquires the slab_mutex first, and it won't remove any slab_caches list entry when itering the slab_caches lists. Thus we do not need list_for_each_entry_safe here, which is against removal of list entry. Link: https://lkml.kernel.org/r/20200926043440.GA180545@rlk Signed-off-by: Hui Su <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15arch/Kconfig: fix spelling mistakesColin Ian King1-4/+4
There are a few spelling mistakes in the Kconfig comments and help text. Fix these. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15ocfs2: ratelimit the 'max lookup times reached' noticeMauricio Faria de Oliveira1-2/+2
Running stress-ng on ocfs2 completely fills the kernel log with 'max lookup times reached, filesystem may have nested directories.' Let's ratelimit this message as done with others in the code. Test-case: # mkfs.ocfs2 --mount local $DEV # mount $DEV $MNT # cd $MNT # dmesg -C # stress-ng --dirdeep 1 --dirdeep-ops 1000 # dmesg | grep -c 'max lookup times reached' Before: # dmesg -C # stress-ng --dirdeep 1 --dirdeep-ops 1000 ... stress-ng: info: [11116] successful run completed in 3.03s # dmesg | grep -c 'max lookup times reached' 967 After: # dmesg -C # stress-ng --dirdeep 1 --dirdeep-ops 1000 ... stress-ng: info: [739] successful run completed in 0.96s # dmesg | grep -c 'max lookup times reached' 10 # dmesg [ 259.086086] ocfs2_check_if_ancestor: 1990 callbacks suppressed [ 259.086092] (stress-ng-dirde,740,1):ocfs2_check_if_ancestor:1091 max lookup times reached, filesystem may have nested directories, src inode: 18007, dest inode: 17940. ... Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15fs/ocfs2/cluster/tcp.c: remove unneeded breakTom Rix1-1/+0
A break is not needed if it is preceded by a goto Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tom Rix <[email protected]> Acked-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15fs/ntfs: remove unused variable attr_lenAlex Shi1-2/+0
This variable isn't used anymore, remove it to skip W=1 warning: fs/ntfs/inode.c:2350:6: warning: variable `attr_len' set but not used [-Wunused-but-set-variable] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Shi <[email protected]> Acked-by: Anton Altaparmakov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15fs/ntfs: remove unused variblesAlex Shi2-6/+2
We actually don't use these varibles, so remove them to avoid gcc warning: fs/ntfs/file.c:326:14: warning: variable `base_ni' set but not used [-Wunused-but-set-variable] fs/ntfs/logfile.c:481:21: warning: variable `log_page_mask' set but not used [-Wunused-but-set-variable] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Shi <[email protected]> Acked-by: Anton Altaparmakov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15ide: remove BUG_ON(in_interrupt() || irqs_disabled()) from ide_unregister()Sebastian Andrzej Siewior1-3/+0
In the discussion about preempt count consistency across kernel configurations: https://lore.kernel.org/r/[email protected]/ it was concluded that the usage of in_interrupt() and related context checks should be removed from non-core code. Both BUG_ON()s in ide-probe.c were introduced in commit 4015c949fb465 ("[PATCH] update ide core") when ide_unregister() was extended with semaphore based locking. Both checks won't complain about disabled preemption which is also wrong. The might_sleep() in today's mutex_lock() will complain about the missuses. Remove the BUG_ON() statements. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Jens Axboe <[email protected]> Cc: "David S. Miller" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15ide/falcon: remove in_interrupt() usageSebastian Andrzej Siewior1-2/+0
falconide_get_lock() is called by ide_lock_host() and its caller (ide_issue_rq()) has already a might_sleep() check. stdma_lock() has wait_event() which also has a might_sleep() check. Remove the in_interrupt() check. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Cc: "David S. Miller" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15uapi: move constants from <linux/kernel.h> to <linux/const.h>Petr Vorel8-14/+12
and include <linux/const.h> in UAPI headers instead of <linux/kernel.h>. The reason is to avoid indirect <linux/sysinfo.h> include when using some network headers: <linux/netlink.h> or others -> <linux/kernel.h> -> <linux/sysinfo.h>. This indirect include causes on MUSL redefinition of struct sysinfo when included both <sys/sysinfo.h> and some of UAPI headers: In file included from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/kernel.h:5, from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/netlink.h:5, from ../include/tst_netlink.h:14, from tst_crypto.c:13: x86_64-buildroot-linux-musl/sysroot/usr/include/linux/sysinfo.h:8:8: error: redefinition of `struct sysinfo' struct sysinfo { ^~~~~~~ In file included from ../include/tst_safe_macros.h:15, from ../include/tst_test.h:93, from tst_crypto.c:11: x86_64-buildroot-linux-musl/sysroot/usr/include/sys/sysinfo.h:10:8: note: originally defined here Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Vorel <[email protected]> Suggested-by: Rich Felker <[email protected]> Acked-by: Rich Felker <[email protected]> Cc: Peter Korsgaard <[email protected]> Cc: Baruch Siach <[email protected]> Cc: Florian Weimer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15kthread_worker: document CPU hotplug handlingPetr Mladek1-1/+19
The kthread worker API is simple. In short, it allows to create, use, and destroy workers. kthread_create_worker_on_cpu() just allows to bind a newly created worker to a given CPU. It is up to the API user how to handle CPU hotplug. They have to decide how to handle pending work items, prevent queuing new ones, and restore the functionality when the CPU goes off and on. There are few catches: + The CPU affinity gets lost when it is scheduled on an offline CPU. + The worker might not exist when the CPU was off when the user created the workers. A good practice is to implement two CPU hotplug callbacks and destroy/create the worker when CPU goes down/up. Mention this in the function description. [[email protected]: grammar tweaks] Link: https://lore.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Reported-by: Zhang Qiang <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15kthread: add kthread_work tracepointsRob Clark2-0/+93
While migrating some code from wq to kthread_worker, I found that I missed the execute_start/end tracepoints. So add similar tracepoints for kthread_work. And for completeness, queue_work tracepoint (although this one differs slightly from the matching workqueue tracepoint). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rob Clark <[email protected]> Cc: Rob Clark <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Phil Auld <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Thara Gopinath <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Vincent Donnefort <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ilias Stamatis <[email protected]> Cc: Liang Chen <[email protected]> Cc: Ben Dooks <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15Merge tag 'irqchip-5.11' of ↵Thomas Gleixner1468-9236/+19693
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates for 5.11 from Marc Zyngier: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Remove the fasteoi IPI flow which has been proved useless - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups Link: https://lore.kernel.org/r/[email protected]
2020-12-14net: hns3: fix expression that is currently always trueColin Ian King1-1/+1
The || condition in hdev->fd_active_type != HCLGE_FD_ARFS_ACTIVE || hdev->fd_active_type != HCLGE_FD_RULE_NONE will always be true because hdev->fd_active_type cannot be equal to two different values at the same time. The expression is always true which is not correct. Fix this by replacing || with && to correct the logic in the expression. Addresses-Coverity: ("Constant expression result") Fixes: 0205ec041ec6 ("net: hns3: add support for hw tc offload of tc flower") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Huazhong Tan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14net: fix proc_fs init handling in af_packet and tlsYonatan Linik2-0/+5
proc_fs was used, in af_packet, without a surrounding #ifdef, although there is no hard dependency on proc_fs. That caused the initialization of the af_packet module to fail when CONFIG_PROC_FS=n. Specifically, proc_create_net() was used in af_packet.c, and when it fails, packet_net_init() returns -ENOMEM. It will always fail when the kernel is compiled without proc_fs, because, proc_create_net() for example always returns NULL. The calling order that starts in af_packet.c is as follows: packet_init() register_pernet_subsys() register_pernet_operations() __register_pernet_operations() ops_init() ops->init() (packet_net_ops.init=packet_net_init()) proc_create_net() It worked in the past because register_pernet_subsys()'s return value wasn't checked before this Commit 36096f2f4fa0 ("packet: Fix error path in packet_init."). It always returned an error, but was not checked before, so everything was working even when CONFIG_PROC_FS=n. The fix here is simply to add the necessary #ifdef. This also fixes a similar error in tls_proc.c, that was found by Jakub Kicinski. Fixes: d26b698dd3cd ("net/tls: add skeleton of MIB statistics") Fixes: 36096f2f4fa0 ("packet: Fix error path in packet_init") Signed-off-by: Yonatan Linik <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14nfc: pn533: convert comma to semicolonZheng Yongjun1-1/+1
Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14Merge branch 'vsock-add-flags-field-in-the-vsock-address'Jakub Kicinski3-4/+47
Andra Paraschiv says: ==================== vsock: Add flags field in the vsock address vsock enables communication between virtual machines and the host they are running on. Nested VMs can be setup to use vsock channels, as the multi transport support has been available in the mainline since the v5.5 Linux kernel has been released. Implicitly, if no host->guest vsock transport is loaded, all the vsock packets are forwarded to the host. This behavior can be used to setup communication channels between sibling VMs that are running on the same host. One example can be the vsock channels that can be established within AWS Nitro Enclaves (see Documentation/virt/ne_overview.rst). To be able to explicitly mark a connection as being used for a certain use case, add a flags field in the vsock address data structure. The value of the flags field is taken into consideration when the vsock transport is assigned. This way can distinguish between different use cases, such as nested VMs / local communication and sibling VMs. The flags field can be set in the user space application connect logic. On the listen path, the field can be set in the kernel space logic. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14af_vsock: Assign the vsock transport considering the vsock address flagsAndra Paraschiv1-2/+7
The vsock flags field can be set in the connect path (user space app) and the (listen) receive path (kernel space logic). When the vsock transport is assigned, the remote CID is used to distinguish between types of connection. Use the vsock flags value (in addition to the CID) from the remote address to decide which vsock transport to assign. For the sibling VMs use case, all the vsock packets need to be forwarded to the host, so always assign the guest->host transport if the VMADDR_FLAG_TO_HOST flag is set. For the other use cases, the vsock transport assignment logic is not changed. Changelog v3 -> v4 * Update the "remote_flags" local variable type to reflect the change of the "svm_flags" field to be 1 byte in size. v2 -> v3 * Update bitwise check logic to not compare result to the flag value. v1 -> v2 * Use bitwise operator to check the vsock flag. * Use the updated "VMADDR_FLAG_TO_HOST" flag naming. * Merge the checks for the g2h transport assignment in one "if" block. Signed-off-by: Andra Paraschiv <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive pathAndra Paraschiv1-0/+12
The vsock flags can be set during the connect() setup logic, when initializing the vsock address data structure variable. Then the vsock transport is assigned, also considering this flags field. The vsock transport is also assigned on the (listen) receive path. The flags field needs to be set considering the use case. Set the value of the vsock flags of the remote address to the one targeted for packets forwarding to the host, if the following conditions are met: * The source CID of the packet is higher than VMADDR_CID_HOST. * The destination CID of the packet is higher than VMADDR_CID_HOST. Changelog v3 -> v4 * No changes. v2 -> v3 * No changes. v1 -> v2 * Set the vsock flag on the receive path in the vsock transport assignment logic. * Use bitwise operator for the vsock flag setup. * Use the updated "VMADDR_FLAG_TO_HOST" flag naming. Signed-off-by: Andra Paraschiv <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14vsock_addr: Check for supported flag valuesAndra Paraschiv1-1/+3
Check if the provided flags value from the vsock address data structure includes the supported flags in the corresponding kernel version. The first byte of the "svm_zero" field is used as "svm_flags", so add the flags check instead. Changelog v3 -> v4 * New patch in v4. Signed-off-by: Andra Paraschiv <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flagAndra Paraschiv1-0/+20
Add VMADDR_FLAG_TO_HOST vsock flag that is used to setup a vsock connection where all the packets are forwarded to the host. Then, using this type of vsock channel, vsock communication between sibling VMs can be built on top of it. Changelog v3 -> v4 * Update the "VMADDR_FLAG_TO_HOST" value, as the size of the field has been updated to 1 byte. v2 -> v3 * Update comments to mention when the flag is set in the connect and listen paths. v1 -> v2 * New patch in v2, it was split from the first patch in the series. * Remove the default value for the vsock flags field. * Update the naming for the vsock flag to "VMADDR_FLAG_TO_HOST". Signed-off-by: Andra Paraschiv <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14vm_sockets: Add flags field in the vsock address data structureAndra Paraschiv1-1/+5
vsock enables communication between virtual machines and the host they are running on. With the multi transport support (guest->host and host->guest), nested VMs can also use vsock channels for communication. In addition to this, by default, all the vsock packets are forwarded to the host, if no host->guest transport is loaded. This behavior can be implicitly used for enabling vsock communication between sibling VMs. Add a flags field in the vsock address data structure that can be used to explicitly mark the vsock connection as being targeted for a certain type of communication. This way, can distinguish between different use cases such as nested VMs and sibling VMs. This field can be set when initializing the vsock address variable used for the connect() call. Changelog v3 -> v4 * Update the size of "svm_flags" field to be 1 byte instead of 2 bytes. v2 -> v3 * Add "svm_flags" as a new field, not reusing "svm_reserved1". v1 -> v2 * Update the field name to "svm_flags". * Split the current patch in 2 patches. Signed-off-by: Andra Paraschiv <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabledTariq Toukan2-1/+12
With NETIF_F_HW_TLS_TX packets are encrypted in HW. This cannot be logically done when HW_CSUM offload is off. Fixes: 2342a8512a1e ("net: Add TLS TX offload features") Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14tcp: Add logic to check for SYN w/ data in tcp_simple_retransmitAlexander Duyck1-1/+16
There are cases where a fastopen SYN may trigger either a ICMP_TOOBIG message in the case of IPv6 or a fragmentation request in the case of IPv4. This results in the socket stalling for a second or more as it does not respond to the message by retransmitting the SYN frame. Normally a SYN frame should not be able to trigger a ICMP_TOOBIG or ICMP_FRAG_NEEDED however in the case of fastopen we can have a frame that makes use of the entire MSS. In the case of fastopen it does, and an additional complication is that the retransmit queue doesn't contain the original frames. As a result when tcp_simple_retransmit is called and walks the list of frames in the queue it may not mark the frames as lost because both the SYN and the data packet each individually are smaller than the MSS size after the adjustment. This results in the socket being stalled until the retransmit timer kicks in and forces the SYN frame out again without the data attached. In order to resolve this we can reduce the MSS the packets are compared to in tcp_simple_retransmit to -1 for cases where we are still in the TCP_SYN_SENT state for a fastopen socket. Doing this we will mark all of the packets related to the fastopen SYN as lost. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Link: https://lore.kernel.org/r/160780498125.3272.15437756269539236825.stgit@localhost.localdomain Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process ↵Vladimir Oltean3-3/+86
context Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has a very nice ocelot_mact_wait_for_completion at the end. Introduced in commit 639c1b2625af ("net: mscc: ocelot: Register poll timeout should be wall time not attempts"), this function uses readx_poll_timeout which triggers a lot of lockdep warnings and is also dangerous to use from atomic context, potentially leading to lockups and panics. Steen Hegelund added a poll timeout of 100 ms for checking the MAC table, a duration which is clearly absurd to poll in atomic context. So we need to defer the MAC table access to process context, which we do via a dynamically allocated workqueue which contains all there is to know about the MAC table operation it has to do. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>