blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message	Author	Files	Lines
2021-02-24	mm/swap.c: don't pass "enum lru_list" to del_page_from_lru_list()	Yu Zhao	1	-2/+3
	The parameter is redundant in the sense that it can be potentially extracted from the "struct page" parameter by page_lru(). We need to make sure that existing PageActive() or PageUnevictable() remains until the function returns. A few places don't conform, and simple reordering fixes them. This patch may have left page_off_lru() seemingly odd, and we'll take care of it in the next patch. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Cc: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
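The ordering requirement described above can be sketched with a toy user-space model. Everything here (`toy_page`, `toy_lruvec`, and the function names) is hypothetical and only mimics the idea that the deletion helper looks up the list from the page's own flag, so the flag must still be set when the helper runs.

```c
#include <stdbool.h>

/* Toy model: two lists tracked only by their sizes. Names are illustrative. */
struct toy_page {
    bool active;            /* stands in for PageActive() */
};

struct toy_lruvec {
    long nr_inactive;
    long nr_active;
};

/* The list is chosen from the page's flag at call time, not from a parameter. */
static void toy_del_page_from_lru_list(struct toy_page *page,
                                       struct toy_lruvec *lruvec)
{
    if (page->active)
        lruvec->nr_active--;
    else
        lruvec->nr_inactive--;
}

/*
 * Correct order: delete while the flag still describes the list the page
 * is on, and only then clear the flag. Clearing first would make the
 * helper decrement the wrong list.
 */
static void toy_deactivate_and_remove(struct toy_page *page,
                                      struct toy_lruvec *lruvec)
{
    toy_del_page_from_lru_list(page, lruvec);
    page->active = false;
}
```

In this model, swapping the two statements in `toy_deactivate_and_remove()` would leave `nr_active` untouched and drive `nr_inactive` negative, which is the kind of mismatch the reordering is meant to avoid.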
2021-02-24	mm/swap.c: don't pass "enum lru_list" to trace_mm_lru_insertion()	Yu Zhao	1	-7/+4
	The parameter is redundant in the sense that it can be extracted from the "struct page" parameter by page_lru() correctly. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Reviewed-by: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: don't pass "enum lru_list" to lru list addition functions	Yu Zhao	1	-2/+6
	The "enum lru_list" parameter to add_page_to_lru_list() and add_page_to_lru_list_tail() is redundant in the sense that it can be extracted from the "struct page" parameter by page_lru(). A caveat is that we need to make sure PageActive() or PageUnevictable() is correctly set or cleared before calling these two functions. And they are indeed. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Cc: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
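As a rough illustration of why the parameter is redundant, the list index can be recomputed from state the page already carries. The sketch below is a user-space model, not kernel code: the `toy_` types and the `file_backed` field are assumptions made for the example, standing in for PageActive(), PageUnevictable(), and the file/anon distinction.

```c
#include <stdbool.h>

/* Toy model only: names and layout are illustrative, not the kernel's. */
enum toy_lru_list {
    TOY_LRU_INACTIVE_ANON,
    TOY_LRU_ACTIVE_ANON,
    TOY_LRU_INACTIVE_FILE,
    TOY_LRU_ACTIVE_FILE,
    TOY_LRU_UNEVICTABLE,
};

struct toy_page {
    bool active;       /* stands in for PageActive() */
    bool unevictable;  /* stands in for PageUnevictable() */
    bool file_backed;  /* stands in for the file/anon distinction */
};

/* Derive the list from the page itself, so callers need not pass it. */
static enum toy_lru_list toy_page_lru(const struct toy_page *page)
{
    if (page->unevictable)
        return TOY_LRU_UNEVICTABLE;
    if (page->file_backed)
        return page->active ? TOY_LRU_ACTIVE_FILE : TOY_LRU_INACTIVE_FILE;
    return page->active ? TOY_LRU_ACTIVE_ANON : TOY_LRU_INACTIVE_ANON;
}
```

Because the result depends only on the flags, an addition helper built on it is correct exactly when the flags are already in their final state at the call, which is the caveat the message above spells out.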
2021-02-24	include/linux/mm_inline.h: shuffle lru list addition and deletion functions	Yu Zhao	1	-21/+21
	These functions will call page_lru() in the following patches. Move them below page_lru() to avoid the forward declaration. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm/vmscan: __isolate_lru_page_prepare() cleanup	Alex Shi	1	-1/+1
	The function just returns 2 results, so using a 'switch' to deal with its result is unnecessary. Also simplify it to a bool func as Vlastimil suggested. Also remove 'goto' by reusing list_move(), and take Matthew Wilcox's suggestion to update comments in function. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Shi <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm/pmem: avoid inserting hugepage PTE entry with fsdax if hugepage support is disabled	Aneesh Kumar K.V	1	-6/+9
	Differentiate between hardware not supporting hugepages and the user disabling THP via 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'. For the devdax namespace, the kernel handles the above via the supported_alignment attribute and by failing to initialize the namespace if the namespace align value is not supported on the platform. For the fsdax namespace, the kernel will continue to initialize the namespace. This can result in the kernel creating a huge pte entry even though the hardware doesn't support it. We do want hugepage support with pmem even if the end-user disabled THP via the sysfs file (/sys/kernel/mm/transparent_hugepage/enabled). Hence differentiate between hardware/firmware lacking support vs user-controlled disable of THP and prevent a huge fault if the hardware lacks hugepage support. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Dan Williams <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Jan Kara <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm/hugetlb: fix some comment typos	Miaohe Lin	1	-1/+1
	Fix typos sasitfy to satisfy, reservtion to reservation, hugegpage to hugepage and uniprocesor to uniprocessor in comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Souptick Joarder <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm/hugetlb: grab head page refcount once for group of subpages	Joao Martins	1	-0/+3
	Patch series "mm/hugetlb: follow_hugetlb_page() improvements", v2. While looking at ZONE_DEVICE struct page reuse, particularly the last patch[0], I found two possible improvements for follow_hugetlb_page(), which is solely used for get_user_pages()/pin_user_pages(). The first patch batches page refcount updates while the second tidies up storing the subpages/vmas. Both together bring the cost of the slow variant of gup() from ~87.6k usecs to ~5.8k usecs. libhugetlbfs tests seem to pass, as do the gup_test benchmarks with hugetlbfs vmas. This patch (of 2): follow_hugetlb_page(), once it locks the pmd/pud, checks all N subpages in a huge page and grabs a reference for each one. Similar to gup-fast, have follow_hugetlb_page() grab the head page refcount only after counting all its subpages that are part of the just faulted huge page. Consequently we reduce the number of atomics necessary to pin said huge page, which improves non-fast gup() considerably: - 16G with 1G huge page size gup_test -f /mnt/huge/file -m 16384 -r 10 -L -S -n 512 -w PIN_LONGTERM_BENCHMARK: ~87.6k us -> ~12.8k us Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
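The batching idea can be sketched in user space with C11 atomics. This is only a model of the technique (count first, then one atomic add); `toy_head_page` and both functions are hypothetical names, and each function returns the number of atomic operations it issued so the two variants can be compared.

```c
#include <stdatomic.h>

/* Toy head page: only the reference count is modelled. */
struct toy_head_page {
    atomic_int refcount;
};

/* Per-subpage variant: one atomic read-modify-write for every subpage. */
static int toy_grab_each(struct toy_head_page *head, int nr_subpages)
{
    int atomics = 0;

    for (int i = 0; i < nr_subpages; i++) {
        atomic_fetch_add(&head->refcount, 1);
        atomics++;
    }
    return atomics;
}

/* Batched variant: the subpages are counted first, then added in one go. */
static int toy_grab_batched(struct toy_head_page *head, int nr_subpages)
{
    atomic_fetch_add(&head->refcount, nr_subpages);
    return 1;
}
```

Both variants leave the same reference count behind; the batched one simply gets there with a single atomic operation per huge page instead of one per subpage.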
2021-02-24	mm/gfp: add kernel-doc for gfp_t	Matthew Wilcox (Oracle)	1	-0/+14
	The generated html will link to the definition of the gfp_t automatically once we define it. Move the one-paragraph overview of GFP flags from the documentation directory into gfp.h and pull gfp.h into the documentation. This generates warnings with clang (https://lkml.kernel.org/r/20210219195509.GA59987@24bbad8f3778), so use a #if 0 to hide it from the compiler for now. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: Nathan Chancellor <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: simplify free_highmem_page() and free_reserved_page()	David Hildenbrand	2	-19/+2
	adjust_managed_page_count() as called by free_reserved_page() properly handles pages in a highmem zone, so we can reuse it for free_highmem_page(). We can now get rid of totalhigh_pages_inc() and simplify free_reserved_page(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Cc: "Gustavo A. R. Silva" <[email protected]> Cc: Sam Ravnborg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: simplify parameter of function memmap_init_zone()	Baoquan He	1	-2/+1
	As David suggested, simply passing 'struct zone *zone' is enough. We can get all needed information from 'struct zone' easily. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: rename memmap_init() and memmap_init_zone()	Baoquan He	1	-2/+2
	The current memmap_init_zone() only handles memory region inside one zone, actually memmap_init() does the memmap init of one zone. So rename both of them accordingly. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: fix prototype warning from kernel test robot	Baoquan He	1	-0/+2
	Patch series "mm: clean up names and parameters of memmap_init_xxxx functions", v5. This patchset corrects inappropriate function names of memmap_init_xxx, and simplifies parameters of functions in the code flow. It also fixes a prototype warning reported by lkp. This patch (of 5): Kernel test robot calling make with 'W=1' triggers a warning like the one below for the memmap_init_zone() function. mm/page_alloc.c:6259:23: warning: no previous prototype for 'memmap_init_zone' [-Wmissing-prototypes] 6259 | void __meminit __weak memmap_init_zone(unsigned long size, int nid, | ^~~~~~~~~~~~~~~~ Fix it by adding the function declaration in include/linux/mm.h. Since memmap_init_zone() has a generic version with '__weak', the declaration in the ia64 header file can simply be removed. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	kasan: fix bug detection via ksize for HW_TAGS mode	Andrey Konovalov	2	-0/+23
	The currently existing kasan_check_read/write() annotations are intended to be used for kernel modules that have KASAN compiler instrumentation disabled. Thus, they are only relevant for the software KASAN modes that rely on compiler instrumentation. However there's another use case for these annotations: ksize() checks that the object passed to it is indeed accessible before unpoisoning the whole object. This is currently done via __kasan_check_read(), which is compiled away for the hardware tag-based mode that doesn't rely on compiler instrumentation. This leads to KASAN missing detecting some memory corruptions. Provide another annotation called kasan_check_byte() that is available for all KASAN modes. As the implementation rename and reuse kasan_check_invalid_free(). Use this new annotation in ksize(). To avoid having ksize() as the top frame in the reported stack trace pass _RET_IP_ to __kasan_check_byte(). Also add a new ksize_uaf() test that checks that a use-after-free is detected via ksize() itself, and via plain accesses that happen later. Link: https://linux-review.googlesource.com/id/Iaabf771881d0f9ce1b969f2a62938e99d3308ec5 Link: https://lkml.kernel.org/r/f32ad74a60b28d8402482a38476f02bb7600f620.1610733117.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Marco Elver <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Kevin Brodsky <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	kasan: move _RET_IP_ to inline wrappers	Andrey Konovalov	1	-11/+9
	Generic mm functions that call KASAN annotations that might report a bug pass _RET_IP_ to them as an argument. This allows KASAN to include the name of the function that called the mm function in its report's header. Now that KASAN has inline wrappers for all of its annotations, move _RET_IP_ to those wrappers to simplify annotation call sites. Link: https://linux-review.googlesource.com/id/I8fb3c06d49671305ee184175a39591bc26647a67 Link: https://lkml.kernel.org/r/5c1490eddf20b436b8c4eeea83fce47687d5e4a4.1610733117.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Marco Elver <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Kevin Brodsky <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	fs: buffer: use raw page_memcg() on locked page	Johannes Weiner	1	-7/+0
	alloc_page_buffers() currently uses get_mem_cgroup_from_page() for charging the buffers to the page owner, which does an rcu-protected page->memcg lookup and acquires a reference. But buffer allocation has the page lock held throughout, which pins the page to the memcg and thereby the memcg - neither rcu nor holding an extra reference during the allocation are necessary. Use a raw page_memcg() instead. This was the last user of get_mem_cgroup_from_page(), delete it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Muchun Song <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: page_counter: re-layout structure to reduce false sharing	Feng Tang	1	-1/+8
	When checking a memory cgroup related performance regression [1], from the perf c2c profiling data, we found high false sharing for accessing 'usage' and 'parent'. On 64-bit systems, 'usage' and 'parent' are close to each other and can easily end up in one cacheline (for cacheline size == 64+ B). 'usage' is usually written, while 'parent' is usually read because of the cgroup's hierarchical counting nature. So move 'parent' to the end of the structure to make sure they are in different cache lines. Following are some performance data with the patch, against v5.11-rc1. [ In the data, A means a platform with 2 sockets 48C/96T, B is a platform with 4 sockets 72C/144T; a %stddev is shown only if it is bigger than 2%; P100/P50 means the number of test tasks equals 100%/50% of nr_cpu ]
	will-it-scale/malloc1
	            v5.11-rc1     change   v5.11-rc1+patch   metric
	A-P100      15782 ± 2%    -0.1%    15765 ± 3%        will-it-scale.per_process_ops
	A-P50       21511         +8.9%    23432             will-it-scale.per_process_ops
	B-P100      9155          +2.2%    9357              will-it-scale.per_process_ops
	B-P50       10967         +7.1%    11751 ± 2%        will-it-scale.per_process_ops
	will-it-scale/pagefault2
	            v5.11-rc1     change   v5.11-rc1+patch   metric
	A-P100      79028         +3.0%    81411             will-it-scale.per_process_ops
	A-P50       183960 ± 2%   +4.4%    192078 ± 2%       will-it-scale.per_process_ops
	B-P100      85966         +9.9%    94467 ± 3%        will-it-scale.per_process_ops
	B-P50       198195        +9.8%    217526            will-it-scale.per_process_ops
	fio (4k/1M is block size)
	            v5.11-rc1     change   v5.11-rc1+patch   metric
	A-P50-r-4k  16881 ± 2%    +1.2%    17081 ± 2%        fio.read_bw_MBps
	A-P50-w-4k  3931          +4.5%    4111 ± 2%         fio.write_bw_MBps
	A-P50-r-1M  15178         -0.2%    15154             fio.read_bw_MBps
	A-P50-w-1M  3924          +0.1%    3929              fio.write_bw_MBps
	[1] https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Feng Tang <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
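The layout change can be illustrated with a small user-space sketch. The two structs below are hypothetical stand-ins, not the real page_counter definition: `other[16]` is a placeholder for the remaining fields, the 64-byte cache line is an assumption, and the comparison treats the struct as starting on a cache line boundary.

```c
#include <stddef.h>

#define TOY_CACHELINE 64  /* assumed cache line size in bytes */

/* Before: the frequently written and the frequently read field are adjacent. */
struct toy_counter_before {
    long usage;                         /* written often */
    struct toy_counter_before *parent;  /* read often */
    long other[16];                     /* placeholder for the other fields */
};

/* After: 'parent' is moved to the end, away from 'usage'. */
struct toy_counter_after {
    long usage;
    long other[16];
    struct toy_counter_after *parent;
};

/* Nonzero if two offsets within a cache-line-aligned object share a line. */
static int toy_same_cacheline(size_t a, size_t b)
{
    return a / TOY_CACHELINE == b / TOY_CACHELINE;
}
```

Feeding `offsetof()` of `usage` and `parent` into `toy_same_cacheline()` shows the two fields sharing a line in the first layout and landing on different lines in the second, which is the effect the reordering aims for.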
2021-02-24	mm: kmem: make __memcg_kmem_(un)charge static	Roman Gushchin	1	-3/+0
	I've noticed that __memcg_kmem_charge() and __memcg_kmem_uncharge() are not used anywhere except memcontrol.c. Yet they are declared as non-static and are declared in memcontrol.h. This patch makes them static. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: memcg: add swapcache stat for memcg v2	Shakeel Butt	2	-1/+8
	This patch adds a swapcache stat for cgroup v2. The swapcache represents the memory that is accounted against both the memory and the swap limit of the cgroup. The main motivation behind exposing the swapcache stat is to enable users to gracefully migrate from cgroup v1's memsw counter to cgroup v2's memory and swap counters. Cgroup v1's memsw limit allows users to limit the memory+swap usage of a workload but without control over the exact proportion of memory and swap. Cgroup v2 provides separate limits for memory and swap, which enables more control over the exact usage of memory and swap individually for the workload. With a few subtleties, v1's memsw limit can be replaced with the sum of v2's memory and swap limits. However, the alternative for memsw usage is not yet available in cgroup v2. Exposing a per-cgroup swapcache stat enables that alternative. Adding the memory usage and swap usage and subtracting the swapcache will approximate the memsw usage. This will help in the transparent migration of the workloads depending on memsw usage and limit to v2's memory and swap counters. The reasons these applications are still interested in this approximate memsw usage are: (1) these applications are not really interested in two separate memory and swap usage metrics. A single usage metric is simpler for them to use and reason about. (2) The memsw usage metric hides the underlying system's swap setup from the applications. Applications with multiple instances running in a datacenter with heterogeneous systems (some have swap and some don't) will keep seeing a consistent view of their usage. 
[[email protected]: fix CONFIG_SWAP=n build] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Muchun Song <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
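The approximation described above is a one-line formula. A minimal sketch, assuming all three inputs are byte counts already read from the cgroup (the function and parameter names are illustrative, not an existing interface):

```c
/*
 * Approximate cgroup v1 memsw usage from cgroup v2 values, all in bytes.
 * Swapcache is charged to both memory and swap, so it is subtracted once
 * to avoid counting it twice.
 */
static unsigned long long toy_memsw_usage(unsigned long long memory_usage,
                                          unsigned long long swap_usage,
                                          unsigned long long swapcache)
{
    return memory_usage + swap_usage - swapcache;
}
```

For instance, 1000 bytes of memory usage, 300 bytes of swap usage and 100 bytes of swapcache give an approximate memsw usage of 1200 bytes.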
2021-02-24	mm: memcontrol: convert NR_FILE_PMDMAPPED account to pages	Muchun Song	1	-1/+2
	Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_FILE_PMDMAPPED account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Feng Tang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: NeilBrown <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Rafael. J. Wysocki <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
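The 23.4375 GB figure can be reproduced with a short calculation. The sketch assumes one THP is 2 MiB (512 pages of 4 KiB, matching the "512" above) and that every CPU holds a full threshold of uncommitted THP counts; the function name is made up for the example.

```c
/*
 * Worst-case amount of memory hidden in per-CPU counters, in MiB.
 * Assumes each cached count stands for one THP of thp_mib MiB.
 */
static unsigned long toy_cached_mib(unsigned long ncpus,
                                    unsigned long threshold,
                                    unsigned long thp_mib)
{
    return ncpus * threshold * thp_mib;
}
```

With 96 CPUs, a threshold of 125 and 2 MiB per THP this yields 24000 MiB, and 24000 / 1024 = 23.4375 GiB, the number quoted in the message.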
2021-02-24	mm: memcontrol: convert NR_SHMEM_PMDMAPPED account to pages	Muchun Song	1	-1/+2
	Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_SHMEM_PMDMAPPED account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Feng Tang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: NeilBrown <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Rafael. J. Wysocki <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: memcontrol: convert NR_SHMEM_THPS account to pages	Muchun Song	1	-1/+2
	Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_SHMEM_THPS account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Feng Tang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: NeilBrown <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Rafael. J. Wysocki <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: memcontrol: convert NR_FILE_THPS account to pages	Muchun Song	1	-1/+2
	Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_FILE_THPS account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Feng Tang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: NeilBrown <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Rafael. J. Wysocki <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: memcontrol: convert NR_ANON_THPS account to pages	Muchun Song	1	-0/+13
	Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_ANON_THPS account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Rafael. J. Wysocki <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Feng Tang <[email protected]> Cc: NeilBrown <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pankaj Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm: memcontrol: optimize per-lruvec stats counter memory usage	Muchun Song	1	-2/+12
	The vmstat threshold is 32 (MEMCG_CHARGE_BATCH). The threshold can actually be as big as MEMCG_CHARGE_BATCH * PAGE_SIZE, and that still fits into s32. So introduce struct batched_lruvec_stat to optimize memory usage. The size of struct lruvec_stat is 304 bytes on 64-bit systems. As it is a per-cpu structure, with this patch we can save 304 / 2 * ncpu bytes per-memcg per-node, where ncpu is the number of possible CPUs. If there are c memory cgroups (including dying cgroups) and n NUMA nodes in the system, we can save (152 * ncpu * c * n) bytes in total. [[email protected]: fix typo in comment] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Chris Down <[email protected]> Cc: Yafang Shao <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
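The savings estimate is plain multiplication and can be written out as a helper. This is a sketch of the arithmetic only; the function name is hypothetical and the 304-byte size is the figure quoted in the message for 64-bit systems.

```c
/*
 * Estimated bytes saved by halving the per-CPU lruvec stat structure:
 * (304 / 2) bytes per CPU, per memory cgroup, per NUMA node.
 */
static unsigned long long toy_lruvec_stat_savings(unsigned long long ncpu,
                                                  unsigned long long cgroups,
                                                  unsigned long long nodes)
{
    const unsigned long long saved_per_cpu = 304 / 2;  /* 152 bytes */

    return saved_per_cpu * ncpu * cgroups * nodes;
}
```

As an example with made-up inputs, 96 possible CPUs, 100 memory cgroups and 2 NUMA nodes give 152 * 96 * 100 * 2 = 2918400 bytes, a little under 3 MiB.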
2021-02-24	mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT	Roman Gushchin	1	-19/+0
	In general it's unknown in advance if a slab page will contain accounted objects or not. In order to avoid memory waste, an obj_cgroup vector is allocated dynamically when the need to account a new object arises. Such an approach is memory efficient, but requires an expensive cmpxchg() to set up the memcg/objcgs pointer, because an allocation can race with a different allocation on another cpu. But in some common cases it's known for sure that a slab page will contain accounted objects: if the page belongs to a slab cache with a SLAB_ACCOUNT flag set. It includes such popular objects as vm_area_struct, anon_vma, task_struct, etc. In such cases we can pre-allocate the objcgs vector and simply assign it to the page without any atomic operations, because at this early stage the page is not visible to anyone else. A very simplistic benchmark (allocating 10000000 64-byte objects in a row) shows a ~15% win. In real life it seems that most workloads are not very sensitive to the speed of (accounted) slab allocations. [[email protected]: open-code set_page_objcgs() and add some comments, by Johannes] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix it for mm-slub-call-account_slab_page-after-slab-page-initialization-fix.patch] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Johannes Weiner <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
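The difference between the two paths can be modelled with C11 atomics in user space. This is a sketch of the general technique, not the kernel's code: `toy_slab_page` and both functions are hypothetical, and the compare-and-swap stands in for the cmpxchg() mentioned above.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy slab page: only the pointer to the accounting vector is modelled. */
struct toy_slab_page {
    _Atomic(void *) objcgs;
};

/*
 * Lazy path: the page is already visible to other CPUs, so two allocations
 * may try to install a vector at once. Compare-and-swap lets exactly one
 * win. Returns the vector that ended up installed; if it is not 'vec',
 * the caller lost the race and must free its own vector.
 */
static void *toy_install_lazy(struct toy_slab_page *page, void *vec)
{
    void *expected = NULL;

    if (atomic_compare_exchange_strong(&page->objcgs, &expected, vec))
        return vec;
    return expected;
}

/*
 * Early path: the page has not been published yet, so nobody can race
 * with us and a relaxed store is sufficient.
 */
static void toy_install_early(struct toy_slab_page *page, void *vec)
{
    atomic_store_explicit(&page->objcgs, vec, memory_order_relaxed);
}
```

The early path is only safe because of the visibility argument in the message: once the page can be reached by another CPU, the lazy path's compare-and-swap is needed again.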
2021-02-24	mm/filemap: rename generic_file_buffered_read to filemap_read	Christoph Hellwig	1	-2/+2
	Rename generic_file_buffered_read to match the naming of filemap_fault, also update the written parameter to a more descriptive name and improve the kerneldoc comment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm/filemap: pass a sleep state to put_and_wait_on_page_locked	Matthew Wilcox (Oracle)	1	-2/+1
	This is prep work for the next patch, but I think at least one of the current callers would prefer a killable sleep to an uninterruptible one. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm/filemap: remove unused parameter and change to void type for ↵	Baolin Wang	1	-1/+1
	replace_page_cache_page() Since commit 74d609585d8b ("page cache: Add and replace pages using the XArray") was merged, replace_page_cache_page() cannot fail and always returns 0, so we can remove the redundant return value and make the function return void. Also remove the unused gfp_mask. Link: https://lkml.kernel.org/r/609c30e5274ba15d8b90c872fd0d8ac437a9b2bb.1610071401.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miklos Szeredi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24	mm, tracing: record slab name for kmem_cache_free()	Jacob Wen	1	-8/+16
	Currently, a trace record generated by the RCU core is as below. ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f3b49a66 It doesn't tell us what the RCU core has freed. This patch adds the slab name to trace_kmem_cache_free(). The new format is as follows. ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000037f79c8d name=dentry ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f78cb7b5 name=sock_inode_cache ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000018768985 name=pool_workqueue ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=000000006a6cb484 name=radix_tree_node We can use it to understand what the RCU core is going to free. For example, some users may be interested in when the RCU core starts freeing reclaimable slabs like dentry to reduce memory pressure. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jacob Wen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
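As an illustration of how the new name= field might be consumed, here is a minimal sketch that tallies frees per slab cache from trace text. The regular expression is an assumption derived only from the sample records in the commit message; the function name is hypothetical, and real trace output carries additional leading fields (elided as "..." above).

```python
import re
from collections import Counter

# Matches the payload of a kmem_cache_free record; name= is optional so that
# records in the old format (without a slab name) are recognised but skipped.
RECORD_RE = re.compile(
    r"kmem_cache_free: call_site=(?P<call_site>\S+) "
    r"ptr=(?P<ptr>[0-9a-f]+)(?: name=(?P<name>\S+))?"
)


def count_frees_by_slab(lines):
    """Count kmem_cache_free records per slab name."""
    counts = Counter()
    for line in lines:
        match = RECORD_RE.search(line)
        if match and match.group("name"):
            counts[match.group("name")] += 1
    return counts


# Sample records copied from the commit message.
SAMPLE = [
    "... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000037f79c8d name=dentry",
    "... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f78cb7b5 name=sock_inode_cache",
    "... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000018768985 name=pool_workqueue",
    "... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=000000006a6cb484 name=radix_tree_node",
]

if __name__ == "__main__":
    for name, count in count_frees_by_slab(SAMPLE).items():
        print(name, count)
```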
2021-02-24	Merge branch 'pci/ntb'	Bjorn Helgaas	3	-8/+60
	- Account for 64-bit BARs in pci_epc_get_first_free_bar() (Kishon Vijay Abraham I) - Add pci_epc_get_next_free_bar() helper (Kishon Vijay Abraham I) - Return error codes on failure of endpoint BAR interfaces (Kishon Vijay Abraham I) - Remove unused pci_epf_match_device() (Kishon Vijay Abraham I) - Add support for secondary endpoint controller to prepare for NTB endpoint functionality (Kishon Vijay Abraham I) - Add configfs support for secondary endpoint controller (Kishon Vijay Abraham I) - Add MSI address mapping ops for NTB doorbell support (Kishon Vijay Abraham I) - Add ops for endpoint function-specific attributes (Kishon Vijay Abraham I) - Allow configfs subdirectory for endpoint function configuration (Kishon Vijay Abraham I) - Implement cadence MSI address mapping ops (Kishon Vijay Abraham I) - Configure cadence LM_EP_FUNC_CFG based on epc->function_num_map (Kishon Vijay Abraham I) - Add endpoint-side driver to provide NTB functionality (Kishon Vijay Abraham I) - Add host-side driver for generic EPF NTB functionality (Kishon Vijay Abraham I) - Document NTB endpoint functionality (Kishon Vijay Abraham I) * pci/ntb: Documentation: PCI: Add PCI endpoint NTB function user guide Documentation: PCI: Add configfs binding documentation for pci-ntb endpoint function NTB: Add support for EPF PCI Non-Transparent Bridge PCI: Add TI J721E device to PCI IDs PCI: endpoint: Add EP function driver to provide NTB functionality PCI: cadence: Configure LM_EP_FUNC_CFG based on epc->function_num_map PCI: cadence: Implement ->msi_map_irq() ops PCI: endpoint: Allow user to create sub-directory of 'EPF Device' directory PCI: endpoint: Add pci_epf_ops to expose function-specific attrs PCI: endpoint: Add pci_epc_ops to map MSI IRQ PCI: endpoint: Add support in configfs to associate two EPCs with EPF PCI: endpoint: Add support to associate secondary EPC with EPF PCI: endpoint: Remove unused pci_epf_match_device() PCI: endpoint: Make _free_bar() to return error codes on failure PCI: 
endpoint: Add helper API to get the 'next' unreserved BAR PCI: endpoint: Make _get_first_free_bar() take into account 64 bit BAR Documentation: PCI: Add specification for the PCI NTB function device
2021-02-24	Merge branch 'pci/misc'	Bjorn Helgaas	1	-0/+2
	- Align checking of syscall user config accessor return codes (Heiner Kallweit) - Fix "ordering" comment typos (Bjorn Helgaas) - Fix 'ARM/TEXAS INSTRUMENT KEYSTONE CLOCKSOURCE' capitalization in MAINTAINERS (Bjorn Helgaas) - Add Silicom Denmark vendor ID (Martin Hundebøll) - Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy (Junhao He) - Remove WARN_ON(in_interrupt()) (Sebastian Andrzej Siewior) * pci/misc: PCI: Remove WARN_ON(in_interrupt()) PCI: Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy PCI: Add Silicom Denmark vendor ID MAINTAINERS: Fix 'ARM/TEXAS INSTRUMENT KEYSTONE CLOCKSOURCE' capitalization Fix "ordering" comment typos PCI: Align checking of syscall user config accessors
2021-02-24	Merge tag 'rpmsg-v5.12' of ↵	Linus Torvalds	1	-2/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc Pull rpmsg updates from Bjorn Andersson: "Fix two build issues in the GLINK driver and correct some kerneldoc in the same" * tag 'rpmsg-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: rpmsg: glink: add include of header file rpmsg: glink: Guard qcom_glink_ssr_notify() with correct config rpmsg: glink: fix some kerneldoc comments
2021-02-24	Merge tag 'vfio-v5.12-rc1' of git://github.com/awilliam/linux-vfio	Linus Torvalds	2	-0/+34
	Pull VFIO updates from Alex Williamson: - Virtual address update handling (Steve Sistare) - s390/zpci fixes and cleanups (Max Gurtovoy) - Fixes for dirty bitmap handling, non-mdev page pinning, and improved pinned dirty scope tracking (Keqian Zhu) - Batched page pinning enhancement (Daniel Jordan) - Page access permission fix (Alex Williamson) * tag 'vfio-v5.12-rc1' of git://github.com/awilliam/linux-vfio: (21 commits) vfio/type1: Batch page pinning vfio/type1: Prepare for batched pinning with struct vfio_batch vfio/type1: Change success value of vaddr_get_pfn() vfio/type1: Use follow_pte() vfio/pci: remove CONFIG_VFIO_PCI_ZDEV from Kconfig vfio/iommu_type1: Fix duplicate included kthread.h vfio-pci/zdev: fix possible segmentation fault issue vfio-pci/zdev: remove unused vdev argument vfio/pci: Fix handling of pci use accessor return codes vfio/iommu_type1: Mantain a counter for non_pinned_groups vfio/iommu_type1: Fix some sanity checks in detach group vfio/iommu_type1: Populate full dirty when detach non-pinned group vfio/type1: block on invalid vaddr vfio/type1: implement notify callback vfio: iommu driver notify callback vfio/type1: implement interfaces to update vaddr vfio/type1: massage unmap iteration vfio: interfaces to update vaddr vfio/type1: implement unmap all vfio/type1: unmap cleanup ...
2021-02-24	Merge tag 'sfi-removal-5.12-rc1' of ↵	Linus Torvalds	3	-756/+0
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull Simple Firmware Interface (SFI) support removal from Rafael Wysocki: "Drop support for deprecated platforms using SFI, drop the entire support for SFI that has been long deprecated too and make some janitorial changes on top of that (Andy Shevchenko)" * tag 'sfi-removal-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: x86/platform/intel-mid: Update Copyright year and drop file names x86/platform/intel-mid: Remove unused header inclusion in intel-mid.h x86/platform/intel-mid: Drop unused __intel_mid_cpu_chip and Co. x86/platform/intel-mid: Get rid of intel_scu_ipc_legacy.h x86/PCI: Describe @reg for type1_access_ok() x86/PCI: Get rid of custom x86 model comparison sfi: Remove framework for deprecated firmware cpufreq: sfi-cpufreq: Remove driver for deprecated firmware media: atomisp: Remove unused header mfd: intel_msic: Remove driver for deprecated platform x86/apb_timer: Remove driver for deprecated platform x86/platform/intel-mid: Remove unused leftovers (vRTC) x86/platform/intel-mid: Remove unused leftovers (msic) x86/platform/intel-mid: Remove unused leftovers (msic_thermal) x86/platform/intel-mid: Remove unused leftovers (msic_power_btn) x86/platform/intel-mid: Remove unused leftovers (msic_gpio) x86/platform/intel-mid: Remove unused leftovers (msic_battery) x86/platform/intel-mid: Remove unused leftovers (msic_ocd) x86/platform/intel-mid: Remove unused leftovers (msic_audio) platform/x86: intel_scu_wdt: Drop mistakenly added const
2021-02-24	Merge tag 'char-misc-5.12-rc1' of ↵	Linus Torvalds	20	-252/+1467
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the large set of char/misc/whatever driver subsystem updates for 5.12-rc1. Over time it seems like this tree is collecting more and more tiny driver subsystems in one place, making it easier for those maintainers, which is why this is getting larger. Included in here are: - coresight driver updates - habanalabs driver updates - virtual acrn driver addition (proper acks from the x86 maintainers) - broadcom misc driver addition - speakup driver updates - soundwire driver updates - fpga driver updates - amba driver updates - mei driver updates - vfio driver updates - greybus driver updates - nvmem driver updates - phy driver updates - mhi driver updates - interconnect driver updates - fsl-mc bus driver updates - random driver fix - some small misc driver updates (rtsx, pvpanic, etc.) All of these have been in linux-next for a while, with the only reported issue being a merge conflict due to the dfl_device_id addition from the fpga subsystem in here" * tag 'char-misc-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits) spmi: spmi-pmic-arb: Fix hw_irq overflow Documentation: coresight: Add PID tracing description coresight: etm-perf: Support PID tracing for kernel at EL2 coresight: etm-perf: Clarify comment on perf options ACRN: update MAINTAINERS: mailing list is subscribers-only regmap: sdw-mbq: use MODULE_LICENSE("GPL") regmap: sdw: use no_pm routines for SoundWire 1.2 MBQ regmap: sdw: use _no_pm functions in regmap_read/write soundwire: intel: fix possible crash when no device is detected MAINTAINERS: replace my with email with replacements mhi: Fix double dma free uapi: map_to_7segment: Update example in documentation uio: uio_pci_generic: don't fail probe if pdev->irq equals to IRQ_NOTCONNECTED drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue firewire: replace tricky statement by two simple ones vme: make remove callback return void firmware: google: make coreboot driver's remove callback return void firmware: xilinx: Use explicit values for all enum values sample/acrn: Introduce a sample of HSM ioctl interface usage virt: acrn: Introduce an interface for Service VM to control vCPU ...
2021-02-24	Merge tag 'driver-core-5.12-rc1' of ↵	Linus Torvalds	4	-4/+29
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / debugfs update from Greg KH: "Here is the "big" driver core and debugfs update for 5.12-rc1 This set of driver core patches caused a bunch of problems in linux-next for the past few weeks, when Saravana tried to set fw_devlink=on as the default functionality. This caused a number of systems to stop booting, and lots of bugs were fixed in this area for almost all of the reported systems, but this option is not ready to be turned on just yet for the default operation based on this testing, so I've reverted that change at the very end so we don't have to worry about regressions in 5.12 We will try to turn this on for 5.13 if testing goes better over the next few months. Other than the fixes caused by the fw_devlink testing in here, there's not much more: - debugfs fixes for invalid input into debugfs_lookup() - kerneldoc cleanups - warn message if platform drivers return an error on their remove callback (a futile effort, but good to catch). 
All of these have been in linux-next for a while now, and the regressions have gone away with the revert of the fw_devlink change" * tag 'driver-core-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits) Revert "driver core: Set fw_devlink=on by default" of: property: fw_devlink: Ignore interrupts property for some configs debugfs: do not attempt to create a new file before the filesystem is initalized debugfs: be more robust at handling improper input in debugfs_lookup() driver core: auxiliary bus: Fix calling stage for auxiliary bus init of: irq: Fix the return value for of_irq_parse_one() stub of: irq: make a stub for of_irq_parse_one() clk: Mark fwnodes when their clock provider is added/removed PM: domains: Mark fwnodes when their powerdomain is added/removed irqdomain: Mark fwnodes when their irqdomain is added/removed driver core: fw_devlink: Handle suppliers that don't use driver core of: property: Add fw_devlink support for optional properties driver core: Add fw_devlink.strict kernel param of: property: Don't add links to absent suppliers driver core: fw_devlink: Detect supplier devices that will never be added driver core: platform: Emit a warning if a remove callback returned non-zero of: property: Fix fw_devlink handling of interrupts/interrupts-extended gpiolib: Don't probe gpio_device if it's not the primary device device.h: Remove bogus "the" in kerneldoc gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default ...
2021-02-24	Merge tag 'dma-mapping-5.12' of git://git.infradead.org/users/hch/dma-mapping	Linus Torvalds	2	-9/+13
	Pull dma-mapping updates from Christoph Hellwig: - add support to emulate processing delays in the DMA API benchmark selftest (Barry Song) - remove support for non-contiguous noncoherent allocations, which aren't used and will be replaced by a different API * tag 'dma-mapping-5.12' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: remove the {alloc,free}_noncoherent methods dma-mapping: benchmark: pretend DMA is transmitting
2021-02-24	Merge tag 'cxl-for-5.12' of ↵	Linus Torvalds	2	-0/+173
	git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull initial support for CXL (Compute Express Link) from Dan Williams: "Introduce an initial driver for CXL 2.0 Type-3 Memory Devices. CXL is Compute Express Link which released the 2.0 specification in November. The Linux relevant changes in CXL 2.0 are support for an OS to dynamically assign address space to memory devices, support for switches, persistent memory, and hotplug. A Type-3 Memory Device is a PCI enumerated device presenting the CXL Memory Device Class Code and implementing the CXL.mem protocol. CXL.mem allows device to advertise CPU and I/O coherent memory to the system, i.e. typical "System RAM" and "Persistent Memory" in Linux /proc/iomem terms. In addition to the CXL.mem fast path there is an administrative command hardware mailbox interface for maintenance and provisioning. It is this command interface that is the focus of the initial driver. With this driver a CXL device that is mapped by the BIOS can be administered by Linux. 
Linux support for CXL PMEM and dynamic CXL address space management are to be implemented post v5.12" Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> 4cdadfd5e0a7 ("cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints") 13237183c735 ("cxl/mem: Add a "RAW" send command") 472b1ce6e9d6 ("cxl/mem: Enable commands via CEL") 57ee605b976c ("cxl/mem: Add set of informational commands") Reviewed-by: Jonathan Cameron <[email protected]> 8adaf747c9f0 ("cxl/mem: Find device capabilities") b39cb1052a5c ("cxl/mem: Register CXL memX devices") * tag 'cxl-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: cxl/mem: Fix potential memory leak cxl/mem: Return -EFAULT if copy_to_user() fails MAINTAINERS: Add maintainers of the CXL driver cxl/mem: Add set of informational commands cxl/mem: Enable commands via CEL cxl/mem: Add a "RAW" send command cxl/mem: Add basic IOCTL interface cxl/mem: Register CXL memX devices cxl/mem: Find device capabilities cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints
2021-02-24	Merge tag 'libnvdimm-for-5.12' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and device-dax updates from Dan Williams: - Fix the error code polarity for the device-dax/mapping attribute - For the device-dax and libnvdimm bus implementations stop implementing a useless return code for the remove() callback. - Miscellaneous cleanups * tag 'libnvdimm-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax-device: Make remove callback return void device-dax: Drop an empty .remove callback device-dax: Fix error path in dax_driver_register device-dax: Properly handle drivers without remove callback device-dax: Prevent registering drivers without probe callback libnvdimm: Make remove callback return void libnvdimm/dimm: Simplify nvdimm_remove() device-dax: Fix default return code of range_parse()
2021-02-24	drm/drm_vblank: set the dma-fence timestamp during send_vblank_event	Veera Sundaram Sankaran	1	-0/+3
	The explicit out-fences in crtc are signaled as part of vblank event, indicating all framebuffers present on the Atomic Commit request are scanned out on the screen. Though the fence signal and the vblank event notification happen at the same time, triggered by the same hardware vsync event, the timestamps set in the two are different. With drivers supporting precise vblank timestamps the difference between the two timestamps would be even higher. This might have an impact on user-mode frameworks using these fence timestamps for purposes other than simple buffer usage. For instance, the Android framework [1] uses the retire-fences as an alternative to vblank when frame-updates are in progress. Set the fence timestamp during send vblank event using a new drm_send_event_timestamp_locked variant to avoid discrepancies. [1] https://android.googlesource.com/platform/frameworks/native/+/master/services/surfaceflinger/Scheduler/Scheduler.cpp#397 Changes in v2: - Use drm_send_event_timestamp_locked to update fence timestamp - add more information to commit text Changes in v3: - use same backend helper function for variants of drm_send_event to avoid code duplications Changes in v4: - remove WARN_ON from drm_send_event_timestamp_locked Signed-off-by: Veera Sundaram Sankaran <[email protected]> Reviewed-by: John Stultz <[email protected]> Signed-off-by: Sumit Semwal <[email protected]> [sumits: minor parenthesis alignment correction] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit a78e7a51d2fa9d2f482b462be4299784c884d988) Signed-off-by: Sumit Semwal <[email protected]>
2021-02-24	dma-fence: allow signaling drivers to set fence timestamp	Veera Sundaram Sankaran	1	-0/+3
	Some drivers have hardware capability to get the precise HW timestamp of certain events based on which the fences are triggered. The delta between the event HW timestamp & current HW reference timestamp can be used to calculate the timestamp in kernel's CLOCK_MONOTONIC time domain. This allows it to set accurate timestamp factoring out any software and IRQ latencies. Add a timestamp variant of fence signal function, dma_fence_signal_timestamp to allow drivers to update the precise timestamp for fences. Changes in v2: - Add a new fence signal variant instead of modifying fence struct Changes in v3: - Add timestamp domain information to commit-text and dma_fence_signal_timestamp documentation Signed-off-by: Veera Sundaram Sankaran <[email protected]> Reviewed-by: John Stultz <[email protected]> Signed-off-by: Sumit Semwal <[email protected]> [sumits: minor parenthesis alignment] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 5a164ac4dbd21b82bcdc03186d40e455ff467fdc) Signed-off-by: Sumit Semwal <[email protected]>
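The timestamp conversion described in this commit message can be sketched as plain arithmetic. This is only an illustration under stated assumptions: the hardware counter is taken to be a free-running tick counter with a known whole number of nanoseconds per tick, wraparound is ignored, and the function and parameter names are hypothetical. A driver would then hand the result to dma_fence_signal_timestamp().

```python
def event_time_monotonic_ns(now_monotonic_ns: int,
                            hw_now_ticks: int,
                            hw_event_ticks: int,
                            ns_per_tick: int) -> int:
    """Translate a hardware event timestamp into the CLOCK_MONOTONIC domain.

    The age of the event is the delta between the current hardware reference
    timestamp and the event's hardware timestamp; subtracting that age from
    the current CLOCK_MONOTONIC reading factors out software and IRQ latency.
    """
    age_ns = (hw_now_ticks - hw_event_ticks) * ns_per_tick
    return now_monotonic_ns - age_ns


if __name__ == "__main__":
    # Hypothetical numbers: the event fired 1000 ticks ago at 10 ns per tick.
    print(event_time_monotonic_ns(1_000_000, 5000, 4000, 10))  # 990000
```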
2021-02-24	dma-buf: heaps: Rework heap allocation hooks to return struct dma_buf ↵	John Stultz	1	-6/+6
	instead of fd Every heap needs to create a dmabuf and then export it to a fd via dma_buf_fd(), so to consolidate things a bit, have the heaps just return a struct dmabuf * and let the top level dma_heap_buffer_alloc() call handle creating the fd via dma_buf_fd(). Cc: Sumit Semwal <[email protected]> Cc: Liam Mark <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Brian Starkey <[email protected]> Cc: Hridya Valsaraju <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sandeep Patil <[email protected]> Cc: Daniel Mentz <[email protected]> Cc: Chris Goldsworthy <[email protected]> Cc: Ørjan Eide <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Ezequiel Garcia <[email protected]> Cc: Simon Ser <[email protected]> Cc: James Jones <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: John Stultz <[email protected]> Signed-off-by: Sumit Semwal <[email protected]> [sumits: minor reword of commit message] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit c7f59e3dd60313071a989227dcb69094f499d310) Signed-off-by: Sumit Semwal <[email protected]>
2021-02-24	ACPI: platform: Add balanced-performance platform profile	Maximilian Luz	1	-0/+1
	Some devices, including most Microsoft Surface devices, have a platform profile somewhere in between balanced and performance. More specifically, adding this profile allows the following mapping on Surface devices:
	Vendor Name          Platform Profile
	------------------------------------------
	Battery Saver        low-power
	Recommended          balanced
	Better Performance   balanced-performance
	Best Performance     performance
	Suggested-by: Hans de Goede <[email protected]> Signed-off-by: Maximilian Luz <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2021-02-24	ACPI: platform: Fix file references in comment	Maximilian Luz	1	-3/+2
	The referenced files are named slightly differently. Replace '-' with '_' and drop the .rst ending. Signed-off-by: Maximilian Luz <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2021-02-23	io_uring: ensure io-wq context is always destroyed for tasks	Jens Axboe	1	-1/+1
	If the task ends up doing no IO, the context list is empty and we don't call into __io_uring_files_cancel() when the task exits. This can cause a leak of the io-wq structures. Ensure we always call __io_uring_files_cancel(), even if the task context list is empty. Fixes: 5aa75ed5b93f ("io_uring: tie async worker side to the task context") Signed-off-by: Jens Axboe <[email protected]>
2021-02-23	io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS	Jens Axboe	1	-0/+1
	A few reasons to do this: - The naming of the manager and worker have changed. That's a user visible change, so makes sense to flag it. - Opening certain files that use ->signal (like /proc/self or /dev/tty) now works, and the flag tells the application upfront that this is the case. - Related to the above, using signalfd will now work as well. Signed-off-by: Jens Axboe <[email protected]>
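An application would typically test this flag in the features mask that io_uring_setup() returns in struct io_uring_params. The sketch below shows only the bit test; the bit position (1 << 9) is stated from memory of include/uapi/linux/io_uring.h and should be treated as an assumption to verify against the header, and the helper name is hypothetical.

```python
# Assumed value; verify against include/uapi/linux/io_uring.h.
IORING_FEAT_NATIVE_WORKERS = 1 << 9


def has_native_workers(features: int) -> bool:
    """Return True if the kernel advertised native io_uring workers.

    `features` is the mask the kernel fills into io_uring_params.features.
    """
    return bool(features & IORING_FEAT_NATIVE_WORKERS)


if __name__ == "__main__":
    # Hypothetical mask with only the lower nine feature bits set.
    print(has_native_workers(0x1FF))
    # The same mask with the native-workers bit added.
    print(has_native_workers(0x1FF | IORING_FEAT_NATIVE_WORKERS))
```

When the flag is set, the application can rely on signalfd and on files that use ->signal (such as /proc/self or /dev/tty) working through io_uring, as the commit message describes.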
2021-02-23	net: remove cmsg restriction from io_uring based send/recvmsg calls	Jens Axboe	1	-3/+0
	No need to restrict these anymore, as the worker threads are direct clones of the original task. Hence we know for a fact that we can support anything that the regular task can. Since the only user of proto_ops->flags was to flag PROTO_CMSG_DATA_ONLY, kill the member and the flag definition too. Signed-off-by: Jens Axboe <[email protected]>
2021-02-23	Merge tag 'keys-misc-20210126' of ↵	Linus Torvalds	4	-4/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring updates from David Howells: "Here's a set of minor keyrings fixes/cleanups that I've collected from various people for the upcoming merge window. A couple of them might, in theory, be visible to userspace: - Make blacklist_vet_description() reject uppercase letters as they don't match the all-lowercase hex string generated for a blacklist search. This may want reconsideration in the future, but, currently, you can't add to the blacklist keyring from userspace and the only source of blacklist keys generates lowercase descriptions. - Fix blacklist_init() to use a new KEY_ALLOC_* flag to indicate that it wants KEY_FLAG_KEEP to be set rather than passing KEY_FLAG_KEEP into keyring_alloc() as KEY_FLAG_KEEP isn't a valid alloc flag. This isn't currently a problem as the blacklist keyring isn't currently writable by userspace. The rest of the patches are cleanups and I don't think they should have any visible effect" * tag 'keys-misc-20210126' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: watch_queue: rectify kernel-doc for init_watch() certs: Replace K{U,G}IDT_INIT() with GLOBAL_ROOT_{U,G}ID certs: Fix blacklist flag type confusion PKCS#7: Fix missing include certs: Fix blacklisted hexadecimal hash string check certs/blacklist: fix kernel doc interface issue crypto: public_key: Remove redundant header file from public_key.h keys: remove trailing semicolon in macro definition crypto: pkcs7: Use match_string() helper to simplify the code PKCS#7: drop function from kernel-doc pkcs7_validate_trust_one encrypted-keys: Replace HTTP links with HTTPS ones crypto: asymmetric_keys: fix some comments in pkcs7_parser.h KEYS: remove redundant memset security: keys: delete repeated words in comments KEYS: asymmetric: Fix kerneldoc security/keys: use kvfree_sensitive() watch_queue: Drop references to /dev/watch_queue keys: Remove outdated __user annotations security: keys: Fix 
fall-through warnings for Clang
2021-02-24	s390/cpumf: Add support for complete counter set extraction	Thomas Richter	1	-0/+1
	Add support to the CPU Measurement counter facility device driver to extract complete counter sets per CPU and per counter set from user space. This includes a new device named /dev/hwctr and support for the device driver functions open, close and ioctl. Other functions are not supported. The ioctl command supports 3 subcommands:
	S390_HWCTR_START: enables counter sets on a list of CPUs.
	S390_HWCTR_STOP: disables counter sets on a list of CPUs.
	S390_HWCTR_READ: reads counter sets on a list of CPUs.
	The ioctl(..., S390_HWCTR_READ, ...) is the only subcommand which returns data. It requires member data_bytes to be positive; the member indicates the maximum amount of data available to store counter set data. The other ioctl() subcommands do not use this member and it should be set to zero. The S390_HWCTR_READ subcommand returns the following data: The cpuset data is flattened using the following scheme, stored in member data:
	0x0       0x8   0xc       0x10  0x14      0x18  0x20  0x28         0xU-1
	+---------+-----+---------+-----+---------+-----+-----+------+------+
	| no_cpus | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
	+---------+-----+---------+-----+---------+-----+-----+------+------+
	0xU   0xU+4     0xU+8 0xU+10             0xV-1
	+-----+---------+-----+-----+------+------+
	| set | no_cnts | cv1 | cv2 | .... | cv_n |
	+-----+---------+-----+-----+------+------+
	0xV   0xV+4     0xV+8 0xV+c
	+-----+---------+-----+---------+-----+-----+------+------+
	| cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n |
	+-----+---------+-----+---------+-----+-----+------+------+
	U and V denote arbitrary hexadecimal addresses. The first integer represents the number of CPUs data was extracted from. This is followed by the CPU number and the number of counter sets extracted. These are two integer values. This is followed by the set identifier and the number of counters extracted. These are two integer values. This is followed by the counter values; each element is eight bytes in size.
	The S390_HWCTR_READ ioctl subcommand is also limited to one call per minute. This ensures that an application does not read out the counter sets too often and reduce the overall CPU performance. The complete counter set extraction is an expensive operation. Reviewed-by: Sumanth Korikkar <[email protected]> Signed-off-by: Thomas Richter <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
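The flattened layout described in this commit message can be walked with a short parser. The sketch below is not the driver's code: the field widths are an assumption inferred from the offsets in the diagram (an 8-byte no_cpus, 4-byte cpu, no_sets, set and no_cnts fields, 8-byte counter values), big-endian byte order is assumed because s390 is big-endian, and the function name is hypothetical.

```python
import struct


def parse_ctrset_read(buf: bytes):
    """Return {cpu: {set_id: [counter values]}} from a flattened buffer.

    Assumed layout (inferred from the offsets in the commit message):
      u64 no_cpus, then per CPU: u32 cpu, u32 no_sets,
      then per set: u32 set, u32 no_cnts, followed by no_cnts u64 values.
    """
    offset = 0
    (no_cpus,) = struct.unpack_from(">Q", buf, offset)
    offset += 8
    result = {}
    for _ in range(no_cpus):
        cpu, no_sets = struct.unpack_from(">II", buf, offset)
        offset += 8
        sets = {}
        for _ in range(no_sets):
            set_id, no_cnts = struct.unpack_from(">II", buf, offset)
            offset += 8
            values = struct.unpack_from(f">{no_cnts}Q", buf, offset)
            offset += 8 * no_cnts
            sets[set_id] = list(values)
        result[cpu] = sets
    return result


if __name__ == "__main__":
    # Hypothetical buffer: one CPU (number 0) with one set (id 1), two counters.
    sample = (struct.pack(">Q", 1) + struct.pack(">II", 0, 1)
              + struct.pack(">II", 1, 2) + struct.pack(">2Q", 10, 20))
    print(parse_ctrset_read(sample))
```

Because each set carries its own counter count, the parser never needs to know the set sizes in advance; it simply advances by the lengths it reads.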