aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2022-01-15include/linux/gfp.h: further document GFP_DMA32Miles Chen1-1/+3
kmalloc(..., GFP_DMA32) does not return DMA32 memory because the DMA32 kmalloc cache array is not implemented. (Reason: there is no such user in kernel). Put a short comment about this so people can understand this by reading the comment. [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miles Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: drop node from alloc_pages_vmaMichal Hocko1-4/+4
alloc_pages_vma is meant to allocate a page with a vma specific memory policy. The initial node parameter is always a local node so it is pointless to waste a function argument for this. Drop the parameter. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Feng Tang <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Dan Williams <[email protected]> Cc: "Huang, Ying" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: fix boolreturn.cocci warningChangcheng Deng1-1/+1
Return statements in functions returning bool should use true/false instead of 1/0. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Changcheng Deng <[email protected]> Reported-by: Zeal Robot <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: introduce memalloc_retry_wait()NeilBrown1-0/+26
Various places in the kernel - largely in filesystems - respond to a memory allocation failure by looping around and re-trying. Some of these cannot conveniently use __GFP_NOFAIL, for reasons such as: - a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on - a need to check for the process being signalled between failures - the possibility that other recovery actions could be performed - the allocation is quite deep in support code, and passing down an extra flag to say if __GFP_NOFAIL is wanted would be clumsy. Many of these currently use congestion_wait() which (in almost all cases) simply waits the given timeout - congestion isn't tracked for most devices. It isn't clear what the best delay is for loops, but it is clear that the various filesystems shouldn't be responsible for choosing a timeout. This patch introduces memalloc_retry_wait() with takes on that responsibility. Code that wants to retry a memory allocation can call this function passing the GFP flags that were used. It will wait however is appropriate. For now, it only considers __GFP_NORETRY and whatever gfpflags_allow_blocking() tests. If blocking is allowed without __GFP_NORETRY, then alloc_page either made some reclaim progress, or waited for a while, before failing. So there is no need for much further waiting. memalloc_retry_wait() will wait until the current jiffie ends. If this condition is not met, then alloc_page() won't have waited much if at all. In that case memalloc_retry_wait() waits about 200ms. This is the delay that most current loops uses. linux/sched/mm.h needs to be included in some files now, but linux/backing-dev.h does not. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: NeilBrown <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Chao Yu <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Chuck Lever <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: allow !GFP_KERNEL allocations for kvmallocMichal Hocko1-1/+0
Support for GFP_NO{FS,IO} and __GFP_NOFAIL has been implemented by previous patches so we can allow the support for kvmalloc. This will allow some external users to simplify or completely remove their helpers. GFP_NOWAIT semantic hasn't been supported so far but it hasn't been explicitly documented so let's add a note about that. ceph_kvmalloc is the first helper to be dropped and changed to kvmalloc. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reviewed-by: Uladzislau Rezki (Sony) <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ilya Dryomov <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Neil Brown <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: remove the total_mapcount argument from page_trans_huge_mapcount()Matthew Wilcox (Oracle)2-8/+4
All callers pass NULL, so we can stop calculating the value we would store in it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: William Kucharski <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: remove last argument of reuse_swap_page()Matthew Wilcox (Oracle)1-3/+3
None of the callers care about the total_map_swapcount() any more. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Linus Torvalds <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: page table checkPasha Tatashin1-0/+147
Check user page table entries at the time they are added and removed. Allows to synchronously catch memory corruption issues related to double mapping. When a pte for an anonymous page is added into page table, we verify that this pte does not already point to a file backed page, and vice versa if this is a file backed page that is being added we verify that this page does not have an anonymous mapping We also enforce that read-only sharing for anonymous pages is allowed (i.e. cow after fork). All other sharing must be for file pages. Page table check allows to protect and debug cases where "struct page" metadata became corrupted for some reason. For example, when refcnt or mapcount become invalid. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pasha Tatashin <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Greg Thelen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kees Cook <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Muchun Song <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Xu <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: ptep_clear() page table helperPasha Tatashin1-0/+8
We have ptep_get_and_clear() and ptep_get_and_clear_full() helpers to clear PTE from user page tables, but there is no variant for simple clear of a present PTE from user page tables without using a low level pte_clear() which can be either native or para-virtualised. Add a new ptep_clear() that can be used in common code to clear PTEs from page table. We will need this call later in order to add a hook for page table check. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pasha Tatashin <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Greg Thelen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kees Cook <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Muchun Song <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Xu <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: document locking restrictions for vm_operations_struct::closeSuren Baghdasaryan1-0/+4
Add comments for vm_operations_struct::close documenting locking requirements for this callback and its callers. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Jan Engelhardt <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tim Murray <[email protected]> Cc: Jason Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: move tlb_flush_pending inline helpers to mm_inline.hArnd Bergmann3-129/+131
linux/mm_types.h should only define structure definitions, to make it cheap to include elsewhere. The atomic_t helper function definitions are particularly large, so it's better to move the helpers using those into the existing linux/mm_inline.h and only include that where needed. As a follow-up, we may want to go through all the indirect includes in mm_types.h and reduce them as much as possible. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Cc: Al Viro <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Colin Cross <[email protected]> Cc: Kees Cook <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Eric Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: move anon_vma declarations to linux/mm_inline.hArnd Bergmann2-48/+50
The patch to add anonymous vma names causes a build failure in some configurations: include/linux/mm_types.h: In function 'is_same_vma_anon_name': include/linux/mm_types.h:924:37: error: implicit declaration of function 'strcmp' [-Werror=implicit-function-declaration] 924 | return name && vma_name && !strcmp(name, vma_name); | ^~~~~~ include/linux/mm_types.h:22:1: note: 'strcmp' is defined in header '<string.h>'; did you forget to '#include <string.h>'? This should not really be part of linux/mm_types.h in the first place, as that header is meant to only contain structure defintions and need a minimum set of indirect includes itself. While the header clearly includes more than it should at this point, let's not make it worse by including string.h as well, which would pull in the expensive (compile-speed wise) fortify-string logic. Move the new functions into a separate header that only needs to be included in a couple of locations. Link: https://lkml.kernel.org/r/[email protected] Fixes: "mm: add a field to store names for private anonymous memory" Signed-off-by: Arnd Bergmann <[email protected]> Cc: Al Viro <[email protected]> Cc: Colin Cross <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Kees Cook <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: add anonymous vma name refcountingSuren Baghdasaryan1-1/+8
While forking a process with high number (64K) of named anonymous vmas the overhead caused by strdup() is noticeable. Experiments with ARM64 Android device show up to 40% performance regression when forking a process with 64k unpopulated anonymous vmas using the max name lengths vs the same process with the same number of anonymous vmas having no name. Introduce anon_vma_name refcounted structure to avoid the overhead of copying vma names during fork() and when splitting named anonymous vmas. When a vma is duplicated, instead of copying the name we increment the refcount of this structure. Multiple vmas can point to the same anon_vma_name as long as they increment the refcount. The name member of anon_vma_name structure is assigned at structure allocation time and is never changed. If vma name changes then the refcount of the original structure is dropped, a new anon_vma_name structure is allocated to hold the new name and the vma pointer is updated to point to the new structure. With this approach the fork() performance regressions is reduced 3-4x times and with usecases using more reasonable number of VMAs (a few thousand) the regressions is not measurable. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Colin Cross <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jan Glauber <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Stultz <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rob Landley <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Shaohua Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: add a field to store names for private anonymous memoryColin Cross2-5/+72
In many userspace applications, and especially in VM based applications like Android uses heavily, there are multiple different allocators in use. At a minimum there is libc malloc and the stack, and in many cases there are libc malloc, the stack, direct syscalls to mmap anonymous memory, and multiple VM heaps (one for small objects, one for big objects, etc.). Each of these layers usually has its own tools to inspect its usage; malloc by compiling a debug version, the VM through heap inspection tools, and for direct syscalls there is usually no way to track them. On Android we heavily use a set of tools that use an extended version of the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped in userspace and slice their usage by process, shared (COW) vs. unique mappings, backing, etc. This can account for real physical memory usage even in cases like fork without exec (which Android uses heavily to share as many private COW pages as possible between processes), Kernel SamePage Merging, and clean zero pages. It produces a measurement of the pages that only exist in that process (USS, for unique), and a measurement of the physical memory usage of that process with the cost of shared pages being evenly split between processes that share them (PSS). If all anonymous memory is indistinguishable then figuring out the real physical memory usage (PSS) of each heap requires either a pagemap walking tool that can understand the heap debugging of every layer, or for every layer's heap debugging tools to implement the pagemap walking logic, in which case it is hard to get a consistent view of memory across the whole system. Tracking the information in userspace leads to all sorts of problems. It either needs to be stored inside the process, which means every process has to have an API to export its current heap information upon request, or it has to be stored externally in a filesystem that somebody needs to clean up on crashes. It needs to be readable while the process is still running, so it has to have some sort of synchronization with every layer of userspace. Efficiently tracking the ranges requires reimplementing something like the kernel vma trees, and linking to it from every layer of userspace. It requires more memory, more syscalls, more runtime cost, and more complexity to separately track regions that the kernel is already tracking. This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a userspace-provided name for anonymous vmas. The names of named anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>]. Userspace can set the name for a region of memory by calling prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name) Setting the name to NULL clears it. The name length limit is 80 bytes including NUL-terminator and is checked to contain only printable ascii characters (including space), except '[',']','\','$' and '`'. Ascii strings are being used to have a descriptive identifiers for vmas, which can be understood by the users reading /proc/pid/maps or /proc/pid/smaps. Names can be standardized for a given system and they can include some variable parts such as the name of the allocator or a library, tid of the thread using it, etc. The name is stored in a pointer in the shared union in vm_area_struct that points to a null terminated string. Anonymous vmas with the same name (equivalent strings) and are otherwise mergeable will be merged. The name pointers are not shared between vmas even if they contain the same name. The name pointer is stored in a union with fields that are only used on file-backed mappings, so it does not increase memory usage. CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this feature. It keeps the feature disabled by default to prevent any additional memory overhead and to avoid confusing procfs parsers on systems which are not ready to support named anonymous vmas. The patch is based on the original patch developed by Colin Cross, more specifically on its latest version [1] posted upstream by Sumit Semwal. It used a userspace pointer to store vma names. In that design, name pointers could be shared between vmas. However during the last upstreaming attempt, Kees Cook raised concerns [2] about this approach and suggested to copy the name into kernel memory space, perform validity checks [3] and store as a string referenced from vm_area_struct. One big concern is about fork() performance which would need to strdup anonymous vma names. Dave Hansen suggested experimenting with worst-case scenario of forking a process with 64k vmas having longest possible names [4]. I ran this experiment on an ARM64 Android device and recorded a worst-case regression of almost 40% when forking such a process. This regression is addressed in the followup patch which replaces the pointer to a name with a refcounted structure that allows sharing the name pointer between vmas of the same name. Instead of duplicating the string during fork() or when splitting a vma it increments the refcount. [1] https://lore.kernel.org/linux-mm/[email protected]/ [2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/ [3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/ [4] https://lore.kernel.org/linux-mm/[email protected]/ Changes for prctl(2) manual page (in the options section): PR_SET_VMA Sets an attribute specified in arg2 for virtual memory areas starting from the address specified in arg3 and spanning the size specified in arg4. arg5 specifies the value of the attribute to be set. Note that assigning an attribute to a virtual memory area might prevent it from being merged with adjacent virtual memory areas due to the difference in that attribute's value. Currently, arg2 must be one of: PR_SET_VMA_ANON_NAME Set a name for anonymous virtual memory areas. arg5 should be a pointer to a null-terminated string containing the name. The name length including null byte cannot exceed 80 bytes. If arg5 is NULL, the name of the appropriate anonymous virtual memory areas will be reset. The name can contain only printable ascii characters (including space), except '[',']','\','$' and '`'. This feature is available only if the kernel is built with the CONFIG_ANON_VMA_NAME option enabled. [[email protected]: docs: proc.rst: /proc/PID/maps: fix malformed table] Link: https://lkml.kernel.org/r/[email protected] [surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy, added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the work here was done by Colin Cross, therefore, with his permission, keeping him as the author] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Cross <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Al Viro <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jan Glauber <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Stultz <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rob Landley <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Shaohua Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15memcg: add per-memcg vmalloc statShakeel Butt1-0/+21
The kvmalloc* allocation functions can fallback to vmalloc allocations and more often on long running machines. In addition the kernel does have __GFP_ACCOUNT kvmalloc* calls. So, often on long running machines, the memory.stat does not tell the complete picture which type of memory is charged to the memcg. So add a per-memcg vmalloc stat. [[email protected]: page_memcg() within rcu lock, per Muchun] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: remove cast, per Muchun] [[email protected]: remove area->page[0] checks and move to page by page accounting per Michal] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/memcg: add oom_group_kill memory eventDan Schatzberg1-0/+1
Our container agent wants to know when a container exits if it was OOM killed or not to report to the user. We use memory.oom.group = 1 to ensure that OOM kills within the container's cgroup kill everything. Existing memory.events are insufficient for knowing if this triggered: 1) Our current approach reads memory.events oom_kill and reports the container was killed if the value is non-zero. This is erroneous in some cases where containers create their children cgroups with memory.oom.group=1 as such OOM kills will get counted against the container cgroup's oom_kill counter despite not actually OOM killing the entire container. 2) Reading memory.events.local will fail to identify OOM kills in leaf cgroups (that don't set memory.oom.group) within the container cgroup. This patch adds a new oom_group_kill event when memory.oom.group triggers to allow userspace to cleanly identify when an entire cgroup is oom killed. [[email protected]: changes from Johannes and Chris] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dan Schatzberg <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Chris Down <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Zefan Li <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: Alex Shi <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm,fs: split dump_mapping() out from dump_page()Matthew Wilcox (Oracle)1-0/+1
dump_mapping() is a big chunk of dump_page(), and it'd be handy to be able to call it when we don't have a struct page. Split it out and move it to fs/inode.c. Take the opportunity to simplify some of the debug messages a little. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: William Kucharski <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/memremap: add ZONE_DEVICE support for compound pagesJoao Martins1-0/+11
Add a new @vmemmap_shift property for struct dev_pagemap which specifies that a devmap is composed of a set of compound pages of order @vmemmap_shift, instead of base pages. When a compound page devmap is requested, all but the first page are initialised as tail pages instead of order-0 pages. For certain ZONE_DEVICE users like device-dax which have a fixed page size, this creates an opportunity to optimize GUP and GUP-fast walkers, treating it the same way as THP or hugetlb pages. Additionally, commit 7118fc2906e2 ("hugetlb: address ref count racing in prep_compound_gigantic_page") removed set_page_count() because the setting of page ref count to zero was redundant. devmap pages don't come from page allocator though and only head page refcount is used for compound pages, hence initialize tail page count to zero. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Verma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: defer kmemleak object creation of module_alloc()Kefeng Wang2-2/+9
Yongqiang reports a kmemleak panic when module insmod/rmmod with KASAN enabled(without KASAN_VMALLOC) on x86[1]. When the module area allocates memory, it's kmemleak_object is created successfully, but the KASAN shadow memory of module allocation is not ready, so when kmemleak scan the module's pointer, it will panic due to no shadow memory with KASAN check. module_alloc __vmalloc_node_range kmemleak_vmalloc kmemleak_scan update_checksum kasan_module_alloc kmemleak_ignore Note, there is no problem if KASAN_VMALLOC enabled, the modules area entire shadow memory is preallocated. Thus, the bug only exits on ARCH which supports dynamic allocation of module area per module load, for now, only x86/arm64/s390 are involved. Add a VM_DEFER_KMEMLEAK flags, defer vmalloc'ed object register of kmemleak in module_alloc() to fix this issue. [1] https://lore.kernel.org/all/[email protected]/ [[email protected]: fix build] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: simplify ifdefs, per Andrey] Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com Link: https://lkml.kernel.org/r/[email protected] Fixes: 793213a82de4 ("s390/kasan: dynamic shadow mem allocation for modules") Fixes: 39d114ddc682 ("arm64: add KASAN support") Fixes: bebf56a1b176 ("kasan: enable instrumentation of global variables") Signed-off-by: Kefeng Wang <[email protected]> Reported-by: Yongqiang Liu <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Kefeng Wang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15kthread: add the helper function kthread_run_on_cpu()Cai Huoqing1-0/+25
Add a new helper function kthread_run_on_cpu(), which includes kthread_create_on_cpu/wake_up_process(). In some cases, use kthread_run_on_cpu() directly instead of kthread_create_on_node/kthread_bind/wake_up_process() or kthread_create_on_cpu/wake_up_process() or kthreadd_create/kthread_bind/wake_up_process() to simplify the code. [[email protected]: export kthread_create_on_cpu to modules] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Cai Huoqing <[email protected]> Cc: Bernard Metzler <[email protected]> Cc: Cai Huoqing <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Doug Ledford <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-14vdpa: Avoid taking cf_mutex lock on get statusEli Cohen1-1/+0
Avoid the wrapper holding cf_mutex since it is not protecting anything. To avoid confusion and unnecessary overhead incurred by it, remove. Fixes: f489f27bc0ab ("vdpa: Sync calls set/get config/status with cf_mutex") Signed-off-by: Eli Cohen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Si-Wei Liu<[email protected]> Acked-by: Jason Wang <[email protected]>
2022-01-14vdpa: Support reporting max device capabilitiesEli Cohen1-0/+2
Add max_supported_vqs and supported_features fields to struct vdpa_mgmt_dev. Upstream drivers need to feel these values according to the device capabilities. These values are reported back in a netlink message when showing management devices. Examples: $ auxiliary/mlx5_core.sf.1: supported_classes net max_supported_vqs 257 dev_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ MQ \ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM $ vdpa -j mgmtdev show {"mgmtdev":{"auxiliary/mlx5_core.sf.1":{"supported_classes":["net"], \ "max_supported_vqs":257,"dev_features":["CSUM","GUEST_CSUM","MTU", \ "HOST_TSO4","HOST_TSO6","STATUS","CTRL_VQ","MQ","CTRL_MAC_ADDR", \ "VERSION_1","ACCESS_PLATFORM"]}}} $ vdpa -jp mgmtdev show { "mgmtdev": { "auxiliary/mlx5_core.sf.1": { "supported_classes": [ "net" ], "max_supported_vqs": 257, "dev_features": ["CSUM","GUEST_CSUM","MTU","HOST_TSO4", \ "HOST_TSO6","STATUS","CTRL_VQ","MQ", \ "CTRL_MAC_ADDR","VERSION_1","ACCESS_PLATFORM"] } } } Signed-off-by: Eli Cohen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Si-Wei Liu<[email protected]>
2022-01-14vdpa: Allow to configure max data virtqueuesEli Cohen1-3/+16
Add netlink support to configure the max virtqueue pairs for a device. At least one pair is required. The maximum is dictated by the device. Example: $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 4 Signed-off-by: Eli Cohen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-01-14vdpa: Sync calls set/get config/status with cf_mutexEli Cohen1-0/+3
Add wrappers to get/set status and protect these operations with cf_mutex to serialize these operations with respect to get/set config operations. Signed-off-by: Eli Cohen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-01-14vdpa: Provide interface to read driver featuresEli Cohen1-5/+9
Provide an interface to read the negotiated features. This is needed when building the netlink message in vdpa_dev_net_config_fill(). Also fix the implementation of vdpa_dev_net_config_fill() to use the negotiated features instead of the device features. To make APIs clearer, make the following name changes to struct vdpa_config_ops so they better describe their operations: get_features -> get_device_features set_features -> set_driver_features Finally, add get_driver_features to return the negotiated features and add implementation to all the upstream drivers. Acked-by: Jason Wang <[email protected]> Signed-off-by: Eli Cohen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-01-14vdpa: Mark vdpa_config_ops.get_vq_notification as optionalEugenio Pérez1-1/+1
Since vhost_vdpa_mmap checks for its existence before calling it. Signed-off-by: Eugenio Pérez <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]>
2022-01-14vdpa: add driver_override supportStefano Garzarella1-0/+2
`driver_override` allows to control which of the vDPA bus drivers binds to a vDPA device. If `driver_override` is not set, the previous behaviour is followed: devices use the first vDPA bus driver loaded (unless auto binding is disabled). Tested on Fedora 34 with driverctl(8): $ modprobe virtio-vdpa $ modprobe vhost-vdpa $ modprobe vdpa-sim-net $ vdpa dev add mgmtdev vdpasim_net name dev1 # dev1 is attached to the first vDPA bus driver loaded $ driverctl -b vdpa list-devices dev1 virtio_vdpa $ driverctl -b vdpa set-override dev1 vhost_vdpa $ driverctl -b vdpa list-devices dev1 vhost_vdpa [*] Note: driverctl(8) integrates with udev so the binding is preserved. Suggested-by: Jason Wang <[email protected]> Acked-by: Jason Wang <[email protected]> Signed-off-by: Stefano Garzarella <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-01-14virtio: wrap config->reset callsMichael S. Tsirkin1-0/+1
This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2022-01-14Merge tag 'char-misc-5.17-rc1' of ↵Linus Torvalds23-194/+1727
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc and other driver updates from Greg KH: "Here is the large set of char, misc, and other "small" driver subsystem changes for 5.17-rc1. Lots of different things are in here for char/misc drivers such as: - habanalabs driver updates - mei driver updates - lkdtm driver updates - vmw_vmci driver updates - android binder driver updates - other small char/misc driver updates Also smaller driver subsystems have also been updated, including: - fpga subsystem updates - iio subsystem updates - soundwire subsystem updates - extcon subsystem updates - gnss subsystem updates - phy subsystem updates - coresight subsystem updates - firmware subsystem updates - comedi subsystem updates - mhi subsystem updates - speakup subsystem updates - rapidio subsystem updates - spmi subsystem updates - virtual driver updates - counter subsystem updates Too many individual changes to summarize, the shortlog contains the full details. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (406 commits) counter: 104-quad-8: Fix use-after-free by quad8_irq_handler dt-bindings: mux: Document mux-states property dt-bindings: ti-serdes-mux: Add defines for J721S2 SoC counter: remove old and now unused registration API counter: ti-eqep: Convert to new counter registration counter: stm32-lptimer-cnt: Convert to new counter registration counter: stm32-timer-cnt: Convert to new counter registration counter: microchip-tcb-capture: Convert to new counter registration counter: ftm-quaddec: Convert to new counter registration counter: intel-qep: Convert to new counter registration counter: interrupt-cnt: Convert to new counter registration counter: 104-quad-8: Convert to new counter registration counter: Update documentation for new counter registration functions counter: Provide alternative counter registration functions counter: stm32-timer-cnt: Convert to counter_priv() wrapper counter: stm32-lptimer-cnt: Convert to counter_priv() wrapper counter: ti-eqep: Convert to counter_priv() wrapper counter: ftm-quaddec: Convert to counter_priv() wrapper counter: intel-qep: Convert to counter_priv() wrapper counter: microchip-tcb-capture: Convert to counter_priv() wrapper ...
2022-01-14Merge tag 'powerpc-5.17-1' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Optimise radix KVM guest entry/exit by 2x on Power9/Power10. - Allow firmware to tell us whether to disable the entry and uaccess flushes on Power10 or later CPUs. - Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits. - Several fixes and improvements to our hard lockup watchdog. - Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit. - Allow building the 64-bit Book3S kernel without hash MMU support, ie. Radix only. - Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit). - Add new encodings for perf_mem_data_src.mem_hops field, and use them on Power10. - A series of small performance improvements to 64-bit interrupt entry. - Several commits fixing issues when building with the clang integrated assembler. - Many other small features and fixes. Thanks to Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell, Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christophe JAILLET, Christophe Leroy, Christoph Hellwig, Daniel Axtens, David Yang, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren, Hari Bathini, Jason Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child, Oliver O'Halloran, Peiwei Hu, Randy Dunlap, Ravi Bangoria, Rob Herring, Russell Currey, Sachin Sant, Sean Christopherson, Segher Boessenkool, Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang wangx, and Yang Guang. * tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (240 commits) powerpc/xmon: Dump XIVE information for online-only processors. powerpc/opal: use default_groups in kobj_type powerpc/cacheinfo: use default_groups in kobj_type powerpc/sched: Remove unused TASK_SIZE_OF powerpc/xive: Add missing null check after calling kmalloc powerpc/floppy: Remove usage of the deprecated "pci-dma-compat.h" API selftests/powerpc: Add a test of sigreturning to an unaligned address powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings powerpc/64s: Mask NIP before checking against SRR0 powerpc/perf: Fix spelling of "its" powerpc/32: Fix boot failure with GCC latent entropy plugin powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests powerpc/code-patching: Move code patching selftests in its own file powerpc/code-patching: Move instr_is_branch_{i/b}form() in code-patching.h powerpc/code-patching: Move patch_exception() outside code-patching.c powerpc/code-patching: Use test_trampoline for prefixed patch test powerpc/code-patching: Fix patch_branch() return on out-of-range failure powerpc/code-patching: Reorganise do_patch_instruction() to ease error handling powerpc/code-patching: Fix unmap_patch_area() error handling powerpc/code-patching: Fix error handling in do_patch_instruction() ...
2022-01-14Merge tag 'sound-5.17-rc1' of ↵Linus Torvalds6-11/+42
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "It's a relatively calm development cycle, but still lots of updates in the driver side like Intel SOF. Below are some highlights: ALSA / ASoC core: - A new kselftest for ALSA control API - PCM NO_REWINDS support - Potential race fixes around control removals - Unify x86 SG-buffer memory allocation code - Cleanups and race fixes for ASoC DPCM locking ASoC: - Refinements and cleanups around the delay() APIs - Wider use of dev_err_probe(). - Continuing cleanups and improvements to the SOF code - Support for pin switches in simple-card derived cards - Support for AMD Renoir ACP, Asahi Kasei Microdevices AKM4375, Intel systems using NAU8825 and MAX98390, Mediatek MT8915, nVidia Tegra20 S/PDIF, Qualcomm systems using ALC5682I-VS and Texas Instruments TLV320ADC3xxx HD-audio / USB-audio: - Fix deadlock at HD-audio codec unbinding - Fixes for Tegra194 HD-audio, new HDA support for CS35L41 codec - Quirks for Lenovo and HP machines, Gigabyte mobo, Bose device Misc: - Fix virmidi drain behavior Note that the merge of CS35L41 codec support is still half-baked, and at least one ACPI change is missing. Although this won't hinder the kernel build itself, we're going to catch up before RC1" * tag 'sound-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (415 commits) ALSA: hda: intel-dsp-config: reorder the config table ALSA: hda: intel-dsp-config: add JasperLake support ALSA: hda: cs35l41: fix double free on error in probe() ALSA: hda: Fix dependencies of CS35L41 on SPI/I2C buses ALSA: hda: Fix dependency on ASoC cs35l41 codec ASoC: cs35l41: Add support for hibernate memory retention mode ASoC: cs35l41: Update handling of test key registers ALSA: intel_hdmi: Check for error num after setting mask ASoC: wcd9335: Keep a RX port value for each SLIM RX mux ASoC: amd: acp: acp-mach: Change default RT1019 amp dev id ALSA: virmidi: Remove duplicated code ALSA: seq: virmidi: Add a drain operation ASoC: topology: Fix typo ASoC: fsl_asrc: refine the check of available clock divider ASoC: Intel: bytcr_rt5640: Add support for external GPIO jack-detect ASoC: Intel: bytcr_rt5640: Support retrieving the codec IRQ from the AMCR0F28 ACPI dev ASoC: rt5640: Add support for boards with an external jack-detect GPIO ASoC: rt5640: Allow snd_soc_component_set_jack() to override the codec IRQ ASoC: rt5640: Change jack_work to a delayed_work ASoC: rt5640: Fix possible NULL pointer deref on resume ...
2022-01-14Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-1/+1
Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, pm80xx, lpfc, mpi3mr, mpt3sas, hisi_sas, libsas) and minor updates and bug fixes. The most impactful change is likely the switch from GFP_DMA to GFP_KERNEL in a bunch of drivers, but even that shouldn't affect too many people" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (121 commits) scsi: mpi3mr: Bump driver version to 8.0.0.61.0 scsi: mpi3mr: Fixes around reply request queues scsi: mpi3mr: Enhanced Task Management Support Reply handling scsi: mpi3mr: Use TM response codes from MPI3 headers scsi: mpi3mr: Add io_uring interface support in I/O-polled mode scsi: mpi3mr: Print cable mngnt and temp threshold events scsi: mpi3mr: Support Prepare for Reset event scsi: mpi3mr: Add Event acknowledgment logic scsi: mpi3mr: Gracefully handle online FW update operation scsi: mpi3mr: Detect async reset that occurred in firmware scsi: mpi3mr: Add IOC reinit function scsi: mpi3mr: Handle offline FW activation in graceful manner scsi: mpi3mr: Code refactor of IOC init - part2 scsi: mpi3mr: Code refactor of IOC init - part1 scsi: mpi3mr: Fault IOC when internal command gets timeout scsi: mpi3mr: Display IOC firmware package version scsi: mpi3mr: Handle unaligned PLL in unmap cmnds scsi: mpi3mr: Increase internal cmnds timeout to 60s scsi: mpi3mr: Do access status validation before adding devices scsi: mpi3mr: Add support for PCIe Managed Switch SES device ...
2022-01-14ata: libata: Rename link flag ATA_LFLAG_NO_DB_DELAYPaul Menzel1-1/+1
Rename the link flag ATA_LFLAG_NO_DB_DELAY to ATA_LFLAG_NO_DEBOUNCE_DELAY. The new name is longer, but clearer. Signed-off-by: Paul Menzel <[email protected]> Signed-off-by: Damien Le Moal <[email protected]>
2022-01-14ata: fix read_id() ata port operation interfaceDamien Le Moal1-2/+3
Drivers that need to tweak a device IDENTIFY data implement the read_id() port operation. The IDENTIFY data buffer is passed as an argument to the read_id() operation for drivers to use. However, when this operation is called, the IDENTIFY data is not yet converted to CPU endian and contains le16 words. Change the interface of the read_id operation to pass a __le16 * pointer to the IDENTIFY data buffer to clarify the buffer endianness. Fix the pata_netcell, pata_it821x, ahci_xgene, ahci_ceva and ahci_brcm drivers implementation of this operation and modify the code to corretly deal with identify data words manipulation to avoid sparse warnings such as: drivers/ata/ahci_xgene.c:262:33: warning: invalid assignment: &= drivers/ata/ahci_xgene.c:262:33: left side has type unsigned short drivers/ata/ahci_xgene.c:262:33: right side has type restricted __le16 Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]>
2022-01-14ata: sata_fsl: fix scsi host initializationDamien Le Moal1-0/+11
When compiling with W=1, the sata_fsl driver compilation throws the warning: drivers/ata/sata_fsl.c:1385:22: error: initialized field overwritten [-Werror=override-init] 1385 | .can_queue = SATA_FSL_QUEUE_DEPTH, This is due to the driver scsi host template initialization overwriting the can_queue field that is already set using the ATA_NCQ_SHT() initializer macro, resulting in the same field being initialized twice in the host template declaration. To remove this warning, introduce the ATA_SUBBASE_SHT_QD() and ATA_NCQ_SHT_QD() initialization macros to allow specifying a queue depth different from the default ATA_DEF_QUEUE using an additional argument to the macro. Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]>
2022-01-13tracing: Account bottom half disabled sections.Sebastian Andrzej Siewior1-0/+1
Disabling only bottom halves via local_bh_disable() disables also preemption but this remains invisible to tracing. On a CONFIG_PREEMPT kernel one might wonder why there is no scheduling happening despite the N flag in the trace. The reason might be the a rcu_read_lock_bh() section. Add a 'b' to the tracing output if in task context with disabled bottom halves. Link: https://lkml.kernel.org/r/YbcbtdtC/[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2022-01-13blk-mq: fix tag_get wait task can't be awakenedLaibin Qiu1-0/+11
In case of shared tags, there might be more than one hctx which allocates from the same tags, and each hctx is limited to allocate at most: hctx_max_depth = max((bt->sb.depth + users - 1) / users, 4U); tag idle detection is lazy, and may be delayed for 30sec, so there could be just one real active hctx(queue) but all others are actually idle and still accounted as active because of the lazy idle detection. Then if wake_batch is > hctx_max_depth, driver tag allocation may wait forever on this real active hctx. Fix this by recalculating wake_batch when inc or dec active_queues. Fixes: 0d2602ca30e41 ("blk-mq: improve support for shared tags maps") Suggested-by: Ming Lei <[email protected]> Suggested-by: John Garry <[email protected]> Signed-off-by: Laibin Qiu <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-01-13Merge tag 'irq-msi-2022-01-13' of ↵Linus Torvalds5-153/+179
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MSI irq updates from Thomas Gleixner: "Rework of the MSI interrupt infrastructure. This is a treewide cleanup and consolidation of MSI interrupt handling in preparation for further changes in this area which are necessary to: - address existing shortcomings in the VFIO area - support the upcoming Interrupt Message Store functionality which decouples the message store from the PCI config/MMIO space" * tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits) genirq/msi: Populate sysfs entry only once PCI/MSI: Unbreak pci_irq_get_affinity() genirq/msi: Convert storage to xarray genirq/msi: Simplify sysfs handling genirq/msi: Add abuse prevention comment to msi header genirq/msi: Mop up old interfaces genirq/msi: Convert to new functions genirq/msi: Make interrupt allocation less convoluted platform-msi: Simplify platform device MSI code platform-msi: Let core code handle MSI descriptors bus: fsl-mc-msi: Simplify MSI descriptor handling soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs() soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation NTB/msi: Convert to msi_on_each_desc() PCI: hv: Rework MSI handling powerpc/mpic_u3msi: Use msi_for_each-desc() powerpc/fsl_msi: Use msi_for_each_desc() powerpc/pasemi/msi: Convert to msi_on_each_dec() powerpc/cell/axon_msi: Convert to msi_on_each_desc() powerpc/4xx/hsta: Rework MSI handling ...
2022-01-13Merge tag 'timers-core-2022-01-13' of ↵Linus Torvalds1-0/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for the time(r) subsystem: Core: - Make the clocksource watchdog more robust by better validation checks of the measurement. Drivers: - New drivers for MStar and SSD20xd SOCs - The usual cleanups and improvements all over the place" * tag 'timers-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dt-bindings: timer: Add Mstar MSC313e timer devicetree bindings documentation clocksource/drivers/msc313e: Add support for ssd20xd-based platforms clocksource/drivers: Add MStar MSC313e timer support clocksource/drivers/pistachio: Fix -Wunused-but-set-variable warning clocksource/drivers/timer-imx-sysctr: Set cpumask to cpu_possible_mask clocksource/drivers/imx-sysctr: Mark two variable with __ro_after_init clocksource/drivers/renesas,ostm: Make RENESAS_OSTM symbol visible clocksource/drivers/renesas-ostm: Add RZ/G2L OSTM support dt-bindings: timer: renesas: ostm: Document Renesas RZ/G2L OSTM clocksource/drivers/exynos_mct: Fix silly typo resulting in checkpatch warning clocksource: Reduce the default clocksource_watchdog() retries to 2 clocksource: Avoid accidental unstable marking of clocksources dt-bindings: timer: tpm-timer: Add imx8ulp compatible string reset: Add of_reset_control_get_optional_exclusive() clocksource/drivers/exynos_mct: Refactor resources allocation dt-bindings: timer: remove rockchip,rk3066-timer compatible string from rockchip,rk-timer.yaml dt-bindings: timer: cadence_ttc: Add power-domains
2022-01-13Merge tag 'irq-core-2022-01-13' of ↵Linus Torvalds3-3/+56
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core: - Provide a new interface for affinity hints to provide a separation between hint and actual affinity change which has become a hidden property of the current interface - Fix up the in tree usage of the affinity hint interfaces Drivers: - No new irqchip drivers! - Fix GICv3 redistributor table reservation with RT across kexec - Fix GICv4.1 redistributor view of the VPE table across kexec - Add support for extra interrupts on spear-shirq - Make obtaining some interrupts optional for the Renesas drivers - Various cleanups and bug fixes" * tag 'irq-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) irqchip/renesas-intc-irqpin: Use platform_get_irq_optional() to get the interrupt irqchip/renesas-irqc: Use platform_get_irq_optional() to get the interrupt irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time irqchip/ingenic-tcu: Use correctly sized arguments for bit field irqchip/gic-v2m: Add const to of_device_id irqchip/imx-gpcv2: Mark imx_gpcv2_instance with __ro_after_init irqchip/spear-shirq: Add support for IRQ 0..6 irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime irqchip/gic-v3-its: Postpone LPI pending table freeing and memreserve irqchip/gic-v3-its: Give the percpu rdist struct its own flags field net/mlx4: Use irq_update_affinity_hint() net/mlx5: Use irq_set_affinity_and_hint() hinic: Use irq_set_affinity_and_hint() scsi: lpfc: Use irq_set_affinity() mailbox: Use irq_update_affinity_hint() ixgbe: Use irq_update_affinity_hint() be2net: Use irq_update_affinity_hint() enic: Use irq_update_affinity_hint() RDMA/irdma: Use irq_update_affinity_hint() scsi: mpt3sas: Use irq_set_affinity_and_hint() ...
2022-01-13Merge branch 'pci/errors'Bjorn Helgaas1-0/+9
- Add PCI_ERROR_RESPONSE and related definitions for signaling and checking for transaction errors on PCI (Naveen Naidu) - Fabricate PCI_ERROR_RESPONSE data (~0) in config read wrappers, instead of in host controller drivers, when transactions fail on PCI (Naveen Naidu) - Use PCI_POSSIBLE_ERROR() to check for possible failure of config reads (Naveen Naidu) * pci/errors: PCI: xgene: Use PCI_ERROR_RESPONSE to identify config read errors PCI: hv: Use PCI_ERROR_RESPONSE to identify config read errors PCI: keystone: Use PCI_ERROR_RESPONSE to identify config read errors PCI: Use PCI_ERROR_RESPONSE to identify config read errors PCI: cpqphp: Use PCI_POSSIBLE_ERROR() to check config reads PCI/PME: Use PCI_POSSIBLE_ERROR() to check config reads PCI/DPC: Use PCI_POSSIBLE_ERROR() to check config reads PCI: pciehp: Use PCI_POSSIBLE_ERROR() to check config reads PCI: vmd: Use PCI_POSSIBLE_ERROR() to check config reads PCI/ERR: Use PCI_POSSIBLE_ERROR() to check config reads PCI: rockchip-host: Drop error data fabrication when config read fails PCI: rcar-host: Drop error data fabrication when config read fails PCI: altera: Drop error data fabrication when config read fails PCI: mvebu: Drop error data fabrication when config read fails PCI: aardvark: Drop error data fabrication when config read fails PCI: kirin: Drop error data fabrication when config read fails PCI: histb: Drop error data fabrication when config read fails PCI: exynos: Drop error data fabrication when config read fails PCI: mediatek: Drop error data fabrication when config read fails PCI: iproc: Drop error data fabrication when config read fails PCI: thunder: Drop error data fabrication when config read fails PCI: Drop error data fabrication when config read fails PCI: Use PCI_SET_ERROR_RESPONSE() for disconnected devices PCI: Set error response data when config read fails PCI: Add PCI_ERROR_RESPONSE and related definitions
2022-01-13Merge branch 'pci/misc'Bjorn Helgaas1-25/+25
- Sort Intel Device IDs by value (Andy Shevchenko) - Change Capability offsets to hex to match spec (Baruch Siach) - Correct misspellings (Krzysztof Wilczyński) - Terminate statement with semicolon in pci_endpoint_test.c (Ming Wang) * pci/misc: misc: pci_endpoint_test: Terminate statement with semicolon PCI: Correct misspelled words PCI: Change capability register offsets to hex PCI: Sort Intel Device IDs by value
2022-01-13Merge branch 'pci/resource'Bjorn Helgaas1-0/+1
- Always write Intel I210 ROM BAR on update to work around device defect (Bjorn Helgaas) * pci/resource: PCI: Work around Intel I210 ROM BAR overlap defect
2022-01-13libceph: rename parse_fsid() to ceph_parse_fsid() and exportVenky Shankar1-0/+1
... as it is too generic. also, use __func__ when logging rather than hardcoding the function name. Signed-off-by: Venky Shankar <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2022-01-13libceph: generalize addr/ip parsing based on delimiterVenky Shankar2-2/+2
... and remove hardcoded function name in ceph_parse_ips(). [ idryomov: delim parameter, drop CEPH_ADDR_PARSE_DEFAULT_DELIM ] Signed-off-by: Venky Shankar <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2022-01-12Merge tag 'clk-for-linus' of ↵Linus Torvalds4-16/+35
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "We have a couple patches in the framework core this time around but they're mostly minor cleanups and some debugfs stuff. The real work that's in here is the typical pile of clk driver updates and new SoC support. Per usual (or maybe just recent trends), Qualcomm gains a handful of SoC drivers additions and has the largest diffstat. After that there are quite a few updates to the Allwinner (sunxi) drivers to support modular drivers and Renesas is heavily updated to add more support for various clks. Overall it looks pretty normal. New Drivers: - Add MDMA and BDMA clks to Ingenic JZ4760 and JZ4770 - MediaTek mt7986 SoC basic support - Clock and reset driver for Toshiba Visconti SoCs - Initial clock driver for the Exynos7885 SoC (Samsung Galaxy A8) - Allwinner D1 clks - Lan966x Generic Clock Controller driver and associated DT bindings - Qualcomm SDX65, SM8450, and MSM8976 GCC clks - Qualcomm SDX65 and SM8450 RPMh clks Updates: - Set suppress_bind_attrs to true for i.MX8ULP driver - Switch from do_div to div64_ul for throughout all i.MX drivers - Fix imx8mn_clko1_sels for i.MX8MN - Remove unused IPG_AUDIO_ROOT from i.MX8MP - Switch parent for audio_root_clk to audio ahb in i.MX8MP driver - Removal of all remaining uses of __clk_lookup() in drivers/clk/samsung - Refactoring of the CPU clocks registration to use common interface - An update of the Exynos850 driver (support for more clock domains) required by the E850-96 development board - Prep for runtime PM and generic power domains on Tegra - Support modular Allwinner clk drivers via platform bus - Lan966x clock driver extended to support clock gating - Add serial (SCI1), watchdog (WDT), timer (OSTM), SPI (RSPI), and thermal (TSU) clocks and resets on Renesas RZ/G2L - Rework SDHI clock handling in the Renesas R-Car Gen3 and RZ/G2 clock drivers, and in the Renesas SDHI driver - Make the Cortex-A55 (I) clock on Renesas RZ/G2L programmable - Document support for the new Renesas R-Car S4-8 (R8A779F0) SoC - Add support for the new Renesas R-Car S4-8 (R8A779F0) SoC - Add GPU clock and resets on Renesas RZ/G2L - Add clk-provider.h to various Qualcomm clk drivers - devm version of clk_hw_register_gate() - kerneldoc fixes in a couple drivers" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (131 commits) clk: visconti: Remove pointless NULL check in visconti_pll_add_lookup() clk: mediatek: add mt7986 clock support clk: mediatek: add mt7986 clock IDs dt-bindings: clock: mediatek: document clk bindings for mediatek mt7986 SoC clk: mediatek: clk-gate: Use regmap_{set/clear}_bits helpers clk: mediatek: clk-gate: Shrink by adding clockgating bit check helper clk: x86: Fix clk_gate_flags for RV_CLK_GATE clk: x86: Use dynamic con_id string during clk registration ACPI: APD: Add a fmw property clk-name drivers: acpi: acpi_apd: Remove unused device property "is-rv" x86: clk: clk-fch: Add support for newer family of AMD's SOC clk: ingenic: Add MDMA and BDMA clocks dt-bindings: clk/ingenic: Add MDMA and BDMA clocks clk: bm1880: remove kfrees on static allocations clk: Drop unused COMMON_CLK_STM32MP157_SCMI config clk: st: clkgen-mux: search reg within node or parent clk: st: clkgen-fsyn: search reg within node or parent clk: Enable/Disable runtime PM for clk_summary MAINTAINERS: Add entries for Toshiba Visconti PLL and clock controller clk: visconti: Add support common clock driver and reset driver ...
2022-01-12Merge tag 'devicetree-for-5.17' of ↵Linus Torvalds1-6/+5
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "Bindings: - DT schema conversions for Samsung clocks, RNG bindings, Qcom Command DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctl, Tegra I2C and BPMP, pwm-vibrator, Arm DSU, and Cadence macb - DT schema conversions for Broadcom platforms: interrupt controllers, STB GPIO, STB waketimer, STB reset, iProc MDIO mux, iProc PCIe, Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON, SYSTEMPORT, AMAC, Northstar 2 PCIe PHY, GENET, moca PHY, GISB arbiter, and SATA - Add binding schemas for Tegra210 EMC table, TI DC-DC converters, - Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues - More fixes due to 'unevaluatedProperties' enabling - Data type fixes and clean-ups of binding examples found in preparation to move to validating DTB files directly (instead of intermediate YAML representation. - Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus - Add various new compatible strings DT core: - Silence a warning for overlapping reserved memory regions - Reimplement unittest overlay tracking - Fix stack frame size warning in unittest - Clean-ups of early FDT scanning functions - Fix handling of "linux,usable-memory-range" on EFI booted systems - Add support for 'fail' status on CPU nodes - Improve error message in of_phandle_iterator_next() - kbuild: Disable duplicate unit-address warnings for disabled nodes" * tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (114 commits) dt-bindings: net: mdio: Drop resets/reset-names child properties dt-bindings: clock: samsung: convert S5Pv210 to dtschema dt-bindings: clock: samsung: convert Exynos5410 to dtschema dt-bindings: clock: samsung: convert Exynos5260 to dtschema dt-bindings: clock: samsung: extend Exynos7 bindings with UFS dt-bindings: clock: samsung: convert Exynos7 to dtschema dt-bindings: clock: samsung: convert Exynos5433 to dtschema dt-bindings: i2c: maxim,max96712: Add bindings for Maxim Integrated MAX96712 dt-bindings: iio: adi,ltc2983: Fix 64-bit property sizes dt-bindings: power: maxim,max17040: Fix incorrect type for 'maxim,rcomp' dt-bindings: interrupt-controller: arm,gic-v3: Fix 'interrupts' cell size in example dt-bindings: iio/magnetometer: yamaha,yas530: Fix invalid 'interrupts' in example dt-bindings: clock: imx5: Drop clock consumer node from example dt-bindings: Drop required 'interrupt-parent' dt-bindings: net: ti,dp83869: Drop value on boolean 'ti,max-output-impedance' dt-bindings: net: wireless: mt76: Fix 8-bit property sizes dt-bindings: PCI: snps,dw-pcie-ep: Drop conflicting 'max-functions' schema dt-bindings: i2c: st,stm32-i2c: Make each example a separate entry dt-bindings: net: stm32-dwmac: Make each example a separate entry dt-bindings: net: Cleanup MDIO node schemas ...
2022-01-12Merge tag 'x86_core_for_v5.17_rc1' of ↵Linus Torvalds1-1/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Get rid of all the .fixup sections because this generates misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and LIVEPATCH as the backtrace misses the function which is being fixed up. - Add Straight Line Speculation mitigation support which uses a new compiler switch -mharden-sls= which sticks an INT3 after a RET or an indirect branch in order to block speculation after them. Reportedly, CPUs do speculate behind such insns. - The usual set of cleanups and improvements * tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) x86/entry_32: Fix segment exceptions objtool: Remove .fixup handling x86: Remove .fixup section x86/word-at-a-time: Remove .fixup usage x86/usercopy: Remove .fixup usage x86/usercopy_32: Simplify __copy_user_intel_nocache() x86/sgx: Remove .fixup usage x86/checksum_32: Remove .fixup usage x86/vmx: Remove .fixup usage x86/kvm: Remove .fixup usage x86/segment: Remove .fixup usage x86/fpu: Remove .fixup usage x86/xen: Remove .fixup usage x86/uaccess: Remove .fixup usage x86/futex: Remove .fixup usage x86/msr: Remove .fixup usage x86/extable: Extend extable functionality x86/entry_32: Remove .fixup usage x86/entry_64: Remove .fixup usage x86/copy_mc_64: Remove .fixup usage ...
2022-01-12Merge tag 'perf_core_for_v5.17_rc1' of ↵Linus Torvalds2-12/+42
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Borislav Petkov: "Cleanup of the perf/kvm interaction." * tag 'perf_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Drop guest callback (un)register stubs KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=y KVM: arm64: Convert to the generic perf callbacks KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c KVM: Move x86's perf guest info callbacks to generic KVM KVM: x86: More precisely identify NMI from guest when handling PMI KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable perf/core: Use static_call to optimize perf_guest_info_callbacks perf: Force architectures to opt-in to guest callbacks perf: Add wrappers for invoking guest callbacks perf/core: Rework guest callbacks to prepare for static_call support perf: Drop dead and useless guest "support" from arm, csky, nds32 and riscv perf: Stop pretending that perf can handle multiple guest callbacks KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest KVM: x86: Register perf callbacks after calling vendor's hardware_setup() perf: Protect perf_guest_cbs with RCU
2022-01-12Merge tag 'iommu-updates-v5.17' of ↵Linus Torvalds3-74/+3
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Identity domain support for virtio-iommu - Move flush queue code into iommu-dma - Some fixes for AMD IOMMU suspend/resume support when x2apic is used - Arm SMMU Updates from Will Deacon: - Revert evtq and priq back to their former sizes - Return early on short-descriptor page-table allocation failure - Fix page fault reporting for Adreno GPU on SMMUv2 - Make SMMUv3 MMU notifier ops 'const' - Numerous new compatible strings for Qualcomm SMMUv2 implementations - Various smaller fixes and cleanups * tag 'iommu-updates-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (38 commits) iommu/iova: Temporarily include dma-mapping.h from iova.h iommu: Move flush queue data into iommu_dma_cookie iommu/iova: Move flush queue code to iommu-dma iommu/iova: Consolidate flush queue code iommu/vt-d: Use put_pages_list iommu/amd: Use put_pages_list iommu/amd: Simplify pagetable freeing iommu/iova: Squash flush_cb abstraction iommu/iova: Squash entry_dtor abstraction iommu/iova: Fix race between FQ timeout and teardown iommu/amd: Fix typo in *glues … together* in comment iommu/vt-d: Remove unused dma_to_mm_pfn function iommu/vt-d: Drop duplicate check in dma_pte_free_pagetable() iommu/vt-d: Use bitmap_zalloc() when applicable iommu/amd: Remove useless irq affinity notifier iommu/amd: X2apic mode: mask/unmask interrupts on suspend/resume iommu/amd: X2apic mode: setup the INTX registers on mask/unmask iommu/amd: X2apic mode: re-enable after resume iommu/amd: Restore GA log/tail pointer on host resume iommu/iova: Move fast alloc size roundup into alloc_iova_fast() ...