aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-11-09include/media/media-entity.h: replace kernel.h with the necessary inclusionsAndy Shevchenko1-1/+2
When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Sakari Ailus <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09include/linux/plist.h: replace kernel.h with the necessary inclusionsAndy Shevchenko1-1/+4
When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09include/linux/llist.h: replace kernel.h with the necessary inclusionsAndy Shevchenko1-1/+3
When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09include/linux/list.h: replace kernel.h with the necessary inclusionsAndy Shevchenko1-1/+3
When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09include/kunit/test.h: replace kernel.h with the necessary inclusionsAndy Shevchenko1-2/+11
When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09kernel.h: split out container_of() and typeof_member() macrosAndy Shevchenko2-32/+41
kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt cleaning it up by splitting out container_of() and typeof_member() macros. For time being include new header back to kernel.h to avoid twisted indirected includes for existing users. Note, there are _a lot_ of headers and modules that include kernel.h solely for one of these macros and this allows to unburden compiler for the twisted inclusion paths and to make new code cleaner in the future. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09kernel.h: drop unneeded <linux/kernel.h> inclusion from other headersAndy Shevchenko4-3/+1
Patch series "kernel.h further split", v5. kernel.h is a set of something which is not related to each other and often used in non-crossed compilation units, especially when drivers need only one or two macro definitions from it. This patch (of 7): There is no evidence we need kernel.h inclusion in certain headers. Drop unneeded <linux/kernel.h> inclusion from other headers. [[email protected]: bottom_half.h needs kernel] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Will Deacon <[email protected]> Cc: Waiman Long <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Sakari Ailus <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09proc: allow pid_revalidate() during LOOKUP_RCUStephen Brennan1-8/+10
Problem Description: When running running ~128 parallel instances of TZ=/etc/localtime ps -fe >/dev/null on a 128CPU machine, the %sys utilization reaches 97%, and perf shows the following code path as being responsible for heavy contention on the d_lockref spinlock: walk_component() lookup_fast() d_revalidate() pid_revalidate() // returns -ECHILD unlazy_child() lockref_get_not_dead(&nd->path.dentry->d_lockref) <-- contention The reason is that pid_revalidate() is triggering a drop from RCU to ref path walk mode. All concurrent path lookups thus try to grab a reference to the dentry for /proc/, before re-executing pid_revalidate() and then stepping into the /proc/$pid directory. Thus there is huge spinlock contention. This patch allows pid_revalidate() to execute in RCU mode, meaning that the path lookup can successfully enter the /proc/$pid directory while still in RCU mode. Later on, the path lookup may still drop into ref mode, but the contention will be much reduced at this point. By applying this patch, %sys utilization falls to around 85% under the same workload, and the number of ps processes executed per unit time increases by 3x-4x. Although this particular workload is a bit contrived, we have seen some large collections of eager monitoring scripts which produced similarly high %sys time due to contention in the /proc directory. As a result this patch, Al noted that several procfs methods which were only called in ref-walk mode could now be called from RCU mode. To ensure that this patch is safe, I audited all the inode get_link and permission() implementations, as well as dentry d_revalidate() implementations, in fs/proc. The purpose here is to ensure that they either are safe to call in RCU (i.e. don't sleep) or correctly bail out of RCU mode if they don't support it. My analysis shows that all at-risk procfs methods are safe to call under RCU, and thus this patch is safe. Procfs RCU-walk Analysis: This analysis is up-to-date with 5.15-rc3. When called under RCU mode, these functions have arguments as follows: * get_link() receives a NULL dentry pointer when called in RCU mode. * permission() receives MAY_NOT_BLOCK in the mode parameter when called from RCU. * d_revalidate() receives LOOKUP_RCU in flags. For the following functions, either they are trivially RCU safe, or they explicitly bail at the beginning of the function when they run: proc_ns_get_link (bails out) proc_get_link (RCU safe) proc_pid_get_link (bails out) map_files_d_revalidate (bails out) map_misc_d_revalidate (bails out) proc_net_d_revalidate (RCU safe) proc_sys_revalidate (bails out, also not under /proc/$pid) tid_fd_revalidate (bails out) proc_sys_permission (not under /proc/$pid) The remainder of the functions require a bit more detail: * proc_fd_permission: RCU safe. All of the body of this function is under rcu_read_lock(), except generic_permission() which declares itself RCU safe in its documentation string. * proc_self_get_link uses GFP_ATOMIC in the RCU case, so it is RCU aware and otherwise looks safe. The same is true of proc_thread_self_get_link. * proc_map_files_get_link: calls ns_capable, which calls capable(), and thus calls into the audit code (see note #1 below). The remainder is just a call to the trivially safe proc_pid_get_link(). * proc_pid_permission: calls ptrace_may_access(), which appears RCU safe, although it does call into the "security_ptrace_access_check()" hook, which looks safe under smack and selinux. Just the audit code is of concern. Also uses get_task_struct() and put_task_struct(), see note #2 below. * proc_tid_comm_permission: Appears safe, though calls put_task_struct (see note #2 below). Note #1: Most of the concern of RCU safety has centered around the audit code. However, since b17ec22fb339 ("selinux: slow_avc_audit has become non-blocking"), it's safe to call this code under RCU. So all of the above are safe by my estimation. Note #2: get_task_struct() and put_task_struct(): The majority of get_task_struct() is under RCU read lock, and in any case it is a simple increment. But put_task_struct() is complex, given that it could at some point free the task struct, and this process has many steps which I couldn't manually verify. However, several other places call put_task_struct() under RCU, so it appears safe to use here too (see kernel/hung_task.c:165 or rcu/tree-stall.h:296) Patch description: pid_revalidate() drops from RCU into REF lookup mode. When many threads are resolving paths within /proc in parallel, this can result in heavy spinlock contention on d_lockref as each thread tries to grab a reference to the /proc dentry (and drop it shortly thereafter). Investigation indicates that it is not necessary to drop RCU in pid_revalidate(), as no RCU data is modified and the function never sleeps. So, remove the LOOKUP_RCU check. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Brennan <[email protected]> Cc: Konrad Wilk <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09virtio-mem: kdump mode to sanitize /proc/vmcore accessDavid Hildenbrand1-12/+124
Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [1] Let's register a vmcore callback, to allow vmcore code to check if a PFN belonging to a virtio-mem device is either currently plugged and should be dumped or is currently unplugged and should not be accessed, instead mapping the shared zeropage or returning zeroes when reading. This is important when not capturing /proc/vmcore via tools like "makedumpfile" that can identify logically unplugged virtio-mem memory via PG_offline in the memmap, but simply by e.g., copying the file. Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. With this series, we'll send one virtio-mem state request for every ~2 MiB chunk of virtio-mem memory indicated in the vmcore that we intend to read/map. In the future, we might want to allow building virtio-mem for kdump mode only, even without CONFIG_MEMORY_HOTPLUG and friends: this way, we could support special stripped-down kdump kernels that have many other config options disabled; we'll tackle that once required. Further, we might want to try sensing bigger blocks (e.g., memory sections) first before falling back to device blocks on demand. Tested with Fedora rawhide, which contains a recent kexec-tools version (considering "System RAM (virtio_mem)" when creating the vmcore header) and a recent dracut version (including the virtio_mem module in the kdump initrd). Link: https://lkml.kernel.org/r/[email protected] [1] Link: https://github.com/dracutdevs/dracut/pull/1157 [2] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Dave Young <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Juergen Gross <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09virtio-mem: factor out hotplug specifics from virtio_mem_remove() into ↵David Hildenbrand1-3/+10
virtio_mem_deinit_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Dave Young <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Juergen Gross <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09virtio-mem: factor out hotplug specifics from virtio_mem_probe() into ↵David Hildenbrand1-42/+45
virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Dave Young <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Juergen Gross <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09virtio-mem: factor out hotplug specifics from virtio_mem_init() into ↵David Hildenbrand1-37/+44
virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Dave Young <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Juergen Gross <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacksDavid Hildenbrand4-38/+111
Let's support multiple registered callbacks, making sure that registering vmcore callbacks cannot fail. Make the callback return a bool instead of an int, handling how to deal with errors internally. Drop unused HAVE_OLDMEM_PFN_IS_RAM. We soon want to make use of this infrastructure from other drivers: virtio-mem, registering one callback for each virtio-mem device, to prevent reading unplugged virtio-mem memory. Handle it via a generic vmcore_cb structure, prepared for future extensions: for example, once we support virtio-mem on s390x where the vmcore is completely constructed in the second kernel, we want to detect and add plugged virtio-mem memory ranges to the vmcore in order for them to get dumped properly. Handle corner cases that are unexpected and shouldn't happen in sane setups: registering a callback after the vmcore has already been opened (warn only) and unregistering a callback after the vmcore has already been opened (warn and essentially read only zeroes from that point on). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Dave Young <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Juergen Gross <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09proc/vmcore: let pfn_is_ram() return a boolDavid Hildenbrand1-4/+4
The callback should deal with errors internally, it doesn't make sense to expose these via pfn_is_ram(). We'll rework the callbacks next. Right now we consider errors as if "it's RAM"; no functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Dave Young <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Juergen Gross <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09x86/xen: print a warning when HVMOP_get_mem_type failsDavid Hildenbrand1-1/+3
HVMOP_get_mem_type is not expected to fail, "This call failing is indication of something going quite wrong and it would be good to know about this." [1] Let's add a pr_warn_once(). Link: https://lkml.kernel.org/r/[email protected] [1] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Suggested-by: Boris Ostrovsky <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Young <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Juergen Gross <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09x86/xen: simplify xen_oldmem_pfn_is_ram()David Hildenbrand1-14/+1
Let's simplify return handling. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Dave Young <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Juergen Gross <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09x86/xen: update xen_oldmem_pfn_is_ram() documentationDavid Hildenbrand1-6/+3
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: "Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut" This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/[email protected] [2] https://github.com/dracutdevs/dracut/pull/1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09procfs: do not list TID 0 in /proc/<pid>/taskFlorian Weimer4-0/+87
If a task exits concurrently, task_pid_nr_ns may return 0. [[email protected]: coding style tweaks] [[email protected]: test that /proc/*/task doesn't contain "0"] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Florian Weimer <[email protected]> Signed-off-by: Alexey Dobriyan <[email protected]> Acked-by: Christian Brauner <[email protected]> Reviewed-by: Alexey Dobriyan <[email protected]> Cc: Kees Cook <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09mm,hugetlb: remove mlock ulimit for SHM_HUGETLBzhangyiru5-31/+13
Commit 21a3c273f88c ("mm, hugetlb: add thread name and pid to SHM_HUGETLB mlock rlimit warning") marked this as deprecated in 2012, but it is not deleted yet. Mike says he still sees that message in log files on occasion, so maybe we should preserve this warning. Also remove hugetlbfs related user_shm_unlock in ipc/shm.c and remove the user_shm_unlock after out. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: zhangyiru <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Liu Zixian <[email protected]> Cc: Michal Hocko <[email protected]> Cc: wuxu.wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09vfs: keep inodes with page cache off the inode shrinker LRUJohannes Weiner8-22/+120
Historically (pre-2.5), the inode shrinker used to reclaim only empty inodes and skip over those that still contained page cache. This caused problems on highmem hosts: struct inode could put fill lowmem zones before the cache was getting reclaimed in the highmem zones. To address this, the inode shrinker started to strip page cache to facilitate reclaiming lowmem. However, this comes with its own set of problems: the shrinkers may drop actively used page cache just because the inodes are not currently open or dirty - think working with a large git tree. It further doesn't respect cgroup memory protection settings and can cause priority inversions between containers. Nowadays, the page cache also holds non-resident info for evicted cache pages in order to detect refaults. We've come to rely heavily on this data inside reclaim for protecting the cache workingset and driving swap behavior. We also use it to quantify and report workload health through psi. The latter in turn is used for fleet health monitoring, as well as driving automated memory sizing of workloads and containers, proactive reclaim and memory offloading schemes. The consequences of dropping page cache prematurely is that we're seeing subtle and not-so-subtle failures in all of the above-mentioned scenarios, with the workload generally entering unexpected thrashing states while losing the ability to reliably detect it. To fix this on non-highmem systems at least, going back to rotating inodes on the LRU isn't feasible. We've tried (commit a76cf1a474d7 ("mm: don't reclaim inodes with many attached pages")) and failed (commit 69056ee6a8a3 ("Revert "mm: don't reclaim inodes with many attached pages"")). The issue is mostly that shrinker pools attract pressure based on their size, and when objects get skipped the shrinkers remember this as deferred reclaim work. This accumulates excessive pressure on the remaining inodes, and we can quickly eat into heavily used ones, or dirty ones that require IO to reclaim, when there potentially is plenty of cold, clean cache around still. Instead, this patch keeps populated inodes off the inode LRU in the first place - just like an open file or dirty state would. An otherwise clean and unused inode then gets queued when the last cache entry disappears. This solves the problem without reintroducing the reclaim issues, and generally is a bit more scalable than having to wade through potentially hundreds of thousands of busy inodes. Locking is a bit tricky because the locks protecting the inode state (i_lock) and the inode LRU (lru_list.lock) don't nest inside the irq-safe page cache lock (i_pages.xa_lock). Page cache deletions are serialized through i_lock, taken before the i_pages lock, to make sure depopulated inodes are queued reliably. Additions may race with deletions, but we'll check again in the shrinker. If additions race with the shrinker itself, we're protected by the i_lock: if find_inode() or iput() win, the shrinker will bail on the elevated i_count or I_REFERENCED; if the shrinker wins and goes ahead with the inode, it will set I_FREEING and inhibit further igets(), which will cause the other side to create a new instance of the inode instead. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-09nvme: wait until quiesce is doneMing Lei1-0/+4
NVMe uses one atomic flag to check if quiesce is needed. If quiesce is started, the helper returns immediately. This way is wrong, since we have to wait until quiesce is done. Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue quiesce/unquiesce") Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-11-09scsi: make sure that request queue queiesce and unquiesce balancedMing Lei2-9/+29
For fixing queue quiesce race between driver and block layer(elevator switch, update nr_requests, ...), we need to support concurrent quiesce and unquiesce, which requires the two call balanced. It isn't easy to audit that in all scsi drivers, especially the two may be called from different contexts, so do it in scsi core with one per-device atomic variable to balance quiesce and unquiesce. Reported-by: Yi Zhang <[email protected]> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue quiesce/unquiesce") Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-11-09scsi: avoid to quiesce sdev->request_queue two timesMing Lei1-15/+14
For fixing queue quiesce race between driver and block layer(elevator switch, update nr_requests, ...), we need to support concurrent quiesce and unquiesce, which requires the two to be balanced. blk_mq_quiesce_queue() calls blk_mq_quiesce_queue_nowait() for updating quiesce depth and marking the flag, then scsi_internal_device_block() calls blk_mq_quiesce_queue_nowait() two times actually. Fix the double quiesce and keep quiesce and unquiesce balanced. Reported-by: Yi Zhang <[email protected]> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue quiesce/unquiesce") Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-11-09blk-mq: add one API for waiting until quiesce is doneMing Lei2-8/+21
Some drivers(NVMe, SCSI) need to call quiesce and unquiesce in pair, but it is hard to switch to this style, so these drivers need one atomic flag for helping to balance quiesce and unquiesce. When quiesce is in-progress, the driver still needs to wait until the quiesce is done, so add API of blk_mq_wait_quiesce_done() for these drivers. Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-11-09dmaengine: ti: k3-udma: Set r/tchan or rflow to NULL if request failKishon Vijay Abraham I1-4/+20
udma_get_*() checks if rchan/tchan/rflow is already allocated by checking if it has a NON NULL value. For the error cases, rchan/tchan/rflow will have error value and udma_get_*() considers this as already allocated (PASS) since the error values are NON NULL. This results in NULL pointer dereference error while de-referencing rchan/tchan/rflow. Reset the value of rchan/tchan/rflow to NULL if a channel request fails. CC: [email protected] Acked-by: Peter Ujfalusi <[email protected]> Signed-off-by: Kishon Vijay Abraham I <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>
2021-11-09dmaengine: ti: k3-udma: Set bchan to NULL if a channel request failKishon Vijay Abraham I1-2/+6
bcdma_get_*() checks if bchan is already allocated by checking if it has a NON NULL value. For the error cases, bchan will have error value and bcdma_get_*() considers this as already allocated (PASS) since the error values are NON NULL. This results in NULL pointer dereference error while de-referencing bchan. Reset the value of bchan to NULL if a channel request fails. CC: [email protected] Acked-by: Peter Ujfalusi <[email protected]> Signed-off-by: Kishon Vijay Abraham I <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>
2021-11-09dmaengine: stm32-dma: avoid 64-bit division in stm32_dma_get_max_widthArnd Bergmann1-3/+3
Using the % operator on a 64-bit variable is expensive and can cause a link failure: arm-linux-gnueabi-ld: drivers/dma/stm32-dma.o: in function `stm32_dma_get_max_width': stm32-dma.c:(.text+0x170): undefined reference to `__aeabi_uldivmod' arm-linux-gnueabi-ld: drivers/dma/stm32-dma.o: in function `stm32_dma_set_xfer_param': stm32-dma.c:(.text+0x1cd4): undefined reference to `__aeabi_uldivmod' As we know that we just want to check the alignment in stm32_dma_get_max_width(), there is no need for a full division, and using a simple mask is a faster replacement. Same in stm32_dma_set_xfer_param(), change this to only allow burst transfers if the address is a multiple of the length. stm32_dma_get_best_burst just after will take buf_len into account to fix burst in case of misalignment. Fixes: b20fd5fa310c ("dmaengine: stm32-dma: fix stm32_dma_get_max_width") Reported-by: kernel test robot <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Amelie Delaunay <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>
2021-11-08Merge tag 'backlight-next-5.16' of ↵Linus Torvalds4-15/+22
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight Pull backlight updates from Lee Jones: "Fix-ups: - Standardise *_exit() and *_remove() return values in ili9320 and vgg2432a4 Bug Fixes: - Do not override maximum brightness - Propagate errors from get_brightness()" * tag 'backlight-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: video: backlight: ili9320: Make ili9320_remove() return void backlight: Propagate errors from get_brightness() video: backlight: Drop maximum brightness override for brightness zero
2021-11-08Merge tag 'mfd-next-5.16' of ↵Linus Torvalds77-2200/+2120
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "Removed Drivers: - Remove support for TI TPS80031/TPS80032 PMICs New Device Support: - Add support for Magnetic Reader to TI AM335x - Add support for DA9063_EA to Dialog DA9063 - Add support for SC2730 PMIC to Spreadtrum SC27xx - Add support for MacBookPro16,2 ICL-N UART Intel LPSS PCI - Add support for lots of new PMICS in QCom SPMI PMIC - Add support for ADC to Diolan DLN2 New Functionality: - Add support for Power Off to Rockchip RK817 Fix-ups: - Simplify Regmap passing to child devices in hi6421-spmi-pmic - SPDX licensing updates in ti_am335x_tscadc - Improve error handling in ti_am335x_tscadc - Expedite clock search in ti_am335x_tscadc - Generic simplifications in ti_am335x_tscadc - Use generic macros/defines in ti_am335x_tscadc - Remove unused code in ti_am335x_tscadc, cros_ec_dev - Convert to GPIOD in wcd934x - Add namespacing in ti_am335x_tscadc - Restrict compilation to relevant arches in intel_pmt - Provide better description/documentation in exynos_lpass - Add SPI device ID table in altera-a10sr, motorola-cpcap, sprd-sc27xx-spi - Change IRQ handling in qcom-pm8xxx - Split out I2C and SPI code in arizona - Explicitly include used headers in altera-a10sr - Convert sysfs show() function to in sysfs_emit - Standardise *_exit() and *_remove() return values in mc13xxx, stmpe, tps65912 - Trivial (style/spelling/whitespace) fixups in ti_am335x_tscadc, qcom-spmi-pmic, max77686-private - Device Tree fix-ups in ti,am3359-tscadc, samsung,s2mps11, samsung,s2mpa01, samsung,s5m8767, brcm,misc, brcm,cru, syscon, qcom,tcsr, xylon,logicvc, max77686, x-powers,ac100, x-powers,axp152, x-powers,axp209-gpio, syscon, qcom,spmi-pmic Bug Fixes: - Balance refcounting (get/put) in ti_am335x_tscadc, mfd-core - Fix IRQ trigger type in sec-irq, max77693, max14577 - Repair off-by-one in altera-sysmgr - Add explicit 'select MFD_CORE' to MFD_SIMPLE_MFD_I2C" * tag 'mfd-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (95 commits) mfd: simple-mfd-i2c: Select MFD_CORE to fix build error mfd: tps80031: Remove driver mfd: max77686: Correct tab-based alignment of register addresses mfd: wcd934x: Replace legacy gpio interface for gpiod dt-bindings: mfd: qcom: pm8xxx: Add pm8018 compatible mfd: dln2: Add cell for initializing DLN2 ADC mfd: qcom-spmi-pmic: Add missing PMICs supported by socinfo mfd: qcom-spmi-pmic: Document ten more PMICs in the binding mfd: qcom-spmi-pmic: Sort compatibles in the driver mfd: qcom-spmi-pmic: Sort the compatibles in the binding mfd: janz-cmoio: Replace snprintf in show functions with sysfs_emit mfd: altera-a10sr: Include linux/module.h mfd: tps65912: Make tps65912_device_exit() return void mfd: stmpe: Make stmpe_remove() return void mfd: mc13xxx: Make mc13xxx_common_exit() return void dt-bindings: mfd: syscon: Add samsung,exynosautov9-sysreg compatible mfd: altera-sysmgr: Fix a mistake caused by resource_size conversion dt-bindings: gpio: Convert X-Powers AXP209 GPIO binding to a schema dt-bindings: mfd: syscon: Add rk3368 QoS register compatible mfd: arizona: Split of_match table into I2C and SPI versions ...
2021-11-08Merge tag 'gpio-updates-for-v5.16' of ↵Linus Torvalds26-351/+943
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio updates from Bartosz Golaszewski: "We have a single new driver, new features in others and some cleanups all over the place. Nothing really stands out and it is all relatively small. - new driver: gpio-modepin (plus relevant change in zynqmp firmware) - add interrupt support to gpio-virtio - enable the 'gpio-line-names' property in the DT bindings for gpio-rockchip - use the subsystem helpers where applicable in gpio-uniphier instead of accessing IRQ structures directly - code shrink in gpio-xilinx - add interrupt to gpio-mlxbf2 (and include the removal of custom interrupt code from the mellanox ethernet driver) - support multiple interrupts per bank in gpio-tegra186 (and force one interrupt per bank in older models) - fix GPIO line IRQ offset calculation in gpio-realtek-otto - drop unneeded MODULE_ALIAS expansions in multiple drivers - code cleanup in gpio-aggregator - minor improvements in gpio-max730x and gpio-mc33880 - Kconfig cleanups" * tag 'gpio-updates-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: virtio_gpio: drop packed attribute gpio: virtio: Add IRQ support gpio: realtek-otto: fix GPIO line IRQ offset gpio: clean up Kconfig file net: mellanox: mlxbf_gige: Replace non-standard interrupt handling gpio: mlxbf2: Introduce IRQ support gpio: mc33880: Drop if with an always false condition gpio: max730x: Make __max730x_remove() return void gpio: aggregator: Wrap access to gpiochip_fwd.tmp[] gpio: modepin: Add driver support for modepin GPIO controller dt-bindings: gpio: zynqmp: Add binding documentation for modepin firmware: zynqmp: Add MMIO read and write support for PS_MODE pin gpio: tps65218: drop unneeded MODULE_ALIAS gpio: max77620: drop unneeded MODULE_ALIAS gpio: xilinx: simplify getting .driver_data gpio: tegra186: Support multiple interrupts per bank gpio: tegra186: Force one interrupt per bank gpio: uniphier: Use helper functions to get private data from IRQ data gpio: uniphier: Use helper function to get IRQ hardware number dt-bindings: gpio: add gpio-line-names to rockchip,gpio-bank.yaml
2021-11-08Merge tag 'cxl-for-5.16' of ↵Linus Torvalds36-1490/+3269
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl updates from Dan Williams: "More preparation and plumbing work in the CXL subsystem. From an end user perspective the highlight here is lighting up the CXL Persistent Memory related commands (label read / write) with the generic ioctl() front-end in LIBNVDIMM. Otherwise, the ability to instantiate new persistent and volatile memory regions is still on track for v5.17. Summary: - Fix support for platforms that do not enumerate every ACPI0016 (CXL Host Bridge) in the CHBS (ACPI Host Bridge Structure). - Introduce a common pci_find_dvsec_capability() helper, clean up open coded implementations in various drivers. - Add 'cxl_test' for regression testing CXL subsystem ABIs. 'cxl_test' is a module built from tools/testing/cxl/ that mocks up a CXL topology to augment the nascent support for emulation of CXL devices in QEMU. - Convert libnvdimm to use the uuid API. - Complete the definition of CXL namespace labels in libnvdimm. - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL mailbox driver. Enable 'ndctl {read,write}-labels' for CXL. - Continue to sort and refactor functionality into distinct driver and core-infrastructure buckets. For example, mailbox handling is now a generic core capability consumed by the PCI and cxl_test drivers" * tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits) ocxl: Use pci core's DVSEC functionality cxl/pci: Use pci core's DVSEC functionality PCI: Add pci_find_dvsec_capability to find designated VSEC cxl/pci: Split cxl_pci_setup_regs() cxl/pci: Add @base to cxl_register_map cxl/pci: Make more use of cxl_register_map cxl/pci: Remove pci request/release regions cxl/pci: Fix NULL vs ERR_PTR confusion cxl/pci: Remove dev_dbg for unknown register blocks cxl/pci: Convert register block identifiers to an enum cxl/acpi: Do not fail cxl_acpi_probe() based on a missing CHBS cxl/pci: Disambiguate cxl_pci further from cxl_mem Documentation/cxl: Add bus internal docs cxl/core: Split decoder setup into alloc + add tools/testing/cxl: Introduce a mock memory device + driver cxl/mbox: Move command definitions to common location cxl/bus: Populate the target list at decoder create tools/testing/cxl: Introduce a mocked-up CXL port hierarchy cxl/pmem: Add support for multiple nvdimm-bridge objects cxl/pmem: Translate NVDIMM label commands to CXL label commands ...
2021-11-08Merge branch 'i2c/for-mergewindow' of ↵Linus Torvalds23-261/+558
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: - big refactoring of the PASEMI driver to support the Apple M1 - huge improvements to the XIIC in terms of locking and SMP safety - refactoring and clean ups for the i801 driver ... and the usual bunch of small driver updates * 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (43 commits) i2c: amd-mp2-plat: ACPI: Use ACPI_COMPANION() directly i2c: i801: Add support for Intel Ice Lake PCH-N i2c: virtio: update the maintainer to Conghui i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()' i2c: qup: move to use request_irq by IRQF_NO_AUTOEN flag i2c: qup: fix a trivial typo i2c: tegra: Ensure that device is suspended before driver is removed i2c: i801: Fix incorrect and needless software PEC disabling i2c: mediatek: Dump i2c/dma register when a timeout occurs i2c: mediatek: Reset the handshake signal between i2c and dma i2c: mlxcpld: Allow flexible polling time setting for I2C transactions i2c: pasemi: Set enable bit for Apple variant i2c: pasemi: Add Apple platform driver i2c: pasemi: Refactor _probe to use devm_* i2c: pasemi: Allow to configure bus frequency i2c: pasemi: Move common reset code to own function i2c: pasemi: Split pci driver to its own file i2c: pasemi: Split off common probing code i2c: pasemi: Remove usage of pci_dev i2c: pasemi: Use dev_name instead of port number ...
2021-11-08Merge tag 'mtd/for-5.16' of ↵Linus Torvalds44-175/+204
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd updates from Miquel Raynal: "Core: - Remove obsolete macros only used by the old nand_ecclayout struct - Don't remove debugfs directory if device is in use - MAINTAINERS: - Add entry for Qualcomm NAND controller driver - Update the devicetree documentation path of hyperbus MTD devices: - block2mtd: - Add support for an optional custom MTD label - Minor refactor to avoid hard coded constant - mtdswap: Remove redundant assignment of pointer eb CFI: - Fixup CFI on ixp4xx Raw NAND controller drivers: - Arasan: - Prevent an unsupported configuration - Xway, Socrates: plat_nand, Pasemi, Orion, mpc5121, GPIO, Au1550nd, AMS-Delta: - Keep the driver compatible with on-die ECC engines - cs553x, lpc32xx_slc, ndfc, sharpsl, tmio, txx9ndfmc: - Revert the commits: "Fix external use of SW Hamming ECC helper" - And let callers use the bare Hamming helpers - Fsmc: Fix use of SM ORDER - Intel: - Fix potential buffer overflow in probe - xway, vf610, txx9ndfm, tegra, stm32, plat_nand, oxnas, omap, mtk, hisi504, gpmi, gpio, denali, bcm6368, atmel: - Make use of the helper function devm_platform_ioremap_resource{,byname}() Onenand drivers: - Samsung: Drop Exynos4 and describe driver in KConfig Raw NAND chip drivers: - Hynix: Add support for H27UCG8T2ETR-BC MLC NAND SPI NOR core: - Add spi-nor device tree binding under SPI NOR maintainers SPI NOR manufacturer drivers: - Enable locking for n25q128a13 SPI NOR controller drivers: - Use devm_platform_ioremap_resource_byname()" * tag 'mtd/for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (50 commits) mtd: core: don't remove debugfs directory if device is in use MAINTAINERS: Update the devicetree documentation path of hyperbus mtd: block2mtd: add support for an optional custom MTD label mtd: block2mtd: minor refactor to avoid hard coded constant mtd: fixup CFI on ixp4xx mtd: rawnand: arasan: Prevent an unsupported configuration MAINTAINERS: Add entry for Qualcomm NAND controller driver mtd: rawnand: hynix: Add support for H27UCG8T2ETR-BC MLC NAND mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines mtd: rawnand: socrates: Keep the driver compatible with on-die ECC engines mtd: rawnand: plat_nand: Keep the driver compatible with on-die ECC engines mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines mtd: rawnand: orion: Keep the driver compatible with on-die ECC engines mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines mtd: rawnand: au1550nd: Keep the driver compatible with on-die ECC engines mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines Revert "mtd: rawnand: cs553x: Fix external use of SW Hamming ECC helper" Revert "mtd: rawnand: lpc32xx_slc: Fix external use of SW Hamming ECC helper" Revert "mtd: rawnand: ndfc: Fix external use of SW Hamming ECC helper" ...
2021-11-08Add 'tools/perf/libbpf/' to ignored filesLinus Torvalds1-0/+1
Commit 6b491a86b77c ("perf build: Install libbpf headers locally when building") installed copies of the libbpf headers into the build tree, causing unnecessary noise from 'git status' after a perf tools build. Add the 'libbpf/' subdirectory to the .gitignore file to silence it all again. Signed-off-by: Linus Torvalds <[email protected]>
2021-11-08Merge tag 'kgdb-5.16-rc1' of ↵Linus Torvalds4-122/+53
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb update from Daniel Thompson: "A single patch this cycle. We replace some open-coded routines to classify task states with the scheduler's own function to do this. Alongside the obvious benefits of removing funky code and aligning more exactly with the scheduler's task classification, this also fixes a long standing compiler warning by removing the open-coded routines that generated the warning" * tag 'kgdb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Adopt scheduler's task classification
2021-11-08Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds4-7/+7
Pull OpenRISC updates from Stafford Horne: "This includes two minor cleanups, plus a bug fix for OpenRISC TLB flush code that allows the the SMP kernel to boot again" * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: fix SMP tlb flush NULL pointer dereference openrisc: signal: remove unused DEBUG_SIG macro openrisc: time: don't mark comment as kernel-doc
2021-11-08Merge tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of ↵Linus Torvalds184-1705/+5083
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: "perf annotate: - Add riscv64 support. - Add fusion logic for AMD microarchs. perf record: - Add an option to control the synthesizing behavior: --synth <no|all|task|mmap|cgroup> core: - Allow controlling synthesizing PERF_RECORD_ metadata events during record. - perf.data reader prep work for multithreaded processing. - Fix missing exclude_{host,guest} setting in PMUs that don't support it and that were causing the feature detection code to disable it for all events, even the ones in PMUs that support it. - Fix the default use of precise events on AMD, that were always falling back to non-precise because perf_event_attr.exclude_guest=1 was set and IBS does not have filtering capability, refusing precise + exclude_guest. - Add bitfield_swap() to handle branch_stack endian issue. perf script: - Show binary offsets for userspace addresses in callchains. - Support instruction latency via new "ins_lat" selectable field. - Add dlfilter-show-cycles perf inject: - Add vmlinux and ignore-vmlinux arguments, similar to other tools. perf list: - Display PMU prefix for partially supported hybrid cache events. - Display hybrid PMU events with cpu type. perf stat: - Improve metrics documentation of data structures. - Fix memory leaks in the metric code. - Use NAN for missing event IDs. - Don't compute unused events. - Fix memory leak on error path. - Encode and use metric-id as a metric qualifier. - Allow metrics with no events. - Avoid events for an 'if' constant result. - Only add a referenced metric once. - Simplify metric_refs calculation. - Allow modifiers on metrics. perf test: - Add workload test of metric and metric groups. - Workload test of all PMUs. - vmlinux-kallsyms: Ignore hidden symbols. - Add pmu-event test for event described as "config=". - Verify more event members in pmu-events test. - Add endian test for struct branch_flags on the sample-parsing test. - Improve temp file cleanup in several tests. perf daemon: - Address MSAN warnings on send_cmd(). perf kmem: - Improve man page for record options perf srcline: - Use long-running addr2line per DSO, greatly speeding up the 'srcline' sort order. perf symbols: - Ignore $a/$d symbols for ARM modules. - Fix /proc/kcore access on 32 bit systems. Kernel UAPI copies: - Update copy of linux/socket.h with the kernel sources, no change in tooling output. libbpf: - Pull in bpf_program__get_prog_info_linear() from libbpf, too much specific to perf. - Deprecate bpf_map__resize() in favor of bpf_map_set_max_entries() - Install libbpf headers locally when building. - Bump minimum LLVM C++ std to GNU++14. libperf: - Use binary search in perf_cpu_map__idx() as array are sorted. libtracefs: - Enable libtracefs dynamic linking. libtraceevent: - Increase logging when verbose. Arch specific: * PowerPC: - Add support to expose instruction and data address registers as part of extended regs. Vendor events: * JSON parser: - Support ConfigCode to set the config= in PMUs - Make the JSON parser more conformant when in strict mode. * All JSON files: - Fix all remaining invalid JSON files. * ARM: - Syntax corrections in Neoverse N1 json. - Categorise the Neoverse V1 counters. - Add new armv8 PMU events. - Revise hip08 uncore events. Hardware tracing: * auxtrace: - Add missing Z option to ITRACE_HELP. - Add itrace A option to approximate IPC. - Add itrace d+o option to direct debug log to stdout. * Intel PT: - Add support for PERF_RECORD_AUX_OUTPUT_HW_ID - Support itrace A option to approximate IPC - Support itrace d+o option to direct debug log to stdout" * tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (120 commits) perf build: Install libbpf headers locally when building perf MANIFEST: Add bpftool files to allow building with BUILD_BPF_SKEL=1 perf metric: Fix memory leaks perf parse-event: Add init and exit to parse_event_error perf parse-events: Rename parse_events_error functions perf stat: Fix memory leak on error path perf tools: Use __BYTE_ORDER__ perf inject: Add vmlinux and ignore-vmlinux arguments perf tools: Check vmlinux/kallsyms arguments in all tools perf tools: Refactor out kernel symbol argument sanity checking perf symbols: Ignore $a/$d symbols for ARM modules perf evsel: Don't set exclude_guest by default perf evsel: Fix missing exclude_{host,guest} setting perf bpf: Add missing free to bpf_event__print_bpf_prog_info() perf beauty: Update copy of linux/socket.h with the kernel sources perf clang: Fixes for more recent LLVM/clang tools: Bump minimum LLVM C++ std to GNU++14 perf bpf: Pull in bpf_program__get_prog_info_linear() Revert "perf bench futex: Add support for 32-bit systems with 64-bit time_t" perf test sample-parsing: Add endian test for struct branch_flags ...
2021-11-08Merge tag 'kbuild-v5.16' of ↵Linus Torvalds62-479/+453
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Remove the global -isystem compiler flag, which was made possible by the introduction of <linux/stdarg.h> - Improve the Kconfig help to print the location in the top menu level - Fix "FORCE prerequisite is missing" build warning for sparc - Add new build targets, tarzst-pkg and perf-tarzst-src-pkg, which generate a zstd-compressed tarball - Prevent gen_init_cpio tool from generating a corrupted cpio when KBUILD_BUILD_TIMESTAMP is set to 2106-02-07 or later - Misc cleanups * tag 'kbuild-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits) kbuild: use more subdir- for visiting subdirectories while cleaning sh: remove meaningless archclean line initramfs: Check timestamp to prevent broken cpio archive kbuild: split DEBUG_CFLAGS out to scripts/Makefile.debug gen_init_cpio: add static const qualifiers kbuild: Add make tarzst-pkg build option scripts: update the comments of kallsyms support sparc: Add missing "FORCE" target when using if_changed kconfig: refactor conf_touch_dep() kconfig: refactor conf_write_dep() kconfig: refactor conf_write_autoconf() kconfig: add conf_get_autoheader_name() kconfig: move sym_escape_string_value() to confdata.c kconfig: refactor listnewconfig code kconfig: refactor conf_write_symbol() kconfig: refactor conf_write_heading() kconfig: remove 'const' from the return type of sym_escape_string_value() kconfig: rename a variable in the lexer to a clearer name kconfig: narrow the scope of variables in the lexer kconfig: Create links to main menu items in search ...
2021-11-08Merge tag 'modules-5.16-rc1' of ↵Linus Torvalds1-21/+58
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "As requested by Jessica I'm stepping in to help with modules maintenance. This is my first pull request to you. I've collected only two patches for modules for the 5.16-rc1 merge window. These patches are from Shuah Khan as she debugged some corner case error with modules. The error messages are improved for elf_validity_check(). While doing this work a corner case fix was spotted on validate_section_offset() due to a possible overflow bug on 64-bit. The impact of this fix is low given this just limits module section headers placed within the 32-bit boundary, and we obviously don't have insane module sizes. Even if a specially crafted module is constructed later checks would invalidate the module right away. I've let this sit through 0-day testing since October 15th with no issues found" * tag 'modules-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: change to print useful messages from elf_validity_check() module: fix validate_section_offset() overflow bug on 64-bit
2021-11-08soc: ti: fix wkup_m3_rproc_boot_thread return typeArnd Bergmann1-2/+3
The wkup_m3_rproc_boot_thread() function uses a nonstandard prototype, which broke after Eric's recent cleanup: drivers/soc/ti/wkup_m3_ipc.c: In function 'wkup_m3_rproc_boot_thread': drivers/soc/ti/wkup_m3_ipc.c:429:16: error: 'return' with a value, in function returning void [-Werror=return-type] 429 | return 0; | ^ drivers/soc/ti/wkup_m3_ipc.c:416:13: note: declared here 416 | static void wkup_m3_rproc_boot_thread(struct wkup_m3_ipc *m3_ipc) | ^~~~~~~~~~~~~~~~~~~~~~~~~ Change it to the normal prototype as it should have been from the start. Fixes: 111e70490d2a ("exit/kthread: Have kernel threads return instead of calling do_exit") Fixes: cdd5de500b2c ("soc: ti: Add wkup_m3_ipc driver") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Acked-by: Santosh Shilimkar <[email protected]> Acked-by: Tony Lindgren <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2021-11-08xen/balloon: fix unused-variable warningArnd Bergmann1-2/+3
In configurations with CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=n and CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y, gcc warns about an unused variable: drivers/xen/balloon.c:83:12: error: 'xen_hotplug_unpopulated' defined but not used [-Werror=unused-variable] Since this is always zero when CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is disabled, turn it into a preprocessor constant in that case. Fixes: 121f2faca2c0 ("xen/balloon: rename alloc/free_xenballooned_pages") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]>
2021-11-08io_uring: honour zeroes as io-wq worker limitsPavel Begunkov1-1/+3
When we pass in zero as an io-wq worker number limit it shouldn't actually change the limits but return the old value, follow that behaviour with deferred limits setup as well. Cc: [email protected] # 5.15 Reported-by: Beld Zhang <[email protected]> Fixes: e139a1ec92f8d ("io_uring: apply max_workers limit to all future users") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/1b222a92f7a78a24b042763805e891a4cdd4b544.1636384034.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-11-08blk-mq: don't free tags if the tag_set is used by other device in queue ↵Ye Bin1-1/+5
initialztion We got UAF report on v5.10 as follows: [ 1446.674930] ================================================================== [ 1446.675970] BUG: KASAN: use-after-free in blk_mq_get_driver_tag+0x9a4/0xa90 [ 1446.676902] Read of size 8 at addr ffff8880185afd10 by task kworker/1:2/12348 [ 1446.677851] [ 1446.678073] CPU: 1 PID: 12348 Comm: kworker/1:2 Not tainted 5.10.0-10177-gc9c81b1e346a #2 [ 1446.679168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1446.680692] Workqueue: kthrotld blk_throtl_dispatch_work_fn [ 1446.681448] Call Trace: [ 1446.681800] dump_stack+0x9b/0xce [ 1446.682916] print_address_description.constprop.6+0x3e/0x60 [ 1446.685999] kasan_report.cold.9+0x22/0x3a [ 1446.687186] blk_mq_get_driver_tag+0x9a4/0xa90 [ 1446.687785] blk_mq_dispatch_rq_list+0x21a/0x1d40 [ 1446.692576] __blk_mq_do_dispatch_sched+0x394/0x830 [ 1446.695758] __blk_mq_sched_dispatch_requests+0x398/0x4f0 [ 1446.698279] blk_mq_sched_dispatch_requests+0xdf/0x140 [ 1446.698967] __blk_mq_run_hw_queue+0xc0/0x270 [ 1446.699561] __blk_mq_delay_run_hw_queue+0x4cc/0x550 [ 1446.701407] blk_mq_run_hw_queue+0x13b/0x2b0 [ 1446.702593] blk_mq_sched_insert_requests+0x1de/0x390 [ 1446.703309] blk_mq_flush_plug_list+0x4b4/0x760 [ 1446.705408] blk_flush_plug_list+0x2c5/0x480 [ 1446.708471] blk_finish_plug+0x55/0xa0 [ 1446.708980] blk_throtl_dispatch_work_fn+0x23b/0x2e0 [ 1446.711236] process_one_work+0x6d4/0xfe0 [ 1446.711778] worker_thread+0x91/0xc80 [ 1446.713400] kthread+0x32d/0x3f0 [ 1446.714362] ret_from_fork+0x1f/0x30 [ 1446.714846] [ 1446.715062] Allocated by task 1: [ 1446.715509] kasan_save_stack+0x19/0x40 [ 1446.716026] __kasan_kmalloc.constprop.1+0xc1/0xd0 [ 1446.716673] blk_mq_init_tags+0x6d/0x330 [ 1446.717207] blk_mq_alloc_rq_map+0x50/0x1c0 [ 1446.717769] __blk_mq_alloc_map_and_request+0xe5/0x320 [ 1446.718459] blk_mq_alloc_tag_set+0x679/0xdc0 [ 1446.719050] scsi_add_host_with_dma.cold.3+0xa0/0x5db [ 1446.719736] virtscsi_probe+0x7bf/0xbd0 [ 1446.720265] virtio_dev_probe+0x402/0x6c0 [ 1446.720808] really_probe+0x276/0xde0 [ 1446.721320] driver_probe_device+0x267/0x3d0 [ 1446.721892] device_driver_attach+0xfe/0x140 [ 1446.722491] __driver_attach+0x13a/0x2c0 [ 1446.723037] bus_for_each_dev+0x146/0x1c0 [ 1446.723603] bus_add_driver+0x3fc/0x680 [ 1446.724145] driver_register+0x1c0/0x400 [ 1446.724693] init+0xa2/0xe8 [ 1446.725091] do_one_initcall+0x9e/0x310 [ 1446.725626] kernel_init_freeable+0xc56/0xcb9 [ 1446.726231] kernel_init+0x11/0x198 [ 1446.726714] ret_from_fork+0x1f/0x30 [ 1446.727212] [ 1446.727433] Freed by task 26992: [ 1446.727882] kasan_save_stack+0x19/0x40 [ 1446.728420] kasan_set_track+0x1c/0x30 [ 1446.728943] kasan_set_free_info+0x1b/0x30 [ 1446.729517] __kasan_slab_free+0x111/0x160 [ 1446.730084] kfree+0xb8/0x520 [ 1446.730507] blk_mq_free_map_and_requests+0x10b/0x1b0 [ 1446.731206] blk_mq_realloc_hw_ctxs+0x8cb/0x15b0 [ 1446.731844] blk_mq_init_allocated_queue+0x374/0x1380 [ 1446.732540] blk_mq_init_queue_data+0x7f/0xd0 [ 1446.733155] scsi_mq_alloc_queue+0x45/0x170 [ 1446.733730] scsi_alloc_sdev+0x73c/0xb20 [ 1446.734281] scsi_probe_and_add_lun+0x9a6/0x2d90 [ 1446.734916] __scsi_scan_target+0x208/0xc50 [ 1446.735500] scsi_scan_channel.part.3+0x113/0x170 [ 1446.736149] scsi_scan_host_selected+0x25a/0x360 [ 1446.736783] store_scan+0x290/0x2d0 [ 1446.737275] dev_attr_store+0x55/0x80 [ 1446.737782] sysfs_kf_write+0x132/0x190 [ 1446.738313] kernfs_fop_write_iter+0x319/0x4b0 [ 1446.738921] new_sync_write+0x40e/0x5c0 [ 1446.739429] vfs_write+0x519/0x720 [ 1446.739877] ksys_write+0xf8/0x1f0 [ 1446.740332] do_syscall_64+0x2d/0x40 [ 1446.740802] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1446.741462] [ 1446.741670] The buggy address belongs to the object at ffff8880185afd00 [ 1446.741670] which belongs to the cache kmalloc-256 of size 256 [ 1446.743276] The buggy address is located 16 bytes inside of [ 1446.743276] 256-byte region [ffff8880185afd00, ffff8880185afe00) [ 1446.744765] The buggy address belongs to the page: [ 1446.745416] page:ffffea0000616b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x185ac [ 1446.746694] head:ffffea0000616b00 order:2 compound_mapcount:0 compound_pincount:0 [ 1446.747719] flags: 0x1fffff80010200(slab|head) [ 1446.748337] raw: 001fffff80010200 ffffea00006a3208 ffffea000061bf08 ffff88801004f240 [ 1446.749404] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 [ 1446.750455] page dumped because: kasan: bad access detected [ 1446.751227] [ 1446.751445] Memory state around the buggy address: [ 1446.752102] ffff8880185afc00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.753090] ffff8880185afc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.754079] >ffff8880185afd00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1446.755065] ^ [ 1446.755589] ffff8880185afd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1446.756574] ffff8880185afe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.757566] ================================================================== Flag 'BLK_MQ_F_TAG_QUEUE_SHARED' will be set if the second device on the same host initializes it's queue successfully. However, if the second device failed to allocate memory in blk_mq_alloc_and_init_hctx() from blk_mq_realloc_hw_ctxs() from blk_mq_init_allocated_queue(), __blk_mq_free_map_and_rqs() will be called on error path, and if 'BLK_MQ_TAG_HCTX_SHARED' is not set, 'tag_set->tags' will be freed while it's still used by the first device. To fix this issue we move release newly allocated hardware context from blk_mq_realloc_hw_ctxs to __blk_mq_update_nr_hw_queues. As there is needn't to release hardware context in blk_mq_init_allocated_queue. Fixes: 868f2f0b7206 ("blk-mq: dynamic h/w context count") Signed-off-by: Ye Bin <[email protected]> Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-11-08bcache: Revert "bcache: use bvec_virt"Coly Li1-1/+1
This reverts commit 2fd3e5efe791946be0957c8e1eed9560b541fe46. The above commit replaces page_address(bv->bv_page) by bvec_virt(bv) to avoid directly access to bv->bv_page, but in situation bv->bv_offset is not zero and page_address(bv->bv_page) is not equal to bvec_virt(bv). In such case a memory corruption may happen because memory in next page is tainted by following line in do_btree_node_write(), memcpy(bvec_virt(bv), addr, PAGE_SIZE); This patch reverts the mentioned commit to avoid the memory corruption. Fixes: 2fd3e5efe791 ("bcache: use bvec_virt") Signed-off-by: Coly Li <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] # 5.15 Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-11-08arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functionsArnd Bergmann1-3/+9
gcc warns about undefined behavior the vmalloc code when building with CONFIG_ARM64_PA_BITS_52, when the 'idx++' in the argument to __phys_to_pte_val() is evaluated twice: mm/vmalloc.c: In function 'vmap_pfn_apply': mm/vmalloc.c:2800:58: error: operation on 'data->idx' may be undefined [-Werror=sequence-point] 2800 | *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot)); | ~~~~~~~~~^~ arch/arm64/include/asm/pgtable-types.h:25:37: note: in definition of macro '__pte' 25 | #define __pte(x) ((pte_t) { (x) } ) | ^ arch/arm64/include/asm/pgtable.h:80:15: note: in expansion of macro '__phys_to_pte_val' 80 | __pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) | ^~~~~~~~~~~~~~~~~ mm/vmalloc.c:2800:30: note: in expansion of macro 'pfn_pte' 2800 | *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot)); | ^~~~~~~ I have no idea why this never showed up earlier, but the safest workaround appears to be changing those macros into inline functions so the arguments get evaluated only once. Cc: Matthew Wilcox <[email protected]> Fixes: 75387b92635e ("arm64: handle 52-bit physical addresses in page table entries") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-11-08arm64: Track no early_pgtable_alloc() for kmemleakQian Cai5-8/+13
After switched page size from 64KB to 4KB on several arm64 servers here, kmemleak starts to run out of early memory pool due to a huge number of those early_pgtable_alloc() calls: kmemleak_alloc_phys() memblock_alloc_range_nid() memblock_phys_alloc_range() early_pgtable_alloc() init_pmd() alloc_init_pud() __create_pgd_mapping() __map_memblock() paging_init() setup_arch() start_kernel() Increased the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times won't be enough for a server with 200GB+ memory. There isn't much interesting to check memory leaks for those early page tables and those early memory mappings should not reference to other memory. Hence, no kmemleak false positives, and we can safely skip tracking those early allocations from kmemleak like we did in the commit fed84c785270 ("mm/memblock.c: skip kmemleak for kasan_init()") without needing to introduce complications to automatically scale the value depends on the runtime memory size etc. After the patch, the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again. Signed-off-by: Qian Cai <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-11-08arm64: mte: change PR_MTE_TCF_NONE back into an unsigned longPeter Collingbourne2-2/+2
This constant was previously an unsigned long, but was changed into an int in commit 433c38f40f6a ("arm64: mte: change ASYNC and SYNC TCF settings into bitfields"). This ended up causing spurious unsigned-signed comparison warnings in expressions such as: (x & PR_MTE_TCF_MASK) != PR_MTE_TCF_NONE Therefore, change it back into an unsigned long to silence these warnings. Link: https://linux-review.googlesource.com/id/I07a72310db30227a5b7d789d0b817d78b657c639 Signed-off-by: Peter Collingbourne <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-11-08arm64: vdso: remove -nostdlib compiler flagMasahiro Yamada2-2/+2
The -nostdlib option requests the compiler to not use the standard system startup files or libraries when linking. It is effective only when $(CC) is used as a linker driver. Since commit 691efbedc60d ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO"), $(LD) is directly used, hence -nostdlib is unneeded. Signed-off-by: Masahiro Yamada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-11-08arm64: arm64_ftr_reg->name may not be a human-readable stringReiji Watanabe1-3/+7
The id argument of ARM64_FTR_REG_OVERRIDE() is used for two purposes: one as the system register encoding (used for the sys_id field of __ftr_reg_entry), and the other as the register name (stringified and used for the name field of arm64_ftr_reg), which is debug information. The id argument is supposed to be a macro that indicates an encoding of the register (eg. SYS_ID_AA64PFR0_EL1, etc). ARM64_FTR_REG(), which also has the same id argument, uses ARM64_FTR_REG_OVERRIDE() and passes the id to the macro. Since the id argument is completely macro-expanded before it is substituted into a macro body of ARM64_FTR_REG_OVERRIDE(), the stringified id in the body of ARM64_FTR_REG_OVERRIDE is not a human-readable register name, but a string of numeric bitwise operations. Fix this so that human-readable register names are available as debug information. Fixes: 8f266a5d878a ("arm64: cpufeature: Add global feature override facility") Signed-off-by: Reiji Watanabe <[email protected]> Reviewed-by: Oliver Upton <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-11-07Add gitignore file for samples/fanotify/ subdirectoryLinus Torvalds1-0/+1
Commit 5451093081db ("samples: Add fs error monitoring example") added a new sample program, but didn't teach git to ignore the new generated files, causing unnecessary noise from 'git status' after a full build. Add the 'fs-monitor' sample executable to the .gitignore for this subdirectory to silence it all again. Signed-off-by: Linus Torvalds <[email protected]>