aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-01-15mm: kmemleak: alloc gray object for reserved region with direct mapCalvin Zhang1-1/+5
Reserved regions with direct mapping may contain references to other regions. CMA region with fixed location is reserved without creating kmemleak_object for it. So add them as gray kmemleak objects. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Calvin Zhang <[email protected]> Cc: Rob Herring <[email protected]> Cc: Frank Rowand <[email protected]> Cc: Catalin Marinas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15kmemleak: fix kmemleak false positive report with HW tag-based kasan enableKuan-Ying Lee1-7/+14
With HW tag-based kasan enable, We will get the warning when we free object whose address starts with 0xFF. It is because kmemleak rbtree stores tagged object and this freeing object's tag does not match with rbtree object. In the example below, kmemleak rbtree stores the tagged object in the kmalloc(), and kfree() gets the pointer with 0xFF tag. Call sequence: ptr = kmalloc(size, GFP_KERNEL); page = virt_to_page(ptr); offset = offset_in_page(ptr); kfree(page_address(page) + offset); ptr = kmalloc(size, GFP_KERNEL); A sequence like that may cause the warning as following: 1) Freeing unknown object: In kfree(), we will get free unknown object warning in kmemleak_free(). Because object(0xFx) in kmemleak rbtree and pointer(0xFF) in kfree() have different tag. 2) Overlap existing: When we allocate that object with the same hw-tag again, we will find the overlap in the kmemleak rbtree and kmemleak thread will be killed. kmemleak: Freeing unknown object at 0xffff000003f88000 CPU: 5 PID: 177 Comm: cat Not tainted 5.16.0-rc1-dirty #21 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x1ac show_stack+0x1c/0x30 dump_stack_lvl+0x68/0x84 dump_stack+0x1c/0x38 kmemleak_free+0x6c/0x70 slab_free_freelist_hook+0x104/0x200 kmem_cache_free+0xa8/0x3d4 test_version_show+0x270/0x3a0 module_attr_show+0x28/0x40 sysfs_kf_seq_show+0xb0/0x130 kernfs_seq_show+0x30/0x40 seq_read_iter+0x1bc/0x4b0 seq_read_iter+0x1bc/0x4b0 kernfs_fop_read_iter+0x144/0x1c0 generic_file_splice_read+0xd0/0x184 do_splice_to+0x90/0xe0 splice_direct_to_actor+0xb8/0x250 do_splice_direct+0x88/0xd4 do_sendfile+0x2b0/0x344 __arm64_sys_sendfile64+0x164/0x16c invoke_syscall+0x48/0x114 el0_svc_common.constprop.0+0x44/0xec do_el0_svc+0x74/0x90 el0_svc+0x20/0x80 el0t_64_sync_handler+0x1a8/0x1b0 el0t_64_sync+0x1ac/0x1b0 ... kmemleak: Cannot insert 0xf2ff000003f88000 into the object search tree (overlaps existing) CPU: 5 PID: 178 Comm: cat Not tainted 5.16.0-rc1-dirty #21 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x1ac show_stack+0x1c/0x30 dump_stack_lvl+0x68/0x84 dump_stack+0x1c/0x38 create_object.isra.0+0x2d8/0x2fc kmemleak_alloc+0x34/0x40 kmem_cache_alloc+0x23c/0x2f0 test_version_show+0x1fc/0x3a0 module_attr_show+0x28/0x40 sysfs_kf_seq_show+0xb0/0x130 kernfs_seq_show+0x30/0x40 seq_read_iter+0x1bc/0x4b0 kernfs_fop_read_iter+0x144/0x1c0 generic_file_splice_read+0xd0/0x184 do_splice_to+0x90/0xe0 splice_direct_to_actor+0xb8/0x250 do_splice_direct+0x88/0xd4 do_sendfile+0x2b0/0x344 __arm64_sys_sendfile64+0x164/0x16c invoke_syscall+0x48/0x114 el0_svc_common.constprop.0+0x44/0xec do_el0_svc+0x74/0x90 el0_svc+0x20/0x80 el0t_64_sync_handler+0x1a8/0x1b0 el0t_64_sync+0x1ac/0x1b0 kmemleak: Kernel memory leak detector disabled kmemleak: Object 0xf2ff000003f88000 (size 128): kmemleak: comm "cat", pid 177, jiffies 4294921177 kmemleak: min_count = 1 kmemleak: count = 0 kmemleak: flags = 0x1 kmemleak: checksum = 0 kmemleak: backtrace: kmem_cache_alloc+0x23c/0x2f0 test_version_show+0x1fc/0x3a0 module_attr_show+0x28/0x40 sysfs_kf_seq_show+0xb0/0x130 kernfs_seq_show+0x30/0x40 seq_read_iter+0x1bc/0x4b0 kernfs_fop_read_iter+0x144/0x1c0 generic_file_splice_read+0xd0/0x184 do_splice_to+0x90/0xe0 splice_direct_to_actor+0xb8/0x250 do_splice_direct+0x88/0xd4 do_sendfile+0x2b0/0x344 __arm64_sys_sendfile64+0x164/0x16c invoke_syscall+0x48/0x114 el0_svc_common.constprop.0+0x44/0xec do_el0_svc+0x74/0x90 kmemleak: Automatic memory scanning thread ended [[email protected]: whitespace tweak] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kuan-Ying Lee <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Doug Berger <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm: slab: make slab iterator functions staticMuchun Song3-20/+15
There is no external users of slab_start/next/stop(), so make them static. And the memory.kmem.slabinfo is deprecated, which outputs nothing now, so move memcg_slab_show() into mm/memcontrol.c and rename it to mem_cgroup_slab_show to be consistent with other function names. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15mm/slab_common: use WARN() if cache still has objects on destroyMarco Elver1-8/+3
Calling kmem_cache_destroy() while the cache still has objects allocated is a kernel bug, and will usually result in the entire cache being leaked. While the message in kmem_cache_destroy() resembles a warning, it is currently not implemented using a real WARN(). This is problematic for infrastructure testing the kernel, all of which rely on the specific format of WARN()s to pick up on bugs. Some 13 years ago this used to be a simple WARN_ON() in slub, but commit d629d8195793 ("slub: improve kmem_cache_destroy() error message") changed it into an open-coded warning to avoid confusion with a bug in slub itself. Instead, turn the open-coded warning into a real WARN() with the message preserved, so that test systems can actually identify these issues, and we get all the other benefits of using a normal WARN(). The warning message is extended with "when called from <caller-ip>" to make it even clearer where the fault lies. For most configurations this is only a cosmetic change, however, note that WARN() here will now also respect panic_on_warn. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15fs/ioctl: remove unnecessary __user annotationAmit Daniel Kachhap1-1/+1
__user annotations are used by the checker (e.g sparse) to mark user pointers. However here __user is applied to a struct directly, without a pointer being directly involved. Although the presence of __user does not cause sparse to emit a warning, __user should be removed for consistency with other uses of offsetof(). Note: No functional changes intended. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Amit Daniel Kachhap <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Kevin Brodsky <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ocfs2: remove redundant assignment to variable free_spaceColin Ian King1-1/+1
The variable 'free_space' is being initialized with a value that is not read, it is being re-assigned later in the two paths of an if statement. The early initialization is redundant and can be removed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Acked-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ocfs2: cluster: use default_groups in kobj_typeGreg Kroah-Hartman1-5/+6
There are currently two ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the ocfs2 cluster sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Tested-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ocfs2: remove redundant assignment to pointer root_bhColin Ian King1-1/+1
The variable 'root_bh' is being initialized with a value that is not read, it is being re-assigned later on closer to its use. The early initialization is redundant and can be removed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Acked-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ocfs2: use default_groups in kobj_typeGreg Kroah-Hartman1-1/+2
There are currently two ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the ocfs2 code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Acked-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ocfs2: clearly handle ocfs2_grab_pages_for_write() return valueJoseph Qi1-13/+13
ocfs2_grab_pages_for_write() may return -EAGAIN if write context type is mmap and it could not lock the target page. In this case, we exit with no error and no target page. And then trigger the caller page_mkwrite() to retry. Since there are other caller types, e.g. buffer and direct io, make the return value handling more clear. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joseph Qi <[email protected]> Reported-by: Dan Carpenter <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ocfs2: use BUG_ON instead of if condition followed by BUG.Zhang Mingyu1-4/+2
This issue was detected with the help of Coccinelle. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhang Mingyu <[email protected]> Reported-by: Zeal Robot <[email protected]> Acked-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15squashfs: provide backing_dev_info in order to disable read-aheadZheng Liang1-0/+33
Commit c1f6925e1091 ("mm: put readahead pages in cache earlier") causes the read performance of squashfs to deteriorate.Through testing, we find that the performance will be back by closing the readahead of squashfs. So we want to learn the way of ubifs, provides backing_dev_info and disable read-ahead We tested the following data by fio. squashfs image blocksize=128K test command: fio --name basic --bs=? --filename="/mnt/test_file" --rw=? --iodepth=1 --ioengine=psync --runtime=200 --time_based turn on squashfs readahead in 5.10 kernel bs(k) read/randread MB/s 4 randread 271 128 randread 231 1024 randread 246 4 read 310 128 read 245 1024 read 247 turn off squashfs readahead in 5.10 kernel bs(k) read/randread MB/s 4 randread 293 128 randread 330 1024 randread 363 4 read 338 128 read 360 1024 read 365 turn on squashfs readahead and revert the commit c1f6925e1091("mm: put readahead pages in cache earlier") in 5.10 kernel bs(k) read/randread MB/s 4 randread 289 128 randread 306 1024 randread 335 4 read 337 128 read 336 1024 read 338 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zheng Liang <[email protected]> Reviewed-by: Phillip Lougher <[email protected]> Cc: Zhang Yi <[email protected]> Cc: Hou Tao <[email protected]> Cc: Miao Xie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15fs/ntfs/attrib.c: fix one kernel-doc commentYang Li1-1/+1
The comments for the file should not be in kernel-doc format: /** * attrib.c - NTFS attribute operations. Part of the Linux-NTFS as it causes it to be incorrectly identified for function ntfs_map_runlist_nolock(), causing some warnings found by running scripts/kernel-doc.: fs/ntfs/attrib.c:25: warning: Incorrect use of kernel-doc format: * ntfs_map_runlist_nolock - map (a part of) a runlist of an ntfs inode fs/ntfs/attrib.c:71: warning: Function parameter or member 'ni' not described in 'ntfs_map_runlist_nolock' fs/ntfs/attrib.c:71: warning: Function parameter or member 'vcn' not described in 'ntfs_map_runlist_nolock' fs/ntfs/attrib.c:71: warning: Function parameter or member 'ctx' not described in 'ntfs_map_runlist_nolock' fs/ntfs/attrib.c:71: warning: expecting prototype for attrib.c - NTFS attribute operations. Part of the Linux(). Prototype was for ntfs_map_runlist_nolock() instead Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Li <[email protected]> Reported-by: Abaci Robot <[email protected]> Acked-by: Randy Dunlap <[email protected]> Cc: Anton Altaparmakov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15scripts/spelling.txt: add "oveflow"Drew Fustini1-0/+1
Add typo "oveflow" for "overflow". This typo was found and fixed in tools/testing/selftests/bpf/prog_tests/btf_dump.c Link: https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Drew Fustini <[email protected]> Suggested-by: Gustavo A. R. Silva <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Drew Fustini <[email protected]> Cc: zuoqilin <[email protected]> Cc: Tom Saeger <[email protected]> Cc: Sven Eckelmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ia64: topology: use default_groups in kobj_typeGreg Kroah-Hartman1-1/+2
There are currently two ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the ia64 topology sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ia64: fix typo in a commentJason Wang1-1/+1
The double `the' in a comment is repeated, thus it should be removed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jason Wang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15arch/ia64/kernel/setup.c: use swap() to make code cleanerYang Guang1-4/+1
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid opencoding it. Link: https://lkml.kernel.org/r/[email protected] Reported-by: Zeal Robot <[email protected]> Signed-off-by: Yang Guang <[email protected]> Cc: David Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ia64: module: use swap() to make code cleanerYang Guang1-4/+2
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid opencoding it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Guang <[email protected]> Reported-by: Zeal Robot <[email protected]> Cc: David Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15trace/hwlat: make use of the helper function kthread_run_on_cpu()Cai Huoqing1-5/+1
Replace kthread_create_on_cpu/wake_up_process() with kthread_run_on_cpu() to simplify the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Cai Huoqing <[email protected]> Cc: Bernard Metzler <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Doug Ledford <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15trace/osnoise: make use of the helper function kthread_run_on_cpu()Cai Huoqing1-2/+1
Replace kthread_create_on_cpu/wake_up_process() with kthread_run_on_cpu() to simplify the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Cai Huoqing <[email protected]> Cc: Bernard Metzler <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Doug Ledford <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15rcutorture: make use of the helper function kthread_run_on_cpu()Cai Huoqing1-5/+2
Replace kthread_create_on_node/kthread_bind/wake_up_process() with kthread_run_on_cpu() to simplify the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Cai Huoqing <[email protected]> Cc: Bernard Metzler <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Doug Ledford <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15ring-buffer: make use of the helper function kthread_run_on_cpu()Cai Huoqing1-5/+2
Replace kthread_create/kthread_bind/wake_up_process() with kthread_run_on_cpu() to simplify the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Cai Huoqing <[email protected]> Cc: Bernard Metzler <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Doug Ledford <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15RDMA/siw: make use of the helper function kthread_run_on_cpu()Cai Huoqing1-4/+3
Replace kthread_create/kthread_bind/wake_up_process() with kthread_run_on_cpu() to simplify the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Cai Huoqing <[email protected]> Cc: Bernard Metzler <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Doug Ledford <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15kthread: add the helper function kthread_run_on_cpu()Cai Huoqing2-0/+26
Add a new helper function kthread_run_on_cpu(), which includes kthread_create_on_cpu/wake_up_process(). In some cases, use kthread_run_on_cpu() directly instead of kthread_create_on_node/kthread_bind/wake_up_process() or kthread_create_on_cpu/wake_up_process() or kthreadd_create/kthread_bind/wake_up_process() to simplify the code. [[email protected]: export kthread_create_on_cpu to modules] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Cai Huoqing <[email protected]> Cc: Bernard Metzler <[email protected]> Cc: Cai Huoqing <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Doug Ledford <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15drop fen.cocciJulia Lawall1-124/+0
This semantic patch does not take into account the fact that of_node_put can be safely applied to NULL. Thus it gives only false positives. Drop it. Reported-by: Qing Wang <[email protected]> Signed-off-by: Julia Lawall <[email protected]>
2022-01-15scripts/coccinelle: drop bugon.cocciJulia Lawall1-63/+0
The BUG_ON script was never safe, in that it was not able to check whether the condition was side-effecting. At this point, BUG_ON should be well known, so it has probably outlived its usefuless. Signed-off-by: Julia Lawall <[email protected]> Suggested-by: Matthew Wilcox <[email protected]>
2022-01-15MAINTAINERS: remove Gilles MullerJulia Lawall1-1/+0
Gilles Muller passed away on November 17, 2021. We would like to thank him for his continued support for the development of Coccinelle. Signed-off-by: Julia Lawall <[email protected]>
2022-01-15Merge tag 'xfs-5.17-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds3-46/+72
Pull xfs fixes from Darrick Wong: "These are the last few obvious fixes that I found while stress testing online fsck for XFS prior to initiating a design review of the whole giant machinery. - Fix a minor locking inconsistency in readdir - Fix incorrect fs feature bit validation for secondary superblocks" * tag 'xfs-5.17-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix online fsck handling of v5 feature bits on secondary supers xfs: take the ILOCK when readdir inspects directory mapping data
2022-01-14MAINTAINERS: Add Helge as fbdev maintainerHelge Deller1-3/+4
The fbdev layer is orphaned, but seems to need some care. So I'd like to step up as new maintainer. Signed-off-by: Helge Deller <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]>
2022-01-14x86/fpu: Fix inline prefix warningsYang Zhong2-2/+2
Fix sparse warnings in xstate and remove inline prefix. Fixes: 980fe2fddcff ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions") Signed-off-by: Yang Zhong <[email protected]> Reported-by: kernel test robot <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14selftest: kvm: Add amx selftestYang Zhong2-0/+449
This selftest covers two aspects of AMX. The first is triggering #NM exception and checking the MSR XFD_ERR value. The second case is loading tile config and tile data into guest registers and trapping to the host side for a complete save/load of the guest state. TMM0 is also checked against memory data after save/restore. Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14selftest: kvm: Move struct kvm_x86_state to headerYang Zhong2-16/+15
Those changes can avoid dereferencing pointer compile issue when amx_test.c reference state->xsave. Move struct kvm_x86_state definition to processor.h. Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14selftest: kvm: Reorder vcpu_load_state steps for AMXPaolo Bonzini1-8/+9
For AMX support it is recommended to load XCR0 after XFD, so that KVM does not see XFD=0, XCR=1 for a save state that will eventually be disabled (which would lead to premature allocation of the space required for that save state). It is also required to load XSAVE data after XCR0 and XFD, so that KVM can trigger allocation of the extra space required to store AMX state. Adjust vcpu_load_state to obey these new requirements. Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: x86: Disable interception for IA32_XFD on demandKevin Tian4-6/+29
Always intercepting IA32_XFD causes non-negligible overhead when this register is updated frequently in the guest. Disable r/w emulation after intercepting the first WRMSR(IA32_XFD) with a non-zero value. Disable WRMSR emulation implies that IA32_XFD becomes out-of-sync with the software states in fpstate and the per-cpu xfd cache. This leads to two additional changes accordingly: - Call fpu_sync_guest_vmexit_xfd_state() after vm-exit to bring software states back in-sync with the MSR, before handle_exit_irqoff() is called. - Always trap #NM once write interception is disabled for IA32_XFD. The #NM exception is rare if the guest doesn't use dynamic features. Otherwise, there is at most one exception per guest task given a dynamic feature. p.s. We have confirmed that SDM is being revised to say that when setting IA32_XFD[18] the AMX register state is not guaranteed to be preserved. This clarification avoids adding mess for a creative guest which sets IA32_XFD[18]=1 before saving active AMX state to its own storage. Signed-off-by: Kevin Tian <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()Thomas Gleixner2-0/+26
KVM can disable the write emulation for the XFD MSR when the vCPU's fpstate is already correctly sized to reduce the overhead. When write emulation is disabled the XFD MSR state after a VMEXIT is unknown and therefore not in sync with the software states in fpstate and the per CPU XFD cache. Provide fpu_sync_guest_vmexit_xfd_state() which has to be invoked after a VMEXIT before enabling interrupts when write emulation is disabled for the XFD MSR. It could be invoked unconditionally even when write emulation is enabled for the price of a pointless MSR read. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: selftests: Add support for KVM_CAP_XSAVE2Wei Wang10-8/+130
When KVM_CAP_XSAVE2 is supported, userspace is expected to allocate buffer for KVM_GET_XSAVE2 and KVM_SET_XSAVE using the size returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2). Signed-off-by: Wei Wang <[email protected]> Signed-off-by: Guang Zeng <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: x86: Add support for getting/setting expanded xstate bufferGuang Zeng6-5/+106
With KVM_CAP_XSAVE, userspace uses a hardcoded 4KB buffer to get/set xstate data from/to KVM. This doesn't work when dynamic xfeatures (e.g. AMX) are exposed to the guest as they require a larger buffer size. Introduce a new capability (KVM_CAP_XSAVE2). Userspace VMM gets the required xstate buffer size via KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2). KVM_SET_XSAVE is extended to work with both legacy and new capabilities by doing properly-sized memdup_user() based on the guest fpu container. KVM_GET_XSAVE is kept for backward-compatible reason. Instead, KVM_GET_XSAVE2 is introduced under KVM_CAP_XSAVE2 as the preferred interface for getting xstate buffer (4KB or larger size) from KVM (Link: https://lkml.org/lkml/2021/12/15/510) Also, update the api doc with the new KVM_GET_XSAVE2 ioctl. Signed-off-by: Guang Zeng <[email protected]> Signed-off-by: Wei Wang <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Kevin Tian <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14x86/fpu: Add uabi_size to guest_fpuThomas Gleixner3-0/+7
Userspace needs to inquire KVM about the buffer size to work with the new KVM_SET_XSAVE and KVM_GET_XSAVE2. Add the size info to guest_fpu for KVM to access. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Wei Wang <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: x86: Add CPUID support for Intel AMXJing Liu2-2/+27
Extend CPUID emulation to support XFD, AMX_TILE, AMX_INT8 and AMX_BF16. Adding those bits into kvm_cpu_caps finally activates all previous logics in this series. Hide XFD on 32bit host kernels. Otherwise it leads to a weird situation where KVM tells userspace to migrate MSR_IA32_XFD and then rejects attempts to read/write the MSR. Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: x86: Add XCR0 support for Intel AMXJing Liu1-1/+6
Two XCR0 bits are defined for AMX to support XSAVE mechanism. Bit 17 is for tilecfg and bit 18 is for tiledata. The value of XCR0[17:18] is always either 00b or 11b. Also, SDM recommends that only 64-bit operating systems enable Intel AMX by setting XCR0[18:17]. 32-bit host kernel never sets the tile bits in vcpu->arch.guest_supported_xcr0. Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Kevin Tian <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: x86: Disable RDMSR interception of IA32_XFD_ERRJing Liu2-1/+7
This saves one unnecessary VM-exit in guest #NM handler, given that the MSR is already restored with the guest value before the guest is resumed. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: x86: Emulate IA32_XFD_ERR for guestJing Liu1-1/+20
Emulate read/write to IA32_XFD_ERR MSR. Only the saved value in the guest_fpu container is touched in the emulation handler. Actual MSR update is handled right before entering the guest (with preemption disabled) Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Zeng Guang <[email protected]> Signed-off-by: Wei Wang <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: x86: Intercept #NM for saving IA32_XFD_ERRJing Liu3-0/+59
Guest IA32_XFD_ERR is generally modified in two places: - Set by CPU when #NM is triggered; - Cleared by guest in its #NM handler; Intercept #NM for the first case when a nonzero value is written to IA32_XFD. Nonzero indicates that the guest is willing to do dynamic fpstate expansion for certain xfeatures, thus KVM needs to manage and virtualize guest XFD_ERR properly. The vcpu exception bitmap is updated in XFD write emulation according to guest_fpu::xfd. Save the current XFD_ERR value to the guest_fpu container in the #NM VM-exit handler. This must be done with interrupt disabled, otherwise the unsaved MSR value may be clobbered by host activity. The saving operation is conducted conditionally only when guest_fpu:xfd includes a non-zero value. Doing so also avoids misread on a platform which doesn't support XFD but #NM is triggered due to L1 interception. Queueing #NM to the guest is postponed to handle_exception_nmi(). This goes through the nested_vmx check so a virtual vmexit is queued instead when #NM is triggered in L2 but L1 wants to intercept it. Restore the host value (always ZERO outside of the host #NM handler) before enabling interrupt. Restore the guest value from the guest_fpu container right before entering the guest (with interrupt disabled). Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Kevin Tian <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14x86/fpu: Prepare xfd_err in struct fpu_guestJing Liu1-0/+5
When XFD causes an instruction to generate #NM, IA32_XFD_ERR contains information about which disabled state components are being accessed. The #NM handler is expected to check this information and then enable the state components by clearing IA32_XFD for the faulting task (if having permission). If the XFD_ERR value generated in guest is consumed/clobbered by the host before the guest itself doing so, it may lead to non-XFD-related #NM treated as XFD #NM in host (due to non-zero value in XFD_ERR), or XFD-related #NM treated as non-XFD #NM in guest (XFD_ERR cleared by the host #NM handler). Introduce a new field in fpu_guest to save the guest xfd_err value. KVM is expected to save guest xfd_err before interrupt is enabled and restore it right before entering the guest (with interrupt disabled). Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Kevin Tian <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: x86: Add emulation for IA32_XFDJing Liu1-0/+27
Intel's eXtended Feature Disable (XFD) feature allows the software to dynamically adjust fpstate buffer size for XSAVE features which have large state. Because guest fpstate has been expanded for all possible dynamic xstates at KVM_SET_CPUID2, emulation of the IA32_XFD MSR is straightforward. For write just call fpu_update_guest_xfd() to update the guest fpu container once all the sanity checks are passed. For read simply return the cached value in the container. Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Zeng Guang <[email protected]> Signed-off-by: Wei Wang <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulationKevin Tian2-0/+18
Guest XFD can be updated either in the emulation path or in the restore path. Provide a wrapper to update guest_fpu::fpstate::xfd. If the guest fpstate is currently in-use, also update the per-cpu xfd cache and the actual MSR. Signed-off-by: Kevin Tian <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2Jing Liu1-13/+29
KVM can request fpstate expansion in two approaches: 1) When intercepting guest updates to XCR0 and XFD MSR; 2) Before vcpu runs (e.g. at KVM_SET_CPUID2); The first option doesn't waste memory for legacy guest if it doesn't support XFD. However doing so introduces more complexity and also imposes an order requirement in the restoring path, i.e. XCR0/XFD must be restored before XSTATE. Given that the agreement is to do the static approach. This is considered a better tradeoff though it does waste 8K memory for legacy guest if its CPUID includes dynamically-enabled xfeatures. Successful fpstate expansion requires userspace VMM to acquire guest xstate permissions before calling KVM_SET_CPUID2. Also take the chance to adjust the indent in kvm_set_cpuid(). Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Kevin Tian <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14x86/fpu: Provide fpu_enable_guest_xfd_features() for KVMSean Christopherson2-0/+23
Provide a wrapper for expanding the guest fpstate buffer according to requested xfeatures. KVM wants to call this wrapper to manage any dynamic xstate used by the guest. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Kevin Tian <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Message-Id: <[email protected]> [Remove unnecessary 32-bit check. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14x86/fpu: Add guest support to xfd_enable_feature()Thomas Gleixner2-39/+60
Guest support for dynamically enabled FPU features requires a few modifications to the enablement function which is currently invoked from the #NM handler: 1) Use guest permissions and sizes for the update 2) Update fpu_guest state accordingly 3) Take into account that the enabling can be triggered either from a running guest via XSETBV and MSR_IA32_XFD write emulation or from a guest restore. In the latter case the guests fpstate is not the current tasks active fpstate. Split the function and implement the guest mechanics throughout the callchain. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> [Add 32-bit stub for __xfd_enable_feature. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-14x86/fpu: Make XFD initialization in __fpstate_reset() a function argumentJing Liu1-5/+6
vCPU threads are different from native tasks regarding to the initial XFD value. While all native tasks follow a fixed value (init_fpstate::xfd) established by the FPU core at boot, vCPU threads need to obey the reset value (i.e. ZERO) defined by the specification, to meet the expectation of the guest. Let the caller supply an argument and adjust the host and guest related invocations accordingly. Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Yang Zhong <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>