blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2021-07-15	mm/page_alloc: further fix __alloc_pages_bulk() return value	Chuck Lever	1	-6/+8
	The author of commit b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after checking populated elements") was possibly confused by the mixture of return values throughout the function. The API contract is clear that the function "Returns the number of pages on the list or array." It does not list zero as a unique return value with a special meaning. Therefore zero is a plausible return value only if @nr_pages is zero or less. Clean up the return logic to make it clear that the returned value is always the total number of pages in the array/list, not the number of pages that were allocated during this call. The only change in behavior with this patch is the value returned if prepare_alloc_pages() fails. To match the API contract, the number of pages currently in the array/list is returned in this case. The call site in __page_pool_alloc_pages_slow() also seems to be confused on this matter. It should be attended to by someone who is familiar with that code. [[email protected]: Return nr_populated if 0 pages are requested] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Cc: Desmond Cheong Zhi Xi <[email protected]> Cc: Zhang Qiang <[email protected]> Cc: Yanfei Xu <[email protected]> Cc: Matteo Croce <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-15	mm/page_alloc: correct return value when failing at preparing	Yanfei Xu	1	-1/+1
	If the array passed in is already partially populated, we should return "nr_populated" even failing at preparing arguments stage. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yanfei Xu <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-15	mm/page_alloc: avoid page allocator recursion with pagesets.lock held	Mel Gorman	1	-0/+12
	Syzbot is reporting potential deadlocks due to pagesets.lock when PAGE_OWNER is enabled. One example from Desmond Cheong Zhi Xi is as follows __alloc_pages_bulk() local_lock_irqsave(&pagesets.lock, flags) <---- outer lock here prep_new_page(): post_alloc_hook(): set_page_owner(): __set_page_owner(): save_stack(): stack_depot_save(): alloc_pages(): alloc_page_interleave(): __alloc_pages(): get_page_from_freelist(): rm_queue(): rm_queue_pcplist(): local_lock_irqsave(&pagesets.lock, flags); * DEADLOCK * Zhang, Qiang also reported BUG: sleeping function called from invalid context at mm/page_alloc.c:5179 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 ..... __dump_stack lib/dump_stack.c:79 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96 ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153 prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179 __alloc_pages+0x12f/0x500 mm/page_alloc.c:5375 alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2147 alloc_pages+0x238/0x2a0 mm/mempolicy.c:2270 stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303 save_stack+0x15e/0x1e0 mm/page_owner.c:120 __set_page_owner+0x50/0x290 mm/page_owner.c:181 prep_new_page mm/page_alloc.c:2445 [inline] __alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313 alloc_pages_bulk_array_node include/linux/gfp.h:557 [inline] vm_area_alloc_pages mm/vmalloc.c:2775 [inline] __vmalloc_area_node mm/vmalloc.c:2845 [inline] __vmalloc_node_range+0x39d/0x960 mm/vmalloc.c:2947 __vmalloc_node mm/vmalloc.c:2996 [inline] vzalloc+0x67/0x80 mm/vmalloc.c:3066 There are a number of ways it could be fixed. The page owner code could be audited to strip GFP flags that allow sleeping but it'll impair the functionality of PAGE_OWNER if allocations fail. The bulk allocator could add a special case to release/reacquire the lock for prep_new_page and lookup PCP after the lock is reacquired at the cost of performance. The pages requiring prep could be tracked using the least significant bit and looping through the array although it is more complicated for the list interface. The options are relatively complex and the second one still incurs a performance penalty when PAGE_OWNER is active so this patch takes the simple approach -- disable bulk allocation of PAGE_OWNER is active. The caller will be forced to allocate one page at a time incurring a performance penalty but PAGE_OWNER is already a performance penalty. Link: https://lkml.kernel.org/r/[email protected] Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock") Signed-off-by: Mel Gorman <[email protected]> Reported-by: Desmond Cheong Zhi Xi <[email protected]> Reported-by: "Zhang, Qiang" <[email protected]> Reported-by: [email protected] Tested-by: [email protected] Acked-by: Rafael Aquini <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-15	Revert "mm/page_alloc: make should_fail_alloc_page() static"	Matteo Croce	1	-1/+1
	This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441. Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y: LD vmlinux BTFIDS vmlinux FAILED unresolved symbol should_fail_alloc_page make: * [Makefile:1199: vmlinux] Error 255 make: * Deleting file 'vmlinux' Link: https://lkml.kernel.org/r/[email protected] Fixes: f7173090033c ("mm/page_alloc: make should_fail_alloc_page() static") Signed-off-by: Matteo Croce <[email protected]> Acked-by: Mel Gorman <[email protected]> Tested-by: John Hubbard <[email protected]> Cc: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-15	kasan: fix build by including kernel.h	Marco Elver	1	-0/+1
	The <linux/kasan.h> header relies on _RET_IP_ being defined, and had been receiving that definition via inclusion of bug.h which includes kernel.h. However, since f39650de687e ("kernel.h: split out panic and oops helpers") that is no longer the case and get the following build error when building CONFIG_KASAN_HW_TAGS on arm64: In file included from arch/arm64/mm/kasan_init.c:10: include/linux/kasan.h: In function 'kasan_slab_free': include/linux/kasan.h:230:39: error: '_RET_IP_' undeclared (first use in this function) 230 \| return __kasan_slab_free(s, object, _RET_IP_, init); Fix it by including kernel.h from kasan.h. Link: https://lkml.kernel.org/r/[email protected] Fixes: f39650de687e ("kernel.h: split out panic and oops helpers") Signed-off-by: Marco Elver <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-15	kasan: add memzero init for unaligned size at DEBUG	Yee Lee	1	-0/+12
	Issue: when SLUB debug is on, hwtag kasan_unpoison() would overwrite the redzone of object with unaligned size. An additional memzero_explicit() path is added to replacing init by hwtag instruction for those unaligned size at SLUB debug mode. The penalty is acceptable since they are only enabled in debug mode, not production builds. A block of comment is added for explanation. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yee Lee <[email protected]> Suggested-by: Andrey Konovalov <[email protected]> Suggested-by: Marco Elver <[email protected]> Reviewed-by: Marco Elver <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Nicholas Tang <[email protected]> Cc: Kuan-Ying Lee <[email protected]> Cc: Chinwen Chang <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-15	mm: move helper to check slub_debug_enabled	Marco Elver	2	-18/+11
	Move the helper to check slub_debug_enabled, so that we can confine the use of #ifdef outside slub.c as well. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Yee Lee <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Chinwen Chang <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Kuan-Ying Lee <[email protected]> Cc: Nicholas Tang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-15	xfs: detect misaligned rtinherit directory extent size hints	Darrick J. Wong	1	-2/+16
	If we encounter a directory that has been configured to pass on an extent size hint to a new realtime file and the hint isn't an integer multiple of the rt extent size, we should flag the hint for administrative review because that is a misconfiguration (that other parts of the kernel will fix automatically). Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2021-07-15	xfs: fix an integer overflow error in xfs_growfs_rt	Darrick J. Wong	1	-5/+5
	During a realtime grow operation, we run a single transaction for each rt bitmap block added to the filesystem. This means that each step has to be careful to increase sb_rblocks appropriately. Fix the integer overflow error in this calculation that can happen when the extent size is very large. Found by running growfs to add a rt volume to a filesystem formatted with a 1g rt extent size. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2021-07-15	xfs: improve FSGROWFSRT precondition checking	Darrick J. Wong	1	-7/+32
	Improve the checking at the start of a realtime grow operation so that we avoid accidentally set a new extent size that is too large and avoid adding an rt volume to a filesystem with rmap or reflink because we don't support rt rmap or reflink yet. While we're at it, separate the checks so that we're only testing one aspect at a time. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2021-07-15	xfs: don't expose misaligned extszinherit hints to userspace	Darrick J. Wong	1	-1/+18
	Commit 603f000b15f2 changed xfs_ioctl_setattr_check_extsize to reject an attempt to set an EXTSZINHERIT extent size hint on a directory with RTINHERIT set if the hint isn't a multiple of the realtime extent size. However, I have recently discovered that it is possible to change the realtime extent size when adding a rt device to a filesystem, which means that the existence of directories with misaligned inherited hints is not an accident. As a result, it's possible that someone could have set a valid hint and added an rt volume with a different rt extent size, which invalidates the ondisk hints. After such a sequence, FSGETXATTR will report a misaligned hint, which FSSETXATTR will trip over, causing confusion if the user was doing the usual GET/SET sequence to change some other attribute. Change xfs_fill_fsxattr to omit the hint if it isn't aligned properly. Fixes: 603f000b15f2 ("xfs: validate extsz hints against rt extent size when rtinherit is set") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2021-07-15	xfs: correct the narrative around misaligned rtinherit/extszinherit dirs	Darrick J. Wong	3	-22/+24
	While auditing the realtime growfs code, I realized that the GROWFSRT ioctl (and by extension xfs_growfs) has always allowed sysadmins to change the realtime extent size when adding a realtime section to the filesystem. Since we also have always allowed sysadmins to set RTINHERIT and EXTSZINHERIT on directories even if there is no realtime device, this invalidates the premise laid out in the comments added in commit 603f000b15f2. In other words, this is not a case of inadequate metadata validation. This is a case of nearly forgotten (and apparently untested) but supported functionality. Update the comments to reflect what we've learned, and remove the log message about correcting the misalignment. Fixes: 603f000b15f2 ("xfs: validate extsz hints against rt extent size when rtinherit is set") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Carlos Maiolino <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2021-07-15	xfs: reset child dir '..' entry when unlinking child	Darrick J. Wong	1	-0/+13
	While running xfs/168, I noticed a second source of post-shrink corruption errors causing shutdowns. Let's say that directory B has a low inode number and is a child of directory A, which has a high number. If B is empty but open, and unlinked from A, B's dotdot link continues to point to A. If A is then unlinked and the filesystem shrunk so that A is no longer a valid inode, a subsequent AIL push of B will trip the inode verifiers because the dotdot entry points outside of the filesystem. To avoid this problem, reset B's dotdot entry to the root directory when unlinking directories, since the root directory cannot be removed. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Gao Xiang <[email protected]>
2021-07-15	xfs: check for sparse inode clusters that cross new EOAG when shrinking	Darrick J. Wong	3	-0/+66
	While running xfs/168, I noticed occasional write verifier shutdowns involving inodes at the very end of the filesystem. Existing inode btree validation code checks that all inode clusters are fully contained within the filesystem. However, due to inadequate checking in the fs shrink code, it's possible that there could be a sparse inode cluster at the end of the filesystem where the upper inodes of the cluster are marked as holes and the corresponding blocks are free. In this case, the last blocks in the AG are listed in the bnobt. This enables the shrink to proceed but results in a filesystem that trips the inode verifiers. Fix this by disallowing the shrink. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Gao Xiang <[email protected]>
2021-07-15	iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor	Andreas Gruenbacher	1	-1/+0
	Now that we create those objects in iomap_writepage_map when needed, there's no need to pre-create them in iomap_page_mkwrite_actor anymore. Signed-off-by: Andreas Gruenbacher <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-07-15	iomap: Don't create iomap_page objects for inline files	Andreas Gruenbacher	1	-1/+3
	In iomap_readpage_actor, don't create iop objects for inline inodes. Otherwise, iomap_read_inline_data will set PageUptodate without setting iop->uptodate, and iomap_page_release will eventually complain. To prevent this kind of bug from occurring in the future, make sure the page doesn't have private data attached in iomap_read_inline_data. Signed-off-by: Andreas Gruenbacher <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-07-15	iomap: Permit pages without an iop to enter writeback	Andreas Gruenbacher	1	-2/+1
	Create an iop in the writeback path if one doesn't exist. This allows us to avoid creating the iop in some cases. We'll initially do that for pages with inline data, but it can be extended to pages which are entirely within an extent. It also allows for an iop to be removed from pages in the future (eg page split). Co-developed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-07-15	iomap: remove the length variable in iomap_seek_hole	Christoph Hellwig	1	-6/+3
	The length variable is rather pointless given that it can be trivially deduced from offset and size. Also the initial calculation can lead to KASAN warnings. Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: Leizhen (ThunderTown) <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
2021-07-15	iomap: remove the length variable in iomap_seek_data	Christoph Hellwig	1	-10/+6
	The length variable is rather pointless given that it can be trivially deduced from offset and size. Also the initial calculation can lead to KASAN warnings. Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: Leizhen (ThunderTown) <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
2021-07-15	arm64: entry: fix KCOV suppression	Mark Rutland	1	-1/+1
	We suppress KCOV for entry.o rather than entry-common.o. As entry.o is built from entry.S, this is pointless, and permits instrumentation of entry-common.o, which is built from entry-common.c. Fix the Makefile to suppress KCOV for entry-common.o, as we had intended to begin with. I've verified with objdump that this is working as expected. Fixes: bf6fa2c0dda7 ("arm64: entry: don't instrument entry code with KCOV") Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-07-15	arm64: entry: add missing noinstr	Mark Rutland	1	-1/+1
	We intend that all the early exception handling code is marked as `noinstr`, but we forgot this for __el0_error_handler_common(), which is called before we have completed entry from user mode. If it were instrumented, we could run into problems with RCU, lockdep, etc. Mark it as `noinstr` to prevent this. The few other functions in entry-common.c which do not have `noinstr` are called once we've completed entry, and are safe to instrument. Fixes: bb8e93a287a5 ("arm64: entry: convert SError handlers to C") Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Joey Gouly <[email protected]> Cc: James Morse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-07-15	arm64: mte: fix restoration of GCR_EL1 from suspend	Mark Rutland	1	-13/+2
	Since commit: bad1e1c663e0a72f ("arm64: mte: switch GCR_EL1 in kernel entry and exit") we saved/restored the user GCR_EL1 value at exception boundaries, and update_gcr_el1_excl() is no longer used for this. However it is used to restore the kernel's GCR_EL1 value when returning from a suspend state. Thus, the comment is misleading (and an ISB is necessary). When restoring the kernel's GCR value, we need an ISB to ensure this is used by subsequent instructions. We don't necessarily get an ISB by other means (e.g. if the kernel is built without support for pointer authentication). As __cpu_setup() initialised GCR_EL1.Exclude to 0xffff, until a context synchronization event, allocation tag 0 may be used rather than the desired set of tags. This patch drops the misleading comment, adds the missing ISB, and for clarity folds update_gcr_el1_excl() into its only user. Fixes: bad1e1c663e0 ("arm64: mte: switch GCR_EL1 in kernel entry and exit") Signed-off-by: Mark Rutland <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-07-15	arm64: Avoid premature usercopy failure	Robin Murphy	3	-13/+35
	Al reminds us that the usercopy API must only return complete failure if absolutely nothing could be copied. Currently, if userspace does something silly like giving us an unaligned pointer to Device memory, or a size which overruns MTE tag bounds, we may fail to honour that requirement when faulting on a multi-byte access even though a smaller access could have succeeded. Add a mitigation to the fixup routines to fall back to a single-byte copy if we faulted on a larger access before anything has been written to the destination, to guarantee making some forward progress. We needn't be too concerned about the overall performance since this should only occur when callers are doing something a bit dodgy in the first place. Particularly broken userspace might still be able to trick generic_perform_write() into an infinite loop by targeting write() at an mmap() of some read-only device register where the fault-in load succeeds but any store synchronously aborts such that copy_to_user() is genuinely unable to make progress, but, well, don't do that... CC: [email protected] Reported-by: Chen Huang <[email protected]> Suggested-by: Al Viro <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Signed-off-by: Robin Murphy <[email protected]> Link: https://lore.kernel.org/r/dc03d5c675731a1f24a62417dba5429ad744234e.1626098433.git.robin.murphy@arm.com Signed-off-by: Will Deacon <[email protected]>
2021-07-15	xen-blkfront: sanitize the removal state machine	Christoph Hellwig	1	-198/+26
	xen-blkfront has a weird protocol where close message from the remote side can be delayed, and where hot removals are treated somewhat differently from regular removals, all leading to potential NULL pointer removals, and a del_gendisk from the block device release method, which will deadlock. Fix this by just performing normal hot removals even when the device is opened like all other Linux block drivers. Fixes: c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk") Reported-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Vitaly Kuznetsov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-07-15	Merge tag 'nvme-5.14-2021-07-15' of git://git.infradead.org/nvme into block-5.14	Jens Axboe	2	-12/+59
	Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.14 - fix various races in nvme-pci when shutting down just after probing (Casey Chen) - fix a net_device leak in nvme-tcp (Prabhakar Kushwaha)" * tag 'nvme-5.14-2021-07-15' of git://git.infradead.org/nvme: nvme-pci: do not call nvme_dev_remove_admin from nvme_remove nvme-pci: fix multiple races in nvme_setup_io_queues nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE
2021-07-15	nbd: fix order of cleaning up the queue and freeing the tagset	Wang Qing	1	-1/+1
	We must release the queue before freeing the tagset. Fixes: 4af5f2e03013 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") Reported-and-tested-by: [email protected] Signed-off-by: Wang Qing <[email protected]> Signed-off-by: Guoqing Jiang <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-07-15	pd: fix order of cleaning up the queue and freeing the tagset	Guoqing Jiang	1	-1/+1
	We must release the queue before freeing the tagset. Fixes: 262d431f9000 ("pd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Guoqing Jiang <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-07-15	dt-bindings: Move fixed string 'patternProperties' to 'properties'	Rob Herring	5	-59/+60
	There's no need for fixed strings to be under 'patternProperties', so move them under 'properties' instead. Cc: Jean Delvare <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Kishon Vijay Abraham I <[email protected]> Cc: Vinod Koul <[email protected]> Cc: Saravanan Sekar <[email protected]> Cc: Mark Brown <[email protected]> Cc: Jagan Teki <[email protected]> Cc: Troy Kisky <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Rob Herring <[email protected]> Acked-by: Mark Brown <[email protected]> Acked-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-07-15	dt-bindings: More dropping redundant minItems/maxItems	Rob Herring	6	-15/+0
	Another round of removing redundant minItems/maxItems from new schema in the recent merge window. If a property has an 'items' list, then a 'minItems' or 'maxItems' with the same size as the list is redundant and can be dropped. Note that is DT schema specific behavior and not standard json-schema behavior. The tooling will fixup the final schema adding any unspecified minItems/maxItems. This condition is partially checked with the meta-schema already, but only if both 'minItems' and 'maxItems' are equal to the 'items' length. An improved meta-schema is pending. Cc: Stephen Boyd <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Will Deacon <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Vignesh Raghavendra <[email protected]> Cc: Alessandro Zummo <[email protected]> Cc: Alexandre Belloni <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Sureshkumar Relli <[email protected]> Cc: Brian Norris <[email protected]> Cc: Kamal Dasu <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Sebastian Siewior <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Acked-by: Alexandre Belloni <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-07-15	KVM: selftests: smm_test: Test SMM enter from L2	Vitaly Kuznetsov	1	-6/+64
	Two additional tests are added: - SMM triggered from L2 does not currupt L1 host state. - Save/restore during SMM triggered from L2 does not corrupt guest/host state. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: nSVM: Restore nested control upon leaving SMM	Vitaly Kuznetsov	3	-3/+10
	If the VM was migrated while in SMM, no nested state was saved/restored, and therefore svm_leave_smm has to load both save and control area of the vmcb12. Save area is already loaded from HSAVE area, so now load the control area as well from the vmcb12. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: nSVM: Fix L1 state corruption upon return from SMM	Vitaly Kuznetsov	1	-1/+38
	VMCB split commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest") broke return from SMM when we entered there from guest (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem manifests itself like this: kvm_exit: reason EXIT_RSM rip 0x7ffbb280 info 0 0 kvm_emulate_insn: 0:7ffbb280: 0f aa kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x7ffb3000 kvm_nested_vmrun: rip: 0x000000007ffbb280 vmcb: 0x0000000008224000 nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000 npt: on kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002 intercepts: fd44bfeb 0000217f 00000000 kvm_entry: vcpu 0, rip 0xffffffffffbbe119 kvm_exit: reason EXIT_NPF rip 0xffffffffffbbe119 info 200000006 1ab000 kvm_nested_vmexit: vcpu 0 reason npf rip 0xffffffffffbbe119 info1 0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000 error_code 0x00000000 kvm_page_fault: address 1ab000 error_code 6 kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000 int_info 0 int_info_err 0 kvm_entry: vcpu 0, rip 0x7ffbb280 kvm_exit: reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0 kvm_emulate_insn: 0:7ffbb280: 0f aa kvm_inj_exception: #GP (0x0) Note: return to L2 succeeded but upon first exit to L1 its RIP points to 'RSM' instruction but we're not in SMM. The problem appears to be that VMCB01 gets irreversibly destroyed during SMM execution. Previously, we used to have 'hsave' VMCB where regular (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just switch to VMCB01 from VMCB02. Pre-split (working) flow looked like: - SMM is triggered during L2's execution - L2's state is pushed to SMRAM - nested_svm_vmexit() restores L1's state from 'hsave' - SMM -> RSM - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have pre-SMM (and pre L2 VMRUN) L1's state there - L2's state is restored from SMRAM - upon first exit L1's state is restored from L1. This was always broken with regards to svm_get_nested_state()/ svm_set_nested_state(): 'hsave' was never a part of what's being save and restored so migration happening during SMM triggered from L2 would never restore L1's state correctly. Post-split flow (broken) looks like: - SMM is triggered during L2's execution - L2's state is pushed to SMRAM - nested_svm_vmexit() switches to VMCB01 from VMCB02 - SMM -> RSM - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01 is already lost. - L2's state is restored from SMRAM - upon first exit L1's state is restored from VMCB01 but it is corrupted (reflects the state during 'RSM' execution). VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest and host state so when we switch back to VMCS02 L1's state is intact there. To resolve the issue we need to save L1's state somewhere. We could've created a third VMCB for SMM but that would require us to modify saved state format. L1's architectural HSAVE area (pointed by MSR_VM_HSAVE_PA) seems appropriate: L0 is free to save any (or none) of L1's state there. Currently, KVM does 'none'. Note, for nested state migration to succeed, both source and destination hypervisors must have the fix. We, however, don't need to create a new flag indicating the fact that HSAVE area is now populated as migration during SMM triggered from L2 was always broken. Fixes: 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest") Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: nSVM: Introduce svm_copy_vmrun_state()	Vitaly Kuznetsov	2	-18/+24
	Separate the code setting non-VMLOAD-VMSAVE state from svm_set_nested_state() into its own function. This is going to be re-used from svm_enter_smm()/svm_leave_smm(). Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN	Vitaly Kuznetsov	1	-0/+5
	APM states that "The address written to the VM_HSAVE_PA MSR, which holds the address of the page used to save the host state on a VMRUN, must point to a hypervisor-owned page. If this check fails, the WRMSR will fail with a #GP(0) exception. Note that a value of 0 is not considered valid for the VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will fail with a #GP(0) exception." svm_set_msr() already checks that the supplied address is valid, so only check for '0' is missing. Add it to nested_svm_vmrun(). Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA	Vitaly Kuznetsov	1	-1/+10
	APM states that #GP is raised upon write to MSR_VM_HSAVE_PA when the supplied address is not page-aligned or is outside of "maximum supported physical address for this implementation". page_address_valid() check seems suitable. Also, forcefully page-align the address when it's written from VMM. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Reviewed-by: Maxim Levitsky <[email protected]> [Add comment about behavior for host-provided values. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: SVM: Fix sev_pin_memory() error checks in SEV migration utilities	Sean Christopherson	1	-4/+5
	Use IS_ERR() instead of checking for a NULL pointer when querying for sev_pin_memory() failures. sev_pin_memory() always returns an error code cast to a pointer, or a valid pointer; it never returns NULL. Reported-by: Dan Carpenter <[email protected]> Cc: Steve Rutherford <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: Ashish Kalra <[email protected]> Fixes: d3d1af85e2c7 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command") Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: SVM: Return -EFAULT if copy_to_user() for SEV mig packet header fails	Sean Christopherson	1	-2/+3
	Return -EFAULT if copy_to_user() fails; if accessing user memory faults, copy_to_user() returns the number of bytes remaining, not an error code. Reported-by: Dan Carpenter <[email protected]> Cc: Steve Rutherford <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: Ashish Kalra <[email protected]> Fixes: d3d1af85e2c7 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: SVM: add module param to control the #SMI interception	Maxim Levitsky	3	-1/+14
	In theory there are no side effects of not intercepting #SMI, because then #SMI becomes transparent to the OS and the KVM. Plus an observation on recent Zen2 CPUs reveals that these CPUs ignore #SMI interception and never deliver #SMI VMexits. This is also useful to test nested KVM to see that L1 handles #SMIs correctly in case when L1 doesn't intercept #SMI. Finally the default remains the same, the SMI are intercepted by default thus this patch doesn't have any effect unless non default module param value is used. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: SVM: remove INIT intercept handler	Maxim Levitsky	1	-1/+0
	Kernel never sends real INIT even to CPUs, other than on boot. Thus INIT interception is an error which should be caught by a check for an unknown VMexit reason. On top of that, the current INIT VM exit handler skips the current instruction which is wrong. That was added in commit 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code"). Fixes: 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code") Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: SVM: #SMI interception must not skip the instruction	Maxim Levitsky	1	-1/+6
	Commit 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code"), unfortunately made a mistake of treating nop_on_interception and nop_interception in the same way. Former does truly nothing while the latter skips the instruction. SMI VM exit handler should do nothing. (SMI itself is handled by the host when we do STGI) Fixes: 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code") Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: VMX: Remove vmx_msr_index from vmx.h	Yu Zhang	1	-2/+0
	vmx_msr_index was used to record the list of MSRs which can be lazily restored when kvm returns to userspace. It is now reimplemented as kvm_uret_msrs_list, a common x86 list which is only used inside x86.c. So just remove the obsolete declaration in vmx.h. Signed-off-by: Yu Zhang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run()	Lai Jiangshan	1	-0/+2
	When the host is using debug registers but the guest is not using them nor is the guest in guest-debug state, the kvm code does not reset the host debug registers before kvm_x86->run(). Rather, it relies on the hardware vmentry instruction to automatically reset the dr7 registers which ensures that the host breakpoints do not affect the guest. This however violates the non-instrumentable nature around VM entry and exit; for example, when a host breakpoint is set on vcpu->arch.cr2, Another issue is consistency. When the guest debug registers are active, the host breakpoints are reset before kvm_x86->run(). But when the guest debug registers are inactive, the host breakpoints are delayed to be disabled. The host tracing tools may see different results depending on what the guest is doing. To fix the problems, we clear %db7 unconditionally before kvm_x86->run() if the host has set any breakpoints, no matter if the guest is using them or not. Signed-off-by: Lai Jiangshan <[email protected]> Message-Id: <[email protected]> Cc: [email protected] [Only clear %db7 instead of reloading all debug registers. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	KVM: selftests: Address extra memslot parameters in vm_vaddr_alloc	Ricardo Koller	1	-1/+1
	Commit a75a895e6457 ("KVM: selftests: Unconditionally use memslot 0 for vaddr allocations") removed the memslot parameters from vm_vaddr_alloc. It addressed all callers except one under lib/aarch64/, due to a race with commit e3db7579ef35 ("KVM: selftests: Add exception handling support for aarch64") Fix the vm_vaddr_alloc call in lib/aarch64/processor.c. Reported-by: Zenghui Yu <[email protected]> Signed-off-by: Ricardo Koller <[email protected]> Message-Id: <[email protected]> Reviewed-by: Eric Auger <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	kvm: debugfs: fix memory leak in kvm_create_vm_debugfs	Pavel Skripkin	1	-1/+1
	In commit bc9e9e672df9 ("KVM: debugfs: Reuse binary stats descriptors") loop for filling debugfs_stat_data was copy-pasted 2 times, but in the second loop pointers are saved over pointers allocated in the first loop. All this causes is a memory leak, fix it. Fixes: bc9e9e672df9 ("KVM: debugfs: Reuse binary stats descriptors") Signed-off-by: Pavel Skripkin <[email protected]> Reviewed-by: Jing Zhang <[email protected]> Message-Id: <[email protected]> Reviewed-by: Jing Zhang <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-07-15	dt-bindings: net: dsa: sja1105: Fix indentation warnings	Thierry Reding	1	-2/+2
	Some of the lines aren't properly indented, causing yamllint to warn about them: .../nxp,sja1105.yaml:70:17: [warning] wrong indentation: expected 18 but found 16 (indentation) Use the proper indentation to fix those warnings. Signed-off-by: Thierry Reding <[email protected]> Fixes: 070f5b701d559ae1 ("dt-bindings: net: dsa: sja1105: add SJA1110 bindings") Tested-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2021-07-15	docs/zh_CN: add a missing space character	Hu Haowen	1	-2/+2
	"LinusTorvalds" is not pretty. Replace it with "Linus Torvalds". Signed-off-by: Hu Haowen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2021-07-15	Documentation/features: Add THREAD_INFO_IN_TASK feature matrix	Ingo Molnar	1	-0/+32
	Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2021-07-15	Documentation/features: Update the ARCH_HAS_TICK_BROADCAST entry	Ingo Molnar	1	-1/+1
	Risc-V gained support recently. Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2021-07-15	LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes"	Nishanth Menon	1	-1/+1
	A couple of exotic quote characters came in with this license text; they can confuse software that is not expecting non-ASCII text. Switch to normal quotes here, with no changes to the actual license text. Reported-by: Rahul T R <[email protected]> Signed-off-by: Nishanth Menon <[email protected]> CC: Greg Kroah-Hartman <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Acked-by: Thorsten Leemhuis <[email protected]> Acked-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2021-07-15	KVM: PPC: Book3S HV P9: Fix guest TM support	Nicholas Piggin	1	-3/+22
	The conversion to C introduced several bugs in TM handling that can cause host crashes with TM bad thing interrupts. Mostly just simple typos or missed logic in the conversion that got through due to my not testing TM in the guest sufficiently. - Early TM emulation for the softpatch interrupt should be done if fake suspend mode is _not_ active. - Early TM emulation wants to return immediately to the guest so as to not doom transactions unnecessarily. - And if exiting from the guest, the host MSR should include the TM[S] bit if the guest was T/S, before it is treclaimed. After this fix, all the TM selftests pass when running on a P9 processor that implements TM with softpatch interrupt. Fixes: 89d35b2391015 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C") Reported-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]