Age | Commit message | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Features for 5.14
- new HW facilities for guests
- make inline assembly more robust with KASAN and co
|
|
Last-minute fix for MTE, making sure the pages are
flagged as MTE before they are released.
* kvm-arm64/mmu/mte:
KVM: arm64: Set the MTE tag bit before releasing the page
Signed-off-by: Marc Zyngier <[email protected]>
|
|
Some controllers, like qcom geni, need the parent device to be used for
DMA mapping, so add a dma_map_dev field and let drivers fill it in to be
used as the mapping device.
Signed-off-by: Vinod Koul <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
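A minimal sketch of the idea, assuming the field sits in struct spi_controller and is consulted in the core's DMA mapping path (placement assumed from the description, not verified against the tree; dma_map_sgtable() is the standard helper):

  /* prefer the driver-provided mapping device, else the controller's parent */
  struct device *map_dev = ctlr->dma_map_dev ?: ctlr->dev.parent;
  ret = dma_map_sgtable(map_dev, sgt, dir, 0);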
|
|
Convert the SPI bindings documentation for the Xilinx Zynq UltraScale+
MPSoC GQSPI to YAML.
Signed-off-by: Nobuhiro Iwamatsu <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
Both of these drivers use ioport_map(), so they need to
depend on HAS_IOPORT_MAP. Otherwise, they cannot be built
even with COMPILE_TEST on architectures without an ioport
implementation, such as ARCH=um.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
Some of my commits were sent with identities
Marek Behun <[email protected]>
Marek Behún <[email protected]>
while the correct one is
Marek Behún <[email protected]>
Put this into mailmap so that git shortlog prints all my commits under
one identity.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marek Behún <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
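The resulting entries would take the standard .mailmap "Proper Name <canonical> Commit Name <other>" form, one line per stray identity (a sketch; addresses appear here as redacted in this log):

  Marek Behún <[email protected]> Marek Behun <[email protected]>
  Marek Behún <[email protected]> Marek Behún <[email protected]>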
|
|
Fix my name to use diacritics, since MAINTAINERS supports it.
Fix my e-mail address in MAINTAINERS' marvell10g PHY driver description;
I accidentally put my other e-mail address there.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marek Behún <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Dan Carpenter reported the following:
The patch 0f87d9d30f21: "mm/page_alloc: add an array-based interface
to the bulk page allocator" from Apr 29, 2021, leads to the following
static checker warning:
mm/page_alloc.c:5338 __alloc_pages_bulk()
warn: potentially one past the end of array 'page_array[nr_populated]'
The problem can occur if an array is passed in that is fully populated.
That potentially ends up allocating a single page and storing it past
the end of the array. This patch returns 0 if the array is fully
populated.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
Signed-off-by: Mel Gorman <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
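A sketch of the guard this describes (placement and exact form assumed, not the verbatim patch):

  /* every slot already populated: nothing to allocate, and looping on
   * would store one page past the end of page_array */
  if (unlikely(nr_pages == nr_populated))
          return 0;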
|
|
In the event that somebody would call this with an already fully
populated page_array, the last loop iteration would do an access beyond
the end of page_array.
It's of course extremely unlikely that would ever be done, but this
triggers my internal static analyzer. Also, if it really is not
supposed to be invoked this way (i.e., with no NULL entries in
page_array), the nr_populated<nr_pages check could simply be removed
instead.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
Signed-off-by: Rasmus Villemoes <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently me_huge_page() temporarily unlocks the page to perform some
actions, then locks it again later. My testcase (which calls
hard-offline on some tail page in a hugetlb, then accesses the address
of the hugetlb range) showed that the page allocation code detects this
page lock on a buddy page and prints out a "BUG: Bad page state" message.
check_new_page_bad() does not consider a page with __PG_HWPOISON a bad
page, so this flag works as a kind of filter, but the filtering doesn't
work in this case because the "bad page" is not the actual hwpoisoned
page. So stop locking the page again. Actions to be taken depend on the
page type of the error, so page unlocking should be done in the
->action() callbacks. So let's make that the assumed convention and
change all existing callbacks accordingly.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When memory_failure() is called with MF_ACTION_REQUIRED on a page that
has already been hwpoisoned, memory_failure() can fail to send SIGBUS
to the affected process, which results in an infinite loop of MCEs.
Currently memory_failure() returns 0 if it's called for an already
hwpoisoned page, so the caller, kill_me_maybe(), can return without
sending SIGBUS to the current process. An action-required MCE is raised
when the current process accesses the broken memory, so no SIGBUS
means the current process continues to run and will soon access the
error page again, running into the MCE loop.
This issue can arise for example in the following scenarios:
- Two or more threads access the poisoned page concurrently. If
  local MCE is enabled, the MCE handler handles the MCE events
  independently. So there's a race among the MCE events, and the
  second or later threads fall into the situation in question.
- If there was a preceding memory error event and memory_failure() for
  that event failed to unmap the error page for some reason, the
  subsequent memory access to the error page triggers the MCE loop
  situation.
To fix the issue, make memory_failure() return an error code when the
error page has already been hwpoisoned. This allows the memory error
handler to control how it sends signals to userspace. And make sure
that any process touching a hwpoisoned page gets a SIGBUS, even in the
"already hwpoisoned" path of memory_failure(), as is done in the page
fault path.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aili Yao <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
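A sketch of the caller side (the x86 MCE kill_me_maybe() path), assuming the new error return; "addr" stands in for the faulting user address and is illustrative:

  ret = memory_failure(pfn, flags);
  if (ret == -EHWPOISON) {
          /* already-poisoned page: still post SIGBUS so the task cannot
           * keep re-consuming the same error in an MCE loop */
          force_sig_mceerr(BUS_MCEERR_AR, (void __user *)addr, PAGE_SHIFT);
  }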
|
|
Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5.
I wrote this patchset to materialize what I think is the currently
allowable solution mentioned in the previous discussion [1]. I simply
borrowed Tony's mutex patch and Aili's return code patch, then queued
another one to find the error virtual address in a best-effort manner. I
know that this is not a perfect solution, but it should work for
typical cases.
[1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/
This patch (of 2):
There can be races when multiple CPUs consume poison from the same page.
The first into memory_failure() atomically sets the HWPoison page flag
and begins hunting for tasks that map this page. Eventually it
invalidates those mappings and may send a SIGBUS to the affected tasks.
But while all that work is going on, other CPUs see a "success" return
code from memory_failure() and so they believe the error has been
handled and continue executing.
Fix by wrapping most of the internal parts of memory_failure() in a
mutex.
[[email protected]: make mf_mutex local to memory_failure()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Aili Yao <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
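A sketch of the serialization described, with mf_mutex local to memory_failure() per the fix note above; memory_failure_locked() is a hypothetical name for the existing body:

  int memory_failure(unsigned long pfn, int flags)
  {
          static DEFINE_MUTEX(mf_mutex);
          int res;

          mutex_lock(&mf_mutex);
          res = memory_failure_locked(pfn, flags); /* hypothetical: the old body */
          mutex_unlock(&mf_mutex);
          return res;
  }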
|
|
If more than one futex is placed on a shmem huge page, it can happen
that waking the second wakes the first instead, and leaves the second
waiting: the key's shared.pgoff is wrong.
When 3.11 commit 13d60f4b6ab5 ("futex: Take hugepages into account when
generating futex_key") was merged, the only shared huge pages came from
hugetlbfs, and the code added to deal with its exceptional page->index
was put into hugetlb source. Then that was missed when 4.8 added shmem
huge pages.
page_to_pgoff() is what others use for this nowadays: except that, as
currently written, it gives the right answer on hugetlbfs head, but
nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific
hugetlb_basepage_index() on PageHuge tails as well as on head.
Yes, it's unconventional to declare hugetlb_basepage_index() there in
pagemap.h, rather than in hugetlb.h; but I do not expect anything but
page_to_pgoff() ever to need it.
[[email protected]: give hugetlb_basepage_index() prototype the correct scope]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Reported-by: Neel Natu <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
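The shape of the fixed helper, per the description above (a sketch, not the verbatim diff):

  static inline pgoff_t page_to_pgoff(struct page *page)
  {
          /* hugetlbfs index math must apply to tail pages as well as head */
          if (unlikely(PageHuge(page)))
                  return hugetlb_basepage_index(page);
          return page_to_index(page);
  }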
|
|
kthread_cancel_delayed_work_sync()
The system might hang with the following backtrace:
schedule+0x80/0x100
schedule_timeout+0x48/0x138
wait_for_common+0xa4/0x134
wait_for_completion+0x1c/0x2c
kthread_flush_work+0x114/0x1cc
kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
kthread_cancel_delayed_work_sync+0x18/0x2c
xxxx_pm_notify+0xb0/0xd8
blocking_notifier_call_chain_robust+0x80/0x194
pm_notifier_call_chain_robust+0x28/0x4c
suspend_prepare+0x40/0x260
enter_state+0x80/0x3f4
pm_suspend+0x60/0xdc
state_store+0x108/0x144
kobj_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec
It is caused by the following race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync():
CPU0                                    CPU1

Context: Thread A                       Context: Thread B

kthread_mod_delayed_work()
  spin_lock()
  __kthread_cancel_work()
    spin_unlock()
    del_timer_sync()
                                        kthread_cancel_delayed_work_sync()
                                          spin_lock()
                                          __kthread_cancel_work()
                                            spin_unlock()
                                            del_timer_sync()
                                            spin_lock()
                                          work->canceling++
                                          spin_unlock
    spin_lock()
  queue_delayed_work()
    // dwork is put into the worker->delayed_work_list
  spin_unlock()
                                        kthread_flush_work()
                                          // flush_work is put at the tail of the dwork
                                          wait_for_completion()

Context: IRQ

kthread_delayed_work_timer_fn()
  spin_lock()
  list_del_init(&work->node);
  spin_unlock()
BANG: flush_work is no longer linked and will never get processed.
The problem is that kthread_mod_delayed_work() checks the
work->canceling flag before canceling the timer.
A simple solution is to (re)check work->canceling after
__kthread_cancel_work(). But then it is not clear what should be
returned when __kthread_cancel_work() removed the work from the queue
(list) and it can't queue it again with the new @delay.
The return value might be used for reference counting. The caller has
to know whether a new work has been queued or an existing one was
replaced.
The proper solution is that kthread_mod_delayed_work() will remove the
work from the queue (list) _only_ when work->canceling is not set. The
flag must be checked after the timer is stopped and the remaining
operations can be done under worker->lock.
Note that kthread_mod_delayed_work() could remove the timer and then
bail out. It is fine. The other canceling caller needs to cancel the
timer as well. The important thing is that the queue (list)
manipulation is done atomically under worker->lock.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9a6b06c8d9a2 ("kthread: allow to modify delayed kthread work")
Signed-off-by: Petr Mladek <[email protected]>
Reported-by: Martin Liu <[email protected]>
Cc: <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
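A sketch of the rule above, as a fragment inside kthread_mod_delayed_work() with worker->lock re-taken after the timer has been stopped (internal flow assumed from the description):

  if (work->canceling)
          goto out;       /* a canceling caller owns the work: leave the queue alone */
  ret = __kthread_cancel_work(work);
  __kthread_queue_delayed_work(worker, dwork, delay);
  out:
          raw_spin_unlock_irqrestore(&worker->lock, flags);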
|
|
Patch series "kthread_worker: Fix race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync()".
This patchset fixes the race between kthread_mod_delayed_work() and
kthread_cancel_delayed_work_sync() including proper return value
handling.
This patch (of 2):
Simple code refactoring as a preparation step for fixing a race between
kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().
It does not modify the existing behavior.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Cc: <[email protected]>
Cc: Martin Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings"),
__vmalloc_node_range was changed such that __get_vm_area_node was no
longer called with the requested/real size of the vmalloc allocation,
but rather with a rounded-up size.
This means that __get_vm_area_node called kasan_unpoison_vmalloc() with
a rounded-up size rather than the real size. This led to it allowing
access to too much memory and so missing vmalloc OOBs and failing the
kasan kunit tests.
Pass the real size and the desired shift into __get_vm_area_node. This
allows it to round up the size for the underlying allocators while still
unpoisoning the correct quantity of shadow memory.
Adjust the other call-sites to pass in PAGE_SHIFT for the shift value.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213335
Fixes: 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Daniel Axtens <[email protected]>
Tested-by: David Gow <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Acked-by: Andrey Konovalov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
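A sketch of the described calling convention (parameter names assumed):

  area = __get_vm_area_node(real_size, align, shift, flags,
                            start, end, node, gfp_mask, caller);
  /* ... inside __get_vm_area_node(): size = ALIGN(size, 1UL << shift)
   * for the allocators, while kasan_unpoison_vmalloc() still sees only
   * the caller's real size */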
|
|
The Create Secure Configuration Ultravisor Call does not support using
large pages for the virtual memory area. This is a hardware limitation.
This patch replaces the vzalloc call with an almost equivalent call to
the newly introduced vmalloc_no_huge function, which guarantees that
only small pages will be used for the backing.
The new call will not clear the allocated memory, but that has never
been an actual requirement.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 121e6f3258fe3 ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Acked-by: Nicholas Piggin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: add vmalloc_no_huge and use it", v4.
Add vmalloc_no_huge() and export it, so modules can allocate memory with
small pages.
Use the newly added vmalloc_no_huge() in KVM on s390 to get around a
hardware limitation.
This patch (of 2):
Commit 121e6f3258fe3 ("mm/vmalloc: hugepage vmalloc mappings") added
support for hugepage vmalloc mappings; it also added the flag
VM_NO_HUGE_VMAP for __vmalloc_node_range to request that the allocation
be performed with 0-order non-huge pages.
This flag is not accessible when calling vmalloc(); the only option is
to call __vmalloc_node_range() directly, which is not exported.
This means that a module can't vmalloc memory with small pages.
Case in point: KVM on s390x needs to vmalloc a large area, and it needs
to be mapped with non-huge pages, because of a hardware limitation.
This patch adds the function vmalloc_no_huge, which works like vmalloc,
but it is guaranteed to always back the mapping using small pages. This
new function is exported, therefore it is usable by modules.
[[email protected]: whitespace fixes, per Christoph]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 121e6f3258fe3 ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Acked-by: Nicholas Piggin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Cornelia Huck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
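The entry describes the new helper precisely enough to sketch it (close to, but not claimed to be, the exact upstream body):

  /* like vmalloc(), but force order-0 pages via VM_NO_HUGE_VMAP */
  void *vmalloc_no_huge(unsigned long size)
  {
          return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
                          GFP_KERNEL, PAGE_KERNEL, VM_NO_HUGE_VMAP,
                          NUMA_NO_NODE, __builtin_return_address(0));
  }
  EXPORT_SYMBOL(vmalloc_no_huge);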
|
|
My local syzbot instance hit a memory leak in nilfs2. The problem was a
missing kobject_put() in nilfs_sysfs_delete_device_group().
kobject_del() does not call kobject_cleanup() for the passed kobject,
which leads to leaking the duped kobject name if kobject_put() is not
called.
Fail log:
  BUG: memory leak
  unreferenced object 0xffff8880596171e0 (size 8):
    comm "syz-executor379", pid 8381, jiffies 4294980258 (age 21.100s)
    hex dump (first 8 bytes):
      6c 6f 6f 70 30 00 00 00  loop0...
    backtrace:
      kstrdup+0x36/0x70 mm/util.c:60
      kstrdup_const+0x53/0x80 mm/util.c:83
      kvasprintf_const+0x108/0x190 lib/kasprintf.c:48
      kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289
      kobject_add_varg lib/kobject.c:384 [inline]
      kobject_init_and_add+0xc9/0x160 lib/kobject.c:473
      nilfs_sysfs_create_device_group+0x150/0x800 fs/nilfs2/sysfs.c:999
      init_nilfs+0xe26/0x12b0 fs/nilfs2/the_nilfs.c:637
Link: https://lkml.kernel.org/r/[email protected]
Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group")
Signed-off-by: Pavel Skripkin <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]>
Cc: Michael L. Semon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
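A sketch of the fix in nilfs_sysfs_delete_device_group(): kobject_del() only unlinks the kobject, and the final kobject_put() is what drops the reference and frees the duped name:

  kobject_del(&nilfs->ns_dev_kobj);
  kobject_put(&nilfs->ns_dev_kobj);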
|
|
Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
ptlock in the PVMW_SYNC case? That too might have been responsible for
BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
I've never seen any.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/linux-mm/[email protected]/
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Tested-by: Wang Yugui <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Running certain tests with a DEBUG_VM kernel would crash within hours,
on the total_mapcount BUG() in split_huge_page_to_list(), while trying
to free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which,
on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
Crash dumps showed two tail pages of a shmem huge page remained mapped
by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
of a long unmapped range; and no page table had yet been allocated for
the head of the huge page to be mapped into.
Although designed to handle these odd misaligned huge-page-mapped-by-pte
cases, page_vma_mapped_walk() falls short by returning false prematurely
when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
are cases when a huge page may span the boundary, with ptes present in
the next.
Restructure page_vma_mapped_walk() as a loop to continue in these cases,
while keeping its layout much as before. Add a step_forward() helper to
advance pvmw->address across those boundaries: originally I tried to use
mm's standard p?d_addr_end() macros, but hit the same crash 512 times
less often: because of the way redundant levels are folded together, but
folded differently in different configurations, it was just too
difficult to use them correctly; and step_forward() is simpler anyway.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
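A sketch of the helper named above (body assumed from the description): advance pvmw->address and round it up to the next boundary of the given size:

  static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
  {
          pvmw->address = (pvmw->address + size) & ~(size - 1);
          if (!pvmw->address)
                  pvmw->address = ULONG_MAX;      /* wrapped: terminate the walk */
  }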
|
|
page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the
start, rather than later at next_pte.
It's a little unnecessary overhead on the first call, but makes for a
simpler loop in the following commit.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
and use "goto this_pte", in place of the "while (1)" loop at the end.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
page_vma_mapped_walk() cleanup: add a level of indentation to much of
the body, making no functional change in this commit, but reducing the
later diff when this is all converted to a loop.
[[email protected]: page_vma_mapped_walk(): add a level of indentation fix]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
page_vma_mapped_walk() cleanup: adjust the test for crossing page table
boundary - I believe pvmw->address is always page-aligned, but nothing
else here assumed that; and remember to reset pvmw->pte to NULL after
unmapping the page table, though I never saw any bug from that.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
follow the same "return not_found, return not_found, return true"
pattern as the block above it (note: returning not_found there is never
premature, since existence or prior existence of huge pmd guarantees
good alignment).
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
use it in subsequent tests, instead of repeatedly dereferencing pointer.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of
the way at the start, so no need to worry about it later.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes".
I've marked all of these for stable: many are merely cleanups, but I
think they are much better before the main fix than after.
This patch (of 11):
page_vma_mapped_walk() cleanup: sometimes the local copy of pvmw->page
was used, sometimes pvmw->page itself: use the local copy "page"
throughout.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The code "dev_err(&pdev->dev, "no IRQ resource found\n");" is
redundant because platform_get_irq() already prints an error.
Signed-off-by: gushengxian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
This better reflects the purpose of this variable on AMD, since
on AMD the AVIC's memory slot can be enabled and disabled dynamically.
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When the kvm_intel module unloads, the module parameter
'allow_smaller_maxphyaddr' is not cleared because the backing variable is
defined in the kvm module. As a result, if the module parameter's state
was set before kvm_intel unloaded, it will also be set when it reloads.
Explicitly clear the state in vmx_exit() to prevent this from happening.
Signed-off-by: Aaron Lewis <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
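A sketch of the exit path described, with an explicit reset because the backing variable lives in kvm.ko and outlives kvm_intel.ko:

  static void __exit vmx_exit(void)
  {
          allow_smaller_maxphyaddr = false;
          /* ... existing teardown ... */
  }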
|
|
This test exercises the feature KVM_CAP_EXIT_ON_EMULATION_FAILURE. When
enabled, errors in the in-kernel instruction emulator are forwarded to
userspace with the instruction bytes stored in the exit struct for
KVM_EXIT_INTERNAL_ERROR. So, when the guest attempts to execute an
'flds' instruction, which KVM is unable to emulate, KVM sends the
instruction to userspace to handle instead of failing.
For this test to work properly the module parameter
'allow_smaller_maxphyaddr' has to be set.
Signed-off-by: Aaron Lewis <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
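A sketch of how a selftest might turn the capability on (selftest helper usage assumed from the framework of that era):

  struct kvm_enable_cap cap = {
          .cap = KVM_CAP_EXIT_ON_EMULATION_FAILURE,
          .args[0] = 1,
  };
  vm_enable_cap(vm, &cap);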
|
|
Add a fallback mechanism to the in-kernel instruction emulator that
gives userspace the opportunity to process an instruction the emulator
was unable to handle. When the in-kernel instruction emulator fails to
process an instruction, it will either inject a #UD into the guest or
exit to userspace with exit reason KVM_INTERNAL_ERROR, because it does
not know how to proceed in an appropriate manner. This feature lets
userspace get involved to see if it can figure out a better path
forward.
Signed-off-by: Aaron Lewis <[email protected]>
Reviewed-by: David Edmondson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Let the guest use 1g hugepages if TDP is enabled and the host supports
GBPAGES; KVM can't actively prevent the guest from using 1g pages in this
case, since they can't be disabled in the hardware page walker. While
injecting a page fault if a bogus 1g page is encountered during a
software page walk is perfectly reasonable, since KVM is simply honoring
userspace's vCPU model, doing so arguably doesn't provide any meaningful
value, and at worst will be horribly confusing as the guest will see
inconsistent behavior and seemingly spurious page faults.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
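A sketch of the rule described (helper name assumed): with TDP the host capability governs, since the hardware walker ignores guest CPUID:

  static bool guest_can_use_gbpages(struct kvm_vcpu *vcpu)
  {
          return tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
                               guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
  }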
|
|
Use the current MMU instead of vCPU state to query CR4.SMEP when handling
a page fault. In the nested NPT case, the current CR4.SMEP reflects L2,
whereas the page fault is shadowing L1's NPT, which uses L1's hCR4.
Practically speaking, this is a nop, as NPT walks are always user faults,
i.e. this code will never be reached, but fix it up for consistency.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Use the current MMU instead of vCPU state to query CR0.WP when handling
a page fault. In the nested NPT case, the current CR0.WP reflects L2,
whereas the page fault is shadowing L1's NPT. Practically speaking, this
is a nop, as NPT walks are always user faults, but fix it up for
consistency.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Drop the extra reset of shadow_zero_bits in the nested NPT flow now
that shadow_mmu_init_context computes the correct level for nested NPT.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
best confusing. Per the comment:
Can have large pages at levels 2..last_nonleaf_level-1.
the intent of the variable would appear to be to track what levels can
_legally_ have large pages, but that intent doesn't align with reality.
The computed value will be wrong for 5-level paging, or if 1gb pages are
not supported.
The flawed code is not a problem in practice, because except for 32-bit
PSE paging, bit 7 is reserved if large pages aren't supported at the
level. Take advantage of this invariant and simply omit the level magic
math for 64-bit page tables (including PAE).
For 32-bit paging (non-PAE), the adjustments are needed purely because
bit 7 is ignored if PSE=0. Retain that logic as is, but make
is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
PAE and EPT paging. In the spirit of avoiding branches, bump the "last
nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.
Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
not PTEs and will blow up all the other gpte helpers.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Expand the comments for the MMU roles. The interactions with gfn_track
and PGD reuse in particular are hairy.
Regarding PGD reuse, add comments in the nested virtualization flows to
call out why kvm_init_mmu() is unconditionally called even when nested
TDP is used.
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Replace make_spte()'s WARN on a collision with the magic MMIO value with
a generic WARN on reserved bits being set (including EPT's reserved WX
combination). Warning on any reserved bits covers MMIO, A/D tracking
bits with PAE paging, and in theory any future goofs that are introduced.
Opportunistically convert to ONCE behavior to avoid spamming the kernel
log, odds are very good that if KVM screws up one SPTE, it will botch all
SPTEs for the same MMU.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
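A sketch of the generic WARN described above; is_rsvd_spte() is the checking helper extracted in the entry below, and the exact call site in make_spte() is assumed:

  WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
            "spte = 0x%llx, level = %d", spte, level);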
|
|
Extract the reserved SPTE check and print helpers in get_mmio_spte() to
new helpers so that KVM can also WARN on reserved badness when making a
SPTE.
Tag the checking helper with __always_inline to improve the probability
of the compiler generating optimal code for the checking loop, e.g. gcc
appears to avoid using %rbp when the helper is tagged with a vanilla
"inline".
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Use the MMU's role instead of vCPU state or role_regs to determine the
PTTYPE, i.e. which helpers to wire up.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Skip paging32E_init_context() and paging64_init_context_common() and go
directly to paging64_init_context() (formerly the common version), now
that the relevant flows don't need to distinguish between 64-bit PAE and
32-bit PAE for other reasons.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add a helper to calculate the level for non-EPT page tables from the
MMU's role_regs.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Consolidate MMU guest metadata updates into a common helper for TDP,
shadow, and nested MMUs.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Don't bother updating the bitmasks and last-leaf information if paging is
disabled as the metadata will never be used.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|