aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-07-26selftests/mm: skip test for non-LPA2 and non-LVA systemsDev Jain1-1/+15
Post my improvement of the test in e4a4ba415419 ("selftests/mm: va_high_addr_switch: dynamically initialize testcases to enable LPA2 testing"): The test begins to fail on 4k and 16k pages, on non-LPA2 systems. To reduce noise in the CI systems, let us skip the test when higher address space is not implemented. Link: https://lkml.kernel.org/r/[email protected] Fixes: e4a4ba415419 ("selftests/mm: va_high_addr_switch: dynamically initialize testcases to enable LPA2 testing") Signed-off-by: Dev Jain <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Mark Brown <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-26mm/page_alloc: fix pcp->count race between drain_pages_zone() vs ↵Li Zhijian1-7/+11
__rmqueue_pcplist() It's expected that no page should be left in pcp_list after calling zone_pcp_disable() in offline_pages(). Previously, it's observed that offline_pages() gets stuck [1] due to some pages remaining in pcp_list. Cause: There is a race condition between drain_pages_zone() and __rmqueue_pcplist() involving the pcp->count variable. See below scenario: CPU0 CPU1 ---------------- --------------- spin_lock(&pcp->lock); __rmqueue_pcplist() { zone_pcp_disable() { /* list is empty */ if (list_empty(list)) { /* add pages to pcp_list */ alloced = rmqueue_bulk() mutex_lock(&pcp_batch_high_lock) ... __drain_all_pages() { drain_pages_zone() { /* read pcp->count, it's 0 here */ count = READ_ONCE(pcp->count) /* 0 means nothing to drain */ /* update pcp->count */ pcp->count += alloced << order; ... ... spin_unlock(&pcp->lock); In this case, after calling zone_pcp_disable() though, there are still some pages in pcp_list. And these pages in pcp_list are neither movable nor isolated, offline_pages() gets stuck as a result. Solution: Expand the scope of the pcp->lock to also protect pcp->count in drain_pages_zone(), to ensure no pages are left in the pcp list after zone_pcp_disable() [1] https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock") Signed-off-by: Li Zhijian <[email protected]> Reported-by: Yao Xingtao <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-26mm: memcg: add cacheline padding after lruvec in mem_cgroup_per_nodeRoman Gushchin1-0/+1
Oliver Sand reported a performance regression caused by commit 98c9daf5ae6b ("mm: memcg: guard memcg1-specific members of struct mem_cgroup_per_node"), which puts some fields of the mem_cgroup_per_node structure under the CONFIG_MEMCG_V1 config option. Apparently it causes a false cache sharing between lruvec and lru_zone_size members of the structure. Fix it by adding an explicit padding after the lruvec member. Even though the padding is not required with CONFIG_MEMCG_V1 set, it seems like the introduced memory overhead is not significant enough to warrant another divergence in the mem_cgroup_per_node layout, so the padding is added unconditionally. Link: https://lkml.kernel.org/r/[email protected] Fixes: 98c9daf5ae6b ("mm: memcg: guard memcg1-specific members of struct mem_cgroup_per_node") Signed-off-by: Roman Gushchin <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Tested-by: Oliver Sang <[email protected]> Acked-by: Shakeel Butt <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-26alloc_tag: outline and export free_reserved_page()Suren Baghdasaryan2-15/+18
Outline and export free_reserved_page() because modules use it and it in turn uses page_ext_{get|put} which should not be exported. The same result could be obtained by outlining {get|put}_page_tag_ref() but that would have higher performance impact as these functions are used in more performance critical paths. Link: https://lkml.kernel.org/r/[email protected] Fixes: dcfe378c81f7 ("lib: introduce support for page allocation tagging") Signed-off-by: Suren Baghdasaryan <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Suggested-by: Christoph Hellwig <[email protected]> Suggested-by: Vlastimil Babka <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Sourav Panda <[email protected]> Cc: <[email protected]> [6.10] Signed-off-by: Andrew Morton <[email protected]>
2024-07-26decompress_bunzip2: fix rare decompression failureRoss Lagerwall1-1/+2
The decompression code parses a huffman tree and counts the number of symbols for a given bit length. In rare cases, there may be >= 256 symbols with a given bit length, causing the unsigned char to overflow. This causes a decompression failure later when the code tries and fails to find the bit length for a given symbol. Since the maximum number of symbols is 258, use unsigned short instead. Link: https://lkml.kernel.org/r/[email protected] Fixes: bc22c17e12c1 ("bzip2/lzma: library support for gzip, bzip2 and lzma decompression") Signed-off-by: Ross Lagerwall <[email protected]> Cc: Alain Knaff <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-26mm/huge_memory: avoid PMD-size page cache if neededGavin Shan2-5/+19
xarray can't support arbitrary page cache size. the largest and supported page cache size is defined as MAX_PAGECACHE_ORDER by commit 099d90642a71 ("mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray"). However, it's possible to have 512MB page cache in the huge memory's collapsing path on ARM64 system whose base page size is 64KB. 512MB page cache is breaking the limitation and a warning is raised when the xarray entry is split as shown in the following example. [root@dhcp-10-26-1-207 ~]# cat /proc/1/smaps | grep KernelPageSize KernelPageSize: 64 kB [root@dhcp-10-26-1-207 ~]# cat /tmp/test.c : int main(int argc, char **argv) { const char *filename = TEST_XFS_FILENAME; int fd = 0; void *buf = (void *)-1, *p; int pgsize = getpagesize(); int ret = 0; if (pgsize != 0x10000) { fprintf(stdout, "System with 64KB base page size is required!\n"); return -EPERM; } system("echo 0 > /sys/devices/virtual/bdi/253:0/read_ahead_kb"); system("echo 1 > /proc/sys/vm/drop_caches"); /* Open the xfs file */ fd = open(filename, O_RDONLY); assert(fd > 0); /* Create VMA */ buf = mmap(NULL, TEST_MEM_SIZE, PROT_READ, MAP_SHARED, fd, 0); assert(buf != (void *)-1); fprintf(stdout, "mapped buffer at 0x%p\n", buf); /* Populate VMA */ ret = madvise(buf, TEST_MEM_SIZE, MADV_NOHUGEPAGE); assert(ret == 0); ret = madvise(buf, TEST_MEM_SIZE, MADV_POPULATE_READ); assert(ret == 0); /* Collapse VMA */ ret = madvise(buf, TEST_MEM_SIZE, MADV_HUGEPAGE); assert(ret == 0); ret = madvise(buf, TEST_MEM_SIZE, MADV_COLLAPSE); if (ret) { fprintf(stdout, "Error %d to madvise(MADV_COLLAPSE)\n", errno); goto out; } /* Split xarray entry. Write permission is needed */ munmap(buf, TEST_MEM_SIZE); buf = (void *)-1; close(fd); fd = open(filename, O_RDWR); assert(fd > 0); fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, TEST_MEM_SIZE - pgsize, pgsize); out: if (buf != (void *)-1) munmap(buf, TEST_MEM_SIZE); if (fd > 0) close(fd); return ret; } [root@dhcp-10-26-1-207 ~]# gcc /tmp/test.c -o /tmp/test [root@dhcp-10-26-1-207 ~]# /tmp/test ------------[ cut here ]------------ WARNING: CPU: 25 PID: 7560 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128 Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib \ nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct \ nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 \ ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm fuse \ xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 virtio_net \ sha1_ce net_failover virtio_blk virtio_console failover dimlib virtio_mmio CPU: 25 PID: 7560 Comm: test Kdump: loaded Not tainted 6.10.0-rc7-gavin+ #9 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024 pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : xas_split_alloc+0xf8/0x128 lr : split_huge_page_to_list_to_order+0x1c4/0x780 sp : ffff8000ac32f660 x29: ffff8000ac32f660 x28: ffff0000e0969eb0 x27: ffff8000ac32f6c0 x26: 0000000000000c40 x25: ffff0000e0969eb0 x24: 000000000000000d x23: ffff8000ac32f6c0 x22: ffffffdfc0700000 x21: 0000000000000000 x20: 0000000000000000 x19: ffffffdfc0700000 x18: 0000000000000000 x17: 0000000000000000 x16: ffffd5f3708ffc70 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: ffffffffffffffc0 x10: 0000000000000040 x9 : ffffd5f3708e692c x8 : 0000000000000003 x7 : 0000000000000000 x6 : ffff0000e0969eb8 x5 : ffffd5f37289e378 x4 : 0000000000000000 x3 : 0000000000000c40 x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000 Call trace: xas_split_alloc+0xf8/0x128 split_huge_page_to_list_to_order+0x1c4/0x780 truncate_inode_partial_folio+0xdc/0x160 truncate_inode_pages_range+0x1b4/0x4a8 truncate_pagecache_range+0x84/0xa0 xfs_flush_unmap_range+0x70/0x90 [xfs] xfs_file_fallocate+0xfc/0x4d8 [xfs] vfs_fallocate+0x124/0x2f0 ksys_fallocate+0x4c/0xa0 __arm64_sys_fallocate+0x24/0x38 invoke_syscall.constprop.0+0x7c/0xd8 do_el0_svc+0xb4/0xd0 el0_svc+0x44/0x1d8 el0t_64_sync_handler+0x134/0x150 el0t_64_sync+0x17c/0x180 Fix it by correcting the supported page cache orders, different sets for DAX and other files. With it corrected, 512MB page cache becomes disallowed on all non-DAX files on ARM64 system where the base page size is 64KB. After this patch is applied, the test program fails with error -EINVAL returned from __thp_vma_allowable_orders() and the madvise() system call to collapse the page caches. Link: https://lkml.kernel.org/r/[email protected] Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Signed-off-by: Gavin Shan <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Acked-by: Zi Yan <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Barry Song <[email protected]> Cc: Don Dutile <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: William Kucharski <[email protected]> Cc: <[email protected]> [5.17+] Signed-off-by: Andrew Morton <[email protected]>
2024-07-26mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit ↵Yang Shi1-1/+1
machines Yves-Alexis Perez reported commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment on 32 bit") didn't work for x86_32 [1]. It is because x86_32 uses CONFIG_X86_32 instead of CONFIG_32BIT. !CONFIG_64BIT should cover all 32 bit machines. [1] https://lore.kernel.org/linux-mm/CAHbLzkr1LwH3pcTgM+aGQ31ip2bKqiqEQ8=FQB+t2c3dhNKNHA@mail.gmail.com/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment on 32 bit") Signed-off-by: Yang Shi <[email protected]> Reported-by: Yves-Alexis Perez <[email protected]> Tested-by: Yves-Alexis Perez <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Salvatore Bonaccorso <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: <[email protected]> [6.8+] Signed-off-by: Andrew Morton <[email protected]>
2024-07-26mm: fix old/young bit handling in the faulting pathRam Tummala1-1/+1
Commit 3bd786f76de2 ("mm: convert do_set_pte() to set_pte_range()") replaced do_set_pte() with set_pte_range() and that introduced a regression in the following faulting path of non-anonymous vmas which caused the PTE for the faulting address to be marked as old instead of young. handle_pte_fault() do_pte_missing() do_fault() do_read_fault() || do_cow_fault() || do_shared_fault() finish_fault() set_pte_range() The polarity of prefault calculation is incorrect. This leads to prefault being incorrectly set for the faulting address. The following check will incorrectly mark the PTE old rather than young. On some architectures this will cause a double fault to mark it young when the access is retried. if (prefault && arch_wants_old_prefaulted_pte()) entry = pte_mkold(entry); On a subsequent fault on the same address, the faulting path will see a non NULL vmf->pte and instead of reaching the do_pte_missing() path, PTE will then be correctly marked young in handle_pte_fault() itself. Due to this bug, performance degradation in the fault handling path will be observed due to unnecessary double faulting. Link: https://lkml.kernel.org/r/[email protected] Fixes: 3bd786f76de2 ("mm: convert do_set_pte() to set_pte_range()") Signed-off-by: Ram Tummala <[email protected]> Reviewed-by: Yin Fengwei <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Yin Fengwei <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-26dt-bindings: arm: update James Clark's email addressJames Clark2-2/+2
My new address is [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Cc: Bjorn Andersson <[email protected]> Cc: Conor Dooley <[email protected]> Cc: David S. Miller <[email protected]> Cc: Geliang Tang <[email protected]> Cc: Hao Zhang <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Kees Cook <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Mao Jinlong <[email protected]> Cc: Matthieu Baerts <[email protected]> Cc: Matt Ranostay <[email protected]> Cc: Mike Leach <[email protected]> Cc: Oleksij Rempel <[email protected]> Cc: Rob Herring (Arm) <[email protected]> Cc: Suzuki K Poulose <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-26MAINTAINERS: mailmap: update James Clark's email addressJames Clark2-2/+3
My new address is [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Cc: Bjorn Andersson <[email protected]> Cc: Conor Dooley <[email protected]> Cc: David S. Miller <[email protected]> Cc: Geliang Tang <[email protected]> Cc: Hao Zhang <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Kees Cook <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Mao Jinlong <[email protected]> Cc: Matthieu Baerts <[email protected]> Cc: Matt Ranostay <[email protected]> Cc: Mike Leach <[email protected]> Cc: Oleksij Rempel <[email protected]> Cc: Rob Herring (Arm) <[email protected]> Cc: Suzuki K Poulose <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-26irqchip/loongarch-cpu: Fix return value of lpic_gsi_to_irq()Huacai Chen1-2/+4
lpic_gsi_to_irq() should return a valid Linux interrupt number if acpi_register_gsi() succeeds, and return 0 otherwise. But lpic_gsi_to_irq() converts a negative return value of acpi_register_gsi() to a positive value silently. Convert the return value explicitly. Fixes: e8bba72b396c ("irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch") Reported-by: Miao Wang <[email protected]> Signed-off-by: Huacai Chen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Jiaxun Yang <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-07-26dt-bindings: iio: adc: ad7192: Fix 'single-channel' constraintsRob Herring (Arm)1-3/+2
The 'single-channel' property is an uint32, not an array, so 'items' is an incorrect constraint. This didn't matter until dtschema recently changed how properties are decoded. This results in this warning: Documentation/devicetree/bindings/iio/adc/adi,ad7192.example.dtb: adc@0: \ channel@1:single-channel: 1 is not of type 'array' Fixes: caf7b7632b8d ("dt-bindings: iio: adc: ad7192: Add AD7194 support") Reviewed-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring (Arm) <[email protected]>
2024-07-26KVM: guest_memfd: abstract how prepared folios are recordedPaolo Bonzini1-13/+20
Right now, large folios are not supported in guest_memfd, and therefore the order used by kvm_gmem_populate() is always 0. In this scenario, using the up-to-date bit to track prepared-ness is nice and easy because we have one bit available per page. In the future, however, we might have large pages that are partially populated; for example, in the case of SEV-SNP, if a large page has both shared and private areas inside, it is necessary to populate it at a granularity that is smaller than that of the guest_memfd's backing store. In that case we will have to track preparedness at a 4K level, probably as a bitmap. In preparation for that, do not use explicitly folio_test_uptodate() and folio_mark_uptodate(). Return the state of the page directly from __kvm_gmem_get_pfn(), so that it is expected to apply to 2^N pages with N=*max_order. The function to mark a range as prepared for now takes just a folio, but is expected to take also an index and order (or something like that) when large pages are introduced. Thanks to Michael Roth for pointing out the issue with large pages. Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfnsPaolo Bonzini3-7/+14
This check is currently performed by sev_gmem_post_populate(), but it applies to all callers of kvm_gmem_populate(): the point of the function is that the memory is being encrypted and some work has to be done on all the gfns in order to encrypt them. Therefore, check the KVM_MEMORY_ATTRIBUTE_PRIVATE attribute prior to invoking the callback, and stop the operation if a shared page is encountered. Because CONFIG_KVM_PRIVATE_MEM in principle does not require attributes, this makes kvm_gmem_populate() depend on CONFIG_KVM_GENERIC_PRIVATE_MEM (which does require them). Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: extend kvm_range_has_memory_attributes() to check subset of attributesPaolo Bonzini3-8/+9
While currently there is no other attribute than KVM_MEMORY_ATTRIBUTE_PRIVATE, KVM code such as kvm_mem_is_private() is written to expect their existence. Allow using kvm_range_has_memory_attributes() as a multi-page version of kvm_mem_is_private(), without it breaking later when more attributes are introduced. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()Paolo Bonzini1-22/+20
Use a guard to simplify early returns, and add two more easy shortcuts. If the requested attributes are invalid, the attributes xarray will never show them as set. And if testing a single page, kvm_get_memory_attributes() is more efficient. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: guest_memfd: move check for already-populated page to common codePaolo Bonzini2-1/+8
Do not allow populating the same page twice with startup data. In the case of SEV-SNP, for example, the firmware does not allow it anyway, since the launch-update operation is only possible on pages that are still shared in the RMP. Even if it worked, kvm_gmem_populate()'s callback is meant to have side effects such as updating launch measurements, and updating the same page twice is unlikely to have the desired results. Races between calls to the ioctl are not possible because kvm_gmem_populate() holds slots_lock and the VM should not be running. But again, even if this worked on other confidential computing technology, it doesn't matter to guest_memfd.c whether this is something fishy such as missing synchronization in userspace, or rather something intentional. One of the racers wins, and the page is initialized by either kvm_gmem_prepare_folio() or kvm_gmem_populate(). Anyway, out of paranoia, adjust sev_gmem_post_populate() anyway to use the same errno that kvm_gmem_populate() is using. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: remove kvm_arch_gmem_prepare_needed()Paolo Bonzini3-16/+3
It is enough to return 0 if a guest need not do any preparation. This is in fact how sev_gmem_prepare() works for non-SNP guests, and it extends naturally to Intel hosts: the x86 callback for gmem_prepare is optional and returns 0 if not defined. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvmPaolo Bonzini1-30/+19
This is now possible because preparation is done by kvm_gmem_get_pfn() instead of fallocate(). In practice this is not a limitation, because even though guest_memfd can be bound to multiple struct kvm, for hardware implementations of confidential computing only one guest (identified by an ASID on SEV-SNP, or an HKID on TDX) will be able to access it. In the case of intra-host migration (not implemented yet for SEV-SNP, but we can use SEV-ES as an idea of how it will work), the new struct kvm inherits the same ASID and preparation need not be repeated. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed ↵Paolo Bonzini1-44/+66
to the guest Initializing the contents of the folio on fallocate() is unnecessarily restrictive. It means that the page is registered with the firmware and then it cannot be touched anymore. In particular, this loses the possibility of using fallocate() to pre-allocate the page for SEV-SNP guests, because kvm_arch_gmem_prepare() then fails. It's only when the guest actually accesses the page (and therefore kvm_gmem_get_pfn() is called) that the page must be cleared from any stale host data and registered with the firmware. The up-to-date flag is clear if this has to be done (i.e. it is the first access and kvm_gmem_populate() has not been called). All in all, there are enough differences between kvm_gmem_get_pfn() and kvm_gmem_populate(), that it's better to separate the two flows completely. Extract the bulk of kvm_gmem_get_folio(), which take a folio and end up setting its up-to-date flag, to a new function kvm_gmem_prepare_folio(); these are now done only by the non-__-prefixed kvm_gmem_get_pfn(). As a bonus, __kvm_gmem_get_pfn() loses its ugly "bool prepare" argument. One difference is that fallocate(PUNCH_HOLE) can now race with a page fault. Potentially this causes a page to be prepared and into the filemap even after fallocate(PUNCH_HOLE). This is harmless, as it can be fixed by another hole punching operation, and can be avoided by clearing the private-page attribute prior to invoking fallocate(PUNCH_HOLE). This way, the page fault will cause an exit to user space. The previous semantics, where fallocate() could be used to prepare the pages in advance of running the guest, can be accessed with KVM_PRE_FAULT_MEMORY. For now, accessing a page in one VM will attempt to call kvm_arch_gmem_prepare() in all of those that have bound the guest_memfd. Cleaning this up is left to a separate patch. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfnPaolo Bonzini1-1/+4
Allow testing the up-to-date flag in the caller without taking the lock again. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*Paolo Bonzini5-11/+11
Add "ARCH" to the symbols; shortly, the "prepare" phase will include both the arch-independent step to clear out contents left in the page by the host, and the arch-dependent step enabled by CONFIG_HAVE_KVM_GMEM_PREPARE. For consistency do the same for CONFIG_HAVE_KVM_GMEM_INVALIDATE as well. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: guest_memfd: do not go through struct pagePaolo Bonzini1-10/+17
We have a perfectly usable folio, use it to retrieve the pfn and order. All that's needed is a version of folio_file_page that returns a pfn. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparationPaolo Bonzini1-2/+4
The up-to-date flag as is now is not too useful; it tells guest_memfd not to overwrite the contents of a folio, but it doesn't say that the page is ready to be mapped into the guest. For encrypted guests, mapping a private page requires that the "preparation" phase has succeeded, and at the same time the same page cannot be prepared twice. So, ensure that folio_mark_uptodate() is only called on a prepared page. If kvm_gmem_prepare_folio() or the post_populate callback fail, the folio will not be marked up-to-date; it's not a problem to call clear_highpage() again on such a page prior to the next preparation attempt. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()Paolo Bonzini1-17/+20
Right now this is simply more consistent and avoids use of pfn_to_page() and put_page(). It will be put to more use in upcoming patches, to ensure that the up-to-date flag is set at the very end of both the kvm_gmem_get_pfn() and kvm_gmem_populate() flows. Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26KVM: x86: disallow pre-fault for SNP VMs before initializationPaolo Bonzini6-0/+22
KVM_PRE_FAULT_MEMORY for an SNP guest can race with sev_gmem_post_populate() in bad ways. The following sequence for instance can potentially trigger an RMP fault: thread A, sev_gmem_post_populate: called thread B, sev_gmem_prepare: places below 'pfn' in a private state in RMP thread A, sev_gmem_post_populate: *vaddr = kmap_local_pfn(pfn + i); thread A, sev_gmem_post_populate: copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE); RMP #PF Fix this by only allowing KVM_PRE_FAULT_MEMORY to run after a guest's initial private memory contents have been finalized via KVM_SEV_SNP_LAUNCH_FINISH. Beyond fixing this issue, it just sort of makes sense to enforce this, since the KVM_PRE_FAULT_MEMORY documentation states: "KVM maps memory as if the vCPU generated a stage-2 read page fault" which sort of implies we should be acting on the same guest state that a vCPU would see post-launch after the initial guest memory is all set up. Co-developed-by: Michael Roth <[email protected]> Signed-off-by: Michael Roth <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26tools/power turbostat: version 2024.07.26Len Brown1-53/+52
Release 2024.07.26: Enable turbostat extensions to add both perf and PMT (Intel Platform Monitoring Technology) counters from the cmdline. Demonstrate PMT access with built-in support for Meteor Lake's Die%c6 counter. This commit: Clean up white-space nits introduced since version 2024.05.10 Signed-off-by: Len Brown <[email protected]>
2024-07-26tools/power turbostat: Include umask=%x in perf counter's configPatryk Wlazlyn1-10/+50
Some counters, like cpu/cache-misses/, expose and require umask=%x parameter alongside event=%x in the sysfs perf counter's event file. This change make sure we parse and use it when opening user added counters. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-07-26tools/power turbostat: Document PMT in turbostat.8Patryk Wlazlyn1-0/+65
Add a general description of the user interface for adding PMT counters with the new --add pmt,... option. Provide a complete example for requesting two counters. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-07-26tools/power turbostat: Add MTL's PMT DC6 builtin counterPatryk Wlazlyn1-1/+69
Provide a definition for metadata that allows reading DC6 residency counter via PMT and exposes it as a builtin counter. Note that this residency counter is updated and read via entirely different mechanisms vs the MSR-based residency counters. On MTL processors, there are times when Die%c6 will report above 100%. This is still useful, but don't expect 3 digits of precision... Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-07-26tools/power turbostat: Add early support for PMT countersPatryk Wlazlyn1-2/+766
Allows users to read Intel PMT (Platform Monitoring Technology) counters, providing interface similar to one used to add MSR and perf counters. Because PMT is exposed as a raw MMIO range, without metadata, user has to supply the necessary information to find and correctly display the requested counter. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-07-26perf docs: Document cross compilationLeo Yan1-0/+28
Records the commands for cross compilation with two methods. The first method relies on Multiarch. The second approach is to explicitly specify the PKG_CONFIG variables, which is widely used in build system (like Buildroot, Yocto, etc). Co-developed-by: James Clark <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Leo Yan <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2024-07-26perf: build: Link lib 'zstd' for static buildLeo Yan2-2/+2
When build static perf, Makefile reports the error: Makefile.config:480: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR The libdw has been installed on the system, but the build system fails to build the feature detecting binary 'test-libdw-dwarf-unwind'. The failure is caused by missing to link the lib 'zstd'. Link lib 'zstd' for the static build, in the end, the dwarf feature can be enabled in the static perf. Signed-off-by: Leo Yan <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: James Clark <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2024-07-26perf: build: Link lib 'lzma' for static buildLeo Yan1-8/+8
The libunwind feature test failed with the static linkage. This is due to the 'lzma' lib is missed, so link it to dismiss building failure. Signed-off-by: Leo Yan <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: James Clark <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2024-07-26perf: build: Only link libebl.a for old libdwLeo Yan2-2/+22
Since libdw version 0.177, elfutils has merged libebl.a into libdw (see the commit "libebl: Don't install libebl.a, libebl.h and remove backends from spec." in the elfutils repository). As a result, libebl.a does not exist on Debian Bullseye and newer releases, causing static perf builds to fail on these distributions. This commit checks the libdw version and only links libebl.a if it detects that the libdw version is older than 0.177. Signed-off-by: Leo Yan <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: James Clark <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2024-07-26perf: build: Set Python configuration for cross compilationLeo Yan1-0/+8
Python configuration has dedicated folders for different architectures. For example, Python 3.11 has two folders as shown below, one for Arm64 and another for x86_64: /usr/lib/python3.11/config-3.11-aarch64-linux-gnu/ /usr/lib/python3.11/config-3.11-x86_64-linux-gnu/ This commit updates the Python configuration path based on the compiler's machine type, guiding the compiler to find the correct path for Python libraries. It also renames the generated .so file name to match the machine name. Signed-off-by: Leo Yan <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: James Clark <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2024-07-26perf: build: Setup PKG_CONFIG_LIBDIR for cross compilationLeo Yan2-2/+50
On recent Linux distros like Ubuntu Noble and Debian Bookworm, the 'pkg-config-aarch64-linux-gnu' package is missing. As a result, the aarch64-linux-gnu-pkg-config command is not available, which causes build failures. When a build passes the environment variables PKG_CONFIG_LIBDIR or PKG_CONFIG_PATH, like a user uses make command or a build system (like Yocto, Buildroot, etc) prepares the variables and passes to the Perf's Makefile, the commit keeps these variables for package configuration. Otherwise, this commit sets the PKG_CONFIG_LIBDIR variable to use the Multiarch libs for the cross compilation. Signed-off-by: Leo Yan <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2024-07-26perf tool: fix dereferencing NULL al->mapsCasey Chen1-1/+1
With 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions"), when cpumode is 3 (macro PERF_RECORD_MISC_HYPERVISOR), thread__find_map() could return with al->maps being NULL. The path below could add a callchain_cursor_node with NULL ms.maps. add_callchain_ip() thread__find_symbol(.., &al) thread__find_map(.., &al) // al->maps becomes NULL ms.maps = maps__get(al.maps) callchain_cursor_append(..., &ms, ...) node->ms.maps = maps__get(ms->maps) Then the path below would dereference NULL maps and get segfault. fill_callchain_info() maps__machine(node->ms.maps); Fix it by checking if maps is NULL in fill_callchain_info(). Fixes: 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions") Signed-off-by: Casey Chen <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Reviewed-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2024-07-26Merge tag 'auxdisplay-for-v6.11-tag1' of ↵Linus Torvalds7-4/+17
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull auxdisplay updates from Geert Uytterhoeven: - add support for configuring the boot message on line displays - miscellaneous fixes and improvements * tag 'auxdisplay-for-v6.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: auxdisplay: ht16k33: Drop reference after LED registration auxdisplay: Use sizeof(*pointer) instead of sizeof(type) auxdisplay: hd44780: add missing MODULE_DESCRIPTION() macro auxdisplay: linedisp: add missing MODULE_DESCRIPTION() macro auxdisplay: linedisp: Support configuring the boot message auxdisplay: charlcd: Provide a forward declaration
2024-07-26Merge tag 'sound-fix-6.11-rc1' of ↵Linus Torvalds17-46/+434
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of fixes gathered since the previous pull. We see a bit large LOCs at a HD-audio quirk, but that's only bulk COEF data, hence it's safe to take. In addition to that, there were two minor fixes for MIDI 2.0 handling for ALSA core, and the rest are all rather random small and device-specific fixes" * tag 'sound-fix-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: fsl-asoc-card: Dynamically allocate memory for snd_soc_dai_link_components ASoC: amd: yc: Support mic on Lenovo Thinkpad E16 Gen 2 ALSA: hda/realtek: Implement sound init sequence for Samsung Galaxy Book3 Pro 360 ALSA: hda/realtek: cs35l41: Fixup remaining asus strix models ASoC: SOF: ipc4-topology: Preserve the DMA Link ID for ChainDMA on unprepare ASoC: SOF: ipc4-topology: Only handle dai_config with HW_PARAMS for ChainDMA ALSA: ump: Force 1 Group for MIDI1 FBs ALSA: ump: Don't update FB name for static blocks ALSA: usb-audio: Add a quirk for Sonix HD USB Camera ASoC: TAS2781: Fix tasdev_load_calibrated_data() ASoC: tegra: select CONFIG_SND_SIMPLE_CARD_UTILS ASoC: Intel: use soc_intel_is_byt_cr() only when IOSF_MBI is reachable ALSA: usb-audio: Move HD Webcam quirk to the right place ALSA: hda: tas2781: mark const variables as __maybe_unused ALSA: usb-audio: Fix microphone sound on HD webcam. ASoC: sof: amd: fix for firmware reload failure in Vangogh platform ASoC: Intel: Fix RT5650 SSP lookup ASOC: SOF: Intel: hda-loader: only wait for HDaudio IOC for IPC4 devices ASoC: SOF: imx8m: Fix DSP control regmap retrieval
2024-07-26Merge tag 'drm-next-2024-07-26' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds56-197/+690
Pull drm fixes from Dave Airlie: "Fixes for rc1, mostly amdgpu, i915 and xe, with some other misc ones, doesn't seem to be anything too serious. amdgpu: - Bump driver version for GFX12 DCC - DC documention warning fixes - VCN unified queue power fix - SMU fix - RAS fix - Display corruption fix - SDMA 5.2 workaround - GFX12 fixes - Uninitialized variable fix - VCN/JPEG 4.0.3 fixes - Misc display fixes - RAS fixes - VCN4/5 harvest fix - GPU reset fix i915: - Reset intel_dp->link_trained before retraining the link - Don't switch the LTTPR mode on an active link - Do not consider preemption during execlists_dequeue for gen8 - Allow NULL memory region xe: - xe_exec ioctl minor fix on sync entry cleanup upon error - SRIOV: limit VF LMEM provisioning - Wedge mode fixes v3d: - fix indirect dispatch on newer v3d revs panel: - fix panel backlight bindings" * tag 'drm-next-2024-07-26' of https://gitlab.freedesktop.org/drm/kernel: (39 commits) drm/amdgpu: reset vm state machine after gpu reset(vram lost) drm/amdgpu: add missed harvest check for VCN IP v4/v5 drm/amdgpu: Fix eeprom max record count drm/amdgpu: fix ras UE error injection failure issue drm/amd/display: Remove ASSERT if significance is zero in math_ceil2 drm/amd/display: Check for NULL pointer drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF drm/amdgpu: Add empty HDP flush function to VCN v4.0.3 drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3 drm/amd/amdgpu: Fix uninitialized variable warnings drm/amdgpu: Fix atomics on GFX12 drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell drm/i915: Allow NULL memory region drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8 dt-bindings: display: panel: samsung,atna33xc20: Document ATNA45AF01 drm/xe: Don't suspend device upon wedge drm/xe: Wedge the entire device drm/xe/pf: Limit fair VF LMEM provisioning drm/xe/exec: Fix minor bug related to xe_sync_entry_cleanup drm/amd/display: fix corruption with high refresh rates on DCN 3.0 ...
2024-07-26tools/power turbostat: Add selftests for added perf countersPatryk Wlazlyn1-0/+178
Test adds several perf counters from msr, cstate_core and cstate_pkg groups and checks if the columns for those counters show up. The test skips the counters that are not present. It is not an error, but the test may not be as exhaustive. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-07-26tools/power turbostat: Add selftests for SMI, APERF and MPERF countersPatryk Wlazlyn1-0/+157
The test requests BICs that are dependent on SMI, APERF and MPERF counters and checks if the columns show up in the output and the turbostat doesn't crash. Read the counters in both --no-msr and --no-perf mode. The test skips counters that are not present or user does not have permissions to read. It is not an error, but the test may not be as exhaustive. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-07-26Merge tag 's390-6.11-2' of ↵Linus Torvalds52-525/+746
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Vasily Gorbik: - Fix KMSAN build breakage caused by the conflict between s390 and mm-stable trees - Add KMSAN page markers for ptdump - Add runtime constant support - Fix __pa/__va for modules under non-GPL licenses by exporting necessary vm_layout struct with EXPORT_SYMBOL to prevent linkage problems - Fix an endless loop in the CF_DIAG event stop in the CPU Measurement Counter Facility code when the counter set size is zero - Remove the PROTECTED_VIRTUALIZATION_GUEST config option and enable its functionality by default - Support allocation of multiple MSI interrupts per device and improve logging of architecture-specific limitations - Add support for lowcore relocation as a debugging feature to catch all null ptr dereferences in the kernel address space, improving detection beyond the current implementation's limited write access protection - Clean up and rework CPU alternatives to allow for callbacks and early patching for the lowcore relocation * tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits) s390: Remove protvirt and kvm config guards for uv code s390/boot: Add cmdline option to relocate lowcore s390/kdump: Make kdump ready for lowcore relocation s390/entry: Make system_call() ready for lowcore relocation s390/entry: Make ret_from_fork() ready for lowcore relocation s390/entry: Make __switch_to() ready for lowcore relocation s390/entry: Make restart_int_handler() ready for lowcore relocation s390/entry: Make mchk_int_handler() ready for lowcore relocation s390/entry: Make int handlers ready for lowcore relocation s390/entry: Make pgm_check_handler() ready for lowcore relocation s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro s390/entry: Add base register to SIEEXIT macro s390/entry: Add base register to MBEAR macro s390/entry: Make __sie64a() ready for lowcore relocation s390/head64: Make startup code ready for lowcore relocation s390: Add infrastructure to patch lowcore accesses s390/atomic_ops: Disable flag outputs constraint for GCC versions below 14.2.0 s390/entry: Move SIE indicator flag to thread info s390/nmi: Simplify ptregs setup s390/alternatives: Remove alternative facility list ...
2024-07-26tools/power turbostat: Move verbose counter messages to level 2Patryk Wlazlyn1-13/+11
Printing information about the source and value during initialization and reading of the counter for each cpu, while useful when debugging, results in too verbose output. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-07-26tools/power turbostat: Move debug prints from stdout to stderrPatryk Wlazlyn1-8/+9
This leaves the stdout cleaner, having only counter data. It makes it easier for programs to parse the output of turbostat, for example selftests. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-07-26Merge tag 'arm64-fixes' of ↵Linus Torvalds6-9/+31
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The usual summary below, but the main fix is for the fast GUP lockless page-table walk when we have a combination of compile-time and run-time folding of the p4d and the pud respectively. - Remove some redundant Kconfig conditionals - Fix string output in ptrace selftest - Fix fast GUP crashes in some page-table configurations - Remove obsolete linker option when building the vDSO - Fix some sysreg field definitions for the GIC" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mm: Fix lockless walks with static and dynamic page-table folding arm64/sysreg: Correct the values for GICv4.1 arm64/vdso: Remove --hash-style=sysv kselftest: missing arg in ptrace.c arm64/Kconfig: Remove redundant 'if HAVE_FUNCTION_GRAPH_TRACER' arm64: remove redundant 'if HAVE_ARCH_KASAN' in Kconfig
2024-07-26KVM: Documentation: Fix title underline too short warningChang Yu1-1/+1
Fix "WARNING: Title underline too short" by extending title line to the proper length. Signed-off-by: Chang Yu <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-26smb3: add dynamic trace point for session setup key expired failuresSteve French2-1/+47
There are cases where services need to remount (or change their credentials files) when keys have expired, but it can be helpful to have a dynamic trace point to make it easier to notify the service to refresh the storage account key. Here is sample output, one from mount with bad password, one from a reconnect where the password has been changed or expired and reconnect fails (requiring remount with new storage account key) TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | mount.cifs-11362 [000] ..... 6000.241620: smb3_key_expired: rc=-13 user=testpassu conn_id=0x2 server=localhost addr=127.0.0.1:445 kworker/4:0-8458 [004] ..... 6044.892283: smb3_key_expired: rc=-13 user=testpassu conn_id=0x3 server=localhost addr=127.0.0.1:445 Reviewed-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-07-26Merge tag 'ceph-for-6.11-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds6-22/+34
Pull ceph updates from Ilya Dryomov: "A small patchset to address bogus I/O errors and ultimately an assertion failure in the face of watch errors with -o exclusive mappings in RBD marked for stable and some assorted CephFS fixes" * tag 'ceph-for-6.11-rc1' of https://github.com/ceph/ceph-client: rbd: don't assume rbd_is_lock_owner() for exclusive mappings rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait ceph: fix incorrect kmalloc size of pagevec mempool ceph: periodically flush the cap releases ceph: convert comma to semicolon in __ceph_dentry_dir_lease_touch() ceph: use cap_wait_list only if debugfs is enabled