aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-06-15mm: shmem: fix getting incorrect lruvec when replacing a shmem folioBaolin Wang2-3/+2
When testing shmem swapin, I encountered the warning below on my machine. The reason is that replacing an old shmem folio with a new one causes mem_cgroup_migrate() to clear the old folio's memcg data. As a result, the old folio cannot get the correct memcg's lruvec needed to remove itself from the LRU list when it is being freed. This could lead to possible serious problems, such as LRU list crashes due to holding the wrong LRU lock, and incorrect LRU statistics. To fix this issue, we can fallback to use the mem_cgroup_replace_folio() to replace the old shmem folio. [ 5241.100311] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5d9960 [ 5241.100317] head: order:4 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 5241.100319] flags: 0x17fffe0000040068(uptodate|lru|head|swapbacked|node=0|zone=2|lastcpupid=0x3ffff) [ 5241.100323] raw: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000 [ 5241.100325] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 5241.100326] head: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000 [ 5241.100327] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 5241.100328] head: 17fffe0000000204 fffffdffd6665801 ffffffffffffffff 0000000000000000 [ 5241.100329] head: 0000000a00000010 0000000000000000 00000000ffffffff 0000000000000000 [ 5241.100330] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled()) [ 5241.100338] ------------[ cut here ]------------ [ 5241.100339] WARNING: CPU: 19 PID: 78402 at include/linux/memcontrol.h:775 folio_lruvec_lock_irqsave+0x140/0x150 [...] [ 5241.100374] pc : folio_lruvec_lock_irqsave+0x140/0x150 [ 5241.100375] lr : folio_lruvec_lock_irqsave+0x138/0x150 [ 5241.100376] sp : ffff80008b38b930 [...] [ 5241.100398] Call trace: [ 5241.100399] folio_lruvec_lock_irqsave+0x140/0x150 [ 5241.100401] __page_cache_release+0x90/0x300 [ 5241.100404] __folio_put+0x50/0x108 [ 5241.100406] shmem_replace_folio+0x1b4/0x240 [ 5241.100409] shmem_swapin_folio+0x314/0x528 [ 5241.100411] shmem_get_folio_gfp+0x3b4/0x930 [ 5241.100412] shmem_fault+0x74/0x160 [ 5241.100414] __do_fault+0x40/0x218 [ 5241.100417] do_shared_fault+0x34/0x1b0 [ 5241.100419] do_fault+0x40/0x168 [ 5241.100420] handle_pte_fault+0x80/0x228 [ 5241.100422] __handle_mm_fault+0x1c4/0x440 [ 5241.100424] handle_mm_fault+0x60/0x1f0 [ 5241.100426] do_page_fault+0x120/0x488 [ 5241.100429] do_translation_fault+0x4c/0x68 [ 5241.100431] do_mem_abort+0x48/0xa0 [ 5241.100434] el0_da+0x38/0xc0 [ 5241.100436] el0t_64_sync_handler+0x68/0xc0 [ 5241.100437] el0t_64_sync+0x14c/0x150 [ 5241.100439] ---[ end trace 0000000000000000 ]--- [[email protected]: remove less helpful comments, per Matthew] Link: https://lkml.kernel.org/r/ccad3fe1375b468ebca3227b6b729f3eaf9d8046.1718423197.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/3c11000dd6c1df83015a8321a859e9775ebbc23e.1718266112.git.baolin.wang@linux.alibaba.com Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") Signed-off-by: Baolin Wang <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Muchun Song <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15mm/debug_vm_pgtable: drop RANDOM_ORVALUE trickPeter Xu1-26/+5
Macro RANDOM_ORVALUE was used to make sure the pgtable entry will be populated with !none data in clear tests. The RANDOM_ORVALUE tried to cover mostly all the bits in a pgtable entry, even if there's no discussion on whether all the bits will be vaild. Both S390 and PPC64 have their own masks to avoid touching some bits. Now it's the turn for x86_64. The issue is there's a recent report from Mikhail Gavrilov showing that this can cause a warning with the newly added pte set check in commit 8430557fc5 on writable v.s. userfaultfd-wp bit, even though the check itself was valid, the random pte is not. We can choose to mask more bits out. However the need to have such random bits setup is questionable, as now it's already guaranteed to be true on below: - For pte level, the pgtable entry will be installed with value from pfn_pte(), where pfn points to a valid page. Hence the pte will be !none already if populated with pfn_pte(). - For upper-than-pte level, the pgtable entry should contain a directory entry always, which is also !none. All the cases look like good enough to test a pxx_clear() helper. Instead of extending the bitmask, drop the "set random bits" trick completely. Add some warning guards to make sure the entries will be !none before clear(). Link: https://lkml.kernel.org/r/[email protected] Fixes: 8430557fc584 ("mm/page_table_check: support userfault wr-protect entries") Signed-off-by: Peter Xu <[email protected]> Reported-by: Mikhail Gavrilov <[email protected]> Link: https://lore.kernel.org/r/CABXGCsMB9A8-X+Np_Q+fWLURYL_0t3Y-MdoNabDM-Lzk58-DGA@mail.gmail.com Tested-by: Mikhail Gavrilov <[email protected]> Reviewed-by: Pasha Tatashin <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Gavin Shan <[email protected]> Cc: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15mm: fix possible OOB in numa_rebuild_large_mapping()Kefeng Wang1-4/+10
The large folio is mapped with folio size(not greater PMD_SIZE) aligned virtual address during the pagefault, ie, 'addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE)' in do_anonymous_page(). But after the mremap(), the virtual address only requires PAGE_SIZE alignment. Also pte is moved to new in move_page_tables(), then traversal of the new pte in the numa_rebuild_large_mapping() could hit the following issue, Unable to handle kernel paging request at virtual address 00000a80c021a788 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00002040341a6000 [00000a80c021a788] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP ... CPU: 76 PID: 15187 Comm: git Kdump: loaded Tainted: G W 6.10.0-rc2+ #209 Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.79 08/21/2021 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : numa_rebuild_large_mapping+0x338/0x638 lr : numa_rebuild_large_mapping+0x320/0x638 sp : ffff8000b41c3b00 x29: ffff8000b41c3b30 x28: ffff8000812a0000 x27: 00000000000a8000 x26: 00000000000000a8 x25: 0010000000000001 x24: ffff20401c7170f0 x23: 0000ffff33a1e000 x22: 0000ffff33a76000 x21: ffff20400869eca0 x20: 0000ffff33976000 x19: 00000000000000a8 x18: ffffffffffffffff x17: 0000000000000000 x16: 0000000000000020 x15: ffff8000b41c36a8 x14: 0000000000000000 x13: 205d373831353154 x12: 5b5d333331363732 x11: 000000000011ff78 x10: 000000000011ff10 x9 : ffff800080273f30 x8 : 000000320400869e x7 : c0000000ffffd87f x6 : 00000000001e6ba8 x5 : ffff206f3fb5af88 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : fffffdffc0000000 x0 : 00000a80c021a780 Call trace: numa_rebuild_large_mapping+0x338/0x638 do_numa_page+0x3e4/0x4e0 handle_pte_fault+0x1bc/0x238 __handle_mm_fault+0x20c/0x400 handle_mm_fault+0xa8/0x288 do_page_fault+0x124/0x498 do_translation_fault+0x54/0x80 do_mem_abort+0x4c/0xa8 el0_da+0x40/0x110 el0t_64_sync_handler+0xe4/0x158 el0t_64_sync+0x188/0x190 Fix it by making the start and end not only within the vma range, but also within the page table range. Link: https://lkml.kernel.org/r/[email protected] Fixes: d2136d749d76 ("mm: support multi-size THP numa balancing") Signed-off-by: Kefeng Wang <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: John Hubbard <[email protected]> Cc: Liu Shixin <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Ryan Roberts <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15mm/migrate: fix kernel BUG at mm/compaction.c:2761!Hugh Dickins1-1/+7
I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); and if DEBUG_VM were off, then pages would be lost on a local list. Our convention is that if migrate_pages() reports complete success (0), then the migratepages list will be empty; but if it reports an error or some pages remaining, then its caller must putback_movable_pages(). There's a new case in which migrate_pages() has been reporting complete success, but returning with pages left on the migratepages list: when migrate_pages_batch() successfully split a folio on the deferred list, but then the "Failure isn't counted" call does not dispose of them all. Since that block is expecting the large folio to have been counted as 1 failure already, and since the return code is later adjusted to success whenever the returned list is found empty, the simple way to fix this safely is to count splitting the deferred folio as "a failure". Link: https://lkml.kernel.org/r/[email protected] Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list") Signed-off-by: Hugh Dickins <[email protected]> Cc: Baolin Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15selftests: mm: make map_fixed_noreplace test names stableMark Brown1-8/+16
KTAP parsers interpret the output of ksft_test_result_*() as being the name of the test. The map_fixed_noreplace test uses a dynamically allocated base address for the mmap()s that it tests and currently includes this in the test names that it logs so the test names that are logged are not stable between runs. It also uses multiples of PAGE_SIZE which mean that runs for kernels with different PAGE_SIZE configurations can't be directly compared. Both these factors cause issues for CI systems when interpreting and displaying results. Fix this by replacing the current test names with fixed strings describing the intent of the mappings that are logged, the existing messages with the actual addresses and sizes are retained as diagnostic prints to aid in debugging. Link: https://lkml.kernel.org/r/20240605-kselftest-mm-fixed-noreplace-v1-1-a235db8b9be9@kernel.org Fixes: 4838cf70e539 ("selftests/mm: map_fixed_noreplace: conform test to TAP format output") Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXECJeff Xu2-0/+87
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it didn't have proper documentation. This led to a lot of confusion, especially about whether or not memfd created with the MFD_NOEXEC_SEAL flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set MFD_ALLOW_SEALING to be sealable, so it's a fair question. As one might have noticed, unlike other flags in memfd_create, MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea is to make it easier to use memfd in the most common way, which is NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl vm.noexec to help existing applications move to a more secure way of using memfd. Proposals have been made to put MFD_NOEXEC_SEAL non-sealable, unless MFD_ALLOW_SEALING is set, to be consistent with other flags [1], Those are based on the viewpoint that each flag is an atomic unit, which is a reasonable assumption. However, MFD_NOEXEC_SEAL was designed with the intent of promoting the most secure method of using memfd, therefore a combination of multiple functionalities into one bit. Furthermore, the MFD_NOEXEC_SEAL has been added for more than one year, and multiple applications and distributions have backported and utilized it. Altering ABI now presents a degree of risk and may lead to disruption. MFD_NOEXEC_SEAL is a new flag, and applications must change their code to use it. There is no backward compatibility problem. When sysctl vm.noexec == 1 or 2, applications that don't set MFD_NOEXEC_SEAL or MFD_EXEC will get MFD_NOEXEC_SEAL memfd. And old-application might break, that is by-design, in such a system vm.noexec = 0 shall be used. Also no backward compatibility problem. I propose to include this documentation patch to assist in clarifying the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential future confusion. Finally, I would like to express my gratitude to David Rheinsberg and Barnabás Pőcze for initiating the discussion on the topic of sealability. [1] https://lore.kernel.org/lkml/[email protected]/ [[email protected]: updates per Randy] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: v3] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jeff Xu <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Cc: Aleksa Sarai <[email protected]> Cc: Barnabás Pőcze <[email protected]> Cc: Daniel Verkamp <[email protected]> Cc: David Rheinsberg <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jorge Lucangeli Obes <[email protected]> Cc: Kees Cook <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15mm: mmap: allow for the maximum number of bits for randomizing mmap_base by ↵Rafael Aquini1-0/+12
default An ASLR regression was noticed [1] and tracked down to file-mapped areas being backed by THP in recent kernels. The 21-bit alignment constraint for such mappings reduces the entropy for randomizing the placement of 64-bit library mappings and breaks ASLR completely for 32-bit libraries. The reported issue is easily addressed by increasing vm.mmap_rnd_bits and vm.mmap_rnd_compat_bits. This patch just provides a simple way to set ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS to their maximum values allowed by the architecture at build time. [1] https://zolutal.github.io/aslrnt/ [[email protected]: default to `y' if 32-bit, per Rafael] Link: https://lkml.kernel.org/r/[email protected] Fixes: 1854bc6e2420 ("mm/readahead: Align file mappings for non-DAX") Signed-off-by: Rafael Aquini <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Samuel Holland <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15gcov: add support for GCC 14Peter Oberparleiter1-1/+3
Using gcov on kernels compiled with GCC 14 results in truncated 16-byte long .gcda files with no usable data. To fix this, update GCOV_COUNTERS to match the value defined by GCC 14. Tested with GCC versions 14.1.0 and 13.2.0. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Oberparleiter <[email protected]> Reported-by: Allison Henderson <[email protected]> Reported-by: Chuck Lever III <[email protected]> Tested-by: Chuck Lever <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDINGOleg Nesterov1-0/+1
kernel_wait4() doesn't sleep and returns -EINTR if there is no eligible child and signal_pending() is true. That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending() return false and avoid a busy-wait loop. Link: https://lkml.kernel.org/r/[email protected] Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") Signed-off-by: Oleg Nesterov <[email protected]> Reported-by: Rachel Menge <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Reviewed-by: Boqun Feng <[email protected]> Tested-by: Wei Fu <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Cc: Allen Pais <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Joel Granados <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Mike Christie <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Zqiang <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15mm: huge_memory: fix misused mapping_large_folio_support() for anon foliosRan Xiaokai2-11/+21
When I did a large folios split test, a WARNING "[ 5059.122759][ T166] Cannot split file folio to non-0 order" was triggered. But the test cases are only for anonmous folios. while mapping_large_folio_support() is only reasonable for page cache folios. In split_huge_page_to_list_to_order(), the folio passed to mapping_large_folio_support() maybe anonmous folio. The folio_test_anon() check is missing. So the split of the anonmous THP is failed. This is also the same for shmem_mapping(). We'd better add a check for both. But the shmem_mapping() in __split_huge_page() is not involved, as for anonmous folios, the end parameter is set to -1, so (head[i].index >= end) is always false. shmem_mapping() is not called. Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support() for anon mapping, So we can detect the wrong use more easily. THP folios maybe exist in the pagecache even the file system doesn't support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE is enabled, khugepaged will try to collapse read-only file-backed pages to THP. But the mapping does not actually support multi order large folios properly. Using /sys/kernel/debug/split_huge_pages to verify this, with this patch, large anon THP is successfully split and the warning is ceased. Link: https://lkml.kernel.org/r/[email protected] Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Reviewed-by: Barry Song <[email protected]> Reviewed-by: Zi Yan <[email protected]> Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Ran Xiaokai <[email protected]> Cc: Michal Hocko <[email protected]> Cc: xu xin <[email protected]> Cc: Yang Yang <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()Suren Baghdasaryan1-3/+8
put_page_tag_ref() should be called only when get_page_tag_ref() returns a valid reference because only in that case get_page_tag_ref() enters RCU read section while put_page_tag_ref() will call rcu_read_unlock() even if the provided reference is NULL. Fix pgalloc_tag_get() which does not follow this rule causing RCU imbalance. Add a warning in put_page_tag_ref() to catch any future mistakes. Link: https://lkml.kernel.org/r/[email protected] Fixes: cc92eba1c88b ("mm: fix non-compound multi-order memory accounting in __free_pages") Signed-off-by: Suren Baghdasaryan <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Acked-by: Vlastimil Babka <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kees Cook <[email protected]> Cc: Pasha Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=nSuren Baghdasaryan1-3/+13
Memory allocation profiling is trying to register sysctl interface even when CONFIG_SYSCTL=n, resulting in proc_do_static_key() being undefined. Prevent that by skipping sysctl registration for such configurations. Link: https://lkml.kernel.org/r/[email protected] Fixes: 22d407b164ff ("lib: add allocation tagging support for memory allocation profiling") Signed-off-by: Suren Baghdasaryan <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Acked-by: Vlastimil Babka <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kees Cook <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15MAINTAINERS: remove Lorenzo as vmalloc reviewerLorenzo Stoakes1-1/+0
I haven't had the bandwidth to review vmalloc patches recently and I suspect I won't be able to do so consistently moving forwards, so I think it's best if I remove myself as reviewer for the time being. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Lorenzo Stoakes <[email protected]> Cc: Baoquan He <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15Revert "mm: init_mlocked_on_free_v3"David Hildenbrand7-73/+9
There was insufficient review and no agreement that this is the right approach. There are serious flaws with the implementation that make processes using mlock() not even work with simple fork() [1] and we get reliable crashes when rebooting. Further, simply because we might be unmapping a single PTE of a large mlocked folio, we shouldn't zero out the whole folio. ... especially because the code can also *corrupt* urelated memory because kernel_init_pages(page, folio_nr_pages(folio)); Could end up writing outside of the actual folio if we work with a tail page. Let's revert it. Once there is agreement that this is the right approach, the issues were fixed and there was reasonable review and proper testing, we can consider it again. [1] https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: ba42b524a040 ("mm: init_mlocked_on_free_v3") Signed-off-by: David Hildenbrand <[email protected]> Reported-by: David Wang <[email protected]> Closes: https://lore.kernel.org/lkml/[email protected]/ Reported-by: Lance Yang <[email protected]> Closes: https://lkml.kernel.org/r/[email protected] Acked-by: Lance Yang <[email protected]> Cc: York Jasper Niebuhr <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15mm/page_table_check: fix crash on ZONE_DEVICEPeter Xu1-1/+10
Not all pages may apply to pgtable check. One example is ZONE_DEVICE pages: they map PFNs directly, and they don't allocate page_ext at all even if there's struct page around. One may reference devm_memremap_pages(). When both ZONE_DEVICE and page-table-check enabled, then try to map some dax memories, one can trigger kernel bug constantly now when the kernel was trying to inject some pfn maps on the dax device: kernel BUG at mm/page_table_check.c:55! While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page fault resolutions, skip all the checks if page_ext doesn't even exist in pgtable checker, which applies to ZONE_DEVICE but maybe more. Link: https://lkml.kernel.org/r/[email protected] Fixes: df4e817b7108 ("mm: page table check") Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Pasha Tatashin <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15gcc: disable '-Warray-bounds' for gcc-9Yury Norov1-1/+1
'-Warray-bounds' is already disabled for gcc-10+. Now that we've merged bitmap_{read,write), I see the following error when building the kernel with gcc-9.4 (Ubuntu 20.04.4 LTS) for x86_64 allmodconfig: drivers/pinctrl/pinctrl-cy8c95x0.c: In function `cy8c95x0_read_regs_mask.isra.0': include/linux/bitmap.h:756:18: error: array subscript [1, 288230376151711744] is outside array bounds of `long unsigned int[1]' [-Werror=array-bounds] 756 | value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits); | ~~~^~~~~~~~~~~ The immediate reason is that the commit b44759705f7d ("bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()") switched the bitmap_get_value8() to an alias of bitmap_read(); the same for 'set'. Now; the code that triggers Warray-bounds, calls the function like this: #define MAX_BANK 8 #define BANK_SZ 8 #define MAX_LINE (MAX_BANK * BANK_SZ) DECLARE_BITMAP(tval, MAX_LINE); // 64-bit map: unsigned long tval[1] read_val |= bitmap_get_value8(tval, i * BANK_SZ) & ~bits; bitmap_read() is implemented such that it may conditionally dereference a pointer beyond the boundary like this: unsigned long offset = start % BITS_PER_LONG; unsigned long space = BITS_PER_LONG - offset; if (space >= nbits) return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits); value_low = map[index] & BITMAP_FIRST_WORD_MASK(start); value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits); return (value_low >> offset) | (value_high << space); In case of bitmap_get_value8(), it's impossible to violate the boundary because 'space >= nbits' is never the true for byte-aligned 8-bit access. So, this is clearly a false-positive. The same type of false-positives break my allmodconfig build in many places. gcc-8, is clear, however. Link: https://lkml.kernel.org/r/[email protected] Fixes: b44759705f7d ("bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()") Signed-off-by: Yury Norov <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: David S. Miller <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Yoann Congal <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-06-15ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()Joseph Qi3-82/+131
bdev->bd_super has been removed and commit 8887b94d9322 change the usage from bdev->bd_super to b_assoc_map->host->i_sb. Since ocfs2 hasn't set bh->b_assoc_map, it will trigger NULL pointer dereference when calling into ocfs2_abort_trigger(). Actually this was pointed out in history, see commit 74e364ad1b13. But I've made a mistake when reviewing commit 8887b94d9322 and then re-introduce this regression. Since we cannot revive bdev in buffer head, so fix this issue by initializing all types of ocfs2 triggers when fill super, and then get the specific ocfs2 trigger from ocfs2_caching_info when access journal. [[email protected]: v2] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 8887b94d9322 ("ocfs2: stop using bdev->bd_super for journal error logging") Signed-off-by: Joseph Qi <[email protected]> Reviewed-by: Heming Zhao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> [6.6+] Signed-off-by: Andrew Morton <[email protected]>
2024-06-15ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()Joseph Qi1-4/+6
bdev->bd_super has been removed and commit 8887b94d9322 change the usage from bdev->bd_super to b_assoc_map->host->i_sb. This introduces the following NULL pointer dereference in ocfs2_journal_dirty() since b_assoc_map is still not initialized. This can be easily reproduced by running xfstests generic/186, which simulate no more credits. [ 134.351592] BUG: kernel NULL pointer dereference, address: 0000000000000000 ... [ 134.355341] RIP: 0010:ocfs2_journal_dirty+0x14f/0x160 [ocfs2] ... [ 134.365071] Call Trace: [ 134.365312] <TASK> [ 134.365524] ? __die_body+0x1e/0x60 [ 134.365868] ? page_fault_oops+0x13d/0x4f0 [ 134.366265] ? __pfx_bit_wait_io+0x10/0x10 [ 134.366659] ? schedule+0x27/0xb0 [ 134.366981] ? exc_page_fault+0x6a/0x140 [ 134.367356] ? asm_exc_page_fault+0x26/0x30 [ 134.367762] ? ocfs2_journal_dirty+0x14f/0x160 [ocfs2] [ 134.368305] ? ocfs2_journal_dirty+0x13d/0x160 [ocfs2] [ 134.368837] ocfs2_create_new_meta_bhs.isra.51+0x139/0x2e0 [ocfs2] [ 134.369454] ocfs2_grow_tree+0x688/0x8a0 [ocfs2] [ 134.369927] ocfs2_split_and_insert.isra.67+0x35c/0x4a0 [ocfs2] [ 134.370521] ocfs2_split_extent+0x314/0x4d0 [ocfs2] [ 134.371019] ocfs2_change_extent_flag+0x174/0x410 [ocfs2] [ 134.371566] ocfs2_add_refcount_flag+0x3fa/0x630 [ocfs2] [ 134.372117] ocfs2_reflink_remap_extent+0x21b/0x4c0 [ocfs2] [ 134.372994] ? inode_update_timestamps+0x4a/0x120 [ 134.373692] ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2] [ 134.374545] ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2] [ 134.375393] ocfs2_reflink_remap_blocks+0xe4/0x4e0 [ocfs2] [ 134.376197] ocfs2_remap_file_range+0x1de/0x390 [ocfs2] [ 134.376971] ? security_file_permission+0x29/0x50 [ 134.377644] vfs_clone_file_range+0xfe/0x320 [ 134.378268] ioctl_file_clone+0x45/0xa0 [ 134.378853] do_vfs_ioctl+0x457/0x990 [ 134.379422] __x64_sys_ioctl+0x6e/0xd0 [ 134.379987] do_syscall_64+0x5d/0x170 [ 134.380550] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 134.381231] RIP: 0033:0x7fa4926397cb [ 134.381786] Code: 73 01 c3 48 8b 0d bd 56 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 56 38 00 f7 d8 64 89 01 48 [ 134.383930] RSP: 002b:00007ffc2b39f7b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 134.384854] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fa4926397cb [ 134.385734] RDX: 00007ffc2b39f7f0 RSI: 000000004020940d RDI: 0000000000000003 [ 134.386606] RBP: 0000000000000000 R08: 00111a82a4f015bb R09: 00007fa494221000 [ 134.387476] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 134.388342] R13: 0000000000f10000 R14: 0000558e844e2ac8 R15: 0000000000f10000 [ 134.389207] </TASK> Fix it by only aborting transaction and journal in ocfs2_journal_dirty() now, and leave ocfs2_abort() later when detecting an aborted handle, e.g. start next transaction. Also log the handle details in this case. Link: https://lkml.kernel.org/r/[email protected] Fixes: 8887b94d9322 ("ocfs2: stop using bdev->bd_super for journal error logging") Signed-off-by: Joseph Qi <[email protected]> Reviewed-by: Heming Zhao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> [6.6+] Signed-off-by: Andrew Morton <[email protected]>
2024-06-15arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix BT shutdown GPIOTim Harvey1-1/+1
Fix the invalid BT shutdown GPIO (gpio1_io3 not gpio4_io16) Fixes: 716ced308234 ("arm64: dts: freescale: Add imx8mp-venice-gw73xx-2x") Signed-off-by: Tim Harvey <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
2024-06-15efi/arm64: Fix kmemleak false positive in arm64_efi_rt_init()Waiman Long1-0/+2
The kmemleak code sometimes complains about the following leak: unreferenced object 0xffff8000102e0000 (size 32768):   comm "swapper/0", pid 1, jiffies 4294937323 (age 71.240s)   hex dump (first 32 bytes):     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................   backtrace:     [<00000000db9a88a3>] __vmalloc_node_range+0x324/0x450     [<00000000ff8903a4>] __vmalloc_node+0x90/0xd0     [<000000001a06634f>] arm64_efi_rt_init+0x64/0xdc     [<0000000007826a8d>] do_one_initcall+0x178/0xac0     [<0000000054a87017>] do_initcalls+0x190/0x1d0     [<00000000308092d0>] kernel_init_freeable+0x2c0/0x2f0     [<000000003e7b99e0>] kernel_init+0x28/0x14c     [<000000002246af5b>] ret_from_fork+0x10/0x20 The memory object in this case is for efi_rt_stack_top and is allocated in an initcall. So this is certainly a false positive. Mark the object as not a leak to quash it. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2024-06-15efi/x86: Free EFI memory map only when installing a new one.Ard Biesheuvel3-11/+11
The logic in __efi_memmap_init() is shared between two different execution flows: - mapping the EFI memory map early or late into the kernel VA space, so that its entries can be accessed; - the x86 specific cloning of the EFI memory map in order to insert new entries that are created as a result of making a memory reservation via a call to efi_mem_reserve(). In the former case, the underlying memory containing the kernel's view of the EFI memory map (which may be heavily modified by the kernel itself on x86) is not modified at all, and the only thing that changes is the virtual mapping of this memory, which is different between early and late boot. In the latter case, an entirely new allocation is created that carries a new, updated version of the kernel's view of the EFI memory map. When installing this new version, the old version will no longer be referenced, and if the memory was allocated by the kernel, it will leak unless it gets freed. The logic that implements this freeing currently lives on the code path that is shared between these two use cases, but it should only apply to the latter. So move it to the correct spot. While at it, drop the dummy definition for non-x86 architectures, as that is no longer needed. Cc: <[email protected]> Fixes: f0ef6523475f ("efi: Fix efi_memmap_alloc() leaks") Tested-by: Ashish Kalra <[email protected]> Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Ard Biesheuvel <[email protected]>
2024-06-15efi/arm: Disable LPAE PAN when calling EFI runtime servicesArd Biesheuvel1-0/+13
EFI runtime services are remapped into the lower 1 GiB of virtual address space at boot, so they are guaranteed to be able to co-exist with the kernel virtual mappings without the need to allocate space for them in the kernel's vmalloc region, which is rather small. This means those mappings are covered by TTBR0 when LPAE PAN is enabled, and so 'user' access must be enabled while such calls are in progress. Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2024-06-15arm: dts: imx53-qsb-hdmi: Disable panel instead of deleting nodeLiu Ying2-3/+5
We cannot use /delete-node/ directive to delete a node in a DT overlay. The node won't be deleted effectively. Instead, set the node's status property to "disabled" to achieve something similar. Fixes: eeb403df953f ("ARM: dts: imx53-qsb: add support for the HDMI expander") Signed-off-by: Liu Ying <[email protected]> Reviewed-by: Dmitry Baryshkov <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
2024-06-15arm64: dts: imx8mp: Fix TC9595 input clock on DH i.MX8M Plus DHCOM SoMMarek Vasut1-1/+1
The IMX8MP_CLK_CLKOUT2 supplies the TC9595 bridge with 13 MHz reference clock. The IMX8MP_CLK_CLKOUT2 is supplied from IMX8MP_AUDIO_PLL2_OUT. The IMX8MP_CLK_CLKOUT2 operates only as a power-of-two divider, and the current 156 MHz is not power-of-two divisible to achieve 13 MHz. To achieve 13 MHz output from IMX8MP_CLK_CLKOUT2, set IMX8MP_AUDIO_PLL2_OUT to 208 MHz, because 208 MHz / 16 = 13 MHz. Fixes: 20d0b83e712b ("arm64: dts: imx8mp: Add TC9595 bridge on DH electronics i.MX8M Plus DHCOM") Signed-off-by: Marek Vasut <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
2024-06-15firewire: core: record card index in bus_reset_handle tracepoints eventTakashi Sakamoto2-4/+7
The bus reset event occurs in the bus managed by one of 1394 OHCI controller in Linux system, however the existing tracepoints events has the lack of data about it to distinguish the issued hardware from the others. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>
2024-06-15firewire: core: record card index in tracepoinrts events derived from ↵Takashi Sakamoto2-12/+15
bus_reset_arrange_template The asynchronous transmission of phy packet is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>
2024-06-15firewire: core: record card index in async_phy_inbound tracepoints eventTakashi Sakamoto2-4/+6
The asynchronous transmission of phy packet is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>
2024-06-15firewire: core: record card index in async_phy_outbound_complete tracepoints ↵Takashi Sakamoto3-5/+8
event The asynchronous transmission of phy packet is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>
2024-06-15firewire: core: record card index in async_phy_outbound_initiate tracepoints ↵Takashi Sakamoto3-6/+9
event The asynchronous transaction is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>
2024-06-15firewire: core: record card index in tracepoinrts events derived from ↵Takashi Sakamoto2-12/+16
async_inbound_template The asynchronous transaction is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>
2024-06-15firewire: core: record card index in tracepoinrts events derived from ↵Takashi Sakamoto2-12/+18
async_outbound_initiate_template The asynchronous transaction is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>
2024-06-15firewire: core: record card index in tracepoinrts events derived from ↵Takashi Sakamoto2-10/+13
async_outbound_complete_template The asynchronous transaction is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>
2024-06-15firewire: fix website URL in KconfigTakashi Sakamoto1-1/+1
The wiki in kernel.org is no longer updated. This commit replaces the website URL with the latest one. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Sakamoto <[email protected]>
2024-06-14Merge tag 's390-6.10-4' of ↵Linus Torvalds6-25/+103
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - A couple of fixes for regressions resulting from the uncoupling of physical vs virtual kernel address spaces: fix the mapping of the kernel image using large pages; enforce alignment checks on physical addresses before creating large pages - Update defconfigs * tag 's390-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: Restore mapping of kernel image using large pages s390/mm: Allow large pages only for aligned physical addresses s390: Update defconfigs
2024-06-14Merge branch '40GbE' of ↵Jakub Kicinski2-3/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-06-11 (ice) This series contains updates to ice driver only. En-Wei Wu resolves IRQ collision during suspend. Paul corrects 200Gbps speed being reported as unknown. Wojciech adds retry mechanism when package download fails. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: implement AQ download pkg retry ice: fix 200G link speed message log ice: avoid IRQ collision to fix init failure on ACPI S3 resume ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-14Merge tag 'drm-fixes-2024-06-15' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds22-37/+87
Pull drm fixes from Dave Airlie: "Weekly fixes. Seems a little quieter than usual, but still a bunch of stuff across the board. Mostly xe, some exynos and nouveau fixes. core: - Werror Kconfig fix panel: - add orientation quirk for Aya Neo KUN - fix runtime warning on panel/bridge release nouveau: - remove unused struct - fix wq crash on cards with no display amdgpu: - fix bo release clear page warning xe: - update MAINTAINERS - Use correct forcewake assertions - Assert that VRAM provisioning is only done on DGFX - Flush render caches before user-fence signalling on all engines - Move the disable_c6 call since it was sometimes never called exynos: - fix regression with fallback mode - fix EDID related memory leak - remove redundant code komeda: - fix debugfs conditional compilations - check pointer error value renesas: - atomic shutdown fix mediatek: - atomic shutdown fix" * tag 'drm-fixes-2024-06-15' of https://gitlab.freedesktop.org/drm/kernel: arm/komeda: Remove all CONFIG_DEBUG_FS conditional compilations drm/xe: move disable_c6 call drm/xe: flush engine buffers before signalling user fence on all engines drm/xe/pf: Assert LMEM provisioning is done only on DGFX drm/xe/xe_gt_idle: use GT forcewake domain assertion drm/mediatek: Call drm_atomic_helper_shutdown() at shutdown time drm: renesas: shmobile: Call drm_atomic_helper_shutdown() at shutdown time drm/nouveau: remove unused struct 'init_exec' drm/nouveau: don't attempt to schedule hpd_work on headless cards drm/amdgpu: Fix the BO release clear memory warning drm/bridge/panel: Fix runtime warning on panel bridge release drm/komeda: check for error-valued pointer drm: panel-orientation-quirks: Add quirk for Aya Neo KUN drm/exynos/vidi: fix memory leak in .get_modes() drm/exynos: dp: drop driver owner initialization drm/exynos: hdmi: report safe 640x480 mode as a fallback when no EDID found drm: have config DRM_WERROR depend on !WERROR MAINTAINERS: Update Xe driver maintainers MAINTAINERS: update Xe driver maintainers
2024-06-14Merge tag 'vfio-v6.10-rc4' of https://github.com/awilliam/linux-vfioLinus Torvalds6-207/+125
Pull VFIO fixes from Alex Williamson: "Fix long standing lockdep issue of using remap_pfn_range() from the vfio-pci fault handler for mapping device MMIO. Commit ba168b52bf8e ("mm: use rwsem assertion macros for mmap_lock") now exposes this as a warning forcing this to be addressed. remap_pfn_range() was used here to efficiently map the entire vma, but it really never should have been used in the fault handler and doesn't handle concurrency, which introduced complex locking. We also needed to track vmas mapping the device memory in order to zap those vmas when the memory is disabled resulting in a vma list. Instead of all that mess, setup an address space on the device fd such that we can use unmap_mapping_range() for zapping to avoid the tracking overhead and use the standard vmf_insert_pfn() to insert mappings on fault. For now we'll iterate the vma and opportunistically try to insert mappings for the entire vma. This aligns with typical use cases, but hopefully in the future we can drop the iterative approach and make use of huge_fault instead, once vmf_insert_pfn{pud,pmd}() learn to handle pfnmaps" * tag 'vfio-v6.10-rc4' of https://github.com/awilliam/linux-vfio: vfio/pci: Insert full vma on mmap'd MMIO fault vfio/pci: Use unmap_mapping_range() vfio: Create vfio_fs_type with inode per device
2024-06-14netdev-genl: fix error codes when outputting XDP featuresJakub Kicinski1-8/+8
-EINVAL will interrupt the dump. The correct error to return if we have more data to dump is -EMSGSIZE. Discovered by doing: for i in `seq 80`; do ip link add type veth; done ./cli.py --dbg-small-recv 5300 --spec netdev.yaml --dump dev-get >> /dev/null [...] nl_len = 64 (48) nl_flags = 0x0 nl_type = 19 nl_len = 20 (4) nl_flags = 0x2 nl_type = 3 error: -22 Fixes: d3d854fd6a1d ("netdev-genl: create a simple family for netdev stuff") Reviewed-by: Amritha Nambiar <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-14Merge tag 'for-netdev' of ↵Jakub Kicinski9-11/+92
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-06-14 We've added 8 non-merge commits during the last 2 day(s) which contain a total of 9 files changed, 92 insertions(+), 11 deletions(-). The main changes are: 1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason() triggered via __bpf_try_make_writable(), from Florian Westphal. 2) Fix removal of kfuncs during linking phase which then throws a kernel build warning via resolve_btfids about unresolved symbols, from Tony Ambardar. 3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol is not available on User Mode Linux, from Maciej Żenczykowski. 4) Fix a register corruption in reg_set_min_max triggering an invariant violation in BPF verifier, from Daniel Borkmann. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Harden __bpf_kfunc tag against linker kfunc removal compiler_types.h: Define __retain for __attribute__((__retain__)) bpf: Avoid splat in pskb_pull_reason bpf: fix UML x86_64 compile failure selftests/bpf: Add test coverage for reg_set_min_max handling bpf: Reduce stack consumption in check_stack_write_fixed_off bpf: Fix reg_set_min_max corruption of fake_reg MAINTAINERS: mailmap: Update Stanislav's email address ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-15Merge tag 'drm-misc-fixes-2024-06-14' of ↵Dave Airlie14-21/+38
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v6.10-rc4: - Kconfig fix for WERROR. - Add panel quirk for Aya Neo KUN - Small bugfixes in komeda, bridge/panel, amdgpu, nouveau. - Remove unused nouveau struct. - Call drm_atomic_helper_shutdown for shmobile and mediatek on shutdown. - Remove DEBUGFS ifdefs from komeda. Signed-off-by: Dave Airlie <[email protected]> From: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-14Merge tag 'block-6.10-20240614' of git://git.kernel.dk/linuxLinus Torvalds13-53/+100
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Discard double free on error conditions (Chunguang) - Target Fixes (Daniel) - Namespace detachment regression fix (Keith) - Fix for an issue with flush requests and queuelist reuse (Chengming) - nbd sparse annotation fixes (Christoph) - unmap and free bio mapped data via submitter (Anuj) - loop discard/fallocate unsupported fix (Cyril) - Fix for the zoned write plugging added in this release (Damien) - sed-opal wrong address fix (Su) * tag 'block-6.10-20240614' of git://git.kernel.dk/linux: loop: Disable fallocate() zero and discard if not supported nvme: fix namespace removal list nbd: Remove __force casts nvmet: always initialize cqe.result nvmet-passthru: propagate status from id override functions nvme: avoid double free special payload block: unmap and free user mapped integrity via submitter block: fix request.queuelist usage in flush block: Optimize disk zone resource cleanup block: sed-opal: avoid possible wrong address reference in read_sed_opal_key()
2024-06-14Merge tag 'io_uring-6.10-20240614' of git://git.kernel.dk/linuxLinus Torvalds4-3/+6
Pull io_uring fixes from Jens Axboe: "Two fixes from Pavel headed to stable: - Ensure that the task state is correct before attempting to grab a mutex - Split cancel sequence flag into a separate variable, as it can get set by someone not owning the request (but holding the ctx lock)" * tag 'io_uring-6.10-20240614' of git://git.kernel.dk/linux: io_uring: fix cancellation overwriting req->flags io_uring/rsrc: don't lock while !TASK_RUNNING
2024-06-14Merge tag 'scsi-fixes' of ↵Linus Torvalds10-36/+130
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three obvious driver fixes and two core fixes. The two core fixes are to disable Command Duration Limits by default to fix an inconsistency in SATA and some USB devices. The other is to change the default read size for block zero to follow the device preference (some USB bridges preferring 16 byte commands don't have a translation for READ(10) and thus don't scan properly)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpi3mr: Fix ATA NCQ priority support scsi: ufs: core: Quiesce request queues before checking pending cmds scsi: core: Disable CDL by default scsi: mpt3sas: Avoid test/set_bit() operating in non-allocated memory scsi: sd: Use READ(16) when reading block zero on large capacity disks
2024-06-14bpf: Harden __bpf_kfunc tag against linker kfunc removalTony Ambardar1-1/+1
BPF kfuncs are often not directly referenced and may be inadvertently removed by optimization steps during kernel builds, thus the __bpf_kfunc tag mitigates against this removal by including the __used macro. However, this macro alone does not prevent removal during linking, and may still yield build warnings (e.g. on mips64el): [...] LD vmlinux BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol bpf_verify_pkcs7_signature WARN: resolve_btfids: unresolved symbol bpf_lookup_user_key WARN: resolve_btfids: unresolved symbol bpf_lookup_system_key WARN: resolve_btfids: unresolved symbol bpf_key_put WARN: resolve_btfids: unresolved symbol bpf_iter_task_next WARN: resolve_btfids: unresolved symbol bpf_iter_css_task_new WARN: resolve_btfids: unresolved symbol bpf_get_file_xattr WARN: resolve_btfids: unresolved symbol bpf_ct_insert_entry WARN: resolve_btfids: unresolved symbol bpf_cgroup_release WARN: resolve_btfids: unresolved symbol bpf_cgroup_from_id WARN: resolve_btfids: unresolved symbol bpf_cgroup_acquire WARN: resolve_btfids: unresolved symbol bpf_arena_free_pages NM System.map SORTTAB vmlinux OBJCOPY vmlinux.32 [...] Update the __bpf_kfunc tag to better guard against linker optimization by including the new __retain compiler macro, which fixes the warnings above. Verify the __retain macro with readelf by checking object flags for 'R': $ readelf -Wa kernel/trace/bpf_trace.o Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [...] [178] .text.bpf_key_put PROGBITS 00000000 6420 0050 00 AXR 0 0 8 [...] Key to Flags: [...] R (retain), D (mbind), p (processor specific) Fixes: 57e7c169cd6a ("bpf: Add __bpf_kfunc tag for marking kernel functions as kfuncs") Reported-by: kernel test robot <[email protected]> Signed-off-by: Tony Ambardar <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Yonghong Song <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Link: https://lore.kernel.org/bpf/ZlmGoT9KiYLZd91S@krava/T/ Link: https://lore.kernel.org/bpf/e9c64e9b5c073dabd457ff45128aabcab7630098.1717477560.git.Tony.Ambardar@gmail.com
2024-06-14compiler_types.h: Define __retain for __attribute__((__retain__))Tony Ambardar1-0/+23
Some code includes the __used macro to prevent functions and data from being optimized out. This macro implements __attribute__((__used__)), which operates at the compiler and IR-level, and so still allows a linker to remove objects intended to be kept. Compilers supporting __attribute__((__retain__)) can address this gap by setting the flag SHF_GNU_RETAIN on the section of a function/variable, indicating to the linker the object should be retained. This attribute is available since gcc 11, clang 13, and binutils 2.36. Provide a __retain macro implementing __attribute__((__retain__)), whose first user will be the '__bpf_kfunc' tag. [ Additional remark from discussion: Why is CONFIG_LTO_CLANG added here? The __used macro permits garbage collection at section level, so CLANG_LTO_CLANG without CONFIG_LD_DEAD_CODE_DATA_ELIMINATION should not change final section dynamics? The conditional guard was included to ensure consistent behaviour between __retain and other features forcing split sections. In particular, the same guard is used in vmlinux.lds.h to merge split sections where needed. For example, using __retain in LLVM builds without CONFIG_LTO was failing CI tests on kernel-patches/bpf because the kernel didn't boot properly. And in further testing, the kernel had no issues loading BPF kfunc modules with such split sections, so the module (partial) linking scripts were left alone. ] Signed-off-by: Tony Ambardar <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Cc: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/ZlmGoT9KiYLZd91S@krava/T/ Link: https://lore.kernel.org/bpf/b31bca5a5e6765a0f32cc8c19b1d9cdbfaa822b5.1717477560.git.Tony.Ambardar@gmail.com
2024-06-14Merge tag 'iommu-fix-v6.10-rc3' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fix from Joerg Roedel: "A single patch that fixes a regression which several people reported: - AMD-Vi: Fix regression causing panics" * tag 'iommu-fix-v6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Fix panic accessing amd_iommu_enable_faulting
2024-06-14Merge tag 'pm-6.10-rc4' of ↵Linus Torvalds1-7/+12
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Restore the behavior of the no_turbo sysfs attribute in the intel_pstate driver which allowed users to make the driver start using turbo P-states if they have been enabled on the fly by the firmware after OS initialization (Rafael Wysocki)" * tag 'pm-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Check turbo_is_disabled() in store_no_turbo()
2024-06-14Merge tag 'acpi-6.10-rc4' of ↵Linus Torvalds6-19/+76
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix a recent regression in the ACPI EC driver and make system suspend work on multiple platforms where StorageD3Enable _DSD is missing in the ACPI tables. Specifics: - Make the ACPI EC driver directly evaluate an "orphan" _REG method under the EC device, if present, which stopped being evaluated after the driver had started to install its EC address space handler at the root of the ACPI namespace (Rafael Wysocki) - Make more devices put NVMe storage devices into D3 at suspend to work around missing StorageD3Enable _DSD in the BIOS (Mario Limonciello)" * tag 'acpi-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: EC: Evaluate orphan _REG under EC device ACPI: x86: Force StorageD3Enable on more products
2024-06-14Merge tag 'thermal-6.10-rc4' of ↵Linus Torvalds3-3/+35
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "These fix three issues introduced recently, two related to defects in ACPI tables supplied by the platform firmware and one cause by a thermal core change that went too far: - Prevent the thermal core from failing the registration of a cooling device if its .get_cur_state() reports an incorrect state to start with which may happen for fans handled through firmware-supplied AML in ACPI tables (Rafael Wysocki) - Make the ACPI thermal zone driver initialize all trip points with temperature of 0 centigrade and below as invalid because such trip point temperatures do not make sense on systems with ACPI thermal control and they cause performance regressions due to permanent thermal mitigations to occur (Rafael Wysocki) - Restore passive polling management in the Step-Wise thermal governor that uses it to ensure that all cooling devices used for thermal mitigation will go back to their initial states eventually (Rafael Wysocki)" * tag 'thermal-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: gov_step_wise: Restore passive polling management thermal: ACPI: Invalidate trip points with temperature of 0 or below thermal: core: Do not fail cdev registration because of invalid initial state
2024-06-14thermal: core: Change PM notifier priority to the minimumRafael J. Wysocki1-0/+6
It is reported that commit 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") causes battery data in sysfs on Thinkpad P1 Gen2 to become invalid after a resume from S3 (and it is necessary to reboot the machine to restore correct battery data). Some investigation into the problem indicated that it happened because, after the commit in question, the ACPI battery PM notifier ran in parallel with thermal_zone_device_resume() for one of the thermal zones which apparently confused the platform firmware on the affected system. While the exact reason for the firmware confusion remains unclear, it is arguably not particularly relevant, and the expected behavior of the affected system can be restored by making the thermal PM notifier run at the lowest priority which avoids interference between work items spawned by it and the other PM notifiers (that will run before those work items now). Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218881 Reported-by: [email protected] Tested-by: [email protected] Cc: 6.8+ <[email protected]> # 6.8+ Signed-off-by: Rafael J. Wysocki <[email protected]>