aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-10-28fs/ext4/super.c: remove unused `deprecated_msg'Andrew Morton1-4/+0
fs/ext4/super.c:1744:19: warning: 'deprecated_msg' defined but not used [-Wunused-const-variable=] Reported-by: kernel test robot <[email protected]> Cc: Theodore Ts'o <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-28ipc/msg.c: fix percpu_counter use after freeAndrew Morton1-2/+2
These percpu counters are referenced in free_ipcs->freeque, so destroy them later. Fixes: 72d1e611082e ("ipc/msg: mitigate the lock contention with percpu counter") Reported-by: [email protected] Tested-by: Mark Rutland <[email protected]> Cc: Jiebin Sun <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-28memory tier, sysfs: rename attribute "nodes" to "nodelist"Huang Ying2-6/+6
In sysfs, we use attribute name "cpumap" or "cpus" for cpu mask and "cpulist" or "cpus_list" for cpu list. For example, in my system, $ cat /sys/devices/system/node/node0/cpumap f,ffffffff $ cat /sys/devices/system/cpu/cpu2/topology/core_cpus 0,00100004 $ cat cat /sys/devices/system/node/node0/cpulist 0-35 $ cat /sys/devices/system/cpu/cpu2/topology/core_cpus_list 2,20 It looks reasonable to use "nodemap" for node mask and "nodelist" for node list. So, rename the attribute to follow the naming convention. Link: https://lkml.kernel.org/r/[email protected] Fixes: 9832fb87834e2b ("mm/demotion: expose memory tier details via sysfs") Signed-off-by: "Huang, Ying" <[email protected]> Acked-by: Wei Xu <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Yang Shi <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hesham Almatary <[email protected]> Cc: Jagdish Gediya <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tim Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-28MAINTAINERS: git://github.com -> https://github.com for nilfs2Palmer Dabbelt1-1/+1
Github deprecated the git:// links about a year ago, so let's move to the https:// URLs instead. Link: https://lkml.kernel.org/r/[email protected] Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Reported-by: Conor Dooley <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-28mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loopsWaiman Long1-19/+42
Commit 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan()") adds cond_resched() in the first object iteration loop of kmemleak_scan(). However, it turns that the 2nd objection iteration loop can still cause soft lockup to happen in some cases. So add a cond_resched() call in the 2nd and 3rd loops as well to prevent that and for completeness. Link: https://lkml.kernel.org/r/[email protected] Fixes: 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan()") Signed-off-by: Waiman Long <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Muchun Song <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-28squashfs: fix buffer release race condition in readahead codePhillip Lougher1-2/+3
Fix a buffer release race condition, where the error value was used after release. Link: https://lkml.kernel.org/r/[email protected] Fixes: b09a7a036d20 ("squashfs: support reading fragments in readahead call") Signed-off-by: Phillip Lougher <[email protected]> Tested-by: Bagas Sanjaya <[email protected]> Reported-by: Marc Miltenberger <[email protected]> Cc: Dimitri John Ledkov <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Mirsad Goran Todorovac <[email protected]> Cc: Slade Watkins <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-28squashfs: fix extending readahead beyond end of filePhillip Lougher1-4/+7
The readahead code will try to extend readahead to the entire size of the Squashfs data block. But, it didn't take into account that the last block at the end of the file may not be a whole block. In this case, the code would extend readahead to beyond the end of the file, leaving trailing pages. Fix this by only requesting the expected number of pages. Link: https://lkml.kernel.org/r/[email protected] Fixes: 8fc78b6fe24c ("squashfs: implement readahead") Signed-off-by: Phillip Lougher <[email protected]> Tested-by: Bagas Sanjaya <[email protected]> Reported-by: Marc Miltenberger <[email protected]> Cc: Dimitri John Ledkov <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Mirsad Goran Todorovac <[email protected]> Cc: Slade Watkins <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-28squashfs: fix read regression introduced in readahead codePhillip Lougher3-4/+12
Patch series "squashfs: fix some regressions introduced in the readahead code". This patchset fixes 3 regressions introduced by the recent readahead code changes. The first regression is causing "snaps" to randomly fail after a couple of hours or days, which how the regression came to light. This patch (of 3): If a file isn't a whole multiple of the page size, the last page will have trailing bytes unfilled. There was a mistake in the readahead code which did this. In particular it incorrectly assumed that the last page in the readahead page array (page[nr_pages - 1]) will always contain the last page in the block, which if we're at file end, will be the page that needs to be zero filled. But the readahead code may not return the last page in the block, which means it is unmapped and will be skipped by the decompressors (a temporary buffer used). In this case the zero filling code will zero out the wrong page, leading to data corruption. Fix this by by extending the "page actor" to return the last page if present, or NULL if a temporary buffer was used. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 8fc78b6fe24c ("squashfs: implement readahead") Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Phillip Lougher <[email protected]> Reported-by: Mirsad Goran Todorovac <[email protected]> Tested-by: Mirsad Goran Todorovac <[email protected]> Tested-by: Slade Watkins <[email protected]> Tested-by: Bagas Sanjaya <[email protected]> Reported-by: Marc Miltenberger <[email protected]> Cc: Dimitri John Ledkov <[email protected]> Cc: Hsin-Yi Wang <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20nouveau: fix migrate_to_ram() for faulting pageAlistair Popple1-0/+1
Commit 16ce101db85d ("mm/memory.c: fix race when faulting a device private page") changed the migrate_to_ram() callback to take a reference on the device page to ensure it can't be freed while handling the fault. Unfortunately the corresponding update to Nouveau to accommodate this change was inadvertently dropped from that patch causing GPU to CPU migration to fail so add it here. Link: https://lkml.kernel.org/r/[email protected] Fixes: 16ce101db85d ("mm/memory.c: fix race when faulting a device private page") Signed-off-by: Alistair Popple <[email protected]> Cc: John Hubbard <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Lyude Paul <[email protected]> Cc: Ben Skeggs <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mm/huge_memory: do not clobber swp_entry_t during THP splitMel Gorman1-1/+10
The following has been observed when running stressng mmap since commit b653db77350c ("mm: Clear page->private when splitting or migrating a page") watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546] CPU: 75 PID: 9546 Comm: stress-ng Tainted: G E 6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374 Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016 RIP: 0010:xas_descend+0x28/0x80 Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2 RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246 RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002 RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0 RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0 R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000 R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88 FS: 00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> xas_load+0x3a/0x50 __filemap_get_folio+0x80/0x370 ? put_swap_page+0x163/0x360 pagecache_get_page+0x13/0x90 __try_to_reclaim_swap+0x50/0x190 scan_swap_map_slots+0x31e/0x670 get_swap_pages+0x226/0x3c0 folio_alloc_swap+0x1cc/0x240 add_to_swap+0x14/0x70 shrink_page_list+0x968/0xbc0 reclaim_page_list+0x70/0xf0 reclaim_pages+0xdd/0x120 madvise_cold_or_pageout_pte_range+0x814/0xf30 walk_pgd_range+0x637/0xa30 __walk_page_range+0x142/0x170 walk_page_range+0x146/0x170 madvise_pageout+0xb7/0x280 ? asm_common_interrupt+0x22/0x40 madvise_vma_behavior+0x3b7/0xac0 ? find_vma+0x4a/0x70 ? find_vma+0x64/0x70 ? madvise_vma_anon_name+0x40/0x40 madvise_walk_vmas+0xa6/0x130 do_madvise+0x2f4/0x360 __x64_sys_madvise+0x26/0x30 do_syscall_64+0x5b/0x80 ? do_syscall_64+0x67/0x80 ? syscall_exit_to_user_mode+0x17/0x40 ? do_syscall_64+0x67/0x80 ? syscall_exit_to_user_mode+0x17/0x40 ? do_syscall_64+0x67/0x80 ? do_syscall_64+0x67/0x80 ? common_interrupt+0x8b/0xa0 entry_SYSCALL_64_after_hwframe+0x63/0xcd The problem can be reproduced with the mmtests config config-workload-stressng-mmap. It does not always happen and when it triggers is variable but it has happened on multiple machines. The intent of commit b653db77350c patch was to avoid the case where PG_private is clear but folio->private is not-NULL. However, THP tail pages uses page->private for "swp_entry_t if folio_test_swapcache()" as stated in the documentation for struct folio. This patch only clobbers page->private for tail pages if the head page was not in swapcache and warns once if page->private had an unexpected value. Link: https://lkml.kernel.org/r/[email protected] Fixes: b653db77350c ("mm: Clear page->private when splitting or migrating a page") Signed-off-by: Mel Gorman <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Yang Shi <[email protected]> Cc: Brian Foster <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20hugetlb: fix memory leak associated with vma_lock structureMike Kravetz1-8/+27
The hugetlb vma_lock structure hangs off the vm_private_data pointer of sharable hugetlb vmas. The structure is vma specific and can not be shared between vmas. At fork and various other times, vmas are duplicated via vm_area_dup(). When this happens, the pointer in the newly created vma must be cleared and the structure reallocated. Two hugetlb specific routines deal with this hugetlb_dup_vma_private and hugetlb_vm_op_open. Both routines are called for newly created vmas. hugetlb_dup_vma_private would always clear the pointer and hugetlb_vm_op_open would allocate the new vms_lock structure. This did not work in the case of this calling sequence pointed out in [1]. move_vma copy_vma new_vma = vm_area_dup(vma); new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock. is_vm_hugetlb_page(vma) clear_vma_resv_huge_pages hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL When clearing hugetlb_dup_vma_private we actually leak the associated vma_lock structure. The vma_lock structure contains a pointer to the associated vma. This information can be used in hugetlb_dup_vma_private and hugetlb_vm_op_open to ensure we only clear the vm_private_data of newly created (copied) vmas. In such cases, the vma->vma_lock->vma field will not point to the vma. Update hugetlb_dup_vma_private and hugetlb_vm_op_open to not clear vm_private_data if vma->vma_lock->vma == vma. Also, log a warning if hugetlb_vm_op_open ever encounters the case where vma_lock has already been correctly allocated for the vma. [1] https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 131a79b474e9 ("hugetlb: fix vma lock handling during split vma and range unmapping") Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: James Houghton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Prakash Sangappa <[email protected]> Cc: Sven Schnelle <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mm/page_alloc: reduce potential fragmentation in make_alloc_exact()Liam R. Howlett1-8/+12
Try to avoid using the left over split page on the next request for a page by calling __free_pages_ok() with FPI_TO_TAIL. This increases the potential of defragmenting memory when it's used for a short period of time. Link: https://lkml.kernel.org/r/20220531185626.yvlmymbxyoe5vags@revolver Signed-off-by: Liam R. Howlett <[email protected]> Suggested-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mm: /proc/pid/smaps_rollup: fix maple tree searchHugh Dickins1-1/+1
/proc/pid/smaps_rollup showed 0 kB for everything: now find first vma. Link: https://lkml.kernel.org/r/[email protected] Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end") Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pagesRik van Riel1-1/+1
The h->*_huge_pages counters are protected by the hugetlb_lock, but alloc_huge_page has a corner case where it can decrement the counter outside of the lock. This could lead to a corrupted value of h->resv_huge_pages, which we have observed on our systems. Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a potential race. Link: https://lkml.kernel.org/r/[email protected] Fixes: a88c76954804 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count") Signed-off-by: Rik van Riel <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Glen McCready <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mm/mmap: fix MAP_FIXED address return on VMA mergeLiam Howlett1-8/+7
mmap should return the start address of newly mapped area when successful. On a successful merge of a VMA, the return address was changed and thus was violating that expectation from userspace. This is a restoration of functionality provided by 309d08d9b3a3 (mm/mmap.c: fix mmap return value when vma is merged after call_mmap()). For completeness of fixing MAP_FIXED, implement the comments from the previous discussion to never update the address and fail if the address changes. Leaving the error as a WARN_ON() to avoid crashing the kernel. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/Y06yk66SKxlrwwfb@lakrids/ Link: https://lore.kernel.org/all/[email protected]/ Fixes: 4dd1b84140c1 ("mm/mmap: use advanced maple tree API for mmap_region()") Signed-off-by: Liam R. Howlett <[email protected]> Reported-by: Mark Rutland <[email protected]> Cc: Liu Zixian <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mm/mmap.c: __vma_adjust(): suppress uninitialized var warningAndrew Morton1-1/+2
The code is OK, but it fools gcc. mm/mmap.c:802 __vma_adjust() error: uninitialized symbol 'next_next'. Fixes: 524e00b36e8c5 ("mm: remove rb tree.") Reported-by: kernel test robot <[email protected]> Cc: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mm/mmap: undo ->mmap() when mas_preallocate() failsMike Kravetz1-1/+1
A memory leak in hugetlb_reserve_pages was reported in [1]. The root cause was traced to an error path in mmap_region when mas_preallocate() fails. In this case, the vma is freed after a successful call to filesystem specific mmap. The hugetlbfs mmap routine may allocate data structures pointed to by m_private_data. These need to be cleaned up by the hugetlb vm_ops->close() routine. The same issue was addressed by commit deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") for the arch_validate_flags() test. Go to the same close_and_free_vma label if mas_preallocate() fails. [1] https://lore.kernel.org/linux-mm/CAKXUXMxf7OiCwbxib7MwfR4M1b5+b3cNTU7n5NV9Zm4967=FPQ@mail.gmail.com/ Link: https://lkml.kernel.org/r/[email protected] Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree") Signed-off-by: Mike Kravetz <[email protected]> Reported-by: Lukas Bulwahn <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Carlos Llamas <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20init: Kconfig: fix spelling mistake "satify" -> "satisfy"Colin Ian King1-1/+1
There is a spelling mistake in a Kconfig description. Fix it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20ocfs2: clear dinode links count in case of errorJoseph Qi1-2/+10
In ocfs2_mknod(), if error occurs after dinode successfully allocated, ocfs2 i_links_count will not be 0. So even though we clear inode i_nlink before iput in error handling, it still won't wipe inode since we'll refresh inode from dinode during inode lock. So just like clear inode i_nlink, we clear ocfs2 i_links_count as well. Also do the same change for ocfs2_symlink(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joseph Qi <[email protected]> Reported-by: Yan Wang <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20ocfs2: fix BUG when iput after ocfs2_mknod failsJoseph Qi1-10/+1
Commit b1529a41f777 "ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed inode if __ocfs2_mknod_locked() fails later. But this introduce a race, the freed bit may be reused immediately by another thread, which will update dinode, e.g. i_generation. Then iput this inode will lead to BUG: inode->i_generation != le32_to_cpu(fe->i_generation) We could make this inode as bad, but we did want to do operations like wipe in some cases. Since the claimed inode bit can only affect that an dinode is missing and will return back after fsck, it seems not a big problem. So just leave it as is by revert the reclaim logic. Link: https://lkml.kernel.org/r/[email protected] Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error") Signed-off-by: Joseph Qi <[email protected]> Reported-by: Yan Wang <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20gcov: support GCC 12.1 and newer compilersMartin Liska1-2/+16
Starting with GCC 12.1, the created .gcda format can't be read by gcov tool. There are 2 significant changes to the .gcda file format that need to be supported: a) [gcov: Use system IO buffering] (23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05) changed that all sizes in the format are in bytes and not in words (4B) b) [gcov: make profile merging smarter] (72e0c742bd01f8e7e6dcca64042b9ad7e75979de) add a new checksum to the file header. Tested with GCC 7.5, 10.4, 12.2 and the current master. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Martin Liska <[email protected]> Tested-by: Peter Oberparleiter <[email protected]> Reviewed-by: Peter Oberparleiter <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20zsmalloc: zs_destroy_pool: add size_class NULL checkAlexey Romanov1-0/+3
Inside the zs_destroy_pool() function, there can still be NULL size_class pointers: if when the next size_class is allocated, inside zs_create_pool() function, kzalloc will return NULL and handling the error condition, zs_create_pool() will call zs_destroy_pool(). Link: https://lkml.kernel.org/r/[email protected] Fixes: f24263a5a076 ("zsmalloc: remove unnecessary size_class NULL check") Signed-off-by: Alexey Romanov <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mm/mempolicy: fix mbind_range() arguments to vma_merge()Liam Howlett1-6/+11
Fuzzing produced an invalid argument to vma_merge() which was caught by the newly added verification of the number of VMAs being removed on process exit. Analyzing the failure eventually resulted in finding an issue with the search of a VMA that started at address 0, which caused an underflow and thus the loss of many VMAs being tracked in the tree. Fix the underflow by changing the search of the maple tree to use the start address directly. Link: https://lkml.kernel.org/r/[email protected] Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list") Signed-off-by: Liam R. Howlett <[email protected]> Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mailmap: update email for Qais YousefQais Yousef1-1/+2
Update my email address for old entry and add a new entry for my contribution while working with arm to continue support that work. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qais Yousef <[email protected]> Acked-by: Qais Yousef <[email protected]> Acked-by: Qais Yousef <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-20mailmap: update Dan Carpenter's email addressDan Carpenter1-0/+1
My time at Oracle is ending at the end of the month. Update my email address accordingly. Link: https://lkml.kernel.org/r/Y0a+6+5SHMdvUnpg@kili Signed-off-by: Dan Carpenter <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-16Merge branch 'master' into mm-hotfixes-stableAndrew Morton11857-196520/+597403
2022-10-16Merge branch 'master' into mm-hotfixes-stableAndrew Morton1107-7247/+10722
2022-10-16Linux 6.1-rc1Linus Torvalds1-2/+2
2022-10-16Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds185-421/+378
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
2022-10-16Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of ↵Linus Torvalds36-71/+1265
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tools updates from Arnaldo Carvalho de Melo: - Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels when using bperf (perf BPF based counters) with cgroups. - Support HiSilicon PCIe Performance Monitoring Unit (PMU), that monitors bandwidth, latency, bus utilization and buffer occupancy. Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst. - User space tasks can migrate between CPUs, so when tracing selected CPUs, system-wide sideband is still needed, fix it in the setup of Intel PT on hybrid systems. - Fix metricgroups title message in 'perf list', it should state that the metrics groups are to be used with the '-M' option, not '-e'. - Sync the msr-index.h copy with the kernel sources, adding support for using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well as decoding it when printing the MSR tracepoint arguments. - Fix program header size and alignment when generating a JIT ELF in 'perf inject'. - Add multiple new Intel PT 'perf test' entries, including a jitdump one. - Fix the 'perf test' entries for 'perf stat' CSV and JSON output when running on PowerPC due to an invalid topology number in that arch. - Fix the 'perf test' for arm_coresight failures on the ARM Juno system. - Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this option to the or expression expected in the intercepted perf_event_open() syscall. - Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the 'perf annotate' asm parser. - Fix 'perf mem record -C' option processing, it was being chopped up when preparing the underlying 'perf record -e mem-events' and thus being ignored, requiring using '-- -C CPUs' as a workaround. - Improvements and tidy ups for 'perf test' shell infra. - Fix Intel PT information printing segfault in uClibc, where a NULL format was being passed to fprintf. * tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits) tools arch x86: Sync the msr-index.h copy with the kernel sources perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver perf auxtrace arm: Refactor event list iteration in auxtrace_record__init() perf tests stat+json_output: Include sanity check for topology perf tests stat+csv_output: Include sanity check for topology perf intel-pt: Fix system_wide dummy event for hybrid perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc perf test: Fix attr tests for PERF_FORMAT_LOST perf test: test_intel_pt.sh: Add 9 tests perf inject: Fix GEN_ELF_TEXT_OFFSET for jit perf test: test_intel_pt.sh: Add jitdump test perf test: test_intel_pt.sh: Tidy some alignment perf test: test_intel_pt.sh: Print a message when skipping kernel tracing perf test: test_intel_pt.sh: Tidy some perf record options perf test: test_intel_pt.sh: Fix return checking again perf: Skip and warn on unknown format 'configN' attrs perf list: Fix metricgroups title message perf mem: Fix -C option behavior for perf mem record perf annotate: Add missing condition flags for arm64 ...
2022-10-16Merge tag 'kbuild-fixes-v6.1' of ↵Linus Torvalds6-11/+18
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the combination of Clang >= 14 and GAS <= 2.35. - Drop vmlinux.bz2 from the rpm package as it just annoyingly increased the package size. - Fix modpost error under build environments using musl. - Make *.ll files keep value names for easier debugging - Fix single directory build - Prevent RISC-V from selecting the broken DWARF5 support when Clang and GAS are used together. * tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5 kbuild: fix single directory build kbuild: add -fno-discard-value-names to cmd_cc_ll_c scripts/clang-tools: Convert clang-tidy args to list modpost: put modpost options before argument kbuild: Stop including vmlinux.bz2 in the rpm's Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
2022-10-16Merge tag 'clk-for-linus' of ↵Linus Torvalds18-129/+1831
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull more clk updates from Stephen Boyd: "This is the final part of the clk patches for this merge window. The clk rate range series needed another week to fully bake. Maxime fixed the bug that broke clk notifiers and prevented this from being included in the first pull request. He also added a unit test on top to make sure it doesn't break so easily again. The majority of the series fixes up how the clk_set_rate_*() APIs work, particularly around when the rate constraints are dropped and how they move around when reparenting clks. Overall it's a much needed improvement to the clk rate range APIs that used to be pretty broken if you looked sideways. Beyond the core changes there are a few driver fixes for a compilation issue or improper data causing clks to fail to register or have the wrong parents. These are good to get in before the first -rc so that the system actually boots on the affected devices" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits) clk: tegra: Fix Tegra PWM parent clock clk: at91: fix the build with binutils 2.27 clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks clk: mediatek: clk-mux: Add .determine_rate() callback clk: tests: Add tests for notifiers clk: Update req_rate on __clk_recalc_rates() clk: tests: Add missing test case for ranges clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d clk: Introduce the clk_hw_get_rate_range function clk: Zero the clk_rate_request structure clk: Stop forwarding clk_rate_requests to the parent clk: Constify clk_has_parent() clk: Introduce clk_core_has_parent() clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock clk: Add our request boundaries in clk_core_init_rate_req clk: Introduce clk_hw_init_rate_request() clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller clk: Change clk_core_init_rate_req prototype clk: Set req_rate on reparenting clk: Take into account uncached clocks in clk_set_rate_range() ...
2022-10-16Merge tag '6.1-rc-smb3-client-fixes-part2' of ↵Linus Torvalds23-726/+922
git://git.samba.org/sfrench/cifs-2.6 Pull more cifs updates from Steve French: - fix a regression in guest mounts to old servers - improvements to directory leasing (caching directory entries safely beyond the root directory) - symlink improvement (reducing roundtrips needed to process symlinks) - an lseek fix (to problem where some dir entries could be skipped) - improved ioctl for returning more detailed information on directory change notifications - clarify multichannel interface query warning - cleanup fix (for better aligning buffers using ALIGN and round_up) - a compounding fix - fix some uninitialized variable bugs found by Coverity and the kernel test robot * tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: improve SMB3 change notification support cifs: lease key is uninitialized in two additional functions when smb1 cifs: lease key is uninitialized in smb1 paths smb3: must initialize two ACL struct fields to zero cifs: fix double-fault crash during ntlmssp cifs: fix static checker warning cifs: use ALIGN() and round_up() macros cifs: find and use the dentry for cached non-root directories also cifs: enable caching of directories for which a lease is held cifs: prevent copying past input buffer boundaries cifs: fix uninitialised var in smb2_compound_op() cifs: improve symlink handling for smb2+ smb3: clarify multichannel warning cifs: fix regression in very old smb1 mounts cifs: fix skipping to incorrect offset in emit_cached_dirents
2022-10-16Revert "cpumask: fix checking valid cpu range".Tetsuo Handa1-8/+11
This reverts commit 78e5a3399421 ("cpumask: fix checking valid cpu range"). syzbot is hitting WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at cpu_max_bits_warn() [1], for commit 78e5a3399421 ("cpumask: fix checking valid cpu range") is broken. Obviously that patch hits WARN_ON_ONCE() when e.g. reading /proc/cpuinfo because passing "cpu + 1" instead of "cpu" will trivially hit cpu == nr_cpumask_bits condition. Although syzbot found this problem in linux-next.git on 2022/09/27 [2], this problem was not fixed immediately. As a result, that patch was sent to linux.git before the patch author recognizes this problem, and syzbot started failing to test changes in linux.git since 2022/10/10 [3]. Andrew Jones proposed a fix for x86 and riscv architectures [4]. But [2] and [5] indicate that affected locations are not limited to arch code. More delay before we find and fix affected locations, less tested kernel (and more difficult to bisect and fix) before release. We should have inspected and fixed basically all cpumask users before applying that patch. We should not crash kernels in order to ask existing cpumask users to update their code, even if limited to CONFIG_DEBUG_PER_CPU_MAPS=y case. Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1] Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2] Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3] Link: https://lkml.kernel.org/r/[email protected] [4] Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5] Reported-by: Andrew Jones <[email protected]> Reported-by: [email protected] Signed-off-by: Tetsuo Handa <[email protected]> Cc: Yury Norov <[email protected]> Cc: Borislav Petkov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-10-17lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5Nathan Chancellor1-2/+7
When building with a RISC-V kernel with DWARF5 debug info using clang and the GNU assembler, several instances of the following error appear: /tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported Dumping the .s file reveals these .uleb128 directives come from .debug_loc and .debug_ranges: .Ldebug_loc0: .byte 4 # DW_LLE_offset_pair .uleb128 .Lfunc_begin0-.Lfunc_begin0 # starting offset .uleb128 .Ltmp1-.Lfunc_begin0 # ending offset .byte 1 # Loc expr size .byte 90 # DW_OP_reg10 .byte 0 # DW_LLE_end_of_list .Ldebug_ranges0: .byte 4 # DW_RLE_offset_pair .uleb128 .Ltmp6-.Lfunc_begin0 # starting offset .uleb128 .Ltmp27-.Lfunc_begin0 # ending offset .byte 4 # DW_RLE_offset_pair .uleb128 .Ltmp28-.Lfunc_begin0 # starting offset .uleb128 .Ltmp30-.Lfunc_begin0 # ending offset .byte 0 # DW_RLE_end_of_list There is an outstanding binutils issue to support a non-constant operand to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to be any movement on it, due to concerns over how it would work with linker relaxation. To avoid these build errors, prevent DWARF5 from being selected when using clang and an assembler that does not have support for these symbol deltas, which can be easily checked in Kconfig with as-instr plus the small test program from the dwz test suite from the binutils issue. Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215 Link: https://github.com/ClangBuiltLinux/linux/issues/1719 Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2022-10-17kbuild: fix single directory buildMasahiro Yamada1-0/+2
Commit f110e5a250e3 ("kbuild: refactor single builds of *.ko") was wrong. KBUILD_MODULES _is_ needed for single builds. Otherwise, "make foo/bar/baz/" does not build module objects at all. Fixes: f110e5a250e3 ("kbuild: refactor single builds of *.ko") Reported-by: David Sterba <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> Tested-by: David Sterba <[email protected]>
2022-10-15Merge tag 'slab-for-6.1-rc1-hotfix' of ↵Linus Torvalds2-19/+19
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab hotfix from Vlastimil Babka: "A single fix for the common-kmalloc series, for warnings on mips and sparc64 reported by Guenter Roeck" * tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
2022-10-15Merge tag 'for-linus' of https://github.com/openrisc/linuxLinus Torvalds2-9/+9
Pull OpenRISC updates from Stafford Horne: "I have relocated to London so not much work from me while I get settled. Still, OpenRISC picked up two patches in this window: - Fix for kernel page table walking from Jann Horn - MAINTAINER entry cleanup from Palmer Dabbelt" * tag 'for-linus' of https://github.com/openrisc/linux: MAINTAINERS: git://github -> https://github.com for openrisc openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
2022-10-15Merge tag 'pci-v6.1-fixes-1' of ↵Linus Torvalds1-61/+1
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fix from Bjorn Helgaas: "Revert the attempt to distribute spare resources to unconfigured hotplug bridges at boot time. This fixed some dock hot-add scenarios, but Jonathan Cameron reported that it broke a topology with a multi-function device where one function was a Switch Upstream Port and the other was an Endpoint" * tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: Revert "PCI: Distribute available resources for root buses, too"
2022-10-15mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocationHyeonggon Yoo2-19/+19
After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2) requests to buddy like SLUB does. SLAB has been using kmalloc caches to allocate freelist_idx_t array for off slab caches. But after the commit, freelist_size can be bigger than KMALLOC_MAX_CACHE_SIZE. Instead of using pointer to kmalloc cache, use kmalloc_node() and only check if the kmalloc cache is off slab during calculate_slab_order(). If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens as it allocates freelist_idx_t array directly from buddy. Link: https://lore.kernel.org/all/[email protected]/ Reported-and-tested-by: Guenter Roeck <[email protected]> Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator") Signed-off-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-10-15MAINTAINERS: git://github -> https://github.com for openriscPalmer Dabbelt1-1/+1
Github deprecated the git:// links about a year ago, so let's move to the https:// URLs instead. Reported-by: Conor Dooley <[email protected]> Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/ Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Stafford Horne <[email protected]>
2022-10-15smb3: improve SMB3 change notification supportSteve French6-13/+90
Change notification is a commonly supported feature by most servers, but the current ioctl to request notification when a directory is changed does not return the information about what changed (even though it is returned by the server in the SMB3 change notify response), it simply returns when there is a change. This ioctl improves upon CIFS_IOC_NOTIFY by returning the notify information structure which includes the name of the file(s) that changed and why. See MS-SMB2 2.2.35 for details on the individual filter flags and the file_notify_information structure returned. To use this simply pass in the following (with enough space to fit at least one file_notify_information structure) struct __attribute__((__packed__)) smb3_notify { uint32_t completion_filter; bool watch_tree; uint32_t data_len; uint8_t data[]; } __packed; using CIFS_IOC_NOTIFY_INFO 0xc009cf0b or equivalently _IOWR(CIFS_IOCTL_MAGIC, 11, struct smb3_notify_info) The ioctl will block until the server detects a change to that directory or its subdirectories (if watch_tree is set). Acked-by: Paulo Alcantara (SUSE) <[email protected]> Acked-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]>
2022-10-15cifs: lease key is uninitialized in two additional functions when smb1Steve French1-2/+2
cifs_open and _cifsFileInfo_put also end up with lease_key uninitialized in smb1 mounts. It is cleaner to set lease key to zero in these places where leases are not supported (smb1 can not return lease keys so the field was uninitialized). Addresses-Coverity: 1514207 ("Uninitialized scalar variable") Addresses-Coverity: 1514331 ("Uninitialized scalar variable") Reviewed-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Steve French <[email protected]>
2022-10-15cifs: lease key is uninitialized in smb1 pathsSteve French1-1/+1
It is cleaner to set lease key to zero in the places where leases are not supported (smb1 can not return lease keys so the field was uninitialized). Addresses-Coverity: 1513994 ("Uninitialized scalar variable") Reviewed-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Steve French <[email protected]>
2022-10-15smb3: must initialize two ACL struct fields to zeroSteve French1-1/+2
Coverity spotted that we were not initalizing Stbz1 and Stbz2 to zero in create_sd_buf. Addresses-Coverity: 1513848 ("Uninitialized scalar variable") Cc: <[email protected]> Reviewed-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Steve French <[email protected]>
2022-10-15cifs: fix double-fault crash during ntlmsspPaulo Alcantara1-7/+9
The crash occurred because we were calling memzero_explicit() on an already freed sess_data::iov[1] (ntlmsspblob) in sess_free_buffer(). Fix this by not calling memzero_explicit() on sess_data::iov[1] as it's already by handled by callers. Fixes: a4e430c8c8ba ("cifs: replace kfree() with kfree_sensitive() for sensitive data") Reviewed-by: Enzo Matsumiya <[email protected]> Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Steve French <[email protected]>
2022-10-15tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo1-0/+18
To pick up the changes in: b8d1d163604bd1e6 ("x86/apic: Don't disable x2APIC if locked") ca5b7c0d9621702e ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") Addressing these tools/perf build warnings: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' That makes the beautification scripts to pick some new entries: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2022-10-14 18:06:34.294561729 -0300 +++ after 2022-10-14 18:06:41.285744044 -0300 @@ -264,6 +264,7 @@ [0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE", [0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX", [0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO", + [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT", [0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG", [0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS", [0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL", $ Now one can trace systemwide asking to see backtraces to where that MSR is being read/written, see this example with a previous update: # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" ^C# If we use -v (verbose mode) we can see what it does behind the scenes: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" Using CPUID AuthenticAMD-25-21-0 0x6a0 0x6a8 New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) 0x6a0 0x6a8 New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) mmap size 528384B ^C# Example with a frequent msr: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2 Using CPUID AuthenticAMD-25-21-0 0x48 New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) 0x48 New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) mmap size 528384B Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols 0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) futex_wait_queue_me ([kernel.kallsyms]) futex_wait ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so) 0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule_idle ([kernel.kallsyms]) do_idle ([kernel.kallsyms]) cpu_startup_entry ([kernel.kallsyms]) secondary_startup_64_no_verify ([kernel.kallsyms]) # Cc: Adrian Hunter <[email protected]> Cc: Daniel Sneddon <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sandipan Das <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-15perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packetQi Liu7-0/+396
Add support for using 'perf report --dump-raw-trace' to parse PTT packet. Example usage: Output will contain raw PTT data and its textual representation, such as (8DW format): 0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000 offset: 0 ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0 . . ... HISI PTT data: size 4194304 bytes . 00000000: 00 00 00 00 Prefix . 00000004: 08 20 00 60 Header DW0 . 00000008: ff 02 00 01 Header DW1 . 0000000c: 20 08 00 00 Header DW2 . 00000010: 10 e7 44 ab Header DW3 . 00000014: 2a a8 1e 01 Time . 00000020: 00 00 00 00 Prefix . 00000024: 01 00 00 60 Header DW0 . 00000028: 0f 1e 00 01 Header DW1 . 0000002c: 04 00 00 00 Header DW2 . 00000030: 40 00 81 02 Header DW3 . 00000034: ee 02 00 00 Time .... This patch only add basic parsing support according to the definition of the PTT packet described in Documentation/trace/hisi-ptt.rst. And the fields of each packet can be further decoded following the PCIe Spec's definition of TLP packet. Signed-off-by: Qi Liu <[email protected]> Signed-off-by: Yicong Yang <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: John Garry <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Leo Yan <[email protected]> Cc: Lorenzo Pieralisi <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qi Liu <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shaokun Zhang <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: Zeng Prime <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-15perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driverQi Liu7-1/+273
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the PCIe link's events, and trace the TLP headers). This patch add support for PTT device in perf tool, so users could use 'perf record' to get TLP headers trace data. Reviewed-by: Leo Yan <[email protected]> Signed-off-by: Qi Liu <[email protected]> Signed-off-by: Yicong Yang <[email protected]> Acked-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Lorenzo Pieralisi <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qi Liu <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shaokun Zhang <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: Zeng Prime <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-15perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()Qi Liu1-19/+34
Add find_pmu_for_event() and use to simplify logic in auxtrace_record_init(). find_pmu_for_event() will be reused in subsequent patches. Reviewed-by: John Garry <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Leo Yan <[email protected]> Signed-off-by: Qi Liu <[email protected]> Signed-off-by: Yicong Yang <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Lorenzo Pieralisi <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qi Liu <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shaokun Zhang <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: Zeng Prime <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>