Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently, it used old rss size to get current tc mode. As a result, the
rss size is updated, but the tc mode is still configured based on the old
rss size.
So this patch fixes it by using the new rss size in both process.
Fixes: 93969dc14fcd ("net: hns3: refactor VF rss init APIs with new common rss init APIs")
Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: Hao Lan <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently the nfsd-client shrinker is registered and unregistered at
the time the nfsd module is loaded and unloaded. The problem with this
is the shrinker is being registered before all of the relevant fields
in nfsd_net are initialized when nfsd is started. This can lead to an
oops when memory is low and the shrinker is called while nfsd is not
running.
This patch moves the register/unregister of nfsd-client shrinker from
module load/unload time to nfsd startup/shutdown time.
Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
Reported-by: Mike Galbraith <[email protected]>
Signed-off-by: Dai Ngo <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
If signal_pending() returns true, schedule_timeout() will not be executed,
causing the waiting task to remain in the wait queue.
Fixed by adding a call to finish_wait(), which ensures that the waiting
task will always be removed from the wait queue.
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Signed-off-by: Xingyuan Mo <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
Fix the help text for the PRINTK_SAFE_LOG_BUF_SHIFT setting.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lizzy Fleckenstein <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
During the maple tree conversion of nommu, an error in counting the VMAs
was introduced by counting the existing VMA again. The counting used to
be decremented by one and incremented by two, but now it only increments
by two. Fix the counting error by moving the increment outside the
setup_vma_to_mm() function to the callers.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8220543df148 ("nommu: remove uses of VMA linked list")
Signed-off-by: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When removing a VMA from the tree fails due to no memory, do not free the
VMA since a reference still exists.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8220543df148 ("nommu: remove uses of VMA linked list")
Signed-off-by: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The preallocation of the maple tree nodes may leak if the error path to
"error_just_free" is taken. Fix this by moving the freeing of the maple
tree nodes to a shared location for all error paths.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8220543df148 ("nommu: remove uses of VMA linked list")
Signed-off-by: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Update the email address for Robert's maintainer entries and fill in
.mailmap accordingly.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Robert Foss <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Vasily Averin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
vsyscall detection code uses direct call to the beginning of
the vsyscall page:
asm ("call %P0" :: "i" (0xffffffffff600000))
It generates "call rel32" instruction but it is not relocated if binary
is PIE, so binary segfaults into random userspace address and vsyscall
page status is detected incorrectly.
Do more direct:
asm ("call *%rax")
which doesn't do need any relocaltions.
Mark g_vsyscall as volatile for a good measure, I didn't find instruction
setting it to 0. Now the code is obviously correct:
xor eax, eax
mov rdi, rbp
mov rsi, rbp
mov DWORD PTR [rip+0x2d15], eax # g_vsyscall = 0
mov rax, 0xffffffffff600000
call rax
mov DWORD PTR [rip+0x2d02], 1 # g_vsyscall = 1
mov eax, DWORD PTR ds:0xffffffffff600000
mov DWORD PTR [rip+0x2cf1], 2 # g_vsyscall = 2
mov edi, [rip+0x2ceb] # exit(g_vsyscall)
call exit
Note: fixed proc-empty-vm test oopses 5.19.0-28-generic kernel
but this is separate story.
Link: https://lkml.kernel.org/r/Y7h2xvzKLg36DSq8@p183
Fixes: 5bc73bb3451b9 ("proc: test how it holds up with mapping'less process")
Signed-off-by: Alexey Dobriyan <[email protected]>
Reported-by: Mirsad Goran Todorovac <[email protected]>
Tested-by: Mirsad Goran Todorovac <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The rename from mm->mmap_sem to mm->mmap_lock was performed in commit
da1c55f1b272 ("mmap locking API: rename mmap_sem to mmap_lock") and commit
c1e8d7c6a7a6 ("map locking API: convert mmap_sem comments"), however some
incorrect comments remain.
This patch simply corrects those comments which are obviously incorrect
within mm itself.
Link: https://lkml.kernel.org/r/33fba04389ab63fc4980e7ba5442f521df6dc657.1673048927.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit 449c796768c9 ("mm: teach release_pages() to take an array of
encoded page pointers too") added the kernel doc comment for
release_pages() on top of 'union release_pages_arg', so making 'make
htmldocs' complains as below:
./include/linux/mm.h:1268: warning: cannot understand function prototype: 'typedef union '
The kernel doc comment for the function is already on top of the
function's definition in mm/swap.c, and the new comment is actually not
for the function but indeed release_pages_arg. Fixing the comment to
reflect the intent would be one option. But, kernel doc cannot parse
the union as below due to the attribute.
./include/linux/mm.h:1272: error: Cannot parse struct or union!
Modify the comment to reflect the intent but do not mark it as a kernel
doc comment.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 449c796768c9 ("mm: teach release_pages() to take an array of encoded page pointers too")
Signed-off-by: SeongJae Park <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Don't use kernel-doc "/**" notation for non-kernel-doc comments.
Prevents a kernel-doc warning:
lib/win_minmax.c:31: warning: expecting prototype for lib/minmax.c(). Prototype was for minmax_subwin_update() instead
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: David S. Miller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Mark kasan_kunit_executing as static, as it is only used within
mm/kasan/report.c.
Link: https://lkml.kernel.org/r/f64778a4683b16a73bba72576f73bf4a2b45a82f.1672794398.git.andreyknvl@google.com
Fixes: c8c7016f50c8 ("kasan: fail non-kasan KUnit tests on KASAN reports")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
If nilfs2 reads a corrupted disk image and tries to reads a b-tree node
block by calling __nilfs_btree_get_block() against an invalid virtual
block address, it returns -ENOENT because conversion of the virtual block
address to a disk block address fails. However, this return value is the
same as the internal code that b-tree lookup routines return to indicate
that the block being searched does not exist, so functions that operate on
that b-tree may misbehave.
When nilfs_btree_insert() receives this spurious 'not found' code from
nilfs_btree_do_lookup(), it misunderstands that the 'not found' check was
successful and continues the insert operation using incomplete lookup path
data, causing the following crash:
general protection fault, probably for non-canonical address
0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
...
RIP: 0010:nilfs_btree_get_nonroot_node fs/nilfs2/btree.c:418 [inline]
RIP: 0010:nilfs_btree_prepare_insert fs/nilfs2/btree.c:1077 [inline]
RIP: 0010:nilfs_btree_insert+0x6d3/0x1c10 fs/nilfs2/btree.c:1238
Code: bc 24 80 00 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 28 00 74 08 4c 89
ff e8 4b 02 92 fe 4d 8b 3f 49 83 c7 28 4c 89 f8 48 c1 e8 03 <42> 80 3c
28 00 74 08 4c 89 ff e8 2e 02 92 fe 4d 8b 3f 49 83 c7 02
...
Call Trace:
<TASK>
nilfs_bmap_do_insert fs/nilfs2/bmap.c:121 [inline]
nilfs_bmap_insert+0x20d/0x360 fs/nilfs2/bmap.c:147
nilfs_get_block+0x414/0x8d0 fs/nilfs2/inode.c:101
__block_write_begin_int+0x54c/0x1a80 fs/buffer.c:1991
__block_write_begin fs/buffer.c:2041 [inline]
block_write_begin+0x93/0x1e0 fs/buffer.c:2102
nilfs_write_begin+0x9c/0x110 fs/nilfs2/inode.c:261
generic_perform_write+0x2e4/0x5e0 mm/filemap.c:3772
__generic_file_write_iter+0x176/0x400 mm/filemap.c:3900
generic_file_write_iter+0xab/0x310 mm/filemap.c:3932
call_write_iter include/linux/fs.h:2186 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x7dc/0xc50 fs/read_write.c:584
ksys_write+0x177/0x2a0 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
...
</TASK>
This patch fixes the root cause of this problem by replacing the error
code that __nilfs_btree_get_block() returns on block address conversion
failure from -ENOENT to another internal code -EINVAL which means that the
b-tree metadata is corrupted.
By returning -EINVAL, it propagates without glitches, and for all relevant
b-tree operations, functions in the upper bmap layer output an error
message indicating corrupted b-tree metadata via
nilfs_bmap_convert_error(), and code -EIO will be eventually returned as
it should be.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Writeback has been implemented for zsmalloc, so this warning no longer
holds.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9997bc017549a ("zsmalloc: implement writeback mechanism for zsmalloc")
Suggested-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Nhat Pham <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Userfaultfd-wp uses pte markers to mark wr-protected pages for both shmem
and hugetlb. Shmem has pre-allocation ready for markers, but hugetlb path
was overlooked.
Doing so by calling huge_pte_alloc() if the initial pgtable walk fails to
find the huge ptep. It's possible that huge_pte_alloc() can fail with
high memory pressure, in that case stop the loop immediately and fail
silently. This is not the most ideal solution but it matches with what we
do with shmem meanwhile it avoids the splat in dmesg.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 60dfaad65aa9 ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: James Houghton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: James Houghton <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: <[email protected]> [5.19+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
PMD sharing can only be done in PUD_SIZE-aligned pieces of VMAs; however,
it is possible that HugeTLB VMAs are split without unsharing the PMDs
first.
Without this fix, it is possible to hit the uffd-wp-related WARN_ON_ONCE
in hugetlb_change_protection [1]. The key there is that
hugetlb_unshare_all_pmds will not attempt to unshare PMDs in
non-PUD_SIZE-aligned sections of the VMA.
It might seem ideal to unshare in hugetlb_vm_op_open, but we need to
unshare in both the new and old VMAs, so unsharing in hugetlb_vm_op_split
seems natural.
[1]: https://lore.kernel.org/linux-mm/CADrL8HVeOkj0QH5VZZbRzybNE8CG-tEGFshnA+bG9nMgcWtBSg@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
Signed-off-by: James Houghton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
free_anon_vma_name() is missing a check for anonymous shmem VMA which
leads to a memory leak due to refcount not being dropped. Fix this by
calling anon_vma_name_put() unconditionally. It will free vma->anon_name
whenever it's non-NULL.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: d09e8ca6cb93 ("mm: anonymous shared memory naming")
Signed-off-by: Suren Baghdasaryan <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reported-by: [email protected]
Cc: Hugh Dickins <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
SHMEM_HUGE_DENY is for emergency use by the admin, to disable allocation
of shmem huge pages if, for example, a dangerous bug is found in their
usage: see "deny" in Documentation/mm/transhuge.rst. An app using
madvise(,,MADV_COLLAPSE) should not be allowed to override it: restore its
precedence over shmem_huge_force.
Restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7c6c6cc4d3a2 ("mm/shmem: add flag to enforce shmem THP in hugepage_vma_check()")
Signed-off-by: Zach O'Keefe <[email protected]>
Suggested-by: Hugh Dickins <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained within the bounds supplied
by the user.
At the top of each hugepage iteration we (re)lock mmap_lock and
(re)validate the VMA for eligibility and update variables that might have
changed while mmap_lock was dropped. One thing that might occur is that
the VMA could be resized, and as such, we refetch vma->vm_end to make sure
we don't collapse past the end of the VMA's new end.
However, it's possible that when refetching vma->vm_end that we expand the
region acted on by MADV_COLLAPSE if vma->vm_end is greater than size+len
supplied by the user.
The consequence here is that we may attempt to collapse more memory than
requested, possibly yielding either "too much success" or "false failure"
user-visible results. An example of the former is if we MADV_COLLAPSE the
first 4MiB of a 2TiB mmap()'d file, the incorrect refetch would cause the
operation to block for much longer than anticipated as we attempt to
collapse the entire TiB region. An example of the latter is that applying
MADV_COLLPSE to a 4MiB file mapped to the start of a 6MiB VMA will
successfully collapse the first 4MiB, then incorrectly attempt to collapse
the last hugepage-aligned/sized region -- fail (since readahead/page cache
lookup will fail) -- and report a failure to the user.
I don't believe there is a kernel stability concern here as we always
(re)validate the VMA / region accordingly. Also as Hugh mentions, the
user-visible effects are: we try to collapse more memory than requested
by the user, and/or failing an operation that should have otherwise
succeeded. An example is trying to collapse a 4MiB file contained
within a 12MiB VMA.
Don't expand the acted-on region when refetching vma->vm_end.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4d24de9425f7 ("mm: MADV_COLLAPSE: refetch vm_end after reacquiring mmap_lock")
Signed-off-by: Zach O'Keefe <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Currently, we don't enable writenotify when enabling userfaultfd-wp on a
shared writable mapping (for now only shmem and hugetlb). The consequence
is that vma->vm_page_prot will still include write permissions, to be set
as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
page migration, ...).
So far, vma->vm_page_prot is assumed to be a safe default, meaning that we
only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
wrprotect). For example, when enabling softdirty tracking, we enable
writenotify. With uffd-wp on shared mappings, that changed. More details
on vma->vm_page_prot semantics were summarized in [1].
This is problematic for uffd-wp: we'd have to manually check for a uffd-wp
PTEs/PMDs and manually write-protect PTEs/PMDs, which is error prone.
Prone to such issues is any code that uses vma->vm_page_prot to set PTE
permissions: primarily pte_modify() and mk_pte().
Instead, let's enable writenotify such that PTEs/PMDs/... will be mapped
write-protected as default and we will only allow selected PTEs that are
definitely safe to be mapped without write-protection (see
can_change_pte_writable()) to be writable. In the future, we might want
to enable write-bit recovery -- e.g., can_change_pte_writable() -- at more
locations, for example, also when removing uffd-wp protection.
This fixes two known cases:
(a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
in uffd-wp not triggering on write access.
(b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
writable, resulting in uffd-wp not triggering on write access.
Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
without NUMA hinting (which currently doesn't seem to be applicable to
shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA. On
such a VMA, userfaultfd-wp is currently non-functional.
Note that when enabling userfaultfd-wp, there is no need to walk page
tables to enforce the new default protection for the PTEs: we know that
they cannot be uffd-wp'ed yet, because that can only happen after enabling
uffd-wp for the VMA in general.
Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
accidentally set the write bit -- which would result in uffd-wp not
triggering on later write access. This commit makes uffd-wp on shmem
behave just like uffd-wp on anonymous memory in that regard, even though,
mixing mprotect with uffd-wp is controversial.
[1] https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: Ives van Hoorne <[email protected]>
Debugged-by: Peter Xu <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
uprobe_write_opcode() uses collapse_pte_mapped_thp() to restore huge pmd,
when removing a breakpoint from hugepage text: vma->anon_vma is always set
in that case, so undo the prohibition. And MADV_COLLAPSE ought to be able
to collapse some page tables in a vma which happens to have anon_vma set
from CoWing elsewhere.
Is anon_vma lock required? Almost not: if any page other than expected
subpage of the non-anon huge page is found in the page table, collapse is
aborted without making any change. However, it is possible that an anon
page was CoWed from this extent in another mm or vma, in which case a
concurrent lookup might look here: so keep it away while clearing pmd (but
perhaps we shall go back to using pmd_lock() there in future).
Note that collapse_pte_mapped_thp() is exceptional in freeing a page table
without having cleared its ptes: I'm uneasy about that, and had thought
pte_clear()ing appropriate; but exclusive i_mmap lock does fix the
problem, and we would have to move the mmu_notification if clearing those
ptes.
What this fixes is not a dangerous instability. But I suggest Cc stable
because uprobes "healing" has regressed in that way, so this should follow
8d3c106e19e8 into those stable releases where it was backported (and may
want adjustment there - I'll supply backports as needed).
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8d3c106e19e8 ("mm/khugepaged: take the right locks for page table retraction")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zach O'Keefe <[email protected]>
Cc: Song Liu <[email protected]>
Cc: <[email protected]> [5.4+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
hugetlb_change_protection()
We have to update the uffd-wp SWP PTE bit independent of the type of
migration entry. Currently, if we're unlucky and we want to install/clear
the uffd-wp bit just while we're migrating a read-only mapped hugetlb
page, we would miss to set/clear the uffd-wp bit.
Further, if we're processing a readable-exclusive migration entry and
neither want to set or clear the uffd-wp bit, we could currently end up
losing the uffd-wp bit. Note that the same would hold for writable
migrating entries, however, having a writable migration entry with the
uffd-wp bit set would already mean that something went wrong.
Note that the change from !is_readable_migration_entry ->
writable_migration_entry is harmless and actually cleaner, as raised by
Miaohe Lin and discussed in [1].
[1] https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 60dfaad65aa9 ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/hugetlb: uffd-wp fixes for hugetlb_change_protection()".
Playing with virtio-mem and background snapshots (using uffd-wp) on
hugetlb in QEMU, I managed to trigger a VM_BUG_ON(). Looking into the
details, hugetlb_change_protection() seems to not handle uffd-wp correctly
in all cases.
Patch #1 fixes my test case. I don't have reproducers for patch #2, as it
requires running into migration entries.
I did not yet check in detail yet if !hugetlb code requires similar care.
This patch (of 2):
There are two problematic cases when stumbling over a PTE marker in
hugetlb_change_protection():
(1) We protect an uffd-wp PTE marker a second time using uffd-wp: we will
end up in the "!huge_pte_none(pte)" case and mess up the PTE marker.
(2) We unprotect a uffd-wp PTE marker: we will similarly end up in the
"!huge_pte_none(pte)" case even though we cleared the PTE, because
the "pte" variable is stale. We'll mess up the PTE marker.
For example, if we later stumble over such a "wrongly modified" PTE marker,
we'll treat it like a present PTE that maps some garbage page.
This can, for example, be triggered by mapping a memfd backed by huge
pages, registering uffd-wp, uffd-wp'ing an unmapped page and (a)
uffd-wp'ing it a second time; or (b) uffd-unprotecting it; or (c)
unregistering uffd-wp. Then, ff we trigger fallocate(FALLOC_FL_PUNCH_HOLE)
on that file range, we will run into a VM_BUG_ON:
[ 195.039560] page:00000000ba1f2987 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0
[ 195.039565] flags: 0x7ffffc0001000(reserved|node=0|zone=0|lastcpupid=0x1fffff)
[ 195.039568] raw: 0007ffffc0001000 ffffe742c0000008 ffffe742c0000008 0000000000000000
[ 195.039569] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 195.039569] page dumped because: VM_BUG_ON_PAGE(compound && !PageHead(page))
[ 195.039573] ------------[ cut here ]------------
[ 195.039574] kernel BUG at mm/rmap.c:1346!
[ 195.039579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 195.039581] CPU: 7 PID: 4777 Comm: qemu-system-x86 Not tainted 6.0.12-200.fc36.x86_64 #1
[ 195.039583] Hardware name: LENOVO 20WNS1F81N/20WNS1F81N, BIOS N35ET50W (1.50 ) 09/15/2022
[ 195.039584] RIP: 0010:page_remove_rmap+0x45b/0x550
[ 195.039588] Code: [...]
[ 195.039589] RSP: 0018:ffffbc03c3633ba8 EFLAGS: 00010292
[ 195.039591] RAX: 0000000000000040 RBX: ffffe742c0000000 RCX: 0000000000000000
[ 195.039592] RDX: 0000000000000002 RSI: ffffffff8e7aac1a RDI: 00000000ffffffff
[ 195.039592] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffbc03c3633a08
[ 195.039593] R10: 0000000000000003 R11: ffffffff8f146328 R12: ffff9b04c42754b0
[ 195.039594] R13: ffffffff8fcc6328 R14: ffffbc03c3633c80 R15: ffff9b0484ab9100
[ 195.039595] FS: 00007fc7aaf68640(0000) GS:ffff9b0bbf7c0000(0000) knlGS:0000000000000000
[ 195.039596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 195.039597] CR2: 000055d402c49110 CR3: 0000000159392003 CR4: 0000000000772ee0
[ 195.039598] PKRU: 55555554
[ 195.039599] Call Trace:
[ 195.039600] <TASK>
[ 195.039602] __unmap_hugepage_range+0x33b/0x7d0
[ 195.039605] unmap_hugepage_range+0x55/0x70
[ 195.039608] hugetlb_vmdelete_list+0x77/0xa0
[ 195.039611] hugetlbfs_fallocate+0x410/0x550
[ 195.039612] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 195.039616] vfs_fallocate+0x12e/0x360
[ 195.039618] __x64_sys_fallocate+0x40/0x70
[ 195.039620] do_syscall_64+0x58/0x80
[ 195.039623] ? syscall_exit_to_user_mode+0x17/0x40
[ 195.039624] ? do_syscall_64+0x67/0x80
[ 195.039626] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 195.039628] RIP: 0033:0x7fc7b590651f
[ 195.039653] Code: [...]
[ 195.039654] RSP: 002b:00007fc7aaf66e70 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[ 195.039655] RAX: ffffffffffffffda RBX: 0000558ef4b7f370 RCX: 00007fc7b590651f
[ 195.039656] RDX: 0000000018000000 RSI: 0000000000000003 RDI: 000000000000000c
[ 195.039657] RBP: 0000000008000000 R08: 0000000000000000 R09: 0000000000000073
[ 195.039658] R10: 0000000008000000 R11: 0000000000000293 R12: 0000000018000000
[ 195.039658] R13: 00007fb8bbe00000 R14: 000000000000000c R15: 0000000000001000
[ 195.039661] </TASK>
Fix it by not going into the "!huge_pte_none(pte)" case if we stumble over
an exclusive marker. spin_unlock() + continue would get the job done.
However, instead, make it clearer that there are no fall-through
statements: we process each case (hwpoison, migration, marker, !none,
none) and then unlock the page table to continue with the next PTE. Let's
avoid "continue" statements and use a single spin_unlock() at the end.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 60dfaad65aa9 ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
stress_hpt_timer_fn() is only used in hash_utils.c, make it static.
Fixes: 6b34a099faa1 ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults")
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Make 'perf kmem' cope with the removal of some
kmem:kmem_cache_alloc_node and kmem:kmalloc_node in the
11e9734bcb6a7361 ("mm/slab_common: unify NUMA and UMA version of
tracepoints") commit, making sure it works with Linux >= 6.2 as well
as with older kernels where those tracepoints are present.
- Also make it handle the new "node" kmem:kmalloc and
kmem:kmem_cache_alloc tracepoint field introduced in that same
commit.
- Fix hardware tracing PMU address filter duplicate symbol selection,
that was preventing to match with static functions with the same name
present in different object files.
- Fix regression on what linux/types.h file gets used to build the "BPF
prologue" 'perf test' entry, the system one lacks the fmode_t
definition used in this test, so provide that type in the test
itself.
- Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1. If the
user asks for linking with the libbpf package provided by the distro,
then it has to be >= 0.8.0. Using the libbpf supplied with the kernel
would be a fallback in that case.
- Fix the build when libbpf isn't available or explicitly disabled via
NO_LIBBPF=1.
- Don't try to install libtraceevent plugins as its not anymore in the
kernel sources and will thus always fail.
* tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf auxtrace: Fix address filter duplicate symbol selection
perf bpf: Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1
perf build: Fix build error when NO_LIBBPF=1
perf tools: Don't install libtraceevent plugins as its not anymore in the kernel sources
perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring
perf kmem: Support legacy tracepoints
perf build: Properly guard libbpf includes
perf tests bpf prologue: Fix bpf-script-test-prologue test compile issue with clang
|
|
In commit 14243b387137a ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN
and event channel delivery") the clever version of me left some helpful
notes for those who would come after him:
/*
* For the irqfd workqueue, using the main kvm->lock mutex is
* fine since this function is invoked from kvm_set_irq() with
* no other lock held, no srcu. In future if it will be called
* directly from a vCPU thread (e.g. on hypercall for an IPI)
* then it may need to switch to using a leaf-node mutex for
* serializing the shared_info mapping.
*/
mutex_lock(&kvm->lock);
In commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
the other version of me ran straight past that comment without reading it,
and introduced a potential deadlock by taking vcpu->mutex and kvm->lock
in the wrong order.
Solve this as originally suggested, by adding a leaf-node lock in the Xen
state rather than using kvm->lock for it.
Fixes: 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <[email protected]>
Message-Id: <[email protected]>
[Rebase, add docs. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
is_mmconf_reserved() takes a "with_e820" parameter that only determines the
message logged if it finds the MMCONFIG region is reserved. Pass the
message directly, which will simplify a future patch that adds a new way of
looking for that reservation. No functional change intended.
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Tony Luck <[email protected]>
Tested-by: Giovanni Cabiddu <[email protected]>
Tested-by: Kan Liang <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
|
|
"about.html" is available only for the alabaster theme [1].
Unconditionally putting it to html_sidebars prevents us from
using other themes which respect html_sidebars.
Remove about.html from the initialization and insert it at the
front for the alabaster theme.
Link: [1] https://alabaster.readthedocs.io/en/latest/installation.html#sidebars
Fixes: d5389d3145ef ("docs: Switch the default HTML theme to alabaster")
Signed-off-by: Akira Yokosawa <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
Signed-off-by: Heiko Carstens <[email protected]>
|
|
The commit 79417d040f4f ("btrfs: zoned: disable metadata overcommit for
zoned") disabled the metadata over-commit to track active zones properly.
However, it also introduced a heavy overhead by allocating new metadata
block groups and/or flushing dirty buffers to release the space
reservations. Specifically, a workload (write only without any sync
operations) worsen its performance from 343.77 MB/sec (v5.19) to 182.89
MB/sec (v6.0).
The performance is still bad on current misc-next which is 187.95 MB/sec.
And, with this patch applied, it improves back to 326.70 MB/sec (+73.82%).
This patch introduces a new fs_info->flag BTRFS_FS_NO_OVERCOMMIT to
indicate it needs to disable the metadata over-commit. The flag is enabled
when a device with max active zones limit is loaded into a file-system.
Fixes: 79417d040f4f ("btrfs: zoned: disable metadata overcommit for zoned")
CC: [email protected] # 6.0+
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
[BUG]
There are some reports from the mailing list that since v6.1 kernel, the
WARN_ON() inside btrfs_qgroup_account_extent() gets triggered during
rescan:
WARNING: CPU: 3 PID: 6424 at fs/btrfs/qgroup.c:2756 btrfs_qgroup_account_extents+0x1ae/0x260 [btrfs]
CPU: 3 PID: 6424 Comm: snapperd Tainted: P OE 6.1.2-1-default #1 openSUSE Tumbleweed 05c7a1b1b61d5627475528f71f50444637b5aad7
RIP: 0010:btrfs_qgroup_account_extents+0x1ae/0x260 [btrfs]
Call Trace:
<TASK>
btrfs_commit_transaction+0x30c/0xb40 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6]
? start_transaction+0xc3/0x5b0 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6]
btrfs_qgroup_rescan+0x42/0xc0 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6]
btrfs_ioctl+0x1ab9/0x25c0 [btrfs c39c9c546c241c593f03bd6d5f39ea1b676250f6]
? __rseq_handle_notify_resume+0xa9/0x4a0
? mntput_no_expire+0x4a/0x240
? __seccomp_filter+0x319/0x4d0
__x64_sys_ioctl+0x90/0xd0
do_syscall_64+0x5b/0x80
? syscall_exit_to_user_mode+0x17/0x40
? do_syscall_64+0x67/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fd9b790d9bf
</TASK>
[CAUSE]
Since commit e15e9f43c7ca ("btrfs: introduce
BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting"), if
our qgroup is already in inconsistent state, we will no longer do the
time-consuming backref walk.
This can leave some qgroup records without a valid old_roots ulist.
Normally this is fine, as btrfs_qgroup_account_extents() would also skip
those records if we have NO_ACCOUNTING flag set.
But there is a small window, if we have NO_ACCOUNTING flag set, and
inserted some qgroup_record without a old_roots ulist, but then the user
triggered a qgroup rescan.
During btrfs_qgroup_rescan(), we firstly clear NO_ACCOUNTING flag, then
commit current transaction.
And since we have a qgroup_record with old_roots = NULL, we trigger the
WARN_ON() during btrfs_qgroup_account_extents().
[FIX]
Unfortunately due to the introduction of NO_ACCOUNTING flag, the
assumption that every qgroup_record would have its old_roots populated
is no longer correct.
Fix the false alerts and drop the WARN_ON().
Reported-by: Lukas Straub <[email protected]>
Reported-by: HanatoK <[email protected]>
Fixes: e15e9f43c7ca ("btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting")
CC: [email protected] # 6.1
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
device_add_list()
[BUG]
When test case btrfs/219 (aka, mount a registered device but with a lower
generation) failed, there is not any useful information for the end user
to find out what's going wrong.
The mount failure just looks like this:
# mount -o loop /tmp/219.img2 /mnt/btrfs/
mount: /mnt/btrfs: mount(2) system call failed: File exists.
dmesg(1) may have more information after failed mount system call.
While the dmesg contains nothing but the loop device change:
loop1: detected capacity change from 0 to 524288
[CAUSE]
In device_list_add() we have a lot of extra checks to reject invalid
cases.
That function also contains the regular device scan result like the
following prompt:
BTRFS: device fsid 6222333e-f9f1-47e6-b306-55ddd4dcaef4 devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (3027)
But unfortunately not all errors have their own error messages, thus if
we hit something wrong in device_add_list(), there may be no error
messages at all.
[FIX]
Add errors message for all non-ENOMEM errors.
For ENOMEM, I'd say we're in a much worse situation, and there should be
some OOM messages way before our call sites.
CC: [email protected] # 6.0+
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Documentation/virt/kvm/locking.rst tells us that kvm->lock is taken outside
vcpu->mutex. But that doesn't actually happen very often; it's only in
some esoteric cases like migration with AMD SEV. This means that lockdep
usually doesn't notice, and doesn't do its job of keeping us honest.
Ensure that lockdep *always* knows about the ordering of these two locks,
by briefly taking vcpu->mutex in kvm_vm_ioctl_create_vcpu() while kvm->lock
is held.
Signed-off-by: David Woodhouse <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The kvm_xen_update_runstate_guest() function can be called when the vCPU
is being scheduled out, from a preempt notifier. It *opportunistically*
updates the runstate area in the guest memory, if the gfn_to_pfn_cache
which caches the appropriate address is still valid.
If there is *contention* when it attempts to obtain gpc->lock, then
locking inside the priority inheritance checks may cause a deadlock.
Lockdep reports:
[13890.148997] Chain exists of:
&gpc->lock --> &p->pi_lock --> &rq->__lock
[13890.149002] Possible unsafe locking scenario:
[13890.149003] CPU0 CPU1
[13890.149004] ---- ----
[13890.149005] lock(&rq->__lock);
[13890.149007] lock(&p->pi_lock);
[13890.149009] lock(&rq->__lock);
[13890.149011] lock(&gpc->lock);
[13890.149013]
*** DEADLOCK ***
In the general case, if there's contention for a read lock on gpc->lock,
that's going to be because something else is either invalidating or
revalidating the cache. Either way, we've raced with seeing it in an
invalid state, in which case we would have aborted the opportunistic
update anyway.
So in the 'atomic' case when called from the preempt notifier, just
switch to using read_trylock() and avoid the PI handling altogether.
Signed-off-by: David Woodhouse <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In commit 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate
area") we declared it safe to obtain two gfn_to_pfn_cache locks at the same
time:
/*
* The guest's runstate_info is split across two pages and we
* need to hold and validate both GPCs simultaneously. We can
* declare a lock ordering GPC1 > GPC2 because nothing else
* takes them more than one at a time.
*/
However, we forgot to tell lockdep. Do so, by setting a subclass on the
first lock before taking the second.
Fixes: 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate area")
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm64 fixes for 6.2, take #1
- Fix the PMCR_EL0 reset value after the PMU rework
- Correctly handle S2 fault triggered by a S1 page table walk
by not always classifying it as a write, as this breaks on
R/O memslots
- Document why we cannot exit with KVM_EXIT_MMIO when taking
a write fault from a S1 PTW on a R/O memslot
- Put the Apple M2 on the naughty step for not being able to
correctly implement the vgic SEIS feature, just liek the M1
before it
- Reviewer updates: Alex is stepping down, replaced by Zenghui
|
|
kvm->srcu is taken in KVM_RUN and several other vCPU ioctls, therefore
vcpu->mutex is susceptible to the same deadlock that is documented
for kvm->slots_lock. The same holds for kvm->lock, since kvm->lock
is held outside vcpu->mutex. Fix the documentation and rearrange it
to highlight the difference between these locks and kvm->slots_arch_lock,
and how kvm->slots_arch_lock can be useful while processing a vmexit.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When a match has been made to the nth duplicate symbol, return
success not error.
Example:
Before:
$ cat file.c
cat: file.c: No such file or directory
$ cat file1.c
#include <stdio.h>
static void func(void)
{
printf("First func\n");
}
void other(void);
int main()
{
func();
other();
return 0;
}
$ cat file2.c
#include <stdio.h>
static void func(void)
{
printf("Second func\n");
}
void other(void)
{
func();
}
$ gcc -Wall -Wextra -o test file1.c file2.c
$ perf record -e intel_pt//u --filter 'filter func @ ./test' -- ./test
Multiple symbols with name 'func'
#1 0x1149 l func
which is near main
#2 0x1179 l func
which is near other
Disambiguate symbol name by inserting #n after the name e.g. func #2
Or select a global symbol by inserting #0 or #g or #G
Failed to parse address filter: 'filter func @ ./test'
Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
Where multiple filters are separated by space or comma.
$ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test
Failed to parse address filter: 'filter func #2 @ ./test'
Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
Where multiple filters are separated by space or comma.
After:
$ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test
First func
Second func
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data ]
$ perf script --itrace=b -Ftime,flags,ip,sym,addr --ns
1231062.526977619: tr strt 0 [unknown] => 558495708179 func
1231062.526977619: tr end call 558495708188 func => 558495708050 _init
1231062.526979286: tr strt 0 [unknown] => 55849570818d func
1231062.526979286: tr end return 55849570818f func => 55849570819d other
Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters")
Reported-by: Dmitrii Dolgov <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Dmitry Dolgov <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
CHICKEN_RASTER_{1,2} got overlooked with the move done in commit
a9e69428b1b4 ("drm/i915: Define MCR registers explicitly"). Registers
from the SVG unit became multicast as of Xe_HP graphics.
BSpec: 66534
Fixes: a9e69428b1b4 ("drm/i915: Define MCR registers explicitly")
Signed-off-by: Gustavo Sousa <[email protected]>
Cc: Matt Roper <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 10903b0a0f4d4964b352fa3df12d3d2ef5fb7a3b)
Signed-off-by: Rodrigo Vivi <[email protected]>
|
|
Use READ_ONCE() before cmpxchg() to prevent that the compiler generates
code that fetches the to be compared old value several times from memory.
Reviewed-by: Christian Borntraeger <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Make sure that *ptr__ within arch_this_cpu_to_op_simple() is only
dereferenced once by using READ_ONCE(). Otherwise the compiler could
generate incorrect code.
Cc: <[email protected]>
Reviewed-by: Alexander Gordeev <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
The current cmpxchg_double() loops within the perf hw sampling code do not
have READ_ONCE() semantics to read the old value from memory. This allows
the compiler to generate code which reads the "old" value several times
from memory, which again allows for inconsistencies.
For example:
/* Reset trailer (using compare-double-and-swap) */
do {
te_flags = te->flags & ~SDB_TE_BUFFER_FULL_MASK;
te_flags |= SDB_TE_ALERT_REQ_MASK;
} while (!cmpxchg_double(&te->flags, &te->overflow,
te->flags, te->overflow,
te_flags, 0ULL));
The compiler could generate code where te->flags used within the
cmpxchg_double() call may be refetched from memory and which is not
necessarily identical to the previous read version which was used to
generate te_flags. Which in turn means that an incorrect update could
happen.
Fix this by adding READ_ONCE() semantics to all cmpxchg_double()
loops. Given that READ_ONCE() cannot generate code on s390 which atomically
reads 16 bytes, use a private compare-and-swap-double implementation to
achieve that.
Also replace cmpxchg_double() with the private implementation to be able to
re-use the old value within the loops.
As a side effect this converts the whole code to only use bit fields
to read and modify bits within the hws trailer header.
Reported-by: Alexander Gordeev <[email protected]>
Acked-by: Alexander Gordeev <[email protected]>
Acked-by: Hendrik Brueckner <[email protected]>
Reviewed-by: Thomas Richter <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/linux-s390/Y71QJBhNTIatvxUT@osiris/T/#ma14e2a5f7aa8ed4b94b6f9576799b3ad9c60f333
Signed-off-by: Heiko Carstens <[email protected]>
|
|
The newly added spi-cs-setup-ns doesn't really fit with the existing
property names for delays, rename it so that it does before it makes it
into a release and becomes ABI.
|
|
The two debug messages in spidev_open() dereference spidev->spi without
taking the lock and without checking if it's not null. This can lead to
a crash. Drop the messages as they're not needed - the user-space will
get informed about ENOMEM with the syscall return value.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
There's a spinlock in place that is taken in file_operations callbacks
whenever we check if spidev->spi is still alive (not null). It's also
taken when spidev->spi is set to NULL in remove().
This however doesn't protect the code against driver unbind event while
one of the syscalls is still in progress. To that end we need a lock taken
continuously as long as we may still access spidev->spi. As both the file
ops and the remove callback are never called from interrupt context, we
can replace the spinlock with a mutex.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
mlx5-fixes-2023-01-09
|
|
The total cork length created by ip6_append_data includes extension
headers, so we must exclude them when comparing them against the
IPV6_CHECKSUM offset which does not include extension headers.
Reported-by: Kyle Zeng <[email protected]>
Fixes: 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory")
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers")
converted nouveau to generic fbdev emulation. The driver's internal
implementation later got accidentally restored during a merge commit.
Remove the file from the driver. No functional changes.
v2:
* point Fixes tag to merge commit (Alex)
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Fixes: 4e291f2f5853 ("Merge tag 'drm-misc-next-2022-11-10-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")
Cc: Ben Skeggs <[email protected]>
Cc: Karol Herbst <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Javier Martinez Canillas <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
If ptp was not enabled due to missing IRQ for instance,
lan966x_ptp_deinit() will dereference NULL pointers.
Fixes: d096459494a8 ("net: lan966x: Add support for ptp clocks")
Signed-off-by: Clément Léger <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|