Age | Commit message (Collapse) | Author | Files | Lines |
|
In gfs2_create_inode, get rid of the ghs array in favor of two separate
variables. This makes the code much less irritating.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
We have reserved the number of blocks we want to allocate, so the actual
allocation isn't expected to fail. Nevertheless, make the code behave
correctly even when things go wrong.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
The console_lock is used in part to guarantee safe list iteration.
The console_list_lock should be used because list synchronization
responsibility will be removed from the console_lock in a later
change.
Note, the console_lock is still needed to serialize the device()
callback with other console operations.
Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The console_lock is held throughout the start/show/stop procedure
to print out device/driver information about all registered
consoles. Since the console_lock is being used for multiple reasons,
explicitly document these reasons. This will be useful when the
console_lock is split into fine-grained locking.
Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Replace the open coded single linked list with a hlist so a conversion
to SRCU protected list walks can reuse the existing primitives.
Co-developed-by: John Ogness <[email protected]>
Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
kmemleak reported memory leaks in configfs_create_dir():
unreferenced object 0xffff888009f6af00 (size 192):
comm "modprobe", pid 3777, jiffies 4295537735 (age 233.784s)
backtrace:
kmem_cache_alloc (mm/slub.c:3250 mm/slub.c:3256 mm/slub.c:3263 mm/slub.c:3273)
new_fragment (./include/linux/slab.h:600 fs/configfs/dir.c:163)
configfs_register_subsystem (fs/configfs/dir.c:1857)
basic_write (drivers/hwtracing/stm/p_basic.c:14) stm_p_basic
do_one_initcall (init/main.c:1296)
do_init_module (kernel/module/main.c:2455)
...
unreferenced object 0xffff888003ba7180 (size 96):
comm "modprobe", pid 3777, jiffies 4295537735 (age 233.784s)
backtrace:
kmem_cache_alloc (mm/slub.c:3250 mm/slub.c:3256 mm/slub.c:3263 mm/slub.c:3273)
configfs_new_dirent (./include/linux/slab.h:723 fs/configfs/dir.c:194)
configfs_make_dirent (fs/configfs/dir.c:248)
configfs_create_dir (fs/configfs/dir.c:296)
configfs_attach_group.isra.28 (fs/configfs/dir.c:816 fs/configfs/dir.c:852)
configfs_register_subsystem (fs/configfs/dir.c:1881)
basic_write (drivers/hwtracing/stm/p_basic.c:14) stm_p_basic
do_one_initcall (init/main.c:1296)
do_init_module (kernel/module/main.c:2455)
...
This is because the refcount is not correct in configfs_make_dirent().
For normal stage, the refcount is changing as:
configfs_register_subsystem()
configfs_create_dir()
configfs_make_dirent()
configfs_new_dirent() # set s_count = 1
dentry->d_fsdata = configfs_get(sd); # s_count = 2
...
configfs_unregister_subsystem()
configfs_remove_dir()
remove_dir()
configfs_remove_dirent() # s_count = 1
dput() ...
*dentry_unlink_inode()*
configfs_d_iput() # s_count = 0, release
However, if we failed in configfs_create():
configfs_register_subsystem()
configfs_create_dir()
configfs_make_dirent() # s_count = 2
...
configfs_create() # fail
->out_remove:
configfs_remove_dirent(dentry)
configfs_put(sd) # s_count = 1
return PTR_ERR(inode);
There is no inode in the error path, so the configfs_d_iput() is lost
and makes sd and fragment memory leaked.
To fix this, when we failed in configfs_create(), manually call
configfs_put(sd) to keep the refcount correct.
Fixes: 7063fbf22611 ("[PATCH] configfs: User-driven configuration filesystem")
Signed-off-by: Chen Zhongjin <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
The type should be struct posix_acl * instead of void *.
Cc: Christian Brauner <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
|
|
Fix to return a negative error code from create_elf_fdpic_tables()
instead of 0.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: [email protected]
Signed-off-by: Wang Yufen <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add support for XTS and CTS mode variant of SM4 algorithm. The former is
used to encrypt file contents, while the latter (SM4-CTS-CBC) is used to
encrypt filenames.
SM4 is a symmetric algorithm widely used in China, and is even mandatory
algorithm in some special scenarios. We need to provide these users with
the ability to encrypt files or disks using SM4-XTS.
Signed-off-by: Tianjia Zhang <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
While investigating test failures in xfs/17[1-3] in alwayscow mode, I
noticed through code inspection that xfs_bmap_alloc_userdata isn't
setting XFS_ALLOC_USERDATA when allocating extents for a file's CoW
fork. COW staging extents should be flagged as USERDATA, since user
data are persisted to these blocks before being remapped into a file.
This mis-classification has a few impacts on the behavior of the system.
First, the filestreams allocator is supposed to keep allocating from a
chosen AG until it runs out of space in that AG. However, it only does
that for USERDATA allocations, which means that COW allocations aren't
tied to the filestreams AG. Fortunately, few people use filestreams, so
nobody's noticed.
A more serious problem is that xfs_alloc_ag_vextent_small looks for a
buffer to invalidate *if* the USERDATA flag is set and the AG is so full
that the allocation had to come from the AGFL because the cntbt is
empty. The consequences of not invalidating the buffer are severe --
if the AIL incorrectly checkpoints a buffer that is now being used to
store user data, that action will clobber the user's written data.
Fix filestreams and yet another data corruption vector by flagging COW
allocations as USERDATA.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
xfs_btree_check_block contains debugging knobs. With XFS_DEBUG setting up,
turn on the debugging knob can trigger the assert of xfs_btree_islastblock,
test script as follows:
while true
do
mount $disk $mountpoint
fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null
echo 1 > /sys/fs/xfs/sda/errortag/btree_chk_sblk
sleep 10
umount $mountpoint
done
Kick off fsstress and only *then* turn on the debugging knob. If it
happens that the knob gets turned on after the cntbt lookup succeeds
but before the call to xfs_btree_islastblock, then we *can* end up in
the situation where a previously checked btree block suddenly starts
returning EFSCORRUPTED from xfs_btree_check_block. Kaboom.
Darrick give a very detailed explanation as follows:
Looking back at commit 27d9ee577dcce, I think the point of all this was
to make sure that the cursor has actually performed a lookup, and that
the btree block at whatever level we're asking about is ok.
If the caller hasn't ever done a lookup, the bc_levels array will be
empty, so cur->bc_levels[level].bp pointer will be NULL. The call to
xfs_btree_get_block will crash anyway, so the "ASSERT(block);" part is
pointless.
If the caller did a lookup but the lookup failed due to block
corruption, the corresponding cur->bc_levels[level].bp pointer will also
be NULL, and we'll still crash. The "ASSERT(xfs_btree_check_block);"
logic is also unnecessary.
If the cursor level points to an inode root, the block buffer will be
incore, so it had better always be consistent.
If the caller ignores a failed lookup after a successful one and calls
this function, the cursor state is garbage and the assert wouldn't have
tripped anyway. So get rid of the assert.
Fixes: 27d9ee577dcc ("xfs: actually check xfs_btree_check_block return in xfs_btree_islastblock")
Signed-off-by: Guo Xuenan <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeD
xfs: fix broken MAXREFCOUNT handling
This series fixes a bug in the refcount code where we don't merge
records correctly if the refcount is hovering around MAXREFCOUNT. This
fixes regressions in xfs/179 when fsdax is enabled. xfs/179 itself will
be modified to exploit the bug through the pagecache path.
Signed-off-by: Darrick J. Wong <[email protected]>
* tag 'maxrefcount-fixes-6.2_2022-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: estimate post-merge refcounts correctly
xfs: hoist refcount record merge predicates
|
|
Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
metadata corruptions after being run. Specifically, xfs_repair noticed
single-block refcount records that could be combined but had not been.
The root cause of this is improper MAXREFCOUNT edge case handling in
xfs_refcount_merge_extents. When we're trying to find candidates for a
refcount btree record merge, we compute the refcount attribute of the
merged record, but we fail to account for the fact that once a record
hits rc_refcount == MAXREFCOUNT, it is pinned that way forever. Hence
the computed refcount is wrong, and we fail to merge the extents.
Fix this by adjusting the merge predicates to compute the adjusted
refcount correctly.
Fixes: 3172725814f9 ("xfs: adjust refcount of an extent of blocks in refcount btree")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Xiao Yang <[email protected]>
|
|
Hoist these multiline conditionals into separate static inline helpers
to improve readability and set the stage for corruption fixes that will
be introduced in the next patch.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Xiao Yang <[email protected]>
|
|
we might want to know why jbd2 thread using high io for detail,
split ext4_journal_start trace to ext4_journal_start_sb and
ext4_journal_start_inode, show ino and handle type when possible.
Signed-off-by: changfengnan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Before the commit 461c3af045d3 ("ext4: Change handle_mount_opt() to use
fs_parameter") ext4 mount option journal_path did follow links in the
provided path.
Bring this behavior back by allowing to pass pathwalk flags to
fs_lookup_param().
Fixes: 461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter")
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
|
|
Check the return value of ext4_xattr_inode_dec_ref(), which could
return error code and need to be warned.
Signed-off-by: Li Zhong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Return value directly from ext4_group_extend_no_check()
instead of getting value from redundant variable err.
Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Jinpeng Cui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
infinite loop
In do_writepages, if the value returned by ext4_writepages is "-ENOMEM"
and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met.
In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL,
the function returns -ENOMEM.
In __getblk_slow, if the return value of grow_buffers is less than 0,
the function returns NULL.
When the three processes are connected in series like the following stack,
an infinite loop may occur:
do_writepages <--- keep retrying
ext4_writepages
mpage_map_and_submit_extent
mpage_map_one_extent
ext4_map_blocks
ext4_ext_map_blocks
ext4_ext_handle_unwritten_extents
ext4_ext_convert_to_initialized
ext4_split_extent
ext4_split_extent_at
__ext4_ext_dirty
__ext4_mark_inode_dirty
ext4_reserve_inode_write
ext4_get_inode_loc
__ext4_get_inode_loc <--- return -ENOMEM
sb_getblk
__getblk_gfp
__getblk_slow <--- return NULL
grow_buffers
grow_dev_page <--- return -ENXIO
ret = (block < end_block) ? 1 : -ENXIO;
In this issue, bg_inode_table_hi is overwritten as an incorrect value.
As a result, `block < end_block` cannot be met in grow_dev_page.
Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages
keeps retrying. As a result, the writeback process is in the D state due
to an infinite loop.
Add a check on inode table block in the __ext4_get_inode_loc function by
referring to ext4_read_inode_bitmap to avoid this infinite loop.
Cc: [email protected]
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
In anticipation of putting random seeds in EFI variables, it's important
that the random GUID namespace of variables remains hidden from
userspace. We accomplish this by not populating efivarfs with entries
from that GUID, as well as denying the creation of new ones in that
GUID.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
Fix the following W=1 kernel build warning(s):
fs/fat/nfs.c:21: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
fs/fat/nfs.c:139: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bo Liu <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There is a memory leak reported by kmemleak:
unreferenced object 0xffff88810cc65e60 (size 32):
comm "mount.ocfs2", pid 23753, jiffies 4302528942 (age 34735.105s)
hex dump (first 32 bytes):
10 00 00 00 00 00 00 00 00 01 01 01 01 01 01 01 ................
01 01 01 01 01 01 01 01 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8170f73d>] __kmalloc+0x4d/0x150
[<ffffffffa0ac3f51>] ocfs2_compute_replay_slots+0x121/0x330 [ocfs2]
[<ffffffffa0b65165>] ocfs2_check_volume+0x485/0x900 [ocfs2]
[<ffffffffa0b68129>] ocfs2_mount_volume.isra.0+0x1e9/0x650 [ocfs2]
[<ffffffffa0b7160b>] ocfs2_fill_super+0xe0b/0x1740 [ocfs2]
[<ffffffff818e1fe2>] mount_bdev+0x312/0x400
[<ffffffff819a086d>] legacy_get_tree+0xed/0x1d0
[<ffffffff818de82d>] vfs_get_tree+0x7d/0x230
[<ffffffff81957f92>] path_mount+0xd62/0x1760
[<ffffffff81958a5a>] do_mount+0xca/0xe0
[<ffffffff81958d3c>] __x64_sys_mount+0x12c/0x1a0
[<ffffffff82f26f15>] do_syscall_64+0x35/0x80
[<ffffffff8300006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
This call stack is related to two problems. Firstly, the ocfs2 super uses
"replay_map" to trace online/offline slots, in order to recover offline
slots during recovery and mount. But when ocfs2_truncate_log_init()
returns an error in ocfs2_mount_volume(), the memory of "replay_map" will
not be freed in error handling path. Secondly, the memory of "replay_map"
will not be freed if d_make_root() returns an error in ocfs2_fill_super().
But the memory of "replay_map" will be freed normally when completing
recovery and mount in ocfs2_complete_mount_recovery().
Fix the first problem by adding error handling path to free "replay_map"
when ocfs2_truncate_log_init() fails. And fix the second problem by
calling ocfs2_free_replay_slots(osb) in the error handling path
"out_dismount". In addition, since ocfs2_free_replay_slots() is static,
it is necessary to remove its static attribute and declare it in header
file.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9140db04ef18 ("ocfs2: recover orphans in offline slots during recovery and mount")
Signed-off-by: Li Zetao <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The simple attribute files do not accept a negative value since the commit
488dac0c9237 ("libfs: fix error cast of negative value in
simple_attr_write()"), so we have to use a 64-bit value to write a
negative value for a debugfs file created by debugfs_create_atomic_t().
This restores the previous behaviour by introducing
DEFINE_DEBUGFS_ATTRIBUTE_SIGNED for a signed value.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()")
Signed-off-by: Akinobu Mita <[email protected]>
Reported-by: Zhao Gongyi <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Wei Yongjun <[email protected]>
Cc: Yicong Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "fix error when writing negative value to simple attribute
files".
The simple attribute files do not accept a negative value since the commit
488dac0c9237 ("libfs: fix error cast of negative value in
simple_attr_write()"), but some attribute files want to accept a negative
value.
This patch (of 3):
The simple attribute files do not accept a negative value since the commit
488dac0c9237 ("libfs: fix error cast of negative value in
simple_attr_write()"), so we have to use a 64-bit value to write a
negative value.
This adds DEFINE_SIMPLE_ATTRIBUTE_SIGNED for a signed value.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()")
Signed-off-by: Akinobu Mita <[email protected]>
Reported-by: Zhao Gongyi <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Wei Yongjun <[email protected]>
Cc: Yicong Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Removing the try_to_release_page() wrapper", v3.
This patchset replaces the remaining calls of try_to_release_page() with
the folio equivalent: filemap_release_folio(). This allows us to remove
the wrapper.
This patch (of 4):
Convert move_extent_per_page() to use folios. This change removes 5 calls
to compound_head() and is in preparation for the removal of the
try_to_release_page() wrapper.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The ei pointer does not need to cast the type.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Li zeming <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 9a10064f5625 ("mm: add a field to store names for private
anonymous memory"), name for private anonymous memory, but not shared
anonymous, can be set. However, naming shared anonymous memory just as
useful for tracking purposes.
Extend the functionality to be able to set names for shared anon.
There are two ways to create anonymous shared memory, using memfd or
directly via mmap():
1. fd = memfd_create(...)
mem = mmap(..., MAP_SHARED, fd, ...)
2. mem = mmap(..., MAP_SHARED | MAP_ANONYMOUS, -1, ...)
In both cases the anonymous shared memory is created the same way by
mapping an unlinked file on tmpfs.
The memfd way allows to give a name for anonymous shared memory, but
not useful when parts of shared memory require to have distinct names.
Example use case: The VMM maps VM memory as anonymous shared memory (not
private because VMM is sandboxed and drivers are running in their own
processes). However, the VM tells back to the VMM how parts of the memory
are actually used by the guest, how each of the segments should be backed
(i.e. 4K pages, 2M pages), and some other information about the segments.
The naming allows us to monitor the effective memory footprint for each
of these segments from the host without looking inside the guest.
Sample output:
/* Create shared anonymous segmenet */
anon_shmem = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
/* Name the segment: "MY-NAME" */
rv = prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
anon_shmem, SIZE, "MY-NAME");
cat /proc/<pid>/maps (and smaps):
7fc8e2b4c000-7fc8f2b4c000 rw-s 00000000 00:01 1024 [anon_shmem:MY-NAME]
If the segment is not named, the output is:
7fc8e2b4c000-7fc8f2b4c000 rw-s 00000000 00:01 1024 /dev/zero (deleted)
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Pasha Tatashin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vincent Whitchurch <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: xu xin <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
|
|
Syzbot reported a null-ptr-deref bug:
NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP
frequency < 30 seconds
general protection fault, probably for non-canonical address
0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 1 PID: 3603 Comm: segctord Not tainted
6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google
10/11/2022
RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0
fs/nilfs2/alloc.c:608
Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00
00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02
00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7
RSP: 0018:ffffc90003dff830 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d
RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010
RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f
R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158
R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0
Call Trace:
<TASK>
nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline]
nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193
nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236
nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940
nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline]
nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline]
nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088
nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568
nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018
nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline]
nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045
nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline]
nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
...
If DAT metadata file is corrupted on disk, there is a case where
req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during
a b-tree operation that cascadingly updates ancestor nodes of the b-tree,
because nilfs_dat_commit_alloc() for a lower level block can initialize
the blocknr on the same DAT entry between nilfs_dat_prepare_end() and
nilfs_dat_commit_end().
If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free()
without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and
causes the NULL pointer dereference above in
nilfs_palloc_commit_free_entry() function, which leads to a crash.
Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh
before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free().
This also calls nilfs_error() in that case to notify that there is a fatal
flaw in the filesystem metadata and prevent further operations.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The atomic_read was accidentally replaced with atomic_inc_return,
which prevents the server from getting cleaned up and causes rmmod
to hang with a warning:
Can't purge s=00000001
Fixes: 2757a4dc1849 ("afs: Fix access after dec in put functions")
Signed-off-by: Marc Dionne <[email protected]>
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Linus Torvalds <[email protected]>
|
|
xfs log io error will trigger xlog shut down, and end_io worker call
xlog_state_shutdown_callbacks to unpin and release the buf log item.
The race condition is that when there are some thread doing transaction
commit and happened not to be intercepted by xlog_is_shutdown, then,
these log item will be insert into CIL, when unpin and release these
buf log item, UAF will occur. BTW, add delay before `xlog_cil_commit`
can increase recurrence probability.
The following call graph actually encountered this bad situation.
fsstress io end worker kworker/0:1H-216
xlog_ioend_work
->xlog_force_shutdown
->xlog_state_shutdown_callbacks
->xlog_cil_process_committed
->xlog_cil_committed
->xfs_trans_committed_bulk
->xfs_trans_apply_sb_deltas ->li_ops->iop_unpin(lip, 1);
->xfs_trans_getsb
->_xfs_trans_bjoin
->xfs_buf_item_init
->if (bip) { return 0;} //relog
->xlog_cil_commit
->xlog_cil_insert_items //insert into CIL
->xfs_buf_ioend_fail(bp);
->xfs_buf_ioend
->xfs_buf_item_done
->xfs_buf_item_relse
->xfs_buf_item_free
when cil push worker gather percpu cil and insert super block buf log item
into ctx->log_items then uaf occurs.
==================================================================
BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0
Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105
CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G W
6.1.0-rc1-00001-g274115149b42 #136
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Workqueue: xfs-cil/sda xlog_cil_push_work
Call Trace:
<TASK>
dump_stack_lvl+0x4d/0x66
print_report+0x171/0x4a6
kasan_report+0xb3/0x130
xlog_cil_push_work+0x1c8f/0x22f0
process_one_work+0x6f9/0xf70
worker_thread+0x578/0xf30
kthread+0x28c/0x330
ret_from_fork+0x1f/0x30
</TASK>
Allocated by task 2145:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_slab_alloc+0x54/0x60
kmem_cache_alloc+0x14a/0x510
xfs_buf_item_init+0x160/0x6d0
_xfs_trans_bjoin+0x7f/0x2e0
xfs_trans_getsb+0xb6/0x3f0
xfs_trans_apply_sb_deltas+0x1f/0x8c0
__xfs_trans_commit+0xa25/0xe10
xfs_symlink+0xe23/0x1660
xfs_vn_symlink+0x157/0x280
vfs_symlink+0x491/0x790
do_symlinkat+0x128/0x220
__x64_sys_symlink+0x7a/0x90
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 216:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x40
__kasan_slab_free+0x105/0x1a0
kmem_cache_free+0xb6/0x460
xfs_buf_ioend+0x1e9/0x11f0
xfs_buf_item_unpin+0x3d6/0x840
xfs_trans_committed_bulk+0x4c2/0x7c0
xlog_cil_committed+0xab6/0xfb0
xlog_cil_process_committed+0x117/0x1e0
xlog_state_shutdown_callbacks+0x208/0x440
xlog_force_shutdown+0x1b3/0x3a0
xlog_ioend_work+0xef/0x1d0
process_one_work+0x6f9/0xf70
worker_thread+0x578/0xf30
kthread+0x28c/0x330
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff88801800f388
which belongs to the cache xfs_buf_item of size 272
The buggy address is located 104 bytes inside of
272-byte region [ffff88801800f388, ffff88801800f498)
The buggy address belongs to the physical page:
page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000
index:0xffff88801800f208 pfn:0x1800e
head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640
raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint
Signed-off-by: Guo Xuenan <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Fix uaf in xfs_trans_ail_delete during xlog force shutdown.
In commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in
xlog_state_shutdown_callbacks") changed the order of running callbacks
and wait for iclog completion to avoid unmount path untimely destroy AIL.
But which seems not enough to ensue this, adding mdelay in
`xfs_buf_item_unpin` can prove that.
The reproduction is as follows. To ensure destroy AIL safely,
we should wait all xlog ioend workers done and sync the AIL.
==================================================================
BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0
Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43
CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W
6.1.0-rc1-00002-gc28266863c4a #137
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Workqueue: xfs-log/sda xlog_ioend_work
Call Trace:
<TASK>
dump_stack_lvl+0x4d/0x66
print_report+0x171/0x4a6
kasan_report+0xb3/0x130
xfs_trans_ail_delete+0x240/0x2a0
xfs_buf_item_done+0x7b/0xa0
xfs_buf_ioend+0x1e9/0x11f0
xfs_buf_item_unpin+0x4c8/0x860
xfs_trans_committed_bulk+0x4c2/0x7c0
xlog_cil_committed+0xab6/0xfb0
xlog_cil_process_committed+0x117/0x1e0
xlog_state_shutdown_callbacks+0x208/0x440
xlog_force_shutdown+0x1b3/0x3a0
xlog_ioend_work+0xef/0x1d0
process_one_work+0x6f9/0xf70
worker_thread+0x578/0xf30
kthread+0x28c/0x330
ret_from_fork+0x1f/0x30
</TASK>
Allocated by task 9606:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x7a/0x90
__kmalloc+0x59/0x140
kmem_alloc+0xb2/0x2f0
xfs_trans_ail_init+0x20/0x320
xfs_log_mount+0x37e/0x690
xfs_mountfs+0xe36/0x1b40
xfs_fs_fill_super+0xc5c/0x1a70
get_tree_bdev+0x3c5/0x6c0
vfs_get_tree+0x85/0x250
path_mount+0xec3/0x1830
do_mount+0xef/0x110
__x64_sys_mount+0x150/0x1f0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 9662:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x40
__kasan_slab_free+0x105/0x1a0
__kmem_cache_free+0x99/0x2d0
kvfree+0x3a/0x40
xfs_log_unmount+0x60/0xf0
xfs_unmountfs+0xf3/0x1d0
xfs_fs_put_super+0x78/0x300
generic_shutdown_super+0x151/0x400
kill_block_super+0x9a/0xe0
deactivate_locked_super+0x82/0xe0
deactivate_super+0x91/0xb0
cleanup_mnt+0x32a/0x4a0
task_work_run+0x15f/0x240
exit_to_user_mode_prepare+0x188/0x190
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x42/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff888023169400
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of
128-byte region [ffff888023169400, ffff888023169480)
The buggy address belongs to the physical page:
page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000
index:0xffff888023168f80 pfn:0x23168
head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0
raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint
Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks")
Signed-off-by: Guo Xuenan <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
I've been running near-continuous integration testing of online fsck,
and I've noticed that once a day, one of the ARM VMs will fail the test
with out of order records in the data fork.
xfs/804 races fsstress with online scrub (aka scan but do not change
anything), so I think this might be a bug in the core xfs code. This
also only seems to trigger if one runs the test for more than ~6 minutes
via TIME_FACTOR=13 or something.
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/tree/tests/xfs/804?h=djwong-wtf
I added a debugging patch to the kernel to check the data fork extents
after taking the ILOCK, before dropping ILOCK, and before and after each
bmapping operation. So far I've narrowed it down to the delalloc code
inserting a record in the wrong place in the iext tree:
xfs_bmap_add_extent_hole_delay, near line 2691:
case 0:
/*
* New allocation is not contiguous with another
* delayed allocation.
* Insert a new entry.
*/
oldlen = newlen = 0;
xfs_iunlock_check_datafork(ip); <-- ok here
xfs_iext_insert(ip, icur, new, state);
xfs_iunlock_check_datafork(ip); <-- bad here
break;
}
I recorded the state of the data fork mappings and iext cursor state
when a corrupt data fork is detected immediately after the
xfs_bmap_add_extent_hole_delay call in xfs_bmapi_reserve_delalloc:
ino 0x140bb3 func xfs_bmapi_reserve_delalloc line 4164 data fork:
ino 0x140bb3 nr 0x0 nr_real 0x0 offset 0xb9 blockcount 0x1f startblock 0x935de2 state 1
ino 0x140bb3 nr 0x1 nr_real 0x1 offset 0xe6 blockcount 0xa startblock 0xffffffffe0007 state 0
ino 0x140bb3 nr 0x2 nr_real 0x1 offset 0xd8 blockcount 0xe startblock 0x935e01 state 0
Here we see that a delalloc extent was inserted into the wrong position
in the iext leaf, same as all the other times. The extra trace data I
collected are as follows:
ino 0x140bb3 fork 0 oldoff 0xe6 oldlen 0x4 oldprealloc 0x6 isize 0xe6000
ino 0x140bb3 oldgotoff 0xea oldgotstart 0xfffffffffffffffe oldgotcount 0x0 oldgotstate 0
ino 0x140bb3 crapgotoff 0x0 crapgotstart 0x0 crapgotcount 0x0 crapgotstate 0
ino 0x140bb3 freshgotoff 0xd8 freshgotstart 0x935e01 freshgotcount 0xe freshgotstate 0
ino 0x140bb3 nowgotoff 0xe6 nowgotstart 0xffffffffe0007 nowgotcount 0xa nowgotstate 0
ino 0x140bb3 oldicurpos 1 oldleafnr 2 oldleaf 0xfffffc00f0609a00
ino 0x140bb3 crapicurpos 2 crapleafnr 2 crapleaf 0xfffffc00f0609a00
ino 0x140bb3 freshicurpos 1 freshleafnr 2 freshleaf 0xfffffc00f0609a00
ino 0x140bb3 newicurpos 1 newleafnr 3 newleaf 0xfffffc00f0609a00
The first line shows that xfs_bmapi_reserve_delalloc was called with
whichfork=XFS_DATA_FORK, off=0xe6, len=0x4, prealloc=6.
The second line ("oldgot") shows the contents of @got at the beginning
of the call, which are the results of the first iext lookup in
xfs_buffered_write_iomap_begin.
Line 3 ("crapgot") is the result of duplicating the cursor at the start
of the body of xfs_bmapi_reserve_delalloc and performing a fresh lookup
at @off.
Line 4 ("freshgot") is the result of a new xfs_iext_get_extent right
before the call to xfs_bmap_add_extent_hole_delay. Totally garbage.
Line 5 ("nowgot") is contents of @got after the
xfs_bmap_add_extent_hole_delay call.
Line 6 is the contents of @icur at the beginning fo the call. Lines 7-9
are the contents of the iext cursors at the point where the block
mappings were sampled.
I think @oldgot is a HOLESTARTBLOCK extent because the first lookup
didn't find anything, so we filled in imap with "fake hole until the
end". At the time of the first lookup, I suspect that there's only one
32-block unwritten extent in the mapping (hence oldicurpos==1) but by
the time we get to recording crapgot, crapicurpos==2.
Dave then added:
Ok, that's much simpler to reason about, and implies the smoke is
coming from xfs_buffered_write_iomap_begin() or
xfs_bmapi_reserve_delalloc(). I suspect the former - it does a lot
of stuff with the ILOCK_EXCL held.....
.... including calling xfs_qm_dqattach_locked().
xfs_buffered_write_iomap_begin
ILOCK_EXCL
look up icur
xfs_qm_dqattach_locked
xfs_qm_dqattach_one
xfs_qm_dqget_inode
dquot cache miss
xfs_iunlock(ip, XFS_ILOCK_EXCL);
error = xfs_qm_dqread(mp, id, type, can_alloc, &dqp);
xfs_ilock(ip, XFS_ILOCK_EXCL);
....
xfs_bmapi_reserve_delalloc(icur)
Yup, that's what is letting the magic smoke out -
xfs_qm_dqattach_locked() can cycle the ILOCK. If that happens, we
can pass a stale icur to xfs_bmapi_reserve_delalloc() and it all
goes downhill from there.
Back to Darrick now:
So. Fix this by moving the dqattach_locked call up before we take the
ILOCK, like all the other callers in that file.
Fixes: a526c85c2236 ("xfs: move xfs_file_iomap_begin_delay around") # goes further back than this
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
-Wuninitialized complains about @target in xfsaild_push being
uninitialized in the case where the waitqueue is active but there is no
last item in the AIL to wait for. I /think/ it should never be the case
that the subsequent xfs_trans_ail_cursor_first returns a log item and
hence we'll never end up at XFS_LSN_CMP, but let's make this explicit.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
When -Wstringop-truncation is enabled, the compiler complains about
truncation of the null byte at the end of the xattr name prefix. This
is intentional, since we're concatenating the two strings together and
do _not_ want a null byte in the middle of the name.
We've already ensured that the name buffer is long enough to handle
prefix and name, and the prefix_len is supposed to be the length of the
prefix string without the null byte, so use memcpy here instead.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Every now and then I see fstests failures on aarch64 (64k pages) that
trigger on the following sequence:
mkfs.xfs $dev
mount $dev $mnt
touch $mnt/a
umount $mnt
xfs_db -c 'path /a' -c 'print' $dev
99% of the time this succeeds, but every now and then xfs_db cannot find
/a and fails. This turns out to be a race involving udev/blkid, the
page cache for the block device, and the xfs_db process.
udev is triggered whenever anyone closes a block device or unmounts it.
The default udev rules invoke blkid to read the fs super and create
symlinks to the bdev under /dev/disk. For this, it uses buffered reads
through the page cache.
xfs_db also uses buffered reads to examine metadata. There is no
coordination between xfs_db and udev, which means that they can run
concurrently. Note there is no coordination between the kernel and
blkid either.
On a system with 64k pages, the page cache can cache the superblock and
the root inode (and hence the root dir) with the same 64k page. If
udev spawns blkid after the mkfs and the system is busy enough that it
is still running when xfs_db starts up, they'll both read from the same
page in the pagecache.
The unmount writes updated inode metadata to disk directly. The XFS
buffer cache does not use the bdev pagecache, nor does it invalidate the
pagecache on umount. If the above scenario occurs, the pagecache no
longer reflects what's on disk, xfs_db reads the stale metadata, and
fails to find /a. Most of the time this succeeds because closing a bdev
invalidates the page cache, but when processes race, everyone loses.
Fix the problem by invalidating the bdev pagecache after flushing the
bdev, so that xfs_db will see up to date metadata.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
generic_remap_checks() can reduce the effective request length (i.e.,
after the reflink extend to EOF case is handled) down to zero. If this
occurs, __generic_remap_file_range_prep() proceeds through dio
serialization, file mapping flush calls, and may invoke file_modified()
before returning back to the filesystem caller, all of which immediately
check for len == 0 and return.
While this is mostly harmless, it is spurious and not completely
without side effect. A filemap write call can submit I/O (but not
wait on it) when the specified end byte precedes the start but
happens to land on the same aligned page boundary, which can occur
from __generic_remap_file_range_prep() when len is 0.
The dedupe path already has a len == 0 check to break out before
doing range comparisons. Lift this check a bit earlier in the
function to cover the general case of len == 0 and avoid the
unnecessary work. While here, account for the case where
generic_remap_check_len() may also reduce length to zero.
Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
It's fairly useless to complain about using an obsolete feature without
telling the user which process used it. My Fedora desktop randomly drops
this message, but I would really need this patch to figure out what
triggers is.
[ jlayton: print pid as well as process name ]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
nfsd currently doesn't access i_flctx safely everywhere. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).
Acked-by: Chuck Lever <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
nfs currently doesn't access i_flctx safely. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).
Cc: Trond Myklebust <[email protected]>
Cc: Anna Schumaker <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
lockd currently doesn't access i_flctx safely. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).
Cc: Trond Myklebust <[email protected]>
Cc: Anna Schumaker <[email protected]>
Cc: Chuck Lever <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
ksmbd currently doesn't access i_flctx safely. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).
Cc: Steve French <[email protected]>
Acked-by: Namjae Jeon <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
cifs currently doesn't access i_flctx safely. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).
Cc: Steve French <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
ceph currently doesn't access i_flctx safely. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).
Reviewed-by: Xiubo Li <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
There are a number of places in the kernel that are accessing the
inode->i_flctx field without smp_load_acquire. This is required to
ensure that the caller doesn't see a partially-initialized structure.
Add a new accessor function for it to make this clear and convert all of
the relevant accesses in locks.c to use it. Also, convert
locks_free_lock_context to use the helper as well instead of just doing
a "bare" assignment.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
Ceph has a need to know whether a particular inode has any locks set on
it. It's currently tracking that by a num_locks field in its
filp->private_data, but that's problematic as it tries to decrement this
field when releasing locks and that can race with the file being torn
down.
Add a new vfs_inode_has_locks helper that just returns whether any locks
are currently held on the inode.
Reviewed-by: Xiubo Li <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
Instead of looking up the algorithm by name in hash_algo_name[] to get
its hash_algo ID, just store the hash_algo ID in the fsverity_hash_alg
struct. Verify at boot time that every fsverity_hash_alg has a valid
hash_algo ID with matching digest size.
Remove an unnecessary memset() of the whole digest array to 0 before the
digest is copied into it.
Finally, remove the pr_debug statement. There is already a pr_debug for
the fsverity digest when the file is opened.
Signed-off-by: Eric Biggers <[email protected]>
Reviewed-by: Mimi Zohar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Fix the following coccicheck warning:
fs/ext4/inline.c:183: WARNING opportunity for min().
fs/ext4/extents.c:2631: WARNING opportunity for max().
fs/ext4/extents.c:2632: WARNING opportunity for min().
fs/ext4/extents.c:5559: WARNING opportunity for max().
fs/ext4/super.c:6908: WARNING opportunity for min().
min()/max() and min_t() macro is defined in include/linux/minmax.h.
It avoids multiple evaluations of the arguments when non-constant and
performs strict type-checking.
Reported-by: kernel test robot <[email protected]>
Suggested-by: Lukas Czerner <[email protected]>
Signed-off-by: Jiangshan Yi <[email protected]>
Reviewed-by: Lukas Czerner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
In ext4_evict_inode(), if we evicting an inode in the 'no_delete' path,
it cannot be raced by another mark_inode_dirty(). If it happens,
someone else may accidentally dirty it without holding inode refcount
and probably cause use-after-free issues in the writeback procedure.
It's indiscoverable and hard to debug, so add an WARN_ON_ONCE() to
check and detect this issue in advance.
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
|
|
As with PG_arch_2, this flag is only allowed on 64-bit architectures due
to the shortage of bits available. It will be used by the arm64 MTE code
in subsequent patches.
Signed-off-by: Peter Collingbourne <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Steven Price <[email protected]>
[[email protected]: added flag preserving in __split_huge_page_tail()]
Signed-off-by: Catalin Marinas <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|