Age | Commit message (Collapse) | Author | Files | Lines |
|
Function ubifs_dump_node() has been modified to avoid memory oob
accessing while dumping node, node length (corresponding to the
size of allocated memory for node) should be passed into all node
dumping callers.
Signed-off-by: Zhihao Cheng <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
node_len"
This reverts commit acc5af3efa30 ("ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len")
No need to avoid memory oob in dumping for data node alone. Later, node
length will be passed into function 'ubifs_dump_node()' which replaces
all node dumping places.
Signed-off-by: Zhihao Cheng <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
To prevent memory out-of-bounds accessing in ubifs_dump_node(), actual
dumping length should be restricted by another condition(size of memory
which is allocated for the node).
This patch handles following situations (These situations may be caused
by bit flipping due to hardware error, writing bypass ubifs, unknown
bugs in ubifs, etc.):
1. bad node_len: Dumping data according to 'ch->len' which may exceed
the size of memory allocated for node.
2. bad node content: Some kinds of node can record additional data, eg.
index node and orphan node, make sure the size of additional data
not beyond the node length.
3. node_type changes: Read data according to type A, but expected type
B, before that, node is allocated according to type B's size. Length
of type A node is greater than type B node.
Signed-off-by: Zhihao Cheng <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
There is a redundant return in dbg_check_nondata_nodes_order,
which will be never reached. In addition, error code should be
returned instead of zero in this branch.
Signed-off-by: Chengsong Ke <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
syzkaller found the following JFFS2 splat:
Unable to handle kernel paging request at virtual address dfffa00000000001
Mem abort info:
ESR = 0x96000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
[dfffa00000000001] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 12745 Comm: syz-executor.5 Tainted: G S 5.9.0-rc8+ #98
Hardware name: linux,dummy-virt (DT)
pstate: 20400005 (nzCv daif +PAN -UAO BTYPE=--)
pc : jffs2_parse_param+0x138/0x308 fs/jffs2/super.c:206
lr : jffs2_parse_param+0x108/0x308 fs/jffs2/super.c:205
sp : ffff000022a57910
x29: ffff000022a57910 x28: 0000000000000000
x27: ffff000057634008 x26: 000000000000d800
x25: 000000000000d800 x24: ffff0000271a9000
x23: ffffa0001adb5dc0 x22: ffff000023fdcf00
x21: 1fffe0000454af2c x20: ffff000024cc9400
x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: ffffa000102dbdd0
x15: 0000000000000000 x14: ffffa000109e44bc
x13: ffffa00010a3a26c x12: ffff80000476e0b3
x11: 1fffe0000476e0b2 x10: ffff80000476e0b2
x9 : ffffa00010a3ad60 x8 : ffff000023b70593
x7 : 0000000000000003 x6 : 00000000f1f1f1f1
x5 : ffff000023fdcf00 x4 : 0000000000000002
x3 : ffffa00010000000 x2 : 0000000000000001
x1 : dfffa00000000000 x0 : 0000000000000008
Call trace:
jffs2_parse_param+0x138/0x308 fs/jffs2/super.c:206
vfs_parse_fs_param+0x234/0x4e8 fs/fs_context.c:117
vfs_parse_fs_string+0xe8/0x148 fs/fs_context.c:161
generic_parse_monolithic+0x17c/0x208 fs/fs_context.c:201
parse_monolithic_mount_data+0x7c/0xa8 fs/fs_context.c:649
do_new_mount fs/namespace.c:2871 [inline]
path_mount+0x548/0x1da8 fs/namespace.c:3192
do_mount+0x124/0x138 fs/namespace.c:3205
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount fs/namespace.c:3390 [inline]
__arm64_sys_mount+0x164/0x238 fs/namespace.c:3390
__invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
el0_svc_common.constprop.0+0x15c/0x598 arch/arm64/kernel/syscall.c:149
do_el0_svc+0x60/0x150 arch/arm64/kernel/syscall.c:195
el0_svc+0x34/0xb0 arch/arm64/kernel/entry-common.c:226
el0_sync_handler+0xc8/0x5b4 arch/arm64/kernel/entry-common.c:236
el0_sync+0x15c/0x180 arch/arm64/kernel/entry.S:663
Code: d2d40001 f2fbffe1 91002260 d343fc02 (38e16841)
---[ end trace 4edf690313deda44 ]---
This is because since ec10a24f10c8, the option parsing happens before
fill_super and so the MTD device isn't associated with the filesystem.
Defer the size check until there is a valid association.
Fixes: ec10a24f10c8 ("vfs: Convert jffs2 to use the new mount API")
Cc: <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Jamie Iles <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
fs/ubifs/super.c: function mount_ubifs:
the format specifier "lld" need arg type "long long",
but the according arg "old_idx_sz" has type
"unsigned long long"
Signed-off-by: Fangping Liang <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
The macro use will already have a semicolon.
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Fix to return PTR_ERR() error code from the error handling case where
ubifs_hash_get_desc() failed instead of 0 in ubifs_init_authentication(),
as done elsewhere in this function.
Fixes: 49525e5eecca5 ("ubifs: Add helper functions for authentication support")
Signed-off-by: Wang ShaoBo <[email protected]>
Reviewed-by: Sascha Hauer <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Write buffers use a kmalloc()'ed buffer, they can leak
up to seven bytes of kernel memory to flash if writes are not
aligned.
So use ubifs_pad() to fill these gaps with padding bytes.
This was never a problem while scanning because the scanner logic
manually aligns node lengths and skips over these gaps.
Cc: <[email protected]>
Fixes: 1e51764a3c2ac05a2 ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <[email protected]>
Reviewed-by: Zhihao Cheng <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Ubifs uses %d to print c->big_lpt, but c->big_lpt is a variable
of type unsigned int and should be printed with %u.
Signed-off-by: Chengsong Ke <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Set rp_size to zero will be ignore during remounting.
The method to identify whether we input a remounting option of
rp_size is to check if the rp_size input is zero. It can not work
well if we pass "rp_size=0".
This patch add a bool variable "set_rp_size" to fix this problem.
Reported-by: Jubin Zhong <[email protected]>
Signed-off-by: lizhe <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
The jffs2 mount options will be ignored when remounting jffs2.
It can be easily reproduced with the steps listed below.
1. mount -t jffs2 -o compr=none /dev/mtdblockx /mnt
2. mount -o remount compr=zlib /mnt
Since ec10a24f10c8, the option parsing happens before fill_super and
then pass fc, which contains the options parsing results, to function
jffs2_reconfigure during remounting. But function jffs2_reconfigure do
not update c->mount_opts.
This patch add a function jffs2_update_mount_opts to fix this problem.
By the way, I notice that tmpfs use the same way to update remounting
options. If it is necessary to unify them?
Cc: <[email protected]>
Fixes: ec10a24f10c8 ("vfs: Convert jffs2 to use the new mount API")
Signed-off-by: lizhe <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
The log of this problem is:
jffs2: Error garbage collecting node at 0x***!
jffs2: No space for garbage collection. Aborting GC thread
This is because GC believe that it do nothing, so it abort.
After going over the image of jffs2, I find a scene that
can trigger this problem stably.
The scene is: there is a normal dirent node at summary-area,
but abnormal at corresponding not-summary-area with error
name_crc.
The reason that GC exit abnormally is because it find that
abnormal dirent node to GC, but when it goes to function
jffs2_add_fd_to_list, it cannot meet the condition listed
below:
if ((*prev)->nhash == new->nhash && !strcmp((*prev)->name, new->name))
So no node is marked obsolete, statistical information of
erase_block do not change, which cause GC exit abnormally.
The root cause of this problem is: we do not check the
name_crc of the abnormal dirent node with summary is enabled.
Noticed that in function jffs2_scan_dirent_node, we use
function jffs2_scan_dirty_space to deal with the dirent
node with error name_crc. So this patch add a checking
code in function read_direntry to ensure the correctness
of dirent node. If checked failed, the dirent node will
be marked obsolete so GC will pass this node and this
problem will be fixed.
Cc: <[email protected]>
Signed-off-by: Zhe Li <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Define ubifs_listxattr and ubifs_xattr_handlers to NULL
when CONFIG_UBIFS_FS_XATTR is not enabled, then we can
remove many ugly ifdef macros in the code.
Signed-off-by: Chengguang Xu <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
When debug (print) macros are not enabled, change them to use the
no_printk() macro instead of <nothing>. This fixes gcc warnings when
-Wextra is used:
../fs/jffs2/nodelist.c:255:37: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body]
../fs/jffs2/nodelist.c:278:38: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body]
../fs/jffs2/nodelist.c:558:52: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body]
../fs/jffs2/xattr.c:1247:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
../fs/jffs2/xattr.c:1281:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
Builds without warnings on all 3 levels of CONFIG_JFFS2_FS_DEBUG.
Signed-off-by: Randy Dunlap <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: [email protected]
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Delete repeated words in fs/ubifs/.
{negative, is, of, and, one, it}
where "it it" was changed to "if it".
Signed-off-by: Randy Dunlap <[email protected]>
To: [email protected]
Cc: Richard Weinberger <[email protected]>
Cc: [email protected]
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Eric Sandeen <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Rather than going through the big and hairy xfs_setattr_nonsize function,
just open code a transactional i_mode and i_ctime update. This allows
to mark xfs_setattr_nonsize and remove the flags argument to it.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Merge xfs_vn_setattr_nonsize into the only caller.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
It's enough to just use return code, and get rid of an argument.
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
This patch explicitly separates free inode chunk allocation and
inode allocation into two individual high level operations.
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Get rid of the confusing ialloc_context and failure handling around
xfs_dialloc() by moving xfs_dialloc_roll() into xfs_dialloc().
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
So xfs_ialloc() will only address in-core inode allocation then,
Also, rename xfs_ialloc() to xfs_dir_ialloc_init() in order to
keep everything in xfs_inode.c under the same namespace.
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Introduce a helper to make the on-disk inode allocation rolling
logic clearer in preparation of the following cleanup.
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Boolean is preferred for such use.
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Pull io_uring fixes from Jens Axboe:
"Two fixes in here, fixing issues introduced in this merge window"
* tag 'io_uring-5.10-2020-12-11' of git://git.kernel.dk/linux-block:
io_uring: fix file leak on error path of io ctx creation
io_uring: fix mis-seting personality's creds
|
|
The TIF_NOTIFY_SIGNAL based implementation of TWA_SIGNAL is always safe
to use, regardless of context, as we won't be recursing into the signal
lock. So now that all archs are using that, we can drop this deadlock
work-around as it's always safe to use TWA_SIGNAL.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Remove the dead code, TWA_SIGNAL will never set JOBCTL_TASK_WORK at
this point.
Signed-off-by: Jens Axboe <[email protected]>
|
|
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().
strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.
Conflicts:
net/netfilter/nf_tables_api.c
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs fix from Damien Le Moal:
"A single patch in this pull request to fix a BIO and page reference
leak when writing sequential zone files"
* tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: fix page reference and BIO leak
|
|
When we try to visit the pagemap of a tagged userspace pointer, we find
that the start_vaddr is not correct because of the tag.
To fix it, we should untag the userspace pointers in pagemap_read().
I tested with 5.10-rc4 and the issue remains.
Explanation from Catalin in [1]:
"Arguably, that's a user-space bug since tagged file offsets were never
supported. In this case it's not even a tag at bit 56 as per the arm64
tagged address ABI but rather down to bit 47. You could say that the
problem is caused by the C library (malloc()) or whoever created the
tagged vaddr and passed it to this function. It's not a kernel
regression as we've never supported it.
Now, pagemap is a special case where the offset is usually not
generated as a classic file offset but rather derived by shifting a
user virtual address. I guess we can make a concession for pagemap
(only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)"
My test code is based on [2]:
A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8
userspace program:
uint64 OsLayer::VirtualToPhysical(void *vaddr) {
uint64 frame, paddr, pfnmask, pagemask;
int pagesize = sysconf(_SC_PAGESIZE);
off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0
int fd = open(kPagemapPath, O_RDONLY);
...
if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) {
int err = errno;
string errtxt = ErrorString(err);
if (fd >= 0)
close(fd);
return 0;
}
...
}
kernel fs/proc/task_mmu.c:
static ssize_t pagemap_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
...
src = *ppos;
svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54
start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000
end_vaddr = mm->task_size;
/* watch out for wraparound */
// svpfn == 0xb400007662f54
// (mm->task_size >> PAGE) == 0x8000000
if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4
start_vaddr = end_vaddr;
ret = 0;
while (count && (start_vaddr < end_vaddr)) { // we cannot visit correct entry because start_vaddr is set to end_vaddr
int len;
unsigned long end;
...
}
...
}
[1] https://lore.kernel.org/patchwork/patch/1343258/
[2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miles Chen <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Song Bao Hua (Barry Song) <[email protected]>
Cc: <[email protected]> [5.4-]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
fsnotify_parent() used to send two separate events to backends when a
parent inode is watching children and the child inode is also watching.
In an attempt to avoid duplicate events in fanotify, we unified the two
backend callbacks to a single callback and handled the reporting of the
two separate events for the relevant backends (inotify and dnotify).
However the handling is buggy and can result in inotify and dnotify
listeners receiving events of the type they never asked for or spurious
events.
The problem is the unified event callback with two inode marks (parent and
child) is called when any of the parent and child inodes are watched and
interested in the event, but the parent inode's mark that is interested
in the event on the child is not necessarily the one we are currently
reporting to (it could belong to a different group).
So before reporting the parent or child event flavor to backend we need
to check that the mark is really interested in that event flavor.
The semantics of INODE and CHILD marks were hard to follow and made the
logic more complicated than it should have been. Replace it with INODE
and PARENT marks semantics to hopefully make the logic more clear.
Thanks to Hugh Dickins for spotting a bug in the earlier version of this
patch.
Fixes: 497b0c5a7c06 ("fsnotify: send event to parent and child with single callback")
CC: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Hugh Dickins <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
|
|
Pull NFS client fixes from Anna Schumaker:
"Here are a handful more bugfixes for 5.10.
Unfortunately, we found some problems with the new READ_PLUS operation
that aren't easy to fix. We've decided to disable this codepath
through a Kconfig option for now, but a series of patches going into
5.11 will clean up the code and fix the issues at the same time. This
seemed like the best way to go about it.
Summary:
- Fix array overflow when flexfiles mirroring is enabled
- Fix rpcrdma_inline_fixup() crash with new LISTXATTRS
- Fix 5 second delay when doing inter-server copy
- Disable READ_PLUS by default"
* tag 'nfs-for-5.10-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Disable READ_PLUS by default
NFSv4.2: Fix 5 seconds delay when doing inter server copy
NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation
pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled
|
|
dir_mode
make sure nd->dir_mode is always initialized after success exit from
link_path_walk(); in case of empty path it did not happen.
Reported-by: Anant Thazhemadam <[email protected]>
Tested-by: Anant Thazhemadam <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
If DCACHE_REFERENCED is set, fast_dput() will return true, and then
retain_dentry() have no chance to check DCACHE_DONTCACHE. As a result,
the dentry won't be killed and the corresponding inode can't be evicted.
In the following example, the DAX policy can't take effects unless we
do a drop_caches manually.
# DCACHE_LRU_LIST will be set
echo abcdefg > test.txt
# DCACHE_REFERENCED will be set and DCACHE_DONTCACHE can't do anything
xfs_io -c 'chattr +x' test.txt
# Drop caches to make DAX changing take effects
echo 2 > /proc/sys/vm/drop_caches
What this patch does is preventing fast_dput() from returning true if
DCACHE_DONTCACHE is set. Then retain_dentry() will detect the
DCACHE_DONTCACHE and will return false. As a result, the dentry will be
killed and the inode will be evicted. In this way, if we change per-file
DAX policy, it will take effects automatically after this file is closed
by all processes.
I also add some comments to make the code more clear.
Signed-off-by: Hao Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
If generic_drop_inode() returns true, it means iput_final() can evict
this inode regardless of whether it is dirty or not. If we check
I_DONTCACHE in generic_drop_inode(), any inode with this bit set will be
evicted unconditionally. This is not the desired behavior because
I_DONTCACHE only means the inode shouldn't be cached on the LRU list.
As for whether we need to evict this inode, this is what
generic_drop_inode() should do. This patch corrects the usage of
I_DONTCACHE.
This patch was proposed in [1].
[1]: https://lore.kernel.org/linux-fsdevel/[email protected]/
Fixes: dae2f8ed7992 ("fs: Lift XFS_IDONTCACHE to the VFS layer")
Signed-off-by: Hao Li <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Missing calls to mntget() (or equivalently, too many calls to mntput())
are hard to detect because mntput() delays freeing mounts using
task_work_add(), then again using call_rcu(). As a result, mnt_count
can often be decremented to -1 without getting a KASAN use-after-free
report. Such cases are still bugs though, and they point to real
use-after-frees being possible.
For an example of this, see the bug fixed by commit 1b0b9cc8d379
("vfs: fsmount: add missing mntget()"), discussed at
https://lkml.kernel.org/linux-fsdevel/20190605135401.GB30925@xxxxxxxxxxxxxxxxxxxxxxxxx/T/#u.
This bug *should* have been trivial to find. But actually, it wasn't
found until syzkaller happened to use fchdir() to manipulate the
reference count just right for the bug to be noticeable.
Address this by making mntput_no_expire() issue a WARN if mnt_count has
become negative.
Suggested-by: Miklos Szeredi <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
We've been seeing failures with xfstests generic/091 and generic/263
when using READ_PLUS. I've made some progress on these issues, and the
tests fail later on but still don't pass. Let's disable READ_PLUS by
default until we can work out what is going on.
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Since commit b4868b44c5628 ("NFSv4: Wait for stateid updates after
CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5
seconds delay regardless of the size of the copy. The delay is from
nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential
fails because the seqid in both nfs4_state and nfs4_stateid are 0.
Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state,
until after the call to update_open_stateid, to indicate this is the 1st
open. This fix is part of a 2 patches, the other patch is the fix in the
source server to return the stateid for COPY_NOTIFY request with seqid 1
instead of 0.
Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
By switching to an XFS-backed export, I am able to reproduce the
ibcomp worker crash on my client with xfstests generic/013.
For the failing LISTXATTRS operation, xdr_inline_pages() is called
with page_len=12 and buflen=128.
- When ->send_request() is called, rpcrdma_marshal_req() does not
set up a Reply chunk because buflen is smaller than the inline
threshold. Thus rpcrdma_convert_iovs() does not get invoked at
all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked
on the receive buffer.
- During reply processing, rpcrdma_inline_fixup() tries to copy
received data into rq_rcv_buf->pages because page_len is positive.
But there are no receive pages because rpcrdma_marshal_req() never
allocated them.
The result is that the ibcomp worker faults and dies. Sometimes that
causes a visible crash, and sometimes it results in a transport hang
without other symptoms.
RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
should eventually be fixed or replaced. However, my preference is
that upper-layer operations should explicitly allocate their receive
buffers (using GFP_KERNEL) when possible, rather than relying on
XDRBUF_SPARSE_PAGES.
Reported-by: Olga kornievskaia <[email protected]>
Suggested-by: Olga kornievskaia <[email protected]>
Fixes: c10a75145feb ("NFSv4.2: add the extended attribute proc functions.")
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Olga kornievskaia <[email protected]>
Reviewed-by: Frank van der Linden <[email protected]>
Tested-by: Olga kornievskaia <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Recently syzbot reported[0] that there is a deadlock amongst the users
of exec_update_mutex. The problematic lock ordering found by lockdep
was:
perf_event_open (exec_update_mutex -> ovl_i_mutex)
chown (ovl_i_mutex -> sb_writes)
sendfile (sb_writes -> p->lock)
by reading from a proc file and writing to overlayfs
proc_pid_syscall (p->lock -> exec_update_mutex)
While looking at possible solutions it occured to me that all of the
users and possible users involved only wanted to state of the given
process to remain the same. They are all readers. The only writer is
exec.
There is no reason for readers to block on each other. So fix
this deadlock by transforming exec_update_mutex into a rw_semaphore
named exec_update_lock that only exec takes for writing.
Cc: Jann Horn <[email protected]>
Cc: Vasiliy Kulikov <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Bernd Edlinger <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Christopher Yeoh <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Sargun Dhillon <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex")
[0] https://lkml.kernel.org/r/[email protected]
Reported-by: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
Now that unshare_files happens in begin_new_exec after the point of no
return, io_uring_task_cancel can also happen later.
Effectively this means io_uring activities for a task are only canceled
when exec succeeds.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
Oleg Nesterov recently asked[1] why is there an unshare_files in
do_coredump. After digging through all of the callers of lookup_fd it
turns out that it is
arch/powerpc/platforms/cell/spufs/coredump.c:coredump_next_context
that needs the unshare_files in do_coredump.
Looking at the history[2] this code was also the only piece of coredump code
that required the unshare_files when the unshare_files was added.
Looking at that code it turns out that cell is also the only
architecture that implements elf_coredump_extra_notes_size and
elf_coredump_extra_notes_write.
I looked at the gdb repo[3] support for cell has been removed[4] in binutils
2.34. Geoff Levand reports he is still getting questions on how to
run modern kernels on the PS3, from people using 3rd party firmware so
this code is not dead. According to Wikipedia the last PS3 shipped in
Japan sometime in 2017. So it will probably be a little while before
everyone's hardware dies.
Add some comments briefly documenting the coredump code that exists
only to support cell spufs to make it easier to understand the
coredump code. Eventually the hardware will be dead, or their won't
be userspace tools, or the coredump code will be refactored and it
will be too difficult to update a dead architecture and these comments
make it easy to tell where to pull to remove cell spufs support.
[1] https://lkml.kernel.org/r/[email protected]
[2] 179e037fc137 ("do_coredump(): make sure that descriptor table isn't shared")
[3] git://sourceware.org/git/binutils-gdb.git
[4] abf516c6931a ("Remove Cell Broadband Engine debugging support").
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
When discussing[1] exec and posix file locks it was realized that none
of the callers of get_files_struct fundamentally needed to call
get_files_struct, and that by switching them to helper functions
instead it will both simplify their code and remove unnecessary
increments of files_struct.count. Those unnecessary increments can
result in exec unnecessarily unsharing files_struct which breaking
posix locks, and it can result in fget_light having to fallback to
fget reducing system performance.
Now that get_files_struct has no more users and can not cause the
problems for posix file locking and fget_light remove get_files_struct
so that it does not gain any new users.
[1] https://lkml.kernel.org/r/[email protected]
Suggested-by: Oleg Nesterov <[email protected]>
Acked-by: Christian Brauner <[email protected]>
v1: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
The function close_fd_get_file is explicitly a variant of
__close_fd[1]. Now that __close_fd has been renamed close_fd, rename
close_fd_get_file to be consistent with close_fd.
When __alloc_fd, __close_fd and __fd_install were introduced the
double underscore indicated that the function took a struct
files_struct parameter. The function __close_fd_get_file never has so
the naming has always been inconsistent. This just cleans things up
so there are not any lingering mentions or references __close_fd left
in the code.
[1] 80cd795630d6 ("binder: fix use-after-free due to ksys_close() during fdget()")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
Now that ksys_close is exactly identical to close_fd replace
the one caller of ksys_close with close_fd.
[1] https://lkml.kernel.org/r/[email protected]
Suggested-by: Christoph Hellwig <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
The function __close_fd was added to support binder[1]. Now that
binder has been fixed to no longer need __close_fd[2] all calls
to __close_fd pass current->files.
Therefore transform the files parameter into a local variable
initialized to current->files, and rename __close_fd to close_fd to
reflect this change, and keep it in sync with the similar changes to
__alloc_fd, and __fd_install.
This removes the need for callers to care about the extra care that
needs to be take if anything except current->files is passed, by
limiting the callers to only operation on current->files.
[1] 483ce1d4b8c3 ("take descriptor-related part of close() to file.c")
[2] 44d8047f1d87 ("binder: use standard functions to allocate fds")
Acked-by: Christian Brauner <[email protected]>
v1: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
The function __alloc_fd was added to support binder[1]. With binder
fixed[2] there are no more users.
As alloc_fd just calls __alloc_fd with "files=current->files",
merge them together by transforming the files parameter into a
local variable initialized to current->files.
[1] dcfadfa4ec5a ("new helper: __alloc_fd()")
[2] 44d8047f1d87 ("binder: use standard functions to allocate fds")
Acked-by: Christian Brauner <[email protected]>
v1: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
Simplify the code, and remove the chance of races by reading
RLIMIT_NOFILE only once in f_dupfd.
Pass the read value of RLIMIT_NOFILE into alloc_fd which is the other
location the rlimit was read in f_dupfd. As f_dupfd is the only
caller of alloc_fd this changing alloc_fd is trivially safe.
Further this causes alloc_fd to take all of the same arguments as
__alloc_fd except for the files_struct argument.
Acked-by: Christian Brauner <[email protected]>
v1: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
The function __fd_install was added to support binder[1]. With binder
fixed[2] there are no more users.
As fd_install just calls __fd_install with "files=current->files",
merge them together by transforming the files parameter into a
local variable initialized to current->files.
[1] f869e8a7f753 ("expose a low-level variant of fd_install() for binder")
[2] 44d8047f1d87 ("binder: use standard functions to allocate fds")
Acked-by: Christian Brauner <[email protected]>
v1:https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
|