Age | Commit message (Collapse) | Author | Files | Lines |
|
The following function is the only place that checks USER_LOCK_ATTACHED.
This flag is set when lock request is granted through user_ast() and only
the following function will clear it.
Checking of this flag here is to make sure ocfs2_dlm_unlock is not issued
if this lock is never granted. For example, lock file is created and then
get removed, open file never happens.
Clearing the flag here is not necessary because this is the only function
that checks it, if another flow is executing user_dlm_destroy_lock(), it
will bail out at the beginning because of USER_LOCK_IN_TEARDOWN and never
check USER_LOCK_ATTACHED. Drop the clear, so we don't need take care of
it for the following error handling patch.
int user_dlm_destroy_lock(struct user_lock_res *lockres)
{
...
status = 0;
if (!(lockres->l_flags & USER_LOCK_ATTACHED)) {
spin_unlock(&lockres->l_lock);
goto bail;
}
lockres->l_flags &= ~USER_LOCK_ATTACHED;
lockres->l_flags |= USER_LOCK_BUSY;
spin_unlock(&lockres->l_lock);
status = ocfs2_dlm_unlock(conn, &lockres->l_lksb, DLM_LKF_VALBLK);
if (status) {
user_log_dlm_error("ocfs2_dlm_unlock", status, lockres);
goto bail;
}
...
}
V1 discussion with Joseph:
https://lore.kernel.org/all/[email protected]/T/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Junxiao Bi <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The variable idx is assigned a value and is never read. The variable is
not used and is redundant, remove it.
Cleans up clang scan build warning:
warning: Although the value stored to 'idx' is used in the enclosing
expression, the value is never actually read from 'idx'
[deadcode.DeadStores]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Anton Altaparmakov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
All the timestamps in vfat_create() and vfat_mkdir() come from
fat_time_fat2unix() which ensures time granularity. We don't need to
truncate them to fit FAT's format.
Moreover, fat_truncate_crtime() and fat_timespec64_trunc_10ms() are also
removed because there is no caller anymore.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chung-Chiang Cheng <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
creation time is no longer mixed with change time. Add an in-memory field
for it, and report it in statx if supported.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chung-Chiang Cheng <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
FAT supports creation time but not change time, and there was no
corresponding timestamp for creation time in previous VFS. The original
implementation took the compromise of saving the in-memory change time
into the on-disk creation time field, but this would lead to compatibility
issues with non-linux systems.
To address this issue, this patch changes the behavior of ctime. It will
no longer be loaded and stored from the creation time on disk. Instead of
that, it'll be consistent with the in-memory mtime and share the same
on-disk field. All updates to mtime will also be applied to ctime in
memory, while all updates to ctime will be ignored.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chung-Chiang Cheng <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Separate fat_truncate_time() to each timestamps for later creation time
work.
This patch does not introduce any functional changes, it's merely
refactoring change.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chung-Chiang Cheng <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
I have been focusing on mm for the past two years. e.g. developing,
fixing bugs, reviewing. I have fixed lots of races (including memcg). I
would like to help people working on memcg or related by reviewing their
work. Let me be Cc'd on patches related to memcg.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: FanJun Kong <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
protected_* files have 600 permissions which prevents non-superuser from
reading them.
Container like "AWS greengrass" refuse to launch unless
protected_hardlinks and protected_symlinks are set. When containers like
these run with "userns-remap" or "--user" mapping container's root to
non-superuser on host, they fail to run due to denied read access to these
files.
As these protections are hardly a secret, and do not possess any security
risk, making them world readable.
Though above greengrass usecase needs read access to only
protected_hardlinks and protected_symlinks files, setting all other
protected_* files to 644 to keep consistency.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 800179c9b8a1 ("fs: add link restrictions")
Signed-off-by: Julius Hemanth Pitti <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Luis Chamberlain <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
mlogbuf_rlock has declared and initialized by DEFINE_SPINLOCK,
so we don't need to spin_lock_init again, drop it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Haowen Bai <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
pty_write() invokes kmalloc() which may invoke a normal printk() to print
failure message. This can cause a deadlock in the scenario reported by
syz-bot below:
CPU0 CPU1 CPU2
---- ---- ----
lock(console_owner);
lock(&port_lock_key);
lock(&port->lock);
lock(&port_lock_key);
lock(&port->lock);
lock(console_owner);
As commit dbdda842fe96 ("printk: Add console owner and waiter logic to
load balance console writes") said, such deadlock can be prevented by
using printk_deferred() in kmalloc() (which is invoked in the section
guarded by the port->lock). But there are too many printk() on the
kmalloc() path, and kmalloc() can be called from anywhere, so changing
printk() to printk_deferred() is too complicated and inelegant.
Therefore, this patch chooses to specify __GFP_NOWARN to kmalloc(), so
that printk() will not be called, and this deadlock problem can be
avoided.
Syzbot reported the following lockdep error:
======================================================
WARNING: possible circular locking dependency detected
5.4.143-00237-g08ccc19a-dirty #10 Not tainted
------------------------------------------------------
syz-executor.4/29420 is trying to acquire lock:
ffffffff8aedb2a0 (console_owner){....}-{0:0}, at: console_trylock_spinning kernel/printk/printk.c:1752 [inline]
ffffffff8aedb2a0 (console_owner){....}-{0:0}, at: vprintk_emit+0x2ca/0x470 kernel/printk/printk.c:2023
but task is already holding lock:
ffff8880119c9158 (&port->lock){-.-.}-{2:2}, at: pty_write+0xf4/0x1f0 drivers/tty/pty.c:120
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&port->lock){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
tty_port_tty_get drivers/tty/tty_port.c:288 [inline] <-- lock(&port->lock);
tty_port_default_wakeup+0x1d/0xb0 drivers/tty/tty_port.c:47
serial8250_tx_chars+0x530/0xa80 drivers/tty/serial/8250/8250_port.c:1767
serial8250_handle_irq.part.0+0x31f/0x3d0 drivers/tty/serial/8250/8250_port.c:1854
serial8250_handle_irq drivers/tty/serial/8250/8250_port.c:1827 [inline] <-- lock(&port_lock_key);
serial8250_default_handle_irq+0xb2/0x220 drivers/tty/serial/8250/8250_port.c:1870
serial8250_interrupt+0xfd/0x200 drivers/tty/serial/8250/8250_core.c:126
__handle_irq_event_percpu+0x109/0xa50 kernel/irq/handle.c:156
[...]
-> #1 (&port_lock_key){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
serial8250_console_write+0x184/0xa40 drivers/tty/serial/8250/8250_port.c:3198
<-- lock(&port_lock_key);
call_console_drivers kernel/printk/printk.c:1819 [inline]
console_unlock+0x8cb/0xd00 kernel/printk/printk.c:2504
vprintk_emit+0x1b5/0x470 kernel/printk/printk.c:2024 <-- lock(console_owner);
vprintk_func+0x8d/0x250 kernel/printk/printk_safe.c:394
printk+0xba/0xed kernel/printk/printk.c:2084
register_console+0x8b3/0xc10 kernel/printk/printk.c:2829
univ8250_console_init+0x3a/0x46 drivers/tty/serial/8250/8250_core.c:681
console_init+0x49d/0x6d3 kernel/printk/printk.c:2915
start_kernel+0x5e9/0x879 init/main.c:713
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
-> #0 (console_owner){....}-{0:0}:
[...]
lock_acquire+0x127/0x340 kernel/locking/lockdep.c:4734
console_trylock_spinning kernel/printk/printk.c:1773 [inline] <-- lock(console_owner);
vprintk_emit+0x307/0x470 kernel/printk/printk.c:2023
vprintk_func+0x8d/0x250 kernel/printk/printk_safe.c:394
printk+0xba/0xed kernel/printk/printk.c:2084
fail_dump lib/fault-inject.c:45 [inline]
should_fail+0x67b/0x7c0 lib/fault-inject.c:144
__should_failslab+0x152/0x1c0 mm/failslab.c:33
should_failslab+0x5/0x10 mm/slab_common.c:1224
slab_pre_alloc_hook mm/slab.h:468 [inline]
slab_alloc_node mm/slub.c:2723 [inline]
slab_alloc mm/slub.c:2807 [inline]
__kmalloc+0x72/0x300 mm/slub.c:3871
kmalloc include/linux/slab.h:582 [inline]
tty_buffer_alloc+0x23f/0x2a0 drivers/tty/tty_buffer.c:175
__tty_buffer_request_room+0x156/0x2a0 drivers/tty/tty_buffer.c:273
tty_insert_flip_string_fixed_flag+0x93/0x250 drivers/tty/tty_buffer.c:318
tty_insert_flip_string include/linux/tty_flip.h:37 [inline]
pty_write+0x126/0x1f0 drivers/tty/pty.c:122 <-- lock(&port->lock);
n_tty_write+0xa7a/0xfc0 drivers/tty/n_tty.c:2356
do_tty_write drivers/tty/tty_io.c:961 [inline]
tty_write+0x512/0x930 drivers/tty/tty_io.c:1045
__vfs_write+0x76/0x100 fs/read_write.c:494
[...]
other info that might help us debug this:
Chain exists of:
console_owner --> &port_lock_key --> &port->lock
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b6da31b2c07c ("tty: Fix data race in tty_insert_flip_string_fixed_flag")
Signed-off-by: Qi Zheng <[email protected]>
Acked-by: Jiri Slaby <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Cc: Akinobu Mita <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Pointer buf is being assigned a value that is not being read, buf is being
re-assigned in the next starement. The assignment is redundant and can be
removed.
Cleans up clang scan build warning:
kernel/relay.c:443:8: warning: Although the value stored to 'buf' is
used in the enclosing expression, the value is never actually read
from 'buf' [deadcode.DeadStores]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Kalle Valo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When the NTFS BOOT sectors_per_clusters field is > 0x80, it represents a
shift value. Make sure that the shift value is not too large before using
it (NTFS max cluster size is 2MB). Return -EVINVAL if it too large.
This prevents negative shift values and shift values that are larger than
the field size.
Prevents this UBSAN error:
UBSAN: shift-out-of-bounds in ../fs/ntfs3/super.c:673:16
shift exponent -192 is negative
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: [email protected]
Reviewed-by: Namjae Jeon <[email protected]>
Cc: Konstantin Komarov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Kari Argillander <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add allocated strarray to device's resource list. This is a must to
automatically release strarray when the device disappears.
Without this fix we have a memory leak in the few drivers which use
devm_kasprintf_strarray().
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: acdb89b6c87a ("lib/string_helpers: Introduce managed variant of kasprintf_strarray()")
Signed-off-by: Puyou Lu <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
At the end of get_last_crashkernel(), the judgement of ck_cmdline is
obviously unnecessary and causes redundance, let's clean it up.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: lizhe <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Philipp Rudo <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Dave Young <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This is very theoretical compile failure:
ELF_ST_TYPE(st_info = A)
Cast will bind first and st_info will stop being lvalue:
error: lvalue required as left operand of assignment
Given that the only use of this macro is
ELF_ST_TYPE(sym->st_info)
where st_info is "unsigned char" I've decided to remove cast especially
given that companion macro ELF_ST_BIND doesn't use cast.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When running the stress-ng clone benchmark with multiple testing threads,
it was found that there were significant spinlock contention in sget_fc().
The contended spinlock was the sb_lock. It is under heavy contention
because the following code in the critcal section of sget_fc():
hlist_for_each_entry(old, &fc->fs_type->fs_supers, s_instances) {
if (test(old, fc))
goto share_extant_sb;
}
After testing with added instrumentation code, it was found that the
benchmark could generate thousands of ipc namespaces with the
corresponding number of entries in the mqueue's fs_supers list where the
namespaces are the key for the search. This leads to excessive time in
scanning the list for a match.
Looking back at the mqueue calling sequence leading to sget_fc():
mq_init_ns()
=> mq_create_mount()
=> fc_mount()
=> vfs_get_tree()
=> mqueue_get_tree()
=> get_tree_keyed()
=> vfs_get_super()
=> sget_fc()
Currently, mq_init_ns() is the only mqueue function that will indirectly
call mqueue_get_tree() with a newly allocated ipc namespace as the key for
searching. As a result, there will never be a match with the exising ipc
namespaces stored in the mqueue's fs_supers list.
So using get_tree_keyed() to do an existing ipc namespace search is just a
waste of time. Instead, we could use get_tree_nodev() to eliminate the
useless search. By doing so, we can greatly reduce the sb_lock hold time
and avoid the spinlock contention problem in case a large number of ipc
namespaces are present.
Of course, if the code is modified in the future to allow
mqueue_get_tree() to be called with an existing ipc namespace instead of a
new one, we will have to use get_tree_keyed() in this case.
The following stress-ng clone benchmark command was run on a 2-socket
48-core Intel system:
./stress-ng --clone 32 --verbose --oomable --metrics-brief -t 20
The "bogo ops/s" increased from 5948.45 before patch to 9137.06 after
patch. This is an increase of 54% in performance.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 935c6912b198 ("ipc: Convert mqueue fs to fs_context")
Signed-off-by: Waiman Long <[email protected]>
Cc: Al Viro <[email protected]>
Cc: David Howells <[email protected]>
Cc: Manfred Spraul <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
semtimedop() should be converted to use hrtimer like it has been done for
most of the system calls with timeouts. This system call already takes a
struct timespec as an argument and can therefore provide finer granularity
timed wait.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Prakash Sangappa <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Get rid of redundant assignments which end up in values not being
read either because they are overwritten or the function ends.
Reported by clang-tidy [deadcode.DeadStores]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Orzel <[email protected]>
Reviewed-by: Tom Rix <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add support for extraction of checksum-enabled "070702" cpio archives,
specified in Documentation/driver-api/early-userspace/buffer-format.rst.
Fail extraction if the calculated file data checksum doesn't match the
value carried in the header.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Disseldorp <[email protected]>
Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Martin Wilck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Documentation/driver-api/early-userspace/buffer-format.rst includes the
specification for checksum-enabled cpio archives. Implement support for
this format in gen_init_cpio via a new '-c' parameter.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Disseldorp <[email protected]>
Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Martin Wilck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When processing a "file" entry, gen_init_cpio attempts to allocate a
buffer large enough to stage the entire contents of the source file. It
then attempts to fill the buffer via a single read() call and subsequently
writes out the entire buffer length, without checking that read() returned
the full length, potentially writing uninitialized buffer memory.
Fix this by breaking up file I/O into 64k chunks and only writing the
length returned by the prior read() call.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
initramfs cpio mtime preservation, as implemented in commit 889d51a10712
("initramfs: add option to preserve mtime from initramfs cpio images"),
uses a linked list to defer directory mtime processing until after all
other items in the cpio archive have been processed. This is done to
ensure that parent directory mtimes aren't overwritten via subsequent
child creation.
The lkml link below indicates that the mtime retention use case was for
embedded devices with applications running exclusively out of initramfs,
where the 32-bit mtime value provided a rough file version identifier.
Linux distributions which discard an extracted initramfs immediately after
the root filesystem has been mounted may want to avoid the unnecessary
overhead.
This change adds a new INITRAMFS_PRESERVE_MTIME Kconfig option, which can
be used to disable on-by-default mtime retention and in turn speed up
initramfs extraction, particularly for cpio archives with large directory
counts.
Benchmarks with a one million directory cpio archive extracted 20 times
demonstrated:
mean extraction time (s) std dev
INITRAMFS_PRESERVE_MTIME=y 3.808 0.006
INITRAMFS_PRESERVE_MTIME unset 3.056 0.004
The above extraction times were measured using ftrace (initcall_finish -
initcall_start) values for populate_rootfs() with initramfs_async
disabled.
[[email protected]: rebase atop dir_entry.name flexible array member and drop separate initramfs_mtime.h header]
Link: https://lkml.org/lkml/2008/9/3/424
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
dir_entry.name is currently allocated via a separate kstrdup(). Change it
to a flexible array member and allocate it along with struct dir_entry.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Disseldorp <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Martin Wilck <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "initramfs: "crc" cpio format and INITRAMFS_PRESERVE_MTIME", v7.
This patchset does some minor initramfs refactoring and allows cpio entry
mtime preservation to be disabled via a new Kconfig
INITRAMFS_PRESERVE_MTIME option.
Patches 4/6 to 6/6 implement support for creation and extraction of "crc"
cpio archives, which carry file data checksums. Basic tests for this
functionality can be found at https://github.com/rapido-linux/rapido/pull/163
This patch (of 6):
do_header() is called for each cpio entry and fails if the first six bytes
don't match "newc" magic. The magic check includes a special case error
message if POSIX.1 ASCII (cpio -H odc) magic is detected. This special
case POSIX.1 check can be nested under the "newc" mismatch code path to
avoid calling memcmp() twice in a non-error case.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When a process exits, /proc/${pid}, and /proc/${pid}/net dentries are
flushed. However some leaf dentries like /proc/${pid}/net/arp_cache
aren't. That's because respective PDEs have proc_misc_d_revalidate() hook
which returns 1 and leaves dentries/inodes in the LRU.
Force revalidation/lookup on everything under /proc/${pid}/net by
inheriting proc_net_dentry_ops.
[[email protected]: coding-style cleanups]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c6c75deda813 ("proc: fix lookup in /proc/net subdirectories after setns(2)")
Signed-off-by: Alexey Dobriyan <[email protected]>
Reported-by: hui li <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
sbi->s_firstinodezone is initialized to 2 and sbi->s_firstdatazone is read
from sbd. There's no guarantee that sbi->s_firstdatazone must bigger than
sbi->s_firstinodezone. If sbi->s_firstdatazone less than 2, the
filesystem can still be mounted unexpetly. At this point, sbi->s_ninodes
flip to very large value and this filesystem is broken. We can observe
this by executing 'df' command. When we execute, we will get an error
message:
"sysv_count_free_inodes: unable to read inode table"
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
If getdelays runs in a non-init network namespace, it will fail in getting
delayacct stats even if it has privilege of root user, which seems to be
not very reasonable. We can simply reproduce this by executing commands:
unshare -n
getdelays -d -p <pid>
I don't think net namespace should be an obstacle to the normal execution
of getdelay function. So let's make it available from all net namespaces.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: xu xin <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Yang Yang <[email protected]>
Cc: "Dr. Thomas Orgis" <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Ismael Luceno <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The task exit struct needs some crucial information to be able to provide
an enhanced version of process and thread accounting. This change
provides:
1. ac_tgid in additon to ac_pid
2. thread group execution walltime in ac_tgetime
3. flag AGROUP in ac_flag to indicate the last task
in a thread group / process
4. device ID and inode of task's /proc/self/exe in
ac_exe_dev and ac_exe_inode
5. tools/accounting/procacct as demonstrator
When a task exits, taskstats are reported to userspace including the
task's pid and ppid, but without the id of the thread group this task is
part of. Without the tgid, the stats of single tasks cannot be correlated
to each other as a thread group (process).
The taskstats documentation suggests that on process exit a data set
consisting of accumulated stats for the whole group is produced. But such
an additional set of stats is only produced for actually multithreaded
processes, not groups that had only one thread, and also those stats only
contain data about delay accounting and not the more basic information
about CPU and memory resource usage. Adding the AGROUP flag to be set
when the last task of a group exited enables determination of process end
also for single-threaded processes.
My applicaton basically does enhanced process accounting with summed
cputime, biggest maxrss, tasks per process. The data is not available
with the traditional BSD process accounting (which is not designed to be
extensible) and the taskstats interface allows more efficient on-the-fly
grouping and summing of the stats, anyway, without intermediate disk
writes.
Furthermore, I do carry statistics on which exact program binary is used
how often with associated resources, getting a picture on how important
which parts of a collection of installed scientific software in different
versions are, and how well they put load on the machine. This is enabled
by providing information on /proc/self/exe for each task. I assume the
two 64-bit fields for device ID and inode are more appropriate than the
possibly large resolved path to keep the data volume down.
Add the tgid to the stats to complete task identification, the flag AGROUP
to mark the last task of a group, the group wallclock time, and
inode-based identification of the associated executable file.
Add tools/accounting/procacct.c as a simplified fork of getdelays.c to
demonstrate process and thread accounting.
[[email protected]: fix version number in comment]
Link: https://lkml.kernel.org/r/20220405003601.7a5f6008@plasteblaster
Link: https://lkml.kernel.org/r/20220331004106.64e5616b@plasteblaster
Signed-off-by: Dr. Thomas Orgis <[email protected]>
Reviewed-by: Ismael Luceno <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: xu xin <[email protected]>
Cc: Yang Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
req->map is set in the valid case and always equals 'map' if the break was
hit. It therefore is unnecessary to use the list iterator variable and
the use of 'map' can be replaced with req->map.
This is done in preparation to limit the scope of a list iterator to the
list traversal loop [1].
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jakob Koschel <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "Brian Johannesmeyer" <[email protected]>
Cc: Cristiano Giuffrida <[email protected]>
Cc: "Bos, H.J." <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Get rid of redundant assignments which end up in values not being read
either because they are overwritten or the function ends.
Reported by clang-tidy [deadcode.DeadStores]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Orzel <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Michal Orzel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In MAINTAINERS PTRACE SUPPORT entry, the file include/uapi/linux/ptrace.h
is redundant, remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tiezhu Yang <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
PT_DTRACE is only used on um now, fix the wrong comment.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tiezhu Yang <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "ptrace: do some cleanup".
This patch (of 3):
PTRACE_SINGLESTEP is always defined as 9 in include/uapi/linux/ptrace.h,
remove redudant check of #ifdef PTRACE_SINGLESTEP.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tiezhu Yang <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
fat*_ent_bread() can be the cause of too many report on I/O error path.
So use fat_msg_ratelimit() instead.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: OGAWA Hirofumi <[email protected]>
Reported-by: qianfan <[email protected]>
Tested-by: qianfan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In order for end users to quickly react to new issues that come up in
production, it is proving useful to leverage the printk indexing system.
This printk index enables kernel developers to use calls to printk() with
changeable ad-hoc format strings (as they always have; no change of
expectations), while enabling end users to examine format strings to
detect changes.
Since end users are using regular expressions to match messages printed
through printk(), being able to detect changes in chosen format strings
from release to release provides a useful signal to review
printk()-matching regular expressions for any necessary updates.
So that detailed FAT messages are captured by this printk index, this
patch wraps fat_msg with a macro.
[[email protected]: coding-style cleanups]
Link: https://lkml.kernel.org/r/8aaa2dd7995e820292bb40d2120ab69756662c65.1648688136.git.jof@thejof.com
Signed-off-by: Jonathan Lassoff <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Tested-by: Petr Mladek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
iput() has already judged the incoming parameter, so there is no need to
repeat outside.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yubo Feng <[email protected]>
Reported-by: Hulk Robot <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The uselib syscall has been long deprecated. There's no need to keep this
enabled by default under X86_32.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
ep_poll() first calls ep_events_available() with no lock held and checks
if ep->rdllist is empty by list_empty_careful(), which reads
rdllist->prev. Thus all accesses to it need some protection to avoid
store/load-tearing.
Note INIT_LIST_HEAD_RCU() already has the annotation for both prev
and next.
Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket
fds.") added the first lockless ep_events_available(), and commit
c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
made some ep_events_available() calls lockless and added single call under
a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock
for zero timeout") made the last ep_events_available() lockless.
BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait
write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0:
INIT_LIST_HEAD include/linux/list.h:38 [inline]
list_splice_init include/linux/list.h:492 [inline]
ep_start_scan fs/eventpoll.c:622 [inline]
ep_send_events fs/eventpoll.c:1656 [inline]
ep_poll fs/eventpoll.c:1806 [inline]
do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234
do_epoll_pwait fs/eventpoll.c:2268 [inline]
__do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
__se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
__x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1:
list_empty_careful include/linux/list.h:329 [inline]
ep_events_available fs/eventpoll.c:381 [inline]
ep_poll fs/eventpoll.c:1797 [inline]
do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234
do_epoll_pwait fs/eventpoll.c:2268 [inline]
__do_sys_epoll_pwait fs/eventpoll.c:2281 [inline]
__se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275
__x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffff88810480c7d0 -> 0xffff888103c15098
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout")
Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reported-by: [email protected]
Cc: Al Viro <[email protected]>, Andrew Morton <[email protected]>
Cc: Kuniyuki Iwashima <[email protected]>
Cc: Kuniyuki Iwashima <[email protected]>
Cc: "Soheil Hassas Yeganeh" <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: "Sridhar Samudrala" <[email protected]>
Cc: Alexander Duyck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Fix data-races around epoll reported by KCSAN."
This series suppresses a false positive KCSAN's message and fixes a real
data-race.
This patch (of 2):
pipe_poll() runs locklessly and assigns 1 to poll_usage. Once poll_usage
is set to 1, it never changes in other places. However, concurrent writes
of a value trigger KCSAN, so let's make KCSAN happy.
BUG: KCSAN: data-race in pipe_poll / pipe_poll
write to 0xffff8880042f6678 of 4 bytes by task 174 on cpu 3:
pipe_poll (fs/pipe.c:656)
ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853)
do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234)
__x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
write to 0xffff8880042f6678 of 4 bytes by task 177 on cpu 1:
pipe_poll (fs/pipe.c:656)
ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853)
do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234)
__x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 177 Comm: epoll_race Not tainted 5.17.0-58927-gf443e374ae13 #6
Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Kuniyuki Iwashima <[email protected]>
Cc: "Soheil Hassas Yeganeh" <[email protected]>
Cc: "Sridhar Samudrala" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Clang static analysis reports this false positive
glob.c:48:32: warning: Assigned value is garbage
or undefined
char const *back_pat = NULL, *back_str = back_str;
^~~~~~~~ ~~~~~~~~
back_str is set after back_pat and it's use is protected by the !back_pat
check. It is not necessary to initialize back_str, so remove the
initialization.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use strchr(), which makes them a lot shorter, and more obviously symmetric
in their treatment of accept/reject. It also saves a little bit of .text;
bloat-o-meter for an arm build says
Function old new delta
strcspn 92 76 -16
strspn 108 76 -32
While here, also remove a stray empty line before EXPORT_SYMBOL().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Before refactoring strspn() and strcspn(), add some simple test cases.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
As in "kernel/panic.c: remove CONFIG_PANIC_ON_OOPS_VALUE indirection",
use the IS_ENABLED() helper rather than having a hidden config option.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
To make the test more robust, there are the following changes:
1. add a check for the return value of kmem_cache_alloc().
2. properly release the object `buf` on several error paths.
3. release the objects of `used_objects` if we never hit `saved_ptr`.
4. destroy the created cache by default.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoke Wang <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Xiaoke Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add support to also use the mailmap for 'in file' email addresses.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
Reported-by: Marc Zyngier <[email protected]>
Acked-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This fixes the following sparse warnings:
kernel/pid_namespace.c:55:77: warning: Using plain integer as NULL pointer
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Haowen Bai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
csum_and_copy_from_user and csum_and_copy_to_user are exported by a few
architectures, but not actually used in modular code. Drop the exports.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Cc: David Miller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Remove the read_from_oldmem() wrapper introduced earlier and convert all
the remaining callers to pass an iov_iter.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Cc: Amit Daniel Kachhap <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This gets rid of copy_to() and let us use proc_read_iter() instead of
proc_read().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Convert vmcore to use an iov_iter", v5.
For some reason several people have been sending bad patches to fix
compiler warnings in vmcore recently. Here's how it should be done.
Compile-tested only on x86. As noted in the first patch, s390 should take
this conversion a bit further, but I'm not inclined to do that work
myself.
This patch (of 3):
Instead of passing in a 'buf' and 'userbuf' argument, pass in an iov_iter.
s390 needs more work to pass the iov_iter down further, or refactor, but
I'd be more comfortable if someone who can test on s390 did that work.
It's more convenient to convert the whole of read_from_oldmem() to take an
iov_iter at the same time, so rename it to read_from_oldmem_iter() and add
a temporary read_from_oldmem() wrapper that creates an iov_iter.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Heiko Carstens <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|