Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently, the show_inode_state() macro only prints
part of the state of inode->i_state. Let’s improve it
to display more of its state.
Signed-off-by: Julian Sun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
It's observed that a crash occurs during hot-remove a memory device,
in which user is accessing the hugetlb. See calltrace as following:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790
Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s
mirror dm_region_hash dm_log dm_mod
CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ #492
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:do_user_addr_fault+0x2a0/0x790
Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41
RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046
RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658
R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000
FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __warn+0x8d/0x190
? do_user_addr_fault+0x2a0/0x790
? report_bug+0x1c3/0x1d0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? do_user_addr_fault+0x2a0/0x790
? exc_page_fault+0x31/0x200
exc_page_fault+0x68/0x200
<...snip...>
BUG: unable to handle page fault for address: 0000000000001000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
---[ end trace 0000000000000000 ]---
BUG: unable to handle page fault for address: 0000000000001000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ #492
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:dentry_name+0x1f4/0x440
<...snip...>
? dentry_name+0x2fa/0x440
vsnprintf+0x1f3/0x4f0
vprintk_store+0x23a/0x540
vprintk_emit+0x6d/0x330
_printk+0x58/0x80
dump_mapping+0x10b/0x1a0
? __pfx_free_object_rcu+0x10/0x10
__dump_page+0x26b/0x3e0
? vprintk_emit+0xe0/0x330
? _printk+0x58/0x80
? dump_page+0x17/0x50
dump_page+0x17/0x50
do_migrate_range+0x2f7/0x7f0
? do_migrate_range+0x42/0x7f0
? offline_pages+0x2f4/0x8c0
offline_pages+0x60a/0x8c0
memory_subsys_offline+0x9f/0x1c0
? lockdep_hardirqs_on+0x77/0x100
? _raw_spin_unlock_irqrestore+0x38/0x60
device_offline+0xe3/0x110
state_store+0x6e/0xc0
kernfs_fop_write_iter+0x143/0x200
vfs_write+0x39f/0x560
ksys_write+0x65/0xf0
do_syscall_64+0x62/0x130
Previously, some sanity check have been done in dump_mapping() before
the print facility parsing '%pd' though, it's still possible to run into
an invalid dentry.d_name.name.
Since dump_mapping() only needs to dump the filename only, retrieve it
by itself in a safer way to prevent an unnecessary crash.
Note that either retrieving the filename with '%pd' or
strncpy_from_kernel_nofault(), the filename could be unreliable.
Signed-off-by: Li Zhijian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Let the kememdup_array() take care about multiplication and possible
overflows.
v2:Add a new modification for reverse array.
Signed-off-by: Yu Jiaoliang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
In netfs_init() or fscache_proc_init(), we create dentry under 'fs/netfs',
but in netfs_exit(), we only delete the proc entry of 'fs/netfs' without
deleting its subtree. This triggers the following WARNING:
==================================================================
remove_proc_entry: removing non-empty directory 'fs/netfs', leaking at least 'requests'
WARNING: CPU: 4 PID: 566 at fs/proc/generic.c:717 remove_proc_entry+0x160/0x1c0
Modules linked in: netfs(-)
CPU: 4 UID: 0 PID: 566 Comm: rmmod Not tainted 6.11.0-rc3 #860
RIP: 0010:remove_proc_entry+0x160/0x1c0
Call Trace:
<TASK>
netfs_exit+0x12/0x620 [netfs]
__do_sys_delete_module.isra.0+0x14c/0x2e0
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================
Therefore use remove_proc_subtree() instead of remove_proc_entry() to
fix the above problem.
Fixes: 7eb5b3e3a0a5 ("netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a symlink")
Cc: [email protected]
Signed-off-by: Baokun Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: David Howells <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD().
Signed-off-by: Hongbo Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Christian Brauner <[email protected]> says:
I've recently looked for some free space in struct inode again because
of some exec kerfuffle we had and while my idea didn't turn into
anything I noticed that we often waste bytes when using wait bit
operations. So I set out to switch that to another mechanism that would
allow us to free up bytes. So this is an attempt to turn i_state from an
unsigned long into an u32 using the individual bytes of i_state as
addresses for the wait var event mechanism (Thanks to Linus for that idea.).
This survives LTP, xfstests on various filesystems, and will-it-scale.
* patches from https://lore.kernel.org/r/[email protected]:
inode: make i_state a u32
inode: port __I_LRU_ISOLATING to var event
inode: port __I_NEW to var event
inode: port __I_SYNC to var event
fs: reorder i_state bits
fs: add i_state helpers
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Now that we use the wait var event mechanism make i_state a u32 and free
up 4 bytes. This means we currently have two 4 byte holes in struct
inode which we can pack.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Port the __I_LRU_ISOLATING mechanism to use the new var event mechanism.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Hi, all
Recently I noticed a bug[1] in btrfs, after digged it into
and I believe it'a race in vfs.
Let's assume there's a inode (ie ino 261) with i_count 1 is
called by iput(), and there's a concurrent thread calling
generic_shutdown_super().
cpu0: cpu1:
iput() // i_count is 1
->spin_lock(inode)
->dec i_count to 0
->iput_final() generic_shutdown_super()
->__inode_add_lru() ->evict_inodes()
// cause some reason[2] ->if (atomic_read(inode->i_count)) continue;
// return before // inode 261 passed the above check
// list_lru_add_obj() // and then schedule out
->spin_unlock()
// note here: the inode 261
// was still at sb list and hash list,
// and I_FREEING|I_WILL_FREE was not been set
btrfs_iget()
// after some function calls
->find_inode()
// found the above inode 261
->spin_lock(inode)
// check I_FREEING|I_WILL_FREE
// and passed
->__iget()
->spin_unlock(inode) // schedule back
->spin_lock(inode)
// check (I_NEW|I_FREEING|I_WILL_FREE) flags,
// passed and set I_FREEING
iput() ->spin_unlock(inode)
->spin_lock(inode) ->evict()
// dec i_count to 0
->iput_final()
->spin_unlock()
->evict()
Now, we have two threads simultaneously evicting
the same inode, which may trigger the BUG(inode->i_state & I_CLEAR)
statement both within clear_inode() and iput().
To fix the bug, recheck the inode->i_count after holding i_lock.
Because in the most scenarios, the first check is valid, and
the overhead of spin_lock() can be reduced.
If there is any misunderstanding, please let me know, thanks.
[1]: https://lore.kernel.org/linux-btrfs/[email protected]/
[2]: The reason might be 1. SB_ACTIVE was removed or 2. mapping_shrinkable()
return false when I reproduced the bug.
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad
CC: [email protected]
Fixes: 63997e98a3be ("split invalidate_inodes()")
Signed-off-by: Julian Sun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Port the __I_NEW mechanism to use the new var event mechanism.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Port the __I_SYNC mechanism to use the new var event mechanism.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
so that we can use the first bits to derive unique addresses from
i_state.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
The i_state member is an unsigned long so that it can be used with the
wait bit infrastructure which expects unsigned long. This wastes 4 bytes
which we're unlikely to ever use. Switch to using the var event wait
mechanism using the address of the bit. Thanks to Linus for the address
idea.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
The VFS git tree is missing from MAINTAINERS. Add it.
Signed-off-by: Eric Biggers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
The underscore variants are for uapi whereas the non-underscore variants
are for in-kernel consumers.
Link: https://lore.kernel.org/r/20240822-anwerben-nutzung-1cd6c82a565f@brauner
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
This helper has been unused for a while now.
Link: https://lore.kernel.org/r/20240822-bewuchs-werktag-46672b3c0606@brauner
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Most commonly neither I_LRU_ISOLATING nor I_SYNC are set, but the stock
kernel takes a back-to-back relock trip to check for them.
It probably can be avoided altogether, but for now massage things back
to just one lock acquire.
Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Zhihao Cheng <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
After its conversion to the new mount API, debugfs displays "none" in
/proc/mounts instead of the actual source. Fix this by recognising its
"source" mount option.
Signed-off-by: Marc Aurèle La France <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: a20971c18752 ("vfs: Convert debugfs to use the new mount API")
Cc: [email protected] # 6.10.x: 49abee5991e1: debugfs: Convert to new uid/gid option parsing helpers
Signed-off-by: Christian Brauner <[email protected]>
|
|
Afaict, we can just rely on inode->i_dio_count for waiting instead of
this awkward indirection through __I_DIO_WAKEUP. This survives LTP dio
and xfstests dio tests.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
In step 2, we obtain the kernel id `k1000`. So in next step (step
3), we should translate the `k1000` not `k21000`.
Signed-off-by: Hongbo Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Since in_group_or_capable has been exported, we can use
it to simplify the code when check group and capable.
Signed-off-by: Hongbo Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
According to bpftrace on these routines most calls result in cmpxchg,
which already provides the same guarantee.
In inode_maybe_inc_iversion elision is possible because even if the
wrong value was read due to now missing smp_mb fence, the issue is going
to correct itself after cmpxchg. If it appears cmpxchg wont be issued,
the fence + reload are there bringing back previous behavior.
Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Add ability to set per-dentry mount expire timeout to autofs.
There are two fairly well known automounter map formats, the autofs
format and the amd format (more or less System V and Berkley).
Some time ago Linux autofs added an amd map format parser that
implemented a fair amount of the amd functionality. This was done
within the autofs infrastructure and some functionality wasn't
implemented because it either didn't make sense or required extra
kernel changes. The idea was to restrict changes to be within the
existing autofs functionality as much as possible and leave changes
with a wider scope to be considered later.
One of these changes is implementing the amd options:
1) "unmount", expire this mount according to a timeout (same as the
current autofs default).
2) "nounmount", don't expire this mount (same as setting the autofs
timeout to 0 except only for this specific mount) .
3) "utimeout=<seconds>", expire this mount using the specified
timeout (again same as setting the autofs timeout but only for
this mount).
To implement these options per-dentry expire timeouts need to be
implemented for autofs indirect mounts. This is because all map keys
(mounts) for autofs indirect mounts use an expire timeout stored in
the autofs mount super block info. structure and all indirect mounts
use the same expire timeout.
Now I have a request to add the "nounmount" option so I need to add
the per-dentry expire handling to the kernel implementation to do this.
The implementation uses the trailing path component to identify the
mount (and is also used as the autofs map key) which is passed in the
autofs_dev_ioctl structure path field. The expire timeout is passed
in autofs_dev_ioctl timeout field (well, of the timeout union).
If the passed in timeout is equal to -1 the per-dentry timeout and
flag are cleared providing for the "unmount" option. If the timeout
is greater than or equal to 0 the timeout is set to the value and the
flag is also set. If the dentry timeout is 0 the dentry will not expire
by timeout which enables the implementation of the "nounmount" option
for the specific mount. When the dentry timeout is greater than zero it
allows for the implementation of the "utimeout=<seconds>" option.
Signed-off-by: Ian Kent <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
A soft lockup in ilookup was reported when stress-testing a 512-way
system [1] (see [2] for full context) and it was verified that not
taking the lock shifts issues back to mm.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
This is another flag that is statically set and doesn't need to use up
an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit.
(1) mem_open() used from proc_mem_operations
(2) adi_open() used from adi_fops
(3) drm_open_helper():
(3.1) accel_open() used from DRM_ACCEL_FOPS
(3.2) drm_open() used from
(3.2.1) amdgpu_driver_kms_fops
(3.2.2) psb_gem_fops
(3.2.3) i915_driver_fops
(3.2.4) nouveau_driver_fops
(3.2.5) panthor_drm_driver_fops
(3.2.6) radeon_driver_kms_fops
(3.2.7) tegra_drm_fops
(3.2.8) vmwgfx_driver_fops
(3.2.9) xe_driver_fops
(3.2.10) DRM_GEM_FOPS
(3.2.11) DEFINE_DRM_GEM_DMA_FOPS
(4) struct memdev sets fmode flags based on type of device opened. For
devices using struct mem_fops unsigned offset is used.
Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts
into the open helper to ensure that the flag is always set.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
In do_dentry_open() the usage is:
f->f_op = fops_get(inode->i_fop);
In generated asm the compiler emits 2 reads from inode->i_fop instead of
just one.
This popped up due to false-sharing where loads from that offset end up
bouncing a cacheline during parallel open. While this is going to be fixed,
the spurious load does not need to be there.
This makes do_dentry_open() go down from 1177 to 1154 bytes.
fops_put() is patched to maintain some consistency.
No functional changes.
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Add the __counted_by compiler attribute to the flexible array member
entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Signed-off-by: Thorsten Blum <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
If we find a positive dentry we can now simply try and open it. All
prelimiary checks are already done with or without O_CREAT.
Signed-off-by: Christian Brauner <[email protected]>
|
|
Now that we audit later during lookup_open() we can remove the audit
dummy context check. This simplifies things a lot.
Signed-off-by: Christian Brauner <[email protected]>
|
|
Perform the check for trailing slashes right in the fastpath check and
don't bother with any additional work.
Signed-off-by: Christian Brauner <[email protected]>
|
|
During O_CREAT we unconditionally audit the parent inode. This makes it
difficult to support a fastpath for O_CREAT when the file already exists
because we have to drop out of RCU lookup needlessly.
We worked around this by checking whether audit was actually active but
that's also suboptimal. Instead, move the audit of the parent inode down
into lookup_open() at a point where it's mostly certain that the file
needs to be created.
This also reduced the inconsistency that currently exists: while audit
on the parent is done independent of whether or no the file already
existed an audit on the file is only performed if it has been created.
By moving the audit down a bit we emit the audit a little later but it
will allow us to simplify the fastpath for O_CREAT significantly.
Signed-off-by: Christian Brauner <[email protected]>
|
|
Today, when opening a file we'll typically do a fast lookup, but if
O_CREAT is set, the kernel always takes the exclusive inode lock. I
assume this was done with the expectation that O_CREAT means that we
always expect to do the create, but that's often not the case. Many
programs set O_CREAT even in scenarios where the file already exists.
This patch rearranges the pathwalk-for-open code to also attempt a
fast_lookup in certain O_CREAT cases. If a positive dentry is found, the
inode_lock can be avoided altogether, and if auditing isn't enabled, it
can stay in rcuwalk mode for the last step_into.
One notable exception that is hopefully temporary: if we're doing an
rcuwalk and auditing is enabled, skip the lookup_fast. Legitimizing the
dentry in that case is more expensive than taking the i_rwsem for now.
Signed-off-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
A struct eventpoll's busy_poll_usecs field can be modified via a user
ioctl at any time. All reads of this field should be annotated with
READ_ONCE.
Fixes: 85455c795c07 ("eventpoll: support busy poll per epoll instance")
Cc: [email protected]
Signed-off-by: Martin Karsten <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Joe Damato <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Remove redundant and unnecessary code.
ep_alloc uses kzalloc to create struct eventpoll, so there is no need to
set fields to defaults of 0. This was accidentally introduced in commit
85455c795c07 ("eventpoll: support busy poll per epoll instance") and
expanded on in follow-up commits.
Signed-off-by: Joe Damato <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Martin Karsten <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Fixed 3 typos in design.rst
Signed-off-by: Xiaxi Shen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Carlos Maiolino <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
These inlines show up in the fast path (e.g., in do_dentry_open()) and
induce said full barrier regarding i_flctx access when in most cases the
pointer is NULL.
The pointer can be safely checked before issuing the barrier, dodging it
in most cases as a result.
It is plausible the consume fence would be sufficient, but I don't want
to go audit all callers regarding what they before calling here.
Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
The comment on EXPORT_SYMBOL(close_fd) was added in commit 2ca2a09d6215
("fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()"),
before commit 8760c909f54a ("file: Rename __close_fd to close_fd and remove
the files parameter") gave the function its current name, however commit
1572bfdf21d4 ("file: Replace ksys_close with close_fd") removes the
referenced caller entirely, obsoleting this comment.
Signed-off-by: Joel Savitz <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
replace 'permanetly' with 'permanently' in the comment &
replace 'propogated' with 'propagated' in the comment
Signed-off-by: Yuesong Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Both i_mode and noexec checks wrapped in WARN_ON stem from an artifact
of the previous implementation. They used to legitimately check for the
condition, but that got moved up in two commits:
633fb6ac3980 ("exec: move S_ISREG() check earlier")
0fd338b2d2cd ("exec: move path_noexec() check earlier")
Instead of being removed said checks are WARN_ON'ed instead, which
has some debug value.
However, the spurious path_noexec check is racy, resulting in
unwarranted warnings should someone race with setting the noexec flag.
One can note there is more to perm-checking whether execve is allowed
and none of the conditions are guaranteed to still hold after they were
tested for.
Additionally this does not validate whether the code path did any perm
checking to begin with -- it will pass if the inode happens to be
regular.
Keep the redundant path_noexec() check even though it's mindless
nonsense checking for guarantee that isn't given so drop the WARN.
Reword the commentary and do small tidy ups while here.
Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[brauner: keep redundant path_noexec() check]
Signed-off-by: Christian Brauner <[email protected]>
|
|
The lookup_fast helper in fs/namei.c has some subtlety in how dentries
are returned. Document them.
Signed-off-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
This function no longer exists.
Signed-off-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Commit 2eea9ce4310d ("mounts: keep list of mounts in an rbtree")
removed the implementation but leave declaration.
Signed-off-by: Yue Haibing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
In the function percpu_rwsem_release, the parameter `read`
is unused, so remove it.
Signed-off-by: Wang Long <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
While the old code did support FSCONFIG_SET_FD, there's no need to
re-get the file the fs_context infrastructure already grabbed for us.
Signed-off-by: Aleksa Sarai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
If you pass an fd using FSCONFIG_SET_FD, autofs_parse_fd() "steals" the
param->file and so the fs_context infrastructure will not do fput() for
us.
Fixes: e6ec453bd0f0 ("autofs: convert autofs to use the new mount api")
Signed-off-by: Aleksa Sarai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
__percpu annotation of *cpu pointer in struct kioctx is put at
the wrong place, resulting in several sparse warnings:
aio.c:623:24: warning: incorrect type in argument 1 (different address spaces)
aio.c:623:24: expected void [noderef] __percpu *__pdata
aio.c:623:24: got struct kioctx_cpu *cpu
aio.c:788:18: warning: incorrect type in assignment (different address spaces)
aio.c:788:18: expected struct kioctx_cpu *cpu
aio.c:788:18: got struct kioctx_cpu [noderef] __percpu *
aio.c:835:24: warning: incorrect type in argument 1 (different address spaces)
aio.c:835:24: expected void [noderef] __percpu *__pdata
aio.c:835:24: got struct kioctx_cpu *cpu
aio.c:940:16: warning: incorrect type in initializer (different address spaces)
aio.c:940:16: expected void const [noderef] __percpu *__vpp_verify
aio.c:940:16: got struct kioctx_cpu *
aio.c:958:16: warning: incorrect type in initializer (different address spaces)
aio.c:958:16: expected void const [noderef] __percpu *__vpp_verify
aio.c:958:16: got struct kioctx_cpu *
Put __percpu annotation at the right place to fix these warnings.
Signed-off-by: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
If no page could be allocated, an error pointer was used as format
string in pr_warn.
Rearrange the code to return early in case of OOM. Also add a check
for the return value of d_path.
Fixes: f8b92ba67c5d ("mount: Add mount warning for impending timestamp expiry")
Signed-off-by: Olaf Hering <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[brauner: rewrite commit and commit message]
Signed-off-by: Christian Brauner <[email protected]>
|
|
improved
Fixed-
WARNING: Missing a blank line after declarations
WARNING: Missing a blank line after declarations
Declaration format: improved struct file declaration format
Signed-off-by: Mohit0404 <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
After commit c22198e78d52 ("direct-io: remove random prefetches"), Nothing
in this file needs anything from `linux/prefetch.h`.
Signed-off-by: Youling Tang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
When deactivating any type of superblock, it had to wait for the in-flight
wb switches to be completed. wb switches are executed in inode_switch_wbs_work_fn()
which needs to acquire the wb_switch_rwsem and races against sync_inodes_sb().
If there are too much dirty data in the superblock, the waiting time may increase
significantly.
For superblocks without cgroup writeback such as tmpfs, they have nothing to
do with the wb swithes, so the flushing can be avoided.
Signed-off-by: Haifeng Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|