Age | Commit message (Collapse) | Author | Files | Lines |
|
When bch2_btree_iter_peek_slot() clones the iterator to search for the
next key, and then discovers that the key from the cloned iterator is
the key we want to return - we also want to save the
iter->key_cache_path as well, for the update path.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Various phases of fsck involve checking references from one btree to
another: this means doing a sequential scan of one btree, and then
mostly random access into the second.
This is particularly painful for checking extents <-> backpointers; we
can prefetch btree node access on the sequential scan, but not on the
random access portion, and this is particularly painful on spinning
rust, where we'd like to keep the pipeline fairly full of btree node
reads so that the elevator can reduce seeking.
This patch implements prefetching and pinning of the portion of the
btree that we'll be doing random access to. We already calculate how
much of the random access btree will fit in memory so it's a fairly
straightforward change.
This will put more pressure on system memory usage, so we introduce a
new option, fsck_memory_usage_percent, which is the percentage of total
system ram that fsck is allowed to pin.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Recursively destroying subvolumes isn't allowed yet.
Fixes: https://github.com/koverstreet/bcachefs/issues/634
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Add a btree to record a parent -> child subvolume relationships,
according to the filesystem heirarchy.
The subvolume_children btree is a bitset btree: if a bit is set at pos
p, that means p.offset is a child of subvolume p.inode.
This will be used for efficiently listing subvolumes, as well as
recursive deletion.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Record the filesystem path heirarchy for subvolumes in bch_subvolume
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Provide a non-write buffer version of bch2_btree_bit_mod_buffered(), for
the subvolume children btree.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Subvolumes need special handling to reattach - we always reattach them
in the root subvolume's lost+found, and they need a slightly different
kind of dirent.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
prep work for improving logging/error messages
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
bit of renaming, prep for adding a fs path parent
Signed-off-by: Kent Overstreet <[email protected]>
|
|
when repair switches d_type to or from DT_SUBVOL, we need to update the
target accordingly
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Check that d_parent_subvol makes sense - the dirent's snapshot must be
visible in d_parent_subvol (i.e. an ancestor of d_parent_subvol's
snapshot) in order to be visible.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
these were already checked in check_subvol()
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
check for inodes with a nonzero bi_parent_subvol field that aren't
actually subvolume roots
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
check that if an inode has a backpointer, the dirent it points to points
back to it.
We do this in check_dirent_inode_dirent(), but only for inodes that have
dirents that point to them - we also have to do the check starting from
the inode to catch inodes that don't have dirents that point to them.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Subvolumes and subvolume root inodes point to each other: this verifies
the subvolume -> inode -> subvolme path.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Add a flush op, to return the exit code via close().
Also update bcachefs usage to use this to return fsck exit codes.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Fixes: https://lore.kernel.org/linux-bcachefs/202402131603.E953E2CF@keescook/T/#u
Reported-by: coverity scan
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Make it so that a thread_with_stdio user can handle ioctls against the
file descriptor.
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Create an ops structure so we can add more file-based functionality in
the next few patches.
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Experimentally fix some problems with stdio_redirect_vprintf by creating
a MOO variant with which we can experiment. We can't do a GFP_KERNEL
allocation while holding the spinlock, and I don't like how the printf
function can silently truncate the output if memory allocation fails.
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Create a new run_thread_with_stdout function that opens a file in
O_RDONLY mode so that the kernel can write things to userspace but
userspace cannot write to the kernel. This will be used to convey xfs
health event information to userspace.
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
needed for thread_with_file; also rare but not unheard of to need this
in module code, when blocking on user input.
one workaround used by some code is wait_event_interruptible() - but
that can be buggy if the outer context isn't expecting unwinding.
Signed-off-by: Kent Overstreet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: fuyuanli <[email protected]>
|
|
This fixes stdio_redirect_read() getting stuck, not noticing that the
pipe has been closed.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
A simple example of the utility of this autocleanup approach to
handling of_node_put().
In this particular case some of the nodes needed for the test are
not available and the _available_ version would cause them to be
skipped resulting in a test failure.
Signed-off-by: Jonathan Cameron <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
handling
To avoid issues with out of order cleanup, or ambiguity about when the
auto freed data is first instantiated, do it within the for loop definition.
The disadvantage is that the struct device_node *child variable creation
is not immediately obvious where this is used.
However, in many cases, if there is another definition of
struct device_node *child; the compiler / static analysers will notify us
that it is unused, or uninitialized.
Note that, in the vast majority of cases, the _available_ form should be
used and as code is converted to these scoped handers, we should confirm
that any cases that do not check for available have a good reason not
to.
Signed-off-by: Jonathan Cameron <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
The recent addition of scope based cleanup support to the kernel
provides a convenient tool to reduce the chances of leaking reference
counts where of_node_put() should have been called in an error path.
This enables
struct device_node *child __free(device_node) = NULL;
for_each_child_of_node(np, child) {
if (test)
return test;
}
with no need for a manual call of of_node_put().
A following patch will reduce the scope of the child variable to the
for loop, to avoid an issues with ordering of autocleanup, and make it
obvious when this assigned a non NULL value.
In this simple example the gains are small but there are some very
complex error handling cases buried in these loops that will be
greatly simplified by enabling early returns with out the need
for this manual of_node_put() call.
Note that there are coccinelle checks in
scripts/coccinelle/iterators/for_each_child.cocci to detect a failure
to call of_node_put(). This new approach does not cause false positives.
Longer term we may want to add scripting to check this new approach is
done correctly with no double of_node_put() calls being introduced due
to the auto cleanup. It may also be useful to script finding places
this new approach is useful.
Signed-off-by: Jonathan Cameron <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This fixes a bug where we'd return data without waiting for a newline,
if data was present but a newline was not.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Move the cleanup code to a wrapper function, where we can call it after
the thread_with_stdio fn exits.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
- eliminate the dependency on printbufs, so that we can lift
thread_with_file for use in xfs
- add a nonblocking parameter to stdio_redirect_printf(), and either
block if the buffer is full or drop it on the floor - don't buffer
infinitely
Signed-off-by: Kent Overstreet <[email protected]>
|
|
The output buffer lock has to be a spinlock so that we can write to it
from interrupt context, so we can't use a direct copy_to_user; this
switches thread_with_file_read() to use fault_in_writeable() and
copy_to_user_nofault(), similar to how thread_with_file_write() works.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Add mempool_init_kvmalloc_pool() and mempool_create_kvmalloc_pool(),
which wrap kvmalloc() instead of kmalloc() - kmalloc() with a vmalloc()
fallback.
This is part of a bcachefs cleanup - dropping an internal kvpmalloc()
helper (which predates kvmalloc()) along with mempool helpers; this
replaces the bcachefs-private kvpmalloc_pool.
Signed-off-by: Kent Overstreet <[email protected]>
Cc: [email protected]
|
|
The i should be signed to find out the end of the loop. Otherwise,
i >= 0 is always true and loop becomes infinite. Make its type to be
int.
Fixes: 6a9eda34418f ("mtd: spi-nor: core: set mtd->eraseregions for non-uniform erase map")
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Reviewed-by: Tudor Ambarus <[email protected]>
Reviewed-by: Michael Walle <[email protected]>
Reviewed-by: Dan Carpenter <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]
|
|
SPI NOR gets the non uniform erase code cleaned. We stopped using
bitmasks for erase types and flags, and instead introduced dedicated
members. We then passed the SPI NOR erase map to MTD. Users can now
determine the erase regions and make informed decisions on partitions
size.
|
|
The code "max(1U, 3 * (1U << shift) / 4)" comes from the Kyber I/O
scheduler. The Kyber I/O scheduler maintains one internal queue per hwq
and hence derives its async_depth from the number of hwq tags. Using
this approach for the mq-deadline scheduler is wrong since the
mq-deadline scheduler maintains one internal queue for all hwqs
combined. Hence this revert.
Cc: [email protected]
Cc: Damien Le Moal <[email protected]>
Cc: Harshit Mogalapalli <[email protected]>
Cc: Zhiguo Niu <[email protected]>
Fixes: d47f9717e5cf ("block/mq-deadline: use correct way to throttling write requests")
Signed-off-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, isofs, udf, and quota updates from Jan Kara:
"A lot of material this time:
- removal of a lot of GFP_NOFS usage from ext2, udf, quota (either it
was legacy or replaced with scoped memalloc_nofs_*() API)
- removal of BUG_ONs in quota code
- conversion of UDF to the new mount API
- tightening quota on disk format verification
- fix some potentially unsafe use of RCU pointers in quota code and
annotate everything properly to make sparse happy
- a few other small quota, ext2, udf, and isofs fixes"
* tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (26 commits)
udf: remove SLAB_MEM_SPREAD flag usage
quota: remove SLAB_MEM_SPREAD flag usage
isofs: remove SLAB_MEM_SPREAD flag usage
ext2: remove SLAB_MEM_SPREAD flag usage
ext2: mark as deprecated
udf: convert to new mount API
udf: convert novrs to an option flag
MAINTAINERS: add missing git address for ext2 entry
quota: Detect loops in quota tree
quota: Properly annotate i_dquot arrays with __rcu
quota: Fix rcu annotations of inode dquot pointers
isofs: handle CDs with bad root inode but good Joliet root directory
udf: Avoid invalid LVID used on mount
quota: Fix potential NULL pointer dereference
quota: Drop GFP_NOFS instances under dquot->dq_lock and dqio_sem
quota: Set nofs allocation context when acquiring dqio_sem
ext2: Remove GFP_NOFS use in ext2_xattr_cache_insert()
ext2: Drop GFP_NOFS use in ext2_get_blocks()
ext2: Drop GFP_NOFS allocation from ext2_init_block_alloc_info()
udf: Remove GFP_NOFS allocation in udf_expand_file_adinicb()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
- fsnotify optimizations to reduce cost of fsnotify when nobody is
watching
- fix longstanding wart that system could not be suspended when some
process was waiting for response to fanotify permission event
- some spelling fixes
* tag 'fsnotify_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: allow freeze when waiting response for permission events
fanotify: Fix misspelling of "writable"
fsnotify: Fix misspelling of "writable"
inotify: Fix misspelling of "writable"
fsnotify: Add fsnotify_sb_has_watchers() helper
fsnotify: optimize the case of no parent watcher
|
|
Pull xfs updates from Chandan Babu:
- Online repair updates:
- More ondisk structures being repaired:
- Inode's mode field by trying to obtain file type value from
the a directory entry
- Quota counters
- Link counts of inodes
- FS summary counters
- Support for in-memory btrees has been added to support repair
of rmap btrees
- Misc changes:
- Report corruption of metadata to the health tracking subsystem
- Enable indirect health reporting when resources are scarce
- Reduce memory usage while repairing refcount btree
- Extend "Bmap update" intent item to support atomic extent
swapping on the realtime device
- Extend "Bmap update" intent item to support extended attribute
fork and unwritten extents
- Code cleanups:
- Bmap log intent
- Btree block pointer checking
- Btree readahead
- Buffer target
- Symbolic link code
- Remove mrlock wrapper around the rwsem
- Convert all the GFP_NOFS flag usages to use the scoped
memalloc_nofs_save() API instead of direct calls with the GFP_NOFS
- Refactor and simplify xfile abstraction. Lower level APIs in shmem.c
are required to be exported in order to achieve this
- Skip checking alignment constraints for inode chunk allocations when
block size is larger than inode chunk size
- Do not submit delwri buffers collected during log recovery when an
error has been encountered
- Fix SEEK_HOLE/DATA for file regions which have active COW extents
- Fix lock order inversion when executing error handling path during
shrinking a filesystem
- Remove duplicate ifdefs
* tag 'xfs-6.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (183 commits)
xfs: shrink failure needs to hold AGI buffer
mm/shmem.c: Use new form of *@param in kernel-doc
kernel-doc: Add unary operator * to $type_param_ref
xfs: use kvfree() in xlog_cil_free_logvec()
xfs: xfs_btree_bload_prep_block() should use __GFP_NOFAIL
xfs: fix scrub stats file permissions
xfs: fix log recovery erroring out on refcount recovery failure
xfs: move symlink target write function to libxfs
xfs: move remote symlink target read function to libxfs
xfs: move xfs_symlink_remote.c declarations to xfs_symlink_remote.h
xfs: xfs_bmap_finish_one should map unwritten extents properly
xfs: support deferred bmap updates on the attr fork
xfs: support recovering bmap intent items targetting realtime extents
xfs: add a realtime flag to the bmap update log redo items
xfs: add a xattr_entry helper
xfs: fix xfs_bunmapi to allow unmapping of partial rt extents
xfs: move xfs_bmap_defer_add to xfs_bmap_item.c
xfs: reuse xfs_bmap_update_cancel_item
xfs: add a bi_entry helper
xfs: remove xfs_trans_set_bmap_flags
...
|
|
- small cleanup of struct->wacom_wac usage (Tatsunosuke Tobita)
|
|
- removal of unused data structures all over the place (Jiri Slaby)
|