Age | Commit message | Author | Files | Lines
2024-09-09 | bcachefs: Fix a spelling error in docs | Xiaxi Shen | 1 | -1/+1
Signed-off-by: Xiaxi Shen <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: promote_whole_extents is now a normal option | Kent Overstreet | 7 | -9/+11
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Move rebalance_status out of sysfs/internal | Kent Overstreet | 1 | -1/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: remove the unused parameter in macro bkey_crc_next | Julian Sun | 1 | -2/+2
In the macro definition of bkey_crc_next, five parameters were accepted, but only four of them were used. Let's remove the unused one. The patch has only passed compilation tests, but it should be fine. Signed-off-by: Julian Sun <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: fix macro definition allocate_dropping_locks | Julian Sun | 1 | -1/+1
The macro allocate_dropping_locks accepts a parameter _trans, but it was not used, rather the variable trans was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: fix macro definition allocate_dropping_locks_errcode | Julian Sun | 1 | -1/+1
The macro allocate_dropping_locks_errcode accepts a parameter _trans, but it was not used, rather the variable trans was directly used, which may be a local variable inside a function that calls the macros. Signed-off-by: Julian Sun <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
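The hazard fixed by the two macro commits above can be reproduced in a few lines of standalone C. This is a minimal sketch, not the bcachefs code: `struct trans_sketch`, both macros, and both helper functions are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a transaction object; not the bcachefs type. */
struct trans_sketch {
	int id;
};

/* Buggy form: the _trans parameter is ignored and the macro silently
 * captures whatever variable named `trans` is in scope at the call site. */
#define TRANS_ID_BUGGY(_trans)	((trans)->id)

/* Fixed form: the macro only touches the argument it was given. */
#define TRANS_ID_FIXED(_trans)	((_trans)->id)

/* Both helpers pass `other` to the macro, but only the fixed one reads it. */
static int id_via_buggy(struct trans_sketch *trans, struct trans_sketch *other)
{
	assert(trans != NULL && other != NULL);
	return TRANS_ID_BUGGY(other);	/* expands to trans->id */
}

static int id_via_fixed(struct trans_sketch *trans, struct trans_sketch *other)
{
	assert(trans != NULL && other != NULL);
	return TRANS_ID_FIXED(other);	/* expands to other->id */
}
```

The buggy macro compiles only when the caller happens to have a variable named `trans`, and when it does, the argument actually passed is ignored without any diagnostic.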
2024-09-09 | bcachefs: remove the unused macro definition | Julian Sun | 1 | -9/+0
The macro bch2_kthread_wait_event_ioclock_timeout is no longer used, so let's remove it. The patch has passed a compilation test. Signed-off-by: Julian Sun <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: quota_reserve_range() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet | 1 | -32/+14
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: bch2_folio_set() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet | 1 | -55/+35
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: range_has_data() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet | 1 | -24/+5
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: bch2_seek_hole() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet | 1 | -36/+20
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: bch2_seek_data() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet | 1 | -29/+12
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: bch2_xattr_list() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet | 1 | -43/+12
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: bch2_readdir() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet | 1 | -49/+17
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: for_each_btree_key_in_subvolume_upto() | Kent Overstreet | 1 | -0/+45
New helper for looping over keys in a given subvolume Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: bch2_fiemap(): call trans_begin() on every loop iter | Kent Overstreet | 1 | -18/+22
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: bchfs_read(): call trans_begin() on every loop iter | Kent Overstreet | 2 | -29/+14
Same as the recent change for __bch2_read(); also, kill now unnecessary btree_trans_too_many_iters() calls. Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: kill bch2_btree_iter_peek_and_restart() | Kent Overstreet | 2 | -14/+1
dead code Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Btree path tracepoints | Kent Overstreet | 7 | -22/+508
Fastpath tracepoints, rarely needed, only enabled with CONFIG_BCACHEFS_PATH_TRACEPOINTS. Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Add check for btree_path ref overflow | Kent Overstreet | 3 | -21/+31
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Don't delete open files in online fsck | Kent Overstreet | 3 | -0/+33
If a file is unlinked but still open, we don't want online fsck to delete it - or fun inconsistencies will happen. https://github.com/koverstreet/bcachefs/issues/727 Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Mark bch_inode_info as SLAB_ACCOUNT | Youling Tang | 1 | -1/+2
After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg: account for certain kmem allocations to memcg") Signed-off-by: Youling Tang <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: fix btree_key_cache sysfs knob | Kent Overstreet | 1 | -1/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: allocate inode by using alloc_inode_sb() | Youling Tang | 1 | -1/+2
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() to alloc_inode_sb(). It will also fix [1] to avoid the NULL pointer dereference BUG in list_lru_add() when CONFIG_MEMCG is enabled. Links: [1]: https://lore.kernel.org/all/[email protected]/ [2]: https://lore.kernel.org/all/[email protected]/ Fixes: 86d81ec5f5f0 ("bcachefs: Mark bch_inode_info as SLAB_ACCOUNT") Signed-off-by: Youling Tang <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Opt_durability can now be set via bch2_opt_set_sb() | Kent Overstreet | 3 | -23/+22
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: bch2_opt_set_sb() can now set (some) device options | Kent Overstreet | 3 | -20/+43
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: data_allowed is now an opts.h option | Kent Overstreet | 4 | -2/+17
need this so cmd_option in userspace can handle it Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Annotate struct bucket_array with __counted_by() | Thorsten Blum | 1 | -1/+1
Add the __counted_by compiler attribute to the flexible array member bucket to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Signed-off-by: Thorsten Blum <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Fix format specifier in bch2_btree_key_cache_to_text() | Nathan Chancellor | 1 | -1/+1
When building for a 32-bit architecture, for which 'size_t' is 'unsigned int', there is a compiler warning due to use of '%lu':

  In file included from fs/bcachefs/vstructs.h:5,
                   from fs/bcachefs/bcachefs_format.h:80,
                   from fs/bcachefs/bcachefs.h:207,
                   from fs/bcachefs/btree_key_cache.c:3:
  fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text':
  fs/bcachefs/btree_key_cache.c:795:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
    795 |         prt_printf(out, "pending:\t%lu\r\n", per_cpu_sum(bc->nr_pending));
        |                         ^~~~~~~~~~~~~~~~~~~
  fs/bcachefs/util.h:78:63: note: in definition of macro 'prt_printf'
     78 | #define prt_printf(_out, ...)           bch2_prt_printf(_out, __VA_ARGS__)
        |                                                               ^~~~~~~~~~~
  fs/bcachefs/btree_key_cache.c:795:38: note: format string is defined here
    795 |         prt_printf(out, "pending:\t%lu\r\n", per_cpu_sum(bc->nr_pending));
        |                                    ~~^
        |                                      |
        |                                      long unsigned int
        |                                    %u
  cc1: all warnings being treated as errors

Use the proper specifier, '%zu', to resolve the warning.

Fixes: e447e49977b8 ("bcachefs: key cache can now allocate from pending") Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
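The portability point in the commit above can be illustrated outside the kernel with standard C. This is a minimal sketch: `format_pending()` is a hypothetical helper, and it uses plain `snprintf()` rather than the bcachefs `prt_printf()` macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a size_t counter portably.
 * %zu matches size_t on both 32-bit and 64-bit targets, whereas %lu is
 * only correct where size_t happens to be unsigned long. */
static int format_pending(char *buf, size_t len, size_t nr_pending)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "pending:\t%zu", nr_pending);
}
```

With `%lu` in place of `%zu`, the same function would trigger the `-Wformat` diagnostic quoted above on targets where `size_t` is `unsigned int`.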
2024-09-09 | bcachefs: key cache can now allocate from pending | Kent Overstreet | 3 | -15/+50
btree_trans objects can hold the btree_trans_barrier srcu read lock for an extended amount of time (they shouldn't, but it's difficult to guarantee). the srcu barrier blocks memory reclaim, so to avoid too many stranded key cache items, this uses the new pending_rcu_items to allocate from pending items - like we did before, but now without a global lock on the key cache. Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Rip out freelists from btree key cache | Kent Overstreet | 3 | -328/+55
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: rcu_pending now works in userspace | Kent Overstreet | 2 | -4/+53
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: rcu_pending | Kent Overstreet | 3 | -0/+629
Generic data structure for explicitly tracking pending RCU items, allowing items to be dequeued (i.e. allocate from items pending freeing). Works with conventional RCU and SRCU, and possibly other RCU flavors in the future, meaning this can serve as a more generic replacement for SLAB_TYPESAFE_BY_RCU.

Pending items are tracked in radix trees; if memory allocation fails, we fall back to linked lists.

An rcu_pending is initialized with a callback, which is invoked when pending items' grace periods have expired. Two types of callback processing are handled specially:

- RCU_PENDING_KVFREE_FN

  New backend for kvfree_rcu(). Slightly faster, and eliminates the synchronize_rcu() slowpath in kvfree_rcu_mightsleep() - instead, an rcu_head is allocated if we don't have one and can't use the radix tree

  TODO:
  - add a shrinker (as in the existing kvfree_rcu implementation) so that memory reclaim can free expired objects if callback processing isn't keeping up, and to expedite a grace period if we're under memory pressure and too much memory is stranded by RCU
  - add a counter for amount of memory pending

- RCU_PENDING_CALL_RCU_FN

  Accelerated backend for call_rcu() - pending callbacks are tracked in a radix tree to eliminate linked list overhead.

These are intended to serve as replacement backends for kvfree_rcu() and call_rcu(); they may be of interest to other users (e.g. SLAB_TYPESAFE_BY_RCU users).

Note:

Internally, we're using a single rearming call_rcu() callback for notifications from the core RCU subsystem when objects are ready to be processed. Ideally we would be getting a callback every time a grace period completes for which we have objects, but that would require multiple rcu_heads in flight, and since the number of gp sequence numbers with uncompleted callbacks is not bounded, we can't do that yet.

Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | lib/generic-radix-tree.c: add preallocation | Kent Overstreet | 2 | -17/+37
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | lib/generic-radix-tree.c: genradix_ptr_inlined() | Kent Overstreet | 2 | -63/+76
Provide an inlined fast path Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Fix deadlock in __wait_on_freeing_inode() | Kent Overstreet | 1 | -20/+47
We can't call __wait_on_freeing_inode() with btree locks held; we're waiting on another thread that's in evict(), and before it clears that bit it needs to write that inode to flush timestamps - deadlock. Fixing this involves a fair amount of re-jiggering to plumb a new transaction restart. Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: switch to rhashtable for vfs inodes hash | Kent Overstreet | 12 | -89/+160
the standard vfs inode hash table suffers from painful lock contention - this is long overdue Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | inode: make __iget() a static inline | Kent Overstreet | 2 | -9/+8
bcachefs is switching to an rhashtable for vfs inodes instead of the standard inode.c hashtable, so we need __iget() to be available outside inode.c. Rather than exporting it, a static inline makes more sense for a single atomic_inc(). Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Replace div_u64 with div64_u64 where second param is u64 | Reed Riley | 2 | -6/+6
Bcachefs often uses this function to divide by nanosecond times - which can easily cause problems when cast to u32. For example, `cat /sys/fs/bcachefs/*/internal/rebalance_status` would return invalid data in the `duration waited` field because dividing by the number of nanoseconds in a minute requires the divisor parameter to be u64. Signed-off-by: Reed Riley <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Fix sysfs rebalance duration waited formatting | Feiko Nanninga | 1 | -1/+1
cat /sys/fs/bcachefs/*/internal/rebalance_status
waiting
  io wait duration:  13.5 GiB
  io wait remaining: 627 MiB
  duration waited:   1392 m

duration waited was increasing at a rate of about 14 times the expected rate. div_u64 takes a u32 divisor, but u->nsecs (from time_units[]) can be bigger than u32.

Signed-off-by: Feiko Nanninga <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
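The "about 14 times" figure follows directly from truncating the divisor to 32 bits, which a user-space sketch can show. The two `*_sketch` functions below are hypothetical stand-ins that only mirror the argument widths of the kernel helpers `div_u64()` (u32 divisor) and `div64_u64()` (u64 divisor); they are not the kernel implementations.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MINUTE 60000000000ULL	/* does not fit in 32 bits */

/* Stand-in with a 32-bit divisor, like the kernel's div_u64(). */
static uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Stand-in with a 64-bit divisor, like the kernel's div64_u64(). */
static uint64_t div64_u64_sketch(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Buggy: the divisor is silently truncated to 60000000000 mod 2^32,
 * which is 4165425152, so the quotient comes out roughly 14.4x too big. */
static uint64_t minutes_truncated(uint64_t nsecs)
{
	return div_u64_sketch(nsecs, (uint32_t)NSEC_PER_MINUTE);
}

/* Fixed: the full 64-bit divisor is used. */
static uint64_t minutes_correct(uint64_t nsecs)
{
	return div64_u64_sketch(nsecs, NSEC_PER_MINUTE);
}
```

For an input of 100 minutes (6000000000000 ns), the correct helper yields 100 while the truncating one yields 1440, matching the inflated `duration waited` value described above.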
2024-09-09 | bcachefs: Fix negative timespecs | Alyssa Ross | 1 | -2/+5
This fixes two problems in the handling of negative times:

 • rem is signed, but the rem * c->sb.nsec_per_time_unit operation produced a bogus unsigned result, because s32 * u32 = u32.

 • The timespec was not normalized (it could contain more than a billion nanoseconds).

For example, { .tv_sec = -14245441, .tv_nsec = 750000000 }, after being round tripped through timespec_to_bch2_time and then bch2_time_to_timespec would come back as { .tv_sec = -14245440, .tv_nsec = 4044967296 } (more than 4 billion nanoseconds).

Cc: [email protected] Fixes: 595c1e9bab7f ("bcachefs: Fix time handling") Closes: https://github.com/koverstreet/bcachefs/issues/743 Co-developed-by: Erin Shepherd <[email protected]> Signed-off-by: Erin Shepherd <[email protected]> Co-developed-by: Ryan Lahfa <[email protected]> Signed-off-by: Ryan Lahfa <[email protected]> Signed-off-by: Alyssa Ross <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
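Both problems named above can be reproduced with fixed-width types in user space. The following is a sketch under stated assumptions, not the bcachefs conversion code: `struct ts_sketch`, `to_ts_buggy()` and `to_ts_fixed()` are hypothetical, the time value is a plain signed count of units, and `int` is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for a kernel timespec64. */
struct ts_sketch {
	int64_t sec;
	int64_t nsec;
};

/* Buggy: int32_t * uint32_t is computed as unsigned, so a negative
 * remainder wraps, and the result is never normalized. */
static struct ts_sketch to_ts_buggy(int64_t time, uint32_t nsec_per_unit)
{
	assert(nsec_per_unit > 0 && nsec_per_unit <= NSEC_PER_SEC);
	int64_t units_per_sec = NSEC_PER_SEC / nsec_per_unit;
	int32_t rem = (int32_t)(time % units_per_sec);
	struct ts_sketch ts = { time / units_per_sec, rem * nsec_per_unit };
	return ts;
}

/* Fixed: multiply in 64-bit signed arithmetic, then normalize so that
 * 0 <= nsec < NSEC_PER_SEC. */
static struct ts_sketch to_ts_fixed(int64_t time, uint32_t nsec_per_unit)
{
	assert(nsec_per_unit > 0 && nsec_per_unit <= NSEC_PER_SEC);
	int64_t units_per_sec = NSEC_PER_SEC / nsec_per_unit;
	int64_t sec = time / units_per_sec;
	int64_t nsec = (time % units_per_sec) * (int64_t)nsec_per_unit;

	if (nsec < 0) {
		nsec += NSEC_PER_SEC;
		sec--;
	}
	struct ts_sketch ts = { sec, nsec };
	return ts;
}
```

Feeding in the example from the commit message as -14245440250000000 units of one nanosecond each, the buggy version returns { -14245440, 4044967296 } and the fixed version returns { -14245441, 750000000 }.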
2024-09-09 | bcachefs: More BCH_SB_MEMBER_INVALID support | Kent Overstreet | 4 | -9/+17
Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Simplify bch2_bkey_drop_ptrs() | Kent Overstreet | 2 | -30/+14
bch2_bkey_drop_ptrs() had some complicated machinery for avoiding O(n^2) when dropping multiple pointers - but when n is only going to be ~4, it's not worth it. Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Add a cond_resched() to __journal_keys_sort() | Kent Overstreet | 1 | -0/+2
Without this, we'd potentially sort multiple times without a cond_resched(), leading to hung task warnings on larger systems. Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | bcachefs: Fix ca->io_ref usage | Kent Overstreet | 1 | -12/+12
ca->io_ref does not protect against the filesystem going away, c->write_ref does. Much like 0b50b7313ef2 ("bcachefs: Fix refcounting in discard path"), the other async paths need fixing. Signed-off-by: Kent Overstreet <[email protected]>
2024-09-09 | xfrm: policy: Restore dir assignments in xfrm_hash_rebuild() | Nathan Chancellor | 1 | -0/+2
Clang warns (or errors with CONFIG_WERROR):

  net/xfrm/xfrm_policy.c:1286:8: error: variable 'dir' is uninitialized when used here [-Werror,-Wuninitialized]
   1286 |                 if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) {
        |                      ^~~
  net/xfrm/xfrm_policy.c:1257:9: note: initialize the variable 'dir' to silence this warning
   1257 |         int dir;
        |                ^
        |                 = 0
  1 error generated.

A recent refactoring removed some assignments to dir because xfrm_policy_is_dead_or_sk() has a dir assignment in it. However, dir is used elsewhere in xfrm_hash_rebuild(), including within loops where it needs to be reloaded for each policy. Restore the assignments before the first use of dir to fix the warning and ensure dir is properly initialized throughout the function.

Fixes: 08c2182cf0b4 ("xfrm: policy: use recently added helper in more places") Acked-by: Florian Westphal <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2024-09-09 | xfrm: policy: fix null dereference | Florian Westphal | 1 | -2/+2
Julian Wiedmann says:
> +     if (!xfrm_pol_hold_rcu(ret))

Coverity spotted that ^^^ needs a s/ret/pol fix-up:

> CID 1599386:  Null pointer dereferences  (FORWARD_NULL)
> Passing null pointer "ret" to "xfrm_pol_hold_rcu", which dereferences it.

Ditch the bogus 'ret' variable.

Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype") Reported-by: Julian Wiedmann <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2024-09-09 | Merge branch 'unmask-dscp-part-four' | David S. Miller | 10 | -14/+23
Ido Schimmel says:

====================
Unmask upper DSCP bits - part 4 (last)

tl;dr - This patchset finishes unmasking the upper DSCP bits in the IPv4 flow key in preparation for allowing IPv4 FIB rules to match on DSCP. No functional changes are expected.

The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB lookup to match against the TOS selector in FIB rules and routes.

It is currently impossible for user space to configure FIB rules that match on the DSCP value as the upper DSCP bits are either masked in the various call sites that initialize the IPv4 flow key or along the path to the FIB core.

In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we need to make sure the entire DSCP value is present in the IPv4 flow key. This patchset finishes unmasking the upper DSCP bits by adjusting all the callers of ip_route_output_key() to properly initialize the full DSCP value in the IPv4 flow key.

No functional changes are expected as commit 1fa3314c14c6 ("ipv4: Centralize TOS matching") moved the masking of the upper DSCP bits to the core where 'flowi4_tos' is matched against the TOS selector.
====================

Signed-off-by: David S. Miller <[email protected]>
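The "no functional changes" argument in the cover letter above can be sketched with plain bit masks. This is an illustration only: the two mask constants and all three functions are hypothetical names, and the values assume that the legacy routing mask keeps three TOS bits (0x1c) while the DSCP occupies the upper six bits of the DS field (0xfc, per RFC 2474).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this sketch; not taken from the patches themselves. */
#define LEGACY_TOS_MASK_SKETCH	0x1cu
#define DSCP_MASK_SKETCH	0xfcu

/* Old caller behaviour: upper DSCP bits are dropped before the lookup. */
static uint8_t caller_masked(uint8_t ds_field)
{
	return (uint8_t)(ds_field & LEGACY_TOS_MASK_SKETCH);
}

/* New caller behaviour: the whole DSCP value reaches the flow key. */
static uint8_t caller_unmasked(uint8_t ds_field)
{
	return (uint8_t)(ds_field & DSCP_MASK_SKETCH);
}

/* The core applies the legacy mask itself when matching, so existing
 * lookups see the same value regardless of what the caller did. */
static uint8_t core_legacy_view(uint8_t flow_tos)
{
	uint8_t legacy = (uint8_t)(flow_tos & LEGACY_TOS_MASK_SKETCH);

	/* The legacy bits are a subset of the DSCP bits, so nothing is lost. */
	assert((legacy & DSCP_MASK_SKETCH) == legacy);
	return legacy;
}
```

For a DS field of 0xb8 (DSCP 46), the masked caller passes 0x18 and the unmasked caller passes 0xb8, yet the core's legacy view is 0x18 in both cases; only a future DSCP selector would observe the difference.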
2024-09-09 | sctp: Unmask upper DSCP bits in sctp_v4_get_dst() | Ido Schimmel | 1 | -1/+2
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Note that the 'tos' variable holds the full DS field. Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Reviewed-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-09-09 | ipv4: udp_tunnel: Unmask upper DSCP bits in udp_tunnel_dst_lookup() | Ido Schimmel | 1 | -1/+2
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Note that callers of udp_tunnel_dst_lookup() pass the entire DS field in the 'tos' argument. Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>