Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull more bcachefs fixes from Kent Overstreet:
"Notable user impacting bugs
- On multi device filesystems, recovery was looping in
btree_trans_too_many_iters(). This checks if a transaction has
touched too many btree paths (because of iteration over many keys),
and isuses a restart to drop unneeded paths.
But it's now possible for some paths to exceed the previous limit
without iteration in the interior btree update path, since the
transaction commit will do alloc updates for every old and new
btree node, and during journal replay we don't use the btree write
buffer for locking reasons and thus those updates use btree paths
when they wouldn't normally.
- Fix a corner case in rebalance when moving extents on a
durability=0 device. This wouldn't be hit when a device was
formatted with durability=0 since in that case we'll only use it as
a write through cache (only cached extents will live on it), but
durability can now be changed on an existing device.
- bch2_get_acl() could rarely forget to handle a transaction restart;
this manifested as the occasional missing acl that came back after
dropping caches.
- Fix a major performance regression on high iops multithreaded write
workloads (only since 6.9-rc1); a previous fix for a deadlock in
the interior btree update path to check the journal watermark
introduced a dependency on the state of btree write buffer flushing
that we didn't want.
- Assorted other repair paths and recovery fixes"
* tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs: (25 commits)
bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()
bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
bcachefs: Fix a race in btree_update_nodes_written()
bcachefs: btree_node_scan: Respect member.data_allowed
bcachefs: Don't scan for btree nodes when we can reconstruct
bcachefs: Fix check_topology() when using node scan
bcachefs: fix eytzinger0_find_gt()
bcachefs: fix bch2_get_acl() transaction restart handling
bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list
bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()
bcachefs: Rename struct field swap to prevent macro naming collision
MAINTAINERS: Add entry for bcachefs documentation
Documentation: filesystems: Add bcachefs toctree
bcachefs: JOURNAL_SPACE_LOW
bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINE
bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystems
bcachefs: fix rand_delete unit test
bcachefs: fix ! vs ~ typo in __clear_bit_le64()
bcachefs: Fix rebalance from durability=0 device
bcachefs: Print shutdown journal sequence number
...
|
|
The page has been marked clean before writepage is called. If we don't
redirty it before postponing the write, it might never get written.
Cc: [email protected]
Fixes: 503d4fa6ee28 ("ceph: remove reliance on bdi congestion")
Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Xiubo Li <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
The nfs4 mount fails with EIO on 64-bit big endian architectures since
v6.7. The issue arises from employing a union in the nfsd4_encode_fattr4()
function to overlay a 32-bit array with a 64-bit values based bitmap,
which does not function as intended. Address the endianness issue by
utilizing bitmap_from_arr32() to copy 32-bit attribute masks into a
bitmap in an endianness-agnostic manner.
Cc: [email protected]
Fixes: fce7913b13d0 ("NFSD: Use a bitmask loop to encode FATTR4 results")
Link: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2060217
Signed-off-by: Vasily Gorbik <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
The sysfs_break_active_protection() routine has an obvious reference
leak in its error path. If the call to kernfs_find_and_get() fails then
kn will be NULL, so the companion sysfs_unbreak_active_protection()
routine won't get called (and would only cause an access violation by
trying to dereference kn->parent if it was called). As a result, the
reference to kobj acquired at the start of the function will never be
released.
Fix the leak by adding an explicit kobject_put() call when kn is NULL.
Signed-off-by: Alan Stern <[email protected]>
Fixes: 2afc9166f79b ("scsi: sysfs: Introduce sysfs_{un,}break_active_protection()")
Cc: Bart Van Assche <[email protected]>
Cc: [email protected]
Reviewed-by: Bart Van Assche <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig fixes from Masami Hiramatsu:
- show the original cmdline only once, and only if it was modeified by
bootconfig
* tag 'bootconfig-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fs/proc: Skip bootloader comment if no embedded kernel parameters
fs/proc: remove redundant comments from /proc/bootconfig
|
|
We weren't respecting trans->journal_replay_not_finished - we shouldn't
be searching the journal keys unless we have a ref on them.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
dropping read locks in bch2_btree_node_lock_write_nofail() dates from
before we had the cycle detector; we can now tell the cycle detector
directly when taking a lock may not fail because we can't handle
transaction restarts.
This is needed for adding should_be_locked asserts.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
One btree update might have terminated in a node update, and then while
it is in flight another btree update might free that original node.
This race has to be handled in btree_update_nodes_written() - we were
missing a READ_ONCE().
Signed-off-by: Kent Overstreet <[email protected]>
|
|
cifs_get_fattr() may be called with a NULL inode, so check for a
non-NULL inode before calling
cifs_mark_open_handles_for_deleted_file().
This fixes the following oops:
mount.cifs //srv/share /mnt -o ...,vers=3.1.1
cd /mnt
touch foo; tail -f foo &
rm foo
cat foo
BUG: kernel NULL pointer dereference, address: 00000000000005c0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 696 Comm: cat Not tainted 6.9.0-rc2 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.3-1.fc39 04/01/2014
RIP: 0010:__lock_acquire+0x5d/0x1c70
Code: 00 00 44 8b a4 24 a0 00 00 00 45 85 f6 0f 84 bb 06 00 00 8b 2d
48 e2 95 01 45 89 c3 41 89 d2 45 89 c8 85 ed 0 0 <48> 81 3f 40 7a 76
83 44 0f 44 d8 83 fe 01 0f 86 1b 03 00 00 31 d2
RSP: 0018:ffffc90000b37490 EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff888110021ec0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000005c0
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000200
FS: 00007f2a1fa08740(0000) GS:ffff888157a00000(0000)
knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2: 00000000000005c0 CR3: 000000011ac7c000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x180/0x490
? srso_alias_return_thunk+0x5/0xfbef5
? exc_page_fault+0x70/0x230
? asm_exc_page_fault+0x26/0x30
? __lock_acquire+0x5d/0x1c70
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
lock_acquire+0xc0/0x2d0
? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs]
? srso_alias_return_thunk+0x5/0xfbef5
? kmem_cache_alloc+0x2d9/0x370
_raw_spin_lock+0x34/0x80
? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs]
cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs]
cifs_get_fattr+0x24c/0x940 [cifs]
? srso_alias_return_thunk+0x5/0xfbef5
cifs_get_inode_info+0x96/0x120 [cifs]
cifs_lookup+0x16e/0x800 [cifs]
cifs_atomic_open+0xc7/0x5d0 [cifs]
? lookup_open.isra.0+0x3ce/0x5f0
? __pfx_cifs_atomic_open+0x10/0x10 [cifs]
lookup_open.isra.0+0x3ce/0x5f0
path_openat+0x42b/0xc30
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
do_filp_open+0xc4/0x170
do_sys_openat2+0xab/0xe0
__x64_sys_openat+0x57/0xa0
do_syscall_64+0xc1/0x1e0
entry_SYSCALL_64_after_hwframe+0x72/0x7a
Fixes: ffceb7640cbf ("smb: client: do not defer close open handles to deleted files")
Reviewed-by: Meetakshi Setiya <[email protected]>
Reviewed-by: Bharath SM <[email protected]>
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
In 9p2000 legacy mode, stat2inode initializes nlink to 1,
which is redundant with what alloc_inode should have already set.
9p2000.u overrides this with extensions if present in the stat
structure, and 9p2000.L incorporates nlink into its stat structure.
At the very least this probably messes with directory nlink
accounting in legacy mode.
Signed-off-by: Eric Van Hensbergen <[email protected]>
|
|
If a device wasn't used for btree nodes, no need to scan for them.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Fixes the following Coccinelle/coccicheck warning reported by
string_choices.cocci:
opportunity for str_plural(zgroup->g_nr_zones)
Signed-off-by: Thorsten Blum <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
[BUG]
There is a recent report that when memory pressure is high (including
cached pages), btrfs can spend most of its time on memory allocation in
btrfs_alloc_page_array() for compressed read/write.
[CAUSE]
For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
even if the bulk allocation failed (fell back to single page
allocation) we still retry but with extra memalloc_retry_wait().
If the bulk alloc only returned one page a time, we would spend a lot of
time on the retry wait.
The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
incomplete batch memory allocations").
[FIX]
Although the commit mentioned that other filesystems do the wait, it's
not the case at least nowadays.
All the mainlined filesystems only call memalloc_retry_wait() if they
failed to allocate any page (not only for bulk allocation).
If there is any progress, they won't call memalloc_retry_wait() at all.
For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
if there is no allocation progress at all, and the call is not for
metadata readahead.
So I don't believe we should call memalloc_retry_wait() unconditionally
for short allocation.
Call memalloc_retry_wait() if it fails to allocate any page for tree
block allocation (which goes with __GFP_NOFAIL and may not need the
special handling anyway), and reduce the latency for
btrfs_alloc_page_array().
Reported-by: Julian Taylor <[email protected]>
Tested-by: Julian Taylor <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
CC: [email protected] # 6.1+
Reviewed-by: Sweet Tea Dorminy <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Add an ASSERT to catch a faulty delayed reference item resulting from
prematurely cleared extent buffer.
Also, add a WARN to detect if we try to dirty a ZEROOUT buffer again, which
is suspicious as its update will be lost.
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Btrfs clears the content of an extent buffer marked as
EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is
introduced to prevent a write hole of an extent buffer, which is once
allocated, marked dirty, but turns out unnecessary and cleaned up within
one transaction operation.
Currently, btrfs_clear_buffer_dirty() marks the extent buffer as
EXTENT_BUFFER_ZONED_ZEROOUT, and skips the entry function. If this call
happens while the buffer is under IO (with the WRITEBACK flag set,
without the DIRTY flag), we can add the ZEROOUT flag and clear the
buffer's content just before a bio submission. As a result:
1) it can lead to adding faulty delayed reference item which leads to a
FS corrupted (EUCLEAN) error, and
2) it writes out cleared tree node on disk
The former issue is previously discussed in [1]. The corruption happens
when it runs a delayed reference update. So, on-disk data is safe.
[1] https://lore.kernel.org/linux-btrfs/3f4f2a0ff1a6c818050434288925bdcf3cd719e5.1709124777.git.naohiro.aota@wdc.com/
The latter one can reach on-disk data. But, as that node is already
processed by btrfs_clear_buffer_dirty(), that will be invalidated in the
next transaction commit anyway. So, the chance of hitting the corruption
is relatively small.
Anyway, we should skip flagging ZEROOUT on a non-DIRTY extent buffer, to
keep the content under IO intact.
Fixes: aa6313e6ff2b ("btrfs: zoned: don't clear dirty flag of extent buffer")
CC: [email protected] # 6.8
Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
If the "bootconfig" kernel command-line argument was specified or if
the kernel was built with CONFIG_BOOT_CONFIG_FORCE, but if there are
no embedded kernel parameter, omit the "# Parameters from bootloader:"
comment from the /proc/bootconfig file. This will cause automation
to fall back to the /proc/cmdline file, which will be identical to the
comment in this no-embedded-kernel-parameters case.
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 8b8ce6c75430 ("fs/proc: remove redundant comments from /proc/bootconfig")
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
|
|
commit 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to
/proc/bootconfig") adds bootloader argument comments into /proc/bootconfig.
/proc/bootconfig shows boot_command_line[] multiple times following
every xbc key value pair, that's duplicated and not necessary.
Remove redundant ones.
Output before and after the fix is like:
key1 = value1
*bootloader argument comments*
key2 = value2
*bootloader argument comments*
key3 = value3
*bootloader argument comments*
...
key1 = value1
key2 = value2
key3 = value3
*bootloader argument comments*
...
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to /proc/bootconfig")
Signed-off-by: Zhenhua Huang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: [email protected]
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
shoot down journal keys _before_ populating journal keys with pointers
to scanned nodes
Signed-off-by: Kent Overstreet <[email protected]>
|
|
- fix return types: promoting from unsigned to ssize_t does not do what
we want here, and was pointless since the rest of the eytzinger code
is u32
- nr, not size
Signed-off-by: Kent Overstreet <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Several fixes to qgroups that have been recently identified by test
generic/475:
- fix prealloc reserve leak in subvolume operations
- various other fixes in reservation setup, conversion or cleanup"
* tag 'for-6.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: always clear PERTRANS metadata during commit
btrfs: make btrfs_clear_delalloc_extent() free delalloc reserve
btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans
btrfs: record delayed inode root in transaction
btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations
btrfs: qgroup: correctly model root qgroup rsv in convert
|
|
bch2_acl_from_disk() uses allocate_dropping_locks, and can thus return
a transaction restart - this wasn't handled.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
When allocating bkey_cached from bc->freed_pcpu list, it missed
decreasing the count of nr_freed_pcpu which would cause the mismatch
between the value of nr_freed_pcpu and the list items. This problem
also exists in moving new bkey_cached to bc->freed_pcpu list.
If these happened, the bug info may appear in
bch2_fs_btree_key_cache_exit by the follow code:
BUG_ON(list_count_nodes(&bc->freed_pcpu) != bc->nr_freed_pcpu);
BUG_ON(list_count_nodes(&bc->freed_nonpcpu) != bc->nr_freed_nonpcpu);
Fixes: c65c13f0eac6 ("bcachefs: Run btree key cache shrinker less aggressively")
Signed-off-by: Hongbo Li <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Multiple bug fixes for journal iters:
- When the journal keys gap buffer is resized, we have to adjust the
iterators for moving the gap to the end
- We don't want to rewind iterators to point to the key we just
inserted if it's not for the correct btree/level
Also, add some new assertions.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
The struct field swap can collide with the swap() macro defined in
linux/minmax.h. Rename the struct field to prevent such collisions.
Signed-off-by: Thorsten Blum <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
"bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant
performance regression (nearly 50%) on multithreaded random writes with
fio.
The reason is that the journal watermark checks multiple things,
including the state of the btree write buffer, and on multithreaded
update heavy workloads we're bottleneked on write buffer flushing - we
don't want kicknig off btree updates to depend on the state of the write
buffer.
This isn't strictly correct; the interior btree update path does do
write buffer updates, but it's a tiny fraction of total accounting
updates and we're more concerned with space in the journal itself.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
BCH_IOCTL_FSCK_OFFLINE allows the userspace fsck tool to use the kernel
implementation of fsck - primarily when the kernel version is a better
version match.
It should look and act exactly like the normal userspace fsck that the
user expected to be invoking, so errors should never result in a kernel
panic.
We may want to consider further restricting errors=panic - it's only
intended for debugging in controlled test environments, it should have
no purpose it normal usage.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
To open an encrypted filesystem, we use request_key() to get the
encryption key from the user's keyring - but request_key() needs to
happen in the context of the process that invoked the ioctl.
This easily fixed by using bch2_fs_open() in nostart mode.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Address a slow memory leak with RPC-over-TCP
- Prevent another NFS4ERR_DELAY loop during CREATE_SESSION
* tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP
|
|
Pull xfs fix from Chandan Babu:
- Allow creating new links to special files which were not associated
with a project quota
* tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: allow cross-linking special files without project quota
|
|
Pull smb client fixes from Steve French:
- fix to retry close to avoid potential handle leaks when server
returns EBUSY
- DFS fixes including a fix for potential use after free
- fscache fix
- minor strncpy cleanup
- reconnect race fix
- deal with various possible UAF race conditions tearing sessions down
* tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()
smb: client: fix potential UAF in smb2_is_network_name_deleted()
smb: client: fix potential UAF in is_valid_oplock_break()
smb: client: fix potential UAF in smb2_is_valid_oplock_break()
smb: client: fix potential UAF in smb2_is_valid_lease_break()
smb: client: fix potential UAF in cifs_stats_proc_show()
smb: client: fix potential UAF in cifs_stats_proc_write()
smb: client: fix potential UAF in cifs_dump_full_key()
smb: client: fix potential UAF in cifs_debug_files_proc_show()
smb3: retrying on failed server close
smb: client: serialise cifs_construct_tcon() with cifs_mount_mutex
smb: client: handle DFS tcons in cifs_construct_tcon()
smb: client: refresh referral without acquiring refpath_lock
smb: client: guarantee refcounted children from parent session
cifs: Fix caching to try to do open O_WRONLY as rdwr on server
smb: client: fix UAF in smb2_reconnect_server()
smb: client: replace deprecated strncpy with strscpy
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
The ! was obviously intended to be ~. As it is, this function does
the equivalent to: "addr[bit / 64] = 0;".
Fixes: 27fcec6c27ca ("bcachefs: Clear recovery_passes_required as they complete without errors")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
client. While a callback job is technically an RPC that counter is
really more for client-driven RPCs, and this has the effect of
preventing the client from being unhashed until the callback completes.
If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
can end up in a situation where the callback can't complete on the (now
dead) callback channel, but the new client can't connect because the old
client can't be unhashed. This usually manifests as a NFS4ERR_DELAY
return on the CREATE_SESSION operation.
The job is only holding a reference to the client so it can clear a flag
after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a
reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of
reference when dealing with the nfsdfs info files, but it should work
appropriately here to ensure that the nfs4_client doesn't disappear.
Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
Reported-by: Vladimir Benes <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
Pull smb server fixes from Steve French:
"Three fixes, all also for stable:
- encryption fix
- memory overrun fix
- oplock break fix"
* tag '6.9-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1
ksmbd: validate payload size in ipc response
ksmbd: don't send oplock break if rename fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a few small fixes. This comes with some delay because I
wanted to wait on people running their reproducers and the Easter
Holidays meant that those replies came in a little later than usual:
- Fix handling of preventing writes to mounted block devices.
Since last kernel we allow to prevent writing to mounted block
devices provided CONFIG_BLK_DEV_WRITE_MOUNTED isn't set and the
block device is opened with restricted writes. When we switched to
opening block devices as files we altered the mechanism by which we
recognize when a block device has been opened with write
restrictions.
The detection logic assumed that only read-write mounted
filesystems would apply write restrictions to their block devices
from other openers. That of course is not true since it also makes
sense to apply write restrictions for filesystems that are
read-only.
Fix the detection logic using an FMODE_* bit. We still have a few
left since we freed up a couple a while ago. I also picked up a
patch to free up four additional FMODE_* bits scheduled for the
next merge window.
- Fix counting the number of writers to a block device. This just
changes the logic to be consistent.
- Fix a bug in aio causing a NULL pointer derefernce after we
implemented batched processing in aio.
- Finally, add the changes we discussed that allows to yield block
devices early even though file closing itself is deferred.
This also allows us to remove two holder operations to get and
release the holder to align lifetime of file and holder of the
block device"
* tag 'vfs-6.9-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
aio: Fix null ptr deref in aio_complete() wakeup
fs,block: yield devices early
block: count BLK_OPEN_RESTRICT_WRITES openers
block: handle BLK_OPEN_RESTRICT_WRITES correctly
|
|
list_del_init_careful() needs to be the last access to the wait queue
entry - it effectively unlocks access.
Previously, finish_wait() would see the empty list head and skip taking
the lock, and then we'd return - but the completion path would still
attempt to do the wakeup after the task_struct pointer had been
overwritten.
Fixes: 71eb6b6b0ba9 ("fs/aio: obey min_nr when doing wakeups")
Cc: [email protected]
Link: https://lore.kernel.org/linux-fsdevel/CAHTA-ubfwwB51A5Wg5M6H_rPEQK9pNf8FkAGH=vr=FEkyRrtqw@mail.gmail.com/
Signed-off-by: Kent Overstreet <[email protected]>
Link: https://lore.kernel.org/stable/20240331215212.522544-1-kent.overstreet%40linux.dev
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Pull bcachefs repair code from Kent Overstreet:
"A couple more small fixes, and new repair code.
We can now automatically recover from arbitrary corrupted interior
btree nodes by scanning, and we can reconstruct metadata as needed to
bring a filesystem back into a working, consistent, read-write state
and preserve access to whatevver wasn't corrupted.
Meaning - you can blow away all metadata except for extents and
dirents leaf nodes, and repair will reconstruct everything else and
give you your data, and under the correct paths. If inodes are missing
i_size will be slightly off and permissions/ownership/timestamps will
be gone, and we do still need the snapshots btree if snapshots were in
use - in the future we'll be able to guess the snapshot tree structure
in some situations.
IOW - aside from shaking out remaining bugs (fuzz testing is still
coming), repair code should be complete and if repair ever doesn't
work that's the highest priority bug that I want to know about
immediately.
This patchset was kindly tested by a user from India who accidentally
wiped one drive out of a three drive filesystem with no replication on
the family computer - it took a couple weeks but we got everything
important back"
* tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs:
bcachefs: reconstruct_inode()
bcachefs: Subvolume reconstruction
bcachefs: Check for extents that point to same space
bcachefs: Reconstruct missing snapshot nodes
bcachefs: Flag btrees with missing data
bcachefs: Topology repair now uses nodes found by scanning to fill holes
bcachefs: Repair pass for scanning for btree nodes
bcachefs: Don't skip fake btree roots in fsck
bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
bcachefs: Etyzinger cleanups
bcachefs: bch2_shoot_down_journal_keys()
bcachefs: Clear recovery_passes_required as they complete without errors
bcachefs: ratelimit informational fsck errors
bcachefs: Check for bad needs_discard before doing discard
bcachefs: Improve bch2_btree_update_to_text()
mean_and_variance: Drop always failing tests
bcachefs: fix nocow lock deadlock
bcachefs: BCH_WATERMARK_interior_updates
bcachefs: Fix btree node reserve
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Print start and end level of the btree update; also a bit of cleanup.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
sysfs is limited to PAGE_SIZE, and when we're debugging strange
deadlocks/priority inversions we need to see the full list.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Snapshot table accesses generally need to be checking for invalid
snapshot ID now, fix one that was missed.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This creates a subdirectory for each individual btree under the btrees/
debugfs directory.
Directory structure, before:
/sys/kernel/debug/bcachefs/$FS_ID/btrees/
├── alloc
├── alloc-bfloat-failed
├── alloc-formats
├── backpointers
├── backpointers-bfloat-failed
├── backpointers-formats
...
Directory structure, after:
/sys/kernel/debug/bcachefs/$FS_ID/btrees/
├── alloc
│ ├── bfloat-failed
│ ├── formats
│ └── keys
├── backpointers
│ ├── bfloat-failed
│ ├── formats
│ └── keys
...
Signed-off-by: Thomas Bertschinger <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Skip sessions that are being teared down (status == SES_EXITING) to
avoid UAF.
Cc: [email protected]
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Skip sessions that are being teared down (status == SES_EXITING) to
avoid UAF.
Cc: [email protected]
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Skip sessions that are being teared down (status == SES_EXITING) to
avoid UAF.
Cc: [email protected]
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Skip sessions that are being teared down (status == SES_EXITING) to
avoid UAF.
Cc: [email protected]
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Skip sessions that are being teared down (status == SES_EXITING) to
avoid UAF.
Cc: [email protected]
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|