path: root/fs
Age | Commit message | Author | Files | Lines
2024-03-10 | smb: common: fix fields sizes in compression_pattern_payload_v1 | Enzo Matsumiya | 1 | -2/+2
See the protocol documentation in MS-SMB2 section 2.2.42.2.2.
Signed-off-by: Enzo Matsumiya <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: negotiate compression algorithms | Enzo Matsumiya | 6 | -15/+49
Change the "compress=" mount option to a boolean flag that, if set, enables negotiating compression algorithms with the server. Do not compress or decompress anything for now.
Signed-off-by: Enzo Matsumiya <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb3: add dynamic trace point for ioctls | Steve French | 2 | -0/+37
It can be helpful in debugging to know which ioctls are called, to better correlate them with smb3 fsctls (and opens). Add a dynamic trace point to trace ioctls into cifs.ko. Here is sample output:
  TASK-PID                CPU#  |||||  TIMESTAMP       FUNCTION
     | |                    |   |||||     |               |
  new-inotify-ioc-90418   [001] ..... 142157.397024: smb3_ioctl: xid=18 fid=0x0 ioctl cmd=0xc009cf0b
  new-inotify-ioc-90457   [007] ..... 142217.943569: smb3_ioctl: xid=22 fid=0x389bf5b6 ioctl cmd=0xc009cf0b
Signed-off-by: Steve French <[email protected]>
2024-03-10 | cifs: Fix writeback data corruption | David Howells | 1 | -126/+157
cifs writeback doesn't correctly handle the case where cifs_extend_writeback() reaches a point where it is considering an additional folio, but that folio would overrun the wsize. At that point it drops out of the xarray scanning loop and calls xas_pause(). The problem is that xas_pause() advances the loop counter, thereby skipping that page.
What needs to happen is for xas_reset() to be called any time we decide we don't want to process the page we're looking at, but rather send the request we are building and start a new one.
Fix this by copying and adapting the netfslib writepages code as a temporary measure; cifs writeback is intended to be offloaded to netfslib in the near future.
This also fixes the issue where the use of filemap_get_folios_tag() caused retries of a bunch of pages that the extender had already dealt with.
This can be tested by creating, say, a 64K file somewhere not on cifs (otherwise copy-offload may get underfoot), mounting a cifs share with a wsize of 64000, copying the file to it and then comparing the original file and the copy:
  dd if=/dev/urandom of=/tmp/64K bs=64k count=1
  mount //192.168.6.1/test /mnt -o user=...,pass=...,wsize=64000
  cp /tmp/64K /mnt/64K
  cmp /tmp/64K /mnt/64K
Without the fix, the cmp fails at position 64000 (or shortly thereafter).
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: David Howells <[email protected]>
cc: Steve French <[email protected]>
cc: Paulo Alcantara <[email protected]>
cc: Ronnie Sahlberg <[email protected]>
cc: Shyam Prasad N <[email protected]>
cc: Tom Talpey <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: return reparse type in /proc/mounts | Paulo Alcantara | 2 | -0/+14
Add support for returning the reparse mount option in /proc/mounts.
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: set correct d_type for reparse DFS/DFSR and mount point | Paulo Alcantara | 1 | -7/+9
Set the correct dirent->d_type for IO_REPARSE_TAG_DFS{,R} and IO_REPARSE_TAG_MOUNT_POINT reparse points.
Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: parse uid, gid, mode and dev from WSL reparse points | Paulo Alcantara | 4 | -17/+97
Parse the extended attributes from WSL reparse points to correctly report uid, gid, mode and dev for their instantiated inodes.
Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: introduce SMB2_OP_QUERY_WSL_EA | Paulo Alcantara | 6 | -25/+190
Add a new command to smb2_compound_op() for querying WSL extended attributes from reparse points.
Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: Fix a NULL vs IS_ERR() check in wsl_set_xattrs() | Dan Carpenter | 1 | -1/+1
This was intended to be an IS_ERR() check. The ea_create_context() function doesn't return NULL.
Fixes: 1eab17fe485c ("smb: client: add support for WSL reparse points")
Reviewed-by: Paulo Alcantara <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: add support for WSL reparse points | Paulo Alcantara | 10 | -20/+210
Add support for creating special files via WSL reparse points when using the 'reparse=wsl' mount option. They're faster than NFS reparse points because they don't require extra roundtrips to figure out which ->d_type a specific dirent has: that information is already stored in query directory responses, which makes getdents() calls faster.
Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: reduce number of parameters in smb2_compound_op() | Paulo Alcantara | 2 | -69/+95
Replace the @desired_access, @create_disposition, @create_options and @mode parameters with a single @oparms. No functional changes.
Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: fix potential broken compound request | Paulo Alcantara | 1 | -43/+63
Now that smb2_compound_op() can accept up to 5 commands in a single compound request, set the appropriate NextCommand and related flags on all subsequent commands, and handle the case where a valid @cfile is passed, in which case the create and close requests are skipped in the compound chain. This fixes a potentially broken compound request that could be sent from smb2_get_reparse_inode() if the client found a valid open file (@cfile) prior to calling smb2_compound_op().
Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: move most of reparse point handling code to common file | Paulo Alcantara | 9 | -364/+405
In preparation for adding support for creating special files via WSL reparse points in the next commits.
Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: introduce reparse mount option | Paulo Alcantara | 4 | -0/+52
Allow the user to create special files and symlinks by choosing between WSL and NFS reparse points via the 'reparse={nfs,wsl}' mount option. If unset or 'reparse=default', the client defaults to creating them via NFS reparse points. Creating WSL reparse points isn't supported yet, so for now simply return an error when attempting to mount with 'reparse=wsl'.
Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: retry compound request without reusing lease | Meetakshi Setiya | 1 | -3/+38
There is a shortcoming in the current implementation of the file lease mechanism, exposed when lease keys were reused for unlink, rename and set_path_size operations. As per MS-SMB2, lease keys are associated with the file name, while the Linux SMB client maintains lease keys with the inode. If the file has any hardlinks, the lease for a file may be wrongly reused for an operation on the hardlink, or vice versa. In these cases, the compound operations mentioned above fail with STATUS_INVALID_PARAMETER. This patch adds a fallback to the old mechanism of not sending any lease with these compound operations if the request with the lease key fails with STATUS_INVALID_PARAMETER. Resending the same request without a lease key should not hurt any functionality, but it might impact performance, especially in cases where the error is not caused by a wrong lease key and we end up doing an extra roundtrip.
Signed-off-by: Meetakshi Setiya <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: do not defer close open handles to deleted files | Meetakshi Setiya | 6 | -5/+74
When a file/dentry has been deleted before all its open handles are closed, closing those handles can currently add them to the deferred close list. This can cause problems creating a file with the same name if the file is re-created before the deferred close completes. The issue was seen while reusing a client's existing lease on a file for compound operations: xfstest 591 failed because the deferred close handle remained valid even after the file was deleted, and was being reused to create a file with the same name. The server in this case returns an error on open with STATUS_DELETE_PENDING. Recreating the file would fail until the deferred handles are closed (after the duration specified in closetimeo). This patch fixes the issue by flagging all open handles for the deleted file (the file path, to be precise), setting status_file_deleted to true in the cifsFileInfo structure. As per the information classes specified in MS-FSCC, the SMB2 query info response from the server has a DeletePending field, set to true to indicate that deletion has been requested on that file. If this is the case, flag the open handles for this file too. When closing each of these handles in cifs_close, check the value of this boolean field and do not defer the close if the corresponding file path has been deleted.
Signed-off-by: Meetakshi Setiya <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: client: reuse file lease key in compound operations | Meetakshi Setiya | 6 | -31/+48
Currently, when a rename, unlink or set path size compound operation is requested on a file that has a lot of dirty pages to be written to the server, we do not send the lease key for these requests. As a result, the server can assume that the request is from a new client, and send a lease break notification to the same client, on the same connection. In response to the lease break, the client can consume several credits to write the dirty pages to the server. Depending on the server's credit grant implementation, the server can stop granting more credits to this connection, and this can cause a deadlock (which can only be resolved when the lease timer on the server expires). One of the problems here is that the client sends no lease key, even if it has a lease for the file. This patch fixes the problem by reusing the existing lease key on the file for rename, unlink and set path size compound operations, so that the client does not break its own lease.
A very trivial example is a set of commands by a client that keeps an open handle (for write) to a file and then tries to move that file to another name, e.g.:
  tail -f /dev/null > myfile &
  mv myfile myfile2
Presently, the network capture on the client shows that the move (or rename) triggers a lease break on the same client, for the same file. With the lease key reused, the lease break request-response overhead is eliminated, reducing the roundtrips performed for this set of operations. The patch fixes the bug described above and also provides a performance benefit.
Signed-off-by: Meetakshi Setiya <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb3: update allocation size more accurately on write completion | Steve French | 1 | -1/+8
Changes to allocation size are approximated for extending writes of cached files until the server returns the actual value (on SMB3 close or query info, for example). However, the estimated number of blocks was being set larger than the file size even when the file is likely sparse, which breaks various xfstests (e.g. generic/129, 130, 221, 228). When i_size and i_blocks are updated on write completion, do not increase the allocation size by more than what was written (rounded up to 512 bytes).
Signed-off-by: Steve French <[email protected]>
2024-03-10 | smb: remove SLAB_MEM_SPREAD flag usage | Chengming Zhou | 1 | -1/+1
The SLAB_MEM_SPREAD flag is already a no-op as of 6.8-rc1; remove its usage so we can delete it from slab. No functional change.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | cifs: allow changing password during remount | Steve French | 4 | -5/+30
There are cases where a session is disconnected and the user's password has changed (or expired) on the server, and currently this cannot be fixed without unmounting and mounting again. This patch allows remount to change the password (for the non-Kerberos case; Kerberos ticket refresh is handled differently) when the session is disconnected and the user cannot reconnect because the old password is still in use. Future patches should also allow us to set up the keyring (cifscreds) with an "alternate password", so that the password can be changed before the session drops. That would avoid races between when the password changes and when the disconnect occurs, i.e. cases where the old password is still needed because the new password has not yet fully rolled out to all servers.
Cc: [email protected]
Signed-off-by: Steve French <[email protected]>
2024-03-10 | cifs: prevent updating file size from server if we have a read/write lease | Bharath SM | 4 | -12/+17
For large directories, the readdir operation may take multiple round trips to retrieve the contents. This introduces a potential race condition between concurrent write and readdir operations. If the readdir operation starts before a write has been processed by the server, it may update the file size attribute to an older value. Address this issue by avoiding file size updates from readdir when we have a read/write lease.
Scenario:
  1) process1: open dir xyz
  2) process1: readdir instance 1 on xyz
  3) process2: create file.txt for write
  4) process2: write x bytes to file.txt
  5) process2: close file.txt
  6) process2: open file.txt for read
  7) process1: readdir 2 - overwrites file.txt inode size to 0
  8) process2: read contents of file.txt - bug, short read with 0 bytes
Cc: [email protected]
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Bharath SM <[email protected]>
Signed-off-by: Steve French <[email protected]>
2024-03-10 | bcachefs: bch2_lookup() gives better error message on inode not found | Kent Overstreet | 1 | -9/+64
When a dirent points to a missing inode, we really should print out the dirent. This requires quite a bit of refactoring, but there are some other benefits: we now do the entire lookup (dirent and inode) in a single btree transaction, and copy to the VFS inode with btree locks still held, like the create path.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: bch2_inode_insert() | Kent Overstreet | 1 | -62/+76
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: factor out check_inode_backpointer() | Kent Overstreet | 1 | -9/+29
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Factor out check_subvol_dirent() | Kent Overstreet | 1 | -48/+57
Going to be adding more code here for checking subvol structure.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Kill some -EINVALs | Kent Overstreet | 2 | -5/+5
Repurposing standard error codes is banned in new bcachefs code, and we need to get rid of the remaining ones; private error codes give us much better error messages.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: bump max_active on btree_interior_update_worker | Kent Overstreet | 1 | -1/+1
WQ_UNBOUND with max_active 1 means an ordered workqueue, but we don't actually need or want ordered semantics, and probably want a higher concurrency limit anyway.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: move fsck_write_inode() to inode.c | Kent Overstreet | 3 | -40/+44
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Initialize super_block->s_uuid | Kent Overstreet | 1 | -0/+1
Need to fix this oversight for the new FS_IOC_(GET|SET)UUID ioctls.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Switch to uuid_to_fsid() | Kent Overstreet | 1 | -5/+1
Switch the statfs code from something horrible and open-coded to the more standard uuid_to_fsid().
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Subvolumes may now be renamed | Kent Overstreet | 2 | -26/+55
Files within a subvolume cannot be renamed into another subvolume, but subvolumes themselves were intended to be renamable. This implements subvolume renaming: we need to ensure that only a single dirent points to a subvolume key (not multiple versions in different snapshots), and that dirent.d_parent_subvol and inode.bi_parent_subvol are updated.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: btree node prefetching in check_topology | Kent Overstreet | 4 | -3/+42
btree_and_journal_iter is old code that we want to get rid of, but we're not ready to yet. Lack of btree node prefetching turns out to be a real performance issue for fsck on spinning rust, so add it.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: btree_and_journal_iter.trans | Kent Overstreet | 4 | -17/+21
We now always have a btree_trans when using a btree_and_journal_iter; this is prep work for adding prefetching to btree_and_journal_iter.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: better journal pipelining | Kent Overstreet | 4 | -59/+98
Recently a severe performance regression was discovered, which bisected to
  a6548c8b5eb5 ("bcachefs: Avoid flushing the journal in the discard path")
It turns out the old behaviour, which issued excessive journal flushes, worked around a performance issue where queueing delays prevented the journal from writing quickly enough, causing it to stall. The journal flushes masked the issue because they periodically flushed the device write cache, reducing write latency for non-flush writes. This patch reworks the journalling code to allow more than one (non-flush) write to be in flight at a time. With this patch, doing 4k random writes with an iodepth of 128, we are now able to hit 560k iops to a Samsung 970 EVO Plus; previously, we were stuck in the ~200k range.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: closure per journal buf | Kent Overstreet | 3 | -23/+41
Prep work for having multiple journal writes in flight.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: bio per journal buf | Kent Overstreet | 3 | -29/+34
Prep work for having multiple journal writes in flight.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: jset_entry_datetime | Kent Overstreet | 4 | -17/+67
This gives us a way to record the date and time each journal entry was written, which is useful for debugging.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: improve journal entry read fsck error messages | Kent Overstreet | 1 | -41/+55
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: convert journal replay ptrs to darray | Kent Overstreet | 3 | -58/+36
Eliminates some error paths; there is no longer a hardcoded BCH_REPLICAS_MAX limit.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Cleanup bch2_dirent_lookup_trans() | Kent Overstreet | 3 | -26/+14
Drop an unnecessary bch2_subvolume_get_snapshot() call, and drop the __ from the name; this is a normal interface.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: bch2_hash_set_snapshot() -> bch2_hash_set_in_snapshot() | Kent Overstreet | 3 | -18/+12
Minor renaming for clarity, and a bit of refactoring.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Workqueues should be WQ_HIGHPRI | Kent Overstreet | 1 | -4/+4
Most bcachefs workqueues are used for completions and should be WQ_HIGHPRI. This helps reduce queuing delays: once we can no longer signal backpressure by blocking, we want to complete quickly.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Improve bch2_dirent_to_text() | Kent Overstreet | 1 | -9/+11
For DT_SUBVOL, we now print both parent and child subvol IDs.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: fixup for building in userspace | Kent Overstreet | 1 | -1/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Avoid taking journal lock unnecessarily | Kent Overstreet | 2 | -53/+55
Previously, any time we failed to get a journal reservation we'd retry with the journal lock held, but this isn't necessary given wait_event()/wake_up() ordering. This avoids performance cliffs when the journal starts to get backed up and lock contention shoots up.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Journal writes should be REQ_SYNC|REQ_META | Kent Overstreet | 1 | -1/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Avoid setting j->write_work unnecessarily | Kent Overstreet | 1 | -13/+11
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Split out journal workqueue | Kent Overstreet | 3 | -16/+19
We don't want journal write completions to be blocked behind btree transactions; io_complete_wq is used for btree updates after data and metadata writes.
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: Kill unnecessary wakeups in journal reclaim | Kent Overstreet | 1 | -11/+9
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10 | bcachefs: skip invisible entries in empty subvolume checking | Guoyu Ou | 3 | -5/+9
When checking whether a subvolume is empty in the specified snapshot, entries that do not belong to this subvolume should be skipped. This fixes the following case:
  $ bcachefs subvolume create ./sub
  $ cd sub
  $ bcachefs subvolume create ./sub2
  $ bcachefs subvolume snapshot . ./snap
  $ ls -a snap
  . ..
  $ rmdir snap
  rmdir: failed to remove 'snap': Directory not empty
As Kent suggested, we pass 0 in may_delete_deleted_inode() to ignore subvols in the subvol we are checking, because inode.bi_subvol is only set on subvolume roots, and we can't go through every inode in the subvolume and change bi_subvol when taking a snapshot. This makes the check less strict, but that's ok; the rest of fsck will still catch it.
Signed-off-by: Guoyu Ou <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>