aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-09-26Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-14/+150
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-balloon supports new stats - vdpa supports setting mac address - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster - virtio_fs supports new sysfs entries for queue info - virtio/vsock performance has been improved And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vsock/virtio: avoid queuing packets when intermediate queue is empty vsock/virtio: refactor virtio_transport_send_pkt_work fw_cfg: Constify struct kobj_type vdpa/mlx5: Postpone MR deletion vdpa/mlx5: Introduce init/destroy for MR resources vdpa/mlx5: Rename mr_mtx -> lock vdpa/mlx5: Extract mr members in own resource struct vdpa/mlx5: Rename function vdpa/mlx5: Delete direct MKEYs in parallel vdpa/mlx5: Create direct MKEYs in parallel MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section virtio_fs: add sysfs entries for queue information virtio_fs: introduce virtio_fs_put_locked helper vdpa: Remove unused declarations vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command vdpa/mlx5: Small improvement for change_num_qps() vdpa/mlx5: Keep notifiers during suspend but ignore vdpa/mlx5: Parallelize device resume vdpa/mlx5: Parallelize device suspend vdpa/mlx5: Use async API for vq modify commands ...
2024-09-25ksmbd: Correct typos in multiple comments across various filesShen Lichuan7-10/+10
Fixed some confusing typos that were currently identified witch codespell, the details are as follows: -in the code comments: fs/smb/common/smb2pdu.h:9: specfication ==> specification fs/smb/common/smb2pdu.h:494: usally ==> usually fs/smb/common/smb2pdu.h:1064: Attrubutes ==> Attributes fs/smb/server/connection.c:28: cleand ==> cleaned fs/smb/server/ksmbd_netlink.h:216: struture ==> structure fs/smb/server/oplock.c:799: conains ==> contains fs/smb/server/oplock.c:1487: containted ==> contained fs/smb/server/server.c:282: proccessing ==> processing fs/smb/server/smb_common.c:491: comforms ==> conforms fs/smb/server/xattr.h:102: ATTRIBUITE ==> ATTRIBUTE Signed-off-by: Shen Lichuan <shenlichuan@vivo.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-25ksmbd: fix open failure from block and char device fileNamjae Jeon1-6/+8
char/block device file can't be opened with dentry_open() if device driver is not loaded. Use O_PATH flags for fake opening file to handle it if file is a block or char file. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-25ksmbd: remove unsafe_memcpy use in session setupNamjae Jeon1-9/+3
Kees pointed out to just use directly ->Buffer instead of pointing ->Buffer using offset not to use unsafe_memcpy(). Suggested-by: Kees Cook <kees@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-25virtio_fs: add sysfs entries for queue informationMax Gurtovoy1-8/+139
Introduce sysfs entries to provide visibility to the multiple queues used by the Virtio FS device. This enhancement allows users to query information about these queues. Specifically, add two sysfs entries: 1. Queue name: Provides the name of each queue (e.g. hiprio/requests.8). 2. CPU list: Shows the list of CPUs that can process requests for each queue. The CPU list feature is inspired by similar functionality in the block MQ layer, which provides analogous sysfs entries for block devices. These new sysfs entries will improve observability and aid in debugging and performance tuning of Virtio FS devices. Reviewed-by: Idan Zach <izach@nvidia.com> Reviewed-by: Shai Malin <smalin@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20240825130716.9506-2-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-09-25virtio_fs: introduce virtio_fs_put_locked helperMax Gurtovoy1-6/+11
Introduce a new helper function virtio_fs_put_locked to encapsulate the common pattern of releasing a virtio_fs reference while holding a lock. The existing virtio_fs_put helper will be used to release a virtio_fs reference while not holding a lock. Also add an assertion in case the lock is not taken when it should. Reviewed-by: Idan Zach <izach@nvidia.com> Reviewed-by: Shai Malin <smalin@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20240825130716.9506-1-mgurtovoy@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-09-24netfs, cifs: Fix mtime/ctime update for mmapped writesDavid Howells1-0/+1
The cifs flag CIFS_INO_MODIFIED_ATTR, which indicates that the mtime and ctime need to be written back on close, got taken over by netfs as NETFS_ICTX_MODIFIED_ATTR to avoid the need to call a function pointer to set it. The flag gets set correctly on buffered writes, but doesn't get set by netfs_page_mkwrite(), leading to occasional failures in generic/080 and generic/215. Fix this by setting the flag in netfs_page_mkwrite(). Fixes: 73425800ac94 ("netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202409161629.98887b2-oliver.sang@intel.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24cifs: update internal version numberSteve French1-2/+2
To 2.51 Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24smb: client: print failed session logoffs with FYIPaulo Alcantara1-2/+1
Do not flood dmesg with failed session logoffs as kerberos tickets getting expired or passwords being rotated is a very common scenario. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24cifs: Fix reversion of the iter in cifs_readv_receive().David Howells3-11/+7
cifs_read_iter_from_socket() copies the iterator that's passed in for the socket to modify as and if it will, and then advances the original iterator by the amount sent. However, both callers revert the advancement (although receive_encrypted_read() zeros beyond the iterator first). The problem is, though, that cifs_readv_receive() reverts by the original length, not the amount transmitted which can cause an oops in iov_iter_revert(). Fix this by: (1) Remove the iov_iter_advance() from cifs_read_iter_from_socket(). (2) Remove the iov_iter_revert() from both callers. This fixes the bug in cifs_readv_receive(). (3) In receive_encrypted_read(), if we didn't get back as much data as the buffer will hold, copy the iterator, advance the copy and use the copy to drive iov_iter_zero(). As a bonus, this gets rid of some unnecessary work. This was triggered by generic/074 with the "-o sign" mount option. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24smb3: fix incorrect mode displayed for read-only filesSteve French1-8/+11
Commands like "chmod 0444" mark a file readonly via the attribute flag (when mapping of mode bits into the ACL are not set, or POSIX extensions are not negotiated), but they were not reported correctly for stat of directories (they were reported ok for files and for "ls"). See example below: root:~# ls /mnt2 -l total 12 drwxr-xr-x 2 root root 0 Sep 21 18:03 normaldir -rwxr-xr-x 1 root root 0 Sep 21 23:24 normalfile dr-xr-xr-x 2 root root 0 Sep 21 17:55 readonly-dir -r-xr-xr-x 1 root root 209716224 Sep 21 18:15 readonly-file root:~# stat -c %a /mnt2/readonly-dir 755 root:~# stat -c %a /mnt2/readonly-file 555 This fixes the stat of directories when ATTR_READONLY is set (in cases where the mode can not be obtained other ways). root:~# stat -c %a /mnt2/readonly-dir 555 Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24smb: client: fix parsing of device numbersPaulo Alcantara2-11/+4
Report correct major and minor numbers from special files created with NFS reparse points. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24smb: client: set correct device number on nfs reparse pointsPaulo Alcantara1-2/+2
Fix major and minor numbers set on special files created with NFS reparse points. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24smb: client: propagate error from cifs_construct_tcon()Paulo Alcantara1-6/+10
Propagate error from cifs_construct_tcon() in cifs_sb_tlink() instead of always returning -EACCES. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24smb: client: fix DFS failover in multiuser mountsPaulo Alcantara1-1/+2
For sessions and tcons created on behalf of new users accessing a multiuser mount, matching their sessions in tcon_super_cb() with master tcon will always lead to false as every new user will have its own session and tcon. All multiuser sessions, however, will inherit ->dfs_root_ses from master tcon, so match it instead. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24cifs: Make the write_{enter,done,err} tracepoints display netfs infoDavid Howells2-10/+18
Make the write RPC tracepoints use the same trace macro complexes as the read tracepoints and display the netfs request and subrequest IDs where available (see commit 519be989717c "cifs: Add a tracepoint to track credits involved in R/W requests"). Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <stfrench@microsoft.com> cc: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24smb: client: fix DFS interlink failoverPaulo Alcantara9-86/+94
The DFS interlinks point to different DFS namespaces so make sure to use the correct DFS root server to chase any DFS links under it by storing the SMB session in dfs_ref_walk structure and then using it on every referral walk. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24smb: client: improve purging of cached referralsPaulo Alcantara1-17/+15
Purge cached referrals that have a single target when reaching maximum of cache size as the client won't need them to failover. Otherwise remove oldest cache entry. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24smb: client: avoid unnecessary reconnects when refreshing referralsPaulo Alcantara1-70/+117
Do not mark tcons for reconnect when current connection matches any of the targets returned by new referral even when there is no cached entry. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-24Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds42-400/+1938
Pull NFS client updates from Anna Schumaker: "New Features: - Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention - Add support for the LOCALIO protocol extention Bugfixes: - Fix memory leak in error path of nfs4_do_reclaim() - Simplify and guarantee lock owner uniqueness - Fix -Wformat-truncation warning - Fix folio refcounts by using folio_attach_private() - Fix failing the mount system call when the server is down - Fix detection of "Proxying of Times" server support Cleanups: - Annotate struct nfs_cache_array with __counted_by() - Remove unnecessary NULL checks before kfree() - Convert RPC_TASK_* constants to an enum - Remove obsolete or misleading comments and declerations" * tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits) nfs: Fix `make htmldocs` warnings in the localio documentation nfs: add "NFS Client and Server Interlock" section to localio.rst nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst nfs: add Documentation/filesystems/nfs/localio.rst nfs: implement client support for NFS_LOCALIO_PROGRAM nfs/localio: use dedicated workqueues for filesystem read and write pnfs/flexfiles: enable localio support nfs: enable localio for non-pNFS IO nfs: add LOCALIO support nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit nfsd: implement server support for NFS_LOCALIO_PROGRAM nfsd: add LOCALIO support nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO nfs_common: add NFS LOCALIO auxiliary protocol enablement SUNRPC: replace program list with program array SUNRPC: add svcauth_map_clnt_to_svc_cred_local SUNRPC: remove call_allocate() BUG_ONs nfsd: add nfsd_serv_try_get and nfsd_serv_put nfsd: add nfsd_file_acquire_local() nfsd: factor out __fh_verify to allow NULL rqstp to be passed ...
2024-09-24Merge tag 'fuse-update-6.12' of ↵Linus Torvalds12-296/+529
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add support for idmapped fuse mounts (Alexander Mikhalitsyn) - Add optimization when checking for writeback (yangyun) - Add tracepoints (Josef Bacik) - Clean up writeback code (Joanne Koong) - Clean up request queuing (me) - Misc fixes * tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (32 commits) fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set fuse: clear FR_PENDING if abort is detected when sending request fs/fuse: convert to use invalid_mnt_idmap fs/mnt_idmapping: introduce an invalid_mnt_idmap fs/fuse: introduce and use fuse_simple_idmap_request() helper fs/fuse: fix null-ptr-deref when checking SB_I_NOIDMAP flag fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN virtio_fs: allow idmapped mounts fuse: allow idmapped mounts fuse: warn if fuse_access is called when idmapped mounts are allowed fuse: handle idmappings properly in ->write_iter() fuse: support idmapped ->rename op fuse: support idmapped ->set_acl fuse: drop idmap argument from __fuse_get_acl fuse: support idmapped ->setattr op fuse: support idmapped ->permission inode op fuse: support idmapped getattr inode op fuse: support idmap for mkdir/mknod/symlink/create/tmpfile fuse: support idmapped FUSE_EXT_GROUPS fuse: add an idmap argument to fuse_simple_request ...
2024-09-24Merge tag 'exfat-for-6.12-rc1' of ↵Linus Torvalds7-127/+174
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Clean-up unnecessary codes as ->valid_size is supported - buffered-IO fallback is no longer needed when using direct-IO - Move ->valid_size extension from mmap to ->page_mkwrite. This improves the overhead caused by unnecessary zero-out during mmap. - Fix memleaks from exfat_load_bitmap() and exfat_create_upcase_table() - Add sops->shutdown and ioctl - Add Yuezhang Mo as a reviwer * tag 'exfat-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: MAINTAINERS: exfat: add myself as reviewer exfat: resolve memory leak from exfat_create_upcase_table() exfat: move extend valid_size into ->page_mkwrite() exfat: fix memory leak in exfat_load_bitmap() exfat: Implement sops->shutdown and ioctl exfat: do not fallback to buffered write exfat: drop ->i_size_ondisk
2024-09-24Merge tag 'f2fs-for-6.12-rc1' of ↵Linus Torvalds20-460/+738
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The main changes include converting major IO paths to use folio, and adding various knobs to control GC more flexibly for Zoned devices. In addition, there are several patches to address corner cases of atomic file operations and better support for file pinning on zoned device. Enhancement: - add knobs to tune foreground/background GCs for Zoned devices - convert IO paths to use folio - reduce expensive checkpoint trigger frequency - allow F2FS_IPU_NOCACHE for pinned file - forcibly migrate to secure space for zoned device file pinning - get rid of buffer_head use - add write priority option based on zone UFS - get rid of online repair on corrupted directory Bug fixes: - fix to don't panic system for no free segment fault injection - fix to don't set SB_RDONLY in f2fs_handle_critical_error() - avoid unused block when dio write in LFS mode - compress: don't redirty sparse cluster during {,de}compress - check discard support for conventional zones - atomic: prevent atomic file from being dirtied before commit - atomic: fix to check atomic_file in f2fs ioctl interfaces - atomic: fix to forbid dio in atomic_file - atomic: fix to truncate pagecache before on-disk metadata truncation - atomic: create COW inode from parent dentry - atomic: fix to avoid racing w/ GC - atomic: require FMODE_WRITE for atomic write ioctls - fix to wait page writeback before setting gcing flag - fix to avoid racing in between read and OPU dio write, dio completion - fix several potential integer overflows in file offsets and dir_block_index - fix to avoid use-after-free in f2fs_stop_gc_thread() As usual, there are several code clean-ups and refactorings" * tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits) f2fs: allow F2FS_IPU_NOCACHE for pinned file f2fs: forcibly migrate to secure space for zoned device file pinning f2fs: remove unused parameters f2fs: fix to don't panic system for no free segment fault injection f2fs: fix to don't set SB_RDONLY in f2fs_handle_critical_error() f2fs: add valid block ratio not to do excessive GC for one time GC f2fs: create gc_no_zoned_gc_percent and gc_boost_zoned_gc_percent f2fs: do FG_GC when GC boosting is required for zoned devices f2fs: increase BG GC migration window granularity when boosted for zoned devices f2fs: add reserved_segments sysfs node f2fs: introduce migration_window_granularity f2fs: make BG GC more aggressive for zoned devices f2fs: avoid unused block when dio write in LFS mode f2fs: fix to check atomic_file in f2fs ioctl interfaces f2fs: get rid of online repaire on corrupted directory f2fs: prevent atomic file from being dirtied before commit f2fs: get rid of page->index f2fs: convert read_node_page() to use folio f2fs: convert __write_node_page() to use folio f2fs: convert f2fs_write_data_page() to use folio ...
2024-09-24ceph: remove the incorrect Fw reference check when dirtying pagesXiubo Li1-1/+0
When doing the direct-io reads it will also try to mark pages dirty, but for the read path it won't hold the Fw caps and there is case will it get the Fw reference. Fixes: 5dda377cf0a6 ("ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-09-24ceph: Remove empty definition in header fileZhang Zekun1-4/+0
The real definition of ceph_acl_chmod() has been removed since commit 4db658ea0ca2 ("ceph: Fix up after semantic merge conflict"), remain the empty definition untouched in the header files. Let's remove the empty definition. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-09-24ceph: Fix typo in the commentYan Zhen4-4/+4
Correctly spelled comments make it easier for the reader to understand the code. replace 'tagert' with 'target' in the comment & replace 'vaild' with 'valid' in the comment & replace 'carefull' with 'careful' in the comment & replace 'trsaverse' with 'traverse' in the comment. Signed-off-by: Yan Zhen <yanzhen@vivo.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-09-24ceph: fix a memory leak on cap_auths in MDS clientLuis Henriques (SUSE)1-0/+12
The cap_auths that are allocated during an MDS session opening are never released, causing a memory leak detected by kmemleak. Fix this by freeing the memory allocated when shutting down the MDS client. Fixes: 1d17de9534cb ("ceph: save cap_auths in MDS client when session is opened") Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-09-24ceph: flush all caps releases when syncing the whole filesystemXiubo Li4-0/+25
We have hit a race between cap releases and cap revoke request that will cause the check_caps() to miss sending a cap revoke ack to MDS. And the client will depend on the cap release to release that revoking caps, which could be delayed for some unknown reasons. In Kclient we have figured out the RCA about race and we need a way to explictly trigger this manually could help to get rid of the caps revoke stuck issue. Link: https://tracker.ceph.com/issues/67221 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-09-24ceph: rename ceph_flush_cap_releases() to ceph_flush_session_cap_releases()Xiubo Li3-8/+8
Prepare for adding a helper to flush the cap releases for all sessions. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-09-24Merge tag 'sysctl-6.12-rc1' of ↵Linus Torvalds1-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl update from Joel Granados: - Avoid evaluating non-mount ctl_tables as a sysctl_mount_point by removing the unlikely (but possible) chance that the permanently empty ctl_table array shares its address with another ctl_table - Update Joel Granados' contact info in MAINTAINERS * tag 'sysctl-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: MAINTAINERS: update email for Joel Granados sysctl: avoid spurious permanent empty tables
2024-09-24fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is setyangyun1-1/+1
This may be a typo. The comment has said shared locks are not allowed when this bit is set. If using shared lock, the wait in `fuse_file_cached_io_open` may be forever. Fixes: 205c1d802683 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP") CC: stable@vger.kernel.org # v6.9 Signed-off-by: yangyun <yangyun50@huawei.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-09-24fuse: clear FR_PENDING if abort is detected when sending requestMiklos Szeredi1-0/+1
The (!fiq->connected) check was moved into the queuing method resulting in the following: Fixes: 5de8acb41c86 ("fuse: cleanup request queuing towards virtiofs") Reported-by: Lai, Yi <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/all/ZvFEAM6JfrBKsOU0@ly-workstation/ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-09-23bcachefs: Fix infinite loop in propagate_key_to_snapshot_leaves()Kent Overstreet1-0/+1
As we iterate we need to mark that we no longer need iterators - otherwise we'll infinite loop via the "too many iters" check when there's many snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-23bcachefs: Ensure BCH_FS_accounting_replay_done is always setKent Overstreet1-0/+3
if it doesn't get set we'll never be able to flush the btree write buffer; this only happens in fake rw mode, but prevents us from shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-23nfs: implement client support for NFS_LOCALIO_PROGRAMMike Snitzer2-6/+132
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL" RPC method that allows the Linux NFS client to verify the local Linux NFS server can see the nonce (single-use UUID) the client generated and made available in nfs_common for subsequent lookup and verification by the NFS server. If matched, the NFS server populates members in the nfs_uuid_t struct. The NFS client then transfers these nfs_uuid_t struct member pointers to the nfs_client struct and cleans up the nfs_uuid_t struct. See: fs/nfs/localio.c:nfs_local_probe() This protocol isn't part of an IETF standard, nor does it need to be considering it is Linux-to-Linux auxiliary RPC protocol that amounts to an implementation detail. Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka AUTH_SYS) is used (enforced by fs/nfs/localio.c:nfs_local_probe()). The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode XDR methods are used instead of the less efficient variable sized methods. Having a nonce (single-use uuid) is better than using the same uuid for the life of the server, and sending it proactively by client rather than reactively by the server is also safer. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs/localio: use dedicated workqueues for filesystem read and writeTrond Myklebust3-38/+91
For localio access, don't call filesystem read() and write() routines directly. This solves two problems: 1) localio writes need to use a normal (non-memreclaim) unbound workqueue. This avoids imposing new requirements on how underlying filesystems process frontend IO, which would cause a large amount of work to update all filesystems. Without this change, when XFS starts getting low on space, XFS flushes work on a non-memreclaim work queue, which causes a priority inversion problem: 00573 workqueue: WQ_MEM_RECLAIM writeback:wb_workfn is flushing !WQ_MEM_RECLAIM xfs-sync/vdc:xfs_flush_inodes_worker 00573 WARNING: CPU: 6 PID: 8525 at kernel/workqueue.c:3706 check_flush_dependency+0x2a4/0x328 00573 Modules linked in: 00573 CPU: 6 PID: 8525 Comm: kworker/u71:5 Not tainted 6.10.0-rc3-ktest-00032-g2b0a133403ab #18502 00573 Hardware name: linux,dummy-virt (DT) 00573 Workqueue: writeback wb_workfn (flush-0:33) 00573 pstate: 400010c5 (nZcv daIF -PAN -UAO -TCO -DIT +SSBS BTYPE=--) 00573 pc : check_flush_dependency+0x2a4/0x328 00573 lr : check_flush_dependency+0x2a4/0x328 00573 sp : ffff0000c5f06bb0 00573 x29: ffff0000c5f06bb0 x28: ffff0000c998a908 x27: 1fffe00019331521 00573 x26: ffff0000d0620900 x25: ffff0000c5f06ca0 x24: ffff8000828848c0 00573 x23: 1fffe00018be0d8e x22: ffff0000c1210000 x21: ffff0000c75fde00 00573 x20: ffff800080bfd258 x19: ffff0000cad63400 x18: ffff0000cd3a4810 00573 x17: 0000000000000000 x16: 0000000000000000 x15: ffff800080508d98 00573 x14: 0000000000000000 x13: 204d49414c434552 x12: 1fffe0001b6eeab2 00573 x11: ffff60001b6eeab2 x10: dfff800000000000 x9 : ffff60001b6eeab3 00573 x8 : 0000000000000001 x7 : 00009fffe491154e x6 : ffff0000db775593 00573 x5 : ffff0000db775590 x4 : ffff0000db775590 x3 : 0000000000000000 00573 x2 : 0000000000000027 x1 : ffff600018be0d62 x0 : dfff800000000000 00573 Call trace: 00573 check_flush_dependency+0x2a4/0x328 00573 __flush_work+0x184/0x5c8 00573 flush_work+0x18/0x28 00573 xfs_flush_inodes+0x68/0x88 00573 xfs_file_buffered_write+0x128/0x6f0 00573 xfs_file_write_iter+0x358/0x448 00573 nfs_local_doio+0x854/0x1568 00573 nfs_initiate_pgio+0x214/0x418 00573 nfs_generic_pg_pgios+0x304/0x480 00573 nfs_pageio_doio+0xe8/0x240 00573 nfs_pageio_complete+0x160/0x480 00573 nfs_writepages+0x300/0x4f0 00573 do_writepages+0x12c/0x4a0 00573 __writeback_single_inode+0xd4/0xa68 00573 writeback_sb_inodes+0x470/0xcb0 00573 __writeback_inodes_wb+0xb0/0x1d0 00573 wb_writeback+0x594/0x808 00573 wb_workfn+0x5e8/0x9e0 00573 process_scheduled_works+0x53c/0xd90 00573 worker_thread+0x370/0x8c8 00573 kthread+0x258/0x2e8 00573 ret_from_fork+0x10/0x20 2) Some filesystem writeback routines can end up taking up a lot of stack space (particularly XFS). Instead of risking running over due to the extra overhead from the NFS stack, we should just call these routines from a workqueue job. Since we need to do this to address 1) above we're able to avoid possibly blowing the stack "for free". Use of dedicated workqueues improves performance over using the system_unbound_wq. Also, the creds used to open the file are used to override_creds() in both nfs_local_call_read() and nfs_local_call_write() -- otherwise the workqueue could have elevated capabilities (which the caller may not). Lastly, care is taken to set PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO in nfs_do_local_write() to avoid writeback deadlocks. The PF_LOCAL_THROTTLE flag prevents deadlocks in balance_dirty_pages() by causing writes to only be throttled against other writes to the same bdi (it keeps the throttling local). Normally all writes to bdi(s) are throttled equally (after throughput factors are allowed for). The PF_MEMALLOC_NOIO flag prevents the lower filesystem IO from causing memory reclaim to re-enter filesystems or IO devices and so prevents deadlocks from occuring where IO that cleans pages is waiting on IO to complete. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> # eliminated wait_for_completion Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23pnfs/flexfiles: enable localio supportTrond Myklebust2-4/+52
If the DS is local to this client use localio to write the data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: enable localio for non-pNFS IOTrond Myklebust2-2/+12
Try a local open of the file being written to, and if it succeeds, then use localio to issue IO. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: add LOCALIO supportWeston Andros Adamson8-0/+758
Add client support for bypassing NFS for localhost reads, writes, and commits. This is only useful when the client and the server are running on the same host. nfs_local_probe() is stubbed out, later commits will enable client and server handshake via a Linux-only LOCALIO auxiliary RPC protocol. This has dynamic binding with the nfsd module (via nfs_localio module which is part of nfs_common). LOCALIO will only work if nfsd is already loaded. The "localio_enabled" nfs kernel module parameter can be used to disable and enable the ability to use LOCALIO support. CONFIG_NFS_LOCALIO enables NFS client support for LOCALIO. Lastly, LOCALIO uses an nfsd_file to initiate all IO. To make proper use of nfsd_file (and nfsd's filecache) its lifetime (duration before nfsd_file_put is called) must extend until after commit, read and write operations. So rather than immediately drop the nfsd_file reference in nfs_local_open_fh(), that doesn't happen until nfs_local_pgio_release() for read/write and not until nfs_local_release_commit_data() for commit. The same applies to the reference held on nfsd's nn->nfsd_serv. Both objects' lifetimes and associated references are managed through calls to nfs_to->nfsd_file_put_local(). Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> # nfs_open_local_fh Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commitMike Snitzer6-13/+19
The nfsd_file will be passed, in future commits, by callers that enable LOCALIO support (for both regular NFS and pNFS IO). [Derived from patch authored by Weston Andros Adamson, but switched from passing struct file to struct nfsd_file] Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: implement server support for NFS_LOCALIO_PROGRAMMike Snitzer3-1/+103
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL" RPC method that allows the Linux NFS client to verify the local Linux NFS server can see the nonce (single-use UUID) the client generated and made available in nfs_common. The server expects this protocol to use the same transport as NFS and NFSACL for its RPCs. This protocol isn't part of an IETF standard, nor does it need to be considering it is Linux-to-Linux auxiliary RPC protocol that amounts to an implementation detail. The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode XDR methods are used instead of the less efficient variable sized methods. The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ): Linux Kernel Organization 400122 nfslocalio Signed-off-by: Mike Snitzer <snitzer@kernel.org> [neilb: factored out and simplified single localio protocol] Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: add LOCALIO supportWeston Andros Adamson7-3/+126
Add server support for bypassing NFS for localhost reads, writes, and commits. This is only useful when both the client and server are running on the same host. If nfsd_open_local_fh() fails then the NFS client will both retry and fallback to normal network-based read, write and commit operations if localio is no longer supported. Care is taken to ensure the same NFS security mechanisms are used (authentication, etc) regardless of whether localio or regular NFS access is used. The auth_domain established as part of the traditional NFS client access to the NFS server is also used for localio. Store auth_domain for localio in nfsd_uuid_t and transfer it to the client if it is local to the server. Relative to containers, localio gives the client access to the network namespace the server has. This is required to allow the client to access the server's per-namespace nfsd_net struct. This commit also introduces the use of NFSD's percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh and other LOCALIO client code. CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs_common: prepare for the NFS client to use nfsd_file for LOCALIOMike Snitzer4-1/+55
The next commit will introduce nfsd_open_local_fh() which returns an nfsd_file structure. This commit exposes LOCALIO's required NFSD symbols to the NFS client: - Make nfsd_open_local_fh() symbol and other required NFSD symbols available to NFS in a global 'nfs_to' nfsd_localio_operations struct (global access suggested by Trond, nfsd_localio_operations suggested by NeilBrown). The next commit will also introduce nfsd_localio_ops_init() that init_nfsd() will call to initialize 'nfs_to'. - Introduce nfsd_file_file() that provides access to nfsd_file's backing file. Keeps nfsd_file structure opaque to NFS client (as suggested by Jeff Layton). - Introduce nfsd_file_put_local() that will put the reference to the nfsd_file's associated nn->nfsd_serv and then put the reference to the nfsd_file (as suggested by NeilBrown). Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to Suggested-by: NeilBrown <neilb@suse.de> # nfsd_localio_operations Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfs_common: add NFS LOCALIO auxiliary protocol enablementMike Snitzer3-0/+142
fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client to generate a nonce (single-use UUID) and associated nfs_uuid_t struct, register it with nfs_common for subsequent lookup and verification by the NFS server and if matched the NFS server populates members in the nfs_uuid_t struct. nfs_common's nfs_uuids list is the basis for localio enablement, as such it has members that point to nfsd memory for direct use by the client (e.g. 'net' is the server's network namespace, through it the client can access nn->nfsd_serv). This commit also provides the base nfs_uuid_t interfaces to allow proper net namespace refcounting for the LOCALIO use case. CONFIG_NFS_LOCALIO controls the nfs_common, NFS server and NFS client enablement for LOCALIO. If both NFS_FS=m and NFSD=m then NFS_COMMON_LOCALIO_SUPPORT=m and nfs_localio.ko is built (and provides nfs_common's LOCALIO support). # lsmod | grep nfs_localio nfs_localio 12288 2 nfsd,nfs sunrpc 745472 35 nfs_localio,nfsd,auth_rpcgss,lockd,nfsv3,nfs Signed-off-by: Mike Snitzer <snitzer@kernel.org> Co-developed-by: NeilBrown <neilb@suse.de> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: replace program list with program arrayNeilBrown3-21/+21
A service created with svc_create_pooled() can be given a linked list of programs and all of these will be served. Using a linked list makes it cumbersome when there are several programs that can be optionally selected with CONFIG settings. After this patch is applied, API consumers must use only svc_create_pooled() when creating an RPC service that listens for more than one RPC program. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: add nfsd_serv_try_get and nfsd_serv_putMike Snitzer2-1/+50
Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any caller of nfsd_serv_try_get releases their reference using nfsd_serv_put. A percpu_ref is used to implement the interlock between nfsd_destroy_serv and any caller of nfsd_serv_try_get. This interlock is needed to properly wait for the completion of client initiated localio calls to nfsd (that are _not_ in the context of nfsd). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: add nfsd_file_acquire_local()NeilBrown4-7/+92
nfsd_file_acquire_local() can be used to look up a file by filehandle without having a struct svc_rqst. This can be used by NFS LOCALIO to allow the NFS client to bypass the NFS protocol to directly access a file provided by the NFS server which is running in the same kernel. In nfsd_file_do_acquire() care is taken to always use fh_verify() if rqstp is not NULL (as is the case for non-LOCALIO callers). Otherwise the non-LOCALIO callers will not supply the correct and required arguments to __fh_verify (e.g. gssclient isn't passed). Introduce fh_verify_local() wrapper around __fh_verify to make it clear that LOCALIO is intended caller. Also, use GC for nfsd_file returned by nfsd_file_acquire_local. GC offers performance improvements if/when a file is reopened before launderette cleans it from the filecache's LRU. Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23nfsd: factor out __fh_verify to allow NULL rqstp to be passedNeilBrown1-31/+60
__fh_verify() offers an interface like fh_verify() but doesn't require a struct svc_rqst *, instead it also takes the specific parts as explicit required arguments. So it is safe to call __fh_verify() with a NULL rqstp, but the net, cred, and client args must not be NULL. __fh_verify() does not use SVC_NET(), nor does the functions it calls. Rather than using rqstp->rq_client pass the client and gssclient explicitly to __fh_verify and then to nfsd_set_fh_dentry(). Lastly, it should be noted that the previous commit prepared for 4 associated tracepoints to only be used if rqstp is not NULL (this is a stop-gap that should be properly fixed so localio also benefits from the utility these tracepoints provide when debugging fh_verify issues). Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23NFSD: Short-circuit fh_verify tracepoints for LOCALIOChuck Lever1-8/+10
LOCALIO will be able to call fh_verify() with a NULL rqstp. In this case, the existing trace points need to be skipped because they want to dereference the address fields in the passed-in rqstp. Temporarily make these trace points conditional to avoid a seg fault in this case. Putting the "rqstp != NULL" check in the trace points themselves makes the check more efficient. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()Chuck Lever1-4/+4
Currently, fh_verify() makes some daring assumptions about which version of file handle the caller wants, based on the things it can find in the passed-in rqstp. The about-to-be-introduced LOCALIO use case sometimes has no svc_rqst context, so this logic won't work in that case. Instead, examine the passed-in file handle. It's .max_size field should carry information to allow nfsd_set_fh_dentry() to initialize the file handle appropriately. The file handle used by lockd and the one created by write_filehandle never need any of the version-specific fields (which affect things like write and getattr requests and pre/post attributes). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>