Age | Commit message (Collapse) | Author | Files | Lines |
|
The next commit will introduce nfsd_open_local_fh() which returns an
nfsd_file structure. This commit exposes LOCALIO's required NFSD
symbols to the NFS client:
- Make nfsd_open_local_fh() symbol and other required NFSD symbols
available to NFS in a global 'nfs_to' nfsd_localio_operations
struct (global access suggested by Trond, nfsd_localio_operations
suggested by NeilBrown). The next commit will also introduce
nfsd_localio_ops_init() that init_nfsd() will call to initialize
'nfs_to'.
- Introduce nfsd_file_file() that provides access to nfsd_file's
backing file. Keeps nfsd_file structure opaque to NFS client (as
suggested by Jeff Layton).
- Introduce nfsd_file_put_local() that will put the reference to the
nfsd_file's associated nn->nfsd_serv and then put the reference to
the nfsd_file (as suggested by NeilBrown).
Suggested-by: Trond Myklebust <[email protected]> # nfs_to
Suggested-by: NeilBrown <[email protected]> # nfsd_localio_operations
Suggested-by: Jeff Layton <[email protected]> # nfsd_file_file
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS
client to generate a nonce (single-use UUID) and associated nfs_uuid_t
struct, register it with nfs_common for subsequent lookup and
verification by the NFS server and if matched the NFS server populates
members in the nfs_uuid_t struct.
nfs_common's nfs_uuids list is the basis for localio enablement, as
such it has members that point to nfsd memory for direct use by the
client (e.g. 'net' is the server's network namespace, through it the
client can access nn->nfsd_serv).
This commit also provides the base nfs_uuid_t interfaces to allow
proper net namespace refcounting for the LOCALIO use case.
CONFIG_NFS_LOCALIO controls the nfs_common, NFS server and NFS client
enablement for LOCALIO. If both NFS_FS=m and NFSD=m then
NFS_COMMON_LOCALIO_SUPPORT=m and nfs_localio.ko is built (and provides
nfs_common's LOCALIO support).
# lsmod | grep nfs_localio
nfs_localio 12288 2 nfsd,nfs
sunrpc 745472 35 nfs_localio,nfsd,auth_rpcgss,lockd,nfsv3,nfs
Signed-off-by: Mike Snitzer <[email protected]>
Co-developed-by: NeilBrown <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
A service created with svc_create_pooled() can be given a linked list of
programs and all of these will be served.
Using a linked list makes it cumbersome when there are several programs
that can be optionally selected with CONFIG settings.
After this patch is applied, API consumers must use only
svc_create_pooled() when creating an RPC service that listens for more
than one RPC program.
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Acked-by: Chuck Lever <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
caller of nfsd_serv_try_get releases their reference using nfsd_serv_put.
A percpu_ref is used to implement the interlock between
nfsd_destroy_serv and any caller of nfsd_serv_try_get.
This interlock is needed to properly wait for the completion of client
initiated localio calls to nfsd (that are _not_ in the context of nfsd).
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
nfsd_file_acquire_local() can be used to look up a file by filehandle
without having a struct svc_rqst. This can be used by NFS LOCALIO to
allow the NFS client to bypass the NFS protocol to directly access a
file provided by the NFS server which is running in the same kernel.
In nfsd_file_do_acquire() care is taken to always use fh_verify() if
rqstp is not NULL (as is the case for non-LOCALIO callers). Otherwise
the non-LOCALIO callers will not supply the correct and required
arguments to __fh_verify (e.g. gssclient isn't passed).
Introduce fh_verify_local() wrapper around __fh_verify to make it
clear that LOCALIO is intended caller.
Also, use GC for nfsd_file returned by nfsd_file_acquire_local. GC
offers performance improvements if/when a file is reopened before
launderette cleans it from the filecache's LRU.
Suggested-by: Jeff Layton <[email protected]> # use filecache's GC
Signed-off-by: NeilBrown <[email protected]>
Co-developed-by: Mike Snitzer <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
__fh_verify() offers an interface like fh_verify() but doesn't require
a struct svc_rqst *, instead it also takes the specific parts as
explicit required arguments. So it is safe to call __fh_verify() with
a NULL rqstp, but the net, cred, and client args must not be NULL.
__fh_verify() does not use SVC_NET(), nor does the functions it calls.
Rather than using rqstp->rq_client pass the client and gssclient
explicitly to __fh_verify and then to nfsd_set_fh_dentry().
Lastly, it should be noted that the previous commit prepared for 4
associated tracepoints to only be used if rqstp is not NULL (this is a
stop-gap that should be properly fixed so localio also benefits from
the utility these tracepoints provide when debugging fh_verify
issues).
Signed-off-by: NeilBrown <[email protected]>
Co-developed-by: Mike Snitzer <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
LOCALIO will be able to call fh_verify() with a NULL rqstp. In this
case, the existing trace points need to be skipped because they
want to dereference the address fields in the passed-in rqstp.
Temporarily make these trace points conditional to avoid a seg
fault in this case. Putting the "rqstp != NULL" check in the trace
points themselves makes the check more efficient.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Currently, fh_verify() makes some daring assumptions about which
version of file handle the caller wants, based on the things it can
find in the passed-in rqstp. The about-to-be-introduced LOCALIO use
case sometimes has no svc_rqst context, so this logic won't work in
that case.
Instead, examine the passed-in file handle. It's .max_size field
should carry information to allow nfsd_set_fh_dentry() to initialize
the file handle appropriately.
The file handle used by lockd and the one created by write_filehandle
never need any of the version-specific fields (which affect things
like write and getattr requests and pre/post attributes).
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
There are several places where __fh_verify unconditionally dereferences
rqstp to check that the connection is suitably secure. They look at
rqstp->rq_xprt which is not meaningful in the target use case of
"localio" NFS in which the client talks directly to the local server.
Prepare these to always succeed when rqstp is NULL.
Signed-off-by: NeilBrown <[email protected]>
Co-developed-by: Mike Snitzer <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
LOCALIO-initiated open operations are not running in an nfsd thread
and thus do not have an associated svc_rqst context.
Signed-off-by: NeilBrown <[email protected]>
Co-developed-by: Mike Snitzer <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Eliminates duplicate functions in various files to allow for
additional callers.
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Common nfs4_stat_to_errno() is used by fs/nfs/nfs4xdr.c and will be
used by fs/nfs/localio.c
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and
fs/nfs/nfs3xdr.c
Will also be used by fs/nfsd/localio.c
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
There are some applications that write to predefined non-overlapping
file offsets from multiple clients and therefore don't need to rely on
file locking. However, if these applications want non-aligned offsets
and sizes they need to either use locks or risk data corruption, as the
NFS client defaults to extending writes to whole pages.
This commit adds a new mount option `noalignwrite`, which allows to turn
that off and avoid the need of locking, as long as these applications
don't overlap on offsets.
Signed-off-by: Dan Aloni <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
The comment for nfs_get_root() needs to be updated as it would also be
used by NFS4 as follows:
@x[
nfs_get_root+1
nfs_get_tree_common+1819
nfs_get_tree+2594
vfs_get_tree+73
fc_mount+23
do_nfs4_mount+498
nfs4_try_get_tree+134
nfs_get_tree+2562
vfs_get_tree+73
path_mount+2776
do_mount+226
__se_sys_mount+343
__x64_sys_mount+106
do_syscall_64+69
entry_SYSCALL_64_after_hwframe+97
, mount.nfs4]: 1
Signed-off-by: Li Lingfeng <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
According to draft-ietf-nfsv4-delstid-07:
If a server informs the client via the fattr4_open_arguments
attribute that it supports
OPEN_ARGS_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS and it returns a valid
delegation stateid for an OPEN operation which sets the
OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS flag, then it MUST query the
client via a CB_GETATTR for the fattr4_time_deleg_access (see
Section 5.2) attribute and fattr4_time_deleg_modify attribute (see
Section 5.2).
Thus, we should look that the server supports proxying of times via
OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS.
We want to be extra pedantic and continue to check that FATTR4_TIME_DELEG_ACCESS
and FATTR4_TIME_DELEG_MODIFY are set. The server needs to expose both for the
client to correctly detect "Proxying of Times" support.
Signed-off-by: Roi Azarzar <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Fixes: dcb3c20f7419 ("NFSv4: Add a capability for delegated attributes")
Signed-off-by: Anna Schumaker <[email protected]>
|
|
If the server is down when the client is trying to mount, so that the
calls to exchange_id or create_session fail, then we should allow the
mount system call to fail rather than hang and block other mount/umount
calls.
Reported-by: Oleksandr Tymoshenko <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
folio_attach_private
This patch is inspired by a code review of fs codes which aims at
folio's extra refcnt that could introduce unwanted behavious when
judging refcnt, such as[1].That is, the folio passed to
mapping_evict_folio carries the refcnts from find_lock_entries,
page_cache, corresponding to PTEs and folio's private if has. However,
current code doesn't take the refcnt for folio's private which could
have mapping_evict_folio miss the one to only PTE and lead to
call filemap_release_folio wrongly.
[1]
long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
{
...
//current code will misjudge here if there is one pte on the folio which
is be deemed as the one as folio's private
if (folio_ref_count(folio) >
folio_nr_pages(folio) + folio_has_private(folio) + 1)
return 0;
if (!filemap_release_folio(folio, 0))
return 0;
return remove_mapping(mapping, folio);
}
Signed-off-by: Zhaoyang Huang <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
The nfs_read_prepare() have been removed since
commit a4cdda59111f ("NFS: Create a common pgio_rpc_prepare function"),
and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Since kfree() already checks if its argument is NULL, an additional
check before calling kfree() is unnecessary and can be removed.
Remove it and thus also the following Coccinelle/coccicheck warning
reported by ifnullfree.cocci:
WARNING: NULL check before some freeing functions is not needed
Reviewed-by: Benjamin Coddington <[email protected]>
Signed-off-by: Thorsten Blum <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Add the __counted_by compiler attribute to the flexible array member
array to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Increment size before adding a new struct to the array.
Signed-off-by: Thorsten Blum <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
I have evidence of an Linux NFS client getting NFS4ERR_BAD_SEQID to a
v4.0 LOCK request to a Linux server (which had fixed the problem with
RELEASE_LOCKOWNER bug fixed).
The LOCK request presented a "new" lock owner so there are two seq ids
in the request: that for the open file, and that for the new lock.
Given the context I am confident that the new lock owner was reported to
have the wrong seqid. As lock owner identifiers are reused, the server
must still have a lock owner active which the client thinks is no longer
active.
I wasn't able to determine a root-cause but the simplest fix seems to be
to ensure lock owners are always unique much as open owners are (thanks
to a time stamp). The easiest way to ensure uniqueness is with a 64bit
counter for each server. That will never cycle (if updated once a
nanosecond the last 584 years. A single NFS server would not handle
open/lock requests nearly that fast, and a Linux node is unlikely to
have an uptime approaching that).
This patch removes the 2 ida and instead uses a per-server
atomic64_t to provide uniqueness.
Note that the lock owner already encodes the id as 64 bits even though
it is a 32bit value. So changing to a 64bit value does not change the
encoding of the lock owner. The open owner encoding is now 4 bytes
larger.
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Commit c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in
nfs4_do_reclaim()") separate out the freeing of the state owners from
nfs4_purge_state_owners() and finish it outside the rcu lock.
However, the error path is omitted. As a result, the state owners in
"freeme" will not be released.
Fix it by adding freeing in the error path.
Fixes: c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()")
Signed-off-by: Li Lingfeng <[email protected]>
Cc: [email protected] # v5.3+
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Pull nfsd updates from Chuck Lever:
"Notable features of this release include:
- Pre-requisites for automatically determining the RPC server thread
count
- Clean-up and preparation for supporting LOCALIO, which will be
merged via the NFS client tree
- Enhancements and fixes to NFSv4.2 COPY offload
- A new Python-based tool for generating kernel SunRPC XDR encoding
and decoding functions, added as an aid for prototyping features in
protocols based on the Linux kernel's SunRPC implementation
As always I am grateful to the NFSD contributors, reviewers, testers,
and bug reporters who participated during this cycle"
* tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (57 commits)
xdrgen: Prevent reordering of encoder and decoder functions
xdrgen: typedefs should use the built-in string and opaque functions
xdrgen: Fix return code checking in built-in XDR decoders
tools: Add xdrgen
nfsd: fix delegation_blocked() to block correctly for at least 30 seconds
nfsd: fix initial getattr on write delegation
nfsd: untangle code in nfsd4_deleg_getattr_conflict()
nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall()
nfsd: return -EINVAL when namelen is 0
NFSD: Wrap async copy operations with trace points
NFSD: Clean up extra whitespace in trace_nfsd_copy_done
NFSD: Record the callback stateid in copy tracepoints
NFSD: Display copy stateids with conventional print formatting
NFSD: Limit the number of concurrent async COPY operations
NFSD: Async COPY result needs to return a write verifier
nfsd: avoid races with wake_up_var()
nfsd: use clear_and_wake_up_bit()
sunrpc: xprtrdma: Use ERR_CAST() to return
NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by()
nfsd: call cache_put if xdr_reserve_space returns NULL
...
|
|
NFSD 6.12 Release Notes
Notable features of this release include:
- Pre-requisites for automatically determining the RPC server thread
count
- Clean-up and preparation for supporting LOCALIO, which will be
merged via the NFS client tree
- Enhancements and fixes to NFSv4.2 COPY offload
- A new Python-based tool for generating kernel SunRPC XDR encoding
and decoding functions, added as an aid for prototyping features
in protocols based on the Linux kernel's SunRPC implementation.
As always I am grateful to the NFSD contributors, reviewers,
testers, and bug reporters who participated during this cycle.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 update from Andreas Gruenbacher:
- Convert the writepage address space operation to writepages (Matthew
Wilcox)
- A syzkaller fix (by Julian Sun) and a minor cleanup (Andreas
Gruenbacher)
* tag 'gfs2-v6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Remove gfs2_aspace_writepage()
gfs2: Remove gfs2_jdata_writepage()
gfs2: Remove __gfs2_writepage()
gfs2: Add gfs2_aspace_writepages()
gfs2: fix double destroy_workqueue error
gfs2: Minor gfs2_glock_cb cleanup
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix dangling pointer to rb-tree of defragmented inodes after cleanup
- a followup fix to handle concurrent lseek on the same fd that could
leak memory under some conditions
- fix wrong root id reported in tree checker when verifying dref
* tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix use-after-free on rbtree that tracks inodes for auto defrag
btrfs: tree-checker: fix the wrong output of data backref objectid
btrfs: fix race setting file private on concurrent lseek using same fd
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota and isofs updates from Jan Kara:
"A few small cleanups in quota and isofs"
* tag 'fs_for_v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
isofs: Annotate struct SL_component with __counted_by()
quota: remove unnecessary error code translation in dquot_quota_enable
quota: remove redundant return at end of void function
quota: remove unneeded return value of register_quota_format
quota: avoid missing put_quota_format when DQUOT_SUSPENDED is passed
|
|
Pull bcachefs updates from Kent Overstreet:
- rcu_pending, btree key cache rework: this solves lock contenting in
the key cache, eliminating the biggest source of the srcu lock hold
time warnings, and drastically improving performance on some metadata
heavy workloads - on multithreaded creates we're now 3-4x faster than
xfs.
- We're now using an rhashtable instead of the system inode hash table;
this is another significant performance improvement on multithreaded
metadata workloads, eliminating more lock contention.
- for_each_btree_key_in_subvolume_upto(): new helper for iterating over
keys within a specific subvolume, eliminating a lot of open coded
"subvolume_get_snapshot()" and also fixing another source of srcu
lock time warnings, by running each loop iteration in its own
transaction (as the existing for_each_btree_key() does).
- More work on btree_trans locking asserts; we now assert that we don't
hold btree node locks when trans->locked is false, which is important
because we don't use lockdep for tracking individual btree node
locks.
- Some cleanups and improvements in the bset.c btree node lookup code,
from Alan.
- Rework of btree node pinning, which we use in backpointers fsck. The
old hacky implementation, where the shrinker just skipped over nodes
in the pinned range, was causing OOMs; instead we now use another
shrinker with a much higher seeks number for pinned nodes.
- Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue
where rebalance would sometimes fall back to allocating from the full
filesystem, which is not what we want when it's trying to move data
to a specific target.
- Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache
allocations.
- Idmap mounts are now supported (Hongbo Li)
- Rename whiteouts are now supported (Hongbo Li)
- Erasure coding can now handle devices being marked as failed, or
forcibly removed. We still need the evacuate path for erasure coding,
but it's getting very close to ready for people to start using.
* tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs: (99 commits)
bcachefs: return err ptr instead of null in read sb clean
bcachefs: Remove duplicated include in backpointers.c
bcachefs: Don't drop devices with stripe pointers
bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices
bcachefs: bch_fs.rw_devs_change_count
bcachefs: bch2_dev_remove_stripes()
bcachefs: bch2_trigger_ptr() calculates sectors even when no device
bcachefs: improve error messages in bch2_ec_read_extent()
bcachefs: improve error message on too few devices for ec
bcachefs: improve bch2_new_stripe_to_text()
bcachefs: ec_stripe_head.nr_created
bcachefs: bch_stripe.disk_label
bcachefs: stripe_to_mem()
bcachefs: EIO errcode cleanup
bcachefs: Rework btree node pinning
bcachefs: split up btree cache counters for live, freeable
bcachefs: btree cache counters should be size_t
bcachefs: Don't count "skipped access bit" as touched in btree cache scan
bcachefs: Failed devices no longer require mounting in degraded mode
bcachefs: bch2_dev_rcu_noerror()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 'struct fd' updates from Al Viro:
"Just the 'struct fd' layout change, with conversion to accessor
helpers"
* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add struct fd constructors, get rid of __to_fd()
struct fd: representation change
introduce fd_file(), convert all accessors to it.
|
|
This patch allows f2fs to submit bios of in-place writes on pinned file.
Reviewed-by: Daeho Jeong <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
|
|
failed
If nfsd_startup_net() fails and so ->nfsd_net_up is false,
nfsd_destroy_serv() doesn't currently call svc_destroy(). It should.
Fixes: 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put")
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
NeilBrown says:
> The handling of NFSD_FILE_CACHE_UP is strange. nfsd_file_cache_init()
> sets it, but doesn't clear it on failure. So if nfsd_file_cache_init()
> fails for some reason, nfsd_file_cache_shutdown() would still try to
> clean up if it was called.
Reported-by: NeilBrown <[email protected]>
Fixes: c7b824c3d06c ("NFSD: Replace the "init once" mechanism")
Signed-off-by: Chuck Lever <[email protected]>
|
|
If exfat_load_upcase_table reaches end and returns -EINVAL,
allocated memory doesn't get freed and while
exfat_load_default_upcase_table allocates more memory, leading to a
memory leak.
Here's link to syzkaller crash report illustrating this issue:
https://syzkaller.appspot.com/text?tag=CrashReport&x=1406c201980000
Reported-by: [email protected]
Fixes: a13d1a4de3b0 ("exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper")
Cc: [email protected]
Signed-off-by: Daniel Yang <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
|
|
It is not a good way to extend valid_size to the end of the
mmap area by writing zeros in mmap. Because after calling mmap,
no data may be written, or only a small amount of data may be
written to the head of the mmap area.
This commit moves extending valid_size to exfat_page_mkwrite().
In exfat_page_mkwrite() only extend valid_size to the starting
position of new data writing, which reduces unnecessary writing
of zeros.
If the block is not mapped and is marked as new after being
mapped for writing, block_write_begin() will zero the page
cache corresponding to the block, so there is no need to call
zero_user_segment() in exfat_file_zeroed_range(). And after moving
extending valid_size to exfat_page_mkwrite(), the data written by
mmap will be copied to the page cache but the page cache may be
not mapped to the disk. Calling zero_user_segment() will cause
the data written by mmap to be cleared. So this commit removes
calling zero_user_segment() from exfat_file_zeroed_range() and
renames exfat_file_zeroed_range() to exfat_extend_valid_size().
Signed-off-by: Yuezhang Mo <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
|
|
We should convert fs/fuse code to use a newly introduced
invalid_mnt_idmap instead of passing a NULL as idmap pointer.
Suggested-by: Christian Brauner <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
Link: https://lore.kernel.org/linux-fsdevel/20240904-baugrube-erhoben-b3c1c49a2645@brauner/
Suggested-by: Christian Brauner <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
Let's convert all existing callers properly.
No functional changes intended.
Suggested-by: Christian Brauner <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
It was reported [1] that on linux-next/fs-next the following crash
is reproducible:
[ 42.659136] Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 42.660501] fbcon: Taking over console
[ 42.660930] KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f]
[ 42.661752] CPU: 1 UID: 0 PID: 1589 Comm: dtprobed Not tainted 6.11.0-rc6+ #1
[ 42.662565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.6.6 08/22/2023
[ 42.663472] RIP: 0010:fuse_get_req+0x36b/0x990 [fuse]
[ 42.664046] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8c 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6d 08 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 05 00 00 f6 45 59 20 0f 85 06 03 00 00 48 83
[ 42.666945] RSP: 0018:ffffc900009a7730 EFLAGS: 00010212
[ 42.668837] RAX: dffffc0000000000 RBX: 1ffff92000134eed RCX: ffffffffc20dec9a
[ 42.670122] RDX: 000000000000000b RSI: 0000000000000008 RDI: 0000000000000058
[ 42.672154] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1022110172
[ 42.672160] R10: ffff888110880b97 R11: ffffc900009a737a R12: 0000000000000001
[ 42.672179] R13: ffff888110880b60 R14: ffff888110880b90 R15: ffff888169973840
[ 42.672186] FS: 00007f28cd21d7c0(0000) GS:ffff8883ef280000(0000) knlGS:0000000000000000
[ 42.672191] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.[ CR02: ;32m00007f3237366208 CR3: 0 OK 79e001 CR4: 0000000000770ef0
[ 42.672214] PKRU: 55555554
[ 42.672218] Call Trace:
[ 42.672223] <TASK>
[ 42.672226] ? die_addr+0x41/0xa0
[ 42.672238] ? exc_general_protection+0x14c/0x230
[ 42.672250] ? asm_exc_general_protection+0x26/0x30
[ 42.672260] ? fuse_get_req+0x77a/0x990 [fuse]
[ 42.672281] ? fuse_get_req+0x36b/0x990 [fuse]
[ 42.672300] ? kasan_unpoison+0x27/0x60
[ 42.672310] ? __pfx_fuse_get_req+0x10/0x10 [fuse]
[ 42.672327] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672333] ? alloc_pages_mpol_noprof+0x195/0x440
[ 42.672340] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672345] ? kasan_unpoison+0x27/0x60
[ 42.672350] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672355] ? __kasan_slab_alloc+0x4d/0x90
[ 42.672362] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672367] ? __kmalloc_cache_noprof+0x134/0x350
[ 42.672376] fuse_simple_background+0xe7/0x180 [fuse]
[ 42.672406] cuse_channel_open+0x540/0x710 [cuse]
[ 42.672415] misc_open+0x2a7/0x3a0
[ 42.672424] chrdev_open+0x1ef/0x5f0
[ 42.672432] ? __pfx_chrdev_open+0x10/0x10
[ 42.672439] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672443] ? security_file_open+0x3bb/0x720
[ 42.672451] do_dentry_open+0x43d/0x1200
[ 42.672459] ? __pfx_chrdev_open+0x10/0x10
[ 42.672468] vfs_open+0x79/0x340
[ 42.672475] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672482] do_open+0x68c/0x11e0
[ 42.672489] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672495] ? __pfx_do_open+0x10/0x10
[ 42.672501] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.672506] ? open_last_lookups+0x2a2/0x1370
[ 42.672515] path_openat+0x24f/0x640
[ 42.672522] ? __pfx_path_openat+0x10/0x10
[ 42.723972] ? stack_depot_save_flags+0x45/0x4b0
[ 42.724787] ? __fput+0x43c/0xa70
[ 42.725100] do_filp_open+0x1b3/0x3e0
[ 42.725710] ? poison_slab_object+0x10d/0x190
[ 42.726145] ? __kasan_slab_free+0x33/0x50
[ 42.726570] ? __pfx_do_filp_open+0x10/0x10
[ 42.726981] ? do_syscall_64+0x64/0x170
[ 42.727418] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 42.728018] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.728505] ? do_raw_spin_lock+0x131/0x270
[ 42.728922] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 42.729494] ? do_raw_spin_unlock+0x14c/0x1f0
[ 42.729992] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.730889] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.732178] ? alloc_fd+0x176/0x5e0
[ 42.732585] do_sys_openat2+0x122/0x160
[ 42.732929] ? __pfx_do_sys_openat2+0x10/0x10
[ 42.733448] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.734013] ? __pfx_map_id_up+0x10/0x10
[ 42.734482] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.735529] ? __memcg_slab_free_hook+0x292/0x500
[ 42.736131] __x64_sys_openat+0x123/0x1e0
[ 42.736526] ? __pfx___x64_sys_openat+0x10/0x10
[ 42.737369] ? __x64_sys_close+0x7c/0xd0
[ 42.737717] ? srso_alias_return_thunk+0x5/0xfbef5
[ 42.738192] ? syscall_trace_enter+0x11e/0x1b0
[ 42.738739] do_syscall_64+0x64/0x170
[ 42.739113] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 42.739638] RIP: 0033:0x7f28cd13e87b
[ 42.740038] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25
[ 42.741943] RSP: 002b:00007ffc992546c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 42.742951] RAX: ffffffffffffffda RBX: 00007f28cd44f1ee RCX: 00007f28cd13e87b
[ 42.743660] RDX: 0000000000000002 RSI: 00007f28cd44f2fa RDI: 00000000ffffff9c
[ 42.744518] RBP: 00007f28cd44f2fa R08: 0000000000000000 R09: 0000000000000001
[ 42.745211] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[ 42.745920] R13: 00007f28cd44f2fa R14: 0000000000000000 R15: 0000000000000003
[ 42.746708] </TASK>
[ 42.746937] Modules linked in: cuse vfat fat ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common kvm_amd ccp bochs drm_vram_helper kvm drm_ttm_helper ttm pcspkr i2c_piix4 drm_kms_helper i2c_smbus pvpanic_mmio pvpanic joydev sch_fq_codel drm fuse xfs nvme_tcp nvme_fabrics nvme_core sd_mod sg virtio_net net_failover virtio_scsi failover crct10dif_pclmul crc32_pclmul ata_generic pata_acpi ata_piix ghash_clmulni_intel virtio_pci sha512_ssse3 virtio_pci_legacy_dev sha256_ssse3 virtio_pci_modern_dev sha1_ssse3 libata serio_raw dm_multipath btrfs blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qemu_fw_cfg aesni_intel crypto_simd cryptd
[ 42.754333] ---[ end trace 0000000000000000 ]---
[ 42.756899] RIP: 0010:fuse_get_req+0x36b/0x990 [fuse]
[ 42.757851] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8c 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6d 08 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4d 05 00 00 f6 45 59 20 0f 85 06 03 00 00 48 83
[ 42.760334] RSP: 0018:ffffc900009a7730 EFLAGS: 00010212
[ 42.760940] RAX: dffffc0000000000 RBX: 1ffff92000134eed RCX: ffffffffc20dec9a
[ 42.761697] RDX: 000000000000000b RSI: 0000000000000008 RDI: 0000000000000058
[ 42.763009] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1022110172
[ 42.763920] R10: ffff888110880b97 R11: ffffc900009a737a R12: 0000000000000001
[ 42.764839] R13: ffff888110880b60 R14: ffff888110880b90 R15: ffff888169973840
[ 42.765716] FS: 00007f28cd21d7c0(0000) GS:ffff8883ef280000(0000) knlGS:0000000000000000
[ 42.766890] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.767828] CR2: 00007f3237366208 CR3: 000000012c79e001 CR4: 0000000000770ef0
[ 42.768730] PKRU: 55555554
[ 42.769022] Kernel panic - not syncing: Fatal exception
[ 42.770758] Kernel Offset: 0x7200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 42.771947] ---[ end Kernel panic - not syncing: Fatal exception ]---
It's obviously CUSE related callstack. For CUSE case, we don't have superblock and
our checks for SB_I_NOIDMAP flag does not make any sense. Let's handle this case gracefully.
Fixes: aa16880d9f13 ("fuse: add basic infrastructure to support idmappings")
Link: https://lore.kernel.org/linux-next/87v7z586py.fsf@debian-BULLSEYE-live-builder-AMD64/ [1]
Reported-by: Chandan Babu R <[email protected]>
Reported-by: [email protected]
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
|
|
Enable support for multipage folios on the 9P filesystem. This is all
handled through netfslib and is already enabled on AFS and CIFS also.
Signed-off-by: David Howells <[email protected]>
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Christian Schoenebeck <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Message-ID: <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
|
|
It's possible for v9fs_fid_find "find by dentry" branch to not turn up
anything despite having an entry set (because e.g. uid doesn't match),
in which case the calling code will generally make an extra lookup
to the server.
In this case we might have had better luck looking by inode, so fall
back to look up by inode if we have one and the lookup by dentry failed.
Message-Id: <[email protected]>
Reviewed-by: Christian Schoenebeck <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
|
|
Merge user access fast validation using address masking.
This allows architectures to optionally use a data dependent address
masking model instead of a conditional branch for validating user
accesses. That avoids the Spectre-v1 speculation barriers.
Right now only x86-64 takes advantage of this, and not all architectures
will be able to do it. It requires a guard region between the user and
kernel address spaces (so that you can't overflow from one to the
other), and an easy way to generate a guaranteed-to-fault address for
invalid user pointers.
Also note that this currently assumes that there is no difference
between user read and write accesses. If extended to architectures like
powerpc, we'll also need to separate out the user read-vs-write cases.
* address-masking:
x86: make the masked_user_access_begin() macro use its argument only once
x86: do the user address masking outside the user access area
x86: support user address masking instead of non-speculative conditional
|
|
Syzbot reports a problem that a warning is triggered due to suspicious
use of rcu_dereference_check(). That is triggered by a call of
bch2_snapshot_tree_oldest_subvol().
The cause of the warning is that inside
bch2_snapshot_tree_oldest_subvol(), snapshot_t() is called which calls
rcu_dereference() that requires a read lock to be held. Also, the call
of bch2_snapshot_tree_next() eventually calls snapshot_t().
To fix this, call rcu_read_lock() before calling snapshot_t(). Then,
release the lock after the termination of the while loop.
Reported-by: <[email protected]>
Signed-off-by: Ahmed Ehab <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Introduce '__attribute__((bpf_fastcall))' for helpers and kfuncs with
corresponding support in LLVM.
It is similar to existing 'no_caller_saved_registers' attribute in
GCC/LLVM with a provision for backward compatibility. It allows
compilers generate more efficient BPF code assuming the verifier or
JITs will inline or partially inline a helper/kfunc with such
attribute. bpf_cast_to_kern_ctx, bpf_rdonly_cast,
bpf_get_smp_processor_id are the first set of such helpers.
- Harden and extend ELF build ID parsing logic.
When called from sleepable context the relevants parts of ELF file
will be read to find and fetch .note.gnu.build-id information. Also
harden the logic to avoid TOCTOU, overflow, out-of-bounds problems.
- Improvements and fixes for sched-ext:
- Allow passing BPF iterators as kfunc arguments
- Make the pointer returned from iter_next method trusted
- Fix x86 JIT convergence issue due to growing/shrinking conditional
jumps in variable length encoding
- BPF_LSM related:
- Introduce few VFS kfuncs and consolidate them in
fs/bpf_fs_kfuncs.c
- Enforce correct range of return values from certain LSM hooks
- Disallow attaching to other LSM hooks
- Prerequisite work for upcoming Qdisc in BPF:
- Allow kptrs in program provided structs
- Support for gen_epilogue in verifier_ops
- Important fixes:
- Fix uprobe multi pid filter check
- Fix bpf_strtol and bpf_strtoul helpers
- Track equal scalars history on per-instruction level
- Fix tailcall hierarchy on x86 and arm64
- Fix signed division overflow to prevent INT_MIN/-1 trap on x86
- Fix get kernel stack in BPF progs attached to tracepoint:syscall
- Selftests:
- Add uprobe bench/stress tool
- Generate file dependencies to drastically improve re-build time
- Match JIT-ed and BPF asm with __xlated/__jited keywords
- Convert older tests to test_progs framework
- Add support for RISC-V
- Few fixes when BPF programs are compiled with GCC-BPF backend
(support for GCC-BPF in BPF CI is ongoing in parallel)
- Add traffic monitor
- Enable cross compile and musl libc
* tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (260 commits)
btf: require pahole 1.21+ for DEBUG_INFO_BTF with default DWARF version
btf: move pahole check in scripts/link-vmlinux.sh to lib/Kconfig.debug
btf: remove redundant CONFIG_BPF test in scripts/link-vmlinux.sh
bpf: Call the missed kfree() when there is no special field in btf
bpf: Call the missed btf_record_free() when map creation fails
selftests/bpf: Add a test case to write mtu result into .rodata
selftests/bpf: Add a test case to write strtol result into .rodata
selftests/bpf: Rename ARG_PTR_TO_LONG test description
selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test
bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error
bpf: Improve check_raw_mode_ok test for MEM_UNINIT-tagged types
bpf: Fix helper writes to read-only maps
bpf: Remove truncation test in bpf_strtol and bpf_strtoul helpers
bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit
selftests/bpf: Add tests for sdiv/smod overflow cases
bpf: Fix a sdiv overflow issue
libbpf: Add bpf_object__token_fd accessor
docs/bpf: Add missing BPF program types to docs
docs/bpf: Add constant values for linkages
bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing
...
|
|
syzbot reported a null-ptr-deref in bch2_fs_start. [0]
When a sb is marked clear but doesn't have a clean section
bch2_read_superblock_clean returns NULL which PTR_ERR_OR_ZERO
lets through, eventually leading to a null ptr dereference down
the line. Adjust read sb clean to return an ERR_PTR indicating the
invalid clean section.
[0] https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
Signed-off-by: Diogo Jahchan Koike <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
The header files bbpos.h is included twice in backpointers.c,
so one inclusion of each can be removed.
Reported-by: Abaci Robot <[email protected]>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10783
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This factors out ec_strie_head_devs_update(), which initializes the
bitmap of devices we're allocating from, and runs it every time
c->rw_devs_change_count changes.
We also cancel pending, not allocated stripes, since they may refer to
devices that are no longer available.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Add a counter that's incremented whenever rw devices change; this will
be used for erasure coding so that it can keep ec_stripe_head in sync
and not deadlock on a new stripe when a device it wants goes away.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We can now correctly force-remove a device that has stripes on it; this
uses the new BCH_SB_MEMBER_INVALID sentinal value.
Signed-off-by: Kent Overstreet <[email protected]>
|