Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
If we encounter an IO error during writeback, log the inode, offset, and
sector number of the failure, instead of forcing the user to do some
sort of reverse mapping to figure out which file is affected.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
No need to pass the full bio_vec.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Move the initialization of ia and ib to the declaration line and remove
a superflous else.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Now that all the writepage code is in the iomap code there is no
need to keep this structure public.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Carlos Maiolino <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
And inline mapping should never mark the page dirty and thus never end up
in writepages. Add a check for that condition and warn if it happens.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Take the xfs writeback code and move it to fs/iomap. A new structure
with three methods is added as the abstraction from the generic writeback
code to the file system. These methods are used to map blocks, submit an
ioend, and cancel a page that encountered an error before it was added to
an ioend.
Signed-off-by: Christoph Hellwig <[email protected]>
[darrick: rename ->submit_ioend to ->prepare_ioend to clarify what it
does]
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Lift the xfs code for tracing address space operations to the iomap
layer.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
File systems like gfs2 don't support delayed allocations or unwritten
extents and thus allocate normal mapped blocks to fill holes. To
cover the case of such file systems allocating new blocks to fill holes
also zero out mapped blocks with the new flag.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
In preparation for moving the writeback code to iomap.c, replace the
XFS-specific COW fork concept with the iomap IOMAP_F_SHARED flag.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
In preparation for moving the ioend structure to common code we need
to get rid of the xfs-specific xfs_trans type. Just make it a file
system private void pointer instead.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Introduce two nicely abstracted helper, which can be moved to the iomap
code later. Also use list_first_entry_or_null to simplify the code a
bit.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
In preparation for moving the XFS writeback code to fs/iomap.c, switch
it to use struct iomap instead of the XFS-specific struct xfs_bmbt_irec.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Carlos Maiolino <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Don't set IOMAP_F_NEW if we COW over an existing allocated range, as
these aren't strictly new allocations. This is required to be able to
use IOMAP_F_NEW to zero newly allocated blocks, which is required for
the iomap code to fully support file systems that don't do delayed
allocations or use unwritten extents.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Currently we don't overwrite the flags field in the iomap in
xfs_bmbt_to_iomap. This works fine with 0-initialized iomaps on stack,
but is harmful once we want to be able to reuse an iomap in the
writeback code. Replace the shared parameter with a set of initial
flags an thus ensures the flags field is always reinitialized.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
When doing a direct IO that spans the current EOF, and there are
written blocks beyond EOF that extend beyond the current write, the
only metadata update that needs to be done is a file size extension.
However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that
there is IO completion metadata updates required, and hence we may
fail to correctly sync file size extensions made in IO completion
when O_DSYNC writes are being used and the hardware supports FUA.
Hence when setting IOMAP_F_DIRTY, we need to also take into account
whether the iomap spans the current EOF. If it does, then we need to
mark it dirty so that IO completion will call generic_write_sync()
to flush the inode size update to stable storage correctly.
Fixes: 3460cac1ca76 ("iomap: Use FUA for pure data O_DSYNC DIO writes")
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
[darrick: removed the ext4 part; they'll handle it separately]
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Use iomap_dio_rw() to wait for unaligned direct IO instead of opencoding
the wait.
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
Filesystems do not support doing IO as asynchronous in some cases. For
example in case of unaligned writes or in case file size needs to be
extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO
in such cases, add argument to iomap_dio_rw() which makes the function
wait for IO completion. This also results in executing
iomap_dio_complete() inline in iomap_dio_rw() providing its return value
to the caller as for ordinary sync IO.
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"A few tracing fixes:
- Remove lockdown from tracefs itself and moved it to the trace
directory. Have the open functions there do the lockdown checks.
- Fix a few races with opening an instance file and the instance
being deleted (Discovered during the lockdown updates). Kept
separate from the clean up code such that they can be backported to
stable easier.
- Clean up and consolidated the checks done when opening a trace
file, as there were multiple checks that need to be done, and it
did not make sense having them done in each open instance.
- Fix a regression in the record mcount code.
- Small hw_lat detector tracer fixes.
- A trace_pipe read fix due to not initializing trace_seq"
* tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
tracing/hwlat: Report total time spent in all NMIs during the sample
recordmcount: Fix nop_mcount() function
tracing: Do not create tracefs files if tracefs lockdown is in effect
tracing: Add locked_down checks to the open calls of files created for tracefs
tracing: Add tracing_check_open_get_tr()
tracing: Have trace events system open call tracing_open_generic_tr()
tracing: Get trace_array reference for available_tracers files
ftrace: Get a reference counter for the trace_array on filter files
tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
|
|
Pull io_uring fix from Jens Axboe:
"Single small fix for a regression in the sequence logic for linked
commands"
* tag 'for-linus-20191012' of git://git.kernel.dk/linux-block:
io_uring: fix sequence logic for timeout requests
|
|
If on boot up, lockdown is activated for tracefs, don't even bother creating
the files. This can also prevent instances from being created if lockdown is
in effect.
Link: http://lkml.kernel.org/r/CAHk-=whC6Ji=fWnjh2+eS4b15TnbsS4VPVtvBOwCy1jjEG_JHQ@mail.gmail.com
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
locked down")
Running the latest kernel through my "make instances" stress tests, I
triggered the following bug (with KASAN and kmemleak enabled):
mkdir invoked oom-killer:
gfp_mask=0x40cd0(GFP_KERNEL|__GFP_COMP|__GFP_RECLAIMABLE), order=0,
oom_score_adj=0
CPU: 1 PID: 2229 Comm: mkdir Not tainted 5.4.0-rc2-test #325
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
Call Trace:
dump_stack+0x64/0x8c
dump_header+0x43/0x3b7
? trace_hardirqs_on+0x48/0x4a
oom_kill_process+0x68/0x2d5
out_of_memory+0x2aa/0x2d0
__alloc_pages_nodemask+0x96d/0xb67
__alloc_pages_node+0x19/0x1e
alloc_slab_page+0x17/0x45
new_slab+0xd0/0x234
___slab_alloc.constprop.86+0x18f/0x336
? alloc_inode+0x2c/0x74
? irq_trace+0x12/0x1e
? tracer_hardirqs_off+0x1d/0xd7
? __slab_alloc.constprop.85+0x21/0x53
__slab_alloc.constprop.85+0x31/0x53
? __slab_alloc.constprop.85+0x31/0x53
? alloc_inode+0x2c/0x74
kmem_cache_alloc+0x50/0x179
? alloc_inode+0x2c/0x74
alloc_inode+0x2c/0x74
new_inode_pseudo+0xf/0x48
new_inode+0x15/0x25
tracefs_get_inode+0x23/0x7c
? lookup_one_len+0x54/0x6c
tracefs_create_file+0x53/0x11d
trace_create_file+0x15/0x33
event_create_dir+0x2a3/0x34b
__trace_add_new_event+0x1c/0x26
event_trace_add_tracer+0x56/0x86
trace_array_create+0x13e/0x1e1
instance_mkdir+0x8/0x17
tracefs_syscall_mkdir+0x39/0x50
? get_dname+0x31/0x31
vfs_mkdir+0x78/0xa3
do_mkdirat+0x71/0xb0
sys_mkdir+0x19/0x1b
do_fast_syscall_32+0xb0/0xed
I bisected this down to the addition of the proxy_ops into tracefs for
lockdown. It appears that the allocation of the proxy_ops and then freeing
it in the destroy_inode callback, is causing havoc with the memory system.
Reading the documentation about destroy_inode and talking with Linus about
this, this is buggy and wrong. When defining the destroy_inode() method, it
is expected that the destroy_inode() will also free the inode, and not just
the extra allocations done in the creation of the inode. The faulty commit
causes a memory leak of the inode data structure when they are deleted.
Instead of allocating the proxy_ops (and then having to free it) the checks
should be done by the open functions themselves, and not hack into the
tracefs directory. First revert the tracefs updates for locked_down and then
later we can add the locked_down checks in the kernel/trace files.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Pull NFS client bugfixes from Anna Schumaker:
"Stable bugfixes:
- Fix O_DIRECT accounting of number of bytes read/written # v4.1+
Other fixes:
- Fix nfsi->nrequests count error on nfs_inode_remove_request()
- Remove redundant mirror tracking in O_DIRECT
- Fix leak of clp->cl_acceptor string
- Fix race to sk_err after xs_error_report"
* tag 'nfs-for-5.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: fix race to sk_err after xs_error_report
NFSv4: Fix leak of clp->cl_acceptor string
NFS: Remove redundant mirror tracking in O_DIRECT
NFS: Fix O_DIRECT accounting of number of bytes read/written
nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
|
|
Pull cifs fixes from Steve French:
"Eight small SMB3 fixes, four for stable, and important fix for the
recent regression introduced by filesystem timestamp range patches"
* tag '5.4-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Force reval dentry if LOOKUP_REVAL flag is set
CIFS: Force revalidate inode when dentry is stale
smb3: Fix regression in time handling
smb3: remove noisy debug message and minor cleanup
CIFS: Gracefully handle QueryInfo errors during open
cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic
fs: cifs: mute -Wunused-const-variable message
smb3: cleanup some recent endian errors spotted by updated sparse
|
|
Pull block fixes from Jens Axboe:
- Fix wbt performance regression introduced with the blk-rq-qos
refactoring (Harshad)
- Fix io_uring fileset removal inadvertently killing the workqueue (me)
- Fix io_uring typo in linked command nonblock submission (Pavel)
- Remove spurious io_uring wakeups on request free (Pavel)
- Fix null_blk zoned command error return (Keith)
- Don't use freezable workqueues for backing_dev, also means we can
revert a previous libata hack (Mika)
- Fix nbd sysfs mutex dropped too soon at removal time (Xiubo)
* tag 'for-linus-20191010' of git://git.kernel.dk/linux-block:
nbd: fix possible sysfs duplicate warning
null_blk: Fix zoned command return code
io_uring: only flush workqueues on fileset removal
io_uring: remove wait loop spurious wakeups
blk-wbt: fix performance regression in wbt scale_up/scale_down
Revert "libata, freezer: avoid block device removal while system is frozen"
bdi: Do not use freezable workqueue
io_uring: fix reversed nonblock flag for link submission
|
|
We have two ways a request can be deferred:
1) It's a regular request that depends on another one
2) It's a timeout that tracks completions
We have a shared helper to determine whether to defer, and that
attempts to make the right decision based on the request. But we
only have some of this information in the caller. Un-share the
two timeout/defer helpers so the caller can use the right one.
Fixes: 5262f567987d ("io_uring: IORING_OP_TIMEOUT support")
Reported-by: yangerkun <[email protected]>
Reviewed-by: Jackie Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Our client can issue multiple SETCLIENTID operations to the same
server in some circumstances. Ensure that calls to
nfs4_proc_setclientid() after the first one do not overwrite the
previously allocated cl_acceptor string.
unreferenced object 0xffff888461031800 (size 32):
comm "mount.nfs", pid 2227, jiffies 4294822467 (age 1407.749s)
hex dump (first 32 bytes):
6e 66 73 40 6b 6c 69 6d 74 2e 69 62 2e 31 30 31 [email protected]
35 67 72 61 6e 67 65 72 2e 6e 65 74 00 00 00 00 5granger.net....
backtrace:
[<00000000ab820188>] __kmalloc+0x128/0x176
[<00000000eeaf4ec8>] gss_stringify_acceptor+0xbd/0x1a7 [auth_rpcgss]
[<00000000e85e3382>] nfs4_proc_setclientid+0x34e/0x46c [nfsv4]
[<000000003d9cf1fa>] nfs40_discover_server_trunking+0x7a/0xed [nfsv4]
[<00000000b81c3787>] nfs4_discover_server_trunking+0x81/0x244 [nfsv4]
[<000000000801b55f>] nfs4_init_client+0x1b0/0x238 [nfsv4]
[<00000000977daf7f>] nfs4_set_client+0xfe/0x14d [nfsv4]
[<0000000053a68a2a>] nfs4_create_server+0x107/0x1db [nfsv4]
[<0000000088262019>] nfs4_remote_mount+0x2c/0x59 [nfsv4]
[<00000000e84a2fd0>] legacy_get_tree+0x2d/0x4c
[<00000000797e947c>] vfs_get_tree+0x20/0xc7
[<00000000ecabaaa8>] fc_mount+0xe/0x36
[<00000000f15fafc2>] vfs_kern_mount+0x74/0x8d
[<00000000a3ff4e26>] nfs_do_root_mount+0x8a/0xa3 [nfsv4]
[<00000000d1c2b337>] nfs4_try_mount+0x58/0xad [nfsv4]
[<000000004c9bddee>] nfs_fs_mount+0x820/0x869 [nfs]
Fixes: f11b2a1cfbf5 ("nfs4: copy acceptor name from context ... ")
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Pull xfs fixes from Darrick Wong:
"A couple of small code cleanups and bug fixes for rounding errors,
metadata logging errors, and an extra layer of safeguards against
leaking memory contents.
- Fix a rounding error in the fallocate code
- Minor code cleanups
- Make sure to zero memory buffers before formatting metadata blocks
- Fix a few places where we forgot to log an inode metadata update
- Remove broken error handling that tried to clean up after a failure
but still got it wrong"
* tag 'xfs-5.4-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: move local to extent inode logging into bmap helper
xfs: remove broken error handling on failed attr sf to leaf change
xfs: log the inode on directory sf to block format change
xfs: assure zeroed memory buffers for certain kmem allocations
xfs: removed unused error variable from xchk_refcountbt_rec
xfs: remove unused flags arg from xfs_get_aghdr_buf()
xfs: Fix tail rounding in xfs_alloc_file_space()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more stabitly fixes, one build warning fix.
- fix inode allocation under NOFS context
- fix leak in fiemap due to concurrent append writes
- fix log-root tree updates
- fix balance convert of single profile on 32bit architectures
- silence false positive warning on old GCCs (code moved in rc1)"
* tag 'for-5.4-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: silence maybe-uninitialized warning in clone_range
btrfs: fix uninitialized ret in ref-verify
btrfs: allocate new inode in NOFS context
btrfs: fix balance convert to single on 32-bit host CPUs
btrfs: fix incorrect updating of log root tree
Btrfs: fix memory leak due to concurrent append writes with fiemap
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache_readdir() fixes from Al Viro:
"The couple of patches you'd been OK with; no hlist conversion yet, and
cursors are still in the list of children"
[ Al is referring to future work to avoid some nasty O(n**2) behavior
with the readdir cursors when you have lots of concurrent readdirs.
This is just a fix for a race with a trivial cleanup - Linus ]
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
libfs: take cursors out of list when moving past the end of directory
Fix the locking in dcache_readdir() and friends
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount fixes from Al Viro:
"A couple of regressions from the mount series"
* 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: add missing blkdev_put() in get_tree_bdev()
shmem: fix LSM options parsing
|
|
that eliminates the last place where we accessed the tail of ->d_subdirs
Signed-off-by: Al Viro <[email protected]>
|
|
Is there are a couple of missing blkdev_put() in get_tree_bdev()?
Signed-off-by: Al Viro <[email protected]>
|
|
We should not remove the workqueue, we just need to ensure that the
workqueues are synced. The workqueues are torn down on ctx removal.
Cc: [email protected]
Fixes: 6b06314c47e1 ("io_uring: add file set registration")
Reported-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The callers of xfs_bmap_local_to_extents_empty() log the inode
external to the function, yet this function is where the on-disk
format value is updated. Push the inode logging down into the
function itself to help prevent future mistakes.
Note that internal bmap callers track the inode logging flags
independently and thus may log the inode core twice due to this
change. This is harmless, so leave this code around for consistency
with the other attr fork conversion functions.
Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
xfs_attr_shortform_to_leaf() attempts to put the shortform fork back
together after a failed attempt to convert from shortform to leaf
format. While this code reallocates and copies back the shortform
attr fork data, it never resets the inode format field back to local
format. Further, now that the inode is properly logged after the
initial switch from local format, any error that triggers the
recovery code will eventually abort the transaction and shutdown the
fs. Therefore, remove the broken and unnecessary error handling
code.
Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
When a directory changes from shortform (sf) to block format, the sf
format is copied to a temporary buffer, the inode format is modified
and the updated format filled with the dentries from the temporary
buffer. If the inode format is modified and attempt to grow the
inode fails (due to I/O error, for example), it is possible to
return an error while leaving the directory in an inconsistent state
and with an otherwise clean transaction. This results in corruption
of the associated directory and leads to xfs_dabuf_map() errors as
subsequent lookups cannot accurately determine the format of the
directory. This problem is reproduced occasionally by generic/475.
The fundamental problem is that xfs_dir2_sf_to_block() changes the
on-disk inode format without logging the inode. The inode is
eventually logged by the bmapi layer in the common case, but error
checking introduces the possibility of failing the high level
request before this happens.
Update both of the dir2 and attr callers of
xfs_bmap_local_to_extents_empty() to log the inode core as
consistent with the bmap local to extent format change codepath.
This ensures that any subsequent errors after the format has changed
cause the transaction to abort.
Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
|
|
We no longer need the extra mirror length tracking in the O_DIRECT code,
as we are able to track the maximum contiguous length in dreq->max_count.
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
When a series of O_DIRECT reads or writes are truncated, either due to
eof or due to an error, then we should return the number of contiguous
bytes that were received/sent starting at the offset specified by the
application.
Currently, we are failing to correctly check contiguity, and so we're
failing the generic/465 in xfstests when the race between the read
and write RPCs causes the file to get extended while the 2 reads are
outstanding. If the first read RPC call wins the race and returns with
eof set, we should treat the second read RPC as being truncated.
Reported-by: Su Yanjun <[email protected]>
Fixes: 1ccbad9f9f9bd ("nfs: fix DIO good bytes calculation")
Cc: [email protected] # 4.1+
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Mark inode for force revalidation if LOOKUP_REVAL flag is set.
This tells the client to actually send a QueryInfo request to
the server to obtain the latest metadata in case a directory
or a file were changed remotely. Only do that if the client
doesn't have a lease for the file to avoid unneeded round
trips to the server.
Cc: <[email protected]>
Signed-off-by: Pavel Shilovsky <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Currently the client indicates that a dentry is stale when inode
numbers or type types between a local inode and a remote file
don't match. If this is the case attributes is not being copied
from remote to local, so, it is already known that the local copy
has stale metadata. That's why the inode needs to be marked for
revalidation in order to tell the VFS to lookup the dentry again
before openning a file. This prevents unexpected stale errors
to be returned to the user space when openning a file.
Cc: <[email protected]>
Signed-off-by: Pavel Shilovsky <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Fixes: cb7a69e60590 ("cifs: Initialize filesystem timestamp ranges")
Only very old servers (e.g. OS/2 and DOS) did not support
DCE TIME (100 nanosecond granularity). Fix the checks used
to set minimum and maximum times.
Fixes xfstest generic/258 (on 5.4-rc1 and later)
CC: Deepa Dinamani <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Signed-off-by: Steve French <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
|
|
Message was intended only for developer temporary build
In addition cleanup two minor warnings noticed by Coverity
and a trivial change to workaround a sparse warning
Signed-off-by: Steve French <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
|
|
GCC throws warning message as below:
‘clone_src_i_size’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
^
fs/btrfs/send.c:5088:6: note: ‘clone_src_i_size’ was declared here
u64 clone_src_i_size;
^
The clone_src_i_size is only used as call-by-reference
in a call to get_inode_info().
Silence the warning by initializing clone_src_i_size to 0.
Note that the warning is a false positive and reported by older versions
of GCC (eg. 7.x) but not eg 9.x. As there have been numerous people, the
patch is applied. Setting clone_src_i_size to 0 does not otherwise make
sense and would not do any action in case the code changes in the future.
Signed-off-by: Austin Kim <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ add note ]
Signed-off-by: David Sterba <[email protected]>
|
|
Any changes interesting to tasks waiting in io_cqring_wait() are
commited with io_cqring_ev_posted(). However, io_ring_drop_ctx_refs()
also tries to do that but with no reason, that means spurious wakeups
every io_free_req() and io_uring_enter().
Just use percpu_ref_put() instead.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Merge misc fixes from Andrew Morton:
"The usual shower of hotfixes.
Chris's memcg patches aren't actually fixes - they're mature but a few
niggling review issues were late to arrive.
The ocfs2 fixes are quite old - those took some time to get reviewer
attention.
Subsystems affected by this patch series: ocfs2, hotfixes, mm/memcg,
mm/slab-generic"
* emailed patches from Andrew Morton <[email protected]>:
mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
mm, sl[ou]b: improve memory accounting
mm, memcg: make scan aggression always exclude protection
mm, memcg: make memory.emin the baseline for utilisation determination
mm, memcg: proportional memory.{low,min} reclaim
mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
mm/page_alloc.c: fix a crash in free_pages_prepare()
mm/z3fold.c: claim page in the beginning of free
kernel/sysctl.c: do not override max_threads provided by userspace
memcg: only record foreign writebacks with dirty pages when memcg is not disabled
mm: fix -Wmissing-prototypes warnings
writeback: fix use-after-free in finish_writeback_work()
mm/memremap: drop unused SECTION_SIZE and SECTION_MASK
panic: ensure preemption is disabled during panic()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
ocfs2: clear zero in unaligned direct IO
|
|
finish_writeback_work() reads @done->waitq after decrementing
@done->cnt. However, once @done->cnt reaches zero, @done may be freed
(from stack) at any moment and @done->waitq can contain something
unrelated by the time finish_writeback_work() tries to read it. This
led to the following crash.
"BUG: kernel NULL pointer dereference, address: 0000000000000002"
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
CPU: 40 PID: 555153 Comm: kworker/u98:50 Kdump: loaded Not tainted
...
Workqueue: writeback wb_workfn (flush-btrfs-1)
RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
Code: 48 89 d8 5b c3 e8 50 db 6b ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 fe ca 6b ff eb f2 66 90
RSP: 0018:ffffc90049b27d98 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000003 RDI: 0000000000000002
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffff889fff407600 R11: ffff88ba9395d740 R12: 000000000000e300
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88bfdfa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 0000000002409005 CR4: 00000000001606e0
Call Trace:
__wake_up_common_lock+0x63/0xc0
wb_workfn+0xd2/0x3e0
process_one_work+0x1f5/0x3f0
worker_thread+0x2d/0x3d0
kthread+0x111/0x130
ret_from_fork+0x1f/0x30
Fix it by reading and caching @done->waitq before decrementing
@done->cnt.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 5b9cce4c7eb069 ("writeback: Generalize and expose wb_completion")
Signed-off-by: Tejun Heo <[email protected]>
Debugged-by: Chris Mason <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: <[email protected]> [5.2+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
ocfs2_info_scan_inode_alloc()
In ocfs2_info_scan_inode_alloc(), there is an if statement on line 283
to check whether inode_alloc is NULL:
if (inode_alloc)
When inode_alloc is NULL, it is used on line 287:
ocfs2_inode_lock(inode_alloc, &bh, 0);
ocfs2_inode_lock_full_nested(inode, ...)
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
Thus, a possible null-pointer dereference may occur.
To fix this bug, inode_alloc is checked on line 286.
This bug is found by a static analysis tool STCheck written by us.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jia-Ju Bai <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In ocfs2_write_end_nolock(), there are an if statement on lines 1976,
2047 and 2058, to check whether handle is NULL:
if (handle)
When handle is NULL, it is used on line 2045:
ocfs2_update_inode_fsync_trans(handle, inode, 1);
oi->i_sync_tid = handle->h_transaction->t_tid;
Thus, a possible null-pointer dereference may occur.
To fix this bug, handle is checked before calling
ocfs2_update_inode_fsync_trans().
This bug is found by a static analysis tool STCheck written by us.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jia-Ju Bai <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
check whether loc->xl_entry is NULL:
if (loc->xl_entry)
When loc->xl_entry is NULL, it is used on line 2158:
ocfs2_xa_add_entry(loc, name_hash);
loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);
and line 2164:
ocfs2_xa_add_namevalue(loc, xi);
loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
loc->xl_entry->xe_name_len = xi->xi_name_len;
Thus, possible null-pointer dereferences may occur.
To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry()
abnormally returns with -EINVAL.
These bugs are found by a static analysis tool STCheck written by us.
[[email protected]: remove now-unused ocfs2_xa_add_entry()]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jia-Ju Bai <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|