Age | Commit message (Collapse) | Author | Files | Lines |
|
The only remaining user of ->io_private is the generic ioend merging
infrastructure. The only user of that is XFS, which no longer sets
->io_private or passes an associated merge callback. Remove the
unused parameter and the ->io_private field.
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
While running generic/050 with an external log, I observed this warning
in dmesg:
Trying to write to read-only block-device sda4 (partno 4)
WARNING: CPU: 2 PID: 215677 at block/blk-core.c:704 submit_bio_checks+0x256/0x510
Call Trace:
submit_bio_noacct+0x2c/0x430
_xfs_buf_ioapply+0x283/0x3c0 [xfs]
__xfs_buf_submit+0x6a/0x210 [xfs]
xfs_buf_delwri_submit_buffers+0xf8/0x270 [xfs]
xfsaild+0x2db/0xc50 [xfs]
kthread+0x14b/0x170
I think this happened because we tried to cover the log after a readonly
mount, and the AIL tried to write the primary superblock to the data
device. The test marks the data device readonly, but it doesn't do the
same to the external log device. Therefore, XFS thinks that the log is
writable, even though AIL writes whine to dmesg because the data device
is read only.
Fix this by amending xfs_log_writable to prevent writes when the AIL
can't possible write anything into the filesystem.
Note: As for the external log or the rt devices being readonly--
xfs_blkdev_get will complain about that if we aren't doing a norecovery
mount.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New feature:
- A new "func-no-repeats" option in tracefs/options directory.
When set the function tracer will detect if the current function
being traced is the same as the previous one, and instead of
recording it, it will keep track of the number of times that the
function is repeated in a row. And when another function is
recorded, it will write a new event that shows the function that
repeated, the number of times it repeated and the time stamp of
when the last repeated function occurred.
Enhancements:
- In order to implement the above "func-no-repeats" option, the ring
buffer timestamp can now give the accurate timestamp of the event
as it is being recorded, instead of having to record an absolute
timestamp for all events. This helps the histogram code which no
longer needs to waste ring buffer space.
- New validation logic to make sure all trace events that access
dereferenced pointers do so in a safe way, and will warn otherwise.
Fixes:
- No longer limit the PIDs of tasks that are recorded for
"saved_cmdlines" to PID_MAX_DEFAULT (32768), as systemd now allows
for a much larger range. This caused the mapping of PIDs to the
task names to be dropped for all tasks with a PID greater than
32768.
- Change trace_clock_global() to never block. This caused a deadlock.
Clean ups:
- Typos, prototype fixes, and removing of duplicate or unused code.
- Better management of ftrace_page allocations"
* tag 'trace-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (32 commits)
tracing: Restructure trace_clock_global() to never block
tracing: Map all PIDs to command lines
ftrace: Reuse the output of the function tracer for func_repeats
tracing: Add "func_no_repeats" option for function tracing
tracing: Unify the logic for function tracing options
tracing: Add method for recording "func_repeats" events
tracing: Add "last_func_repeats" to struct trace_array
tracing: Define new ftrace event "func_repeats"
tracing: Define static void trace_print_time()
ftrace: Simplify the calculation of page number for ftrace_page->records some more
ftrace: Store the order of pages allocated in ftrace_page
tracing: Remove unused argument from "ring_buffer_time_stamp()
tracing: Remove duplicate struct declaration in trace_events.h
tracing: Update create_system_filter() kernel-doc comment
tracing: A minor cleanup for create_system_filter()
kernel: trace: Mundane typo fixes in the file trace_events_filter.c
tracing: Fix various typos in comments
scripts/recordmcount.pl: Make vim and emacs indent the same
scripts/recordmcount.pl: Make indent spacing consistent
tracing: Add a verifier to check string pointers for trace events
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull receive_fd update from Al Viro:
"Cleanup of receive_fd mess"
* 'work.file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: split receive_fd_replace from __receive_fd
|
|
When the tcp connection is not ready to send requests,
we keep retrying echo with an interval of zero.
This seems unnecessary, and this fix changes the interval
between echoes to what is specified as echo_interval.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We can detect server unresponsiveness only if echoes are enabled.
Echoes can be disabled under two scenarios:
1. The connection is low on credits, so we've disabled echoes/oplocks.
2. The connection has not seen any request till now (other than
negotiate/sess-setup), which is when we enable these two, based on
the credits available.
So this fix will check for dead connection, only when echo is enabled.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
CC: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When mounted multiuser it is hard to dump keys for the other sessions
which makes it hard to debug using network traces (e.g. using wireshark).
Suggested-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Previously we were only able to dump CCM or GCM-128 keys (see "smbinfo keys" e.g.)
to allow network debugging (e.g. wireshark) of mounts to SMB3.1.1 encrypted
shares. But with the addition of GCM-256 support, we have to be able to dump
32 byte instead of 16 byte keys which requires adding an additional ioctl
for that.
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Various filesystem support the shutdown ioctl which is used by various
xfstests. The shutdown ioctl sets a flag on the superblock which
prevents open, unlink, symlink, hardlink, rmdir, create etc.
on the file system until unmount and remounted. The two flags supported
in this patch are:
FSOP_GOING_FLAGS_LOGFLUSH and FSOP_GOING_FLAGS_NOLOGFLUSH
which require very little other than blocking new operations (since
we do not cache writes to metadata on the client with cifs.ko).
FSOP_GOING_FLAGS_DEFAULT is not supported yet, but could be added in
the future but would need to call syncfs or equivalent to write out
pending data on the mount.
With this patch various xfstests now work including tests 043 through
046 for example.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
|
|
When file is closed, SMB2 close request is not sent to server
immediately and is deferred for acregmax defined interval. When file is
reopened by same process for read or write, the file handle
is reused if an oplock is held.
When client receives a oplock/lease break, file is closed immediately
if reference count is zero, else oplock is downgraded.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"orangefs: implement orangefs_readahead
mm/readahead.c/read_pages was quite a bit different back when I put my
open-coded readahead logic into orangefs_readpage. That logic seemed
to work as designed back then, it is a trainwreck now.
This implements orangefs_readahead using the new xarray and
readahead_expand features and removes all my open-coded readahead
logic.
This results in an extreme read performance improvement, these sample
numbers are from my test VM:
Here's an example of what's upstream in
5.11.8-200.fc33.x86_64:
30+0 records in
30+0 records out
125829120 bytes (126 MB, 120 MiB) copied, 5.77943 s, 21.8 MB/s
And here's this version of orangefs_readahead on top of 5.12.0-rc4:
30+0 records in
30+0 records out
125829120 bytes (126 MB, 120 MiB) copied, 0.325919 s, 386 MB/s
There are four xfstest regressions with this patch. David Howells and
Matthew Wilcox have been helping me work with this code"
* tag 'for-linus-5.13-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: leave files in the page cache for a few micro seconds at least
Orangef: implement orangefs_readahead.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Assorted stuff all over the place"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
useful constants: struct qstr for ".."
hostfs_open(): don't open-code file_dentry()
whack-a-mole: kill strlen_user() (again)
autofs: should_expire() argument is guaranteed to be positive
apparmor:match_mn() - constify devpath argument
buffer: a small optimization in grow_buffers
get rid of autofs_getpath()
constify dentry argument of dentry_path()/dentry_path_raw()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull exryptfs updates from Al Viro:
"The interesting part here is (ecryptfs) lock_parent() fixes - its
treatment of ->d_parent had been very wrong.
The rest is trivial cleanups"
* 'work.ecryptfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ecryptfs: ecryptfs_dentry_info->crypt_stat is never used
ecryptfs: get rid of unused accessors
ecryptfs: saner API for lock_parent()
ecryptfs: get rid of pointless dget/dput in ->symlink() and ->link()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull Landlock LSM from James Morris:
"Add Landlock, a new LSM from Mickaël Salaün.
Briefly, Landlock provides for unprivileged application sandboxing.
From Mickaël's cover letter:
"The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock
is a stackable LSM [1], it makes possible to create safe security
sandboxes as new security layers in addition to the existing
system-wide access-controls. This kind of sandbox is expected to
help mitigate the security impact of bugs or unexpected/malicious
behaviors in user-space applications. Landlock empowers any
process, including unprivileged ones, to securely restrict
themselves.
Landlock is inspired by seccomp-bpf but instead of filtering
syscalls and their raw arguments, a Landlock rule can restrict the
use of kernel objects like file hierarchies, according to the
kernel semantic. Landlock also takes inspiration from other OS
sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD
Pledge/Unveil.
In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review. This
series still addresses multiple use cases, especially with the
combined use of seccomp-bpf: applications with built-in sandboxing,
init systems, security sandbox tools and security-oriented APIs [2]"
The cover letter and v34 posting is here:
https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/
See also:
https://landlock.io/
This code has had extensive design discussion and review over several
years"
Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1]
Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2]
* tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
landlock: Enable user space to infer supported features
landlock: Add user and kernel documentation
samples/landlock: Add a sandbox manager example
selftests/landlock: Add user space tests
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
LSM: Infrastructure management of the superblock
landlock: Add ptrace restrictions
landlock: Set up the security framework and manage credentials
landlock: Add ruleset and domain management
landlock: Add object management
|
|
The generic/464 xfstest causes kAFS to emit occasional warnings of the
form:
kAFS: vnode modified {100055:8a} 30->31 YFS.StoreData64 (c=6015)
This indicates that the data version received back from the server did not
match the expected value (the DV should be incremented monotonically for
each individual modification op committed to a vnode).
What is happening is that a lookup call is doing a bulk status fetch
speculatively on a bunch of vnodes in a directory besides getting the
status of the vnode it's actually interested in. This is racing with a
StoreData operation (though it could also occur with, say, a MakeDir op).
On the client, a modification operation locks the vnode, but the bulk
status fetch only locks the parent directory, so no ordering is imposed
there (thereby avoiding an avenue to deadlock).
On the server, the StoreData op handler doesn't lock the vnode until it's
received all the request data, and downgrades the lock after committing the
data until it has finished sending change notifications to other clients -
which allows the status fetch to occur before it has finished.
This means that:
- a status fetch can access the target vnode either side of the exclusive
section of the modification
- the status fetch could start before the modification, yet finish after,
and vice-versa.
- the status fetch and the modification RPCs can complete in either order.
- the status fetch can return either the before or the after DV from the
modification.
- the status fetch might regress the locally cached DV.
Some of these are handled by the previous fix[1], but that's not sufficient
because it checks the DV it received against the DV it cached at the start
of the op, but the DV might've been updated in the meantime by a locally
generated modification op.
Fix this by the following means:
(1) Keep track of when we're performing a modification operation on a
vnode. This is done by marking vnode parameters with a 'modification'
note that causes the AFS_VNODE_MODIFYING flag to be set on the vnode
for the duration.
(2) Alter the speculation race detection to ignore speculative status
fetches if either the vnode is marked as being modified or the data
version number is not what we expected.
Note that whilst the "vnode modified" warning does get recovered from as it
causes the client to refetch the status at the next opportunity, it will
also invalidate the pagecache, so changes might get lost.
Fixes: a9e5c87ca744 ("afs: Fix speculative status fetch going out of order wrt to modifications")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-and-reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/160605082531.252452.14708077925602709042.stgit@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/linux-fsdevel/161961335926.39335.2552653972195467566.stgit@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The pattern prefixed with '/' matches files in the same directory,
but not ones in sub-directories.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Andra Paraschiv <andraprs@amazon.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Gabriel Krisman Bertazi <krisman@collabora.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"New features for ext4 this cycle include support for encrypted
casefold, ensure that deleted file names are cleared in directory
blocks by zeroing directory entries when they are unlinked or moved as
part of a hash tree node split. We also improve the block allocator's
performance on a freshly mounted file system by prefetching block
bitmaps.
There are also the usual cleanups and bug fixes, including fixing a
page cache invalidation race when there is mixed buffered and direct
I/O and the block size is less than page size, and allow the dax flag
to be set and cleared on inline directories"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
ext4: wipe ext4_dir_entry2 upon file deletion
ext4: Fix occasional generic/418 failure
fs: fix reporting supported extra file attributes for statx()
ext4: allow the dax flag to be set and cleared on inline directories
ext4: fix debug format string warning
ext4: fix trailing whitespace
ext4: fix various seppling typos
ext4: fix error return code in ext4_fc_perform_commit()
ext4: annotate data race in jbd2_journal_dirty_metadata()
ext4: annotate data race in start_this_handle()
ext4: fix ext4_error_err save negative errno into superblock
ext4: fix error code in ext4_commit_super
ext4: always panic when errors=panic is specified
ext4: delete redundant uptodate check for buffer
ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()
ext4: make prefetch_block_bitmaps default
ext4: add proc files to monitor new structures
ext4: improve cr 0 / cr 1 group scanning
ext4: add MB_NUM_ORDERS macro
ext4: add mballoc stats proc file
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This includes more dlm networking cleanups and improvements for making
dlm shutdowns more robust"
* tag 'dlm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
fs: dlm: fix missing unlock on error in accept_from_sock()
fs: dlm: add shutdown hook
fs: dlm: flush swork on shutdown
fs: dlm: remove unaligned memory access handling
fs: dlm: check on minimum msglen size
fs: dlm: simplify writequeue handling
fs: dlm: use GFP_ZERO for page buffer
fs: dlm: change allocation limits
fs: dlm: add check if dlm is currently running
fs: dlm: add errno handling to check callback
fs: dlm: set subclass for othercon sock_mutex
fs: dlm: set connected bit after accept
fs: dlm: fix mark setting deadlock
fs: dlm: fix debugfs dump
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Fix a page locking bug in write (introduced in 2.6.26)
- Allow sgid bit to be killed in setacl()
- Miscellaneous fixes and cleanups
* tag 'fuse-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
cuse: simplify refcount
cuse: prevent clone
virtiofs: fix userns
virtiofs: remove useless function
virtiofs: split requests that exceed virtqueue size
virtiofs: fix memory leak in virtio_fs_probe()
fuse: invalidate attrs when page writeback completes
fuse: add a flag FUSE_SETXATTR_ACL_KILL_SGID to kill SGID
fuse: extend FUSE_SETXATTR request
fuse: fix matching of FUSE_DEV_IOC_CLONE command
fuse: fix a typo
fuse: don't zero pages twice
fuse: fix typo for fuse_conn.max_pages comment
fuse: fix write deadlock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
- Fix a regression introduced in 5.2 that resulted in valid overlayfs
mounts being rejected with ELOOP (Too many levels of symbolic links)
- Fix bugs found by various tools
- Miscellaneous improvements and cleanups
* tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: add debug print to ovl_do_getxattr()
ovl: invalidate readdir cache on changes to dir with origin
ovl: allow upperdir inside lowerdir
ovl: show "userxattr" in the mount data
ovl: trivial typo fixes in the file inode.c
ovl: fix misspellings using codespell tool
ovl: do not copy attr several times
ovl: remove ovl_map_dev_ino() return value
ovl: fix error for ovl_fill_super()
ovl: fix missing revert_creds() on error path
ovl: fix leaked dentry
ovl: restrict lower null uuid for "xino=auto"
ovl: check that upperdir path is not on a read-only mount
ovl: plumb through flush method
|
|
This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99.
As discussed in [1] this commit was a no-op because the mapping type was
checked in vma_to_resize before move_vma is ever called. This meant that
vm_ops->mremap() would never be called on such mappings. Furthermore,
we've since expanded support of MREMAP_DONTUNMAP to non-anonymous
mappings, and these special mappings are still protected by the existing
check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL.
1. https://lkml.org/lkml/2020/12/28/2340
Link: https://lkml.kernel.org/r/20210323182520.2712101-2-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For reads, use the better variant of checking for the need to call
filemap_write_and_wait_range() when doing O_DIRECT. This avoids falling
back to the slow path for IOCB_NOWAIT, if there are no pages to wait for
(or write out).
Link: https://lkml.kernel.org/r/20210224164455.1096727-4-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix kernel-doc notation function arguments to eliminate two kernel-doc
warnings:
fs_parser.c:322: warning: Excess function parameter 'name' description in 'validate_constant_table'
fs_parser.c:367: warning: Function parameter or member 'name' not described in 'fs_validate_description'
Link: https://lkml.kernel.org/r/20210407033743.9701-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the following clang warning:
fs/ocfs2/dlm/dlmrecovery.c:129:20: warning: unused function 'dlm_reset_recovery' [-Wunused-function].
Link: https://lkml.kernel.org/r/1618382761-5784-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
s/cluter/cluster/
Link: https://lkml.kernel.org/r/20210324072931.5056-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use macro map_flag() is tricky and coccicheck outputs the following
warning:
fs/ocfs2/stack_o2cb.c:69:5-16: Unneeded variable: "o2dlm_flags"
So map flags directly in flags_to_o2dlm() to make coccicheck happy.
And remove BUG_ON() here as well to simplify code since it runs well
a long time.
Link: https://lkml.kernel.org/r/1616138664-35935-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the following coccicheck warning:
fs/ocfs2/blockcheck.c:232:0-23: WARNING: blockcheck_fops should be defined with DEFINE_DEBUGFS_ATTRIBUTE
Link: https://lkml.kernel.org/r/1614155230-57292-1-git-send-email-yang.lee@linux.alibaba.com
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
unreferenced object 0xffff8881123bf0a0 (size 32):
comm "syz-executor557", pid 8384, jiffies 4294946143 (age 12.360s)
backtrace:
[<ffffffff81469b71>] kmalloc_node include/linux/slab.h:579 [inline]
[<ffffffff81469b71>] kvmalloc_node+0x61/0xf0 mm/util.c:587
[<ffffffff815f0b3f>] kvmalloc include/linux/mm.h:795 [inline]
[<ffffffff815f0b3f>] kvmalloc_array include/linux/mm.h:813 [inline]
[<ffffffff815f0b3f>] kvcalloc include/linux/mm.h:818 [inline]
[<ffffffff815f0b3f>] io_rsrc_data_alloc+0x4f/0xc0 fs/io_uring.c:7164
[<ffffffff815f26d8>] io_sqe_buffers_register+0x98/0x3d0 fs/io_uring.c:8383
[<ffffffff815f84a7>] __io_uring_register+0xf67/0x18c0 fs/io_uring.c:9986
[<ffffffff81609222>] __do_sys_io_uring_register fs/io_uring.c:10091 [inline]
[<ffffffff81609222>] __se_sys_io_uring_register fs/io_uring.c:10071 [inline]
[<ffffffff81609222>] __x64_sys_io_uring_register+0x112/0x230 fs/io_uring.c:10071
[<ffffffff842f616a>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
[<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix data->tags memory leak, through io_rsrc_data_free() to release
data memory space.
Reported-by: syzbot+0f32d05d8b6cd8d7ea3e@syzkaller.appspotmail.com
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Link: https://lore.kernel.org/r/20210430082515.13886-1-qiang.zhang@windriver.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Evaluate $(call cc-option,...) etc. only for build targets
- Add CONFIG_VMLINUX_MAP to generate .map file when linking vmlinux
- Remove unnecessary --gcc-toolchains Clang flag because the --prefix
flag finds the toolchains
- Do not pass Clang's --prefix flag when using the integrated as
- Check the assembler version in Kconfig time
- Add new CONFIG options, AS_VERSION, AS_IS_GNU, AS_IS_LLVM to clean up
some dependencies in Kconfig
- Fix invalid Module.symvers creation when building only modules
without vmlinux
- Fix false-positive modpost warnings when CONFIG_TRIM_UNUSED_KSYMS is
set, but there is no module to build
- Refactor module installation Makefile
- Support zstd for module compression
- Convert alpha and ia64 to use generic shell scripts to generate the
syscall headers
- Add a new elfnote to indicate if the kernel was built with LTO, which
will be used by pahole
- Flatten the directory structure under include/config/ so CONFIG
options and filenames match
- Change the deb source package name from linux-$(KERNELRELEASE) to
linux-upstream
* tag 'kbuild-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (42 commits)
kbuild: Add $(KBUILD_HOSTLDFLAGS) to 'has_libelf' test
kbuild: deb-pkg: change the source package name to linux-upstream
tools: do not include scripts/Kbuild.include
kbuild: redo fake deps at include/config/*.h
kbuild: remove TMPO from try-run
MAINTAINERS: add pattern for dummy-tools
kbuild: add an elfnote for whether vmlinux is built with lto
ia64: syscalls: switch to generic syscallhdr.sh
ia64: syscalls: switch to generic syscalltbl.sh
alpha: syscalls: switch to generic syscallhdr.sh
alpha: syscalls: switch to generic syscalltbl.sh
sysctl: use min() helper for namecmp()
kbuild: add support for zstd compressed modules
kbuild: remove CONFIG_MODULE_COMPRESS
kbuild: merge scripts/Makefile.modsign to scripts/Makefile.modinst
kbuild: move module strip/compression code into scripts/Makefile.modinst
kbuild: refactor scripts/Makefile.modinst
kbuild: rename extmod-prefix to extmod_prefix
kbuild: check module name conflict for external modules as well
kbuild: show the target directory for depmod log
...
|
|
Currently the -EINVAL error return path is leaking memory allocated
to data. Fix this by not returning immediately but instead setting
the error return variable to -EINVAL and breaking out of the loop.
Kudos to Pavel Begunkov for suggesting a correct fix.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210429104602.62676-1-colin.king@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_rsrc_node_switch_start() can fail, don't forget to check returned
error code.
Reported-by: syzbot+a4715dd4b7c866136f79@syzkaller.appspotmail.com
Fixes: eae071c9b4cef ("io_uring: prepare fixed rw for dynanic buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c4c06e2f3f0c8e43bd8d0a266c79055bcc6b6e60.1619693112.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Allow empty reg buffer slots any request using which should fail. This
allows users to not register all buffers in advance, but do it lazily
and/or on demand via updates. That is achieved by setting iov_base and
iov_len to zero for registration and/or buffer updates. Empty buffer
can't have a non-zero tag.
Implementation details: to not add extra overhead to io_import_fixed(),
create a dummy buffer crafted to fail any request using it, and set it
to all empty buffer slots.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e95e4d700082baaf010c648c72ac764c9cc8826.1619611868.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a couple of BUILD_BUG_ON() checking some rsrc uapi structs and SQE
flags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ff960df4d5026b9fb5bfd80994b9d3667d3926da.1619536280.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
CQE flags take one byte that we store in req->flags together with other
REQ_F_* internal flags. CQE flags are copied directly into req and then
verified that requires some handling on failures, e.g. to make sure that
that copy doesn't set some of the internal flags.
Move all internal flags to take bits after the first byte, so we don't
need extra handling and make it safer overall.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8b5b02d1ab9d786fcc7db4a3fe86db6b70b8987.1619536280.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Resource emitted CQEs are not bound to requests, so fix up counters used
for DRAIN/defer logic.
Fixes: b60c8dce33895 ("io_uring: preparation for rsrc tagging")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2b32f5f0a40d5928c3466d028f936e167f0654be.1619536280.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to walk
all map elements in a more robust and easier to verify fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF on
s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets
- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks
- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices which don't
need headers in the linear skb part (e.g. virtio)
- nexthop: resilient next-hop groups - improve path stability on
next-hops group changes (incl. offload for mlxsw)
- ipv6: segment routing: add support for IPv4 decapsulation
- icmp: add support for RFC 8335 extended PROBE messages
- inet: use bigger hash table for IP ID generation
- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is slow in
reporting that it completed transmitting the original
- tcp: reorder tcp_congestion_ops for better cache locality
- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow
- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take place
correctly even for encapsulated UDP traffic
- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO
- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls
- veth: allow GRO without XDP - this allows aggregating UDP packets
before handing them off to routing, bridge, OvS, etc.
- allow specifing ifindex when device is moved to another namespace
- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used to
define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily
- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic
- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing
Device APIs:
- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
independent APIs
- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and bnxt
support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP which
define EEPROM in terms of pages (incl. mlx5 support)
- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)
- psample: add additional metadata attributes like transit delay for
packets sampled from switch HW (and corresponding egress and
policy-based sampling in the mlxsw driver)
- dsa: improve support for sandwiched LAGs with bridge and DSA
- netfilter:
- flowtable: use direct xmit in topologies with IP forwarding,
bridging, vlans etc.
- nftables: counter hardware offload support
- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver
- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames
- phy: add support for Clause-45 PHY Loopback
- pci/iov: add sysfs MSI-X vector assignment interface to distribute
MSI-X resources to VFs (incl. mlx5 support)
New hardware/drivers:
- dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port
Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit
interfaces.
- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and
BCM63xx switches
- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches
- ath11k: support for QCN9074 a 802.11ax device
- Bluetooth: Broadcom BCM4330 and BMC4334
- phy: Marvell 88X2222 transceiver support
- mdio: add BCM6368 MDIO mux bus controller
- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips
- mana: driver for Microsoft Azure Network Adapter (MANA)
- Actions Semi Owl Ethernet MAC
- can: driver for ETAS ES58X CAN/USB interfaces
Pure driver changes:
- add XDP support to: enetc, igc, stmmac
- add AF_XDP support to: stmmac
- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary
- mlx5:
- flow rules: add support for mirroring with conntrack, matching
on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting
- ice, iavf: support flow filters, UDP Segmentation Offload
- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic
- ionic:
- implement Rx page reuse
- support HW PTP time-stamping
- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.
- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment
- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping
- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.
- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)
- mt7601u: enable TDLS support
- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes"
* tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits)
net: selftest: fix build issue if INET is disabled
net: netrom: nr_in: Remove redundant assignment to ns
net: tun: Remove redundant assignment to ret
net: phy: marvell: add downshift support for M88E1240
net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255
net/sched: act_ct: Remove redundant ct get and check
icmp: standardize naming of RFC 8335 PROBE constants
bpf, selftests: Update array map tests for per-cpu batched ops
bpf: Add batched ops support for percpu array
bpf: Implement formatted output helpers with bstr_printf
seq_file: Add a seq_bprintf function
sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
net:nfc:digital: Fix a double free in digital_tg_recv_dep_req
net: fix a concurrency bug in l2tp_tunnel_register()
net/smc: Remove redundant assignment to rc
mpls: Remove redundant assignment to err
llc2: Remove redundant assignment to rc
net/tls: Remove redundant initialization of record
rds: Remove redundant assignment to nr_sig
dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
- support for limited fanotify functionality for unpriviledged users
- faster merging of fanotify events
- a few smaller fsnotify improvements
* tag 'fsnotify_for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
shmem: allow reporting fanotify events with file handles on tmpfs
fs: introduce a wrapper uuid_to_fsid()
fanotify_user: use upper_32_bits() to verify mask
fanotify: support limited functionality for unprivileged users
fanotify: configurable limits via sysfs
fanotify: limit number of event merge attempts
fsnotify: use hash table for faster events merge
fanotify: mix event info and pid into merge key hash
fanotify: reduce event objectid to 29-bit hash
fsnotify: allow fsnotify_{peek,remove}_first_event with empty queue
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota, ext2, reiserfs updates from Jan Kara:
- support for path (instead of device) based quotactl syscall
(quotactl_path(2))
- ext2 conversion to kmap_local()
- other minor cleanups & fixes
* tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fs/reiserfs/journal.c: delete useless variables
fs/ext2: Replace kmap() with kmap_local_page()
ext2: Match up ext2_put_page() with ext2_dotdot() and ext2_find_entry()
fs/ext2/: fix misspellings using codespell tool
quota: report warning limits for realtime space quotas
quota: wire up quotactl_path
quota: Add mountpath based quota support
|
|
Pull xfs updates from Darrick Wong:
"The notable user-visible addition this cycle is ability to remove
space from the last AG in a filesystem. This is the first of many
changes needed for full-fledged support for shrinking a filesystem.
Still needed are (a) the ability to reorganize files and metadata away
from the end of the fs; (b) the ability to remove entire allocation
groups; (c) shrink support for realtime volumes; and (d) thorough
testing of (a-c).
There are a number of performance improvements in this code drop: Dave
streamlined various parts of the buffer logging code and reduced the
cost of various debugging checks, and added the ability to pre-create
the xattr structures while creating files. Brian eliminated
transaction reservations that were being held across writeback (thus
reducing livelock potential.
Other random pieces: Pavel fixed the repetitve warnings about
deprecated mount options, I fixed online fsck to behave itself when a
readonly remount comes in during scrub, and refactored various other
parts of that code, Christoph contributed a lot of refactoring this
cycle. The xfs_icdinode structure has been absorbed into the (incore)
xfs_inode structure, and the format and flags handling around
xfs_inode_fork structures has been simplified. Chandan provided a
number of fixes for extent count overflow related problems that have
been shaken out by debugging knobs added during 5.12.
Summary:
- Various minor fixes in online scrub.
- Prevent metadata files from being automatically inactivated.
- Validate btree heights by the computed per-btree limits.
- Don't warn about remounting with deprecated mount options.
- Initialize attr forks at create time if we suspect we're going to
need to store them.
- Reduce memory reallocation workouts in the logging code.
- Fix some theoretical math calculation errors in logged buffers that
span multiple discontig memory ranges but contiguous ondisk
regions.
- Speedups in dirty buffer bitmap handling.
- Make type verifier functions more inline-happy to reduce overhead.
- Reduce debug overhead in directory checking code.
- Many many typo fixes.
- Begin to handle the permanent loss of the very end of a filesystem.
- Fold struct xfs_icdinode into xfs_inode.
- Deprecate the long defunct BMV_IF_NO_DMAPI_READ from the bmapx
ioctl.
- Remove a broken directory block format check from online scrub.
- Fix a bug where we could produce an unnecessarily tall data fork
btree when creating an attr fork.
- Fix scrub and readonly remounts racing.
- Fix a writeback ioend log deadlock problem by dropping the behavior
where we could preallocate a setfilesize transaction.
- Fix some bugs in the new extent count checking code.
- Fix some bugs in the attr fork preallocation code.
- Refactor if_flags out of the incore inode fork data structure"
* tag 'xfs-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (77 commits)
xfs: remove xfs_quiesce_attr declaration
xfs: remove XFS_IFEXTENTS
xfs: remove XFS_IFINLINE
xfs: remove XFS_IFBROOT
xfs: only look at the fork format in xfs_idestroy_fork
xfs: simplify xfs_attr_remove_args
xfs: rename and simplify xfs_bmap_one_block
xfs: move the XFS_IFEXTENTS check into xfs_iread_extents
xfs: drop unnecessary setfilesize helper
xfs: drop unused ioend private merge and setfilesize code
xfs: open code ioend needs workqueue helper
xfs: drop submit side trans alloc for append ioends
xfs: fix return of uninitialized value in variable error
xfs: get rid of the ip parameter to xchk_setup_*
xfs: fix scrub and remount-ro protection when running scrub
xfs: move the check for post-EOF mappings into xfs_can_free_eofblocks
xfs: move the xfs_can_free_eofblocks call under the IOLOCK
xfs: precalculate default inode attribute offset
xfs: default attr fork size does not handle device inodes
xfs: inode fork allocation depends on XFS_IFEXTENT flag
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Fix some compiler and kernel-doc warnings
- Various minor cleanups and optimizations
- Add a new sysfs gfs2 status file with some filesystem wide
information
* tag 'gfs2-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix fall-through warnings for Clang
gfs2: Fix a number of kernel-doc warnings
gfs2: Make gfs2_setattr_simple static
gfs2: Add new sysfs file for gfs2 status
gfs2: Silence possible null pointer dereference warning
gfs2: Turn gfs2_meta_indirect_buffer into gfs2_meta_buffer
gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extent
gfs2: Turn gfs2_extent_map into gfs2_{get,alloc}_extent
gfs2: Add new gfs2_iomap_get helper
gfs2: Remove unused variable sb_format
gfs2: Fix dir.c function parameter descriptions
gfs2: Eliminate gh parameter from go_xmote_bh func
gfs2: don't create empty buffers for NO_CREATE
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Improve write performance with dirsync mount option
- Improve lookup performance
- Add support for FITRIM ioctl
- Fix a bug with discard option
* tag 'exfat-for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: speed up iterate/lookup by fixing start point of traversing cluster chain
exfat: improve write performance when dirsync enabled
exfat: add support ioctl and FITRIM function
exfat: introduce bitmap_lock for cluster bitmap access
exfat: fix erroneous discard when clear cluster bit
|
|
The final parameter of filemap_write_and_wait_range is the end of the
range to flush, not the length of the range to flush.
Fixes: 46afb0628b86 ("xfs: only flush the unshared range in xfs_reflink_unshare")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
The blocks used for allocation btrees (bnobt and countbt) are
technically considered free space. This is because as free space is
used, allocbt blocks are removed and naturally become available for
traditional allocation. However, this means that a significant
portion of free space may consist of in-use btree blocks if free
space is severely fragmented.
On large filesystems with large perag reservations, this can lead to
a rare but nasty condition where a significant amount of physical
free space is available, but the majority of actual usable blocks
consist of in-use allocbt blocks. We have a record of a (~12TB, 32
AG) filesystem with multiple AGs in a state with ~2.5GB or so free
blocks tracked across ~300 total allocbt blocks, but effectively at
100% full because the the free space is entirely consumed by
refcountbt perag reservation.
Such a large perag reservation is by design on large filesystems.
The problem is that because the free space is so fragmented, this AG
contributes the 300 or so allocbt blocks to the global counters as
free space. If this pattern repeats across enough AGs, the
filesystem lands in a state where global block reservation can
outrun physical block availability. For example, a streaming
buffered write on the affected filesystem continues to allow delayed
allocation beyond the point where writeback starts to fail due to
physical block allocation failures. The expected behavior is for the
delalloc block reservation to fail gracefully with -ENOSPC before
physical block allocation failure is a possibility.
To address this problem, set aside in-use allocbt blocks at
reservation time and thus ensure they cannot be reserved until truly
available for physical allocation. This allows alloc btree metadata
to continue to reside in free space, but dynamically adjusts
reservation availability based on internal state. Note that the
logic requires that the allocbt counter is fully populated at
reservation time before it is fully effective. We currently rely on
the mount time AGF scan in the perag reservation initialization code
for this dependency on filesystems where it's most important (i.e.
with active perag reservations).
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Introduce an in-core counter to track the sum of all allocbt blocks
used by the filesystem. This value is currently tracked per-ag via
the ->agf_btreeblks field in the AGF, which also happens to include
rmapbt blocks. A global, in-core count of allocbt blocks is required
to identify the subset of global ->m_fdblocks that consists of
unavailable blocks currently used for allocation btrees. To support
this calculation at block reservation time, construct a similar
global counter for allocbt blocks, populate it on first read of each
AGF and update it as allocbt blocks are used and released.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
perag reservation is enabled at mount time on a per AG basis. The
upcoming change to set aside allocbt blocks from block reservation
requires a populated allocbt counter as soon as possible after mount
to be fully effective against large perag reservations. Therefore as
a preparation step, initialize the pagf on all mounts where at least
one reservation is active. Note that this already occurs to some
degree on most default format filesystems as reservation requirement
calculations already depend on the AGF or AGI, depending on the
reservation type.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Since agf_btreeblks didn't exist before the lazysbcount feature, the fs
summary count scrubber needs to walk the free space btrees to determine
the amount of space being used by those btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
|
|
Keep the mount superblock counters up to date for !lazysbcount
filesystems so that when we log the superblock they do not need
updating in any way because they are already correct.
It's found by what Zorro reported:
1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
2. mount $dev $mnt
3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
4. umount $mnt
5. xfs_repair -n $dev
and I've seen no problem with this patch.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
The AGF free space btree block counter wasn't added until the
lazysbcount feature was added to XFS midway through the life of the V4
format, so ignore the field when checking. Online AGF repair requires
rmapbt, so it doesn't need the feature check.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
In commit f8f2835a9cf3 we changed the behavior of XFS to use EFIs to
remove blocks from an overfilled AGFL because there were complaints
about transaction overruns that stemmed from trying to free multiple
blocks in a single transaction.
Unfortunately, that commit missed a subtlety in the debug-mode
transaction accounting when a realtime volume is attached. If a
realtime file undergoes a data fork mapping change such that realtime
extents are allocated (or freed) in the same transaction that a data
device block is also allocated (or freed), we can trip a debugging
assertion. This can happen (for example) if a realtime extent is
allocated and it is necessary to reshape the bmbt to hold the new
mapping.
When we go to allocate a bmbt block from an AG, the first thing the data
device block allocator does is ensure that the freelist is the proper
length. If the freelist is too long, it will trim the freelist to the
proper length.
In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to
record the decrement in the AG free list count. Prior to f8f28 we would
put the free block back in the free space btrees in the same
transaction, which calls xfs_trans_agblocks_delta() to record the
increment in the AG free block count. Since AGFL blocks are included in
the global free block count (fdblocks), there is no corresponding
fdblocks update, so the AGFL free satisfies the following condition in
xfs_trans_apply_sb_deltas:
/*
* Check that superblock mods match the mods made to AGF counters.
*/
ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) ==
(tp->t_ag_freeblks_delta + tp->t_ag_flist_delta +
tp->t_ag_btree_delta));
The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is
the number blocks that were allocated.
After commit f8f28 we defer the block freeing to the next chained
transaction, which means that the calls to xfs_trans_agflist_delta and
xfs_trans_agblocks_delta occur in separate transactions. The (first)
transaction that shortens the free list trips on the comparison, which
has now become:
(X + 0) == ((X) + -1 + 0)
because we haven't freed the AGFL block yet; we've only logged an
intention to free it. When the second transaction (the deferred free)
commits, it will evaluate the expression as:
(0 + 0) == (1 + 0 + 0)
and trip over that in turn.
At this point, the astute reader may note that the two commits tagged by
this patch have been in the kernel for a long time but haven't generated
any bug reports. How is it that the author became aware of this bug?
This originally surfaced as an intermittent failure when I was testing
realtime rmap, but a different bug report by Zorro Lang reveals the same
assertion occuring on !lazysbcount filesystems.
The common factor to both reports (and why this problem wasn't
previously reported) becomes apparent if we consider when
xfs_trans_apply_sb_deltas is called by __xfs_trans_commit():
if (tp->t_flags & XFS_TRANS_SB_DIRTY)
xfs_trans_apply_sb_deltas(tp);
With a modern lazysbcount filesystem, transactions update only the
percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence
xfs_trans_apply_sb_deltas is rarely called.
However, updates to the count of free realtime extents are not part of
lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or
removing data fork mappings to realtime files; similarly,
XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems.
Dave mentioned in response to an earlier version of this patch:
"IIUC, what you are saying is that this debug code is simply not
exercised in normal testing and hasn't been for the past decade? And it
still won't be exercised on anything other than realtime device testing?
"...it was debugging code from 1994 that was largely turned into dead
code when lazysbcounters were introduced in 2007. Hence I'm not sure it
holds any value anymore."
This debugging code isn't especially helpful - you can modify the
flcount on one AG and the freeblks of another AG, and it won't trigger.
Add the fact that nobody noticed for a decade, and let's just get rid of
it (and start testing realtime :P).
This bug was found by running generic/051 on either a V4 filesystem
lacking lazysbcount; or a V5 filesystem with a realtime volume.
Cc: bfoster@redhat.com, zlang@redhat.com
Fixes: f8f2835a9cf3 ("xfs: defer agfl block frees when dfops is available")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
|
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|