aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2019-11-03NFSv4: nfs4_callback_getattr() should ignore revoked delegationsTrond Myklebust1-3/+1
If the delegation has been revoked, ignore it. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFSv4: Fix delegation handling in update_open_stateid()Trond Myklebust1-2/+2
If the delegation is marked as being revoked, then don't use it in the open state structure. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFSv4.1: Don't rebind to the same source port when reconnecting to the serverTrond Myklebust3-2/+9
NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look at the source port when deciding whether or not an RPC call is a replay of a previous call. This requires clients to perform strange TCP gymnastics in order to ensure that when they reconnect to the server, they bind to the same source port. NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics, that do not look at the source port of the connection. This patch therefore ensures they can ignore the rebind requirement. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFS/pnfs: Separate NFSv3 DS and MDS trafficTrond Myklebust2-0/+7
If a NFSv3 server is being used as both a DS and as a regular NFSv3 server, we may want to keep the IO traffic on a separate TCP connection, since it will typically have very different timeout characteristics. This patch therefore sets up a flag to separate the two modes of operation for the nfs_client. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03pNFS: nfs3_set_ds_client should set NFS_CS_NOPINGTrond Myklebust1-0/+2
Connecting to the DS is a non-interactive, asynchronous task, so there is no reason to fire up an extra RPC null ping in order to ensure that the server is up. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFS: Add a flag to tell nfs_client to set RPC_CLNT_CREATE_NOPINGTrond Myklebust1-0/+2
Add a flag to tell the nfs_client it should set RPC_CLNT_CREATE_NOPING when creating the rpc client. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFS: Use non-atomic bit ops when initialising struct nfs_client_initdataTrond Myklebust2-4/+4
We don't need atomic bit ops when initialising a local structure on the stack. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFSv3: Clean up timespec encodeTrond Myklebust1-8/+4
Simplify the struct iattr timestamp encoding by skipping the step of an intermediate struct timespec. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFSv2: Clean up timespec encodeTrond Myklebust1-12/+7
Simplify the struct iattr timestamp encoding by skipping the step of an intermediate struct timespec. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFSv2: Fix a typo in encode_sattr()Trond Myklebust1-1/+1
Encode the mtime correctly. Fixes: 95582b0083883 ("vfs: change inode times to use struct timespec64") Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFSv4: NFSv4 callbacks also support 64-bit timestampsTrond Myklebust3-7/+7
Convert the NFSv4 callbacks to use struct timestamp64, rather than truncating times to 32-bit values. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFSv4: Encode 64-bit timestampsTrond Myklebust1-6/+3
NFSv4 supports 64-bit timestamps, so there is no point in converting the struct iattr timestamps to 32-bits before encoding. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFS: Convert struct nfs_fattr to use struct timespec64Trond Myklebust5-37/+37
NFSv4 supports 64-bit times, so we should switch to using struct timespec64 when decoding attributes. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-03NFS: If nfs_mountpoint_expiry_timeout < 0, do not expire submountsTrond Myklebust1-0/+3
If we set nfs_mountpoint_expiry_timeout to a negative value, then allow that to imply that we do not expire NFSv4 submounts. Signed-off-by: Trond Myklebust <[email protected]>
2019-11-02Merge tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds1-1/+2
Pull cifs fix from Steve French: "A small smb3 memleak fix" * tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6: fix memory leak in large read decrypt offload
2019-11-01Merge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds3-6/+14
Pull NFS client bugfixes from Anna Schumaker: "This contains two delegation fixes (with the RCU lock leak fix marked for stable), and three patches to fix destroying the the sunrpc back channel. Stable bugfixes: - Fix an RCU lock leak in nfs4_refresh_delegation_stateid() Other fixes: - The TCP back channel mustn't disappear while requests are outstanding - The RDMA back channel mustn't disappear while requests are outstanding - Destroy the back channel when we destroy the host transport - Don't allow a cached open with a revoked delegation" * tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid() NFSv4: Don't allow a cached open with a revoked delegation SUNRPC: Destroy the back channel when we destroy the host transport SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
2019-11-01Merge tag 'for-linus-20191101' of git://git.kernel.dk/linux-blockLinus Torvalds1-4/+10
Pull block fixes from Jens Axboe: - Two small nvme fixes, one is a fabrics connection fix, the other one a cleanup made possible by that fix (Anton, via Keith) - Fix requeue handling in umb ubd (Anton) - Fix spin_lock_irq() nesting in blk-iocost (Dan) - Three small io_uring fixes: - Install io_uring fd after done with ctx (me) - Clear ->result before every poll issue (me) - Fix leak of shadow request on error (Pavel) * tag 'for-linus-20191101' of git://git.kernel.dk/linux-block: iocost: don't nest spin_lock_irq in ioc_weight_write() io_uring: ensure we clear io_kiocb->result before each issue um-ubd: Entrust re-queue to the upper layers nvme-multipath: remove unused groups_only mode in ana log nvme-multipath: fix possible io hang after ctrl reconnect io_uring: don't touch ctx in setup after ring fd install io_uring: Fix leaked shadow_req
2019-11-01NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()Trond Myklebust1-1/+1
A typo in nfs4_refresh_delegation_stateid() means we're leaking an RCU lock, and always returning a value of 'false'. As the function description states, we were always supposed to return 'true' if a matching delegation was found. Fixes: 12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.") Cc: [email protected] # v4.15+ Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2019-11-01NFSv4: Don't allow a cached open with a revoked delegationTrond Myklebust3-5/+13
If the delegation is marked as being revoked, we must not use it for cached opens. Fixes: 869f9dfa4d6d ("NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2019-10-30io_uring: ensure we clear io_kiocb->result before each issueJens Axboe1-0/+1
We use io_kiocb->result == -EAGAIN as a way to know if we need to re-submit a polled request, as -EAGAIN reporting happens out-of-line for IO submission failures. This field is cleared when we originally allocate the request, but it isn't reset when we retry the submission from async context. This can cause issues where we think something needs a re-issue, but we're really just reading stale data. Reset ->result whenever we re-prep a request for polled submission. Cc: [email protected] Fixes: 9e645e1105ca ("io_uring: add support for sqe links") Reported-by: Bijan Mottahedeh <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-10-30Merge tag 'gfs2-v5.4-rc5.fixes' of ↵Linus Torvalds1-7/+13
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: "Fix remounting (broken in -rc1)." * tag 'gfs2-v5.4-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix initialisation of args for remount
2019-10-30gfs2: Fix initialisation of args for remountAndrew Price1-7/+13
When gfs2 was converted to use fs_context, the initialisation of the mount args structure to the currently active args was lost with the removal of gfs2_remount_fs(), so the checks of the new args on remount became checks against the default values instead of the current ones. This caused unexpected remount behaviour and test failures (xfstests generic/294, generic/306 and generic/452). Reinstate the args initialisation, this time in gfs2_init_fs_context() and conditional upon fc->purpose, as that's the only time we get control before the mount args are parsed in the remount process. Fixes: 1f52aa08d12f ("gfs2: Convert gfs2 to fs_context") Signed-off-by: Andrew Price <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2019-10-29Merge tag 'fuse-fixes-5.4-rc6' of ↵Linus Torvalds7-65/+149
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Mostly virtiofs fixes, but also fixes a regression and couple of longstanding data/metadata writeback ordering issues" * tag 'fuse-fixes-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: redundant get_fuse_inode() calls in fuse_writepages_fill() fuse: Add changelog entries for protocols 7.1 - 7.8 fuse: truncate pending writes on O_TRUNC fuse: flush dirty data/metadata before non-truncate setattr virtiofs: Remove set but not used variable 'fc' virtiofs: Retry request submission from worker context virtiofs: Count pending forgets as in_flight forgets virtiofs: Set FR_SENT flag only after request has been sent virtiofs: No need to check fpq->connected state virtiofs: Do not end request in submission context fuse: don't advise readdirplus for negative lookup fuse: don't dereference req->args on finished request virtio-fs: don't show mount options virtio-fs: Change module name to virtiofs.ko
2019-10-28io_uring: don't touch ctx in setup after ring fd installJens Axboe1-4/+8
syzkaller reported an issue where it looks like a malicious app can trigger a use-after-free of reading the ctx ->sq_array and ->rings value right after having installed the ring fd in the process file table. Defer ring fd installation until after we're done reading those values. Fixes: 75b28affdd6a ("io_uring: allocate the two rings together") Reported-by: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2019-10-27io_uring: Fix leaked shadow_reqPavel Begunkov1-0/+1
io_queue_link_head() owns shadow_req after taking it as an argument. By not freeing it in case of an error, it can leak the request along with taken ctx->refs. Reviewed-by: Jackie Liu <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-10-27fix memory leak in large read decrypt offloadSteve French1-1/+2
Spotted by Ronnie. Reviewed-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]>
2019-10-27Merge tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds9-37/+75
Pull cifs fixes from Steve French: "Seven cifs/smb3 fixes, including three for stable" * tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs CIFS: Fix use after free of file info structures CIFS: Fix retry mid list corruption on reconnects cifs: Fix missed free operations CIFS: avoid using MID 0xFFFF cifs: clarify comment about timestamp granularity for old servers cifs: Handle -EINPROGRESS only when noblockcnt is set
2019-10-26Merge tag 'for-linus-2019-10-26' of git://git.kernel.dk/linux-blockLinus Torvalds1-94/+113
Pull block and io_uring fixes from Jens Axboe: "A bit bigger than usual at this point in time, mostly due to some good bug hunting work by Pavel that resulted in three io_uring fixes from him and two from me. Anyway, this pull request contains: - Revert of the submit-and-wait optimization for io_uring, it can't always be done safely. It depends on commands always making progress on their own, which isn't necessarily the case outside of strict file IO. (me) - Series of two patches from me and three from Pavel, fixing issues with shared data and sequencing for io_uring. - Lastly, two timeout sequence fixes for io_uring (zhangyi) - Two nbd patches fixing races (Josef) - libahci regulator_get_optional() fix (Mark)" * tag 'for-linus-2019-10-26' of git://git.kernel.dk/linux-block: nbd: verify socket is supported during setup ata: libahci_platform: Fix regulator_get_optional() misuse nbd: handle racing with error'ed out commands nbd: protect cmd->status with cmd->lock io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREAD io_uring: used cached copies of sq->dropped and cq->overflow io_uring: Fix race for sqes with userspace io_uring: Fix broken links with offloading io_uring: Fix corrupted user_data io_uring: correct timeout req sequence when inserting a new entry io_uring : correct timeout req sequence when waiting timeout io_uring: revert "io_uring: optimize submit_and_wait API"
2019-10-26Merge tag 'dax-fix-5.4-rc5' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fix from Dan Williams: "Fix a performance regression that followed from a fix to the conversion of the fsdax implementation to the xarray. v5.3 users report that they stop seeing huge page mappings on an application + filesystem layout that was seeing huge pages previously on v5.2" * tag 'dax-fix-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: fs/dax: Fix pmd vs pte conflict detection
2019-10-25io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREADJens Axboe1-12/+32
We currently assume that submissions from the sqthread are successful, and if IO polling is enabled, we use that value for knowing how many completions to look for. But if we overflowed the CQ ring or some requests simply got errored and already completed, they won't be available for polling. For the case of IO polling and SQTHREAD usage, look at the pending poll list. If it ever hits empty then we know that we don't have anymore pollable requests inflight. For that case, simply reset the inflight count to zero. Reported-by: Pavel Begunkov <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-10-25io_uring: used cached copies of sq->dropped and cq->overflowJens Axboe1-5/+8
We currently use the ring values directly, but that can lead to issues if the application is malicious and changes these values on our behalf. Created in-kernel cached versions of them, and just overwrite the user side when we update them. This is similar to how we treat the sq/cq ring tail/head updates. Reported-by: Pavel Begunkov <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-10-25io_uring: Fix race for sqes with userspacePavel Begunkov1-1/+2
io_ring_submit() finalises with 1. io_commit_sqring(), which releases sqes to the userspace 2. Then calls to io_queue_link_head(), accessing released head's sqe Reorder them. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-10-25io_uring: Fix broken links with offloadingPavel Begunkov1-29/+33
io_sq_thread() processes sqes by 8 without considering links. As a result, links will be randomely subdivided. The easiest way to fix it is to call io_get_sqring() inside io_submit_sqes() as do io_ring_submit(). Downsides: 1. This removes optimisation of not grabbing mm_struct for fixed files 2. It submitting all sqes in one go, without finer-grained sheduling with cq processing. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-10-25io_uring: Fix corrupted user_dataPavel Begunkov1-0/+2
There is a bug, where failed linked requests are returned not with specified @user_data, but with garbage from a kernel stack. The reason is that io_fail_links() uses req->user_data, which is uninitialised when called from io_queue_sqe() on fail path. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-10-24cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occursDave Wysochanski4-9/+22
There's a deadlock that is possible and can easily be seen with a test where multiple readers open/read/close of the same file and a disruption occurs causing reconnect. The deadlock is due a reader thread inside cifs_strict_readv calling down_read and obtaining lock_sem, and then after reconnect inside cifs_reopen_file calling down_read a second time. If in between the two down_read calls, a down_write comes from another process, deadlock occurs. CPU0 CPU1 ---- ---- cifs_strict_readv() down_read(&cifsi->lock_sem); _cifsFileInfo_put OR cifs_new_fileinfo down_write(&cifsi->lock_sem); cifs_reopen_file() down_read(&cifsi->lock_sem); Fix the above by changing all down_write(lock_sem) calls to down_write_trylock(lock_sem)/msleep() loop, which in turn makes the second down_read call benign since it will never block behind the writer while holding lock_sem. Signed-off-by: Dave Wysochanski <[email protected]> Suggested-by: Ronnie Sahlberg <[email protected]> Reviewed--by: Ronnie Sahlberg <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]>
2019-10-24CIFS: Fix use after free of file info structuresPavel Shilovsky1-3/+3
Currently the code assumes that if a file info entry belongs to lists of open file handles of an inode and a tcon then it has non-zero reference. The recent changes broke that assumption when putting the last reference of the file info. There may be a situation when a file is being deleted but nothing prevents another thread to reference it again and start using it. This happens because we do not hold the inode list lock while checking the number of references of the file info structure. Fix this by doing the proper locking when doing the check. Fixes: 487317c99477d ("cifs: add spinlock for the openFileList to cifsInodeInfo") Fixes: cb248819d209d ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic") Cc: Stable <[email protected]> Reviewed-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Pavel Shilovsky <[email protected]> Signed-off-by: Steve French <[email protected]>
2019-10-24CIFS: Fix retry mid list corruption on reconnectsPavel Shilovsky2-20/+32
When the client hits reconnect it iterates over the mid pending queue marking entries for retry and moving them to a temporary list to issue callbacks later without holding GlobalMid_Lock. In the same time there is no guarantee that mids can't be removed from the temporary list or even freed completely by another thread. It may cause a temporary list corruption: [ 430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469 [ 430.464668] ------------[ cut here ]------------ [ 430.466569] kernel BUG at lib/list_debug.c:51! [ 430.468476] invalid opcode: 0000 [#1] SMP PTI [ 430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19 [ 430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55 ... [ 430.510426] Call Trace: [ 430.511500] cifs_reconnect+0x25e/0x610 [cifs] [ 430.513350] cifs_readv_from_socket+0x220/0x250 [cifs] [ 430.515464] cifs_read_from_socket+0x4a/0x70 [cifs] [ 430.517452] ? try_to_wake_up+0x212/0x650 [ 430.519122] ? cifs_small_buf_get+0x16/0x30 [cifs] [ 430.521086] ? allocate_buffers+0x66/0x120 [cifs] [ 430.523019] cifs_demultiplex_thread+0xdc/0xc30 [cifs] [ 430.525116] kthread+0xfb/0x130 [ 430.526421] ? cifs_handle_standard+0x190/0x190 [cifs] [ 430.528514] ? kthread_park+0x90/0x90 [ 430.530019] ret_from_fork+0x35/0x40 Fix this by obtaining extra references for mids being retried and marking them as MID_DELETED which indicates that such a mid has been dequeued from the pending list. Also move mid cleanup logic from DeleteMidQEntry to _cifs_mid_q_entry_release which is called when the last reference to a particular mid is put. This allows to avoid any use-after-free of response buffers. The patch needs to be backported to stable kernels. A stable tag is not mentioned below because the patch doesn't apply cleanly to any actively maintained stable kernel. Reviewed-by: Ronnie Sahlberg <[email protected]> Reviewed-and-tested-by: David Wysochanski <[email protected]> Signed-off-by: Pavel Shilovsky <[email protected]> Signed-off-by: Steve French <[email protected]>
2019-10-24Merge tag 'gfs2-v5.4-rc4.fixes' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: "Fix a memory leak introduced in -rc1" * tag 'gfs2-v5.4-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix memory leak when gfs2meta's fs_context is freed
2019-10-24gfs2: Fix memory leak when gfs2meta's fs_context is freedAndrew Price1-0/+1
gfs2 and gfs2meta share an ->init_fs_context function which allocates an args structure stored in fc->fs_private. gfs2 registers a ->free function to free this memory when the fs_context is cleaned up, but there was not one registered for gfs2meta, causing a leak. Register a ->free function for gfs2meta. The existing gfs2_fc_free function does what we need. Reported-by: [email protected] Fixes: 1f52aa08d12f ("gfs2: Convert gfs2 to fs_context") Signed-off-by: Andrew Price <[email protected]> Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2019-10-23io_uring: correct timeout req sequence when inserting a new entryzhangyi (F)1-1/+10
The sequence number of the timeout req (req->sequence) indicate the expected completion request. Because of each timeout req consume a sequence number, so the sequence of each timeout req on the timeout list shouldn't be the same. But now, we may get the same number (also incorrect) if we insert a new entry before the last one, such as submit such two timeout reqs on a new ring instance below. req->sequence req_1 (count = 2): 2 req_2 (count = 1): 2 Then, if we submit a nop req, req_2 will still timeout even the nop req finished. This patch fix this problem by adjust the sequence number of each reordered reqs when inserting a new entry. Signed-off-by: zhangyi (F) <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-10-23io_uring : correct timeout req sequence when waiting timeoutzhangyi (F)1-1/+10
The sequence number of reqs on the timeout_list before the timeout req should be adjusted in io_timeout_fn(), because the current timeout req will consumes a slot in the cq_ring and cq_tail pointer will be increased, otherwise other timeout reqs may return in advance without waiting for enough wait_nr. Signed-off-by: zhangyi (F) <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-10-23io_uring: revert "io_uring: optimize submit_and_wait API"Jens Axboe1-46/+17
There are cases where it isn't always safe to block for submission, even if the caller asked to wait for events as well. Revert the previous optimization of doing that. This reverts two commits: bf7ec93c644cb c576666863b78 Fixes: c576666863b78 ("io_uring: optimize submit_and_wait API") Signed-off-by: Jens Axboe <[email protected]>
2019-10-23fuse: redundant get_fuse_inode() calls in fuse_writepages_fill()Vasily Averin1-3/+1
Currently fuse_writepages_fill() calls get_fuse_inode() few times with the same argument. Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2019-10-23fuse: truncate pending writes on O_TRUNCMiklos Szeredi1-3/+7
Make sure cached writes are not reordered around open(..., O_TRUNC), with the obvious wrong results. Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on") Cc: <[email protected]> # v3.15+ Signed-off-by: Miklos Szeredi <[email protected]>
2019-10-23fuse: flush dirty data/metadata before non-truncate setattrMiklos Szeredi1-0/+13
If writeback cache is enabled, then writes might get reordered with chmod/chown/utimes. The problem with this is that performing the write in the fuse daemon might itself change some of these attributes. In such case the following sequence of operations will result in file ending up with the wrong mode, for example: int fd = open ("suid", O_WRONLY|O_CREAT|O_EXCL); write (fd, "1", 1); fchown (fd, 0, 0); fchmod (fd, 04755); close (fd); This patch fixes this by flushing pending writes before performing chown/chmod/utimes. Reported-by: Giuseppe Scrivano <[email protected]> Tested-by: Giuseppe Scrivano <[email protected]> Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on") Cc: <[email protected]> # v3.15+ Signed-off-by: Miklos Szeredi <[email protected]>
2019-10-23Merge tag 'for-5.4-rc4-tag' of ↵Linus Torvalds10-56/+41
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fixes of error handling cleanup of metadata accounting with qgroups enabled - fix swapped values for qgroup tracepoints - fix race when handling full sync flag - don't start unused worker thread, functionality removed already * tag 'for-5.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: check for the full sync flag while holding the inode lock during fsync Btrfs: fix qgroup double free after failure to reserve metadata for delalloc btrfs: tracepoints: Fix bad entry members of qgroup events btrfs: tracepoints: Fix wrong parameter order for qgroup events btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents() btrfs: don't needlessly create extent-refs kernel thread btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() Btrfs: add missing extents release on file extent cluster relocation error
2019-10-23virtiofs: Remove set but not used variable 'fc'zhengbin1-2/+0
Fixes gcc '-Wunused-but-set-variable' warning: fs/fuse/virtio_fs.c: In function virtio_fs_wake_pending_and_unlock: fs/fuse/virtio_fs.c:983:20: warning: variable fc set but not used [-Wunused-but-set-variable] It is not used since commit 7ee1e2e631db ("virtiofs: No need to check fpq->connected state") Reported-by: Hulk Robot <[email protected]> Signed-off-by: zhengbin <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2019-10-22fs/dax: Fix pmd vs pte conflict detectionDan Williams1-2/+3
Users reported a v5.3 performance regression and inability to establish huge page mappings. A revised version of the ndctl "dax.sh" huge page unit test identifies commit 23c84eb78375 "dax: Fix missed wakeup with PMD faults" as the source. Update get_unlocked_entry() to check for NULL entries before checking the entry order, otherwise NULL is misinterpreted as a present pte conflict. The 'order' check needs to happen before the locked check as an unlocked entry at the wrong order must fallback to lookup the correct order. Reported-by: Jeff Smits <[email protected]> Reported-by: Doug Nelson <[email protected]> Cc: <[email protected]> Fixes: 23c84eb78375 ("dax: Fix missed wakeup with PMD faults") Reviewed-by: Jan Kara <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Link: https://lore.kernel.org/r/157167532455.3945484.11971474077040503994.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]>
2019-10-21virtiofs: Retry request submission from worker contextVivek Goyal1-9/+52
If regular request queue gets full, currently we sleep for a bit and retrying submission in submitter's context. This assumes submitter is not holding any spin lock. But this assumption is not true for background requests. For background requests, we are called with fc->bg_lock held. This can lead to deadlock where one thread is trying submission with fc->bg_lock held while request completion thread has called fuse_request_end() which tries to acquire fc->bg_lock and gets blocked. As request completion thread gets blocked, it does not make further progress and that means queue does not get empty and submitter can't submit more requests. To solve this issue, retry submission with the help of a worker, instead of retrying in submitter's context. We already do this for hiprio/forget requests. Reported-by: Chirantan Ekbote <[email protected]> Signed-off-by: Vivek Goyal <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2019-10-21virtiofs: Count pending forgets as in_flight forgetsVivek Goyal1-24/+20
If virtqueue is full, we put forget requests on a list and these forgets are dispatched later using a worker. As of now we don't count these forgets in fsvq->in_flight variable. This means when queue is being drained, we have to have special logic to first drain these pending requests and then wait for fsvq->in_flight to go to zero. By counting pending forgets in fsvq->in_flight, we can get rid of special logic and just wait for in_flight to go to zero. Worker thread will kick and drain all the forgets anyway, leading in_flight to zero. I also need similar logic for normal request queue in next patch where I am about to defer request submission in the worker context if queue is full. This simplifies the code a bit. Also add two helper functions to inc/dec in_flight. Decrement in_flight helper will later used to call completion when in_flight reaches zero. Signed-off-by: Vivek Goyal <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>