aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-10-21ext4: add fast_commit feature and handling for extended mount optionsHarshad Shirwadkar2-5/+26
We are running out of mount option bits. Add handling for using s_mount_opt2. Add ext4 and jbd2 fast commit feature flag and also add ability to turn off the fast commit feature in Ext4. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201015203802.3597742-3-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-22exfat: remove useless check in exfat_move_file()Tetsuhiro Kohada1-5/+0
In exfat_move_file(), the identity of source and target directory has been checked by the caller. Also, it gets stream.start_clu from file dir-entry, which is an invalid determination. Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22exfat: remove 'rwoffset' in exfat_inode_infoTetsuhiro Kohada5-20/+9
Remove 'rwoffset' in exfat_inode_info and replace it with the parameter of exfat_readdir(). Since rwoffset is referenced only by exfat_readdir(), it is not necessary a exfat_inode_info's member. Also, change cpos to point to the next of entry-set, and return the index of dir-entry via dir_entry->entry. Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22exfat: replace memcpy with structure assignmentTetsuhiro Kohada3-14/+10
Use structure assignment instead of memcpy. Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22exfat: remove useless directory scan in exfat_add_entry()Tetsuhiro Kohada1-10/+1
There is nothing in directory just created, so there is no need to scan. Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22exfat: eliminate dead code in exfat_find()Tetsuhiro Kohada2-74/+47
The exfat_find_dir_entry() called by exfat_find() doesn't return -EEXIST. Therefore, the root-dir information setting is never executed. Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22exfat: use i_blocksize() to get blocksizeXianting Tian1-1/+1
We alreday has the interface i_blocksize() to get blocksize, so use it. Signed-off-by: Xianting Tian <tian.xianting@h3c.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-22exfat: fix misspellings using codespell toolNamjae Jeon3-3/+3
Sedat reported typos using codespell tool. $ codespell fs/exfat/*.c | grep -v iput fs/exfat/namei.c:293: upto ==> up to fs/exfat/nls.c:14: tabel ==> table $ codespell fs/exfat/*.h fs/exfat/exfat_fs.h:133: usally ==> usually Fix typos found by codespell. Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-10-21xfs: cancel intents immediately if process_intents failsDarrick J. Wong1-0/+8
If processing recovered log intent items fails, we need to cancel all the unprocessed recovered items immediately so that a subsequent AIL push in the bail out path won't get wedged on the pinned intent items that didn't get processed. This can happen if the log contains (1) an intent that gets and releases an inode, (2) an intent that cannot be recovered successfully, and (3) some third intent item. When recovery of (2) fails, we leave (3) pinned in memory. Inode reclamation is called in the error-out path of xfs_mountfs before xfs_log_cancel_mount. Reclamation calls xfs_ail_push_all_sync, which gets stuck waiting for (3). Therefore, call xlog_recover_cancel_intents if _process_intents fails. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-10-21smb3: do not try to cache root directory if dir leases not supportedSteve French1-1/+4
To servers which do not support directory leases (e.g. Samba) it is wasteful to try to open_shroot (ie attempt to cache the root directory handle). Skip attempt to open_shroot when server does not indicate support for directory leases. Cuts the number of requests on mount from 17 to 15, and cuts the number of requests on stat of the root directory from 4 to 3. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> # v5.1+
2020-10-21smb3: fix stat when special device file and mounted with modefromsidSteve French1-1/+6
When mounting with modefromsid mount option, it was possible to get the error on stat of a fifo or char or block device: "cannot stat <filename>: Operation not supported" Special devices can be stored as reparse points by some servers (e.g. Windows NFS server and when using the SMB3.1.1 POSIX Extensions) but when the modefromsid mount option is used the client attempts to get the ACL for the file which requires opening with OPEN_REPARSE_POINT create option. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
2020-10-21cifs: Print the address and port we are connecting to in generic_ip_connect()Samuel Cabrero1-2/+10
Can be helpful in debugging mount and reconnect issues Signed-off-by: Samuel Cabrero <scabrero@suse.de> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-21SMB3: Resolve data corruption of TCP server info fieldsRohith Surabattula1-5/+7
TCP server info field server->total_read is modified in parallel by demultiplex thread and decrypt offload worker thread. server->total_read is used in calculation to discard the remaining data of PDU which is not read into memory. Because of parallel modification, server->total_read can get corrupted and can result in discarding the valid data of next PDU. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> #5.4+ Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-21io_uring: don't reuse linked_timeoutPavel Begunkov1-1/+3
Clear linked_timeout for next requests in __io_queue_sqe() so we won't queue it up unnecessary when it's going to be punted. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Cc: stable@vger.kernel.org # v5.9 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-21Merge tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds16-396/+451
Pull ceph updates from Ilya Dryomov: - a patch that removes crush_workspace_mutex (myself). CRUSH computations are no longer serialized and can run in parallel. - a couple new filesystem client metrics for "ceph fs top" command (Xiubo Li) - a fix for a very old messenger bug that affected the filesystem, marked for stable (myself) - assorted fixups and cleanups throughout the codebase from Jeff and others. * tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client: (27 commits) libceph: clear con->out_msg on Policy::stateful_server faults libceph: format ceph_entity_addr nonces as unsigned libceph: fix ENTITY_NAME format suggestion libceph: move a dout in queue_con_delay() ceph: comment cleanups and clarifications ceph: break up send_cap_msg ceph: drop separate mdsc argument from __send_cap ceph: promote to unsigned long long before shifting ceph: don't SetPageError on readpage errors ceph: mark ceph_fmt_xattr() as printf-like for better type checking ceph: fold ceph_update_writeable_page into ceph_write_begin ceph: fold ceph_sync_writepages into writepage_nounlock ceph: fold ceph_sync_readpages into ceph_readpage ceph: don't call ceph_update_writeable_page from page_mkwrite ceph: break out writeback of incompatible snap context to separate function ceph: add a note explaining session reject error string libceph: switch to the new "osd blocklist add" command libceph, rbd, ceph: "blacklist" -> "blocklist" ceph: have ceph_writepages_start call pagevec_lookup_range_tag ceph: use kill_anon_super helper ...
2020-10-21xfs: fix fallocate functions when rtextsize is larger than 1Darrick J. Wong3-18/+46
In commit fe341eb151ec, I forgot that xfs_free_file_space isn't strictly a "remove mapped blocks" function. It is actually a function to zero file space by punching out the middle and writing zeroes to the unaligned ends of the specified range. Therefore, putting a rtextsize alignment check in that function is wrong because that breaks unaligned ZERO_RANGE on the realtime volume. Furthermore, xfs_file_fallocate already has alignment checks for the functions require the file range to be aligned to the size of a fundamental allocation unit (which is 1 FSB on the data volume and 1 rt extent on the realtime volume). Create a new helper to check fallocate arguments against the realtiem allocation unit size, fix the fallocate frontend to use it, fix free_file_space to delete the correct range, and remove a now redundant check from insert_file_space. NOTE: The realtime extent size is not required to be a power of two! Fixes: fe341eb151ec ("xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2020-10-21NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copyDai Ngo7-8/+152
NFS_FS=y as dependency of CONFIG_NFSD_V4_2_INTER_SSC still have build errors and some configs with NFSD=m to get NFS4ERR_STALE error when doing inter server copy. Added ops table in nfs_common for knfsd to access NFS client modules. Fixes: 3ac3711adb88 ("NFSD: Fix NFS server build errors") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-10-20io_uring: unify fsize with def->work_flagsJens Axboe3-14/+18
This one was missed in the earlier conversion, should be included like any of the other IO identity flags. Make sure we restore to RLIM_INIFITY when dropping the personality again. Fixes: 98447d65b4a7 ("io_uring: move io identity items into separate struct") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-20gfs2: Add fields for statfs info in struct gfs2_log_header_hostAbhi Das4-1/+11
And read these in __get_log_header() from the log header. Also make gfs2_statfs_change_out() non-static so it can be used outside of super.c Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20gfs2: Ignore subsequent errors after withdraw in rgrp_go_syncAndreas Gruenbacher1-1/+1
Once a withdraw has occurred, ignore errors that are the consequence of the withdraw. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20gfs2: Eliminate gl_vmBob Peterson3-30/+16
The gfs2_glock structure has a gl_vm member, introduced in commit 7005c3e4ae428 ("GFS2: Use range based functions for rgrp sync/invalidation"), which stores the location of resource groups within their address space. This structure is in a union with iopen glock specific fields. It was introduced because at unmount time, the resource group objects were destroyed before flushing out any pending resource group glock work, and flushing out such work could require flushing / truncating the address space. Since commit b3422cacdd7e6 ("gfs2: Rework how rgrp buffer_heads are managed"), any pending resource group glock work is flushed out before destroying the resource group objects. So the resource group objects will now always exist in rgrp_go_sync and rgrp_go_inval, and we now simply compute the gl_vm values where needed instead of caching them. This also eliminates the union. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20gfs2: Only access gl_delete for iopen glocksBob Peterson1-4/+7
Only initialize gl_delete for iopen glocks, but more importantly, only access it for iopen glocks in flush_delete_work: flush_delete_work is called for different types of glocks including rgrp glocks, and those use gl_vm which is in a union with gl_delete. Without this fix, we'll end up clobbering gl_vm, which results in general memory corruption. Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20gfs2: Fix comments to glock_hash_walkBob Peterson1-2/+1
The comments before function glock_hash_walk had the wrong name and an extra parameter. This simply fixes the comments. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20Merge tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds16-115/+397
Pull NFS client updates from Anna Schumaker: "Stable Fixes: - Wait for stateid updates after CLOSE/OPEN_DOWNGRADE # v5.4+ - Fix nfs_path in case of a rename retry - Support EXCHID4_FLAG_SUPP_FENCE_OPS v4.2 EXCHANGE_ID flag New features and improvements: - Replace dprintk() calls with tracepoints - Make cache consistency bitmap dynamic - Added support for the NFS v4.2 READ_PLUS operation - Improvements to net namespace uniquifier Other bugfixes and cleanups: - Remove redundant clnt pointer - Don't update timeout values on connection resets - Remove redundant tracepoints - Various cleanups to comments - Fix oops when trying to use copy_file_range with v4.0 source server - Improvements to flexfiles mirrors - Add missing 'local_lock=posix' mount option" * tag 'nfs-for-5.10-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (55 commits) NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag NFSv4: Fix up RCU annotations for struct nfs_netns_client NFS: Only reference user namespace from nfs4idmap struct instead of cred nfs: add missing "posix" local_lock constant table definition NFSv4: Use the net namespace uniquifier if it is set NFSv4: Clean up initialisation of uniquified client id strings NFS: Decode a full READ_PLUS reply SUNRPC: Add an xdr_align_data() function NFS: Add READ_PLUS hole segment decoding SUNRPC: Add the ability to expand holes in data pages SUNRPC: Split out _shift_data_right_tail() SUNRPC: Split out xdr_realign_pages() from xdr_align_pages() NFS: Add READ_PLUS data segment support NFS: Use xdr_page_pos() in NFSv4 decode_getacl() SUNRPC: Implement a xdr_page_pos() function SUNRPC: Split out a function for setting current page NFS: fix nfs_path in case of a rename retry fs: nfs: return per memcg count for xattr shrinkers NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE nfs: remove incorrect fallthrough label ...
2020-10-20Merge tag 'io_uring-5.10-2020-10-20' of git://git.kernel.dk/linux-blockLinus Torvalds4-289/+432
Pull io_uring updates from Jens Axboe: "A mix of fixes and a few stragglers. In detail: - Revert the bogus __read_mostly that we discussed for the initial pull request. - Fix a merge window regression with fixed file registration error path handling. - Fix io-wq numa node affinities. - Series abstracting out an io_identity struct, making it both easier to see what the personality items are, and also easier to to adopt more. Use this to cover audit logging. - Fix for read-ahead disabled block condition in async buffered reads, and using single page read-ahead to unify what generic_file_buffer_read() path is used. - Series for REQ_F_COMP_LOCKED fix and removal of it (Pavel) - Poll fix (Pavel)" * tag 'io_uring-5.10-2020-10-20' of git://git.kernel.dk/linux-block: (21 commits) io_uring: use blk_queue_nowait() to check if NOWAIT supported mm: use limited read-ahead to satisfy read mm: mark async iocb read as NOWAIT once some data has been copied io_uring: fix double poll mask init io-wq: inherit audit loginuid and sessionid io_uring: use percpu counters to track inflight requests io_uring: assign new io_identity for task if members have changed io_uring: store io_identity in io_uring_task io_uring: COW io_identity on mismatch io_uring: move io identity items into separate struct io_uring: rely solely on work flags to determine personality. io_uring: pass required context in as flags io-wq: assign NUMA node locality if appropriate io_uring: fix error path cleanup in io_sqe_files_register() Revert "io_uring: mark io_uring_fops/io_op_defs as __read_mostly" io_uring: fix REQ_F_COMP_LOCKED by killing it io_uring: dig out COMP_LOCK from deep call chain io_uring: don't put a poll req under spinlock io_uring: don't unnecessarily clear F_LINK_TIMEOUT io_uring: don't set COMP_LOCKED if won't put ...
2020-10-20cifs: make const array static, makes object smallerColin Ian King1-3/+5
Don't populate const array smb3_create_tag_posix on the stack but instead make it static. Makes the object code smaller by 50 bytes. Before: text data bss dec hex filename 150184 47167 0 197351 302e7 fs/cifs/smb2pdu.o After: text data bss dec hex filename 150070 47231 0 197301 302b5 fs/cifs/smb2pdu.o (gcc version 10.2.0) Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-20SMB3.1.1: Fix ids returned in POSIX query dirSteve French3-5/+7
We were setting the uid/gid to the default in each dir entry in the parsing of the POSIX query dir response, rather than attempting to map the user and group SIDs returned by the server to well known SIDs (or upcall if not found). CC: Stable <stable@vger.kernel.org> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-20smb3: add dynamic trace point to trace when credits obtainedSteve French3-9/+18
SMB3 crediting is used for flow control, and it can be useful to trace for problem determination how many credits were acquired and for which operation. Here is an example ("trace-cmd record -e *add_credits"): cifsd-9522    [010] ....  5995.202712: smb3_add_credits: server=localhost current_mid=0x12 credits=373 credits_to_add=10 cifsd-9522    [010] ....  5995.204040: smb3_add_credits: server=localhost current_mid=0x15 credits=400 credits_to_add=30 Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-20smb3.1.1: do not fail if no encryption required but server doesn't support itSteve French1-3/+13
There are cases where the server can return a cipher type of 0 and it not be an error. For example server supported no encryption types (e.g. server completely disabled encryption), or the server and client didn't support any encryption types in common (e.g. if a server only supported AES256_CCM). In those cases encryption would not be supported, but that can be ok if the client did not require encryption on mount and it should not return an error. In the case in which mount requested encryption ("seal" on mount) then checks later on during tree connection will return the proper rc, but if seal was not requested by client, since server is allowed to return 0 to indicate no supported cipher, we should not fail mount. Reported-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-19Merge tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds36-508/+992
Pull more xfs updates from Darrick Wong: "The second large pile of new stuff for 5.10, with changes even more monumental than last week! We are formally announcing the deprecation of the V4 filesystem format in 2030. All users must upgrade to the V5 format, which contains design improvements that greatly strengthen metadata validation, supports reflink and online fsck, and is the intended vehicle for handling timestamps past 2038. We're also deprecating the old Irix behavioral tweaks in September 2025. Coming along for the ride are two design changes to the deferred metadata ops subsystem. One of the improvements is to retain correct logical ordering of tasks and subtasks, which is a more logical design for upper layers of XFS and will become necessary when we add atomic file range swaps and commits. The second improvement to deferred ops improves the scalability of the log by helping the log tail to move forward during long-running operations. This reduces log contention when there are a large number of threads trying to run transactions. In addition to that, this fixes numerous small bugs in log recovery; refactors logical intent log item recovery to remove the last remaining place in XFS where we could have nested transactions; fixes a couple of ways that intent log item recovery could fail in ways that wouldn't have happened in the regular commit paths; fixes a deadlock vector in the GETFSMAP implementation (which improves its performance by 20%); and fixes serious bugs in the realtime growfs, fallocate, and bitmap handling code. Summary: - Deprecate the V4 filesystem format, some disused mount options, and some legacy sysctl knobs now that we can support dates into the 25th century. Note that removal of V4 support will not happen until the early 2030s. - Fix some probles with inode realtime flag propagation. - Fix some buffer handling issues when growing a rt filesystem. - Fix a problem where a BMAP_REMAP unmap call would free rt extents even though the purpose of BMAP_REMAP is to avoid freeing the blocks. - Strengthen the dabtree online scrubber to check hash values on child dabtree blocks. - Actually log new intent items created as part of recovering log intent items. - Fix a bug where quotas weren't attached to an inode undergoing bmap intent item recovery. - Fix a buffer overrun problem with specially crafted log buffer headers. - Various cleanups to type usage and slightly inaccurate comments. - More cleanups to the xattr, log, and quota code. - Don't run the (slower) shared-rmap operations on attr fork mappings. - Fix a bug where we failed to check the LSN of finobt blocks during replay and could therefore overwrite newer data with older data. - Clean up the ugly nested transaction mess that log recovery uses to stage intent item recovery in the correct order by creating a proper data structure to capture recovered chains. - Use the capture structure to resume intent item chains with the same log space and block reservations as when they were captured. - Fix a UAF bug in bmap intent item recovery where we failed to maintain our reference to the incore inode if the bmap operation needed to relog itself to continue. - Rearrange the defer ops mechanism to finish newly created subtasks of a parent task before moving on to the next parent task. - Automatically relog intent items in deferred ops chains if doing so would help us avoid pinning the log tail. This will help fix some log scaling problems now and will facilitate atomic file updates later. - Fix a deadlock in the GETFSMAP implementation by using an internal memory buffer to reduce indirect calls and copies to userspace, thereby improving its performance by ~20%. - Fix various problems when calling growfs on a realtime volume would not fully update the filesystem metadata. - Fix broken Kconfig asking about deprecated XFS when XFS is disabled" * tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits) xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n xfs: fix high key handling in the rt allocator's query_range function xfs: annotate grabbing the realtime bitmap/summary locks in growfs xfs: make xfs_growfs_rt update secondary superblocks xfs: fix realtime bitmap/summary file truncation when growing rt volume xfs: fix the indent in xfs_trans_mod_dquot xfs: do the ASSERT for the arguments O_{u,g,p}dqpp xfs: fix deadlock and streamline xfs_getfsmap performance xfs: limit entries returned when counting fsmap records xfs: only relog deferred intent items if free space in the log gets low xfs: expose the log push threshold xfs: periodically relog deferred intent items xfs: change the order in which child and parent defer ops are finished xfs: fix an incore inode UAF in xfs_bui_recover xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering xfs: clean up bmap intent item recovery checking xfs: xfs_defer_capture should absorb remaining transaction reservation xfs: xfs_defer_capture should absorb remaining block reservations xfs: proper replay of deferred ops queued during log recovery xfs: remove XFS_LI_RECOVERED ...
2020-10-19Merge tag 'fuse-update-5.10' of ↵Linus Torvalds14-490/+2629
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Support directly accessing host page cache from virtiofs. This can improve I/O performance for various workloads, as well as reducing the memory requirement by eliminating double caching. Thanks to Vivek Goyal for doing most of the work on this. - Allow automatic submounting inside virtiofs. This allows unique st_dev/ st_ino values to be assigned inside the guest to files residing on different filesystems on the host. Thanks to Max Reitz for the patches. - Fix an old use after free bug found by Pradeep P V K. * tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits) virtiofs: calculate number of scatter-gather elements accurately fuse: connection remove fix fuse: implement crossmounts fuse: Allow fuse_fill_super_common() for submounts fuse: split fuse_mount off of fuse_conn fuse: drop fuse_conn parameter where possible fuse: store fuse_conn in fuse_req fuse: add submount support to <uapi/linux/fuse.h> fuse: fix page dereference after free virtiofs: add logic to free up a memory range virtiofs: maintain a list of busy elements virtiofs: serialize truncate/punch_hole and dax fault path virtiofs: define dax address space operations virtiofs: add DAX mmap support virtiofs: implement dax read/write operations virtiofs: introduce setupmapping/removemapping commands virtiofs: implement FUSE_INIT map_alignment field virtiofs: keep a list of free dax memory ranges virtiofs: add a mount option to enable dax virtiofs: set up virtio_fs dax_device ...
2020-10-19Merge tag 'zonefs-5.10-rc1' of ↵Linus Torvalds2-13/+218
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs updates from Damien Le Moal: "Add an 'explicit-open' mount option to automatically issue a REQ_OP_ZONE_OPEN command to the device whenever a sequential zone file is open for writing for the first time. This avoids 'insufficient zone resources' errors for write operations on some drives with limited zone resources or on ZNS drives with a limited number of active zones. From Johannes" * tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: document the explicit-open mount option zonefs: open/close zone on file open/close zonefs: provide no-lock zonefs_io_error variant zonefs: introduce helper for zone management
2020-10-19cifs: Return the error from crypt_message when enc/dec key not found.Shyam Prasad N1-1/+1
In crypt_message, when smb2_get_enc_key returns error, we need to return the error back to the caller. If not, we end up processing the message further, causing a kernel oops due to unwarranted access of memory. Call Trace: smb3_receive_transform+0x120/0x870 [cifs] cifs_demultiplex_thread+0xb53/0xc20 [cifs] ? cifs_handle_standard+0x190/0x190 [cifs] kthread+0x116/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x1f/0x30 Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-19smb3.1.1: set gcm256 when requestedSteve French4-6/+17
update smb encryption code to set 32 byte key length and to set gcm256 when requested on mount. Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-19smb3.1.1: rename nonces used for GCM and CCM encryptionSteve French2-6/+6
Now that 256 bit encryption can be negotiated, update names of the nonces to match the updated official protocol documentation (e.g. AES_GCM_NONCE instead of AES_128GCM_NONCE) since they apply to both 128 bit and 256 bit encryption. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-10-19smb3.1.1: print warning if server does not support requested encryption typeSteve French1-2/+13
If server does not support AES-256-GCM and it was required on mount, print warning message. Also log and return a different error message (EOPNOTSUPP) when encryption mechanism is not supported vs the case when an unknown unrequested encryption mechanism could be returned (EINVAL). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-10-19io_uring: fix racy REQ_F_LINK_TIMEOUT clearingPavel Begunkov1-4/+13
io_link_timeout_fn() removes REQ_F_LINK_TIMEOUT from the link head's flags, it's not atomic and may race with what the head is doing. If io_link_timeout_fn() doesn't clear the flag, as forced by this patch, then it may happen that for "req -> link_timeout1 -> link_timeout2", __io_kill_linked_timeout() would find link_timeout2 and try to cancel it, so miscounting references. Teach it to ignore such double timeouts by marking the active one with a new flag in io_prep_linked_timeout(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19io_uring: do poll's hash_node init in common codePavel Begunkov1-2/+1
Move INIT_HLIST_NODE(&req->hash_node) into __io_arm_poll_handler(), so that it doesn't duplicated and common poll code would be responsible for it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19io_uring: inline io_poll_task_handler()Pavel Begunkov1-19/+12
io_poll_task_handler() doesn't add clarity, inline it in its only user. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19io_uring: remove extra ->file check in poll prepPavel Begunkov1-2/+0
io_poll_add_prep() doesn't need to verify ->file because it's already done in io_init_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19io_uring: make cached_cq_overflow non atomic_tPavel Begunkov1-5/+6
ctx->cached_cq_overflow is changed only under completion_lock. Convert it from atomic_t to just int, and mark all places when it's read without lock with READ_ONCE, which guarantees atomicity (relaxed ordering). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19io_uring: inline io_fail_links()Pavel Begunkov1-10/+3
Inline io_fail_links() and kill extra io_cqring_ev_posted(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19io_uring: kill ref get/drop in personality initPavel Begunkov1-4/+9
Don't take an identity on personality/creds init only to drop it a few lines after. Extract a function which prepares req->work but leaves it without identity. Note: it's safe to not check REQ_F_WORK_INITIALIZED there because it's nobody had a chance to init it before io_init_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19io_uring: flags-based creds init in queuePavel Begunkov1-2/+2
Use IO_WQ_WORK_CREDS to figure out if req has creds to be used. Since recently it should rely only on flags, but not value of work.creds. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19io_uring: use blk_queue_nowait() to check if NOWAIT supportedJeffle Xu1-1/+1
commit 021a24460dc2 ("block: add QUEUE_FLAG_NOWAIT") adds a new helper function blk_queue_nowait() to check if the bdev supports handling of REQ_NOWAIT or not. Since then bio-based dm device can also support REQ_NOWAIT, and currently only dm-linear supports that since commit 6abc49468eea ("dm: add support for REQ_NOWAIT and enable it for linear target"). Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-8/+13
Merge yet more updates from Andrew Morton: "Subsystems affected by this patch series: mm (memcg, migration, pagemap, gup, madvise, vmalloc), ia64, and misc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits) mm: remove duplicate include statement in mmu.c mm: remove the filename in the top of file comment in vmalloc.c mm: cleanup the gfp_mask handling in __vmalloc_area_node mm: remove alloc_vm_area x86/xen: open code alloc_vm_area in arch_gnttab_valloc xen/xenbus: use apply_to_page_range directly in xenbus_map_ring_pv drm/i915: use vmap in i915_gem_object_map drm/i915: stop using kmap in i915_gem_object_map drm/i915: use vmap in shmem_pin_map zsmalloc: switch from alloc_vm_area to get_vm_area mm: allow a NULL fn callback in apply_to_page_range mm: add a vmap_pfn function mm: add a VM_MAP_PUT_PAGES flag for vmap mm: update the documentation for vfree mm/madvise: introduce process_madvise() syscall: an external memory hinting API pid: move pidfd_get_pid() to pid.c mm/madvise: pass mm to do_madvise selftests/vm: 10x speedup for hmm-tests binfmt_elf: take the mmap lock around find_extend_vma() mm/gup_benchmark: take the mmap lock around GUP ...
2020-10-18Merge tag 'for-linus-5.10-rc1-part2' of ↵Linus Torvalds6-4/+12
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull more ubi and ubifs updates from Richard Weinberger: "UBI: - Correctly use kthread_should_stop in ubi worker UBIFS: - Fixes for memory leaks while iterating directory entries - Fix for a user triggerable error message - Fix for a space accounting bug in authenticated mode" * tag 'for-linus-5.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: journal: Make sure to not dirty twice for auth nodes ubifs: setflags: Don't show error message when vfs_ioc_setflags_prepare() fails ubifs: ubifs_jnl_change_xattr: Remove assertion 'nlink > 0' for host inode ubi: check kthread_should_stop() after the setting of task state ubifs: dent: Fix some potential memory leaks while iterating entries ubifs: xattr: Fix some potential memory leaks while iterating entries
2020-10-18Merge tag 'for-linus-5.10-rc1' of ↵Linus Torvalds5-21/+34
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull ubifs updates from Richard Weinberger: - Kernel-doc fixes - Fixes for memory leaks in authentication option parsing * tag 'for-linus-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: mount_ubifs: Release authentication resource in error handling path ubifs: Don't parse authentication mount options in remount process ubifs: Fix a memleak after dumping authentication mount options ubifs: Fix some kernel-doc warnings in tnc.c ubifs: Fix some kernel-doc warnings in replay.c ubifs: Fix some kernel-doc warnings in gc.c ubifs: Fix 'hash' kernel-doc warning in auth.c
2020-10-18mm/madvise: pass mm to do_madviseMinchan Kim1-1/+1
Patch series "introduce memory hinting API for external process", v9. Now, we have MADV_PAGEOUT and MADV_COLD as madvise hinting API. With that, application could give hints to kernel what memory range are preferred to be reclaimed. However, in some platform(e.g., Android), the information required to make the hinting decision is not known to the app. Instead, it is known to a centralized userspace daemon(e.g., ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the concern, this patch introduces new syscall - process_madvise(2). Bascially, it's same with madvise(2) syscall but it has some differences. 1. It needs pidfd of target process to provide the hint 2. It supports only MADV_{COLD|PAGEOUT|MERGEABLE|UNMEREABLE} at this moment. Other hints in madvise will be opened when there are explicit requests from community to prevent unexpected bugs we couldn't support. 3. Only privileged processes can do something for other process's address space. For more detail of the new API, please see "mm: introduce external memory hinting API" description in this patchset. This patch (of 3): In upcoming patches, do_madvise will be called from external process context so we shouldn't asssume "current" is always hinted process's task_struct. Furthermore, we must not access mm_struct via task->mm, but obtain it via access_mm() once (in the following patch) and only use that pointer [1], so pass it to do_madvise() as well. Note the vma->vm_mm pointers are safe, so we can use them further down the call stack. And let's pass current->mm as arguments of do_madvise so it shouldn't change existing behavior but prepare next patch to make review easy. [vbabka@suse.cz: changelog tweak] [minchan@kernel.org: use current->mm for io_uring] Link: http://lkml.kernel.org/r/20200423145215.72666-1-minchan@kernel.org [akpm@linux-foundation.org: fix it for upstream changes] [akpm@linux-foundation.org: whoops] [rdunlap@infradead.org: add missing includes] Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jann Horn <jannh@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: John Dias <joaodias@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Christian Brauner <christian@brauner.io> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: <linux-man@vger.kernel.org> Link: https://lkml.kernel.org/r/20200901000633.1920247-1-minchan@kernel.org Link: http://lkml.kernel.org/r/20200622192900.22757-1-minchan@kernel.org Link: http://lkml.kernel.org/r/20200302193630.68771-2-minchan@kernel.org Link: http://lkml.kernel.org/r/20200622192900.22757-2-minchan@kernel.org Link: https://lkml.kernel.org/r/20200901000633.1920247-2-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18binfmt_elf: take the mmap lock around find_extend_vma()Jann Horn1-0/+3
create_elf_tables() runs after setup_new_exec(), so other tasks can already access our new mm and do things like process_madvise() on it. (At the time I'm writing this commit, process_madvise() is not in mainline yet, but has been in akpm's tree for some time.) While I believe that there are currently no APIs that would actually allow another process to mess up our VMA tree (process_madvise() is limited to MADV_COLD and MADV_PAGEOUT, and uring and userfaultfd cannot reach an mm under which no syscalls have been executed yet), this seems like an accident waiting to happen. Let's make sure that we always take the mmap lock around GUP paths as long as another process might be able to see the mm. (Yes, this diff looks suspicious because we drop the lock before doing anything with `vma`, but that's because we actually don't do anything with it apart from the NULL check.) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michel Lespinasse <walken@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://lkml.kernel.org/r/CAG48ez1-PBCdv3y8pn-Ty-b+FmBSLwDuVKFSt8h7wARLy0dF-Q@mail.gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>