aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-09-03xfs: ensure st_blocks never goes to zero during COW writesChristoph Hellwig2-0/+18
COW writes remove the amount overwritten either directly for delalloc reservations, or in earlier deferred transactions than adding the new amount back in the bmap map transaction. This means st_blocks on an inode where all data is overwritten using the COW path can temporarily show a 0 st_blocks. This can easily be reproduced with the pending zoned device support where all writes use this path and trips the check in generic/615, but could also happen on a reflink file without that. Fix this by temporarily add the pending blocks to be mapped to i_delayed_blks while the item is queued. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: use xas_for_each_marked in xfs_reclaim_inodes_countChristoph Hellwig2-29/+9
xfs_reclaim_inodes_count iterates over all AGs to sum up the reclaimable inodes counts. There is no point in grabbing a reference to the them or unlock the RCU critical section for each iteration, so switch to the more efficient xas_for_each_marked iterator. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: convert perag lookup to xarrayChristoph Hellwig4-52/+35
Convert the perag lookup from the legacy radix tree to the xarray, which allows for much nicer iteration and bulk lookup semantics. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: simplify tagged perag iterationChristoph Hellwig2-40/+35
Pass the old perag structure to the tagged loop helpers so that they can grab the old agno before releasing the reference. This removes the need to separately track the agno and the iterator macro, and thus also obsoletes the for_each_perag_tag syntactic sugar. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: move the tagged perag lookup helpers to xfs_icache.cChristoph Hellwig3-62/+58
The tagged perag helpers are only used in xfs_icache.c in the kernel code and not at all in xfsprogs. Move them to xfs_icache.c in preparation for switching to an xarray, for which I have no plan to implement the tagged lookup functions for userspace. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: use kfree_rcu_mightsleep to free the perag structuresChristoph Hellwig2-14/+1
Using the kfree_rcu_mightsleep is simpler and removes the need for a rcu_head in the perag structure. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: use LIST_HEAD() to simplify codeHongbo Li1-2/+1
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Signed-off-by: Hongbo Li <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: Remove duplicate xfs_trans_priv.h headerJiapeng Chong1-1/+0
./fs/xfs/libxfs/xfs_defer.c: xfs_trans_priv.h is included more than once. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9491 Signed-off-by: Jiapeng Chong <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: remove unnecessary checkDan Carpenter1-1/+1
We checked that "pip" is non-NULL at the start of the if else statement so there is no need to check again here. Delete the check. Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: Use xfs set and clear mp state helpersJohn Garry5-9/+9
Use the set and clear mp state helpers instead of open-coding. It is noted that in some instances calls to atomic operation set_bit() and clear_bit() are being replaced with test_and_set_bit() and test_and_clear_bit(), respectively, as there is no specific helpers for set_bit() and clear_bit() only. However should be ok, as we are just ignoring the returned value from those "test" variants. Signed-off-by: John Garry <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: reclaim speculative preallocations for append only filesChristoph Hellwig3-8/+10
The XFS XFS_DIFLAG_APPEND maps to the VFS S_APPEND flag, which forbids writes that don't append at the current EOF. But the commit originally adding XFS_DIFLAG_APPEND support (commit a23321e766d in xfs xfs-import repository) also checked it to skip releasing speculative preallocations, which doesn't make any sense. Another commit (dd9f438e3290 in the xfs-import repository) later extended that flag to also report these speculation preallocations which should not exist in getbmap. Remove these checks as nothing XFS_DIFLAG_APPEND implies that preallocations beyond EOF should exist, but explicitly check for XFS_DIFLAG_APPEND in xfs_file_release to bypass the algorithm that discard preallocations on the first close as append only files aren't expected to be written to only once. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: simplify extent lookup in xfs_can_free_eofblocksChristoph Hellwig1-15/+7
xfs_can_free_eofblocks just cares if there is an extent beyond EOF. Replace the call to xfs_bmapi_read with a xfs_iext_lookup_extent as we've already checked that extents are read in earlier. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: check XFS_EOFBLOCKS_RELEASED earlier in xfs_release_eofblocksChristoph Hellwig1-3/+2
If the XFS_EOFBLOCKS_RELEASED flag is set, we are not going to free the eofblocks, so don't bother locking the inode or performing the checks in xfs_can_free_eofblocks. Also switch to a test_and_set operation once the iolock has been acquire so that only the caller that sets it actually frees the post-EOF blocks. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: only free posteof blocks on first closeDarrick J. Wong2-23/+13
Certain workloads fragment files on XFS very badly, such as a software package that creates a number of threads, each of which repeatedly run the sequence: open a file, perform a synchronous write, and close the file, which defeats the speculative preallocation mechanism. We work around this problem by only deleting posteof blocks the /first/ time a file is closed to preserve the behavior that unpacking a tarball lays out files one after the other with no gaps. Signed-off-by: Darrick J. Wong <[email protected]> [hch: rebased, updated comment, renamed the flag] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: don't free post-EOF blocks on read closeDave Chinner1-1/+7
When we have a workload that does open/read/close in parallel with other allocation, the file becomes rapidly fragmented. This is due to close() calling xfs_file_release() and removing the speculative preallocation beyond EOF. Add a check for a writable context to xfs_file_release to skip the post-EOF block freeing (an the similarly pointless flushing on truncate down). Before: Test 1: sync write fragmentation counts /mnt/scratch/file.0: 919 /mnt/scratch/file.1: 916 /mnt/scratch/file.2: 919 /mnt/scratch/file.3: 920 /mnt/scratch/file.4: 920 /mnt/scratch/file.5: 921 /mnt/scratch/file.6: 916 /mnt/scratch/file.7: 918 After: Test 1: sync write fragmentation counts /mnt/scratch/file.0: 24 /mnt/scratch/file.1: 24 /mnt/scratch/file.2: 11 /mnt/scratch/file.3: 24 /mnt/scratch/file.4: 3 /mnt/scratch/file.5: 24 /mnt/scratch/file.6: 24 /mnt/scratch/file.7: 23 Signed-off-by: Dave Chinner <[email protected]> [darrick: wordsmithing, fix commit message] Signed-off-by: Darrick J. Wong <[email protected]> [hch: ported to the new ->release code structure] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: skip all of xfs_file_release when shut downChristoph Hellwig1-4/+6
There is no point in trying to free post-EOF blocks when the file system is shutdown, as it will just error out ASAP. Instead return instantly when xfs_file_release is called on a shut down file system. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: don't bother returning errors from xfs_file_releaseChristoph Hellwig1-9/+9
While ->release returns int, the only caller ignores the return value. As we're only doing cleanup work there isn't much of a point in return a value to start with, so just document the situation instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: refactor f_op->release handlingChristoph Hellwig3-83/+68
Currently f_op->release is split in not very obvious ways. Fix that by folding xfs_release into xfs_file_release. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03xfs: remove the i_mode check in xfs_releaseChristoph Hellwig1-3/+0
xfs_release is only called from xfs_file_release, which is wired up as the f_op->release handler for regular files only. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-09-03Merge tag 'btree-cleanups-6.12_2024-09-02' of ↵Chandan Babu R19-154/+237
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA xfs: cleanups for inode rooted btree code [v4.2 8/8] This series prepares the btree code to support realtime reverse mapping btrees by refactoring xfs_ifork_realloc to be fed a per-btree ops structure so that it can handle multiple types of inode-rooted btrees. It moves on to refactoring the btree code to use the new realloc routines. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]> * tag 'btree-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: standardize the btree maxrecs function parameters xfs: replace shouty XFS_BM{BT,DR} macros
2024-09-03Merge tag 'xfs-fixes-6.12_2024-09-02' of ↵Chandan Babu R3-8/+9
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA xfs: various bug fixes for 6.12 [7/8] Various bug fixes for 6.12. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]> * tag 'xfs-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: fix a sloppy memory handling bug in xfs_iroot_realloc xfs: fix FITRIM reporting again xfs: fix C++ compilation errors in xfs_fs.h
2024-09-03Merge tag 'quota-cleanups-6.12_2024-09-02' of ↵Chandan Babu R4-36/+81
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA xfs: cleanups for quota mount [v4.2 6/8] Refactor the quota file loading code in preparation for adding metadata directory trees. Did you know that quotarm works even when quota isn't active? With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]> * tag 'quota-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: refactor loading quota inodes in the regular case
2024-09-03Merge tag 'rtalloc-cleanups-6.12_2024-09-02' of ↵Chandan Babu R14-490/+478
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA xfs: cleanups for the realtime allocator [v4.2 5/8] This third series cleans up the realtime allocator code so that it'll be somewhat less difficult to figure out what on earth it's doing. We also rearrange the fsmap code a bit. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]> * tag 'rtalloc-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c xfs: rearrange xfs_fsmap.c a little bit xfs: replace m_rsumsize with m_rsumblocks xfs: remove xfs_{rtbitmap,rtsummary}_wordcount xfs: add xchk_setup_nothing and xchk_nothing helpers xfs: make the rtalloc start hint a xfs_rtblock_t xfs: factor out a xfs_rtallocate_align helper xfs: rework the rtalloc fallback handling xfs: factor out a xfs_rtallocate helper xfs: clean up the ISVALID macro in xfs_bmap_adjacent
2024-09-03Merge tag 'rtalloc-fixes-6.12_2024-09-02' of ↵Chandan Babu R7-133/+124
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA xfs: fixes for the realtime allocator [v4.2 4/8] While I was reviewing how to integrate realtime allocation groups with the rt allocator, I noticed several bugs in the existing allocation code with regards to calculating the maximum range of rtx to scan for free space. This series fixes those range bugs and cleans up a few things too. I also added a few cleanups from Christoph. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]> * tag 'rtalloc-fixes-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: simplify xfs_rtalloc_query_range xfs: remove xfs_rtb_to_rtxrem xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near xfs: clean up xfs_rtallocate_extent_exact a bit xfs: refactor aligning bestlen to prod xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block xfs: don't return too-short extents from xfs_rtallocate_extent_block xfs: ensure rtx mask/shift are correct after growfs xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock
2024-09-03Merge tag 'rtbitmap-cleanups-6.12_2024-09-02' of ↵Chandan Babu R7-402/+438
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA xfs: clean up the rtbitmap code [v4.2 3/8] Here are some cleanups and reorganization of the realtime bitmap code to share more of that code between userspace and the kernel. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]> * tag 'rtbitmap-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock xfs: factor out rtbitmap/summary initialization helpers xfs: factor out a xfs_last_rt_bmblock helper xfs: factor out a xfs_growfs_rt_bmblock helper xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc xfs: cleanup the calling convention for xfs_rtpick_extent xfs: add bounds checking to xfs_rt{bitmap,summary}_read_buf xfs: assert a valid limit in xfs_rtfind_forw xfs: remove the limit argument to xfs_rtfind_back xfs: make the RT rsum_cache mandatory xfs: factor out a xfs_validate_rt_geometry helper xfs: remove xfs_validate_rtextents
2024-09-03Merge tag 'metadir-cleanups-6.12_2024-09-02' of ↵Chandan Babu R8-12/+16
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA xfs: cleanups before adding metadata directories [v4.2 2/8] Before we start adding code for metadata directory trees, let's clean up some warts in the realtime bitmap code and the inode allocator code. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]> * tag 'metadir-cleanups-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: pass the icreate args object to xfs_dialloc xfs: match on the global RT inode numbers in xfs_is_metadata_inode xfs: validate inumber in xfs_iget
2024-09-03Merge tag 'atomic-file-commits-6.12_2024-09-02' of ↵Chandan Babu R5-3/+243
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA xfs: atomic file content commits [v31.1 1/8] This series creates XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE ioctls to perform the exchange only if the target file has not been changed since a given sampling point. This new functionality uses the mechanism underlying EXCHANGE_RANGE to stage and commit file updates such that reader programs will see either the old contents or the new contents in their entirety, with no chance of torn writes. A successful call completion guarantees that the new contents will be seen even if the system fails. The pair of ioctls allows userspace to perform what amounts to a compare and exchange operation on entire file contents. Note that there are ongoing arguments in the community about how best to implement some sort of file data write counter that nfsd could also use to signal invalidations to clients. Until such a thing is implemented, this patch will rely on ctime/mtime updates. Here are the proposed manual pages: IOCTL-XFS-COMMIT-RANGE(2) System Calls ManualIOCTL-XFS-COMMIT-RANGE(2) NAME ioctl_xfs_start_commit - prepare to exchange the contents of two files ioctl_xfs_commit_range - conditionally exchange the contents of parts of two files SYNOPSIS #include <sys/ioctl.h> #include <xfs/xfs_fs.h> int ioctl(int file2_fd, XFS_IOC_START_COMMIT, struct xfs_com‐ mit_range *arg); int ioctl(int file2_fd, XFS_IOC_COMMIT_RANGE, struct xfs_com‐ mit_range *arg); DESCRIPTION Given a range of bytes in a first file file1_fd and a second range of bytes in a second file file2_fd, this ioctl(2) ex‐ changes the contents of the two ranges if file2_fd passes cer‐ tain freshness criteria. Before exchanging the contents, the program must call the XFS_IOC_START_COMMIT ioctl to sample freshness data for file2_fd. If the sampled metadata does not match the file metadata at commit time, XFS_IOC_COMMIT_RANGE will return EBUSY. Exchanges are atomic with regards to concurrent file opera‐ tions. Implementations must guarantee that readers see either the old contents or the new contents in their entirety, even if the system fails. The system call parameters are conveyed in structures of the following form: struct xfs_commit_range { __s32 file1_fd; __u32 pad; __u64 file1_offset; __u64 file2_offset; __u64 length; __u64 flags; __u64 file2_freshness[5]; }; The field pad must be zero. The fields file1_fd, file1_offset, and length define the first range of bytes to be exchanged. The fields file2_fd, file2_offset, and length define the second range of bytes to be exchanged. The field file2_freshness is an opaque field whose contents are determined by the kernel. These file attributes are used to confirm that file2_fd has not changed by another thread since the current thread began staging its own update. Both files must be from the same filesystem mount. If the two file descriptors represent the same file, the byte ranges must not overlap. Most disk-based filesystems require that the starts of both ranges must be aligned to the file block size. If this is the case, the ends of the ranges must also be so aligned unless the XFS_EXCHANGE_RANGE_TO_EOF flag is set. The field flags control the behavior of the exchange operation. XFS_EXCHANGE_RANGE_TO_EOF Ignore the length parameter. All bytes in file1_fd from file1_offset to EOF are moved to file2_fd, and file2's size is set to (file2_offset+(file1_length- file1_offset)). Meanwhile, all bytes in file2 from file2_offset to EOF are moved to file1 and file1's size is set to (file1_offset+(file2_length- file2_offset)). XFS_EXCHANGE_RANGE_DSYNC Ensure that all modified in-core data in both file ranges and all metadata updates pertaining to the exchange operation are flushed to persistent storage before the call returns. Opening either file de‐ scriptor with O_SYNC or O_DSYNC will have the same effect. XFS_EXCHANGE_RANGE_FILE1_WRITTEN Only exchange sub-ranges of file1_fd that are known to contain data written by application software. Each sub-range may be expanded (both upwards and downwards) to align with the file allocation unit. For files on the data device, this is one filesystem block. For files on the realtime device, this is the realtime extent size. This facility can be used to implement fast atomic scatter-gather writes of any complexity for software-defined storage targets if all writes are aligned to the file allocation unit. XFS_EXCHANGE_RANGE_DRY_RUN Check the parameters and the feasibility of the op‐ eration, but do not change anything. RETURN VALUE On error, -1 is returned, and errno is set to indicate the er‐ ror. ERRORS Error codes can be one of, but are not limited to, the follow‐ ing: EBADF file1_fd is not open for reading and writing or is open for append-only writes; or file2_fd is not open for reading and writing or is open for append-only writes. EBUSY The file2 inode number and timestamps supplied do not match file2_fd. EINVAL The parameters are not correct for these files. This error can also appear if either file descriptor repre‐ sents a device, FIFO, or socket. Disk filesystems gen‐ erally require the offset and length arguments to be aligned to the fundamental block sizes of both files. EIO An I/O error occurred. EISDIR One of the files is a directory. ENOMEM The kernel was unable to allocate sufficient memory to perform the operation. ENOSPC There is not enough free space in the filesystem ex‐ change the contents safely. EOPNOTSUPP The filesystem does not support exchanging bytes between the two files. EPERM file1_fd or file2_fd are immutable. ETXTBSY One of the files is a swap file. EUCLEAN The filesystem is corrupt. EXDEV file1_fd and file2_fd are not on the same mounted filesystem. CONFORMING TO This API is XFS-specific. USE CASES Several use cases are imagined for this system call. Coordina‐ tion between multiple threads is performed by the kernel. The first is a filesystem defragmenter, which copies the con‐ tents of a file into another file and wishes to exchange the space mappings of the two files, provided that the original file has not changed. An example program might look like this: int fd = open("/some/file", O_RDWR); int temp_fd = open("/some", O_TMPFILE | O_RDWR); struct stat sb; struct xfs_commit_range args = { .flags = XFS_EXCHANGE_RANGE_TO_EOF, }; /* gather file2's freshness information */ ioctl(fd, XFS_IOC_START_COMMIT, &args); fstat(fd, &sb); /* make a fresh copy of the file with terrible alignment to avoid reflink */ clone_file_range(fd, NULL, temp_fd, NULL, 1, 0); clone_file_range(fd, NULL, temp_fd, NULL, sb.st_size - 1, 0); /* commit the entire update */ args.file1_fd = temp_fd; ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args); if (ret && errno == EBUSY) printf("file changed while defrag was underway "); The second is a data storage program that wants to commit non- contiguous updates to a file atomically. This program cannot coordinate updates to the file and therefore relies on the ker‐ nel to reject the COMMIT_RANGE command if the file has been up‐ dated by someone else. This can be done by creating a tempo‐ rary file, calling FICLONE(2) to share the contents, and stag‐ ing the updates into the temporary file. The FULL_FILES flag is recommended for this purpose. The temporary file can be deleted or punched out afterwards. An example program might look like this: int fd = open("/some/file", O_RDWR); int temp_fd = open("/some", O_TMPFILE | O_RDWR); struct xfs_commit_range args = { .flags = XFS_EXCHANGE_RANGE_TO_EOF, }; /* gather file2's freshness information */ ioctl(fd, XFS_IOC_START_COMMIT, &args); ioctl(temp_fd, FICLONE, fd); /* append 1MB of records */ lseek(temp_fd, 0, SEEK_END); write(temp_fd, data1, 1000000); /* update record index */ pwrite(temp_fd, data1, 600, 98765); pwrite(temp_fd, data2, 320, 54321); pwrite(temp_fd, data2, 15, 0); /* commit the entire update */ args.file1_fd = temp_fd; ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args); if (ret && errno == EBUSY) printf("file changed before commit; will roll back "); NOTES Some filesystems may limit the amount of data or the number of extents that can be exchanged in a single call. SEE ALSO ioctl(2) XFS 2024-02-18 IOCTL-XFS-COMMIT-RANGE(2) With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]> * tag 'atomic-file-commits-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: introduce new file range commit ioctls
2024-09-02Merge tag 'linux-can-next-for-6.12-20240830' of ↵Jakub Kicinski7-130/+170
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2024-08-30 The first patch is by Duy Nguyen and document the R-Car V4M support in the rcar-canfd DT bindings. Frank Li's patch converts the microchip,mcp251x.txt DT bindings documentation to yaml. A patch by Zhang Changzhong update a comment in the j1939 CAN networking stack. Stefan Mätje's patch updates the CAN configuration netlink code, so that the bit timing calculation doesn't work on stale can_priv::ctrlmode data. Martin Jocic contributes a patch for the kvaser_pciefd driver to convert some ifdefs into if (IS_ENABLED()). The last patch is by Yan Zhen and simplifies the probe() function of the kvaser USB driver by using dev_err_probe(). * tag 'linux-can-next-for-6.12-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: kvaser_usb: Simplify with dev_err_probe() can: kvaser_pciefd: Use IS_ENABLED() instead of #ifdef can: netlink: avoid call to do_set_data_bittiming callback with stale can_priv::ctrlmode can: j1939: use correct function name in comment dt-bindings: can: convert microchip,mcp251x.txt to yaml dt-bindings: can: renesas,rcar-canfd: Document R-Car V4M support ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-09-02Merge tag 'for-net-2024-08-30' of ↵Jakub Kicinski7-92/+117
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - qca: If memdump doesn't work, re-enable IBS - MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT - Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE" - MGMT: Ignore keys being loaded with invalid type * tag 'for-net-2024-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: MGMT: Ignore keys being loaded with invalid type Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE" Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT Bluetooth: hci_sync: Introduce hci_cmd_sync_run/hci_cmd_sync_run_once Bluetooth: qca: If memdump doesn't work, re-enable IBS ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-09-02Merge tag 'linux-can-fixes-for-6.11-20240830' of ↵Jakub Kicinski6-64/+121
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2024-08-30 The first patch is by Kuniyuki Iwashima for the CAN BCM protocol that adds a missing proc entry removal when a device unregistered. Simon Horman fixes the cleanup in the error cleanup path of the m_can driver's open function. Markus Schneider-Pargmann contributes 7 fixes for the m_can driver, all related to the recently added IRQ coalescing support. The next 2 patches are by me, target the mcp251xfd driver and fix ring and coalescing configuration problems when switching from CAN-CC to CAN-FD mode. Simon Arlott's patch fixes a possible deadlock in the mcp251x driver. The last patch is by Martin Jocic for the kvaser_pciefd driver and fixes a problem with lost IRQs, which result in starvation, under high load situations. * tag 'linux-can-fixes-for-6.11-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: kvaser_pciefd: Use a single write when releasing RX buffers can: mcp251x: fix deadlock if an interrupt occurs during mcp251x_open can: mcp251xfd: mcp251xfd_ring_init(): check TX-coalescing configuration can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode can: m_can: Limit coalescing to peripheral instances can: m_can: Reset cached active_interrupts on start can: m_can: disable_all_interrupts, not clear active_interrupts can: m_can: Do not cancel timer from within timer can: m_can: Remove m_can_rx_peripheral indirection can: m_can: Remove coalesing disable in isr during suspend can: m_can: Reset coalescing during suspend/resume can: m_can: Release irq on error in m_can_open can: bcm: Remove proc entry when dev is unregistered. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-09-02r8169: add support for RTL8126A rev.bChunHao Lin3-15/+29
Add support for RTL8126A rev.b. Its XID is 0x64a. It is basically based on the one with XID 0x649, but with different firmware file. Signed-off-by: ChunHao Lin <[email protected]> Reviewed-by: Heiner Kallweit <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-09-02netdev-genl: Set extack and fix error on napi-getJoe Damato1-3/+5
In commit 27f91aaf49b3 ("netdev-genl: Add netlink framework functions for napi"), when an invalid NAPI ID is specified the return value -EINVAL is used and no extack is set. Change the return value to -ENOENT and set the extack. Before this commit: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --do napi-get --json='{"id": 451}' Netlink error: Invalid argument nl_len = 36 (20) nl_flags = 0x100 nl_type = 2 error: -22 After this commit: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --do napi-get --json='{"id": 451}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.id'} Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Joe Damato <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-09-02smb: client: fix hang in wait_for_response() for negprotoPaulo Alcantara1-1/+13
Call cifs_reconnect() to wake up processes waiting on negotiate protocol to handle the case where server abruptly shut down and had no chance to properly close the socket. Simple reproducer: ssh 192.168.2.100 pkill -STOP smbd mount.cifs //192.168.2.100/test /mnt -o ... [never returns] Cc: Rickard Andersson <[email protected]> Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-09-02btrfs: zoned: handle broken write pointer on zonesNaohiro Aota1-5/+25
Btrfs rejects to mount a FS if it finds a block group with a broken write pointer (e.g, unequal write pointers on two zones of RAID1 block group). Since such case can happen easily with a power-loss or crash of a system, we need to handle the case more gently. Handle such block group by making it unallocatable, so that there will be no writes into it. That can be done by setting the allocation pointer at the end of allocating region (= block_group->zone_capacity). Then, existing code handle zone_unusable properly. Having proper zone_capacity is necessary for the change. So, set it as fast as possible. We cannot handle RAID0 and RAID10 case like this. But, they are anyway unable to read because of a missing stripe. Fixes: 265f7237dd25 ("btrfs: zoned: allow DUP on meta-data block groups") Fixes: 568220fa9657 ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree") CC: [email protected] # 6.1+ Reported-by: HAN Yuwei <[email protected]> Cc: Xuefer <[email protected]> Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-09-02bpftool: Fix handling enum64 in btf dump sortingMykyta Yatsenko1-3/+4
Wrong function is used to access the first enum64 element. Substituting btf_enum(t) with btf_enum64(t) for BTF_KIND_ENUM64. Fixes: 94133cf24bb3 ("bpftool: Introduce btf c dump sorting") Signed-off-by: Mykyta Yatsenko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Quentin Monnet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-09-02ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discardyangerkun1-6/+4
Commit 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in ext4_group_info") speed up fstrim by skipping trim trimmed group. We also has the chance to clear trimmed once there exists some block free for this group(mount without discard), and the next trim for this group will work well too. For mount with discard, we will issue dicard when we free blocks, so leave trimmed flag keep alive to skip useless trim trigger from userspace seems reasonable. But for some case like ext4 build on dm-thinpool(ext4 blocksize 4K, pool blocksize 128K), discard from ext4 maybe unaligned for dm thinpool, and thinpool will just finish this discard(see process_discard_bio when begein equals to end) without actually process discard. For this case, trim from userspace can really help us to free some thinpool block. So convert to clear trimmed flag for all case no matter mounted with discard or not. Fixes: 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in ext4_group_info") Signed-off-by: yangerkun <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: drop all delonly descriptionsZhang Yi1-34/+32
When counting reserved clusters, delayed type is always equal to delonly type now, hence drop all delonly descriptions in parameters and comments. Signed-off-by: Zhang Yi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: drop ext4_es_is_delonly()Zhang Yi3-16/+11
Since we don't add delayed flag in unwritten extents, so there is no difference between ext4_es_is_delayed() and ext4_es_is_delonly(), just drop ext4_es_is_delonly(). Signed-off-by: Zhang Yi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: make extent status types exclusiveZhang Yi1-2/+10
Since we don't add delayed flag in unwritten extents, all of the four extent status types EXTENT_STATUS_WRITTEN, EXTENT_STATUS_UNWRITTEN, EXTENT_STATUS_DELAYED and EXTENT_STATUS_HOLE are exclusive now, add assertion when storing pblock before inserting extent into status tree and add comment to the status definition. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Zhang Yi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: drop unused ext4_es_store_status()Zhang Yi1-7/+0
The helper ext4_es_store_status() is unused now, just drop it. Signed-off-by: Zhang Yi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: use ext4_map_query_blocks() in ext4_map_blocks()Zhang Yi1-21/+1
The blocks map querying logic in ext4_map_blocks() are the same as ext4_map_query_blocks(), so switch to directly use it. Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: drop ext4_es_delayed_clu()Zhang Yi2-90/+0
Since we move ext4_da_update_reserve_space() to ext4_es_insert_extent(), no one uses ext4_es_delayed_clu() and __es_delayed_clu(), just drop them. Signed-off-by: Zhang Yi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: update delalloc data reserve spcae in ext4_es_insert_extent()Zhang Yi3-45/+24
Now that we update data reserved space for delalloc after allocating new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is enabled, we also need to query the extents_status tree to calculate the exact reserved clusters. This is complicated now and it appears that it's better to do this job in ext4_es_insert_extent(), because __es_remove_extent() have already count delalloc blocks when removing delalloc extents and __revise_pending() return new adding pending count, we could update the reserved blocks easily in ext4_es_insert_extent(). We direct reduce the reserved cluster count when replacing a delalloc extent. However, thers are two special cases need to concern about the quota claiming when doing direct block allocation (e.g. from fallocate). A), fallocate a range that covers a delalloc extent but start with non-delayed allocated blocks, e.g. a hole. hhhhhhh+ddddddd+ddddddd ^^^^^^^^^^^^^^^^^^^^^^^ fallocate this range Current ext4_map_blocks() can't always trim the extent since it may release i_data_sem before calling ext4_map_create_blocks() and raced by another delayed allocation. Hence the EXT4_GET_BLOCKS_DELALLOC_RESERVE may not set even when we are replacing a delalloc extent, without this flag set, the quota has already been claimed by ext4_mb_new_blocks(), so we should release the quota reservations instead of claim them again. B), bigalloc feature is enabled, fallocate a range that contains non-delayed allocated blocks. |< one cluster >| hhhhhhh+hhhhhhh+hhhhhhh+ddddddd ^^^^^^^ fallocate this range This case is similar to above case, the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag is also not set. Hence we should release the quota reservations if we replace a delalloc extent but without EXT4_GET_BLOCKS_DELALLOC_RESERVE set. Signed-off-by: Zhang Yi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: passing block allocation information to ext4_es_insert_extent()Zhang Yi4-10/+11
Just pass the block allocation flag to ext4_es_insert_extent() when we replacing a current extent after an actually block allocation or extent status conversion, this flag will be used by later changes. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Zhang Yi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: let __revise_pending() return newly inserted pendingsZhang Yi1-10/+18
Let __insert_pending() return 1 after successfully inserting a new pending cluster, and also let __revise_pending() to return the number of of newly inserted pendings. Signed-off-by: Zhang Yi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: don't set EXTENT_STATUS_DELAYED on allocated blocksZhang Yi2-19/+1
Currently, we release delayed allocation reservation when removing delayed extent from extent status tree (which also happens when overwriting one extent with another one). When we allocated unwritten extent under some delayed allocated extent, we don't need the reservation anymore and hence we don't need to preserve the EXT4_MAP_DELAYED status bit. Allocating the new extent blocks will properly release the reservation. Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag setZhang Yi1-7/+8
When doing block allocation, magic EXT4_GET_BLOCKS_DELALLOC_RESERVE means the allocating range covers a range of delayed allocated clusters, the blocks and quotas have already been reserved in ext4_da_map_blocks(), we should update the reserved space and don't need to claim them again. At the moment, we only set this magic in mpage_map_one_extent() when allocating a range of delayed allocated clusters in the write back path, it makes things complicated since we have to notice and deal with the case of allocating non-delayed allocated clusters separately in ext4_ext_map_blocks(). For example, it we fallocate some blocks that have been delayed allocated, free space would be claimed again in ext4_mb_new_blocks() (this is wrong exactily), and we can't claim quota space again, we have to release the quota reservations made for that previously delayed allocated clusters. Move the position thats set the EXT4_GET_BLOCKS_DELALLOC_RESERVE to where we actually do block allocation, it could simplify above handling a lot, it means that we always set this magic once the allocation range covers delalloc blocks, no need to take care of the allocation path. Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02ext4: factor out ext4_map_create_blocks() to allocate new blocksZhang Yi1-76/+81
Factor out a common helper ext4_map_create_blocks() from ext4_map_blocks() to do a real blocks allocation, no logic changes. [ Note: this first patch of a ten patch series named "v3: simplify the counting and management of delalloc reserved blocks". The link to the v1 and v2 patch series are below. -- TYT ] Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://patch.msgid.link/[email protected] # v2 of patch series Link: https://patch.msgid.link/[email protected] # v1 of the patch series Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2024-09-02perf daemon: Fix the build on more 32-bit architecturesArnaldo Carvalho de Melo1-4/+4
FYI: I'm carrying this on perf-tools-next. The previous attempt fixed the build on debian:experimental-x-mipsel, but when building on a larger set of containers I noticed it broke the build on some other 32-bit architectures such as: 42 7.87 ubuntu:18.04-x-arm : FAIL gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) builtin-daemon.c: In function 'cmd_session_list': builtin-daemon.c:692:16: error: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'long int' [-Werror=format=] fprintf(out, "%c%" PRIu64, ^~~~~ builtin-daemon.c:694:13: csv_sep, (curr - daemon->start) / 60); ~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from builtin-daemon.c:3:0: /usr/arm-linux-gnueabihf/include/inttypes.h:105:34: note: format string is defined here # define PRIu64 __PRI64_PREFIX "u" So lets cast that time_t (32-bit/64-bit) to uint64_t to make sure it builds everywhere. Fixes: 4bbe6002931954bb ("perf daemon: Fix the build on 32-bit architectures") Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Link: https://lore.kernel.org/r/ZsPmldtJ0D9Cua9_@x1 Signed-off-by: Namhyung Kim <[email protected]>
2024-09-02perf python: include "util/sample.h"Xu Yang1-0/+1
The 32-bit arm build system will complain: tools/perf/util/python.c:75:28: error: field ‘sample’ has incomplete type 75 | struct perf_sample sample; However, arm64 build system doesn't complain this. The root cause is arm64 define "HAVE_KVM_STAT_SUPPORT := 1" in tools/perf/arch/arm64/Makefile, but arm arch doesn't define this. This will lead to kvm-stat.h include other header files on arm64 build system, especially "util/sample.h" for util/python.c. This will try to directly include "util/sample.h" for "util/python.c" to avoid such build issue on arm platform. Signed-off-by: Xu Yang <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>