aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-08-18xfs: drop ->writepage completelyDave Chinner1-17/+0
->writepage is only used in one place - single page writeback from memory reclaim. We only allow such writeback from kswapd, not from direct memory reclaim, and so it is rarely used. When it comes from kswapd, it is effectively random dirty page shoot-down, which is horrible for IO patterns. We will already have background writeback trying to clean all the dirty pages in memory as efficiently as possible, so having kswapd interrupt our well formed IO stream only slows things down. So get rid of xfs_vm_writepage() completely. Signed-off-by: Dave Chinner <[email protected]> [djwong: forward port to 5.15] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-19isystem: ship and use stdarg.hAlexey Dobriyan3-3/+3
Ship minimal stdarg.h (1 type, 4 macros) as <linux/stdarg.h>. stdarg.h is the only userspace header commonly used in the kernel. GPL 2 version of <stdarg.h> can be extracted from http://archive.debian.org/debian/pool/main/g/gcc-4.2/gcc-4.2_4.2.4.orig.tar.gz Signed-off-by: Alexey Dobriyan <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2021-08-18ovl: enable RCU'd ->get_acl()Miklos Szeredi2-4/+16
Overlayfs does not cache ACL's (to avoid double caching). Instead it just calls the underlying filesystem's i_op->get_acl(), which will return the cached value, if possible. In rcu path walk, however, get_cached_acl_rcu() is employed to get the value from the cache, which will fail on overlayfs resulting in dropping out of rcu walk mode. This can result in a big performance hit in certain situations. Fix by calling ->get_acl() with rcu=true in case of ACL_DONT_CACHE (which indicates pass-through) Reported-by: garyhuang <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-18vfs: add rcu argument to ->get_acl() callbackMiklos Szeredi36-37/+88
Add a rcu argument to the ->get_acl() callback to allow get_cached_acl_rcu() to call the ->get_acl() method in the next patch. Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-18mm/swap: consider max pages in iomap_swapfile_add_extentXu Yu1-0/+6
When the max pages (last_page in the swap header + 1) is smaller than the total pages (inode size) of the swapfile, iomap_swapfile_activate overwrites sis->max with total pages. However, frontswap_map is a swap page state bitmap allocated using the initial sis->max page count read from the swap header. If swapfile activation increases sis->max, it's possible for the frontswap code to walk off the end of the bitmap, thereby corrupting kernel memory. [djwong: modify the description a bit; the original paragraph reads: "However, frontswap_map is allocated using max pages. When test and clear the sis offset, which is larger than max pages, of frontswap_map in __frontswap_invalidate_page(), neighbors of frontswap_map may be overwritten, i.e., slab is polluted." Note also that this bug resulted in a behavioral change: activating a swap file that was formatted and later extended results in all pages being activated, not the number of pages recorded in the swap header.] This fixes the issue by considering the limitation of max pages of swap info in iomap_swapfile_add_extent(). To reproduce the case, compile kernel with slub RED ZONE, then run test: $ sudo stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times \ --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \ --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \ --temp-path /root/tmpdir/stress-ng/ We'll get the error log as below: [ 1151.015141] ============================================================================= [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten [ 1151.017486] ----------------------------------------------------------------------------- [ 1151.017486] [ 1151.018997] Disabling lock debugging due to kernel taint [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816 [ 1151.022538] __slab_alloc+0xe/0x20 [ 1151.023069] __kmalloc_node+0xfd/0x4b0 [ 1151.023704] __do_sys_swapon+0xcf6/0x1170 [ 1151.024346] do_syscall_64+0x33/0x40 [ 1151.024925] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041 [ 1151.026889] kfree+0x276/0x2b0 [ 1151.027405] put_cred_rcu+0xa1/0xc0 [ 1151.027949] rcu_do_batch+0x17d/0x410 [ 1151.028566] rcu_core+0x14e/0x2b0 [ 1151.029084] __do_softirq+0x101/0x29e [ 1151.029645] asm_call_irq_on_stack+0x12/0x20 [ 1151.030381] do_softirq_own_stack+0x37/0x40 [ 1151.031037] do_softirq.part.15+0x2b/0x30 [ 1151.031710] __local_bh_enable_ip+0x4b/0x50 [ 1151.032412] copy_fpstate_to_sigframe+0x111/0x360 [ 1151.033197] __setup_rt_frame+0xce/0x480 [ 1151.033809] arch_do_signal+0x1a3/0x250 [ 1151.034463] exit_to_user_mode_prepare+0xcf/0x110 [ 1151.035242] syscall_exit_to_user_mode+0x27/0x190 [ 1151.035970] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201 [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000 [ 1151.038323] [ 1151.039683] Redzone 000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ [ 1151.041180] Object 000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 1151.042714] Redzone 0000000084e43932: 00 00 00 c0 cc cc cc cc ........ [ 1151.044120] Padding 000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G B 5.10.50+ #7 [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 1151.048633] Call Trace: [ 1151.049072] dump_stack+0x57/0x6a [ 1151.049585] check_bytes_and_report+0xed/0x110 [ 1151.050320] check_object+0x1eb/0x290 [ 1151.050924] ? __x64_sys_swapoff+0x39a/0x540 [ 1151.051646] free_debug_processing+0x151/0x350 [ 1151.052333] __slab_free+0x21a/0x3a0 [ 1151.052938] ? _cond_resched+0x2d/0x40 [ 1151.053529] ? __vunmap+0x1de/0x220 [ 1151.054139] ? __x64_sys_swapoff+0x39a/0x540 [ 1151.054796] ? kfree+0x276/0x2b0 [ 1151.055307] kfree+0x276/0x2b0 [ 1151.055832] __x64_sys_swapoff+0x39a/0x540 [ 1151.056466] do_syscall_64+0x33/0x40 [ 1151.057084] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1151.057866] RIP: 0033:0x150340b0ffb7 [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d. [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8 [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7 [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860 [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094 [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc [ 1151.066413] [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed Fixes: 67482129cdab ("iomap: add a swapfile activation function") Fixes: a45c0eccc564 ("iomap: move the swapfile code into a separate file") Signed-off-by: Gang Deng <[email protected]> Signed-off-by: Xu Yu <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2021-08-18Merge tag 'for-5.14-rc6-tag' of ↵Linus Torvalds1-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more fix for cross-rename, adding a missing check for directory and subvolume, this could lead to a crash" * tag 'for-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: prevent rename2 from exchanging a subvol with a directory from different parents
2021-08-18pipe: avoid unnecessary EPOLLET wakeups under normal loadsLinus Torvalds1-6/+9
I had forgotten just how sensitive hackbench is to extra pipe wakeups, and commit 3a34b13a88ca ("pipe: make pipe writes always wake up readers") ended up causing a quite noticeable regression on larger machines. Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's not clear that this matters in real life all that much, but as Mel points out, it's used often enough when comparing kernels and so the performance regression shows up like a sore thumb. It's easy enough to fix at least for the common cases where pipes are used purely for data transfer, and you never have any exciting poll usage at all. So set a special 'poll_usage' flag when there is polling activity, and make the ugly "EPOLLET has crazy legacy expectations" semantics explicit to only that case. I would love to limit it to just the broken EPOLLET case, but the pipe code can't see the difference between epoll and regular select/poll, so any non-read/write waiting will trigger the extra wakeup behavior. That is sufficient for at least the hackbench case. Apart from making the odd extra wakeup cases more explicitly about EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe write, not at the first write chunk. That is actually much saner semantics (as much as you can call any of the legacy edge-triggered expectations for EPOLLET "sane") since it means that you know the wakeup will happen once the write is done, rather than possibly in the middle of one. [ For stable people: I'm putting a "Fixes" tag on this, but I leave it up to you to decide whether you actually want to backport it or not. It likely has no impact outside of synthetic benchmarks - Linus ] Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/ Fixes: 3a34b13a88ca ("pipe: make pipe writes always wake up readers") Reported-by: kernel test robot <[email protected]> Tested-by: Sandeep Patil <[email protected]> Tested-by: Mel Gorman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-08-19erofs: add fiemap support with iomapGao Xiang5-1/+59
This adds fiemap support for both uncompressed files and compressed files by using iomap infrastructure. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]>
2021-08-19erofs: add support for the full decompressed lengthGao Xiang2-8/+92
Previously, there is no need to get the full decompressed length since EROFS supports partial decompression. However for some other cases such as fiemap, the full decompressed length is necessary for iomap to make it work properly. This patch adds a way to get the full decompressed length. Note that it takes more metadata overhead and it'd be avoided if possible in the performance sensitive scenario. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]>
2021-08-18block: Introduce IOPRIO_NR_LEVELSDamien Le Moal1-1/+1
The BFQ scheduler and ioprio_check_cap() both assume that the RT priority class (IOPRIO_CLASS_RT) can have up to 8 different priority levels, similarly to the BE class (IOPRIO_CLASS_iBE). This is controlled using the IOPRIO_BE_NR macro , which is badly named as the number of levels also applies to the RT class. Introduce the class independent IOPRIO_NR_LEVELS macro, defined to 8, to make things clear. Keep the old IOPRIO_BE_NR macro definition as an alias for IOPRIO_NR_LEVELS. Signed-off-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-08-17io_uring: pin ctx on fallback executionPavel Begunkov1-0/+2
Pin ring in io_fallback_req_func() by briefly elevating ctx->refs in case any task_work handler touches ctx after releasing a request. Fixes: 9011bf9a13e3b ("io_uring: fix stuck fallback reqs") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/833a494713d235ec144284a9bbfe418df4f6b61c.1629235576.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-08-17fuse: truncate pagecache on atomic_o_truncMiklos Szeredi1-2/+5
fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic O_TRUNC. This can deadlock with fuse_wait_on_page_writeback() in fuse_launder_page() triggered by invalidate_inode_pages2(). Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a truncate_pagecache() call. This makes sense regardless of FOPEN_KEEP_CACHE or fc->writeback cache, so do it unconditionally. Reported-by: Xie Yongji <[email protected]> Reported-and-tested-by: [email protected] Fixes: e4648309b85a ("fuse: truncate pending writes on O_TRUNC") Cc: <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17f2fs: compress: do sanity check on clusterChao Yu3-0/+62
This patch adds f2fs_sanity_check_cluster() to support doing sanity check on cluster of compressed file, it will be triggered from below two paths: - __f2fs_cluster_blocks() - f2fs_map_blocks(F2FS_GET_BLOCK_FIEMAP) And it can detect below three kind of cluster insanity status. C: COMPRESS_ADDR N: NULL_ADDR or NEW_ADDR V: valid blkaddr *: any value 1. [*|C|*|*] 2. [C|*|C|*] 3. [C|N|N|V] Signed-off-by: Chao Yu <[email protected]> [Nathan Chancellor: fix missing inline warning] Signed-off-by: Jaegeuk Kim <[email protected]>
2021-08-17f2fs: convert S_IRUGO to 0444Yangtao Li2-5/+5
To fix: WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. Signed-off-by: Yangtao Li <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2021-08-17f2fs: fix to keep compatibility of fault injection interfaceChao Yu1-0/+1
The value of FAULT_* macros and its description in f2fs.rst became inconsistent, fix this to keep compatibility of fault injection interface. Fixes: 67883ade7a98 ("f2fs: remove FAULT_ALLOC_BIO") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2021-08-17f2fs: support fault injection for f2fs_kmem_cache_alloc()Chao Yu12-30/+58
This patch supports to inject fault into f2fs_kmem_cache_alloc(). Usage: a) echo 32768 > /sys/fs/f2fs/<dev>/inject_type or b) mount -o fault_type=32768 <dev> <mountpoint> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2021-08-17f2fs: compress: allow write compress released file after truncate to zeroFengnan Chang1-0/+8
For compressed file, after release compress blocks, don't allow write direct, but we should allow write direct after truncate to zero. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Fengnan Chang <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2021-08-17nfsd4: Fix forced-expiry lockingJ. Bruce Fields1-2/+2
This should use the network-namespace-wide client_lock, not the per-client cl_lock. You shouldn't see any bugs unless you're actually using the forced-expiry interface introduced by 89c905beccbb. Fixes: 89c905beccbb "nfsd: allow forced expiration of NFSv4 clients" Signed-off-by: J. Bruce Fields <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2021-08-17lockd: change the proc_handler for nsm_use_hostnamesJia He1-1/+1
nsm_use_hostnames is a module parameter and it will be exported to sysctl procfs. This is to let user sometimes change it from userspace. But the minimal unit for sysctl procfs read/write it sizeof(int). In big endian system, the converting from/to bool to/from int will cause error for proc items. This patch use a new proc_handler proc_dobool to fix it. Signed-off-by: Jia He <[email protected]> Reviewed-by: Pan Xinhui <[email protected]> [thuth: Fix typo in commit message] Signed-off-by: Thomas Huth <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2021-08-17NFSD: remove vanity commentsNeilBrown1-1/+0
Including one's name in copyright claims is appropriate. Including it in random comments is just vanity. After 2 decades, it is time for these to be gone. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2021-08-17lockd: Fix invalid lockowner cast after vfs_test_lockBenjamin Coddington1-1/+1
After calling vfs_test_lock() the pointer to a conflicting lock can be returned, and that lock is not guarunteed to be owned by nlm. In that case, we cannot cast it to struct nlm_lockowner. Instead return the pid of that conflicting lock. Fixes: 646d73e91b42 ("lockd: Show pid of lockd for remote locks") Signed-off-by: Benjamin Coddington <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2021-08-17NFSD: Use new __string_len C macros for nfsd_clid_classChuck Lever1-3/+2
Clean up. Signed-off-by: Chuck Lever <[email protected]>
2021-08-17NFSD: Use new __string_len C macros for the nfs_dirent tracepointChuck Lever1-7/+5
Clean up. Signed-off-by: Chuck Lever <[email protected]>
2021-08-17NFSD: Batch release pages during splice readChuck Lever1-7/+2
Large splice reads call put_page() repeatedly. put_page() is relatively expensive to call, so replace it with the new svc_rqst_replace_page() helper to help amortize that cost. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: NeilBrown <[email protected]>
2021-08-17NFSD: Clean up splice actorChuck Lever1-8/+3
A few useful observations: - The value in @size is never modified. - splice_desc.len is an unsigned int, and so is xdr_buf.page_len. An implicit cast to size_t is unnecessary. - The computation of .page_len is the same in all three arms of the "if" statement, so hoist it out to make it clear that the operation is an unconditional invariant. The resulting function is 18 bytes shorter on my system (-Os). Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: NeilBrown <[email protected]>
2021-08-17ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()chenying1-2/+4
If function ovl_instantiate() returns an error, ovl_cleanup will be called and try to remove newdentry from wdir, but the newdentry has been moved to udir at this time. This will causes BUG_ON(victim->d_parent->d_inode != dir) in fs/namei.c:may_delete. Signed-off-by: chenying <[email protected]> Fixes: 01b39dcc9568 ("ovl: use inode_insert5() to hash a newly created inode") Link: https://lore.kernel.org/linux-unionfs/[email protected]/ Cc: <[email protected]> # v4.18 Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: use kvalloc in xattr copy-upMiklos Szeredi1-4/+5
Extended attributes are usually small, but could be up to 64k in size, so use the most efficient method for doing the allocation. Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: update ctime when changing fileattrChengguang Xu1-0/+3
Currently we keep size, mode and times of overlay inode as the same as upper inode, so should update ctime when changing file attribution as well. Signed-off-by: Chengguang Xu <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: skip checking lower file's i_writecount on truncateChengguang Xu1-6/+0
It is possible that a directory tree is shared between multiple overlay instances as a lower layer. In this case when one instance executes a file residing on the lower layer, the other instance denies a truncate(2) call on this file. This only happens for truncate(2) and not for open(2) with the O_TRUNC flag. Fix this interference and inconsistency by removing the preliminary i_writecount check before copy-up. This means that unlike on normal filesystems truncate(argv[0]) will now succeed. If this ever causes a regression in a real world use case this needs to be revisited. One way to fix this properly would be to keep a correct i_writecount in the overlay inode, but that is difficult due to memory mapping code only dealing with the real file/inode. Signed-off-by: Chengguang Xu <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: relax lookup error on mismatch origin ftypeAmir Goldstein1-1/+1
We get occasional reports of lookup errors due to mismatched origin ftype from users that re-format a lower squashfs image. Commit 13c6ad0f45fd ("ovl: document lower modification caveats") tries to discourage the practice of re-formating lower layers and describes the expected behavior as undefined. Commit b0e0f69731cd ("ovl: restrict lower null uuid for "xino=auto"") limits the configurations in which origin file handles are followed. In addition to these measures, change the behavior in case of detecting a mismatch origin ftype in lookup to issue a warning, not follow origin, but not fail the lookup operation either. That should make overall more users happy without any big consequences. Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxgPq9E9xxwU2CDyHy-_yCZZeymg+3n+-6AqkGGE1YtwvQ@mail.gmail.com/ Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: do not set overlay.opaque for new directoriesVyacheslav Yurkov1-1/+3
Enable optimizations only if user opted-in for any of extended features. If optimization is enabled, it breaks existing use case when a lower layer directory appears after directory was created on a merged layer. If overlay.opaque is applied, new files on lower layer are not visible. Consider the following scenario: - /lower and /upper are mounted to /merged - directory /merged/new-dir is created with a file test1 - overlay is unmounted - directory /lower/new-dir is created with a file test2 - overlay is mounted again If opaque is applied by default, file test2 is not going to be visible without explicitly clearing the overlay.opaque attribute Signed-off-by: Vyacheslav Yurkov <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: add ovl_allow_offline_changes() helperVyacheslav Yurkov2-3/+13
Allows to check whether any of extended features are enabled Signed-off-by: Vyacheslav Yurkov <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: disable decoding null uuid with redirect_dirVyacheslav Yurkov1-1/+1
Currently decoding origin with lower null uuid is not allowed unless user opted-in to one of the new features that require following the lower inode of non-dir upper (index, xino, metacopy). Now we add redirect_dir too to that feature list. Signed-off-by: Vyacheslav Yurkov <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: consistent behavior for immutable/append-only inodesAmir Goldstein4-7/+158
When a lower file has immutable/append-only fileattr flags, the behavior of overlayfs post copy up is inconsistent. Immediattely after copy up, ovl inode still has the S_IMMUTABLE/S_APPEND inode flags copied from lower inode, so vfs code still treats the ovl inode as immutable/append-only. After ovl inode evict or mount cycle, the ovl inode does not have these inode flags anymore. We cannot copy up the immutable and append-only fileattr flags, because immutable/append-only inodes cannot be linked and because overlayfs will not be able to set overlay.* xattr on the upper inodes. Instead, if any of the fileattr flags of interest exist on the lower inode, we store them in overlay.protattr xattr on the upper inode and we read the flags from xattr on lookup and on fileattr_get(). This gives consistent behavior post copy up regardless of inode eviction from cache. When user sets new fileattr flags, we update or remove the overlay.protattr xattr. Storing immutable/append-only fileattr flags in an xattr instead of upper fileattr also solves other non-standard behavior issues - overlayfs can now copy up children of "ovl-immutable" directories and lower aliases of "ovl-immutable" hardlinks. Reported-by: Chengguang Xu <[email protected]> Link: https://lore.kernel.org/linux-unionfs/[email protected]/ Link: https://lore.kernel.org/linux-unionfs/[email protected]/ Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: copy up sync/noatime fileattr flagsAmir Goldstein3-21/+89
When a lower file has sync/noatime fileattr flags, the behavior of overlayfs post copy up is inconsistent. Immediately after copy up, ovl inode still has the S_SYNC/S_NOATIME inode flags copied from lower inode, so vfs code still treats the ovl inode as sync/noatime. After ovl inode evict or mount cycle, the ovl inode does not have these inode flags anymore. To fix this inconsistency, try to copy the fileattr flags on copy up if the upper fs supports the fileattr_set() method. This gives consistent behavior post copy up regardless of inode eviction from cache. We cannot copy up the immutable/append-only inode flags in a similar manner, because immutable/append-only inodes cannot be linked and because overlayfs will not be able to set overlay.* xattr on the upper inodes. Those flags will be addressed by a followup patch. Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17ovl: pass ovl_fs to ovl_check_setxattr()Amir Goldstein5-15/+16
Instead of passing the overlay dentry. Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-17fs: add generic helper for filling statx attribute flagsAmir Goldstein2-6/+19
The immutable and append-only properties on an inode are published on the inode's i_flags and enforced by the VFS. Create a helper to fill the corresponding STATX_ATTR_ flags in the kstat structure from the inode's i_flags. Only orange was converted to use this helper. Other filesystems could use it in the future. Suggested-by: Miklos Szeredi <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2021-08-16iomap: move loop control code to iter.cDarrick J. Wong2-1/+1
Now that we've moved iomap to the iterator model, rename this file to be in sync with the functions contained inside of it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2021-08-16iomap: constify iomap_iter_srcmapChristoph Hellwig1-19/+19
The srcmap returned from iomap_iter_srcmap is never modified, so mark the iomap returned from it const and constify a lot of code that never modifies the iomap. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16fsdax: switch the fault handlers to use iomap_iterChristoph Hellwig1-118/+75
Avoid the open coded calls to ->iomap_begin and ->iomap_end and call iomap_iter instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16fsdax: factor out a dax_fault_actor() helperShiyang Ruan1-148/+149
The core logic in the two dax page fault functions is similar. So, move the logic into a common helper function. Also, to facilitate the addition of new features, such as CoW, switch-case is no longer used to handle different iomap types. Signed-off-by: Shiyang Ruan <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Ritesh Harjani <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16fsdax: factor out helpers to simplify the dax fault codeShiyang Ruan1-69/+84
The dax page fault code is too long and a bit difficult to read. And it is hard to understand when we trying to add new features. Some of the PTE/PMD codes have similar logic. So, factor out helper functions to simplify the code. Signed-off-by: Shiyang Ruan <[email protected]> Reviewed-by: Ritesh Harjani <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> [hch: minor cleanups] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16iomap: rework unshare flagChristoph Hellwig1-14/+9
Instead of another internal flags namespace inside of buffered-io.c, just pass a UNSHARE hint in the main iomap flags field. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16iomap: pass an iomap_iter to various buffered I/O helpersChristoph Hellwig1-71/+66
Pass the iomap_iter structure instead of individual parameters to various internal helpers for buffered I/O. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16iomap: remove iomap_applyChristoph Hellwig2-131/+0
iomap_apply is unused now, so remove it. Signed-off-by: Christoph Hellwig <[email protected]> [djwong: rebase this patch to preserve git history of iomap loop control] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2021-08-16fsdax: switch dax_iomap_rw to use iomap_iterChristoph Hellwig1-25/+24
Switch the dax_iomap_rw implementation to use iomap_iter. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16iomap: switch iomap_swapfile_activate to use iomap_iterChristoph Hellwig1-22/+16
Switch iomap_swapfile_activate to use iomap_iter. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16iomap: switch iomap_seek_data to use iomap_iterChristoph Hellwig1-23/+24
Rewrite iomap_seek_data to use iomap_iter. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16iomap: switch iomap_seek_hole to use iomap_iterChristoph Hellwig1-25/+26
Rewrite iomap_seek_hole to use iomap_iter. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-08-16iomap: switch iomap_bmap to use iomap_iterChristoph Hellwig1-18/+13
Rewrite the ->bmap implementation based on iomap_iter. Signed-off-by: Christoph Hellwig <[email protected]> [djwong: restructure the loop to make its behavior a little clearer] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>