aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2023-08-24ceph: wait for OSD requests' callbacks to finish when unmountingXiubo Li3-0/+23
The sync_filesystem() will flush all the dirty buffer and submit the osd reqs to the osdc and then is blocked to wait for all the reqs to finish. But the when the reqs' replies come, the reqs will be removed from osdc just before the req->r_callback()s are called. Which means the sync_filesystem() will be woke up by leaving the req->r_callback()s are still running. This will be buggy when the waiter require the req->r_callback()s to release some resources before continuing. So we need to make sure the req->r_callback()s are called before removing the reqs from the osdc. WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0 CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1 Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015 RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0 RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202 RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00 RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000 RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000 R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40 R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000 FS: 00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> generic_shutdown_super+0x47/0x120 kill_anon_super+0x14/0x30 ceph_kill_sb+0x36/0x90 [ceph] deactivate_locked_super+0x29/0x60 cleanup_mnt+0xb8/0x140 task_work_run+0x67/0xb0 exit_to_user_mode_prepare+0x23d/0x240 syscall_exit_to_user_mode+0x25/0x60 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fd83dc39e9b We need to increase the blocker counter to make sure all the osd requests' callbacks have been finished just before calling the kill_anon_super() when unmounting. Link: https://tracker.ceph.com/issues/58126 Signed-off-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: drop messages from MDS when unmountingXiubo Li7-22/+109
When unmounting all the dirty buffers will be flushed and after the last osd request is finished the last reference of the i_count will be released. Then it will flush the dirty cap/snap to MDSs, and the unmounting won't wait the possible acks, which will ihold the inodes when updating the metadata locally but makes no sense any more, of this. This will make the evict_inodes() to skip these inodes. If encrypt is enabled the kernel generate a warning when removing the encrypt keys when the skipped inodes still hold the keyring: WARNING: CPU: 4 PID: 168846 at fs/crypto/keyring.c:242 fscrypt_destroy_keyring+0x7e/0xd0 CPU: 4 PID: 168846 Comm: umount Tainted: G S 6.1.0-rc5-ceph-g72ead199864c #1 Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015 RIP: 0010:fscrypt_destroy_keyring+0x7e/0xd0 RSP: 0018:ffffc9000b277e28 EFLAGS: 00010202 RAX: 0000000000000002 RBX: ffff88810d52ac00 RCX: ffff88810b56aa00 RDX: 0000000080000000 RSI: ffffffff822f3a09 RDI: ffff888108f59000 RBP: ffff8881d394fb88 R08: 0000000000000028 R09: 0000000000000000 R10: 0000000000000001 R11: 11ff4fe6834fcd91 R12: ffff8881d394fc40 R13: ffff888108f59000 R14: ffff8881d394f800 R15: 0000000000000000 FS: 00007fd83f6f1080(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f918d417000 CR3: 000000017f89a005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> generic_shutdown_super+0x47/0x120 kill_anon_super+0x14/0x30 ceph_kill_sb+0x36/0x90 [ceph] deactivate_locked_super+0x29/0x60 cleanup_mnt+0xb8/0x140 task_work_run+0x67/0xb0 exit_to_user_mode_prepare+0x23d/0x240 syscall_exit_to_user_mode+0x25/0x60 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fd83dc39e9b Later the kernel will crash when iput() the inodes and dereferencing the "sb->s_master_keys", which has been released by the generic_shutdown_super(). Link: https://tracker.ceph.com/issues/59162 Signed-off-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: prevent snapshot creation in encrypted locked directoriesLuís Henriques1-0/+5
With snapshot names encryption we can not allow snapshots to be created in locked directories because the names wouldn't be encrypted. This patch forces the directory to be unlocked to allow a snapshot to be created. Signed-off-by: Luís Henriques <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: add support for encrypted snapshot namesLuís Henriques3-41/+201
Since filenames in encrypted directories are encrypted and shown as a base64-encoded string when the directory is locked, make snapshot names show a similar behaviour. When creating a snapshot, .snap directories for every subdirectory will show the snapshot name in the "long format": # mkdir .snap/my-snap # ls my-dir/.snap/ _my-snap_1099511627782 Encrypted snapshots will need to be able to handle these by encrypting/decrypting only the snapshot part of the string ('my-snap'). Also, since the MDS prevents snapshot names to be bigger than 240 characters it is necessary to adapt CEPH_NOHASH_NAME_MAX to accommodate this extra limitation. [ idryomov: drop const on !CONFIG_FS_ENCRYPTION branch too ] Signed-off-by: Luís Henriques <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: invalidate pages when doing direct/sync writesLuís Henriques1-5/+14
When doing a direct/sync write, we need to invalidate the page cache in the range being written to. If we don't do this, the cache will include invalid data as we just did a write that avoided the page cache. In the event that invalidation fails, just ignore the error. That likely just means that we raced with another task doing a buffered write, in which case we want to leave the page intact anyway. [ jlayton: minor comment update ] Signed-off-by: Luís Henriques <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: plumb in decryption during readsJeff Layton2-34/+125
Force the use of sparse reads when the inode is encrypted, and add the appropriate code to decrypt the extent map after receiving. Note that the crypto block may be smaller than a page, but the reverse cannot be true. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: add encryption support to writepage and writepagesJeff Layton2-21/+98
Allow writepage to issue encrypted writes. Extend out the requested size and offset to cover complete blocks, and then encrypt and write them to the OSDs. Add the appropriate machinery to write back dirty data with encryption. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: add read/modify/write to ceph_sync_writeJeff Layton1-28/+290
When doing a synchronous write on an encrypted inode, we have no guarantee that the caller is writing crypto block-aligned data. When that happens, we must do a read/modify/write cycle. First, expand the range to cover complete blocks. If we had to change the original pos or length, issue a read to fill the first and/or last pages, and fetch the version of the object from the result. We then copy data into the pages as usual, encrypt the result and issue a write prefixed by an assertion that the version hasn't changed. If it has changed then we restart the whole thing again. If there is no object at that position in the file (-ENOENT), we prefix the write on an exclusive create of the object instead. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: align data in pages in ceph_sync_writeJeff Layton1-10/+10
Encrypted files will need to be dealt with in block-sized chunks and once we do that, the way that ceph_sync_write aligns the data in the bounce buffer won't be acceptable. Change it to align the data the same way it would be aligned in the pagecache. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: don't use special DIO path for encrypted inodesJeff Layton1-2/+4
Eventually I want to merge the synchronous and direct read codepaths, possibly via new netfs infrastructure. For now, the direct path is not crypto-enabled, so use the sync read/write paths instead. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: add truncate size handling support for fscryptXiubo Li4-12/+234
This will transfer the encrypted last block contents to the MDS along with the truncate request only when the new size is smaller and not aligned to the fscrypt BLOCK size. When the last block is located in the file hole, the truncate request will only contain the header. The MDS could fail to do the truncate if there has another client or process has already updated the RADOS object which contains the last block, and will return -EAGAIN, then the kclient needs to retry it. The RMW will take around 50ms, and will let it retry 20 times for now. Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: add object version support for sync readXiubo Li2-12/+35
Turn the guts of ceph_sync_read into a new helper that takes an inode and an offset instead of a kiocb struct, and make ceph_sync_read call the helper as a wrapper. Make the new helper always return the last object's version. Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: add infrastructure for file encryption and decryptionJeff Layton2-0/+251
...and allow test_dummy_encryption to bypass content encryption if mounted with test_dummy_encryption=clear. [ xiubli: remove test_dummy_encryption=clear support per Ilya ] Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: handle fscrypt fields in cap messages from MDSJeff Layton1-2/+83
Handle the new fscrypt_file and fscrypt_auth fields in cap messages. Use them to populate new fields in cap_extra_info and update the inode with those values. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: size handling in MClientRequest, cap updates and inode tracesJeff Layton6-22/+70
For encrypted inodes, transmit a rounded-up size to the MDS as the normal file size and send the real inode size in fscrypt_file field. Also, fix up creates and truncates to also transmit fscrypt_file. When we get an inode trace from the MDS, grab the fscrypt_file field if the inode is encrypted, and use it to populate the i_size field instead of the regular inode size field. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: mark directory as non-complete after loading keyLuís Henriques4-9/+46
When setting a directory's crypt context, ceph_dir_clear_complete() needs to be called otherwise if it was complete before, any existing (old) dentry will still be valid. This patch adds a wrapper around __fscrypt_prepare_readdir() which will ensure a directory is marked as non-complete if key status changes. [ xiubli: revise commit title per Milind ] Signed-off-by: Luís Henriques <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: allow encrypting a directory while not having Ax capsLuís Henriques1-1/+2
If a client doesn't have Fx caps on a directory, it will get errors while trying encrypt it: ceph: handle_cap_grant: cap grant attempt to change fscrypt_auth on non-I_NEW inode (old len 0 new len 48) fscrypt (ceph, inode 1099511627812): Error -105 getting encryption context A simple way to reproduce this is to use two clients: client1 # mkdir /mnt/mydir client2 # ls /mnt/mydir client1 # fscrypt encrypt /mnt/mydir client1 # echo hello > /mnt/mydir/world This happens because, in __ceph_setattr(), we only initialize ci->fscrypt_auth if we have Ax and ceph_fill_inode() won't use the fscrypt_auth received if the inode state isn't I_NEW. Fix it by allowing ceph_fill_inode() to also set ci->fscrypt_auth if the inode doesn't have it set already. Signed-off-by: Luís Henriques <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: add some fscrypt guardrailsJeff Layton5-9/+55
Add the appropriate calls into fscrypt for various actions, including link, rename, setattr, and the open codepaths. Disable fallocate for encrypted inodes -- hopefully, just for now. If we have an encrypted inode, then the client will need to re-encrypt the contents of the new object. Disable copy offload to or from encrypted inodes. Set i_blkbits to crypto block size for encrypted inodes -- some of the underlying infrastructure for fscrypt relies on i_blkbits being aligned to crypto blocksize. Report STATX_ATTR_ENCRYPTED on encrypted inodes. [ lhenriques: forbid encryption with striped layouts ] Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: create symlinks with encrypted and base64-encoded targetsJeff Layton2-17/+150
When creating symlinks in encrypted directories, encrypt and base64-encode the target with the new inode's key before sending to the MDS. When filling a symlinked inode, base64-decode it into a buffer that we'll keep in ci->i_symlink. When get_link is called, decrypt the buffer into a new one that will hang off i_link. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: add support to readdir for encrypted namesXiubo Li6-26/+144
To make it simpler to decrypt names in a readdir reply (i.e. before we have a dentry), add a new ceph_encode_encrypted_fname()-like helper that takes a qstr pointer instead of a dentry pointer. Once we've decrypted the names in a readdir reply, we no longer need the crypttext, so overwrite them in ceph_mds_reply_dir_entry with the unencrypted names. Then in both ceph_readdir_prepopulate() and ceph_readdir() we will use the dencrypted name directly. [ jlayton: convert some BUG_ONs into error returns ] Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: pass the request to parse_reply_info_readdir()Xiubo Li1-10/+13
Instead of passing just the r_reply_info to the readdir reply parser, pass the request pointer directly instead. This will facilitate implementing readdir on fscrypted directories. Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: make ceph_fill_trace and ceph_get_name decrypt namesJeff Layton2-14/+60
When we get a dentry in a trace, decrypt the name so we can properly instantiate the dentry or fill out ceph_get_name() buffer. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: add helpers for converting names for userland presentationJeff Layton2-0/+123
Define a new ceph_fname struct that we can use to carry information about encrypted dentry names. Add helpers for working with these objects, including ceph_fname_to_usr which formats an encrypted filename for userland presentation. [ xiubli: fix resulting name length check -- neither name_len nor ctext_len should exceed NAME_MAX ] Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: make d_revalidate call fscrypt revalidator for encrypted dentriesJeff Layton1-2/+7
If we have a dentry which represents a no-key name, then we need to test whether the parent directory's encryption key has since been added. Do that before we test anything else about the dentry. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: set DCACHE_NOKEY_NAME flag in ceph_lookup/atomic_open()Jeff Layton2-0/+18
This is required so that we know to invalidate these dentries when the directory is unlocked. Atomic open can act as a lookup if handed a dentry that is negative on the MDS. Ensure that we set DCACHE_NOKEY_NAME on the dentry in atomic_open, if we don't have the key for the parent. Otherwise, we can end up validating the dentry inappropriately if someone later adds a key. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: decode alternate_name in lease infoJeff Layton2-10/+38
Ceph is a bit different from local filesystems, in that we don't want to store filenames as raw binary data, since we may also be dealing with clients that don't support fscrypt. We could just base64-encode the encrypted filenames, but that could leave us with filenames longer than NAME_MAX. It turns out that the MDS doesn't care much about filename length, but the clients do. To manage this, we've added a new "alternate name" field that can be optionally added to any dentry that we'll use to store the binary crypttext of the filename if its base64-encoded value will be longer than NAME_MAX. When a dentry has one of these names attached, the MDS will send it along in the lease info, which we can then store for later usage. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: send alternate_name in MClientRequestJeff Layton2-5/+74
In the event that we have a filename longer than CEPH_NOHASH_NAME_MAX, we'll need to hash the tail of the filename. The client however will still need to know the full name of the file if it has a key. To support this, the MClientRequest field has grown a new alternate_name field that we populate with the full (binary) crypttext of the filename. This is then transmitted to the clients in readdir or traces as part of the dentry lease. Add support for populating this field when the filenames are very long. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-24ceph: encode encrypted name in ceph_mdsc_build_path and dentry releaseJeff Layton5-25/+172
Allow ceph_mdsc_build_path to encrypt and base64 encode the filename when the parent is encrypted and we're sending the path to the MDS. In a similar fashion, encode encrypted dentry names if including a dentry release in a request. In most cases, we just encrypt the filenames and base64 encode them, but when the name is longer than CEPH_NOHASH_NAME_MAX, we use a similar scheme to fscrypt proper, and hash the remaning bits with sha256. When doing this, we then send along the full crypttext of the name in the new alternate_name field of the MClientRequest. The MDS can then send that along in readdir responses and traces. [ idryomov: drop duplicate include reported by Abaci Robot ] Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-and-tested-by: Luís Henriques <[email protected]> Reviewed-by: Milind Changire <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2023-08-23riscv: support the elf-fdpic binfmt loaderGreg Ungerer1-1/+1
Add support for enabling and using the binfmt_elf_fdpic program loader on RISC-V platforms. The most important change is to setup registers during program load to pass the mapping addresses to the new process. One of the interesting features of the elf-fdpic loader is that it also allows appropriately compiled ELF format binaries to be loaded on nommu systems. Appropriate being those compiled with -pie. Signed-off-by: Greg Ungerer <[email protected]> Acked-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2023-08-23binfmt_elf_fdpic: support 64-bit systemsGreg Ungerer1-19/+19
The binfmt_flat_fdpic code has a number of 32-bit specific data structures associated with it. Extend it to be able to support and be used on 64-bit systems as well. The new code defines a number of key 64-bit variants of the core elf-fdpic data structures - along side the existing 32-bit sized ones. A common set of generic named structures are defined to be either the 32-bit or 64-bit ones as required at compile time. This is a similar technique to that used in the ELF binfmt loader. For example: elf_fdpic_loadseg is either elf32_fdpic_loadseg or elf64_fdpic_loadseg elf_fdpic_loadmap is either elf32_fdpic_loadmap or elf64_fdpic_loadmap the choice based on ELFCLASS32 or ELFCLASS64. Signed-off-by: Greg Ungerer <[email protected]> Acked-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2023-08-23NFS: Enable the READ_PLUS operation by defaultAnna Schumaker1-4/+2
Now that the remaining issues have been worked out, including some unexpected 32 bit issues, we can safely enable the feature by default. Signed-off-by: Anna Schumaker <[email protected]>
2023-08-23NFSv4.2: Rework scratch handling for READ_PLUS (again)Anna Schumaker5-13/+14
I found that the read code might send multiple requests using the same nfs_pgio_header, but nfs4_proc_read_setup() is only called once. This is how we ended up occasionally double-freeing the scratch buffer, but also means we set a NULL pointer but non-zero length to the xdr scratch buffer. This results in an oops the first time decoding needs to copy something to scratch, which frequently happens when decoding READ_PLUS hole segments. I fix this by moving scratch handling into the pageio read code. I provide a function to allocate scratch space for decoding read replies, and free the scratch buffer when the nfs_pgio_header is freed. Fixes: fbd2a05f29a9 (NFSv4.2: Rework scratch handling for READ_PLUS) Signed-off-by: Anna Schumaker <[email protected]>
2023-08-23NFSv4.2: Fix READ_PLUS size calculationsAnna Schumaker1-3/+9
I bump the decode_read_plus_maxsz to account for hole segments, but I need to subtract out this increase when calling rpc_prepare_reply_pages() so the common case of single data segment replies can be directly placed into the xdr pages without needing to be shifted around. Reported-by: Chuck Lever <[email protected]> Fixes: d3b00a802c845 ("NFS: Replace the READ_PLUS decoding code") Signed-off-by: Anna Schumaker <[email protected]>
2023-08-23NFSv4.2: Fix READ_PLUS smatch warningsAnna Schumaker1-2/+1
Smatch reports: fs/nfs/nfs42xdr.c:1131 decode_read_plus() warn: missing error code? 'status' Which Dan suggests to fix by doing a hardcoded "return 0" from the "if (segments == 0)" check. Additionally, smatch reports that the "status = -EIO" assignment is not used. This patch addresses both these issues. Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Fixes: d3b00a802c845 ("NFS: Replace the READ_PLUS decoding code") Signed-off-by: Anna Schumaker <[email protected]>
2023-08-23f2fs: compress: fix to assign compress_level for lz4 correctlyChao Yu1-1/+1
After remount, F2FS_OPTION().compress_level was assgin to LZ4HC_DEFAULT_CLEVEL incorrectly, result in lz4hc:9 was enabled, fix it. 1. mount /dev/vdb /dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4,compress_log_size=2,...) 2. mount -t f2fs -o remount,compress_log_size=3 /mnt/f2fs/ 3. mount|grep f2fs /dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4:9,compress_log_size=3,...) Fixes: 00e120b5e4b5 ("f2fs: assign default compression level") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-08-23f2fs: fix error path of f2fs_submit_page_read()Chao Yu1-0/+3
In error path of f2fs_submit_page_read(), it missed to call iostat_update_and_unbind_ctx() and free bio_post_read_ctx, fix it. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-08-23f2fs: clean up error handling in sanity_check_{compress_,}inode()Chao Yu1-19/+4
In sanity_check_{compress_,}inode(), it doesn't need to set SBI_NEED_FSCK in each error case, instead, we can set the flag in do_read_inode() only once when sanity_check_inode() fails. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2023-08-23erofs: release ztailpacking pclusters properlyJingbo Xu1-1/+4
Currently ztailpacking pclusters are chained with FOLLOWED_NOINPLACE and not recorded into the managed_pslots XArray. After commit 7674a42f35ea ("erofs: use struct lockref to replace handcrafted approach"), ztailpacking pclusters won't be freed with erofs_workgroup_put() anymore, which will cause the following issue: BUG erofs_pcluster-1 (Tainted: G OE ): Objects remaining in erofs_pcluster-1 on __kmem_cache_shutdown() Use z_erofs_free_pcluster() directly to free ztailpacking pclusters. Fixes: 7674a42f35ea ("erofs: use struct lockref to replace handcrafted approach") Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2023-08-23erofs: don't warn dedupe and fragments features anymoresunshijie1-4/+0
The `dedupe` and `fragments` features have been merged for a year. They are mostly stable now. Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: sunshijie <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2023-08-23erofs: adapt folios for z_erofs_read_folio()Gao Xiang1-5/+4
It's a straight-forward conversion and no logic changes (except that it renames the corresponding tracepoint.) Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-23erofs: adapt folios for z_erofs_readahead()Gao Xiang1-17/+15
It's a straight-forward conversion except that readahead_folio() will do folio_put() in advance but it doesn't matter since folios are still locked. As before, since file-backed folios (pages for now) are locked, so we could temporarily use folio->private as an internal counter to indicate split parts of each folio for the corresponding pclusters to decompress. When such counter becomes zero, the folio will be finally unlocked (see compress.h and z_erofs_onlinepage_endio()). Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-23erofs: get rid of fe->backmost for cache decompressionGao Xiang1-5/+2
EROFS_MAP_FULL_MAPPED is more accurate to decide if caching the last incomplete pcluster for later read or not. Reviewed-by: Yue Hu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-23erofs: drop z_erofs_page_mark_eio()Gao Xiang1-20/+9
It can be folded into z_erofs_onlinepage_endio() to simplify the code. Reviewed-by: Yue Hu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-23erofs: tidy up z_erofs_do_read_page()Gao Xiang1-29/+24
- Fix a typo: spiltted => split; - Move !EROFS_MAP_MAPPED and EROFS_MAP_FRAGMENT upwards; - Increase `split` in advance to avoid unnecessary repeats. Reviewed-by: Yue Hu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-23erofs: move preparation logic into z_erofs_pcluster_begin()Gao Xiang1-33/+27
Some preparation logic should be part of z_erofs_pcluster_begin() instead of z_erofs_do_read_page(). Let's move now. Reviewed-by: Yue Hu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-23erofs: avoid obsolete {collector,collection} termsGao Xiang1-21/+18
{collector,collection} were once reserved in order to indicate different runtime logical extent instance of multi-reference pclusters. However, de-duplicated decompression has been landed in a more flexable way, thus `struct z_erofs_collection` was formally removed in commit 87ca34a7065d ("erofs: get rid of `struct z_erofs_collection'"). Let's handle the remaining leftovers, for example: `z_erofs_collector_begin` => `z_erofs_pcluster_begin` `z_erofs_collector_end` => `z_erofs_pcluster_end` as well as some comments. No logic changes. Reviewed-by: Yue Hu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-23erofs: simplify z_erofs_read_fragment()Gao Xiang1-26/+13
A trivial cleanup to make the fragment handling logic more clear. Reviewed-by: Yue Hu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-23erofs: remove redundant erofs_fs_type declaration in super.cFerry Meng1-1/+0
As erofs_fs_type has been declared in internal.h, there is no use to declare repeatedly in super.c. Signed-off-by: Ferry Meng <[email protected]> Reviewed-by: Gao Xiang <[email protected]> eviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2023-08-23erofs: add necessary kmem_cache_create flags for erofs inode cacheFerry Meng1-3/+3
To improve memory access efficiency and enable statistics functionality, add SLAB_MEM_SPREAD and SLAB_ACCOUNT flag during erofs_inode_cachep's allocation time. Signed-off-by: Ferry Meng <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2023-08-23erofs: clean up redundant comment and adjust code alignmentFerry Meng1-18/+4
Remove some redundant comments in erofs/super.c, and avoid unncessary line breaks for cleanup. Signed-off-by: Ferry Meng <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>