diff options
author | Aleksa Sarai <cyphar@cyphar.com> | 2024-08-28 20:19:42 +1000 |
---|---|---|
committer | Christian Brauner <brauner@kernel.org> | 2024-09-05 11:37:45 +0200 |
commit | b4fef22c2fb97fa204f0c99c7c7f1c6b422ef0aa (patch) | |
tree | 31943c93efe248b645a17b9ecda6f4017ef9f32f /fs/fhandle.c | |
parent | 5c40e050e6ac0218af7c520095729d440cc87e6b (diff) |
uapi: explain how per-syscall AT_* flags should be allocated
Unfortunately, the way we have gone about adding new AT_* flags has
been a little messy. In the beginning, all of the AT_* flags had generic
meanings and so it made sense to share the flag bits indiscriminately.
However, we inevitably ran into syscalls that needed their own
syscall-specific flags. Due to the lack of a planned out policy, we
ended up with the following situations:
* Existing syscalls adding new features tended to use new AT_* bits,
with some effort taken to try to re-use bits for flags that were so
obviously syscall specific that they only make sense for a single
syscall (such as the AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID triplet).
Given the constraints of bitflags, this works well in practice, but
ideally (to avoid future confusion) we would plan ahead and define a
set of "per-syscall bits" ahead of time so that when allocating new
bits we don't end up with a complete mish-mash of which bits are
supposed to be per-syscall and which aren't.
* New syscalls dealt with this in several ways:
- Some syscalls (like renameat2(2), move_mount(2), fsopen(2), and
fspick(2)) created their separate own flag spaces that have no
overlap with the AT_* flags. Most of these ended up allocating
their bits sequentually.
In the case of move_mount(2) and fspick(2), several flags have
identical meanings to AT_* flags but were allocated in their own
flag space.
This makes sense for syscalls that will never share AT_* flags, but
for some syscalls this leads to duplication with AT_* flags in a
way that could cause confusion (if renameat2(2) grew a
RENAME_EMPTY_PATH it seems likely that users could mistake it for
AT_EMPTY_PATH since it is an *at(2) syscall).
- Some syscalls unfortunately ended up both creating their own flag
space while also using bits from other flag spaces. The most
obvious example is open_tree(2), where the standard usage ends up
using flags from *THREE* separate flag spaces:
open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE|O_CLOEXEC|AT_RECURSIVE);
(Note that O_CLOEXEC is also platform-specific, so several future
OPEN_TREE_* bits are also made unusable in one fell swoop.)
It's not entirely clear to me what the "right" choice is for new
syscalls. Just saying that all future VFS syscalls should use AT_* flags
doesn't seem practical. openat2(2) has RESOLVE_* flags (many of which
don't make much sense to burn generic AT_* flags for) and move_mount(2)
has separate AT_*-like flags for both the source and target so separate
flags are needed anyway (though it seems possible that renameat2(2)
could grow *_EMPTY_PATH flags at some point, and it's a bit of a shame
they can't be reused).
But at least for syscalls that _do_ choose to use AT_* flags, we should
explicitly state the policy that 0x2ff is currently intended for
per-syscall flags and that new flags should err on the side of
overlapping with existing flag bits (so we can extend the scope of
generic flags in the future if necessary).
And add AT_* aliases for the RENAME_* flags to further cement that
renameat2(2) is an *at(2) flag, just with its own per-syscall flags.
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/r/20240828-exportfs-u64-mount-id-v3-1-10c2c4c16708@cyphar.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Diffstat (limited to 'fs/fhandle.c')
0 files changed, 0 insertions, 0 deletions