blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message	Author	Files	Lines
2023-12-13	selftests/bpf: Allow VLAN packets in xdp_hw_metadata	Larysa Zaremba	2	-1/+17
	Make VLAN c-tag and s-tag XDP hint testing more convenient by not skipping VLAN-ed packets. Allow both 802.1ad and 802.1Q headers. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-16-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	mlx5: implement VLAN tag XDP hint	Larysa Zaremba	2	-1/+16
	Implement the newly added .xmo_rx_vlan_tag() hint function. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20231205210847.28460-15-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	net: make vlan_get_tag() return -ENODATA instead of -EINVAL	Larysa Zaremba	1	-2/+2
	__vlan_hwaccel_get_tag() is used in veth XDP hints implementation, its return value (-EINVAL if skb is not VLAN tagged) is passed to bpf code, but XDP hints specification requires drivers to return -ENODATA, if a hint cannot be provided for a particular packet. Solve this inconsistency by changing error return value of __vlan_hwaccel_get_tag() from -EINVAL to -ENODATA, do the same thing to __vlan_get_tag(), because this function is supposed to follow the same convention. This, in turn, makes -ENODATA the only non-zero value vlan_get_tag() can return. We can do this with no side effects, because none of the users of the 3 above-mentioned functions rely on the exact value. Suggested-by: Jesper Dangaard Brouer <jbrouer@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-14-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
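The error-code convention described in this commit can be illustrated with a small user-space mock. This is only a sketch: `struct mock_skb` and `mock_vlan_hwaccel_get_tag()` are hypothetical stand-ins invented for illustration, not the kernel's `struct sk_buff` or `__vlan_hwaccel_get_tag()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the VLAN-related part of an skb. */
struct mock_skb {
	bool vlan_present;
	uint16_t vlan_tci;
};

/*
 * Returns 0 and stores the TCI when a tag is present. Otherwise returns
 * -ENODATA, the value the commit message says the XDP hints specification
 * requires when a hint cannot be provided for a packet.
 */
static int mock_vlan_hwaccel_get_tag(const struct mock_skb *skb,
				     uint16_t *vlan_tci)
{
	if (!skb->vlan_present)
		return -ENODATA;

	*vlan_tci = skb->vlan_tci;
	return 0;
}
```

A caller that only checks for a non-zero return keeps working after the switch from -EINVAL to -ENODATA, which is why the commit message can claim no side effects.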
2023-12-13	veth: Implement VLAN tag XDP hint	Larysa Zaremba	1	-0/+19
	In order to test VLAN tag hint in hardware-independent selftests, implement newly added hint in veth driver. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-13-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	ice: use VLAN proto from ring packet context in skb path	Larysa Zaremba	2	-10/+6
	VLAN proto, used in ice XDP hints implementation is stored in ring packet context. Utilize this value in skb VLAN processing too instead of checking netdev features. At the same time, use vlan_tci instead of vlan_tag in touched code, because VLAN tag often refers to VLAN proto and VLAN TCI combined, while in the code we clearly store only VLAN TCI. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-12-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	ice: Implement VLAN tag hint	Larysa Zaremba	6	-9/+59
	Implement .xmo_rx_vlan_tag callback to allow XDP code to read packet's VLAN tag. At the same time, use vlan_tci instead of vlan_tag in touched code, because VLAN tag often refers to VLAN proto and VLAN TCI combined, while in the code we clearly store only VLAN TCI. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-11-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	xdp: Add VLAN tag hint	Larysa Zaremba	7	-1/+57
	Implement functionality that enables drivers to expose VLAN tag to XDP code. VLAN tag is represented by 2 variables: - protocol ID, which is passed to bpf code in BE - VLAN TCI, in host byte order Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20231205210847.28460-10-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
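A consumer of the two values might interpret them roughly as follows. This is a user-space sketch under stated assumptions: the helper names (`decode_tci()`, `proto_is_ctag()`, `proto_is_stag()`) and the `struct vlan_fields` type are hypothetical, and the TCI layout used is the standard 802.1Q one (3-bit PCP, 1-bit DEI, 12-bit VID).

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_PROTO_8021Q  0x8100 /* C-tag EtherType */
#define VLAN_PROTO_8021AD 0x88A8 /* S-tag EtherType */

/* Hypothetical container for the decoded TCI fields. */
struct vlan_fields {
	uint16_t vid; /* VLAN ID, bits 11..0 */
	uint8_t pcp;  /* priority, bits 15..13 */
	uint8_t dei;  /* drop eligible indicator, bit 12 */
};

/* The TCI arrives in host byte order, so plain shifts and masks suffice. */
static struct vlan_fields decode_tci(uint16_t tci_host)
{
	struct vlan_fields f = {
		.vid = tci_host & 0x0FFF,
		.pcp = (tci_host >> 13) & 0x7,
		.dei = (tci_host >> 12) & 0x1,
	};
	return f;
}

/* The protocol ID arrives in big-endian, so compare against htons(). */
static bool proto_is_ctag(uint16_t proto_be)
{
	return proto_be == htons(VLAN_PROTO_8021Q);
}

static bool proto_is_stag(uint16_t proto_be)
{
	return proto_be == htons(VLAN_PROTO_8021AD);
}
```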
2023-12-13	ice: Support XDP hints in AF_XDP ZC mode	Larysa Zaremba	2	-0/+19
	In AF_XDP ZC, xdp_buff is not stored on ring, instead it is provided by xsk_buff_pool. Space for metadata sources right after such buffers was already reserved in commit 94ecc5ca4dbf ("xsk: Add cb area to struct xdp_buff_xsk"). Some things (such as pointer to packet context) do not change on a per-packet basis, so they can be set at the same time as RX queue info. On the other hand, RX descriptor is unique for each packet, but is already known when setting DMA addresses. This minimizes performance impact of hints on regular packet processing. Update AF_XDP ZC packet processing to support XDP hints. Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-9-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	xsk: add functions to fill control buffer	Maciej Fijalkowski	3	-0/+31
	Commit 94ecc5ca4dbf ("xsk: Add cb area to struct xdp_buff_xsk") has added a buffer for custom data to xdp_buff_xsk. Particularly, this memory is used for data, consumed by XDP hints kfuncs. It does not always change on a per-packet basis and some parts can be set for example, at the same time as RX queue info. Add functions to fill all cbs in xsk_buff_pool with the same metadata. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-8-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	ice: Support RX hash XDP hint	Larysa Zaremba	3	-204/+284
	RX hash XDP hint requests both hash value and type. Type is XDP-specific, so we need a separate way to map these values to the hardware ptypes, so create a lookup table. Instead of creating a new long list, reuse contents of ice_decode_rx_desc_ptype[] through preprocessor. Current hash type enum does not contain ICMP packet type, but ice devices support it, so also add a new type into core code. Then use previously refactored code and create a function that allows XDP code to read RX hash. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-7-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	ice: Support HW timestamp hint	Larysa Zaremba	7	-7/+42
	Use previously refactored code and create a function that allows XDP code to read HW timestamp. Also, introduce packet context, where hints-related data will be stored. ice_xdp_buff contains only a pointer to this structure, to avoid copying it in ZC mode later in the series. HW timestamp is the first supported hint in the driver, so also add xdp_metadata_ops. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-6-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	ice: Introduce ice_xdp_buff	Larysa Zaremba	3	-5/+30
	In order to use XDP hints via kfuncs we need to put RX descriptor and miscellaneous data next to xdp_buff. Same as in hints implementations in other drivers, we achieve this through putting xdp_buff into a child structure. Currently, xdp_buff is stored in the ring structure, so replace it with union that includes child structure. This way enough memory is available while existing XDP code remains isolated from hints. Minimum size of the new child structure (ice_xdp_buff) is exactly 64 bytes (single cache line). To place it at the start of a cache line, move 'next' field from CL1 to CL4, as it isn't used often. This still leaves 192 bits available in CL3 for packet context extensions. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-5-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
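The "child structure" pattern mentioned here can be sketched in user space as below. All names (`mock_xdp_buff`, `mock_drv_xdp_buff`, `hint_rx_desc()`) are hypothetical and the field set is illustrative; the real `ice_xdp_buff` layout is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generic buffer seen by XDP code. */
struct mock_xdp_buff {
	void *data;
	void *data_end;
};

/* Driver-private wrapper: the generic buffer must stay the first member. */
struct mock_drv_xdp_buff {
	struct mock_xdp_buff xdp_buff;
	const void *rx_desc; /* per-packet RX descriptor */
	const void *pkt_ctx; /* per-ring packet context */
};

_Static_assert(offsetof(struct mock_drv_xdp_buff, xdp_buff) == 0,
	       "generic buffer must be at offset 0 for the cast to be valid");

/* A hint callback receives the generic pointer and recovers the wrapper. */
static const void *hint_rx_desc(const struct mock_xdp_buff *xdp)
{
	const struct mock_drv_xdp_buff *drv =
		(const struct mock_drv_xdp_buff *)xdp;

	return drv->rx_desc;
}
```

Existing code that only knows about the generic buffer is unaffected, while hint callbacks can reach the extra fields stored next to it.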
2023-12-13	ice: Make ptype internal to descriptor info processing	Larysa Zaremba	4	-13/+16
	Currently, rx_ptype variable is used only as an argument to ice_process_skb_fields() and is computed just before the function call. Therefore, there is no reason to pass this value as an argument. Instead, remove this argument and compute the value directly inside ice_process_skb_fields() function. Also, separate its calculation into a short function, so the code can later be reused in .xmo_() callbacks. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-4-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	ice: make RX HW timestamp reading code more reusable	Larysa Zaremba	3	-20/+36
	Previously, we only needed RX HW timestamp in skb path, hence all related code was written with skb in mind. But with the addition of XDP hints via kfuncs to the ice driver, the same logic will be needed in .xmo_() callbacks. Put generic process of reading RX HW timestamp from a descriptor into a separate function. Move skb-related code into another source file. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-3-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	ice: make RX hash reading code more reusable	Larysa Zaremba	1	-11/+25
	Previously, we only needed RX hash in skb path, hence all related code was written with skb in mind. But with the addition of XDP hints via kfuncs to the ice driver, the same logic will be needed in .xmo_() callbacks. Separate generic process of reading RX hash from a descriptor into a separate function. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20231205210847.28460-2-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'	Alexei Starovoitov	14	-473/+1065
	Andrii Nakryiko says: ==================== BPF token support in libbpf's BPF object Add fuller support for BPF token in high-level BPF object APIs. This is the most frequently used way to work with BPF using libbpf, so supporting BPF token there is critical. Patch #1 improves kernel-side BPF_TOKEN_CREATE behavior by refusing to create an "empty" BPF token with no delegation. This seems like saner behavior which also makes libbpf's caching better overall. If we ever want to create BPF token with no delegate_xxx options set on BPF FS, we can use a new flag to enable that. Patches #2-#5 refactor libbpf internals, mostly feature detection code, to prepare it for BPF token FD. Patch #6 adds options to pass BPF token into BPF object open options. It also adds implicit BPF token creation logic to BPF object load step, even without any explicit involvement of the user. If the environment is set up properly, BPF token will be created transparently and used implicitly. This allows all existing applications to gain BPF token support by just linking with the latest version of the libbpf library. No source code modifications are required. All that is under the assumption that a privileged container management agent properly set up the default BPF FS instance at /sys/fs/bpf to allow BPF token creation. Patches #7-#8 add more selftests, validating that BPF object APIs work as expected under unprivileged user namespaced conditions in the presence of BPF token. Patch #9 extends libbpf with LIBBPF_BPF_TOKEN_PATH envvar knowledge, which can be used to override the BPF FS location used for implicit BPF token creation logic without needing to adjust application code. This allows admins or container managers to mount BPF token-enabled BPF FS at a non-standard location without the need to coordinate with applications. LIBBPF_BPF_TOKEN_PATH can also be used to disable implicit BPF token creation by setting it to an empty value. Patch #10 tests this new envvar functionality.
v2->v3: - move some stray feature cache refactorings into patch #4 (Alexei); - add LIBBPF_BPF_TOKEN_PATH envvar support (Alexei); v1->v2: - remove minor code redundancies (Eduard, John); - add acks and rebase. ==================== Link: https://lore.kernel.org/r/20231213190842.3844987-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	selftests/bpf: add tests for LIBBPF_BPF_TOKEN_PATH envvar	Andrii Nakryiko	1	-0/+112
	Add new subtest validating LIBBPF_BPF_TOKEN_PATH envvar semantics. Extend existing test to validate that LIBBPF_BPF_TOKEN_PATH allows to disable implicit BPF token creation by setting envvar to empty string. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-11-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar	Andrii Nakryiko	2	-6/+21
	To allow external admin authority to override default BPF FS location (/sys/fs/bpf) for implicit BPF token creation, teach libbpf to recognize LIBBPF_BPF_TOKEN_PATH envvar. If it is specified and the user application explicitly specified neither the bpf_token_path nor the bpf_token_fd option, it will be treated exactly like the bpf_token_path option, overriding default /sys/fs/bpf location and making BPF token mandatory. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
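The precedence between explicit options, the envvar, and the implicit default, as described across this series, might be summarized by the decision function below. This is a sketch of the described behavior, not libbpf's code: `plan_token()`, `struct token_plan`, and `enum token_mode` are hypothetical, and treating an FD of 0 as "not set" and checking the FD before the path are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum token_mode { TOKEN_DISABLED, TOKEN_IMPLICIT, TOKEN_FROM_PATH, TOKEN_FROM_FD };

struct token_plan {
	enum token_mode mode;
	const char *path; /* BPF FS path to create the token from, if any */
	bool mandatory;   /* true: token creation failure fails the load */
};

/*
 * opt_fd:   0 means "not set" (assumption), negative disables the token.
 * opt_path: NULL means "not set", "" disables the token.
 * env_path: value of LIBBPF_BPF_TOKEN_PATH, NULL if unset, "" disables.
 */
static struct token_plan plan_token(const char *opt_path, int opt_fd,
				    const char *env_path)
{
	struct token_plan off = { TOKEN_DISABLED, NULL, false };
	const char *path = opt_path ? opt_path : env_path;

	if (opt_fd < 0)
		return off;
	if (opt_fd > 0)
		return (struct token_plan){ TOKEN_FROM_FD, NULL, true };
	if (path) {
		if (path[0] == '\0')
			return off;
		/* An explicit path or the envvar makes the token mandatory. */
		return (struct token_plan){ TOKEN_FROM_PATH, path, true };
	}
	/* Implicit mode: try the default location, ignore failures. */
	return (struct token_plan){ TOKEN_IMPLICIT, "/sys/fs/bpf", false };
}
```

A caller would typically pass `getenv("LIBBPF_BPF_TOKEN_PATH")` as the third argument.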
2023-12-13	selftests/bpf: add tests for BPF object load with implicit token	Andrii Nakryiko	1	-0/+76
	Add a test to validate libbpf's implicit BPF token creation from default BPF FS location (/sys/fs/bpf). Also validate that disabling this implicit BPF token creation works. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	selftests/bpf: add BPF object loading tests with explicit token passing	Andrii Nakryiko	3	-0/+185
	Add a few tests that attempt to load BPF object containing privileged map, program, and the one requiring mandatory BTF uploading into the kernel (to validate token FD propagation to BPF_BTF_LOAD command). Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	libbpf: wire up BPF token support at BPF object level	Andrii Nakryiko	4	-12/+158
	Add BPF token support to BPF object-level functionality. BPF token is supported by BPF object logic either as an explicitly provided BPF token from outside (through BPF FS path or explicit BPF token FD), or implicitly (unless prevented through bpf_object_open_opts). Implicit mode is assumed to be the most common one for user namespaced unprivileged workloads. The assumption is that privileged container manager sets up default BPF FS mount point at /sys/fs/bpf with BPF token delegation options (delegate_{cmds,maps,progs,attachs} mount options). BPF object during loading will attempt to create BPF token from /sys/fs/bpf location, and pass it for all relevant operations (currently, map creation, BTF load, and program load). In this implicit mode, if BPF token creation fails for whatever reason (BPF FS is not mounted, or kernel doesn't support BPF token, etc), this is not considered an error. BPF object loading sequence will proceed with no BPF token. In explicit BPF token mode, user provides explicitly either custom BPF FS mount point path or creates BPF token on their own and just passes token FD directly. In such case, BPF object will either dup() token FD (to not require caller to hold onto it for entire duration of BPF object lifetime) or will attempt to create BPF token from provided BPF FS location. If BPF token creation fails, that is considered a critical error and BPF object load fails with an error. Libbpf provides a way to disable implicit BPF token creation, if it causes any troubles (BPF token is designed to be completely optional and shouldn't cause any problems even if provided, but in the world of BPF LSM, custom security logic can be installed that might change outcome depending on the presence of BPF token). To disable libbpf's default BPF token creation behavior user should provide either invalid BPF token FD (negative), or empty bpf_token_path option.
BPF token presence can influence libbpf's feature probing, so if BPF object has associated BPF token, feature probing is instructed to use BPF object-specific feature detection cache and token FD. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	libbpf: wire up token_fd into feature probing logic	Andrii Nakryiko	5	-46/+66
	Adjust feature probing callbacks to take into account optional token_fd. In unprivileged contexts, some feature detectors would fail to detect kernel support just because BPF program, BPF map, or BTF object can't be loaded due to privileged nature of those operations. So when BPF object is loaded with BPF token, this token should be used for feature probing. This patch sets up support for this scenario, but we don't yet pass non-zero token FD. This will be added in the next patch. We also switched BPF cookie detector from using kprobe program to tracepoint one, as tracepoint is somewhat less dangerous BPF program type and has higher likelihood of being allowed through BPF token in the future. This change has no effect on detection behavior. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	libbpf: move feature detection code into its own file	Andrii Nakryiko	6	-466/+479
	It's quite a lot of well isolated code, so it seems like a good candidate to move it out of libbpf.c to reduce its size. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	libbpf: further decouple feature checking logic from bpf_object	Andrii Nakryiko	3	-11/+22
	Add feat_supported() helper that accepts feature cache instead of bpf_object. This allows low-level code in bpf.c to not know or care about higher-level concept of bpf_object, yet it will be able to utilize custom feature checking in cases where BPF token might influence the outcome. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	libbpf: split feature detectors definitions from cached results	Andrii Nakryiko	1	-6/+12
	Split a list of supported feature detectors with their corresponding callbacks from actual cached supported/missing values. This will allow to have more flexible per-token or per-object feature detectors in subsequent refactorings. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13	bpf: fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS	Andrii Nakryiko	1	-1/+9
	It's quite confusing in practice when it's possible to successfully create a BPF token from BPF FS that didn't have any of delegate_xxx mount options set up. While it's not wrong, it's actually more meaningful to reject BPF_TOKEN_CREATE with a specific error code (-ENOENT) to let user-space know that no token delegation is set up. So, instead of creating an empty BPF token that will always be ignored because it doesn't have any of the allow_xxx bits set, reject it with -ENOENT. If we ever need empty BPF token to be possible, we can support that with extra flag passed into BPF_TOKEN_CREATE. Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231213190842.3844987-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
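The rejection rule can be modeled in a few lines of user-space C. This is a minimal sketch, assuming the four delegation settings are bitmasks named after the delegate_{cmds,maps,progs,attachs} mount options; `struct mock_bpffs_opts` and `mock_token_create_check()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the BPF FS delegation mount options. */
struct mock_bpffs_opts {
	uint64_t delegate_cmds;
	uint64_t delegate_maps;
	uint64_t delegate_progs;
	uint64_t delegate_attachs;
};

/* Returns -ENOENT when nothing is delegated, 0 otherwise. */
static int mock_token_create_check(const struct mock_bpffs_opts *opts)
{
	if (opts->delegate_cmds == 0 && opts->delegate_maps == 0 &&
	    opts->delegate_progs == 0 && opts->delegate_attachs == 0)
		return -ENOENT; /* no token delegation is set up */

	return 0;
}
```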
2023-12-13	bpf: selftests: Add verifier tests for CO-RE bitfield writes	Daniel Xu	2	-0/+102
	Add some tests that exercise BPF_CORE_WRITE_BITFIELD() macro. Since some non-trivial bit fiddling is going on, make sure various edge cases (such as adjacent bitfields and bitfields at the edge of structs) are exercised. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/72698a1080fa565f541d5654705255984ea2a029.1702325874.git.dxu@dxuuu.xyz Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-13	bpf: selftests: test_loader: Support __btf_path() annotation	Daniel Xu	2	-0/+8
	This commit adds support for per-prog btf_custom_path. This is necessary for testing CO-RE relocations on non-vmlinux types using test_loader infrastructure. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/660ea7f2fdbdd5103bc1af87c9fc931f05327926.1702325874.git.dxu@dxuuu.xyz Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-13	libbpf: Add BPF_CORE_WRITE_BITFIELD() macro	Daniel Xu	1	-0/+32
	=== Motivation ===
	Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield writing wrapper to make the verifier happy. Two alternatives to this approach are:
	1. Use the upcoming `preserve_static_offset` [0] attribute to disable CO-RE on specific structs.
	2. Use broader byte-sized writes to write to bitfields.
	(1) is a bit hard to use. It requires specific and not-very-obvious annotations to bpftool generated vmlinux.h. It's also not generally available in released LLVM versions yet. (2) makes the code quite hard to read and write. And especially if BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to have an inverse helper for writing.
	=== Implementation details ===
	Since the logic is a bit non-obvious, I thought it would be helpful to explain exactly what's going on. To start, it helps to explain what LSHIFT_U64 (lshift) and RSHIFT_U64 (rshift) are designed to mean. Consider the core of the BPF_CORE_READ_BITFIELD() algorithm:
	  val <<= __CORE_RELO(s, field, LSHIFT_U64);
	  val = val >> __CORE_RELO(s, field, RSHIFT_U64);
	Basically what happens is we lshift to clear the non-relevant (blank) higher order bits. Then we rshift to bring the relevant bits (bitfield) down to LSB position (while also clearing blank lower order bits). To illustrate:
	  Start:   ........XXX......
	  Lshift:  XXX......00000000
	  Rshift:  00000000000000XXX
	where `.` means blank bit, `0` means 0 bit, and `X` means bitfield bit. After the two operations, the bitfield is ready to be interpreted as a regular integer.
	Next, we want to build an alternative (but more helpful) mental model on lshift and rshift. That is, to consider:
	* rshift as the total number of blank bits in the u64
	* lshift as number of blank bits left of the bitfield in the u64
	Take a moment to consider why that is true by consulting the above diagram. With this insight, we can now define the following relationship:
	       bitfield
	          _
	         | |
	  0.....00XXX0...00
	  |      |   |    |
	  |______|   |    |
	   lshift    |    |
	             |____|
	       (rshift - lshift)
	That is, we know the number of higher order blank bits is just lshift. And the number of lower order blank bits is (rshift - lshift). Finally, we can examine the core of the write side algorithm:
	  mask = (~0ULL << rshift) >> lshift;              // 1
	  val = (val & ~mask) | ((nval << rpad) & mask);   // 2
	1. Compute a mask where the set bits are the bitfield bits. The first left shift zeros out exactly the number of blank bits, leaving a bitfield sized set of 1s. The subsequent right shift inserts the correct amount of higher order blank bits.
	2. On the left of the `|`, mask out the bitfield bits. This creates 0s where the new bitfield bits will go. On the right of the `|`, bring nval into the correct bit position and mask out any bits that fall outside of the bitfield. Finally, by bor'ing the two halves, we get the final set of bits to write back.
	[0]: https://reviews.llvm.org/D133361
	Co-developed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Co-developed-by: Jonathan Lemon <jlemon@aviatrix.com> Signed-off-by: Jonathan Lemon <jlemon@aviatrix.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/4d3dd215a4fd57d980733886f9c11a45e1a9adf3.1702325874.git.dxu@dxuuu.xyz Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
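The shift arithmetic from this commit message can be tried out on a plain 64-bit value. The sketch below is not the libbpf macro: it skips the CO-RE relocations and the memory load/store, handles unsigned fields only, and assumes rpad equals rshift - lshift (the count of lower order blank bits). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* lshift: blank bits above the bitfield; rshift: total blank bits. */
static uint64_t bitfield_mask(unsigned int lshift, unsigned int rshift)
{
	/* A shift by 64 would be undefined, so the field needs >= 1 bit. */
	assert(rshift < 64 && lshift <= rshift);
	return (~0ULL << rshift) >> lshift;
}

/* Read side: clear the higher blank bits, then move the field to the LSB. */
static uint64_t bitfield_read(uint64_t val, unsigned int lshift,
			      unsigned int rshift)
{
	val <<= lshift;
	return val >> rshift;
}

/* Write side: clear the field in val, then OR in the positioned nval. */
static uint64_t bitfield_write(uint64_t val, uint64_t nval,
			       unsigned int lshift, unsigned int rshift)
{
	unsigned int rpad = rshift - lshift; /* lower order blank bits */
	uint64_t mask = bitfield_mask(lshift, rshift);

	return (val & ~mask) | ((nval << rpad) & mask);
}
```

For a 3-bit field with 8 blank bits above it, lshift is 8 and rshift is 61, so the mask covers bits 53 to 55 and a written value can be read back unchanged with `bitfield_read()`.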
2023-12-13	bpf: Support uid and gid when mounting bpffs	Jie Jiang	2	-1/+51
	Parse uid and gid in bpf_parse_param() so that they can be passed in as the `data` parameter when calling mount() on bpffs. This is useful when we want to control which user/group has control over the mounted bpffs; otherwise a separate chown() call is needed. Signed-off-by: Jie Jiang <jiejiang@chromium.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Mike Frysinger <vapier@chromium.org> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231212093923.497838-1-jiejiang@chromium.org
2023-12-13	selftests/bpf: fix compiler warnings in RELEASE=1 mode	Andrii Nakryiko	2	-2/+2
	When compiling BPF selftests with RELEASE=1, we get two new warnings, which are treated as errors. Fix them. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20231212225343.1723081-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-12	selftests/bpf: Relax time_tai test for equal timestamps in tai_forward	YiFei Zhu	1	-1/+1
	We're observing test flakiness on an arm64 platform which might not have timestamps as precise as x86. The test log looks like:
	  test_time_tai:PASS:tai_open 0 nsec
	  test_time_tai:PASS:test_run 0 nsec
	  test_time_tai:PASS:tai_ts1 0 nsec
	  test_time_tai:PASS:tai_ts2 0 nsec
	  test_time_tai:FAIL:tai_forward unexpected tai_forward: actual 1702348135471494160 <= expected 1702348135471494160
	  test_time_tai:PASS:tai_gettime 0 nsec
	  test_time_tai:PASS:tai_future_ts1 0 nsec
	  test_time_tai:PASS:tai_future_ts2 0 nsec
	  test_time_tai:PASS:tai_range_ts1 0 nsec
	  test_time_tai:PASS:tai_range_ts2 0 nsec
	  #199 time_tai:FAIL
	This patch changes ASSERT_GT to ASSERT_GE in the tai_forward assertion so that equal timestamps are permitted. Fixes: 64e15820b987 ("selftests/bpf: Add BPF-helper test for CLOCK_TAI access") Signed-off-by: YiFei Zhu <zhuyifei@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231212182911.3784108-1-zhuyifei@google.com
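The difference between the two assertions comes down to a strict versus non-strict comparison. The sketch below uses hypothetical helper names and plain C comparisons rather than the selftest ASSERT_GT/ASSERT_GE macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old check: the second timestamp must be strictly later. */
static bool strictly_forward(uint64_t ts1, uint64_t ts2)
{
	return ts2 > ts1;
}

/* Relaxed check: equal timestamps are accepted on coarse clocks. */
static bool forward_or_equal(uint64_t ts1, uint64_t ts2)
{
	return ts2 >= ts1;
}
```

With the two identical timestamps from the failing log, only the relaxed check passes, while a timestamp that goes backwards is still rejected.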
2023-12-12	bpf: Comment on check_mem_size_reg	Andrei Matei	1	-0/+6
	This patch adds a comment to check_mem_size_reg -- a function whose meaning is not very transparent. The function implicitly deals with two registers connected by convention, which is not obvious. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231210225149.67639-1-andreimatei1@gmail.com
2023-12-12	bpf: Remove unused backtrack_state helper functions	Yang Li	1	-15/+0
	The functions are defined in the verifier.c file but are not called anywhere, so delete them. kernel/bpf/verifier.c:3448:20: warning: unused function 'bt_set_slot' kernel/bpf/verifier.c:3453:20: warning: unused function 'bt_clear_slot' kernel/bpf/verifier.c:3488:20: warning: unused function 'bt_is_slot_set' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20231212005436.103829-1-yang.lee@linux.alibaba.com Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7714
2023-12-12	selftests/bpf: Fixes tests for filesystem kfuncs	Manu Bretelle	1	-0/+8
	`fs_kfuncs.c`'s `test_xattr` would fail the test even when the filesystem did not support xattr, for instance when /tmp is mounted as tmpfs. This change checks errno when setxattr fails. If the failure is due to the operation being unsupported, we will skip the test (just like we would if verity was not enabled on the FS).
	Before the change, fs_kfuncs test would fail in test_xattr:
	  $ vmtest -k $(make -s image_name) './tools/testing/selftests/bpf/test_progs -a fs_kfuncs'
	  => bzImage
	  ===> Booting
	  [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=128 to nr_cpu_
	  ===> Setting up VM
	  ===> Running command
	  [ 4.157491] bpf_testmod: loading out-of-tree module taints kernel.
	  [ 4.161515] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
	  test_xattr:PASS:create_file 0 nsec
	  test_xattr:FAIL:setxattr unexpected error: -1 (errno 95)
	  #90/1 fs_kfuncs/xattr:FAIL
	  #90/2 fs_kfuncs/fsverity:SKIP
	  #90 fs_kfuncs:FAIL
	  All error logs:
	  test_xattr:PASS:create_file 0 nsec
	  test_xattr:FAIL:setxattr unexpected error: -1 (errno 95)
	  #90/1 fs_kfuncs/xattr:FAIL
	  #90 fs_kfuncs:FAIL
	  Summary: 0/0 PASSED, 1 SKIPPED, 1 FAILED
	Test plan:
	  $ touch tmpfs_file && truncate -s 1G tmpfs_file && mkfs.ext4 tmpfs_file
	  # /tmp mounted as tmpfs
	  $ vmtest -k $(make -s image_name) './tools/testing/selftests/bpf/test_progs -a fs_kfuncs'
	  => bzImage
	  ===> Booting
	  ===> Setting up VM
	  ===> Running command
	  WARNING! Selftests relying on bpf_testmod.ko will be skipped.
	  Can't find bpf_testmod.ko kernel module: -2
	  #90/1 fs_kfuncs/xattr:SKIP
	  #90/2 fs_kfuncs/fsverity:SKIP
	  #90 fs_kfuncs:SKIP
	  Summary: 1/0 PASSED, 2 SKIPPED, 0 FAILED
	  # /tmp mounted as ext4 with xattr enabled but not verity
	  $ vmtest -k $(make -s image_name) 'mount -o loop tmpfs_file /tmp && \
	    /tools/testing/selftests/bpf/test_progs -a fs_kfuncs'
	  => bzImage
	  ===> Booting
	  ===> Setting up VM
	  ===> Running command
	  [ 4.067071] loop0: detected capacity change from 0 to 2097152
	  [ 4.191882] EXT4-fs (loop0): mounted filesystem 407ffa36-4553-4c8c-8c78-134443630f69 r/w with ordered data mode. Quota mode: none.
	  WARNING! Selftests relying on bpf_testmod.ko will be skipped.
	  Can't find bpf_testmod.ko kernel module: -2
	  #90/1 fs_kfuncs/xattr:OK
	  #90/2 fs_kfuncs/fsverity:SKIP
	  #90 fs_kfuncs:OK (SKIP: 1/2)
	  Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED
	  $ tune2fs -O verity tmpfs_file
	  # /tmp as ext4 with both xattr and verity enabled
	  $ vmtest -k $(make -s image_name) 'mount -o loop tmpfs_file /tmp && \
	    ./tools/testing/selftests/bpf/test_progs -a fs_kfuncs'
	  => bzImage
	  ===> Booting
	  ===> Setting up VM
	  ===> Running command
	  [ 4.291434] loop0: detected capacity change from 0 to 2097152
	  [ 4.460828] EXT4-fs (loop0): recovery complete
	  [ 4.468631] EXT4-fs (loop0): mounted filesystem 7b4a7b7f-c442-4b06-9ede-254e63cceb52 r/w with ordered data mode. Quota mode: none.
	  [ 4.988074] fs-verity: sha256 using implementation "sha256-generic"
	  WARNING! Selftests relying on bpf_testmod.ko will be skipped.
	  Can't find bpf_testmod.ko kernel module: -2
	  #90/1 fs_kfuncs/xattr:OK
	  #90/2 fs_kfuncs/fsverity:OK
	  #90 fs_kfuncs:OK
	  Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
	Fixes: 341f06fdddf7 ("selftests/bpf: Add tests for filesystem kfuncs") Signed-off-by: Manu Bretelle <chantr4@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20231211180733.763025-1-chantr4@gmail.com
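The skip-on-unsupported decision described in this commit can be expressed as a small classifier. This is a sketch, not the selftest code: `classify_setxattr()` and `enum test_outcome` are hypothetical, and the assumption is that "operation unsupported" corresponds to EOPNOTSUPP, which is errno 95 on Linux and matches the failing log.

```c
#include <assert.h>
#include <errno.h>

enum test_outcome { TEST_PASS, TEST_SKIP, TEST_FAIL };

/*
 * ret is the return value of setxattr(), err the errno captured right
 * after the call. An unsupported operation means the filesystem (for
 * example tmpfs in the log) has no xattr support, so the test is skipped
 * rather than failed.
 */
static enum test_outcome classify_setxattr(int ret, int err)
{
	if (ret == 0)
		return TEST_PASS;
	if (err == EOPNOTSUPP)
		return TEST_SKIP;
	return TEST_FAIL;
}
```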
2023-12-11	bpf: use bitfields for simple per-subprog bool flags	Andrii Nakryiko	1	-6/+6
	We have a bunch of bool flags for each subprog. Rather than wasting bytes on them, use bitfields. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231204233931.49758-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
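	A small sketch of the space saving, assuming a common ABI where `bool` occupies one byte. The struct and field names are made up for illustration and are not the fields of struct bpf_subprog_info.

```c
#include <stdbool.h>

/* Four plain bools: one byte each on common ABIs. */
struct flags_as_bools {
	bool flag_a;
	bool flag_b;
	bool flag_c;
	bool flag_d;
};

/* The same four flags as one-bit bitfields share a single storage unit. */
struct flags_as_bits {
	bool flag_a:1;
	bool flag_b:1;
	bool flag_c:1;
	bool flag_d:1;
};

/* Callers read and write the fields the same way in both layouts. */
static bool any_flag_set(const struct flags_as_bits *f)
{
	return f->flag_a || f->flag_b || f->flag_c || f->flag_d;
}
```

	The trade-off is that a bitfield member has no address, so code that takes a pointer to an individual flag would need to change.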
2023-12-11	bpf: tidy up exception callback management a bit	Andrii Nakryiko	3	-23/+42
	Use the fact that we are passing subprog index around and have a corresponding struct bpf_subprog_info in bpf_verifier_env for each subprogram. We don't need to separately pass around a flag whether subprog is exception callback or not, each relevant verifier function can determine this using provided subprog index if we maintain bpf_subprog_info properly. Also move out exception callback-specific logic from btf_prepare_func_args(), keeping it generic. We can enforce all these restrictions right before exception callback verification pass. We add out parameter, arg_cnt, for now, but this will be unnecessary with subsequent refactoring and will be removed. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231204233931.49758-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-11	bpf: emit more dynptr information in verifier log	Andrii Nakryiko	1	-9/+16
	Emit dynptr type for CONST_PTR_TO_DYNPTR register. Also emit id, ref_obj_id, and dynptr_id fields for STACK_DYNPTR stack slots. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231204233931.49758-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-11	bpf: log PTR_TO_MEM memory size in verifier log	Andrii Nakryiko	1	-0/+4
	Emit valid memory size addressable through PTR_TO_MEM register. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231204233931.49758-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-11	selftests/bpf: validate eliminated global subprog is not freplaceable	Andrii Nakryiko	3	-3/+83
	Add selftest that establishes dead code-eliminated valid global subprog (global_dead) and makes sure that it's not possible to freplace it, as it's effectively not there. This test will fail with unexpected success before 2afae08c9dcb ("bpf: Validate global subprogs lazily"). v2->v3: - add missing err assignment (Alan); - undo unnecessary signature changes in verifier_global_subprogs.c (Eduard); v1->v2: - don't rely on assembly output in verifier log, which changes between compiler versions (CI). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20231211174131.2324306-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-11	net, xdp: Allow metadata > 32	Aleksander Lobakin	2	-6/+14
	32 bytes may not be enough for some custom metadata. Relax the restriction, allow metadata larger than 32 bytes and make __skb_metadata_differs() work with bigger lengths. Now the size of metadata is limited only by the fact that it is stored as a u8 in skb_shared_info, so the maximum possible value is 255. The size still has to be aligned to 4, so the actual upper limit becomes 252. Most driver implementations will offer less, none can offer more. Other important conditions, such as having enough space for xdp_frame building, are already checked in bpf_xdp_adjust_meta(). Signed-off-by: Aleksander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/eb87653c-8ff8-447d-a7a1-25961f60518a@kernel.org Link: https://lore.kernel.org/bpf/20231206205919.404415-3-larysa.zaremba@intel.com
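	The arithmetic behind the 252-byte limit can be checked with a short sketch. `meta_len_ok()` and `meta_len_limit()` are hypothetical helpers written for this illustration; they only encode the two constraints named above (the length fits in a u8 and is a multiple of 4) and are not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* A length is acceptable if it fits in a u8 and is 4-byte aligned. */
static bool meta_len_ok(uint32_t len)
{
	return len <= UINT8_MAX && (len % 4) == 0;
}

/* Largest u8 value rounded down to a multiple of 4: 255 & ~3 = 252. */
static uint32_t meta_len_limit(void)
{
	return UINT8_MAX & ~3u;
}
```

	Under these two rules, 36 bytes is accepted while 256 bytes is rejected because it no longer fits in a u8.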
2023-12-11	selftests/bpf: Increase invalid metadata size	Larysa Zaremba	1	-2/+2
	Changed check expects passed data meta to be deemed invalid. After loosening the requirement, the size of 36 bytes becomes valid. Therefore, increase tested meta size to 256, so we do not get an unexpected success. Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231206205919.404415-2-larysa.zaremba@intel.com
2023-12-09	Merge branch 'add-new-bpf_cpumask_weight-kfunc'	Alexei Starovoitov	5	-1/+58
	David Vernet says: ==================== Add new bpf_cpumask_weight() kfunc It can be useful to query how many bits are set in a cpumask. For example, if you want to perform special logic for the last remaining core that's set in a mask. This logic is already exposed through the main kernel's cpumask header as cpumask_weight(), so it would be useful to add a new bpf_cpumask_weight() kfunc which wraps it and does the same. This patch series was built and tested on top of commit 2146f7fe6e02 ("Merge branch 'allocate-bpf-trampoline-on-bpf_prog_pack'"). ==================== Link: https://lore.kernel.org/r/20231207210843.168466-1-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09	selftests/bpf: Add test for bpf_cpumask_weight() kfunc	David Vernet	3	-0/+45
	The new bpf_cpumask_weight() kfunc can be used to count the number of bits that are set in a struct cpumask* kptr. Let's add a selftest to verify its behavior. Signed-off-by: David Vernet <void@manifault.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231207210843.168466-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09	bpf: Add bpf_cpumask_weight() kfunc	David Vernet	2	-1/+13
	It can be useful to query how many bits are set in a cpumask. For example, if you want to perform special logic for the last remaining core that's set in a mask. Let's therefore add a new bpf_cpumask_weight() kfunc which checks how many bits are set in a mask. Signed-off-by: David Vernet <void@manifault.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231207210843.168466-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
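	To illustrate what "weight" means here, the following user-space sketch counts set bits in a toy mask. It assumes a 64-bit integer as a stand-in for struct cpumask; `toy_cpumask_weight()` and `is_last_remaining_cpu()` are hypothetical names and do not call the kfunc itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a cpumask: bit N set means CPU N is in the mask. */
static unsigned int toy_cpumask_weight(uint64_t mask)
{
	unsigned int weight = 0;

	/* Clear the lowest set bit on each iteration and count the rounds. */
	while (mask) {
		mask &= mask - 1;
		weight++;
	}
	return weight;
}

/* The use case from the message: special-case the last core left in a mask. */
static bool is_last_remaining_cpu(uint64_t mask)
{
	return toy_cpumask_weight(mask) == 1;
}
```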
2023-12-09	test_bpf: Rename second ALU64_SMOD_X to ALU64_SMOD_K	Tiezhu Yang	1	-1/+1
	Currently, there are two test cases with same name "ALU64_SMOD_X: -7 % 2 = -1", the first one is right, the second one should be ALU64_SMOD_K because its code is BPF_ALU64 | BPF_MOD | BPF_K. Before: test_bpf: #170 ALU64_SMOD_X: -7 % 2 = -1 jited:1 4 PASS test_bpf: #171 ALU64_SMOD_X: -7 % 2 = -1 jited:1 4 PASS After: test_bpf: #170 ALU64_SMOD_X: -7 % 2 = -1 jited:1 4 PASS test_bpf: #171 ALU64_SMOD_K: -7 % 2 = -1 jited:1 4 PASS Fixes: daabb2b098e0 ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231207040851.19730-1-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
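	The expected result in both test names, -7 % 2 = -1, follows from truncating signed modulo, which C also uses. The sketch below models the two operand forms in plain C: the _X variant takes its divisor from a register and the _K variant from a 32-bit immediate. `smod_x()` and `smod_k()` are hypothetical helpers, and the sketch deliberately does not model a zero divisor or the INT64_MIN % -1 case.

```c
#include <stdint.h>

/* _X form: divisor comes from a source register. Caller must pass src != 0. */
static int64_t smod_x(int64_t dst, int64_t src)
{
	return dst % src;
}

/* _K form: divisor is a 32-bit immediate, sign-extended to 64 bits. */
static int64_t smod_k(int64_t dst, int32_t imm)
{
	return dst % (int64_t)imm;
}
```

	Both forms compute the same value for these inputs, which is why only the name of the second test case needed to change.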
2023-12-09	selftests/bpf: validate fake register spill/fill precision backtracking logic	Andrii Nakryiko	1	-0/+154
	Add two tests validating that verifier's precision backtracking logic handles BPF_ST_MEM instructions that produce fake register spill into register slot. This happens when a non-zero constant is written directly to a slot, e.g., *(u64 *)(r10 - 8) = 123. Add both full 64-bit register spill, as well as 32-bit "sub-spill". Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231209010958.66758-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09	bpf: handle fake register spill to stack with BPF_ST_MEM instruction	Andrii Nakryiko	1	-1/+0
	When verifier validates BPF_ST_MEM instruction that stores known constant to stack (e.g., *(u64 *)(r10 - 8) = 123), it effectively spills a fake register with a constant (but initially imprecise) value to a stack slot. Because read-side logic treats it as a proper register fill from stack slot, we need to mark such stack slot initialization as INSN_F_STACK_ACCESS instruction to stop precision backtracking from missing it. Fixes: 41f6f64e6999 ("bpf: support non-r10 register spill/fill to/from stack in precision tracking") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231209010958.66758-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09	Merge branch 'bpf-fixes-for-maybe_wait_bpf_programs'	Alexei Starovoitov	1	-5/+14
	Hou Tao says: ==================== The patch set aims to fix the problems found when inspecting the code related with maybe_wait_bpf_programs(). Patch #1 removes unnecessary invocation of maybe_wait_bpf_programs(). Patch #2 calls maybe_wait_bpf_programs() only once for batched update. Patch #3 adds the missed waiting when doing batched lookup_deletion on htab of maps. Patch #4 does wait only if the update or deletion operation succeeds. Patch #5 fixes the value of batch.count when memory allocation fails. ==================== Link: https://lore.kernel.org/r/20231208102355.2628918-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09	bpf: Set uattr->batch.count as zero before batched update or deletion	Hou Tao	1	-0/+6
	generic_map_{delete,update}_batch() doesn't set uattr->batch.count to zero before it tries to allocate memory for the key. If the memory allocation fails, the value of uattr->batch.count will be incorrect. Fix it by setting uattr->batch.count to zero before batched update or deletion. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231208102355.2628918-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
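	The ordering issue can be shown with a user-space sketch. It assumes a count field that is used both as input (requested elements) and as output (elements processed); `struct batch_attr`, `update_batch()` and the `simulate_enomem` switch are hypothetical and only mimic the pattern, not the kernel code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for uattr->batch: count is both input and output. */
struct batch_attr {
	uint32_t count;
};

static int update_batch(struct batch_attr *attr, size_t key_size,
			int simulate_enomem)
{
	uint32_t max_count = attr->count;
	uint32_t done;
	void *key;

	/* Report zero processed elements before anything can fail. */
	attr->count = 0;

	key = simulate_enomem ? NULL : malloc(key_size);
	if (!key)
		return -ENOMEM;

	for (done = 0; done < max_count; done++) {
		/* Per-element work would go here. */
	}

	attr->count = done;
	free(key);
	return 0;
}
```

	Without the early reset, a caller that reads count after an -ENOMEM return would see the requested number of elements and could mistake it for the number processed.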