blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-10-12	tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function	Ammar Faizi	1	-13/+7
	This function is only called by memcpy(), there is no real reason to have this wrapper. Delete this function and move the code to memcpy() directly. Signed-off-by: Ammar Faizi <[email protected]> Reviewed-by: Alviro Iskandar Setiawan <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]>
2023-10-12	tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function	Ammar Faizi	1	-10/+0
	This nolibc internal function is not used. Delete it. It was probably supposed to handle memmove(), but today the memmove() has its own implementation. Signed-off-by: Ammar Faizi <[email protected]> Reviewed-by: Alviro Iskandar Setiawan <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]>
2023-10-12	tools/nolibc: x86-64: Use `rep stosb` for `memset()`	Ammar Faizi	2	-0/+15
	Simplify memset() on the x86-64 arch. The x86-64 arch has a 'rep stosb' instruction, which can perform memset() using only a single instruction, given: %al = value (just like the second argument of memset()) %rdi = destination %rcx = length Before this patch: ``` 00000000000010c9 <memset>: 10c9: 48 89 f8 mov %rdi,%rax 10cc: 48 85 d2 test %rdx,%rdx 10cf: 74 0e je 10df <memset+0x16> 10d1: 31 c9 xor %ecx,%ecx 10d3: 40 88 34 08 mov %sil,(%rax,%rcx,1) 10d7: 48 ff c1 inc %rcx 10da: 48 39 ca cmp %rcx,%rdx 10dd: 75 f4 jne 10d3 <memset+0xa> 10df: c3 ret ``` After this patch: ``` 0000000000001511 <memset>: 1511: 96 xchg %eax,%esi 1512: 48 89 d1 mov %rdx,%rcx 1515: 57 push %rdi 1516: f3 aa rep stos %al,%es:(%rdi) 1518: 58 pop %rax 1519: c3 ret ``` v2: - Use pushq %rdi / popq %rax (Alviro). - Use xchg %eax, %esi (Willy). Link: https://lore.kernel.org/lkml/[email protected] Suggested-by: Alviro Iskandar Setiawan <[email protected]> Suggested-by: Willy Tarreau <[email protected]> Signed-off-by: Ammar Faizi <[email protected]> Reviewed-by: Alviro Iskandar Setiawan <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]>
2023-10-12	tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`	Ammar Faizi	2	-0/+33
	Simplify memcpy() and memmove() on the x86-64 arch. The x86-64 arch has a 'rep movsb' instruction, which can perform memcpy() using only a single instruction, given: %rdi = destination %rsi = source %rcx = length Additionally, it can also handle the overlapping case by setting DF=1 (backward copy), which can be used as the memmove() implementation. Before this patch: ``` 00000000000010ab <memmove>: 10ab: 48 89 f8 mov %rdi,%rax 10ae: 31 c9 xor %ecx,%ecx 10b0: 48 39 f7 cmp %rsi,%rdi 10b3: 48 83 d1 ff adc $0xffffffffffffffff,%rcx 10b7: 48 85 d2 test %rdx,%rdx 10ba: 74 25 je 10e1 <memmove+0x36> 10bc: 48 83 c9 01 or $0x1,%rcx 10c0: 48 39 f0 cmp %rsi,%rax 10c3: 48 c7 c7 ff ff ff ff mov $0xffffffffffffffff,%rdi 10ca: 48 0f 43 fa cmovae %rdx,%rdi 10ce: 48 01 cf add %rcx,%rdi 10d1: 44 8a 04 3e mov (%rsi,%rdi,1),%r8b 10d5: 44 88 04 38 mov %r8b,(%rax,%rdi,1) 10d9: 48 01 cf add %rcx,%rdi 10dc: 48 ff ca dec %rdx 10df: 75 f0 jne 10d1 <memmove+0x26> 10e1: c3 ret 00000000000010e2 <memcpy>: 10e2: 48 89 f8 mov %rdi,%rax 10e5: 48 85 d2 test %rdx,%rdx 10e8: 74 12 je 10fc <memcpy+0x1a> 10ea: 31 c9 xor %ecx,%ecx 10ec: 40 8a 3c 0e mov (%rsi,%rcx,1),%dil 10f0: 40 88 3c 08 mov %dil,(%rax,%rcx,1) 10f4: 48 ff c1 inc %rcx 10f7: 48 39 ca cmp %rcx,%rdx 10fa: 75 f0 jne 10ec <memcpy+0xa> 10fc: c3 ret ``` After this patch: ``` // memmove is an alias for memcpy 000000000040133b <memcpy>: 40133b: 48 89 d1 mov %rdx,%rcx 40133e: 48 89 f8 mov %rdi,%rax 401341: 48 89 fa mov %rdi,%rdx 401344: 48 29 f2 sub %rsi,%rdx 401347: 48 39 ca cmp %rcx,%rdx 40134a: 72 03 jb 40134f <memcpy+0x14> 40134c: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 40134e: c3 ret 40134f: 48 8d 7c 0f ff lea -0x1(%rdi,%rcx,1),%rdi 401354: 48 8d 74 0e ff lea -0x1(%rsi,%rcx,1),%rsi 401359: fd std 40135a: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 40135c: fc cld 40135d: c3 ret ``` v3: - Make memmove as an alias for memcpy (Willy). - Make the forward copy the likely case (Alviro). v2: - Fix the broken memmove implementation (David). Link: https://lore.kernel.org/lkml/[email protected] Link: https://lore.kernel.org/lkml/[email protected] Suggested-by: David Laight <[email protected]> Signed-off-by: Ammar Faizi <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]>
2023-10-12	selftests/nolibc: use -nostdinc for nolibc-test	Thomas Weißschuh	1	-1/+1
	Avoid any accidental reliance on system includes. Signed-off-by: Thomas Weißschuh <[email protected]> Signed-off-by: Willy Tarreau <[email protected]>
2023-10-12	tools/nolibc: add stdarg.h header	Thomas Weißschuh	5	-5/+21
	This allows nolic to work with `-nostdinc` avoiding any reliance on system headers. The implementation has been lifted from musl libc 1.2.4. There is already an implementation of stdarg.h in include/linux/stdarg.h but that is GPL licensed and therefore not suitable for nolibc. The used compiler builtins have been validated to be at least available since GCC 4.1.2 and clang 3.0.0. Signed-off-by: Thomas Weißschuh <[email protected]> Signed-off-by: Willy Tarreau <[email protected]>
2023-10-12	tools/nolibc: mark start_c as weak	Thomas Weißschuh	1	-0/+1
	Otherwise the different instances of _start_c from each compilation unit will lead to linker errors: /usr/bin/ld: /tmp/ccSNvRqs.o: in function `_start_c': nolibc-test-foo.c:(.text.nolibc_memset+0x9): multiple definition of `_start_c'; /tmp/ccG25101.o:nolibc-test.c:(.text+0x1ea3): first defined here Fixes: 17336755150b ("tools/nolibc: add new crt.h with _start_c") Signed-off-by: Thomas Weißschuh <[email protected]> Link: https://lore.kernel.org/lkml/20231012-nolibc-start_c-multiple-v1-1-fbfc73e0283f@weissschuh.net/ Link: https://lore.kernel.org/lkml/[email protected]/ Acked-by: Willy Tarreau <[email protected]>
2023-10-12	tools/nolibc: i386: Fix a stack misalign bug on _start	Ammar Faizi	1	-1/+3
	The ABI mandates that the %esp register must be a multiple of 16 when executing a 'call' instruction. Commit 2ab446336b17 ("tools/nolibc: i386: shrink _start with _start_c") simplified the _start function, but it didn't take care of the %esp alignment, causing SIGSEGV on SSE and AVX programs that use aligned move instruction (e.g., movdqa, movaps, and vmovdqa). The 'and $-16, %esp' aligns the %esp at a multiple of 16. Then 'push %eax' will subtract the %esp by 4; thus, it breaks the 16-byte alignment. Make sure the %esp is correctly aligned after the push by subtracting 12 before the push. Extra: Add 'add $12, %esp' before the 'and $-16, %esp' to avoid over-estimating for particular cases as suggested by Willy. A test program to validate the %esp alignment on _start can be found at: https://lore.kernel.org/lkml/[email protected] [ Thomas: trim Fixes tag commit id ] Cc: Zhangjin Wu <[email protected]> Fixes: 2ab446336b17 ("tools/nolibc: i386: shrink _start with _start_c") Reported-by: Nicholas Rosenberg <[email protected]> Acked-by: Thomas Weißschuh <[email protected]> Signed-off-by: Ammar Faizi <[email protected]> Reviewed-by: Alviro Iskandar Setiawan <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]>
2023-10-12	selftests/thermel/intel: Add test to read power floor status	Srinivas Pandruvada	3	-0/+121
	Some SoCs have firmware support to notify, if the system can't lower power limit to a value requested from user space via RAPL constraints. This test program waits for notification of power floor and prints. This program can be used to test this feature and also allows other user space programs to use as a reference. Signed-off-by: Srinivas Pandruvada <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-10-12	KVM: selftests: Force load all supported XSAVE state in state test	Sean Christopherson	2	-0/+23
	Extend x86's state to forcefully load all host-supported xfeatures by modifying xstate_bv in the saved state. Stuffing xstate_bv ensures that the selftest is verifying KVM's full ABI regardless of whether or not the guest code is successful in getting various xfeatures out of their INIT state, e.g. see the disaster that is/was MPX. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-10-12	KVM: selftests: Load XSAVE state into untouched vCPU during state test	Sean Christopherson	1	-2/+17
	Expand x86's state test to load XSAVE state into a "dummy" vCPU prior to KVM_SET_CPUID2, and again with an empty guest CPUID model. Except for off-by-default features, i.e. AMX, KVM's ABI for KVM_SET_XSAVE is that userspace is allowed to load xfeatures so long as they are supported by the host. This is a regression test for a combination of KVM bugs where the state saved by KVM_GET_XSAVE{2} could not be loaded via KVM_SET_XSAVE if the saved xstate_bv would load guest-unsupported xfeatures. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-10-12	KVM: selftests: Touch relevant XSAVE state in guest for state test	Sean Christopherson	2	-0/+91
	Modify support XSAVE state in the "state test's" guest code so that saving and loading state via KVM_{G,S}ET_XSAVE actually does something useful, i.e. so that xstate_bv in XSAVE state isn't empty. Punt on BNDCSR for now, it's easier to just stuff that xfeature from the host side. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-10-12	KVM: riscv: selftests: get-reg-list print_reg should never fail	Andrew Jones	1	-51/+42
	When outputting the "new" register list we want to print all of the new registers, decoding as much as possible of each of them. Also, we don't want to assert while listing registers with '--list'. We output "/* UNKNOWN */" after each new register (which we were already doing for some), which should be enough. Signed-off-by: Andrew Jones <[email protected]> Reviewed-by: Haibo Xu <[email protected]> Signed-off-by: Anup Patel <[email protected]>
2023-10-12	KVM: riscv: selftests: Add condops extensions to get-reg-list test	Anup Patel	1	-0/+17
	We have a new conditional operations related ISA extensions so let us add these extensions to get-reg-list test. Signed-off-by: Anup Patel <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Signed-off-by: Anup Patel <[email protected]>
2023-10-12	KVM: riscv: selftests: Add smstateen registers to get-reg-list test	Anup Patel	1	-0/+34
	We have a new smstateen registers as separate sub-type of CSR ONE_REG interface so let us add these registers to get-reg-list test. Signed-off-by: Anup Patel <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Signed-off-by: Anup Patel <[email protected]>
2023-10-12	KVM: riscv: selftests: Add senvcfg register to get-reg-list test	Anup Patel	1	-0/+3
	We have a new senvcfg register in the general CSR ONE_REG interface so let us add it to get-reg-list test. Signed-off-by: Anup Patel <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Signed-off-by: Anup Patel <[email protected]>
2023-10-12	KVM: selftests: Add array order helpers to riscv get-reg-list	Andrew Jones	1	-39/+47
	Add a couple macros to use when filling arrays in order to ensure the elements are placed in the right order, regardless of the order we prefer to read them. And immediately apply the new macro to resorting the ISA extension lists alphabetically. Signed-off-by: Andrew Jones <[email protected]> Reviewed-by: Haibo Xu <[email protected]> Signed-off-by: Anup Patel <[email protected]>
2023-10-11	selftests/bpf: Add tests for cgroup unix socket address hooks	Daan De Meyer	10	-0/+883
	These selftests are written in prog_tests style instead of adding them to the existing test_sock_addr tests. Migrating the existing sock addr tests to prog_tests style is left for future work. This commit adds support for testing bind() sockaddr hooks, even though there's no unix socket sockaddr hook for bind(). We leave this code intact for when the INET and INET6 tests are migrated in the future which do support intercepting bind(). Signed-off-by: Daan De Meyer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-11	selftests/bpf: Make sure mount directory exists	Daan De Meyer	1	-0/+5
	The mount directory for the selftests cgroup tree might not exist so let's make sure it does exist by creating it ourselves if it doesn't exist. Signed-off-by: Daan De Meyer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-11	bpftool: Add support for cgroup unix socket address hooks	Daan De Meyer	5	-23/+38
	Add the necessary plumbing to hook up the new cgroup unix sockaddr hooks into bpftool. Signed-off-by: Daan De Meyer <[email protected]> Acked-by: Quentin Monnet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-11	libbpf: Add support for cgroup unix socket address hooks	Daan De Meyer	1	-0/+10
	Add the necessary plumbing to hook up the new cgroup unix sockaddr hooks into libbpf. Signed-off-by: Daan De Meyer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-11	bpf: Implement cgroup sockaddr hooks for unix sockets	Daan De Meyer	1	-4/+9
	These hooks allows intercepting connect(), getsockname(), getpeername(), sendmsg() and recvmsg() for unix sockets. The unix socket hooks get write access to the address length because the address length is not fixed when dealing with unix sockets and needs to be modified when a unix socket address is modified by the hook. Because abstract socket unix addresses start with a NUL byte, we cannot recalculate the socket address in kernelspace after running the hook by calculating the length of the unix socket path using strlen(). These hooks can be used when users want to multiplex syscall to a single unix socket to multiple different processes behind the scenes by redirecting the connect() and other syscalls to process specific sockets. We do not implement support for intercepting bind() because when using bind() with unix sockets with a pathname address, this creates an inode in the filesystem which must be cleaned up. If we rewrite the address, the user might try to clean up the wrong file, leaking the socket in the filesystem where it is never cleaned up. Until we figure out a solution for this (and a use case for intercepting bind()), we opt to not allow rewriting the sockaddr in bind() calls. We also implement recvmsg() support for connected streams so that after a connect() that is modified by a sockaddr hook, any corresponding recmvsg() on the connected socket can also be modified to make the connected program think it is connected to the "intended" remote. Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Daan De Meyer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-11	tools: ynl: use ynl-gen -o instead of stdout in Makefile	Jakub Kicinski	1	-2/+2
	Jiri added more careful handling of output of the code generator to avoid wiping out existing files in commit f65f305ae008 ("tools: ynl-gen: use temporary file for rendering") Make use of the -o option in the Makefiles, it is already used by ynl-regen.sh. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-11	selftests/bpf: Add missing section name tests for getpeername/getsockname	Daan De Meyer	1	-0/+20
	These were missed when these hooks were first added so add them now instead to make sure every sockaddr hook has a matching section name test. Signed-off-by: Daan De Meyer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-11	testing: nvdimm: make struct class structures constant	Greg Kroah-Hartman	2	-15/+16
	Now that the driver core allows for struct class to be in read-only memory, we should make all 'class' structures declared at build time placing them into read-only memory, instead of having to be dynamically allocated at runtime. Cc: Dan Williams <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Ira Weiny <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Tested-by: Ira Weiny <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Link: https://lore.kernel.org/r/2023100611-platinum-galleria-ceb3@gregkh Signed-off-by: Ira Weiny <[email protected]>
2023-10-11	selftests/hid: force using our compiled libbpf headers	Benjamin Tissoires	1	-0/+2
	Turns out that we were relying on the globally installed headers, not the ones we freshly compiled. Add a manual include in CFLAGS to sort this out. Tested-by: Nick Desaulniers <[email protected]> # Build Tested-by: Justin Stitt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Benjamin Tissoires <[email protected]>
2023-10-11	selftests/hid: do not manually call headers_install	Benjamin Tissoires	1	-6/+2
	"make headers" is a requirement before calling make on the selftests dir, so we should not have to manually install those headers Acked-by: Shuah Khan <[email protected]> Tested-by: Nick Desaulniers <[email protected]> # Build Tested-by: Justin Stitt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Benjamin Tissoires <[email protected]>
2023-10-11	selftests/hid: ensure we can compile the tests on kernels pre-6.3	Benjamin Tissoires	2	-3/+77
	For the hid-bpf tests to compile, we need to have the definition of struct hid_bpf_ctx. This definition is an internal one from the kernel and it is supposed to be defined in the generated vmlinux.h. This vmlinux.h header is generated based on the currently running kernel or if the kernel was already compiled in the tree. If you just compile the selftests without compiling the kernel beforehand and you are running on a 6.2 kernel, you'll end up with a vmlinux.h without the hid_bpf_ctx definition. Use the clever trick from tools/testing/selftests/bpf/progs/bpf_iter.h to force the definition of that symbol in case we don't find it in the BTF and also add __attribute__((preserve_access_index)) to further support CO-RE functionality for these tests. Signed-off-by: Justin Stitt <[email protected]> Tested-by: Nick Desaulniers <[email protected]> # Build Tested-by: Justin Stitt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Benjamin Tissoires <[email protected]>
2023-10-10	Merge tag 'hyperv-fixes-signed-20231009' of ↵	Linus Torvalds	2	-37/+235
	git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - fixes for Hyper-V VTL code (Saurabh Sengar and Olaf Hering) - fix hv_kvp_daemon to support keyfile based connection profile (Shradha Gupta) * tag 'hyperv-fixes-signed-20231009' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hv/hv_kvp_daemon:Support for keyfile based connection profile hyperv: reduce size of ms_hyperv_info x86/hyperv: Add common print prefix "Hyper-V" in hv_init x86/hyperv: Remove hv_vtl_early_init initcall x86/hyperv: Restrict get_vtl to only VTL platforms
2023-10-10	iommufd/selftest: Add domain_alloc_user() support in iommu mock	Yi Liu	3	-8/+29
	Add mock_domain_alloc_user() and a new test case for IOMMU_HWPT_ALLOC_NEST_PARENT. Link: https://lore.kernel.org/r/[email protected] Co-developed-by: Nicolin Chen <[email protected]> Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Yi Liu <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-10	hv/hv_kvp_daemon:Support for keyfile based connection profile	Shradha Gupta	2	-37/+235
	Ifcfg config file support in NetworkManger is deprecated. This patch provides support for the new keyfile config format for connection profiles in NetworkManager. The patch modifies the hv_kvp_daemon code to generate the new network configuration in keyfile format(.ini-style format) along with a ifcfg format configuration. The ifcfg format configuration is also retained to support easy backward compatibility for distro vendors. These configurations are stored in temp files which are further translated using the hv_set_ifconfig.sh script. This script is implemented by individual distros based on the network management commands supported. For example, RHEL's implementation could be found here: https://gitlab.com/redhat/centos-stream/src/hyperv-daemons/-/blob/c9s/hv_set_ifconfig.sh Debian's implementation could be found here: https://github.com/endlessm/linux/blob/master/debian/cloud-tools/hv_set_ifconfig The next part of this support is to let the Distro vendors consume these modified implementations to the new configuration format. Tested-on: Rhel9(Hyper-V, Azure)(nm and ifcfg files verified) Signed-off-by: Shradha Gupta <[email protected]> Reviewed-by: Saurabh Sengar <[email protected]> Reviewed-by: Ani Sinha <[email protected]> Signed-off-by: Wei Liu <[email protected]> Link: https://lore.kernel.org/r/1696847920-31125-1-git-send-email-shradhagupta@linux.microsoft.com
2023-10-09	tools: ynl-gen: handle do ops with no input attrs	Jakub Kicinski	1	-6/+11
	The code supports dumps with no input attributes currently thru a combination of special-casing and luck. Clean up the handling of ops with no inputs. Create empty Structs, and skip printing of empty types. This makes dos with no inputs work. Tested-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-09	selftests/bpf: Add BPF_FIB_LOOKUP_SRC tests	Martynas Pumputis	1	-6/+77
	This patch extends the existing fib_lookup test suite by adding two test cases (for each IP family): * Test source IP selection from the egressing netdev. * Test source IP selection when an IP route has a preferred src IP addr. Signed-off-by: Martynas Pumputis <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-09	bpf: Derive source IP addr via bpf_*_fib_lookup()	Martynas Pumputis	1	-0/+10
	Extend the bpf_fib_lookup() helper by making it to return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set. For example, the following snippet can be used to derive the desired source IP address: struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr }; ret = bpf_skb_fib_lookup(skb, p, sizeof(p), BPF_FIB_LOOKUP_SRC \| BPF_FIB_LOOKUP_SKIP_NEIGH); if (ret != BPF_FIB_LKUP_RET_SUCCESS) return TC_ACT_SHOT; /* the p.ipv4_src now contains the source address */ The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts. For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when a public and private addresses are attached to the same egress interface. The change was tested with Cilium [1]. Nikolay Aleksandrov helped to figure out the IPv6 addr selection. [1]: https://github.com/cilium/cilium/pull/28283 Signed-off-by: Martynas Pumputis <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-09	selftests/bpf: Add testcase for async callback return value failure	David Vernet	2	-2/+51
	A previous commit updated the verifier to print an accurate failure message for when someone specifies a nonzero return value from an async callback. This adds a testcase for validating that the verifier emits the correct message in such a case. Signed-off-by: David Vernet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-10-09	KVM: selftests: Test behavior of HWCR, a.k.a. MSR_K7_HWCR	Jim Mattson	2	-0/+48
	Verify the following behavior holds true for writes and reads of HWCR from host userspace: * Attempts to set bits 3, 6, or 8 are ignored * Bits 18 and 24 are the only bits that can be set * Any bit that can be set can also be cleared Signed-off-by: Jim Mattson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Co-developed-by: Sean Christopherson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]>
2023-10-09	bpftool: Align bpf_load_and_run_opts insns and data	Ian Rogers	1	-20/+23
	A C string lacks alignment so use aligned arrays to avoid potential alignment problems. Switch to using sizeof (less 1 for the \0 terminator) rather than a hardcode size constant. Signed-off-by: Ian Rogers <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Quentin Monnet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-10-09	bpftool: Align output skeleton ELF code	Ian Rogers	1	-6/+9
	libbpf accesses the ELF data requiring at least 8 byte alignment, however, the data is generated into a C string that doesn't guarantee alignment. Fix this by assigning to an aligned char array. Use sizeof on the array, less one for the \0 terminator, rather than generating a constant. Fixes: a6cc6b34b93e ("bpftool: Provide a helper method for accessing skeleton's embedded ELF data") Signed-off-by: Ian Rogers <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Reviewed-by: Alan Maguire <[email protected]> Acked-by: Quentin Monnet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-10-09	selftests/bpf: Test pinning bpf timer to a core	David Vernet	2	-1/+66
	Now that we support pinning a BPF timer to the current core, we should test it with some selftests. This patch adds two new testcases to the timer suite, which verifies that a BPF timer both with and without BPF_F_TIMER_ABS, can be pinned to the calling core with BPF_F_TIMER_CPU_PIN. Signed-off-by: David Vernet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-10-09	bpf: Add ability to pin bpf timer to calling CPU	David Vernet	1	-0/+4
	BPF supports creating high resolution timers using bpf_timer_* helper functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which specifies that the timeout should be interpreted as absolute time. It would also be useful to be able to pin that timer to a core. For example, if you wanted to make a subset of cores run without timer interrupts, and only have the timer be invoked on a single core. This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag. When specified, the HRTIMER_MODE_PINNED flag is passed to hrtimer_start(). A subsequent patch will update selftests to validate. Signed-off-by: David Vernet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-10-06	selftests/bpf: Make seen_tc* variable tests more robust	Daniel Borkmann	3	-60/+46
	Martin reported that on his local dev machine the test_tc_chain_mixed() fails as "test_tc_chain_mixed:FAIL:seen_tc5 unexpected seen_tc5: actual 1 != expected 0" and others occasionally, too. However, when running in a more isolated setup (qemu in particular), it works fine for him. The reason is that there is a small race-window where seen_tc* could turn into true for various test cases when there is background traffic, e.g. after the asserts they often get reset. In such case when subsequent detach takes place, unrelated background traffic could have already flipped the bool to true beforehand. Add a small helper tc_skel_reset_all_seen() to reset all bools before we do the ping test. At this point, everything is set up as expected and therefore no race can occur. All tc_{opts,links} tests continue to pass after this change. Reported-by: Martin KaFai Lau <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-06	selftests/bpf: Test query on empty mprog and pass revision into attach	Daniel Borkmann	1	-0/+59
	Add a new test case to query on an empty bpf_mprog and pass the revision directly into expected_revision for attachment to assert that this does succeed. ./test_progs -t tc_opts [ 1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz [ 1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns [ 1.412419] clocksource: Switched to clocksource tsc [ 1.428671] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK #269 tc_opts_query_attach:OK <--- (new test) #270 tc_opts_replace:OK #271 tc_opts_revision:OK Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-06	selftests/bpf: Adapt assert_mprog_count to always expect 0 count	Daniel Borkmann	3	-11/+8
	Simplify __assert_mprog_count() to remove the -ENOENT corner case as the bpf_prog_query() now returns 0 when no bpf_mprog is attached. This also allows to convert a few test cases from using raw __assert_mprog_count() over to plain assert_mprog_count() helper. Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-06	selftests/bpf: Test bpf_mprog query API via libbpf and raw syscall	Daniel Borkmann	1	-0/+167
	Add a new test case which performs double query of the bpf_mprog through libbpf API, but also via raw bpf(2) syscall. This is testing to gather first the count and then in a subsequent probe the full information with the program array without clearing passed structs in between. # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz [ 1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns [ 1.402734] clocksource: Switched to clocksource tsc [ 1.426639] bpf_testmod: loading out-of-tree module taints kernel. [ 1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK <--- (new test) #269 tc_opts_replace:OK #270 tc_opts_revision:OK Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-06	selftests: firmware: remove duplicate unneeded defines	Muhammad Usama Anjum	1	-4/+0
	These duplicate defines should automatically be picked up from kernel headers. Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-10-06	selftests: core: remove duplicate defines	Muhammad Usama Anjum	1	-28/+0
	Remove duplicate defines which are already defined in kernel headers and re-definition isn't required. Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-10-06	selftests: clone3: remove duplicate defines	Muhammad Usama Anjum	4	-21/+0
	Remove duplicate defines which are already included in kernel headers. MAX_PID_NS_LEVEL macro is used inside kernel only. It isn't exposed to userspace. So it is never defined in test application. Remove #ifndef in this case. Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-10-06	selftests: capabilities: remove duplicate unneeded defines	Muhammad Usama Anjum	3	-17/+1
	These duplicate defines should automatically be picked up from kernel headers. Use KHDR_INCLUDES to add kernel header files. Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-10-06	kselftest: vm: add tests for no-inherit memory-deny-write-execute	Florent Revest	1	-6/+108
	Add some tests to cover the new PR_MDWE_NO_INHERIT flag of the PR_SET_MDWE prctl. Check that: - it can't be set without PR_SET_MDWE - MDWE flags can't be unset - when set, PR_SET_MDWE doesn't propagate to children Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Florent Revest <[email protected]> Acked-by: Catalin Marinas <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Alexey Izbyshev <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ayush Jain <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Joey Gouly <[email protected]> Cc: KP Singh <[email protected]> Cc: Mark Brown <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Szabolcs Nagy <[email protected]> Cc: Topi Miettinen <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-06	mm: add a NO_INHERIT flag to the PR_SET_MDWE prctl	Florent Revest	1	-0/+1
	This extends the current PR_SET_MDWE prctl arg with a bit to indicate that the process doesn't want MDWE protection to propagate to children. To implement this no-inherit mode, the tag in current->mm->flags must be absent from MMF_INIT_MASK. This means that the encoding for "MDWE but without inherit" is different in the prctl than in the mm flags. This leads to a bit of bit-mangling in the prctl implementation. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Florent Revest <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Alexey Izbyshev <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ayush Jain <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Joey Gouly <[email protected]> Cc: KP Singh <[email protected]> Cc: Mark Brown <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Szabolcs Nagy <[email protected]> Cc: Topi Miettinen <[email protected]> Signed-off-by: Andrew Morton <[email protected]>