blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2022-09-02	selftests/xsk: Make sure single threaded test terminates	Maciej Fijalkowski	1	-0/+7
	For single threaded poll tests call pthread_kill() from main thread so that we are sure worker thread has finished its job and it is possible to proceed with next test types from test suite. It was observed that on some platforms it takes a bit longer for worker thread to exit and next test case sees device as busy in this case. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02	selftests/xsk: Add support for executing tests on physical device	Maciej Fijalkowski	3	-87/+170
	Currently, architecture of xdpxceiver is designed strictly for conducting veth based tests. Veth pair is created together with a network namespace and one of the veth interfaces is moved to the mentioned netns. Then, separate threads for Tx and Rx are spawned which will utilize described setup. Infrastructure described in the paragraph above can not be used for testing AF_XDP support on physical devices. That testing will be conducted on a single network interface and same queue. Xskxceiver needs to be extended to distinguish between veth tests and physical interface tests. Since same iface/queue id pair will be used by both Tx/Rx threads for physical device testing, Tx thread, which happen to run after the Rx thread, is going to create XSK socket with shared umem flag. In order to track this setting throughout the lifetime of spawned threads, introduce 'shared_umem' boolean variable to struct ifobject and set it to true when xdpxceiver is run against physical device. In such case, UMEM size needs to be doubled, so half of it will be used by Rx thread and other half by Tx thread. For two step based test types, value of XSKMAP element under key 0 has to be updated as there is now another socket for the second step. Also, to avoid race conditions when destroying XSK resources, move this activity to the main thread after spawned Rx and Tx threads have finished its job. This way it is possible to gracefully remove shared umem without introducing synchronization mechanisms. To run xsk selftests suite on physical device, append "-i $IFACE" when invoking test_xsk.sh. For veth based tests, simply skip it. When "-i $IFACE" is in place, under the hood test_xsk.sh will use $IFACE for both interfaces supplied to xdpxceiver, which in turn will interpret that this execution of test suite is for a physical device. Note that currently this makes it possible only to test SKB and DRV mode (in case underlying device has native XDP support). ZC testing support is added in a later patch. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02	selftests/xsk: Increase chars for interface name to 16	Maciej Fijalkowski	1	-2/+2
	So that "enp240s0f0" or such name can be used against xskxceiver. While at it, also extend character count for netns name. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02	selftests/xsk: Introduce default Rx pkt stream	Maciej Fijalkowski	2	-27/+51
	In order to prepare xdpxceiver for physical device testing, let us introduce default Rx pkt stream. Reason for doing it is that physical device testing will use a UMEM with a doubled size where half of it will be used by Tx and other half by Rx. This means that pkt addresses will differ for Tx and Rx streams. Rx thread will initialize the xsk_umem_info::base_addr that is added here so that pkt_set(), when working on Rx UMEM will add this offset and second half of UMEM space will be used. Note that currently base_addr is 0 on both sides. Future commit will do the mentioned initialization. Previously, veth based testing worked on separate UMEMs, so single default stream was fine. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02	selftests/xsk: Query for native XDP support	Maciej Fijalkowski	1	-2/+37
	Currently, xdpxceiver assumes that underlying device supports XDP in native mode - it is fine by now since tests can run only on a veth pair. Future commit is going to allow running test suite against physical devices, so let us query the device if it is capable of running XDP programs in native mode. This way xdpxceiver will not try to run TEST_MODE_DRV if device being tested is not supporting it. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02	landlock: Fix file reparenting without explicit LANDLOCK_ACCESS_FS_REFER	Mickaël Salaün	1	-10/+145
	This change fixes a mis-handling of the LANDLOCK_ACCESS_FS_REFER right when multiple rulesets/domains are stacked. The expected behaviour was that an additional ruleset can only restrict the set of permitted operations, but in this particular case, it was potentially possible to re-gain the LANDLOCK_ACCESS_FS_REFER right. With the introduction of LANDLOCK_ACCESS_FS_REFER, we added the first globally denied-by-default access right. Indeed, this lifted an initial Landlock limitation to rename and link files, which was initially always denied when the source or the destination were different directories. This led to an inconsistent backward compatibility behavior which was only taken into account if no domain layer were using the new LANDLOCK_ACCESS_FS_REFER right. However, when restricting a thread with a new ruleset handling LANDLOCK_ACCESS_FS_REFER, all inherited parent rulesets/layers not explicitly handling LANDLOCK_ACCESS_FS_REFER would behave as if they were handling this access right and with all their rules allowing it. This means that renaming and linking files could became allowed by these parent layers, but all the other required accesses must also be granted: all layers must allow file removal or creation, and renaming and linking operations cannot lead to privilege escalation according to the Landlock policy. See detailed explanation in commit b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER"). To say it another way, this bug may lift the renaming and linking limitations of the initial Landlock version, and a same ruleset can enforce different restrictions depending on previous or next enforced ruleset (i.e. inconsistent behavior). The LANDLOCK_ACCESS_FS_REFER right cannot give access to data not already allowed, but this doesn't follow the contract of the first Landlock ABI. This fix puts back the limitation for sandboxes that didn't opt-in for this additional right. For instance, if a first ruleset allows LANDLOCK_ACCESS_FS_MAKE_REG on /dst and LANDLOCK_ACCESS_FS_REMOVE_FILE on /src, renaming /src/file to /dst/file is denied. However, without this fix, stacking a new ruleset which allows LANDLOCK_ACCESS_FS_REFER on / would now permit the sandboxed thread to rename /src/file to /dst/file . This change fixes the (absolute) rule access rights, which now always forbid LANDLOCK_ACCESS_FS_REFER except when it is explicitly allowed when creating a rule. Making all domain handle LANDLOCK_ACCESS_FS_REFER was an initial approach but there is two downsides: * it makes the code more complex because we still want to check that a rule allowing LANDLOCK_ACCESS_FS_REFER is legitimate according to the ruleset's handled access rights (i.e. ABI v1 != ABI v2); * it would not allow to identify if the user created a ruleset explicitly handling LANDLOCK_ACCESS_FS_REFER or not, which will be an issue to audit Landlock. Instead, this change adds an ACCESS_INITIALLY_DENIED list of denied-by-default rights, which (only) contains LANDLOCK_ACCESS_FS_REFER. All domains are treated as if they are also handling this list, but without modifying their fs_access_masks field. A side effect is that the errno code returned by rename(2) or link(2) may be changed from EXDEV to EACCES according to the enforced restrictions. Indeed, we now have the mechanic to identify if an access is denied because of a required right (e.g. LANDLOCK_ACCESS_FS_MAKE_REG, LANDLOCK_ACCESS_FS_REMOVE_FILE) or if it is denied because of missing LANDLOCK_ACCESS_FS_REFER rights. This may result in different errno codes than for the initial Landlock version, but this approach is more consistent and better for rename/link compatibility reasons, and it wasn't possible before (hence no backport to ABI v1). The layout1.rename_file test reflects this change. Add 4 layout1.refer_denied_by_default* test suites to check that the behavior of a ruleset not handling LANDLOCK_ACCESS_FS_REFER (ABI v1) is unchanged even if another layer handles LANDLOCK_ACCESS_FS_REFER (i.e. ABI v1 precedence). Make sure rule's absolute access rights are correct by testing with and without a matching path. Add test_rename() and test_exchange() helpers. Extend layout1.inval tests to check that a denied-by-default access right is not necessarily part of a domain's handled access rights. Test coverage for security/landlock is 95.3% of 599 lines according to gcc/gcov-11. Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Reviewed-by: Paul Moore <[email protected]> Reviewed-by: Günther Noack <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] [mic: Constify and slightly simplify test helpers] Signed-off-by: Mickaël Salaün <[email protected]>
2022-09-02	selftests/bpf: Amend test_tunnel to exercise BPF_F_TUNINFO_FLAGS	Shmulik Ladkani	1	-8/+16
	Get the tunnel flags in {ipv6}vxlan_get_tunnel_src and ensure they are aligned with tunnel params set at {ipv6}vxlan_set_tunnel_dst. Signed-off-by: Shmulik Ladkani <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-01	selftests: net: dsa: symlink the tc_actions.sh test	Vladimir Oltean	3	-1/+4
	This has been validated on the Ocelot/Felix switch family (NXP LS1028A) and should be relevant to any switch driver that offloads the tc-flower and/or tc-matchall actions trap, drop, accept, mirred, for which DSA has operations. TEST: gact drop and ok (skip_hw) [ OK ] TEST: mirred egress flower redirect (skip_hw) [ OK ] TEST: mirred egress flower mirror (skip_hw) [ OK ] TEST: mirred egress matchall mirror (skip_hw) [ OK ] TEST: mirred_egress_to_ingress (skip_hw) [ OK ] TEST: gact drop and ok (skip_sw) [ OK ] TEST: mirred egress flower redirect (skip_sw) [ OK ] TEST: mirred egress flower mirror (skip_sw) [ OK ] TEST: mirred egress matchall mirror (skip_sw) [ OK ] TEST: trap (skip_sw) [ OK ] TEST: mirred_egress_to_ingress (skip_sw) [ OK ] Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-01	Merge tag 'kvm-s390-master-6.0-1' of ↵	Paolo Bonzini	11	-176/+343
	git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD PCI interpretation compile fixes
2022-09-01	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	3	-28/+54
	tools/testing/selftests/net/.gitignore sort the net-next version and use it Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-01	selftests/bpf: Test concurrent updates on bpf_task_storage_busy	Hou Tao	2	-0/+161
	Under full preemptible kernel, task local storage lookup operations on the same CPU may update per-cpu bpf_task_storage_busy concurrently. If the update of bpf_task_storage_busy is not preemption safe, the final value of bpf_task_storage_busy may become not-zero forever and bpf_task_storage_trylock() will always fail. So add a test case to ensure the update of bpf_task_storage_busy is preemption safe. Will skip the test case when CONFIG_PREEMPT is disabled, and it can only reproduce the problem probabilistically. By increasing TASK_STORAGE_MAP_NR_LOOP and running it under ARM64 VM with 4-cpus, it takes about four rounds to reproduce: > test_maps is modified to only run test_task_storage_map_stress_lookup() $ export TASK_STORAGE_MAP_NR_THREAD=256 $ export TASK_STORAGE_MAP_NR_LOOP=81920 $ export TASK_STORAGE_MAP_PIN_CPU=1 $ time ./test_maps test_task_storage_map_stress_lookup(135):FAIL:bad bpf_task_storage_busy got -2 real 0m24.743s user 0m6.772s sys 0m17.966s Signed-off-by: Hou Tao <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-09-01	selftests/bpf: Move sys_pidfd_open() into task_local_storage_helpers.h	Hou Tao	3	-18/+20
	sys_pidfd_open() is defined twice in both test_bprm_opts.c and test_local_storage.c, so move it to a common header file. And it will be used in map_tests as well. Signed-off-by: Hou Tao <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-09-01	selftests/net: return back io_uring zc send tests	Pavel Begunkov	2	-79/+41
	Enable io_uring zerocopy send tests back and fix them up to follow the new inteface. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/c8e5018c516093bdad0b6e19f2f9847dea17e4d2.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-09-01	selftests/net: temporarily disable io_uring zc test	Pavel Begunkov	1	-0/+9
	We're going to change API, to avoid build problems with a couple of following commits, disable io_uring testing. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/12b7507223df04fbd12aa05fc0cb544b51d7ed79.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-08-31	net-next: Fix IP_UNICAST_IF option behavior for connected sockets	Richard Gobert	2	-2/+44
	The IP_UNICAST_IF socket option is used to set the outgoing interface for outbound packets. The IP_UNICAST_IF socket option was added as it was needed by the Wine project, since no other existing option (SO_BINDTODEVICE socket option, IP_PKTINFO socket option or the bind function) provided the needed characteristics needed by the IP_UNICAST_IF socket option. [1] The IP_UNICAST_IF socket option works well for unconnected sockets, that is, the interface specified by the IP_UNICAST_IF socket option is taken into consideration in the route lookup process when a packet is being sent. However, for connected sockets, the outbound interface is chosen when connecting the socket, and in the route lookup process which is done when a packet is being sent, the interface specified by the IP_UNICAST_IF socket option is being ignored. This inconsistent behavior was reported and discussed in an issue opened on systemd's GitHub project [2]. Also, a bug report was submitted in the kernel's bugzilla [3]. To understand the problem in more detail, we can look at what happens for UDP packets over IPv4 (The same analysis was done separately in the referenced systemd issue). When a UDP packet is sent the udp_sendmsg function gets called and the following happens: 1. The oif member of the struct ipcm_cookie ipc (which stores the output interface of the packet) is initialized by the ipcm_init_sk function to inet->sk.sk_bound_dev_if (the device set by the SO_BINDTODEVICE socket option). 2. If the IP_PKTINFO socket option was set, the oif member gets overridden by the call to the ip_cmsg_send function. 3. If no output interface was selected yet, the interface specified by the IP_UNICAST_IF socket option is used. 4. If the socket is connected and no destination address is specified in the send function, the struct ipcm_cookie ipc is not taken into consideration and the cached route, that was calculated in the connect function is being used. Thus, for a connected socket, the IP_UNICAST_IF sockopt isn't taken into consideration. This patch corrects the behavior of the IP_UNICAST_IF socket option for connect()ed sockets by taking into consideration the IP_UNICAST_IF sockopt when connecting the socket. In order to avoid reconnecting the socket, this option is still ignored when applied on an already connected socket until connect() is called again by the Richard Gobert. Change the __ip4_datagram_connect function, which is called during socket connection, to take into consideration the interface set by the IP_UNICAST_IF socket option, in a similar way to what is done in the udp_sendmsg function. [1] https://lore.kernel.org/netdev/1328685717.4736.4.camel@edumazet-laptop/T/ [2] https://github.com/systemd/systemd/issues/11935#issuecomment-618691018 [3] https://bugzilla.kernel.org/show_bug.cgi?id=210255 Signed-off-by: Richard Gobert <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/20220829111554.GA1771@debian Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-31	selftests/bpf: Add test cases for htab update	Hou Tao	3	-0/+156
	One test demonstrates the reentrancy of hash map update on the same bucket should fail, and another one shows concureently updates of the same hash map bucket should succeed and not fail due to the reentrancy checking for bucket lock. There is no trampoline support on s390x, so move htab_update to denylist. Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-08-31	selftest/bpf: Ensure no module loading in bpf_setsockopt(TCP_CONGESTION)	Martin KaFai Lau	1	-0/+4
	This patch adds a test to ensure bpf_setsockopt(TCP_CONGESTION, "not_exist") will not trigger the kernel module autoload. Before the fix: [ 40.535829] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274 [...] [ 40.552134] tcp_ca_find_autoload.constprop.0+0xcb/0x200 [ 40.552689] tcp_set_congestion_control+0x99/0x7b0 [ 40.553203] do_tcp_setsockopt+0x3ed/0x2240 [...] [ 40.556041] __bpf_setsockopt+0x124/0x640 Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-08-31	selftests: net: sort .gitignore file	Axel Rasmussen	1	-25/+25
	This is the result of `sort tools/testing/selftests/net/.gitignore`, but preserving the comment at the top. Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Axel Rasmussen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-31	selftests/xsk: Add missing close() on netns fd	Maciej Fijalkowski	1	-0/+4
	Commit 1034b03e54ac ("selftests: xsk: Simplify cleanup of ifobjects") removed close on netns fd, which is not correct, so let us restore it. Fixes: 1034b03e54ac ("selftests: xsk: Simplify cleanup of ifobjects") Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-08-31	selftests/nolibc: Avoid generated files being committed	Fernanda Ma'rouf	1	-0/+4
	After running the nolibc tests, the "git status" is not clean because the generated files are not ignored. Create a `.gitignore` inside the selftests/nolibc directory to ignore them. Cc: Ammar Faizi <[email protected]> Cc: Fernanda Ma'rouf <[email protected]> Signed-off-by: Fernanda Ma'rouf <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: add a "help" target	Willy Tarreau	1	-1/+26
	It presents the supported targets, and becomes the default target to save the user from having to read the makefile. The "all" target was placed after it and now points to "run" to do everything since it's no longer the default one. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: "sysroot" target installs a local copy of the sysroot	Willy Tarreau	1	-2/+11
	It's not convenient to rely on a sysroot built in another directory, especially when running cross-compilation tests, where one has to switch back and forth between directories. Let's make it possible to install the sysroot directly in the test directory. It's not big and even benefits from being copied by arch so that it's easier to switch between archs if needed. The new "sysroot" target does this, it just calls "headers_standalone" from nolibc to install the sysroot right here. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: add a "run" target to start the kernel in QEMU	Willy Tarreau	1	-0/+33
	The "run" target will build the kernel and start it in QEMU. The "rerun" target will not have the kernel dependency and will just try to start QEMU. The QEMU architecture used to start the kernel is derived from the configured ARCH. This might need to be improved for archs which include different variants under the same name (mips vs mipsel, +/-64, riscv32 vs riscv64). This could be tested for i386, x86, arm, arm64, mips and riscv (the later two reporting issues on some tests). It is possible to pass a test specification for nolibc-test in the TEST variable, which will be passed as-is as NOLIBC_TEST. On success, the number of successful tests is printed. On failure, failed lines are individually printed. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: add a "defconfig" target	Willy Tarreau	1	-0/+12
	While most archs will work fine with "make defconfig", not all will do, and it's not always easy to remember the most suitable choice to use for a specific architecture. This adds a "defconfig" target to the Makefile so that one may easily run "make -C ... defconfig" and make sure to clean and rebuild a fresh config. This is not used by default because we want to preserve the user's config by default. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: add a "kernel" target to build the kernel with the initramfs	Willy Tarreau	1	-0/+13
	The "kernel" target rebuilds the kernel with the current config for the selected arch, with an initramfs containing the nolibc-test utility. Since image names depend on the architecture, the currently supported ones are referenced and resolved based on the architecture. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: support glibc as well	Willy Tarreau	1	-2/+45
	Adding support for glibc can be useful to distinguish between bugs in nolibc and bugs in the kernel when a syscall reports an unusual value. It's not that much work and should not affect the long term maintainability of the tests. The necessary changes can essentially be summed up like this: - set _GNU_SOURCE a the top to access some definitions - many includes added when we know we don't come from nolibc (missing the stdio include guard) - disable gettid() which is not exposed by glibc - disable gettimeofday's support of bad pointers since these crash in glibc - add a simple itoa() for errorname(); strerror() is too verbose (no way to get short messages). strerrorname_np() was added in modern glibc (2.32) to do exactly this but that 's too recent to be usable as the default fallback. - use the standard ioperm() definition. May be we need to implement ioperm() in nolibc if that's useful. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: condition some tests on /proc existence	Willy Tarreau	1	-5/+9
	If /proc is not available (program run inside a chroot or without sufficient permissions), it's better to disable the associated tests. Some will be preserved like the ones which check for a failure to create some entries there since they're still supposed to fail. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: recreate and populate /dev and /proc if missing	Willy Tarreau	1	-0/+56
	Most of the time the program will be run alone in an initramfs. There is no value in requiring the user to populate /dev and /proc for such tests, we can do it ourselves, and it participates to the tests at the same time. What's done here is that when called as init (getpid()==1) we check if /dev exists or create it, if /dev/console and /dev/null exists, otherwise we try to mount a devtmpfs there, and if it fails we fall back to mknod. The console is reopened if stdout was closed. Finally /proc is created and mounted if /proc/self cannot be found. This is sufficient for most tests. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: on x86, support exiting with isa-debug-exit	Willy Tarreau	1	-0/+9
	QEMU, when started with "-device isa-debug-exit -no-reboot" will exit with status code 2N+1 when N is written to 0x501. This is particularly convenient for automated tests but this is not portable. As such we only enable this on x86_64 when pid==1. In addition, this requires an ioperm() call but in order not to have to define arch-specific syscalls we just perform the syscall by hand there. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: exit with poweroff on success when getpid() == 1	Willy Tarreau	1	-0/+14
	The idea is to ease automated testing under qemu. If the test succeeds while running as PID 1, indicating the system was booted with init=/test, let's just power off so that qemu can exit with a successful code. In other situations it will exit and provoke a panic, which may be caught for example with CONFIG_PVPANIC. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: add a few tests for some libc functions	Willy Tarreau	1	-0/+35
	The test series called "stdlib" covers some libc functions (string, stdlib etc). By default they are automatically run after "syscall" but may be requested in argument or in variable NOLIBC_TEST. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: implement a few tests for various syscalls	Willy Tarreau	1	-0/+110
	This adds 63 tests covering about 34 syscalls. Both successes and failures are tested. Two tests fail when run as unprivileged user (link_dir which returns EACCESS instead of EPERM, and chroot which returns EPERM). One test (execve("/")) expects to fail on EACCESS, but needs to have valid arguments otherwise the kernel will log a message. And a few tests require /proc to be mounted. The code is not pretty since all tests are one-liners, sometimes resulting in long lines, especially when using compount statements to preset a line, but it's convenient and doesn't obfuscate the code, which is important to understand what failed. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: support a test definition format	Willy Tarreau	1	-0/+91
	It now becomes possible to pass a string either in argv[1] or in the NOLIBC_TEST environment variable (the former having precedence), to specify which tests to run. The format is: testname[:range]*[,testname...] Where a range is either a single value or the min and max numbers of the test IDs in a sequence, delimited by a dash. Multiple ranges are possible. This should provide enough flexibility to focus on certain failing parts just by playing with the boot command line in a boot loader or in qemu depending on what is accessible. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	selftests/nolibc: add basic infrastructure to ease creation of nolibc tests	Willy Tarreau	2	-0/+438
	This creates a "nolibc" selftest that intends to test various parts of the nolibc component, both in terms of build and execution for a given architecture. The aim is for it to be as simple to run as a kernel build, by just passing the compiler (for the build) and the ARCH (for kernel and execution). It brings a basic squeleton made of a single C file that will ease testing and error reporting. The code will be arranged so that it remains easy to add basic tests for syscalls or library calls that may rely on a condition to be executed, and whose result is compared to a value or to an error with a specific errno value. Tests will just use a relative line number in switch/case statements as an index, saving the user from having to maintain arrays and complicated functions which can often just be one-liners. MAINTAINERS was updated. Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2022-08-31	netfilter: remove nf_conntrack_helper sysctl and modparam toggles	Pablo Neira Ayuso	1	-10/+26
	__nf_ct_try_assign_helper() remains in place but it now requires a template to configure the helper. A toggle to disable automatic helper assignment was added by: a9006892643a ("netfilter: nf_ct_helper: allow to disable automatic helper assignment") in 2012 to address the issues described in "Secure use of iptables and connection tracking helpers". Automatic conntrack helper assignment was disabled by: 3bb398d925ec ("netfilter: nf_ct_helper: disable automatic helper assignment") back in 2016. This patch removes the sysctl and modparam toggles, users now have to rely on explicit conntrack helper configuration via ruleset. Update tools/testing/selftests/netfilter/nft_conntrack_helper.sh to check that auto-assignment does not happen anymore. Acked-by: Aaron Conole <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-08-29	selftests/bpf: Fix connect4_prog tcp/socket header type conflict	James Hilliard	1	-2/+3
	There is a potential for us to hit a type conflict when including netinet/tcp.h and sys/socket.h, we can replace both of these includes with linux/tcp.h and bpf_tcp_helpers.h to avoid this conflict. Fixes errors like the below when compiling with gcc BPF backend: In file included from /usr/include/netinet/tcp.h:91, from progs/connect4_prog.c:11: /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char' 34 \| typedef __INT8_TYPE__ int8_t; \| ^~~~~~ In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155, from /usr/include/x86_64-linux-gnu/bits/socket.h:29, from /usr/include/x86_64-linux-gnu/sys/socket.h:33, from progs/connect4_prog.c:10: /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'} 24 \| typedef __int8_t int8_t; \| ^~~~~~ /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int' 43 \| typedef __INT64_TYPE__ int64_t; \| ^~~~~~~ /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'} 27 \| typedef __int64_t int64_t; \| ^~~~~~~ Signed-off-by: James Hilliard <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-08-29	selftests/bpf: Fix bind{4,6} tcp/socket header type conflict	James Hilliard	2	-4/+0
	There is a potential for us to hit a type conflict when including netinet/tcp.h with sys/socket.h, we can remove these as they are not actually needed. Fixes errors like the below when compiling with gcc BPF backend: In file included from /usr/include/netinet/tcp.h:91, from progs/bind4_prog.c:10: /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char' 34 \| typedef __INT8_TYPE__ int8_t; \| ^~~~~~ In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155, from /usr/include/x86_64-linux-gnu/bits/socket.h:29, from /usr/include/x86_64-linux-gnu/sys/socket.h:33, from progs/bind4_prog.c:9: /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'} 24 \| typedef __int8_t int8_t; \| ^~~~~~ /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int' 43 \| typedef __INT64_TYPE__ int64_t; \| ^~~~~~~ /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'} 27 \| typedef __int64_t int64_t; \| ^~~~~~~ make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/bind4_prog.o] Error 1 Signed-off-by: James Hilliard <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-08-26	selftests/bpf: Declare subprog_noise as static in tailcall_bpf2bpf4	James Hilliard	1	-1/+1
	Due to bpf_map_lookup_elem being declared static we need to also declare subprog_noise as static. Fixes the following error: progs/tailcall_bpf2bpf4.c:26:9: error: 'bpf_map_lookup_elem' is static but used in inline function 'subprog_noise' which is not static [-Werror] 26 \| bpf_map_lookup_elem(&nop_table, &key); \| ^~~~~~~~~~~~~~~~~~~ Signed-off-by: James Hilliard <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-08-26	selftests/bpf: fix type conflict in test_tc_dtime	James Hilliard	1	-1/+0
	The sys/socket.h header isn't required to build test_tc_dtime and may cause a type conflict. Fixes the following error: In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155, from /usr/include/x86_64-linux-gnu/bits/socket.h:29, from /usr/include/x86_64-linux-gnu/sys/socket.h:33, from progs/test_tc_dtime.c:18: /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: error: conflicting types for 'int8_t'; have '__int8_t' {aka 'signed char'} 24 \| typedef __int8_t int8_t; \| ^~~~~~ In file included from progs/test_tc_dtime.c:5: /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'char'} 34 \| typedef __INT8_TYPE__ int8_t; \| ^~~~~~ /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: error: conflicting types for 'int64_t'; have '__int64_t' {aka 'long long int'} 27 \| typedef __int64_t int64_t; \| ^~~~~~~ /home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long int'} 43 \| typedef __INT64_TYPE__ int64_t; \| ^~~~~~~ make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/test_tc_dtime.o] Error 1 Signed-off-by: James Hilliard <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-08-26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	David S. Miller	2	-0/+26
	Daniel borkmann says: ==================== The following pull-request contains BPF updates for your net tree. We've added 11 non-merge commits during the last 14 day(s) which contain a total of 13 files changed, 61 insertions(+), 24 deletions(-). The main changes are: 1) Fix BPF verifier's precision tracking around BPF ring buffer, from Kumar Kartikeya Dwivedi. 2) Fix regression in tunnel key infra when passing FLOWI_FLAG_ANYSRC, from Eyal Birger. 3) Fix insufficient permissions for bpf_sys_bpf() helper, from YiFei Zhu. 4) Fix splat from hitting BUG when purging effective cgroup programs, from Pu Lehui. 5) Fix range tracking for array poke descriptors, from Daniel Borkmann. 6) Fix corrupted packets for XDP_SHARED_UMEM in aligned mode, from Magnus Karlsson. 7) Fix NULL pointer splat in BPF sockmap sk_msg_recvmsg(), from Liu Jian. 8) Add READ_ONCE() to bpf_jit_limit when reading from sysctl, from Kuniyuki Iwashima. 9) Add BPF selftest lru_bug check to s390x deny list, from Daniel Müller. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-08-26	selftests/powerpc: Add a test for execute-only memory	Nicholas Miehlbradt	2	-1/+233
	This selftest is designed to cover execute-only protections on the Radix MMU but will also work with Hash. The tests are based on those found in pkey_exec_test with modifications to use the generic mprotect() instead of the pkey variants. Signed-off-by: Nicholas Miehlbradt <[email protected]> Signed-off-by: Russell Currey <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-25	bpf: Add CGROUP prefix to cgroup_iter_order	Hao Luo	3	-7/+7
	bpf_cgroup_iter_order is globally visible but the entries do not have CGROUP prefix. As requested by Andrii, put a CGROUP in the names in bpf_cgroup_iter_order. This patch fixes two previous commits: one introduced the API and the other uses the API in bpf selftest (that is, the selftest cgroup_hierarchical_stats). I tested this patch via the following command: test_progs -t cgroup,iter,btf_dump Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter") Fixes: 88886309d2e8 ("selftests/bpf: add a selftest for cgroup hierarchical stats collection") Suggested-by: Andrii Nakryiko <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Hao Luo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-08-25	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	9	-4/+131
	drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()") c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer") https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-25	Merge tag 'net-6.0-rc3' of ↵	Linus Torvalds	5	-0/+90
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from ipsec and netfilter (with one broken Fixes tag). Current release - new code bugs: - dsa: don't dereference NULL extack in dsa_slave_changeupper() - dpaa: fix <1G ethernet on LS1046ARDB - neigh: don't call kfree_skb() under spin_lock_irqsave() Previous releases - regressions: - r8152: fix the RX FIFO settings when suspending - dsa: microchip: keep compatibility with device tree blobs with no phy-mode - Revert "net: macsec: update SCI upon MAC address change." - Revert "xfrm: update SA curlft.use_time", comply with RFC 2367 Previous releases - always broken: - netfilter: conntrack: work around exceeded TCP receive window - ipsec: fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid - moxa: get rid of asymmetry in DMA mapping/unmapping - dsa: microchip: make learning configurable and keep it off while standalone - ice: xsk: prohibit usage of non-balanced queue id - rxrpc: fix locking in rxrpc's sendmsg Misc: - another chunk of sysctl data race silencing" * tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) net: lantiq_xrx200: restore buffer if memory allocation failed net: lantiq_xrx200: fix lock under memory pressure net: lantiq_xrx200: confirm skb is allocated before using net: stmmac: work around sporadic tx issue on link-up ionic: VF initial random MAC address if no assigned mac ionic: fix up issues with handling EAGAIN on FW cmds ionic: clear broken state on generation change rxrpc: Fix locking in rxrpc's sendmsg net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2 MAINTAINERS: rectify file entry in BONDING DRIVER i40e: Fix incorrect address type for IPv6 flow rules ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter net: Fix a data-race around sysctl_somaxconn. net: Fix a data-race around netdev_unregister_timeout_secs. net: Fix a data-race around gro_normal_batch. net: Fix data-races around sysctl_devconf_inherit_init_net. net: Fix data-races around sysctl_fb_tunnels_only_for_init_net. net: Fix a data-race around netdev_budget_usecs. net: Fix data-races around sysctl_max_skb_frags. net: Fix a data-race around netdev_budget. ...
2022-08-25	selftests/net: fix reinitialization of TEST_PROGS in net self tests.	Adel Abouchaev	1	-1/+1
	Assinging will drop all previous tests. Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting") Signed-off-by: Adel Abouchaev <[email protected]> Reviewed-by: Shuah Khan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-25	selftests/bpf: Add regression test for pruning fix	Kumar Kartikeya Dwivedi	1	-0/+25
	Add a test to ensure we do mark_chain_precision for the argument type ARG_CONST_ALLOC_SIZE_OR_ZERO. For other argument types, this was already done, but propagation for missing for this case. Without the fix, this test case loads successfully. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-08-25	selftests/bpf: add a selftest for cgroup hierarchical stats collection	Yosry Ahmed	3	-0/+584
	Add a selftest that tests the whole workflow for collecting, aggregating (flushing), and displaying cgroup hierarchical stats. TL;DR: - Userspace program creates a cgroup hierarchy and induces memcg reclaim in parts of it. - Whenever reclaim happens, vmscan_start and vmscan_end update per-cgroup percpu readings, and tell rstat which (cgroup, cpu) pairs have updates. - When userspace tries to read the stats, vmscan_dump calls rstat to flush the stats, and outputs the stats in text format to userspace (similar to cgroupfs stats). - rstat calls vmscan_flush once for every (cgroup, cpu) pair that has updates, vmscan_flush aggregates cpu readings and propagates updates to parents. - Userspace program makes sure the stats are aggregated and read correctly. Detailed explanation: - The test loads tracing bpf programs, vmscan_start and vmscan_end, to measure the latency of cgroup reclaim. Per-cgroup readings are stored in percpu maps for efficiency. When a cgroup reading is updated on a cpu, cgroup_rstat_updated(cgroup, cpu) is called to add the cgroup to the rstat updated tree on that cpu. - A cgroup_iter program, vmscan_dump, is loaded and pinned to a file, for each cgroup. Reading this file invokes the program, which calls cgroup_rstat_flush(cgroup) to ask rstat to propagate the updates for all cpus and cgroups that have updates in this cgroup's subtree. Afterwards, the stats are exposed to the user. vmscan_dump returns 1 to terminate iteration early, so that we only expose stats for one cgroup per read. - An ftrace program, vmscan_flush, is also loaded and attached to bpf_rstat_flush. When rstat flushing is ongoing, vmscan_flush is invoked once for each (cgroup, cpu) pair that has updates. cgroups are popped from the rstat tree in a bottom-up fashion, so calls will always be made for cgroups that have updates before their parents. The program aggregates percpu readings to a total per-cgroup reading, and also propagates them to the parent cgroup. After rstat flushing is over, all cgroups will have correct updated hierarchical readings (including all cpus and all their descendants). - Finally, the test creates a cgroup hierarchy and induces memcg reclaim in parts of it, and makes sure that the stats collection, aggregation, and reading workflow works as expected. Signed-off-by: Yosry Ahmed <[email protected]> Signed-off-by: Hao Luo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-08-25	selftests/bpf: extend cgroup helpers	Yosry Ahmed	2	-47/+174
	This patch extends bpf selft cgroup_helpers [ID] n various ways: - Add enable_controllers() that allows tests to enable all or a subset of controllers for a specific cgroup. - Add join_cgroup_parent(). The cgroup workdir is based on the pid, therefore a spawned child cannot join the same cgroup hierarchy of the test through join_cgroup(). join_cgroup_parent() is used in child processes to join a cgroup under the parent's workdir. - Add write_cgroup_file() and write_cgroup_file_parent() (similar to join_cgroup_parent() above). - Add get_root_cgroup() for tests that need to do checks on root cgroup. - Distinguish relative and absolute cgroup paths in function arguments. Now relative paths are called relative_path, and absolute paths are called cgroup_path. Signed-off-by: Yosry Ahmed <[email protected]> Signed-off-by: Hao Luo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-08-25	selftests/bpf: Test cgroup_iter.	Hao Luo	4	-1/+271
	Add a selftest for cgroup_iter. The selftest creates a mini cgroup tree of the following structure: ROOT (working cgroup) \| PARENT / \ CHILD1 CHILD2 and tests the following scenarios: - invalid cgroup fd. - pre-order walk over descendants from PARENT. - post-order walk over descendants from PARENT. - walk of ancestors from PARENT. - process only a single object (i.e. PARENT). - early termination. Acked-by: Yonghong Song <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Hao Luo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-08-25	bpf: Introduce cgroup iter	Hao Luo	1	-2/+2
	Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes: - walking a cgroup's descendants in pre-order. - walking a cgroup's descendants in post-order. - walking a cgroup's ancestors. - process only the given cgroup. When attaching cgroup_iter, one can set a cgroup to the iter_link created from attaching. This cgroup is passed as a file descriptor or cgroup id and serves as the starting point of the walk. If no cgroup is specified, the starting point will be the root cgroup v2. For walking descendants, one can specify the order: either pre-order or post-order. For walking ancestors, the walk starts at the specified cgroup and ends at the root. One can also terminate the walk early by returning 1 from the iter program. Note that because walking cgroup hierarchy holds cgroup_mutex, the iter program is called with cgroup_mutex held. Currently only one session is supported, which means, depending on the volume of data bpf program intends to send to user space, the number of cgroups that can be walked is limited. For example, given the current buffer size is 8 * PAGE_SIZE, if the program sends 64B data for each cgroup, assuming PAGE_SIZE is 4kb, the total number of cgroups that can be walked is 512. This is a limitation of cgroup_iter. If the output data is larger than the kernel buffer size, after all data in the kernel buffer is consumed by user space, the subsequent read() syscall will signal EOPNOTSUPP. In order to work around, the user may have to update their program to reduce the volume of data sent to output. For example, skip some uninteresting cgroups. In future, we may extend bpf_iter flags to allow customizing buffer size. Acked-by: Yonghong Song <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Hao Luo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>