aboutsummaryrefslogtreecommitdiff
path: root/tools/testing
AgeCommit message (Collapse)AuthorFilesLines
2023-03-06selftests/bpf: Split test_attach_probe into multi subtestsMenglong Dong3-101/+205
In order to adapt to the older kernel, now we split the "attach_probe" testing into multi subtests: manual // manual attach tests for kprobe/uprobe auto // auto-attach tests for kprobe and uprobe kprobe-sleepable // kprobe sleepable test uprobe-lib // uprobe tests for library function by name uprobe-sleepable // uprobe sleepable test uprobe-ref_ctr // uprobe ref_ctr test As sleepable kprobe needs to set BPF_F_SLEEPABLE flag before loading, we need to move it to a stand alone skel file, in case of it is not supported by kernel and make the whole loading fail. Therefore, we can only enable part of the subtests for older kernel. Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Reviewed-by: Biao Jiang <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-03-04selftests/bpf: adjust log_fixup's buffer size for proper truncationAndrii Nakryiko1-1/+1
Adjust log_fixup's expected buffer length to fix the test. It's pretty finicky in its length expectation, but it doesn't break often. So just adjust the length to work on current kernel and with follow up iterator changes as well. Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-04selftests/bpf: enhance align selftest's expected log matchingAndrii Nakryiko1-6/+12
Allow to search for expected register state in all the verifier log output that's related to specified instruction number. See added comment for an example of possible situation that is happening due to a simple enhancement done in the next patch, which fixes handling of env->test_state_freq flag in state checkpointing logic. Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-03selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()Eduard Zingerman4-1/+920
Function verifier.c:convert_ctx_access() applies some rewrites to BPF instructions that read or write BPF program context. This commit adds machinery to allow test cases that inspect BPF program after these rewrites are applied. An example of a test case: { // Shorthand for field offset and size specification N(CGROUP_SOCKOPT, struct bpf_sockopt, retval), // Pattern generated for field read .read = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::current_task);" "$dst = *(u64 *)($dst + task_struct::bpf_ctx);" "$dst = *(u32 *)($dst + bpf_cg_run_ctx::retval);", // Pattern generated for field write .write = "*(u64 *)($ctx + bpf_sockopt_kern::tmp_reg) = r9;" "r9 = *(u64 *)($ctx + bpf_sockopt_kern::current_task);" "r9 = *(u64 *)(r9 + task_struct::bpf_ctx);" "*(u32 *)(r9 + bpf_cg_run_ctx::retval) = $src;" "r9 = *(u64 *)($ctx + bpf_sockopt_kern::tmp_reg);" , }, For each test case, up to three programs are created: - One that uses BPF_LDX_MEM to read the context field. - One that uses BPF_STX_MEM to write to the context field. - One that uses BPF_ST_MEM to write to the context field. The disassembly of each program is compared with the pattern specified in the test case. Kernel code for disassembly is reused (as is in the bpftool). To keep Makefile changes to the minimum, symbolic links to `kernel/bpf/disasm.c` and `kernel/bpf/disasm.h ` are added. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-03selftests/bpf: test if pointer type is tracked for BPF_ST_MEMEduard Zingerman1-0/+23
Check that verifier tracks pointer types for BPF_ST_MEM instructions and reports error if pointer types do not match for different execution branches. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-03bpf: allow ctx writes using BPF_ST_MEM instructionEduard Zingerman1-11/+0
Lift verifier restriction to use BPF_ST_MEM instructions to write to context data structures. This requires the following changes: - verifier.c:do_check() for BPF_ST updated to: - no longer forbid writes to registers of type PTR_TO_CTX; - track dst_reg type in the env->insn_aux_data[...].ptr_type field (same way it is done for BPF_STX and BPF_LDX instructions). - verifier.c:convert_ctx_access() and various callbacks invoked by it are updated to handled BPF_ST instruction alongside BPF_STX. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-03bpf: Refactor RCU enforcement in the verifier.Alexei Starovoitov7-28/+18
bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack of such key mechanism makes it impossible for sleepable bpf programs to use RCU pointers. Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't support btf_type_tag yet) and allowlist certain field dereferences in important data structures like tast_struct, cgroup, socket that are used by sleepable programs either as RCU pointer or full trusted pointer (which is valid outside of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such tagging. They will be removed once GCC supports btf_type_tag. With that refactor check_ptr_to_btf_access(). Make it strict in enforcing PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without modifier flags. There is a chance that this strict enforcement might break existing programs (especially on GCC compiled kernels), but this cleanup has to start sooner than later. Note PTR_TO_CTX access still yields old deprecated PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0. Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-03-03selftests/bpf: Tweak cgroup kfunc test.Alexei Starovoitov1-1/+11
Adjust cgroup kfunc test to dereference RCU protected cgroup pointer as PTR_TRUSTED and pass into KF_TRUSTED_ARGS kfunc. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-03-03selftests/bpf: Add a test case for kptr_rcu.Alexei Starovoitov1-0/+12
Tweak existing map_kptr test to check kptr_rcu. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-03-03bpf: Introduce kptr_rcu.Alexei Starovoitov4-5/+5
The life time of certain kernel structures like 'struct cgroup' is protected by RCU. Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps. The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU. Derefrence of other kptr-s returns PTR_UNTRUSTED. For example: struct map_value { struct cgroup __kptr *cgrp; }; SEC("tp_btf/cgroup_mkdir") int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path) { struct cgroup *cg, *cg2; cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0 bpf_kptr_xchg(&v->cgrp, cg); cg2 = v->cgrp; // This is new feature introduced by this patch. // cg2 is PTR_MAYBE_NULL | MEM_RCU. // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero if (cg2) bpf_cgroup_ancestor(cg2, level); // safe to do. } Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-03-03bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.Alexei Starovoitov9-22/+22
__kptr meant to store PTR_UNTRUSTED kernel pointers inside bpf maps. The concept felt useful, but didn't get much traction, since bpf_rdonly_cast() was added soon after and bpf programs received a simpler way to access PTR_UNTRUSTED kernel pointers without going through restrictive __kptr usage. Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate its intended usage. The main goal of __kptr_untrusted was to read/write such pointers directly while bpf_kptr_xchg was a mechanism to access refcnted kernel pointers. The next patch will allow RCU protected __kptr access with direct read. At that point __kptr_untrusted will be deprecated. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-03-02selftests/bpf: Add absolute timer testTero Kristo2-0/+48
Add test for the absolute BPF timer under the existing timer tests. This will run the timer two times with 1us expiration time, and then re-arm the timer at ~35s in the future. At the end, it is verified that the absolute timer expired exactly two times. Signed-off-by: Tero Kristo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-02selftests/bpf: Add -Wuninitialized flag to bpf prog flagsDave Marchevsky6-11/+14
Per C99 standard [0], Section 6.7.8, Paragraph 10: If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. And in the same document, in appendix "J.2 Undefined behavior": The behavior is undefined in the following circumstances: [...] The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8). This means that use of an uninitialized stack variable is undefined behavior, and therefore that clang can choose to do a variety of scary things, such as not generating bytecode for "bunch of useful code" in the below example: void some_func() { int i; if (!i) return; // bunch of useful code } To add insult to injury, if some_func above is a helper function for some BPF program, clang can choose to not generate an "exit" insn, causing verifier to fail with "last insn is not an exit or jmp". Going from that verification failure to the root cause of uninitialized use is certain to be frustrating. This patch adds -Wuninitialized to the cflags for selftest BPF progs and fixes up existing instances of uninitialized use. [0]: https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf Signed-off-by: Dave Marchevsky <[email protected]> Cc: David Vernet <[email protected]> Cc: Tejun Heo <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-01selftests/bpf: Support custom per-test flags and multiple expected messagesAndrii Nakryiko3-9/+84
Extend __flag attribute by allowing to specify one of the following: * BPF_F_STRICT_ALIGNMENT * BPF_F_ANY_ALIGNMENT * BPF_F_TEST_RND_HI32 * BPF_F_TEST_STATE_FREQ * BPF_F_SLEEPABLE * BPF_F_XDP_HAS_FRAGS * Some numeric value Extend __msg attribute by allowing to specify multiple exepcted messages. All messages are expected to be present in the verifier log in the order of application. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] [ Eduard: added commit message, formatting, comments ]
2023-03-01selftests/bpf: Set __BITS_PER_LONG if target is bpf for LoongArchTiezhu Yang1-1/+2
If target is bpf, there is no __loongarch__ definition, __BITS_PER_LONG defaults to 32, __NR_nanosleep is not defined: #if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32 #define __NR_nanosleep 101 __SC_3264(__NR_nanosleep, sys_nanosleep_time32, sys_nanosleep) #endif Work around this problem, by explicitly setting __BITS_PER_LONG to __loongarch_grlen which is defined by compiler as 64 for LA64. This is similar with commit 36e70b9b06bf ("selftests, bpf: Fix broken riscv build"). Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-03-01selftests/bpf: Add more tests for kptrs in mapsKumar Kartikeya Dwivedi3-65/+451
Firstly, ensure programs successfully load when using all of the supported maps. Then, extend existing tests to test more cases at runtime. We are currently testing both the synchronous freeing of items and asynchronous destruction when map is freed, but the code needs to be adjusted a bit to be able to also accomodate support for percpu maps. We now do a delete on the item (and update for array maps which has a similar effect for kptrs) to perform a synchronous free of the kptr, and test destruction both for the synchronous and asynchronous deletion. Next time the program runs, it should observe the refcount as 1 since all existing references should have been released by then. By running the program after both possible paths freeing kptrs, we establish that they correctly release resources. Next, we augment the existing test to also test the same code path shared by all local storage maps using a task local storage map. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-01selftests/bpf: tests for using dynptrs to parse skb and xdp buffersJoanne Koong15-23/+2522
Test skb and xdp dynptr functionality in the following ways: 1) progs/test_cls_redirect_dynptr.c * Rewrite "progs/test_cls_redirect.c" test to use dynptrs to parse skb data * This is a great example of how dynptrs can be used to simplify a lot of the parsing logic for non-statically known values. When measuring the user + system time between the original version vs. using dynptrs, and averaging the time for 10 runs (using "time ./test_progs -t cls_redirect"): original version: 0.092 sec with dynptrs: 0.078 sec 2) progs/test_xdp_dynptr.c * Rewrite "progs/test_xdp.c" test to use dynptrs to parse xdp data When measuring the user + system time between the original version vs. using dynptrs, and averaging the time for 10 runs (using "time ./test_progs -t xdp_attach"): original version: 0.118 sec with dynptrs: 0.094 sec 3) progs/test_l4lb_noinline_dynptr.c * Rewrite "progs/test_l4lb_noinline.c" test to use dynptrs to parse skb data When measuring the user + system time between the original version vs. using dynptrs, and averaging the time for 10 runs (using "time ./test_progs -t l4lb_all"): original version: 0.062 sec with dynptrs: 0.081 sec For number of processed verifier instructions: original version: 6268 insns with dynptrs: 2588 insns 4) progs/test_parse_tcp_hdr_opt_dynptr.c * Add sample code for parsing tcp hdr opt lookup using dynptrs. This logic is lifted from a real-world use case of packet parsing in katran [0], a layer 4 load balancer. The original version "progs/test_parse_tcp_hdr_opt.c" (not using dynptrs) is included here as well, for comparison. When measuring the user + system time between the original version vs. using dynptrs, and averaging the time for 10 runs (using "time ./test_progs -t parse_tcp_hdr_opt"): original version: 0.031 sec with dynptrs: 0.045 sec 5) progs/dynptr_success.c * Add test case "test_skb_readonly" for testing attempts at writes on a prog type with read-only skb ctx. * Add "test_dynptr_skb_data" for testing that bpf_dynptr_data isn't supported for skb progs. 6) progs/dynptr_fail.c * Add test cases "skb_invalid_data_slice{1,2,3,4}" and "xdp_invalid_data_slice{1,2}" for testing that helpers that modify the underlying packet buffer automatically invalidate the associated data slice. * Add test cases "skb_invalid_ctx" and "xdp_invalid_ctx" for testing that prog types that do not support bpf_dynptr_from_skb/xdp don't have access to the API. * Add test case "dynptr_slice_var_len{1,2}" for testing that variable-sized len can't be passed in to bpf_dynptr_slice * Add test case "skb_invalid_slice_write" for testing that writes to a read-only data slice are rejected by the verifier. * Add test case "data_slice_out_of_bounds_skb" for testing that writes to an area outside the slice are rejected. * Add test case "invalid_slice_rdwr_rdonly" for testing that prog types that don't allow writes to packet data don't accept any calls to bpf_dynptr_slice_rdwr. [0] https://github.com/facebookincubator/katran/blob/main/katran/lib/bpf/pckt_parsing.h Signed-off-by: Joanne Koong <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-27Merge tag 'net-6.3-rc1' of ↵Linus Torvalds2-7/+121
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and netfilter. The notable fixes here are the EEE fix which restores boot for many embedded platforms (real and QEMU); WiFi warning suppression and the ICE Kconfig cleanup. Current release - regressions: - phy: multiple fixes for EEE rework - wifi: wext: warn about usage only once - wifi: ath11k: allow system suspend to survive ath11k Current release - new code bugs: - mlx5: Fix memory leak in IPsec RoCE creation - ibmvnic: assign XPS map to correct queue index Previous releases - regressions: - netfilter: ip6t_rpfilter: Fix regression with VRF interfaces - netfilter: ctnetlink: make event listener tracking global - nf_tables: allow to fetch set elements when table has an owner - mlx5: - fix skb leak while fifo resync and push - fix possible ptp queue fifo use-after-free Previous releases - always broken: - sched: fix action bind logic - ptp: vclock: use mutex to fix "sleep on atomic" bug if driver also uses a mutex - netfilter: conntrack: fix rmmod double-free race - netfilter: xt_length: use skb len to match in length_mt6, avoid issues with BIG TCP Misc: - ice: remove unnecessary CONFIG_ICE_GNSS - mlx5e: remove hairpin write debugfs files - sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy" * tag 'net-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) tcp: tcp_check_req() can be called from process context net: phy: c45: fix network interface initialization failures on xtensa, arm:cubieboard xen-netback: remove unused variables pending_idx and index net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy net: dsa: ocelot_ext: remove unnecessary phylink.h include net: mscc: ocelot: fix duplicate driver name error net: dsa: felix: fix internal MDIO controller resource length net: dsa: seville: ignore mscc-miim read errors from Lynx PCS net/sched: act_sample: fix action bind logic net/sched: act_mpls: fix action bind logic net/sched: act_pedit: fix action bind logic wifi: wext: warn about usage only once wifi: mt76: usb: fix use-after-free in mt76u_free_rx_queue qede: avoid uninitialized entries in coal_entry array nfc: fix memory leak of se_io context in nfc_genl_se_io ice: remove unnecessary CONFIG_ICE_GNSS net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy() ibmvnic: Assign XPS map to correct queue index docs: net: fix inaccuracies in msg_zerocopy.rst tools: net: add __pycache__ to gitignore ...
2023-02-27selftests/bpf: Fix compilation errors: Assign a value to a constantRong Tao1-1/+1
Commit bc292ab00f6c("mm: introduce vma->vm_flags wrapper functions") turns the vm_flags into a const variable. Added bpf_find_vma test in commit f108662b27c9("selftests/bpf: Add tests for bpf_find_vma") to assign values to variables that declare const in find_vma_fail1.c programs, which is an error to the compiler and does not test BPF verifiers. It is better to replace 'const vm_flags_t vm_flags' with 'unsigned long vm_start' for testing. $ make -C tools/testing/selftests/bpf/ -j8 ... progs/find_vma_fail1.c:16:16: error: cannot assign to non-static data member 'vm_flags' with const-qualified type 'const vm_flags_t' (aka 'const unsigned long') vma->vm_flags |= 0x55; ~~~~~~~~~~~~~ ^ ../tools/testing/selftests/bpf/tools/include/vmlinux.h:1898:20: note: non-static data member 'vm_flags' declared const here const vm_flags_t vm_flags; ~~~~~~~~~~~`~~~~~~^~~~~~~~ Signed-off-by: Rong Tao <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-27selftests/bpf: Use __NR_prlimit64 instead of __NR_getrlimit in user_ringbuf testTiezhu Yang2-2/+2
After commit 80d7da1cac62 ("asm-generic: Drop getrlimit and setrlimit syscalls from default list"), new architectures won't need to include getrlimit and setrlimit, they are superseded with prlimit64. In order to maintain compatibility for the new architectures, such as LoongArch which does not define __NR_getrlimit, it is better to use __NR_prlimit64 instead of __NR_getrlimit in user_ringbuf test to fix the following build error: TEST-OBJ [test_progs] user_ringbuf.test.o tools/testing/selftests/bpf/prog_tests/user_ringbuf.c: In function 'kick_kernel_cb': tools/testing/selftests/bpf/prog_tests/user_ringbuf.c:593:17: error: '__NR_getrlimit' undeclared (first use in this function) 593 | syscall(__NR_getrlimit); | ^~~~~~~~~~~~~~ tools/testing/selftests/bpf/prog_tests/user_ringbuf.c:593:17: note: each undeclared identifier is reported only once for each function it appears in make: *** [Makefile:573: tools/testing/selftests/bpf/user_ringbuf.test.o] Error 1 make: Leaving directory 'tools/testing/selftests/bpf' Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds23-406/+1453
Pull kvm updates from Paolo Bonzini: "ARM: - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company) - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Sort out confusion between virtual and physical addresses, which currently are the same on s390 - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecurally legal but unlikely to happen in practice - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits) KVM: SVM: hyper-v: placate modpost section mismatch error KVM: x86/mmu: Make tdp_mmu_allowed static KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table ...
2023-02-25Merge tag 'powerpc-6.3-1' of ↵Linus Torvalds15-303/+435
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Support for configuring secure boot with user-defined keys on PowerVM LPARs - Simplify the replay of soft-masked IRQs by making it non-recursive - Add support for KCSAN on 64-bit Book3S - Improvements to the API & code which interacts with RTAS (pseries firmware) - Change 32-bit powermac to assign PCI bus numbers per domain by default - Some improvements to the 32-bit BPF JIT - Various other small features and fixes Thanks to Anders Roxell, Andrew Donnellan, Andrew Jeffery, Benjamin Gray, Christophe Leroy, Frederic Barrat, Ganesh Goudar, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Josh Poimboeuf, Kajol Jain, Laurent Dufour, Mahesh Salgaonkar, Mathieu Desnoyers, Mimi Zohar, Murphy Zhou, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nicholas Piggin, Pali Rohár, Petr Mladek, Rohan McLure, Russell Currey, Sachin Sant, Sathvika Vasireddy, Sourabh Jain, Stefan Berger, Stephen Rothwell, and Sudhakar Kuppusamy. * tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (114 commits) powerpc/pseries: Avoid hcall in plpks_is_available() on non-pseries powerpc: dts: turris1x.dts: Set lower priority for CPLD syscon-reboot powerpc/e500: Add missing prototype for 'relocate_init' powerpc/64: Fix unannotated intra-function call warning powerpc/epapr: Don't use wrteei on non booke powerpc: Pass correct CPU reference to assembler powerpc/mm: Rearrange if-else block to avoid clang warning powerpc/nohash: Fix build with llvm-as powerpc/nohash: Fix build error with binutils >= 2.38 powerpc/pseries: Fix endianness issue when parsing PLPKS secvar flags macintosh: windfarm: Use unsigned type for 1-bit bitfields powerpc/kexec_file: print error string on usable memory property update failure powerpc/machdep: warn when machine_is() used too early powerpc/64: Replace -mcpu=e500mc64 by -mcpu=e5500 powerpc/eeh: Set channel state after notifying the drivers selftests/powerpc: Fix incorrect kernel headers search path powerpc/rtas: arch-wide function token lookup conversions powerpc/rtas: introduce rtas_function_token() API powerpc/pseries/lpar: convert to papr_sysparm API powerpc/pseries/hv-24x7: convert to papr_sysparm API ...
2023-02-25Merge tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds13-26/+600
Pull Compute Express Link (CXL) updates from Dan Williams: "To date Linux has been dependent on platform-firmware to map CXL RAM regions and handle events / errors from devices. With this update we can now parse / update the CXL memory layout, and report events / errors from devices. This is a precursor for the CXL subsystem to handle the end-to-end "RAS" flow for CXL memory. i.e. the flow that for DDR-attached-DRAM is handled by the EDAC driver where it maps system physical address events to a field-replaceable-unit (FRU / endpoint device). In general, CXL has the potential to standardize what has historically been a pile of memory-controller-specific error handling logic. Another change of note is the default policy for handling RAM-backed device-dax instances. Previously the default access mode was "device", mmap(2) a device special file to access memory. The new default is "kmem" where the address range is assigned to the core-mm via add_memory_driver_managed(). This saves typical users from wondering why their platform memory is not visible via free(1) and stuck behind a device-file. At the same time it allows expert users to deploy policy to, for example, get dedicated access to high performance memory, or hide low performance memory from general purpose kernel allocations. This affects not only CXL, but also systems with high-bandwidth-memory that platform-firmware tags with the EFI_MEMORY_SP (special purpose) designation. Summary: - CXL RAM region enumeration: instantiate 'struct cxl_region' objects for platform firmware created memory regions - CXL RAM region provisioning: complement the existing PMEM region creation support with RAM region support - "Soft Reservation" policy change: Online (memory hot-add) soft-reserved memory (EFI_MEMORY_SP) by default, but still allow for setting aside such memory for dedicated access via device-dax. - CXL Events and Interrupts: Takeover CXL event handling from platform-firmware (ACPI calls this CXL Memory Error Reporting) and export CXL Events via Linux Trace Events. - Convey CXL _OSC results to drivers: Similar to PCI, let the CXL subsystem interrogate the result of CXL _OSC negotiation. - Emulate CXL DVSEC Range Registers as "decoders": Allow for first-generation devices that pre-date the definition of the CXL HDM Decoder Capability to translate the CXL DVSEC Range Registers into 'struct cxl_decoder' objects. - Set timestamp: Per spec, set the device timestamp in case of hotplug, or if platform-firwmare failed to set it. - General fixups: linux-next build issues, non-urgent fixes for pre-production hardware, unit test fixes, spelling and debug message improvements" * tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (66 commits) dax/kmem: Fix leak of memory-hotplug resources cxl/mem: Add kdoc param for event log driver state cxl/trace: Add serial number to trace points cxl/trace: Add host output to trace points cxl/trace: Standardize device information output cxl/pci: Remove locked check for dvsec_range_allowed() cxl/hdm: Add emulation when HDM decoders are not committed cxl/hdm: Create emulated cxl_hdm for devices that do not have HDM decoders cxl/hdm: Emulate HDM decoder from DVSEC range registers cxl/pci: Refactor cxl_hdm_decode_init() cxl/port: Export cxl_dvsec_rr_decode() to cxl_port cxl/pci: Break out range register decoding from cxl_hdm_decode_init() cxl: add RAS status unmasking for CXL cxl: remove unnecessary calling of pci_enable_pcie_error_reporting() dax/hmem: build hmem device support as module if possible dax: cxl: add CXL_REGION dependency cxl: avoid returning uninitialized error code cxl/pmem: Fix nvdimm registration races cxl/mem: Fix UAPI command comment cxl/uapi: Tag commands from cxl_query_cmd() ...
2023-02-24selftests/bpf: run mptcp in a dedicated netnsHangbin Liu1-2/+17
The current mptcp test is run in init netns. If the user or default system config disabled mptcp, the test will fail. Let's run the mptcp test in a dedicated netns to avoid none kernel default mptcp setting. Suggested-by: Martin KaFai Lau <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Matthieu Baerts <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-02-24selftests/bpf: move SYS() macro into the test_progs.hHangbin Liu11-263/+193
A lot of tests defined SYS() macro to run system calls with goto label. Let's move this macro to test_progs.h and add configurable "goto_label" as the first arg. Suggested-by: Martin KaFai Lau <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-02-24Merge tag 'for-linus-iommufd' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Some polishing and small fixes for iommufd: - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem - Use GFP_KERNEL_ACCOUNT inside the iommu_domains - Support VFIO_NOIOMMU mode with iommufd - Various typos - A list corruption bug if HWPTs are used for attach" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Do not add the same hwpt to the ioas->hwpt_list twice iommufd: Make sure to zero vfio_iommu_type1_info before copying to user vfio: Support VFIO_NOIOMMU with iommufd iommufd: Add three missing structures in ucmd_buffer selftests: iommu: Fix test_cmd_destroy_access() call in user_copy iommu: Remove IOMMU_CAP_INTR_REMAP irq/s390: Add arch_is_isolated_msi() for s390 iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code iommufd: Convert to msi_device_has_isolated_msi() vfio/type1: Convert to iommu_group_has_isolated_msi() iommu: Add iommu_group_has_isolated_msi() genirq/msi: Add msi_device_has_isolated_msi()
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds60-58/+933
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-23Merge tag 'probes-v6.3' of ↵Linus Torvalds3-2/+51
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull kprobes updates from Masami Hiramatsu: - Skip negative return code check for snprintf in eprobe - Add recursive call test cases for kprobe unit test - Add 'char' type to probe events to show it as the character instead of value - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols - Fix kselftest to check filter on eprobe event correctly - Add filter on eprobe to the README file in tracefs - Fix optprobes to check whether there is 'under unoptimizing' optprobe when optimizing another kprobe correctly - Fix optprobe to check whether there is 'under unoptimizing' optprobe when fetching the original instruction correctly - Fix optprobe to free 'forcibly unoptimized' optprobe correctly * tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/eprobe: no need to check for negative ret value for snprintf test_kprobes: Add recursed kprobe test case tracing/probe: add a char type to show the character value of traced arguments selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols selftests/ftrace: Fix eprobe syntax test case to check filter support tracing/eprobe: Fix to add filter on eprobe description in README file x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range x86/kprobes: Fix __recover_optprobed_insn check optimizing logic kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
2023-02-23Merge tag 'trace-v6.3' of ↵Linus Torvalds3-0/+88
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Add function names as a way to filter function addresses - Add sample module to test ftrace ops and dynamic trampolines - Allow stack traces to be passed from beginning event to end event for synthetic events. This will allow seeing the stack trace of when a task is scheduled out and recorded when it gets scheduled back in. - Add trace event helper __get_buf() to use as a temporary buffer when printing out trace event output. - Add kernel command line to create trace instances on boot up. - Add enabling of events to instances created at boot up. - Add trace_array_puts() to write into instances. - Allow boot instances to take a snapshot at the end of boot up. - Allow live patch modules to include trace events - Minor fixes and clean ups * tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits) tracing: Remove unnecessary NULL assignment tracepoint: Allow livepatch module add trace event tracing: Always use canonical ftrace path tracing/histogram: Fix stacktrace histogram Documententation tracing/histogram: Fix stacktrace key tracing/histogram: Fix a few problems with stacktrace variable printing tracing: Add BUILD_BUG() to make sure stacktrace fits in strings tracing/histogram: Don't use strlen to find length of stacktrace variables tracing: Allow boot instances to have snapshot buffers tracing: Add trace_array_puts() to write into instance tracing: Add enabling of events to boot instances tracing: Add creation of instances at boot command line tracing: Fix trace_event_raw_event_synth() if else statement samples: ftrace: Make some global variables static ftrace: sample: avoid open-coded 64-bit division samples: ftrace: Include the nospec-branch.h only for x86 tracing: Acquire buffer from temparary trace sequence tracing/histogram: Wrap remaining shell snippets in code blocks tracing/osnoise: No need for schedule_hrtimeout range bpf/tracing: Use stage6 of tracing to not duplicate macros ...
2023-02-23Merge tag 'ktest-v6.3' of ↵Linus Torvalds2-10/+31
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest Pull ktest updates from Steven Rostedt: - Fix three instances that the tty is not given back to the console on exit. Forcing the user to do a "reset" to get the console back. - Fix the console monitor to not hang when too much data is given by the ssh output. * tag 'ktest-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: ktest: Restore stty setting at first in dodie ktest.pl: Add RUN_TIMEOUT option with default unlimited ktest.pl: Give back console on Ctrt^C on monitor ktest.pl: Fix missing "end_monitor" when machine check fails
2023-02-23Merge tag 'linux-kselftest-kunit-6.3-rc1' of ↵Linus Torvalds1-85/+101
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit update from Shuah Khan: - add Function Redirection API to isolate the code being tested from other parts of the kernel. Documentation/dev-tools/kunit/api/functionredirection.rst has the details. * tag 'linux-kselftest-kunit-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Add printf attribute to fail_current_test_impl lib/hashtable_test.c: add test for the hashtable structure Documentation: Add Function Redirection API docs kunit: Expose 'static stub' API to redirect functions kunit: Add "hooks" to call into KUnit when it's built as a module kunit: kunit.py extract handlers tools/testing/kunit/kunit.py: remove redundant double check
2023-02-23Merge tag 'linux-kselftest-next-6.3-rc1' of ↵Linus Torvalds43-67/+201
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest update from Shuah Khan: - several patches to fix incorrect kernel headers search path from Mathieu Desnoyers - a few follow-on fixes found during testing the above change - miscellaneous fixes - support for filtering and enumerating tests * tag 'linux-kselftest-next-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (40 commits) selftests/user_events: add a note about user_events.h dependency selftests/mount_setattr: fix to make run_tests failure selftests/mount_setattr: fix redefine struct mount_attr build error selftests/sched: fix warn_unused_result build warns selftests/ptp: Remove clean target from Makefile selftests: use printf instead of echo -ne selftests/ftrace: Fix bash specific "==" operator selftests: tpm2: remove redundant ord() selftests: find echo binary to use -ne options selftests: Fix spelling mistake "allright" -> "all right" selftests: tdx: Use installed kernel headers search path selftests: ptrace: Use installed kernel headers search path selftests: memfd: Use installed kernel headers search path selftests: iommu: Use installed kernel headers search path selftests: x86: Fix incorrect kernel headers search path selftests: vm: Fix incorrect kernel headers search path selftests: user_events: Fix incorrect kernel headers search path selftests: sync: Fix incorrect kernel headers search path selftests: seccomp: Fix incorrect kernel headers search path selftests: sched: Fix incorrect kernel headers search path ...
2023-02-23Merge tag 'nolibc.2023.02.06a' of ↵Linus Torvalds4-2/+53
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc updates from Paul McKenney: - Add s390 support - Add support for the ARM Thumb1 instruction set - Fix O_* flags definitions for open() and fcntl() - Make errno a weak symbol instead of a static variable - Export environ as a weak symbol - Export _auxv as a weak symbol for auxilliary vector retrieval - Implement getauxval() and getpagesize() - Further improve self tests, including permitting userland testing of the nolibc library * tag 'nolibc.2023.02.06a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (28 commits) selftests/nolibc: Add a "run-user" target to test the program in user land selftests/nolibc: Support "x86_64" for arch name selftests/nolibc: Add `getpagesize(2)` selftest nolibc/sys: Implement `getpagesize(2)` function nolibc/stdlib: Implement `getauxval(3)` function tools/nolibc: add auxiliary vector retrieval for s390 tools/nolibc: add auxiliary vector retrieval for mips tools/nolibc: add auxiliary vector retrieval for riscv tools/nolibc: add auxiliary vector retrieval for arm tools/nolibc: add auxiliary vector retrieval for arm64 tools/nolibc: add auxiliary vector retrieval for x86_64 tools/nolibc: add auxiliary vector retrieval for i386 tools/nolibc: export environ as a weak symbol on s390 tools/nolibc: export environ as a weak symbol on riscv tools/nolibc: export environ as a weak symbol on mips tools/nolibc: export environ as a weak symbol on arm tools/nolibc: export environ as a weak symbol on arm64 tools/nolibc: export environ as a weak symbol on i386 tools/nolibc: export environ as a weak symbol on x86_64 tools/nolibc: make errno a weak symbol instead of a static one ...
2023-02-23selftests/bpf: Add a test case for bpf_cgroup_from_id()Tejun Heo3-0/+44
Add a test case for bpf_cgroup_from_id. Signed-off-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/Y/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-23selftests: fib_tests: Add test cases for IPv4/IPv6 in route notifyLu Wei1-1/+95
Add tests to check whether the total fib info length is calculated corretly in route notify process. Signed-off-by: Lu Wei <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski1-6/+26
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix broken listing of set elements when table has an owner. 2) Fix conntrack refcount leak in ctnetlink with related conntrack entries, from Hangyu Hua. 3) Fix use-after-free/double-free in ctnetlink conntrack insert path, from Florian Westphal. 4) Fix ip6t_rpfilter with VRF, from Phil Sutter. 5) Fix use-after-free in ebtables reported by syzbot, also from Florian. 6) Use skb->len in xt_length to deal with IPv6 jumbo packets, from Xin Long. 7) Fix NETLINK_LISTEN_ALL_NSID with ctnetlink, from Florian Westphal. 8) Fix memleak in {ip_,ip6_,arp_}tables in ENOMEM error case, from Pavel Tikhomirov. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: x_tables: fix percpu counter block leak on error path when creating new netns netfilter: ctnetlink: make event listener tracking global netfilter: xt_length: use skb len to match in length_mt6 netfilter: ebtables: fix table blob use-after-free netfilter: ip6t_rpfilter: Fix regression with VRF interfaces netfilter: conntrack: fix rmmod double-free race netfilter: ctnetlink: fix possible refcount leak in ctnetlink_create_conntrack() netfilter: nf_tables: allow to fetch set elements when table has an owner ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-22selftests/bpf: Fix BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL for empty flow labelStanislav Fomichev2-1/+25
Kernel's flow dissector continues to parse the packet when the (optional) IPv6 flow label is empty even when instructed to stop (via BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL). Do the same in our reference BPF reimplementation. Signed-off-by: Stanislav Fomichev <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-22Merge tag 'landlock-6.3-rc1' of ↵Linus Torvalds2-17/+143
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock updates from Mickaël Salaün: "This improves documentation, and makes some tests more flexible to be able to run on systems without overlayfs or with Yama restrictions" * tag 'landlock-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: MAINTAINERS: Update Landlock repository selftests/landlock: Test ptrace as much as possible with Yama selftests/landlock: Skip overlayfs tests when not supported landlock: Explain file descriptor access rights
2023-02-22selftests/bpf: Tests for uninitialized stack readsEduard Zingerman2-0/+96
Three testcases to make sure that stack reads from uninitialized locations are accepted by verifier when executed in privileged mode: - read from a fixed offset; - read from a variable offset; - passing a pointer to stack to a helper converts STACK_INVALID to STACK_MISC. Signed-off-by: Eduard Zingerman <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-22bpf: Allow reads from uninit stackEduard Zingerman8-135/+98
This commits updates the following functions to allow reads from uninitialized stack locations when env->allow_uninit_stack option is enabled: - check_stack_read_fixed_off() - check_stack_range_initialized(), called from: - check_stack_read_var_off() - check_helper_mem_access() Such change allows to relax logic in stacksafe() to treat STACK_MISC and STACK_INVALID in a same way and make the following stack slot configurations equivalent: | Cached state | Current state | | stack slot | stack slot | |------------------+------------------| | STACK_INVALID or | STACK_INVALID or | | STACK_MISC | STACK_SPILL or | | | STACK_MISC or | | | STACK_ZERO or | | | STACK_DYNPTR | This leads to significant verification speed gains (see below). The idea was suggested by Andrii Nakryiko [1] and initial patch was created by Alexei Starovoitov [2]. Currently the env->allow_uninit_stack is allowed for programs loaded by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities. A number of test cases from verifier/*.c were expecting uninitialized stack access to be an error. These test cases were updated to execute in unprivileged mode (thus preserving the tests). The test progs/test_global_func10.c expected "invalid indirect read from stack" error message because of the access to uninitialized memory region. This error is no longer possible in privileged mode. The test is updated to provoke an error "invalid indirect access to stack" because of access to invalid stack address (such error is not verified by progs/test_global_func*.c series of tests). The following tests had to be removed because these can't be made unprivileged: - verifier/sock.c: - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init stack_value" BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode. - verifier/var_off.c: - "indirect variable-offset stack access, max_off+size > max_initialized" - "indirect variable-offset stack access, uninitialized" These tests verify that access to uninitialized stack values is detected when stack offset is not a constant. However, variable stack access is prohibited in unprivileged mode, thus these tests are no longer valid. * * * Here is veristat log comparing this patch with current master on a set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg and cilium BPF binaries (see [3]): $ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log File Program States (A) States (B) States (DIFF) -------------------------- -------------------------- ---------- ---------- ---------------- bpf_host.o tail_handle_ipv6_from_host 349 244 -105 (-30.09%) bpf_host.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_lxc.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_sock.o cil_sock4_connect 70 48 -22 (-31.43%) bpf_sock.o cil_sock4_sendmsg 68 46 -22 (-32.35%) bpf_xdp.o tail_handle_nat_fwd_ipv4 1554 803 -751 (-48.33%) bpf_xdp.o tail_lb_ipv4 6457 2473 -3984 (-61.70%) bpf_xdp.o tail_lb_ipv6 7249 3908 -3341 (-46.09%) pyperf600_bpf_loop.bpf.o on_event 287 145 -142 (-49.48%) strobemeta.bpf.o on_event 15915 4772 -11143 (-70.02%) strobemeta_nounroll2.bpf.o on_event 17087 3820 -13267 (-77.64%) xdp_synproxy_kern.bpf.o syncookie_tc 21271 6635 -14636 (-68.81%) xdp_synproxy_kern.bpf.o syncookie_xdp 23122 6024 -17098 (-73.95%) -------------------------- -------------------------- ---------- ---------- ---------------- Note: I limited selection by states_pct<-30%. Inspection of differences in pyperf600_bpf_loop behavior shows that the following patch for the test removes almost all differences: - a/tools/testing/selftests/bpf/progs/pyperf.h + b/tools/testing/selftests/bpf/progs/pyperf.h @ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args *ctx) } if (event->pthread_match || !pidData->use_tls) { - void* frame_ptr; - FrameData frame; + void* frame_ptr = 0; + FrameData frame = {}; Symbol sym = {}; int cur_cpu = bpf_get_smp_processor_id(); W/o this patch the difference comes from the following pattern (for different variables): static bool get_frame_data(... FrameData *frame ...) { ... bpf_probe_read_user(&frame->f_code, ...); if (!frame->f_code) return false; ... bpf_probe_read_user(&frame->co_name, ...); if (frame->co_name) ...; } int __on_event(struct bpf_raw_tracepoint_args *ctx) { FrameData frame; ... get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback ... } SEC("raw_tracepoint/kfree_skb") int on_event(struct bpf_raw_tracepoint_args* ctx) { ... ret |= __on_event(ctx); ret |= __on_event(ctx); ... } With regards to value `frame->co_name` the following is important: - Because of the conditional `if (!frame->f_code)` each call to __on_event() produces two states, one with `frame->co_name` marked as STACK_MISC, another with it as is (and marked STACK_INVALID on a first call). - The call to bpf_probe_read_user() does not mark stack slots corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks these slots as BPF_MISC, this happens because of the following loop in the check_helper_call(): for (i = 0; i < meta.access_size; i++) { err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B, BPF_WRITE, -1, false); if (err) return err; } Note the size of the write, it is a one byte write for each byte touched by a helper. The BPF_B write does not lead to write marks for the target stack slot. - Which means that w/o this patch when second __on_event() call is verified `if (frame->co_name)` will propagate read marks first to a stack slot with STACK_MISC marks and second to a stack slot with STACK_INVALID marks and these states would be considered different. [1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@mail.gmail.com/ [2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/ [3] [email protected]:anakryiko/cilium.git Suggested-by: Andrii Nakryiko <[email protected]> Co-developed-by: Alexei Starovoitov <[email protected]> Signed-off-by: Eduard Zingerman <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-22Merge tag 'for-linus-2023022201' of ↵Linus Torvalds10-0/+1886
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID updates from Benjamin Tissoires: - HID-BPF infrastructure: this allows to start using HID-BPF. Note that the mechanism to ship HID-BPF program through the kernel tree is still not implemented yet (but is planned). This should be a no-op for 99% of users. Also we are gaining kselftests for the HID tree (Benjamin Tissoires) - Some UAF fixes in workers when using uhid (Pietro Borrello & Benjamin Tissoires) - Constify hid_ll_driver (Thomas Weißschuh) - Allow more custom IIO sensors through HID (Philipp Jungkamp) - Logitech HID++ fixes for scroll wheel, protocol and debug (Bastien Nocera) - Some new device support: Steam Deck (Vicki Pfau), UClogic (José Expósito), Logitech G923 Xbox Edition steering wheel (Walt Holman), EVision keyboards (Philippe Valembois) - other assorted code cleanups and fixes * tag 'for-linus-2023022201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (99 commits) HID: mcp-2221: prevent UAF in delayed work hid: bigben_probe(): validate report count HID: asus: use spinlock to safely schedule workers HID: asus: use spinlock to protect concurrent accesses HID: bigben: use spinlock to safely schedule workers HID: bigben_worker() remove unneeded check on report_field HID: bigben: use spinlock to protect concurrent accesses HID: logitech-hidpp: Add myself to authors HID: logitech-hidpp: Retry commands when device is busy HID: logitech-hidpp: Add more debug statements HID: Add support for Logitech G923 Xbox Edition steering wheel HID: logitech-hidpp: Add Signature M650 HID: logitech-hidpp: Remove HIDPP_QUIRK_NO_HIDINPUT quirk HID: logitech-hidpp: Don't restart communication if not necessary HID: logitech-hidpp: Add constants for HID++ 2.0 error codes Revert "HID: logitech-hidpp: add a module parameter to keep firmware gestures" HID: logitech-hidpp: Hard-code HID++ 1.0 fast scroll support HID: i2c-hid: goodix: Add mainboard-vddio-supply dt-bindings: HID: i2c-hid: goodix: Add mainboard-vddio-supply HID: i2c-hid: goodix: Stop tying the reset line to the regulator ...
2023-02-22Merge tag 'sound-6.3-rc1' of ↵Linus Torvalds6-92/+316
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "The majority of works in this cycle are about ASoC spread over trees. Most of them are for new devices and cleanups / refactoring works, and not much significant changes are seen in the core side. Below are some highlights: ASoC: - Continued refactoring to move into common helper functions - Lots of DT schema conversons and stylistic nits - Continued work on building out the new SOF IPC4 scheme - Continued work for Intel AVS - New drivers for Awinc AT88395, Infineon PEB2466, Iron Device SMA1303, Mediatek MT8188, Realtek RT712, Renesas IDT821034, Samsung/Tesla FSD SoC I2S, and TI TAS5720A-Q1 ALSA: - A few cleanups to make the remove callbacks to void returns - FireWire refactoring and enhancements - PCM kselftest enhancements" * tag 'sound-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (398 commits) ALSA: hda/hdmi: Register with vga_switcheroo on Dual GPU Macbooks ASoC: soc-ac97: Return correct error codes ASoC: soc-dapm.h: fixup warning struct snd_pcm_substream not declared ASoC: cs35l45: Remove separate namespace for tables ASoC: cs35l45: Remove separate tables module ASoC: soc-ac97: Convert to agnostic GPIO API ASoC: dt-bindings: renesas,rsnd.yaml: drop "dmas/dma-names" from "rcar_sound,ssi" ALSA: hda: cs35l41: Enable Amp High Pass Filter ALSA: hda: cs35l41: Ensure firmware/tuning pairs are always loaded ALSA: hda: cs35l41: Correct error condition handling ASoC: codecs: wcd934x: Use min macro for comparison and assignment ASoC: Intel: Skylake: Fix struct definition ASoC: tlv320adcx140: extend list of supported samplerates ASoC: imx-pcm-rpmsg: Remove unused variable SoC: rt5682s: Disable jack detection interrupt during suspend ASoC: SOF: Intel: hda-dsp: Set streaming flag for d0i3 ASoC: SOF: Intel: Enable d0i3 work for ipc4 ASoC: SOF: ipc4: Wake up dsp core before sending ipc msg ASoC: SOF: Intel: hda-dsp: use set_pm_gate according to ipc version ASoC: SOF: Introduce a new set_pm_gate() IPC PM op ...
2023-02-22Merge branch 'for-6.3/hid-bpf' into for-linusBenjamin Tissoires10-0/+1886
Initial support of HID-BPF (Benjamin Tissoires) The history is a little long for this series, as it was intended to be sent for v6.2. However some last minute issues forced us to postpone it to v6.3. Conflicts: * drivers/hid/i2c-hid/Kconfig: commit bf7660dab30d ("HID: stop drivers from selecting CONFIG_HID") conflicts with commit 2afac81dd165 ("HID: fix I2C_HID not selected when I2C_HID_OF_ELAN is") the resolution is simple enough: just drop the "default" and "select" lines as the new commit from Arnd is doing
2023-02-21Merge tag 'net-next-6.3' of ↵Linus Torvalds218-3023/+11299
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
2023-02-21Merge tag 'kvm-x86-apic-6.3' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-0/+55
KVM x86 APIC changes for 6.3: - Remove a superfluous variables from apic_get_tmcct() - Fix various edge cases in x2APIC MSR emulation - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Reset xAPIC when userspace forces "impossible" x2APIC => xAPIC transition
2023-02-21Merge tag 'arm64-upstream' of ↵Linus Torvalds31-111/+1535
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit architectural register (ZT0, for the look-up table feature) that Linux needs to save/restore - Include TPIDR2 in the signal context and add the corresponding kselftests - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG (ARM CMN) at probe time - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64 - Permit EFI boot with MMU and caches on. Instead of cleaning the entire loaded kernel image to the PoC and disabling the MMU and caches before branching to the kernel bare metal entry point, leave the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of all of system RAM to populate the initial page tables - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64 kernel (the arm32 kernel only defines the values) - Harden the arm64 shadow call stack pointer handling: stash the shadow stack pointer in the task struct on interrupt, load it directly from this structure - Signal handling cleanups to remove redundant validation of size information and avoid reading the same data from userspace twice - Refactor the hwcap macros to make use of the automatically generated ID registers. It should make new hwcaps writing less error prone - Further arm64 sysreg conversion and some fixes - arm64 kselftest fixes and improvements - Pointer authentication cleanups: don't sign leaf functions, unify asm-arch manipulation - Pseudo-NMI code generation optimisations - Minor fixes for SME and TPIDR2 handling - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable, replace strtobool() to kstrtobool() in the cpufeature.c code, apply dynamic shadow call stack in two passes, intercept pfn changes in set_pte_at() without the required break-before-make sequence, attempt to dump all instructions on unhandled kernel faults * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits) arm64: fix .idmap.text assertion for large kernels kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests kselftest/arm64: Copy whole EXTRA context arm64: kprobes: Drop ID map text from kprobes blacklist perf: arm_spe: Print the version of SPE detected perf: arm_spe: Add support for SPEv1.2 inverted event filtering perf: Add perf_event_attr::config3 arm64/sme: Fix __finalise_el2 SMEver check drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable arm64/signal: Only read new data when parsing the ZT context arm64/signal: Only read new data when parsing the ZA context arm64/signal: Only read new data when parsing the SVE context arm64/signal: Avoid rereading context frame sizes arm64/signal: Make interface for restore_fpsimd_context() consistent arm64/signal: Remove redundant size validation from parse_user_sigframe() arm64/signal: Don't redundantly verify FPSIMD magic arm64/cpufeature: Use helper macros to specify hwcaps arm64/cpufeature: Always use symbolic name for feature value in hwcaps arm64/sysreg: Initial unsigned annotations for ID registers arm64/sysreg: Initial annotation of signed ID registers ...
2023-02-22netfilter: ip6t_rpfilter: Fix regression with VRF interfacesPhil Sutter1-6/+26
When calling ip6_route_lookup() for the packet arriving on the VRF interface, the result is always the real (slave) interface. Expect this when validating the result. Fixes: acc641ab95b66 ("netfilter: rpfilter/fib: Populate flowic_l3mdev field") Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-02-21Merge tag 'm68k-for-v6.3-tag1' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - Add seccomp support - defconfig updates - Miscellaneous fixes and improvements * tag 'm68k-for-v6.3-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: /proc/hardware should depend on PROC_FS selftests/seccomp: Add m68k support m68k: Add kernel seccomp support m68k: Check syscall_trace_enter() return code m68k: defconfig: Update defconfigs for v6.2-rc3 m68k: q40: Do not initialise statics to 0
2023-02-21Merge tag 'rcu.2023.02.10a' of ↵Linus Torvalds6-17/+16
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes, perhaps most notably: - Throttling callback invocation based on the number of callbacks that are now ready to invoke instead of on the total number of callbacks - Several patches that suppress false-positive boot-time diagnostics, for example, due to lockdep not yet being initialized - Make expedited RCU CPU stall warnings dump stacks of any tasks that are blocking the stalled grace period. (Normal RCU CPU stall warnings have done this for many years) - Lazy-callback fixes to avoid delays during boot, suspend, and resume. (Note that lazy callbacks must be explicitly enabled, so this should not (yet) affect production use cases) - Make kfree_rcu() and friends take advantage of polled grace periods, thus reducing memory footprint by almost two orders of magnitude, admittedly on a microbenchmark This also begins the transition from kfree_rcu(p) to kfree_rcu_mightsleep(p). This transition was motivated by bugs where kfree_rcu(p), which can block, was typed instead of the intended kfree_rcu(p, rh) - SRCU updates, perhaps most notably fixing a bug that causes SRCU to fail when booted on a system with a non-zero boot CPU. This surprising situation actually happens for kdump kernels on the powerpc architecture This also adds an srcu_down_read() and srcu_up_read(), which act like srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side critical section to be handed off from one task to another - Clean up the now-useless SRCU Kconfig option There are a few more commits that are not yet acked or pulled into maintainer trees, and these will be in a pull request for a later merge window - RCU-tasks updates, perhaps most notably these fixes: - A strange interaction between PID-namespace unshare and the RCU-tasks grace period that results in a low-probability but very real hang - A race between an RCU tasks rude grace period on a single-CPU system and CPU-hotplug addition of the second CPU that can result in a too-short grace period - A race between shrinking RCU tasks down to a single callback list and queuing a new callback to some other CPU, but where that queuing is delayed for more than an RCU grace period. This can result in that callback being stranded on the non-boot CPU - Torture-test updates and fixes - Torture-test scripting updates and fixes - Provide additional RCU CPU stall-warning information in kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute timeout limit for expedited RCU CPU stall warnings * tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() kernel/notifier: Remove CONFIG_SRCU init: Remove "select SRCU" fs/quota: Remove "select SRCU" fs/notify: Remove "select SRCU" fs/btrfs: Remove "select SRCU" fs: Remove CONFIG_SRCU drivers/pci/controller: Remove "select SRCU" drivers/net: Remove "select SRCU" drivers/md: Remove "select SRCU" drivers/hwtracing/stm: Remove "select SRCU" drivers/dax: Remove "select SRCU" drivers/base: Remove CONFIG_SRCU rcu: Disable laziness if lazy-tracking says so rcu: Track laziness during boot and suspend rcu: Remove redundant call to rcu_boost_kthread_setaffinity() rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts rcu: Align the output of RCU CPU stall warning messages rcu: Add RCU stall diagnosis information sched: Add helper nr_context_switches_cpu() ...
2023-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-4/+22
Per-next-PR merge. net/smc/af_smc.c b5dd4d698171 ("net/smc: llc_conf_mutex refactor, replace it with rw_semaphore") e40b801b3603 ("net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>