aboutsummaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2023-06-05bpf: Make bpf_refcount_acquire fallible for non-owning refsDave Marchevsky2-1/+5
This patch fixes an incorrect assumption made in the original bpf_refcount series [0], specifically that the BPF program calling bpf_refcount_acquire on some node can always guarantee that the node is alive. In that series, the patch adding failure behavior to rbtree_add and list_push_{front, back} breaks this assumption for non-owning references. Consider the following program: n = bpf_kptr_xchg(&mapval, NULL); /* skip error checking */ bpf_spin_lock(&l); if(bpf_rbtree_add(&t, &n->rb, less)) { bpf_refcount_acquire(n); /* Failed to add, do something else with the node */ } bpf_spin_unlock(&l); It's incorrect to assume that bpf_refcount_acquire will always succeed in this scenario. bpf_refcount_acquire is being called in a critical section here, but the lock being held is associated with rbtree t, which isn't necessarily the lock associated with the tree that the node is already in. So after bpf_rbtree_add fails to add the node and calls bpf_obj_drop in it, the program has no ownership of the node's lifetime. Therefore the node's refcount can be decr'd to 0 at any time after the failing rbtree_add. If this happens before the refcount_acquire above, the node might be free'd, and regardless refcount_acquire will be incrementing a 0 refcount. Later patches in the series exercise this scenario, resulting in the expected complaint from the kernel (without this patch's changes): refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 207 at lib/refcount.c:25 refcount_warn_saturate+0xbc/0x110 Modules linked in: bpf_testmod(O) CPU: 1 PID: 207 Comm: test_progs Tainted: G O 6.3.0-rc7-02231-g723de1a718a2-dirty #371 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0xbc/0x110 Code: 6f 64 f6 02 01 e8 84 a3 5c ff 0f 0b eb 9d 80 3d 5e 64 f6 02 00 75 94 48 c7 c7 e0 13 d2 82 c6 05 4e 64 f6 02 01 e8 64 a3 5c ff <0f> 0b e9 7a ff ff ff 80 3d 38 64 f6 02 00 0f 85 6d ff ff ff 48 c7 RSP: 0018:ffff88810b9179b0 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: 0000000000000202 RSI: 0000000000000008 RDI: ffffffff857c3680 RBP: ffff88810027d3c0 R08: ffffffff8125f2a4 R09: ffff88810b9176e7 R10: ffffed1021722edc R11: 746e756f63666572 R12: ffff88810027d388 R13: ffff88810027d3c0 R14: ffffc900005fe030 R15: ffffc900005fe048 FS: 00007fee0584a700(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005634a96f6c58 CR3: 0000000108ce9002 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> bpf_refcount_acquire_impl+0xb5/0xc0 (rest of output snipped) The patch addresses this by changing bpf_refcount_acquire_impl to use refcount_inc_not_zero instead of refcount_inc and marking bpf_refcount_acquire KF_RET_NULL. For owning references, though, we know the above scenario is not possible and thus that bpf_refcount_acquire will always succeed. Some verifier bookkeeping is added to track "is input owning ref?" for bpf_refcount_acquire calls and return false from is_kfunc_ret_null for bpf_refcount_acquire on owning refs despite it being marked KF_RET_NULL. Existing selftests using bpf_refcount_acquire are modified where necessary to NULL-check its return value. [0]: https://lore.kernel.org/bpf/[email protected]/ Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail") Reported-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Dave Marchevsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-06-05perf stat: Document --metric-no-threshold and threshold colorsIan Rogers1-0/+15
Document the threshold behavior for -M/--metrics. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ahmad Yasin <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Eduard Zingerman <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-05perf expr: Make the evaluation of & and | logical and lazyIan Rogers2-17/+109
Currently the & and | operators are only used in metric thresholds like (from the tma_retiring metric): tma_retiring > 0.7 | tma_heavy_operations > 0.1 Thresholds are always computed when present, but a lack of events may mean the threshold can't be computed. This happens with the option --metric-no-threshold for say the metric tma_retiring on Tigerlake model CPUs. To fully compute the threshold tma_heavy_operations is needed and it needs the extra events of IDQ.MS_UOPS, UOPS_DECODED.DEC0, cpu/UOPS_DECODED.DEC0,cmask=1/ and IDQ.MITE_UOPS. So --metric-no-threshold is a useful option to reduce the number of events needed and potentially multiplexing of events. Rather than just fail threshold computations like this, we may know a result from just the left or right-hand side. So, for tma_retiring if its value is "> 0.7" we know it is over the threshold. This allows the metric to have the threshold coloring, when possible, without all the counters being programmed. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Kan Liang <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ahmad Yasin <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Eduard Zingerman <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-05kselftest/arm64: add MOPS to hwcap testKristina Martsenko1-0/+22
Add the MOPS hwcap to the hwcap kselftest and check that a SIGILL is not generated when the feature is detected. A SIGILL is reliable when the feature is not detected as SCTLR_EL1.MSCEn won't have been set. Signed-off-by: Kristina Martsenko <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2023-06-05perf LoongArch: Simplify mksyscalltblTiezhu Yang1-27/+11
In order to print the numerical entries of the syscall table, there is no need to call the host compiler to build and then run a program, this can be done directly by the shell script. This is similar with commit 9854e7ad35fe ("perf arm64: Simplify mksyscalltbl"). For now, the mksyscalltbl file of LoongArch is almost same with arm64. Reviewed-by: Huacai Chen <[email protected]> Reviewed-by: Leo Yan <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-05perf arm64: Use max_nr to define SYSCALLTBL_ARM64_MAX_IDTiezhu Yang1-3/+3
Like x86, powerpc, mips and s390, use max_nr which is a digital number to define SYSCALLTBL_ARM64_MAX_ID. Reviewed-by: Huacai Chen <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-05perf arm64: Handle __NR3264_ prefixed syscall numberTiezhu Yang1-2/+3
After commit 9854e7ad35fe ("perf arm64: Simplify mksyscalltbl"), in the generated syscall table file syscalls.c, there exist some __NR3264_ prefixed syscall numbers such as [__NR3264_ftruncate], it looks like not so good, just do some small filter operations to handle __NR3264_ prefixed syscall number as a digital number. Without this patch: [__NR3264_ftruncate] = "ftruncate", With this patch: [46] = "ftruncate", Suggested-by: Alexander Kapshuk <[email protected]> Reviewed-by: Huacai Chen <[email protected]> Reviewed-by: Leo Yan <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-05perf arm64: Rename create_table_from_c() to create_sc_table()Tiezhu Yang1-2/+2
After commit 9854e7ad35fecf30 ("perf arm64: Simplify mksyscalltbl") it has been removed the temporary C program and used shell to generate syscall table, so let us rename create_table_from_c() to create_sc_table() to avoid confusion. Suggested-by: Leo Yan <[email protected]> Reviewed-by: Huacai Chen <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-05perf tools: Declare syscalltbl_*[] as const for all archsTiezhu Yang7-13/+13
syscalltbl_*[] should never be changing, let us declare it as const. Suggested-by: Ian Rogers <[email protected]> Reviewed-by: Huacai Chen <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-05selftests: mptcp: update userspace pm subflow testsGeliang Tang1-1/+2
To align with what is done by the in-kernel PM, update userspace pm subflow selftests, by sending the a remove_addrs command together before the remove_subflows command. This will get a RM_ADDR in chk_rm_nr(). Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE") Fixes: 5e986ec46874 ("selftests: mptcp: userspace pm subflow tests") Link: https://github.com/multipath-tcp/mptcp_net-next/issues/379 Cc: [email protected] Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-05selftests: mptcp: update userspace pm addr testsGeliang Tang1-0/+8
This patch is linked to the previous commit ("mptcp: only send RM_ADDR in nl_cmd_remove"). To align with what is done by the in-kernel PM, update userspace pm addr selftests, by sending a remove_subflows command together after the remove_addrs command. Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE") Fixes: 97040cf9806e ("selftests: mptcp: userspace pm address tests") Cc: [email protected] Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-05perf bench: Add missing setlocale() call to allow usage of %'d style formattingArnaldo Carvalho de Melo1-0/+2
Without this we were not getting the thousands separator for big numbers. Noticed while developing 'perf bench uprobe', but the use of %' predates that, for instance 'perf bench syscall' uses it. Before: # perf bench uprobe all # Running uprobe/baseline benchmark... # Executed 1000 usleep(1000) calls Total time: 1054082243ns 1054082.243000 nsecs/op # After: # perf bench uprobe all # Running uprobe/baseline benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,053,715,144ns 1,053,715.144000 nsecs/op # Fixes: c2a08203052f8975 ("perf bench: Add basic syscall benchmark") Cc: Adrian Hunter <[email protected]> Cc: Andre Fredette <[email protected]> Cc: Clark Williams <[email protected]> Cc: Dave Tucker <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Derek Barbosa <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Tiezhu Yang <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-05selftests: router_bridge_vlan: Set vlan_default_pvid 0 on the bridgePetr Machata1-1/+1
When everything is configured, VLAN membership on the bridge in this selftest are as follows: # bridge vlan show port vlan-id swp2 1 PVID Egress Untagged 555 br1 1 Egress Untagged 555 PVID Egress Untagged Note that it is possible for untagged traffic to just flow through as VLAN 1, instead of using VLAN 555 as intended by the test. This configuration seems too close to "works by accident", and it would be better to just shut out VLAN 1 altogether. To that end, configure vlan_default_pvid of 0: # bridge vlan show port vlan-id swp2 555 br1 555 PVID Egress Untagged Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-05selftests: router_bridge_vlan: Add a diagramPetr Machata1-0/+22
Add a topology diagram to this selftest to make the configuration easier to understand. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-05selftests: mlxsw: egress_vid_classification: Fix the diagramPetr Machata1-3/+2
The topology diagram implies that $swp1 and $swp2 are members of the bridge br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the diagram. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-05selftests: mlxsw: ingress_rif_conf_1d: Fix the diagramPetr Machata1-3/+2
The topology diagram implies that $swp1 and $swp2 are members of the bridge br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the diagram. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-05selftests: alsa: pcm-test: Fix compiler warnings about the formatMirsad Goran Todorovac1-5/+5
GCC 11.3.0 issues warnings in this module about wrong sizes of format specifiers: pcm-test.c: In function ‘test_pcm_time’: pcm-test.c:384:68: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 \ has type ‘unsigned int’ [-Wformat=] 384 | snprintf(msg, sizeof(msg), "rate mismatch %ld != %ld", rate, rrate); pcm-test.c:455:53: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has \ type ‘long int’ [-Wformat=] 455 | "expected %d, wrote %li", rate, frames); pcm-test.c:462:53: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has \ type ‘long int’ [-Wformat=] 462 | "expected %d, wrote %li", rate, frames); pcm-test.c:467:53: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has \ type ‘long int’ [-Wformat=] 467 | "expected %d, wrote %li", rate, frames); Simple fix according to compiler's suggestion removed the warnings. Signed-off-by: Mirsad Goran Todorovac <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-06-05Merge 6.4-rc5 into usb-nextGreg Kroah-Hartman27-50/+262
We need the USB fixes in here are well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-06-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-0/+75
Pull kvm fixes from Paolo Bonzini: "ARM: - Address some fallout of the locking rework, this time affecting the way the vgic is configured - Fix an issue where the page table walker frees a subtree and then proceeds with walking what it has just freed... - Check that a given PA donated to the guest is actually memory (only affecting pKVM) - Correctly handle MTE CMOs by Set/Way - Fix the reported address of a watchpoint forwarded to userspace - Fix the freeing of the root of stage-2 page tables - Stop creating spurious PMU events to perform detection of the default PMU and use the existing PMU list instead x86: - Fix a memslot lookup bug in the NX recovery thread that could theoretically let userspace bypass the NX hugepage mitigation - Fix a s/BLOCKING/PENDING bug in SVM's vNMI support - Account exit stats for fastpath VM-Exits that never leave the super tight run-loop - Fix an out-of-bounds bug in the optimized APIC map code, and add a regression test for the race" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Add test for race in kvm_recalculate_apic_map() KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds KVM: x86: Account fastpath-only VM-Exits in vCPU stats KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker KVM: arm64: Document default vPMU behavior on heterogeneous systems KVM: arm64: Iterate arm_pmus list to probe for default PMU KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed() KVM: arm64: Populate fault info for watchpoint KVM: arm64: Reload PTE after invoking walker callback on preorder traversal KVM: arm64: Handle trap of tagged Set/Way CMOs arm64: Add missing Set/Way CMO encodings KVM: arm64: Prevent unconditional donation of unmapped regions from the host KVM: arm64: vgic: Fix a comment KVM: arm64: vgic: Fix locking comment KVM: arm64: vgic: Wrap vgic_its_create() with config_lock KVM: arm64: vgic: Fix a circular locking issue
2023-06-03Fix gitignore for recently added usptream self testsWeihao Gao1-0/+3
This resolves the issue that generated binary is showing up as an untracked git file after every build on the kernel. Signed-off-by: Weihao Gao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-03Merge tag 'probes-fixes-6.4-rc4' of ↵Linus Torvalds1-18/+27
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Return NULL if the trace_probe list on trace_probe_event is empty - selftests/ftrace: Choose testing symbol name for filtering feature from sample data instead of fixed symbol * tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: selftests/ftrace: Choose target function for filter test from samples tracing/probe: trace_probe_primary_from_call(): checked list_first_entry
2023-06-03selftests/ftrace: Choose target function for filter test from samplesMasami Hiramatsu (Google)1-18/+27
Since the event-filter-function.tc expects the 'exit_mmap()' directly calls 'kmem_cache_free()', this is vulnerable to code modifications. Choose the target function for the filter test from the sample event data so that it can keep test running correctly even if the caller function name will be changed. Link: https://lore.kernel.org/linux-trace-kernel/167919441260.1922645.18355804179347364057.stgit@mhiramat.roam.corp.google.com/ Link: https://lore.kernel.org/all/CA+G9fYtF-XEKi9YNGgR=Kf==7iRb2FrmEC7qtwAeQbfyah-UhA@mail.gmail.com/ Reported-by: Linux Kernel Functional Testing <[email protected]> Fixes: 7f09d639b8c4 ("tracing/selftests: Add test for event filtering on function name") Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]>
2023-06-02tools: ynl-gen: generate static descriptions of notificationsJakub Kicinski1-10/+42
Notifications may come in at any time. The family must be always ready to parse a random incoming notification. Generate notification table for parsing and tell YNL which request we're processing to distinguish responses from notifications. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02tools: ynl-gen: switch to family structJakub Kicinski1-0/+15
We'll want to store static info about the family soon. Generate a struct. This changes creation from, e.g.: ys = ynl_sock_create("netdev", &yerr); to: ys = ynl_sock_create(&ynl_netdev_family, &yerr); on user's side. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02tools: ynl-gen: generate alloc and free helpers for reqJakub Kicinski1-0/+17
We expect user to allocate requests with calloc(), make things a bit more consistent and provide helpers. Generate free calls, too. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02tools: ynl-gen: move the response reading logic into YNLJakub Kicinski1-36/+27
We generate send() and recv() calls and all msg handling for each operation. It's a lot of repeated code and will only grow with notification handling. Call back to a helper YNL lib instead. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02tools: ynl-gen: generate enum-to-string helpersJakub Kicinski1-0/+66
It's sometimes useful to print the name of an enum value, flag or name of the op. Python can do it, add C helper code gen for getting names of things. Example: static const char * const netdev_xdp_act_strmap[] = { [0] = "basic", [1] = "redirect", [2] = "ndo-xmit", [3] = "xsk-zerocopy", [4] = "hw-offload", [5] = "rx-sg", [6] = "ndo-xmit-sg", }; const char *netdev_xdp_act_str(enum netdev_xdp_act value) { value = ffs(value) - 1; if (value < 0 || value >= (int)MNL_ARRAY_SIZE(netdev_xdp_act_strmap)) return NULL; return netdev_xdp_act_strmap[value]; } Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02tools: ynl-gen: add error checking for nested structsJakub Kicinski1-1/+2
Parsing nested types may return an error, propagate it. Not marking as a fix, because nothing uses YNL upstream. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02tools: ynl-gen: loosen type consistency check for eventsJakub Kicinski1-7/+8
Both event and notify types are always consistent. Rewrite the condition checking if we can reuse reply types to be less picky and let notify thru. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02tools: ynl-gen: don't override pure nested structJakub Kicinski1-1/+2
For pure structs (parsed nested attributes) we track what forms of the struct exist in request and reply directions. Make sure we don't overwrite the recorded struct each time, otherwise the information is lost. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02tools: ynl-gen: fix unused / pad attribute handlingJakub Kicinski1-3/+17
Unused and Pad attributes don't carry information. Unused should never exist, and be rejected. Pad should be silently skipped. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02tools: ynl-gen: add extra headers for user spaceJakub Kicinski1-0/+7
Make sure all relevant headers are included, we allocate memory, use memcpy() and Linux types without including the headers. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02KVM: selftests: Add test for race in kvm_recalculate_apic_map()Michal Luczaj2-0/+75
Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during APIC map construction to hunt for TOCTOU bugs in KVM. KVM's optimized map recalc makes multiple passes over the list of vCPUs, and the calculations ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where toggling LAPIC_MODE_DISABLED is quite interesting. Signed-off-by: Michal Luczaj <[email protected]> Co-developed-by: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-02selftests/bpf: Add access_inner_map selftestRhys Rustad-Elliott2-0/+76
Add a selftest that accesses a BPF_MAP_TYPE_ARRAY (at a nonzero index) nested within a BPF_MAP_TYPE_HASH_OF_MAPS to flex a previously buggy case. Signed-off-by: Rhys Rustad-Elliott <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-06-02objtool: Add __kunit_abort() to noreturnsJosh Poimboeuf1-0/+1
Fixes a bunch of warnings like: drivers/input/tests/input_test.o: warning: objtool: input_test_init+0x1cb: stack state mismatch: cfa1=4+64 cfa2=4+56 lib/kunit/kunit-test.o: warning: objtool: kunit_log_newline_test+0xfb: return with modified stack frame ... Fixes: 260755184cbd ("kunit: Move kunit_abort() call out of kunit_do_failed_assertion()") Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/20230602175453.swsn3ehyochtwkhy@treble
2023-06-02selftests: tls: add tests for poll behaviorJakub Kicinski1-0/+131
Make sure we don't generate premature POLLIN events. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-01selftests/tc-testing: replace mq with invalid parent IDZhengchao Shao1-1/+24
The test case shown in [1] triggers the kernel to access the null pointer. Therefore, add related test cases to mq. The test results are as follows: ./tdc.py -e 0531 1..1 ok 1 0531 - Replace mq with invalid parent ID ./tdc.py -c mq 1..8 ok 1 ce7d - Add mq Qdisc to multi-queue device (4 queues) ok 2 2f82 - Add mq Qdisc to multi-queue device (256 queues) ok 3 c525 - Add duplicate mq Qdisc ok 4 128a - Delete nonexistent mq Qdisc ok 5 03a9 - Delete mq Qdisc twice ok 6 be0f - Add mq Qdisc to single-queue device ok 7 1023 - Show mq class ok 8 0531 - Replace mq with invalid parent ID [1] https://lore.kernel.org/all/[email protected]/ Signed-off-by: Zhengchao Shao <[email protected]> Reviewed-by: Pedro Tammela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski29-33/+181
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/sfc/tc.c 622ab656344a ("sfc: fix error unwinds in TC offload") b6583d5e9e94 ("sfc: support TC decap rules matching on enc_src_port") net/mptcp/protocol.c 5b825727d087 ("mptcp: add annotations around msk->subflow accesses") e76c8ef5cc5b ("mptcp: refactor mptcp_stream_accept()") Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-01Merge tag 'net-6.4-rc5' of ↵Linus Torvalds10-5/+83
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Happy Wear a Dress Day. Fairly standard-sized batch of fixes, accounting for the lack of sub-tree submissions this week. The mlx5 IRQ fixes are notable, people were complaining about that. No fires burning. Current release - regressions: - eth: mlx5e: - multiple fixes for dynamic IRQ allocation - prevent encap offload when neigh update is running - eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters Current release - new code bugs: - eth: mlx5e: DR, add missing mutex init/destroy in pattern manager Previous releases - always broken: - tcp: deny tcp_disconnect() when threads are waiting - sched: prevent ingress Qdiscs from getting installed in random locations in the hierarchy and moving around - sched: flower: fix possible OOB write in fl_set_geneve_opt() - netlink: fix NETLINK_LIST_MEMBERSHIPS length report - udp6: fix race condition in udp6_sendmsg & connect - tcp: fix mishandling when the sack compression is deferred - rtnetlink: validate link attributes set at creation time - mptcp: fix connect timeout handling - eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked - eth: amd-xgbe: fix the false linkup in xgbe_phy_status - eth: mlx5e: - fix corner cases in internal buffer configuration - drain health before unregistering devlink - usb: qmi_wwan: set DTR quirk for BroadMobi BM818 Misc: - tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set" * tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits) mptcp: fix active subflow finalization mptcp: add annotations around sk->sk_shutdown accesses mptcp: fix data race around msk->first access mptcp: consolidate passive msk socket initialization mptcp: add annotations around msk->subflow accesses mptcp: fix connect timeout handling rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg rtnetlink: call validate_linkmsg in rtnl_create_link ice: recycle/free all of the fragments from multi-buffer frame net: phy: mxl-gpy: extend interrupt fix to all impacted variants net: renesas: rswitch: Fix return value in error path of xmit net: dsa: mv88e6xxx: Increase wait after reset deactivation net: ipa: Use correct value for IPA_STATUS_SIZE tcp: fix mishandling when the sack compression is deferred. net/sched: flower: fix possible OOB write in fl_set_geneve_opt() sfc: fix error unwinds in TC offload net/mlx5: Read embedded cpu after init bit cleared net/mlx5e: Fix error handling in mlx5e_refresh_tirs net/mlx5: Ensure af_desc.mask is properly initialized ...
2023-06-01KVM: selftests: Extend cpuid_test to verify KVM_GET_CPUID2 "nent" updatesSean Christopherson1-0/+21
Verify that KVM reports the actual number of CPUID entries on success, but doesn't touch the userspace struct on failure (which for better or worse, is KVM's ABI). Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-01KVM: selftests: Add dirty logging page splitting testBen Gardon2-0/+260
Add a test for page splitting during dirty logging and for hugepage recovery after dirty logging. Page splitting represents non-trivial behavior, which is complicated by MANUAL_PROTECT mode, which causes pages to be split on the first clear, instead of when dirty logging is enabled. Add a test which makes assertions about page counts to help define the expected behavior of page splitting and to provide needed coverage of the behavior. This also helps ensure that a failure in eager page splitting is not covered up by splitting in the vCPU path. Tested by running the test on an Intel Haswell machine w/wo MANUAL_PROTECT. Reviewed-by: Vipin Sharma <[email protected]> Signed-off-by: Ben Gardon <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: let the user run without hugetlb, as suggested by Paolo] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-01KVM: selftests: Move dirty logging functions to memstress.(c|h)Ben Gardon3-77/+87
Move some helper functions from dirty_log_perf_test.c to the memstress library so that they can be used in a future commit which tests page splitting during dirty logging. Reviewed-by: Vipin Sharma <[email protected]> Signed-off-by: Ben Gardon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-01KVM: selftests: touch all pages of args on each memstress iterationPaolo Bonzini1-0/+3
Access the same memory addresses on each iteration of the memstress guest code. This ensures that the state of KVM's page tables is the same after every iteration, including the pages that host the guest page tables for args and vcpu_args. This difference is visible when running the proposed dirty_log_page_splitting_test[*] on AMD, or on Intel with pml=0 and eptad=0. The tests fail due to different semantics of dirty bits for page-table pages on AMD (and eptad=0) and Intel. Both AMD and Intel with eptad=0 treat page-table accesses as writes, therefore more pages are dropped before the repopulation phase when dirty logging is disabled. The "missing" page had been included in the population phase because it hosts the page tables for vcpu_args, but repopulation does not need it." Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Vipin Sharma <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: add additional details in changelog] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-01perf script: Increase PID/TID width for outputNamhyung Kim1-3/+3
On large systems, it's common that PID/TID is bigger than 5-digit and it makes the output unaligned. Let's increase the width to 7. Before: $ perf script ... swapper 0 [006] 1540823.803935: 1369324 cycles:P: ffffffff9c755588 ktime_get+0x18 ([kernel.kallsyms]) gvfsd-dnssd 95114 [004] 1540823.804164: 1643871 cycles:P: ffffffff9cfdca5c __get_user_8+0x1c ([kernel.kallsyms]) perf-exec 1558582 [000] 1540823.804209: 1018714 cycles:P: ffffffff9c924ab9 __slab_free+0x9 ([kernel.kallsyms]) nmcli 1558589 [007] 1540823.804384: 1859212 cycles:P: 7f70537a8ad8 __strchrnul_evex+0x18 (/usr/lib/x86_64-linux-gnu/libc.so.6> sleep 1558582 [000] 1540823.804456: 987425 cycles:P: 7fd35bb27b30 _dl_init+0x0 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2> dbus-daemon 3043 [003] 1540823.804575: 1564465 cycles:P: ffffffff9cb2bb70 llist_add_batch+0x0 ([kernel.kallsyms]) gdbus 1558592 [001] 1540823.804766: 1315219 cycles:P: ffffffff9c797b2e audit_filter_syscall+0x9e ([kernel.kallsyms]) NetworkManager 3452 [005] 1540823.805301: 1558782 cycles:P: 7fa957737748 g_bit_lock+0x58 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7400.5> After: $ perf script ... swapper 0 [006] 1540823.803935: 1369324 cycles:P: ffffffff9c755588 ktime_get+0x18 ([kernel.kallsyms]) gvfsd-dnssd 95114 [004] 1540823.804164: 1643871 cycles:P: ffffffff9cfdca5c __get_user_8+0x1c ([kernel.kallsyms]) perf-exec 1558582 [000] 1540823.804209: 1018714 cycles:P: ffffffff9c924ab9 __slab_free+0x9 ([kernel.kallsyms]) nmcli 1558589 [007] 1540823.804384: 1859212 cycles:P: 7f70537a8ad8 __strchrnul_evex+0x18 (/usr/lib/x86_64-linux-gnu/libc.so.6> sleep 1558582 [000] 1540823.804456: 987425 cycles:P: 7fd35bb27b30 _dl_init+0x0 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2> dbus-daemon 3043 [003] 1540823.804575: 1564465 cycles:P: ffffffff9cb2bb70 llist_add_batch+0x0 ([kernel.kallsyms]) gdbus 1558592 [001] 1540823.804766: 1315219 cycles:P: ffffffff9c797b2e audit_filter_syscall+0x9e ([kernel.kallsyms]) NetworkManager 3452 [005] 1540823.805301: 1558782 cycles:P: 7fa957737748 g_bit_lock+0x58 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7400.5> Reviewer notes: Adrian added: "Might be worth noting that currently the biggest PID_MAX_LIMIT is 2^22 so pids don't get bigger than 7 digits presently" $ echo $((2 ** 22)) 4194304 $ echo -n $((2 ** 22)) | wc -c 7 $ Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-01perf pmu: Warn about invalid config for all PMUs and configsIan Rogers3-15/+49
Don't just check the raw PMU type, the only core PMU on homogeneous x86, check raw and all dynamically added PMUs. Extend the perf_pmu__warn_invalid_config to check all 4 config values. Rather than process the format list once per event, store the computed masks for each config value. Don't ignore the mask being zero, which is likely for config2 and config3, add config_masks_present so config values can be ignored only when no format information is present. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rob Herring <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-01perf pmu: Only warn about unsupported formats onceIan Rogers2-0/+10
Avoid scanning format list for each event parsed. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rob Herring <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-01selftests/bpf: Test table ID fib lookup BPF helperLouis DeLosSantos1-8/+53
Add additional test cases to `fib_lookup.c` prog_test. These test cases add a new /24 network to the previously unused veth2 device, removes the directly connected route from the main routing table and moves it to table 100. The first test case then confirms a fib lookup for a remote address in this directly connected network, using the main routing table fails. The second test case ensures the same fib lookup using table 100 succeeds. An additional pair of tests which function in the same manner are added for IPv6. Signed-off-by: Louis DeLosSantos <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-01bpf: Add table ID to bpf_fib_lookup BPF helperLouis DeLosSantos1-3/+18
Add ability to specify routing table ID to the `bpf_fib_lookup` BPF helper. A new field `tbid` is added to `struct bpf_fib_lookup` used as parameters to the `bpf_fib_lookup` BPF helper. When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` and `BPF_FIB_LOOKUP_TBID` flags the `tbid` field in `struct bpf_fib_lookup` will be used as the table ID for the fib lookup. If the `tbid` does not exist the fib lookup will fail with `BPF_FIB_LKUP_RET_NOT_FWDED`. The `tbid` field becomes a union over the vlan related output fields in `struct bpf_fib_lookup` and will be zeroed immediately after usage. This functionality is useful in containerized environments. For instance, if a CNI wants to dictate the next-hop for traffic leaving a container it can create a container-specific routing table and perform a fib lookup against this table in a "host-net-namespace-side" TC program. This functionality also allows `ip rule` like functionality at the TC layer, allowing an eBPF program to pick a routing table based on some aspect of the sk_buff. As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN datapath. When egress traffic leaves a Pod an eBPF program attached by Cilium will determine which VRF the egress traffic should target, and then perform a FIB lookup in a specific table representing this VRF's FIB. Signed-off-by: Louis DeLosSantos <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-01perf test: Update parse-events expectations to test for multiple eventsIan Rogers1-518/+590
With PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events opening on multiple PMUs, the test expectations need updating to test for multiple events. TODOs are added to document existing hybrid perf bugs. Tested on hybrid alderlake and non-hybrid tigerlake. Signed-off-by: Ian Rogers <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-01perf parse-events: Wildcard most "numeric" eventsIan Rogers6-35/+106
Numeric events are either raw events or those with ABI defined numbers matched by the lexer. PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events should wildcard match on hybrid systems. So "cycles" should match each PMU type with an extended type, not just PERF_TYPE_HARDWARE. Change wildcard matching to add the event even if wildcard PMU scanning fails, there will be no extended type but this best matches previous behavior. Only set the extended type when the event type supports it and when perf_pmus__supports_extended_type is true. This new function returns true if >1 core PMU and avoids potential errors on older kernels. Modify evsel__compute_group_pmu_name using a helper perf_pmu__is_software to determine when grouping should occur. Try to use PMUs, and evsel__find_pmu, as being more dependable than evsel->pmu_name. Set a parse events error if a hardware term's PMU lookup fails, to provide extra diagnostics. Fixes: 8bc75f699c141420 ("perf parse-events: Support wildcards on raw events") Reported-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>