aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-08-19libperf: Handle read format in perf_evsel__read()Namhyung Kim3-3/+83
The perf_counts_values should be increased to read the new lost data. Also adjust values after read according the read format. This supports PERF_FORMAT_GROUP which has a different data format but it's only available for leader events. Currently it doesn't have an API to read sibling (member) events in the group. But users may read the sibling event directly. Also reading from mmap would be disabled when the read format has ID or LOST bit as it's not exposed via mmap. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools headers UAPI: Sync linux/perf_event.h with the kernel sourcesNamhyung Kim1-1/+4
To pick the trivial change in: 119a784c81270eb8 ("perf/core: Add a new read format to get a number of lost samples") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools headers UAPI: Sync x86's asm/kvm.h with the kernel sourcesArnaldo Carvalho de Melo1-2/+8
To pick the changes in: 43bb9e000ea4c621 ("KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specific") 94dfc73e7cf4a31d ("treewide: uapi: Replace zero-length arrays with flexible-array members") bfbcc81bb82cbbad ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior") b172862241b48499 ("KVM: x86: PIT: Preserve state of speaker port data bit") ed2351174e38ad4f ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault") That just rebuilds kvm-stat.c on x86, no change in functionality. This silences these perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h' diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h Cc: Chenyi Qiang <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paul Durrant <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools headers UAPI: Sync KVM's vmx.h header with the kernel sourcesArnaldo Carvalho de Melo1-1/+3
To pick the changes in: 2f4073e08f4cc5a4 ("KVM: VMX: Enable Notify VM exit") That makes 'perf kvm-stat' aware of this new NOTIFY exit reason, thus addressing the following perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h' diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Tao Xu <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools include UAPI: Sync linux/vhost.h with the kernel sourcesArnaldo Carvalho de Melo1-0/+9
To get the changes in: f345a0143b4dd1cf ("vhost-vdpa: uAPI to suspend the device") Silencing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h' diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h To pick up these changes and support them: $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after $ diff -u before after --- before 2022-08-18 09:46:12.355958316 -0300 +++ after 2022-08-18 09:46:19.701182822 -0300 @@ -29,6 +29,7 @@ [0x75] = "VDPA_SET_VRING_ENABLE", [0x77] = "VDPA_SET_CONFIG_CALL", [0x7C] = "VDPA_SET_GROUP_ASID", + [0x7D] = "VDPA_SUSPEND", }; = { [0x00] = "GET_FEATURES", $ For instance, see how those 'cmd' ioctl arguments get translated, now VDPA_SUSPEND will be as well: # perf trace -a -e ioctl --max-events=10 0.000 ( 0.011 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1) = 0 21.353 ( 0.014 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1) = 0 25.766 ( 0.014 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740) = 0 25.845 ( 0.034 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70) = 0 25.916 ( 0.011 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 0x7ffe4a22c8a0) = 0 25.941 ( 0.025 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ATOMIC, arg: 0x7ffe4a22c840) = 0 32.915 ( 0.009 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_RMFB, arg: 0x7ffe4a22cf9c) = 0 42.522 ( 0.013 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740) = 0 42.579 ( 0.031 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70) = 0 42.644 ( 0.010 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 0x7ffe4a22c8a0) = 0 # Cc: Adrian Hunter <[email protected]> Cc: Eugenio Pérez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools headers kvm s390: Sync headers with the kernel sourcesArnaldo Carvalho de Melo1-0/+1
To pick the changes in: f5ecfee944934757 ("KVM: s390: resetting the Topology-Change-Report") None of them trigger any changes in tooling, this time this is just to silence these perf build warnings: Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h' diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h Cc: Janosch Frank <[email protected]> Cc: Pierre Morel <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools headers UAPI: Sync linux/kvm.h with the kernel sourcesArnaldo Carvalho de Melo1-0/+108
To pick the changes in: 8a061562e2f2b32b ("RISC-V: KVM: Add extensible CSR emulation framework") f5ecfee944934757 ("KVM: s390: resetting the Topology-Change-Report") 450a563924ae9437 ("KVM: stats: Fix value for KVM_STATS_UNIT_MAX for boolean stats") 1b870fa5573e260b ("kvm: stats: tell userspace which values are boolean") db1c875e0539518e ("KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices") 94dfc73e7cf4a31d ("treewide: uapi: Replace zero-length arrays with flexible-array members") 084cc29f8bbb034c ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis") 2f4073e08f4cc5a4 ("KVM: VMX: Enable Notify VM exit") ed2351174e38ad4f ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault") e9bf3acb23f0a6e1 ("KVM: s390: Add KVM_CAP_S390_PROTECTED_DUMP") 8aba09588d2af37c ("KVM: s390: Add CPU dump functionality") 0460eb35b443f73f ("KVM: s390: Add configuration dump functionality") fe9a93e07ba4f29d ("KVM: s390: pv: Add query dump information") 35d02493dba1ae63 ("KVM: s390: pv: Add query interface") c24a950ec7d60c4d ("KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata for SEV-ES") ffbb61d09fc56c85 ("KVM: x86: Accept KVM_[GS]ET_TSC_KHZ as a VM ioctl.") 661a20fab7d156cf ("KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SEND") fde0451be8fb3208 ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC") 28d1629f751c4a5f ("KVM: x86/xen: Kernel acceleration for XENVER_version") 536395260582be74 ("KVM: x86/xen: handle PV timers oneshot mode") 942c2490c23f2800 ("KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID") 2fd6df2f2b47d430 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") 35025735a79eaa89 ("KVM: x86/xen: Support direct injection of event channel events") That just rebuilds perf, as these patches add just an ioctl that is S390 specific and may clash with other arches, so are so far being excluded in the harvester script: $ tools/perf/trace/beauty/kvm_ioctl.sh > before $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h $ tools/perf/trace/beauty/kvm_ioctl.sh > after $ diff -u before after $ grep 390 tools/perf/trace/beauty/kvm_ioctl.sh egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \ $ This is also by now used by tools/testing/selftests/kvm/, a simple test build succeeded. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h' diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Cc: Anup Patel <[email protected]> Cc: Ben Gardon <[email protected]> Cc: Chenyi Qiang <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Janosch Frank <[email protected]> Cc: João Martins <[email protected]> Cc: Matthew Rosato <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Gonda <[email protected]> Cc: Pierre Morel <[email protected]> Cc: Tao Xu <[email protected]> Link: https://lore.kernel.org/lkml/YvzuryClcn%[email protected]/ Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools headers UAPI: Sync drm/i915_drm.h with the kernel sourcesArnaldo Carvalho de Melo1-87/+300
To pick up the changes in: a913bde810fc464d ("drm/i915: Update i915 uapi documentation") 525e93f6317a08a0 ("drm/i915/uapi: add NEEDS_CPU_ACCESS hint") 141f733bb3abb000 ("drm/i915/uapi: expose the avail tracking") 3f4309cbdc849637 ("drm/i915/uapi: add probed_cpu_visible_size") a50794f26f52c66c ("uapi/drm/i915: Document memory residency and Flat-CCS capability of obj") That don't add any new ioctl, so no changes in tooling. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Matt Roper <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Niranjana Vishwanathapura <[email protected]> Cc: Ramalingam C <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools headers cpufeatures: Sync with the kernel sourcesArnaldo Carvalho de Melo1-2/+4
To pick the changes from: 2b1299322016731d ("x86/speculation: Add RSB VM Exit protections") 28a99e95f55c6185 ("x86/amd: Use IBPB for firmware calls") 4ad3278df6fe2b08 ("x86/speculation: Disable RRSBA behavior") 26aae8ccbc197223 ("x86/cpu/amd: Enumerate BTC_NO") 9756bba28470722d ("x86/speculation: Fill RSB on vmexit for IBRS") 3ebc170068885b6f ("x86/bugs: Add retbleed=ibpb") 2dbb887e875b1de3 ("x86/entry: Add kernel IBRS implementation") 6b80b59b35557065 ("x86/bugs: Report AMD retbleed vulnerability") a149180fbcf336e9 ("x86: Add magic AMD return-thunk") 15e67227c49a5783 ("x86: Undo return-thunk damage") a883d624aed463c8 ("x86/cpufeatures: Move RETPOLINE flags to word 11") aae99a7c9ab371b2 ("x86/cpufeatures: Introduce x2AVIC CPUID bit") 6f33a9daff9f0790 ("x86: Fix comment for X86_FEATURE_ZEN") 51802186158c74a0 ("x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <[email protected]> Cc: Alexandre Chartre <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Daniel Sneddon <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Pawan Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Wyes Karny <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools headers UAPI: Sync linux/fscrypt.h with the kernel sourcesArnaldo Carvalho de Melo1-1/+2
To pick the changes from: 6b2a51ff03bf0c54 ("fscrypt: Add HCTR2 support for filename encryption") That don't result in any changes in tooling, just causes this to be rebuilt: CC /tmp/build/perf-urgent/trace/beauty/sync_file_range.o LD /tmp/build/perf-urgent/trace/beauty/perf-in.o addressing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h' diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h Cc: Adrian Hunter <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Huckleberry <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo1-0/+8
To pick up the changes in: 2b1299322016731d ("x86/speculation: Add RSB VM Exit protections") 4af184ee8b2c0a69 ("tools/power turbostat: dump secondary Turbo-Ratio-Limit") 4ad3278df6fe2b08 ("x86/speculation: Disable RRSBA behavior") d7caac991feeef1b ("x86/cpu/amd: Add Spectral Chicken") 6ad0ad2bf8a67e27 ("x86/bugs: Report Intel retbleed vulnerability") c59a1f106f5cd484 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") 465932db25f36648 ("x86/cpu: Add new VMX feature, Tertiary VM-Execution control") 027bbb884be006b0 ("KVM: x86/speculation: Disable Fill buffer clear within guests") 51802186158c74a0 ("x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug") Addressing these tools/perf build warnings: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' That makes the beautification scripts to pick some new entries: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2022-08-17 09:05:13.938246475 -0300 +++ after 2022-08-17 09:05:22.221455851 -0300 @@ -161,6 +161,7 @@ [0x0000048f] = "IA32_VMX_TRUE_EXIT_CTLS", [0x00000490] = "IA32_VMX_TRUE_ENTRY_CTLS", [0x00000491] = "IA32_VMX_VMFUNC", + [0x00000492] = "IA32_VMX_PROCBASED_CTLS3", [0x000004c1] = "IA32_PMC0", [0x000004d0] = "IA32_MCG_EXT_CTL", [0x00000560] = "IA32_RTIT_OUTPUT_BASE", @@ -212,6 +213,7 @@ [0x0000064D] = "PLATFORM_ENERGY_STATUS", [0x0000064e] = "PPERF", [0x0000064f] = "PERF_LIMIT_REASONS", + [0x00000650] = "SECONDARY_TURBO_RATIO_LIMIT", [0x00000658] = "PKG_WEIGHTED_CORE_C0_RES", [0x00000659] = "PKG_ANY_CORE_C0_RES", [0x0000065A] = "PKG_ANY_GFXE_C0_RES", $ Now one can trace systemwide asking to see backtraces to where those MSRs are being read/written, see this example with a previous update: # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" ^C# If we use -v (verbose mode) we can see what it does behind the scenes: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" Using CPUID AuthenticAMD-25-21-0 0x6a0 0x6a8 New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) 0x6a0 0x6a8 New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) mmap size 528384B ^C# Example with a frequent msr: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2 Using CPUID AuthenticAMD-25-21-0 0x48 New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) 0x48 New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) mmap size 528384B Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols 0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) futex_wait_queue_me ([kernel.kallsyms]) futex_wait ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so) 0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule_idle ([kernel.kallsyms]) do_idle ([kernel.kallsyms]) cpu_startup_entry ([kernel.kallsyms]) secondary_startup_64_no_verify ([kernel.kallsyms]) # Cc: Adrian Hunter <[email protected]> Cc: Daniel Sneddon <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Len Brown <[email protected]> Cc: Like Xu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Pawan Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Hoo <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Link: https://lore.kernel.org/lkml/YvzbT24m2o5U%[email protected]/ Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19perf beauty: Update copy of linux/socket.h with the kernel sourcesArnaldo Carvalho de Melo1-8/+8
To pick the changes in: 7fa875b8e53c288d ("net: copy from user before calling __copy_msghdr") ebe73a284f4de8c5 ("net: Allow custom iter handler in msghdr") 7c701d92b2b5e517 ("skbuff: carry external ubuf_info in msghdr") c04245328dd7e915 ("net: make __sys_accept4_file() static") That don't result in any changes in the tables generated from that header. This silences this perf build warning: Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h' diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h Cc: David Ahern <[email protected]> Cc: David S. Miller <[email protected]> Cc: Dylan Yudaken <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Yajun Deng <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19perf cpumap: Fix alignment for masks in event encodingIan Rogers6-60/+154
A mask encoding of a cpu map is laid out as: u16 nr u16 long_size unsigned long mask[]; However, the mask may be 8-byte aligned meaning there is a 4-byte pad after long_size. This means 32-bit and 64-bit builds see the mask as being at different offsets. On top of this the structure is in the byte data[] encoded as: u16 type char data[] This means the mask's struct isn't the required 4 or 8 byte aligned, but is offset by 2. Consequently the long reads and writes are causing undefined behavior as the alignment is broken. Fix the mask struct by creating explicit 32 and 64-bit variants, use a union to avoid data[] and casts; the struct must be packed so the layout matches the existing perf.data layout. Taking an address of a member of a packed struct breaks alignment so pass the packed perf_record_cpu_map_data to functions, so they can access variables with the right alignment. As the 64-bit version has 4 bytes of padding, optimizing writing to only write the 32-bit version. Committer notes: Disable warnings about 'packed' that break the build in some arches like riscv64, but just around that specific struct. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dave Marchevsky <[email protected]> Cc: German Gomez <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Kees Kook <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19perf cpumap: Compute mask size in constant timeIan Rogers1-12/+1
perf_cpu_map__max() computes the cpumap's maximum value, no need to iterate over all values. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dave Marchevsky <[email protected]> Cc: German Gomez <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Kees Kook <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19perf cpumap: Synthetic events and const/staticIan Rogers3-14/+12
Make the cpumap arguments const to make it clearer they are in rather than out arguments. Make two functions static and remove external declarations. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dave Marchevsky <[email protected]> Cc: German Gomez <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Kees Kook <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19perf cpumap: Const map for max()Ian Rogers2-2/+2
Allows max() to be used with 'const struct perf_cpu_maps *'. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dave Marchevsky <[email protected]> Cc: German Gomez <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Kees Kook <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-18Merge tag 'net-6.0-rc2' of ↵Linus Torvalds65-793/+2351
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - tcp: fix cleanup and leaks in tcp_read_skb() (the new way BPF socket maps get data out of the TCP stack) - tls: rx: react to strparser initialization errors - netfilter: nf_tables: fix scheduling-while-atomic splat - net: fix suspicious RCU usage in bpf_sk_reuseport_detach() Current release - new code bugs: - mlxsw: ptp: fix a couple of races, static checker warnings and error handling Previous releases - regressions: - netfilter: - nf_tables: fix possible module reference underflow in error path - make conntrack helpers deal with BIG TCP (skbs > 64kB) - nfnetlink: re-enable conntrack expectation events - net: fix potential refcount leak in ndisc_router_discovery() Previous releases - always broken: - sched: cls_route: disallow handle of 0 - neigh: fix possible local DoS due to net iface start/stop loop - rtnetlink: fix module refcount leak in rtnetlink_rcv_msg - sched: fix adding qlen to qcpu->backlog in gnet_stats_add_queue_cpu - virtio_net: fix endian-ness for RSS - dsa: mv88e6060: prevent crash on an unused port - fec: fix timer capture timing in `fec_ptp_enable_pps()` - ocelot: stats: fix races, integer wrapping and reading incorrect registers (the change of register definitions here accounts for bulk of the changed LoC in this PR)" * tag 'net-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) net: moxa: MAC address reading, generating, validity checking tcp: handle pure FIN case correctly tcp: refactor tcp_read_skb() a bit tcp: fix tcp_cleanup_rbuf() for tcp_read_skb() tcp: fix sock skb accounting in tcp_read_skb() igb: Add lock to avoid data race dt-bindings: Fix incorrect "the the" corrections net: genl: fix error path memory leak in policy dumping stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove() net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_run net/mlx5e: Allocate flow steering storage during uplink initialization net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset net: mscc: ocelot: make struct ocelot_stat_layout array indexable net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work net: mscc: ocelot: turn stats_lock into a spinlock net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counter net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet counters net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it ...
2022-08-18Merge tag 'linux-kselftest-next-6.0-rc2' of ↵Linus Torvalds1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fix from Shuah Khan: - fix landlock test build regression * tag 'linux-kselftest-next-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/landlock: fix broken include of linux/landlock.h
2022-08-18Merge tag 'trace-rtla-v6.0' of ↵Linus Torvalds4-33/+43
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull rtla tool fixes from Steven Rostedt: "Fixes for the Real-Time Linux Analysis tooling: - Fix tracer name in comments and prints - Fix setting up symlinks - Allow extra flags to be set in build - Consolidate and show all necessary libraries not found in build error" * tag 'trace-rtla-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: rtla: Consolidate and show all necessary libraries that failed for building tools/rtla: Build with EXTRA_{C,LD}FLAGS tools/rtla: Fix command symlinks rtla: Fix tracer name
2022-08-18net: moxa: MAC address reading, generating, validity checkingSergei Antonov1-6/+7
This device does not remember its MAC address, so add a possibility to get it from the platform. If it fails, generate a random address. This will provide a MAC address early during boot without user space being involved. Also remove extra calls to is_valid_ether_addr(). Made after suggestions by Andrew Lunn: 1) Use eth_hw_addr_random() to assign a random MAC address during probe. 2) Remove is_valid_ether_addr() from moxart_mac_open() 3) Add a call to platform_get_ethdev_address() during probe 4) Remove is_valid_ether_addr() from moxart_set_mac_address(). The core does this v1 -> v2: Handle EPROBE_DEFER returned from platform_get_ethdev_address(). Move MAC reading code to the beginning of the probe function. Signed-off-by: Sergei Antonov <[email protected]> Suggested-by: Andrew Lunn <[email protected]> CC: Yang Yingliang <[email protected]> CC: Pavel Skripkin <[email protected]> CC: Guobin Huang <[email protected]> CC: Yang Wei <[email protected]> CC: Christophe JAILLET <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18Merge branch 'tcp-some-bug-fixes-for-tcp_read_skb'Jakub Kicinski2-28/+26
Cong Wang says: ==================== tcp: some bug fixes for tcp_read_skb() This patchset contains 3 bug fixes and 1 minor refactor patch for tcp_read_skb(). V1 only had the first patch, as Eric prefers to fix all of them together, I have to group them together. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tcp: handle pure FIN case correctlyCong Wang2-3/+4
When skb->len==0, the recv_actor() returns 0 too, but we also use 0 for error conditions. This patch amends this by propagating the errors to tcp_read_skb() so that we can distinguish skb->len==0 case from error cases. Fixes: 04919bed948d ("tcp: Introduce tcp_read_skb()") Reported-by: Eric Dumazet <[email protected]> Cc: John Fastabend <[email protected]> Cc: Jakub Sitnicki <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tcp: refactor tcp_read_skb() a bitCong Wang1-17/+9
As tcp_read_skb() only reads one skb at a time, the while loop is unnecessary, we can turn it into an if. This also simplifies the code logic. Cc: Eric Dumazet <[email protected]> Cc: John Fastabend <[email protected]> Cc: Jakub Sitnicki <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tcp: fix tcp_cleanup_rbuf() for tcp_read_skb()Cong Wang1-10/+14
tcp_cleanup_rbuf() retrieves the skb from sk_receive_queue, it assumes the skb is not yet dequeued. This is no longer true for tcp_read_skb() case where we dequeue the skb first. Fix this by introducing a helper __tcp_cleanup_rbuf() which does not require any skb and calling it in tcp_read_skb(). Fixes: 04919bed948d ("tcp: Introduce tcp_read_skb()") Cc: Eric Dumazet <[email protected]> Cc: John Fastabend <[email protected]> Cc: Jakub Sitnicki <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tcp: fix sock skb accounting in tcp_read_skb()Cong Wang1-0/+1
Before commit 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()"), skb was not dequeued from receive queue hence when we close TCP socket skb can be just flushed synchronously. After this commit, we have to uncharge skb immediately after being dequeued, otherwise it is still charged in the original sock. And we still need to retain skb->sk, as eBPF programs may extract sock information from skb->sk. Therefore, we have to call skb_set_owner_sk_safe() here. Fixes: 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()") Reported-and-tested-by: [email protected] Tested-by: Stanislav Fomichev <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: John Fastabend <[email protected]> Cc: Jakub Sitnicki <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18igb: Add lock to avoid data raceLin Ma2-1/+13
The commit c23d92b80e0b ("igb: Teardown SR-IOV before unregister_netdev()") places the unregister_netdev() call after the igb_disable_sriov() call to avoid functionality issue. However, it introduces several race conditions when detaching a device. For example, when .remove() is called, the below interleaving leads to use-after-free. (FREE from device detaching) | (USE from netdev core) igb_remove | igb_ndo_get_vf_config igb_disable_sriov | vf >= adapter->vfs_allocated_count? kfree(adapter->vf_data) | adapter->vfs_allocated_count = 0 | | memcpy(... adapter->vf_data[vf] Moreover, the igb_disable_sriov() also suffers from data race with the requests from VF driver. (FREE from device detaching) | (USE from requests) igb_remove | igb_msix_other igb_disable_sriov | igb_msg_task kfree(adapter->vf_data) | vf < adapter->vfs_allocated_count adapter->vfs_allocated_count = 0 | To this end, this commit first eliminates the data races from netdev core by using rtnl_lock (similar to commit 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink")). And then adds a spinlock to eliminate races from driver requests. (similar to commit 1e53834ce541 ("ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero") Fixes: c23d92b80e0b ("igb: Teardown SR-IOV before unregister_netdev()") Signed-off-by: Lin Ma <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18Merge branch '100GbE' of ↵Jakub Kicinski6-18/+85
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-17 (ice) This series contains updates to ice driver only. Grzegorz prevents modifications to VLAN 0 when setting VLAN promiscuous as it will already be set. He also ignores -EEXIST error when attempting to set promiscuous and ensures promiscuous mode is properly cleared from the hardware when being removed. Benjamin ignores additional -EEXIST errors when setting promiscuous mode since the existing mode is the desired mode. Sylwester fixes VFs to allow sending of tagged traffic when no VLAN filters exist. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix VF not able to send tagged traffic with no VLAN filters ice: Ignore error message when setting same promiscuous mode ice: Fix clearing of promisc mode with bridge over bond ice: Ignore EEXIST when setting promisc mode ice: Fix double VLAN error when entering promisc mode ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18dt-bindings: Fix incorrect "the the" correctionsGeert Uytterhoeven2-2/+2
Lots of double occurrences of "the" were replaced by single occurrences, but some of them should become "to the" instead. Fixes: 12e5bde18d7f6ca4 ("dt-bindings: Fix typo in comment") Signed-off-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/c5743c0a1a24b3a8893797b52fed88b99e56b04b.1660755148.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18net: genl: fix error path memory leak in policy dumpingJakub Kicinski2-3/+17
If construction of the array of policies fails when recording non-first policy we need to unwind. netlink_policy_dump_add_policy() itself also needs fixing as it currently gives up on error without recording the allocated pointer in the pstate pointer. Reported-by: [email protected] Fixes: 50a896cf2d6f ("genetlink: properly support per-op policy dumping") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18stmmac: intel: Add a missing clk_disable_unprepare() call in ↵Christophe JAILLET1-0/+1
intel_eth_pci_remove() Commit 09f012e64e4b ("stmmac: intel: Fix clock handling on error and remove paths") removed this clk_disable_unprepare() This was partly revert by commit ac322f86b56c ("net: stmmac: Fix clock handling on remove path") which removed this clk_disable_unprepare() because: " While unloading the dwmac-intel driver, clk_disable_unprepare() is being called twice in stmmac_dvr_remove() and intel_eth_pci_remove(). This causes kernel panic on the second call. " However later on, commit 5ec55823438e8 ("net: stmmac: add clocks management for gmac driver") has updated stmmac_dvr_remove() which do not call clk_disable_unprepare() anymore. So this call should now be called from intel_eth_pci_remove(). Fixes: 5ec55823438e8 ("net: stmmac: add clocks management for gmac driver") Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/d7c8c1dadf40df3a7c9e643f76ffadd0ccc1ad1b.1660659689.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tee: add overflow check in register_shm_helper()Jens Wiklander1-0/+3
With special lengths supplied by user space, register_shm_helper() has an integer overflow when calculating the number of pages covered by a supplied user space memory region. This causes internal_get_user_pages_fast() a helper function of pin_user_pages_fast() to do a NULL pointer dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 Modules linked in: CPU: 1 PID: 173 Comm: optee_example_a Not tainted 5.19.0 #11 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 pc : internal_get_user_pages_fast+0x474/0xa80 Call trace: internal_get_user_pages_fast+0x474/0xa80 pin_user_pages_fast+0x24/0x4c register_shm_helper+0x194/0x330 tee_shm_register_user_buf+0x78/0x120 tee_ioctl+0xd0/0x11a0 __arm64_sys_ioctl+0xa8/0xec invoke_syscall+0x48/0x114 Fix this by adding an an explicit call to access_ok() in tee_shm_register_user_buf() to catch an invalid user space address early. Fixes: 033ddf12bcf5 ("tee: add register user memory") Cc: [email protected] Reported-by: Nimish Mishra <[email protected]> Reported-by: Anirban Chakraborty <[email protected]> Reported-by: Debdeep Mukhopadhyay <[email protected]> Suggested-by: Jerome Forissier <[email protected]> Signed-off-by: Jens Wiklander <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-08-17net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_runLorenzo Bianconi1-1/+1
Fix possible NULL pointer dereference in mtk_xdp_run() if the ebpf program returns XDP_TX and xdp_convert_buff_to_frame routine fails returning NULL. Fixes: 5886d26fd25bb ("net: ethernet: mtk_eth_soc: add xmit XDP support") Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/627a07d759020356b64473e09f0855960e02db28.1660659112.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net/mlx5e: Allocate flow steering storage during uplink initializationLeon Romanovsky1-8/+17
IPsec code relies on valid priv->fs pointer that is the case in NIC flow, but not correct in uplink. Before commit that mentioned in the Fixes line, that pointer was valid in all flows as it was allocated together with priv struct. In addition, the cleanup representors routine called to that not-initialized priv->fs pointer and its internals which caused NULL deference. So, move FS allocation to be as early as possible. Fixes: af8bbf730068 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer") Signed-off-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/ae46fa5bed3c67f937bfdfc0370101278f5422f1.1660639564.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17Merge branch 'fixes-for-ocelot-driver-statistics'Jakub Kicinski7-378/+1581
Vladimir Oltean says: ==================== Fixes for Ocelot driver statistics This series contains bug fixes for the ocelot drivers (both switchdev and DSA). Some concern the counters exposed to ethtool -S, and others to the counters exposed to ifconfig. I'm aware that the changes are fairly large, but I wanted to prioritize on a proper approach to addressing the issues rather than a quick hack. Some of the noticed problems: - bad register offsets for some counters - unhandled concurrency leading to corrupted counters - unhandled 32-bit wraparound of ifconfig counters The issues on the ocelot switchdev driver were noticed through code inspection, I do not have the hardware to test. This patch set necessarily converts ocelot->stats_lock from a mutex to a spinlock. I know this affects Colin Foster's development with the SPI controlled VSC7512. I have other changes prepared for net-next that convert this back into a mutex (along with other changes in this area). ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ↵Vladimir Oltean1-27/+26
ocelot->stats Rather than reading the stats64 counters directly from the 32-bit hardware, it's better to rely on the output produced by the periodic ocelot_port_update_stats(). It would be even better to call ocelot_port_update_stats() right from ocelot_get_stats64() to make sure we report the current values rather than the ones from 2 seconds ago. But we need to export ocelot_port_update_stats() from the switch lib towards the switchdev driver for that, and future work will largely undo that. There are more ocelot-based drivers waiting to be introduced, an example of which is the SPI-controlled VSC7512. In that driver's case, it will be impossible to call ocelot_port_update_stats() from ndo_get_stats64 context, since the latter is atomic, and reading the stats over SPI is sleepable. So the compromise taken here, which will also hold going forward, is to report 64-bit counters to stats64, which are not 100% up to date. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offsetVladimir Oltean6-289/+540
With so many counter addresses recently discovered as being wrong, it is desirable to at least have a central database of information, rather than two: one through the SYS_COUNT_* registers (used for ndo_get_stats64), and the other through the offset field of struct ocelot_stat_layout elements (used for ethtool -S). The strategy will be to keep the SYS_COUNT_* definitions as the single source of truth, but for that we need to expand our current definitions to cover all registers. Then we need to convert the ocelot region creation logic, and stats worker, to the read semantics imposed by going through SYS_COUNT_* absolute register addresses, rather than offsets of 32-bit words relative to SYS_COUNT_RX_OCTETS (which should have been SYS_CNT, by the way). Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: make struct ocelot_stat_layout array indexableVladimir Oltean5-306/+1243
The ocelot counters are 32-bit and require periodic reading, every 2 seconds, by ocelot_port_update_stats(), so that wraparounds are detected. Currently, the counters reported by ocelot_get_stats64() come from the 32-bit hardware counters directly, rather than from the 64-bit accumulated ocelot->stats, and this is a problem for their integrity. The strategy is to make ocelot_get_stats64() able to cherry-pick individual stats from ocelot->stats the way in which it currently reads them out from SYS_COUNT_* registers. But currently it can't, because ocelot->stats is an opaque u64 array that's used only to feed data into ethtool -S. To solve that problem, we need to make ocelot->stats indexable, and associate each element with an element of struct ocelot_stat_layout used by ethtool -S. This makes ocelot_stat_layout a fat (and possibly sparse) array, so we need to change the way in which we access it. We no longer need OCELOT_STAT_END as a sentinel, because we know the array's size (OCELOT_NUM_STATS). We just need to skip the array elements that were left unpopulated for the switch revision (ocelot, felix, seville). Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_workVladimir Oltean1-0/+4
The 2 methods can run concurrently, and one will change the window of counters (SYS_STAT_CFG_STAT_VIEW) that the other sees. The fix is similar to what commit 7fbf6795d127 ("net: mscc: ocelot: fix mutex lock error during ethtool stats read") has done for ethtool -S. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: turn stats_lock into a spinlockVladimir Oltean3-9/+8
ocelot_get_stats64() currently runs unlocked and therefore may collide with ocelot_port_update_stats() which indirectly accesses the same counters. However, ocelot_get_stats64() runs in atomic context, and we cannot simply take the sleepable ocelot->stats_lock mutex. We need to convert it to an atomic spinlock first. Do that as a preparatory change. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counterVladimir Oltean1-1/+1
This register, used as part of stats->tx_dropped in ocelot_get_stats64(), has a wrong address. At the address currently given, there is actually the c_tx_green_prio_6 counter. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: fix incorrect ndo_get_stats64 packet countersVladimir Oltean5-30/+42
Reading stats using the SYS_COUNT_* register definitions is only used by ocelot_get_stats64() from the ocelot switchdev driver, however, currently the bucket definitions are incorrect. Separately, on both RX and TX, we have the following problems: - a 256-1023 bucket which actually tracks the 256-511 packets - the 1024-1526 bucket actually tracks the 512-1023 packets - the 1527-max bucket actually tracks the 1024-1526 packets => nobody tracks the packets from the real 1527-max bucket Additionally, the RX_PAUSE, RX_CONTROL, RX_LONGS and RX_CLASSIFIED_DROPS all track the wrong thing. However this doesn't seem to have any consequence, since ocelot_get_stats64() doesn't use these. Even though this problem only manifests itself for the switchdev driver, we cannot split the fix for ocelot and for DSA, since it requires fixing the bucket definitions from enum ocelot_reg, which makes us necessarily adapt the structures from felix and seville as well. Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet countersVladimir Oltean1-1/+2
What the driver actually reports as 256-511 is in fact 512-1023, and the TX packets in the 256-511 bucket are not reported. Fix that. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support itVladimir Oltean1-2/+5
ds->ops->port_stp_state_set() is, like most DSA methods, optional, and if absent, the port is supposed to remain in the forwarding state (as standalone). Such is the case with the mv88e6060 driver, which does not offload the bridge layer. DSA warns that the STP state can't be changed to FORWARDING as part of dsa_port_enable_rt(), when in fact it should not. The error message is also not up to modern standards, so take the opportunity to make it more descriptive. Fixes: fd3645413197 ("net: dsa: change scope of STP state setter") Reported-by: Sergei Antonov <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Sergei Antonov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: dsa: sja1105: fix buffer overflow in sja1105_setup_devlink_regions()Rustam Subkhankulov1-1/+1
If an error occurs in dsa_devlink_region_create(), then 'priv->regions' array will be accessed by negative index '-1'. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rustam Subkhankulov <[email protected]> Fixes: bf425b82059e ("net: dsa: sja1105: expose static config as devlink region") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski9-261/+390
Florian Westphal says: ==================== netfilter: conntrack and nf_tables bug fixes The following patchset contains netfilter fixes for net. Broken since 5.19: A few ancient connection tracking helpers assume TCP packets cannot exceed 64kb in size, but this isn't the case anymore with 5.19 when BIG TCP got merged, from myself. Regressions since 5.19: 1. 'conntrack -E expect' won't display anything because nfnetlink failed to enable events for expectations, only for normal conntrack events. 2. partially revert change that added resched calls to a function that can be in atomic context. Both broken and fixed up by myself. Broken for several releases (up to original merge of nf_tables): Several fixes for nf_tables control plane, from Pablo. This fixes up resource leaks in error paths and adds more sanity checks for mutually exclusive attributes/flags. Kconfig: NF_CONNTRACK_PROCFS is very old and doesn't provide all info provided via ctnetlink, so it should not default to y. From Geert Uytterhoeven. Selftests: rework nft_flowtable.sh: it frequently indicated failure; the way it tried to detect an offload failure did not work reliably. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: testing: selftests: nft_flowtable.sh: rework test to detect offload failure testing: selftests: nft_flowtable.sh: use random netns names netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag netfilter: nf_tables: really skip inactive sets when allocating name netfilter: nfnetlink: re-enable conntrack expectation events netfilter: nf_tables: fix scheduling-while-atomic splat netfilter: nf_ct_irc: cap packet search space to 4k netfilter: nf_ct_ftp: prefer skb_linearize netfilter: nf_ct_h323: cap packet size at 64k netfilter: nf_ct_sane: remove pseudo skb linearization netfilter: nf_tables: possible module reference underflow in error path netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: Fix suspicious RCU usage in bpf_sk_reuseport_detach()David Howells2-1/+26
bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags() to obtain the value of sk->sk_user_data, but that function is only usable if the RCU read lock is held, and neither that function nor any of its callers hold it. Fix this by adding a new helper, __locked_read_sk_user_data_with_flags() that checks to see if sk->sk_callback_lock() is held and use that here instead. Alternatively, making __rcu_dereference_sk_user_data_with_flags() use rcu_dereference_checked() might suffice. Without this, the following warning can be occasionally observed: ============================= WARNING: suspicious RCU usage 6.0.0-rc1-build2+ #563 Not tainted ----------------------------- include/net/sock.h:592 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 5 locks held by locktest/29873: #0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121 #1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70 #2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0 #3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd #4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4 stack backtrace: CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: <TASK> dump_stack_lvl+0x4c/0x5f bpf_sk_reuseport_detach+0x6d/0xa4 reuseport_detach_sock+0x75/0xdd inet_unhash+0xa5/0x1c0 tcp_set_state+0x169/0x20f ? lockdep_sock_is_held+0x3a/0x3a ? __lock_release.isra.0+0x13e/0x220 ? reacquire_held_locks+0x1bb/0x1bb ? hlock_class+0x31/0x96 ? mark_lock+0x9e/0x1af __tcp_close+0x50/0x4b6 tcp_close+0x28/0x70 inet_release+0x8e/0xa7 __sock_release+0x95/0x121 sock_close+0x14/0x17 __fput+0x20f/0x36a task_work_run+0xa3/0xcc exit_to_user_mode_prepare+0x9c/0x14d syscall_exit_to_user_mode+0x18/0x44 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: cf8c1e967224 ("net: refactor bpf_sk_reuseport_detach()") Signed-off-by: David Howells <[email protected]> cc: Hawkins Jiawei <[email protected]> Link: https://lore.kernel.org/r/166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17Merge tag 'ntfs3_for_6.0' of ↵Linus Torvalds14-329/+835
https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: - implement FALLOC_FL_INSERT_RANGE - fix some logic errors - fixed xfstests (tested on x86_64): generic/064 generic/213 generic/300 generic/361 generic/449 generic/485 - some dead code removed or refactored * tag 'ntfs3_for_6.0' of https://github.com/Paragon-Software-Group/linux-ntfs3: (39 commits) fs/ntfs3: uninitialized variable in ntfs_set_acl_ex() fs/ntfs3: Remove unused function wnd_bits fs/ntfs3: Make ni_ins_new_attr return error fs/ntfs3: Create MFT zone only if length is large enough fs/ntfs3: Refactoring attr_insert_range to restore after errors fs/ntfs3: Refactoring attr_punch_hole to restore after errors fs/ntfs3: Refactoring attr_set_size to restore after errors fs/ntfs3: New function ntfs_bad_inode fs/ntfs3: Make MFT zone less fragmented fs/ntfs3: Check possible errors in run_pack in advance fs/ntfs3: Added comments to frecord functions fs/ntfs3: Fill duplicate info in ni_add_name fs/ntfs3: Make static function attr_load_runs fs/ntfs3: Add new argument is_mft to ntfs_mark_rec_free fs/ntfs3: Remove unused mi_mark_free fs/ntfs3: Fix very fragmented case in attr_punch_hole fs/ntfs3: Fix work with fragmented xattr fs/ntfs3: Make ntfs_fallocate return -ENOSPC instead of -EFBIG fs/ntfs3: extend ni_insert_nonresident to return inserted ATTR_LIST_ENTRY fs/ntfs3: Check reserved size for maximum allowed ...
2022-08-17dcache: move the DCACHE_OP_COMPARE case out of the __d_lookup_rcu loopLinus Torvalds1-23/+49
__d_lookup_rcu() is one of the hottest functions in the kernel on certain loads, and it is complicated by filesystems that might want to have their own name compare function. We can improve code generation by moving the test of DCACHE_OP_COMPARE outside the loop, which makes the loop itself much simpler, at the cost of some code duplication. But both cases end up being simpler, and the "native" direct case-sensitive compare particularly so. Cc: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-08-17net: dsa: microchip: ksz9477: fix fdb_dump last invalid entryArun Ramadoss1-0/+3
In the ksz9477_fdb_dump function it reads the ALU control register and exit from the timeout loop if there is valid entry or search is complete. After exiting the loop, it reads the alu entry and report to the user space irrespective of entry is valid. It works till the valid entry. If the loop exited when search is complete, it reads the alu table. The table returns all ones and it is reported to user space. So bridge fdb show gives ff:ff:ff:ff:ff:ff as last entry for every port. To fix it, after exiting the loop the entry is reported only if it is valid one. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Arun Ramadoss <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17ice: Fix VF not able to send tagged traffic with no VLAN filtersSylwester Dziedziuch2-11/+57
VF was not able to send tagged traffic when it didn't have any VLAN interfaces and VLAN anti-spoofing was enabled. Fix this by allowing VFs with no VLAN filters to send tagged traffic. After VF adds a VLAN interface it will be able to send tagged traffic matching VLAN filters only. Testing hints: 1. Spawn VF 2. Send tagged packet from a VF 3. The packet should be sent out and not dropped 4. Add a VLAN interface on VF 5. Send tagged packet on that VLAN interface 6. Packet should be sent out and not dropped 7. Send tagged packet with id different than VLAN interface 8. Packet should be dropped Fixes: daf4dd16438b ("ice: Refactor spoofcheck configuration functions") Signed-off-by: Sylwester Dziedziuch <[email protected]> Signed-off-by: Mateusz Palczewski <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>