Age | Commit message (Collapse) | Author | Files | Lines |
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-02-29
We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).
The main changes are:
1) Extend the BPF verifier to enable static subprog calls in spin lock
critical sections, from Kumar Kartikeya Dwivedi.
2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
in BPF global subprogs, from Andrii Nakryiko.
3) Larger batch of riscv BPF JIT improvements and enabling inlining
of the bpf_kptr_xchg() for RV64, from Pu Lehui.
4) Allow skeleton users to change the values of the fields in struct_ops
maps at runtime, from Kui-Feng Lee.
5) Extend the verifier's capabilities of tracking scalars when they
are spilled to stack, especially when the spill or fill is narrowing,
from Maxim Mikityanskiy & Eduard Zingerman.
6) Various BPF selftest improvements to fix errors under gcc BPF backend,
from Jose E. Marchesi.
7) Avoid module loading failure when the module trying to register
a struct_ops has its BTF section stripped, from Geliang Tang.
8) Annotate all kfuncs in .BTF_ids section which eventually allows
for automatic kfunc prototype generation from bpftool, from Daniel Xu.
9) Several updates to the instruction-set.rst IETF standardization
document, from Dave Thaler.
10) Shrink the size of struct bpf_map resp. bpf_array,
from Alexei Starovoitov.
11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
from Benjamin Tissoires.
12) Fix bpftool to be more portable to musl libc by using POSIX's
basename(), from Arnaldo Carvalho de Melo.
13) Add libbpf support to gcc in CORE macro definitions,
from Cupertino Miranda.
14) Remove a duplicate type check in perf_event_bpf_event,
from Florian Lehner.
15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
with notrace correctly, from Yonghong Song.
16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
array to fix build warnings, from Kees Cook.
17) Fix resolve_btfids cross-compilation to non host-native endianness,
from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
selftests/bpf: Test if shadow types work correctly.
bpftool: Add an example for struct_ops map and shadow type.
bpftool: Generated shadow variables for struct_ops maps.
libbpf: Convert st_ops->data to shadow type.
libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
bpf, arm64: use bpf_prog_pack for memory management
arm64: patching: implement text_poke API
bpf, arm64: support exceptions
arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
bpf: add is_async_callback_calling_insn() helper
bpf: introduce in_sleepable() helper
bpf: allow more maps in sleepable bpf programs
selftests/bpf: Test case for lacking CFI stub functions.
bpf: Check cfi_stubs before registering a struct_ops type.
bpf: Clarify batch lookup/lookup_and_delete semantics
bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
bpf, docs: Fix typos in instruction-set.rst
selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
bpf: Shrink size of struct bpf_map/bpf_array.
...
====================
Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f53 ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").
Then, much later, we ended up removing the workaround in commit
43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.
Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround. But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.
It looks like there are at least two separate issues that all hit in
this area:
(a) some versions of gcc don't mark the asm goto as 'volatile' when it
has outputs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
which is easy to work around by just adding the 'volatile' by hand.
(b) Internal compiler errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
which are worked around by adding the extra empty 'asm' as a
barrier, as in the original workaround.
but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.
but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.
Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Instead of using magic offsets to access BTF ID set data, leverage types
from btf_ids.h (btf_id_set and btf_id_set8) which define the actual
layout of the data. Thanks to this change, set sorting should also
continue working if the layout changes.
This requires to sync the definition of 'struct btf_id_set8' from
include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync
the rest of the file at the moment, b/c that would require to also sync
multiple dependent headers and we don't need any other defs from
btf_ids.h.
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com
|
|
Updated check_forking() and bench_forking() to use __mt_dup() to duplicate
maple tree.
Link: https://lkml.kernel.org/r/20231027033845.90608-9-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In some cases, nested locks may be needed, so {mtree,mas}_lock_nested is
introduced. For example, when duplicating maple tree, we need to hold the
locks of two trees, in which case nested locks are needed.
At the same time, add the definition of spin_lock_nested() in tools for
testing.
Link: https://lkml.kernel.org/r/20231027033845.90608-3-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"As usual, lots of singleton and doubleton patches all over the tree
and there's little I can say which isn't in the individual changelogs.
The lengthier patch series are
- 'kdump: use generic functions to simplify crashkernel reservation
in arch', from Baoquan He. This is mainly cleanups and
consolidation of the 'crashkernel=' kernel parameter handling
- After much discussion, David Laight's 'minmax: Relax type checks in
min() and max()' is here. Hopefully reduces some typecasting and
the use of min_t() and max_t()
- A group of patches from Oleg Nesterov which clean up and slightly
fix our handling of reads from /proc/PID/task/... and which remove
task_struct.thread_group"
* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
scripts/gdb/vmalloc: disable on no-MMU
scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
.mailmap: add address mapping for Tomeu Vizoso
mailmap: update email address for Claudiu Beznea
tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
.mailmap: map Benjamin Poirier's address
scripts/gdb: add lx_current support for riscv
ocfs2: fix a spelling typo in comment
proc: test ProtectionKey in proc-empty-vm test
proc: fix proc-empty-vm test with vsyscall
fs/proc/base.c: remove unneeded semicolon
do_io_accounting: use sig->stats_lock
do_io_accounting: use __for_each_thread()
ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
ocfs2: fix a typo in a comment
scripts/show_delta: add __main__ judgement before main code
treewide: mark stuff as __ro_after_init
fs: ocfs2: check status values
proc: test /proc/${pid}/statm
compiler.h: move __is_constexpr() to compiler.h
...
|
|
Prior to f747e6667ebb2 __is_constexpr() was in its only user minmax.h.
That commit moved it to const.h - but that file just defines ULL(x) and
UL(x) so that constants can be defined for .S and .c files.
So apart from the word 'const' it wasn't really a good location. Instead
move the definition to compiler.h just before the similar
is_signed_type() and is_unsigned_type().
This may not be a good long-term home, but the three definitions belong
together.
Link: https://lkml.kernel.org/r/2a6680bbe2e84459816a113730426782@AcuMS.aculab.com
Signed-off-by: David Laight <david.laight@aculab.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Users complained about OOM errors during fork without triggering
compaction. This can be fixed by modifying the flags used in
mas_expected_entries() so that the compaction will be triggered in low
memory situations. Since mas_expected_entries() is only used during fork,
the extra argument does not need to be passed through.
Additionally, the two test_maple_tree test cases and one benchmark test
were altered to use the correct locking type so that allocations would not
trigger sleeping and thus fail. Testing was completed with lockdep atomic
sleep detection.
The additional locking change requires rwsem support additions to the
tools/ directory through the use of pthreads pthread_rwlock_t. With this
change test_maple_tree works in userspace, as a module, and in-kernel.
Users may notice that the system gave up early on attempting to start new
processes instead of attempting to reclaim memory.
Link: https://lkml.kernel.org/r/20230915093243epcms1p46fa00bbac1ab7b7dca94acb66c44c456@epcms1p4
Link: https://lkml.kernel.org/r/20231012155233.2272446-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <jason.sim@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and bpf.
Current release - regressions:
- bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE
- netfilter: fix entries val in rule reset audit log
- eth: stmmac: fix incorrect rxq|txq_stats reference
Previous releases - regressions:
- ipv4: fix null-deref in ipv4_link_failure
- netfilter:
- fix several GC related issues
- fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
- eth: team: fix null-ptr-deref when team device type is changed
- eth: i40e: fix VF VLAN offloading when port VLAN is configured
- eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB
Previous releases - always broken:
- core: fix ETH_P_1588 flow dissector
- mptcp: fix several connection hang-up conditions
- bpf:
- avoid deadlock when using queue and stack maps from NMI
- add override check to kprobe multi link attach
- hsr: properly parse HSRv1 supervisor frames.
- eth: igc: fix infinite initialization loop with early XDP redirect
- eth: octeon_ep: fix tx dma unmap len values in SG
- eth: hns3: fix GRE checksum offload issue"
* tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()
igc: Expose tx-usecs coalesce setting to user
octeontx2-pf: Do xdp_do_flush() after redirects.
bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
net: ena: Flush XDP packets on error.
net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()
net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev'
netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
netfilter: nf_tables: fix memleak when more than 255 elements expired
netfilter: nf_tables: disable toggling dormant table state more than once
vxlan: Add missing entries to vxlan_get_size()
net: rds: Fix possible NULL-pointer dereference
team: fix null-ptr-deref when team device type is changed
net: bridge: use DEV_STATS_INC()
net: hns3: add 5ms delay before clear firmware reset irq source
net: hns3: fix fail to delete tc flower rules during reset issue
net: hns3: only enable unicast promisc when mac table full
net: hns3: fix GRE checksum offload issue
net: hns3: add cmdq check for vf periodic service task
net: stmmac: fix incorrect rxq|txq_stats reference
...
|
|
Marcus and Satya reported an issue where BTF_ID macro generates same
symbol in separate objects and that breaks final vmlinux link.
ld.lld: error: ld-temp.o <inline asm>:14577:1: symbol
'__BTF_ID__struct__cgroup__624' is already defined
This can be triggered under specific configs when __COUNTER__ happens to
be the same for the same symbol in two different translation units,
which is already quite unlikely to happen.
Add __LINE__ number suffix to make BTF_ID symbol more unique, which is
not a complete fix, but it would help for now and meanwhile we can work
on better solution as suggested by Andrii.
Cc: stable@vger.kernel.org
Reported-by: Satya Durga Srinivasu Prabhala <quic_satyap@quicinc.com>
Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com>
Closes: https://github.com/ClangBuiltLinux/linux/issues/1913
Debugged-by: Nathan Chancellor <nathan@kernel.org>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4Bzb5KQ2_LmhN769ifMeSJaWfebccUasQOfQKaOd0nQ51tw@mail.gmail.com/
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230915-bpf_collision-v3-2-263fc519c21f@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Building memblock tests produces the following warning:
cc -I. -I../../include -Wall -O2 -fsanitize=address -fsanitize=undefined -D CONFIG_PHYS_ADDR_T_64BIT -c -o main.o main.c
In file included from tests/common.h:9,
from tests/basic_api.h:5,
from main.c:2:
./linux/memblock.h:601:50: warning: ‘struct seq_file’ declared inside parameter list will not be visible outside of this definition or declaration
601 | static inline void memtest_report_meminfo(struct seq_file *m) { }
| ^~~~~~~~
Add declaration of 'struct seq_file' to tools/include/linux/seq_file.h
to fix it.
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
|
Building memblock tests produces the following warning:
cc -I. -I../../include -Wall -O2 -fsanitize=address -fsanitize=undefined -D CONFIG_PHYS_ADDR_T_64BIT -c -o main.o main.c
In file included from ../../include/linux/pfn.h:5,
from ./linux/memory_hotplug.h:6,
from ./linux/init.h:7,
from ./linux/memblock.h:11,
from tests/common.h:8,
from tests/basic_api.h:5,
from main.c:2:
../../include/linux/mm.h:14: warning: "__ALIGN_KERNEL" redefined
14 | #define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
|
In file included from ../../include/linux/mm.h:6,
from ../../include/linux/pfn.h:5,
from ./linux/memory_hotplug.h:6,
from ./linux/init.h:7,
from ./linux/memblock.h:11,
from tests/common.h:8,
from tests/basic_api.h:5,
from main.c:2:
../../include/uapi/linux/const.h:31: note: this is the location of the previous definition
31 | #define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)
|
Remove definitions of __ALIGN_KERNEL and __ALIGN_KERNEL_MASK from
tools/include/linux/mm.h to fix it.
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
|
This patch fix the follow errors.
commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") pass nid
parameter to reserve_bootmem_region(),
$ make -C tools/testing/memblock/
...
memblock.c: In function ‘memmap_init_reserved_pages’:
memblock.c:2111:25: error: too many arguments to function ‘reserve_bootmem_region’
2111 | reserve_bootmem_region(start, end, nid);
| ^~~~~~~~~~~~~~~~~~~~~~
../../include/linux/mm.h:32:6: note: declared here
32 | void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
| ^~~~~~~~~~~~~~~~~~~~~~
memblock.c:2122:17: error: too many arguments to function ‘reserve_bootmem_region’
2122 | reserve_bootmem_region(start, end, nid);
| ^~~~~~~~~~~~~~~~~~~~~~
commit dcdfdd40fa82 ("mm: Add support for unaccepted memory") call
accept_memory() in memblock.c
$ make -C tools/testing/memblock/
...
cc -fsanitize=address -fsanitize=undefined main.o memblock.o \
lib/slab.o mmzone.o slab.o tests/alloc_nid_api.o \
tests/alloc_helpers_api.o tests/alloc_api.o tests/basic_api.o \
tests/common.o tests/alloc_exact_nid_api.o -o main
/usr/bin/ld: memblock.o: in function `memblock_alloc_range_nid':
memblock.c:(.text+0x7ae4): undefined reference to `accept_memory'
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
Link: https://lore.kernel.org/r/tencent_6F19BC082167F15DF2A8D8BEFE8EF220F60A@qq.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
|
We don't have definitions of __always_unused or __noreturn in the tools
version of compiler.h, add them so we can use them in kselftests.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230728-arm64-signal-memcpy-fix-v4-3-0c1290db5d46@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Port over the definition of OPTIMIZER_HIDE_VAR() so we can use it in
kselftests.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230728-arm64-signal-memcpy-fix-v4-2-0c1290db5d46@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Copy the kernel version of the header to fix the header diff build
warning. Some new definitions were only added to the tools side header,
but these are only used in Perf so move them to a different header.
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Link: https://lore.kernel.org/r/20230522102604.1081416-1-james.clark@arm.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: acme@kernel.org
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: siyanteng@loongson.cn
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"Third version of perf tool updates, with the build problems with with
using a 'vmlinux.h' generated from the main build fixed, and the bpf
skeleton build disabled by default.
Build:
- Require libtraceevent to build, one can disable it using
NO_LIBTRACEEVENT=1.
It is required for tools like 'perf sched', 'perf kvm', 'perf
trace', etc.
libtraceevent is available in most distros so installing
'libtraceevent-devel' should be a one-time event to continue
building perf as usual.
Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
sufficient for lots of users not interested in those libtraceevent
dependent features.
- Allow Python support in 'perf script' when libtraceevent isn't
linked, as not all features requires it, for instance Intel PT does
not use tracepoints.
- Error if the python interpreter needed for jevents to work isn't
available and NO_JEVENTS=1 isn't set, preventing a build without
support for JSON vendor events, which is a rare but possible
condition. The two check error messages:
$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)
- Make libbpf 1.0 the minimum required when building with out of
tree, distro provided libbpf.
- Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
demangler, add 'perf test' entry for it.
- Make binutils libraries opt in, as distros disable building with it
due to licensing, they were used for C++ demangling, for instance.
- Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
equivalent) isn't installed, we'll just have a build warning:
Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
- Add a feature test for scandirat(), that is not implemented so far
in musl and uclibc, disabling features that need it, such as
scanning for tracepoints in /sys/kernel/tracing/events.
perf BPF filters:
- New feature where BPF can be used to filter samples, for instance:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
- In addition to 'period' (PERF_SAMPLE_PERIOD), the other
PERF_SAMPLE_ can be used for filtering, and also some other sample
accessible values, from tools/perf/Documentation/perf-record.txt:
Essentially the BPF filter expression is:
<term> <operator> <value> (("," | "||") <term> <operator> <value>)*
The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops
The <operator> can be one of:
==, !=, >, >=, <, <=, &
The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_locked)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
perf lock contention:
- Show lock type with address.
- Track and show mmap_lock, siglock and per-cpu rq_lock with address.
This is done for mmap_lock by following the current->mm pointer:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80
- Update default map size to 16384.
- Allocate single letter option -M for --map-nr-entries, as it is
proving being frequently used.
- Fix struct rq lock access for older kernels with BPF's CO-RE
(Compile once, run everywhere).
- Fix problems found with MSAn.
perf report/top:
- Add inline information when using --call-graph=fp or lbr, as was
already done to the --call-graph=dwarf callchain mode.
- Improve the 'srcfile' sort key performance by really using an
optimization introduced in 6.2 for the 'srcline' sort key that
avoids calling addr2line for comparision with each sample.
perf sched:
- Make 'perf sched latency/map/replay' to use "sched:sched_waking"
instead of "sched:sched_waking", consistent with 'perf record'
since d566a9c2d482 ("perf sched: Prefer sched_waking event when it
exists").
perf ftrace:
- Make system wide the default target for latency subcommand, run the
following command then generate some network traffic and press
control+C:
# perf ftrace latency -T __kfree_skb
^C
DURATION | COUNT | GRAPH |
0 - 1 us | 27 | ############# |
1 - 2 us | 22 | ########### |
2 - 4 us | 8 | #### |
4 - 8 us | 5 | ## |
8 - 16 us | 24 | ############ |
16 - 32 us | 2 | # |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
#
perf top:
- Add --branch-history (LBR: Last Branch Record) option, just like
already available for 'perf record'.
- Fix segfault in thread__comm_len() where thread->comm was being
used outside thread->comm_lock.
perf annotate:
- Allow configuring objdump and addr2line in ~/.perfconfig., so that
you can use alternative binaries, such as llvm's.
perf kvm:
- Add TUI mode for 'perf kvm stat report'.
Reference counting:
- Add reference count checking infrastructure to check for use after
free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
more to come.
To build with it use -DREFCNT_CHECKING=1 in the make command line
to build tools/perf. Documented at:
https://perf.wiki.kernel.org/index.php/Reference_Count_Checking
- The above caught, for instance, fix, present in this series:
- Fix maps use after put in 'perf test "Share thread maps"':
'maps' is copied from leader, but the leader is put on line 79
and then 'maps' is used to read the reference count below - so
a use after put, with the put of maps happening within
thread__put.
Fixed by reversing the order of puts so that the leader is put
last.
- Also several fixes were made to places where reference counts were
not being held.
- Make this one of the tests in 'make -C tools/perf build-test' to
regularly build test it and to make sure no direct access to the
reference counted structs are made, doing that via accessors to
check the validity of the struct pointer.
ARM64:
- Fix 'perf report' segfault when filtering coresight traces by
sparse lists of CPUs.
- Add support for 'simd' as a sort field for 'perf report', to show
ARM's NEON SIMD's predicate flags: "partial" and "empty".
arm64 vendor events:
- Add N1 metrics.
Intel vendor events:
- Add graniterapids, grandridge and sierraforrest events.
- Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
silvermont, skylake, tigerlake and westmereep-dp
- Refresh metrics for alderlake-n, broadwell, broadwellde,
broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
skylakex.
perf stat:
- Implement --topdown using JSON metrics.
- Add TopdownL1 JSON metric as a default if present, but disable it
for now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.
- Use metrics for --smi-cost.
- Update topdown documentation.
Vendor events (JSON) infrastructure:
- Add support for computing and printing metric threshold values. For
instance, here is one found in thesapphirerapids json file:
{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
- Test parsing metric thresholds with the fake PMU in 'perf test
pmu-events'.
- Support for printing metric thresholds in 'perf list'.
- Add --metric-no-threshold option to 'perf stat'.
- Add rand (reverse and) and has_pmem (optane memory) support to
metrics.
- Sort list of input files to avoid depending on the order from
readdir() helping in obtaining reproducible builds.
S/390:
- Add common metrics: - CPI (cycles per instruction), prbstate (ratio
of instructions executed in problem state compared to total number
of instructions), l1mp (Level one instruction and data cache misses
per 100 instructions).
- Add cache metrics for z13, z14, z15 and z16.
- Add metric for TLB and cache.
ARM:
- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
(Memory Tagging Extension) and MOPS (Memory Operations) load/store.
Intel PT hardware tracing:
- Add event type names UINTR (User interrupt delivered) and UIRET
(Exiting from user interrupt routine), documented in table 32-50
"CFE Packet Type and Vector Fields Details" in the Intel Processor
Trace chapter of The Intel SDM Volume 3 version 078.
- Add support for new branch instructions ERETS and ERETU.
- Fix CYC timestamps after standalone CBR
ARM CoreSight hardware tracing:
- Allow user to override timestamp and contextid settings.
- Fix segfault in dso lookup.
- Fix timeless decode mode detection.
- Add separate decode paths for timeless and per-thread modes.
auxtrace:
- Fix address filter entire kernel size.
Miscellaneous:
- Fix use-after-free and unaligned bugs in the PLT handling routines.
- Use zfree() to reduce chances of use after free.
- Add missing 0x prefix for addresses printed in hexadecimal in 'perf
probe'.
- Suppress massive unsupported target platform errors in the unwind
code.
- Fix return incorrect build_id size in elf_read_build_id().
- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .
- Add missing new parameter in kfree_skb tracepoint to the python
scripts using it.
- Add 'perf bench syscall fork' benchmark.
- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
'perf mem'.
- Fix wrong size expectation for perf test 'Setup struct
perf_event_attr' caused by the patch adding
perf_event_attr::config3.
- Fix some spelling mistakes"
* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
Revert "perf build: Warn for BPF skeletons if endian mismatches"
perf metrics: Fix SEGV with --for-each-cgroup
perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
perf stat: Separate bperf from bpf_profiler
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf tracepoint: Fix memory leak in is_valid_tracepoint()
perf cs-etm: Add fix for coresight trace for any range of CPUs
perf build: Fix unescaped # in perf build-test
perf unwind: Suppress massive unsupported target platform errors
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
perf script: Print raw ip instead of binary offset for callchain
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf list: Modify the warning message about scandirat(3)
perf list: Fix memory leaks in print_tracepoint_events()
perf lock contention: Rework offset calculation with BPF CO-RE
perf lock contention: Fix struct rq lock access
perf stat: Disable TopdownL1 on hybrid
perf stat: Avoid SEGV on counter->name
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures &
drivers that did this inconsistently follow this new, common
convention, and fix all the fallout that objtool can now detect
statically
- Fix/improve the ORC unwinder becoming unreliable due to
UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
and UNWIND_HINT_UNDEFINED to resolve it
- Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code
- Generate ORC data for __pfx code
- Add more __noreturn annotations to various kernel startup/shutdown
and panic functions
- Misc improvements & fixes
* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
x86/hyperv: Mark hv_ghcb_terminate() as noreturn
scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
btrfs: Mark btrfs_assertfail() __noreturn
objtool: Include weak functions in global_noreturns check
cpu: Mark nmi_panic_self_stop() __noreturn
cpu: Mark panic_smp_self_stop() __noreturn
arm64/cpu: Mark cpu_park_loop() and friends __noreturn
x86/head: Mark *_start_kernel() __noreturn
init: Mark start_kernel() __noreturn
init: Mark [arch_call_]rest_init() __noreturn
objtool: Generate ORC data for __pfx code
x86/linkage: Fix padding for typed functions
objtool: Separate prefix code from stack validation code
objtool: Remove superfluous dead_end_function() check
objtool: Add symbol iteration helpers
objtool: Add WARN_INSN()
scripts/objdump-func: Support multiple functions
context_tracking: Fix KCSAN noinstr violation
objtool: Add stackleak instrumentation to uaccess safe list
...
|
|
Pull virtio updates from Michael Tsirkin:
"virtio,vhost,vdpa: features, fixes, and cleanups:
- reduction in interrupt rate in virtio
- perf improvement for VDUSE
- scalability for vhost-scsi
- non power of 2 ring support for packed rings
- better management for mlx5 vdpa
- suspend for snet
- VIRTIO_F_NOTIFICATION_DATA
- shared backend with vdpa-sim-blk
- user VA support in vdpa-sim
- better struct packing for virtio
and fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (52 commits)
vhost_vdpa: fix unmap process in no-batch mode
MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS
tools/virtio: fix build caused by virtio_ring changes
virtio_ring: add a struct device forward declaration
vdpa_sim_blk: support shared backend
vdpa_sim: move buffer allocation in the devices
vdpa/snet: use likely/unlikely macros in hot functions
vdpa/snet: implement kick_vq_with_data callback
virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support
virtio: add VIRTIO_F_NOTIFICATION_DATA feature support
vdpa/snet: support the suspend vDPA callback
vdpa/snet: support getting and setting VQ state
MAINTAINERS: add vringh.h to Virtio Core and Net Drivers
vringh: address kdoc warnings
vdpa: address kdoc warnings
virtio_ring: don't update event idx on get_buf
vdpa_sim: add support for user VA
vdpa_sim: replace the spinlock with a mutex to protect the state
vdpa_sim: use kthread worker
vdpa_sim: make devices agnostic for work management
...
|
|
Pull documentation updates from Jonathan Corbet:
"Commit volume in documentation is relatively low this time, but there
is still a fair amount going on, including:
- Reorganize the architecture-specific documentation under
Documentation/arch
This makes the structure match the source directory and helps to
clean up the mess that is the top-level Documentation directory a
bit. This work creates the new directory and moves x86 and most of
the less-active architectures there.
The current plan is to move the rest of the architectures in 6.5,
with the patches going through the appropriate subsystem trees.
- Some more Spanish translations and maintenance of the Italian
translation
- A new "Kernel contribution maturity model" document from Ted
- A new tutorial on quickly building a trimmed kernel from Thorsten
Plus the usual set of updates and fixes"
* tag 'docs-6.4' of git://git.lwn.net/linux: (47 commits)
media: Adjust column width for pdfdocs
media: Fix building pdfdocs
docs: clk: add documentation to log which clocks have been disabled
docs: trace: Fix typo in ftrace.rst
Documentation/process: always CC responsible lists
docs: kmemleak: adjust to config renaming
ELF: document some de-facto PT_* ABI quirks
Documentation: arm: remove stih415/stih416 related entries
docs: turn off "smart quotes" in the HTML build
Documentation: firmware: Clarify firmware path usage
docs/mm: Physical Memory: Fix grammar
Documentation: Add document for false sharing
dma-api-howto: typo fix
docs: move m68k architecture documentation under Documentation/arch/
docs: move parisc documentation under Documentation/arch/
docs: move ia64 architecture docs under Documentation/arch/
docs: Move arc architecture docs under Documentation/arch/
docs: move nios2 documentation under Documentation/arch/
docs: move openrisc documentation under Documentation/arch/
docs: move superh documentation under Documentation/arch/
...
|
|
Fix the build dependency for virtio_test. The virtio_ring that is used from
the test requires container_of_const(). Change to use container_of.h kernel
header directly and adapt related codes.
Signed-off-by: Shunsuke Mie <mie@igel.co.jp>
Message-Id: <20230417022037.917668-2-mie@igel.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Rename the fallthrough attribute to better align with the kernel
version. Copy the definition from include/linux/compiler_attributes.h
including the #else clause. Adding the #else clause allows the tools
compiler.h header to drop the check for a definition entirely and keeps
both definitions together.
Change any __fallthrough statements to fallthrough anywhere it was used
within perf.
This allows other tools to use the same key word as the kernel.
Committer notes:
Did some missing conversions to:
builtin-list.c
Also included gtk.h before the 'fallthrough' definition in:
tools/perf/ui/gtk/hists.c
tools/perf/ui/gtk/helpline.c
tools/perf/ui/gtk/browser.c
As it is the arg name for a macro in glib.h:
/var/home/acme/git/perf-tools-next/tools/include/linux/compiler-gcc.h:16:55: error: missing binary operator before token "("
16 | # define fallthrough __attribute__((__fallthrough__))
| ^
/usr/include/glib-2.0/glib/gmacros.h:637:28: note: in expansion of macro ‘fallthrough’
637 | #if g_macro__has_attribute(fallthrough)
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Tom Rix <trix@redhat.com>
Cc: linux-sparse@vger.kernel.org <linux-sparse@vger.kernel.org>
Cc: llvm@lists.linux.dev <llvm@lists.linux.dev>
Link: https://lore.kernel.org/r/20221125154947.2163498-1-Liam.Howlett@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When using dynamically assigned CoreSight trace IDs the drivers can output
the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet.
Update cs-etm decoder to handle this packet by setting the CPU/Trace ID
mapping.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Trace IDs are now dynamically allocated.
Previously used the static association algorithm that is no longer
used. The 'cpu * 2 + seed' was outdated and broken for systems with high
core counts (>46). as it did not scale and was broken for larger
core counts.
Trace ID will now be sent in PERF_RECORD_AUX_OUTPUT_HW_ID record.
Legacy ID algorithm renamed and retained for limited backward
compatibility use.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The information to associate Trace ID and CPU will be changing.
Drivers will start outputting this as a hardware ID packet in the data
file which if present will be used in preference to the AUXINFO values.
To prepare for this we provide a helper functions to do the individual ID
mapping, and one to extract the IDs from the completed metadata blocks.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Move the x86 documentation under Documentation/arch/ as a way of cleaning
up the top-level directory and making the structure of our docs more
closely match the structure of the source directories it describes.
All in-kernel references to the old paths have been updated.
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Mark reported that the ORC unwinder incorrectly marks an unwind as
reliable when the unwind terminates prematurely in the dark corners of
return_to_handler() due to lack of information about the next frame.
The problem is UNWIND_HINT_EMPTY is used in two different situations:
1) The end of the kernel stack unwind before hitting user entry, boot
code, or fork entry
2) A blind spot in ORC coverage where the unwinder has to bail due to
lack of information about the next frame
The ORC unwinder has no way to tell the difference between the two.
When it encounters an undefined stack state with 'end=1', it blindly
marks the stack reliable, which can break the livepatch consistency
model.
Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and
UNWIND_HINT_END_OF_STACK.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
|
|
The ENTRY unwind hint type is serving double duty as both an empty
unwind hint and an unret validation annotation.
Unret validation is unrelated to unwinding. Separate it out into its own
annotation.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ff7448d492ea21b86d8a90264b105fbd0d751077.1677683419.git.jpoimboe@kernel.org
|
|
Unwind hints and ORC entry types are two distinct things. Separate them
out more explicitly.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/cc879d38fff8a43f8f7beb2fd56e35a5a384d7cd.1677683419.git.jpoimboe@kernel.org
|
|
Reduce the amount of header sync churn by splitting the shared objtool.h
types into a new file.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/dec622720851210ceafa12d4f4c5f9e73c832152.1677683419.git.jpoimboe@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Add Adrian Hunter to MAINTAINERS as a perf tools reviewer
- Sync various tools/ copies of kernel headers with the kernel sources,
this time trying to avoid first merging with upstream to then update
but instead copy from upstream so that a merge is avoided and the end
result after merging this pull request is the one expected,
tools/perf/check-headers.sh (mostly) happy, less warnings while
building tools/perf/
- Fix counting when initial delay configured by setting
perf_attr.enable_on_exec when starting workloads from the perf
command line
- Don't avoid emitting a PERF_RECORD_MMAP2 in 'perf inject
--buildid-all' when that record comes with a build-id, otherwise we
end up not being able to resolve symbols
- Don't use comma as the CSV output separator the "stat+csv_output"
test, as comma can appear on some tests as a modifier for an event,
use @ instead, ditto for the JSON linter test
- The offcpu test was looking for some bits being set on
task_struct->prev_state without masking other bits not important for
this specific 'perf test', fix it
* tag 'perf-tools-fixes-for-v6.3-1-2023-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf tools: Add Adrian Hunter to MAINTAINERS as a reviewer
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
tools headers x86 cpufeatures: Sync with the kernel sources
tools include UAPI: Sync linux/vhost.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources
tools include UAPI: Synchronize linux/fcntl.h with the kernel sources
tools headers: Synchronize {linux,vdso}/bits.h with the kernel sources
tools headers UAPI: Sync linux/prctl.h with the kernel sources
tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
perf stat: Fix counting when initial delay configured
tools headers svm: Sync svm headers with the kernel sources
perf test: Avoid counting commas in json linter
perf tests stat+csv_output: Switch CSV separator to @
perf inject: Fix --buildid-all not to eat up MMAP2
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf test: Fix offcpu test prev_state check
|
|
To pick up the changes in this cset:
cbdb1f163af2bb90 ("vdso/bits.h: Add BIT_ULL() for the sake of consistency")
That just causes perf to rebuild, the macro included doesn't clash with
anything in tools/{perf,objtool,bpf}.
This addresses this perf build warning:
Warning: Kernel ABI header at 'tools/include/linux/bits.h' differs from latest version at 'include/linux/bits.h'
diff -u tools/include/linux/bits.h include/linux/bits.h
Warning: Kernel ABI header at 'tools/include/vdso/bits.h' differs from latest version at 'include/vdso/bits.h'
diff -u tools/include/vdso/bits.h include/vdso/bits.h
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Pick up dependencies - freshly merged upstream via xen-next - before applying
dependent objtool changes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add a 'signal' field which allows unwind hints to specify whether the
instruction pointer should be taken literally (like for most interrupts
and exceptions) rather than decremented (like for call stack return
addresses) when used to find the next ORC entry.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org
|
|
To pick up the changes in:
07a368b3f55a79d3 ("bug: introduce ASSERT_STRUCT_OFFSET")
This cset only introduces a build time assert macro, that may be useful
at some point for tooling, for now it silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/linux/build_bug.h' differs from latest version at 'include/linux/build_bug.h'
diff -u tools/include/linux/build_bug.h include/linux/build_bug.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/Y8f0jqQFYDAOBkHx@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on (see merge
commit 382b5b87a97d: "Fix a number of issues with MTE, such as
races on the tags being initialised vs the PG_mte_tagged flag as
well as the lack of support for VM_SHARED when KVM is involved.
Patches from Catalin Marinas and Peter Collingbourne").
- Merge the pKVM shadow vcpu state tracking that allows the
hypervisor to have its own view of a vcpu, keeping that state
private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB
pages only) as a prefix of the oncoming support for 4kB and 16kB
pages.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
s390:
- Second batch of the lazy destroy patches
- First batch of KVM changes for kernel virtual != physical address
support
- Removal of a unused function
x86:
- Allow compiling out SMM support
- Cleanup and documentation of SMM state save area format
- Preserve interrupt shadow in SMM state save area
- Respond to generic signals during slow page faults
- Fixes and optimizations for the non-executable huge page errata
fix.
- Reprogram all performance counters on PMU filter change
- Cleanups to Hyper-V emulation and tests
- Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
guest running on top of a L1 Hyper-V hypervisor)
- Advertise several new Intel features
- x86 Xen-for-KVM:
- Allow the Xen runstate information to cross a page boundary
- Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
- Add support for 32-bit guests in SCHEDOP_poll
- Notable x86 fixes and cleanups:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
a few years back when eliminating unnecessary barriers when
switching between vmcs01 and vmcs02.
- Clean up vmread_error_trampoline() to make it more obvious that
params must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL
irrespective of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM
incorrectly thinking a guest needs TSC scaling when running on a
CPU with a constant TSC, but no hardware-enumerated TSC
frequency.
- Advertise (on AMD) that the SMM_CTL MSR is not supported
- Remove unnecessary exports
Generic:
- Support for responding to signals during page faults; introduces
new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
Selftests:
- Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
- Fix build errors that occur in certain setups (unsure exactly what
is unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
- Introduce actual atomics for clear/set_bit() in selftests
- Add support for pinning vCPUs in dirty_log_perf_test.
- Rename the so called "perf_util" framework to "memstress".
- Add a lightweight psuedo RNG for guest use, and use it to randomize
the access pattern and write vs. read percentage in the memstress
tests.
- Add a common ucall implementation; code dedup and pre-work for
running SEV (and beyond) guests in selftests.
- Provide a common constructor and arch hook, which will eventually
be used by x86 to automatically select the right hypercall (AMD vs.
Intel).
- A bunch of added/enabled/fixed selftests for ARM64, covering
memslots, breakpoints, stage-2 faults and access tracking.
- x86-specific selftest changes:
- Clean up x86's page table management.
- Clean up and enhance the "smaller maxphyaddr" test, and add a
related test to cover generic emulation failure.
- Clean up the nEPT support checks.
- Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
- Fix an ordering issue in the AMX test introduced by recent
conversions to use kvm_cpu_has(), and harden the code to guard
against similar bugs in the future. Anything that tiggers
caching of KVM's supported CPUID, kvm_cpu_has() in this case,
effectively hides opt-in XSAVE features if the caching occurs
before the test opts in via prctl().
Documentation:
- Remove deleted ioctls from documentation
- Clean up the docs for the x86 MSR filter.
- Various fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
KVM: x86: Add proper ReST tables for userspace MSR exits/flags
KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
KVM: arm64: selftests: Align VA space allocator with TTBR0
KVM: arm64: Fix benign bug with incorrect use of VA_BITS
KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
KVM: x86: Advertise that the SMM_CTL MSR is not supported
KVM: x86: remove unnecessary exports
KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
tools: KVM: selftests: Convert clear/set_bit() to actual atomics
tools: Drop "atomic_" prefix from atomic test_and_set_bit()
tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
perf tools: Use dedicated non-atomic clear/set bit helpers
tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
KVM: arm64: selftests: Enable single-step without a "full" ucall()
KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
KVM: Remove stale comment about KVM_REQ_UNHALT
KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
KVM: Reference to kvm_userspace_memory_region in doc and comments
KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
...
|
|
x86 Xen-for-KVM:
* Allow the Xen runstate information to cross a page boundary
* Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
* add support for 32-bit guests in SCHEDOP_poll
x86 fixes:
* One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
* Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
* Clean up the MSR filter docs.
* Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
* Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
* Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
* Advertise (on AMD) that the SMM_CTL MSR is not supported
* Remove unnecessary exports
Selftests:
* Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
* Fix an ordering issue in the AMX test introduced by recent conversions
to use kvm_cpu_has(), and harden the code to guard against similar bugs
in the future. Anything that tiggers caching of KVM's supported CPUID,
kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
the caching occurs before the test opts in via prctl().
* Fix build errors that occur in certain setups (unsure exactly what is
unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
* Introduce actual atomics for clear/set_bit() in selftests
Documentation:
* Remove deleted ioctls from documentation
* Various fixes
|
|
Drop tools' non-atomic test_and_set_bit() and test_and_clear_bit() helpers
now that all users are gone. The names will be claimed in the future for
atomic versions.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Copy bitfield.h from include/linux/bitfield.h. A subsequent change will
make use of some FIELD_{GET,PREP} macros defined in this header.
The header was copied as-is, no changes needed.
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-6-ricarkol@google.com
|
|
The current find_{symbol,func}_containing() functions are broken in
the face of overlapping symbols, exactly the case that is needed for a
new ibt/endbr supression.
Import interval_tree_generic.h into the tools tree and convert the
symbol tree to an interval tree to support proper range stabs.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111146.330203761@infradead.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
|
|
Pull bitmap updates from Yury Norov:
- Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)
- cleanup nr_cpu_ids vs nr_cpumask_bits mess (me)
This series cleans that mess and adds new config FORCE_NR_CPUS that
allows to optimize cpumask subsystem if the number of CPUs is known
at compile-time.
- optimize find_bit() functions (me)
Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT()
macros.
- add find_nth_bit() (me)
Adds find_nth_bit(), which is ~70 times faster than bitcounting with
for_each() loop:
for_each_set_bit(bit, mask, size)
if (n-- == 0)
return bit;
Also adds bitmap_weight_and() to let people replace this pattern:
tmp = bitmap_alloc(nbits);
bitmap_and(tmp, map1, map2, nbits);
weight = bitmap_weight(tmp, nbits);
bitmap_free(tmp);
with a single bitmap_weight_and() call.
- repair cpumask_check() (me)
After switching cpumask to use nr_cpu_ids, cpumask_check() started
generating many false-positive warnings. This series fixes it.
- Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin
Schneider)
Extends the API with one more function and applies it in sched/core.
* tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits)
sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot()
lib/test_cpumask: Add for_each_cpu_and(not) tests
cpumask: Introduce for_each_cpu_andnot()
lib/find_bit: Introduce find_next_andnot_bit()
cpumask: fix checking valid cpu range
lib/bitmap: add tests for for_each() loops
lib/find: optimize for_each() macros
lib/bitmap: introduce for_each_set_bit_wrap() macro
lib/find_bit: add find_next{,_and}_bit_wrap
cpumask: switch for_each_cpu{,_not} to use for_each_bit()
net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}
cpumask: add cpumask_nth_{,and,andnot}
lib/bitmap: remove bitmap_ord_to_pos
lib/bitmap: add tests for find_nth_bit()
lib: add find_nth{,_and,_andnot}_bit()
lib/bitmap: add bitmap_weight_and()
lib/bitmap: don't call __bitmap_weight() in kernel code
tools: sync find_bit() implementation
lib/find_bit: optimize find_next_bit() functions
lib/find_bit: create find_first_zero_bit_le()
...
|
|
Pull Rust introductory support from Kees Cook:
"The tree has a recent base, but has fundamentally been in linux-next
for a year and a half[1]. It's been updated based on feedback from the
Kernel Maintainer's Summit, and to gain recent Reviewed-by: tags.
Miguel is the primary maintainer, with me helping where needed/wanted.
Our plan is for the tree to switch to the standard non-rebasing
practice once this initial infrastructure series lands.
The contents are the absolute minimum to get Rust code building in the
kernel, with many more interfaces[2] (and drivers - NVMe[3], 9p[4], M1
GPU[5]) on the way.
The initial support of Rust-for-Linux comes in roughly 4 areas:
- Kernel internals (kallsyms expansion for Rust symbols, %pA format)
- Kbuild infrastructure (Rust build rules and support scripts)
- Rust crates and bindings for initial minimum viable build
- Rust kernel documentation and samples
Rust support has been in linux-next for a year and a half now, and the
short log doesn't do justice to the number of people who have
contributed both to the Linux kernel side but also to the upstream
Rust side to support the kernel's needs. Thanks to these 173 people,
and many more, who have been involved in all kinds of ways:
Miguel Ojeda, Wedson Almeida Filho, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Adam Bratschi-Kaye, Benno Lossin,
Maciej Falkowski, Finn Behrens, Sven Van Asbroeck, Asahi Lina, FUJITA
Tomonori, John Baublitz, Wei Liu, Geoffrey Thomas, Philip Herron,
Arthur Cohen, David Faust, Antoni Boucher, Philip Li, Yujie Liu,
Jonathan Corbet, Greg Kroah-Hartman, Paul E. McKenney, Josh Triplett,
Kent Overstreet, David Gow, Alice Ryhl, Robin Randhawa, Kees Cook,
Nick Desaulniers, Matthew Wilcox, Linus Walleij, Joe Perches, Michael
Ellerman, Petr Mladek, Masahiro Yamada, Arnaldo Carvalho de Melo,
Andrii Nakryiko, Konstantin Shelekhin, Rasmus Villemoes, Konstantin
Ryabitsev, Stephen Rothwell, Andy Shevchenko, Sergey Senozhatsky, John
Paul Adrian Glaubitz, David Laight, Nathan Chancellor, Jonathan
Cameron, Daniel Latypov, Shuah Khan, Brendan Higgins, Julia Lawall,
Laurent Pinchart, Geert Uytterhoeven, Akira Yokosawa, Pavel Machek,
David S. Miller, John Hawley, James Bottomley, Arnd Bergmann,
Christian Brauner, Dan Robertson, Nicholas Piggin, Zhouyi Zhou, Elena
Zannoni, Jose E. Marchesi, Leon Romanovsky, Will Deacon, Richard
Weinberger, Randy Dunlap, Paolo Bonzini, Roland Dreier, Mark Brown,
Sasha Levin, Ted Ts'o, Steven Rostedt, Jarkko Sakkinen, Michal
Kubecek, Marco Elver, Al Viro, Keith Busch, Johannes Berg, Jan Kara,
David Sterba, Connor Kuehl, Andy Lutomirski, Andrew Lunn, Alexandre
Belloni, Peter Zijlstra, Russell King, Eric W. Biederman, Willy
Tarreau, Christoph Hellwig, Emilio Cobos Álvarez, Christian Poveda,
Mark Rousskov, John Ericson, TennyZhuang, Xuanwo, Daniel Paoliello,
Manish Goregaokar, comex, Josh Stone, Stephan Sokolow, Philipp Krones,
Guillaume Gomez, Joshua Nelson, Mats Larsen, Marc Poulhiès, Samantha
Miller, Esteban Blanc, Martin Schmidt, Martin Rodriguez Reboredo,
Daniel Xu, Viresh Kumar, Bartosz Golaszewski, Vegard Nossum, Milan
Landaverde, Dariusz Sosnowski, Yuki Okushi, Matthew Bakhtiari, Wu
XiangCheng, Tiago Lam, Boris-Chengbiao Zhou, Sumera Priyadarsini,
Viktor Garske, Niklas Mohrin, Nándor István Krácser, Morgan Bartlett,
Miguel Cano, Léo Lanteri Thauvin, Julian Merkle, Andreas Reindl,
Jiapeng Chong, Fox Chen, Douglas Su, Antonio Terceiro, SeongJae Park,
Sergio González Collado, Ngo Iok Ui (Wu Yu Wei), Joshua Abraham,
Milan, Daniel Kolsoi, ahomescu, Manas, Luis Gerhorst, Li Hongyu,
Philipp Gesang, Russell Currey, Jalil David Salamé Messina, Jon Olson,
Raghvender, Angelos, Kaviraj Kanagaraj, Paul Römer, Sladyn Nunes,
Mauro Baladés, Hsiang-Cheng Yang, Abhik Jain, Hongyu Li, Sean Nash,
Yuheng Su, Peng Hao, Anhad Singh, Roel Kluin, Sara Saa, Geert
Stappers, Garrett LeSage, IFo Hancroft, and Linus Torvalds"
Link: https://lwn.net/Articles/849849/ [1]
Link: https://github.com/Rust-for-Linux/linux/commits/rust [2]
Link: https://github.com/metaspace/rust-linux/commit/d88c3744d6cbdf11767e08bad56cbfb67c4c96d0 [3]
Link: https://github.com/wedsonaf/linux/commit/9367032607f7670de0ba1537cf09ab0f4365a338 [4]
Link: https://github.com/AsahiLinux/linux/commits/gpu/rust-wip [5]
* tag 'rust-v6.1-rc1' of https://github.com/Rust-for-Linux/linux: (27 commits)
MAINTAINERS: Rust
samples: add first Rust examples
x86: enable initial Rust support
docs: add Rust documentation
Kbuild: add Rust support
rust: add `.rustfmt.toml`
scripts: add `is_rust_module.sh`
scripts: add `rust_is_available.sh`
scripts: add `generate_rust_target.rs`
scripts: add `generate_rust_analyzer.py`
scripts: decode_stacktrace: demangle Rust symbols
scripts: checkpatch: enable language-independent checks for Rust
scripts: checkpatch: diagnose uses of `%pA` in the C side as errors
vsprintf: add new `%pA` format specifier
rust: export generated symbols
rust: add `kernel` crate
rust: add `bindings` crate
rust: add `macros` crate
rust: add `compiler_builtins` crate
rust: adapt `alloc` crate to the kernel
...
|
|
Rust symbols can become quite long due to namespacing introduced
by modules, types, traits, generics, etc. For instance,
the following code:
pub mod my_module {
pub struct MyType;
pub struct MyGenericType<T>(T);
pub trait MyTrait {
fn my_method() -> u32;
}
impl MyTrait for MyGenericType<MyType> {
fn my_method() -> u32 {
42
}
}
}
generates a symbol of length 96 when using the upcoming v0 mangling scheme:
_RNvXNtCshGpAVYOtgW1_7example9my_moduleINtB2_13MyGenericTypeNtB2_6MyTypeENtB2_7MyTrait9my_method
At the moment, Rust symbols may reach up to 300 in length.
Setting 512 as the maximum seems like a reasonable choice to
keep some headroom.
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com>
Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>
Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Add support for kmem_cache_free_bulk() and kmem_cache_alloc_bulk() to the
radix tree test suite.
Link: https://lkml.kernel.org/r/20220906194824.2110408-6-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Sync find_first_bit() and find_next_bit() implementation with the
mother kernel.
Also, drop unused find_last_bit() and find_next_clump8().
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
|
When gfp_types.h was split from gfp.h, it broke the radix test suite. Fix
the test suite by using gfp_types.h in the tools gfp.h header.
Link: https://lkml.kernel.org/r/20220902191923.1735933-1-willy@infradead.org
Fixes: cb5a065b4ea9 (headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO") eradicates
CC_HAS_ASM_GOTO, and in the process also causes the perf tool on x86 to
use asm_volatile_goto when compiling __GEN_RMWcc.
However, asm_volatile_goto is not declared in the perf tool headers,
which causes a compilation error:
In file included from tools/arch/x86/include/asm/atomic.h:7,
from tools/include/asm/atomic.h:6,
from tools/include/linux/atomic.h:5,
from tools/include/linux/refcount.h:41,
from tools/lib/perf/include/internal/cpumap.h:5,
from tools/perf/util/cpumap.h:7,
from tools/perf/util/env.h:7,
from tools/perf/util/header.h:12,
from pmu-events/pmu-events.c:9:
tools/arch/x86/include/asm/atomic.h: In function ‘atomic_dec_and_test’:
tools/arch/x86/include/asm/rmwcc.h:7:2: error: implicit declaration of function ‘asm_volatile_goto’ [-Werror=implicit-function-declaration]
asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \
^~~~~~~~~~~~~~~~~
Define asm_volatile_goto in compiler_types.h if not declared, like the
main kernel header files do.
Fixes: a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull bitmap updates from Yury Norov:
- fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo)
- optimize out non-atomic bitops on compile-time constants (Alexander
Lobakin)
- cleanup bitmap-related headers (Yury Norov)
- x86/olpc: fix 'logical not is only applied to the left hand side'
(Alexander Lobakin)
- lib/nodemask: inline wrappers around bitmap (Yury Norov)
* tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits)
lib/nodemask: inline next_node_in() and node_random()
powerpc: drop dependency on <asm/machdep.h> in archrandom.h
x86/olpc: fix 'logical not is only applied to the left hand side'
lib/cpumask: move some one-line wrappers to header file
headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure
headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>
headers/deps: mm: Optimize <linux/gfp.h> header dependencies
lib/cpumask: move trivial wrappers around find_bit to the header
lib/cpumask: change return types to unsigned where appropriate
cpumask: change return types to bool where appropriate
lib/bitmap: change type of bitmap_weight to unsigned long
lib/bitmap: change return types to bool where appropriate
arm: align find_bit declarations with generic kernel
iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
lib/test_bitmap: test the tail after bitmap_to_arr64()
lib/bitmap: fix off-by-one in bitmap_to_arr64()
lib: test_bitmap: add compile-time optimization/evaluations assertions
bitmap: don't assume compiler evaluates small mem*() builtins calls
net/ice: fix initializing the bitmap in the switch code
bitops: let optimize out non-atomic bitops on compile-time constants
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Introduce 'perf lock contention' subtool, using new lock contention
tracepoints and using BPF for in kernel aggregation and then
userspace processing using the perf tooling infrastructure for
resolving symbols, target specification, etc.
Since the new lock contention tracepoints don't provide lock names,
get up to 8 stack traces and display the first non-lock function
symbol name as a caller:
$ perf lock report -F acquired,contended,avg_wait,wait_total
Name acquired contended avg wait total wait
update_blocked_a... 40 40 3.61 us 144.45 us
kernfs_fop_open+... 5 5 3.64 us 18.18 us
_nohz_idle_balance 3 3 2.65 us 7.95 us
tick_do_update_j... 1 1 6.04 us 6.04 us
ep_scan_ready_list 1 1 3.93 us 3.93 us
Supports the usual 'perf record' + 'perf report' workflow as well as
a BCC/bpftrace like mode where you start the tool and then press
control+C to get results:
$ sudo perf lock contention -b
^C
contended total wait max wait avg wait type caller
42 192.67 us 13.64 us 4.59 us spinlock queue_work_on+0x20
23 85.54 us 10.28 us 3.72 us spinlock worker_thread+0x14a
6 13.92 us 6.51 us 2.32 us mutex kernfs_iop_permission+0x30
3 11.59 us 10.04 us 3.86 us mutex kernfs_dop_revalidate+0x3c
1 7.52 us 7.52 us 7.52 us spinlock kthread+0x115
1 7.24 us 7.24 us 7.24 us rwlock:W sys_epoll_wait+0x148
2 7.08 us 3.99 us 3.54 us spinlock delayed_work_timer_fn+0x1b
1 6.41 us 6.41 us 6.41 us spinlock idle_balance+0xa06
2 2.50 us 1.83 us 1.25 us mutex kernfs_iop_lookup+0x2f
1 1.71 us 1.71 us 1.71 us mutex kernfs_iop_getattr+0x2c
...
- Add new 'perf kwork' tool to trace time properties of kernel work
(such as softirq, and workqueue), uses eBPF skeletons to collect info
in kernel space, aggregating data that then gets processed by the
userspace tool, e.g.:
# perf kwork report
Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
----------------------------------------------------------------------------------------------------
nvme0q5:130 | 004 | 1.101 ms | 49 | 0.051 ms | 26035.056403 s | 26035.056455 s |
amdgpu:162 | 002 | 0.176 ms | 9 | 0.046 ms | 26035.268020 s | 26035.268066 s |
nvme0q24:149 | 023 | 0.161 ms | 55 | 0.009 ms | 26035.655280 s | 26035.655288 s |
nvme0q20:145 | 019 | 0.090 ms | 33 | 0.014 ms | 26035.939018 s | 26035.939032 s |
nvme0q31:156 | 030 | 0.075 ms | 21 | 0.010 ms | 26035.052237 s | 26035.052247 s |
nvme0q8:133 | 007 | 0.062 ms | 12 | 0.021 ms | 26035.416840 s | 26035.416861 s |
nvme0q6:131 | 005 | 0.054 ms | 22 | 0.010 ms | 26035.199919 s | 26035.199929 s |
nvme0q19:144 | 018 | 0.052 ms | 14 | 0.010 ms | 26035.110615 s | 26035.110625 s |
nvme0q7:132 | 006 | 0.049 ms | 13 | 0.007 ms | 26035.125180 s | 26035.125187 s |
nvme0q18:143 | 017 | 0.033 ms | 14 | 0.007 ms | 26035.169698 s | 26035.169705 s |
nvme0q17:142 | 016 | 0.013 ms | 1 | 0.013 ms | 26035.565147 s | 26035.565160 s |
enp5s0-rx-0:164 | 006 | 0.004 ms | 4 | 0.002 ms | 26035.928882 s | 26035.928884 s |
enp5s0-tx-0:166 | 008 | 0.003 ms | 3 | 0.002 ms | 26035.870923 s | 26035.870925 s |
--------------------------------------------------------------------------------------------------------
See commit log messages for more examples with extra options to limit
the events time window, etc.
- Add support for new AMD IBS (Instruction Based Sampling) features:
With the DataSrc extensions, the source of data can be decoded among:
- Local L3 or other L1/L2 in CCX.
- A peer cache in a near CCX.
- Data returned from DRAM.
- A peer cache in a far CCX.
- DRAM address map with "long latency" bit set.
- Data returned from MMIO/Config/PCI/APIC.
- Extension Memory (S-Link, GenZ, etc - identified by the CS target
and/or address map at DF's choice).
- Peer Agent Memory.
- Support hardware tracing with Intel PT on guest machines, combining
the traces with the ones in the host machine.
- Add a "-m" option to 'perf buildid-list' to show kernel and modules
build-ids, to display all of the information needed to do external
symbolization of kernel stack traces, such as those collected by
bpf_get_stackid().
- Add arch TSC frequency information to perf.data file headers.
- Handle changes in the binutils disassembler function signatures in
perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer).
- Fix building the perf perl binding with the newest gcc in distros
such as fedora rawhide, where some new warnings were breaking the
build as perf uses -Werror.
- Add 'perf test' entry for branch stack sampling.
- Add ARM SPE system wide 'perf test' entry.
- Add user space counter reading tests to 'perf test'.
- Build with python3 by default, if available.
- Add python converter script for the vendor JSON event files.
- Update vendor JSON files for most Intel cores.
- Add vendor JSON File for Intel meteorlake.
- Add Arm Cortex-A78C and X1C JSON vendor event files.
- Add workaround to symbol address reading from ELF files without phdr,
falling back to the previoous equation.
- Convert legacy map definition to BTF-defined in the perf BPF script
test.
- Rework prologue generation code to stop using libbpf deprecated APIs.
- Add default hybrid events for 'perf stat' on x86.
- Add topdown metrics in the default 'perf stat' on the hybrid machines
(big/little cores).
- Prefer sampled CPU when exporting JSON in 'perf data convert'
- Fix ('perf stat CSV output linter') and ("Check branch stack
sampling") 'perf test' entries on s390.
* tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
perf stat: Refactor __run_perf_stat() common code
perf lock: Print the number of lost entries for BPF
perf lock: Add --map-nr-entries option
perf lock: Introduce struct lock_contention
perf scripting python: Do not build fail on deprecation warnings
genelf: Use HAVE_LIBCRYPTO_SUPPORT, not the never defined HAVE_LIBCRYPTO
perf build: Suppress openssl v3 deprecation warnings in libcrypto feature test
perf parse-events: Break out tracepoint and printing
perf parse-events: Don't #define YY_EXTRA_TYPE
tools bpftool: Don't display disassembler-four-args feature test
tools bpftool: Fix compilation error with new binutils
tools bpf_jit_disasm: Don't display disassembler-four-args feature test
tools bpf_jit_disasm: Fix compilation error with new binutils
tools perf: Fix compilation error with new binutils
tools include: add dis-asm-compat.h to handle version differences
tools build: Don't display disassembler-four-args feature test
tools build: Add feature test for init_disassemble_info API changes
perf test: Add ARM SPE system wide test
perf tools: Rework prologue generation code
perf bpf: Convert legacy map definition to BTF-defined
...
|